bio-gemma-wrapper 0.92.2 → 0.99.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: a5750dc833b64764d55eaec57f9ce981302659b9
4
- data.tar.gz: f8ebc55f80ca39d84f03f4bae4385212aa5e35f7
2
+ SHA256:
3
+ metadata.gz: 9ddfd904e74beebe0de1b97732d872fce171732965a835b101b9cc9be815bb05
4
+ data.tar.gz: 2dae1c019da23f2f87216694d641fc1eb852aa7800557bd10cfb08cb3425e844
5
5
  SHA512:
6
- metadata.gz: 634361cf98042f4653d3ea4bd19d883ec08f7a1edff5f7c44935b4ef58d0b17d007451a23f7304ae29dd08c477549459b3d0cb2bcbda3060a6641d9ad665d012
7
- data.tar.gz: 55f75d9a5208595e5f3ff1dffd530f17f4fdb3040c2311ea993d69b1c21001673d3825015d5e4ec327d450225470d078b46f2db84a364d27bdbcbd5852741404
6
+ metadata.gz: 38454a3f12dab85bef711051e73e20a015fe6b6d9c71bafada2197b9aef1aa0eabe3f3709cb0dc9d0c39f4cc454c15bc4d3aea5d06140ccde72fa13aa6285f51
7
+ data.tar.gz: 28e77a6995893245c501e602d488b5e0c504549fa91d8c94f902591b87b4454fe9b7923667dfacae2ab1dac7f6f7d814df1ec036b2b4f616dfd4b84c549d35d1
data/README.md CHANGED
@@ -1,10 +1,19 @@
1
- # GEMMA wrapper caches K between runs with LOCO support
1
+ [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
+
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
2
4
 
3
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
4
6
  Nat. Genet., 2016)](cfw.gif)
5
7
 
6
8
  ## Introduction
7
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
12
+ is used to run GEMMA as part of the https://genenetwork.org/
13
+ environment.
14
+
15
+ Note that gemma-wrapper is projected to be integrated into gemma2/lib.
16
+
8
17
  GEMMA is a software toolkit for fast application of linear mixed
9
18
  models (LMMs) and related models to genome-wide association studies
10
19
  (GWAS) and other large-scale data sets.
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
12
21
  This repository contains gemma-wrapper, essentially a wrapper of
13
22
  GEMMA that provides support for caching the kinship or relatedness
14
23
  matrix (K) and caching LM and LMM computations with the option of full
15
- leave-one-chromosome-out genome scans (LOCO).
24
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
25
+ submitted to HPC PBS, i.e., slurm.
16
26
 
17
27
  gemma-wrapper requires a recent version of GEMMA and essentially
18
28
  does a pass-through of all standard GEMMA invocation switches. On
19
29
  return gemma-wrapper can return a JSON object (--json) which is
20
30
  useful for web-services.
21
31
 
22
- Note that this a work in progress (WIP). What is described below
23
- should work.
24
-
25
32
  ## Installation
26
33
 
27
34
  Prerequisites are
@@ -30,8 +37,9 @@ Prerequisites are
30
37
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
31
38
  almost all Linux systems
32
39
 
33
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
34
- can be installed with
40
+ gemma-wrapper comes as a Ruby
41
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
42
+ installed with
35
43
 
36
44
  gem install bio-gemma-wrapper
37
45
 
@@ -39,15 +47,18 @@ Invoke the tool with
39
47
 
40
48
  gemma-wrapper --help
41
49
 
42
- and it will render
50
+ and it will render something like
43
51
 
44
52
  ```
45
53
  Usage: gemma-wrapper [options] -- [gemma-options]
54
+ --permutate n Permutate # times by shuffling phenotypes
55
+ --permute-phenotypes filen Phenotypes to be shuffled in permutations
46
56
  --loco [x,y,1,2,3...] Run full LOCO
47
57
  --input filen JSON input variables (used for LOCO)
48
58
  --cache-dir path Use a cache directory
49
59
  --json Create output file in JSON format
50
60
  --force Force computation
61
+ --slurm [options] Submit to slurm PBS
51
62
  --q, --quiet Run quietly
52
63
  -v, --verbose Run verbosely
53
64
  --debug Show debug messages and keep intermediate output
@@ -65,6 +76,8 @@ Unpack it and run the tool as
65
76
 
66
77
  ./bin/gemma-wrapper --help
67
78
 
79
+ See below for using a GNU Guix environment.
80
+
68
81
  ## Usage
69
82
 
70
83
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -91,11 +104,12 @@ the data files are found):
91
104
 
92
105
  Run it twice to see
93
106
 
94
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
107
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
95
108
 
96
109
  gemma-wrapper computes the unique HASH value over the command
97
110
  line switches passed into GEMMA as well as the contents of the files
98
- passed in (here the genotype and phenotype files).
111
+ passed in (here the genotype and phenotype files; for K the phenotype file is ignored because
112
+ GEMMA always computes the same K).
99
113
 
100
114
  You can also get JSON output on STDOUT by providing the --json switch
101
115
 
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
103
117
  -g test/data/input/BXD_geno.txt.gz \
104
118
  -p test/data/input/BXD_pheno.txt \
105
119
  -gk \
106
- -debug
120
+ -debug > K.json
107
121
 
108
- prints out something that can be parsed with a calling program
122
+ K.json is something that can be parsed with a calling program, and is
123
+ also used below as input for the GWA step. Example:
109
124
 
110
125
  ```json
111
126
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
123
138
 
124
139
  will store K in ~/.gemma-cache.
125
140
 
141
+ ### GWA
142
+
143
+ Run the LMM using the K's captured earlier in K.json using the --input
144
+ switch
145
+
146
+ gemma-wrapper --json --loco --input K.json -- \
147
+ -g test/data/input/BXD_geno.txt.gz \
148
+ -p test/data/input/BXD_pheno.txt \
149
+ -c test/data/input/BXD_covariates2.txt \
150
+ -a test/data/input/BXD_snps.txt \
151
+ -lmm 2 -maf 0.1 \
152
+ -debug > GWA.json
153
+
154
+ Running it twice should show that GWA is not recomputed.
155
+
156
+ /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
157
+
126
158
  ### LOCO
127
159
 
128
160
  Recent versions of GEMMA have LOCO support for a single chromosome
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
158
190
  The -k switch is injected automatically. Again output switches are not
159
191
  allowed (-o, -outdir)
160
192
 
193
+ ### Permutations
194
+
195
+ Permutations can be run with and without LOCO. First create K
196
+
197
+ gemma-wrapper --json -- \
198
+ -g test/data/input/BXD_geno.txt.gz \
199
+ -p test/data/input/BXD_pheno.txt \
200
+ -gk \
201
+ -debug > K.json
202
+
203
+ Next, using K.json, permute the phenotypes with something like
204
+
205
+ gemma-wrapper --json --loco --input K.json \
206
+ --permutate 100 --permute-phenotypes test/data/input/BXD_pheno.txt -- \
207
+ -g test/data/input/BXD_geno.txt.gz \
208
+ -p test/data/input/BXD_pheno.txt \
209
+ -c test/data/input/BXD_covariates2.txt \
210
+ -a test/data/input/BXD_snps.txt \
211
+ -lmm 2 -maf 0.1 \
212
+ -debug > GWA.json
213
+
214
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
215
+
216
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
217
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
218
+
219
+ ### Slurm PBS
220
+
221
+ To run gemma-wrapper on HPC use the '--slurm' switch.
222
+
223
+ ## Development
224
+
225
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
226
+
227
+ ```
228
+ source .guix-deploy
229
+ ruby bin/gemma-wrapper --help
230
+ ```
231
+
161
232
  ## Copyright
162
233
 
163
- Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
234
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
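The README states that the `--json` record can be parsed by a calling program. A minimal Ruby sketch of that, assuming the record layout of the example shown in the README diff (abridged here by dropping `gemma_command`). `SAMPLE_RECORD` and `summarize_record` are hypothetical names introduced for illustration and are not part of gemma-wrapper.

```ruby
require 'json'

# Abridged copy of the example record from the README; in practice this
# text would be read from K.json.
SAMPLE_RECORD = <<~JSON
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true}
JSON

# Hypothetical helper: extract the fields a caller is likely to need and
# fail early when the record carries a non-zero errno.
def summarize_record(json_text)
  rec = JSON.parse(json_text)
  raise "gemma-wrapper reported errno #{rec['errno']}" if rec['errno'] != 0
  {
    type: rec['type'],            # "K" for a kinship run
    cache_hit: rec['cache_hit'],  # true when nothing was recomputed
    files: rec['files'].flatten,  # log and data file names
    warnings: rec['warnings']
  }
end

p summarize_record(SAMPLE_RECORD)
```

Checking `type` before reuse matches what the wrapper itself does when it reads the `--input` file in the GWA step.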
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.92.2
1
+ 0.99.1
data/bin/gemma-wrapper CHANGED
@@ -4,9 +4,10 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
- USAGE = "GEMMA wrapper example:
9
+ USAGE = "
10
+ GEMMA wrapper example:
10
11
 
11
12
  Simple caching of K computation with
12
13
 
@@ -34,9 +35,13 @@ USAGE = "GEMMA wrapper example:
34
35
  -lmm 2 -maf 0.1 \\
35
36
  -debug > GWA.json
36
37
 
38
+ Gemma gets used from the path. You can override by setting
39
+
40
+ env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
37
41
  "
38
- GEMMA_V_MAJOR = 97
39
- GEMMA_V_MINOR = 2
42
+ # These are used for testing compatibility with the gemma tool
43
+ GEMMA_V_MAJOR = 98
44
+ GEMMA_V_MINOR = 1
40
45
 
41
46
  basepath = File.dirname(File.dirname(__FILE__))
42
47
  $: << File.join(basepath,'lib')
@@ -59,8 +64,11 @@ if not gemma_command
59
64
  end
60
65
  end
61
66
 
67
+
68
+ require 'digest/sha1'
62
69
  require 'fileutils'
63
70
  require 'optparse'
71
+ require 'tempfile'
64
72
  require 'tmpdir'
65
73
 
66
74
  split_at = ARGV.index('--')
@@ -68,12 +76,22 @@ if split_at
68
76
  gemma_args = ARGV[split_at+1..-1]
69
77
  end
70
78
 
71
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
79
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
72
80
 
73
81
  opts = OptionParser.new do |o|
74
- o.banner = "Usage: #{File.basename($0)} [options] -- [gemma-options]"
82
+ o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
83
+
84
+ o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
85
+ options[:permutate] = lst
86
+ options[:force] = true
87
+ end
88
+
89
+ o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
90
+ options[:permute_phenotypes] = phenotypes
91
+ raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
+ end
75
93
 
76
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
94
+ o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
77
95
  options[:loco] = lst
78
96
  end
79
97
 
@@ -90,10 +108,22 @@ opts = OptionParser.new do |o|
90
108
  options[:json] = b
91
109
  end
92
110
 
93
- o.on("--force", "Force computation") do |q|
111
+ o.on("--force", "Force computation (override cache)") do |q|
94
112
  options[:force] = true
95
113
  end
96
114
 
115
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
116
+ options[:parallel] = false
117
+ end
118
+
119
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
120
+ options[:slurm_opts] = ""
121
+ options[:slurm] = true
122
+ if slurm
123
+ options[:slurm_opts] = slurm
124
+ end
125
+ end
126
+
97
127
  o.on("--q", "--quiet", "Run quietly") do |q|
98
128
  options[:quiet] = true
99
129
  end
@@ -102,15 +132,20 @@ opts = OptionParser.new do |o|
102
132
  options[:verbose] = true
103
133
  end
104
134
 
105
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
135
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
106
136
  options[:debug] = true
107
137
  end
108
138
 
139
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
140
+ options[:dry_run] = b
141
+ end
142
+
109
143
  o.on('--','Anything after gets passed to GEMMA') do
110
144
  o.terminate()
111
145
  end
112
146
 
113
147
  o.separator ""
148
+
114
149
  o.on_tail('-h', '--help', 'display this help and exit') do
115
150
  options[:show_help] = true
116
151
  end
@@ -129,6 +164,7 @@ json_out = lambda do
129
164
  print record.to_json if options[:json]
130
165
  end
131
166
 
167
+ # ---- Some error handlers
132
168
  error = lambda do |*msg|
133
169
  if options[:json]
134
170
  record[:error] = *msg.join(" ")
@@ -137,12 +173,14 @@ error = lambda do |*msg|
137
173
  end
138
174
  raise *msg
139
175
  end
176
+
140
177
  debug = lambda do |*msg|
141
178
  if options[:debug]
142
179
  record[:debug].push *msg.join("") if options[:json]
143
180
  OUTPUT.print "DEBUG: ",*msg,"\n"
144
181
  end
145
182
  end
183
+
146
184
  warning = lambda do |*msg|
147
185
  record[:warnings].push *msg.join("")
148
186
  OUTPUT.print "WARNING: ",*msg,"\n"
@@ -152,18 +190,32 @@ info = lambda do |*msg|
152
190
  OUTPUT.print *msg,"\n" if !options[:quiet]
153
191
  end
154
192
 
193
+ # ---- Start banner
194
+
155
195
  GEMMA_K_VERSION=version
156
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017\n"
196
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
157
197
  info.call GEMMA_K_BANNER
158
198
 
159
199
  # Check gemma version
160
200
  GEMMA_COMMAND=options[:gemma_command]
161
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/Version/)[0].strip
162
- info.call "Using GEMMA ",gemma_version_header,"\n"
201
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
202
+
203
+ begin
204
+ GEMMA_INFO = `#{GEMMA_COMMAND}`
205
+ rescue Errno::ENOENT
206
+ GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
207
+ error.call "<#{GEMMA_COMMAND}> command not found"
208
+ end
209
+
210
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
211
+ info.call "Using ",gemma_version_header,"\n"
163
212
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
164
213
  v_version, v_major, v_minor = gemma_version.split(".")
214
+ info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
215
+
216
+ info.call gemma_version_header
165
217
 
166
- error.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor == nil or v_minor.to_i < GEMMA_V_MINOR))
218
+ warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
167
219
 
168
220
  options[:gemma_version_header] = gemma_version_header
169
221
  options[:gemma_version] = gemma_version
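The version check changed in this hunk can be read as the predicate below. This is a minimal sketch, not the wrapper's code: `gemma_outdated?` and the `EXPECTED_*` constants are hypothetical names, with values taken from `GEMMA_V_MAJOR` and `GEMMA_V_MINOR` in the new version. It mirrors the new condition, in which a version string without a third component no longer counts as out of date.

```ruby
# Values copied from GEMMA_V_MAJOR / GEMMA_V_MINOR in 0.99.1.
EXPECTED_MAJOR = 98
EXPECTED_MINOR = 1

# Returns true when a "0.<major>.<minor>" version string is older than
# the expected release. A missing minor part is accepted, as in the
# new condition (v_minor != nil and ...).
def gemma_outdated?(version, major = EXPECTED_MAJOR, minor = EXPECTED_MINOR)
  _prefix, v_major, v_minor = version.split(".")
  return true if v_major.to_i < major
  v_major.to_i == major && !v_minor.nil? && v_minor.to_i < minor
end

%w[0.97.2 0.98 0.98.0 0.98.1 0.99.0].each do |v|
  puts "#{v}: #{gemma_outdated?(v) ? 'out of date' : 'ok'}"
end
```

Note that the new version also downgrades the outcome from `error.call` to `warning.call`, so an old GEMMA no longer aborts the run.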
@@ -178,25 +230,82 @@ if RUBY_VERSION =~ /^1/
178
230
  warning "runs on Ruby 2.x only\n"
179
231
  end
180
232
 
233
+ debug.call(options) # some debug output
234
+ debug.call(record)
235
+
236
+ DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
237
+ DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
238
+
239
+ # ---- Set up parallel
240
+ if options[:parallel]
241
+ begin
242
+ PARALLEL_INFO = `parallel --help`
243
+ rescue Errno::ENOENT
244
+ error.call "<parallel> command not found"
245
+ end
246
+ parallel_cmds = []
247
+ end
248
+
181
249
  # ---- Compute HASH on inputs
182
250
  hashme = []
183
251
  geno_idx = gemma_args.index '-g'
184
- raise "Expected GEMMA -g switch" if geno_idx == nil
185
- hashme = gemma_args
252
+ raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
253
+ pheno_idx = gemma_args.index '-p'
186
254
 
187
- require 'digest/sha1'
188
- debug.call "Hashing on ",hashme,"\n"
189
- hashes = []
190
- hashme.each do | item |
191
- if File.exist?(item)
192
- hashes << Digest::SHA1.hexdigest(File.read(item))
193
- debug.call [item,hashes.last]
255
+ if DO_COMPUTE_GWA and options[:permute_phenotypes]
256
+ raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
257
+ end
258
+
259
+
260
+ execute = lambda { |cmd|
261
+ info.call("Executing: #{cmd}")
262
+ err = 0
263
+ if not options[:debug]
264
+ # send output to stderr line by line
265
+ IO.popen("#{cmd}") do |io|
266
+ while s = io.gets
267
+ $stderr.print s
268
+ end
269
+ io.close
270
+ err = $?.to_i
271
+ end
194
272
  else
195
- hashes << item
273
+ $stderr.print `#{cmd}`
274
+ err = $?.to_i
196
275
  end
276
+ err
277
+ }
278
+
279
+ hashme =
280
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
281
+ # Remove the phenotype file from the hash for GRM computation
282
+ gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
283
+ else
284
+ gemma_args
285
+ end
286
+
287
+ compute_hash = lambda do | phenofn = nil |
288
+ # Compute a HASH on the inputs
289
+ debug.call "Hashing on ",hashme,"\n"
290
+ hashes = []
291
+ hm = if phenofn
292
+ hashme + ["-p", phenofn]
293
+ else
294
+ hashme
295
+ end
296
+ debug.call(hm)
297
+ hm.each do | item |
298
+ if File.file?(item)
299
+ hashes << Digest::SHA1.hexdigest(File.read(item))
300
+ debug.call [item,hashes.last]
301
+ else
302
+ hashes << item
303
+ end
304
+ end
305
+ Digest::SHA1.hexdigest hashes.join(' ')
197
306
  end
198
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
199
307
 
308
+ HASH = compute_hash.call()
200
309
  options[:hash] = HASH
201
310
 
202
311
  # Create cache dir
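The cache key introduced in this hunk can be sketched as follows. This is an illustration under assumptions rather than the wrapper's code: `cache_key` and `strip_phenotype` are hypothetical names. The idea taken from the diff is that arguments naming a regular file contribute the SHA1 of the file contents, all other arguments contribute their literal text, and the `-p <file>` pair is removed before hashing a kinship (`-gk`) run.

```ruby
require 'digest/sha1'
require 'tempfile'

# File arguments are hashed by content, switches by their text; the
# result is one SHA1 over the space-joined parts.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Drop the "-p <file>" pair so the phenotype does not affect the key.
def strip_phenotype(args)
  idx = args.index('-p')
  return args if idx.nil?
  args[0...idx] + (args[(idx + 2)..-1] || [])
end

# Small demonstration with a throwaway genotype file.
geno = Tempfile.new('geno')
geno.write("marker1, A, B, 0, 1\n")
geno.flush
puts cache_key(strip_phenotype(['-g', geno.path, '-p', 'missing.txt', '-gk']))
```

Because files are hashed by content, renaming an input leaves the key unchanged, while editing it produces a new key and therefore a cache miss.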
@@ -210,26 +319,49 @@ GEMMA_ARGS = gemma_args
210
319
 
211
320
  debug.call "Options: ",options,"\n" if !options[:quiet]
212
321
 
213
- invoke_gemma = lambda do |extra_args, cache_hit = false|
214
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
322
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
323
+ cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
215
324
  record[:gemma_command] = cmd
216
325
  return if cache_hit
217
- # debug.call cmd
326
+ if options[:slurm]
327
+ info.call cmd
328
+ hashi = HASH
329
+ prefix = options[:cache_dir]+'/'+hashi
330
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
331
+ script = "#!/bin/bash
332
+ #SBATCH --job-name=gemma-#{scriptfn}
333
+ #SBATCH --ntasks=1
334
+ #SBATCH --time=20:00
335
+ srun #{cmd}
336
+ "
337
+ debug.call(script)
338
+ File.open(scriptfn,"w") { |f|
339
+ f.write(script)
340
+ }
341
+ cmd = "sbatch "+options[:slurm_opts]+" "+scriptfn
342
+ end
218
343
  errno =
219
344
  if options[:json]
220
345
  # capture output
221
346
  err = 0
222
- IO.popen(cmd) do |io|
223
- while s = io.gets
224
- $stderr.print s
225
- end
226
- io.close
227
- err = $?.to_i
347
+ if options[:dry_run]
348
+ info.call("Would have invoked: ",cmd)
349
+ elsif options[:parallel]
350
+ info.call("Add parallel job: ",cmd)
351
+ parallel_cmds << cmd
352
+ else
353
+ err = execute.call(cmd)
228
354
  end
229
355
  err
230
356
  else
231
- system(cmd)
232
- $?.exitstatus
357
+ if options[:dry_run]
358
+ info.call("Would have invoked ",cmd)
359
+ 0
360
+ else
361
+ debug.call("Invoking ",cmd) if options[:debug]
362
+ system(cmd)
363
+ $?.exitstatus
364
+ end
233
365
  end
234
366
  if errno != 0
235
367
  debug.call "Gemma exit ",errno
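The slurm branch in this hunk writes a small batch script and then submits it with `sbatch`. A minimal sketch of those two steps as pure string builders, assuming the same `#SBATCH` directives as in the diff. `slurm_script` and `sbatch_command` are hypothetical helpers, the job name is passed in rather than derived from the script path, and nothing is written to disk or submitted.

```ruby
# Build the batch script text; the directives are those used in the diff.
def slurm_script(cmd, job_name)
  <<~SCRIPT
    #!/bin/bash
    #SBATCH --job-name=#{job_name}
    #SBATCH --ntasks=1
    #SBATCH --time=20:00
    srun #{cmd}
  SCRIPT
end

# Build the submit command; empty options are skipped so the parts are
# always separated by exactly one space.
def sbatch_command(scriptfn, slurm_opts = "")
  ["sbatch", slurm_opts, scriptfn].reject { |part| part.nil? || part.empty? }.join(" ")
end

puts slurm_script("gemma -g geno.txt.gz -gk", "gemma-demo")
puts sbatch_command("/tmp/demo-pbs.sh", "--mem=4G")
```

Keeping the builders free of side effects makes them easy to check in combination with a `--dry-run` style switch.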
@@ -240,10 +372,12 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
240
372
  end
241
373
 
242
374
  # returns datafn, logfn, cache_hit
243
- cache = lambda do | chr, ext |
375
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
244
376
  inject = (chr==nil ? "" : ".#{chr}" )+ext
245
- hashi = HASH+inject
246
- prefix = options[:cache_dir]+'/'+hashi
377
+ hashi = (chr==nil ? h : h+inject)
378
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
379
+ # for chr 3 and permutation 1 forms something like
380
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
247
381
  logfn = prefix+".log.txt"
248
382
  datafn = prefix+ext
249
383
  record[:files] ||= []
@@ -260,6 +394,7 @@ cache = lambda do | chr, ext |
260
394
  return hashi,false
261
395
  end
262
396
 
397
+ # ---- Compute K
263
398
  kinship = lambda do | chr = nil |
264
399
  record[:type] = "K"
265
400
  ext = case (GEMMA_ARGS[GEMMA_ARGS.index('-gk')+1]).to_i
@@ -277,21 +412,23 @@ kinship = lambda do | chr = nil |
277
412
  end
278
413
  end
279
414
 
280
- gwas = lambda do | chr, kfn |
415
+ # ---- Run GWA
416
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
281
417
  record[:type] = "GWA"
282
- error.call "Do not use the GEMMA -k switch!" if GEMMA_ARGS.include? '-k'
283
- hashi, cache_hit = cache.call chr,".assoc.txt"
418
+ error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
419
+ # Update hash for each permutation
420
+ hash = compute_hash.call(pfn)
421
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
284
422
  if not cache_hit
285
- if chr != nil
286
- invoke_gemma.call [ '-loco', chr, '-k', kfn, '-o', hashi ]
287
- else
288
- error.call "Not supported"
289
- end
423
+ args = [ '-k', kfn, '-o', hashi ]
424
+ args << [ '-loco', chr ] if chr != nil
425
+ args << [ '-p', pfn ] if pfn
426
+ invoke_gemma.call args,false,chr,permutation
290
427
  end
291
428
  end
292
429
 
293
430
  LOCO = options[:loco]
294
- if GEMMA_ARGS.include? '-gk'
431
+ if DO_COMPUTE_KINSHIP
295
432
  # compute K
296
433
  info.call LOCO
297
434
  if LOCO != nil
@@ -303,14 +440,80 @@ if GEMMA_ARGS.include? '-gk'
303
440
  kinship.call # no LOCO
304
441
  end
305
442
  else
306
- # GWAS
443
+ # DO_COMPUTE_GWA
307
444
  json_in = JSON.parse(File.read(options[:input]))
308
445
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
446
+
447
+ pfn = options[:permute_phenotypes] # can be nil
309
448
  k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
310
- k_files.each do | chr, kfn |
311
- gwas.call(chr,kfn)
449
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
450
+ gwas.call(chr,kfn,pfn)
451
+ end
452
+ # Permute
453
+ if options[:permutate]
454
+ ps = []
455
+ raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
456
+ File.foreach(pfn).with_index do |line, line_num|
457
+ ps << line
458
+ end
459
+ score_list = []
460
+ debug.call(options[:permutate],"x permutations")
461
+ (1..options[:permutate]).each do |permutation|
462
+ $stderr.print "Iteration ",permutation,"\n"
463
+ # Create a shuffled phenotype file
464
+ file = File.open("phenotypes-#{permutation}","w")
465
+ tmp_pfn = file.path
466
+ p tmp_pfn
467
+ ps.shuffle.each do | l |
468
+ file.print(l)
469
+ end
470
+ file.close
471
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
472
+ gwas.call(chr,kfn,tmp_pfn,permutation)
473
+ end
474
+ score_min = 1000.0
475
+ if false and not options[:slurm]
476
+ # p [:HEY,record[:files].last]
477
+ assocfn = record[:files].last[2]
478
+ debug.call("Reading ",assocfn)
479
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
480
+ if assoc_line_num > 0
481
+ value = assoc.strip.split(/\t/).last.to_f
482
+ score_min = value if value < score_min
483
+ end
484
+ end
485
+ end
486
+ score_list << score_min
487
+ end
488
+ exit 0 if options[:slurm]
489
+ ls = score_list.sort
490
+ p ls
491
+ significant = ls[(ls.size - ls.size*0.95).floor]
492
+ suggestive = ls[(ls.size - ls.size*0.67).floor]
493
+ p ["95 percentile (significant) ",significant,(-Math.log10(significant)).round(1)]
494
+ p ["67 percentile (suggestive) ",suggestive,(-Math.log10(suggestive)).round(1)]
495
+ exit 0
312
496
  end
313
497
  end
314
498
 
499
+ # ---- Invoke parallel
500
+ if options[:parallel]
501
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
502
+ cmd = parallel_cmds.join("\\n")
503
+
504
+ cmd = "echo -e \"#{cmd}\""
505
+ err = execute.call(cmd+"|parallel") # all jobs in parallel
506
+ if err != 0
507
+ [16,8,4,1].each do |jobs|
508
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
509
+ err = execute.call(cmd+"|parallel -j #{jobs}")
510
+ break if err == 0
511
+ end
512
+ if err != 0
513
+ info.call("Run failed!")
514
+ exit err
515
+ end
516
+ end
517
+ info.call("Run successful!")
518
+ end
315
519
  json_out.call
316
- exit 0
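The permutation thresholds at the end of this hunk are taken from the sorted list of per-permutation minimum p-values. A minimal sketch of that calculation, assuming one minimum p-value per permutation is already available. `threshold` and `neg_log10` are hypothetical names, and the index uses integer arithmetic, which is equivalent to the `(ls.size - ls.size*0.95).floor` form in the diff but avoids floating-point rounding.

```ruby
# min_pvalues: smallest p-value seen in each permutation run.
# percentile:  95 for the "significant" level, 67 for "suggestive".
def threshold(min_pvalues, percentile)
  sorted = min_pvalues.sort
  idx = (sorted.size * (100 - percentile)) / 100  # floor(n * (1 - p))
  sorted[idx]
end

# Score on the -log10 scale, rounded as in the wrapper's output.
def neg_log10(p)
  (-Math.log10(p)).round(1)
end

# Synthetic data for illustration only: 20 evenly spaced values.
demo = (1..20).map { |i| i / 100.0 }.shuffle
[95, 67].each do |pct|
  t = threshold(demo, pct)
  puts "#{pct} percentile: p=#{t} score=#{neg_log10(t)}"
end
```

With 100 permutations the 95 percent level corresponds to the sixth smallest minimum p-value (index 5), so more permutations give a finer-grained estimate.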
@@ -1,8 +1,8 @@
1
1
  Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
- s.summary = "Cache GEMMA with LOCO"
5
- s.description = "GEMMA wrapper caches K between runs with LOCO support"
4
+ s.summary = "GEMMA with LOCO and permutations"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
metadata CHANGED
@@ -1,16 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.92.2
4
+ version: 0.99.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2017-09-03 00:00:00.000000000 Z
11
+ date: 2021-07-11 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper caches K between runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
+ runs with LOCO support
14
15
  email: pjotr.public01@thebird.nl
15
16
  executables:
16
17
  - gemma-wrapper
@@ -43,8 +44,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
44
  version: '0'
44
45
  requirements: []
45
46
  rubyforge_project:
46
- rubygems_version: 2.5.1
47
+ rubygems_version: 2.7.6.2
47
48
  signing_key:
48
49
  specification_version: 4
49
- summary: Cache GEMMA with LOCO
50
+ summary: GEMMA with LOCO and permutations
50
51
  test_files: []