bio-gemma-wrapper 0.98 → 0.99.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: d8f36b92e82dda9e1e592724204521f9f4f1a950
4
- data.tar.gz: 077abf1c1a704ab93a27627d511aeba0dbf6b4c5
2
+ SHA256:
3
+ metadata.gz: 0bd37b153e121de9c1758af736cd6904744da2de3540f2a7c547cc423382d8d1
4
+ data.tar.gz: 84298a943e7cfe6126653895d9714babc83a7be2bf903c7b61ff9072f1d4e4a8
5
5
  SHA512:
6
- metadata.gz: b49bb2c9362eb7babd2cf20f640fc7806719a8c82e31f93a97b4517d5c2777bd4ae55b9e5a1fd37df5f22e63e2673fa01ca4afbc4e07de6955391d77bd7f90fb
7
- data.tar.gz: 863ef847e4ddff6b544a733e001bcd2291c73ab394573a299d54a8f2581066fb423b40be052ce6a440c1daffbd6ddb7500c5f01b7ce2de0e38e4277c8ae9e53e
6
+ metadata.gz: f32d48ec2f194a513e0cf8f15463b05662e1b269fdd22a110e4cdfb9f6bc541238bac7e766cbc31a8b782379891636e4f79fb3e3667d567ab3c610298d4f11c2
7
+ data.tar.gz: abc9b3faf8ef2f63d566caa14312bde2b2f438c722a83e90f7da775c55e2415171efbfab385273c82e45398c31e5306fb99437c38e287570652a3b04a5432883
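The new SHA256 and SHA512 entries can be recomputed locally and compared against checksums.yaml. A minimal sketch, assuming the .gem archive has been unpacked so that `metadata.gz` and `data.tar.gz` sit in the current directory; the helper name `gem_checksums` is hypothetical and not part of RubyGems:

```ruby
require 'digest'

# Hypothetical helper: compute digests for the given files, keyed the
# same way as checksums.yaml (algorithm name, then file name).
def gem_checksums(paths)
  {
    'SHA256' => paths.map { |p| [File.basename(p), Digest::SHA256.file(p).hexdigest] }.to_h,
    'SHA512' => paths.map { |p| [File.basename(p), Digest::SHA512.file(p).hexdigest] }.to_h
  }
end

if __FILE__ == $0
  # Assumption: the unpacked gem members are in the current directory.
  files = %w[metadata.gz data.tar.gz].select { |f| File.file?(f) }
  gem_checksums(files).each do |algo, sums|
    puts "#{algo}:"
    sums.each { |name, hex| puts "  #{name}: #{hex}" }
  end
end
```

The printed values should match the `+` lines above if the unpacked files belong to the 0.99.3 release.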
data/README.md CHANGED
@@ -1,12 +1,20 @@
1
- [![Gem Version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
1
+ [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
2
 
3
- # GEMMA wrapper caches K between runs with LOCO support
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
4
4
 
5
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
6
6
  Nat. Genet., 2016)](cfw.gif)
7
7
 
8
8
  ## Introduction
9
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default with LOCO), and GEMMA on
12
+ PBS. Gemma-wrapper is used to run GEMMA as part of the
13
+ https://genenetwork.org/ environment.
14
+
15
+ Note that a version of gemma-wrapper is projected to be integrated
16
+ into gemma itself.
17
+
10
18
  GEMMA is a software toolkit for fast application of linear mixed
11
19
  models (LMMs) and related models to genome-wide association studies
12
20
  (GWAS) and other large-scale data sets.
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
14
22
  This repository contains gemma-wrapper, essentially a wrapper of
15
23
  GEMMA that provides support for caching the kinship or relatedness
16
24
  matrix (K) and caching LM and LMM computations with the option of full
17
- leave-one-chromosome-out genome scans (LOCO).
25
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
26
+ submitted to HPC PBS, i.e., slurm.
18
27
 
19
28
  gemma-wrapper requires a recent version of GEMMA and essentially
20
29
  does a pass-through of all standard GEMMA invocation switches. On
21
30
  return gemma-wrapper can return a JSON object (--json) which is
22
31
  useful for web-services.
23
32
 
24
- Note that this a work in progress (WIP). What is described below
25
- should work.
33
+ ## Performance
34
+
35
+ LOCO runs in parallel by default, which is at least a 5x performance
36
+ improvement on a machine with enough cores. GEMMA without LOCO,
37
+ however, does not run in parallel by default. Performance
38
+ improvements with the parallel implementation for LOCO and non-LOCO
39
+ can be viewed [here](./test/performance/releases.gmi).
26
40
 
27
41
  ## Installation
28
42
 
@@ -32,8 +46,9 @@ Prerequisites are
32
46
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
33
47
  almost all Linux systems
34
48
 
35
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
36
- can be installed with
49
+ gemma-wrapper comes as a Ruby
50
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
51
+ installed with
37
52
 
38
53
  gem install bio-gemma-wrapper
39
54
 
@@ -47,14 +62,19 @@ and it will render something like
47
62
  Usage: gemma-wrapper [options] -- [gemma-options]
48
63
  --permutate n Permutate # times by shuffling phenotypes
49
64
  --permute-phenotypes filen Phenotypes to be shuffled in permutations
50
- --loco [x,y,1,2,3...] Run full LOCO
65
+ --loco Run full leave-one-chromosome-out (LOCO)
66
+ --chromosomes [1,2,3] Run specific chromosomes
51
67
  --input filen JSON input variables (used for LOCO)
52
68
  --cache-dir path Use a cache directory
53
69
  --json Create output file in JSON format
54
- --force Force computation
70
+ --force Force computation (override cache)
71
+ --parallel Run jobs in parallel
72
+ --no-parallel Do not run jobs in parallel
73
+ --slurm[=opts] Use slurm PBS for submitting jobs
55
74
  --q, --quiet Run quietly
56
75
  -v, --verbose Run verbosely
57
- --debug Show debug messages and keep intermediate output
76
+ -d, --debug Show debug messages and keep intermediate output
77
+ --dry-run Show commands, but don't execute
58
78
  -- Anything after gets passed to GEMMA
59
79
 
60
80
  -h, --help display this help and exit
@@ -69,6 +89,8 @@ Unpack it and run the tool as
69
89
 
70
90
  ./bin/gemma-wrapper --help
71
91
 
92
+ See below for using a GNU Guix environment.
93
+
72
94
  ## Usage
73
95
 
74
96
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -90,12 +112,13 @@ the data files are found):
90
112
  gemma-wrapper -- \
91
113
  -g test/data/input/BXD_geno.txt.gz \
92
114
  -p test/data/input/BXD_pheno.txt \
115
+ -a test/data/input/BXD_snps.txt \
93
116
  -gk \
94
117
  -debug
95
118
 
96
119
  Run it twice to see
97
120
 
98
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
121
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
99
122
 
100
123
  gemma-wrapper computes the unique HASH value over the command
101
124
  line switches passed into GEMMA as well as the contents of the files
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
107
130
  gemma-wrapper --json -- \
108
131
  -g test/data/input/BXD_geno.txt.gz \
109
132
  -p test/data/input/BXD_pheno.txt \
133
+ -a test/data/input/BXD_snps.txt \
110
134
  -gk \
111
- -debug
135
+ -debug > K.json
112
136
 
113
- prints out something that can be parsed with a calling program
137
+ K.json is something that can be parsed with a calling program, and is
138
+ also used below as input for the GWA step. Example:
114
139
 
115
140
  ```json
116
141
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,11 +148,29 @@ default. If you want something else provide a --cache-dir, e.g.
123
148
  gemma-wrapper --cache-dir ~/.gemma-cache -- \
124
149
  -g test/data/input/BXD_geno.txt.gz \
125
150
  -p test/data/input/BXD_pheno.txt \
151
+ -a test/data/input/BXD_snps.txt \
126
152
  -gk \
127
153
  -debug
128
154
 
129
155
  will store K in ~/.gemma-cache.
130
156
 
157
+ ### GWA
158
+
159
+ Run the LMM on the K's captured earlier in K.json by passing the --input
160
+ switch
161
+
162
+ gemma-wrapper --json --input K.json -- \
163
+ -g test/data/input/BXD_geno.txt.gz \
164
+ -p test/data/input/BXD_pheno.txt \
165
+ -c test/data/input/BXD_covariates2.txt \
166
+ -a test/data/input/BXD_snps.txt \
167
+ -lmm 2 -maf 0.1 \
168
+ -debug > GWA.json
169
+
170
+ Running it twice should show that GWA is not recomputed.
171
+
172
+ /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
173
+
131
174
  ### LOCO
132
175
 
133
176
  Recent versions of GEMMA have LOCO support for a single chromosome
@@ -136,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
136
179
  chromosomes first create all K's with
137
180
 
138
181
  gemma-wrapper --json \
139
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \
182
+ --loco -- \
140
183
  -g test/data/input/BXD_geno.txt.gz \
141
184
  -p test/data/input/BXD_pheno.txt \
142
185
  -a test/data/input/BXD_snps.txt \
@@ -163,6 +206,45 @@ GWA.json contains the file names of every chromosome
163
206
  The -k switch is injected automatically. Again output switches are not
164
207
  allowed (-o, -outdir)
165
208
 
209
+ ### Permutations
210
+
211
+ Permutations can be run with and without LOCO. First create K
212
+
213
+ gemma-wrapper --json -- \
214
+ -g test/data/input/BXD_geno.txt.gz \
215
+ -p test/data/input/BXD_pheno.txt \
216
+ -gk \
217
+ -debug > K.json
218
+
219
+ Next, using K.json, permute the phenotypes with something like
220
+
221
+ gemma-wrapper --json --loco --input K.json \
222
+ --permutate 100 --permute-phenotypes test/data/input/BXD_pheno.txt -- \
223
+ -g test/data/input/BXD_geno.txt.gz \
224
+ -p test/data/input/BXD_pheno.txt \
225
+ -c test/data/input/BXD_covariates2.txt \
226
+ -a test/data/input/BXD_snps.txt \
227
+ -lmm 2 -maf 0.1 \
228
+ -debug > GWA.json
229
+
230
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
231
+
232
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
233
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
234
+
235
+ ### Slurm PBS
236
+
237
+ To run gemma-wrapper on HPC use the '--slurm' switch.
238
+
239
+ ## Development
240
+
241
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
242
+
243
+ ```
244
+ source .guix-deploy
245
+ ruby bin/gemma-wrapper --help
246
+ ```
247
+
166
248
  ## Copyright
167
249
 
168
- Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
250
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
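The README says the `--json` record can be parsed by a calling program. A minimal sketch of such a caller, assuming the record layout of the README's JSON example (keys `errno`, `type`, `files`); the helper name `k_files_from` is hypothetical, and taking the last entry of each `files` record as the data file is an assumption that covers both the two-element example and longer records:

```ruby
require 'json'

# Hypothetical helper: pull the kinship matrix paths out of a
# gemma-wrapper --json record produced by a -gk run.
def k_files_from(json_text)
  rec = JSON.parse(json_text)
  raise "gemma-wrapper reported errno #{rec['errno']}" unless rec['errno'] == 0
  raise "record is not -gk derived" unless rec['type'] == 'K'
  # Assumption: the data file is the last entry of each files record.
  rec['files'].map { |entry| entry.last }
end

# Abbreviated copy of the README example record
sample = '{"warnings":[],"errno":0,"debug":[],"type":"K",' \
         '"files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt",' \
         '"/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true}'
puts k_files_from(sample)
```

Checking `type` mirrors the wrapper's own guard, which refuses input that is not `-gk` derived.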
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.98
1
+ 0.99.3
data/bin/gemma-wrapper CHANGED
@@ -4,7 +4,7 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017,2018 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
9
  USAGE = "
10
10
  GEMMA wrapper example:
@@ -14,12 +14,12 @@ GEMMA wrapper example:
14
14
  gemma-wrapper -- \\
15
15
  -g test/data/input/BXD_geno.txt.gz \\
16
16
  -p test/data/input/BXD_pheno.txt \\
17
+ -a test/data/input/BXD_snps.txt \\
17
18
  -gk
18
19
 
19
20
  LOCO K computation with caching and JSON output
20
21
 
21
- gemma-wrapper --json \\
22
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
22
+ gemma-wrapper --json --loco -- \\
23
23
  -g test/data/input/BXD_geno.txt.gz \\
24
24
  -p test/data/input/BXD_pheno.txt \\
25
25
  -a test/data/input/BXD_snps.txt \\
@@ -38,11 +38,10 @@ GEMMA wrapper example:
38
38
  Gemma gets used from the path. You can override by setting
39
39
 
40
40
  env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
41
-
42
41
  "
43
- # These are used for testing compatibility
42
+ # These are used for testing compatibility with the gemma tool
44
43
  GEMMA_V_MAJOR = 98
45
- GEMMA_V_MINOR = 0
44
+ GEMMA_V_MINOR = 4
46
45
 
47
46
  basepath = File.dirname(File.dirname(__FILE__))
48
47
  $: << File.join(basepath,'lib')
@@ -66,17 +65,19 @@ if not gemma_command
66
65
  end
67
66
 
68
67
 
68
+ require 'digest/sha1'
69
69
  require 'fileutils'
70
70
  require 'optparse'
71
- require 'tmpdir'
72
71
  require 'tempfile'
72
+ require 'tmpdir'
73
73
 
74
74
  split_at = ARGV.index('--')
75
+
75
76
  if split_at
76
77
  gemma_args = ARGV[split_at+1..-1]
77
78
  end
78
79
 
79
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
80
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
80
81
 
81
82
  opts = OptionParser.new do |o|
82
83
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +92,12 @@ opts = OptionParser.new do |o|
91
92
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
93
  end
93
94
 
94
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
95
- options[:loco] = lst
95
+ o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
96
+ options[:loco] = b
97
+ end
98
+
99
+ o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
100
+ options[:chromosomes] = lst
96
101
  end
97
102
 
98
103
  o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +117,22 @@ opts = OptionParser.new do |o|
112
117
  options[:force] = true
113
118
  end
114
119
 
120
+ o.on("--parallel", "Run jobs in parallel") do |b|
121
+ options[:parallel] = true
122
+ end
123
+
124
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
125
+ options[:parallel] = false
126
+ end
127
+
128
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
129
+ options[:slurm_opts] = ""
130
+ options[:slurm] = true
131
+ if slurm
132
+ options[:slurm_opts] = slurm
133
+ end
134
+ end
135
+
115
136
  o.on("--q", "--quiet", "Run quietly") do |q|
116
137
  options[:quiet] = true
117
138
  end
@@ -120,15 +141,20 @@ opts = OptionParser.new do |o|
120
141
  options[:verbose] = true
121
142
  end
122
143
 
123
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
144
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
124
145
  options[:debug] = true
125
146
  end
126
147
 
148
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
149
+ options[:dry_run] = b
150
+ end
151
+
127
152
  o.on('--','Anything after gets passed to GEMMA') do
128
153
  o.terminate()
129
154
  end
130
155
 
131
156
  o.separator ""
157
+
132
158
  o.on_tail('-h', '--help', 'display this help and exit') do
133
159
  options[:show_help] = true
134
160
  end
@@ -173,21 +199,40 @@ info = lambda do |*msg|
173
199
  OUTPUT.print *msg,"\n" if !options[:quiet]
174
200
  end
175
201
 
202
+ # Fetch chromosomes
203
+ def get_chromosomes annofn
204
+ h = {}
205
+ File.open(annofn,"r").each_line do | line |
206
+ chr = line.split(/\s+/)[2]
207
+ h[chr] = true
208
+ end
209
+ h.map { |k,v| k }
210
+ end
176
211
  # ---- Start banner
177
212
 
178
213
  GEMMA_K_VERSION=version
179
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017,2018\n"
214
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
180
215
  info.call GEMMA_K_BANNER
181
216
 
182
217
  # Check gemma version
183
218
  GEMMA_COMMAND=options[:gemma_command]
219
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
220
+
221
+ begin
222
+ GEMMA_INFO = `#{GEMMA_COMMAND}`
223
+ rescue Errno::ENOENT
224
+ GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
225
+ error.call "<#{GEMMA_COMMAND}> command not found"
226
+ end
184
227
 
185
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/GEMMA|Version/)[0].strip
228
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
186
229
  info.call "Using ",gemma_version_header,"\n"
187
230
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
188
231
  v_version, v_major, v_minor = gemma_version.split(".")
189
232
  info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
190
233
 
234
+ info.call gemma_version_header
235
+
191
236
  warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
192
237
 
193
238
  options[:gemma_version_header] = gemma_version_header
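The `get_chromosomes` helper added in this hunk takes the chromosome name from the third whitespace-separated column of the SNP annotation file. An equivalent sketch that closes the file when done, assuming the same column layout; the name `chromosomes_in` is hypothetical:

```ruby
# Sketch: collect the distinct chromosome names from the third
# whitespace-separated column of a SNP annotation file, in order of
# first appearance. Lines with fewer than three columns are skipped.
def chromosomes_in(annofn)
  File.foreach(annofn).map { |line| line.split(/\s+/)[2] }.compact.uniq
end
```

`File.foreach` opens and closes the file itself, whereas `File.open(...).each_line` leaves the handle open until it is garbage collected.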
@@ -203,74 +248,152 @@ if RUBY_VERSION =~ /^1/
203
248
  warning "runs on Ruby 2.x only\n"
204
249
  end
205
250
 
251
+ # ---- LOCO defaults to parallel
252
+ if options[:parallel] == nil
253
+ options[:parallel] = true if options[:loco]
254
+ end
255
+
256
+ debug.call(options) # some debug output
257
+ debug.call(record)
258
+
206
259
  DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
207
260
  DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
208
261
 
262
+ if options[:parallel]
263
+ begin
264
+ skip_cite = `echo "will cite" |parallel --citation`
265
+ debug.call(skip_cite)
266
+ PARALLEL_INFO = `parallel --help`
267
+ rescue Errno::ENOENT
268
+ error.call "<parallel> command not found"
269
+ end
270
+ parallel_cmds = []
271
+ end
272
+
273
+ # ---- Fetch chromosomes from SNP annotation file
274
+ anno_idx = gemma_args.index '-a'
275
+ raise "Expected GEMMA -a SNP annotation file switch" if anno_idx == nil
276
+ CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
277
+
209
278
  # ---- Compute HASH on inputs
210
279
  hashme = []
211
280
  geno_idx = gemma_args.index '-g'
212
281
  raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
213
282
  pheno_idx = gemma_args.index '-p'
214
- hashme =
215
- if DO_COMPUTE_KINSHIP and pheno_idx != nil
216
- p [pheno_idx,gemma_args[pheno_idx+2..-1]]
217
- gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
218
- else
219
- gemma_args
220
- end
221
283
 
222
- if DO_COMPUTE_GWA
223
- raise "Did not expect GEMMA -p phenotype file switch" if pheno_idx
224
- hashme += ['-p', options[:permute_phenotypes]] if options[:permute_phenotypes]
284
+ if DO_COMPUTE_GWA and options[:permute_phenotypes]
285
+ raise "Did not expect GEMMA -p phenotype switch with permutations (only use --permute-phenotypes)" if pheno_idx
225
286
  end
226
287
 
227
- require 'digest/sha1'
228
- debug.call "Hashing on ",hashme,"\n"
229
- hashes = []
230
- hashme.each do | item |
231
- if File.exist?(item)
232
- hashes << Digest::SHA1.hexdigest(File.read(item))
233
- debug.call [item,hashes.last]
288
+ execute = lambda { |cmd|
289
+ info.call("Executing: #{cmd}")
290
+ err = 0
291
+ if not options[:debug]
292
+ # send output to stderr line by line
293
+ IO.popen("#{cmd}") do |io|
294
+ while s = io.gets
295
+ $stderr.print s
296
+ end
297
+ io.close
298
+ err = $?.to_i
299
+ end
234
300
  else
235
- hashes << item
301
+ $stderr.print `#{cmd}`
302
+ err = $?.to_i
303
+ end
304
+ err
305
+ }
306
+
307
+ compute_hash = lambda do | phenofn = nil |
308
+ # Compute a HASH on the inputs
309
+ debug.call "Hashing on ",hashme,"\n"
310
+ hashes = []
311
+ hm = if phenofn
312
+ hashme + ["-p", phenofn]
313
+ else
314
+ hashme
315
+ end
316
+ debug.call(hm)
317
+ hm.each do | item |
318
+ if File.file?(item)
319
+ hashes << Digest::SHA1.hexdigest(File.read(item))
320
+ debug.call [item,hashes.last]
321
+ else
322
+ hashes << item
323
+ end
236
324
  end
325
+ debug.call(hashes)
326
+ Digest::SHA1.hexdigest hashes.join(' ')
237
327
  end
238
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
239
328
 
329
+ HASH = compute_hash.call()
240
330
  options[:hash] = HASH
241
331
 
242
332
  # Create cache dir
243
333
  FileUtils::mkdir_p options[:cache_dir]
244
334
 
335
+ Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
336
+
245
337
  error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
246
338
  error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
339
+ GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
247
340
  gemma_args << '-outdir'
248
- gemma_args << options[:cache_dir]
341
+ gemma_args << tmpdir
249
342
  GEMMA_ARGS = gemma_args
250
343
 
344
+ hashme =
345
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
346
+ # Remove the phenotype file from the hash for GRM computation
347
+ GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
348
+ else
349
+ GEMMA_ARGS_HASH
350
+ end
351
+
251
352
  debug.call "Options: ",options,"\n" if !options[:quiet]
252
353
 
253
- invoke_gemma = lambda do |extra_args, cache_hit = false|
254
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
354
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
355
+ cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
255
356
  record[:gemma_command] = cmd
256
357
  return if cache_hit
257
- # debug.call cmd
358
+ if options[:slurm]
359
+ info.call cmd
360
+ hashi = HASH
361
+ prefix = tmpdir+'/'+hashi
362
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
363
+ script = "#!/bin/bash
364
+ #SBATCH --job-name=gemma-#{scriptfn}
365
+ #SBATCH --ntasks=1
366
+ #SBATCH --time=20:00
367
+ srun #{cmd}
368
+ "
369
+ debug.call(script)
370
+ File.open(scriptfn,"w") { |f|
371
+ f.write(script)
372
+ }
373
+ cmd = "sbatch "+options[:slurm_opts]+" "+scriptfn
374
+ end
258
375
  errno =
259
376
  if options[:json]
260
377
  # capture output
261
378
  err = 0
262
- IO.popen(cmd) do |io|
263
- while s = io.gets
264
- $stderr.print s
265
- end
266
- io.close
267
- err = $?.to_i
379
+ if options[:dry_run]
380
+ info.call("Would have invoked: ",cmd)
381
+ elsif options[:parallel]
382
+ info.call("Add parallel job: ",cmd)
383
+ parallel_cmds << cmd
384
+ else
385
+ err = execute.call(cmd)
268
386
  end
269
387
  err
270
388
  else
271
- debug.call("Invoking ",cmd) if options[:debug]
272
- system(cmd)
273
- $?.exitstatus
389
+ if options[:dry_run]
390
+ info.call("Would have invoked ",cmd)
391
+ 0
392
+ else
393
+ debug.call("Invoking ",cmd) if options[:debug]
394
+ system(cmd)
395
+ $?.exitstatus
396
+ end
274
397
  end
275
398
  if errno != 0
276
399
  debug.call "Gemma exit ",errno
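The `compute_hash` lambda in this hunk derives the cache key from the GEMMA switches and from the contents of any files they name, not from the file paths. A self-contained sketch of that idea, with the hypothetical name `cache_key`:

```ruby
require 'digest/sha1'

# Sketch: arguments that name an existing file contribute the SHA1 of
# the file contents; every other argument contributes its literal text.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

Two invocations that pass identical data under different file names therefore map to the same cache entry, while any change to a switch or to file contents yields a new key.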
@@ -280,11 +403,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
280
403
  end
281
404
  end
282
405
 
406
+ # Takes the hash value and checks whether the (output) file exists
283
407
  # returns datafn, logfn, cache_hit
284
- cache = lambda do | chr, ext |
408
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
285
409
  inject = (chr==nil ? "" : ".#{chr}" )+ext
286
- hashi = (chr==nil ? HASH : HASH+inject)
287
- prefix = options[:cache_dir]+'/'+hashi
410
+ hashi = (chr==nil ? h : h+inject)
411
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
412
+ # for chr 3 and permutation 1 forms something like
413
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
288
414
  logfn = prefix+".log.txt"
289
415
  datafn = prefix+ext
290
416
  record[:files] ||= []
@@ -320,25 +446,32 @@ kinship = lambda do | chr = nil |
320
446
  end
321
447
 
322
448
  # ---- Run GWA
323
- gwas = lambda do | chr, kfn, pfn |
449
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
324
450
  record[:type] = "GWA"
325
451
  error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
326
- hashi, cache_hit = cache.call chr,".assoc.txt"
452
+ # Update hash for each permutation
453
+ hash = compute_hash.call(pfn)
454
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
327
455
  if not cache_hit
328
456
  args = [ '-k', kfn, '-o', hashi ]
329
457
  args << [ '-loco', chr ] if chr != nil
330
458
  args << [ '-p', pfn ] if pfn
331
- invoke_gemma.call args
459
+ invoke_gemma.call args,false,chr,permutation
332
460
  end
333
461
  end
334
462
 
335
463
  LOCO = options[:loco]
336
- # if GEMMA_ARGS.include? '-gk'
464
+ if LOCO
465
+ if options[:chromosomes]
466
+ CHROMOSOMES = options[:chromosomes]
467
+ end
468
+ end
469
+
337
470
  if DO_COMPUTE_KINSHIP
338
471
  # compute K
339
- info.call LOCO
340
- if LOCO != nil
341
- LOCO.each do |chr|
472
+ info.call CHROMOSOMES
473
+ if LOCO
474
+ CHROMOSOMES.each do |chr|
342
475
  info.call "LOCO for ",chr
343
476
  kinship.call(chr)
344
477
  end
@@ -347,27 +480,38 @@ if DO_COMPUTE_KINSHIP
347
480
  end
348
481
  else
349
482
  # DO_COMPUTE_GWA
350
- json_in = JSON.parse(File.read(options[:input]))
483
+ begin
484
+ json_in = JSON.parse(File.read(options[:input]))
485
+ rescue TypeError
486
+ raise "Missing JSON input file?"
487
+ end
351
488
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
352
489
 
353
- pfn = options[:phenotypes] # can be nil
354
- k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
355
- k_files.each do | chr, kfn | # call a GWA for each chromosome
356
- gwas.call(chr,kfn,pfn)
490
+ pfn = options[:permute_phenotypes] # can be nil
491
+ if LOCO
492
+ k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
493
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
494
+ gwas.call(chr,kfn,pfn)
495
+ end
496
+ else
497
+ kfn = json_in["files"][0][2]
498
+ CHROMOSOMES.each do | chr |
499
+ gwas.call(chr,kfn,pfn)
500
+ end
357
501
  end
358
502
  # Permute
359
503
  if options[:permutate]
360
504
  ps = []
361
- raise "You should supply --phenotype with gemma-wrapper --permutate" if not pfn
505
+ raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
362
506
  File.foreach(pfn).with_index do |line, line_num|
363
507
  ps << line
364
508
  end
365
509
  score_list = []
366
510
  debug.call(options[:permutate],"x permutations")
367
- (1..options[:permutate]).each do |i|
368
- $stderr.print "Iteration ",i,"\n"
511
+ (1..options[:permutate]).each do |permutation|
512
+ $stderr.print "Iteration ",permutation,"\n"
369
513
  # Create a shuffled phenotype file
370
- file = File.open("phenotypes-#{i}","w")
514
+ file = File.open("phenotypes-#{permutation}","w")
371
515
  tmp_pfn = file.path
372
516
  p tmp_pfn
373
517
  ps.shuffle.each do | l |
@@ -375,20 +519,23 @@ else
375
519
  end
376
520
  file.close
377
521
  k_files.each do | chr, kfn | # call a GWA for each chromosome
378
- gwas.call(chr,kfn,tmp_pfn)
522
+ gwas.call(chr,kfn,tmp_pfn,permutation)
379
523
  end
380
- # p [:HEY,record[:files].last]
381
- assocfn = record[:files].last[2]
382
- debug.call("Reading ",assocfn)
383
524
  score_min = 1000.0
384
- File.foreach(assocfn).with_index do |assoc, assoc_line_num|
385
- if assoc_line_num > 0
386
- value = assoc.strip.split(/\t/).last.to_f
387
- score_min = value if value < score_min
525
+ if false and not options[:slurm]
526
+ # p [:HEY,record[:files].last]
527
+ assocfn = record[:files].last[2]
528
+ debug.call("Reading ",assocfn)
529
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
530
+ if assoc_line_num > 0
531
+ value = assoc.strip.split(/\t/).last.to_f
532
+ score_min = value if value < score_min
533
+ end
388
534
  end
389
535
  end
390
536
  score_list << score_min
391
537
  end
538
+ exit 0 if options[:slurm]
392
539
  ls = score_list.sort
393
540
  p ls
394
541
  significant = ls[(ls.size - ls.size*0.95).floor]
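The threshold lookup above indexes into the sorted list of per-permutation minimum p-values. A sketch of the same computation as a standalone function, assuming the third value in the README output (4.7, 4.3) is -log10 of the p-value rounded to one decimal; the name `permutation_threshold` is hypothetical:

```ruby
# Sketch: pick the p-value threshold at the given level (0.95 or 0.67)
# from the minimum p-values of all permutations and report it together
# with its -log10 value.
def permutation_threshold(min_pvalues, level)
  ls = min_pvalues.sort
  p_value = ls[(ls.size - ls.size * level).floor]
  [p_value, (-Math.log10(p_value)).round(1)]
end

scores = [0.004, 0.0002, 0.03, 0.0011, 0.0075] # made-up minimum p-values
p permutation_threshold(scores, 0.95)
p permutation_threshold(scores, 0.67)
```

With few permutations the index falls on the smallest values in the list, so the estimate only becomes stable as the number of permutations grows.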
@@ -399,5 +546,38 @@ else
399
546
  end
400
547
  end
401
548
 
549
+ # ---- Invoke parallel
550
+ if options[:parallel]
551
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
552
+ cmd = parallel_cmds.join("\\n")
553
+
554
+ cmd = "echo -e \"#{cmd}\""
555
+ err = execute.call(cmd+"|parallel") # all jobs in parallel
556
+ if err != 0
557
+ [16,8,4,1].each do |jobs|
558
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
559
+ err = execute.call(cmd+"|parallel -j #{jobs}")
560
+ break if err == 0
561
+ end
562
+ if err != 0
563
+ info.call("Run failed!")
564
+ exit err
565
+ end
566
+ end
567
+ info.call("Run successful!")
568
+ end
402
569
  json_out.call
403
- exit 0
570
+
571
+ # copy all output files to the cache_dir. If a file exists only emit a warning
572
+ Dir.glob("*.txt", base: tmpdir) do | fn |
573
+ source = tmpdir + "/" + fn
574
+ dest = options[:cache_dir] + "/" + fn
575
+ if not File.exist?(dest) or options[:force]
576
+ info.call "Move #{source} to #{dest}"
577
+ FileUtils.mv source, dest, verbose: false
578
+ else
579
+ warning.call "File #{dest} already exists. Not overwriting"
580
+ end
581
+ end
582
+
583
+ end # tmpdir
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
4
  s.summary = "GEMMA with LOCO and permutations"
5
- s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: '0.98'
4
+ version: 0.99.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-11-19 00:00:00.000000000 Z
11
+ date: 2021-08-22 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
- runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
14
+ and caches K between runs with LOCO support
15
15
  email: pjotr.public01@thebird.nl
16
16
  executables:
17
17
  - gemma-wrapper
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
43
  - !ruby/object:Gem::Version
44
44
  version: '0'
45
45
  requirements: []
46
- rubyforge_project:
47
- rubygems_version: 2.6.8
46
+ rubygems_version: 3.2.5
48
47
  signing_key:
49
48
  specification_version: 4
50
49
  summary: GEMMA with LOCO and permutations