bio-gemma-wrapper 0.98 → 0.99.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: d8f36b92e82dda9e1e592724204521f9f4f1a950
4
- data.tar.gz: 077abf1c1a704ab93a27627d511aeba0dbf6b4c5
2
+ SHA256:
3
+ metadata.gz: 0bd37b153e121de9c1758af736cd6904744da2de3540f2a7c547cc423382d8d1
4
+ data.tar.gz: 84298a943e7cfe6126653895d9714babc83a7be2bf903c7b61ff9072f1d4e4a8
5
5
  SHA512:
6
- metadata.gz: b49bb2c9362eb7babd2cf20f640fc7806719a8c82e31f93a97b4517d5c2777bd4ae55b9e5a1fd37df5f22e63e2673fa01ca4afbc4e07de6955391d77bd7f90fb
7
- data.tar.gz: 863ef847e4ddff6b544a733e001bcd2291c73ab394573a299d54a8f2581066fb423b40be052ce6a440c1daffbd6ddb7500c5f01b7ce2de0e38e4277c8ae9e53e
6
+ metadata.gz: f32d48ec2f194a513e0cf8f15463b05662e1b269fdd22a110e4cdfb9f6bc541238bac7e766cbc31a8b782379891636e4f79fb3e3667d567ab3c610298d4f11c2
7
+ data.tar.gz: abc9b3faf8ef2f63d566caa14312bde2b2f438c722a83e90f7da775c55e2415171efbfab385273c82e45398c31e5306fb99437c38e287570652a3b04a5432883
data/README.md CHANGED
@@ -1,12 +1,20 @@
1
- [![Gem Version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
1
+ [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
2
2
 
3
- # GEMMA wrapper caches K between runs with LOCO support
3
+ # GEMMA with LOCO, permutations and slurm support (and caching)
4
4
 
5
5
  ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
6
6
  Nat. Genet., 2016)](cfw.gif)
7
7
 
8
8
  ## Introduction
9
9
 
10
+ Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
+ GEMMA in parallel (now the default with LOCO), and GEMMA on
12
+ PBS. Gemma-wrapper is used to run GEMMA as part of the
13
+ https://genenetwork.org/ environment.
14
+
15
+ Note that a version of gemma-wrapper is projected to be integrated
16
+ into gemma itself.
17
+
10
18
  GEMMA is a software toolkit for fast application of linear mixed
11
19
  models (LMMs) and related models to genome-wide association studies
12
20
  (GWAS) and other large-scale data sets.
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
14
22
  This repository contains gemma-wrapper, essentially a wrapper of
15
23
  GEMMA that provides support for caching the kinship or relatedness
16
24
  matrix (K) and caching LM and LMM computations with the option of full
17
- leave-one-chromosome-out genome scans (LOCO).
25
+ leave-one-chromosome-out genome scans (LOCO). Jobs can also be
26
+ submitted to HPC PBS, i.e., slurm.
18
27
 
19
28
  gemma-wrapper requires a recent version of GEMMA and essentially
20
29
  does a pass-through of all standard GEMMA invocation switches. On
21
30
  return gemma-wrapper can return a JSON object (--json) which is
22
31
  useful for web-services.
23
32
 
24
- Note that this a work in progress (WIP). What is described below
25
- should work.
33
+ ## Performance
34
+
35
+ LOCO runs in parallel by default which is at least a 5x performance
36
+ improvement on a machine with enough cores. GEMMA without LOCO,
37
+ however, does not run in parallel by default. Performance
38
+ improvements with the parallel implementation for LOCO and non-LOCO
39
+ can be viewed [here](./test/performance/releases.gmi).
26
40
 
27
41
  ## Installation
28
42
 
@@ -32,8 +46,9 @@ Prerequisites are
32
46
  * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
33
47
  almost all Linux systems
34
48
 
35
- gemma-wrapper comes as a Ruby [gem](https://rubygems.org/gems/bio-gemma-wrapper) and
36
- can be installed with
49
+ gemma-wrapper comes as a Ruby
50
+ [gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
51
+ installed with
37
52
 
38
53
  gem install bio-gemma-wrapper
39
54
 
@@ -47,14 +62,19 @@ and it will render something like
47
62
  Usage: gemma-wrapper [options] -- [gemma-options]
48
63
  --permutate n Permutate # times by shuffling phenotypes
49
64
  --permute-phenotypes filen Phenotypes to be shuffled in permutations
50
- --loco [x,y,1,2,3...] Run full LOCO
65
+ --loco Run full leave-one-chromosome-out (LOCO)
66
+ --chromosomes [1,2,3] Run specific chromosomes
51
67
  --input filen JSON input variables (used for LOCO)
52
68
  --cache-dir path Use a cache directory
53
69
  --json Create output file in JSON format
54
- --force Force computation
70
+ --force Force computation (override cache)
71
+ --parallel Run jobs in parallel
72
+ --no-parallel Do not run jobs in parallel
73
+ --slurm[=opts] Use slurm PBS for submitting jobs
55
74
  --q, --quiet Run quietly
56
75
  -v, --verbose Run verbosely
57
- --debug Show debug messages and keep intermediate output
76
+ -d, --debug Show debug messages and keep intermediate output
77
+ --dry-run Show commands, but don't execute
58
78
  -- Anything after gets passed to GEMMA
59
79
 
60
80
  -h, --help display this help and exit
@@ -69,6 +89,8 @@ Unpack it and run the tool as
69
89
 
70
90
  ./bin/gemma-wrapper --help
71
91
 
92
+ See below for using a GNU Guix environment.
93
+
72
94
  ## Usage
73
95
 
74
96
  gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -90,12 +112,13 @@ the data files are found):
90
112
  gemma-wrapper -- \
91
113
  -g test/data/input/BXD_geno.txt.gz \
92
114
  -p test/data/input/BXD_pheno.txt \
115
+ -a test/data/input/BXD_snps.txt \
93
116
  -gk \
94
117
  -debug
95
118
 
96
119
  Run it twice to see
97
120
 
98
- /tmp/3079151e14b219c3b243b673d88001c1675168b4.log.txt gemma-wrapper CACHE HIT!
121
+ /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
99
122
 
100
123
  gemma-wrapper computes the unique HASH value over the command
101
124
  line switches passed into GEMMA as well as the contents of the files
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
107
130
  gemma-wrapper --json -- \
108
131
  -g test/data/input/BXD_geno.txt.gz \
109
132
  -p test/data/input/BXD_pheno.txt \
133
+ -a test/data/input/BXD_snps.txt \
110
134
  -gk \
111
- -debug
135
+ -debug > K.json
112
136
 
113
- prints out something that can be parsed with a calling program
137
+ K.json can be parsed by a calling program, and is
138
+ also used below as input for the GWA step. Example:
114
139
 
115
140
  ```json
116
141
  {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,11 +148,29 @@ default. If you want something else provide a --cache-dir, e.g.
123
148
  gemma-wrapper --cache-dir ~/.gemma-cache -- \
124
149
  -g test/data/input/BXD_geno.txt.gz \
125
150
  -p test/data/input/BXD_pheno.txt \
151
+ -a test/data/input/BXD_snps.txt \
126
152
  -gk \
127
153
  -debug
128
154
 
129
155
  will store K in ~/.gemma-cache.
130
156
 
157
+ ### GWA
158
+
159
+ Run the LMM using the K's captured earlier in K.json using the --input
160
+ switch
161
+
162
+ gemma-wrapper --json --input K.json -- \
163
+ -g test/data/input/BXD_geno.txt.gz \
164
+ -p test/data/input/BXD_pheno.txt \
165
+ -c test/data/input/BXD_covariates2.txt \
166
+ -a test/data/input/BXD_snps.txt \
167
+ -lmm 2 -maf 0.1 \
168
+ -debug > GWA.json
169
+
170
+ Running it twice should show that GWA is not recomputed.
171
+
172
+ /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
173
+
131
174
  ### LOCO
132
175
 
133
176
  Recent versions of GEMMA have LOCO support for a single chromosome
@@ -136,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
136
179
  chromosomes first create all K's with
137
180
 
138
181
  gemma-wrapper --json \
139
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \
182
+ --loco -- \
140
183
  -g test/data/input/BXD_geno.txt.gz \
141
184
  -p test/data/input/BXD_pheno.txt \
142
185
  -a test/data/input/BXD_snps.txt \
@@ -163,6 +206,45 @@ GWA.json contains the file names of every chromosome
163
206
  The -k switch is injected automatically. Again output switches are not
164
207
  allowed (-o, -outdir)
165
208
 
209
+ ### Permutations
210
+
211
+ Permutations can be run with and without LOCO. First create K
212
+
213
+ gemma-wrapper --json -- \
214
+ -g test/data/input/BXD_geno.txt.gz \
215
+ -p test/data/input/BXD_pheno.txt \
216
+ -gk \
217
+ -debug > K.json
218
+
219
+ Next, using K.json, permute the phenotypes with something like
220
+
221
+ gemma-wrapper --json --loco --input K.json \
222
+ --permutate 100 --permute-phenotypes test/data/input/BXD_pheno.txt -- \
223
+ -g test/data/input/BXD_geno.txt.gz \
224
+ -p test/data/input/BXD_pheno.txt \
225
+ -c test/data/input/BXD_covariates2.txt \
226
+ -a test/data/input/BXD_snps.txt \
227
+ -lmm 2 -maf 0.1 \
228
+ -debug > GWA.json
229
+
230
+ This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
231
+
232
+ ["95 percentile (significant) ", 1.92081e-05, 4.7]
233
+ ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
234
+
235
+ ### Slurm PBS
236
+
237
+ To run gemma-wrapper on HPC use the '--slurm' switch.
238
+
239
+ ## Development
240
+
241
+ We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
242
+
243
+ ```
244
+ source .guix-deploy
245
+ ruby bin/gemma-wrapper --help
246
+ ```
247
+
166
248
  ## Copyright
167
249
 
168
- Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
250
+ Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
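The README describes the cache key as a hash over the GEMMA command-line switches and the contents of the input files. A minimal Ruby sketch of that idea follows. `cache_key` is a hypothetical helper written for illustration, not the function shipped in bin/gemma-wrapper, and it assumes that any argument naming an existing file is hashed by content rather than by name.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper (not part of gemma-wrapper): derive a cache key from a
# list of command-line arguments. An argument that names an existing file
# contributes the SHA1 of the file contents; any other argument contributes
# its own text.
def cache_key(args)
  parts = args.map do |arg|
    File.file?(arg) ? Digest::SHA1.hexdigest(File.read(arg)) : arg
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Two files with identical contents but different names.
a = Tempfile.new('geno_a')
a.write("1 2 3\n")
a.flush
b = Tempfile.new('geno_b')
b.write("1 2 3\n")
b.flush

key_a = cache_key(['-g', a.path, '-gk'])
key_b = cache_key(['-g', b.path, '-gk'])
puts key_a == key_b
```

Because file contents rather than file names enter the hash, renaming an input should still give a cache hit, while changing a switch or editing a file gives a new key.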
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.98
1
+ 0.99.3
data/bin/gemma-wrapper CHANGED
@@ -4,7 +4,7 @@
4
4
  # Author:: Pjotr Prins
5
5
  # License:: GPL3
6
6
  #
7
- # Copyright (C) 2017,2018 Pjotr Prins <pjotr.prins@thebird.nl>
7
+ # Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
8
8
 
9
9
  USAGE = "
10
10
  GEMMA wrapper example:
@@ -14,12 +14,12 @@ GEMMA wrapper example:
14
14
  gemma-wrapper -- \\
15
15
  -g test/data/input/BXD_geno.txt.gz \\
16
16
  -p test/data/input/BXD_pheno.txt \\
17
+ -a test/data/input/BXD_snps.txt \
17
18
  -gk
18
19
 
19
20
  LOCO K computation with caching and JSON output
20
21
 
21
- gemma-wrapper --json \\
22
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
22
+ gemma-wrapper --json --loco -- \\
23
23
  -g test/data/input/BXD_geno.txt.gz \\
24
24
  -p test/data/input/BXD_pheno.txt \\
25
25
  -a test/data/input/BXD_snps.txt \\
@@ -38,11 +38,10 @@ GEMMA wrapper example:
38
38
  Gemma gets used from the path. You can override by setting
39
39
 
40
40
  env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
41
-
42
41
  "
43
- # These are used for testing compatibility
42
+ # These are used for testing compatibility with the gemma tool
44
43
  GEMMA_V_MAJOR = 98
45
- GEMMA_V_MINOR = 0
44
+ GEMMA_V_MINOR = 4
46
45
 
47
46
  basepath = File.dirname(File.dirname(__FILE__))
48
47
  $: << File.join(basepath,'lib')
@@ -66,17 +65,19 @@ if not gemma_command
66
65
  end
67
66
 
68
67
 
68
+ require 'digest/sha1'
69
69
  require 'fileutils'
70
70
  require 'optparse'
71
- require 'tmpdir'
72
71
  require 'tempfile'
72
+ require 'tmpdir'
73
73
 
74
74
  split_at = ARGV.index('--')
75
+
75
76
  if split_at
76
77
  gemma_args = ARGV[split_at+1..-1]
77
78
  end
78
79
 
79
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
80
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
80
81
 
81
82
  opts = OptionParser.new do |o|
82
83
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +92,12 @@ opts = OptionParser.new do |o|
91
92
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
93
  end
93
94
 
94
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
95
- options[:loco] = lst
95
+ o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
96
+ options[:loco] = b
97
+ end
98
+
99
+ o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
100
+ options[:chromosomes] = lst
96
101
  end
97
102
 
98
103
  o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +117,22 @@ opts = OptionParser.new do |o|
112
117
  options[:force] = true
113
118
  end
114
119
 
120
+ o.on("--parallel", "Run jobs in parallel") do |b|
121
+ options[:parallel] = true
122
+ end
123
+
124
+ o.on("--no-parallel", "Do not run jobs in parallel") do |b|
125
+ options[:parallel] = false
126
+ end
127
+
128
+ o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
129
+ options[:slurm_opts] = ""
130
+ options[:slurm] = true
131
+ if slurm
132
+ options[:slurm_opts] = slurm
133
+ end
134
+ end
135
+
115
136
  o.on("--q", "--quiet", "Run quietly") do |q|
116
137
  options[:quiet] = true
117
138
  end
@@ -120,15 +141,20 @@ opts = OptionParser.new do |o|
120
141
  options[:verbose] = true
121
142
  end
122
143
 
123
- o.on("--debug", "Show debug messages and keep intermediate output") do |v|
144
+ o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
124
145
  options[:debug] = true
125
146
  end
126
147
 
148
+ o.on("--dry-run", "Show commands, but don't execute") do |b|
149
+ options[:dry_run] = b
150
+ end
151
+
127
152
  o.on('--','Anything after gets passed to GEMMA') do
128
153
  o.terminate()
129
154
  end
130
155
 
131
156
  o.separator ""
157
+
132
158
  o.on_tail('-h', '--help', 'display this help and exit') do
133
159
  options[:show_help] = true
134
160
  end
@@ -173,21 +199,40 @@ info = lambda do |*msg|
173
199
  OUTPUT.print *msg,"\n" if !options[:quiet]
174
200
  end
175
201
 
202
+ # Fetch chromosomes
203
+ def get_chromosomes annofn
204
+ h = {}
205
+ File.open(annofn,"r").each_line do | line |
206
+ chr = line.split(/\s+/)[2]
207
+ h[chr] = true
208
+ end
209
+ h.map { |k,v| k }
210
+ end
176
211
  # ---- Start banner
177
212
 
178
213
  GEMMA_K_VERSION=version
179
- GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017,2018\n"
214
+ GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
180
215
  info.call GEMMA_K_BANNER
181
216
 
182
217
  # Check gemma version
183
218
  GEMMA_COMMAND=options[:gemma_command]
219
+ info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
220
+
221
+ begin
222
+ GEMMA_INFO = `#{GEMMA_COMMAND}`
223
+ rescue Errno::ENOENT
224
+ GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
225
+ error.call "<#{GEMMA_COMMAND}> command not found"
226
+ end
184
227
 
185
- gemma_version_header = `#{GEMMA_COMMAND}`.split("\n").grep(/GEMMA|Version/)[0].strip
228
+ gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
186
229
  info.call "Using ",gemma_version_header,"\n"
187
230
  gemma_version = gemma_version_header.split(/[,\s]+/)[1]
188
231
  v_version, v_major, v_minor = gemma_version.split(".")
189
232
  info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
190
233
 
234
+ info.call gemma_version_header
235
+
191
236
  warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
192
237
 
193
238
  options[:gemma_version_header] = gemma_version_header
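The version check in this hunk compares the major and minor components by hand. As an illustration only, a similar comparison can be sketched with RubyGems' `Gem::Version`, which orders dotted version strings. `out_of_date?` is a hypothetical helper and the version strings are made-up inputs, not real GEMMA output. One difference to keep in mind: the hand-written check skips the minor comparison when the minor part is missing, whereas `Gem::Version` treats 0.98 as older than 0.98.4.

```ruby
require 'rubygems'

# Hypothetical helper (not part of gemma-wrapper): true when the version that
# was found is older than the version that is expected.
def out_of_date?(found, expected)
  Gem::Version.new(found) < Gem::Version.new(expected)
end

expected = '0.98.4' # assumed value, mirroring GEMMA_V_MAJOR and GEMMA_V_MINOR
%w[0.98.3 0.98.4 0.98.10 0.99.0].each do |found|
  puts "#{found}: #{out_of_date?(found, expected) ? 'out of date' : 'ok'}"
end
```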
@@ -203,74 +248,152 @@ if RUBY_VERSION =~ /^1/
203
248
  warning "runs on Ruby 2.x only\n"
204
249
  end
205
250
 
251
+ # ---- LOCO defaults to parallel
252
+ if options[:parallel] == nil
253
+ options[:parallel] = true if options[:loco]
254
+ end
255
+
256
+ debug.call(options) # some debug output
257
+ debug.call(record)
258
+
206
259
  DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
207
260
  DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
208
261
 
262
+ if options[:parallel]
263
+ begin
264
+ skip_cite = `echo "will cite" |parallel --citation`
265
+ debug.call(skip_cite)
266
+ PARALLEL_INFO = `parallel --help`
267
+ rescue Errno::ENOENT
268
+ error.call "<parallel> command not found"
269
+ end
270
+ parallel_cmds = []
271
+ end
272
+
273
+ # ---- Fetch chromosomes from SNP annotation file
274
+ anno_idx = gemma_args.index '-a'
275
+ raise "Expected GEMMA -a SNP annotation file switch" if anno_idx == nil
276
+ CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
277
+
209
278
  # ---- Compute HASH on inputs
210
279
  hashme = []
211
280
  geno_idx = gemma_args.index '-g'
212
281
  raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
213
282
  pheno_idx = gemma_args.index '-p'
214
- hashme =
215
- if DO_COMPUTE_KINSHIP and pheno_idx != nil
216
- p [pheno_idx,gemma_args[pheno_idx+2..-1]]
217
- gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
218
- else
219
- gemma_args
220
- end
221
283
 
222
- if DO_COMPUTE_GWA
223
- raise "Did not expect GEMMA -p phenotype file switch" if pheno_idx
224
- hashme += ['-p', options[:permute_phenotypes]] if options[:permute_phenotypes]
284
+ if DO_COMPUTE_GWA and options[:permute_phenotypes]
285
+ raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
225
286
  end
226
287
 
227
- require 'digest/sha1'
228
- debug.call "Hashing on ",hashme,"\n"
229
- hashes = []
230
- hashme.each do | item |
231
- if File.exist?(item)
232
- hashes << Digest::SHA1.hexdigest(File.read(item))
233
- debug.call [item,hashes.last]
288
+ execute = lambda { |cmd|
289
+ info.call("Executing: #{cmd}")
290
+ err = 0
291
+ if not options[:debug]
292
+ # send output to stderr line by line
293
+ IO.popen("#{cmd}") do |io|
294
+ while s = io.gets
295
+ $stderr.print s
296
+ end
297
+ io.close
298
+ err = $?.to_i
299
+ end
234
300
  else
235
- hashes << item
301
+ $stderr.print `#{cmd}`
302
+ err = $?.to_i
303
+ end
304
+ err
305
+ }
306
+
307
+ compute_hash = lambda do | phenofn = nil |
308
+ # Compute a HASH on the inputs
309
+ debug.call "Hashing on ",hashme,"\n"
310
+ hashes = []
311
+ hm = if phenofn
312
+ hashme + ["-p", phenofn]
313
+ else
314
+ hashme
315
+ end
316
+ debug.call(hm)
317
+ hm.each do | item |
318
+ if File.file?(item)
319
+ hashes << Digest::SHA1.hexdigest(File.read(item))
320
+ debug.call [item,hashes.last]
321
+ else
322
+ hashes << item
323
+ end
236
324
  end
325
+ debug.call(hashes)
326
+ Digest::SHA1.hexdigest hashes.join(' ')
237
327
  end
238
- HASH = Digest::SHA1.hexdigest hashes.join(' ')
239
328
 
329
+ HASH = compute_hash.call()
240
330
  options[:hash] = HASH
241
331
 
242
332
  # Create cache dir
243
333
  FileUtils::mkdir_p options[:cache_dir]
244
334
 
335
+ Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
336
+
245
337
  error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
246
338
  error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
339
+ GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
247
340
  gemma_args << '-outdir'
248
- gemma_args << options[:cache_dir]
341
+ gemma_args << tmpdir
249
342
  GEMMA_ARGS = gemma_args
250
343
 
344
+ hashme =
345
+ if DO_COMPUTE_KINSHIP and pheno_idx != nil
346
+ # Remove the phenotype file from the hash for GRM computation
347
+ GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
348
+ else
349
+ GEMMA_ARGS_HASH
350
+ end
351
+
251
352
  debug.call "Options: ",options,"\n" if !options[:quiet]
252
353
 
253
- invoke_gemma = lambda do |extra_args, cache_hit = false|
254
- cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
354
+ invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
355
+ cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
255
356
  record[:gemma_command] = cmd
256
357
  return if cache_hit
257
- # debug.call cmd
358
+ if options[:slurm]
359
+ info.call cmd
360
+ hashi = HASH
361
+ prefix = tmpdir+'/'+hashi
362
+ scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
363
+ script = "#!/bin/bash
364
+ #SBATCH --job-name=gemma-#{scriptfn}
365
+ #SBATCH --ntasks=1
366
+ #SBATCH --time=20:00
367
+ srun #{cmd}
368
+ "
369
+ debug.call(script)
370
+ File.open(scriptfn,"w") { |f|
371
+ f.write(script)
372
+ }
373
+ cmd = "sbatch "+options[:slurm_opts] + scriptfn
374
+ end
258
375
  errno =
259
376
  if options[:json]
260
377
  # capture output
261
378
  err = 0
262
- IO.popen(cmd) do |io|
263
- while s = io.gets
264
- $stderr.print s
265
- end
266
- io.close
267
- err = $?.to_i
379
+ if options[:dry_run]
380
+ info.call("Would have invoked: ",cmd)
381
+ elsif options[:parallel]
382
+ info.call("Add parallel job: ",cmd)
383
+ parallel_cmds << cmd
384
+ else
385
+ err = execute.call(cmd)
268
386
  end
269
387
  err
270
388
  else
271
- debug.call("Invoking ",cmd) if options[:debug]
272
- system(cmd)
273
- $?.exitstatus
389
+ if options[:dry_run]
390
+ info.call("Would have invoked ",cmd)
391
+ 0
392
+ else
393
+ debug.call("Invoking ",cmd) if options[:debug]
394
+ system(cmd)
395
+ $?.exitstatus
396
+ end
274
397
  end
275
398
  if errno != 0
276
399
  debug.call "Gemma exit ",errno
@@ -280,11 +403,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
280
403
  end
281
404
  end
282
405
 
406
+ # Takes the hash value and checks whether the (output) file exists
283
407
  # returns datafn, logfn, cache_hit
284
- cache = lambda do | chr, ext |
408
+ cache = lambda do | chr, ext, h=HASH, permutation=0 |
285
409
  inject = (chr==nil ? "" : ".#{chr}" )+ext
286
- hashi = (chr==nil ? HASH : HASH+inject)
287
- prefix = options[:cache_dir]+'/'+hashi
410
+ hashi = (chr==nil ? h : h+inject)
411
+ prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
412
+ # for chr 3 and permutation 1 forms something like
413
+ # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
288
414
  logfn = prefix+".log.txt"
289
415
  datafn = prefix+ext
290
416
  record[:files] ||= []
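The naming scheme in this hunk can be sketched as a small pure function. `cache_paths` is a hypothetical helper written for illustration; it only mirrors how the log and data file names are assembled from the hash, chromosome, extension and permutation number, and it does not touch the file system.

```ruby
# Hypothetical helper (not part of gemma-wrapper): build the log and data file
# names for a cached result. The chromosome and extension are appended to the
# hash only when a chromosome is given; the permutation number is appended
# only when it is non-zero.
def cache_paths(cache_dir, hash, ext, chr: nil, permutation: 0)
  name = chr.nil? ? hash : "#{hash}.#{chr}#{ext}"
  prefix = File.join(cache_dir, name)
  prefix += ".#{permutation}" unless permutation == 0
  { log: "#{prefix}.log.txt", data: "#{prefix}#{ext}" }
end

paths = cache_paths('/tmp', '1b700-a996f', '.cXX.txt', chr: 3, permutation: 1)
puts paths[:log] # prints /tmp/1b700-a996f.3.cXX.txt.1.log.txt
```

The printed name matches the example given in the code comment for chromosome 3 and permutation 1.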
@@ -320,25 +446,32 @@ kinship = lambda do | chr = nil |
320
446
  end
321
447
 
322
448
  # ---- Run GWA
323
- gwas = lambda do | chr, kfn, pfn |
449
+ gwas = lambda do | chr, kfn, pfn, permutation=0 |
324
450
  record[:type] = "GWA"
325
451
  error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
326
- hashi, cache_hit = cache.call chr,".assoc.txt"
452
+ # Update hash for each permutation
453
+ hash = compute_hash.call(pfn)
454
+ hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
327
455
  if not cache_hit
328
456
  args = [ '-k', kfn, '-o', hashi ]
329
457
  args << [ '-loco', chr ] if chr != nil
330
458
  args << [ '-p', pfn ] if pfn
331
- invoke_gemma.call args
459
+ invoke_gemma.call args,false,chr,permutation
332
460
  end
333
461
  end
334
462
 
335
463
  LOCO = options[:loco]
336
- # if GEMMA_ARGS.include? '-gk'
464
+ if LOCO
465
+ if options[:chromosomes]
466
+ CHROMOSOMES = options[:chromosomes]
467
+ end
468
+ end
469
+
337
470
  if DO_COMPUTE_KINSHIP
338
471
  # compute K
339
- info.call LOCO
340
- if LOCO != nil
341
- LOCO.each do |chr|
472
+ info.call CHROMOSOMES
473
+ if LOCO
474
+ CHROMOSOMES.each do |chr|
342
475
  info.call "LOCO for ",chr
343
476
  kinship.call(chr)
344
477
  end
@@ -347,27 +480,38 @@ if DO_COMPUTE_KINSHIP
347
480
  end
348
481
  else
349
482
  # DO_COMPUTE_GWA
350
- json_in = JSON.parse(File.read(options[:input]))
483
+ begin
484
+ json_in = JSON.parse(File.read(options[:input]))
485
+ rescue TypeError
486
+ raise "Missing JSON input file?"
487
+ end
351
488
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
352
489
 
353
- pfn = options[:phenotypes] # can be nil
354
- k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
355
- k_files.each do | chr, kfn | # call a GWA for each chromosome
356
- gwas.call(chr,kfn,pfn)
490
+ pfn = options[:permute_phenotypes] # can be nil
491
+ if LOCO
492
+ k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
493
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
494
+ gwas.call(chr,kfn,pfn)
495
+ end
496
+ else
497
+ kfn = json_in["files"][0][2]
498
+ CHROMOSOMES.each do | chr |
499
+ gwas.call(chr,kfn,pfn)
500
+ end
357
501
  end
358
502
  # Permute
359
503
  if options[:permutate]
360
504
  ps = []
361
- raise "You should supply --phenotype with gemma-wrapper --permutate" if not pfn
505
+ raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
362
506
  File.foreach(pfn).with_index do |line, line_num|
363
507
  ps << line
364
508
  end
365
509
  score_list = []
366
510
  debug.call(options[:permutate],"x permutations")
367
- (1..options[:permutate]).each do |i|
368
- $stderr.print "Iteration ",i,"\n"
511
+ (1..options[:permutate]).each do |permutation|
512
+ $stderr.print "Iteration ",permutation,"\n"
369
513
  # Create a shuffled phenotype file
370
- file = File.open("phenotypes-#{i}","w")
514
+ file = File.open("phenotypes-#{permutation}","w")
371
515
  tmp_pfn = file.path
372
516
  p tmp_pfn
373
517
  ps.shuffle.each do | l |
@@ -375,20 +519,23 @@ else
375
519
  end
376
520
  file.close
377
521
  k_files.each do | chr, kfn | # call a GWA for each chromosome
378
- gwas.call(chr,kfn,tmp_pfn)
522
+ gwas.call(chr,kfn,tmp_pfn,permutation)
379
523
  end
380
- # p [:HEY,record[:files].last]
381
- assocfn = record[:files].last[2]
382
- debug.call("Reading ",assocfn)
383
524
  score_min = 1000.0
384
- File.foreach(assocfn).with_index do |assoc, assoc_line_num|
385
- if assoc_line_num > 0
386
- value = assoc.strip.split(/\t/).last.to_f
387
- score_min = value if value < score_min
525
+ if false and not options[:slurm]
526
+ # p [:HEY,record[:files].last]
527
+ assocfn = record[:files].last[2]
528
+ debug.call("Reading ",assocfn)
529
+ File.foreach(assocfn).with_index do |assoc, assoc_line_num|
530
+ if assoc_line_num > 0
531
+ value = assoc.strip.split(/\t/).last.to_f
532
+ score_min = value if value < score_min
533
+ end
388
534
  end
389
535
  end
390
536
  score_list << score_min
391
537
  end
538
+ exit 0 if options[:slurm]
392
539
  ls = score_list.sort
393
540
  p ls
394
541
  significant = ls[(ls.size - ls.size*0.95).floor]
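This hunk selects the significance threshold from the sorted list of per-permutation minimum p-values. The selection can be sketched as below, assuming the suggestive threshold is picked the same way at the 67th percentile, as the README output suggests. `threshold` and `neg_log10` are hypothetical helpers, the input scores are invented, and the index uses integer arithmetic rather than the floating-point expression in the script.

```ruby
# Hypothetical helper (not part of gemma-wrapper): return the p-value below
# which (100 - percentile) percent of the sorted minimum p-values fall.
def threshold(min_pvalues, percentile)
  sorted = min_pvalues.sort
  index = (sorted.size * (100 - percentile)) / 100 # integer division floors
  sorted[index]
end

# Hypothetical helper: express a p-value on the -log10 scale, one decimal.
def neg_log10(pvalue)
  (-Math.log10(pvalue)).round(1)
end

# Invented scores: 100 permutations with minimum p-values 0.001 .. 0.100.
scores = (1..100).map { |i| i / 1000.0 }.shuffle
significant = threshold(scores, 95)
suggestive  = threshold(scores, 67)
puts significant # prints 0.006
puts suggestive  # prints 0.034
puts neg_log10(1.92081e-05) # prints 4.7, the value shown in the README
```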
@@ -399,5 +546,38 @@ else
399
546
  end
400
547
  end
401
548
 
549
+ # ---- Invoke parallel
550
+ if options[:parallel]
551
+ # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
552
+ cmd = parallel_cmds.join("\\n")
553
+
554
+ cmd = "echo -e \"#{cmd}\""
555
+ err = execute.call(cmd+"|parallel") # all jobs in parallel
556
+ if err != 0
557
+ [16,8,4,1].each do |jobs|
558
+ info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
559
+ err = execute.call(cmd+"|parallel -j #{jobs}")
560
+ break if err == 0
561
+ end
562
+ if err != 0
563
+ info.call("Run failed!")
564
+ exit err
565
+ end
566
+ end
567
+ info.call("Run successful!")
568
+ end
402
569
  json_out.call
403
- exit 0
570
+
571
+ # copy all output files to the cache_dir. If a file exists only emit a warning
572
+ Dir.glob("*.txt", base: tmpdir) do | fn |
573
+ source = tmpdir + "/" + fn
574
+ dest = options[:cache_dir] + "/" + fn
575
+ if not File.exist?(dest) or options[:force]
576
+ info.call "Move #{source} to #{dest}"
577
+ FileUtils.mv source, dest, verbose: false
578
+ else
579
+ warning.call "File #{dest} already exists. Not overwriting"
580
+ end
581
+ end
582
+
583
+ end # tmpdir
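The new `get_chromosomes` function collects the distinct values of the third column of the SNP annotation file passed with `-a`. A compact sketch of the same idea is shown here. `chromosomes_in` is a hypothetical helper, the annotation rows are invented, and the sketch assumes whitespace-separated columns with the chromosome in the third position.

```ruby
require 'tempfile'

# Hypothetical helper (not part of gemma-wrapper): list the distinct
# chromosome names found in the third column of an annotation file, in order
# of first appearance. File.foreach closes the file once reading is done.
def chromosomes_in(annofn)
  File.foreach(annofn)
      .map { |line| line.strip.split(/\s+/)[2] }
      .compact
      .uniq
end

# Invented annotation rows: SNP id, position, chromosome.
anno = Tempfile.new('snps')
anno.write("rs1 100 1\nrs2 200 1\nrs3 300 2\n\nrs4 400 X\n")
anno.flush

chrs = chromosomes_in(anno.path)
puts chrs.inspect # prints ["1", "2", "X"]
```

Lines without a third column, such as the blank line in the sample, are skipped instead of adding a nil entry to the list.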
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
2
2
  s.name = 'bio-gemma-wrapper'
3
3
  s.version = File.read('VERSION')
4
4
  s.summary = "GEMMA with LOCO and permutations"
5
- s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
5
+ s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
6
6
  s.authors = ["Pjotr Prins"]
7
7
  s.email = 'pjotr.public01@thebird.nl'
8
8
  s.files = ["bin/gemma-wrapper",
metadata CHANGED
@@ -1,17 +1,17 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: '0.98'
4
+ version: 0.99.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-11-19 00:00:00.000000000 Z
11
+ date: 2021-08-22 00:00:00.000000000 Z
12
12
  dependencies: []
13
- description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
14
- runs with LOCO support
13
+ description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
14
+ and caches K between runs with LOCO support
15
15
  email: pjotr.public01@thebird.nl
16
16
  executables:
17
17
  - gemma-wrapper
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
43
43
  - !ruby/object:Gem::Version
44
44
  version: '0'
45
45
  requirements: []
46
- rubyforge_project:
47
- rubygems_version: 2.6.8
46
+ rubygems_version: 3.2.5
48
47
  signing_key:
49
48
  specification_version: 4
50
49
  summary: GEMMA with LOCO and permutations