bio-gemma-wrapper 0.97.1 → 0.99.2
- checksums.yaml +5 -5
- data/README.md +84 -13
- data/VERSION +1 -1
- data/bin/gemma-wrapper +218 -61
- data/gemma-wrapper.gemspec +1 -1
- metadata +5 -6
checksums.yaml CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
-
-metadata.gz:
-data.tar.gz:
+SHA256:
+metadata.gz: e27a8a3abb00b758095df5956b3854674faf5ff681a93bc028df273c40125c0d
+data.tar.gz: e9675dbb0ea0f087dd21774635d38f3cda11b46a88b36c77dd308086fd0ec5f2
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 81cf5440fa531d5a831efa787800c8bea230d47cddc666a31fff066551ff347708a41ddf1368c0d3946c7ba9faef8e5882e398ad340850253c53961cce96f662
+data.tar.gz: 582ae78c48a1eb8eeca01172eaeaba9d5ca23e69601967e334f8c218e3a4dd74b297861b01ce49b1357798b49a96c12e737100dcacec7fc34b70da1fc9c75f0d
```
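checksums.yaml stores SHA256 and SHA512 digests of the two members of the `.gem` archive, `metadata.gz` and `data.tar.gz`. As a rough sketch, assuming the `.gem` file (a plain tar archive) has already been unpacked into a directory, the entries can be checked with Ruby's standard `digest` and `yaml` libraries. The helper name `verify_gem_checksums` is hypothetical and not part of RubyGems.

```ruby
require 'digest/sha2'
require 'yaml'

# Hypothetical helper: compare the digests listed in a checksums.yaml
# document with the files found in +dir+.
# Returns a hash like { "SHA256 data.tar.gz" => true, ... }.
def verify_gem_checksums(checksums_yaml, dir)
  algos = { 'SHA256' => Digest::SHA256, 'SHA512' => Digest::SHA512 }
  results = {}
  YAML.safe_load(checksums_yaml).each do |algo, files|
    digest_class = algos[algo]
    next unless digest_class # skip algorithms this sketch does not handle (e.g. SHA1)

    files.each do |name, expected|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      results["#{algo} #{name}"] = (actual == expected)
    end
  end
  results
end
```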
data/README.md CHANGED

```diff
@@ -1,10 +1,19 @@
-
+[![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
+
+# GEMMA with LOCO, permutations and slurm support (and caching)
 
 ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
 Nat. Genet., 2016)](cfw.gif)
 
 ## Introduction
 
+Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
+GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
+is used to run GEMMA as part of the https://genenetwork.org/
+environment.
+
+Note that gemma-wrapper is projected to be integrated into gemma2/lib.
+
 GEMMA is a software toolkit for fast application of linear mixed
 models (LMMs) and related models to genome-wide association studies
 (GWAS) and other large-scale data sets.
```
```diff
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
 This repository contains gemma-wrapper, essentially a wrapper of
 GEMMA that provides support for caching the kinship or relatedness
 matrix (K) and caching LM and LMM computations with the option of full
-leave-one-chromosome-out genome scans (LOCO).
+leave-one-chromosome-out genome scans (LOCO). Jobs can also be
+submitted to HPC PBS, i.e., slurm.
 
 gemma-wrapper requires a recent version of GEMMA and essentially
 does a pass-through of all standard GEMMA invocation switches. On
 return gemma-wrapper can return a JSON object (--json) which is
 useful for web-services.
 
-Note that this a work in progress (WIP). What is described below
-should work.
-
 ## Installation
 
 Prerequisites are
```
```diff
@@ -30,8 +37,9 @@ Prerequisites are
 * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
 almost all Linux systems
 
-gemma-wrapper comes as a Ruby
-can be
+gemma-wrapper comes as a Ruby
+[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
+installed with
 
 gem install bio-gemma-wrapper
 
```
````diff
@@ -39,15 +47,18 @@ Invoke the tool with
 
 gemma-wrapper --help
 
-and it will render
+and it will render something like
 
 ```
 Usage: gemma-wrapper [options] -- [gemma-options]
+--permutate n Permutate # times by shuffling phenotypes
+--permute-phenotypes filen Phenotypes to be shuffled in permutations
 --loco [x,y,1,2,3...] Run full LOCO
 --input filen JSON input variables (used for LOCO)
 --cache-dir path Use a cache directory
 --json Create output file in JSON format
 --force Force computation
+--slurm [options] Submit to slurm PBS
 --q, --quiet Run quietly
 -v, --verbose Run verbosely
 --debug Show debug messages and keep intermediate output
````
```diff
@@ -65,6 +76,8 @@ Unpack it and run the tool as
 
 ./bin/gemma-wrapper --help
 
+See below for using a GNU Guix environment.
+
 ## Usage
 
 gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
```
```diff
@@ -91,11 +104,12 @@ the data files are found):
 
 Run it twice to see
 
-/tmp/
+/tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
 
 gemma-wrapper computes the unique HASH value over the command
 line switches passed into GEMMA as well as the contents of the files
-passed in (here the genotype and phenotype files
+passed in (here the genotype and phenotype files - actually it ignores the phenotype with K because
+GEMMA always computes the same K).
 
 You can also get JSON output on STDOUT by providing the --json switch
 
```
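The caching rule described in this hunk is implemented by `compute_hash` in the `bin/gemma-wrapper` changes: each argument that names an existing file contributes the SHA1 of its contents, every other argument contributes itself, and the space-joined list is hashed once more. A standalone sketch of that rule follows; the method name `cache_key` is made up for illustration.

```ruby
require 'digest/sha1'

# Sketch of the cache key: file arguments are replaced by the SHA1 of
# their contents, other arguments are kept verbatim, and the joined
# list is hashed again into a 40-character hex string.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

Because only file contents enter the key, renaming an input file does not invalidate the cache. Dropping `-p <file>` from the list before hashing, as the wrapper now does for `-gk` runs, is what makes the K cache independent of the phenotype file.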
````diff
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
 -gk \
--debug
+-debug > K.json
 
-
+K.json is something that can be parsed with a calling program, and is
+also below as input for the GWA step. Example:
 
 ```json
 {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
````
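A calling program only needs a JSON parser to consume K.json. The sketch below relies solely on keys visible in the example output (`type`, `errno`, `files`) and mirrors the wrapper's own check that an `--input` file came from a `-gk` run; `read_kinship_json` is a hypothetical name.

```ruby
require 'json'

# Hypothetical helper: load gemma-wrapper JSON produced with -gk and
# return its list of output file groups.
def read_kinship_json(text)
  rec = JSON.parse(text)
  raise "not a -gk (kinship) result: type=#{rec['type']}" unless rec['type'] == 'K'
  raise "GEMMA failed with errno #{rec['errno']}" unless rec['errno'] == 0

  rec['files']
end
```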
```diff
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
 
 will store K in ~/.gemma-cache.
 
+### GWA
+
+Run the LMM using the K's captured earlier in K.json using the --input
+switch
+
+gemma-wrapper --json --loco --input K.json -- \
+-g test/data/input/BXD_geno.txt.gz \
+-p test/data/input/BXD_pheno.txt \
+-c test/data/input/BXD_covariates2.txt \
+-a test/data/input/BXD_snps.txt \
+-lmm 2 -maf 0.1 \
+-debug > GWA.json
+
+Running it twice should show that GWA is not recomputed.
+
+/tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
+
 ### LOCO
 
 Recent versions of GEMMA have LOCO support for a single chromosome
```
````diff
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
 The -k switch is injected automatically. Again output switches are not
 allowed (-o, -outdir)
 
+### Permutations
+
+Permutations can be run with and without LOCO. First create K
+
+gemma-wrapper --json -- \
+-g test/data/input/BXD_geno.txt.gz \
+-p test/data/input/BXD_pheno.txt \
+-gk \
+-debug > K.json
+
+Next, using K.json, permute the phenotypes with something like
+
+gemma-wrapper --json --loco --input K.json \
+--permutate 100 --permute-phenotype test/data/input/BXD_pheno.txt -- \
+-g test/data/input/BXD_geno.txt.gz \
+-p test/data/input/BXD_pheno.txt \
+-c test/data/input/BXD_covariates2.txt \
+-a test/data/input/BXD_snps.txt \
+-lmm 2 -maf 0.1 \
+-debug > GWA.json
+
+This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
+
+["95 percentile (significant) ", 1.92081e-05, 4.7]
+["67 percentile (suggestive) ", 5.227785e-05, 4.3]
+
+### Slurm PBS
+
+To run gemma-wrapper on HPC use the '--slurm' switch.
+
+## Development
+
+We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
+
+```
+source .guix-deploy
+ruby bin/gemma-wrapper --help
+```
+
 ## Copyright
 
-Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
+Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
````
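The thresholds come from the sorted list of per-permutation minimum p-values. The `bin/gemma-wrapper` diff selects the significant value as `ls[(ls.size - ls.size*0.95).floor]`, and the last number in each printed row matches -log10 of the p-value rounded to one decimal (-log10(1.92081e-05) ≈ 4.7). The line computing the suggestive value is not part of this diff, so the 0.67 cut-off below is an assumption by analogy; `permutation_thresholds` is a hypothetical name.

```ruby
# Sketch: thresholds from the minimum p-value of each permuted run.
# The 0.95 index formula mirrors the diff; 0.67 is assumed by analogy.
def permutation_thresholds(min_pvalues)
  ls = min_pvalues.sort
  pick = ->(q) { ls[(ls.size - ls.size * q).floor] }
  significant = pick.call(0.95)
  suggestive  = pick.call(0.67)
  [
    ["95 percentile (significant) ", significant, (-Math.log10(significant)).round(1)],
    ["67 percentile (suggestive) ", suggestive, (-Math.log10(suggestive)).round(1)]
  ]
end
```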
data/VERSION CHANGED

```diff
@@ -1 +1 @@
-0.
+0.99.2
```
data/bin/gemma-wrapper CHANGED

```diff
@@ -4,7 +4,7 @@
 # Author:: Pjotr Prins
 # License:: GPL3
 #
-# Copyright (C) 2017
+# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
 
 USAGE = "
 GEMMA wrapper example:
```
```diff
@@ -35,10 +35,13 @@ GEMMA wrapper example:
 -lmm 2 -maf 0.1 \\
 -debug > GWA.json
 
+Gemma gets used from the path. You can override by setting
+
+env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
 "
-# These are used for testing compatibility
-GEMMA_V_MAJOR =
-GEMMA_V_MINOR =
+# These are used for testing compatibility with the gemma tool
+GEMMA_V_MAJOR = 98
+GEMMA_V_MINOR = 4
 
 basepath = File.dirname(File.dirname(__FILE__))
 $: << File.join(basepath,'lib')
```
```diff
@@ -61,32 +64,34 @@ if not gemma_command
 end
 end
 
+
+require 'digest/sha1'
 require 'fileutils'
 require 'optparse'
-require 'tmpdir'
 require 'tempfile'
+require 'tmpdir'
 
 split_at = ARGV.index('--')
 if split_at
 gemma_args = ARGV[split_at+1..-1]
 end
 
-options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
+options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
 
 opts = OptionParser.new do |o|
 o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
 
-o.on('--permutate n', Integer, 'Permutate by shuffling phenotypes') do |lst|
+o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
 options[:permutate] = lst
 options[:force] = true
 end
 
-o.on('--phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
-options[:
+o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
+options[:permute_phenotypes] = phenotypes
 raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
 end
 
-o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
+o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
 options[:loco] = lst
 end
 
```
```diff
@@ -107,6 +112,18 @@ opts = OptionParser.new do |o|
 options[:force] = true
 end
 
+o.on("--no-parallel", "Do not run jobs in parallel") do |b|
+options[:parallel] = false
+end
+
+o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
+options[:slurm_opts] = ""
+options[:slurm] = true
+if slurm
+options[:slurm_opts] = slurm
+end
+end
+
 o.on("--q", "--quiet", "Run quietly") do |q|
 options[:quiet] = true
 end
```
```diff
@@ -115,15 +132,20 @@ opts = OptionParser.new do |o|
 options[:verbose] = true
 end
 
-o.on("--debug", "Show debug messages and keep intermediate output") do |v|
+o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
 options[:debug] = true
 end
 
+o.on("--dry-run", "Show commands, but don't execute") do |b|
+options[:dry_run] = b
+end
+
 o.on('--','Anything after gets passed to GEMMA') do
 o.terminate()
 end
 
 o.separator ""
+
 o.on_tail('-h', '--help', 'display this help and exit') do
 options[:show_help] = true
 end
```
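The new switches are ordinary Ruby `OptionParser` definitions. In particular `--slurm[=opts]` declares an optional argument: only the attached form `--slurm=OPTS` supplies a value, and a bare `--slurm` passes `nil` to the block, which is why the wrapper defaults `options[:slurm_opts]` to an empty string. A self-contained sketch reproducing just these three switches (`parse_new_switches` is a hypothetical name):

```ruby
require 'optparse'

# Sketch reproducing only the switches added in this release.
def parse_new_switches(argv)
  options = { parallel: true }
  parser = OptionParser.new do |o|
    o.on('--no-parallel', 'Do not run jobs in parallel') { options[:parallel] = false }
    # Optional argument: a bare --slurm yields nil; --slurm=OPTS yields "OPTS".
    o.on('--slurm[=opts]', String, 'Use slurm PBS for submitting jobs') do |slurm|
      options[:slurm] = true
      options[:slurm_opts] = slurm || ''
    end
    o.on('--dry-run', "Show commands, but don't execute") { |b| options[:dry_run] = b }
  end
  parser.parse(argv) # returns the non-option arguments, ignored here
  options
end
```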
```diff
@@ -171,17 +193,28 @@ end
 # ---- Start banner
 
 GEMMA_K_VERSION=version
-GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017
+GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
 info.call GEMMA_K_BANNER
 
 # Check gemma version
 GEMMA_COMMAND=options[:gemma_command]
-
+info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
+
+begin
+GEMMA_INFO = `#{GEMMA_COMMAND}`
+rescue Errno::ENOENT
+GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
+error.call "<#{GEMMA_COMMAND}> command not found"
+end
+
+gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
 info.call "Using ",gemma_version_header,"\n"
 gemma_version = gemma_version_header.split(/[,\s]+/)[1]
 v_version, v_major, v_minor = gemma_version.split(".")
 info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
 
+info.call gemma_version_header
+
 warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
 
 options[:gemma_version_header] = gemma_version_header
```
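The new version check runs the GEMMA binary without arguments, takes the first output line mentioning `GEMMA` or `Version`, reads its second comma/whitespace-separated token as the version, and warns when that is older than 0.98.4 (`GEMMA_V_MAJOR = 98`, `GEMMA_V_MINOR = 4`). A sketch of the same comparison as a pure function; `gemma_out_of_date?` is a hypothetical name, and the banner strings used to exercise it are invented examples rather than captured GEMMA output.

```ruby
# Sketch of the version comparison, assuming the version is the second
# token of the first banner line that mentions GEMMA or Version.
def gemma_out_of_date?(banner, want_major = 98, want_minor = 4)
  header = banner.split("\n").grep(/GEMMA|Version/).first
  raise 'no GEMMA version line found' unless header

  version = header.strip.split(/[,\s]+/)[1]
  _, major, minor = version.split('.')
  major.to_i < want_major ||
    (major.to_i == want_major && !minor.nil? && minor.to_i < want_minor)
end
```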
```diff
@@ -197,60 +230,143 @@ if RUBY_VERSION =~ /^1/
 warning "runs on Ruby 2.x only\n"
 end
 
+debug.call(options) # some debug output
+debug.call(record)
+
+DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
+DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
+
+# ---- Set up parallel
+if options[:parallel]
+begin
+skip_cite = `echo "will cite" |parallel --citation`
+debug.call(skip_cite)
+PARALLEL_INFO = `parallel --help`
+rescue Errno::ENOENT
+error.call "<parallel> command not found"
+end
+parallel_cmds = []
+end
+
 # ---- Compute HASH on inputs
 hashme = []
 geno_idx = gemma_args.index '-g'
-raise "Expected GEMMA -g switch" if geno_idx == nil
-
-hashme += ['-p', options[:phenotypes]] if options[:phenotypes]
+raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
+pheno_idx = gemma_args.index '-p'
 
-
-
-
-
-
-
-
+if DO_COMPUTE_GWA and options[:permute_phenotypes]
+raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
+end
+
+execute = lambda { |cmd|
+info.call("Executing: #{cmd}")
+err = 0
+if not options[:debug]
+# send output to stderr line by line
+IO.popen("#{cmd}") do |io|
+while s = io.gets
+$stderr.print s
+end
+io.close
+err = $?.to_i
+end
 else
-
+$stderr.print `#{cmd}`
+err = $?.to_i
+end
+err
+}
+
+compute_hash = lambda do | phenofn = nil |
+# Compute a HASH on the inputs
+debug.call "Hashing on ",hashme,"\n"
+hashes = []
+hm = if phenofn
+hashme + ["-p", phenofn]
+else
+hashme
+end
+debug.call(hm)
+hm.each do | item |
+if File.file?(item)
+hashes << Digest::SHA1.hexdigest(File.read(item))
+debug.call [item,hashes.last]
+else
+hashes << item
+end
 end
+debug.call(hashes)
+Digest::SHA1.hexdigest hashes.join(' ')
 end
-HASH = Digest::SHA1.hexdigest hashes.join(' ')
 
+HASH = compute_hash.call()
 options[:hash] = HASH
 
 # Create cache dir
 FileUtils::mkdir_p options[:cache_dir]
 
+Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
+
 error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
 error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
+GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
 gemma_args << '-outdir'
-gemma_args <<
+gemma_args << tmpdir
 GEMMA_ARGS = gemma_args
 
+hashme =
+if DO_COMPUTE_KINSHIP and pheno_idx != nil
+# Remove the phenotype file from the hash for GRM computation
+GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
+else
+GEMMA_ARGS_HASH
+end
+
 debug.call "Options: ",options,"\n" if !options[:quiet]
 
-invoke_gemma = lambda do |extra_args, cache_hit = false|
-cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
+invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
+cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
 record[:gemma_command] = cmd
 return if cache_hit
-
+if options[:slurm]
+info.call cmd
+hashi = HASH
+prefix = tmpdir+'/'+hashi
+scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
+script = "#!/bin/bash
+#SBATCH --job-name=gemma-#{scriptfn}
+#SBATCH --ntasks=1
+#SBATCH --time=20:00
+srun #{cmd}
+"
+debug.call(script)
+File.open(scriptfn,"w") { |f|
+f.write(script)
+}
+cmd = "sbatch "+options[:slurm_opts] + scriptfn
+end
 errno =
 if options[:json]
 # capture output
 err = 0
-
-
-
-
-
-
+if options[:dry_run]
+info.call("Would have invoked: ",cmd)
+elsif options[:parallel]
+info.call("Add parallel job: ",cmd)
+parallel_cmds << cmd
+else
+err = execute.call(cmd)
 end
 err
 else
-
-
-
+if options[:dry_run]
+info.call("Would have invoked ",cmd)
+0
+else
+debug.call("Invoking ",cmd) if options[:debug]
+system(cmd)
+$?.exitstatus
+end
 end
 if errno != 0
 debug.call "Gemma exit ",errno
```
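With `--slurm`, `invoke_gemma` no longer runs GEMMA directly: it writes a small batch script (one task, a 20-minute limit, GEMMA launched via `srun`) and swaps the command for an `sbatch` call on that script. The sketch below isolates that step using the same script naming; `slurm_submit_command` is a hypothetical name and nothing is submitted. One deliberate difference: the diff builds `"sbatch "+options[:slurm_opts] + scriptfn` with no separator between the options and the path, so the sketch joins the parts with spaces instead.

```ruby
# Sketch: write a one-task batch script for a GEMMA command and return the
# sbatch command line that would submit it (the command is not executed).
def slurm_submit_command(gemma_cmd, dir, hash, chr: 'full', permutation: 1, slurm_opts: '')
  scriptfn = File.join(dir, "#{hash}.#{chr}.#{permutation}-pbs.sh")
  script = <<~SCRIPT
    #!/bin/bash
    #SBATCH --job-name=gemma-#{scriptfn}
    #SBATCH --ntasks=1
    #SBATCH --time=20:00
    srun #{gemma_cmd}
  SCRIPT
  File.write(scriptfn, script)
  # Join with spaces so non-empty slurm options stay separate from the path.
  ['sbatch', slurm_opts, scriptfn].reject(&:empty?).join(' ')
end
```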
```diff
@@ -260,11 +376,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
 end
 end
 
+# Takes the hash value and checks whether the (output) file exists
 # returns datafn, logfn, cache_hit
-cache = lambda do | chr, ext |
+cache = lambda do | chr, ext, h=HASH, permutation=0 |
 inject = (chr==nil ? "" : ".#{chr}" )+ext
-hashi = (chr==nil ?
-prefix = options[:cache_dir]+'/'+hashi
+hashi = (chr==nil ? h : h+inject)
+prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
+# for chr 3 and permutation 1 forms something like
+# /tmp/1b700-a996f.3.cXX.txt.1.log.txt
 logfn = prefix+".log.txt"
 datafn = prefix+ext
 record[:files] ||= []
```
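The `cache` lambda derives output file names from the hash, the optional chromosome, the file extension and the permutation number; the source comment gives `/tmp/1b700-a996f.3.cXX.txt.1.log.txt` for chromosome 3 and permutation 1. A sketch reproducing only the naming follows. The cache-hit check itself falls outside the hunk shown, so it is left out, and `cache_paths` is a hypothetical name.

```ruby
# Sketch of the naming used by the `cache` lambda; returns [datafn, logfn].
def cache_paths(cache_dir, hash, chr, ext, permutation = 0)
  inject = (chr.nil? ? '' : ".#{chr}") + ext
  hashi  = chr.nil? ? hash : hash + inject
  prefix = cache_dir + '/' + hashi + (permutation != 0 ? ".#{permutation}" : '')
  [prefix + ext, prefix + '.log.txt']
end
```

For a per-chromosome result the extension therefore appears twice in the data file name: once inside the hash part and once as the suffix.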
```diff
@@ -300,20 +419,22 @@ kinship = lambda do | chr = nil |
 end
 
 # ---- Run GWA
-gwas = lambda do | chr, kfn, pfn |
+gwas = lambda do | chr, kfn, pfn, permutation=0 |
 record[:type] = "GWA"
-error.call "Do not use the GEMMA -k switch with gemma-wrapper!" if GEMMA_ARGS.include? '-k' # K is automatic
-
+error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
+# Update hash for each permutation
+hash = compute_hash.call(pfn)
+hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
 if not cache_hit
 args = [ '-k', kfn, '-o', hashi ]
 args << [ '-loco', chr ] if chr != nil
 args << [ '-p', pfn ] if pfn
-invoke_gemma.call args
+invoke_gemma.call args,false,chr,permutation
 end
 end
 
 LOCO = options[:loco]
-if
+if DO_COMPUTE_KINSHIP
 # compute K
 info.call LOCO
 if LOCO != nil
```
```diff
@@ -325,11 +446,11 @@ if GEMMA_ARGS.include? '-gk'
 kinship.call # no LOCO
 end
 else
-#
+# DO_COMPUTE_GWA
 json_in = JSON.parse(File.read(options[:input]))
 raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
 
-pfn = options[:
+pfn = options[:permute_phenotypes] # can be nil
 k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
 k_files.each do | chr, kfn | # call a GWA for each chromosome
 gwas.call(chr,kfn,pfn)
```
```diff
@@ -337,16 +458,16 @@ else
 # Permute
 if options[:permutate]
 ps = []
-raise "You should supply --
+raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
 File.foreach(pfn).with_index do |line, line_num|
 ps << line
 end
 score_list = []
 debug.call(options[:permutate],"x permutations")
-(1..options[:permutate]).each do |
-$stderr.print "Iteration ",
+(1..options[:permutate]).each do |permutation|
+$stderr.print "Iteration ",permutation,"\n"
 # Create a shuffled phenotype file
-file = File.open("phenotypes-#{
+file = File.open("phenotypes-#{permutation}","w")
 tmp_pfn = file.path
 p tmp_pfn
 ps.shuffle.each do | l |
```
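Each permutation writes the phenotype lines in random order to `phenotypes-<n>` and reruns the GWA for every chromosome with that file. Since `gwas` now calls `compute_hash` with the shuffled file, every permutation gets its own cache key. A sketch of the shuffling step follows, assuming a target directory argument (the diff writes to the current directory) and an injectable `Random` for reproducible runs; `write_shuffled_phenotypes` is a hypothetical name.

```ruby
# Sketch: write the phenotype lines in random order to "<dir>/phenotypes-<n>".
# Lines are chomped and rejoined so a missing final newline cannot merge two lines.
def write_shuffled_phenotypes(pheno_fn, dir, permutation, rng = Random.new)
  lines = File.readlines(pheno_fn, chomp: true)
  out = File.join(dir, "phenotypes-#{permutation}")
  File.write(out, lines.shuffle(random: rng).join("\n") + "\n")
  out
end
```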
```diff
@@ -354,20 +475,23 @@ else
 end
 file.close
 k_files.each do | chr, kfn | # call a GWA for each chromosome
-gwas.call(chr,kfn,tmp_pfn)
+gwas.call(chr,kfn,tmp_pfn,permutation)
 end
-# p [:HEY,record[:files].last]
-assocfn = record[:files].last[2]
-debug.call("Reading ",assocfn)
 score_min = 1000.0
-
-
-
-
+if false and not options[:slurm]
+# p [:HEY,record[:files].last]
+assocfn = record[:files].last[2]
+debug.call("Reading ",assocfn)
+File.foreach(assocfn).with_index do |assoc, assoc_line_num|
+if assoc_line_num > 0
+value = assoc.strip.split(/\t/).last.to_f
+score_min = value if value < score_min
+end
 end
 end
 score_list << score_min
 end
+exit 0 if options[:slurm]
 ls = score_list.sort
 p ls
 significant = ls[(ls.size - ls.size*0.95).floor]
```
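For each permutation the wrapper is meant to record the smallest value in the last tab-separated column of GEMMA's `.assoc.txt`, skipping the header line. In this release that block is guarded by `if false`, so `score_min` stays at 1000.0. A standalone sketch of the reading step, assuming (as the diff does) that the last column holds the p-value; `min_last_column` is a hypothetical name.

```ruby
# Sketch: smallest value in the last tab-separated column, header skipped.
def min_last_column(assoc_fn)
  score_min = Float::INFINITY
  File.foreach(assoc_fn).with_index do |line, i|
    next if i.zero? || line.strip.empty? # skip header and blank lines
    value = line.strip.split("\t").last.to_f
    score_min = value if value < score_min
  end
  score_min
end
```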
```diff
@@ -378,5 +502,38 @@ else
 end
 end
 
+# ---- Invoke parallel
+if options[:parallel]
+# parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
+cmd = parallel_cmds.join("\\n")
+
+cmd = "echo -e \"#{cmd}\""
+err = execute.call(cmd+"|parallel") # all jobs in parallel
+if err != 0
+[16,8,4,1].each do |jobs|
+info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
+err = execute.call(cmd+"|parallel -j #{jobs}")
+break if err == 0
+end
+if err != 0
+info.call("Run failed!")
+exit err
+end
+end
+info.call("Run successful!")
+end
 json_out.call
-
+
+# copy all output files to the cache_dir. If a file exists only emit a warning
+Dir.glob("*.txt", base: tmpdir) do | fn |
+source = tmpdir + "/" + fn
+dest = options[:cache_dir] + "/" + fn
+if not File.exist?(dest) or options[:force]
+info.call "Move #{source} to #{dest}"
+FileUtils.mv source, dest, verbose: false
+else
+warning.call "File #{dest} already exists. Not overwriting"
+end
+end
+
+end # tmpdir
```
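With parallel mode on (the new default), `invoke_gemma` only queues commands. At the end they are piped into GNU `parallel` with its default job count, and after a failure the run is retried with `-j 16`, `8`, `4` and `1` to lower memory use. The sketch below captures that retry loop with an injectable runner so it can be exercised without GNU parallel installed. Its default runner writes one command per line to `parallel`'s standard input through `IO.popen` rather than building an `echo -e` string; this is a deliberate variation that avoids shell quoting. `run_with_fallback` and `GNU_PARALLEL` are hypothetical names.

```ruby
# Default runner (assumption): feed one command per line to GNU parallel's
# stdin and return its exit status. jobs = nil keeps parallel's default -j.
GNU_PARALLEL = lambda do |cmds, jobs|
  argv = ['parallel']
  argv += ['-j', jobs.to_s] if jobs
  IO.popen(argv, 'w') { |io| io.puts(cmds) } # $? is set when the block closes the pipe
  $?.exitstatus
end

# Try the default job count first, then fall back to fewer concurrent jobs.
# Returns [exit_status, jobs_setting_last_tried].
def run_with_fallback(cmds, runner = GNU_PARALLEL, fallbacks = [16, 8, 4, 1])
  err = runner.call(cmds, nil)
  return [err, nil] if err == 0

  fallbacks.each do |jobs|
    err = runner.call(cmds, jobs)
    return [err, jobs] if err == 0
  end
  [err, fallbacks.last]
end
```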
data/gemma-wrapper.gemspec CHANGED

```diff
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
 s.name = 'bio-gemma-wrapper'
 s.version = File.read('VERSION')
 s.summary = "GEMMA with LOCO and permutations"
-s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
+s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
 s.authors = ["Pjotr Prins"]
 s.email = 'pjotr.public01@thebird.nl'
 s.files = ["bin/gemma-wrapper",
```
metadata CHANGED

```diff
@@ -1,17 +1,17 @@
 --- !ruby/object:Gem::Specification
 name: bio-gemma-wrapper
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.99.2
 platform: ruby
 authors:
 - Pjotr Prins
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-08-08 00:00:00.000000000 Z
 dependencies: []
-description: GEMMA wrapper adds LOCO and permutation support. Also
-runs with LOCO support
+description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
+and caches K between runs with LOCO support
 email: pjotr.public01@thebird.nl
 executables:
 - gemma-wrapper
```
```diff
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-
-rubygems_version: 2.6.8
+rubygems_version: 3.2.5
 signing_key:
 specification_version: 4
 summary: GEMMA with LOCO and permutations
```