bio-gemma-wrapper 0.92.2 → 0.99.1
- checksums.yaml +5 -5
- data/README.md +84 -13
- data/VERSION +1 -1
- data/bin/gemma-wrapper +254 -51
- data/gemma-wrapper.gemspec +2 -2
- metadata +6 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: 9ddfd904e74beebe0de1b97732d872fce171732965a835b101b9cc9be815bb05
+  data.tar.gz: 2dae1c019da23f2f87216694d641fc1eb852aa7800557bd10cfb08cb3425e844
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 38454a3f12dab85bef711051e73e20a015fe6b6d9c71bafada2197b9aef1aa0eabe3f3709cb0dc9d0c39f4cc454c15bc4d3aea5d06140ccde72fa13aa6285f51
+  data.tar.gz: 28e77a6995893245c501e602d488b5e0c504549fa91d8c94f902591b87b4454fe9b7923667dfacae2ab1dac7f6f7d814df1ec036b2b4f616dfd4b84c549d35d1
data/README.md
CHANGED
@@ -1,10 +1,19 @@
-
+[](https://badge.fury.io/rb/bio-gemma-wrapper)
+
+# GEMMA with LOCO, permutations and slurm support (and caching)
 
 
 
 ## Introduction
 
+Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
+GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
+is used to run GEMMA as part of the https://genenetwork.org/
+environment.
+
+Note that gemma-wrapper is projected to be integrated into gemma2/lib.
+
 GEMMA is a software toolkit for fast application of linear mixed
 models (LMMs) and related models to genome-wide association studies
 (GWAS) and other large-scale data sets.
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
 This repository contains gemma-wrapper, essentially a wrapper of
 GEMMA that provides support for caching the kinship or relatedness
 matrix (K) and caching LM and LMM computations with the option of full
-leave-one-chromosome-out genome scans (LOCO).
+leave-one-chromosome-out genome scans (LOCO). Jobs can also be
+submitted to HPC PBS, i.e., slurm.
 
 gemma-wrapper requires a recent version of GEMMA and essentially
 does a pass-through of all standard GEMMA invocation switches. On
 return gemma-wrapper can return a JSON object (--json) which is
 useful for web-services.
 
-Note that this a work in progress (WIP). What is described below
-should work.
-
 ## Installation
 
 Prerequisites are
@@ -30,8 +37,9 @@ Prerequisites are
 * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
   almost all Linux systems
 
-gemma-wrapper comes as a Ruby
-can be
+gemma-wrapper comes as a Ruby
+[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
+installed with
 
     gem install bio-gemma-wrapper
 
@@ -39,15 +47,18 @@ Invoke the tool with
 
     gemma-wrapper --help
 
-and it will render
+and it will render something like
 
 ```
 Usage: gemma-wrapper [options] -- [gemma-options]
+    --permutate n               Permutate # times by shuffling phenotypes
+    --permute-phenotypes filen  Phenotypes to be shuffled in permutations
     --loco [x,y,1,2,3...]       Run full LOCO
     --input filen               JSON input variables (used for LOCO)
     --cache-dir path            Use a cache directory
     --json                      Create output file in JSON format
     --force                     Force computation
+    --slurm [options]           Submit to slurm PBS
     --q, --quiet                Run quietly
     -v, --verbose               Run verbosely
     --debug                     Show debug messages and keep intermediate output
@@ -65,6 +76,8 @@ Unpack it and run the tool as
 
     ./bin/gemma-wrapper --help
 
+See below for using a GNU Guix environment.
+
 ## Usage
 
 gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -91,11 +104,12 @@ the data files are found):
 
 Run it twice to see
 
-    /tmp/
+    /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
 
 gemma-wrapper computes the unique HASH value over the command
 line switches passed into GEMMA as well as the contents of the files
-passed in (here the genotype and phenotype files
+passed in (here the genotype and phenotype files - actually it ignores the phenotype with K because
+GEMMA always computes the same K).
 
 You can also get JSON output on STDOUT by providing the --json switch
 
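The hashing rule described in the README hunk above can be sketched as follows. This is a minimal illustration under stated assumptions, not the wrapper's own code: `cache_key` and `without_phenotype` are hypothetical helper names, any argument that names an existing file is hashed by content, and every other argument is hashed as literal text.

```ruby
require 'digest/sha1'

# Hypothetical helper (not part of gemma-wrapper): build a cache key from
# GEMMA arguments. Arguments naming an existing file contribute the SHA1 of
# the file contents; all other arguments contribute their literal text.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Hypothetical helper: drop "-p <file>" so that a kinship (-gk) run gets the
# same key whichever phenotype file is passed in.
def without_phenotype(args)
  idx = args.index('-p')
  return args if idx.nil?
  args[0...idx] + args[(idx + 2)..-1].to_a
end
```

Because file arguments are hashed by content, renaming an input file leaves the key unchanged, while editing the file produces a new key and therefore a cache miss.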
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
         -g test/data/input/BXD_geno.txt.gz \
         -p test/data/input/BXD_pheno.txt \
         -gk \
-        -debug
+        -debug > K.json
 
-
+K.json is something that can be parsed with a calling program, and is
+also below as input for the GWA step. Example:
 
 ```json
 {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
 
 will store K in ~/.gemma-cache.
 
+### GWA
+
+Run the LMM using the K's captured earlier in K.json using the --input
+switch
+
+    gemma-wrapper --json --loco --input K.json -- \
+        -g test/data/input/BXD_geno.txt.gz \
+        -p test/data/input/BXD_pheno.txt \
+        -c test/data/input/BXD_covariates2.txt \
+        -a test/data/input/BXD_snps.txt \
+        -lmm 2 -maf 0.1 \
+        -debug > GWA.json
+
+Running it twice should show that GWA is not recomputed.
+
+    /tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
+
 ### LOCO
 
 Recent versions of GEMMA have LOCO support for a single chromosome
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
 The -k switch is injected automatically. Again output switches are not
 allowed (-o, -outdir)
 
+### Permutations
+
+Permutations can be run with and without LOCO. First create K
+
+    gemma-wrapper --json -- \
+        -g test/data/input/BXD_geno.txt.gz \
+        -p test/data/input/BXD_pheno.txt \
+        -gk \
+        -debug > K.json
+
+Next, using K.json, permute the phenotypes with something like
+
+    gemma-wrapper --json --loco --input K.json \
+        --permutate 100 --permute-phenotype test/data/input/BXD_pheno.txt -- \
+        -g test/data/input/BXD_geno.txt.gz \
+        -p test/data/input/BXD_pheno.txt \
+        -c test/data/input/BXD_covariates2.txt \
+        -a test/data/input/BXD_snps.txt \
+        -lmm 2 -maf 0.1 \
+        -debug > GWA.json
+
+This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
+
+    ["95 percentile (significant) ", 1.92081e-05, 4.7]
+    ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
+
+### Slurm PBS
+
+To run gemma-wrapper on HPC use the '--slurm' switch.
+
+## Development
+
+We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
+
+```
+source .guix-deploy
+ruby bin/gemma-wrapper --help
+```
+
 ## Copyright
 
-Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
+Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
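The threshold lines in the Permutations section of the README pair a p-value with its -log10 score. A minimal sketch of how such a threshold could be derived from per-permutation minimum p-values is shown here. The helper name `permutation_threshold` is hypothetical, and the sketch uses integer percentages to avoid floating-point rounding in the index calculation, which is a simplification rather than the wrapper's exact arithmetic.

```ruby
# Hypothetical helper (not gemma-wrapper's own code): given the smallest
# p-value observed in each permutation, return the p-value threshold at the
# given percentile together with its -log10 score rounded to one decimal.
def permutation_threshold(min_pvalues, percent)
  sorted = min_pvalues.sort                  # ascending: smallest p-value first
  idx = sorted.size * (100 - percent) / 100  # integer division floors the index
  p_value = sorted[idx]
  [p_value, (-Math.log10(p_value)).round(1)]
end
```

With 100 permutations, `permutation_threshold(scores, 95)` picks the sixth-smallest minimum p-value and `permutation_threshold(scores, 67)` picks the thirty-fourth.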
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.99.1
data/bin/gemma-wrapper
CHANGED
@@ -4,9 +4,10 @@
 # Author:: Pjotr Prins
 # License:: GPL3
 #
-# Copyright (C) 2017 Pjotr Prins <pjotr.prins@thebird.nl>
+# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
 
-USAGE = "
+USAGE = "
+GEMMA wrapper example:
 
 Simple caching of K computation with
 
@@ -34,9 +35,13 @@ USAGE = "GEMMA wrapper example:
 -lmm 2 -maf 0.1 \\
 -debug > GWA.json
 
+Gemma gets used from the path. You can override by setting
+
+env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
 "
-
-
+# These are used for testing compatibility with the gemma tool
+GEMMA_V_MAJOR = 98
+GEMMA_V_MINOR = 1
 
 basepath = File.dirname(File.dirname(__FILE__))
 $: << File.join(basepath,'lib')
@@ -59,8 +64,11 @@ if not gemma_command
   end
 end
 
+
+require 'digest/sha1'
 require 'fileutils'
 require 'optparse'
+require 'tempfile'
 require 'tmpdir'
 
 split_at = ARGV.index('--')
@@ -68,12 +76,22 @@ if split_at
   gemma_args = ARGV[split_at+1..-1]
 end
 
-options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
+options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
 
 opts = OptionParser.new do |o|
-  o.banner = "
+  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
+
+  o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
+    options[:permutate] = lst
+    options[:force] = true
+  end
+
+  o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
+    options[:permute_phenotypes] = phenotypes
+    raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
+  end
 
-  o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
+  o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
     options[:loco] = lst
   end
 
@@ -90,10 +108,22 @@ opts = OptionParser.new do |o|
     options[:json] = b
   end
 
-  o.on("--force", "Force computation") do |q|
+  o.on("--force", "Force computation (override cache)") do |q|
     options[:force] = true
   end
 
+  o.on("--no-parallel", "Do not run jobs in parallel") do |b|
+    options[:parallel] = false
+  end
+
+  o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
+    options[:slurm_opts] = ""
+    options[:slurm] = true
+    if slurm
+      options[:slurm_opts] = slurm
+    end
+  end
+
   o.on("--q", "--quiet", "Run quietly") do |q|
     options[:quiet] = true
   end
@@ -102,15 +132,20 @@ opts = OptionParser.new do |o|
     options[:verbose] = true
   end
 
-  o.on("--debug", "Show debug messages and keep intermediate output") do |v|
+  o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
     options[:debug] = true
   end
 
+  o.on("--dry-run", "Show commands, but don't execute") do |b|
+    options[:dry_run] = b
+  end
+
   o.on('--','Anything after gets passed to GEMMA') do
     o.terminate()
   end
 
   o.separator ""
+
   o.on_tail('-h', '--help', 'display this help and exit') do
     options[:show_help] = true
   end
@@ -129,6 +164,7 @@ json_out = lambda do
   print record.to_json if options[:json]
 end
 
+# ---- Some error handlers
 error = lambda do |*msg|
   if options[:json]
     record[:error] = *msg.join(" ")
@@ -137,12 +173,14 @@ error = lambda do |*msg|
   end
   raise *msg
 end
+
 debug = lambda do |*msg|
   if options[:debug]
     record[:debug].push *msg.join("") if options[:json]
     OUTPUT.print "DEBUG: ",*msg,"\n"
   end
 end
+
 warning = lambda do |*msg|
   record[:warnings].push *msg.join("")
   OUTPUT.print "WARNING: ",*msg,"\n"
@@ -152,18 +190,32 @@ info = lambda do |*msg|
   OUTPUT.print *msg,"\n" if !options[:quiet]
 end
 
+# ---- Start banner
+
 GEMMA_K_VERSION=version
-GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017\n"
+GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
 info.call GEMMA_K_BANNER
 
 # Check gemma version
 GEMMA_COMMAND=options[:gemma_command]
-
-
+info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
+
+begin
+  GEMMA_INFO = `#{GEMMA_COMMAND}`
+rescue Errno::ENOENT
+  GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
+  error.call "<#{GEMMA_COMMAND}> command not found"
+end
+
+gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
+info.call "Using ",gemma_version_header,"\n"
 gemma_version = gemma_version_header.split(/[,\s]+/)[1]
 v_version, v_major, v_minor = gemma_version.split(".")
+info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
+
+info.call gemma_version_header
 
-
+warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
 
 options[:gemma_version_header] = gemma_version_header
 options[:gemma_version] = gemma_version
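The version comparison added in the hunk above packs several conditions into one line. A minimal sketch of the same rule as a standalone predicate may make it easier to follow. The method name `gemma_outdated?` and its default arguments are hypothetical, and the sketch assumes the version arrives as a plain dotted string such as `0.98.1`.

```ruby
# Hypothetical helper (not part of gemma-wrapper): true when a dotted GEMMA
# version string is older than the expected 0.<major>.<minor>. A missing
# minor component is never treated as outdated.
def gemma_outdated?(version, expected_major = 98, expected_minor = 1)
  _, major, minor = version.split('.')
  return true if major.to_i < expected_major
  major.to_i == expected_major && !minor.nil? && minor.to_i < expected_minor
end
```

Separating the predicate from the warning call keeps the comparison testable without invoking the GEMMA binary.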
@@ -178,25 +230,82 @@ if RUBY_VERSION =~ /^1/
   warning "runs on Ruby 2.x only\n"
 end
 
+debug.call(options) # some debug output
+debug.call(record)
+
+DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
+DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
+
+# ---- Set up parallel
+if options[:parallel]
+  begin
+    PARALLEL_INFO = `parallel --help`
+  rescue Errno::ENOENT
+    error.call "<parallel> command not found"
+  end
+  parallel_cmds = []
+end
+
 # ---- Compute HASH on inputs
 hashme = []
 geno_idx = gemma_args.index '-g'
-raise "Expected GEMMA -g switch" if geno_idx == nil
-
+raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
+pheno_idx = gemma_args.index '-p'
 
-
-
-
-
-
-
-
+if DO_COMPUTE_GWA and options[:permute_phenotypes]
+  raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
+end
+
+
+execute = lambda { |cmd|
+  info.call("Executing: #{cmd}")
+  err = 0
+  if not options[:debug]
+    # send output to stderr line by line
+    IO.popen("#{cmd}") do |io|
+      while s = io.gets
+        $stderr.print s
+      end
+      io.close
+      err = $?.to_i
+    end
   else
-
+    $stderr.print `#{cmd}`
+    err = $?.to_i
   end
+  err
+}
+
+hashme =
+  if DO_COMPUTE_KINSHIP and pheno_idx != nil
+    # Remove the phenotype file from the hash for GRM computation
+    gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
+  else
+    gemma_args
+  end
+
+compute_hash = lambda do | phenofn = nil |
+  # Compute a HASH on the inputs
+  debug.call "Hashing on ",hashme,"\n"
+  hashes = []
+  hm = if phenofn
+         hashme + ["-p", phenofn]
+       else
+         hashme
+       end
+  debug.call(hm)
+  hm.each do | item |
+    if File.file?(item)
+      hashes << Digest::SHA1.hexdigest(File.read(item))
+      debug.call [item,hashes.last]
+    else
+      hashes << item
+    end
+  end
+  Digest::SHA1.hexdigest hashes.join(' ')
 end
-HASH = Digest::SHA1.hexdigest hashes.join(' ')
 
+HASH = compute_hash.call()
 options[:hash] = HASH
 
 # Create cache dir
@@ -210,26 +319,49 @@ GEMMA_ARGS = gemma_args
 
 debug.call "Options: ",options,"\n" if !options[:quiet]
 
-invoke_gemma = lambda do |extra_args, cache_hit = false|
-  cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
+invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
+  cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
   record[:gemma_command] = cmd
   return if cache_hit
-
+  if options[:slurm]
+    info.call cmd
+    hashi = HASH
+    prefix = options[:cache_dir]+'/'+hashi
+    scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
+    script = "#!/bin/bash
+#SBATCH --job-name=gemma-#{scriptfn}
+#SBATCH --ntasks=1
+#SBATCH --time=20:00
+srun #{cmd}
+"
+    debug.call(script)
+    File.open(scriptfn,"w") { |f|
+      f.write(script)
+    }
+    cmd = "sbatch "+options[:slurm_opts] + scriptfn
+  end
   errno =
     if options[:json]
       # capture output
       err = 0
-
-
-
-
-
-
+      if options[:dry_run]
+        info.call("Would have invoked: ",cmd)
+      elsif options[:parallel]
+        info.call("Add parallel job: ",cmd)
+        parallel_cmds << cmd
+      else
+        err = execute.call(cmd)
       end
       err
     else
-
-
+      if options[:dry_run]
+        info.call("Would have invoked ",cmd)
+        0
+      else
+        debug.call("Invoking ",cmd) if options[:debug]
+        system(cmd)
+        $?.exitstatus
+      end
     end
   if errno != 0
     debug.call "Gemma exit ",errno
@@ -240,10 +372,12 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
 end
 
 # returns datafn, logfn, cache_hit
-cache = lambda do | chr, ext |
+cache = lambda do | chr, ext, h=HASH, permutation=0 |
   inject = (chr==nil ? "" : ".#{chr}" )+ext
-  hashi =
-  prefix = options[:cache_dir]+'/'+hashi
+  hashi = (chr==nil ? h : h+inject)
+  prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
+  # for chr 3 and permutation 1 forms something like
+  # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
   logfn = prefix+".log.txt"
   datafn = prefix+ext
   record[:files] ||= []
@@ -260,6 +394,7 @@ cache = lambda do | chr, ext |
   return hashi,false
 end
 
+# ---- Compute K
 kinship = lambda do | chr = nil |
   record[:type] = "K"
   ext = case (GEMMA_ARGS[GEMMA_ARGS.index('-gk')+1]).to_i
@@ -277,21 +412,23 @@ kinship = lambda do | chr = nil |
   end
 end
 
-
+# ---- Run GWA
+gwas = lambda do | chr, kfn, pfn, permutation=0 |
   record[:type] = "GWA"
-  error.call "Do not use the GEMMA -k switch!" if GEMMA_ARGS.include? '-k'
-
+  error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
+  # Update hash for each permutation
+  hash = compute_hash.call(pfn)
+  hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
   if not cache_hit
-
-
-
-
-end
+    args = [ '-k', kfn, '-o', hashi ]
+    args << [ '-loco', chr ] if chr != nil
+    args << [ '-p', pfn ] if pfn
+    invoke_gemma.call args,false,chr,permutation
   end
 end
 
 LOCO = options[:loco]
-if
+if DO_COMPUTE_KINSHIP
   # compute K
   info.call LOCO
   if LOCO != nil
@@ -303,14 +440,80 @@ if GEMMA_ARGS.include? '-gk'
     kinship.call # no LOCO
   end
 else
-  #
+  # DO_COMPUTE_GWA
   json_in = JSON.parse(File.read(options[:input]))
   raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
+
+  pfn = options[:permute_phenotypes] # can be nil
   k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
-  k_files.each do | chr, kfn |
-    gwas.call(chr,kfn)
+  k_files.each do | chr, kfn | # call a GWA for each chromosome
+    gwas.call(chr,kfn,pfn)
+  end
+  # Permute
+  if options[:permutate]
+    ps = []
+    raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
+    File.foreach(pfn).with_index do |line, line_num|
+      ps << line
+    end
+    score_list = []
+    debug.call(options[:permutate],"x permutations")
+    (1..options[:permutate]).each do |permutation|
+      $stderr.print "Iteration ",permutation,"\n"
+      # Create a shuffled phenotype file
+      file = File.open("phenotypes-#{permutation}","w")
+      tmp_pfn = file.path
+      p tmp_pfn
+      ps.shuffle.each do | l |
+        file.print(l)
+      end
+      file.close
+      k_files.each do | chr, kfn | # call a GWA for each chromosome
+        gwas.call(chr,kfn,tmp_pfn,permutation)
+      end
+      score_min = 1000.0
+      if false and not options[:slurm]
+        # p [:HEY,record[:files].last]
+        assocfn = record[:files].last[2]
+        debug.call("Reading ",assocfn)
+        File.foreach(assocfn).with_index do |assoc, assoc_line_num|
+          if assoc_line_num > 0
+            value = assoc.strip.split(/\t/).last.to_f
+            score_min = value if value < score_min
+          end
+        end
+      end
+      score_list << score_min
+    end
+    exit 0 if options[:slurm]
+    ls = score_list.sort
+    p ls
+    significant = ls[(ls.size - ls.size*0.95).floor]
+    suggestive = ls[(ls.size - ls.size*0.67).floor]
+    p ["95 percentile (significant) ",significant,(-Math.log10(significant)).round(1)]
+    p ["67 percentile (suggestive) ",suggestive,(-Math.log10(suggestive)).round(1)]
+    exit 0
   end
 end
 
+# ---- Invoke parallel
+if options[:parallel]
+  # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
+  cmd = parallel_cmds.join("\\n")
+
+  cmd = "echo -e \"#{cmd}\""
+  err = execute.call(cmd+"|parallel") # all jobs in parallel
+  if err != 0
+    [16,8,4,1].each do |jobs|
+      info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
+      err = execute.call(cmd+"|parallel -j #{jobs}")
+      break if err == 0
+    end
+    if err != 0
+      info.call("Run failed!")
+      exit err
+    end
+  end
+  info.call("Run successful!")
+end
 json_out.call
-exit 0
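The retry loop at the end of the script above first runs every job at once and then falls back to smaller job counts when the run fails. A minimal sketch of that pattern, decoupled from the shell, could look like the following. The name `run_with_fallback` and the `runner` callable are hypothetical, and the sketch assumes an exit status of 0 means success.

```ruby
# Hypothetical helper (not part of gemma-wrapper): try a batch with no job
# limit first, then retry with progressively fewer parallel jobs.
# `runner` is any callable that takes a job limit (nil = unlimited) and
# returns an exit status, where 0 means success.
def run_with_fallback(runner, job_limits = [16, 8, 4, 1])
  status = runner.call(nil)
  job_limits.each do |jobs|
    break if status == 0          # stop as soon as one attempt succeeds
    status = runner.call(jobs)
  end
  status                          # 0 on success, last failing status otherwise
end
```

In a real script the runner would shell out, for example to GNU parallel with its `-j` option; passing a lambda instead makes the fallback order easy to test without spawning processes.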
data/gemma-wrapper.gemspec
CHANGED
@@ -1,8 +1,8 @@
 Gem::Specification.new do |s|
   s.name = 'bio-gemma-wrapper'
   s.version = File.read('VERSION')
-  s.summary = "
-  s.description = "GEMMA wrapper caches K between runs with LOCO support"
+  s.summary = "GEMMA with LOCO and permutations"
+  s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
   s.authors = ["Pjotr Prins"]
   s.email = 'pjotr.public01@thebird.nl'
   s.files = ["bin/gemma-wrapper",
metadata
CHANGED
@@ -1,16 +1,17 @@
 --- !ruby/object:Gem::Specification
 name: bio-gemma-wrapper
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.99.1
 platform: ruby
 authors:
 - Pjotr Prins
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-07-11 00:00:00.000000000 Z
 dependencies: []
-description: GEMMA wrapper
+description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
+  runs with LOCO support
 email: pjotr.public01@thebird.nl
 executables:
 - gemma-wrapper
@@ -43,8 +44,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.7.6.2
 signing_key:
 specification_version: 4
-summary:
+summary: GEMMA with LOCO and permutations
 test_files: []