bio-gemma-wrapper 0.92.2 → 0.99.1
- checksums.yaml +5 -5
- data/README.md +84 -13
- data/VERSION +1 -1
- data/bin/gemma-wrapper +254 -51
- data/gemma-wrapper.gemspec +2 -2
- metadata +6 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-metadata.gz:
-data.tar.gz:
+SHA256:
+metadata.gz: 9ddfd904e74beebe0de1b97732d872fce171732965a835b101b9cc9be815bb05
+data.tar.gz: 2dae1c019da23f2f87216694d641fc1eb852aa7800557bd10cfb08cb3425e844
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 38454a3f12dab85bef711051e73e20a015fe6b6d9c71bafada2197b9aef1aa0eabe3f3709cb0dc9d0c39f4cc454c15bc4d3aea5d06140ccde72fa13aa6285f51
+data.tar.gz: 28e77a6995893245c501e602d488b5e0c504549fa91d8c94f902591b87b4454fe9b7923667dfacae2ab1dac7f6f7d814df1ec036b2b4f616dfd4b84c549d35d1
data/README.md
CHANGED
@@ -1,10 +1,19 @@
-
+[![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
+
+# GEMMA with LOCO, permutations and slurm support (and caching)
 
 ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
 Nat. Genet., 2016)](cfw.gif)
 
 ## Introduction
 
+Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
+GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
+is used to run GEMMA as part of the https://genenetwork.org/
+environment.
+
+Note that gemma-wrapper is projected to be integrated into gemma2/lib.
+
 GEMMA is a software toolkit for fast application of linear mixed
 models (LMMs) and related models to genome-wide association studies
 (GWAS) and other large-scale data sets.
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
 This repository contains gemma-wrapper, essentially a wrapper of
 GEMMA that provides support for caching the kinship or relatedness
 matrix (K) and caching LM and LMM computations with the option of full
-leave-one-chromosome-out genome scans (LOCO).
+leave-one-chromosome-out genome scans (LOCO). Jobs can also be
+submitted to HPC PBS, i.e., slurm.
 
 gemma-wrapper requires a recent version of GEMMA and essentially
 does a pass-through of all standard GEMMA invocation switches. On
 return gemma-wrapper can return a JSON object (--json) which is
 useful for web-services.
 
-Note that this a work in progress (WIP). What is described below
-should work.
-
 ## Installation
 
 Prerequisites are
@@ -30,8 +37,9 @@ Prerequisites are
 * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
 almost all Linux systems
 
-gemma-wrapper comes as a Ruby
-can be
+gemma-wrapper comes as a Ruby
+[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
+installed with
 
 gem install bio-gemma-wrapper
 
@@ -39,15 +47,18 @@ Invoke the tool with
 
 gemma-wrapper --help
 
-and it will render
+and it will render something like
 
 ```
 Usage: gemma-wrapper [options] -- [gemma-options]
+--permutate n Permutate # times by shuffling phenotypes
+--permute-phenotypes filen Phenotypes to be shuffled in permutations
 --loco [x,y,1,2,3...] Run full LOCO
 --input filen JSON input variables (used for LOCO)
 --cache-dir path Use a cache directory
 --json Create output file in JSON format
 --force Force computation
+--slurm [options] Submit to slurm PBS
 --q, --quiet Run quietly
 -v, --verbose Run verbosely
 --debug Show debug messages and keep intermediate output
@@ -65,6 +76,8 @@ Unpack it and run the tool as
 
 ./bin/gemma-wrapper --help
 
+See below for using a GNU Guix environment.
+
 ## Usage
 
 gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -91,11 +104,12 @@ the data files are found):
 
 Run it twice to see
 
-/tmp/
+/tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
 
 gemma-wrapper computes the unique HASH value over the command
 line switches passed into GEMMA as well as the contents of the files
-passed in (here the genotype and phenotype files
+passed in (here the genotype and phenotype files - actually it ignores the phenotype with K because
+GEMMA always computes the same K).
 
 You can also get JSON output on STDOUT by providing the --json switch
 
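The caching described in this README hunk keys each result on a hash of the GEMMA arguments in which file names are replaced by digests of their contents. A minimal Ruby sketch of that idea follows; `cache_key` is a hypothetical helper, not part of the gem. It drops the `-p` pair for `-gk` runs in the same spirit as the 0.99.1 `hashme` logic, and it reads files with `File.binread`, assuming raw bytes are what should be hashed (the script itself uses `File.read`).

```ruby
require 'digest/sha1'

# Hypothetical helper: derive a cache key from a list of GEMMA arguments.
# Arguments that name existing files contribute the SHA1 of their contents,
# so renaming a file keeps the key while editing the file changes it.
def cache_key(gemma_args)
  args = gemma_args.dup
  # K does not depend on the phenotypes, so drop "-p <file>" for -gk runs.
  if args.include?('-gk') && (idx = args.index('-p'))
    args.slice!(idx, 2)
  end
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.binread(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

Using `Array#slice!` also covers a `-p` that happens to be the first argument.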
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
 -gk \
--debug
+-debug > K.json
 
-
+K.json is something that can be parsed with a calling program, and is
+also below as input for the GWA step. Example:
 
 ```json
 {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
 
 will store K in ~/.gemma-cache.
 
+### GWA
+
+Run the LMM using the K's captured earlier in K.json using the --input
+switch
+
+gemma-wrapper --json --loco --input K.json -- \
+-g test/data/input/BXD_geno.txt.gz \
+-p test/data/input/BXD_pheno.txt \
+-c test/data/input/BXD_covariates2.txt \
+-a test/data/input/BXD_snps.txt \
+-lmm 2 -maf 0.1 \
+-debug > GWA.json
+
+Running it twice should show that GWA is not recomputed.
+
+/tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
+
 ### LOCO
 
 Recent versions of GEMMA have LOCO support for a single chromosome
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
 The -k switch is injected automatically. Again output switches are not
 allowed (-o, -outdir)
 
+### Permutations
+
+Permutations can be run with and without LOCO. First create K
+
+gemma-wrapper --json -- \
+-g test/data/input/BXD_geno.txt.gz \
+-p test/data/input/BXD_pheno.txt \
+-gk \
+-debug > K.json
+
+Next, using K.json, permute the phenotypes with something like
+
+gemma-wrapper --json --loco --input K.json \
+--permutate 100 --permute-phenotype test/data/input/BXD_pheno.txt -- \
+-g test/data/input/BXD_geno.txt.gz \
+-p test/data/input/BXD_pheno.txt \
+-c test/data/input/BXD_covariates2.txt \
+-a test/data/input/BXD_snps.txt \
+-lmm 2 -maf 0.1 \
+-debug > GWA.json
+
+This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
+
+["95 percentile (significant) ", 1.92081e-05, 4.7]
+["67 percentile (suggestive) ", 5.227785e-05, 4.3]
+
+### Slurm PBS
+
+To run gemma-wrapper on HPC use the '--slurm' switch.
+
+## Development
+
+We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
+
+```
+source .guix-deploy
+ruby bin/gemma-wrapper --help
+```
+
 ## Copyright
 
-Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
+Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
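The two thresholds in the Permutations section are read off the distribution of genome-wide minimum p-values across shuffled-phenotype runs. A minimal Ruby sketch of that last step, using the same indexing as the 0.99.1 script (`ls[(ls.size - ls.size*0.95).floor]` on the ascending list); `permutation_thresholds` is a hypothetical name and the input in the test is made up. Note that in the script as shown in this diff, the block that would read each `.assoc.txt` file is guarded by `if false`, so as written every entry of `score_list` stays at its initial value of 1000.0.

```ruby
# Hypothetical helper: turn per-permutation minimum p-values into
# thresholds, mirroring the indexing of gemma-wrapper 0.99.1.
# Returns { level_name => [p_value, -log10(p_value) rounded to 0.1] }.
def permutation_thresholds(min_pvalues, levels = { significant: 0.95, suggestive: 0.67 })
  ls = min_pvalues.sort
  n = ls.size
  levels.transform_values do |q|
    # Index n - n*q into the ascending list: for q = 0.95 this is the
    # p-value that only about 5% of the permuted (null) scans beat.
    pval = ls[(n - n * q).floor]
    [pval, (-Math.log10(pval)).round(1)]
  end
end
```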
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.99.1
data/bin/gemma-wrapper
CHANGED
@@ -4,9 +4,10 @@
 # Author:: Pjotr Prins
 # License:: GPL3
 #
-# Copyright (C) 2017 Pjotr Prins <pjotr.prins@thebird.nl>
+# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
 
-USAGE = "
+USAGE = "
+GEMMA wrapper example:
 
 Simple caching of K computation with
 
@@ -34,9 +35,13 @@ USAGE = "GEMMA wrapper example:
 -lmm 2 -maf 0.1 \\
 -debug > GWA.json
 
+Gemma gets used from the path. You can override by setting
+
+env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
 "
-
-
+# These are used for testing compatibility with the gemma tool
+GEMMA_V_MAJOR = 98
+GEMMA_V_MINOR = 1
 
 basepath = File.dirname(File.dirname(__FILE__))
 $: << File.join(basepath,'lib')
@@ -59,8 +64,11 @@ if not gemma_command
 end
 end
 
+
+require 'digest/sha1'
 require 'fileutils'
 require 'optparse'
+require 'tempfile'
 require 'tmpdir'
 
 split_at = ARGV.index('--')
@@ -68,12 +76,22 @@ if split_at
 gemma_args = ARGV[split_at+1..-1]
 end
 
-options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
+options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
 
 opts = OptionParser.new do |o|
-o.banner = "
+o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
+
+o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
+options[:permutate] = lst
+options[:force] = true
+end
+
+o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
+options[:permute_phenotypes] = phenotypes
+raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
+end
 
-o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
+o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
 options[:loco] = lst
 end
 
@@ -90,10 +108,22 @@ opts = OptionParser.new do |o|
 options[:json] = b
 end
 
-o.on("--force", "Force computation") do |q|
+o.on("--force", "Force computation (override cache)") do |q|
 options[:force] = true
 end
 
+o.on("--no-parallel", "Do not run jobs in parallel") do |b|
+options[:parallel] = false
+end
+
+o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
+options[:slurm_opts] = ""
+options[:slurm] = true
+if slurm
+options[:slurm_opts] = slurm
+end
+end
+
 o.on("--q", "--quiet", "Run quietly") do |q|
 options[:quiet] = true
 end
@@ -102,15 +132,20 @@ opts = OptionParser.new do |o|
 options[:verbose] = true
 end
 
-o.on("--debug", "Show debug messages and keep intermediate output") do |v|
+o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
 options[:debug] = true
 end
 
+o.on("--dry-run", "Show commands, but don't execute") do |b|
+options[:dry_run] = b
+end
+
 o.on('--','Anything after gets passed to GEMMA') do
 o.terminate()
 end
 
 o.separator ""
+
 o.on_tail('-h', '--help', 'display this help and exit') do
 options[:show_help] = true
 end
@@ -129,6 +164,7 @@ json_out = lambda do
 print record.to_json if options[:json]
 end
 
+# ---- Some error handlers
 error = lambda do |*msg|
 if options[:json]
 record[:error] = *msg.join(" ")
@@ -137,12 +173,14 @@ error = lambda do |*msg|
 end
 raise *msg
 end
+
 debug = lambda do |*msg|
 if options[:debug]
 record[:debug].push *msg.join("") if options[:json]
 OUTPUT.print "DEBUG: ",*msg,"\n"
 end
 end
+
 warning = lambda do |*msg|
 record[:warnings].push *msg.join("")
 OUTPUT.print "WARNING: ",*msg,"\n"
@@ -152,18 +190,32 @@ info = lambda do |*msg|
 OUTPUT.print *msg,"\n" if !options[:quiet]
 end
 
+# ---- Start banner
+
 GEMMA_K_VERSION=version
-GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017\n"
+GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
 info.call GEMMA_K_BANNER
 
 # Check gemma version
 GEMMA_COMMAND=options[:gemma_command]
-
-
+info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
+
+begin
+GEMMA_INFO = `#{GEMMA_COMMAND}`
+rescue Errno::ENOENT
+GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
+error.call "<#{GEMMA_COMMAND}> command not found"
+end
+
+gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
+info.call "Using ",gemma_version_header,"\n"
 gemma_version = gemma_version_header.split(/[,\s]+/)[1]
 v_version, v_major, v_minor = gemma_version.split(".")
+info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
+
+info.call gemma_version_header
 
-
+warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
 
 options[:gemma_version_header] = gemma_version_header
 options[:gemma_version] = gemma_version
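The hunk above compares the GEMMA version field by field against `GEMMA_V_MAJOR` and `GEMMA_V_MINOR`. A compact alternative, sketched here rather than taken from the gem, is RubyGems' `Gem::Version`, which orders dotted versions segment by segment. The method names are hypothetical and the banner string in the test is an assumed GEMMA header. One behavioural difference: `Gem::Version` treats `0.98` as older than `0.98.1`, whereas the script does not warn when the minor field is missing.

```ruby
require 'rubygems' # Gem::Version; RubyGems is loaded by default in standard Ruby

EXPECTED_GEMMA = Gem::Version.new('0.98.1')

# Pull the version token from the first banner line mentioning GEMMA or
# Version, using the same split as gemma-wrapper (second field when
# splitting on commas and whitespace). Returns nil if no line matches.
def gemma_version(banner)
  header = banner.lines.grep(/GEMMA|Version/).first
  header && header.strip.split(/[,\s]+/)[1]
end

# True when the given dotted version sorts before the expected one.
def gemma_out_of_date?(version, expected = EXPECTED_GEMMA)
  Gem::Version.new(version) < expected
end
```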
@@ -178,25 +230,82 @@ if RUBY_VERSION =~ /^1/
 warning "runs on Ruby 2.x only\n"
 end
 
+debug.call(options) # some debug output
+debug.call(record)
+
+DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
+DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
+
+# ---- Set up parallel
+if options[:parallel]
+begin
+PARALLEL_INFO = `parallel --help`
+rescue Errno::ENOENT
+error.call "<parallel> command not found"
+end
+parallel_cmds = []
+end
+
 # ---- Compute HASH on inputs
 hashme = []
 geno_idx = gemma_args.index '-g'
-raise "Expected GEMMA -g switch" if geno_idx == nil
-
+raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
+pheno_idx = gemma_args.index '-p'
 
-
-
-
-
-
-
-
+if DO_COMPUTE_GWA and options[:permute_phenotypes]
+raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
+end
+
+
+execute = lambda { |cmd|
+info.call("Executing: #{cmd}")
+err = 0
+if not options[:debug]
+# send output to stderr line by line
+IO.popen("#{cmd}") do |io|
+while s = io.gets
+$stderr.print s
+end
+io.close
+err = $?.to_i
+end
 else
-
+$stderr.print `#{cmd}`
+err = $?.to_i
 end
+err
+}
+
+hashme =
+if DO_COMPUTE_KINSHIP and pheno_idx != nil
+# Remove the phenotype file from the hash for GRM computation
+gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
+else
+gemma_args
+end
+
+compute_hash = lambda do | phenofn = nil |
+# Compute a HASH on the inputs
+debug.call "Hashing on ",hashme,"\n"
+hashes = []
+hm = if phenofn
+hashme + ["-p", phenofn]
+else
+hashme
+end
+debug.call(hm)
+hm.each do | item |
+if File.file?(item)
+hashes << Digest::SHA1.hexdigest(File.read(item))
+debug.call [item,hashes.last]
+else
+hashes << item
+end
+end
+Digest::SHA1.hexdigest hashes.join(' ')
 end
-HASH = Digest::SHA1.hexdigest hashes.join(' ')
 
+HASH = compute_hash.call()
 options[:hash] = HASH
 
 # Create cache dir
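The `execute` lambda added in this hunk forwards a command's output to stderr line by line and returns its status. A minimal stand-alone Ruby sketch of the same pattern; `run_streaming` is a hypothetical name. It returns `Process::Status#exitstatus` (the plain exit code) rather than the raw `$?.to_i` wait status used in the script, which on Unix is the exit code shifted left by eight bits; both are zero exactly when the command exits successfully.

```ruby
# Run a shell command, echo its standard output line by line as it
# arrives, and return the exit code (1 if it was killed by a signal).
def run_streaming(cmd, out: $stderr)
  IO.popen(cmd) do |io|
    io.each_line { |line| out.print line }
  end
  # IO.popen with a block waits for the child and sets $? afterwards.
  status = $?
  status.exitstatus || 1
end
```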
@@ -210,26 +319,49 @@ GEMMA_ARGS = gemma_args
 
 debug.call "Options: ",options,"\n" if !options[:quiet]
 
-invoke_gemma = lambda do |extra_args, cache_hit = false|
-cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
+invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
+cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
 record[:gemma_command] = cmd
 return if cache_hit
-
+if options[:slurm]
+info.call cmd
+hashi = HASH
+prefix = options[:cache_dir]+'/'+hashi
+scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
+script = "#!/bin/bash
+#SBATCH --job-name=gemma-#{scriptfn}
+#SBATCH --ntasks=1
+#SBATCH --time=20:00
+srun #{cmd}
+"
+debug.call(script)
+File.open(scriptfn,"w") { |f|
+f.write(script)
+}
+cmd = "sbatch "+options[:slurm_opts] + scriptfn
+end
 errno =
 if options[:json]
 # capture output
 err = 0
-
-
-
-
-
-
+if options[:dry_run]
+info.call("Would have invoked: ",cmd)
+elsif options[:parallel]
+info.call("Add parallel job: ",cmd)
+parallel_cmds << cmd
+else
+err = execute.call(cmd)
 end
 err
 else
-
-
+if options[:dry_run]
+info.call("Would have invoked ",cmd)
+0
+else
+debug.call("Invoking ",cmd) if options[:debug]
+system(cmd)
+$?.exitstatus
+end
 end
 if errno != 0
 debug.call "Gemma exit ",errno
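With `--slurm`, the hunk above writes one batch script per GEMMA call and submits it with `sbatch`. The Ruby sketch below builds the same kind of script with a heredoc; the method names are hypothetical and the 20-minute limit is simply the value hard-coded in 0.99.1. It joins the `sbatch` arguments with spaces: in the script as shown, `"sbatch "+options[:slurm_opts] + scriptfn` puts no separator between the user options and the script path, so a non-empty `--slurm=...` value would run into the file name.

```ruby
require 'shellwords'

# Build a one-task sbatch script around a single GEMMA command line.
def slurm_script(cmd, job_name:, time: '20:00')
  <<~SCRIPT
    #!/bin/bash
    #SBATCH --job-name=#{job_name}
    #SBATCH --ntasks=1
    #SBATCH --time=#{time}
    srun #{cmd}
  SCRIPT
end

# Assemble the submit command; empty option strings are dropped so every
# element is separated by exactly one space.
def sbatch_command(scriptfn, slurm_opts = '')
  ['sbatch', slurm_opts, Shellwords.escape(scriptfn)].reject(&:empty?).join(' ')
end
```

Passing something like `File.basename(scriptfn)` as the job name keeps queue listings short; the 0.99.1 script uses the full path.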
@@ -240,10 +372,12 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
 end
 
 # returns datafn, logfn, cache_hit
-cache = lambda do | chr, ext |
+cache = lambda do | chr, ext, h=HASH, permutation=0 |
 inject = (chr==nil ? "" : ".#{chr}" )+ext
-hashi =
-prefix = options[:cache_dir]+'/'+hashi
+hashi = (chr==nil ? h : h+inject)
+prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
+# for chr 3 and permutation 1 forms something like
+# /tmp/1b700-a996f.3.cXX.txt.1.log.txt
 logfn = prefix+".log.txt"
 datafn = prefix+ext
 record[:files] ||= []
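The updated `cache` lambda decides where each chromosome and permutation writes its files. The Ruby sketch below re-creates that naming scheme as a pure function so file names can be predicted without running GEMMA; `cache_paths` is a hypothetical name, and it returns only the two paths, leaving out the cache-hit check.

```ruby
# Re-creation of the 0.99.1 cache naming: <dir>/<hash>[.<chr><ext>][.<n>]
# with ".log.txt" appended for the log and <ext> appended for the data file.
def cache_paths(cache_dir, hash, ext, chr: nil, permutation: 0)
  hashi = chr.nil? ? hash : "#{hash}.#{chr}#{ext}"
  prefix = File.join(cache_dir, hashi)
  prefix += ".#{permutation}" if permutation != 0
  { log: "#{prefix}.log.txt", data: "#{prefix}#{ext}" }
end
```

When a chromosome is given, the extension appears twice in the data file name, because it is already part of `hashi` and is appended again.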
@@ -260,6 +394,7 @@ cache = lambda do | chr, ext |
 return hashi,false
 end
 
+# ---- Compute K
 kinship = lambda do | chr = nil |
 record[:type] = "K"
 ext = case (GEMMA_ARGS[GEMMA_ARGS.index('-gk')+1]).to_i
@@ -277,21 +412,23 @@ kinship = lambda do | chr = nil |
 end
 end
 
-
+# ---- Run GWA
+gwas = lambda do | chr, kfn, pfn, permutation=0 |
 record[:type] = "GWA"
-error.call "Do not use the GEMMA -k switch!" if GEMMA_ARGS.include? '-k'
-
+error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
+# Update hash for each permutation
+hash = compute_hash.call(pfn)
+hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
 if not cache_hit
-
-
-
-
-end
+args = [ '-k', kfn, '-o', hashi ]
+args << [ '-loco', chr ] if chr != nil
+args << [ '-p', pfn ] if pfn
+invoke_gemma.call args,false,chr,permutation
 end
 end
 
 LOCO = options[:loco]
-if
+if DO_COMPUTE_KINSHIP
 # compute K
 info.call LOCO
 if LOCO != nil
@@ -303,14 +440,80 @@ if GEMMA_ARGS.include? '-gk'
 kinship.call # no LOCO
 end
 else
-#
+# DO_COMPUTE_GWA
 json_in = JSON.parse(File.read(options[:input]))
 raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
+
+pfn = options[:permute_phenotypes] # can be nil
 k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
-k_files.each do | chr, kfn |
-gwas.call(chr,kfn)
+k_files.each do | chr, kfn | # call a GWA for each chromosome
+gwas.call(chr,kfn,pfn)
+end
+# Permute
+if options[:permutate]
+ps = []
+raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
+File.foreach(pfn).with_index do |line, line_num|
+ps << line
+end
+score_list = []
+debug.call(options[:permutate],"x permutations")
+(1..options[:permutate]).each do |permutation|
+$stderr.print "Iteration ",permutation,"\n"
+# Create a shuffled phenotype file
+file = File.open("phenotypes-#{permutation}","w")
+tmp_pfn = file.path
+p tmp_pfn
+ps.shuffle.each do | l |
+file.print(l)
+end
+file.close
+k_files.each do | chr, kfn | # call a GWA for each chromosome
+gwas.call(chr,kfn,tmp_pfn,permutation)
+end
+score_min = 1000.0
+if false and not options[:slurm]
+# p [:HEY,record[:files].last]
+assocfn = record[:files].last[2]
+debug.call("Reading ",assocfn)
+File.foreach(assocfn).with_index do |assoc, assoc_line_num|
+if assoc_line_num > 0
+value = assoc.strip.split(/\t/).last.to_f
+score_min = value if value < score_min
+end
+end
+end
+score_list << score_min
+end
+exit 0 if options[:slurm]
+ls = score_list.sort
+p ls
+significant = ls[(ls.size - ls.size*0.95).floor]
+suggestive = ls[(ls.size - ls.size*0.67).floor]
+p ["95 percentile (significant) ",significant,(-Math.log10(significant)).round(1)]
+p ["67 percentile (suggestive) ",suggestive,(-Math.log10(suggestive)).round(1)]
+exit 0
 end
 end
 
+# ---- Invoke parallel
+if options[:parallel]
+# parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
+cmd = parallel_cmds.join("\\n")
+
+cmd = "echo -e \"#{cmd}\""
+err = execute.call(cmd+"|parallel") # all jobs in parallel
+if err != 0
+[16,8,4,1].each do |jobs|
+info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
+err = execute.call(cmd+"|parallel -j #{jobs}")
+break if err == 0
+end
+if err != 0
+info.call("Run failed!")
+exit err
+end
+end
+info.call("Run successful!")
+end
 json_out.call
-exit 0
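The final hunk feeds the queued GEMMA commands to GNU parallel through `echo -e`, and on failure retries with 16, 8, 4 and finally 1 concurrent job to lower peak memory. A minimal Ruby sketch of that retry loop follows; the method names are hypothetical. It passes the commands to `parallel` on standard input via `Open3.capture2e` instead of building an `echo -e` string, which sidesteps shell quoting of the commands themselves. The runner is injectable so the loop can be exercised without GNU parallel installed.

```ruby
require 'open3'

# Run newline-separated commands with GNU parallel; nil jobs means its
# default (one job per CPU core). Returns parallel's exit status.
def gnu_parallel(input, jobs)
  args = ['parallel']
  args += ['-j', jobs.to_s] if jobs
  output, status = Open3.capture2e(*args, stdin_data: input)
  $stderr.print output
  status.exitstatus || 1
end

# Try the full run first, then progressively fewer concurrent jobs.
# Returns [exit_status, jobs_used_on_last_attempt].
def run_with_retries(cmds, steps: [nil, 16, 8, 4, 1], runner: method(:gnu_parallel))
  input = cmds.map { |c| "#{c}\n" }.join
  err = nil
  steps.each do |jobs|
    err = runner.call(input, jobs)
    return [0, jobs] if err.zero?
  end
  [err, steps.last]
end
```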
data/gemma-wrapper.gemspec
CHANGED
@@ -1,8 +1,8 @@
 Gem::Specification.new do |s|
 s.name = 'bio-gemma-wrapper'
 s.version = File.read('VERSION')
-s.summary = "
-s.description = "GEMMA wrapper caches K between runs with LOCO support"
+s.summary = "GEMMA with LOCO and permutations"
+s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
 s.authors = ["Pjotr Prins"]
 s.email = 'pjotr.public01@thebird.nl'
 s.files = ["bin/gemma-wrapper",
metadata
CHANGED
@@ -1,16 +1,17 @@
 --- !ruby/object:Gem::Specification
 name: bio-gemma-wrapper
 version: !ruby/object:Gem::Version
-version: 0.
+version: 0.99.1
 platform: ruby
 authors:
 - Pjotr Prins
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-07-11 00:00:00.000000000 Z
 dependencies: []
-description: GEMMA wrapper
+description: GEMMA wrapper adds LOCO and permutation support. Also caches K between
+runs with LOCO support
 email: pjotr.public01@thebird.nl
 executables:
 - gemma-wrapper
@@ -43,8 +44,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.7.6.2
 signing_key:
 specification_version: 4
-summary:
+summary: GEMMA with LOCO and permutations
 test_files: []