bio-gemma-wrapper 0.98 → 0.99.3
- checksums.yaml +5 -5
- data/README.md +97 -15
- data/VERSION +1 -1
- data/bin/gemma-wrapper +253 -73
- data/gemma-wrapper.gemspec +1 -1
- metadata +5 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-metadata.gz:
-data.tar.gz:
+SHA256:
+metadata.gz: 0bd37b153e121de9c1758af736cd6904744da2de3540f2a7c547cc423382d8d1
+data.tar.gz: 84298a943e7cfe6126653895d9714babc83a7be2bf903c7b61ff9072f1d4e4a8
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: f32d48ec2f194a513e0cf8f15463b05662e1b269fdd22a110e4cdfb9f6bc541238bac7e766cbc31a8b782379891636e4f79fb3e3667d567ab3c610298d4f11c2
+data.tar.gz: abc9b3faf8ef2f63d566caa14312bde2b2f438c722a83e90f7da775c55e2415171efbfab385273c82e45398c31e5306fb99437c38e287570652a3b04a5432883
data/README.md
CHANGED
@@ -1,12 +1,20 @@
|
|
1
|
-
[](https://badge.fury.io/rb/bio-gemma-wrapper)
|
2
2
|
|
3
|
-
# GEMMA
|
3
|
+
# GEMMA with LOCO, permutations and slurm support (and caching)
|
4
4
|
|
5
5
|

|
7
7
|
|
8
8
|
## Introduction
|
9
9
|
|
10
|
+
Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
|
11
|
+
GEMMA in parallel (now the default with LOCO), and GEMMA on
|
12
|
+
PBS. Gemma-wrapper is used to run GEMMA as part of the
|
13
|
+
https://genenetwork.org/ environment.
|
14
|
+
|
15
|
+
Note that a version of gemma-wrapper is projected to be integrated
|
16
|
+
into gemma itself.
|
17
|
+
|
10
18
|
GEMMA is a software toolkit for fast application of linear mixed
|
11
19
|
models (LMMs) and related models to genome-wide association studies
|
12
20
|
(GWAS) and other large-scale data sets.
|
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
|
|
14
22
|
This repository contains gemma-wrapper, essentially a wrapper of
|
15
23
|
GEMMA that provides support for caching the kinship or relatedness
|
16
24
|
matrix (K) and caching LM and LMM computations with the option of full
|
17
|
-
leave-one-chromosome-out genome scans (LOCO).
|
25
|
+
leave-one-chromosome-out genome scans (LOCO). Jobs can also be
|
26
|
+
submitted to HPC PBS, i.e., slurm.
|
18
27
|
|
19
28
|
gemma-wrapper requires a recent version of GEMMA and essentially
|
20
29
|
does a pass-through of all standard GEMMA invocation switches. On
|
21
30
|
return gemma-wrapper can return a JSON object (--json) which is
|
22
31
|
useful for web-services.
|
23
32
|
|
24
|
-
|
25
|
-
|
33
|
+
## Performance
|
34
|
+
|
35
|
+
LOCO runs in parallel by default which is at least a 5x performance
|
36
|
+
improvement on a machine with enough cores. GEMMA without LOCO,
|
37
|
+
however, does not run in parallel by default. Performance
|
38
|
+
improvements with the parallel implementation for LOCO and non-LOCO
|
39
|
+
can be viewed [here](./test/performance/releases.gmi).
|
26
40
|
|
27
41
|
## Installation
|
28
42
|
|
@@ -32,8 +46,9 @@ Prerequisites are
|
|
32
46
|
* Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
|
33
47
|
almost all Linux systems
|
34
48
|
|
35
|
-
gemma-wrapper comes as a Ruby
|
36
|
-
can be
|
49
|
+
gemma-wrapper comes as a Ruby
|
50
|
+
[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
|
51
|
+
installed with
|
37
52
|
|
38
53
|
gem install bio-gemma-wrapper
|
39
54
|
|
@@ -47,14 +62,19 @@ and it will render something like
|
|
47
62
|
Usage: gemma-wrapper [options] -- [gemma-options]
|
48
63
|
--permutate n Permutate # times by shuffling phenotypes
|
49
64
|
--permute-phenotypes filen Phenotypes to be shuffled in permutations
|
50
|
-
--loco
|
65
|
+
--loco Run full leave-one-chromosome-out (LOCO)
|
66
|
+
--chromosomes [1,2,3] Run specific chromosomes
|
51
67
|
--input filen JSON input variables (used for LOCO)
|
52
68
|
--cache-dir path Use a cache directory
|
53
69
|
--json Create output file in JSON format
|
54
|
-
--force Force computation
|
70
|
+
--force Force computation (override cache)
|
71
|
+
--parallel Run jobs in parallel
|
72
|
+
--no-parallel Do not run jobs in parallel
|
73
|
+
--slurm[=opts] Use slurm PBS for submitting jobs
|
55
74
|
--q, --quiet Run quietly
|
56
75
|
-v, --verbose Run verbosely
|
57
|
-
|
76
|
+
-d, --debug Show debug messages and keep intermediate output
|
77
|
+
--dry-run Show commands, but don't execute
|
58
78
|
-- Anything after gets passed to GEMMA
|
59
79
|
|
60
80
|
-h, --help display this help and exit
|
@@ -69,6 +89,8 @@ Unpack it and run the tool as
|
|
69
89
|
|
70
90
|
./bin/gemma-wrapper --help
|
71
91
|
|
92
|
+
See below for using a GNU Guix environment.
|
93
|
+
|
72
94
|
## Usage
|
73
95
|
|
74
96
|
gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
|
@@ -90,12 +112,13 @@ the data files are found):
|
|
90
112
|
gemma-wrapper -- \
|
91
113
|
-g test/data/input/BXD_geno.txt.gz \
|
92
114
|
-p test/data/input/BXD_pheno.txt \
|
115
|
+
-a test/data/input/BXD_snps.txt \
|
93
116
|
-gk \
|
94
117
|
-debug
|
95
118
|
|
96
119
|
Run it twice to see
|
97
120
|
|
98
|
-
/tmp/
|
121
|
+
/tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
|
99
122
|
|
100
123
|
gemma-wrapper computes the unique HASH value over the command
|
101
124
|
line switches passed into GEMMA as well as the contents of the files
|
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
|
|
107
130
|
gemma-wrapper --json -- \
|
108
131
|
-g test/data/input/BXD_geno.txt.gz \
|
109
132
|
-p test/data/input/BXD_pheno.txt \
|
133
|
+
-a test/data/input/BXD_snps.txt \
|
110
134
|
-gk \
|
111
|
-
-debug
|
135
|
+
-debug > K.json
|
112
136
|
|
113
|
-
|
137
|
+
K.json is something that can be parsed with a calling program, and is
|
138
|
+
also used below as input for the GWA step. Example:
|
114
139
|
|
115
140
|
```json
|
116
141
|
{"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
|
@@ -123,11 +148,29 @@ default. If you want something else provide a --cache-dir, e.g.
|
|
123
148
|
gemma-wrapper --cache-dir ~/.gemma-cache -- \
|
124
149
|
-g test/data/input/BXD_geno.txt.gz \
|
125
150
|
-p test/data/input/BXD_pheno.txt \
|
151
|
+
-a test/data/input/BXD_snps.txt \
|
126
152
|
-gk \
|
127
153
|
-debug
|
128
154
|
|
129
155
|
will store K in ~/.gemma-cache.
|
130
156
|
|
157
|
+
### GWA
|
158
|
+
|
159
|
+
Run the LMM using the K's captured earlier in K.json using the --input
|
160
|
+
switch
|
161
|
+
|
162
|
+
gemma-wrapper --json --input K.json -- \
|
163
|
+
-g test/data/input/BXD_geno.txt.gz \
|
164
|
+
-p test/data/input/BXD_pheno.txt \
|
165
|
+
-c test/data/input/BXD_covariates2.txt \
|
166
|
+
-a test/data/input/BXD_snps.txt \
|
167
|
+
-lmm 2 -maf 0.1 \
|
168
|
+
-debug > GWA.json
|
169
|
+
|
170
|
+
Running it twice should show that GWA is not recomputed.
|
171
|
+
|
172
|
+
/tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
|
173
|
+
|
131
174
|
### LOCO
|
132
175
|
|
133
176
|
Recent versions of GEMMA have LOCO support for a single chromosome
|
@@ -136,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
|
|
136
179
|
chromosomes first create all K's with
|
137
180
|
|
138
181
|
gemma-wrapper --json \
|
139
|
-
--loco
|
182
|
+
--loco -- \
|
140
183
|
-g test/data/input/BXD_geno.txt.gz \
|
141
184
|
-p test/data/input/BXD_pheno.txt \
|
142
185
|
-a test/data/input/BXD_snps.txt \
|
@@ -163,6 +206,45 @@ GWA.json contains the file names of every chromosome
|
|
163
206
|
The -k switch is injected automatically. Again output switches are not
|
164
207
|
allowed (-o, -outdir)
|
165
208
|
|
209
|
+
### Permutations
|
210
|
+
|
211
|
+
Permutations can be run with and without LOCO. First create K
|
212
|
+
|
213
|
+
gemma-wrapper --json -- \
|
214
|
+
-g test/data/input/BXD_geno.txt.gz \
|
215
|
+
-p test/data/input/BXD_pheno.txt \
|
216
|
+
-gk \
|
217
|
+
-debug > K.json
|
218
|
+
|
219
|
+
Next, using K.json, permute the phenotypes with something like
|
220
|
+
|
221
|
+
gemma-wrapper --json --loco --input K.json \
|
222
|
+
--permutate 100 --permute-phenotype test/data/input/BXD_pheno.txt -- \
|
223
|
+
-g test/data/input/BXD_geno.txt.gz \
|
224
|
+
-p test/data/input/BXD_pheno.txt \
|
225
|
+
-c test/data/input/BXD_covariates2.txt \
|
226
|
+
-a test/data/input/BXD_snps.txt \
|
227
|
+
-lmm 2 -maf 0.1 \
|
228
|
+
-debug > GWA.json
|
229
|
+
|
230
|
+
This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
|
231
|
+
|
232
|
+
["95 percentile (significant) ", 1.92081e-05, 4.7]
|
233
|
+
["67 percentile (suggestive) ", 5.227785e-05, 4.3]
|
234
|
+
|
235
|
+
### Slurm PBS
|
236
|
+
|
237
|
+
To run gemma-wrapper on HPC use the '--slurm' switch.
|
238
|
+
|
239
|
+
## Development
|
240
|
+
|
241
|
+
We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
|
242
|
+
|
243
|
+
```
|
244
|
+
source .guix-deploy
|
245
|
+
ruby bin/gemma-wrapper --help
|
246
|
+
```
|
247
|
+
|
166
248
|
## Copyright
|
167
249
|
|
168
|
-
Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
|
250
|
+
Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
|
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.99.3
data/bin/gemma-wrapper
CHANGED
@@ -4,7 +4,7 @@
|
|
4
4
|
# Author:: Pjotr Prins
|
5
5
|
# License:: GPL3
|
6
6
|
#
|
7
|
-
# Copyright (C) 2017
|
7
|
+
# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
|
8
8
|
|
9
9
|
USAGE = "
|
10
10
|
GEMMA wrapper example:
|
@@ -14,12 +14,12 @@ GEMMA wrapper example:
|
|
14
14
|
gemma-wrapper -- \\
|
15
15
|
-g test/data/input/BXD_geno.txt.gz \\
|
16
16
|
-p test/data/input/BXD_pheno.txt \\
|
17
|
+
-a test/data/input/BXD_snps.txt \\
|
17
18
|
-gk
|
18
19
|
|
19
20
|
LOCO K computation with caching and JSON output
|
20
21
|
|
21
|
-
gemma-wrapper --json \\
|
22
|
-
--loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
|
22
|
+
gemma-wrapper --json --loco -- \\
|
23
23
|
-g test/data/input/BXD_geno.txt.gz \\
|
24
24
|
-p test/data/input/BXD_pheno.txt \\
|
25
25
|
-a test/data/input/BXD_snps.txt \\
|
@@ -38,11 +38,10 @@ GEMMA wrapper example:
|
|
38
38
|
Gemma gets used from the path. You can override by setting
|
39
39
|
|
40
40
|
env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
|
41
|
-
|
42
41
|
"
|
43
|
-
# These are used for testing compatibility
|
42
|
+
# These are used for testing compatibility with the gemma tool
|
44
43
|
GEMMA_V_MAJOR = 98
|
45
|
-
GEMMA_V_MINOR =
|
44
|
+
GEMMA_V_MINOR = 4
|
46
45
|
|
47
46
|
basepath = File.dirname(File.dirname(__FILE__))
|
48
47
|
$: << File.join(basepath,'lib')
|
@@ -66,17 +65,19 @@ if not gemma_command
|
|
66
65
|
end
|
67
66
|
|
68
67
|
|
68
|
+
require 'digest/sha1'
|
69
69
|
require 'fileutils'
|
70
70
|
require 'optparse'
|
71
|
-
require 'tmpdir'
|
72
71
|
require 'tempfile'
|
72
|
+
require 'tmpdir'
|
73
73
|
|
74
74
|
split_at = ARGV.index('--')
|
75
|
+
|
75
76
|
if split_at
|
76
77
|
gemma_args = ARGV[split_at+1..-1]
|
77
78
|
end
|
78
79
|
|
79
|
-
options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
|
80
|
+
options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
|
80
81
|
|
81
82
|
opts = OptionParser.new do |o|
|
82
83
|
o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
|
@@ -91,8 +92,12 @@ opts = OptionParser.new do |o|
|
|
91
92
|
raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
|
92
93
|
end
|
93
94
|
|
94
|
-
o.on('--loco
|
95
|
-
options[:loco] =
|
95
|
+
o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
|
96
|
+
options[:loco] = b
|
97
|
+
end
|
98
|
+
|
99
|
+
o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
|
100
|
+
options[:chromosomes] = lst
|
96
101
|
end
|
97
102
|
|
98
103
|
o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
|
@@ -112,6 +117,22 @@ opts = OptionParser.new do |o|
|
|
112
117
|
options[:force] = true
|
113
118
|
end
|
114
119
|
|
120
|
+
o.on("--parallel", "Run jobs in parallel") do |b|
|
121
|
+
options[:parallel] = true
|
122
|
+
end
|
123
|
+
|
124
|
+
o.on("--no-parallel", "Do not run jobs in parallel") do |b|
|
125
|
+
options[:parallel] = false
|
126
|
+
end
|
127
|
+
|
128
|
+
o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
|
129
|
+
options[:slurm_opts] = ""
|
130
|
+
options[:slurm] = true
|
131
|
+
if slurm
|
132
|
+
options[:slurm_opts] = slurm
|
133
|
+
end
|
134
|
+
end
|
135
|
+
|
115
136
|
o.on("--q", "--quiet", "Run quietly") do |q|
|
116
137
|
options[:quiet] = true
|
117
138
|
end
|
@@ -120,15 +141,20 @@ opts = OptionParser.new do |o|
|
|
120
141
|
options[:verbose] = true
|
121
142
|
end
|
122
143
|
|
123
|
-
o.on("--debug", "Show debug messages and keep intermediate output") do |v|
|
144
|
+
o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
|
124
145
|
options[:debug] = true
|
125
146
|
end
|
126
147
|
|
148
|
+
o.on("--dry-run", "Show commands, but don't execute") do |b|
|
149
|
+
options[:dry_run] = b
|
150
|
+
end
|
151
|
+
|
127
152
|
o.on('--','Anything after gets passed to GEMMA') do
|
128
153
|
o.terminate()
|
129
154
|
end
|
130
155
|
|
131
156
|
o.separator ""
|
157
|
+
|
132
158
|
o.on_tail('-h', '--help', 'display this help and exit') do
|
133
159
|
options[:show_help] = true
|
134
160
|
end
|
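The option handling in the hunk above can be sketched in isolation. This is a minimal sketch, not the script itself: `parse_wrapper_options` is a hypothetical helper, only three of the switches are reproduced, and the compact `--[no-]parallel` form is an assumption standing in for the two separate `--parallel` and `--no-parallel` switches. It shows why `parallel` starts out as `nil` (so that LOCO can switch it on later unless the user chose explicitly) and how the optional `--slurm[=opts]` argument arrives as `nil` when no value is attached.

```ruby
require 'optparse'

# Hypothetical helper: parse only the wrapper's own switches.
def parse_wrapper_options(argv)
  # Everything after '--' belongs to GEMMA, as with ARGV.index('--') in the script
  split_at = argv.index('--')
  wrapper_args = split_at ? argv[0...split_at] : argv.dup
  gemma_args   = split_at ? argv[(split_at + 1)..-1] : []

  # parallel is nil ("not specified") until a switch or a default sets it
  options = { loco: false, parallel: nil, slurm: false, slurm_opts: "" }

  OptionParser.new do |o|
    o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') { |b| options[:loco] = b }
    # Assumption: one negatable switch instead of two separate ones
    o.on('--[no-]parallel', 'Run jobs in parallel') { |b| options[:parallel] = b }
    # Optional argument: the block receives nil when no value is attached
    o.on('--slurm[=opts]', String, 'Use slurm for submitting jobs') do |s|
      options[:slurm] = true
      options[:slurm_opts] = s if s
    end
  end.parse(wrapper_args)

  # LOCO defaults to parallel unless the user decided otherwise
  options[:parallel] = true if options[:parallel].nil? && options[:loco]

  [options, gemma_args]
end

options, gemma_args = parse_wrapper_options(%w[--loco --no-parallel -- -gk])
puts "parallel=#{options[:parallel].inspect} gemma_args=#{gemma_args.inspect}"
```

Keeping the "not specified" state separate from `false` is what lets an explicit `--no-parallel` win over the LOCO default.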
@@ -173,21 +199,40 @@ info = lambda do |*msg|
|
|
173
199
|
OUTPUT.print *msg,"\n" if !options[:quiet]
|
174
200
|
end
|
175
201
|
|
202
|
+
# Fetch chromosomes
|
203
|
+
def get_chromosomes annofn
|
204
|
+
h = {}
|
205
|
+
File.open(annofn,"r").each_line do | line |
|
206
|
+
chr = line.split(/\s+/)[2]
|
207
|
+
h[chr] = true
|
208
|
+
end
|
209
|
+
h.map { |k,v| k }
|
210
|
+
end
|
176
211
|
# ---- Start banner
|
177
212
|
|
178
213
|
GEMMA_K_VERSION=version
|
179
|
-
GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017
|
214
|
+
GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
|
180
215
|
info.call GEMMA_K_BANNER
|
181
216
|
|
182
217
|
# Check gemma version
|
183
218
|
GEMMA_COMMAND=options[:gemma_command]
|
219
|
+
info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
|
220
|
+
|
221
|
+
begin
|
222
|
+
GEMMA_INFO = `#{GEMMA_COMMAND}`
|
223
|
+
rescue Errno::ENOENT
|
224
|
+
GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
|
225
|
+
error.call "<#{GEMMA_COMMAND}> command not found"
|
226
|
+
end
|
184
227
|
|
185
|
-
gemma_version_header =
|
228
|
+
gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
|
186
229
|
info.call "Using ",gemma_version_header,"\n"
|
187
230
|
gemma_version = gemma_version_header.split(/[,\s]+/)[1]
|
188
231
|
v_version, v_major, v_minor = gemma_version.split(".")
|
189
232
|
info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
|
190
233
|
|
234
|
+
info.call gemma_version_header
|
235
|
+
|
191
236
|
warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
|
192
237
|
|
193
238
|
options[:gemma_version_header] = gemma_version_header
|
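The `get_chromosomes` helper added in this hunk takes the third whitespace-separated field of every annotation line as the chromosome name. A minimal stand-alone sketch of that idea follows; `chromosomes_from_annotation` is a hypothetical name, the sample file contents are made up, and skipping blank or short lines is an addition that the code in the diff does not make.

```ruby
require 'tempfile'

# Collect the distinct chromosome names from the third whitespace-separated
# column of a SNP annotation file, keeping first-seen order.
def chromosomes_from_annotation(annofn)
  chromosomes = []
  # File.foreach closes the file when the block finishes
  File.foreach(annofn) do |line|
    chr = line.split(/\s+/)[2]
    next if chr.nil?                 # assumption: ignore blank or short lines
    chromosomes << chr unless chromosomes.include?(chr)
  end
  chromosomes
end

# Usage with a small made-up annotation file (SNP id, position, chromosome)
Tempfile.create('snps') do |f|
  f.write("rs1, 100, 1\nrs2, 200, 1\nrs3, 300, X\n")
  f.flush
  p chromosomes_from_annotation(f.path)   # prints ["1", "X"]
end
```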
@@ -203,74 +248,152 @@ if RUBY_VERSION =~ /^1/
|
|
203
248
|
warning "runs on Ruby 2.x only\n"
|
204
249
|
end
|
205
250
|
|
251
|
+
# ---- LOCO defaults to parallel
|
252
|
+
if options[:parallel] == nil
|
253
|
+
options[:parallel] = true if options[:loco]
|
254
|
+
end
|
255
|
+
|
256
|
+
debug.call(options) # some debug output
|
257
|
+
debug.call(record)
|
258
|
+
|
206
259
|
DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
|
207
260
|
DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
|
208
261
|
|
262
|
+
if options[:parallel]
|
263
|
+
begin
|
264
|
+
skip_cite = `echo "will cite" |parallel --citation`
|
265
|
+
debug.call(skip_cite)
|
266
|
+
PARALLEL_INFO = `parallel --help`
|
267
|
+
rescue Errno::ENOENT
|
268
|
+
error.call "<parallel> command not found"
|
269
|
+
end
|
270
|
+
parallel_cmds = []
|
271
|
+
end
|
272
|
+
|
273
|
+
# ---- Fetch chromosomes from SNP annotation file
|
274
|
+
anno_idx = gemma_args.index '-a'
|
275
|
+
raise "Expected GEMMA -a genotype file switch" if anno_idx == nil
|
276
|
+
CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
|
277
|
+
|
209
278
|
# ---- Compute HASH on inputs
|
210
279
|
hashme = []
|
211
280
|
geno_idx = gemma_args.index '-g'
|
212
281
|
raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
|
213
282
|
pheno_idx = gemma_args.index '-p'
|
214
|
-
hashme =
|
215
|
-
if DO_COMPUTE_KINSHIP and pheno_idx != nil
|
216
|
-
p [pheno_idx,gemma_args[pheno_idx+2..-1]]
|
217
|
-
gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
|
218
|
-
else
|
219
|
-
gemma_args
|
220
|
-
end
|
221
283
|
|
222
|
-
if DO_COMPUTE_GWA
|
223
|
-
raise "Did not expect GEMMA -p phenotype
|
224
|
-
hashme += ['-p', options[:permute_phenotypes]] if options[:permute_phenotypes]
|
284
|
+
if DO_COMPUTE_GWA and options[:permute_phenotypes]
|
285
|
+
raise "Did not expect GEMMA -p phenotype with permutations (only use --permute-phenotypes)" if pheno_idx
|
225
286
|
end
|
226
287
|
|
227
|
-
|
228
|
-
|
229
|
-
|
230
|
-
|
231
|
-
|
232
|
-
|
233
|
-
|
288
|
+
execute = lambda { |cmd|
|
289
|
+
info.call("Executing: #{cmd}")
|
290
|
+
err = 0
|
291
|
+
if not options[:debug]
|
292
|
+
# send output to stderr line by line
|
293
|
+
IO.popen("#{cmd}") do |io|
|
294
|
+
while s = io.gets
|
295
|
+
$stderr.print s
|
296
|
+
end
|
297
|
+
io.close
|
298
|
+
err = $?.to_i
|
299
|
+
end
|
234
300
|
else
|
235
|
-
|
301
|
+
$stderr.print `#{cmd}`
|
302
|
+
err = $?.to_i
|
303
|
+
end
|
304
|
+
err
|
305
|
+
}
|
306
|
+
|
307
|
+
compute_hash = lambda do | phenofn = nil |
|
308
|
+
# Compute a HASH on the inputs
|
309
|
+
debug.call "Hashing on ",hashme,"\n"
|
310
|
+
hashes = []
|
311
|
+
hm = if phenofn
|
312
|
+
hashme + ["-p", phenofn]
|
313
|
+
else
|
314
|
+
hashme
|
315
|
+
end
|
316
|
+
debug.call(hm)
|
317
|
+
hm.each do | item |
|
318
|
+
if File.file?(item)
|
319
|
+
hashes << Digest::SHA1.hexdigest(File.read(item))
|
320
|
+
debug.call [item,hashes.last]
|
321
|
+
else
|
322
|
+
hashes << item
|
323
|
+
end
|
236
324
|
end
|
325
|
+
debug.call(hashes)
|
326
|
+
Digest::SHA1.hexdigest hashes.join(' ')
|
237
327
|
end
|
238
|
-
HASH = Digest::SHA1.hexdigest hashes.join(' ')
|
239
328
|
|
329
|
+
HASH = compute_hash.call()
|
240
330
|
options[:hash] = HASH
|
241
331
|
|
242
332
|
# Create cache dir
|
243
333
|
FileUtils::mkdir_p options[:cache_dir]
|
244
334
|
|
335
|
+
Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
|
336
|
+
|
245
337
|
error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
|
246
338
|
error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
|
339
|
+
GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
|
247
340
|
gemma_args << '-outdir'
|
248
|
-
gemma_args <<
|
341
|
+
gemma_args << tmpdir
|
249
342
|
GEMMA_ARGS = gemma_args
|
250
343
|
|
344
|
+
hashme =
|
345
|
+
if DO_COMPUTE_KINSHIP and pheno_idx != nil
|
346
|
+
# Remove the phenotype file from the hash for GRM computation
|
347
|
+
GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
|
348
|
+
else
|
349
|
+
GEMMA_ARGS_HASH
|
350
|
+
end
|
351
|
+
|
251
352
|
debug.call "Options: ",options,"\n" if !options[:quiet]
|
252
353
|
|
253
|
-
invoke_gemma = lambda do |extra_args, cache_hit = false|
|
254
|
-
cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
|
354
|
+
invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
|
355
|
+
cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
|
255
356
|
record[:gemma_command] = cmd
|
256
357
|
return if cache_hit
|
257
|
-
|
358
|
+
if options[:slurm]
|
359
|
+
info.call cmd
|
360
|
+
hashi = HASH
|
361
|
+
prefix = tmpdir+'/'+hashi
|
362
|
+
scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
|
363
|
+
script = "#!/bin/bash
|
364
|
+
#SBATCH --job-name=gemma-#{scriptfn}
|
365
|
+
#SBATCH --ntasks=1
|
366
|
+
#SBATCH --time=20:00
|
367
|
+
srun #{cmd}
|
368
|
+
"
|
369
|
+
debug.call(script)
|
370
|
+
File.open(scriptfn,"w") { |f|
|
371
|
+
f.write(script)
|
372
|
+
}
|
373
|
+
cmd = "sbatch "+options[:slurm_opts]+" "+scriptfn
|
374
|
+
end
|
258
375
|
errno =
|
259
376
|
if options[:json]
|
260
377
|
# capture output
|
261
378
|
err = 0
|
262
|
-
|
263
|
-
|
264
|
-
|
265
|
-
|
266
|
-
|
267
|
-
|
379
|
+
if options[:dry_run]
|
380
|
+
info.call("Would have invoked: ",cmd)
|
381
|
+
elsif options[:parallel]
|
382
|
+
info.call("Add parallel job: ",cmd)
|
383
|
+
parallel_cmds << cmd
|
384
|
+
else
|
385
|
+
err = execute.call(cmd)
|
268
386
|
end
|
269
387
|
err
|
270
388
|
else
|
271
|
-
|
272
|
-
|
273
|
-
|
389
|
+
if options[:dry_run]
|
390
|
+
info.call("Would have invoked ",cmd)
|
391
|
+
0
|
392
|
+
else
|
393
|
+
debug.call("Invoking ",cmd) if options[:debug]
|
394
|
+
system(cmd)
|
395
|
+
$?.exitstatus
|
396
|
+
end
|
274
397
|
end
|
275
398
|
if errno != 0
|
276
399
|
debug.call "Gemma exit ",errno
|
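The cache key built by `compute_hash` in this hunk can be illustrated on its own. The sketch below assumes a hypothetical helper `input_hash` and a throwaway temporary file. As in the diff, an argument that names an existing file is replaced by the SHA1 of its contents, other arguments are used verbatim, and the joined list is hashed once more.

```ruby
require 'digest/sha1'
require 'tempfile'

# Hypothetical helper: hash the GEMMA arguments, using file contents rather
# than file names for any argument that points at an existing file.
def input_hash(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

Tempfile.create('pheno') do |f|
  f.write("1.0\n2.0\n")
  f.flush
  args = ['-g', f.path, '-lmm', '2']
  first = input_hash(args)
  f.write("3.0\n")                      # change the file contents
  f.flush
  puts first == input_hash(args)        # prints false: new contents, new hash
end
```

Because the contents are hashed rather than the path, renaming an input file leaves the key unchanged, while editing the file or changing any switch yields a new key and therefore a cache miss.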
@@ -280,11 +403,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
|
|
280
403
|
end
|
281
404
|
end
|
282
405
|
|
406
|
+
# Takes the hash value and checks whether the (output) file exists
|
283
407
|
# returns datafn, logfn, cache_hit
|
284
|
-
cache = lambda do | chr, ext |
|
408
|
+
cache = lambda do | chr, ext, h=HASH, permutation=0 |
|
285
409
|
inject = (chr==nil ? "" : ".#{chr}" )+ext
|
286
|
-
hashi = (chr==nil ?
|
287
|
-
prefix = options[:cache_dir]+'/'+hashi
|
410
|
+
hashi = (chr==nil ? h : h+inject)
|
411
|
+
prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
|
412
|
+
# for chr 3 and permutation 1 forms something like
|
413
|
+
# /tmp/1b700-a996f.3.cXX.txt.1.log.txt
|
288
414
|
logfn = prefix+".log.txt"
|
289
415
|
datafn = prefix+ext
|
290
416
|
record[:files] ||= []
|
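The file naming used by the `cache` lambda in this hunk can be tried out separately. This is a sketch under assumptions: `cache_paths` is a hypothetical function that reproduces only the string building shown above and returns the two names in a hash. It does not check whether the files exist.

```ruby
# Hypothetical stand-alone version of the naming scheme in the cache lambda.
def cache_paths(cache_dir, hash, chr, ext, permutation = 0)
  inject = (chr.nil? ? "" : ".#{chr}") + ext
  hashi  = chr.nil? ? hash : hash + inject
  # A non-zero permutation number is appended to keep permutations apart
  prefix = cache_dir + '/' + hashi + (permutation != 0 ? ".#{permutation}" : "")
  { log: prefix + ".log.txt", data: prefix + ext }
end

paths = cache_paths("/tmp", "1b700-a996f", 3, ".cXX.txt", 1)
puts paths[:log]    # prints /tmp/1b700-a996f.3.cXX.txt.1.log.txt
```

With `chr` set to `nil` and the default permutation of 0, the same function yields the plain `HASH.log.txt` and `HASH.cXX.txt` names seen in the README examples.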
@@ -320,25 +446,32 @@ kinship = lambda do | chr = nil |
|
|
320
446
|
end
|
321
447
|
|
322
448
|
# ---- Run GWA
|
323
|
-
gwas = lambda do | chr, kfn, pfn |
|
449
|
+
gwas = lambda do | chr, kfn, pfn, permutation=0 |
|
324
450
|
record[:type] = "GWA"
|
325
451
|
error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
|
326
|
-
|
452
|
+
# Update hash for each permutation
|
453
|
+
hash = compute_hash.call(pfn)
|
454
|
+
hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
|
327
455
|
if not cache_hit
|
328
456
|
args = [ '-k', kfn, '-o', hashi ]
|
329
457
|
args << [ '-loco', chr ] if chr != nil
|
330
458
|
args << [ '-p', pfn ] if pfn
|
331
|
-
invoke_gemma.call args
|
459
|
+
invoke_gemma.call args,false,chr,permutation
|
332
460
|
end
|
333
461
|
end
|
334
462
|
|
335
463
|
LOCO = options[:loco]
|
336
|
-
|
464
|
+
if LOCO
|
465
|
+
if options[:chromosomes]
|
466
|
+
CHROMOSOMES = options[:chromosomes]
|
467
|
+
end
|
468
|
+
end
|
469
|
+
|
337
470
|
if DO_COMPUTE_KINSHIP
|
338
471
|
# compute K
|
339
|
-
info.call
|
340
|
-
if LOCO
|
341
|
-
|
472
|
+
info.call CHROMOSOMES
|
473
|
+
if LOCO
|
474
|
+
CHROMOSOMES.each do |chr|
|
342
475
|
info.call "LOCO for ",chr
|
343
476
|
kinship.call(chr)
|
344
477
|
end
|
@@ -347,27 +480,38 @@ if DO_COMPUTE_KINSHIP
|
|
347
480
|
end
|
348
481
|
else
|
349
482
|
# DO_COMPUTE_GWA
|
350
|
-
|
483
|
+
begin
|
484
|
+
json_in = JSON.parse(File.read(options[:input]))
|
485
|
+
rescue TypeError
|
486
|
+
raise "Missing JSON input file?"
|
487
|
+
end
|
351
488
|
raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
|
352
489
|
|
353
|
-
pfn = options[:
|
354
|
-
|
355
|
-
|
356
|
-
|
490
|
+
pfn = options[:permute_phenotypes] # can be nil
|
491
|
+
if LOCO
|
492
|
+
k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
|
493
|
+
k_files.each do | chr, kfn | # call a GWA for each chromosome
|
494
|
+
gwas.call(chr,kfn,pfn)
|
495
|
+
end
|
496
|
+
else
|
497
|
+
kfn = json_in["files"][0][2]
|
498
|
+
CHROMOSOMES.each do | chr |
|
499
|
+
gwas.call(chr,kfn,pfn)
|
500
|
+
end
|
357
501
|
end
|
358
502
|
# Permute
|
359
503
|
if options[:permutate]
|
360
504
|
ps = []
|
361
|
-
raise "You should supply --
|
505
|
+
raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
|
362
506
|
File.foreach(pfn).with_index do |line, line_num|
|
363
507
|
ps << line
|
364
508
|
end
|
365
509
|
score_list = []
|
366
510
|
debug.call(options[:permutate],"x permutations")
|
367
|
-
(1..options[:permutate]).each do |
|
368
|
-
$stderr.print "Iteration ",
|
511
|
+
(1..options[:permutate]).each do |permutation|
|
512
|
+
$stderr.print "Iteration ",permutation,"\n"
|
369
513
|
# Create a shuffled phenotype file
|
370
|
-
file = File.open("phenotypes-#{
|
514
|
+
file = File.open("phenotypes-#{permutation}","w")
|
371
515
|
tmp_pfn = file.path
|
372
516
|
p tmp_pfn
|
373
517
|
ps.shuffle.each do | l |
|
@@ -375,20 +519,23 @@ else
|
|
375
519
|
end
|
376
520
|
file.close
|
377
521
|
k_files.each do | chr, kfn | # call a GWA for each chromosome
|
378
|
-
gwas.call(chr,kfn,tmp_pfn)
|
522
|
+
gwas.call(chr,kfn,tmp_pfn,permutation)
|
379
523
|
end
|
380
|
-
# p [:HEY,record[:files].last]
|
381
|
-
assocfn = record[:files].last[2]
|
382
|
-
debug.call("Reading ",assocfn)
|
383
524
|
score_min = 1000.0
|
384
|
-
|
385
|
-
|
386
|
-
|
387
|
-
|
525
|
+
if false and not options[:slurm]
|
526
|
+
# p [:HEY,record[:files].last]
|
527
|
+
assocfn = record[:files].last[2]
|
528
|
+
debug.call("Reading ",assocfn)
|
529
|
+
File.foreach(assocfn).with_index do |assoc, assoc_line_num|
|
530
|
+
if assoc_line_num > 0
|
531
|
+
value = assoc.strip.split(/\t/).last.to_f
|
532
|
+
score_min = value if value < score_min
|
533
|
+
end
|
388
534
|
end
|
389
535
|
end
|
390
536
|
score_list << score_min
|
391
537
|
end
|
538
|
+
exit 0 if options[:slurm]
|
392
539
|
ls = score_list.sort
|
393
540
|
p ls
|
394
541
|
significant = ls[(ls.size - ls.size*0.95).floor]
|
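The threshold lookup at the end of this hunk picks one entry from the sorted list of per-permutation minimum p-values. A minimal sketch of that lookup follows. `permutation_threshold` is a hypothetical helper, the input values are synthetic, and returning `-log10(p)` alongside the p-value is an assumption based on the third column of the README example output.

```ruby
# Hypothetical helper: derive a threshold from the minimum p-value of each
# permutation, using the same index expression as the script.
def permutation_threshold(min_pvalues, percentile)
  ls = min_pvalues.sort
  # For percentile 0.95 this selects the value below which roughly 5% of
  # the permutation minima fall
  p_value = ls[(ls.size - ls.size * percentile).floor]
  [p_value, -Math.log10(p_value)]
end

# Synthetic minima standing in for 100 permutation runs
minima = (1..100).map { |i| i / 1000.0 }.shuffle
sig_p, sig_score = permutation_threshold(minima, 0.95)
sug_p, sug_score = permutation_threshold(minima, 0.67)
puts "significant: p=#{sig_p} -log10(p)=#{sig_score.round(1)}"
puts "suggestive:  p=#{sug_p} -log10(p)=#{sug_score.round(1)}"
```

The 0.67 level mirrors the suggestive threshold named in the README. The line computing it falls outside the lines shown in this diff, so its use here is an assumption.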
@@ -399,5 +546,38 @@ else
|
|
399
546
|
end
|
400
547
|
end
|
401
548
|
|
549
|
+
# ---- Invoke parallel
|
550
|
+
if options[:parallel]
|
551
|
+
# parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
|
552
|
+
cmd = parallel_cmds.join("\\n")
|
553
|
+
|
554
|
+
cmd = "echo -e \"#{cmd}\""
|
555
|
+
err = execute.call(cmd+"|parallel") # all jobs in parallel
|
556
|
+
if err != 0
|
557
|
+
[16,8,4,1].each do |jobs|
|
558
|
+
info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
|
559
|
+
err = execute.call(cmd+"|parallel -j #{jobs}")
|
560
|
+
break if err == 0
|
561
|
+
end
|
562
|
+
if err != 0
|
563
|
+
info.call("Run failed!")
|
564
|
+
exit err
|
565
|
+
end
|
566
|
+
end
|
567
|
+
info.call("Run successful!")
|
568
|
+
end
|
402
569
|
json_out.call
|
403
|
-
|
570
|
+
|
571
|
+
# copy all output files to the cache_dir. If a file exists only emit a warning
|
572
|
+
Dir.glob("*.txt", base: tmpdir) do | fn |
|
573
|
+
source = tmpdir + "/" + fn
|
574
|
+
dest = options[:cache_dir] + "/" + fn
|
575
|
+
if not File.exist?(dest) or options[:force]
|
576
|
+
info.call "Move #{source} to #{dest}"
|
577
|
+
FileUtils.mv source, dest, verbose: false
|
578
|
+
else
|
579
|
+
warning.call "File #{dest} already exists. Not overwriting"
|
580
|
+
end
|
581
|
+
end
|
582
|
+
|
583
|
+
end # tmpdir
|
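The retry logic in the "Invoke parallel" block above first runs all jobs without a job limit and, if that fails, retries with 16, 8, 4 and finally 1 simultaneous jobs to reduce the RAM footprint. The sketch below isolates that control flow. `run_with_fallback` and the simulated runner are hypothetical stand-ins, and no GNU parallel process is started.

```ruby
# Hypothetical helper: `run` is any callable that takes a job limit (nil means
# no explicit limit) and returns an exit status.
def run_with_fallback(run, limits = [16, 8, 4, 1])
  err = run.call(nil)                 # first attempt without a job limit
  return err if err == 0
  limits.each do |jobs|
    err = run.call(jobs)              # retry with fewer simultaneous jobs
    break if err == 0
  end
  err                                 # 0 on success, last error otherwise
end

# Simulated runner that only succeeds with at most 4 simultaneous jobs
attempts = []
fake_run = lambda do |jobs|
  attempts << jobs
  (jobs && jobs <= 4) ? 0 : 1
end
status = run_with_fallback(fake_run)
puts "status=#{status} attempts=#{attempts.inspect}"
# prints status=0 attempts=[nil, 16, 8, 4]
```

Passing the runner in as a callable keeps the fallback policy testable without the external tool. In the script the equivalent call pipes the command list into `parallel -j N`.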
data/gemma-wrapper.gemspec
CHANGED
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
 s.name = 'bio-gemma-wrapper'
 s.version = File.read('VERSION')
 s.summary = "GEMMA with LOCO and permutations"
-s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
+s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
 s.authors = ["Pjotr Prins"]
 s.email = 'pjotr.public01@thebird.nl'
 s.files = ["bin/gemma-wrapper",
metadata
CHANGED
@@ -1,17 +1,17 @@
 --- !ruby/object:Gem::Specification
 name: bio-gemma-wrapper
 version: !ruby/object:Gem::Version
-version:
+version: 0.99.3
 platform: ruby
 authors:
 - Pjotr Prins
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-08-22 00:00:00.000000000 Z
 dependencies: []
-description: GEMMA wrapper adds LOCO and permutation support. Also
-runs with LOCO support
+description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
+and caches K between runs with LOCO support
 email: pjotr.public01@thebird.nl
 executables:
 - gemma-wrapper
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
-
-rubygems_version: 2.6.8
+rubygems_version: 3.2.5
 signing_key:
 specification_version: 4
 summary: GEMMA with LOCO and permutations