bio-gemma-wrapper 0.97.1 → 0.99.2
- checksums.yaml +5 -5
- data/README.md +84 -13
- data/VERSION +1 -1
- data/bin/gemma-wrapper +218 -61
- data/gemma-wrapper.gemspec +1 -1
- metadata +5 -6
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
|
-
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
2
|
+
SHA256:
|
|
3
|
+
metadata.gz: e27a8a3abb00b758095df5956b3854674faf5ff681a93bc028df273c40125c0d
|
|
4
|
+
data.tar.gz: e9675dbb0ea0f087dd21774635d38f3cda11b46a88b36c77dd308086fd0ec5f2
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 81cf5440fa531d5a831efa787800c8bea230d47cddc666a31fff066551ff347708a41ddf1368c0d3946c7ba9faef8e5882e398ad340850253c53961cce96f662
|
|
7
|
+
data.tar.gz: 582ae78c48a1eb8eeca01172eaeaba9d5ca23e69601967e334f8c218e3a4dd74b297861b01ce49b1357798b49a96c12e737100dcacec7fc34b70da1fc9c75f0d
|
data/README.md
CHANGED
|
@@ -1,10 +1,19 @@
|
|
|
1
|
-
|
|
1
|
+
[](https://badge.fury.io/rb/bio-gemma-wrapper)
|
|
2
|
+
|
|
3
|
+
# GEMMA with LOCO, permutations and slurm support (and caching)
|
|
2
4
|
|
|
3
5
|

|
|
5
7
|
|
|
6
8
|
## Introduction
|
|
7
9
|
|
|
10
|
+
Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
|
|
11
|
+
GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
|
|
12
|
+
is used to run GEMMA as part of the https://genenetwork.org/
|
|
13
|
+
environment.
|
|
14
|
+
|
|
15
|
+
Note that gemma-wrapper is projected to be integrated into gemma2/lib.
|
|
16
|
+
|
|
8
17
|
GEMMA is a software toolkit for fast application of linear mixed
|
|
9
18
|
models (LMMs) and related models to genome-wide association studies
|
|
10
19
|
(GWAS) and other large-scale data sets.
|
|
@@ -12,16 +21,14 @@ models (LMMs) and related models to genome-wide association studies
|
|
|
12
21
|
This repository contains gemma-wrapper, essentially a wrapper of
|
|
13
22
|
GEMMA that provides support for caching the kinship or relatedness
|
|
14
23
|
matrix (K) and caching LM and LMM computations with the option of full
|
|
15
|
-
leave-one-chromosome-out genome scans (LOCO).
|
|
24
|
+
leave-one-chromosome-out genome scans (LOCO). Jobs can also be
|
|
25
|
+
submitted to HPC PBS, i.e., slurm.
|
|
16
26
|
|
|
17
27
|
gemma-wrapper requires a recent version of GEMMA and essentially
|
|
18
28
|
does a pass-through of all standard GEMMA invocation switches. On
|
|
19
29
|
return gemma-wrapper can return a JSON object (--json) which is
|
|
20
30
|
useful for web-services.
|
|
21
31
|
|
|
22
|
-
Note that this a work in progress (WIP). What is described below
|
|
23
|
-
should work.
|
|
24
|
-
|
|
25
32
|
## Installation
|
|
26
33
|
|
|
27
34
|
Prerequisites are
|
|
@@ -30,8 +37,9 @@ Prerequisites are
|
|
|
30
37
|
* Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
|
|
31
38
|
almost all Linux systems
|
|
32
39
|
|
|
33
|
-
gemma-wrapper comes as a Ruby
|
|
34
|
-
can be
|
|
40
|
+
gemma-wrapper comes as a Ruby
|
|
41
|
+
[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
|
|
42
|
+
installed with
|
|
35
43
|
|
|
36
44
|
gem install bio-gemma-wrapper
|
|
37
45
|
|
|
@@ -39,15 +47,18 @@ Invoke the tool with
|
|
|
39
47
|
|
|
40
48
|
gemma-wrapper --help
|
|
41
49
|
|
|
42
|
-
and it will render
|
|
50
|
+
and it will render something like
|
|
43
51
|
|
|
44
52
|
```
|
|
45
53
|
Usage: gemma-wrapper [options] -- [gemma-options]
|
|
54
|
+
--permutate n Permutate # times by shuffling phenotypes
|
|
55
|
+
--permute-phenotypes filen Phenotypes to be shuffled in permutations
|
|
46
56
|
--loco [x,y,1,2,3...] Run full LOCO
|
|
47
57
|
--input filen JSON input variables (used for LOCO)
|
|
48
58
|
--cache-dir path Use a cache directory
|
|
49
59
|
--json Create output file in JSON format
|
|
50
60
|
--force Force computation
|
|
61
|
+
--slurm [options] Submit to slurm PBS
|
|
51
62
|
--q, --quiet Run quietly
|
|
52
63
|
-v, --verbose Run verbosely
|
|
53
64
|
--debug Show debug messages and keep intermediate output
|
|
@@ -65,6 +76,8 @@ Unpack it and run the tool as
|
|
|
65
76
|
|
|
66
77
|
./bin/gemma-wrapper --help
|
|
67
78
|
|
|
79
|
+
See below for using a GNU Guix environment.
|
|
80
|
+
|
|
68
81
|
## Usage
|
|
69
82
|
|
|
70
83
|
gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
|
|
@@ -91,11 +104,12 @@ the data files are found):
|
|
|
91
104
|
|
|
92
105
|
Run it twice to see
|
|
93
106
|
|
|
94
|
-
/tmp/
|
|
107
|
+
/tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
|
|
95
108
|
|
|
96
109
|
gemma-wrapper computes the unique HASH value over the command
|
|
97
110
|
line switches passed into GEMMA as well as the contents of the files
|
|
98
|
-
passed in (here the genotype and phenotype files
|
|
111
|
+
passed in (here the genotype and phenotype files - actually it ignores the phenotype with K because
|
|
112
|
+
GEMMA always computes the same K).
|
|
99
113
|
|
|
100
114
|
You can also get JSON output on STDOUT by providing the --json switch
|
|
101
115
|
|
|
@@ -103,9 +117,10 @@ You can also get JSON output on STDOUT by providing the --json switch
|
|
|
103
117
|
-g test/data/input/BXD_geno.txt.gz \
|
|
104
118
|
-p test/data/input/BXD_pheno.txt \
|
|
105
119
|
-gk \
|
|
106
|
-
-debug
|
|
120
|
+
-debug > K.json
|
|
107
121
|
|
|
108
|
-
|
|
122
|
+
K.json is something that can be parsed with a calling program, and is
|
|
123
|
+
also below as input for the GWA step. Example:
|
|
109
124
|
|
|
110
125
|
```json
|
|
111
126
|
{"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
|
|
@@ -123,6 +138,23 @@ default. If you want something else provide a --cache-dir, e.g.
|
|
|
123
138
|
|
|
124
139
|
will store K in ~/.gemma-cache.
|
|
125
140
|
|
|
141
|
+
### GWA
|
|
142
|
+
|
|
143
|
+
Run the LMM using the K's captured earlier in K.json using the --input
|
|
144
|
+
switch
|
|
145
|
+
|
|
146
|
+
gemma-wrapper --json --loco --input K.json -- \
|
|
147
|
+
-g test/data/input/BXD_geno.txt.gz \
|
|
148
|
+
-p test/data/input/BXD_pheno.txt \
|
|
149
|
+
-c test/data/input/BXD_covariates2.txt \
|
|
150
|
+
-a test/data/input/BXD_snps.txt \
|
|
151
|
+
-lmm 2 -maf 0.1 \
|
|
152
|
+
-debug > GWA.json
|
|
153
|
+
|
|
154
|
+
Running it twice should show that GWA is not recomputed.
|
|
155
|
+
|
|
156
|
+
/tmp/9e411810ad341de6456ce0c6efd4f973356d0bad.log.txt CACHE HIT!
|
|
157
|
+
|
|
126
158
|
### LOCO
|
|
127
159
|
|
|
128
160
|
Recent versions of GEMMA have LOCO support for a single chromosome
|
|
@@ -158,6 +190,45 @@ GWA.json contains the file names of every chromosome
|
|
|
158
190
|
The -k switch is injected automatically. Again output switches are not
|
|
159
191
|
allowed (-o, -outdir)
|
|
160
192
|
|
|
193
|
+
### Permutations
|
|
194
|
+
|
|
195
|
+
Permutations can be run with and without LOCO. First create K
|
|
196
|
+
|
|
197
|
+
gemma-wrapper --json -- \
|
|
198
|
+
-g test/data/input/BXD_geno.txt.gz \
|
|
199
|
+
-p test/data/input/BXD_pheno.txt \
|
|
200
|
+
-gk \
|
|
201
|
+
-debug > K.json
|
|
202
|
+
|
|
203
|
+
Next, using K.json, permute the phenotypes with something like
|
|
204
|
+
|
|
205
|
+
gemma-wrapper --json --loco --input K.json \
|
|
206
|
+
--permutate 100 --permute-phenotype test/data/input/BXD_pheno.txt -- \
|
|
207
|
+
-g test/data/input/BXD_geno.txt.gz \
|
|
208
|
+
-p test/data/input/BXD_pheno.txt \
|
|
209
|
+
-c test/data/input/BXD_covariates2.txt \
|
|
210
|
+
-a test/data/input/BXD_snps.txt \
|
|
211
|
+
-lmm 2 -maf 0.1 \
|
|
212
|
+
-debug > GWA.json
|
|
213
|
+
|
|
214
|
+
This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
|
|
215
|
+
|
|
216
|
+
["95 percentile (significant) ", 1.92081e-05, 4.7]
|
|
217
|
+
["67 percentile (suggestive) ", 5.227785e-05, 4.3]
|
|
218
|
+
|
|
219
|
+
### Slurm PBS
|
|
220
|
+
|
|
221
|
+
To run gemma-wrapper on HPC use the '--slurm' switch.
|
|
222
|
+
|
|
223
|
+
## Development
|
|
224
|
+
|
|
225
|
+
We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
|
|
226
|
+
|
|
227
|
+
```
|
|
228
|
+
source .guix-deploy
|
|
229
|
+
ruby bin/gemma-wrapper --help
|
|
230
|
+
```
|
|
231
|
+
|
|
161
232
|
## Copyright
|
|
162
233
|
|
|
163
|
-
Copyright (c) 2017 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
|
|
234
|
+
Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
|
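The README hunks above describe the cache key as a hash over the GEMMA switches plus the contents of any files passed in. A minimal sketch of that idea follows, modelled on the `compute_hash` lambda that appears in the `bin/gemma-wrapper` hunks of this diff. The method name `cache_key` and the file names in the usage part are hypothetical and only serve the illustration.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hypothetical helper (not part of the gem): build a cache key from GEMMA
# arguments. Arguments that name an existing file are replaced by the SHA1 of
# the file contents; switches and plain values are used verbatim.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end

# Usage: identical switches and file contents give the same key,
# changed file contents give a different key.
Dir.mktmpdir do |dir|
  geno = File.join(dir, 'geno.txt')   # made-up genotype file
  File.write(geno, "rs1,A,B,0,1\n")
  key1 = cache_key(['-g', geno, '-gk'])
  key2 = cache_key(['-g', geno, '-gk'])
  File.write(geno, "rs1,A,B,1,1\n")
  key3 = cache_key(['-g', geno, '-gk'])
  puts key1 == key2  # true
  puts key1 == key3  # false
end
```

Because file arguments are hashed by content rather than by path, moving an input file does not invalidate the cache under this scheme, while editing it does.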
data/VERSION
CHANGED
|
@@ -1 +1 @@
|
|
|
1
|
-
0.
|
|
1
|
+
0.99.2
|
data/bin/gemma-wrapper
CHANGED
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
# Author:: Pjotr Prins
|
|
5
5
|
# License:: GPL3
|
|
6
6
|
#
|
|
7
|
-
# Copyright (C) 2017
|
|
7
|
+
# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
|
|
8
8
|
|
|
9
9
|
USAGE = "
|
|
10
10
|
GEMMA wrapper example:
|
|
@@ -35,10 +35,13 @@ GEMMA wrapper example:
|
|
|
35
35
|
-lmm 2 -maf 0.1 \\
|
|
36
36
|
-debug > GWA.json
|
|
37
37
|
|
|
38
|
+
Gemma gets used from the path. You can override by setting
|
|
39
|
+
|
|
40
|
+
env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
|
|
38
41
|
"
|
|
39
|
-
# These are used for testing compatibility
|
|
40
|
-
GEMMA_V_MAJOR =
|
|
41
|
-
GEMMA_V_MINOR =
|
|
42
|
+
# These are used for testing compatibility with the gemma tool
|
|
43
|
+
GEMMA_V_MAJOR = 98
|
|
44
|
+
GEMMA_V_MINOR = 4
|
|
42
45
|
|
|
43
46
|
basepath = File.dirname(File.dirname(__FILE__))
|
|
44
47
|
$: << File.join(basepath,'lib')
|
|
@@ -61,32 +64,34 @@ if not gemma_command
|
|
|
61
64
|
end
|
|
62
65
|
end
|
|
63
66
|
|
|
67
|
+
|
|
68
|
+
require 'digest/sha1'
|
|
64
69
|
require 'fileutils'
|
|
65
70
|
require 'optparse'
|
|
66
|
-
require 'tmpdir'
|
|
67
71
|
require 'tempfile'
|
|
72
|
+
require 'tmpdir'
|
|
68
73
|
|
|
69
74
|
split_at = ARGV.index('--')
|
|
70
75
|
if split_at
|
|
71
76
|
gemma_args = ARGV[split_at+1..-1]
|
|
72
77
|
end
|
|
73
78
|
|
|
74
|
-
options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
|
|
79
|
+
options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
|
|
75
80
|
|
|
76
81
|
opts = OptionParser.new do |o|
|
|
77
82
|
o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
|
|
78
83
|
|
|
79
|
-
o.on('--permutate n', Integer, 'Permutate by shuffling phenotypes') do |lst|
|
|
84
|
+
o.on('--permutate n', Integer, 'Permutate # times by shuffling phenotypes') do |lst|
|
|
80
85
|
options[:permutate] = lst
|
|
81
86
|
options[:force] = true
|
|
82
87
|
end
|
|
83
88
|
|
|
84
|
-
o.on('--phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
|
|
85
|
-
options[:
|
|
89
|
+
o.on('--permute-phenotypes filen',String, 'Phenotypes to be shuffled in permutations') do |phenotypes|
|
|
90
|
+
options[:permute_phenotypes] = phenotypes
|
|
86
91
|
raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
|
|
87
92
|
end
|
|
88
93
|
|
|
89
|
-
o.on('--loco [x,y,1,2,3...]', Array, 'Run full LOCO') do |lst|
|
|
94
|
+
o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
|
|
90
95
|
options[:loco] = lst
|
|
91
96
|
end
|
|
92
97
|
|
|
@@ -107,6 +112,18 @@ opts = OptionParser.new do |o|
|
|
|
107
112
|
options[:force] = true
|
|
108
113
|
end
|
|
109
114
|
|
|
115
|
+
o.on("--no-parallel", "Do not run jobs in parallel") do |b|
|
|
116
|
+
options[:parallel] = false
|
|
117
|
+
end
|
|
118
|
+
|
|
119
|
+
o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
|
|
120
|
+
options[:slurm_opts] = ""
|
|
121
|
+
options[:slurm] = true
|
|
122
|
+
if slurm
|
|
123
|
+
options[:slurm_opts] = slurm
|
|
124
|
+
end
|
|
125
|
+
end
|
|
126
|
+
|
|
110
127
|
o.on("--q", "--quiet", "Run quietly") do |q|
|
|
111
128
|
options[:quiet] = true
|
|
112
129
|
end
|
|
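The `--slurm[=opts]` switch added in this hunk takes an optional argument. The sketch below isolates that switch together with `--no-parallel` to show how Ruby's standard `OptionParser` hands `nil` to the block when the argument is omitted. The wrapper method `parse_wrapper_options` and the sample value `--mem=4G` are assumptions made for the example; the option definitions follow the hunk.

```ruby
require 'optparse'

# Hypothetical helper wrapping the two switches for illustration only
def parse_wrapper_options(argv)
  options = { parallel: true, slurm: false, slurm_opts: "" }
  OptionParser.new do |o|
    o.on("--no-parallel", "Do not run jobs in parallel") do
      options[:parallel] = false
    end
    # "[=opts]" marks the argument as optional: the block receives nil
    # for a bare --slurm and the string after "=" otherwise
    o.on("--slurm[=opts]", String, "Use slurm PBS for submitting jobs") do |opts|
      options[:slurm] = true
      options[:slurm_opts] = opts if opts
    end
  end.parse(argv)
  options
end

p parse_wrapper_options(["--slurm"])
p parse_wrapper_options(["--slurm=--mem=4G", "--no-parallel"])
```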
@@ -115,15 +132,20 @@ opts = OptionParser.new do |o|
|
|
|
115
132
|
options[:verbose] = true
|
|
116
133
|
end
|
|
117
134
|
|
|
118
|
-
o.on("--debug", "Show debug messages and keep intermediate output") do |v|
|
|
135
|
+
o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
|
|
119
136
|
options[:debug] = true
|
|
120
137
|
end
|
|
121
138
|
|
|
139
|
+
o.on("--dry-run", "Show commands, but don't execute") do |b|
|
|
140
|
+
options[:dry_run] = b
|
|
141
|
+
end
|
|
142
|
+
|
|
122
143
|
o.on('--','Anything after gets passed to GEMMA') do
|
|
123
144
|
o.terminate()
|
|
124
145
|
end
|
|
125
146
|
|
|
126
147
|
o.separator ""
|
|
148
|
+
|
|
127
149
|
o.on_tail('-h', '--help', 'display this help and exit') do
|
|
128
150
|
options[:show_help] = true
|
|
129
151
|
end
|
|
@@ -171,17 +193,28 @@ end
|
|
|
171
193
|
# ---- Start banner
|
|
172
194
|
|
|
173
195
|
GEMMA_K_VERSION=version
|
|
174
|
-
GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017
|
|
196
|
+
GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
|
|
175
197
|
info.call GEMMA_K_BANNER
|
|
176
198
|
|
|
177
199
|
# Check gemma version
|
|
178
200
|
GEMMA_COMMAND=options[:gemma_command]
|
|
179
|
-
|
|
201
|
+
info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
|
|
202
|
+
|
|
203
|
+
begin
|
|
204
|
+
GEMMA_INFO = `#{GEMMA_COMMAND}`
|
|
205
|
+
rescue Errno::ENOENT
|
|
206
|
+
GEMMA_COMMAND = "gemma" if not GEMMA_COMMAND
|
|
207
|
+
error.call "<#{GEMMA_COMMAND}> command not found"
|
|
208
|
+
end
|
|
209
|
+
|
|
210
|
+
gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
|
|
180
211
|
info.call "Using ",gemma_version_header,"\n"
|
|
181
212
|
gemma_version = gemma_version_header.split(/[,\s]+/)[1]
|
|
182
213
|
v_version, v_major, v_minor = gemma_version.split(".")
|
|
183
214
|
info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
|
|
184
215
|
|
|
216
|
+
info.call gemma_version_header
|
|
217
|
+
|
|
185
218
|
warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
|
|
186
219
|
|
|
187
220
|
options[:gemma_version_header] = gemma_version_header
|
|
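The version check in this hunk splits the GEMMA banner line and compares the result against the expected 0.98.4. A small sketch of the same comparison is given below as a standalone method. The method name `gemma_outdated?` and the header strings are hypothetical; the real banner printed by GEMMA may look different, and only the second whitespace-separated field is assumed to hold the version.

```ruby
# Hypothetical helper: true when the reported GEMMA version is older than
# the expected 0.<major>.<minor> (98 and 4 in this release of the wrapper).
def gemma_outdated?(version_header, major: 98, minor: 4)
  # e.g. "GEMMA 0.98.3 ..." -> "0.98.3"
  version = version_header.split(/[,\s]+/)[1]
  _zero, v_major, v_minor = version.split(".")
  v_major.to_i < major ||
    (v_major.to_i == major && !v_minor.nil? && v_minor.to_i < minor)
end

puts gemma_outdated?("GEMMA 0.98.3 (hypothetical header)")  # true
puts gemma_outdated?("GEMMA 0.98.4 (hypothetical header)")  # false
```

As in the hunk, a version without a third component (for example `0.98`) is not reported as out of date, since the minor comparison is skipped when that component is missing.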
@@ -197,60 +230,143 @@ if RUBY_VERSION =~ /^1/
|
|
|
197
230
|
warning "runs on Ruby 2.x only\n"
|
|
198
231
|
end
|
|
199
232
|
|
|
233
|
+
debug.call(options) # some debug output
|
|
234
|
+
debug.call(record)
|
|
235
|
+
|
|
236
|
+
DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
|
|
237
|
+
DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
|
|
238
|
+
|
|
239
|
+
# ---- Set up parallel
|
|
240
|
+
if options[:parallel]
|
|
241
|
+
begin
|
|
242
|
+
skip_cite = `echo "will cite" |parallel --citation`
|
|
243
|
+
debug.call(skip_cite)
|
|
244
|
+
PARALLEL_INFO = `parallel --help`
|
|
245
|
+
rescue Errno::ENOENT
|
|
246
|
+
error.call "<parallel> command not found"
|
|
247
|
+
end
|
|
248
|
+
parallel_cmds = []
|
|
249
|
+
end
|
|
250
|
+
|
|
200
251
|
# ---- Compute HASH on inputs
|
|
201
252
|
hashme = []
|
|
202
253
|
geno_idx = gemma_args.index '-g'
|
|
203
|
-
raise "Expected GEMMA -g switch" if geno_idx == nil
|
|
204
|
-
|
|
205
|
-
hashme += ['-p', options[:phenotypes]] if options[:phenotypes]
|
|
254
|
+
raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
|
|
255
|
+
pheno_idx = gemma_args.index '-p'
|
|
206
256
|
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
|
|
212
|
-
|
|
213
|
-
|
|
257
|
+
if DO_COMPUTE_GWA and options[:permute_phenotypes]
|
|
258
|
+
raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
|
|
259
|
+
end
|
|
260
|
+
|
|
261
|
+
execute = lambda { |cmd|
|
|
262
|
+
info.call("Executing: #{cmd}")
|
|
263
|
+
err = 0
|
|
264
|
+
if not options[:debug]
|
|
265
|
+
# send output to stderr line by line
|
|
266
|
+
IO.popen("#{cmd}") do |io|
|
|
267
|
+
while s = io.gets
|
|
268
|
+
$stderr.print s
|
|
269
|
+
end
|
|
270
|
+
io.close
|
|
271
|
+
err = $?.to_i
|
|
272
|
+
end
|
|
214
273
|
else
|
|
215
|
-
|
|
274
|
+
$stderr.print `#{cmd}`
|
|
275
|
+
err = $?.to_i
|
|
276
|
+
end
|
|
277
|
+
err
|
|
278
|
+
}
|
|
279
|
+
|
|
280
|
+
compute_hash = lambda do | phenofn = nil |
|
|
281
|
+
# Compute a HASH on the inputs
|
|
282
|
+
debug.call "Hashing on ",hashme,"\n"
|
|
283
|
+
hashes = []
|
|
284
|
+
hm = if phenofn
|
|
285
|
+
hashme + ["-p", phenofn]
|
|
286
|
+
else
|
|
287
|
+
hashme
|
|
288
|
+
end
|
|
289
|
+
debug.call(hm)
|
|
290
|
+
hm.each do | item |
|
|
291
|
+
if File.file?(item)
|
|
292
|
+
hashes << Digest::SHA1.hexdigest(File.read(item))
|
|
293
|
+
debug.call [item,hashes.last]
|
|
294
|
+
else
|
|
295
|
+
hashes << item
|
|
296
|
+
end
|
|
216
297
|
end
|
|
298
|
+
debug.call(hashes)
|
|
299
|
+
Digest::SHA1.hexdigest hashes.join(' ')
|
|
217
300
|
end
|
|
218
|
-
HASH = Digest::SHA1.hexdigest hashes.join(' ')
|
|
219
301
|
|
|
302
|
+
HASH = compute_hash.call()
|
|
220
303
|
options[:hash] = HASH
|
|
221
304
|
|
|
222
305
|
# Create cache dir
|
|
223
306
|
FileUtils::mkdir_p options[:cache_dir]
|
|
224
307
|
|
|
308
|
+
Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
|
|
309
|
+
|
|
225
310
|
error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
|
|
226
311
|
error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
|
|
312
|
+
GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
|
|
227
313
|
gemma_args << '-outdir'
|
|
228
|
-
gemma_args <<
|
|
314
|
+
gemma_args << tmpdir
|
|
229
315
|
GEMMA_ARGS = gemma_args
|
|
230
316
|
|
|
317
|
+
hashme =
|
|
318
|
+
if DO_COMPUTE_KINSHIP and pheno_idx != nil
|
|
319
|
+
# Remove the phenotype file from the hash for GRM computation
|
|
320
|
+
GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
|
|
321
|
+
else
|
|
322
|
+
GEMMA_ARGS_HASH
|
|
323
|
+
end
|
|
324
|
+
|
|
231
325
|
debug.call "Options: ",options,"\n" if !options[:quiet]
|
|
232
326
|
|
|
233
|
-
invoke_gemma = lambda do |extra_args, cache_hit = false|
|
|
234
|
-
cmd="#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
|
|
327
|
+
invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
|
|
328
|
+
cmd = "#{GEMMA_COMMAND} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
|
|
235
329
|
record[:gemma_command] = cmd
|
|
236
330
|
return if cache_hit
|
|
237
|
-
|
|
331
|
+
if options[:slurm]
|
|
332
|
+
info.call cmd
|
|
333
|
+
hashi = HASH
|
|
334
|
+
prefix = tmpdir+'/'+hashi
|
|
335
|
+
scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
|
|
336
|
+
script = "#!/bin/bash
|
|
337
|
+
#SBATCH --job-name=gemma-#{scriptfn}
|
|
338
|
+
#SBATCH --ntasks=1
|
|
339
|
+
#SBATCH --time=20:00
|
|
340
|
+
srun #{cmd}
|
|
341
|
+
"
|
|
342
|
+
debug.call(script)
|
|
343
|
+
File.open(scriptfn,"w") { |f|
|
|
344
|
+
f.write(script)
|
|
345
|
+
}
|
|
346
|
+
cmd = "sbatch "+options[:slurm_opts] + scriptfn
|
|
347
|
+
end
|
|
238
348
|
errno =
|
|
239
349
|
if options[:json]
|
|
240
350
|
# capture output
|
|
241
351
|
err = 0
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
247
|
-
|
|
352
|
+
if options[:dry_run]
|
|
353
|
+
info.call("Would have invoked: ",cmd)
|
|
354
|
+
elsif options[:parallel]
|
|
355
|
+
info.call("Add parallel job: ",cmd)
|
|
356
|
+
parallel_cmds << cmd
|
|
357
|
+
else
|
|
358
|
+
err = execute.call(cmd)
|
|
248
359
|
end
|
|
249
360
|
err
|
|
250
361
|
else
|
|
251
|
-
|
|
252
|
-
|
|
253
|
-
|
|
362
|
+
if options[:dry_run]
|
|
363
|
+
info.call("Would have invoked ",cmd)
|
|
364
|
+
0
|
|
365
|
+
else
|
|
366
|
+
debug.call("Invoking ",cmd) if options[:debug]
|
|
367
|
+
system(cmd)
|
|
368
|
+
$?.exitstatus
|
|
369
|
+
end
|
|
254
370
|
end
|
|
255
371
|
if errno != 0
|
|
256
372
|
debug.call "Gemma exit ",errno
|
|
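The slurm branch of `invoke_gemma` in this hunk writes a small batch script and then submits it with `sbatch`. The sketch below separates those two steps into hypothetical helpers, `slurm_script` and `sbatch_command`, to make the generated text easier to inspect. One deliberate difference: the sketch puts a space between the user-supplied options and the script path, which the string concatenation in the hunk does not appear to do.

```ruby
# Hypothetical helper: text of a one-task batch script that runs cmd via srun
def slurm_script(cmd, job_name, time_limit: "20:00")
  <<~SCRIPT
    #!/bin/bash
    #SBATCH --job-name=#{job_name}
    #SBATCH --ntasks=1
    #SBATCH --time=#{time_limit}
    srun #{cmd}
  SCRIPT
end

# Hypothetical helper: sbatch command line, skipping empty option strings
def sbatch_command(script_path, slurm_opts = "")
  (["sbatch"] + [slurm_opts, script_path].reject(&:empty?)).join(" ")
end

puts slurm_script("gemma -g geno.txt.gz -gk", "gemma-example")
puts sbatch_command("/tmp/example-pbs.sh", "--mem=4G")
```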
@@ -260,11 +376,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
|
|
|
260
376
|
end
|
|
261
377
|
end
|
|
262
378
|
|
|
379
|
+
# Takes the hash value and checks whether the (output) file exists
|
|
263
380
|
# returns datafn, logfn, cache_hit
|
|
264
|
-
cache = lambda do | chr, ext |
|
|
381
|
+
cache = lambda do | chr, ext, h=HASH, permutation=0 |
|
|
265
382
|
inject = (chr==nil ? "" : ".#{chr}" )+ext
|
|
266
|
-
hashi = (chr==nil ?
|
|
267
|
-
prefix = options[:cache_dir]+'/'+hashi
|
|
383
|
+
hashi = (chr==nil ? h : h+inject)
|
|
384
|
+
prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
|
|
385
|
+
# for chr 3 and permutation 1 forms something like
|
|
386
|
+
# /tmp/1b700-a996f.3.cXX.txt.1.log.txt
|
|
268
387
|
logfn = prefix+".log.txt"
|
|
269
388
|
datafn = prefix+ext
|
|
270
389
|
record[:files] ||= []
|
|
@@ -300,20 +419,22 @@ kinship = lambda do | chr = nil |
|
|
|
300
419
|
end
|
|
301
420
|
|
|
302
421
|
# ---- Run GWA
|
|
303
|
-
gwas = lambda do | chr, kfn, pfn |
|
|
422
|
+
gwas = lambda do | chr, kfn, pfn, permutation=0 |
|
|
304
423
|
record[:type] = "GWA"
|
|
305
|
-
error.call "Do not use the GEMMA -k switch with gemma-wrapper!" if GEMMA_ARGS.include? '-k' # K is automatic
|
|
306
|
-
|
|
424
|
+
error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
|
|
425
|
+
# Update hash for each permutation
|
|
426
|
+
hash = compute_hash.call(pfn)
|
|
427
|
+
hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
|
|
307
428
|
if not cache_hit
|
|
308
429
|
args = [ '-k', kfn, '-o', hashi ]
|
|
309
430
|
args << [ '-loco', chr ] if chr != nil
|
|
310
431
|
args << [ '-p', pfn ] if pfn
|
|
311
|
-
invoke_gemma.call args
|
|
432
|
+
invoke_gemma.call args,false,chr,permutation
|
|
312
433
|
end
|
|
313
434
|
end
|
|
314
435
|
|
|
315
436
|
LOCO = options[:loco]
|
|
316
|
-
if
|
|
437
|
+
if DO_COMPUTE_KINSHIP
|
|
317
438
|
# compute K
|
|
318
439
|
info.call LOCO
|
|
319
440
|
if LOCO != nil
|
|
@@ -325,11 +446,11 @@ if GEMMA_ARGS.include? '-gk'
|
|
|
325
446
|
kinship.call # no LOCO
|
|
326
447
|
end
|
|
327
448
|
else
|
|
328
|
-
#
|
|
449
|
+
# DO_COMPUTE_GWA
|
|
329
450
|
json_in = JSON.parse(File.read(options[:input]))
|
|
330
451
|
raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
|
|
331
452
|
|
|
332
|
-
pfn = options[:
|
|
453
|
+
pfn = options[:permute_phenotypes] # can be nil
|
|
333
454
|
k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
|
|
334
455
|
k_files.each do | chr, kfn | # call a GWA for each chromosome
|
|
335
456
|
gwas.call(chr,kfn,pfn)
|
|
@@ -337,16 +458,16 @@ else
|
|
|
337
458
|
# Permute
|
|
338
459
|
if options[:permutate]
|
|
339
460
|
ps = []
|
|
340
|
-
raise "You should supply --
|
|
461
|
+
raise "You should supply --permute-phenotypes with gemma-wrapper --permutate" if not pfn
|
|
341
462
|
File.foreach(pfn).with_index do |line, line_num|
|
|
342
463
|
ps << line
|
|
343
464
|
end
|
|
344
465
|
score_list = []
|
|
345
466
|
debug.call(options[:permutate],"x permutations")
|
|
346
|
-
(1..options[:permutate]).each do |
|
|
347
|
-
$stderr.print "Iteration ",
|
|
467
|
+
(1..options[:permutate]).each do |permutation|
|
|
468
|
+
$stderr.print "Iteration ",permutation,"\n"
|
|
348
469
|
# Create a shuffled phenotype file
|
|
349
|
-
file = File.open("phenotypes-#{
|
|
470
|
+
file = File.open("phenotypes-#{permutation}","w")
|
|
350
471
|
tmp_pfn = file.path
|
|
351
472
|
p tmp_pfn
|
|
352
473
|
ps.shuffle.each do | l |
|
|
@@ -354,20 +475,23 @@ else
|
|
|
354
475
|
end
|
|
355
476
|
file.close
|
|
356
477
|
k_files.each do | chr, kfn | # call a GWA for each chromosome
|
|
357
|
-
gwas.call(chr,kfn,tmp_pfn)
|
|
478
|
+
gwas.call(chr,kfn,tmp_pfn,permutation)
|
|
358
479
|
end
|
|
359
|
-
# p [:HEY,record[:files].last]
|
|
360
|
-
assocfn = record[:files].last[2]
|
|
361
|
-
debug.call("Reading ",assocfn)
|
|
362
480
|
score_min = 1000.0
|
|
363
|
-
|
|
364
|
-
|
|
365
|
-
|
|
366
|
-
|
|
481
|
+
if false and not options[:slurm]
|
|
482
|
+
# p [:HEY,record[:files].last]
|
|
483
|
+
assocfn = record[:files].last[2]
|
|
484
|
+
debug.call("Reading ",assocfn)
|
|
485
|
+
File.foreach(assocfn).with_index do |assoc, assoc_line_num|
|
|
486
|
+
if assoc_line_num > 0
|
|
487
|
+
value = assoc.strip.split(/\t/).last.to_f
|
|
488
|
+
score_min = value if value < score_min
|
|
489
|
+
end
|
|
367
490
|
end
|
|
368
491
|
end
|
|
369
492
|
score_list << score_min
|
|
370
493
|
end
|
|
494
|
+
exit 0 if options[:slurm]
|
|
371
495
|
ls = score_list.sort
|
|
372
496
|
p ls
|
|
373
497
|
significant = ls[(ls.size - ls.size*0.95).floor]
|
|
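The permutation code collects the smallest p-value of each shuffled run in `score_list`, sorts it, and indexes into the sorted list to obtain the threshold. A minimal sketch of that indexing follows. The helper names and the sample p-values are made up. Using the same formula with 0.67 for the suggestive threshold is an assumption, since that line is not part of this diff. The third column in the README output appears to be the negative base-10 logarithm of the p-value rounded to one decimal, which is what `neg_log10` computes.

```ruby
# Hypothetical helper: threshold taken from per-permutation minimum p-values,
# using the same index expression as the `significant` line above
def permutation_threshold(min_pvalues, percentile)
  ls = min_pvalues.sort
  ls[(ls.size - ls.size * percentile).floor]
end

# Hypothetical helper: -log10(p) rounded to one decimal
def neg_log10(p_value)
  (-Math.log10(p_value)).round(1)
end

scores = (1..100).map { |i| i / 1000.0 }  # made-up minimum p-values
significant = permutation_threshold(scores, 0.95)
suggestive  = permutation_threshold(scores, 0.67)
p ["95 percentile (significant) ", significant, neg_log10(significant)]
p ["67 percentile (suggestive)  ", suggestive, neg_log10(suggestive)]
```

Smaller p-values sort first, so the 95% threshold is the stricter (smaller) of the two values.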
@@ -378,5 +502,38 @@ else
|
|
|
378
502
|
end
|
|
379
503
|
end
|
|
380
504
|
|
|
505
|
+
# ---- Invoke parallel
|
|
506
|
+
if options[:parallel]
|
|
507
|
+
# parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
|
|
508
|
+
cmd = parallel_cmds.join("\\n")
|
|
509
|
+
|
|
510
|
+
cmd = "echo -e \"#{cmd}\""
|
|
511
|
+
err = execute.call(cmd+"|parallel") # all jobs in parallel
|
|
512
|
+
if err != 0
|
|
513
|
+
[16,8,4,1].each do |jobs|
|
|
514
|
+
info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
|
|
515
|
+
err = execute.call(cmd+"|parallel -j #{jobs}")
|
|
516
|
+
break if err == 0
|
|
517
|
+
end
|
|
518
|
+
if err != 0
|
|
519
|
+
info.call("Run failed!")
|
|
520
|
+
exit err
|
|
521
|
+
end
|
|
522
|
+
end
|
|
523
|
+
info.call("Run successful!")
|
|
524
|
+
end
|
|
381
525
|
json_out.call
|
|
382
|
-
|
|
526
|
+
|
|
527
|
+
# copy all output files to the cache_dir. If a file exists only emit a warning
|
|
528
|
+
Dir.glob("*.txt", base: tmpdir) do | fn |
|
|
529
|
+
source = tmpdir + "/" + fn
|
|
530
|
+
dest = options[:cache_dir] + "/" + fn
|
|
531
|
+
if not File.exist?(dest) or options[:force]
|
|
532
|
+
info.call "Move #{source} to #{dest}"
|
|
533
|
+
FileUtils.mv source, dest, verbose: false
|
|
534
|
+
else
|
|
535
|
+
warning.call "File #{dest} already exists. Not overwriting"
|
|
536
|
+
end
|
|
537
|
+
end
|
|
538
|
+
|
|
539
|
+
end # tmpdir
|
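The final hunk pipes the collected commands into GNU parallel and, on failure, retries with 16, 8, 4 and finally 1 simultaneous jobs to reduce the RAM footprint. The sketch below shows that retry pattern with the command runner injected, so it can be exercised without GNU parallel installed. The helper `run_with_fallback` is hypothetical, and it feeds the commands with `printf` and `Shellwords.escape` rather than the `echo -e` used in the hunk.

```ruby
require 'shellwords'

# Hypothetical helper: run all jobs at once, then retry with fewer
# simultaneous jobs. `runner` is any callable that takes a shell command
# and returns its exit status (0 for success).
def run_with_fallback(cmds, runner, job_limits: [16, 8, 4, 1])
  # One command per line on parallel's standard input
  feed = "printf '%s\\n' " + cmds.map { |c| Shellwords.escape(c) }.join(" ")
  err = runner.call("#{feed} | parallel")
  job_limits.each do |jobs|
    break if err == 0
    err = runner.call("#{feed} | parallel -j #{jobs}")
  end
  err
end

# Usage with a fake runner that only succeeds once the job count drops to 8
calls = []
fake = lambda { |cmd| calls << cmd; cmd.end_with?("-j 8") ? 0 : 1 }
status = run_with_fallback(["echo 1", "echo 2"], fake)
puts status      # 0
puts calls.size  # 3
```

For real use the runner could be something like `->(cmd) { system(cmd); $?.exitstatus }`, which is an assumption here and not taken from the diff.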
data/gemma-wrapper.gemspec
CHANGED
|
@@ -2,7 +2,7 @@ Gem::Specification.new do |s|
|
|
|
2
2
|
s.name = 'bio-gemma-wrapper'
|
|
3
3
|
s.version = File.read('VERSION')
|
|
4
4
|
s.summary = "GEMMA with LOCO and permutations"
|
|
5
|
-
s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
|
|
5
|
+
s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
|
|
6
6
|
s.authors = ["Pjotr Prins"]
|
|
7
7
|
s.email = 'pjotr.public01@thebird.nl'
|
|
8
8
|
s.files = ["bin/gemma-wrapper",
|
metadata
CHANGED
|
@@ -1,17 +1,17 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: bio-gemma-wrapper
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version: 0.
|
|
4
|
+
version: 0.99.2
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- Pjotr Prins
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: bin
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date:
|
|
11
|
+
date: 2021-08-08 00:00:00.000000000 Z
|
|
12
12
|
dependencies: []
|
|
13
|
-
description: GEMMA wrapper adds LOCO and permutation support. Also
|
|
14
|
-
runs with LOCO support
|
|
13
|
+
description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
|
|
14
|
+
and caches K between runs with LOCO support
|
|
15
15
|
email: pjotr.public01@thebird.nl
|
|
16
16
|
executables:
|
|
17
17
|
- gemma-wrapper
|
|
@@ -43,8 +43,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
|
43
43
|
- !ruby/object:Gem::Version
|
|
44
44
|
version: '0'
|
|
45
45
|
requirements: []
|
|
46
|
-
|
|
47
|
-
rubygems_version: 2.6.8
|
|
46
|
+
rubygems_version: 3.2.5
|
|
48
47
|
signing_key:
|
|
49
48
|
specification_version: 4
|
|
50
49
|
summary: GEMMA with LOCO and permutations
|