bio-gemma-wrapper 0.98.1 → 0.99.4
- checksums.yaml +5 -5
- data/README.md +57 -19
- data/VERSION +1 -1
- data/bin/gemma-wrapper +262 -69
- data/gemma-wrapper.gemspec +2 -1
- data/lib/lock.rb +95 -0
- metadata +6 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: da5f26b8acd9c3782c2b3f5f2a39af965fc7e1785cc820b49faca82924d74e51
+  data.tar.gz: 17035ee5fada269ae88dd0ed91d84075b2af88b400de1d0e9829cbdb60d5d0cb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eaec3c7dad4fc1bda713765e056bfe11dd69d4ca850333fed5a1a27e344724365a705ddf7845ce63b5af6b35ab6140da10f4bc7067aaa4539e47f6c6f94de1f0
+  data.tar.gz: c26b282c0fd7c70a702467e58c3f6ea22f820d91a8a364b335cdab7e807add9cf1079faa25c46a87b89c78e4293990cdea2210427c5f9c3565bd5040fdbef496
data/README.md
CHANGED
@@ -1,12 +1,20 @@
 [![gemma-wrapper gem version](https://badge.fury.io/rb/bio-gemma-wrapper.svg)](https://badge.fury.io/rb/bio-gemma-wrapper)
 
-# GEMMA
+# GEMMA with LOCO, permutations and slurm support (and caching)
 
 ![Genetic associations identified in CFW mice using GEMMA (Parker et al,
 Nat. Genet., 2016)](cfw.gif)
 
 ## Introduction
 
+Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
+GEMMA in parallel (now the default with LOCO), and GEMMA on
+PBS. Gemma-wrapper is used to run GEMMA as part of the
+https://genenetwork.org/ environment.
+
+Note that a version of gemma-wrapper is projected to be integrated
+into gemma itself.
+
 GEMMA is a software toolkit for fast application of linear mixed
 models (LMMs) and related models to genome-wide association studies
 (GWAS) and other large-scale data sets.
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
 This repository contains gemma-wrapper, essentially a wrapper of
 GEMMA that provides support for caching the kinship or relatedness
 matrix (K) and caching LM and LMM computations with the option of full
-leave-one-chromosome-out genome scans (LOCO).
+leave-one-chromosome-out genome scans (LOCO). Jobs can also be
+submitted to HPC PBS, i.e., slurm.
 
 gemma-wrapper requires a recent version of GEMMA and essentially
 does a pass-through of all standard GEMMA invocation switches. On
 return gemma-wrapper can return a JSON object (--json) which is
 useful for web-services.
 
-
-
+## Performance
+
+LOCO runs in parallel by default which is at least a 5x performance
+improvement on a machine with enough cores. GEMMA without LOCO,
+however, does not run in parallel by default. Performance
+improvements with the parallel implementation for LOCO and non-LOCO
+can be viewed [here](./test/performance/releases.gmi).
 
 ## Installation
 
@@ -32,8 +46,9 @@ Prerequisites are
 * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
   almost all Linux systems
 
-gemma-wrapper comes as a Ruby
-can be
+gemma-wrapper comes as a Ruby
+[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
+installed with
 
     gem install bio-gemma-wrapper
 
@@ -47,14 +62,19 @@ and it will render something like
     Usage: gemma-wrapper [options] -- [gemma-options]
         --permutate n Permutate # times by shuffling phenotypes
         --permute-phenotypes filen Phenotypes to be shuffled in permutations
-        --loco
+        --loco Run full leave-one-chromosome-out (LOCO)
+        --chromosomes [1,2,3] Run specific chromosomes
         --input filen JSON input variables (used for LOCO)
         --cache-dir path Use a cache directory
         --json Create output file in JSON format
-        --force Force computation
+        --force Force computation (override cache)
+        --parallel Run jobs in parallel
+        --no-parallel Do not run jobs in parallel
+        --slurm[=opts] Use slurm PBS for submitting jobs
         --q, --quiet Run quietly
         -v, --verbose Run verbosely
-
+        -d, --debug Show debug messages and keep intermediate output
+        --dry-run Show commands, but don't execute
         -- Anything after gets passed to GEMMA
 
         -h, --help display this help and exit
@@ -69,6 +89,8 @@ Unpack it and run the tool as
 
     ./bin/gemma-wrapper --help
 
+See below for using a GNU Guix environment.
+
 ## Usage
 
 gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -90,12 +112,13 @@ the data files are found):
     gemma-wrapper -- \
         -g test/data/input/BXD_geno.txt.gz \
         -p test/data/input/BXD_pheno.txt \
+        -a test/data/input/BXD_snps.txt \
         -gk \
         -debug
 
 Run it twice to see
 
-    /tmp/
+    /tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
 
 gemma-wrapper computes the unique HASH value over the command
 line switches passed into GEMMA as well as the contents of the files
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
     gemma-wrapper --json -- \
         -g test/data/input/BXD_geno.txt.gz \
         -p test/data/input/BXD_pheno.txt \
+        -a test/data/input/BXD_snps.txt \
         -gk \
-        -debug
+        -debug > K.json
 
-
+K.json is something that can be parsed with a calling program, and is
+also below as input for the GWA step. Example:
 
 ```json
 {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +148,7 @@ default. If you want something else provide a --cache-dir, e.g.
     gemma-wrapper --cache-dir ~/.gemma-cache -- \
         -g test/data/input/BXD_geno.txt.gz \
         -p test/data/input/BXD_pheno.txt \
+        -a test/data/input/BXD_snps.txt \
         -gk \
         -debug
 
@@ -130,10 +156,10 @@ will store K in ~/.gemma-cache.
 
 ### GWA
 
-Run the LMM using the K's captured in K.json using the --input
+Run the LMM using the K's captured earlier in K.json using the --input
 switch
 
-    gemma-wrapper --json --
+    gemma-wrapper --json --input K.json -- \
         -g test/data/input/BXD_geno.txt.gz \
         -p test/data/input/BXD_pheno.txt \
         -c test/data/input/BXD_covariates2.txt \
@@ -153,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
 chromosomes first create all K's with
 
     gemma-wrapper --json \
-        --loco
+        --loco -- \
         -g test/data/input/BXD_geno.txt.gz \
         -p test/data/input/BXD_pheno.txt \
         -a test/data/input/BXD_snps.txt \
@@ -201,12 +227,24 @@ Next, using K.json, permute the phenotypes with something like
         -lmm 2 -maf 0.1 \
         -debug > GWA.json
 
-This should get the 95% significant and 67% suggestive thresholds:
+This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
+
+    ["95 percentile (significant) ", 1.92081e-05, 4.7]
+    ["67 percentile (suggestive) ", 5.227785e-05, 4.3]
+
+### Slurm PBS
 
-
-    ["67 percentile (suggestive) ", 2.015475e-05, 4.7]
+To run gemma-wrapper on HPC use the '--slurm' switch.
 
+## Development
+
+We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
+
+```
+source .guix-deploy
+ruby bin/gemma-wrapper --help
+```
 
 ## Copyright
 
-Copyright (c) 2017
+Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
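Reviewer note on the threshold output in the README hunk above: the third value in each row appears to be `-log10(p)` rounded to one decimal (for example, `-log10(1.92081e-05)` is about 4.72). The following is a minimal sketch of how such thresholds could be derived from the per-permutation minimum p-values. It reuses the index arithmetic visible in `bin/gemma-wrapper` (`ls[(ls.size - ls.size*0.95).floor]`); the method name `permutation_thresholds` and the returned hash layout are hypothetical and not part of the gem.

```ruby
# Hypothetical helper (not part of gemma-wrapper): derive permutation
# thresholds from the minimum p-value observed in each permutation run.
def permutation_thresholds(min_pvalues)
  raise ArgumentError, "need at least one permutation" if min_pvalues.empty?

  ls = min_pvalues.sort
  pick = lambda do |level|
    # Same index arithmetic as the wrapper: select the sorted minimum
    # at position floor(n - n*level).
    p = ls[(ls.size - ls.size * level).floor]
    # Assumption: the reported score is -log10(p) rounded to one decimal.
    [p, (-Math.log10(p)).round(1)]
  end

  { significant: pick.call(0.95), suggestive: pick.call(0.67) }
end
```

With only a handful of permutations both levels select entries near the start of the sorted list, so the estimates are coarse; more permutations give more stable thresholds.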
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.99.4
data/bin/gemma-wrapper
CHANGED
@@ -4,7 +4,7 @@
 # Author:: Pjotr Prins
 # License:: GPL3
 #
-# Copyright (C) 2017
+# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
 
 USAGE = "
 GEMMA wrapper example:
@@ -14,12 +14,12 @@ GEMMA wrapper example:
     gemma-wrapper -- \\
         -g test/data/input/BXD_geno.txt.gz \\
         -p test/data/input/BXD_pheno.txt \\
+        -a test/data/input/BXD_snps.txt \
         -gk
 
 LOCO K computation with caching and JSON output
 
-    gemma-wrapper --json \\
-        --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
+    gemma-wrapper --json --loco -- \\
         -g test/data/input/BXD_geno.txt.gz \\
         -p test/data/input/BXD_pheno.txt \\
         -a test/data/input/BXD_snps.txt \\
@@ -38,11 +38,10 @@ GEMMA wrapper example:
 Gemma gets used from the path. You can override by setting
 
     env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
-
 "
 # These are used for testing compatibility with the gemma tool
 GEMMA_V_MAJOR = 98
-GEMMA_V_MINOR =
+GEMMA_V_MINOR = 4
 
 basepath = File.dirname(File.dirname(__FILE__))
 $: << File.join(basepath,'lib')
@@ -66,17 +65,21 @@ if not gemma_command
 end
 
 
+require 'digest/sha1'
 require 'fileutils'
 require 'optparse'
-require 'tmpdir'
 require 'tempfile'
+require 'tmpdir'
+
+require 'lock'
 
 split_at = ARGV.index('--')
+
 if split_at
   gemma_args = ARGV[split_at+1..-1]
 end
 
-options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
+options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
 
 opts = OptionParser.new do |o|
   o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +94,12 @@ opts = OptionParser.new do |o|
     raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
   end
 
-  o.on('--loco
-    options[:loco] =
+  o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
+    options[:loco] = b
+  end
+
+  o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
+    options[:chromosomes] = lst
   end
 
   o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +119,22 @@ opts = OptionParser.new do |o|
     options[:force] = true
   end
 
+  o.on("--parallel", "Run jobs in parallel") do |b|
+    options[:parallel] = true
+  end
+
+  o.on("--no-parallel", "Do not run jobs in parallel") do |b|
+    options[:parallel] = false
+  end
+
+  o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
+    options[:slurm_opts] = ""
+    options[:slurm] = true
+    if slurm
+      options[:slurm_opts] = slurm
+    end
+  end
+
   o.on("--q", "--quiet", "Run quietly") do |q|
     options[:quiet] = true
   end
@@ -120,15 +143,20 @@ opts = OptionParser.new do |o|
     options[:verbose] = true
   end
 
-  o.on("--debug", "Show debug messages and keep intermediate output") do |v|
+  o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
     options[:debug] = true
   end
 
+  o.on("--dry-run", "Show commands, but don't execute") do |b|
+    options[:dry_run] = b
+  end
+
   o.on('--','Anything after gets passed to GEMMA') do
     o.terminate()
   end
 
   o.separator ""
+
   o.on_tail('-h', '--help', 'display this help and exit') do
     options[:show_help] = true
   end
@@ -168,26 +196,46 @@ warning = lambda do |*msg|
   record[:warnings].push *msg.join("")
   OUTPUT.print "WARNING: ",*msg,"\n"
 end
+
 info = lambda do |*msg|
   record[:debug].push *msg.join("") if options[:json] and options[:debug]
   OUTPUT.print *msg,"\n" if !options[:quiet]
 end
 
+# Fetch chromosomes
+def get_chromosomes annofn
+  h = {}
+  File.open(annofn,"r").each_line do | line |
+    chr = line.split(/\s+/)[2]
+    h[chr] = true
+  end
+  h.map { |k,v| k }
+end
 # ---- Start banner
 
 GEMMA_K_VERSION=version
-GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017
+GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
 info.call GEMMA_K_BANNER
 
 # Check gemma version
-
+begin
+  gemma_command2 = options[:gemma_command]
+  info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
+
+  GEMMA_INFO = `#{gemma_command2}`
+rescue Errno::ENOENT
+  gemma_command2 = "gemma"
+  error.call "<#{gemma_command2}> command not found"
+end
 
-gemma_version_header =
+gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
 info.call "Using ",gemma_version_header,"\n"
 gemma_version = gemma_version_header.split(/[,\s]+/)[1]
 v_version, v_major, v_minor = gemma_version.split(".")
 info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
 
+info.call gemma_version_header
+
 warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
 
 options[:gemma_version_header] = gemma_version_header
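Reviewer note on `get_chromosomes` in the hunk above: it collects the distinct values of the third whitespace-separated column of the `-a` annotation file. A minimal sketch of the same idea follows, assuming an annotation format of `SNP, position, chromosome` per line. The method name `chromosomes_from_annotation` is hypothetical, and this variant differs from the gem in two ways: it also splits on commas and skips blank lines, and `File.foreach` closes the file once iteration ends.

```ruby
# Hypothetical variant of get_chromosomes (not the gem's implementation):
# collect the distinct values of the third column, in first-seen order.
def chromosomes_from_annotation(annofn)
  File.foreach(annofn)
      .map { |line| line.strip.split(/[,\s]+/)[2] } # third field, or nil
      .compact                                      # drop blank/short lines
      .uniq
end
```

The result is an array of strings such as `["1", "2", "X"]`, which matches the shape the wrapper iterates over when running LOCO per chromosome.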
@@ -203,74 +251,160 @@ if RUBY_VERSION =~ /^1/
   warning "runs on Ruby 2.x only\n"
 end
 
+# ---- LOCO defaults to parallel
+if options[:parallel] == nil
+  options[:parallel] = true if options[:loco]
+end
+
+debug.call(options) # some debug output
+debug.call(record)
+
 DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
 DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
 
+if options[:parallel]
+  begin
+    skip_cite = `echo "will cite" |parallel --citation`
+    debug.call(skip_cite)
+    PARALLEL_INFO = `parallel --help`
+  rescue Errno::ENOENT
+    error.call "<parallel> command not found"
+  end
+  parallel_cmds = []
+end
+
+# ---- Fetch chromosomes from SNP annotation file
+anno_idx = gemma_args.index '-a'
+raise "Expected GEMMA -a genotype file switch" if anno_idx == nil
+CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
+
 # ---- Compute HASH on inputs
 hashme = []
 geno_idx = gemma_args.index '-g'
 raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
 pheno_idx = gemma_args.index '-p'
-hashme =
-  if DO_COMPUTE_KINSHIP and pheno_idx != nil
-    # Remove the phenotype file from the hash
-    gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
-  else
-    gemma_args
-  end
 
 if DO_COMPUTE_GWA and options[:permute_phenotypes]
   raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
-  hashme += ['-p', options[:permute_phenotypes]]
 end
 
-
-
-
-
-
-
-
+execute = lambda { |cmd|
+  info.call("Executing: #{cmd}")
+  err = 0
+  if not options[:debug]
+    # send output to stderr line by line
+    IO.popen("#{cmd}") do |io|
+      while s = io.gets
+        $stderr.print s
+      end
+      io.close
+      err = $?.to_i
+    end
   else
-
+    $stderr.print `#{cmd}`
+    err = $?.to_i
+  end
+  err
+}
+
+compute_hash = lambda do | phenofn = nil |
+  # Compute a HASH on the inputs
+  debug.call "Hashing on ",hashme,"\n"
+  hashes = []
+  hm = if phenofn
+         hashme + ["-p", phenofn]
+       else
+         hashme
+       end
+  debug.call(hm)
+  hm.each do | item |
+    if File.file?(item)
+      hashes << Digest::SHA1.hexdigest(File.read(item))
+      debug.call [item,hashes.last]
+    else
+      hashes << item
+    end
   end
+  debug.call(hashes)
+  Digest::SHA1.hexdigest hashes.join(' ')
 end
-HASH = Digest::SHA1.hexdigest hashes.join(' ')
 
+HASH = compute_hash.call()
 options[:hash] = HASH
 
+at_exit do
+  Lock.release(HASH)
+end
+
+Lock.create(HASH) # this will wait for a lock to expire
+
+joblog = options[:cache_dir]+"/"+HASH+"-parallel.log"
+
 # Create cache dir
 FileUtils::mkdir_p options[:cache_dir]
 
+Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
+
 error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
 error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
+GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
 gemma_args << '-outdir'
-gemma_args <<
+gemma_args << tmpdir
 GEMMA_ARGS = gemma_args
 
+hashme =
+  if DO_COMPUTE_KINSHIP and pheno_idx != nil
+    # Remove the phenotype file from the hash for GRM computation
+    GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
+  else
+    GEMMA_ARGS_HASH
+  end
+
 debug.call "Options: ",options,"\n" if !options[:quiet]
 
-invoke_gemma = lambda do |extra_args, cache_hit = false|
-  cmd="#{
+invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
+  cmd = "#{gemma_command2} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
   record[:gemma_command] = cmd
   return if cache_hit
-
+  if options[:slurm]
+    info.call cmd
+    hashi = HASH
+    prefix = tmpdir+'/'+hashi
+    scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
+    script = "#!/bin/bash
+#SBATCH --job-name=gemma-#{scriptfn}
+#SBATCH --ntasks=1
+#SBATCH --time=20:00
+srun #{cmd}
+"
+    debug.call(script)
+    File.open(scriptfn,"w") { |f|
+      f.write(script)
+    }
+    cmd = "sbatch "+options[:slurm_opts] + scriptfn
+  end
   errno =
     if options[:json]
       # capture output
       err = 0
-
-
-
-
-
-
+      if options[:dry_run]
+        info.call("Would have invoked: ",cmd)
+      elsif options[:parallel]
+        info.call("Add parallel job: ",cmd)
+        parallel_cmds << cmd
+      else
+        err = execute.call(cmd)
       end
       err
     else
-
-
-
+      if options[:dry_run]
+        info.call("Would have invoked ",cmd)
+        0
+      else
+        debug.call("Invoking ",cmd) if options[:debug]
+        system(cmd)
+        $?.exitstatus
+      end
     end
   if errno != 0
     debug.call "Gemma exit ",errno
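Reviewer note on `compute_hash` in the hunk above: the cache key is a SHA1 over the GEMMA switches, where every argument that names an existing file is replaced by the SHA1 of that file's contents. The sketch below illustrates that idea in isolation. The name `input_hash` is hypothetical, and it uses `Digest::SHA1.file` (which digests the file contents) where the wrapper calls `Digest::SHA1.hexdigest(File.read(item))`.

```ruby
require 'digest/sha1'

# Hypothetical helper (not part of gemma-wrapper): file arguments contribute
# the SHA1 of their contents, all other switches contribute their literal text.
def input_hash(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.file(item).hexdigest : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

A consequence of this design is that renaming or moving an input file does not invalidate the cache, while changing its contents or any switch does.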
@@ -280,11 +414,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
   end
 end
 
+# Takes the hash value and checks whether the (output) file exists
 # returns datafn, logfn, cache_hit
-cache = lambda do | chr, ext |
+cache = lambda do | chr, ext, h=HASH, permutation=0 |
   inject = (chr==nil ? "" : ".#{chr}" )+ext
-  hashi = (chr==nil ?
-  prefix = options[:cache_dir]+'/'+hashi
+  hashi = (chr==nil ? h : h+inject)
+  prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
+  # for chr 3 and permutation 1 forms something like
+  # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
   logfn = prefix+".log.txt"
   datafn = prefix+ext
   record[:files] ||= []
@@ -320,25 +457,32 @@ kinship = lambda do | chr = nil |
 end
 
 # ---- Run GWA
-gwas = lambda do | chr, kfn, pfn |
+gwas = lambda do | chr, kfn, pfn, permutation=0 |
   record[:type] = "GWA"
   error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
-
+  # Update hash for each permutation
+  hash = compute_hash.call(pfn)
+  hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
   if not cache_hit
     args = [ '-k', kfn, '-o', hashi ]
     args << [ '-loco', chr ] if chr != nil
     args << [ '-p', pfn ] if pfn
-    invoke_gemma.call args
+    invoke_gemma.call args,false,chr,permutation
   end
 end
 
 LOCO = options[:loco]
-
+if LOCO
+  if options[:chromosomes]
+    CHROMOSOMES = options[:chromosomes]
+  end
+end
+
 if DO_COMPUTE_KINSHIP
   # compute K
-  info.call
-  if LOCO
-
+  info.call CHROMOSOMES
+  if LOCO
+    CHROMOSOMES.each do |chr|
       info.call "LOCO for ",chr
       kinship.call(chr)
     end
@@ -347,13 +491,24 @@ if DO_COMPUTE_KINSHIP
   end
 else
   # DO_COMPUTE_GWA
-
+  begin
+    json_in = JSON.parse(File.read(options[:input]))
+  rescue TypeError
+    raise "Missing JSON input file?"
+  end
   raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
 
   pfn = options[:permute_phenotypes] # can be nil
-
-
-
+  if LOCO
+    k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
+    k_files.each do | chr, kfn | # call a GWA for each chromosome
+      gwas.call(chr,kfn,pfn)
+    end
+  else
+    kfn = json_in["files"][0][2]
+    CHROMOSOMES.each do | chr |
+      gwas.call(chr,kfn,pfn)
+    end
   end
   # Permute
   if options[:permutate]
@@ -364,10 +519,10 @@ else
     end
     score_list = []
     debug.call(options[:permutate],"x permutations")
-    (1..options[:permutate]).each do |
-      $stderr.print "Iteration ",
+    (1..options[:permutate]).each do |permutation|
+      $stderr.print "Iteration ",permutation,"\n"
       # Create a shuffled phenotype file
-      file = File.open("phenotypes-#{
+      file = File.open("phenotypes-#{permutation}","w")
       tmp_pfn = file.path
       p tmp_pfn
       ps.shuffle.each do | l |
@@ -375,20 +530,23 @@ else
       end
       file.close
       k_files.each do | chr, kfn | # call a GWA for each chromosome
-        gwas.call(chr,kfn,tmp_pfn)
+        gwas.call(chr,kfn,tmp_pfn,permutation)
       end
-      # p [:HEY,record[:files].last]
-      assocfn = record[:files].last[2]
-      debug.call("Reading ",assocfn)
       score_min = 1000.0
-
-
-
-
+      if false and not options[:slurm]
+        # p [:HEY,record[:files].last]
+        assocfn = record[:files].last[2]
+        debug.call("Reading ",assocfn)
+        File.foreach(assocfn).with_index do |assoc, assoc_line_num|
+          if assoc_line_num > 0
+            value = assoc.strip.split(/\t/).last.to_f
+            score_min = value if value < score_min
+          end
         end
       end
       score_list << score_min
     end
+    exit 0 if options[:slurm]
     ls = score_list.sort
     p ls
     significant = ls[(ls.size - ls.size*0.95).floor]
@@ -399,5 +557,40 @@ else
   end
 end
 
+# ---- Invoke parallel
+if options[:parallel]
+  # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
+  cmd = parallel_cmds.join("\\n")
+
+  cmd = "echo -e \"#{cmd}\""
+  err = execute.call(cmd+"|parallel --joblog #{joblog}") # first try optimistically to run all jobs in parallel
+  if err != 0
+    [16,8,4,1].each do |jobs|
+      info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
+      err = execute.call(cmd+"|parallel -j #{jobs} --resume --joblog #{joblog}")
+      break if err == 0
+    end
+    if err != 0
+      info.call("Run failed!")
+      # Remove remaining files
+      FileUtils.rm_rf("#{tmpdir}/*", secure: true)
+      exit err
+    end
+  end
+  info.call("Run successful!")
+end
 json_out.call
-
+
+# copy all output files to the cache_dir. If a file exists only emit a warning
+Dir.glob("*.txt", base: tmpdir) do | fn |
+  source = tmpdir + "/" + fn
+  dest = options[:cache_dir] + "/" + fn
+  if not File.exist?(dest) or options[:force]
+    info.call "Move #{source} to #{dest}"
+    FileUtils.mv source, dest, verbose: false
+  else
+    warning.call "File #{dest} already exists. Not overwriting"
+  end
+end
+
+end # tmpdir
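Reviewer note on the "Invoke parallel" block in the hunk above: the wrapper first runs all jobs through GNU parallel without a job limit and, on a non-zero exit status, retries with `-j 16`, `8`, `4` and `1` using `--resume` and the same `--joblog`. The control flow can be sketched independently of GNU parallel as below. The name `run_with_fallback` and the callable `run` are hypothetical; in the wrapper the equivalent of `run` is the `execute` lambda applied to the piped `parallel` command.

```ruby
# Hypothetical retry helper (not part of gemma-wrapper). `run` is any
# callable that takes a job count (nil meaning "no limit") and returns an
# exit status, where 0 means success.
def run_with_fallback(run, job_counts = [16, 8, 4, 1])
  status = run.call(nil) # optimistic first attempt without a job limit
  job_counts.each do |jobs|
    break if status == 0
    status = run.call(jobs) # retry with fewer simultaneous jobs
  end
  status
end
```

The caller receives the last exit status, so a non-zero return means that even the most conservative setting failed.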
data/gemma-wrapper.gemspec
CHANGED
@@ -2,10 +2,11 @@ Gem::Specification.new do |s|
   s.name = 'bio-gemma-wrapper'
   s.version = File.read('VERSION')
   s.summary = "GEMMA with LOCO and permutations"
-  s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
+  s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
   s.authors = ["Pjotr Prins"]
   s.email = 'pjotr.public01@thebird.nl'
   s.files = ["bin/gemma-wrapper",
+             "lib/lock.rb",
              "Gemfile",
              "LICENSE.txt",
              "README.md",
data/lib/lock.rb
ADDED
@@ -0,0 +1,95 @@
+# Locking module for gemma (wrapper)
+#
+
+=begin
+
+The logic is as follows:
+
+1. a program creates a named lock file (based on a hash of its inputs) with its PID
+2. on exit it destroys the file
+3. a new program checks for the lock file
+4. if it exists and the PID is still in the ps table - wait
+5. when the pid disappears or the lock file - continue
+6. a timeout will return an error in 3 minutes
+
+Note that there is a theoretical chance the lock file existed, but disappeared. I think I have it covered by ignoring the unlink errors. Also the use of /proc/PID is Linux specific.
+
+=end
+
+
+require 'timeout'
+
+module Lock
+
+  def self.local name
+    ENV['HOME']+"/."+name.gsub("/","-")+".lck"
+  end
+
+  def self.lock_pid name
+    lockfn = local(name)
+    if File.exist?(lockfn)
+      File.read(lockfn).to_i
+    else
+      0
+    end
+  end
+
+  def self.locked? name
+    lockfn = local(name)
+    pid = lock_pid(name)
+    if File.exist?("/proc/#{pid}")
+      true
+    else
+      # the program went away - remove any 'stale' lock
+      begin
+        File.unlink(lockfn)
+      rescue Errno::ENOENT
+        # ignore error when the lock file went missing
+      end
+      false # --> no longer locked
+    end
+  end
+
+  def Lock::create name
+    wait_for(name)
+    lockfn = local(name)
+    if File.exist?(lockfn)
+      $stderr.print "\nERROR: Can not steal #{lockfn}"
+      exit 1
+    end
+    File.open(lockfn, File::RDWR|File::CREAT, 0644) do |f|
+      f.flock(File::LOCK_EX)
+      f.print(Process.pid)
+    end
+  end
+
+  def Lock::wait_for name
+    lockfn = local(name)
+    begin
+      status = Timeout::timeout(180) { # 3 minutes
+        while locked?(name)
+          $stderr.print("\nWaiting for lock #{lockfn}...")
+          sleep 2
+        end
+      }
+    rescue Timeout::Error
+      $stderr.print "\nERROR: Timed out, but I can not steal #{lockfn}"
+      exit 1
+    end
+    # yah! lock is released
+  end
+
+  def Lock::release name
+    lockfn = local(name)
+    if Process.pid == lock_pid(name)
+      begin
+        File.unlink(lockfn) # PID expired
+      rescue Errno::ENOENT
+        # ignore error when the lock file went missing
+      end
+    else
+      $stderr.print "\nERROR: can not release #{lockfn} because it is not owned by me"
+    end
+  end
+
+end
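Reviewer note on `lib/lock.rb` above: its header comment points out that the `/proc/PID` check is Linux specific. The sketch below shows one possible portable variant of the same PID lock file idea; it is not the gem's code. The module name `PidLock`, its methods and the explicit `dir` parameter are all hypothetical. It swaps in two techniques that the original does not use: `File::EXCL` so that creating the lock file is atomic, and `Process.kill(0, pid)` to test whether the owning process still exists.

```ruby
# Hypothetical PidLock (not part of gemma-wrapper): a PID lock file with an
# atomic create and a liveness check that does not depend on /proc.
module PidLock
  def self.path(dir, name)
    File.join(dir, "." + name.gsub("/", "-") + ".lck")
  end

  def self.alive?(pid)
    return false if pid <= 0
    Process.kill(0, pid) # signal 0 only checks that the process exists
    true
  rescue Errno::ESRCH
    false                # no such process
  rescue Errno::EPERM
    true                 # process exists but belongs to another user
  end

  # Returns true when the lock was acquired, false when a live process holds it.
  # Limitation: a lock file that exists but has no PID written yet reads as 0
  # and is treated as stale.
  def self.acquire(dir, name)
    fn = path(dir, name)
    2.times do
      begin
        File.open(fn, File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
          f.print(Process.pid)
        end
        return true
      rescue Errno::EEXIST
        owner = begin
          File.read(fn).to_i
        rescue Errno::ENOENT
          0
        end
        return false if alive?(owner)
        begin
          File.unlink(fn) # stale lock: the owner is gone
        rescue Errno::ENOENT
          # another process already removed it
        end
      end
    end
    false
  end

  # Removes the lock file only when this process owns it.
  def self.release(dir, name)
    fn = path(dir, name)
    File.unlink(fn) if File.exist?(fn) && File.read(fn).to_i == Process.pid
  end
end
```

Unlike `Lock.create`, this sketch does not wait for a held lock; a caller that wants the three-minute wait described in `lock.rb` would poll `acquire` inside a timeout.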
metadata
CHANGED
@@ -1,17 +1,17 @@
 --- !ruby/object:Gem::Specification
 name: bio-gemma-wrapper
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.99.4
 platform: ruby
 authors:
 - Pjotr Prins
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-11-25 00:00:00.000000000 Z
 dependencies: []
-description: GEMMA wrapper adds LOCO and permutation support. Also
-  runs with LOCO support
+description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
+  and caches K between runs with LOCO support
 email: pjotr.public01@thebird.nl
 executables:
 - gemma-wrapper
@@ -24,6 +24,7 @@ files:
 - VERSION
 - bin/gemma-wrapper
 - gemma-wrapper.gemspec
+- lib/lock.rb
 homepage: https://github.com/genetics-statistics/gemma-wrapper
 licenses:
 - GPL3
@@ -43,8 +44,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-rubygems_version: 2.6.8
+rubygems_version: 3.1.4
 signing_key:
 specification_version: 4
 summary: GEMMA with LOCO and permutations