bio-gemma-wrapper 0.98.1 → 0.99.4
- checksums.yaml +5 -5
- data/README.md +57 -19
- data/VERSION +1 -1
- data/bin/gemma-wrapper +262 -69
- data/gemma-wrapper.gemspec +2 -1
- data/lib/lock.rb +95 -0
- metadata +6 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
-
-  metadata.gz:
-  data.tar.gz:
+SHA256:
+  metadata.gz: da5f26b8acd9c3782c2b3f5f2a39af965fc7e1785cc820b49faca82924d74e51
+  data.tar.gz: 17035ee5fada269ae88dd0ed91d84075b2af88b400de1d0e9829cbdb60d5d0cb
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eaec3c7dad4fc1bda713765e056bfe11dd69d4ca850333fed5a1a27e344724365a705ddf7845ce63b5af6b35ab6140da10f4bc7067aaa4539e47f6c6f94de1f0
+  data.tar.gz: c26b282c0fd7c70a702467e58c3f6ea22f820d91a8a364b335cdab7e807add9cf1079faa25c46a87b89c78e4293990cdea2210427c5f9c3565bd5040fdbef496
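Note on the checksums above: a downloaded `metadata.gz` or `data.tar.gz` can be compared against these values with Ruby's standard `Digest` classes. The following is a minimal sketch; the helper name `digest_matches?` is hypothetical and is not part of the gem.

```ruby
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper: true when the file's digest equals the expected hex string.
# Pass Digest::SHA512 as the third argument to check the SHA512 entries instead.
def digest_matches?(path, expected_hex, algorithm = Digest::SHA256)
  algorithm.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `digest_matches?('data.tar.gz', '17035ee5...')` with the full SHA256 string from the hunk above would be expected to return true for the 0.99.4 archive, assuming the file was unpacked from that gem.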
data/README.md
CHANGED
@@ -1,12 +1,20 @@
 [](https://badge.fury.io/rb/bio-gemma-wrapper)
 
-# GEMMA
+# GEMMA with LOCO, permutations and slurm support (and caching)
 
 
 
 ## Introduction
 
+Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
+GEMMA in parallel (now the default with LOCO), and GEMMA on
+PBS. Gemma-wrapper is used to run GEMMA as part of the
+https://genenetwork.org/ environment.
+
+Note that a version of gemma-wrapper is projected to be integrated
+into gemma itself.
+
 GEMMA is a software toolkit for fast application of linear mixed
 models (LMMs) and related models to genome-wide association studies
 (GWAS) and other large-scale data sets.
@@ -14,15 +22,21 @@ models (LMMs) and related models to genome-wide association studies
 This repository contains gemma-wrapper, essentially a wrapper of
 GEMMA that provides support for caching the kinship or relatedness
 matrix (K) and caching LM and LMM computations with the option of full
-leave-one-chromosome-out genome scans (LOCO).
+leave-one-chromosome-out genome scans (LOCO). Jobs can also be
+submitted to HPC PBS, i.e., slurm.
 
 gemma-wrapper requires a recent version of GEMMA and essentially
 does a pass-through of all standard GEMMA invocation switches. On
 return gemma-wrapper can return a JSON object (--json) which is
 useful for web-services.
 
-
-
+## Performance
+
+LOCO runs in parallel by default which is at least a 5x performance
+improvement on a machine with enough cores. GEMMA without LOCO,
+however, does not run in parallel by default. Performance
+improvements with the parallel implementation for LOCO and non-LOCO
+can be viewed [here](./test/performance/releases.gmi).
 
 ## Installation
 
@@ -32,8 +46,9 @@ Prerequisites are
 * Standard [Ruby >2.0 ](https://www.ruby-lang.org/en/) which comes on
 almost all Linux systems
 
-gemma-wrapper comes as a Ruby
-can be
+gemma-wrapper comes as a Ruby
+[gem](https://rubygems.org/gems/bio-gemma-wrapper) and can be
+installed with
 
 gem install bio-gemma-wrapper
 
@@ -47,14 +62,19 @@ and it will render something like
 Usage: gemma-wrapper [options] -- [gemma-options]
 --permutate n Permutate # times by shuffling phenotypes
 --permute-phenotypes filen Phenotypes to be shuffled in permutations
---loco
+--loco Run full leave-one-chromosome-out (LOCO)
+--chromosomes [1,2,3] Run specific chromosomes
 --input filen JSON input variables (used for LOCO)
 --cache-dir path Use a cache directory
 --json Create output file in JSON format
---force Force computation
+--force Force computation (override cache)
+--parallel Run jobs in parallel
+--no-parallel Do not run jobs in parallel
+--slurm[=opts] Use slurm PBS for submitting jobs
 --q, --quiet Run quietly
 -v, --verbose Run verbosely
-
+-d, --debug Show debug messages and keep intermediate output
+--dry-run Show commands, but don't execute
 -- Anything after gets passed to GEMMA
 
 -h, --help display this help and exit
@@ -69,6 +89,8 @@ Unpack it and run the tool as
 
 ./bin/gemma-wrapper --help
 
+See below for using a GNU Guix environment.
+
 ## Usage
 
 gemma-wrapper picks up GEMMA from the PATH. To override that behaviour
@@ -90,12 +112,13 @@ the data files are found):
 gemma-wrapper -- \
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
+-a test/data/input/BXD_snps.txt \
 -gk \
 -debug
 
 Run it twice to see
 
-/tmp/
+/tmp/0bdd7add5e8f7d9af36b283d0341c115124273e0.log.txt CACHE HIT!
 
 gemma-wrapper computes the unique HASH value over the command
 line switches passed into GEMMA as well as the contents of the files
@@ -107,10 +130,12 @@ You can also get JSON output on STDOUT by providing the --json switch
 gemma-wrapper --json -- \
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
+-a test/data/input/BXD_snps.txt \
 -gk \
--debug
+-debug > K.json
 
-
+K.json is something that can be parsed with a calling program, and is
+also below as input for the GWA step. Example:
 
 ```json
 {"warnings":[],"errno":0,"debug":[],"type":"K","files":[["/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.log.txt","/tmp/18ce786ab92064a7ee38a7422e7838abf91f5eb0.cXX.txt"]],"cache_hit":true,"gemma_command":"../gemma/bin/gemma -g test/data/input/BXD_geno.txt.gz -p test/data/input/BXD_pheno.txt -gk -debug -outdir /tmp -o 18ce786ab92064a7ee38a7422e7838abf91f5eb0"}
@@ -123,6 +148,7 @@ default. If you want something else provide a --cache-dir, e.g.
 gemma-wrapper --cache-dir ~/.gemma-cache -- \
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
+-a test/data/input/BXD_snps.txt \
 -gk \
 -debug
 
@@ -130,10 +156,10 @@ will store K in ~/.gemma-cache.
 
 ### GWA
 
-Run the LMM using the K's captured in K.json using the --input
+Run the LMM using the K's captured earlier in K.json using the --input
 switch
 
-gemma-wrapper --json --
+gemma-wrapper --json --input K.json -- \
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
 -c test/data/input/BXD_covariates2.txt \
@@ -153,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
 chromosomes first create all K's with
 
 gemma-wrapper --json \
---loco
+--loco -- \
 -g test/data/input/BXD_geno.txt.gz \
 -p test/data/input/BXD_pheno.txt \
 -a test/data/input/BXD_snps.txt \
@@ -201,12 +227,24 @@ Next, using K.json, permute the phenotypes with something like
 -lmm 2 -maf 0.1 \
 -debug > GWA.json
 
-This should get the 95% significant and 67% suggestive thresholds:
+This should get the estimated 95% (significant) and 67% (suggestive) thresholds:
+
+["95 percentile (significant) ", 1.92081e-05, 4.7]
+["67 percentile (suggestive) ", 5.227785e-05, 4.3]
+
+### Slurm PBS
 
-
-["67 percentile (suggestive) ", 2.015475e-05, 4.7]
+To run gemma-wrapper on HPC use the '--slurm' switch.
 
+## Development
+
+We use GNU Guix for development and deployment. Use the [.guix-deploy](.guix-deploy) script in the checked out git repo:
+
+```
+source .guix-deploy
+ruby bin/gemma-wrapper --help
+```
 
 ## Copyright
 
-Copyright (c) 2017
+Copyright (c) 2017-2021 Pjotr Prins. See [LICENSE.txt](LICENSE.txt) for further details.
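Note on the README threshold output above: the third number in each row appears to be the p-value on a -log10 scale (this reading is an assumption, but the arithmetic agrees with both rows). The sketch below reproduces that conversion and the index expression `ls[(ls.size - ls.size*0.95).floor]` that appears in bin/gemma-wrapper. The helper names are hypothetical.

```ruby
# Hypothetical helper mirroring the index arithmetic used in bin/gemma-wrapper:
# sort the per-permutation minimum p-values and pick the entry at the
# (1 - fraction) position.
def permutation_threshold(min_pvalues, fraction)
  ls = min_pvalues.sort
  ls[(ls.size - ls.size * fraction).floor]
end

# Convert a p-value to the -log10 scale, rounded to one decimal.
def neg_log10(p_value)
  (-Math.log10(p_value)).round(1)
end
```

With the README values, `neg_log10(1.92081e-05)` gives 4.7 and `neg_log10(5.227785e-05)` gives 4.3, matching the reported rows.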
data/VERSION
CHANGED
@@ -1 +1 @@
-0.
+0.99.4
data/bin/gemma-wrapper
CHANGED
@@ -4,7 +4,7 @@
 # Author:: Pjotr Prins
 # License:: GPL3
 #
-# Copyright (C) 2017
+# Copyright (C) 2017-2021 Pjotr Prins <pjotr.prins@thebird.nl>
 
 USAGE = "
 GEMMA wrapper example:
@@ -14,12 +14,12 @@ GEMMA wrapper example:
 gemma-wrapper -- \\
 -g test/data/input/BXD_geno.txt.gz \\
 -p test/data/input/BXD_pheno.txt \\
+-a test/data/input/BXD_snps.txt \
 -gk
 
 LOCO K computation with caching and JSON output
 
-gemma-wrapper --json \\
---loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
+gemma-wrapper --json --loco -- \\
 -g test/data/input/BXD_geno.txt.gz \\
 -p test/data/input/BXD_pheno.txt \\
 -a test/data/input/BXD_snps.txt \\
@@ -38,11 +38,10 @@ GEMMA wrapper example:
 Gemma gets used from the path. You can override by setting
 
 env GEMMA_COMMAND=path/bin/gemma gemma-wrapper ...
-
 "
 # These are used for testing compatibility with the gemma tool
 GEMMA_V_MAJOR = 98
-GEMMA_V_MINOR =
+GEMMA_V_MINOR = 4
 
 basepath = File.dirname(File.dirname(__FILE__))
 $: << File.join(basepath,'lib')
@@ -66,17 +65,21 @@ if not gemma_command
 end
 
 
+require 'digest/sha1'
 require 'fileutils'
 require 'optparse'
-require 'tmpdir'
 require 'tempfile'
+require 'tmpdir'
+
+require 'lock'
 
 split_at = ARGV.index('--')
+
 if split_at
   gemma_args = ARGV[split_at+1..-1]
 end
 
-options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir() }
+options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
 
 opts = OptionParser.new do |o|
   o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +94,12 @@ opts = OptionParser.new do |o|
     raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
   end
 
-  o.on('--loco
-    options[:loco] =
+  o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
+    options[:loco] = b
+  end
+
+  o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
+    options[:chromosomes] = lst
   end
 
   o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +119,22 @@ opts = OptionParser.new do |o|
     options[:force] = true
   end
 
+  o.on("--parallel", "Run jobs in parallel") do |b|
+    options[:parallel] = true
+  end
+
+  o.on("--no-parallel", "Do not run jobs in parallel") do |b|
+    options[:parallel] = false
+  end
+
+  o.on("--slurm[=opts]",String,"Use slurm PBS for submitting jobs") do |slurm|
+    options[:slurm_opts] = ""
+    options[:slurm] = true
+    if slurm
+      options[:slurm_opts] = slurm
+    end
+  end
+
   o.on("--q", "--quiet", "Run quietly") do |q|
     options[:quiet] = true
   end
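Note on the option handling above: `options[:parallel]` starts as `nil`, so the script can distinguish "not specified" from an explicit `--parallel` or `--no-parallel`, and later turns parallel mode on only when `--loco` was given. The sketch below models just that interaction. It uses OptionParser's built-in `--[no-]flag` form instead of two separate switches, which is a substitution and not what the diff does; the function name and the reduced option set are hypothetical.

```ruby
require 'optparse'

# Hypothetical, reduced option set: only the LOCO/parallel interaction is modeled.
def parse_parallel_options(argv)
  options = { loco: false, parallel: nil } # nil means "user did not say"
  OptionParser.new do |o|
    o.on('--loco') { options[:loco] = true }
    o.on('--[no-]parallel') { |flag| options[:parallel] = flag }
  end.parse!(argv.dup)
  # LOCO defaults to parallel unless the user chose explicitly
  options[:parallel] = true if options[:parallel].nil? && options[:loco]
  options
end
```

Under these assumptions, `parse_parallel_options(['--loco'])` enables parallel mode, while `parse_parallel_options(['--loco', '--no-parallel'])` keeps it off.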
@@ -120,15 +143,20 @@ opts = OptionParser.new do |o|
     options[:verbose] = true
   end
 
-  o.on("--debug", "Show debug messages and keep intermediate output") do |v|
+  o.on("-d", "--debug", "Show debug messages and keep intermediate output") do |v|
     options[:debug] = true
   end
 
+  o.on("--dry-run", "Show commands, but don't execute") do |b|
+    options[:dry_run] = b
+  end
+
   o.on('--','Anything after gets passed to GEMMA') do
     o.terminate()
   end
 
   o.separator ""
+
   o.on_tail('-h', '--help', 'display this help and exit') do
     options[:show_help] = true
   end
@@ -168,26 +196,46 @@ warning = lambda do |*msg|
   record[:warnings].push *msg.join("")
   OUTPUT.print "WARNING: ",*msg,"\n"
 end
+
 info = lambda do |*msg|
   record[:debug].push *msg.join("") if options[:json] and options[:debug]
   OUTPUT.print *msg,"\n" if !options[:quiet]
 end
 
+# Fetch chromosomes
+def get_chromosomes annofn
+  h = {}
+  File.open(annofn,"r").each_line do | line |
+    chr = line.split(/\s+/)[2]
+    h[chr] = true
+  end
+  h.map { |k,v| k }
+end
 # ---- Start banner
 
 GEMMA_K_VERSION=version
-GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017
+GEMMA_K_BANNER = "gemma-wrapper #{version} (Ruby #{RUBY_VERSION}) by Pjotr Prins 2017-2021\n"
 info.call GEMMA_K_BANNER
 
 # Check gemma version
-
+begin
+  gemma_command2 = options[:gemma_command]
+  info.call "NOTE: gemma-wrapper is soon to be replaced by gemma2/lib"
+
+  GEMMA_INFO = `#{gemma_command2}`
+rescue Errno::ENOENT
+  gemma_command2 = "gemma"
+  error.call "<#{gemma_command2}> command not found"
+end
 
-gemma_version_header =
+gemma_version_header = GEMMA_INFO.split("\n").grep(/GEMMA|Version/)[0].strip
 info.call "Using ",gemma_version_header,"\n"
 gemma_version = gemma_version_header.split(/[,\s]+/)[1]
 v_version, v_major, v_minor = gemma_version.split(".")
 info.call "Found #{gemma_version}, comparing against expected v0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}"
 
+info.call gemma_version_header
+
 warning.call "GEMMA version is out of date. Update GEMMA to 0.#{GEMMA_V_MAJOR}.#{GEMMA_V_MINOR}!" if v_major.to_i < GEMMA_V_MAJOR or (v_major.to_i == GEMMA_V_MAJOR and (v_minor != nil and v_minor.to_i < GEMMA_V_MINOR))
 
 options[:gemma_version_header] = gemma_version_header
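Note on `get_chromosomes` in the hunk above: it collects the distinct values of the third whitespace-separated field of the SNP annotation file. A compact variant is sketched below, assuming an annotation layout of `name, position, chromosome` per line. The function name is hypothetical, and unlike the diffed code it also splits on commas and skips blank lines.

```ruby
require 'tempfile'

# Hypothetical variant of get_chromosomes: distinct third-column values of a
# SNP annotation file, in order of first appearance.
def chromosomes_in(annofn)
  File.foreach(annofn)
      .map { |line| line.strip.split(/[,\s]+/)[2] } # third field, nil if absent
      .compact                                       # drop blank or short lines
      .uniq
end
```

The result is an array of chromosome names as strings, which is the shape the LOCO loop iterates over.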
@@ -203,74 +251,160 @@ if RUBY_VERSION =~ /^1/
   warning "runs on Ruby 2.x only\n"
 end
 
+# ---- LOCO defaults to parallel
+if options[:parallel] == nil
+  options[:parallel] = true if options[:loco]
+end
+
+debug.call(options) # some debug output
+debug.call(record)
+
 DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
 DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
 
+if options[:parallel]
+  begin
+    skip_cite = `echo "will cite" |parallel --citation`
+    debug.call(skip_cite)
+    PARALLEL_INFO = `parallel --help`
+  rescue Errno::ENOENT
+    error.call "<parallel> command not found"
+  end
+  parallel_cmds = []
+end
+
+# ---- Fetch chromosomes from SNP annotation file
+anno_idx = gemma_args.index '-a'
+raise "Expected GEMMA -a genotype file switch" if anno_idx == nil
+CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
+
 # ---- Compute HASH on inputs
 hashme = []
 geno_idx = gemma_args.index '-g'
 raise "Expected GEMMA -g genotype file switch" if geno_idx == nil
 pheno_idx = gemma_args.index '-p'
-hashme =
-  if DO_COMPUTE_KINSHIP and pheno_idx != nil
-    # Remove the phenotype file from the hash
-    gemma_args[0..pheno_idx-1] + gemma_args[pheno_idx+2..-1]
-  else
-    gemma_args
-  end
 
 if DO_COMPUTE_GWA and options[:permute_phenotypes]
   raise "Did not expect GEMMA -p phenotype whith permutations (only use --permutate-phenotypes)" if pheno_idx
-  hashme += ['-p', options[:permute_phenotypes]]
 end
 
-
-
-
-
-
-
-
+execute = lambda { |cmd|
+  info.call("Executing: #{cmd}")
+  err = 0
+  if not options[:debug]
+    # send output to stderr line by line
+    IO.popen("#{cmd}") do |io|
+      while s = io.gets
+        $stderr.print s
+      end
+      io.close
+      err = $?.to_i
+    end
   else
-
+    $stderr.print `#{cmd}`
+    err = $?.to_i
+  end
+  err
+}
+
+compute_hash = lambda do | phenofn = nil |
+  # Compute a HASH on the inputs
+  debug.call "Hashing on ",hashme,"\n"
+  hashes = []
+  hm = if phenofn
+    hashme + ["-p", phenofn]
+  else
+    hashme
+  end
+  debug.call(hm)
+  hm.each do | item |
+    if File.file?(item)
+      hashes << Digest::SHA1.hexdigest(File.read(item))
+      debug.call [item,hashes.last]
+    else
+      hashes << item
+    end
   end
+  debug.call(hashes)
+  Digest::SHA1.hexdigest hashes.join(' ')
 end
-HASH = Digest::SHA1.hexdigest hashes.join(' ')
 
+HASH = compute_hash.call()
 options[:hash] = HASH
 
+at_exit do
+  Lock.release(HASH)
+end
+
+Lock.create(HASH) # this will wait for a lock to expire
+
+joblog = options[:cache_dir]+"/"+HASH+"-parallel.log"
+
 # Create cache dir
 FileUtils::mkdir_p options[:cache_dir]
 
+Dir.mktmpdir do |tmpdir| # tmpdir for GEMMA output
+
 error.call "Do not use the GEMMA -o switch!" if gemma_args.include? '-o'
 error.call "Do not use the GEMMA -outdir switch!" if gemma_args.include? '-outdir'
+GEMMA_ARGS_HASH = gemma_args.dup # do not include outdir
 gemma_args << '-outdir'
-gemma_args <<
+gemma_args << tmpdir
 GEMMA_ARGS = gemma_args
 
+hashme =
+  if DO_COMPUTE_KINSHIP and pheno_idx != nil
+    # Remove the phenotype file from the hash for GRM computation
+    GEMMA_ARGS_HASH[0..pheno_idx-1] + GEMMA_ARGS_HASH[pheno_idx+2..-1]
+  else
+    GEMMA_ARGS_HASH
+  end
+
 debug.call "Options: ",options,"\n" if !options[:quiet]
 
-invoke_gemma = lambda do |extra_args, cache_hit = false|
-  cmd="#{
+invoke_gemma = lambda do |extra_args, cache_hit = false, chr = "full", permutation = 1|
+  cmd = "#{gemma_command2} #{GEMMA_ARGS.join(' ')} #{extra_args.join(' ')}"
   record[:gemma_command] = cmd
   return if cache_hit
-
+  if options[:slurm]
+    info.call cmd
+    hashi = HASH
+    prefix = tmpdir+'/'+hashi
+    scriptfn = prefix+".#{chr}.#{permutation}-pbs.sh"
+    script = "#!/bin/bash
+#SBATCH --job-name=gemma-#{scriptfn}
+#SBATCH --ntasks=1
+#SBATCH --time=20:00
+srun #{cmd}
+"
+    debug.call(script)
+    File.open(scriptfn,"w") { |f|
+      f.write(script)
+    }
+    cmd = "sbatch "+options[:slurm_opts] + scriptfn
+  end
   errno =
     if options[:json]
       # capture output
       err = 0
-
-
-
-
-
-
+      if options[:dry_run]
+        info.call("Would have invoked: ",cmd)
+      elsif options[:parallel]
+        info.call("Add parallel job: ",cmd)
+        parallel_cmds << cmd
+      else
+        err = execute.call(cmd)
       end
       err
     else
-
-
-
+      if options[:dry_run]
+        info.call("Would have invoked ",cmd)
+        0
+      else
+        debug.call("Invoking ",cmd) if options[:debug]
+        system(cmd)
+        $?.exitstatus
+      end
     end
   if errno != 0
     debug.call "Gemma exit ",errno
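Note on `compute_hash` in the hunk above: the cache key is a SHA1 over the GEMMA argument list in which every argument that names an existing file is replaced by the SHA1 of that file's contents. The result is that renaming an input does not invalidate the cache, while editing it does. A standalone sketch of that idea follows; the function name `cache_key` is hypothetical.

```ruby
require 'digest/sha1'
require 'tmpdir'

# Hypothetical standalone version of the hashing step: file arguments
# contribute their content digest, all other switches contribute literally.
def cache_key(args)
  parts = args.map do |item|
    File.file?(item) ? Digest::SHA1.hexdigest(File.read(item)) : item
  end
  Digest::SHA1.hexdigest(parts.join(' '))
end
```

For kinship (`-gk`) runs the diffed code first drops `-p` and its file name from the argument list, so a different phenotype file still reuses the same cached K.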
@@ -280,11 +414,14 @@ invoke_gemma = lambda do |extra_args, cache_hit = false|
   end
 end
 
+# Takes the hash value and checks whether the (output) file exists
 # returns datafn, logfn, cache_hit
-cache = lambda do | chr, ext |
+cache = lambda do | chr, ext, h=HASH, permutation=0 |
   inject = (chr==nil ? "" : ".#{chr}" )+ext
-  hashi = (chr==nil ?
-  prefix = options[:cache_dir]+'/'+hashi
+  hashi = (chr==nil ? h : h+inject)
+  prefix = options[:cache_dir]+'/'+hashi+(permutation!=0 ? "."+permutation.to_s : "")
+  # for chr 3 and permutation 1 forms something like
+  # /tmp/1b700-a996f.3.cXX.txt.1.log.txt
   logfn = prefix+".log.txt"
   datafn = prefix+ext
   record[:files] ||= []
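Note on the `cache` lambda above: the file names follow directly from string concatenation of the cache directory, the hash, an optional chromosome, the extension, and an optional permutation number. The sketch below reproduces the visible lines as a pure function so the naming can be checked in isolation; the function name and the returned hash keys are hypothetical.

```ruby
# Hypothetical pure-function version of the naming logic shown in the diff.
def cache_paths(cache_dir, hash, chr, ext, permutation = 0)
  inject = (chr.nil? ? '' : ".#{chr}") + ext
  hashi  = chr.nil? ? hash : hash + inject
  prefix = File.join(cache_dir, hashi) + (permutation != 0 ? ".#{permutation}" : '')
  { log: prefix + '.log.txt', data: prefix + ext }
end
```

For chromosome 3 and permutation 1 this yields the log name given in the code comment, `/tmp/1b700-a996f.3.cXX.txt.1.log.txt`. Without a chromosome it yields the `HASH.log.txt` and `HASH.cXX.txt` pair seen in the README JSON example.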
@@ -320,25 +457,32 @@ kinship = lambda do | chr = nil |
 end
 
 # ---- Run GWA
-gwas = lambda do | chr, kfn, pfn |
+gwas = lambda do | chr, kfn, pfn, permutation=0 |
   record[:type] = "GWA"
   error.call "Do not use the GEMMA -k switch with gemma-wrapper - it is automatic!" if GEMMA_ARGS.include? '-k' # K is automatic
-
+  # Update hash for each permutation
+  hash = compute_hash.call(pfn)
+  hashi, cache_hit = cache.call(chr,".assoc.txt",hash,permutation)
   if not cache_hit
     args = [ '-k', kfn, '-o', hashi ]
     args << [ '-loco', chr ] if chr != nil
     args << [ '-p', pfn ] if pfn
-    invoke_gemma.call args
+    invoke_gemma.call args,false,chr,permutation
   end
 end
 
 LOCO = options[:loco]
-
+if LOCO
+  if options[:chromosomes]
+    CHROMOSOMES = options[:chromosomes]
+  end
+end
+
 if DO_COMPUTE_KINSHIP
   # compute K
-  info.call
-  if LOCO
-
+  info.call CHROMOSOMES
+  if LOCO
+    CHROMOSOMES.each do |chr|
       info.call "LOCO for ",chr
       kinship.call(chr)
     end
@@ -347,13 +491,24 @@ if DO_COMPUTE_KINSHIP
   end
 else
   # DO_COMPUTE_GWA
-
+  begin
+    json_in = JSON.parse(File.read(options[:input]))
+  rescue TypeError
+    raise "Missing JSON input file?"
+  end
   raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
 
   pfn = options[:permute_phenotypes] # can be nil
-
-
-
+  if LOCO
+    k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
+    k_files.each do | chr, kfn | # call a GWA for each chromosome
+      gwas.call(chr,kfn,pfn)
+    end
+  else
+    kfn = json_in["files"][0][2]
+    CHROMOSOMES.each do | chr |
+      gwas.call(chr,kfn,pfn)
+    end
   end
   # Permute
   if options[:permutate]
@@ -364,10 +519,10 @@ else
     end
     score_list = []
     debug.call(options[:permutate],"x permutations")
-    (1..options[:permutate]).each do |
-      $stderr.print "Iteration ",
+    (1..options[:permutate]).each do |permutation|
+      $stderr.print "Iteration ",permutation,"\n"
       # Create a shuffled phenotype file
-      file = File.open("phenotypes-#{
+      file = File.open("phenotypes-#{permutation}","w")
       tmp_pfn = file.path
       p tmp_pfn
       ps.shuffle.each do | l |
@@ -375,20 +530,23 @@ else
       end
       file.close
       k_files.each do | chr, kfn | # call a GWA for each chromosome
-        gwas.call(chr,kfn,tmp_pfn)
+        gwas.call(chr,kfn,tmp_pfn,permutation)
       end
-      # p [:HEY,record[:files].last]
-      assocfn = record[:files].last[2]
-      debug.call("Reading ",assocfn)
       score_min = 1000.0
-
-
-
-
+      if false and not options[:slurm]
+        # p [:HEY,record[:files].last]
+        assocfn = record[:files].last[2]
+        debug.call("Reading ",assocfn)
+        File.foreach(assocfn).with_index do |assoc, assoc_line_num|
+          if assoc_line_num > 0
+            value = assoc.strip.split(/\t/).last.to_f
+            score_min = value if value < score_min
+          end
         end
       end
       score_list << score_min
     end
+    exit 0 if options[:slurm]
     ls = score_list.sort
     p ls
     significant = ls[(ls.size - ls.size*0.95).floor]
@@ -399,5 +557,40 @@ else
   end
 end
 
+# ---- Invoke parallel
+if options[:parallel]
+  # parallel_cmds = ["echo 1","sleep 1 && echo 2", "false", "echo 3"]
+  cmd = parallel_cmds.join("\\n")
+
+  cmd = "echo -e \"#{cmd}\""
+  err = execute.call(cmd+"|parallel --joblog #{joblog}") # first try optimistically to run all jobs in parallel
+  if err != 0
+    [16,8,4,1].each do |jobs|
+      info.call("Failed to complete parallel run -- retrying with smaller RAM footprint!")
+      err = execute.call(cmd+"|parallel -j #{jobs} --resume --joblog #{joblog}")
+      break if err == 0
+    end
+    if err != 0
+      info.call("Run failed!")
+      # Remove remaining files
+      FileUtils.rm_rf("#{tmpdir}/*", secure: true)
+      exit err
+    end
+  end
+  info.call("Run successful!")
+end
 json_out.call
-
+
+# copy all output files to the cache_dir. If a file exists only emit a warning
+Dir.glob("*.txt", base: tmpdir) do | fn |
+  source = tmpdir + "/" + fn
+  dest = options[:cache_dir] + "/" + fn
+  if not File.exist?(dest) or options[:force]
+    info.call "Move #{source} to #{dest}"
+    FileUtils.mv source, dest, verbose: false
+  else
+    warning.call "File #{dest} already exists. Not overwriting"
+  end
+end
+
+end # tmpdir
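Note on the "Invoke parallel" block above: the run is first attempted with no job limit, and on failure it is retried with 16, 8, 4, and finally 1 concurrent jobs, stopping at the first attempt that exits with status 0. The sketch below isolates that control flow from the shell calls. The helper name is hypothetical, and the block stands in for whatever actually runs the jobs.

```ruby
# Hypothetical helper: call the block first with nil (no job limit), then with
# each smaller job count, until it reports exit status 0. Returns the last status.
def run_with_fallback(job_counts = [16, 8, 4, 1])
  err = yield(nil)
  return err if err == 0
  job_counts.each do |jobs|
    err = yield(jobs)
    break if err == 0
  end
  err
end
```

In the diffed code the retries pass `--resume --joblog` to `parallel`, so jobs that already completed are not repeated. Separately, `FileUtils.rm_rf` takes literal paths and does not expand shell globs, so the `"#{tmpdir}/*"` cleanup likely removes nothing; the enclosing `Dir.mktmpdir` block would still delete the directory on exit.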
data/gemma-wrapper.gemspec
CHANGED
@@ -2,10 +2,11 @@ Gem::Specification.new do |s|
   s.name = 'bio-gemma-wrapper'
   s.version = File.read('VERSION')
   s.summary = "GEMMA with LOCO and permutations"
-  s.description = "GEMMA wrapper adds LOCO and permutation support. Also caches K between runs with LOCO support"
+  s.description = "GEMMA wrapper adds LOCO and permutation support. Also runs in parallel and caches K between runs with LOCO support"
   s.authors = ["Pjotr Prins"]
   s.email = 'pjotr.public01@thebird.nl'
   s.files = ["bin/gemma-wrapper",
+    "lib/lock.rb",
     "Gemfile",
     "LICENSE.txt",
     "README.md",
data/lib/lock.rb
ADDED
@@ -0,0 +1,95 @@
+# Locking module for gemma (wrapper)
+#
+
+=begin
+
+The logic is as follows:
+
+1. a program creates a named lock file (based on a hash of its inputs) with its PID
+2. on exit it destroys the file
+3. a new program checks for the lock file
+4. if it exists and the PID is still in the ps table - wait
+5. when the pid disappears or the lock file - continue
+6. a timeout will return an error in 3 minutes
+
+Note that there is a theoretical chance the lock file existed, but disappeared. I think I have it covered by ignoring the unlink errors. Also the use of /proc/PID is Linux specific.
+
+=end
+
+
+require 'timeout'
+
+module Lock
+
+  def self.local name
+    ENV['HOME']+"/."+name.gsub("/","-")+".lck"
+  end
+
+  def self.lock_pid name
+    lockfn = local(name)
+    if File.exist?(lockfn)
+      File.read(lockfn).to_i
+    else
+      0
+    end
+  end
+
+  def self.locked? name
+    lockfn = local(name)
+    pid = lock_pid(name)
+    if File.exist?("/proc/#{pid}")
+      true
+    else
+      # the program went away - remove any 'stale' lock
+      begin
+        File.unlink(lockfn)
+      rescue Errno::ENOENT
+        # ignore error when the lock file went missing
+      end
+      false # --> no longer locked
+    end
+  end
+
+  def Lock::create name
+    wait_for(name)
+    lockfn = local(name)
+    if File.exist?(lockfn)
+      $stderr.print "\nERROR: Can not steal #{lockfn}"
+      exit 1
+    end
+    File.open(lockfn, File::RDWR|File::CREAT, 0644) do |f|
+      f.flock(File::LOCK_EX)
+      f.print(Process.pid)
+    end
+  end
+
+  def Lock::wait_for name
+    lockfn = local(name)
+    begin
+      status = Timeout::timeout(180) { # 3 minutes
+        while locked?(name)
+          $stderr.print("\nWaiting for lock #{lockfn}...")
+          sleep 2
+        end
+      }
+    rescue Timeout::Error
+      $stderr.print "\nERROR: Timed out, but I can not steal #{lockfn}"
+      exit 1
+    end
+    # yah! lock is released
+  end
+
+  def Lock::release name
+    lockfn = local(name)
+    if Process.pid == lock_pid(name)
+      begin
+        File.unlink(lockfn) # PID expired
+      rescue Errno::ENOENT
+        # ignore error when the lock file went missing
+      end
+    else
+      $stderr.print "\nERROR: can not release #{lockfn} because it is not owned by me"
+    end
+  end
+
+end
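Note on lib/lock.rb above: the module implements a PID-file lock and, as its own comment says, relies on the Linux-specific `/proc/PID` lookup. The sketch below shows the same stale-lock idea with two substitutions that are not in the gem: liveness is checked with `Process.kill(0, pid)`, and the lock file is created with `File::EXCL` so that two processes cannot both create it. The function names are hypothetical and the waiting and timeout logic is left out.

```ruby
require 'tmpdir'

# Hypothetical helper: true when a process with this PID exists.
def pid_alive?(pid)
  return false if pid <= 0
  Process.kill(0, pid) # signal 0 only tests for existence, nothing is sent
  true
rescue Errno::ESRCH
  false                # no such process
rescue Errno::EPERM
  true                 # process exists but belongs to another user
end

# Try to take the lock once. Returns true on success and false when a live
# process already holds it.
def try_lock(lockfn)
  if File.exist?(lockfn)
    return false if pid_alive?(File.read(lockfn).to_i)
    begin
      File.unlink(lockfn) # owner went away: remove the stale lock
    rescue Errno::ENOENT
      # someone else removed it first, which is fine
    end
  end
  File.open(lockfn, File::WRONLY | File::CREAT | File::EXCL, 0644) do |f|
    f.print(Process.pid)
  end
  true
rescue Errno::EEXIST
  false # another process created the file between our check and open
end
```

A caller would typically loop on `try_lock` with a sleep and an overall timeout, similar to what `Lock.wait_for` does with its 3 minute limit.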
metadata
CHANGED
@@ -1,17 +1,17 @@
 --- !ruby/object:Gem::Specification
 name: bio-gemma-wrapper
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.99.4
 platform: ruby
 authors:
 - Pjotr Prins
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-11-25 00:00:00.000000000 Z
 dependencies: []
-description: GEMMA wrapper adds LOCO and permutation support. Also
-  runs with LOCO support
+description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
+  and caches K between runs with LOCO support
 email: pjotr.public01@thebird.nl
 executables:
 - gemma-wrapper
@@ -24,6 +24,7 @@ files:
 - VERSION
 - bin/gemma-wrapper
 - gemma-wrapper.gemspec
+- lib/lock.rb
 homepage: https://github.com/genetics-statistics/gemma-wrapper
 licenses:
 - GPL3
@@ -43,8 +44,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-
-rubygems_version: 2.6.8
+rubygems_version: 3.1.4
 signing_key:
 specification_version: 4
 summary: GEMMA with LOCO and permutations