bio-gemma-wrapper 0.99.2 → 0.99.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (5) hide show
  1. checksums.yaml +4 -4
  2. data/README.md +26 -10
  3. data/VERSION +1 -1
  4. data/bin/gemma-wrapper +57 -13
  5. metadata +2 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: e27a8a3abb00b758095df5956b3854674faf5ff681a93bc028df273c40125c0d
4
- data.tar.gz: e9675dbb0ea0f087dd21774635d38f3cda11b46a88b36c77dd308086fd0ec5f2
3
+ metadata.gz: 0bd37b153e121de9c1758af736cd6904744da2de3540f2a7c547cc423382d8d1
4
+ data.tar.gz: 84298a943e7cfe6126653895d9714babc83a7be2bf903c7b61ff9072f1d4e4a8
5
5
  SHA512:
6
- metadata.gz: 81cf5440fa531d5a831efa787800c8bea230d47cddc666a31fff066551ff347708a41ddf1368c0d3946c7ba9faef8e5882e398ad340850253c53961cce96f662
7
- data.tar.gz: 582ae78c48a1eb8eeca01172eaeaba9d5ca23e69601967e334f8c218e3a4dd74b297861b01ce49b1357798b49a96c12e737100dcacec7fc34b70da1fc9c75f0d
6
+ metadata.gz: f32d48ec2f194a513e0cf8f15463b05662e1b269fdd22a110e4cdfb9f6bc541238bac7e766cbc31a8b782379891636e4f79fb3e3667d567ab3c610298d4f11c2
7
+ data.tar.gz: abc9b3faf8ef2f63d566caa14312bde2b2f438c722a83e90f7da775c55e2415171efbfab385273c82e45398c31e5306fb99437c38e287570652a3b04a5432883
data/README.md CHANGED
@@ -8,11 +8,12 @@ Nat. Genet., 2016)](cfw.gif)
8
8
  ## Introduction
9
9
 
10
10
  Gemma-wrapper allows running GEMMA with LOCO, GEMMA with caching,
11
- GEMMA in parallel (now the default), and GEMMA on PBS. Gemma-wrapper
12
- is used to run GEMMA as part of the https://genenetwork.org/
13
- environment.
11
+ GEMMA in parallel (now the default with LOCO), and GEMMA on
12
+ PBS. Gemma-wrapper is used to run GEMMA as part of the
13
+ https://genenetwork.org/ environment.
14
14
 
15
- Note that gemma-wrapper is projected to be integrated into gemma2/lib.
15
+ Note that a version of gemma-wrapper is projected to be integrated
16
+ into gemma itself.
16
17
 
17
18
  GEMMA is a software toolkit for fast application of linear mixed
18
19
  models (LMMs) and related models to genome-wide association studies
@@ -29,6 +30,14 @@ does a pass-through of all standard GEMMA invocation switches. On
29
30
  return gemma-wrapper can return a JSON object (--json) which is
30
31
  useful for web-services.
31
32
 
33
+ ## Performance
34
+
35
+ LOCO runs in parallel by default which is at least a 5x performance
36
+ improvement on a machine with enough cores. GEMMA without LOCO,
37
+ however, does not run in parallel by default. Performance
38
+ improvements with the parallel implementation for LOCO and non-LOCO
39
+ can be viewed [here](./test/performance/releases.gmi).
40
+
32
41
  ## Installation
33
42
 
34
43
  Prerequisites are
@@ -53,15 +62,19 @@ and it will render something like
53
62
  Usage: gemma-wrapper [options] -- [gemma-options]
54
63
  --permutate n Permutate # times by shuffling phenotypes
55
64
  --permute-phenotypes filen Phenotypes to be shuffled in permutations
56
- --loco [x,y,1,2,3...] Run full LOCO
65
+ --loco Run full leave-one-chromosome-out (LOCO)
66
+ --chromosomes [1,2,3] Run specific chromosomes
57
67
  --input filen JSON input variables (used for LOCO)
58
68
  --cache-dir path Use a cache directory
59
69
  --json Create output file in JSON format
60
- --force Force computation
61
- --slurm [options] Submit to slurm PBS
70
+ --force Force computation (override cache)
71
+ --parallel Run jobs in parallel
72
+ --no-parallel Do not run jobs in parallel
73
+ --slurm[=opts] Use slurm PBS for submitting jobs
62
74
  --q, --quiet Run quietly
63
75
  -v, --verbose Run verbosely
64
- --debug Show debug messages and keep intermediate output
76
+ -d, --debug Show debug messages and keep intermediate output
77
+ --dry-run Show commands, but don't execute
65
78
  -- Anything after gets passed to GEMMA
66
79
 
67
80
  -h, --help display this help and exit
@@ -99,6 +112,7 @@ the data files are found):
99
112
  gemma-wrapper -- \
100
113
  -g test/data/input/BXD_geno.txt.gz \
101
114
  -p test/data/input/BXD_pheno.txt \
115
+ -a test/data/input/BXD_snps.txt \
102
116
  -gk \
103
117
  -debug
104
118
 
@@ -116,6 +130,7 @@ You can also get JSON output on STDOUT by providing the --json switch
116
130
  gemma-wrapper --json -- \
117
131
  -g test/data/input/BXD_geno.txt.gz \
118
132
  -p test/data/input/BXD_pheno.txt \
133
+ -a test/data/input/BXD_snps.txt \
119
134
  -gk \
120
135
  -debug > K.json
121
136
 
@@ -133,6 +148,7 @@ default. If you want something else provide a --cache-dir, e.g.
133
148
  gemma-wrapper --cache-dir ~/.gemma-cache -- \
134
149
  -g test/data/input/BXD_geno.txt.gz \
135
150
  -p test/data/input/BXD_pheno.txt \
151
+ -a test/data/input/BXD_snps.txt \
136
152
  -gk \
137
153
  -debug
138
154
 
@@ -143,7 +159,7 @@ will store K in ~/.gemma-cache.
143
159
  Run the LMM using the K's captured earlier in K.json using the --input
144
160
  switch
145
161
 
146
- gemma-wrapper --json --loco --input K.json -- \
162
+ gemma-wrapper --json --input K.json -- \
147
163
  -g test/data/input/BXD_geno.txt.gz \
148
164
  -p test/data/input/BXD_pheno.txt \
149
165
  -c test/data/input/BXD_covariates2.txt \
@@ -163,7 +179,7 @@ https://github.com/genetics-statistics/GEMMA/issues/46). To loop all
163
179
  chromosomes first create all K's with
164
180
 
165
181
  gemma-wrapper --json \
166
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \
182
+ --loco -- \
167
183
  -g test/data/input/BXD_geno.txt.gz \
168
184
  -p test/data/input/BXD_pheno.txt \
169
185
  -a test/data/input/BXD_snps.txt \
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.99.2
1
+ 0.99.3
data/bin/gemma-wrapper CHANGED
@@ -14,12 +14,12 @@ GEMMA wrapper example:
14
14
  gemma-wrapper -- \\
15
15
  -g test/data/input/BXD_geno.txt.gz \\
16
16
  -p test/data/input/BXD_pheno.txt \\
17
+ -a test/data/input/BXD_snps.txt \
17
18
  -gk
18
19
 
19
20
  LOCO K computation with caching and JSON output
20
21
 
21
- gemma-wrapper --json \\
22
- --loco 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,X -- \\
22
+ gemma-wrapper --json --loco -- \\
23
23
  -g test/data/input/BXD_geno.txt.gz \\
24
24
  -p test/data/input/BXD_pheno.txt \\
25
25
  -a test/data/input/BXD_snps.txt \\
@@ -72,11 +72,12 @@ require 'tempfile'
72
72
  require 'tmpdir'
73
73
 
74
74
  split_at = ARGV.index('--')
75
+
75
76
  if split_at
76
77
  gemma_args = ARGV[split_at+1..-1]
77
78
  end
78
79
 
79
- options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, parallel: true }
80
+ options = { show_help: false, source: 'https://github.com/genetics-statistics/gemma-wrapper', version: version+' (Pjotr Prins)', date: Time.now.to_s, gemma_command: gemma_command, cache_dir: Dir.tmpdir(), quiet: false, permute_phenotypes: false, parallel: nil }
80
81
 
81
82
  opts = OptionParser.new do |o|
82
83
  o.banner = "\nUsage: #{File.basename($0)} [options] -- [gemma-options]"
@@ -91,8 +92,12 @@ opts = OptionParser.new do |o|
91
92
  raise "Phenotype input file #{phenotypes} does not exist" if !File.exist?(phenotypes)
92
93
  end
93
94
 
94
- o.on('--loco [x,y,1,2,3...]', Array, 'Run full leave-one-chromosome-out (LOCO)') do |lst|
95
- options[:loco] = lst
95
+ o.on('--loco', 'Run full leave-one-chromosome-out (LOCO)') do |b|
96
+ options[:loco] = b
97
+ end
98
+
99
+ o.on('--chromosomes [1,2,3]',Array,'Run specific chromosomes') do |lst|
100
+ options[:chromosomes] = lst
96
101
  end
97
102
 
98
103
  o.on('--input filen',String, 'JSON input variables (used for LOCO)') do |filen|
@@ -112,6 +117,10 @@ opts = OptionParser.new do |o|
112
117
  options[:force] = true
113
118
  end
114
119
 
120
+ o.on("--parallel", "Run jobs in parallel") do |b|
121
+ options[:parallel] = true
122
+ end
123
+
115
124
  o.on("--no-parallel", "Do not run jobs in parallel") do |b|
116
125
  options[:parallel] = false
117
126
  end
@@ -190,6 +199,15 @@ info = lambda do |*msg|
190
199
  OUTPUT.print *msg,"\n" if !options[:quiet]
191
200
  end
192
201
 
202
+ # Fetch chromosomes
203
+ def get_chromosomes annofn
204
+ h = {}
205
+ File.open(annofn,"r").each_line do | line |
206
+ chr = line.split(/\s+/)[2]
207
+ h[chr] = true
208
+ end
209
+ h.map { |k,v| k }
210
+ end
193
211
  # ---- Start banner
194
212
 
195
213
  GEMMA_K_VERSION=version
@@ -230,13 +248,17 @@ if RUBY_VERSION =~ /^1/
230
248
  warning "runs on Ruby 2.x only\n"
231
249
  end
232
250
 
251
+ # ---- LOCO defaults to parallel
252
+ if options[:parallel] == nil
253
+ options[:parallel] = true if options[:loco]
254
+ end
255
+
233
256
  debug.call(options) # some debug output
234
257
  debug.call(record)
235
258
 
236
259
  DO_COMPUTE_KINSHIP = gemma_args.include?("-gk")
237
260
  DO_COMPUTE_GWA = !DO_COMPUTE_KINSHIP
238
261
 
239
- # ---- Set up parallel
240
262
  if options[:parallel]
241
263
  begin
242
264
  skip_cite = `echo "will cite" |parallel --citation`
@@ -248,6 +270,11 @@ if options[:parallel]
248
270
  parallel_cmds = []
249
271
  end
250
272
 
273
+ # ---- Fetch chromosomes from SNP annotation file
274
+ anno_idx = gemma_args.index '-a'
275
+ raise "Expected GEMMA -a genotype file switch" if anno_idx == nil
276
+ CHROMOSOMES = get_chromosomes(gemma_args[anno_idx+1])
277
+
251
278
  # ---- Compute HASH on inputs
252
279
  hashme = []
253
280
  geno_idx = gemma_args.index '-g'
@@ -434,11 +461,17 @@ gwas = lambda do | chr, kfn, pfn, permutation=0 |
434
461
  end
435
462
 
436
463
  LOCO = options[:loco]
464
+ if LOCO
465
+ if options[:chromosomes]
466
+ CHROMOSOMES = options[:chromosomes]
467
+ end
468
+ end
469
+
437
470
  if DO_COMPUTE_KINSHIP
438
471
  # compute K
439
- info.call LOCO
440
- if LOCO != nil
441
- LOCO.each do |chr|
472
+ info.call CHROMOSOMES
473
+ if LOCO
474
+ CHROMOSOMES.each do |chr|
442
475
  info.call "LOCO for ",chr
443
476
  kinship.call(chr)
444
477
  end
@@ -447,13 +480,24 @@ if DO_COMPUTE_KINSHIP
447
480
  end
448
481
  else
449
482
  # DO_COMPUTE_GWA
450
- json_in = JSON.parse(File.read(options[:input]))
483
+ begin
484
+ json_in = JSON.parse(File.read(options[:input]))
485
+ rescue TypeError
486
+ raise "Missing JSON input file?"
487
+ end
451
488
  raise "JSON problem, file #{options[:input]} is not -gk derived" if json_in["type"] != "K"
452
489
 
453
490
  pfn = options[:permute_phenotypes] # can be nil
454
- k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
455
- k_files.each do | chr, kfn | # call a GWA for each chromosome
456
- gwas.call(chr,kfn,pfn)
491
+ if LOCO
492
+ k_files = json_in["files"].map { |rec| [rec[0],rec[2]] }
493
+ k_files.each do | chr, kfn | # call a GWA for each chromosome
494
+ gwas.call(chr,kfn,pfn)
495
+ end
496
+ else
497
+ kfn = json_in["files"][0][2]
498
+ CHROMOSOMES.each do | chr |
499
+ gwas.call(chr,kfn,pfn)
500
+ end
457
501
  end
458
502
  # Permute
459
503
  if options[:permutate]
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-gemma-wrapper
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.99.2
4
+ version: 0.99.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pjotr Prins
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-08-08 00:00:00.000000000 Z
11
+ date: 2021-08-22 00:00:00.000000000 Z
12
12
  dependencies: []
13
13
  description: GEMMA wrapper adds LOCO and permutation support. Also runs in parallel
14
14
  and caches K between runs with LOCO support