cheripic 1.1.0 → 1.2.0
- checksums.yaml +4 -4
- data/.travis.yml +2 -2
- data/Gemfile +1 -0
- data/README.md +75 -1
- data/Rakefile +72 -4
- data/lib/cheripic/cmd.rb +16 -7
- data/lib/cheripic/contig_pileups.rb +3 -1
- data/lib/cheripic/implementer.rb +3 -3
- data/lib/cheripic/options.rb +9 -8
- data/lib/cheripic/variants.rb +13 -13
- data/lib/cheripic/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 958b4091f2c95903c3a43a13af7d75cbc7605813
+  data.tar.gz: 18b91af8e68553f4d1700dae921beb7e420f11ac
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7290c13e270aae1a777767179168353c5c55a035bfd6e82025d000414112425e77f59ad7a6fd0c736d2d2775182e17f978ffa6b67153201e9d316458a6360db6
+  data.tar.gz: 595be6e01fdc4e0d6185a86f79f207abb2dc6bf50a8763a67186339e639542294185656b204cde8d86cda2f3519f7ae3341443a978f9f019c51cfef294513694
data/.travis.yml
CHANGED
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -15,6 +15,15 @@ Currently this gem is still in development and nearing complete working package.
 
 ## Installation
 
+Cheripic is available both as a command line tool and as a gem.
+Binaries are available for Linux 64bit and OSX.
+The best way to use Cheripic is to download the appropriate binary archive,
+unpack it (`tar -xzf`) and add the unpacked directory to your `PATH`
+
+Latest binaries are available to [download here](https://github.com/shyamrallapalli/cheripic/releases/tag/v1.1.0)
+
+
+To install the gem and use it in your development
 Add this line to your application's Gemfile:
 
 ```ruby
@@ -31,7 +40,72 @@ Or install it yourself as:
 
 ## Usage
 
-
+Running `cheripic` without any input at the command line interface shows the following help options
+
+```
+
+Cheripic v1.1.0
+Authors: Shyam Rallapalli and Dan MacLean
+
+Description: Candidate mutation and closely linked marker selection for non reference genomes
+Uses bulk segregant data from non-reference sequence genomes
+
+Inputs:
+1. Needs a reference fasta file of asssembly use for variant analysis
+2. Pileup files for mutant (phenotype of interest) bulks and background (wildtype phenotype) bulks
+3. If polyploid species, include of pileup from one or both parents
+
+USAGE:
+cheripic <options>
+
+OPTIONS:
+-f, --assembly=<s>  Assembly file in FASTA format
+-F, --input-format=<s>  bulk and parent alignment file format types - set either pileup or bam (default: pileup)
+-a, --mut-bulk=<s>  Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1
+-b, --bg-bulk=<s>  Pileup or sorted BAM file alignments from background/wildtype bulk 2
+--output=<s>  Directory to store results, will be created if not existing (default: cheripic_results)
+--loglevel=<s>  Choose any one of "info / warn / debug" level for logs generated (default: debug)
+--hmes-adjust=<f>  factor added to snp count of each contig to adjust for hme score calculations (default: 0.5)
+--htlow=<f>  lower level for categorizing heterozygosity (default: 0.2)
+--hthigh=<f>  high level for categorizing heterozygosity (default: 0.9)
+--mindepth=<i>  minimum read depth to conisder a position for variant calls (default: 6)
+--min-non-ref-count=<i>  minimum read depth supporting non reference base at each position (default: 3)
+--min-indel-count-support=<i>  minimum read depth supporting an indel at each position (default: 3)
+--ignore-reference-n, --no-ignore-reference-n  ignore variant calls at N (completely ambigous) bases in the reference (default: true)
+-q, --mapping-quality=<i>  minimum mapping quality of read covering the position (default: 20)
+-Q, --base-quality=<i>  minimum base quality of bases covering the position (default: 15)
+--noise=<f>  praportion of reads for a variant to conisder as noise (default: 0.1)
+--cross-type=<s>  type of cross used to generated mapping population - back or out (default: back)
+--only-frag-with-vars, --no-only-frag-with-vars  select only contigs containing variants for analysis (default: true)
+--filter-out-low-hmes, --no-filter-out-low-hmes  ignore variants from contigs with low hmescore or bfr to list in the final output (default: true)
+--polyploidy  Set if the data input is from polyploids
+-p, --mut-parent=<s>  Pileup or sorted BAM file alignments from mutant/trait of interest parent (default: )
+-r, --bg-parent=<s>  Pileup or sorted BAM file alignments from background/wildtype parent (default: )
+--bfr-adjust=<f>  factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)
+--examples  shows some example commands with explanation
+
+```
+
+
+
+Example Commands
+
+
+```
+EXAMPLE COMMANDS:
+1. cheripic -f assembly.fa -a mutbulk.pileup -b bgbulk.pileup --output=cheripic_output
+2. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
+--mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true --output cheripic_results
+3. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
+--mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true
+--no-only-frag-with-vars --no-filter-out-low-hmes --output cheripic_results
+
+```
+
+
+By default, contigs without a variant and those contigs with lower scores are discarded,
+so use the options `--no-only-frag-with-vars` and `--no-filter-out-low-hmes` to disable these filters
+
 
 ## Development
 
data/Rakefile
CHANGED
@@ -1,10 +1,78 @@
-require
-require
+require 'bundler/gem_tasks'
+require 'rake/testtask'
+# For Bundler.with_clean_env
+require 'bundler/setup'
 
 Rake::TestTask.new(:test) do |t|
-  t.libs <<
-  t.libs <<
+  t.libs << 'test'
+  t.libs << 'lib'
   t.test_files = FileList['test/**/*_test.rb']
 end
 
 task :default => :test
+
+
+# for packaging
+
+PACKAGE_NAME = 'cheripic'
+VERSION = `bundle exec bin/cheripic -v`.chomp
+TRAVELING_RUBY_VERSION = '20150210-2.1.5'
+
+# pre-downloaded Traveling Ruby from the following links and placed them in the 'packaging' directory
+# http://d6r77u77i8pq3.cloudfront.net/releases/traveling-ruby-20150210-2.1.5-linux-x86_64.tar.gz
+# http://d6r77u77i8pq3.cloudfront.net/releases/traveling-ruby-20150210-2.1.5-osx.tar.gz
+
+desc 'Package your app'
+task :package => ['package:linux:x86_64', 'package:osx']
+
+namespace :package do
+
+  namespace :linux do
+    desc 'Package your app for Linux x86_64'
+    task :x86_64 => [:bundle_install, "packaging/traveling-ruby-#{TRAVELING_RUBY_VERSION}-linux-x86_64.tar.gz"] do
+      create_package('linux-x86_64')
+    end
+  end
+
+  desc 'Package your app for OS X'
+  task :osx => [:bundle_install, "packaging/traveling-ruby-#{TRAVELING_RUBY_VERSION}-osx.tar.gz"] do
+    create_package('osx')
+  end
+
+  desc 'Install gems to local directory'
+  task :bundle_install do
+    if RUBY_VERSION !~ /^2\.1\./
+      abort "You can only 'bundle install' using Ruby 2.1, because that's what Traveling Ruby uses."
+    end
+    sh 'rm -rf packaging/tmp'
+    sh 'mkdir packaging/tmp'
+    sh 'cp Gemfile.lock packaging/tmp/'
+    sh 'cp packaging/Gemfile packaging/tmp/'
+    Bundler.with_clean_env do
+      sh 'env BUNDLE_IGNORE_CONFIG=1 bundle install --path packaging/vendor --without development'
+    end
+    sh 'rm -rf packaging/tmp'
+    sh 'rm -f packaging/vendor/*/*/cache/*'
+  end
+end
+
+def create_package(target)
+  package_dest = "#{PACKAGE_NAME}-#{VERSION}-#{target}"
+  package_dir = "packaging/#{package_dest}"
+  sh "rm -rf #{package_dir}"
+  sh "mkdir #{package_dir}"
+  sh "mkdir -p #{package_dir}/lib/app"
+  sh "cp -R bin #{package_dir}/lib/app/"
+  sh "cp -R lib #{package_dir}/lib/app/"
+  sh "mkdir #{package_dir}/lib/app/ruby"
+  sh "tar -xzf packaging/traveling-ruby-#{TRAVELING_RUBY_VERSION}-#{target}.tar.gz -C #{package_dir}/lib/app/ruby"
+  sh "cp packaging/wrapper.sh #{package_dir}/cheripic"
+  sh "cp -pR packaging/vendor/ruby/2.1.0 #{package_dir}/lib/app/ruby/"
+  sh "cp packaging/cheripic.gemspec Gemfile Gemfile.lock LICENSE.txt #{package_dir}/lib/app/"
+  sh "mkdir #{package_dir}/lib/app/.bundle"
+  sh "cp packaging/bundler-config #{package_dir}/lib/app/.bundle/config"
+  # if !ENV['DIR_ONLY']
+  #   sh "tar -czf #{package_dir}.tar.gz #{package_dir}"
+  #   sh "rm -rf #{package_dir}"
+  # end
+end
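The packaging tasks above rely on two small conventions: the package directory is named from the package name, version and target, and `bundle install` is only allowed under Ruby 2.1. A minimal sketch of both follows. The helper names `package_dir_for` and `traveling_ruby_compatible?` are hypothetical and are not part of the Rakefile; the version strings are illustrative values.

```ruby
# Hypothetical helper: builds the same directory name that create_package
# derives from PACKAGE_NAME, VERSION and the target platform.
def package_dir_for(name, version, target)
  "packaging/#{name}-#{version}-#{target}"
end

# Hypothetical helper: applies the same pattern that the bundle_install
# task checks against RUBY_VERSION before installing gems.
def traveling_ruby_compatible?(ruby_version)
  !!(ruby_version =~ /^2\.1\./)
end

puts package_dir_for('cheripic', '1.2.0', 'linux-x86_64')
puts traveling_ruby_compatible?('2.1.5')
puts traveling_ruby_compatible?('2.3.1')
```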
data/lib/cheripic/cmd.rb
CHANGED
@@ -40,6 +40,7 @@ module Cheripic
     def argument_parser
       cmds = self
       Trollop::Parser.new do
+        version Cheripic::VERSION
         banner cmds.help_message
         opt :assembly, 'Assembly file in FASTA format',
             :short => '-f',
@@ -76,9 +77,9 @@ module Cheripic
         opt :min_indel_count_support, 'minimum read depth supporting an indel at each position',
             :type => Integer,
             :default => 3
-        opt :
+        opt :ambiguous_ref_bases, 'including variant at completely ambiguous bases in the reference',
             :type => FalseClass,
-            :default =>
+            :default => false
         opt :mapping_quality, 'minimum mapping quality of read covering the position',
             :short => '-q',
             :type => Integer,
@@ -93,12 +94,12 @@ module Cheripic
         opt :cross_type, 'type of cross used to generated mapping population - back or out',
             :type => String,
             :default => 'back'
-        opt :
+        opt :use_all_contigs, 'option to select all contigs or only contigs containing variants for analysis',
             :type => FalseClass,
-            :default =>
-        opt :
+            :default => false
+        opt :include_low_hmes, 'option to include or discard variants from contigs with low hme-score or bfr score to list in the final output',
             :type => FalseClass,
-            :default =>
+            :default => false
         opt :polyploidy, 'Set if the data input is from polyploids',
             :type => FalseClass,
             :default => false
@@ -113,6 +114,9 @@ module Cheripic
         opt :bfr_adjust, 'factor added to hemi snp frequency of each parent to adjust for bfr calculations',
             :type => Float,
             :default => 0.05
+        opt :sel_seq_len, 'sequence length to print from either side of selected variants',
+            :type => Integer,
+            :default => 50
         opt :examples, 'shows some example commands with explanation'
       end
     end
@@ -148,7 +152,12 @@ module Cheripic
       Cheripic v#{Cheripic::VERSION.dup}
 
       EXAMPLE COMMANDS:
-
+      1. cheripic -f assembly.fa -a mutbulk.pileup -b bgbulk.pileup --output=cheripic_output
+      2. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
+      --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true --output cheripic_results
+      3. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
+      --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true
+      --no-only-frag-with-vars --no-filter-out-low-hmes --output cheripic_results
       EOS
       puts msg.split("\n").map{ |line| line.lstrip }.join("\n")
       exit(0)
data/lib/cheripic/contig_pileups.rb
CHANGED
@@ -131,12 +131,14 @@ module Cheripic
     # @return [Symbol] variant mode of the background bulk (:hom or :het) at the position
     def bg_bulk_var(pos)
       bg_base_hash = @bg_bulk[pos].var_base_frac
+      bg_base_hash.delete(:ref)
+      return nil if bg_base_hash.empty?
       if bg_base_hash.length > 1
         # taking only var mode
         var_mode(bg_base_hash.values.max)
       else
         # taking only var mode
-        var_mode(bg_base_hash[0])
+        var_mode(bg_base_hash[bg_base_hash.keys[0]])
       end
     end
 
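The change to `bg_bulk_var` matters because a Ruby `Hash` is indexed by key, not by position, so `hash[0]` returns `nil` unless `0` is itself a key. The sketch below illustrates this with a hypothetical base-fraction hash. The helper name `single_variant_fraction` and the sample keys and values are assumptions for illustration, not the gem's data.

```ruby
# Hypothetical helper mirroring the patched lookup: drop the reference
# entry, return nil when nothing is left, otherwise read the value stored
# under the first remaining key.
def single_variant_fraction(base_frac)
  variants = base_frac.reject { |base, _frac| base == :ref }
  return nil if variants.empty?
  variants[variants.keys[0]]
end

sample = { ref: 0.4, A: 0.6 }

# Integer indexing looks for the key 0, which is absent here, so this is nil.
puts sample[0].inspect

# Looking up through the first non-reference key returns the stored fraction.
puts single_variant_fraction(sample).inspect

# A position with only reference bases yields nil instead of raising later.
puts single_variant_fraction({ ref: 1.0 }).inspect
```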
data/lib/cheripic/implementer.rb
CHANGED
@@ -36,13 +36,13 @@ module Cheripic
         mindepth
         min_non_ref_count
         min_indel_count_support
-
+        ambiguous_ref_bases
         mapping_quality
         base_quality
         noise
         cross_type
-
-
+        use_all_contigs
+        include_low_hmes
         polyploidy
         bfr_adjust}
       settings = inputs.select { |k| set2.include?(k) }
data/lib/cheripic/options.rb
CHANGED
@@ -14,13 +14,13 @@ module Cheripic
         :mindepth => 6,
         :min_non_ref_count => 3,
         :min_indel_count_support => 3,
-        :
+        :ambiguous_ref_bases => false,
         :mapping_quality => 20,
         :base_quality => 15,
         :noise => 0.1,
         :cross_type => 'back',
-        :
-        :
+        :use_all_contigs => false,
+        :include_low_hmes => false,
         :polyploidy => false,
         :bfr_adjust => 0.05,
         :sel_seq_len => 50
@@ -66,9 +66,10 @@ module Cheripic
     end
 
     # Option to whether to ignore or consider the reference positions which are ambiguous
+    # @note switching option name here so Pileup options are same
     # @return [Boolean]
     def self.ignore_reference_n
-      @user_settings[:
+      @user_settings[:ambiguous_ref_bases] ? false : true
     end
 
     # Minimum alignment mapping quality of the read to be used for bam files
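The renamed setting `ambiguous_ref_bases` has the opposite sense to the accessor `ignore_reference_n`, so the accessor inverts it. A minimal sketch of that inversion follows, using a plain hash in place of the gem's settings store. The method name `ignore_reference_n?` and its argument are hypothetical stand-ins.

```ruby
# Hypothetical stand-in for Options.ignore_reference_n: the user-facing
# setting says "include ambiguous bases", the accessor answers
# "ignore ambiguous bases", so the value is flipped.
def ignore_reference_n?(user_settings)
  user_settings[:ambiguous_ref_bases] ? false : true
end

# With the default (false), ambiguous reference positions are ignored.
puts ignore_reference_n?({ ambiguous_ref_bases: false })

# Setting the option to true keeps them in the analysis.
puts ignore_reference_n?({ ambiguous_ref_bases: true })
```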
@@ -98,14 +99,14 @@ module Cheripic
 
     # Option to whether to ignore or consider the contigs with out any variants
     # @return [Boolean]
-    def self.
-      @user_settings[:
+    def self.use_all_contigs
+      @user_settings[:use_all_contigs]
     end
 
     # Option to whether to ignore or consider the contigs with low HME score
     # @return [Boolean]
-    def self.
-      @user_settings[:
+    def self.include_low_hmes
+      @user_settings[:include_low_hmes]
     end
 
     # Option to whether to set the input data is from polyploid or not
data/lib/cheripic/variants.rb
CHANGED
@@ -119,15 +119,17 @@ module Cheripic
     end
 
     # Applies selection procedure on assembly contigs based on the ratio_type provided.
-    # If
+    # If use_all_contigs is set to false then contigs without any variant are discarded for :hme_score
     # while contigs without any hemisnps are discarded for :bfr_score
-    # If
+    # If include_low_hmes is set to false then contigs are further filtered based on a cut off value of the score
     # @param ratio_type [Symbol] ratio_type is either :hme_score or :bfr_score
     def select_contigs(ratio_type)
       selected_contigs ={}
-
+      use_all_contigs = Options.use_all_contigs
       @assembly.each_key do | frag |
-        if
+        if use_all_contigs
+          selected_contigs[frag] = @assembly[frag]
+        else
           if ratio_type == :hme_score
             # selecting fragments which have a variant
             if @assembly[frag].hm_num + @assembly[frag].ht_num > 2 * Options.hmes_adjust
@@ -139,15 +141,13 @@ module Cheripic
               selected_contigs[frag] = @assembly[frag]
             end
           end
-        else
-          selected_contigs[frag] = @assembly[frag]
         end
       end
       selected_contigs = filter_contigs(selected_contigs, ratio_type)
-      if
-        logger.info "Selected #{selected_contigs.length} out of #{@assembly.length} fragments with #{ratio_type} score\n"
-      else
+      if use_all_contigs
         logger.info "No filtering was applied to fragments\n"
+      else
+        logger.info "Selected #{selected_contigs.length} out of #{@assembly.length} fragments with #{ratio_type} score\n"
       end
       selected_contigs
     end
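The reordered branches in `select_contigs` can be summarised as: keep every contig when `use_all_contigs` is set, otherwise keep only contigs whose variant counts exceed twice the adjustment factor. The sketch below covers only the `:hme_score` test visible in the diff. The `Contig` struct, the method `select_variant_contigs` and the sample counts are hypothetical, and it assumes `hm_num` and `ht_num` already include the `hmes_adjust` offset.

```ruby
# Hypothetical contig record; the gem's own contig class is not shown here.
Contig = Struct.new(:hm_num, :ht_num)

# Sketch of the selection rule: all contigs when use_all_contigs is true,
# otherwise only contigs whose adjusted counts exceed 2 * hmes_adjust.
def select_variant_contigs(assembly, use_all_contigs:, hmes_adjust: 0.5)
  assembly.select do |_id, contig|
    use_all_contigs || (contig.hm_num + contig.ht_num > 2 * hmes_adjust)
  end
end

assembly = {
  'contig_1' => Contig.new(0.5, 0.5), # no variants, only the adjustment
  'contig_2' => Contig.new(1.5, 0.5)  # one homozygous variant
}

puts select_variant_contigs(assembly, use_all_contigs: false).keys.inspect
puts select_variant_contigs(assembly, use_all_contigs: true).keys.inspect
```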
@@ -171,11 +171,13 @@ module Cheripic
     # @param ratio_type [Symbol] ratio_type is either :hme_score or :bfr_score
     # @param selected_contigs [Hash] a hash of contigs with selected ratio_type, a subset of assembly hash
     def get_cutoff(selected_contigs, ratio_type)
-
+      include_low_hmes = Options.include_low_hmes
       # set minimum cut off hme_score or bfr_score to pick fragments with variants
       # calculate min hme score for back or out crossed data or bfr_score for polyploidy data
       # if no filtering applied set cutoff to 1.1
-      if
+      if include_low_hmes
+        cutoff = 0.0
+      else
         if ratio_type == :hme_score
           adjust = Options.hmes_adjust
           if Options.cross_type == 'back'
@@ -186,8 +188,6 @@ module Cheripic
         else # ratio_type is bfr_score
           cutoff = bfr_cutoff(selected_contigs)
         end
-      else
-        cutoff = 0.0
       end
       cutoff
     end
data/lib/cheripic/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: cheripic
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Shyam Rallapalli
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-08-
+date: 2016-08-11 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: yell