cheripic 1.2.0 → 1.2.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: 958b4091f2c95903c3a43a13af7d75cbc7605813
4
- data.tar.gz: 18b91af8e68553f4d1700dae921beb7e420f11ac
3
+ metadata.gz: 824a8c68d3707ad02cf0d3b7d567191244d1a5a6
4
+ data.tar.gz: 6d2b3c7bef04aba06b5206968d1a0d69996a25b0
5
5
  SHA512:
6
- metadata.gz: 7290c13e270aae1a777767179168353c5c55a035bfd6e82025d000414112425e77f59ad7a6fd0c736d2d2775182e17f978ffa6b67153201e9d316458a6360db6
7
- data.tar.gz: 595be6e01fdc4e0d6185a86f79f207abb2dc6bf50a8763a67186339e639542294185656b204cde8d86cda2f3519f7ae3341443a978f9f019c51cfef294513694
6
+ metadata.gz: 6ae5a85c30a0b1ea19f118409ddec95a6c7c3e11f00663e9769a5642770e90cb2ab5b0200f9d9eaa4ed8c6873492ac7f5f3acd568dc0cc14ddf8ccaac5012435
7
+ data.tar.gz: efe77b2ccafd0ad7ed4eeb47b497207cacf3dbaee058779b46af7a5ca34597991a3949dbeb0f3807bba69ea7dca0dd7e24fb58c040bf0065fef2ca1e4e3424fc
data/ChangeLog.md ADDED
@@ -0,0 +1,21 @@
1
+ ### Change Log
2
+
3
+ All significant changes to this project at each release are documented in this file.
4
+
5
+
6
+ #### Future changes to include
7
+
8
+ 1. option to take multiple background pileup files
9
+ 2. replace output directory with output file name tag, since we only write to one file
10
+ 3. option to take bam file or pileup file as inputs of bulks
11
+
12
+ #### [1.2.0] - 2016-08-11
13
+
14
+ 1. fixed calculation of heterozygosity for background bulks
15
+ 2. changed command line boolean option to be set using only true or false
16
+ 3. included command line option to set length of sequence to retrieve on either side of each variant
17
+
18
+
19
+ #### [1.1.0] - 2016-07-26
20
+
21
+ first release of the binaries for Linux 64 bit and OSX 64bit
data/Gemfile CHANGED
@@ -1,5 +1,4 @@
1
1
  source 'https://rubygems.org'
2
- ruby '2.1.5'
3
2
 
4
3
  # Specify your gem's dependencies in cheripic.gemspec
5
4
  gemspec
data/README.md CHANGED
@@ -11,6 +11,7 @@ Computing Homozygosity Enriched Regions In genomes to Prioritize Identification
11
11
  is a ruby tool to pick causative mutations from bulk segregant sequencing.
12
12
 
13
13
  Currently this gem is still in development and nearing a complete working package.
14
+ The software currently only accepts pileup files as input; support for bam and vcf files will be implemented in future.
14
15
 
15
16
 
16
17
  ## Installation
@@ -20,7 +21,7 @@ Binaries are available for Linux 64bit and OSX.
20
21
  The best way to use Cheripic is to download the appropriate binary archive,
21
22
  unpack (`tar -xzf`) and add the unpacked directory to your `PATH`
22
23
 
23
- Latest binaries are available to [download here](https://github.com/shyamrallapalli/cheripic/releases/tag/v1.1.0)
24
+ Latest binaries are available to [download here](https://github.com/shyamrallapalli/cheripic/releases/latest)
24
25
 
25
26
 
26
27
  To install gem and use the gem in your development
@@ -44,7 +45,7 @@ Running `cheripic` without any input at command line interface shows following h
44
45
 
45
46
  ```
46
47
 
47
- Cheripic v1.1.0
48
+ Cheripic v1.2.0
48
49
  Authors: Shyam Rallapalli and Dan MacLean
49
50
 
50
51
  Description: Candidate mutation and closely linked marker selection for non reference genomes
@@ -59,30 +60,31 @@ USAGE:
59
60
  cheripic <options>
60
61
 
61
62
  OPTIONS:
62
- -f, --assembly=<s> Assembly file in FASTA format
63
- -F, --input-format=<s> bulk and parent alignment file format types - set either pileup or bam (default: pileup)
64
- -a, --mut-bulk=<s> Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1
65
- -b, --bg-bulk=<s> Pileup or sorted BAM file alignments from background/wildtype bulk 2
66
- --output=<s> Directory to store results, will be created if not existing (default: cheripic_results)
67
- --loglevel=<s> Choose any one of "info / warn / debug" level for logs generated (default: debug)
68
- --hmes-adjust=<f> factor added to snp count of each contig to adjust for hme score calculations (default: 0.5)
69
- --htlow=<f> lower level for categorizing heterozygosity (default: 0.2)
70
- --hthigh=<f> high level for categorizing heterozygosity (default: 0.9)
71
- --mindepth=<i> minimum read depth to conisder a position for variant calls (default: 6)
72
- --min-non-ref-count=<i> minimum read depth supporting non reference base at each position (default: 3)
73
- --min-indel-count-support=<i> minimum read depth supporting an indel at each position (default: 3)
74
- --ignore-reference-n, --no-ignore-reference-n ignore variant calls at N (completely ambigous) bases in the reference (default: true)
75
- -q, --mapping-quality=<i> minimum mapping quality of read covering the position (default: 20)
76
- -Q, --base-quality=<i> minimum base quality of bases covering the position (default: 15)
77
- --noise=<f> praportion of reads for a variant to conisder as noise (default: 0.1)
78
- --cross-type=<s> type of cross used to generated mapping population - back or out (default: back)
79
- --only-frag-with-vars, --no-only-frag-with-vars select only contigs containing variants for analysis (default: true)
80
- --filter-out-low-hmes, --no-filter-out-low-hmes ignore variants from contigs with low hmescore or bfr to list in the final output (default: true)
81
- --polyploidy Set if the data input is from polyploids
82
- -p, --mut-parent=<s> Pileup or sorted BAM file alignments from mutant/trait of interest parent (default: )
83
- -r, --bg-parent=<s> Pileup or sorted BAM file alignments from background/wildtype parent (default: )
84
- --bfr-adjust=<f> factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)
85
- --examples shows some example commands with explanation
63
+ -f, --assembly=<s> Assembly file in FASTA format
64
+ -F, --input-format=<s> bulk and parent alignment file format types - set either pileup or bam (default: pileup)
65
+ -a, --mut-bulk=<s> Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1
66
+ -b, --bg-bulk=<s> Pileup or sorted BAM file alignments from background/wildtype bulk 2
67
+ --output=<s> Directory to store results, will be created if not existing (default: cheripic_results)
68
+ --loglevel=<s> Choose any one of "info / warn / debug" level for logs generated (default: debug)
69
+ --hmes-adjust=<f> factor added to snp count of each contig to adjust for hme score calculations (default: 0.5)
70
+ --htlow=<f> lower level for categorizing heterozygosity (default: 0.2)
71
+ --hthigh=<f> high level for categorizing heterozygosity (default: 0.9)
72
+ --mindepth=<i> minimum read depth to consider a position for variant calls (default: 6)
73
+ --min-non-ref-count=<i> minimum read depth supporting non reference base at each position (default: 3)
74
+ --min-indel-count-support=<i> minimum read depth supporting an indel at each position (default: 3)
75
+ --ambiguous-ref-bases include variants at completely ambiguous bases in the reference
76
+ -q, --mapping-quality=<i> minimum mapping quality of read covering the position (default: 20)
77
+ -Q, --base-quality=<i> minimum base quality of bases covering the position (default: 15)
78
+ --noise=<f> proportion of reads for a variant to consider as noise (default: 0.1)
79
+ --cross-type=<s> type of cross used to generate the mapping population - back or out (default: back)
80
+ --use-all-contigs option to select all contigs or only contigs containing variants for analysis
81
+ --include-low-hmes option to include or discard variants from contigs with low hme-score or bfr score to list in the final output
82
+ --polyploidy Set if the data input is from polyploids
83
+ -p, --mut-parent=<s> Pileup or sorted BAM file alignments from mutant/trait of interest parent (default: )
84
+ -r, --bg-parent=<s> Pileup or sorted BAM file alignments from background/wildtype parent (default: )
85
+ --bfr-adjust=<f> factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)
86
+ --sel-seq-len=<i> sequence length to print from either side of selected variants (default: 50)
87
+ --examples shows some example commands with explanation
86
88
 
87
89
  ```
88
90
 
@@ -98,7 +100,7 @@ EXAMPLE COMMANDS:
98
100
  --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true --output cheripic_results
99
101
  3. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
100
102
  --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true
101
- --no-only-frag-with-vars --no-filter-out-low-hmes --output cheripic_results
103
+ --use-all-contigs true --include-low-hmes true --output cheripic_results
102
104
 
103
105
  ```
104
106
 
data/Rakefile CHANGED
@@ -23,7 +23,7 @@ TRAVELING_RUBY_VERSION = '20150210-2.1.5'
23
23
  # http://d6r77u77i8pq3.cloudfront.net/releases/traveling-ruby-20150210-2.1.5-osx.tar.gz
24
24
 
25
25
  desc 'Package your app'
26
- task :package => ['package:linux:x86_64', 'package:osx']
26
+ task :package => %w(package:linux:x86_64 package:osx)
27
27
 
28
28
  namespace :package do
29
29
 
@@ -71,8 +71,12 @@ def create_package(target)
71
71
  sh "cp packaging/cheripic.gemspec Gemfile Gemfile.lock LICENSE.txt #{package_dir}/lib/app/"
72
72
  sh "mkdir #{package_dir}/lib/app/.bundle"
73
73
  sh "cp packaging/bundler-config #{package_dir}/lib/app/.bundle/config"
74
- # if !ENV['DIR_ONLY']
75
- # sh "tar -czf #{package_dir}.tar.gz #{package_dir}"
76
- # sh "rm -rf #{package_dir}"
77
- # end
74
+ if target == 'linux-x86_64'
75
+ sh "cp -p packaging/linux-x86_64_samtools/external/* packaging/cheripic-#{VERSION}-linux-x86_64/lib/app/ruby/2.1.0/gems/bio-samtools-2.4.0/lib/bio/db/sam/external/"
76
+ end
77
+ unless ENV['DIR_ONLY']
78
+ Dir.chdir('packaging') do
79
+ sh "gtar -czf #{package_dest}.tar.gz #{package_dest}"
80
+ end
81
+ end
78
82
  end
data/bin/cheripic CHANGED
@@ -1,4 +1,5 @@
1
1
  #!/usr/bin/env ruby
2
+ $LOAD_PATH.unshift(File.join(File.dirname(__FILE__), '..', 'lib'))
2
3
  require 'cheripic'
3
4
 
4
5
  # rescue errors to get clean error messages through the logger
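The new `$LOAD_PATH.unshift` line in `bin/cheripic` puts the `lib` directory next to the script at the front of Ruby's load path, so `require 'cheripic'` typically resolves to that copy before any installed gem when the script is run directly (for example from a source checkout). The sketch below only illustrates how that path is resolved; `lib_dir_for` is a hypothetical helper, and unlike the shipped line it also calls `File.expand_path` to normalise the `..` segment.

```ruby
# Hypothetical helper: build the lib path the way bin/cheripic does,
# then normalise it so the '..' segment disappears.
def lib_dir_for(script_path)
  File.expand_path(File.join(File.dirname(script_path), '..', 'lib'))
end

lib_dir = lib_dir_for(__FILE__)
# Prepending (not appending) lets this copy of lib/ take precedence on require.
$LOAD_PATH.unshift(lib_dir) unless $LOAD_PATH.include?(lib_dir)
```

For a script at `/opt/cheripic/bin/cheripic`, the resolved directory is `/opt/cheripic/lib`.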
data/cheripic.gemspec CHANGED
@@ -23,8 +23,6 @@ Gem::Specification.new do |spec|
23
23
  spec.add_runtime_dependency 'trollop', '~> 2.1', '>= 2.1.2'
24
24
  spec.add_runtime_dependency 'bio', '~> 1.5', '>= 1.5.0'
25
25
  spec.add_dependency 'bio-samtools', '~> 2.4.0'
26
- spec.add_dependency 'bio-gngm', '~> 0.2.1'
27
- spec.add_runtime_dependency 'rinruby', '~> 2.0', '>= 2.0.3'
28
26
 
29
27
  spec.add_development_dependency 'activesupport', '~> 4.2.6'
30
28
  spec.add_development_dependency 'bundler', '~> 1.7.6'
@@ -0,0 +1,196 @@
1
+ <tool id="cheripic" name="CHERIPIC" version="1.2.0">
2
+
3
+ <description>CHERIPIC</description>
4
+
5
+ <version_command>cheripic -v</version_command>
6
+
7
+ <command>
8
+ <![CDATA[
9
+ cheripic
10
+ --assembly $assembly
11
+ --mut-bulk $mut_bulk
12
+ --bg-bulk $bg_bulk
13
+ --output $output
14
+ --loglevel $loglevel
15
+ --hmes-adjust $hmes_adjust
16
+ --htlow $ht_low
17
+ --hthigh $ht_high
18
+ --mindepth $min_depth
19
+ --min-non-ref-count $min_non_ref_count
20
+ --min-indel-count-support $min_indel_count_support
21
+ --ambiguous-ref-bases $ambiguous_ref_bases
22
+ --mapping-quality $mapping_quality
23
+ --base-quality $base_quality
24
+ --noise $noise
25
+ --cross-type $cross_type
26
+ --use-all-contigs $use_all_contigs
27
+ --include-low-hmes $include_low_hmes
28
+ --polyploidy $polyploidy
29
+ --mut-parent $mut_parent
30
+ --bg-parent $bg_parent
31
+ --bfr-adjust $bfr_adjust
32
+ --sel-seq-len $sel_seq_len
33
+ ]]>
34
+ </command>
35
+
36
+ <inputs>
37
+ <param name="assembly" type="data" format="fasta" label="Input Assembly file" help="Select Assembly fasta file" />
38
+ <param name="mut_bulk" type="data" format="pileup" label="mutant bulk pileup file" help="Select mutant bulk pileup file" />
39
+ <param name="bg_bulk" type="data" format="pileup" label="background bulk pileup file" min="1" multiple="true" help="Select background bulk pileup file" />
40
+ <param name="loglevel" type="select" optional="true" label="analysis log level" help="choose between info, warn and debug levels">
41
+ <option value="info" selected="true">info </option>
42
+ <option value="warn">warnings</option>
43
+ <option value="debug">debug</option>
44
+ </param>
45
+ <param name="hmes_adjust" size="4" type="float" optional="true" value="0.5" min="0.01" max="1.0"
46
+ label="hme score adjuster" help="factor added to snp count of each contig to adjust for hme score calculations" />
47
+ <param name="ht_low" size="4" type="float" optional="true" value="0.25" min="0.1" max="1.0"
48
+ label="heterozygosity low limit" help="lower limit to heterozygosity allele fraction" />
49
+ <param name="ht_high" size="4" type="float" optional="true" value="0.75" min="0.1" max="1.0"
50
+ label="heterozygosity high limit" help="upper limit to heterozygosity allele fraction" />
51
+ <param name="min_depth" size="4" type="integer" optional="true" value="6" min="1" max="8000"
52
+ label="minimum read coverage" help="minimum read depth to consider a position for variant calls" />
53
+ <param name="min_non_ref_count" size="4" type="integer" optional="true" value="3" min="1" max="8000"
54
+ label="minimum alternate read coverage" help="minimum read depth supporting non reference base at each position" />
55
+ <param name="min_indel_count_support" size="4" type="integer" optional="true" value="3" min="1" max="8000"
56
+ label="minimum indel read coverage" help="minimum read depth supporting an indel at each position" />
57
+ <param name="ambiguous_ref_bases" type="boolean" optional="true" checked="false" label="ambiguous reference position"
58
+ help="include variants at completely ambiguous bases in the reference" truevalue="true" falsevalue="false" />
59
+ <param name="mapping_quality" size="4" type="integer" optional="true" value="20" min="0" max="255"
60
+ label="minimum mapping quality" help="minimum mapping quality of read covering the position" />
61
+ <param name="base_quality" size="4" type="integer" optional="true" value="15" min="0" max="40"
62
+ label="minimum base quality" help="minimum base quality of nucleotides covering the position" />
63
+ <param name="noise" size="4" type="float" optional="true" value="0.1" min="0" max="0.2"
64
+ label="read noise" help="proportion of reads supporting a variant, below which the variant is considered noise" />
65
+ <param name="cross_type" type="select" optional="true" label="cross type" help="type of cross used to generate the mapping population - back or out" >
66
+ <option value="back" selected="true">back cross</option>
67
+ <option value="out">out cross</option>
68
+ </param>
69
+
70
+ <param name="use_all_contigs" type="boolean" optional="true" checked="false" label="use all contigs in analysis"
71
+ help="option to select all contigs or only contigs containing variants for analysis" truevalue="true" falsevalue="false" />
72
+ <param name="include_low_hmes" type="boolean" optional="true" checked="false" label="no hme or bfr score cut off"
73
+ help="option to include or discard variants from contigs with low hme-score or bfr score to list in the final output" truevalue="true" falsevalue="false" />
74
+ <param name="polyploidy" type="boolean" optional="true" checked="false" label="polyploid data"
75
+ help="Set if the input data is from polyploids" truevalue="true" falsevalue="false" />
76
+ <param name="mut_parent" type="data" optional="true" format="pileup" label="mutant parent pileup file" help="Select mutant parent pileup file" />
77
+ <param name="bg_parent" type="data" optional="true" format="pileup" label="background parent pileup file" help="Select background parent pileup file" />
78
+
79
+ <param name="bfr_adjust" size="4" type="float" optional="true" value="0.05" min="0.01" max="1.0"
80
+ label="bfr score adjuster" help="factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)" />
81
+ <param name="sel_seq_len" size="4" type="integer" optional="true" value="50" min="10" max="250"
82
+ label="selected variant seq length out" help="sequence length to print from either side of selected variants (default: 50)" />
83
+
84
+ <param name="output" type="text" size="30" value="cheripic_results" label="tag for output filename" help="write a tag to include with output filename" />
85
+ </inputs>
86
+
87
+ <outputs>
88
+ <data name="output_1" format="txt" file="${output}_selected_hme_variants.txt" />
89
+ <data name="output_2" format="txt" file="${output}_selected_bfr_variants.txt" />
90
+ </outputs>
91
+
92
+ <tests>
93
+ <test>
94
+ <param name="assembly" value="picked_fasta.fa" ftype="fasta" />
95
+ <param name="mut_bulk" value="mut_bulk.pileup" ftype="pileup" />
96
+ <param name="bg_bulk" value="wt_bulk.pileup" ftype="pileup" />
97
+ <output name="output" ftype="txt" file="selected_variants.out" />
98
+ </test>
99
+ </tests>
100
+
101
+ <help>
102
+
103
+ **Computing Homozygosity Enriched Regions In genomes to Prioritize Identification of Candidate variants (CHERIPIC)**
104
+
105
+ CHERIPIC is a ruby tool to pick causative mutations from bulk segregant sequencing.
106
+
107
+ ------
108
+
109
+ **What it does**
110
+
111
+ This tool uses ``cheripic`` to analyse bulk segregant sequencing data and identify the causative mutation.
112
+
113
+
114
+ .. class:: infomark
115
+
116
+ Provides a list of SNPs that could be either closely linked markers or the causative mutation.
117
+
118
+ ------
119
+
120
+ **Input formats**
121
+
122
+ assembly file should be a fasta file used for generating pileups from bulks
123
+ bulk alignment files should be pileup files
124
+
125
+ ------
126
+
127
+ **Outputs**
128
+
129
+ The output is a text file, and has the following columns::
130
+
131
+ Column Description
132
+ ----------------- --------------------------------------------------------
133
+ 1 HME_Score Homozygosity Enrichment score
134
+ 2 AlleleFreq Allele frequency
135
+ 3 seq_id Contig/Scaffold id
136
+ 4 position 1-based index of the position in contig
137
+ 5 ref_base Reference nucleotide at the position
138
+ 6 coverage read depth
139
+ 7 bases read bases
140
+ 8 base_quals read base qualities
141
+ 9 sequence_left reference sequence of the selected length to the left of the variant
142
+ 10 Alt_seq Alternate allele at the position
143
+ 11 sequence_right reference sequence of the selected length to the right of the variant
144
+
145
+ ------
146
+
147
+ **cheripic settings**
148
+
149
+ All of the options have a default value. You can change any of them. All of the options are implemented.
150
+
151
+ ------
152
+
153
+ **cheripic parameter list**
154
+
155
+ OPTIONS:
156
+ -f, --assembly Assembly file in FASTA format
157
+ -F, --input-format bulk and parent alignment file format types - set either pileup or bam (default: pileup)
158
+ -a, --mut-bulk Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1
159
+ -b, --bg-bulk Pileup or sorted BAM file alignments from background/wildtype bulk 2
160
+ --output Directory to store results, will be created if not existing (default: cheripic_results)
161
+ --loglevel Choose any one of "info / warn / debug" level for logs generated (default: debug)
162
+ --hmes-adjust factor added to snp count of each contig to adjust for hme score calculations (default: 0.5)
163
+ --htlow lower level for categorizing heterozygosity (default: 0.2)
164
+ --hthigh high level for categorizing heterozygosity (default: 0.9)
165
+ --mindepth minimum read depth to consider a position for variant calls (default: 6)
166
+ --min-non-ref-count minimum read depth supporting non reference base at each position (default: 3)
167
+ --min-indel-count-support minimum read depth supporting an indel at each position (default: 3)
168
+ --ambiguous-ref-bases include variants at completely ambiguous bases in the reference
169
+ -q, --mapping-quality minimum mapping quality of read covering the position (default: 20)
170
+ -Q, --base-quality minimum base quality of bases covering the position (default: 15)
171
+ --noise proportion of reads for a variant to consider as noise (default: 0.1)
172
+ --cross-type type of cross used to generate the mapping population - back or out (default: back)
173
+ --use-all-contigs option to select all contigs or only contigs containing variants for analysis
174
+ --include-low-hmes option to include or discard variants from contigs with low hme-score or bfr score to list in the final output
175
+ --polyploidy Set if the data input is from polyploids
176
+ -p, --mut-parent Pileup or sorted BAM file alignments from mutant/trait of interest parent (default: )
177
+ -r, --bg-parent Pileup or sorted BAM file alignments from background/wildtype parent (default: )
178
+ --bfr-adjust factor added to hemi snp frequency of each parent to adjust for bfr calculations (default: 0.05)
179
+ --sel-seq-len sequence length to print from either side of selected variants (default: 50)
180
+
181
+ ------
182
+
183
+ .. class:: infomark
184
+
185
+ **Tool Author**
186
+
187
+ Shyam Rallapalli
188
+
189
+
190
+ </help>
191
+
192
+ <citations>
193
+ <citation type="doi">10.1093/bioinformatics/btg1080</citation>
194
+ </citations>
195
+
196
+ </tool>
data/lib/cheripic.rb CHANGED
@@ -38,3 +38,4 @@ require 'cheripic/options'
38
38
  require 'cheripic/contig_pileups'
39
39
  require 'cheripic/bfr'
40
40
  require 'cheripic/regions'
41
+ require 'cheripic/vcf'
data/lib/cheripic/cmd.rb CHANGED
@@ -52,10 +52,14 @@ module Cheripic
52
52
  opt :mut_bulk, 'Pileup or sorted BAM file alignments from mutant/trait of interest bulk 1',
53
53
  :short => '-a',
54
54
  :type => String
55
+ opt :mut_bulk_vcf, 'vcf file for variants from mutant/trait of interest bulk 1',
56
+ :type => String
55
57
  opt :bg_bulk, 'Pileup or sorted BAM file alignments from background/wildtype bulk 2',
56
58
  :short => '-b',
57
59
  :type => String
58
- opt :output, 'Directory to store results, will be created if not existing',
60
+ opt :bg_bulk_vcf, 'vcf file for variants from background/wildtype bulk 2',
61
+ :type => String
62
+ opt :output, 'custom name tag to include in the output file name',
59
63
  :default => 'cheripic_results'
60
64
  opt :loglevel, 'Choose any one of "info / warn / debug" level for logs generated',
61
65
  :default => 'debug'
@@ -68,9 +72,17 @@ module Cheripic
68
72
  opt :hthigh, 'high level for categorizing heterozygosity',
69
73
  :type => Float,
70
74
  :default => 0.9
71
- opt :mindepth, 'minimum read depth to conisder a position for variant calls',
75
+ opt :mindepth, 'minimum read depth at a position to consider for variant calls',
72
76
  :type => Integer,
73
77
  :default => 6
78
+ opt :max_d_multiple, "multiplication factor for average coverage to calculate maximum read coverage
79
+ if set to zero no calculation will be made from bam file.\nsetting this value will override user set max depth",
80
+ :type => Integer,
81
+ :default => 5
82
+ opt :maxdepth, "maximum read depth at a position to consider for variant calls
83
+ if set to zero no user max depth will be used",
84
+ :type => Integer,
85
+ :default => 0
74
86
  opt :min_non_ref_count, 'minimum read depth supporting non reference base at each position',
75
87
  :type => Integer,
76
88
  :default => 3
@@ -97,7 +109,8 @@ module Cheripic
97
109
  opt :use_all_contigs, 'option to select all contigs or only contigs containing variants for analysis',
98
110
  :type => FalseClass,
99
111
  :default => false
100
- opt :include_low_hmes, 'option to include or discard variants from contigs with low hme-score or bfr score to list in the final output',
112
+ opt :include_low_hmes, 'option to include or discard variants from contigs with
113
+ low hme-score or bfr score to list in the final output',
101
114
  :type => FalseClass,
102
115
  :default => false
103
116
  opt :polyploidy, 'Set if the data input is from polyploids',
@@ -111,6 +124,10 @@ module Cheripic
111
124
  :short => '-r',
112
125
  :type => String,
113
126
  :default => ''
127
+ opt :repeats_file, 'repeat masker output file for the assembly ',
128
+ :short => '-R',
129
+ :type => String,
130
+ :default => ''
114
131
  opt :bfr_adjust, 'factor added to hemi snp frequency of each parent to adjust for bfr calculations',
115
132
  :type => Float,
116
133
  :default => 0.05
@@ -133,8 +150,9 @@ module Cheripic
133
150
 
134
151
  Inputs:
135
152
  1. Needs a reference fasta file of the assembly used for variant analysis
136
- 2. Pileup files for mutant (phenotype of interest) bulks and background (wildtype phenotype) bulks
137
- 3. If polyploid species, include of pileup from one or both parents
153
+ 2. Pileup/Bam files for mutant (phenotype of interest) bulks and background (wildtype phenotype) bulks
154
+ 3. If providing bam files, you have to include vcf files for the respective bulks
155
+ 4. For polyploid species, include pileup/bam files from one or both parents
138
156
 
139
157
  USAGE:
140
158
  cheripic <options>
@@ -149,15 +167,19 @@ module Cheripic
149
167
  def print_examples
150
168
  msg = <<-EOS
151
169
 
152
- Cheripic v#{Cheripic::VERSION.dup}
170
+ Cheripic v#{Cheripic::VERSION.dup}
171
+ Authors: Shyam Rallapalli and Dan MacLean
172
+
173
+ EXAMPLE COMMANDS:
174
+ 1. cheripic -f assembly.fa -a mutbulk.pileup -b bgbulk.pileup --output=cheripic_output
175
+ 2. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
176
+ --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true --output cheripic_results
177
+ 3. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
178
+ --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true
179
+ --use-all-contigs true --include-low-hmes true --output cheripic_results
180
+ 4. cheripic -h or cheripic --help
181
+ 5. cheripic -v or cheripic --version
153
182
 
154
- EXAMPLE COMMANDS:
155
- 1. cheripic -f assembly.fa -a mutbulk.pileup -b bgbulk.pileup --output=cheripic_output
156
- 2. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
157
- --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true --output cheripic_results
158
- 3. cheripic --assembly assembly.fa --mut-bulk mutbulk.pileup --bg-bulk bgbulk.pileup
159
- --mut-parent mutparent.pileup --bg-parent bgparent.pileup --polyploidy true
160
- --no-only-frag-with-vars --no-filter-out-low-hmes --output cheripic_results
161
183
  EOS
162
184
  puts msg.split("\n").map{ |line| line.lstrip }.join("\n")
163
185
  exit(0)
@@ -165,44 +187,66 @@ module Cheripic
165
187
 
166
188
  # calls other methods to check if command line inputs are valid
167
189
  def check_arguments
168
- check_output_dir
190
+ check_output
169
191
  check_log_level
170
- check_input_files
192
+ check_input_types
171
193
  end
172
194
 
173
- # TODO: check bulk input types and process associated files
174
- # def check_input_types
175
- # if @options[:input_format] == 'vcf'
176
- #
177
- # end
178
- # end
179
-
180
- # checks if input files are valid
181
- def check_input_files
195
+ # checks input files based on bulk file type
196
+ def check_input_types
197
+ inputfiles = {}
198
+ inputfiles[:required] = %i{assembly mut_bulk}
199
+ inputfiles[:optional] = %i{bg_bulk}
200
+ if @options[:input_format] == 'bam'
201
+ inputfiles[:required] << %i{mut_bulk_vcf}
202
+ inputfiles[:optional] << %i{bg_bulk_vcf}
203
+ end
182
204
  if @options[:polyploidy]
183
- inputfiles = %i{assembly mut_bulk bg_bulk mut_parent bg_parent}
184
- else
185
- inputfiles = %i{assembly mut_bulk bg_bulk}
205
+ inputfiles[:either] = %i{mut_parent bg_parent}
186
206
  end
187
- inputfiles.each do | symbol |
188
- if @options[symbol]
189
- file = @options[symbol]
190
- @options[symbol] = File.expand_path(file)
191
- unless File.exist?(file)
192
- raise CheripicIOError.new "#{symbol} file, #{file} does not exist: "
207
+ check_input_files(inputfiles)
208
+ end
209
+
210
+ # checks if input files are valid
211
+ def check_input_files(inputfiles)
212
+ check = 0
213
+ inputfiles.each_key do | type |
214
+ inputfiles[type].flatten!
215
+ inputfiles[type].each do | symbol |
216
+ if @options[symbol]
217
+ file = @options[symbol]
218
+ @options[symbol] = File.expand_path(file)
219
+ next if type == :optional
220
+ if type == :required and not File.exist?(file)
221
+ raise CheripicIOError.new "#{symbol} file, #{file} does not exist: "
222
+ elsif type == :either and File.exist?(file)
223
+ check = 1
224
+ end
225
+ elsif type == :required
226
+ raise CheripicArgError.new "Options #{inputfiles}, all must be specified. " +
227
+ 'Try --help for further help.'
193
228
  end
194
- else
195
- raise CheripicArgError.new "Options #{inputfiles}, all must be specified. " +
196
- 'Try --help for help.'
229
+ end
230
+ if type == :either and check == 0
231
+ raise CheripicArgError.new "One of the options #{inputfiles}, must be specified. " +
232
+ 'Try --help for further help.'
197
233
  end
198
234
  end
199
235
  end
200
236
 
201
- # checks if output directory already exists
202
- def check_output_dir
203
- if Dir.exist?(@options[:output])
204
- raise CheripicArgError.new "#{@options[:output]} directory exists" +
205
- 'please choose a different output directory name'
237
+ # checks if files with the output tag name already exist
238
+ def check_output
239
+ if (@options[:output].split('') & %w{# / : * ? ' < > | & $ ,}).any?
240
+ raise CheripicArgError.new 'please choose a name tag that contains ' +
241
+ 'alphanumeric characters, hyphen(-) and underscore(_) only'
242
+ end
243
+ @options[:hmes_frags] = "#{@options[:output]}_selected_hme_variants.txt"
244
+ @options[:bfr_frags] = "#{@options[:output]}_selected_bfr_variants.txt"
245
+ [@options[:hmes_frags], @options[:bfr_frags]].each do | file |
246
+ if File.exist?(file)
247
+ raise CheripicArgError.new "'#{file}' file exists " +
248
+ 'please choose a different name tag to be included in the output file name'
249
+ end
206
250
  end
207
251
  end
208
252
 
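In 1.2.5, `check_input_types` sorts the file options into three groups: `:required` (assembly and mutant bulk, plus the mutant bulk VCF when `--input-format bam`), `:optional` (background bulk and its VCF, whose paths are only expanded) and `:either` (both parent files when `--polyploidy` is set, of which at least one must exist). The standalone sketch below restates that logic over a plain options hash. The method name `validate_inputs` and the error classes `InputFileError`/`InputArgError` are hypothetical stand-ins for the gem's `CheripicIOError`/`CheripicArgError`; it also treats an empty string as "not given" and expands paths only after validation, which differs slightly from the shipped code.

```ruby
# Hypothetical stand-ins for Cheripic's CheripicIOError and CheripicArgError.
class InputFileError < StandardError; end
class InputArgError < StandardError; end

# Validate input file options grouped as required / optional / either.
# options maps option symbols (e.g. :assembly) to paths; nil or '' means "not given".
def validate_inputs(options, input_format: 'pileup', polyploidy: false)
  given = ->(key) { options[key] && !options[key].empty? }
  groups = { required: %i[assembly mut_bulk], optional: %i[bg_bulk], either: [] }
  if input_format == 'bam'
    groups[:required] << :mut_bulk_vcf # BAM input needs a VCF for the mutant bulk
    groups[:optional] << :bg_bulk_vcf
  end
  groups[:either] = %i[mut_parent bg_parent] if polyploidy

  groups[:required].each do |key|
    raise InputArgError, "option #{key} must be specified" unless given.(key)
    raise InputFileError, "#{key} file, #{options[key]} does not exist" unless File.exist?(options[key])
  end

  # Polyploid runs need at least one existing parent file.
  if groups[:either].any? && groups[:either].none? { |key| given.(key) && File.exist?(options[key]) }
    raise InputArgError, "one of #{groups[:either].join(', ')} must be specified"
  end

  # Expand every supplied path so later steps do not depend on the working directory.
  groups.values.flatten.each do |key|
    options[key] = File.expand_path(options[key]) if given.(key)
  end
  options
end
```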
@@ -220,7 +264,8 @@ module Cheripic
220
264
  # A hash of trollop option names as keys and user or default
221
265
  # setting as values is passed to Implementer object
222
266
  def run
223
- @options[:output] = File.expand_path @options[:output]
267
+ @options[:hmes_frags] = File.expand_path @options[:hmes_frags]
268
+ @options[:bfr_frags] = File.expand_path @options[:bfr_frags]
224
269
  analysis = Implementer.new(@options)
225
270
  analysis.run
226
271
  end
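`check_output` now treats `--output` as a name tag rather than a directory: it rejects tags containing any of `# / : * ? ' < > | & $ ,`, builds `<tag>_selected_hme_variants.txt` and `<tag>_selected_bfr_variants.txt`, refuses to overwrite existing files, and `run` later expands both names to absolute paths. A minimal sketch of that flow follows; `output_files_for` and its `dir:` keyword are hypothetical, and `ArgumentError` stands in for `CheripicArgError`. Note that the check is a blocklist, so characters such as spaces or backslashes still pass even though the error message promises alphanumerics, hyphen and underscore only.

```ruby
# Characters rejected by the reworked check_output in cmd.rb.
FORBIDDEN_TAG_CHARS = %w[# / : * ? ' < > | & $ ,].freeze

# Validate an output name tag and return absolute paths for both result files.
# dir is the directory the files would be written to (the working directory in cheripic).
def output_files_for(tag, dir: Dir.pwd)
  if (tag.chars & FORBIDDEN_TAG_CHARS).any?
    raise ArgumentError, 'name tag may contain alphanumeric characters, hyphen(-) and underscore(_) only'
  end
  files = {
    hmes_frags: "#{tag}_selected_hme_variants.txt",
    bfr_frags: "#{tag}_selected_bfr_variants.txt"
  }
  files.each_value do |name|
    if File.exist?(File.join(dir, name))
      raise ArgumentError, "'#{name}' file exists, please choose a different name tag"
    end
  end
  # Like Cmd#run, hand absolute paths to the rest of the analysis.
  files.map { |key, name| [key, File.expand_path(name, dir)] }.to_h
end
```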
@@ -22,7 +22,7 @@ module Cheripic
22
22
  # @return [Integer] length of contig in bases
23
23
  class Contig
24
24
 
25
- attr_accessor :hm_pos, :ht_pos, :hemi_pos
25
+ attr_accessor :hm_pos, :ht_pos, :hemi_pos, :mean_depth, :sd_depth
26
26
  attr_reader :id, :length
27
27
 
28
28
  # creates a Contig object using fasta entry
@@ -33,6 +33,8 @@ module Cheripic
33
33
  @hm_pos = {}
34
34
  @ht_pos = {}
35
35
  @hemi_pos = {}
36
+ @mean_depth = nil
37
+ @sd_depth = nil
36
38
  end
37
39
 
38
40
  # Number of homozygous variants identified in the contig
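The new `mean_depth`/`sd_depth` attributes on `Contig`, together with the `--max-d-multiple` and `--maxdepth` options added to `cmd.rb`, point to an upper read-depth cut-off derived from average coverage. The code that computes it is not part of this diff, so the sketch below is only one reading of the option help text: a non-zero multiple gives `mean_depth * max_d_multiple` and overrides `--maxdepth`; otherwise a non-zero `--maxdepth` is used; otherwise no cap applies. The helpers `depth_stats` and `max_depth_limit`, and the use of the population standard deviation, are assumptions.

```ruby
# Population mean and standard deviation of per-position read depths.
# Returns [nil, nil] for a contig without coverage data.
def depth_stats(depths)
  return [nil, nil] if depths.empty?
  mean = depths.inject(0, :+).to_f / depths.length
  variance = depths.map { |d| (d - mean)**2 }.inject(0, :+) / depths.length
  [mean, Math.sqrt(variance)]
end

# Maximum depth to accept at a position, or nil when no upper limit applies.
# Assumed precedence: a non-zero max_d_multiple overrides a user-set maxdepth.
def max_depth_limit(mean_depth, max_d_multiple: 5, maxdepth: 0)
  if max_d_multiple > 0 && mean_depth
    mean_depth * max_d_multiple
  elsif maxdepth > 0
    maxdepth
  end
end
```

For example, `mean, _sd = depth_stats([4, 6, 8])` gives a mean of `6.0`, and `max_depth_limit(mean)` then returns `30.0` with the default multiple of 5.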
@@ -32,7 +32,7 @@ module Cheripic
32
32
  def_delegators :@mut_parent, :each, :each_key, :each_value, :length, :[], :store
33
33
  def_delegators :@bg_parent, :each, :each_key, :each_value, :length, :[], :store
34
34
  attr_accessor :id, :parent_hemi
35
- attr_accessor :mut_bulk, :bg_bulk, :mut_parent, :bg_parent
35
+ attr_accessor :mut_bulk, :bg_bulk, :mut_parent, :bg_parent, :masked_regions
36
36
 
37
37
  # creates a ContigPileup object using fasta entry id
38
38
  # @param fasta [String] a contig id from fasta entry
@@ -43,16 +43,27 @@ module Cheripic
43
43
  @mut_parent = {}
44
44
  @bg_parent = {}
45
45
  @parent_hemi = {}
46
+ @masked_regions = Hash.new { |h,k| h[k] = {} }
47
+ @hm_pos = {}
48
+ @ht_pos = {}
49
+ @hemi_pos = {}
46
50
  end
47
51
 
48
52
  # bulk pileups are compared and variant positions are selected
49
53
  # @return [Array<Hash>] variant positions are stored in hashes
50
54
  # for homozygous, heterozygous and hemi-variant positions
51
55
  def bulks_compared
52
- @hm_pos = {}
53
- @ht_pos = {}
54
- @hemi_pos = {}
55
56
  @mut_bulk.each_key do | pos |
57
+ ignore = 0
58
+ unless @masked_regions.empty?
59
+ @masked_regions.each_key do | index |
60
+ if pos.between?(@masked_regions[index][:begin], @masked_regions[index][:end])
61
+ ignore = 1
62
+ logger.info "variant is in the masked region\t#{@mut_bulk[pos].to_s}"
63
+ end
64
+ end
65
+ end
66
+ next if ignore == 1
56
67
  if Options.polyploidy and @parent_hemi.key?(pos)
57
68
  bg_bases = ''
58
69
  if @bg_bulk.key?(pos)
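The new masked-region check in `bulks_compared` skips any mutant-bulk position that falls inside a repeat interval stored for the contig. A minimal sketch of the same interval test, assuming the `index => { :begin, :end }` layout that `store_repeat_regions` builds. `masked?` is a hypothetical helper, not part of cheripic, and it uses `any?` so the scan stops at the first overlapping region instead of checking (and logging) every one:

```ruby
# Hypothetical helper: true if pos lies inside any masked interval.
# masked_regions mirrors the ContigPileups layout:
#   index => { begin: Integer, end: Integer }
def masked?(pos, masked_regions)
  masked_regions.each_value.any? do |region|
    pos.between?(region[:begin], region[:end])
  end
end

# Build regions the same way store_repeat_regions does.
regions = Hash.new { |h, k| h[k] = {} }
regions[1][:begin] = 100
regions[1][:end] = 250
regions[2][:begin] = 400
regions[2][:end] = 420

masked?(120, regions) # => true
masked?(300, regions) # => false
```

An empty region hash makes `any?` return `false`, so the `unless @masked_regions.empty?` guard becomes optional with this form.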
@@ -74,27 +85,37 @@ module Cheripic
74
85
  # @param pos [Integer] position in the contig
75
86
  # stores variant type, position and allele fraction to either @hm_pos or @ht_pos hashes
76
87
  def compare_pileup(pos)
77
- base_hash = @mut_bulk[pos].var_base_frac
78
- base_hash.delete(:ref)
79
- return nil if base_hash.empty?
80
- # we could ignore complex loci or
81
- # take the variant type based on predominant base
82
- if base_hash.length > 1
83
- fraction = base_hash.values.max
84
- mut_type = var_mode(fraction)
85
- else
86
- fraction = base_hash[base_hash.keys[0]]
87
- mut_type = var_mode(fraction)
88
- end
88
+ mut_type, fraction = var_mode_fraction(@mut_bulk[pos])
89
+ return nil if mut_type.nil?
89
90
  if @bg_bulk.key?(pos)
90
- bg_type = bg_bulk_var(pos)
91
+ bg_type = var_mode_fraction(@bg_bulk[pos])[0]
91
92
  mut_type = compare_var_type(mut_type, bg_type)
92
93
  end
93
- unless mut_type == nil
94
+ unless mut_type.nil?
94
95
  categorise_pos(mut_type, pos, fraction)
95
96
  end
96
97
  end
97
98
 
99
+
100
+ # Method to extract var_mode and allele fraction from pileup information at a position in contig
101
+ #
102
+ # @param pileup_info [Pileup] pileup object
103
+ # @return [Symbol] variant mode (:hom or :het) at the position, or nil if no variant is called
104
+ # @return [Float] allele fraction at the position
105
+ def var_mode_fraction(pileup_info)
106
+ base_frac_hash = pileup_info.var_base_frac
107
+ base_frac_hash.delete(:ref)
108
+ return [nil, nil] if base_frac_hash.empty?
109
+ # we could ignore complex loci or
110
+ # take the variant type based on predominant base
111
+ if base_frac_hash.length > 1
112
+ fraction = base_frac_hash.values.max
113
+ else
114
+ fraction = base_frac_hash[base_frac_hash.keys[0]]
115
+ end
116
+ [var_mode(fraction), fraction]
117
+ end
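`var_mode_fraction` folds the old `bg_bulk_var` logic into one helper that returns both the variant mode and the predominant non-reference allele fraction, so the mutant and background bulks share one code path. The sketch below reproduces that flow in isolation. The zygosity rule in `var_mode` is an assumption: it mirrors the `htlow`/`hthigh` test in `Vcf.get_vars` with the default limits from `Options` (0.2 and 0.9), because the body of cheripic's own `var_mode` is not shown in this diff.

```ruby
# Default heterozygosity limits, as set in Options.
HT_LOW = 0.2
HT_HIGH = 0.9

# Assumed zygosity rule, mirroring Vcf.get_vars: fractions inside
# [HT_LOW, HT_HIGH] are heterozygous, above HT_HIGH homozygous, else nil.
def var_mode(fraction)
  if fraction.between?(HT_LOW, HT_HIGH)
    :het
  elsif fraction > HT_HIGH
    :hom
  end
end

# Mirrors var_mode_fraction: ignore the :ref entry, keep the predominant
# non-reference fraction, and return [mode, fraction] or [nil, nil].
def var_mode_fraction(base_frac_hash)
  non_ref = base_frac_hash.reject { |base, _| base == :ref }
  return [nil, nil] if non_ref.empty?
  fraction = non_ref.values.max
  [var_mode(fraction), fraction]
end

var_mode_fraction({ ref: 0.05, A: 0.95 })          # => [:hom, 0.95]
var_mode_fraction({ ref: 0.5, T: 0.45, G: 0.05 })  # => [:het, 0.45]
```

Unlike the diff, the sketch uses `reject` rather than `delete(:ref)`, so the caller's hash is left unmodified; `values.max` also covers the single-base case without a separate branch.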
118
+
98
119
  # Categorizes variant zygosity based on the allele fraction provided.
99
120
  # Uses lower and upper limit set for heterozygosity in the options.
100
121
  # @note consider increasing the range of heterozygosity limits for RNA-seq data
@@ -125,23 +146,6 @@ module Cheripic
125
146
  end
126
147
  end
127
148
 
128
- # Method to extract var_mode from pileup information at a position in contig
129
- #
130
- # @param pos [Integer] position in the contig
131
- # @return [Symbol] variant mode of the background bulk (:hom or :het) at the position
132
- def bg_bulk_var(pos)
133
- bg_base_hash = @bg_bulk[pos].var_base_frac
134
- bg_base_hash.delete(:ref)
135
- return nil if bg_base_hash.empty?
136
- if bg_base_hash.length > 1
137
- # taking only var mode
138
- var_mode(bg_base_hash.values.max)
139
- else
140
- # taking only var mode
141
- var_mode(bg_base_hash[bg_base_hash.keys[0]])
142
- end
143
- end
144
-
145
149
  # method stores pos as key and allele fraction as value
146
150
  # to @hm_pos or @ht_pos hash based on variant type
147
151
  # @param var_type [Symbol] values are either :hom or :het
@@ -156,18 +160,18 @@ module Cheripic
156
160
  end
157
161
 
158
162
  # Compares parental pileups for the contig and identify position
159
- # that indicate variants from homelogues called hemi-snps
163
+ # that indicate variants from homeologues called hemi-snps
160
164
  # and calculates bulk frequency ratio (bfr)
161
165
  # @return [Hash] parent_hemi hash with position as key and bfr as value
162
166
  def hemisnps_in_parent
163
167
  # mark all the hemi snp based on both parents
164
- self.mut_parent.each_key do |pos|
168
+ @mut_parent.each_key do |pos|
165
169
  mut_parent_frac = @mut_parent[pos].var_base_frac
166
- if self.bg_parent.key?(pos)
170
+ if @bg_parent.key?(pos)
167
171
  bg_parent_frac = @bg_parent[pos].var_base_frac
168
172
  bfr = Bfr.get_bfr(mut_parent_frac, bg_parent_frac)
169
173
  @parent_hemi[pos] = bfr
170
- self.bg_parent.delete(pos)
174
+ @bg_parent.delete(pos)
171
175
  else
172
176
  bfr = Bfr.get_bfr(mut_parent_frac)
173
177
  @parent_hemi[pos] = bfr
@@ -175,7 +179,7 @@ module Cheripic
175
179
  end
176
180
 
177
181
  # now include all hemi snp unique to background parent
178
- self.bg_parent.each_key do |pos|
182
+ @bg_parent.each_key do |pos|
179
183
  unless @parent_hemi.key?(pos)
180
184
  bg_parent_frac = @bg_parent[pos].var_base_frac
181
185
  bfr = Bfr.get_bfr(bg_parent_frac)
@@ -25,15 +25,21 @@ module Cheripic
25
25
  input_format
26
26
  mut_bulk
27
27
  bg_bulk
28
- output
28
+ mut_bulk_vcf
29
+ bg_bulk_vcf
30
+ hmes_frags
31
+ bfr_frags
29
32
  mut_parent
30
- bg_parent}
33
+ bg_parent
34
+ repeats_file}
31
35
  @options = OpenStruct.new(inputs.select { |k| set1.include?(k) })
32
36
 
33
37
  set2 = %i{hmes_adjust
34
38
  htlow
35
39
  hthigh
36
40
  mindepth
41
+ maxdepth
42
+ max_d_multiple
37
43
  min_non_ref_count
38
44
  min_indel_count_support
39
45
  ambiguous_ref_bases
@@ -44,10 +50,10 @@ module Cheripic
44
50
  use_all_contigs
45
51
  include_low_hmes
46
52
  polyploidy
47
- bfr_adjust}
53
+ bfr_adjust
54
+ sel_seq_len}
48
55
  settings = inputs.select { |k| set2.include?(k) }
49
56
  Options.update(settings)
50
- FileUtils.mkdir_p @options.output
51
57
  @vars_extracted = false
52
58
  @has_run = false
53
59
  end
@@ -62,15 +68,21 @@ module Cheripic
62
68
 
63
69
  # Extracted variants from bulk comparison are re-analysed
64
70
  # and selected variants are written to a file
65
- def process_variants
66
- @variants.verify_bg_bulk_pileup
71
+ def process_variants(pos_type)
72
+ if pos_type == :hmes_frags
73
+ @variants.verify_bg_bulk_pileup
74
+ end
67
75
  # print selected variants that could be potential markers or mutation
68
- out_file = File.open("#{@options.output}/selected_variants.txt", 'w')
69
- out_file.puts "HME_Score\tAlleleFreq\tseq_id\tposition\tref_base\tcoverage\tbases\tbase_quals\tsequence_left\tAlt_seq\tsequence_right"
76
+ out_file = File.open(@options[pos_type], 'w')
77
+ out_file.puts "Score\tAlleleFreq\tseq_id\tposition\tref_base\tcoverage\tbases\tbase_quals\tsequence_left\tAlt_seq\tsequence_right"
70
78
  regions = Regions.new(@options.assembly)
71
- @variants.hmes_frags.each_key do | frag |
79
+ @variants.send(pos_type).each_key do | frag |
72
80
  contig_obj = @variants.assembly[frag]
73
- positions = contig_obj.hm_pos.keys
81
+ if pos_type == :hmes_frags
82
+ positions = contig_obj.hm_pos.keys
83
+ else
84
+ positions = contig_obj.hemi_pos.keys
85
+ end
74
86
  positions.each do | pos |
75
87
  pileup = @variants.pileups[frag].mut_bulk[pos]
76
88
  seqs = regions.fetch_seq(frag,pos)
@@ -87,11 +99,9 @@ module Cheripic
87
99
  unless @vars_extracted
88
100
  self.extract_vars
89
101
  end
102
+ self.process_variants(:hmes_frags)
90
103
  if Options.polyploidy
91
- self.process_variants
92
- @variants.bfr_frags
93
- else
94
- self.process_variants
104
+ self.process_variants(:bfr_frags)
95
105
  end
96
106
  @has_run = true
97
107
  end
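`process_variants` now takes the name of the fragment set (`:hmes_frags` or `:bfr_frags`), writes to the matching output file, and reads homozygous positions for the first and hemi-SNP positions for the second. A minimal sketch of that dispatch as a lookup table, assuming a stand-in `Contig` struct with only the `hm_pos` and `hemi_pos` hashes; `POSITION_SOURCE` and `positions_for` are hypothetical names:

```ruby
# Stand-in for Cheripic::Contig with only the position hashes used here.
Contig = Struct.new(:hm_pos, :hemi_pos)

# Hypothetical lookup: which contig hash supplies positions for each output type.
POSITION_SOURCE = { hmes_frags: :hm_pos, bfr_frags: :hemi_pos }.freeze

def positions_for(contig, pos_type)
  source = POSITION_SOURCE.fetch(pos_type) # raises KeyError for unknown types
  contig.public_send(source).keys
end

contig = Contig.new({ 120 => 0.95, 310 => 1.0 }, { 42 => 0.4 })
positions_for(contig, :hmes_frags) # => [120, 310]
positions_for(contig, :bfr_frags)  # => [42]
```

With `fetch`, a mistyped symbol fails loudly instead of silently falling into the `else` branch and reading `hemi_pos`.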
@@ -12,6 +12,8 @@ module Cheripic
12
12
  :htlow => 0.2,
13
13
  :hthigh => 0.9,
14
14
  :mindepth => 6,
15
+ :maxdepth => 0,
16
+ :max_d_multiple => 5,
15
17
  :min_non_ref_count => 3,
16
18
  :min_indel_count_support => 3,
17
19
  :ambiguous_ref_bases => false,
@@ -53,6 +55,26 @@ module Cheripic
53
55
  @user_settings[:mindepth]
54
56
  end
55
57
 
58
+ # Maximum read coverage at the variant position to be considered for analysis
59
+ # @return [Integer]
60
+ def self.maxdepth
61
+ @user_settings[:maxdepth]
62
+ end
63
+
64
+ # Setting maximum read coverage at the variant position to be considered for analysis
65
+ # @param value [Integer] provided integer value will be updated as maxdepth
66
+ # @return [Integer] updated maxdepth value
67
+ def self.maxdepth=(value)
68
+ @user_settings[:maxdepth] = value
69
+ end
70
+
71
+ # Multiplication factor applied to the mean coverage to calculate the maximum read coverage
72
+ # at the variant position to be considered for analysis
73
+ # @return [Integer]
74
+ def self.max_d_multiple
75
+ @user_settings[:max_d_multiple]
76
+ end
77
+
56
78
  # Minimum non reference count at the variant position to be considered for analysis
57
79
  # @return [Integer]
58
80
  def self.min_non_ref_count
@@ -4,6 +4,36 @@ require 'forwardable'
4
4
 
5
5
  module Cheripic
6
6
 
7
+ require 'bio-samtools'
8
+ require 'bio/db/sam'
9
+ require 'open3'
10
+
11
+ # Reopens Bio::DB::Sam to override its depth method
12
+ class Bio::DB::Sam
13
+
14
+ # A method to retrieve depth information from bam object
15
+ # @param opts [Hash] a hash of following input options
16
+ # b [File] list of positions or regions in BED format
17
+ # l [INT] minQLen
18
+ # q [INT] base quality threshold
19
+ # Q [INT] mapping quality threshold
20
+ # r [chr:from-to] region
21
+ # @return [String] samtools depth output, one line per position reporting sequence_name, position and depth
22
+ def depth(opts={})
23
+ command = form_opt_string(self.samtools, 'depth', opts)
24
+ # capture3 buffers the whole output as a string, so avoid running depth on the whole genome or on very large contigs
25
+ stdout, stderr, status = Open3.capture3(command)
26
+ unless status.success?
27
+ logger.error "resulted in exit code #{status.exitstatus} using #{command}"
28
+ logger.error "stderr output is: #{stderr}"
29
+ raise CheripicError
30
+ end
31
+ # return stdout
32
+ stdout
33
+ end
34
+
35
+ end
36
+
7
37
  # Custom error handling for Variants class
8
38
  class VariantsError < CheripicError; end
9
39
 
@@ -27,10 +57,10 @@ module Cheripic
27
57
  include Enumerable
28
58
  extend Forwardable
29
59
  def_delegators :@assembly, :each, :each_key, :each_value, :size, :length, :[]
30
- attr_reader :assembly, :pileups, :hmes_frags, :bfr_frags, :pileups_analyzed
60
+ attr_reader :assembly, :pileups, :pileups_analyzed
31
61
 
32
62
  # creates a Variants object using user input files
33
- # @param options [Hash] a hash of required input files as keys and file paths as values
63
+ # @param options [OpenStruct] required input file names as keys and file paths as values
34
64
  def initialize(options)
35
65
  @params = options
36
66
  @assembly = {}
@@ -50,25 +80,76 @@ module Cheripic
50
80
  @pileups[contig.id] = ContigPileups.new(contig.id)
51
81
  end
52
82
  @pileups_analyzed = false
83
+ unless @params.repeats_file == ''
84
+ store_repeat_regions
85
+ end
86
+ end
87
+
88
+ # reads repeat masker output file and stores masked regions to ignore variants in those regions
89
+ def store_repeat_regions
90
+ File.foreach(@params.repeats_file) do |line|
91
+ line.strip!
92
+ next if line =~ /^SW/ or line =~ /^score/ or line == ''
93
+ info = line.split("\s")
94
+ pileups_obj = @pileups[info[4]]
95
+ index = pileups_obj.masked_regions.length
96
+ pileups_obj.masked_regions[index + 1][:begin] = info[5].to_i
97
+ pileups_obj.masked_regions[index + 1][:end] = info[6].to_i
98
+ end
53
99
  end
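`store_repeat_regions` skips the RepeatMasker header lines (starting with `SW` or `score`) and blank lines, then takes the query sequence name and the begin and end coordinates from the 5th to 7th whitespace-separated columns. A hedged sketch of that parsing step, with a hypothetical `parse_repeat_regions` that collects Ruby ranges per contig instead of filling `ContigPileups#masked_regions`:

```ruby
# Hypothetical parser: RepeatMasker .out lines -> contig id => [begin..end, ...].
# Columns (0-based after whitespace split), as used by store_repeat_regions:
#   4 = query sequence, 5 = query begin, 6 = query end
def parse_repeat_regions(lines)
  regions = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    line = line.strip
    next if line.empty? || line.start_with?('SW', 'score')
    info = line.split
    regions[info[4]] << (info[5].to_i..info[6].to_i)
  end
  regions
end

lines = [
  '   SW   perc perc perc  query      position in query',
  'score   div. del. ins.  sequence    begin     end    (left)',
  '',
  '  463   1.3  0.6  1.7  contig1         10       163 (2837) +  (TTAGGG)n  Simple_repeat  1  151  (0)  1'
]
parse_repeat_regions(lines) # => {"contig1"=>[10..163]}
```

Keying by name also avoids the case where the repeat file names a contig absent from the assembly, for which `@pileups[info[4]]` in the diff would presumably be `nil`.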
54
100
 
55
101
  # Reads and store pileup data for each of input bulk and parents pileup files
56
102
  # And sets pileups_analyzed to true that pileups files are processed
57
103
  def analyse_pileups
58
- @bg_bulk = @params.bg_bulk
59
- @mut_parent = @params.mut_parent
60
- @bg_parent = @params.bg_parent
61
-
104
+ if @params.input_format == 'bam'
105
+ @vcf_hash = Vcf.filtering(@params.mut_bulk_vcf, @params.bg_bulk_vcf)
106
+ end
62
107
  %i{mut_bulk bg_bulk mut_parent bg_parent}.each do | input |
63
108
  infile = @params[input]
64
109
  if infile != ''
65
- extract_pileup(infile, input)
110
+ logger.info "processing #{input} file"
111
+ if @params.input_format == 'pileup'
112
+ extract_pileup(infile, input)
113
+ else
114
+ extract_bam_pileup(infile, input)
115
+ end
66
116
  end
67
117
  end
68
118
 
69
119
  @pileups_analyzed = true
70
120
  end
71
121
 
122
+ # Bam object is read and the mean and standard deviation of depth are calculated for each contig
123
+ # @param bamobject [Bio::DB::Sam]
124
+ # Open3.capture3 buffers the whole output as a string, so avoid running depth on the whole genome or on very large contigs
125
+ def set_max_depth(bamobject, bamfile)
126
+ logger.info "processing #{bamfile} file for depth"
127
+ all_depths = []
128
+ bq = Options.base_quality
129
+ mq = Options.mapping_quality
130
+ @assembly.each_key do | id |
131
+ contig_obj = @assembly[id]
132
+ len = contig_obj.length
133
+ data = bamobject.depth(:r => "#{id}", :q => bq, :Q => mq)
134
+ depths = []
135
+ data.split("\n").each do |line|
136
+ info = line.split("\t")
137
+ depths << info[2].to_i
138
+ end
139
+ variance = 0
140
+ mean_depth = depths.reduce(0, :+) / len.to_f
141
+ depths.each do |value|
142
+ variance += (value.to_f - mean_depth)**2
143
+ end
144
+ all_depths << mean_depth
145
+ contig_obj.sd_depth = Math.sqrt((variance + (len - depths.length) * mean_depth**2) / len)
146
+ contig_obj.mean_depth = mean_depth
147
+ end
148
+ # setting max depth as max_d_multiple times the mean coverage across contigs
149
+ mean_coverage = all_depths.reduce(0, :+) / @assembly.length.to_f
150
+ Options.maxdepth = Options.max_d_multiple * mean_coverage
151
+ end
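`set_max_depth` derives the coverage cap from per-contig `samtools depth` output: the mean depth of each contig, averaged across contigs and multiplied by `max_d_multiple`. A minimal sketch of that arithmetic, assuming positions missing from the depth output have zero coverage (`samtools depth` omits them unless run with `-a`) and computing the population standard deviation over the full contig length. `depth_stats` and `max_depth` are hypothetical names, and the sketch needs Ruby 2.4 or later for `Array#sum`:

```ruby
# Mean and population standard deviation of per-base depth for one contig,
# from `samtools depth` text (name<TAB>pos<TAB>depth per line).
# Positions absent from the output count as zero depth.
def depth_stats(depth_output, contig_length)
  depths = depth_output.each_line.map { |line| line.split("\t")[2].to_i }
  mean = depths.sum.to_f / contig_length
  missing = contig_length - depths.length
  # squared deviations of reported positions plus the zero-depth positions
  sq_dev = depths.sum { |d| (d - mean)**2 } + missing * mean**2
  [mean, Math.sqrt(sq_dev / contig_length)]
end

# Coverage cap: max_d_multiple times the mean of per-contig mean depths.
def max_depth(contig_means, multiple)
  multiple * contig_means.sum / contig_means.length.to_f
end

mean, sd = depth_stats("c1\t1\t4\nc1\t2\t6\n", 4) # depths 4, 6, 0, 0
max_depth([2.5, 7.5], 5)                          # => 25.0
```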
152
+
72
153
  # Input pileup file is read and positions are selected that pass the thresholds
73
154
  # @param pileupfile [String] path to the pileup file to read
74
155
  # @param sym [Symbol] Symbol of the pileup file used to write selected variants
@@ -84,6 +165,54 @@ module Cheripic
84
165
  end
85
166
  end
86
167
 
168
+ # Input bam file is read and pileups at the selected variant positions are stored
169
+ # @param bamfile [String] path to the bam file to read
170
+ # @param sym [Symbol] Symbol of the bam file used to write selected variants
171
+ # pileup information to respective ContigPileups object
172
+ def extract_bam_pileup(bamfile, sym)
173
+ bq = Options.base_quality
174
+ mq = Options.mapping_quality
175
+ bamobject = Bio::DB::Sam.new(:bam=>bamfile, :fasta=>@params.assembly)
176
+ bamobject.index unless bamobject.indexed?
177
+
178
+ # use the max depth set by the user (zero disables the coverage filter)
179
+ max_d = Options.maxdepth
180
+ # or calculate it from the bam file when max_d_multiple is positive
181
+ if Options.max_d_multiple > 0
182
+ set_max_depth(bamobject, bamfile)
183
+ max_d = Options.maxdepth
184
+ logger.info "max depth used for #{sym} file\t#{max_d}"
185
+ end
186
+
187
+ @vcf_hash.each_key do | id |
188
+ positions = @vcf_hash[id][:het].keys
189
+ positions << @vcf_hash[id][:hom].keys
190
+ positions.flatten!
191
+ next if positions.empty?
192
+ contig_obj = @pileups[id]
193
+ positions.each do | pos |
194
+ command = "#{bamobject.samtools} mpileup -r #{id}:#{pos}-#{pos} -Q #{bq} -q #{mq} -B -f #{@params.assembly} #{bamfile}"
195
+ stdout, stderr, status = Open3.capture3(command)
196
+ unless status.success?
197
+ logger.error "resulted in exit code #{status.exitstatus} using #{command}"
198
+ logger.error "stderr output is: #{stderr}"
199
+ raise CheripicError
200
+ end
201
+ stdout.chomp!
202
+ if stdout == '' or stdout.split("\t")[3].to_i == 0 or stdout =~ /^\t0/
203
+ logger.info "pileup data empty for\t#{id}\t#{pos}"
204
+ else
205
+ pileup = Pileup.new(stdout)
206
+ unless max_d == 0 or pileup.coverage <= max_d
207
+ logger.info "pileup coverage is higher than max\t#{pileup.to_s}"
208
+ next
209
+ end
210
+ contig_obj.send(sym).store(pos, pileup)
211
+ end
212
+ end
213
+ end
214
+ end
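`extract_bam_pileup` builds one `samtools mpileup` command string per selected position and drops pileups whose coverage exceeds the cap. A sketch of both steps, assuming the same flags as the diff; `mpileup_args` and `within_max_depth?` are hypothetical helpers. Returning an argument array rather than a string lets `Open3.capture3(*args)` run samtools without a shell, so file paths containing spaces or shell metacharacters are passed through intact:

```ruby
# Hypothetical helper: the mpileup call from extract_bam_pileup as an argv array.
def mpileup_args(samtools, contig, pos, base_q, map_q, fasta, bam)
  [samtools, 'mpileup',
   '-r', "#{contig}:#{pos}-#{pos}", # single-position region
   '-Q', base_q.to_s,               # minimum base quality
   '-q', map_q.to_s,                # minimum mapping quality
   '-B',                            # disable BAQ computation
   '-f', fasta, bam]
end

# Coverage filter from extract_bam_pileup: a max_d of 0 disables the cap.
def within_max_depth?(coverage, max_d)
  max_d.zero? || coverage <= max_d
end

args = mpileup_args('samtools', 'contig1', 150, 15, 20, 'ref.fa', 'mut.bam')
# stdout, stderr, status = Open3.capture3(*args)
```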
215
+
87
216
  # Once pileup files are analysed and variants are extracted from each bulk;
88
217
  # bulks are compared to identify and isolate variants for downstream analysis.
89
218
  # If polyploidy set to true and mut_parent and bg_parent bulks are provided
@@ -95,8 +224,10 @@ module Cheripic
95
224
  @assembly.each_key do | id |
96
225
  contig = @assembly[id]
97
226
  # extract parental hemi snps for polyploids before bulks are compared
98
- if @mut_parent != '' or @bg_parent != ''
99
- @pileups[id].hemisnps_in_parent
227
+ if Options.polyploidy
228
+ if @params.mut_parent != '' or @params.bg_parent != ''
229
+ @pileups[id].hemisnps_in_parent
230
+ end
100
231
  end
101
232
  contig.hm_pos, contig.ht_pos, contig.hemi_pos = @pileups[id].bulks_compared
102
233
  end
@@ -0,0 +1,83 @@
1
+ # encoding: utf-8
2
+
3
+ module Cheripic
4
+
5
+ # Custom error handling for Vcf class
6
+ class VcfError < CheripicError; end
7
+
8
+ require 'bio-samtools'
9
+
10
+ class Vcf
11
+
12
+ def self.get_allele_freq(vcf_obj)
13
+ # check if the vcf is from samtools (has DP4 and AF1 fields in INFO)
14
+ if vcf_obj.info.key?('DP4')
15
+ freq = vcf_obj.info['DP4'].split(',')
16
+ depth = freq.inject { | sum, n | sum.to_f + n.to_f }
17
+ alt = freq[2].to_f + freq[3].to_f
18
+ allele_freq = alt / depth
19
+ # allele_freq = vcf_obj.non_ref_allele_freq
20
+ # check if the vcf is from VarScan (has RD, AD and FREQ fields in FORMAT)
21
+ elsif vcf_obj.samples['1'].key?('RD')
22
+ alt = vcf_obj.samples['1']['AD'].to_f
23
+ depth = vcf_obj.samples['1']['RD'].to_f + alt
24
+ allele_freq = alt / depth
25
+ # check if the vcf is from GATK (has AD and GT fields in FORMAT)
26
+ elsif vcf_obj.samples['1'].key?('AD') and vcf_obj.samples['1']['AD'].include?(',')
27
+ freq = vcf_obj.samples['1']['AD'].split(',')
28
+ allele_freq = freq[1].to_f / ( freq[0].to_f + freq[1].to_f )
29
+ # check if the vcf has an AF field in INFO
30
+ elsif vcf_obj.info.key?('AF')
31
+ allele_freq = vcf_obj.info['AF'].to_f
32
+ else
33
+ raise VcfError.new 'not a supported vcf format (VarScan, GATK, Bcftools(Samtools), Vcf 4.0, 4.1 and 4.2)' +
34
+ " and check that it is one sample vcf\n"
35
+ end
36
+ allele_freq
37
+ end
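`Vcf.get_allele_freq` recognises the VCF flavour from the fields present. For bcftools/samtools output, the DP4 INFO field holds four read counts (reference forward, reference reverse, alternate forward, alternate reverse); for GATK, the AD FORMAT field holds `ref,alt` depths. A minimal sketch of those two calculations on raw field strings, with hypothetical helper names and no dependency on bio-samtools:

```ruby
# Alternate allele fraction from a DP4 INFO value:
# ref-forward, ref-reverse, alt-forward, alt-reverse read counts.
def dp4_allele_freq(dp4)
  counts = dp4.split(',').map(&:to_f)
  (counts[2] + counts[3]) / counts.sum
end

# Alternate allele fraction from a GATK-style AD FORMAT value "ref,alt".
# Only the first alternate allele is used, as in get_allele_freq.
def ad_allele_freq(ad)
  ref, alt = ad.split(',').first(2).map(&:to_f)
  alt / (ref + alt)
end

dp4_allele_freq('3,2,10,5') # => 0.75
ad_allele_freq('12,36')     # => 0.75
```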
38
+
39
+
40
+ ##Input: vcf file
41
+ ##Output: hash of fragment ids, each holding :het and :hom hashes of variant positions and allele fractions
42
+ def self.get_vars(vcf_file)
43
+ ht_low = Options.htlow
44
+ ht_high = Options.hthigh
45
+
46
+ # hash of :het and :hom with frag ids and respective variant positions
47
+ var_pos = Hash.new{ |h,k| h[k] = Hash.new(&h.default_proc) }
48
+ File.foreach(vcf_file) do |line|
49
+ next if line =~ /^#/
50
+ v = Bio::DB::Vcf.new(line)
51
+ unless v.alt == '.'
52
+ allele_freq = get_allele_freq(v)
53
+ if allele_freq.between?(ht_low, ht_high)
54
+ var_pos[v.chrom][:het][v.pos] = allele_freq
55
+ elsif allele_freq > ht_high
56
+ var_pos[v.chrom][:hom][v.pos] = allele_freq
57
+ end
58
+ end
59
+ end
60
+ var_pos
61
+ end
62
+
63
+ def self.filtering(mutant_vcf, bgbulk_vcf)
64
+ var_pos_mut = get_vars(mutant_vcf)
65
+ return var_pos_mut if bgbulk_vcf == ''
66
+ var_pos_bg = get_vars(bgbulk_vcf)
67
+
68
+ # delete positions where both bulks carry a homozygous variant
69
+ var_pos_mut.each_key do | frag |
70
+ positions = var_pos_mut[frag][:hom].keys
71
+ pos_bg_bulk = var_pos_bg[frag][:hom].keys
72
+ positions.each do |pos|
73
+ if pos_bg_bulk.include?(pos)
74
+ var_pos_mut[frag][:hom].delete(pos)
75
+ end
76
+ end
77
+ end
78
+ var_pos_mut
79
+ end
80
+
81
+ end
82
+
83
+ end
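`Vcf.filtering` keeps the mutant-bulk variants but deletes homozygous positions that are also homozygous in the background bulk, since such shared positions are presumably not linked to the mutant phenotype. A sketch of the same filter on plain hashes in the `get_vars` layout; `drop_shared_hom` is a hypothetical name:

```ruby
# Removes positions homozygous in both bulks, mirroring Vcf.filtering.
# Layout: contig id => { het: { pos => freq }, hom: { pos => freq } }
def drop_shared_hom(var_pos_mut, var_pos_bg)
  var_pos_mut.each do |frag, types|
    bg_hom = var_pos_bg.dig(frag, :hom) || {}
    types[:hom]&.reject! { |pos, _| bg_hom.key?(pos) }
  end
  var_pos_mut
end

mut = { 'c1' => { het: { 5 => 0.5 }, hom: { 10 => 1.0, 20 => 0.95 } } }
bg  = { 'c1' => { hom: { 10 => 1.0 } } }
drop_shared_hom(mut, bg) # => {"c1"=>{:het=>{5=>0.5}, :hom=>{20=>0.95}}}
```

Heterozygous positions are left untouched, matching the diff, which only compares the `:hom` sets.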
@@ -2,6 +2,6 @@ module Cheripic
2
2
 
3
3
  # Sets the semantic version number for this module.
4
4
  # Version number will be used in help messages and for generating gem.
5
- VERSION = '1.2.0'
5
+ VERSION = '1.2.5'
6
6
 
7
7
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: cheripic
3
3
  version: !ruby/object:Gem::Version
4
- version: 1.2.0
4
+ version: 1.2.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Shyam Rallapalli
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2016-08-11 00:00:00.000000000 Z
11
+ date: 2016-10-17 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: yell
@@ -84,40 +84,6 @@ dependencies:
84
84
  - - "~>"
85
85
  - !ruby/object:Gem::Version
86
86
  version: 2.4.0
87
- - !ruby/object:Gem::Dependency
88
- name: bio-gngm
89
- requirement: !ruby/object:Gem::Requirement
90
- requirements:
91
- - - "~>"
92
- - !ruby/object:Gem::Version
93
- version: 0.2.1
94
- type: :runtime
95
- prerelease: false
96
- version_requirements: !ruby/object:Gem::Requirement
97
- requirements:
98
- - - "~>"
99
- - !ruby/object:Gem::Version
100
- version: 0.2.1
101
- - !ruby/object:Gem::Dependency
102
- name: rinruby
103
- requirement: !ruby/object:Gem::Requirement
104
- requirements:
105
- - - "~>"
106
- - !ruby/object:Gem::Version
107
- version: '2.0'
108
- - - ">="
109
- - !ruby/object:Gem::Version
110
- version: 2.0.3
111
- type: :runtime
112
- prerelease: false
113
- version_requirements: !ruby/object:Gem::Requirement
114
- requirements:
115
- - - "~>"
116
- - !ruby/object:Gem::Version
117
- version: '2.0'
118
- - - ">="
119
- - !ruby/object:Gem::Version
120
- version: 2.0.3
121
87
  - !ruby/object:Gem::Dependency
122
88
  name: activesupport
123
89
  requirement: !ruby/object:Gem::Requirement
@@ -259,6 +225,7 @@ files:
259
225
  - ".gitignore"
260
226
  - ".travis.yml"
261
227
  - CODE_OF_CONDUCT.md
228
+ - ChangeLog.md
262
229
  - Gemfile
263
230
  - LICENSE.txt
264
231
  - README.md
@@ -267,6 +234,7 @@ files:
267
234
  - bin/console
268
235
  - bin/setup
269
236
  - cheripic.gemspec
237
+ - galaxy_cheripic_tool.xml
270
238
  - lib/cheripic.rb
271
239
  - lib/cheripic/bfr.rb
272
240
  - lib/cheripic/cmd.rb
@@ -277,6 +245,7 @@ files:
277
245
  - lib/cheripic/pileup.rb
278
246
  - lib/cheripic/regions.rb
279
247
  - lib/cheripic/variants.rb
248
+ - lib/cheripic/vcf.rb
280
249
  - lib/cheripic/version.rb
281
250
  homepage: https://github.com/shyamrallapalli/cheripic
282
251
  licenses: