bio-polyploid-tools 0.8.0 → 0.8.1
- checksums.yaml +4 -4
- data/README.md +31 -17
- data/VERSION +1 -1
- data/bin/homokaryot_primers.rb +1 -1
- data/bin/polymarker.rb +6 -1
- data/bin/snp_position_to_polymarker.rb +16 -6
- data/bio-polyploid-tools.gemspec +7 -3
- data/lib/bio/PolyploidTools/ExonContainer.rb +6 -6
- data/lib/bio/PolyploidTools/SNP.rb +137 -140
- data/lib/bio/PolyploidTools/SNPMutant.rb +0 -1
- data/lib/bio/PolyploidTools/SNPSequence.rb +1 -1
- data/lib/bio/db/primer3.rb +3 -1
- data/test/data/BS00068396_51_contigs.fa.fai +4 -0
- data/test/data/BS00068396_51_contigs.fa.nhr +0 -0
- data/test/data/BS00068396_51_contigs.fa.nin +0 -0
- data/test/data/BS00068396_51_contigs.fa.nsq +0 -0
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5cc3c126779f27e61f521959b82d13f240cb2ce8d5c5416e511f9150ced798eb
|
4
|
+
data.tar.gz: 13adf99f1336327f7b399057d66ee63892ee276ee507a398fb4a6936a2a765a2
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: d345c23216e2d6aa3174885a053300ec499230125625d36fe1b5efc8de3d151b5423ad8ed5e0c563b2c9be5b07df29a759a9f95ee1196588eefa0ab2e40ec802
|
7
|
+
data.tar.gz: 1ed47854ec04f95c7d8449e1a15885a67f00c076b4bc07705616d6762ad7cf8822d60933cb737c71b95cbfb46293b4cb970e3d2f6413ccde70ca8e0f372e2ab4
|
data/README.md
CHANGED
@@ -1,10 +1,12 @@
|
|
1
|
-
#bio-polyploid-tools
|
1
|
+
# bio-polyploid-tools
|
2
|
+
|
3
|
+
## Introduction
|
2
4
|
|
3
|
-
##Introduction
|
4
5
|
This tools are designed to deal with polyploid wheat. The first tool is to design KASP primers, making them as specific as possible.
|
5
6
|
|
6
7
|
|
7
|
-
##Installation
|
8
|
+
## Installation
|
9
|
+
|
8
10
|
```sh
|
9
11
|
gem install bio-polyploid-tools
|
10
12
|
```
|
@@ -13,13 +15,19 @@ You need to have in your ```$PATH``` the following programs:
|
|
13
15
|
* [MAFFT](http://mafft.cbrc.jp/alignment/software/)
|
14
16
|
* [primer3](http://primer3.sourceforge.net/releases.php)
|
15
17
|
* [exonerate](http://www.ebi.ac.uk/~guy/exonerate/)
|
18
|
+
* [blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE%3DBlastDocs&DOC_TYPE%3DDownload)
|
16
19
|
|
17
|
-
The code
|
20
|
+
The code was originally developed on ruby 2.1.0, but it should work on 1.9.3 and above. However, it is only actively tested in currently supported ruby versions:
|
21
|
+
|
22
|
+
* 2.1.10
|
23
|
+
* 2.2.5
|
24
|
+
* 2.3.5
|
25
|
+
* 2.4.2
|
18
26
|
|
19
|
-
#PolyMarker
|
20
27
|
|
21
|
-
|
28
|
+
# PolyMarker
|
22
29
|
|
30
|
+
To run PolyMarker with the CSS wheat contigs, you need to unzip the reference file from [ensembl](http://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz).
|
23
31
|
|
24
32
|
|
25
33
|
```sh
|
@@ -80,7 +88,7 @@ If the flanking sequence is unknow, but the position on a reference is available
|
|
80
88
|
* **alternative allele** The base in the alternative allele.
|
81
89
|
* **target chromosome** for the specific primers. Must be in line with the chromosome selection critieria.
|
82
90
|
|
83
|
-
####Example
|
91
|
+
#### Example
|
84
92
|
|
85
93
|
```
|
86
94
|
IWGSC_CSS_1AL_scaff_110,C,519,A,2A
|
@@ -89,7 +97,8 @@ IWGSC_CSS_1AL_scaff_110,C,519,A,2A
|
|
89
97
|
This file format can be used with ```snp_positions_to_polymarker.rb``` to produce the input for the option```--marker_list```.
|
90
98
|
|
91
99
|
|
92
|
-
###Custom reference sequences.
|
100
|
+
### Custom reference sequences.
|
101
|
+
|
93
102
|
By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
|
94
103
|
) are used. However, it is possible to use a custom reference. To define the chromosome where each contig belongs the argument ```arm_selection``` is used. The defailt uses ids like: ```IWGSC_CSS_1AL_scaff_110```, where the third field, separated by underscores is used. A simple way to add costum references is to rename the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, where only the first two characters in each contig is used as identifier, useful when pseudomolecules are named after the chromosomes (ie: ">1A" in the fasta file).
|
95
104
|
|
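The two naming conventions described in the README text above can be sketched in Ruby. This is a minimal illustration, not the code shipped in `polymarker.rb`: the method names `arm_from_third_field` and `arm_from_first_two` are hypothetical, and only the behaviour stated in the README is assumed (third underscore-separated field by default, first two characters for `arm_selection_first_two`).

```ruby
# Hypothetical helpers illustrating the README's two contig-naming conventions.
# They are NOT the functions defined in polymarker.rb.

# Default convention: the chromosome arm is the third underscore-separated field.
def arm_from_third_field(contig_name)
  contig_name.split("_")[2]
end

# Pseudomolecule convention: the first two characters name the chromosome.
def arm_from_first_two(contig_name)
  contig_name[0, 2]
end

puts arm_from_third_field("IWGSC_CSS_1AL_scaff_110") # 1AL
puts arm_from_first_two("1A")                        # 1A
```

Renaming a custom FASTA so its ids match one of these patterns is the approach the README itself suggests.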
@@ -117,33 +126,38 @@ To use blast instead of exonerate, use the following command:
|
|
117
126
|
```
|
118
127
|
|
119
128
|
|
120
|
-
##Release Notes
|
129
|
+
## Release Notes
|
130
|
+
|
131
|
+
### 0.8
|
132
|
+
|
133
|
+
* FEATURE: ```polymarker.rb``` added the flag ```--aligner blast|exonerate ``` which lets you pick between ```blast``` or ```exonerate``` as the aligner. For blast the default is to have the database with the same name as the ```--contigs``` file. However, it is possible to use a different name vua the option ```--database```.
|
134
|
+
|
135
|
+
### 0.7.3
|
121
136
|
|
122
|
-
###0.7.3
|
123
137
|
* FEATURE: ```polymarker.rb``` Added to the flag ```--arm_selection``` the option ```scaffold```, which now supports a scaffold specific primer.
|
124
138
|
* FEATURE: ```snp_position_to_polymarker``` Added the option ```--mutant_list``` to prepare files for PolyMarker from files with the following columns ```ID,Allele_1,position,Allele_1,target_chromosome```.
|
125
139
|
|
126
|
-
###0.7.2
|
140
|
+
### 0.7.2
|
141
|
+
|
127
142
|
* FEATURE: Added a flag ```min_identity``` to set the minimum identity to consider a hit. The default is 90
|
128
143
|
|
129
|
-
###0.7.1
|
144
|
+
### 0.7.1
|
130
145
|
* BUGFIX: Now the parser for ```arm_selection_embl``` works with the mixture of contigs and pseudomolecules
|
131
146
|
* DOC: Added documentation on how to use custom references.
|
132
147
|
|
133
|
-
###0.7.0
|
148
|
+
### 0.7.0
|
134
149
|
* Added flag ```genomes_count``` for number of genomes, to be used on tetraploids, etc.
|
135
150
|
|
136
|
-
###0.6.1
|
151
|
+
### 0.6.1
|
137
152
|
|
138
153
|
|
139
154
|
* polymarker.rb now validates that all the files exist.
|
140
155
|
* BUGFIX: A reference was required even when it was not used to generate contigs.
|
141
156
|
|
142
|
-
#Notes
|
143
|
-
|
157
|
+
# Notes
|
144
158
|
|
145
|
-
* If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
|
146
159
|
|
160
|
+
* BUG: If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
|
147
161
|
* BUG: Blocks with NNNs are picked and treated as semi-specific.
|
148
162
|
* BUG: If the name of the reference have space, the ID is not chopped. ">gene_1 (G12A)" shouls be treated as ">gene_1".
|
149
163
|
* TODO: Add a parameter file to configure the alignments.
|
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.8.
|
1
|
+
0.8.1
|
data/bin/homokaryot_primers.rb
CHANGED
@@ -180,4 +180,4 @@ end
|
|
180
180
|
|
181
181
|
kasp_container.add_primers_file(primer_3_output)
|
182
182
|
header = "Marker,SNP,RegionSize,SNP_type,#{snp_in},#{original_name},common,primer_type,orientation,#{snp_in}_TM,#{original_name}_TM,common_TM,selected_from,product_size"
|
183
|
-
File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
|
183
|
+
File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
|
data/bin/polymarker.rb
CHANGED
@@ -12,6 +12,11 @@ require path
|
|
12
12
|
|
13
13
|
arm_selection_functions = Hash.new;
|
14
14
|
|
15
|
+
arm_selection_functions[:arm_selection_nrgenes] = lambda do | contig_name |
|
16
|
+
#example format: chr2A
|
17
|
+
ret = contig_name[3,2]
|
18
|
+
return ret
|
19
|
+
end
|
15
20
|
|
16
21
|
arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
|
17
22
|
contig_name.gsub!(/chr/,"")
|
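The new `:arm_selection_nrgenes` lambda relies on `String#[start, length]`. The sketch below isolates that slice so its edge cases are visible; `nrgenes_arm` is a hypothetical stand-alone name used only for illustration.

```ruby
# Sketch of the slicing used by the new :arm_selection_nrgenes lambda.
# `nrgenes_arm` is a hypothetical name; the gem stores the logic in a lambda.
def nrgenes_arm(contig_name)
  # String#[start, length]: two characters starting at index 3
  contig_name[3, 2]
end

p nrgenes_arm("chr2A") # "2A"
p nrgenes_arm("chr")   # ""  (start equals the string length)
p nrgenes_arm("2A")    # nil (start is past the end of the string)
```

Names that lack the `chr` prefix therefore yield an empty string or `nil` rather than a chromosome label, so this selector only suits references named like `chr2A`.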
@@ -417,4 +422,4 @@ rescue StandardError => e
|
|
417
422
|
rescue Exception => e
|
418
423
|
write_status "ERROR\t#{e.message}"
|
419
424
|
raise e
|
420
|
-
end
|
425
|
+
end
|
data/bin/snp_position_to_polymarker.rb
CHANGED
@@ -71,16 +71,26 @@ File.open(test_file) do | f |
|
|
71
71
|
snp = Bio::PolyploidTools::SNPMutant.parse(line)
|
72
72
|
entry = fasta_reference_db.index.region_for_entry(snp.contig)
|
73
73
|
end
|
74
|
-
|
74
|
+
#puts line
|
75
75
|
if entry
|
76
76
|
region = entry.get_full_region
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
77
|
+
snp_name = snp.snp_id_in_seq
|
78
|
+
|
79
|
+
#if region != lastRegion
|
80
|
+
# lastTemplate = fasta_reference_db.fetch_sequence(region)
|
81
|
+
#end
|
82
|
+
start, total, new_position = snp.to_polymarker_coordinates(options[:flanking_size])
|
83
|
+
region.start = start
|
84
|
+
region.end = start + total
|
85
|
+
#puts region
|
86
|
+
local_template = fasta_reference_db.fetch_sequence(region)
|
87
|
+
|
88
|
+
snp.position = new_position
|
89
|
+
|
90
|
+
snp.template_sequence = local_template
|
81
91
|
lastRegion = region
|
82
92
|
|
83
|
-
out.puts "#{snp.gene}_#{
|
93
|
+
out.puts "#{snp.gene}_#{snp_name},#{snp.chromosome},#{snp.to_polymarker_sequence(options[:flanking_size])}"
|
84
94
|
else
|
85
95
|
$stderr.puts "ERROR: Unable to find entry for #{snp.gene}"
|
86
96
|
end
|
data/bio-polyploid-tools.gemspec
CHANGED
@@ -2,16 +2,16 @@
|
|
2
2
|
# DO NOT EDIT THIS FILE DIRECTLY
|
3
3
|
# Instead, edit Juwelier::Tasks in Rakefile, and run 'rake gemspec'
|
4
4
|
# -*- encoding: utf-8 -*-
|
5
|
-
# stub: bio-polyploid-tools 0.8.
|
5
|
+
# stub: bio-polyploid-tools 0.8.1 ruby lib
|
6
6
|
|
7
7
|
Gem::Specification.new do |s|
|
8
8
|
s.name = "bio-polyploid-tools".freeze
|
9
|
-
s.version = "0.8.
|
9
|
+
s.version = "0.8.1"
|
10
10
|
|
11
11
|
s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
|
12
12
|
s.require_paths = ["lib".freeze]
|
13
13
|
s.authors = ["Ricardo H. Ramirez-Gonzalez".freeze]
|
14
|
-
s.date = "2018-01-
|
14
|
+
s.date = "2018-01-19"
|
15
15
|
s.description = "Repository of tools developed at Crop Genetics in JIC to work with polyploid wheat".freeze
|
16
16
|
s.email = "ricardo.ramirez-gonzalez@jic.ac.uk".freeze
|
17
17
|
s.executables = ["bfr.rb".freeze, "blast_triads.rb".freeze, "blast_triads_promoters.rb".freeze, "count_variations.rb".freeze, "filter_blat_by_target_coverage.rb".freeze, "filter_exonerate_by_identity.rb".freeze, "find_best_blat_hit.rb".freeze, "find_best_exonerate.rb".freeze, "find_homoeologue_variations.rb".freeze, "get_longest_hsp_blastx_triads.rb".freeze, "hexaploid_primers.rb".freeze, "homokaryot_primers.rb".freeze, "mafft_triads.rb".freeze, "mafft_triads_promoters.rb".freeze, "map_markers_to_contigs.rb".freeze, "markers_in_region.rb".freeze, "polymarker.rb".freeze, "polymarker_capillary.rb".freeze, "snp_position_to_polymarker.rb".freeze, "snps_between_bams.rb".freeze, "vcfLineToTable.rb".freeze]
|
@@ -102,6 +102,10 @@ Gem::Specification.new do |s|
|
|
102
102
|
"test/data/BS00068396_51_contigs.aln",
|
103
103
|
"test/data/BS00068396_51_contigs.dnd",
|
104
104
|
"test/data/BS00068396_51_contigs.fa",
|
105
|
+
"test/data/BS00068396_51_contigs.fa.fai",
|
106
|
+
"test/data/BS00068396_51_contigs.fa.nhr",
|
107
|
+
"test/data/BS00068396_51_contigs.fa.nin",
|
108
|
+
"test/data/BS00068396_51_contigs.fa.nsq",
|
105
109
|
"test/data/BS00068396_51_contigs.nhr",
|
106
110
|
"test/data/BS00068396_51_contigs.nin",
|
107
111
|
"test/data/BS00068396_51_contigs.nsq",
|
data/lib/bio/PolyploidTools/ExonContainer.rb
CHANGED
@@ -116,13 +116,13 @@ module Bio::PolyploidTools
|
|
116
116
|
target_region = exon.target_region
|
117
117
|
exon_start_offset = exon.query_region.start - gene_region.start
|
118
118
|
chr_local_pos=local_pos_in_gene + target_region.start + 1
|
119
|
-
ret_str
|
120
|
-
to_print =
|
121
|
-
chr_seq
|
122
|
-
l_pos
|
123
|
-
to_print <<
|
119
|
+
ret_str << ">#{chromosome}_SNP-#{chr_local_pos} #{exon.to_s} #{target_region.orientation}\n"
|
120
|
+
to_print = "-" * exon_start_offset
|
121
|
+
chr_seq = chromosome_sequence(exon.target_region).to_s
|
122
|
+
l_pos = exon_start_offset + local_pos_in_gene
|
123
|
+
to_print << chr_seq
|
124
124
|
to_print[local_pos_in_gene] = to_print[local_pos_in_gene].upcase
|
125
|
-
ret_str
|
125
|
+
ret_str << to_print
|
126
126
|
end
|
127
127
|
ret_str
|
128
128
|
end
|
data/lib/bio/PolyploidTools/SNP.rb
CHANGED
@@ -16,6 +16,8 @@ module Bio::PolyploidTools
|
|
16
16
|
attr_accessor :chromosome
|
17
17
|
attr_accessor :variation_free_region
|
18
18
|
|
19
|
+
|
20
|
+
|
19
21
|
#Format:
|
20
22
|
#Gene_name,Original,SNP_Pos,pos,chromosome
|
21
23
|
#A_comp0_c0_seq1,C,519,A,2A
|
@@ -30,7 +32,7 @@ module Bio::PolyploidTools
|
|
30
32
|
snp.snp.upcase!
|
31
33
|
snp.snp.strip!
|
32
34
|
snp.chromosome.strip!
|
33
|
-
|
35
|
+
|
34
36
|
snp.use_reference = false
|
35
37
|
snp
|
36
38
|
end
|
@@ -60,6 +62,16 @@ module Bio::PolyploidTools
|
|
60
62
|
@primer_3_min_seq_length = 50
|
61
63
|
@variation_free_region = 0
|
62
64
|
@contig = false
|
65
|
+
@exon_list = Hash.new {|hsh, key| hsh[key] = [] }
|
66
|
+
end
|
67
|
+
|
68
|
+
def to_polymarker_coordinates(flanking_size, total:nil)
|
69
|
+
start = position - flanking_size + 1
|
70
|
+
start = 0 if start < 0
|
71
|
+
total = flanking_size * 2 unless total
|
72
|
+
total += 1
|
73
|
+
new_position = position - start + 2
|
74
|
+
[start , total, new_position ]
|
63
75
|
end
|
64
76
|
|
65
77
|
def to_polymarker_sequence(flanking_size, total:nil)
|
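A worked example may help when reading the arithmetic in the new `to_polymarker_coordinates` method. The sketch below copies the same steps into a stand-alone function; the extra `position` argument is an assumption made so that it runs without the gem, where `position` is an attribute of the SNP object.

```ruby
# Stand-alone copy of the arithmetic in to_polymarker_coordinates.
# `position` is passed in explicitly here; in the gem it is an instance attribute.
def polymarker_coordinates(position, flanking_size, total: nil)
  start = position - flanking_size + 1
  start = 0 if start < 0                   # clamp windows that would start before the sequence
  total = flanking_size * 2 unless total
  total += 1                               # include the SNP base itself
  new_position = position - start + 2
  [start, total, new_position]
end

p polymarker_coordinates(519, 100) # [420, 201, 101]
p polymarker_coordinates(50, 100)  # [0, 201, 52]
```

In `snp_position_to_polymarker.rb` the three values are used as `region.start = start`, `region.end = start + total` and `snp.position = new_position`, so the second call shows what happens when the SNP sits closer to the start of the contig than the flanking size.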
@@ -103,8 +115,7 @@ module Bio::PolyploidTools
|
|
103
115
|
end
|
104
116
|
|
105
117
|
def add_exon(exon, arm)
|
106
|
-
|
107
|
-
@exon_list[arm] = exon if exon.record.score > @exon_list[arm].record.score
|
118
|
+
exon_list[arm] << exon
|
108
119
|
end
|
109
120
|
|
110
121
|
def covered_region
|
@@ -115,28 +126,28 @@ module Bio::PolyploidTools
|
|
115
126
|
reg.orientation = :forward
|
116
127
|
reg.start = self.position - self.flanking_size
|
117
128
|
reg.end = self.position + self.flanking_size
|
118
|
-
|
119
129
|
reg.start = 1 if reg.start < 1
|
120
|
-
|
121
130
|
return reg
|
122
131
|
end
|
123
132
|
|
124
133
|
min = @position
|
125
134
|
max = @position
|
126
|
-
|
127
|
-
|
128
|
-
#raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
|
135
|
+
# puts "Calculating covered region for #{self.inspect}"
|
136
|
+
# puts "#{@exon_list.inspect}"
|
137
|
+
# raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
|
129
138
|
if @exon_list.size == 0
|
130
139
|
min = self.position - self.flanking_size
|
131
140
|
min = 1 if min < 1
|
132
141
|
max = self.position + self.flanking_size
|
133
142
|
end
|
134
|
-
@exon_list.each do | chromosome,
|
135
|
-
|
136
|
-
|
137
|
-
|
138
|
-
|
143
|
+
@exon_list.each do | chromosome, exon_arr |
|
144
|
+
exon_arr.each do | exon |
|
145
|
+
reg = exon.query_region
|
146
|
+
min = reg.start if reg.start < min
|
147
|
+
max = reg.end if reg.end > max
|
148
|
+
end
|
139
149
|
end
|
150
|
+
|
140
151
|
reg = Bio::DB::Fasta::Region.new()
|
141
152
|
reg.entry = gene
|
142
153
|
reg.orientation = :forward
|
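The switch from one exon per arm to a list of exons per arm rests on a `Hash` with a default block, which `covered_region` then walks with a nested loop. A minimal sketch of that pattern follows; `ExonSketch`, `new_exon_list` and `covered_range` are hypothetical stand-ins, and the exon objects in the gem carry more than a start and end.

```ruby
# Hypothetical stand-in for the gem's exon objects: only the query range is kept.
ExonSketch = Struct.new(:query_start, :query_end)

# Same construction as the new @exon_list: a missing key yields a fresh array.
def new_exon_list
  Hash.new { |hsh, key| hsh[key] = [] }
end

# Widen [min, max] over every exon of every arm, as covered_region does.
def covered_range(exon_list, position)
  min = position
  max = position
  exon_list.each do |_arm, exon_arr|
    exon_arr.each do |exon|
      min = exon.query_start if exon.query_start < min
      max = exon.query_end if exon.query_end > max
    end
  end
  [min, max]
end

exon_list = new_exon_list
exon_list["1AL"] << ExonSketch.new(120, 300)
exon_list["1AL"] << ExonSketch.new(80, 250)  # second hit on the same arm is kept, not replaced
exon_list["1BL"] << ExonSketch.new(150, 420)

p exon_list["1AL"].size         # 2
p covered_range(exon_list, 200) # [80, 420]
```

Because the default block assigns the new array to the key, `exon_list[arm] << exon` works on the first insertion without an explicit nil check.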
@@ -168,24 +179,6 @@ module Bio::PolyploidTools
|
|
168
179
|
pos + left_padding
|
169
180
|
end
|
170
181
|
|
171
|
-
def exon_fasta_string
|
172
|
-
gene_region = self.covered_region
|
173
|
-
local_pos_in_gene = self.local_position
|
174
|
-
ret_str = ""
|
175
|
-
container.parents.each do |name, bam|
|
176
|
-
ret_str << ">#{gene_region.entry}-#{self.position}_#{name} Overlapping_exons:#{gene_region.to_s} localSNPpo:#{local_pos_in_gene+1}\n"
|
177
|
-
to_print = parental_sequences[name]
|
178
|
-
ret_str << to_print << "\n"
|
179
|
-
end
|
180
|
-
self.exon_sequences.each do | chromosome, exon_seq |
|
181
|
-
ret_str << ">#{chromosome}\n#{exon_seq}\n"
|
182
|
-
end
|
183
|
-
mask = masked_chromosomal_snps("1BS", flanking_size)
|
184
|
-
ret_str << ">Mask\n#{mask}\n"
|
185
|
-
ret_str
|
186
|
-
end
|
187
|
-
|
188
|
-
|
189
182
|
def primer_fasta_string
|
190
183
|
gene_region = self.covered_region
|
191
184
|
local_pos_in_gene = self.local_position
|
@@ -209,12 +202,15 @@ module Bio::PolyploidTools
|
|
209
202
|
end
|
210
203
|
|
211
204
|
def primer_region(target_chromosome, parental )
|
205
|
+
|
212
206
|
parental = aligned_sequences[parental].downcase
|
207
|
+
names = aligned_sequences.keys
|
208
|
+
target_chromosome = get_target_sequence(names, target_chromosome)
|
209
|
+
|
213
210
|
chromosome_seq = aligned_sequences[target_chromosome]
|
214
211
|
chromosome_seq = "-" * parental.size unless chromosome_seq
|
215
212
|
chromosome_seq = chromosome_seq.downcase
|
216
213
|
mask = mask_aligned_chromosomal_snp(target_chromosome)
|
217
|
-
#puts "'#{mask}'"
|
218
214
|
|
219
215
|
pr = PrimerRegion.new
|
220
216
|
position_in_region = 0
|
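`primer_region` now calls `get_target_sequence`, which this release adds further down in the same file. Its selection logic can be read in isolation with the sketch below. `best_scoring_name` is a hypothetical name, and the sample ids assume the `chromosome_start_score` key pattern that `exon_sequences` builds elsewhere in this diff.

```ruby
# Stand-alone version of the selection logic in get_target_sequence.
# Names with exactly three underscore-separated fields are read as
# "<chromosome>_<start>_<score>"; any other name is skipped.
def best_scoring_name(names, fallback)
  best = fallback
  best_score = 0
  names.each do |name|
    fields = name.split("_")
    next unless fields.length == 3
    score = fields[2].to_f
    if score > best_score
      best_score = score
      best = name
    end
  end
  best
end

names = ["1AL_120_850.0", "1AL_4000_910.5", "1BL_150_990.0", "A"]
p best_scoring_name(names, "1AL")      # "1BL_150_990.0"
p best_scoring_name(["A", "B"], "1AL") # "1AL"
```

As written in the diff, the chromosome argument only serves as the fallback: the highest score wins regardless of which chromosome the name starts with, and a chromosome id that itself contains an underscore would not split into three fields.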
@@ -291,8 +287,9 @@ module Bio::PolyploidTools
|
|
291
287
|
|
292
288
|
end
|
293
289
|
|
294
|
-
|
295
|
-
|
290
|
+
#puts "__"
|
291
|
+
#puts self.inspect
|
292
|
+
str = "SEQUENCE_ID=#{opts[:name]} #{orientation} \n"
|
296
293
|
str << "SEQUENCE_FORCE_LEFT_END=#{left}\n" unless opts[:extra_f]
|
297
294
|
str << "SEQUENCE_FORCE_RIGHT_END=#{right}\n" if opts[:right_pos]
|
298
295
|
str << extra if extra
|
@@ -326,10 +323,10 @@ module Bio::PolyploidTools
|
|
326
323
|
primer_3_propertes = Array.new
|
327
324
|
|
328
325
|
seq_original = String.new(pr.sequence)
|
329
|
-
puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
|
326
|
+
#puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
|
330
327
|
return primer_3_propertes if seq_original.size < primer_3_min_seq_length
|
331
328
|
#puts self.inspect
|
332
|
-
puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
|
329
|
+
#puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
|
333
330
|
|
334
331
|
seq_original[pr.snp_pos] = self.original
|
335
332
|
seq_original_reverse = reverse_complement_string(seq_original)
|
@@ -432,12 +429,13 @@ module Bio::PolyploidTools
|
|
432
429
|
|
433
430
|
seq[local_pos_in_gene] = self.snp if name == self.snp_in
|
434
431
|
@parental_sequences [name] = seq
|
435
|
-
puts name
|
436
|
-
puts seq
|
437
432
|
end
|
438
433
|
@parental_sequences
|
439
434
|
end
|
440
435
|
|
436
|
+
|
437
|
+
|
438
|
+
|
441
439
|
def surrounding_parental_sequences
|
442
440
|
return @surrounding_parental_sequences if @surrounding_parental_sequences
|
443
441
|
gene_region = self.covered_region
|
@@ -450,11 +448,15 @@ module Bio::PolyploidTools
|
|
450
448
|
seq = bam.consensus_with_ambiguities({:region=>gene_region}).to_s
|
451
449
|
else
|
452
450
|
seq = container.gene_model_sequence(gene_region)
|
453
|
-
|
454
|
-
|
455
|
-
|
456
|
-
|
457
|
-
|
451
|
+
#puts "#{name} #{self.snp_in}"
|
452
|
+
#puts "Modifing original: #{name}\n#{seq}"
|
453
|
+
unless name == self.snp_in
|
454
|
+
|
455
|
+
seq[local_pos_in_gene] = self.original
|
456
|
+
else
|
457
|
+
seq[local_pos_in_gene] = self.snp
|
458
|
+
end
|
459
|
+
#puts "#{seq}"
|
458
460
|
end
|
459
461
|
seq[local_pos_in_gene] = seq[local_pos_in_gene].upcase
|
460
462
|
seq[local_pos_in_gene] = self.snp if name == self.snp_in
|
@@ -522,71 +524,101 @@ module Bio::PolyploidTools
|
|
522
524
|
ret_str
|
523
525
|
end
|
524
526
|
|
527
|
+
|
528
|
+
def get_snp_position_after_trim
|
529
|
+
local_pos_in_gene = self.local_position
|
530
|
+
ideal_min = self.local_position - flanking_size
|
531
|
+
ideal_max = self.local_position + flanking_size
|
532
|
+
left_pad = 0
|
533
|
+
if ideal_min < 0
|
534
|
+
left_pad = ideal_min * -1
|
535
|
+
ideal_min = 0
|
536
|
+
end
|
537
|
+
local_pos_in_gene - ideal_min
|
538
|
+
end
|
539
|
+
|
525
540
|
def aligned_snp_position
|
526
541
|
return @aligned_snp_position if @aligned_snp_position
|
542
|
+
#puts self.inspect
|
527
543
|
pos = -1
|
528
544
|
parental_strings = Array.new
|
529
545
|
parental_sequences.keys.each do | par |
|
530
|
-
|
531
546
|
parental_strings << aligned_sequences[par]
|
532
547
|
end
|
533
|
-
template_sequence = nil
|
534
|
-
aligned_sequences.keys.each do |temp |
|
535
|
-
template_sequence = aligned_sequences[ temp ] if aligned_sequences[ temp ][0] != "-"
|
536
|
-
end
|
537
548
|
$stderr.puts "WARN: #{self.to_s} #{parental_sequences.keys} is not of size 2 (#{parental_strings.size})" if parental_strings.size != 2
|
538
549
|
|
550
|
+
local_pos_in_parental = get_snp_position_after_trim
|
539
551
|
i = 0
|
540
|
-
differences = 0
|
541
|
-
local_pos_in_gene = flanking_size
|
542
|
-
local_pos = 0
|
543
|
-
started = false
|
544
|
-
#TODO: Validate the cases when the alignment has padding on the left on all the chromosomes
|
545
|
-
#unless parental_strings[0]
|
546
|
-
#puts "parental hash: #{parental_sequences}"
|
547
|
-
#puts "Aligned sequences: #{aligned_sequences.to_fasta}"
|
548
|
-
# puts "parental_strings: #{parental_strings.to_s}"
|
549
|
-
#end
|
550
552
|
while i < parental_strings[0].size do
|
551
|
-
if
|
553
|
+
if local_pos_in_parental == 0 and parental_strings[0][i] != "-"
|
552
554
|
pos = i
|
553
555
|
if parental_strings[0][i] == parental_strings[1][i]
|
554
556
|
$stderr.puts "WARN: #{self.to_s} doesn't have a SNP in the marked place (#{i})! \n#{parental_strings[0]}\n#{parental_strings[1]}"
|
555
557
|
end
|
556
|
-
|
557
|
-
end
|
558
|
-
|
559
|
-
started = true if template_sequence[i] != "-"
|
560
|
-
if started == false or template_sequence[i] != "-"
|
561
|
-
local_pos += 1
|
562
558
|
end
|
559
|
+
|
560
|
+
local_pos_in_parental -= 1 if parental_strings[0][i] != "-"
|
563
561
|
i += 1
|
564
562
|
end
|
565
563
|
@aligned_snp_position = pos
|
566
564
|
return pos
|
567
565
|
end
|
568
566
|
|
567
|
+
def get_target_sequence(names, chromosome)
|
568
|
+
|
569
|
+
best = chromosome
|
570
|
+
best_score = 0
|
571
|
+
names.each do |e|
|
572
|
+
arr = e.split("_")
|
573
|
+
if arr.length == 3
|
574
|
+
score = arr[2].to_f
|
575
|
+
if score >best_score
|
576
|
+
best_score = score
|
577
|
+
best = e
|
578
|
+
end
|
579
|
+
end
|
580
|
+
end
|
581
|
+
best
|
582
|
+
end
|
583
|
+
|
584
|
+
|
585
|
+
|
569
586
|
def mask_aligned_chromosomal_snp(chromosome)
|
570
|
-
names =
|
587
|
+
names = aligned_sequences.keys
|
571
588
|
parentals = parental_sequences.keys
|
572
589
|
|
590
|
+
position_after_trim = get_snp_position_after_trim
|
591
|
+
|
592
|
+
names = names - parentals
|
573
593
|
local_pos_in_gene = aligned_snp_position
|
574
|
-
|
575
|
-
|
594
|
+
|
595
|
+
best_target = get_target_sequence(names, chromosome)
|
596
|
+
masked_snps = aligned_sequences[best_target].downcase if aligned_sequences[best_target]
|
597
|
+
masked_snps = "-" * aligned_sequences.values[0].size unless aligned_sequences[best_target]
|
576
598
|
#TODO: Make this chromosome specific, even when we have more than one alignment going to the region we want.
|
599
|
+
#puts "mask_aligned_chromosomal_snp(#{chromosome})"
|
600
|
+
#puts names
|
577
601
|
i = 0
|
578
|
-
|
602
|
+
for i in 0..masked_snps.size-1
|
603
|
+
#puts i
|
579
604
|
different = 0
|
580
605
|
cov = 0
|
581
606
|
from_group = 0
|
582
607
|
nCount = 0
|
608
|
+
seen = []
|
583
609
|
names.each do | chr |
|
584
610
|
if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
|
611
|
+
#puts aligned_sequences[chr][i]
|
585
612
|
cov += 1
|
586
613
|
nCount += 1 if aligned_sequences[chr][i] == 'N' or aligned_sequences[chr][i] == 'n' # maybe fix this to use ambiguity codes instead.
|
587
|
-
|
614
|
+
|
615
|
+
if chr[0] == chromosome_group and not seen.include? chr[1]
|
616
|
+
seen << chr[1]
|
617
|
+
from_group += 1
|
618
|
+
|
619
|
+
end
|
588
620
|
#puts "Comparing #{chromosome_group} and #{chr[0]} as chromosomes"
|
589
|
-
if chr !=
|
621
|
+
if chr != best_target
|
590
622
|
$stderr.puts "WARN: No base for #{masked_snps} : ##{i}" unless masked_snps[i].upcase
|
591
623
|
$stderr.puts "WARN: No base for #{aligned_sequences[chr]} : ##{i}" unless masked_snps[i].upcase
|
592
624
|
different += 1 if masked_snps[i].upcase != aligned_sequences[chr][i].upcase
|
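The rewritten `aligned_snp_position` maps a position in the ungapped parental sequence to a column in the gapped alignment by counting non-gap characters, starting from the value returned by `get_snp_position_after_trim`. The sketch below reproduces both steps as stand-alone functions; the names `position_after_trim` and `aligned_column` are hypothetical, and the SNP-presence warning from the original is left out.

```ruby
# Offset of the SNP inside the trimmed window, following get_snp_position_after_trim:
# the window starts flanking_size bases before the SNP, or at 0 if that is negative.
def position_after_trim(local_position, flanking_size)
  ideal_min = local_position - flanking_size
  ideal_min = 0 if ideal_min < 0
  local_position - ideal_min
end

# Column in a gapped alignment that holds the n-th (0-based) non-gap character.
# Returns -1 when the sequence has too few bases, matching the initial `pos = -1`.
def aligned_column(aligned, ungapped_pos)
  remaining = ungapped_pos
  aligned.each_char.with_index do |base, i|
    next if base == "-"
    return i if remaining == 0
    remaining -= 1
  end
  -1
end

p position_after_trim(500, 100)  # 100
p position_after_trim(30, 100)   # 30
p aligned_column("ac--gt-a", 2)  # 4
p aligned_column("--acg", 0)     # 2
```

Counting only non-gap characters is what lets the position survive left padding in the alignment, the case the removed TODO comment flagged.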
@@ -598,12 +630,15 @@ module Bio::PolyploidTools
|
|
598
630
|
masked_snps[i] = "-" if nCount > 0
|
599
631
|
masked_snps[i] = "*" if cov == 0
|
600
632
|
expected_snps = names.size - 1
|
601
|
-
|
633
|
+
|
634
|
+
#puts "Diferences: #{different} to expected: #{ expected_snps } [#{i}] Genome count (#{from_group} == #{genomes_count})"
|
602
635
|
|
603
636
|
masked_snps[i] = masked_snps[i].upcase if different == expected_snps and from_group == genomes_count
|
637
|
+
#puts "#{i}:#{masked_snps[i]}"
|
604
638
|
|
605
639
|
if i == local_pos_in_gene
|
606
640
|
masked_snps[i] = "&"
|
641
|
+
#puts "#{i}:#{masked_snps[i]}___"
|
607
642
|
bases = ""
|
608
643
|
names.each do | chr |
|
609
644
|
bases << aligned_sequences[chr][i] if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
|
@@ -617,62 +652,22 @@ module Bio::PolyploidTools
|
|
617
652
|
end
|
618
653
|
|
619
654
|
end
|
620
|
-
i += 1
|
621
|
-
end
|
622
|
-
masked_snps
|
623
|
-
end
|
624
|
-
|
625
|
-
def masked_chromosomal_snps(chromosome)
|
626
|
-
chromosomes = exon_sequences
|
627
|
-
names = chromosomes.keys
|
628
|
-
masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
|
629
|
-
masked_snps = "-" * covered_region.size unless chromosomes[chromosome]
|
630
|
-
local_pos_in_gene = self.local_position
|
631
|
-
ideal_min = local_pos_in_gene - flanking_size
|
632
|
-
ideal_max = local_pos_in_gene + flanking_size
|
633
|
-
i = 0
|
634
|
-
while i < masked_snps.size do
|
635
|
-
if i > ideal_min and i <= ideal_max
|
636
|
-
|
637
|
-
different = 0
|
638
|
-
cov = 0
|
639
|
-
names.each do | chr |
|
640
|
-
if chromosomes[chr][i] != "-"
|
641
|
-
cov += 1
|
642
|
-
if chr != chromosome and masked_snps[i] != "+"
|
643
|
-
different += 1 if masked_snps[i] != chromosomes[chr][i]
|
644
|
-
end
|
645
|
-
end
|
646
|
-
|
647
|
-
end
|
648
|
-
masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
|
649
|
-
masked_snps[i] = "-" if cov < 2
|
650
|
-
masked_snps[i] = masked_snps[i].upcase if different > 1
|
651
|
-
|
652
|
-
else
|
653
|
-
masked_snps[i] = "*"
|
654
|
-
end
|
655
|
-
if i == local_pos_in_gene
|
656
|
-
masked_snps[i] = "&"
|
657
|
-
end
|
658
|
-
i += 1
|
655
|
+
#i += 1
|
659
656
|
end
|
660
657
|
masked_snps
|
661
658
|
end
|
662
659
|
|
660
|
+
|
663
661
|
def surrounding_masked_chromosomal_snps(chromosome)
|
664
662
|
|
665
663
|
chromosomes = surrounding_exon_sequences
|
666
664
|
names = chromosomes.keys
|
665
|
+
get_target_sequence(names)
|
667
666
|
masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
|
668
667
|
masked_snps = "-" * (flanking_size * 2 ) unless chromosomes[chromosome]
|
669
668
|
local_pos_in_gene = flanking_size
|
670
|
-
# ideal_min = local_pos_in_gene - flanking_size
|
671
|
-
#ideal_max = local_pos_in_gene + flanking_size
|
672
669
|
i = 0
|
673
670
|
while i < masked_snps.size do
|
674
|
-
|
675
|
-
|
676
671
|
different = 0
|
677
672
|
cov = 0
|
678
673
|
names.each do | chr |
|
@@ -682,13 +677,11 @@ module Bio::PolyploidTools
|
|
682
677
|
different += 1 if masked_snps[i] != chromosomes[chr][i]
|
683
678
|
end
|
684
679
|
end
|
685
|
-
|
686
680
|
end
|
687
681
|
masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
|
688
682
|
masked_snps[i] = "-" if cov < 2
|
689
683
|
masked_snps[i] = masked_snps[i].upcase if different > 1
|
690
684
|
|
691
|
-
|
692
685
|
if i == local_pos_in_gene
|
693
686
|
masked_snps[i] = "&"
|
694
687
|
end
|
@@ -699,18 +692,19 @@ module Bio::PolyploidTools
|
|
699
692
|
|
700
693
|
def surrounding_exon_sequences
|
701
694
|
return @surrounding_exon_sequences if @surrounding_exon_sequences
|
695
|
+
gene_region = self.covered_region
|
702
696
|
@surrounding_exon_sequences = Bio::Alignment::SequenceHash.new
|
703
|
-
self.exon_list.each do |chromosome,
|
704
|
-
|
705
|
-
|
706
|
-
|
707
|
-
|
708
|
-
|
709
|
-
|
710
|
-
|
711
|
-
|
712
|
-
|
713
|
-
|
697
|
+
self.exon_list.each do |chromosome, exon_arr|
|
698
|
+
exon_arr.each do |exon|
|
699
|
+
exon_start_offset = exon.query_region.start - gene_region.start
|
700
|
+
flanquing_region = exon.target_flanking_region_from_position(position,flanking_size)
|
701
|
+
#TODO: Padd when the exon goes over the regions...
|
702
|
+
#puts flanquing_region.inspect
|
703
|
+
#Ignoring when the exon is in a gap
|
704
|
+
unless exon.snp_in_gap
|
705
|
+
exon_seq = container.chromosome_sequence(flanquing_region)
|
706
|
+
@surrounding_exon_sequences["#{chromosome}_#{flanquing_region.start}_#{exon.record.score}"] = exon_seq
|
707
|
+
end
|
714
708
|
end
|
715
709
|
end
|
716
710
|
@surrounding_exon_sequences
|
@@ -722,18 +716,21 @@ module Bio::PolyploidTools
|
|
722
716
|
gene_region = self.covered_region
|
723
717
|
local_pos_in_gene = self.local_position
|
724
718
|
@exon_sequences = Bio::Alignment::SequenceHash.new
|
725
|
-
self.exon_list.each do |chromosome,
|
726
|
-
|
727
|
-
|
728
|
-
|
729
|
-
|
730
|
-
|
731
|
-
|
732
|
-
|
733
|
-
|
734
|
-
|
735
|
-
|
736
|
-
|
719
|
+
self.exon_list.each do |chromosome, exon_arr|
|
720
|
+
exon_arr.each do |exon|
|
721
|
+
exon_start_offset = exon.query_region.start - gene_region.start
|
722
|
+
exon_seq = "-" * exon_start_offset
|
723
|
+
exon_seq << container.chromosome_sequence(exon.target_region).to_s
|
724
|
+
#puts exon_seq
|
725
|
+
#l_pos = exon_start_offset + local_pos_in_gene
|
726
|
+
unless exon.snp_in_gap
|
727
|
+
#puts "local position: #{local_pos_in_gene}"
|
728
|
+
#puts "Exon_seq: #{exon_seq}"
|
729
|
+
exon_seq[local_pos_in_gene] = exon_seq[local_pos_in_gene].upcase
|
730
|
+
exon_seq << "-" * (gene_region.size - exon_seq.size + 1)
|
731
|
+
#puts exon.inspect
|
732
|
+
@exon_sequences["#{chromosome}_#{exon.query_region.start}_#{exon.record.score}"] = exon_seq
|
733
|
+
end
|
737
734
|
end
|
738
735
|
end
|
739
736
|
@exon_sequences[@chromosome] = "-" * gene_region.size unless @exon_sequences[@chromosome]
|
data/lib/bio/db/primer3.rb
CHANGED
@@ -113,6 +113,8 @@ module Bio::DB::Primer3
|
|
113
113
|
right_start = 0
|
114
114
|
right_end = 0
|
115
115
|
total_columns_before_messages=17
|
116
|
+
#puts "Values in primer3"
|
117
|
+
#puts snp_from.inspect
|
116
118
|
@values = Array.new
|
117
119
|
#@values << "#{gene},,#{template_length},"
|
118
120
|
@values << gene
|
@@ -763,7 +765,7 @@ module Bio::DB::Primer3
|
|
763
765
|
snp.line_1 = @line_1
|
764
766
|
snp.line_2 = @line_2
|
765
767
|
snp.snp_from = snp_in
|
766
|
-
snp.regions = snp_in.exon_list.values.collect { |x|
|
768
|
+
snp.regions = snp_in.exon_list.values.collect { |x| x.collect {|y| y.target_region.to_s }}
|
767
769
|
@snp_hash[snp.to_s] = snp
|
768
770
|
snp
|
769
771
|
end
|
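Since `exon_list` now maps each arm to an array of exons, the nested `collect` above produces one array of region strings per arm. A small sketch of the resulting shape, with `ExonStub`, `region_strings` and the region string format all hypothetical:

```ruby
# Hypothetical stand-in: in the gem, target_region is a region object whose to_s is used.
ExonStub = Struct.new(:target_region)

# Same nested collect as in the changed line of primer3.rb.
def region_strings(exon_list)
  exon_list.values.collect { |exons| exons.collect { |e| e.target_region.to_s } }
end

exon_list = {
  "1AL" => [ExonStub.new("1AL:100-300"), ExonStub.new("1AL:900-1100")],
  "1BL" => [ExonStub.new("1BL:150-420")]
}

p region_strings(exon_list)
# [["1AL:100-300", "1AL:900-1100"], ["1BL:150-420"]]
```

Code that consumes `snp.regions` receives a nested array, so a caller wanting a flat list would need to call `flatten` on it.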
Binary file
|
Binary file
|
Binary file
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: bio-polyploid-tools
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.8.
|
4
|
+
version: 0.8.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ricardo H. Ramirez-Gonzalez
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-01-
|
11
|
+
date: 2018-01-19 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bio
|
@@ -206,6 +206,10 @@ files:
|
|
206
206
|
- test/data/BS00068396_51_contigs.aln
|
207
207
|
- test/data/BS00068396_51_contigs.dnd
|
208
208
|
- test/data/BS00068396_51_contigs.fa
|
209
|
+
- test/data/BS00068396_51_contigs.fa.fai
|
210
|
+
- test/data/BS00068396_51_contigs.fa.nhr
|
211
|
+
- test/data/BS00068396_51_contigs.fa.nin
|
212
|
+
- test/data/BS00068396_51_contigs.fa.nsq
|
209
213
|
- test/data/BS00068396_51_contigs.nhr
|
210
214
|
- test/data/BS00068396_51_contigs.nin
|
211
215
|
- test/data/BS00068396_51_contigs.nsq
|