bio-polyploid-tools 0.8.0 → 0.8.1
- checksums.yaml +4 -4
- data/README.md +31 -17
- data/VERSION +1 -1
- data/bin/homokaryot_primers.rb +1 -1
- data/bin/polymarker.rb +6 -1
- data/bin/snp_position_to_polymarker.rb +16 -6
- data/bio-polyploid-tools.gemspec +7 -3
- data/lib/bio/PolyploidTools/ExonContainer.rb +6 -6
- data/lib/bio/PolyploidTools/SNP.rb +137 -140
- data/lib/bio/PolyploidTools/SNPMutant.rb +0 -1
- data/lib/bio/PolyploidTools/SNPSequence.rb +1 -1
- data/lib/bio/db/primer3.rb +3 -1
- data/test/data/BS00068396_51_contigs.fa.fai +4 -0
- data/test/data/BS00068396_51_contigs.fa.nhr +0 -0
- data/test/data/BS00068396_51_contigs.fa.nin +0 -0
- data/test/data/BS00068396_51_contigs.fa.nsq +0 -0
- metadata +6 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 5cc3c126779f27e61f521959b82d13f240cb2ce8d5c5416e511f9150ced798eb
|
4
|
+
data.tar.gz: 13adf99f1336327f7b399057d66ee63892ee276ee507a398fb4a6936a2a765a2
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: d345c23216e2d6aa3174885a053300ec499230125625d36fe1b5efc8de3d151b5423ad8ed5e0c563b2c9be5b07df29a759a9f95ee1196588eefa0ab2e40ec802
|
7
|
+
data.tar.gz: 1ed47854ec04f95c7d8449e1a15885a67f00c076b4bc07705616d6762ad7cf8822d60933cb737c71b95cbfb46293b4cb970e3d2f6413ccde70ca8e0f372e2ab4
|
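The new digests recorded in `checksums.yaml` can be checked locally. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive; the helper name `verify_digest` and the example path are hypothetical, and only Ruby's standard `digest` library is used.

```ruby
require "digest"

# Hypothetical helper: true when the file's SHA256 hex digest matches the
# expected value copied from checksums.yaml.
def verify_digest(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256.strip.downcase
end

# Usage (the path is an assumption about where the gem was unpacked):
# verify_digest("unpacked/data.tar.gz",
#               "13adf99f1336327f7b399057d66ee63892ee276ee507a398fb4a6936a2a765a2")
```

The same pattern would work for the SHA512 entries by swapping in `Digest::SHA512`.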
data/README.md
CHANGED
@@ -1,10 +1,12 @@
|
|
1
|
-
#bio-polyploid-tools
|
1
|
+
# bio-polyploid-tools
|
2
|
+
|
3
|
+
## Introduction
|
2
4
|
|
3
|
-
##Introduction
|
4
5
|
This tools are designed to deal with polyploid wheat. The first tool is to design KASP primers, making them as specific as possible.
|
5
6
|
|
6
7
|
|
7
|
-
##Installation
|
8
|
+
## Installation
|
9
|
+
|
8
10
|
```sh
|
9
11
|
gem install bio-polyploid-tools
|
10
12
|
```
|
@@ -13,13 +15,19 @@ You need to have in your ```$PATH``` the following programs:
|
|
13
15
|
* [MAFFT](http://mafft.cbrc.jp/alignment/software/)
|
14
16
|
* [primer3](http://primer3.sourceforge.net/releases.php)
|
15
17
|
* [exonerate](http://www.ebi.ac.uk/~guy/exonerate/)
|
18
|
+
* [blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE%3DBlastDocs&DOC_TYPE%3DDownload)
|
16
19
|
|
17
|
-
The code
|
20
|
+
The code was originally developed on ruby 2.1.0, but it should work on 1.9.3 and above. However, it is only actively tested in currently supported ruby versions:
|
21
|
+
|
22
|
+
* 2.1.10
|
23
|
+
* 2.2.5
|
24
|
+
* 2.3.5
|
25
|
+
* 2.4.2
|
18
26
|
|
19
|
-
#PolyMarker
|
20
27
|
|
21
|
-
|
28
|
+
# PolyMarker
|
22
29
|
|
30
|
+
To run PolyMarker with the CSS wheat contigs, you need to unzip the reference file from [ensembl](http://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz).
|
23
31
|
|
24
32
|
|
25
33
|
```sh
|
@@ -80,7 +88,7 @@ If the flanking sequence is unknow, but the position on a reference is available
|
|
80
88
|
* **alternative allele** The base in the alternative allele.
|
81
89
|
* **target chromosome** for the specific primers. Must be in line with the chromosome selection critieria.
|
82
90
|
|
83
|
-
####Example
|
91
|
+
#### Example
|
84
92
|
|
85
93
|
```
|
86
94
|
IWGSC_CSS_1AL_scaff_110,C,519,A,2A
|
@@ -89,7 +97,8 @@ IWGSC_CSS_1AL_scaff_110,C,519,A,2A
|
|
89
97
|
This file format can be used with ```snp_positions_to_polymarker.rb``` to produce the input for the option```--marker_list```.
|
90
98
|
|
91
99
|
|
92
|
-
###Custom reference sequences.
|
100
|
+
### Custom reference sequences.
|
101
|
+
|
93
102
|
By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
|
94
103
|
) are used. However, it is possible to use a custom reference. To define the chromosome where each contig belongs the argument ```arm_selection``` is used. The defailt uses ids like: ```IWGSC_CSS_1AL_scaff_110```, where the third field, separated by underscores is used. A simple way to add costum references is to rename the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, where only the first two characters in each contig is used as identifier, useful when pseudomolecules are named after the chromosomes (ie: ">1A" in the fasta file).
|
95
104
|
|
@@ -117,33 +126,38 @@ To use blast instead of exonerate, use the following command:
|
|
117
126
|
```
|
118
127
|
|
119
128
|
|
120
|
-
##Release Notes
|
129
|
+
## Release Notes
|
130
|
+
|
131
|
+
### 0.8
|
132
|
+
|
133
|
+
* FEATURE: ```polymarker.rb``` added the flag ```--aligner blast|exonerate ``` which lets you pick between ```blast``` or ```exonerate``` as the aligner. For blast the default is to have the database with the same name as the ```--contigs``` file. However, it is possible to use a different name vua the option ```--database```.
|
134
|
+
|
135
|
+
### 0.7.3
|
121
136
|
|
122
|
-
###0.7.3
|
123
137
|
* FEATURE: ```polymarker.rb``` Added to the flag ```--arm_selection``` the option ```scaffold```, which now supports a scaffold specific primer.
|
124
138
|
* FEATURE: ```snp_position_to_polymarker``` Added the option ```--mutant_list``` to prepare files for PolyMarker from files with the following columns ```ID,Allele_1,position,Allele_1,target_chromosome```.
|
125
139
|
|
126
|
-
###0.7.2
|
140
|
+
### 0.7.2
|
141
|
+
|
127
142
|
* FEATURE: Added a flag ```min_identity``` to set the minimum identity to consider a hit. The default is 90
|
128
143
|
|
129
|
-
###0.7.1
|
144
|
+
### 0.7.1
|
130
145
|
* BUGFIX: Now the parser for ```arm_selection_embl``` works with the mixture of contigs and pseudomolecules
|
131
146
|
* DOC: Added documentation on how to use custom references.
|
132
147
|
|
133
|
-
###0.7.0
|
148
|
+
### 0.7.0
|
134
149
|
* Added flag ```genomes_count``` for number of genomes, to be used on tetraploids, etc.
|
135
150
|
|
136
|
-
###0.6.1
|
151
|
+
### 0.6.1
|
137
152
|
|
138
153
|
|
139
154
|
* polymarker.rb now validates that all the files exist.
|
140
155
|
* BUGFIX: A reference was required even when it was not used to generate contigs.
|
141
156
|
|
142
|
-
#Notes
|
143
|
-
|
157
|
+
# Notes
|
144
158
|
|
145
|
-
* If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
|
146
159
|
|
160
|
+
* BUG: If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
|
147
161
|
* BUG: Blocks with NNNs are picked and treated as semi-specific.
|
148
162
|
* BUG: If the name of the reference have space, the ID is not chopped. ">gene_1 (G12A)" shouls be treated as ">gene_1".
|
149
163
|
* TODO: Add a parameter file to configure the alignments.
|
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.8.
|
1
|
+
0.8.1
|
data/bin/homokaryot_primers.rb
CHANGED
@@ -180,4 +180,4 @@ end
|
|
180
180
|
|
181
181
|
kasp_container.add_primers_file(primer_3_output)
|
182
182
|
header = "Marker,SNP,RegionSize,SNP_type,#{snp_in},#{original_name},common,primer_type,orientation,#{snp_in}_TM,#{original_name}_TM,common_TM,selected_from,product_size"
|
183
|
-
File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
|
183
|
+
File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
|
data/bin/polymarker.rb
CHANGED
@@ -12,6 +12,11 @@ require path
|
|
12
12
|
|
13
13
|
arm_selection_functions = Hash.new;
|
14
14
|
|
15
|
+
arm_selection_functions[:arm_selection_nrgenes] = lambda do | contig_name |
|
16
|
+
#example format: chr2A
|
17
|
+
ret = contig_name[3,2]
|
18
|
+
return ret
|
19
|
+
end
|
15
20
|
|
16
21
|
arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
|
17
22
|
contig_name.gsub!(/chr/,"")
|
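The hunk above registers a new arm-selection function for IDs of the form `chr2A`. As a rough, standalone illustration of how such a lookup table of lambdas behaves: the constant name `ARM_SELECTION` is hypothetical, and the body of `arm_selection_first_two` beyond the visible `gsub!` line is an assumption based on the README description ("only the first two characters").

```ruby
# Sketch of a table of arm-selection functions keyed by name.
ARM_SELECTION = {}

# IDs such as "chr2A": skip the 3-character "chr" prefix, keep 2 characters.
ARM_SELECTION[:arm_selection_nrgenes] = lambda do |contig_name|
  contig_name[3, 2]
end

# IDs such as "2A" or "chr2A": drop "chr", keep the first two characters.
# Assumption: this sketch uses non-destructive gsub so the caller's string is
# left untouched; the original code calls gsub! on its argument.
ARM_SELECTION[:arm_selection_first_two] = lambda do |contig_name|
  contig_name.gsub(/chr/, "")[0, 2]
end

puts ARM_SELECTION[:arm_selection_nrgenes].call("chr2A")    # prints 2A
puts ARM_SELECTION[:arm_selection_first_two].call("chr1B")  # prints 1B
```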
@@ -417,4 +422,4 @@ rescue StandardError => e
|
|
417
422
|
rescue Exception => e
|
418
423
|
write_status "ERROR\t#{e.message}"
|
419
424
|
raise e
|
420
|
-
end
|
425
|
+
end
|
data/bin/snp_position_to_polymarker.rb
CHANGED
@@ -71,16 +71,26 @@ File.open(test_file) do | f |
|
|
71
71
|
snp = Bio::PolyploidTools::SNPMutant.parse(line)
|
72
72
|
entry = fasta_reference_db.index.region_for_entry(snp.contig)
|
73
73
|
end
|
74
|
-
|
74
|
+
#puts line
|
75
75
|
if entry
|
76
76
|
region = entry.get_full_region
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
77
|
+
snp_name = snp.snp_id_in_seq
|
78
|
+
|
79
|
+
#if region != lastRegion
|
80
|
+
# lastTemplate = fasta_reference_db.fetch_sequence(region)
|
81
|
+
#end
|
82
|
+
start, total, new_position = snp.to_polymarker_coordinates(options[:flanking_size])
|
83
|
+
region.start = start
|
84
|
+
region.end = start + total
|
85
|
+
#puts region
|
86
|
+
local_template = fasta_reference_db.fetch_sequence(region)
|
87
|
+
|
88
|
+
snp.position = new_position
|
89
|
+
|
90
|
+
snp.template_sequence = local_template
|
81
91
|
lastRegion = region
|
82
92
|
|
83
|
-
out.puts "#{snp.gene}_#{
|
93
|
+
out.puts "#{snp.gene}_#{snp_name},#{snp.chromosome},#{snp.to_polymarker_sequence(options[:flanking_size])}"
|
84
94
|
else
|
85
95
|
$stderr.puts "ERROR: Unable to find entry for #{snp.gene}"
|
86
96
|
end
|
data/bio-polyploid-tools.gemspec
CHANGED
@@ -2,16 +2,16 @@
|
|
2
2
|
# DO NOT EDIT THIS FILE DIRECTLY
|
3
3
|
# Instead, edit Juwelier::Tasks in Rakefile, and run 'rake gemspec'
|
4
4
|
# -*- encoding: utf-8 -*-
|
5
|
-
# stub: bio-polyploid-tools 0.8.
|
5
|
+
# stub: bio-polyploid-tools 0.8.1 ruby lib
|
6
6
|
|
7
7
|
Gem::Specification.new do |s|
|
8
8
|
s.name = "bio-polyploid-tools".freeze
|
9
|
-
s.version = "0.8.
|
9
|
+
s.version = "0.8.1"
|
10
10
|
|
11
11
|
s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
|
12
12
|
s.require_paths = ["lib".freeze]
|
13
13
|
s.authors = ["Ricardo H. Ramirez-Gonzalez".freeze]
|
14
|
-
s.date = "2018-01-
|
14
|
+
s.date = "2018-01-19"
|
15
15
|
s.description = "Repository of tools developed at Crop Genetics in JIC to work with polyploid wheat".freeze
|
16
16
|
s.email = "ricardo.ramirez-gonzalez@jic.ac.uk".freeze
|
17
17
|
s.executables = ["bfr.rb".freeze, "blast_triads.rb".freeze, "blast_triads_promoters.rb".freeze, "count_variations.rb".freeze, "filter_blat_by_target_coverage.rb".freeze, "filter_exonerate_by_identity.rb".freeze, "find_best_blat_hit.rb".freeze, "find_best_exonerate.rb".freeze, "find_homoeologue_variations.rb".freeze, "get_longest_hsp_blastx_triads.rb".freeze, "hexaploid_primers.rb".freeze, "homokaryot_primers.rb".freeze, "mafft_triads.rb".freeze, "mafft_triads_promoters.rb".freeze, "map_markers_to_contigs.rb".freeze, "markers_in_region.rb".freeze, "polymarker.rb".freeze, "polymarker_capillary.rb".freeze, "snp_position_to_polymarker.rb".freeze, "snps_between_bams.rb".freeze, "vcfLineToTable.rb".freeze]
|
@@ -102,6 +102,10 @@ Gem::Specification.new do |s|
|
|
102
102
|
"test/data/BS00068396_51_contigs.aln",
|
103
103
|
"test/data/BS00068396_51_contigs.dnd",
|
104
104
|
"test/data/BS00068396_51_contigs.fa",
|
105
|
+
"test/data/BS00068396_51_contigs.fa.fai",
|
106
|
+
"test/data/BS00068396_51_contigs.fa.nhr",
|
107
|
+
"test/data/BS00068396_51_contigs.fa.nin",
|
108
|
+
"test/data/BS00068396_51_contigs.fa.nsq",
|
105
109
|
"test/data/BS00068396_51_contigs.nhr",
|
106
110
|
"test/data/BS00068396_51_contigs.nin",
|
107
111
|
"test/data/BS00068396_51_contigs.nsq",
|
data/lib/bio/PolyploidTools/ExonContainer.rb
CHANGED
@@ -116,13 +116,13 @@ module Bio::PolyploidTools
|
|
116
116
|
target_region = exon.target_region
|
117
117
|
exon_start_offset = exon.query_region.start - gene_region.start
|
118
118
|
chr_local_pos=local_pos_in_gene + target_region.start + 1
|
119
|
-
ret_str
|
120
|
-
to_print =
|
121
|
-
chr_seq
|
122
|
-
l_pos
|
123
|
-
to_print <<
|
119
|
+
ret_str << ">#{chromosome}_SNP-#{chr_local_pos} #{exon.to_s} #{target_region.orientation}\n"
|
120
|
+
to_print = "-" * exon_start_offset
|
121
|
+
chr_seq = chromosome_sequence(exon.target_region).to_s
|
122
|
+
l_pos = exon_start_offset + local_pos_in_gene
|
123
|
+
to_print << chr_seq
|
124
124
|
to_print[local_pos_in_gene] = to_print[local_pos_in_gene].upcase
|
125
|
-
ret_str
|
125
|
+
ret_str << to_print
|
126
126
|
end
|
127
127
|
ret_str
|
128
128
|
end
|
data/lib/bio/PolyploidTools/SNP.rb
CHANGED
@@ -16,6 +16,8 @@ module Bio::PolyploidTools
|
|
16
16
|
attr_accessor :chromosome
|
17
17
|
attr_accessor :variation_free_region
|
18
18
|
|
19
|
+
|
20
|
+
|
19
21
|
#Format:
|
20
22
|
#Gene_name,Original,SNP_Pos,pos,chromosome
|
21
23
|
#A_comp0_c0_seq1,C,519,A,2A
|
@@ -30,7 +32,7 @@ module Bio::PolyploidTools
|
|
30
32
|
snp.snp.upcase!
|
31
33
|
snp.snp.strip!
|
32
34
|
snp.chromosome.strip!
|
33
|
-
|
35
|
+
|
34
36
|
snp.use_reference = false
|
35
37
|
snp
|
36
38
|
end
|
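The parser touched in this hunk reads comma-separated lines in the `Gene_name,Original,SNP_Pos,pos,chromosome` format and normalises the fields with `upcase!` and `strip!`. A minimal standalone sketch of that idea follows; `ParsedSNP` and `parse_snp_line` are hypothetical names, not part of the gem, and the position is kept exactly as given because the original's coordinate handling is not visible here.

```ruby
# Hypothetical stand-in for the SNP class, holding only the parsed fields.
ParsedSNP = Struct.new(:gene, :original, :position, :snp, :chromosome)

# Split one CSV line and normalise case and whitespace of each field.
def parse_snp_line(line)
  gene, original, position, snp, chromosome = line.strip.split(",")
  ParsedSNP.new(gene.strip,
                original.strip.upcase,
                position.to_i,
                snp.strip.upcase,
                chromosome.strip)
end

p parse_snp_line("A_comp0_c0_seq1,c,519,a,2A\n")
```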
@@ -60,6 +62,16 @@ module Bio::PolyploidTools
|
|
60
62
|
@primer_3_min_seq_length = 50
|
61
63
|
@variation_free_region = 0
|
62
64
|
@contig = false
|
65
|
+
@exon_list = Hash.new {|hsh, key| hsh[key] = [] }
|
66
|
+
end
|
67
|
+
|
68
|
+
def to_polymarker_coordinates(flanking_size, total:nil)
|
69
|
+
start = position - flanking_size + 1
|
70
|
+
start = 0 if start < 0
|
71
|
+
total = flanking_size * 2 unless total
|
72
|
+
total += 1
|
73
|
+
new_position = position - start + 2
|
74
|
+
[start , total, new_position ]
|
63
75
|
end
|
64
76
|
|
65
77
|
def to_polymarker_sequence(flanking_size, total:nil)
|
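The arithmetic in the new `to_polymarker_coordinates` method can be followed with concrete numbers. The sketch below copies the same steps into a free function (the name `polymarker_coordinates` and the explicit `position` argument are introduced here only for illustration) so the window start, window length and SNP offset can be inspected in isolation.

```ruby
# Same steps as the method in the diff, with position passed in explicitly.
def polymarker_coordinates(position, flanking_size, total: nil)
  start = position - flanking_size + 1
  start = 0 if start < 0                   # clamp windows that run off the left edge
  total = flanking_size * 2 unless total
  total += 1                               # include the SNP base itself
  new_position = position - start + 2      # SNP offset relative to the window
  [start, total, new_position]
end

p polymarker_coordinates(519, 100)  # => [420, 201, 101]
p polymarker_coordinates(50, 100)   # => [0, 201, 52]
```

In the second call the SNP sits closer to the contig start than the flanking size, so the start is clamped to 0 and the SNP offset shrinks accordingly.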
@@ -103,8 +115,7 @@ module Bio::PolyploidTools
|
|
103
115
|
end
|
104
116
|
|
105
117
|
def add_exon(exon, arm)
|
106
|
-
|
107
|
-
@exon_list[arm] = exon if exon.record.score > @exon_list[arm].record.score
|
118
|
+
exon_list[arm] << exon
|
108
119
|
end
|
109
120
|
|
110
121
|
def covered_region
|
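With this change `@exon_list` maps each arm to an array of exons instead of a single best-scoring exon, relying on a `Hash` default block to create the array on first access. A small sketch of that pattern, using a hypothetical `ExonHit` struct in place of the gem's exon objects:

```ruby
# Hypothetical minimal exon record; the real objects carry alignment data.
ExonHit = Struct.new(:name, :score)

# A Hash whose default block creates and stores an empty Array per new key.
def new_exon_list
  Hash.new { |hash, key| hash[key] = [] }
end

# Append rather than replace, so several hits per arm are kept.
def add_exon_hit(exon_list, exon, arm)
  exon_list[arm] << exon
end

EXON_HITS = new_exon_list
add_exon_hit(EXON_HITS, ExonHit.new("hit_1", 95.0), "2A")
add_exon_hit(EXON_HITS, ExonHit.new("hit_2", 91.5), "2A")
add_exon_hit(EXON_HITS, ExonHit.new("hit_3", 99.0), "2B")

p EXON_HITS.keys        # => ["2A", "2B"]
p EXON_HITS["2A"].size  # => 2
```

Because the block assigns `hash[key] = []`, the array is stored in the hash; a plain `Hash.new([])` would instead share one array across all keys.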
@@ -115,28 +126,28 @@ module Bio::PolyploidTools
|
|
115
126
|
reg.orientation = :forward
|
116
127
|
reg.start = self.position - self.flanking_size
|
117
128
|
reg.end = self.position + self.flanking_size
|
118
|
-
|
119
129
|
reg.start = 1 if reg.start < 1
|
120
|
-
|
121
130
|
return reg
|
122
131
|
end
|
123
132
|
|
124
133
|
min = @position
|
125
134
|
max = @position
|
126
|
-
|
127
|
-
|
128
|
-
#raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
|
135
|
+
# puts "Calculating covered region for #{self.inspect}"
|
136
|
+
# puts "#{@exon_list.inspect}"
|
137
|
+
# raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
|
129
138
|
if @exon_list.size == 0
|
130
139
|
min = self.position - self.flanking_size
|
131
140
|
min = 1 if min < 1
|
132
141
|
max = self.position + self.flanking_size
|
133
142
|
end
|
134
|
-
@exon_list.each do | chromosome,
|
135
|
-
|
136
|
-
|
137
|
-
|
138
|
-
|
143
|
+
@exon_list.each do | chromosome, exon_arr |
|
144
|
+
exon_arr.each do | exon |
|
145
|
+
reg = exon.query_region
|
146
|
+
min = reg.start if reg.start < min
|
147
|
+
max = reg.end if reg.end > max
|
148
|
+
end
|
139
149
|
end
|
150
|
+
|
140
151
|
reg = Bio::DB::Fasta::Region.new()
|
141
152
|
reg.entry = gene
|
142
153
|
reg.orientation = :forward
|
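Since each arm now holds several exons, the covered region is computed with a nested loop over all of them. The following sketch reproduces the min/max logic on plain data; `SpanRegion` and `covered_span` are hypothetical names, and `stop` plays the role of the region's `end` attribute in the original.

```ruby
# Hypothetical region with just the two coordinates used here.
SpanRegion = Struct.new(:start, :stop)

# Returns [min, max] spanning the SNP position and every exon region.
# With no exons loaded, falls back to position +/- flanking_size (min >= 1).
def covered_span(position, flanking_size, regions_by_arm)
  min = position
  max = position
  if regions_by_arm.empty?
    min = position - flanking_size
    min = 1 if min < 1
    max = position + flanking_size
  end
  regions_by_arm.each_value do |regions|
    regions.each do |reg|
      min = reg.start if reg.start < min
      max = reg.stop if reg.stop > max
    end
  end
  [min, max]
end

regions = {
  "2A" => [SpanRegion.new(400, 600)],
  "2B" => [SpanRegion.new(450, 700), SpanRegion.new(380, 500)]
}
p covered_span(519, 100, regions)  # => [380, 700]
p covered_span(50, 100, {})        # => [1, 150]
```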
@@ -168,24 +179,6 @@ module Bio::PolyploidTools
|
|
168
179
|
pos + left_padding
|
169
180
|
end
|
170
181
|
|
171
|
-
def exon_fasta_string
|
172
|
-
gene_region = self.covered_region
|
173
|
-
local_pos_in_gene = self.local_position
|
174
|
-
ret_str = ""
|
175
|
-
container.parents.each do |name, bam|
|
176
|
-
ret_str << ">#{gene_region.entry}-#{self.position}_#{name} Overlapping_exons:#{gene_region.to_s} localSNPpo:#{local_pos_in_gene+1}\n"
|
177
|
-
to_print = parental_sequences[name]
|
178
|
-
ret_str << to_print << "\n"
|
179
|
-
end
|
180
|
-
self.exon_sequences.each do | chromosome, exon_seq |
|
181
|
-
ret_str << ">#{chromosome}\n#{exon_seq}\n"
|
182
|
-
end
|
183
|
-
mask = masked_chromosomal_snps("1BS", flanking_size)
|
184
|
-
ret_str << ">Mask\n#{mask}\n"
|
185
|
-
ret_str
|
186
|
-
end
|
187
|
-
|
188
|
-
|
189
182
|
def primer_fasta_string
|
190
183
|
gene_region = self.covered_region
|
191
184
|
local_pos_in_gene = self.local_position
|
@@ -209,12 +202,15 @@ module Bio::PolyploidTools
|
|
209
202
|
end
|
210
203
|
|
211
204
|
def primer_region(target_chromosome, parental )
|
205
|
+
|
212
206
|
parental = aligned_sequences[parental].downcase
|
207
|
+
names = aligned_sequences.keys
|
208
|
+
target_chromosome = get_target_sequence(names, target_chromosome)
|
209
|
+
|
213
210
|
chromosome_seq = aligned_sequences[target_chromosome]
|
214
211
|
chromosome_seq = "-" * parental.size unless chromosome_seq
|
215
212
|
chromosome_seq = chromosome_seq.downcase
|
216
213
|
mask = mask_aligned_chromosomal_snp(target_chromosome)
|
217
|
-
#puts "'#{mask}'"
|
218
214
|
|
219
215
|
pr = PrimerRegion.new
|
220
216
|
position_in_region = 0
|
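`primer_region` now resolves the target through `get_target_sequence`, which is defined later in this file's diff. That helper expects alignment names made of three underscore-separated fields, with the score last, and returns the highest-scoring name. A standalone sketch of the selection (the function name `best_target` is introduced here for illustration):

```ruby
# Pick the name with the highest score in its third "_"-separated field.
# Names that do not have exactly three fields are ignored, and the
# chromosome argument is returned when nothing scores above zero.
def best_target(names, chromosome)
  best = chromosome
  best_score = 0
  names.each do |name|
    fields = name.split("_")
    next unless fields.length == 3
    score = fields[2].to_f
    if score > best_score
      best_score = score
      best = name
    end
  end
  best
end

p best_target(["2A_420_310.0", "2A_9000_120.0", "ParentA"], "2A")  # => "2A_420_310.0"
p best_target(["ParentA", "ParentB"], "2A")                        # => "2A"
```

Two properties follow directly from the logic as written: the chromosome argument acts only as the fallback value, and a chromosome name that itself contains underscores would produce more than three fields and therefore never be selected.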
@@ -291,8 +287,9 @@ module Bio::PolyploidTools
|
|
291
287
|
|
292
288
|
end
|
293
289
|
|
294
|
-
|
295
|
-
|
290
|
+
#puts "__"
|
291
|
+
#puts self.inspect
|
292
|
+
str = "SEQUENCE_ID=#{opts[:name]} #{orientation} \n"
|
296
293
|
str << "SEQUENCE_FORCE_LEFT_END=#{left}\n" unless opts[:extra_f]
|
297
294
|
str << "SEQUENCE_FORCE_RIGHT_END=#{right}\n" if opts[:right_pos]
|
298
295
|
str << extra if extra
|
@@ -326,10 +323,10 @@ module Bio::PolyploidTools
|
|
326
323
|
primer_3_propertes = Array.new
|
327
324
|
|
328
325
|
seq_original = String.new(pr.sequence)
|
329
|
-
puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
|
326
|
+
#puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
|
330
327
|
return primer_3_propertes if seq_original.size < primer_3_min_seq_length
|
331
328
|
#puts self.inspect
|
332
|
-
puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
|
329
|
+
#puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
|
333
330
|
|
334
331
|
seq_original[pr.snp_pos] = self.original
|
335
332
|
seq_original_reverse = reverse_complement_string(seq_original)
|
@@ -432,12 +429,13 @@ module Bio::PolyploidTools
|
|
432
429
|
|
433
430
|
seq[local_pos_in_gene] = self.snp if name == self.snp_in
|
434
431
|
@parental_sequences [name] = seq
|
435
|
-
puts name
|
436
|
-
puts seq
|
437
432
|
end
|
438
433
|
@parental_sequences
|
439
434
|
end
|
440
435
|
|
436
|
+
|
437
|
+
|
438
|
+
|
441
439
|
def surrounding_parental_sequences
|
442
440
|
return @surrounding_parental_sequences if @surrounding_parental_sequences
|
443
441
|
gene_region = self.covered_region
|
@@ -450,11 +448,15 @@ module Bio::PolyploidTools
|
|
450
448
|
seq = bam.consensus_with_ambiguities({:region=>gene_region}).to_s
|
451
449
|
else
|
452
450
|
seq = container.gene_model_sequence(gene_region)
|
453
|
-
|
454
|
-
|
455
|
-
|
456
|
-
|
457
|
-
|
451
|
+
#puts "#{name} #{self.snp_in}"
|
452
|
+
#puts "Modifing original: #{name}\n#{seq}"
|
453
|
+
unless name == self.snp_in
|
454
|
+
|
455
|
+
seq[local_pos_in_gene] = self.original
|
456
|
+
else
|
457
|
+
seq[local_pos_in_gene] = self.snp
|
458
|
+
end
|
459
|
+
#puts "#{seq}"
|
458
460
|
end
|
459
461
|
seq[local_pos_in_gene] = seq[local_pos_in_gene].upcase
|
460
462
|
seq[local_pos_in_gene] = self.snp if name == self.snp_in
|
@@ -522,71 +524,101 @@ module Bio::PolyploidTools
|
|
522
524
|
ret_str
|
523
525
|
end
|
524
526
|
|
527
|
+
|
528
|
+
def get_snp_position_after_trim
|
529
|
+
local_pos_in_gene = self.local_position
|
530
|
+
ideal_min = self.local_position - flanking_size
|
531
|
+
ideal_max = self.local_position + flanking_size
|
532
|
+
left_pad = 0
|
533
|
+
if ideal_min < 0
|
534
|
+
left_pad = ideal_min * -1
|
535
|
+
ideal_min = 0
|
536
|
+
end
|
537
|
+
local_pos_in_gene - ideal_min
|
538
|
+
end
|
539
|
+
|
525
540
|
def aligned_snp_position
|
526
541
|
return @aligned_snp_position if @aligned_snp_position
|
542
|
+
#puts self.inspect
|
527
543
|
pos = -1
|
528
544
|
parental_strings = Array.new
|
529
545
|
parental_sequences.keys.each do | par |
|
530
|
-
|
531
546
|
parental_strings << aligned_sequences[par]
|
532
547
|
end
|
533
|
-
template_sequence = nil
|
534
|
-
aligned_sequences.keys.each do |temp |
|
535
|
-
template_sequence = aligned_sequences[ temp ] if aligned_sequences[ temp ][0] != "-"
|
536
|
-
end
|
537
548
|
$stderr.puts "WARN: #{self.to_s} #{parental_sequences.keys} is not of size 2 (#{parental_strings.size})" if parental_strings.size != 2
|
538
549
|
|
550
|
+
local_pos_in_parental = get_snp_position_after_trim
|
539
551
|
i = 0
|
540
|
-
differences = 0
|
541
|
-
local_pos_in_gene = flanking_size
|
542
|
-
local_pos = 0
|
543
|
-
started = false
|
544
|
-
#TODO: Validate the cases when the alignment has padding on the left on all the chromosomes
|
545
|
-
#unless parental_strings[0]
|
546
|
-
#puts "parental hash: #{parental_sequences}"
|
547
|
-
#puts "Aligned sequences: #{aligned_sequences.to_fasta}"
|
548
|
-
# puts "parental_strings: #{parental_strings.to_s}"
|
549
|
-
#end
|
550
552
|
while i < parental_strings[0].size do
|
551
|
-
if
|
553
|
+
if local_pos_in_parental == 0 and parental_strings[0][i] != "-"
|
552
554
|
pos = i
|
553
555
|
if parental_strings[0][i] == parental_strings[1][i]
|
554
556
|
$stderr.puts "WARN: #{self.to_s} doesn't have a SNP in the marked place (#{i})! \n#{parental_strings[0]}\n#{parental_strings[1]}"
|
555
557
|
end
|
556
|
-
|
557
|
-
end
|
558
|
-
|
559
|
-
started = true if template_sequence[i] != "-"
|
560
|
-
if started == false or template_sequence[i] != "-"
|
561
|
-
local_pos += 1
|
562
558
|
end
|
559
|
+
|
560
|
+
local_pos_in_parental -= 1 if parental_strings[0][i] != "-"
|
563
561
|
i += 1
|
564
562
|
end
|
565
563
|
@aligned_snp_position = pos
|
566
564
|
return pos
|
567
565
|
end
|
568
566
|
|
567
|
+
def get_target_sequence(names, chromosome)
|
568
|
+
|
569
|
+
best = chromosome
|
570
|
+
best_score = 0
|
571
|
+
names.each do |e|
|
572
|
+
arr = e.split("_")
|
573
|
+
if arr.length == 3
|
574
|
+
score = arr[2].to_f
|
575
|
+
if score >best_score
|
576
|
+
best_score = score
|
577
|
+
best = e
|
578
|
+
end
|
579
|
+
end
|
580
|
+
end
|
581
|
+
best
|
582
|
+
end
|
583
|
+
|
584
|
+
|
585
|
+
|
569
586
|
def mask_aligned_chromosomal_snp(chromosome)
|
570
|
-
names =
|
587
|
+
names = aligned_sequences.keys
|
571
588
|
parentals = parental_sequences.keys
|
572
589
|
|
590
|
+
position_after_trim = get_snp_position_after_trim
|
591
|
+
|
592
|
+
names = names - parentals
|
573
593
|
local_pos_in_gene = aligned_snp_position
|
574
|
-
|
575
|
-
|
594
|
+
|
595
|
+
best_target = get_target_sequence(names, chromosome)
|
596
|
+
masked_snps = aligned_sequences[best_target].downcase if aligned_sequences[best_target]
|
597
|
+
masked_snps = "-" * aligned_sequences.values[0].size unless aligned_sequences[best_target]
|
576
598
|
#TODO: Make this chromosome specific, even when we have more than one alignment going to the region we want.
|
599
|
+
#puts "mask_aligned_chromosomal_snp(#{chromosome})"
|
600
|
+
#puts names
|
577
601
|
i = 0
|
578
|
-
|
602
|
+
for i in 0..masked_snps.size-1
|
603
|
+
#puts i
|
579
604
|
different = 0
|
580
605
|
cov = 0
|
581
606
|
from_group = 0
|
582
607
|
nCount = 0
|
608
|
+
seen = []
|
583
609
|
names.each do | chr |
|
584
610
|
if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
|
611
|
+
#puts aligned_sequences[chr][i]
|
585
612
|
cov += 1
|
586
613
|
nCount += 1 if aligned_sequences[chr][i] == 'N' or aligned_sequences[chr][i] == 'n' # maybe fix this to use ambiguity codes instead.
|
587
|
-
|
614
|
+
|
615
|
+
if chr[0] == chromosome_group and not seen.include? chr[1]
|
616
|
+
seen << chr[1]
|
617
|
+
from_group += 1
|
618
|
+
|
619
|
+
end
|
588
620
|
#puts "Comparing #{chromosome_group} and #{chr[0]} as chromosomes"
|
589
|
-
if chr !=
|
621
|
+
if chr != best_target
|
590
622
|
$stderr.puts "WARN: No base for #{masked_snps} : ##{i}" unless masked_snps[i].upcase
|
591
623
|
$stderr.puts "WARN: No base for #{aligned_sequences[chr]} : ##{i}" unless masked_snps[i].upcase
|
592
624
|
different += 1 if masked_snps[i].upcase != aligned_sequences[chr][i].upcase
|
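The rewritten `aligned_snp_position` first computes the SNP offset after trimming to the flanking window, then walks the aligned parental sequence, counting only non-gap characters until that offset is reached. A minimal sketch of both steps on plain strings; the function names are hypothetical and the SNP-presence warning is omitted.

```ruby
# Offset of the SNP inside the trimmed window (never further than flanking_size).
def position_after_trim(local_position, flanking_size)
  ideal_min = local_position - flanking_size
  ideal_min = 0 if ideal_min < 0
  local_position - ideal_min
end

# Column in a gapped alignment string that holds the base at the given
# ungapped offset; -1 when the sequence is too short.
def aligned_column(aligned, ungapped_pos)
  remaining = ungapped_pos
  aligned.each_char.with_index do |base, column|
    next if base == "-"            # gaps do not consume sequence positions
    return column if remaining == 0
    remaining -= 1
  end
  -1
end

p position_after_trim(519, 100)    # => 100
p position_after_trim(50, 100)     # => 50
p aligned_column("ac--gt-a", 2)    # => 4
```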
@@ -598,12 +630,15 @@ module Bio::PolyploidTools
|
|
598
630
|
masked_snps[i] = "-" if nCount > 0
|
599
631
|
masked_snps[i] = "*" if cov == 0
|
600
632
|
expected_snps = names.size - 1
|
601
|
-
|
633
|
+
|
634
|
+
#puts "Diferences: #{different} to expected: #{ expected_snps } [#{i}] Genome count (#{from_group} == #{genomes_count})"
|
602
635
|
|
603
636
|
masked_snps[i] = masked_snps[i].upcase if different == expected_snps and from_group == genomes_count
|
637
|
+
#puts "#{i}:#{masked_snps[i]}"
|
604
638
|
|
605
639
|
if i == local_pos_in_gene
|
606
640
|
masked_snps[i] = "&"
|
641
|
+
#puts "#{i}:#{masked_snps[i]}___"
|
607
642
|
bases = ""
|
608
643
|
names.each do | chr |
|
609
644
|
bases << aligned_sequences[chr][i] if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
|
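The masking rules visible in these hunks can be summarised per alignment column: `-` when any sequence has an `N`, `*` when nothing covers the column, an uppercase base when the target differs from every other sequence, and `&` at the SNP itself. The sketch below is a simplified reading under stated assumptions: it ignores the `from_group == genomes_count` condition and any lines elided from the diff, assumes all sequences have equal length, and uses the hypothetical name `mask_alignment`.

```ruby
# target: aligned target sequence; others: the remaining aligned sequences.
def mask_alignment(target, others, snp_column)
  mask = target.downcase
  all = [target] + others
  (0...mask.size).each do |i|
    covering = all.reject { |seq| seq[i] == "-" }
    n_count = covering.count { |seq| seq[i].upcase == "N" }
    # Only non-gap bases in the other sequences can count as differences.
    different = others.count { |seq| seq[i] != "-" && seq[i].upcase != mask[i].upcase }

    mask[i] = "-" if n_count > 0
    mask[i] = "*" if covering.empty?
    mask[i] = mask[i].upcase if different == others.size  # specific to the target
    mask[i] = "&" if i == snp_column
  end
  mask
end

p mask_alignment("acgtacgt", ["acgaacgt", "acgcacgt"], 6)  # => "acgTac&t"
```

Column 3 is uppercased because the target's `t` differs from both other sequences, which is the situation the primer design treats as chromosome-specific.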
@@ -617,62 +652,22 @@ module Bio::PolyploidTools
|
|
617
652
|
end
|
618
653
|
|
619
654
|
end
|
620
|
-
i += 1
|
621
|
-
end
|
622
|
-
masked_snps
|
623
|
-
end
|
624
|
-
|
625
|
-
def masked_chromosomal_snps(chromosome)
|
626
|
-
chromosomes = exon_sequences
|
627
|
-
names = chromosomes.keys
|
628
|
-
masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
|
629
|
-
masked_snps = "-" * covered_region.size unless chromosomes[chromosome]
|
630
|
-
local_pos_in_gene = self.local_position
|
631
|
-
ideal_min = local_pos_in_gene - flanking_size
|
632
|
-
ideal_max = local_pos_in_gene + flanking_size
|
633
|
-
i = 0
|
634
|
-
while i < masked_snps.size do
|
635
|
-
if i > ideal_min and i <= ideal_max
|
636
|
-
|
637
|
-
different = 0
|
638
|
-
cov = 0
|
639
|
-
names.each do | chr |
|
640
|
-
if chromosomes[chr][i] != "-"
|
641
|
-
cov += 1
|
642
|
-
if chr != chromosome and masked_snps[i] != "+"
|
643
|
-
different += 1 if masked_snps[i] != chromosomes[chr][i]
|
644
|
-
end
|
645
|
-
end
|
646
|
-
|
647
|
-
end
|
648
|
-
masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
|
649
|
-
masked_snps[i] = "-" if cov < 2
|
650
|
-
masked_snps[i] = masked_snps[i].upcase if different > 1
|
651
|
-
|
652
|
-
else
|
653
|
-
masked_snps[i] = "*"
|
654
|
-
end
|
655
|
-
if i == local_pos_in_gene
|
656
|
-
masked_snps[i] = "&"
|
657
|
-
end
|
658
|
-
i += 1
|
655
|
+
#i += 1
|
659
656
|
end
|
660
657
|
masked_snps
|
661
658
|
end
|
662
659
|
|
660
|
+
|
663
661
|
def surrounding_masked_chromosomal_snps(chromosome)
|
664
662
|
|
665
663
|
chromosomes = surrounding_exon_sequences
|
666
664
|
names = chromosomes.keys
|
665
|
+
get_target_sequence(names)
|
667
666
|
masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
|
668
667
|
masked_snps = "-" * (flanking_size * 2 ) unless chromosomes[chromosome]
|
669
668
|
local_pos_in_gene = flanking_size
|
670
|
-
# ideal_min = local_pos_in_gene - flanking_size
|
671
|
-
#ideal_max = local_pos_in_gene + flanking_size
|
672
669
|
i = 0
|
673
670
|
while i < masked_snps.size do
|
674
|
-
|
675
|
-
|
676
671
|
different = 0
|
677
672
|
cov = 0
|
678
673
|
names.each do | chr |
|
@@ -682,13 +677,11 @@ module Bio::PolyploidTools
|
|
682
677
|
different += 1 if masked_snps[i] != chromosomes[chr][i]
|
683
678
|
end
|
684
679
|
end
|
685
|
-
|
686
680
|
end
|
687
681
|
masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
|
688
682
|
masked_snps[i] = "-" if cov < 2
|
689
683
|
masked_snps[i] = masked_snps[i].upcase if different > 1
|
690
684
|
|
691
|
-
|
692
685
|
if i == local_pos_in_gene
|
693
686
|
masked_snps[i] = "&"
|
694
687
|
end
|
@@ -699,18 +692,19 @@ module Bio::PolyploidTools
|
|
699
692
|
|
700
693
|
def surrounding_exon_sequences
|
701
694
|
return @surrounding_exon_sequences if @surrounding_exon_sequences
|
695
|
+
gene_region = self.covered_region
|
702
696
|
@surrounding_exon_sequences = Bio::Alignment::SequenceHash.new
|
703
|
-
self.exon_list.each do |chromosome,
|
704
|
-
|
705
|
-
|
706
|
-
|
707
|
-
|
708
|
-
|
709
|
-
|
710
|
-
|
711
|
-
|
712
|
-
|
713
|
-
|
697
|
+
self.exon_list.each do |chromosome, exon_arr|
|
698
|
+
exon_arr.each do |exon|
|
699
|
+
exon_start_offset = exon.query_region.start - gene_region.start
|
700
|
+
flanquing_region = exon.target_flanking_region_from_position(position,flanking_size)
|
701
|
+
#TODO: Padd when the exon goes over the regions...
|
702
|
+
#puts flanquing_region.inspect
|
703
|
+
#Ignoring when the exon is in a gap
|
704
|
+
unless exon.snp_in_gap
|
705
|
+
exon_seq = container.chromosome_sequence(flanquing_region)
|
706
|
+
@surrounding_exon_sequences["#{chromosome}_#{flanquing_region.start}_#{exon.record.score}"] = exon_seq
|
707
|
+
end
|
714
708
|
end
|
715
709
|
end
|
716
710
|
@surrounding_exon_sequences
|
@@ -722,18 +716,21 @@ module Bio::PolyploidTools
|
|
722
716
|
gene_region = self.covered_region
|
723
717
|
local_pos_in_gene = self.local_position
|
724
718
|
@exon_sequences = Bio::Alignment::SequenceHash.new
|
725
|
-
self.exon_list.each do |chromosome,
|
726
|
-
|
727
|
-
|
728
|
-
|
729
|
-
|
730
|
-
|
731
|
-
|
732
|
-
|
733
|
-
|
734
|
-
|
735
|
-
|
736
|
-
|
719
|
+
self.exon_list.each do |chromosome, exon_arr|
|
720
|
+
exon_arr.each do |exon|
|
721
|
+
exon_start_offset = exon.query_region.start - gene_region.start
|
722
|
+
exon_seq = "-" * exon_start_offset
|
723
|
+
exon_seq << container.chromosome_sequence(exon.target_region).to_s
|
724
|
+
#puts exon_seq
|
725
|
+
#l_pos = exon_start_offset + local_pos_in_gene
|
726
|
+
unless exon.snp_in_gap
|
727
|
+
#puts "local position: #{local_pos_in_gene}"
|
728
|
+
#puts "Exon_seq: #{exon_seq}"
|
729
|
+
exon_seq[local_pos_in_gene] = exon_seq[local_pos_in_gene].upcase
|
730
|
+
exon_seq << "-" * (gene_region.size - exon_seq.size + 1)
|
731
|
+
#puts exon.inspect
|
732
|
+
@exon_sequences["#{chromosome}_#{exon.query_region.start}_#{exon.record.score}"] = exon_seq
|
733
|
+
end
|
737
734
|
end
|
738
735
|
end
|
739
736
|
@exon_sequences[@chromosome] = "-" * gene_region.size unless @exon_sequences[@chromosome]
|
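Each exon sequence is stored left-padded with gaps up to its offset in the gene region, with the SNP base uppercased and the right side padded out, so that all entries line up column by column. A standalone sketch of that padding; `pad_exon` is a hypothetical name, and it pads to exactly `region_size`, whereas the diff adds `+ 1`, presumably to match how the region size is computed.

```ruby
# Left-pad with gaps to the exon's offset, mark the SNP, right-pad to length.
def pad_exon(exon_seq, exon_start_offset, region_size, snp_index)
  padded = "-" * exon_start_offset
  padded << exon_seq
  padded[snp_index] = padded[snp_index].upcase if snp_index < padded.size
  missing = region_size - padded.size
  padded << "-" * missing if missing > 0   # String#* raises on negative counts
  padded
end

p pad_exon("acgtac", 3, 12, 5)  # => "---acGtac---"
```

The hash keys built in the diff, `"#{chromosome}_#{start}_#{score}"`, have exactly the three-field shape that `get_target_sequence` parses.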
data/lib/bio/db/primer3.rb
CHANGED
@@ -113,6 +113,8 @@ module Bio::DB::Primer3
|
|
113
113
|
right_start = 0
|
114
114
|
right_end = 0
|
115
115
|
total_columns_before_messages=17
|
116
|
+
#puts "Values in primer3"
|
117
|
+
#puts snp_from.inspect
|
116
118
|
@values = Array.new
|
117
119
|
#@values << "#{gene},,#{template_length},"
|
118
120
|
@values << gene
|
@@ -763,7 +765,7 @@ module Bio::DB::Primer3
|
|
763
765
|
snp.line_1 = @line_1
|
764
766
|
snp.line_2 = @line_2
|
765
767
|
snp.snp_from = snp_in
|
766
|
-
snp.regions = snp_in.exon_list.values.collect { |x|
|
768
|
+
snp.regions = snp_in.exon_list.values.collect { |x| x.collect {|y| y.target_region.to_s }}
|
767
769
|
@snp_hash[snp.to_s] = snp
|
768
770
|
snp
|
769
771
|
end
|
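Because `exon_list` values are now arrays, the `regions` assignment uses a nested `collect` and yields one array of region strings per arm. A small sketch of the resulting shape, with a hypothetical `TargetExon` struct whose `target_region` is a plain string:

```ruby
# Hypothetical exon stand-in; only target_region is needed here.
TargetExon = Struct.new(:target_region)

# One inner array of region strings for each arm in the hash.
def region_strings(exon_list)
  exon_list.values.collect { |exons| exons.collect { |exon| exon.target_region.to_s } }
end

p region_strings(
  "2A" => [TargetExon.new("2A:100-200"), TargetExon.new("2A:900-950")],
  "2B" => [TargetExon.new("2B:110-210")]
)
# => [["2A:100-200", "2A:900-950"], ["2B:110-210"]]
```

A caller that needs a flat list of regions could call `flatten` on the result; whether downstream code does so is not visible in this diff.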
Binary file
|
Binary file
|
Binary file
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: bio-polyploid-tools
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.8.
|
4
|
+
version: 0.8.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ricardo H. Ramirez-Gonzalez
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2018-01-
|
11
|
+
date: 2018-01-19 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bio
|
@@ -206,6 +206,10 @@ files:
|
|
206
206
|
- test/data/BS00068396_51_contigs.aln
|
207
207
|
- test/data/BS00068396_51_contigs.dnd
|
208
208
|
- test/data/BS00068396_51_contigs.fa
|
209
|
+
- test/data/BS00068396_51_contigs.fa.fai
|
210
|
+
- test/data/BS00068396_51_contigs.fa.nhr
|
211
|
+
- test/data/BS00068396_51_contigs.fa.nin
|
212
|
+
- test/data/BS00068396_51_contigs.fa.nsq
|
209
213
|
- test/data/BS00068396_51_contigs.nhr
|
210
214
|
- test/data/BS00068396_51_contigs.nin
|
211
215
|
- test/data/BS00068396_51_contigs.nsq
|