bio-polyploid-tools 0.8.0 → 0.8.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: '08be9c740b45561cf8de023e6ca63bb6be4ae63e6f89bd1eb4b149da9cf47334'
4
- data.tar.gz: 94aa0d62f15ad380a35fe2c4bbcd870f2cb984f04c76aa825084b9ab97431d8b
3
+ metadata.gz: 5cc3c126779f27e61f521959b82d13f240cb2ce8d5c5416e511f9150ced798eb
4
+ data.tar.gz: 13adf99f1336327f7b399057d66ee63892ee276ee507a398fb4a6936a2a765a2
5
5
  SHA512:
6
- metadata.gz: 6f15740cb929555b6627eac53dc12b28d75c10709e271a23aef06935c11fb83bf99479afe68d8db5e5bac8d9ecc06c62ac8f17fc4e3066e8ae6de1094b3fb042
7
- data.tar.gz: 7a8cee46ca1ecf4a6ed71b497005f32f851067667c59e36a6b91bea3e8153c9beee4a765866f0849ae0fe83378cc241372fde6368f6fddc11e426a0a12415c36
6
+ metadata.gz: d345c23216e2d6aa3174885a053300ec499230125625d36fe1b5efc8de3d151b5423ad8ed5e0c563b2c9be5b07df29a759a9f95ee1196588eefa0ab2e40ec802
7
+ data.tar.gz: 1ed47854ec04f95c7d8449e1a15885a67f00c076b4bc07705616d6762ad7cf8822d60933cb737c71b95cbfb46293b4cb970e3d2f6413ccde70ca8e0f372e2ab4
data/README.md CHANGED
@@ -1,10 +1,12 @@
1
- #bio-polyploid-tools
1
+ # bio-polyploid-tools
2
+
3
+ ## Introduction
2
4
 
3
- ##Introduction
4
5
  These tools are designed to deal with polyploid wheat. The first tool is to design KASP primers, making them as specific as possible.
5
6
 
6
7
 
7
- ##Installation
8
+ ## Installation
9
+
8
10
  ```sh
9
11
  gem install bio-polyploid-tools
10
12
  ```
@@ -13,13 +15,19 @@ You need to have in your ```$PATH``` the following programs:
13
15
  * [MAFFT](http://mafft.cbrc.jp/alignment/software/)
14
16
  * [primer3](http://primer3.sourceforge.net/releases.php)
15
17
  * [exonerate](http://www.ebi.ac.uk/~guy/exonerate/)
18
+ * [blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE%3DBlastDocs&DOC_TYPE%3DDownload)
16
19
 
17
- The code has been developed on ruby 2.1.0, but it should work on 1.9.3 and above.
20
+ The code was originally developed on ruby 2.1.0, but it should work on 1.9.3 and above. However, it is only actively tested in currently supported ruby versions:
21
+
22
+ * 2.1.10
23
+ * 2.2.5
24
+ * 2.3.5
25
+ * 2.4.2
18
26
 
19
- #PolyMarker
20
27
 
21
- To run poolymerker with the CSS wheat contigs, you need to unzip the reference file from [ensembl](http://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz).
28
+ # PolyMarker
22
29
 
30
+ To run PolyMarker with the CSS wheat contigs, you need to unzip the reference file from [ensembl](http://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz).
23
31
 
24
32
 
25
33
  ```sh
@@ -80,7 +88,7 @@ If the flanking sequence is unknown, but the position on a reference is available
80
88
  * **alternative allele** The base in the alternative allele.
81
89
  * **target chromosome** for the specific primers. Must be in line with the chromosome selection criteria.
82
90
 
83
- ####Example
91
+ #### Example
84
92
 
85
93
  ```
86
94
  IWGSC_CSS_1AL_scaff_110,C,519,A,2A
@@ -89,7 +97,8 @@ IWGSC_CSS_1AL_scaff_110,C,519,A,2A
89
97
  This file format can be used with ```snp_positions_to_polymarker.rb``` to produce the input for the option ```--marker_list```.
90
98
 
91
99
 
92
- ###Custom reference sequences.
100
+ ### Custom reference sequences.
101
+
93
102
  By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
94
103
  ) are used. However, it is possible to use a custom reference. To define the chromosome where each contig belongs, the argument ```arm_selection``` is used. The default uses ids like ```IWGSC_CSS_1AL_scaff_110```, where the third field, separated by underscores, is used. A simple way to add custom references is to rename the sequences in the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, where only the first two characters of each contig name are used as the identifier, which is useful when pseudomolecules are named after the chromosomes (i.e. ">1A" in the fasta file).
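Review note: the two naming conventions described in this README paragraph can be illustrated with a short Ruby sketch. The helper names `chromosome_from_third_field` and `chromosome_from_first_two` are hypothetical and are not part of the gem. The real selection functions live in `bin/polymarker.rb` and may differ in detail, so treat this as an approximation of the documented behaviour only.

```ruby
# Hypothetical helpers for illustration; not part of bio-polyploid-tools.

# Default convention: the third underscore-separated field names the arm,
# e.g. "IWGSC_CSS_1AL_scaff_110" -> "1AL".
def chromosome_from_third_field(contig_name)
  contig_name.split("_")[2]
end

# arm_selection_first_two convention (assumed): drop an optional leading
# "chr" and keep the first two characters, e.g. "1A" for a ">1A" record.
def chromosome_from_first_two(contig_name)
  contig_name.sub(/\Achr/, "")[0, 2]
end

puts chromosome_from_third_field("IWGSC_CSS_1AL_scaff_110")
puts chromosome_from_first_two("1A_pseudomolecule")
```

Renaming the FASTA records to the `IWGSC_CSS_<arm>_...` pattern and switching to `arm_selection_first_two` are alternatives; only one of them is needed for a given custom reference.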
95
104
 
@@ -117,33 +126,38 @@ To use blast instead of exonerate, use the following command:
117
126
  ```
118
127
 
119
128
 
120
- ##Release Notes
129
+ ## Release Notes
130
+
131
+ ### 0.8
132
+
133
+ * FEATURE: ```polymarker.rb``` added the flag ```--aligner blast|exonerate``` which lets you pick between ```blast``` or ```exonerate``` as the aligner. For blast the default is to have the database with the same name as the ```--contigs``` file. However, it is possible to use a different name via the option ```--database```.
134
+
135
+ ### 0.7.3
121
136
 
122
- ###0.7.3
123
137
  * FEATURE: ```polymarker.rb``` Added to the flag ```--arm_selection``` the option ```scaffold```, which now supports a scaffold specific primer.
124
138
  * FEATURE: ```snp_position_to_polymarker``` Added the option ```--mutant_list``` to prepare files for PolyMarker from files with the following columns ```ID,Allele_1,position,Allele_1,target_chromosome```.
125
139
 
126
- ###0.7.2
140
+ ### 0.7.2
141
+
127
142
  * FEATURE: Added a flag ```min_identity``` to set the minimum identity to consider a hit. The default is 90
128
143
 
129
- ###0.7.1
144
+ ### 0.7.1
130
145
  * BUGFIX: Now the parser for ```arm_selection_embl``` works with the mixture of contigs and pseudomolecules
131
146
  * DOC: Added documentation on how to use custom references.
132
147
 
133
- ###0.7.0
148
+ ### 0.7.0
134
149
  * Added flag ```genomes_count``` for number of genomes, to be used on tetraploids, etc.
135
150
 
136
- ###0.6.1
151
+ ### 0.6.1
137
152
 
138
153
 
139
154
  * polymarker.rb now validates that all the files exist.
140
155
  * BUGFIX: A reference was required even when it was not used to generate contigs.
141
156
 
142
- #Notes
143
-
157
+ # Notes
144
158
 
145
- * If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
146
159
 
160
+ * BUG: If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
147
161
  * BUG: Blocks with NNNs are picked and treated as semi-specific.
148
162
  * BUG: If the name of the reference has a space, the ID is not chopped. ">gene_1 (G12A)" should be treated as ">gene_1".
149
163
  * TODO: Add a parameter file to configure the alignments.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.8.0
1
+ 0.8.1
@@ -180,4 +180,4 @@ end
180
180
 
181
181
  kasp_container.add_primers_file(primer_3_output)
182
182
  header = "Marker,SNP,RegionSize,SNP_type,#{snp_in},#{original_name},common,primer_type,orientation,#{snp_in}_TM,#{original_name}_TM,common_TM,selected_from,product_size"
183
- File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
183
+ File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
data/bin/polymarker.rb CHANGED
@@ -12,6 +12,11 @@ require path
12
12
 
13
13
  arm_selection_functions = Hash.new;
14
14
 
15
+ arm_selection_functions[:arm_selection_nrgenes] = lambda do | contig_name |
16
+ #example format: chr2A
17
+ ret = contig_name[3,2]
18
+ return ret
19
+ end
15
20
 
16
21
  arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
17
22
  contig_name.gsub!(/chr/,"")
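Review note: the `arm_selection_nrgenes` lambda added in this hunk relies on `String#[start, length]`. The sketch below copies that slice into a standalone method (the name `arm_from_nrgenes_name` is made up for this note) to show what it returns for names that do and do not follow the `chr2A` pattern.

```ruby
# Standalone copy of the slice used by :arm_selection_nrgenes.
# String#[start, length] returns up to `length` characters from `start`,
# or nil when `start` is past the end of the string.
def arm_from_nrgenes_name(contig_name)
  contig_name[3, 2]
end

puts arm_from_nrgenes_name("chr2A").inspect  # expected pattern
puts arm_from_nrgenes_name("chrUn").inspect  # still two characters
puts arm_from_nrgenes_name("2A").inspect     # too short for the slice
```

Names without the three-character prefix yield `nil` here, so callers that compare or index on the result may need to handle that case.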
@@ -417,4 +422,4 @@ rescue StandardError => e
417
422
  rescue Exception => e
418
423
  write_status "ERROR\t#{e.message}"
419
424
  raise e
420
- end
425
+ end
@@ -71,16 +71,26 @@ File.open(test_file) do | f |
71
71
  snp = Bio::PolyploidTools::SNPMutant.parse(line)
72
72
  entry = fasta_reference_db.index.region_for_entry(snp.contig)
73
73
  end
74
-
74
+ #puts line
75
75
  if entry
76
76
  region = entry.get_full_region
77
- if region != lastRegion
78
- lastTemplate = fasta_reference_db.fetch_sequence(region)
79
- end
80
- snp.full_sequence = lastTemplate
77
+ snp_name = snp.snp_id_in_seq
78
+
79
+ #if region != lastRegion
80
+ # lastTemplate = fasta_reference_db.fetch_sequence(region)
81
+ #end
82
+ start, total, new_position = snp.to_polymarker_coordinates(options[:flanking_size])
83
+ region.start = start
84
+ region.end = start + total
85
+ #puts region
86
+ local_template = fasta_reference_db.fetch_sequence(region)
87
+
88
+ snp.position = new_position
89
+
90
+ snp.template_sequence = local_template
81
91
  lastRegion = region
82
92
 
83
- out.puts "#{snp.gene}_#{snp.snp_id_in_seq},#{snp.chromosome},#{snp.sequence_original}"
93
+ out.puts "#{snp.gene}_#{snp_name},#{snp.chromosome},#{snp.to_polymarker_sequence(options[:flanking_size])}"
84
94
  else
85
95
  $stderr.puts "ERROR: Unable to find entry for #{snp.gene}"
86
96
  end
@@ -2,16 +2,16 @@
2
2
  # DO NOT EDIT THIS FILE DIRECTLY
3
3
  # Instead, edit Juwelier::Tasks in Rakefile, and run 'rake gemspec'
4
4
  # -*- encoding: utf-8 -*-
5
- # stub: bio-polyploid-tools 0.8.0 ruby lib
5
+ # stub: bio-polyploid-tools 0.8.1 ruby lib
6
6
 
7
7
  Gem::Specification.new do |s|
8
8
  s.name = "bio-polyploid-tools".freeze
9
- s.version = "0.8.0"
9
+ s.version = "0.8.1"
10
10
 
11
11
  s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
12
12
  s.require_paths = ["lib".freeze]
13
13
  s.authors = ["Ricardo H. Ramirez-Gonzalez".freeze]
14
- s.date = "2018-01-18"
14
+ s.date = "2018-01-19"
15
15
  s.description = "Repository of tools developed at Crop Genetics in JIC to work with polyploid wheat".freeze
16
16
  s.email = "ricardo.ramirez-gonzalez@jic.ac.uk".freeze
17
17
  s.executables = ["bfr.rb".freeze, "blast_triads.rb".freeze, "blast_triads_promoters.rb".freeze, "count_variations.rb".freeze, "filter_blat_by_target_coverage.rb".freeze, "filter_exonerate_by_identity.rb".freeze, "find_best_blat_hit.rb".freeze, "find_best_exonerate.rb".freeze, "find_homoeologue_variations.rb".freeze, "get_longest_hsp_blastx_triads.rb".freeze, "hexaploid_primers.rb".freeze, "homokaryot_primers.rb".freeze, "mafft_triads.rb".freeze, "mafft_triads_promoters.rb".freeze, "map_markers_to_contigs.rb".freeze, "markers_in_region.rb".freeze, "polymarker.rb".freeze, "polymarker_capillary.rb".freeze, "snp_position_to_polymarker.rb".freeze, "snps_between_bams.rb".freeze, "vcfLineToTable.rb".freeze]
@@ -102,6 +102,10 @@ Gem::Specification.new do |s|
102
102
  "test/data/BS00068396_51_contigs.aln",
103
103
  "test/data/BS00068396_51_contigs.dnd",
104
104
  "test/data/BS00068396_51_contigs.fa",
105
+ "test/data/BS00068396_51_contigs.fa.fai",
106
+ "test/data/BS00068396_51_contigs.fa.nhr",
107
+ "test/data/BS00068396_51_contigs.fa.nin",
108
+ "test/data/BS00068396_51_contigs.fa.nsq",
105
109
  "test/data/BS00068396_51_contigs.nhr",
106
110
  "test/data/BS00068396_51_contigs.nin",
107
111
  "test/data/BS00068396_51_contigs.nsq",
@@ -116,13 +116,13 @@ module Bio::PolyploidTools
116
116
  target_region = exon.target_region
117
117
  exon_start_offset = exon.query_region.start - gene_region.start
118
118
  chr_local_pos=local_pos_in_gene + target_region.start + 1
119
- ret_str << ">#{chromosome}_SNP-#{chr_local_pos} #{exon.to_s} #{target_region.orientation}\n"
120
- to_print = "-" * exon_start_offset
121
- chr_seq = chromosome_sequence(exon.target_region).to_s
122
- l_pos = exon_start_offset + local_pos_in_gene
123
- to_print << chr_seq
119
+ ret_str << ">#{chromosome}_SNP-#{chr_local_pos} #{exon.to_s} #{target_region.orientation}\n"
120
+ to_print = "-" * exon_start_offset
121
+ chr_seq = chromosome_sequence(exon.target_region).to_s
122
+ l_pos = exon_start_offset + local_pos_in_gene
123
+ to_print << chr_seq
124
124
  to_print[local_pos_in_gene] = to_print[local_pos_in_gene].upcase
125
- ret_str << to_print
125
+ ret_str << to_print
126
126
  end
127
127
  ret_str
128
128
  end
@@ -16,6 +16,8 @@ module Bio::PolyploidTools
16
16
  attr_accessor :chromosome
17
17
  attr_accessor :variation_free_region
18
18
 
19
+
20
+
19
21
  #Format:
20
22
  #Gene_name,Original,SNP_Pos,pos,chromosome
21
23
  #A_comp0_c0_seq1,C,519,A,2A
@@ -30,7 +32,7 @@ module Bio::PolyploidTools
30
32
  snp.snp.upcase!
31
33
  snp.snp.strip!
32
34
  snp.chromosome.strip!
33
- snp.exon_list = Hash.new()
35
+
34
36
  snp.use_reference = false
35
37
  snp
36
38
  end
@@ -60,6 +62,16 @@ module Bio::PolyploidTools
60
62
  @primer_3_min_seq_length = 50
61
63
  @variation_free_region = 0
62
64
  @contig = false
65
+ @exon_list = Hash.new {|hsh, key| hsh[key] = [] }
66
+ end
67
+
68
+ def to_polymarker_coordinates(flanking_size, total:nil)
69
+ start = position - flanking_size + 1
70
+ start = 0 if start < 0
71
+ total = flanking_size * 2 unless total
72
+ total += 1
73
+ new_position = position - start + 2
74
+ [start , total, new_position ]
63
75
  end
64
76
 
65
77
  def to_polymarker_sequence(flanking_size, total:nil)
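Review note: a worked example of the arithmetic in the new `to_polymarker_coordinates` method. The sketch below restates the method as a free function taking `position` as an argument (an adaptation for this note; in the gem it reads the SNP's own `position`), so the returned triple can be checked by hand.

```ruby
# Free-function restatement of to_polymarker_coordinates from the diff.
# Returns [start, total, new_position] for a SNP at `position`.
def polymarker_coordinates(position, flanking_size, total: nil)
  start = position - flanking_size + 1
  start = 0 if start < 0              # clamp at the start of the contig
  total = flanking_size * 2 unless total
  total += 1                          # window includes the SNP base itself
  new_position = position - start + 2
  [start, total, new_position]
end

# SNP at 519 with 100 bases of flank: 519 - 100 + 1 = 420, 2 * 100 + 1 = 201,
# and 519 - 420 + 2 = 101.
p polymarker_coordinates(519, 100)

# SNP at 50: the start is clamped to 0, so the SNP lands at 50 - 0 + 2 = 52.
p polymarker_coordinates(50, 100)
```

As in the diff, only the left edge is clamped. The window is not checked against the contig length, so the fetched region may be shorter than `total` near the end of a contig.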
@@ -103,8 +115,7 @@ module Bio::PolyploidTools
103
115
  end
104
116
 
105
117
  def add_exon(exon, arm)
106
- @exon_list[arm] = exon unless @exon_list[arm]
107
- @exon_list[arm] = exon if exon.record.score > @exon_list[arm].record.score
118
+ exon_list[arm] << exon
108
119
  end
109
120
 
110
121
  def covered_region
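Review note: `add_exon` now appends to an array per arm instead of keeping only the best-scoring exon, which works because `@exon_list` is created with a default block. The following sketch uses plain strings in place of exon objects, and `new_exon_list` is a name invented for this note. It shows the accumulation and one side effect worth keeping in mind.

```ruby
# Hash whose missing keys default to a fresh array that is stored on access.
def new_exon_list
  Hash.new { |hsh, key| hsh[key] = [] }
end

# Same shape as the new add_exon: every exon for an arm is kept.
def add_exon(exon_list, exon, arm)
  exon_list[arm] << exon
end

exons = new_exon_list
add_exon(exons, "exon_a", "2A")
add_exon(exons, "exon_b", "2A")
add_exon(exons, "exon_c", "2B")
p exons

# Reading a missing key also stores an empty array for it,
# which changes what exons.size reports.
exons["2D"]
p exons.size
```

Because a plain read creates the key, checks such as `@exon_list.size == 0` only stay reliable if no code has looked up an arm before exons were added.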
@@ -115,28 +126,28 @@ module Bio::PolyploidTools
115
126
  reg.orientation = :forward
116
127
  reg.start = self.position - self.flanking_size
117
128
  reg.end = self.position + self.flanking_size
118
-
119
129
  reg.start = 1 if reg.start < 1
120
-
121
130
  return reg
122
131
  end
123
132
 
124
133
  min = @position
125
134
  max = @position
126
- # puts "Calculating covered region for #{self.inspect}"
127
- # puts "#{@exon_list.inspect}"
128
- #raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
135
+ # puts "Calculating covered region for #{self.inspect}"
136
+ # puts "#{@exon_list.inspect}"
137
+ # raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
129
138
  if @exon_list.size == 0
130
139
  min = self.position - self.flanking_size
131
140
  min = 1 if min < 1
132
141
  max = self.position + self.flanking_size
133
142
  end
134
- @exon_list.each do | chromosome, exon |
135
- # puts exon.inspect
136
- reg = exon.query_region
137
- min = reg.start if reg.start < min
138
- max = reg.end if reg.end > max
143
+ @exon_list.each do | chromosome, exon_arr |
144
+ exon_arr.each do | exon |
145
+ reg = exon.query_region
146
+ min = reg.start if reg.start < min
147
+ max = reg.end if reg.end > max
148
+ end
139
149
  end
150
+
140
151
  reg = Bio::DB::Fasta::Region.new()
141
152
  reg.entry = gene
142
153
  reg.orientation = :forward
@@ -168,24 +179,6 @@ module Bio::PolyploidTools
168
179
  pos + left_padding
169
180
  end
170
181
 
171
- def exon_fasta_string
172
- gene_region = self.covered_region
173
- local_pos_in_gene = self.local_position
174
- ret_str = ""
175
- container.parents.each do |name, bam|
176
- ret_str << ">#{gene_region.entry}-#{self.position}_#{name} Overlapping_exons:#{gene_region.to_s} localSNPpo:#{local_pos_in_gene+1}\n"
177
- to_print = parental_sequences[name]
178
- ret_str << to_print << "\n"
179
- end
180
- self.exon_sequences.each do | chromosome, exon_seq |
181
- ret_str << ">#{chromosome}\n#{exon_seq}\n"
182
- end
183
- mask = masked_chromosomal_snps("1BS", flanking_size)
184
- ret_str << ">Mask\n#{mask}\n"
185
- ret_str
186
- end
187
-
188
-
189
182
  def primer_fasta_string
190
183
  gene_region = self.covered_region
191
184
  local_pos_in_gene = self.local_position
@@ -209,12 +202,15 @@ module Bio::PolyploidTools
209
202
  end
210
203
 
211
204
  def primer_region(target_chromosome, parental )
205
+
212
206
  parental = aligned_sequences[parental].downcase
207
+ names = aligned_sequences.keys
208
+ target_chromosome = get_target_sequence(names, target_chromosome)
209
+
213
210
  chromosome_seq = aligned_sequences[target_chromosome]
214
211
  chromosome_seq = "-" * parental.size unless chromosome_seq
215
212
  chromosome_seq = chromosome_seq.downcase
216
213
  mask = mask_aligned_chromosomal_snp(target_chromosome)
217
- #puts "'#{mask}'"
218
214
 
219
215
  pr = PrimerRegion.new
220
216
  position_in_region = 0
@@ -291,8 +287,9 @@ module Bio::PolyploidTools
291
287
 
292
288
  end
293
289
 
294
-
295
- str = "SEQUENCE_ID=#{opts[:name]} #{orientation}\n"
290
+ #puts "__"
291
+ #puts self.inspect
292
+ str = "SEQUENCE_ID=#{opts[:name]} #{orientation} \n"
296
293
  str << "SEQUENCE_FORCE_LEFT_END=#{left}\n" unless opts[:extra_f]
297
294
  str << "SEQUENCE_FORCE_RIGHT_END=#{right}\n" if opts[:right_pos]
298
295
  str << extra if extra
@@ -326,10 +323,10 @@ module Bio::PolyploidTools
326
323
  primer_3_propertes = Array.new
327
324
 
328
325
  seq_original = String.new(pr.sequence)
329
- puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
326
+ #puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
330
327
  return primer_3_propertes if seq_original.size < primer_3_min_seq_length
331
328
  #puts self.inspect
332
- puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
329
+ #puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
333
330
 
334
331
  seq_original[pr.snp_pos] = self.original
335
332
  seq_original_reverse = reverse_complement_string(seq_original)
@@ -432,12 +429,13 @@ module Bio::PolyploidTools
432
429
 
433
430
  seq[local_pos_in_gene] = self.snp if name == self.snp_in
434
431
  @parental_sequences [name] = seq
435
- puts name
436
- puts seq
437
432
  end
438
433
  @parental_sequences
439
434
  end
440
435
 
436
+
437
+
438
+
441
439
  def surrounding_parental_sequences
442
440
  return @surrounding_parental_sequences if @surrounding_parental_sequences
443
441
  gene_region = self.covered_region
@@ -450,11 +448,15 @@ module Bio::PolyploidTools
450
448
  seq = bam.consensus_with_ambiguities({:region=>gene_region}).to_s
451
449
  else
452
450
  seq = container.gene_model_sequence(gene_region)
453
-
454
- unless name == self.snp_in
455
- #puts "Modifing original: #{name} #{seq}"
456
- seq[local_pos_in_gene] = self.original
457
- end
451
+ #puts "#{name} #{self.snp_in}"
452
+ #puts "Modifing original: #{name}\n#{seq}"
453
+ unless name == self.snp_in
454
+
455
+ seq[local_pos_in_gene] = self.original
456
+ else
457
+ seq[local_pos_in_gene] = self.snp
458
+ end
459
+ #puts "#{seq}"
458
460
  end
459
461
  seq[local_pos_in_gene] = seq[local_pos_in_gene].upcase
460
462
  seq[local_pos_in_gene] = self.snp if name == self.snp_in
@@ -522,71 +524,101 @@ module Bio::PolyploidTools
522
524
  ret_str
523
525
  end
524
526
 
527
+
528
+ def get_snp_position_after_trim
529
+ local_pos_in_gene = self.local_position
530
+ ideal_min = self.local_position - flanking_size
531
+ ideal_max = self.local_position + flanking_size
532
+ left_pad = 0
533
+ if ideal_min < 0
534
+ left_pad = ideal_min * -1
535
+ ideal_min = 0
536
+ end
537
+ local_pos_in_gene - ideal_min
538
+ end
539
+
525
540
  def aligned_snp_position
526
541
  return @aligned_snp_position if @aligned_snp_position
542
+ #puts self.inspect
527
543
  pos = -1
528
544
  parental_strings = Array.new
529
545
  parental_sequences.keys.each do | par |
530
-
531
546
  parental_strings << aligned_sequences[par]
532
547
  end
533
- template_sequence = nil
534
- aligned_sequences.keys.each do |temp |
535
- template_sequence = aligned_sequences[ temp ] if aligned_sequences[ temp ][0] != "-"
536
- end
537
548
  $stderr.puts "WARN: #{self.to_s} #{parental_sequences.keys} is not of size 2 (#{parental_strings.size})" if parental_strings.size != 2
538
549
 
550
+ local_pos_in_parental = get_snp_position_after_trim
539
551
  i = 0
540
- differences = 0
541
- local_pos_in_gene = flanking_size
542
- local_pos = 0
543
- started = false
544
- #TODO: Validate the cases when the alignment has padding on the left on all the chromosomes
545
- #unless parental_strings[0]
546
- #puts "parental hash: #{parental_sequences}"
547
- #puts "Aligned sequences: #{aligned_sequences.to_fasta}"
548
- # puts "parental_strings: #{parental_strings.to_s}"
549
- #end
550
552
  while i < parental_strings[0].size do
551
- if local_pos_in_gene == local_pos
553
+ if local_pos_in_parental == 0 and parental_strings[0][i] != "-"
552
554
  pos = i
553
555
  if parental_strings[0][i] == parental_strings[1][i]
554
556
  $stderr.puts "WARN: #{self.to_s} doesn't have a SNP in the marked place (#{i})! \n#{parental_strings[0]}\n#{parental_strings[1]}"
555
557
  end
556
-
557
- end
558
-
559
- started = true if template_sequence[i] != "-"
560
- if started == false or template_sequence[i] != "-"
561
- local_pos += 1
562
558
  end
559
+
560
+ local_pos_in_parental -= 1 if parental_strings[0][i] != "-"
563
561
  i += 1
564
562
  end
565
563
  @aligned_snp_position = pos
566
564
  return pos
567
565
  end
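Review note: the rewritten `aligned_snp_position` maps a position in the ungapped parental sequence to a column in the gapped alignment by counting down on non-gap characters. A compact restatement of that walk, under the assumption that only `-` marks a gap (the method name `aligned_column` is invented for this note):

```ruby
# Map a 0-based position in the ungapped sequence to its column in the
# gapped alignment string. Returns -1 when the position is not reached,
# matching the initial value of `pos` in the diff.
def aligned_column(aligned, ungapped_pos)
  remaining = ungapped_pos
  aligned.each_char.with_index do |base, column|
    next if base == "-"             # gaps do not consume a position
    return column if remaining == 0
    remaining -= 1
  end
  -1
end

p aligned_column("ac--gt", 2)  # third base sits after the two gap columns
p aligned_column("--acgt", 0)  # leading gaps are skipped
p aligned_column("acgt", 10)   # position beyond the sequence
```

Counting against the parental string itself removes the need for the separate template sequence and the `started` flag that the old loop used.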
568
566
 
567
+ def get_target_sequence(names, chromosome)
568
+
569
+ best = chromosome
570
+ best_score = 0
571
+ names.each do |e|
572
+ arr = e.split("_")
573
+ if arr.length == 3
574
+ score = arr[2].to_f
575
+ if score >best_score
576
+ best_score = score
577
+ best = e
578
+ end
579
+ end
580
+ end
581
+ best
582
+ end
583
+
584
+
585
+
569
586
  def mask_aligned_chromosomal_snp(chromosome)
570
- names = exon_sequences.keys
587
+ names = aligned_sequences.keys
571
588
  parentals = parental_sequences.keys
572
589
 
590
+ position_after_trim = get_snp_position_after_trim
591
+
592
+ names = names - parentals
573
593
  local_pos_in_gene = aligned_snp_position
574
- masked_snps = aligned_sequences[chromosome].downcase if aligned_sequences[chromosome]
575
- masked_snps = "-" * aligned_sequences.values[0].size unless aligned_sequences[chromosome]
594
+
595
+ best_target = get_target_sequence(names, chromosome)
596
+ masked_snps = aligned_sequences[best_target].downcase if aligned_sequences[best_target]
597
+ masked_snps = "-" * aligned_sequences.values[0].size unless aligned_sequences[best_target]
576
598
  #TODO: Make this chromosome specific, even when we have more than one alignment going to the region we want.
599
+ #puts "mask_aligned_chromosomal_snp(#{chromosome})"
600
+ #puts names
577
601
  i = 0
578
- while i < masked_snps.size
602
+ for i in 0..masked_snps.size-1
603
+ #puts i
579
604
  different = 0
580
605
  cov = 0
581
606
  from_group = 0
582
607
  nCount = 0
608
+ seen = []
583
609
  names.each do | chr |
584
610
  if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
611
+ #puts aligned_sequences[chr][i]
585
612
  cov += 1
586
613
  nCount += 1 if aligned_sequences[chr][i] == 'N' or aligned_sequences[chr][i] == 'n' # maybe fix this to use ambiguity codes instead.
587
- from_group += 1 if chr[0] == chromosome_group
614
+
615
+ if chr[0] == chromosome_group and not seen.include? chr[1]
616
+ seen << chr[1]
617
+ from_group += 1
618
+
619
+ end
588
620
  #puts "Comparing #{chromosome_group} and #{chr[0]} as chromosomes"
589
- if chr != chromosome
621
+ if chr != best_target
590
622
  $stderr.puts "WARN: No base for #{masked_snps} : ##{i}" unless masked_snps[i].upcase
591
623
  $stderr.puts "WARN: No base for #{aligned_sequences[chr]} : ##{i}" unless masked_snps[i].upcase
592
624
  different += 1 if masked_snps[i].upcase != aligned_sequences[chr][i].upcase
@@ -598,12 +630,15 @@ module Bio::PolyploidTools
598
630
  masked_snps[i] = "-" if nCount > 0
599
631
  masked_snps[i] = "*" if cov == 0
600
632
  expected_snps = names.size - 1
601
- # puts "Diferences: #{different} to expected: #{ expected_snps } [#{i}] Genome count (#{from_group} == #{genomes_count})"
633
+
634
+ #puts "Diferences: #{different} to expected: #{ expected_snps } [#{i}] Genome count (#{from_group} == #{genomes_count})"
602
635
 
603
636
  masked_snps[i] = masked_snps[i].upcase if different == expected_snps and from_group == genomes_count
637
+ #puts "#{i}:#{masked_snps[i]}"
604
638
 
605
639
  if i == local_pos_in_gene
606
640
  masked_snps[i] = "&"
641
+ #puts "#{i}:#{masked_snps[i]}___"
607
642
  bases = ""
608
643
  names.each do | chr |
609
644
  bases << aligned_sequences[chr][i] if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
@@ -617,62 +652,22 @@ module Bio::PolyploidTools
617
652
  end
618
653
 
619
654
  end
620
- i += 1
621
- end
622
- masked_snps
623
- end
624
-
625
- def masked_chromosomal_snps(chromosome)
626
- chromosomes = exon_sequences
627
- names = chromosomes.keys
628
- masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
629
- masked_snps = "-" * covered_region.size unless chromosomes[chromosome]
630
- local_pos_in_gene = self.local_position
631
- ideal_min = local_pos_in_gene - flanking_size
632
- ideal_max = local_pos_in_gene + flanking_size
633
- i = 0
634
- while i < masked_snps.size do
635
- if i > ideal_min and i <= ideal_max
636
-
637
- different = 0
638
- cov = 0
639
- names.each do | chr |
640
- if chromosomes[chr][i] != "-"
641
- cov += 1
642
- if chr != chromosome and masked_snps[i] != "+"
643
- different += 1 if masked_snps[i] != chromosomes[chr][i]
644
- end
645
- end
646
-
647
- end
648
- masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
649
- masked_snps[i] = "-" if cov < 2
650
- masked_snps[i] = masked_snps[i].upcase if different > 1
651
-
652
- else
653
- masked_snps[i] = "*"
654
- end
655
- if i == local_pos_in_gene
656
- masked_snps[i] = "&"
657
- end
658
- i += 1
655
+ #i += 1
659
656
  end
660
657
  masked_snps
661
658
  end
662
659
 
660
+
663
661
  def surrounding_masked_chromosomal_snps(chromosome)
664
662
 
665
663
  chromosomes = surrounding_exon_sequences
666
664
  names = chromosomes.keys
665
+ get_target_sequence(names, chromosome)
667
666
  masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
668
667
  masked_snps = "-" * (flanking_size * 2 ) unless chromosomes[chromosome]
669
668
  local_pos_in_gene = flanking_size
670
- # ideal_min = local_pos_in_gene - flanking_size
671
- #ideal_max = local_pos_in_gene + flanking_size
672
669
  i = 0
673
670
  while i < masked_snps.size do
674
-
675
-
676
671
  different = 0
677
672
  cov = 0
678
673
  names.each do | chr |
@@ -682,13 +677,11 @@ module Bio::PolyploidTools
682
677
  different += 1 if masked_snps[i] != chromosomes[chr][i]
683
678
  end
684
679
  end
685
-
686
680
  end
687
681
  masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
688
682
  masked_snps[i] = "-" if cov < 2
689
683
  masked_snps[i] = masked_snps[i].upcase if different > 1
690
684
 
691
-
692
685
  if i == local_pos_in_gene
693
686
  masked_snps[i] = "&"
694
687
  end
@@ -699,18 +692,19 @@ module Bio::PolyploidTools
699
692
 
700
693
  def surrounding_exon_sequences
701
694
  return @surrounding_exon_sequences if @surrounding_exon_sequences
695
+ gene_region = self.covered_region
702
696
  @surrounding_exon_sequences = Bio::Alignment::SequenceHash.new
703
- self.exon_list.each do |chromosome, exon|
704
- #puts "surrounding_exon_sequences #{flanking_size}"
705
- #puts chromosome
706
- #puts exon
707
- flanquing_region = exon.target_flanking_region_from_position(position,flanking_size)
708
- #TODO: Padd when the exon goes over the regions...
709
-
710
- #Ignoring when the exon is in a gap
711
- unless exon.snp_in_gap
712
- exon_seq = container.chromosome_sequence(flanquing_region)
713
- @surrounding_exon_sequences[chromosome] = exon_seq
697
+ self.exon_list.each do |chromosome, exon_arr|
698
+ exon_arr.each do |exon|
699
+ exon_start_offset = exon.query_region.start - gene_region.start
700
+ flanquing_region = exon.target_flanking_region_from_position(position,flanking_size)
701
+ #TODO: Padd when the exon goes over the regions...
702
+ #puts flanquing_region.inspect
703
+ #Ignoring when the exon is in a gap
704
+ unless exon.snp_in_gap
705
+ exon_seq = container.chromosome_sequence(flanquing_region)
706
+ @surrounding_exon_sequences["#{chromosome}_#{flanquing_region.start}_#{exon.record.score}"] = exon_seq
707
+ end
714
708
  end
715
709
  end
716
710
  @surrounding_exon_sequences
@@ -722,18 +716,21 @@ module Bio::PolyploidTools
722
716
  gene_region = self.covered_region
723
717
  local_pos_in_gene = self.local_position
724
718
  @exon_sequences = Bio::Alignment::SequenceHash.new
725
- self.exon_list.each do |chromosome, exon|
726
- exon_start_offset = exon.query_region.start - gene_region.start
727
- exon_seq = "-" * exon_start_offset
728
- exon_seq << container.chromosome_sequence(exon.target_region).to_s
729
- #puts exon_seq
730
- # l_pos = exon_start_offset + local_pos_in_gene
731
- unless exon.snp_in_gap
732
- #puts "local position: #{local_pos_in_gene}"
733
- #puts "Exon_seq: #{exon_seq}"
734
- exon_seq[local_pos_in_gene] = exon_seq[local_pos_in_gene].upcase
735
- exon_seq << "-" * (gene_region.size - exon_seq.size + 1)
736
- @exon_sequences[chromosome] = exon_seq
719
+ self.exon_list.each do |chromosome, exon_arr|
720
+ exon_arr.each do |exon|
721
+ exon_start_offset = exon.query_region.start - gene_region.start
722
+ exon_seq = "-" * exon_start_offset
723
+ exon_seq << container.chromosome_sequence(exon.target_region).to_s
724
+ #puts exon_seq
725
+ #l_pos = exon_start_offset + local_pos_in_gene
726
+ unless exon.snp_in_gap
727
+ #puts "local position: #{local_pos_in_gene}"
728
+ #puts "Exon_seq: #{exon_seq}"
729
+ exon_seq[local_pos_in_gene] = exon_seq[local_pos_in_gene].upcase
730
+ exon_seq << "-" * (gene_region.size - exon_seq.size + 1)
731
+ #puts exon.inspect
732
+ @exon_sequences["#{chromosome}_#{exon.query_region.start}_#{exon.record.score}"] = exon_seq
733
+ end
737
734
  end
738
735
  end
739
736
  @exon_sequences[@chromosome] = "-" * gene_region.size unless @exon_sequences[@chromosome]
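Review note: exon sequences are now keyed as `"#{chromosome}_#{start}_#{score}"`, and `get_target_sequence` picks a key by parsing the score back out of the third field. The sketch below repeats the selection logic from the diff as a free function. The key names and scores are invented for illustration.

```ruby
# Restatement of get_target_sequence from the diff: among names with exactly
# three underscore-separated fields, return the one with the highest numeric
# third field; fall back to `chromosome` when none qualifies.
def get_target_sequence(names, chromosome)
  best = chromosome
  best_score = 0
  names.each do |name|
    fields = name.split("_")
    next unless fields.length == 3
    score = fields[2].to_f
    if score > best_score
      best_score = score
      best = name
    end
  end
  best
end

# Illustrative keys in the "<chromosome>_<start>_<score>" shape.
names = ["2A_6364_46", "2B_8836_6457", "2B_11974_15341"]
puts get_target_sequence(names, "2A")
puts get_target_sequence(["parent_one"], "2A")
```

As written, the `chromosome` argument is only a fallback: the highest score wins even when it belongs to another chromosome, and chromosome names that themselves contain underscores never reach the three-field branch. Both points may be worth confirming against the intended behaviour.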
@@ -38,7 +38,6 @@ module Bio::PolyploidTools
38
38
  $stderr.puts e
39
39
  end
40
40
 
41
- snp.exon_list = Hash.new()
42
41
  snp.flanking_size=100
43
42
  snp.region_size = region_size.to_i if region_size
44
43
  snp.flanking_size = parsed_flanking.to_i if parsed_flanking
@@ -29,7 +29,7 @@ module Bio::PolyploidTools
29
29
  #snp.snp.upcase!
30
30
  snp.chromosome. strip!
31
31
  snp.parse_sequence_snp
32
- snp.exon_list = Hash.new()
32
+
33
33
  snp
34
34
  end
35
35
 
@@ -113,6 +113,8 @@ module Bio::DB::Primer3
113
113
  right_start = 0
114
114
  right_end = 0
115
115
  total_columns_before_messages=17
116
+ #puts "Values in primer3"
117
+ #puts snp_from.inspect
116
118
  @values = Array.new
117
119
  #@values << "#{gene},,#{template_length},"
118
120
  @values << gene
@@ -763,7 +765,7 @@ module Bio::DB::Primer3
763
765
  snp.line_1 = @line_1
764
766
  snp.line_2 = @line_2
765
767
  snp.snp_from = snp_in
766
- snp.regions = snp_in.exon_list.values.collect { |x| x.target_region.to_s }
768
+ snp.regions = snp_in.exon_list.values.collect { |x| x.collect {|y| y.target_region.to_s }}
767
769
  @snp_hash[snp.to_s] = snp
768
770
  snp
769
771
  end
@@ -0,0 +1,4 @@
1
+ 2AS_5222932 6364 46 6364 6365
2
+ 2BS_5245544 8836 6457 8836 8837
3
+ 2BS_5163353 11974 15341 11974 11975
4
+ 2DS_5334799 7226 27363 7226 7227
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-polyploid-tools
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.8.0
4
+ version: 0.8.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ricardo H. Ramirez-Gonzalez
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-01-18 00:00:00.000000000 Z
11
+ date: 2018-01-19 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bio
@@ -206,6 +206,10 @@ files:
206
206
  - test/data/BS00068396_51_contigs.aln
207
207
  - test/data/BS00068396_51_contigs.dnd
208
208
  - test/data/BS00068396_51_contigs.fa
209
+ - test/data/BS00068396_51_contigs.fa.fai
210
+ - test/data/BS00068396_51_contigs.fa.nhr
211
+ - test/data/BS00068396_51_contigs.fa.nin
212
+ - test/data/BS00068396_51_contigs.fa.nsq
209
213
  - test/data/BS00068396_51_contigs.nhr
210
214
  - test/data/BS00068396_51_contigs.nin
211
215
  - test/data/BS00068396_51_contigs.nsq