bio-polyploid-tools 0.8.0 → 0.8.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: '08be9c740b45561cf8de023e6ca63bb6be4ae63e6f89bd1eb4b149da9cf47334'
4
- data.tar.gz: 94aa0d62f15ad380a35fe2c4bbcd870f2cb984f04c76aa825084b9ab97431d8b
3
+ metadata.gz: 5cc3c126779f27e61f521959b82d13f240cb2ce8d5c5416e511f9150ced798eb
4
+ data.tar.gz: 13adf99f1336327f7b399057d66ee63892ee276ee507a398fb4a6936a2a765a2
5
5
  SHA512:
6
- metadata.gz: 6f15740cb929555b6627eac53dc12b28d75c10709e271a23aef06935c11fb83bf99479afe68d8db5e5bac8d9ecc06c62ac8f17fc4e3066e8ae6de1094b3fb042
7
- data.tar.gz: 7a8cee46ca1ecf4a6ed71b497005f32f851067667c59e36a6b91bea3e8153c9beee4a765866f0849ae0fe83378cc241372fde6368f6fddc11e426a0a12415c36
6
+ metadata.gz: d345c23216e2d6aa3174885a053300ec499230125625d36fe1b5efc8de3d151b5423ad8ed5e0c563b2c9be5b07df29a759a9f95ee1196588eefa0ab2e40ec802
7
+ data.tar.gz: 1ed47854ec04f95c7d8449e1a15885a67f00c076b4bc07705616d6762ad7cf8822d60933cb737c71b95cbfb46293b4cb970e3d2f6413ccde70ca8e0f372e2ab4
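The digests listed in `checksums.yaml` can be checked locally against the `metadata.gz` and `data.tar.gz` members of the `.gem` archive. The following is a minimal sketch using only Ruby's standard `digest` library. The helper name `sha256_matches?` is hypothetical, and the demonstration uses a well-known test vector rather than the gem files themselves.

```ruby
require "digest"

# Hypothetical helper: compare the SHA256 digest of some bytes
# with an expected hexadecimal string.
def sha256_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Well-known SHA256 test vector for the string "abc".
ABC_SHA256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

puts sha256_matches?("abc", ABC_SHA256)
```

For a real check one would pass `File.binread("data.tar.gz")` together with the value shown under `SHA256:` for the new version.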
data/README.md CHANGED
@@ -1,10 +1,12 @@
1
- #bio-polyploid-tools
1
+ # bio-polyploid-tools
2
+
3
+ ## Introduction
2
4
 
3
- ##Introduction
4
5
  These tools are designed to deal with polyploid wheat. The first tool is to design KASP primers, making them as specific as possible.
5
6
 
6
7
 
7
- ##Installation
8
+ ## Installation
9
+
8
10
  ```sh
9
11
  gem install bio-polyploid-tools
10
12
  ```
@@ -13,13 +15,19 @@ You need to have in your ```$PATH``` the following programs:
13
15
  * [MAFFT](http://mafft.cbrc.jp/alignment/software/)
14
16
  * [primer3](http://primer3.sourceforge.net/releases.php)
15
17
  * [exonerate](http://www.ebi.ac.uk/~guy/exonerate/)
18
+ * [blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE%3DBlastDocs&DOC_TYPE%3DDownload)
16
19
 
17
- The code has been developed on ruby 2.1.0, but it should work on 1.9.3 and above.
20
+ The code was originally developed on ruby 2.1.0, but it should work on 1.9.3 and above. However, it is only actively tested in currently supported ruby versions:
21
+
22
+ * 2.1.10
23
+ * 2.2.5
24
+ * 2.3.5
25
+ * 2.4.2
18
26
 
19
- #PolyMarker
20
27
 
21
- To run poolymerker with the CSS wheat contigs, you need to unzip the reference file from [ensembl](http://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz).
28
+ # PolyMarker
22
29
 
30
+ To run PolyMarker with the CSS wheat contigs, you need to unzip the reference file from [ensembl](http://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz).
23
31
 
24
32
 
25
33
  ```sh
@@ -80,7 +88,7 @@ If the flanking sequence is unknow, but the position on a reference is available
80
88
  * **alternative allele** The base in the alternative allele.
81
89
  * **target chromosome** for the specific primers. Must be in line with the chromosome selection criteria.
82
90
 
83
- ####Example
91
+ #### Example
84
92
 
85
93
  ```
86
94
  IWGSC_CSS_1AL_scaff_110,C,519,A,2A
@@ -89,7 +97,8 @@ IWGSC_CSS_1AL_scaff_110,C,519,A,2A
89
97
  This file format can be used with ```snp_positions_to_polymarker.rb``` to produce the input for the option```--marker_list```.
90
98
 
91
99
 
92
- ###Custom reference sequences.
100
+ ### Custom reference sequences.
101
+
93
102
  By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
94
103
  ) are used. However, it is possible to use a custom reference. To define the chromosome to which each contig belongs, the argument ```arm_selection``` is used. The default uses ids like ```IWGSC_CSS_1AL_scaff_110```, where the third field, separated by underscores, is used. A simple way to add custom references is to rename the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, where only the first two characters of each contig name are used as the identifier, useful when pseudomolecules are named after the chromosomes (e.g. ">1A" in the fasta file).
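The two naming conventions described in this paragraph can be sketched in a few lines of Ruby. The method names below are hypothetical and are not the functions shipped in `polymarker.rb`; they only illustrate how a chromosome label is derived from a contig name under each convention.

```ruby
# Default convention: the third underscore-separated field
# carries the chromosome arm, e.g. "1AL".
def arm_from_third_field(contig_name)
  contig_name.split("_")[2]
end

# First-two convention: the first two characters of the name
# carry the chromosome, e.g. "1A".
def arm_from_first_two(contig_name)
  contig_name[0, 2]
end

puts arm_from_third_field("IWGSC_CSS_1AL_scaff_110")
puts arm_from_first_two("1A")
```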
95
104
 
@@ -117,33 +126,38 @@ To use blast instead of exonerate, use the following command:
117
126
  ```
118
127
 
119
128
 
120
- ##Release Notes
129
+ ## Release Notes
130
+
131
+ ### 0.8
132
+
133
+ * FEATURE: ```polymarker.rb``` added the flag ```--aligner blast|exonerate ``` which lets you pick between ```blast``` or ```exonerate``` as the aligner. For blast the default is to have the database with the same name as the ```--contigs``` file. However, it is possible to use a different name via the option ```--database```.
134
+
135
+ ### 0.7.3
121
136
 
122
- ###0.7.3
123
137
  * FEATURE: ```polymarker.rb``` Added to the flag ```--arm_selection``` the option ```scaffold```, which now supports a scaffold specific primer.
124
138
  * FEATURE: ```snp_position_to_polymarker``` Added the option ```--mutant_list``` to prepare files for PolyMarker from files with the following columns ```ID,Allele_1,position,Allele_1,target_chromosome```.
125
139
 
126
- ###0.7.2
140
+ ### 0.7.2
141
+
127
142
  * FEATURE: Added a flag ```min_identity``` to set the minimum identity to consider a hit. The default is 90
128
143
 
129
- ###0.7.1
144
+ ### 0.7.1
130
145
  * BUGFIX: Now the parser for ```arm_selection_embl``` works with the mixture of contigs and pseudomolecules
131
146
  * DOC: Added documentation on how to use custom references.
132
147
 
133
- ###0.7.0
148
+ ### 0.7.0
134
149
  * Added flag ```genomes_count``` for number of genomes, to be used on tetraploids, etc.
135
150
 
136
- ###0.6.1
151
+ ### 0.6.1
137
152
 
138
153
 
139
154
  * polymarker.rb now validates that all the files exist.
140
155
  * BUGFIX: A reference was required even when it was not used to generate contigs.
141
156
 
142
- #Notes
143
-
157
+ # Notes
144
158
 
145
- * If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
146
159
 
160
+ * BUG: If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
147
161
  * BUG: Blocks with NNNs are picked and treated as semi-specific.
148
162
  * BUG: If the name of the reference has a space, the ID is not chopped. ">gene_1 (G12A)" should be treated as ">gene_1".
149
163
  * TODO: Add a parameter file to configure the alignments.
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.8.0
1
+ 0.8.1
@@ -180,4 +180,4 @@ end
180
180
 
181
181
  kasp_container.add_primers_file(primer_3_output)
182
182
  header = "Marker,SNP,RegionSize,SNP_type,#{snp_in},#{original_name},common,primer_type,orientation,#{snp_in}_TM,#{original_name}_TM,common_TM,selected_from,product_size"
183
- File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
183
+ File.open(output_primers, 'w') { |f| f.write("#{header}\n#{kasp_container.print_primers}") }
data/bin/polymarker.rb CHANGED
@@ -12,6 +12,11 @@ require path
12
12
 
13
13
  arm_selection_functions = Hash.new;
14
14
 
15
+ arm_selection_functions[:arm_selection_nrgenes] = lambda do | contig_name |
16
+ #example format: chr2A
17
+ ret = contig_name[3,2]
18
+ return ret
19
+ end
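The new `arm_selection_nrgenes` lambda relies on `String#[start, length]`. A small sketch, written as a plain method for ease of testing, shows what the slice returns for names in the `chr2A` format given in the comment.

```ruby
# Same slice as the lambda in the diff: two characters starting at index 3,
# i.e. right after the three-character "chr" prefix.
def arm_selection_nrgenes(contig_name)
  contig_name[3, 2]
end

puts arm_selection_nrgenes("chr2A")
puts arm_selection_nrgenes("chr2A_part1")
```

Because the slice is positional, a name without a three-character prefix does not yield a chromosome label this way. For example, `"2A"[3, 2]` is `nil`.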
15
20
 
16
21
  arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
17
22
  contig_name.gsub!(/chr/,"")
@@ -417,4 +422,4 @@ rescue StandardError => e
417
422
  rescue Exception => e
418
423
  write_status "ERROR\t#{e.message}"
419
424
  raise e
420
- end
425
+ end
@@ -71,16 +71,26 @@ File.open(test_file) do | f |
71
71
  snp = Bio::PolyploidTools::SNPMutant.parse(line)
72
72
  entry = fasta_reference_db.index.region_for_entry(snp.contig)
73
73
  end
74
-
74
+ #puts line
75
75
  if entry
76
76
  region = entry.get_full_region
77
- if region != lastRegion
78
- lastTemplate = fasta_reference_db.fetch_sequence(region)
79
- end
80
- snp.full_sequence = lastTemplate
77
+ snp_name = snp.snp_id_in_seq
78
+
79
+ #if region != lastRegion
80
+ # lastTemplate = fasta_reference_db.fetch_sequence(region)
81
+ #end
82
+ start, total, new_position = snp.to_polymarker_coordinates(options[:flanking_size])
83
+ region.start = start
84
+ region.end = start + total
85
+ #puts region
86
+ local_template = fasta_reference_db.fetch_sequence(region)
87
+
88
+ snp.position = new_position
89
+
90
+ snp.template_sequence = local_template
81
91
  lastRegion = region
82
92
 
83
- out.puts "#{snp.gene}_#{snp.snp_id_in_seq},#{snp.chromosome},#{snp.sequence_original}"
93
+ out.puts "#{snp.gene}_#{snp_name},#{snp.chromosome},#{snp.to_polymarker_sequence(options[:flanking_size])}"
84
94
  else
85
95
  $stderr.puts "ERROR: Unable to find entry for #{snp.gene}"
86
96
  end
@@ -2,16 +2,16 @@
2
2
  # DO NOT EDIT THIS FILE DIRECTLY
3
3
  # Instead, edit Juwelier::Tasks in Rakefile, and run 'rake gemspec'
4
4
  # -*- encoding: utf-8 -*-
5
- # stub: bio-polyploid-tools 0.8.0 ruby lib
5
+ # stub: bio-polyploid-tools 0.8.1 ruby lib
6
6
 
7
7
  Gem::Specification.new do |s|
8
8
  s.name = "bio-polyploid-tools".freeze
9
- s.version = "0.8.0"
9
+ s.version = "0.8.1"
10
10
 
11
11
  s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
12
12
  s.require_paths = ["lib".freeze]
13
13
  s.authors = ["Ricardo H. Ramirez-Gonzalez".freeze]
14
- s.date = "2018-01-18"
14
+ s.date = "2018-01-19"
15
15
  s.description = "Repository of tools developed at Crop Genetics in JIC to work with polyploid wheat".freeze
16
16
  s.email = "ricardo.ramirez-gonzalez@jic.ac.uk".freeze
17
17
  s.executables = ["bfr.rb".freeze, "blast_triads.rb".freeze, "blast_triads_promoters.rb".freeze, "count_variations.rb".freeze, "filter_blat_by_target_coverage.rb".freeze, "filter_exonerate_by_identity.rb".freeze, "find_best_blat_hit.rb".freeze, "find_best_exonerate.rb".freeze, "find_homoeologue_variations.rb".freeze, "get_longest_hsp_blastx_triads.rb".freeze, "hexaploid_primers.rb".freeze, "homokaryot_primers.rb".freeze, "mafft_triads.rb".freeze, "mafft_triads_promoters.rb".freeze, "map_markers_to_contigs.rb".freeze, "markers_in_region.rb".freeze, "polymarker.rb".freeze, "polymarker_capillary.rb".freeze, "snp_position_to_polymarker.rb".freeze, "snps_between_bams.rb".freeze, "vcfLineToTable.rb".freeze]
@@ -102,6 +102,10 @@ Gem::Specification.new do |s|
102
102
  "test/data/BS00068396_51_contigs.aln",
103
103
  "test/data/BS00068396_51_contigs.dnd",
104
104
  "test/data/BS00068396_51_contigs.fa",
105
+ "test/data/BS00068396_51_contigs.fa.fai",
106
+ "test/data/BS00068396_51_contigs.fa.nhr",
107
+ "test/data/BS00068396_51_contigs.fa.nin",
108
+ "test/data/BS00068396_51_contigs.fa.nsq",
105
109
  "test/data/BS00068396_51_contigs.nhr",
106
110
  "test/data/BS00068396_51_contigs.nin",
107
111
  "test/data/BS00068396_51_contigs.nsq",
@@ -116,13 +116,13 @@ module Bio::PolyploidTools
116
116
  target_region = exon.target_region
117
117
  exon_start_offset = exon.query_region.start - gene_region.start
118
118
  chr_local_pos=local_pos_in_gene + target_region.start + 1
119
- ret_str << ">#{chromosome}_SNP-#{chr_local_pos} #{exon.to_s} #{target_region.orientation}\n"
120
- to_print = "-" * exon_start_offset
121
- chr_seq = chromosome_sequence(exon.target_region).to_s
122
- l_pos = exon_start_offset + local_pos_in_gene
123
- to_print << chr_seq
119
+ ret_str << ">#{chromosome}_SNP-#{chr_local_pos} #{exon.to_s} #{target_region.orientation}\n"
120
+ to_print = "-" * exon_start_offset
121
+ chr_seq = chromosome_sequence(exon.target_region).to_s
122
+ l_pos = exon_start_offset + local_pos_in_gene
123
+ to_print << chr_seq
124
124
  to_print[local_pos_in_gene] = to_print[local_pos_in_gene].upcase
125
- ret_str << to_print
125
+ ret_str << to_print
126
126
  end
127
127
  ret_str
128
128
  end
@@ -16,6 +16,8 @@ module Bio::PolyploidTools
16
16
  attr_accessor :chromosome
17
17
  attr_accessor :variation_free_region
18
18
 
19
+
20
+
19
21
  #Format:
20
22
  #Gene_name,Original,SNP_Pos,pos,chromosome
21
23
  #A_comp0_c0_seq1,C,519,A,2A
@@ -30,7 +32,7 @@ module Bio::PolyploidTools
30
32
  snp.snp.upcase!
31
33
  snp.snp.strip!
32
34
  snp.chromosome.strip!
33
- snp.exon_list = Hash.new()
35
+
34
36
  snp.use_reference = false
35
37
  snp
36
38
  end
@@ -60,6 +62,16 @@ module Bio::PolyploidTools
60
62
  @primer_3_min_seq_length = 50
61
63
  @variation_free_region = 0
62
64
  @contig = false
65
+ @exon_list = Hash.new {|hsh, key| hsh[key] = [] }
66
+ end
67
+
68
+ def to_polymarker_coordinates(flanking_size, total:nil)
69
+ start = position - flanking_size + 1
70
+ start = 0 if start < 0
71
+ total = flanking_size * 2 unless total
72
+ total += 1
73
+ new_position = position - start + 2
74
+ [start , total, new_position ]
63
75
  end
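The arithmetic in `to_polymarker_coordinates` is easier to follow with concrete numbers. The sketch below copies the method body from the diff into a standalone function. Passing `position` as an explicit argument is an assumption made for illustration, since in the gem it is read from the SNP object.

```ruby
# Standalone copy of the arithmetic added in the diff.
def to_polymarker_coordinates(position, flanking_size, total: nil)
  start = position - flanking_size + 1
  start = 0 if start < 0                 # clamp at the start of the sequence
  total = flanking_size * 2 unless total
  total += 1                             # include the SNP base itself
  new_position = position - start + 2    # SNP position relative to the new window
  [start, total, new_position]
end

# SNP far from the start of the contig: the window is not clamped.
p to_polymarker_coordinates(519, 100)

# SNP near the start of the contig: start is clamped to 0.
p to_polymarker_coordinates(50, 100)
```

For position 519 with a flanking size of 100 the function returns `[420, 201, 101]`. For position 50 the start is clamped and the result is `[0, 201, 52]`.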
64
76
 
65
77
  def to_polymarker_sequence(flanking_size, total:nil)
@@ -103,8 +115,7 @@ module Bio::PolyploidTools
103
115
  end
104
116
 
105
117
  def add_exon(exon, arm)
106
- @exon_list[arm] = exon unless @exon_list[arm]
107
- @exon_list[arm] = exon if exon.record.score > @exon_list[arm].record.score
118
+ exon_list[arm] << exon
108
119
  end
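With this change `add_exon` keeps every hit per arm instead of only the best-scoring one, which works because `@exon_list` is now created with a default block. The sketch below illustrates the pattern. The helper `build_exon_list` and the string placeholders standing in for exon objects are hypothetical.

```ruby
# Build a hash whose missing keys are initialised to a fresh empty array,
# then append every hit to the list of its arm.
def build_exon_list(hits)
  exon_list = Hash.new { |hsh, key| hsh[key] = [] }
  hits.each { |arm, exon| exon_list[arm] << exon }
  exon_list
end

hits = [["2A", "exon_a1"], ["2B", "exon_b1"], ["2A", "exon_a2"]]
exon_list = build_exon_list(hits)
p exon_list

# The values are now arrays, so consumers need a nested collect.
p exon_list.values.collect { |arr| arr.collect { |e| e.upcase } }
```

A hash created with a plain `Hash.new()` returns `nil` for a missing key, so `exon_list[arm] << exon` would raise a `NoMethodError` on the first hit for each arm.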
109
120
 
110
121
  def covered_region
@@ -115,28 +126,28 @@ module Bio::PolyploidTools
115
126
  reg.orientation = :forward
116
127
  reg.start = self.position - self.flanking_size
117
128
  reg.end = self.position + self.flanking_size
118
-
119
129
  reg.start = 1 if reg.start < 1
120
-
121
130
  return reg
122
131
  end
123
132
 
124
133
  min = @position
125
134
  max = @position
126
- # puts "Calculating covered region for #{self.inspect}"
127
- # puts "#{@exon_list.inspect}"
128
- #raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
135
+ # puts "Calculating covered region for #{self.inspect}"
136
+ # puts "#{@exon_list.inspect}"
137
+ # raise SNPException.new "Exons haven't been loaded for #{self.to_s}" if @exon_list.size == 0
129
138
  if @exon_list.size == 0
130
139
  min = self.position - self.flanking_size
131
140
  min = 1 if min < 1
132
141
  max = self.position + self.flanking_size
133
142
  end
134
- @exon_list.each do | chromosome, exon |
135
- # puts exon.inspect
136
- reg = exon.query_region
137
- min = reg.start if reg.start < min
138
- max = reg.end if reg.end > max
143
+ @exon_list.each do | chromosome, exon_arr |
144
+ exon_arr.each do | exon |
145
+ reg = exon.query_region
146
+ min = reg.start if reg.start < min
147
+ max = reg.end if reg.end > max
148
+ end
139
149
  end
150
+
140
151
  reg = Bio::DB::Fasta::Region.new()
141
152
  reg.entry = gene
142
153
  reg.orientation = :forward
@@ -168,24 +179,6 @@ module Bio::PolyploidTools
168
179
  pos + left_padding
169
180
  end
170
181
 
171
- def exon_fasta_string
172
- gene_region = self.covered_region
173
- local_pos_in_gene = self.local_position
174
- ret_str = ""
175
- container.parents.each do |name, bam|
176
- ret_str << ">#{gene_region.entry}-#{self.position}_#{name} Overlapping_exons:#{gene_region.to_s} localSNPpo:#{local_pos_in_gene+1}\n"
177
- to_print = parental_sequences[name]
178
- ret_str << to_print << "\n"
179
- end
180
- self.exon_sequences.each do | chromosome, exon_seq |
181
- ret_str << ">#{chromosome}\n#{exon_seq}\n"
182
- end
183
- mask = masked_chromosomal_snps("1BS", flanking_size)
184
- ret_str << ">Mask\n#{mask}\n"
185
- ret_str
186
- end
187
-
188
-
189
182
  def primer_fasta_string
190
183
  gene_region = self.covered_region
191
184
  local_pos_in_gene = self.local_position
@@ -209,12 +202,15 @@ module Bio::PolyploidTools
209
202
  end
210
203
 
211
204
  def primer_region(target_chromosome, parental )
205
+
212
206
  parental = aligned_sequences[parental].downcase
207
+ names = aligned_sequences.keys
208
+ target_chromosome = get_target_sequence(names, target_chromosome)
209
+
213
210
  chromosome_seq = aligned_sequences[target_chromosome]
214
211
  chromosome_seq = "-" * parental.size unless chromosome_seq
215
212
  chromosome_seq = chromosome_seq.downcase
216
213
  mask = mask_aligned_chromosomal_snp(target_chromosome)
217
- #puts "'#{mask}'"
218
214
 
219
215
  pr = PrimerRegion.new
220
216
  position_in_region = 0
@@ -291,8 +287,9 @@ module Bio::PolyploidTools
291
287
 
292
288
  end
293
289
 
294
-
295
- str = "SEQUENCE_ID=#{opts[:name]} #{orientation}\n"
290
+ #puts "__"
291
+ #puts self.inspect
292
+ str = "SEQUENCE_ID=#{opts[:name]} #{orientation} \n"
296
293
  str << "SEQUENCE_FORCE_LEFT_END=#{left}\n" unless opts[:extra_f]
297
294
  str << "SEQUENCE_FORCE_RIGHT_END=#{right}\n" if opts[:right_pos]
298
295
  str << extra if extra
@@ -326,10 +323,10 @@ module Bio::PolyploidTools
326
323
  primer_3_propertes = Array.new
327
324
 
328
325
  seq_original = String.new(pr.sequence)
329
- puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
326
+ #puts seq_original.size.to_s << "-" << primer_3_min_seq_length.to_s
330
327
  return primer_3_propertes if seq_original.size < primer_3_min_seq_length
331
328
  #puts self.inspect
332
- puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
329
+ #puts pr.snp_pos.to_s << "(" << seq_original.length.to_s << ")"
333
330
 
334
331
  seq_original[pr.snp_pos] = self.original
335
332
  seq_original_reverse = reverse_complement_string(seq_original)
@@ -432,12 +429,13 @@ module Bio::PolyploidTools
432
429
 
433
430
  seq[local_pos_in_gene] = self.snp if name == self.snp_in
434
431
  @parental_sequences [name] = seq
435
- puts name
436
- puts seq
437
432
  end
438
433
  @parental_sequences
439
434
  end
440
435
 
436
+
437
+
438
+
441
439
  def surrounding_parental_sequences
442
440
  return @surrounding_parental_sequences if @surrounding_parental_sequences
443
441
  gene_region = self.covered_region
@@ -450,11 +448,15 @@ module Bio::PolyploidTools
450
448
  seq = bam.consensus_with_ambiguities({:region=>gene_region}).to_s
451
449
  else
452
450
  seq = container.gene_model_sequence(gene_region)
453
-
454
- unless name == self.snp_in
455
- #puts "Modifing original: #{name} #{seq}"
456
- seq[local_pos_in_gene] = self.original
457
- end
451
+ #puts "#{name} #{self.snp_in}"
452
+ #puts "Modifing original: #{name}\n#{seq}"
453
+ unless name == self.snp_in
454
+
455
+ seq[local_pos_in_gene] = self.original
456
+ else
457
+ seq[local_pos_in_gene] = self.snp
458
+ end
459
+ #puts "#{seq}"
458
460
  end
459
461
  seq[local_pos_in_gene] = seq[local_pos_in_gene].upcase
460
462
  seq[local_pos_in_gene] = self.snp if name == self.snp_in
@@ -522,71 +524,101 @@ module Bio::PolyploidTools
522
524
  ret_str
523
525
  end
524
526
 
527
+
528
+ def get_snp_position_after_trim
529
+ local_pos_in_gene = self.local_position
530
+ ideal_min = self.local_position - flanking_size
531
+ ideal_max = self.local_position + flanking_size
532
+ left_pad = 0
533
+ if ideal_min < 0
534
+ left_pad = ideal_min * -1
535
+ ideal_min = 0
536
+ end
537
+ local_pos_in_gene - ideal_min
538
+ end
539
+
525
540
  def aligned_snp_position
526
541
  return @aligned_snp_position if @aligned_snp_position
542
+ #puts self.inspect
527
543
  pos = -1
528
544
  parental_strings = Array.new
529
545
  parental_sequences.keys.each do | par |
530
-
531
546
  parental_strings << aligned_sequences[par]
532
547
  end
533
- template_sequence = nil
534
- aligned_sequences.keys.each do |temp |
535
- template_sequence = aligned_sequences[ temp ] if aligned_sequences[ temp ][0] != "-"
536
- end
537
548
  $stderr.puts "WARN: #{self.to_s} #{parental_sequences.keys} is not of size 2 (#{parental_strings.size})" if parental_strings.size != 2
538
549
 
550
+ local_pos_in_parental = get_snp_position_after_trim
539
551
  i = 0
540
- differences = 0
541
- local_pos_in_gene = flanking_size
542
- local_pos = 0
543
- started = false
544
- #TODO: Validate the cases when the alignment has padding on the left on all the chromosomes
545
- #unless parental_strings[0]
546
- #puts "parental hash: #{parental_sequences}"
547
- #puts "Aligned sequences: #{aligned_sequences.to_fasta}"
548
- # puts "parental_strings: #{parental_strings.to_s}"
549
- #end
550
552
  while i < parental_strings[0].size do
551
- if local_pos_in_gene == local_pos
553
+ if local_pos_in_parental == 0 and parental_strings[0][i] != "-"
552
554
  pos = i
553
555
  if parental_strings[0][i] == parental_strings[1][i]
554
556
  $stderr.puts "WARN: #{self.to_s} doesn't have a SNP in the marked place (#{i})! \n#{parental_strings[0]}\n#{parental_strings[1]}"
555
557
  end
556
-
557
- end
558
-
559
- started = true if template_sequence[i] != "-"
560
- if started == false or template_sequence[i] != "-"
561
- local_pos += 1
562
558
  end
559
+
560
+ local_pos_in_parental -= 1 if parental_strings[0][i] != "-"
563
561
  i += 1
564
562
  end
565
563
  @aligned_snp_position = pos
566
564
  return pos
567
565
  end
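The rewritten `aligned_snp_position` walks the first parental alignment and counts only non-gap characters until the trimmed SNP offset is used up. A minimal sketch of that idea follows. The function name `aligned_column` and the toy alignment are hypothetical, and the sketch stops at the first match rather than scanning the whole string.

```ruby
# Map a 0-based position in the ungapped sequence to the column
# it occupies in the gapped (aligned) sequence.
def aligned_column(aligned, ungapped_pos)
  remaining = ungapped_pos
  aligned.each_char.with_index do |base, i|
    next if base == "-"          # gaps do not consume sequence positions
    return i if remaining == 0
    remaining -= 1
  end
  -1                             # position not found, as with pos = -1 in the diff
end

puts aligned_column("--AC-GT", 2)
```

In the toy alignment, the third non-gap base sits in column 5, because two leading gaps and one internal gap are skipped.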
568
566
 
567
+ def get_target_sequence(names, chromosome)
568
+
569
+ best = chromosome
570
+ best_score = 0
571
+ names.each do |e|
572
+ arr = e.split("_")
573
+ if arr.length == 3
574
+ score = arr[2].to_f
575
+ if score >best_score
576
+ best_score = score
577
+ best = e
578
+ end
579
+ end
580
+ end
581
+ best
582
+ end
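`get_target_sequence` expects alignment names made of three underscore-separated fields, with the score in the last one, which matches the keys of the form `chromosome_start_score` built elsewhere in this diff. The sketch below reproduces the method as shown and runs it on made-up names; the scores and coordinates are illustrative only.

```ruby
# Reproduced from the diff: return the name with the highest score in its
# third field, or fall back to the chromosome passed in.
def get_target_sequence(names, chromosome)
  best = chromosome
  best_score = 0
  names.each do |e|
    arr = e.split("_")
    if arr.length == 3
      score = arr[2].to_f
      if score > best_score
        best_score = score
        best = e
      end
    end
  end
  best
end

names = ["2A_6364_46.0", "2B_8836_6457.0", "2B_11974_15341.0", "A"]
puts get_target_sequence(names, "2B")
puts get_target_sequence(["A", "B"], "2B")
```

As written, the `chromosome` argument only acts as the fallback value. The comparison runs over every three-field name, whichever chromosome it starts with.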
583
+
584
+
585
+
569
586
  def mask_aligned_chromosomal_snp(chromosome)
570
- names = exon_sequences.keys
587
+ names = aligned_sequences.keys
571
588
  parentals = parental_sequences.keys
572
589
 
590
+ position_after_trim = get_snp_position_after_trim
591
+
592
+ names = names - parentals
573
593
  local_pos_in_gene = aligned_snp_position
574
- masked_snps = aligned_sequences[chromosome].downcase if aligned_sequences[chromosome]
575
- masked_snps = "-" * aligned_sequences.values[0].size unless aligned_sequences[chromosome]
594
+
595
+ best_target = get_target_sequence(names, chromosome)
596
+ masked_snps = aligned_sequences[best_target].downcase if aligned_sequences[best_target]
597
+ masked_snps = "-" * aligned_sequences.values[0].size unless aligned_sequences[best_target]
576
598
  #TODO: Make this chromosome specific, even when we have more than one alignment going to the region we want.
599
+ #puts "mask_aligned_chromosomal_snp(#{chromosome})"
600
+ #puts names
577
601
  i = 0
578
- while i < masked_snps.size
602
+ for i in 0..masked_snps.size-1
603
+ #puts i
579
604
  different = 0
580
605
  cov = 0
581
606
  from_group = 0
582
607
  nCount = 0
608
+ seen = []
583
609
  names.each do | chr |
584
610
  if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
611
+ #puts aligned_sequences[chr][i]
585
612
  cov += 1
586
613
  nCount += 1 if aligned_sequences[chr][i] == 'N' or aligned_sequences[chr][i] == 'n' # maybe fix this to use ambiguity codes instead.
587
- from_group += 1 if chr[0] == chromosome_group
614
+
615
+ if chr[0] == chromosome_group and not seen.include? chr[1]
616
+ seen << chr[1]
617
+ from_group += 1
618
+
619
+ end
588
620
  #puts "Comparing #{chromosome_group} and #{chr[0]} as chromosomes"
589
- if chr != chromosome
621
+ if chr != best_target
590
622
  $stderr.puts "WARN: No base for #{masked_snps} : ##{i}" unless masked_snps[i].upcase
591
623
  $stderr.puts "WARN: No base for #{aligned_sequences[chr]} : ##{i}" unless masked_snps[i].upcase
592
624
  different += 1 if masked_snps[i].upcase != aligned_sequences[chr][i].upcase
@@ -598,12 +630,15 @@ module Bio::PolyploidTools
598
630
  masked_snps[i] = "-" if nCount > 0
599
631
  masked_snps[i] = "*" if cov == 0
600
632
  expected_snps = names.size - 1
601
- # puts "Diferences: #{different} to expected: #{ expected_snps } [#{i}] Genome count (#{from_group} == #{genomes_count})"
633
+
634
+ #puts "Diferences: #{different} to expected: #{ expected_snps } [#{i}] Genome count (#{from_group} == #{genomes_count})"
602
635
 
603
636
  masked_snps[i] = masked_snps[i].upcase if different == expected_snps and from_group == genomes_count
637
+ #puts "#{i}:#{masked_snps[i]}"
604
638
 
605
639
  if i == local_pos_in_gene
606
640
  masked_snps[i] = "&"
641
+ #puts "#{i}:#{masked_snps[i]}___"
607
642
  bases = ""
608
643
  names.each do | chr |
609
644
  bases << aligned_sequences[chr][i] if aligned_sequences[chr] and aligned_sequences[chr][i] != "-"
@@ -617,62 +652,22 @@ module Bio::PolyploidTools
617
652
  end
618
653
 
619
654
  end
620
- i += 1
621
- end
622
- masked_snps
623
- end
624
-
625
- def masked_chromosomal_snps(chromosome)
626
- chromosomes = exon_sequences
627
- names = chromosomes.keys
628
- masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
629
- masked_snps = "-" * covered_region.size unless chromosomes[chromosome]
630
- local_pos_in_gene = self.local_position
631
- ideal_min = local_pos_in_gene - flanking_size
632
- ideal_max = local_pos_in_gene + flanking_size
633
- i = 0
634
- while i < masked_snps.size do
635
- if i > ideal_min and i <= ideal_max
636
-
637
- different = 0
638
- cov = 0
639
- names.each do | chr |
640
- if chromosomes[chr][i] != "-"
641
- cov += 1
642
- if chr != chromosome and masked_snps[i] != "+"
643
- different += 1 if masked_snps[i] != chromosomes[chr][i]
644
- end
645
- end
646
-
647
- end
648
- masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
649
- masked_snps[i] = "-" if cov < 2
650
- masked_snps[i] = masked_snps[i].upcase if different > 1
651
-
652
- else
653
- masked_snps[i] = "*"
654
- end
655
- if i == local_pos_in_gene
656
- masked_snps[i] = "&"
657
- end
658
- i += 1
655
+ #i += 1
659
656
  end
660
657
  masked_snps
661
658
  end
662
659
 
660
+
663
661
  def surrounding_masked_chromosomal_snps(chromosome)
664
662
 
665
663
  chromosomes = surrounding_exon_sequences
666
664
  names = chromosomes.keys
665
+ get_target_sequence(names)
667
666
  masked_snps = chromosomes[chromosome].tr("-","+") if chromosomes[chromosome]
668
667
  masked_snps = "-" * (flanking_size * 2 ) unless chromosomes[chromosome]
669
668
  local_pos_in_gene = flanking_size
670
- # ideal_min = local_pos_in_gene - flanking_size
671
- #ideal_max = local_pos_in_gene + flanking_size
672
669
  i = 0
673
670
  while i < masked_snps.size do
674
-
675
-
676
671
  different = 0
677
672
  cov = 0
678
673
  names.each do | chr |
@@ -682,13 +677,11 @@ module Bio::PolyploidTools
682
677
  different += 1 if masked_snps[i] != chromosomes[chr][i]
683
678
  end
684
679
  end
685
-
686
680
  end
687
681
  masked_snps[i] = "-" if different == 0 and masked_snps[i] != "+"
688
682
  masked_snps[i] = "-" if cov < 2
689
683
  masked_snps[i] = masked_snps[i].upcase if different > 1
690
684
 
691
-
692
685
  if i == local_pos_in_gene
693
686
  masked_snps[i] = "&"
694
687
  end
@@ -699,18 +692,19 @@ module Bio::PolyploidTools
699
692
 
700
693
  def surrounding_exon_sequences
701
694
  return @surrounding_exon_sequences if @surrounding_exon_sequences
695
+ gene_region = self.covered_region
702
696
  @surrounding_exon_sequences = Bio::Alignment::SequenceHash.new
703
- self.exon_list.each do |chromosome, exon|
704
- #puts "surrounding_exon_sequences #{flanking_size}"
705
- #puts chromosome
706
- #puts exon
707
- flanquing_region = exon.target_flanking_region_from_position(position,flanking_size)
708
- #TODO: Padd when the exon goes over the regions...
709
-
710
- #Ignoring when the exon is in a gap
711
- unless exon.snp_in_gap
712
- exon_seq = container.chromosome_sequence(flanquing_region)
713
- @surrounding_exon_sequences[chromosome] = exon_seq
697
+ self.exon_list.each do |chromosome, exon_arr|
698
+ exon_arr.each do |exon|
699
+ exon_start_offset = exon.query_region.start - gene_region.start
700
+ flanquing_region = exon.target_flanking_region_from_position(position,flanking_size)
701
+ #TODO: Padd when the exon goes over the regions...
702
+ #puts flanquing_region.inspect
703
+ #Ignoring when the exon is in a gap
704
+ unless exon.snp_in_gap
705
+ exon_seq = container.chromosome_sequence(flanquing_region)
706
+ @surrounding_exon_sequences["#{chromosome}_#{flanquing_region.start}_#{exon.record.score}"] = exon_seq
707
+ end
714
708
  end
715
709
  end
716
710
  @surrounding_exon_sequences
@@ -722,18 +716,21 @@ module Bio::PolyploidTools
722
716
  gene_region = self.covered_region
723
717
  local_pos_in_gene = self.local_position
724
718
  @exon_sequences = Bio::Alignment::SequenceHash.new
725
- self.exon_list.each do |chromosome, exon|
726
- exon_start_offset = exon.query_region.start - gene_region.start
727
- exon_seq = "-" * exon_start_offset
728
- exon_seq << container.chromosome_sequence(exon.target_region).to_s
729
- #puts exon_seq
730
- # l_pos = exon_start_offset + local_pos_in_gene
731
- unless exon.snp_in_gap
732
- #puts "local position: #{local_pos_in_gene}"
733
- #puts "Exon_seq: #{exon_seq}"
734
- exon_seq[local_pos_in_gene] = exon_seq[local_pos_in_gene].upcase
735
- exon_seq << "-" * (gene_region.size - exon_seq.size + 1)
736
- @exon_sequences[chromosome] = exon_seq
719
+ self.exon_list.each do |chromosome, exon_arr|
720
+ exon_arr.each do |exon|
721
+ exon_start_offset = exon.query_region.start - gene_region.start
722
+ exon_seq = "-" * exon_start_offset
723
+ exon_seq << container.chromosome_sequence(exon.target_region).to_s
724
+ #puts exon_seq
725
+ #l_pos = exon_start_offset + local_pos_in_gene
726
+ unless exon.snp_in_gap
727
+ #puts "local position: #{local_pos_in_gene}"
728
+ #puts "Exon_seq: #{exon_seq}"
729
+ exon_seq[local_pos_in_gene] = exon_seq[local_pos_in_gene].upcase
730
+ exon_seq << "-" * (gene_region.size - exon_seq.size + 1)
731
+ #puts exon.inspect
732
+ @exon_sequences["#{chromosome}_#{exon.query_region.start}_#{exon.record.score}"] = exon_seq
733
+ end
737
734
  end
738
735
  end
739
736
  @exon_sequences[@chromosome] = "-" * gene_region.size unless @exon_sequences[@chromosome]
@@ -38,7 +38,6 @@ module Bio::PolyploidTools
38
38
  $stderr.puts e
39
39
  end
40
40
 
41
- snp.exon_list = Hash.new()
42
41
  snp.flanking_size=100
43
42
  snp.region_size = region_size.to_i if region_size
44
43
  snp.flanking_size = parsed_flanking.to_i if parsed_flanking
@@ -29,7 +29,7 @@ module Bio::PolyploidTools
29
29
  #snp.snp.upcase!
30
30
  snp.chromosome. strip!
31
31
  snp.parse_sequence_snp
32
- snp.exon_list = Hash.new()
32
+
33
33
  snp
34
34
  end
35
35
 
@@ -113,6 +113,8 @@ module Bio::DB::Primer3
113
113
  right_start = 0
114
114
  right_end = 0
115
115
  total_columns_before_messages=17
116
+ #puts "Values in primer3"
117
+ #puts snp_from.inspect
116
118
  @values = Array.new
117
119
  #@values << "#{gene},,#{template_length},"
118
120
  @values << gene
@@ -763,7 +765,7 @@ module Bio::DB::Primer3
763
765
  snp.line_1 = @line_1
764
766
  snp.line_2 = @line_2
765
767
  snp.snp_from = snp_in
766
- snp.regions = snp_in.exon_list.values.collect { |x| x.target_region.to_s }
768
+ snp.regions = snp_in.exon_list.values.collect { |x| x.collect {|y| y.target_region.to_s }}
767
769
  @snp_hash[snp.to_s] = snp
768
770
  snp
769
771
  end
@@ -0,0 +1,4 @@
1
+ 2AS_5222932 6364 46 6364 6365
2
+ 2BS_5245544 8836 6457 8836 8837
3
+ 2BS_5163353 11974 15341 11974 11975
4
+ 2DS_5334799 7226 27363 7226 7227
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-polyploid-tools
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.8.0
4
+ version: 0.8.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ricardo H. Ramirez-Gonzalez
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2018-01-18 00:00:00.000000000 Z
11
+ date: 2018-01-19 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bio
@@ -206,6 +206,10 @@ files:
206
206
  - test/data/BS00068396_51_contigs.aln
207
207
  - test/data/BS00068396_51_contigs.dnd
208
208
  - test/data/BS00068396_51_contigs.fa
209
+ - test/data/BS00068396_51_contigs.fa.fai
210
+ - test/data/BS00068396_51_contigs.fa.nhr
211
+ - test/data/BS00068396_51_contigs.fa.nin
212
+ - test/data/BS00068396_51_contigs.fa.nsq
209
213
  - test/data/BS00068396_51_contigs.nhr
210
214
  - test/data/BS00068396_51_contigs.nin
211
215
  - test/data/BS00068396_51_contigs.nsq