bio-polyploid-tools 0.8.1 → 0.8.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA256:
- metadata.gz: 5cc3c126779f27e61f521959b82d13f240cb2ce8d5c5416e511f9150ced798eb
- data.tar.gz: 13adf99f1336327f7b399057d66ee63892ee276ee507a398fb4a6936a2a765a2
+ SHA1:
+ metadata.gz: 28167dfdf75d85d33f970351d2f9a1b166d179c7
+ data.tar.gz: f1635243148bb245ff2af217eb333ace9087a011
  SHA512:
- metadata.gz: d345c23216e2d6aa3174885a053300ec499230125625d36fe1b5efc8de3d151b5423ad8ed5e0c563b2c9be5b07df29a759a9f95ee1196588eefa0ab2e40ec802
- data.tar.gz: 1ed47854ec04f95c7d8449e1a15885a67f00c076b4bc07705616d6762ad7cf8822d60933cb737c71b95cbfb46293b4cb970e3d2f6413ccde70ca8e0f372e2ab4
+ metadata.gz: 1cfb4d6a3f49874f430da5bc1342b2bef31ca57cc73e4108272b8e4606426acfc303d21d9035dfcc96569988fcd6f74e35c8e60e5bcc35348ab8d6014a044e7b
+ data.tar.gz: 632d1e2488f566adb856745c9b85999b5fc8cb3e4f7326999f94fd70db7c30555e13963282785b2465ac8e6cab5d2644a06eb54a5fa1367294c99d49c30ffd60
@@ -12,6 +12,7 @@ rvm:
  - 2.2.5
  - 2.3.5
  - 2.4.2
+ - 2.5.0

  before_install:
  - export RUBYOPT="-W1"
data/README.md CHANGED
@@ -17,13 +17,13 @@ You need to have in your ```$PATH``` the following programs:
  * [exonerate](http://www.ebi.ac.uk/~guy/exonerate/)
  * [blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE%3DBlastDocs&DOC_TYPE%3DDownload)

- The code was originally developed on ruby 2.1.0, but it should work on 1.9.3 and above. However, it is only actively tested in currently supported ruby versions:
+ The code was originally developed on ruby 2.1, 2.3 and 2.5. It may work on older versions. However, it is only actively tested in currently supported ruby versions:

  * 2.1.10
  * 2.2.5
  * 2.3.5
  * 2.4.2
-
+ * 2.5.0

  # PolyMarker

@@ -102,10 +102,10 @@ This file format can be used with ```snp_positions_to_polymarker.rb``` to produc
  By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
  ) are used. However, it is possible to use a custom reference. To define the chromosome where each contig belongs the argument ```arm_selection``` is used. The default uses ids like: ```IWGSC_CSS_1AL_scaff_110```, where the third field, separated by underscores, is used. A simple way to add custom references is to rename the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, where only the first two characters of each contig name are used as identifier, useful when pseudomolecules are named after the chromosomes (ie: ">1A" in the fasta file).

- If your contigs follow a different convention, in the file ```polymarker.rb``` it is possible to define new parsers, by adding at the beginning, with the rest of the parsers a new lambda like:
+ If your contigs follow a different convention, in the file ```ChromosomeArm.rb``` it is possible to define new parsers, by adding at the beginning, with the rest of the parsers a new lambda like:

  ```rb
- arm_selection_functions[:arm_selection_embl] = lambda do | contig_name|
+ @@arm_selection_functions[:embl] = lambda do | contig_name|
  arr = contig_name.split('_')
  ret = "U"
  ret = arr[2][0,2] if arr.size >= 3
@@ -128,6 +128,16 @@ To use blast instead of exonerate, use the following command:

  ## Release Notes

+ ### 0.8.2
+
+ * FEATURE: The functions to select the chromosome arm are now in ```lib/bio/PolyploidTools/ChromosomeArm.rb``` and the help message is updated automatically with the valid options.
+ * FEATURE: Added option ```filter_best``` to replicate the original behaviour of selecting the best hit of each chromosome. This remains useful for assemblies that still contain synthetic duplications.
+
+ ### 0.8.1
+
+ * BUGFIX: There was an error which prevented the correct localisation of the SNP in markers with gaps in the local alignment before the position with the SNP.
+ * FEATURE: PolyMarker now selects the best hit of the target chromosome. This improves the specificity in regions with a recent duplication. The drawback is that if your assembly has artificial repetitions, the primers won't be marked as 'chromosome specific', but as 'chromosome semi-specific'. In a future version this will be addressed.
+
  ### 0.8

  * FEATURE: ```polymarker.rb``` added the flag ```--aligner blast|exonerate ``` which lets you pick between ```blast``` or ```exonerate``` as the aligner. For blast the default is to have the database with the same name as the ```--contigs``` file. However, it is possible to use a different name via the option ```--database```.
data/VERSION CHANGED
@@ -1 +1 @@
- 0.8.1
+ 0.8.2
@@ -10,43 +10,7 @@ $: << File.expand_path('.')
  path= File.expand_path(File.dirname(__FILE__) + '/../lib/bioruby-polyploid-tools.rb')
  require path

- arm_selection_functions = Hash.new;

- arm_selection_functions[:arm_selection_nrgenes] = lambda do | contig_name |
- #example format: chr2A
- ret = contig_name[3,2]
- return ret
- end
-
- arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
- contig_name.gsub!(/chr/,"")
- ret = contig_name[0,2]
- return ret
- end
-
- #Function to parse stuff like: "IWGSC_CSS_1AL_scaff_110"
- #Or the first two characters in the contig name, to deal with
- #pseudomolecules that start with headers like: "1A"
- #And with the cases when 3B is named with the prefix: v443
- arm_selection_functions[:arm_selection_embl] = lambda do | contig_name|
-
- arr = contig_name.split('_')
- ret = "U"
- ret = arr[2][0,2] if arr.size >= 3
- ret = "3B" if arr.size == 2 and arr[0] == "v443"
- ret = arr[0][0,2] if arr.size == 1
- return ret
- end
-
- arm_selection_functions[:arm_selection_morex] = lambda do | contig_name |
- ret = contig_name.split(':')[0].split("_")[1];
- return ret
- end
-
- arm_selection_functions[:scaffold] = lambda do | contig_name |
- ret = contig_name;
- return ret
- end

  def validate_files(o)
  [
@@ -66,7 +30,7 @@ options[:chunks] = 1
  options[:bucket_size] = 0
  options[:bucket] = 1
  options[:model] = "est2genome"
- options[:arm_selection] = arm_selection_functions[:arm_selection_embl] ;
+ options[:arm_selection] = Bio::PolyploidTools::ChromosomeArm.getArmSelection("nrgene");
  options[:flanking_size] = 150;
  options[:variation_free_region] = 0
  options[:extract_found_contigs] = false
@@ -74,6 +38,7 @@ options[:genomes_count] = 3
  options[:min_identity] = 90
  options[:scoring] = :genome_specific
  options[:database] = false
+ options[:filter_best] = false
  options[:aligner] = :exonerate


@@ -87,6 +52,8 @@ options[:primer_3_preferences] = {
  :primer_thermodynamic_parameters_path=>File.expand_path(File.dirname(__FILE__) + '../../conf/primer3_config/') + '/'
  }

+
+
  OptionParser.new do |opts|
  opts.banner = "Usage: polymarker.rb [options]"

@@ -102,6 +69,11 @@ OptionParser.new do |opts|
  options[:genomes_count] = o.to_i
  end

+ opts.on("-b", "--filter_best", "If set, only keep the best alignment for each chromosome") do
+ options[:filter_best] = true
+ end
+
+
  opts.on("-s", "--snp_list FILE", "File with the list of snps to search from, requires --reference to get the sequence using a position") do |o|
  options[:snp_list] = o
  end
@@ -127,7 +99,7 @@ OptionParser.new do |opts|
  options[:model] = o
  end

- opts.on("-a", "--arm_selection arm_selection_embl|arm_selection_morex|arm_selection_first_two|scaffold", "Function to decide the chromome arm") do |o|
+ opts.on("-a", "--arm_selection #{Bio::PolyploidTools::ChromosomeArm.getValidFunctions.join('|')}", "Function to decide the chromome arm") do |o|
  tmp_str = o
  arr = o.split(",")
  if arr.size == 2
@@ -138,7 +110,7 @@ OptionParser.new do |opts|
  return ret
  end
  else
- options[:arm_selection] = arm_selection_functions[o.to_sym];
+ options[:arm_selection] = Bio::PolyploidTools::ChromosomeArm.getArmSelection(o)
  end

  end
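
The option-parsing hunks above can be tried in isolation with a minimal sketch like the one below. It assumes only Ruby's standard `optparse` library. `VALID_ARM_SELECTIONS` and `parse_polymarker_options` are hypothetical names introduced for illustration: the constant stands in for `Bio::PolyploidTools::ChromosomeArm.getValidFunctions`, and the sketch stores the selected name as a string, whereas the script itself stores the lambda returned by `getArmSelection`.

```ruby
require 'optparse'

# Hypothetical stand-in for Bio::PolyploidTools::ChromosomeArm.getValidFunctions,
# so the sketch runs without the gem installed.
VALID_ARM_SELECTIONS = %w[nrgene first_two embl morex scaffold].freeze

# Hypothetical helper: parses an argument list and returns the options hash.
def parse_polymarker_options(argv)
  # Defaults mirror the diff: filter_best is off, arm selection is "nrgene".
  options = { filter_best: false, arm_selection: "nrgene" }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: polymarker.rb [options]"

    # Boolean flag: present means true, absent keeps the default.
    opts.on("-b", "--filter_best", "If set, only keep the best alignment for each chromosome") do
      options[:filter_best] = true
    end

    # The placeholder after the option name is built from the list of valid
    # names, so the help text follows the list automatically.
    opts.on("-a", "--arm_selection #{VALID_ARM_SELECTIONS.join('|')}", "Function to decide the chromosome arm") do |o|
      options[:arm_selection] = o
    end
  end

  parser.parse(argv)
  options
end

p parse_polymarker_options([])
p parse_polymarker_options(%w[--filter_best --arm_selection embl])
```

Because the placeholder is interpolated when the parser is built, adding a key to the list is enough to update the help message, which appears to be the motivation for replacing the hard-coded option string.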
@@ -370,7 +342,11 @@ snps.each do |snp|
  snp.variation_free_region = options[:variation_free_region]
  container.add_snp(snp)
  end
- container.add_alignments({:exonerate_file=>exonerate_file, :arm_selection=>options[:arm_selection] , :min_identity=>min_identity})
+ container.add_alignments({
+ :exonerate_file=>exonerate_file,
+ :arm_selection=>options[:arm_selection],
+ :min_identity=>min_identity,
+ :filter_best=>options[:filter_best]})


  #4.1 generating primer3 file
@@ -2,16 +2,16 @@
  # DO NOT EDIT THIS FILE DIRECTLY
  # Instead, edit Juwelier::Tasks in Rakefile, and run 'rake gemspec'
  # -*- encoding: utf-8 -*-
- # stub: bio-polyploid-tools 0.8.1 ruby lib
+ # stub: bio-polyploid-tools 0.8.2 ruby lib

  Gem::Specification.new do |s|
  s.name = "bio-polyploid-tools".freeze
- s.version = "0.8.1"
+ s.version = "0.8.2"

  s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
  s.require_paths = ["lib".freeze]
  s.authors = ["Ricardo H. Ramirez-Gonzalez".freeze]
- s.date = "2018-01-19"
+ s.date = "2018-01-23"
  s.description = "Repository of tools developed at Crop Genetics in JIC to work with polyploid wheat".freeze
  s.email = "ricardo.ramirez-gonzalez@jic.ac.uk".freeze
  s.executables = ["bfr.rb".freeze, "blast_triads.rb".freeze, "blast_triads_promoters.rb".freeze, "count_variations.rb".freeze, "filter_blat_by_target_coverage.rb".freeze, "filter_exonerate_by_identity.rb".freeze, "find_best_blat_hit.rb".freeze, "find_best_exonerate.rb".freeze, "find_homoeologue_variations.rb".freeze, "get_longest_hsp_blastx_triads.rb".freeze, "hexaploid_primers.rb".freeze, "homokaryot_primers.rb".freeze, "mafft_triads.rb".freeze, "mafft_triads_promoters.rb".freeze, "map_markers_to_contigs.rb".freeze, "markers_in_region.rb".freeze, "polymarker.rb".freeze, "polymarker_capillary.rb".freeze, "snp_position_to_polymarker.rb".freeze, "snps_between_bams.rb".freeze, "vcfLineToTable.rb".freeze]
@@ -172,7 +172,7 @@ Gem::Specification.new do |s|
  ]
  s.homepage = "http://github.com/tgac/bioruby-polyploid-tools".freeze
  s.licenses = ["MIT".freeze]
- s.rubygems_version = "2.7.4".freeze
+ s.rubygems_version = "2.6.14".freeze
  s.summary = "Tool to work with polyploids, NGS and molecular biology".freeze

  if s.respond_to? :specification_version then
@@ -1,48 +1,51 @@
- module Bio::PolyploidTools
-
- class ChromosomeArm
- attr_accessor :name
- attr_reader :genes
- attr_reader :loaded_entries
- attr_reader :fasta_db
-
- def initialize(name, path_to_fasta)
- @name = name
- @fasta_db = Bio::DB::Fasta::FastaFile.new({:fasta=>path_to_fasta})
- #$stderr.puts "Loading entries for #{name}"
-
- @genes = Hash.new
- end
-
- def fetch_contig(contig_id)
-
- @fasta_db.load_fai_entries unless @loaded_entries
- @loaded_entries = true
- entry = fasta_db.index.region_for_entry(contig_id)
- # puts entry
- @fasta_db.fetch_sequence(entry.get_full_region)
- end
-
- #Loads all the chromosome arms in a folder.
- #The current version requires that all the references end with .fa, and start with XXX_*.fa
- #Where XXX is the chromosome name
- def self.load_from_folder(path_to_contigs)
- chromosomeArms = Hash.new
-
- Dir.foreach(path_to_contigs) do |filename |
- if File.fnmatch("*.fa", filename)
-
- parsed = /^(?<arm>\d\w+)/.match(filename)
- target="#{path_to_contigs}/#{filename}"
- #fasta_file = Bio::DB::Fasta::FastaFile.new(target)
- #fasta_file.load_fai_entries
- arm = ChromosomeArm.new(parsed[:arm], target)
- chromosomeArms[arm.name] = arm
- end
- end
- return chromosomeArms
- end
-
+ class Bio::PolyploidTools::ChromosomeArm
+
+
+
+ @@arm_selection_functions = Hash.new;
+
+ #example format: chr2A
+ @@arm_selection_functions[:nrgene] = lambda do | contig_name |
+ ret = contig_name[3,2]
+ return ret
+ end
+
+ @@arm_selection_functions[:first_two] = lambda do | contig_name |
+ contig_name.gsub!(/chr/,"")
+ ret = contig_name[0,2]
+ return ret
+ end
+
+ #Function to parse stuff like: "IWGSC_CSS_1AL_scaff_110"
+ #Or the first two characters in the contig name, to deal with
+ #pseudomolecules that start with headers like: "1A"
+ #And with the cases when 3B is named with the prefix: v443
+ @@arm_selection_functions[:embl] = lambda do | contig_name|
+
+ arr = contig_name.split('_')
+ ret = "U"
+ ret = arr[2][0,2] if arr.size >= 3
+ ret = "3B" if arr.size == 2 and arr[0] == "v443"
+ ret = arr[0][0,2] if arr.size == 1
+ return ret
+ end
+
+ @@arm_selection_functions[:morex] = lambda do | contig_name |
+ ret = contig_name.split(':')[0].split("_")[1];
+ return ret
+ end
+
+ @@arm_selection_functions[:scaffold] = lambda do | contig_name |
+ ret = contig_name;
+ return ret
+ end
+
+ def self.getArmSelection(name)
+ @@arm_selection_functions[name.to_sym]
+ end
+
+ def self.getValidFunctions
+ @@arm_selection_functions.keys.map { |e| e.to_s }
  end

  end
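
To see how the relocated registry behaves, here is a standalone sketch that mirrors the pattern of the new `ChromosomeArm.rb` without requiring the gem. `ArmSelectionSketch` is a hypothetical class name used only for this illustration, and only two of the five parsers are copied over. Note that the keys in this version no longer carry the `arm_selection_` prefix, and that an unknown name yields `nil` because the registry is a plain `Hash`.

```ruby
# Standalone sketch of the registry pattern; ArmSelectionSketch is a
# hypothetical name, not part of bio-polyploid-tools.
class ArmSelectionSketch
  @@arm_selection_functions = {}

  # Example format: chr2A
  @@arm_selection_functions[:nrgene] = lambda do |contig_name|
    contig_name[3, 2]
  end

  # Example format: IWGSC_CSS_1AL_scaff_110 (third underscore-separated field),
  # with the same fallbacks as the parser shown in the diff.
  @@arm_selection_functions[:embl] = lambda do |contig_name|
    arr = contig_name.split('_')
    ret = "U"
    ret = arr[2][0, 2] if arr.size >= 3
    ret = "3B" if arr.size == 2 && arr[0] == "v443"
    ret = arr[0][0, 2] if arr.size == 1
    ret
  end

  # Looks a parser up by name; accepts a String or a Symbol.
  def self.getArmSelection(name)
    @@arm_selection_functions[name.to_sym]
  end

  # Names of all registered parsers, as strings.
  def self.getValidFunctions
    @@arm_selection_functions.keys.map(&:to_s)
  end
end

puts ArmSelectionSketch.getArmSelection("nrgene").call("chr2A")                  # prints 2A
puts ArmSelectionSketch.getArmSelection("embl").call("IWGSC_CSS_1AL_scaff_110")  # prints 1A
puts ArmSelectionSketch.getValidFunctions.join('|')                              # prints nrgene|embl
p ArmSelectionSketch.getArmSelection("arm_selection_embl")                       # prints nil
```

The last call suggests that a caller passing a pre-0.8.2 name such as `arm_selection_embl` would receive `nil` rather than a lambda, so validating the name against `getValidFunctions` before use may be worthwhile.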
@@ -175,9 +175,10 @@ module Bio::PolyploidTools
  end

  def add_alignments(opts=Hash.new)
- opts = { :min_identity=>90 }.merge!(opts)
+ opts = { :min_identity=>90, filter_best:false }.merge!(opts)
  exonerate_filename = opts[:exonerate_file]
  arm_selection = opts[:arm_selection]
+ filter_best = opts[:filter_best]

  unless arm_selection
  arm_selection = lambda do | contig_name |
@@ -197,7 +198,7 @@ module Bio::PolyploidTools
  if snp != nil and snp.position.between?( (record.query_start + 1) , record.query_end)
  begin
  exon = record.exon_on_gene_position(snp.position)
- snp.add_exon(exon, arm_selection.call(record.target_id))
+ snp.add_exon(exon, arm_selection.call(record.target_id), filter_best:filter_best)
  rescue Bio::DB::Exonerate::ExonerateException
  $stderr.puts "Failed for the range #{record.query_start}-#{record.query_end} for position #{snp.position}"
  end
@@ -114,8 +114,14 @@ module Bio::PolyploidTools
  return ">#{self.gene}\n#{self.template_sequence}\n"
  end

- def add_exon(exon, arm)
- exon_list[arm] << exon
+ def add_exon(exon, arm, filter_best: true)
+ if filter_best and exon_list[arm].size > 0
+ current = exon_list[arm].first
+ exon_list[arm] = [exon] if exon.record.score > current.record.score
+ else
+ exon_list[arm] << exon
+ end
+
  end

  def covered_region
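
The effect of `filter_best` in the new `add_exon` can be illustrated with the standalone sketch below. `Record`, `Exon`, and `SnpSketch` are hypothetical stand-ins for the gem's alignment record, exon, and SNP classes, and the sketch assumes `exon_list` behaves like a hash that creates an empty array per chromosome on first access, which this diff does not show.

```ruby
# Hypothetical stand-ins for the gem's alignment record and exon objects.
Record = Struct.new(:score)
Exon = Struct.new(:name, :record)

class SnpSketch
  attr_reader :exon_list

  def initialize
    # Assumption: one list of exons per chromosome, created on first access.
    @exon_list = Hash.new { |hash, arm| hash[arm] = [] }
  end

  # Same logic as the add_exon in the diff: with filter_best, keep a single
  # exon per chromosome and replace it only when a higher score arrives.
  def add_exon(exon, arm, filter_best: true)
    if filter_best && exon_list[arm].size > 0
      current = exon_list[arm].first
      exon_list[arm] = [exon] if exon.record.score > current.record.score
    else
      exon_list[arm] << exon
    end
  end
end

hits = [
  Exon.new("hit1", Record.new(100)),
  Exon.new("hit2", Record.new(250)),
  Exon.new("hit3", Record.new(180))
]

best_only = SnpSketch.new
hits.each { |exon| best_only.add_exon(exon, "1A", filter_best: true) }

all_hits = SnpSketch.new
hits.each { |exon| all_hits.add_exon(exon, "1A", filter_best: false) }

p best_only.exon_list["1A"].map(&:name)  # prints ["hit2"]
p all_hits.exon_list["1A"].map(&:name)   # prints ["hit1", "hit2", "hit3"]
```

In the diff, `add_exon` defaults to `filter_best: true`, while `add_alignments` defaults to `false` and always passes the value explicitly, so the `true` default only matters for code that calls `add_exon` directly.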
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: bio-polyploid-tools
  version: !ruby/object:Gem::Version
- version: 0.8.1
+ version: 0.8.2
  platform: ruby
  authors:
  - Ricardo H. Ramirez-Gonzalez
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-01-19 00:00:00.000000000 Z
+ date: 2018-01-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: bio
@@ -293,7 +293,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.7.4
+ rubygems_version: 2.6.14
  signing_key:
  specification_version: 4
  summary: Tool to work with polyploids, NGS and molecular biology