bio-polyploid-tools 0.7.0 → 0.7.1
- checksums.yaml +4 -4
- data/README.md +57 -8
- data/VERSION +1 -1
- data/bin/polymarker.rb +11 -2
- data/bio-polyploid-tools.gemspec +4 -4
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: b5cbfc6ce3619d8af0f5b7c9780227b2d8730e6c
|
4
|
+
data.tar.gz: 430ebabf4af13f68b00cf90d2018fd144a2a07d2
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: c01c1705ec3af7ae6253b7cc870f455e4f02e27e1d909f54b877ec3d291064deca272c14bab162990c8c57b180ae779f1c66bd9f53b0a38ad6ecba65cef96df6
|
7
|
+
data.tar.gz: cdbdf9f70e044f581e80058373d43d365767066e63f0155196a8ddd03daeabad78bba2612c52f918d48707a3369c6d9d6c9bd85fd90254f419841da239b6705f
|
data/README.md
CHANGED
@@ -21,8 +21,8 @@ The code has been developed on ruby 2.1.0, but it should work on 1.9.3 and above
|
|
21
21
|
#PolyMarker
|
22
22
|
|
23
23
|
To run PolyMarker with the CSS wheat contigs, you need to unzip the
|
24
|
-
|
25
|
-
|
24
|
+
[reference file](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
|
25
|
+
).
|
26
26
|
|
27
27
|
|
28
28
|
|
@@ -30,14 +30,63 @@ To run poolymerker with the CSS wheat contigs, you need to unzip the
|
|
30
30
|
polymarker.rb --contigs Triticum_aestivum.IWGSC2.25.dna.genome.fa --marker_list snp_list.csv --output output_folder
|
31
31
|
```
|
32
32
|
|
33
|
-
The snp_list file must follow the convention
|
34
|
-
|
33
|
+
The ```snp_list``` file must follow the convention
|
34
|
+
```<ID>,<Chromosome>,<SEQUENCE>```
|
35
35
|
with the SNP inside the sequence in the format [A/T]. As a reference, look at test/data/short_primer_design_test.csv
|
36
36
|
|
37
37
|
If you want to use the web interface, visit the [PolyMarker webservice at TGAC](http://polymarker.tgac.ac.uk)
|
38
38
|
|
39
|
+
The available command line arguments are:
|
40
|
+
|
41
|
+
```
|
42
|
+
Usage: polymarker.rb [options]
|
43
|
+
-c, --contigs FILE File with contigs to use as database
|
44
|
+
-m, --marker_list FILE File with the list of markers to search from
|
45
|
+
-g, --genomes_count INT Number of genomes (default 3, for hexaploid)
|
46
|
+
-s, --snp_list FILE File with the list of snps to search from, requires --reference to get the sequence using a position
|
47
|
+
-t, --mutant_list FILE File with the list of positions with mutation and the mutation line.
|
48
|
+
requires --reference to get the sequence using a position
|
49
|
+
-r, --reference FILE Fasta file with the sequence for the markers (to complement --snp_list)
|
50
|
+
-o, --output FOLDER Output folder
|
51
|
+
-e, --exonerate_model MODEL Model to be used in exonerate to search for the contigs
|
52
|
+
-a, --arm_selection arm_selection_embl|arm_selection_morex|arm_selection_first_two
|
53
|
+
Function to decide the chromosome arm
|
54
|
+
-p, --primer_3_preferences FILE file with preferences to be sent to primer3
|
55
|
+
-v, --variation_free_region INT If present, avoid generating the common primer if there are homoeologous SNPs within the specified distance (not tested)
|
56
|
+
-x, --extract_found_contigs If present, save in a separate file the contigs with matches. Useful to debug.
|
57
|
+
-P, --primers_to_order If present, saves a file named primers_to_order which contains the KASP tails
|
58
|
+
```
|
59
|
+
|
60
|
+
###Custom reference sequences.
|
61
|
+
By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
|
62
|
+
) are used. However, it is possible to use a custom reference. The argument ```arm_selection``` defines the chromosome to which each contig belongs. The default expects IDs like ```IWGSC_CSS_1AL_scaff_110```, where the third underscore-separated field is used. A simple way to add custom references is to rename the contigs in the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, which uses only the first two characters of each contig name as the identifier; this is useful when pseudomolecules are named after the chromosomes (e.g. ">1A" in the fasta file).
|
63
|
+
|
64
|
+
If your contigs follow a different convention, you can define a new parser in ```polymarker.rb``` by adding a new lambda at the beginning of the file, alongside the existing parsers, like:
|
65
|
+
|
66
|
+
```rb
|
67
|
+
arm_selection_functions[:arm_selection_embl] = lambda do | contig_name|
|
68
|
+
arr = contig_name.split('_')
|
69
|
+
ret = "U"
|
70
|
+
ret = arr[2][0,2] if arr.size >= 3
|
71
|
+
ret = "3B" if arr.size == 2 and arr[0] == "v443"
|
72
|
+
ret = arr[0][0,2] if arr.size == 1
|
73
|
+
return ret
|
74
|
+
end
|
75
|
+
```
|
76
|
+
|
77
|
+
The function should return a 2-character string, where the first character is the chromosome number and the second is the chromosome group. The symbol used as the hash key is the name to pass to the ```--arm_selection``` argument. If you want your parser to be added to the distribution, feel free to fork and make a pull request.
|
78
|
+
|
79
|
+
|
80
|
+
|
39
81
|
##Release Notes
|
40
82
|
|
83
|
+
###0.7.1
|
84
|
+
* BUGFIX: Now the parser for ```arm_selection_embl``` works with the mixture of contigs and pseudomolecules
|
85
|
+
* DOC: Added documentation on how to use custom references.
|
86
|
+
|
87
|
+
###0.7.0
|
88
|
+
* Added flag ```genomes_count``` for number of genomes, to be used on tetraploids, etc.
|
89
|
+
|
41
90
|
###0.6.1
|
42
91
|
|
43
92
|
|
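The marker list convention described in the README hunk above (`<ID>,<Chromosome>,<SEQUENCE>`, with the SNP written inline as `[A/T]`) can be illustrated with a small validation sketch. This is not code from bio-polyploid-tools: `MARKER_LINE` and `parse_marker_line` are hypothetical names, and the sketch assumes that flanking sequences contain only A, C, G, T or N and that each allele is a single base.

```ruby
# Hypothetical helper, not part of bio-polyploid-tools.
# Matches "<ID>,<Chromosome>,<left flank>[X/Y]<right flank>".
# Assumption: flanks use only A, C, G, T, N; alleles are single bases.
MARKER_LINE = /\A([^,]+),([^,]+),([ACGTN]*)\[([ACGT])\/([ACGT])\]([ACGTN]*)\z/i

# Returns a hash describing the marker, or nil if the line
# does not follow the convention.
def parse_marker_line(line)
  m = MARKER_LINE.match(line.strip)
  return nil unless m
  {
    id: m[1],
    chromosome: m[2],
    left_flank: m[3],
    alleles: [m[4], m[5]],
    right_flank: m[6]
  }
end

marker = parse_marker_line("snp1,1A,ACGT[A/T]GGCA")
puts marker.inspect
```

A line without the bracketed SNP returns `nil`, which makes it easy to filter a CSV before passing it to `--marker_list`.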
@@ -49,10 +98,10 @@ If you want to use the web interface, visit the [PolyMarker webservice at TGAC](
|
|
49
98
|
|
50
99
|
* If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
|
51
100
|
|
52
|
-
BUG: Blocks with NNNs are picked and treated as semi-specific.
|
53
|
-
BUG: If the name of the reference has a space, the ID is not chopped. ">gene_1 (G12A)" should be treated as ">gene_1".
|
54
|
-
TODO: Add a parameter file to configure the alignments.
|
55
|
-
TODO: Produce primers for products of different sizes
|
101
|
+
* BUG: Blocks with NNNs are picked and treated as semi-specific.
|
102
|
+
* BUG: If the name of the reference has a space, the ID is not chopped. ">gene_1 (G12A)" should be treated as ">gene_1".
|
103
|
+
* TODO: Add a parameter file to configure the alignments.
|
104
|
+
* TODO: Produce primers for products of different sizes. This can probably be done with the primer_3_preferences option, but hasn't been tested.
|
56
105
|
|
57
106
|
|
58
107
|
|
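As a sketch of the custom-parser mechanism that the README changes describe, the lambda below handles a made-up naming convention in which contigs are called `chr2B_contig_0042`. The symbol `:arm_selection_chr_prefix` and the naming convention are assumptions for illustration only; they do not exist in the gem. The pattern also assumes hexaploid wheat names (chromosomes 1 to 7, genomes A, B and D), and the `"U"` fallback mirrors the default used by `arm_selection_embl`.

```ruby
# In polymarker.rb this hash already exists; it is created here
# only so the sketch runs on its own.
arm_selection_functions = {}

# Hypothetical parser for names like "chr2B_contig_0042".
# Returns the 2-character chromosome string, or "U" when the
# name does not match the assumed convention.
arm_selection_functions[:arm_selection_chr_prefix] = lambda do |contig_name|
  m = /\Achr([1-7][ABD])/.match(contig_name)
  m ? m[1] : "U"
end

parser = arm_selection_functions[:arm_selection_chr_prefix]
puts parser.call("chr2B_contig_0042")
puts parser.call("scaffold_1")
```

According to the README, once such a lambda is registered in `polymarker.rb`, the symbol name is the value to pass to `--arm_selection`.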
data/VERSION
CHANGED
@@ -1 +1 @@
|
|
1
|
-
0.7.
|
1
|
+
0.7.1
|
data/bin/polymarker.rb
CHANGED
@@ -17,9 +17,18 @@ arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
|
|
17
17
|
ret = contig_name[0,2]
|
18
18
|
return ret
|
19
19
|
end
|
20
|
-
|
20
|
+
|
21
|
+
#Function to parse stuff like: "IWGSC_CSS_1AL_scaff_110"
|
22
|
+
#Or the first two characters in the contig name, to deal with
|
23
|
+
#pseudomolecules that start with headers like: "1A"
|
24
|
+
#And with the cases when 3B is named with the prefix: v443
|
21
25
|
arm_selection_functions[:arm_selection_embl] = lambda do | contig_name|
|
22
|
-
|
26
|
+
|
27
|
+
arr = contig_name.split('_')
|
28
|
+
ret = "U"
|
29
|
+
ret = arr[2][0,2] if arr.size >= 3
|
30
|
+
ret = "3B" if arr.size == 2 and arr[0] == "v443"
|
31
|
+
ret = arr[0][0,2] if arr.size == 1
|
23
32
|
return ret
|
24
33
|
end
|
25
34
|
|
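The effect of the new `arm_selection_embl` body can be checked by calling it on the kinds of names its comments mention. The sketch below copies the lambda from the hunk above into a standalone hash; apart from `IWGSC_CSS_1AL_scaff_110` and `1A`, the sample names (`v443_0001`, `scaffold_12`) are invented for illustration.

```ruby
# Standalone copy of the parser added in 0.7.1, so it can be run
# outside polymarker.rb.
arm_selection_functions = {}

arm_selection_functions[:arm_selection_embl] = lambda do |contig_name|
  arr = contig_name.split('_')
  ret = "U"                                          # unknown by default
  ret = arr[2][0,2] if arr.size >= 3                 # IWGSC_CSS_1AL_scaff_110
  ret = "3B" if arr.size == 2 and arr[0] == "v443"   # 3B named with prefix v443
  ret = arr[0][0,2] if arr.size == 1                 # pseudomolecule header such as "1A"
  return ret
end

embl = arm_selection_functions[:arm_selection_embl]
samples = ["IWGSC_CSS_1AL_scaff_110", "v443_0001", "1A", "scaffold_12"]
samples.each do |name|
  puts "#{name} -> #{embl.call(name)}"
end
```

A two-field name whose first field is not `v443` falls through every condition and keeps the default `"U"`.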
data/bio-polyploid-tools.gemspec
CHANGED
@@ -2,16 +2,16 @@
|
|
2
2
|
# DO NOT EDIT THIS FILE DIRECTLY
|
3
3
|
# Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
|
4
4
|
# -*- encoding: utf-8 -*-
|
5
|
-
# stub: bio-polyploid-tools 0.7.
|
5
|
+
# stub: bio-polyploid-tools 0.7.1 ruby lib
|
6
6
|
|
7
7
|
Gem::Specification.new do |s|
|
8
8
|
s.name = "bio-polyploid-tools"
|
9
|
-
s.version = "0.7.
|
9
|
+
s.version = "0.7.1"
|
10
10
|
|
11
11
|
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
12
12
|
s.require_paths = ["lib"]
|
13
13
|
s.authors = ["Ricardo H. Ramirez-Gonzalez"]
|
14
|
-
s.date = "2015-05-
|
14
|
+
s.date = "2015-05-28"
|
15
15
|
s.description = "Repository of tools developed in TGAC and Crop Genetics in JIC to work with polyploid wheat"
|
16
16
|
s.email = "ricardo.ramirez-gonzalez@tgac.ac.uk"
|
17
17
|
s.executables = ["bfr.rb", "count_variations.rb", "filter_blat_by_target_coverage.rb", "filter_exonerate_by_identity.rb", "find_best_blat_hit.rb", "find_best_exonerate.rb", "hexaploid_primers.rb", "homokaryot_primers.rb", "map_markers_to_contigs.rb", "markers_in_region.rb", "polymarker.rb", "snp_position_to_polymarker.rb", "snps_between_bams.rb"]
|
@@ -129,7 +129,7 @@ Gem::Specification.new do |s|
|
|
129
129
|
]
|
130
130
|
s.homepage = "http://github.com/tgac/bioruby-polyploid-tools"
|
131
131
|
s.licenses = ["MIT"]
|
132
|
-
s.rubygems_version = "2.
|
132
|
+
s.rubygems_version = "2.4.7"
|
133
133
|
s.summary = "Tool to work with polyploids, NGS and molecular biology"
|
134
134
|
|
135
135
|
if s.respond_to? :specification_version then
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: bio-polyploid-tools
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.7.
|
4
|
+
version: 0.7.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ricardo H. Ramirez-Gonzalez
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-05-
|
11
|
+
date: 2015-05-28 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bio
|
@@ -228,7 +228,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
228
228
|
version: '0'
|
229
229
|
requirements: []
|
230
230
|
rubyforge_project:
|
231
|
-
rubygems_version: 2.
|
231
|
+
rubygems_version: 2.4.7
|
232
232
|
signing_key:
|
233
233
|
specification_version: 4
|
234
234
|
summary: Tool to work with polyploids, NGS and molecular biology
|