bio-polyploid-tools 0.7.0 → 0.7.1
- checksums.yaml +4 -4
- data/README.md +57 -8
- data/VERSION +1 -1
- data/bin/polymarker.rb +11 -2
- data/bio-polyploid-tools.gemspec +4 -4
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b5cbfc6ce3619d8af0f5b7c9780227b2d8730e6c
+  data.tar.gz: 430ebabf4af13f68b00cf90d2018fd144a2a07d2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c01c1705ec3af7ae6253b7cc870f455e4f02e27e1d909f54b877ec3d291064deca272c14bab162990c8c57b180ae779f1c66bd9f53b0a38ad6ecba65cef96df6
+  data.tar.gz: cdbdf9f70e044f581e80058373d43d365767066e63f0155196a8ddd03daeabad78bba2612c52f918d48707a3369c6d9d6c9bd85fd90254f419841da239b6705f
data/README.md
CHANGED
@@ -21,8 +21,8 @@ The code has been developed on ruby 2.1.0, but it should work on 1.9.3 and above
 #PolyMarker
 
 To run poolymerker with the CSS wheat contigs, you need to unzip the
-
-
+[reference file](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
+).
 
 
 
@@ -30,14 +30,63 @@ To run poolymerker with the CSS wheat contigs, you need to unzip the
 polymarker.rb --contigs Triticum_aestivum.IWGSC2.25.dna.genome.fa --marker_list snp_list.csv --output output_folder
 ```
 
-The snp_list file must follow the convention
-
+The ```snp_list``` file must follow the convention
+```<ID>,<Chromosome>,<SEQUENCE>```
 with the SNP inside the sequence in the format [A/T]. As a reference, look at test/data/short_primer_design_test.csv
 
 If you want to use the web interface, visit the [PolyMarker webservice at TGAC](http://polymarker.tgac.ac.uk)
 
+The available command line arguments are:
+
+```
+Usage: polymarker.rb [options]
+    -c, --contigs FILE               File with contigs to use as database
+    -m, --marker_list FILE           File with the list of markers to search from
+    -g, --genomes_count INT          Number of genomes (default 3, for hexaploid)
+    -s, --snp_list FILE              File with the list of snps to search from, requires --reference to get the sequence using a position
+    -t, --mutant_list FILE           File with the list of positions with mutation and the mutation line.
+                                     requires --reference to get the sequence using a position
+    -r, --reference FILE             Fasta file with the sequence for the markers (to complement --snp_list)
+    -o, --output FOLDER              Output folder
+    -e, --exonerate_model MODEL      Model to be used in exonerate to search for the contigs
+    -a, --arm_selection arm_selection_embl|arm_selection_morex|arm_selection_first_two
+                                     Function to decide the chromome arm
+    -p, --primer_3_preferences FILE  file with preferences to be sent to primer3
+    -v, --variation_free_region INT  If present, avoid generating the common primer if there are homoeologous SNPs within the specified distance (not tested)
+    -x, --extract_found_contigs      If present, save in a separate file the contigs with matches. Useful to debug.
+    -P, --primers_to_order           If present, saves a file named primers_to_order which contains the KASP tails
+```
+
+###Custom reference sequences.
+By default, the contigs and pseudomolecules from [ensembl](ftp://ftp.ensemblgenomes.org/pub/release-25/plants/fasta/triticum_aestivum/dna/Triticum_aestivum.IWGSC2.25.dna.genome.fa.gz
+) are used. However, it is possible to use a custom reference. To define the chromosome where each contig belongs the argument ```arm_selection``` is used. The defailt uses ids like: ```IWGSC_CSS_1AL_scaff_110```, where the third field, separated by underscores is used. A simple way to add costum references is to rename the fasta file to follow that convention. Another way is to use the option ```--arm_selection arm_selection_first_two```, where only the first two characters in each contig is used as identifier, useful when pseudomolecules are named after the chromosomes (ie: ">1A" in the fasta file).
+
+If your contigs follow a different convention, in ```polymarker.rb``` it is possible to define new parsers, by adding at the begining, with the rest of the parsers a new lambda like:
+
+```rb
+arm_selection_functions[:arm_selection_embl] = lambda do | contig_name|
+  arr = contig_name.split('_')
+  ret = "U"
+  ret = arr[2][0,2] if arr.size >= 3
+  ret = "3B" if arr.size == 2 and arr[0] == "v443"
+  ret = arr[0][0,2] if arr.size == 1
+  return ret
+end
+```
+
+The function should return a 2 character string, when the first is the chromosome number and the second the chromosome group. The symbol in the hash is the name to be used in the argument ```--arm_selection```. If you want your parser to be added to the distribution, feel free to fork and make a pull request.
+
+
+
 ##Release Notes
 
+###0.7.1
+* BUGFIX: Now the parser for ```arm_selection_embl``` works with the mixture of contigs and pseudomolecules
+* DOC: Added documentation on how to use custom references.
+
+###0.7.0
+* Added flag ```gebines_count``` for number of genomes, to be used on tetraploids, etc.
+
 ###0.6.1
 
 
@@ -49,10 +98,10 @@ If you want to use the web interface, visit the [PolyMarker webservice at TGAC](
 
 * If the SNP is in a gap in the alignment to the chromosomes, it is ignored.
 
-BUG: Blocks with NNNs are picked and treated as semi-specific.
-BUG: If the name of the reference have space, the ID is not chopped. ">gene_1 (G12A)" shouls be treated as ">gene_1".
-TODO: Add a parameter file to configure the alignments.
-TODO: Produce primers for products of different sizes
+* BUG: Blocks with NNNs are picked and treated as semi-specific.
+* BUG: If the name of the reference have space, the ID is not chopped. ">gene_1 (G12A)" shouls be treated as ">gene_1".
+* TODO: Add a parameter file to configure the alignments.
+* TODO: Produce primers for products of different sizes. This can probably be done with the primer_3_preferences option, but hasn't been tested.
 
 
 
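The "Custom reference sequences" text added to the README says that new parsers can be registered as lambdas keyed by the name passed to `--arm_selection`. As a rough illustration only, the sketch below registers one such parser. The `arm_selection_chr_prefix` name, the `chr1A_part2`-style headers, and the standalone `arm_selection_functions` hash are all assumptions made for this example; none of them ship with the gem. The `"U"` fallback mirrors the default used by `arm_selection_embl`.

```ruby
# Hypothetical stand-in for the hash of parsers defined at the top of polymarker.rb.
arm_selection_functions = {}

# Hypothetical convention: FASTA headers such as "chr1A_part2" or "chr3B".
# Returns the two-character chromosome label, or "U" when the name does not match.
arm_selection_functions[:arm_selection_chr_prefix] = lambda do |contig_name|
  match = contig_name.match(/\Achr([1-7][ABD])/)
  match ? match[1] : "U"
end

parser = arm_selection_functions[:arm_selection_chr_prefix]
puts parser.call("chr1A_part2") # prints "1A"
puts parser.call("scaffold_42") # prints "U"
```

If a parser like this were added to `polymarker.rb`, it would presumably be selected with `--arm_selection arm_selection_chr_prefix`, following the README's statement that the hash symbol is the argument value.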
data/VERSION
CHANGED
@@ -1 +1 @@
-0.7.
+0.7.1
data/bin/polymarker.rb
CHANGED
@@ -17,9 +17,18 @@ arm_selection_functions[:arm_selection_first_two] = lambda do | contig_name |
   ret = contig_name[0,2]
   return ret
 end
-
+
+#Function to parse stuff like: "IWGSC_CSS_1AL_scaff_110"
+#Or the first two characters in the contig name, to deal with
+#pseudomolecules that start with headers like: "1A"
+#And with the cases when 3B is named with the prefix: v443
 arm_selection_functions[:arm_selection_embl] = lambda do | contig_name|
-
+
+  arr = contig_name.split('_')
+  ret = "U"
+  ret = arr[2][0,2] if arr.size >= 3
+  ret = "3B" if arr.size == 2 and arr[0] == "v443"
+  ret = arr[0][0,2] if arr.size == 1
   return ret
 end
 
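To see how the reworked `arm_selection_embl` parser treats the mixture of contigs and pseudomolecules mentioned in the release notes, the lambda can be exercised on its own. This is a minimal sketch: the lambda body is copied from the diff above (with `&&` in place of `and`), while the standalone `arm_selection_functions` hash and the sample names `v443_0001` and `unknown_name` are assumptions made for illustration.

```ruby
# Stand-in for the hash defined in polymarker.rb (assumed here to be a plain Hash).
arm_selection_functions = {}

arm_selection_functions[:arm_selection_embl] = lambda do |contig_name|
  arr = contig_name.split('_')
  ret = "U"                                          # default: unknown chromosome
  ret = arr[2][0, 2] if arr.size >= 3                # "IWGSC_CSS_1AL_scaff_110" style contigs
  ret = "3B" if arr.size == 2 && arr[0] == "v443"    # 3B sequences carrying the v443 prefix
  ret = arr[0][0, 2] if arr.size == 1                # pseudomolecules named like "1A"
  ret
end

embl = arm_selection_functions[:arm_selection_embl]
puts embl.call("IWGSC_CSS_1AL_scaff_110") # prints "1A"
puts embl.call("1A")                      # prints "1A"
puts embl.call("v443_0001")               # prints "3B" (suffix is a made-up example)
puts embl.call("unknown_name")            # prints "U"
```

Names with exactly two underscore-separated fields that do not start with `v443` fall through to `"U"`, so references using such a convention would need either renaming or a custom parser.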
data/bio-polyploid-tools.gemspec
CHANGED
@@ -2,16 +2,16 @@
 # DO NOT EDIT THIS FILE DIRECTLY
 # Instead, edit Jeweler::Tasks in Rakefile, and run 'rake gemspec'
 # -*- encoding: utf-8 -*-
-# stub: bio-polyploid-tools 0.7.
+# stub: bio-polyploid-tools 0.7.1 ruby lib
 
 Gem::Specification.new do |s|
   s.name = "bio-polyploid-tools"
-  s.version = "0.7.
+  s.version = "0.7.1"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.require_paths = ["lib"]
   s.authors = ["Ricardo H. Ramirez-Gonzalez"]
-  s.date = "2015-05-
+  s.date = "2015-05-28"
   s.description = "Repository of tools developed in TGAC and Crop Genetics in JIC to work with polyploid wheat"
   s.email = "ricardo.ramirez-gonzalez@tgac.ac.uk"
   s.executables = ["bfr.rb", "count_variations.rb", "filter_blat_by_target_coverage.rb", "filter_exonerate_by_identity.rb", "find_best_blat_hit.rb", "find_best_exonerate.rb", "hexaploid_primers.rb", "homokaryot_primers.rb", "map_markers_to_contigs.rb", "markers_in_region.rb", "polymarker.rb", "snp_position_to_polymarker.rb", "snps_between_bams.rb"]
@@ -129,7 +129,7 @@ Gem::Specification.new do |s|
   ]
   s.homepage = "http://github.com/tgac/bioruby-polyploid-tools"
   s.licenses = ["MIT"]
-  s.rubygems_version = "2.
+  s.rubygems_version = "2.4.7"
   s.summary = "Tool to work with polyploids, NGS and molecular biology"
 
   if s.respond_to? :specification_version then
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: bio-polyploid-tools
 version: !ruby/object:Gem::Version
-  version: 0.7.
+  version: 0.7.1
 platform: ruby
 authors:
 - Ricardo H. Ramirez-Gonzalez
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-05-
+date: 2015-05-28 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bio
@@ -228,7 +228,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.4.7
 signing_key:
 specification_version: 4
 summary: Tool to work with polyploids, NGS and molecular biology