bio-alignment 0.0.4 → 0.0.5
- data/.travis.yml +12 -0
- data/README.md +65 -5
- data/VERSION +1 -1
- data/doc/bio-alignment-design.md +28 -3
- data/features/bioruby-feature.rb +8 -4
- data/features/bioruby.feature +3 -2
- data/features/columns-feature.rb +33 -0
- data/features/columns.feature +14 -0
- data/features/edit/del_bridges-feature.rb +21 -0
- data/features/edit/del_bridges.feature +36 -0
- data/features/edit/del_non_informative_sequences.feature +4 -0
- data/features/edit/gblocks-feature.rb +13 -0
- data/features/edit/gblocks.feature +31 -0
- data/features/edit/mask_islands-feature.rb +12 -0
- data/features/edit/mask_islands.feature +44 -0
- data/features/edit/mask_serial_mutations-feature.rb +16 -0
- data/features/edit/mask_serial_mutations.feature +36 -0
- data/features/rows-feature.rb +35 -0
- data/features/rows.feature +14 -0
- data/lib/bio-alignment/alignment.rb +20 -4
- data/lib/bio-alignment/bioruby.rb +9 -0
- data/lib/bio-alignment/codonsequence.rb +4 -0
- data/lib/bio-alignment/column.rb +47 -0
- data/lib/bio-alignment/edit/del_bridges.rb +11 -0
- data/lib/bio-alignment/pal2nal.rb +10 -2
- data/lib/bio-alignment/sequence.rb +5 -0
- data/lib/bio-alignment/state.rb +31 -0
- metadata +34 -17
data/.travis.yml
ADDED
@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.2
+  - jruby-19mode # JRuby in 1.9 mode
+# - 1.8.7
+# - 1.9.3
+# - rbx-19mode
+# - jruby-18mode # JRuby in 1.8 mode
+# - rbx-18mode
+
+# uncomment this line if your project needs to run something other than `rake`:
+# script: bundle exec rspec spec

data/README.md
CHANGED
@@ -41,9 +41,43 @@ aligmment (note codon gaps are represented by '---')
 end
 ```
 
+Now add some state - you can define your own row state
+
+```ruby
+# tell the row to handle state
+aln[0].extend(State)
+# mark the first row for deletion
+aln[0].state = MyStateDeleteObject.new
+if aln.rows[0].state.deleted?
+  # do something
+end
+```
+
+### Accessing columns
+
+BioAlignment has a module for handling columns in an alignment. As
+long as the contained sequence objects have the [] and length methods,
+they can lazily be iterated by column. To get a column and iterate it
+
+```ruby
+column = aln.columns[3]
+column.each do |element|
+  p element
+end
+```
+
+Now add some state - you can define your own column state
+
+```ruby
+aln.columns[3].state = MyStateDeleteObject.new
+if aln.columns[3].state.deleted?
+  # do something
+end
+```
+
 ### BioRuby Sequence objects
 
-
+BioAlignment supports adding BioRuby's Bio::Sequence objects:
 
 ```ruby
 require 'bio' # BioRuby
@@ -54,6 +88,14 @@ The BioAlignment supports BioRuby's Bio::Sequence objects:
 aln << Bio::Sequence::NA.new("atg---tcaaaa")
 ```
 
+and we can transform BioAlignment into BioRuby's Bio::Alignment and
+use BioRuby functions
+
+```ruby
+bioruby_aln = aln.to_bioruby_alignment
+bioruby_aln.consensus_iupac
+```
+
 ### Pal2nal
 
 A protein (amino acid) to nucleotide alignment would first load
@@ -72,7 +114,7 @@ the sequences
 end
 ```
 
-
+Writing a (simple) version of pal2nal would be something like
 
 ```ruby
 fasta3 = FastaWriter.new('nt-aln.fa')
@@ -95,15 +137,33 @@ Write a (simple) version of pal2nal would be something like
 end
 ```
 
-With amino acid
-
+With amino acid aa_aln and nucleotide nt_aln loaded, the library
+version of pal2nal includes validation
 
 ```ruby
-
+aln = aa_aln.pal2nal(nt_aln, :codon_table => 3, :do_validate => true)
 ```
 
 resulting in the codon alignment.
 
+### Alignment editing
+
+BioAlignment supports multiple alignment editing features, which are
+listed
+[here](https://github.com/pjotrp/bioruby-alignment/tree/master/features/edit).
+Each edition feature is added at runtime(!) Example:
+
+```ruby
+require 'bio-alignment/edit/del_bridges'
+
+aln.extend DelBridges # bring the module into scope
+aln2 = aln.clean(50) # execute the alignment editor
+```
+
+
+
+### See also
+
 The API documentation is online. For more code examples see
 [./spec/*.rb](https://github.com/pjotrp/bioruby-alignment/tree/master/spec) and
 [./features/*](https://github.com/pjotrp/bioruby-alignment/tree/master/features).

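The README snippets use `MyStateDeleteObject` but never define it. The `State` mixin added in `lib/bio-alignment/state.rb` only provides a `state` accessor, so any object that answers `deleted?` can act as the state. The sketch below is self-contained and does not load bio-alignment. `MyStateDeleteObject` is a hypothetical user class, and the `Row` struct is a hypothetical stand-in for an alignment row. In this version `Sequence` itself includes `State` (see the sequence.rb hunk), so the explicit `extend` appears to matter mainly for other row objects.

```ruby
# Same shape as the State mixin in lib/bio-alignment/state.rb
module State
  attr_accessor :state
end

# Hypothetical user-defined state object: anything answering deleted? will do
class MyStateDeleteObject
  def deleted?
    true
  end
end

# Hypothetical stand-in for an alignment row (not the library's Sequence)
Row = Struct.new(:id, :seq)

rows = [Row.new(0, "atg---tcaaaa"), Row.new(1, "atgaaatcaaaa")]
rows[0].extend(State)                  # only this row object gains state/state=
rows[0].state = MyStateDeleteObject.new

# Keep rows that have no state, or whose state is not marked as deleted
kept = rows.reject { |row| row.respond_to?(:state) && row.state && row.state.deleted? }
puts kept.map(&:id).inspect            # prints [1]
```

Because `extend` works per object, the second row never gains a `state` method. The filter therefore checks `respond_to?` before asking for the state.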
data/VERSION
CHANGED
@@ -1 +1 @@
-0.0.
+0.0.5

data/doc/bio-alignment-design.md
CHANGED
@@ -22,13 +22,13 @@ say we have a nucleotide sequence with pay load
 5 9 * 1
 
 most library implementations will have two strings "AGTA" and "59*1".
-Removing the third
+Removing the third nucleotide would mean removing it twice, into first
 "AGA", and second "591". With bio-alignment this is one action because we
 have one object for each element that contains both values, e.g. the
 payload of 'T' is '*'. Moving 'T' automatically moves '*'.
 
 In addition the bio-alignment library deals with codons and codon translation.
-Rather than track
+Rather than track multiple matrices, the codon is viewed as an element,
 and the translated codon as the pay load. Again, when an alignment gets
 reordered the code only has to do it in one place.
 
@@ -89,7 +89,7 @@ do a fancy
 
 Elements in the list should respond to a gap? method, for an alignment
 gap, and the undefined? method for a position that is either an
-element or a gap. Also it should
+element or a gap. Also it should respond to the to_s method.
 
 An element can contain any pay load. If a list of attributes exists
 in the sequence object, it can be used.
@@ -100,6 +100,11 @@ The column list tracks the columns of the alignment. The requirement
 is that it should be iterable and can be indexed. The Column contains
 no elements, but may point to a list when the alignment is transposed.
 
+One of the 'features' of this library is that the Column access logic is
+split out into a separate module, which accesses the data in a lazy fashion.
+Also column state is stored as an 'any object'. I.e. a column can contain
+any state.
+
 ## Matrix or MSA
 
 The Matrix consists of a Column list, multiple Sequences, in turn
@@ -130,6 +135,26 @@ The Matrix can be accessed in transposed fashion, but accessing the normal
 matrix and transposed matrix at the same time is not supported. Matrix is not
 designed to be transaction safe - though you can copy the Matrix any time.
 
+## Adding functionality
+
+To ascertain that the basic BioAlignment does not get polluted, extra functionality
+is added by Modules. These modules can be added at run time(!) One advantage is
+that there is less name space pollution, the other is that different implementations
+can be plugged in - using the same interface. For example, here we are going to
+use an alignment editor named DelBridges, which has a method named clean:
+
+```ruby
+require 'bio-alignment/edit/del_bridges'
+
+aln = Alignment.new(string.split(/\n/))
+aln.extend DelBridges # bring the module into scope
+aln2 = aln.clean
+```
+
+in other words, the functionality in DelBridges gets attached to the aln
+instance at run time, without affecting any other Alignment object(!) Also,
+when not requiring 'bio-alignment/edit/del_bridges', the functionality is never
+visible, and never added to the environment.
 
 
 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>

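You can try the run-time `extend` pattern from "Adding functionality" without loading the library. The sketch below uses a plain Array of strings as a stand-in for an Alignment, together with a hypothetical editor module `DropAllGapColumns` that provides a `clean` method. This is not DelBridges. Its rule (drop the columns that are a gap in every row) was picked only to keep the example short. The sketch shows that `extend` changes one object and leaves other objects untouched.

```ruby
# Hypothetical alignment editor; another module with the same method
# name (clean) could be plugged in instead, as the design text describes
module DropAllGapColumns
  # Return new rows without the columns that are a gap in every row
  def clean
    width = first.length
    keep = (0...width).reject { |col| all? { |row| row[col] == '-' } }
    map { |row| keep.map { |col| row[col] }.join }
  end
end

aln = ["a-tg", "c-tg"]        # stand-in for an Alignment: one string per row
other = ["a-tg", "c-tg"]

aln.extend DropAllGapColumns  # attach the editor to this instance only
p aln.clean                   # prints ["atg", "ctg"]
p other.respond_to?(:clean)   # prints false
```

`clean` returns new rows and leaves the receiver unchanged. This matches the `aln2 = aln.clean` style of the snippet above.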
data/features/bioruby-feature.rb
CHANGED
@@ -67,14 +67,18 @@ end
 # ----
 
 Given /^I have a BioAlignment$/ do
-
+  @aln1 = Alignment.new
+  fasta = FastaReader.new('test/data/fasta/codon/aa-alignment.fa')
+  fasta.each do | rec |
+    @aln1.sequences << Sequence.new(rec.id, rec.seq)
+  end
 end
 
 When /^I convert$/ do
-
+  @bioruby_alignment = @aln1.to_bioruby_alignment
 end
 
-Then /^I should have a Bio::Alignment$/ do
-
+Then /^I should have a BioRuby Bio::Alignment$/ do
+  @bioruby_alignment.consensus_iupac[0..8].should == '???????v?'
 end
 

data/features/bioruby.feature
CHANGED
@@ -3,6 +3,7 @@ Feature: BioAlignment should play with BioRuby
   I want to convert BioAlignment to Bio::Alignment
   And I want to support Bio::Sequence objects
 
+  @bioruby
   Scenario: Use Bio::Sequence to fill BioAlignment
     Given I have multiple Bio::Sequence objects
     When I assign BioAlignment
@@ -17,7 +18,7 @@ Feature: BioAlignment should play with BioRuby
     And and return a partial AA sequence
     And be AA indexable
 
-  Scenario: Convert BioAlignment to Bio::Alignment
+  Scenario: Convert BioAlignment to BioRuby Bio::Alignment
     Given I have a BioAlignment
     When I convert
-    Then I should have a Bio::Alignment
+    Then I should have a BioRuby Bio::Alignment

data/features/columns-feature.rb
ADDED
@@ -0,0 +1,33 @@
+Given /^I iterate the columns$/ do
+  @aln.should_not be_nil # aln is loaded by codon-feature.rb
+end
+
+column = nil
+When /^I fetch a column$/ do
+  column = @aln.columns[3]
+  column.should_not be_nil
+  column[0].to_s.should == 'cga'
+end
+
+When /^I inject column state$/ do
+  column.state = ColumnState.new
+  column.state.deleted = true
+end
+
+Then /^I should be able to get the column state$/ do
+  column.state.deleted?.should be_true
+end
+
+list = []
+When /^I iterate a column$/ do
+  column10 = @aln.columns[10]
+  column10.each do | element |
+    list << element.to_s
+  end
+end
+
+Then /^I should get the column elements$/ do
+  list[0..10].should ==
+    ["ctt", "gcg", "ctt", "ttt", "gcg", "ttt", "ttt", "agt", "ttt", "atg", "agt"]
+end
+

data/features/columns.feature
ADDED
@@ -0,0 +1,14 @@
+Feature: Alignment column support
+  In order to access an alignment by column
+  I want to access column state
+  I want to get all elements in a column
+
+  @columns
+  Scenario: Access column information in an alignment
+    Given I read an MSA nucleotide FASTA file in the test/data folder
+    When I fetch a column
+    When I inject column state
+    Then I should be able to get the column state
+    When I iterate a column
+    Then I should get the column elements
+

data/features/edit/del_bridges-feature.rb
ADDED
@@ -0,0 +1,21 @@
+require 'bio-alignment'
+require 'bio-alignment/edit/del_bridges'
+
+Given /^I have a bridged alignment$/ do |string|
+  @aln = Alignment.new(string.split(/\n/))
+end
+
+When /^I apply the bridge rule$/ do
+  @aln.extend DelBridges
+  aln2 = @aln.clean
+end
+
+Then /^it should have removed (\d+) bridges$/ do |arg1, string|
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^I should be able to track removed columns$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+

data/features/edit/del_bridges.feature
ADDED
@@ -0,0 +1,36 @@
+Feature: Alignment editing, the bridge rule
+  Remove columns that contain too many gaps
+
+  Drop all bridges in less than 'min_bridges_fraction' (default 1/3 or 33%).
+
+  The dropped columns are tracked by the table columns.
+
+  @dev
+  Scenario: Apply bridge rule to an amino acid alignment
+    Given I have a bridged alignment
+      """
+      ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+      SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+      SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+      ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+      ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+      ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+      -------------IFHAVR-TC-HP-----------------
+      """
+    When I apply the bridge rule
+    Then it should have removed 4 bridges
+      """
+      SNSFSRPTIIFSGCSTACSGKSELVCGFRSFMLSDV
+      SNSFSRPTIIFSGCSTACSGKSEQVCGFR---LSDV
+      SNSFSRPTIIFSGCSTACSGKSEQVCGFR---LSDV
+      PKLFSRPTIIFSGCSTACSGKSEPVCGFRSFMLSDV
+      ------PTIIFSGCSKACSGKSELVCGFRSFMLSDV
+      ------PTIIFSGCSKACSGK---FRSFRSFMLSAV
+      ------PTIIFSGCSKACSGK---VCGIFHAVRSFM
+      ------PTIIFSGCSKACSGKSELVCGFRSFMLSAV
+      ---------IFHAVR-TC-HP---------------
+      """
+    Then I should be able to track removed columns
+

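One reading of the bridge rule is "drop every column in which fewer than a third of the rows hold a residue". Under that reading the rule reproduces the expected alignment in the scenario above. Columns 0-3 and 25-26 each hold residues in only 2 of the 9 rows, so they are removed. Every other column holds residues in at least 4 rows. The sketch below implements that reading on plain strings. `drop_sparse_columns` and its `min_fraction` parameter are hypothetical names, not the library's `DelBridges#clean`, and the meaning of the `50` in the README's `clean(50)` is not documented here. The function also returns the indices of the dropped columns, which is one possible way to "track removed columns".

```ruby
# Drop columns whose fraction of non-gap elements is below min_fraction.
# Returns [edited_rows, dropped_column_indices].
def drop_sparse_columns(rows, min_fraction = 1.0 / 3, gap = '-')
  width = rows.first.length
  dropped = (0...width).select do |col|
    filled = rows.count { |row| row[col] != gap }
    filled.fdiv(rows.size) < min_fraction
  end
  kept = (0...width).to_a - dropped
  edited = rows.map { |row| kept.map { |col| row[col] }.join }
  [edited, dropped]
end

# The input alignment of the scenario above
input = %w[
  ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
  SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
  SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
  ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
  ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
  ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
  ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
  ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
  -------------IFHAVR-TC-HP-----------------
]

edited, dropped = drop_sparse_columns(input)
p dropped    # prints [0, 1, 2, 3, 25, 26]
puts edited
```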
data/features/edit/gblocks-feature.rb
ADDED
@@ -0,0 +1,13 @@
+When /^I apply GBlocks$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^it should return the GBlocks cleaned alignment$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^return a list of removed columns$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+

data/features/edit/gblocks.feature
ADDED
@@ -0,0 +1,31 @@
+Feature: GBlocks implementation in Ruby
+
+  The GBlocks routine is often used, but the source code is not open source. This
+  is a feature request for a reimplementation of GBlocks. Some links:
+
+  Open sourcing request by Debian: http://lists.debian.org/debian-med/2011/02/msg00008.html
+
+  Binary download of GBlocks: http://molevol.cmima.csic.es/castresana/Gblocks.html
+
+  Documentation: http://molevol.cmima.csic.es/castresana/Gblocks/Gblocks_documentation.html
+
+  It is quite a simple routine, and would be easy to validate against existing outcomes.
+
+  Scenario: Apply GBlocks to an alignment
+    Given I have an alignment
+      """
+      ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+      SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+      SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+      ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+      ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+      ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+      -------------IFHAVR-TC-HP-----------------
+      """
+    When I apply GBlocks
+    Then it should return the GBlocks cleaned alignment
+    And return a list of removed columns
+
+

data/features/edit/mask_islands-feature.rb
ADDED
@@ -0,0 +1,12 @@
+require 'bio-alignment'
+
+When /^I apply island rule with max_gap_size (\d+)$/ do |arg1|
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^it should result in$/ do |string|
+  pending # express the regexp above with the code you wish you had
+end
+
+
+

data/features/edit/mask_islands.feature
ADDED
@@ -0,0 +1,44 @@
+Feature: Alignment editing with the Island rule
+  The idea is to drop hypervariable floating sequences, as they are probably
+  misaligned.
+
+  Drop all 'islands' in a sequence with low island consensus, that show a gap
+  larger than 'max_gap_size' (default 6) on both sides, and are shorter than
+  'min_island_size' (default 30). The latter may be a large size, as an island
+  needs to loop in and out several times to be (arguably) functional. We also
+  add a parameter 'max_gap_size_inside' (default 2) which allows for small gaps
+  inside the island - though the total island size is calculated including
+  those small gaps.
+
+  The island consensus is calculated by column.
+  'max_island_elements_unique_percentage' (default 10%) of elements in the
+  island should have a 'min_island_column_matched' (default 1) somewhere in the
+  element's column.
+
+  Scenario: Apply island rule to an amino acid alignment
+    Given I have an alignment
+      """
+      ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+      SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+      SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+      ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+      ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+      ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+      -------------IFHAVR-TC-HP-----------------
+      """
+    When I apply island rule with max_gap_size 4
+    Then it should have removed 2 islands
+      """
+      ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+      SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+      SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+      ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGK-----VCGFRSFMLSAV
+      ----------PTIIFSGCSKACSGK-----------------
+      ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+      ------------------------------------------
+      """
+

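The island definition above has two parts: a geometric test (gap flanks and island size) and a column-consensus test. The sketch below covers only the geometric part. `find_islands` is a hypothetical helper, and it leaves out the consensus check. It also treats the row ends, and any gap run that touches them, as unbounded flanks. The feature text does not say this; it is an assumption. In the scenario's seventh row, both `PTIIFSGCSKACSGK` and the trailing `VCGIFHAVRSFM` pass the geometric test with `max_gap_size` 4, yet the expected output removes only the latter. The consensus test is what has to tell the two apart.

```ruby
# Locate candidate islands in one gapped row: stretches of residues (small
# internal gaps allowed) whose flanking gaps are longer than max_gap_size and
# whose total size is below min_island_size. Returns inclusive [start, stop].
# Assumption: row ends, and gap runs touching them, count as unbounded flanks.
def find_islands(row, max_gap_size: 6, min_island_size: 30, max_gap_size_inside: 2, gap: '-')
  runs = []
  row.scan(/(?:#{Regexp.escape(gap)})+/) do
    start, stop = Regexp.last_match.offset(0)
    at_edge = start.zero? || stop == row.length
    runs << [start, stop, at_edge ? Float::INFINITY : stop - start]
  end
  # Gap runs too long to sit inside an island separate one block from the next
  separators = runs.select { |_start, _stop, length| length > max_gap_size_inside }
  separators = [[0, 0, Float::INFINITY]] + separators + [[row.length, row.length, Float::INFINITY]]

  islands = []
  separators.each_cons(2) do |(_l_start, block_start, left_flank), (block_stop, _r_stop, right_flank)|
    next if block_stop <= block_start   # no residues between the two separators
    size = block_stop - block_start      # includes the small internal gaps
    if left_flank > max_gap_size && right_flank > max_gap_size && size < min_island_size
      islands << [block_start, block_stop - 1]
    end
  end
  islands
end

row7 = '----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM'
row9 = '-------------IFHAVR-TC-HP-----------------'
p find_islands(row7, max_gap_size: 4)   # prints [[10, 24], [30, 41]]
p find_islands(row9, max_gap_size: 4)   # prints [[13, 24]]
```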
data/features/edit/mask_serial_mutations-feature.rb
ADDED
@@ -0,0 +1,16 @@
+require 'bio-alignment'
+
+Given /^I have an alignment$/ do |string|
+  @aln = Alignment.new(string.split(/\n/))
+  p @aln
+end
+
+When /^I apply rule masking with X and max_gap_size (\d+)$/ do |arg1|
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^it should have removed (\d+) islands$/ do |arg1, string|
+  pending # express the regexp above with the code you wish you had
+end
+
+

data/features/edit/mask_serial_mutations.feature
ADDED
@@ -0,0 +1,36 @@
+Feature: Alignment editing masking serial mutations
+  Edit an alignment removing or masking unique elements column-wise.
+
+  If a sequence has a unique AA in a column it is a single mutation event. If
+  multiple neighbouring AA's are also unique we suspect the sequence is an
+  outlier. This rule masks, or deletes, stretches of unique AAs. The stretch of
+  unique AA's is defined in 'max_serial_unique' (default 5, so two bordering
+  unique AA's are allowed).
+
+  Scenario: Apply rule to an amino acid alignment
+    Given I have an alignment
+      """
+      ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+      SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+      SSIISNSFSRPTIIFSGCSTACSQQKLTSEQVCFR---LSDV
+      ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+      ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+      ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+      -------------IFHAVR-TC-HP-----------------
+      """
+    When I apply rule masking with X and max_gap_size 5
+    Then it should result in
+      """
+      ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+      SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+      SSIISNSFSRPTIIFSGCSTACXXXXXXXXXXXFR---LSDV
+      ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+      ----------PTIIFSGCSKACSGK-----VCGFRSFMLSAV
+      ----------PTIIFSGCSKACSGK-----------------
+      ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+      -------------XXXXXX-XX-XX-----------------
+      """
+

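The core step of this rule is finding, in each row, stretches of residues that are unique in their column. The sketch below makes three assumptions. "Unique" means the residue occurs in exactly one row of that column. Gaps are never unique. A stretch is masked when it is longer than a hypothetical `max_serial_unique` threshold. `mask_serial_unique` is an illustrative name, not the library's API, and the sketch does not attempt to reproduce the scenario's expected output above.

```ruby
# Mask runs of column-unique residues that are longer than max_serial_unique.
def mask_serial_unique(rows, max_serial_unique: 2, mask: 'X', gap: '-')
  width = rows.first.length
  # Count how often each element occurs in every column
  counts = (0...width).map do |col|
    rows.each_with_object(Hash.new(0)) { |row, tally| tally[row[col]] += 1 }
  end
  rows.map do |row|
    unique = (0...width).map { |col| row[col] != gap && counts[col][row[col]] == 1 }
    masked = row.dup   # never modify the caller's strings
    col = 0
    while col < width
      if unique[col]
        start = col
        col += 1 while col < width && unique[col]
        # Only stretches longer than the threshold are treated as outliers
        masked[start...col] = mask * (col - start) if col - start > max_serial_unique
      else
        col += 1
      end
    end
    masked
  end
end

rows = %w[AAAAAA AAAAAA ACDEFA]
p mask_serial_unique(rows)                         # prints ["AAAAAA", "AAAAAA", "AXXXXA"]
p mask_serial_unique(rows, max_serial_unique: 5)   # prints ["AAAAAA", "AAAAAA", "ACDEFA"]
```

Masking with `X` keeps the alignment width unchanged. Deleting the stretch instead, the other option the feature mentions, would change the row length and need gap padding.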
data/features/rows-feature.rb
ADDED
@@ -0,0 +1,35 @@
+Given /^I iterate the rows$/ do
+  @aln.should_not be_nil # aln is loaded by codon-feature.rb
+end
+
+row = nil
+When /^I fetch a row$/ do
+  row = @aln.rows[3]
+  row.should_not be_nil
+  row[0].to_s.should == '---'
+end
+
+When /^I inject row state$/ do
+  # tell row to handle state
+  row.extend(State)
+  row.state = RowState.new
+  row.state.deleted = true
+end
+
+Then /^I should be able to get the row state$/ do
+  row.state.deleted?.should be_true
+end
+
+list = []
+When /^I iterate a row$/ do
+  row10 = @aln.rows[10]
+  row10.each do | element |
+    list << element.to_s
+  end
+end
+
+Then /^I should get the row elements$/ do
+  list[0..10].should == ["---", "---", "---", "---", "---", "---", "---", "atg", "tcg", "tcc", "agt"]
+
+end
+

data/features/rows.feature
ADDED
@@ -0,0 +1,14 @@
+Feature: Alignment row support
+  In order to access an alignment by row
+  I want to access the row state
+
+  @rows
+  Scenario: Access row information in an alignment
+    Given I read an MSA nucleotide FASTA file in the test/data folder
+    When I fetch a row
+    When I inject row state
+    Then I should be able to get the row state
+    When I iterate a row
+    Then I should get the row elements
+
+

data/lib/bio-alignment/alignment.rb
CHANGED
@@ -1,6 +1,7 @@
 # Alignment
 
 require 'bio-alignment/pal2nal'
+require 'bio-alignment/column'
 
 module Bio
 
@@ -9,22 +10,37 @@ module Bio
     class Alignment
       include Enumerable
       include Pal2Nal
+      include Columns
 
       attr_accessor :sequences
 
-
+      # Create alignment. seqs can be a list of sequences. If these
+      # are String types, they get converted to the library Sequence
+      # container
+      def initialize seqs = nil
         @sequences = []
+        if seqs
+          seqs.each_with_index do | seq, i |
+            @sequences <<
+              if seq.kind_of?(String)
+                Sequence.new(i,seq)
+              else
+                seq
+              end
+          end
+        end
       end
 
       alias rows sequences
 
-
-
-
+      def [] index
+        rows[index]
+      end
 
       def each
         rows.each { | seq | yield seq }
       end
+
     end
   end
 end

data/lib/bio-alignment/column.rb
ADDED
@@ -0,0 +1,47 @@
+require 'bio-alignment/state'
+
+module Bio
+
+  module BioAlignment
+
+    # The Columns module provides accessors for the column list
+    # returning Column objects
+    module Columns
+
+      # Return a list of Column objects. The contents of the
+      # columns are accessed lazily
+      def columns
+        (0..num_columns-1).map { | col | Column.new(self,col) }
+      end
+
+      def num_columns
+        rows.first.length
+      end
+    end
+
+    # Support the notion of columns in an alignment. A column
+    # can have state by attaching state objects
+    class Column
+      include State
+
+      def initialize aln, col
+        @aln = aln
+        @col = col
+      end
+
+      def [] index
+        @aln[index][@col]
+      end
+
+      # iterator fetches a column on demand
+      def each
+        @aln.each do | seq |
+          yield seq[@col]
+        end
+      end
+    end
+
+  end
+
+end
+

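The `Columns`/`Column` pair above keeps only a reference to the alignment and a column index. Elements are read from the rows at the moment they are requested. The sketch below mirrors that design on plain strings. `LazyColumn` and `columns` are hypothetical stand-ins, not the library's classes. The sketch also shows a consequence of `columns` building new objects on every call. State set through `aln.columns[3].state = ...` is not seen by a second `aln.columns[3]` call, because that call returns a fresh object whose `state` is `nil`. Keeping the fetched column in a variable avoids this, as the columns feature does.

```ruby
module State
  attr_accessor :state
end

# Lazy view of one column: holds the rows and an index, copies nothing
class LazyColumn
  include Enumerable
  include State

  def initialize(rows, col)
    @rows = rows
    @col = col
  end

  # Element of row +index+ in this column
  def [](index)
    @rows[index][@col]
  end

  # Elements are fetched from the rows only while iterating
  def each
    return enum_for(:each) unless block_given?
    @rows.each { |row| yield row[@col] }
  end
end

# Works for any rows that respond to [] and length
def columns(rows)
  (0...rows.first.length).map { |col| LazyColumn.new(rows, col) }
end

rows = ["atg", "a-g", "ctg"]
middle = columns(rows)[1]
p middle.to_a              # prints ["t", "-", "t"]
rows[1] = "aag"            # change the underlying data afterwards
p middle.to_a              # prints ["t", "a", "t"], read lazily

middle.state = :deleted
p columns(rows)[1].state   # prints nil: a fresh LazyColumn object
```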
data/lib/bio-alignment/pal2nal.rb
CHANGED
@@ -3,17 +3,24 @@
 module Bio
   module BioAlignment
     module Pal2Nal
-
+
+      # Protein to nucleotide alignment, using a codon table for testing. If :do_validate is
+      # false, translation for validation is skipped (note CodonSequence translation is lazy).
+      def pal2nal nt_aln, options = { :codon_table => 1, :do_validate => true }
+        do_validate = options[:do_validate]
         aa_aln = self
         codon_aln = Alignment.new
         aa_aln.each_with_index do | aaseq, i |
           ntseq = nt_aln.sequences[i]
           raise "pal2nal sequence IDs do not match (for #{aaseq.id} != #{ntseq.id})" if aaseq.id != ntseq.id
           raise "pal2nal sequence size does not match (for #{aaseq.id}'s #{aaseq.to_s.size}!= #{ntseq.to_s.size * 3})" if aaseq.id != ntseq.id
+          # create a Codon sequence out of the nucleotide sequence (no gaps)
           codonseq = CodonSequence.new(ntseq.id, ntseq.seq, options)
 
           codon_pos = 0
           result = []
+
+          # now fill the result array by finding codons and gaps, and testing for valid amino acids
           aaseq.each do | aa |
             result <<
               if aa.gap?
@@ -21,11 +28,12 @@ module Bio
               else
                 codon = codonseq[codon_pos]
                 # validate codon translates to amino acid
-                raise "codon does not match amino acid (for #{aaseq.id}, position #{codon_pos}, #{codon} translates to #{codon.to_aa} instead of #{aa.to_s})" if codon.to_aa != aa.to_s
+                raise "codon does not match amino acid (for #{aaseq.id}, position #{codon_pos}, #{codon} translates to #{codon.to_aa} instead of #{aa.to_s})" if do_validate and codon.to_aa != aa.to_s
                 codon_pos += 1
                 codon.to_s
               end
           end
+          # the new result is transformed to a gapped CodonSequence
           codon_seq = CodonSequence.new(aaseq.id, result.join(''), options)
           codon_aln.sequences << codon_seq
         end

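The loop in `pal2nal` walks the gapped amino acid row. For each gap it emits `'---'`; for each residue it takes the next codon of the ungapped nucleotide sequence and, if requested, checks the translation. The sketch below shows that walk on plain strings. It uses a hypothetical four-entry codon table, a small subset of the standard genetic code, in place of the library's codon tables. `simple_pal2nal` is an illustrative name. The sketch adds a length check: the number of ungapped residues times three must equal the nucleotide length. In the hunk above, the size check is guarded by `aaseq.id != ntseq.id`, the same condition as the ID check on the line before it, so as shown that size check cannot fire on its own.

```ruby
# Hypothetical mini codon table: a few entries of the standard genetic code
CODON_TABLE = { 'atg' => 'M', 'tca' => 'S', 'tcg' => 'S', 'aaa' => 'K' }.freeze

# Map a gapped amino acid row onto an ungapped nucleotide sequence
def simple_pal2nal(aa_row, nt_seq, table: CODON_TABLE, do_validate: true)
  residues = aa_row.delete('-')
  if residues.length * 3 != nt_seq.length
    raise ArgumentError, "size mismatch: #{residues.length} aa need #{residues.length * 3} nt, got #{nt_seq.length}"
  end
  codons = nt_seq.scan(/.../)
  codon_pos = 0
  aa_row.each_char.map do |aa|
    next '---' if aa == '-'   # an amino acid gap becomes a codon gap
    codon = codons[codon_pos]
    codon_pos += 1
    if do_validate && table[codon] != aa.upcase
      raise ArgumentError, "#{codon} does not translate to #{aa} (codon #{codon_pos - 1})"
    end
    codon
  end.join
end

p simple_pal2nal("M-SK", "atgtcaaaa")   # prints "atg---tcaaaa"
```

As in the library version, validation can be switched off with `do_validate: false`. The length check still runs in that case, because without it the codon walk could run out of codons.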
data/lib/bio-alignment/sequence.rb
CHANGED
@@ -18,6 +18,7 @@ module Bio
     #
     class Sequence
       include Enumerable
+      include State
 
       attr_reader :id, :seq
       def initialize id, seq
@@ -29,6 +30,10 @@ module Bio
         @seq[index]
       end
 
+      def length
+        @seq.length
+      end
+
       def each
         @seq.each_char { | c | yield Element.new(c) }
       end

data/lib/bio-alignment/state.rb
ADDED
@@ -0,0 +1,31 @@
+module Bio
+
+  module BioAlignment
+
+    module State
+      attr_accessor :state
+    end
+
+    # Convenience class for tracking state. Note you can add
+    # any class you like
+    class ColumnState
+      attr_accessor :deleted
+
+      def deleted?
+        deleted == true
+      end
+    end
+
+    # Convenience class for tracking state. Note you can add
+    # any class you like
+    class RowState
+      attr_accessor :deleted
+
+      def deleted?
+        deleted == true
+      end
+    end
+
+  end
+
+end

metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-alignment
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.5
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-02-
+date: 2012-02-28 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bio-logger
-  requirement: &
+  requirement: &11633860 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *11633860
 - !ruby/object:Gem::Dependency
   name: bio
-  requirement: &
+  requirement: &11632600 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: 1.4.2
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *11632600
 - !ruby/object:Gem::Dependency
   name: bio-bigbio
-  requirement: &
+  requirement: &11622800 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>'
@@ -43,10 +43,10 @@ dependencies:
         version: 0.1.3
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11622800
 - !ruby/object:Gem::Dependency
   name: cucumber
-  requirement: &
+  requirement: &11622000 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -54,10 +54,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11622000
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &
+  requirement: &11621400 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -65,10 +65,10 @@ dependencies:
         version: 2.3.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11621400
 - !ruby/object:Gem::Dependency
   name: bundler
-  requirement: &
+  requirement: &11620480 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -76,10 +76,10 @@ dependencies:
         version: 1.0.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11620480
 - !ruby/object:Gem::Dependency
   name: jeweler
-  requirement: &
+  requirement: &11619680 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -87,7 +87,7 @@ dependencies:
         version: 1.7.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11619680
 description: Alignment handler for multiple sequence alignments (MSA)
 email: pjotr.public01@thebird.nl
 executables:
@@ -99,6 +99,7 @@ extra_rdoc_files:
 files:
 - .document
 - .rspec
+- .travis.yml
 - Gemfile
 - LICENSE.txt
 - README.md
@@ -110,14 +111,30 @@ files:
 - features/bioruby.feature
 - features/codon-feature.rb
 - features/codon.feature
+- features/columns-feature.rb
+- features/columns.feature
+- features/edit/del_bridges-feature.rb
+- features/edit/del_bridges.feature
+- features/edit/del_non_informative_sequences.feature
+- features/edit/gblocks-feature.rb
+- features/edit/gblocks.feature
+- features/edit/mask_islands-feature.rb
+- features/edit/mask_islands.feature
+- features/edit/mask_serial_mutations-feature.rb
+- features/edit/mask_serial_mutations.feature
 - features/pal2nal-feature.rb
 - features/pal2nal.feature
+- features/rows-feature.rb
+- features/rows.feature
 - lib/bio-alignment.rb
 - lib/bio-alignment/alignment.rb
 - lib/bio-alignment/bioruby.rb
 - lib/bio-alignment/codonsequence.rb
+- lib/bio-alignment/column.rb
+- lib/bio-alignment/edit/del_bridges.rb
 - lib/bio-alignment/pal2nal.rb
 - lib/bio-alignment/sequence.rb
+- lib/bio-alignment/state.rb
 - spec/bio-alignment_spec.rb
 - spec/spec_helper.rb
 - test/data/fasta/codon/aa-alignment.fa
@@ -141,7 +158,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: 1565072942973090495
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements: