bio-alignment 0.0.4 → 0.0.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.travis.yml +12 -0
- data/README.md +65 -5
- data/VERSION +1 -1
- data/doc/bio-alignment-design.md +28 -3
- data/features/bioruby-feature.rb +8 -4
- data/features/bioruby.feature +3 -2
- data/features/columns-feature.rb +33 -0
- data/features/columns.feature +14 -0
- data/features/edit/del_bridges-feature.rb +21 -0
- data/features/edit/del_bridges.feature +36 -0
- data/features/edit/del_non_informative_sequences.feature +4 -0
- data/features/edit/gblocks-feature.rb +13 -0
- data/features/edit/gblocks.feature +31 -0
- data/features/edit/mask_islands-feature.rb +12 -0
- data/features/edit/mask_islands.feature +44 -0
- data/features/edit/mask_serial_mutations-feature.rb +16 -0
- data/features/edit/mask_serial_mutations.feature +36 -0
- data/features/rows-feature.rb +35 -0
- data/features/rows.feature +14 -0
- data/lib/bio-alignment/alignment.rb +20 -4
- data/lib/bio-alignment/bioruby.rb +9 -0
- data/lib/bio-alignment/codonsequence.rb +4 -0
- data/lib/bio-alignment/column.rb +47 -0
- data/lib/bio-alignment/edit/del_bridges.rb +11 -0
- data/lib/bio-alignment/pal2nal.rb +10 -2
- data/lib/bio-alignment/sequence.rb +5 -0
- data/lib/bio-alignment/state.rb +31 -0
- metadata +34 -17
data/.travis.yml
ADDED
@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.2
+  - jruby-19mode # JRuby in 1.9 mode
+# - 1.8.7
+# - 1.9.3
+# - rbx-19mode
+# - jruby-18mode # JRuby in 1.8 mode
+# - rbx-18mode
+
+# uncomment this line if your project needs to run something other than `rake`:
+# script: bundle exec rspec spec
data/README.md
CHANGED
@@ -41,9 +41,43 @@ aligmment (note codon gaps are represented by '---')
   end
 ```

+Now add some state - you can define your own row state
+
+```ruby
+  # tell the row to handle state
+  aln[0].extend(State)
+  # mark the first row for deletion
+  aln[0].state = MyStateDeleteObject.new
+  if aln.rows[0].state.deleted?
+    # do something
+  end
+```
+
+### Accessing columns
+
+BioAlignment has a module for handling columns in an alignment. As
+long as the contained sequence objects have the [] and length methods,
+they can lazily be iterated by column. To get a column and iterate it
+
+```ruby
+  column = aln.columns[3]
+  column.each do |element|
+    p element
+  end
+```
+
+Now add some state - you can define your own column state
+
+```ruby
+  aln.columns[3].state = MyStateDeleteObject.new
+  if aln.columns[3].state.deleted?
+    # do something
+  end
+```
+
 ### BioRuby Sequence objects

-The BioAlignment supports BioRuby's Bio::Sequence objects:
+BioAlignment supports adding BioRuby's Bio::Sequence objects:

 ```ruby
   require 'bio' # BioRuby
@@ -54,6 +88,14 @@ The BioAlignment supports BioRuby's Bio::Sequence objects:
   aln << Bio::Sequence::NA.new("atg---tcaaaa")
 ```

+and we can transform BioAlignment into BioRuby's Bio::Alignment and
+use BioRuby functions
+
+```ruby
+  bioruby_aln = aln.to_bioruby_alignment
+  bioruby_aln.consensus_iupac
+```
+
 ### Pal2nal

 A protein (amino acid) to nucleotide alignment would first load
@@ -72,7 +114,7 @@ the sequences
   end
 ```

-Write a (simple) version of pal2nal would be something like
+Writing a (simple) version of pal2nal would be something like

 ```ruby
   fasta3 = FastaWriter.new('nt-aln.fa')
@@ -95,15 +137,33 @@ Write a (simple) version of pal2nal would be something like
   end
 ```

-With amino acid
-
+With amino acid aa_aln and nucleotide nt_aln loaded, the library
+version of pal2nal includes validation

 ```ruby
-
+  aln = aa_aln.pal2nal(nt_aln, :codon_table => 3, :do_validate => true)
 ```

 resulting in the codon alignment.

+### Alignment editing
+
+BioAlignment supports multiple alignment editing features, which are
+listed
+[here](https://github.com/pjotrp/bioruby-alignment/tree/master/features/edit).
+Each edition feature is added at runtime(!) Example:
+
+```ruby
+  require 'bio-alignment/edit/del_bridges'
+
+  aln.extend DelBridges # bring the module into scope
+  aln2 = aln.clean(50) # execute the alignment editor
+```
+
+
+
+### See also
+
 The API documentation is online. For more code examples see
 [./spec/*.rb](https://github.com/pjotrp/bioruby-alignment/tree/master/spec) and
 [./features/*](https://github.com/pjotrp/bioruby-alignment/tree/master/features).
data/VERSION
CHANGED
@@ -1 +1 @@
-0.0.
+0.0.5
data/doc/bio-alignment-design.md
CHANGED
@@ -22,13 +22,13 @@ say we have a nucleotide sequence with pay load
 5 9 * 1

 most library implementations will have two strings "AGTA" and "59*1".
-Removing the third
+Removing the third nucleotide would mean removing it twice, into first
 "AGA", and second "591". With bio-alignment this is one action because we
 have one object for each element that contains both values, e.g. the
 payload of 'T' is '*'. Moving 'T' automatically moves '*'.

 In addition the bio-alignment library deals with codons and codon translation.
-Rather than track
+Rather than track multiple matrices, the codon is viewed as an element,
 and the translated codon as the pay load. Again, when an alignment gets
 reordered the code only has to do it in one place.

@@ -89,7 +89,7 @@ do a fancy

 Elements in the list should respond to a gap? method, for an alignment
 gap, and the undefined? method for a position that is either an
-element or a gap. Also it should
+element or a gap. Also it should respond to the to_s method.

 An element can contain any pay load. If a list of attributes exists
 in the sequence object, it can be used.
@@ -100,6 +100,11 @@ The column list tracks the columns of the alignment. The requirement
 is that it should be iterable and can be indexed. The Column contains
 no elements, but may point to a list when the alignment is transposed.

+One of the 'features' of this library is that the Column access logic is
+split out into a separate module, which accesses the data in a lazy fashion.
+Also column state is stored as an 'any object'. I.e. a column can contain
+any state.
+
 ## Matrix or MSA

 The Matrix consists of a Column list, multiple Sequences, in turn
@@ -130,6 +135,26 @@ The Matrix can be accessed in transposed fashion, but accessing the normal
 matrix and transposed matrix at the same time is not supported. Matrix is not
 designed to be transaction safe - though you can copy the Matrix any time.

+## Adding functionality
+
+To ascertain that the basic BioAlignment does not get polluted, extra functionality
+is added by Modules. These modules can be added at run time(!) One advantage is
+that there is less name space pollution, the other is that different implementations
+can be plugged in - using the same interface. For example, here we are going to
+use an alignment editor named DelBridges, which has a method named clean:
+
+```ruby
+  require 'bio-alignment/edit/del_bridges'
+
+  aln = Alignment.new(string.split(/\n/))
+  aln.extend DelBridges # bring the module into scope
+  aln2 = aln.clean
+```
+
+in other words, the functionality in DelBridges gets attached to the aln
+instance at run time, without affecting any other Alignment object(!) Also,
+when not requiring 'bio-alignment/edit/del_bridges', the functionality is never
+visible, and never added to the environment.


 Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
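The design notes argue for one object per alignment element, holding both the residue and its payload. The plain-Ruby sketch below illustrates that idea. The `Element` struct is hypothetical and is not the library's own class; it only mirrors the `gap?` and `to_s` methods the notes ask for. Deleting the third element removes 'T' and '*' in a single action.

```ruby
# Hypothetical element type: a residue plus an arbitrary payload
Element = Struct.new(:base, :payload) do
  # true for an alignment gap
  def gap?
    base == '-'
  end

  def to_s
    base
  end
end

# "AGTA" with payload "59*1", one object per position
seq = "AGTA".chars.zip("59*1".chars).map { |base, payload| Element.new(base, payload) }

# one action removes the nucleotide and its payload together
seq.delete_at(2)

bases    = seq.map(&:to_s).join     # "AGA"
payloads = seq.map(&:payload).join  # "591"
```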
data/features/bioruby-feature.rb
CHANGED
@@ -67,14 +67,18 @@ end
 # ----

 Given /^I have a BioAlignment$/ do
-
+  @aln1 = Alignment.new
+  fasta = FastaReader.new('test/data/fasta/codon/aa-alignment.fa')
+  fasta.each do | rec |
+    @aln1.sequences << Sequence.new(rec.id, rec.seq)
+  end
 end

 When /^I convert$/ do
-
+  @bioruby_alignment = @aln1.to_bioruby_alignment
 end

-Then /^I should have a Bio::Alignment$/ do
-
+Then /^I should have a BioRuby Bio::Alignment$/ do
+  @bioruby_alignment.consensus_iupac[0..8].should == '???????v?'
 end

data/features/bioruby.feature
CHANGED
@@ -3,6 +3,7 @@ Feature: BioAlignment should play with BioRuby
   I want to convert BioAlignment to Bio::Alignment
   And I want to support Bio::Sequence objects

+  @bioruby
   Scenario: Use Bio::Sequence to fill BioAlignment
     Given I have multiple Bio::Sequence objects
     When I assign BioAlignment
@@ -17,7 +18,7 @@ Feature: BioAlignment should play with BioRuby
     And and return a partial AA sequence
     And be AA indexable

-  Scenario: Convert BioAlignment to Bio::Alignment
+  Scenario: Convert BioAlignment to BioRuby Bio::Alignment
     Given I have a BioAlignment
     When I convert
-    Then I should have a Bio::Alignment
+    Then I should have a BioRuby Bio::Alignment
@@ -0,0 +1,33 @@
+Given /^I iterate the columns$/ do
+  @aln.should_not be_nil # aln is loaded by codon-feature.rb
+end
+
+column = nil
+When /^I fetch a column$/ do
+  column = @aln.columns[3]
+  column.should_not be_nil
+  column[0].to_s.should == 'cga'
+end
+
+When /^I inject column state$/ do
+  column.state = ColumnState.new
+  column.state.deleted = true
+end
+
+Then /^I should be able to get the column state$/ do
+  column.state.deleted?.should be_true
+end
+
+list = []
+When /^I iterate a column$/ do
+  column10 = @aln.columns[10]
+  column10.each do | element |
+    list << element.to_s
+  end
+end
+
+Then /^I should get the column elements$/ do
+  list[0..10].should ==
+    ["ctt", "gcg", "ctt", "ttt", "gcg", "ttt", "ttt", "agt", "ttt", "atg", "agt"]
+end
+
@@ -0,0 +1,14 @@
+Feature: Alignment column support
+  In order to access an alignment by column
+  I want to access column state
+  I want to get all elements in a column
+
+  @columns
+  Scenario: Access column information in an alignment
+    Given I read an MSA nucleotide FASTA file in the test/data folder
+    When I fetch a column
+    When I inject column state
+    Then I should be able to get the column state
+    When I iterate a column
+    Then I should get the column elements
+
@@ -0,0 +1,21 @@
+require 'bio-alignment'
+require 'bio-alignment/edit/del_bridges'
+
+Given /^I have a bridged alignment$/ do |string|
+  @aln = Alignment.new(string.split(/\n/))
+end
+
+When /^I apply the bridge rule$/ do
+  @aln.extend DelBridges
+  aln2 = @aln.clean
+end
+
+Then /^it should have removed (\d+) bridges$/ do |arg1, string|
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^I should be able to track removed columns$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+
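The step definitions attach DelBridges to one alignment object with `Object#extend`. The sketch below shows that mechanism outside the library using two hypothetical stand-ins: `ToyAlignment` and an editor module `DropAllGapColumns`. The editor only removes columns that are gaps in every row, a much simpler rule than DelBridges. Only the extended instance gains `clean`; the class and other instances are untouched.

```ruby
# Hypothetical stand-in for an alignment: just a list of row strings
class ToyAlignment
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end
end

# Hypothetical editor module; like DelBridges it relies only on #rows
module DropAllGapColumns
  # Return a new ToyAlignment without the columns that are gaps in every row
  def clean
    width = rows.first.size
    keep = (0...width).reject { |c| rows.all? { |row| row[c] == '-' } }
    ToyAlignment.new(rows.map { |row| keep.map { |c| row[c] }.join })
  end
end

aln   = ToyAlignment.new(["A-C", "G-T"])
other = ToyAlignment.new(["A-C", "G-T"])

aln.extend(DropAllGapColumns)  # attach the editor to this one instance at run time

aln.clean.rows             # => ["AC", "GT"]
other.respond_to?(:clean)  # => false
```

Because the module is mixed into a single object's singleton class, two editors that both define `clean` can be plugged into different alignment objects without clashing.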
@@ -0,0 +1,36 @@
+Feature: Alignment editing, the bridge rule
+  Remove columns that contain too many gaps
+
+  Drop all bridges in less than 'min_bridges_fraction' (default 1/3 or 33%).
+
+  The dropped columns are tracked by the table columns.
+
+  @dev
+  Scenario: Apply bridge rule to an amino acid alignment
+    Given I have a bridged alignment
+    """
+    ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+    SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+    SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+    ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+    ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+    ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+    -------------IFHAVR-TC-HP-----------------
+    """
+    When I apply the bridge rule
+    Then it should have removed 4 bridges
+    """
+    SNSFSRPTIIFSGCSTACSGKSELVCGFRSFMLSDV
+    SNSFSRPTIIFSGCSTACSGKSEQVCGFR---LSDV
+    SNSFSRPTIIFSGCSTACSGKSEQVCGFR---LSDV
+    PKLFSRPTIIFSGCSTACSGKSEPVCGFRSFMLSDV
+    ------PTIIFSGCSKACSGKSELVCGFRSFMLSDV
+    ------PTIIFSGCSKACSGK---FRSFRSFMLSAV
+    ------PTIIFSGCSKACSGK---VCGIFHAVRSFM
+    ------PTIIFSGCSKACSGKSELVCGFRSFMLSAV
+    ---------IFHAVR-TC-HP---------------
+    """
+    Then I should be able to track removed columns
+
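The sketch below implements one reading of the bridge rule. A column is dropped when the fraction of rows holding a residue (not a gap) in it is below `min_bridges_fraction`, 1/3 by default. The indices of dropped columns are returned so they can be tracked. This is an assumption about the intended semantics, since the DelBridges source is not part of this diff. `del_bridges` is a hypothetical helper. `Rational` keeps the comparison exact, so a column in which exactly one third of the rows hold a residue is kept.

```ruby
# Sketch of the bridge rule: drop columns whose fraction of residues
# (non-gap characters) is below min_bridges_fraction.
# Returns [edited_rows, dropped_column_indices].
def del_bridges(rows, min_bridges_fraction: Rational(1, 3), gap: '-')
  width = rows.first.size
  dropped = (0...width).select do |c|
    residues = rows.count { |row| row[c] != gap }
    Rational(residues, rows.size) < min_bridges_fraction
  end
  kept = (0...width).to_a - dropped
  edited = rows.map { |row| kept.map { |c| row[c] }.join }
  [edited, dropped]
end

# the scenario's input alignment
aln = [
  "----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV",
  "SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV",
  "SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV",
  "----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV",
  "----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV",
  "----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV",
  "----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM",
  "----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV",
  "-------------IFHAVR-TC-HP-----------------"
]

edited, dropped = del_bridges(aln)
dropped  # => [0, 1, 2, 3, 25, 26]
```

On the scenario's input this drops columns 0 to 3, 25 and 26, and produces the expected alignment listed in the scenario. It drops six single columns, so this sketch does not settle how the scenario arrives at "4 bridges".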
@@ -0,0 +1,13 @@
+When /^I apply GBlocks$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^it should return the GBlocks cleaned alignment$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^return a list of removed columns$/ do
+  pending # express the regexp above with the code you wish you had
+end
+
+
@@ -0,0 +1,31 @@
+Feature: GBlocks implementation in Ruby
+
+  The GBlocks routine is often used, but the source code is not open source. This
+  is a feature request for a reimplementation of GBlocks. Some links:
+
+  Open sourcing request by Debian: http://lists.debian.org/debian-med/2011/02/msg00008.html
+
+  Binary download of GBlocks: http://molevol.cmima.csic.es/castresana/Gblocks.html
+
+  Documentation: http://molevol.cmima.csic.es/castresana/Gblocks/Gblocks_documentation.html
+
+  It is quite a simple routine, and would be easy to validate against existing outcomes.
+
+  Scenario: Apply GBlocks to an alignment
+    Given I have an alignment
+    """
+    ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+    SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+    SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+    ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+    ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+    ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+    -------------IFHAVR-TC-HP-----------------
+    """
+    When I apply GBlocks
+    Then it should return the GBlocks cleaned alignment
+    And return a list of removed columns
+
+
@@ -0,0 +1,12 @@
+require 'bio-alignment'
+
+When /^I apply island rule with max_gap_size (\d+)$/ do |arg1|
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^it should result in$/ do |string|
+  pending # express the regexp above with the code you wish you had
+end
+
+
+
@@ -0,0 +1,44 @@
+Feature: Alignment editing with the Island rule
+  The idea is to drop hypervariable floating sequences, as they are probably
+  misaligned.
+
+  Drop all 'islands' in a sequence with low island consensus, that show a gap
+  larger than 'max_gap_size' (default 6) on both sides, and are shorter than
+  'min_island_size' (default 30). The latter may be a large size, as an island
+  needs to loop in and out several times to be (arguably) functional. We also
+  add a parameter 'max_gap_size_inside' (default 2) which allows for small gaps
+  inside the island - though the total island size is calculated including
+  those small gaps.
+
+  The island consensus is calculated by column.
+  'max_island_elements_unique_percentage' (default 10%) of elements in the
+  island should have a 'min_island_column_matched' (default 1) somewhere in the
+  element's column.
+
+  Scenario: Apply island rule to an amino acid alignment
+    Given I have an alignment
+    """
+    ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+    SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+    SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+    ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+    ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+    ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+    -------------IFHAVR-TC-HP-----------------
+    """
+    When I apply island rule with max_gap_size 4
+    Then it should have removed 2 islands
+    """
+    ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+    SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+    SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
+    ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGK-----VCGFRSFMLSAV
+    ----------PTIIFSGCSKACSGK-----------------
+    ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+    ------------------------------------------
+    """
+
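The sketch below is a partial implementation of the island rule. It covers only the gap-flank and size criteria described above:

- Non-gap runs separated by at most `max_gap_size_inside` gaps are merged into one segment.
- A segment is a candidate island when the gaps on both sides are longer than `max_gap_size` and its total length, inner gaps included, is below `min_island_size`.

Treating the row edges as flanking gaps is an assumption. The column-consensus test (`max_island_elements_unique_percentage`, `min_island_column_matched`) is deliberately left out. `find_islands` is a hypothetical helper, not part of the library.

```ruby
# Sketch of the gap-flank and size criteria of the island rule (no column consensus).
# Returns [start, stop] index pairs (inclusive) of candidate islands in one row.
def find_islands(row, max_gap_size: 6, min_island_size: 30, max_gap_size_inside: 2, gap: '-')
  # 1. collect maximal runs of non-gap characters
  runs = []
  row.each_char.with_index do |c, i|
    next if c == gap
    if !runs.empty? && runs.last[1] == i - 1
      runs.last[1] = i
    else
      runs << [i, i]
    end
  end

  # 2. merge runs separated by small inner gaps into one segment
  segments = []
  runs.each do |start, stop|
    if !segments.empty? && start - segments.last[1] - 1 <= max_gap_size_inside
      segments.last[1] = stop
    else
      segments << [start, stop]
    end
  end

  # 3. keep short segments with a long gap on both sides (row edges count as gaps)
  segments.each_with_index.select do |(start, stop), k|
    left  = k.zero? ? start : start - segments[k - 1][1] - 1
    right = k == segments.size - 1 ? row.size - 1 - stop : segments[k + 1][0] - stop - 1
    left > max_gap_size && right > max_gap_size && stop - start + 1 < min_island_size
  end.map(&:first)
end

# the scenario's last row: 13 gaps, the IFHAVR-TC-HP fragment, 17 gaps
find_islands("-" * 13 + "IFHAVR-TC-HP" + "-" * 17, max_gap_size: 4)  # => [[13, 24]]
```

The consensus test is still needed. With `max_gap_size` 4, the sketch also flags the conserved `PTIIFSGCSKACSGK` block in the sixth row, which is exactly the case the consensus criterion is meant to rule out.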
@@ -0,0 +1,16 @@
+require 'bio-alignment'
+
+Given /^I have an alignment$/ do |string|
+  @aln = Alignment.new(string.split(/\n/))
+  p @aln
+end
+
+When /^I apply rule masking with X and max_gap_size (\d+)$/ do |arg1|
+  pending # express the regexp above with the code you wish you had
+end
+
+Then /^it should have removed (\d+) islands$/ do |arg1, string|
+  pending # express the regexp above with the code you wish you had
+end
+
+
@@ -0,0 +1,36 @@
+Feature: Alignment editing masking serial mutations
+  Edit an alignment removing or masking unique elements column-wise.
+
+  If a sequence has a unique AA in a column it is a single mutation event. If
+  multiple neighbouring AA's are also unique we suspect the sequence is an
+  outlier. This rule masks, or deletes, stretches of unique AAs. The stretch of
+  unique AA's is defined in 'max_serial_unique' (default 5, so two bordering
+  unique AA's are allowed).
+
+  Scenario: Apply rule to an amino acid alignment
+    Given I have an alignment
+    """
+    ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+    SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+    SSIISNSFSRPTIIFSGCSTACSQQKLTSEQVCFR---LSDV
+    ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
+    ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
+    ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+    -------------IFHAVR-TC-HP-----------------
+    """
+    When I apply rule masking with X and max_gap_size 5
+    Then it should result in
+    """
+    ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
+    SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
+    SSIISNSFSRPTIIFSGCSTACXXXXXXXXXXXFR---LSDV
+    ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
+    ----------PTIIFSGCSKACSGK-----VCGFRSFMLSAV
+    ----------PTIIFSGCSKACSGK-----------------
+    ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
+    -------------XXXXXX-XX-XX-----------------
+    """
+
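The sketch below implements the masking variant of this rule under three assumptions:

- A residue is unique when no other row has the same character in that column.
- Gaps are never unique.
- A run of unique residues is masked with 'X' when it is longer than `max_serial_unique`.

The feature text leaves the exact threshold semantics open, so the `>` comparison is a guess. `mask_serial_mutations` is a hypothetical helper. The example uses a small toy alignment rather than the scenario's data.

```ruby
# Sketch: mask runs of column-unique residues longer than max_serial_unique.
def mask_serial_mutations(rows, max_serial_unique: 5, mask: 'X', gap: '-')
  width = rows.first.size
  # count how often each character occurs in every column
  counts = (0...width).map do |c|
    rows.each_with_object(Hash.new(0)) { |row, h| h[row[c]] += 1 }
  end

  rows.map do |row|
    # a residue is unique if no other row has the same character in that column
    unique = (0...width).map { |c| row[c] != gap && counts[c][row[c]] == 1 }
    masked = row.dup
    c = 0
    while c < width
      unless unique[c]
        c += 1
        next
      end
      # extend the stretch of consecutive unique residues
      stop = c
      stop += 1 while stop + 1 < width && unique[stop + 1]
      # only stretches longer than the threshold are treated as outliers
      (c..stop).each { |i| masked[i] = mask } if stop - c + 1 > max_serial_unique
      c = stop + 1
    end
    masked
  end
end

toy = ["ACDEFGHIK", "ACDEFGHIK", "ACWWWWHIK"]
mask_serial_mutations(toy, max_serial_unique: 3)
# => ["ACDEFGHIK", "ACDEFGHIK", "ACXXXXHIK"]
```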
@@ -0,0 +1,35 @@
+Given /^I iterate the rows$/ do
+  @aln.should_not be_nil # aln is loaded by codon-feature.rb
+end
+
+row = nil
+When /^I fetch a row$/ do
+  row = @aln.rows[3]
+  row.should_not be_nil
+  row[0].to_s.should == '---'
+end
+
+When /^I inject row state$/ do
+  # tell row to handle state
+  row.extend(State)
+  row.state = RowState.new
+  row.state.deleted = true
+end
+
+Then /^I should be able to get the row state$/ do
+  row.state.deleted?.should be_true
+end
+
+list = []
+When /^I iterate a row$/ do
+  row10 = @aln.rows[10]
+  row10.each do | element |
+    list << element.to_s
+  end
+end
+
+Then /^I should get the row elements$/ do
+  list[0..10].should == ["---", "---", "---", "---", "---", "---", "---", "atg", "tcg", "tcc", "agt"]
+
+end
+
@@ -0,0 +1,14 @@
+Feature: Alignment row support
+  In order to access an alignment by row
+  I want to access the row state
+
+  @rows
+  Scenario: Access row information in an alignment
+    Given I read an MSA nucleotide FASTA file in the test/data folder
+    When I fetch a row
+    When I inject row state
+    Then I should be able to get the row state
+    When I iterate a row
+    Then I should get the row elements
+
+
@@ -1,6 +1,7 @@
 # Alignment

 require 'bio-alignment/pal2nal'
+require 'bio-alignment/column'

 module Bio

@@ -9,22 +10,37 @@ module Bio
     class Alignment
       include Enumerable
       include Pal2Nal
+      include Columns

       attr_accessor :sequences

-
+      # Create alignment. seqs can be a list of sequences. If these
+      # are String types, they get converted to the library Sequence
+      # container
+      def initialize seqs = nil
         @sequences = []
+        if seqs
+          seqs.each_with_index do | seq, i |
+            @sequences <<
+              if seq.kind_of?(String)
+                Sequence.new(i,seq)
+              else
+                seq
+              end
+          end
+        end
       end

       alias rows sequences

-
-
-
+      def [] index
+        rows[index]
+      end

       def each
         rows.each { | seq | yield seq }
       end
+
     end
   end
 end
@@ -0,0 +1,47 @@
+require 'bio-alignment/state'
+
+module Bio
+
+  module BioAlignment
+
+    # The Columns module provides accessors for the column list
+    # returning Column objects
+    module Columns
+
+      # Return a list of Column objects. The contents of the
+      # columns are accessed lazily
+      def columns
+        (0..num_columns-1).map { | col | Column.new(self,col) }
+      end
+
+      def num_columns
+        rows.first.length
+      end
+    end
+
+    # Support the notion of columns in an alignment. A column
+    # can have state by attaching state objects
+    class Column
+      include State
+
+      def initialize aln, col
+        @aln = aln
+        @col = col
+      end
+
+      def [] index
+        @aln[index][@col]
+      end
+
+      # iterator fetches a column on demand
+      def each
+        @aln.each do | seq |
+          yield seq[@col]
+        end
+      end
+    end
+
+  end
+
+end
+
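As written, `columns` in the module above maps over fresh `Column.new` objects on every call. State set through `aln.columns[3].state = ...` is therefore attached to a throwaway object, and a second `aln.columns[3]` lookup returns a column without it. The column step definitions avoid this by keeping the column in a variable. The sketch below memoizes the Column objects instead. It uses simplified stand-ins for `State`, `Column` and `Columns`, plus a hypothetical `ToyAlignment` and `DeletedState`; it is not the library code.

```ruby
# Simplified stand-in for the library's State mixin
module State
  attr_accessor :state
end

# Simplified Column: reads its elements lazily from the alignment rows
class Column
  include State

  def initialize(aln, col)
    @aln = aln
    @col = col
  end

  def [](index)
    @aln[index][@col]
  end

  def each
    @aln.each { |seq| yield seq[@col] }
  end
end

# Columns variant that builds the Column objects once and reuses them
module Columns
  def columns
    @columns ||= (0...num_columns).map { |col| Column.new(self, col) }
  end

  def num_columns
    rows.first.length
  end
end

# Hypothetical alignment holding plain strings as rows
class ToyAlignment
  include Columns
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def [](index)
    rows[index]
  end

  def each(&block)
    rows.each(&block)
  end
end

# Hypothetical state object
DeletedState = Struct.new(:deleted) do
  def deleted?
    deleted == true
  end
end

aln = ToyAlignment.new(["atgcga", "atgcta"])
aln.columns[1].state = DeletedState.new(true)
aln.columns[1].state.deleted?  # => true, the same Column object is returned
```

Each Column still reads its elements lazily from the rows, so caching the objects is cheap. The trade-off is that the cache must be reset (for example with `@columns = nil`) if columns are added or removed.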
@@ -3,17 +3,24 @@
 module Bio
   module BioAlignment
     module Pal2Nal
-
+
+      # Protein to nucleotide alignment, using a codon table for testing. If :do_validate is
+      # false, translation for validation is skipped (note CodonSequence translation is lazy).
+      def pal2nal nt_aln, options = { :codon_table => 1, :do_validate => true }
+        do_validate = options[:do_validate]
         aa_aln = self
         codon_aln = Alignment.new
         aa_aln.each_with_index do | aaseq, i |
           ntseq = nt_aln.sequences[i]
           raise "pal2nal sequence IDs do not match (for #{aaseq.id} != #{ntseq.id})" if aaseq.id != ntseq.id
           raise "pal2nal sequence size does not match (for #{aaseq.id}'s #{aaseq.to_s.size}!= #{ntseq.to_s.size * 3})" if aaseq.id != ntseq.id
+          # create a Codon sequence out of the nucleotide sequence (no gaps)
           codonseq = CodonSequence.new(ntseq.id, ntseq.seq, options)

           codon_pos = 0
           result = []
+
+          # now fill the result array by finding codons and gaps, and testing for valid amino acids
           aaseq.each do | aa |
             result <<
               if aa.gap?
@@ -21,11 +28,12 @@ module Bio
               else
                 codon = codonseq[codon_pos]
                 # validate codon translates to amino acid
-                raise "codon does not match amino acid (for #{aaseq.id}, position #{codon_pos}, #{codon} translates to #{codon.to_aa} instead of #{aa.to_s})" if codon.to_aa != aa.to_s
+                raise "codon does not match amino acid (for #{aaseq.id}, position #{codon_pos}, #{codon} translates to #{codon.to_aa} instead of #{aa.to_s})" if do_validate and codon.to_aa != aa.to_s
                 codon_pos += 1
                 codon.to_s
               end
           end
+          # the new result is transformed to a gapped CodonSequence
           codon_seq = CodonSequence.new(aaseq.id, result.join(''), options)
           codon_aln.sequences << codon_seq
         end
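The sketch below shows the core of pal2nal for a single pair of sequences. Each gap in the protein row becomes '---', and each residue takes the next codon of the ungapped nucleotide sequence, with an optional translation check. The three-entry `CODON_TABLE` is a hypothetical stand-in for a real translation table, and `pal2nal_row` is not the library method. The sketch also checks that the nucleotide length is three times the number of residues.

```ruby
# Hypothetical three-entry codon table; a real run would use a full translation table
CODON_TABLE = { 'atg' => 'M', 'tca' => 'S', 'aaa' => 'K' }.freeze

# Sketch: thread the codons of an ungapped nucleotide sequence through a
# gapped amino acid sequence. A gap becomes '---', a residue takes the next codon.
def pal2nal_row(aa_seq, nt_seq, do_validate: true)
  residues = aa_seq.delete('-')
  if nt_seq.size != residues.size * 3
    raise "size mismatch: #{residues.size} residues need #{residues.size * 3} nt, got #{nt_seq.size}"
  end
  codons = nt_seq.scan(/.../)
  pos = 0
  aa_seq.each_char.map do |aa|
    next '---' if aa == '-'
    codon = codons[pos]
    # optionally check that the codon translates to the aligned amino acid
    if do_validate && CODON_TABLE[codon] != aa
      raise "codon #{codon} at position #{pos} does not translate to #{aa}"
    end
    pos += 1
    codon
  end.join
end

pal2nal_row("M-SK", "atgtcaaaa")                      # => "atg---tcaaaa"
pal2nal_row("M-KK", "atgtcaaaa", do_validate: false)  # => "atg---tcaaaa"
```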
@@ -18,6 +18,7 @@ module Bio
     #
     class Sequence
       include Enumerable
+      include State

       attr_reader :id, :seq
       def initialize id, seq
@@ -29,6 +30,10 @@ module Bio
         @seq[index]
       end

+      def length
+        @seq.length
+      end
+
       def each
         @seq.each_char { | c | yield Element.new(c) }
       end
@@ -0,0 +1,31 @@
+module Bio
+
+  module BioAlignment
+
+    module State
+      attr_accessor :state
+    end
+
+    # Convenience class for tracking state. Note you can add
+    # any class you like
+    class ColumnState
+      attr_accessor :deleted
+
+      def deleted?
+        deleted == true
+      end
+    end
+
+    # Convenience class for tracking state. Note you can add
+    # any class you like
+    class RowState
+      attr_accessor :deleted
+
+      def deleted?
+        deleted == true
+      end
+    end
+
+  end
+
+end
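Row state works the same way: any row object that responds to `state` can carry an arbitrary state object. The sketch below uses a hypothetical `Row` struct and a user-defined `MarkedForDeletion` state, which stands in for the README's `MyStateDeleteObject` placeholder. It uses `extend` so that only one row gains the accessor. This release also adds `include State` to `Sequence`, so library sequences already respond to `state`; the `extend` step matters only for row objects of other classes.

```ruby
# Stand-in for the library's State mixin (same shape as state.rb above)
module State
  attr_accessor :state
end

# Hypothetical user-defined state object; any object with the methods your code calls will do
class MarkedForDeletion
  attr_reader :reason

  def initialize(reason)
    @reason = reason
  end

  def deleted?
    true
  end
end

# Hypothetical row type standing in for a sequence object
Row = Struct.new(:id, :seq)

rows = [Row.new("seq1", "atg---tcaaaa"), Row.new("seq2", "atgtcatcaaaa")]
rows[0].extend(State)  # only this row gains #state and #state=
rows[0].state = MarkedForDeletion.new("too many gaps")

# keep rows that have no state, or whose state does not mark them deleted
kept = rows.reject { |row| row.respond_to?(:state) && row.state && row.state.deleted? }
```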
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-alignment
 version: !ruby/object:Gem::Version
-  version: 0.0.
+  version: 0.0.5
 prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-02-
+date: 2012-02-28 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bio-logger
-  requirement: &
+  requirement: &11633860 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: '0'
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *11633860
 - !ruby/object:Gem::Dependency
   name: bio
-  requirement: &
+  requirement: &11632600 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: 1.4.2
   type: :runtime
   prerelease: false
-  version_requirements: *
+  version_requirements: *11632600
 - !ruby/object:Gem::Dependency
   name: bio-bigbio
-  requirement: &
+  requirement: &11622800 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>'
@@ -43,10 +43,10 @@ dependencies:
         version: 0.1.3
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11622800
 - !ruby/object:Gem::Dependency
   name: cucumber
-  requirement: &
+  requirement: &11622000 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -54,10 +54,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11622000
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &
+  requirement: &11621400 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -65,10 +65,10 @@ dependencies:
         version: 2.3.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11621400
 - !ruby/object:Gem::Dependency
   name: bundler
-  requirement: &
+  requirement: &11620480 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -76,10 +76,10 @@ dependencies:
         version: 1.0.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11620480
 - !ruby/object:Gem::Dependency
   name: jeweler
-  requirement: &
+  requirement: &11619680 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -87,7 +87,7 @@ dependencies:
         version: 1.7.0
   type: :development
   prerelease: false
-  version_requirements: *
+  version_requirements: *11619680
 description: Alignment handler for multiple sequence alignments (MSA)
 email: pjotr.public01@thebird.nl
 executables:
@@ -99,6 +99,7 @@ extra_rdoc_files:
 files:
 - .document
 - .rspec
+- .travis.yml
 - Gemfile
 - LICENSE.txt
 - README.md
@@ -110,14 +111,30 @@ files:
 - features/bioruby.feature
 - features/codon-feature.rb
 - features/codon.feature
+- features/columns-feature.rb
+- features/columns.feature
+- features/edit/del_bridges-feature.rb
+- features/edit/del_bridges.feature
+- features/edit/del_non_informative_sequences.feature
+- features/edit/gblocks-feature.rb
+- features/edit/gblocks.feature
+- features/edit/mask_islands-feature.rb
+- features/edit/mask_islands.feature
+- features/edit/mask_serial_mutations-feature.rb
+- features/edit/mask_serial_mutations.feature
 - features/pal2nal-feature.rb
 - features/pal2nal.feature
+- features/rows-feature.rb
+- features/rows.feature
 - lib/bio-alignment.rb
 - lib/bio-alignment/alignment.rb
 - lib/bio-alignment/bioruby.rb
 - lib/bio-alignment/codonsequence.rb
+- lib/bio-alignment/column.rb
+- lib/bio-alignment/edit/del_bridges.rb
 - lib/bio-alignment/pal2nal.rb
 - lib/bio-alignment/sequence.rb
+- lib/bio-alignment/state.rb
 - spec/bio-alignment_spec.rb
 - spec/spec_helper.rb
 - test/data/fasta/codon/aa-alignment.fa
@@ -141,7 +158,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash:
+      hash: 1565072942973090495
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements: