bio-alignment 0.0.4 → 0.0.5

data/.travis.yml ADDED
@@ -0,0 +1,12 @@
+ language: ruby
+ rvm:
+ - 1.9.2
+ - jruby-19mode # JRuby in 1.9 mode
+ # - 1.8.7
+ # - 1.9.3
+ # - rbx-19mode
+ # - jruby-18mode # JRuby in 1.8 mode
+ # - rbx-18mode
+
+ # uncomment this line if your project needs to run something other than `rake`:
+ # script: bundle exec rspec spec
data/README.md CHANGED
@@ -41,9 +41,43 @@ aligmment (note codon gaps are represented by '---')
41
41
  end
42
42
  ```
43
43
 
44
+ Now add some state - you can define your own row state
45
+
46
+ ```ruby
47
+ # tell the row to handle state
48
+ aln[0].extend(State)
49
+ # mark the first row for deletion
50
+ aln[0].state = MyStateDeleteObject.new
51
+ if aln.rows[0].state.deleted?
52
+ # do something
53
+ end
54
+ ```
55
+
56
+ ### Accessing columns
57
+
58
+ BioAlignment has a module for handling columns in an alignment. As
59
+ long as the contained sequence objects have the [] and length methods,
60
+ they can lazily be iterated by column. To get a column and iterate it
61
+
62
+ ```ruby
63
+ column = aln.columns[3]
64
+ column.each do |element|
65
+ p element
66
+ end
67
+ ```
68
+
69
+ Now add some state - you can define your own column state
70
+
71
+ ```ruby
72
+ column = aln.columns[3] # keep a reference: columns builds new Column objects on each call
+ column.state = MyStateDeleteObject.new
+ if column.state.deleted?
74
+ # do something
75
+ end
76
+ ```
77
+
44
78
  ### BioRuby Sequence objects
45
79
 
46
- The BioAlignment supports BioRuby's Bio::Sequence objects:
80
+ BioAlignment supports adding BioRuby's Bio::Sequence objects:
47
81
 
48
82
  ```ruby
49
83
  require 'bio' # BioRuby
@@ -54,6 +88,14 @@ The BioAlignment supports BioRuby's Bio::Sequence objects:
54
88
  aln << Bio::Sequence::NA.new("atg---tcaaaa")
55
89
  ```
56
90
 
91
+ and we can transform BioAlignment into BioRuby's Bio::Alignment and
92
+ use BioRuby functions
93
+
94
+ ```ruby
95
+ bioruby_aln = aln.to_bioruby_alignment
96
+ bioruby_aln.consensus_iupac
97
+ ```
98
+
57
99
  ### Pal2nal
58
100
 
59
101
  A protein (amino acid) to nucleotide alignment would first load
@@ -72,7 +114,7 @@ the sequences
72
114
  end
73
115
  ```
74
116
 
75
- Write a (simple) version of pal2nal would be something like
117
+ Writing a (simple) version of pal2nal would be something like
76
118
 
77
119
  ```ruby
78
120
  fasta3 = FastaWriter.new('nt-aln.fa')
@@ -95,15 +137,33 @@ Write a (simple) version of pal2nal would be something like
95
137
  end
96
138
  ```
97
139
 
98
- With amino acid aln1 and nucleotide aln2 loaded, the library version
99
- includes validation, and is the shorter command
140
+ With amino acid aa_aln and nucleotide nt_aln loaded, the library
141
+ version of pal2nal includes validation
100
142
 
101
143
  ```ruby
102
- aln3 = aln1.pal2nal(aln2, :codon_table => 3)
144
+ aln = aa_aln.pal2nal(nt_aln, :codon_table => 3, :do_validate => true)
103
145
  ```
104
146
 
105
147
  resulting in the codon alignment.
106
148
 
149
+ ### Alignment editing
150
+
151
+ BioAlignment supports multiple alignment editing features, which are
152
+ listed
153
+ [here](https://github.com/pjotrp/bioruby-alignment/tree/master/features/edit).
154
+ Each editing feature is added at run time(!). Example:
155
+
156
+ ```ruby
157
+ require 'bio-alignment/edit/del_bridges'
158
+
159
+ aln.extend DelBridges # bring the module into scope
160
+ aln2 = aln.clean(50) # execute the alignment editor
161
+ ```
162
+
163
+
164
+
165
+ ### See also
166
+
107
167
  The API documentation is online. For more code examples see
108
168
  [./spec/*.rb](https://github.com/pjotrp/bioruby-alignment/tree/master/spec) and
109
169
  [./features/*](https://github.com/pjotrp/bioruby-alignment/tree/master/features).
data/VERSION CHANGED
@@ -1 +1 @@
- 0.0.4
+ 0.0.5
@@ -22,13 +22,13 @@ say we have a nucleotide sequence with pay load
22
22
  5 9 * 1
23
23
 
24
24
  most library implementations will have two strings "AGTA" and "59*1".
25
- Removing the third nucleodide would mean removing it twice, into first
25
+ Removing the third nucleotide would mean removing it twice, into first
26
26
  "AGA", and second "591". With bio-alignment this is one action because we
27
27
  have one object for each element that contains both values, e.g. the
28
28
  payload of 'T' is '*'. Moving 'T' automatically moves '*'.
29
29
 
30
30
  In addition the bio-alignment library deals with codons and codon translation.
31
- Rather than track mulitiple matrices, the codon is viewed as an element,
31
+ Rather than track multiple matrices, the codon is viewed as an element,
32
32
  and the translated codon as the pay load. Again, when an alignment gets
33
33
  reordered the code only has to do it in one place.
34
34
 
@@ -89,7 +89,7 @@ do a fancy
89
89
 
90
90
  Elements in the list should respond to a gap? method, for an alignment
91
91
  gap, and the undefined? method for a position that is either an
92
- element or a gap. Also it should respont to the to_s method.
92
+ element or a gap. Also it should respond to the to_s method.
93
93
 
94
94
  An element can contain any pay load. If a list of attributes exists
95
95
  in the sequence object, it can be used.
@@ -100,6 +100,11 @@ The column list tracks the columns of the alignment. The requirement
100
100
  is that it should be iterable and can be indexed. The Column contains
101
101
  no elements, but may point to a list when the alignment is transposed.
102
102
 
103
+ One of the 'features' of this library is that the Column access logic is
104
+ split out into a separate module, which accesses the data in a lazy fashion.
105
+ Also column state is stored as an 'any object'. I.e. a column can contain
106
+ any state.
107
+
103
108
  ## Matrix or MSA
104
109
 
105
110
  The Matrix consists of a Column list, multiple Sequences, in turn
@@ -130,6 +135,26 @@ The Matrix can be accessed in transposed fashion, but accessing the normal
130
135
  matrix and transposed matrix at the same time is not supported. Matrix is not
131
136
  designed to be transaction safe - though you can copy the Matrix any time.
132
137
 
138
+ ## Adding functionality
139
+
140
+ To ensure that the basic BioAlignment does not get polluted, extra functionality
141
+ is added by Modules. These modules can be added at run time(!). One advantage is
142
+ that there is less name space pollution, the other is that different implementations
143
+ can be plugged in - using the same interface. For example, here we are going to
144
+ use an alignment editor named DelBridges, which has a method named clean:
145
+
146
+ ```ruby
147
+ require 'bio-alignment/edit/del_bridges'
148
+
149
+ aln = Alignment.new(string.split(/\n/))
150
+ aln.extend DelBridges # bring the module into scope
151
+ aln2 = aln.clean
152
+ ```
153
+
154
+ in other words, the functionality in DelBridges gets attached to the aln
155
+ instance at run time, without affecting any other Alignment object(!) Also,
156
+ when not requiring 'bio-alignment/edit/del_bridges', the functionality is never
157
+ visible, and never added to the environment.
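
The run-time attachment described here relies on Ruby's `Object#extend`. A minimal sketch follows; `ToyAlignment`, `ToyEditor` and `strip_gaps` are hypothetical stand-ins invented for illustration and are not part of bio-alignment:

```ruby
# Hypothetical stand-in for an alignment; not the library's class.
class ToyAlignment
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end
end

# A toy editor module; the name and method are illustrative only.
module ToyEditor
  # Return the rows with every gap character removed.
  def strip_gaps
    rows.map { |row| row.delete('-') }
  end
end

a = ToyAlignment.new(%w[AC-T A--T])
b = ToyAlignment.new(%w[GG-A])

a.extend(ToyEditor)            # only this instance gains strip_gaps

p a.strip_gaps                 # => ["ACT", "AT"]
p a.respond_to?(:strip_gaps)   # => true
p b.respond_to?(:strip_gaps)   # => false
```

Because `extend` adds the module to a single object's singleton class, `b` and every other instance keep the unmodified interface.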
133
158
 
134
159
 
135
160
  Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
@@ -67,14 +67,18 @@ end
67
67
  # ----
68
68
 
69
69
  Given /^I have a BioAlignment$/ do
70
- pending # express the regexp above with the code you wish you had
70
+ @aln1 = Alignment.new
71
+ fasta = FastaReader.new('test/data/fasta/codon/aa-alignment.fa')
72
+ fasta.each do | rec |
73
+ @aln1.sequences << Sequence.new(rec.id, rec.seq)
74
+ end
71
75
  end
72
76
 
73
77
  When /^I convert$/ do
74
- pending # express the regexp above with the code you wish you had
78
+ @bioruby_alignment = @aln1.to_bioruby_alignment
75
79
  end
76
80
 
77
- Then /^I should have a Bio::Alignment$/ do
78
- pending # express the regexp above with the code you wish you had
81
+ Then /^I should have a BioRuby Bio::Alignment$/ do
82
+ @bioruby_alignment.consensus_iupac[0..8].should == '???????v?'
79
83
  end
80
84
 
@@ -3,6 +3,7 @@ Feature: BioAlignment should play with BioRuby
3
3
  I want to convert BioAlignment to Bio::Alignment
4
4
  And I want to support Bio::Sequence objects
5
5
 
6
+ @bioruby
6
7
  Scenario: Use Bio::Sequence to fill BioAlignment
7
8
  Given I have multiple Bio::Sequence objects
8
9
  When I assign BioAlignment
@@ -17,7 +18,7 @@ Feature: BioAlignment should play with BioRuby
17
18
  And and return a partial AA sequence
18
19
  And be AA indexable
19
20
 
20
- Scenario: Convert BioAlignment to Bio::Alignment
21
+ Scenario: Convert BioAlignment to BioRuby Bio::Alignment
21
22
  Given I have a BioAlignment
22
23
  When I convert
23
- Then I should have a Bio::Alignment
24
+ Then I should have a BioRuby Bio::Alignment
@@ -0,0 +1,33 @@
1
+ Given /^I iterate the columns$/ do
2
+ @aln.should_not be_nil # aln is loaded by codon-feature.rb
3
+ end
4
+
5
+ column = nil
6
+ When /^I fetch a column$/ do
7
+ column = @aln.columns[3]
8
+ column.should_not be_nil
9
+ column[0].to_s.should == 'cga'
10
+ end
11
+
12
+ When /^I inject column state$/ do
13
+ column.state = ColumnState.new
14
+ column.state.deleted = true
15
+ end
16
+
17
+ Then /^I should be able to get the column state$/ do
18
+ column.state.deleted?.should be_true
19
+ end
20
+
21
+ list = []
22
+ When /^I iterate a column$/ do
23
+ column10 = @aln.columns[10]
24
+ column10.each do | element |
25
+ list << element.to_s
26
+ end
27
+ end
28
+
29
+ Then /^I should get the column elements$/ do
30
+ list[0..10].should ==
31
+ ["ctt", "gcg", "ctt", "ttt", "gcg", "ttt", "ttt", "agt", "ttt", "atg", "agt"]
32
+ end
33
+
@@ -0,0 +1,14 @@
1
+ Feature: Alignment column support
2
+ In order to access an alignment by column
3
+ I want to access column state
4
+ I want to get all elements in a column
5
+
6
+ @columns
7
+ Scenario: Access column information in an alignment
8
+ Given I read an MSA nucleotide FASTA file in the test/data folder
9
+ When I fetch a column
10
+ When I inject column state
11
+ Then I should be able to get the column state
12
+ When I iterate a column
13
+ Then I should get the column elements
14
+
@@ -0,0 +1,21 @@
1
+ require 'bio-alignment'
2
+ require 'bio-alignment/edit/del_bridges'
3
+
4
+ Given /^I have a bridged alignment$/ do |string|
5
+ @aln = Alignment.new(string.split(/\n/))
6
+ end
7
+
8
+ When /^I apply the bridge rule$/ do
9
+ @aln.extend DelBridges
10
+ aln2 = @aln.clean
11
+ end
12
+
13
+ Then /^it should have removed (\d+) bridges$/ do |arg1, string|
14
+ pending # express the regexp above with the code you wish you had
15
+ end
16
+
17
+ Then /^I should be able to track removed columns$/ do
18
+ pending # express the regexp above with the code you wish you had
19
+ end
20
+
21
+
@@ -0,0 +1,36 @@
1
+ Feature: Alignment editing, the bridge rule
2
+ Remove columns that contain too many gaps
3
+
4
+ Drop all bridges in less than 'min_bridges_fraction' (default 1/3 or 33%).
5
+
6
+ The dropped columns are tracked by the table columns.
7
+
8
+ @dev
9
+ Scenario: Apply bridge rule to an amino acid alignment
10
+ Given I have a bridged alignment
11
+ """
12
+ ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
13
+ SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
14
+ SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
15
+ ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
16
+ ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
17
+ ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
18
+ ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
19
+ ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
20
+ -------------IFHAVR-TC-HP-----------------
21
+ """
22
+ When I apply the bridge rule
23
+ Then it should have removed 4 bridges
24
+ """
25
+ SNSFSRPTIIFSGCSTACSGKSELVCGFRSFMLSDV
26
+ SNSFSRPTIIFSGCSTACSGKSEQVCGFR---LSDV
27
+ SNSFSRPTIIFSGCSTACSGKSEQVCGFR---LSDV
28
+ PKLFSRPTIIFSGCSTACSGKSEPVCGFRSFMLSDV
29
+ ------PTIIFSGCSKACSGKSELVCGFRSFMLSDV
30
+ ------PTIIFSGCSKACSGK---FRSFRSFMLSAV
31
+ ------PTIIFSGCSKACSGK---VCGIFHAVRSFM
32
+ ------PTIIFSGCSKACSGKSELVCGFRSFMLSAV
33
+ ---------IFHAVR-TC-HP---------------
34
+ """
35
+ Then I should be able to track removed columns
36
+
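
The `clean` method of DelBridges is still empty in this release, so the rule is specified only by the feature text. One possible reading, sketched here on plain strings, is to drop every column whose fraction of non-gap characters falls below a threshold. The helper name `drop_sparse_columns` and the `min_fraction` parameter are assumptions made for this sketch, not the library's API:

```ruby
# Hypothetical helper, not the DelBridges implementation: drop every column
# whose fraction of non-gap characters is below min_fraction.
def drop_sparse_columns(rows, min_fraction = Rational(1, 3))
  width = rows.first.length
  # Indices of the columns that are filled densely enough to keep.
  keep = (0...width).select do |col|
    filled = rows.count { |row| row[col] != '-' }
    Rational(filled, rows.size) >= min_fraction
  end
  # Rebuild each row from the kept columns only.
  rows.map { |row| keep.map { |col| row[col] }.join }
end

rows = [
  '--SNS-',
  'S-SNSL',
  '--SNS-',
  '---NS-'
]
cleaned = drop_sparse_columns(rows)
puts cleaned
# SNS
# SNS
# SNS
# -NS
```

Using `Rational` keeps the comparison against 1/3 exact. A real implementation would presumably also record which columns were dropped, as the last step of the scenario asks.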
@@ -0,0 +1,4 @@
1
+ Feature: Remove non-informative sequences
2
+
3
+ After alignment cleaning, it may be we have non-informative sequences. These
4
+ can be removed from the alignment.
@@ -0,0 +1,13 @@
1
+ When /^I apply GBlocks$/ do
2
+ pending # express the regexp above with the code you wish you had
3
+ end
4
+
5
+ Then /^it should return the GBlocks cleaned alignment$/ do
6
+ pending # express the regexp above with the code you wish you had
7
+ end
8
+
9
+ Then /^return a list of removed columns$/ do
10
+ pending # express the regexp above with the code you wish you had
11
+ end
12
+
13
+
@@ -0,0 +1,31 @@
1
+ Feature: GBlocks implementation in Ruby
2
+
3
+ The GBlocks routine is often used, but the source code is not open source. This
4
+ is a feature request for a reimplementation of GBlocks. Some links:
5
+
6
+ Open sourcing request by Debian: http://lists.debian.org/debian-med/2011/02/msg00008.html
7
+
8
+ Binary download of GBlocks: http://molevol.cmima.csic.es/castresana/Gblocks.html
9
+
10
+ Documentation: http://molevol.cmima.csic.es/castresana/Gblocks/Gblocks_documentation.html
11
+
12
+ It is quite a simple routine, and would be easy to validate against existing outcomes.
13
+
14
+ Scenario: Apply GBlocks to an alignment
15
+ Given I have an alignment
16
+ """
17
+ ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
18
+ SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
19
+ SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
20
+ ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
21
+ ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
22
+ ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
23
+ ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
24
+ ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
25
+ -------------IFHAVR-TC-HP-----------------
26
+ """
27
+ When I apply GBlocks
28
+ Then it should return the GBlocks cleaned alignment
29
+ And return a list of removed columns
30
+
31
+
@@ -0,0 +1,12 @@
1
+ require 'bio-alignment'
2
+
3
+ When /^I apply island rule with max_gap_size (\d+)$/ do |arg1|
4
+ pending # express the regexp above with the code you wish you had
5
+ end
6
+
7
+ Then /^it should result in$/ do |string|
8
+ pending # express the regexp above with the code you wish you had
9
+ end
10
+
11
+
12
+
@@ -0,0 +1,44 @@
1
+ Feature: Alignment editing with the Island rule
2
+ The idea is to drop hypervariable floating sequences, as they are probably
3
+ misaligned.
4
+
5
+ Drop all 'islands' in a sequence with low island consensus, that show a gap
6
+ larger than 'max_gap_size' (default 6) on both sides, and are shorter than
7
+ 'min_island_size' (default 30). The latter may be a large size, as an island
8
+ needs to loop in and out several times to be (arguably) functional. We also
9
+ add a parameter 'max_gap_size_inside' (default 2) which allows for small gaps
10
+ inside the island - though the total island size is calculated including
11
+ those small gaps.
12
+
13
+ The island consensus is calculated by column.
14
+ 'max_island_elements_unique_percentage' (default 10%) of elements in the
15
+ island should have a 'min_island_column_matched' (default 1) somewhere in the
16
+ element's column.
17
+
18
+ Scenario: Apply island rule to an amino acid alignment
19
+ Given I have an alignment
20
+ """
21
+ ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
22
+ SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
23
+ SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
24
+ ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
25
+ ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
26
+ ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
27
+ ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
28
+ ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
29
+ -------------IFHAVR-TC-HP-----------------
30
+ """
31
+ When I apply island rule with max_gap_size 4
32
+ Then it should have removed 2 islands
33
+ """
34
+ ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
35
+ SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
36
+ SSIISNSFSRPTIIFSGCSTACSGKLTSEQVCGFR---LSDV
37
+ ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
38
+ ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
39
+ ----------PTIIFSGCSKACSGK-----VCGFRSFMLSAV
40
+ ----------PTIIFSGCSKACSGK-----------------
41
+ ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
42
+ ------------------------------------------
43
+ """
44
+
@@ -0,0 +1,16 @@
1
+ require 'bio-alignment'
2
+
3
+ Given /^I have an alignment$/ do |string|
4
+ @aln = Alignment.new(string.split(/\n/))
5
+ p @aln
6
+ end
7
+
8
+ When /^I apply rule masking with X and max_gap_size (\d+)$/ do |arg1|
9
+ pending # express the regexp above with the code you wish you had
10
+ end
11
+
12
+ Then /^it should have removed (\d+) islands$/ do |arg1, string|
13
+ pending # express the regexp above with the code you wish you had
14
+ end
15
+
16
+
@@ -0,0 +1,36 @@
1
+ Feature: Alignment editing masking serial mutations
2
+ Edit an alignment removing or masking unique elements column-wise.
3
+
4
+ If a sequence has a unique AA in a column it is a single mutation event. If
5
+ multiple neighbouring AA's are also unique we suspect the sequence is an
6
+ outlier. This rule masks, or deletes, stretches of unique AAs. The stretch of
7
+ unique AA's is defined in 'max_serial_unique' (default 5, so two bordering
8
+ unique AA's are allowed).
9
+
10
+ Scenario: Apply rule to an amino acid alignment
11
+ Given I have an alignment
12
+ """
13
+ ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
14
+ SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
15
+ SSIISNSFSRPTIIFSGCSTACSQQKLTSEQVCFR---LSDV
16
+ ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
17
+ ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
18
+ ----------PTIIFSGCSKACSGK-----FRSFRSFMLSAV
19
+ ----------PTIIFSGCSKACSGK-----VCGIFHAVRSFM
20
+ ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
21
+ -------------IFHAVR-TC-HP-----------------
22
+ """
23
+ When I apply rule masking with X and max_gap_size 5
24
+ Then it should result in
25
+ """
26
+ ----SNSFSRPTIIFSGCSTACSGK--SELVCGFRSFMLSDV
27
+ SSIISNSFSRPTIIFSGCSTACSGK--SEQVCGFR---LSDV
28
+ SSIISNSFSRPTIIFSGCSTACXXXXXXXXXXXFR---LSDV
29
+ ----PKLFSRPTIIFSGCSTACSGK--SEPVCGFRSFMLSDV
30
+ ----------PTIIFSGCSKACSGKGLSELVCGFRSFMLSDV
31
+ ----------PTIIFSGCSKACSGK-----VCGFRSFMLSAV
32
+ ----------PTIIFSGCSKACSGK-----------------
33
+ ----------PTIIFSGCSKACSGK--SELVCGFRSFMLSAV
34
+ -------------XXXXXX-XX-XX-----------------
35
+ """
36
+
@@ -0,0 +1,35 @@
1
+ Given /^I iterate the rows$/ do
2
+ @aln.should_not be_nil # aln is loaded by codon-feature.rb
3
+ end
4
+
5
+ row = nil
6
+ When /^I fetch a row$/ do
7
+ row = @aln.rows[3]
8
+ row.should_not be_nil
9
+ row[0].to_s.should == '---'
10
+ end
11
+
12
+ When /^I inject row state$/ do
13
+ # tell row to handle state
14
+ row.extend(State)
15
+ row.state = RowState.new
16
+ row.state.deleted = true
17
+ end
18
+
19
+ Then /^I should be able to get the row state$/ do
20
+ row.state.deleted?.should be_true
21
+ end
22
+
23
+ list = []
24
+ When /^I iterate a row$/ do
25
+ row10 = @aln.rows[10]
26
+ row10.each do | element |
27
+ list << element.to_s
28
+ end
29
+ end
30
+
31
+ Then /^I should get the row elements$/ do
32
+ list[0..10].should == ["---", "---", "---", "---", "---", "---", "---", "atg", "tcg", "tcc", "agt"]
33
+
34
+ end
35
+
@@ -0,0 +1,14 @@
1
+ Feature: Alignment row support
2
+ In order to access an alignment by row
3
+ I want to access the row state
4
+
5
+ @rows
6
+ Scenario: Access row information in an alignment
7
+ Given I read an MSA nucleotide FASTA file in the test/data folder
8
+ When I fetch a row
9
+ When I inject row state
10
+ Then I should be able to get the row state
11
+ When I iterate a row
12
+ Then I should get the row elements
13
+
14
+
@@ -1,6 +1,7 @@
1
1
  # Alignment
2
2
 
3
3
  require 'bio-alignment/pal2nal'
4
+ require 'bio-alignment/column'
4
5
 
5
6
  module Bio
6
7
 
@@ -9,22 +10,37 @@ module Bio
9
10
  class Alignment
10
11
  include Enumerable
11
12
  include Pal2Nal
13
+ include Columns
12
14
 
13
15
  attr_accessor :sequences
14
16
 
15
- def initialize
17
+ # Create alignment. seqs can be a list of sequences. If these
18
+ # are String types, they get converted to the library Sequence
19
+ # container
20
+ def initialize seqs = nil
16
21
  @sequences = []
22
+ if seqs
23
+ seqs.each_with_index do | seq, i |
24
+ @sequences <<
25
+ if seq.kind_of?(String)
26
+ Sequence.new(i,seq)
27
+ else
28
+ seq
29
+ end
30
+ end
31
+ end
17
32
  end
18
33
 
19
34
  alias rows sequences
20
35
 
21
- # def [] index <- need matrix
22
- # rows[index]
23
- # end
36
+ def [] index
37
+ rows[index]
38
+ end
24
39
 
25
40
  def each
26
41
  rows.each { | seq | yield seq }
27
42
  end
43
+
28
44
  end
29
45
  end
30
46
  end
@@ -18,5 +18,14 @@ module Bio
18
18
  end
19
19
  end
20
20
  end
21
+
22
+ # Here we add a BioRuby converter
23
+ module BioAlignment
24
+ class Alignment
25
+ def to_bioruby_alignment
26
+ Bio::Alignment.new(self)
27
+ end
28
+ end
29
+ end
21
30
  end
22
31
 
@@ -74,6 +74,10 @@ module Bio
74
74
  @seq[index]
75
75
  end
76
76
 
77
+ def length
78
+ @seq.length
79
+ end
80
+
77
81
  def each
78
82
  @seq.each { | codon | yield codon }
79
83
  end
@@ -0,0 +1,47 @@
1
+ require 'bio-alignment/state'
2
+
3
+ module Bio
4
+
5
+ module BioAlignment
6
+
7
+ # The Columns module provides accessors for the column list
8
+ # returning Column objects
9
+ module Columns
10
+
11
+ # Return a list of Column objects. The contents of the
12
+ # columns are accessed lazily
13
+ def columns
14
+ (0..num_columns-1).map { | col | Column.new(self,col) }
15
+ end
16
+
17
+ def num_columns
18
+ rows.first.length
19
+ end
20
+ end
21
+
22
+ # Support the notion of columns in an alignment. A column
23
+ # can have state by attaching state objects
24
+ class Column
25
+ include State
26
+
27
+ def initialize aln, col
28
+ @aln = aln
29
+ @col = col
30
+ end
31
+
32
+ def [] index
33
+ @aln[index][@col]
34
+ end
35
+
36
+ # iterator fetches a column on demand
37
+ def each
38
+ @aln.each do | seq |
39
+ yield seq[@col]
40
+ end
41
+ end
42
+ end
43
+
44
+ end
45
+
46
+ end
47
+
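
The lazy column access above can be tried in isolation with a small stand-alone sketch. `ToyAlignment` and `ToyColumn` are simplified stand-ins written for this example (rows are plain Strings, and `Enumerable` is mixed into the column so that `to_a` is available, which the diff does not do):

```ruby
# Minimal stand-ins for the classes in the diff; rows are plain Strings here.
class ToyAlignment
  include Enumerable
  attr_reader :rows

  def initialize(rows)
    @rows = rows
  end

  def [](index)
    rows[index]
  end

  def each(&block)
    rows.each(&block)
  end

  # Build lightweight column views; no element is copied at this point.
  def columns
    (0...rows.first.length).map { |col| ToyColumn.new(self, col) }
  end
end

class ToyColumn
  include Enumerable
  attr_accessor :state

  def initialize(aln, col)
    @aln = aln
    @col = col
  end

  def [](index)
    @aln[index][@col]
  end

  # Elements are looked up in the rows only when iterated.
  def each
    @aln.each { |row| yield row[@col] }
  end
end

aln = ToyAlignment.new(%w[ACGT A-GT TCGA])
column = aln.columns[1]
p column.to_a             # => ["C", "-", "C"]
p column[2]               # => "C"

column.state = :deleted
p column.state            # => :deleted
p aln.columns[1].state    # => nil, a fresh ToyColumn is built on each call
```

The last two lines show a consequence of building the column list on demand: state lives on the column object, so it has to be read back through the same reference. The `columns` method in the diff has the same shape, so the same caveat seems to apply there.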
@@ -0,0 +1,11 @@
+
+ module Bio
+   module BioAlignment
+
+     module DelBridges
+
+       def clean
+       end
+     end
+   end
+ end
@@ -3,17 +3,24 @@
3
3
  module Bio
4
4
  module BioAlignment
5
5
  module Pal2Nal
6
- def pal2nal nt_aln, options = { :codon_table => 1 }
6
+
7
+ # Protein to nucleotide alignment, using a codon table for testing. If :do_validate is
8
+ # false, translation for validation is skipped (note CodonSequence translation is lazy).
9
+ def pal2nal nt_aln, options = { :codon_table => 1, :do_validate => true }
10
+ do_validate = options[:do_validate]
7
11
  aa_aln = self
8
12
  codon_aln = Alignment.new
9
13
  aa_aln.each_with_index do | aaseq, i |
10
14
  ntseq = nt_aln.sequences[i]
11
15
  raise "pal2nal sequence IDs do not match (for #{aaseq.id} != #{ntseq.id})" if aaseq.id != ntseq.id
12
16
  raise "pal2nal sequence size does not match (for #{aaseq.id}'s #{aaseq.to_s.size}!= #{ntseq.to_s.size * 3})" if aaseq.id != ntseq.id
17
+ # create a Codon sequence out of the nucleotide sequence (no gaps)
13
18
  codonseq = CodonSequence.new(ntseq.id, ntseq.seq, options)
14
19
 
15
20
  codon_pos = 0
16
21
  result = []
22
+
23
+ # now fill the result array by finding codons and gaps, and testing for valid amino acids
17
24
  aaseq.each do | aa |
18
25
  result <<
19
26
  if aa.gap?
@@ -21,11 +28,12 @@ module Bio
21
28
  else
22
29
  codon = codonseq[codon_pos]
23
30
  # validate codon translates to amino acid
24
- raise "codon does not match amino acid (for #{aaseq.id}, position #{codon_pos}, #{codon} translates to #{codon.to_aa} instead of #{aa.to_s})" if codon.to_aa != aa.to_s
31
+ raise "codon does not match amino acid (for #{aaseq.id}, position #{codon_pos}, #{codon} translates to #{codon.to_aa} instead of #{aa.to_s})" if do_validate and codon.to_aa != aa.to_s
25
32
  codon_pos += 1
26
33
  codon.to_s
27
34
  end
28
35
  end
36
+ # the new result is transformed to a gapped CodonSequence
29
37
  codon_seq = CodonSequence.new(aaseq.id, result.join(''), options)
30
38
  codon_aln.sequences << codon_seq
31
39
  end
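
One detail of the new signature is worth illustrating: a default-argument hash is used only when the caller passes no options at all, so a call that supplies just `:codon_table` leaves `options[:do_validate]` as `nil`. The sketch below uses two hypothetical methods (`whole_hash_default`, `merged_default`) and a `DEFAULTS` constant that are not in the library, and shows merging as one possible alternative:

```ruby
DEFAULTS = { :codon_table => 1, :do_validate => true }

# Default-argument style, as in the signature above: the default hash is
# used only when the caller passes no options at all.
def whole_hash_default(options = { :codon_table => 1, :do_validate => true })
  options[:do_validate]
end

# Alternative: merge the caller's options over the defaults.
def merged_default(options = {})
  DEFAULTS.merge(options)[:do_validate]
end

p whole_hash_default                       # => true
p whole_hash_default(:codon_table => 3)    # => nil
p merged_default(:codon_table => 3)        # => true
p merged_default(:do_validate => false)    # => false
```

Since `nil` is falsy, the first style skips validation unless `:do_validate => true` is passed explicitly, which is what the updated README call does.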
@@ -18,6 +18,7 @@ module Bio
18
18
  #
19
19
  class Sequence
20
20
  include Enumerable
21
+ include State
21
22
 
22
23
  attr_reader :id, :seq
23
24
  def initialize id, seq
@@ -29,6 +30,10 @@ module Bio
29
30
  @seq[index]
30
31
  end
31
32
 
33
+ def length
34
+ @seq.length
35
+ end
36
+
32
37
  def each
33
38
  @seq.each_char { | c | yield Element.new(c) }
34
39
  end
@@ -0,0 +1,31 @@
1
+ module Bio
2
+
3
+ module BioAlignment
4
+
5
+ module State
6
+ attr_accessor :state
7
+ end
8
+
9
+ # Convenience class for tracking state. Note you can add
10
+ # any class you like
11
+ class ColumnState
12
+ attr_accessor :deleted
13
+
14
+ def deleted?
15
+ deleted == true
16
+ end
17
+ end
18
+
19
+ # Convenience class for tracking state. Note you can add
20
+ # any class you like
21
+ class RowState
22
+ attr_accessor :deleted
23
+
24
+ def deleted?
25
+ deleted == true
26
+ end
27
+ end
28
+
29
+ end
30
+
31
+ end
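
As the comments say, the state object can be any class. A small sketch of that idea follows, with the `State` module re-declared without its `Bio::BioAlignment` nesting so the example is self-contained; `MaskedState` and its attributes are hypothetical and chosen only for illustration:

```ruby
# Same shape as the State module in the diff; module nesting is omitted.
module State
  attr_accessor :state
end

# A hypothetical state class; any object responding to the methods
# you need could be used instead.
class MaskedState
  attr_accessor :masked, :reason

  def masked?
    masked == true
  end
end

row = 'atg---tcaaaa'.dup
row.extend(State)            # add the state accessor to this one object
row.state = MaskedState.new
row.state.masked = true
row.state.reason = 'too many gaps'

p row.state.masked?   # => true
p row.state.reason    # => "too many gaps"
```

The row here is an ordinary String, which shows that `State` does not depend on the library's own Sequence or Column classes.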
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bio-alignment
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.4
4
+ version: 0.0.5
5
5
  prerelease:
6
6
  platform: ruby
7
7
  authors:
@@ -9,11 +9,11 @@ authors:
9
9
  autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2012-02-06 00:00:00.000000000Z
12
+ date: 2012-02-28 00:00:00.000000000Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: bio-logger
16
- requirement: &16310560 !ruby/object:Gem::Requirement
16
+ requirement: &11633860 !ruby/object:Gem::Requirement
17
17
  none: false
18
18
  requirements:
19
19
  - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
21
21
  version: '0'
22
22
  type: :runtime
23
23
  prerelease: false
24
- version_requirements: *16310560
24
+ version_requirements: *11633860
25
25
  - !ruby/object:Gem::Dependency
26
26
  name: bio
27
- requirement: &16309720 !ruby/object:Gem::Requirement
27
+ requirement: &11632600 !ruby/object:Gem::Requirement
28
28
  none: false
29
29
  requirements:
30
30
  - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
32
32
  version: 1.4.2
33
33
  type: :runtime
34
34
  prerelease: false
35
- version_requirements: *16309720
35
+ version_requirements: *11632600
36
36
  - !ruby/object:Gem::Dependency
37
37
  name: bio-bigbio
38
- requirement: &16309060 !ruby/object:Gem::Requirement
38
+ requirement: &11622800 !ruby/object:Gem::Requirement
39
39
  none: false
40
40
  requirements:
41
41
  - - ! '>'
@@ -43,10 +43,10 @@ dependencies:
43
43
  version: 0.1.3
44
44
  type: :development
45
45
  prerelease: false
46
- version_requirements: *16309060
46
+ version_requirements: *11622800
47
47
  - !ruby/object:Gem::Dependency
48
48
  name: cucumber
49
- requirement: &16308000 !ruby/object:Gem::Requirement
49
+ requirement: &11622000 !ruby/object:Gem::Requirement
50
50
  none: false
51
51
  requirements:
52
52
  - - ! '>='
@@ -54,10 +54,10 @@ dependencies:
54
54
  version: '0'
55
55
  type: :development
56
56
  prerelease: false
57
- version_requirements: *16308000
57
+ version_requirements: *11622000
58
58
  - !ruby/object:Gem::Dependency
59
59
  name: rspec
60
- requirement: &16306500 !ruby/object:Gem::Requirement
60
+ requirement: &11621400 !ruby/object:Gem::Requirement
61
61
  none: false
62
62
  requirements:
63
63
  - - ~>
@@ -65,10 +65,10 @@ dependencies:
65
65
  version: 2.3.0
66
66
  type: :development
67
67
  prerelease: false
68
- version_requirements: *16306500
68
+ version_requirements: *11621400
69
69
  - !ruby/object:Gem::Dependency
70
70
  name: bundler
71
- requirement: &16305460 !ruby/object:Gem::Requirement
71
+ requirement: &11620480 !ruby/object:Gem::Requirement
72
72
  none: false
73
73
  requirements:
74
74
  - - ~>
@@ -76,10 +76,10 @@ dependencies:
76
76
  version: 1.0.0
77
77
  type: :development
78
78
  prerelease: false
79
- version_requirements: *16305460
79
+ version_requirements: *11620480
80
80
  - !ruby/object:Gem::Dependency
81
81
  name: jeweler
82
- requirement: &16304680 !ruby/object:Gem::Requirement
82
+ requirement: &11619680 !ruby/object:Gem::Requirement
83
83
  none: false
84
84
  requirements:
85
85
  - - ~>
@@ -87,7 +87,7 @@ dependencies:
87
87
  version: 1.7.0
88
88
  type: :development
89
89
  prerelease: false
90
- version_requirements: *16304680
90
+ version_requirements: *11619680
91
91
  description: Alignment handler for multiple sequence alignments (MSA)
92
92
  email: pjotr.public01@thebird.nl
93
93
  executables:
@@ -99,6 +99,7 @@ extra_rdoc_files:
99
99
  files:
100
100
  - .document
101
101
  - .rspec
102
+ - .travis.yml
102
103
  - Gemfile
103
104
  - LICENSE.txt
104
105
  - README.md
@@ -110,14 +111,30 @@ files:
110
111
  - features/bioruby.feature
111
112
  - features/codon-feature.rb
112
113
  - features/codon.feature
114
+ - features/columns-feature.rb
115
+ - features/columns.feature
116
+ - features/edit/del_bridges-feature.rb
117
+ - features/edit/del_bridges.feature
118
+ - features/edit/del_non_informative_sequences.feature
119
+ - features/edit/gblocks-feature.rb
120
+ - features/edit/gblocks.feature
121
+ - features/edit/mask_islands-feature.rb
122
+ - features/edit/mask_islands.feature
123
+ - features/edit/mask_serial_mutations-feature.rb
124
+ - features/edit/mask_serial_mutations.feature
113
125
  - features/pal2nal-feature.rb
114
126
  - features/pal2nal.feature
127
+ - features/rows-feature.rb
128
+ - features/rows.feature
115
129
  - lib/bio-alignment.rb
116
130
  - lib/bio-alignment/alignment.rb
117
131
  - lib/bio-alignment/bioruby.rb
118
132
  - lib/bio-alignment/codonsequence.rb
133
+ - lib/bio-alignment/column.rb
134
+ - lib/bio-alignment/edit/del_bridges.rb
119
135
  - lib/bio-alignment/pal2nal.rb
120
136
  - lib/bio-alignment/sequence.rb
137
+ - lib/bio-alignment/state.rb
121
138
  - spec/bio-alignment_spec.rb
122
139
  - spec/spec_helper.rb
123
140
  - test/data/fasta/codon/aa-alignment.fa
@@ -141,7 +158,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
141
158
  version: '0'
142
159
  segments:
143
160
  - 0
144
- hash: -237499918493109825
161
+ hash: 1565072942973090495
145
162
  required_rubygems_version: !ruby/object:Gem::Requirement
146
163
  none: false
147
164
  requirements: