bio-velvet 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 798289b36dd93bb47a40f8f5c1e71ecf59305699
4
+ data.tar.gz: 3b97653d1ca5fd6b62c1ab1f097c3ffe868ce04c
5
+ SHA512:
6
+ metadata.gz: ee227d4e19f9ce09edb22316aea8fa05fde499e27dcec8af4bec6afc7c8bbb7567b081f86cd64d0c34a00ec4d7e7c2202a3336914770ee00e053bf09907cf8f0
7
+ data.tar.gz: f84054f0cff8d627a57d2431d3af35b19a9941d65da10aee04e38431e436bc28ddef4021904df445d31d59a87cddabe8b61dcc409abd6df1db19bc9f41930fb2
@@ -0,0 +1,5 @@
1
+ lib/**/*.rb
2
+ bin/*
3
+ -
4
+ features/**/*.feature
5
+ LICENSE.txt
data/.rspec ADDED
@@ -0,0 +1 @@
1
+ --color
@@ -0,0 +1,12 @@
1
+ language: ruby
2
+ rvm:
3
+ - 1.9.2
4
+ - 1.9.3
5
+ - jruby-19mode # JRuby in 1.9 mode
6
+ - rbx-19mode
7
+ # - 1.8.7
8
+ # - jruby-18mode # JRuby in 1.8 mode
9
+ # - rbx-18mode
10
+
11
+ # uncomment this line if your project needs to run something other than `rake`:
12
+ script: bundle exec rspec spec/bio-velvet_graph_spec.rb
data/Gemfile ADDED
@@ -0,0 +1,17 @@
1
+ source "http://rubygems.org"
2
+
3
+ gem 'bio-logger', '>=1.0.1'
4
+ gem 'systemu'
5
+ gem 'files'
6
+ gem 'hopcsv', '>= 0.4.3'
7
+
8
+ # Add dependencies to develop your gem here.
9
+ # Include everything needed to run rake, tests, features, etc.
10
+ group :development do
11
+ gem "rspec", ">= 2.8.0"
12
+ gem "rdoc", ">= 3.12"
13
+ gem "jeweler", ">= 1.8.4"
14
+ gem "bundler", ">= 1.0.21"
15
+ gem "bio", ">= 1.4.2"
16
+ gem "rdoc", ">= 3.12"
17
+ end
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2013 Ben J Woodcroft
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,62 @@
1
+ # bio-velvet
2
+
3
+ [![Build Status](https://secure.travis-ci.org/wwood/bioruby-velvet.png)](http://travis-ci.org/wwood/bioruby-velvet)
4
+
5
+ ```bio-velvet``` is a [biogem](biogems.info) for interacting with the [velvet](http://www.ebi.ac.uk/~zerbino/velvet/) sequence assembler. It includes both a wrapper for the velvet executable, as well as a a parser for the 'LastGraph' format files that velvet creates. This gives access to the underlying assembly graph created by velvet.
6
+
7
+ ## Installation
8
+ To install ```bio-velvet``` and its rubygem dependencies:
9
+
10
+ ```sh
11
+ gem install bio-velvet
12
+ ```
13
+
14
+ ## Usage
15
+
16
+ To run velvet with a kmer length of 87 on a set of single ended reads in ```/path/to/reads.fa```:
17
+ ```ruby
18
+ require 'bio-velvet'
19
+
20
+ velvet_result = Bio::Velvet::Runner.new.velvet(87, '-short /path/to/reads.fa') #=> Bio::Velvet::Result object
21
+
22
+ contigs_file = velvet_result.contigs_path #=> path to contigs file as a String
23
+ lastgraph_file = velvet_result.last_graph_path #=> path to last graph file as a String
24
+ ```
25
+
26
+ The graph file can be then parsed from the ```velvet_result```:
27
+ ```ruby
28
+ graph = velvet_result.last_graph #=> Bio::Velvet::Graph object
29
+ ```
30
+ In my experience (mostly on complex metagenomes), the graph object itself does not take as much RAM as I initially expected. Most of the hard work has already been done by velvet itself, particularly if the ```-cov_cutoff``` has been set. However parsing in the graph can take many minutes if the LastGraph file is big (>500MB).
31
+
32
+ With this graph you can access interact with the graph e.g.
33
+ ```ruby
34
+ graph.kmer_length #=> 87
35
+ graph.nodes #=> Bio::Velvet::Graph::NodeArray object
36
+ graph.nodes[3] #=> Bio::Velvet::Graph::Node object with node ID 3
37
+ graph.get_arcs_by_node_id(1, 3) #=> an array of arcs between nodes 1 and 3 (Bio::Velvet::Graph::Arc objects)
38
+ graph.nodes[5].noded_reads #=> array of Bio::Velvet::Graph::NodedRead objects, for read tracking
39
+ ```
40
+ There is much more that can be done to interact with the graph object and its components - see the [rubydoc](http://rubydoc.info/gems/bio-velvet).
41
+
42
+ ## Project home page
43
+
44
+ Information on the source tree, documentation, examples, issues and
45
+ how to contribute, see
46
+
47
+ http://github.com/wwood/bioruby-velvet
48
+
49
+ The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
50
+
51
+ ## Cite
52
+
53
+ This code is currently unpublished.
54
+
55
+ ## Biogems.info
56
+
57
+ This Biogem is published at (http://biogems.info/index.html#bio-velvet)
58
+
59
+ ## Copyright
60
+
61
+ Copyright (c) 2013 Ben J Woodcroft. See LICENSE.txt for further details.
62
+
@@ -0,0 +1,49 @@
1
+ # encoding: utf-8
2
+
3
+ require 'rubygems'
4
+ require 'bundler'
5
+ begin
6
+ Bundler.setup(:default, :development)
7
+ rescue Bundler::BundlerError => e
8
+ $stderr.puts e.message
9
+ $stderr.puts "Run `bundle install` to install missing gems"
10
+ exit e.status_code
11
+ end
12
+ require 'rake'
13
+
14
+ require 'jeweler'
15
+ Jeweler::Tasks.new do |gem|
16
+ # gem is a Gem::Specification... see http://docs.rubygems.org/read/chapter/20 for more options
17
+ gem.name = "bio-velvet"
18
+ gem.homepage = "http://github.com/wwood/bioruby-velvet"
19
+ gem.license = "MIT"
20
+ gem.summary = %Q{Parser to work with file formats used in the velvet DNA assembler}
21
+ gem.description = %Q{Parser to work with some file formats used in the velvet DNA assembler}
22
+ gem.email = "donttrustben@gmail.com"
23
+ gem.authors = ["Ben J Woodcroft"]
24
+ # dependencies defined in Gemfile
25
+ end
26
+ Jeweler::RubygemsDotOrgTasks.new
27
+
28
+ require 'rspec/core'
29
+ require 'rspec/core/rake_task'
30
+ RSpec::Core::RakeTask.new(:spec) do |spec|
31
+ spec.pattern = FileList['spec/**/*_spec.rb']
32
+ end
33
+
34
+ RSpec::Core::RakeTask.new(:rcov) do |spec|
35
+ spec.pattern = 'spec/**/*_spec.rb'
36
+ spec.rcov = true
37
+ end
38
+
39
+ task :default => :spec
40
+
41
+ require 'rdoc/task'
42
+ Rake::RDocTask.new do |rdoc|
43
+ version = File.exist?('VERSION') ? File.read('VERSION') : ""
44
+
45
+ rdoc.rdoc_dir = 'rdoc'
46
+ rdoc.title = "bio-velvet #{version}"
47
+ rdoc.rdoc_files.include('README*')
48
+ rdoc.rdoc_files.include('lib/**/*.rb')
49
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.0.1
@@ -0,0 +1,13 @@
1
+ require 'bio'
2
+ require 'bio-logger'
3
+ Bio::Log::LoggerPlus.new('bio-velvet')
4
+ module Bio::Velvet
5
+ module Logging
6
+ def log
7
+ Bio::Log::LoggerPlus['bio-velvet']
8
+ end
9
+ end
10
+ end
11
+
12
+ require 'bio-velvet/graph'
13
+ require 'bio-velvet/runner'
@@ -0,0 +1,517 @@
1
+ require 'hopcsv'
2
+ require 'bio'
3
+
4
+ module Bio
5
+ module Velvet
6
+ class NotImplementedException < Exception; end
7
+
8
+ # Parser for a velvet assembler's graph file (Graph or LastGraph) output from velvetg
9
+ #
10
+ # The definition of this file is given in the velvet manual, at
11
+ # http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf
12
+ class Graph
13
+ include Bio::Velvet::Logging
14
+
15
+ # $NUMBER_OF_NODES $NUMBER_OF_SEQUENCES $HASH_LENGTH
16
+ attr_accessor :number_of_nodes, :number_of_sequences, :hash_length
17
+
18
+ # NodeArray object of all the graph's node objects
19
+ attr_accessor :nodes
20
+
21
+ # Array of Arc objects
22
+ attr_accessor :arcs
23
+
24
+ def self.log
25
+ self.new.log
26
+ end
27
+
28
+ # Parse a graph file from a Graph, Graph2 or LastGraph output file from velvet
29
+ # into a Bio::Velvet::Graph object
30
+ def self.parse_from_file(path_to_graph_file)
31
+ graph = self.new
32
+ state = :header
33
+
34
+ current_node = nil
35
+ graph.nodes = NodeArray.new
36
+ graph.arcs = ArcArray.new
37
+ current_node_direction = nil
38
+
39
+ line_number = 0
40
+ Hopcsv.foreach(path_to_graph_file,"\t") do |row|
41
+ line_number += 1
42
+
43
+ if state == :header
44
+ raise "parse exception on header line, this line #{line_number}: #{row.inspect}" unless row.length >= 3
45
+ graph.number_of_nodes = row[0].to_i
46
+ graph.number_of_sequences = row[1].to_i
47
+ graph.hash_length = row[2].to_i
48
+ #Not quite sure what the function of the 4th column is
49
+ state = :nodes_0
50
+ log.debug "Now parsing velvet graph nodes" if log.debug?
51
+ next
52
+ end
53
+
54
+ if state == :nodes_0
55
+ # NODE $NODE_ID $COV_SHORT1 $O_COV_SHORT1 $COV_SHORT2 $O_COV_SHORT2
56
+ # $ENDS_OF_KMERS_OF_NODE
57
+ # $ENDS_OF_KMERS_OF_TWIN_NODE
58
+ if row[0] == 'NODE'
59
+ raise unless row.length > 2
60
+ current_node = Node.new
61
+ current_node.node_id = row[1].to_i
62
+ current_node.length = row[2].to_i
63
+ current_node.coverages = row[3...row.length].collect{|c| c.to_i}
64
+ current_node.parent_graph = graph
65
+ state = :nodes_1
66
+ raise "Duplicate node name" unless graph.nodes[current_node.node_id].nil?
67
+ graph.nodes[current_node.node_id] = current_node
68
+ next
69
+ else
70
+ state = :arc
71
+ log.debug "Now parsing velvet graph arcs" if log.debug?
72
+ # No next in the loop so that this line gets parsed as an ARC further down the loop
73
+ end
74
+ elsif state == :nodes_1
75
+ # Sometimes nodes can be empty
76
+ row[0] ||= ''
77
+ current_node.ends_of_kmers_of_node = row[0]
78
+ raise "Unexpected nodes_1 type line on line #{line_number}: #{row.inspect}" if row.length != 1
79
+ state = :nodes_2
80
+ next
81
+ elsif state == :nodes_2
82
+ # Sometimes nodes can be empty
83
+ row[0] ||= ''
84
+ raise if row.length != 1
85
+ current_node.ends_of_kmers_of_twin_node = row[0]
86
+ state = :nodes_0
87
+ next
88
+ end
89
+
90
+ if state == :arc
91
+ if row[0] == 'ARC'
92
+ # ARC $START_NODE $END_NODE $MULTIPLICITY
93
+ arc = Arc.new
94
+ raise unless row.length == 4
95
+ arc.begin_node_id = row[1].to_i.abs
96
+ arc.end_node_id = row[2].to_i.abs
97
+ arc.multiplicity = row[3].to_i
98
+ arc.begin_node_direction = (row[1].to_i > 0)
99
+ arc.end_node_direction = (row[2].to_i > 0)
100
+ graph.arcs.push arc
101
+ next
102
+ else
103
+ state = :nr
104
+ log.debug "Finished parsing velvet graph arcs. Now parsing the rest of the file" if log.debug?
105
+ end
106
+ end
107
+
108
+ if state == :nr
109
+ if row[0] == 'SEQ'
110
+ log.warn "velvet graph parse warning: SEQ lines in the Graph file parsing not implemented yet, tracking of reads now not parsed either"
111
+ break
112
+ end
113
+
114
+ # If short reads are tracked, for every node a block of read identifiers:
115
+ # NR $NODE_ID $NUMBER_OF_SHORT_READS
116
+ # $READ_ID $OFFSET_FROM_START_OF_NODE $START_COORD
117
+ # $READ_ID2 etc.
118
+ #p row
119
+ if row[0] == 'NR'
120
+ raise unless row.length == 3
121
+ node_pm = row[1].to_i
122
+ current_node_direction = node_pm > 0
123
+ current_node = graph.nodes[node_pm.abs]
124
+ current_node.number_of_short_reads ||= 0
125
+ current_node.number_of_short_reads += row[2].to_i
126
+ next
127
+ else
128
+ raise unless row.length == 3
129
+ nr = NodedRead.new
130
+ nr.read_id = row[0].to_i
131
+ nr.offset_from_start_of_node = row[1].to_i
132
+ nr.start_coord = row[2].to_i
133
+ nr.direction = current_node_direction
134
+ current_node.short_reads ||= []
135
+ current_node.short_reads.push nr
136
+ next
137
+ end
138
+ end
139
+ end
140
+ log.debug "Finished parsing velvet graph file" if log.debug?
141
+
142
+ return graph
143
+ end
144
+
145
+ # Return an array of Arc objects between two nodes (specified by integer IDs),
146
+ # or an empty array if none exists. There is four possible arcs between
147
+ # two nodes, connecting their beginnings and ends
148
+ def get_arcs_by_node_id(node_id1, node_id2)
149
+ @arcs.get_arcs_by_node_id(node_id1, node_id2)
150
+ end
151
+
152
+ # Return an array of Arc objects between two nodes (specified by node objects),
153
+ # or an empty array if none exists. There is four possible arcs between
154
+ # two nodes, connecting their beginnings and ends
155
+ def get_arcs_by_node(node1, node2)
156
+ @arcs.get_arcs_by_node_id(node1.node_id, node2.node_id)
157
+ end
158
+
159
+ # Return the adjacent nodes in the graph that connect to the end of a node
160
+ def neighbours_off_end(node)
161
+ # Find all arcs that include this node in the right place
162
+ passable_nodes = []
163
+ @arcs.get_arcs_by_node_id(node.node_id).each do |arc|
164
+ if arc.begin_node_id == node.node_id and arc.begin_node_direction
165
+ # The most intuitive case
166
+ passable_nodes.push nodes[arc.end_node_id]
167
+ elsif arc.end_node_id == node.node_id and !arc.end_node_direction
168
+ passable_nodes.push nodes[arc.begin_node_id]
169
+ end
170
+ end
171
+ return passable_nodes
172
+ end
173
+
174
+ # Return the adjacent nodes in the graph that connect to the end of a node
175
+ def neighbours_into_start(node)
176
+ # Find all arcs that include this node in the right place
177
+ passable_nodes = []
178
+ @arcs.get_arcs_by_node_id(node.node_id).each do |arc|
179
+ if arc.end_node_id == node.node_id and arc.end_node_direction
180
+ passable_nodes.push nodes[arc.begin_node_id]
181
+ elsif arc.begin_node_id == node.node_id and !arc.begin_node_direction
182
+ passable_nodes.push nodes[arc.end_node_id]
183
+ end
184
+ end
185
+ return passable_nodes
186
+ end
187
+
188
+
189
+ # Deletes nodes and associated arcs from the graph if the block passed
190
+ # evaluates to true (as in Array#delete_if). Actually the associated arcs
191
+ # are deleted first, and then the node, so that the graph remains sane at all
192
+ # times - there is never dangling arcs, as such.
193
+ #
194
+ # Returns a [deleted_nodes, deleted_arc] tuple, which are both enumerables,
195
+ # each in no particular order.
196
+ def delete_nodes_if(&block)
197
+ deleted_nodes = []
198
+ deleted_arcs = []
199
+ nodes.each do |node|
200
+ if yield(node)
201
+ deleted_nodes.push node
202
+
203
+ # delete associated arcs
204
+ arcs_to_del = @arcs.get_arcs_by_node_id(node.node_id)
205
+ deleted_arcs.push arcs_to_del
206
+ arcs_to_del.each do |arc|
207
+ @arcs.delete arc
208
+ end
209
+
210
+ # delete the arc itself
211
+ nodes.delete node
212
+ end
213
+ end
214
+ return deleted_nodes, deleted_arcs.flatten
215
+ end
216
+
217
+
218
+
219
+
220
+
221
+ # A container class for a list of Node objects. Can index with 1-offset
222
+ # IDs, so that they line up with the identifiers in velvet Graph files,
223
+ # yet respond sensibly to NodeArray#length, etc.
224
+ class NodeArray
225
+ include Enumerable
226
+
227
+ def initialize
228
+ # Internal index is required because when things get deleted stuff changes.
229
+ @internal_structure = {}
230
+ end
231
+
232
+ def []=(node_id, value)
233
+ @internal_structure[node_id] = value
234
+ end
235
+
236
+ def [](node_id)
237
+ @internal_structure[node_id]
238
+ end
239
+
240
+ def delete(node)
241
+ @internal_structure.delete node.node_id
242
+ end
243
+
244
+ def length
245
+ @internal_structure.length
246
+ end
247
+
248
+ def each(&block)
249
+ @internal_structure.each do |internal_id, node|
250
+ block.yield node
251
+ end
252
+ end
253
+ end
254
+
255
+ class ArcArray
256
+ include Enumerable
257
+
258
+ def initialize
259
+ # Internal structure is hash of [node_id1, node_id2] => Array of arcs
260
+ @internal_structure = {}
261
+ @node_to_keys = {}
262
+ end
263
+
264
+ def push(arc)
265
+ key = [arc.begin_node_id, arc.end_node_id].sort
266
+ @internal_structure[key] ||= []
267
+ @internal_structure[key].push arc
268
+ @node_to_keys[arc.begin_node_id] ||= []
269
+ @node_to_keys[arc.begin_node_id].push key
270
+ unless arc.begin_node_id == arc.end_node_id
271
+ @node_to_keys[arc.end_node_id] ||= []
272
+ @node_to_keys[arc.end_node_id].push key
273
+ end
274
+ end
275
+
276
+ # Return all arcs into or out of the given node_id, or
277
+ def get_arcs_by_node_id(node_id1, node_id2=nil)
278
+ if node_id2.nil?
279
+ next_keys = @node_to_keys[node_id1]
280
+ return [] if next_keys.nil?
281
+ next_keys.uniq.collect do |key|
282
+ @internal_structure[key]
283
+ end.flatten
284
+ else
285
+ to_return = @internal_structure[[node_id1, node_id2].sort]
286
+ if to_return.nil?
287
+ return []
288
+ else
289
+ return to_return
290
+ end
291
+ end
292
+ end
293
+
294
+ def delete(arc)
295
+ key = [arc.begin_node_id, arc.end_node_id].sort
296
+ @internal_structure[key].delete arc
297
+ # If there is no other arcs with this same key, clean up more
298
+ if @internal_structure[key].empty?
299
+ @internal_structure.delete key
300
+ @node_to_keys[key[0]].delete key
301
+ @node_to_keys[key[1]].delete key
302
+ @node_to_keys[key[0]] = nil if @node_to_keys[key[0]].nil? or @node_to_keys[key[0]].empty?
303
+ @node_to_keys[key[1]] = nil if @node_to_keys[key[1]].nil? or @node_to_keys[key[1]].empty?
304
+ end
305
+ end
306
+
307
+ def length
308
+ @internal_structure.values.flatten.length
309
+ end
310
+
311
+ def each(&block)
312
+ @internal_structure.each do |internal_id, arcs|
313
+ arcs.each do |arc|
314
+ block.yield arc
315
+ end
316
+ end
317
+ end
318
+ end
319
+
320
+ class Node
321
+ include Bio::Velvet::Logging
322
+
323
+ attr_accessor :node_id, :coverages, :ends_of_kmers_of_node, :ends_of_kmers_of_twin_node
324
+
325
+ # For read tracking
326
+ attr_accessor :number_of_short_reads
327
+ # For read tracking - an array of NodedRead objects
328
+ attr_accessor :short_reads
329
+
330
+ # Graph to which this node belongs
331
+ attr_accessor :parent_graph
332
+
333
+ # Number of nucleotides in this node if a contig was made from this contig alone
334
+ attr_accessor :length
335
+
336
+ # The sequence of this node, should a contig be made solely out of this node.
337
+ # The kmer length is that kmer length that was used to create the assembly.
338
+ #
339
+ # If this node has a sequence that is 2 or more less than the hash length, then the
340
+ # sequence of this node requires information outside of this object, and gathering
341
+ # that information is not implemented here.
342
+ def sequence
343
+ if !sequence?
344
+ raise NotImplementedException, "Attempted to get the sequence of a velvet node that is too short, such that the sequence info is not fully present in the node object"
345
+ end
346
+ kmer_length = @parent_graph.hash_length
347
+
348
+ # Sequence is the reverse complement of the ends_of_kmers_of_twin_node,
349
+ # Then the ends_of_kmers_of_node after removing the first kmer_length - 1
350
+ # nucleotides
351
+ length_to_get_from_fwd = corresponding_contig_length - @ends_of_kmers_of_twin_node.length
352
+ fwd_length = @ends_of_kmers_of_node.length
353
+ raise "Programming error" if length_to_get_from_fwd > fwd_length
354
+ revcom(@ends_of_kmers_of_twin_node)+
355
+ @ends_of_kmers_of_node[-length_to_get_from_fwd...fwd_length]
356
+ end
357
+
358
+ # Number of nucleotides in this node if this contig length is being added to
359
+ # another node's length (nodes overlap)
360
+ def length_alone
361
+ @ends_of_kmers_of_node.length
362
+ end
363
+
364
+ # The common length of [ends_of_kmers_of_node and :ends_of_kmers_of_twin_node]
365
+ # is equal to the length of the corresponding contig minus k − 1.
366
+ #
367
+ # This method returns that corresponding contig's length
368
+ def corresponding_contig_length
369
+ @ends_of_kmers_of_node.length+@parent_graph.hash_length-1
370
+ end
371
+
372
+ # Is it possible to extract the sequence of this node? I.e. is it long enough?
373
+ def sequence?
374
+ kmer_length = @parent_graph.hash_length
375
+ if kmer_length -1 > @ends_of_kmers_of_node.length
376
+ return false
377
+ else
378
+ return true
379
+ end
380
+ end
381
+
382
+ # The reverse complement of this node's sequence
383
+ def reverse_sequence
384
+ revcom(sequence)
385
+ end
386
+
387
+ # Number of nucleotides in this node if this contig length is being added to
388
+ # another node's length (nodes overlap)
389
+ def length_alone
390
+ @ends_of_kmers_of_node.length
391
+ end
392
+
393
+ def to_s
394
+ "Node #{@node_id}: #{@ends_of_kmers_of_node} / #{@ends_of_kmers_of_twin_node}"
395
+ end
396
+
397
+ def inspect
398
+ to_s
399
+ end
400
+
401
+ # Return the sum of all coverage columns, divided by the length of the node,
402
+ # or nil if this node has no coverage
403
+ def coverage
404
+ return nil if length == 0
405
+
406
+ coverage = 0
407
+ coverages.each_with_index do |cov, i|
408
+ # Only take the 0th, 2nd, 4th, etc, don't want the O_cov things
409
+ coverage += cov if i.modulo(2) == 0
410
+ end
411
+ return coverage.to_f / length
412
+ end
413
+
414
+ private
415
+ def revcom(seq)
416
+ Bio::Sequence::NA.new(seq).reverse_complement.to_s.upcase
417
+ end
418
+ end
419
+
420
+ class Arc
421
+ attr_accessor :begin_node_id, :end_node_id, :multiplicity
422
+
423
+ # true for forwards direction, false for reverse
424
+ attr_accessor :begin_node_direction, :end_node_direction
425
+
426
+ def directions_opposing?
427
+ if (@begin_node_direction == true and @end_node_direction == false) or
428
+ (@begin_node_direction == false and @end_node_direction == true)
429
+ return true
430
+ elsif [true,false].include?(@begin_node_direction) and [true,false].include?(@end_node_direction)
431
+ return false
432
+ else
433
+ raise Exception, "Node directions not set! Cannot tell whether directions are opposing"
434
+ end
435
+ end
436
+
437
+ def begin_node_forward?
438
+ @begin_node_direction
439
+ end
440
+
441
+ def end_node_forward?
442
+ @end_node_forward
443
+ end
444
+
445
+ # Returns true if this arc connects the end of the first node
446
+ # to the start of the second node, else false
447
+ def connects_end_to_beginning?(first_node_id, second_node_id)
448
+ # ARC $START_NODE $END_NODE $MULTIPLICITY
449
+ #Note: this one line implicitly represents an arc from node A to B and
450
+ #another, with same multiplicity, from -B to -A.
451
+ (first_node_id == @begin_node_id and second_node_id == @end_node_id and
452
+ @begin_node_direction == true and @end_node_direction == true) or
453
+ (first_node_id == @end_node_id and second_node_id = @begin_node_id and
454
+ @begin_node_direction == false and @end_node_direction == false)
455
+ end
456
+
457
+ # Returns true if this arc connects the end of the first node
458
+ # to the end of the second node, else false
459
+ def connects_end_to_end?(first_node_id, second_node_id)
460
+ (first_node_id == @begin_node_id and second_node_id == @end_node_id and
461
+ @begin_node_direction == true and @end_node_direction == false) or
462
+ (first_node_id == @end_node_id and second_node_id = @begin_node_id and
463
+ @begin_node_direction == true and @end_node_direction == false)
464
+ end
465
+
466
+ # Returns true if this arc connects the start of the first node
467
+ # to the start of the second node, else false
468
+ def connects_beginning_to_beginning?(first_node_id, second_node_id)
469
+ (first_node_id == @begin_node_id and second_node_id == @end_node_id and
470
+ @begin_node_direction == false and @end_node_direction == true) or
471
+ (first_node_id == @end_node_id and second_node_id = @begin_node_id and
472
+ @begin_node_direction == false and @end_node_direction == true)
473
+ end
474
+
475
+ # Returns true if this arc connects the start of the first node
476
+ # to the start of the second node, else false
477
+ def connects_beginning_to_end?(first_node_id, second_node_id)
478
+ (first_node_id == @begin_node_id and second_node_id == @end_node_id and
479
+ @begin_node_direction == false and @end_node_direction == false) or
480
+ (first_node_id == @end_node_id and second_node_id = @begin_node_id and
481
+ @begin_node_direction == true and @end_node_direction == true)
482
+ end
483
+
484
+ # Return true if this arc connects the beginning of the node,
485
+ # else false
486
+ def connects_to_beginning?(node_id)
487
+ (node_id == @begin_node_id and !@begin_node_direction) or
488
+ (node_id == @end_node_id and @end_node_direction)
489
+ end
490
+
491
+ # Return true if this arc connects the end of the node,
492
+ # else false
493
+ def connects_to_end?(node_id)
494
+ (node_id == @begin_node_id and @begin_node_direction) or
495
+ (node_id == @end_node_id and !@end_node_direction)
496
+ end
497
+
498
+ def to_s
499
+ str = ''
500
+ str += '-' if @begin_node_direction == false
501
+ str += @begin_node_id.to_s
502
+ str += ' '
503
+ str += '-' if @end_node_direction == false
504
+ str += @end_node_id.to_s
505
+ str += ' '
506
+ str += @multiplicity.to_s
507
+ str
508
+ end
509
+ end
510
+
511
+ # Tracked read, part of a node
512
+ class NodedRead
513
+ attr_accessor :read_id, :offset_from_start_of_node, :start_coord, :direction
514
+ end
515
+ end
516
+ end
517
+ end