bio-maf 0.1.0 → 0.2.0

data/.gitignore ADDED
@@ -0,0 +1,53 @@
1
+ # rcov generated
2
+ coverage
3
+ coverage.data
4
+
5
+ # rdoc generated
6
+ rdoc
7
+
8
+ # yard generated
9
+ doc
10
+ .yardoc
11
+
12
+ # bundler
13
+ .bundle
14
+
15
+ # jeweler generated
16
+ pkg
17
+
18
+ # Have editor/IDE/OS specific files you need to ignore? Consider using a global gitignore:
19
+ #
20
+ # * Create a file at ~/.gitignore
21
+ # * Include files you want ignored
22
+ # * Run: git config --global core.excludesfile ~/.gitignore
23
+ #
24
+ # After doing this, these files will be ignored in all your git projects,
25
+ # saving you from having to 'pollute' every project you touch with them
26
+ #
27
+ # Not sure what needs to be ignored for particular editors/OSes? Here are some ideas to get you started. (Remember, remove the leading # of the line)
28
+ #
29
+ # For MacOS:
30
+ #
31
+ #.DS_Store
32
+
33
+ # For TextMate
34
+ #*.tmproj
35
+ #tmtags
36
+
37
+ # For emacs:
38
+ #*~
39
+ #\#*
40
+ #.\#*
41
+
42
+ # For vim:
43
+ #*.swp
44
+
45
+ # For redcar:
46
+ #.redcar
47
+
48
+ # For rubinius:
49
+ *.rbc
50
+ .rbx
51
+ # Ignore Gemfile.lock for gems. See http://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/
52
+ Gemfile.lock
53
+
data/DEVELOPMENT.md CHANGED
@@ -3,6 +3,35 @@
3
3
  Here are notes on less obvious aspects of the development process for
4
4
  this library.
5
5
 
6
+ ## Gem build / tagging / release
7
+
8
+ This now uses [rubygems-tasks][] for building and releasing gems.
9
+
10
+ [rubygems-tasks]: https://github.com/postmodern/rubygems-tasks
11
+
12
+ We build two gem platform variants: a 'default' one for MRI with no
13
+ platform set, and a JRuby one with `platform = 'java'`. These get
14
+ built as `bio-maf-X.Y.Z.gem` and `bio-maf-X.Y.Z-java.gem`. At least
15
+ for now, this is done by running `gem release` separately under JRuby
16
+ and MRI. SCM tagging and pushing is done under MRI only, but the gems
17
+ will be built and pushed to rubygems.org separately under each
18
+ platform.
19
+
20
+ The version is simply set by hand in `bio-maf.gemspec`. Don't forget
21
+ to increment it!
22
+
23
+ Testing the build:
24
+
25
+ $ rake build
26
+ $ rake install
27
+
28
+ Release:
29
+
30
+ $ rvm use 1.9.3@bioruby-maf
31
+ $ rake release
32
+ $ rvm use jruby-1.6.7.2@bioruby-maf
33
+ $ rake release
34
+
6
35
  ## kyotocabinet-java
7
36
 
8
37
  Running `bio-maf` on JRuby requires the [kyotocabinet-java][] gem, a
data/Gemfile CHANGED
@@ -13,6 +13,7 @@ group :development do
13
13
  gem "redcarpet", "~> 2.1.1", :platforms => :mri
14
14
  gem "ronn", "~> 0.7.3", :platforms => :mri
15
15
  gem "sinatra", "~> 1.3.2" # for ronn --server
16
+ gem "rubygems-tasks", "~> 0.2.3"
16
17
  end
17
18
 
18
19
  group :test do
data/README.md CHANGED
@@ -47,8 +47,29 @@ problems building or using this gem, which is still fairly new.
47
47
 
48
48
  ## Installation
49
49
 
50
+ `bio-maf` is now published as a Ruby [gem](https://rubygems.org/gems/bio-maf).
51
+
50
52
  $ gem install bio-maf
51
53
 
54
+ ## Performance
55
+
56
+ This parser performs best under [JRuby][], particularly with Java
57
+ 7. See the [Performance][] wiki page for more information. For best
58
+ results, use JRuby in 1.9 mode with the ObjectProxyCache disabled:
59
+
60
+ [JRuby]: http://jruby.org/
61
+ [Performance]: https://github.com/csw/bioruby-maf/wiki/Performance
62
+
63
+ $ export JRUBY_OPTS='--1.9 -Xji.objectProxyCache=false'
64
+
65
+ Many parsing modes are multithreaded. Under JRuby, the parser defaults to
66
+ using one parser thread per available core, but if desired this can be
67
+ configured with the `:threads` parser option.
68
+
69
+ Ruby 1.9.3 is fully supported, but does not perform as well,
70
+ especially since its concurrency features are not useful for this
71
+ workload.
72
+
52
73
  ## Usage
53
74
 
54
75
  ### Create an index on a MAF file
@@ -162,6 +183,47 @@ Refer to [`chr22_ieq.maf`](https://github.com/csw/bioruby-maf/blob/master/test/d
162
183
  # @size=1601, @strand=:+, @src_size=50103, @text=nil,
163
184
  # @status="I">
164
185
 
186
+ ### Remove gaps from parsed blocks
187
+
188
+ After filtering out species with
189
+ [`Parser#sequence_filter`](#filter-species-returned-in-alignment-blocks),
190
+ gaps may be left where there was an insertion present only in
191
+ sequences that were filtered out. Such gaps can be removed by setting
192
+ the `:remove_gaps` parser option:
193
+
194
+ require 'bio-maf'
195
+ p = Bio::MAF::Parser.new('test/data/chr22_ieq.maf',
196
+ :remove_gaps => true)
197
+
198
+ ### Tile blocks together over an interval
199
+
200
+ Extracts alignment blocks overlapping the given genomic interval and
201
+ constructs a single alignment block covering the entire interval for
202
+ the specified species. Optionally, any gaps in coverage of the MAF
203
+ file's reference sequence can be filled in from a FASTA sequence
204
+ file. See the Cucumber [feature][] for examples of output, and also
205
+ the
206
+ [`maf_tile(1)`](http://csw.github.com/bioruby-maf/man/maf_tile.1.html)
207
+ man page.
208
+
209
+ [feature]: https://github.com/csw/bioruby-maf/blob/master/features/gap-filling.feature
210
+
211
+ require 'bio-maf'
212
+ tiler = Bio::MAF::Tiler.new
213
+ tiler.index = Bio::MAF::KyotoIndex.open('test/data/mm8_chr7_tiny.kct')
214
+ tiler.parser = Bio::MAF::Parser.new('test/data/mm8_chr7_tiny.maf')
215
+ # optional
216
+ tiler.reference = Bio::MAF::FASTARangeReader.new('reference.fa.gz')
217
+ tiler.species = %w(mm8 rn4 hg18)
218
+ tiler.species_map = {
219
+ 'mm8' => 'mouse',
220
+ 'rn4' => 'rat',
221
+ 'hg18' => 'human'
222
+ }
223
+ tiler.interval = Bio::GenomicInterval.zero_based('mm8.chr7',
224
+ 80082334,
225
+ 80082468)
226
+ tiler.write_fasta($stdout)
165
227
 
166
228
  ### Command line tools
167
229
 
@@ -169,6 +231,12 @@ Man pages for command line tools:
169
231
 
170
232
  * [`maf_index(1)`](http://csw.github.com/bioruby-maf/man/maf_index.1.html)
171
233
  * [`maf_to_fasta(1)`](http://csw.github.com/bioruby-maf/man/maf_to_fasta.1.html)
234
+ * [`maf_tile(1)`](http://csw.github.com/bioruby-maf/man/maf_tile.1.html)
235
+
236
+ With [gem-man](https://github.com/defunkt/gem-man) installed, these
237
+ can be read with:
238
+
239
+ $ gem man bio-maf
172
240
 
173
241
  ### Other documentation
174
242
 
@@ -201,7 +269,7 @@ If you use this software, please cite one of
201
269
 
202
270
  ## Biogems.info
203
271
 
204
- This Biogem will be published at [#bio-maf](http://biogems.info/index.html)
272
+ This Biogem is published at [biogems.info](http://biogems.info/index.html#bio-maf).
205
273
 
206
274
  ## Copyright
207
275
 
data/Rakefile CHANGED
@@ -10,10 +10,11 @@ rescue Bundler::BundlerError => e
10
10
  exit e.status_code
11
11
  end
12
12
  require 'rake'
13
- require 'rubygems/package_task'
14
13
 
15
- $gemspec = Gem::Specification.load("bio-maf.gemspec")
16
- Gem::PackageTask.new($gemspec) { |pkg| }
14
+ require 'rubygems/tasks'
15
+ # we only want to do the SCM tag/push stuff once, on MRI
16
+ use_scm = (RUBY_PLATFORM != 'java')
17
+ Gem::Tasks.new(:scm => {:tag => use_scm, :push => use_scm})
17
18
 
18
19
  require 'rspec/core'
19
20
  require 'rspec/core/rake_task'
data/bin/find_overlaps ADDED
@@ -0,0 +1,21 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'bio-maf'
4
+
5
+ parser = Bio::MAF::Parser.new(ARGV.shift, :threads => 4)
6
+
7
+ def desc(seq)
8
+ "#{seq.source}:#{seq.start}-#{seq.end}"
9
+ end
10
+
11
+ open = []
12
+ parser.parse_blocks.each do |block|
13
+ start_pos = block.ref_seq.start
14
+ open.delete_if { |open_b| open_b.ref_seq.end <= start_pos }
15
+ open.each do |ovl|
16
+ ref_a = ovl.ref_seq
17
+ ref_b = block.ref_seq
18
+ puts "#{desc(ref_a)} overlaps #{desc(ref_b)}"
19
+ end
20
+ open << block
21
+ end
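Note on `bin/find_overlaps`: the script keeps a list of "open" blocks, drops those whose reference range ends at or before the next block's start, and reports every block still open as an overlap. This only works if blocks arrive sorted by reference start position, which the script appears to assume. Below is a minimal sketch of the same sweep on plain half-open ranges. The `Span` struct and the `find_overlaps` helper are hypothetical stand-ins for illustration and are not part of bio-maf.

```ruby
# Hypothetical stand-in for a block's reference sequence: a half-open
# interval [start, stop) on a named source. Not a bio-maf class.
Span = Struct.new(:source, :start, :stop) do
  def to_s
    "#{source}:#{start}-#{stop}"
  end
end

# Sweep over spans that are already sorted by start position.
# Returns an array of [earlier, later] pairs that overlap.
def find_overlaps(spans)
  active = []
  pairs = []
  spans.each do |span|
    # Anything that ended at or before this start can no longer overlap.
    active.delete_if { |a| a.stop <= span.start }
    # Everything still active overlaps the current span.
    active.each { |a| pairs << [a, span] }
    active << span
  end
  pairs
end

spans = [
  Span.new('sp1.chr1', 10, 23),
  Span.new('sp1.chr1', 20, 30),
  Span.new('sp1.chr1', 30, 40)
]
find_overlaps(spans).each { |a, b| puts "#{a} overlaps #{b}" }
```

Because the intervals are half-open, a span ending at 30 and one starting at 30 are treated as adjacent, not overlapping, which matches the `<=` comparison in the script.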
data/bin/maf_tile ADDED
@@ -0,0 +1,103 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'optparse'
4
+ require 'ostruct'
5
+
6
+ require 'bio-maf'
7
+ require 'bio-genomic-interval'
8
+
9
+ options = OpenStruct.new
10
+ options.p = { :threads => 1 }
11
+ options.species = []
12
+ options.species_map = {}
13
+ options.usage = false
14
+
15
+ o_parser = OptionParser.new do |opts|
16
+ opts.banner = "Usage: maf_tile [options] <maf> <index>"
17
+ opts.separator ""
18
+ opts.separator "Options:"
19
+ opts.on("-r", "--reference SEQ", "FASTA reference sequence") do |ref|
20
+ options.ref = ref
21
+ end
22
+ opts.on("-i", "--interval BEGIN:END", "Genomic interval, zero-based") do |int|
23
+ if int =~ /(\d+):(\d+)/
24
+ options.interval = ($1.to_i)...($2.to_i)
25
+ else
26
+ options.usage = true
27
+ end
28
+ end
29
+ opts.on("-s", "--species SPECIES[:NAME]", "Species to use (with mapped name)") do |sp|
30
+ if sp =~ /:/
31
+ species, mapped = sp.split(/:/)
32
+ options.species << species
33
+ options.species_map[species] = mapped
34
+ else
35
+ options.species << sp
36
+ end
37
+ end
38
+ opts.on("-o", "--output-base BASE", "Base name for output files",
39
+ "Use stdout for a single interval if not given") do |base|
40
+ options.output_base = base
41
+ end
42
+ opts.on("--bed BED", "BED file specifying intervals",
43
+ "(requires --output-base)") do |bed|
44
+ options.bed = bed
45
+ end
46
+ end
47
+
48
+ o_parser.parse!(ARGV)
49
+
50
+ maf_p = ARGV.shift
51
+ index_p = ARGV.shift
52
+
53
+ unless (! options.usage) \
54
+ && maf_p && index_p && (! options.species.empty?) \
55
+ && (options.output_base ? options.bed : options.interval)
56
+ $stderr.puts o_parser
57
+ exit 2
58
+ end
59
+
60
+ tiler = Bio::MAF::Tiler.new
61
+ tiler.index = Bio::MAF::KyotoIndex.open(index_p)
62
+ tiler.parser = Bio::MAF::Parser.new(maf_p, options.p)
63
+ tiler.reference = Bio::MAF::FASTARangeReader.new(options.ref) if options.ref
64
+ tiler.species = options.species
65
+ tiler.species_map = options.species_map
66
+
67
+ def parse_interval(line)
68
+ src, r_start_s, r_end_s, _ = line.split(nil, 4)
69
+ r_start = r_start_s.to_i
70
+ r_end = r_end_s.to_i
71
+ return Bio::GenomicInterval.zero_based(src, r_start, r_end)
72
+ end
73
+
74
+ def target_for(base, interval)
75
+ path = "#{base}_#{interval.zero_start}-#{interval.zero_end}.fa"
76
+ File.open(path, 'w')
77
+ end
78
+
79
+ if options.bed
80
+ intervals = []
81
+ File.open(options.bed) do |bed_f|
82
+ bed_f.each_line { |line| intervals << parse_interval(line) }
83
+ end
84
+ intervals.sort_by! { |int| int.zero_start }
85
+ intervals.each do |int|
86
+ tiler.interval = int
87
+ target = target_for(options.output_base, int)
88
+ tiler.write_fasta(target)
89
+ target.close
90
+ end
91
+ else
92
+ # single interval
93
+ tiler.interval = Bio::GenomicInterval.zero_based(tiler.index.ref_seq,
94
+ options.interval.begin,
95
+ options.interval.end)
96
+ if options.output_base
97
+ target = target_for(options.output_base, tiler.interval)
98
+ else
99
+ target = $stdout
100
+ end
101
+ tiler.write_fasta(target)
102
+ target.close
103
+ end
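Note on `bin/maf_tile`: two small parsing steps carry most of the input handling, the `SPECIES[:NAME]` option and the BED lines read for `--bed`. The sketch below isolates both so they can be tried without the gem installed. `Interval`, `parse_bed_line`, and `parse_species` are hypothetical names used here in place of `Bio::GenomicInterval` and the script's inline code, and the second BED row uses made-up coordinates.

```ruby
# Hypothetical stand-in for Bio::GenomicInterval, holding zero-based,
# half-open coordinates. Only used to keep this sketch self-contained.
Interval = Struct.new(:chrom, :zero_start, :zero_end)

# Parse the first three whitespace-separated BED columns; any further
# columns are ignored, mirroring parse_interval in the script.
# Integer() raises on malformed coordinates, where the script's to_i
# would silently yield 0.
def parse_bed_line(line)
  src, start_s, end_s, _rest = line.split(nil, 4)
  Interval.new(src, Integer(start_s), Integer(end_s))
end

# Split a "SPECIES[:NAME]" argument into the species and its optional
# mapped name (nil when no name is given).
def parse_species(arg)
  species, mapped = arg.split(':', 2)
  [species, mapped]
end

bed_lines = [
  "mm8.chr7\t80082468\t80082500\tsecond",
  "mm8.chr7\t80082334\t80082468\tfirst"
]

# Sort by start position, as the script does before tiling each interval.
intervals = bed_lines.map { |l| parse_bed_line(l) }.sort_by(&:zero_start)
intervals.each { |iv| puts "tiled_#{iv.zero_start}-#{iv.zero_end}.fa" }

p parse_species('mm8:mouse')
p parse_species('rn4')
```

The printed file names follow the `BASE_START-END.fa` pattern built by `target_for`, with `tiled` standing in for the `--output-base` value.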
data/bio-maf.gemspec ADDED
@@ -0,0 +1,43 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = "bio-maf"
5
+ s.version = "0.2.0"
6
+
7
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
8
+ s.authors = ["Clayton Wheeler"]
9
+ s.date = "2012-06-29"
10
+ s.description = "Multiple Alignment Format parser for BioRuby."
11
+ s.email = "cswh@umich.edu"
12
+ s.executables = ["maf_count", "maf_dump_blocks", "maf_extract_ranges_count", "maf_index", "maf_parse_bench", "maf_to_fasta", "maf_write", "random_ranges"]
13
+ s.extra_rdoc_files = [
14
+ "LICENSE.txt",
15
+ "README.md"
16
+ ]
17
+ s.files = `git ls-files`.split("\n")
18
+ s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
19
+ s.executables = `git ls-files -- bin/*`.split("\n").map {
20
+ |f| File.basename(f)
21
+ }
22
+
23
+ s.homepage = "http://github.com/csw/bioruby-maf"
24
+ s.licenses = ["MIT"]
25
+ s.require_paths = ["lib"]
26
+ s.rubygems_version = "1.8.24"
27
+ s.summary = "MAF parser for BioRuby"
28
+
29
+ s.specification_version = 3
30
+
31
+ if RUBY_PLATFORM == 'java'
32
+ s.platform = 'java'
33
+ end
34
+
35
+ s.add_runtime_dependency('bio-bigbio', [">= 0"])
36
+ s.add_runtime_dependency('bio-genomic-interval', ["~> 0.1.2"])
37
+ if RUBY_PLATFORM == 'java'
38
+ s.add_runtime_dependency('kyotocabinet-java', ["~> 0.2.0"])
39
+ else
40
+ s.add_runtime_dependency('kyotocabinet-ruby', ["~> 1.27.1"])
41
+ end
42
+
43
+ end
@@ -0,0 +1,158 @@
1
+ Feature: Join alignment blocks with reference data
2
+ In order to produce FASTA output with one sequence per species
3
+ For use in downstream tools
4
+ We need to join adjacent MAF blocks together
5
+ And fill gaps in the reference sequence from reference data
6
+
7
+ Scenario: Non-overlapping MAF blocks in region of interest
8
+ Given MAF data:
9
+ """
10
+ ##maf version=1
11
+ a score=20.0
12
+ s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG
13
+ s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG
14
+ s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG
15
+
16
+ a score=21.0
17
+ s sp1.chr1 30 10 + 50 AGGGCGGTCC
18
+ s sp2.chr5 53030 10 + 65536 AGGGCGGTGC
19
+ """
20
+ And chromosome reference sequence:
21
+ """
22
+ >sp1.chr1
23
+ CCAGGATGCT
24
+ GGGCTGAGGG
25
+ CAGTTGTGTC
26
+ AGGGCGGTCC
27
+ GGTGCAGGCA
28
+ """
29
+ When I open it with a MAF reader
30
+ And build an index on the reference sequence
31
+ And tile sp1.chr1:0-50 with the chromosome reference
32
+ And tile with species [sp1, sp2, sp3]
33
+ And write the tiled data as FASTA
34
+ Then the FASTA data obtained should be:
35
+ """
36
+ >sp1
37
+ CCAGGATGCTGGGCTGAGGGC--AGTTGTGTCAGGGCGGTCCGGTGCAGGCA
38
+ >sp2
39
+ **********GGGCTGACGGC--AG*******AGGGCGGTGC**********
40
+ >sp3
41
+ **********AGGTTTAGGGCAGAG***************************
42
+ """
43
+
44
+ Scenario: Non-overlapping MAF blocks with species map
45
+ Given MAF data:
46
+ """
47
+ ##maf version=1
48
+ a score=20.0
49
+ s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG
50
+ s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG
51
+ s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG
52
+
53
+ a score=21.0
54
+ s sp1.chr1 30 10 + 50 AGGGCGGTCC
55
+ s sp2.chr5 53030 10 + 65536 AGGGCGGTGC
56
+ """
57
+ And chromosome reference sequence:
58
+ """
59
+ >sp1.chr1
60
+ CCAGGATGCT
61
+ GGGCTGAGGG
62
+ CAGTTGTGTC
63
+ AGGGCGGTCC
64
+ GGTGCAGGCA
65
+ """
66
+ When I open it with a MAF reader
67
+ And build an index on the reference sequence
68
+ And tile sp1.chr1:0-50 with the chromosome reference
69
+ And tile with species [sp1, sp2, sp3]
70
+ And map species sp1 as mouse
71
+ And map species sp2 as hippo
72
+ And map species sp3 as squid
73
+ And write the tiled data as FASTA
74
+ Then the FASTA data obtained should be:
75
+ """
76
+ >mouse
77
+ CCAGGATGCTGGGCTGAGGGC--AGTTGTGTCAGGGCGGTCCGGTGCAGGCA
78
+ >hippo
79
+ **********GGGCTGACGGC--AG*******AGGGCGGTGC**********
80
+ >squid
81
+ **********AGGTTTAGGGCAGAG***************************
82
+ """
83
+
84
+ Scenario: Subset of non-overlapping MAF blocks in region
85
+ Given MAF data:
86
+ """
87
+ ##maf version=1
88
+ a score=20.0
89
+ s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG
90
+ s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG
91
+ s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG
92
+
93
+ a score=21.0
94
+ s sp1.chr1 30 10 + 50 AGGGCGGTCC
95
+ s sp2.chr5 53030 10 + 65536 AGGGCGGTGC
96
+ """
97
+ And chromosome reference sequence:
98
+ """
99
+ >sp1.chr1
100
+ CCAGGATGCT
101
+ GGGCTGAGGG
102
+ CAGTTGTGTC
103
+ AGGGCGGTCC
104
+ GGTGCAGGCA
105
+ """
106
+ When I open it with a MAF reader
107
+ And build an index on the reference sequence
108
+ And tile sp1.chr1:12-36 with the chromosome reference
109
+ And tile with species [sp1, sp2, sp3]
110
+ And write the tiled data as FASTA
111
+ Then the FASTA data obtained should be:
112
+ """
113
+ >sp1
114
+ GCTGAGGGC--AGTTGTGTCAGGGCG
115
+ >sp2
116
+ GCTGACGGC--AG*******AGGGCG
117
+ >sp3
118
+ GTTTAGGGCAGAG*************
119
+ """
120
+ Scenario: Overlapping MAF blocks in region of interest
121
+ Given MAF data:
122
+ """
123
+ ##maf version=1
124
+ a score=20.0
125
+ s sp1.chr1 10 13 + 50 GGGCTGAGGGC--AG
126
+ s sp2.chr5 53010 13 + 65536 GGGCTGACGGC--AG
127
+ s sp3.chr2 33010 15 + 65536 AGGTTTAGGGCAGAG
128
+
129
+ a score=21.0
130
+ s sp1.chr1 20 10 + 50 AGGGCGGTCC
131
+ s sp2.chr5 53020 10 + 65536 AGGGCGGTGC
132
+ """
133
+ And chromosome reference sequence:
134
+ """
135
+ >sp1.chr1
136
+ CCAGGATGCT
137
+ GGGCTGAGGG
138
+ CAGTTGTGTC
139
+ AGGGCGGTCC
140
+ GGTGCAGGCA
141
+ """
142
+ When I open it with a MAF reader
143
+ And build an index on the reference sequence
144
+ And tile sp1.chr1:0-50 with the chromosome reference
145
+ And tile with species [sp1, sp2, sp3]
146
+ And write the tiled data as FASTA
147
+ Then the FASTA data obtained should be:
148
+ """
149
+ >sp1
150
+ CCAGGATGCTGGGCTGAGGGAGGGCGGTCCAGGGCGGTCCGGTGCAGGCA
151
+ >sp2
152
+ **********GGGCTGACGGAGGGCGGTGC********************
153
+ >sp3
154
+ **********AGGTTTAGGG******************************
155
+ """
156
+
157
+
158
+
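Note on the tiling feature: the non-overlapping scenarios can be read as a simple walk along the interval. Uncovered stretches are filled from the reference for the reference species and with `*` for everyone else, and each block contributes its aligned text, with `*` for species missing from that block. The sketch below follows that reading for the first scenario's data. It is an assumption-laden illustration, not the `Bio::MAF::Tiler` implementation: `Block` and `tile` are hypothetical, blocks must be sorted, non-overlapping, and fully inside the interval, and the overlapping-blocks scenario is not handled.

```ruby
# Hypothetical, simplified block: reference start and ungapped size, plus
# aligned text per species (all texts in a block have the same length).
Block = Struct.new(:ref_start, :ref_size, :texts)

# Tile sorted, non-overlapping blocks over the zero-based, half-open
# interval [i_start, i_end). Returns a hash of species => tiled text.
def tile(blocks, reference, ref_species, species, i_start, i_end)
  parts = Hash[species.map { |sp| [sp, []] }]

  # Emit an uncovered stretch [from, to): reference bases for the
  # reference species, '*' for every other species.
  fill = lambda do |from, to|
    next if to <= from
    species.each do |sp|
      parts[sp] << (sp == ref_species ? reference[from...to] : '*' * (to - from))
    end
  end

  pos = i_start
  blocks.each do |b|
    fill.call(pos, b.ref_start)
    width = b.texts.fetch(ref_species).size
    # Species absent from this block are padded to the block's text width.
    species.each { |sp| parts[sp] << (b.texts[sp] || '*' * width) }
    pos = b.ref_start + b.ref_size
  end
  fill.call(pos, i_end)

  Hash[parts.map { |sp, pieces| [sp, pieces.join] }]
end

reference = %w(CCAGGATGCT GGGCTGAGGG CAGTTGTGTC AGGGCGGTCC GGTGCAGGCA).join
blocks = [
  Block.new(10, 13, { 'sp1' => 'GGGCTGAGGGC--AG',
                      'sp2' => 'GGGCTGACGGC--AG',
                      'sp3' => 'AGGTTTAGGGCAGAG' }),
  Block.new(30, 10, { 'sp1' => 'AGGGCGGTCC',
                      'sp2' => 'AGGGCGGTGC' })
]

tiled = tile(blocks, reference, 'sp1', %w(sp1 sp2 sp3), 0, 50)
tiled.each { |sp, seq| puts ">#{sp}", seq }
```

Note that the block's text width (15 columns) is larger than its reference size (13 bases) because of the `--` gap, so padding uses the text width while the position advances by the reference size.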
@@ -0,0 +1,50 @@
1
+ Feature: Remove gaps from MAF files
2
+ In order to work with only the alignment data involving sequences
3
+ Which can be used by downstream software
4
+ We may want to filter out certain species
5
+ Which can leave gap regions where sequence data was only present
6
+ For removed species
7
+ So it is useful to be able to remove those gaps
8
+
9
+ Background:
10
+ Given MAF data:
11
+ """
12
+ ##maf version=1
13
+ a score=10542.0
14
+ s mm8.chr7 80082334 34 + 145134094 GGGCTGAGGGC--AGGGATGG---AGGGCGGTCC--------------CAGCA-
15
+ s rn4.chr1 136011785 34 + 267910886 GGGCTGAGGGC--AGGGACGG---AGGGCGGTCC--------------CAGCA-
16
+ s oryCun1.scaffold_199771 14021 43 - 75077 -----ATGGGC--AAGCGTGG---AGGGGAACCTCTCCTCCCCTCCGACAAAG-
17
+ s hg18.chr15 88557580 27 + 100338915 --------GGC--AAGTGTGGA--AGGGAAGCCC--------------CAGAA-
18
+ s panTro2.chr15 87959837 27 + 100063422 --------GGC--AAGTGTGGA--AGGGAAGCCC--------------CAGAA-
19
+ s rheMac2.chr7 69864714 28 + 169801366 -------GGGC--AAGTATGGA--AGGGAAGCCC--------------CAGAA-
20
+ s canFam2.chr3 56030570 39 + 94715083 AGGTTTAGGGCAGAGGGATGAAGGAGGAGAATCC--------------CTATG-
21
+ s dasNov1.scaffold_106893 7435 34 + 9831 GGAACGAGGGC--ATGTGTGG---AGGGGGCTGC--------------CCACA-
22
+ s loxAfr1.scaffold_8298 30264 38 + 78952 ATGATGAGGGG--AAGCGTGGAGGAGGGGAACCC--------------CTAGGA
23
+ s echTel1.scaffold_304651 594 37 - 10007 -TGCTATGGCT--TTGTGTCTAGGAGGGGAATCC--------------CCAGGA
24
+ """
25
+ When I open it with a MAF reader
26
+ And filter for only the species
27
+ | mm8 |
28
+ | rn4 |
29
+ | hg18 |
30
+ | canFam2 |
31
+ | loxAfr1 |
32
+
33
+ Scenario: Detect filtered blocks
34
+ When an alignment block can be obtained
35
+ Then the alignment block is marked as filtered
36
+ And the alignment block has 5 sequences
37
+
38
+ Scenario: Detect gaps
39
+ When an alignment block can be obtained
40
+ Then 1 gap is found with length [14]
41
+
42
+ Scenario: Remove gaps
43
+ When an alignment block can be obtained
44
+ And gaps are removed
45
+ Then the text size of the block is 40
46
+
47
+ Scenario: Remove gaps in the parser
48
+ When I enable the :remove_gaps parser option
49
+ And an alignment block can be obtained
50
+ Then the text size of the block is 40
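Note on the gap-removal feature: once species are filtered out, a "gap" is a run of alignment columns in which every remaining sequence has `-`. The sketch below shows one way to find and cut such runs, using the five sequences the Background keeps. `find_gaps` and `remove_gaps` here are free-standing hypothetical helpers, not the library's `Block#find_gaps` or `Block#remove_gaps!`, and the `[start, length]` pair shape is an assumption.

```ruby
# Return [start, length] pairs for every run of columns in which all of
# the given aligned texts contain the gap character '-'.
def find_gaps(texts)
  width = texts.first.size
  gaps = []
  run_start = nil
  # Iterate one column past the end so a trailing run is closed too.
  (0..width).each do |col|
    all_gap = col < width && texts.all? { |t| t[col] == '-' }
    if all_gap
      run_start ||= col
    elsif run_start
      gaps << [run_start, col - run_start]
      run_start = nil
    end
  end
  gaps
end

# Return copies of the texts with the gap-only columns cut out.
def remove_gaps(texts)
  gaps = find_gaps(texts)
  texts.map do |t|
    t = t.dup
    # Delete right to left so earlier offsets stay valid.
    gaps.reverse_each { |start, len| t.slice!(start, len) }
    t
  end
end

# Aligned text of the species kept by the filter in the Background:
# mm8, rn4, hg18, canFam2, loxAfr1.
texts = [
  'GGGCTGAGGGC--AGGGATGG---AGGGCGGTCC--------------CAGCA-',
  'GGGCTGAGGGC--AGGGACGG---AGGGCGGTCC--------------CAGCA-',
  '--------GGC--AAGTGTGGA--AGGGAAGCCC--------------CAGAA-',
  'AGGTTTAGGGCAGAGGGATGAAGGAGGAGAATCC--------------CTATG-',
  'ATGATGAGGGG--AAGCGTGGAGGAGGGGAACCC--------------CTAGGA'
]

p find_gaps(texts)
puts remove_gaps(texts)
```

Columns where at least one kept species still has a base, such as the `--` after position 11 that `canFam2` fills with `AG`, are left alone; only the run that was occupied solely by filtered-out species is removed.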
@@ -0,0 +1,32 @@
1
+ Given /^chromosome reference sequence:$/ do |string|
2
+ sio = StringIO.new(string)
3
+ @refseq = Bio::MAF::FASTARangeReader.new(sio)
4
+ end
5
+
6
+ When /^tile ([^:\s]+):(\d+)-(\d+)( with the chromosome reference)?$/ do |seq, i_start, i_end, ref_p|
7
+ @tiler = Bio::MAF::Tiler.new
8
+ @tiler.index = @idx
9
+ @tiler.parser = @parser
10
+ @tiler.reference = @refseq if ref_p
11
+ @tiler.interval = Bio::GenomicInterval.zero_based(seq,
12
+ i_start.to_i,
13
+ i_end.to_i)
14
+ end
15
+
16
+ When /^tile with species \[(.+?)\]$/ do |species_text|
17
+ @tiler.species = species_text.split(/,\s*/)
18
+ end
19
+
20
+ When /^map species (\S+) as (\S+)$/ do |sp1, sp2|
21
+ @tiler.species_map[sp1] = sp2
22
+ end
23
+
24
+ When /^write the tiled data as FASTA$/ do
25
+ @dst = Tempfile.new(["cuke", ".fa"])
26
+ @tiler.write_fasta(@dst)
27
+ end
28
+
29
+ Then /^the FASTA data obtained should be:$/ do |string|
30
+ @dst.seek(0)
31
+ @dst.read.rstrip.should == string.rstrip
32
+ end
@@ -0,0 +1,19 @@
1
+ Then /^the alignment block is marked as filtered$/ do
2
+ @block.filtered?.should be_true
3
+ end
4
+
5
+ Then /^(\d+) gaps? (?:is|are) found with length \[(\d+)\]$/ do |n_gaps, gap_sizes_s|
6
+ gaps = @block.find_gaps
7
+ gaps.size.should == n_gaps.to_i
8
+ e_gap_sizes = gap_sizes_s.split(/,\s*/).collect { |n| n.to_i }
9
+ gap_sizes = gaps.collect { |gap| gap[1] }
10
+ gap_sizes.should == e_gap_sizes
11
+ end
12
+
13
+ When /^gaps are removed$/ do
14
+ @block.remove_gaps!
15
+ end
16
+
17
+ Then /^the text size of the block is (\d+)$/ do |e_text_size|
18
+ @block.text_size.should == e_text_size.to_i
19
+ end
@@ -1,5 +1,6 @@
1
1
  When /^I open it with a MAF reader$/ do
2
- @parser = Bio::MAF::Parser.new(@src_f, @opts || {})
2
+ @opts ||= {}
3
+ @parser = Bio::MAF::Parser.new(@src_f, @opts)
3
4
  end
4
5
 
5
6
  When /^I enable the :(\S+) parser option$/ do |opt_s|