bio-gff3 0.6.0 → 0.8.0

data/Gemfile CHANGED
@@ -11,4 +11,5 @@ group :development do
  gem "jeweler", "~> 1.5.2"
  gem "rcov", ">= 0"
  gem "bio", ">= 1.4.1"
+ gem "rspec"
  end
@@ -2,6 +2,7 @@ GEM
  remote: http://rubygems.org/
  specs:
  bio (1.4.1)
+ diff-lcs (1.1.2)
  git (1.2.5)
  jeweler (1.5.2)
  bundler (~> 1.0.0)
@@ -9,6 +10,14 @@ GEM
  rake
  rake (0.8.7)
  rcov (0.9.9)
+ rspec (2.3.0)
+ rspec-core (~> 2.3.0)
+ rspec-expectations (~> 2.3.0)
+ rspec-mocks (~> 2.3.0)
+ rspec-core (2.3.1)
+ rspec-expectations (2.3.0)
+ diff-lcs (~> 1.1.2)
+ rspec-mocks (2.3.0)
  shoulda (2.11.3)

  PLATFORMS
@@ -19,4 +28,5 @@ DEPENDENCIES
  bundler (~> 1.0.0)
  jeweler (~> 1.5.2)
  rcov
+ rspec
  shoulda
@@ -1,19 +1,69 @@
  = bio-gff3

- Description goes here.
-
- == Contributing to bio-gff3
-
- * Check out the latest master to make sure the feature hasn't been implemented or the bug hasn't been fixed yet
- * Check out the issue tracker to make sure someone already hasn't requested it and/or contributed it
- * Fork the project
- * Start a feature/bugfix branch
- * Commit and push until you are happy with your contribution
- * Make sure to add tests for it. This is important so I don't break it in a future version unintentionally.
- * Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.
+ GFF3 plugin for BioRuby, aimed at parsing big data
+
+ Features:
+
+ # Take GFF (genome browser) information and digest mRNA and CDS sequences
+ # Options for low memory use and caching of records
+ # Support for external FASTA files
+
+ You can use this plugin in two ways. First as a standalone program, next as a
+ plugin library to BioRuby.
+
+ For example, fetch mRNA and CDS information from GFF3 files and output to FASTA:
+
+ ./bin/gff3-fetch mrna test/data/gff/test.gff3
+ ./bin/gff3-fetch cds test/data/gff/test.gff3
+
+ Or clone this repository and add the 'lib' dir to the Ruby search path and
+
+ require 'bio/db/gff/gffdb'
+
+ You can also run RSpec with something like
+
+ rspec -I ../bioruby/lib/ spec/*.rb
+
+ This implementation depends on BioRuby's basic GFF3 parser, with the possible
+ advantage that the plugin is faster and does not consume all memory. The Gff3
+ specs are based on the output of the Wormbase genome browser.
+
+ For a write-up see http://thebird.nl/bioruby/BioRuby_GFF3.html
+
+ -------------------------------------------------------------------------------
+
+
+ Fetch and assemble mRNAs, or CDS and print in FASTA format.
+
+ gff3-fetch [--no-cache] mRNA|CDS [filename.fa] filename.gff
+
+ Where:
+
+ --no-cache : do not load everything in memory (slower)
+ mRNA : assemble mRNA
+ CDS : assemble CDS
+
+ Multiple GFF3 files can be used. For external FASTA files, always the last
+ one before the GFF file is used.
+
+ Examples:
+
+ Find mRNA and CDS information from test.gff3 (which includes sequence information)
+
+ gff3-fetch mRNA test/data/gff/test.gff3
+ gff3-fetch CDS test/data/gff/test.gff3
+
+ Find CDS from external FASTA file
+
+ gff3-fetch CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3
+
+ Find mRNA from external FASTA file, without loading everything in RAM
+
+ gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+
+ If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475

  == Copyright

- Copyright (c) 2010 Pjotr Prins. See LICENSE.txt for
- further details.
+ Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>

data/VERSION CHANGED
@@ -1 +1 @@
- 0.6.0
+ 0.8.0
@@ -4,7 +4,7 @@
  # Copyright:: August 2010
  # License:: Ruby License
  #
- # Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
+ # Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>


  USAGE = <<EOM
@@ -14,7 +14,7 @@ USAGE = <<EOM

  Where:

- --no-cache : do not load everything in memory
+ --no-cache : do not load everything in memory (slower)
  mRNA : assemble mRNA
  CDS : assemble CDS

@@ -25,19 +25,22 @@ USAGE = <<EOM

  Find mRNA and CDS information from test.gff3 (which includes sequence information)

- ./bin/gff3-fetch mRNA test/data/gff/test.gff3
- ./bin/gff3-fetch CDS test/data/gff/test.gff3
+ gff3-fetch mRNA test/data/gff/test.gff3
+ gff3-fetch CDS test/data/gff/test.gff3

- Find CDS from exteranl FASTA file
+ Find CDS from external FASTA file

- ./bin/gff3-fetch cds test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3
+ gff3-fetch CDS test/data/gff/MhA1_Contig1133.fa test/data/gff/MhA1_Contig1133.gff3

  Find mRNA from external FASTA file, without loading everything in RAM

- ./bin/gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
+ gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3

  If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475

+ == Copyright
+
+ Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>

  EOM

@@ -45,9 +48,9 @@ rootpath = File.dirname(File.dirname(__FILE__))
  $: << rootpath+'/lib'
  $: << rootpath+'/../bioruby/lib'

- require 'bio/db/gff/gffdb'
+ require 'bio-gff3'

- $stderr.print "BioRuby GFF3 Plugin Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>\n\n"
+ $stderr.print "BioRuby GFF3 Plugin Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>\n\n"

  if ARGV.size == 0
  print USAGE
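The USAGE text above says several GFF3 files can be given and that an external FASTA file applies to the GFF file that follows it. The sketch below shows one way to pair the command-line arguments under that rule. It is hypothetical: `pair_inputs` is not part of gff3-fetch, and recognising FASTA files by a `.fa`/`.fasta` extension is an assumption.

```ruby
# Hypothetical sketch of the pairing rule described in USAGE:
#   gff3-fetch [--no-cache] mRNA|CDS [filename.fa] filename.gff
# Assumption: FASTA files are recognised by their .fa/.fasta extension.
def pair_inputs(argv)
  args = argv.dup
  cache = !args.delete('--no-cache')      # --no-cache switches caching off
  mode = args.shift                        # "mRNA" or "CDS"
  unless %w[mrna cds].include?(mode.to_s.downcase)
    raise ArgumentError, "expected mRNA or CDS, got #{mode.inspect}"
  end
  fasta = nil                              # last FASTA file seen so far
  pairs = []
  args.each do |fn|
    if fn =~ /\.(fa|fasta)\z/i
      fasta = fn                           # applies to the GFF files that follow
    else
      pairs << [fn, fasta]                 # nil: sequence expected inside the GFF3
    end
  end
  { :cache => cache, :mode => mode.downcase.to_sym, :pairs => pairs }
end

p pair_inputs(%w[--no-cache mRNA a.fa x.gff3 b.fa y.gff3 z.gff3])
```

Here `y.gff3` and `z.gff3` both pick up `b.fa`, the last FASTA file named before them.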
@@ -5,11 +5,11 @@

  Gem::Specification.new do |s|
  s.name = %q{bio-gff3}
- s.version = "0.6.0"
+ s.version = "0.8.0"

  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Pjotr Prins"]
- s.date = %q{2010-12-29}
+ s.date = %q{2010-12-31}
  s.default_executable = %q{gff3-fetch}
  s.description = %q{GFF3 (genome browser) information and digest mRNA and CDS sequences.
  Options for low memory use and caching of records.
@@ -19,14 +19,12 @@ Support for external FASTA files.
  s.executables = ["gff3-fetch"]
  s.extra_rdoc_files = [
  "LICENSE.txt",
- "README",
  "README.rdoc"
  ]
  s.files = [
  "Gemfile",
  "Gemfile.lock",
  "LICENSE.txt",
- "README",
  "README.rdoc",
  "Rakefile",
  "VERSION",
@@ -83,12 +81,14 @@ Support for external FASTA files.
  s.add_development_dependency(%q<jeweler>, ["~> 1.5.2"])
  s.add_development_dependency(%q<rcov>, [">= 0"])
  s.add_development_dependency(%q<bio>, [">= 1.4.1"])
+ s.add_development_dependency(%q<rspec>, [">= 0"])
  else
  s.add_dependency(%q<shoulda>, [">= 0"])
  s.add_dependency(%q<bundler>, ["~> 1.0.0"])
  s.add_dependency(%q<jeweler>, ["~> 1.5.2"])
  s.add_dependency(%q<rcov>, [">= 0"])
  s.add_dependency(%q<bio>, [">= 1.4.1"])
+ s.add_dependency(%q<rspec>, [">= 0"])
  end
  else
  s.add_dependency(%q<shoulda>, [">= 0"])
@@ -96,6 +96,7 @@ Support for external FASTA files.
  s.add_dependency(%q<jeweler>, ["~> 1.5.2"])
  s.add_dependency(%q<rcov>, [">= 0"])
  s.add_dependency(%q<bio>, [">= 1.4.1"])
+ s.add_dependency(%q<rspec>, [">= 0"])
  end
  end

@@ -0,0 +1 @@
+ require 'bio/db/gff/gffdb'
@@ -198,77 +198,76 @@ module Bio
  # to the landmark given in column 1 - in this case the sequence as it
  # is passed in. The following options are available:
  #
- # :phase : set phase (default true)
- # :reverse : do reverse if reverse is indicated (true)
- # :complement : do complement if reverse is indicated (true)
- # :trim : make sure sequence is multiple of 3 nucleotide bps (false)
+ # :reverse : do reverse if reverse is indicated (default true)
+ # :complement : do complement if reverse is indicated (default true)
+ # :phase : do set CDS phase (default false, normally ignore)
+ # :trim : make sure sequence is multiple of 3 nucleotide bps (default false)
  #
  # there are two special options:
  #
  # :raw : raw sequence (all above false)
- # :codonize : codon sequence (all above true)
+ # :codonize : codon sequence (reverse, complement and trim are true)
  #
- def assemble sequence, startpos, reclist, options = { :phase=>true, :reverse=>true, :trim=>false, :complement=>false }
+ def assemble sequence, startpos, reclist, options = { :phase=>false, :reverse=>true, :trim=>false, :complement=>true, :debug=>false }
+ do_debug = options[:debug]
  do_phase = options[:phase]
- do_reverse = options[:reverse]
- do_trim = options[:trim]
- do_complement = options[:complement]
+ do_reverse = (options[:reverse] == false ? false : true)
+ do_trim = (options[:trim] == false ? false : true)
+ do_complement = (options[:complement] == false ? false : true)
  if options[:raw]
  do_phase = false
  do_reverse = false
  do_trim = false
  do_complement = false
  elsif options[:codonize]
- do_phase = true
+ do_phase = false
  do_reverse = true
  do_trim = true
  do_complement = true
  end
- retval = ""
  sectionlist = Sections::sort(reclist)
- reverse = false
- # we assume strand is always the same
  rec0 = sectionlist.first.rec
- reverse = (rec0.strand == '-') if rec0.strand
- if reverse
- # fetch phase from the last feature when reversed
- rec0 = sectionlist.last.rec
- end
- frame = 0
- frame = rec0.frame if rec0.frame
- sectionlist.each do | section |
- if sequence.kind_of?(Bio::FastaFormat)
- sequence = sequence.seq
- end
- rec = section.rec
- seq = sequence[(rec.start-1)..(rec.end-1)]
- retval += seq
+ # we assume ORF is always read in the same direction
+ orf_reverse = (rec0.strand == '-')
+ orf_frame = startpos - 1
+ orf_frameshift = orf_frame % 3
+ sectionlist = sectionlist.reverse if orf_reverse
+ if do_debug
+ p "------------------"
+ p options
+ p [:reverse,do_reverse]
+ p [:complement,do_complement]
+ p [:trim,do_trim]
+ p [:orf_reverse, orf_reverse, rec0.strand]
  end
- seq = retval
- if do_reverse
- # if strand is negative, reverse
- seq = seq.reverse if reverse
+
+ if sequence.kind_of?(Bio::FastaFormat)
+ # BioRuby conversion
+ sequence = sequence.seq
  end
- if do_phase
- # For forward strand features, phase is counted from the start
- # field. For reverse strand features, phase is counted from the end
- # field.
- #
- # With a reverse protein coding string in Wormbase
- # the phase appears to be disregarded - or rather handled
- # by start-stop. This is a hack.
- if do_reverse and reverse and (seq.size % 3 == 0)
- # do nothing
- else
- seq = seq[frame..-1] if frame != 0 # set phase
+ # Generate array of sequences
+ seq = sectionlist.map { | section |
+ rec = section.rec
+ s = sequence[(section.begin-1)..(section.end-1)]
+ if do_reverse and orf_reverse
+ s = s.reverse
  end
- end
- if do_complement
- # if strand is negative, forward complement
- if reverse
- ntseq = Bio::Sequence::NA.new(seq)
- seq = ntseq.forward_complement.upcase
+ # Correct for phase. Unfortunately the use of phase is ambiguous.
+ # Here we check whether rec.start is in line with orf_frame. If it
+ # is, we correct for phase. Otherwise it is ignored.
+ if do_phase and rec.phase
+ phase = rec.phase.to_i
+ # if ((rec.start-startpos) % 3 == 0)
+ s = s[phase..-1]
+ # end
  end
+ s
+ }
+ # p seq
+ seq = seq.join
+ if do_complement and do_reverse and orf_reverse
+ ntseq = Bio::Sequence::NA.new(seq)
+ seq = ntseq.forward_complement.upcase
  end
  if do_trim
  reduce = seq.size % 3
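The comment block in this hunk documents the option defaults (reverse and complement on, phase and trim off) and the two overrides `:raw` and `:codonize`. The patch resolves flags with `options[:x] == false ? false : true`, so `:reverse`, `:complement` or `:trim` left out of a caller-supplied hash evaluates to true. A hypothetical helper that merges caller options over the documented defaults instead is sketched below; `resolve_assemble_options` and `ASSEMBLE_DEFAULTS` are illustrative names, not bio-gff3 API.

```ruby
# Hypothetical sketch: resolve assemble-style flags against the defaults
# documented in the comment block (not the bio-gff3 implementation).
ASSEMBLE_DEFAULTS = { :reverse => true, :complement => true,
                      :phase => false, :trim => false }.freeze

def resolve_assemble_options(options = {})
  opts = ASSEMBLE_DEFAULTS.merge(options)
  if opts[:raw]
    # :raw switches every transformation off
    opts.merge(:reverse => false, :complement => false, :phase => false, :trim => false)
  elsif opts[:codonize]
    # :codonize forces reverse, complement and trim on; phase stays off
    opts.merge(:reverse => true, :complement => true, :phase => false, :trim => true)
  else
    opts
  end
end

# Patch idiom: a key missing from a caller-supplied hash counts as true
caller_opts = { :phase => true }
p(caller_opts[:trim] == false ? false : true)      # true
p resolve_assemble_options(caller_opts)[:trim]     # false
```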
@@ -279,9 +278,10 @@ module Bio
  end

  # Patch a sequence together from a Sequence string and an array
- # of records and translate in the correct direction and frame
- def assembleAA sequence, startpos, rec
- seq = assemble(sequence, startpos, rec, :phase=>true, :reverse=>true, :complement=>true)
+ # of records and translate in the correct direction and frame. The options
+ # are the same as for +assemble+.
+ def assembleAA sequence, startpos, reclist, options = { :phase=>false, :reverse=>true, :trim=>false, :complement=>true }
+ seq = assemble(sequence, startpos, reclist, options)
  ntseq = Bio::Sequence::NA.new(seq)
  ntseq.translate
  end
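The rewritten `assemble` reverses the section list for '-' strand features, reverses each slice, optionally drops `phase` leading bases per slice, joins the slices, complements the joined string, and with `:trim` computes the surplus `seq.size % 3`. The dependency-free sketch below mirrors those steps under stated assumptions: segments are plain `[begin, end, phase]` triples rather than GFF records, complementing uses `String#tr` in place of BioRuby's `Bio::Sequence::NA#forward_complement`, and the surplus is removed from the 3' end because the hunk is cut off before the trim code. `assemble_sketch` is a hypothetical name.

```ruby
# Hypothetical, dependency-free sketch of the strand-aware assembly in the
# patch. Segments are [begin, end, phase] triples, 1-based and inclusive.
def assemble_sketch(sequence, segments, strand,
                    reverse: true, complement: true, phase: false, trim: false)
  minus = (strand == '-')
  sorted = segments.sort_by { |b, _e, _ph| b }
  sorted = sorted.reverse if minus            # start from the ORF's 5' end on '-'
  parts = sorted.map do |b, e, ph|
    s = sequence[(b - 1)..(e - 1)]            # 1-based inclusive -> Ruby range
    s = s.reverse if reverse && minus         # reverse each slice on '-'
    s = s[ph..-1] if phase && ph              # drop ph leading bases, after reversal
    s
  end
  seq = parts.join
  if complement && reverse && minus
    seq = seq.upcase.tr('ACGT', 'TGCA')       # complement the reversed string
  end
  seq = seq[0, seq.size - seq.size % 3] if trim  # assumption: trim at the 3' end
  seq
end

dna = "ACGTTTGCAAAC"
p assemble_sketch(dna, [[7, 10, 0], [1, 4, 0]], '+')   # "ACGTGCAA"
p assemble_sketch(dna, [[1, 4, 0], [7, 10, 0]], '-')   # "TTGCACGT" (reverse complement)
p assemble_sketch("ATGAAATTT", [[1, 9, 1]], '+', phase: true, trim: true)  # "TGAAAT"
```

Passing `complement: false` on the '-' strand returns the reversed but uncomplemented string, which is what the `:complement=>false` spec for `cds5` checks.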
@@ -58,7 +58,7 @@ module Bio
  end

  def fasta_rec header, buf
- fst = Bio::FastaFormat.new(header+"\n"+buf.to_s)
+ fst = Bio::FastaFormat.new(header+"\n"+buf.join(''))
  return fst.definition, fst
  end

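The `fasta_rec` change replaces `buf.to_s` with `buf.join('')`. Since Ruby 1.9, `Array#to_s` is an alias of `Array#inspect`, so it returns the bracketed, quoted representation instead of concatenating the elements as Ruby 1.8 did. Assuming `buf` holds the sequence lines of one record (the sample data below is illustrative), only `join` produces a usable FASTA body:

```ruby
# buf holds the sequence lines collected for one FASTA record (sample data)
buf = ["ACGTACGT\n", "GGCCTTAA\n"]
header = ">test01"

record_join = header + "\n" + buf.join('')   # lines concatenated
record_to_s = header + "\n" + buf.to_s       # Ruby >= 1.9: inspect form

puts record_join
puts record_to_s
```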
@@ -6,7 +6,7 @@
  #
  $: << "../lib"

- require 'bio/db/gff/gffdb'
+ require 'bio-gff3'

  include Bio::GFFbrowser

@@ -1,12 +1,12 @@
  # RSpec for BioRuby-GFF3-Plugin. Run with something like:
  #
- # ruby -I ../bioruby/lib/ ~/.gems/bin/spec spec/gff3_assemble3_spec.rb
+ # rspec -I ../bioruby/lib/ spec/gff3_assemble3_spec.rb
  #
- # Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
+ # Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
  #
  $: << "../lib"

- require 'bio/db/gff/gffdb'
+ require 'bio-gff3'

  include Bio::GFFbrowser

@@ -1,12 +1,12 @@
  # RSpec for BioRuby-GFF3-Plugin. Run with something like:
  #
- # ruby -I ../bioruby/lib/ ~/.gems/bin/spec spec/gff3_assemble_spec.rb
+ # rspec -I ../bioruby/lib/ spec/gff3_assemble_spec.rb
  #
- # Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
+ # Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
  #
  $: << "../lib"

- require 'bio/db/gff/gffdb'
+ require 'bio-gff3'

  include Bio::GFFbrowser

@@ -83,17 +83,20 @@ describe GFFdb, "Assemble CDS" do
  aaseq = @gff.assembleAA(@contigsequence,component.start,[cds0])
  aaseq.should == "MRPLTDEETEKFFKKLSNYIGDNIKLLLEREDGEYVFRLHKDRVYYC"
  end
+ # MhA1_Contig1133 WormBase CDS 8065 8308 . + 1 ID=cds:MhA1_Contig1133.frz3.gene4;Parent=transcript:MhA1_Contig1133.frz3.gene4
  it "should translate CDS 8065:8308 (in frame 1, + strand)" do
  recs = @cdslist['cds:MhA1_Contig1133.frz3.gene4']
  component = @componentlist['cds:MhA1_Contig1133.frz3.gene4']
  cds1 = recs[1]
- seq = @gff.assemble(@contigsequence,component.start,[cds1], :phase => false)
+ seq = @gff.assemble(@contigsequence,component.start,[cds1])
  seq.size.should == 244
  seq.should == "TGAAAAATTAATGCGACAAGCAGCATGTATTGGACGTAAACAATTGGGATCTTTTGGAACTTGTTTGGGTAAATTCACAAAAGGAGGGTCTTTCTTTCTTCATATAACATCATTGGATTATTTGGCACCTTATGCTTTAGCAAAAATTTGGTTAAAACCACAAGCTGAACAACAATTTTTATATGGAAATAATATTGTTAAATCTGGTGTTGGAAGAATGAGTGAAGGGATTGAAGAAAAACAA"
- seq = @gff.assemble(@contigsequence,component.start,[cds1])
+ seq = @gff.assemble(@contigsequence,component.start,[cds1],:phase => true)
+ seq.size.should == 243
  seq.should == "GAAAAATTAATGCGACAAGCAGCATGTATTGGACGTAAACAATTGGGATCTTTTGGAACTTGTTTGGGTAAATTCACAAAAGGAGGGTCTTTCTTTCTTCATATAACATCATTGGATTATTTGGCACCTTATGCTTTAGCAAAAATTTGGTTAAAACCACAAGCTGAACAACAATTTTTATATGGAAATAATATTGTTAAATCTGGTGTTGGAAGAATGAGTGAAGGGATTGAAGAAAAACAA"
- aaseq = @gff.assembleAA(@contigsequence,component.start,[cds1])
+ aaseq = @gff.assembleAA(@contigsequence,component.start,[cds1],:phase => true)
  # note it should handle the frame shift and direction!
+ # wormbase validated
  aaseq.should == "EKLMRQAACIGRKQLGSFGTCLGKFTKGGSFFLHITSLDYLAPYALAKIWLKPQAEQQFLYGNNIVKSGVGRMSEGIEEKQ"
  end
  it "should translate CDS3 (in frame 0, + strand)" do
@@ -114,7 +117,7 @@ describe GFFdb, "Assemble CDS" do
  seq.size.should == 543
  seq.should == "ATGCGTCCTTTAACAGATGAAGAAACTGAAAAGTTTTTCAAAAAACTTTCAAATTATATTGGTGACAATATTAAACTTTTATTGGAAAGAGAAGATGGAGAATATGTTTTTCGTTTACATAAAGACAGAGTTTATTATTGCAGTGAAAAATTAATGCGACAAGCAGCATGTATTGGACGTAAACAATTGGGATCTTTTGGAACTTGTTTGGGTAAATTCACAAAAGGAGGGTCTTTCTTTCTTCATATAACATCATTGGATTATTTGGCACCTTATGCTTTAGCAAAAATTTGGTTAAAACCACAAGCTGAACAACAATTTTTATATGGAAATAATATTGTTAAATCTGGTGTTGGAAGAATGAGTGAAGGGATTGAAGAAAAACAAGGTATTATTATTTATAATATGTCAGATTTACCATTGGGTTTTGGAGTGGCTGCAAAGGGAACATTATCTTGTAGAAAAGTAGATCCTACAGCTTTAGTTGTTTTACATCAATCAGATTTGGGTGAATATATTCGAAATGAAGAGGGATTAATTTAA"
  seq = @gff.assemble(@contigsequence,component.start,recs)
- seq.size.should == 543
+ seq.size.should == 543 # auto correct for phase problem
  seq.should == "ATGCGTCCTTTAACAGATGAAGAAACTGAAAAGTTTTTCAAAAAACTTTCAAATTATATTGGTGACAATATTAAACTTTTATTGGAAAGAGAAGATGGAGAATATGTTTTTCGTTTACATAAAGACAGAGTTTATTATTGCAGTGAAAAATTAATGCGACAAGCAGCATGTATTGGACGTAAACAATTGGGATCTTTTGGAACTTGTTTGGGTAAATTCACAAAAGGAGGGTCTTTCTTTCTTCATATAACATCATTGGATTATTTGGCACCTTATGCTTTAGCAAAAATTTGGTTAAAACCACAAGCTGAACAACAATTTTTATATGGAAATAATATTGTTAAATCTGGTGTTGGAAGAATGAGTGAAGGGATTGAAGAAAAACAAGGTATTATTATTTATAATATGTCAGATTTACCATTGGGTTTTGGAGTGGCTGCAAAGGGAACATTATCTTGTAGAAAAGTAGATCCTACAGCTTTAGTTGTTTTACATCAATCAGATTTGGGTGAATATATTCGAAATGAAGAGGGATTAATTTAA"
  aaseq = @gff.assembleAA(@contigsequence,component.start,recs)
  aaseq.should == "MRPLTDEETEKFFKKLSNYIGDNIKLLLEREDGEYVFRLHKDRVYYCSEKLMRQAACIGRKQLGSFGTCLGKFTKGGSFFLHITSLDYLAPYALAKIWLKPQAEQQFLYGNNIVKSGVGRMSEGIEEKQGIIIYNMSDLPLGFGVAAKGTLSCRKVDPTALVVLHQSDLGEYIRNEEGLI*"
@@ -161,17 +164,17 @@ describe GFFdb, "Assemble CDS" do
  # tctttgtgcttccaaacgagctaatgacattccactacgatctcgcaatgattgtcgtct
  # aattgcacctctagctgagaaaggattttctaatgttgaaggtggttgttgaggagattc
  # aaacttttttctt
- cds1 = recs[5]
- cds1.start.should == 27981
- cds1.frame.should == 1
- cds1.strand.should == '-'
- seq = @gff.assemble(@contigsequence,component.start,[cds1],:phase=>true,:reverse=>true)
+ cds5 = recs[5]
+ cds5.start.should == 27981
+ cds5.frame.should == 1
+ cds5.strand.should == '-'
+ seq = @gff.assemble(@contigsequence,component.start,[cds5],:phase=>true,:complement=>false)
  seq.should == "TCTTTTTTCAAACTTAGAGGAGTTGTTGGTGGAAGTTGTAATCTTTTAGGAAAGAGTCGATCTCCACGTTAATCTGCTGTTAGTAACGCTCTAGCATCACCTTACAGTAATCGAGCAAACCTTCGTGTTTCTCTCCCAAGACTGGAATAATCTTCAATATTATCATTTCTTCTGGAAAGAAGATTATGTCGC"
  seq.size.should == 192
- seq = @gff.assemble(@contigsequence,component.start,[cds1],:phase=>true,:reverse=>true,:complement=>true)
+ seq = @gff.assemble(@contigsequence,component.start,[cds5],:phase=>true,:reverse=>true,:complement=>true)
  seq.should == "AGAAAAAAGTTTGAATCTCCTCAACAACCACCTTCAACATTAGAAAATCCTTTCTCAGCTAGAGGTGCAATTAGACGACAATCATTGCGAGATCGTAGTGGAATGTCATTAGCTCGTTTGGAAGCACAAAGAGAGGGTTCTGACCTTATTAGAAGTTATAATAGTAAAGAAGACCTTTCTTCTAATACAGCG"
  seq.size.should == 192
- aaseq = @gff.assembleAA(@contigsequence,component.start,[cds1])
+ aaseq = @gff.assembleAA(@contigsequence,component.start,[cds5],:phase=>true)
  # note it should handle the frame shift and direction!
  # >EMBOSS_001_4
  # RKKFESPQQPPSTLENPFSARGAIRRQSLRDRSGMSLARLEAQREGSDLIRSYNSKEDLSSNTA
@@ -190,9 +193,9 @@ describe GFFdb, "Assemble CDS" do
  cds2.start.should == 27981
  cds2.frame.should == 1
  cds2.strand.should == '-'
- seq = @gff.assemble(@contigsequence,component.start,[cds2],:complement=>true)
- seq.should == "GCGACATAATCTTCTTTCCAGAAGAAATGATAATATTGAAGATTATTCCAGTCTTGGGAGAGAAACACGAAGGTTTGCTCGATTACTGTAAGGTGATGCTAGAGCGTTACTAACAGCAGATTAACGTGGAGATCGACTCTTTCCTAAAAGATTACAACTTCCACCAACAACTCCTCTAAGTTTGAAAAAAGAA"
- aaseq = @gff.assembleAA(@contigsequence,component.start,[cds2])
+ seq = @gff.assemble(@contigsequence,component.start,[cds2],:reverse=>false,:complement=>true)
+ seq.should == "CGCTGTATTAGAAGAAAGGTCTTCTTTACTATTATAACTTCTAATAAGGTCAGAACCCTCTCTTTGTGCTTCCAAACGAGCTAATGACATTCCACTACGATCTCGCAATGATTGTCGTCTAATTGCACCTCTAGCTGAGAAAGGATTTTCTAATGTTGAAGGTGGTTGTTGAGGAGATTCAAACTTTTTTCT"
+ aaseq = @gff.assembleAA(@contigsequence,component.start,[cds2],:phase=>true)
  # note it should handle the frame shift and direction!
  # >27981..28173_4 RKKFESPQQPPSTLENPFSARGAIRRQSLRDRSGMSLARLEAQREGSDLIRSYNSKEDLSSNTA
  aaseq.should == "RKKFESPQQPPSTLENPFSARGAIRRQSLRDRSGMSLARLEAQREGSDLIRSYNSKEDLSSNTA"
@@ -222,17 +225,17 @@ describe GFFdb, "Assemble CDS" do
  cds2.strand.should == '-'
  seq = @gff.assemble(@contigsequence,component.start,[cds2], :raw=>true)
  seq.should == "ATAAATTTCCCTTTCTCCAGAAAAACTTACAAAAGTAGATTTATCAACAGAATTTCTTTGATCTAAAGGTAATCCTCTTTGATGTAAAATTTTCATATCATTTAACATTTCCCTTTCTGGTTGTTGTCTTCTTTCATCAATCATTTCTTGTGTAATTCCTCTAGCAGCCATTTCAGATTCAATAAGGTCAAGGGTTTGTTCATCATCACAAATATCATAAGGCATATTACCATCTGCATTTACTGCTAGTAAATCTGCGTTG"
- seq = @gff.assemble(@contigsequence,component.start,[cds2], :codonize=>true)
+ seq = @gff.assemble(@contigsequence,component.start,[cds2], :phase=>true)
  seq.should == "AACGCAGATTTACTAGCAGTAAATGCAGATGGTAATATGCCTTATGATATTTGTGATGATGAACAAACCCTTGACCTTATTGAATCTGAAATGGCTGCTAGAGGAATTACACAAGAAATGATTGATGAAAGAAGACAACAACCAGAAAGGGAAATGTTAAATGATATGAAAATTTTACATCAAAGAGGATTACCTTTAGATCAAAGAAATTCTGTTGATAAATCTACTTTTGTAAGTTTTTCTGGAGAAAGGGAAATTTAT"
  # cds1.frame = 1
- aaseq = @gff.assembleAA(@contigsequence,component.start,[cds2])
+ aaseq = @gff.assembleAA(@contigsequence,component.start,[cds2],:phase=>true)
  # note it should handle the frame shift and direction!
  aaseq.should == "NADLLAVNADGNMPYDICDDEQTLDLIESEMAARGITQEMIDERRQQPEREMLNDMKILHQRGLPLDQRNSVDKSTFVSFSGEREIY"
  end
  it "should assemble the protein sequence for MhA1_Contig1133.frz3.gene11" do
  recs = @cdslist['cds:MhA1_Contig1133.frz3.gene11']
  component = @componentlist['cds:MhA1_Contig1133.frz3.gene11']
- seq = @gff.assemble(@contigsequence,component.start,recs, :phase=>true, :reverse=>true, :complement=>true)
+ seq = @gff.assemble(@contigsequence,component.start,recs, :reverse=>true, :complement=>true)
  seq.should == "ATGGACCATCATGCATTGGTGGAGGAATTACCAGAAATTGAAAAATTAACTCCTCAAGAACGTATTGCATTAGCTAGAGAACGCCGTGCTGAACAACTTCGACAGAATGCTGCACGGGAGGCTCAATTGCCAATGCCTGCACAGCGCCGGCCTCGTCTTCGATTTACACCAGATGTTGCTTTACTTGAGGCAACATGTGCCATTGACAATAATGAAAGAATTGTTCGTCTTCTGCTTAGGTACGGAGCTTGTGTTAATGCCAAAGACACTGAACTTTGGACACCATTGCACGCAGCTGCATGTTGTGCTTATATTGATATTGTTCGATTGCTTATTGCACACAACGCAGATTTACTAGCAGTAAATGCAGATGGTAATATGCCTTATGATATTTGTGATGATGAACAAACCCTTGACCTTATTGAATCTGAAATGGCTGCTAGAGGAATTACACAAGAAATGATTGATGAAAGAAGACAACAACCAGAAAGGGAAATGTTAAATGATATGAAAATTTTACATCAAAGAGGATTACCTTTAGATCAAAGAAATTCTGTTGATAAATCTACTTTTGTAAGTTTTTCTGGAGAAAGGGAAATTTATTTACATATAGCAGCAGCTAATGGTTATTATGATGTTGCTGCTTTCCTTCTTCGTTGTAATGTTTCTCCAGCATTGAGAGATATAGATTTGTGGCAACCAATTCATGCAGCTGCTTCTTGGAATCAACCAGACTTAATCGAGCTTTTATGCGAATATGGGGCTGATATAAATGCAAAAACTGGAGCTGGGGAAAGCCCTTTAGAATTAACTGAAGATGAACCAACCCAACAAGTAATTAGAACAATCGCTCAGACAGAAGCAAGGAGACGGCGTGGTCCAGGTGGTGGTTACTTTGGTGTTCGTGATTCTCGACGACAAAGCCGAAAAAGAAAAAAGTTTGAATCTCCTCAACAACCACCTTCAACATTAGAAAATCCTTTCTCAGCTAGAGGTGCAATTAGACGACAATCATTGCGAGATCGTAGTGGAATGTCATTAGCTCGTTTGGAAGCACAAAGAGAGGGTTCTGACCTTATTAGAAGTTATAATAGTAAAGAAGACCTTTCTTCTAATACAGCGGATGATTCTTTAAATGTTGGAAGTTCTTCATATCTCAACAATCCAACAGCCTCGGCTAGTGCTTCCTCTTCAGCATTACACGGAACTCCACATCAACAACAACGTCGTGAATCTCCACCTAAACGTGCATTAATGGCTAGAAGTGCTTCTCATCAAAAACAAAAACAACAAATGTCTCCAGATGAATGGCTGAAAAAATTAGAAGCAGATTCTGCAGGTTTTCGAGATAATGATGGAGAAGATGGTGAATTACAATCTGAACTTAAAGGAGGACAAAGAATGAAGAGTGGTGGTGGTGGAGGAGCGAGAGGTCAGCAAGAAATGAATGGTGGTCCAACAGCAACATTTGGTGGAGCTTCAAAACAACAATTAGCAATGGGCTCTGGACCCAATAGACGGCGCAAACAAGGATGTTGCTCTGTTTTGTGA"
  aaseq = @gff.assembleAA(@contigsequence,component.start,recs)
  aaseq.should == "MDHHALVEELPEIEKLTPQERIALARERRAEQLRQNAAREAQLPMPAQRRPRLRFTPDVALLEATCAIDNNERIVRLLLRYGACVNAKDTELWTPLHAAACCAYIDIVRLLIAHNADLLAVNADGNMPYDICDDEQTLDLIESEMAARGITQEMIDERRQQPEREMLNDMKILHQRGLPLDQRNSVDKSTFVSFSGEREIYLHIAAANGYYDVAAFLLRCNVSPALRDIDLWQPIHAAASWNQPDLIELLCEYGADINAKTGAGESPLELTEDEPTQQVIRTIAQTEARRRRGPGGGYFGVRDSRRQSRKRKKFESPQQPPSTLENPFSARGAIRRQSLRDRSGMSLARLEAQREGSDLIRSYNSKEDLSSNTADDSLNVGSSSYLNNPTASASASSSALHGTPHQQQRRESPPKRALMARSASHQKQKQQMSPDEWLKKLEADSAGFRDNDGEDGELQSELKGGQRMKSGGGGGARGQQEMNGGPTATFGGASKQQLAMGSGPNRRRKQGCCSVL*"
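The spec for CDS 8065:8308 (frame 1, + strand) expects 244 bases from a plain `assemble`, 243 with `:phase => true`, and an 81-residue translation. The arithmetic behind those numbers follows from GFF3's 1-based inclusive coordinates and the phase of 1 in the WormBase record quoted in that spec:

```ruby
# Numbers from the WormBase record quoted in the spec:
# MhA1_Contig1133 WormBase CDS 8065 8308 . + 1
cds_start, cds_end, phase = 8065, 8308, 1

cds_len     = cds_end - cds_start + 1   # 1-based, inclusive coordinates
after_phase = cds_len - phase           # :phase => true drops `phase` leading bases
residues    = after_phase / 3           # whole codons, one residue each

puts cds_len      # 244
puts after_phase  # 243
puts residues     # 81
```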
@@ -1,12 +1,12 @@
  # RSpec for BioRuby-GFF3-Plugin. Run with something like:
  #
- # ruby -I ../bioruby/lib/ ~/.gems/bin/spec spec/gff3_spec.rb
+ # rspec -I ../bioruby/lib/ spec/gff3_fileiterator_spec.rb
  #
  # Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
  #
  $: << "../lib"

- require 'bio/db/gff/gffdb'
+ require 'bio-gff3'

  TEST1='test/data/gff/test.gff3'
  TEST2='test/data/gff/standard.gff3'
@@ -1,13 +1,13 @@
  # RSpec for BioRuby-GFF3-Plugin. Run with something like:
  #
- # ruby -I ../bioruby/lib/ ~/.gems/bin/spec spec/gffdb_spec.rb
+ # rspec -I ../bioruby/lib/ spec/gffdb_spec.rb
  #
  # Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
  #
  $: << "../lib"


- require 'bio/db/gff/gffdb'
+ require 'bio-gff3'

  include Bio::GFFbrowser

@@ -64,11 +64,11 @@ AATGGGTACTGCACCCCTCGTCCTGTAGAGACGTCACAGCCAACGTGCCTTCTTATCTTGATACATTAGT
  GCCCAAGAATGCGATCCCAGAAGTCTTGGTTCTAAAGTCGTCGGAAAGATTTGAGGAACTGCCATACAGC
  CCGTGGGTGAAACTGTCGACATCCATTGTGCGAATAGGCCTGCTAGTGAC
  >test02
- ACGAAGATTTGTATGACTGATTTATCCTGGACAGGCATTGGTCAGATGTCTCCTTCCGTATCGTCGTTTA
+ ACGACGAAGATTTGTATGACTGATTTATCCTGGACAGGCATTGGTCAGATGTCTCCTTCCGTATCGTCGTTTA
  GTTGCAAATCCGAGTGTTCGGGGGTATTGCTATTTGCCACCTAGAAGCGCAACATGCCCAGCTTCACACA
  CCATAGCGAACACGCCGCCCCGGTGGCGACTATCGGTCGAAGTTAAGACAATTCATGGGCGAAACGAGAT
  AATGGGTACTGCACCCCTCGTCCTGTAGAGACGTCACAGCCAACGTGCCTTCTTATCTTGATACATTAGT
  GCCCAAGAATGCGATCCCAGAAGTCTTGGTTCTAAAGTCGTCGGAAAGATTTGAGGAACTGCCATACAGC
- CCGTGGGTGAAACTGTCGACATCCATTGTGCGAATAGGCCTGCTAGTGAC
+ CCGTGGGTGAAACTGTCGACATCCATTGTGCGAATAGGCCTGCTAGTGACAAAAAA


metadata CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
  prerelease: false
  segments:
  - 0
- - 6
+ - 8
  - 0
- version: 0.6.0
+ version: 0.8.0
  platform: ruby
  authors:
  - Pjotr Prins
@@ -14,7 +14,7 @@ autorequire:
  bindir: bin
  cert_chain: []

- date: 2010-12-29 00:00:00 +01:00
+ date: 2010-12-31 00:00:00 +01:00
  default_executable: gff3-fetch
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -88,6 +88,19 @@ dependencies:
  type: :development
  prerelease: false
  version_requirements: *id005
+ - !ruby/object:Gem::Dependency
+ name: rspec
+ requirement: &id006 !ruby/object:Gem::Requirement
+ none: false
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ segments:
+ - 0
+ version: "0"
+ type: :development
+ prerelease: false
+ version_requirements: *id006
  description: |
  GFF3 (genome browser) information and digest mRNA and CDS sequences.
  Options for low memory use and caching of records.
@@ -100,13 +113,11 @@ extensions: []

  extra_rdoc_files:
  - LICENSE.txt
- - README
  - README.rdoc
  files:
  - Gemfile
  - Gemfile.lock
  - LICENSE.txt
- - README
  - README.rdoc
  - Rakefile
  - VERSION
@@ -151,7 +162,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- hash: -266764915
+ hash: -1033924243
  segments:
  - 0
  version: "0"
data/README DELETED
@@ -1,65 +0,0 @@
- = GFF3 plugin for BioRuby, aimed at parsing big data =
-
- Features:
-
- # Take GFF (genome browser) information and digest mRNA and CDS sequences
- # Options for low memory use and caching of records
- # Support for external FASTA files
-
- You can use this plugin in two ways. First as a standalone program, next as a
- plugin library to BioRuby.
-
- For example, fetch mRNA and CDS information from GFF3 files and output to FASTA:
-
- ./bin/gff3-fetch mrna test/data/gff/test.gff3
- ./bin/gff3-fetch cds test/data/gff/test.gff3
-
- Or clone this repository and add the 'lib' dir to the Ruby search path and
-
- require 'bio/db/gff/gffdb'
-
- You can also run RSpec with something like
-
- ruby -I ../bioruby/lib/ ~/.gems/bin/spec spec/gffdb_spec.rb
-
- This implementation depends on BioRuby's basic GFF3 parser, with the possible
- advantage that the plugin is faster and does not consume all memory. The Gff3
- specs are based on the output of the Wormbase genome browser.
-
- For a write-up see http://thebird.nl/bioruby/BioRuby_GFF3.html
-
- Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
-
- -------------------------------------------------------------------------------
-
- Usage:
-
- BioRuby GFF3 Plugin Copyright (C) 2010 Pjotr Prins <pjotr.prins@thebird.nl>
-
- Fetch and assemble mRNAs, or CDS and print in FASTA format.
-
- gff3-fetch [--no-cache] mRNA|CDS [filename.fa] filename.gff
-
- Where:
-
- --no-cache : do not load everything in memory
- mRNA : assemble mRNA
- CDS : assemble CDS
-
- Multiple GFF3 files can be used. For external FASTA files, always the last
- one before the GFF file is used.
-
- Examples:
-
- Find mRNA and CDS information from test.gff3 (which includes sequence information)
-
- ./bin/gff3-fetch mRNA test/data/gff/test.gff3
- ./bin/gff3-fetch CDS test/data/gff/test.gff3
-
- Find mRNA from external FASTA file, without loading everythin in RAM
-
- ./bin/gff3-fetch --no-cache mRNA test/data/gff/test-ext-fasta.fa test/data/gff/test-ext-fasta.gff3
-
- If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475
-
-