RubyGems - bio-bigbio - Versions diffs - 0.1.4 → 0.1.5 - Mend

bio-bigbio 0.1.4 → 0.1.5


Files changed (16)

data/.travis.yml +12 -0
data/LICENSE.txt +20 -0
data/README.md +147 -15
data/Rakefile +1 -0
data/VERSION +1 -1
data/bin/fasta_filter.rb +100 -0
data/bin/fasta_sort.rb +24 -0
data/bin/getorf +4 -8
data/bin/nt2aa.rb +3 -6
data/bio-bigbio.gemspec +9 -5
data/lib/bigbio/db/fasta/fastareader.rb +35 -0
data/lib/bigbio/db/fasta/fastarecord.rb +7 -1
data/lib/bigbio/db/phylip.rb +49 -0
data/spec/emitter_spec.rb +17 -0
metadata +23 -17
data/LICENSE +0 -34

data/.travis.yml ADDED

@@ -0,0 +1,12 @@
+language: ruby
+rvm:
+  - 1.9.2
+#  - 1.9.3
+#  - 1.8.7
+#  - jruby-19mode # JRuby in 1.9 mode
+#  - rbx-19mode
+#  - jruby-18mode # JRuby in 1.8 mode
+#  - rbx-18mode
+# uncomment this line if your project needs to run something other than `rake`:
+# script: bundle exec rspec spec

data/LICENSE.txt ADDED

@@ -0,0 +1,20 @@
+Copyright (c) 2011-2013 Pjotr Prins
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.md CHANGED

@@ -8,31 +8,119 @@ computing in biology.
 BigBio may use BioLib C/C++/D functions for increasing performance and
 reducing memory consumption.
-This is an experimental project. If you wish to contribute subscribe
-to the BioRuby and/or BioLib mailing lists.
+In a way, this is an experimental project. I use it for
+experimentation, but what is in here should work fine. If you wish to
+contribute, subscribe to the BioRuby and/or BioLib mailing
+lists.
 # Overview
 * BigBio can translate nucleotide sequences to amino acid
   sequences using an EMBOSS C function, or BioRuby's translator.
+* BigBio has a terrific FASTA file emitter which iterates over FASTA files
+  and sequences without loading everything into memory. There is
+  also an indexed edition.
+* BigBio has a flexible FASTA filter.
 * BigBio has an ORF emitter which parses DNA/RNA sequences and emits
   ORFs between START_STOP or STOP_STOP codons.
-* BigBio has a FASTA file emitter, with iterates FASTA files and
-  iterates sequences without loading everything in memory.
+* BigBio has a Phylip (PAML style) emitter and writer
-# Examples
+# Installation
+The easy way
+```sh
+gem install bio-bigbio
+```
+in your code
+```ruby
+require 'bigbio'
+```
+# Command line tools
+Some functionality comes also as executable command line tools (see the
+./bin directory). Use the -h switch to get information. Current tools
+are
+1. getorf: fetch all areas between start-stop and stop-stop codons in six frames (using EMBOSS when biolib is available)
+2. nt2aa.rb: translate in six frames (using EMBOSS when biolib is available)
+3. fasta_filter.rb
+## Command line Fasta Filter
+The CLI filter accepts standard Ruby commands.
+Filter sequences that contain more than 25% C's
+```sh
+fasta_filter.rb --filter "rec.seq.count('C') > rec.seq.size*0.25" test/data/fasta/nt.fa
+```
+Look for IDs containing -126 and sequences ending on CCC
+```sh
+fasta_filter.rb --filter "rec.id =~ /-126/ or rec.seq =~ /CCC$/" test/data/fasta/nt.fa
+```
+Filter out all masked sequences that contain more than 10% masked
+nucleotides
+```sh
+fasta_filter.rb --filter "rec.seq.count('N')<rec.seq.size*0.10"
+```
+Next to rec.id and rec.seq, you have rec.descr and 'num' as variables,
+so to skip every other record
+```sh
+fasta_filter.rb --filter "num % 2 == 0"
+```
+To rewrite all sequences to lower case, you can use the rewrite
+option
+```sh
+fasta_filter.rb --rewrite 'rec.seq = rec.seq.downcase'
+```
+Filters and rewrites can be combined. The rest is up to your imagination!
+# API Examples
 ## Iterate through a FASTA file
 Read a file without loading the whole thing in memory
 ```ruby
+require 'bigbio'
 fasta = FastaReader.new(fn)
 fasta.each do | rec |
   print rec.descr,rec.seq
 end
 ```
+Since FastaReader parses the ID, you can write a tab-delimited file with ID and sequence
+```ruby
+i = 1
+print "num\tid\tseq\n"
+FastaReader.new(fn).each do | rec |
+  if rec.id =~ /(AT\w+)/
+    print i,"\t",$1,"\t",rec.seq,"\n"
+    i += 1
+  end
+end
+```
+which, for example, can be turned into RDF with the
+[bio-table](https://github.com/pjotrp/bioruby-table) biogem.
+## Write a FASTA file
 Write a FASTA file. The simple way
 ```ruby
@@ -60,6 +148,44 @@ fasta = FastaWriter.new(fn)
 fasta.write(mysequence)
 ```
+## Transform a FASTA file
+You can combine above FastaReader and FastaWriter to transform
+sequences, e.g.
+```ruby
+fasta = FastaWriter.new(out_fn)
+FastaReader.new(in_fn).each do | rec |
+  # Strip the description down to the second ID
+  id1, id2 = /(\S+)\s+(\S+)/.match(rec.descr).captures
+  fasta.write(id2,rec.seq)
+end
+```
+The downside to this approach is the explicit file naming. What if you
+want to use STDIN or some other source instead? I have come round to
+the idea of using a combination of lambda and block. For example:
+```ruby
+  FastaReader::emit_fastarecord(-> {gets}) { |rec|
+    print FastaWriter.to_fasta(rec)
+  }
+```
+which takes STDIN line by line, and outputs FASTA on STDOUT. This is
+a better design as the FastaReader and FastaWriter know nothing of
+the mechanism fetching and displaying data. These can both be 'pure'
+functions. Note also that the data is never fully loaded into RAM.
+Here is the transformer in functional style
+```ruby
+  FastaReader::emit_fastarecord(-> {gets}) { |rec|
+    id1, id2 = /(\S+)\s+(\S+)/.match(rec.descr).captures
+    print FastaWriter.to_fasta(id2,rec.seq)
+  }
+```
 ## Fetch ORFs from a sequence
 BigBio can parse a sequence for ORFs. Together with the FastaReader
@@ -83,21 +209,27 @@ translate = Nucleotide::Translate.new(trn_table)
 aa_frames = translate.aa_6_frames("ATCATTAGCAACACCAGCTTCCTCTCTCTCGCTTCAAAGTTCACTACTCGTGGATCTCGT")
 ```
-# Install
+# Project home page
-The easy way
+For information on the source tree, documentation, examples, issues and
+how to contribute, see
-```sh
-gem install bio-bigbio
-```
+  http://github.com/pjotrp/bigbio
-in your code
+The BioRuby community is on IRC server: irc.freenode.org, channel: #bioruby.
-```ruby
-require 'bigbio'
-```
+# Cite
+If you use this software, please cite one of
+* [BioRuby: bioinformatics software for the Ruby programming language](http://dx.doi.org/10.1093/bioinformatics/btq475)
+* [Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics](http://dx.doi.org/10.1093/bioinformatics/bts080)
+# Biogems.info
+This Biogem is published at [#bio-table](http://biogems.info/index.html)
 # Copyright
-Copyright (c) 2011-2012 Pjotr Prins. See LICENSE for further details.
+Copyright (c) 2011-2013 Pjotr Prins. See LICENSE for further details.
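
Note on the README transform examples: assigning the result of `Regexp#match` directly to `(id1,id2)` does not hand back the two capture groups, whereas `MatchData#captures` returns exactly those. A minimal sketch in plain Ruby, assuming a made-up description line and a hypothetical helper name `split_ids` (neither comes from the gem):

```ruby
# Hypothetical helper, not part of bio-bigbio: return the first two
# whitespace-separated tokens of a FASTA description, or nil when the
# description has fewer than two tokens.
def split_ids(descr)
  m = /(\S+)\s+(\S+)/.match(descr)
  return nil unless m
  m.captures   # only the groups, without the whole-match element of to_a
end

# Made-up description line, for illustration only
id1, id2 = split_ids("gi|12345 AT1G01010 hypothetical protein")
puts id2
```

Returning `nil` for a description without a second token lets the caller decide whether to skip the record or fall back to the first ID.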

data/Rakefile CHANGED

@@ -37,6 +37,7 @@ RSpec::Core::RakeTask.new(:rcov) do |spec|
   spec.rcov = true
 end
+task :test => :spec
 task :default => :spec
 require 'rake/rdoctask'

data/VERSION CHANGED

@@ -1 +1 @@
-0.1.4
+0.1.5

data/bin/fasta_filter.rb ADDED

@@ -0,0 +1,100 @@
+#! /usr/bin/env ruby
+#
+# Filter for FASTA files
+#
+$: << File.dirname(__FILE__)+'/../lib'
+require 'bigbio'
+require 'optparse'
+require 'ostruct'
+class OptParser
+  #
+  # Return a structure describing the options.
+  #
+  def self.parse(args)
+    # The options specified on the command line will be collected in *options*.
+    # We set default values here.
+    options = OpenStruct.new
+    options.codonize = false
+    options.verbose = false
+    opt_parser = OptionParser.new do |opts|
+      opts.banner = "Usage: fasta_filter.rb [options]"
+      opts.separator ""
+      opts.separator "Specific options:"
+      opts.on("--filter expression","Filter on Ruby expression") do |expr|
+        options.filter = expr
+      end
+      opts.on("--rewrite expression","Rewrite expression") do |expr|
+        options.rewrite = expr
+      end
+      opts.on("--codonize",
+              "Trim sequence to be at multiple of 3 nucleotides") do |b|
+        options.codonize = b
+      end
+      opts.on("--min size",
+              "Set minimum sequence size") do |min|
+        options.min = min.to_i
+      end
+      opts.on("--id","Write out ID only") do |b|
+        options.id = b
+      end
+      opts.on("-v", "--[no-]verbose", "Run verbosely") do |v|
+        options.verbose = v
+      end
+      opts.separator ""
+      opts.separator "Examples:"
+      opts.separator ""
+      opts.separator "  fasta_filter.rb --filter \"rec.id =~ /-126/ or rec.seq =~ /CCC$/\" test/data/fasta/nt.fa"
+      opts.separator "  fasta_filter.rb --filter \"rec.seq.count('C') > rec.seq.size*0.25\" test/data/fasta/nt.fa"
+      opts.separator "  fasta_filter.rb --filter \"rec.descr =~ /C. elegans/\" test/data/fasta/nt.fa"
+      opts.separator "  fasta_filter.rb --filter \"num % 2 == 0\" test/data/fasta/nt.fa"
+      opts.separator "  fasta_filter.rb test/data/fasta/nt.fa --rewrite 'rec.seq.downcase!'"
+      opts.separator ""
+      opts.separator "Other options:"
+      opts.separator ""
+      opts.on_tail("-h", "--help", "Show this message") do
+        puts opts
+        exit
+      end
+    end
+    opt_parser.parse!(args)
+    options
+  end  # parse()
+end  # class OptParser
+options = OptParser.parse(ARGV)
+num = -1
+FastaReader::emit_fastarecord(-> { ARGF.gets }) { | rec |
+  num += 1
+  # --- Filtering
+  next if options.filter and not eval(options.filter)
+  if options.codonize
+    # --- Round sequence to nearest 3 nucleotides
+    size = rec.seq.size
+    rec.seq = rec.seq[0..size - (size % 3) - 1]
+  end
+  # --- Only use sequences from MIN size
+  next if options.min and rec.seq.size < options.min
+  # --- Truncate description to ID
+  rec.descr = rec.id if options.id
+  # --- rewrite
+  eval(options.rewrite) if options.rewrite
+  print rec.to_fasta
+}
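
Two details of the filter script are easier to see in isolation: how `eval` sees `rec` and `num`, and what the `--codonize` slice does to sequences shorter than three nucleotides. The sketch below is self-contained and makes several assumptions: `Rec` is a stand-in `Struct` for the gem's `FastaRecord`, and the helper names `keep?` and `codonize` are hypothetical.

```ruby
# Stand-in for the gem's FastaRecord (assumed fields: id, descr, seq)
Rec = Struct.new(:id, :descr, :seq)

# Evaluate a user-supplied Ruby expression with rec and num in scope,
# much as the script does with eval. Only suitable for trusted input,
# because the expression can run arbitrary code.
def keep?(expr, rec, num)
  eval(expr, binding) ? true : false
end

# Trim a sequence to a whole number of codons. Slicing with
# (start, length) avoids the negative end index that the range
# 0..size-(size%3)-1 produces when size is 1 or 2.
def codonize(seq)
  seq[0, seq.size - (seq.size % 3)]
end

rec = Rec.new("a", "a example", "CCATG")
puts keep?("rec.seq.count('C') > rec.seq.size*0.25", rec, 0)
puts codonize(rec.seq)
```

With the range form used in the script, a two-nucleotide sequence such as `"AC"` evaluates to `"AC"[0..-1]` and comes back unchanged. The length form returns an empty string instead, which may or may not be the desired behaviour.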

data/bin/fasta_sort.rb ADDED

@@ -0,0 +1,24 @@
+#!/usr/bin/env ruby
+#
+# fasta_sort: Sorts a FASTA file and outputs sorted unique records as FASTA again
+#
+# Usage:
+#
+#   fasta_sort inputfile(s)
+require 'bio'
+include Bio
+table = Hash.new
+ARGV.each do | fn |
+  Bio::FlatFile.auto(fn).each do | seq |
+    table[seq.definition] ||= seq.data
+  end
+end
+table.sort.each do | definition, data |
+  rec = Bio::FastaFormat.new('> '+definition.strip+"\n"+data)
+  print rec
+end
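
The sort script combines two idioms: `||=` keeps the first sequence seen for each definition line, and `Hash#sort` orders the result by key. A dependency-free sketch of the same idea, assuming already-parsed `[definition, sequence]` pairs and a hypothetical helper name `sorted_unique_fasta` (the real script reads files through BioRuby and prints `'> '` with a space after the marker):

```ruby
# Hypothetical helper: takes [definition, sequence] pairs and returns
# FASTA text with duplicates dropped and records sorted by definition.
def sorted_unique_fasta(entries)
  table = {}
  entries.each do |definition, data|
    table[definition] ||= data   # first occurrence wins
  end
  # Hash#sort yields [key, value] pairs ordered by key
  table.sort.map { |definition, data| ">#{definition.strip}\n#{data}\n" }.join
end

# Made-up records, for illustration only
print sorted_unique_fasta([
  ["seq2 beta",  "GGCC"],
  ["seq1 alpha", "ATGC"],
  ["seq2 beta",  "TTTT"]   # duplicate definition, ignored
])
```

Because the whole table is held in a Hash, this approach loads every record into memory, unlike the streaming emitters elsewhere in this release.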

data/bin/getorf CHANGED

@@ -6,12 +6,8 @@
 # (aa_heuristic.fa and nt_heuristic.fa respectively)
 #
 # You can choose the heuristic on the command line (default stopstop).
-#
-# Author:: Pjotr Prins
-# Copyright:: 2009-2011
-# License:: Ruby License
-#
-# Copyright (C) 2009-2011 Pjotr Prins <pjotr.prins@thebird.nl>
+$stderr.print "WARNING: This tool has one or more known bugs! Better use the EMBOSS getorf instead for now\n"
 rootpath = File.dirname(File.dirname(__FILE__))
 $: << File.join(rootpath,'lib')
@@ -48,10 +44,10 @@ EXAMPLE
     exit()
   }
-  opts.on("-h heuristic", String, "Heuristic (stopstop)") do | s |
+  opts.on("-h heuristic", String, "Heuristic (default #{heuristic})") do | s |
     heuristic = s
   end
-  opts.on("-s size", "--min-size", Integer, "Minimal sequence size") do | n |
+  opts.on("-s size", "--min-size", Integer, "Minimal sequence size (default #{minsize})") do | n |
     minsize = n
   end
   opts.on("--longest", "Only get longest ORF match") do
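
The help strings above now interpolate the current defaults, which only works if the variables already hold those defaults when `opts.on` is called. A small sketch with Ruby's standard `optparse`, under stated assumptions: the helper name `parse_orf_args` and the default minimum size of 30 are made up, and `-H` is used for the heuristic here purely to avoid the conventional `-h` help switch (the script itself uses `-h`).

```ruby
require 'optparse'

# Hypothetical helper: returns [heuristic, minsize, help_text].
def parse_orf_args(argv)
  heuristic = "stopstop"   # default named in the script's header comment
  minsize   = 30           # assumed value, for illustration only
  parser = OptionParser.new do |opts|
    # The description strings are built once, here, so they show
    # the defaults and not any value given on the command line.
    opts.on("-H", "--heuristic NAME", String,
            "Heuristic (default #{heuristic})") { |s| heuristic = s }
    opts.on("-s", "--min-size SIZE", Integer,
            "Minimal sequence size (default #{minsize})") { |n| minsize = n }
  end
  help = parser.help
  parser.parse!(argv.dup)
  [heuristic, minsize, help]
end

heuristic, minsize, help = parse_orf_args(%w[--min-size 90])
puts help
```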

data/bin/nt2aa.rb CHANGED

@@ -3,11 +3,6 @@
 # Translate nucleotide sequences into aminoacids sequences in all
 # reading frames.
 #
-#
-# (: pjotrp 2009, 2012 rblicense :)
-#
-# Copyright (C) 2012 Pjotr Prins <pjotr.prins@thebird.nl>
 USAGE =<<EOM
   ruby #{__FILE__} [--six-frame] inputfile(s)
 EOM
@@ -44,7 +39,9 @@ ARGV.each do | fn |
         # ajpseqt  = Biolib::Emboss.ajTrnSeqOrig(trnTable,ajpseq,frame)
         # aa       = Biolib::Emboss.ajSeqGetSeqCopyC(ajpseqt)
-        print "> ",rec.descr," [",frame.to_s,"]\n"
+        print ">",rec.descr
+        print " [",frame.to_s,"]" if do_sixframes
+        print "\n"
         print aa,"\n"
     end
   }

data/bio-bigbio.gemspec CHANGED

@@ -5,25 +5,28 @@
 Gem::Specification.new do |s|
   s.name = "bio-bigbio"
-  s.version = "0.1.4"
+  s.version = "0.1.5"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Pjotr Prins"]
-  s.date = "2012-02-03"
+  s.date = "2013-05-03"
   s.description = "Fasta reader, ORF emitter, sequence translation"
   s.email = "pjotr.public01@thebird.nl"
-  s.executables = ["getorf", "nt2aa.rb"]
+  s.executables = ["fasta_filter.rb", "fasta_sort.rb", "getorf", "nt2aa.rb"]
   s.extra_rdoc_files = [
-    "LICENSE",
+    "LICENSE.txt",
     "README.md"
   ]
   s.files = [
+    ".travis.yml",
     "Gemfile",
     "Gemfile.lock",
-    "LICENSE",
+    "LICENSE.txt",
     "README.md",
     "Rakefile",
     "VERSION",
+    "bin/fasta_filter.rb",
+    "bin/fasta_sort.rb",
     "bin/getorf",
     "bin/nt2aa.rb",
     "bio-bigbio.gemspec",
@@ -42,6 +45,7 @@ Gem::Specification.new do |s|
     "lib/bigbio/db/fasta/fastarecord.rb",
     "lib/bigbio/db/fasta/fastawriter.rb",
     "lib/bigbio/db/fasta/indexer.rb",
+    "lib/bigbio/db/phylip.rb",
     "lib/bigbio/environment.rb",
     "lib/bigbio/sequence/predictorf.rb",
     "lib/bigbio/sequence/translate.rb",

data/lib/bigbio/db/fasta/fastareader.rb CHANGED

@@ -130,3 +130,38 @@ class FastaReader
   end
 end
+# The following is actually a module/trait implementation without state
+class FastaReader
+  # func passes in a FASTA buffer. Every time a record is parsed it is
+  # yielded.
+  #
+  def FastaReader::emit getbuf_func
+    seq = ""
+    id = nil
+    descr = nil
+    while buf = getbuf_func.call
+      buf.split(/\n/).each do | line |
+        if line =~ /^>/
+          yield id, descr, seq if descr
+          descr = line[1..-1].strip
+          matched = /^(\S+)/.match(descr)
+          id = matched[0]
+          seq = ""
+        else
+          seq += line.strip
+        end
+      end
+    end
+    yield id, descr, seq if descr and seq.size > 0
+  end
+  def FastaReader::emit_fastarecord getbuf_func
+    emit(getbuf_func) do | id, descr, seq |
+      yield FastaRecord.new(id, descr, seq)
+    end
+  end
+end
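
The emitter added above takes a callable that returns the next buffer, or `nil` at end of input, and yields one record at a time. The following stand-alone sketch follows the same control flow without needing the gem. The module name `FastaSketch` is hypothetical, and the sketch appends with `<<` where the gem uses `+=`. Buffers are assumed to end on line boundaries, as they do with `gets`.

```ruby
require 'stringio'

module FastaSketch   # hypothetical name, not part of bio-bigbio
  # get_buf: callable returning the next chunk of text, or nil when done.
  # Yields id, description and sequence for every record.
  def self.emit(get_buf)
    id = descr = nil
    seq = +""
    while (buf = get_buf.call)
      buf.each_line do |line|
        line = line.strip
        if line.start_with?(">")
          yield id, descr, seq if descr      # flush the previous record
          descr = line[1..-1].strip
          id = descr[/^\S+/]                 # first token of the header
          seq = +""
        else
          seq << line
        end
      end
    end
    yield id, descr, seq if descr && !seq.empty?   # flush the last record
  end
end

# Made-up input, for illustration only
input = StringIO.new(">seq1 first record\nACGT\nTTGA\n>seq2 second record\nGGCC\n")
FastaSketch.emit(-> { input.gets }) { |id, descr, seq| puts [id, descr, seq].join("\t") }
```

Since the reader only ever calls the lambda, the same code works for STDIN, a file handle, or a socket, and never holds more than one record in memory.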

data/lib/bigbio/db/fasta/fastarecord.rb CHANGED

@@ -7,6 +7,10 @@ class FastaRecord
     @descr = descr
     @seq = seq
   end
+  def to_fasta
+    ">"+@descr+"\n"+@seq+"\n"
+  end
 end
 class FastaPairedRecord
@@ -30,7 +34,9 @@ class FastaPairedRecord
     if nt.seq.size == aa.seq.size*3-3
       aa.seq.chop!
     end
-    raise "Sequence size mismatch for #{nt.id} <nt:#{nt.seq.size} != #{aa.seq.size*3} (aa:#{aa.seq.size}*3)>" if nt.seq.size != aa.seq.size*3
+    nt_size = nt.seq.size
+    expected_size = aa.seq.size*3
+    # raise "Sequence size mismatch for #{nt.id} <nt:#{nt.seq.size} != #{aa.seq.size*3} (aa:#{aa.seq.size}*3)>" if expected_size - 3 > nt_size and  nt_size > expected_size + 3
   end
   def id

data/lib/bigbio/db/phylip.rb ADDED

@@ -0,0 +1,49 @@
+# Simple phylip reader. Supports PAML style files formatted as
+#
+# sequence 1
+# AAGCTTCACCGGCGCAGTCATTCTCATAAT
+# CGCCCACGGACTTACATCCTCATTACTATT
+# sequence 2
+# AAGCTTCACCGGCGCAATTATCCTCATAAT
+# CGCCCACGGACTTACATCCTCATTATTATT
+# sequence 3
+# AAGCTTCACCGGCGCAGTTGTTCTTATAAT
+# TGCCCACGGACTTACATCATCATTATTATT
+# sequence 4
+# AAGCTTCACCGGCGCAACCACCCTCATGAT
+# TGCCCATGGACTCACATCCTCCCTACTGTT
+module Bio
+  module Big
+    module PhylipReader
+      # Define get_line as a lambda function, e.g.
+      #   Bio::Big::PhylipReader.emit_seq(-> { lines.next }) { | name, seq | p [name,seq] }
+      def PhylipReader::emit_seq get_line
+        line = get_line.call.strip
+        a = line.split
+        seq_num = a[0].to_i
+        seq_size = a[1].to_i
+        name = nil
+        seq = ""
+        while true
+          line = get_line.call
+          break if line == nil or line == ""
+          line = line.strip
+          if name == nil
+            name = line
+            next
+          end
+          seq += line
+          if seq.size >= seq_size
+            raise "Name wrong size for #{name}" if name.size > 20
+            raise "Sequence wrong size for #{name}" if seq.size > seq_size
+            yield name, seq
+            name = nil
+            seq = ""
+          end
+        end
+      end
+    end
+  end
+end
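
The reader above expects a header line holding the number of sequences and the sequence length, followed by a name line and sequence lines for each entry. A stand-alone sketch of that loop, under stated assumptions: the module name `PhylipSketch` is hypothetical, and this version stops at end of input or at a blank line. The line source here is `-> { io.gets }`, which returns `nil` when the input is exhausted. An `Enumerator#next` source would raise `StopIteration` at that point instead.

```ruby
require 'stringio'

module PhylipSketch   # hypothetical name, not part of bio-bigbio
  # get_line: callable returning the next line, or nil at end of input.
  # Yields name and sequence once a sequence reaches the declared length.
  def self.emit_seq(get_line)
    header = get_line.call
    return if header.nil?
    _count, seq_size = header.split.map(&:to_i)
    name = nil
    seq = +""
    while (line = get_line.call)
      line = line.strip
      break if line.empty?
      if name.nil?
        name = line        # first line of an entry is its name
        next
      end
      seq << line
      next if seq.size < seq_size
      raise "Sequence wrong size for #{name}" if seq.size > seq_size
      yield name, seq
      name = nil
      seq = +""
    end
  end
end

# Made-up two-sequence alignment, for illustration only
io = StringIO.new("2 8\nseq1\nACGT\nTTGA\nseq2\nGGCC\nAATT\n")
PhylipSketch.emit_seq(-> { io.gets }) { |name, seq| p [name, seq] }
```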

data/spec/emitter_spec.rb CHANGED

@@ -20,6 +20,23 @@ describe Bio::Big::FastaEmitter, "when using the emitter" do
     end
   end
+  it "should emit functional style" do
+    count = 0
+    FastaReader::emit_fastarecord(-> { File.open("test/data/fasta/nt.fa").read }) { |rec|
+      case count
+        when 0
+          rec.id.should == "PUT-157a-Arabidopsis_thaliana-1"
+          rec.seq[0..10].should == "AGGTTCGNACG"
+        when 1
+          rec.id.should == "PUT-157a-Arabidopsis_thaliana-2"
+          rec.seq[0..10].should == "AGACAAACGAC"
+        else
+          break
+      end
+      count += 1
+    }
+  end
   it "should emit large parts" do
     FastaEmitter.new("test/data/fasta/nt.fa").emit_seq do | part, index, tag, seq |
       # p [index, part, tag, seq]
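
One thing to watch in the functional-style spec: as far as this diff shows, the lambda opens and reads the whole file on every call and never returns `nil`, so the emitter loop ends only because the block calls `break` after two records. A sketch of a buffer source that hands out its content once and then signals end of input (the helper name `one_shot` is hypothetical):

```ruby
# Hypothetical helper: wrap a complete text buffer in a callable that
# returns it on the first call and nil on every later call.
def one_shot(text)
  pending = text
  lambda do
    buf, pending = pending, nil
    buf
  end
end

# Made-up content, for illustration only
get_buf = one_shot(">seq1\nACGT\n")
p get_buf.call   # the buffer
p get_buf.call   # nil, which ends a `while buf = get_buf.call` loop
```

A source built this way lets a `while buf = getbuf_func.call` loop terminate on its own, without relying on `break` in the consumer.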

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-bigbio
 version: !ruby/object:Gem::Version
-  version: 0.1.4
+  version: 0.1.5
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-02-03 00:00:00.000000000Z
+date: 2013-05-03 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bio
-  requirement: &15446660 !ruby/object:Gem::Requirement
+  requirement: &27203900 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: 1.4.1
   type: :runtime
   prerelease: false
-  version_requirements: *15446660
+  version_requirements: *27203900
 - !ruby/object:Gem::Dependency
   name: bio-logger
-  requirement: &15445800 !ruby/object:Gem::Requirement
+  requirement: &27203120 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: 0.9.0
   type: :runtime
   prerelease: false
-  version_requirements: *15445800
+  version_requirements: *27203120
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &15445180 !ruby/object:Gem::Requirement
+  requirement: &27202300 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -43,10 +43,10 @@ dependencies:
         version: 2.3.0
   type: :development
   prerelease: false
-  version_requirements: *15445180
+  version_requirements: *27202300
 - !ruby/object:Gem::Dependency
   name: bundler
-  requirement: &15444540 !ruby/object:Gem::Requirement
+  requirement: &27201380 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -54,10 +54,10 @@ dependencies:
         version: 1.0.0
   type: :development
   prerelease: false
-  version_requirements: *15444540
+  version_requirements: *27201380
 - !ruby/object:Gem::Dependency
   name: jeweler
-  requirement: &15443800 !ruby/object:Gem::Requirement
+  requirement: &27200760 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -65,10 +65,10 @@ dependencies:
         version: 1.5.2
   type: :development
   prerelease: false
-  version_requirements: *15443800
+  version_requirements: *27200760
 - !ruby/object:Gem::Dependency
   name: rcov
-  requirement: &15440240 !ruby/object:Gem::Requirement
+  requirement: &27199840 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -76,23 +76,28 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *15440240
+  version_requirements: *27199840
 description: Fasta reader, ORF emitter, sequence translation
 email: pjotr.public01@thebird.nl
 executables:
+- fasta_filter.rb
+- fasta_sort.rb
 - getorf
 - nt2aa.rb
 extensions: []
 extra_rdoc_files:
-- LICENSE
+- LICENSE.txt
 - README.md
 files:
+- .travis.yml
 - Gemfile
 - Gemfile.lock
-- LICENSE
+- LICENSE.txt
 - README.md
 - Rakefile
 - VERSION
+- bin/fasta_filter.rb
+- bin/fasta_sort.rb
 - bin/getorf
 - bin/nt2aa.rb
 - bio-bigbio.gemspec
@@ -111,6 +116,7 @@ files:
 - lib/bigbio/db/fasta/fastarecord.rb
 - lib/bigbio/db/fasta/fastawriter.rb
 - lib/bigbio/db/fasta/indexer.rb
+- lib/bigbio/db/phylip.rb
 - lib/bigbio/environment.rb
 - lib/bigbio/sequence/predictorf.rb
 - lib/bigbio/sequence/translate.rb
@@ -139,7 +145,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash: -2436097965031091716
+      hash: 2941883289909211187
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:

data/LICENSE DELETED

@@ -1,34 +0,0 @@
-If a license is not specified the code contributed to BioBig defaults to the
-BSD license:
-Copyright (c) 2008, 2009 The BioLib Project
-All rights reserved.
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are met:
-    * Redistributions of source code must retain the above copyright notice,
-      this list of conditions and the following disclaimer.
-    * Redistributions in binary form must reproduce the above copyright notice,
-      this list of conditions and the following disclaimer in the documentation
-      and/or other materials provided with the distribution.
-    * Neither the name of the The BioLib Project nor the names of
-      its contributors may be used to endorse or promote products derived from
-      this software without specific prior written permission.
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
-ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
-DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
-ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
-(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
-LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
-ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-For more information on opensource software licenses see
-http://www.opensource.org/licenses/bsd-license.php,
-http://www.gnu.org/licenses/gpl.html and http://www.fsf.org/.