bio-blastxmlparser 1.1.0 → 1.1.1 (RubyGems)

Files changed (11)

data/README.rdoc +84 -84
data/Rakefile +1 -1
data/VERSION +1 -1
data/bin/blastxmlparser +3 -4
data/bio-blastxmlparser.gemspec +3 -3
data/lib/bio/db/blast/parser/nokogiri.rb +1 -1
data/lib/bio/db/blast/xmliterator.rb +1 -1
data/lib/bio/db/blast/xmlsplitter.rb +1 -1
data/sample/blastxmlparserdemo.rb +1 -1
data/spec/bio-blastxmlparser_spec.rb +6 -6
metadata +19 -18

data/README.rdoc CHANGED

@@ -28,10 +28,11 @@ see why libxml2 based Nokogiri is fast, see
 http://www.rubyinside.com/ruby-xml-performance-benchmarks-1641.html and
 http://www.xml.com/lpt/a/1703.
-The parser is also designed with other optimizations, such as lazy evaluation,
-only creating objects when required, and (in a future version) parallelization. When parsing
-a full BLAST result usually only a few fields are used. By using XPath queries
-only the relevant fields are queried.
+The parser is also designed with other optimizations, such as lazy
+evaluation, i.e. only creating objects when required, and (in a future
+version) parallelization. When parsing a full BLAST result usually
+only a few fields are used. By using XPath queries only the relevant
+fields are queried.
 Timings for parsing test/data/nt_example_blastn.m7 (file size 3.4Mb)
@@ -47,7 +48,7 @@ Timings for parsing test/data/nt_example_blastn.m7 (file size 3.4Mb)
   user    0m1.444s
   sys     0m0.160s
-  BioRuby ReXML DOM parser
+  BioRuby ReXML DOM parser (old style)
   real    1m14.548s
   user    1m13.065s
@@ -72,13 +73,88 @@ example on Debian:
 for more installation on other platforms see
 http://nokogiri.org/tutorials/installing_nokogiri.html.
+== Command line usage
+=== Usage
+  blastxmlparser [options] file(s)
+    -p, --parser name                Use full|split parser (default full)
+        --output-fasta               Output FASTA
+    -n, --named fields               Set named fields
+    -e, --exec filter                Execute filter
+        --logger filename            Log to file (default stderr)
+        --trace options              Set log level (default INFO, see bio-logger)
+    -q, --quiet                      Run quietly
+    -v, --verbose                    Run verbosely
+        --debug                      Show debug messages
+    -h, --help                       Show help and examples
+  bioblastxmlparser filename(s)
+    Use --help switch for more information
+=== Examples
+Print result fields of iterations containing 'lcl', using a regex
+  blastxmlparser -e 'iter.query_id=~/lcl/' test/data/nt_example_blastn.m7
+Print fields where bit_score > 145
+  blastxmlparser -e 'hsp.bit_score>145' test/data/nt_example_blastn.m7
+prints a tab delimited
+  1       1       lcl|1_0 lcl|I_74685     1       5.82208e-34
+  2       1       lcl|1_0 lcl|I_1 1       5.82208e-34
+  3       2       lcl|2_0 lcl|I_2 1       6.05436e-59
+  4       3       lcl|3_0 lcl|I_3 1       2.03876e-56
+The second and third column show the BLAST iteration, and the others
+relate to the hits.
+As this is evaluated Ruby, it is also possible to use the XML element
+names directly
+  blastxmlparser -e 'hsp["Hsp_bit-score"].to_i>145' test/data/nt_example_blastn.m7
+And it is possible to print (non default) named fields where E-value < 0.001
+and hit length > 100. E.g.
+  blastxmlparser -n 'hsp.evalue,hsp.qseq' -e 'hsp.evalue<0.01 and hit.len>100' test/data/nt_example_blastn.m7
+  1       5.82208e-34     AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCT...
+  2       5.82208e-34     AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCT...
+  3       2.76378e-11     AATATGGTAGCTACAGAAACGGTAGTACACTCTTC
+  4       1.13373e-13     CTAAACACAGGAGCATATAGGTTGGCAGGCAGGCAAAAT
+  5       2.76378e-11     GAAGAGTGTACTACCGTTTCTGTAGCTACCATATT
+  etc. etc.
+prints the evalue and qseq columns. To output FASTA use --output-fasta
+  blastxmlparser --output-fasta -e 'hsp.evalue<0.01 and hit.len>100' test/data/nt_example_blastn.m7
+which prints matching sequences, where the first field is the accession, followed
+by query iteration id, and hit_id. E.g.
+  >I_74685 1|lcl|1_0 lcl|I_74685 [57809 - 57666] (REVERSE SENSE)
+  AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTTAG
+  >I_1 1|lcl|1_0 lcl|I_1 [477 - 884]
+  AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTTAG
+  etc. etc.
+To use the low-mem (iterated slower) version of the parser use
+  blastxmlparser --parser split -n 'hsp.evalue,hsp.qseq' -e 'hsp.evalue<0.01 and hit.len>100' test/data/nt_example_blastn.m7
 == API (Ruby library)
 To loop through a BLAST result:
     >> require 'bio-blastxmlparser'
     >> fn = 'test/data/nt_example_blastn.m7'
-    >>   n = Bio::Blast::XmlIterator.new(fn).to_enum
+    >>   n = Bio::BlastXMLParser::XmlIterator.new(fn).to_enum
     >>   n.each do | iter |
     >>     puts "Hits for " + iter.query_id
     >>     iter.each do | hit |
@@ -91,7 +167,7 @@ To loop through a BLAST result:
 The next example parses XML using less memory by using a Ruby
 Iterator
-    >> blast = Bio::Blast::XmlSplitterIterator.new(fn).to_enum
+    >> blast = Bio::BlastXMLParser::XmlSplitterIterator.new(fn).to_enum
     >> iter = blast.next
     >> iter.iter_num
     => 1
@@ -175,87 +251,11 @@ etc. etc.
 For more examples see the files in ./spec
-== Command line usage
-== Usage
-  blastxmlparser [options] file(s)
-    -p, --parser name                Use full|split parser (default full)
-        --output-fasta               Output FASTA
-    -n, --named fields               Set named fields
-    -e, --exec filter                Execute filter
-        --logger filename            Log to file (default stderr)
-        --trace options              Set log level (default INFO, see bio-logger)
-    -q, --quiet                      Run quietly
-    -v, --verbose                    Run verbosely
-        --debug                      Show debug messages
-    -h, --help                       Show help and examples
-  bioblastxmlparser filename(s)
-    Use --help switch for more information
-== Examples
-Print result fields of iterations containing 'lcl', using a regex
-  blastxmlparser -e 'iter.query_id=~/lcl/' test/data/nt_example_blastn.m7
-Print fields where bit_score > 145
-  blastxmlparser -e 'hsp.bit_score>145' test/data/nt_example_blastn.m7
-prints a tab delimited
-  1       1       lcl|1_0 lcl|I_74685     1       5.82208e-34
-  2       1       lcl|1_0 lcl|I_1 1       5.82208e-34
-  3       2       lcl|2_0 lcl|I_2 1       6.05436e-59
-  4       3       lcl|3_0 lcl|I_3 1       2.03876e-56
-The second and third column show the BLAST iteration, and the others
-relate to the hits.
-As this is evaluated Ruby, it is also possible to use the XML element
-names directly
-  blastxmlparser -e 'hsp["Hsp_bit-score"].to_i>145' test/data/nt_example_blastn.m7
-And it is possible to print (non default) named fields where E-value < 0.001
-and hit length > 100. E.g.
-  blastxmlparser -n 'hsp.evalue,hsp.qseq' -e 'hsp.evalue<0.01 and hit.len>100' test/data/nt_example_blastn.m7
-  1       5.82208e-34     AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCT...
-  2       5.82208e-34     AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCT...
-  3       2.76378e-11     AATATGGTAGCTACAGAAACGGTAGTACACTCTTC
-  4       1.13373e-13     CTAAACACAGGAGCATATAGGTTGGCAGGCAGGCAAAAT
-  5       2.76378e-11     GAAGAGTGTACTACCGTTTCTGTAGCTACCATATT
-  etc. etc.
-prints the evalue and qseq columns. To output FASTA use --output-fasta
-  blastxmlparser --output-fasta -e 'hsp.evalue<0.01 and hit.len>100' test/data/nt_example_blastn.m7
-which prints matching sequences, where the first field is the accession, followed
-by query iteration id, and hit_id. E.g.
-  >I_74685 1|lcl|1_0 lcl|I_74685 [57809 - 57666] (REVERSE SENSE)
-  AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTTAG
-  >I_1 1|lcl|1_0 lcl|I_1 [477 - 884]
-  AGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTTAG
-  etc. etc.
-To use the low-mem (iterated slower) version of the parser use
-  blastxmlparser --parser split -n 'hsp.evalue,hsp.qseq' -e 'hsp.evalue<0.01 and hit.len>100' test/data/nt_example_blastn.m7
 == URL
 The project lives at http://github.com/pjotrp/blastxmlparser. If you use this software, please cite http://dx.doi.org/10.1093/bioinformatics/btq475
 == Copyright
-Copyright (c) 2011 Pjotr Prins under the MIT licence.  See LICENSE.txt and http://www.opensource.org/licenses/mit-license.html for further details.
+Copyright (c) 2011,2012 Pjotr Prins under the MIT licence.  See LICENSE.txt and http://www.opensource.org/licenses/mit-license.html for further details.

data/Rakefile CHANGED

@@ -16,7 +16,7 @@ Jeweler::Tasks.new do |gem|
   gem.homepage = "http://github.com/pjotrp/blastxmlparser"
   gem.license = "MIT"
   gem.summary = %Q{Very fast BLAST XML parser and library for big data}
-  gem.description = %Q{Fast big data XML parser and library, libxml2 based 50x faster than BioRuby}
+  gem.description = %Q{Fast big data BLAST XML parser and library; this libxml2 based version is 50x faster than BioRuby}
   gem.email = "pjotr.public01@thebird.nl"
   gem.authors = ["Pjotr Prins"]
   # Include your dependencies below. Runtime dependencies are required when using your gem,

data/VERSION CHANGED

@@ -1 +1 @@
-1.1.0
+1.1.1

data/bin/blastxmlparser CHANGED

@@ -2,10 +2,9 @@
 #
 # BioRuby bio-blastxmlparser Plugin
 # Author:: Pjotr Prins
-# Copyright:: 2011
 # License:: MIT License
 #
-# Copyright (C) 2010,2011 Pjotr Prins <pjotr.prins@thebird.nl>
+# Copyright (C) 2010-2013 Pjotr Prins <pjotr.prins@thebird.nl>
 rootpath = File.dirname(File.dirname(__FILE__))
 $: << File.join(rootpath,'lib')
@@ -160,9 +159,9 @@ begin
   ARGV.each do | fn |
     logger.info("XML parsing #{fn}")
     n = if options.parser == :split
-      Bio::Blast::XmlSplitterIterator.new(fn).to_enum
+      Bio::BlastXMLParser::XmlSplitterIterator.new(fn).to_enum
     else
-      Bio::Blast::XmlIterator.new(fn).to_enum
+      Bio::BlastXMLParser::XmlIterator.new(fn).to_enum
     end
     i = 1
     n.each do | iter |

data/bio-blastxmlparser.gemspec CHANGED

@@ -5,12 +5,12 @@
 Gem::Specification.new do |s|
   s.name = "bio-blastxmlparser"
-  s.version = "1.1.0"
+  s.version = "1.1.1"
   s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
   s.authors = ["Pjotr Prins"]
-  s.date = "2012-08-08"
-  s.description = "Fast big data XML parser and library, libxml2 based 50x faster than BioRuby"
+  s.date = "2013-02-07"
+  s.description = "Fast big data BLAST XML parser and library; this libxml2 based version is 50x faster than BioRuby"
   s.email = "pjotr.public01@thebird.nl"
   s.executables = ["blastxmlparser"]
   s.extra_rdoc_files = [

data/lib/bio/db/blast/parser/nokogiri.rb CHANGED

@@ -3,7 +3,7 @@ require 'nokogiri'
 require 'enumerator'
 module Bio
-  module Blast
+  module BlastXMLParser
     module XPath
       def field name

data/lib/bio/db/blast/xmliterator.rb CHANGED

@@ -1,7 +1,7 @@
 module Bio
-  module Blast
+  module BlastXMLParser
     # Iterate a BLAST file yielding (lazy) results
     class XmlIterator

data/lib/bio/db/blast/xmlsplitter.rb CHANGED

@@ -1,7 +1,7 @@
 require 'enumerator'
 module Bio
-  module Blast
+  module BlastXMLParser
     # Reads a full XML result and splits it out into a buffer for each
     # Iteration (query result).
     class XmlSplitterIterator

data/sample/blastxmlparserdemo.rb CHANGED

@@ -5,7 +5,7 @@ $: << File.join(rootpath,'lib')
 require 'bio-blastxmlparser'
 fn = 'test/data/nt_example_blastn.m7'
-n = Bio::Blast::XmlIterator.new(fn).to_enum
+n = Bio::BlastXMLParser::XmlIterator.new(fn).to_enum
 n.each do | iter |
   puts "Hits for " + iter.query_id
   iter.each do | hit |
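
Note for callers: the hunks in this diff move the parser classes from Bio::Blast to Bio::BlastXMLParser, so code written against 1.1.0 has to update its constant references. The following is a minimal sketch, not part of the gem, of how calling code might resolve whichever namespace is present. The stand-in module and the helper name blast_xml_namespace are assumptions made for illustration. The real classes come from require 'bio-blastxmlparser'.

```ruby
# Stand-in for the 1.1.1 layout so this sketch runs without the gem installed.
# In real use, `require 'bio-blastxmlparser'` defines these constants instead.
module Bio
  module BlastXMLParser
    class XmlIterator
      attr_reader :filename

      def initialize(filename)
        @filename = filename
      end
    end
  end
end

# Hypothetical helper (not part of the gem): return the namespace under `root`
# that holds XmlIterator, preferring the 1.1.1 name and falling back to the
# 1.1.0 name.
def blast_xml_namespace(root = Bio)
  [:BlastXMLParser, :Blast].each do |name|
    next unless root.const_defined?(name, false)

    ns = root.const_get(name, false)
    return ns if ns.const_defined?(:XmlIterator, false)
  end
  raise NameError, "no BLAST XML parser namespace found under #{root}"
end

ns = blast_xml_namespace
iterator = ns::XmlIterator.new('test/data/nt_example_blastn.m7')
puts ns.name
```

Checking for XmlIterator inside the candidate namespace, rather than only for the namespace name, keeps the lookup from settling on an unrelated Bio::Blast constant defined by another library.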

data/spec/bio-blastxmlparser_spec.rb CHANGED

@@ -1,9 +1,9 @@
 require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
 TESTFILE = "./test/data/nt_example_blastn.m7"
-include Bio::Blast
+include Bio::BlastXMLParser
-describe "Bio::Blast::NokogiriBlastXml" do
+describe "Bio::BlastXMLParser::NokogiriBlastXml" do
   before(:all) do
     n = NokogiriBlastXml.new(File.new(TESTFILE)).to_enum
     @iter1 = n.next
@@ -75,8 +75,8 @@ describe "Bio::Blast::NokogiriBlastXml" do
   end
 end
-describe Bio::Blast::XmlIterator do
-  include Bio::Blast
+describe Bio::BlastXMLParser::XmlIterator do
+  include Bio::BlastXMLParser
   it "should parse with Nokogiri" do
     blast = XmlIterator.new(TESTFILE).to_enum
     iter1 = blast.next
@@ -86,8 +86,8 @@ describe Bio::Blast::XmlIterator do
   end
 end
-describe Bio::Blast::XmlSplitterIterator do
-  include Bio::Blast
+describe Bio::BlastXMLParser::XmlSplitterIterator do
+  include Bio::BlastXMLParser
   # it "should read a large file and yield Iterations" do
   #   s = XmlSplitter.new("./test/data/nt_example_blastn.m7")
   #   s.each do | result |

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: bio-blastxmlparser
 version: !ruby/object:Gem::Version
-  version: 1.1.0
+  version: 1.1.1
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-08-08 00:00:00.000000000Z
+date: 2013-02-07 00:00:00.000000000Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bio-logger
-  requirement: &14068420 !ruby/object:Gem::Requirement
+  requirement: &24214160 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -21,10 +21,10 @@ dependencies:
         version: 1.0.0
   type: :runtime
   prerelease: false
-  version_requirements: *14068420
+  version_requirements: *24214160
 - !ruby/object:Gem::Dependency
   name: nokogiri
-  requirement: &14067240 !ruby/object:Gem::Requirement
+  requirement: &24213120 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -32,10 +32,10 @@ dependencies:
         version: 1.5.0
   type: :runtime
   prerelease: false
-  version_requirements: *14067240
+  version_requirements: *24213120
 - !ruby/object:Gem::Dependency
   name: rake
-  requirement: &14066000 !ruby/object:Gem::Requirement
+  requirement: &24212220 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -43,10 +43,10 @@ dependencies:
         version: 0.9.2.2
   type: :development
   prerelease: false
-  version_requirements: *14066000
+  version_requirements: *24212220
 - !ruby/object:Gem::Dependency
   name: bundler
-  requirement: &14064760 !ruby/object:Gem::Requirement
+  requirement: &24211440 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -54,10 +54,10 @@ dependencies:
         version: '0'
   type: :development
   prerelease: false
-  version_requirements: *14064760
+  version_requirements: *24211440
 - !ruby/object:Gem::Dependency
   name: jeweler
-  requirement: &14063640 !ruby/object:Gem::Requirement
+  requirement: &24174660 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -65,10 +65,10 @@ dependencies:
         version: 1.8.4
   type: :development
   prerelease: false
-  version_requirements: *14063640
+  version_requirements: *24174660
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &14062500 !ruby/object:Gem::Requirement
+  requirement: &24173840 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -76,10 +76,10 @@ dependencies:
         version: 2.3.0
   type: :development
   prerelease: false
-  version_requirements: *14062500
+  version_requirements: *24173840
 - !ruby/object:Gem::Dependency
   name: rdoc
-  requirement: &14055400 !ruby/object:Gem::Requirement
+  requirement: &24173100 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ! '>='
@@ -87,8 +87,9 @@ dependencies:
         version: 2.4.2
   type: :development
   prerelease: false
-  version_requirements: *14055400
-description: Fast big data XML parser and library, libxml2 based 50x faster than BioRuby
+  version_requirements: *24173100
+description: Fast big data BLAST XML parser and library; this libxml2 based version
+  is 50x faster than BioRuby
 email: pjotr.public01@thebird.nl
 executables:
 - blastxmlparser
@@ -140,7 +141,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash: -1696395694674995706
+      hash: -3287387609254152406
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements: