xml_node_stream 1.0.0

Files changed (21)

data/MIT_LICENSE +20 -0
data/README.rdoc +61 -0
data/Rakefile +42 -0
data/VERSION +1 -0
data/init.rb +1 -0
data/lib/xml_node_stream.rb +10 -0
data/lib/xml_node_stream/node.rb +130 -0
data/lib/xml_node_stream/parser.rb +71 -0
data/lib/xml_node_stream/parser/base.rb +40 -0
data/lib/xml_node_stream/parser/libxml_parser.rb +44 -0
data/lib/xml_node_stream/parser/nokogiri_parser.rb +50 -0
data/lib/xml_node_stream/parser/rexml_parser.rb +43 -0
data/lib/xml_node_stream/selector.rb +72 -0
data/spec/node_spec.rb +140 -0
data/spec/parser_spec.rb +148 -0
data/spec/selector_spec.rb +73 -0
data/spec/spec_helper.rb +3 -0
data/spec/test.xml +57 -0
data/spec/xml_node_stream_spec.rb +11 -0
data/xml_node_stream.gemspec +68 -0
metadata +97 -0

data/MIT_LICENSE ADDED

@@ -0,0 +1,20 @@
+Copyright (c) 2010 Brian Durand
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/README.rdoc ADDED

@@ -0,0 +1,61 @@
+= XML Node Stream
+This gem provides an easy-to-use XML parser that combines the benefits of stream parsing (i.e. SAX) and document parsing (i.e. DOM). In addition, it provides a unified parsing language for each of the major Ruby XML parsers (REXML, Nokogiri, and LibXML) so that your code doesn't have to be bound to a particular XML library.
+== Stream Parsing
+The primary purpose of this gem is to facilitate parsing large XML files (i.e. several megabytes in size). Often, reading these files into a document structure is not feasible because the whole document must be read into memory. Stream/SAX parsing solves this issue by reading in the file incrementally and providing callbacks for various events. This method can be quite painful to deal with for any sort of complex document structure.
+This gem attempts to solve both of these issues by combining the best features of both. Parsing is performed by a stream parser which constructs document-style nodes and calls back to the application code with these nodes. When your application is done with a node, it can release it to free up memory and keep your heap from bloating.
+In order to keep the interface simple and universal, only XML elements and text nodes are supported. XML processing instructions and comments will be ignored.
+== Examples
+Suppose we have a file with every book in the world in it:
+  <books>
+    <book isbn="123456">
+      <title>Moby Dick</title>
+      <author>Herman Melville</author>
+      <categories>
+        <category>Fiction</category>
+        <category>Adventure</category>
+      </categories>
+    </book>
+    <book isbn="98765643">
+      <title>The Decline and Fall of the Roman Empire</title>
+      <author>Edward Gibbon</author>
+      <categories>
+        <category>History</category>
+        <category>Ancient</category>
+      </categories>
+    </book>
+    ...
+  </books>
+And we want to get them into our Books data model:
+  XmlNodeStream.parse('/tmp/books.xml') do |node|
+    if node.path == '/books/book'
+      book = Book.new
+      book.isbn = node['isbn']
+      book.title = node.find('title').value
+      book.author = node.find('author/text()')
+      book.categories = node.select('categories/category/text()')
+      book.save
+      node.release!
+    end
+  end
+== Releasing Nodes
+In the above example, what prevents memory bloat when parsing a large document is the call to node.release!. This call will remove the node from the node tree. The general practice is to look for the higher level nodes you are interested in and then release them immediately. If there are nodes you don't care about at all, those can be released immediately as well.
+A sample 77 MB XML document parsed into Nokogiri consumes over 800 MB of memory. Parsing the same document with XmlNodeStream and releasing top level nodes as they're processed uses less than 1 MB.
+== XPath
+You can use a subset of the XPath language to navigate nodes. The only parts of XPath implemented are the paths themselves and the text() function. The text() function is useful for getting the value of a node directly from the find or select methods without having to do a nil check on the nodes. For instance, in the above example we can get the name of an author with node.find('author/text()') instead of node.find('author').value if node.find('author').
+The rest of the XPath language is not implemented since it is a programming language and there is really no need for it since we already have Ruby at our disposal which is far more powerful than XPath. See the Selector class for details.
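
Note on the README's release pattern: the sketch below illustrates why calling release! keeps the tree small. It does not use the gem. `SketchNode` is a hypothetical stand-in that only mimics the parent/child bookkeeping shown in node.rb, and the loop stands in for the parser delivering one book element at a time.

```ruby
# Hypothetical stand-in for the gem's node class; only the parts needed here.
class SketchNode
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
    parent.children << self if parent
  end

  # Detach from the parent so nothing in the tree references this node.
  def release!
    return unless @parent
    @parent.children.delete(self)
    @parent = nil
  end
end

books = SketchNode.new('books')
processed = 0

3.times do
  book = SketchNode.new('book', books)
  SketchNode.new('title', book)
  processed += 1   # the application would save its record here
  book.release!    # drop the subtree before the next book arrives
end

puts processed
puts books.children.size
```

Because each released book is no longer reachable from the root, the garbage collector is free to reclaim it, and the root never accumulates more than the element currently being built.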

data/Rakefile ADDED

@@ -0,0 +1,42 @@
+require 'rubygems'
+require 'rake'
+require 'rake/rdoctask'
+desc 'Default: run unit tests.'
+task :default => :test
+begin
+  require 'spec/rake/spectask'
+  desc 'Test xml_node_stream.'
+  Spec::Rake::SpecTask.new(:test) do |t|
+    t.spec_files = 'spec/**/*_spec.rb'
+  end
+rescue LoadError
+  task :test do
+    STDERR.puts "You must have rspec >= 1.2.9 to run the tests"
+  end
+end
+desc 'Generate documentation for xml_node_stream.'
+Rake::RDocTask.new(:rdoc) do |rdoc|
+  rdoc.rdoc_dir = 'rdoc'
+  rdoc.options << '--title' << 'XML Node Stream' << '--line-numbers' << '--inline-source' << '--main' << 'README.rdoc'
+  rdoc.rdoc_files.include('README.rdoc')
+  rdoc.rdoc_files.include('lib/**/*.rb')
+end
+begin
+  require 'jeweler'
+  Jeweler::Tasks.new do |gem|
+    gem.name = "xml_node_stream"
+    gem.summary = %Q{Simple XML parser wrapper that provides the benefits of stream parsing with the ease of using document nodes.}
+    gem.email = "brian@embellishedvisions.com"
+    gem.homepage = "http://github.com/bdurand/xml_node_stream"
+    gem.authors = ["Brian Durand"]
+    gem.add_development_dependency('rspec', '>= 1.2.9')
+    gem.add_development_dependency('jeweler')
+  end
+  Jeweler::GemcutterTasks.new
+rescue LoadError
+end

data/VERSION ADDED

@@ -0,0 +1 @@
+1.0.0

data/init.rb ADDED

@@ -0,0 +1 @@
+require "#{File.dirname(__FILE__)}/lib/xml_node_stream"

data/lib/xml_node_stream.rb ADDED

@@ -0,0 +1,10 @@
+require File.expand_path(File.join(File.dirname(__FILE__), 'xml_node_stream', 'node'))
+require File.expand_path(File.join(File.dirname(__FILE__), 'xml_node_stream', 'parser'))
+require File.expand_path(File.join(File.dirname(__FILE__), 'xml_node_stream', 'selector'))
+module XmlNodeStream
+  # Helper method to parse XML. See Parser.parse for details.
+  def self.parse (io, &block)
+    Parser.parse(io, &block)
+  end
+end

data/lib/xml_node_stream/node.rb ADDED

@@ -0,0 +1,130 @@
+module XmlNodeStream
+  # Representation of an XML node.
+  class Node
+    attr_reader :name, :parent
+    attr_accessor :value
+    def initialize (name, parent = nil, attributes = nil, value = nil)
+      @name = name
+      @attributes = attributes
+      @parent = parent
+      @parent.add_child(self) if @parent
+      @value = value
+    end
+    # Release a node by removing it from the tree structure so that the Ruby garbage collector can reclaim the memory.
+    # This method should be called after you are done with a node. After it is called, the node will be removed from
+    # its parent's children and will no longer be accessible.
+    def release!
+      @parent.remove_child(self) if @parent
+    end
+    # Array of the child nodes of the node.
+    def children
+      @children ||= []
+    end
+    # Array of all descendants of the node.
+    def descendants
+      if children.empty?
+        return children
+      else
+        return (children + children.collect{|child| child.descendants}).flatten
+      end
+    end
+    # Array of all ancestors of the node.
+    def ancestors
+      if @parent
+        return [@parent] + @parent.ancestors
+      else
+        return []
+      end
+    end
+    # Get the attributes of the node as a hash.
+    def attributes
+      @attributes ||= {}
+    end
+    # Get the root element of the node tree.
+    def root
+      @parent ? @parent.root : self
+    end
+    # Get the full XPath of the node.
+    def path
+      unless @path
+        if @parent
+          @path = "#{@parent.path}/#{@name}"
+        else
+          @path = "/#{@name}"
+        end
+      end
+      return @path
+    end
+    # Get the value of the node attribute with the given name.
+    def [] (name)
+      return @attributes[name] if @attributes
+    end
+    # Set the value of the node attribute with the given name.
+    def []= (name, val)
+      attributes[name] = val
+    end
+    # Add a child node.
+    def add_child (node)
+      children << node
+      node.instance_variable_set(:@parent, self)
+    end
+    # Remove a child node.
+    def remove_child (node)
+      if @children
+        if @children.delete(node)
+          node.instance_variable_set(:@parent, nil)
+        end
+      end
+    end
+    # Get the first child node.
+    def first_child
+      @children.first if @children
+    end
+    # Find the first node that matches the given XPath. See Selector for details.
+    def find (selector)
+      select(selector).first
+    end
+    # Find all nodes that match the given XPath. See Selector for details.
+    def select (selector)
+      selector = selector.is_a?(Selector) ? selector : Selector.new(selector)
+      return selector.find(self)
+    end
+    # Append CDATA to the node value.
+    def append_cdata (text)
+      append(text, false)
+    end
+    # Append text to the node value. If strip_whitespace is true, whitespace at the beginning and end
+    # of the node value will be removed.
+    def append (text, strip_whitespace = true)
+      if text
+        @value ||= ''
+        @last_strip_whitespace = strip_whitespace
+        text = text.lstrip if @value.length == 0 and strip_whitespace
+        @value << text if text.length > 0
+      end
+    end
+    # Called after end tag to ensure that whitespace at the end of the string is properly stripped.
+    def finish! #:nodoc:
+      @value.rstrip! if @value and @last_strip_whitespace
+    end
+  end
+end
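
Note on append / append_cdata / finish!: the whitespace rules are easy to misread, so here is a minimal sketch that isolates them. `TextBuffer` is a hypothetical name. It copies only the value-handling logic from the hunk above and is not part of the gem.

```ruby
# Hypothetical class that isolates the value-handling logic of Node.
class TextBuffer
  attr_reader :value

  # Plain text: leading whitespace is dropped while the value is still empty.
  def append(text, strip_whitespace = true)
    return unless text
    @value ||= String.new
    @last_strip_whitespace = strip_whitespace
    text = text.lstrip if @value.empty? && strip_whitespace
    @value << text unless text.empty?
  end

  # CDATA: whitespace is always preserved.
  def append_cdata(text)
    append(text, false)
  end

  # Trailing whitespace is stripped only if the last chunk was plain text.
  def finish!
    @value.rstrip! if @value && @last_strip_whitespace
  end
end

plain = TextBuffer.new
plain.append("   ")
plain.append(" \t\r\nhello ")
plain.append(" there\n")
plain.finish!
p plain.value

mixed = TextBuffer.new
mixed.append_cdata("   ")
mixed.append(" \t\r\nhello ")
mixed.append_cdata(" there\n")
mixed.finish!
p mixed.value
```

The two cases mirror the last two examples in spec/node_spec.rb: interior whitespace is never collapsed, and a trailing CDATA chunk suppresses the final rstrip.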

data/lib/xml_node_stream/parser.rb ADDED

@@ -0,0 +1,71 @@
+require 'open-uri'
+require 'rubygems'
+require 'pathname'
+require File.expand_path(File.join(File.dirname(__FILE__), 'parser', 'base'))
+module XmlNodeStream
+  # The abstract parser class that wraps the actual parser implementation.
+  class Parser
+    SUPPORTED_PARSERS = [:nokogiri, :libxml, :rexml]
+    class << self
+      # Set the parser implementation. The parser argument should be one of :nokogiri, :libxml, or :rexml. If this method
+      # is not called, it will default to :rexml which is the slowest choice possible. If you set the parser to one of the
+      # other values, though, you'll need to make sure you have the nokogiri gem or libxml-ruby gem installed.
+      def parser_name= (parser)
+        parser_sym = parser.to_sym
+        raise ArgumentError.new("must be one of #{SUPPORTED_PARSERS.inspect}") unless SUPPORTED_PARSERS.include?(parser_sym)
+        @parser_name = parser_sym
+      end
+      # Get the name of the current parser.
+      def parser_name
+        @parser_name ||= :rexml
+      end
+      # Parse the document specified in io. This can be either a Stream, URI, Pathname, or String. If it is a String,
+      # it can be an XML document, a file system path, or a URI. The parser will figure it out. If a block is given,
+      # it will be yielded to with each node as it is parsed.
+      def parse (io, &block)
+        close_stream = false
+        if io.is_a?(String)
+          if io.include?('<') and io.include?('>')
+            io = StringIO.new(io)
+          else
+            io = open(io)
+          end
+          close_stream = true
+        elsif io.is_a?(Pathname)
+          io = io.open
+          close_stream = true
+        elsif io.is_a?(URI)
+          io = io.open
+          close_stream = true
+        end
+        begin
+          parser = parser_class(parser_name).new(&block)
+          parser.parse_stream(io)
+          return parser.root
+        ensure
+          io.close if close_stream
+        end
+      end
+      protected
+      def parser_class (class_symbol)
+        @loaded_parsers ||= {}
+        klass = @loaded_parsers[class_symbol]
+        unless klass
+          require File.expand_path(File.join(File.dirname(__FILE__), 'parser', "#{class_symbol}_parser"))
+          class_name = "#{class_symbol.to_s.capitalize}Parser"
+          klass = const_get(class_name)
+          @loaded_parsers[class_symbol] = klass
+        end
+        return klass
+      end
+    end
+  end
+end
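
Note on how Parser.parse decides what its argument is: the sketch below restates the branching as a small classifier. `input_kind` and the symbols it returns are hypothetical names for illustration. The checks themselves are the ones in the hunk above.

```ruby
require 'stringio'
require 'pathname'
require 'uri'

# Hypothetical helper: reports which branch Parser.parse would take.
def input_kind(io)
  case io
  when String
    # A string containing both angle brackets is treated as XML text;
    # anything else is handed to open as a path or URL.
    io.include?('<') && io.include?('>') ? :xml_text : :path_or_url
  when Pathname then :pathname
  when URI then :uri
  else :stream # used as-is and left open for the caller to close
  end
end

p input_kind('<books></books>')
p input_kind('/tmp/books.xml')
p input_kind(Pathname.new('/tmp/books.xml'))
p input_kind(URI.parse('http://test.host/test.xml'))
p input_kind(StringIO.new('<books/>'))
```

Two consequences follow from the heuristic: a file name that happens to contain both `<` and `>` would be parsed as XML text, and only streams that parse opened itself are closed afterwards. Separately, since Ruby 3.0 open-uri no longer makes Kernel#open accept URLs, so the URL-string branch as written appears to rely on older Ruby behaviour.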

data/lib/xml_node_stream/parser/base.rb ADDED

@@ -0,0 +1,40 @@
+module XmlNodeStream
+  class Parser
+    # This is the base parser syntax that normalizes the SAX callbacks by providing a common interface
+    # so that the actual parser implementation doesn't matter.
+    module Base
+      attr_reader :root
+      def initialize (&block)
+        @nodes = []
+        @parse_block = block
+        @root = nil
+      end
+      def parse_stream (io)
+        raise NotImplementedError.new("could not load gem")
+      end
+      def do_start_element (name, attributes)
+        node = XmlNodeStream::Node.new(name, @nodes.last, attributes)
+        @nodes.push(node)
+      end
+      def do_end_element (name)
+        node = @nodes.pop
+        node.finish!
+        @root = node if @nodes.empty?
+        @parse_block.call(node) if @parse_block
+      end
+      def do_characters (characters)
+        @nodes.last.append(characters) unless @nodes.empty?
+      end
+      def do_cdata_block (characters)
+        @nodes.last.append_cdata(characters) unless @nodes.empty?
+      end
+    end
+  end
+end
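
Note on the callback order produced by Base: nodes are pushed on start tags and handed to the block on end tags, so children always reach the application before their parents. The sketch below shows that with a stack of path strings. `PathBuilder` is a hypothetical miniature and tracks paths only, not real nodes.

```ruby
# Hypothetical miniature of the stack handling in Base.
class PathBuilder
  attr_reader :finished

  def initialize
    @stack = []     # open elements, innermost last
    @finished = []  # paths in the order the parse block would see them
  end

  def start_element(name)
    parent_path = @stack.last || ''
    @stack.push("#{parent_path}/#{name}")
  end

  def end_element
    @finished << @stack.pop
  end
end

builder = PathBuilder.new
builder.start_element('library')
builder.start_element('authors')
builder.start_element('author')
builder.end_element
builder.end_element
builder.start_element('collection')
builder.end_element
builder.end_element

puts builder.finished
```

This matches the ordering asserted in spec/parser_spec.rb, where /library is the last path yielded.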

data/lib/xml_node_stream/parser/libxml_parser.rb ADDED

@@ -0,0 +1,44 @@
+begin
+  require 'libxml'
+  module XmlNodeStream
+    class Parser
+      # Wrapper for the LibXML SAX parser.
+      class LibxmlParser
+        include LibXML::XML::SaxParser::Callbacks
+        include Base
+        def parse_stream (io)
+          parser = LibXML::XML::SaxParser.new
+          parser.callbacks = self
+          parser.io = io
+          parser.parse
+        end
+        def on_start_element (name, attributes)
+          do_start_element(name, attributes)
+        end
+        def on_end_element (name)
+          do_end_element(name)
+        end
+        def on_characters (characters)
+          do_characters(characters)
+        end
+        def on_cdata_block (characters)
+          do_cdata_block(characters)
+        end
+      end
+    end
+  end
+rescue LoadError
+  module XmlNodeStream
+    class Parser
+      class LibxmlParser
+        include Base
+      end
+    end
+  end
+end

data/lib/xml_node_stream/parser/nokogiri_parser.rb ADDED

@@ -0,0 +1,50 @@
+begin
+  require 'nokogiri'
+  module XmlNodeStream
+    class Parser
+      # Wrapper for the Nokogiri SAX parser.
+      class NokogiriParser
+        include Base
+        def parse_stream (io)
+          listener = Listener.new(self)
+          parser = Nokogiri::XML::SAX::Parser.new(listener)
+          parser.parse(io)
+        end
+        class Listener < Nokogiri::XML::SAX::Document
+          def initialize (parser)
+            @parser = parser
+          end
+          def start_element (name, attributes = [])
+            attributes_hash = {}
+            (attributes.size / 2).times{|i| attributes_hash[attributes[i * 2]] = attributes[(i * 2) + 1]}
+            @parser.do_start_element(name, attributes_hash)
+          end
+          def end_element (name)
+            @parser.do_end_element(name)
+          end
+          def characters (characters)
+            @parser.do_characters(characters)
+          end
+          def cdata_block (characters)
+            @parser.do_cdata_block(characters)
+          end
+        end
+      end
+    end
+  end
+rescue LoadError
+  module XmlNodeStream
+    class Parser
+      class NokogiriParser
+        include Base
+      end
+    end
+  end
+end
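
Note on the attribute conversion in Listener#start_element: the index arithmetic assumes the attributes arrive as a flat array of alternating names and values. A possibly more readable equivalent is sketched below. `attributes_to_hash` is a hypothetical helper name, and the flat-array input shape is taken from the hunk above rather than from any particular Nokogiri release.

```ruby
# Hypothetical helper: pairs up a flat [name, value, name, value, ...] array.
def attributes_to_hash(flat)
  # A trailing name without a value is ignored, as in the original loop.
  flat.each_slice(2).select { |pair| pair.size == 2 }.to_h
end

p attributes_to_hash(['isbn', '123456', 'lang', 'en'])
p attributes_to_hash([])
p attributes_to_hash(['isbn', '123456', 'dangling'])
```

If a parser version delivers attributes as name/value pairs instead of a flat array, neither this helper nor the original loop would produce the intended hash.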

data/lib/xml_node_stream/parser/rexml_parser.rb ADDED

@@ -0,0 +1,43 @@
+begin
+  require 'rexml/document'
+  require 'rexml/streamlistener'
+  module XmlNodeStream
+    class Parser
+      # Wrapper for the REXML SAX parser.
+      class RexmlParser
+        include REXML::StreamListener
+        include Base
+        def parse_stream (io)
+          parser = REXML::Parsers::StreamParser.new(io, self)
+          parser.parse
+        end
+        def tag_start (name, attributes)
+          do_start_element(name, attributes)
+        end
+        def tag_end (name)
+          do_end_element(name)
+        end
+        def text (content)
+          do_characters(content)
+        end
+        def cdata (content)
+          do_cdata_block(content)
+        end
+      end
+    end
+  end
+rescue LoadError
+  module XmlNodeStream
+    class Parser
+      class RexmlParser
+        include Base
+      end
+    end
+  end
+end

data/lib/xml_node_stream/selector.rb ADDED

@@ -0,0 +1,72 @@
+module XmlNodeStream
+  # Partial implementation of XPath selectors. Only abbreviated paths and the text() function are supported. The rest of XPath
+  # is unnecessary in the context of a Ruby application since XPath is also a programming language. If you really need an XPath
+  # function, chances are you can just do it in the Ruby code.
+  #
+  # Example selectors:
+  # * book - find all child book elements
+  # * book/author - find all author elements that are children of the book child elements
+  # * ../book - find all sibling book elements
+  # * */author - find all author elements that are children of any child elements
+  # * book//author - find all author elements that are descendants at any level of book child elements
+  # * .//author - find all author elements that are descendants of the current element
+  # * /library/books/book - find all book elements with the full path /library/books/book
+  # * author/text() - get the text values of all author child elements
+  class Selector
+    # Create a selector. Path should be an abbreviated XPath string.
+    def initialize (path)
+      @parts = []
+      path.gsub('//', '/%/').split('/').each do |part_path|
+        part_matchers = []
+        @parts << part_matchers
+        or_paths = part_path.split('|')
+        or_paths << "" if or_paths.empty?
+        or_paths.each do |matcher_path|
+          part_matchers << Matcher.new(matcher_path)
+        end
+      end
+    end
+    # Apply the selector to the current node. Note, if your path started with a /, it will be applied
+    # to the root node.
+    def find (node)
+      matched = [node]
+      @parts.each do |part_matchers|
+        context = matched
+        matched = []
+        part_matchers.each do |matcher|
+          matched.concat(matcher.select(context))
+        end
+        break if matched.empty?
+      end
+      return matched
+    end
+    # Match a partial path to a node.
+    class Matcher
+      def initialize (path)
+        case path
+        when 'text()'
+          @extractor = lambda{|node| node.value}
+        when '%'
+          @extractor = lambda{|node| node.descendants}
+        when '*'
+          @extractor = lambda{|node| node.children}
+        when '.'
+          @extractor = lambda{|node| node}
+        when '..'
+          @extractor = lambda{|node| node.parent ? node.parent : []}
+        when ''
+          @extractor = lambda{|node| root = Node.new(nil); root.children << node.root; root}
+        else
+          @extractor = lambda{|node| node.children.select{|child| child.name == path}}
+        end
+      end
+      # Select all nodes that match a partial path.
+      def select (context_nodes)
+        context_nodes.collect{|node| @extractor.call(node) if node.is_a?(Node)}.flatten
+      end
+    end
+  end
+end
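
Note on how Selector#initialize tokenizes a path: the `//` shorthand is rewritten to a `%` step before splitting, and each step may hold several `|` alternatives. The sketch below reproduces only that tokenizing step. `selector_parts` is a hypothetical helper that returns the raw strings instead of Matcher objects.

```ruby
# Hypothetical helper mirroring the string handling in Selector#initialize.
def selector_parts(path)
  path.gsub('//', '/%/').split('/').map do |part|
    alternatives = part.split('|')
    # A leading "/" yields an empty step, which the Matcher maps to the root.
    alternatives << '' if alternatives.empty?
    alternatives
  end
end

p selector_parts('book//author')
p selector_parts('/library/books/book')
p selector_parts('title|alternate_title')
p selector_parts('author/text()')
```

Each inner array corresponds to one step of the path; Selector#find applies the steps left to right, feeding the matches of one step in as the context of the next.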

data/spec/node_spec.rb ADDED

@@ -0,0 +1,140 @@
+require File.expand_path(File.join(File.dirname(__FILE__), 'spec_helper'))
+describe XmlNodeStream::Node do
+  it "should have a name" do
+    node = XmlNodeStream::Node.new("tag")
+    node.name.should == "tag"
+  end
+  it "should have attributes" do
+    node = XmlNodeStream::Node.new("tag")
+    node.attributes.should == {}
+    node["attr1"].should == nil
+    node = XmlNodeStream::Node.new("tag", nil, "attr1" => "val1", "attr2" => "val2")
+    node.attributes.should == {"attr1" => "val1", "attr2" => "val2"}
+    node["attr1"].should == "val1"
+  end
+  it "should have a value" do
+    node = XmlNodeStream::Node.new("tag")
+    node.value.should == nil
+    node = XmlNodeStream::Node.new("tag", nil, nil, "value")
+    node.value.should == "value"
+  end
+  it "should have a parent and children" do
+    parent = XmlNodeStream::Node.new("tag")
+    parent.parent.should == nil
+    parent.children.should == []
+    child_1 = XmlNodeStream::Node.new("child", parent)
+    child_2 = XmlNodeStream::Node.new("child")
+    parent.add_child(child_2)
+    parent.children.should == [child_1, child_2]
+    child_1.parent.should == parent
+    child_2.parent.should == parent
+  end
+  it "should be able to remove children" do
+    parent = XmlNodeStream::Node.new("tag")
+    child_1 = XmlNodeStream::Node.new("child", parent)
+    child_2 = XmlNodeStream::Node.new("child", parent)
+    parent.children.should == [child_1, child_2]
+    parent.remove_child(child_1)
+    parent.children.should == [child_2]
+    child_1.parent.should == nil
+  end
+  it "should release itself from its parent" do
+    parent = XmlNodeStream::Node.new("tag")
+    child_1 = XmlNodeStream::Node.new("child", parent)
+    child_2 = XmlNodeStream::Node.new("child", parent)
+    parent.children.should == [child_1, child_2]
+    child_1.release!
+    parent.children.should == [child_2]
+    child_1.parent.should == nil
+  end
+  it "should have ancestors" do
+    parent = XmlNodeStream::Node.new("tag")
+    child = XmlNodeStream::Node.new("child", parent)
+    grandchild = XmlNodeStream::Node.new("grandchild", child)
+    parent.ancestors.should == []
+    child.ancestors.should == [parent]
+    grandchild.ancestors.should == [child, parent]
+  end
+  it "should have descendants" do
+    parent = XmlNodeStream::Node.new("tag")
+    child_1 = XmlNodeStream::Node.new("child", parent)
+    child_2 = XmlNodeStream::Node.new("child", parent)
+    grandchild_1 = XmlNodeStream::Node.new("grandchild", child_1)
+    grandchild_2 = XmlNodeStream::Node.new("grandchild", child_1)
+    parent.descendants.should == [child_1, child_2, grandchild_1, grandchild_2]
+    child_1.descendants.should == [grandchild_1, grandchild_2]
+    grandchild_1.descendants.should == []
+  end
+  it "should have a root node" do
+    parent = XmlNodeStream::Node.new("tag")
+    child = XmlNodeStream::Node.new("child", parent)
+    grandchild = XmlNodeStream::Node.new("grandchild", child)
+    parent.root.should == parent
+    child.root.should == parent
+    grandchild.root.should == parent
+  end
+  it "should have a path" do
+    parent = XmlNodeStream::Node.new("tag")
+    child = XmlNodeStream::Node.new("child", parent)
+    grandchild = XmlNodeStream::Node.new("grandchild", child)
+    parent.path.should == "/tag"
+    child.path.should == "/tag/child"
+    grandchild.path.should == "/tag/child/grandchild"
+  end
+  it "should be able to select related nodes using a selector" do
+    parent = XmlNodeStream::Node.new("tag")
+    child_1 = XmlNodeStream::Node.new("child", parent)
+    child_2 = XmlNodeStream::Node.new("child", parent)
+    grandchild_1 = XmlNodeStream::Node.new("grandchild", child_1, nil, "val1")
+    grandchild_2 = XmlNodeStream::Node.new("grandchild", child_1, nil, "val2")
+    parent.select("nothing").should == []
+    parent.select("child").should == [child_1, child_2]
+    parent.select("child/grandchild").should == [grandchild_1, grandchild_2]
+    parent.select("child/grandchild/text()").should == ["val1", "val2"]
+    grandchild_1.select("../..").should == [parent]
+  end
+  it "should be able to find the first related node using a selector" do
+    parent = XmlNodeStream::Node.new("tag")
+    child_1 = XmlNodeStream::Node.new("child", parent)
+    child_2 = XmlNodeStream::Node.new("child", parent)
+    grandchild_1 = XmlNodeStream::Node.new("grandchild", child_1, nil, "val1")
+    grandchild_2 = XmlNodeStream::Node.new("grandchild", child_1, nil, "val2")
+    parent.find("nothing").should == nil
+    parent.find("child").should == child_1
+    parent.find("child/grandchild").should == grandchild_1
+    parent.find("child/grandchild/text()").should == "val1"
+    grandchild_1.find("../..").should == parent
+  end
+  it "should append text which strips whitespace from the start and end of the value" do
+    node = XmlNodeStream::Node.new("tag")
+    node.append("   ")
+    node.append(" \t\r\nhello ")
+    node.append(" there\n")
+    node.finish!
+    node.value.should == "hello  there"
+  end
+  it "should append cdata which preserves all whitespace" do
+    node = XmlNodeStream::Node.new("tag")
+    node.append_cdata("   ")
+    node.append(" \t\r\nhello ")
+    node.append_cdata(" there\n")
+    node.finish!
+    node.value.should == "    \t\r\nhello  there\n"
+  end
+end

data/spec/parser_spec.rb ADDED

@@ -0,0 +1,148 @@
+require File.expand_path(File.join(File.dirname(__FILE__), 'spec_helper'))
+describe XmlNodeStream::Parser do
+  before :each do
+    @text_xml_path = File.expand_path(File.join(File.dirname(__FILE__), 'test.xml'))
+  end
+  it "should parse a document in a string" do
+    validate_text_xml(XmlNodeStream::Parser.parse(File.read(@text_xml_path)))
+  end
+  it "should parse a document in a file path string" do
+    validate_text_xml(XmlNodeStream::Parser.parse(@text_xml_path))
+  end
+  it "should parse a document in a file path" do
+    validate_text_xml(XmlNodeStream::Parser.parse(Pathname.new(@text_xml_path)))
+  end
+  it "should parse a document in a url string" do
+    uri = URI.parse("http://test.host/test.xml")
+    URI.should_receive(:parse).with("http://test.host/test.xml").and_return(uri)
+    File.open(@text_xml_path) do |stream|
+      uri.should_receive(:open).and_return(stream)
+      validate_text_xml(XmlNodeStream::Parser.parse("http://test.host/test.xml"))
+    end
+  end
+  it "should parse a document in a URI" do
+    uri = URI.parse("http://test.host/test.xml")
+    stream = mock(:stream)
+    File.open(@text_xml_path) do |stream|
+      uri.should_receive(:open).and_return(stream)
+      validate_text_xml(XmlNodeStream::Parser.parse(uri))
+    end
+  end
+  it "should parse a document in a stream" do
+    io = StringIO.new(File.read(@text_xml_path))
+    io.should_not_receive(:close)
+    validate_text_xml(XmlNodeStream::Parser.parse(io))
+  end
+  it "should call a block with each element in a document" do
+    nodes = []
+    XmlNodeStream::Parser.parse(@text_xml_path) do |node|
+      nodes << node.path
+    end
+    nodes.should == %w(
+      /library/authors/author/name
+      /library/authors/author
+      /library/authors/author/name
+      /library/authors/author
+      /library/authors/author/name
+      /library/authors/author
+      /library/authors
+      /library/collection/section/book/title
+      /library/collection/section/book/author
+      /library/collection/section/book/abstract
+      /library/collection/section/book/volumes
+      /library/collection/section/book
+      /library/collection/section
+      /library/collection/section/book/title
+      /library/collection/section/book/author
+      /library/collection/section/book/abstract
+      /library/collection/section/book
+      /library/collection/section/book/title
+      /library/collection/section/book/author
+      /library/collection/section/book/abstract
+      /library/collection/section/book
+      /library/collection/section/book/title
+      /library/collection/section/book/alternate_title
+      /library/collection/section/book/author
+      /library/collection/section/book/abstract
+      /library/collection/section/book
+      /library/collection/section
+      /library/collection
+      /library
+    )
+  end
+  XmlNodeStream::Parser::SUPPORTED_PARSERS.each do |parser_name|
+    context "with #{parser_name}" do
+      before :all do
+        @save_parser_name = XmlNodeStream::Parser.parser_name
+        XmlNodeStream::Parser.parser_name = parser_name
+      end
+      after :all do
+        XmlNodeStream::Parser.parser_name = @save_parser_name
+      end
+      it "should parse a document" do
+        begin
+          validate_text_xml(XmlNodeStream::Parser.parse(@text_xml_path))
+        rescue NotImplementedError
+          pending("#{parser_name} is not installed for testing")
+        end
+      end
+    end
+  end
+  def validate_text_xml (root)
+    validate(root, :name => "library", :children => ["authors", "collection"])
+    validate(root.children[0], :name => "authors", :children => ["author"] * 3)
+    validate(root.children[0].children[0], :name => "author", :attributes => {"id" => "1"}, :children => ["name"])
+    validate(root.children[0].children[0].children[0], :name => "name", :value => "Edward Gibbon")
+    validate(root.children[0].children[1], :name => "author", :attributes => {"id" => "2"}, :children => ["name"])
+    validate(root.children[0].children[1].children[0], :name => "name", :value => "Herman Melville")
+    validate(root.children[0].children[2], :name => "author", :attributes => {"id" => "3"}, :children => ["name"])
+    validate(root.children[0].children[2].children[0], :name => "name", :value => "Jack London")
+    validate(root.children[1], :name => "collection", :children => ["section"] * 2)
+    history = root.children[1].children[0]
+    fiction = root.children[1].children[1]
+    validate(history, :name => "section", :attributes => {"id" => "100", "name" => "History"}, :children => ["book"])
+    validate(history.children[0], :name => "book", :attributes => {"id" => "1"}, :children => ["title", "author", "abstract", "volumes"])
+    validate(history.children[0].children[0], :name => "title", :value => "The Decline & Fall of the Roman Empire")
+    validate(history.children[0].children[1], :name => "author", :value => nil, :attributes => {"id" => "1"})
+    validate(history.children[0].children[2], :name => "abstract", :value => "History of the fall of Rome.")
+    validate(history.children[0].children[3], :name => "volumes", :value => "6")
+    validate(fiction, :name => "section", :attributes => {"id" => "200", "name" => "Fiction"}, :children => ["book"] * 3)
+    validate(fiction.children[0], :name => "book", :attributes => {"id" => "2"}, :children => ["title", "author", "abstract"])
+    validate(fiction.children[0].children[0], :name => "title", :value => "Call of the Wild")
+    validate(fiction.children[0].children[1], :name => "author", :value => nil, :attributes => {"id" => "3"})
+    validate(fiction.children[0].children[2], :name => "abstract", :value => "\n          A dog goes to Alaska.\n        ")
+    validate(fiction.children[1], :name => "book", :attributes => {"id" => "3"}, :children => ["title", "author", "abstract"])
+    validate(fiction.children[1].children[0], :name => "title", :value => "White Fang")
+    validate(fiction.children[1].children[1], :name => "author", :value => nil, :attributes => {"id" => "3"})
+    validate(fiction.children[1].children[2], :name => "abstract", :value => "Dogs, wolves, etc.")
+    validate(fiction.children[2], :name => "book", :attributes => {"id" => "4"}, :children => ["title", "alternate_title", "author", "abstract"])
+    validate(fiction.children[2].children[0], :name => "title", :value => "Moby Dick")
+    validate(fiction.children[2].children[1], :name => "alternate_title", :value => "The Whale")
+    validate(fiction.children[2].children[2], :name => "author", :value => nil, :attributes => {"id" => "2"})
+    validate(fiction.children[2].children[3], :name => "abstract", :value => "A mad captain seeks a mysterious white whale.")
+  end
+  def validate(node, options)
+    node.name.should == options[:name]
+    node.attributes.should == (options[:attributes] || {})
+    node.value.should == (options.include?(:value) ? options[:value] : "")
+    node.children.collect{|c| c.name}.should == (options[:children] || [])
+  end
+end
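
The `validate` helper above falls back to an empty string only when `:value` is omitted, so an explicit `:value => nil` (used for the self-closing `author` elements) is still compared against `nil`. A minimal sketch of why `include?` is used there instead of `||`; the method names `expected_value` and `expected_value_with_or` are hypothetical and exist only for this illustration:

```ruby
# Hypothetical helpers that isolate the defaulting logic used in validate.

# Mirrors the spec: fall back to "" only when :value was not supplied at all.
def expected_value(options)
  options.include?(:value) ? options[:value] : ""
end

# Naive alternative: || also replaces an explicit nil with "".
def expected_value_with_or(options)
  options[:value] || ""
end

p expected_value(:name => "author", :value => nil)         # prints nil
p expected_value(:name => "book")                          # prints ""
p expected_value_with_or(:name => "author", :value => nil) # prints ""
```

With the `||` form, the assertions for empty elements could not distinguish a `nil` value from an empty string.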

data/spec/selector_spec.rb ADDED

@@ -0,0 +1,73 @@
+require File.expand_path(File.join(File.dirname(__FILE__), 'spec_helper'))
+describe XmlNodeStream::Selector do
+  before :each do
+    @root = XmlNodeStream::Node.new("root")
+    @child_1 = XmlNodeStream::Node.new("child", @root)
+    @child_2 = XmlNodeStream::Node.new("child", @root)
+    @grandchild_1 = XmlNodeStream::Node.new("grandchild", @child_1, nil, "val1")
+    @grandchild_2 = XmlNodeStream::Node.new("grandchild", @child_1, nil, "val2")
+    @grandchild_3 = XmlNodeStream::Node.new("grandchild", @child_2, nil, "val3")
+    @grandchild_4 = XmlNodeStream::Node.new("grandchild", @child_2, nil, "val4")
+    @great_grandchild = XmlNodeStream::Node.new("grandchild", @grandchild_1, nil, "val1.a")
+  end
+  it "should find child nodes with a specified name" do
+    selector = XmlNodeStream::Selector.new("child")
+    selector.find(@root).should == [@child_1, @child_2]
+    selector = XmlNodeStream::Selector.new("./child")
+    selector.find(@root).should == [@child_1, @child_2]
+    selector = XmlNodeStream::Selector.new("nothing")
+    selector.find(@root).should == []
+    selector.find(@child_1).should == []
+  end
+  it "should find descendant nodes with a specified name" do
+    selector = XmlNodeStream::Selector.new(".//grandchild")
+    selector.find(@root).should == [@grandchild_1, @grandchild_2, @grandchild_3, @grandchild_4, @great_grandchild]
+    selector.find(@child_1).should == [@great_grandchild]
+    selector.find(@child_2).should == []
+  end
+  it "should find child nodes in a specified hierarchy" do
+    selector = XmlNodeStream::Selector.new("child/grandchild")
+    selector.find(@root).should == [@grandchild_1, @grandchild_2, @grandchild_3, @grandchild_4]
+    selector = XmlNodeStream::Selector.new("child/nothing")
+    selector.find(@root).should == []
+    selector.find(@child_1).should == []
+  end
+  it "should find a node itself" do
+    selector = XmlNodeStream::Selector.new(".")
+    selector.find(@child_1).should == [@child_1]
+  end
+  it "should find a parent node" do
+    selector = XmlNodeStream::Selector.new("..")
+    selector.find(@child_1).should == [@root]
+    selector.find(@root).should == []
+  end
+  it "should find a node's value" do
+    selector = XmlNodeStream::Selector.new("text()")
+    selector.find(@child_1).should == [nil]
+    selector.find(@grandchild_1).should == ["val1"]
+    selector = XmlNodeStream::Selector.new("child/grandchild/text()")
+    selector.find(@root).should == ["val1", "val2", "val3", "val4"]
+  end
+  it "should allow wildcards in the hierarchy" do
+    selector = XmlNodeStream::Selector.new("*/grandchild")
+    selector.find(@root).should == [@grandchild_1, @grandchild_2, @grandchild_3, @grandchild_4]
+    selector.find(@child_1).should == [@great_grandchild]
+    selector.find(@child_2).should == []
+  end
+  it "should find using full paths" do
+    selector = XmlNodeStream::Selector.new("/root/child")
+    selector.find(@root).should == [@child_1, @child_2]
+    selector.find(@grandchild_1).should == [@child_1, @child_2]
+  end
+end
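
The selector spec above exercises a small XPath-like path syntax (`child`, `.`, `..`, `*`, `text()`, and absolute paths such as `/root/child`). The sketch below shows one way such step-by-step matching could work. It is not the gem's `XmlNodeStream::Selector` implementation: `MiniNode` and `mini_find` are hypothetical names, the `MiniNode` constructor takes `(name, parent, value)` rather than the gem's arguments, and descendant steps (`//`) are intentionally not handled.

```ruby
# Hypothetical stand-in for a tree node; NOT the gem's XmlNodeStream::Node.
class MiniNode
  attr_reader :name, :parent, :value, :children

  def initialize(name, parent = nil, value = nil)
    @name = name
    @parent = parent
    @value = value
    @children = []
    parent.children << self if parent
  end

  # Follow parent links up to the top of the tree.
  def root
    parent ? parent.root : self
  end
end

# Walk a slash-separated path one step at a time, carrying the current
# matches forward. Descendant steps (//) are deliberately left out.
def mini_find(context, path)
  raise ArgumentError, "// is not covered by this sketch" if path.include?("//")

  steps = path.split("/").reject(&:empty?)
  nodes = [context]
  if path.start_with?("/")
    # An absolute path restarts at the root, whose name must match step one.
    root = context.root
    first = steps.shift
    nodes = (first == "*" || first == root.name) ? [root] : []
  end

  steps.each do |step|
    nodes =
      case step
      when "."      then nodes
      when ".."     then nodes.map(&:parent).compact
      when "text()" then nodes.map(&:value)
      when "*"      then nodes.flat_map(&:children)
      else nodes.flat_map { |n| n.children.select { |c| c.name == step } }
      end
  end
  nodes
end

root    = MiniNode.new("root")
child_1 = MiniNode.new("child", root)
child_2 = MiniNode.new("child", root)
MiniNode.new("grandchild", child_1, "val1")
MiniNode.new("grandchild", child_2, "val2")

p mini_find(root, "child/grandchild/text()")     # prints ["val1", "val2"]
p mini_find(child_1, "/root/child").map(&:name)  # prints ["child", "child"]
```

Each step maps the current list of matches to a new list, which is why `..` on the root yields an empty result rather than an error.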

data/spec/spec_helper.rb ADDED

@@ -0,0 +1,3 @@
+require 'rubygems'
+require 'spec'
+require File.expand_path(File.join(File.dirname(__FILE__), '..', 'lib', 'xml_node_stream'))

data/spec/test.xml ADDED

@@ -0,0 +1,57 @@
+<?xml version="1.0"?>
+<library>
+  <?library-info version="2.0" ignore="yes" ?>
+  <!-- Authors -->
+  <authors>
+    <author id="1">
+      <name>Edward Gibbon</name>
+    </author>
+    <author id="2">
+      <name>Herman Melville</name>
+    </author>
+    <author id="3">
+      <name>Jack London</name>
+    </author>
+  </authors>
+  <!-- Books -->
+  <collection>
+    <section id="100" name="History">
+      <book id="1">
+        <title>
+          The Decline &amp; Fall of the Roman Empire
+        </title>
+        <author id="1"/>
+        <abstract><![CDATA[History of the fall of Rome.]]></abstract>
+        <volumes>6</volumes>
+      </book>
+    </section>
+    <section id="200" name="Fiction">
+      <book id="2">
+        <title>
+          Call of the Wild
+        </title>
+        <author id="3"/>
+        <abstract><![CDATA[
+          A dog goes to Alaska.
+        ]]></abstract>
+      </book>
+      <book id="3">
+        <title>
+          White Fang
+        </title>
+        <author id="3"/>
+        <abstract><![CDATA[Dogs, wolves, etc.]]></abstract>
+      </book>
+      <book id="4">
+        <title>
+          Moby Dick
+        </title>
+        <alternate_title>
+          The Whale
+        </alternate_title>
+        <author id="2"/>
+        <abstract><![CDATA[A mad captain seeks a mysterious white whale.]]></abstract>
+      </book>
+    </section>
+  </collection>
+</library>

data/spec/xml_node_stream_spec.rb ADDED

@@ -0,0 +1,11 @@
+require File.expand_path(File.join(File.dirname(__FILE__), 'spec_helper'))
+describe XmlNodeStream do
+  it "should parse a document using the Parser.parse method" do
+    block = lambda{}
+    XmlNodeStream::Parser.should_receive(:parse).with("<xml/>", &block)
+    XmlNodeStream.parse("<xml/>", &block)
+  end
+end

data/xml_node_stream.gemspec ADDED

@@ -0,0 +1,68 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+Gem::Specification.new do |s|
+  s.name = %q{xml_node_stream}
+  s.version = "1.0.0"
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Brian Durand"]
+  s.date = %q{2010-02-07}
+  s.email = %q{brian@embellishedvisions.com}
+  s.extra_rdoc_files = [
+    "README.rdoc"
+  ]
+  s.files = [
+    "MIT_LICENSE",
+     "README.rdoc",
+     "Rakefile",
+     "VERSION",
+     "init.rb",
+     "lib/xml_node_stream.rb",
+     "lib/xml_node_stream/node.rb",
+     "lib/xml_node_stream/parser.rb",
+     "lib/xml_node_stream/parser/base.rb",
+     "lib/xml_node_stream/parser/libxml_parser.rb",
+     "lib/xml_node_stream/parser/nokogiri_parser.rb",
+     "lib/xml_node_stream/parser/rexml_parser.rb",
+     "lib/xml_node_stream/selector.rb",
+     "spec/node_spec.rb",
+     "spec/parser_spec.rb",
+     "spec/selector_spec.rb",
+     "spec/spec_helper.rb",
+     "spec/test.xml",
+     "spec/xml_node_stream_spec.rb",
+     "xml_node_stream.gemspec"
+  ]
+  s.homepage = %q{http://github.com/bdurand/xml_node_stream}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.5}
+  s.summary = %q{Simple XML parser wrapper that provides the benefits of stream parsing with the ease of using document nodes.}
+  s.test_files = [
+    "spec/node_spec.rb",
+     "spec/parser_spec.rb",
+     "spec/selector_spec.rb",
+     "spec/spec_helper.rb",
+     "spec/xml_node_stream_spec.rb"
+  ]
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+    if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
+      s.add_development_dependency(%q<rspec>, [">= 1.2.9"])
+      s.add_development_dependency(%q<jeweler>, [">= 0"])
+    else
+      s.add_dependency(%q<rspec>, [">= 1.2.9"])
+      s.add_dependency(%q<jeweler>, [">= 0"])
+    end
+  else
+    s.add_dependency(%q<rspec>, [">= 1.2.9"])
+    s.add_dependency(%q<jeweler>, [">= 0"])
+  end
+end
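
The generated conditional above registers `rspec` and `jeweler` as development dependencies when RubyGems is 1.2.0 or newer, and as ordinary dependencies otherwise. As a rough sketch, assuming a RubyGems version that supports `add_development_dependency`, a hand-written spec could declare the same thing without the version check. This is an illustration only, not the file jeweler generates:

```ruby
require "rubygems"

# Sketch only: a hand-maintained spec with the same development dependencies.
spec = Gem::Specification.new do |s|
  s.name    = "xml_node_stream"
  s.version = "1.0.0"
  s.authors = ["Brian Durand"]
  s.summary = "Simple XML parser wrapper that provides the benefits of " \
              "stream parsing with the ease of using document nodes."
  s.add_development_dependency "rspec", ">= 1.2.9"
  s.add_development_dependency "jeweler", ">= 0"
end

# List what was registered; nothing here is a runtime dependency.
spec.development_dependencies.each do |dep|
  puts "#{dep.name} (#{dep.requirement})"
end
```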

metadata ADDED

@@ -0,0 +1,97 @@
+--- !ruby/object:Gem::Specification
+name: xml_node_stream
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Brian Durand
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2010-02-07 00:00:00 -06:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rspec
+  type: :development
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.2.9
+    version:
+- !ruby/object:Gem::Dependency
+  name: jeweler
+  type: :development
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: "0"
+    version:
+description:
+email: brian@embellishedvisions.com
+executables: []
+extensions: []
+extra_rdoc_files:
+- README.rdoc
+files:
+- MIT_LICENSE
+- README.rdoc
+- Rakefile
+- VERSION
+- init.rb
+- lib/xml_node_stream.rb
+- lib/xml_node_stream/node.rb
+- lib/xml_node_stream/parser.rb
+- lib/xml_node_stream/parser/base.rb
+- lib/xml_node_stream/parser/libxml_parser.rb
+- lib/xml_node_stream/parser/nokogiri_parser.rb
+- lib/xml_node_stream/parser/rexml_parser.rb
+- lib/xml_node_stream/selector.rb
+- spec/node_spec.rb
+- spec/parser_spec.rb
+- spec/selector_spec.rb
+- spec/spec_helper.rb
+- spec/test.xml
+- spec/xml_node_stream_spec.rb
+- xml_node_stream.gemspec
+has_rdoc: true
+homepage: http://github.com/bdurand/xml_node_stream
+licenses: []
+post_install_message:
+rdoc_options:
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+rubyforge_project:
+rubygems_version: 1.3.5
+signing_key:
+specification_version: 3
+summary: Simple XML parser wrapper that provides the benefits of stream parsing with the ease of using document nodes.
+test_files:
+- spec/node_spec.rb
+- spec/parser_spec.rb
+- spec/selector_spec.rb
+- spec/spec_helper.rb
+- spec/xml_node_stream_spec.rb