RubyGems - axml - Versions diffs - 0.0.3 → 0.0.4 - Mend

axml 0.0.3 → 0.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

data/README CHANGED

@@ -1,30 +1,27 @@
 AXML
 ====
-AXML - Provides a simple DOM for working with XML (using XMLParser under the
-hood) that can serve as a drop in replacement for a subset of basic libxml
-functionality (e.g., each, children, child, find_first, find, next).
+AXML - Provides a simple, minimalistic DOM for working with data stored in an
+XML document.  The API is very similar to LibXML, differing slightly in the
+handling of text nodes.  It is designed with very large documents in mind: nodes are represented as memory-efficient Struct objects, and it works with either XMLParser or LibXML!
-'AXML' means 'ax XML' which succinctly describes the occasional feeling of a
-programmer towards XML or its myriad parsers.  AXML won't solve all your
-problems, but it does make working with XML much less painful.
+'AXML' literally translates into 'ax XML' which succinctly describes the
+occasional feeling of a programmer towards XML or its myriad parsers.  AXML
+won't solve all your XML woes, but it does make working with XML much less
+painful.
 Features
 --------
-* *fast*: it's implemented in XMLParser (expat under the hood)
-* *lean*: as in 'lines of code' (~220 w/ blank lines) and as in 'memory consumption' (nodes implemented as Struct, children in Array)
+* *fast*: runs on either XMLParser or LibXML
+* *lean*: as in 'lines of code' and as in 'memory consumption' (nodes implemented as Struct, children in Array)
 * *easy to extend*: code your Grandmother could read and understand (if she reads ruby)
-* *quacks like libxml*: implements a very useful subset of libxml methods for near drop in replacement.
+* *quacks like libxml*: implements a useful subset of libxml methods for near drop-in replacement.
 Examples
 --------
-    require 'axml'  # currently requires 'xmlparser' be installed
-                    # Windows: already in one-click-installer
-                    # Ubuntu: sudo apt-get install libxml-parser-ruby1.8
-                    # Cygwin: see http://mspire.rubyforge.org/tutorial/cygwin_mspire.html
+    require 'axml'
     # a little example xml string to use
     string_or_io = "
@@ -39,18 +36,24 @@ Examples
     </n1>
     "
-### Read a string or io
+### Read a string, io, or file
     n1_node = AXML.parse(string_or_io)
-### Read a file
-    n1_node = AXML.parse_file('path/to/file')
+    # --or--
+    n1_node = AXML.parse('path/to/file')
 ### Access children
     n1_node.children # -> [array]
-    n1_node.each {|child|  # do something with child }
+    n1_node.each do |child|
+      # do something with each child
+    end
+### Traverse the whole tree structure
+    n1_node.traverse do |node|
+      # pre traversal
+    end
+    n1_node.traverse(:post) do |node|
+      # post traversal
+    end
 ### Get attributes and text
@@ -59,7 +62,7 @@ Examples
     n3_node.text    # -> 'words here'
     n3_node.content # -> [same]
-### Traverse nodes with next and child
+### Navigate nodes
     n2_node = n1_node.child
     the_other_n2_node = n2_node.next
@@ -71,26 +74,12 @@ Examples
     n3_node = n1_node.find_first('descendant::n3')
     other_n3_node = n3_node.find_first('following-sibling::n3')
     n1_node.find_first('child::n3')    # -> nil
+    # also callable as find_first_child and find_first_descendant
     # find (returns an array)
-    n1_node.find('descendant::n3')     # -> [array of all 3 <n3> nodes]
     n1_node.find('child::n2')          # -> [array of 2 <n2> nodes]
-### Switch to libxml
-This is all it takes to get all of the above code to work under libxml:
-    require 'xml/libxml'   # instead of:  require 'axml'
-    # A file
-    REPLACE: n1_node = AXML.parse_file(file)
-    WITH:    n1_node = XML::Document.file(file).root          # note the .root call on the end!
-    # A string
-    REPLACE: n1_node = AXML.parse(string)
-    WITH:    n1_node = XML::Parser.string(string).parse.root  # note the .root call on the end!
-Wallah!  All the above method calls work under libxml
+    n1_node.find('descendant::n3')     # -> [array of all 3 <n3> nodes]
+    # also callable as find_child and find_descendant
 See `specs/axml_spec.rb` for more examples and functionality
@@ -107,3 +96,10 @@ Installation
 ------------
     gem install axml
+See Also
+--------
+If you are parsing HTML or complex word-processing documents, this is not the parser for you.  Try something like hpricot or LibXML.
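
The navigation calls in the README (`traverse`, `find`, `find_first`) can be illustrated with a minimal sketch. It assumes the semantics of the 0.0.3 `AXML::El` code removed from `lib/axml.rb` further down; the `Node` struct, the `node` helper, and the small tree are hypothetical stand-ins rather than the gem's API, and 0.0.4 may differ in details.

```ruby
# Hypothetical stand-in for AXML::El: a Struct-based node with pre/post
# traversal and "axis::name" lookups, modeled on the 0.0.3 code.
Node = Struct.new(:parent, :name, :attrs, :text, :children) do
  # Attach a child node and record its parent.
  def add_node(node)
    node.parent = self
    children.push(node)
    node
  end

  # :pre yields a node before its children; :post yields it after them.
  def traverse(type = :pre, &block)
    block.call(self) if type == :pre
    children.each { |child| child.traverse(type, &block) }
    block.call(self) if type == :post
  end

  # Accepts "child::<name>" or "descendant::<name>"; returns an Array.
  def find(string)
    axis, name = string.split('::')
    case axis
    when 'child'
      children.select { |c| c.name == name }
    when 'descendant'
      found = []
      children.each { |c| c.traverse { |n| found << n if n.name == name } }
      found
    else
      raise ArgumentError, "unsupported axis: #{axis}"
    end
  end

  # First match, or nil when nothing matches.
  def find_first(string)
    find(string).first
  end
end

# Hypothetical helper: an empty element with the given name.
def node(name)
  Node.new(nil, name, {}, '', [])
end

# Made-up tree: <n1> with two <n2> children holding three <n3> in total.
n1 = node('n1')
n2a = n1.add_node(node('n2'))
n2b = n1.add_node(node('n2'))
n2a.add_node(node('n3'))
n2b.add_node(node('n3'))
n2b.add_node(node('n3'))

pre = []
n1.traverse { |n| pre << n.name }
post = []
n1.traverse(:post) { |n| post << n.name }

p pre                             # ["n1", "n2", "n3", "n2", "n3", "n3"]
p post                            # ["n3", "n2", "n3", "n3", "n2", "n1"]
p n1.find('descendant::n3').size  # 3
p n1.find('child::n2').size       # 2
p n1.find_first('child::n3')      # nil
```

Unlike the removed 0.0.3 code, which silently returned nil for an unknown axis, this sketch raises an ArgumentError so that typos in the axis name surface immediately.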

data/Rakefile CHANGED

@@ -2,9 +2,9 @@ require 'rake'
 require 'rubygems'
 require 'rake/rdoctask'
 require 'rake/gempackagetask'
+require 'rake/testtask'
 require 'rake/clean'
 require 'fileutils'
-#require 'spec/rake/spectask'
 require 'email_encrypt'
 ###############################################
@@ -59,7 +59,7 @@ task :html_docs do
     # add contact info:
     index.puts '<h2>Contact</h2>'
-    index.puts 'jprince@icmb.utexas.edu'.email_encrypt
+    index.puts 'jtprince@gmail.com'.email_encrypt
     index.puts '</body></html>'
   end
@@ -75,64 +75,20 @@ end
 # TESTS
 ###############################################
-task :ensure_gem_is_uninstalled do
-  reply = `#{$gemcmd} list -l #{NAME}`
-  if reply.include? NAME + " ("
-    puts "GOING to uninstall gem '#{NAME}' for testing"
-    if WIN32
-      %x( #{$gemcmd} uninstall -x #{NAME} )
-    else
-      %x( sudo #{$gemcmd} uninstall -x #{NAME} )
-    end
-  end
+desc 'Default: Run specs.'
+task :default => :spec
+desc 'Run specs.'
+Rake::TestTask.new(:spec) do |t|
+  t.verbose = true
+  t.warning = true
+  ENV['RUBYOPT'] = 'rubygems'
+  ENV['TEST'] = ENV['SPEC'] if ENV['SPEC']
+  t.libs = ['lib']
+  t.test_files = Dir.glob( File.join('spec', ENV['pattern'] || '**/*_spec.rb') )
+  t.options = "-v"
 end
-#namespace :spec do
-#  task :autotest do
-#    require './specs/rspec_autotest'
-#    RspecAutotest.run
-#  end
-#end
-#desc "Run specs"
-#Spec::Rake::SpecTask.new('spec') do |t|
-#  Rake::Task[:ensure_gem_is_uninstalled].invoke
-#  t.libs = ['lib']
-#  t.spec_files = FileList['specs/**/*_spec.rb']
-#end
-#desc "Run specs and output specdoc"
-#Spec::Rake::SpecTask.new('specl') do |t|
-#  Rake::Task[:ensure_gem_is_uninstalled].invoke
-#  t.spec_files = FileList['specs/**/*_spec.rb']
-#  t.libs = ['lib']
-#  t.spec_opts = ['--format', 'specdoc' ]
-#end
-#desc "Run all specs with RCov"
-#Spec::Rake::SpecTask.new('rcov') do |t|
-#  Rake::Task[:ensure_gem_is_uninstalled].invoke
-#  t.spec_files = FileList['specs/**/*_spec.rb']
-#  t.rcov = true
-#  t.libs = ['lib']
-#  t.rcov_opts = ['--exclude', 'specs']
-#end
-#task :spec do
-#  uninstall_gem
-#  # files that match a key word
-#  files_to_run = ENV['SPEC'] || FileList['specs/**/*_spec.rb']
-#  if ENV['SPECM']
-#    files_to_run = files_to_run.select do |file|
-#      file.include?(ENV['SPECM'])
-#    end
-#  end
-#  files_to_run.each do |spc|
-#    system "ruby -I lib -S spec #{spc} --format specdoc"
-#  end
-#end
 ###############################################
 # PACKAGE / INSTALL / UNINSTALL
 ###############################################
@@ -191,9 +147,11 @@ gemspec = Gem::Specification.new do |t|
   t.platform = Gem::Platform::RUBY
   t.name = NAME
   t.version =  IO.readlines(changelog).grep(/##.*version/).pop.split(/\s+/).last.chomp
+  t.homepage = 'http://axml.rubyforge.org/'
+  t.rubyforge_project = 'axml'
   t.summary = summary
   t.date = "#{tm.year}-#{tm.month}-#{tm.day}"
-  t.email = "jprince@icmb.utexas.edu"
+  t.email = "jtprince@gmail.com"
   t.description = description
   t.has_rdoc = true
   t.authors = ["John Prince"]
@@ -201,8 +159,8 @@ gemspec = Gem::Specification.new do |t|
   t.rdoc_options = rdoc_options
   t.extra_rdoc_files = rdoc_extra_includes
   t.executables = FL["bin/*"].map {|file| File.basename(file) }
-  t.requirements << 'xmlparser is needed right now'
-  t.test_files = FL["specs/*_spec.rb"]
+  t.requirements << 'xmlparser or libxml'
+  t.test_files = FL["spec/**/*_spec.rb"]
 end
 desc "Create packages."
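
The gemspec derives `t.version` from the changelog and builds `t.date` by string interpolation. The sketch below runs both expressions on a hypothetical changelog (the real file's contents are not part of this diff) and an arbitrary date, so their behavior can be checked in isolation.

```ruby
# Hypothetical changelog lines; only the "## ... version X" heading shape
# matters to the gemspec expression.
changelog_lines = [
  "## version 0.0.3\n",
  "* (changes)\n",
  "\n",
  "## version 0.0.4\n",
  "* (changes)\n",
]

# Same chain as the gemspec, applied to an Array instead of IO.readlines:
# keep the version headings, take the LAST one, return its final word.
version = changelog_lines.grep(/##.*version/).pop.split(/\s+/).last.chomp
puts version       # 0.0.4

# The gemspec's interpolated date is not zero-padded; strftime is.
tm = Time.new(2008, 3, 5)  # arbitrary example date
interpolated = "#{tm.year}-#{tm.month}-#{tm.day}"
padded = tm.strftime('%Y-%m-%d')
puts interpolated  # 2008-3-5
puts padded        # 2008-03-05
```

Because the chain uses `pop`, the newest version heading has to be the last one in the file; a changelog written newest-first would make the gem report its oldest version.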

data/lib/axml.rb CHANGED

@@ -1,376 +1,39 @@
-require 'xmlparser'
-class AXML
-  NotBlankText_re = /[^\s+]+/m
-  def self.parse_file(file)
-    root = nil
-    File.open(file) {|fh| root = parse(fh) }
-    root
-  end
-  # Returns the root node (as Element) or nodes (as Array)
-  # options:
-  #   :keep_blanks => *true | false
-  def self.parse(stream, opts={:keep_blanks => false})
-    parser = AXML::XMLParser.new
-    if opts[:keep_blanks] == false
-      parser.set_no_keep_blanks
-    end
-    if ti = opts[:text_indices]
-      if ti.is_a?(Array) && ti.size > 1
-        raise NotImplementedError, "currently only supports a single element"
-      else
-        ti =
-          if ti.is_a?(Array)
-            ti.first.to_s
-          else
-            ti.to_s
-          end
-        parser.set_single_text_indices(ti)
-      end
-    end
-    parser.parse(stream)
-    parser.root
-  end
-end
-AXML::El = Struct.new(:parent, :name, :attrs, :text, :children, :array_index)
-class AXML::El
-  include Enumerable
-  # use AXML::El::Indent.replace to swap without warning
-  # ["", "  ", "    ", "      ", "        ", "          ", ... ]
-  Indent = '  '
-  # use AXML::El::Indentation.replace to replace w/o warning
-  Indentation = (0...30).to_a.map {|num| Indent*num }
-  # current depth
-  @@depth = 0
-  alias_method  :content, :text
-  alias_method  :content=, :text=
-  alias_method  :kids, :children
-  alias_method  :kids=, :children=
-  def [](attribute_string)
-    attrs[attribute_string]
-  end
-  def []=(attribute_string, value)
-    attrs[attribute_string] = value
-  end
-  # has text?
-  def text?
-    !!text
-  end
-  def children?
-    children.size > 0
-  end
-  alias_method :child?, :children?
-  # full traversal from the initial node
-  def traverse(type=:pre, &block)
-    if type == :pre
-      block.call(self)
-    end
-    children.each do |child|
-      child.traverse(type, &block)
-    end
-    if type == :post
-      block.call(self)
-    end
-  end
-  def each(&block)
-    children.each do |child|
-      block.call(child)
-    end
-  end
-  # drops the current element from the list of its parents children
-  def drop
-    parent.children.delete(self)
-  end
-  def drop_child(node)
-    found_it = false
-    found_index = nil
-    children.each_with_index do |v,i|
-      if found_it
-        v.array_index = i - 1
-      end
-      if v.object_id == node.object_id
-        found_index = i
-        found_it = true
-      end
-    end
-    children.delete_at(found_index) if found_index
-  end
-  def tabs
-    Indentation[@@depth]
-  end
-  EscapeCharsRe = /['"&><]/
-  # returns data escaped if necessary
-  def escape(data)
-    # modified slightly from xmlsimple.rb
-    return data if !data.is_a?(String) || data.nil? || data == ''
-    result = data.dup
-    if EscapeCharsRe.match(data)
-      result.gsub!('&', '&amp;')
-      result.gsub!('<', '&lt;')
-      result.gsub!('>', '&gt;')
-      result.gsub!('"', '&quot;')
-      result.gsub!("'", '&apos;')
-    end
-    result
-  end
-  def to_s
-    attstring = ""
-    if attrs.size > 0
-      attstring = " " + attrs.collect { |k,v| "#{k}=\"#{escape(v)}\"" }.join(" ")
-    end
-    string = "#{tabs}<#{name}#{attstring}"
-    if children.size > 0
-      string << ">"
-      if text?
-        string << escape(text)
-      end
-      string << "\n"
-      @@depth += 1
-      string << children.collect {|child| child.to_s }.join("")
-      @@depth -= 1
-      string << "#{tabs}</#{name}>\n"
-    elsif text?
-      string << ">" << escape(text) << "</#{name}>\n"
-    else
-      string << "/>\n"
-    end
-    string
-  end
-  def inspect
-    "<name='#{name}' attrs='#{attrs.inspect}' children.size=#{children.size}>"
-  end
-  # the next node
-  def next
-    parent.children[array_index+1]
-  end
-  # the first child (equivalent to children.first)
-  def child
-    children.first
-  end
-  def add_node(node)
-    node.array_index = children.size
-    children.push( node )
-  end
-  ########################################################################
-  # FIND and FIND_FIRST (with a little useful xpath)
-  ########################################################################
-  # Returns an array of nodes.  Accepts same xpath strings as find_first.
-  def find(string)
-    (tp, name) = string.split('::')
-    case tp
-    when 'child'
-      find_children(name)
-    when 'descendant'
-      find_descendants(name)
-    when 'following-sibling'
-      find_following_siblings(name)
-    end
-  end
-  # currently must be called with descendant:: or child:: string prefix! e.g.
-  # "descendant::<name>" and "child::<name>" where <name> is the name of the
-  # node you seek)
-  def find_first(string)
-    (tp, name) = string.split('::')
-    case tp
-    when 'child'
-      find_first_child(name)
-    when 'descendant'
-      find_first_descendant(name)
-    when 'following-sibling'
-      find_first_following_sibling(name)
-    end
-  end
-  def find_descendants(name, collect_descendants=[])
-    children.each do |child|
-      collect_descendants.push(child) if child.name == name
-      child.find_descendants(name, collect_descendants)
-    end
-    collect_descendants
-  end
-  def find_first_descendant(name)
-    self.each do |child_node|
-      if child_node.name == name
-        return child_node
+require 'axml/autoload'
+module AXML
+  # note that if Autoload must find a suitable parser, it will be set as the
+  # :parser default to be used for future reference.
+  DEFAULTS = {:keep_blanks => false, :parser => nil}
+  PREFERRED = [:xmlparser, :libxml, :libxml_sax, :rexml]
+  CLASS_MAPPINGS = {:xmlparser => 'XMLParser', :libxml => 'LibXML', :libxml_sax => 'libXMLSax', :rexml => 'REXML' }
+  WARN = {:rexml => "Using REXML as parser!  This is very slow on large docs!\nCall the method AXML::Autoload.install_instructions for help installing\nsomething FASTER!",
+  }
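+  # opts:
+  #   :keep_blanks => false (default)
+  #   :parser => one of the PREFERRED symbols; nil (default) lets Autoload
+  #              choose one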
+  def parse(arg, opts={})
+    opts = DEFAULTS.merge opts
+    parser = AXML::Autoload.parser!(opts[:parser])
+    method =
+      if arg.is_a?(String) && File.exist?(arg)
+        :parse_file
+      elsif arg.is_a?(IO)
+        :parse_io
+      elsif arg.is_a?(String)
+        :parse_string
       else
-        return child_node.find_first_descendant(name)
+        raise ArgumentError, "can deal with filenames, Strings, and IO objects.\nDon't know how to work with object of class: #{arg.class}"
       end
-    end
-    return nil
+    parser.send(method, arg, opts)
   end
-  def find_children(name)
-    children.select {|v| v.name == name }
+  def parse_file(file, opts={}) # :nodoc:
+    opts = DEFAULTS.merge opts
+    parser = AXML::Autoload.parser!(opts[:parser])
+    File.open(file) {|fh| parser.parse_io(fh, opts) }
   end
-  def find_first_child(name)
-    self.each do |child_node|
-      if child_node.name == name
-        return child_node
-      end
-    end
-    return nil
-  end
-  def find_following_siblings(name)
-    parent.children[(array_index+1)..-1].select {|v| v.name == name }
-  end
-  def find_first_following_sibling(name)
-    node = nil
-    parent.children[(array_index+1)..-1].each do |sibling|
-      if sibling.name == name
-        node = sibling
-        break
-      end
-    end
-    node
-  end
+  extend AXML
 end
-class AXML::XMLParser < XMLParser
-  attr_writer :root
-  # returns the first node found in the document
-  def root
-    @root.child
-  end
-  def set_no_keep_blanks
-    instance_eval do
-      def endElement(name)
-        unless AXML::NotBlankText_re.match(@cur.text)
-          @cur.text = nil
-        end
-        @cur = @cur.parent
-      end
-    end
-  end
-  # returns text as an array for each occurence of the specified element: [start_index, num_bytes]
-  def set_single_text_indices(el_name)
-    @el_name = el_name
-    instance_eval do
-      def startElement(name, attributes)
-        text =
-          if name == @el_name ; []
-          else ; ''
-          end
-        new_el = ::AXML::El.new(@cur, name, attributes, text, [])
-        @cur.add_node(new_el)
-        @cur = new_el
-      end
-      def character(data)
-        if @cur.text.is_a? Array
-          @cur.text << byteIndex
-        else
-          @cur.text << data
-        end
-      end
-      def endElement(name)
-        if @cur.text.is_a? Array
-          @cur.text << (byteIndex - @cur.text.first)
-        end
-        @cur = @cur.parent
-      end
-    end
-  end
-  # takes opts from AXML::parse method
-  def initialize
-    #@keep_blanks = opts[:keep_blanks]
-    @root = AXML::El.new(nil, "root", {}, '', [])
-    @cur = @root
-  end
-  def startElement(name, attributes)
-    new_el = AXML::El.new(@cur, name, attributes, '', [])
-    @cur.add_node(new_el)
-    @cur = new_el
-  end
-  def character(data)
-    @cur.text << data
-  end
-  def endElement(name)
-    @cur = @cur.parent
-  end
-end
-=begin
-# This parser stores information about where the peaks information is in the
-# file
-# The content of the peaks node is an array where the first member is the
-# start index and the last member is the number of bytes.  All other members
-# should be ignored.
-class AXML::XMLParser::LazyPeaks < ::AXML::XMLParser
-  def startElement(name, attributes)
-    text =
-      if name == 'peaks' ; []
-      else ; ''
-      end
-    new_el = ::AXML::El.new(@cur, name, attributes, text, [])
-    # add the new node to the previous parent node
-    @cur.add_node(new_el)
-    # notice the change in @cur node
-    @cur = new_el
-  end
-  def character(data)
-    if @cur.text.is_a? Array
-      @cur.text << byteIndex
-    else
-      @cur.text << data
-    end
-  end
-  def endElement(name)
-    if @cur.text.is_a? Array
-      @cur.text << (byteIndex - @cur.text.first)
-    end
-    @cur = @cur.parent
-  end
-end
-=end
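
The new `AXML.parse` picks a parser method from the argument's type. The sketch below copies that branch logic into a standalone `dispatch_method` helper (a hypothetical name, not part of the gem) along with a copy of `DEFAULTS`, so the choices can be checked without XMLParser or LibXML installed.

```ruby
require 'stringio'
require 'tempfile'

# Copy of AXML::DEFAULTS from the diff above.
DEFAULTS = { :keep_blanks => false, :parser => nil }

# Hypothetical helper: returns the method name AXML.parse would send to the
# parser, using the same branch order as the 0.0.4 code.
def dispatch_method(arg)
  if arg.is_a?(String) && File.exist?(arg)
    :parse_file
  elsif arg.is_a?(IO)
    :parse_io
  elsif arg.is_a?(String)
    :parse_string
  else
    raise ArgumentError, "don't know how to work with object of class: #{arg.class}"
  end
end

results = {}
Tempfile.create(['axml_doc', '.xml']) do |f|
  f.write('<n1/>')
  f.flush
  results[:path] = dispatch_method(f.path)  # an existing file name
  results[:io]   = dispatch_method(f)       # File is a subclass of IO
end
results[:string] = dispatch_method('<n1/>') # not a file name, so XML text

begin
  dispatch_method(StringIO.new('<n1/>'))    # StringIO is not an IO
rescue ArgumentError
  results[:stringio] = :rejected
end

# Caller-supplied keys override the defaults.
opts = DEFAULTS.merge(:parser => :libxml)
p results
```

Two consequences of this ordering: a String naming a file that does not exist (a mistyped path, say) falls through to `:parse_string` and is handed to the parser as XML text, and a StringIO is rejected because StringIO does not inherit from IO.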