RubyGems - xmlstreamin - Versions diffs - 0.0.1 - Mend

xmlstreamin 0.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (5)

data/README ADDED

@@ -0,0 +1,63 @@
+                     XMLStreamin
+                     ===========
+XMLStreamin is a small Ruby module that provides a way of reading an XML
+document as a stream, while letting it be processed according to its tree
+structure much more handily than the usual 'flat' stream reader does.
+Unlike such readers, which typically simply call unspecialized methods
+for each start-tag, end-tag, text segment, and so on, and leave it to the
+application to sort out the hierarchy, XMLStreamin uses a pre-built tree
+of XMLSpec nodes to model the expected document structure.
+(The module is fairly basic: it only handles the main hierarchy of the document.
+No attention is paid to other constructs such as declarations, as it is not intended
+as a do-everything parser.  As the XMLStreamListener class includes the
+REXML::StreamListener module, you could add methods to handle such things if needed.)
+Each node specifies the actual actions to be taken when an element that
+it represents is encountered.  It can specify what processing needs to be
+done on the attributes of a start-tag, the handling of included text, and any
+clean-up actions when the end-tag is read.  It also contains a table of the
+expected sub-elements and their XMLSpec nodes, thus reflecting the document
+tree.
+The central class is 'XMLStreamListener', which includes REXML::StreamListener
+to provide an interface to a 'tree' of 'XMLSpec' nodes that models the hierarchy
+of the XML document to be read.
+'XMLSpec' is a base class intended to be extended as needed to handle
+processing for each type of expected element at each level of the XML
+hierarchy in the document.  It has two categories of methods:
+those concerned with setup ('specs!', 'default!', and 'spec'), which
+should not need to be modified, and the handler methods 'start', 'done',
+'empty' and 'text', which should be specialized as necessary in derived
+classes or instances.
+XMLSpec nodes are intended to be linked in a tree structure, reflecting
+the structure of the XML document to be read.  Each node has a 'dispatch'
+(hash) table associating expected tag names with the subordinate XMLSpec
+nodes that should handle them; the hash table default should reference
+a node that will handle unexpected tags.  If appropriate, you can use
+a single node to service several different tag names (bearing in mind
+that the dispatch table will be shared).
+There are two predefined global XMLSpecs: '$specXMLVoid', which is a do-nothing
+basic XMLSpec that can be used to represent elements that you aren't interested
+in, and '$specXMLFail' which will raise an error if it is invoked.
+To use this module, xmlstreamin.rb should be either in the local directory
+or on the Ruby library path.  It can then be loaded with 'require "xmlstreamin"'.
+It loads "rexml/document" and "rexml/streamlistener" itself.  The latter
+should not be needed outside the module; the former supplies
+"REXML::Document.parse_stream", through which XMLStreamListener is invoked.
+See "xmldemo.rb" and the comments therein for a small -- contrived --
+example of how to use it.  RDOC documentation is also provided.
+Contents:
+      xmlstreamin.rb   -- move this into the Ruby library path
+      xmldemo.rb       -- example of usage
+      demodoc.xml      -- to invoke xmldemo.rb on
+      README           -- this file
+      XMLStreamin.html -- link to:
+      doc              -- RDOC directory
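The README describes each node's dispatch table as a hash whose default handles unexpected tags. That mechanism can be tried in isolation with plain Ruby. The sketch below is illustrative only: DispatchNode and resolve are hypothetical names, not part of XMLStreamin, although the three methods mirror the roles of XMLSpec#default!, #specs! and #spec. It shows a known tag reaching its own node, an unknown tag falling back to a void-like default, and a tag nested under that default staying there, which is how a skipped subtree stays skipped.

```ruby
# Hypothetical stand-in for XMLSpec's dispatch table; not the library class.
class DispatchNode
  attr_reader :label

  def initialize(label, fallback = nil)
    @label = label
    @subspecs = Hash.new(fallback) # tag names not in the table map to this
  end

  # Same roles as XMLSpec#default!, #specs! and #spec.
  def default!(node)
    @subspecs.default = node
  end

  def specs!(table)
    @subspecs.merge!(table)
  end

  def spec(name)
    @subspecs[name]
  end
end

# A do-nothing node that is its own default, in the spirit of $specXMLVoid.
void = DispatchNode.new("void")
void.default!(void)

name_node = DispatchNode.new("name", void)
item = DispatchNode.new("item", void)
item.specs!("name" => name_node)

# Follow a chain of nested start-tags downward from a starting node.
def resolve(node, tags)
  tags.reduce(node) { |current, tag| current.spec(tag) }
end

puts resolve(item, ["name"]).label              # prints name
puts resolve(item, ["price"]).label             # prints void
puts resolve(item, ["price", "currency"]).label # prints void
```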

data/demodoc.xml ADDED

@@ -0,0 +1,27 @@
+<inventory title="OmniCorp Store #45x10^3">
+  <section name="health">
+    <item upc="123456789" stock="12">
+      <name>Invisibility Cream</name>
+      <price>14.50</price>
+      <description>Makes you invisible</description>
+    </item>
+    <item upc="445322344" stock="18">
+      <name>Levitation Salve</name>
+      <price>23.99</price>
+      <description>Levitate yourself for up to 3 hours per application</description>
+    </item>
+  </section>
+  <section name="food">
+    <item upc="485672034" stock="653">
+      <name>Blork and Freen Instameal</name>
+      <price>4.95</price>
+      <description>A tasty meal in a tablet;<intron>junk added</intron>just add water</description>
+    </item>
+    <item upc="132957764" stock="44">
+      <name>Grob winglets</name>
+      <price>3.56</price>
+      <description>Tender winglets of Grob. Just add water</description>
+    </item>
+  </section>
+</inventory>
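The Instameal description above deliberately embeds an <intron> element inside its text. As a rough illustration of the 'flat' reading that the README contrasts XMLStreamin with, the sketch below feeds just that element through a bare REXML stream listener (EventLogger is a hypothetical name, not part of the package) and logs each callback. Assuming standard REXML stream behaviour, the description's text arrives as two separate segments on either side of the child element, which is why XMLSpec#text may be called more than once for a single element.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# A plain REXML listener (hypothetical, not part of XMLStreamin) that only
# logs callbacks; every element goes through the same three methods.
class EventLogger
  include REXML::StreamListener
  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, attrs)
    @events << [:start, name]
  end

  def tag_end(name)
    @events << [:end, name]
  end

  def text(data)
    @events << [:text, data]
  end
end

xml = '<description>A tasty meal in a tablet;<intron>junk added</intron>just add water</description>'
logger = EventLogger.new
REXML::Document.parse_stream(xml, logger)
logger.events.each { |event| p event }
```

Any tree structure here has to be reconstructed by the application from the start/end pairs, which is the work XMLStreamin's spec tree takes over.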

data/lib/xmlstreamin.rb ADDED

@@ -0,0 +1,259 @@
+# :title: XMLStreamin Documentation
+# XMLStreamin is a small module that provides a way of reading XML as a stream,
+# while letting it be processed according to its tree structure much more
+# handily than the usual 'flat' stream reader does.
+#
+# Unlike such readers, which typically simply call unspecialized methods
+# for each start-tag, end-tag, text segment, and so on, and leave it to the
+# application to sort out the hierarchy, XMLStreamin uses a pre-built tree
+# of #XMLSpec nodes to model the expected document structure.
+#
+# Each node specifies the actual actions to be taken when an element that
+# it represents is encountered.  It can specify what processing needs to be
+# done on the attributes of a start-tag, the handling of included text, and any
+# clean-up actions when the end-tag is read.  It also contains a table of the
+# expected sub-elements and _their_ XMLSpec nodes, thus reflecting the document
+# tree.
+#
+# Usage skeleton:
+#	 require "xmlstreamin"
+#	 include XMLStreamin # for convenience in naming
+#
+#	 # Supply XMLSpec objects to model the document tree:
+#
+#	 #First for the leaf elements:
+#	 someleafspec = XMLSpec.new
+#	 # override methods as needed, e.g.:
+#	 def someleafspec.text(context,data)
+#	 	#...
+#	 end
+#
+#	 #... then provide nodes for the elements that enclose these:
+#	 someotherspec = XMLSpec.new
+#	 someotherspec.specs!({'someleaftag'=>someleafspec,...})
+#	 # Override methods in these as well, if appropriate:
+#	 def someotherspec.start(context,name,attrs)
+#	 	#...
+#	 	return context
+#	 end
+#
+#	 #...... more specs complete the tree, up to some 'toplevelspec'
+#	 # for the document's enclosing element.
+#
+#	 # Finally one spec for the document itself that just has
+#	 # an entry for that top level tag:
+#	 specDocument = XMLSpec.new
+#	 specDocument.specs!({'document_top_tag'=>toplevelspec})
+#
+#	 # To run, provide a source stream and invoke the REXML stream parser:
+#	 source = ...the source of the document stream
+#	 REXML::Document.parse_stream(source,
+#	 		XMLStreamListener.new(specDocument))
+#
+#
+module XMLStreamin
+ require "rexml/document"
+ require "rexml/streamlistener"
+	# Exception class that will be thrown by the module if it hits trouble.
+	class XMLError < RuntimeError
+	end
+	# XMLSpec is a base class intended to be extended as needed to handle
+	# processing for each type of expected element at each level of the
+	# XML hierarchy in the document.  It has two categories of methods:
+	# those concerned with setup (#specs!, #default!, and #spec), which should
+	# not need to be modified, and the handler methods #start, #done, #empty
+	# and #text, which are intended to be specialized as necessary in derived
+	# classes or instances.
+	#
+	# XMLSpec nodes are intended to be linked in a tree structure, reflecting
+	# the structure of the XML document to be read.  Each node has a 'dispatch'
+	# (hash) table associating expected tag names with the subordinate XMLSpec
+	# nodes that should handle them; the hash table _default_ should reference
+	# a node that will handle unexpected tags.  If appropriate, you can use
+	# a single node to service several different tag names (bearing in mind
+	# that the dispatch table will be shared).
+	#
+	# There are two predefined global XMLSpecs:
+	# $specXMLVoid::
+	#  This is a do-nothing basic XMLSpec that can be used to represent
+	#  elements that you aren't interested in.  These and any subordinate
+	#  elements will be skipped during parsing.
+	# $specXMLFail::
+	#  This will raise XMLError if it is invoked.  It may be used if an
+	#  unexpected element is a serious problem.
+	class XMLSpec
+		# Create an empty instance.  Unless it is a non-functional leaf node,
+		# it will need to be specialized by filling the dispatch table ( #specs! )
+		# and redefining methods as appropriate.
+		def initialize
+			@subspecs=Hash.new($specXMLVoid) end
+		# Set the XMLSpec that will be used for tag names that are not
+		# specifically referenced by the dispatch table.
+		def default! spec
+			@subspecs.default = spec end
+		# Adds tagnames and their associated XMLSpecs to the dispatch table.
+		# The argument _specs_ is a preconstructed hash that will be merged
+		# with any existing table.  (There is no way of directly removing
+		# entries -- except by overwriting them -- as the tree is a static
+		# structure built before reading the document.)
+		def specs! specs
+			@subspecs.merge! specs end
+		# Convenience method for accessing the dispatch table; returns the hash.
+		def specs
+			return @subspecs end
+		# Locates and returns the XMLSpec that handles _name_ in the dispatch table.
+		def spec name
+			return @subspecs[name] end
+		# Called by XMLStreamListener when a start-tag for an element handled
+		# by this XMLSpec is read. Arguments:
+		# _context_::
+		#   the context object passed from the level above (application
+		#   specific -- may be nil).  This is also the return value
+		#   from the method, and will be passed on to any subordinate
+		#   nodes.  A derived version may actually supply a completely
+		#   new context if desired, but if it does so, it _must_
+		#   preserve the one it received (in an instance variable)
+		#   for restoration later by the #done (or #empty) method.
+		#   (In any normal situation the same context object --
+		#   updated as required -- will be used throughout.  No
+		#   special preservation is then needed, as long as all
+		#   methods return it on exit.)
+		# _name_::
+		#   the actual tag name of the element.  (The same node can handle
+		#   several tagnames.)
+		# _attrs_::	a hash of the attribute _name_/_value_ pairs in the tag.
+		# The base method is a dummy that does nothing but return the context.
+		# Derived nodes will redefine the method as necessary.
+		def start context,name,attrs
+	 		return context	# can get changed
+		end
+		# Called by XMLStreamListener when an end-tag for an element handled
+		# by this XMLSpec is read _and_ the element was not empty.
+		# If it is actually an empty element, #empty is invoked instead.
+		# If any action needs to be taken at the end of an element, this dummy
+		# method may be redefined.
+		# Arguments:
+		# _context_::
+		#   the context object at this level (application
+		#   specific -- may be nil).  This is also the return value
+		#   from the method, and will be passed back to XMLStreamListener
+		#   (which does _not_ keep track of context itself!).
+		#   If the #start method provided a new context this _must_
+		#   restore the one preserved at that time.
+		# _name_::
+		#   the actual tag name of the element.  (Not normally needed,
+		#   but it might be useful to have the name again here.)
+		def done context,name
+	 		return context	# can get restored
+		end
+		# Called by XMLStreamListener when an end-tag for an element handled
+		# by this XMLSpec is read _and_ the element was empty.
+		# If it is not actually an empty element, #done is invoked instead.
+		# If any action needs to be taken for an empty element, this dummy
+		# method may be redefined.
+		# Argument:
+		# _context_::
+		#   the context object at this level (application
+		#   specific -- may be nil).  This is also the return value
+		#   from the method, and will be passed back to XMLStreamListener
+		#   (which does _not_ keep track of context itself!).
+		#   If the #start method provided a new context this _must_
+		#   restore the one preserved at that time.
+		# Unlike #done, this method does not receive the tag name.
+		def empty context
+			# called only if no enclosed text or elements
+			return context
+		end
+		# Called by XMLStreamListener when text is encountered within a
+		# handled element.  It will be invoked separately for each text
+		# segment (separated by subordinate elements) within the element.
+		# It is a dummy method that may be redefined as desired to handle
+		# the text.  It does not need to return the _context_.
+		def text context,data
+		end
+	end
+	# Predefined global XMLSpecs:
+	$specXMLVoid = XMLSpec.new
+	$specXMLVoid.default!($specXMLVoid)	# it did not exist when created, so its default was nil
+	$specXMLFail = XMLSpec.new
+	$specXMLFail.default!($specXMLFail)
+	def $specXMLFail.start(context,name,attrs)
+		raise XMLError.new("Failed Tag <#{name}...>")
+	end
+	# This class includes the REXML::StreamListener module to provide an interface to
+	# a 'tree' of XMLSpec nodes that models the hierarchy of the XML document
+	# to be read.
+	class XMLStreamListener
+	 include REXML::StreamListener
+	 	# Create a new XMLStreamListener with _root_ as the root XMLSpec
+	 	# of the XML hierarchy to be parsed. _base_ is an optional 'context',
+	 	# of any form suitable to the task, that will be passed to all XMLSpec
+	 	# methods invoked.
+		def initialize root=$specXMLVoid, base=nil
+			@currSpec=root
+			@currContext=base
+			@prevspecs=[]
+			@openTag=nil
+		end
+		# Invoked when a tag is encountered, with args:
+		# * _name_ the tag name
+		# * _attrs_ a Hash of attribute/value pairs. [*NOT* an array of arrays!]
+		#   -- i.e. a start tag like:
+		#	 <tag attr1="value1" attr2="value2">
+		#   will result in:
+		#   tag_start( "tag", {"attr1"=>"value1","attr2"=>"value2"})
+		#
+		# This in turn determines the appropriate XMLSpec node that should
+		# handle the tag (by querying the current spec), sets this as the new
+		# current spec, and invokes its XMLSpec#start method.
+		def tag_start name, attrs
+			@prevspecs.push(@currSpec)
+			@openTag=name
+			@currSpec = @currSpec.spec name
+		 	@currContext = @currSpec.start(@currContext,name,attrs)
+		end
+		# Invoked when the end tag is reached, with the _name_ of the tag
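+		# as argument.  In the case of an empty tag ('<tag/>'),
+		# tag_end will be called immediately after tag_start.
+		# If the element was not empty, the current XMLSpec#done method is
+		# called, otherwise the XMLSpec#empty method.  (An element counts as
+		# empty if no text or child element was seen after its start-tag, so
+		# '<tag></tag>' qualifies as well as '<tag/>'.)  Then the previous
+		# (higher-level) spec is restored.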
+		def tag_end name
+			@currContext = if (@openTag == name)
+				@currSpec.empty(@currContext)
+			  else
+				@currSpec.done(@currContext, name)
+			  end
+			@openTag=nil
+			@currSpec = @prevspecs.pop
+		end
+		# Invoked when text is encountered in the document,
+		# with the _text_ content as argument.
+		# The current XMLSpec#text method is in turn called.
+		# (Note that if the text is interspersed with other elements,
+		# this method is invoked for each segment separately.)
+		def text text
+			@openTag=nil
+			@currSpec.text(@currContext, text)
+		end
+	end
+end
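The empty/non-empty decision in XMLStreamListener#tag_end rests on the @openTag variable alone. A minimal sketch of that bookkeeping, using a hypothetical EmptyDetector listener that records which handler would be chosen instead of calling XMLSpec nodes, makes the rule concrete: '<b/>' and '<c></c>' are both treated as empty, while an element containing text or a child element goes to #done. One consequence, assuming REXML reports whitespace between tags as ordinary text in stream mode, is that an element holding only whitespace would also count as non-empty.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener applying the same @openTag test as
# XMLStreamListener#tag_end; it records the handler that would be chosen.
class EmptyDetector
  include REXML::StreamListener
  attr_reader :calls

  def initialize
    @open_tag = nil
    @calls = []
  end

  def tag_start(name, attrs)
    @open_tag = name # nothing seen inside this element yet
  end

  def tag_end(name)
    # Same tag still marked open => no text or child arrived in between.
    @calls << (@open_tag == name ? "empty(#{name})" : "done(#{name})")
    @open_tag = nil
  end

  def text(data)
    @open_tag = nil # any text segment marks the enclosing element non-empty
  end
end

detector = EmptyDetector.new
REXML::Document.parse_stream('<a><b/><c></c><d>x</d></a>', detector)
p detector.calls
# prints ["empty(b)", "empty(c)", "done(d)", "done(a)"]
```

Note that 'a' is reported as done even though it has no text of its own: closing the child 'b' clears the marker, so any child element is enough.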

data/xmldemo.rb ADDED

@@ -0,0 +1,75 @@
+# 'xmldemo.rb'
+# This is an example of how to use the XMLStreamin classes
+# (XMLStreamListener and XMLSpec) in 'xmlstreamin.rb' to parse an XML stream.
+# [These comments are intended to be read inline -- not via rdoc...]
+#
+# For demonstration, it is set to extract parts of the XML example used
+# in the REXML Tutorial ('demodoc.xml' here), but any other file should
+# just get its tags listed.
+# xmlstreamin.rb should be in the local directory or the library path:
+require "xmlstreamin"
+# This saves qualifying all the references with "XMLStreamin::":
+include XMLStreamin
+####################################################
+# It is useful to create a new class that extends the basic one slightly:
+class XMLGenericSpec < XMLSpec
+	def start(context,name,attrs)
+		print "Element: #{name} \n"
+		attrs.each {|attr|
+		 print "    #{attr[0]} = #{attr[1]}\n"}
+ 		return context
+	end
+end
+# ...and we can use this for a default node for unknown XML formats:
+specShow = XMLGenericSpec.new
+# This invokes itself for any enclosed elements:
+specShow.default! specShow
+####################################################
+# We declare a spec to handle the "name" elements of the example format:
+specName = XMLGenericSpec.new
+# The name itself is element text, so we redefine that method:
+def specName.text(context,data)
+	print "    [#{data}]\n"	## assumes single line
+end
+# There shouldn't be any sub-elements! Let's fail if there are:
+specName.default! $specXMLFail
+# The enclosing element is an 'item', so we define a node for that
+# (just a basic XMLSpec, so it does nothing but model the structure):
+specItem = XMLSpec.new
+# We pass 'name' sub-elements on to their handling node:
+specItem.specs!({'name'=>specName})
+# Everything else gets ignored (the default is '$specXMLVoid', which skips them).
+# Enclosing this is the 'section' element.
+# For variety we use XMLGenericSpec again here, so we see the attributes:
+specSection = XMLGenericSpec.new
+# The only sub-element here should be an 'item':
+specSection.specs!({'item'=>specItem})
+# The top-level element is 'inventory'
+specInventory = XMLSpec.new
+specInventory.specs!({'section'=>specSection})
+####################################################
+# The very top is the "Document" node which references the top 'inventory' element:
+specDocument = XMLSpec.new
+# If we don't see the expected top tag, we go into "show" mode:
+specDocument.default! specShow
+specDocument.specs!({'inventory'=>specInventory})
+# Finally, to parse a document, we create an XMLStreamListener and a source object,
+# and pass them to REXML.
+list = XMLStreamListener.new specDocument
+source = File.new ARGV[0]
+REXML::Document.parse_stream(source, list)

metadata ADDED

@@ -0,0 +1,49 @@
+--- !ruby/object:Gem::Specification
+rubygems_version: 0.9.4
+specification_version: 1
+name: xmlstreamin
+version: !ruby/object:Gem::Version
+  version: 0.0.1
+date: 2007-08-06 00:00:00 +02:00
+summary: XMLStreamin
+require_paths:
+- lib
+email: pete@jwgibbs.cchem.berkeley.edu
+homepage:
+rubyforge_project:
+description:
+autorequire:
+default_executable:
+bindir: bin
+has_rdoc: true
+required_ruby_version: !ruby/object:Gem::Version::Requirement
+  requirements:
+  - - ">"
+    - !ruby/object:Gem::Version
+      version: 0.0.0
+  version:
+platform: ruby
+signing_key:
+cert_chain:
+post_install_message:
+authors:
+- Peter Goodeve
+files:
+- lib/xmlstreamin.rb
+- xmldemo.rb
+- demodoc.xml
+- README
+test_files: []
+rdoc_options: []
+extra_rdoc_files: []
+executables: []
+extensions: []
+requirements: []
+dependencies: []
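For readers more used to Ruby gemspec files than this serialized YAML form, roughly the same metadata could be written as below. This is a sketch, not the author's file: only the populated fields are carried over, and has_rdoc is left out on the assumption that current RubyGems no longer needs it.

```ruby
require "rubygems"

# Approximately the metadata above, expressed through Gem::Specification.
spec = Gem::Specification.new do |s|
  s.name          = "xmlstreamin"
  s.version       = "0.0.1"
  s.date          = "2007-08-06"
  s.summary       = "XMLStreamin"
  s.authors       = ["Peter Goodeve"]
  s.email         = "pete@jwgibbs.cchem.berkeley.edu"
  s.files         = ["lib/xmlstreamin.rb", "xmldemo.rb", "demodoc.xml", "README"]
  s.require_paths = ["lib"]
end

puts spec.full_name # prints xmlstreamin-0.0.1
```

The require_paths entry is what lets 'require "xmlstreamin"' find lib/xmlstreamin.rb once the gem is installed.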