xmlstreamin 0.0.1

Files changed (5)
  1. data/README +63 -0
  2. data/demodoc.xml +27 -0
  3. data/lib/xmlstreamin.rb +259 -0
  4. data/xmldemo.rb +75 -0
  5. metadata +49 -0
data/README ADDED
@@ -0,0 +1,63 @@
1
+ XMLStreamin
2
+ ===========
3
+
4
+ XMLStreamin is a small Ruby module that provides a way of reading an XML
5
+ document as a stream, while letting it be processed according to its tree
6
+ structure much more handily than the usual 'flat' stream reader does.
7
+
8
+ Unlike such readers, which typically simply call unspecialized methods
9
+ for each start-tag, end-tag, text segment, and so on, and leave it to the
10
+ application to sort out the hierarchy, XMLStreamin uses a pre-built tree
11
+ of XMLSpec nodes to model the expected document structure.
12
+ (The module is fairly basic: it only handles the element hierarchy of the document.
13
+ No attention is paid to other markup such as declarations, as it is not intended
14
+ as a do-everything parser. As the XMLStreamListener class includes the module
15
+ REXML::StreamListener, you could add methods to handle such things if needed.)
16
+
17
+ Each node specifies the actual actions to be taken when an element that
18
+ it represents is encountered. It can specify what processing needs to be
19
+ done on the attributes of a start-tag, the handling of included text, and any
20
+ clean-up actions when the end-tag is read. It also contains a table of the
21
+ expected sub-elements and their XMLSpec nodes, thus reflecting the document
22
+ tree.
23
+
24
+ The central class is 'XMLStreamListener', which includes REXML::StreamListener
25
+ to provide an interface to a 'tree' of 'XMLSpec' nodes that models the hierarchy
26
+ of the XML document to be read.
27
+
28
+ 'XMLSpec' is a base class intended to be extended as needed to handle
29
+ processing for each type of expected element at each level of the XML
30
+ hierarchy in the document. It has two categories of methods:
31
+ the setup methods 'specs!', 'default!' and 'spec' (which should not need
32
+ to be modified), and the handler methods 'start', 'done', 'empty' and
33
+ 'text', which should be specialized as necessary in derived classes or
34
+ instances.
35
+
36
+ XMLSpec nodes are intended to be linked in a tree structure, reflecting
37
+ the structure of the XML document to be read. Each node has a 'dispatch'
38
+ (hash) table associating expected tag names with the subordinate XMLSpec
39
+ nodes that should handle them; the hash table default should reference
40
+ a node that will handle unexpected tags. If appropriate, you can use
41
+ a single node to service several different tag names (bearing in mind
42
+ that the dispatch table will be shared).
43
+
44
+ There are two predefined global XMLSpecs: '$specXMLVoid', which is a do-nothing
45
+ basic XMLSpec that can be used to represent elements that you aren't interested
46
+ in, and '$specXMLFail' which will raise an error if it is invoked.
47
+
48
+ To use this module, xmlstreamin.rb should be either in the local directory
49
+ or in the Ruby library path. It can then be loaded with 'require "xmlstreamin"'.
50
+ It loads "rexml/document" and "rexml/streamlistener" itself. The latter
51
+ should not be needed outside the module; the former is what the application
52
+ uses, as XMLStreamListener is invoked via "REXML::Document.parse_stream".
53
+
54
+ See "xmldemo.rb" and the comments therein for a small -- contrived --
55
+ example of how to use it. RDOC documentation is also provided.
56
+
57
+ Contents:
58
+ xmlstreamin.rb -- move this into the Ruby library path
59
+ xmldemo.rb -- example of usage
60
+ demodoc.xml -- to invoke xmldemo.rb on
61
+ README -- this file
62
+ XMLStreamin.html -- link to:
63
+ doc -- RDOC directory
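The dispatch-table idea described in the README relies on nothing more than a Ruby Hash with a default value. The following is a minimal sketch, not the gem's code: a hypothetical `Node` struct stands in for XMLSpec (so it runs without xmlstreamin.rb installed), a tree like the one in xmldemo.rb is built from it, and a path of nested tag names is resolved the way XMLStreamListener does on each start-tag.

```ruby
# Hypothetical stand-in for XMLSpec: a label plus a dispatch table.
Node = Struct.new(:label, :dispatch)

# Hash.new(default) returns the default node for any tag not in the table,
# which is how unexpected tags get routed.
def node(label, default = nil)
  Node.new(label, Hash.new(default))
end

void = node("void")
void.dispatch.default = void # children of skipped elements are skipped too

name_node = node("name", void)
item_node = node("item", void)
item_node.dispatch.merge!("name" => name_node) # like XMLSpec#specs!
section_node = node("section", void)
section_node.dispatch.merge!("item" => item_node)
inventory_node = node("inventory", void)
inventory_node.dispatch.merge!("section" => section_node)
document = node("document", void)
document.dispatch.merge!("inventory" => inventory_node)

# Follow a path of nested tag names from the root, one lookup per start-tag
# (the same lookup XMLSpec#spec performs), and report which node handles each.
def resolve(root, path)
  current = root
  path.map do |tag|
    current = current.dispatch[tag]
    current.label
  end
end

p resolve(document, %w[inventory section item name])  # => ["inventory", "section", "item", "name"]
p resolve(document, %w[inventory section item price]) # => ["inventory", "section", "item", "void"]
```

An unlisted tag such as 'price' falls through to the void node, and everything beneath it stays there, which mirrors how $specXMLVoid skips whole subtrees.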
data/demodoc.xml ADDED
@@ -0,0 +1,27 @@
1
+ <inventory title="OmniCorp Store #45x10^3">
2
+ <section name="health">
3
+ <item upc="123456789" stock="12">
4
+ <name>Invisibility Cream</name>
5
+ <price>14.50</price>
6
+ <description>Makes you invisible</description>
7
+ </item>
8
+ <item upc="445322344" stock="18">
9
+ <name>Levitation Salve</name>
10
+ <price>23.99</price>
11
+ <description>Levitate yourself for up to 3 hours per application</description>
12
+ </item>
13
+ </section>
14
+ <section name="food">
15
+ <item upc="485672034" stock="653">
16
+ <name>Blork and Freen Instameal</name>
17
+ <price>4.95</price>
18
+ <description>A tasty meal in a tablet;<intron>junk added</intron>just add water</description>
19
+ </item>
20
+ <item upc="132957764" stock="44">
21
+ <name>Grob winglets</name>
22
+ <price>3.56</price>
23
+ <description>Tender winglets of Grob. Just add water</description>
24
+
25
+ </item>
26
+ </section>
27
+ </inventory>
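For comparison with the 'flat' readers the README describes, this sketch feeds the mixed-content description from demodoc.xml straight to REXML's stream parser, using a plain recording listener. `EventRecorder` is a hypothetical name, not part of the gem; only the standard rexml library is assumed.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Records every callback as [event, args...] so the order can be inspected.
class EventRecorder
  include REXML::StreamListener
  attr_reader :events

  def initialize
    @events = []
  end

  def tag_start(name, attrs)
    @events << [:start, name, attrs]
  end

  def tag_end(name)
    @events << [:end, name]
  end

  def text(data)
    @events << [:text, data]
  end
end

snippet = '<description>A tasty meal in a tablet;' \
          '<intron>junk added</intron>just add water</description>'
recorder = EventRecorder.new
REXML::Document.parse_stream(snippet, recorder)
recorder.events.each { |event| p event }
```

The description's text arrives as separate segments with the intron's events in between, and the listener alone has to work out which element each event belongs to. This is why XMLSpec#text may be called several times for one element.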
data/lib/xmlstreamin.rb ADDED
@@ -0,0 +1,259 @@
1
+ # :title: XMLStreamin Documentation
2
+ # XMLStreamin is a small module that provides a way of reading XML as a stream,
3
+ # while letting it be processed according to its tree structure much more
4
+ # handily than the usual 'flat' stream reader does.
5
+ #
6
+ # Unlike such readers, which typically simply call unspecialized methods
7
+ # for each start-tag, end-tag, text segment, and so on, and leave it to the
8
+ # application to sort out the hierarchy, XMLStreamin uses a pre-built tree
9
+ # of #XMLSpec nodes to model the expected document structure.
10
+ #
11
+ # Each node specifies the actual actions to be taken when an element that
12
+ # it represents is encountered. It can specify what processing needs to be
13
+ # done on the attributes of a start-tag, the handling of included text, and any
14
+ # clean-up actions when the end-tag is read. It also contains a table of the
15
+ # expected sub-elements and _their_ XMLSpec nodes, thus reflecting the document
16
+ # tree.
17
+ #
18
+ # Usage skeleton:
19
+ # require "xmlstreamin"
20
+ # include XMLStreamin # for convenience in naming
21
+ #
22
+ # # Supply XMLSpec objects to model the document tree:
23
+ #
24
+ # # First for the leaf elements:
25
+ # someleafspec = XMLSpec.new
26
+ # # override methods as needed, e.g.:
27
+ # def someleafspec.text(context,data)
28
+ # #...
29
+ # end
30
+ #
31
+ # #... then provide nodes for the elements that enclose these:
32
+ # someotherspec = XMLSpec.new
33
+ # someotherspec.specs!({'someleaftag'=>someleafspec,...})
34
+ # # Override methods in these as well, if appropriate:
35
+ # def someotherspec.start(context,name,attrs)
36
+ # #...
37
+ # return context
38
+ # end
39
+ #
40
+ # #...... more specs complete the tree, up to some 'toplevelspec'
41
+ # # for the document's enclosing element.
42
+ #
43
+ # # Finally one spec for the document itself that just has
44
+ # # an entry for that top level tag:
45
+ # specDocument = XMLSpec.new
46
+ # specDocument.specs!({'document_top_tag'=>toplevelspec})
47
+ #
48
+ # To run, provide a source stream, and invoke the REXML stream parser:
49
+ # source = ...the source of the document stream
50
+ # REXML::Document.parse_stream(source,
51
+ # XMLStreamListener.new(specDocument))
52
+ #
53
+ #
54
+ module XMLStreamin
55
+
56
+ require "rexml/document"
57
+ require "rexml/streamlistener"
58
+
59
+ # Exception class raised by the module (currently only by $specXMLFail).
60
+ class XMLError < RuntimeError
61
+ end
62
+
63
+ # XMLSpec is a base class intended to be extended as needed to handle
64
+ # processing for each type of expected element at each level of the
65
+ # XML hierarchy in the document. It has two categories of methods:
66
+ # the setup methods #specs!, #default! and #spec (which should not need
67
+ # to be modified), and the handler methods #start, #done, #empty and
68
+ # #text, which are intended to be specialized as necessary in derived
69
+ # classes or instances.
70
+ #
71
+ # XMLSpec nodes are intended to be linked in a tree structure, reflecting
72
+ # the structure of the XML document to be read. Each node has a 'dispatch'
73
+ # (hash) table associating expected tag names with the subordinate XMLSpec
74
+ # nodes that should handle them; the hash table _default_ should reference
75
+ # a node that will handle unexpected tags. If appropriate, you can use
76
+ # a single node to service several different tag names (bearing in mind
77
+ # that the dispatch table will be shared).
78
+ #
79
+ # There are two predefined global XMLSpecs:
80
+ # $specXMLVoid::
81
+ # This is a do-nothing basic XMLSpec that can be used to represent
82
+ # elements that you aren't interested in. These and any subordinate
83
+ # elements will be skipped during parsing.
84
+ # $specXMLFail::
85
+ # This will raise XMLError if it is invoked. It may be used if an
86
+ # unexpected element is a serious problem.
87
+ class XMLSpec
88
+ # Create an empty instance. Unless it is a non-functional leaf node,
89
+ # it will need to be specialized by filling the dispatch table ( #specs! )
90
+ # and redefining methods as appropriate.
91
+ def initialize
92
+ @subspecs=Hash.new($specXMLVoid) end
93
+
94
+ # Set the XMLSpec that will be used for tag names that are not
95
+ # specifically referenced by the dispatch table.
96
+ def default! spec
97
+ @subspecs.default = spec end
98
+
99
+ # Adds tagnames and their associated XMLSpecs to the dispatch table.
100
+ # The argument _specs_ is a preconstructed hash that will be merged
101
+ # with any existing table. (There is no way of directly removing
102
+ # entries -- except by overwriting them -- as the tree is a static
103
+ # structure built before reading the document.)
104
+ def specs! specs
105
+ @subspecs.merge! specs end
106
+
107
+ # Convenience method for accessing the dispatch table; returns the hash.
108
+ def specs
109
+ return @subspecs end
110
+
111
+ # Locates and returns the XMLSpec that handles _name_ in the dispatch table.
112
+ def spec name
113
+ return @subspecs[name] end
114
+
115
+ # Called by XMLStreamListener when a start-tag for an element handled
116
+ # by this XMLSpec is read. Arguments:
117
+ # _context_::
118
+ # the context object passed from the level above (application
119
+ # specific -- may be nil). This is also the return value
120
+ # from the method, and will be passed on to any subordinate
121
+ # nodes. A derived version may actually supply a completely
122
+ # new context if desired, but if it does so, it _must_
123
+ # preserve the one it received (in an instance variable)
124
+ # for restoration later by the #done (or #empty) method.
125
+ # (In any normal situation the same context object --
126
+ # updated as required -- will be used throughout. No
127
+ # special preservation is then needed, as long as all
128
+ # methods return it on exit.)
129
+ # _name_::
130
+ # the actual tag name of the element. (The same node can handle
131
+ # several tagnames.)
132
+ # _attrs_:: a hash of the attribute _name_/_value_ pairs in the tag.
133
+ # The base method is a dummy that does nothing but return the context.
134
+ # Derived nodes will redefine the method as necessary.
135
+ def start context,name,attrs
136
+ return context # can get changed
137
+ end
138
+
139
+ # Called by XMLStreamListener when an end-tag for an element handled
140
+ # by this XMLSpec is read _and_ the element was not empty.
141
+ # If it is actually an empty element, #empty is invoked instead.
142
+ # If any action needs to be taken at the end of an element, this dummy
143
+ # method may be redefined.
144
+ # Arguments:
145
+ # _context_::
146
+ # the context object at this level (application
147
+ # specific -- may be nil). This is also the return value
148
+ # from the method, and will be passed back to XMLStreamListener
149
+ # (which does _not_ keep track of context itself!).
150
+ # If the #start method provided a new context this _must_
151
+ # restore the one preserved at that time.
152
+ # _name_::
153
+ # the actual tag name of the element. (Not normally needed,
154
+ # but it might be useful to have the name again here.)
155
+ def done context,name
156
+ return context # can get restored
157
+ end
158
+
159
+ # Called by XMLStreamListener when an end-tag for an element handled
160
+ # by this XMLSpec is read _and_ the element was empty.
161
+ # If it is not actually an empty element, #done is invoked instead.
162
+ # If any action needs to be taken for an empty element, this dummy
163
+ # method may be redefined.
164
+ # Arguments:
165
+ # _context_::
166
+ # the context object at this level (application
167
+ # specific -- may be nil). This is also the return value
168
+ # from the method, and will be passed back to XMLStreamListener
169
+ # (which does _not_ keep track of context itself!).
170
+ # If the #start method provided a new context this _must_
171
+ # restore the one preserved at that time.
172
+ # Unlike #done, this method is _not_ passed the tag name:
173
+ # XMLStreamListener#tag_end calls it with the context only. (If the
174
+ # name is needed, it can be noted in #start, which is always called first.)
175
+ def empty context
176
+ # called only if no enclosed text or elements
177
+ return context
178
+ end
179
+
180
+ # Called by XMLStreamListener when text is encountered within a
181
+ # handled element. It will be invoked separately for each text
182
+ # segment (separated by subordinate elements) within the element.
183
+ # It is a dummy method that may be redefined as desired to handle
184
+ # the text. It does not need to return the _context_.
185
+ def text context,data
186
+ end
187
+ end
188
+
189
+ # Predefined global XMLSpecs:
190
+
191
+ $specXMLVoid = XMLSpec.new
192
+ $specXMLVoid.default!($specXMLVoid) # default was nil at creation; point it at itself
193
+
194
+ $specXMLFail = XMLSpec.new
195
+ $specXMLFail.default!($specXMLFail)
196
+ def $specXMLFail.start(context,name,attrs)
197
+ raise XMLError.new("Failed Tag <#{name}...>")
198
+ end
199
+
200
+ # This class includes REXML::StreamListener to provide an interface to
201
+ # a 'tree' of XMLSpec nodes that models the hierarchy of the XML document
202
+ # to be read.
203
+ class XMLStreamListener
204
+ include REXML::StreamListener
205
+ # Create a new XMLStreamListener with _root_ as the root XMLSpec
206
+ # of the XML hierarchy to be parsed. _base_ is an optional 'context',
207
+ # of any form suitable to the task, that will be passed to all XMLSpec
208
+ # methods invoked.
209
+ def initialize root=$specXMLVoid, base=nil
210
+ @currSpec=root
211
+ @currContext=base
212
+ @prevspecs=[]
213
+ @openTag=nil
214
+ end
215
+ # Invoked when a start-tag is encountered, with args:
216
+ # * _name_ the tag name
217
+ # * _attrs_ a Hash of attribute/value pairs. [*NOT* an array of arrays!]
218
+ # -- i.e. a start tag like:
219
+ # <tag attr1="value1" attr2="value2">
220
+ # will result in:
221
+ # tag_start( "tag", {"attr1"=>"value1","attr2"=>"value2"})
222
+ #
223
+ # This in turn determines the appropriate XMLSpec node that should
224
+ # handle the tag (by querying the current spec), sets this as the new
225
+ # current spec, and invokes its XMLSpec#start method.
226
+ def tag_start name, attrs
227
+ @prevspecs.push(@currSpec)
228
+ @openTag=name
229
+ @currSpec = @currSpec.spec name
230
+ @currContext = @currSpec.start(@currContext,name,attrs)
231
+ end
232
+ # Invoked when the end tag is reached, with the _name_ of the tag
233
+ # as argument. In the case of an empty tag ('<tag/>', or '<tag></tag>'),
234
+ # tag_end will be called immediately after tag_start.
235
+ # If the element was not empty, the current XMLSpec#done method is
236
+ # called, otherwise the XMLSpec#empty method. Then the previous
237
+ # (higher-level) spec is restored.
238
+ def tag_end name
239
+ @currContext = if (@openTag == name)
240
+ @currSpec.empty(@currContext)
241
+ else
242
+ @currSpec.done(@currContext, name)
243
+ end
244
+ @openTag=nil
245
+ @currSpec = @prevspecs.pop
246
+ end
247
+ # Invoked when text is encountered in the document,
248
+ # with the _text_ content as argument.
249
+ # The current XMLSpec#text method is in turn called.
250
+ # (Note that if the text is interspersed with other elements,
251
+ # this method is invoked for each segment separately.)
252
+ def text text
253
+ @openTag=nil
254
+ @currSpec.text(@currContext, text)
255
+ end
256
+ end
257
+
258
+ end
259
+
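XMLStreamListener tells empty elements from non-empty ones by remembering the most recently opened tag and clearing that record whenever text or another element intervenes. The sketch below uses a cut-down listener that keeps only that bookkeeping, which makes the rule easy to check. `TraceListener` is a hypothetical name, and only the standard rexml library is assumed.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Keeps only the @openTag logic of XMLStreamListener: remember the last
# start-tag, forget it on text or a new element, and compare on end-tag.
class TraceListener
  include REXML::StreamListener
  attr_reader :log

  def initialize
    @open_tag = nil
    @log = []
  end

  def tag_start(name, _attrs)
    @open_tag = name
  end

  def tag_end(name)
    # Still matching means nothing came between start- and end-tag.
    @log << (@open_tag == name ? "empty #{name}" : "done #{name}")
    @open_tag = nil
  end

  def text(_data)
    @open_tag = nil
  end
end

listener = TraceListener.new
REXML::Document.parse_stream("<a><b/><c>x</c><d></d></a>", listener)
p listener.log # prints ["empty b", "done c", "empty d", "done a"]
```

'<d></d>' is reported as empty just like '<b/>', because REXML delivers no text between its start-tag and end-tag. Whitespace between tags does arrive as text, so an element containing only a line break counts as non-empty.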
data/xmldemo.rb ADDED
@@ -0,0 +1,75 @@
1
+ # 'xmldemo.rb'
2
+ # This is an example of how to use the XMLStreamin classes
3
+ # (XMLStreamListener and XMLSpec) in 'xmlstreamin.rb' to parse an XML stream.
4
+ # [These comments are intended to be read inline -- not via rdoc...]
5
+ #
6
+ # For demonstration, it is set to extract parts of the XML example used
7
+ # in the REXML Tutorial ('demodoc.xml' here), but any other file should
8
+ # just get its tags listed.
9
+
10
+
11
+ # xmlstreamin.rb should be in the local directory or the library path:
12
+ require "xmlstreamin"
13
+ # This saves qualifying all the references with "XMLStreamin::":
14
+ include XMLStreamin
15
+
16
+ ####################################################
17
+
18
+ # It is useful to create a new class that extends the basic one slightly:
19
+ class XMLGenericSpec < XMLSpec
20
+ def start(context,name,attrs)
21
+ print "Element: #{name} \n"
22
+ attrs.each {|attr|
23
+ print " #{attr[0]} = #{attr[1]}\n"}
24
+ return context
25
+ end
26
+ end
27
+
28
+ # ...and we can use this for a default node for unknown XML formats:
29
+ specShow = XMLGenericSpec.new
30
+ # This invokes itself for any enclosed elements:
31
+ specShow.default! specShow
32
+
33
+ ####################################################
34
+
35
+ # We declare a spec to handle the "name" elements of the example format:
36
+ specName = XMLGenericSpec.new
37
+ # The name itself is element text, so we redefine that method:
38
+ def specName.text(context,data)
39
+ print " [#{data}]\n" ## assumes single line
40
+ end
41
+ # There shouldn't be any sub-elements! Let's fail if there are:
42
+ specName.default! $specXMLFail
43
+
44
+ # The enclosing element is an 'item', so we define a node for that
45
+ # (just a basic XMLSpec, so it does nothing but model the structure):
46
+ specItem = XMLSpec.new
47
+ # We pass 'name' sub-elements on to their handling node:
48
+ specItem.specs!({'name'=>specName})
49
+ # Everything else gets ignored (the default is '$specXMLVoid', which skips them)
50
+
51
+ # Enclosing this is the 'section' element.
52
+ # For variety we use XMLGenericSpec again here, so we see the attributes:
53
+ specSection = XMLGenericSpec.new
54
+ # The only sub-element here should be an 'item':
55
+ specSection.specs!({'item'=>specItem})
56
+
57
+ # The top-level element is 'inventory':
58
+ specInventory = XMLSpec.new
59
+ specInventory.specs!({'section'=>specSection})
60
+
61
+
62
+ ####################################################
63
+
64
+ # The very top is the "Document" node which references the top 'inventory' element:
65
+ specDocument = XMLSpec.new
66
+ # If we don't see the expected top tag, we go into "show" mode:
67
+ specDocument.default! specShow
68
+ specDocument.specs!({'inventory'=>specInventory})
69
+
70
+ # Finally, to parse a document, we create an XMLStreamListener and a source object,
71
+ # and pass them to REXML.
72
+ list = XMLStreamListener.new specDocument
73
+ source = File.new ARGV[0]
74
+ REXML::Document.parse_stream(source, list)
75
+
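The demo only prints what it finds. The hedged sketch below illustrates the context convention, in which a single mutable object is passed through every handler and returned unchanged, by collecting the item names instead. So that it runs without xmlstreamin.rb, `MiniSpec` and `MiniListener` are hypothetical stand-ins that restate only the dispatch and spec-stack logic of XMLSpec and XMLStreamListener, with no #empty handling. With the real module, the same tree would be built from XMLSpec objects using #specs!.

```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical cut-down XMLSpec: dispatch table plus dummy handlers.
class MiniSpec
  attr_reader :specs

  def initialize(default = nil)
    @specs = Hash.new(default)
  end

  def spec(name)
    @specs[name]
  end

  def start(context, _name, _attrs)
    context
  end

  def done(context, _name)
    context
  end

  def text(_context, _data); end
end

VOID = MiniSpec.new
VOID.specs.default = VOID # anything under a void node is skipped as well

# Hypothetical cut-down XMLStreamListener: a stack of specs and one context.
class MiniListener
  include REXML::StreamListener

  def initialize(root, context)
    @spec, @context, @stack = root, context, []
  end

  def tag_start(name, attrs)
    @stack.push(@spec)
    @spec = @spec.spec(name)
    @context = @spec.start(@context, name, attrs)
  end

  def tag_end(name)
    @context = @spec.done(@context, name)
    @spec = @stack.pop
  end

  def text(data)
    @spec.text(@context, data)
  end
end

# The context is an Array; the <name> handler appends each name to it.
names = MiniSpec.new(VOID)
def names.text(context, data)
  context << data
end
item = MiniSpec.new(VOID)
item.specs["name"] = names
section = MiniSpec.new(VOID)
section.specs["item"] = item
inventory = MiniSpec.new(VOID)
inventory.specs["section"] = section
document = MiniSpec.new(VOID)
document.specs["inventory"] = inventory

xml = '<inventory><section name="health">' \
      '<item upc="1"><name>Invisibility Cream</name><price>14.50</price></item>' \
      '<item upc="2"><name>Levitation Salve</name></item>' \
      '</section></inventory>'
found = []
REXML::Document.parse_stream(xml, MiniListener.new(document, found))
p found # prints ["Invisibility Cream", "Levitation Salve"]
```

Every handler returns the array it was given, so nothing has to be saved and restored between #start and #done. That is the "normal situation" described in the XMLSpec#start documentation.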
metadata ADDED
@@ -0,0 +1,49 @@
1
+ --- !ruby/object:Gem::Specification
2
+ rubygems_version: 0.9.4
3
+ specification_version: 1
4
+ name: xmlstreamin
5
+ version: !ruby/object:Gem::Version
6
+ version: 0.0.1
7
+ date: 2007-08-06 00:00:00 +02:00
8
+ summary: XMLStreamin
9
+ require_paths:
10
+ - lib
11
+ email: pete@jwgibbs.cchem.berkeley.edu
12
+ homepage:
13
+ rubyforge_project:
14
+ description:
15
+ autorequire:
16
+ default_executable:
17
+ bindir: bin
18
+ has_rdoc: true
19
+ required_ruby_version: !ruby/object:Gem::Version::Requirement
20
+ requirements:
21
+ - - ">"
22
+ - !ruby/object:Gem::Version
23
+ version: 0.0.0
24
+ version:
25
+ platform: ruby
26
+ signing_key:
27
+ cert_chain:
28
+ post_install_message:
29
+ authors:
30
+ - Peter Goodeve
31
+ files:
32
+ - lib/xmlstreamin.rb
33
+ - xmldemo.rb
34
+ - demodoc.xml
35
+ - README
36
+ test_files: []
37
+
38
+ rdoc_options: []
39
+
40
+ extra_rdoc_files: []
41
+
42
+ executables: []
43
+
44
+ extensions: []
45
+
46
+ requirements: []
47
+
48
+ dependencies: []
49
+