xmlstreamin 0.0.1

Files changed (5)
  1. data/README +63 -0
  2. data/demodoc.xml +27 -0
  3. data/lib/xmlstreamin.rb +259 -0
  4. data/xmldemo.rb +75 -0
  5. metadata +49 -0
data/README ADDED
@@ -0,0 +1,63 @@
1
+ XMLStreamin
2
+ ===========
3
+
4
+ XMLStreamin is a small Ruby module that provides a way of reading an XML
5
+ document as a stream, while letting it be processed according to its tree
6
+ structure much more handily than the usual 'flat' stream reader does.
7
+
8
+ Unlike such readers, which typically simply call unspecialized methods
9
+ for each start-tag, end-tag, text segment, and so on, and leave it to the
10
+ application to sort out the hierarchy, XMLStreamin uses a pre-built tree
11
+ of XMLSpec nodes to model the expected document structure.
12
+ (The module is fairly basic: it only handles the main hierarchy of the document.
13
+ No attention is paid to other elements like declarations, as it is not intended
14
+ as a do-everything parser. As the XMLStreamListener class is derived from
15
+ REXML::StreamListener, you could add methods to handle such things if needed.)
16
+
17
+ Each node specifies the actual actions to be taken when an element that
18
+ it represents is encountered. It can specify what processing needs to be
19
+ done on the attributes of a start-tag, the handling of included text, and any
20
+ clean-up actions when the end-tag is read. It also contains a table of the
21
+ expected sub-elements and their XMLSpec nodes, thus reflecting the document
22
+ tree.
23
+
24
+ The central class is 'XMLStreamListener' which extends REXML::StreamListener
25
+ to provide an interface to a 'tree' of 'XMLSpec' nodes that models the hierarchy
26
+ of the XML document to be read.
27
+
28
+ 'XMLSpec' is a base class intended to be extended as needed to handle
29
+ processing for each type of expected element at each level of the XML
30
+ hierarchy in the document. It has two categories of methods:
31
+ those concerned with setup (which should not need to be modified)
32
+ -- 'specs!', 'default!', and 'spec' --, and the handler methods
33
+ 'start', 'done', 'empty' and 'text', that should be specialized
34
+ as necessary in derived classes or instances.
35
+
36
+ XMLSpec nodes are intended to be linked in a tree structure, reflecting
37
+ the structure of the XML document to be read. Each node has a 'dispatch'
38
+ (hash) table associating expected tag names with the subordinate XMLSpec
39
+ nodes that should handle them; the hash table default should reference
40
+ a node that will handle unexpected tags. If appropriate, you can use
41
+ a single node to service several different tag names (bearing in mind
42
+ that the dispatch table will be shared).
43
+
44
+ There are two predefined global XMLSpecs: '$specXMLVoid', which is a do-nothing
45
+ basic XMLSpec that can be used to represent elements that you aren't interested
46
+ in, and '$specXMLFail' which will raise an error if it is invoked.
47
+
48
+ To use this module, xmlstreamin.rb should either be in the local directory
49
+ or the Ruby library path. It can then be loaded with 'require "xmlstreamin"'.
50
+ It loads "rexml/document" and "rexml/streamlistener" itself. The latter
51
+ should not be needed outside the module, but XMLStreamListener is invoked
52
+ via a call to "REXML::Document.parse_stream".
53
+
54
+ See "xmldemo.rb" and the comments therein for a small -- contrived --
55
+ example of how to use it. RDOC documentation is also provided.
56
+
57
+ Contents:
58
+ xmlstreamin.rb -- move this into the Ruby library path
59
+ xmldemo.rb -- example of usage
60
+ demodoc.xml -- to invoke xmldemo.rb on
61
+ README -- this file
62
+ XMLStreamin.html -- link to:
63
+ doc -- RDOC directory
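The dispatch-table behaviour described in the README (expected tag names map to handler nodes, and the hash default catches everything else) can be sketched in plain Ruby. This is only an illustration under stated assumptions: build_dispatch is a hypothetical helper, and the symbols stand in for XMLSpec nodes; neither is part of the module.

```ruby
# Hypothetical helper, for illustration only: builds a lookup table the way
# the README describes the 'dispatch' table (a Hash with a default value).
def build_dispatch(default_spec, entries)
  table = Hash.new(default_spec)  # missing keys return default_spec
  table.merge!(entries)           # merge! adds entries, the default is kept
  table
end

# Symbols stand in for XMLSpec nodes here (an assumption of this sketch).
dispatch = build_dispatch(:void, 'name' => :name_spec, 'item' => :item_spec)

p dispatch['name']     # an expected tag: its own handler
p dispatch['intron']   # an unexpected tag: falls through to the default

# Replacing the default later only affects tags missing from the table.
dispatch.default = :fail
p dispatch['intron']
p dispatch['item']
```

Because the table is an ordinary Hash, a node shared between several tag names also shares this one table, which is the caveat the README mentions.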
data/demodoc.xml ADDED
@@ -0,0 +1,27 @@
1
+ <inventory title="OmniCorp Store #45x10^3">
2
+ <section name="health">
3
+ <item upc="123456789" stock="12">
4
+ <name>Invisibility Cream</name>
5
+ <price>14.50</price>
6
+ <description>Makes you invisible</description>
7
+ </item>
8
+ <item upc="445322344" stock="18">
9
+ <name>Levitation Salve</name>
10
+ <price>23.99</price>
11
+ <description>Levitate yourself for up to 3 hours per application</description>
12
+ </item>
13
+ </section>
14
+ <section name="food">
15
+ <item upc="485672034" stock="653">
16
+ <name>Blork and Freen Instameal</name>
17
+ <price>4.95</price>
18
+ <description>A tasty meal in a tablet;<intron>junk added</intron>just add water</description>
19
+ </item>
20
+ <item upc="132957764" stock="44">
21
+ <name>Grob winglets</name>
22
+ <price>3.56</price>
23
+ <description>Tender winglets of Grob. Just add water</description>
24
+
25
+ </item>
26
+ </section>
27
+ </inventory>
data/lib/xmlstreamin.rb ADDED
@@ -0,0 +1,259 @@
1
+ # :title: XMLStreamin Documentation
2
+ # XMLStreamin is a small module that provides a way of reading XML as a stream,
3
+ # while letting it be processed according to its tree structure much more
4
+ # handily than the usual 'flat' stream reader does.
5
+ #
6
+ # Unlike such readers, which typically simply call unspecialized methods
7
+ # for each start-tag, end-tag, text segment, and so on, and leave it to the
8
+ # application to sort out the hierarchy, XMLStreamin uses a pre-built tree
9
+ # of #XMLSpec nodes to model the expected document structure.
10
+ #
11
+ # Each node specifies the actual actions to be taken when an element that
12
+ # it represents is encountered. It can specify what processing needs to be
13
+ # done on the attributes of a start-tag, the handling of included text, and any
14
+ # clean-up actions when the end-tag is read. It also contains a table of the
15
+ # expected sub-elements and _their_ XMLSpec nodes, thus reflecting the document
16
+ # tree.
17
+ #
18
+ # Usage skeleton:
19
+ # require "xmlstreamin"
20
+ # include XMLStreamin # for convenience in naming
21
+ #
22
+ # # Supply XMLSpec objects to model the document tree:
23
+ #
24
+ # #First for the leaf elements:
25
+ # someleafspec = XMLSpec.new
26
+ # # override methods as needed, e.g.:
27
+ # def someleafspec.text(context,data)
28
+ # #...
29
+ # end
30
+ #
31
+ # #... then provide nodes for the elements that enclose these:
32
+ # someotherspec = XMLSpec.new
33
+ # someotherspec.specs!({'someleaftag'=>someleafspec,...})
34
+ # # Override methods in these as well, if appropriate:
35
+ # def someotherspec.start(context,name,attrs)
36
+ # #...
37
+ # return context
38
+ # end
39
+ #
40
+ # #...... more specs complete the tree, up to some 'toplevelspec'
41
+ # # for the document's enclosing element.
42
+ #
43
+ # # Finally one spec for the document itself that just has
44
+ # # an entry for that top level tag:
45
+ # specDocument = XMLSpec.new
46
+ # specDocument.specs!({'document_top_tag'=>toplevelspec})
47
+ #
48
+ # To run, provide a source stream, and invoke the REXML stream parser:
49
+ # source = ...the source of the document stream
50
+ # REXML::Document.parse_stream(source,
51
+ # XMLStreamListener.new(specDocument))
52
+ #
53
+ #
54
+ module XMLStreamin
55
+
56
+ require "rexml/document"
57
+ require "rexml/streamlistener"
58
+
59
+ # Exception class that will be raised by the module if it hits trouble.
60
+ class XMLError < RuntimeError
61
+ end
62
+
63
+ # XMLSpec is a base class intended to be extended as needed to handle
64
+ # processing for each type of expected element at each level of the
65
+ # XML hierarchy in the document. It has two categories of methods:
66
+ # those concerned with setup (which should not need to be modified)
67
+ # -- #specs!, #default!, and #spec --, and the handler methods
68
+ # #start, #done, #empty and #text, that are intended to be
69
+ # specialized as necessary in derived classes or instances.
70
+ #
71
+ # XMLSpec nodes are intended to be linked in a tree structure, reflecting
72
+ # the structure of the XML document to be read. Each node has a 'dispatch'
73
+ # (hash) table associating expected tag names with the subordinate XMLSpec
74
+ # nodes that should handle them; the hash table _default_ should reference
75
+ # a node that will handle unexpected tags. If appropriate, you can use
76
+ # a single node to service several different tag names (bearing in mind
77
+ # that the dispatch table will be shared).
78
+ #
79
+ # There are two predefined global XMLSpecs:
80
+ # $specXMLVoid::
81
+ # This is a do-nothing basic XMLSpec that can be used to represent
82
+ # elements that you aren't interested in. These and any subordinate
83
+ # elements will be skipped during parsing.
84
+ # $specXMLFail::
85
+ # This will raise XMLError if it is invoked. It may be used if an
86
+ # unexpected element is a serious problem.
87
+ class XMLSpec
88
+ # Create an empty instance. Unless it is a non-functional leaf node,
89
+ # it will need to be specialized by filling the dispatch table ( #specs! )
90
+ # and redefining methods as appropriate.
91
+ def initialize
92
+ @subspecs=Hash.new($specXMLVoid) end
93
+
94
+ # Set the XMLSpec that will be used for tag names that are not
95
+ # specifically referenced by the dispatch table.
96
+ def default! spec
97
+ @subspecs.default = spec end
98
+
99
+ # Adds tagnames and their associated XMLSpecs to the dispatch table.
100
+ # The argument _specs_ is a preconstructed hash that will be merged
101
+ # with any existing table. (There is no way of directly removing
102
+ # entries -- except by overwriting them -- as the tree is a static
103
+ # structure built before reading the document.)
104
+ def specs! specs
105
+ @subspecs.merge! specs end
106
+
107
+ # Convenience method for accessing the dispatch table; returns the hash.
108
+ def specs
109
+ return @subspecs end
110
+
111
+ # Locates and returns the XMLSpec that handles _name_ in the dispatch table.
112
+ def spec name
113
+ return @subspecs[name] end
114
+
115
+ # Called by XMLStreamListener when a start-tag for an element handled
116
+ # by this XMLSpec is read. Arguments:
117
+ # _context_::
118
+ # the context object passed from the level above (application
119
+ # specific -- may be nil). This is also the return value
120
+ # from the method, and will be passed on to any subordinate
121
+ # nodes. A derived version may actually supply a completely
122
+ # new context if desired, but if it does so, it _must_
123
+ # preserve the one it received (in an instance variable)
124
+ # for restoration later by the #done (or #empty) method.
125
+ # (In any normal situation the same context object --
126
+ # updated as required -- will be used throughout. No
127
+ # special preservation is then needed, as long as all
128
+ # methods return it on exit.)
129
+ # _name_::
130
+ # the actual tag name of the element. (The same node can handle
131
+ # several tagnames.)
132
+ # _attrs_:: a hash of the attribute _name_/_value_ pairs in the tag.
133
+ # The base method is a dummy that does nothing but return the context.
134
+ # Derived nodes will redefine the method as necessary.
135
+ def start context,name,attrs
136
+ return context # can get changed
137
+ end
138
+
139
+ # Called by XMLStreamListener when an end-tag for an element handled
140
+ # by this XMLSpec is read _and_ the element was not empty.
141
+ # If it is actually an empty element, #empty is invoked instead.
142
+ # If any action needs to be taken at the end of an element, this dummy
143
+ # method may be redefined.
144
+ # Arguments:
145
+ # _context_::
146
+ # the context object at this level (application
147
+ # specific -- may be nil). This is also the return value
148
+ # from the method, and will be passed back to XMLStreamListener
149
+ # (which does _not_ keep track of context itself!).
150
+ # If the #start method provided a new context this _must_
151
+ # restore the one preserved at that time.
152
+ # _name_::
153
+ # the actual tag name of the element. (Not normally needed,
154
+ # but it might be useful to have the name again here.)
155
+ def done context,name
156
+ return context # can get restored
157
+ end
158
+
159
+ # Called by XMLStreamListener when an end-tag for an element handled
160
+ # by this XMLSpec is read _and_ the element was empty.
161
+ # If it is not actually an empty element, #done is invoked instead.
162
+ # If any action needs to be taken for an empty element, this dummy
163
+ # method may be redefined.
164
+ # Arguments:
165
+ # _context_::
166
+ # the context object at this level (application
167
+ # specific -- may be nil). This is also the return value
168
+ # from the method, and will be passed back to XMLStreamListener
169
+ # (which does _not_ keep track of context itself!).
170
+ # If the #start method provided a new context this _must_
171
+ # restore the one preserved at that time.
172
+ # _name_::
173
+ # the actual tag name of the element. (Not normally needed,
174
+ # but it might be useful to have the name again here.)
175
+ def empty context
176
+ # called only if no enclosed text or elements
177
+ return context
178
+ end
179
+
180
+ # Called by XMLStreamListener when text is encountered within a
181
+ # handled element. It will be invoked separately for each text
182
+ # segment (separated by subordinate elements) within the element.
183
+ # It is a dummy method that may be redefined as desired to handle
184
+ # the text. It does not need to return the _context_.
185
+ def text context,data
186
+ end
187
+ end
188
+
189
+ # Predefined global XMLSpecs:
190
+
191
+ $specXMLVoid = XMLSpec.new
192
+ $specXMLVoid.default!($specXMLVoid) # to avoid circularity
193
+
194
+ $specXMLFail = XMLSpec.new
195
+ $specXMLFail.default!($specXMLFail)
196
+ def $specXMLFail.start(context,name,attrs)
197
+ raise XMLError.new("Failed Tag <#{name}...>")
198
+ end
199
+
200
+ # This class extends REXML::StreamListener to provide an interface to
201
+ # a 'tree' of XMLSpec nodes that models the hierarchy of the XML document
202
+ # to be read.
203
+ class XMLStreamListener
204
+ include REXML::StreamListener
205
+ # Create a new XMLStreamListener with _root_ as the root XMLSpec
206
+ # of the XML hierarchy to be parsed. _base_ is an optional 'context',
207
+ # of any form suitable to the task, that will be passed to all XMLSpec
208
+ # methods invoked.
209
+ def initialize root=$specXMLVoid, base=nil
210
+ @currSpec=root
211
+ @currContext=base
212
+ @prevspecs=[]
213
+ @openTag=nil
214
+ end
215
+ # Invoked when a tag is encountered, with args:
216
+ # * _name_ the tag name
217
+ # * _attrs_ a Hash of attribute/value pairs. [*NOT* an array of arrays!]
218
+ # -- i.e. a start tag like:
219
+ # <tag attr1="value1" attr2="value2">
220
+ # will result in:
221
+ # tag_start( "tag", {"attr1"=>"value1","attr2"=>"value2"})
222
+ #
223
+ # This in turn determines the appropriate XMLSpec node that should
224
+ # handle the tag (by querying the current spec), sets this as the new
225
+ # current spec, and invokes its XMLSpec#start method.
226
+ def tag_start name, attrs
227
+ @prevspecs.push(@currSpec)
228
+ @openTag=name
229
+ @currSpec = @currSpec.spec name
230
+ @currContext = @currSpec.start(@currContext,name,attrs)
231
+ end
232
+ # Invoked when the end tag is reached, with the _name_ of the tag
233
+ # as argument. In the case of an empty tag ('<tag/>'),
234
+ # tag_end will be called immediately after tag_start.
235
+ # If the element was not empty, the current XMLSpec#done method is
236
+ # called, otherwise the XMLSpec#empty method. Then the previous
237
+ # (higher level) spec is restored.
238
+ def tag_end name
239
+ @currContext = if (@openTag == name)
240
+ @currSpec.empty(@currContext)
241
+ else
242
+ @currSpec.done(@currContext, name)
243
+ end
244
+ @openTag=nil
245
+ @currSpec = @prevspecs.pop
246
+ end
247
+ # Invoked when text is encountered in the document,
248
+ # with the _text_ content as argument.
249
+ # The current XMLSpec#text method is in turn called.
250
+ # (Note that if the text is interspersed with other elements,
251
+ # this method is invoked for each segment separately.)
252
+ def text text
253
+ @openTag=nil
254
+ @currSpec.text(@currContext, text)
255
+ end
256
+ end
257
+
258
+ end
259
+
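The way tag_end distinguishes an empty element from a non-empty one (the last opened tag name is remembered, and cleared as soon as text arrives or another tag ends) can be tried in isolation. The sketch below is a stand-in written for illustration: TinyListener is a hypothetical class, not the module's XMLStreamListener, and the parser events are fed in by hand rather than by REXML.

```ruby
# Hypothetical stand-in, for illustration only. It keeps just the
# open-tag bookkeeping and records which handler would have been chosen.
class TinyListener
  attr_reader :log

  def initialize
    @open_tag = nil
    @log = []
  end

  def tag_start(name, attrs)
    @open_tag = name          # remember the most recently opened tag
  end

  def text(data)
    @open_tag = nil           # any text means the element is not empty
  end

  def tag_end(name)
    # If the tag being closed is still the last one opened, nothing
    # (no text, no sub-element) came in between: treat it as empty.
    @log << [name, @open_tag == name ? :empty : :done]
    @open_tag = nil
  end
end

listener = TinyListener.new
# Events for <a><b/></a>
listener.tag_start('a', {})
listener.tag_start('b', {})
listener.tag_end('b')
listener.tag_end('a')
# Events for <c>x</c>
listener.tag_start('c', {})
listener.text('x')
listener.tag_end('c')
p listener.log
```

In this run 'b' is recorded as empty, while 'a' (which enclosed an element) and 'c' (which enclosed text) are recorded as done.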
data/xmldemo.rb ADDED
@@ -0,0 +1,75 @@
1
+ # 'xmldemo.rb'
2
+ # This is an example of how to use the XMLStreamin classes
3
+ # (XMLStreamListener and XMLSpec) in 'xmlstreamin.rb' to parse an XML stream.
4
+ # [These comments are intended to be read inline -- not via rdoc...]
5
+ #
6
+ # For demonstration, it is set to extract parts of the XML example used
7
+ # in the REXML Tutorial ('demodoc.xml' here), but any other file should
8
+ # just get its tags listed.
9
+
10
+
11
+ # xmlstreamin.rb should be in the local directory or the library path:
12
+ require "xmlstreamin"
13
+ # This saves qualifying all the references with "XMLStreamin::":
14
+ include XMLStreamin
15
+
16
+ ####################################################
17
+
18
+ # It is useful to create a new class that extends the basic one slightly:
19
+ class XMLGenericSpec < XMLSpec
20
+ def start(context,name,attrs)
21
+ print "Element: #{name} \n"
22
+ attrs.each {|attr|
23
+ print " #{attr[0]} = #{attr[1]}\n"}
24
+ return context
25
+ end
26
+ end
27
+
28
+ # ...and we can use this for a default node for unknown XML formats:
29
+ specShow = XMLGenericSpec.new
30
+ # This invokes itself for any enclosed elements:
31
+ specShow.default! specShow
32
+
33
+ ####################################################
34
+
35
+ # We declare a spec to handle the "name" elements of the example format:
36
+ specName = XMLGenericSpec.new
37
+ # The name itself is element text, so we redefine that method:
38
+ def specName.text(context,data)
39
+ print " [#{data}]\n" ## assumes single line
40
+ end
41
+ # There shouldn't be any sub-elements! Let's fail if there are:
42
+ specName.default! $specXMLFail
43
+
44
+ # The enclosing element is an 'item', so we define a node for that
45
+ # (just a basic XMLSpec, so it does nothing but model the structure):
46
+ specItem = XMLSpec.new
47
+ # We pass 'name' sub-elements on to their handling node:
48
+ specItem.specs!({'name'=>specName})
49
+ # Everything else gets ignored (the default is '$specXMLVoid', which skips them)
50
+
51
+ # Enclosing this is the 'section' element.
52
+ # For variety we use XMLGenericSpec again here, so we see the attributes:
53
+ specSection = XMLGenericSpec.new
54
+ # The only sub-element here should be an 'item':
55
+ specSection.specs!({'item'=>specItem})
56
+
57
+ # The top-level element is 'inventory'
58
+ specInventory = XMLSpec.new
59
+ specInventory.specs!({'section'=>specSection})
60
+
61
+
62
+ ####################################################
63
+
64
+ # The very top is the "Document" node which references the top 'inventory' element:
65
+ specDocument = XMLSpec.new
66
+ # If we don't see the expected top tag, we go into "show" mode:
67
+ specDocument.default! specShow
68
+ specDocument.specs!({'inventory'=>specInventory})
69
+
70
+ # Finally, to parse a document, we create an XMLStreamListener and a source object,
71
+ # and pass them to REXML.
72
+ list = XMLStreamListener.new specDocument
73
+ source = File.new ARGV[0]
74
+ REXML::Document.parse_stream(source, list)
75
+
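The demo prints each name as it is read. A variation is to collect the text into the context object instead, using the same per-instance override style ('def specName.text'). The following is a minimal sketch under stated assumptions: SketchSpec and make_name_spec are hypothetical stand-ins that only mimic the handler signatures of XMLSpec, so the fragment runs without the module or a parser.

```ruby
# Hypothetical stand-in for XMLSpec, with the same handler signatures.
class SketchSpec
  def start(context, name, attrs)
    context                   # base handler: pass the context through
  end

  def text(context, data)
    # base handler: do nothing
  end
end

# Build a spec whose text handler is redefined on this one instance only.
def make_name_spec
  spec = SketchSpec.new
  def spec.text(context, data)
    context << data.strip     # collect into the context instead of printing
  end
  spec
end

names = []                    # application-specific context: a plain Array
name_spec  = make_name_spec
plain_spec = SketchSpec.new

name_spec.text(names, " Invisibility Cream ")
plain_spec.text(names, "ignored")   # the base handler leaves names untouched

p names
```

Only the one instance gets the new behaviour; other SketchSpec objects keep the do-nothing handler, which is why the demo can mix plain and specialized nodes in one tree.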
metadata ADDED
@@ -0,0 +1,49 @@
1
+ --- !ruby/object:Gem::Specification
2
+ rubygems_version: 0.9.4
3
+ specification_version: 1
4
+ name: xmlstreamin
5
+ version: !ruby/object:Gem::Version
6
+ version: 0.0.1
7
+ date: 2007-08-06 00:00:00 +02:00
8
+ summary: XMLStreamin
9
+ require_paths:
10
+ - lib
11
+ email: pete@jwgibbs.cchem.berkeley.edu
12
+ homepage:
13
+ rubyforge_project:
14
+ description:
15
+ autorequire:
16
+ default_executable:
17
+ bindir: bin
18
+ has_rdoc: true
19
+ required_ruby_version: !ruby/object:Gem::Version::Requirement
20
+ requirements:
21
+ - - ">"
22
+ - !ruby/object:Gem::Version
23
+ version: 0.0.0
24
+ version:
25
+ platform: ruby
26
+ signing_key:
27
+ cert_chain:
28
+ post_install_message:
29
+ authors:
30
+ - Peter Goodeve
31
+ files:
32
+ - lib/xmlstreamin.rb
33
+ - xmldemo.rb
34
+ - demodoc.xml
35
+ - README
36
+ test_files: []
37
+
38
+ rdoc_options: []
39
+
40
+ extra_rdoc_files: []
41
+
42
+ executables: []
43
+
44
+ extensions: []
45
+
46
+ requirements: []
47
+
48
+ dependencies: []
49
+