xamplr-pp 1.0.0

@@ -0,0 +1,298 @@
1
+ #
2
+ # xampl-pp : XML pull parser
3
+ # Copyright (C) 2002-2009 Bob Hutchison
4
+ #
5
+ # This library is free software; you can redistribute it and/or
6
+ # modify it under the terms of the GNU Lesser General Public
7
+ # License as published by the Free Software Foundation; either
8
+ # version 2.1 of the License, or (at your option) any later version.
9
+ #
10
+ # This library is distributed in the hope that it will be useful,
11
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13
+ # Lesser General Public License for more details.
14
+ #
15
+ # You should have received a copy of the GNU Lesser General Public
16
+ # License along with this library; if not, write to the Free Software
17
+ # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
18
+ #
19
+ require "xampl-pp"
20
+
21
+ ##
22
+ ## It may seem strange, but it seems that a good way to demonstrate the use
23
+ ## of the xampl-pp pull parser is to show how to build a SAX-like XML
24
+ ## parser. Both pull parsers and SAX parsers are stream based -- they parse
25
+ ## the XML file bit by bit, informing their clients of interesting events as
26
+ ## they are encountered. The whole XML document is not required to be in
27
+ ## memory. The significant difference between pull parsers and SAX parsers
28
+ ## is in where the 'main loop' is located: in the client for pull parsers,
29
+ ## in the parser for SAX parsers. Clients call a method of the pull parser
30
+ ## to get the next event. SAX parsers call methods of the client to notify
31
+ ## it of events (so these are 'push parsers').
32
+ ##
33
+ ## It turns out to be quite easy to build a SAX-like parser from a pull
34
+ ## parser. It is quite a lot harder to build a pull parser from a SAX-like
35
+ ## parser.
36
+ ##
37
+ ## This class demonstrates most of the xampl-pp interface by implementing a
38
+ ## SAX-like parser. No attempt has been made to provide all the functionality
39
+ ## provided by a good Java SAX parser, though the equivalent of a significant,
40
+ ## and useful, subset is implemented.
41
+ ##
42
+ ## The program text is annotated. Note that the annotations generally
43
+ ## follow the code being described.
44
+ ##
45
+
46
+
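The pull/push distinction described above can be sketched without xampl-pp at all. This is a minimal illustration, assuming a hypothetical ToyPullParser that hands out pre-built events; the class names, method names and event symbols are invented for the sketch and are not the Xampl_PP interface.

```ruby
# Hypothetical stand-in for a pull parser: it hands out pre-built events one
# at a time. It is NOT Xampl_PP; it only mimics the "client asks for the next
# event" shape.
class ToyPullParser
  def initialize(events)
    @events = events
    @index = 0
  end

  def end_document?
    @index >= @events.length
  end

  # The client calls this to pull the next event.
  def next_event
    event = @events[@index]
    @index += 1
    event
  end
end

# Push style built on top of pull style: the main loop lives in this wrapper,
# and the handler is called back for each event.
class ToyPushParser
  def initialize(pull_parser, handler)
    @pull = pull_parser
    @handler = handler
  end

  def parse
    until @pull.end_document?
      type, value = @pull.next_event
      case type
      when :start_element then @handler.start_element(value)
      when :text then @handler.text(value)
      when :end_element then @handler.end_element(value)
      end
    end
  end
end

# Hypothetical handler that records what it is told.
class RecordingHandler
  attr_reader :log

  def initialize
    @log = []
  end

  def start_element(name)
    @log << "<#{name}>"
  end

  def text(value)
    @log << value
  end

  def end_element(name)
    @log << "</#{name}>"
  end
end

EVENTS = [[:start_element, "a"], [:text, "hi"], [:end_element, "a"]]

# Pull style: the main loop is in the client.
pulled = []
pull = ToyPullParser.new(EVENTS)
pulled << pull.next_event until pull.end_document?

# Push style: the main loop is in the parser wrapper.
handler = RecordingHandler.new
ToyPushParser.new(ToyPullParser.new(EVENTS), handler).parse
```

Going the other way (pull from push) would need the callback-driven loop to be suspended between events, which is why it is the harder direction.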
47
+ class SAXish
48
+
49
+ ##
50
+ ## The Ruby implementation of the xampl-pp parser is called Xampl_PP, and
51
+ ## SAXish will be the name of our SAX-like parser.
52
+ ##
53
+
54
+ attr :handler, true
55
+
56
+ ##
57
+ ## SAX parsers need an event handler. 'handler' is it. The handler is expected to
58
+ ## implement the methods defined in the module 'SAXishHandler'. SAXishHandler
59
+ ## is intended to be an adapter (so you can include it in any handler you
60
+ ## write), so only the event handlers for those events in which you are
61
+ ## interested need to be redefined. SAXdemo is an implementation of
62
+ ## SAXishHandler that gathers some statistics.
63
+ ##
64
+ ## Xampl-pp requires something it calls a resolver. This is a class that
65
+ ## implements a method called resolve. There are a number of predefined
66
+ ## entities in xampl-pp: &amp; &apos; &gt; &lt; and &quot;. It is possible
67
+ ## to add more entities by adding entries to the entityMap hashtable. If an
68
+ ## entity is encountered that is not in entityMap then the resolve method on
69
+ ## the resolver is called. The default resolver returns nil, which causes
70
+ ## an exception to be thrown. If you specify your own resolver you can do
71
+ ## anything you like to obtain a value for the entity, or you can return nil
72
+ ## (and an exception will be thrown). Xampl-pp, by default, is its own
73
+ ## resolver and simply returns nil.
74
+ ##
75
+ ## We are going to require that our saxish handler also be the entity
76
+ ## resolver. This is reflected in the SAXishHandler module, which implements
77
+ ## a resolve method that always returns nil.
78
+ ##
79
+
80
+ attr :processNamespace, true
81
+ attr :reportNamespaceAttributes, true
82
+
83
+ ##
84
+ ## This block of comments can be ignored, certainly for the first reading.
85
+ ## It talks about some control you have over how xampl-pp works. The
86
+ ## default behaviour is the most commonly used.
87
+ ##
88
+ ## There are two main controls used here: processNamespace, and
89
+ ## reportNamespaceAttributes. If processNamespace is true, then namespaces
90
+ ## in the XML file being parsed will be processed. Processing means that if
91
+ ## an element <prefix:name/> is encountered, then four variables will be
92
+ ## set up in the parser instance: name is 'name', prefix is 'prefix',
93
+ ## qname is 'prefix:name', and namespace is defined. If the namespace cannot
94
+ ## be defined an exception is thrown. In addition the xmlns attributes
95
+ ## are processed. If processNamespace is false then name and qname
96
+ ## will both be 'prefix:name', and both prefix and namespace undefined.
97
+ ## If reportNamespaceAttributes is true then the xmlns attributes will be
98
+ ## reported along with all the other attributes, if false then they will
99
+ ## be hidden. The default behaviour is to process namespaces but to not
100
+ ## report the namespace attributes.
101
+ ##
102
+ ## There are two other controls that should be mentioned. They are not
103
+ ## used here.
104
+ ##
105
+ ## Pull parsers are pretty low level tools. They are meant to be fast. While
106
+ ## many well-formedness constraints are enforced, not all are. If the control
107
+ ## checkWellFormed is true then additional checks are made. Xampl-pp does
108
+ ## not guarantee that it will parse only well formed XML documents. It
109
+ ## will parse some XML files that are not well formed without objecting. In
110
+ ## future releases, it will be possible to have xampl-pp accept only
111
+ ## well formed documents. If checkWellFormed is false, then the parser
112
+ ## doesn't go out of its way to notice ill formed documents. The default
113
+ ## is true.
114
+ ##
115
+ ## The fourth control is 'utf8encode'. If this is true (the default) and an
116
+ ## entity like &#1234; is encountered, then it will be encoded
117
+ ## using UTF-8 rules. Given the current state of the parser, it would be best
118
+ ## to leave it set to true. If you want to change this then you must either
119
+ ## never use &#; encodings with numbers greater than 255 (Ruby will throw an
120
+ ## exception), or you must redefine xampl-pp's encode method to do the right
121
+ ## thing.
122
+ ##
123
+
124
+ def parse(filename)
125
+ @xpp = Xampl_PP.new
126
+ @xpp.input = File.new(filename)
127
+ @xpp.processNamespace = @processNamespace
128
+ @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
129
+ @xpp.resolver = @handler
130
+
131
+ work
132
+ end
133
+
134
+ def parseString(string)
135
+ @xpp = Xampl_PP.new
136
+ @xpp.input = string
137
+ @xpp.processNamespace = @processNamespace
138
+ @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
139
+ @xpp.resolver = @handler
140
+
141
+ work
142
+ end
143
+
144
+ #
145
+ # Constructing an instance of xampl-pp is pretty straightforward: Xampl_PP.new
146
+ #
147
+ # Xampl_PP accepts two kinds of input: IO and String. The same method,
148
+ # 'input', is used to specify the input. It is possible to set the input
149
+ # anytime, but if you do, the current input will be closed if it is of
150
+ # type IO, and the parsing will begin at the current location of the input.
151
+ #
152
+ # The methods parse and parseString illustrate.
153
+ #
154
+
155
+ def work
156
+ while not @xpp.endDocument? do
157
+ case @xpp.nextEvent
158
+ when Xampl_PP::START_DOCUMENT
159
+ @handler.startDocument
160
+ when Xampl_PP::END_DOCUMENT
161
+ @handler.endDocument
162
+ when Xampl_PP::START_ELEMENT
163
+ @handler.startElement(@xpp.name,
164
+ @xpp.namespace,
165
+ @xpp.qname,
166
+ @xpp.prefix,
167
+ attributeCount,
168
+ @xpp.emptyElement,
169
+ self)
170
+ when Xampl_PP::END_ELEMENT
171
+ @handler.endElement(@xpp.name,
172
+ @xpp.namespace,
173
+ @xpp.qname,
174
+ @xpp.prefix)
175
+ when Xampl_PP::TEXT
176
+ @handler.text(@xpp.text, @xpp.whitespace?)
177
+ when Xampl_PP::CDATA_SECTION
178
+ @handler.cdataSection(@xpp.text)
179
+ when Xampl_PP::ENTITY_REF
180
+ @handler.entityRef(@xpp.name, @xpp.text)
181
+ when Xampl_PP::IGNORABLE_WHITESPACE
182
+ @handler.ignoreableWhitespace(@xpp.text)
183
+ when Xampl_PP::PROCESSING_INSTRUCTION
184
+ @handler.processingInstruction(@xpp.text)
185
+ when Xampl_PP::COMMENT
186
+ @handler.comment(@xpp.text)
187
+ when Xampl_PP::DOCTYPE
188
+ @handler.doctype(@xpp.text)
189
+ end
190
+ end
191
+ end
192
+
193
+ def attributeCount
194
+ return @xpp.attributeName.length
195
+ end
196
+
197
+ def attributeName(i)
198
+ return @xpp.attributeName[i]
199
+ end
200
+
201
+ def attributeNamespace(i)
202
+ return @xpp.attributeNamespace[i]
203
+ end
204
+
205
+ def attributeQName(i)
206
+ return @xpp.attributeQName[i]
207
+ end
208
+
209
+ def attributePrefix(i)
210
+ return @xpp.attributePrefix[i]
211
+ end
212
+
213
+ def attributeValue(i)
214
+ return @xpp.attributeValue[i]
215
+ end
216
+
217
+ def depth
218
+ return @xpp.depth
219
+ end
220
+
221
+ def line
222
+ return @xpp.line
223
+ end
224
+
225
+ def column
226
+ return @xpp.column
227
+ end
228
+
229
+
230
+ ##
231
+ ## There is one method used to parse the XML document: nextEvent. It returns
232
+ ## the type of the event (described below). There are corresponding queries
233
+ ## defined for each event type. The event is described by variables in the
234
+ ## xampl-pp instance.
235
+ ##
236
+ ## It is possible to obtain the depth in the XML file (i.e. how many elements
237
+ ## are currently open) using the xampl-pp method 'depth'. This is made
238
+ ## available to the saxish client using a method on the saxish parser with the
239
+ ## same name.
240
+ ##
241
+ ## The line and column number of the next unparsed character is available
242
+ ## using the line and column methods. Note that line is always 1 for
243
+ ## string input.
244
+ ##
245
+ ## There is a method, whitespace?, that will tell you if the current text
246
+ ## value is whitespace.
247
+ ##
248
+ ## The event types are:
249
+ ##
250
+ ## START_DOCUMENT, END_DOCUMENT -- informational
251
+ ##
252
+ ## START_ELEMENT -- on this event several features are defined in the parser
253
+ ## that are pertinent. name, namespace, qname, prefix describe the element
254
+ ## tag name. emptyElement is true if the element is of the form <element/>,
255
+ ## false otherwise. And the arrays attributeName, attributeNamespace,
256
+ ## attributeQName, attributePrefix, and attributeValue contain attribute
257
+ ## information. The number of attributes is obtained from the length of
258
+ ## any of these arrays. Attribute information is presented to the sax
259
+ ## client using six methods: attributeCount, attributeName(i),
260
+ ## attributeNamespace(i), attributeQName(i), attributePrefix(i),
261
+ ## attributeValue(i).
262
+ ##
263
+ ## END_ELEMENT -- name, namespace, qname, and prefix are defined. NOTE that
264
+ ## emptyElement will always be false for this event, even though it is called
265
+ ## for elements of the form <element/>.
266
+ ##
267
+ ## TEXT -- upon plain text found in an element. Note that it is
268
+ ## quite possible that several text events in succession may be made for a
269
+ ## single run of text in the XML file.
270
+ ##
271
+ ## CDATA_SECTION -- upon a CDATA section. Note that it is quite possible
272
+ ## that several CDATA events in succession may be made for a single CDATA
273
+ ## section.
274
+ ##
275
+ ## ENTITY_REF -- for each entity encountered. It will have the
276
+ ## value in the text field, and the name in the name field.
277
+ ##
278
+ ## IGNORABLE_WHITESPACE -- for whitespace that occurs at the document
279
+ ## level of the XML file (i.e. outside the root element). This whitespace is
280
+ ## meaningless in XML and so can be ignored (hence the name). If you are
281
+ ## interested in it, the whitespace is in the text field.
282
+ ##
283
+ ## PROCESSING_INSTRUCTION -- upon a processing instruction. The content of
284
+ ## the processing instruction (with the <? and ?> removed) is provided in
285
+ ## the text field.
286
+ ##
287
+ ## COMMENT -- upon a comment. The content of the comment (with the <!--
288
+ ## and --> removed) is provided in the text field.
289
+ ##
290
+ ## DOCTYPE -- upon encountering a doctype. The content of the doctype
291
+ ## (with the <!DOCTYPE and trailing > removed) is provided in the text field.
292
+ ##
293
+ ## The event query methods are: cdata?, comment?, doctype?, endDocument?,
294
+ ## endElement?, entityRef?, ignorableWhitespace?, processingInstruction?,
295
+ ## startDocument?, startElement?, and text?
296
+ ##
297
+
298
+ end
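The processNamespace behaviour described in the SAXish comments can be sketched as a small stand-alone helper. This is only an illustration under stated assumptions: describe_element and its bindings hash (prefix to namespace URI, with nil as the key for a default namespace) are hypothetical and are not part of xampl-pp, which keeps these values as variables on the parser instance.

```ruby
# Hypothetical helper, not part of xampl-pp: splits an element's qualified
# name the way the comments describe for each processNamespace setting.
def describe_element(qname, process_namespace, bindings = {})
  # With namespace processing off, name and qname are both the raw tag name
  # and prefix/namespace stay undefined.
  unless process_namespace
    return { name: qname, qname: qname, prefix: nil, namespace: nil }
  end

  prefix, name = qname.include?(":") ? qname.split(":", 2) : [nil, qname]
  namespace = bindings[prefix]

  # A prefix with no binding cannot be given a namespace, so fail loudly.
  raise "undefined namespace prefix '#{prefix}'" if prefix && namespace.nil?

  { name: name, qname: qname, prefix: prefix, namespace: namespace }
end

processed = describe_element("prefix:name", true, { "prefix" => "urn:example" })
unprocessed = describe_element("prefix:name", false)
```

With processing on, name is 'name' and prefix is 'prefix'; with it off, name and qname are both 'prefix:name'.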
@@ -0,0 +1,58 @@
1
+ # xampl-pp : XML pull parser
2
+ # Copyright (C) 2002-2009 Bob Hutchison
3
+ #
4
+ # This library is free software; you can redistribute it and/or
5
+ # modify it under the terms of the GNU Lesser General Public
6
+ # License as published by the Free Software Foundation; either
7
+ # version 2.1 of the License, or (at your option) any later version.
8
+ #
9
+ # This library is distributed in the hope that it will be useful,
10
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
11
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12
+ # Lesser General Public License for more details.
13
+ #
14
+ # You should have received a copy of the GNU Lesser General Public
15
+ # License along with this library; if not, write to the Free Software
16
+ # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17
+ #
18
+
19
+ module SAXishHandler
20
+
21
+ def resolve(name)
22
+ return nil
23
+ end
24
+
25
+ def startDocument
26
+ end
27
+
28
+ def endDocument
29
+ end
30
+
31
+ def startElement(name, namespace, qname, prefix, attributeCount, isEmptyElement, saxParser)
32
+ end
33
+
34
+ def endElement(name, namespace, qname, prefix)
35
+ end
36
+
37
+ def entityRef(name, text)
38
+ end
39
+
40
+ def text(text, isWhitespace)
41
+ end
42
+
43
+ def cdataSection(text)
44
+ end
45
+
46
+ def ignoreableWhitespace(text)
47
+ end
48
+
49
+ def processingInstruction(text)
50
+ end
51
+
52
+ def doctype(text)
53
+ end
54
+
55
+ def comment(text)
56
+ end
57
+ end
58
+
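Because SAXishHandler is an adapter of no-op methods, a handler only has to override the callbacks it cares about. The sketch below assumes a hypothetical CountingHandler and an invented 'copy' entity; the module is a trimmed copy of SAXishHandler, repeated only so the example runs on its own, and the callbacks are invoked by hand rather than by a parser.

```ruby
# Trimmed copy of the SAXishHandler module, repeated here so the sketch is
# self-contained. The real module defines more callbacks.
module SAXishHandler
  def resolve(name)
    nil
  end

  def startElement(name, namespace, qname, prefix, attributeCount, isEmptyElement, saxParser)
  end

  def endElement(name, namespace, qname, prefix)
  end

  def text(text, isWhitespace)
  end
end

# Hypothetical handler: counts elements and non-whitespace text runs, and
# resolves extra entities from a hash. Everything else stays a no-op.
class CountingHandler
  include SAXishHandler

  attr_reader :elements, :text_runs

  def initialize(extra_entities = {})
    @elements = Hash.new(0)
    @text_runs = 0
    @extra_entities = extra_entities
  end

  # Returning nil for an unknown entity leaves it unresolved.
  def resolve(name)
    @extra_entities[name]
  end

  def startElement(name, namespace, qname, prefix, attributeCount, isEmptyElement, saxParser)
    @elements[qname] += 1
  end

  def text(text, isWhitespace)
    @text_runs += 1 unless isWhitespace
  end
end

# Drive the handler by hand, in the order a parser would for <a>hello  </a>.
handler = CountingHandler.new("copy" => "(c)")
handler.startElement("a", nil, "a", nil, 0, false, nil)
handler.text("hello", false)
handler.text("  ", true)
handler.endElement("a", nil, "a", nil)
```

endElement is never overridden here, so the call falls through to the empty method inherited from the module.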
@@ -0,0 +1,62 @@
1
+ #!/usr/local/bin/ruby
2
+ require "xampl-pp-wf"
3
+ #require "xampl-pp"
4
+
5
+ class Chew
6
+
7
+ def resolve(name)
8
+ @resolveRequest = true
9
+ # if not @xpp.standalone then
10
+ # # for the purposes of conformance, accept this since we don't
11
+ # # know if the external subset defines something
12
+ # return "fake it"
13
+ # else
14
+ # return nil
15
+ # end
16
+ end
17
+
18
+ def run
19
+ @allFiles = File.new ARGV[1]
20
+
21
+ while true do
22
+ fileName = @allFiles.gets
23
+ if nil == fileName then
24
+ break
25
+ end
26
+ fileName.chomp!
27
+
28
+ @xpp = Xampl_PP.new
29
+ @xpp.input = File.new(fileName)
30
+ @xpp.resolver = self
31
+ @resolveRequest = false
32
+ @xpp.processNamespace = false
33
+ @xpp.reportNamespaceAttributes = false
34
+
35
+ begin
36
+ i = 0
37
+ while not @xpp.endDocument? do
38
+ type = @xpp.nextEvent
39
+ i += 1
40
+ end
41
+ printf("%sPASSED '%s' -- there were %d events\n", (("PASS" == ARGV[0])? " " : "#"), fileName, i)
42
+ rescue RuntimeError => message
43
+ #print message.backtrace.join("\n")
44
+ if @resolveRequest then
45
+ printf("%sENTITY [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
46
+ else
47
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
48
+ end
49
+ rescue Exception => message
50
+ #print message.backtrace.join("\n")
51
+ if @resolveRequest then
52
+ printf("%sENTITY [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
53
+ else
54
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
55
+ end
56
+ end
57
+ end
58
+ end
59
+ end
60
+
61
+ chew = Chew.new
62
+ chew.run
@@ -0,0 +1,44 @@
1
+ #!/usr/local/bin/ruby
2
+ require "xppMultibyte"
3
+
4
+ class Chew
5
+
6
+ def resolve(name)
7
+ return "fake it"
8
+ end
9
+
10
+ def run
11
+ @allFiles = File.new ARGV[1]
12
+
13
+ while true do
14
+ fileName = @allFiles.gets
15
+ if nil == fileName then
16
+ break
17
+ end
18
+ fileName.chomp!
19
+
20
+ @xpp = Xpp.new
21
+ @xpp.input = File.new(fileName)
22
+ @xpp.resolver = self
23
+ @xpp.processNamespace = false
24
+ @xpp.reportNamespaceAttributes = false
25
+
26
+ begin
27
+ i = 0
28
+ while not @xpp.endDocument? do
29
+ type = @xpp.nextEvent
30
+ i += 1
31
+ end
32
+ printf("%sPASSED '%s' -- there were %d events\n", (("PASS" == ARGV[0])? " " : "#"), fileName, i)
33
+ rescue RuntimeError => message
34
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
35
+ rescue Exception => message
36
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
37
+ end
38
+ end
39
+ end
40
+ end
41
+
42
+ chew = Chew.new
43
+ chew.run
44
+