xamplr-pp 1.0.0

@@ -0,0 +1,298 @@
1
+ #
2
+ # xampl-pp : XML pull parser
3
+ # Copyright (C) 2002-2009 Bob Hutchison
4
+ #
5
+ # This library is free software; you can redistribute it and/or
6
+ # modify it under the terms of the GNU Lesser General Public
7
+ # License as published by the Free Software Foundation; either
8
+ # version 2.1 of the License, or (at your option) any later version.
9
+ #
10
+ # This library is distributed in the hope that it will be useful,
11
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
12
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13
+ # Lesser General Public License for more details.
14
+ #
15
+ # You should have received a copy of the GNU Lesser General Public
16
+ # License along with this library; if not, write to the Free Software
17
+ # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
18
+ #
19
+ require "xampl-pp"
20
+
21
+ ##
22
+ ## It may seem strange, but it seems that a good way to demonstrate the use
23
+ ## of the xampl-pp pull parser is to show how to build a SAX-like XML
24
+ ## parser. Both pull parsers and SAX parsers are stream based -- they parse
25
+ ## the XML file bit by bit, informing their client of interesting events as
26
+ ## they are encountered. The whole XML document is not required to be in
27
+ ## memory. The significant difference between pull parsers and SAX parsers
28
+ ## is in where the 'main loop' is located: in the client for pull parsers,
29
+ ## in the parser for SAX parsers. Clients call a method of the pull parser
30
+ ## to get the next event. SAX parsers call methods of the client to notify
31
+ ## it of events (so these are 'push parsers').
32
+ ##
33
+ ## It turns out to be quite easy to build a SAX-like parser from a pull
34
+ ## parser. It is quite a lot harder to build a pull parser from a SAX-like
35
+ ## parser.
36
+ ##
37
+ ## This class demonstrates most of the xampl-pp interface by implementing a
38
+ ## SAX-like parser. No attempt has been made to provide all the functionality
39
+ ## provided by a good Java SAX parser, though the equivalent of a significant,
40
+ ## and useful, subset is implemented.
41
+ ##
42
+ ## The program text is annotated. Note that the annotations generally
43
+ ## follow the code being described.
44
+ ##
45
+
46
+
47
+ class SAXish
48
+
49
+ ##
50
+ ## The Ruby implementation of the xampl-pp parser is called Xampl_PP, and
51
+ ## SAXish will be the name of our SAX-like parser.
52
+ ##
53
+
54
+ attr :handler, true
55
+
56
+ ##
57
+ ## Sax parsers need an event handler. 'handler' is it. Handler is expected to
58
+ ## implement the methods defined in the module 'SAXishHandler'. SAXishHandler
59
+ ## is intended to be an adapter (so you can include it in any handler you
60
+ ## write), so only the event-handlers for those events in which you are
61
+ ## interested need to be re-defined. SAXdemo is an implementation of
62
+ ## SAXishHandler that gathers some statistics.
63
+ ##
64
+ ## Xampl-pp requires something it calls a resolver. This is a class that
65
+ ## implements a method called resolve. There are a number of predefined
66
+ ## entities in xampl-pp: &amp; &apos; &gt; &lt; and &quot;. It is possible
67
+ ## to add more entities by adding entries to the entityMap hashtable. If an
68
+ ## entity is encountered that is not in entityMap then the resolve method on
69
+ ## the resolver is called. The default resolver returns nil, which causes
70
+ ## an exception to be thrown. If you specify your own resolver you can do
71
+ ## anything you like to obtain a value for the entity, or you can return nil
72
+ ## (and an exception will be thrown). Xampl-pp, by default, is its own
73
+ ## resolver and simply returns nil.
74
+ ##
75
+ ## We are going to require that our saxish handler also be the entity
76
+ ## resolver. This is reflected in the SAXishHandler module, which implements
77
+ ## a resolve method that always returns nil.
78
+ ##
79
+
80
+ attr :processNamespace, true
81
+ attr :reportNamespaceAttributes, true
82
+
83
+ ##
84
+ ## This block of comments can be ignored, certainly for the first reading.
85
+ ## It talks about some control you have over how xampl-pp works. The
86
+ ## default behaviour is the most commonly used.
87
+ ##
88
+ ## There are two main controls used here: processNamespace, and
89
+ ## reportNamespaceAttributes. If processNamespace is true, then namespaces
90
+ ## in the XML file being parsed will be processed. Processing means that if
91
+ ## an element <prefix:name/> is encountered, then four variables will be
92
+ ## set up in the parser instance: name is 'name', prefix is 'prefix',
93
+ ## qname is 'prefix:name', and namespace is defined. If the namespace cannot
94
+ ## be defined an exception is thrown. In addition the xmlns attributes
95
+ ## are processed. If processNamespace is false then name and qname
96
+ ## will both be 'prefix:name', and both prefix and namespace undefined.
97
+ ## If reportNamespaceAttributes is true then the xmlns attributes will be
98
+ ## reported along with all the other attributes, if false then they will
99
+ ## be hidden. The default behaviour is to process namespaces but to not
100
+ ## report the namespace attributes.
101
+ ##
102
+ ## There are two other controls that should be mentioned. They are not
103
+ ## used here.
104
+ ##
105
+ ## Pull parsers are pretty low level tools. They are meant to be fast. While
106
+ ## many well-formedness constraints are enforced, not all are. If the control
107
+ ## checkWellFormed is true then additional checks are made. Xampl-pp does
108
+ ## not guarantee that it will parse only well formed XML documents. It
109
+ ## will parse some XML files that are not well formed without objecting. In
110
+ ## future releases, it will be possible to have xampl-pp accept only
111
+ ## well formed documents. If checkWellFormed is false, then the parser
112
+ ## doesn't go out of its way to notice ill formed documents. The default
113
+ ## is true.
114
+ ##
115
+ ## The fourth control is 'utf8encode'. If this is true, and it defaults to
116
+ ## true, then when an entity like &#1234; is encountered it will be encoded
117
+ ## using utf8 rules. Given the current state of the parser, it would be best
118
+ ## to leave it set to true. If you want to change this then you must either
119
+ ## never use &#; encodings with numbers greater than 255 (Ruby will throw an
120
+ ## exception), or you must redefine xampl-pp's encode method to do the right
121
+ ## thing.
122
+ ##
123
+
124
+ def parse(filename)
125
+ @xpp = Xampl_PP.new
126
+ @xpp.input = File.new(filename)
127
+ @xpp.processNamespace = @processNamespace
128
+ @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
129
+ @xpp.resolver = @handler
130
+
131
+ work
132
+ end
133
+
134
+ def parseString(string)
135
+ @xpp = Xampl_PP.new
136
+ @xpp.input = string
137
+ @xpp.processNamespace = @processNamespace
138
+ @xpp.reportNamespaceAttributes = @reportNamespaceAttributes
139
+ @xpp.resolver = @handler
140
+
141
+ work
142
+ end
143
+
144
+ #
145
+ # Constructing an instance of xampl-pp is straightforward: Xampl_PP.new
146
+ #
147
+ # Xampl_PP accepts two kinds of input: IO and String. The same method,
148
+ # 'input', is used to specify the input. It is possible to set the input
149
+ # anytime, but if you do, the current input will be closed if it is of
150
+ # type IO, and the parsing will begin at the current location of the input.
151
+ #
152
+ # The methods parse and parseString illustrate.
153
+ #
154
+
155
+ def work
156
+ while not @xpp.endDocument? do
157
+ case @xpp.nextEvent
158
+ when Xampl_PP::START_DOCUMENT
159
+ @handler.startDocument
160
+ when Xampl_PP::END_DOCUMENT
161
+ @handler.endDocument
162
+ when Xampl_PP::START_ELEMENT
163
+ @handler.startElement(@xpp.name,
164
+ @xpp.namespace,
165
+ @xpp.qname,
166
+ @xpp.prefix,
167
+ attributeCount,
168
+ @xpp.emptyElement,
169
+ self)
170
+ when Xampl_PP::END_ELEMENT
171
+ @handler.endElement(@xpp.name,
172
+ @xpp.namespace,
173
+ @xpp.qname,
174
+ @xpp.prefix)
175
+ when Xampl_PP::TEXT
176
+ @handler.text(@xpp.text, @xpp.whitespace?)
177
+ when Xampl_PP::CDATA_SECTION
178
+ @handler.cdataSection(@xpp.text)
179
+ when Xampl_PP::ENTITY_REF
180
+ @handler.entityRef(@xpp.name, @xpp.text)
181
+ when Xampl_PP::IGNORABLE_WHITESPACE
182
+ @handler.ignoreableWhitespace(@xpp.text)
183
+ when Xampl_PP::PROCESSING_INSTRUCTION
184
+ @handler.processingInstruction(@xpp.text)
185
+ when Xampl_PP::COMMENT
186
+ @handler.comment(@xpp.text)
187
+ when Xampl_PP::DOCTYPE
188
+ @handler.doctype(@xpp.text)
189
+ end
190
+ end
191
+ end
192
+
193
+ def attributeCount
194
+ return @xpp.attributeName.length
195
+ end
196
+
197
+ def attributeName(i)
198
+ return @xpp.attributeName[i]
199
+ end
200
+
201
+ def attributeNamespace(i)
202
+ return @xpp.attributeNamespace[i]
203
+ end
204
+
205
+ def attributeQName(i)
206
+ return @xpp.attributeQName[i]
207
+ end
208
+
209
+ def attributePrefix(i)
210
+ return @xpp.attributePrefix[i]
211
+ end
212
+
213
+ def attributeValue(i)
214
+ return @xpp.attributeValue[i]
215
+ end
216
+
217
+ def depth
218
+ return @xpp.depth
219
+ end
220
+
221
+ def line
222
+ return @xpp.line
223
+ end
224
+
225
+ def column
226
+ return @xpp.column
227
+ end
228
+
229
+
230
+ ##
231
+ ## There is one method used to parse the XML document: nextEvent. It returns
232
+ ## the type of the event (described below). There are corresponding queries
233
+ ## defined for each event type. The event is described by variables in the
234
+ ## xampl-pp instance.
235
+ ##
236
+ ## It is possible to obtain the depth in the XML file (i.e. how many elements
237
+ ## are currently open) using the xampl-pp method 'depth'. This is made
238
+ ## available to the saxish client using a method on the SAXish parser with the
239
+ ## same name.
240
+ ##
241
+ ## The line and column number of the next unparsed character is available
242
+ ## using the line and column methods. Note that line is always 1 for
243
+ ## string input.
244
+ ##
245
+ ## There is a method, whitespace?, that will tell you if the current text
246
+ ## value is whitespace.
247
+ ##
248
+ ## The event types are:
249
+ ##
250
+ ## START_DOCUMENT, END_DOCUMENT -- informational
251
+ ##
252
+ ## START_ELEMENT -- on this event several features are defined in the parser
253
+ ## that are pertinent. name, namespace, qname, prefix describe the element
254
+ ## tag name. emptyElement is true if the element is of the form <element/>,
255
+ ## false otherwise. And the arrays attributeName, attributeNamespace,
256
+ ## attributeQName, attributePrefix, and attributeValue contain attribute
257
+ ## information. The number of attributes is obtained from the length of
258
+ ## any of these arrays. Attribute information is presented to the sax
259
+ ## client using six methods: attributeCount, attributeName(i),
260
+ ## attributeNamespace(i), attributeQName(i), attributePrefix(i),
261
+ ## attributeValue(i).
262
+ ##
263
+ ## END_ELEMENT -- name, namespace, qname, and prefix are defined. NOTE that
264
+ ## emptyElement will always be false for this event, even though it is called
265
+ ## for elements of the form <element/>.
266
+ ##
267
+ ## TEXT -- upon plain text found in an element. Note that it is
268
+ ## quite possible that several text events in succession may be made for a
269
+ ## single run of text in the XML file.
270
+ ##
271
+ ## CDATA_SECTION -- upon a CDATA section. Note that it is quite possible
272
+ ## that several CDATA events in succession may be made for a single CDATA
273
+ ## section.
274
+ ##
275
+ ## ENTITY_REF -- for each entity encountered. It will have the
276
+ ## value in the text field, and the name in the name field.
277
+ ##
278
+ ## IGNORABLE_WHITESPACE -- for whitespace that occurs at the document
279
+ ## level of the XML file (i.e. outside the root element). This whitespace is
280
+ ## meaningless in XML and so can be ignored (hence the name). If you are
281
+ ## interested in it, the whitespace is in the text field.
282
+ ##
283
+ ## PROCESSING_INSTRUCTION -- upon a processing instruction. The content of
284
+ ## the processing instruction (with the <? and ?> removed) is provided in
285
+ ## the text field.
286
+ ##
287
+ ## COMMENT -- upon a comment. The content of the comment (with the <!--
288
+ ## and --> removed) is provided in the text field.
289
+ ##
290
+ ## DOCTYPE -- upon encountering a doctype. The content of the doctype
291
+ ## (with the <!DOCTYPE and trailing > removed) is provided in the text field.
292
+ ##
293
+ ## The event query methods are: cdata?, comment?, doctype?, endDocument?,
294
+ ## endElement?, entityRef?, ignorableWhitespace?, processingInstruction?,
295
+ ## startDocument?, startElement?, and text?
296
+ ##
297
+
298
+ end
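The claim in the opening comments, that a SAX-like (push) parser is little more than a loop around a pull parser, can be illustrated without xampl-pp at all. The following is a minimal sketch under stated assumptions: ToyPull, ToyPush and Recorder are hypothetical names invented for this illustration, and ToyPull understands only bare start tags, end tags and text. It is not xampl-pp and does not reproduce its behaviour beyond borrowing the method names nextEvent and endDocument?.

```ruby
# Toy illustration only: ToyPull is NOT xampl-pp. It handles just
# <tag>, </tag> and text, with no attributes, entities or error checks.
class ToyPull
  START_ELEMENT = :start_element
  END_ELEMENT   = :end_element
  TEXT          = :text
  END_DOCUMENT  = :end_document

  attr_reader :name, :text

  def initialize(string)
    # Split the input into tags and the text runs between them.
    @tokens = string.scan(/<[^>]+>|[^<]+/)
    @done = false
  end

  def endDocument?
    @done
  end

  # The client asks for the next event: this is the "pull".
  def nextEvent
    token = @tokens.shift
    if token.nil?
      @done = true
      END_DOCUMENT
    elsif token.start_with?("</")
      @name = token[2..-2]
      END_ELEMENT
    elsif token.start_with?("<")
      @name = token[1..-2]
      START_ELEMENT
    else
      @text = token
      TEXT
    end
  end
end

# The push side: the main loop moves into the wrapper, which calls the
# handler for each event it pulls.
class ToyPush
  def initialize(handler)
    @handler = handler
  end

  def parseString(string)
    pull = ToyPull.new(string)
    until pull.endDocument?
      case pull.nextEvent
      when ToyPull::START_ELEMENT then @handler.startElement(pull.name)
      when ToyPull::END_ELEMENT   then @handler.endElement(pull.name)
      when ToyPull::TEXT          then @handler.text(pull.text)
      when ToyPull::END_DOCUMENT  then @handler.endDocument
      end
    end
  end
end

# A handler that just records what it is told.
class Recorder
  attr_reader :events

  def initialize
    @events = []
  end

  def startElement(name); @events << [:start, name]; end
  def endElement(name);   @events << [:end, name];   end
  def text(text);         @events << [:text, text];  end
  def endDocument;        @events << [:end_document]; end
end

recorder = Recorder.new
ToyPush.new(recorder).parseString("<a>hi<b></b></a>")
```

Going the other way is harder because the push parser owns the loop: to hand out one event at a time, a client would have to buffer the callbacks or run the parser in a separate thread or fiber.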
@@ -0,0 +1,58 @@
1
+ # xampl-pp : XML pull parser
2
+ # Copyright (C) 2002-2009 Bob Hutchison
3
+ #
4
+ # This library is free software; you can redistribute it and/or
5
+ # modify it under the terms of the GNU Lesser General Public
6
+ # License as published by the Free Software Foundation; either
7
+ # version 2.1 of the License, or (at your option) any later version.
8
+ #
9
+ # This library is distributed in the hope that it will be useful,
10
+ # but WITHOUT ANY WARRANTY; without even the implied warranty of
11
+ # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
12
+ # Lesser General Public License for more details.
13
+ #
14
+ # You should have received a copy of the GNU Lesser General Public
15
+ # License along with this library; if not, write to the Free Software
16
+ # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
17
+ #
18
+
19
+ module SAXishHandler
20
+
21
+ def resolve(name)
22
+ return nil
23
+ end
24
+
25
+ def startDocument
26
+ end
27
+
28
+ def endDocument
29
+ end
30
+
31
+ def startElement(name, namespace, qname, prefix, attributeCount, isEmptyElement, saxParser)
32
+ end
33
+
34
+ def endElement(name, namespace, qname, prefix)
35
+ end
36
+
37
+ def entityRef(name, text)
38
+ end
39
+
40
+ def text(text, isWhitespace)
41
+ end
42
+
43
+ def cdataSection(text)
44
+ end
45
+
46
+ def ignoreableWhitespace(text)
47
+ end
48
+
49
+ def processingInstruction(text)
50
+ end
51
+
52
+ def doctype(text)
53
+ end
54
+
55
+ def comment(text)
56
+ end
57
+ end
58
+
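Because every callback in the module is a no-op, a handler only needs to override the events it cares about. The sketch below shows that adapter pattern in isolation. NoOpHandler is a hypothetical stand-in that repeats a few SAXishHandler signatures so the example runs on its own, and ElementCounter is likewise invented for illustration.

```ruby
# Stand-in for part of the SAXishHandler module, repeated here so this
# sketch runs on its own. Every callback is a no-op.
module NoOpHandler
  def resolve(name)
    nil
  end

  def startElement(name, namespace, qname, prefix, attributeCount, isEmptyElement, saxParser)
  end

  def endElement(name, namespace, qname, prefix)
  end

  def text(text, isWhitespace)
  end
end

# Hypothetical handler: counts elements by qname and counts
# non-whitespace text runs. Everything else stays a no-op.
class ElementCounter
  include NoOpHandler

  attr_reader :elements, :textRuns

  def initialize
    @elements = Hash.new(0)
    @textRuns = 0
  end

  def startElement(name, namespace, qname, prefix, attributeCount, isEmptyElement, saxParser)
    @elements[qname] += 1
  end

  def text(text, isWhitespace)
    @textRuns += 1 unless isWhitespace
  end
end

# Drive the handler by hand, the way a parser loop would.
counter = ElementCounter.new
counter.startElement("p", nil, "p", nil, 0, false, nil)
counter.startElement("p", nil, "p", nil, 0, false, nil)
counter.text("hello", false)
counter.text("  ", true)
counter.endElement("p", nil, "p", nil)   # inherited no-op
```

With the real classes, the equivalent handler would presumably include SAXishHandler, be assigned to the handler attribute of a SAXish instance, and receive these calls from parse or parseString.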
@@ -0,0 +1,62 @@
1
+ #!/usr/local/bin/ruby
2
+ require "xampl-pp-wf"
3
+ #require "xampl-pp"
4
+
5
+ class Chew
6
+
7
+ def resolve(name)
8
+ @resolveRequest = true
9
+ # if not @xpp.standalone then
10
+ # # for the purposes of conformance, accept this since we don't
11
+ # # know if the external subset defines something
12
+ # return "fake it"
13
+ # else
14
+ # return nil
15
+ # end
16
+ end
17
+
18
+ def run
19
+ @allFiles = File.new ARGV[1]
20
+
21
+ while true do
22
+ fileName = @allFiles.gets
23
+ if nil == fileName then
24
+ break
25
+ end
26
+ fileName.chomp!
27
+
28
+ @xpp = Xampl_PP.new
29
+ @xpp.input = File.new(fileName)
30
+ @xpp.resolver = self
31
+ @resolveRequest = false
32
+ @xpp.processNamespace = false
33
+ @xpp.reportNamespaceAttributes = false
34
+
35
+ begin
36
+ i = 0
37
+ while not @xpp.endDocument? do
38
+ type = @xpp.nextEvent
39
+ i += 1
40
+ end
41
+ printf("%sPASSED '%s' -- there were %d events\n", (("PASS" == ARGV[0])? " " : "#"), fileName, i)
42
+ rescue RuntimeError => message
43
+ #print message.backtrace.join("\n")
44
+ if @resolveRequest then
45
+ printf("%sENTITY [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
46
+ else
47
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
48
+ end
49
+ rescue Exception => message
50
+ #print message.backtrace.join("\n")
51
+ if @resolveRequest then
52
+ printf("%sENTITY [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
53
+ else
54
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
55
+ end
56
+ end
57
+ end
58
+ end
59
+ end
60
+
61
+ chew = Chew.new
62
+ chew.run
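The resolver used by this script only records that it was asked. A resolver that actually supplies values might look like the sketch below. The only contract taken from the sources is a resolve(name) method that returns a replacement or nil. TableResolver and its table are hypothetical, and it is an assumption that the parser passes the bare entity name without the surrounding & and ;.

```ruby
# Hypothetical resolver: looks entity names up in a table and remembers
# the names it could not resolve. Returning nil signals "unknown".
class TableResolver
  attr_reader :unresolved

  def initialize(table)
    @table = table
    @unresolved = []
  end

  def resolve(name)
    value = @table[name]
    @unresolved << name if value.nil?
    value
  end
end

resolver = TableResolver.new("copy" => "(c)", "nbsp" => " ")
known   = resolver.resolve("copy")
unknown = resolver.resolve("bogus")
```

An instance would be installed the same way the script installs itself, by assigning it to the parser's resolver attribute.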
@@ -0,0 +1,44 @@
1
+ #!/usr/local/bin/ruby
2
+ require "xppMultibyte"
3
+
4
+ class Chew
5
+
6
+ def resolve(name)
7
+ return "fake it"
8
+ end
9
+
10
+ def run
11
+ @allFiles = File.new ARGV[1]
12
+
13
+ while true do
14
+ fileName = @allFiles.gets
15
+ if nil == fileName then
16
+ break
17
+ end
18
+ fileName.chomp!
19
+
20
+ @xpp = Xpp.new
21
+ @xpp.input = File.new(fileName)
22
+ @xpp.resolver = self
23
+ @xpp.processNamespace = false
24
+ @xpp.reportNamespaceAttributes = false
25
+
26
+ begin
27
+ i = 0
28
+ while not @xpp.endDocument? do
29
+ type = @xpp.nextEvent
30
+ i += 1
31
+ end
32
+ printf("%sPASSED '%s' -- there were %d events\n", (("PASS" == ARGV[0])? " " : "#"), fileName, i)
33
+ rescue RuntimeError => message
34
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
35
+ rescue Exception => message
36
+ printf("%sFAILED [%s] '%s'\n", (("FAIL" == ARGV[0])? " " : "#"), message, fileName)
37
+ end
38
+ end
39
+ end
40
+ end
41
+
42
+ chew = Chew.new
43
+ chew.run
44
+
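Both scripts take an expected outcome (PASS or FAIL) as the first argument and a file listing one document per line as the second. Results that match the expectation are printed with a leading space, and all others with a leading "#". The sketch below restates that convention as two small helpers. statusLine and fileNames are hypothetical names for illustration, not part of the scripts.

```ruby
require "stringio"

# Hypothetical helper mirroring the printf calls in the scripts: a result
# that matches the expectation gets a leading space, any other result
# gets a leading "#".
def statusLine(expected, outcome, fileName, detail = nil)
  marker = (expected == outcome) ? " " : "#"
  if outcome == "PASS"
    format("%sPASSED '%s' -- there were %d events", marker, fileName, detail)
  else
    format("%sFAILED [%s] '%s'", marker, detail, fileName)
  end
end

# Read a list of file names, one per line. chomp strips only a line
# terminator, so a final line without a newline is left intact.
def fileNames(io)
  io.each_line.map(&:chomp).reject(&:empty?)
end

names = fileNames(StringIO.new("a.xml\nb.xml"))
lines = [
  statusLine("PASS", "PASS", names[0], 12),
  statusLine("PASS", "FAIL", names[1], "unexpected end of file")
]
```

Filtering the output for lines that begin with "#" then lists exactly the documents whose result differed from the expectation.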