xmlscan 0.3.0prec → 0.3.0
- data/README.rdoc +59 -2
- data/VERSION +1 -1
- metadata +14 -14
data/README.rdoc
CHANGED
@@ -44,6 +44,20 @@ a core part of a library providing such features.
 
 XMLscan contains htmlscan, an HTML parser.
 
+== New Scanner Translater Feature with 0.3.0
+
+Created for a use case in Wagn (http://wagn.org)
+
+The fragment is pretty much the whole parsing as used in Wagn:
+
+pairs = XMLScan::XMLProcessor.process(io, {:key=>:name, :element=>:card,
+:substitute=>":transclude|{{:name}}", :extras=>[:type]})
+
+The 'io' object is directly passed from the request.body in a XML POST action.
+
+ref: https://github.com/GerryG/wagn/commit/46e57dfd88cde45d33fe7f167d5803ab819ceef3#commitcomment-1012403
+
+See the tests cases in test/integration for some examples.
 
 === Character encodings
 
@@ -58,6 +72,9 @@ Shift_JIS, or UTF-8.
 UTF-16 is not supported directly. You should convert it into
 UTF-8 before parsing.
 
+Character encoding needs more work with 0.3.0+. The character maps were
+taken out in favor of \p{} RE classes in ruby, but it needs attention still.
+
 === XML Namespaces
 
 XML Namespaces have been already implemented in
@@ -89,13 +106,21 @@ Raised when an XML document violates an validity constraint.
 
 
 XMLScan::Visitor
+
 Mix-in for receiving the result of parsing an XML document.
 Each parser included in xmlscan parses an XML document from
 the beginning, and calls each specific method of given instance of
 XMLScan::Visitor for each syntactic element, such as a tag.
 It is ensured that these calls is in order of the appearance
 in the document from the beginning.
+
+Changes: The scanner/parser now sends some additional arguments with
+each event, primarily the original string that was parsed for the even, but
+also a hash of attributes for stag_end. This supports scanning for
+replacement of some elements very simple.
+
 Methods:
+
 Without special notice, the following methods do nothing by
 default.
 
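The hunk above describes a mix-in whose callbacks do nothing by default and, as of 0.3.0, receive additional arguments. A minimal sketch of that pattern follows. It does not load xmlscan: `SketchVisitor` and `EventRecorder` are hypothetical stand-ins, and the extra argument values passed in the hand-written event sequence are assumptions for illustration only, since the README does not spell out their exact order.

```ruby
# Stand-in for XMLScan::Visitor so the sketch runs without the gem installed.
# The callback names come from the README; the module itself is hypothetical.
module SketchVisitor
  %i[on_start_document on_end_document on_stag on_stag_end
     on_stag_end_empty on_etag on_chardata].each do |event|
    # Every callback is a no-op by default and tolerates extra arguments.
    define_method(event) { |*_args| nil }
  end
end

# A visitor that records selected events with whatever arguments arrive.
class EventRecorder
  include SketchVisitor

  attr_reader :events

  def initialize
    @events = []
  end

  # The splat absorbs any additional arguments (for example the original
  # source text) without assuming how many there are.
  def on_stag(name, *rest)
    @events << [:on_stag, name, rest]
  end

  def on_etag(name, *rest)
    @events << [:on_etag, name, rest]
  end
end

recorder = EventRecorder.new
# Hand-written event sequence standing in for a real scanner run;
# the second and third arguments are assumed values.
recorder.on_start_document
recorder.on_stag('hoge', '<hoge')
recorder.on_stag_end('hoge', '>', {})
recorder.on_etag('hoge', '</hoge>')
recorder.on_end_document
```

Accepting `*rest` keeps a visitor usable whether or not the caller passes the newer arguments, which may be convenient when the same class has to work across versions.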
@@ -107,6 +132,7 @@ a production. By default, this method raises
 XMLScan::ParseError exception. If no exception is
 raised and this method returns normally, the parser recovers
 the error and continues to parse.
+
 XMLScan::Visitor#wellformed_error(msg)
 
 Called when the parser meets an well-formedness constraint
@@ -114,6 +140,7 @@ violation. By default, this method raises
 XMLScan::NotWellFormedError exception. If no exception
 is raised and this method returns normally, the parser recovers
 the error and continues to parse.
+
 XMLScan::Visitor#valid_error(msg)
 
 Called when the parser meets validity constraint
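The two hunks above state that the error callbacks raise by default and that returning normally lets parsing continue. The sketch below illustrates that contract with stand-ins only: `SketchParseError`, `SketchErrorHandling`, and the toy driver `sketch_run` are hypothetical names, not xmlscan's classes, and the driver merely imitates a parser that reports one error and then carries on.

```ruby
# Hypothetical stand-in for XMLScan::ParseError.
class SketchParseError < StandardError; end

module SketchErrorHandling
  # Default behaviour as described in the README: raise on a parse error.
  def parse_error(msg)
    raise SketchParseError, msg
  end
end

# Keeps the default, so the first error aborts the run.
class StrictVisitor
  include SketchErrorHandling
end

# Overrides the callback and returns normally, which signals recovery.
class LenientVisitor
  include SketchErrorHandling

  attr_reader :problems

  def initialize
    @problems = []
  end

  def parse_error(msg)
    @problems << msg
    nil
  end
end

# Toy driver standing in for the parser: reports one error, then continues.
def sketch_run(visitor)
  visitor.parse_error('unexpected token')
  :continued
end

lenient = LenientVisitor.new
lenient_result = sketch_run(lenient)

strict_result =
  begin
    sketch_run(StrictVisitor.new)
  rescue SketchParseError => e
    e.message
  end
```

Collecting messages instead of raising can be useful when a caller wants a full list of problems in one pass, at the cost of working with whatever the parser manages to recover.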
@@ -123,18 +150,23 @@ is raised and this method returns normally, the parser recovers
 the error and continues to parse.
 FYI, current version of xmlscan includes no validating XML
 processor. This method is reserved for future versions.
+
 XMLScan::Visitor#warning(msg)
 
 Called when the parser meets a non-error but unrecommended
 thing or a syntax which xmlscan is not able to parse.
+
 XMLScan::Visitor#on_start_document
 
 Called just before the parser starts parsing an XML document.
+
 After this method is called, corresponding
 XMLScan::Visitor#on_end_document method is always called.
+
 XMLScan::Visitor#on_end_document
 
 Called after the parser reaches the end of an XML document.
+
 XMLScan::Visitor#on_xmldecl
 XMLScan::Visitor#on_xmldecl_version(str)
 XMLScan::Visitor#on_xmldecl_encoding(str)
@@ -154,50 +186,63 @@ Called when the parser meets an XML declaration.
 3: on_xmldecl_encoding ("euc-jp")
 4: on_xmldecl_standalone ("yes")
 5: on_xmldecl_end
+
 When an XML declaration is found, both on_xmldecl and
 on_xmldecl_end method are always called. Any other methods
 are called only when the corresponding syntaxes are found.
+
 When a declaration except version, encoding, and standalone
 is found in an XML declaration, on_xmldecl_other method is
 called. Since such a declaration is not permitted, note that
 the parser always calls XMLScan::Visitor#parse_error method
 before calling on_xmldecl_other method.
+
 XMLScan::Visitor#on_doctype(root, pubid, sysid)
 
 Called when the parser meets a document type declaration.
+
 document argument
 --------------------------------------------------------------
 1: <!DOCTYPE foo> ('foo', nil, nil)
 2: <!DOCTYPE foo SYSTEM "bar"> ('foo', nil, 'bar')
 3: <!DOCTYPE foo PUBLIC "bar"> ('foo', 'bar', nil )
 4: <!DOCTYPE foo PUBLIC "bar" "baz"> ('foo', 'bar', 'baz')
+
 XMLScan::Visitor#on_prolog_space(str)
 
 Called when the parser meets whitespaces in prolog.
+
 XMLScan::Visitor#on_comment(str)
 
 Called when the parser meets a comment.
+
 XMLScan::Visitor#on_pi(target, pi)
 
 Called when the parser meets a processing instruction.
+
 XMLScan::Visitor#on_chardata(str)
 
 Called when the parser meets character data.
+
 XMLScan::Visitor#on_cdata(str)
 
 Called when the parser meets a CDATA section.
+
 XMLScan::Visitor#on_entityref(ref)
 
 Called when the parser meets a general entity reference
 in a place except an attribute value.
+
 XMLScan::Visitor#on_charref(code)
 XMLScan::Visitor#on_charref_hex(code)
 
 Called when the parser meets a character reference
 in a place except an attribute value.
+
 When the character code is represented by decimals,
 on_charref is called. When by hexadecimals, on_charref_hex
 is called. code is an integer.
+
 XMLScan::Visitor#on_stag(name)
 XMLScan::Visitor#on_attribute(name)
 XMLScan::Visitor#on_attr_value(str)
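The on_doctype table in the hunk above maps four declaration shapes to (root, pubid, sysid) triples. As a rough illustration of that mapping, the sketch below reproduces it with a regular expression. `DOCTYPE_RE` and `doctype_args` are hypothetical helpers written for this note; they are not how xmlscan scans a DOCTYPE, and they ignore internal subsets and single-quoted literals.

```ruby
# Hypothetical helper, not part of xmlscan: matches the four declaration
# shapes from the README table (double-quoted literals only).
DOCTYPE_RE = /\A<!DOCTYPE\s+([^\s>]+)(?:\s+(SYSTEM|PUBLIC)\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>\z/

# Returns [root, pubid, sysid], or nil when the text has another shape.
def doctype_args(decl)
  m = DOCTYPE_RE.match(decl)
  return nil unless m

  root, keyword, first, second = m.captures
  case keyword
  when 'SYSTEM' then [root, nil, first]    # one literal: the system id
  when 'PUBLIC' then [root, first, second] # public id, then optional system id
  else [root, nil, nil]                    # bare <!DOCTYPE root>
  end
end

rows = [
  '<!DOCTYPE foo>',
  '<!DOCTYPE foo SYSTEM "bar">',
  '<!DOCTYPE foo PUBLIC "bar">',
  '<!DOCTYPE foo PUBLIC "bar" "baz">'
].map { |decl| doctype_args(decl) }
```

With these inputs, `rows` holds one triple per table row, in the same order as the README lists them.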
@@ -226,19 +271,24 @@ Called when the parser meets an XML declaration.
 9: on_stag_end ('hoge')
 or
 on_stag_end_empty ('hoge')
+
 When a start tag is found, both on_stag and corresponding
 either on_stag_end or on_stag_end_empty method are always
 called. Any other methods are called only when at least one
 attribute is found in the start tag.
+
 When an attribute is found, both on_attribute and
 on_attribute_end method are always called. If the attribute
 value is empty, only these two methods are called.
+
 When the parser meets a general entity reference in an
 attribute value, it calls on_attr_entityref method.
+
 When the parser meets a character reference in an attribute
 value, it calls either on_charref or on_charref_hex method.
 If the tag is an empty element tag, on_stag_end_empty method
 is called instead of on_stag_end method.
+
 XMLScan::Visitor#on_etag(name)
 
 Called when the parser meets an end tag.
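The README text in these hunks says that a decimal character reference goes to on_charref, a hexadecimal one to on_charref_hex, and that the code argument is an integer in both cases. The sketch below shows one way that dispatch could look. `charref_event` is a hypothetical function for illustration and makes no claim about xmlscan's internals.

```ruby
# Hypothetical dispatcher, not xmlscan's implementation: returns the
# callback name and the integer code for a character reference,
# or nil when the text is not a character reference.
def charref_event(ref)
  case ref
  when /\A&#x(\h+);\z/ then [:on_charref_hex, Regexp.last_match(1).to_i(16)]
  when /\A&#(\d+);\z/  then [:on_charref, Regexp.last_match(1).to_i]
  end
end

decimal = charref_event('&#65;')   # decimal form of U+0041
hex     = charref_event('&#x41;')  # hexadecimal form of the same character
other   = charref_event('&amp;')   # an entity reference, not a charref
```

Both forms yield the same integer, so a visitor that only cares about the character can treat the two callbacks identically.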
@@ -246,6 +296,7 @@ Called when the parser meets an end tag.
 XMLScan::XMLScanner
 The scanner which tokenizes an XML document and recognize tags,
 and so on.
+
 The conformance of XMLScan::XMLScanner to the specification
 is described in another document.
 SuperClass:
@@ -259,29 +310,33 @@ XMLScan::XMLScanner.new(visitor[, option ...])
 Creates an instance. visitor is a instance of
 XMLScan::Visitor and receives the result of parsing
 from the XMLScan::Scanner object.
+
 You can specify one of more option as a string or symbol.
+
 XMLScan::Scanner's options are as follows:
 
 'strict_char'
 
 This option is enabled after
 require 'xmlscan/xmlchar'.
+
 XMLScan::Scanner checks whether an XML document includes
 an illegal character. The performance decreases sharply.
 
 
-
 Methods:
 
 XMLScan::XMLScanner#kcode= arg
 
 Sets CES. Available values for code are same as $KCODE
 except nil. If code is nil, $KCODE decides the CES.
+
 XMLScan::XMLScanner#kcode
 
 Returns CES. The format of the return value is same as
 Regexp#kcode. If this method returns nil, it represents that
 $KCODE decides the CES.
+
 XMLScan::XMLScanner#parse(source)
 
 Parses source as an XML document. source must be
@@ -289,9 +344,12 @@ a string, an array of strings, or an object which responds to
 gets method which behaves same as IO#gets does.
 
 XMLScan::XMLParser
+
 The non-validating XML parser.
+
 The conformance of XMLScan::XMLParser to the specification
 is described in another document.
+
 SuperClass:
 
 XMLScan::XMLScanner
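The context lines around XMLScanner#parse say that source may be a string, an array of strings, or any object with a gets method that behaves like IO#gets. The sketch below shows how those three kinds could be reduced to one list of chunks. `source_chunks` is a hypothetical helper for illustration; xmlscan's own input handling may differ.

```ruby
require 'stringio'

# Hypothetical helper, not xmlscan's code: turns any of the three source
# kinds listed for XMLScanner#parse into an array of string chunks.
def source_chunks(source)
  case source
  when String then [source]
  when Array  then source.dup
  else
    # Anything that answers to gets the way IO#gets does (nil at EOF).
    chunks = []
    while (line = source.gets)
      chunks << line
    end
    chunks
  end
end

from_string = source_chunks('<a/>')
from_array  = source_chunks(['<a>', '</a>'])
from_io     = source_chunks(StringIO.new("<a>\n</a>\n"))
```

A request body in a web action is typically an IO-like object, so it would fall into the last branch here.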
@@ -308,7 +366,6 @@ XMLScan::Visitor#on_stag
 After calling this method, XMLScan::Parser always call
 corresponding XMLScan::Visitor#on_etag method.
 
-
 In addition, if you never intend error recovery, method calls
 which must not be occurred in a well-formed XML document are
 all suppressed.
data/VERSION
CHANGED
@@ -1 +1 @@
-0.3.
+0.3.0
metadata
CHANGED
@@ -1,19 +1,19 @@
 --- !ruby/object:Gem::Specification
 name: xmlscan
 version: !ruby/object:Gem::Version
-version: 0.3.
-prerelease:
+version: 0.3.0
+prerelease:
 platform: ruby
 authors:
 - UENO Katsuhiro <katsu@blue.sky.or.jp>
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-02-
+date: 2012-02-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rspec
-requirement: &
+requirement: &5315060 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -21,10 +21,10 @@ dependencies:
 version: 2.8.0
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5315060
 - !ruby/object:Gem::Dependency
 name: rdoc
-requirement: &
+requirement: &5313840 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -32,10 +32,10 @@ dependencies:
 version: '3.12'
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5313840
 - !ruby/object:Gem::Dependency
 name: bundler
-requirement: &
+requirement: &5312600 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -43,10 +43,10 @@ dependencies:
 version: 1.0.0
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5312600
 - !ruby/object:Gem::Dependency
 name: jeweler
-requirement: &
+requirement: &5310620 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -54,7 +54,7 @@ dependencies:
 version: 1.8.3
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5310620
 description: The fastest XML parser written in 100% pure Ruby.
 email: gerryg@inbox.com
 executables: []
@@ -96,13 +96,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
 version: '0'
 segments:
 - 0
-hash:
+hash: 21734141110843515
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements:
-- - ! '
+- - ! '>='
 - !ruby/object:Gem::Version
-version:
+version: '0'
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.15