xmlscan 0.3.0prec → 0.3.0
- data/README.rdoc +59 -2
- data/VERSION +1 -1
- metadata +14 -14
data/README.rdoc
CHANGED
@@ -44,6 +44,20 @@ a core part of a library providing such features.
 
 XMLscan contains htmlscan, an HTML parser.
 
+== New Scanner Translater Feature with 0.3.0
+
+Created for a use case in Wagn (http://wagn.org)
+
+The fragment is pretty much the whole parsing as used in Wagn:
+
+pairs = XMLScan::XMLProcessor.process(io, {:key=>:name, :element=>:card,
+:substitute=>":transclude|{{:name}}", :extras=>[:type]})
+
+The 'io' object is directly passed from the request.body in a XML POST action.
+
+ref: https://github.com/GerryG/wagn/commit/46e57dfd88cde45d33fe7f167d5803ab819ceef3#commitcomment-1012403
+
+See the tests cases in test/integration for some examples.
 
 === Character encodings
 
@@ -58,6 +72,9 @@ Shift_JIS, or UTF-8.
 UTF-16 is not supported directly. You should convert it into
 UTF-8 before parsing.
 
+Character encoding needs more work with 0.3.0+. The character maps were
+taken out in favor of \p{} RE classes in ruby, but it needs attention still.
+
 === XML Namespaces
 
 XML Namespaces have been already implemented in
@@ -89,13 +106,21 @@ Raised when an XML document violates an validity constraint.
 
 
 XMLScan::Visitor
+
 Mix-in for receiving the result of parsing an XML document.
 Each parser included in xmlscan parses an XML document from
 the beginning, and calls each specific method of given instance of
 XMLScan::Visitor for each syntactic element, such as a tag.
 It is ensured that these calls is in order of the appearance
 in the document from the beginning.
+
+Changes: The scanner/parser now sends some additional arguments with
+each event, primarily the original string that was parsed for the even, but
+also a hash of attributes for stag_end. This supports scanning for
+replacement of some elements very simple.
+
 Methods:
+
 Without special notice, the following methods do nothing by
 default.
 
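Note on the "Changes" paragraph added in this hunk: the sketch below illustrates why passing the original text with each event makes element replacement simple. It does not use xmlscan. EchoWithReplacement is a hypothetical class, and the argument order (name, attribute hash, original text) is an assumption, since the README does not state the 0.3.0 signatures.

```ruby
# Stand-in visitor (hypothetical, not part of xmlscan). It copies the original
# text of every event to the output, except for one element name, which is
# replaced by a "{{...}}" placeholder built from its attribute hash.
class EchoWithReplacement
  attr_reader :out

  def initialize(element)
    @element = element      # element name to replace
    @out = String.new       # rebuilt document text
  end

  # Character data is echoed unchanged; extra arguments are ignored.
  def on_chardata(str, *_extra)
    @out << str
  end

  # ASSUMED signature: element name, attribute hash, original tag text.
  def on_stag_end_empty(name, attrs, source)
    @out << (name == @element ? "{{#{attrs['name']}}}" : source)
  end
end

# Events are fired by hand here; a real scanner would produce them.
visitor = EchoWithReplacement.new('card')
visitor.on_chardata('See ')
visitor.on_stag_end_empty('card', { 'name' => 'Home' }, '<card name="Home"/>')
visitor.on_chardata(' and ')
visitor.on_stag_end_empty('br', {}, '<br/>')
puts visitor.out
```

Because every event carries its own source text, the visitor never has to re-serialize elements it does not care about.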
@@ -107,6 +132,7 @@ a production. By default, this method raises
 XMLScan::ParseError exception. If no exception is
 raised and this method returns normally, the parser recovers
 the error and continues to parse.
+
 XMLScan::Visitor#wellformed_error(msg)
 
 Called when the parser meets an well-formedness constraint
@@ -114,6 +140,7 @@ violation. By default, this method raises
 XMLScan::NotWellFormedError exception. If no exception
 is raised and this method returns normally, the parser recovers
 the error and continues to parse.
+
 XMLScan::Visitor#valid_error(msg)
 
 Called when the parser meets validity constraint
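Note on the error callbacks touched by these hunks: the README says the default handler raises, and that parsing continues if an overriding handler returns normally. The sketch below shows that contract with stand-in names. SketchParseError, SketchVisitor and scan_tokens are hypothetical and only mimic the documented behavior; they are not xmlscan's classes.

```ruby
# Stand-in error class and mix-in (hypothetical, not xmlscan's own).
class SketchParseError < StandardError; end

module SketchVisitor
  # Default behavior: raise, which aborts the scan.
  def parse_error(msg)
    raise SketchParseError, msg
  end
end

# Keeps the default handler, so the first error stops everything.
class StrictVisitor
  include SketchVisitor
end

# Overrides the handler and returns normally, so the scan continues.
class LenientVisitor
  include SketchVisitor
  attr_reader :errors

  def initialize
    @errors = []
  end

  def parse_error(msg)
    @errors << msg
  end
end

# Hypothetical driver standing in for a parser loop: it reports tokens that
# look like unterminated tags and keeps going if the visitor did not raise.
def scan_tokens(tokens, visitor)
  tokens.each_with_object([]) do |tok, seen|
    if tok.start_with?("<") && !tok.end_with?(">")
      visitor.parse_error("unterminated tag: #{tok}")
    end
    seen << tok
  end
end

lenient = LenientVisitor.new
recovered = scan_tokens(["<a>", "<b", "</a>"], lenient)

strict_failed =
  begin
    scan_tokens(["<a>", "<b", "</a>"], StrictVisitor.new)
    false
  rescue SketchParseError
    true
  end
```

The same pattern would apply to wellformed_error and valid_error, each with its own exception class.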
@@ -123,18 +150,23 @@ is raised and this method returns normally, the parser recovers
 the error and continues to parse.
 FYI, current version of xmlscan includes no validating XML
 processor. This method is reserved for future versions.
+
 XMLScan::Visitor#warning(msg)
 
 Called when the parser meets a non-error but unrecommended
 thing or a syntax which xmlscan is not able to parse.
+
 XMLScan::Visitor#on_start_document
 
 Called just before the parser starts parsing an XML document.
+
 After this method is called, corresponding
 XMLScan::Visitor#on_end_document method is always called.
+
 XMLScan::Visitor#on_end_document
 
 Called after the parser reaches the end of an XML document.
+
 XMLScan::Visitor#on_xmldecl
 XMLScan::Visitor#on_xmldecl_version(str)
 XMLScan::Visitor#on_xmldecl_encoding(str)
@@ -154,50 +186,63 @@ Called when the parser meets an XML declaration.
 3: on_xmldecl_encoding ("euc-jp")
 4: on_xmldecl_standalone ("yes")
 5: on_xmldecl_end
+
 When an XML declaration is found, both on_xmldecl and
 on_xmldecl_end method are always called. Any other methods
 are called only when the corresponding syntaxes are found.
+
 When a declaration except version, encoding, and standalone
 is found in an XML declaration, on_xmldecl_other method is
 called. Since such a declaration is not permitted, note that
 the parser always calls XMLScan::Visitor#parse_error method
 before calling on_xmldecl_other method.
+
 XMLScan::Visitor#on_doctype(root, pubid, sysid)
 
 Called when the parser meets a document type declaration.
+
 document argument
 --------------------------------------------------------------
 1: <!DOCTYPE foo> ('foo', nil, nil)
 2: <!DOCTYPE foo SYSTEM "bar"> ('foo', nil, 'bar')
 3: <!DOCTYPE foo PUBLIC "bar"> ('foo', 'bar', nil )
 4: <!DOCTYPE foo PUBLIC "bar" "baz"> ('foo', 'bar', 'baz')
+
 XMLScan::Visitor#on_prolog_space(str)
 
 Called when the parser meets whitespaces in prolog.
+
 XMLScan::Visitor#on_comment(str)
 
 Called when the parser meets a comment.
+
 XMLScan::Visitor#on_pi(target, pi)
 
 Called when the parser meets a processing instruction.
+
 XMLScan::Visitor#on_chardata(str)
 
 Called when the parser meets character data.
+
 XMLScan::Visitor#on_cdata(str)
 
 Called when the parser meets a CDATA section.
+
 XMLScan::Visitor#on_entityref(ref)
 
 Called when the parser meets a general entity reference
 in a place except an attribute value.
+
 XMLScan::Visitor#on_charref(code)
 XMLScan::Visitor#on_charref_hex(code)
 
 Called when the parser meets a character reference
 in a place except an attribute value.
+
 When the character code is represented by decimals,
 on_charref is called. When by hexadecimals, on_charref_hex
 is called. code is an integer.
+
 XMLScan::Visitor#on_stag(name)
 XMLScan::Visitor#on_attribute(name)
 XMLScan::Visitor#on_attr_value(str)
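Note on the on_doctype table in this hunk: the four rows can be reproduced with a small stand-alone helper. The sketch below is for checking the (root, pubid, sysid) mapping only. DOCTYPE_RE and doctype_args are hypothetical names and are not how xmlscan parses declarations; the pattern handles only the four simple forms shown in the table.

```ruby
# Hypothetical pattern covering only the four declaration shapes in the table.
# Captures: root name, SYSTEM id, PUBLIC id, system id following a PUBLIC id.
DOCTYPE_RE = /\A<!DOCTYPE\s+(\S+?)(?:\s+SYSTEM\s+"([^"]*)"|\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>\z/

# Returns [root, pubid, sysid] in the argument order of on_doctype,
# or nil when the declaration does not match the simple pattern above.
def doctype_args(decl)
  m = DOCTYPE_RE.match(decl)
  return nil unless m

  root, system_only, pubid, system_after_public = m.captures
  [root, pubid, system_only || system_after_public]
end

rows = [
  '<!DOCTYPE foo>',
  '<!DOCTYPE foo SYSTEM "bar">',
  '<!DOCTYPE foo PUBLIC "bar">',
  '<!DOCTYPE foo PUBLIC "bar" "baz">'
].map { |decl| doctype_args(decl) }

rows.each { |args| p args }
```

Each printed array corresponds to one row of the table, with missing identifiers reported as nil.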
@@ -226,19 +271,24 @@ Called when the parser meets an XML declaration.
 9: on_stag_end ('hoge')
 or
 on_stag_end_empty ('hoge')
+
 When a start tag is found, both on_stag and corresponding
 either on_stag_end or on_stag_end_empty method are always
 called. Any other methods are called only when at least one
 attribute is found in the start tag.
+
 When an attribute is found, both on_attribute and
 on_attribute_end method are always called. If the attribute
 value is empty, only these two methods are called.
+
 When the parser meets a general entity reference in an
 attribute value, it calls on_attr_entityref method.
+
 When the parser meets a character reference in an attribute
 value, it calls either on_charref or on_charref_hex method.
 If the tag is an empty element tag, on_stag_end_empty method
 is called instead of on_stag_end method.
+
 XMLScan::Visitor#on_etag(name)
 
 Called when the parser meets an end tag.
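Note on the start-tag events described in this hunk: a visitor can assemble an attribute hash from the on_stag, on_attribute, on_attr_value, on_attribute_end and on_stag_end calls. The sketch below is a stand-alone illustration. AttrCollector is a hypothetical class, the events are fired by hand rather than by xmlscan, and the trailing *_extra parameters are an assumption made so the methods tolerate the additional arguments mentioned for 0.3.0.

```ruby
# Hypothetical visitor that rebuilds [element name, attribute hash] pairs
# from the start-tag events named in the README.
class AttrCollector
  attr_reader :elements

  def initialize
    @elements = []          # finished start tags, in document order
  end

  def on_stag(name, *_extra)
    @attrs = {}             # attributes of the tag being read
  end

  def on_attribute(name, *_extra)
    @current = name
    @attrs[name] = String.new   # stays empty if no value event follows
  end

  def on_attr_value(str, *_extra)
    @attrs[@current] << str     # a value may arrive in several pieces
  end

  def on_attribute_end(name, *_extra)
    @current = nil
  end

  def on_stag_end(name, *_extra)
    @elements << [name, @attrs]
  end
  alias on_stag_end_empty on_stag_end
end

collector = AttrCollector.new
collector.on_stag('hoge')
collector.on_attribute('fuga')
collector.on_attr_value('foo')
collector.on_attr_value('baz')
collector.on_attribute_end('fuga')
collector.on_attribute('empty')       # empty value: only these two calls
collector.on_attribute_end('empty')
collector.on_stag_end('hoge')
p collector.elements
```

Entity and character references inside attribute values are left out to keep the sketch short.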
@@ -246,6 +296,7 @@ Called when the parser meets an end tag.
 XMLScan::XMLScanner
 The scanner which tokenizes an XML document and recognize tags,
 and so on.
+
 The conformance of XMLScan::XMLScanner to the specification
 is described in another document.
 SuperClass:
@@ -259,29 +310,33 @@ XMLScan::XMLScanner.new(visitor[, option ...])
 Creates an instance. visitor is a instance of
 XMLScan::Visitor and receives the result of parsing
 from the XMLScan::Scanner object.
+
 You can specify one of more option as a string or symbol.
+
 XMLScan::Scanner's options are as follows:
 
 'strict_char'
 
 This option is enabled after
 require 'xmlscan/xmlchar'.
+
 XMLScan::Scanner checks whether an XML document includes
 an illegal character. The performance decreases sharply.
 
 
-
 Methods:
 
 XMLScan::XMLScanner#kcode= arg
 
 Sets CES. Available values for code are same as $KCODE
 except nil. If code is nil, $KCODE decides the CES.
+
 XMLScan::XMLScanner#kcode
 
 Returns CES. The format of the return value is same as
 Regexp#kcode. If this method returns nil, it represents that
 $KCODE decides the CES.
+
 XMLScan::XMLScanner#parse(source)
 
 Parses source as an XML document. source must be
@@ -289,9 +344,12 @@ a string, an array of strings, or an object which responds to
 gets method which behaves same as IO#gets does.
 
 XMLScan::XMLParser
+
 The non-validating XML parser.
+
 The conformance of XMLScan::XMLParser to the specification
 is described in another document.
+
 SuperClass:
 
 XMLScan::XMLScanner
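Note on the source argument of XMLScanner#parse described across the two hunks above: the README accepts a string, an array of strings, or any object whose gets behaves like IO#gets. The sketch below shows one way such a duck-typed argument could be normalized. chunks_of is a hypothetical helper, not xmlscan code.

```ruby
require 'stringio'

# Hypothetical helper: returns the pieces of text a scanner would read
# from each of the three documented source kinds.
def chunks_of(source)
  case source
  when String then [source]
  when Array  then source.dup
  else
    # Anything else is expected to respond to gets like IO#gets,
    # returning nil at end of input.
    chunks = []
    while (line = source.gets)
      chunks << line
    end
    chunks
  end
end

xml = "<a>\n<b/>\n</a>\n"
from_string = chunks_of(xml).join
from_array  = chunks_of(xml.lines).join
from_io     = chunks_of(StringIO.new(xml)).join
```

All three calls yield the same text, which is why request.body, a StringIO, or an open File can be passed wherever a string is accepted.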
@@ -308,7 +366,6 @@ XMLScan::Visitor#on_stag
 After calling this method, XMLScan::Parser always call
 corresponding XMLScan::Visitor#on_etag method.
 
-
 In addition, if you never intend error recovery, method calls
 which must not be occurred in a well-formed XML document are
 all suppressed.
data/VERSION
CHANGED
@@ -1 +1 @@
-0.3.
+0.3.0
metadata
CHANGED
@@ -1,19 +1,19 @@
 --- !ruby/object:Gem::Specification
 name: xmlscan
 version: !ruby/object:Gem::Version
-version: 0.3.
-prerelease:
+version: 0.3.0
+prerelease:
 platform: ruby
 authors:
 - UENO Katsuhiro <katsu@blue.sky.or.jp>
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-02-
+date: 2012-02-26 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rspec
-requirement: &
+requirement: &5315060 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -21,10 +21,10 @@ dependencies:
 version: 2.8.0
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5315060
 - !ruby/object:Gem::Dependency
 name: rdoc
-requirement: &
+requirement: &5313840 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -32,10 +32,10 @@ dependencies:
 version: '3.12'
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5313840
 - !ruby/object:Gem::Dependency
 name: bundler
-requirement: &
+requirement: &5312600 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -43,10 +43,10 @@ dependencies:
 version: 1.0.0
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5312600
 - !ruby/object:Gem::Dependency
 name: jeweler
-requirement: &
+requirement: &5310620 !ruby/object:Gem::Requirement
 none: false
 requirements:
 - - ~>
@@ -54,7 +54,7 @@ dependencies:
 version: 1.8.3
 type: :development
 prerelease: false
-version_requirements: *
+version_requirements: *5310620
 description: The fastest XML parser written in 100% pure Ruby.
 email: gerryg@inbox.com
 executables: []
@@ -96,13 +96,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
 version: '0'
 segments:
 - 0
-hash:
+hash: 21734141110843515
 required_rubygems_version: !ruby/object:Gem::Requirement
 none: false
 requirements:
-- - ! '
+- - ! '>='
 - !ruby/object:Gem::Version
-version:
+version: '0'
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.15