xmlscan 0.3.0prec → 0.3.0

Files changed (3)
  1. data/README.rdoc +59 -2
  2. data/VERSION +1 -1
  3. metadata +14 -14
@@ -44,6 +44,20 @@ a core part of a library providing such features.
 
  XMLscan contains htmlscan, an HTML parser.
 
+ == New Scanner Translator Feature in 0.3.0
+
+ Created for a use case in Wagn (http://wagn.org)
+
+ The following fragment is essentially all of the parsing code used in Wagn:
+
+ pairs = XMLScan::XMLProcessor.process(io, {:key=>:name, :element=>:card,
+ :substitute=>":transclude|{{:name}}", :extras=>[:type]})
+
+ The 'io' object is passed directly from request.body in an XML POST action.
+
+ ref: https://github.com/GerryG/wagn/commit/46e57dfd88cde45d33fe7f167d5803ab819ceef3#commitcomment-1012403
+
+ See the test cases in test/integration for some examples.
 
  === Character encodings
 
@@ -58,6 +72,9 @@ Shift_JIS, or UTF-8.
  UTF-16 is not supported directly. You should convert it into
  UTF-8 before parsing.
 
+ Character encoding needs more work in 0.3.0+. The character maps were
+ removed in favor of Ruby's \p{} regexp classes, but this area still needs attention.
+
  === XML Namespaces
 
  XML Namespaces have already been implemented in
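The encoding notes in this hunk point at two plain-Ruby steps: converting UTF-16 input to UTF-8 before parsing, and relying on \p{} regexp classes instead of hand-written character maps. A minimal sketch of both, assuming Ruby 1.9 or later. It uses only core String and Regexp features, not xmlscan itself, and the NAME_LIKE pattern is a hypothetical illustration that is much looser than the XML Name production and is not the expression xmlscan uses:

```ruby
# Sketch only: core Ruby (1.9+) String and Regexp features, no xmlscan calls.

# Pretend this arrived from a file or socket as UTF-16LE bytes.
utf16_source = "<caf\u00e9/>".encode("UTF-16LE")

# Convert to UTF-8 before handing the text to a parser.
utf8_source = utf16_source.encode("UTF-8")

# \p{L} matches any Unicode letter and \p{N} any Unicode number, so no
# hand-written character map is needed. Hypothetical pattern for illustration.
NAME_LIKE = /\p{L}[\p{L}\p{N}_.-]*/

# String#[] with a regexp returns the first match (or nil).
tag_name = utf8_source[NAME_LIKE]
puts tag_name
```

The conversion is explicit about the source encoding (UTF-16LE here); input that carries a byte order mark or uses big-endian order would need the matching encoding name.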
@@ -89,13 +106,21 @@ Raised when an XML document violates an validity constraint.
 
 
  XMLScan::Visitor
+
  Mix-in for receiving the result of parsing an XML document.
  Each parser included in xmlscan parses an XML document from
  the beginning, and calls each specific method of the given instance of
  XMLScan::Visitor for each syntactic element, such as a tag.
  It is ensured that these calls are made in order of appearance
  in the document from the beginning.
+
+ Changes: The scanner/parser now sends some additional arguments with
+ each event, primarily the original string that was parsed for the event, but
+ also a hash of attributes for stag_end. This makes scanning for
+ replacement of some elements very simple.
+
  Methods:
+
  Unless otherwise noted, the following methods do nothing by
  default.
 
@@ -107,6 +132,7 @@ a production. By default, this method raises
  XMLScan::ParseError exception. If no exception is
  raised and this method returns normally, the parser recovers
  from the error and continues to parse.
+
  XMLScan::Visitor#wellformed_error(msg)
 
  Called when the parser meets a well-formedness constraint
@@ -114,6 +140,7 @@ violation. By default, this method raises
  XMLScan::NotWellFormedError exception. If no exception
  is raised and this method returns normally, the parser recovers
  from the error and continues to parse.
+
  XMLScan::Visitor#valid_error(msg)
 
  Called when the parser meets a validity constraint
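The error callbacks in the hunks above share one contract: raising aborts the parse, while returning normally lets the parser recover and continue. The sketch below models that contract with hypothetical stand-ins (ToyNotWellFormedError, ToyVisitor, toy_scan). None of these names come from xmlscan, and the "stray <" check is an invented trigger used only to have something to report:

```ruby
# Hypothetical stand-ins that mimic the documented contract; not xmlscan code.
class ToyNotWellFormedError < StandardError; end

module ToyVisitor
  # Default behaviour: raising aborts the parse.
  def wellformed_error(msg)
    raise ToyNotWellFormedError, msg
  end

  # Default behaviour for ordinary events: do nothing.
  def on_chardata(str); end
end

# Keeps the defaults, so the first error stops everything.
class StrictVisitor
  include ToyVisitor
end

# Overrides the error hook and returns normally, which signals "recover".
class LenientVisitor
  include ToyVisitor
  attr_reader :problems, :text

  def initialize
    @problems = []
    @text = []
  end

  def wellformed_error(msg)
    @problems << msg
  end

  def on_chardata(str)
    @text << str
  end
end

# Toy driver: reports a stray "<" as an error, then carries on
# if the visitor's error hook returned instead of raising.
def toy_scan(chunks, visitor)
  chunks.each do |chunk|
    visitor.wellformed_error("stray '<' in #{chunk.inspect}") if chunk.include?("<")
    visitor.on_chardata(chunk)
  end
end

lenient = LenientVisitor.new
toy_scan(["a", "b<c", "d"], lenient)
puts lenient.problems.length
```

With LenientVisitor all three chunks are delivered and one problem is recorded; with StrictVisitor the same input raises at the second chunk.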
@@ -123,18 +150,23 @@ is raised and this method returns normally, the parser recovers
  from the error and continues to parse.
  FYI, the current version of xmlscan includes no validating XML
  processor. This method is reserved for future versions.
+
  XMLScan::Visitor#warning(msg)
 
  Called when the parser meets something that is not an error but is
  discouraged, or a syntax which xmlscan is not able to parse.
+
  XMLScan::Visitor#on_start_document
 
  Called just before the parser starts parsing an XML document.
+
  After this method is called, the corresponding
  XMLScan::Visitor#on_end_document method is always called.
+
  XMLScan::Visitor#on_end_document
 
  Called after the parser reaches the end of an XML document.
+
  XMLScan::Visitor#on_xmldecl
  XMLScan::Visitor#on_xmldecl_version(str)
  XMLScan::Visitor#on_xmldecl_encoding(str)
@@ -154,50 +186,63 @@ Called when the parser meets an XML declaration.
  3: on_xmldecl_encoding ("euc-jp")
  4: on_xmldecl_standalone ("yes")
  5: on_xmldecl_end
+
  When an XML declaration is found, both the on_xmldecl and
  on_xmldecl_end methods are always called. Any other methods
  are called only when the corresponding syntax is found.
+
  When a declaration other than version, encoding, or standalone
  is found in an XML declaration, the on_xmldecl_other method is
  called. Since such a declaration is not permitted, note that
  the parser always calls the XMLScan::Visitor#parse_error method
  before calling the on_xmldecl_other method.
+
  XMLScan::Visitor#on_doctype(root, pubid, sysid)
 
  Called when the parser meets a document type declaration.
+
     document                              argument
  --------------------------------------------------------------
  1: <!DOCTYPE foo>                        ('foo', nil,   nil)
  2: <!DOCTYPE foo SYSTEM "bar">           ('foo', nil,   'bar')
  3: <!DOCTYPE foo PUBLIC "bar">           ('foo', 'bar', nil)
  4: <!DOCTYPE foo PUBLIC "bar" "baz">     ('foo', 'bar', 'baz')
+
  XMLScan::Visitor#on_prolog_space(str)
 
  Called when the parser meets whitespace in the prolog.
+
  XMLScan::Visitor#on_comment(str)
 
  Called when the parser meets a comment.
+
  XMLScan::Visitor#on_pi(target, pi)
 
  Called when the parser meets a processing instruction.
+
  XMLScan::Visitor#on_chardata(str)
 
  Called when the parser meets character data.
+
  XMLScan::Visitor#on_cdata(str)
 
  Called when the parser meets a CDATA section.
+
  XMLScan::Visitor#on_entityref(ref)
 
  Called when the parser meets a general entity reference
  anywhere other than in an attribute value.
+
  XMLScan::Visitor#on_charref(code)
  XMLScan::Visitor#on_charref_hex(code)
 
  Called when the parser meets a character reference
  anywhere other than in an attribute value.
+
  When the character code is written in decimal,
  on_charref is called. When it is written in hexadecimal, on_charref_hex
  is called. code is an integer.
+
  XMLScan::Visitor#on_stag(name)
  XMLScan::Visitor#on_attribute(name)
  XMLScan::Visitor#on_attr_value(str)
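The on_charref and on_charref_hex entries in this hunk say that decimal and hexadecimal character references arrive through different callbacks but both carry an integer code. A self-contained sketch of that dispatch follows. The CHARREF pattern, the CharRefCollector class and the dispatch_charrefs helper are hypothetical and are not taken from xmlscan's scanner:

```ruby
# Toy dispatcher, not xmlscan's scanner: decimal and hex references both
# reach the visitor as integers; only the callback differs.
class CharRefCollector
  attr_reader :events

  def initialize
    @events = []
  end

  def on_charref(code)
    @events << [:on_charref, code]
  end

  def on_charref_hex(code)
    @events << [:on_charref_hex, code]
  end
end

# Group 1 captures hex digits after "&#x"; group 2 captures decimal digits.
CHARREF = /&#(?:x([0-9A-Fa-f]+)|([0-9]+));/

def dispatch_charrefs(text, visitor)
  text.scan(CHARREF) do |hex, dec|
    if hex
      visitor.on_charref_hex(hex.to_i(16))
    else
      visitor.on_charref(dec.to_i)
    end
  end
end

collector = CharRefCollector.new
dispatch_charrefs("&#65;&#x42;", collector)

# An integer code point can be turned back into a character with Array#pack.
collector.events.each do |callback, code|
  puts "#{callback}(#{code}) is #{[code].pack('U')}"
end
```

A visitor that does not care how the reference was written can simply route both callbacks to the same handler.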
@@ -226,19 +271,24 @@ Called when the parser meets an XML declaration.
  9: on_stag_end ('hoge')
       or
     on_stag_end_empty ('hoge')
+
  When a start tag is found, on_stag and the corresponding
  on_stag_end or on_stag_end_empty method are always
  called. Any other methods are called only when at least one
  attribute is found in the start tag.
+
  When an attribute is found, both the on_attribute and
  on_attribute_end methods are always called. If the attribute
  value is empty, only these two methods are called.
+
  When the parser meets a general entity reference in an
  attribute value, it calls the on_attr_entityref method.
+
  When the parser meets a character reference in an attribute
  value, it calls either on_charref or on_charref_hex method.
  If the tag is an empty element tag, the on_stag_end_empty method
  is called instead of the on_stag_end method.
+
  XMLScan::Visitor#on_etag(name)
 
  Called when the parser meets an end tag.
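The start-tag rules in this hunk (on_stag is always paired with either on_stag_end or on_stag_end_empty, and callbacks that are not overridden do nothing) can be illustrated without the gem. In the sketch below, SketchVisitor, TagLogger and replay are hypothetical names, and the event list is written by hand rather than produced by xmlscan. The single-argument signatures follow the README text and may differ in 0.3.0, which passes additional arguments:

```ruby
# Toy mix-in, not the real XMLScan::Visitor: every callback is a no-op by default.
module SketchVisitor
  def on_stag(name); end
  def on_stag_end(name); end
  def on_stag_end_empty(name); end
  def on_chardata(str); end
  def on_etag(name); end
end

# Overrides only the callbacks it cares about; the rest stay no-ops.
class TagLogger
  include SketchVisitor
  attr_reader :log

  def initialize
    @log = []
  end

  def on_stag_end(name)
    @log << "<#{name}>"
  end

  def on_stag_end_empty(name)
    @log << "<#{name}/>"
  end

  def on_etag(name)
    @log << "</#{name}>"
  end
end

# Toy driver: replays hand-written events in document order.
def replay(events, visitor)
  events.each { |callback, arg| visitor.public_send(callback, arg) }
end

# Events a scanner might emit for <a>hi<b/></a> (assumed, for illustration).
events = [
  [:on_stag, "a"], [:on_stag_end, "a"],
  [:on_chardata, "hi"],
  [:on_stag, "b"], [:on_stag_end_empty, "b"],
  [:on_etag, "a"]
]

logger = TagLogger.new
replay(events, logger)
puts logger.log.join(" ")
```

Because on_stag and on_chardata are left at their defaults, the logger records only the three tag-closing events, in document order.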
@@ -246,6 +296,7 @@ Called when the parser meets an end tag.
  XMLScan::XMLScanner
  The scanner which tokenizes an XML document and recognizes tags,
  and so on.
+
  The conformance of XMLScan::XMLScanner to the specification
  is described in another document.
  SuperClass:
@@ -259,29 +310,33 @@ XMLScan::XMLScanner.new(visitor[, option ...])
  Creates an instance. visitor is an instance of
  XMLScan::Visitor and receives the result of parsing
  from the XMLScan::Scanner object.
+
  You can specify one or more options as strings or symbols.
+
  XMLScan::Scanner's options are as follows:
 
  'strict_char'
 
  This option is enabled after
  require 'xmlscan/xmlchar'.
+
  XMLScan::Scanner checks whether an XML document includes
  an illegal character. Performance decreases sharply.
 
 
-
  Methods:
 
  XMLScan::XMLScanner#kcode= arg
 
  Sets CES. Available values for code are the same as for $KCODE
  except nil. If code is nil, $KCODE decides the CES.
+
  XMLScan::XMLScanner#kcode
 
  Returns CES. The format of the return value is the same as
  Regexp#kcode. If this method returns nil, it means that
  $KCODE decides the CES.
+
  XMLScan::XMLScanner#parse(source)
 
  Parses source as an XML document. source must be
@@ -289,9 +344,12 @@ a string, an array of strings, or an object which responds to
  a gets method which behaves the same as IO#gets does.
 
  XMLScan::XMLParser
+
  The non-validating XML parser.
+
  The conformance of XMLScan::XMLParser to the specification
  is described in another document.
+
  SuperClass:
 
  XMLScan::XMLScanner
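The parse(source) description above accepts three kinds of input: a string, an array of strings, or any object with an IO#gets-like method. One way such a contract could be handled is sketched below. chunks_from is a hypothetical helper written for this illustration and does not reflect how xmlscan reads its input internally:

```ruby
require "stringio"

# Hypothetical helper, not xmlscan's implementation: returns the pieces of
# text that each documented source kind would supply, in order.
def chunks_from(source)
  case source
  when String
    [source]
  when Array
    source
  else
    # Anything with an IO#gets-like method: read until it returns nil.
    chunks = []
    while (line = source.gets)
      chunks << line
    end
    chunks
  end
end

puts chunks_from("<a/>").length
puts chunks_from(["<a>", "</a>"]).length
puts chunks_from(StringIO.new("<a>\n</a>\n")).length
```

StringIO stands in here for request bodies, files, or sockets, all of which respond to gets in the same way.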
@@ -308,7 +366,6 @@ XMLScan::Visitor#on_stag
  After calling this method, XMLScan::Parser always calls the
  corresponding XMLScan::Visitor#on_etag method.
 
-
  In addition, if you never intend error recovery, method calls
  which must not occur in a well-formed XML document are
  all suppressed.
data/VERSION CHANGED
@@ -1 +1 @@
- 0.3.0prec
+ 0.3.0
metadata CHANGED
@@ -1,19 +1,19 @@
  --- !ruby/object:Gem::Specification
  name: xmlscan
  version: !ruby/object:Gem::Version
-   version: 0.3.0prec
-   prerelease: 5
+   version: 0.3.0
+   prerelease:
  platform: ruby
  authors:
  - UENO Katsuhiro <katsu@blue.sky.or.jp>
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2012-02-17 00:00:00.000000000 Z
+ date: 2012-02-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: rspec
-   requirement: &9706320 !ruby/object:Gem::Requirement
+   requirement: &5315060 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -21,10 +21,10 @@ dependencies:
          version: 2.8.0
    type: :development
    prerelease: false
-   version_requirements: *9706320
+   version_requirements: *5315060
  - !ruby/object:Gem::Dependency
    name: rdoc
-   requirement: &9705800 !ruby/object:Gem::Requirement
+   requirement: &5313840 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -32,10 +32,10 @@ dependencies:
          version: '3.12'
    type: :development
    prerelease: false
-   version_requirements: *9705800
+   version_requirements: *5313840
  - !ruby/object:Gem::Dependency
    name: bundler
-   requirement: &9705220 !ruby/object:Gem::Requirement
+   requirement: &5312600 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -43,10 +43,10 @@ dependencies:
          version: 1.0.0
    type: :development
    prerelease: false
-   version_requirements: *9705220
+   version_requirements: *5312600
  - !ruby/object:Gem::Dependency
    name: jeweler
-   requirement: &9704580 !ruby/object:Gem::Requirement
+   requirement: &5310620 !ruby/object:Gem::Requirement
      none: false
      requirements:
      - - ~>
@@ -54,7 +54,7 @@ dependencies:
          version: 1.8.3
    type: :development
    prerelease: false
-   version_requirements: *9704580
+   version_requirements: *5310620
  description: The fastest XML parser written in 100% pure Ruby.
  email: gerryg@inbox.com
  executables: []
@@ -96,13 +96,13 @@ required_ruby_version: !ruby/object:Gem::Requirement
        version: '0'
        segments:
        - 0
-       hash: 4206592949743860129
+       hash: 21734141110843515
  required_rubygems_version: !ruby/object:Gem::Requirement
    none: false
    requirements:
-   - - ! '>'
+   - - ! '>='
      - !ruby/object:Gem::Version
-       version: 1.3.1
+       version: '0'
  requirements: []
  rubyforge_project:
  rubygems_version: 1.8.15