rgen 0.3.0 → 0.4.0
- data/CHANGELOG +20 -1
- data/MIT-LICENSE +1 -1
- data/README +12 -9
- data/lib/instantiators/ea_instantiator.rb +36 -0
- data/lib/metamodels/uml13_metamodel.rb +559 -0
- data/lib/metamodels/uml13_metamodel_ext.rb +26 -0
- data/lib/mmgen/metamodel_generator.rb +5 -5
- data/lib/mmgen/mm_ext/ecore_ext.rb +95 -0
- data/lib/mmgen/mmgen.rb +6 -4
- data/lib/mmgen/templates/annotations.tpl +37 -0
- data/lib/mmgen/templates/metamodel_generator.tpl +171 -0
- data/lib/rgen/ecore/ecore.rb +190 -0
- data/lib/rgen/ecore/ecore_instantiator.rb +25 -0
- data/lib/rgen/ecore/ecore_transformer.rb +85 -0
- data/lib/rgen/environment.rb +9 -24
- data/lib/rgen/find_helper.rb +68 -0
- data/lib/rgen/{instantiator.rb → instantiator/abstract_instantiator.rb} +6 -2
- data/lib/rgen/instantiator/abstract_xml_instantiator.rb +59 -0
- data/lib/rgen/instantiator/default_xml_instantiator.rb +117 -0
- data/lib/rgen/instantiator/ecore_xml_instantiator.rb +144 -0
- data/lib/rgen/instantiator/nodebased_xml_instantiator.rb +157 -0
- data/lib/rgen/instantiator/xmi11_instantiator.rb +164 -0
- data/lib/rgen/metamodel_builder.rb +103 -9
- data/lib/rgen/metamodel_builder/build_helper.rb +26 -4
- data/lib/rgen/metamodel_builder/builder_extensions.rb +285 -88
- data/lib/rgen/metamodel_builder/builder_runtime.rb +7 -1
- data/lib/rgen/metamodel_builder/data_types.rb +67 -0
- data/lib/rgen/metamodel_builder/intermediate/annotation.rb +30 -0
- data/lib/rgen/metamodel_builder/metamodel_description.rb +232 -0
- data/lib/rgen/metamodel_builder/mm_multiple.rb +23 -0
- data/lib/rgen/metamodel_builder/module_extension.rb +33 -0
- data/lib/rgen/model_comparator.rb +56 -0
- data/lib/rgen/model_dumper.rb +5 -5
- data/lib/rgen/name_helper.rb +17 -1
- data/lib/rgen/template_language.rb +148 -28
- data/lib/rgen/template_language/directory_template_container.rb +56 -38
- data/lib/rgen/template_language/output_handler.rb +93 -77
- data/lib/rgen/template_language/template_container.rb +186 -143
- data/lib/rgen/transformer.rb +19 -14
- data/lib/transformers/uml13_to_ecore.rb +75 -0
- data/redist/xmlscan/ChangeLog +1301 -0
- data/redist/xmlscan/README +34 -0
- data/redist/xmlscan/THANKS +11 -0
- data/redist/xmlscan/doc/changes.html +74 -0
- data/redist/xmlscan/doc/changes.rd +80 -0
- data/redist/xmlscan/doc/en/conformance.html +136 -0
- data/redist/xmlscan/doc/en/conformance.rd +152 -0
- data/redist/xmlscan/doc/en/manual.html +356 -0
- data/redist/xmlscan/doc/en/manual.rd +402 -0
- data/redist/xmlscan/doc/ja/conformance.ja.html +118 -0
- data/redist/xmlscan/doc/ja/conformance.ja.rd +134 -0
- data/redist/xmlscan/doc/ja/manual.ja.html +325 -0
- data/redist/xmlscan/doc/ja/manual.ja.rd +370 -0
- data/redist/xmlscan/doc/src/Makefile +41 -0
- data/redist/xmlscan/doc/src/conformance.rd.src +256 -0
- data/redist/xmlscan/doc/src/langsplit.rb +110 -0
- data/redist/xmlscan/doc/src/manual.rd.src +614 -0
- data/redist/xmlscan/install.rb +41 -0
- data/redist/xmlscan/lib/xmlscan/encoding.rb +311 -0
- data/redist/xmlscan/lib/xmlscan/htmlscan.rb +289 -0
- data/redist/xmlscan/lib/xmlscan/namespace.rb +352 -0
- data/redist/xmlscan/lib/xmlscan/parser.rb +299 -0
- data/redist/xmlscan/lib/xmlscan/scanner.rb +1109 -0
- data/redist/xmlscan/lib/xmlscan/version.rb +22 -0
- data/redist/xmlscan/lib/xmlscan/visitor.rb +158 -0
- data/redist/xmlscan/lib/xmlscan/xmlchar.rb +441 -0
- data/redist/xmlscan/memo/CONFORMANCE +1249 -0
- data/redist/xmlscan/memo/PRODUCTIONS +195 -0
- data/redist/xmlscan/memo/contentspec.ry +335 -0
- data/redist/xmlscan/samples/chibixml.rb +105 -0
- data/redist/xmlscan/samples/getxmlchar.rb +122 -0
- data/redist/xmlscan/samples/rexml.rb +159 -0
- data/redist/xmlscan/samples/xmlbench.rb +88 -0
- data/redist/xmlscan/samples/xmlbench/parser/chibixml.rb +22 -0
- data/redist/xmlscan/samples/xmlbench/parser/nqxml.rb +29 -0
- data/redist/xmlscan/samples/xmlbench/parser/rexml.rb +62 -0
- data/redist/xmlscan/samples/xmlbench/parser/xmlparser.rb +22 -0
- data/redist/xmlscan/samples/xmlbench/parser/xmlscan-0.0.10.rb +62 -0
- data/redist/xmlscan/samples/xmlbench/parser/xmlscan-chibixml.rb +22 -0
- data/redist/xmlscan/samples/xmlbench/parser/xmlscan-rexml.rb +22 -0
- data/redist/xmlscan/samples/xmlbench/parser/xmlscan.rb +99 -0
- data/redist/xmlscan/samples/xmlbench/xmlbench-lib.rb +116 -0
- data/redist/xmlscan/samples/xmlconftest.rb +200 -0
- data/redist/xmlscan/test.rb +7 -0
- data/redist/xmlscan/tests/deftestcase.rb +73 -0
- data/redist/xmlscan/tests/runtest.rb +47 -0
- data/redist/xmlscan/tests/testall.rb +14 -0
- data/redist/xmlscan/tests/testencoding.rb +438 -0
- data/redist/xmlscan/tests/testhtmlscan.rb +752 -0
- data/redist/xmlscan/tests/testnamespace.rb +457 -0
- data/redist/xmlscan/tests/testparser.rb +591 -0
- data/redist/xmlscan/tests/testscanner.rb +1749 -0
- data/redist/xmlscan/tests/testxmlchar.rb +143 -0
- data/redist/xmlscan/tests/visitor.rb +34 -0
- data/test/array_extensions_test.rb +2 -2
- data/test/ea_instantiator_test.rb +41 -0
- data/test/ecore_self_test.rb +53 -0
- data/test/environment_test.rb +11 -6
- data/test/metamodel_builder_test.rb +404 -245
- data/test/metamodel_roundtrip_test.rb +52 -0
- data/test/metamodel_roundtrip_test/TestModel.rb +65 -0
- data/test/metamodel_roundtrip_test/TestModel_Regenerated.rb +64 -0
- data/test/metamodel_roundtrip_test/houseMetamodel.ecore +32 -0
- data/test/metamodel_roundtrip_test/houseMetamodel_from_ecore.rb +39 -0
- data/test/rgen_test.rb +3 -3
- data/test/template_language_test.rb +65 -39
- data/test/template_language_test/expected_result.txt +24 -3
- data/test/template_language_test/templates/code/array.tpl +11 -0
- data/test/template_language_test/templates/content/author.tpl +7 -0
- data/test/template_language_test/templates/content/chapter.tpl +1 -1
- data/test/template_language_test/templates/root.tpl +17 -8
- data/test/template_language_test/testout.txt +24 -3
- data/test/testmodel/class_model_checker.rb +119 -0
- data/test/{xmi_instantiator_test/testmodel.eap → testmodel/ea_testmodel.eap} +0 -0
- data/test/{xmi_instantiator_test/testmodel.xml → testmodel/ea_testmodel.xml} +81 -14
- data/test/testmodel/ea_testmodel_partial.xml +317 -0
- data/test/testmodel/ecore_model_checker.rb +101 -0
- data/test/testmodel/manual_testmodel.xml +22 -0
- data/test/testmodel/object_model_checker.rb +67 -0
- data/test/transformer_test.rb +18 -10
- data/test/xml_instantiator_test.rb +81 -8
- data/test/xml_instantiator_test/simple_ecore_model_checker.rb +94 -0
- data/test/xml_instantiator_test/simple_xmi_ecore_instantiator.rb +53 -0
- data/test/xml_instantiator_test/simple_xmi_metamodel.rb +49 -0
- data/test/xml_instantiator_test/simple_xmi_to_ecore.rb +75 -0
- metadata +126 -28
- data/lib/ea/xmi_class_instantiator.rb +0 -46
- data/lib/ea/xmi_helper.rb +0 -26
- data/lib/ea/xmi_metamodel.rb +0 -34
- data/lib/ea/xmi_object_instantiator.rb +0 -46
- data/lib/ea/xmi_to_classmodel.rb +0 -78
- data/lib/ea/xmi_to_objectmodel.rb +0 -92
- data/lib/mmgen/mm_ext/uml_classmodel_ext.rb +0 -71
- data/lib/mmgen/templates/uml_classmodel.tpl +0 -63
- data/lib/rgen/xml_instantiator.rb +0 -132
- data/lib/uml/objectmodel_instantiator.rb +0 -53
- data/lib/uml/uml_classmodel.rb +0 -92
- data/lib/uml/uml_objectmodel.rb +0 -65
- data/test/metamodel_generator_test.rb +0 -44
- data/test/metamodel_generator_test/TestModel.rb +0 -40
- data/test/metamodel_generator_test/expected_result.txt +0 -40
- data/test/xmi_class_instantiator_test.rb +0 -24
- data/test/xmi_instantiator_test/class_model_checker.rb +0 -97
- data/test/xmi_object_instantiator_test.rb +0 -65
- data/test/xml_instantiator_test/testmodel.xml +0 -7
@@ -0,0 +1,356 @@
|
|
1
|
+
<?xml version="1.0" ?>
|
2
|
+
<!DOCTYPE html
|
3
|
+
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
4
|
+
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
5
|
+
<html xmlns="http://www.w3.org/1999/xhtml">
|
6
|
+
<head>
|
7
|
+
<title>en/manual.rd</title>
|
8
|
+
</head>
|
9
|
+
<body>
|
10
|
+
<h1><a name="label-0" id="label-0">xmlscan version 0.2 Reference Manual</a></h1><!-- RDLabel: "xmlscan version 0.2 Reference Manual" -->
|
11
|
+
<p>This is a broken English version. If you find lexical or
|
12
|
+
grammatical mistakes, or strange expressions (including kidding,
|
13
|
+
unnatural or unclear ones) in this document, please
|
14
|
+
<a href="mailto:katsu@blue.sky.or.jp">let me know</a>.</p>
|
15
|
+
<h2><a name="label-1" id="label-1">Abstract</a></h2><!-- RDLabel: "Abstract" -->
|
16
|
+
<p>XMLscan is a non-validating XML parser written in 100%
|
17
|
+
pure Ruby.</p>
|
18
|
+
<p>XMLscan's features are as follows:</p>
|
19
|
+
<dl>
|
20
|
+
<dt><a name="label-2" id="label-2">100% pure Ruby</a></dt><!-- RDLabel: "100% pure Ruby" -->
|
21
|
+
<dd>
|
22
|
+
XMLscan doesn't require any extension libraries, so
|
23
|
+
it works with nothing more than a Ruby interpreter, version
|
24
|
+
1.6 or above.
|
25
|
+
(It also needs no standard-bundled extension library.)
|
26
|
+
</dd>
|
27
|
+
<dt><a name="label-3" id="label-3">Compliant to the specification</a></dt><!-- RDLabel: "Compliant to the specification" -->
|
28
|
+
<dd>
|
29
|
+
XMLscan has been developed to satisfy all the conditions
|
30
|
+
described in the XML 1.0 Specification that are required of a
|
31
|
+
non-validating XML processor.
|
32
|
+
</dd>
|
33
|
+
<dt><a name="label-4" id="label-4">High-speed</a></dt><!-- RDLabel: "High-speed" -->
|
34
|
+
<dd>
|
35
|
+
XMLscan is, probably, the fastest parser among all
|
36
|
+
existing XML/HTML parsers written in pure Ruby.
|
37
|
+
</dd>
|
38
|
+
<dt><a name="label-5" id="label-5">Support for various CES.</a></dt><!-- RDLabel: "Support for various CES." -->
|
39
|
+
<dd>
|
40
|
+
XMLscan can parse an XML document encoded in at least
|
41
|
+
iso-8859-*, EUC-*, Shift_JIS, and UTF-8 as it is.
|
42
|
+
UTF-16 is not supported directly, though.
|
43
|
+
</dd>
|
44
|
+
<dt><a name="label-6" id="label-6">Just parsing</a></dt><!-- RDLabel: "Just parsing" -->
|
45
|
+
<dd>
|
46
|
+
The role of xmlscan is just to parse an XML document.
|
47
|
+
XMLscan doesn't provide high-level features to easily
|
48
|
+
handle an XML document. XMLscan is assumed to be used as
|
49
|
+
a core part of a library providing such features.
|
50
|
+
</dd>
|
51
|
+
<dt><a name="label-7" id="label-7">HTML</a></dt><!-- RDLabel: "HTML" -->
|
52
|
+
<dd>
|
53
|
+
XMLscan contains htmlscan, an HTML parser.
|
54
|
+
</dd>
|
55
|
+
</dl>
|
56
|
+
<h2><a name="label-8" id="label-8">Character encodings</a></h2><!-- RDLabel: "Character encodings" -->
|
57
|
+
<p>By default, the value of global variable $KCODE decides
|
58
|
+
which CES (character encoding scheme) is assumed for xmlscan
|
59
|
+
to parse an XML document.
|
60
|
+
You need to set $KCODE or <!-- Reference, RDLabel "XMLScan::XMLScanner#kcode=" doesn't exist --><em class="label-not-found">XMLScan::XMLScanner#kcode=</em><!-- Reference end -->
|
61
|
+
to an appropriate value to parse an XML document encoded in EUC-*,
|
62
|
+
Shift_JIS, or UTF-8.</p>
|
63
|
+
<p>UTF-16 is not supported directly. You should convert it into
|
64
|
+
UTF-8 before parsing.</p>
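The UTF-16 advice above can be sketched as follows. This is a minimal sketch that assumes a Ruby with String#encode (1.9 or later), which postdates the $KCODE mechanism the manual describes. The sample document and the variable names are made up for illustration, and no xmlscan call is involved.

```ruby
# Stand-in for raw bytes read from a UTF-16LE file (no byte order mark).
# The sample document is hypothetical.
utf16_bytes = "<a>\u00e9</a>".encode("UTF-16LE").b

# Tag the bytes with their real encoding, then transcode to UTF-8.
utf8_doc = utf16_bytes.force_encoding("UTF-16LE").encode("UTF-8")

puts utf8_doc.encoding
```

The transcoded string is what would then be handed to a parser that expects UTF-8 input.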
|
65
|
+
<h2><a name="label-9" id="label-9">XML Namespaces</a></h2><!-- RDLabel: "XML Namespaces" -->
|
66
|
+
<p>XML Namespaces are already implemented in
|
67
|
+
xmlscan/namespace.rb. However, since its interface is going
|
68
|
+
to be modified, this feature is undocumented now.</p>
|
69
|
+
<h2><a name="label-10" id="label-10">Class Reference</a></h2><!-- RDLabel: "Class Reference" -->
|
70
|
+
<h3><a name="label-11" id="label-11">XMLScan::Error</a></h3><!-- RDLabel: "XMLScan::Error" -->
|
71
|
+
<p>The superclass for all exceptions related to xmlscan.</p>
|
72
|
+
<p>These exceptions are raised by XMLScan::Visitor
|
73
|
+
by default when it receives an error report from a parser,
|
74
|
+
such as XMLScan::XMLScanner or XMLScan::XMLParser.
|
75
|
+
Each parser never raises these exceptions by itself.</p>
|
76
|
+
<dl>
|
77
|
+
<dt><a name="label-12" id="label-12">XMLScan::ParseError</a></dt><!-- RDLabel: "XMLScan::ParseError" -->
|
78
|
+
<dd>
|
79
|
+
An error other than a constraint violation, for example,
|
80
|
+
an XML document that does not match a production.
|
81
|
+
</dd>
|
82
|
+
<dt><a name="label-13" id="label-13">XMLScan::NotWellFormedError</a></dt><!-- RDLabel: "XMLScan::NotWellFormedError" -->
|
83
|
+
<dd>
|
84
|
+
Raised when an XML document violates a well-formedness
|
85
|
+
constraint.
|
86
|
+
</dd>
|
87
|
+
<dt><a name="label-14" id="label-14">XMLScan::NotValidError</a></dt><!-- RDLabel: "XMLScan::NotValidError" -->
|
88
|
+
<dd>
|
89
|
+
Raised when an XML document violates a validity constraint.
|
90
|
+
</dd>
|
91
|
+
</dl>
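The relationship between the exception classes and the visitor's default behaviour can be sketched as below. This is a stand-alone illustration under stated assumptions: the module name SketchScan and the LenientVisitor and StrictVisitor classes are hypothetical, the error reports are made by hand rather than by a parser, and nothing here is xmlscan's actual source. It only mirrors what the manual states: the three error classes share one superclass, the visitor raises them by default, and a visitor method that returns normally lets parsing continue.

```ruby
module SketchScan
  # One common superclass, as the manual describes for XMLScan::Error.
  class Error < StandardError; end
  class ParseError < Error; end
  class NotWellFormedError < Error; end
  class NotValidError < Error; end

  # Default behaviour: every error report is turned into an exception.
  module Visitor
    def parse_error(msg)
      raise ParseError, msg
    end

    def wellformed_error(msg)
      raise NotWellFormedError, msg
    end

    def valid_error(msg)
      raise NotValidError, msg
    end
  end
end

# Keeps the raising defaults.
class StrictVisitor
  include SketchScan::Visitor
end

# Overrides two hooks so they record the report and return normally.
class LenientVisitor
  include SketchScan::Visitor
  attr_reader :problems

  def initialize
    @problems = []
  end

  def parse_error(msg)
    @problems << [:parse, msg]
  end

  def wellformed_error(msg)
    @problems << [:wellformed, msg]
  end
end

# Hand-made error reports stand in for what a parser would send.
lenient = LenientVisitor.new
lenient.parse_error("unexpected token")
lenient.wellformed_error("unclosed element")

strict_result =
  begin
    StrictVisitor.new.parse_error("unexpected token")
    :continued
  rescue SketchScan::Error => e
    e.class
  end
```

Rescuing the common superclass is enough to catch any of the three error kinds, which is the practical benefit of the shared hierarchy.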
|
92
|
+
<h3><a name="label-15" id="label-15">XMLScan::Visitor</a></h3><!-- RDLabel: "XMLScan::Visitor" -->
|
93
|
+
<p>Mix-in for receiving the result of parsing an XML document.</p>
|
94
|
+
<p>Each parser included in xmlscan parses an XML document from
|
95
|
+
the beginning, and calls the corresponding method of the given instance of
|
96
|
+
XMLScan::Visitor for each syntactic element, such as a tag.
|
97
|
+
These calls are guaranteed to be made in order of appearance
|
98
|
+
in the document from the beginning.</p>
|
99
|
+
<h4><a name="label-16" id="label-16">Methods:</a></h4><!-- RDLabel: "Methods:" -->
|
100
|
+
<p>Without special notice, the following methods do nothing by
|
101
|
+
default.</p>
|
102
|
+
<dl>
|
103
|
+
<dt><a name="label-17" id="label-17"><code>XMLScan::Visitor#parse_error(<var>msg</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#parse_error" -->
|
104
|
+
<dd>
|
105
|
+
Called when the parser meets an error other than a constraint
|
106
|
+
violation, for example, an XML document that does not match
|
107
|
+
a production. By default, this method raises
|
108
|
+
<a href="#label-12">XMLScan::ParseError</a> exception. If no exception is
|
109
|
+
raised and this method returns normally, the parser recovers
|
110
|
+
from the error and continues to parse.</dd>
|
111
|
+
<dt><a name="label-18" id="label-18"><code>XMLScan::Visitor#wellformed_error(<var>msg</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#wellformed_error" -->
|
112
|
+
<dd>
|
113
|
+
Called when the parser meets a well-formedness constraint
|
114
|
+
violation. By default, this method raises
|
115
|
+
<a href="#label-13">XMLScan::NotWellFormedError</a> exception. If no exception
|
116
|
+
is raised and this method returns normally, the parser recovers
|
117
|
+
from the error and continues to parse.</dd>
|
118
|
+
<dt><a name="label-19" id="label-19"><code>XMLScan::Visitor#valid_error(<var>msg</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#valid_error" -->
|
119
|
+
<dd>
|
120
|
+
<p>Called when the parser meets a validity constraint
|
121
|
+
violation. By default, this method raises
|
122
|
+
<a href="#label-14">XMLScan::NotValidError</a> exception. If no exception
|
123
|
+
is raised and this method returns normally, the parser recovers
|
124
|
+
from the error and continues to parse.</p>
|
125
|
+
<p>Note that the current version of xmlscan includes no validating XML
|
126
|
+
processor. This method is reserved for future versions.</p></dd>
|
127
|
+
<dt><a name="label-20" id="label-20"><code>XMLScan::Visitor#warning(<var>msg</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#warning" -->
|
128
|
+
<dd>
|
129
|
+
Called when the parser meets something that is not an error but is not
|
130
|
+
recommended, or syntax which xmlscan is not able to parse.</dd>
|
131
|
+
<dt><a name="label-21" id="label-21"><code>XMLScan::Visitor#on_start_document</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_start_document" -->
|
132
|
+
<dd>
|
133
|
+
Called just before the parser starts parsing an XML document.
|
134
|
+
After this method is called, the corresponding
|
135
|
+
<a href="#label-22">XMLScan::Visitor#on_end_document</a> method is always called.</dd>
|
136
|
+
<dt><a name="label-22" id="label-22"><code>XMLScan::Visitor#on_end_document</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_end_document" -->
|
137
|
+
<dd>
|
138
|
+
Called after the parser reaches the end of an XML document.</dd>
|
139
|
+
<dt><a name="label-23" id="label-23"><code>XMLScan::Visitor#on_xmldecl</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl" -->
|
140
|
+
<dt><a name="label-24" id="label-24"><code>XMLScan::Visitor#on_xmldecl_version(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_version" -->
|
141
|
+
<dt><a name="label-25" id="label-25"><code>XMLScan::Visitor#on_xmldecl_encoding(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_encoding" -->
|
142
|
+
<dt><a name="label-26" id="label-26"><code>XMLScan::Visitor#on_xmldecl_standalone(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_standalone" -->
|
143
|
+
<dt><a name="label-27" id="label-27"><code>XMLScan::Visitor#on_xmldecl_other(<var>name</var>, <var>value</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_other" -->
|
144
|
+
<dt><a name="label-28" id="label-28"><code>XMLScan::Visitor#on_xmldecl_end</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_end" -->
|
145
|
+
<dd>
|
146
|
+
<p>Called when the parser meets an XML declaration.</p>
|
147
|
+
<pre><?xml version="1.0" encoding="euc-jp" standalone="yes" ?>
|
148
|
+
^ ^ ^ ^ ^
|
149
|
+
1 2 3 4 5
|
150
|
+
|
151
|
+
method argument
|
152
|
+
--------------------------------------
|
153
|
+
1: on_xmldecl
|
154
|
+
2: on_xmldecl_version ("1.0")
|
155
|
+
3: on_xmldecl_encoding ("euc-jp")
|
156
|
+
4: on_xmldecl_standalone ("yes")
|
157
|
+
5: on_xmldecl_end</pre>
|
158
|
+
<p>When an XML declaration is found, both on_xmldecl and
|
159
|
+
on_xmldecl_end are always called. The other methods
|
160
|
+
are called only when the corresponding syntax is found.</p>
|
161
|
+
<p>When a declaration other than version, encoding, or standalone
|
162
|
+
is found in an XML declaration, on_xmldecl_other method is
|
163
|
+
called. Since such a declaration is not permitted, note that
|
164
|
+
the parser always calls <a href="#label-17">XMLScan::Visitor#parse_error</a> method
|
165
|
+
before calling on_xmldecl_other method.</p></dd>
|
166
|
+
<dt><a name="label-29" id="label-29"><code>XMLScan::Visitor#on_doctype(<var>root</var>, <var>pubid</var>, <var>sysid</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_doctype" -->
|
167
|
+
<dd>
|
168
|
+
<p>Called when the parser meets a document type declaration.</p>
|
169
|
+
<pre>document argument</pre>
|
170
|
+
<pre>--------------------------------------------------------------
|
171
|
+
1: <!DOCTYPE foo> ('foo', nil, nil)
|
172
|
+
2: <!DOCTYPE foo SYSTEM "bar"> ('foo', nil, 'bar')
|
173
|
+
3: <!DOCTYPE foo PUBLIC "bar"> ('foo', 'bar', nil )
|
174
|
+
4: <!DOCTYPE foo PUBLIC "bar" "baz"> ('foo', 'bar', 'baz')</pre></dd>
|
175
|
+
<dt><a name="label-30" id="label-30"><code>XMLScan::Visitor#on_prolog_space(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_prolog_space" -->
|
176
|
+
<dd>
|
177
|
+
Called when the parser meets whitespaces in prolog.</dd>
|
178
|
+
<dt><a name="label-31" id="label-31"><code>XMLScan::Visitor#on_comment(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_comment" -->
|
179
|
+
<dd>
|
180
|
+
Called when the parser meets a comment.</dd>
|
181
|
+
<dt><a name="label-32" id="label-32"><code>XMLScan::Visitor#on_pi(<var>target</var>, <var>pi</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_pi" -->
|
182
|
+
<dd>
|
183
|
+
Called when the parser meets a processing instruction.</dd>
|
184
|
+
<dt><a name="label-33" id="label-33"><code>XMLScan::Visitor#on_chardata(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_chardata" -->
|
185
|
+
<dd>
|
186
|
+
Called when the parser meets character data.</dd>
|
187
|
+
<dt><a name="label-34" id="label-34"><code>XMLScan::Visitor#on_cdata(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_cdata" -->
|
188
|
+
<dd>
|
189
|
+
Called when the parser meets a CDATA section.</dd>
|
190
|
+
<dt><a name="label-35" id="label-35"><code>XMLScan::Visitor#on_entityref(<var>ref</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_entityref" -->
|
191
|
+
<dd>
|
192
|
+
Called when the parser meets a general entity reference
|
193
|
+
in a place except an attribute value.</dd>
|
194
|
+
<dt><a name="label-36" id="label-36"><code>XMLScan::Visitor#on_charref(<var>code</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_charref" -->
|
195
|
+
<dt><a name="label-37" id="label-37"><code>XMLScan::Visitor#on_charref_hex(<var>code</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_charref_hex" -->
|
196
|
+
<dd>
|
197
|
+
Called when the parser meets a character reference
|
198
|
+
in a place except an attribute value.
|
199
|
+
When the character code is represented by decimals,
|
200
|
+
on_charref is called. When by hexadecimals, on_charref_hex
|
201
|
+
is called. <var>code</var> is an integer.</dd>
|
202
|
+
<dt><a name="label-38" id="label-38"><code>XMLScan::Visitor#on_stag(<var>name</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_stag" -->
|
203
|
+
<dt><a name="label-39" id="label-39"><code>XMLScan::Visitor#on_attribute(<var>name</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_attribute" -->
|
204
|
+
<dt><a name="label-40" id="label-40"><code>XMLScan::Visitor#on_attr_value(<var>str</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_attr_value" -->
|
205
|
+
<dt><a name="label-41" id="label-41"><code>XMLScan::Visitor#on_attr_entityref(<var>ref</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_attr_entityref" -->
|
206
|
+
<dt><a name="label-42" id="label-42"><code>XMLScan::Visitor#on_attr_charref(<var>code</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_attr_charref" -->
|
207
|
+
<dt><a name="label-43" id="label-43"><code>XMLScan::Visitor#on_attr_charref_hex(<var>code</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_attr_charref_hex" -->
|
208
|
+
<dt><a name="label-44" id="label-44"><code>XMLScan::Visitor#on_attribute_end(<var>name</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_attribute_end" -->
|
209
|
+
<dt><a name="label-45" id="label-45"><code>XMLScan::Visitor#on_stag_end_empty(<var>name</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_stag_end_empty" -->
|
210
|
+
<dt><a name="label-46" id="label-46"><code>XMLScan::Visitor#on_stag_end(<var>name</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_stag_end" -->
|
211
|
+
<dd>
|
212
|
+
<p>Called when the parser meets a start tag.</p>
|
213
|
+
<pre><hoge fuga="foo&bar;&#38;&#x26;baz" >
|
214
|
+
^ ^ ^ ^ ^ ^ ^ ^ ^
|
215
|
+
1 2 3 4 5 6 7 8 9
|
216
|
+
|
217
|
+
method argument
|
218
|
+
------------------------------------
|
219
|
+
1: on_stag ('hoge')
|
220
|
+
2: on_attribute ('fuga')
|
221
|
+
3: on_attr_value ('foo')
|
222
|
+
4: on_attr_entityref ('bar')
|
223
|
+
5: on_attr_charref (38)
|
224
|
+
6: on_attr_charref_hex (38)
|
225
|
+
7: on_attr_value ('baz')
|
226
|
+
8: on_attribute_end ('fuga')
|
227
|
+
9: on_stag_end ('hoge')
|
228
|
+
or
|
229
|
+
on_stag_end_empty ('hoge')</pre>
|
230
|
+
<p>When a start tag is found, on_stag and the corresponding
|
231
|
+
on_stag_end or on_stag_end_empty method are always
|
232
|
+
called. The other methods are called only when at least one
|
233
|
+
attribute is found in the start tag.</p>
|
234
|
+
<p>When an attribute is found, both on_attribute and
|
235
|
+
on_attribute_end are always called. If the attribute
|
236
|
+
value is empty, only these two methods are called.</p>
|
237
|
+
<p>When the parser meets a general entity reference in an
|
238
|
+
attribute value, it calls on_attr_entityref method.
|
239
|
+
When the parser meets a character reference in an attribute
|
240
|
+
value, it calls either on_attr_charref or on_attr_charref_hex method.</p>
|
241
|
+
<p>If the tag is an empty element tag, on_stag_end_empty method
|
242
|
+
is called instead of on_stag_end method.</p></dd>
|
243
|
+
<dt><a name="label-47" id="label-47"><code>XMLScan::Visitor#on_etag(<var>name</var>)</code></a></dt><!-- RDLabel: "XMLScan::Visitor#on_etag" -->
|
244
|
+
<dd>
|
245
|
+
Called when the parser meets an end tag.</dd>
|
246
|
+
</dl>
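The start-tag callback sequence documented above can be replayed with a small recording visitor. This is a sketch under stated assumptions: AttributeCollector is a hypothetical class, it does not load xmlscan, and the nine calls are issued by hand in the order the manual lists for the hoge example. Per the manual, a real visitor would mix in XMLScan::Visitor and be passed to a parser instead.

```ruby
# Hypothetical recording visitor that rebuilds attribute values from the
# documented start-tag callbacks. It does not depend on xmlscan.
class AttributeCollector
  attr_reader :elements

  def initialize
    @elements = [] # finished [tag_name, {attribute_name => value}] pairs
  end

  def on_stag(name)
    @attrs = {}
  end

  def on_attribute(name)
    @value = "".dup # fresh, mutable buffer for this attribute
  end

  def on_attr_value(str)
    @value << str
  end

  def on_attr_entityref(ref)
    @value << "&" << ref << ";" # entity left unresolved in this sketch
  end

  def on_attr_charref(code)
    @value << [code].pack("U") # integer code point to a UTF-8 character
  end
  alias on_attr_charref_hex on_attr_charref

  def on_attribute_end(name)
    @attrs[name] = @value
  end

  def on_stag_end(name)
    @elements << [name, @attrs]
  end
  alias on_stag_end_empty on_stag_end
end

# Replay, by hand, the nine calls listed for
#   <hoge fuga="foo&bar;&#38;&#x26;baz" >
collector = AttributeCollector.new
collector.on_stag("hoge")
collector.on_attribute("fuga")
collector.on_attr_value("foo")
collector.on_attr_entityref("bar")
collector.on_attr_charref(38)
collector.on_attr_charref_hex(38)
collector.on_attr_value("baz")
collector.on_attribute_end("fuga")
collector.on_stag_end("hoge")

p collector.elements
```

Because both character-reference callbacks receive an integer, one implementation can serve the decimal and hexadecimal forms, which is why the sketch aliases them.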
|
247
|
+
<h3><a name="label-48" id="label-48">XMLScan::XMLScanner</a></h3><!-- RDLabel: "XMLScan::XMLScanner" -->
|
248
|
+
<p>The scanner which tokenizes an XML document and recognizes tags,
|
249
|
+
and so on.</p>
|
250
|
+
<p>The conformance of XMLScan::XMLScanner to the specification
|
251
|
+
is described in another document.</p>
|
252
|
+
<h4><a name="label-49" id="label-49">SuperClass:</a></h4><!-- RDLabel: "SuperClass:" -->
|
253
|
+
<ul>
|
254
|
+
<li>Object</li>
|
255
|
+
</ul>
|
256
|
+
<h4><a name="label-50" id="label-50">Class Methods:</a></h4><!-- RDLabel: "Class Methods:" -->
|
257
|
+
<dl>
|
258
|
+
<dt><a name="label-51" id="label-51"><code>XMLScan::XMLScanner.new(<var>visitor</var>[, <var>option</var> ...])</code></a></dt><!-- RDLabel: "XMLScan::XMLScanner.new" -->
|
259
|
+
<dd>
|
260
|
+
<p>Creates an instance. <var>visitor</var> is an instance of
|
261
|
+
<a href="#label-15">XMLScan::Visitor</a> and receives the result of parsing
|
262
|
+
from the XMLScan::XMLScanner object.</p>
|
263
|
+
<p>You can specify one or more <var>option</var> arguments, each as a string or symbol.
|
264
|
+
The options of XMLScan::XMLScanner are as follows:</p>
|
265
|
+
<dl>
|
266
|
+
<dt><a name="label-52" id="label-52">'strict_char'</a></dt><!-- RDLabel: "'strict_char'" -->
|
267
|
+
<dd>
|
268
|
+
This option is enabled after
|
269
|
+
<code>require 'xmlscan/xmlchar'</code>.
|
270
|
+
XMLScan::XMLScanner checks whether an XML document includes
|
271
|
+
an illegal character. The performance decreases sharply.
|
272
|
+
</dd>
|
273
|
+
</dl></dd>
|
274
|
+
</dl>
|
275
|
+
<h4><a name="label-53" id="label-53">Methods:</a></h4><!-- RDLabel: "Methods:" -->
|
276
|
+
<dl>
|
277
|
+
<dt><a name="label-54" id="label-54"><code>XMLScan::XMLScanner#kcode= <var>arg</var></code></a></dt><!-- RDLabel: "XMLScan::XMLScanner#kcode= arg" -->
|
278
|
+
<dd>
|
279
|
+
Sets the CES. Available values for <var>arg</var> are the same as for $KCODE,
|
280
|
+
except for nil. If <var>arg</var> is nil, $KCODE decides the CES.</dd>
|
281
|
+
<dt><a name="label-55" id="label-55"><code>XMLScan::XMLScanner#kcode</code></a></dt><!-- RDLabel: "XMLScan::XMLScanner#kcode" -->
|
282
|
+
<dd>
|
283
|
+
Returns the CES. The format of the return value is the same as that of
|
284
|
+
Regexp#kcode. A return value of nil means that
|
285
|
+
$KCODE decides the CES.</dd>
|
286
|
+
<dt><a name="label-56" id="label-56"><code>XMLScan::XMLScanner#parse(<var>source</var>)</code></a></dt><!-- RDLabel: "XMLScan::XMLScanner#parse" -->
|
287
|
+
<dd>
|
288
|
+
Parses <var>source</var> as an XML document. <var>source</var> must be
|
289
|
+
a string, an array of strings, or an object which responds to
|
290
|
+
a gets method that behaves the same as IO#gets.</dd>
|
291
|
+
</dl>
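The three kinds of source that parse accepts can be pictured with a small normalising helper. This is only an illustration: chunks is a hypothetical name, it is not how xmlscan reads its input, and the sample document is made up. It shows that a string, an array of strings, and a gets-style object can all be reduced to the same sequence of text pieces.

```ruby
require "stringio"

# Hypothetical helper (not part of xmlscan): returns the pieces of text that
# each accepted kind of source would supply, in order.
def chunks(source)
  case source
  when String
    [source]
  when Array
    source
  else
    unless source.respond_to?(:gets)
      raise ArgumentError, "expected a String, an Array of Strings, or an object with gets"
    end
    pieces = []
    # IO#gets returns nil at end of input, which ends the loop.
    while (line = source.gets)
      pieces << line
    end
    pieces
  end
end

doc = "<a>\n<b/>\n</a>\n"

from_string = chunks(doc).join
from_array  = chunks(doc.lines).join
from_io     = chunks(StringIO.new(doc)).join
```

All three results are the same text, so a caller can pick whichever form is most convenient, for example an open File for large documents.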
|
292
|
+
<h3><a name="label-57" id="label-57">XMLScan::XMLParser</a></h3><!-- RDLabel: "XMLScan::XMLParser" -->
|
293
|
+
<p>The non-validating XML parser.</p>
|
294
|
+
<p>The conformance of XMLScan::XMLParser to the specification
|
295
|
+
is described in another document.</p>
|
296
|
+
<h4><a name="label-58" id="label-58">SuperClass:</a></h4><!-- RDLabel: "SuperClass:" -->
|
297
|
+
<ul>
|
298
|
+
<li><a href="#label-48">XMLScan::XMLScanner</a></li>
|
299
|
+
</ul>
|
300
|
+
<h4><a name="label-59" id="label-59">Class Methods:</a></h4><!-- RDLabel: "Class Methods:" -->
|
301
|
+
<dl>
|
302
|
+
<dt><a name="label-60" id="label-60"><code>XMLScan::XMLParser.new(<var>visitor</var>[, <var>option</var> ...])</code></a></dt><!-- RDLabel: "XMLScan::XMLParser.new" -->
|
303
|
+
<dd>
|
304
|
+
<p>XMLScan::XMLParser guarantees the following for each
|
305
|
+
method of <var>visitor</var>:</p>
|
306
|
+
<dl>
|
307
|
+
<dt><a name="label-61" id="label-61"><a href="#label-38">XMLScan::Visitor#on_stag</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_stag" -->
|
308
|
+
<dd>
|
309
|
+
After calling this method, XMLScan::XMLParser always calls the
|
310
|
+
corresponding <a href="#label-47">XMLScan::Visitor#on_etag</a> method.
|
311
|
+
</dd>
|
312
|
+
</dl>
|
313
|
+
<p>In addition, if you never attempt error recovery, method calls
|
314
|
+
which must not occur in a well-formed XML document are
|
315
|
+
all suppressed.</p></dd>
|
316
|
+
</dl>
|
317
|
+
<h3><a name="label-62" id="label-62">XMLScan::HTMLScanner</a></h3><!-- RDLabel: "XMLScan::HTMLScanner" -->
|
318
|
+
<p>An HTML parser based on <a href="#label-48">XMLScan::XMLScanner</a>.</p>
|
319
|
+
<p>The conformance of XMLScan::HTMLScanner to the specification
|
320
|
+
is described in another document.</p>
|
321
|
+
<h4><a name="label-63" id="label-63">SuperClass:</a></h4><!-- RDLabel: "SuperClass:" -->
|
322
|
+
<ul>
|
323
|
+
<li><a href="#label-48">XMLScan::XMLScanner</a></li>
|
324
|
+
</ul>
|
325
|
+
<h4><a name="label-64" id="label-64">Class Methods:</a></h4><!-- RDLabel: "Class Methods:" -->
|
326
|
+
<dl>
|
327
|
+
<dt><a name="label-65" id="label-65"><code>XMLScan::HTMLScanner.new(<var>visitor</var>[, <var>option</var> ...])</code></a></dt><!-- RDLabel: "XMLScan::HTMLScanner.new" -->
|
328
|
+
<dd>
|
329
|
+
XMLScan::HTMLScanner guarantees the following for each
|
330
|
+
method of <var>visitor</var>:
|
331
|
+
<dl>
|
332
|
+
<dt><a name="label-66" id="label-66"><a href="#label-23">XMLScan::Visitor#on_xmldecl</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl" -->
|
333
|
+
<dt><a name="label-67" id="label-67"><a href="#label-24">XMLScan::Visitor#on_xmldecl_version</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_version" -->
|
334
|
+
<dt><a name="label-68" id="label-68"><a href="#label-25">XMLScan::Visitor#on_xmldecl_encoding</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_encoding" -->
|
335
|
+
<dt><a name="label-69" id="label-69"><a href="#label-26">XMLScan::Visitor#on_xmldecl_standalone</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_standalone" -->
|
336
|
+
<dt><a name="label-70" id="label-70"><a href="#label-28">XMLScan::Visitor#on_xmldecl_end</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_xmldecl_end" -->
|
337
|
+
<dd>
|
338
|
+
An XML declaration never appears in an HTML document,
|
339
|
+
so XMLScan::HTMLScanner never calls these methods.
|
340
|
+
</dd>
|
341
|
+
<dt><a name="label-71" id="label-71"><a href="#label-45">XMLScan::Visitor#on_stag_end_empty</a></a></dt><!-- RDLabel: "XMLScan::Visitor#on_stag_end_empty" -->
|
342
|
+
<dd>
|
343
|
+
An empty element tag never appears in an HTML document,
|
344
|
+
so XMLScan::HTMLScanner never calls this method.
|
345
|
+
An empty element tag causes a parse error.
|
346
|
+
</dd>
|
347
|
+
<dt><a name="label-72" id="label-72"><a href="#label-18">XMLScan::Visitor#wellformed_error</a></a></dt><!-- RDLabel: "XMLScan::Visitor#wellformed_error" -->
|
348
|
+
<dd>
|
349
|
+
There is no well-formedness constraint for HTML,
|
350
|
+
so XMLScan::HTMLScanner never calls this method.
|
351
|
+
</dd>
|
352
|
+
</dl></dd>
|
353
|
+
</dl>
|
354
|
+
|
355
|
+
</body>
|
356
|
+
</html>
|
@@ -0,0 +1,402 @@
|
|
1
|
+
=begin
|
2
|
+
# $Id: manual.rd.src,v 1.1 2003/01/22 16:41:45 katsu Exp $
|
3
|
+
|
4
|
+
= xmlscan version 0.2 Reference Manual
|
5
|
+
|
6
|
+
This is a broken English version. If you find lexical or
|
7
|
+
grammatical mistakes, or strange expressions (including kidding,
|
8
|
+
unnatural or unclear ones) in this document, please
|
9
|
+
((<let me know|URL:mailto:katsu@blue.sky.or.jp>)).
|
10
|
+
|
11
|
+
== Abstract
|
12
|
+
|
13
|
+
XMLscan is a non-validating XML parser written in 100%
|
14
|
+
pure Ruby.
|
15
|
+
|
16
|
+
XMLscan's features are as follows:
|
17
|
+
|
18
|
+
: 100% pure Ruby
|
19
|
+
XMLscan doesn't require any extension libraries, so
|
20
|
+
it works with nothing more than a Ruby interpreter, version
|
21
|
+
1.6 or above.
|
22
|
+
(It also needs no standard-bundled extension library.)
|
23
|
+
|
24
|
+
: Compliant with the specification
|
25
|
+
XMLscan has been developed to satisfy all conditions,
|
26
|
+
described in the XML 1.0 Specification and required of a
|
27
|
+
non-validating XML processor.
|
28
|
+
|
29
|
+
: High-speed
|
30
|
+
XMLscan is, probably, the fastest parser among all
|
31
|
+
existing XML/HTML parsers written in pure Ruby.
|
32
|
+
|
33
|
+
: Support for various CESs (character encoding schemes)
|
34
|
+
XMLscan can parse an XML document encoded in at least
|
35
|
+
iso-8859-*, EUC-*, Shift_JIS, and UTF-8 as it is.
|
36
|
+
UTF-16 is not supported directly, though.
|
37
|
+
|
38
|
+
: Just parsing
|
39
|
+
The role of xmlscan is just to parse an XML document.
|
40
|
+
XMLscan doesn't provide high-level features to easily
|
41
|
+
handle an XML document. XMLscan is assumed to be used as
|
42
|
+
a core part of a library providing such features.
|
43
|
+
|
44
|
+
: HTML
|
45
|
+
XMLscan contains htmlscan, an HTML parser.
|
46
|
+
|
47
|
+
|
48
|
+
== Character encodings
|
49
|
+
|
50
|
+
By default, the value of global variable $KCODE decides
|
51
|
+
which CES (character encoding scheme) is assumed for xmlscan
|
52
|
+
to parse an XML document.
|
53
|
+
You need to set $KCODE or ((<XMLScan::XMLScanner#kcode=>))
|
54
|
+
to an appropriate value to parse an XML document encoded in EUC-*,
|
55
|
+
Shift_JIS, or UTF-8.
|
56
|
+
|
57
|
+
UTF-16 is not supported directly. You should convert it into
|
58
|
+
UTF-8 before parsing.
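The conversion step can be sketched as follows. This is only an illustration and assumes Ruby 1.9 or later, where String#encode exists; that method is not available on the Ruby 1.6 interpreter this manual targets. The sample document and the variable names are made up for the example.

```ruby
# Assumption: Ruby 1.9 or later (String#encode). Names here are illustrative.
# Build a small UTF-16LE document in memory to stand in for real input.
utf16_doc = '<?xml version="1.0"?><greeting>hello</greeting>'.encode('UTF-16LE')

# Transcode to UTF-8 before handing the text to a parser.
utf8_doc = utf16_doc.encode('UTF-8')

puts utf8_doc.encoding
puts utf8_doc
```

The resulting string can then be passed to the parser like any other UTF-8 source.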
|
59
|
+
|
60
|
+
|
61
|
+
== XML Namespaces
|
62
|
+
|
63
|
+
XML Namespaces have already been implemented in
|
64
|
+
xmlscan/namespace.rb. However, since its interface is going
|
65
|
+
to be modified, this feature is currently undocumented.
|
66
|
+
|
67
|
+
|
68
|
+
|
69
|
+
== Class Reference
|
70
|
+
|
71
|
+
|
72
|
+
=== XMLScan::Error
|
73
|
+
|
74
|
+
The superclass for all exceptions related to xmlscan.
|
75
|
+
|
76
|
+
These exceptions are raised by XMLScan::Visitor
|
77
|
+
by default when it receives an error report from a parser,
|
78
|
+
such as XMLScan::XMLScanner or XMLScan::XMLParser.
|
79
|
+
Each parser never raises these exceptions by itself.
|
80
|
+
|
81
|
+
#The following exceptions are defined in xmlscan/scanner.rb:
|
82
|
+
|
83
|
+
: XMLScan::ParseError
|
84
|
+
|
85
|
+
An error other than a constraint violation, for example,
|
86
|
+
when an XML document does not match a production.
|
87
|
+
|
88
|
+
: XMLScan::NotWellFormedError
|
89
|
+
|
90
|
+
Raised when an XML document violates a well-formedness
|
91
|
+
constraint.
|
92
|
+
|
93
|
+
: XMLScan::NotValidError
|
94
|
+
|
95
|
+
Raised when an XML document violates a validity constraint.
|
96
|
+
|
97
|
+
|
98
|
+
=== XMLScan::Visitor
|
99
|
+
|
100
|
+
Mix-in for receiving the result of parsing an XML document.
|
101
|
+
|
102
|
+
Each parser included in xmlscan parses an XML document from
|
103
|
+
the beginning, and calls the corresponding method of the given instance of
|
104
|
+
XMLScan::Visitor for each syntactic element, such as a tag.
|
105
|
+
These calls are guaranteed to occur in order of appearance
|
106
|
+
in the document from the beginning.
|
107
|
+
|
108
|
+
==== Methods:
|
109
|
+
|
110
|
+
Unless otherwise noted, the following methods do nothing by
|
111
|
+
default.
|
112
|
+
|
113
|
+
--- XMLScan::Visitor#parse_error(msg)
|
114
|
+
|
115
|
+
Called when the parser meets an error other than a constraint
|
116
|
+
violation, for example, when an XML document does not match
|
117
|
+
a production. By default, this method raises
|
118
|
+
((<XMLScan::ParseError>)) exception. If no exception is
|
119
|
+
raised and this method returns normally, the parser recovers
|
120
|
+
from the error and continues to parse.
|
121
|
+
|
122
|
+
--- XMLScan::Visitor#wellformed_error(msg)
|
123
|
+
|
124
|
+
Called when the parser meets a well-formedness constraint
|
125
|
+
violation. By default, this method raises
|
126
|
+
((<XMLScan::NotWellFormedError>)) exception. If no exception
|
127
|
+
is raised and this method returns normally, the parser recovers
|
128
|
+
from the error and continues to parse.
|
129
|
+
|
130
|
+
--- XMLScan::Visitor#valid_error(msg)
|
131
|
+
|
132
|
+
Called when the parser meets a validity constraint
|
133
|
+
violation. By default, this method raises
|
134
|
+
((<XMLScan::NotValidError>)) exception. If no exception
|
135
|
+
is raised and this method returns normally, the parser recovers
|
136
|
+
from the error and continues to parse.
|
137
|
+
|
138
|
+
Note that the current version of xmlscan includes no validating XML
|
139
|
+
processor. This method is reserved for future versions.
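The recovery behaviour described for these error methods can be sketched without loading xmlscan. Everything below is a stand-in written for illustration: the SketchXMLScan module, CollectingVisitor, and report_errors are hypothetical names, and the driver only imitates a parser reporting two errors. It copies one documented fact: the default parse_error raises, while returning normally lets parsing continue.

```ruby
# Stand-ins for illustration only; the real definitions live in xmlscan.
module SketchXMLScan
  class Error < StandardError; end
  class ParseError < Error; end

  module Visitor
    # Mirrors the documented default: raise on a parse error.
    def parse_error(msg)
      raise ParseError, msg
    end
  end
end

# A visitor that records errors and returns normally, which the manual
# describes as the signal for the parser to recover and continue.
class CollectingVisitor
  include SketchXMLScan::Visitor

  attr_reader :errors

  def initialize
    @errors = []
  end

  def parse_error(msg)
    @errors << msg
  end
end

# Hypothetical driver standing in for a parser that reports two errors.
def report_errors(visitor)
  visitor.parse_error('first problem')
  visitor.parse_error('second problem')
  :finished
end

strict = Object.new.extend(SketchXMLScan::Visitor)
strict_outcome =
  begin
    report_errors(strict)
  rescue SketchXMLScan::ParseError => e
    "stopped: #{e.message}"
  end

lenient = CollectingVisitor.new
lenient_outcome = report_errors(lenient)
```

With the default behaviour the driver stops at the first report; with the overriding visitor it runs to the end and both messages are kept for later inspection.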
|
140
|
+
|
141
|
+
--- XMLScan::Visitor#warning(msg)
|
142
|
+
|
143
|
+
Called when the parser meets a construct that is not an error but is
|
144
|
+
discouraged, or a syntax which xmlscan is not able to parse.
|
145
|
+
|
146
|
+
--- XMLScan::Visitor#on_start_document
|
147
|
+
|
148
|
+
Called just before the parser starts parsing an XML document.
|
149
|
+
After this method is called, the corresponding
|
150
|
+
((<XMLScan::Visitor#on_end_document>)) method is always called.
|
151
|
+
|
152
|
+
--- XMLScan::Visitor#on_end_document
|
153
|
+
|
154
|
+
Called after the parser reaches the end of an XML document.
|
155
|
+
|
156
|
+
--- XMLScan::Visitor#on_xmldecl
|
157
|
+
--- XMLScan::Visitor#on_xmldecl_version(str)
|
158
|
+
--- XMLScan::Visitor#on_xmldecl_encoding(str)
|
159
|
+
--- XMLScan::Visitor#on_xmldecl_standalone(str)
|
160
|
+
--- XMLScan::Visitor#on_xmldecl_other(name, value)
|
161
|
+
--- XMLScan::Visitor#on_xmldecl_end
|
162
|
+
|
163
|
+
Called when the parser meets an XML declaration.
|
164
|
+
|
165
|
+
<?xml version="1.0" encoding="euc-jp" standalone="yes" ?>
|
166
|
+
^     ^             ^                 ^                ^
|
167
|
+
1     2             3                 4                5
|
168
|
+
|
169
|
+
method argument
|
170
|
+
--------------------------------------
|
171
|
+
1: on_xmldecl
|
172
|
+
2: on_xmldecl_version ("1.0")
|
173
|
+
3: on_xmldecl_encoding ("euc-jp")
|
174
|
+
4: on_xmldecl_standalone ("yes")
|
175
|
+
5: on_xmldecl_end
|
176
|
+
|
177
|
+
When an XML declaration is found, both on_xmldecl and
|
178
|
+
on_xmldecl_end methods are always called. The other methods
|
179
|
+
are called only when the corresponding declarations are found.
|
180
|
+
|
181
|
+
When a declaration other than version, encoding, or standalone
|
182
|
+
is found in an XML declaration, the on_xmldecl_other method is
|
183
|
+
called. Since such a declaration is not permitted, note that
|
184
|
+
the parser always calls ((<XMLScan::Visitor#parse_error>)) method
|
185
|
+
before calling on_xmldecl_other method.
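As a rough illustration of that call sequence, the sketch below records an XML declaration into a hash. It does not load xmlscan: XmlDeclRecorder is a hypothetical class, and the callbacks are replayed by hand in the documented order for a declaration that has no standalone part.

```ruby
# Hypothetical visitor-like class; only the callback names come from the manual.
class XmlDeclRecorder
  attr_reader :declaration

  def on_xmldecl
    @current = {}              # a new declaration starts
  end

  def on_xmldecl_version(str)
    @current[:version] = str
  end

  def on_xmldecl_encoding(str)
    @current[:encoding] = str
  end

  def on_xmldecl_standalone(str)
    @current[:standalone] = str
  end

  def on_xmldecl_end
    @declaration = @current    # the declaration is complete
  end
end

recorder = XmlDeclRecorder.new

# Calls replayed by hand for: <?xml version="1.0" encoding="euc-jp" ?>
# on_xmldecl_standalone is skipped because that part is absent.
recorder.on_xmldecl
recorder.on_xmldecl_version('1.0')
recorder.on_xmldecl_encoding('euc-jp')
recorder.on_xmldecl_end

p recorder.declaration
```

Because on_xmldecl and on_xmldecl_end bracket every declaration, they are natural places to reset and to publish the collected values.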
|
186
|
+
|
187
|
+
--- XMLScan::Visitor#on_doctype(root, pubid, sysid)
|
188
|
+
|
189
|
+
Called when the parser meets a document type declaration.
|
190
|
+
|
191
|
+
document argument
|
192
|
+
--------------------------------------------------------------
|
193
|
+
1: <!DOCTYPE foo> ('foo', nil, nil)
|
194
|
+
2: <!DOCTYPE foo SYSTEM "bar"> ('foo', nil, 'bar')
|
195
|
+
3: <!DOCTYPE foo PUBLIC "bar"> ('foo', 'bar', nil)
|
196
|
+
4: <!DOCTYPE foo PUBLIC "bar" "baz"> ('foo', 'bar', 'baz')
|
197
|
+
|
198
|
+
--- XMLScan::Visitor#on_prolog_space(str)
|
199
|
+
|
200
|
+
Called when the parser meets whitespace in the prolog.
|
201
|
+
|
202
|
+
--- XMLScan::Visitor#on_comment(str)
|
203
|
+
|
204
|
+
Called when the parser meets a comment.
|
205
|
+
|
206
|
+
--- XMLScan::Visitor#on_pi(target, pi)
|
207
|
+
|
208
|
+
Called when the parser meets a processing instruction.
|
209
|
+
|
210
|
+
--- XMLScan::Visitor#on_chardata(str)
|
211
|
+
|
212
|
+
Called when the parser meets character data.
|
213
|
+
|
214
|
+
--- XMLScan::Visitor#on_cdata(str)
|
215
|
+
|
216
|
+
Called when the parser meets a CDATA section.
|
217
|
+
|
218
|
+
--- XMLScan::Visitor#on_entityref(ref)
|
219
|
+
|
220
|
+
Called when the parser meets a general entity reference
|
221
|
+
anywhere other than in an attribute value.
|
222
|
+
|
223
|
+
--- XMLScan::Visitor#on_charref(code)
|
224
|
+
--- XMLScan::Visitor#on_charref_hex(code)
|
225
|
+
|
226
|
+
Called when the parser meets a character reference
|
227
|
+
anywhere other than in an attribute value.
|
228
|
+
When the character code is written in decimal,
|
229
|
+
on_charref is called. When it is in hexadecimal, on_charref_hex
|
230
|
+
is called. ((|code|)) is an integer.
|
231
|
+
|
232
|
+
--- XMLScan::Visitor#on_stag(name)
|
233
|
+
--- XMLScan::Visitor#on_attribute(name)
|
234
|
+
--- XMLScan::Visitor#on_attr_value(str)
|
235
|
+
--- XMLScan::Visitor#on_attr_entityref(ref)
|
236
|
+
--- XMLScan::Visitor#on_attr_charref(code)
|
237
|
+
--- XMLScan::Visitor#on_attr_charref_hex(code)
|
238
|
+
--- XMLScan::Visitor#on_attribute_end(name)
|
239
|
+
--- XMLScan::Visitor#on_stag_end_empty(name)
|
240
|
+
--- XMLScan::Visitor#on_stag_end(name)
|
241
|
+
|
242
|
+
Called when the parser meets a start tag.
|
243
|
+
|
244
|
+
<hoge fuga="foo&bar;&#38;&#x26;baz" >
|
245
|
+
^     ^     ^  ^    ^    ^     ^  ^ ^
|
246
|
+
1     2     3  4    5    6     7  8 9
|
247
|
+
|
248
|
+
method argument
|
249
|
+
------------------------------------
|
250
|
+
1: on_stag ('hoge')
|
251
|
+
2: on_attribute ('fuga')
|
252
|
+
3: on_attr_value ('foo')
|
253
|
+
4: on_attr_entityref ('bar')
|
254
|
+
5: on_attr_charref (38)
|
255
|
+
6: on_attr_charref_hex (38)
|
256
|
+
7: on_attr_value ('baz')
|
257
|
+
8: on_attribute_end ('fuga')
|
258
|
+
9: on_stag_end ('hoge')
|
259
|
+
or
|
260
|
+
on_stag_end_empty ('hoge')
|
261
|
+
|
262
|
+
When a start tag is found, on_stag and the corresponding
|
263
|
+
on_stag_end or on_stag_end_empty method are always
|
264
|
+
called. Any other methods are called only when at least one
|
265
|
+
attribute is found in the start tag.
|
266
|
+
|
267
|
+
When an attribute is found, both on_attribute and
|
268
|
+
on_attribute_end methods are always called. If the attribute
|
269
|
+
value is empty, only these two methods are called.
|
270
|
+
|
271
|
+
When the parser meets a general entity reference in an
|
272
|
+
attribute value, it calls on_attr_entityref method.
|
273
|
+
When the parser meets a character reference in an attribute
|
274
|
+
value, it calls either the on_attr_charref or the on_attr_charref_hex method.
|
275
|
+
|
276
|
+
If the tag is an empty element tag, on_stag_end_empty method
|
277
|
+
is called instead of on_stag_end method.
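One way to consume these callbacks is to rebuild each attribute value from its pieces. The sketch below is illustrative only and does not load xmlscan. AttributeCollector is a hypothetical class, and the calls are replayed by hand in the order of the table above. Leaving entity references unresolved and turning character references into characters with Array#pack('U') are choices made for this example, not behaviour of the library.

```ruby
# Hypothetical visitor-like class; only the callback names come from the manual.
class AttributeCollector
  attr_reader :tags

  def initialize
    @tags = []
  end

  def on_stag(name)
    @attrs = {}
  end

  def on_attribute(name)
    @parts = []                       # pieces of the current attribute value
  end

  def on_attr_value(str)
    @parts << str
  end

  def on_attr_entityref(ref)
    @parts << "&#{ref};"              # kept unresolved in this sketch
  end

  def on_attr_charref(code)
    @parts << [code].pack('U')        # code point to UTF-8 character
  end
  alias on_attr_charref_hex on_attr_charref

  def on_attribute_end(name)
    @attrs[name] = @parts.join
  end

  def on_stag_end(name)
    @tags << [name, @attrs]
  end
  alias on_stag_end_empty on_stag_end
end

collector = AttributeCollector.new

# Calls replayed by hand in the order listed in the table above.
collector.on_stag('hoge')
collector.on_attribute('fuga')
collector.on_attr_value('foo')
collector.on_attr_entityref('bar')
collector.on_attr_charref(38)
collector.on_attr_charref_hex(38)
collector.on_attr_value('baz')
collector.on_attribute_end('fuga')
collector.on_stag_end('hoge')

p collector.tags
```

Both character references in the example denote code point 38, the ampersand, so the rebuilt value contains two literal ampersands after the unresolved entity reference.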
|
278
|
+
|
279
|
+
--- XMLScan::Visitor#on_etag(name)
|
280
|
+
|
281
|
+
Called when the parser meets an end tag.
|
282
|
+
|
283
|
+
|
284
|
+
|
285
|
+
=== XMLScan::XMLScanner
|
286
|
+
|
287
|
+
The scanner which tokenizes an XML document and recognizes tags,
|
288
|
+
and so on.
|
289
|
+
|
290
|
+
The conformance of XMLScan::XMLScanner to the specification
|
291
|
+
is described in another document.
|
292
|
+
|
293
|
+
==== SuperClass:
|
294
|
+
|
295
|
+
* Object
|
296
|
+
|
297
|
+
==== Class Methods:
|
298
|
+
|
299
|
+
--- XMLScan::XMLScanner.new(visitor[, option ...])
|
300
|
+
|
301
|
+
Creates an instance. ((|visitor|)) is an instance of
|
302
|
+
((<XMLScan::Visitor>)) and receives the result of parsing
|
303
|
+
from the XMLScan::XMLScanner object.
|
304
|
+
|
305
|
+
You can specify one or more ((|option|)) values as strings or symbols.
|
306
|
+
XMLScan::XMLScanner's options are as follows:
|
307
|
+
|
308
|
+
: 'strict_char'
|
309
|
+
|
310
|
+
This option becomes available after
|
311
|
+
(({require 'xmlscan/xmlchar'})).
|
312
|
+
With this option, XMLScan::XMLScanner checks whether an XML document includes
|
313
|
+
an illegal character. Performance decreases sharply when it is used.
|
314
|
+
|
315
|
+
==== Methods:
|
316
|
+
|
317
|
+
--- XMLScan::XMLScanner#kcode= arg
|
318
|
+
|
319
|
+
Sets the CES. Available values for ((|arg|)) are the same as for $KCODE
|
320
|
+
except nil. If ((|arg|)) is nil, $KCODE decides the CES.
|
321
|
+
|
322
|
+
--- XMLScan::XMLScanner#kcode
|
323
|
+
|
324
|
+
Returns the CES. The format of the return value is the same as that of
|
325
|
+
Regexp#kcode. If this method returns nil, it means that
|
326
|
+
$KCODE decides the CES.
|
327
|
+
|
328
|
+
--- XMLScan::XMLScanner#parse(source)
|
329
|
+
|
330
|
+
Parses ((|source|)) as an XML document. ((|source|)) must be
|
331
|
+
a string, an array of strings, or an object which responds to
|
332
|
+
a gets method that behaves the same as IO#gets.
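To make the three accepted kinds of source concrete, the sketch below flattens each of them into a list of chunks. The helper chunks_of is hypothetical and is not how xmlscan reads its input; it only shows that a string, an array of strings, and an object with an IO-like gets method can carry the same document.

```ruby
require 'stringio'

# Hypothetical helper, not part of xmlscan: flatten any accepted kind of
# source into an array of string chunks.
def chunks_of(source)
  case source
  when String then [source]
  when Array  then source
  else
    chunks = []
    # gets returns nil at end of input, like IO#gets.
    while (line = source.gets)
      chunks << line
    end
    chunks
  end
end

doc = "<a>\n<b/>\n</a>\n"

from_string = chunks_of(doc).join
from_array  = chunks_of(doc.lines.to_a).join
from_io     = chunks_of(StringIO.new(doc)).join
```

All three variables end up holding the same text, so the choice between them is a matter of where the document comes from.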
|
333
|
+
|
334
|
+
|
335
|
+
=== XMLScan::XMLParser
|
336
|
+
|
337
|
+
The non-validating XML parser.
|
338
|
+
|
339
|
+
The conformance of XMLScan::XMLParser to the specification
|
340
|
+
is described in another document.
|
341
|
+
|
342
|
+
|
343
|
+
==== SuperClass:
|
344
|
+
|
345
|
+
* ((<XMLScan::XMLScanner>))
|
346
|
+
|
347
|
+
==== Class Methods:
|
348
|
+
|
349
|
+
--- XMLScan::XMLParser.new(visitor[, option ...])
|
350
|
+
|
351
|
+
XMLScan::XMLParser guarantees the following for each
|
352
|
+
method of ((|visitor|)):
|
353
|
+
|
354
|
+
: ((<XMLScan::Visitor#on_stag>))
|
355
|
+
|
356
|
+
After calling this method, XMLScan::XMLParser always calls the
|
357
|
+
corresponding ((<XMLScan::Visitor#on_etag>)) method.
|
358
|
+
|
359
|
+
In addition, if you do not attempt error recovery, method calls
|
360
|
+
which must not occur in a well-formed XML document are
|
361
|
+
all suppressed.
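The pairing of on_stag and on_etag is what allows a visitor to track nesting with a plain stack. The sketch below is a replay-only illustration that does not load xmlscan. OutlineBuilder is a hypothetical class, and the calls are issued by hand for a small document.

```ruby
# Hypothetical visitor-like class; only the callback names come from the manual.
class OutlineBuilder
  attr_reader :outline

  def initialize
    @stack = []      # names of the elements currently open
    @outline = []    # one "path: text" entry per chunk of character data
  end

  def on_stag(name)
    @stack.push(name)
  end

  def on_chardata(str)
    @outline << "#{@stack.join('/')}: #{str}"
  end

  def on_etag(name)
    @stack.pop
  end

  # True when every opened element has been closed again.
  def balanced?
    @stack.empty?
  end
end

builder = OutlineBuilder.new

# Calls replayed by hand for: <a><b>text</b></a>
builder.on_stag('a')
builder.on_stag('b')
builder.on_chardata('text')
builder.on_etag('b')
builder.on_etag('a')

p builder.outline
```

If the end-tag calls were not guaranteed, the stack could be left non-empty at the end of the document and the visitor would need its own repair logic.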
|
362
|
+
|
363
|
+
|
364
|
+
=== XMLScan::HTMLScanner
|
365
|
+
|
366
|
+
An HTML parser based on ((<XMLScan::XMLScanner>)).
|
367
|
+
|
368
|
+
The conformance of XMLScan::HTMLScanner to the specification
|
369
|
+
is described in another document.
|
370
|
+
|
371
|
+
==== SuperClass:
|
372
|
+
|
373
|
+
* ((<XMLScan::XMLScanner>))
|
374
|
+
|
375
|
+
==== Class Methods:
|
376
|
+
|
377
|
+
--- XMLScan::HTMLScanner.new(visitor[, option ...])
|
378
|
+
|
379
|
+
XMLScan::HTMLScanner guarantees the following for each
|
380
|
+
method of ((|visitor|)):
|
381
|
+
|
382
|
+
: ((<XMLScan::Visitor#on_xmldecl>))
|
383
|
+
: ((<XMLScan::Visitor#on_xmldecl_version>))
|
384
|
+
: ((<XMLScan::Visitor#on_xmldecl_encoding>))
|
385
|
+
: ((<XMLScan::Visitor#on_xmldecl_standalone>))
|
386
|
+
: ((<XMLScan::Visitor#on_xmldecl_end>))
|
387
|
+
|
388
|
+
An XML declaration never appears in an HTML document,
|
389
|
+
so XMLScan::HTMLScanner never calls these methods.
|
390
|
+
|
391
|
+
: ((<XMLScan::Visitor#on_stag_end_empty>))
|
392
|
+
|
393
|
+
An empty element tag never appears in an HTML document,
|
394
|
+
so XMLScan::HTMLScanner never calls this method.
|
395
|
+
An empty element tag causes a parse error.
|
396
|
+
|
397
|
+
: ((<XMLScan::Visitor#wellformed_error>))
|
398
|
+
|
399
|
+
There is no well-formedness constraint for HTML,
|
400
|
+
so XMLScan::HTMLScanner never calls this method.
|
401
|
+
|
402
|
+
=end
|