xmlscan 0.3.0prec → 0.3.0
- data/README.rdoc +59 -2
- data/VERSION +1 -1
- metadata +14 -14
    
        data/README.rdoc
    CHANGED
    
    | @@ -44,6 +44,20 @@ a core part of a library providing such features. | |
| 44 44 |  | 
| 45 45 | 
             
            XMLscan contains htmlscan, an HTML parser.
         | 
| 46 46 |  | 
| 47 | 
            +
            == New Scanner Translator Feature with 0.3.0
         | 
| 48 | 
            +
             | 
| 49 | 
            +
            Created for a use case in Wagn (http://wagn.org)
         | 
| 50 | 
            +
             | 
| 51 | 
            +
            The following fragment is pretty much all of the parsing code used in Wagn:
         | 
| 52 | 
            +
             | 
| 53 | 
            +
               pairs = XMLScan::XMLProcessor.process(io, {:key=>:name, :element=>:card,
         | 
| 54 | 
            +
                                 :substitute=>":transclude|{{:name}}", :extras=>[:type]})
         | 
| 55 | 
            +
             | 
| 56 | 
            +
            The 'io' object is passed directly from request.body in an XML POST action.
         | 
| 57 | 
            +
             | 
| 58 | 
            +
            ref: https://github.com/GerryG/wagn/commit/46e57dfd88cde45d33fe7f167d5803ab819ceef3#commitcomment-1012403
         | 
| 59 | 
            +
             | 
| 60 | 
            +
            See the test cases in test/integration for some examples.
         | 
| 47 61 |  | 
| 48 62 | 
             
            === Character encodings
         | 
| 49 63 |  | 
| @@ -58,6 +72,9 @@ Shift_JIS, or UTF-8. | |
| 58 72 | 
             
            UTF-16 is not supported directly. You should convert it into
         | 
| 59 73 | 
             
            UTF-8 before parsing.
         | 
| 60 74 |  | 
| 75 | 
            +
            Character encoding needs more work with 0.3.0+.  The character maps were
         | 
| 76 | 
            +
            taken out in favor of \p{} regexp classes in Ruby, but this still needs attention.
         | 
| 77 | 
            +
             | 
| 61 78 | 
             
            === XML Namespaces
         | 
| 62 79 |  | 
| 63 80 | 
             
            XML Namespaces have been already implemented in
         | 
| @@ -89,13 +106,21 @@ Raised when an XML document violates an validity constraint. | |
| 89 106 |  | 
| 90 107 |  | 
| 91 108 | 
             
            XMLScan::Visitor
         | 
| 109 | 
            +
             | 
| 92 110 | 
             
            Mix-in for receiving the result of parsing an XML document.
         | 
| 93 111 | 
             
            Each parser included in xmlscan parses an XML document from
         | 
| 94 112 | 
             
            the beginning, and calls each specific method of given instance of
         | 
| 95 113 | 
             
            XMLScan::Visitor for each syntactic element, such as a tag.
         | 
| 96 114 | 
             
            It is guaranteed that these calls are made in order of appearance
         | 
| 97 115 | 
             
            in the document from the beginning.
         | 
| 116 | 
            +
             | 
| 117 | 
            +
            Changes: The scanner/parser now sends some additional arguments with
         | 
| 118 | 
            +
            each event, primarily the original string that was parsed for the event, but
         | 
| 119 | 
            +
            also a hash of attributes for stag_end.  This makes scanning for
         | 
| 120 | 
            +
            replacement of some elements very simple.
         | 
| 121 | 
            +
             | 
| 98 122 | 
             
            Methods:
         | 
| 123 | 
            +
             | 
| 99 124 | 
             
            Without special notice, the following methods do nothing by
         | 
| 100 125 | 
             
            default.
         | 
| 101 126 |  | 
| @@ -107,6 +132,7 @@ a production. By default, this method raises | |
| 107 132 | 
             
            XMLScan::ParseError exception. If no exception is
         | 
| 108 133 | 
             
            raised and this method returns normally, the parser recovers
         | 
| 109 134 | 
             
            the error and continues to parse.
         | 
| 135 | 
            +
             | 
| 110 136 | 
             
            XMLScan::Visitor#wellformed_error(msg)
         | 
| 111 137 |  | 
| 112 138 | 
             
            Called when the parser meets a well-formedness constraint
         | 
| @@ -114,6 +140,7 @@ violation. By default, this method raises | |
| 114 140 | 
             
            XMLScan::NotWellFormedError exception. If no exception
         | 
| 115 141 | 
             
            is raised and this method returns normally, the parser recovers
         | 
| 116 142 | 
             
            the error and continues to parse.
         | 
| 143 | 
            +
             | 
| 117 144 | 
             
            XMLScan::Visitor#valid_error(msg)
         | 
| 118 145 |  | 
| 119 146 | 
             
            Called when the parser meets validity constraint
         | 
| @@ -123,18 +150,23 @@ is raised and this method returns normally, the parser recovers | |
| 123 150 | 
             
            the error and continues to parse.
         | 
| 124 151 | 
             
            FYI, the current version of xmlscan includes no validating XML
         | 
| 125 152 | 
             
            processor. This method is reserved for future versions.
         | 
| 153 | 
            +
             | 
| 126 154 | 
             
            XMLScan::Visitor#warning(msg)
         | 
| 127 155 |  | 
| 128 156 | 
             
            Called when the parser meets a non-error but unrecommended
         | 
| 129 157 | 
             
            thing or a syntax which xmlscan is not able to parse.
         | 
| 158 | 
            +
             | 
| 130 159 | 
             
            XMLScan::Visitor#on_start_document
         | 
| 131 160 |  | 
| 132 161 | 
             
            Called just before the parser starts parsing an XML document.
         | 
| 162 | 
            +
             | 
| 133 163 | 
             
            After this method is called, corresponding
         | 
| 134 164 | 
             
            XMLScan::Visitor#on_end_document method is always called.
         | 
| 165 | 
            +
             | 
| 135 166 | 
             
            XMLScan::Visitor#on_end_document
         | 
| 136 167 |  | 
| 137 168 | 
             
            Called after the parser reaches the end of an XML document.
         | 
| 169 | 
            +
             | 
| 138 170 | 
             
            XMLScan::Visitor#on_xmldecl
         | 
| 139 171 | 
             
            XMLScan::Visitor#on_xmldecl_version(str)
         | 
| 140 172 | 
             
            XMLScan::Visitor#on_xmldecl_encoding(str)
         | 
| @@ -154,50 +186,63 @@ Called when the parser meets an XML declaration. | |
| 154 186 | 
             
                      3: on_xmldecl_encoding    ("euc-jp")
         | 
| 155 187 | 
             
                      4: on_xmldecl_standalone  ("yes")
         | 
| 156 188 | 
             
                      5: on_xmldecl_end
         | 
| 189 | 
            +
             | 
| 157 190 | 
             
            When an XML declaration is found, both on_xmldecl and
         | 
| 158 191 | 
             
            on_xmldecl_end methods are always called. Any other methods
         | 
| 159 192 | 
             
            are called only when the corresponding syntaxes are found.
         | 
| 193 | 
            +
             | 
| 160 194 | 
             
            When a declaration except version, encoding, and standalone
         | 
| 161 195 | 
             
            is found in an XML declaration, on_xmldecl_other method is
         | 
| 162 196 | 
             
            called. Since such a declaration is not permitted, note that
         | 
| 163 197 | 
             
            the parser always calls XMLScan::Visitor#parse_error method
         | 
| 164 198 | 
             
            before calling on_xmldecl_other method.
         | 
| 199 | 
            +
             | 
| 165 200 | 
             
            XMLScan::Visitor#on_doctype(root, pubid, sysid)
         | 
| 166 201 |  | 
| 167 202 | 
             
            Called when the parser meets a document type declaration.
         | 
| 203 | 
            +
             | 
| 168 204 | 
             
            document                            argument
         | 
| 169 205 | 
             
            --------------------------------------------------------------
         | 
| 170 206 | 
             
             1: <!DOCTYPE foo>                      ('foo', nil,   nil)
         | 
| 171 207 | 
             
             2: <!DOCTYPE foo SYSTEM "bar">         ('foo', nil,   'bar')
         | 
| 172 208 | 
             
             3: <!DOCTYPE foo PUBLIC "bar">         ('foo', 'bar',  nil )
         | 
| 173 209 | 
             
             4: <!DOCTYPE foo PUBLIC "bar" "baz">   ('foo', 'bar', 'baz')
         | 
| 210 | 
            +
             | 
| 174 211 | 
             
            XMLScan::Visitor#on_prolog_space(str)
         | 
| 175 212 |  | 
| 176 213 | 
             
            Called when the parser meets whitespace in the prolog.
         | 
| 214 | 
            +
             | 
| 177 215 | 
             
            XMLScan::Visitor#on_comment(str)
         | 
| 178 216 |  | 
| 179 217 | 
             
            Called when the parser meets a comment.
         | 
| 218 | 
            +
             | 
| 180 219 | 
             
            XMLScan::Visitor#on_pi(target, pi)
         | 
| 181 220 |  | 
| 182 221 | 
             
            Called when the parser meets a processing instruction.
         | 
| 222 | 
            +
             | 
| 183 223 | 
             
            XMLScan::Visitor#on_chardata(str)
         | 
| 184 224 |  | 
| 185 225 | 
             
            Called when the parser meets character data.
         | 
| 226 | 
            +
             | 
| 186 227 | 
             
            XMLScan::Visitor#on_cdata(str)
         | 
| 187 228 |  | 
| 188 229 | 
             
            Called when the parser meets a CDATA section.
         | 
| 230 | 
            +
             | 
| 189 231 | 
             
            XMLScan::Visitor#on_entityref(ref)
         | 
| 190 232 |  | 
| 191 233 | 
             
            Called when the parser meets a general entity reference
         | 
| 192 234 | 
             
            in a place except an attribute value.
         | 
| 235 | 
            +
             | 
| 193 236 | 
             
            XMLScan::Visitor#on_charref(code)
         | 
| 194 237 | 
             
            XMLScan::Visitor#on_charref_hex(code)
         | 
| 195 238 |  | 
| 196 239 | 
             
            Called when the parser meets a character reference
         | 
| 197 240 | 
             
            in a place except an attribute value.
         | 
| 241 | 
            +
             | 
| 198 242 | 
             
            When the character code is represented by decimals,
         | 
| 199 243 | 
             
            on_charref is called. When by hexadecimals, on_charref_hex
         | 
| 200 244 | 
             
            is called. code is an integer.
         | 
| 245 | 
            +
             | 
| 201 246 | 
             
            XMLScan::Visitor#on_stag(name)
         | 
| 202 247 | 
             
            XMLScan::Visitor#on_attribute(name)
         | 
| 203 248 | 
             
            XMLScan::Visitor#on_attr_value(str)
         | 
| @@ -226,19 +271,24 @@ Called when the parser meets an XML declaration. | |
| 226 271 | 
             
              9: on_stag_end            ('hoge')
         | 
| 227 272 | 
             
                  or
         | 
| 228 273 | 
             
                 on_stag_end_empty      ('hoge')
         | 
| 274 | 
            +
             | 
| 229 275 | 
             
            When a start tag is found, both on_stag and corresponding
         | 
| 230 276 | 
             
            either on_stag_end or on_stag_end_empty method are always
         | 
| 231 277 | 
             
            called. Any other methods are called only when at least one
         | 
| 232 278 | 
             
            attribute is found in the start tag.
         | 
| 279 | 
            +
             | 
| 233 280 | 
             
            When an attribute is found, both on_attribute and
         | 
| 234 281 | 
             
            on_attribute_end methods are always called. If the attribute
         | 
| 235 282 | 
             
            value is empty, only these two methods are called.
         | 
| 283 | 
            +
             | 
| 236 284 | 
             
            When the parser meets a general entity reference in an
         | 
| 237 285 | 
             
            attribute value, it calls on_attr_entityref method.
         | 
| 286 | 
            +
             | 
| 238 287 | 
             
            When the parser meets a character reference in an attribute
         | 
| 239 288 | 
             
            value, it calls either on_charref or on_charref_hex method.
         | 
| 240 289 | 
             
            If the tag is an empty element tag, on_stag_end_empty method
         | 
| 241 290 | 
             
            is called instead of on_stag_end method.
         | 
| 291 | 
            +
             | 
| 242 292 | 
             
            XMLScan::Visitor#on_etag(name)
         | 
| 243 293 |  | 
| 244 294 | 
             
            Called when the parser meets an end tag.
         | 
| @@ -246,6 +296,7 @@ Called when the parser meets an end tag. | |
| 246 296 | 
             
            XMLScan::XMLScanner
         | 
| 247 297 | 
             
            The scanner which tokenizes an XML document and recognizes tags,
         | 
| 248 298 | 
             
            and so on.
         | 
| 299 | 
            +
             | 
| 249 300 | 
             
            The conformance of XMLScan::XMLScanner to the specification
         | 
| 250 301 | 
             
            is described in another document.
         | 
| 251 302 | 
             
            SuperClass:
         | 
| @@ -259,29 +310,33 @@ XMLScan::XMLScanner.new(visitor[, option ...]) | |
| 259 310 | 
             
            Creates an instance. visitor is an instance of
         | 
| 260 311 | 
             
            XMLScan::Visitor and receives the result of parsing
         | 
| 261 312 | 
             
            from the XMLScan::Scanner object.
         | 
| 313 | 
            +
             | 
| 262 314 | 
             
            You can specify one or more options as strings or symbols.
         | 
| 315 | 
            +
             | 
| 263 316 | 
             
            XMLScan::Scanner's options are as follows:
         | 
| 264 317 |  | 
| 265 318 | 
             
            'strict_char'
         | 
| 266 319 |  | 
| 267 320 | 
             
            This option is enabled after
         | 
| 268 321 | 
             
            require 'xmlscan/xmlchar'.
         | 
| 322 | 
            +
             | 
| 269 323 | 
             
            XMLScan::Scanner checks whether an XML document includes
         | 
| 270 324 | 
             
            an illegal character. The performance decreases sharply.
         | 
| 271 325 |  | 
| 272 326 |  | 
| 273 | 
            -
             | 
| 274 327 | 
             
            Methods:
         | 
| 275 328 |  | 
| 276 329 | 
             
            XMLScan::XMLScanner#kcode= arg
         | 
| 277 330 |  | 
| 278 331 | 
             
            Sets CES. Available values for code are the same as $KCODE
         | 
| 279 332 | 
             
            except nil. If code is nil, $KCODE decides the CES.
         | 
| 333 | 
            +
             | 
| 280 334 | 
             
            XMLScan::XMLScanner#kcode
         | 
| 281 335 |  | 
| 282 336 | 
             
            Returns CES. The format of the return value is the same as
         | 
| 283 337 | 
             
            Regexp#kcode. If this method returns nil, it represents that
         | 
| 284 338 | 
             
            $KCODE decides the CES.
         | 
| 339 | 
            +
             | 
| 285 340 | 
             
            XMLScan::XMLScanner#parse(source)
         | 
| 286 341 |  | 
| 287 342 | 
             
            Parses source as an XML document. source must be
         | 
| @@ -289,9 +344,12 @@ a string, an array of strings, or an object which responds to | |
| 289 344 | 
             
            gets method which behaves same as IO#gets does.
         | 
| 290 345 |  | 
| 291 346 | 
             
            XMLScan::XMLParser
         | 
| 347 | 
            +
             | 
| 292 348 | 
             
            The non-validating XML parser.
         | 
| 349 | 
            +
             | 
| 293 350 | 
             
            The conformance of XMLScan::XMLParser to the specification
         | 
| 294 351 | 
             
            is described in another document.
         | 
| 352 | 
            +
             | 
| 295 353 | 
             
            SuperClass:
         | 
| 296 354 |  | 
| 297 355 | 
             
            XMLScan::XMLScanner
         | 
| @@ -308,7 +366,6 @@ XMLScan::Visitor#on_stag | |
| 308 366 | 
             
            After calling this method, XMLScan::Parser always calls
         | 
| 309 367 | 
             
            corresponding XMLScan::Visitor#on_etag method.
         | 
| 310 368 |  | 
| 311 | 
            -
             | 
| 312 369 | 
             
            In addition, if you never intend error recovery, method calls
         | 
| 313 370 | 
             
            which must not be occurred in a well-formed XML document are
         | 
| 314 371 | 
             
            all suppressed.
         | 
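
The on_doctype argument table in the README above can be illustrated with a small, gem-free sketch. Nothing here is xmlscan code: doctype_args, DOCTYPE_RE and RecordingVisitor are hypothetical names made up for this example, and the regular expression only covers the four simple declaration forms listed in the table (no internal subset, double quotes only). With the real library, a visitor like this would presumably mix in XMLScan::Visitor and be handed to XMLScan::XMLScanner.new, as the README describes; here the callback is invoked by hand.

```ruby
# Hypothetical pattern, NOT taken from xmlscan: matches only the four simple
# DOCTYPE forms shown in the README table.
DOCTYPE_RE = /\A<!DOCTYPE\s+(\S+?)(?:\s+(SYSTEM|PUBLIC)\s+"([^"]*)"(?:\s+"([^"]*)")?)?\s*>\z/

# Returns [root, pubid, sysid], the argument order the README lists for
# XMLScan::Visitor#on_doctype(root, pubid, sysid).
def doctype_args(decl)
  m = DOCTYPE_RE.match(decl)
  raise ArgumentError, "not a simple DOCTYPE declaration: #{decl}" unless m

  root, kind, first, second = m.captures
  case kind
  when nil      then [root, nil, nil]      # <!DOCTYPE foo>
  when "SYSTEM" then [root, nil, first]    # only a system identifier
  else               [root, first, second] # PUBLIC: pubid, then optional sysid
  end
end

# Stand-in visitor that just records the callback it receives.
class RecordingVisitor
  attr_reader :events

  def initialize
    @events = []
  end

  def on_doctype(root, pubid, sysid)
    @events << [:on_doctype, root, pubid, sysid]
  end
end

visitor = RecordingVisitor.new
[
  '<!DOCTYPE foo>',
  '<!DOCTYPE foo SYSTEM "bar">',
  '<!DOCTYPE foo PUBLIC "bar">',
  '<!DOCTYPE foo PUBLIC "bar" "baz">'
].each { |decl| visitor.on_doctype(*doctype_args(decl)) }

visitor.events.each { |event| p event }
```

Recording events in an array rather than acting on them directly keeps the sketch easy to compare against the table row by row.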
    
        data/VERSION
    CHANGED
    
    | @@ -1 +1 @@ | |
| 1 | 
            -
            0.3. | 
| 1 | 
            +
            0.3.0
         | 
    
        metadata
    CHANGED
    
    | @@ -1,19 +1,19 @@ | |
| 1 1 | 
             
            --- !ruby/object:Gem::Specification
         | 
| 2 2 | 
             
            name: xmlscan
         | 
| 3 3 | 
             
            version: !ruby/object:Gem::Version
         | 
| 4 | 
            -
              version: 0.3. | 
| 5 | 
            -
              prerelease:  | 
| 4 | 
            +
              version: 0.3.0
         | 
| 5 | 
            +
              prerelease: 
         | 
| 6 6 | 
             
            platform: ruby
         | 
| 7 7 | 
             
            authors:
         | 
| 8 8 | 
             
            - UENO Katsuhiro <katsu@blue.sky.or.jp>
         | 
| 9 9 | 
             
            autorequire: 
         | 
| 10 10 | 
             
            bindir: bin
         | 
| 11 11 | 
             
            cert_chain: []
         | 
| 12 | 
            -
            date: 2012-02- | 
| 12 | 
            +
            date: 2012-02-26 00:00:00.000000000 Z
         | 
| 13 13 | 
             
            dependencies:
         | 
| 14 14 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 15 15 | 
             
              name: rspec
         | 
| 16 | 
            -
              requirement: & | 
| 16 | 
            +
              requirement: &5315060 !ruby/object:Gem::Requirement
         | 
| 17 17 | 
             
                none: false
         | 
| 18 18 | 
             
                requirements:
         | 
| 19 19 | 
             
                - - ~>
         | 
| @@ -21,10 +21,10 @@ dependencies: | |
| 21 21 | 
             
                    version: 2.8.0
         | 
| 22 22 | 
             
              type: :development
         | 
| 23 23 | 
             
              prerelease: false
         | 
| 24 | 
            -
              version_requirements: * | 
| 24 | 
            +
              version_requirements: *5315060
         | 
| 25 25 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 26 26 | 
             
              name: rdoc
         | 
| 27 | 
            -
              requirement: & | 
| 27 | 
            +
              requirement: &5313840 !ruby/object:Gem::Requirement
         | 
| 28 28 | 
             
                none: false
         | 
| 29 29 | 
             
                requirements:
         | 
| 30 30 | 
             
                - - ~>
         | 
| @@ -32,10 +32,10 @@ dependencies: | |
| 32 32 | 
             
                    version: '3.12'
         | 
| 33 33 | 
             
              type: :development
         | 
| 34 34 | 
             
              prerelease: false
         | 
| 35 | 
            -
              version_requirements: * | 
| 35 | 
            +
              version_requirements: *5313840
         | 
| 36 36 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 37 37 | 
             
              name: bundler
         | 
| 38 | 
            -
              requirement: & | 
| 38 | 
            +
              requirement: &5312600 !ruby/object:Gem::Requirement
         | 
| 39 39 | 
             
                none: false
         | 
| 40 40 | 
             
                requirements:
         | 
| 41 41 | 
             
                - - ~>
         | 
| @@ -43,10 +43,10 @@ dependencies: | |
| 43 43 | 
             
                    version: 1.0.0
         | 
| 44 44 | 
             
              type: :development
         | 
| 45 45 | 
             
              prerelease: false
         | 
| 46 | 
            -
              version_requirements: * | 
| 46 | 
            +
              version_requirements: *5312600
         | 
| 47 47 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 48 48 | 
             
              name: jeweler
         | 
| 49 | 
            -
              requirement: & | 
| 49 | 
            +
              requirement: &5310620 !ruby/object:Gem::Requirement
         | 
| 50 50 | 
             
                none: false
         | 
| 51 51 | 
             
                requirements:
         | 
| 52 52 | 
             
                - - ~>
         | 
| @@ -54,7 +54,7 @@ dependencies: | |
| 54 54 | 
             
                    version: 1.8.3
         | 
| 55 55 | 
             
              type: :development
         | 
| 56 56 | 
             
              prerelease: false
         | 
| 57 | 
            -
              version_requirements: * | 
| 57 | 
            +
              version_requirements: *5310620
         | 
| 58 58 | 
             
            description: The fastest XML parser written in 100% pure Ruby.
         | 
| 59 59 | 
             
            email: gerryg@inbox.com
         | 
| 60 60 | 
             
            executables: []
         | 
| @@ -96,13 +96,13 @@ required_ruby_version: !ruby/object:Gem::Requirement | |
| 96 96 | 
             
                  version: '0'
         | 
| 97 97 | 
             
                  segments:
         | 
| 98 98 | 
             
                  - 0
         | 
| 99 | 
            -
                  hash:  | 
| 99 | 
            +
                  hash: 21734141110843515
         | 
| 100 100 | 
             
            required_rubygems_version: !ruby/object:Gem::Requirement
         | 
| 101 101 | 
             
              none: false
         | 
| 102 102 | 
             
              requirements:
         | 
| 103 | 
            -
              - - ! ' | 
| 103 | 
            +
              - - ! '>='
         | 
| 104 104 | 
             
                - !ruby/object:Gem::Version
         | 
| 105 | 
            -
                  version:  | 
| 105 | 
            +
                  version: '0'
         | 
| 106 106 | 
             
            requirements: []
         | 
| 107 107 | 
             
            rubyforge_project: 
         | 
| 108 108 | 
             
            rubygems_version: 1.8.15
         |