sanitize 5.1.0 → 5.2.0
Potentially problematic release.
This version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +38 -18
- data/README.md +36 -38
- data/lib/sanitize.rb +15 -11
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +11 -11
- data/lib/sanitize/version.rb +1 -1
- data/test/test_clean_element.rb +4 -4
- data/test/test_malicious_html.rb +7 -1
- data/test/test_parser.rb +1 -1
- data/test/test_sanitize.rb +1 -1
- data/test/test_sanitize_css.rb +4 -4
- data/test/test_transformers.rb +25 -19
- metadata +6 -6
    
        checksums.yaml
    CHANGED
    
    | @@ -1,7 +1,7 @@ | |
| 1 1 | 
             
            ---
         | 
| 2 2 | 
             
            SHA256:
         | 
| 3 | 
            -
              metadata.gz:  | 
| 4 | 
            -
              data.tar.gz:  | 
| 3 | 
            +
              metadata.gz: 4f01a992746ecc3f28e9c1fd14c08c99456fb98a59c0b7ba6a8c6f01d0ab07cf
         | 
| 4 | 
            +
              data.tar.gz: 4f379538b26db4d239078ea7e54fea3b106e7801d093ed7407e9b71282f6c4d3
         | 
| 5 5 | 
             
            SHA512:
         | 
| 6 | 
            -
              metadata.gz:  | 
| 7 | 
            -
              data.tar.gz:  | 
| 6 | 
            +
              metadata.gz: 52d96c5f73eea8d738fe23d816d5aec856f9f37ca37cf88d88d385fcffbf242605d13494ab531b517af7bdea44bfae2569f27bc2d5fb005dbeee85a54211d674
         | 
| 7 | 
            +
              data.tar.gz: 897e95c05448509cfeb455bb4ec156ff7557495987e1d058ff63b888f9c0069a821a9b3684e0fe0463f78e4f28faf9fe2089760ad59bbbd1b5a5390fe9632154
         | 
    
        data/HISTORY.md
    CHANGED
    
    | @@ -1,5 +1,25 @@ | |
| 1 1 | 
             
            # Sanitize History
         | 
| 2 2 |  | 
| 3 | 
            +
            ## 5.2.0 (2020-06-06)
         | 
| 4 | 
            +
             | 
| 5 | 
            +
            ### Changes
         | 
| 6 | 
            +
             | 
| 7 | 
            +
            * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
         | 
| 8 | 
            +
              source and documentation.
         | 
| 9 | 
            +
             | 
| 10 | 
            +
              While the etymology of "whitelist" may not be explicitly racist in origin or
         | 
| 11 | 
            +
              intent, there are inherent racial connotations in the implication that white
         | 
| 12 | 
            +
              is good and black (as in "blacklist") is not.
         | 
| 13 | 
            +
             | 
| 14 | 
            +
              This is a change I should have made long ago, and I apologize for not making
         | 
| 15 | 
            +
              it sooner.
         | 
| 16 | 
            +
             | 
| 17 | 
            +
            * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
         | 
| 18 | 
            +
              deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
         | 
| 19 | 
            +
              The old keys will continue to work in order to avoid breaking existing code,
         | 
| 20 | 
            +
              but they are no longer documented and may be removed in a future semver major
         | 
| 21 | 
            +
              release.
         | 
| 22 | 
            +
             | 
| 3 23 | 
             
            ## 5.1.0 (2019-09-07)
         | 
| 4 24 |  | 
| 5 25 | 
             
            ### Features
         | 
| @@ -45,7 +65,7 @@ review the changes below carefully. | |
| 45 65 | 
             
              - `script`
         | 
| 46 66 | 
             
              - `style`
         | 
| 47 67 |  | 
| 48 | 
            -
            * Children of  | 
| 68 | 
            +
            * Children of allowlisted `iframe` elements are now always removed. In modern
         | 
| 49 69 | 
             
              HTML, `iframe` elements should never have children. In HTML 4 and earlier
         | 
| 50 70 | 
             
              `iframe` elements were allowed to contain fallback content for legacy
         | 
| 51 71 | 
             
              browsers, but it's been almost two decades since that was useful.
         | 
| @@ -84,7 +104,7 @@ review the changes below carefully. | |
| 84 104 |  | 
| 85 105 | 
             
              When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
         | 
| 86 106 | 
             
              specially crafted HTML fragment can cause libxml2 to generate improperly
         | 
| 87 | 
            -
              escaped output, allowing non- | 
| 107 | 
            +
              escaped output, allowing non-allowlisted attributes to be used on allowlisted
         | 
| 88 108 | 
             
              elements.
         | 
| 89 109 |  | 
| 90 110 | 
             
              Sanitize now performs additional escaping on affected attributes to prevent
         | 
| @@ -128,7 +148,7 @@ review the changes below carefully. | |
| 128 148 |  | 
| 129 149 | 
             
            ## 4.4.0 (2016-09-29)
         | 
| 130 150 |  | 
| 131 | 
            -
            * Added `srcset` to the attribute  | 
| 151 | 
            +
            * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
         | 
| 132 152 | 
             
              config. [@ejtttje - #156][156]
         | 
| 133 153 |  | 
| 134 154 | 
             
            [156]:https://github.com/rgrove/sanitize/pull/156
         | 
| @@ -249,7 +269,7 @@ review the changes below carefully. | |
| 249 269 | 
             
            ## 3.0.4 (2014-12-12)
         | 
| 250 270 |  | 
| 251 271 | 
             
            * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
         | 
| 252 | 
            -
              caused the URL to be removed even when the protocol was  | 
| 272 | 
            +
              caused the URL to be removed even when the protocol was allowlisted.
         | 
| 253 273 | 
             
              [@benubois - #126][126]
         | 
| 254 274 |  | 
| 255 275 | 
             
            [126]:https://github.com/rgrove/sanitize/pull/126
         | 
| @@ -258,7 +278,7 @@ review the changes below carefully. | |
| 258 278 | 
             
            ## 3.0.3 (2014-10-29)
         | 
| 259 279 |  | 
| 260 280 | 
             
            * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
         | 
| 261 | 
            -
              `@media` block, causing them to be removed even when  | 
| 281 | 
            +
              `@media` block, causing them to be removed even when allowlist rules should
         | 
| 262 282 | 
             
              have allowed them to remain. [#121][121]
         | 
| 263 283 |  | 
| 264 284 | 
             
            [121]:https://github.com/rgrove/sanitize/issues/121
         | 
| @@ -323,7 +343,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 323 343 | 
             
            * The `clean_node!` method was renamed to `node!`.
         | 
| 324 344 |  | 
| 325 345 | 
             
            * The `document` method now raises a `Sanitize::Error` if the `<html>` element
         | 
| 326 | 
            -
              isn't  | 
| 346 | 
            +
              isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
         | 
| 327 347 | 
             
              regardless of the `:remove_contents` config setting.
         | 
| 328 348 |  | 
| 329 349 | 
             
            * The `:output` config has been removed. Output is now always HTML, not XHTML.
         | 
| @@ -334,7 +354,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 334 354 |  | 
| 335 355 | 
             
            * Added advanced CSS sanitization support using [Crass][crass], which is fully
         | 
| 336 356 | 
             
              compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
         | 
| 337 | 
            -
               | 
| 357 | 
            +
              allowlisted `<style>` elements and `style` attributes in HTML will be
         | 
| 338 358 | 
             
              sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
         | 
| 339 359 | 
             
              sanitize CSS stylesheets or properties.
         | 
| 340 360 |  | 
| @@ -386,7 +406,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 386 406 |  | 
| 387 407 | 
             
              When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
         | 
| 388 408 | 
             
              specially crafted HTML fragment can cause libxml2 to generate improperly
         | 
| 389 | 
            -
              escaped output, allowing non- | 
| 409 | 
            +
              escaped output, allowing non-allowlisted attributes to be used on allowlisted
         | 
| 390 410 | 
             
              elements.
         | 
| 391 411 |  | 
| 392 412 | 
             
              Sanitize now performs additional escaping on affected attributes to prevent
         | 
| @@ -401,7 +421,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 401 421 |  | 
| 402 422 | 
             
            ## 2.1.0 (2014-01-13)
         | 
| 403 423 |  | 
| 404 | 
            -
            * Added support for  | 
| 424 | 
            +
            * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
         | 
| 405 425 | 
             
              symbol `:data` instead of an attribute name in the `:attributes` config to
         | 
| 406 426 | 
             
              indicate that arbitrary data attributes should be allowed on an element.
         | 
| 407 427 |  | 
| @@ -482,12 +502,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 482 502 | 
             
              the default depth-first mode.
         | 
| 483 503 |  | 
| 484 504 | 
             
            * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
         | 
| 485 | 
            -
              elements to the  | 
| 505 | 
            +
              elements to the allowlists for the basic and relaxed configs.
         | 
| 486 506 |  | 
| 487 507 | 
             
            * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
         | 
| 488 | 
            -
              `ruby`, and `wbr` elements to the  | 
| 508 | 
            +
              `ruby`, and `wbr` elements to the allowlist for the relaxed config.
         | 
| 489 509 |  | 
| 490 | 
            -
            * The `dir`, `lang`, and `title` attributes are now  | 
| 510 | 
            +
            * The `dir`, `lang`, and `title` attributes are now allowlisted for all
         | 
| 491 511 | 
             
              elements in the relaxed config.
         | 
| 492 512 |  | 
| 493 513 | 
             
            * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
         | 
| @@ -498,7 +518,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 498 518 | 
             
            ## 1.2.1 (2010-04-20)
         | 
| 499 519 |  | 
| 500 520 | 
             
            * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
         | 
| 501 | 
            -
              remove the contents of all non- | 
| 521 | 
            +
              remove the contents of all non-allowlisted elements in addition to the
         | 
| 502 522 | 
             
              elements themselves. If set to an array of element names, Sanitize will
         | 
| 503 523 | 
             
              remove the contents of only those elements (when filtered), and leave the
         | 
| 504 524 | 
             
              contents of other filtered elements. [Thanks to Rafael Souza for the array
         | 
| @@ -526,7 +546,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 526 546 | 
             
            * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
         | 
| 527 547 | 
             
              all its children.
         | 
| 528 548 |  | 
| 529 | 
            -
            * Added elements `<h1>` through `<h6>` to the Relaxed  | 
| 549 | 
            +
            * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
         | 
| 530 550 | 
             
              David Reese]
         | 
| 531 551 |  | 
| 532 552 |  | 
| @@ -546,7 +566,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 546 566 |  | 
| 547 567 | 
             
            * Added a workaround for an Hpricot bug that prevents attribute names from
         | 
| 548 568 | 
             
              being downcased in recent versions of Hpricot. This was exploitable to
         | 
| 549 | 
            -
              prevent non- | 
| 569 | 
            +
              prevent non-allowlisted protocols from being cleaned. [Reported by Ben
         | 
| 550 570 | 
             
              Wanicur]
         | 
| 551 571 |  | 
| 552 572 |  | 
| @@ -576,7 +596,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 576 596 |  | 
| 577 597 | 
             
            ## 1.0.5 (2009-02-05)
         | 
| 578 598 |  | 
| 579 | 
            -
            * Fixed a bug introduced in version 1.0.3 that prevented non- | 
| 599 | 
            +
            * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
         | 
| 580 600 | 
             
              protocols from being cleaned when relative URLs were allowed. [Reported by
         | 
| 581 601 | 
             
              Dev Purkayastha]
         | 
| 582 602 |  | 
| @@ -586,7 +606,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 586 606 |  | 
| 587 607 | 
             
            ## 1.0.4 (2009-01-16)
         | 
| 588 608 |  | 
| 589 | 
            -
            * Fixed a bug that made it possible to sneak a non- | 
| 609 | 
            +
            * Fixed a bug that made it possible to sneak a non-allowlisted element through
         | 
| 590 610 | 
             
              by repeating it several times in a row. All versions of Sanitize prior to
         | 
| 591 611 | 
             
              1.0.4 are vulnerable. [Reported by Cristobal]
         | 
| 592 612 |  | 
| @@ -594,7 +614,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC, | |
| 594 614 | 
             
            ## 1.0.3 (2009-01-15)
         | 
| 595 615 |  | 
| 596 616 | 
             
            * Fixed a bug whereby incomplete Unicode or hex entities could be used to
         | 
| 597 | 
            -
              prevent non- | 
| 617 | 
            +
              prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
         | 
| 598 618 | 
             
              still decode the incomplete entities, users of those browsers may be
         | 
| 599 619 | 
             
              vulnerable to malicious script injection on websites using versions of
         | 
| 600 620 | 
             
              Sanitize prior to 1.0.3.
         | 
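The 5.2.0 entry in HISTORY.md says the deprecated `:is_whitelisted` and `:node_whitelist` transformer keys keep working alongside the new `:is_allowlisted` and `:node_allowlist` keys. A transformer that has to run against both 5.1.0 and 5.2.0 could read whichever key is present. The following is a minimal sketch under that assumption: the helper names `allowlisted?` and `node_allowlist_from` are hypothetical and not part of Sanitize, and the env hashes are hand-built stand-ins for the Hash a transformer receives.

```ruby
require 'set'

# Hypothetical helpers, not part of Sanitize. They prefer the 5.2.0 keys and
# fall back to the deprecated pre-5.2.0 keys when the new ones are absent.
def allowlisted?(env)
  value = env.key?(:is_allowlisted) ? env[:is_allowlisted] : env[:is_whitelisted]
  value ? true : false
end

def node_allowlist_from(env)
  env[:node_allowlist] || env[:node_whitelist] || Set.new
end

# Hand-built stand-ins for the Hash passed to a transformer.
new_env = { :is_allowlisted => true,  :node_allowlist => Set.new([:some_node]) }
old_env = { :is_whitelisted => false, :node_whitelist => Set.new }

puts allowlisted?(new_env)              # true
puts allowlisted?(old_env)              # false
puts node_allowlist_from(new_env).size  # 1
```

Reading the new key first means the fallback can be deleted once the old keys are removed in a future major release.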
    
        data/README.md
    CHANGED
    
    | @@ -1,20 +1,19 @@ | |
| 1 1 | 
             
            Sanitize
         | 
| 2 2 | 
             
            ========
         | 
| 3 3 |  | 
| 4 | 
            -
            Sanitize is  | 
| 5 | 
            -
            elements, attributes, and  | 
| 6 | 
            -
             | 
| 4 | 
            +
            Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
         | 
| 5 | 
            +
            and/or CSS from a string except the elements, attributes, and properties you
         | 
| 6 | 
            +
            choose to allow.
         | 
| 7 7 |  | 
| 8 8 | 
             
            Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
         | 
| 9 9 | 
             
            elements, certain attributes within those elements, and even certain URL
         | 
| 10 | 
            -
            protocols within attributes that contain URLs. You can also  | 
| 11 | 
            -
            properties, @ rules, and URL protocols  | 
| 12 | 
            -
             | 
| 13 | 
            -
            be removed.
         | 
| 10 | 
            +
            protocols within attributes that contain URLs. You can also allow specific CSS
         | 
| 11 | 
            +
            properties, @ rules, and URL protocols in elements or attributes containing CSS.
         | 
| 12 | 
            +
            Any HTML or CSS that you don't explicitly allow will be removed.
         | 
| 14 13 |  | 
| 15 14 | 
             
            Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
         | 
| 16 15 | 
             
            exactly the same way modern browsers do, and [Crass][crass], which parses CSS
         | 
| 17 | 
            -
            exactly the same way modern browsers do. As long as your  | 
| 16 | 
            +
            exactly the same way modern browsers do. As long as your allowlist config only
         | 
| 18 17 | 
             
            allows safe markup and CSS, even the most malformed or malicious input will be
         | 
| 19 18 | 
             
            transformed into safe output.
         | 
| 20 19 |  | 
| @@ -88,7 +87,7 @@ Sanitize.fragment(html) | |
| 88 87 | 
             
            # => 'foo'
         | 
| 89 88 | 
             
            ```
         | 
| 90 89 |  | 
| 91 | 
            -
            To keep certain elements, add them to the element  | 
| 90 | 
            +
            To keep certain elements, add them to the element allowlist.
         | 
| 92 91 |  | 
| 93 92 | 
             
            ```ruby
         | 
| 94 93 | 
             
            Sanitize.fragment(html, :elements => ['b'])
         | 
| @@ -97,7 +96,7 @@ Sanitize.fragment(html, :elements => ['b']) | |
| 97 96 |  | 
| 98 97 | 
             
            ### HTML Documents
         | 
| 99 98 |  | 
| 100 | 
            -
            When sanitizing a document, the `<html>` element must be  | 
| 99 | 
            +
            When sanitizing a document, the `<html>` element must be allowlisted. You can
         | 
| 101 100 | 
             
            also set `:allow_doctype` to `true` to allow well-formed document type
         | 
| 102 101 | 
             
            definitions.
         | 
| 103 102 |  | 
| @@ -123,8 +122,8 @@ Sanitize.document(html, | |
| 123 122 |  | 
| 124 123 | 
             
            ### CSS in HTML
         | 
| 125 124 |  | 
| 126 | 
            -
            To sanitize CSS in an HTML fragment or document, first  | 
| 127 | 
            -
            element and/or the `style` attribute. Then  | 
| 125 | 
            +
            To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
         | 
| 126 | 
            +
            element and/or the `style` attribute. Then allowlist the CSS properties,
         | 
| 128 127 | 
             
            @ rules, and URL protocols you wish to allow. You can also choose whether to
         | 
| 129 128 | 
             
            allow CSS comments or browser compatibility hacks.
         | 
| 130 129 |  | 
| @@ -267,7 +266,7 @@ new copy using `Sanitize::Config.merge()`, like so: | |
| 267 266 |  | 
| 268 267 | 
             
            ```ruby
         | 
| 269 268 | 
             
            # Create a customized copy of the Basic config, adding <div> and <table> to the
         | 
| 270 | 
            -
            # existing  | 
| 269 | 
            +
            # existing allowlisted elements.
         | 
| 271 270 | 
             
            Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
         | 
| 272 271 | 
             
              :elements        => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
         | 
| 273 272 | 
             
              :remove_contents => true
         | 
| @@ -395,8 +394,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") } | |
| 395 394 |  | 
| 396 395 | 
             
            ##### :css => :properties (Array or Set)
         | 
| 397 396 |  | 
| 398 | 
            -
             | 
| 399 | 
            -
            lowercase.
         | 
| 397 | 
            +
            List of CSS property names to allow. Names should be specified in lowercase.
         | 
| 400 398 |  | 
| 401 399 | 
             
            ##### :css => :protocols (Array or Set)
         | 
| 402 400 |  | 
| @@ -452,7 +450,7 @@ include the symbol `:relative` in the protocol array: | |
| 452 450 |  | 
| 453 451 | 
             
            #### :remove_contents (boolean or Array or Set)
         | 
| 454 452 |  | 
| 455 | 
            -
            If this is `true`, Sanitize will remove the contents of any non- | 
| 453 | 
            +
            If this is `true`, Sanitize will remove the contents of any non-allowlisted
         | 
| 456 454 | 
             
            elements in addition to the elements themselves. By default, Sanitize leaves the
         | 
| 457 455 | 
             
            safe parts of an element's contents behind when the element is removed.
         | 
| 458 456 |  | 
| @@ -518,33 +516,33 @@ argument a Hash that contains the following items: | |
| 518 516 |  | 
| 519 517 | 
             
              * **:config** - The current Sanitize configuration Hash.
         | 
| 520 518 |  | 
| 521 | 
            -
              * **: | 
| 519 | 
            +
              * **:is_allowlisted** - `true` if the current node has been allowlisted by a
         | 
| 522 520 | 
             
                previous transformer, `false` otherwise. It's generally bad form to remove
         | 
| 523 | 
            -
                a node that a previous transformer has  | 
| 521 | 
            +
                a node that a previous transformer has allowlisted.
         | 
| 524 522 |  | 
| 525 523 | 
             
              * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
         | 
| 526 524 | 
             
                node may be an element, a text node, a comment, a CDATA node, or a document
         | 
| 527 525 | 
             
                fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
         | 
| 528 526 | 
             
                selectively ignore node types you aren't interested in.
         | 
| 529 527 |  | 
| 528 | 
            +
              * **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
         | 
| 529 | 
            +
                document that have been allowlisted by previous transformers, if any. It's
         | 
| 530 | 
            +
                generally bad form to remove a node that a previous transformer has
         | 
| 531 | 
            +
                allowlisted.
         | 
| 532 | 
            +
             | 
| 530 533 | 
             
              * **:node_name** - The name of the current HTML node, always lowercase (e.g.
         | 
| 531 534 | 
             
                "div" or "span"). For non-element nodes, the name will be something like
         | 
| 532 535 | 
             
                "text", "comment", "#cdata-section", "#document-fragment", etc.
         | 
| 533 536 |  | 
| 534 | 
            -
              * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
         | 
| 535 | 
            -
                document that have been whitelisted by previous transformers, if any. It's
         | 
| 536 | 
            -
                generally bad form to remove a node that a previous transformer has
         | 
| 537 | 
            -
                whitelisted.
         | 
| 538 | 
            -
             | 
| 539 537 | 
             
            ### Output
         | 
| 540 538 |  | 
| 541 539 | 
             
            A transformer doesn't have to return anything, but may optionally return a Hash,
         | 
| 542 540 | 
             
            which may contain the following items:
         | 
| 543 541 |  | 
| 544 | 
            -
              * **: | 
| 545 | 
            -
                to add to the document's  | 
| 546 | 
            -
                These specific nodes and all their attributes will be  | 
| 547 | 
            -
                their children will not be.
         | 
| 542 | 
            +
              * **:node_allowlist** -  Array or Set of specific `Nokogiri::XML::Node`
         | 
| 543 | 
            +
                objects to add to the document's allowlist, bypassing the current Sanitize
         | 
| 544 | 
            +
                config. These specific nodes and all their attributes will be allowlisted,
         | 
| 545 | 
            +
                but their children will not be.
         | 
| 548 546 |  | 
| 549 547 | 
             
            If a transformer returns anything other than a Hash, the return value will be
         | 
| 550 548 | 
             
            ignored.
         | 
| @@ -587,16 +585,16 @@ Transformers have a tremendous amount of power, including the power to | |
| 587 585 | 
             
            completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
         | 
| 588 586 | 
             
            your own hands.
         | 
| 589 587 |  | 
| 590 | 
            -
            ### Example: Transformer to  | 
| 588 | 
            +
            ### Example: Transformer to allow image URLs by domain
         | 
| 591 589 |  | 
| 592 590 | 
             
            The following example demonstrates how to remove image elements unless they use
         | 
| 593 591 | 
             
            a relative URL or are hosted on a specific domain. It assumes that the `<img>`
         | 
| 594 | 
            -
            element and its `src` attribute are already  | 
| 592 | 
            +
            element and its `src` attribute are already allowlisted.
         | 
| 595 593 |  | 
| 596 594 | 
             
            ```ruby
         | 
| 597 595 | 
             
            require 'uri'
         | 
| 598 596 |  | 
| 599 | 
            -
             | 
| 597 | 
            +
            image_allowlist_transformer = lambda do |env|
         | 
| 600 598 | 
             
              # Ignore everything except <img> elements.
         | 
| 601 599 | 
             
              return unless env[:node_name] == 'img'
         | 
| 602 600 |  | 
| @@ -612,20 +610,20 @@ image_whitelist_transformer = lambda do |env| | |
| 612 610 | 
             
            end
         | 
| 613 611 | 
             
            ```
         | 
| 614 612 |  | 
| 615 | 
            -
            ### Example: Transformer to  | 
| 613 | 
            +
            ### Example: Transformer to allow YouTube video embeds
         | 
| 616 614 |  | 
| 617 615 | 
             
            The following example demonstrates how to create a transformer that will safely
         | 
| 618 | 
            -
             | 
| 619 | 
            -
             | 
| 620 | 
            -
             | 
| 616 | 
            +
            allow valid YouTube video embeds without having to allow other kinds of embedded
         | 
| 617 | 
            +
            content, which would be the case if you tried to do this by just allowing all
         | 
| 618 | 
            +
            `<iframe>` elements:
         | 
| 621 619 |  | 
| 622 620 | 
             
            ```ruby
         | 
| 623 621 | 
             
            youtube_transformer = lambda do |env|
         | 
| 624 622 | 
             
              node      = env[:node]
         | 
| 625 623 | 
             
              node_name = env[:node_name]
         | 
| 626 624 |  | 
| 627 | 
            -
              # Don't continue if this node is already  | 
| 628 | 
            -
              return if env[: | 
| 625 | 
            +
              # Don't continue if this node is already allowlisted or is not an element.
         | 
| 626 | 
            +
              return if env[:is_allowlisted] || !node.element?
         | 
| 629 627 |  | 
| 630 628 | 
             
              # Don't continue unless the node is an iframe.
         | 
| 631 629 | 
             
              return unless node_name == 'iframe'
         | 
| @@ -646,8 +644,8 @@ youtube_transformer = lambda do |env| | |
| 646 644 |  | 
| 647 645 | 
             
              # Now that we're sure that this is a valid YouTube embed and that there are
         | 
| 648 646 | 
             
              # no unwanted elements or attributes hidden inside it, we can tell Sanitize
         | 
| 649 | 
            -
              # to  | 
| 650 | 
            -
              {: | 
| 647 | 
            +
              # to allowlist the current node.
         | 
| 648 | 
            +
              {:node_allowlist => [node]}
         | 
| 651 649 | 
             
            end
         | 
| 652 650 |  | 
| 653 651 | 
             
            html = %[
         | 
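The README's image example is described as removing `<img>` elements unless they use a relative URL or are hosted on a specific domain, but the diff elides the body of that transformer. The URL check on its own might look something like the sketch below. `allowed_image_src?` is a hypothetical helper, `example.com` is a placeholder host, and treating protocol-relative URLs such as `//host/path` as absolute is a choice made here, not something the README states.

```ruby
require 'uri'

# Hypothetical helper, not taken from the README. Returns true when src is a
# relative URL or is served from allowed_host.
def allowed_image_src?(src, allowed_host)
  uri = URI.parse(src)

  # A URL with neither scheme nor host (e.g. "/images/a.png") is relative.
  return true if uri.scheme.nil? && uri.host.nil?

  # Protocol-relative URLs ("//host/path") have a host but no scheme, so they
  # are checked against the allowed host like any other absolute URL.
  scheme_ok = uri.scheme.nil? || %w[http https].include?(uri.scheme)
  scheme_ok && uri.host.to_s.downcase == allowed_host
rescue URI::InvalidURIError
  false # Treat unparseable URLs as not allowed.
end

puts allowed_image_src?('/images/a.png', 'example.com')             # true
puts allowed_image_src?('https://example.com/a.png', 'example.com') # true
puts allowed_image_src?('//evil.test/a.png', 'example.com')         # false
```

Inside a transformer, a check like this would typically run against `env[:node]['src']`, with the node removed via Nokogiri's `unlink` when the check fails.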
    
        data/lib/sanitize.rb
    CHANGED
    
    | @@ -54,7 +54,7 @@ class Sanitize | |
| 54 54 | 
             
              # Returns a sanitized copy of the given full _html_ document, using the
         | 
| 55 55 | 
             
              # settings in _config_ if specified.
         | 
| 56 56 | 
             
              #
         | 
| 57 | 
            -
              # When sanitizing a document, the `<html>` element must be  | 
| 57 | 
            +
              # When sanitizing a document, the `<html>` element must be allowlisted or an
         | 
| 58 58 | 
             
              # error will be raised. If this is undesirable, you should probably use
         | 
| 59 59 | 
             
              # {#fragment} instead.
         | 
| 60 60 | 
             
              def self.document(html, config = {})
         | 
| @@ -117,7 +117,7 @@ class Sanitize | |
| 117 117 |  | 
| 118 118 | 
             
              # Returns a sanitized copy of the given _html_ document.
         | 
| 119 119 | 
             
              #
         | 
| 120 | 
            -
              # When sanitizing a document, the `<html>` element must be  | 
| 120 | 
            +
              # When sanitizing a document, the `<html>` element must be allowlisted or an
         | 
| 121 121 | 
             
              # error will be raised. If this is undesirable, you should probably use
         | 
| 122 122 | 
             
              # {#fragment} instead.
         | 
| 123 123 | 
             
              def document(html)
         | 
| @@ -147,20 +147,20 @@ class Sanitize | |
| 147 147 | 
             
              # in place.
         | 
| 148 148 | 
             
              #
         | 
| 149 149 | 
             
              # If _node_ is a `Nokogiri::XML::Document`, the `<html>` element must be
         | 
| 150 | 
            -
              #  | 
| 150 | 
            +
              # allowlisted or an error will be raised.
         | 
| 151 151 | 
             
              def node!(node)
         | 
| 152 152 | 
             
                raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
         | 
| 153 153 |  | 
| 154 154 | 
             
                if node.is_a?(Nokogiri::XML::Document)
         | 
| 155 155 | 
             
                  unless @config[:elements].include?('html')
         | 
| 156 | 
            -
                    raise Error, 'When sanitizing a document, "<html>" must be  | 
| 156 | 
            +
                    raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
         | 
| 157 157 | 
             
                  end
         | 
| 158 158 | 
             
                end
         | 
| 159 159 |  | 
| 160 | 
            -
                 | 
| 160 | 
            +
                node_allowlist = Set.new
         | 
| 161 161 |  | 
| 162 162 | 
             
                traverse(node) do |n|
         | 
| 163 | 
            -
                  transform_node!(n,  | 
| 163 | 
            +
                  transform_node!(n, node_allowlist)
         | 
| 164 164 | 
             
                end
         | 
| 165 165 |  | 
| 166 166 | 
             
                node
         | 
| @@ -189,7 +189,7 @@ class Sanitize | |
| 189 189 | 
             
                node.to_html(preserve_newline: true)
         | 
| 190 190 | 
             
              end
         | 
| 191 191 |  | 
| 192 | 
            -
              def transform_node!(node,  | 
| 192 | 
            +
              def transform_node!(node, node_allowlist)
         | 
| 193 193 | 
             
                @transformers.each do |transformer|
         | 
| 194 194 | 
             
                  # Since transform_node! may be called in a tight loop to process thousands
         | 
| 195 195 | 
             
                  # of items, we can optimize both memory and CPU performance by:
         | 
| @@ -199,15 +199,19 @@ class Sanitize | |
| 199 199 | 
             
                  # does merge! create a new hash, it is also 2.6x slower:
         | 
| 200 200 | 
             
                  # https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hashmerge-code
         | 
| 201 201 | 
             
                  config = @transformer_config
         | 
| 202 | 
            -
                  config[:is_whitelisted] =  | 
| 202 | 
            +
                  config[:is_allowlisted] = config[:is_whitelisted] = node_allowlist.include?(node)
         | 
| 203 203 | 
             
                  config[:node] = node
         | 
| 204 204 | 
             
                  config[:node_name] = node.name.downcase
         | 
| 205 | 
            -
                  config[:node_whitelist] =  | 
| 205 | 
            +
                  config[:node_allowlist] = config[:node_whitelist] = node_allowlist
         | 
| 206 206 |  | 
| 207 207 | 
             
                  result = transformer.call(config)
         | 
| 208 208 |  | 
| 209 | 
            -
                  if result.is_a?(Hash) | 
| 210 | 
            -
                     | 
| 209 | 
            +
                  if result.is_a?(Hash)
         | 
| 210 | 
            +
                    result_allowlist = result[:node_allowlist] || result[:node_whitelist]
         | 
| 211 | 
            +
             | 
| 212 | 
            +
                    if result_allowlist.respond_to?(:each)
         | 
| 213 | 
            +
                      node_allowlist.merge(result_allowlist)
         | 
| 214 | 
            +
                    end
         | 
| 211 215 | 
             
                  end
         | 
| 212 216 | 
             
                end
         | 
| 213 217 |  | 
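The result handling in the `transform_node!` hunk above can be tried in isolation. This is a minimal sketch, not the gem's code: `merge_transformer_result` is a hypothetical helper, and plain symbols stand in for Nokogiri nodes. It illustrates that a transformer may return either the new `:node_allowlist` key or the legacy `:node_whitelist` key, and that a value which does not respond to `each` is ignored.

```ruby
require 'set'

# Hypothetical helper mirroring the result handling shown in the 5.2.0 hunk.
# `node_allowlist` is the Set shared by all transformers for one traversal;
# `result` is whatever a transformer returned.
def merge_transformer_result(node_allowlist, result)
  return node_allowlist unless result.is_a?(Hash)

  # Prefer the new key and fall back to the legacy one.
  result_allowlist = result[:node_allowlist] || result[:node_whitelist]

  if result_allowlist.respond_to?(:each)
    node_allowlist.merge(result_allowlist)
  end

  node_allowlist
end

allowlist = Set.new
merge_transformer_result(allowlist, {:node_allowlist => [:div]})   # new key
merge_transformer_result(allowlist, {:node_whitelist => [:span]})  # legacy key
merge_transformer_result(allowlist, {:node_allowlist => 42})       # ignored: not enumerable
merge_transformer_result(allowlist, nil)                           # ignored: not a Hash
```

After these calls `allowlist` holds `:div` and `:span`, so transformers written against either key name contribute to the same set.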
    
        data/lib/sanitize/css.rb
    CHANGED
    
    | @@ -175,7 +175,7 @@ class Sanitize; class CSS | |
| 175 175 | 
             
                    next prop
         | 
| 176 176 |  | 
| 177 177 | 
             
                  when :semicolon
         | 
| 178 | 
            -
                    # Only preserve the semicolon if it was preceded by  | 
| 178 | 
            +
                    # Only preserve the semicolon if it was preceded by an allowlisted
         | 
| 179 179 | 
             
                    # property. Otherwise, omit it in order to prevent redundant semicolons.
         | 
| 180 180 | 
             
                    if preceded_by_property
         | 
| 181 181 | 
             
                      preceded_by_property = false
         | 
| @@ -296,7 +296,7 @@ class Sanitize; class CSS | |
| 296 296 | 
             
              end
         | 
| 297 297 |  | 
| 298 298 | 
             
              # Returns `true` if the given node (which may be of type `:url` or
         | 
| 299 | 
            -
              # `:function`, since the CSS syntax can produce both) uses  | 
| 299 | 
            +
              # `:function`, since the CSS syntax can produce both) uses an allowlisted
         | 
| 300 300 | 
             
              # protocol.
         | 
| 301 301 | 
             
              def valid_url?(node)
         | 
| 302 302 | 
             
                type = node[:node]
         | 
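The semicolon rule described in the first `css.rb` hunk above can be sketched as follows. This is an illustration under assumptions, not the gem's implementation: the token Hashes, the `:name` key, and the `filter_declarations` helper are hypothetical, and only the `preceded_by_property` flag is taken from the diff.

```ruby
# Hypothetical filter over a simplified token stream. A semicolon is kept only
# when the most recent surviving token was an allowed property, so dropping a
# property never leaves a stray ";" behind.
def filter_declarations(tokens, allowed_properties)
  preceded_by_property = false

  tokens.each_with_object([]) do |token, out|
    case token[:node]
    when :property
      if allowed_properties.include?(token[:name])
        preceded_by_property = true
        out << token
      end
    when :semicolon
      if preceded_by_property
        preceded_by_property = false
        out << token
      end
    end
  end
end

# Roughly "color: red; behavior: x; background: #fff;" as tokens.
tokens = [
  {:node => :property, :name => 'color'},      {:node => :semicolon},
  {:node => :property, :name => 'behavior'},   {:node => :semicolon},
  {:node => :property, :name => 'background'}, {:node => :semicolon}
]

kept = filter_declarations(tokens, ['color', 'background'])
```

Here the `behavior` property and the semicolon that followed it are both dropped, while the other two declarations keep their semicolons.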
    
        data/lib/sanitize/transformers/clean_css.rb
    CHANGED
    
    | @@ -1,6 +1,6 @@ | |
| 1 1 | 
             
            class Sanitize; module Transformers; module CSS
         | 
| 2 2 |  | 
| 3 | 
            -
            # Enforces a CSS  | 
| 3 | 
            +
            # Enforces a CSS allowlist on the contents of `style` attributes.
         | 
| 4 4 | 
             
            class CleanAttribute
         | 
| 5 5 | 
             
              def initialize(sanitizer_or_config)
         | 
| 6 6 | 
             
                if Sanitize::CSS === sanitizer_or_config
         | 
| @@ -14,7 +14,7 @@ class CleanAttribute | |
| 14 14 | 
             
                node = env[:node]
         | 
| 15 15 |  | 
| 16 16 | 
             
                return unless node.type == Nokogiri::XML::Node::ELEMENT_NODE &&
         | 
| 17 | 
            -
                    node.key?('style') && !env[: | 
| 17 | 
            +
                    node.key?('style') && !env[:is_allowlisted]
         | 
| 18 18 |  | 
| 19 19 | 
             
                attr = node.attribute('style')
         | 
| 20 20 | 
             
                css  = @scss.properties(attr.value)
         | 
| @@ -27,7 +27,7 @@ class CleanAttribute | |
| 27 27 | 
             
              end
         | 
| 28 28 | 
             
            end
         | 
| 29 29 |  | 
| 30 | 
            -
            # Enforces a CSS  | 
| 30 | 
            +
            # Enforces a CSS allowlist on the contents of `<style>` elements.
         | 
| 31 31 | 
             
            class CleanElement
         | 
| 32 32 | 
             
              def initialize(sanitizer_or_config)
         | 
| 33 33 | 
             
                if Sanitize::CSS === sanitizer_or_config
         | 
    
        data/lib/sanitize/transformers/clean_element.rb
    CHANGED
    
    | @@ -76,11 +76,11 @@ class Sanitize; module Transformers; class CleanElement | |
| 76 76 |  | 
| 77 77 | 
             
              def call(env)
         | 
| 78 78 | 
             
                node = env[:node]
         | 
| 79 | 
            -
                return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[: | 
| 79 | 
            +
                return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_allowlisted]
         | 
| 80 80 |  | 
| 81 81 | 
             
                name = env[:node_name]
         | 
| 82 82 |  | 
| 83 | 
            -
                # Delete any element that isn't in the config  | 
| 83 | 
            +
                # Delete any element that isn't in the config allowlist, unless the node has
         | 
| 84 84 | 
             
                # already been deleted from the document.
         | 
| 85 85 | 
             
                #
         | 
| 86 86 | 
             
                # It's important that we not try to reparent the children of a node that has
         | 
| @@ -107,20 +107,20 @@ class Sanitize; module Transformers; class CleanElement | |
| 107 107 | 
             
                  return
         | 
| 108 108 | 
             
                end
         | 
| 109 109 |  | 
| 110 | 
            -
                 | 
| 110 | 
            +
                attr_allowlist = @attributes[name] || @attributes[:all]
         | 
| 111 111 |  | 
| 112 | 
            -
                if  | 
| 113 | 
            -
                  # Delete all attributes from elements with no  | 
| 112 | 
            +
                if attr_allowlist.nil?
         | 
| 113 | 
            +
                  # Delete all attributes from elements with no allowlisted attributes.
         | 
| 114 114 | 
             
                  node.attribute_nodes.each {|attr| attr.unlink }
         | 
| 115 115 | 
             
                else
         | 
| 116 | 
            -
                  allow_data_attributes =  | 
| 116 | 
            +
                  allow_data_attributes = attr_allowlist.include?(:data)
         | 
| 117 117 |  | 
| 118 118 | 
             
                  # Delete any attribute that isn't allowed on this element.
         | 
| 119 119 | 
             
                  node.attribute_nodes.each do |attr|
         | 
| 120 120 | 
             
                    attr_name = attr.name.downcase
         | 
| 121 121 |  | 
| 122 | 
            -
                    unless  | 
| 123 | 
            -
                      # The attribute isn't  | 
| 122 | 
            +
                    unless attr_allowlist.include?(attr_name)
         | 
| 123 | 
            +
                      # The attribute isn't allowed.
         | 
| 124 124 |  | 
| 125 125 | 
             
                      if allow_data_attributes && attr_name.start_with?('data-')
         | 
| 126 126 | 
             
                        # Arbitrary data attributes are allowed. If this is a data
         | 
| @@ -134,7 +134,7 @@ class Sanitize; module Transformers; class CleanElement | |
| 134 134 | 
             
                      next
         | 
| 135 135 | 
             
                    end
         | 
| 136 136 |  | 
| 137 | 
            -
                    # The attribute is  | 
| 137 | 
            +
                    # The attribute is allowed.
         | 
| 138 138 |  | 
| 139 139 | 
             
                    # Remove any attributes that use unacceptable protocols.
         | 
| 140 140 | 
             
                    if @protocols.include?(name) && @protocols[name].include?(attr_name)
         | 
| @@ -162,7 +162,7 @@ class Sanitize; module Transformers; class CleanElement | |
| 162 162 | 
             
                    # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
         | 
| 163 163 | 
             
                    # attempt to preserve server-side includes. This can result in XSS since
         | 
| 164 164 | 
             
                    # an unescaped double quote can allow an attacker to inject a
         | 
| 165 | 
            -
                    # non- | 
| 165 | 
            +
                    # non-allowlisted attribute.
         | 
| 166 166 | 
             
                    #
         | 
| 167 167 | 
             
                    # Sanitize works around this by implementing its own escaping for
         | 
| 168 168 | 
             
                    # affected attributes, some of which can exist on any element and some
         | 
| @@ -191,7 +191,7 @@ class Sanitize; module Transformers; class CleanElement | |
| 191 191 | 
             
                # Element-specific special cases.
         | 
| 192 192 | 
             
                case name
         | 
| 193 193 |  | 
| 194 | 
            -
                # If this is  | 
| 194 | 
            +
                # If this is an allowlisted iframe that has children, remove all its
         | 
| 195 195 | 
             
                # children. The HTML standard says iframes shouldn't have content, but when
         | 
| 196 196 | 
             
                # they do, this content is parsed as text and is serialized verbatim without
         | 
| 197 197 | 
             
                # being escaped, which is unsafe because legacy browsers may still render it
         | 
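The attribute filtering renamed in the `CleanElement` hunks above can be approximated with plain Hashes. This is a simplified sketch under assumptions: `filter_attributes` is a hypothetical helper, attributes are a name-to-value Hash rather than Nokogiri attribute nodes, and the further checks the real transformer performs (such as the protocol check visible in the hunk) are omitted.

```ruby
# Hypothetical stand-in for the attribute filtering step. `config_attributes`
# maps element names (or :all) to arrays of allowed attribute names, where the
# symbol :data permits arbitrary data-* attributes.
def filter_attributes(element_name, attrs, config_attributes)
  attr_allowlist = config_attributes[element_name] || config_attributes[:all]

  # No allowlist applies to this element: drop every attribute.
  return {} if attr_allowlist.nil?

  allow_data_attributes = attr_allowlist.include?(:data)

  attrs.select do |attr_name, _value|
    name = attr_name.downcase
    attr_allowlist.include?(name) ||
      (allow_data_attributes && name.start_with?('data-'))
  end
end

config = {
  'a'  => ['href', :data],
  :all => ['class']
}

link_attrs = filter_attributes('a', {'href' => '/x', 'data-id' => '1', 'onclick' => 'x()'}, config)
para_attrs = filter_attributes('p', {'class' => 'foo', 'style' => 'color:red'}, config)
bare_attrs = filter_attributes('b', {'id' => 'x'}, {'a' => ['href']})
```

In this sketch the link keeps `href` and `data-id`, the paragraph keeps only `class` through the `:all` entry, and the `<b>` element, which has no applicable allowlist, loses all attributes.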
    
        data/lib/sanitize/version.rb
    CHANGED
    
    
    
        data/test/test_clean_element.rb
    CHANGED
    
    | @@ -162,7 +162,7 @@ describe 'Sanitize::Transformers::CleanElement' do | |
| 162 162 | 
             
              }
         | 
| 163 163 |  | 
| 164 164 | 
             
              describe 'Default config' do
         | 
| 165 | 
            -
                it 'should remove non- | 
| 165 | 
            +
                it 'should remove non-allowlisted elements, leaving safe contents behind' do
         | 
| 166 166 | 
             
                  Sanitize.fragment('foo <b>bar</b> <strong><a href="#a">baz</a></strong> quux')
         | 
| 167 167 | 
             
                    .must_equal 'foo bar baz quux'
         | 
| 168 168 |  | 
| @@ -315,7 +315,7 @@ describe 'Sanitize::Transformers::CleanElement' do | |
| 315 315 | 
             
              end
         | 
| 316 316 |  | 
| 317 317 | 
             
              describe 'Custom configs' do
         | 
| 318 | 
            -
                it 'should allow attributes on all elements if  | 
| 318 | 
            +
                it 'should allow attributes on all elements if allowlisted under :all' do
         | 
| 319 319 | 
             
                  input = '<p class="foo">bar</p>'
         | 
| 320 320 |  | 
| 321 321 | 
             
                  Sanitize.fragment(input).must_equal ' bar '
         | 
| @@ -336,7 +336,7 @@ describe 'Sanitize::Transformers::CleanElement' do | |
| 336 336 | 
             
                  }).must_equal input
         | 
| 337 337 | 
             
                end
         | 
| 338 338 |  | 
| 339 | 
            -
                it "should not allow relative URLs when relative URLs aren't  | 
| 339 | 
            +
                it "should not allow relative URLs when relative URLs aren't allowlisted" do
         | 
| 340 340 | 
             
                  input = '<a href="/foo/bar">Link</a>'
         | 
| 341 341 |  | 
| 342 342 | 
             
                  Sanitize.fragment(input,
         | 
| @@ -400,7 +400,7 @@ describe 'Sanitize::Transformers::CleanElement' do | |
| 400 400 | 
             
                  ).must_equal 'foo bar  baz hi '
         | 
| 401 401 | 
             
                end
         | 
| 402 402 |  | 
| 403 | 
            -
                it 'should remove the contents of  | 
| 403 | 
            +
                it 'should remove the contents of allowlisted iframes' do
         | 
| 404 404 | 
             
                  Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
         | 
| 405 405 | 
             
                    :elements => ['iframe']
         | 
| 406 406 | 
             
                  ).must_equal '<iframe></iframe>'
         | 
    
        data/test/test_malicious_html.rb
    CHANGED
    
    | @@ -128,13 +128,15 @@ describe 'Malicious HTML' do | |
| 128 128 |  | 
| 129 129 | 
             
              # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
         | 
| 130 130 | 
             
              # attempt to preserve server-side includes. This can result in XSS since an
         | 
| 131 | 
            -
              # unescaped double quote can allow an attacker to inject a non- | 
| 131 | 
            +
              # unescaped double quote can allow an attacker to inject a non-allowlisted
         | 
| 132 132 | 
             
              # attribute. Sanitize works around this by implementing its own escaping for
         | 
| 133 133 | 
             
              # affected attributes.
         | 
| 134 134 | 
             
              #
         | 
| 135 135 | 
             
              # The relevant libxml2 code is here:
         | 
| 136 136 | 
             
              # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
         | 
| 137 137 | 
             
              describe 'unsafe libxml2 server-side includes in attributes' do
         | 
| 138 | 
            +
                using_unpatched_libxml2 = Nokogiri::VersionInfo.instance.libxml2_using_system?
         | 
| 139 | 
            +
             | 
| 138 140 | 
             
                tag_configs = [
         | 
| 139 141 | 
             
                  {
         | 
| 140 142 | 
             
                    tag_name: 'a',
         | 
| @@ -166,6 +168,8 @@ describe 'Malicious HTML' do | |
| 166 168 | 
             
                    input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
         | 
| 167 169 |  | 
| 168 170 | 
             
                    it 'should escape unsafe characters in attributes' do
         | 
| 171 | 
            +
                      skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
         | 
| 172 | 
            +
             | 
| 169 173 | 
             
                      # This uses Nokogumbo's HTML-compliant serializer rather than
         | 
| 170 174 | 
             
                      # libxml2's.
         | 
| 171 175 | 
             
                      @s.fragment(input).
         | 
| @@ -191,6 +195,8 @@ describe 'Malicious HTML' do | |
| 191 195 | 
             
                    input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
         | 
| 192 196 |  | 
| 193 197 | 
             
                    it 'should not escape characters unnecessarily' do
         | 
| 198 | 
            +
                      skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
         | 
| 199 | 
            +
             | 
| 194 200 | 
             
                      # This uses Nokogumbo's HTML-compliant serializer rather than
         | 
| 195 201 | 
             
                      # libxml2's.
         | 
| 196 202 | 
             
                      @s.fragment(input).
         | 
    
        data/test/test_parser.rb
    CHANGED
    
    
    
        data/test/test_sanitize.rb
    CHANGED
    
    | @@ -150,7 +150,7 @@ describe 'Sanitize' do | |
| 150 150 | 
             
                    frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
         | 
| 151 151 | 
             
                  end
         | 
| 152 152 |  | 
| 153 | 
            -
                  describe "when the given node is a document and <html> isn't  | 
| 153 | 
            +
                  describe "when the given node is a document and <html> isn't allowlisted" do
         | 
| 154 154 | 
             
                    it 'should raise a Sanitize::Error' do
         | 
| 155 155 | 
             
                      doc = Nokogiri::HTML5.parse('foo')
         | 
| 156 156 | 
             
                      proc { @s.node!(doc) }.must_raise Sanitize::Error
         | 
    
        data/test/test_sanitize_css.rb
    CHANGED
    
    | @@ -21,7 +21,7 @@ describe 'Sanitize::CSS' do | |
| 21 21 | 
             
                    @custom.properties(css).must_equal 'background: #fff; '
         | 
| 22 22 | 
             
                  end
         | 
| 23 23 |  | 
| 24 | 
            -
                  it 'should allow  | 
| 24 | 
            +
                  it 'should allow allowlisted URL protocols' do
         | 
| 25 25 | 
             
                    [
         | 
| 26 26 | 
             
                      "background: url(relative.jpg)",
         | 
| 27 27 | 
             
                      "background: url('relative.jpg')",
         | 
| @@ -36,7 +36,7 @@ describe 'Sanitize::CSS' do | |
| 36 36 | 
             
                    end
         | 
| 37 37 | 
             
                  end
         | 
| 38 38 |  | 
| 39 | 
            -
                  it 'should not allow non- | 
| 39 | 
            +
                  it 'should not allow non-allowlisted URL protocols' do
         | 
| 40 40 | 
             
                    [
         | 
| 41 41 | 
             
                      "background: url(javascript:alert(0))",
         | 
| 42 42 | 
             
                      "background: url(ja\\56 ascript:alert(0))",
         | 
| @@ -307,7 +307,7 @@ describe 'Sanitize::CSS' do | |
| 307 307 | 
             
                end
         | 
| 308 308 |  | 
| 309 309 | 
             
                describe ":at_rules" do
         | 
| 310 | 
            -
                  it "should remove blockless at-rules that aren't  | 
| 310 | 
            +
                  it "should remove blockless at-rules that aren't allowlisted" do
         | 
| 311 311 | 
             
                    css = %[
         | 
| 312 312 | 
             
                      @charset 'utf-8';
         | 
| 313 313 | 
             
                      @import url('foo.css');
         | 
| @@ -319,7 +319,7 @@ describe 'Sanitize::CSS' do | |
| 319 319 | 
             
                    ].strip
         | 
| 320 320 | 
             
                  end
         | 
| 321 321 |  | 
| 322 | 
            -
                  describe "when blockless at-rules are  | 
| 322 | 
            +
                  describe "when blockless at-rules are allowlisted" do
         | 
| 323 323 | 
             
                    before do
         | 
| 324 324 | 
             
                      @scss = Sanitize::CSS.new(Sanitize::Config.merge(Sanitize::Config::RELAXED[:css], {
         | 
| 325 325 | 
             
                        :at_rules => ['charset', 'import']
         | 
    
        data/test/test_transformers.rb
    CHANGED
    
    | @@ -12,11 +12,13 @@ describe 'Transformers' do | |
| 12 12 | 
             
                    return unless env[:node].element?
         | 
| 13 13 |  | 
| 14 14 | 
             
                    env[:config][:foo].must_equal :bar
         | 
| 15 | 
            -
                    env[: | 
| 15 | 
            +
                    env[:is_allowlisted].must_equal false
         | 
| 16 | 
            +
                    env[:is_whitelisted].must_equal env[:is_allowlisted]
         | 
| 16 17 | 
             
                    env[:node].must_be_kind_of Nokogiri::XML::Node
         | 
| 17 18 | 
             
                    env[:node_name].must_equal 'span'
         | 
| 18 | 
            -
                    env[: | 
| 19 | 
            -
                    env[: | 
| 19 | 
            +
                    env[:node_allowlist].must_be_kind_of Set
         | 
| 20 | 
            +
                    env[:node_allowlist].must_be_empty
         | 
| 21 | 
            +
                    env[:node_whitelist].must_equal env[:node_allowlist]
         | 
| 20 22 | 
             
                  }
         | 
| 21 23 | 
             
                )
         | 
| 22 24 | 
             
              end
         | 
| @@ -43,34 +45,38 @@ describe 'Transformers' do | |
| 43 45 | 
             
                nodes.must_equal %w[div span strong b p]
         | 
| 44 46 | 
             
              end
         | 
| 45 47 |  | 
| 46 | 
            -
              it 'should  | 
| 48 | 
            +
              it 'should allowlist nodes in the node allowlist' do
         | 
| 47 49 | 
             
                Sanitize.fragment('<div class="foo">foo</div><span>bar</span>',
         | 
| 48 50 | 
             
                  :transformers => [
         | 
| 49 51 | 
             
                    proc {|env|
         | 
| 50 | 
            -
                      {: | 
| 52 | 
            +
                      {:node_allowlist => [env[:node]]} if env[:node_name] == 'div'
         | 
| 51 53 | 
             
                    },
         | 
| 52 54 |  | 
| 53 55 | 
             
                    proc {|env|
         | 
| 54 | 
            -
                      env[: | 
| 55 | 
            -
                      env[: | 
| 56 | 
            -
                      env[: | 
| 56 | 
            +
                      env[:is_allowlisted].must_equal false unless env[:node_name] == 'div'
         | 
| 57 | 
            +
                      env[:is_allowlisted].must_equal true if env[:node_name] == 'div'
         | 
| 58 | 
            +
                      env[:node_allowlist].must_include env[:node] if env[:node_name] == 'div'
         | 
| 59 | 
            +
                      env[:is_whitelisted].must_equal env[:is_allowlisted]
         | 
| 60 | 
            +
                      env[:node_whitelist].must_equal env[:node_allowlist]
         | 
| 57 61 | 
             
                    }
         | 
| 58 62 | 
             
                  ]
         | 
| 59 63 | 
             
                ).must_equal '<div class="foo">foo</div>bar'
         | 
| 60 64 | 
             
              end
         | 
| 61 65 |  | 
| 62 | 
            -
              it 'should clear the node  | 
| 66 | 
            +
              it 'should clear the node allowlist after each fragment' do
         | 
| 63 67 | 
             
                called = false
         | 
| 64 68 |  | 
| 65 69 | 
             
                Sanitize.fragment('<div>foo</div>',
         | 
| 66 | 
            -
                  :transformers => proc {|env| {: | 
| 70 | 
            +
                  :transformers => proc {|env| {:node_allowlist => [env[:node]]}}
         | 
| 67 71 | 
             
                )
         | 
| 68 72 |  | 
| 69 73 | 
             
                Sanitize.fragment('<div>foo</div>',
         | 
| 70 74 | 
             
                  :transformers => proc {|env|
         | 
| 71 75 | 
             
                    called = true
         | 
| 72 | 
            -
                    env[: | 
| 73 | 
            -
                    env[: | 
| 76 | 
            +
                    env[:is_allowlisted].must_equal false
         | 
| 77 | 
            +
                    env[:is_whitelisted].must_equal env[:is_allowlisted]
         | 
| 78 | 
            +
                    env[:node_allowlist].must_be_empty
         | 
| 79 | 
            +
                    env[:node_whitelist].must_equal env[:node_allowlist]
         | 
| 74 80 | 
             
                  }
         | 
| 75 81 | 
             
                )
         | 
| 76 82 |  | 
| @@ -83,10 +89,10 @@ describe 'Transformers' do | |
| 83 89 | 
             
                  .must_equal(' foo ')
         | 
| 84 90 | 
             
              end
         | 
| 85 91 |  | 
| 86 | 
            -
              describe 'Image  | 
| 92 | 
            +
              describe 'Image allowlist transformer' do
         | 
| 87 93 | 
             
                require 'uri'
         | 
| 88 94 |  | 
| 89 | 
            -
                 | 
| 95 | 
            +
                image_allowlist_transformer = lambda do |env|
         | 
| 90 96 | 
             
                  # Ignore everything except <img> elements.
         | 
| 91 97 | 
             
                  return unless env[:node_name] == 'img'
         | 
| 92 98 |  | 
| @@ -103,7 +109,7 @@ describe 'Transformers' do | |
| 103 109 |  | 
| 104 110 | 
             
                before do
         | 
| 105 111 | 
             
                  @s = Sanitize.new(Sanitize::Config.merge(Sanitize::Config::RELAXED,
         | 
| 106 | 
            -
                      :transformers =>  | 
| 112 | 
            +
                      :transformers => image_allowlist_transformer))
         | 
| 107 113 | 
             
                end
         | 
| 108 114 |  | 
| 109 115 | 
             
                it 'should allow images with relative URLs' do
         | 
| @@ -142,8 +148,8 @@ describe 'Transformers' do | |
| 142 148 | 
             
                  node      = env[:node]
         | 
| 143 149 | 
             
                  node_name = env[:node_name]
         | 
| 144 150 |  | 
| 145 | 
            -
                  # Don't continue if this node is already  | 
| 146 | 
            -
                  return if env[: | 
| 151 | 
            +
                  # Don't continue if this node is already allowlisted or is not an element.
         | 
| 152 | 
            +
                  return if env[:is_allowlisted] || !node.element?
         | 
| 147 153 |  | 
| 148 154 | 
             
                  # Don't continue unless the node is an iframe.
         | 
| 149 155 | 
             
                  return unless node_name == 'iframe'
         | 
| @@ -164,8 +170,8 @@ describe 'Transformers' do | |
| 164 170 |  | 
| 165 171 | 
             
                  # Now that we're sure that this is a valid YouTube embed and that there are
         | 
| 166 172 | 
             
                  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
         | 
| 167 | 
            -
                  # to  | 
| 168 | 
            -
                  {: | 
| 173 | 
            +
                  # to allowlist the current node.
         | 
| 174 | 
            +
                  {:node_allowlist => [node]}
         | 
| 169 175 | 
             
                end
         | 
| 170 176 |  | 
| 171 177 | 
             
                it 'should allow HTTP YouTube video embeds' do
         | 
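The transformer tests above rely on `env` exposing both the new and the legacy key names. The following sketch imitates that contract without loading the gem. `FakeNode`, `build_env`, and `trusted_div_transformer` are hypothetical names introduced for illustration; only the `env` keys and the `{:node_allowlist => [node]}` return shape come from the diff.

```ruby
require 'set'

# Hypothetical stand-in for a Nokogiri element node.
FakeNode = Struct.new(:name, :attrs) do
  def element?
    true
  end
end

# Hypothetical transformer: allowlists <div> elements whose class is "trusted".
trusted_div_transformer = lambda do |env|
  node = env[:node]

  # Don't continue if this node is already allowlisted or is not an element.
  return if env[:is_allowlisted] || !node.element?
  return unless env[:node_name] == 'div'
  return unless node.attrs['class'] == 'trusted'

  {:node_allowlist => [node]}
end

# Builds an env Hash the way the 5.2.0 hunk does, setting both key spellings.
def build_env(node, node_allowlist)
  env = {}
  env[:is_allowlisted] = env[:is_whitelisted] = node_allowlist.include?(node)
  env[:node] = node
  env[:node_name] = node.name.downcase
  env[:node_allowlist] = env[:node_whitelist] = node_allowlist
  env
end

allowlist = Set.new
div = FakeNode.new('DIV', {'class' => 'trusted'})

first = trusted_div_transformer.call(build_env(div, allowlist))
allowlist.merge(first[:node_allowlist])

# On the second pass the node is already in the set, so the transformer bails out.
second = trusted_div_transformer.call(build_env(div, allowlist))
```

Because both keys are assigned from the same value, a transformer that still reads `env[:is_whitelisted]` or `env[:node_whitelist]` observes the same state as one using the new names.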
    
        metadata
    CHANGED
    
    | @@ -1,14 +1,14 @@ | |
| 1 1 | 
             
            --- !ruby/object:Gem::Specification
         | 
| 2 2 | 
             
            name: sanitize
         | 
| 3 3 | 
             
            version: !ruby/object:Gem::Version
         | 
| 4 | 
            -
              version: 5. | 
| 4 | 
            +
              version: 5.2.0
         | 
| 5 5 | 
             
            platform: ruby
         | 
| 6 6 | 
             
            authors:
         | 
| 7 7 | 
             
            - Ryan Grove
         | 
| 8 8 | 
             
            autorequire: 
         | 
| 9 9 | 
             
            bindir: bin
         | 
| 10 10 | 
             
            cert_chain: []
         | 
| 11 | 
            -
            date:  | 
| 11 | 
            +
            date: 2020-06-06 00:00:00.000000000 Z
         | 
| 12 12 | 
             
            dependencies:
         | 
| 13 13 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 14 14 | 
             
              name: crass
         | 
| @@ -80,9 +80,9 @@ dependencies: | |
| 80 80 | 
             
                - - "~>"
         | 
| 81 81 | 
             
                  - !ruby/object:Gem::Version
         | 
| 82 82 | 
             
                    version: 12.3.1
         | 
| 83 | 
            -
            description: Sanitize is  | 
| 84 | 
            -
               | 
| 85 | 
            -
               | 
| 83 | 
            +
            description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
         | 
| 84 | 
            +
              HTML and/or CSS from a string except the elements, attributes, and properties you
         | 
| 85 | 
            +
              choose to allow.
         | 
| 86 86 | 
             
            email: ryan@wonko.com
         | 
| 87 87 | 
             
            executables: []
         | 
| 88 88 | 
             
            extensions: []
         | 
| @@ -138,5 +138,5 @@ requirements: [] | |
| 138 138 | 
             
            rubygems_version: 3.0.3
         | 
| 139 139 | 
             
            signing_key: 
         | 
| 140 140 | 
             
            specification_version: 4
         | 
| 141 | 
            -
            summary:  | 
| 141 | 
            +
            summary: Allowlist-based HTML and CSS sanitizer.
         | 
| 142 142 | 
             
            test_files: []
         |