RubyGems - sanitize - Versions diffs - 6.0.0 → 6.0.1 - Mend

sanitize 6.0.0 → 6.0.1

Potentially problematic release.

This version of sanitize might be problematic.

Files changed (18)

checksums.yaml +4 -4
data/HISTORY.md +52 -0
data/README.md +25 -19
data/lib/sanitize/config/default.rb +5 -0
data/lib/sanitize/transformers/clean_element.rb +45 -0
data/lib/sanitize/version.rb +1 -1
data/test/test_clean_comment.rb +16 -16
data/test/test_clean_css.rb +5 -5
data/test/test_clean_doctype.rb +15 -15
data/test/test_clean_element.rb +99 -92
data/test/test_config.rb +9 -9
data/test/test_malicious_css.rb +7 -7
data/test/test_malicious_html.rb +135 -31
data/test/test_parser.rb +8 -8
data/test/test_sanitize.rb +24 -24
data/test/test_sanitize_css.rb +53 -53
data/test/test_transformers.rb +37 -37
metadata +3 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 94a37503617774f9317150c834cc3025cd32a718be754fb72eea1b9dd7347571
-  data.tar.gz: 597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92
+  metadata.gz: 819d713b2d4a78519e8bd4f2f853d6558d93ffd2d0481e10d012d8f74afbb555
+  data.tar.gz: 04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90
 SHA512:
-  metadata.gz: c6d2dedfa9d6a589788d4156babae09cf14b3bebc765a9bb04a492aa5b5702f82dc3ae26d45199da3e8f9c096dfd191d15c53fea8d62084a3679604be5f7ddba
-  data.tar.gz: 70bbb00756f1a4a085ad5901b27fd91ebc4308d5f42bfa57ec54c8cc7982ded8395eff9b59546ca62f3dba6e7a012351d62f9ec81b06aa8ccbb563211f39bd3c
+  metadata.gz: ed59ea47cc4a620ccf61be3443ef97036a877903bbc90fa855936e57446e34b92f5b9eb41ed9a026e17779fa473ce10d066986c1dd986c58381dae22bb7c9905
+  data.tar.gz: 27b40d2033ecd346c299bb77a7788b5325b79edd39c4767c9e5bf27486cf29bf2a5f3b34f96def645bbefd325b0e51a27182b75f187d2eb00931542769cd8c37

data/HISTORY.md CHANGED

@@ -1,5 +1,57 @@
 # Sanitize History
+## 6.0.1 (2023-01-27)
+### Bug Fixes
+* Sanitize now always removes `<noscript>` elements and their contents, even
+  when `noscript` is in the allowlist.
+  This fixes a sanitization bypass that could occur when `noscript` was allowed
+  by a custom allowlist. In this scenario, carefully crafted input could sneak
+  arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
+  scripting) attack.
+  Sanitize's default configs don't allow `<noscript>` elements and are not
+  vulnerable. This issue only affects users who are using a custom config that
+  adds `noscript` to the element allowlist.
+  The root cause of this issue is that HTML parsing rules treat the contents of
+  a `<noscript>` element differently depending on whether scripting is enabled
+  in the user agent. Nokogiri doesn't support scripting so it follows the
+  "scripting disabled" rules, but a web browser with scripting enabled will
+  follow the "scripting enabled" rules. This means that Sanitize can't reliably
+  make the contents of a `<noscript>` element safe for scripting enabled
+  browsers, so the safest thing to do is to remove the element and its contents
+  entirely.
+  See the following security advisory for additional details:
+  [GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
+  Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+  (@leeN) for reporting this issue.
+* Fixed an edge case in which the contents of an "unescaped text" element (such
+  as `<noembed>` or `<xmp>`) were not properly escaped if that element was
+  allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
+  The only way to encounter this situation was to ignore multiple warnings in
+  the readme and create a custom config that allowlisted all the elements
+  involved, including `<math>` or `<svg>`. If you're using a default config or
+  if you heeded the warnings about MathML and SVG not being supported, you're
+  not affected by this issue.
+  Please let this be a reminder that Sanitize cannot safely sanitize MathML or
+  SVG content and does not support this use case. The default configs don't
+  allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
+  custom config may create a security vulnerability in your application.
+  Documentation has been updated to add more warnings and to make the existing
+  warnings about this more prominent.
+  Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+  (@leeN) for reporting this issue.
 ## 6.0.0 (2021-08-03)
 ### Potentially Breaking Changes
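
To check whether an installed copy already contains the fixes listed in the 6.0.1 entry of HISTORY.md, a version comparison along these lines may help. This is a minimal sketch: the helper name `includes_noscript_fix?` is hypothetical and not part of Sanitize, and it only covers the 6.x release line shown in this diff.

```ruby
require 'rubygems' # provides Gem::Version (already loaded in most Ruby setups)

# First release that contains the fixes, per the 6.0.1 HISTORY.md entry.
FIXED_VERSION = Gem::Version.new('6.0.1')

# Hypothetical helper (not part of Sanitize): true when the given version
# string is at or above the release that contains the fixes.
def includes_noscript_fix?(version_string)
  Gem::Version.new(version_string) >= FIXED_VERSION
end

puts includes_noscript_fix?('6.0.0') # false
puts includes_noscript_fix?('6.0.1') # true
```

In an application, the argument would typically be `Sanitize::VERSION`, the constant bumped in `lib/sanitize/version.rb`.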

data/README.md CHANGED

@@ -11,27 +11,26 @@ protocols within attributes that contain URLs. You can also allow specific CSS
 properties, @ rules, and URL protocols in elements or attributes containing CSS.
 Any HTML or CSS that you don't explicitly allow will be removed.
-Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
-exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your allowlist config only
-allows safe markup and CSS, even the most malformed or malicious input will be
-transformed into safe output.
+Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
+the same way modern browsers do, and [Crass][crass], which parses CSS the same
+way modern browsers do. As long as your allowlist config only allows safe markup
+and CSS, even the most malformed or malicious input will be transformed into
+safe output.
 [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
 [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
 [crass]:https://github.com/rgrove/crass
-[nokogumbo]:https://github.com/rubys/nokogumbo
+[nokogiri]:https://github.com/sparklemotion/nokogiri
 Links
 -----
 * [Home](https://github.com/rgrove/sanitize/)
-* [API Docs](http://rubydoc.info/github/rgrove/sanitize/master)
+* [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
 * [Issues](https://github.com/rgrove/sanitize/issues)
-* [Release History](https://github.com/rgrove/sanitize/blob/master/HISTORY.md#sanitize-history)
-* [Online Demo](https://sanitize.herokuapp.com/)
-* [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
+* [Release History](https://github.com/rgrove/sanitize/releases)
+* [Online Demo](https://sanitize-web.fly.dev/)
 Installation
 -------------
@@ -72,10 +71,11 @@ Sanitize can sanitize the following types of input:
 * Standalone CSS stylesheets
 * Standalone CSS properties
-However, please note that Sanitize _cannot_ fully sanitize the contents of
-`<math>` or `<svg>` elements, since these elements don't follow the same parsing
-rules as the rest of HTML. If this is something you need, you may want to look
-for another solution.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
 ### HTML Fragments
@@ -420,11 +420,17 @@ elements not in this array will be removed.
 ]
 ```
-**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
-elements, since these elements don't follow the same parsing rules as the rest
-of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
-content inside them will be allowed, even if that content would otherwise be
-removed by Sanitize.
+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
+> **Note**
+>
+> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
+>
+> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
 #### :parser_options (Hash)
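
The README warnings above apply only to custom configs, so one way to act on them is to scan a config's `:elements` allowlist for the names they single out. The following is a sketch under assumptions: `risky_allowlist_entries` and `RISKY_ELEMENTS` are hypothetical names, not part of Sanitize, and the example config is made up for illustration.

```ruby
require 'set'

# Element names the README warnings single out: `noscript` is always removed
# as of 6.0.1, and `math`/`svg` contents cannot be sanitized safely.
RISKY_ELEMENTS = Set.new(%w[math noscript svg])

# Hypothetical helper (not part of Sanitize): returns the risky element names
# found in a config hash's :elements allowlist, sorted alphabetically.
def risky_allowlist_entries(config)
  elements = Array(config[:elements]).map { |name| name.to_s.downcase }
  elements.select { |name| RISKY_ELEMENTS.include?(name) }.uniq.sort
end

# Example custom config; this element list is invented for illustration.
custom_config = { :elements => %w[b em svg noscript p] }

p risky_allowlist_entries(custom_config) # ["noscript", "svg"]
```

An empty result does not prove a config is safe; it only shows that none of the three elements named in these warnings is allowlisted.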

data/lib/sanitize/config/default.rb CHANGED

@@ -54,6 +54,11 @@ class Sanitize
       # HTML elements to allow. By default, no elements are allowed (which means
       # that all HTML will be stripped).
+      #
+      # Warning: Sanitize cannot safely sanitize the contents of foreign
+      # elements (elements in the MathML or SVG namespaces). Do not add `math`
+      # or `svg` to this list! If you do, you may create a security
+      # vulnerability in your application.
       :elements => [],
       # HTML parsing options to pass to Nokogumbo.

data/lib/sanitize/transformers/clean_element.rb CHANGED

@@ -1,5 +1,6 @@
 # encoding: utf-8
+require 'cgi'
 require 'set'
 class Sanitize; module Transformers; class CleanElement
@@ -18,6 +19,18 @@ class Sanitize; module Transformers; class CleanElement
   # http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
   REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
+  # Elements whose content is treated as unescaped text by HTML parsers.
+  UNESCAPED_TEXT_ELEMENTS = Set.new(%w[
+    iframe
+    noembed
+    noframes
+    noscript
+    plaintext
+    script
+    style
+    xmp
+  ])
   # Attributes that need additional escaping on `<a>` elements due to unsafe
   # libxml2 behavior.
   UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
@@ -185,6 +198,28 @@ class Sanitize; module Transformers; class CleanElement
       @add_attributes[name].each {|key, val| node[key] = val }
     end
+    # Make a best effort to ensure that text nodes in invalid "unescaped text"
+    # elements that are inside a math or svg namespace are properly escaped so
+    # that they don't get parsed as HTML.
+    #
+    # Sanitize is explicitly documented as not supporting MathML or SVG, but
+    # people sometimes allow `<math>` and `<svg>` elements in their custom
+    # configs without realizing that it's not safe. This workaround makes it
+    # slightly less unsafe, but you still shouldn't allow `<math>` or `<svg>`
+    # because Nokogiri doesn't parse them the same way browsers do and Sanitize
+    # can't guarantee that their contents are safe.
+    unless node.namespace.nil?
+      prefix = node.namespace.prefix
+      if (prefix == 'math' || prefix == 'svg') && UNESCAPED_TEXT_ELEMENTS.include?(name)
+        node.children.each do |child|
+          if child.type == Nokogiri::XML::Node::TEXT_NODE
+            child.content = CGI.escapeHTML(child.content)
+          end
+        end
+      end
+    end
     # Element-specific special cases.
     case name
@@ -217,6 +252,16 @@ class Sanitize; module Transformers; class CleanElement
         node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
       end
+    # A `<noscript>` element's content is parsed differently in browsers
+    # depending on whether or not scripting is enabled. Since Nokogiri doesn't
+    # support scripting, it always parses `<noscript>` elements as if scripting
+    # is disabled. This results in edge cases where it's not possible to
+    # reliably sanitize the contents of a `<noscript>` element because Nokogiri
+    # can't fully replicate the parsing behavior of a scripting-enabled browser.
+    # The safest thing to do is to simply remove all `<noscript>` elements.
+    when 'noscript'
+      node.unlink
     end
   end
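
The escaping condition added in this file can be illustrated without Nokogiri. The sketch below mirrors the check from the diff using plain strings; `escape_foreign_text` is a hypothetical helper, not part of Sanitize, and the namespace prefix is passed in directly as an assumption rather than read from a parsed node.

```ruby
require 'cgi'
require 'set'

# Same element names as UNESCAPED_TEXT_ELEMENTS in the diff above.
UNESCAPED_TEXT_ELEMENTS = Set.new(%w[
  iframe noembed noframes noscript plaintext script style xmp
])

# Hypothetical helper (not part of Sanitize): escapes text only when the
# element is an "unescaped text" element inside a math or svg namespace.
def escape_foreign_text(namespace_prefix, element_name, text)
  foreign = namespace_prefix == 'math' || namespace_prefix == 'svg'
  return text unless foreign && UNESCAPED_TEXT_ELEMENTS.include?(element_name)

  CGI.escapeHTML(text)
end

puts escape_foreign_text('svg', 'xmp', '<img src=x>') # &lt;img src=x&gt;
puts escape_foreign_text(nil, 'xmp', '<img src=x>')   # <img src=x>
puts escape_foreign_text('math', 'b', '<img src=x>')  # <img src=x>
```

As the comment in the diff notes, this escaping is a best-effort workaround; allowlisting `<math>` or `<svg>` remains unsupported.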

data/lib/sanitize/version.rb CHANGED

@@ -1,5 +1,5 @@
 # encoding: utf-8
 class Sanitize
-  VERSION = '6.0.0'
+  VERSION = '6.0.1'
 end

data/test/test_clean_comment.rb CHANGED

@@ -11,18 +11,18 @@ describe 'Sanitize::Transformers::CleanComment' do
     end
     it 'should remove comments' do
-      @s.fragment('foo <!-- comment --> bar').must_equal 'foo  bar'
-      @s.fragment('foo <!-- ').must_equal 'foo '
-      @s.fragment('foo <!-- - -> bar').must_equal 'foo '
-      @s.fragment("foo <!--\n\n\n\n-->bar").must_equal 'foo bar'
-      @s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo  --&gt; --&gt;bar'
-      @s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>&gt;bar</div>'
+      _(@s.fragment('foo <!-- comment --> bar')).must_equal 'foo  bar'
+      _(@s.fragment('foo <!-- ')).must_equal 'foo '
+      _(@s.fragment('foo <!-- - -> bar')).must_equal 'foo '
+      _(@s.fragment("foo <!--\n\n\n\n-->bar")).must_equal 'foo bar'
+      _(@s.fragment("foo <!-- <!-- <!-- --> --> -->bar")).must_equal 'foo  --&gt; --&gt;bar'
+      _(@s.fragment("foo <div <!-- comment -->>bar</div>")).must_equal 'foo <div>&gt;bar</div>'
       # Special case: the comment markup is inside a <script>, which makes it
       # text content and not an actual HTML comment.
-      @s.fragment("<script><!-- comment --></script>").must_equal ''
+      _(@s.fragment("<script><!-- comment --></script>")).must_equal ''
-      Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script'])
+      _(Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script']))
         .must_equal '<script><!-- comment --></script>'
     end
   end
@@ -33,14 +33,14 @@ describe 'Sanitize::Transformers::CleanComment' do
     end
     it 'should allow comments' do
-      @s.fragment('foo <!-- comment --> bar').must_equal 'foo <!-- comment --> bar'
-      @s.fragment('foo <!-- ').must_equal 'foo <!-- -->'
-      @s.fragment('foo <!-- - -> bar').must_equal 'foo <!-- - -> bar-->'
-      @s.fragment("foo <!--\n\n\n\n-->bar").must_equal "foo <!--\n\n\n\n-->bar"
-      @s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo <!-- <!-- <!-- --> --&gt; --&gt;bar'
-      @s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>&gt;bar</div>'
-      Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script'])
+      _(@s.fragment('foo <!-- comment --> bar')).must_equal 'foo <!-- comment --> bar'
+      _(@s.fragment('foo <!-- ')).must_equal 'foo <!-- -->'
+      _(@s.fragment('foo <!-- - -> bar')).must_equal 'foo <!-- - -> bar-->'
+      _(@s.fragment("foo <!--\n\n\n\n-->bar")).must_equal "foo <!--\n\n\n\n-->bar"
+      _(@s.fragment("foo <!-- <!-- <!-- --> --> -->bar")).must_equal 'foo <!-- <!-- <!-- --> --&gt; --&gt;bar'
+      _(@s.fragment("foo <div <!-- comment -->>bar</div>")).must_equal 'foo <div>&gt;bar</div>'
+      _(Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script']))
         .must_equal '<script><!-- comment --></script>'
     end
   end

data/test/test_clean_css.rb CHANGED

@@ -10,15 +10,15 @@ describe 'Sanitize::Transformers::CSS::CleanAttribute' do
   end
   it 'should sanitize CSS properties in style attributes' do
-    @s.fragment(%[
+    _(@s.fragment(%[
       <div style="color: #fff; width: expression(alert(1)); /* <-- evil! */"></div>
-    ].strip).must_equal %[
+    ].strip)).must_equal %[
       <div style="color: #fff;  /* <-- evil! */"></div>
     ].strip
   end
   it 'should remove the style attribute if the sanitized CSS is empty' do
-    @s.fragment('<div style="width: expression(alert(1))"></div>').
+    _(@s.fragment('<div style="width: expression(alert(1))"></div>')).
       must_equal '<div></div>'
   end
 end
@@ -46,7 +46,7 @@ describe 'Sanitize::Transformers::CSS::CleanElement' do
       </style>
     ].strip
-    @s.fragment(html).must_equal %[
+    _(@s.fragment(html)).must_equal %[
       <style>
       /* Yay CSS! */
       .foo { color: #fff; }
@@ -62,6 +62,6 @@ describe 'Sanitize::Transformers::CSS::CleanElement' do
   end
   it 'should remove the <style> element if the sanitized CSS is empty' do
-    @s.fragment('<style></style>').must_equal ''
+    _(@s.fragment('<style></style>')).must_equal ''
   end
 end

data/test/test_clean_doctype.rb CHANGED

@@ -11,18 +11,18 @@ describe 'Sanitize::Transformers::CleanDoctype' do
     end
     it 'should remove doctype declarations' do
-      @s.document('<!DOCTYPE html><html>foo</html>').must_equal "<html>foo</html>"
-      @s.fragment('<!DOCTYPE html>foo').must_equal 'foo'
+      _(@s.document('<!DOCTYPE html><html>foo</html>')).must_equal "<html>foo</html>"
+      _(@s.fragment('<!DOCTYPE html>foo')).must_equal 'foo'
     end
     it 'should not allow doctype definitions in fragments' do
-      @s.fragment('<!DOCTYPE html><html>foo</html>')
+      _(@s.fragment('<!DOCTYPE html><html>foo</html>'))
         .must_equal "foo"
-      @s.fragment('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>')
+      _(@s.fragment('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>'))
         .must_equal "foo"
-      @s.fragment("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>")
+      _(@s.fragment("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>"))
         .must_equal "foo"
     end
   end
@@ -33,38 +33,38 @@ describe 'Sanitize::Transformers::CleanDoctype' do
     end
     it 'should allow doctype declarations in documents' do
-      @s.document('<!DOCTYPE html><html>foo</html>')
+      _(@s.document('<!DOCTYPE html><html>foo</html>'))
         .must_equal "<!DOCTYPE html><html>foo</html>"
-      @s.document('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>')
+      _(@s.document('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>'))
         .must_equal "<!DOCTYPE html><html>foo</html>"
-      @s.document("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>")
+      _(@s.document("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>"))
         .must_equal "<!DOCTYPE html><html>foo</html>"
     end
     it 'should not allow obviously invalid doctype declarations in documents' do
-      @s.document('<!DOCTYPE blah blah blah><html>foo</html>')
+      _(@s.document('<!DOCTYPE blah blah blah><html>foo</html>'))
         .must_equal "<!DOCTYPE html><html>foo</html>"
-      @s.document('<!DOCTYPE blah><html>foo</html>')
+      _(@s.document('<!DOCTYPE blah><html>foo</html>'))
         .must_equal "<!DOCTYPE html><html>foo</html>"
-      @s.document('<!DOCTYPE html BLAH "-//W3C//DTD HTML 4.01//EN"><html>foo</html>')
+      _(@s.document('<!DOCTYPE html BLAH "-//W3C//DTD HTML 4.01//EN"><html>foo</html>'))
         .must_equal "<!DOCTYPE html><html>foo</html>"
-      @s.document('<!whatever><html>foo</html>')
+      _(@s.document('<!whatever><html>foo</html>'))
         .must_equal "<html>foo</html>"
     end
     it 'should not allow doctype definitions in fragments' do
-      @s.fragment('<!DOCTYPE html><html>foo</html>')
+      _(@s.fragment('<!DOCTYPE html><html>foo</html>'))
         .must_equal "foo"
-      @s.fragment('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>')
+      _(@s.fragment('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>'))
         .must_equal "foo"
-      @s.fragment("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>")
+      _(@s.fragment("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>"))
         .must_equal "foo"
     end
   end