RubyGems - sanitize - Versions diffs - 5.1.0 → 5.2.1 - Mend

sanitize 5.1.0 → 5.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of sanitize might be problematic.

Files changed (18)

checksums.yaml +4 -4
data/HISTORY.md +74 -18
data/README.md +47 -38
data/lib/sanitize.rb +15 -11
data/lib/sanitize/config/default.rb +1 -1
data/lib/sanitize/css.rb +2 -2
data/lib/sanitize/transformers/clean_comment.rb +1 -1
data/lib/sanitize/transformers/clean_css.rb +3 -3
data/lib/sanitize/transformers/clean_doctype.rb +1 -1
data/lib/sanitize/transformers/clean_element.rb +11 -11
data/lib/sanitize/version.rb +1 -1
data/test/test_clean_element.rb +24 -14
data/test/test_malicious_html.rb +20 -1
data/test/test_parser.rb +1 -1
data/test/test_sanitize.rb +1 -1
data/test/test_sanitize_css.rb +4 -4
data/test/test_transformers.rb +25 -19
metadata +7 -7

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 8cf7bac25cea64ed464d106bdc57019388598ca9f1a4e7d8eddf3a98bab12267
-  data.tar.gz: e8b1f402b0d67a825b0ad4aad83829816fd9c78cd8445879636cba0a282e8ee5
+  metadata.gz: 3d1290690a9d32db9e06b8fb19c7e285c94a1d91ed51a4eb7e96389e427348d9
+  data.tar.gz: 5131063daf1763c83978954bed9ee3a783099e40aa71e50de26d06b8ae0c1054
 SHA512:
-  metadata.gz: 956edaca6569a5933223da0aa7dcac4880b5164aa59e37256ac896c9fefb271da71425defe7e09e241b1333b441f5a2629893abed6d5a2a47d0726bf03597614
-  data.tar.gz: e45a018b904bcf8cb996f8ed08427e80b8ce058c4fe414782460c5496e88bb6c2a4055304118057621a630e514b4f96bac11bdc686181a6f0097dc7bf912ab04
+  metadata.gz: bfcb7cda6aa70590f642583b41936bc09d8929210046cebdd0d0ff452ccb3213844b4c40d4e205e79c0cd64a2a0d56e16790e38f4c8f247b8abfa32dbec22297
+  data.tar.gz: 0ea5a6d6848f9a125f17e4e23145adff4d3c4ccfe30a3407466fae074ed33cbd4b1869eb5a9f0a72b808449b8cf166a3695c2a6d63b16a83b047fd260bfe50bd
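The `checksums.yaml` hunk records SHA256 and SHA512 digests for the two archives inside the `.gem` file. A `.gem` file is a plain tar archive holding `metadata.gz`, `data.tar.gz`, and a gzipped `checksums.yaml`, so after extracting it you can recompute the digests yourself. The sketch below assumes you have already read each member into a String; `checksum_mismatches` is a hypothetical helper, not part of RubyGems or Sanitize.

```ruby
require 'digest'
require 'yaml'

# Digest classes for the algorithm names used as top-level keys in checksums.yaml.
CHECKSUM_DIGESTS = {
  'SHA256' => Digest::SHA256,
  'SHA512' => Digest::SHA512
}.freeze

# Hypothetical helper: compares each digest recorded in +checksums_yaml+
# against the bytes in +members+ (member name => String of raw bytes) and
# returns the [algorithm, member name] pairs that do not match.
def checksum_mismatches(checksums_yaml, members)
  recorded = YAML.safe_load(checksums_yaml)
  mismatches = []
  recorded.each do |algorithm, entries|
    digest_class = CHECKSUM_DIGESTS.fetch(algorithm)
    entries.each do |name, expected|
      actual = digest_class.hexdigest(members.fetch(name))
      mismatches << [algorithm, name] unless actual == expected.to_s
    end
  end
  mismatches
end
```

An empty result means every recorded digest matches. Getting the members out of the archive (for example `tar -xf sanitize-5.2.1.gem` followed by `gunzip checksums.yaml.gz`) is left outside the sketch.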

data/HISTORY.md CHANGED

@@ -1,5 +1,61 @@
 # Sanitize History
+## 5.2.1 (2020-06-16)
+### Bug Fixes
+* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+  Sanitize versions 3.0.0 through 5.2.0.
+  When HTML was sanitized using the "relaxed" config or a custom config that
+  allows certain elements, some content in a `<math>` or `<svg>` element may not
+  have beeen sanitized correctly even if `math` and `svg` were not in the
+  allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+  through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+  You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+  config or a custom config that allows one or more of the following HTML
+  elements:
+    -   `iframe`
+    -   `math`
+    -   `noembed`
+    -   `noframes`
+    -   `noscript`
+    -   `plaintext`
+    -   `script`
+    -   `style`
+    -   `svg`
+    -   `xmp`
+  See the security advisory for more details, including a workaround if you're
+  not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+  Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+  helping to verify the fix.
+[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+## 5.2.0 (2020-06-06)
+### Changes
+* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+  source and documentation.
+  While the etymology of "whitelist" may not be explicitly racist in origin or
+  intent, there are inherent racial connotations in the implication that white
+  is good and black (as in "blacklist") is not.
+  This is a change I should have made long ago, and I apologize for not making
+  it sooner.
+* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+  deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+  The old keys will continue to work in order to avoid breaking existing code,
+  but they are no longer documented and may be removed in a future semver major
+  release.
 ## 5.1.0 (2019-09-07)
 ### Features
@@ -45,7 +101,7 @@ review the changes below carefully.
   - `script`
   - `style`
-* Children of whitelisted `iframe` elements are now always removed. In modern
+* Children of allowlisted `iframe` elements are now always removed. In modern
   HTML, `iframe` elements should never have children. In HTML 4 and earlier
   `iframe` elements were allowed to contain fallback content for legacy
   browsers, but it's been almost two decades since that was useful.
@@ -84,7 +140,7 @@ review the changes below carefully.
   When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
   specially crafted HTML fragment can cause libxml2 to generate improperly
-  escaped output, allowing non-whitelisted attributes to be used on whitelisted
+  escaped output, allowing non-allowlisted attributes to be used on allowlisted
   elements.
   Sanitize now performs additional escaping on affected attributes to prevent
@@ -128,7 +184,7 @@ review the changes below carefully.
 ## 4.4.0 (2016-09-29)
-* Added `srcset` to the attribute whitelist for `img` elements in the relaxed
+* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
   config. [@ejtttje - #156][156]
 [156]:https://github.com/rgrove/sanitize/pull/156
@@ -249,7 +305,7 @@ review the changes below carefully.
 ## 3.0.4 (2014-12-12)
 * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
-  caused the URL to be removed even when the protocol was whitelisted.
+  caused the URL to be removed even when the protocol was allowlisted.
   [@benubois - #126][126]
 [126]:https://github.com/rgrove/sanitize/pull/126
@@ -258,7 +314,7 @@ review the changes below carefully.
 ## 3.0.3 (2014-10-29)
 * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
-  `@media` block, causing them to be removed even when whitelist rules should
+  `@media` block, causing them to be removed even when allowlist rules should
   have allowed them to remain. [#121][121]
 [121]:https://github.com/rgrove/sanitize/issues/121
@@ -323,7 +379,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * The `clean_node!` method was renamed to `node!`.
 * The `document` method now raises a `Sanitize::Error` if the `<html>` element
-  isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
+  isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
   regardless of the `:remove_contents` config setting.
 * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -334,7 +390,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added advanced CSS sanitization support using [Crass][crass], which is fully
   compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
-  whitelisted `<style>` elements and `style` attributes in HTML will be
+  allowlisted `<style>` elements and `style` attributes in HTML will be
   sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
   sanitize CSS stylesheets or properties.
@@ -386,7 +442,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
   When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
   specially crafted HTML fragment can cause libxml2 to generate improperly
-  escaped output, allowing non-whitelisted attributes to be used on whitelisted
+  escaped output, allowing non-allowlisted attributes to be used on allowlisted
   elements.
   Sanitize now performs additional escaping on affected attributes to prevent
@@ -401,7 +457,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 2.1.0 (2014-01-13)
-* Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
+* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
   symbol `:data` instead of an attribute name in the `:attributes` config to
   indicate that arbitrary data attributes should be allowed on an element.
@@ -482,12 +538,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
   the default depth-first mode.
 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
-  elements to the whitelists for the basic and relaxed configs.
+  elements to the allowlists for the basic and relaxed configs.
 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
-  `ruby`, and `wbr` elements to the whitelist for the relaxed config.
+  `ruby`, and `wbr` elements to the allowlist for the relaxed config.
-* The `dir`, `lang`, and `title` attributes are now whitelisted for all
+* The `dir`, `lang`, and `title` attributes are now allowlisted for all
   elements in the relaxed config.
 * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -498,7 +554,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.2.1 (2010-04-20)
 * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
-  remove the contents of all non-whitelisted elements in addition to the
+  remove the contents of all non-allowlisted elements in addition to the
   elements themselves. If set to an array of element names, Sanitize will
   remove the contents of only those elements (when filtered), and leave the
   contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -526,7 +582,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
   all its children.
-* Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
+* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
   David Reese]
@@ -546,7 +602,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added a workaround for an Hpricot bug that prevents attribute names from
   being downcased in recent versions of Hpricot. This was exploitable to
-  prevent non-whitelisted protocols from being cleaned. [Reported by Ben
+  prevent non-allowlisted protocols from being cleaned. [Reported by Ben
   Wanicur]
@@ -576,7 +632,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.5 (2009-02-05)
-* Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
+* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
   protocols from being cleaned when relative URLs were allowed. [Reported by
   Dev Purkayastha]
@@ -586,7 +642,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.4 (2009-01-16)
-* Fixed a bug that made it possible to sneak a non-whitelisted element through
+* Fixed a bug that made it possible to sneak a non-allowlisted element through
   by repeating it several times in a row. All versions of Sanitize prior to
   1.0.4 are vulnerable. [Reported by Cristobal]
@@ -594,7 +650,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.3 (2009-01-15)
 * Fixed a bug whereby incomplete Unicode or hex entities could be used to
-  prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
+  prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
   still decode the incomplete entities, users of those browsers may be
   vulnerable to malicious script injection on websites using versions of
   Sanitize prior to 1.0.3.

data/README.md CHANGED

@@ -1,20 +1,19 @@
 Sanitize
 ========
-Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of acceptable
-elements, attributes, and CSS properties, Sanitize will remove all unacceptable
-HTML and/or CSS from a string.
+Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
+and/or CSS from a string except the elements, attributes, and properties you
+choose to allow.
 Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
 elements, certain attributes within those elements, and even certain URL
-protocols within attributes that contain URLs. You can also whitelist CSS
-properties, @ rules, and URL protocols you wish to allow in elements or
-attributes containing CSS. Any HTML or CSS that you don't explicitly allow will
-be removed.
+protocols within attributes that contain URLs. You can also allow specific CSS
+properties, @ rules, and URL protocols in elements or attributes containing CSS.
+Any HTML or CSS that you don't explicitly allow will be removed.
 Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
 exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your whitelist config only
+exactly the same way modern browsers do. As long as your allowlist config only
 allows safe markup and CSS, even the most malformed or malicious input will be
 transformed into safe output.
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
 * Standalone CSS stylesheets
 * Standalone CSS properties
+However, please note that Sanitize _cannot_ fully sanitize the contents of
+`<math>` or `<svg>` elements, since these elements don't follow the same parsing
+rules as the rest of HTML. If this is something you need, you may want to look
+for another solution.
 ### HTML Fragments
 A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
 # => 'foo'
 ```
-To keep certain elements, add them to the element whitelist.
+To keep certain elements, add them to the element allowlist.
 ```ruby
 Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
 ### HTML Documents
-When sanitizing a document, the `<html>` element must be whitelisted. You can
+When sanitizing a document, the `<html>` element must be allowlisted. You can
 also set `:allow_doctype` to `true` to allow well-formed document type
 definitions.
@@ -123,8 +127,8 @@ Sanitize.document(html,
 ### CSS in HTML
-To sanitize CSS in an HTML fragment or document, first whitelist the `<style>`
-element and/or the `style` attribute. Then whitelist the CSS properties,
+To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
+element and/or the `style` attribute. Then allowlist the CSS properties,
 @ rules, and URL protocols you wish to allow. You can also choose whether to
 allow CSS comments or browser compatibility hacks.
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
 ```ruby
 # Create a customized copy of the Basic config, adding <div> and <table> to the
-# existing whitelisted elements.
+# existing allowlisted elements.
 Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
   :elements        => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
   :remove_contents => true
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
 ##### :css => :properties (Array or Set)
-Whitelist of CSS property names to allow. Names should be specified in
-lowercase.
+List of CSS property names to allow. Names should be specified in lowercase.
 ##### :css => :protocols (Array or Set)
@@ -417,6 +420,12 @@ elements not in this array will be removed.
 ]
 ```
+**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
+elements, since these elements don't follow the same parsing rules as the rest
+of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
+content inside them will be allowed, even if that content would otherwise be
+removed by Sanitize.
 #### :parser_options (Hash)
 [Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
@@ -452,7 +461,7 @@ include the symbol `:relative` in the protocol array:
 #### :remove_contents (boolean or Array or Set)
-If this is `true`, Sanitize will remove the contents of any non-whitelisted
+If this is `true`, Sanitize will remove the contents of any non-allowlisted
 elements in addition to the elements themselves. By default, Sanitize leaves the
 safe parts of an element's contents behind when the element is removed.
@@ -518,33 +527,33 @@ argument a Hash that contains the following items:
   * **:config** - The current Sanitize configuration Hash.
-  * **:is_whitelisted** - `true` if the current node has been whitelisted by a
+  * **:is_allowlisted** - `true` if the current node has been allowlisted by a
     previous transformer, `false` otherwise. It's generally bad form to remove
-    a node that a previous transformer has whitelisted.
+    a node that a previous transformer has allowlisted.
   * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
     node may be an element, a text node, a comment, a CDATA node, or a document
     fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
     selectively ignore node types you aren't interested in.
+  * **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
+    document that have been allowlisted by previous transformers, if any. It's
+    generally bad form to remove a node that a previous transformer has
+    allowlisted.
   * **:node_name** - The name of the current HTML node, always lowercase (e.g.
     "div" or "span"). For non-element nodes, the name will be something like
     "text", "comment", "#cdata-section", "#document-fragment", etc.
-  * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
-    document that have been whitelisted by previous transformers, if any. It's
-    generally bad form to remove a node that a previous transformer has
-    whitelisted.
 ### Output
 A transformer doesn't have to return anything, but may optionally return a Hash,
 which may contain the following items:
-  * **:node_whitelist** -  Array or Set of specific Nokogiri::XML::Node objects
-    to add to the document's whitelist, bypassing the current Sanitize config.
-    These specific nodes and all their attributes will be whitelisted, but
-    their children will not be.
+  * **:node_allowlist** -  Array or Set of specific `Nokogiri::XML::Node`
+    objects to add to the document's allowlist, bypassing the current Sanitize
+    config. These specific nodes and all their attributes will be allowlisted,
+    but their children will not be.
 If a transformer returns anything other than a Hash, the return value will be
 ignored.
@@ -587,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
 completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
 your own hands.
-### Example: Transformer to whitelist image URLs by domain
+### Example: Transformer to allow image URLs by domain
 The following example demonstrates how to remove image elements unless they use
 a relative URL or are hosted on a specific domain. It assumes that the `<img>`
-element and its `src` attribute are already whitelisted.
+element and its `src` attribute are already allowlisted.
 ```ruby
 require 'uri'
-image_whitelist_transformer = lambda do |env|
+image_allowlist_transformer = lambda do |env|
   # Ignore everything except <img> elements.
   return unless env[:node_name] == 'img'
@@ -612,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
 end
 ```
-### Example: Transformer to whitelist YouTube video embeds
+### Example: Transformer to allow YouTube video embeds
 The following example demonstrates how to create a transformer that will safely
-whitelist valid YouTube video embeds without having to blindly allow other kinds
-of embedded content, which would be the case if you tried to do this by just
-whitelisting all `<iframe>` elements:
+allow valid YouTube video embeds without having to allow other kinds of embedded
+content, which would be the case if you tried to do this by just allowing all
+`<iframe>` elements:
 ```ruby
 youtube_transformer = lambda do |env|
   node      = env[:node]
   node_name = env[:node_name]
-  # Don't continue if this node is already whitelisted or is not an element.
-  return if env[:is_whitelisted] || !node.element?
+  # Don't continue if this node is already allowlisted or is not an element.
+  return if env[:is_allowlisted] || !node.element?
   # Don't continue unless the node is an iframe.
   return unless node_name == 'iframe'
@@ -646,8 +655,8 @@ youtube_transformer = lambda do |env|
   # Now that we're sure that this is a valid YouTube embed and that there are
   # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-  # to whitelist the current node.
-  {:node_whitelist => [node]}
+  # to allowlist the current node.
+  {:node_allowlist => [node]}
 end
 html = %[
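The transformer sections in this README hunk rename the input and output keys (`:is_allowlisted`, `:node_allowlist`), while 5.2.x still sets the deprecated ones. If a transformer has to run on both 5.1.0 and 5.2.x, one option is to read whichever flag is present and return the allowlist under both keys. The sketch below assumes that approach; `already_allowlisted?` and `BareAbbrTransformer` are hypothetical names, and the `<abbr>` rule exists only to have something to allowlist.

```ruby
# Hypothetical compatibility helper: Sanitize 5.2.x passes :is_allowlisted
# (and still sets :is_whitelisted); 5.1.0 passes only :is_whitelisted.
def already_allowlisted?(env)
  env.key?(:is_allowlisted) ? env[:is_allowlisted] : env[:is_whitelisted]
end

# Hypothetical transformer: allowlists <abbr> elements that carry no
# attributes, since an allowlisted node keeps all of its attributes.
BareAbbrTransformer = lambda do |env|
  node = env[:node]
  return if already_allowlisted?(env) || !node.element?
  return unless env[:node_name] == 'abbr' && node.attribute_nodes.empty?

  # 5.2.1 reads :node_allowlist first; 5.1.0 only reads :node_whitelist.
  { node_allowlist: [node], node_whitelist: [node] }
end
```

It would be registered through the config's `:transformers` array, like the YouTube transformer in the README.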

data/lib/sanitize.rb CHANGED

@@ -54,7 +54,7 @@ class Sanitize
   # Returns a sanitized copy of the given full _html_ document, using the
   # settings in _config_ if specified.
   #
-  # When sanitizing a document, the `<html>` element must be whitelisted or an
+  # When sanitizing a document, the `<html>` element must be allowlisted or an
   # error will be raised. If this is undesirable, you should probably use
   # {#fragment} instead.
   def self.document(html, config = {})
@@ -117,7 +117,7 @@ class Sanitize
   # Returns a sanitized copy of the given _html_ document.
   #
-  # When sanitizing a document, the `<html>` element must be whitelisted or an
+  # When sanitizing a document, the `<html>` element must be allowlisted or an
   # error will be raised. If this is undesirable, you should probably use
   # {#fragment} instead.
   def document(html)
@@ -147,20 +147,20 @@ class Sanitize
   # in place.
   #
   # If _node_ is a `Nokogiri::XML::Document`, the `<html>` element must be
-  # whitelisted or an error will be raised.
+  # allowlisted or an error will be raised.
   def node!(node)
     raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
     if node.is_a?(Nokogiri::XML::Document)
       unless @config[:elements].include?('html')
-        raise Error, 'When sanitizing a document, "<html>" must be whitelisted.'
+        raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
       end
     end
-    node_whitelist = Set.new
+    node_allowlist = Set.new
     traverse(node) do |n|
-      transform_node!(n, node_whitelist)
+      transform_node!(n, node_allowlist)
     end
     node
@@ -189,7 +189,7 @@ class Sanitize
     node.to_html(preserve_newline: true)
   end
-  def transform_node!(node, node_whitelist)
+  def transform_node!(node, node_allowlist)
     @transformers.each do |transformer|
       # Since transform_node! may be called in a tight loop to process thousands
       # of items, we can optimize both memory and CPU performance by:
@@ -199,15 +199,19 @@ class Sanitize
       # does merge! create a new hash, it is also 2.6x slower:
       # https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hashmerge-code
       config = @transformer_config
-      config[:is_whitelisted] = node_whitelist.include?(node)
+      config[:is_allowlisted] = config[:is_whitelisted] = node_allowlist.include?(node)
       config[:node] = node
       config[:node_name] = node.name.downcase
-      config[:node_whitelist] = node_whitelist
+      config[:node_allowlist] = config[:node_whitelist] = node_allowlist
       result = transformer.call(config)
-      if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
-        node_whitelist.merge(result[:node_whitelist])
+      if result.is_a?(Hash)
+        result_allowlist = result[:node_allowlist] || result[:node_whitelist]
+        if result_allowlist.respond_to?(:each)
+          node_allowlist.merge(result_allowlist)
+        end
       end
     end
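The new `transform_node!` body sets both key names on the env Hash and, when reading a transformer's result, prefers `:node_allowlist` over the deprecated `:node_whitelist`. The same precedence, pulled out into a standalone method so it can be tried without Nokogiri, might look like this; `merge_transformer_result!` is a hypothetical name, not a Sanitize method.

```ruby
require 'set'

# Hypothetical standalone version of the result handling in
# Sanitize#transform_node! (5.2.1): prefer :node_allowlist, fall back to the
# deprecated :node_whitelist, and ignore anything that isn't enumerable.
def merge_transformer_result!(node_allowlist, result)
  return node_allowlist unless result.is_a?(Hash)

  result_allowlist = result[:node_allowlist] || result[:node_whitelist]
  node_allowlist.merge(result_allowlist) if result_allowlist.respond_to?(:each)
  node_allowlist
end
```

Because of the `||`, a result carrying both keys contributes only its `:node_allowlist` value, so returning the same nodes under both keys is harmless.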

data/lib/sanitize/config/default.rb CHANGED

@@ -74,7 +74,7 @@ class Sanitize
       # the specified elements (when filtered) will be removed, and the contents
       # of all other filtered elements will be left behind.
       :remove_contents => %w[
-        iframe noembed noframes noscript script style
+        iframe math noembed noframes noscript plaintext script style svg xmp
       ],
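The default config now also removes the contents of filtered `math`, `plaintext`, `svg`, and `xmp` elements. A custom config that supplies its own `:remove_contents` list keeps that list, so it may be worth widening it to match. The sketch below assumes a plain Hash config; `with_5_2_1_remove_contents` is a hypothetical helper, not part of Sanitize.

```ruby
# Element list from :remove_contents in lib/sanitize/config/default.rb (5.2.1).
REMOVE_CONTENTS_5_2_1 = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical helper: returns a copy of +config+ whose :remove_contents
# covers at least the 5.2.1 defaults. A value of true already removes the
# contents of every filtered element, so it is left alone.
def with_5_2_1_remove_contents(config)
  current = config[:remove_contents]
  return config.dup if current == true

  names = current ? current.to_a : []
  config.merge(remove_contents: (names | REMOVE_CONTENTS_5_2_1).sort)
end
```

`:remove_contents` only applies to elements that get filtered out; adding `math` or `svg` to `:elements` still runs into the limitation described in the README warning.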
       # Transformers allow you to filter or alter nodes using custom logic. See

data/lib/sanitize/css.rb CHANGED

@@ -175,7 +175,7 @@ class Sanitize; class CSS
         next prop
       when :semicolon
-        # Only preserve the semicolon if it was preceded by a whitelisted
+        # Only preserve the semicolon if it was preceded by an allowlisted
         # property. Otherwise, omit it in order to prevent redundant semicolons.
         if preceded_by_property
           preceded_by_property = false
@@ -296,7 +296,7 @@ class Sanitize; class CSS
   end
   # Returns `true` if the given node (which may be of type `:url` or
-  # `:function`, since the CSS syntax can produce both) uses a whitelisted
+  # `:function`, since the CSS syntax can produce both) uses an allowlisted
   # protocol.
   def valid_url?(node)
     type = node[:node]
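The comment in the `:semicolon` branch describes a small piece of state: a semicolon survives only if the property just before it survived. The toy serializer below reproduces that idea on a hand-built token list. The `[:property, name, value]` and `[:semicolon]` token shapes are assumptions for illustration, not Crass's node format, and this is not Sanitize's implementation.

```ruby
# Toy serializer, not Sanitize's code. Tokens are assumed to be either
# [:property, name, value] or [:semicolon].
def serialize_allowed_properties(tokens, allowed)
  preceded_by_property = false
  out = +''
  tokens.each do |token|
    case token.first
    when :property
      _, name, value = token
      preceded_by_property = allowed.include?(name)
      out << "#{name}: #{value}" if preceded_by_property
    when :semicolon
      # Keep the semicolon only directly after a property that was kept.
      if preceded_by_property
        out << ';'
        preceded_by_property = false
      end
    end
  end
  out
end
```

Dropping the semicolon that follows a removed property is what prevents output such as `color: red;;margin: 0;`.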

data/lib/sanitize/transformers/clean_comment.rb CHANGED

@@ -6,7 +6,7 @@ class Sanitize; module Transformers
     node = env[:node]
     if node.type == Nokogiri::XML::Node::COMMENT_NODE
-      node.unlink unless env[:is_whitelisted]
+      node.unlink unless env[:is_allowlisted]
     end
   end

data/lib/sanitize/transformers/clean_css.rb CHANGED

@@ -1,6 +1,6 @@
 class Sanitize; module Transformers; module CSS
-# Enforces a CSS whitelist on the contents of `style` attributes.
+# Enforces a CSS allowlist on the contents of `style` attributes.
 class CleanAttribute
   def initialize(sanitizer_or_config)
     if Sanitize::CSS === sanitizer_or_config
@@ -14,7 +14,7 @@ class CleanAttribute
     node = env[:node]
     return unless node.type == Nokogiri::XML::Node::ELEMENT_NODE &&
-        node.key?('style') && !env[:is_whitelisted]
+        node.key?('style') && !env[:is_allowlisted]
     attr = node.attribute('style')
     css  = @scss.properties(attr.value)
@@ -27,7 +27,7 @@ class CleanAttribute
   end
 end
-# Enforces a CSS whitelist on the contents of `<style>` elements.
+# Enforces a CSS allowlist on the contents of `<style>` elements.
 class CleanElement
   def initialize(sanitizer_or_config)
     if Sanitize::CSS === sanitizer_or_config

data/lib/sanitize/transformers/clean_doctype.rb CHANGED

@@ -3,7 +3,7 @@
 class Sanitize; module Transformers
   CleanDoctype = lambda do |env|
-    return if env[:is_whitelisted]
+    return if env[:is_allowlisted]
     node = env[:node]

data/lib/sanitize/transformers/clean_element.rb CHANGED

@@ -76,11 +76,11 @@ class Sanitize; module Transformers; class CleanElement
   def call(env)
     node = env[:node]
-    return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_whitelisted]
+    return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_allowlisted]
     name = env[:node_name]
-    # Delete any element that isn't in the config whitelist, unless the node has
+    # Delete any element that isn't in the config allowlist, unless the node has
     # already been deleted from the document.
     #
     # It's important that we not try to reparent the children of a node that has
@@ -107,20 +107,20 @@ class Sanitize; module Transformers; class CleanElement
       return
     end
-    attr_whitelist = @attributes[name] || @attributes[:all]
+    attr_allowlist = @attributes[name] || @attributes[:all]
-    if attr_whitelist.nil?
-      # Delete all attributes from elements with no whitelisted attributes.
+    if attr_allowlist.nil?
+      # Delete all attributes from elements with no allowlisted attributes.
       node.attribute_nodes.each {|attr| attr.unlink }
     else
-      allow_data_attributes = attr_whitelist.include?(:data)
+      allow_data_attributes = attr_allowlist.include?(:data)
       # Delete any attribute that isn't allowed on this element.
       node.attribute_nodes.each do |attr|
         attr_name = attr.name.downcase
-        unless attr_whitelist.include?(attr_name)
-          # The attribute isn't whitelisted.
+        unless attr_allowlist.include?(attr_name)
+          # The attribute isn't allowed.
           if allow_data_attributes && attr_name.start_with?('data-')
             # Arbitrary data attributes are allowed. If this is a data
@@ -134,7 +134,7 @@ class Sanitize; module Transformers; class CleanElement
           next
         end
-        # The attribute is whitelisted.
+        # The attribute is allowed.
         # Remove any attributes that use unacceptable protocols.
         if @protocols.include?(name) && @protocols[name].include?(attr_name)
@@ -162,7 +162,7 @@ class Sanitize; module Transformers; class CleanElement
         # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
         # attempt to preserve server-side includes. This can result in XSS since
         # an unescaped double quote can allow an attacker to inject a
-        # non-whitelisted attribute.
+        # non-allowlisted attribute.
         #
         # Sanitize works around this by implementing its own escaping for
         # affected attributes, some of which can exist on any element and some
@@ -191,7 +191,7 @@ class Sanitize; module Transformers; class CleanElement
     # Element-specific special cases.
     case name
-    # If this is a whitelisted iframe that has children, remove all its
+    # If this is an allowlisted iframe that has children, remove all its
     # children. The HTML standard says iframes shouldn't have content, but when
     # they do, this content is parsed as text and is serialized verbatim without
     # being escaped, which is unsafe because legacy browsers may still render it
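The renamed `attr_allowlist` lookup in `CleanElement#call` decides which attributes survive: the element's own list if it has one, otherwise the `:all` list, with the `:data` symbol standing in for arbitrary `data-*` attributes (per the 2.1.0 HISTORY entry). A standalone approximation is sketched below. `attribute_allowed?` is a hypothetical helper; it follows the visible line `@attributes[name] || @attributes[:all]` literally, does not model how `:all` may be combined with per-element lists elsewhere in the transformer, and skips the rest of the `data-*` branch, which this hunk cuts off. Protocol checks on URL attributes are also out of scope.

```ruby
# Hypothetical approximation of CleanElement's attribute decision.
# +attributes+ mirrors the :attributes config: element name (or :all) =>
# Array of lowercase attribute names, optionally including the :data symbol.
def attribute_allowed?(attributes, element_name, attr_name)
  allowlist = attributes[element_name] || attributes[:all]
  # Elements with no allowlisted attributes lose all of them.
  return false if allowlist.nil?

  name = attr_name.downcase
  return true if allowlist.include?(name)

  # :data permits arbitrary data-* attributes. The real transformer's
  # data-* branch continues past this excerpt and is not modeled here.
  allowlist.include?(:data) && name.start_with?('data-')
end
```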

data/lib/sanitize/version.rb CHANGED

@@ -1,5 +1,5 @@
 # encoding: utf-8
 class Sanitize
-  VERSION = '5.1.0'
+  VERSION = '5.2.1'
 end
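With the version bumped to 5.2.1, an application can check at runtime whether its installed Sanitize falls in the range the 5.2.1 HISTORY entry calls affected (3.0.0 through 5.2.0). The sketch below uses RubyGems' `Gem::Version` for the comparison; `affected_by_ghsa_p4x4?` and the two constants are hypothetical names.

```ruby
require 'rubygems'

# Range given in the 5.2.1 HISTORY entry: 3.0.0 through 5.2.0 are affected.
GHSA_P4X4_FIRST_AFFECTED = Gem::Version.new('3.0.0')
GHSA_P4X4_LAST_AFFECTED  = Gem::Version.new('5.2.0')

# Hypothetical helper: true if +version+ (e.g. Sanitize::VERSION) falls in
# the affected range.
def affected_by_ghsa_p4x4?(version)
  v = Gem::Version.new(version)
  v >= GHSA_P4X4_FIRST_AFFECTED && v <= GHSA_P4X4_LAST_AFFECTED
end
```

With the gem loaded, `affected_by_ghsa_p4x4?(Sanitize::VERSION)` returns `false` on 5.2.1 and `true` on 5.1.0. Versions before 3.0.0 report `false` here only because they are outside this advisory's range; HISTORY lists other fixes for them.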

data/test/test_clean_element.rb CHANGED

@@ -162,7 +162,7 @@ describe 'Sanitize::Transformers::CleanElement' do
   }
   describe 'Default config' do
-    it 'should remove non-whitelisted elements, leaving safe contents behind' do
+    it 'should remove non-allowlisted elements, leaving safe contents behind' do
       Sanitize.fragment('foo <b>bar</b> <strong><a href="#a">baz</a></strong> quux')
         .must_equal 'foo bar baz quux'
@@ -192,21 +192,16 @@ describe 'Sanitize::Transformers::CleanElement' do
         .must_equal ''
     end
-    it 'should escape the content of removed `plaintext` elements' do
-      Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
-        .must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
-    end
-    it 'should escape the content of removed `xmp` elements' do
-      Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
-        .must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
-    end
     it 'should not preserve the content of removed `iframe` elements' do
       Sanitize.fragment('<iframe>hello! <script>alert(0)</script></iframe>')
         .must_equal ''
     end
+    it 'should not preserve the content of removed `math` elements' do
+      Sanitize.fragment('<math>hello! <script>alert(0)</script></math>')
+        .must_equal ''
+    end
     it 'should not preserve the content of removed `noembed` elements' do
       Sanitize.fragment('<noembed>hello! <script>alert(0)</script></noembed>')
         .must_equal ''
@@ -222,6 +217,11 @@ describe 'Sanitize::Transformers::CleanElement' do
         .must_equal ''
     end
+    it 'should not preserve the content of removed `plaintext` elements' do
+      Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
+        .must_equal ''
+    end
     it 'should not preserve the content of removed `script` elements' do
       Sanitize.fragment('<script>hello! <script>alert(0)</script></script>')
         .must_equal ''
@@ -232,6 +232,16 @@ describe 'Sanitize::Transformers::CleanElement' do
         .must_equal ''
     end
+    it 'should not preserve the content of removed `svg` elements' do
+      Sanitize.fragment('<svg>hello! <script>alert(0)</script></svg>')
+        .must_equal ''
+    end
+    it 'should not preserve the content of removed `xmp` elements' do
+      Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
+        .must_equal ''
+    end
     strings.each do |name, data|
       it "should clean #{name} HTML" do
         Sanitize.fragment(data[:html]).must_equal(data[:default])
@@ -315,7 +325,7 @@ describe 'Sanitize::Transformers::CleanElement' do
   end
   describe 'Custom configs' do
-    it 'should allow attributes on all elements if whitelisted under :all' do
+    it 'should allow attributes on all elements if allowlisted under :all' do
       input = '<p class="foo">bar</p>'
       Sanitize.fragment(input).must_equal ' bar '
@@ -336,7 +346,7 @@ describe 'Sanitize::Transformers::CleanElement' do
       }).must_equal input
     end
-    it "should not allow relative URLs when relative URLs aren't whitelisted" do
+    it "should not allow relative URLs when relative URLs aren't allowlisted" do
       input = '<a href="/foo/bar">Link</a>'
       Sanitize.fragment(input,
@@ -400,7 +410,7 @@ describe 'Sanitize::Transformers::CleanElement' do
       ).must_equal 'foo bar  baz hi '
     end
-    it 'should remove the contents of whitelisted iframes' do
+    it 'should remove the contents of allowlisted iframes' do
       Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
         :elements => ['iframe']
       ).must_equal '<iframe></iframe>'
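
The `test_clean_element.rb` changes switch the expectation for `plaintext` and `xmp` from "escape the removed element's content" to "drop it entirely," and they add the same expectation for `math` and `svg`. The Ruby below is a hypothetical model of those expectations only. It does not call Sanitize. The `REMOVE_CONTENTS` set is an assumption assembled from the element names exercised by the tests in this diff, not a copy of Sanitize's default config.

```ruby
require 'set'

# Assumption: element names whose contents the 5.2.1 tests expect to be
# discarded when the element is removed (inferred from this diff, not
# copied from Sanitize's default config).
REMOVE_CONTENTS = Set.new(%w[
  iframe math noembed noframes noscript plaintext script style svg xmp
]).freeze

# Minimal HTML text escaping, so the example needs no extra libraries.
HTML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&#39;'
}.freeze

# Hypothetical model, not Sanitize's API: what a removed, non-allowlisted
# element leaves behind. raw_text is treated as plain text; this toy does
# not parse nested markup the way Sanitize does.
def leftover(element_name, raw_text)
  # Elements in REMOVE_CONTENTS vanish along with everything inside them.
  return '' if REMOVE_CONTENTS.include?(element_name)

  # Other removed elements keep their text, escaped for safe HTML output.
  raw_text.gsub(/[&<>"']/, HTML_ESCAPES)
end

payload = 'hello! <script>alert(0)</script>'
p leftover('xmp', payload)  # => ""
p leftover('svg', payload)  # => ""
p leftover('b', 'bar')      # => "bar"
```

Under the 5.1.0 tests, `xmp` and `plaintext` behaved like the escaping branch: `leftover('b', payload)` returns `hello! &lt;script&gt;alert(0)&lt;/script&gt;`, which is exactly the string the removed tests expected.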

data/test/test_malicious_html.rb CHANGED

@@ -128,13 +128,15 @@ describe 'Malicious HTML' do
   # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
   # attempt to preserve server-side includes. This can result in XSS since an
-  # unescaped double quote can allow an attacker to inject a non-whitelisted
+  # unescaped double quote can allow an attacker to inject a non-allowlisted
   # attribute. Sanitize works around this by implementing its own escaping for
   # affected attributes.
   #
   # The relevant libxml2 code is here:
   # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
   describe 'unsafe libxml2 server-side includes in attributes' do
+    using_unpatched_libxml2 = Nokogiri::VersionInfo.instance.libxml2_using_system?
     tag_configs = [
       {
         tag_name: 'a',
@@ -166,6 +168,8 @@ describe 'Malicious HTML' do
         input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
         it 'should escape unsafe characters in attributes' do
+          skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
           # This uses Nokogumbo's HTML-compliant serializer rather than
           # libxml2's.
           @s.fragment(input).
@@ -191,6 +195,8 @@ describe 'Malicious HTML' do
         input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
         it 'should not escape characters unnecessarily' do
+          skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
           # This uses Nokogumbo's HTML-compliant serializer rather than
           # libxml2's.
           @s.fragment(input).
@@ -213,4 +219,17 @@ describe 'Malicious HTML' do
       end
     end
   end
+  # https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+  describe 'foreign content bypass in relaxed config' do
+    it 'prevents a sanitization bypass via carefully crafted foreign content' do
+      %w[iframe noembed noframes noscript plaintext script style xmp].each do |tag_name|
+        @s.fragment(%[<math><#{tag_name}>/*&lt;/#{tag_name}&gt;&lt;img src onerror=alert(1)>*/]).
+          must_equal ''
+        @s.fragment(%[<svg><#{tag_name}>/*&lt;/#{tag_name}&gt;&lt;img src onerror=alert(1)>*/]).
+          must_equal ''
+      end
+    end
+  end
 end
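
The new foreign-content test feeds every combination of eight element names and two foreign-content roots (`<math>`, `<svg>`) to the relaxed config and expects an empty result each time. The sketch below only enumerates those payload strings, in the same order as the test, so the 16-case matrix is explicit. It does not sanitize anything, and the constant and method names are made up for illustration.

```ruby
# Element names and roots copied from the 'foreign content bypass' test.
TAG_NAMES = %w[iframe noembed noframes noscript plaintext script style xmp].freeze
FOREIGN_ROOTS = %w[math svg].freeze

# Hypothetical helper: build the strings the test passes to @s.fragment.
# Each one is expected to sanitize to '' in 5.2.1.
def foreign_content_payloads
  TAG_NAMES.flat_map do |tag|
    FOREIGN_ROOTS.map do |root|
      %[<#{root}><#{tag}>/*&lt;/#{tag}&gt;&lt;img src onerror=alert(1)>*/]
    end
  end
end

payloads = foreign_content_payloads
puts payloads.size   # => 16
puts payloads.first  # => <math><iframe>/*&lt;/iframe&gt;&lt;img src onerror=alert(1)>*/
```

Informally, the entity-encoded `&lt;/tag&gt;` is the interesting part: if the inner element were emitted as HTML raw text after its foreign-content parent is stripped, the decoded `</tag><img ...>` could be re-parsed as live markup. The linked advisory is the authoritative description.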

data/test/test_parser.rb CHANGED

@@ -55,7 +55,7 @@ describe 'Parser' do
             siblings << env[:node][:id]
           end
-          return {:node_whitelist => [env[:node]]}
+          return {:node_allowlist => [env[:node]]}
       })
       # All siblings should be traversed, and in the order added.

data/test/test_sanitize.rb CHANGED

@@ -150,7 +150,7 @@ describe 'Sanitize' do
         frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
       end
-      describe "when the given node is a document and <html> isn't whitelisted" do
+      describe "when the given node is a document and <html> isn't allowlisted" do
         it 'should raise a Sanitize::Error' do
           doc = Nokogiri::HTML5.parse('foo')
           proc { @s.node!(doc) }.must_raise Sanitize::Error

data/test/test_sanitize_css.rb CHANGED

@@ -21,7 +21,7 @@ describe 'Sanitize::CSS' do
         @custom.properties(css).must_equal 'background: #fff; '
       end
-      it 'should allow whitelisted URL protocols' do
+      it 'should allow allowlisted URL protocols' do
         [
           "background: url(relative.jpg)",
           "background: url('relative.jpg')",
@@ -36,7 +36,7 @@ describe 'Sanitize::CSS' do
         end
       end
-      it 'should not allow non-whitelisted URL protocols' do
+      it 'should not allow non-allowlisted URL protocols' do
         [
           "background: url(javascript:alert(0))",
           "background: url(ja\\56 ascript:alert(0))",
@@ -307,7 +307,7 @@ describe 'Sanitize::CSS' do
     end
     describe ":at_rules" do
-      it "should remove blockless at-rules that aren't whitelisted" do
+      it "should remove blockless at-rules that aren't allowlisted" do
         css = %[
           @charset 'utf-8';
           @import url('foo.css');
@@ -319,7 +319,7 @@ describe 'Sanitize::CSS' do
         ].strip
       end
-      describe "when blockless at-rules are whitelisted" do
+      describe "when blockless at-rules are allowlisted" do
         before do
           @scss = Sanitize::CSS.new(Sanitize::Config.merge(Sanitize::Config::RELAXED[:css], {
             :at_rules => ['charset', 'import']

data/test/test_transformers.rb CHANGED

@@ -12,11 +12,13 @@ describe 'Transformers' do
         return unless env[:node].element?
         env[:config][:foo].must_equal :bar
-        env[:is_whitelisted].must_equal false
+        env[:is_allowlisted].must_equal false
+        env[:is_whitelisted].must_equal env[:is_allowlisted]
         env[:node].must_be_kind_of Nokogiri::XML::Node
         env[:node_name].must_equal 'span'
-        env[:node_whitelist].must_be_kind_of Set
-        env[:node_whitelist].must_be_empty
+        env[:node_allowlist].must_be_kind_of Set
+        env[:node_allowlist].must_be_empty
+        env[:node_whitelist].must_equal env[:node_allowlist]
       }
     )
   end
@@ -43,34 +45,38 @@ describe 'Transformers' do
     nodes.must_equal %w[div span strong b p]
   end
-  it 'should whitelist nodes in the node whitelist' do
+  it 'should allowlist nodes in the node allowlist' do
     Sanitize.fragment('<div class="foo">foo</div><span>bar</span>',
       :transformers => [
         proc {|env|
-          {:node_whitelist => [env[:node]]} if env[:node_name] == 'div'
+          {:node_allowlist => [env[:node]]} if env[:node_name] == 'div'
         },
         proc {|env|
-          env[:is_whitelisted].must_equal false unless env[:node_name] == 'div'
-          env[:is_whitelisted].must_equal true if env[:node_name] == 'div'
-          env[:node_whitelist].must_include env[:node] if env[:node_name] == 'div'
+          env[:is_allowlisted].must_equal false unless env[:node_name] == 'div'
+          env[:is_allowlisted].must_equal true if env[:node_name] == 'div'
+          env[:node_allowlist].must_include env[:node] if env[:node_name] == 'div'
+          env[:is_whitelisted].must_equal env[:is_allowlisted]
+          env[:node_whitelist].must_equal env[:node_allowlist]
         }
       ]
     ).must_equal '<div class="foo">foo</div>bar'
   end
-  it 'should clear the node whitelist after each fragment' do
+  it 'should clear the node allowlist after each fragment' do
     called = false
     Sanitize.fragment('<div>foo</div>',
-      :transformers => proc {|env| {:node_whitelist => [env[:node]]}}
+      :transformers => proc {|env| {:node_allowlist => [env[:node]]}}
     )
     Sanitize.fragment('<div>foo</div>',
       :transformers => proc {|env|
         called = true
-        env[:is_whitelisted].must_equal false
-        env[:node_whitelist].must_be_empty
+        env[:is_allowlisted].must_equal false
+        env[:is_whitelisted].must_equal env[:is_allowlisted]
+        env[:node_allowlist].must_be_empty
+        env[:node_whitelist].must_equal env[:node_allowlist]
       }
     )
@@ -83,10 +89,10 @@ describe 'Transformers' do
       .must_equal(' foo ')
   end
-  describe 'Image whitelist transformer' do
+  describe 'Image allowlist transformer' do
     require 'uri'
-    image_whitelist_transformer = lambda do |env|
+    image_allowlist_transformer = lambda do |env|
       # Ignore everything except <img> elements.
       return unless env[:node_name] == 'img'
@@ -103,7 +109,7 @@ describe 'Transformers' do
     before do
       @s = Sanitize.new(Sanitize::Config.merge(Sanitize::Config::RELAXED,
-          :transformers => image_whitelist_transformer))
+          :transformers => image_allowlist_transformer))
     end
     it 'should allow images with relative URLs' do
@@ -142,8 +148,8 @@ describe 'Transformers' do
       node      = env[:node]
       node_name = env[:node_name]
-      # Don't continue if this node is already whitelisted or is not an element.
-      return if env[:is_whitelisted] || !node.element?
+      # Don't continue if this node is already allowlisted or is not an element.
+      return if env[:is_allowlisted] || !node.element?
       # Don't continue unless the node is an iframe.
       return unless node_name == 'iframe'
@@ -164,8 +170,8 @@ describe 'Transformers' do
       # Now that we're sure that this is a valid YouTube embed and that there are
       # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-      # to whitelist the current node.
-      {:node_whitelist => [node]}
+      # to allowlist the current node.
+      {:node_allowlist => [node]}
     end
     it 'should allow HTTP YouTube video embeds' do
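
Throughout `test_transformers.rb`, the new assertions require the deprecated `:is_whitelisted` and `:node_whitelist` env keys to mirror `:is_allowlisted` and `:node_allowlist`. The following is a minimal sketch of one way to provide such aliases. It is hypothetical: `build_env` and `apply_result!` are not Sanitize methods, plain strings stand in for Nokogiri nodes, and accepting the old `:node_whitelist` return key is an assumption based on the rename rather than something these tests check.

```ruby
require 'set'

# Hypothetical sketch, not Sanitize's source. Strings stand in for
# Nokogiri nodes so the example runs without any gems.

# Build a transformer env that exposes the new allowlist keys plus the
# deprecated whitelist keys as aliases of the same values.
def build_env(node, node_allowlist)
  is_allowlisted = node_allowlist.include?(node)
  {
    node: node,
    is_allowlisted: is_allowlisted,
    is_whitelisted: is_allowlisted,  # deprecated alias
    node_allowlist: node_allowlist,
    node_whitelist: node_allowlist   # deprecated alias (same Set object)
  }
end

# Fold a transformer's return value into the allowlist. Accepting the old
# :node_whitelist key here is an assumption, not confirmed by the tests.
def apply_result!(node_allowlist, result)
  return node_allowlist unless result.is_a?(Hash)

  nodes = result[:node_allowlist] || result[:node_whitelist]
  node_allowlist.merge(nodes) if nodes.respond_to?(:each)
  node_allowlist
end

allowlist = Set.new
apply_result!(allowlist, { node_whitelist: ['div'] })  # old-style key
env = build_env('div', allowlist)
p env[:is_allowlisted]                               # => true
p env[:node_whitelist].equal?(env[:node_allowlist])  # => true
```

Pointing both keys at the same `Set` object keeps the aliases consistent even if a transformer mutates the set in place.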

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-  version: 5.1.0
+  version: 5.2.1
 platform: ruby
 authors:
 - Ryan Grove
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-09-08 00:00:00.000000000 Z
+date: 2020-06-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: crass
@@ -80,9 +80,9 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 12.3.1
-description: Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of
-  acceptable elements, attributes, and CSS properties, Sanitize will remove all unacceptable
-  HTML and/or CSS from a string.
+description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
+  HTML and/or CSS from a string except the elements, attributes, and properties you
+  choose to allow.
 email: ryan@wonko.com
 executables: []
 extensions: []
@@ -135,8 +135,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: 1.2.0
 requirements: []
-rubygems_version: 3.0.3
+rubygems_version: 3.1.2
 signing_key:
 specification_version: 4
-summary: Whitelist-based HTML and CSS sanitizer.
+summary: Allowlist-based HTML and CSS sanitizer.
 test_files: []