sanitize 5.1.0 → 6.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 8cf7bac25cea64ed464d106bdc57019388598ca9f1a4e7d8eddf3a98bab12267
4
- data.tar.gz: e8b1f402b0d67a825b0ad4aad83829816fd9c78cd8445879636cba0a282e8ee5
3
+ metadata.gz: 819d713b2d4a78519e8bd4f2f853d6558d93ffd2d0481e10d012d8f74afbb555
4
+ data.tar.gz: 04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90
5
5
  SHA512:
6
- metadata.gz: 956edaca6569a5933223da0aa7dcac4880b5164aa59e37256ac896c9fefb271da71425defe7e09e241b1333b441f5a2629893abed6d5a2a47d0726bf03597614
7
- data.tar.gz: e45a018b904bcf8cb996f8ed08427e80b8ce058c4fe414782460c5496e88bb6c2a4055304118057621a630e514b4f96bac11bdc686181a6f0097dc7bf912ab04
6
+ metadata.gz: ed59ea47cc4a620ccf61be3443ef97036a877903bbc90fa855936e57446e34b92f5b9eb41ed9a026e17779fa473ce10d066986c1dd986c58381dae22bb7c9905
7
+ data.tar.gz: 27b40d2033ecd346c299bb77a7788b5325b79edd39c4767c9e5bf27486cf29bf2a5f3b34f96def645bbefd325b0e51a27182b75f187d2eb00931542769cd8c37
data/HISTORY.md CHANGED
@@ -1,5 +1,142 @@
1
1
  # Sanitize History
2
2
 
3
+ ## 6.0.1 (2023-01-27)
4
+
5
+ ### Bug Fixes
6
+
7
+ * Sanitize now always removes `<noscript>` elements and their contents, even
8
+ when `noscript` is in the allowlist.
9
+
10
+ This fixes a sanitization bypass that could occur when `noscript` was allowed
11
+ by a custom allowlist. In this scenario, carefully crafted input could sneak
12
+ arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
13
+ scripting) attack.
14
+
15
+ Sanitize's default configs don't allow `<noscript>` elements and are not
16
+ vulnerable. This issue only affects users who are using a custom config that
17
+ adds `noscript` to the element allowlist.
18
+
19
+ The root cause of this issue is that HTML parsing rules treat the contents of
20
+ a `<noscript>` element differently depending on whether scripting is enabled
21
+ in the user agent. Nokogiri doesn't support scripting so it follows the
22
+ "scripting disabled" rules, but a web browser with scripting enabled will
23
+ follow the "scripting enabled" rules. This means that Sanitize can't reliably
24
+ make the contents of a `<noscript>` element safe for scripting-enabled
25
+ browsers, so the safest thing to do is to remove the element and its contents
26
+ entirely.
27
+
28
+ See the following security advisory for additional details:
29
+ [GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
30
+
31
+ Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
32
+ (@leeN) for reporting this issue.
33
+
34
+ * Fixed an edge case in which the contents of an "unescaped text" element (such
35
+ as `<noembed>` or `<xmp>`) were not properly escaped if that element was
36
+ allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
37
+
38
+ The only way to encounter this situation was to ignore multiple warnings in
39
+ the readme and create a custom config that allowlisted all the elements
40
+ involved, including `<math>` or `<svg>`. If you're using a default config or
41
+ if you heeded the warnings about MathML and SVG not being supported, you're
42
+ not affected by this issue.
43
+
44
+ Please let this be a reminder that Sanitize cannot safely sanitize MathML or
45
+ SVG content and does not support this use case. The default configs don't
46
+ allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
47
+ custom config may create a security vulnerability in your application.
48
+
49
+ Documentation has been updated to add more warnings and to make the existing
50
+ warnings about this more prominent.
51
+
52
+ Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
53
+ (@leeN) for reporting this issue.
54
+
55
+ ## 6.0.0 (2021-08-03)
56
+
57
+ ### Potentially Breaking Changes
58
+
59
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
60
+
61
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
62
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
63
+
64
+ [211]:https://github.com/rgrove/sanitize/pull/211
65
+
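The two minimums in the 6.0.0 notes can be checked before upgrading. The following is a minimal sketch, assuming you pass in the version strings yourself; the helper name `meets_sanitize_6_requirements?` is hypothetical and is not part of Sanitize or Nokogiri.

```ruby
require 'rubygems'

# Minimums stated in the 6.0.0 notes above.
MIN_RUBY     = Gem::Version.new('2.5.0')
MIN_NOKOGIRI = Gem::Version.new('1.12.0')

# Hypothetical preflight check (not part of Sanitize): takes two version
# strings and reports whether both meet the documented minimums.
def meets_sanitize_6_requirements?(ruby_version, nokogiri_version)
  Gem::Version.new(ruby_version) >= MIN_RUBY &&
    Gem::Version.new(nokogiri_version) >= MIN_NOKOGIRI
end

# Example: check the running interpreter against an assumed Nokogiri version.
puts meets_sanitize_6_requirements?(RUBY_VERSION, '1.12.0')
```

`Gem::Version` is used here instead of plain string comparison because strings would order `'1.9.0'` after `'1.12.0'`.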
66
+ ## 5.2.3 (2021-01-11)
67
+
68
+ ### Bug Fixes
69
+
70
+ * Ensure protocol sanitization is applied to data attributes.
71
+ [@ccutrer - #207][207]
72
+
73
+ [207]:https://github.com/rgrove/sanitize/pull/207
74
+
75
+ ## 5.2.2 (2021-01-06)
76
+
77
+ ### Bug Fixes
78
+
79
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
80
+ custom transformer. [@mscrivo - #206][206]
81
+
82
+ [206]:https://github.com/rgrove/sanitize/pull/206
83
+
84
+ ## 5.2.1 (2020-06-16)
85
+
86
+ ### Bug Fixes
87
+
88
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
89
+ Sanitize versions 3.0.0 through 5.2.0.
90
+
91
+ When HTML was sanitized using the "relaxed" config or a custom config that
92
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
93
+ have been sanitized correctly even if `math` and `svg` were not in the
94
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
95
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
96
+
97
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
98
+ config or a custom config that allows one or more of the following HTML
99
+ elements:
100
+
101
+ - `iframe`
102
+ - `math`
103
+ - `noembed`
104
+ - `noframes`
105
+ - `noscript`
106
+ - `plaintext`
107
+ - `script`
108
+ - `style`
109
+ - `svg`
110
+ - `xmp`
111
+
112
+ See the security advisory for more details, including a workaround if you're
113
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
114
+
115
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
116
+ helping to verify the fix.
117
+
118
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
119
+
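One way to act on the element list in the 5.2.1 notes is to audit a custom config for those names. This is only a sketch: `RISKY_ELEMENTS` and `risky_allowlisted_elements` are hypothetical names, not part of Sanitize, and the config is assumed to be a plain Hash with an `:elements` key as shown in the README.

```ruby
require 'set'

# Element names listed in the 5.2.1 notes above as making a config likely
# vulnerable in Sanitize 3.0.0 through 5.2.0.
RISKY_ELEMENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].to_set.freeze

# Hypothetical audit helper (not part of Sanitize): returns the risky element
# names that appear in a config Hash's :elements allowlist, sorted.
def risky_allowlisted_elements(config)
  allowed = Array(config[:elements]).map { |name| name.to_s.downcase }
  allowed.select { |name| RISKY_ELEMENTS.include?(name) }.uniq.sort
end

custom_config = { :elements => %w[b em svg noscript] }
puts risky_allowlisted_elements(custom_config).inspect
# prints ["noscript", "svg"]
```

An empty result does not prove a config is safe; it only shows that none of the names from this particular advisory are allowlisted.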
120
+ ## 5.2.0 (2020-06-06)
121
+
122
+ ### Changes
123
+
124
+ * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
125
+ source and documentation.
126
+
127
+ While the etymology of "whitelist" may not be explicitly racist in origin or
128
+ intent, there are inherent racial connotations in the implication that white
129
+ is good and black (as in "blacklist") is not.
130
+
131
+ This is a change I should have made long ago, and I apologize for not making
132
+ it sooner.
133
+
134
+ * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
135
+ deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
136
+ The old keys will continue to work in order to avoid breaking existing code,
137
+ but they are no longer documented and may be removed in a future semver major
138
+ release.
139
+
3
140
  ## 5.1.0 (2019-09-07)
4
141
 
5
142
  ### Features
@@ -45,7 +182,7 @@ review the changes below carefully.
45
182
  - `script`
46
183
  - `style`
47
184
 
48
- * Children of whitelisted `iframe` elements are now always removed. In modern
185
+ * Children of allowlisted `iframe` elements are now always removed. In modern
49
186
  HTML, `iframe` elements should never have children. In HTML 4 and earlier
50
187
  `iframe` elements were allowed to contain fallback content for legacy
51
188
  browsers, but it's been almost two decades since that was useful.
@@ -84,7 +221,7 @@ review the changes below carefully.
84
221
 
85
222
  When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
86
223
  specially crafted HTML fragment can cause libxml2 to generate improperly
87
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
224
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
88
225
  elements.
89
226
 
90
227
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -128,7 +265,7 @@ review the changes below carefully.
128
265
 
129
266
  ## 4.4.0 (2016-09-29)
130
267
 
131
- * Added `srcset` to the attribute whitelist for `img` elements in the relaxed
268
+ * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
132
269
  config. [@ejtttje - #156][156]
133
270
 
134
271
  [156]:https://github.com/rgrove/sanitize/pull/156
@@ -249,7 +386,7 @@ review the changes below carefully.
249
386
  ## 3.0.4 (2014-12-12)
250
387
 
251
388
  * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
252
- caused the URL to be removed even when the protocol was whitelisted.
389
+ caused the URL to be removed even when the protocol was allowlisted.
253
390
  [@benubois - #126][126]
254
391
 
255
392
  [126]:https://github.com/rgrove/sanitize/pull/126
@@ -258,7 +395,7 @@ review the changes below carefully.
258
395
  ## 3.0.3 (2014-10-29)
259
396
 
260
397
  * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
261
- `@media` block, causing them to be removed even when whitelist rules should
398
+ `@media` block, causing them to be removed even when allowlist rules should
262
399
  have allowed them to remain. [#121][121]
263
400
 
264
401
  [121]:https://github.com/rgrove/sanitize/issues/121
@@ -323,7 +460,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
323
460
  * The `clean_node!` method was renamed to `node!`.
324
461
 
325
462
  * The `document` method now raises a `Sanitize::Error` if the `<html>` element
326
- isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
463
+ isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
327
464
  regardless of the `:remove_contents` config setting.
328
465
 
329
466
  * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -334,7 +471,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
334
471
 
335
472
  * Added advanced CSS sanitization support using [Crass][crass], which is fully
336
473
  compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
337
- whitelisted `<style>` elements and `style` attributes in HTML will be
474
+ allowlisted `<style>` elements and `style` attributes in HTML will be
338
475
  sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
339
476
  sanitize CSS stylesheets or properties.
340
477
 
@@ -386,7 +523,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
386
523
 
387
524
  When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
388
525
  specially crafted HTML fragment can cause libxml2 to generate improperly
389
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
526
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
390
527
  elements.
391
528
 
392
529
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -401,7 +538,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
401
538
 
402
539
  ## 2.1.0 (2014-01-13)
403
540
 
404
- * Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
541
+ * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
405
542
  symbol `:data` instead of an attribute name in the `:attributes` config to
406
543
  indicate that arbitrary data attributes should be allowed on an element.
407
544
 
@@ -482,12 +619,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
482
619
  the default depth-first mode.
483
620
 
484
621
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
485
- elements to the whitelists for the basic and relaxed configs.
622
+ elements to the allowlists for the basic and relaxed configs.
486
623
 
487
624
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
488
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
625
+ `ruby`, and `wbr` elements to the allowlist for the relaxed config.
489
626
 
490
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
627
+ * The `dir`, `lang`, and `title` attributes are now allowlisted for all
491
628
  elements in the relaxed config.
492
629
 
493
630
  * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -498,7 +635,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
498
635
  ## 1.2.1 (2010-04-20)
499
636
 
500
637
  * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
501
- remove the contents of all non-whitelisted elements in addition to the
638
+ remove the contents of all non-allowlisted elements in addition to the
502
639
  elements themselves. If set to an array of element names, Sanitize will
503
640
  remove the contents of only those elements (when filtered), and leave the
504
641
  contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -526,7 +663,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
526
663
  * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
527
664
  all its children.
528
665
 
529
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
666
+ * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
530
667
  David Reese]
531
668
 
532
669
 
@@ -546,7 +683,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
546
683
 
547
684
  * Added a workaround for an Hpricot bug that prevents attribute names from
548
685
  being downcased in recent versions of Hpricot. This was exploitable to
549
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
686
+ prevent non-allowlisted protocols from being cleaned. [Reported by Ben
550
687
  Wanicur]
551
688
 
552
689
 
@@ -576,7 +713,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
576
713
 
577
714
  ## 1.0.5 (2009-02-05)
578
715
 
579
- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
716
+ * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
580
717
  protocols from being cleaned when relative URLs were allowed. [Reported by
581
718
  Dev Purkayastha]
582
719
 
@@ -586,7 +723,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
586
723
 
587
724
  ## 1.0.4 (2009-01-16)
588
725
 
589
- * Fixed a bug that made it possible to sneak a non-whitelisted element through
726
+ * Fixed a bug that made it possible to sneak a non-allowlisted element through
590
727
  by repeating it several times in a row. All versions of Sanitize prior to
591
728
  1.0.4 are vulnerable. [Reported by Cristobal]
592
729
 
@@ -594,7 +731,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
594
731
  ## 1.0.3 (2009-01-15)
595
732
 
596
733
  * Fixed a bug whereby incomplete Unicode or hex entities could be used to
597
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
734
+ prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
598
735
  still decode the incomplete entities, users of those browsers may be
599
736
  vulnerable to malicious script injection on websites using versions of
600
737
  Sanitize prior to 1.0.3.
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2015 Ryan Grove <ryan@wonko.com>
1
+ Copyright (c) 2021 Ryan Grove <ryan@wonko.com>
2
2
 
3
3
  Permission is hereby granted, free of charge, to any person obtaining a copy of
4
4
  this software and associated documentation files (the 'Software'), to deal in
data/README.md CHANGED
@@ -1,38 +1,36 @@
1
1
  Sanitize
2
2
  ========
3
3
 
4
- Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of acceptable
5
- elements, attributes, and CSS properties, Sanitize will remove all unacceptable
6
- HTML and/or CSS from a string.
4
+ Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
5
+ and/or CSS from a string except the elements, attributes, and properties you
6
+ choose to allow.
7
7
 
8
8
  Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
9
9
  elements, certain attributes within those elements, and even certain URL
10
- protocols within attributes that contain URLs. You can also whitelist CSS
11
- properties, @ rules, and URL protocols you wish to allow in elements or
12
- attributes containing CSS. Any HTML or CSS that you don't explicitly allow will
13
- be removed.
14
-
15
- Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
16
- exactly the same way modern browsers do, and [Crass][crass], which parses CSS
17
- exactly the same way modern browsers do. As long as your whitelist config only
18
- allows safe markup and CSS, even the most malformed or malicious input will be
19
- transformed into safe output.
20
-
21
- [![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
10
+ protocols within attributes that contain URLs. You can also allow specific CSS
11
+ properties, @ rules, and URL protocols in elements or attributes containing CSS.
12
+ Any HTML or CSS that you don't explicitly allow will be removed.
13
+
14
+ Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
15
+ the same way modern browsers do, and [Crass][crass], which parses CSS the same
16
+ way modern browsers do. As long as your allowlist config only allows safe markup
17
+ and CSS, even the most malformed or malicious input will be transformed into
18
+ safe output.
19
+
22
20
  [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
21
+ [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
23
22
 
24
23
  [crass]:https://github.com/rgrove/crass
25
- [gumbo]:https://github.com/google/gumbo-parser
24
+ [nokogiri]:https://github.com/sparklemotion/nokogiri
26
25
 
27
26
  Links
28
27
  -----
29
28
 
30
29
  * [Home](https://github.com/rgrove/sanitize/)
31
- * [API Docs](http://rubydoc.info/github/rgrove/sanitize/master)
30
+ * [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
32
31
  * [Issues](https://github.com/rgrove/sanitize/issues)
33
- * [Release History](https://github.com/rgrove/sanitize/blob/master/HISTORY.md#sanitize-history)
34
- * [Online Demo](https://sanitize.herokuapp.com/)
35
- * [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
32
+ * [Release History](https://github.com/rgrove/sanitize/releases)
33
+ * [Online Demo](https://sanitize-web.fly.dev/)
36
34
 
37
35
  Installation
38
36
  -------------
@@ -73,6 +71,12 @@ Sanitize can sanitize the following types of input:
73
71
  * Standalone CSS stylesheets
74
72
  * Standalone CSS properties
75
73
 
74
+ > **Warning**
75
+ >
76
+ > Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
77
+ >
78
+ > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
79
+
76
80
  ### HTML Fragments
77
81
 
78
82
  A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
88
92
  # => 'foo'
89
93
  ```
90
94
 
91
- To keep certain elements, add them to the element whitelist.
95
+ To keep certain elements, add them to the element allowlist.
92
96
 
93
97
  ```ruby
94
98
  Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
97
101
 
98
102
  ### HTML Documents
99
103
 
100
- When sanitizing a document, the `<html>` element must be whitelisted. You can
104
+ When sanitizing a document, the `<html>` element must be allowlisted. You can
101
105
  also set `:allow_doctype` to `true` to allow well-formed document type
102
106
  definitions.
103
107
 
@@ -123,8 +127,8 @@ Sanitize.document(html,
123
127
 
124
128
  ### CSS in HTML
125
129
 
126
- To sanitize CSS in an HTML fragment or document, first whitelist the `<style>`
127
- element and/or the `style` attribute. Then whitelist the CSS properties,
130
+ To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
131
+ element and/or the `style` attribute. Then allowlist the CSS properties,
128
132
  @ rules, and URL protocols you wish to allow. You can also choose whether to
129
133
  allow CSS comments or browser compatibility hacks.
130
134
 
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
267
271
 
268
272
  ```ruby
269
273
  # Create a customized copy of the Basic config, adding <div> and <table> to the
270
- # existing whitelisted elements.
274
+ # existing allowlisted elements.
271
275
  Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
272
276
  :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
273
277
  :remove_contents => true
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
395
399
 
396
400
  ##### :css => :properties (Array or Set)
397
401
 
398
- Whitelist of CSS property names to allow. Names should be specified in
399
- lowercase.
402
+ List of CSS property names to allow. Names should be specified in lowercase.
400
403
 
401
404
  ##### :css => :protocols (Array or Set)
402
405
 
@@ -417,9 +420,21 @@ elements not in this array will be removed.
417
420
  ]
418
421
  ```
419
422
 
423
+ > **Warning**
424
+ >
425
+ > Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
426
+ >
427
+ > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
428
+
429
+ > **Note**
430
+ >
431
+ > Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
432
+ >
433
+ > This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
434
+
420
435
  #### :parser_options (Hash)
421
436
 
422
- [Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
437
+ [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
423
438
 
424
439
  ```ruby
425
440
  :parser_options => {
@@ -452,7 +467,7 @@ include the symbol `:relative` in the protocol array:
452
467
 
453
468
  #### :remove_contents (boolean or Array or Set)
454
469
 
455
- If this is `true`, Sanitize will remove the contents of any non-whitelisted
470
+ If this is `true`, Sanitize will remove the contents of any non-allowlisted
456
471
  elements in addition to the elements themselves. By default, Sanitize leaves the
457
472
  safe parts of an element's contents behind when the element is removed.
458
473
 
@@ -460,7 +475,7 @@ If this is an Array or Set of element names, then only the contents of the
460
475
  specified elements (when filtered) will be removed, and the contents of all
461
476
  other filtered elements will be left behind.
462
477
 
463
- The default value is `false`.
478
+ The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
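The three accepted forms of `:remove_contents` can be summarized with a small model. This is an illustration of the rules described above, not Sanitize's own code; `drop_contents?` and `DEFAULT_REMOVE_CONTENTS` are hypothetical names, and the function only answers the question for an element that is already being filtered out.

```ruby
require 'set'

# Default value quoted in the README text above.
DEFAULT_REMOVE_CONTENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Illustrative model only (not Sanitize's code): given the name of an element
# that is being filtered out, decide whether its contents are dropped too.
def drop_contents?(element_name, remove_contents = DEFAULT_REMOVE_CONTENTS)
  case remove_contents
  when true, false then remove_contents                       # all or nothing
  when Array, Set  then remove_contents.include?(element_name) # named elements only
  else false
  end
end

puts drop_contents?('script')     # prints true
puts drop_contents?('div')        # prints false
puts drop_contents?('div', true)  # prints true
```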
464
479
 
465
480
  #### :transformers (Array or callable)
466
481
 
@@ -518,33 +533,33 @@ argument a Hash that contains the following items:
518
533
 
519
534
  * **:config** - The current Sanitize configuration Hash.
520
535
 
521
- * **:is_whitelisted** - `true` if the current node has been whitelisted by a
536
+ * **:is_allowlisted** - `true` if the current node has been allowlisted by a
522
537
  previous transformer, `false` otherwise. It's generally bad form to remove
523
- a node that a previous transformer has whitelisted.
538
+ a node that a previous transformer has allowlisted.
524
539
 
525
540
  * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
526
541
  node may be an element, a text node, a comment, a CDATA node, or a document
527
542
  fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
528
543
  selectively ignore node types you aren't interested in.
529
544
 
545
+ * **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
546
+ document that have been allowlisted by previous transformers, if any. It's
547
+ generally bad form to remove a node that a previous transformer has
548
+ allowlisted.
549
+
530
550
  * **:node_name** - The name of the current HTML node, always lowercase (e.g.
531
551
  "div" or "span"). For non-element nodes, the name will be something like
532
552
  "text", "comment", "#cdata-section", "#document-fragment", etc.
533
553
 
534
- * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
535
- document that have been whitelisted by previous transformers, if any. It's
536
- generally bad form to remove a node that a previous transformer has
537
- whitelisted.
538
-
539
554
  ### Output
540
555
 
541
556
  A transformer doesn't have to return anything, but may optionally return a Hash,
542
557
  which may contain the following items:
543
558
 
544
- * **:node_whitelist** - Array or Set of specific Nokogiri::XML::Node objects
545
- to add to the document's whitelist, bypassing the current Sanitize config.
546
- These specific nodes and all their attributes will be whitelisted, but
547
- their children will not be.
559
+ * **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
560
+ objects to add to the document's allowlist, bypassing the current Sanitize
561
+ config. These specific nodes and all their attributes will be allowlisted,
562
+ but their children will not be.
548
563
 
549
564
  If a transformer returns anything other than a Hash, the return value will be
550
565
  ignored.
@@ -587,16 +602,16 @@ Transformers have a tremendous amount of power, including the power to
587
602
  completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
588
603
  your own hands.
589
604
 
590
- ### Example: Transformer to whitelist image URLs by domain
605
+ ### Example: Transformer to allow image URLs by domain
591
606
 
592
607
  The following example demonstrates how to remove image elements unless they use
593
608
  a relative URL or are hosted on a specific domain. It assumes that the `<img>`
594
- element and its `src` attribute are already whitelisted.
609
+ element and its `src` attribute are already allowlisted.
595
610
 
596
611
  ```ruby
597
612
  require 'uri'
598
613
 
599
- image_whitelist_transformer = lambda do |env|
614
+ image_allowlist_transformer = lambda do |env|
600
615
  # Ignore everything except <img> elements.
601
616
  return unless env[:node_name] == 'img'
602
617
 
@@ -612,20 +627,20 @@ image_whitelist_transformer = lambda do |env|
612
627
  end
613
628
  ```
614
629
 
615
- ### Example: Transformer to whitelist YouTube video embeds
630
+ ### Example: Transformer to allow YouTube video embeds
616
631
 
617
632
  The following example demonstrates how to create a transformer that will safely
618
- whitelist valid YouTube video embeds without having to blindly allow other kinds
619
- of embedded content, which would be the case if you tried to do this by just
620
- whitelisting all `<iframe>` elements:
633
+ allow valid YouTube video embeds without having to allow other kinds of embedded
634
+ content, which would be the case if you tried to do this by just allowing all
635
+ `<iframe>` elements:
621
636
 
622
637
  ```ruby
623
638
  youtube_transformer = lambda do |env|
624
639
  node = env[:node]
625
640
  node_name = env[:node_name]
626
641
 
627
- # Don't continue if this node is already whitelisted or is not an element.
628
- return if env[:is_whitelisted] || !node.element?
642
+ # Don't continue if this node is already allowlisted or is not an element.
643
+ return if env[:is_allowlisted] || !node.element?
629
644
 
630
645
  # Don't continue unless the node is an iframe.
631
646
  return unless node_name == 'iframe'
@@ -646,8 +661,8 @@ youtube_transformer = lambda do |env|
646
661
 
647
662
  # Now that we're sure that this is a valid YouTube embed and that there are
648
663
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
649
- # to whitelist the current node.
650
- {:node_whitelist => [node]}
664
+ # to allowlist the current node.
665
+ {:node_allowlist => [node]}
651
666
  end
652
667
 
653
668
  html = %[
@@ -658,25 +673,3 @@ html = %[
658
673
  Sanitize.fragment(html, :transformers => youtube_transformer)
659
674
  # => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
660
675
  ```
661
-
662
- License
663
- -------
664
-
665
- Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
666
-
667
- Permission is hereby granted, free of charge, to any person obtaining a copy of
668
- this software and associated documentation files (the 'Software'), to deal in
669
- the Software without restriction, including without limitation the rights to
670
- use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
671
- the Software, and to permit persons to whom the Software is furnished to do so,
672
- subject to the following conditions:
673
-
674
- The above copyright notice and this permission notice shall be included in all
675
- copies or substantial portions of the Software.
676
-
677
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
678
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
679
- FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
680
- COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
681
- IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
682
- CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -54,6 +54,11 @@ class Sanitize
54
54
 
55
55
  # HTML elements to allow. By default, no elements are allowed (which means
56
56
  # that all HTML will be stripped).
57
+ #
58
+ # Warning: Sanitize cannot safely sanitize the contents of foreign
59
+ # elements (elements in the MathML or SVG namespaces). Do not add `math`
60
+ # or `svg` to this list! If you do, you may create a security
61
+ # vulnerability in your application.
57
62
  :elements => [],
58
63
 
59
64
  # HTML parsing options to pass to Nokogumbo.
@@ -74,7 +79,7 @@ class Sanitize
74
79
  # the specified elements (when filtered) will be removed, and the contents
75
80
  # of all other filtered elements will be left behind.
76
81
  :remove_contents => %w[
77
- iframe noembed noframes noscript script style
82
+ iframe math noembed noframes noscript plaintext script style svg xmp
78
83
  ],
79
84
 
80
85
  # Transformers allow you to filter or alter nodes using custom logic. See
@@ -6,7 +6,7 @@ class Sanitize
6
6
  :elements => BASIC[:elements] + %w[
7
7
  address article aside bdi bdo body caption col colgroup data del div
8
8
  figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
9
- img ins main nav rp rt ruby section span style summary sup table tbody
9
+ img ins main nav rp rt ruby section span style summary table tbody
10
10
  td tfoot th thead title tr wbr
11
11
  ],
12
12
 
data/lib/sanitize/css.rb CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
175
175
  next prop
176
176
 
177
177
  when :semicolon
178
- # Only preserve the semicolon if it was preceded by a whitelisted
178
+ # Only preserve the semicolon if it was preceded by an allowlisted
179
179
  # property. Otherwise, omit it in order to prevent redundant semicolons.
180
180
  if preceded_by_property
181
181
  preceded_by_property = false
@@ -296,7 +296,7 @@ class Sanitize; class CSS
296
296
  end
297
297
 
298
298
  # Returns `true` if the given node (which may be of type `:url` or
299
- # `:function`, since the CSS syntax can produce both) uses a whitelisted
299
+ # `:function`, since the CSS syntax can produce both) uses an allowlisted
300
300
  # protocol.
301
301
  def valid_url?(node)
302
302
  type = node[:node]
@@ -6,7 +6,7 @@ class Sanitize; module Transformers
6
6
  node = env[:node]
7
7
 
8
8
  if node.type == Nokogiri::XML::Node::COMMENT_NODE
9
- node.unlink unless env[:is_whitelisted]
9
+ node.unlink unless env[:is_allowlisted]
10
10
  end
11
11
  end
12
12