sanitize 5.1.0 → 6.0.1
- checksums.yaml +4 -4
- data/HISTORY.md +155 -18
- data/LICENSE +1 -1
- data/README.md +67 -74
- data/lib/sanitize/config/default.rb +6 -1
- data/lib/sanitize/config/relaxed.rb +1 -1
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +62 -20
- data/lib/sanitize/version.rb +1 -1
- data/lib/sanitize.rb +17 -13
- data/test/test_clean_comment.rb +16 -16
- data/test/test_clean_css.rb +5 -5
- data/test/test_clean_doctype.rb +15 -15
- data/test/test_clean_element.rb +130 -97
- data/test/test_config.rb +9 -9
- data/test/test_malicious_css.rb +7 -7
- data/test/test_malicious_html.rb +153 -30
- data/test/test_parser.rb +9 -9
- data/test/test_sanitize.rb +29 -29
- data/test/test_sanitize_css.rb +57 -57
- data/test/test_transformers.rb +48 -42
- metadata +17 -31
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 819d713b2d4a78519e8bd4f2f853d6558d93ffd2d0481e10d012d8f74afbb555
+data.tar.gz: 04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ed59ea47cc4a620ccf61be3443ef97036a877903bbc90fa855936e57446e34b92f5b9eb41ed9a026e17779fa473ce10d066986c1dd986c58381dae22bb7c9905
+data.tar.gz: 27b40d2033ecd346c299bb77a7788b5325b79edd39c4767c9e5bf27486cf29bf2a5f3b34f96def645bbefd325b0e51a27182b75f187d2eb00931542769cd8c37
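The digests recorded in `checksums.yaml` can be compared against a downloaded copy of the gem. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted from the `.gem` archive. The helper names `digests_for` and `checksum_matches?` and the file paths are hypothetical; only Ruby's standard `digest` library is used.

```ruby
require 'digest'

# Hypothetical helper: returns the SHA256 and SHA512 hex digests of a file so
# they can be compared with the values listed in checksums.yaml.
def digests_for(path)
  {
    'SHA256' => Digest::SHA256.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest
  }
end

# Hypothetical helper: true when the computed digest equals the expected one.
def checksum_matches?(path, algorithm, expected_hex)
  digests_for(path).fetch(algorithm) == expected_hex
end
```

For example, `checksum_matches?('data.tar.gz', 'SHA256', '04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90')` would be expected to return `true` for an unmodified 6.0.1 archive, given the SHA256 value shown in the hunk above.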
data/HISTORY.md
CHANGED
@@ -1,5 +1,142 @@
 # Sanitize History

+## 6.0.1 (2023-01-27)
+
+### Bug Fixes
+
+* Sanitize now always removes `<noscript>` elements and their contents, even
+when `noscript` is in the allowlist.
+
+This fixes a sanitization bypass that could occur when `noscript` was allowed
+by a custom allowlist. In this scenario, carefully crafted input could sneak
+arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
+scripting) attack.
+
+Sanitize's default configs don't allow `<noscript>` elements and are not
+vulnerable. This issue only affects users who are using a custom config that
+adds `noscript` to the element allowlist.
+
+The root cause of this issue is that HTML parsing rules treat the contents of
+a `<noscript>` element differently depending on whether scripting is enabled
+in the user agent. Nokogiri doesn't support scripting so it follows the
+"scripting disabled" rules, but a web browser with scripting enabled will
+follow the "scripting enabled" rules. This means that Sanitize can't reliably
+make the contents of a `<noscript>` element safe for scripting enabled
+browsers, so the safest thing to do is to remove the element and its contents
+entirely.
+
+See the following security advisory for additional details:
+[GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
+
+Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+(@leeN) for reporting this issue.
+
+* Fixed an edge case in which the contents of an "unescaped text" element (such
+as `<noembed>` or `<xmp>`) were not properly escaped if that element was
+allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
+
+The only way to encounter this situation was to ignore multiple warnings in
+the readme and create a custom config that allowlisted all the elements
+involved, including `<math>` or `<svg>`. If you're using a default config or
+if you heeded the warnings about MathML and SVG not being supported, you're
+not affected by this issue.
+
+Please let this be a reminder that Sanitize cannot safely sanitize MathML or
+SVG content and does not support this use case. The default configs don't
+allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
+custom config may create a security vulnerability in your application.
+
+Documentation has been updated to add more warnings and to make the existing
+warnings about this more prominent.
+
+Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+(@leeN) for reporting this issue.
+
+## 6.0.0 (2021-08-03)
+
+### Potentially Breaking Changes
+
+* Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+[211]:https://github.com/rgrove/sanitize/pull/211
+
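The minimum versions named in the 6.0.0 entry can be checked when an application boots. This is a minimal sketch, assuming RubyGems' `Gem::Version` for the comparison. The `meets_minimum?` helper is hypothetical and is not part of Sanitize.

```ruby
require 'rubygems'

# Hypothetical helper: true when `actual` is at least `minimum`.
# Gem::Version compares segments numerically, so '1.13.10' > '1.12.0'.
def meets_minimum?(actual, minimum)
  Gem::Version.new(actual.to_s) >= Gem::Version.new(minimum.to_s)
end

# Ruby 2.5.0 is the oldest supported Ruby per the 6.0.0 changelog entry.
unless meets_minimum?(RUBY_VERSION, '2.5.0')
  warn "Ruby #{RUBY_VERSION} is older than 2.5.0"
end

# With Nokogiri loaded, the same check could be applied to its version:
#   meets_minimum?(Nokogiri::VERSION, '1.12.0')
```

In a Bundler-managed project the same constraint would more commonly be left to the gem's own dependency declaration; the check above is only meant to make the requirement explicit.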
+## 5.2.3 (2021-01-11)
+
+### Bug Fixes
+
+* Ensure protocol sanitization is applied to data attributes.
+[@ccutrer - #207][207]
+
+[207]:https://github.com/rgrove/sanitize/pull/207
+
+## 5.2.2 (2021-01-06)
+
+### Bug Fixes
+
+* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+custom transformer. [@mscrivo - #206][206]
+
+[206]:https://github.com/rgrove/sanitize/pull/206
+
+## 5.2.1 (2020-06-16)
+
+### Bug Fixes
+
+* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+Sanitize versions 3.0.0 through 5.2.0.
+
+When HTML was sanitized using the "relaxed" config or a custom config that
+allows certain elements, some content in a `<math>` or `<svg>` element may not
+have been sanitized correctly even if `math` and `svg` were not in the
+allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+config or a custom config that allows one or more of the following HTML
+elements:
+
+- `iframe`
+- `math`
+- `noembed`
+- `noframes`
+- `noscript`
+- `plaintext`
+- `script`
+- `style`
+- `svg`
+- `xmp`
+
+See the security advisory for more details, including a workaround if you're
+not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+helping to verify the fix.
+
+[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
+## 5.2.0 (2020-06-06)
+
+### Changes
+
+* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+source and documentation.
+
+While the etymology of "whitelist" may not be explicitly racist in origin or
+intent, there are inherent racial connotations in the implication that white
+is good and black (as in "blacklist") is not.
+
+This is a change I should have made long ago, and I apologize for not making
+it sooner.
+
+* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+The old keys will continue to work in order to avoid breaking existing code,
+but they are no longer documented and may be removed in a future semver major
+release.
+
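The key rename in the 5.2.0 entry matters mostly for transformer code that has to run against Sanitize versions both before and after 5.2.0. The sketch below reads the new `:is_allowlisted` key and falls back to the deprecated `:is_whitelisted` key. The helper name `node_allowlisted?` and the `skip_allowlisted` lambda are hypothetical, and the sketch assumes only that `env` is a plain Hash, as the transformer documentation describes.

```ruby
# Hypothetical helper: reads the allowlist flag from a transformer's `env`
# Hash, preferring the new key and falling back to the deprecated one.
def node_allowlisted?(env)
  env.fetch(:is_allowlisted) { env.fetch(:is_whitelisted, false) }
end

# Sketch of a transformer that leaves alone any node that a previous
# transformer has already allowlisted.
skip_allowlisted = lambda do |env|
  return if node_allowlisted?(env)
  # Custom filtering for nodes that are not yet allowlisted would go here.
  nil
end
```

Code that only targets Sanitize 5.2.0 or later can read `env[:is_allowlisted]` directly and does not need the fallback.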
 ## 5.1.0 (2019-09-07)

 ### Features
@@ -45,7 +182,7 @@ review the changes below carefully.
 - `script`
 - `style`

-* Children of
+* Children of allowlisted `iframe` elements are now always removed. In modern
 HTML, `iframe` elements should never have children. In HTML 4 and earlier
 `iframe` elements were allowed to contain fallback content for legacy
 browsers, but it's been almost two decades since that was useful.
@@ -84,7 +221,7 @@ review the changes below carefully.

 When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
 specially crafted HTML fragment can cause libxml2 to generate improperly
-escaped output, allowing non-
+escaped output, allowing non-allowlisted attributes to be used on allowlisted
 elements.

 Sanitize now performs additional escaping on affected attributes to prevent
@@ -128,7 +265,7 @@ review the changes below carefully.

 ## 4.4.0 (2016-09-29)

-* Added `srcset` to the attribute
+* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
 config. [@ejtttje - #156][156]

 [156]:https://github.com/rgrove/sanitize/pull/156
@@ -249,7 +386,7 @@ review the changes below carefully.
 ## 3.0.4 (2014-12-12)

 * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
-caused the URL to be removed even when the protocol was
+caused the URL to be removed even when the protocol was allowlisted.
 [@benubois - #126][126]

 [126]:https://github.com/rgrove/sanitize/pull/126
@@ -258,7 +395,7 @@ review the changes below carefully.
 ## 3.0.3 (2014-10-29)

 * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
-`@media` block, causing them to be removed even when
+`@media` block, causing them to be removed even when allowlist rules should
 have allowed them to remain. [#121][121]

 [121]:https://github.com/rgrove/sanitize/issues/121
@@ -323,7 +460,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * The `clean_node!` method was renamed to `node!`.

 * The `document` method now raises a `Sanitize::Error` if the `<html>` element
-isn't
+isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
 regardless of the `:remove_contents` config setting.

 * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -334,7 +471,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 * Added advanced CSS sanitization support using [Crass][crass], which is fully
 compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
-
+allowlisted `<style>` elements and `style` attributes in HTML will be
 sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
 sanitize CSS stylesheets or properties.

@@ -386,7 +523,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
 specially crafted HTML fragment can cause libxml2 to generate improperly
-escaped output, allowing non-
+escaped output, allowing non-allowlisted attributes to be used on allowlisted
 elements.

 Sanitize now performs additional escaping on affected attributes to prevent
@@ -401,7 +538,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 2.1.0 (2014-01-13)

-* Added support for
+* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
 symbol `:data` instead of an attribute name in the `:attributes` config to
 indicate that arbitrary data attributes should be allowed on an element.

@@ -482,12 +619,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 the default depth-first mode.

 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
-elements to the
+elements to the allowlists for the basic and relaxed configs.

 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
-`ruby`, and `wbr` elements to the
+`ruby`, and `wbr` elements to the allowlist for the relaxed config.

-* The `dir`, `lang`, and `title` attributes are now
+* The `dir`, `lang`, and `title` attributes are now allowlisted for all
 elements in the relaxed config.

 * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -498,7 +635,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.2.1 (2010-04-20)

 * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
-remove the contents of all non-
+remove the contents of all non-allowlisted elements in addition to the
 elements themselves. If set to an array of element names, Sanitize will
 remove the contents of only those elements (when filtered), and leave the
 contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -526,7 +663,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
 all its children.

-* Added elements `<h1>` through `<h6>` to the Relaxed
+* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
 David Reese]


@@ -546,7 +683,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 * Added a workaround for an Hpricot bug that prevents attribute names from
 being downcased in recent versions of Hpricot. This was exploitable to
-prevent non-
+prevent non-allowlisted protocols from being cleaned. [Reported by Ben
 Wanicur]


@@ -576,7 +713,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 1.0.5 (2009-02-05)

-* Fixed a bug introduced in version 1.0.3 that prevented non-
+* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
 protocols from being cleaned when relative URLs were allowed. [Reported by
 Dev Purkayastha]

@@ -586,7 +723,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 1.0.4 (2009-01-16)

-* Fixed a bug that made it possible to sneak a non-
+* Fixed a bug that made it possible to sneak a non-allowlisted element through
 by repeating it several times in a row. All versions of Sanitize prior to
 1.0.4 are vulnerable. [Reported by Cristobal]

@@ -594,7 +731,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.3 (2009-01-15)

 * Fixed a bug whereby incomplete Unicode or hex entities could be used to
-prevent non-
+prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
 still decode the incomplete entities, users of those browsers may be
 vulnerable to malicious script injection on websites using versions of
 Sanitize prior to 1.0.3.
data/LICENSE
CHANGED
data/README.md
CHANGED
@@ -1,38 +1,36 @@
 Sanitize
 ========

-Sanitize is
-elements, attributes, and
-
+Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
+and/or CSS from a string except the elements, attributes, and properties you
+choose to allow.

 Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
 elements, certain attributes within those elements, and even certain URL
-protocols within attributes that contain URLs. You can also
-properties, @ rules, and URL protocols
-
-
-
-
-
-
-
-
-
-[![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
+protocols within attributes that contain URLs. You can also allow specific CSS
+properties, @ rules, and URL protocols in elements or attributes containing CSS.
+Any HTML or CSS that you don't explicitly allow will be removed.
+
+Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
+the same way modern browsers do, and [Crass][crass], which parses CSS the same
+way modern browsers do. As long as your allowlist config only allows safe markup
+and CSS, even the most malformed or malicious input will be transformed into
+safe output.
+
 [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
+[![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)

 [crass]:https://github.com/rgrove/crass
-[
+[nokogiri]:https://github.com/sparklemotion/nokogiri

 Links
 -----

 * [Home](https://github.com/rgrove/sanitize/)
-* [API Docs](
+* [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
 * [Issues](https://github.com/rgrove/sanitize/issues)
-* [Release History](https://github.com/rgrove/sanitize/
-* [Online Demo](https://sanitize.
-* [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
+* [Release History](https://github.com/rgrove/sanitize/releases)
+* [Online Demo](https://sanitize-web.fly.dev/)

 Installation
 -------------
@@ -73,6 +71,12 @@ Sanitize can sanitize the following types of input:
 * Standalone CSS stylesheets
 * Standalone CSS properties

+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
+
 ### HTML Fragments

 A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
 # => 'foo'
 ```

-To keep certain elements, add them to the element
+To keep certain elements, add them to the element allowlist.

 ```ruby
 Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])

 ### HTML Documents

-When sanitizing a document, the `<html>` element must be
+When sanitizing a document, the `<html>` element must be allowlisted. You can
 also set `:allow_doctype` to `true` to allow well-formed document type
 definitions.

@@ -123,8 +127,8 @@ Sanitize.document(html,

 ### CSS in HTML

-To sanitize CSS in an HTML fragment or document, first
-element and/or the `style` attribute. Then
+To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
+element and/or the `style` attribute. Then allowlist the CSS properties,
 @ rules, and URL protocols you wish to allow. You can also choose whether to
 allow CSS comments or browser compatibility hacks.

@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:

 ```ruby
 # Create a customized copy of the Basic config, adding <div> and <table> to the
-# existing
+# existing allowlisted elements.
 Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
 :remove_contents => true
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }

 ##### :css => :properties (Array or Set)

-
-lowercase.
+List of CSS property names to allow. Names should be specified in lowercase.

 ##### :css => :protocols (Array or Set)

@@ -417,9 +420,21 @@ elements not in this array will be removed.
 ]
 ```

+> **Warning**
+>
+> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
+>
+> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
+
+> **Note**
+>
+> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
+>
+> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
+
 #### :parser_options (Hash)

-[Parsing options](https://github.com/rubys/nokogumbo/tree/
+[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.

 ```ruby
 :parser_options => {
@@ -452,7 +467,7 @@ include the symbol `:relative` in the protocol array:

 #### :remove_contents (boolean or Array or Set)

-If this is `true`, Sanitize will remove the contents of any non-
+If this is `true`, Sanitize will remove the contents of any non-allowlisted
 elements in addition to the elements themselves. By default, Sanitize leaves the
 safe parts of an element's contents behind when the element is removed.

@@ -460,7 +475,7 @@ If this is an Array or Set of element names, then only the contents of the
 specified elements (when filtered) will be removed, and the contents of all
 other filtered elements will be left behind.

-The default value is
+The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.

 #### :transformers (Array or callable)

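The change to the `:remove_contents` default is easiest to see by comparing the old and new lists side by side. The sketch below uses plain Ruby arrays and does not load Sanitize. The two lists are copied from the removed and added lines of the `lib/sanitize/config/default.rb` hunk as this diff displays them, and the variable names are illustrative only.

```ruby
# Default :remove_contents lists as displayed in this diff (5.1.0 vs. 6.0.1).
old_default = %w[iframe noembed noframes noscript script style]
new_default = %w[iframe math noembed noframes noscript plaintext script style svg xmp]

# Elements whose contents are removed by default in 6.0.1 but were not in 5.1.0.
newly_removed = new_default - old_default
# => ["math", "plaintext", "svg", "xmp"]

# Nothing was dropped from the old default list.
dropped = old_default - new_default
# => []
```

A custom config that sets `:remove_contents` to its own array replaces the default rather than extending it, so applications with such a config may want to review whether the four added element names should be included there as well.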
@@ -518,33 +533,33 @@ argument a Hash that contains the following items:

 * **:config** - The current Sanitize configuration Hash.

-* **:
+* **:is_allowlisted** - `true` if the current node has been allowlisted by a
 previous transformer, `false` otherwise. It's generally bad form to remove
-a node that a previous transformer has
+a node that a previous transformer has allowlisted.

 * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
 node may be an element, a text node, a comment, a CDATA node, or a document
 fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
 selectively ignore node types you aren't interested in.

+* **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
+document that have been allowlisted by previous transformers, if any. It's
+generally bad form to remove a node that a previous transformer has
+allowlisted.
+
 * **:node_name** - The name of the current HTML node, always lowercase (e.g.
 "div" or "span"). For non-element nodes, the name will be something like
 "text", "comment", "#cdata-section", "#document-fragment", etc.

-* **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
-document that have been whitelisted by previous transformers, if any. It's
-generally bad form to remove a node that a previous transformer has
-whitelisted.
-
 ### Output

 A transformer doesn't have to return anything, but may optionally return a Hash,
 which may contain the following items:

-* **:
-to add to the document's
-These specific nodes and all their attributes will be
-their children will not be.
+* **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
+objects to add to the document's allowlist, bypassing the current Sanitize
+config. These specific nodes and all their attributes will be allowlisted,
+but their children will not be.

 If a transformer returns anything other than a Hash, the return value will be
 ignored.
@@ -587,16 +602,16 @@ Transformers have a tremendous amount of power, including the power to
 completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
 your own hands.

-### Example: Transformer to
+### Example: Transformer to allow image URLs by domain

 The following example demonstrates how to remove image elements unless they use
 a relative URL or are hosted on a specific domain. It assumes that the `<img>`
-element and its `src` attribute are already
+element and its `src` attribute are already allowlisted.

 ```ruby
 require 'uri'

-
+image_allowlist_transformer = lambda do |env|
 # Ignore everything except <img> elements.
 return unless env[:node_name] == 'img'

@@ -612,20 +627,20 @@ image_whitelist_transformer = lambda do |env|
 end
 ```

-### Example: Transformer to
+### Example: Transformer to allow YouTube video embeds

 The following example demonstrates how to create a transformer that will safely
-
-
-
+allow valid YouTube video embeds without having to allow other kinds of embedded
+content, which would be the case if you tried to do this by just allowing all
+`<iframe>` elements:

 ```ruby
 youtube_transformer = lambda do |env|
 node = env[:node]
 node_name = env[:node_name]

-# Don't continue if this node is already
-return if env[:
+# Don't continue if this node is already allowlisted or is not an element.
+return if env[:is_allowlisted] || !node.element?

 # Don't continue unless the node is an iframe.
 return unless node_name == 'iframe'
@@ -646,8 +661,8 @@ youtube_transformer = lambda do |env|

 # Now that we're sure that this is a valid YouTube embed and that there are
 # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-# to
-{:
+# to allowlist the current node.
+{:node_allowlist => [node]}
 end

 html = %[
@@ -658,25 +673,3 @@ html = %[
 Sanitize.fragment(html, :transformers => youtube_transformer)
 # => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
 ```
-
-License
--------
-
-Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
-
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the 'Software'), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
-the Software, and to permit persons to whom the Software is furnished to do so,
-subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
-FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
-COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
-IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -54,6 +54,11 @@ class Sanitize

 # HTML elements to allow. By default, no elements are allowed (which means
 # that all HTML will be stripped).
+#
+# Warning: Sanitize cannot safely sanitize the contents of foreign
+# elements (elements in the MathML or SVG namespaces). Do not add `math`
+# or `svg` to this list! If you do, you may create a security
+# vulnerability in your application.
 :elements => [],

 # HTML parsing options to pass to Nokogumbo.
@@ -74,7 +79,7 @@ class Sanitize
 # the specified elements (when filtered) will be removed, and the contents
 # of all other filtered elements will be left behind.
 :remove_contents => %w[
-iframe noembed noframes noscript script style
+iframe math noembed noframes noscript plaintext script style svg xmp
 ],

 # Transformers allow you to filter or alter nodes using custom logic. See
data/lib/sanitize/config/relaxed.rb
CHANGED
@@ -6,7 +6,7 @@ class Sanitize
 :elements => BASIC[:elements] + %w[
 address article aside bdi bdo body caption col colgroup data del div
 figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
-img ins main nav rp rt ruby section span style summary
+img ins main nav rp rt ruby section span style summary table tbody
 td tfoot th thead title tr wbr
 ],

data/lib/sanitize/css.rb
CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
 next prop

 when :semicolon
-# Only preserve the semicolon if it was preceded by
+# Only preserve the semicolon if it was preceded by an allowlisted
 # property. Otherwise, omit it in order to prevent redundant semicolons.
 if preceded_by_property
 preceded_by_property = false
@@ -296,7 +296,7 @@ class Sanitize; class CSS
 end

 # Returns `true` if the given node (which may be of type `:url` or
-# `:function`, since the CSS syntax can produce both) uses
+# `:function`, since the CSS syntax can produce both) uses an allowlisted
 # protocol.
 def valid_url?(node)
 type = node[:node]