sanitize 4.6.5 → 6.0.1
- checksums.yaml +4 -4
- data/HISTORY.md +235 -16
- data/LICENSE +1 -1
- data/README.md +89 -76
- data/lib/sanitize/config/default.rb +15 -4
- data/lib/sanitize/config/relaxed.rb +1 -1
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +105 -22
- data/lib/sanitize/version.rb +1 -1
- data/lib/sanitize.rb +53 -68
- data/test/common.rb +0 -31
- data/test/test_clean_comment.rb +16 -20
- data/test/test_clean_css.rb +6 -6
- data/test/test_clean_doctype.rb +22 -22
- data/test/test_clean_element.rb +200 -82
- data/test/test_config.rb +9 -9
- data/test/test_malicious_css.rb +7 -7
- data/test/test_malicious_html.rb +179 -32
- data/test/test_parser.rb +9 -38
- data/test/test_sanitize.rb +114 -29
- data/test/test_sanitize_css.rb +88 -61
- data/test/test_transformers.rb +52 -46
- metadata +17 -33
- data/test/test_unicode.rb +0 -95
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 819d713b2d4a78519e8bd4f2f853d6558d93ffd2d0481e10d012d8f74afbb555
+data.tar.gz: 04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: ed59ea47cc4a620ccf61be3443ef97036a877903bbc90fa855936e57446e34b92f5b9eb41ed9a026e17779fa473ce10d066986c1dd986c58381dae22bb7c9905
+data.tar.gz: 27b40d2033ecd346c299bb77a7788b5325b79edd39c4767c9e5bf27486cf29bf2a5f3b34f96def645bbefd325b0e51a27182b75f187d2eb00931542769cd8c37
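The digests recorded in `checksums.yaml` can be compared against the `metadata.gz` and `data.tar.gz` files unpacked from the `.gem` archive. A minimal sketch, assuming those files are already on disk; the helper name `checksum_matches?` and the file path in the usage comment are illustrative and not part of RubyGems or Sanitize:

```ruby
require 'digest'

# Hypothetical helper: returns true when the file at `path` hashes to
# `expected_hex` under the chosen algorithm (:sha256 or :sha512).
def checksum_matches?(path, expected_hex, algorithm: :sha256)
  digest_class =
    case algorithm
    when :sha256 then Digest::SHA256
    when :sha512 then Digest::SHA512
    else raise ArgumentError, "unsupported algorithm: #{algorithm.inspect}"
    end

  # Digest::SHA256.file / Digest::SHA512.file read the file in chunks,
  # so large archives are not loaded into memory at once.
  actual_hex = digest_class.file(path).hexdigest

  # Hex digests are case-insensitive; ignore stray whitespace from YAML.
  actual_hex.casecmp?(expected_hex.strip)
end

# Usage (the path is an assumption; the digest is the SHA256 value above):
#   checksum_matches?(
#     'data.tar.gz',
#     '04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90'
#   )
```

A mismatch only shows that the file differs from the one the digest was computed over; it does not say which of the two is trustworthy.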
data/HISTORY.md
CHANGED
@@ -1,5 +1,204 @@
 # Sanitize History
 
+## 6.0.1 (2023-01-27)
+
+### Bug Fixes
+
+* Sanitize now always removes `<noscript>` elements and their contents, even
+when `noscript` is in the allowlist.
+
+This fixes a sanitization bypass that could occur when `noscript` was allowed
+by a custom allowlist. In this scenario, carefully crafted input could sneak
+arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
+scripting) attack.
+
+Sanitize's default configs don't allow `<noscript>` elements and are not
+vulnerable. This issue only affects users who are using a custom config that
+adds `noscript` to the element allowlist.
+
+The root cause of this issue is that HTML parsing rules treat the contents of
+a `<noscript>` element differently depending on whether scripting is enabled
+in the user agent. Nokogiri doesn't support scripting so it follows the
+"scripting disabled" rules, but a web browser with scripting enabled will
+follow the "scripting enabled" rules. This means that Sanitize can't reliably
+make the contents of a `<noscript>` element safe for scripting enabled
+browsers, so the safest thing to do is to remove the element and its contents
+entirely.
+
+See the following security advisory for additional details:
+[GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
+
+Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+(@leeN) for reporting this issue.
+
+* Fixed an edge case in which the contents of an "unescaped text" element (such
+as `<noembed>` or `<xmp>`) were not properly escaped if that element was
+allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
+
+The only way to encounter this situation was to ignore multiple warnings in
+the readme and create a custom config that allowlisted all the elements
+involved, including `<math>` or `<svg>`. If you're using a default config or
+if you heeded the warnings about MathML and SVG not being supported, you're
+not affected by this issue.
+
+Please let this be a reminder that Sanitize cannot safely sanitize MathML or
+SVG content and does not support this use case. The default configs don't
+allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
+custom config may create a security vulnerability in your application.
+
+Documentation has been updated to add more warnings and to make the existing
+warnings about this more prominent.
+
+Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+(@leeN) for reporting this issue.
+
+## 6.0.0 (2021-08-03)
+
+### Potentially Breaking Changes
+
+* Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+[211]:https://github.com/rgrove/sanitize/pull/211
+
+## 5.2.3 (2021-01-11)
+
+### Bug Fixes
+
+* Ensure protocol sanitization is applied to data attributes.
+[@ccutrer - #207][207]
+
+[207]:https://github.com/rgrove/sanitize/pull/207
+
+## 5.2.2 (2021-01-06)
+
+### Bug Fixes
+
+* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+custom transformer. [@mscrivo - #206][206]
+
+[206]:https://github.com/rgrove/sanitize/pull/206
+
+## 5.2.1 (2020-06-16)
+
+### Bug Fixes
+
+* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+Sanitize versions 3.0.0 through 5.2.0.
+
+When HTML was sanitized using the "relaxed" config or a custom config that
+allows certain elements, some content in a `<math>` or `<svg>` element may not
+have been sanitized correctly even if `math` and `svg` were not in the
+allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+config or a custom config that allows one or more of the following HTML
+elements:
+
+- `iframe`
+- `math`
+- `noembed`
+- `noframes`
+- `noscript`
+- `plaintext`
+- `script`
+- `style`
+- `svg`
+- `xmp`
+
+See the security advisory for more details, including a workaround if you're
+not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+helping to verify the fix.
+
+[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
+## 5.2.0 (2020-06-06)
+
+### Changes
+
+* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+source and documentation.
+
+While the etymology of "whitelist" may not be explicitly racist in origin or
+intent, there are inherent racial connotations in the implication that white
+is good and black (as in "blacklist") is not.
+
+This is a change I should have made long ago, and I apologize for not making
+it sooner.
+
+* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+The old keys will continue to work in order to avoid breaking existing code,
+but they are no longer documented and may be removed in a future semver major
+release.
+
+## 5.1.0 (2019-09-07)
+
+### Features
+
+* Added a `:parser_options` config hash, which makes it possible to pass custom
+parsing options to Nokogumbo. [@austin-wang - #194][194]
+
+### Bug Fixes
+
+* Non-characters and non-whitespace control characters are now stripped from
+HTML input before parsing to comply with the HTML Standard's [preprocessing
+guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
+W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
+
+[179]:https://github.com/rgrove/sanitize/issues/179
+[194]:https://github.com/rgrove/sanitize/pull/194
+[html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
+[unicode-xml]:https://www.w3.org/TR/unicode-xml/
+
+## 5.0.0 (2018-10-14)
+
+For most users, upgrading from 4.x shouldn't require any changes. However, the
+minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
+differ in some small ways from 4.x's output. If this matters to you, please
+review the changes below carefully.
+
+### Potentially Breaking Changes
+
+* Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
+work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
+no longer works in Ruby 1.9.x.
+
+* Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
+standard-compliant HTML serialization. [@stevecheckoway - #189][189]
+
+* Children of the following elements are now removed by default when these
+elements are removed, rather than being preserved and escaped:
+
+- `iframe`
+- `noembed`
+- `noframes`
+- `noscript`
+- `script`
+- `style`
+
+* Children of allowlisted `iframe` elements are now always removed. In modern
+HTML, `iframe` elements should never have children. In HTML 4 and earlier
+`iframe` elements were allowed to contain fallback content for legacy
+browsers, but it's been almost two decades since that was useful.
+
+* Fixed a bug that caused `:remove_contents` to behave as if it were set to
+`true` when it was actually an Array.
+
+[189]:https://github.com/rgrove/sanitize/pull/189
+
+## 4.6.6 (2018-07-23)
+
+* Improved performance and memory usage by optimizing `Sanitize#transform_node!`
+[@stanhu - #183][183]
+
+[183]:https://github.com/rgrove/sanitize/pull/183
+
 ## 4.6.5 (2018-05-16)
 
 * Improved performance slightly by tweaking the order of built-in transformers.
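The version facts in the entries above (the 5.2.1 advisory range of 3.0.0 through 5.2.0, and the Ruby 2.5.0 and Nokogiri 1.12.0 minimums introduced in 6.0.0) can be checked mechanically. A minimal sketch using RubyGems' own version classes; the constant and method names are hypothetical, and the check says nothing about whether a given config actually allows one of the listed elements:

```ruby
require 'rubygems'

# Range stated in the 5.2.1 entry: Sanitize 3.0.0 through 5.2.0.
AFFECTED_BY_5_2_1_ADVISORY = Gem::Requirement.new('>= 3.0.0', '<= 5.2.0')

# Hypothetical helper: is this Sanitize version inside the range that the
# 5.2.1 entry lists for GHSA-p4x4-rw2p-8j8m?
def affected_by_5_2_1_advisory?(sanitize_version)
  AFFECTED_BY_5_2_1_ADVISORY.satisfied_by?(Gem::Version.new(sanitize_version))
end

# Hypothetical helper: do the given Ruby and Nokogiri versions meet the
# minimums stated in the 6.0.0 entry?
def meets_6_0_minimums?(ruby_version:, nokogiri_version:)
  Gem::Version.new(ruby_version) >= Gem::Version.new('2.5.0') &&
    Gem::Version.new(nokogiri_version) >= Gem::Version.new('1.12.0')
end

# Usage:
#   affected_by_5_2_1_advisory?('4.6.5')  # => true
#   affected_by_5_2_1_advisory?('6.0.1')  # => false
#   meets_6_0_minimums?(ruby_version: RUBY_VERSION, nokogiri_version: '1.12.0')
```

`Gem::Version` compares segment by segment, so `'5.10.0'` sorts after `'5.2.0'`, which a plain string comparison would get wrong.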
@@ -22,7 +221,7 @@
 
 When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
 specially crafted HTML fragment can cause libxml2 to generate improperly
-escaped output, allowing non-
+escaped output, allowing non-allowlisted attributes to be used on allowlisted
 elements.
 
 Sanitize now performs additional escaping on affected attributes to prevent
@@ -66,7 +265,7 @@
 
 ## 4.4.0 (2016-09-29)
 
-* Added `srcset` to the attribute
+* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
 config. [@ejtttje - #156][156]
 
 [156]:https://github.com/rgrove/sanitize/pull/156
@@ -187,7 +386,7 @@
 ## 3.0.4 (2014-12-12)
 
 * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
-caused the URL to be removed even when the protocol was
+caused the URL to be removed even when the protocol was allowlisted.
 [@benubois - #126][126]
 
 [126]:https://github.com/rgrove/sanitize/pull/126
@@ -196,7 +395,7 @@
 ## 3.0.3 (2014-10-29)
 
 * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
-`@media` block, causing them to be removed even when
+`@media` block, causing them to be removed even when allowlist rules should
 have allowed them to remain. [#121][121]
 
 [121]:https://github.com/rgrove/sanitize/issues/121
@@ -261,7 +460,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * The `clean_node!` method was renamed to `node!`.
 
 * The `document` method now raises a `Sanitize::Error` if the `<html>` element
-isn't
+isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
 regardless of the `:remove_contents` config setting.
 
 * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -272,7 +471,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
 * Added advanced CSS sanitization support using [Crass][crass], which is fully
 compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
-
+allowlisted `<style>` elements and `style` attributes in HTML will be
 sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
 sanitize CSS stylesheets or properties.
 
@@ -317,9 +516,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 [n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
 
 
+## 2.1.1 (2018-09-30)
+
+* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
+XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
+
+When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
+specially crafted HTML fragment can cause libxml2 to generate improperly
+escaped output, allowing non-allowlisted attributes to be used on allowlisted
+elements.
+
+Sanitize now performs additional escaping on affected attributes to prevent
+this.
+
+Many thanks to the Shopify Application Security Team for responsibly reporting
+this issue.
+
+[176]:https://github.com/rgrove/sanitize/issues/176
+[188]:https://github.com/rgrove/sanitize/pull/188
+
+
 ## 2.1.0 (2014-01-13)
 
-* Added support for
+* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
 symbol `:data` instead of an attribute name in the `:attributes` config to
 indicate that arbitrary data attributes should be allowed on an element.
 
@@ -400,12 +619,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 the default depth-first mode.
 
 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
-elements to the
+elements to the allowlists for the basic and relaxed configs.
 
 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
-`ruby`, and `wbr` elements to the
+`ruby`, and `wbr` elements to the allowlist for the relaxed config.
 
-* The `dir`, `lang`, and `title` attributes are now
+* The `dir`, `lang`, and `title` attributes are now allowlisted for all
 elements in the relaxed config.
 
 * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -416,7 +635,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.2.1 (2010-04-20)
 
 * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
-remove the contents of all non-
+remove the contents of all non-allowlisted elements in addition to the
 elements themselves. If set to an array of element names, Sanitize will
 remove the contents of only those elements (when filtered), and leave the
 contents of other filtered elements. [Thanks to Rafael Souza for the array
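The two forms of `:remove_contents` described in the 1.2.1 entry differ only in which filtered elements also lose their contents. A minimal sketch of that decision as a standalone predicate; the name `remove_contents_of?` is hypothetical and this is an illustration of the documented behavior, not Sanitize's implementation:

```ruby
require 'set'

# Hypothetical predicate: given the value of the :remove_contents setting and
# the name of an element that is being filtered out (it is not in the
# allowlist), decide whether the element's contents should be removed too.
def remove_contents_of?(setting, element_name)
  case setting
  when true
    # `true` means: drop the contents of every filtered element.
    true
  when Array, Set
    # A collection means: drop contents only for the named elements.
    names = setting.map { |name| name.to_s.downcase }
    names.include?(element_name.to_s.downcase)
  else
    # false / nil: remove the element itself but keep its contents.
    false
  end
end

# Usage:
#   remove_contents_of?(true, 'div')                  # => true
#   remove_contents_of?(%w[script style], 'script')   # => true
#   remove_contents_of?(%w[script style], 'div')      # => false
```

Matching on `true` explicitly, rather than on general truthiness, keeps the Array form from being treated as if it were `true`.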
@@ -444,7 +663,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
 all its children.
 
-* Added elements `<h1>` through `<h6>` to the Relaxed
+* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
 David Reese]
 
 
@@ -464,7 +683,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
 * Added a workaround for an Hpricot bug that prevents attribute names from
 being downcased in recent versions of Hpricot. This was exploitable to
-prevent non-
+prevent non-allowlisted protocols from being cleaned. [Reported by Ben
 Wanicur]
 
 
@@ -494,7 +713,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
 ## 1.0.5 (2009-02-05)
 
-* Fixed a bug introduced in version 1.0.3 that prevented non-
+* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
 protocols from being cleaned when relative URLs were allowed. [Reported by
 Dev Purkayastha]
 
@@ -504,7 +723,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
 ## 1.0.4 (2009-01-16)
 
-* Fixed a bug that made it possible to sneak a non-
+* Fixed a bug that made it possible to sneak a non-allowlisted element through
 by repeating it several times in a row. All versions of Sanitize prior to
 1.0.4 are vulnerable. [Reported by Cristobal]
 
@@ -512,7 +731,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.3 (2009-01-15)
 
 * Fixed a bug whereby incomplete Unicode or hex entities could be used to
-prevent non-
+prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
 still decode the incomplete entities, users of those browsers may be
 vulnerable to malicious script injection on websites using versions of
 Sanitize prior to 1.0.3.
data/LICENSE
CHANGED