sanitize 4.6.4 → 6.0.2
- checksums.yaml +4 -4
- data/HISTORY.md +259 -16
- data/LICENSE +1 -1
- data/README.md +89 -76
- data/lib/sanitize/config/default.rb +15 -4
- data/lib/sanitize/config/relaxed.rb +1 -1
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +4 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +105 -22
- data/lib/sanitize/version.rb +1 -3
- data/lib/sanitize.rb +56 -72
- data/test/common.rb +0 -31
- data/test/test_clean_comment.rb +16 -20
- data/test/test_clean_css.rb +6 -6
- data/test/test_clean_doctype.rb +22 -22
- data/test/test_clean_element.rb +200 -82
- data/test/test_config.rb +9 -9
- data/test/test_malicious_css.rb +20 -7
- data/test/test_malicious_html.rb +179 -32
- data/test/test_parser.rb +9 -38
- data/test/test_sanitize.rb +114 -29
- data/test/test_sanitize_css.rb +88 -61
- data/test/test_transformers.rb +52 -46
- metadata +17 -33
- data/test/test_unicode.rb +0 -95
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 93adca1e155370d138ccb7c500b618e2ed218297d21593ec8937638f4d99731b
+data.tar.gz: 740b6d84113a0945928601b6cad03e36b4ee76f7c3098c72ddb1a1b01ec5d0ec
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 4d3e9852ec92ac961c2e35d4a04e7d967dd2eac364e656837b93daf95c1b653da53d4ef7f104af83887e46d08237ddca1efa945facde3efbfcfce0164c0fe334
+data.tar.gz: 05a56334e5cdbbee7b165b19245b90a8acdd82bcd72bbc9f84e2780d914f8b040d19d9ff71934b0c1bd71df4b55f407f460c76dffdbd275b183ecaffb2fa6c38
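The `checksums.yaml` hunk records SHA256 and SHA512 digests for the gem's `metadata.gz` and `data.tar.gz` members. The sketch below shows how those digests could be checked with Ruby's standard `digest` and `yaml` libraries. It is not RubyGems' own verification code. The `checksum_mismatches` helper and the `members` hash of already-extracted archive members are hypothetical, and the step of extracting members from the `.gem` tarball is left out.

```ruby
require "digest"
require "yaml"

# Digest classes for the two sections that appear in checksums.yaml.
CHECKSUM_DIGESTS = {
  "SHA256" => Digest::SHA256,
  "SHA512" => Digest::SHA512
}.freeze

# Hypothetical helper: returns [algorithm, member, problem] triples for every
# entry in checksums.yaml that is missing from `members` or whose digest does
# not match. `members` maps a member name (e.g. "data.tar.gz") to its bytes.
def checksum_mismatches(checksums_yaml, members)
  expected = YAML.safe_load(checksums_yaml) || {}
  problems = []

  CHECKSUM_DIGESTS.each do |algorithm, digest_class|
    (expected[algorithm] || {}).each do |name, want|
      data = members[name]
      if data.nil?
        problems << [algorithm, name, :missing]
      elsif digest_class.hexdigest(data) != want
        problems << [algorithm, name, :mismatch]
      end
    end
  end

  problems
end

# Usage sketch with in-memory data standing in for real archive members.
members = { "metadata.gz" => "meta bytes", "data.tar.gz" => "data bytes" }
checksums = {
  "SHA256" => members.transform_values { |v| Digest::SHA256.hexdigest(v) },
  "SHA512" => members.transform_values { |v| Digest::SHA512.hexdigest(v) }
}.to_yaml
p checksum_mismatches(checksums, members) # prints []
```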
data/HISTORY.md
CHANGED
@@ -1,5 +1,228 @@
 # Sanitize History

+## 6.0.2 (2023-07-06)
+
+### Bug Fixes
+
+* CVE-2023-36823: Fixed an HTML+CSS sanitization bypass that could allow XSS
+(cross-site scripting). This issue affects Sanitize versions 3.0.0 through
+6.0.1.
+
+When using Sanitize's relaxed config or a custom config that allows `<style>`
+elements and one or more CSS at-rules, carefully crafted input could be used
+to sneak arbitrary HTML through Sanitize.
+
+See the following security advisory for additional details:
+[GHSA-f5ww-cq3m-q3g7](https://github.com/rgrove/sanitize/security/advisories/GHSA-f5ww-cq3m-q3g7)
+
+Thanks to @cure53 for finding this issue.
+
+## 6.0.1 (2023-01-27)
+
+### Bug Fixes
+
+* Sanitize now always removes `<noscript>` elements and their contents, even
+when `noscript` is in the allowlist.
+
+This fixes a sanitization bypass that could occur when `noscript` was allowed
+by a custom allowlist. In this scenario, carefully crafted input could sneak
+arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
+scripting) attack.
+
+Sanitize's default configs don't allow `<noscript>` elements and are not
+vulnerable. This issue only affects users who are using a custom config that
+adds `noscript` to the element allowlist.
+
+The root cause of this issue is that HTML parsing rules treat the contents of
+a `<noscript>` element differently depending on whether scripting is enabled
+in the user agent. Nokogiri doesn't support scripting so it follows the
+"scripting disabled" rules, but a web browser with scripting enabled will
+follow the "scripting enabled" rules. This means that Sanitize can't reliably
+make the contents of a `<noscript>` element safe for scripting enabled
+browsers, so the safest thing to do is to remove the element and its contents
+entirely.
+
+See the following security advisory for additional details:
+[GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
+
+Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+(@leeN) for reporting this issue.
+
+* Fixed an edge case in which the contents of an "unescaped text" element (such
+as `<noembed>` or `<xmp>`) were not properly escaped if that element was
+allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
+
+The only way to encounter this situation was to ignore multiple warnings in
+the readme and create a custom config that allowlisted all the elements
+involved, including `<math>` or `<svg>`. If you're using a default config or
+if you heeded the warnings about MathML and SVG not being supported, you're
+not affected by this issue.
+
+Please let this be a reminder that Sanitize cannot safely sanitize MathML or
+SVG content and does not support this use case. The default configs don't
+allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
+custom config may create a security vulnerability in your application.
+
+Documentation has been updated to add more warnings and to make the existing
+warnings about this more prominent.
+
+Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+(@leeN) for reporting this issue.
+
+## 6.0.0 (2021-08-03)
+
+### Potentially Breaking Changes
+
+* Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+[211]:https://github.com/rgrove/sanitize/pull/211
+
+## 5.2.3 (2021-01-11)
+
+### Bug Fixes
+
+* Ensure protocol sanitization is applied to data attributes.
+[@ccutrer - #207][207]
+
+[207]:https://github.com/rgrove/sanitize/pull/207
+
+## 5.2.2 (2021-01-06)
+
+### Bug Fixes
+
+* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+custom transformer. [@mscrivo - #206][206]
+
+[206]:https://github.com/rgrove/sanitize/pull/206
+
+## 5.2.1 (2020-06-16)
+
+### Bug Fixes
+
+* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+Sanitize versions 3.0.0 through 5.2.0.
+
+When HTML was sanitized using the "relaxed" config or a custom config that
+allows certain elements, some content in a `<math>` or `<svg>` element may not
+have beeen sanitized correctly even if `math` and `svg` were not in the
+allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+config or a custom config that allows one or more of the following HTML
+elements:
+
+- `iframe`
+- `math`
+- `noembed`
+- `noframes`
+- `noscript`
+- `plaintext`
+- `script`
+- `style`
+- `svg`
+- `xmp`
+
+See the security advisory for more details, including a workaround if you're
+not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+helping to verify the fix.
+
+[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
+## 5.2.0 (2020-06-06)
+
+### Changes
+
+* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+source and documentation.
+
+While the etymology of "whitelist" may not be explicitly racist in origin or
+intent, there are inherent racial connotations in the implication that white
+is good and black (as in "blacklist") is not.
+
+This is a change I should have made long ago, and I apologize for not making
+it sooner.
+
+* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+The old keys will continue to work in order to avoid breaking existing code,
+but they are no longer documented and may be removed in a future semver major
+release.
+
+## 5.1.0 (2019-09-07)
+
+### Features
+
+* Added a `:parser_options` config hash, which makes it possible to pass custom
+parsing options to Nokogumbo. [@austin-wang - #194][194]
+
+### Bug Fixes
+
+* Non-characters and non-whitespace control characters are now stripped from
+HTML input before parsing to comply with the HTML Standard's [preprocessing
+guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
+W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
+
+[179]:https://github.com/rgrove/sanitize/issues/179
+[194]:https://github.com/rgrove/sanitize/pull/194
+[html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
+[unicode-xml]:https://www.w3.org/TR/unicode-xml/
+
+## 5.0.0 (2018-10-14)
+
+For most users, upgrading from 4.x shouldn't require any changes. However, the
+minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
+differ in some small ways from 4.x's output. If this matters to you, please
+review the changes below carefully.
+
+### Potentially Breaking Changes
+
+* Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
+work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
+no longer works in Ruby 1.9.x.
+
+* Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
+standard-compliant HTML serialization. [@stevecheckoway - #189][189]
+
+* Children of the following elements are now removed by default when these
+elements are removed, rather than being preserved and escaped:
+
+- `iframe`
+- `noembed`
+- `noframes`
+- `noscript`
+- `script`
+- `style`
+
+* Children of allowlisted `iframe` elements are now always removed. In modern
+HTML, `iframe` elements should never have children. In HTML 4 and earlier
+`iframe` elements were allowed to contain fallback content for legacy
+browsers, but it's been almost two decades since that was useful.
+
+* Fixed a bug that caused `:remove_contents` to behave as if it were set to
+`true` when it was actually an Array.
+
+[189]:https://github.com/rgrove/sanitize/pull/189
+
+## 4.6.6 (2018-07-23)
+
+* Improved performance and memory usage by optimizing `Sanitize#transform_node!`
+[@stanhu - #183][183]
+
+[183]:https://github.com/rgrove/sanitize/pull/183
+
+## 4.6.5 (2018-05-16)
+
+* Improved performance slightly by tweaking the order of built-in transformers.
+[@rafbm - #180][180]
+
+[180]:https://github.com/rgrove/sanitize/pull/180
+
 ## 4.6.4 (2018-03-20)

 * Fixed: A change introduced in 4.6.2 broke certain transformers that relied on
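The 5.2.1 entry lists the elements whose presence in an allowlist made a config likely to be affected. The 6.0.1 entries add that `noscript` is now always removed and that `math` and `svg` should not be allowlisted at all. The hypothetical audit helper below (not part of Sanitize) flags those elements in a custom config hash. It assumes only that the config keeps its element allowlist under the `:elements` key, as Sanitize configs do. It does not inspect the CSS at-rule settings involved in CVE-2023-36823.

```ruby
# Elements named in the 5.2.1 (GHSA-p4x4-rw2p-8j8m) entry.
RISKY_ELEMENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical helper: returns the risky elements present in a config's
# :elements allowlist, in the order listed above. Names are compared
# case-insensitively and may be given as strings or symbols.
def risky_allowlisted_elements(config)
  allowed = Array(config[:elements]).map { |name| name.to_s.downcase }
  RISKY_ELEMENTS & allowed
end

custom_config = { elements: %w[a b em p strong style svg] }
p risky_allowlisted_elements(custom_config) # prints ["style", "svg"]
```

An empty result does not make a config safe. It only means none of the listed elements are allowlisted.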
@@ -15,7 +238,7 @@

 When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
 specially crafted HTML fragment can cause libxml2 to generate improperly
-escaped output, allowing non-
+escaped output, allowing non-allowlisted attributes to be used on allowlisted
 elements.

 Sanitize now performs additional escaping on affected attributes to prevent
@@ -59,7 +282,7 @@

 ## 4.4.0 (2016-09-29)

-* Added `srcset` to the attribute
+* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
 config. [@ejtttje - #156][156]

 [156]:https://github.com/rgrove/sanitize/pull/156
@@ -180,7 +403,7 @@
 ## 3.0.4 (2014-12-12)

 * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
-caused the URL to be removed even when the protocol was
+caused the URL to be removed even when the protocol was allowlisted.
 [@benubois - #126][126]

 [126]:https://github.com/rgrove/sanitize/pull/126
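The 3.0.4 fix concerns URLs whose protocol is preceded by whitespace. The sketch below illustrates that idea but is not Sanitize's actual regex or code:

- It strips leading whitespace before reading the protocol.
- It compares the protocol case-insensitively against an allowlist.
- It treats a value with no protocol as relative, and admits it only when the allowlist contains `:relative`, the symbol Sanitize's `:protocols` config uses for relative URLs.

It deliberately ignores entity-encoded colons and whitespace embedded inside the protocol name, which a real sanitizer also has to handle.

```ruby
# Captures everything before the first ":" as the protocol, after optional
# leading whitespace, as long as no "/", "?", or "#" appears first (in which
# case the value is treated as a relative URL).
PROTOCOL_RE = %r{\A\s*([^/?#]*?):}

# Hypothetical helper: true if `url` may be kept under `allowed_protocols`,
# e.g. ["http", "https", :relative].
def protocol_allowed?(url, allowed_protocols)
  match = PROTOCOL_RE.match(url)
  if match
    allowed_protocols.include?(match[1].downcase)
  else
    allowed_protocols.include?(:relative)
  end
end

allowed = ["http", "https", :relative]
p protocol_allowed?(" http://example.com/", allowed) # prints true
p protocol_allowed?(" JavaScript:alert(1)", allowed) # prints false
p protocol_allowed?("/docs/page?t=10:30", allowed)   # prints true
```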
@@ -189,7 +412,7 @@
 ## 3.0.3 (2014-10-29)

 * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
-`@media` block, causing them to be removed even when
+`@media` block, causing them to be removed even when allowlist rules should
 have allowed them to remain. [#121][121]

 [121]:https://github.com/rgrove/sanitize/issues/121
@@ -254,7 +477,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * The `clean_node!` method was renamed to `node!`.

 * The `document` method now raises a `Sanitize::Error` if the `<html>` element
-isn't
+isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
 regardless of the `:remove_contents` config setting.

 * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -265,7 +488,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 * Added advanced CSS sanitization support using [Crass][crass], which is fully
 compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
-
+allowlisted `<style>` elements and `style` attributes in HTML will be
 sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
 sanitize CSS stylesheets or properties.

@@ -310,9 +533,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 [n1008]:https://github.com/sparklemotion/nokogiri/issues/1008


+## 2.1.1 (2018-09-30)
+
+* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
+XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
+
+When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
+specially crafted HTML fragment can cause libxml2 to generate improperly
+escaped output, allowing non-allowlisted attributes to be used on allowlisted
+elements.
+
+Sanitize now performs additional escaping on affected attributes to prevent
+this.
+
+Many thanks to the Shopify Application Security Team for responsibly reporting
+this issue.
+
+[176]:https://github.com/rgrove/sanitize/issues/176
+[188]:https://github.com/rgrove/sanitize/pull/188
+
+
 ## 2.1.0 (2014-01-13)

-* Added support for
+* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
 symbol `:data` instead of an attribute name in the `:attributes` config to
 indicate that arbitrary data attributes should be allowed on an element.

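The 2.1.0 entry describes putting the symbol `:data` in an element's `:attributes` list to allow arbitrary `data-*` attributes. The hypothetical helper below follows that rule. The name pattern `DATA_ATTR_RE` is a simplified assumption, not the exact rule Sanitize applies.

```ruby
# Simplified assumption of what counts as a data-* attribute name.
DATA_ATTR_RE = /\Adata-[a-z_][a-z0-9_.\-]*\z/

# Hypothetical helper: `attributes_config` maps element names to lists of
# allowed attribute names, where the symbol :data stands for "any data-*
# attribute" (the 2.1.0 convention). Attribute names are compared in
# lowercase, since HTML attribute names are case-insensitive.
def attribute_allowed?(attributes_config, element, attr_name)
  allowlist = attributes_config.fetch(element, [])
  name = attr_name.downcase
  return true if allowlist.include?(name)

  allowlist.include?(:data) && DATA_ATTR_RE.match?(name)
end

attributes = { "div" => ["class", :data] }
p attribute_allowed?(attributes, "div", "data-user-id") # prints true
p attribute_allowed?(attributes, "div", "onclick")      # prints false
```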
@@ -393,12 +636,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 the default depth-first mode.

 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
-elements to the
+elements to the allowlists for the basic and relaxed configs.

 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
-`ruby`, and `wbr` elements to the
+`ruby`, and `wbr` elements to the allowlist for the relaxed config.

-* The `dir`, `lang`, and `title` attributes are now
+* The `dir`, `lang`, and `title` attributes are now allowlisted for all
 elements in the relaxed config.

 * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
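Entries such as "the `dir`, `lang`, and `title` attributes are now allowlisted for all elements" correspond to Sanitize's `:attributes` config, where the special `:all` key lists attributes permitted on every element. In the sketch below, a hypothetical helper (not Sanitize's internals) computes the effective allowlist for one element as the union of the `:all` list and that element's own list.

```ruby
# Hypothetical helper: the attributes allowed on `element` are those listed
# under :all plus those listed for the element itself (duplicates removed,
# first occurrence kept).
def allowed_attributes_for(attributes_config, element)
  (Array(attributes_config[:all]) + Array(attributes_config[element])).uniq
end

attributes = {
  all: %w[dir lang title],
  "img" => %w[alt src title]
}
p allowed_attributes_for(attributes, "img") # prints ["dir", "lang", "title", "alt", "src"]
p allowed_attributes_for(attributes, "p")   # prints ["dir", "lang", "title"]
```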
@@ -409,7 +652,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.2.1 (2010-04-20)

 * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
-remove the contents of all non-
+remove the contents of all non-allowlisted elements in addition to the
 elements themselves. If set to an array of element names, Sanitize will
 remove the contents of only those elements (when filtered), and leave the
 contents of other filtered elements. [Thanks to Rafael Souza for the array
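The 1.2.1 entry defines two forms of `:remove_contents`: `true`, which applies to all non-allowlisted elements, or an array of element names. The sketch below models that decision with hypothetical helper names, not Sanitize's API. It also shows one plausible way a bug like the 5.0.0 one (where `:remove_contents` behaved as if it were `true` when it was actually an Array) can arise: every Ruby Array, even an empty one, is truthy. Sanitize's real code may differ.

```ruby
# Hypothetical helper: should the contents of a removed (non-allowlisted)
# element be removed too, given the :remove_contents setting?
def remove_contents?(setting, element_name)
  case setting
  when true  then true
  when Array then setting.map(&:to_s).include?(element_name)
  else false
  end
end

# Buggy variant for comparison: a plain truthiness test treats any Array
# as if it were `true`.
def remove_contents_truthy?(setting, _element_name)
  setting ? true : false
end

p remove_contents?(%w[script style], "div")        # prints false
p remove_contents_truthy?(%w[script style], "div") # prints true
```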
@@ -437,7 +680,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
 all its children.

-* Added elements `<h1>` through `<h6>` to the Relaxed
+* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
 David Reese]


@@ -457,7 +700,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 * Added a workaround for an Hpricot bug that prevents attribute names from
 being downcased in recent versions of Hpricot. This was exploitable to
-prevent non-
+prevent non-allowlisted protocols from being cleaned. [Reported by Ben
 Wanicur]


@@ -487,7 +730,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 1.0.5 (2009-02-05)

-* Fixed a bug introduced in version 1.0.3 that prevented non-
+* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
 protocols from being cleaned when relative URLs were allowed. [Reported by
 Dev Purkayastha]

@@ -497,7 +740,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 1.0.4 (2009-01-16)

-* Fixed a bug that made it possible to sneak a non-
+* Fixed a bug that made it possible to sneak a non-allowlisted element through
 by repeating it several times in a row. All versions of Sanitize prior to
 1.0.4 are vulnerable. [Reported by Cristobal]

@@ -505,7 +748,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.3 (2009-01-15)

 * Fixed a bug whereby incomplete Unicode or hex entities could be used to
-prevent non-
+prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
 still decode the incomplete entities, users of those browsers may be
 vulnerable to malicious script injection on websites using versions of
 Sanitize prior to 1.0.3.
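The 1.0.3 entry notes that some browsers decode incomplete numeric entities, meaning ones missing the trailing `;`. The sketch below shows the defensive step using a hypothetical `decode_numeric_refs` helper rather than Sanitize's own code. It decodes decimal and hex references with or without the semicolon, so that a protocol check sees the characters a browser would see. Named entities and the HTML Standard's full error handling (for example its remapping of some code points) are left out.

```ruby
# Matches &#NNN and &#xHHH, with the trailing ";" optional.
NUMERIC_REF_RE = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/

# Hypothetical helper: replaces numeric character references with the
# characters they encode; zero, out-of-range values, and surrogates
# become U+FFFD.
def decode_numeric_refs(value)
  value.gsub(NUMERIC_REF_RE) do
    code = $1 ? $1.to_i(16) : $2.to_i
    if code.between?(1, 0x10FFFF) && !code.between?(0xD800, 0xDFFF)
      code.chr(Encoding::UTF_8)
    else
      "\uFFFD"
    end
  end
end

href = "&#106&#97&#118&#97&#115&#99&#114&#105&#112&#116&#58alert(1)"
p decode_numeric_refs(href) # prints "javascript:alert(1)"
```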
data/LICENSE
CHANGED