sanitize 4.6.5 → 6.0.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: eab36cec7ac13bd15bd00b1141990e9efc35332c95391cb405128ddfe891e242
- data.tar.gz: f69f77cf6febfa74b1bdc5103d245543f38ddcfe223d474dffab5913846525ec
+ metadata.gz: 819d713b2d4a78519e8bd4f2f853d6558d93ffd2d0481e10d012d8f74afbb555
+ data.tar.gz: 04a48476bf940cfffc12654e71d60a95fd93c0576b6bec6870c2defb5b72fa90
  SHA512:
- metadata.gz: 3358c2574bcdd0e3a8c08460f2dd31ecd3ade8e04ed60380f8037c46dd0f67321ac2c4ccace6e1f82080acecb2dc71054630c2c1fec54b2a99cb50c0476dd0b2
- data.tar.gz: c10686ec8aacadf3268eafd407e4e2259e88deae363829d5ebe1b2877ed8e15c658bb86fe97b786d116c00b10e6eaff14baf5bb71a7a737ec507f3ab65f61187
+ metadata.gz: ed59ea47cc4a620ccf61be3443ef97036a877903bbc90fa855936e57446e34b92f5b9eb41ed9a026e17779fa473ce10d066986c1dd986c58381dae22bb7c9905
+ data.tar.gz: 27b40d2033ecd346c299bb77a7788b5325b79edd39c4767c9e5bf27486cf29bf2a5f3b34f96def645bbefd325b0e51a27182b75f187d2eb00931542769cd8c37
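The `checksums.yaml` entries above are hex-encoded SHA-256 and SHA-512 digests of the two archives packed inside the gem (`metadata.gz` and `data.tar.gz`). As a minimal sketch, assuming the archive bytes have already been read into memory, a digest comparison could look like the following. The helper name `checksum_matches?` and the sample bytes are hypothetical and are not part of RubyGems or Sanitize.

```ruby
require 'digest/sha2'

# Hypothetical helper: compares the digest of `bytes` with an expected
# hex-encoded digest such as the ones listed in checksums.yaml.
def checksum_matches?(bytes, expected_hex, algorithm: Digest::SHA256)
  algorithm.hexdigest(bytes) == expected_hex.strip.downcase
end

# Stand-in for the real contents of data.tar.gz (illustrative only).
sample_bytes = "example archive contents"
expected     = Digest::SHA256.hexdigest(sample_bytes)

puts checksum_matches?(sample_bytes, expected)              # digests agree
puts checksum_matches?(sample_bytes + "tampered", expected) # digests differ
```

Passing `algorithm: Digest::SHA512` checks the longer digests in the same way.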
data/HISTORY.md CHANGED
@@ -1,5 +1,204 @@
  # Sanitize History
 
+ ## 6.0.1 (2023-01-27)
+
+ ### Bug Fixes
+
+ * Sanitize now always removes `<noscript>` elements and their contents, even
+ when `noscript` is in the allowlist.
+
+ This fixes a sanitization bypass that could occur when `noscript` was allowed
+ by a custom allowlist. In this scenario, carefully crafted input could sneak
+ arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
+ scripting) attack.
+
+ Sanitize's default configs don't allow `<noscript>` elements and are not
+ vulnerable. This issue only affects users who are using a custom config that
+ adds `noscript` to the element allowlist.
+
+ The root cause of this issue is that HTML parsing rules treat the contents of
+ a `<noscript>` element differently depending on whether scripting is enabled
+ in the user agent. Nokogiri doesn't support scripting so it follows the
+ "scripting disabled" rules, but a web browser with scripting enabled will
+ follow the "scripting enabled" rules. This means that Sanitize can't reliably
+ make the contents of a `<noscript>` element safe for scripting enabled
+ browsers, so the safest thing to do is to remove the element and its contents
+ entirely.
+
+ See the following security advisory for additional details:
+ [GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
+
+ Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+ (@leeN) for reporting this issue.
+
+ * Fixed an edge case in which the contents of an "unescaped text" element (such
+ as `<noembed>` or `<xmp>`) were not properly escaped if that element was
+ allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
+
+ The only way to encounter this situation was to ignore multiple warnings in
+ the readme and create a custom config that allowlisted all the elements
+ involved, including `<math>` or `<svg>`. If you're using a default config or
+ if you heeded the warnings about MathML and SVG not being supported, you're
+ not affected by this issue.
+
+ Please let this be a reminder that Sanitize cannot safely sanitize MathML or
+ SVG content and does not support this use case. The default configs don't
+ allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
+ custom config may create a security vulnerability in your application.
+
+ Documentation has been updated to add more warnings and to make the existing
+ warnings about this more prominent.
+
+ Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
+ (@leeN) for reporting this issue.
+
+ ## 6.0.0 (2021-08-03)
+
+ ### Potentially Breaking Changes
+
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+ [211]:https://github.com/rgrove/sanitize/pull/211
+
+ ## 5.2.3 (2021-01-11)
+
+ ### Bug Fixes
+
+ * Ensure protocol sanitization is applied to data attributes.
+ [@ccutrer - #207][207]
+
+ [207]:https://github.com/rgrove/sanitize/pull/207
+
+ ## 5.2.2 (2021-01-06)
+
+ ### Bug Fixes
+
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+ custom transformer. [@mscrivo - #206][206]
+
+ [206]:https://github.com/rgrove/sanitize/pull/206
+
+ ## 5.2.1 (2020-06-16)
+
+ ### Bug Fixes
+
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+ Sanitize versions 3.0.0 through 5.2.0.
+
+ When HTML was sanitized using the "relaxed" config or a custom config that
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
+ have been sanitized correctly even if `math` and `svg` were not in the
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+ config or a custom config that allows one or more of the following HTML
+ elements:
+
+ - `iframe`
+ - `math`
+ - `noembed`
+ - `noframes`
+ - `noscript`
+ - `plaintext`
+ - `script`
+ - `style`
+ - `svg`
+ - `xmp`
+
+ See the security advisory for more details, including a workaround if you're
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+ helping to verify the fix.
+
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
+ ## 5.2.0 (2020-06-06)
+
+ ### Changes
+
+ * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+ source and documentation.
+
+ While the etymology of "whitelist" may not be explicitly racist in origin or
+ intent, there are inherent racial connotations in the implication that white
+ is good and black (as in "blacklist") is not.
+
+ This is a change I should have made long ago, and I apologize for not making
+ it sooner.
+
+ * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+ deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+ The old keys will continue to work in order to avoid breaking existing code,
+ but they are no longer documented and may be removed in a future semver major
+ release.
+
+ ## 5.1.0 (2019-09-07)
+
+ ### Features
+
+ * Added a `:parser_options` config hash, which makes it possible to pass custom
+ parsing options to Nokogumbo. [@austin-wang - #194][194]
+
+ ### Bug Fixes
+
+ * Non-characters and non-whitespace control characters are now stripped from
+ HTML input before parsing to comply with the HTML Standard's [preprocessing
+ guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
+ W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
+
+ [179]:https://github.com/rgrove/sanitize/issues/179
+ [194]:https://github.com/rgrove/sanitize/pull/194
+ [html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
+ [unicode-xml]:https://www.w3.org/TR/unicode-xml/
+
+ ## 5.0.0 (2018-10-14)
+
+ For most users, upgrading from 4.x shouldn't require any changes. However, the
+ minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
+ differ in some small ways from 4.x's output. If this matters to you, please
+ review the changes below carefully.
+
+ ### Potentially Breaking Changes
+
+ * Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
+ work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
+ no longer works in Ruby 1.9.x.
+
+ * Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
+ standard-compliant HTML serialization. [@stevecheckoway - #189][189]
+
+ * Children of the following elements are now removed by default when these
+ elements are removed, rather than being preserved and escaped:
+
+ - `iframe`
+ - `noembed`
+ - `noframes`
+ - `noscript`
+ - `script`
+ - `style`
+
+ * Children of allowlisted `iframe` elements are now always removed. In modern
+ HTML, `iframe` elements should never have children. In HTML 4 and earlier
+ `iframe` elements were allowed to contain fallback content for legacy
+ browsers, but it's been almost two decades since that was useful.
+
+ * Fixed a bug that caused `:remove_contents` to behave as if it were set to
+ `true` when it was actually an Array.
+
+ [189]:https://github.com/rgrove/sanitize/pull/189
+
+ ## 4.6.6 (2018-07-23)
+
+ * Improved performance and memory usage by optimizing `Sanitize#transform_node!`
+ [@stanhu - #183][183]
+
+ [183]:https://github.com/rgrove/sanitize/pull/183
+
  ## 4.6.5 (2018-05-16)
 
  * Improved performance slightly by tweaking the order of built-in transformers.
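
The 5.2.0 entry in the hunk above deprecates the `:is_whitelisted` and `:node_whitelist` transformer input keys in favor of `:is_allowlisted` and `:node_allowlist`. A custom transformer that has to run on both sides of that change could read the new key first and fall back to the old one. The sketch below is only an illustration: the constant name `ALREADY_ALLOWED` is hypothetical, and the lambda is called with plain hashes that stand in for the input Sanitize passes to a transformer.

```ruby
# Hypothetical transformer-style lambda. It only reads the input hash, so it
# can be exercised here with plain hashes instead of a real Sanitize run.
ALREADY_ALLOWED = lambda do |env|
  # Prefer the 5.2.0+ key; fall back to the deprecated key when it is absent.
  env.fetch(:is_allowlisted) { env[:is_whitelisted] } ? true : false
end

puts ALREADY_ALLOWED.call({ is_allowlisted: true })  # new key present
puts ALREADY_ALLOWED.call({ is_whitelisted: true })  # only the deprecated key present
puts ALREADY_ALLOWED.call({})                        # neither key present
```

Using `fetch` with a block means an explicit `false` under the new key is respected rather than overridden by the deprecated key.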
@@ -22,7 +221,7 @@
 
  When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
  specially crafted HTML fragment can cause libxml2 to generate improperly
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
  elements.
 
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -66,7 +265,7 @@
 
  ## 4.4.0 (2016-09-29)
 
- * Added `srcset` to the attribute whitelist for `img` elements in the relaxed
+ * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
  config. [@ejtttje - #156][156]
 
  [156]:https://github.com/rgrove/sanitize/pull/156
@@ -187,7 +386,7 @@
  ## 3.0.4 (2014-12-12)
 
  * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
- caused the URL to be removed even when the protocol was whitelisted.
+ caused the URL to be removed even when the protocol was allowlisted.
  [@benubois - #126][126]
 
  [126]:https://github.com/rgrove/sanitize/pull/126
@@ -196,7 +395,7 @@
  ## 3.0.3 (2014-10-29)
 
  * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
- `@media` block, causing them to be removed even when whitelist rules should
+ `@media` block, causing them to be removed even when allowlist rules should
  have allowed them to remain. [#121][121]
 
  [121]:https://github.com/rgrove/sanitize/issues/121
@@ -261,7 +460,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  * The `clean_node!` method was renamed to `node!`.
 
  * The `document` method now raises a `Sanitize::Error` if the `<html>` element
- isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
+ isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
  regardless of the `:remove_contents` config setting.
 
  * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -272,7 +471,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  * Added advanced CSS sanitization support using [Crass][crass], which is fully
  compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
- whitelisted `<style>` elements and `style` attributes in HTML will be
+ allowlisted `<style>` elements and `style` attributes in HTML will be
  sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
  sanitize CSS stylesheets or properties.
 
@@ -317,9 +516,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  [n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
 
 
+ ## 2.1.1 (2018-09-30)
+
+ * [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
+ XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
+
+ When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
+ specially crafted HTML fragment can cause libxml2 to generate improperly
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
+ elements.
+
+ Sanitize now performs additional escaping on affected attributes to prevent
+ this.
+
+ Many thanks to the Shopify Application Security Team for responsibly reporting
+ this issue.
+
+ [176]:https://github.com/rgrove/sanitize/issues/176
+ [188]:https://github.com/rgrove/sanitize/pull/188
+
+
  ## 2.1.0 (2014-01-13)
 
- * Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
+ * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
  symbol `:data` instead of an attribute name in the `:attributes` config to
  indicate that arbitrary data attributes should be allowed on an element.
 
@@ -400,12 +619,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  the default depth-first mode.
 
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
- elements to the whitelists for the basic and relaxed configs.
+ elements to the allowlists for the basic and relaxed configs.
 
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
+ `ruby`, and `wbr` elements to the allowlist for the relaxed config.
 
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
+ * The `dir`, `lang`, and `title` attributes are now allowlisted for all
  elements in the relaxed config.
 
  * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -416,7 +635,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  ## 1.2.1 (2010-04-20)
 
  * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
- remove the contents of all non-whitelisted elements in addition to the
+ remove the contents of all non-allowlisted elements in addition to the
  elements themselves. If set to an array of element names, Sanitize will
  remove the contents of only those elements (when filtered), and leave the
  contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -444,7 +663,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
  all its children.
 
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
+ * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
  David Reese]
 
 
@@ -464,7 +683,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  * Added a workaround for an Hpricot bug that prevents attribute names from
  being downcased in recent versions of Hpricot. This was exploitable to
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
+ prevent non-allowlisted protocols from being cleaned. [Reported by Ben
  Wanicur]
 
 
@@ -494,7 +713,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  ## 1.0.5 (2009-02-05)
 
- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
+ * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
  protocols from being cleaned when relative URLs were allowed. [Reported by
  Dev Purkayastha]
 
@@ -504,7 +723,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  ## 1.0.4 (2009-01-16)
 
- * Fixed a bug that made it possible to sneak a non-whitelisted element through
+ * Fixed a bug that made it possible to sneak a non-allowlisted element through
  by repeating it several times in a row. All versions of Sanitize prior to
  1.0.4 are vulnerable. [Reported by Cristobal]
 
@@ -512,7 +731,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  ## 1.0.3 (2009-01-15)
 
  * Fixed a bug whereby incomplete Unicode or hex entities could be used to
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
+ prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
  still decode the incomplete entities, users of those browsers may be
  vulnerable to malicious script injection on websites using versions of
  Sanitize prior to 1.0.3.
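
The 1.2.1 and 5.0.0 entries in the HISTORY.md diff above describe two modes for `:remove_contents`: `true` removes the contents of every filtered element, while an array of element names removes the contents of only those elements. The 5.0.0 fix concerns an array being treated as if it were `true`. The rule as described can be sketched in plain Ruby as follows. This is not Sanitize's implementation, and the helper name `remove_contents?` is hypothetical.

```ruby
# Hypothetical helper, not Sanitize's actual code: decides whether the
# contents of a filtered (non-allowlisted) element should be removed.
def remove_contents?(setting, element_name)
  case setting
  when true  then true   # contents of every filtered element are removed
  when Array then setting.map(&:to_s).include?(element_name.to_s)
  else false             # false or nil: contents are kept
  end
end

puts remove_contents?(true, 'script')              # true
puts remove_contents?(%w[script style], 'script')  # true
puts remove_contents?(%w[script style], 'div')     # false
puts remove_contents?(false, 'script')             # false
```

The third call is the case the 5.0.0 entry is about: with an array setting, an element that is not named in the array keeps its contents.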
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2015 Ryan Grove <ryan@wonko.com>
+ Copyright (c) 2021 Ryan Grove <ryan@wonko.com>
 
  Permission is hereby granted, free of charge, to any person obtaining a copy of
  this software and associated documentation files (the 'Software'), to deal in