sanitize 5.2.1 → 6.1.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/HISTORY.md +130 -0
- data/LICENSE +1 -1
- data/README.md +31 -48
- data/lib/sanitize/config/default.rb +5 -0
- data/lib/sanitize/config/relaxed.rb +3 -1
- data/lib/sanitize/css.rb +33 -0
- data/lib/sanitize/transformers/clean_css.rb +1 -0
- data/lib/sanitize/transformers/clean_doctype.rb +5 -1
- data/lib/sanitize/transformers/clean_element.rb +53 -11
- data/lib/sanitize/version.rb +1 -3
- data/lib/sanitize.rb +2 -2
- data/test/test_clean_comment.rb +16 -16
- data/test/test_clean_css.rb +5 -5
- data/test/test_clean_doctype.rb +15 -15
- data/test/test_clean_element.rb +111 -88
- data/test/test_config.rb +9 -9
- data/test/test_malicious_css.rb +20 -7
- data/test/test_malicious_html.rb +135 -31
- data/test/test_parser.rb +8 -8
- data/test/test_sanitize.rb +28 -28
- data/test/test_sanitize_css.rb +62 -53
- data/test/test_transformers.rb +37 -37
- metadata +16 -28
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 8811451060f77afcf698da8e589994af3c5683d08a0032a279f76b3b556b5f33
|
|
4
|
+
data.tar.gz: d2617a785428b5b99717ef1743cc75dc1f8c53bda53fea59050725a1218b5fe8
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 33b4b13b4369ba159031a1298bd5965b9dbe15921121b58d55155d3e717dd7cadf3495e10683613cd1439055f6d5a57249e540824fd9f98a11ae62db08167573
|
|
7
|
+
data.tar.gz: 40f149e0e3c51283b72332efb4598a81263fba029ea121ede31bb578634de339ed5c162fd49355601568c5cbc08f617879f058bcdfe5ce35afa6322e155cff8b
|
data/HISTORY.md
CHANGED
|
@@ -1,5 +1,135 @@
|
|
|
1
1
|
# Sanitize History
|
|
2
2
|
|
|
3
|
+
## 6.1.3 (2024-08-14)
|
|
4
|
+
|
|
5
|
+
### Bug Fixes
|
|
6
|
+
|
|
7
|
+
* The CSS URL protocol allowlist is now enforced on the nonstandard `-webkit-image-set` CSS function. [@ltk - #242][242]
|
|
8
|
+
|
|
9
|
+
[242]:https://github.com/rgrove/sanitize/pull/242
|
|
10
|
+
|
|
11
|
+
## 6.1.2 (2024-07-27)
|
|
12
|
+
|
|
13
|
+
### Bug Fixes
|
|
14
|
+
|
|
15
|
+
* The CSS URL protocol allowlist is now properly enforced in [CSS Images Module Level 4](https://drafts.csswg.org/css-images-4/) `image` and `image-set` functions. [@ltk - #240][240]
|
|
16
|
+
|
|
17
|
+
[240]:https://github.com/rgrove/sanitize/pull/240
|
|
18
|
+
|
|
19
|
+
## 6.1.1 (2024-06-12)
|
|
20
|
+
|
|
21
|
+
### Bug Fixes
|
|
22
|
+
|
|
23
|
+
* Proactively fixed a compatibility issue with libxml >= 2.13.0 (which will be used in an upcoming version of Nokogiri) that caused HTML doctype sanitization to fail. [@flavorjones - #238][238]
|
|
24
|
+
|
|
25
|
+
[238]:https://github.com/rgrove/sanitize/pull/238
|
|
26
|
+
|
|
27
|
+
## 6.1.0 (2023-09-14)
|
|
28
|
+
|
|
29
|
+
### Features
|
|
30
|
+
|
|
31
|
+
* Added the `text-decoration-skip-ink` and `text-decoration-thickness` CSS properties to the relaxed config. [@martineriksson - #228][228]
|
|
32
|
+
|
|
33
|
+
[228]:https://github.com/rgrove/sanitize/pull/228
|
|
34
|
+
|
|
35
|
+
## 6.0.2 (2023-07-06)
|
|
36
|
+
|
|
37
|
+
### Bug Fixes
|
|
38
|
+
|
|
39
|
+
* CVE-2023-36823: Fixed an HTML+CSS sanitization bypass that could allow XSS
|
|
40
|
+
(cross-site scripting). This issue affects Sanitize versions 3.0.0 through
|
|
41
|
+
6.0.1.
|
|
42
|
+
|
|
43
|
+
When using Sanitize's relaxed config or a custom config that allows `<style>`
|
|
44
|
+
elements and one or more CSS at-rules, carefully crafted input could be used
|
|
45
|
+
to sneak arbitrary HTML through Sanitize.
|
|
46
|
+
|
|
47
|
+
See the following security advisory for additional details:
|
|
48
|
+
[GHSA-f5ww-cq3m-q3g7](https://github.com/rgrove/sanitize/security/advisories/GHSA-f5ww-cq3m-q3g7)
|
|
49
|
+
|
|
50
|
+
Thanks to @cure53 for finding this issue.
|
|
51
|
+
|
|
52
|
+
## 6.0.1 (2023-01-27)
|
|
53
|
+
|
|
54
|
+
### Bug Fixes
|
|
55
|
+
|
|
56
|
+
* Sanitize now always removes `<noscript>` elements and their contents, even
|
|
57
|
+
when `noscript` is in the allowlist.
|
|
58
|
+
|
|
59
|
+
This fixes a sanitization bypass that could occur when `noscript` was allowed
|
|
60
|
+
by a custom allowlist. In this scenario, carefully crafted input could sneak
|
|
61
|
+
arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
|
|
62
|
+
scripting) attack.
|
|
63
|
+
|
|
64
|
+
Sanitize's default configs don't allow `<noscript>` elements and are not
|
|
65
|
+
vulnerable. This issue only affects users who are using a custom config that
|
|
66
|
+
adds `noscript` to the element allowlist.
|
|
67
|
+
|
|
68
|
+
The root cause of this issue is that HTML parsing rules treat the contents of
|
|
69
|
+
a `<noscript>` element differently depending on whether scripting is enabled
|
|
70
|
+
in the user agent. Nokogiri doesn't support scripting so it follows the
|
|
71
|
+
"scripting disabled" rules, but a web browser with scripting enabled will
|
|
72
|
+
follow the "scripting enabled" rules. This means that Sanitize can't reliably
|
|
73
|
+
make the contents of a `<noscript>` element safe for scripting enabled
|
|
74
|
+
browsers, so the safest thing to do is to remove the element and its contents
|
|
75
|
+
entirely.
|
|
76
|
+
|
|
77
|
+
See the following security advisory for additional details:
|
|
78
|
+
[GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
|
|
79
|
+
|
|
80
|
+
Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
|
|
81
|
+
(@leeN) for reporting this issue.
|
|
82
|
+
|
|
83
|
+
* Fixed an edge case in which the contents of an "unescaped text" element (such
|
|
84
|
+
as `<noembed>` or `<xmp>`) were not properly escaped if that element was
|
|
85
|
+
allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
|
|
86
|
+
|
|
87
|
+
The only way to encounter this situation was to ignore multiple warnings in
|
|
88
|
+
the readme and create a custom config that allowlisted all the elements
|
|
89
|
+
involved, including `<math>` or `<svg>`. If you're using a default config or
|
|
90
|
+
if you heeded the warnings about MathML and SVG not being supported, you're
|
|
91
|
+
not affected by this issue.
|
|
92
|
+
|
|
93
|
+
Please let this be a reminder that Sanitize cannot safely sanitize MathML or
|
|
94
|
+
SVG content and does not support this use case. The default configs don't
|
|
95
|
+
allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
|
|
96
|
+
custom config may create a security vulnerability in your application.
|
|
97
|
+
|
|
98
|
+
Documentation has been updated to add more warnings and to make the existing
|
|
99
|
+
warnings about this more prominent.
|
|
100
|
+
|
|
101
|
+
Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
|
|
102
|
+
(@leeN) for reporting this issue.
|
|
103
|
+
|
|
104
|
+
## 6.0.0 (2021-08-03)
|
|
105
|
+
|
|
106
|
+
### Potentially Breaking Changes
|
|
107
|
+
|
|
108
|
+
* Ruby 2.5.0 is now the oldest officially supported Ruby version.
|
|
109
|
+
|
|
110
|
+
* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
|
|
111
|
+
The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
|
|
112
|
+
|
|
113
|
+
[211]:https://github.com/rgrove/sanitize/pull/211
|
|
114
|
+
|
|
115
|
+
## 5.2.3 (2021-01-11)
|
|
116
|
+
|
|
117
|
+
### Bug Fixes
|
|
118
|
+
|
|
119
|
+
* Ensure protocol sanitization is applied to data attributes.
|
|
120
|
+
[@ccutrer - #207][207]
|
|
121
|
+
|
|
122
|
+
[207]:https://github.com/rgrove/sanitize/pull/207
|
|
123
|
+
|
|
124
|
+
## 5.2.2 (2021-01-06)
|
|
125
|
+
|
|
126
|
+
### Bug Fixes
|
|
127
|
+
|
|
128
|
+
* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
|
|
129
|
+
custom transformer. [@mscrivo - #206][206]
|
|
130
|
+
|
|
131
|
+
[206]:https://github.com/rgrove/sanitize/pull/206
|
|
132
|
+
|
|
3
133
|
## 5.2.1 (2020-06-16)
|
|
4
134
|
|
|
5
135
|
### Bug Fixes
|
data/LICENSE
CHANGED
data/README.md
CHANGED
|
@@ -11,27 +11,26 @@ protocols within attributes that contain URLs. You can also allow specific CSS
|
|
|
11
11
|
properties, @ rules, and URL protocols in elements or attributes containing CSS.
|
|
12
12
|
Any HTML or CSS that you don't explicitly allow will be removed.
|
|
13
13
|
|
|
14
|
-
Sanitize is based on [
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
14
|
+
Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
|
|
15
|
+
the same way modern browsers do, and [Crass][crass], which parses CSS the same
|
|
16
|
+
way modern browsers do. As long as your allowlist config only allows safe markup
|
|
17
|
+
and CSS, even the most malformed or malicious input will be transformed into
|
|
18
|
+
safe output.
|
|
19
19
|
|
|
20
|
-
[](https://travis-ci.org/rgrove/sanitize)
|
|
21
20
|
[](http://badge.fury.io/rb/sanitize)
|
|
21
|
+
[](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
|
|
22
22
|
|
|
23
23
|
[crass]:https://github.com/rgrove/crass
|
|
24
|
-
[
|
|
24
|
+
[nokogiri]:https://github.com/sparklemotion/nokogiri
|
|
25
25
|
|
|
26
26
|
Links
|
|
27
27
|
-----
|
|
28
28
|
|
|
29
29
|
* [Home](https://github.com/rgrove/sanitize/)
|
|
30
|
-
* [API Docs](
|
|
30
|
+
* [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
|
|
31
31
|
* [Issues](https://github.com/rgrove/sanitize/issues)
|
|
32
|
-
* [Release History](https://github.com/rgrove/sanitize/
|
|
33
|
-
* [Online Demo](https://sanitize.
|
|
34
|
-
* [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
|
|
32
|
+
* [Release History](https://github.com/rgrove/sanitize/releases)
|
|
33
|
+
* [Online Demo](https://sanitize-web.fly.dev/)
|
|
35
34
|
|
|
36
35
|
Installation
|
|
37
36
|
-------------
|
|
@@ -72,10 +71,11 @@ Sanitize can sanitize the following types of input:
|
|
|
72
71
|
* Standalone CSS stylesheets
|
|
73
72
|
* Standalone CSS properties
|
|
74
73
|
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
74
|
+
> **Warning**
|
|
75
|
+
>
|
|
76
|
+
> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
|
|
77
|
+
>
|
|
78
|
+
> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
|
|
79
79
|
|
|
80
80
|
### HTML Fragments
|
|
81
81
|
|
|
@@ -118,11 +118,10 @@ Sanitize.document(html,
|
|
|
118
118
|
:elements => ['html']
|
|
119
119
|
)
|
|
120
120
|
# => %[
|
|
121
|
-
#
|
|
122
|
-
# <html>foo
|
|
121
|
+
# <!DOCTYPE html><html>foo
|
|
123
122
|
#
|
|
124
|
-
#
|
|
125
|
-
#
|
|
123
|
+
# </html>
|
|
124
|
+
# ]
|
|
126
125
|
```
|
|
127
126
|
|
|
128
127
|
### CSS in HTML
|
|
@@ -420,15 +419,21 @@ elements not in this array will be removed.
|
|
|
420
419
|
]
|
|
421
420
|
```
|
|
422
421
|
|
|
423
|
-
**Warning
|
|
424
|
-
|
|
425
|
-
|
|
426
|
-
|
|
427
|
-
removed by Sanitize.
|
|
422
|
+
> **Warning**
|
|
423
|
+
>
|
|
424
|
+
> Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
|
|
425
|
+
>
|
|
426
|
+
> By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
|
|
427
|
+
|
|
428
|
+
> **Note**
|
|
429
|
+
>
|
|
430
|
+
> Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
|
|
431
|
+
>
|
|
432
|
+
> This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
|
|
428
433
|
|
|
429
434
|
#### :parser_options (Hash)
|
|
430
435
|
|
|
431
|
-
[Parsing options](https://github.com/rubys/nokogumbo/tree/
|
|
436
|
+
[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
|
|
432
437
|
|
|
433
438
|
```ruby
|
|
434
439
|
:parser_options => {
|
|
@@ -469,7 +474,7 @@ If this is an Array or Set of element names, then only the contents of the
|
|
|
469
474
|
specified elements (when filtered) will be removed, and the contents of all
|
|
470
475
|
other filtered elements will be left behind.
|
|
471
476
|
|
|
472
|
-
The default value is
|
|
477
|
+
The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
|
|
473
478
|
|
|
474
479
|
#### :transformers (Array or callable)
|
|
475
480
|
|
|
@@ -667,25 +672,3 @@ html = %[
|
|
|
667
672
|
Sanitize.fragment(html, :transformers => youtube_transformer)
|
|
668
673
|
# => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
|
|
669
674
|
```
|
|
670
|
-
|
|
671
|
-
License
|
|
672
|
-
-------
|
|
673
|
-
|
|
674
|
-
Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
|
|
675
|
-
|
|
676
|
-
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
|
677
|
-
this software and associated documentation files (the 'Software'), to deal in
|
|
678
|
-
the Software without restriction, including without limitation the rights to
|
|
679
|
-
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
|
|
680
|
-
the Software, and to permit persons to whom the Software is furnished to do so,
|
|
681
|
-
subject to the following conditions:
|
|
682
|
-
|
|
683
|
-
The above copyright notice and this permission notice shall be included in all
|
|
684
|
-
copies or substantial portions of the Software.
|
|
685
|
-
|
|
686
|
-
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
687
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
|
688
|
-
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
|
689
|
-
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
|
690
|
-
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
|
691
|
-
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/lib/sanitize/config/default.rb
CHANGED
|
@@ -54,6 +54,11 @@ class Sanitize
|
|
|
54
54
|
|
|
55
55
|
# HTML elements to allow. By default, no elements are allowed (which means
|
|
56
56
|
# that all HTML will be stripped).
|
|
57
|
+
#
|
|
58
|
+
# Warning: Sanitize cannot safely sanitize the contents of foreign
|
|
59
|
+
# elements (elements in the MathML or SVG namespaces). Do not add `math`
|
|
60
|
+
# or `svg` to this list! If you do, you may create a security
|
|
61
|
+
# vulnerability in your application.
|
|
57
62
|
:elements => [],
|
|
58
63
|
|
|
59
64
|
# HTML parsing options to pass to Nokogumbo.
|
data/lib/sanitize/config/relaxed.rb
CHANGED
|
@@ -6,7 +6,7 @@ class Sanitize
|
|
|
6
6
|
:elements => BASIC[:elements] + %w[
|
|
7
7
|
address article aside bdi bdo body caption col colgroup data del div
|
|
8
8
|
figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
|
|
9
|
-
img ins main nav rp rt ruby section span style summary
|
|
9
|
+
img ins main nav rp rt ruby section span style summary table tbody
|
|
10
10
|
td tfoot th thead title tr wbr
|
|
11
11
|
],
|
|
12
12
|
|
|
@@ -666,7 +666,9 @@ class Sanitize
|
|
|
666
666
|
text-decoration-color
|
|
667
667
|
text-decoration-line
|
|
668
668
|
text-decoration-skip
|
|
669
|
+
text-decoration-skip-ink
|
|
669
670
|
text-decoration-style
|
|
671
|
+
text-decoration-thickness
|
|
670
672
|
text-emphasis
|
|
671
673
|
text-emphasis-color
|
|
672
674
|
text-emphasis-position
|
data/lib/sanitize/css.rb
CHANGED
|
@@ -229,6 +229,12 @@ class Sanitize; class CSS
|
|
|
229
229
|
rule
|
|
230
230
|
end
|
|
231
231
|
|
|
232
|
+
# Returns `true` if the given CSS function name is an image-related function
|
|
233
|
+
# that may contain image URLs that need to be validated.
|
|
234
|
+
def image_function?(name)
|
|
235
|
+
['image', 'image-set', '-webkit-image-set'].include?(name)
|
|
236
|
+
end
|
|
237
|
+
|
|
232
238
|
# Passes the URL value of an @import rule to a block to ensure
|
|
233
239
|
# it's an allowed URL
|
|
234
240
|
def import_url_allowed?(rule)
|
|
@@ -272,6 +278,10 @@ class Sanitize; class CSS
|
|
|
272
278
|
return nil unless valid_url?(child)
|
|
273
279
|
end
|
|
274
280
|
|
|
281
|
+
if image_function?(name)
|
|
282
|
+
return nil unless valid_image?(child)
|
|
283
|
+
end
|
|
284
|
+
|
|
275
285
|
combined_value << name
|
|
276
286
|
return nil if name == 'expression' || combined_value == 'expression'
|
|
277
287
|
end
|
|
@@ -345,4 +355,27 @@ class Sanitize; class CSS
|
|
|
345
355
|
false
|
|
346
356
|
end
|
|
347
357
|
|
|
358
|
+
# Returns `true` if the given node is an image-related function and contains
|
|
359
|
+
# only strings that use an allowlisted protocol.
|
|
360
|
+
def valid_image?(node)
|
|
361
|
+
return false unless node[:node] == :function
|
|
362
|
+
return false unless node.key?(:name) && image_function?(node[:name].downcase)
|
|
363
|
+
return false unless Array === node[:value]
|
|
364
|
+
|
|
365
|
+
node[:value].each do |token|
|
|
366
|
+
return false unless Hash === token
|
|
367
|
+
|
|
368
|
+
case token[:node]
|
|
369
|
+
when :string
|
|
370
|
+
if token[:value] =~ Sanitize::REGEX_PROTOCOL
|
|
371
|
+
return false unless @config[:protocols].include?($1.downcase)
|
|
372
|
+
else
|
|
373
|
+
return false unless @config[:protocols].include?(:relative)
|
|
374
|
+
end
|
|
375
|
+
else
|
|
376
|
+
next
|
|
377
|
+
end
|
|
378
|
+
end
|
|
379
|
+
end
|
|
380
|
+
|
|
348
381
|
end; end
|
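Review note: the `valid_image?` check added in this file can be tried outside the gem with a minimal standalone sketch. The hashes below only imitate the shape of the function nodes used in the hunk (`:node`, `:name`, `:value`). `ALLOWED_PROTOCOLS` and `PROTOCOL_RE` are hypothetical stand-ins for `@config[:protocols]` and `Sanitize::REGEX_PROTOCOL`, so treat this as an illustration of the idea rather than the gem's actual behavior.

```ruby
# Standalone sketch of the image-function protocol check; not the gem's code.
IMAGE_FUNCTIONS = ['image', 'image-set', '-webkit-image-set'].freeze

# Hypothetical stand-ins for @config[:protocols] and Sanitize::REGEX_PROTOCOL.
ALLOWED_PROTOCOLS = ['http', 'https', :relative].freeze
PROTOCOL_RE = /\A\s*([^\/#]*?)(?::|&#0*58|&#x0*3a)/i

def image_function?(name)
  IMAGE_FUNCTIONS.include?(name)
end

# Returns true only if every string token inside an image function uses an
# allowed protocol (or is relative, when :relative is allowed).
def valid_image?(node)
  return false unless node[:node] == :function
  return false unless node.key?(:name) && image_function?(node[:name].downcase)
  return false unless node[:value].is_a?(Array)

  node[:value].all? do |token|
    next false unless token.is_a?(Hash)
    next true unless token[:node] == :string

    if (match = PROTOCOL_RE.match(token[:value]))
      ALLOWED_PROTOCOLS.include?(match[1].downcase)
    else
      ALLOWED_PROTOCOLS.include?(:relative)
    end
  end
end

# Made-up sample nodes: one with an allowed protocol, one without.
safe = { node: :function, name: 'image-set',
         value: [{ node: :string, value: 'https://example.com/a.png' }] }
evil = { node: :function, name: '-webkit-image-set',
         value: [{ node: :string, value: 'javascript:alert(1)' }] }

puts valid_image?(safe)
puts valid_image?(evil)
```

With these sample nodes the first call reports the `https` URL as acceptable and the second rejects the `javascript:` URL, which is the case the 6.1.2 and 6.1.3 history entries describe.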
data/lib/sanitize/transformers/clean_doctype.rb
CHANGED
|
@@ -9,7 +9,11 @@ class Sanitize; module Transformers
|
|
|
9
9
|
|
|
10
10
|
if node.type == Nokogiri::XML::Node::DTD_NODE
|
|
11
11
|
if env[:config][:allow_doctype]
|
|
12
|
-
node.name
|
|
12
|
+
if node.name != "html"
|
|
13
|
+
document = node.document
|
|
14
|
+
node.unlink
|
|
15
|
+
document.create_internal_subset("html", nil, nil)
|
|
16
|
+
end
|
|
13
17
|
else
|
|
14
18
|
node.unlink
|
|
15
19
|
end
|
data/lib/sanitize/transformers/clean_element.rb
CHANGED
|
@@ -1,5 +1,6 @@
|
|
|
1
1
|
# encoding: utf-8
|
|
2
2
|
|
|
3
|
+
require 'cgi'
|
|
3
4
|
require 'set'
|
|
4
5
|
|
|
5
6
|
class Sanitize; module Transformers; class CleanElement
|
|
@@ -18,6 +19,18 @@ class Sanitize; module Transformers; class CleanElement
|
|
|
18
19
|
# http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
|
|
19
20
|
REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
|
|
20
21
|
|
|
22
|
+
# Elements whose content is treated as unescaped text by HTML parsers.
|
|
23
|
+
UNESCAPED_TEXT_ELEMENTS = Set.new(%w[
|
|
24
|
+
iframe
|
|
25
|
+
noembed
|
|
26
|
+
noframes
|
|
27
|
+
noscript
|
|
28
|
+
plaintext
|
|
29
|
+
script
|
|
30
|
+
style
|
|
31
|
+
xmp
|
|
32
|
+
])
|
|
33
|
+
|
|
21
34
|
# Attributes that need additional escaping on `<a>` elements due to unsafe
|
|
22
35
|
# libxml2 behavior.
|
|
23
36
|
UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
|
|
@@ -120,18 +133,15 @@ class Sanitize; module Transformers; class CleanElement
|
|
|
120
133
|
attr_name = attr.name.downcase
|
|
121
134
|
|
|
122
135
|
unless attr_allowlist.include?(attr_name)
|
|
123
|
-
# The attribute isn't allowed
|
|
124
|
-
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
# attribute
|
|
128
|
-
|
|
136
|
+
# The attribute isn't in the allowlist, but may still be allowed if
|
|
137
|
+
# it's a data attribute.
|
|
138
|
+
|
|
139
|
+
unless allow_data_attributes && attr_name.start_with?('data-') && attr_name =~ REGEX_DATA_ATTR
|
|
140
|
+
# Either the attribute isn't a data attribute or arbitrary data
|
|
141
|
+
# attributes aren't allowed. Remove the attribute.
|
|
142
|
+
attr.unlink
|
|
143
|
+
next
|
|
129
144
|
end
|
|
130
|
-
|
|
131
|
-
# Either the attribute isn't a data attribute or arbitrary data
|
|
132
|
-
# attributes aren't allowed. Remove the attribute.
|
|
133
|
-
attr.unlink
|
|
134
|
-
next
|
|
135
145
|
end
|
|
136
146
|
|
|
137
147
|
# The attribute is allowed.
|
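Review note: the restructured condition in the hunk above can be summarized with a small sketch that reuses the `REGEX_DATA_ATTR` pattern shown earlier in this file. The helper name `keep_attribute?` and its arguments are hypothetical and exist only for illustration.

```ruby
require 'set'

# Pattern copied from the hunk above.
REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u

# Hypothetical helper: an attribute survives if it is allowlisted, or if
# arbitrary data attributes are allowed and the name is a valid data-* name.
def keep_attribute?(attr_name, allowlist, allow_data_attributes)
  return true if allowlist.include?(attr_name)

  allow_data_attributes &&
    attr_name.start_with?('data-') &&
    attr_name.match?(REGEX_DATA_ATTR)
end

allowlist = Set.new(%w[href title])

puts keep_attribute?('href', allowlist, false)
puts keep_attribute?('data-foo', allowlist, true)
puts keep_attribute?('data-xmlfoo', allowlist, true)
```

Anything that fails this check is unlinked from the node, matching the `attr.unlink` and `next` lines in the new code.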
|
@@ -188,6 +198,28 @@ class Sanitize; module Transformers; class CleanElement
|
|
|
188
198
|
@add_attributes[name].each {|key, val| node[key] = val }
|
|
189
199
|
end
|
|
190
200
|
|
|
201
|
+
# Make a best effort to ensure that text nodes in invalid "unescaped text"
|
|
202
|
+
# elements that are inside a math or svg namespace are properly escaped so
|
|
203
|
+
# that they don't get parsed as HTML.
|
|
204
|
+
#
|
|
205
|
+
# Sanitize is explicitly documented as not supporting MathML or SVG, but
|
|
206
|
+
# people sometimes allow `<math>` and `<svg>` elements in their custom
|
|
207
|
+
# configs without realizing that it's not safe. This workaround makes it
|
|
208
|
+
# slightly less unsafe, but you still shouldn't allow `<math>` or `<svg>`
|
|
209
|
+
# because Nokogiri doesn't parse them the same way browsers do and Sanitize
|
|
210
|
+
# can't guarantee that their contents are safe.
|
|
211
|
+
unless node.namespace.nil?
|
|
212
|
+
prefix = node.namespace.prefix
|
|
213
|
+
|
|
214
|
+
if (prefix == 'math' || prefix == 'svg') && UNESCAPED_TEXT_ELEMENTS.include?(name)
|
|
215
|
+
node.children.each do |child|
|
|
216
|
+
if child.type == Nokogiri::XML::Node::TEXT_NODE
|
|
217
|
+
child.content = CGI.escapeHTML(child.content)
|
|
218
|
+
end
|
|
219
|
+
end
|
|
220
|
+
end
|
|
221
|
+
end
|
|
222
|
+
|
|
191
223
|
# Element-specific special cases.
|
|
192
224
|
case name
|
|
193
225
|
|
|
@@ -220,6 +252,16 @@ class Sanitize; module Transformers; class CleanElement
|
|
|
220
252
|
|
|
221
253
|
node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
|
|
222
254
|
end
|
|
255
|
+
|
|
256
|
+
# A `<noscript>` element's content is parsed differently in browsers
|
|
257
|
+
# depending on whether or not scripting is enabled. Since Nokogiri doesn't
|
|
258
|
+
# support scripting, it always parses `<noscript>` elements as if scripting
|
|
259
|
+
# is disabled. This results in edge cases where it's not possible to
|
|
260
|
+
# reliably sanitize the contents of a `<noscript>` element because Nokogiri
|
|
261
|
+
# can't fully replicate the parsing behavior of a scripting-enabled browser.
|
|
262
|
+
# The safest thing to do is to simply remove all `<noscript>` elements.
|
|
263
|
+
when 'noscript'
|
|
264
|
+
node.unlink
|
|
223
265
|
end
|
|
224
266
|
end
|
|
225
267
|
|
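Review note: the escaping workaround added to this file relies on `CGI.escapeHTML` from Ruby's standard library. As a rough illustration (the sample string is made up), escaping a text node's content turns markup characters into entities so the text can no longer be re-parsed as HTML:

```ruby
require 'cgi'

# Hypothetical text content found inside e.g. <svg><noembed>...</noembed></svg>.
raw = '<img src=x onerror="alert(1)">'

# Replace &, <, >, and quote characters with HTML entities.
escaped = CGI.escapeHTML(raw)
puts escaped
# => &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
```

As the comment in the hunk says, this is a best-effort measure only. It does not make `<math>` or `<svg>` safe to allowlist.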
data/lib/sanitize/version.rb
CHANGED
data/lib/sanitize.rb
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
# encoding: utf-8
|
|
2
2
|
|
|
3
|
-
require '
|
|
3
|
+
require 'nokogiri'
|
|
4
4
|
require 'set'
|
|
5
5
|
|
|
6
6
|
require_relative 'sanitize/version'
|
|
@@ -204,7 +204,7 @@ class Sanitize
|
|
|
204
204
|
config[:node_name] = node.name.downcase
|
|
205
205
|
config[:node_allowlist] = config[:node_whitelist] = node_allowlist
|
|
206
206
|
|
|
207
|
-
result = transformer.call(config)
|
|
207
|
+
result = transformer.call(**config)
|
|
208
208
|
|
|
209
209
|
if result.is_a?(Hash)
|
|
210
210
|
result_allowlist = result[:node_allowlist] || result[:node_whitelist]
|
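Review note: the switch to `transformer.call(**config)` matters for transformers that declare keyword parameters, which appears to be the Ruby 2.7+ deprecation mentioned in the 5.2.2 history entry. A minimal sketch, using a made-up transformer and a plain hash in place of the real transformer environment:

```ruby
# Hypothetical transformer that declares a keyword parameter; `**_rest`
# swallows any keys it does not name explicitly.
transformer = lambda do |node_name:, **_rest|
  { node_name: node_name.upcase }
end

# Stand-in for the hash Sanitize builds before invoking a transformer.
config = { node_name: 'div', is_allowlisted: false }

# Splatting passes the hash entries as keywords, which avoids the implicit
# hash-to-keywords conversion that Ruby 2.7 deprecated and Ruby 3 removed.
result = transformer.call(**config)
puts result.inspect
```

Calling this lambda as `transformer.call(config)` instead would raise an `ArgumentError` on Ruby 3, since the hash would be treated as a positional argument.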
data/test/test_clean_comment.rb
CHANGED
|
@@ -11,18 +11,18 @@ describe 'Sanitize::Transformers::CleanComment' do
|
|
|
11
11
|
end
|
|
12
12
|
|
|
13
13
|
it 'should remove comments' do
|
|
14
|
-
@s.fragment('foo <!-- comment --> bar').must_equal 'foo bar'
|
|
15
|
-
@s.fragment('foo <!-- ').must_equal 'foo '
|
|
16
|
-
@s.fragment('foo <!-- - -> bar').must_equal 'foo '
|
|
17
|
-
@s.fragment("foo <!--\n\n\n\n-->bar").must_equal 'foo bar'
|
|
18
|
-
@s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo --> -->bar'
|
|
19
|
-
@s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>>bar</div>'
|
|
14
|
+
_(@s.fragment('foo <!-- comment --> bar')).must_equal 'foo bar'
|
|
15
|
+
_(@s.fragment('foo <!-- ')).must_equal 'foo '
|
|
16
|
+
_(@s.fragment('foo <!-- - -> bar')).must_equal 'foo '
|
|
17
|
+
_(@s.fragment("foo <!--\n\n\n\n-->bar")).must_equal 'foo bar'
|
|
18
|
+
_(@s.fragment("foo <!-- <!-- <!-- --> --> -->bar")).must_equal 'foo --> -->bar'
|
|
19
|
+
_(@s.fragment("foo <div <!-- comment -->>bar</div>")).must_equal 'foo <div>>bar</div>'
|
|
20
20
|
|
|
21
21
|
# Special case: the comment markup is inside a <script>, which makes it
|
|
22
22
|
# text content and not an actual HTML comment.
|
|
23
|
-
@s.fragment("<script><!-- comment --></script>").must_equal ''
|
|
23
|
+
_(@s.fragment("<script><!-- comment --></script>")).must_equal ''
|
|
24
24
|
|
|
25
|
-
Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script'])
|
|
25
|
+
_(Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script']))
|
|
26
26
|
.must_equal '<script><!-- comment --></script>'
|
|
27
27
|
end
|
|
28
28
|
end
|
|
@@ -33,14 +33,14 @@ describe 'Sanitize::Transformers::CleanComment' do
|
|
|
33
33
|
end
|
|
34
34
|
|
|
35
35
|
it 'should allow comments' do
|
|
36
|
-
@s.fragment('foo <!-- comment --> bar').must_equal 'foo <!-- comment --> bar'
|
|
37
|
-
@s.fragment('foo <!-- ').must_equal 'foo <!-- -->'
|
|
38
|
-
@s.fragment('foo <!-- - -> bar').must_equal 'foo <!-- - -> bar-->'
|
|
39
|
-
@s.fragment("foo <!--\n\n\n\n-->bar").must_equal "foo <!--\n\n\n\n-->bar"
|
|
40
|
-
@s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo <!-- <!-- <!-- --> --> -->bar'
|
|
41
|
-
@s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>>bar</div>'
|
|
42
|
-
|
|
43
|
-
Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script'])
|
|
36
|
+
_(@s.fragment('foo <!-- comment --> bar')).must_equal 'foo <!-- comment --> bar'
|
|
37
|
+
_(@s.fragment('foo <!-- ')).must_equal 'foo <!-- -->'
|
|
38
|
+
_(@s.fragment('foo <!-- - -> bar')).must_equal 'foo <!-- - -> bar-->'
|
|
39
|
+
_(@s.fragment("foo <!--\n\n\n\n-->bar")).must_equal "foo <!--\n\n\n\n-->bar"
|
|
40
|
+
_(@s.fragment("foo <!-- <!-- <!-- --> --> -->bar")).must_equal 'foo <!-- <!-- <!-- --> --> -->bar'
|
|
41
|
+
_(@s.fragment("foo <div <!-- comment -->>bar</div>")).must_equal 'foo <div>>bar</div>'
|
|
42
|
+
|
|
43
|
+
_(Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script']))
|
|
44
44
|
.must_equal '<script><!-- comment --></script>'
|
|
45
45
|
end
|
|
46
46
|
end
|
data/test/test_clean_css.rb
CHANGED
|
@@ -10,15 +10,15 @@ describe 'Sanitize::Transformers::CSS::CleanAttribute' do
|
|
|
10
10
|
end
|
|
11
11
|
|
|
12
12
|
it 'should sanitize CSS properties in style attributes' do
|
|
13
|
-
@s.fragment(%[
|
|
13
|
+
_(@s.fragment(%[
|
|
14
14
|
<div style="color: #fff; width: expression(alert(1)); /* <-- evil! */"></div>
|
|
15
|
-
].strip).must_equal %[
|
|
15
|
+
].strip)).must_equal %[
|
|
16
16
|
<div style="color: #fff; /* <-- evil! */"></div>
|
|
17
17
|
].strip
|
|
18
18
|
end
|
|
19
19
|
|
|
20
20
|
it 'should remove the style attribute if the sanitized CSS is empty' do
|
|
21
|
-
@s.fragment('<div style="width: expression(alert(1))"></div>').
|
|
21
|
+
_(@s.fragment('<div style="width: expression(alert(1))"></div>')).
|
|
22
22
|
must_equal '<div></div>'
|
|
23
23
|
end
|
|
24
24
|
end
|
|
@@ -46,7 +46,7 @@ describe 'Sanitize::Transformers::CSS::CleanElement' do
|
|
|
46
46
|
</style>
|
|
47
47
|
].strip
|
|
48
48
|
|
|
49
|
-
@s.fragment(html).must_equal %[
|
|
49
|
+
_(@s.fragment(html)).must_equal %[
|
|
50
50
|
<style>
|
|
51
51
|
/* Yay CSS! */
|
|
52
52
|
.foo { color: #fff; }
|
|
@@ -62,6 +62,6 @@ describe 'Sanitize::Transformers::CSS::CleanElement' do
|
|
|
62
62
|
end
|
|
63
63
|
|
|
64
64
|
it 'should remove the <style> element if the sanitized CSS is empty' do
|
|
65
|
-
@s.fragment('<style></style>').must_equal ''
|
|
65
|
+
_(@s.fragment('<style></style>')).must_equal ''
|
|
66
66
|
end
|
|
67
67
|
end
|