sanitize 5.1.0 → 6.0.0
Potentially problematic release.
This version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +103 -18
- data/LICENSE +1 -1
- data/README.md +52 -65
- data/lib/sanitize/config/default.rb +1 -1
- data/lib/sanitize/config/relaxed.rb +1 -1
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +17 -20
- data/lib/sanitize/version.rb +1 -1
- data/lib/sanitize.rb +17 -13
- data/test/test_clean_element.rb +40 -14
- data/test/test_malicious_html.rb +20 -1
- data/test/test_parser.rb +1 -1
- data/test/test_sanitize.rb +5 -5
- data/test/test_sanitize_css.rb +4 -4
- data/test/test_transformers.rb +25 -19
- metadata +17 -31
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 94a37503617774f9317150c834cc3025cd32a718be754fb72eea1b9dd7347571
+  data.tar.gz: 597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c6d2dedfa9d6a589788d4156babae09cf14b3bebc765a9bb04a492aa5b5702f82dc3ae26d45199da3e8f9c096dfd191d15c53fea8d62084a3679604be5f7ddba
+  data.tar.gz: 70bbb00756f1a4a085ad5901b27fd91ebc4308d5f42bfa57ec54c8cc7982ded8395eff9b59546ca62f3dba6e7a012351d62f9ec81b06aa8ccbb563211f39bd3c
|
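The `SHA256` and `SHA512` entries in `checksums.yaml` are hex digests of the `metadata.gz` and `data.tar.gz` archives packed inside the `.gem` file. As a minimal sketch of how such an entry could be checked by hand, the snippet below compares a digest computed with Ruby's standard `Digest` library against an expected value. The helper name `digest_matches?` and the sample data are illustrative assumptions, not part of RubyGems or Sanitize.

```ruby
require 'digest'

# Hypothetical helper: true when the hex digest of `bytes` equals `expected_hex`.
# `algorithm` can be Digest::SHA256 or Digest::SHA512, matching the two
# sections of checksums.yaml.
def digest_matches?(bytes, expected_hex, algorithm: Digest::SHA256)
  algorithm.hexdigest(bytes) == expected_hex.strip.downcase
end

# Sample data only; a real check would read the archive bytes from disk,
# e.g. File.binread('data.tar.gz').
sample   = 'hello'
expected = Digest::SHA256.hexdigest(sample)

puts digest_matches?(sample, expected)        # unchanged data
puts digest_matches?(sample + '!', expected)  # altered data
```

In practice `gem install` performs this verification itself; the sketch only shows what the listed values represent.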
data/HISTORY.md
CHANGED
@@ -1,5 +1,90 @@
 # Sanitize History
 
+## 6.0.0 (2021-08-03)
+
+### Potentially Breaking Changes
+
+* Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+  The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+[211]:https://github.com/rgrove/sanitize/pull/211
+
+## 5.2.3 (2021-01-11)
+
+### Bug Fixes
+
+* Ensure protocol sanitization is applied to data attributes.
+  [@ccutrer - #207][207]
+
+[207]:https://github.com/rgrove/sanitize/pull/207
+
+## 5.2.2 (2021-01-06)
+
+### Bug Fixes
+
+* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+  custom transformer. [@mscrivo - #206][206]
+
+[206]:https://github.com/rgrove/sanitize/pull/206
+
+## 5.2.1 (2020-06-16)
+
+### Bug Fixes
+
+* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+  Sanitize versions 3.0.0 through 5.2.0.
+
+  When HTML was sanitized using the "relaxed" config or a custom config that
+  allows certain elements, some content in a `<math>` or `<svg>` element may not
+  have been sanitized correctly even if `math` and `svg` were not in the
+  allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+  through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+  You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+  config or a custom config that allows one or more of the following HTML
+  elements:
+
+  - `iframe`
+  - `math`
+  - `noembed`
+  - `noframes`
+  - `noscript`
+  - `plaintext`
+  - `script`
+  - `style`
+  - `svg`
+  - `xmp`
+
+  See the security advisory for more details, including a workaround if you're
+  not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+  Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+  helping to verify the fix.
+
+[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
+## 5.2.0 (2020-06-06)
+
+### Changes
+
+* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+  source and documentation.
+
+  While the etymology of "whitelist" may not be explicitly racist in origin or
+  intent, there are inherent racial connotations in the implication that white
+  is good and black (as in "blacklist") is not.
+
+  This is a change I should have made long ago, and I apologize for not making
+  it sooner.
+
+* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+  deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+  The old keys will continue to work in order to avoid breaking existing code,
+  but they are no longer documented and may be removed in a future semver major
+  release.
+
 ## 5.1.0 (2019-09-07)
 
 ### Features
|
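The 5.2.0 entry deprecates the `:is_whitelisted` transformer input key in favour of `:is_allowlisted`. For a custom transformer that may also have to run against releases older than 5.2.0, one possible approach is to prefer the new key and fall back to the old one. The sketch below assumes nothing beyond the key names given in the changelog; the helper `allowlisted?` and the `name_collector` lambda are hypothetical names, and the env Hashes are hand-built so the snippet runs without Sanitize installed.

```ruby
# Hypothetical helper: read the allowlist flag from a transformer's env Hash,
# preferring the new key and falling back to the deprecated one.
def allowlisted?(env)
  flag = env.key?(:is_allowlisted) ? env[:is_allowlisted] : env[:is_whitelisted]
  flag ? true : false
end

# Hypothetical transformer that records the names of nodes it inspects and
# leaves already-allowlisted nodes alone.
seen = []
name_collector = lambda do |env|
  return if allowlisted?(env)
  seen << env[:node_name]
  nil # non-Hash return values are ignored by Sanitize
end

# Hand-built env Hashes standing in for what Sanitize would pass.
name_collector.call(node_name: 'div', is_allowlisted: false)
name_collector.call(node_name: 'iframe', is_allowlisted: true)
name_collector.call(node_name: 'span', is_whitelisted: true) # env shape from an older release
p seen # => ["div"]
```

Code that only targets 5.2.0 or later can read `env[:is_allowlisted]` directly.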
@@ -45,7 +130,7 @@ review the changes below carefully.
|
|
45
130
|
- `script`
|
46
131
|
- `style`
|
47
132
|
|
48
|
-
* Children of
|
133
|
+
* Children of allowlisted `iframe` elements are now always removed. In modern
|
49
134
|
HTML, `iframe` elements should never have children. In HTML 4 and earlier
|
50
135
|
`iframe` elements were allowed to contain fallback content for legacy
|
51
136
|
browsers, but it's been almost two decades since that was useful.
|
@@ -84,7 +169,7 @@ review the changes below carefully.
|
|
84
169
|
|
85
170
|
When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
|
86
171
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
87
|
-
escaped output, allowing non-
|
172
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
88
173
|
elements.
|
89
174
|
|
90
175
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -128,7 +213,7 @@ review the changes below carefully.
|
|
128
213
|
|
129
214
|
## 4.4.0 (2016-09-29)
|
130
215
|
|
131
|
-
* Added `srcset` to the attribute
|
216
|
+
* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
|
132
217
|
config. [@ejtttje - #156][156]
|
133
218
|
|
134
219
|
[156]:https://github.com/rgrove/sanitize/pull/156
|
@@ -249,7 +334,7 @@ review the changes below carefully.
|
|
249
334
|
## 3.0.4 (2014-12-12)
|
250
335
|
|
251
336
|
* Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
|
252
|
-
caused the URL to be removed even when the protocol was
|
337
|
+
caused the URL to be removed even when the protocol was allowlisted.
|
253
338
|
[@benubois - #126][126]
|
254
339
|
|
255
340
|
[126]:https://github.com/rgrove/sanitize/pull/126
|
@@ -258,7 +343,7 @@ review the changes below carefully.
|
|
258
343
|
## 3.0.3 (2014-10-29)
|
259
344
|
|
260
345
|
* Fixed: Some CSS selectors weren't parsed correctly inside the body of a
|
261
|
-
`@media` block, causing them to be removed even when
|
346
|
+
`@media` block, causing them to be removed even when allowlist rules should
|
262
347
|
have allowed them to remain. [#121][121]
|
263
348
|
|
264
349
|
[121]:https://github.com/rgrove/sanitize/issues/121
|
@@ -323,7 +408,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
323
408
|
* The `clean_node!` method was renamed to `node!`.
|
324
409
|
|
325
410
|
* The `document` method now raises a `Sanitize::Error` if the `<html>` element
|
326
|
-
isn't
|
411
|
+
isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
|
327
412
|
regardless of the `:remove_contents` config setting.
|
328
413
|
|
329
414
|
* The `:output` config has been removed. Output is now always HTML, not XHTML.
|
@@ -334,7 +419,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
334
419
|
|
335
420
|
* Added advanced CSS sanitization support using [Crass][crass], which is fully
|
336
421
|
compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
|
337
|
-
|
422
|
+
allowlisted `<style>` elements and `style` attributes in HTML will be
|
338
423
|
sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
|
339
424
|
sanitize CSS stylesheets or properties.
|
340
425
|
|
@@ -386,7 +471,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
386
471
|
|
387
472
|
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
|
388
473
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
389
|
-
escaped output, allowing non-
|
474
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
390
475
|
elements.
|
391
476
|
|
392
477
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -401,7 +486,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
401
486
|
|
402
487
|
## 2.1.0 (2014-01-13)
|
403
488
|
|
404
|
-
* Added support for
|
489
|
+
* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
|
405
490
|
symbol `:data` instead of an attribute name in the `:attributes` config to
|
406
491
|
indicate that arbitrary data attributes should be allowed on an element.
|
407
492
|
|
@@ -482,12 +567,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
482
567
|
the default depth-first mode.
|
483
568
|
|
484
569
|
* Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
|
485
|
-
elements to the
|
570
|
+
elements to the allowlists for the basic and relaxed configs.
|
486
571
|
|
487
572
|
* Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
|
488
|
-
`ruby`, and `wbr` elements to the
|
573
|
+
`ruby`, and `wbr` elements to the allowlist for the relaxed config.
|
489
574
|
|
490
|
-
* The `dir`, `lang`, and `title` attributes are now
|
575
|
+
* The `dir`, `lang`, and `title` attributes are now allowlisted for all
|
491
576
|
elements in the relaxed config.
|
492
577
|
|
493
578
|
* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
|
@@ -498,7 +583,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
498
583
|
## 1.2.1 (2010-04-20)
|
499
584
|
|
500
585
|
* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
|
501
|
-
remove the contents of all non-
|
586
|
+
remove the contents of all non-allowlisted elements in addition to the
|
502
587
|
elements themselves. If set to an array of element names, Sanitize will
|
503
588
|
remove the contents of only those elements (when filtered), and leave the
|
504
589
|
contents of other filtered elements. [Thanks to Rafael Souza for the array
|
@@ -526,7 +611,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
526
611
|
* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
|
527
612
|
all its children.
|
528
613
|
|
529
|
-
* Added elements `<h1>` through `<h6>` to the Relaxed
|
614
|
+
* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
|
530
615
|
David Reese]
|
531
616
|
|
532
617
|
|
@@ -546,7 +631,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
546
631
|
|
547
632
|
* Added a workaround for an Hpricot bug that prevents attribute names from
|
548
633
|
being downcased in recent versions of Hpricot. This was exploitable to
|
549
|
-
prevent non-
|
634
|
+
prevent non-allowlisted protocols from being cleaned. [Reported by Ben
|
550
635
|
Wanicur]
|
551
636
|
|
552
637
|
|
@@ -576,7 +661,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
576
661
|
|
577
662
|
## 1.0.5 (2009-02-05)
|
578
663
|
|
579
|
-
* Fixed a bug introduced in version 1.0.3 that prevented non-
|
664
|
+
* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
|
580
665
|
protocols from being cleaned when relative URLs were allowed. [Reported by
|
581
666
|
Dev Purkayastha]
|
582
667
|
|
@@ -586,7 +671,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
586
671
|
|
587
672
|
## 1.0.4 (2009-01-16)
|
588
673
|
|
589
|
-
* Fixed a bug that made it possible to sneak a non-
|
674
|
+
* Fixed a bug that made it possible to sneak a non-allowlisted element through
|
590
675
|
by repeating it several times in a row. All versions of Sanitize prior to
|
591
676
|
1.0.4 are vulnerable. [Reported by Cristobal]
|
592
677
|
|
@@ -594,7 +679,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
594
679
|
## 1.0.3 (2009-01-15)
|
595
680
|
|
596
681
|
* Fixed a bug whereby incomplete Unicode or hex entities could be used to
|
597
|
-
prevent non-
|
682
|
+
prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
|
598
683
|
still decode the incomplete entities, users of those browsers may be
|
599
684
|
vulnerable to malicious script injection on websites using versions of
|
600
685
|
Sanitize prior to 1.0.3.
|
data/LICENSE
CHANGED
data/README.md
CHANGED
@@ -1,28 +1,27 @@
|
|
1
1
|
Sanitize
|
2
2
|
========
|
3
3
|
|
4
|
-
Sanitize is
|
5
|
-
elements, attributes, and
|
6
|
-
|
4
|
+
Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
|
5
|
+
and/or CSS from a string except the elements, attributes, and properties you
|
6
|
+
choose to allow.
|
7
7
|
|
8
8
|
Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
|
9
9
|
elements, certain attributes within those elements, and even certain URL
|
10
|
-
protocols within attributes that contain URLs. You can also
|
11
|
-
properties, @ rules, and URL protocols
|
12
|
-
|
13
|
-
be removed.
|
10
|
+
protocols within attributes that contain URLs. You can also allow specific CSS
|
11
|
+
properties, @ rules, and URL protocols in elements or attributes containing CSS.
|
12
|
+
Any HTML or CSS that you don't explicitly allow will be removed.
|
14
13
|
|
15
|
-
Sanitize is based on [
|
14
|
+
Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
|
16
15
|
exactly the same way modern browsers do, and [Crass][crass], which parses CSS
|
17
|
-
exactly the same way modern browsers do. As long as your
|
16
|
+
exactly the same way modern browsers do. As long as your allowlist config only
|
18
17
|
allows safe markup and CSS, even the most malformed or malicious input will be
|
19
18
|
transformed into safe output.
|
20
19
|
|
21
|
-
[![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
|
22
20
|
[![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
|
21
|
+
[![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
|
23
22
|
|
24
23
|
[crass]:https://github.com/rgrove/crass
|
25
|
-
[
|
24
|
+
[nokogumbo]:https://github.com/rubys/nokogumbo
|
26
25
|
|
27
26
|
Links
|
28
27
|
-----
|
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
|
|
73
72
|
* Standalone CSS stylesheets
|
74
73
|
* Standalone CSS properties
|
75
74
|
|
75
|
+
However, please note that Sanitize _cannot_ fully sanitize the contents of
|
76
|
+
`<math>` or `<svg>` elements, since these elements don't follow the same parsing
|
77
|
+
rules as the rest of HTML. If this is something you need, you may want to look
|
78
|
+
for another solution.
|
79
|
+
|
76
80
|
### HTML Fragments
|
77
81
|
|
78
82
|
A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
|
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
|
|
88
92
|
# => 'foo'
|
89
93
|
```
|
90
94
|
|
91
|
-
To keep certain elements, add them to the element
|
95
|
+
To keep certain elements, add them to the element allowlist.
|
92
96
|
|
93
97
|
```ruby
|
94
98
|
Sanitize.fragment(html, :elements => ['b'])
|
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
|
|
97
101
|
|
98
102
|
### HTML Documents
|
99
103
|
|
100
|
-
When sanitizing a document, the `<html>` element must be
|
104
|
+
When sanitizing a document, the `<html>` element must be allowlisted. You can
|
101
105
|
also set `:allow_doctype` to `true` to allow well-formed document type
|
102
106
|
definitions.
|
103
107
|
|
@@ -123,8 +127,8 @@ Sanitize.document(html,
|
|
123
127
|
|
124
128
|
### CSS in HTML
|
125
129
|
|
126
|
-
To sanitize CSS in an HTML fragment or document, first
|
127
|
-
element and/or the `style` attribute. Then
|
130
|
+
To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
|
131
|
+
element and/or the `style` attribute. Then allowlist the CSS properties,
|
128
132
|
@ rules, and URL protocols you wish to allow. You can also choose whether to
|
129
133
|
allow CSS comments or browser compatibility hacks.
|
130
134
|
|
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
|
|
267
271
|
|
268
272
|
```ruby
|
269
273
|
# Create a customized copy of the Basic config, adding <div> and <table> to the
|
270
|
-
# existing
|
274
|
+
# existing allowlisted elements.
|
271
275
|
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
272
276
|
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
|
273
277
|
:remove_contents => true
|
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
|
|
395
399
|
|
396
400
|
##### :css => :properties (Array or Set)
|
397
401
|
|
398
|
-
|
399
|
-
lowercase.
|
402
|
+
List of CSS property names to allow. Names should be specified in lowercase.
|
400
403
|
|
401
404
|
##### :css => :protocols (Array or Set)
|
402
405
|
|
@@ -417,9 +420,15 @@ elements not in this array will be removed.
|
|
417
420
|
]
|
418
421
|
```
|
419
422
|
|
423
|
+
**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
|
424
|
+
elements, since these elements don't follow the same parsing rules as the rest
|
425
|
+
of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
|
426
|
+
content inside them will be allowed, even if that content would otherwise be
|
427
|
+
removed by Sanitize.
|
428
|
+
|
420
429
|
#### :parser_options (Hash)
|
421
430
|
|
422
|
-
[Parsing options](https://github.com/rubys/nokogumbo/tree/
|
431
|
+
[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
|
423
432
|
|
424
433
|
```ruby
|
425
434
|
:parser_options => {
|
@@ -452,7 +461,7 @@ include the symbol `:relative` in the protocol array:
|
|
452
461
|
|
453
462
|
#### :remove_contents (boolean or Array or Set)
|
454
463
|
|
455
|
-
If this is `true`, Sanitize will remove the contents of any non-
|
464
|
+
If this is `true`, Sanitize will remove the contents of any non-allowlisted
|
456
465
|
elements in addition to the elements themselves. By default, Sanitize leaves the
|
457
466
|
safe parts of an element's contents behind when the element is removed.
|
458
467
|
|
@@ -460,7 +469,7 @@ If this is an Array or Set of element names, then only the contents of the
|
|
460
469
|
specified elements (when filtered) will be removed, and the contents of all
|
461
470
|
other filtered elements will be left behind.
|
462
471
|
|
463
|
-
The default value is
|
472
|
+
The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
|
464
473
|
|
465
474
|
#### :transformers (Array or callable)
|
466
475
|
|
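The `:remove_contents` setting described above accepts either a boolean or a collection of element names. As a rough sketch of that rule, the helper below answers the question "when this element is filtered out, are its contents dropped too?". The method name `remove_contents?` is hypothetical and this is not Sanitize's internal implementation, only a restatement of the documented behaviour in plain Ruby.

```ruby
require 'set'

# Hypothetical helper restating the documented :remove_contents rule.
# `setting` is true/false (or nil), or an Array/Set of element names.
def remove_contents?(setting, element_name)
  case setting
  when true, false, nil
    # Boolean form: applies to every filtered element.
    setting ? true : false
  else
    # Collection form: applies only to the listed elements.
    setting.map(&:to_s).include?(element_name.to_s)
  end
end

# Default value quoted in the README text above.
default_setting = %w[iframe math noembed noframes noscript plaintext script style svg xmp]

puts remove_contents?(default_setting, 'script') # listed, so contents are removed
puts remove_contents?(default_setting, 'div')    # not listed, so safe contents stay
puts remove_contents?(true, 'div')               # boolean form covers every element
```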
@@ -518,33 +527,33 @@ argument a Hash that contains the following items:
|
|
518
527
|
|
519
528
|
* **:config** - The current Sanitize configuration Hash.
|
520
529
|
|
521
|
-
* **:
|
530
|
+
* **:is_allowlisted** - `true` if the current node has been allowlisted by a
|
522
531
|
previous transformer, `false` otherwise. It's generally bad form to remove
|
523
|
-
a node that a previous transformer has
|
532
|
+
a node that a previous transformer has allowlisted.
|
524
533
|
|
525
534
|
* **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
|
526
535
|
node may be an element, a text node, a comment, a CDATA node, or a document
|
527
536
|
fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
|
528
537
|
selectively ignore node types you aren't interested in.
|
529
538
|
|
539
|
+
* **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
|
540
|
+
document that have been allowlisted by previous transformers, if any. It's
|
541
|
+
generally bad form to remove a node that a previous transformer has
|
542
|
+
allowlisted.
|
543
|
+
|
530
544
|
* **:node_name** - The name of the current HTML node, always lowercase (e.g.
|
531
545
|
"div" or "span"). For non-element nodes, the name will be something like
|
532
546
|
"text", "comment", "#cdata-section", "#document-fragment", etc.
|
533
547
|
|
534
|
-
* **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
|
535
|
-
document that have been whitelisted by previous transformers, if any. It's
|
536
|
-
generally bad form to remove a node that a previous transformer has
|
537
|
-
whitelisted.
|
538
|
-
|
539
548
|
### Output
|
540
549
|
|
541
550
|
A transformer doesn't have to return anything, but may optionally return a Hash,
|
542
551
|
which may contain the following items:
|
543
552
|
|
544
|
-
* **:
|
545
|
-
to add to the document's
|
546
|
-
These specific nodes and all their attributes will be
|
547
|
-
their children will not be.
|
553
|
+
* **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
|
554
|
+
objects to add to the document's allowlist, bypassing the current Sanitize
|
555
|
+
config. These specific nodes and all their attributes will be allowlisted,
|
556
|
+
but their children will not be.
|
548
557
|
|
549
558
|
If a transformer returns anything other than a Hash, the return value will be
|
550
559
|
ignored.
|
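To make the `:node_allowlist` output concrete, here is a minimal sketch of a transformer that allowlists a node by returning it in that key. Everything named here is an assumption for illustration: `StubNode` is a stand-in for `Nokogiri::XML::Node` so the snippet runs without Sanitize, and `EXAMPLE_LINK_TRANSFORMER` and the `https://example.com/` prefix are invented.

```ruby
# Stand-in for Nokogiri::XML::Node (hypothetical). Real transformers receive
# Nokogiri nodes, which also respond to `element?` and `[]`.
class StubNode
  def initialize(attributes)
    @attributes = attributes
  end

  def element?
    true
  end

  def [](name)
    @attributes[name]
  end
end

# Hypothetical transformer: allowlist <a> elements whose href starts with a
# trusted prefix, and leave every other node to the normal config.
EXAMPLE_LINK_TRANSFORMER = lambda do |env|
  node = env[:node]

  return if env[:is_allowlisted] || !node.element?
  return unless env[:node_name] == 'a'
  return unless node['href'].to_s.start_with?('https://example.com/')

  # Returning the node here bypasses the config for this node and its attributes.
  {:node_allowlist => [node]}
end

trusted = StubNode.new('href' => 'https://example.com/docs')
other   = StubNode.new('href' => 'http://elsewhere.test/')

p EXAMPLE_LINK_TRANSFORMER.call(node: trusted, node_name: 'a', is_allowlisted: false).keys
p EXAMPLE_LINK_TRANSFORMER.call(node: other, node_name: 'a', is_allowlisted: false)
```

Because an allowlisted node keeps all of its attributes, a real transformer would normally strip or validate attributes before returning the node.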
@@ -587,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
|
|
587
596
|
completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
|
588
597
|
your own hands.
|
589
598
|
|
590
|
-
### Example: Transformer to
|
599
|
+
### Example: Transformer to allow image URLs by domain
|
591
600
|
|
592
601
|
The following example demonstrates how to remove image elements unless they use
|
593
602
|
a relative URL or are hosted on a specific domain. It assumes that the `<img>`
|
594
|
-
element and its `src` attribute are already
|
603
|
+
element and its `src` attribute are already allowlisted.
|
595
604
|
|
596
605
|
```ruby
|
597
606
|
require 'uri'
|
598
607
|
|
599
|
-
|
608
|
+
image_allowlist_transformer = lambda do |env|
|
600
609
|
# Ignore everything except <img> elements.
|
601
610
|
return unless env[:node_name] == 'img'
|
602
611
|
|
@@ -612,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
|
|
612
621
|
end
|
613
622
|
```
|
614
623
|
|
615
|
-
### Example: Transformer to
|
624
|
+
### Example: Transformer to allow YouTube video embeds
|
616
625
|
|
617
626
|
The following example demonstrates how to create a transformer that will safely
|
618
|
-
|
619
|
-
|
620
|
-
|
627
|
+
allow valid YouTube video embeds without having to allow other kinds of embedded
|
628
|
+
content, which would be the case if you tried to do this by just allowing all
|
629
|
+
`<iframe>` elements:
|
621
630
|
|
622
631
|
```ruby
|
623
632
|
youtube_transformer = lambda do |env|
|
624
633
|
node = env[:node]
|
625
634
|
node_name = env[:node_name]
|
626
635
|
|
627
|
-
# Don't continue if this node is already
|
628
|
-
return if env[:
|
636
|
+
# Don't continue if this node is already allowlisted or is not an element.
|
637
|
+
return if env[:is_allowlisted] || !node.element?
|
629
638
|
|
630
639
|
# Don't continue unless the node is an iframe.
|
631
640
|
return unless node_name == 'iframe'
|
@@ -646,8 +655,8 @@ youtube_transformer = lambda do |env|
|
|
646
655
|
|
647
656
|
# Now that we're sure that this is a valid YouTube embed and that there are
|
648
657
|
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
|
649
|
-
# to
|
650
|
-
{:
|
658
|
+
# to allowlist the current node.
|
659
|
+
{:node_allowlist => [node]}
|
651
660
|
end
|
652
661
|
|
653
662
|
html = %[
|
@@ -658,25 +667,3 @@ html = %[
|
|
658
667
|
Sanitize.fragment(html, :transformers => youtube_transformer)
|
659
668
|
# => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
|
660
669
|
```
|
661
|
-
|
662
|
-
License
|
663
|
-
-------
|
664
|
-
|
665
|
-
Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
|
666
|
-
|
667
|
-
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
668
|
-
this software and associated documentation files (the 'Software'), to deal in
|
669
|
-
the Software without restriction, including without limitation the rights to
|
670
|
-
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
|
671
|
-
the Software, and to permit persons to whom the Software is furnished to do so,
|
672
|
-
subject to the following conditions:
|
673
|
-
|
674
|
-
The above copyright notice and this permission notice shall be included in all
|
675
|
-
copies or substantial portions of the Software.
|
676
|
-
|
677
|
-
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
678
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
679
|
-
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
680
|
-
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
681
|
-
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
682
|
-
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/lib/sanitize/config/default.rb
CHANGED
@@ -74,7 +74,7 @@ class Sanitize
       # the specified elements (when filtered) will be removed, and the contents
       # of all other filtered elements will be left behind.
       :remove_contents => %w[
-        iframe noembed noframes noscript script style
+        iframe math noembed noframes noscript plaintext script style svg xmp
       ],
 
       # Transformers allow you to filter or alter nodes using custom logic. See
|
data/lib/sanitize/config/relaxed.rb
CHANGED
@@ -6,7 +6,7 @@ class Sanitize
       :elements => BASIC[:elements] + %w[
         address article aside bdi bdo body caption col colgroup data del div
         figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
-        img ins main nav rp rt ruby section span style summary
+        img ins main nav rp rt ruby section span style summary table tbody
         td tfoot th thead title tr wbr
       ],
 
|
data/lib/sanitize/css.rb
CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
|
|
175
175
|
next prop
|
176
176
|
|
177
177
|
when :semicolon
|
178
|
-
# Only preserve the semicolon if it was preceded by
|
178
|
+
# Only preserve the semicolon if it was preceded by an allowlisted
|
179
179
|
# property. Otherwise, omit it in order to prevent redundant semicolons.
|
180
180
|
if preceded_by_property
|
181
181
|
preceded_by_property = false
|
@@ -296,7 +296,7 @@ class Sanitize; class CSS
|
|
296
296
|
end
|
297
297
|
|
298
298
|
# Returns `true` if the given node (which may be of type `:url` or
|
299
|
-
# `:function`, since the CSS syntax can produce both) uses
|
299
|
+
# `:function`, since the CSS syntax can produce both) uses an allowlisted
|
300
300
|
# protocol.
|
301
301
|
def valid_url?(node)
|
302
302
|
type = node[:node]
|
data/lib/sanitize/transformers/clean_css.rb
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
class Sanitize; module Transformers; module CSS
|
2
2
|
|
3
|
-
# Enforces a CSS
|
3
|
+
# Enforces a CSS allowlist on the contents of `style` attributes.
|
4
4
|
class CleanAttribute
|
5
5
|
def initialize(sanitizer_or_config)
|
6
6
|
if Sanitize::CSS === sanitizer_or_config
|
@@ -14,7 +14,7 @@ class CleanAttribute
|
|
14
14
|
node = env[:node]
|
15
15
|
|
16
16
|
return unless node.type == Nokogiri::XML::Node::ELEMENT_NODE &&
|
17
|
-
node.key?('style') && !env[:
|
17
|
+
node.key?('style') && !env[:is_allowlisted]
|
18
18
|
|
19
19
|
attr = node.attribute('style')
|
20
20
|
css = @scss.properties(attr.value)
|
@@ -27,7 +27,7 @@ class CleanAttribute
|
|
27
27
|
end
|
28
28
|
end
|
29
29
|
|
30
|
-
# Enforces a CSS
|
30
|
+
# Enforces a CSS allowlist on the contents of `<style>` elements.
|
31
31
|
class CleanElement
|
32
32
|
def initialize(sanitizer_or_config)
|
33
33
|
if Sanitize::CSS === sanitizer_or_config
|
data/lib/sanitize/transformers/clean_element.rb
CHANGED
@@ -76,11 +76,11 @@ class Sanitize; module Transformers; class CleanElement
|
|
76
76
|
|
77
77
|
def call(env)
|
78
78
|
node = env[:node]
|
79
|
-
return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:
|
79
|
+
return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_allowlisted]
|
80
80
|
|
81
81
|
name = env[:node_name]
|
82
82
|
|
83
|
-
# Delete any element that isn't in the config
|
83
|
+
# Delete any element that isn't in the config allowlist, unless the node has
|
84
84
|
# already been deleted from the document.
|
85
85
|
#
|
86
86
|
# It's important that we not try to reparent the children of a node that has
|
@@ -107,34 +107,31 @@ class Sanitize; module Transformers; class CleanElement
       return
     end
 
-
+    attr_allowlist = @attributes[name] || @attributes[:all]
 
-    if
-      # Delete all attributes from elements with no
+    if attr_allowlist.nil?
+      # Delete all attributes from elements with no allowlisted attributes.
       node.attribute_nodes.each {|attr| attr.unlink }
     else
-      allow_data_attributes =
+      allow_data_attributes = attr_allowlist.include?(:data)
 
       # Delete any attribute that isn't allowed on this element.
       node.attribute_nodes.each do |attr|
         attr_name = attr.name.downcase
 
-        unless
-          # The attribute isn't
+        unless attr_allowlist.include?(attr_name)
+          # The attribute isn't in the allowlist, but may still be allowed if
+          # it's a data attribute.
 
-
-            #
-            #
-
+          unless allow_data_attributes && attr_name.start_with?('data-') && attr_name =~ REGEX_DATA_ATTR
+            # Either the attribute isn't a data attribute or arbitrary data
+            # attributes aren't allowed. Remove the attribute.
+            attr.unlink
+            next
           end
-
-          # Either the attribute isn't a data attribute or arbitrary data
-          # attributes aren't allowed. Remove the attribute.
-          attr.unlink
-          next
         end
 
-        # The attribute is
+        # The attribute is allowed.
 
         # Remove any attributes that use unacceptable protocols.
         if @protocols.include?(name) && @protocols[name].include?(attr_name)
|
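The hunk above decides, attribute by attribute, whether to keep or unlink it: an attribute survives if it is in the allowlist, or if `:data` is in the allowlist and the name is a well-formed `data-*` attribute. The following is a simplified, standalone sketch of that decision using a plain Hash instead of Nokogiri attribute nodes. The helper `filter_attributes` is hypothetical, `DATA_ATTR_PATTERN` is a simplified stand-in (an assumption, not the real `REGEX_DATA_ATTR`), and the later protocol check is left out.

```ruby
# Simplified stand-in for Sanitize's REGEX_DATA_ATTR (assumption: the real
# pattern differs in detail).
DATA_ATTR_PATTERN = /\Adata-[a-z_][\w.-]*\z/

# Hypothetical helper: returns only the attributes that pass the allowlist
# check. `attr_allowlist` is nil (no attributes allowed) or a list of names,
# optionally including the symbol :data.
def filter_attributes(attributes, attr_allowlist)
  return {} if attr_allowlist.nil?

  allow_data_attributes = attr_allowlist.include?(:data)

  attributes.select do |name, _value|
    attr_name = name.downcase

    # Explicitly allowlisted attributes are always kept.
    next true if attr_allowlist.include?(attr_name)

    # Otherwise keep the attribute only if arbitrary data attributes are allowed
    # and this one is a well-formed data attribute.
    allow_data_attributes && attr_name.start_with?('data-') && attr_name.match?(DATA_ATTR_PATTERN)
  end
end

attrs = {'href' => '/x', 'onclick' => 'evil()', 'data-id' => '7'}

p filter_attributes(attrs, ['href', :data]).keys # => ["href", "data-id"]
p filter_attributes(attrs, ['href']).keys        # => ["href"]
p filter_attributes(attrs, nil).keys             # => []
```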
@@ -162,7 +159,7 @@ class Sanitize; module Transformers; class CleanElement
|
|
162
159
|
# libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
|
163
160
|
# attempt to preserve server-side includes. This can result in XSS since
|
164
161
|
# an unescaped double quote can allow an attacker to inject a
|
165
|
-
# non-
|
162
|
+
# non-allowlisted attribute.
|
166
163
|
#
|
167
164
|
# Sanitize works around this by implementing its own escaping for
|
168
165
|
# affected attributes, some of which can exist on any element and some
|
@@ -191,7 +188,7 @@ class Sanitize; module Transformers; class CleanElement
 # Element-specific special cases.
 case name

-# If this is
+# If this is an allowlisted iframe that has children, remove all its
 # children. The HTML standard says iframes shouldn't have content, but when
 # they do, this content is parsed as text and is serialized verbatim without
 # being escaped, which is unsafe because legacy browsers may still render it
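The reworked attribute check in `clean_element.rb` keeps an attribute when it is in the allowlist, and otherwise only when arbitrary data attributes are allowed and the name is a well-formed data attribute. A minimal sketch of that decision follows. `attribute_allowed?` and `SKETCH_DATA_ATTR` are hypothetical names, and the pattern is a simplified assumption, not the library's `REGEX_DATA_ATTR`, whose definition is not part of this diff.

```ruby
# Hypothetical, simplified stand-in for the library's REGEX_DATA_ATTR.
SKETCH_DATA_ATTR = /\Adata-[a-z_][\w.-]*\z/

# Returns true when the attribute would be kept, false when it would be
# unlinked. Mirrors the nested `unless` structure shown in the diff.
def attribute_allowed?(attr_name, attr_allowlist, allow_data_attributes)
  # Explicitly allowlisted attributes are always kept.
  return true if attr_allowlist.include?(attr_name)

  # Otherwise the attribute survives only if arbitrary data attributes are
  # allowed and the name looks like a valid data attribute.
  allow_data_attributes &&
    attr_name.start_with?('data-') &&
    attr_name.match?(SKETCH_DATA_ATTR)
end

puts attribute_allowed?('href', ['href'], false)
puts attribute_allowed?('data-foo', ['href'], true)
puts attribute_allowed?('onclick', ['href'], true)
```

An attribute that passes this check is still subject to the protocol filtering that follows in the same method.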
data/lib/sanitize/version.rb
CHANGED
data/lib/sanitize.rb
CHANGED
@@ -1,6 +1,6 @@
 # encoding: utf-8

-require '
+require 'nokogiri'
 require 'set'

 require_relative 'sanitize/version'
@@ -54,7 +54,7 @@ class Sanitize
 # Returns a sanitized copy of the given full _html_ document, using the
 # settings in _config_ if specified.
 #
-# When sanitizing a document, the `<html>` element must be
+# When sanitizing a document, the `<html>` element must be allowlisted or an
 # error will be raised. If this is undesirable, you should probably use
 # {#fragment} instead.
 def self.document(html, config = {})
@@ -117,7 +117,7 @@ class Sanitize

 # Returns a sanitized copy of the given _html_ document.
 #
-# When sanitizing a document, the `<html>` element must be
+# When sanitizing a document, the `<html>` element must be allowlisted or an
 # error will be raised. If this is undesirable, you should probably use
 # {#fragment} instead.
 def document(html)
@@ -147,20 +147,20 @@ class Sanitize
 # in place.
 #
 # If _node_ is a `Nokogiri::XML::Document`, the `<html>` element must be
-#
+# allowlisted or an error will be raised.
 def node!(node)
 raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)

 if node.is_a?(Nokogiri::XML::Document)
 unless @config[:elements].include?('html')
-raise Error, 'When sanitizing a document, "<html>" must be
+raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
 end
 end

-
+node_allowlist = Set.new

 traverse(node) do |n|
-transform_node!(n,
+transform_node!(n, node_allowlist)
 end

 node
@@ -189,7 +189,7 @@ class Sanitize
 node.to_html(preserve_newline: true)
 end

-def transform_node!(node,
+def transform_node!(node, node_allowlist)
 @transformers.each do |transformer|
 # Since transform_node! may be called in a tight loop to process thousands
 # of items, we can optimize both memory and CPU performance by:
@@ -199,15 +199,19 @@ class Sanitize
 # does merge! create a new hash, it is also 2.6x slower:
 # https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hashmerge-code
 config = @transformer_config
-config[:is_whitelisted] =
+config[:is_allowlisted] = config[:is_whitelisted] = node_allowlist.include?(node)
 config[:node] = node
 config[:node_name] = node.name.downcase
-config[:node_whitelist] =
+config[:node_allowlist] = config[:node_whitelist] = node_allowlist

-result = transformer.call(config)
+result = transformer.call(**config)

-if result.is_a?(Hash)
-
+if result.is_a?(Hash)
+result_allowlist = result[:node_allowlist] || result[:node_whitelist]
+
+if result_allowlist.respond_to?(:each)
+node_allowlist.merge(result_allowlist)
+end
 end
 end

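The `transform_node!` change above keeps older transformers working: both the new and the old key are set on the way in, and either key is accepted in the return value, with the new one preferred. The sketch below imitates that loop in isolation. `run_transformers` is a hypothetical stand-in, plain symbols stand in for Nokogiri nodes, and the real method reuses a shared config hash instead of building a new one per call.

```ruby
require 'set'

# Hypothetical stand-in for the body of transform_node!: runs each transformer
# against one node and collects any nodes the transformers ask to allowlist.
def run_transformers(node, transformers, node_allowlist)
  transformers.each do |transformer|
    flagged = node_allowlist.include?(node)
    env = {
      node: node,
      # Both spellings are populated so transformers written for 5.x still work.
      is_allowlisted: flagged,
      is_whitelisted: flagged,
      node_allowlist: node_allowlist,
      node_whitelist: node_allowlist
    }

    # A proc or lambda with one positional parameter receives these keywords
    # as a single hash.
    result = transformer.call(**env)
    next unless result.is_a?(Hash)

    # Accept either key in the return value, preferring the new name.
    result_allowlist = result[:node_allowlist] || result[:node_whitelist]
    node_allowlist.merge(result_allowlist) if result_allowlist.respond_to?(:each)
  end

  node_allowlist
end

legacy_transformer = proc { |env| { node_whitelist: [env[:node]] } }
puts run_transformers(:div, [legacy_transformer], Set.new).include?(:div)
```

Because the allowlist is a `Set` that is merged in place, a later transformer in the same list sees nodes allowlisted by an earlier one.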
data/test/test_clean_element.rb
CHANGED
@@ -162,7 +162,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 }

 describe 'Default config' do
-it 'should remove non-
+it 'should remove non-allowlisted elements, leaving safe contents behind' do
 Sanitize.fragment('foo <b>bar</b> <strong><a href="#a">baz</a></strong> quux')
 .must_equal 'foo bar baz quux'

@@ -192,21 +192,16 @@ describe 'Sanitize::Transformers::CleanElement' do
 .must_equal ''
 end

-it 'should escape the content of removed `plaintext` elements' do
-Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
-.must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
-end
-
-it 'should escape the content of removed `xmp` elements' do
-Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
-.must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
-end
-
 it 'should not preserve the content of removed `iframe` elements' do
 Sanitize.fragment('<iframe>hello! <script>alert(0)</script></iframe>')
 .must_equal ''
 end

+it 'should not preserve the content of removed `math` elements' do
+Sanitize.fragment('<math>hello! <script>alert(0)</script></math>')
+.must_equal ''
+end
+
 it 'should not preserve the content of removed `noembed` elements' do
 Sanitize.fragment('<noembed>hello! <script>alert(0)</script></noembed>')
 .must_equal ''
@@ -222,6 +217,11 @@ describe 'Sanitize::Transformers::CleanElement' do
 .must_equal ''
 end

+it 'should not preserve the content of removed `plaintext` elements' do
+Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
+.must_equal ''
+end
+
 it 'should not preserve the content of removed `script` elements' do
 Sanitize.fragment('<script>hello! <script>alert(0)</script></script>')
 .must_equal ''
@@ -232,6 +232,16 @@ describe 'Sanitize::Transformers::CleanElement' do
 .must_equal ''
 end

+it 'should not preserve the content of removed `svg` elements' do
+Sanitize.fragment('<svg>hello! <script>alert(0)</script></svg>')
+.must_equal ''
+end
+
+it 'should not preserve the content of removed `xmp` elements' do
+Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
+.must_equal ''
+end
+
 strings.each do |name, data|
 it "should clean #{name} HTML" do
 Sanitize.fragment(data[:html]).must_equal(data[:default])
@@ -315,7 +325,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 end

 describe 'Custom configs' do
-it 'should allow attributes on all elements if
+it 'should allow attributes on all elements if allowlisted under :all' do
 input = '<p class="foo">bar</p>'

 Sanitize.fragment(input).must_equal ' bar '
@@ -336,7 +346,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 }).must_equal input
 end

-it "should not allow relative URLs when relative URLs aren't
+it "should not allow relative URLs when relative URLs aren't allowlisted" do
 input = '<a href="/foo/bar">Link</a>'

 Sanitize.fragment(input,
@@ -400,7 +410,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 ).must_equal 'foo bar baz hi '
 end

-it 'should remove the contents of
+it 'should remove the contents of allowlisted iframes' do
 Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
 :elements => ['iframe']
 ).must_equal '<iframe></iframe>'
@@ -481,6 +491,22 @@ describe 'Sanitize::Transformers::CleanElement' do
 }).must_equal "<a>Text</a>"
 end

+it 'should sanitize protocols in data attributes even if data attributes are generically allowed' do
+input = '<a data-url="mailto:someone@example.com">Text</a>'
+
+Sanitize.fragment(input, {
+:elements => ['a'],
+:attributes => {'a' => [:data]},
+:protocols => {'a' => {'data-url' => ['https']}}
+}).must_equal "<a>Text</a>"
+
+Sanitize.fragment(input, {
+:elements => ['a'],
+:attributes => {'a' => [:data]},
+:protocols => {'a' => {'data-url' => ['mailto']}}
+}).must_equal input
+end
+
 it 'should prevent `<meta>` tags from being used to set a non-UTF-8 charset' do
 Sanitize.document('<html><head><meta charset="utf-8"></head><body>Howdy!</body></html>',
 :elements => %w[html head meta body],
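The new `data-url` test above expects a `mailto:` value to be dropped when only `https` is allowed for that attribute, and kept when `mailto` is allowed. A rough sketch of that kind of scheme check follows. `protocol_allowed?` is a hypothetical helper and its pattern is a simplified assumption, since the library's own protocol matching is not shown in this diff. The `:relative` marker for scheme-less URLs follows Sanitize's configuration convention.

```ruby
# Hypothetical helper: decides whether an attribute value's URL scheme is in
# the list of allowed protocols for that attribute.
def protocol_allowed?(value, allowed_protocols)
  # Capture the text before the first colon, provided no "/" or "#" comes
  # first (otherwise the value is treated as a relative URL).
  match = value.match(/\A\s*([^\/#:]*?):/)

  # No scheme found: allowed only if relative URLs are permitted.
  return allowed_protocols.include?(:relative) unless match

  allowed_protocols.include?(match[1].downcase)
end

puts protocol_allowed?('mailto:someone@example.com', ['https'])
puts protocol_allowed?('mailto:someone@example.com', ['mailto'])
puts protocol_allowed?('/foo/bar', ['https', :relative])
```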
data/test/test_malicious_html.rb
CHANGED
@@ -128,13 +128,15 @@ describe 'Malicious HTML' do

 # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
 # attempt to preserve server-side includes. This can result in XSS since an
-# unescaped double quote can allow an attacker to inject a non-
+# unescaped double quote can allow an attacker to inject a non-allowlisted
 # attribute. Sanitize works around this by implementing its own escaping for
 # affected attributes.
 #
 # The relevant libxml2 code is here:
 # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
 describe 'unsafe libxml2 server-side includes in attributes' do
+using_unpatched_libxml2 = Nokogiri::VersionInfo.instance.libxml2_using_system?
+
 tag_configs = [
 {
 tag_name: 'a',
@@ -166,6 +168,8 @@ describe 'Malicious HTML' do
 input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]

 it 'should escape unsafe characters in attributes' do
+skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
+
 # This uses Nokogumbo's HTML-compliant serializer rather than
 # libxml2's.
 @s.fragment(input).
@@ -191,6 +195,8 @@ describe 'Malicious HTML' do
 input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]

 it 'should not escape characters unnecessarily' do
+skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
+
 # This uses Nokogumbo's HTML-compliant serializer rather than
 # libxml2's.
 @s.fragment(input).
@@ -213,4 +219,17 @@ describe 'Malicious HTML' do
 end
 end
 end
+
+# https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+describe 'foreign content bypass in relaxed config' do
+it 'prevents a sanitization bypass via carefully crafted foreign content' do
+%w[iframe noembed noframes noscript plaintext script style xmp].each do |tag_name|
+@s.fragment(%[<math><#{tag_name}>/*</#{tag_name}><img src onerror=alert(1)>*/]).
+must_equal ''
+
+@s.fragment(%[<svg><#{tag_name}>/*</#{tag_name}><img src onerror=alert(1)>*/]).
+must_equal ''
+end
+end
+end
 end
data/test/test_parser.rb
CHANGED
data/test/test_sanitize.rb
CHANGED
@@ -53,9 +53,9 @@ describe 'Sanitize' do
 @s.document("a#{sample_non_chars}z").must_equal "<html>az</html>"
 end

-describe 'when html body exceeds
+describe 'when html body exceeds Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH' do
 let(:content) do
-content = nest_html_content('<b>foo</b>',
+content = nest_html_content('<b>foo</b>', Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
 "<html>#{content}</html>"
 end

@@ -115,9 +115,9 @@ describe 'Sanitize' do
 @s.fragment("a#{sample_non_chars}z").must_equal "az"
 end

-describe 'when html body exceeds
+describe 'when html body exceeds Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH' do
 let(:content) do
-content = nest_html_content('<b>foo</b>',
+content = nest_html_content('<b>foo</b>', Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
 "<body>#{content}</body>"
 end

@@ -150,7 +150,7 @@ describe 'Sanitize' do
 frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
 end

-describe "when the given node is a document and <html> isn't
+describe "when the given node is a document and <html> isn't allowlisted" do
 it 'should raise a Sanitize::Error' do
 doc = Nokogiri::HTML5.parse('foo')
 proc { @s.node!(doc) }.must_raise Sanitize::Error
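The depth tests above call a `nest_html_content` helper whose definition is outside this diff. One plausible shape for such a helper, offered only as an assumption, wraps the content in the requested number of nested elements so the parser's tree-depth limit is reached:

```ruby
# Hypothetical version of the nest_html_content test helper: wraps
# html_content in `depth` levels of nested <span> elements.
def nest_html_content(html_content, depth)
  "#{'<span>' * depth}#{html_content}#{'</span>' * depth}"
end

puts nest_html_content('<b>foo</b>', 2)
```

In the tests the depth argument is `Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH`, the constant that replaces the one removed along with the separate Nokogumbo dependency.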
data/test/test_sanitize_css.rb
CHANGED
@@ -21,7 +21,7 @@ describe 'Sanitize::CSS' do
 @custom.properties(css).must_equal 'background: #fff; '
 end

-it 'should allow
+it 'should allow allowlisted URL protocols' do
 [
 "background: url(relative.jpg)",
 "background: url('relative.jpg')",
@@ -36,7 +36,7 @@ describe 'Sanitize::CSS' do
 end
 end

-it 'should not allow non-
+it 'should not allow non-allowlisted URL protocols' do
 [
 "background: url(javascript:alert(0))",
 "background: url(ja\\56 ascript:alert(0))",
@@ -307,7 +307,7 @@ describe 'Sanitize::CSS' do
 end

 describe ":at_rules" do
-it "should remove blockless at-rules that aren't
+it "should remove blockless at-rules that aren't allowlisted" do
 css = %[
 @charset 'utf-8';
 @import url('foo.css');
@@ -319,7 +319,7 @@ describe 'Sanitize::CSS' do
 ].strip
 end

-describe "when blockless at-rules are
+describe "when blockless at-rules are allowlisted" do
 before do
 @scss = Sanitize::CSS.new(Sanitize::Config.merge(Sanitize::Config::RELAXED[:css], {
 :at_rules => ['charset', 'import']
data/test/test_transformers.rb
CHANGED
@@ -12,11 +12,13 @@ describe 'Transformers' do
 return unless env[:node].element?

 env[:config][:foo].must_equal :bar
-env[:
+env[:is_allowlisted].must_equal false
+env[:is_whitelisted].must_equal env[:is_allowlisted]
 env[:node].must_be_kind_of Nokogiri::XML::Node
 env[:node_name].must_equal 'span'
-env[:
-env[:
+env[:node_allowlist].must_be_kind_of Set
+env[:node_allowlist].must_be_empty
+env[:node_whitelist].must_equal env[:node_allowlist]
 }
 )
 end
@@ -43,34 +45,38 @@ describe 'Transformers' do
 nodes.must_equal %w[div span strong b p]
 end

-it 'should
+it 'should allowlist nodes in the node allowlist' do
 Sanitize.fragment('<div class="foo">foo</div><span>bar</span>',
 :transformers => [
 proc {|env|
-{:
+{:node_allowlist => [env[:node]]} if env[:node_name] == 'div'
 },

 proc {|env|
-env[:
-env[:
-env[:
+env[:is_allowlisted].must_equal false unless env[:node_name] == 'div'
+env[:is_allowlisted].must_equal true if env[:node_name] == 'div'
+env[:node_allowlist].must_include env[:node] if env[:node_name] == 'div'
+env[:is_whitelisted].must_equal env[:is_allowlisted]
+env[:node_whitelist].must_equal env[:node_allowlist]
 }
 ]
 ).must_equal '<div class="foo">foo</div>bar'
 end

-it 'should clear the node
+it 'should clear the node allowlist after each fragment' do
 called = false

 Sanitize.fragment('<div>foo</div>',
-:transformers => proc {|env| {:
+:transformers => proc {|env| {:node_allowlist => [env[:node]]}}
 )

 Sanitize.fragment('<div>foo</div>',
 :transformers => proc {|env|
 called = true
-env[:
-env[:
+env[:is_allowlisted].must_equal false
+env[:is_whitelisted].must_equal env[:is_allowlisted]
+env[:node_allowlist].must_be_empty
+env[:node_whitelist].must_equal env[:node_allowlist]
 }
 )

@@ -83,10 +89,10 @@ describe 'Transformers' do
 .must_equal(' foo ')
 end

-describe 'Image
+describe 'Image allowlist transformer' do
 require 'uri'

-
+image_allowlist_transformer = lambda do |env|
 # Ignore everything except <img> elements.
 return unless env[:node_name] == 'img'

@@ -103,7 +109,7 @@ describe 'Transformers' do

 before do
 @s = Sanitize.new(Sanitize::Config.merge(Sanitize::Config::RELAXED,
-:transformers =>
+:transformers => image_allowlist_transformer))
 end

 it 'should allow images with relative URLs' do
@@ -142,8 +148,8 @@ describe 'Transformers' do
 node = env[:node]
 node_name = env[:node_name]

-# Don't continue if this node is already
-return if env[:
+# Don't continue if this node is already allowlisted or is not an element.
+return if env[:is_allowlisted] || !node.element?

 # Don't continue unless the node is an iframe.
 return unless node_name == 'iframe'
@@ -164,8 +170,8 @@ describe 'Transformers' do

 # Now that we're sure that this is a valid YouTube embed and that there are
 # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-# to
-{:
+# to allowlist the current node.
+{:node_allowlist => [node]}
 end

 it 'should allow HTTP YouTube video embeds' do
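The transformer tests above show the renamed `env` keys (`:is_allowlisted`, `:node_allowlist`) alongside the old ones. A transformer that has to run under both 5.x and 6.x could read whichever flag is present and return both keys. The sketch below is one way to do that; `div_allowlister` is a hypothetical name, and a plain symbol stands in for a Nokogiri node.

```ruby
# Hypothetical transformer that allowlists <div> nodes and tolerates either
# spelling of the env keys.
div_allowlister = lambda do |env|
  # Prefer the new flag; fall back to the old one when it is absent.
  already_allowed = env.fetch(:is_allowlisted) { env[:is_whitelisted] }
  return if already_allowed
  return unless env[:node_name] == 'div'

  # Returning both keys lets older and newer Sanitize versions pick up the
  # same node list.
  { node_allowlist: [env[:node]], node_whitelist: [env[:node]] }
end

result = div_allowlister.call({ node_name: 'div', node: :some_node, is_whitelisted: false })
puts result.inspect
```

A transformer that only targets 6.0.0 or later can return just `:node_allowlist`, as the updated tests do.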
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-version:
+version: 6.0.0
 platform: ruby
 authors:
 - Ryan Grove
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-08-04 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: crass
@@ -30,59 +30,45 @@ dependencies:
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: 1.
+version: 1.12.0
 type: :runtime
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: 1.
-- !ruby/object:Gem::Dependency
-name: nokogumbo
-requirement: !ruby/object:Gem::Requirement
-requirements:
-- - "~>"
-- !ruby/object:Gem::Version
-version: '2.0'
-type: :runtime
-prerelease: false
-version_requirements: !ruby/object:Gem::Requirement
-requirements:
-- - "~>"
-- !ruby/object:Gem::Version
-version: '2.0'
+version: 1.12.0
 - !ruby/object:Gem::Dependency
 name: minitest
 requirement: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: 5.
+version: 5.14.4
 type: :development
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version: 5.
+version: 5.14.4
 - !ruby/object:Gem::Dependency
 name: rake
 requirement: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version:
+version: 13.0.6
 type: :development
 prerelease: false
 version_requirements: !ruby/object:Gem::Requirement
 requirements:
 - - "~>"
 - !ruby/object:Gem::Version
-version:
-description: Sanitize is
-
-
+version: 13.0.6
+description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
+HTML and/or CSS from a string except the elements, attributes, and properties you
+choose to allow.
 email: ryan@wonko.com
 executables: []
 extensions: []
@@ -120,7 +106,7 @@ homepage: https://github.com/rgrove/sanitize/
 licenses:
 - MIT
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -128,15 +114,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
-version: 2.
+version: 2.5.0
 required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
 version: 1.2.0
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.2.22
+signing_key:
 specification_version: 4
-summary:
+summary: Allowlist-based HTML and CSS sanitizer.
 test_files: []