sanitize 4.6.6 → 6.0.0
Potentially problematic release.
This version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +176 -16
- data/LICENSE +1 -1
- data/README.md +65 -67
- data/lib/sanitize/config/default.rb +10 -4
- data/lib/sanitize/config/relaxed.rb +1 -1
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +60 -22
- data/lib/sanitize/version.rb +1 -1
- data/lib/sanitize.rb +39 -63
- data/test/common.rb +0 -31
- data/test/test_clean_comment.rb +1 -5
- data/test/test_clean_css.rb +1 -1
- data/test/test_clean_doctype.rb +8 -8
- data/test/test_clean_element.rb +137 -26
- data/test/test_malicious_html.rb +50 -7
- data/test/test_parser.rb +3 -32
- data/test/test_sanitize.rb +103 -18
- data/test/test_sanitize_css.rb +43 -16
- data/test/test_transformers.rb +29 -23
- metadata +17 -33
- data/test/test_unicode.rb +0 -95
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 94a37503617774f9317150c834cc3025cd32a718be754fb72eea1b9dd7347571
|
4
|
+
data.tar.gz: 597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: c6d2dedfa9d6a589788d4156babae09cf14b3bebc765a9bb04a492aa5b5702f82dc3ae26d45199da3e8f9c096dfd191d15c53fea8d62084a3679604be5f7ddba
|
7
|
+
data.tar.gz: 70bbb00756f1a4a085ad5901b27fd91ebc4308d5f42bfa57ec54c8cc7982ded8395eff9b59546ca62f3dba6e7a012351d62f9ec81b06aa8ccbb563211f39bd3c
|
data/HISTORY.md
CHANGED
@@ -1,5 +1,145 @@
|
|
1
1
|
# Sanitize History
|
2
2
|
|
3
|
+
## 6.0.0 (2021-08-03)
|
4
|
+
|
5
|
+
### Potentially Breaking Changes
|
6
|
+
|
7
|
+
* Ruby 2.5.0 is now the oldest officially supported Ruby version.
|
8
|
+
|
9
|
+
* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
|
10
|
+
The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
|
11
|
+
|
12
|
+
[211]:https://github.com/rgrove/sanitize/pull/211
|
13
|
+
|
14
|
+
## 5.2.3 (2021-01-11)
|
15
|
+
|
16
|
+
### Bug Fixes
|
17
|
+
|
18
|
+
* Ensure protocol sanitization is applied to data attributes.
|
19
|
+
[@ccutrer - #207][207]
|
20
|
+
|
21
|
+
[207]:https://github.com/rgrove/sanitize/pull/207
|
22
|
+
|
23
|
+
## 5.2.2 (2021-01-06)
|
24
|
+
|
25
|
+
### Bug Fixes
|
26
|
+
|
27
|
+
* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
|
28
|
+
custom transformer. [@mscrivo - #206][206]
|
29
|
+
|
30
|
+
[206]:https://github.com/rgrove/sanitize/pull/206
|
31
|
+
|
32
|
+
## 5.2.1 (2020-06-16)
|
33
|
+
|
34
|
+
### Bug Fixes
|
35
|
+
|
36
|
+
* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
|
37
|
+
Sanitize versions 3.0.0 through 5.2.0.
|
38
|
+
|
39
|
+
When HTML was sanitized using the "relaxed" config or a custom config that
|
40
|
+
allows certain elements, some content in a `<math>` or `<svg>` element may not
|
41
|
+
have been sanitized correctly even if `math` and `svg` were not in the
|
42
|
+
allowlist. This could allow carefully crafted input to sneak arbitrary HTML
|
43
|
+
through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
|
44
|
+
|
45
|
+
You are likely to be vulnerable to this issue if you use Sanitize's relaxed
|
46
|
+
config or a custom config that allows one or more of the following HTML
|
47
|
+
elements:
|
48
|
+
|
49
|
+
- `iframe`
|
50
|
+
- `math`
|
51
|
+
- `noembed`
|
52
|
+
- `noframes`
|
53
|
+
- `noscript`
|
54
|
+
- `plaintext`
|
55
|
+
- `script`
|
56
|
+
- `style`
|
57
|
+
- `svg`
|
58
|
+
- `xmp`
|
59
|
+
|
60
|
+
See the security advisory for more details, including a workaround if you're
|
61
|
+
not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
|
62
|
+
|
63
|
+
Many thanks to Michał Bentkowski of Securitum for reporting this issue and
|
64
|
+
helping to verify the fix.
|
65
|
+
|
66
|
+
[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
|
67
|
+
|
68
|
+
## 5.2.0 (2020-06-06)
|
69
|
+
|
70
|
+
### Changes
|
71
|
+
|
72
|
+
* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
|
73
|
+
source and documentation.
|
74
|
+
|
75
|
+
While the etymology of "whitelist" may not be explicitly racist in origin or
|
76
|
+
intent, there are inherent racial connotations in the implication that white
|
77
|
+
is good and black (as in "blacklist") is not.
|
78
|
+
|
79
|
+
This is a change I should have made long ago, and I apologize for not making
|
80
|
+
it sooner.
|
81
|
+
|
82
|
+
* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
|
83
|
+
deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
|
84
|
+
The old keys will continue to work in order to avoid breaking existing code,
|
85
|
+
but they are no longer documented and may be removed in a future semver major
|
86
|
+
release.
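A custom transformer that has to run against both older and newer Sanitize releases could read the new key first and fall back to the deprecated one. The following is only a sketch under that assumption: the helper name `node_allowlisted?` is hypothetical and not part of Sanitize, and the `env` hashes are hand-built stand-ins for real transformer input.

```ruby
# Hypothetical helper (not part of Sanitize): reads the :is_allowlisted key
# added in 5.2.0 and falls back to the deprecated :is_whitelisted key.
def node_allowlisted?(env)
  env.fetch(:is_allowlisted) { env.fetch(:is_whitelisted, false) }
end

# Hand-built stand-ins for the Hash a transformer receives.
new_style_env = { :is_allowlisted => true }
old_style_env = { :is_whitelisted => true }

node_allowlisted?(new_style_env) # => true
node_allowlisted?(old_style_env) # => true
node_allowlisted?({})            # => false
```

When both keys are present, the new key takes precedence, so the fallback only matters on releases that predate 5.2.0.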
|
87
|
+
|
88
|
+
## 5.1.0 (2019-09-07)
|
89
|
+
|
90
|
+
### Features
|
91
|
+
|
92
|
+
* Added a `:parser_options` config hash, which makes it possible to pass custom
|
93
|
+
parsing options to Nokogumbo. [@austin-wang - #194][194]
|
94
|
+
|
95
|
+
### Bug Fixes
|
96
|
+
|
97
|
+
* Non-characters and non-whitespace control characters are now stripped from
|
98
|
+
HTML input before parsing to comply with the HTML Standard's [preprocessing
|
99
|
+
guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
|
100
|
+
W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
|
101
|
+
|
102
|
+
[179]:https://github.com/rgrove/sanitize/issues/179
|
103
|
+
[194]:https://github.com/rgrove/sanitize/pull/194
|
104
|
+
[html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
|
105
|
+
[unicode-xml]:https://www.w3.org/TR/unicode-xml/
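The preprocessing step described for 5.1.0 can be illustrated in plain Ruby. This is a rough sketch of the idea, not Sanitize's actual implementation: the method names are hypothetical, and the code point ranges are my reading of the HTML Standard's definitions of controls and noncharacters.

```ruby
# Sketch only: Sanitize's real implementation may differ.

# Non-whitespace control characters: C0 and C1 controls except tab, LF, FF,
# and CR. NUL (U+0000) is deliberately left alone in this sketch.
def non_whitespace_control?(codepoint)
  case codepoint
  when 0x01..0x08, 0x0B, 0x0E..0x1F, 0x7F..0x9F then true
  else false
  end
end

# Unicode noncharacters: U+FDD0..U+FDEF plus the last two code points of
# every plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
def noncharacter?(codepoint)
  (0xFDD0..0xFDEF).cover?(codepoint) || (codepoint & 0xFFFE) == 0xFFFE
end

# Returns a copy of the input with both classes of code point removed.
def strip_unsafe_codepoints(html)
  html.each_char.reject { |char|
    non_whitespace_control?(char.ord) || noncharacter?(char.ord)
  }.join
end

strip_unsafe_codepoints("a\u0001b\uFDD0c\td") # => "abc\td"
```

The tab survives because it is whitespace, while U+0001 and U+FDD0 are dropped before any parsing would take place.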
|
106
|
+
|
107
|
+
## 5.0.0 (2018-10-14)
|
108
|
+
|
109
|
+
For most users, upgrading from 4.x shouldn't require any changes. However, the
|
110
|
+
minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
|
111
|
+
differ in some small ways from 4.x's output. If this matters to you, please
|
112
|
+
review the changes below carefully.
|
113
|
+
|
114
|
+
### Potentially Breaking Changes
|
115
|
+
|
116
|
+
* Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
|
117
|
+
work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
|
118
|
+
no longer works in Ruby 1.9.x.
|
119
|
+
|
120
|
+
* Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
|
121
|
+
standard-compliant HTML serialization. [@stevecheckoway - #189][189]
|
122
|
+
|
123
|
+
* Children of the following elements are now removed by default when these
|
124
|
+
elements are removed, rather than being preserved and escaped:
|
125
|
+
|
126
|
+
- `iframe`
|
127
|
+
- `noembed`
|
128
|
+
- `noframes`
|
129
|
+
- `noscript`
|
130
|
+
- `script`
|
131
|
+
- `style`
|
132
|
+
|
133
|
+
* Children of allowlisted `iframe` elements are now always removed. In modern
|
134
|
+
HTML, `iframe` elements should never have children. In HTML 4 and earlier
|
135
|
+
`iframe` elements were allowed to contain fallback content for legacy
|
136
|
+
browsers, but it's been almost two decades since that was useful.
|
137
|
+
|
138
|
+
* Fixed a bug that caused `:remove_contents` to behave as if it were set to
|
139
|
+
`true` when it was actually an Array.
|
140
|
+
|
141
|
+
[189]:https://github.com/rgrove/sanitize/pull/189
|
142
|
+
|
3
143
|
## 4.6.6 (2018-07-23)
|
4
144
|
|
5
145
|
* Improved performance and memory usage by optimizing `Sanitize#transform_node!`
|
@@ -29,7 +169,7 @@
|
|
29
169
|
|
30
170
|
When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
|
31
171
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
32
|
-
escaped output, allowing non-
|
172
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
33
173
|
elements.
|
34
174
|
|
35
175
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -73,7 +213,7 @@
|
|
73
213
|
|
74
214
|
## 4.4.0 (2016-09-29)
|
75
215
|
|
76
|
-
* Added `srcset` to the attribute
|
216
|
+
* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
|
77
217
|
config. [@ejtttje - #156][156]
|
78
218
|
|
79
219
|
[156]:https://github.com/rgrove/sanitize/pull/156
|
@@ -194,7 +334,7 @@
|
|
194
334
|
## 3.0.4 (2014-12-12)
|
195
335
|
|
196
336
|
* Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
|
197
|
-
caused the URL to be removed even when the protocol was
|
337
|
+
caused the URL to be removed even when the protocol was allowlisted.
|
198
338
|
[@benubois - #126][126]
|
199
339
|
|
200
340
|
[126]:https://github.com/rgrove/sanitize/pull/126
|
@@ -203,7 +343,7 @@
|
|
203
343
|
## 3.0.3 (2014-10-29)
|
204
344
|
|
205
345
|
* Fixed: Some CSS selectors weren't parsed correctly inside the body of a
|
206
|
-
`@media` block, causing them to be removed even when
|
346
|
+
`@media` block, causing them to be removed even when allowlist rules should
|
207
347
|
have allowed them to remain. [#121][121]
|
208
348
|
|
209
349
|
[121]:https://github.com/rgrove/sanitize/issues/121
|
@@ -268,7 +408,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
268
408
|
* The `clean_node!` method was renamed to `node!`.
|
269
409
|
|
270
410
|
* The `document` method now raises a `Sanitize::Error` if the `<html>` element
|
271
|
-
isn't
|
411
|
+
isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
|
272
412
|
regardless of the `:remove_contents` config setting.
|
273
413
|
|
274
414
|
* The `:output` config has been removed. Output is now always HTML, not XHTML.
|
@@ -279,7 +419,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
279
419
|
|
280
420
|
* Added advanced CSS sanitization support using [Crass][crass], which is fully
|
281
421
|
compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
|
282
|
-
|
422
|
+
allowlisted `<style>` elements and `style` attributes in HTML will be
|
283
423
|
sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
|
284
424
|
sanitize CSS stylesheets or properties.
|
285
425
|
|
@@ -324,9 +464,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
324
464
|
[n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
|
325
465
|
|
326
466
|
|
467
|
+
## 2.1.1 (2018-09-30)
|
468
|
+
|
469
|
+
* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
|
470
|
+
XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
|
471
|
+
|
472
|
+
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
|
473
|
+
specially crafted HTML fragment can cause libxml2 to generate improperly
|
474
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
475
|
+
elements.
|
476
|
+
|
477
|
+
Sanitize now performs additional escaping on affected attributes to prevent
|
478
|
+
this.
|
479
|
+
|
480
|
+
Many thanks to the Shopify Application Security Team for responsibly reporting
|
481
|
+
this issue.
|
482
|
+
|
483
|
+
[176]:https://github.com/rgrove/sanitize/issues/176
|
484
|
+
[188]:https://github.com/rgrove/sanitize/pull/188
|
485
|
+
|
486
|
+
|
327
487
|
## 2.1.0 (2014-01-13)
|
328
488
|
|
329
|
-
* Added support for
|
489
|
+
* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
|
330
490
|
symbol `:data` instead of an attribute name in the `:attributes` config to
|
331
491
|
indicate that arbitrary data attributes should be allowed on an element.
|
332
492
|
|
@@ -407,12 +567,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
407
567
|
the default depth-first mode.
|
408
568
|
|
409
569
|
* Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
|
410
|
-
elements to the
|
570
|
+
elements to the allowlists for the basic and relaxed configs.
|
411
571
|
|
412
572
|
* Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
|
413
|
-
`ruby`, and `wbr` elements to the
|
573
|
+
`ruby`, and `wbr` elements to the allowlist for the relaxed config.
|
414
574
|
|
415
|
-
* The `dir`, `lang`, and `title` attributes are now
|
575
|
+
* The `dir`, `lang`, and `title` attributes are now allowlisted for all
|
416
576
|
elements in the relaxed config.
|
417
577
|
|
418
578
|
* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
|
@@ -423,7 +583,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
423
583
|
## 1.2.1 (2010-04-20)
|
424
584
|
|
425
585
|
* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
|
426
|
-
remove the contents of all non-
|
586
|
+
remove the contents of all non-allowlisted elements in addition to the
|
427
587
|
elements themselves. If set to an array of element names, Sanitize will
|
428
588
|
remove the contents of only those elements (when filtered), and leave the
|
429
589
|
contents of other filtered elements. [Thanks to Rafael Souza for the array
|
@@ -451,7 +611,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
451
611
|
* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
|
452
612
|
all its children.
|
453
613
|
|
454
|
-
* Added elements `<h1>` through `<h6>` to the Relaxed
|
614
|
+
* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
|
455
615
|
David Reese]
|
456
616
|
|
457
617
|
|
@@ -471,7 +631,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
471
631
|
|
472
632
|
* Added a workaround for an Hpricot bug that prevents attribute names from
|
473
633
|
being downcased in recent versions of Hpricot. This was exploitable to
|
474
|
-
prevent non-
|
634
|
+
prevent non-allowlisted protocols from being cleaned. [Reported by Ben
|
475
635
|
Wanicur]
|
476
636
|
|
477
637
|
|
@@ -501,7 +661,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
501
661
|
|
502
662
|
## 1.0.5 (2009-02-05)
|
503
663
|
|
504
|
-
* Fixed a bug introduced in version 1.0.3 that prevented non-
|
664
|
+
* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
|
505
665
|
protocols from being cleaned when relative URLs were allowed. [Reported by
|
506
666
|
Dev Purkayastha]
|
507
667
|
|
@@ -511,7 +671,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
511
671
|
|
512
672
|
## 1.0.4 (2009-01-16)
|
513
673
|
|
514
|
-
* Fixed a bug that made it possible to sneak a non-
|
674
|
+
* Fixed a bug that made it possible to sneak a non-allowlisted element through
|
515
675
|
by repeating it several times in a row. All versions of Sanitize prior to
|
516
676
|
1.0.4 are vulnerable. [Reported by Cristobal]
|
517
677
|
|
@@ -519,7 +679,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
519
679
|
## 1.0.3 (2009-01-15)
|
520
680
|
|
521
681
|
* Fixed a bug whereby incomplete Unicode or hex entities could be used to
|
522
|
-
prevent non-
|
682
|
+
prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
|
523
683
|
still decode the incomplete entities, users of those browsers may be
|
524
684
|
vulnerable to malicious script injection on websites using versions of
|
525
685
|
Sanitize prior to 1.0.3.
|
data/LICENSE
CHANGED
data/README.md
CHANGED
@@ -1,28 +1,27 @@
|
|
1
1
|
Sanitize
|
2
2
|
========
|
3
3
|
|
4
|
-
Sanitize is
|
5
|
-
elements, attributes, and
|
6
|
-
|
4
|
+
Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
|
5
|
+
and/or CSS from a string except the elements, attributes, and properties you
|
6
|
+
choose to allow.
|
7
7
|
|
8
8
|
Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
|
9
9
|
elements, certain attributes within those elements, and even certain URL
|
10
|
-
protocols within attributes that contain URLs. You can also
|
11
|
-
properties, @ rules, and URL protocols
|
12
|
-
|
13
|
-
be removed.
|
10
|
+
protocols within attributes that contain URLs. You can also allow specific CSS
|
11
|
+
properties, @ rules, and URL protocols in elements or attributes containing CSS.
|
12
|
+
Any HTML or CSS that you don't explicitly allow will be removed.
|
14
13
|
|
15
|
-
Sanitize is based on [
|
14
|
+
Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
|
16
15
|
exactly the same way modern browsers do, and [Crass][crass], which parses CSS
|
17
|
-
exactly the same way modern browsers do. As long as your
|
16
|
+
exactly the same way modern browsers do. As long as your allowlist config only
|
18
17
|
allows safe markup and CSS, even the most malformed or malicious input will be
|
19
18
|
transformed into safe output.
|
20
19
|
|
21
|
-
[![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
|
22
20
|
[![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
|
21
|
+
[![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
|
23
22
|
|
24
23
|
[crass]:https://github.com/rgrove/crass
|
25
|
-
[
|
24
|
+
[nokogumbo]:https://github.com/rubys/nokogumbo
|
26
25
|
|
27
26
|
Links
|
28
27
|
-----
|
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
|
|
73
72
|
* Standalone CSS stylesheets
|
74
73
|
* Standalone CSS properties
|
75
74
|
|
75
|
+
However, please note that Sanitize _cannot_ fully sanitize the contents of
|
76
|
+
`<math>` or `<svg>` elements, since these elements don't follow the same parsing
|
77
|
+
rules as the rest of HTML. If this is something you need, you may want to look
|
78
|
+
for another solution.
|
79
|
+
|
76
80
|
### HTML Fragments
|
77
81
|
|
78
82
|
A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
|
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
|
|
88
92
|
# => 'foo'
|
89
93
|
```
|
90
94
|
|
91
|
-
To keep certain elements, add them to the element
|
95
|
+
To keep certain elements, add them to the element allowlist.
|
92
96
|
|
93
97
|
```ruby
|
94
98
|
Sanitize.fragment(html, :elements => ['b'])
|
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
|
|
97
101
|
|
98
102
|
### HTML Documents
|
99
103
|
|
100
|
-
When sanitizing a document, the `<html>` element must be
|
104
|
+
When sanitizing a document, the `<html>` element must be allowlisted. You can
|
101
105
|
also set `:allow_doctype` to `true` to allow well-formed document type
|
102
106
|
definitions.
|
103
107
|
|
@@ -123,8 +127,8 @@ Sanitize.document(html,
|
|
123
127
|
|
124
128
|
### CSS in HTML
|
125
129
|
|
126
|
-
To sanitize CSS in an HTML fragment or document, first
|
127
|
-
element and/or the `style` attribute. Then
|
130
|
+
To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
|
131
|
+
element and/or the `style` attribute. Then allowlist the CSS properties,
|
128
132
|
@ rules, and URL protocols you wish to allow. You can also choose whether to
|
129
133
|
allow CSS comments or browser compatibility hacks.
|
130
134
|
|
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
|
|
267
271
|
|
268
272
|
```ruby
|
269
273
|
# Create a customized copy of the Basic config, adding <div> and <table> to the
|
270
|
-
# existing
|
274
|
+
# existing allowlisted elements.
|
271
275
|
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
272
276
|
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
|
273
277
|
:remove_contents => true
|
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
|
|
395
399
|
|
396
400
|
##### :css => :properties (Array or Set)
|
397
401
|
|
398
|
-
|
399
|
-
lowercase.
|
402
|
+
List of CSS property names to allow. Names should be specified in lowercase.
|
400
403
|
|
401
404
|
##### :css => :protocols (Array or Set)
|
402
405
|
|
@@ -417,6 +420,23 @@ elements not in this array will be removed.
|
|
417
420
|
]
|
418
421
|
```
|
419
422
|
|
423
|
+
**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
|
424
|
+
elements, since these elements don't follow the same parsing rules as the rest
|
425
|
+
of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
|
426
|
+
content inside them will be allowed, even if that content would otherwise be
|
427
|
+
removed by Sanitize.
|
428
|
+
|
429
|
+
#### :parser_options (Hash)
|
430
|
+
|
431
|
+
[Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
|
432
|
+
|
433
|
+
```ruby
|
434
|
+
:parser_options => {
|
435
|
+
max_errors: -1,
|
436
|
+
max_tree_depth: -1
|
437
|
+
}
|
438
|
+
```
|
439
|
+
|
420
440
|
#### :protocols (Hash)
|
421
441
|
|
422
442
|
URL protocols to allow in specific attributes. If an attribute is listed here
|
@@ -441,15 +461,15 @@ include the symbol `:relative` in the protocol array:
|
|
441
461
|
|
442
462
|
#### :remove_contents (boolean or Array or Set)
|
443
463
|
|
444
|
-
If
|
464
|
+
If this is `true`, Sanitize will remove the contents of any non-allowlisted
|
445
465
|
elements in addition to the elements themselves. By default, Sanitize leaves the
|
446
466
|
safe parts of an element's contents behind when the element is removed.
|
447
467
|
|
448
|
-
If
|
449
|
-
elements (when filtered) will be removed, and the contents of all
|
450
|
-
elements will be left behind.
|
468
|
+
If this is an Array or Set of element names, then only the contents of the
|
469
|
+
specified elements (when filtered) will be removed, and the contents of all
|
470
|
+
other filtered elements will be left behind.
|
451
471
|
|
452
|
-
The default value is
|
472
|
+
The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
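The three accepted forms of `:remove_contents` amount to one decision per filtered element. The sketch below models that decision in plain Ruby; `remove_contents_of?` is a hypothetical helper written for illustration and is not Sanitize's internal code.

```ruby
require 'set'

# Hypothetical helper: decides whether the contents of a filtered (that is,
# non-allowlisted) element should be removed along with the element itself.
def remove_contents_of?(setting, element_name)
  case setting
  when true, false, nil then setting == true
  else setting.include?(element_name) # Array or Set of element names
  end
end

# The default value quoted above.
default_setting = %w[iframe math noembed noframes noscript plaintext script style svg xmp]

remove_contents_of?(true, 'div')               # => true
remove_contents_of?(default_setting, 'script') # => true
remove_contents_of?(default_setting, 'div')    # => false
remove_contents_of?(Set['style'], 'style')     # => true
```

With the default value, a filtered `<div>` leaves its safe contents behind, while a filtered `<script>` is removed together with everything inside it.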
|
453
473
|
|
454
474
|
#### :transformers (Array or callable)
|
455
475
|
|
@@ -507,33 +527,33 @@ argument a Hash that contains the following items:
|
|
507
527
|
|
508
528
|
* **:config** - The current Sanitize configuration Hash.
|
509
529
|
|
510
|
-
* **:
|
530
|
+
* **:is_allowlisted** - `true` if the current node has been allowlisted by a
|
511
531
|
previous transformer, `false` otherwise. It's generally bad form to remove
|
512
|
-
a node that a previous transformer has
|
532
|
+
a node that a previous transformer has allowlisted.
|
513
533
|
|
514
534
|
* **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
|
515
535
|
node may be an element, a text node, a comment, a CDATA node, or a document
|
516
536
|
fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
|
517
537
|
selectively ignore node types you aren't interested in.
|
518
538
|
|
539
|
+
* **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
|
540
|
+
document that have been allowlisted by previous transformers, if any. It's
|
541
|
+
generally bad form to remove a node that a previous transformer has
|
542
|
+
allowlisted.
|
543
|
+
|
519
544
|
* **:node_name** - The name of the current HTML node, always lowercase (e.g.
|
520
545
|
"div" or "span"). For non-element nodes, the name will be something like
|
521
546
|
"text", "comment", "#cdata-section", "#document-fragment", etc.
|
522
547
|
|
523
|
-
* **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
|
524
|
-
document that have been whitelisted by previous transformers, if any. It's
|
525
|
-
generally bad form to remove a node that a previous transformer has
|
526
|
-
whitelisted.
|
527
|
-
|
528
548
|
### Output
|
529
549
|
|
530
550
|
A transformer doesn't have to return anything, but may optionally return a Hash,
|
531
551
|
which may contain the following items:
|
532
552
|
|
533
|
-
* **:
|
534
|
-
to add to the document's
|
535
|
-
These specific nodes and all their attributes will be
|
536
|
-
their children will not be.
|
553
|
+
* **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
|
554
|
+
objects to add to the document's allowlist, bypassing the current Sanitize
|
555
|
+
config. These specific nodes and all their attributes will be allowlisted,
|
556
|
+
but their children will not be.
|
537
557
|
|
538
558
|
If a transformer returns anything other than a Hash, the return value will be
|
539
559
|
ignored.
|
@@ -576,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
|
|
576
596
|
completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
|
577
597
|
your own hands.
|
578
598
|
|
579
|
-
### Example: Transformer to
|
599
|
+
### Example: Transformer to allow image URLs by domain
|
580
600
|
|
581
601
|
The following example demonstrates how to remove image elements unless they use
|
582
602
|
a relative URL or are hosted on a specific domain. It assumes that the `<img>`
|
583
|
-
element and its `src` attribute are already
|
603
|
+
element and its `src` attribute are already allowlisted.
|
584
604
|
|
585
605
|
```ruby
|
586
606
|
require 'uri'
|
587
607
|
|
588
|
-
|
608
|
+
image_allowlist_transformer = lambda do |env|
|
589
609
|
# Ignore everything except <img> elements.
|
590
610
|
return unless env[:node_name] == 'img'
|
591
611
|
|
@@ -601,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
|
|
601
621
|
end
|
602
622
|
```
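The URL check behind a transformer like this one can be tried in isolation with Ruby's standard `uri` library. What follows is a sketch rather than the README's code: `allowed_image_src?` is a hypothetical helper, `example.com` is a placeholder domain, and the check is deliberately strict in that it treats a URL as relative only when it has neither a scheme nor a host.

```ruby
require 'uri'

# Hypothetical helper: true when the URL is relative (no scheme, no host) or
# is hosted on the given domain. URLs that fail to parse are rejected.
def allowed_image_src?(src, allowed_host)
  uri = URI.parse(src.to_s)
  relative = uri.scheme.nil? && uri.host.nil?
  relative || uri.host == allowed_host
rescue URI::InvalidURIError
  false
end

allowed_image_src?('/images/cat.png', 'example.com')                  # => true
allowed_image_src?('https://example.com/cat.png', 'example.com')      # => true
allowed_image_src?('https://evil.example.net/cat.png', 'example.com') # => false
```

A transformer could call such a helper with the node's `src` value and unlink the node whenever the result is `false`.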
|
603
623
|
|
604
|
-
### Example: Transformer to
|
624
|
+
### Example: Transformer to allow YouTube video embeds
|
605
625
|
|
606
626
|
The following example demonstrates how to create a transformer that will safely
|
607
|
-
|
608
|
-
|
609
|
-
|
627
|
+
allow valid YouTube video embeds without having to allow other kinds of embedded
|
628
|
+
content, which would be the case if you tried to do this by just allowing all
|
629
|
+
`<iframe>` elements:
|
610
630
|
|
611
631
|
```ruby
|
612
632
|
youtube_transformer = lambda do |env|
|
613
633
|
node = env[:node]
|
614
634
|
node_name = env[:node_name]
|
615
635
|
|
616
|
-
# Don't continue if this node is already
|
617
|
-
return if env[:
|
636
|
+
# Don't continue if this node is already allowlisted or is not an element.
|
637
|
+
return if env[:is_allowlisted] || !node.element?
|
618
638
|
|
619
639
|
# Don't continue unless the node is an iframe.
|
620
640
|
return unless node_name == 'iframe'
|
@@ -635,8 +655,8 @@ youtube_transformer = lambda do |env|
|
|
635
655
|
|
636
656
|
# Now that we're sure that this is a valid YouTube embed and that there are
|
637
657
|
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
|
638
|
-
# to
|
639
|
-
{:
|
658
|
+
# to allowlist the current node.
|
659
|
+
{:node_allowlist => [node]}
|
640
660
|
end
|
641
661
|
|
642
662
|
html = %[
|
@@ -647,25 +667,3 @@ html = %[
|
|
647
667
|
Sanitize.fragment(html, :transformers => youtube_transformer)
|
648
668
|
# => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
|
649
669
|
```
|
650
|
-
|
651
|
-
License
|
652
|
-
-------
|
653
|
-
|
654
|
-
Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
|
655
|
-
|
656
|
-
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
657
|
-
this software and associated documentation files (the 'Software'), to deal in
|
658
|
-
the Software without restriction, including without limitation the rights to
|
659
|
-
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
|
660
|
-
the Software, and to permit persons to whom the Software is furnished to do so,
|
661
|
-
subject to the following conditions:
|
662
|
-
|
663
|
-
The above copyright notice and this permission notice shall be included in all
|
664
|
-
copies or substantial portions of the Software.
|
665
|
-
|
666
|
-
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
667
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
668
|
-
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
669
|
-
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
670
|
-
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
671
|
-
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/lib/sanitize/config/default.rb
CHANGED
@@ -56,6 +56,10 @@ class Sanitize
|
|
56
56
|
# that all HTML will be stripped).
|
57
57
|
:elements => [],
|
58
58
|
|
59
|
+
# HTML parsing options to pass to Nokogumbo.
|
60
|
+
# https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options
|
61
|
+
:parser_options => {},
|
62
|
+
|
59
63
|
# URL handling protocols to allow in specific attributes. By default, no
|
60
64
|
# protocols are allowed. Use :relative in place of a protocol if you want
|
61
65
|
# to allow relative URLs sans protocol.
|
@@ -66,10 +70,12 @@ class Sanitize
|
|
66
70
|
# leaves the safe parts of an element's contents behind when the element
|
67
71
|
# is removed.
|
68
72
|
#
|
69
|
-
# If this is an Array of element names, then only the contents of
|
70
|
-
# specified elements (when filtered) will be removed, and the contents
|
71
|
-
# all other filtered elements will be left behind.
|
72
|
-
:remove_contents =>
|
73
|
+
# If this is an Array or Set of element names, then only the contents of
|
74
|
+
# the specified elements (when filtered) will be removed, and the contents
|
75
|
+
# of all other filtered elements will be left behind.
|
76
|
+
:remove_contents => %w[
|
77
|
+
iframe math noembed noframes noscript plaintext script style svg xmp
|
78
|
+
],
|
73
79
|
|
74
80
|
# Transformers allow you to filter or alter nodes using custom logic. See
|
75
81
|
# README.md for details and examples.
|
data/lib/sanitize/config/relaxed.rb
CHANGED
@@ -6,7 +6,7 @@ class Sanitize
|
|
6
6
|
:elements => BASIC[:elements] + %w[
|
7
7
|
address article aside bdi bdo body caption col colgroup data del div
|
8
8
|
figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
|
9
|
-
img ins main nav rp rt ruby section span style summary
|
9
|
+
img ins main nav rp rt ruby section span style summary table tbody
|
10
10
|
td tfoot th thead title tr wbr
|
11
11
|
],
|
12
12
|
|
data/lib/sanitize/css.rb
CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
|
|
175
175
|
next prop
|
176
176
|
|
177
177
|
when :semicolon
|
178
|
-
# Only preserve the semicolon if it was preceded by
|
178
|
+
# Only preserve the semicolon if it was preceded by an allowlisted
|
179
179
|
# property. Otherwise, omit it in order to prevent redundant semicolons.
|
180
180
|
if preceded_by_property
|
181
181
|
preceded_by_property = false
|
@@ -296,7 +296,7 @@ class Sanitize; class CSS
|
|
296
296
|
end
|
297
297
|
|
298
298
|
# Returns `true` if the given node (which may be of type `:url` or
|
299
|
-
# `:function`, since the CSS syntax can produce both) uses
|
299
|
+
# `:function`, since the CSS syntax can produce both) uses an allowlisted
|
300
300
|
# protocol.
|
301
301
|
def valid_url?(node)
|
302
302
|
type = node[:node]
|