sanitize 4.6.4 → 5.2.0
Potentially problematic release.
This version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +125 -16
- data/README.md +59 -41
- data/lib/sanitize.rb +55 -71
- data/lib/sanitize/config/default.rb +10 -4
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +54 -13
- data/lib/sanitize/version.rb +1 -1
- data/test/common.rb +0 -31
- data/test/test_clean_comment.rb +1 -5
- data/test/test_clean_css.rb +1 -1
- data/test/test_clean_doctype.rb +8 -8
- data/test/test_clean_element.rb +111 -26
- data/test/test_malicious_html.rb +37 -7
- data/test/test_parser.rb +3 -32
- data/test/test_sanitize.rb +103 -18
- data/test/test_sanitize_css.rb +43 -16
- data/test/test_transformers.rb +29 -23
- metadata +16 -18
- data/test/test_unicode.rb +0 -95
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 4f01a992746ecc3f28e9c1fd14c08c99456fb98a59c0b7ba6a8c6f01d0ab07cf
+ data.tar.gz: 4f379538b26db4d239078ea7e54fea3b106e7801d093ed7407e9b71282f6c4d3
  SHA512:
- metadata.gz:
- data.tar.gz:
+ metadata.gz: 52d96c5f73eea8d738fe23d816d5aec856f9f37ca37cf88d88d385fcffbf242605d13494ab531b517af7bdea44bfae2569f27bc2d5fb005dbeee85a54211d674
+ data.tar.gz: 897e95c05448509cfeb455bb4ec156ff7557495987e1d058ff63b888f9c0069a821a9b3684e0fe0463f78e4f28faf9fe2089760ad59bbbd1b5a5390fe9632154
data/HISTORY.md
CHANGED
@@ -1,5 +1,94 @@
|
|
1
1
|
# Sanitize History
|
2
2
|
|
3
|
+
## 5.2.0 (2020-06-06)
|
4
|
+
|
5
|
+
### Changes
|
6
|
+
|
7
|
+
* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
|
8
|
+
source and documentation.
|
9
|
+
|
10
|
+
While the etymology of "whitelist" may not be explicitly racist in origin or
|
11
|
+
intent, there are inherent racial connotations in the implication that white
|
12
|
+
is good and black (as in "blacklist") is not.
|
13
|
+
|
14
|
+
This is a change I should have made long ago, and I apologize for not making
|
15
|
+
it sooner.
|
16
|
+
|
17
|
+
* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
|
18
|
+
deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
|
19
|
+
The old keys will continue to work in order to avoid breaking existing code,
|
20
|
+
but they are no longer documented and may be removed in a future semver major
|
21
|
+
release.
|
22
|
+
|
23
|
+
## 5.1.0 (2019-09-07)
|
24
|
+
|
25
|
+
### Features
|
26
|
+
|
27
|
+
* Added a `:parser_options` config hash, which makes it possible to pass custom
|
28
|
+
parsing options to Nokogumbo. [@austin-wang - #194][194]
|
29
|
+
|
30
|
+
### Bug Fixes
|
31
|
+
|
32
|
+
* Non-characters and non-whitespace control characters are now stripped from
|
33
|
+
HTML input before parsing to comply with the HTML Standard's [preprocessing
|
34
|
+
guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
|
35
|
+
W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
|
36
|
+
|
37
|
+
[179]:https://github.com/rgrove/sanitize/issues/179
|
38
|
+
[194]:https://github.com/rgrove/sanitize/pull/194
|
39
|
+
[html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
|
40
|
+
[unicode-xml]:https://www.w3.org/TR/unicode-xml/
|
41
|
+
|
42
|
+
## 5.0.0 (2018-10-14)
|
43
|
+
|
44
|
+
For most users, upgrading from 4.x shouldn't require any changes. However, the
|
45
|
+
minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
|
46
|
+
differ in some small ways from 4.x's output. If this matters to you, please
|
47
|
+
review the changes below carefully.
|
48
|
+
|
49
|
+
### Potentially Breaking Changes
|
50
|
+
|
51
|
+
* Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
|
52
|
+
work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
|
53
|
+
no longer works in Ruby 1.9.x.
|
54
|
+
|
55
|
+
* Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
|
56
|
+
standard-compliant HTML serialization. [@stevecheckoway - #189][189]
|
57
|
+
|
58
|
+
* Children of the following elements are now removed by default when these
|
59
|
+
elements are removed, rather than being preserved and escaped:
|
60
|
+
|
61
|
+
- `iframe`
|
62
|
+
- `noembed`
|
63
|
+
- `noframes`
|
64
|
+
- `noscript`
|
65
|
+
- `script`
|
66
|
+
- `style`
|
67
|
+
|
68
|
+
* Children of allowlisted `iframe` elements are now always removed. In modern
|
69
|
+
HTML, `iframe` elements should never have children. In HTML 4 and earlier
|
70
|
+
`iframe` elements were allowed to contain fallback content for legacy
|
71
|
+
browsers, but it's been almost two decades since that was useful.
|
72
|
+
|
73
|
+
* Fixed a bug that caused `:remove_contents` to behave as if it were set to
|
74
|
+
`true` when it was actually an Array.
|
75
|
+
|
76
|
+
[189]:https://github.com/rgrove/sanitize/pull/189
|
77
|
+
|
78
|
+
## 4.6.6 (2018-07-23)
|
79
|
+
|
80
|
+
* Improved performance and memory usage by optimizing `Sanitize#transform_node!`
|
81
|
+
[@stanhu - #183][183]
|
82
|
+
|
83
|
+
[183]:https://github.com/rgrove/sanitize/pull/183
|
84
|
+
|
85
|
+
## 4.6.5 (2018-05-16)
|
86
|
+
|
87
|
+
* Improved performance slightly by tweaking the order of built-in transformers.
|
88
|
+
[@rafbm - #180][180]
|
89
|
+
|
90
|
+
[180]:https://github.com/rgrove/sanitize/pull/180
|
91
|
+
|
3
92
|
## 4.6.4 (2018-03-20)
|
4
93
|
|
5
94
|
* Fixed: A change introduced in 4.6.2 broke certain transformers that relied on
|
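Note on the 5.2.0 entry in the hunk above: it deprecates the `:is_whitelisted` and `:node_whitelist` transformer input keys in favor of `:is_allowlisted` and `:node_allowlist`, while keeping the old keys working. A custom transformer that has to run against both older and newer Sanitize releases could read the flag through a small lookup helper that prefers the new key. The following is only a sketch: the helper name `allowlisted?` is hypothetical (it is not part of Sanitize), and plain Hashes stand in for the `env` Hash that Sanitize passes to a transformer.

```ruby
# Hypothetical helper, not part of Sanitize: reads the allowlist flag from a
# transformer env Hash, preferring the new key and falling back to the old one.
def allowlisted?(env)
  if env.key?(:is_allowlisted)
    env[:is_allowlisted]
  else
    # Releases before 5.2.0 only provide the deprecated key.
    env[:is_whitelisted] || false
  end
end

# Plain Hashes standing in for the env that Sanitize passes to a transformer.
new_style_env = { :is_allowlisted => true }
old_style_env = { :is_whitelisted => true }

puts allowlisted?(new_style_env)
puts allowlisted?(old_style_env)
```

When both keys are present, as the changelog implies for 5.2.0, the sketch lets the new key win, so the helper can be dropped once the deprecated keys are removed.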
@@ -15,7 +104,7 @@
|
|
15
104
|
|
16
105
|
When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
|
17
106
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
18
|
-
escaped output, allowing non-
|
107
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
19
108
|
elements.
|
20
109
|
|
21
110
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -59,7 +148,7 @@
|
|
59
148
|
|
60
149
|
## 4.4.0 (2016-09-29)
|
61
150
|
|
62
|
-
* Added `srcset` to the attribute
|
151
|
+
* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
|
63
152
|
config. [@ejtttje - #156][156]
|
64
153
|
|
65
154
|
[156]:https://github.com/rgrove/sanitize/pull/156
|
@@ -180,7 +269,7 @@
|
|
180
269
|
## 3.0.4 (2014-12-12)
|
181
270
|
|
182
271
|
* Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
|
183
|
-
caused the URL to be removed even when the protocol was
|
272
|
+
caused the URL to be removed even when the protocol was allowlisted.
|
184
273
|
[@benubois - #126][126]
|
185
274
|
|
186
275
|
[126]:https://github.com/rgrove/sanitize/pull/126
|
@@ -189,7 +278,7 @@
|
|
189
278
|
## 3.0.3 (2014-10-29)
|
190
279
|
|
191
280
|
* Fixed: Some CSS selectors weren't parsed correctly inside the body of a
|
192
|
-
`@media` block, causing them to be removed even when
|
281
|
+
`@media` block, causing them to be removed even when allowlist rules should
|
193
282
|
have allowed them to remain. [#121][121]
|
194
283
|
|
195
284
|
[121]:https://github.com/rgrove/sanitize/issues/121
|
@@ -254,7 +343,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
254
343
|
* The `clean_node!` method was renamed to `node!`.
|
255
344
|
|
256
345
|
* The `document` method now raises a `Sanitize::Error` if the `<html>` element
|
257
|
-
isn't
|
346
|
+
isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
|
258
347
|
regardless of the `:remove_contents` config setting.
|
259
348
|
|
260
349
|
* The `:output` config has been removed. Output is now always HTML, not XHTML.
|
@@ -265,7 +354,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
265
354
|
|
266
355
|
* Added advanced CSS sanitization support using [Crass][crass], which is fully
|
267
356
|
compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
|
268
|
-
|
357
|
+
allowlisted `<style>` elements and `style` attributes in HTML will be
|
269
358
|
sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
|
270
359
|
sanitize CSS stylesheets or properties.
|
271
360
|
|
@@ -310,9 +399,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
310
399
|
[n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
|
311
400
|
|
312
401
|
|
402
|
+
## 2.1.1 (2018-09-30)
|
403
|
+
|
404
|
+
* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
|
405
|
+
XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
|
406
|
+
|
407
|
+
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
|
408
|
+
specially crafted HTML fragment can cause libxml2 to generate improperly
|
409
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
410
|
+
elements.
|
411
|
+
|
412
|
+
Sanitize now performs additional escaping on affected attributes to prevent
|
413
|
+
this.
|
414
|
+
|
415
|
+
Many thanks to the Shopify Application Security Team for responsibly reporting
|
416
|
+
this issue.
|
417
|
+
|
418
|
+
[176]:https://github.com/rgrove/sanitize/issues/176
|
419
|
+
[188]:https://github.com/rgrove/sanitize/pull/188
|
420
|
+
|
421
|
+
|
313
422
|
## 2.1.0 (2014-01-13)
|
314
423
|
|
315
|
-
* Added support for
|
424
|
+
* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
|
316
425
|
symbol `:data` instead of an attribute name in the `:attributes` config to
|
317
426
|
indicate that arbitrary data attributes should be allowed on an element.
|
318
427
|
|
@@ -393,12 +502,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
393
502
|
the default depth-first mode.
|
394
503
|
|
395
504
|
* Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
|
396
|
-
elements to the
|
505
|
+
elements to the allowlists for the basic and relaxed configs.
|
397
506
|
|
398
507
|
* Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
|
399
|
-
`ruby`, and `wbr` elements to the
|
508
|
+
`ruby`, and `wbr` elements to the allowlist for the relaxed config.
|
400
509
|
|
401
|
-
* The `dir`, `lang`, and `title` attributes are now
|
510
|
+
* The `dir`, `lang`, and `title` attributes are now allowlisted for all
|
402
511
|
elements in the relaxed config.
|
403
512
|
|
404
513
|
* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
|
@@ -409,7 +518,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
409
518
|
## 1.2.1 (2010-04-20)
|
410
519
|
|
411
520
|
* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
|
412
|
-
remove the contents of all non-
|
521
|
+
remove the contents of all non-allowlisted elements in addition to the
|
413
522
|
elements themselves. If set to an array of element names, Sanitize will
|
414
523
|
remove the contents of only those elements (when filtered), and leave the
|
415
524
|
contents of other filtered elements. [Thanks to Rafael Souza for the array
|
@@ -437,7 +546,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
437
546
|
* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
|
438
547
|
all its children.
|
439
548
|
|
440
|
-
* Added elements `<h1>` through `<h6>` to the Relaxed
|
549
|
+
* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
|
441
550
|
David Reese]
|
442
551
|
|
443
552
|
|
@@ -457,7 +566,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
457
566
|
|
458
567
|
* Added a workaround for an Hpricot bug that prevents attribute names from
|
459
568
|
being downcased in recent versions of Hpricot. This was exploitable to
|
460
|
-
prevent non-
|
569
|
+
prevent non-allowlisted protocols from being cleaned. [Reported by Ben
|
461
570
|
Wanicur]
|
462
571
|
|
463
572
|
|
@@ -487,7 +596,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
487
596
|
|
488
597
|
## 1.0.5 (2009-02-05)
|
489
598
|
|
490
|
-
* Fixed a bug introduced in version 1.0.3 that prevented non-
|
599
|
+
* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
|
491
600
|
protocols from being cleaned when relative URLs were allowed. [Reported by
|
492
601
|
Dev Purkayastha]
|
493
602
|
|
@@ -497,7 +606,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
497
606
|
|
498
607
|
## 1.0.4 (2009-01-16)
|
499
608
|
|
500
|
-
* Fixed a bug that made it possible to sneak a non-
|
609
|
+
* Fixed a bug that made it possible to sneak a non-allowlisted element through
|
501
610
|
by repeating it several times in a row. All versions of Sanitize prior to
|
502
611
|
1.0.4 are vulnerable. [Reported by Cristobal]
|
503
612
|
|
@@ -505,7 +614,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
505
614
|
## 1.0.3 (2009-01-15)
|
506
615
|
|
507
616
|
* Fixed a bug whereby incomplete Unicode or hex entities could be used to
|
508
|
-
prevent non-
|
617
|
+
prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
|
509
618
|
still decode the incomplete entities, users of those browsers may be
|
510
619
|
vulnerable to malicious script injection on websites using versions of
|
511
620
|
Sanitize prior to 1.0.3.
|
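Note on the 5.1.0 bug fix listed in the HISTORY.md diff above: non-characters and non-whitespace control characters are now stripped from HTML input before parsing. The idea can be sketched with a single regular expression. This is an illustration only, not Sanitize's own code: the constant and method names are hypothetical, the pattern covers only control characters (excluding tab, line feed, form feed, carriage return, and NULL) and the non-characters in the Basic Multilingual Plane, and it assumes UTF-8 input.

```ruby
# Hypothetical names; a simplified stand-in for the preprocessing step.
# Matches control characters other than ASCII whitespace and NULL, plus the
# non-characters U+FDD0..U+FDEF, U+FFFE, and U+FFFF.
UNSUITABLE_CHARS = /[\u0001-\u0008\u000b\u000e-\u001f\u007f-\u009f\ufdd0-\ufdef\ufffe\uffff]/

# Returns a copy of the input with the matched characters removed.
def strip_unsuitable_chars(html)
  html.gsub(UNSUITABLE_CHARS, '')
end

cleaned = strip_unsuitable_chars("<b>a\u0001b</b>\n")
puts cleaned.inspect
```

Non-characters in the supplementary planes (U+1FFFE, U+1FFFF, and so on) are left out of the sketch for brevity.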
data/README.md
CHANGED
@@ -1,20 +1,19 @@
|
|
1
1
|
Sanitize
|
2
2
|
========
|
3
3
|
|
4
|
-
Sanitize is
|
5
|
-
elements, attributes, and
|
6
|
-
|
4
|
+
Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
|
5
|
+
and/or CSS from a string except the elements, attributes, and properties you
|
6
|
+
choose to allow.
|
7
7
|
|
8
8
|
Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
|
9
9
|
elements, certain attributes within those elements, and even certain URL
|
10
|
-
protocols within attributes that contain URLs. You can also
|
11
|
-
properties, @ rules, and URL protocols
|
12
|
-
|
13
|
-
be removed.
|
10
|
+
protocols within attributes that contain URLs. You can also allow specific CSS
|
11
|
+
properties, @ rules, and URL protocols in elements or attributes containing CSS.
|
12
|
+
Any HTML or CSS that you don't explicitly allow will be removed.
|
14
13
|
|
15
14
|
Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
|
16
15
|
exactly the same way modern browsers do, and [Crass][crass], which parses CSS
|
17
|
-
exactly the same way modern browsers do. As long as your
|
16
|
+
exactly the same way modern browsers do. As long as your allowlist config only
|
18
17
|
allows safe markup and CSS, even the most malformed or malicious input will be
|
19
18
|
transformed into safe output.
|
20
19
|
|
@@ -88,7 +87,7 @@ Sanitize.fragment(html)
|
|
88
87
|
# => 'foo'
|
89
88
|
```
|
90
89
|
|
91
|
-
To keep certain elements, add them to the element
|
90
|
+
To keep certain elements, add them to the element allowlist.
|
92
91
|
|
93
92
|
```ruby
|
94
93
|
Sanitize.fragment(html, :elements => ['b'])
|
@@ -97,7 +96,7 @@ Sanitize.fragment(html, :elements => ['b'])
|
|
97
96
|
|
98
97
|
### HTML Documents
|
99
98
|
|
100
|
-
When sanitizing a document, the `<html>` element must be
|
99
|
+
When sanitizing a document, the `<html>` element must be allowlisted. You can
|
101
100
|
also set `:allow_doctype` to `true` to allow well-formed document type
|
102
101
|
definitions.
|
103
102
|
|
@@ -123,8 +122,8 @@ Sanitize.document(html,
|
|
123
122
|
|
124
123
|
### CSS in HTML
|
125
124
|
|
126
|
-
To sanitize CSS in an HTML fragment or document, first
|
127
|
-
element and/or the `style` attribute. Then
|
125
|
+
To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
|
126
|
+
element and/or the `style` attribute. Then allowlist the CSS properties,
|
128
127
|
@ rules, and URL protocols you wish to allow. You can also choose whether to
|
129
128
|
allow CSS comments or browser compatibility hacks.
|
130
129
|
|
@@ -267,7 +266,7 @@ new copy using `Sanitize::Config.merge()`, like so:
|
|
267
266
|
|
268
267
|
```ruby
|
269
268
|
# Create a customized copy of the Basic config, adding <div> and <table> to the
|
270
|
-
# existing
|
269
|
+
# existing allowlisted elements.
|
271
270
|
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
272
271
|
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
|
273
272
|
:remove_contents => true
|
@@ -395,8 +394,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
|
|
395
394
|
|
396
395
|
##### :css => :properties (Array or Set)
|
397
396
|
|
398
|
-
|
399
|
-
lowercase.
|
397
|
+
List of CSS property names to allow. Names should be specified in lowercase.
|
400
398
|
|
401
399
|
##### :css => :protocols (Array or Set)
|
402
400
|
|
@@ -417,6 +415,17 @@ elements not in this array will be removed.
|
|
417
415
|
]
|
418
416
|
```
|
419
417
|
|
418
|
+
#### :parser_options (Hash)
|
419
|
+
|
420
|
+
[Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
|
421
|
+
|
422
|
+
```ruby
|
423
|
+
:parser_options => {
|
424
|
+
max_errors: -1,
|
425
|
+
max_tree_depth: -1
|
426
|
+
}
|
427
|
+
```
|
428
|
+
|
420
429
|
#### :protocols (Hash)
|
421
430
|
|
422
431
|
URL protocols to allow in specific attributes. If an attribute is listed here
|
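Note on the `:parser_options` hunk above: the option is an ordinary key in the Sanitize config Hash, so it can be layered onto an existing config the same way as any other setting. The sketch below uses only plain Ruby Hashes so that it runs without the gem installed. The `base_config` contents are an assumption made for illustration, and the commented-out `Sanitize.fragment` call shows where the resulting Hash would be passed.

```ruby
# Stand-in for a Sanitize config Hash; the contents are illustrative only.
base_config = {
  :elements       => ['b', 'em'],
  :parser_options => {}
}.freeze

# Build a new Hash instead of mutating the frozen base config.
custom_config = base_config.merge(
  :parser_options => {
    max_errors: -1,
    max_tree_depth: -1
  }
)

# With the sanitize gem loaded, the Hash would be passed as the second argument:
#   Sanitize.fragment(html, custom_config)
puts custom_config[:parser_options].inspect
```

In real code, `Sanitize::Config.merge()` (shown elsewhere in this README) is probably the better fit, since it is the documented way to derive a customized copy of a built-in config.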
@@ -441,13 +450,13 @@ include the symbol `:relative` in the protocol array:
|
|
441
450
|
|
442
451
|
#### :remove_contents (boolean or Array or Set)
|
443
452
|
|
444
|
-
If
|
453
|
+
If this is `true`, Sanitize will remove the contents of any non-allowlisted
|
445
454
|
elements in addition to the elements themselves. By default, Sanitize leaves the
|
446
455
|
safe parts of an element's contents behind when the element is removed.
|
447
456
|
|
448
|
-
If
|
449
|
-
elements (when filtered) will be removed, and the contents of all
|
450
|
-
elements will be left behind.
|
457
|
+
If this is an Array or Set of element names, then only the contents of the
|
458
|
+
specified elements (when filtered) will be removed, and the contents of all
|
459
|
+
other filtered elements will be left behind.
|
451
460
|
|
452
461
|
The default value is `false`.
|
453
462
|
|
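Note on the `:remove_contents` hunk above: the setting accepts `true`, `false`, or an Array or Set of element names, and the 5.0.0 changelog mentions a fixed bug where an Array was treated like `true`. The documented rule can be modeled as a small predicate. This is a sketch of the rule as described in the README text, not Sanitize's implementation, and the method name `remove_contents_of?` is hypothetical.

```ruby
require 'set'

# Hypothetical predicate modeling the documented rule: should the contents of
# a filtered (non-allowlisted) element be removed along with the element?
def remove_contents_of?(setting, element_name)
  case setting
  when true, false
    setting
  when Array, Set
    # Only the listed elements lose their contents.
    setting.include?(element_name)
  else
    false
  end
end

puts remove_contents_of?(true, 'div')
puts remove_contents_of?(['script', 'style'], 'script')
puts remove_contents_of?(Set['script', 'style'], 'div')
```

Keeping the Boolean and collection branches separate is what prevents an Array from being mistaken for `true`.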
@@ -474,6 +483,15 @@ children, in which case it will be inserted after those children.
|
|
474
483
|
}
|
475
484
|
```
|
476
485
|
|
486
|
+
The default elements with whitespace added before and after are:
|
487
|
+
|
488
|
+
```
|
489
|
+
address article aside blockquote br dd div dl dt
|
490
|
+
footer h1 h2 h3 h4 h5 h6 header hgroup hr li nav
|
491
|
+
ol p pre section ul
|
492
|
+
|
493
|
+
```
|
494
|
+
|
477
495
|
## Transformers
|
478
496
|
|
479
497
|
Transformers allow you to filter and modify HTML nodes using your own custom
|
@@ -498,33 +516,33 @@ argument a Hash that contains the following items:
|
|
498
516
|
|
499
517
|
* **:config** - The current Sanitize configuration Hash.
|
500
518
|
|
501
|
-
* **:
|
519
|
+
* **:is_allowlisted** - `true` if the current node has been allowlisted by a
|
502
520
|
previous transformer, `false` otherwise. It's generally bad form to remove
|
503
|
-
a node that a previous transformer has
|
521
|
+
a node that a previous transformer has allowlisted.
|
504
522
|
|
505
523
|
* **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
|
506
524
|
node may be an element, a text node, a comment, a CDATA node, or a document
|
507
525
|
fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
|
508
526
|
selectively ignore node types you aren't interested in.
|
509
527
|
|
528
|
+
* **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
|
529
|
+
document that have been allowlisted by previous transformers, if any. It's
|
530
|
+
generally bad form to remove a node that a previous transformer has
|
531
|
+
allowlisted.
|
532
|
+
|
510
533
|
* **:node_name** - The name of the current HTML node, always lowercase (e.g.
|
511
534
|
"div" or "span"). For non-element nodes, the name will be something like
|
512
535
|
"text", "comment", "#cdata-section", "#document-fragment", etc.
|
513
536
|
|
514
|
-
* **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
|
515
|
-
document that have been whitelisted by previous transformers, if any. It's
|
516
|
-
generally bad form to remove a node that a previous transformer has
|
517
|
-
whitelisted.
|
518
|
-
|
519
537
|
### Output
|
520
538
|
|
521
539
|
A transformer doesn't have to return anything, but may optionally return a Hash,
|
522
540
|
which may contain the following items:
|
523
541
|
|
524
|
-
* **:
|
525
|
-
to add to the document's
|
526
|
-
These specific nodes and all their attributes will be
|
527
|
-
their children will not be.
|
542
|
+
* **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
|
543
|
+
objects to add to the document's allowlist, bypassing the current Sanitize
|
544
|
+
config. These specific nodes and all their attributes will be allowlisted,
|
545
|
+
but their children will not be.
|
528
546
|
|
529
547
|
If a transformer returns anything other than a Hash, the return value will be
|
530
548
|
ignored.
|
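Note on the transformer input and output keys in the hunk above: a transformer receives an `env` Hash and may return a Hash containing `:node_allowlist`. A minimal sketch of that contract follows. So that it runs without Nokogiri, a hypothetical `FakeNode` Struct stands in for `Nokogiri::XML::Node`, and the transformer name and the choice of `span` are assumptions made for illustration.

```ruby
# Stand-in for Nokogiri::XML::Node so the sketch runs without Nokogiri.
FakeNode = Struct.new(:name)

# Hypothetical transformer: allowlists <span> nodes regardless of the config.
span_transformer = lambda do |env|
  # Skip nodes that a previous transformer has already allowlisted.
  return if env[:is_allowlisted]

  # Ignore everything except <span>.
  return unless env[:node_name] == 'span'

  # Ask Sanitize to add this specific node to the document's allowlist.
  { :node_allowlist => [env[:node]] }
end

span   = FakeNode.new('span')
result = span_transformer.call({ :node => span, :node_name => 'span', :is_allowlisted => false })
puts result.inspect
```

As the README text notes, only the returned nodes and their attributes are allowlisted; their children still go through the normal filtering.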
@@ -567,16 +585,16 @@ Transformers have a tremendous amount of power, including the power to
|
|
567
585
|
completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
|
568
586
|
your own hands.
|
569
587
|
|
570
|
-
### Example: Transformer to
|
588
|
+
### Example: Transformer to allow image URLs by domain
|
571
589
|
|
572
590
|
The following example demonstrates how to remove image elements unless they use
|
573
591
|
a relative URL or are hosted on a specific domain. It assumes that the `<img>`
|
574
|
-
element and its `src` attribute are already
|
592
|
+
element and its `src` attribute are already allowlisted.
|
575
593
|
|
576
594
|
```ruby
|
577
595
|
require 'uri'
|
578
596
|
|
579
|
-
|
597
|
+
image_allowlist_transformer = lambda do |env|
|
580
598
|
# Ignore everything except <img> elements.
|
581
599
|
return unless env[:node_name] == 'img'
|
582
600
|
|
@@ -592,20 +610,20 @@ image_whitelist_transformer = lambda do |env|
|
|
592
610
|
end
|
593
611
|
```
|
594
612
|
|
595
|
-
### Example: Transformer to
|
613
|
+
### Example: Transformer to allow YouTube video embeds
|
596
614
|
|
597
615
|
The following example demonstrates how to create a transformer that will safely
|
598
|
-
|
599
|
-
|
600
|
-
|
616
|
+
allow valid YouTube video embeds without having to allow other kinds of embedded
|
617
|
+
content, which would be the case if you tried to do this by just allowing all
|
618
|
+
`<iframe>` elements:
|
601
619
|
|
602
620
|
```ruby
|
603
621
|
youtube_transformer = lambda do |env|
|
604
622
|
node = env[:node]
|
605
623
|
node_name = env[:node_name]
|
606
624
|
|
607
|
-
# Don't continue if this node is already
|
608
|
-
return if env[:
|
625
|
+
# Don't continue if this node is already allowlisted or is not an element.
|
626
|
+
return if env[:is_allowlisted] || !node.element?
|
609
627
|
|
610
628
|
# Don't continue unless the node is an iframe.
|
611
629
|
return unless node_name == 'iframe'
|
@@ -626,8 +644,8 @@ youtube_transformer = lambda do |env|
|
|
626
644
|
|
627
645
|
# Now that we're sure that this is a valid YouTube embed and that there are
|
628
646
|
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
|
629
|
-
# to
|
630
|
-
{:
|
647
|
+
# to allowlist the current node.
|
648
|
+
{:node_allowlist => [node]}
|
631
649
|
end
|
632
650
|
|
633
651
|
html = %[
|