sanitize 5.1.0 → 5.2.1
Potentially problematic release: this version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +74 -18
- data/README.md +47 -38
- data/lib/sanitize.rb +15 -11
- data/lib/sanitize/config/default.rb +1 -1
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +11 -11
- data/lib/sanitize/version.rb +1 -1
- data/test/test_clean_element.rb +24 -14
- data/test/test_malicious_html.rb +20 -1
- data/test/test_parser.rb +1 -1
- data/test/test_sanitize.rb +1 -1
- data/test/test_sanitize_css.rb +4 -4
- data/test/test_transformers.rb +25 -19
- metadata +7 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 3d1290690a9d32db9e06b8fb19c7e285c94a1d91ed51a4eb7e96389e427348d9
|
4
|
+
data.tar.gz: 5131063daf1763c83978954bed9ee3a783099e40aa71e50de26d06b8ae0c1054
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: bfcb7cda6aa70590f642583b41936bc09d8929210046cebdd0d0ff452ccb3213844b4c40d4e205e79c0cd64a2a0d56e16790e38f4c8f247b8abfa32dbec22297
|
7
|
+
data.tar.gz: 0ea5a6d6848f9a125f17e4e23145adff4d3c4ccfe30a3407466fae074ed33cbd4b1869eb5a9f0a72b808449b8cf166a3695c2a6d63b16a83b047fd260bfe50bd
|
data/HISTORY.md
CHANGED
@@ -1,5 +1,61 @@
|
|
1
1
|
# Sanitize History
|
2
2
|
|
3
|
+
## 5.2.1 (2020-06-16)
|
4
|
+
|
5
|
+
### Bug Fixes
|
6
|
+
|
7
|
+
* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
|
8
|
+
Sanitize versions 3.0.0 through 5.2.0.
|
9
|
+
|
10
|
+
When HTML was sanitized using the "relaxed" config or a custom config that
|
11
|
+
allows certain elements, some content in a `<math>` or `<svg>` element may not
|
12
|
+
have been sanitized correctly even if `math` and `svg` were not in the
|
13
|
+
allowlist. This could allow carefully crafted input to sneak arbitrary HTML
|
14
|
+
through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
|
15
|
+
|
16
|
+
You are likely to be vulnerable to this issue if you use Sanitize's relaxed
|
17
|
+
config or a custom config that allows one or more of the following HTML
|
18
|
+
elements:
|
19
|
+
|
20
|
+
- `iframe`
|
21
|
+
- `math`
|
22
|
+
- `noembed`
|
23
|
+
- `noframes`
|
24
|
+
- `noscript`
|
25
|
+
- `plaintext`
|
26
|
+
- `script`
|
27
|
+
- `style`
|
28
|
+
- `svg`
|
29
|
+
- `xmp`
|
30
|
+
|
31
|
+
See the security advisory for more details, including a workaround if you're
|
32
|
+
not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
|
33
|
+
|
34
|
+
Many thanks to Michał Bentkowski of Securitum for reporting this issue and
|
35
|
+
helping to verify the fix.
|
36
|
+
|
37
|
+
[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
|
38
|
+
|
39
|
+
## 5.2.0 (2020-06-06)
|
40
|
+
|
41
|
+
### Changes
|
42
|
+
|
43
|
+
* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
|
44
|
+
source and documentation.
|
45
|
+
|
46
|
+
While the etymology of "whitelist" may not be explicitly racist in origin or
|
47
|
+
intent, there are inherent racial connotations in the implication that white
|
48
|
+
is good and black (as in "blacklist") is not.
|
49
|
+
|
50
|
+
This is a change I should have made long ago, and I apologize for not making
|
51
|
+
it sooner.
|
52
|
+
|
53
|
+
* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
|
54
|
+
deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
|
55
|
+
The old keys will continue to work in order to avoid breaking existing code,
|
56
|
+
but they are no longer documented and may be removed in a future semver major
|
57
|
+
release.
|
58
|
+
|
3
59
|
## 5.1.0 (2019-09-07)
|
4
60
|
|
5
61
|
### Features
|
@@ -45,7 +101,7 @@ review the changes below carefully.
|
|
45
101
|
- `script`
|
46
102
|
- `style`
|
47
103
|
|
48
|
-
* Children of
|
104
|
+
* Children of allowlisted `iframe` elements are now always removed. In modern
|
49
105
|
HTML, `iframe` elements should never have children. In HTML 4 and earlier
|
50
106
|
`iframe` elements were allowed to contain fallback content for legacy
|
51
107
|
browsers, but it's been almost two decades since that was useful.
|
@@ -84,7 +140,7 @@ review the changes below carefully.
|
|
84
140
|
|
85
141
|
When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
|
86
142
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
87
|
-
escaped output, allowing non-
|
143
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
88
144
|
elements.
|
89
145
|
|
90
146
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -128,7 +184,7 @@ review the changes below carefully.
|
|
128
184
|
|
129
185
|
## 4.4.0 (2016-09-29)
|
130
186
|
|
131
|
-
* Added `srcset` to the attribute
|
187
|
+
* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
|
132
188
|
config. [@ejtttje - #156][156]
|
133
189
|
|
134
190
|
[156]:https://github.com/rgrove/sanitize/pull/156
|
@@ -249,7 +305,7 @@ review the changes below carefully.
|
|
249
305
|
## 3.0.4 (2014-12-12)
|
250
306
|
|
251
307
|
* Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
|
252
|
-
caused the URL to be removed even when the protocol was
|
308
|
+
caused the URL to be removed even when the protocol was allowlisted.
|
253
309
|
[@benubois - #126][126]
|
254
310
|
|
255
311
|
[126]:https://github.com/rgrove/sanitize/pull/126
|
@@ -258,7 +314,7 @@ review the changes below carefully.
|
|
258
314
|
## 3.0.3 (2014-10-29)
|
259
315
|
|
260
316
|
* Fixed: Some CSS selectors weren't parsed correctly inside the body of a
|
261
|
-
`@media` block, causing them to be removed even when
|
317
|
+
`@media` block, causing them to be removed even when allowlist rules should
|
262
318
|
have allowed them to remain. [#121][121]
|
263
319
|
|
264
320
|
[121]:https://github.com/rgrove/sanitize/issues/121
|
@@ -323,7 +379,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
323
379
|
* The `clean_node!` method was renamed to `node!`.
|
324
380
|
|
325
381
|
* The `document` method now raises a `Sanitize::Error` if the `<html>` element
|
326
|
-
isn't
|
382
|
+
isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
|
327
383
|
regardless of the `:remove_contents` config setting.
|
328
384
|
|
329
385
|
* The `:output` config has been removed. Output is now always HTML, not XHTML.
|
@@ -334,7 +390,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
334
390
|
|
335
391
|
* Added advanced CSS sanitization support using [Crass][crass], which is fully
|
336
392
|
compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
|
337
|
-
|
393
|
+
allowlisted `<style>` elements and `style` attributes in HTML will be
|
338
394
|
sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
|
339
395
|
sanitize CSS stylesheets or properties.
|
340
396
|
|
@@ -386,7 +442,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
386
442
|
|
387
443
|
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
|
388
444
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
389
|
-
escaped output, allowing non-
|
445
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
390
446
|
elements.
|
391
447
|
|
392
448
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -401,7 +457,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
401
457
|
|
402
458
|
## 2.1.0 (2014-01-13)
|
403
459
|
|
404
|
-
* Added support for
|
460
|
+
* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
|
405
461
|
symbol `:data` instead of an attribute name in the `:attributes` config to
|
406
462
|
indicate that arbitrary data attributes should be allowed on an element.
|
407
463
|
|
@@ -482,12 +538,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
482
538
|
the default depth-first mode.
|
483
539
|
|
484
540
|
* Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
|
485
|
-
elements to the
|
541
|
+
elements to the allowlists for the basic and relaxed configs.
|
486
542
|
|
487
543
|
* Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
|
488
|
-
`ruby`, and `wbr` elements to the
|
544
|
+
`ruby`, and `wbr` elements to the allowlist for the relaxed config.
|
489
545
|
|
490
|
-
* The `dir`, `lang`, and `title` attributes are now
|
546
|
+
* The `dir`, `lang`, and `title` attributes are now allowlisted for all
|
491
547
|
elements in the relaxed config.
|
492
548
|
|
493
549
|
* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
|
@@ -498,7 +554,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
498
554
|
## 1.2.1 (2010-04-20)
|
499
555
|
|
500
556
|
* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
|
501
|
-
remove the contents of all non-
|
557
|
+
remove the contents of all non-allowlisted elements in addition to the
|
502
558
|
elements themselves. If set to an array of element names, Sanitize will
|
503
559
|
remove the contents of only those elements (when filtered), and leave the
|
504
560
|
contents of other filtered elements. [Thanks to Rafael Souza for the array
|
@@ -526,7 +582,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
526
582
|
* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
|
527
583
|
all its children.
|
528
584
|
|
529
|
-
* Added elements `<h1>` through `<h6>` to the Relaxed
|
585
|
+
* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
|
530
586
|
David Reese]
|
531
587
|
|
532
588
|
|
@@ -546,7 +602,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
546
602
|
|
547
603
|
* Added a workaround for an Hpricot bug that prevents attribute names from
|
548
604
|
being downcased in recent versions of Hpricot. This was exploitable to
|
549
|
-
prevent non-
|
605
|
+
prevent non-allowlisted protocols from being cleaned. [Reported by Ben
|
550
606
|
Wanicur]
|
551
607
|
|
552
608
|
|
@@ -576,7 +632,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
576
632
|
|
577
633
|
## 1.0.5 (2009-02-05)
|
578
634
|
|
579
|
-
* Fixed a bug introduced in version 1.0.3 that prevented non-
|
635
|
+
* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
|
580
636
|
protocols from being cleaned when relative URLs were allowed. [Reported by
|
581
637
|
Dev Purkayastha]
|
582
638
|
|
@@ -586,7 +642,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
586
642
|
|
587
643
|
## 1.0.4 (2009-01-16)
|
588
644
|
|
589
|
-
* Fixed a bug that made it possible to sneak a non-
|
645
|
+
* Fixed a bug that made it possible to sneak a non-allowlisted element through
|
590
646
|
by repeating it several times in a row. All versions of Sanitize prior to
|
591
647
|
1.0.4 are vulnerable. [Reported by Cristobal]
|
592
648
|
|
@@ -594,7 +650,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
594
650
|
## 1.0.3 (2009-01-15)
|
595
651
|
|
596
652
|
* Fixed a bug whereby incomplete Unicode or hex entities could be used to
|
597
|
-
prevent non-
|
653
|
+
prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
|
598
654
|
still decode the incomplete entities, users of those browsers may be
|
599
655
|
vulnerable to malicious script injection on websites using versions of
|
600
656
|
Sanitize prior to 1.0.3.
|
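The 5.2.1 changelog entry above lists the elements that make a config likely to be vulnerable. As a rough illustration only, a check along these lines could flag a custom config that allows any of them. The constant `AFFECTED_ELEMENTS` and the helper `affected_elements` are hypothetical names, not part of Sanitize; the element list itself is copied from the changelog entry.

```ruby
# Elements named in the 5.2.1 changelog entry as making a config likely to be
# vulnerable in Sanitize 3.0.0 through 5.2.0.
AFFECTED_ELEMENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical helper (not part of Sanitize): returns the affected element
# names that appear in a config's :elements allowlist.
def affected_elements(config)
  allowed = Array(config[:elements]).map { |name| name.to_s.downcase }
  allowed & AFFECTED_ELEMENTS
end

custom_config = { :elements => ['b', 'em', 'svg', 'IFRAME'] }
puts affected_elements(custom_config).inspect
```

A non-empty result only suggests that the config deserves a closer look; the security advisory remains the authoritative description of the issue and its workaround.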
data/README.md
CHANGED
@@ -1,20 +1,19 @@
|
|
1
1
|
Sanitize
|
2
2
|
========
|
3
3
|
|
4
|
-
Sanitize is
|
5
|
-
elements, attributes, and
|
6
|
-
|
4
|
+
Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
|
5
|
+
and/or CSS from a string except the elements, attributes, and properties you
|
6
|
+
choose to allow.
|
7
7
|
|
8
8
|
Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
|
9
9
|
elements, certain attributes within those elements, and even certain URL
|
10
|
-
protocols within attributes that contain URLs. You can also
|
11
|
-
properties, @ rules, and URL protocols
|
12
|
-
|
13
|
-
be removed.
|
10
|
+
protocols within attributes that contain URLs. You can also allow specific CSS
|
11
|
+
properties, @ rules, and URL protocols in elements or attributes containing CSS.
|
12
|
+
Any HTML or CSS that you don't explicitly allow will be removed.
|
14
13
|
|
15
14
|
Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
|
16
15
|
exactly the same way modern browsers do, and [Crass][crass], which parses CSS
|
17
|
-
exactly the same way modern browsers do. As long as your
|
16
|
+
exactly the same way modern browsers do. As long as your allowlist config only
|
18
17
|
allows safe markup and CSS, even the most malformed or malicious input will be
|
19
18
|
transformed into safe output.
|
20
19
|
|
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
|
|
73
72
|
* Standalone CSS stylesheets
|
74
73
|
* Standalone CSS properties
|
75
74
|
|
75
|
+
However, please note that Sanitize _cannot_ fully sanitize the contents of
|
76
|
+
`<math>` or `<svg>` elements, since these elements don't follow the same parsing
|
77
|
+
rules as the rest of HTML. If this is something you need, you may want to look
|
78
|
+
for another solution.
|
79
|
+
|
76
80
|
### HTML Fragments
|
77
81
|
|
78
82
|
A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
|
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
|
|
88
92
|
# => 'foo'
|
89
93
|
```
|
90
94
|
|
91
|
-
To keep certain elements, add them to the element
|
95
|
+
To keep certain elements, add them to the element allowlist.
|
92
96
|
|
93
97
|
```ruby
|
94
98
|
Sanitize.fragment(html, :elements => ['b'])
|
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
|
|
97
101
|
|
98
102
|
### HTML Documents
|
99
103
|
|
100
|
-
When sanitizing a document, the `<html>` element must be
|
104
|
+
When sanitizing a document, the `<html>` element must be allowlisted. You can
|
101
105
|
also set `:allow_doctype` to `true` to allow well-formed document type
|
102
106
|
definitions.
|
103
107
|
|
@@ -123,8 +127,8 @@ Sanitize.document(html,
|
|
123
127
|
|
124
128
|
### CSS in HTML
|
125
129
|
|
126
|
-
To sanitize CSS in an HTML fragment or document, first
|
127
|
-
element and/or the `style` attribute. Then
|
130
|
+
To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
|
131
|
+
element and/or the `style` attribute. Then allowlist the CSS properties,
|
128
132
|
@ rules, and URL protocols you wish to allow. You can also choose whether to
|
129
133
|
allow CSS comments or browser compatibility hacks.
|
130
134
|
|
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
|
|
267
271
|
|
268
272
|
```ruby
|
269
273
|
# Create a customized copy of the Basic config, adding <div> and <table> to the
|
270
|
-
# existing
|
274
|
+
# existing allowlisted elements.
|
271
275
|
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
272
276
|
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
|
273
277
|
:remove_contents => true
|
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
|
|
395
399
|
|
396
400
|
##### :css => :properties (Array or Set)
|
397
401
|
|
398
|
-
|
399
|
-
lowercase.
|
402
|
+
List of CSS property names to allow. Names should be specified in lowercase.
|
400
403
|
|
401
404
|
##### :css => :protocols (Array or Set)
|
402
405
|
|
@@ -417,6 +420,12 @@ elements not in this array will be removed.
|
|
417
420
|
]
|
418
421
|
```
|
419
422
|
|
423
|
+
**Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
|
424
|
+
elements, since these elements don't follow the same parsing rules as the rest
|
425
|
+
of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
|
426
|
+
content inside them will be allowed, even if that content would otherwise be
|
427
|
+
removed by Sanitize.
|
428
|
+
|
420
429
|
#### :parser_options (Hash)
|
421
430
|
|
422
431
|
[Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
|
@@ -452,7 +461,7 @@ include the symbol `:relative` in the protocol array:
|
|
452
461
|
|
453
462
|
#### :remove_contents (boolean or Array or Set)
|
454
463
|
|
455
|
-
If this is `true`, Sanitize will remove the contents of any non-
|
464
|
+
If this is `true`, Sanitize will remove the contents of any non-allowlisted
|
456
465
|
elements in addition to the elements themselves. By default, Sanitize leaves the
|
457
466
|
safe parts of an element's contents behind when the element is removed.
|
458
467
|
|
@@ -518,33 +527,33 @@ argument a Hash that contains the following items:
|
|
518
527
|
|
519
528
|
* **:config** - The current Sanitize configuration Hash.
|
520
529
|
|
521
|
-
* **:
|
530
|
+
* **:is_allowlisted** - `true` if the current node has been allowlisted by a
|
522
531
|
previous transformer, `false` otherwise. It's generally bad form to remove
|
523
|
-
a node that a previous transformer has
|
532
|
+
a node that a previous transformer has allowlisted.
|
524
533
|
|
525
534
|
* **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
|
526
535
|
node may be an element, a text node, a comment, a CDATA node, or a document
|
527
536
|
fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
|
528
537
|
selectively ignore node types you aren't interested in.
|
529
538
|
|
539
|
+
* **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
|
540
|
+
document that have been allowlisted by previous transformers, if any. It's
|
541
|
+
generally bad form to remove a node that a previous transformer has
|
542
|
+
allowlisted.
|
543
|
+
|
530
544
|
* **:node_name** - The name of the current HTML node, always lowercase (e.g.
|
531
545
|
"div" or "span"). For non-element nodes, the name will be something like
|
532
546
|
"text", "comment", "#cdata-section", "#document-fragment", etc.
|
533
547
|
|
534
|
-
* **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
|
535
|
-
document that have been whitelisted by previous transformers, if any. It's
|
536
|
-
generally bad form to remove a node that a previous transformer has
|
537
|
-
whitelisted.
|
538
|
-
|
539
548
|
### Output
|
540
549
|
|
541
550
|
A transformer doesn't have to return anything, but may optionally return a Hash,
|
542
551
|
which may contain the following items:
|
543
552
|
|
544
|
-
* **:
|
545
|
-
to add to the document's
|
546
|
-
These specific nodes and all their attributes will be
|
547
|
-
their children will not be.
|
553
|
+
* **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
|
554
|
+
objects to add to the document's allowlist, bypassing the current Sanitize
|
555
|
+
config. These specific nodes and all their attributes will be allowlisted,
|
556
|
+
but their children will not be.
|
548
557
|
|
549
558
|
If a transformer returns anything other than a Hash, the return value will be
|
550
559
|
ignored.
|
@@ -587,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
|
|
587
596
|
completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
|
588
597
|
your own hands.
|
589
598
|
|
590
|
-
### Example: Transformer to
|
599
|
+
### Example: Transformer to allow image URLs by domain
|
591
600
|
|
592
601
|
The following example demonstrates how to remove image elements unless they use
|
593
602
|
a relative URL or are hosted on a specific domain. It assumes that the `<img>`
|
594
|
-
element and its `src` attribute are already
|
603
|
+
element and its `src` attribute are already allowlisted.
|
595
604
|
|
596
605
|
```ruby
|
597
606
|
require 'uri'
|
598
607
|
|
599
|
-
|
608
|
+
image_allowlist_transformer = lambda do |env|
|
600
609
|
# Ignore everything except <img> elements.
|
601
610
|
return unless env[:node_name] == 'img'
|
602
611
|
|
@@ -612,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
|
|
612
621
|
end
|
613
622
|
```
|
614
623
|
|
615
|
-
### Example: Transformer to
|
624
|
+
### Example: Transformer to allow YouTube video embeds
|
616
625
|
|
617
626
|
The following example demonstrates how to create a transformer that will safely
|
618
|
-
|
619
|
-
|
620
|
-
|
627
|
+
allow valid YouTube video embeds without having to allow other kinds of embedded
|
628
|
+
content, which would be the case if you tried to do this by just allowing all
|
629
|
+
`<iframe>` elements:
|
621
630
|
|
622
631
|
```ruby
|
623
632
|
youtube_transformer = lambda do |env|
|
624
633
|
node = env[:node]
|
625
634
|
node_name = env[:node_name]
|
626
635
|
|
627
|
-
# Don't continue if this node is already
|
628
|
-
return if env[:
|
636
|
+
# Don't continue if this node is already allowlisted or is not an element.
|
637
|
+
return if env[:is_allowlisted] || !node.element?
|
629
638
|
|
630
639
|
# Don't continue unless the node is an iframe.
|
631
640
|
return unless node_name == 'iframe'
|
@@ -646,8 +655,8 @@ youtube_transformer = lambda do |env|
|
|
646
655
|
|
647
656
|
# Now that we're sure that this is a valid YouTube embed and that there are
|
648
657
|
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
|
649
|
-
# to
|
650
|
-
{:
|
658
|
+
# to allowlist the current node.
|
659
|
+
{:node_allowlist => [node]}
|
651
660
|
end
|
652
661
|
|
653
662
|
html = %[
|
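The README changes above rename the transformer input key `:is_whitelisted` to `:is_allowlisted`, and the changelog says the old key still works but is deprecated. For a transformer that may have to run against both older and newer Sanitize versions, one possible approach is to read the new key and fall back to the old one. The helper `node_allowlisted?` below is a hypothetical name, not part of Sanitize, and the env Hashes are hand-built stand-ins.

```ruby
# Hypothetical helper (not part of Sanitize): reads the allowlist flag from a
# transformer's env Hash, preferring the new :is_allowlisted key and falling
# back to the deprecated :is_whitelisted key when the new one is absent.
def node_allowlisted?(env)
  flag = env.key?(:is_allowlisted) ? env[:is_allowlisted] : env[:is_whitelisted]
  flag ? true : false
end

# Example transformer that skips nodes an earlier transformer allowlisted.
# Custom filtering logic would replace the last line; per the README, a
# non-Hash return value is ignored by Sanitize.
skip_allowlisted = lambda do |env|
  return if node_allowlisted?(env)
  env[:node_name]
end

puts skip_allowlisted.call({ :is_whitelisted => true, :node_name => 'img' }).inspect
puts skip_allowlisted.call({ :is_allowlisted => false, :node_name => 'img' }).inspect
```

Code that only targets Sanitize 5.2.0 or later can read `env[:is_allowlisted]` directly, as the updated README examples do.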
data/lib/sanitize.rb
CHANGED
@@ -54,7 +54,7 @@ class Sanitize
|
|
54
54
|
# Returns a sanitized copy of the given full _html_ document, using the
|
55
55
|
# settings in _config_ if specified.
|
56
56
|
#
|
57
|
-
# When sanitizing a document, the `<html>` element must be
|
57
|
+
# When sanitizing a document, the `<html>` element must be allowlisted or an
|
58
58
|
# error will be raised. If this is undesirable, you should probably use
|
59
59
|
# {#fragment} instead.
|
60
60
|
def self.document(html, config = {})
|
@@ -117,7 +117,7 @@ class Sanitize
|
|
117
117
|
|
118
118
|
# Returns a sanitized copy of the given _html_ document.
|
119
119
|
#
|
120
|
-
# When sanitizing a document, the `<html>` element must be
|
120
|
+
# When sanitizing a document, the `<html>` element must be allowlisted or an
|
121
121
|
# error will be raised. If this is undesirable, you should probably use
|
122
122
|
# {#fragment} instead.
|
123
123
|
def document(html)
|
@@ -147,20 +147,20 @@ class Sanitize
|
|
147
147
|
# in place.
|
148
148
|
#
|
149
149
|
# If _node_ is a `Nokogiri::XML::Document`, the `<html>` element must be
|
150
|
-
#
|
150
|
+
# allowlisted or an error will be raised.
|
151
151
|
def node!(node)
|
152
152
|
raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
|
153
153
|
|
154
154
|
if node.is_a?(Nokogiri::XML::Document)
|
155
155
|
unless @config[:elements].include?('html')
|
156
|
-
raise Error, 'When sanitizing a document, "<html>" must be
|
156
|
+
raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
|
157
157
|
end
|
158
158
|
end
|
159
159
|
|
160
|
-
|
160
|
+
node_allowlist = Set.new
|
161
161
|
|
162
162
|
traverse(node) do |n|
|
163
|
-
transform_node!(n,
|
163
|
+
transform_node!(n, node_allowlist)
|
164
164
|
end
|
165
165
|
|
166
166
|
node
|
@@ -189,7 +189,7 @@ class Sanitize
|
|
189
189
|
node.to_html(preserve_newline: true)
|
190
190
|
end
|
191
191
|
|
192
|
-
def transform_node!(node,
|
192
|
+
def transform_node!(node, node_allowlist)
|
193
193
|
@transformers.each do |transformer|
|
194
194
|
# Since transform_node! may be called in a tight loop to process thousands
|
195
195
|
# of items, we can optimize both memory and CPU performance by:
|
@@ -199,15 +199,19 @@ class Sanitize
|
|
199
199
|
# does merge! create a new hash, it is also 2.6x slower:
|
200
200
|
# https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hashmerge-code
|
201
201
|
config = @transformer_config
|
202
|
-
config[:is_whitelisted] =
|
202
|
+
config[:is_allowlisted] = config[:is_whitelisted] = node_allowlist.include?(node)
|
203
203
|
config[:node] = node
|
204
204
|
config[:node_name] = node.name.downcase
|
205
|
-
config[:node_whitelist] =
|
205
|
+
config[:node_allowlist] = config[:node_whitelist] = node_allowlist
|
206
206
|
|
207
207
|
result = transformer.call(config)
|
208
208
|
|
209
|
-
if result.is_a?(Hash)
|
210
|
-
|
209
|
+
if result.is_a?(Hash)
|
210
|
+
result_allowlist = result[:node_allowlist] || result[:node_whitelist]
|
211
|
+
|
212
|
+
if result_allowlist.respond_to?(:each)
|
213
|
+
node_allowlist.merge(result_allowlist)
|
214
|
+
end
|
211
215
|
end
|
212
216
|
end
|
213
217
|
|
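The `transform_node!` change above accepts a transformer result under either the new `:node_allowlist` key or the deprecated `:node_whitelist` key. The following is a simplified stand-in for that merge step, not the library's code: `merge_transformer_result` is a hypothetical name, and plain symbols are used in place of `Nokogiri::XML::Node` objects so the sketch runs without any gems.

```ruby
require 'set'

# Simplified stand-in for the result handling shown in the diff: a transformer
# result may use :node_allowlist or the deprecated :node_whitelist, and either
# one is merged into the shared Set. Non-Hash results are ignored.
def merge_transformer_result(node_allowlist, result)
  return node_allowlist unless result.is_a?(Hash)

  result_allowlist = result[:node_allowlist] || result[:node_whitelist]
  node_allowlist.merge(result_allowlist) if result_allowlist.respond_to?(:each)
  node_allowlist
end

allowlist = Set.new
merge_transformer_result(allowlist, { :node_whitelist => [:old_style_node] })
merge_transformer_result(allowlist, { :node_allowlist => [:new_style_node] })
merge_transformer_result(allowlist, 'ignored')
puts allowlist.to_a.inspect
```

Note that when a result contains both keys, the new key wins because it is checked first.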
data/lib/sanitize/config/default.rb
CHANGED
@@ -74,7 +74,7 @@ class Sanitize
|
|
74
74
|
# the specified elements (when filtered) will be removed, and the contents
|
75
75
|
# of all other filtered elements will be left behind.
|
76
76
|
:remove_contents => %w[
|
77
|
-
iframe noembed noframes noscript script style
|
77
|
+
iframe math noembed noframes noscript plaintext script style svg xmp
|
78
78
|
],
|
79
79
|
|
80
80
|
# Transformers allow you to filter or alter nodes using custom logic. See
|
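The hunk above extends the default `:remove_contents` list. To make the difference explicit, the two lists can be compared directly; both arrays below are copied from the removed and added lines of the hunk, and the constant names are chosen for this illustration only.

```ruby
# Default :remove_contents lists before and after this change, copied from
# the removed (-) and added (+) lines of the hunk.
OLD_REMOVE_CONTENTS = %w[iframe noembed noframes noscript script style].freeze
NEW_REMOVE_CONTENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Elements whose contents are now removed by default when the element itself
# is filtered out.
NEWLY_REMOVED = (NEW_REMOVE_CONTENTS - OLD_REMOVE_CONTENTS).freeze

puts NEWLY_REMOVED.inspect
# → ["math", "plaintext", "svg", "xmp"]
```

This matches the test changes further down, where the `plaintext` and `xmp` expectations switch from escaped content to an empty string and new `math` and `svg` cases are added.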
data/lib/sanitize/css.rb
CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
|
|
175
175
|
next prop
|
176
176
|
|
177
177
|
when :semicolon
|
178
|
-
# Only preserve the semicolon if it was preceded by
|
178
|
+
# Only preserve the semicolon if it was preceded by an allowlisted
|
179
179
|
# property. Otherwise, omit it in order to prevent redundant semicolons.
|
180
180
|
if preceded_by_property
|
181
181
|
preceded_by_property = false
|
@@ -296,7 +296,7 @@ class Sanitize; class CSS
|
|
296
296
|
end
|
297
297
|
|
298
298
|
# Returns `true` if the given node (which may be of type `:url` or
|
299
|
-
# `:function`, since the CSS syntax can produce both) uses
|
299
|
+
# `:function`, since the CSS syntax can produce both) uses an allowlisted
|
300
300
|
# protocol.
|
301
301
|
def valid_url?(node)
|
302
302
|
type = node[:node]
|
data/lib/sanitize/transformers/clean_css.rb
CHANGED
@@ -1,6 +1,6 @@
|
|
1
1
|
class Sanitize; module Transformers; module CSS
|
2
2
|
|
3
|
-
# Enforces a CSS
|
3
|
+
# Enforces a CSS allowlist on the contents of `style` attributes.
|
4
4
|
class CleanAttribute
|
5
5
|
def initialize(sanitizer_or_config)
|
6
6
|
if Sanitize::CSS === sanitizer_or_config
|
@@ -14,7 +14,7 @@ class CleanAttribute
|
|
14
14
|
node = env[:node]
|
15
15
|
|
16
16
|
return unless node.type == Nokogiri::XML::Node::ELEMENT_NODE &&
|
17
|
-
node.key?('style') && !env[:
|
17
|
+
node.key?('style') && !env[:is_allowlisted]
|
18
18
|
|
19
19
|
attr = node.attribute('style')
|
20
20
|
css = @scss.properties(attr.value)
|
@@ -27,7 +27,7 @@ class CleanAttribute
|
|
27
27
|
end
|
28
28
|
end
|
29
29
|
|
30
|
-
# Enforces a CSS
|
30
|
+
# Enforces a CSS allowlist on the contents of `<style>` elements.
|
31
31
|
class CleanElement
|
32
32
|
def initialize(sanitizer_or_config)
|
33
33
|
if Sanitize::CSS === sanitizer_or_config
|
data/lib/sanitize/transformers/clean_element.rb
CHANGED
@@ -76,11 +76,11 @@ class Sanitize; module Transformers; class CleanElement
|
|
76
76
|
|
77
77
|
def call(env)
|
78
78
|
node = env[:node]
|
79
|
-
return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:
|
79
|
+
return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_allowlisted]
|
80
80
|
|
81
81
|
name = env[:node_name]
|
82
82
|
|
83
|
-
# Delete any element that isn't in the config
|
83
|
+
# Delete any element that isn't in the config allowlist, unless the node has
|
84
84
|
# already been deleted from the document.
|
85
85
|
#
|
86
86
|
# It's important that we not try to reparent the children of a node that has
|
@@ -107,20 +107,20 @@ class Sanitize; module Transformers; class CleanElement
|
|
107
107
|
return
|
108
108
|
end
|
109
109
|
|
110
|
-
|
110
|
+
attr_allowlist = @attributes[name] || @attributes[:all]
|
111
111
|
|
112
|
-
if
|
113
|
-
# Delete all attributes from elements with no
|
112
|
+
if attr_allowlist.nil?
|
113
|
+
# Delete all attributes from elements with no allowlisted attributes.
|
114
114
|
node.attribute_nodes.each {|attr| attr.unlink }
|
115
115
|
else
|
116
|
-
allow_data_attributes =
|
116
|
+
allow_data_attributes = attr_allowlist.include?(:data)
|
117
117
|
|
118
118
|
# Delete any attribute that isn't allowed on this element.
|
119
119
|
node.attribute_nodes.each do |attr|
|
120
120
|
attr_name = attr.name.downcase
|
121
121
|
|
122
|
-
unless
|
123
|
-
# The attribute isn't
|
122
|
+
unless attr_allowlist.include?(attr_name)
|
123
|
+
# The attribute isn't allowed.
|
124
124
|
|
125
125
|
if allow_data_attributes && attr_name.start_with?('data-')
|
126
126
|
# Arbitrary data attributes are allowed. If this is a data
|
@@ -134,7 +134,7 @@ class Sanitize; module Transformers; class CleanElement
|
|
134
134
|
next
|
135
135
|
end
|
136
136
|
|
137
|
-
# The attribute is
|
137
|
+
# The attribute is allowed.
|
138
138
|
|
139
139
|
# Remove any attributes that use unacceptable protocols.
|
140
140
|
if @protocols.include?(name) && @protocols[name].include?(attr_name)
|
@@ -162,7 +162,7 @@ class Sanitize; module Transformers; class CleanElement
|
|
162
162
|
# libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
|
163
163
|
# attempt to preserve server-side includes. This can result in XSS since
|
164
164
|
# an unescaped double quote can allow an attacker to inject a
|
165
|
-
# non-
|
165
|
+
# non-allowlisted attribute.
|
166
166
|
#
|
167
167
|
# Sanitize works around this by implementing its own escaping for
|
168
168
|
# affected attributes, some of which can exist on any element and some
|
@@ -191,7 +191,7 @@ class Sanitize; module Transformers; class CleanElement
|
|
191
191
|
# Element-specific special cases.
|
192
192
|
case name
|
193
193
|
|
194
|
-
# If this is
|
194
|
+
# If this is an allowlisted iframe that has children, remove all its
|
195
195
|
# children. The HTML standard says iframes shouldn't have content, but when
|
196
196
|
# they do, this content is parsed as text and is serialized verbatim without
|
197
197
|
# being escaped, which is unsafe because legacy browsers may still render it
|
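The renamed `attr_allowlist` lookup in the hunks above (`@attributes[name] || @attributes[:all]`, followed by the `:data` check) can be read as a small decision procedure. The sketch below restates just that part in isolation. It is a simplification under stated assumptions: `attribute_allowed?` is a hypothetical helper, the `attributes` Hash is hand-built, and the real transformer does more than this (for example, it goes on to check protocols, and this sketch does not reproduce any preprocessing done when the transformer is constructed).

```ruby
require 'set'

# Hypothetical helper (not part of Sanitize) mirroring the lookup visible in
# the diff: use the element's own attribute allowlist if present, otherwise
# the :all allowlist; with no allowlist at all, no attribute is allowed.
def attribute_allowed?(attributes, element, attr_name)
  allowlist = attributes[element] || attributes[:all]
  return false if allowlist.nil?

  name = attr_name.downcase
  return true if allowlist.include?(name)

  # The :data symbol stands for arbitrary data-* attributes. This sketch only
  # checks the prefix and does not validate the rest of the attribute name.
  allowlist.include?(:data) && name.start_with?('data-')
end

attributes = { 'a' => Set['href', :data], :all => Set['class'] }
puts attribute_allowed?(attributes, 'a', 'HREF')
puts attribute_allowed?(attributes, 'a', 'data-id')
puts attribute_allowed?(attributes, 'p', 'onclick')
```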
data/lib/sanitize/version.rb
CHANGED
data/test/test_clean_element.rb
CHANGED
@@ -162,7 +162,7 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
162
162
|
}
|
163
163
|
|
164
164
|
describe 'Default config' do
|
165
|
-
it 'should remove non-
|
165
|
+
it 'should remove non-allowlisted elements, leaving safe contents behind' do
|
166
166
|
Sanitize.fragment('foo <b>bar</b> <strong><a href="#a">baz</a></strong> quux')
|
167
167
|
.must_equal 'foo bar baz quux'
|
168
168
|
|
@@ -192,21 +192,16 @@ describe 'Sanitize::Transformers::CleanElement' do
 .must_equal ''
 end
 
-it 'should escape the content of removed `plaintext` elements' do
-Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
-.must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
-end
-
-it 'should escape the content of removed `xmp` elements' do
-Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
-.must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
-end
-
 it 'should not preserve the content of removed `iframe` elements' do
 Sanitize.fragment('<iframe>hello! <script>alert(0)</script></iframe>')
 .must_equal ''
 end
 
+it 'should not preserve the content of removed `math` elements' do
+Sanitize.fragment('<math>hello! <script>alert(0)</script></math>')
+.must_equal ''
+end
+
 it 'should not preserve the content of removed `noembed` elements' do
 Sanitize.fragment('<noembed>hello! <script>alert(0)</script></noembed>')
 .must_equal ''
@@ -222,6 +217,11 @@ describe 'Sanitize::Transformers::CleanElement' do
 .must_equal ''
 end
 
+it 'should not preserve the content of removed `plaintext` elements' do
+Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
+.must_equal ''
+end
+
 it 'should not preserve the content of removed `script` elements' do
 Sanitize.fragment('<script>hello! <script>alert(0)</script></script>')
 .must_equal ''
@@ -232,6 +232,16 @@ describe 'Sanitize::Transformers::CleanElement' do
 .must_equal ''
 end
 
+it 'should not preserve the content of removed `svg` elements' do
+Sanitize.fragment('<svg>hello! <script>alert(0)</script></svg>')
+.must_equal ''
+end
+
+it 'should not preserve the content of removed `xmp` elements' do
+Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
+.must_equal ''
+end
+
 strings.each do |name, data|
 it "should clean #{name} HTML" do
 Sanitize.fragment(data[:html]).must_equal(data[:default])
@@ -315,7 +325,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 end
 
 describe 'Custom configs' do
-it 'should allow attributes on all elements if
+it 'should allow attributes on all elements if allowlisted under :all' do
 input = '<p class="foo">bar</p>'
 
 Sanitize.fragment(input).must_equal ' bar '
@@ -336,7 +346,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 }).must_equal input
 end
 
-it "should not allow relative URLs when relative URLs aren't
+it "should not allow relative URLs when relative URLs aren't allowlisted" do
 input = '<a href="/foo/bar">Link</a>'
 
 Sanitize.fragment(input,
@@ -400,7 +410,7 @@ describe 'Sanitize::Transformers::CleanElement' do
 ).must_equal 'foo bar baz hi '
 end
 
-it 'should remove the contents of
+it 'should remove the contents of allowlisted iframes' do
 Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
 :elements => ['iframe']
 ).must_equal '<iframe></iframe>'
data/test/test_malicious_html.rb
CHANGED
@@ -128,13 +128,15 @@ describe 'Malicious HTML' do
 
 # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
 # attempt to preserve server-side includes. This can result in XSS since an
-# unescaped double quote can allow an attacker to inject a non-
+# unescaped double quote can allow an attacker to inject a non-allowlisted
 # attribute. Sanitize works around this by implementing its own escaping for
 # affected attributes.
 #
 # The relevant libxml2 code is here:
 # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
 describe 'unsafe libxml2 server-side includes in attributes' do
+using_unpatched_libxml2 = Nokogiri::VersionInfo.instance.libxml2_using_system?
+
 tag_configs = [
 {
 tag_name: 'a',
@@ -166,6 +168,8 @@ describe 'Malicious HTML' do
 input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
 
 it 'should escape unsafe characters in attributes' do
+skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
+
 # This uses Nokogumbo's HTML-compliant serializer rather than
 # libxml2's.
 @s.fragment(input).
@@ -191,6 +195,8 @@ describe 'Malicious HTML' do
 input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
 
 it 'should not escape characters unnecessarily' do
+skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
+
 # This uses Nokogumbo's HTML-compliant serializer rather than
 # libxml2's.
 @s.fragment(input).
@@ -213,4 +219,17 @@ describe 'Malicious HTML' do
 end
 end
 end
+
+# https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+describe 'foreign content bypass in relaxed config' do
+it 'prevents a sanitization bypass via carefully crafted foreign content' do
+%w[iframe noembed noframes noscript plaintext script style xmp].each do |tag_name|
+@s.fragment(%[<math><#{tag_name}>/*</#{tag_name}><img src onerror=alert(1)>*/]).
+must_equal ''
+
+@s.fragment(%[<svg><#{tag_name}>/*</#{tag_name}><img src onerror=alert(1)>*/]).
+must_equal ''
+end
+end
+end
 end
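The new 'foreign content bypass' test builds its inputs by interpolating each tag name into a `<math>` and an `<svg>` wrapper. As a rough illustration of what those inputs look like, the loop can be reproduced with plain string interpolation and no dependency on Sanitize. The constant names and the `build_payloads` helper below are hypothetical and exist only for this sketch; the tag list and the payload template are copied from the test.

```ruby
# Tag names and wrappers copied from the test above.
TAG_NAMES = %w[iframe noembed noframes noscript plaintext script style xmp].freeze
WRAPPERS  = %w[math svg].freeze

# Hypothetical helper (not part of Sanitize): returns the same payload strings
# that the test passes to `@s.fragment`, one per wrapper/tag combination.
def build_payloads(wrappers = WRAPPERS, tag_names = TAG_NAMES)
  wrappers.flat_map do |wrapper|
    tag_names.map do |tag_name|
      %[<#{wrapper}><#{tag_name}>/*</#{tag_name}><img src onerror=alert(1)>*/]
    end
  end
end

payloads = build_payloads
puts payloads.length
puts payloads.first
```

Each of these strings is expected by the test to sanitize to an empty string under the config held in `@s`.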
data/test/test_parser.rb
CHANGED
data/test/test_sanitize.rb
CHANGED
@@ -150,7 +150,7 @@ describe 'Sanitize' do
 frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
 end
 
-describe "when the given node is a document and <html> isn't
+describe "when the given node is a document and <html> isn't allowlisted" do
 it 'should raise a Sanitize::Error' do
 doc = Nokogiri::HTML5.parse('foo')
 proc { @s.node!(doc) }.must_raise Sanitize::Error
data/test/test_sanitize_css.rb
CHANGED
@@ -21,7 +21,7 @@ describe 'Sanitize::CSS' do
 @custom.properties(css).must_equal 'background: #fff; '
 end
 
-it 'should allow
+it 'should allow allowlisted URL protocols' do
 [
 "background: url(relative.jpg)",
 "background: url('relative.jpg')",
@@ -36,7 +36,7 @@ describe 'Sanitize::CSS' do
 end
 end
 
-it 'should not allow non-
+it 'should not allow non-allowlisted URL protocols' do
 [
 "background: url(javascript:alert(0))",
 "background: url(ja\\56 ascript:alert(0))",
@@ -307,7 +307,7 @@ describe 'Sanitize::CSS' do
 end
 
 describe ":at_rules" do
-it "should remove blockless at-rules that aren't
+it "should remove blockless at-rules that aren't allowlisted" do
 css = %[
 @charset 'utf-8';
 @import url('foo.css');
@@ -319,7 +319,7 @@ describe 'Sanitize::CSS' do
 ].strip
 end
 
-describe "when blockless at-rules are
+describe "when blockless at-rules are allowlisted" do
 before do
 @scss = Sanitize::CSS.new(Sanitize::Config.merge(Sanitize::Config::RELAXED[:css], {
 :at_rules => ['charset', 'import']
data/test/test_transformers.rb
CHANGED
@@ -12,11 +12,13 @@ describe 'Transformers' do
 return unless env[:node].element?
 
 env[:config][:foo].must_equal :bar
-env[:
+env[:is_allowlisted].must_equal false
+env[:is_whitelisted].must_equal env[:is_allowlisted]
 env[:node].must_be_kind_of Nokogiri::XML::Node
 env[:node_name].must_equal 'span'
-env[:
-env[:
+env[:node_allowlist].must_be_kind_of Set
+env[:node_allowlist].must_be_empty
+env[:node_whitelist].must_equal env[:node_allowlist]
 }
 )
 end
@@ -43,34 +45,38 @@ describe 'Transformers' do
 nodes.must_equal %w[div span strong b p]
 end
 
-it 'should
+it 'should allowlist nodes in the node allowlist' do
 Sanitize.fragment('<div class="foo">foo</div><span>bar</span>',
 :transformers => [
 proc {|env|
-{:
+{:node_allowlist => [env[:node]]} if env[:node_name] == 'div'
 },
 
 proc {|env|
-env[:
-env[:
-env[:
+env[:is_allowlisted].must_equal false unless env[:node_name] == 'div'
+env[:is_allowlisted].must_equal true if env[:node_name] == 'div'
+env[:node_allowlist].must_include env[:node] if env[:node_name] == 'div'
+env[:is_whitelisted].must_equal env[:is_allowlisted]
+env[:node_whitelist].must_equal env[:node_allowlist]
 }
 ]
 ).must_equal '<div class="foo">foo</div>bar'
 end
 
-it 'should clear the node
+it 'should clear the node allowlist after each fragment' do
 called = false
 
 Sanitize.fragment('<div>foo</div>',
-:transformers => proc {|env| {:
+:transformers => proc {|env| {:node_allowlist => [env[:node]]}}
 )
 
 Sanitize.fragment('<div>foo</div>',
 :transformers => proc {|env|
 called = true
-env[:
-env[:
+env[:is_allowlisted].must_equal false
+env[:is_whitelisted].must_equal env[:is_allowlisted]
+env[:node_allowlist].must_be_empty
+env[:node_whitelist].must_equal env[:node_allowlist]
 }
 )
 
@@ -83,10 +89,10 @@ describe 'Transformers' do
 .must_equal(' foo ')
 end
 
-describe 'Image
+describe 'Image allowlist transformer' do
 require 'uri'
 
-
+image_allowlist_transformer = lambda do |env|
 # Ignore everything except <img> elements.
 return unless env[:node_name] == 'img'
 
@@ -103,7 +109,7 @@ describe 'Transformers' do
 
 before do
 @s = Sanitize.new(Sanitize::Config.merge(Sanitize::Config::RELAXED,
-:transformers =>
+:transformers => image_allowlist_transformer))
 end
 
 it 'should allow images with relative URLs' do
@@ -142,8 +148,8 @@ describe 'Transformers' do
 node = env[:node]
 node_name = env[:node_name]
 
-# Don't continue if this node is already
-return if env[:
+# Don't continue if this node is already allowlisted or is not an element.
+return if env[:is_allowlisted] || !node.element?
 
 # Don't continue unless the node is an iframe.
 return unless node_name == 'iframe'
@@ -164,8 +170,8 @@ describe 'Transformers' do
 
 # Now that we're sure that this is a valid YouTube embed and that there are
 # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-# to
-{:
+# to allowlist the current node.
+{:node_allowlist => [node]}
 end
 
 it 'should allow HTTP YouTube video embeds' do
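The transformer tests above check that the legacy `env[:is_whitelisted]` and `env[:node_whitelist]` keys still mirror the new `env[:is_allowlisted]` and `env[:node_allowlist]` keys. One way such aliasing could work is sketched below in plain Ruby. This is not Sanitize's implementation: `run_transformers` is a hypothetical driver, nodes are represented by plain strings, and accepting the legacy `:node_whitelist` key in a transformer's return value is an assumption made for illustration (the tests shown here only exercise reading the legacy keys from `env`).

```ruby
require 'set'

# Hypothetical driver, NOT Sanitize's real code. Each "node" is just a string,
# which also serves as its node name.
def run_transformers(nodes, transformers)
  node_allowlist = Set.new

  nodes.each do |node|
    transformers.each do |transformer|
      env = {
        node: node,
        node_name: node,
        node_allowlist: node_allowlist,
        is_allowlisted: node_allowlist.include?(node)
      }
      # Legacy keys point at the same values as the new keys.
      env[:node_whitelist] = env[:node_allowlist]
      env[:is_whitelisted] = env[:is_allowlisted]

      result = transformer.call(env)
      next unless result.is_a?(Hash)

      # Assumption: honor either the new or the legacy key in the return hash.
      additions = result[:node_allowlist] || result[:node_whitelist]
      node_allowlist.merge(additions) if additions.respond_to?(:each)
    end
  end

  node_allowlist
end

seen = []
allow_div  = proc { |env| { node_allowlist: [env[:node]] } if env[:node_name] == 'div' }
allow_span = proc { |env| { node_whitelist: [env[:node]] } if env[:node_name] == 'span' }
observer   = proc do |env|
  seen << [env[:node_name], env[:is_allowlisted], env[:is_whitelisted]]
  nil
end

allowlist = run_transformers(%w[div span p], [allow_div, allow_span, observer])
puts allowlist.to_a.inspect
puts seen.inspect
```

Because both key styles reference the same `Set`, a transformer written against the old names and one written against the new names observe identical state.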
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-version: 5.1
+version: 5.2.1
 platform: ruby
 authors:
 - Ryan Grove
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2020-06-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: crass
@@ -80,9 +80,9 @@ dependencies:
 - - "~>"
 - !ruby/object:Gem::Version
 version: 12.3.1
-description: Sanitize is
-
-
+description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
+HTML and/or CSS from a string except the elements, attributes, and properties you
+choose to allow.
 email: ryan@wonko.com
 executables: []
 extensions: []
@@ -135,8 +135,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: 1.2.0
 requirements: []
-rubygems_version: 3.
+rubygems_version: 3.1.2
 signing_key:
 specification_version: 4
-summary:
+summary: Allowlist-based HTML and CSS sanitizer.
 test_files: []