sanitize 5.1.0 → 5.2.0
Potentially problematic release.
This version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +38 -18
- data/README.md +36 -38
- data/lib/sanitize.rb +15 -11
- data/lib/sanitize/css.rb +2 -2
- data/lib/sanitize/transformers/clean_comment.rb +1 -1
- data/lib/sanitize/transformers/clean_css.rb +3 -3
- data/lib/sanitize/transformers/clean_doctype.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +11 -11
- data/lib/sanitize/version.rb +1 -1
- data/test/test_clean_element.rb +4 -4
- data/test/test_malicious_html.rb +7 -1
- data/test/test_parser.rb +1 -1
- data/test/test_sanitize.rb +1 -1
- data/test/test_sanitize_css.rb +4 -4
- data/test/test_transformers.rb +25 -19
- metadata +6 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 4f01a992746ecc3f28e9c1fd14c08c99456fb98a59c0b7ba6a8c6f01d0ab07cf
+  data.tar.gz: 4f379538b26db4d239078ea7e54fea3b106e7801d093ed7407e9b71282f6c4d3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 52d96c5f73eea8d738fe23d816d5aec856f9f37ca37cf88d88d385fcffbf242605d13494ab531b517af7bdea44bfae2569f27bc2d5fb005dbeee85a54211d674
+  data.tar.gz: 897e95c05448509cfeb455bb4ec156ff7557495987e1d058ff63b888f9c0069a821a9b3684e0fe0463f78e4f28faf9fe2089760ad59bbbd1b5a5390fe9632154
data/HISTORY.md
CHANGED
@@ -1,5 +1,25 @@
 # Sanitize History

+## 5.2.0 (2020-06-06)
+
+### Changes
+
+* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+  source and documentation.
+
+  While the etymology of "whitelist" may not be explicitly racist in origin or
+  intent, there are inherent racial connotations in the implication that white
+  is good and black (as in "blacklist") is not.
+
+  This is a change I should have made long ago, and I apologize for not making
+  it sooner.
+
+* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+  deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+  The old keys will continue to work in order to avoid breaking existing code,
+  but they are no longer documented and may be removed in a future semver major
+  release.
+
 ## 5.1.0 (2019-09-07)

 ### Features
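The deprecation note in this hunk means a custom transformer may see either set of keys, depending on the Sanitize version it runs under. A minimal sketch of reading the flag in a version-tolerant way follows; `node_already_allowed?` is a hypothetical helper name, and the `env` Hashes are hand-built stand-ins rather than real Sanitize input.

```ruby
# Hypothetical helper (not part of Sanitize): reads the allowlist flag from a
# transformer `env` Hash, preferring the new key and falling back to the
# deprecated one when the new key is absent.
def node_already_allowed?(env)
  flag = env.fetch(:is_allowlisted) { env[:is_whitelisted] }
  flag ? true : false
end

# Stand-in env Hashes; real ones are built by Sanitize and carry more keys.
new_style_env = { is_allowlisted: true, is_whitelisted: true }
old_style_env = { is_whitelisted: false }

puts node_already_allowed?(new_style_env) # true
puts node_already_allowed?(old_style_env) # false
```

Code that only targets 5.2.0 or later can simply read `env[:is_allowlisted]`; the fallback matters only if the same transformer must also run on older releases.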
@@ -45,7 +65,7 @@ review the changes below carefully.
   - `script`
   - `style`

-* Children of
+* Children of allowlisted `iframe` elements are now always removed. In modern
   HTML, `iframe` elements should never have children. In HTML 4 and earlier
   `iframe` elements were allowed to contain fallback content for legacy
   browsers, but it's been almost two decades since that was useful.
@@ -84,7 +104,7 @@ review the changes below carefully.

   When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
   specially crafted HTML fragment can cause libxml2 to generate improperly
-  escaped output, allowing non-
+  escaped output, allowing non-allowlisted attributes to be used on allowlisted
   elements.

   Sanitize now performs additional escaping on affected attributes to prevent
@@ -128,7 +148,7 @@ review the changes below carefully.

 ## 4.4.0 (2016-09-29)

-* Added `srcset` to the attribute
+* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
   config. [@ejtttje - #156][156]

 [156]:https://github.com/rgrove/sanitize/pull/156
@@ -249,7 +269,7 @@ review the changes below carefully.
 ## 3.0.4 (2014-12-12)

 * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
-  caused the URL to be removed even when the protocol was
+  caused the URL to be removed even when the protocol was allowlisted.
   [@benubois - #126][126]

 [126]:https://github.com/rgrove/sanitize/pull/126
@@ -258,7 +278,7 @@ review the changes below carefully.
 ## 3.0.3 (2014-10-29)

 * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
-  `@media` block, causing them to be removed even when
+  `@media` block, causing them to be removed even when allowlist rules should
   have allowed them to remain. [#121][121]

 [121]:https://github.com/rgrove/sanitize/issues/121
@@ -323,7 +343,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * The `clean_node!` method was renamed to `node!`.

 * The `document` method now raises a `Sanitize::Error` if the `<html>` element
-  isn't
+  isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
   regardless of the `:remove_contents` config setting.

 * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -334,7 +354,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 * Added advanced CSS sanitization support using [Crass][crass], which is fully
   compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
-
+  allowlisted `<style>` elements and `style` attributes in HTML will be
   sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
   sanitize CSS stylesheets or properties.

@@ -386,7 +406,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

   When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
   specially crafted HTML fragment can cause libxml2 to generate improperly
-  escaped output, allowing non-
+  escaped output, allowing non-allowlisted attributes to be used on allowlisted
   elements.

   Sanitize now performs additional escaping on affected attributes to prevent
@@ -401,7 +421,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 2.1.0 (2014-01-13)

-* Added support for
+* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
   symbol `:data` instead of an attribute name in the `:attributes` config to
   indicate that arbitrary data attributes should be allowed on an element.

@@ -482,12 +502,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
   the default depth-first mode.

 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
-  elements to the
+  elements to the allowlists for the basic and relaxed configs.

 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
-  `ruby`, and `wbr` elements to the
+  `ruby`, and `wbr` elements to the allowlist for the relaxed config.

-* The `dir`, `lang`, and `title` attributes are now
+* The `dir`, `lang`, and `title` attributes are now allowlisted for all
   elements in the relaxed config.

 * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -498,7 +518,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.2.1 (2010-04-20)

 * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
-  remove the contents of all non-
+  remove the contents of all non-allowlisted elements in addition to the
   elements themselves. If set to an array of element names, Sanitize will
   remove the contents of only those elements (when filtered), and leave the
   contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -526,7 +546,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
   all its children.

-* Added elements `<h1>` through `<h6>` to the Relaxed
+* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
   David Reese]


@@ -546,7 +566,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 * Added a workaround for an Hpricot bug that prevents attribute names from
   being downcased in recent versions of Hpricot. This was exploitable to
-  prevent non-
+  prevent non-allowlisted protocols from being cleaned. [Reported by Ben
   Wanicur]


@@ -576,7 +596,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 1.0.5 (2009-02-05)

-* Fixed a bug introduced in version 1.0.3 that prevented non-
+* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
   protocols from being cleaned when relative URLs were allowed. [Reported by
   Dev Purkayastha]

@@ -586,7 +606,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,

 ## 1.0.4 (2009-01-16)

-* Fixed a bug that made it possible to sneak a non-
+* Fixed a bug that made it possible to sneak a non-allowlisted element through
   by repeating it several times in a row. All versions of Sanitize prior to
   1.0.4 are vulnerable. [Reported by Cristobal]

@@ -594,7 +614,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 ## 1.0.3 (2009-01-15)

 * Fixed a bug whereby incomplete Unicode or hex entities could be used to
-  prevent non-
+  prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
   still decode the incomplete entities, users of those browsers may be
   vulnerable to malicious script injection on websites using versions of
   Sanitize prior to 1.0.3.
data/README.md
CHANGED
@@ -1,20 +1,19 @@
 Sanitize
 ========

-Sanitize is
-elements, attributes, and
-
+Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
+and/or CSS from a string except the elements, attributes, and properties you
+choose to allow.

 Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
 elements, certain attributes within those elements, and even certain URL
-protocols within attributes that contain URLs. You can also
-properties, @ rules, and URL protocols
-
-be removed.
+protocols within attributes that contain URLs. You can also allow specific CSS
+properties, @ rules, and URL protocols in elements or attributes containing CSS.
+Any HTML or CSS that you don't explicitly allow will be removed.

 Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
 exactly the same way modern browsers do, and [Crass][crass], which parses CSS
-exactly the same way modern browsers do. As long as your
+exactly the same way modern browsers do. As long as your allowlist config only
 allows safe markup and CSS, even the most malformed or malicious input will be
 transformed into safe output.

@@ -88,7 +87,7 @@ Sanitize.fragment(html)
 # => 'foo'
 ```

-To keep certain elements, add them to the element
+To keep certain elements, add them to the element allowlist.

 ```ruby
 Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +96,7 @@ Sanitize.fragment(html, :elements => ['b'])

 ### HTML Documents

-When sanitizing a document, the `<html>` element must be
+When sanitizing a document, the `<html>` element must be allowlisted. You can
 also set `:allow_doctype` to `true` to allow well-formed document type
 definitions.

@@ -123,8 +122,8 @@ Sanitize.document(html,

 ### CSS in HTML

-To sanitize CSS in an HTML fragment or document, first
-element and/or the `style` attribute. Then
+To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
+element and/or the `style` attribute. Then allowlist the CSS properties,
 @ rules, and URL protocols you wish to allow. You can also choose whether to
 allow CSS comments or browser compatibility hacks.

@@ -267,7 +266,7 @@ new copy using `Sanitize::Config.merge()`, like so:

 ```ruby
 # Create a customized copy of the Basic config, adding <div> and <table> to the
-# existing
+# existing allowlisted elements.
 Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
   :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
   :remove_contents => true
@@ -395,8 +394,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }

 ##### :css => :properties (Array or Set)

-
-lowercase.
+List of CSS property names to allow. Names should be specified in lowercase.

 ##### :css => :protocols (Array or Set)

@@ -452,7 +450,7 @@ include the symbol `:relative` in the protocol array:

 #### :remove_contents (boolean or Array or Set)

-If this is `true`, Sanitize will remove the contents of any non-
+If this is `true`, Sanitize will remove the contents of any non-allowlisted
 elements in addition to the elements themselves. By default, Sanitize leaves the
 safe parts of an element's contents behind when the element is removed.

@@ -518,33 +516,33 @@ argument a Hash that contains the following items:

 * **:config** - The current Sanitize configuration Hash.

-* **:
+* **:is_allowlisted** - `true` if the current node has been allowlisted by a
   previous transformer, `false` otherwise. It's generally bad form to remove
-  a node that a previous transformer has
+  a node that a previous transformer has allowlisted.

 * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
   node may be an element, a text node, a comment, a CDATA node, or a document
   fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
   selectively ignore node types you aren't interested in.

+* **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
+  document that have been allowlisted by previous transformers, if any. It's
+  generally bad form to remove a node that a previous transformer has
+  allowlisted.
+
 * **:node_name** - The name of the current HTML node, always lowercase (e.g.
   "div" or "span"). For non-element nodes, the name will be something like
   "text", "comment", "#cdata-section", "#document-fragment", etc.

-* **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
-  document that have been whitelisted by previous transformers, if any. It's
-  generally bad form to remove a node that a previous transformer has
-  whitelisted.
-
 ### Output

 A transformer doesn't have to return anything, but may optionally return a Hash,
 which may contain the following items:

-* **:
-to add to the document's
-These specific nodes and all their attributes will be
-their children will not be.
+* **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
+  objects to add to the document's allowlist, bypassing the current Sanitize
+  config. These specific nodes and all their attributes will be allowlisted,
+  but their children will not be.

 If a transformer returns anything other than a Hash, the return value will be
 ignored.
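To illustrate the `:node_allowlist` output described in this README hunk (returned nodes are allowlisted, their children are not), here is a rough sketch. It uses a `Struct` as a stand-in for `Nokogiri::XML::Node` and a hand-written loop in place of Sanitize's own traversal; the names `fake_node` and `allow_divs` are assumptions made for the example.

```ruby
require 'set'

# Stand-in for Nokogiri::XML::Node, for illustration only.
fake_node = Struct.new(:name, :children)

span = fake_node.new('span', [])
div  = fake_node.new('div', [span])

# Hypothetical transformer: allowlist <div> nodes, return nil for anything else.
allow_divs = lambda do |env|
  { node_allowlist: [env[:node]] } if env[:node_name] == 'div'
end

# Hand-written stand-in for the traversal: call the transformer on each node
# and collect whatever it asks to allowlist.
node_allowlist = Set.new
[div, span].each do |node|
  result = allow_divs.call({ node: node, node_name: node.name })
  node_allowlist.merge(result[:node_allowlist]) if result.is_a?(Hash)
end

puts node_allowlist.include?(div)  # true
puts node_allowlist.include?(span) # false: the child was never returned
```

The child `span` stays subject to the normal config even though its parent was allowlisted, which is the behavior the README text describes.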
@@ -587,16 +585,16 @@ Transformers have a tremendous amount of power, including the power to
 completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
 your own hands.

-### Example: Transformer to
+### Example: Transformer to allow image URLs by domain

 The following example demonstrates how to remove image elements unless they use
 a relative URL or are hosted on a specific domain. It assumes that the `<img>`
-element and its `src` attribute are already
+element and its `src` attribute are already allowlisted.

 ```ruby
 require 'uri'

-
+image_allowlist_transformer = lambda do |env|
   # Ignore everything except <img> elements.
   return unless env[:node_name] == 'img'

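The body of the image transformer is elided between these hunks. As a sketch of the kind of check the prose describes (keep an image only when its URL is relative or on one specific domain), the helper below uses only Ruby's standard `uri` library. `image_src_allowed?` and the host names are hypothetical and are not taken from the README.

```ruby
require 'uri'

# Hypothetical helper: true when `src` is a relative URL or is hosted on
# `allowed_host`. URLs that cannot be parsed are treated as not allowed.
def image_src_allowed?(src, allowed_host)
  uri = URI.parse(src)

  # No host: accept only genuinely relative URLs (no scheme either), so that
  # something like "javascript:..." is not mistaken for a relative path.
  return uri.scheme.nil? if uri.host.nil?

  uri.host == allowed_host
rescue URI::InvalidURIError
  false
end

puts image_src_allowed?('/images/logo.png', 'example.com')             # true
puts image_src_allowed?('https://example.com/logo.png', 'example.com') # true
puts image_src_allowed?('https://evil.test/logo.png', 'example.com')   # false
```

Inside a real transformer the result would decide whether to call `unlink` on the node; that wiring is left out here because it needs Nokogiri.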
@@ -612,20 +610,20 @@ image_whitelist_transformer = lambda do |env|
 end
 ```

-### Example: Transformer to
+### Example: Transformer to allow YouTube video embeds

 The following example demonstrates how to create a transformer that will safely
-
-
-
+allow valid YouTube video embeds without having to allow other kinds of embedded
+content, which would be the case if you tried to do this by just allowing all
+`<iframe>` elements:

 ```ruby
 youtube_transformer = lambda do |env|
   node = env[:node]
   node_name = env[:node_name]

-  # Don't continue if this node is already
-  return if env[:
+  # Don't continue if this node is already allowlisted or is not an element.
+  return if env[:is_allowlisted] || !node.element?

   # Don't continue unless the node is an iframe.
   return unless node_name == 'iframe'
@@ -646,8 +644,8 @@ youtube_transformer = lambda do |env|

   # Now that we're sure that this is a valid YouTube embed and that there are
   # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-  # to
-  {:
+  # to allowlist the current node.
+  {:node_allowlist => [node]}
 end

 html = %[
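The URL validation step of the YouTube transformer falls between the hunks shown above and is not visible in this diff. The sketch below shows one way such a check could look; the pattern and the name `youtube_embed_url?` are assumptions for illustration and may differ from what the README actually uses.

```ruby
# Hypothetical check: accepts http(s) or protocol-relative embed URLs on
# youtube.com or youtube-nocookie.com, with an optional "www." prefix.
# Deliberately strict: query strings and other paths are rejected.
def youtube_embed_url?(src)
  return false unless src.is_a?(String)

  src.match?(%r{\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/embed/[\w-]+\z})
end

puts youtube_embed_url?('https://www.youtube.com/embed/abc123')          # true
puts youtube_embed_url?('//www.youtube-nocookie.com/embed/x_Y-z')        # true
puts youtube_embed_url?('https://www.youtube.com.evil.test/embed/abc')   # false
```

Anchoring with `\A` and `\z` matters here: without them, a URL that merely contains a YouTube address somewhere in its query string would pass.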
data/lib/sanitize.rb
CHANGED
@@ -54,7 +54,7 @@ class Sanitize
   # Returns a sanitized copy of the given full _html_ document, using the
   # settings in _config_ if specified.
   #
-  # When sanitizing a document, the `<html>` element must be
+  # When sanitizing a document, the `<html>` element must be allowlisted or an
   # error will be raised. If this is undesirable, you should probably use
   # {#fragment} instead.
   def self.document(html, config = {})
@@ -117,7 +117,7 @@ class Sanitize

   # Returns a sanitized copy of the given _html_ document.
   #
-  # When sanitizing a document, the `<html>` element must be
+  # When sanitizing a document, the `<html>` element must be allowlisted or an
   # error will be raised. If this is undesirable, you should probably use
   # {#fragment} instead.
   def document(html)
@@ -147,20 +147,20 @@ class Sanitize
   # in place.
   #
   # If _node_ is a `Nokogiri::XML::Document`, the `<html>` element must be
-  #
+  # allowlisted or an error will be raised.
   def node!(node)
     raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)

     if node.is_a?(Nokogiri::XML::Document)
       unless @config[:elements].include?('html')
-        raise Error, 'When sanitizing a document, "<html>" must be
+        raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
       end
     end

-
+    node_allowlist = Set.new

     traverse(node) do |n|
-      transform_node!(n,
+      transform_node!(n, node_allowlist)
     end

     node
@@ -189,7 +189,7 @@ class Sanitize
     node.to_html(preserve_newline: true)
   end

-  def transform_node!(node,
+  def transform_node!(node, node_allowlist)
     @transformers.each do |transformer|
       # Since transform_node! may be called in a tight loop to process thousands
       # of items, we can optimize both memory and CPU performance by:
@@ -199,15 +199,19 @@ class Sanitize
       # does merge! create a new hash, it is also 2.6x slower:
       # https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hashmerge-code
       config = @transformer_config
-      config[:is_whitelisted] =
+      config[:is_allowlisted] = config[:is_whitelisted] = node_allowlist.include?(node)
       config[:node] = node
       config[:node_name] = node.name.downcase
-      config[:node_whitelist] =
+      config[:node_allowlist] = config[:node_whitelist] = node_allowlist

       result = transformer.call(config)

-      if result.is_a?(Hash)
-
+      if result.is_a?(Hash)
+        result_allowlist = result[:node_allowlist] || result[:node_whitelist]
+
+        if result_allowlist.respond_to?(:each)
+          node_allowlist.merge(result_allowlist)
+        end
       end
     end

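The added lines in this hunk accept a transformer result under either the new or the deprecated key and merge it into the shared set only when it is enumerable. The following standalone sketch mirrors that step outside of Sanitize; `merge_result!` is a hypothetical name, and symbols stand in for Nokogiri nodes.

```ruby
require 'set'

# Simplified sketch of the merge step: accepts the new or the deprecated
# result key and ignores anything that is not a Hash or not enumerable.
def merge_result!(node_allowlist, result)
  return node_allowlist unless result.is_a?(Hash)

  result_allowlist = result[:node_allowlist] || result[:node_whitelist]
  node_allowlist.merge(result_allowlist) if result_allowlist.respond_to?(:each)
  node_allowlist
end

allowlist = Set.new
merge_result!(allowlist, { node_allowlist: [:a] }) # new key
merge_result!(allowlist, { node_whitelist: [:b] }) # deprecated key still works
merge_result!(allowlist, 'not a hash')             # ignored
merge_result!(allowlist, { node_allowlist: nil })  # ignored: nil is not enumerable

p allowlist.to_a # [:a, :b]
```

Because the new key is checked first, a transformer that returns both keys has only its `:node_allowlist` value merged.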
data/lib/sanitize/css.rb
CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
         next prop

       when :semicolon
-        # Only preserve the semicolon if it was preceded by
+        # Only preserve the semicolon if it was preceded by an allowlisted
         # property. Otherwise, omit it in order to prevent redundant semicolons.
         if preceded_by_property
           preceded_by_property = false
@@ -296,7 +296,7 @@ class Sanitize; class CSS
   end

   # Returns `true` if the given node (which may be of type `:url` or
-  # `:function`, since the CSS syntax can produce both) uses
+  # `:function`, since the CSS syntax can produce both) uses an allowlisted
   # protocol.
   def valid_url?(node)
     type = node[:node]
data/lib/sanitize/transformers/clean_css.rb
CHANGED
@@ -1,6 +1,6 @@
 class Sanitize; module Transformers; module CSS

-# Enforces a CSS
+# Enforces a CSS allowlist on the contents of `style` attributes.
 class CleanAttribute
   def initialize(sanitizer_or_config)
     if Sanitize::CSS === sanitizer_or_config
@@ -14,7 +14,7 @@ class CleanAttribute
     node = env[:node]

     return unless node.type == Nokogiri::XML::Node::ELEMENT_NODE &&
-      node.key?('style') && !env[:
+      node.key?('style') && !env[:is_allowlisted]

     attr = node.attribute('style')
     css = @scss.properties(attr.value)
@@ -27,7 +27,7 @@ class CleanAttribute
   end
 end

-# Enforces a CSS
+# Enforces a CSS allowlist on the contents of `<style>` elements.
 class CleanElement
   def initialize(sanitizer_or_config)
     if Sanitize::CSS === sanitizer_or_config
data/lib/sanitize/transformers/clean_element.rb
CHANGED
@@ -76,11 +76,11 @@ class Sanitize; module Transformers; class CleanElement

   def call(env)
     node = env[:node]
-    return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:
+    return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_allowlisted]

     name = env[:node_name]

-    # Delete any element that isn't in the config
+    # Delete any element that isn't in the config allowlist, unless the node has
     # already been deleted from the document.
     #
     # It's important that we not try to reparent the children of a node that has
@@ -107,20 +107,20 @@ class Sanitize; module Transformers; class CleanElement
       return
     end

-
+    attr_allowlist = @attributes[name] || @attributes[:all]

-    if
-      # Delete all attributes from elements with no
+    if attr_allowlist.nil?
+      # Delete all attributes from elements with no allowlisted attributes.
       node.attribute_nodes.each {|attr| attr.unlink }
     else
-      allow_data_attributes =
+      allow_data_attributes = attr_allowlist.include?(:data)

       # Delete any attribute that isn't allowed on this element.
       node.attribute_nodes.each do |attr|
         attr_name = attr.name.downcase

-        unless
-          # The attribute isn't
+        unless attr_allowlist.include?(attr_name)
+          # The attribute isn't allowed.

           if allow_data_attributes && attr_name.start_with?('data-')
             # Arbitrary data attributes are allowed. If this is a data
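The attribute-filtering logic in this hunk can be approximated outside of Nokogiri. The sketch below works on plain attribute-name strings and returns the names that would survive; `filter_attributes` is a hypothetical helper, and it skips the extra validation of `data-*` names that the real transformer performs in lines not shown in this diff.

```ruby
# Simplified sketch: decide which attribute names survive, given an allowlist
# that may contain the special symbol :data. A nil allowlist removes everything.
def filter_attributes(attr_names, attr_allowlist)
  return [] if attr_allowlist.nil?

  allow_data_attributes = attr_allowlist.include?(:data)

  attr_names.select do |raw_name|
    attr_name = raw_name.downcase
    attr_allowlist.include?(attr_name) ||
      (allow_data_attributes && attr_name.start_with?('data-'))
  end
end

p filter_attributes(%w[href onclick data-id], ['href', :data]) # ["href", "data-id"]
p filter_attributes(%w[href onclick data-id], ['href'])        # ["href"]
p filter_attributes(%w[href onclick], nil)                     # []
```

Names are downcased before the lookup, matching the `attr.name.downcase` call in the hunk, so allowlist entries are expected to be lowercase.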
@@ -134,7 +134,7 @@ class Sanitize; module Transformers; class CleanElement
           next
         end

-        # The attribute is
+        # The attribute is allowed.

         # Remove any attributes that use unacceptable protocols.
         if @protocols.include?(name) && @protocols[name].include?(attr_name)
@@ -162,7 +162,7 @@ class Sanitize; module Transformers; class CleanElement
         # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
         # attempt to preserve server-side includes. This can result in XSS since
         # an unescaped double quote can allow an attacker to inject a
-        # non-
+        # non-allowlisted attribute.
         #
         # Sanitize works around this by implementing its own escaping for
         # affected attributes, some of which can exist on any element and some
@@ -191,7 +191,7 @@ class Sanitize; module Transformers; class CleanElement
     # Element-specific special cases.
     case name

-    # If this is
+    # If this is an allowlisted iframe that has children, remove all its
     # children. The HTML standard says iframes shouldn't have content, but when
     # they do, this content is parsed as text and is serialized verbatim without
     # being escaped, which is unsafe because legacy browsers may still render it
data/lib/sanitize/version.rb
CHANGED
data/test/test_clean_element.rb
CHANGED
@@ -162,7 +162,7 @@ describe 'Sanitize::Transformers::CleanElement' do
   }

   describe 'Default config' do
-    it 'should remove non-
+    it 'should remove non-allowlisted elements, leaving safe contents behind' do
       Sanitize.fragment('foo <b>bar</b> <strong><a href="#a">baz</a></strong> quux')
         .must_equal 'foo bar baz quux'

@@ -315,7 +315,7 @@ describe 'Sanitize::Transformers::CleanElement' do
   end

   describe 'Custom configs' do
-    it 'should allow attributes on all elements if
+    it 'should allow attributes on all elements if allowlisted under :all' do
       input = '<p class="foo">bar</p>'

       Sanitize.fragment(input).must_equal ' bar '
@@ -336,7 +336,7 @@ describe 'Sanitize::Transformers::CleanElement' do
       }).must_equal input
     end

-    it "should not allow relative URLs when relative URLs aren't
+    it "should not allow relative URLs when relative URLs aren't allowlisted" do
       input = '<a href="/foo/bar">Link</a>'

       Sanitize.fragment(input,
@@ -400,7 +400,7 @@ describe 'Sanitize::Transformers::CleanElement' do
       ).must_equal 'foo bar baz hi '
     end

-    it 'should remove the contents of
+    it 'should remove the contents of allowlisted iframes' do
       Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
         :elements => ['iframe']
       ).must_equal '<iframe></iframe>'
data/test/test_malicious_html.rb
CHANGED
@@ -128,13 +128,15 @@ describe 'Malicious HTML' do

   # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
   # attempt to preserve server-side includes. This can result in XSS since an
-  # unescaped double quote can allow an attacker to inject a non-
+  # unescaped double quote can allow an attacker to inject a non-allowlisted
   # attribute. Sanitize works around this by implementing its own escaping for
   # affected attributes.
   #
   # The relevant libxml2 code is here:
   # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
   describe 'unsafe libxml2 server-side includes in attributes' do
+    using_unpatched_libxml2 = Nokogiri::VersionInfo.instance.libxml2_using_system?
+
     tag_configs = [
       {
         tag_name: 'a',
@@ -166,6 +168,8 @@ describe 'Malicious HTML' do
         input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]

         it 'should escape unsafe characters in attributes' do
+          skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
+
           # This uses Nokogumbo's HTML-compliant serializer rather than
           # libxml2's.
           @s.fragment(input).
@@ -191,6 +195,8 @@ describe 'Malicious HTML' do
         input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]

         it 'should not escape characters unnecessarily' do
+          skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
+
           # This uses Nokogumbo's HTML-compliant serializer rather than
           # libxml2's.
           @s.fragment(input).
data/test/test_parser.rb
CHANGED
data/test/test_sanitize.rb
CHANGED
@@ -150,7 +150,7 @@ describe 'Sanitize' do
         frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
       end

-      describe "when the given node is a document and <html> isn't
+      describe "when the given node is a document and <html> isn't allowlisted" do
         it 'should raise a Sanitize::Error' do
           doc = Nokogiri::HTML5.parse('foo')
           proc { @s.node!(doc) }.must_raise Sanitize::Error
data/test/test_sanitize_css.rb
CHANGED
@@ -21,7 +21,7 @@ describe 'Sanitize::CSS' do
       @custom.properties(css).must_equal 'background: #fff; '
     end

-    it 'should allow
+    it 'should allow allowlisted URL protocols' do
       [
         "background: url(relative.jpg)",
         "background: url('relative.jpg')",
@@ -36,7 +36,7 @@ describe 'Sanitize::CSS' do
       end
     end
 
-    it 'should not allow non-
+    it 'should not allow non-allowlisted URL protocols' do
       [
         "background: url(javascript:alert(0))",
         "background: url(ja\\56 ascript:alert(0))",
@@ -307,7 +307,7 @@ describe 'Sanitize::CSS' do
   end
 
   describe ":at_rules" do
-    it "should remove blockless at-rules that aren't
+    it "should remove blockless at-rules that aren't allowlisted" do
       css = %[
         @charset 'utf-8';
         @import url('foo.css');
@@ -319,7 +319,7 @@ describe 'Sanitize::CSS' do
       ].strip
     end
 
-    describe "when blockless at-rules are
+    describe "when blockless at-rules are allowlisted" do
       before do
         @scss = Sanitize::CSS.new(Sanitize::Config.merge(Sanitize::Config::RELAXED[:css], {
           :at_rules => ['charset', 'import']
data/test/test_transformers.rb
CHANGED
@@ -12,11 +12,13 @@ describe 'Transformers' do
           return unless env[:node].element?
 
           env[:config][:foo].must_equal :bar
-          env[:
+          env[:is_allowlisted].must_equal false
+          env[:is_whitelisted].must_equal env[:is_allowlisted]
           env[:node].must_be_kind_of Nokogiri::XML::Node
           env[:node_name].must_equal 'span'
-          env[:
-          env[:
+          env[:node_allowlist].must_be_kind_of Set
+          env[:node_allowlist].must_be_empty
+          env[:node_whitelist].must_equal env[:node_allowlist]
         }
       )
     end
@@ -43,34 +45,38 @@ describe 'Transformers' do
     nodes.must_equal %w[div span strong b p]
   end
 
-  it 'should
+  it 'should allowlist nodes in the node allowlist' do
     Sanitize.fragment('<div class="foo">foo</div><span>bar</span>',
       :transformers => [
         proc {|env|
-          {:
+          {:node_allowlist => [env[:node]]} if env[:node_name] == 'div'
         },
 
         proc {|env|
-          env[:
-          env[:
-          env[:
+          env[:is_allowlisted].must_equal false unless env[:node_name] == 'div'
+          env[:is_allowlisted].must_equal true if env[:node_name] == 'div'
+          env[:node_allowlist].must_include env[:node] if env[:node_name] == 'div'
+          env[:is_whitelisted].must_equal env[:is_allowlisted]
+          env[:node_whitelist].must_equal env[:node_allowlist]
         }
       ]
     ).must_equal '<div class="foo">foo</div>bar'
   end
 
-  it 'should clear the node
+  it 'should clear the node allowlist after each fragment' do
     called = false
 
     Sanitize.fragment('<div>foo</div>',
-      :transformers => proc {|env| {:
+      :transformers => proc {|env| {:node_allowlist => [env[:node]]}}
     )
 
     Sanitize.fragment('<div>foo</div>',
       :transformers => proc {|env|
         called = true
-        env[:
-        env[:
+        env[:is_allowlisted].must_equal false
+        env[:is_whitelisted].must_equal env[:is_allowlisted]
+        env[:node_allowlist].must_be_empty
+        env[:node_whitelist].must_equal env[:node_allowlist]
       }
     )
 
@@ -83,10 +89,10 @@ describe 'Transformers' do
       .must_equal(' foo ')
   end
 
-  describe 'Image
+  describe 'Image allowlist transformer' do
     require 'uri'
 
-
+    image_allowlist_transformer = lambda do |env|
       # Ignore everything except <img> elements.
       return unless env[:node_name] == 'img'
 
@@ -103,7 +109,7 @@ describe 'Transformers' do
 
     before do
       @s = Sanitize.new(Sanitize::Config.merge(Sanitize::Config::RELAXED,
-        :transformers =>
+        :transformers => image_allowlist_transformer))
     end
 
     it 'should allow images with relative URLs' do
@@ -142,8 +148,8 @@ describe 'Transformers' do
       node = env[:node]
       node_name = env[:node_name]
 
-      # Don't continue if this node is already
-      return if env[:
+      # Don't continue if this node is already allowlisted or is not an element.
+      return if env[:is_allowlisted] || !node.element?
 
       # Don't continue unless the node is an iframe.
       return unless node_name == 'iframe'
@@ -164,8 +170,8 @@ describe 'Transformers' do
 
       # Now that we're sure that this is a valid YouTube embed and that there are
       # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-      # to
-      {:
+      # to allowlist the current node.
+      {:node_allowlist => [node]}
     end
 
     it 'should allow HTTP YouTube video embeds' do
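Note on the test_transformers.rb hunks above: the updated tests assert that a transformer's env exposes :is_allowlisted and :node_allowlist, that the older :is_whitelisted and :node_whitelist keys still mirror them, and that a transformer can return a Hash with :node_allowlist to allow specific nodes. The sketch below is a gem-free illustration of that contract only. It is not Sanitize's implementation: run_transformers, allow_div, and observer are hypothetical names, and plain strings stand in for Nokogiri nodes.

```ruby
require 'set'

# Hypothetical stand-in for a transformer loop (NOT Sanitize's real code).
# Each "node" is just a tag-name string rather than a Nokogiri node.
def run_transformers(nodes, transformers)
  node_allowlist = Set.new
  seen = []

  nodes.each do |node|
    transformers.each do |transformer|
      env = {
        node: node,
        node_name: node,
        is_allowlisted: node_allowlist.include?(node),
        node_allowlist: node_allowlist
      }
      # Deprecated aliases mirror the new keys, as the tests assert.
      env[:is_whitelisted] = env[:is_allowlisted]
      env[:node_whitelist] = env[:node_allowlist]

      result = transformer.call(env)
      seen << [node, env[:is_allowlisted]]

      # A transformer may return a Hash naming nodes to allowlist.
      if result.is_a?(Hash) && result[:node_allowlist].respond_to?(:each)
        node_allowlist.merge(result[:node_allowlist])
      end
    end
  end

  [node_allowlist, seen]
end

# Allowlists <div> nodes only; returns nil for everything else.
allow_div = proc do |env|
  env[:node_name] == 'div' ? { node_allowlist: [env[:node]] } : nil
end

# Does nothing; it exists so a later transformer can observe the updated env.
observer = proc { |_env| nil }

allowlist, seen = run_transformers(%w[div span], [allow_div, observer])
```

In this sketch the second transformer sees is_allowlisted as true for the div because the first transformer already added it, which is the ordering the 'should allowlist nodes in the node allowlist' test relies on.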
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-  version: 5.
+  version: 5.2.0
 platform: ruby
 authors:
 - Ryan Grove
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2020-06-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: crass
@@ -80,9 +80,9 @@ dependencies:
     - - "~>"
       - !ruby/object:Gem::Version
         version: 12.3.1
-description: Sanitize is
-
-
+description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
+  HTML and/or CSS from a string except the elements, attributes, and properties you
+  choose to allow.
 email: ryan@wonko.com
 executables: []
 extensions: []
@@ -138,5 +138,5 @@ requirements: []
 rubygems_version: 3.0.3
 signing_key:
 specification_version: 4
-summary:
+summary: Allowlist-based HTML and CSS sanitizer.
 test_files: []