sanitize 4.6.6 → 6.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c5672f967be01303dd78eba5c0a1ab45729d15b604e2f2cbb6108c69864ad5f6
4
- data.tar.gz: 8ff91d1efafb67205b6ba07697d2c9f920e34df5e59f357433e54a6f9f0cca76
3
+ metadata.gz: 94a37503617774f9317150c834cc3025cd32a718be754fb72eea1b9dd7347571
4
+ data.tar.gz: 597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92
5
5
  SHA512:
6
- metadata.gz: '0981c67f49e789e6ccb6becb2a5407ac3db48b96823f48bef3a284fcc8b2fe539545ec0db8f0449dc5db5039d35a7a193970e3d72c076f99152c001d87be8659'
7
- data.tar.gz: 25af08d3f6524b70aaee67cab17a5e8568697be138954f1ff1e1bc8da591df5ccb405bc1212c24daeb81a3b2c4e659e56e60bdc76938b9a10e096449ba38b657
6
+ metadata.gz: c6d2dedfa9d6a589788d4156babae09cf14b3bebc765a9bb04a492aa5b5702f82dc3ae26d45199da3e8f9c096dfd191d15c53fea8d62084a3679604be5f7ddba
7
+ data.tar.gz: 70bbb00756f1a4a085ad5901b27fd91ebc4308d5f42bfa57ec54c8cc7982ded8395eff9b59546ca62f3dba6e7a012351d62f9ec81b06aa8ccbb563211f39bd3c
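The digests above are the values recorded for the two archives packed inside the `.gem` file. As a rough sketch (not part of the gem or of this diff), a locally extracted `data.tar.gz` could be checked against the new SHA256 value with Ruby's standard `Digest` library. The helper name `digest_matches?` and the local file path are assumptions made for illustration.

```ruby
require 'digest'

# Hypothetical helper: true if the file at `path` hashes to `expected_hex`.
def digest_matches?(path, expected_hex, algorithm: Digest::SHA256)
  actual = algorithm.file(path).hexdigest
  actual == expected_hex.strip.downcase
end

# New SHA256 for data.tar.gz, copied from the checksums.yaml hunk above.
EXPECTED_DATA_SHA256 =
  '597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92'

# Assumed path: a data.tar.gz extracted locally from sanitize-6.0.0.gem.
archive = 'data.tar.gz'
if File.exist?(archive)
  puts digest_matches?(archive, EXPECTED_DATA_SHA256) ? 'checksum OK' : 'checksum MISMATCH'
end
```

Passing `algorithm: Digest::SHA512` together with the SHA512 value from the same hunk would check the longer digest in the same way.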
data/HISTORY.md CHANGED
@@ -1,5 +1,145 @@
1
1
  # Sanitize History
2
2
 
3
+ ## 6.0.0 (2021-08-03)
4
+
5
+ ### Potentially Breaking Changes
6
+
7
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
8
+
9
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
10
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
11
+
12
+ [211]:https://github.com/rgrove/sanitize/pull/211
13
+
14
+ ## 5.2.3 (2021-01-11)
15
+
16
+ ### Bug Fixes
17
+
18
+ * Ensure protocol sanitization is applied to data attributes.
19
+ [@ccutrer - #207][207]
20
+
21
+ [207]:https://github.com/rgrove/sanitize/pull/207
22
+
23
+ ## 5.2.2 (2021-01-06)
24
+
25
+ ### Bug Fixes
26
+
27
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
28
+ custom transformer. [@mscrivo - #206][206]
29
+
30
+ [206]:https://github.com/rgrove/sanitize/pull/206
31
+
32
+ ## 5.2.1 (2020-06-16)
33
+
34
+ ### Bug Fixes
35
+
36
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
37
+ Sanitize versions 3.0.0 through 5.2.0.
38
+
39
+ When HTML was sanitized using the "relaxed" config or a custom config that
40
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
41
+ have been sanitized correctly even if `math` and `svg` were not in the
42
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
43
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
44
+
45
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
46
+ config or a custom config that allows one or more of the following HTML
47
+ elements:
48
+
49
+ - `iframe`
50
+ - `math`
51
+ - `noembed`
52
+ - `noframes`
53
+ - `noscript`
54
+ - `plaintext`
55
+ - `script`
56
+ - `style`
57
+ - `svg`
58
+ - `xmp`
59
+
60
+ See the security advisory for more details, including a workaround if you're
61
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
62
+
63
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
64
+ helping to verify the fix.
65
+
66
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
67
+
68
+ ## 5.2.0 (2020-06-06)
69
+
70
+ ### Changes
71
+
72
+ * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
73
+ source and documentation.
74
+
75
+ While the etymology of "whitelist" may not be explicitly racist in origin or
76
+ intent, there are inherent racial connotations in the implication that white
77
+ is good and black (as in "blacklist") is not.
78
+
79
+ This is a change I should have made long ago, and I apologize for not making
80
+ it sooner.
81
+
82
+ * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
83
+ deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
84
+ The old keys will continue to work in order to avoid breaking existing code,
85
+ but they are no longer documented and may be removed in a future semver major
86
+ release.
87
+
88
+ ## 5.1.0 (2019-09-07)
89
+
90
+ ### Features
91
+
92
+ * Added a `:parser_options` config hash, which makes it possible to pass custom
93
+ parsing options to Nokogumbo. [@austin-wang - #194][194]
94
+
95
+ ### Bug Fixes
96
+
97
+ * Non-characters and non-whitespace control characters are now stripped from
98
+ HTML input before parsing to comply with the HTML Standard's [preprocessing
99
+ guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
100
+ W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
101
+
102
+ [179]:https://github.com/rgrove/sanitize/issues/179
103
+ [194]:https://github.com/rgrove/sanitize/pull/194
104
+ [html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
105
+ [unicode-xml]:https://www.w3.org/TR/unicode-xml/
106
+
107
+ ## 5.0.0 (2018-10-14)
108
+
109
+ For most users, upgrading from 4.x shouldn't require any changes. However, the
110
+ minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
111
+ differ in some small ways from 4.x's output. If this matters to you, please
112
+ review the changes below carefully.
113
+
114
+ ### Potentially Breaking Changes
115
+
116
+ * Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
117
+ work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
118
+ no longer works in Ruby 1.9.x.
119
+
120
+ * Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
121
+ standard-compliant HTML serialization. [@stevecheckoway - #189][189]
122
+
123
+ * Children of the following elements are now removed by default when these
124
+ elements are removed, rather than being preserved and escaped:
125
+
126
+ - `iframe`
127
+ - `noembed`
128
+ - `noframes`
129
+ - `noscript`
130
+ - `script`
131
+ - `style`
132
+
133
+ * Children of allowlisted `iframe` elements are now always removed. In modern
134
+ HTML, `iframe` elements should never have children. In HTML 4 and earlier
135
+ `iframe` elements were allowed to contain fallback content for legacy
136
+ browsers, but it's been almost two decades since that was useful.
137
+
138
+ * Fixed a bug that caused `:remove_contents` to behave as if it were set to
139
+ `true` when it was actually an Array.
140
+
141
+ [189]:https://github.com/rgrove/sanitize/pull/189
142
+
3
143
  ## 4.6.6 (2018-07-23)
4
144
 
5
145
  * Improved performance and memory usage by optimizing `Sanitize#transform_node!`
@@ -29,7 +169,7 @@
29
169
 
30
170
  When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
31
171
  specially crafted HTML fragment can cause libxml2 to generate improperly
32
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
172
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
33
173
  elements.
34
174
 
35
175
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -73,7 +213,7 @@
73
213
 
74
214
  ## 4.4.0 (2016-09-29)
75
215
 
76
- * Added `srcset` to the attribute whitelist for `img` elements in the relaxed
216
+ * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
77
217
  config. [@ejtttje - #156][156]
78
218
 
79
219
  [156]:https://github.com/rgrove/sanitize/pull/156
@@ -194,7 +334,7 @@
194
334
  ## 3.0.4 (2014-12-12)
195
335
 
196
336
  * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
197
- caused the URL to be removed even when the protocol was whitelisted.
337
+ caused the URL to be removed even when the protocol was allowlisted.
198
338
  [@benubois - #126][126]
199
339
 
200
340
  [126]:https://github.com/rgrove/sanitize/pull/126
@@ -203,7 +343,7 @@
203
343
  ## 3.0.3 (2014-10-29)
204
344
 
205
345
  * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
206
- `@media` block, causing them to be removed even when whitelist rules should
346
+ `@media` block, causing them to be removed even when allowlist rules should
207
347
  have allowed them to remain. [#121][121]
208
348
 
209
349
  [121]:https://github.com/rgrove/sanitize/issues/121
@@ -268,7 +408,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
268
408
  * The `clean_node!` method was renamed to `node!`.
269
409
 
270
410
  * The `document` method now raises a `Sanitize::Error` if the `<html>` element
271
- isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
411
+ isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
272
412
  regardless of the `:remove_contents` config setting.
273
413
 
274
414
  * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -279,7 +419,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
279
419
 
280
420
  * Added advanced CSS sanitization support using [Crass][crass], which is fully
281
421
  compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
282
- whitelisted `<style>` elements and `style` attributes in HTML will be
422
+ allowlisted `<style>` elements and `style` attributes in HTML will be
283
423
  sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
284
424
  sanitize CSS stylesheets or properties.
285
425
 
@@ -324,9 +464,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
324
464
  [n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
325
465
 
326
466
 
467
+ ## 2.1.1 (2018-09-30)
468
+
469
+ * [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
470
+ XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
471
+
472
+ When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
473
+ specially crafted HTML fragment can cause libxml2 to generate improperly
474
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
475
+ elements.
476
+
477
+ Sanitize now performs additional escaping on affected attributes to prevent
478
+ this.
479
+
480
+ Many thanks to the Shopify Application Security Team for responsibly reporting
481
+ this issue.
482
+
483
+ [176]:https://github.com/rgrove/sanitize/issues/176
484
+ [188]:https://github.com/rgrove/sanitize/pull/188
485
+
486
+
327
487
  ## 2.1.0 (2014-01-13)
328
488
 
329
- * Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
489
+ * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
330
490
  symbol `:data` instead of an attribute name in the `:attributes` config to
331
491
  indicate that arbitrary data attributes should be allowed on an element.
332
492
 
@@ -407,12 +567,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
407
567
  the default depth-first mode.
408
568
 
409
569
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
410
- elements to the whitelists for the basic and relaxed configs.
570
+ elements to the allowlists for the basic and relaxed configs.
411
571
 
412
572
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
413
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
573
+ `ruby`, and `wbr` elements to the allowlist for the relaxed config.
414
574
 
415
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
575
+ * The `dir`, `lang`, and `title` attributes are now allowlisted for all
416
576
  elements in the relaxed config.
417
577
 
418
578
  * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -423,7 +583,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
423
583
  ## 1.2.1 (2010-04-20)
424
584
 
425
585
  * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
426
- remove the contents of all non-whitelisted elements in addition to the
586
+ remove the contents of all non-allowlisted elements in addition to the
427
587
  elements themselves. If set to an array of element names, Sanitize will
428
588
  remove the contents of only those elements (when filtered), and leave the
429
589
  contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -451,7 +611,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
451
611
  * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
452
612
  all its children.
453
613
 
454
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
614
+ * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
455
615
  David Reese]
456
616
 
457
617
 
@@ -471,7 +631,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
471
631
 
472
632
  * Added a workaround for an Hpricot bug that prevents attribute names from
473
633
  being downcased in recent versions of Hpricot. This was exploitable to
474
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
634
+ prevent non-allowlisted protocols from being cleaned. [Reported by Ben
475
635
  Wanicur]
476
636
 
477
637
 
@@ -501,7 +661,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
501
661
 
502
662
  ## 1.0.5 (2009-02-05)
503
663
 
504
- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
664
+ * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
505
665
  protocols from being cleaned when relative URLs were allowed. [Reported by
506
666
  Dev Purkayastha]
507
667
 
@@ -511,7 +671,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
511
671
 
512
672
  ## 1.0.4 (2009-01-16)
513
673
 
514
- * Fixed a bug that made it possible to sneak a non-whitelisted element through
674
+ * Fixed a bug that made it possible to sneak a non-allowlisted element through
515
675
  by repeating it several times in a row. All versions of Sanitize prior to
516
676
  1.0.4 are vulnerable. [Reported by Cristobal]
517
677
 
@@ -519,7 +679,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
519
679
  ## 1.0.3 (2009-01-15)
520
680
 
521
681
  * Fixed a bug whereby incomplete Unicode or hex entities could be used to
522
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
682
+ prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
523
683
  still decode the incomplete entities, users of those browsers may be
524
684
  vulnerable to malicious script injection on websites using versions of
525
685
  Sanitize prior to 1.0.3.
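The 6.0.0 entry in the HISTORY.md diff above raises two floors: Ruby 2.5.0 and Nokogiri 1.12.0. A minimal sketch of how an application might check a version string against those minimums before upgrading, using `Gem::Version` from RubyGems. The helper name `meets_minimum?` is hypothetical, and the Nokogiri version is passed in as a string rather than read from an installed gem.

```ruby
require 'rubygems'

# Minimums stated in the 6.0.0 changelog entry.
MIN_RUBY     = Gem::Version.new('2.5.0')
MIN_NOKOGIRI = Gem::Version.new('1.12.0')

# Hypothetical helper: compares dotted version strings numerically,
# so '1.9.0' sorts below '1.12.0' (a plain string comparison would not).
def meets_minimum?(version_string, minimum)
  Gem::Version.new(version_string) >= minimum
end

puts "Ruby #{RUBY_VERSION} supported: #{meets_minimum?(RUBY_VERSION, MIN_RUBY)}"
```

In a Gemfile, a constraint such as `gem 'sanitize', '~> 6.0'` would leave Bundler to resolve a compatible Nokogiri release.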
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2015 Ryan Grove <ryan@wonko.com>
1
+ Copyright (c) 2021 Ryan Grove <ryan@wonko.com>
2
2
 
3
3
  Permission is hereby granted, free of charge, to any person obtaining a copy of
4
4
  this software and associated documentation files (the 'Software'), to deal in
data/README.md CHANGED
@@ -1,28 +1,27 @@
1
1
  Sanitize
2
2
  ========
3
3
 
4
- Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of acceptable
5
- elements, attributes, and CSS properties, Sanitize will remove all unacceptable
6
- HTML and/or CSS from a string.
4
+ Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
5
+ and/or CSS from a string except the elements, attributes, and properties you
6
+ choose to allow.
7
7
 
8
8
  Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
9
9
  elements, certain attributes within those elements, and even certain URL
10
- protocols within attributes that contain URLs. You can also whitelist CSS
11
- properties, @ rules, and URL protocols you wish to allow in elements or
12
- attributes containing CSS. Any HTML or CSS that you don't explicitly allow will
13
- be removed.
10
+ protocols within attributes that contain URLs. You can also allow specific CSS
11
+ properties, @ rules, and URL protocols in elements or attributes containing CSS.
12
+ Any HTML or CSS that you don't explicitly allow will be removed.
14
13
 
15
- Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
14
+ Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
16
15
  exactly the same way modern browsers do, and [Crass][crass], which parses CSS
17
- exactly the same way modern browsers do. As long as your whitelist config only
16
+ exactly the same way modern browsers do. As long as your allowlist config only
18
17
  allows safe markup and CSS, even the most malformed or malicious input will be
19
18
  transformed into safe output.
20
19
 
21
- [![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
22
20
  [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
21
+ [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
23
22
 
24
23
  [crass]:https://github.com/rgrove/crass
25
- [gumbo]:https://github.com/google/gumbo-parser
24
+ [nokogumbo]:https://github.com/rubys/nokogumbo
26
25
 
27
26
  Links
28
27
  -----
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
73
72
  * Standalone CSS stylesheets
74
73
  * Standalone CSS properties
75
74
 
75
+ However, please note that Sanitize _cannot_ fully sanitize the contents of
76
+ `<math>` or `<svg>` elements, since these elements don't follow the same parsing
77
+ rules as the rest of HTML. If this is something you need, you may want to look
78
+ for another solution.
79
+
76
80
  ### HTML Fragments
77
81
 
78
82
  A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
88
92
  # => 'foo'
89
93
  ```
90
94
 
91
- To keep certain elements, add them to the element whitelist.
95
+ To keep certain elements, add them to the element allowlist.
92
96
 
93
97
  ```ruby
94
98
  Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
97
101
 
98
102
  ### HTML Documents
99
103
 
100
- When sanitizing a document, the `<html>` element must be whitelisted. You can
104
+ When sanitizing a document, the `<html>` element must be allowlisted. You can
101
105
  also set `:allow_doctype` to `true` to allow well-formed document type
102
106
  definitions.
103
107
 
@@ -123,8 +127,8 @@ Sanitize.document(html,
123
127
 
124
128
  ### CSS in HTML
125
129
 
126
- To sanitize CSS in an HTML fragment or document, first whitelist the `<style>`
127
- element and/or the `style` attribute. Then whitelist the CSS properties,
130
+ To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
131
+ element and/or the `style` attribute. Then allowlist the CSS properties,
128
132
  @ rules, and URL protocols you wish to allow. You can also choose whether to
129
133
  allow CSS comments or browser compatibility hacks.
130
134
 
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
267
271
 
268
272
  ```ruby
269
273
  # Create a customized copy of the Basic config, adding <div> and <table> to the
270
- # existing whitelisted elements.
274
+ # existing allowlisted elements.
271
275
  Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
272
276
  :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
273
277
  :remove_contents => true
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
395
399
 
396
400
  ##### :css => :properties (Array or Set)
397
401
 
398
- Whitelist of CSS property names to allow. Names should be specified in
399
- lowercase.
402
+ List of CSS property names to allow. Names should be specified in lowercase.
400
403
 
401
404
  ##### :css => :protocols (Array or Set)
402
405
 
@@ -417,6 +420,23 @@ elements not in this array will be removed.
417
420
  ]
418
421
  ```
419
422
 
423
+ **Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
424
+ elements, since these elements don't follow the same parsing rules as the rest
425
+ of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
426
+ content inside them will be allowed, even if that content would otherwise be
427
+ removed by Sanitize.
428
+
429
+ #### :parser_options (Hash)
430
+
431
+ [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
432
+
433
+ ```ruby
434
+ :parser_options => {
435
+ max_errors: -1,
436
+ max_tree_depth: -1
437
+ }
438
+ ```
439
+
420
440
  #### :protocols (Hash)
421
441
 
422
442
  URL protocols to allow in specific attributes. If an attribute is listed here
@@ -441,15 +461,15 @@ include the symbol `:relative` in the protocol array:
441
461
 
442
462
  #### :remove_contents (boolean or Array or Set)
443
463
 
444
- If set to `true`, Sanitize will remove the contents of any non-whitelisted
464
+ If this is `true`, Sanitize will remove the contents of any non-allowlisted
445
465
  elements in addition to the elements themselves. By default, Sanitize leaves the
446
466
  safe parts of an element's contents behind when the element is removed.
447
467
 
448
- If set to an array of element names, then only the contents of the specified
449
- elements (when filtered) will be removed, and the contents of all other filtered
450
- elements will be left behind.
468
+ If this is an Array or Set of element names, then only the contents of the
469
+ specified elements (when filtered) will be removed, and the contents of all
470
+ other filtered elements will be left behind.
451
471
 
452
- The default value is `false`.
472
+ The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
453
473
 
454
474
  #### :transformers (Array or callable)
455
475
 
@@ -507,33 +527,33 @@ argument a Hash that contains the following items:
507
527
 
508
528
  * **:config** - The current Sanitize configuration Hash.
509
529
 
510
- * **:is_whitelisted** - `true` if the current node has been whitelisted by a
530
+ * **:is_allowlisted** - `true` if the current node has been allowlisted by a
511
531
  previous transformer, `false` otherwise. It's generally bad form to remove
512
- a node that a previous transformer has whitelisted.
532
+ a node that a previous transformer has allowlisted.
513
533
 
514
534
  * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
515
535
  node may be an element, a text node, a comment, a CDATA node, or a document
516
536
  fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
517
537
  selectively ignore node types you aren't interested in.
518
538
 
539
+ * **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
540
+ document that have been allowlisted by previous transformers, if any. It's
541
+ generally bad form to remove a node that a previous transformer has
542
+ allowlisted.
543
+
519
544
  * **:node_name** - The name of the current HTML node, always lowercase (e.g.
520
545
  "div" or "span"). For non-element nodes, the name will be something like
521
546
  "text", "comment", "#cdata-section", "#document-fragment", etc.
522
547
 
523
- * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
524
- document that have been whitelisted by previous transformers, if any. It's
525
- generally bad form to remove a node that a previous transformer has
526
- whitelisted.
527
-
528
548
  ### Output
529
549
 
530
550
  A transformer doesn't have to return anything, but may optionally return a Hash,
531
551
  which may contain the following items:
532
552
 
533
- * **:node_whitelist** - Array or Set of specific Nokogiri::XML::Node objects
534
- to add to the document's whitelist, bypassing the current Sanitize config.
535
- These specific nodes and all their attributes will be whitelisted, but
536
- their children will not be.
553
+ * **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
554
+ objects to add to the document's allowlist, bypassing the current Sanitize
555
+ config. These specific nodes and all their attributes will be allowlisted,
556
+ but their children will not be.
537
557
 
538
558
  If a transformer returns anything other than a Hash, the return value will be
539
559
  ignored.
@@ -576,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
576
596
  completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
577
597
  your own hands.
578
598
 
579
- ### Example: Transformer to whitelist image URLs by domain
599
+ ### Example: Transformer to allow image URLs by domain
580
600
 
581
601
  The following example demonstrates how to remove image elements unless they use
582
602
  a relative URL or are hosted on a specific domain. It assumes that the `<img>`
583
- element and its `src` attribute are already whitelisted.
603
+ element and its `src` attribute are already allowlisted.
584
604
 
585
605
  ```ruby
586
606
  require 'uri'
587
607
 
588
- image_whitelist_transformer = lambda do |env|
608
+ image_allowlist_transformer = lambda do |env|
589
609
  # Ignore everything except <img> elements.
590
610
  return unless env[:node_name] == 'img'
591
611
 
@@ -601,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
601
621
  end
602
622
  ```
603
623
 
604
- ### Example: Transformer to whitelist YouTube video embeds
624
+ ### Example: Transformer to allow YouTube video embeds
605
625
 
606
626
  The following example demonstrates how to create a transformer that will safely
607
- whitelist valid YouTube video embeds without having to blindly allow other kinds
608
- of embedded content, which would be the case if you tried to do this by just
609
- whitelisting all `<iframe>` elements:
627
+ allow valid YouTube video embeds without having to allow other kinds of embedded
628
+ content, which would be the case if you tried to do this by just allowing all
629
+ `<iframe>` elements:
610
630
 
611
631
  ```ruby
612
632
  youtube_transformer = lambda do |env|
613
633
  node = env[:node]
614
634
  node_name = env[:node_name]
615
635
 
616
- # Don't continue if this node is already whitelisted or is not an element.
617
- return if env[:is_whitelisted] || !node.element?
636
+ # Don't continue if this node is already allowlisted or is not an element.
637
+ return if env[:is_allowlisted] || !node.element?
618
638
 
619
639
  # Don't continue unless the node is an iframe.
620
640
  return unless node_name == 'iframe'
@@ -635,8 +655,8 @@ youtube_transformer = lambda do |env|
635
655
 
636
656
  # Now that we're sure that this is a valid YouTube embed and that there are
637
657
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
638
- # to whitelist the current node.
639
- {:node_whitelist => [node]}
658
+ # to allowlist the current node.
659
+ {:node_allowlist => [node]}
640
660
  end
641
661
 
642
662
  html = %[
@@ -647,25 +667,3 @@ html = %[
647
667
  Sanitize.fragment(html, :transformers => youtube_transformer)
648
668
  # => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
649
669
  ```
650
-
651
- License
652
- -------
653
-
654
- Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
655
-
656
- Permission is hereby granted, free of charge, to any person obtaining a copy of
657
- this software and associated documentation files (the 'Software'), to deal in
658
- the Software without restriction, including without limitation the rights to
659
- use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
660
- the Software, and to permit persons to whom the Software is furnished to do so,
661
- subject to the following conditions:
662
-
663
- The above copyright notice and this permission notice shall be included in all
664
- copies or substantial portions of the Software.
665
-
666
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
667
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
668
- FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
669
- COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
670
- IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
671
- CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
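The README hunks above rename the transformer input keys `:is_whitelisted` and `:node_whitelist` to `:is_allowlisted` and `:node_allowlist`, and the HISTORY.md entry for 5.2.0 notes that the old keys keep working for now. For a transformer that has to run against both older and newer Sanitize releases, one possible approach is to read the new key and fall back to the old one. This sketch uses plain Hashes to stand in for the `env` argument; the helper `allowlisted?` is hypothetical and not part of Sanitize.

```ruby
# Hypothetical helper: prefer the new key, fall back to the deprecated one.
def allowlisted?(env)
  env.fetch(:is_allowlisted) { env.fetch(:is_whitelisted, false) }
end

# Stand-ins for the Hash that Sanitize passes to a transformer.
new_style_env = { :node_name => 'iframe', :is_allowlisted => true }
old_style_env = { :node_name => 'iframe', :is_whitelisted => true }

puts allowlisted?(new_style_env)         # prints "true"
puts allowlisted?(old_style_env)         # prints "true"
puts allowlisted?({ :node_name => 'p' }) # prints "false"
```

Inside a real transformer the same helper could replace a direct `env[:is_allowlisted]` lookup, for example `return if allowlisted?(env)`.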
@@ -56,6 +56,10 @@ class Sanitize
56
56
  # that all HTML will be stripped).
57
57
  :elements => [],
58
58
 
59
+ # HTML parsing options to pass to Nokogumbo.
60
+ # https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options
61
+ :parser_options => {},
62
+
59
63
  # URL handling protocols to allow in specific attributes. By default, no
60
64
  # protocols are allowed. Use :relative in place of a protocol if you want
61
65
  # to allow relative URLs sans protocol.
@@ -66,10 +70,12 @@ class Sanitize
66
70
  # leaves the safe parts of an element's contents behind when the element
67
71
  # is removed.
68
72
  #
69
- # If this is an Array of element names, then only the contents of the
70
- # specified elements (when filtered) will be removed, and the contents of
71
- # all other filtered elements will be left behind.
72
- :remove_contents => false,
73
+ # If this is an Array or Set of element names, then only the contents of
74
+ # the specified elements (when filtered) will be removed, and the contents
75
+ # of all other filtered elements will be left behind.
76
+ :remove_contents => %w[
77
+ iframe math noembed noframes noscript plaintext script style svg xmp
78
+ ],
73
79
 
74
80
  # Transformers allow you to filter or alter nodes using custom logic. See
75
81
  # README.md for details and examples.
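The hunk above changes the default `:remove_contents` value from `false` to a list of element names. Going by the documented behavior (`true` removes the contents of every filtered element, an Array or Set limits that to the named elements, `false` keeps the contents), the decision can be modeled roughly as follows. This is an illustrative sketch only: `remove_contents_for?` is a hypothetical name, and this is not Sanitize's actual implementation.

```ruby
require 'set'

# New default, copied from the hunk above.
DEFAULT_REMOVE_CONTENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical model: should the contents of a filtered (non-allowlisted)
# element be dropped along with the element itself?
def remove_contents_for?(setting, element_name)
  case setting
  when true, false then setting
  when Array, Set  then setting.include?(element_name.to_s.downcase)
  else false
  end
end

puts remove_contents_for?(DEFAULT_REMOVE_CONTENTS, 'script') # prints "true"
puts remove_contents_for?(DEFAULT_REMOVE_CONTENTS, 'div')    # prints "false"
puts remove_contents_for?(true, 'div')                        # prints "true"
```

Under this model, code that relied on the old `false` default would need to set `:remove_contents => false` explicitly to keep the previous behavior for these ten elements.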
@@ -6,7 +6,7 @@ class Sanitize
6
6
  :elements => BASIC[:elements] + %w[
7
7
  address article aside bdi bdo body caption col colgroup data del div
8
8
  figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
9
- img ins main nav rp rt ruby section span style summary sup table tbody
9
+ img ins main nav rp rt ruby section span style summary table tbody
10
10
  td tfoot th thead title tr wbr
11
11
  ],
12
12
 
data/lib/sanitize/css.rb CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
175
175
  next prop
176
176
 
177
177
  when :semicolon
178
- # Only preserve the semicolon if it was preceded by a whitelisted
178
+ # Only preserve the semicolon if it was preceded by an allowlisted
179
179
  # property. Otherwise, omit it in order to prevent redundant semicolons.
180
180
  if preceded_by_property
181
181
  preceded_by_property = false
@@ -296,7 +296,7 @@ class Sanitize; class CSS
296
296
  end
297
297
 
298
298
  # Returns `true` if the given node (which may be of type `:url` or
299
- # `:function`, since the CSS syntax can produce both) uses a whitelisted
299
+ # `:function`, since the CSS syntax can produce both) uses an allowlisted
300
300
  # protocol.
301
301
  def valid_url?(node)
302
302
  type = node[:node]
@@ -6,7 +6,7 @@ class Sanitize; module Transformers
6
6
  node = env[:node]
7
7
 
8
8
  if node.type == Nokogiri::XML::Node::COMMENT_NODE
9
- node.unlink unless env[:is_whitelisted]
9
+ node.unlink unless env[:is_allowlisted]
10
10
  end
11
11
  end
12
12