sanitize 4.6.5 → 5.2.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: eab36cec7ac13bd15bd00b1141990e9efc35332c95391cb405128ddfe891e242
4
- data.tar.gz: f69f77cf6febfa74b1bdc5103d245543f38ddcfe223d474dffab5913846525ec
3
+ metadata.gz: 3d1290690a9d32db9e06b8fb19c7e285c94a1d91ed51a4eb7e96389e427348d9
4
+ data.tar.gz: 5131063daf1763c83978954bed9ee3a783099e40aa71e50de26d06b8ae0c1054
5
5
  SHA512:
6
- metadata.gz: 3358c2574bcdd0e3a8c08460f2dd31ecd3ade8e04ed60380f8037c46dd0f67321ac2c4ccace6e1f82080acecb2dc71054630c2c1fec54b2a99cb50c0476dd0b2
7
- data.tar.gz: c10686ec8aacadf3268eafd407e4e2259e88deae363829d5ebe1b2877ed8e15c658bb86fe97b786d116c00b10e6eaff14baf5bb71a7a737ec507f3ab65f61187
6
+ metadata.gz: bfcb7cda6aa70590f642583b41936bc09d8929210046cebdd0d0ff452ccb3213844b4c40d4e205e79c0cd64a2a0d56e16790e38f4c8f247b8abfa32dbec22297
7
+ data.tar.gz: 0ea5a6d6848f9a125f17e4e23145adff4d3c4ccfe30a3407466fae074ed33cbd4b1869eb5a9f0a72b808449b8cf166a3695c2a6d63b16a83b047fd260bfe50bd
data/HISTORY.md CHANGED
@@ -1,5 +1,123 @@
1
1
  # Sanitize History
2
2
 
3
+ ## 5.2.1 (2020-06-16)
4
+
5
+ ### Bug Fixes
6
+
7
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
8
+ Sanitize versions 3.0.0 through 5.2.0.
9
+
10
+ When HTML was sanitized using the "relaxed" config or a custom config that
11
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
12
+ have been sanitized correctly even if `math` and `svg` were not in the
13
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
14
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
15
+
16
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
17
+ config or a custom config that allows one or more of the following HTML
18
+ elements:
19
+
20
+ - `iframe`
21
+ - `math`
22
+ - `noembed`
23
+ - `noframes`
24
+ - `noscript`
25
+ - `plaintext`
26
+ - `script`
27
+ - `style`
28
+ - `svg`
29
+ - `xmp`
30
+
31
+ See the security advisory for more details, including a workaround if you're
32
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
33
+
34
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
35
+ helping to verify the fix.
36
+
37
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
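
Reviewer note: the 5.2.1 entry above lists the elements whose presence in a config makes the issue likely to apply. A minimal sketch of checking a custom config against that list follows. The helper name `advisory_elements_allowed` and the sample config are assumptions made for illustration and are not part of Sanitize. An empty result does not prove a config is safe; the advisory remains the authoritative source.

```ruby
require 'set'

# Element names copied from the 5.2.1 history entry.
ADVISORY_ELEMENTS = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical helper (not part of Sanitize): returns the advisory-listed
# element names that a config Hash allows through its :elements entry.
# Accepts an Array or a Set of names, as strings or symbols in any case.
def advisory_elements_allowed(config)
  allowed = config.fetch(:elements, []).map { |name| name.to_s.downcase }
  ADVISORY_ELEMENTS & allowed
end

# Illustrative config only; substitute the config your application uses.
custom_config = { :elements => ['b', 'em', 'iframe', 'style'] }
flagged = advisory_elements_allowed(custom_config)
puts "Elements named in the advisory: #{flagged.join(', ')}" unless flagged.empty?
```
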
38
+
39
+ ## 5.2.0 (2020-06-06)
40
+
41
+ ### Changes
42
+
43
+ * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
44
+ source and documentation.
45
+
46
+ While the etymology of "whitelist" may not be explicitly racist in origin or
47
+ intent, there are inherent racial connotations in the implication that white
48
+ is good and black (as in "blacklist") is not.
49
+
50
+ This is a change I should have made long ago, and I apologize for not making
51
+ it sooner.
52
+
53
+ * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
54
+ deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
55
+ The old keys will continue to work in order to avoid breaking existing code,
56
+ but they are no longer documented and may be removed in a future semver major
57
+ release.
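
Reviewer note: for transformers that must run on both sides of this version range, one option is to read the flag under the new key and fall back to the deprecated one. This is a sketch under the assumption that the transformer only needs the input flag; `node_allowlisted?` is a hypothetical helper name, not a Sanitize API, and plain Hashes stand in for the `env` argument a transformer receives.

```ruby
# Hypothetical helper (not part of Sanitize): reads the allowlist flag from a
# transformer's env Hash, preferring the new :is_allowlisted key and falling
# back to the deprecated :is_whitelisted key when the new one is absent.
def node_allowlisted?(env)
  env.fetch(:is_allowlisted) { env.fetch(:is_whitelisted, false) }
end

# Plain Hashes standing in for the env passed to a transformer.
old_style_env = { :is_whitelisted => true }
new_style_env = { :is_allowlisted => true, :is_whitelisted => true }

puts node_allowlisted?(old_style_env)
puts node_allowlisted?(new_style_env)
```

Inside a transformer, the early return from the README's YouTube example would then read `return if node_allowlisted?(env) || !node.element?`.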
58
+
59
+ ## 5.1.0 (2019-09-07)
60
+
61
+ ### Features
62
+
63
+ * Added a `:parser_options` config hash, which makes it possible to pass custom
64
+ parsing options to Nokogumbo. [@austin-wang - #194][194]
65
+
66
+ ### Bug Fixes
67
+
68
+ * Non-characters and non-whitespace control characters are now stripped from
69
+ HTML input before parsing to comply with the HTML Standard's [preprocessing
70
+ guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
71
+ W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
72
+
73
+ [179]:https://github.com/rgrove/sanitize/issues/179
74
+ [194]:https://github.com/rgrove/sanitize/pull/194
75
+ [html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
76
+ [unicode-xml]:https://www.w3.org/TR/unicode-xml/
77
+
78
+ ## 5.0.0 (2018-10-14)
79
+
80
+ For most users, upgrading from 4.x shouldn't require any changes. However, the
81
+ minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
82
+ differ in some small ways from 4.x's output. If this matters to you, please
83
+ review the changes below carefully.
84
+
85
+ ### Potentially Breaking Changes
86
+
87
+ * Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
88
+ work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
89
+ no longer works in Ruby 1.9.x.
90
+
91
+ * Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
92
+ standard-compliant HTML serialization. [@stevecheckoway - #189][189]
93
+
94
+ * Children of the following elements are now removed by default when these
95
+ elements are removed, rather than being preserved and escaped:
96
+
97
+ - `iframe`
98
+ - `noembed`
99
+ - `noframes`
100
+ - `noscript`
101
+ - `script`
102
+ - `style`
103
+
104
+ * Children of allowlisted `iframe` elements are now always removed. In modern
105
+ HTML, `iframe` elements should never have children. In HTML 4 and earlier
106
+ `iframe` elements were allowed to contain fallback content for legacy
107
+ browsers, but it's been almost two decades since that was useful.
108
+
109
+ * Fixed a bug that caused `:remove_contents` to behave as if it were set to
110
+ `true` when it was actually an Array.
111
+
112
+ [189]:https://github.com/rgrove/sanitize/pull/189
113
+
114
+ ## 4.6.6 (2018-07-23)
115
+
116
+ * Improved performance and memory usage by optimizing `Sanitize#transform_node!`
117
+ [@stanhu - #183][183]
118
+
119
+ [183]:https://github.com/rgrove/sanitize/pull/183
120
+
3
121
  ## 4.6.5 (2018-05-16)
4
122
 
5
123
  * Improved performance slightly by tweaking the order of built-in transformers.
@@ -22,7 +140,7 @@
22
140
 
23
141
  When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
24
142
  specially crafted HTML fragment can cause libxml2 to generate improperly
25
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
143
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
26
144
  elements.
27
145
 
28
146
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -66,7 +184,7 @@
66
184
 
67
185
  ## 4.4.0 (2016-09-29)
68
186
 
69
- * Added `srcset` to the attribute whitelist for `img` elements in the relaxed
187
+ * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
70
188
  config. [@ejtttje - #156][156]
71
189
 
72
190
  [156]:https://github.com/rgrove/sanitize/pull/156
@@ -187,7 +305,7 @@
187
305
  ## 3.0.4 (2014-12-12)
188
306
 
189
307
  * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
190
- caused the URL to be removed even when the protocol was whitelisted.
308
+ caused the URL to be removed even when the protocol was allowlisted.
191
309
  [@benubois - #126][126]
192
310
 
193
311
  [126]:https://github.com/rgrove/sanitize/pull/126
@@ -196,7 +314,7 @@
196
314
  ## 3.0.3 (2014-10-29)
197
315
 
198
316
  * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
199
- `@media` block, causing them to be removed even when whitelist rules should
317
+ `@media` block, causing them to be removed even when allowlist rules should
200
318
  have allowed them to remain. [#121][121]
201
319
 
202
320
  [121]:https://github.com/rgrove/sanitize/issues/121
@@ -261,7 +379,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
261
379
  * The `clean_node!` method was renamed to `node!`.
262
380
 
263
381
  * The `document` method now raises a `Sanitize::Error` if the `<html>` element
264
- isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
382
+ isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
265
383
  regardless of the `:remove_contents` config setting.
266
384
 
267
385
  * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -272,7 +390,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
272
390
 
273
391
  * Added advanced CSS sanitization support using [Crass][crass], which is fully
274
392
  compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
275
- whitelisted `<style>` elements and `style` attributes in HTML will be
393
+ allowlisted `<style>` elements and `style` attributes in HTML will be
276
394
  sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
277
395
  sanitize CSS stylesheets or properties.
278
396
 
@@ -317,9 +435,29 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
317
435
  [n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
318
436
 
319
437
 
438
+ ## 2.1.1 (2018-09-30)
439
+
440
+ * [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
441
+ XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
442
+
443
+ When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
444
+ specially crafted HTML fragment can cause libxml2 to generate improperly
445
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
446
+ elements.
447
+
448
+ Sanitize now performs additional escaping on affected attributes to prevent
449
+ this.
450
+
451
+ Many thanks to the Shopify Application Security Team for responsibly reporting
452
+ this issue.
453
+
454
+ [176]:https://github.com/rgrove/sanitize/issues/176
455
+ [188]:https://github.com/rgrove/sanitize/pull/188
456
+
457
+
320
458
  ## 2.1.0 (2014-01-13)
321
459
 
322
- * Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
460
+ * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
323
461
  symbol `:data` instead of an attribute name in the `:attributes` config to
324
462
  indicate that arbitrary data attributes should be allowed on an element.
325
463
 
@@ -400,12 +538,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
400
538
  the default depth-first mode.
401
539
 
402
540
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
403
- elements to the whitelists for the basic and relaxed configs.
541
+ elements to the allowlists for the basic and relaxed configs.
404
542
 
405
543
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
406
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
544
+ `ruby`, and `wbr` elements to the allowlist for the relaxed config.
407
545
 
408
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
546
+ * The `dir`, `lang`, and `title` attributes are now allowlisted for all
409
547
  elements in the relaxed config.
410
548
 
411
549
  * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -416,7 +554,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
416
554
  ## 1.2.1 (2010-04-20)
417
555
 
418
556
  * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
419
- remove the contents of all non-whitelisted elements in addition to the
557
+ remove the contents of all non-allowlisted elements in addition to the
420
558
  elements themselves. If set to an array of element names, Sanitize will
421
559
  remove the contents of only those elements (when filtered), and leave the
422
560
  contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -444,7 +582,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
444
582
  * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
445
583
  all its children.
446
584
 
447
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
585
+ * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
448
586
  David Reese]
449
587
 
450
588
 
@@ -464,7 +602,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
464
602
 
465
603
  * Added a workaround for an Hpricot bug that prevents attribute names from
466
604
  being downcased in recent versions of Hpricot. This was exploitable to
467
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
605
+ prevent non-allowlisted protocols from being cleaned. [Reported by Ben
468
606
  Wanicur]
469
607
 
470
608
 
@@ -494,7 +632,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
494
632
 
495
633
  ## 1.0.5 (2009-02-05)
496
634
 
497
- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
635
+ * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
498
636
  protocols from being cleaned when relative URLs were allowed. [Reported by
499
637
  Dev Purkayastha]
500
638
 
@@ -504,7 +642,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
504
642
 
505
643
  ## 1.0.4 (2009-01-16)
506
644
 
507
- * Fixed a bug that made it possible to sneak a non-whitelisted element through
645
+ * Fixed a bug that made it possible to sneak a non-allowlisted element through
508
646
  by repeating it several times in a row. All versions of Sanitize prior to
509
647
  1.0.4 are vulnerable. [Reported by Cristobal]
510
648
 
@@ -512,7 +650,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
512
650
  ## 1.0.3 (2009-01-15)
513
651
 
514
652
  * Fixed a bug whereby incomplete Unicode or hex entities could be used to
515
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
653
+ prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
516
654
  still decode the incomplete entities, users of those browsers may be
517
655
  vulnerable to malicious script injection on websites using versions of
518
656
  Sanitize prior to 1.0.3.
data/README.md CHANGED
@@ -1,20 +1,19 @@
1
1
  Sanitize
2
2
  ========
3
3
 
4
- Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of acceptable
5
- elements, attributes, and CSS properties, Sanitize will remove all unacceptable
6
- HTML and/or CSS from a string.
4
+ Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
5
+ and/or CSS from a string except the elements, attributes, and properties you
6
+ choose to allow.
7
7
 
8
8
  Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
9
9
  elements, certain attributes within those elements, and even certain URL
10
- protocols within attributes that contain URLs. You can also whitelist CSS
11
- properties, @ rules, and URL protocols you wish to allow in elements or
12
- attributes containing CSS. Any HTML or CSS that you don't explicitly allow will
13
- be removed.
10
+ protocols within attributes that contain URLs. You can also allow specific CSS
11
+ properties, @ rules, and URL protocols in elements or attributes containing CSS.
12
+ Any HTML or CSS that you don't explicitly allow will be removed.
14
13
 
15
14
  Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
16
15
  exactly the same way modern browsers do, and [Crass][crass], which parses CSS
17
- exactly the same way modern browsers do. As long as your whitelist config only
16
+ exactly the same way modern browsers do. As long as your allowlist config only
18
17
  allows safe markup and CSS, even the most malformed or malicious input will be
19
18
  transformed into safe output.
20
19
 
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
73
72
  * Standalone CSS stylesheets
74
73
  * Standalone CSS properties
75
74
 
75
+ However, please note that Sanitize _cannot_ fully sanitize the contents of
76
+ `<math>` or `<svg>` elements, since these elements don't follow the same parsing
77
+ rules as the rest of HTML. If this is something you need, you may want to look
78
+ for another solution.
79
+
76
80
  ### HTML Fragments
77
81
 
78
82
  A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
88
92
  # => 'foo'
89
93
  ```
90
94
 
91
- To keep certain elements, add them to the element whitelist.
95
+ To keep certain elements, add them to the element allowlist.
92
96
 
93
97
  ```ruby
94
98
  Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
97
101
 
98
102
  ### HTML Documents
99
103
 
100
- When sanitizing a document, the `<html>` element must be whitelisted. You can
104
+ When sanitizing a document, the `<html>` element must be allowlisted. You can
101
105
  also set `:allow_doctype` to `true` to allow well-formed document type
102
106
  definitions.
103
107
 
@@ -123,8 +127,8 @@ Sanitize.document(html,
123
127
 
124
128
  ### CSS in HTML
125
129
 
126
- To sanitize CSS in an HTML fragment or document, first whitelist the `<style>`
127
- element and/or the `style` attribute. Then whitelist the CSS properties,
130
+ To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
131
+ element and/or the `style` attribute. Then allowlist the CSS properties,
128
132
  @ rules, and URL protocols you wish to allow. You can also choose whether to
129
133
  allow CSS comments or browser compatibility hacks.
130
134
 
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
267
271
 
268
272
  ```ruby
269
273
  # Create a customized copy of the Basic config, adding <div> and <table> to the
270
- # existing whitelisted elements.
274
+ # existing allowlisted elements.
271
275
  Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
272
276
  :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
273
277
  :remove_contents => true
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
395
399
 
396
400
  ##### :css => :properties (Array or Set)
397
401
 
398
- Whitelist of CSS property names to allow. Names should be specified in
399
- lowercase.
402
+ List of CSS property names to allow. Names should be specified in lowercase.
400
403
 
401
404
  ##### :css => :protocols (Array or Set)
402
405
 
@@ -417,6 +420,23 @@ elements not in this array will be removed.
417
420
  ]
418
421
  ```
419
422
 
423
+ **Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
424
+ elements, since these elements don't follow the same parsing rules as the rest
425
+ of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
426
+ content inside them will be allowed, even if that content would otherwise be
427
+ removed by Sanitize.
428
+
429
+ #### :parser_options (Hash)
430
+
431
+ [Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
432
+
433
+ ```ruby
434
+ :parser_options => {
435
+ max_errors: -1,
436
+ max_tree_depth: -1
437
+ }
438
+ ```
439
+
420
440
  #### :protocols (Hash)
421
441
 
422
442
  URL protocols to allow in specific attributes. If an attribute is listed here
@@ -441,13 +461,13 @@ include the symbol `:relative` in the protocol array:
441
461
 
442
462
  #### :remove_contents (boolean or Array or Set)
443
463
 
444
- If set to `true`, Sanitize will remove the contents of any non-whitelisted
464
+ If this is `true`, Sanitize will remove the contents of any non-allowlisted
445
465
  elements in addition to the elements themselves. By default, Sanitize leaves the
446
466
  safe parts of an element's contents behind when the element is removed.
447
467
 
448
- If set to an array of element names, then only the contents of the specified
449
- elements (when filtered) will be removed, and the contents of all other filtered
450
- elements will be left behind.
468
+ If this is an Array or Set of element names, then only the contents of the
469
+ specified elements (when filtered) will be removed, and the contents of all
470
+ other filtered elements will be left behind.
451
471
 
452
472
  The default value is `false`.
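
Reviewer note: the three documented forms of `:remove_contents` can be summarized as a small decision function. This is only a sketch of the rules as the README words them. `remove_contents_of?` is a hypothetical name and this is not Sanitize's internal code. It answers one question: given the setting, are the contents of a filtered element with this name removed?

```ruby
require 'set'

# Hypothetical helper mirroring the documented rules for :remove_contents.
#   true         -> contents of every filtered element are removed
#   Array or Set -> contents are removed only for the named elements
#   anything else (including false) -> contents are left behind
def remove_contents_of?(setting, element_name)
  case setting
  when true then true
  when Array, Set then setting.map(&:to_s).include?(element_name.to_s)
  else false
  end
end

puts remove_contents_of?(true, 'div')
puts remove_contents_of?(['script', 'style'], 'div')
puts remove_contents_of?(Set['script', 'style'], 'script')
```

The Array branch is the case the 5.0.0 history entry is about: before that fix, an Array setting behaved as if it were `true`.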
453
473
 
@@ -474,6 +494,15 @@ children, in which case it will be inserted after those children.
474
494
  }
475
495
  ```
476
496
 
497
+ The default elements with whitespace added before and after are:
498
+
499
+ ```
500
+ address article aside blockquote br dd div dl dt
501
+ footer h1 h2 h3 h4 h5 h6 header hgroup hr li nav
502
+ ol p pre section ul
503
+
504
+ ```
505
+
477
506
  ## Transformers
478
507
 
479
508
  Transformers allow you to filter and modify HTML nodes using your own custom
@@ -498,33 +527,33 @@ argument a Hash that contains the following items:
498
527
 
499
528
  * **:config** - The current Sanitize configuration Hash.
500
529
 
501
- * **:is_whitelisted** - `true` if the current node has been whitelisted by a
530
+ * **:is_allowlisted** - `true` if the current node has been allowlisted by a
502
531
  previous transformer, `false` otherwise. It's generally bad form to remove
503
- a node that a previous transformer has whitelisted.
532
+ a node that a previous transformer has allowlisted.
504
533
 
505
534
  * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
506
535
  node may be an element, a text node, a comment, a CDATA node, or a document
507
536
  fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
508
537
  selectively ignore node types you aren't interested in.
509
538
 
539
+ * **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
540
+ document that have been allowlisted by previous transformers, if any. It's
541
+ generally bad form to remove a node that a previous transformer has
542
+ allowlisted.
543
+
510
544
  * **:node_name** - The name of the current HTML node, always lowercase (e.g.
511
545
  "div" or "span"). For non-element nodes, the name will be something like
512
546
  "text", "comment", "#cdata-section", "#document-fragment", etc.
513
547
 
514
- * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
515
- document that have been whitelisted by previous transformers, if any. It's
516
- generally bad form to remove a node that a previous transformer has
517
- whitelisted.
518
-
519
548
  ### Output
520
549
 
521
550
  A transformer doesn't have to return anything, but may optionally return a Hash,
522
551
  which may contain the following items:
523
552
 
524
- * **:node_whitelist** - Array or Set of specific Nokogiri::XML::Node objects
525
- to add to the document's whitelist, bypassing the current Sanitize config.
526
- These specific nodes and all their attributes will be whitelisted, but
527
- their children will not be.
553
+ * **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
554
+ objects to add to the document's allowlist, bypassing the current Sanitize
555
+ config. These specific nodes and all their attributes will be allowlisted,
556
+ but their children will not be.
528
557
 
529
558
  If a transformer returns anything other than a Hash, the return value will be
530
559
  ignored.
@@ -567,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
567
596
  completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
568
597
  your own hands.
569
598
 
570
- ### Example: Transformer to whitelist image URLs by domain
599
+ ### Example: Transformer to allow image URLs by domain
571
600
 
572
601
  The following example demonstrates how to remove image elements unless they use
573
602
  a relative URL or are hosted on a specific domain. It assumes that the `<img>`
574
- element and its `src` attribute are already whitelisted.
603
+ element and its `src` attribute are already allowlisted.
575
604
 
576
605
  ```ruby
577
606
  require 'uri'
578
607
 
579
- image_whitelist_transformer = lambda do |env|
608
+ image_allowlist_transformer = lambda do |env|
580
609
  # Ignore everything except <img> elements.
581
610
  return unless env[:node_name] == 'img'
582
611
 
@@ -592,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
592
621
  end
593
622
  ```
594
623
 
595
- ### Example: Transformer to whitelist YouTube video embeds
624
+ ### Example: Transformer to allow YouTube video embeds
596
625
 
597
626
  The following example demonstrates how to create a transformer that will safely
598
- whitelist valid YouTube video embeds without having to blindly allow other kinds
599
- of embedded content, which would be the case if you tried to do this by just
600
- whitelisting all `<iframe>` elements:
627
+ allow valid YouTube video embeds without having to allow other kinds of embedded
628
+ content, which would be the case if you tried to do this by just allowing all
629
+ `<iframe>` elements:
601
630
 
602
631
  ```ruby
603
632
  youtube_transformer = lambda do |env|
604
633
  node = env[:node]
605
634
  node_name = env[:node_name]
606
635
 
607
- # Don't continue if this node is already whitelisted or is not an element.
608
- return if env[:is_whitelisted] || !node.element?
636
+ # Don't continue if this node is already allowlisted or is not an element.
637
+ return if env[:is_allowlisted] || !node.element?
609
638
 
610
639
  # Don't continue unless the node is an iframe.
611
640
  return unless node_name == 'iframe'
@@ -626,8 +655,8 @@ youtube_transformer = lambda do |env|
626
655
 
627
656
  # Now that we're sure that this is a valid YouTube embed and that there are
628
657
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
629
- # to whitelist the current node.
630
- {:node_whitelist => [node]}
658
+ # to allowlist the current node.
659
+ {:node_allowlist => [node]}
631
660
  end
632
661
 
633
662
  html = %[