sanitize 5.1.0 → 6.0.0

Potentially problematic release.

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8cf7bac25cea64ed464d106bdc57019388598ca9f1a4e7d8eddf3a98bab12267
- data.tar.gz: e8b1f402b0d67a825b0ad4aad83829816fd9c78cd8445879636cba0a282e8ee5
+ metadata.gz: 94a37503617774f9317150c834cc3025cd32a718be754fb72eea1b9dd7347571
+ data.tar.gz: 597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92
  SHA512:
- metadata.gz: 956edaca6569a5933223da0aa7dcac4880b5164aa59e37256ac896c9fefb271da71425defe7e09e241b1333b441f5a2629893abed6d5a2a47d0726bf03597614
- data.tar.gz: e45a018b904bcf8cb996f8ed08427e80b8ce058c4fe414782460c5496e88bb6c2a4055304118057621a630e514b4f96bac11bdc686181a6f0097dc7bf912ab04
+ metadata.gz: c6d2dedfa9d6a589788d4156babae09cf14b3bebc765a9bb04a492aa5b5702f82dc3ae26d45199da3e8f9c096dfd191d15c53fea8d62084a3679604be5f7ddba
+ data.tar.gz: 70bbb00756f1a4a085ad5901b27fd91ebc4308d5f42bfa57ec54c8cc7982ded8395eff9b59546ca62f3dba6e7a012351d62f9ec81b06aa8ccbb563211f39bd3c
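For readers who want to compare a downloaded gem against the values above, here is a minimal sketch. It assumes the `metadata.gz` or `data.tar.gz` member has already been extracted from the `.gem` archive and read into memory. The helper name `checksum_matches?` and the sample bytes are hypothetical; only Ruby's standard `digest` library is used.

```ruby
require 'digest'

# Hypothetical helper: compares the SHA256 hex digest of some bytes with an
# expected value such as the ones listed in checksums.yaml.
def checksum_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Illustration only: these bytes stand in for the contents of data.tar.gz.
sample = 'example bytes'
expected = Digest::SHA256.hexdigest(sample)

puts checksum_matches?(sample, expected)        # true
puts checksum_matches?(sample + '!', expected)  # false
```

In practice the bytes would come from something like `File.binread` on the extracted file, and the expected value from the `SHA256` section of `checksums.yaml`.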
data/HISTORY.md CHANGED
@@ -1,5 +1,90 @@
  # Sanitize History
 
+ ## 6.0.0 (2021-08-03)
+
+ ### Potentially Breaking Changes
+
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+ [211]:https://github.com/rgrove/sanitize/pull/211
+
+ ## 5.2.3 (2021-01-11)
+
+ ### Bug Fixes
+
+ * Ensure protocol sanitization is applied to data attributes.
+ [@ccutrer - #207][207]
+
+ [207]:https://github.com/rgrove/sanitize/pull/207
+
+ ## 5.2.2 (2021-01-06)
+
+ ### Bug Fixes
+
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+ custom transformer. [@mscrivo - #206][206]
+
+ [206]:https://github.com/rgrove/sanitize/pull/206
+
+ ## 5.2.1 (2020-06-16)
+
+ ### Bug Fixes
+
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+ Sanitize versions 3.0.0 through 5.2.0.
+
+ When HTML was sanitized using the "relaxed" config or a custom config that
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
+ have been sanitized correctly even if `math` and `svg` were not in the
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+ config or a custom config that allows one or more of the following HTML
+ elements:
+
+ - `iframe`
+ - `math`
+ - `noembed`
+ - `noframes`
+ - `noscript`
+ - `plaintext`
+ - `script`
+ - `style`
+ - `svg`
+ - `xmp`
+
+ See the security advisory for more details, including a workaround if you're
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+ helping to verify the fix.
+
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
+ ## 5.2.0 (2020-06-06)
+
+ ### Changes
+
+ * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
+ source and documentation.
+
+ While the etymology of "whitelist" may not be explicitly racist in origin or
+ intent, there are inherent racial connotations in the implication that white
+ is good and black (as in "blacklist") is not.
+
+ This is a change I should have made long ago, and I apologize for not making
+ it sooner.
+
+ * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
+ deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
+ The old keys will continue to work in order to avoid breaking existing code,
+ but they are no longer documented and may be removed in a future semver major
+ release.
+
  ## 5.1.0 (2019-09-07)
 
  ### Features
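The 5.2.0 entry in this hunk deprecates the `:is_whitelisted` transformer key in favor of `:is_allowlisted`. A custom transformer that has to run against Sanitize releases on both sides of that change could read whichever key is present. The sketch below is one possible way to write that, not code from Sanitize: `already_allowed?` is a hypothetical helper, and the plain hashes stand in for the `env` argument a transformer receives.

```ruby
# Hypothetical helper: prefers the new key and falls back to the deprecated
# one when the new key is absent (as it would be before Sanitize 5.2.0).
def already_allowed?(env)
  env.fetch(:is_allowlisted) { env[:is_whitelisted] } ? true : false
end

puts already_allowed?({ :is_allowlisted => true })   # true
puts already_allowed?({ :is_whitelisted => true })   # true (older Sanitize)
puts already_allowed?({ :is_allowlisted => false })  # false
puts already_allowed?({})                            # false
```

Using `fetch` with a block, rather than `||`, keeps an explicit `false` under the new key from being overridden by the old key.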
@@ -45,7 +130,7 @@ review the changes below carefully.
  - `script`
  - `style`
 
- * Children of whitelisted `iframe` elements are now always removed. In modern
+ * Children of allowlisted `iframe` elements are now always removed. In modern
  HTML, `iframe` elements should never have children. In HTML 4 and earlier
  `iframe` elements were allowed to contain fallback content for legacy
  browsers, but it's been almost two decades since that was useful.
@@ -84,7 +169,7 @@ review the changes below carefully.
 
  When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
  specially crafted HTML fragment can cause libxml2 to generate improperly
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
  elements.
 
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -128,7 +213,7 @@ review the changes below carefully.
 
  ## 4.4.0 (2016-09-29)
 
- * Added `srcset` to the attribute whitelist for `img` elements in the relaxed
+ * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
  config. [@ejtttje - #156][156]
 
  [156]:https://github.com/rgrove/sanitize/pull/156
@@ -249,7 +334,7 @@ review the changes below carefully.
  ## 3.0.4 (2014-12-12)
 
  * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
- caused the URL to be removed even when the protocol was whitelisted.
+ caused the URL to be removed even when the protocol was allowlisted.
  [@benubois - #126][126]
 
  [126]:https://github.com/rgrove/sanitize/pull/126
@@ -258,7 +343,7 @@ review the changes below carefully.
  ## 3.0.3 (2014-10-29)
 
  * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
- `@media` block, causing them to be removed even when whitelist rules should
+ `@media` block, causing them to be removed even when allowlist rules should
  have allowed them to remain. [#121][121]
 
  [121]:https://github.com/rgrove/sanitize/issues/121
@@ -323,7 +408,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  * The `clean_node!` method was renamed to `node!`.
 
  * The `document` method now raises a `Sanitize::Error` if the `<html>` element
- isn't whitelisted, rather than a `RuntimeError`. This error is also now raised
+ isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
  regardless of the `:remove_contents` config setting.
 
  * The `:output` config has been removed. Output is now always HTML, not XHTML.
@@ -334,7 +419,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  * Added advanced CSS sanitization support using [Crass][crass], which is fully
  compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
- whitelisted `<style>` elements and `style` attributes in HTML will be
+ allowlisted `<style>` elements and `style` attributes in HTML will be
  sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
  sanitize CSS stylesheets or properties.
 
@@ -386,7 +471,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
  specially crafted HTML fragment can cause libxml2 to generate improperly
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
  elements.
 
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -401,7 +486,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  ## 2.1.0 (2014-01-13)
 
- * Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
+ * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
  symbol `:data` instead of an attribute name in the `:attributes` config to
  indicate that arbitrary data attributes should be allowed on an element.
 
@@ -482,12 +567,12 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  the default depth-first mode.
 
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
- elements to the whitelists for the basic and relaxed configs.
+ elements to the allowlists for the basic and relaxed configs.
 
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
+ `ruby`, and `wbr` elements to the allowlist for the relaxed config.
 
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
+ * The `dir`, `lang`, and `title` attributes are now allowlisted for all
  elements in the relaxed config.
 
  * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
@@ -498,7 +583,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  ## 1.2.1 (2010-04-20)
 
  * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
- remove the contents of all non-whitelisted elements in addition to the
+ remove the contents of all non-allowlisted elements in addition to the
  elements themselves. If set to an array of element names, Sanitize will
  remove the contents of only those elements (when filtered), and leave the
  contents of other filtered elements. [Thanks to Rafael Souza for the array
@@ -526,7 +611,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
  all its children.
 
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
+ * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
  David Reese]
 
 
@@ -546,7 +631,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  * Added a workaround for an Hpricot bug that prevents attribute names from
  being downcased in recent versions of Hpricot. This was exploitable to
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
+ prevent non-allowlisted protocols from being cleaned. [Reported by Ben
  Wanicur]
 
 
@@ -576,7 +661,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  ## 1.0.5 (2009-02-05)
 
- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
+ * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
  protocols from being cleaned when relative URLs were allowed. [Reported by
  Dev Purkayastha]
 
@@ -586,7 +671,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
 
  ## 1.0.4 (2009-01-16)
 
- * Fixed a bug that made it possible to sneak a non-whitelisted element through
+ * Fixed a bug that made it possible to sneak a non-allowlisted element through
  by repeating it several times in a row. All versions of Sanitize prior to
  1.0.4 are vulnerable. [Reported by Cristobal]
 
@@ -594,7 +679,7 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  ## 1.0.3 (2009-01-15)
 
  * Fixed a bug whereby incomplete Unicode or hex entities could be used to
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
+ prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
  still decode the incomplete entities, users of those browsers may be
  vulnerable to malicious script injection on websites using versions of
  Sanitize prior to 1.0.3.
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2015 Ryan Grove <ryan@wonko.com>
+ Copyright (c) 2021 Ryan Grove <ryan@wonko.com>
 
  Permission is hereby granted, free of charge, to any person obtaining a copy of
  this software and associated documentation files (the 'Software'), to deal in
data/README.md CHANGED
@@ -1,28 +1,27 @@
  Sanitize
  ========
 
- Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of acceptable
- elements, attributes, and CSS properties, Sanitize will remove all unacceptable
- HTML and/or CSS from a string.
+ Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all HTML
+ and/or CSS from a string except the elements, attributes, and properties you
+ choose to allow.
 
  Using a simple configuration syntax, you can tell Sanitize to allow certain HTML
  elements, certain attributes within those elements, and even certain URL
- protocols within attributes that contain URLs. You can also whitelist CSS
- properties, @ rules, and URL protocols you wish to allow in elements or
- attributes containing CSS. Any HTML or CSS that you don't explicitly allow will
- be removed.
+ protocols within attributes that contain URLs. You can also allow specific CSS
+ properties, @ rules, and URL protocols in elements or attributes containing CSS.
+ Any HTML or CSS that you don't explicitly allow will be removed.
 
- Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
+ Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
  exactly the same way modern browsers do, and [Crass][crass], which parses CSS
- exactly the same way modern browsers do. As long as your whitelist config only
+ exactly the same way modern browsers do. As long as your allowlist config only
  allows safe markup and CSS, even the most malformed or malicious input will be
  transformed into safe output.
 
- [![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
  [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
+ [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
 
  [crass]:https://github.com/rgrove/crass
- [gumbo]:https://github.com/google/gumbo-parser
+ [nokogumbo]:https://github.com/rubys/nokogumbo
 
  Links
  -----
@@ -73,6 +72,11 @@ Sanitize can sanitize the following types of input:
  * Standalone CSS stylesheets
  * Standalone CSS properties
 
+ However, please note that Sanitize _cannot_ fully sanitize the contents of
+ `<math>` or `<svg>` elements, since these elements don't follow the same parsing
+ rules as the rest of HTML. If this is something you need, you may want to look
+ for another solution.
+
  ### HTML Fragments
 
  A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -88,7 +92,7 @@ Sanitize.fragment(html)
  # => 'foo'
  ```
 
- To keep certain elements, add them to the element whitelist.
+ To keep certain elements, add them to the element allowlist.
 
  ```ruby
  Sanitize.fragment(html, :elements => ['b'])
@@ -97,7 +101,7 @@ Sanitize.fragment(html, :elements => ['b'])
 
  ### HTML Documents
 
- When sanitizing a document, the `<html>` element must be whitelisted. You can
+ When sanitizing a document, the `<html>` element must be allowlisted. You can
  also set `:allow_doctype` to `true` to allow well-formed document type
  definitions.
 
@@ -123,8 +127,8 @@ Sanitize.document(html,
 
  ### CSS in HTML
 
- To sanitize CSS in an HTML fragment or document, first whitelist the `<style>`
- element and/or the `style` attribute. Then whitelist the CSS properties,
+ To sanitize CSS in an HTML fragment or document, first allowlist the `<style>`
+ element and/or the `style` attribute. Then allowlist the CSS properties,
  @ rules, and URL protocols you wish to allow. You can also choose whether to
  allow CSS comments or browser compatibility hacks.
 
@@ -267,7 +271,7 @@ new copy using `Sanitize::Config.merge()`, like so:
 
  ```ruby
  # Create a customized copy of the Basic config, adding <div> and <table> to the
- # existing whitelisted elements.
+ # existing allowlisted elements.
  Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
  :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
  :remove_contents => true
@@ -395,8 +399,7 @@ Proc.new { |url| url.start_with?("https://fonts.googleapis.com") }
 
  ##### :css => :properties (Array or Set)
 
- Whitelist of CSS property names to allow. Names should be specified in
- lowercase.
+ List of CSS property names to allow. Names should be specified in lowercase.
 
  ##### :css => :protocols (Array or Set)
 
@@ -417,9 +420,15 @@ elements not in this array will be removed.
  ]
  ```
 
+ **Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
+ elements, since these elements don't follow the same parsing rules as the rest
+ of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
+ content inside them will be allowed, even if that content would otherwise be
+ removed by Sanitize.
+
  #### :parser_options (Hash)
 
- [Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
+ [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
 
  ```ruby
  :parser_options => {
@@ -452,7 +461,7 @@ include the symbol `:relative` in the protocol array:
 
  #### :remove_contents (boolean or Array or Set)
 
- If this is `true`, Sanitize will remove the contents of any non-whitelisted
+ If this is `true`, Sanitize will remove the contents of any non-allowlisted
  elements in addition to the elements themselves. By default, Sanitize leaves the
  safe parts of an element's contents behind when the element is removed.
 
@@ -460,7 +469,7 @@ If this is an Array or Set of element names, then only the contents of the
  specified elements (when filtered) will be removed, and the contents of all
  other filtered elements will be left behind.
 
- The default value is `false`.
+ The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
 
  #### :transformers (Array or callable)
 
@@ -518,33 +527,33 @@ argument a Hash that contains the following items:
 
  * **:config** - The current Sanitize configuration Hash.
 
- * **:is_whitelisted** - `true` if the current node has been whitelisted by a
+ * **:is_allowlisted** - `true` if the current node has been allowlisted by a
  previous transformer, `false` otherwise. It's generally bad form to remove
- a node that a previous transformer has whitelisted.
+ a node that a previous transformer has allowlisted.
 
  * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
  node may be an element, a text node, a comment, a CDATA node, or a document
  fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
  selectively ignore node types you aren't interested in.
 
+ * **:node_allowlist** - Set of `Nokogiri::XML::Node` objects in the current
+ document that have been allowlisted by previous transformers, if any. It's
+ generally bad form to remove a node that a previous transformer has
+ allowlisted.
+
  * **:node_name** - The name of the current HTML node, always lowercase (e.g.
  "div" or "span"). For non-element nodes, the name will be something like
  "text", "comment", "#cdata-section", "#document-fragment", etc.
 
- * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
- document that have been whitelisted by previous transformers, if any. It's
- generally bad form to remove a node that a previous transformer has
- whitelisted.
-
  ### Output
 
  A transformer doesn't have to return anything, but may optionally return a Hash,
  which may contain the following items:
 
- * **:node_whitelist** - Array or Set of specific Nokogiri::XML::Node objects
- to add to the document's whitelist, bypassing the current Sanitize config.
- These specific nodes and all their attributes will be whitelisted, but
- their children will not be.
+ * **:node_allowlist** - Array or Set of specific `Nokogiri::XML::Node`
+ objects to add to the document's allowlist, bypassing the current Sanitize
+ config. These specific nodes and all their attributes will be allowlisted,
+ but their children will not be.
 
  If a transformer returns anything other than a Hash, the return value will be
  ignored.
@@ -587,16 +596,16 @@ Transformers have a tremendous amount of power, including the power to
  completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
  your own hands.
 
- ### Example: Transformer to whitelist image URLs by domain
+ ### Example: Transformer to allow image URLs by domain
 
  The following example demonstrates how to remove image elements unless they use
  a relative URL or are hosted on a specific domain. It assumes that the `<img>`
- element and its `src` attribute are already whitelisted.
+ element and its `src` attribute are already allowlisted.
 
  ```ruby
  require 'uri'
 
- image_whitelist_transformer = lambda do |env|
+ image_allowlist_transformer = lambda do |env|
  # Ignore everything except <img> elements.
  return unless env[:node_name] == 'img'
 
@@ -612,20 +621,20 @@ image_whitelist_transformer = lambda do |env|
  end
  ```
 
- ### Example: Transformer to whitelist YouTube video embeds
+ ### Example: Transformer to allow YouTube video embeds
 
  The following example demonstrates how to create a transformer that will safely
- whitelist valid YouTube video embeds without having to blindly allow other kinds
- of embedded content, which would be the case if you tried to do this by just
- whitelisting all `<iframe>` elements:
+ allow valid YouTube video embeds without having to allow other kinds of embedded
+ content, which would be the case if you tried to do this by just allowing all
+ `<iframe>` elements:
 
  ```ruby
  youtube_transformer = lambda do |env|
  node = env[:node]
  node_name = env[:node_name]
 
- # Don't continue if this node is already whitelisted or is not an element.
- return if env[:is_whitelisted] || !node.element?
+ # Don't continue if this node is already allowlisted or is not an element.
+ return if env[:is_allowlisted] || !node.element?
 
  # Don't continue unless the node is an iframe.
  return unless node_name == 'iframe'
@@ -646,8 +655,8 @@ youtube_transformer = lambda do |env|
 
  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
- # to whitelist the current node.
- {:node_whitelist => [node]}
+ # to allowlist the current node.
+ {:node_allowlist => [node]}
  end
 
  html = %[
@@ -658,25 +667,3 @@ html = %[
  Sanitize.fragment(html, :transformers => youtube_transformer)
  # => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
  ```
-
- License
- -------
-
- Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
-
- Permission is hereby granted, free of charge, to any person obtaining a copy of
- this software and associated documentation files (the 'Software'), to deal in
- the Software without restriction, including without limitation the rights to
- use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
- the Software, and to permit persons to whom the Software is furnished to do so,
- subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
- FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
- COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
- IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -74,7 +74,7 @@ class Sanitize
  # the specified elements (when filtered) will be removed, and the contents
  # of all other filtered elements will be left behind.
  :remove_contents => %w[
- iframe noembed noframes noscript script style
+ iframe math noembed noframes noscript plaintext script style svg xmp
  ],
 
  # Transformers allow you to filter or alter nodes using custom logic. See
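To see which element names this hunk adds to the default `:remove_contents` list, the two lists from the diff can be compared directly. This is only an illustration using plain arrays; the variable names are made up and nothing here calls into Sanitize.

```ruby
# Lists copied from the removed and added lines of the hunk above.
old_default = %w[iframe noembed noframes noscript script style]
new_default = %w[iframe math noembed noframes noscript plaintext script style svg xmp]

added   = new_default - old_default
removed = old_default - new_default

puts added.inspect    # ["math", "plaintext", "svg", "xmp"]
puts removed.inspect  # []
```

All four added names also appear in the list of affected elements in the 5.2.1 security fix entry in HISTORY.md.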
@@ -6,7 +6,7 @@ class Sanitize
  :elements => BASIC[:elements] + %w[
  address article aside bdi bdo body caption col colgroup data del div
  figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
- img ins main nav rp rt ruby section span style summary sup table tbody
+ img ins main nav rp rt ruby section span style summary table tbody
  td tfoot th thead title tr wbr
  ],
 
data/lib/sanitize/css.rb CHANGED
@@ -175,7 +175,7 @@ class Sanitize; class CSS
  next prop
 
  when :semicolon
- # Only preserve the semicolon if it was preceded by a whitelisted
+ # Only preserve the semicolon if it was preceded by an allowlisted
  # property. Otherwise, omit it in order to prevent redundant semicolons.
  if preceded_by_property
  preceded_by_property = false
@@ -296,7 +296,7 @@ class Sanitize; class CSS
  end
 
  # Returns `true` if the given node (which may be of type `:url` or
- # `:function`, since the CSS syntax can produce both) uses a whitelisted
+ # `:function`, since the CSS syntax can produce both) uses an allowlisted
  # protocol.
  def valid_url?(node)
  type = node[:node]
@@ -6,7 +6,7 @@ class Sanitize; module Transformers
  node = env[:node]
 
  if node.type == Nokogiri::XML::Node::COMMENT_NODE
- node.unlink unless env[:is_whitelisted]
+ node.unlink unless env[:is_allowlisted]
  end
  end
 
@@ -1,6 +1,6 @@
  class Sanitize; module Transformers; module CSS
 
- # Enforces a CSS whitelist on the contents of `style` attributes.
+ # Enforces a CSS allowlist on the contents of `style` attributes.
  class CleanAttribute
  def initialize(sanitizer_or_config)
  if Sanitize::CSS === sanitizer_or_config
@@ -14,7 +14,7 @@ class CleanAttribute
  node = env[:node]
 
  return unless node.type == Nokogiri::XML::Node::ELEMENT_NODE &&
- node.key?('style') && !env[:is_whitelisted]
+ node.key?('style') && !env[:is_allowlisted]
 
  attr = node.attribute('style')
  css = @scss.properties(attr.value)
@@ -27,7 +27,7 @@ class CleanAttribute
  end
  end
 
- # Enforces a CSS whitelist on the contents of `<style>` elements.
+ # Enforces a CSS allowlist on the contents of `<style>` elements.
  class CleanElement
  def initialize(sanitizer_or_config)
  if Sanitize::CSS === sanitizer_or_config
@@ -3,7 +3,7 @@
  class Sanitize; module Transformers
 
  CleanDoctype = lambda do |env|
- return if env[:is_whitelisted]
+ return if env[:is_allowlisted]
 
  node = env[:node]
 
@@ -76,11 +76,11 @@ class Sanitize; module Transformers; class CleanElement
 
  def call(env)
  node = env[:node]
- return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_whitelisted]
+ return if node.type != Nokogiri::XML::Node::ELEMENT_NODE || env[:is_allowlisted]
 
  name = env[:node_name]
 
- # Delete any element that isn't in the config whitelist, unless the node has
+ # Delete any element that isn't in the config allowlist, unless the node has
  # already been deleted from the document.
  #
  # It's important that we not try to reparent the children of a node that has
@@ -107,34 +107,31 @@ class Sanitize; module Transformers; class CleanElement
  return
  end
 
- attr_whitelist = @attributes[name] || @attributes[:all]
+ attr_allowlist = @attributes[name] || @attributes[:all]
 
- if attr_whitelist.nil?
- # Delete all attributes from elements with no whitelisted attributes.
+ if attr_allowlist.nil?
+ # Delete all attributes from elements with no allowlisted attributes.
  node.attribute_nodes.each {|attr| attr.unlink }
  else
- allow_data_attributes = attr_whitelist.include?(:data)
+ allow_data_attributes = attr_allowlist.include?(:data)
 
  # Delete any attribute that isn't allowed on this element.
  node.attribute_nodes.each do |attr|
  attr_name = attr.name.downcase
 
- unless attr_whitelist.include?(attr_name)
- # The attribute isn't whitelisted.
+ unless attr_allowlist.include?(attr_name)
+ # The attribute isn't in the allowlist, but may still be allowed if
+ # it's a data attribute.
 
- if allow_data_attributes && attr_name.start_with?('data-')
- # Arbitrary data attributes are allowed. If this is a data
- # attribute, continue.
- next if attr_name =~ REGEX_DATA_ATTR
+ unless allow_data_attributes && attr_name.start_with?('data-') && attr_name =~ REGEX_DATA_ATTR
+ # Either the attribute isn't a data attribute or arbitrary data
+ # attributes aren't allowed. Remove the attribute.
+ attr.unlink
+ next
  end
-
- # Either the attribute isn't a data attribute or arbitrary data
- # attributes aren't allowed. Remove the attribute.
- attr.unlink
- next
  end
 
- # The attribute is whitelisted.
+ # The attribute is allowed.
 
  # Remove any attributes that use unacceptable protocols.
  if @protocols.include?(name) && @protocols[name].include?(attr_name)
@@ -162,7 +159,7 @@ class Sanitize; module Transformers; class CleanElement
162
159
  # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
163
160
  # attempt to preserve server-side includes. This can result in XSS since
164
161
  # an unescaped double quote can allow an attacker to inject a
165
- # non-whitelisted attribute.
162
+ # non-allowlisted attribute.
166
163
  #
167
164
  # Sanitize works around this by implementing its own escaping for
168
165
  # affected attributes, some of which can exist on any element and some
@@ -191,7 +188,7 @@ class Sanitize; module Transformers; class CleanElement
191
188
  # Element-specific special cases.
192
189
  case name
193
190
 
194
- # If this is a whitelisted iframe that has children, remove all its
191
+ # If this is an allowlisted iframe that has children, remove all its
195
192
  # children. The HTML standard says iframes shouldn't have content, but when
196
193
  # they do, this content is parsed as text and is serialized verbatim without
197
194
  # being escaped, which is unsafe because legacy browsers may still render it
@@ -1,5 +1,5 @@
1
1
  # encoding: utf-8
2
2
 
3
3
  class Sanitize
4
- VERSION = '5.1.0'
4
+ VERSION = '6.0.0'
5
5
  end
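Note on the `CleanElement` attribute hunk above: in 5.1.0 a permitted data attribute hit `next` immediately, so the protocol check further down never ran for it. In 6.0.0 only rejected attributes are unlinked and skipped, and permitted data attributes fall through to the later checks. The allow/deny decision itself can be sketched roughly as follows. This is a minimal model, not the library's code: `surviving_attributes` is a hypothetical helper, and `DATA_ATTR_PATTERN` is a simplified stand-in for `REGEX_DATA_ATTR`, whose real definition is not shown in this diff.

```ruby
require 'set'

# Hypothetical, simplified stand-in for Sanitize's REGEX_DATA_ATTR.
DATA_ATTR_PATTERN = /\Adata-[a-z_][\w.-]*\z/

# Returns the (downcased) attribute names that would survive the allowlist
# check. A nil allowlist means every attribute is removed.
def surviving_attributes(attr_names, attr_allowlist)
  return [] if attr_allowlist.nil?

  allow_data_attributes = attr_allowlist.include?(:data)

  attr_names.map(&:downcase).select do |attr_name|
    # Explicitly allowlisted attributes are always kept.
    next true if attr_allowlist.include?(attr_name)

    # Otherwise keep the attribute only if arbitrary data attributes are
    # allowed and the name looks like a data attribute.
    allow_data_attributes &&
      attr_name.start_with?('data-') &&
      attr_name.match?(DATA_ATTR_PATTERN)
  end
end

allowlist = Set.new(['href', :data])
p surviving_attributes(%w[href onclick data-url DATA-Id], allowlist)
# => ["href", "data-url", "data-id"]
```

In the real transformer, the attributes kept here would still be subject to protocol sanitization afterwards.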
data/lib/sanitize.rb CHANGED
@@ -1,6 +1,6 @@
 # encoding: utf-8
 
-require 'nokogumbo'
+require 'nokogiri'
 require 'set'
 
 require_relative 'sanitize/version'
@@ -54,7 +54,7 @@ class Sanitize
54
54
  # Returns a sanitized copy of the given full _html_ document, using the
55
55
  # settings in _config_ if specified.
56
56
  #
57
- # When sanitizing a document, the `<html>` element must be whitelisted or an
57
+ # When sanitizing a document, the `<html>` element must be allowlisted or an
58
58
  # error will be raised. If this is undesirable, you should probably use
59
59
  # {#fragment} instead.
60
60
  def self.document(html, config = {})
@@ -117,7 +117,7 @@ class Sanitize
117
117
 
118
118
  # Returns a sanitized copy of the given _html_ document.
119
119
  #
120
- # When sanitizing a document, the `<html>` element must be whitelisted or an
120
+ # When sanitizing a document, the `<html>` element must be allowlisted or an
121
121
  # error will be raised. If this is undesirable, you should probably use
122
122
  # {#fragment} instead.
123
123
  def document(html)
@@ -147,20 +147,20 @@ class Sanitize
147
147
  # in place.
148
148
  #
149
149
  # If _node_ is a `Nokogiri::XML::Document`, the `<html>` element must be
150
- # whitelisted or an error will be raised.
150
+ # allowlisted or an error will be raised.
151
151
  def node!(node)
152
152
  raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
153
153
 
154
154
  if node.is_a?(Nokogiri::XML::Document)
155
155
  unless @config[:elements].include?('html')
156
- raise Error, 'When sanitizing a document, "<html>" must be whitelisted.'
156
+ raise Error, 'When sanitizing a document, "<html>" must be allowlisted.'
157
157
  end
158
158
  end
159
159
 
160
- node_whitelist = Set.new
160
+ node_allowlist = Set.new
161
161
 
162
162
  traverse(node) do |n|
163
- transform_node!(n, node_whitelist)
163
+ transform_node!(n, node_allowlist)
164
164
  end
165
165
 
166
166
  node
@@ -189,7 +189,7 @@ class Sanitize
189
189
  node.to_html(preserve_newline: true)
190
190
  end
191
191
 
192
- def transform_node!(node, node_whitelist)
192
+ def transform_node!(node, node_allowlist)
193
193
  @transformers.each do |transformer|
194
194
  # Since transform_node! may be called in a tight loop to process thousands
195
195
  # of items, we can optimize both memory and CPU performance by:
@@ -199,15 +199,19 @@ class Sanitize
       # does merge! create a new hash, it is also 2.6x slower:
       # https://github.com/JuanitoFatas/fast-ruby#hashmerge-vs-hashmerge-code
       config = @transformer_config
-      config[:is_whitelisted] = node_whitelist.include?(node)
+      config[:is_allowlisted] = config[:is_whitelisted] = node_allowlist.include?(node)
       config[:node] = node
       config[:node_name] = node.name.downcase
-      config[:node_whitelist] = node_whitelist
+      config[:node_allowlist] = config[:node_whitelist] = node_allowlist
 
-      result = transformer.call(config)
+      result = transformer.call(**config)
 
-      if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
-        node_whitelist.merge(result[:node_whitelist])
+      if result.is_a?(Hash)
+        result_allowlist = result[:node_allowlist] || result[:node_whitelist]
+
+        if result_allowlist.respond_to?(:each)
+          node_allowlist.merge(result_allowlist)
+        end
       end
     end
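Note on the `transform_node!` hunk: the new code accepts a transformer result under either `:node_allowlist` or the legacy `:node_whitelist` key, preferring the new one. A rough, standalone model of that merge step is sketched below. `merge_transformer_result` is a hypothetical helper name, and plain symbols stand in for Nokogiri nodes.

```ruby
require 'set'

# Simplified model of how a transformer's return value is folded into the
# node allowlist. Non-hash results and non-enumerable values are ignored.
def merge_transformer_result(node_allowlist, result)
  return node_allowlist unless result.is_a?(Hash)

  # Prefer the new key, fall back to the legacy one.
  result_allowlist = result[:node_allowlist] || result[:node_whitelist]
  node_allowlist.merge(result_allowlist) if result_allowlist.respond_to?(:each)
  node_allowlist
end

allowlist = Set.new
merge_transformer_result(allowlist, { node_allowlist: [:div] })  # new key
merge_transformer_result(allowlist, { node_whitelist: [:span] }) # legacy key
merge_transformer_result(allowlist, nil)                         # ignored
p allowlist.to_a
# => [:div, :span]
```

Because the legacy key is still read, transformers written for 5.x that return `:node_whitelist` should keep working under this logic.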
213
217
 
@@ -162,7 +162,7 @@ describe 'Sanitize::Transformers::CleanElement' do
162
162
  }
163
163
 
164
164
  describe 'Default config' do
165
- it 'should remove non-whitelisted elements, leaving safe contents behind' do
165
+ it 'should remove non-allowlisted elements, leaving safe contents behind' do
166
166
  Sanitize.fragment('foo <b>bar</b> <strong><a href="#a">baz</a></strong> quux')
167
167
  .must_equal 'foo bar baz quux'
168
168
 
@@ -192,21 +192,16 @@ describe 'Sanitize::Transformers::CleanElement' do
192
192
  .must_equal ''
193
193
  end
194
194
 
195
- it 'should escape the content of removed `plaintext` elements' do
196
- Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
197
- .must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
198
- end
199
-
200
- it 'should escape the content of removed `xmp` elements' do
201
- Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
202
- .must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
203
- end
204
-
205
195
  it 'should not preserve the content of removed `iframe` elements' do
206
196
  Sanitize.fragment('<iframe>hello! <script>alert(0)</script></iframe>')
207
197
  .must_equal ''
208
198
  end
209
199
 
200
+ it 'should not preserve the content of removed `math` elements' do
201
+ Sanitize.fragment('<math>hello! <script>alert(0)</script></math>')
202
+ .must_equal ''
203
+ end
204
+
210
205
  it 'should not preserve the content of removed `noembed` elements' do
211
206
  Sanitize.fragment('<noembed>hello! <script>alert(0)</script></noembed>')
212
207
  .must_equal ''
@@ -222,6 +217,11 @@ describe 'Sanitize::Transformers::CleanElement' do
222
217
  .must_equal ''
223
218
  end
224
219
 
220
+ it 'should not preserve the content of removed `plaintext` elements' do
221
+ Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
222
+ .must_equal ''
223
+ end
224
+
225
225
  it 'should not preserve the content of removed `script` elements' do
226
226
  Sanitize.fragment('<script>hello! <script>alert(0)</script></script>')
227
227
  .must_equal ''
@@ -232,6 +232,16 @@ describe 'Sanitize::Transformers::CleanElement' do
232
232
  .must_equal ''
233
233
  end
234
234
 
235
+ it 'should not preserve the content of removed `svg` elements' do
236
+ Sanitize.fragment('<svg>hello! <script>alert(0)</script></svg>')
237
+ .must_equal ''
238
+ end
239
+
240
+ it 'should not preserve the content of removed `xmp` elements' do
241
+ Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
242
+ .must_equal ''
243
+ end
244
+
235
245
  strings.each do |name, data|
236
246
  it "should clean #{name} HTML" do
237
247
  Sanitize.fragment(data[:html]).must_equal(data[:default])
@@ -315,7 +325,7 @@ describe 'Sanitize::Transformers::CleanElement' do
315
325
  end
316
326
 
317
327
  describe 'Custom configs' do
318
- it 'should allow attributes on all elements if whitelisted under :all' do
328
+ it 'should allow attributes on all elements if allowlisted under :all' do
319
329
  input = '<p class="foo">bar</p>'
320
330
 
321
331
  Sanitize.fragment(input).must_equal ' bar '
@@ -336,7 +346,7 @@ describe 'Sanitize::Transformers::CleanElement' do
336
346
  }).must_equal input
337
347
  end
338
348
 
339
- it "should not allow relative URLs when relative URLs aren't whitelisted" do
349
+ it "should not allow relative URLs when relative URLs aren't allowlisted" do
340
350
  input = '<a href="/foo/bar">Link</a>'
341
351
 
342
352
  Sanitize.fragment(input,
@@ -400,7 +410,7 @@ describe 'Sanitize::Transformers::CleanElement' do
400
410
  ).must_equal 'foo bar baz hi '
401
411
  end
402
412
 
403
- it 'should remove the contents of whitelisted iframes' do
413
+ it 'should remove the contents of allowlisted iframes' do
404
414
  Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
405
415
  :elements => ['iframe']
406
416
  ).must_equal '<iframe></iframe>'
@@ -481,6 +491,22 @@ describe 'Sanitize::Transformers::CleanElement' do
481
491
  }).must_equal "<a>Text</a>"
482
492
  end
483
493
 
494
+ it 'should sanitize protocols in data attributes even if data attributes are generically allowed' do
495
+ input = '<a data-url="mailto:someone@example.com">Text</a>'
496
+
497
+ Sanitize.fragment(input, {
498
+ :elements => ['a'],
499
+ :attributes => {'a' => [:data]},
500
+ :protocols => {'a' => {'data-url' => ['https']}}
501
+ }).must_equal "<a>Text</a>"
502
+
503
+ Sanitize.fragment(input, {
504
+ :elements => ['a'],
505
+ :attributes => {'a' => [:data]},
506
+ :protocols => {'a' => {'data-url' => ['mailto']}}
507
+ }).must_equal input
508
+ end
509
+
484
510
  it 'should prevent `<meta>` tags from being used to set a non-UTF-8 charset' do
485
511
  Sanitize.document('<html><head><meta charset="utf-8"></head><body>Howdy!</body></html>',
486
512
  :elements => %w[html head meta body],
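Note on the data-attribute protocol test added above: it expects `data-url="mailto:..."` to be removed when only `https` is allowed for `data-url`, and kept when `mailto` is allowed. The idea behind such a check can be sketched as below. This is an illustration under assumptions: `protocol_allowed?` and `SCHEME_PATTERN` are hypothetical names, and the pattern is deliberately simpler than whatever Sanitize uses internally (for example, it ignores encoded colons).

```ruby
# Hypothetical, simplified scheme matcher: optional leading whitespace, then a
# URL scheme followed by a colon.
SCHEME_PATTERN = /\A\s*([a-z][a-z0-9+.-]*):/i

# Returns true if the attribute value uses an allowed protocol. Values without
# a scheme are treated as relative URLs and need :relative in the list.
def protocol_allowed?(value, allowed_protocols)
  match = SCHEME_PATTERN.match(value)
  return allowed_protocols.include?(:relative) if match.nil?

  allowed_protocols.include?(match[1].downcase)
end

p protocol_allowed?('mailto:someone@example.com', ['https'])  # => false
p protocol_allowed?('mailto:someone@example.com', ['mailto']) # => true
p protocol_allowed?('/foo/bar', ['http'])                     # => false
p protocol_allowed?('/foo/bar', ['http', :relative])          # => true
```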
@@ -128,13 +128,15 @@ describe 'Malicious HTML' do
128
128
 
129
129
  # libxml2 >= 2.9.2 doesn't escape comments within some attributes, in an
130
130
  # attempt to preserve server-side includes. This can result in XSS since an
131
- # unescaped double quote can allow an attacker to inject a non-whitelisted
131
+ # unescaped double quote can allow an attacker to inject a non-allowlisted
132
132
  # attribute. Sanitize works around this by implementing its own escaping for
133
133
  # affected attributes.
134
134
  #
135
135
  # The relevant libxml2 code is here:
136
136
  # <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
137
137
  describe 'unsafe libxml2 server-side includes in attributes' do
138
+ using_unpatched_libxml2 = Nokogiri::VersionInfo.instance.libxml2_using_system?
139
+
138
140
  tag_configs = [
139
141
  {
140
142
  tag_name: 'a',
@@ -166,6 +168,8 @@ describe 'Malicious HTML' do
166
168
  input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
167
169
 
168
170
  it 'should escape unsafe characters in attributes' do
171
+ skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
172
+
169
173
  # This uses Nokogumbo's HTML-compliant serializer rather than
170
174
  # libxml2's.
171
175
  @s.fragment(input).
@@ -191,6 +195,8 @@ describe 'Malicious HTML' do
191
195
  input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
192
196
 
193
197
  it 'should not escape characters unnecessarily' do
198
+ skip "behavior should only exist in nokogiri's patched libxml" if using_unpatched_libxml2
199
+
194
200
  # This uses Nokogumbo's HTML-compliant serializer rather than
195
201
  # libxml2's.
196
202
  @s.fragment(input).
@@ -213,4 +219,17 @@ describe 'Malicious HTML' do
213
219
  end
214
220
  end
215
221
  end
222
+
223
+ # https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
224
+ describe 'foreign content bypass in relaxed config' do
225
+ it 'prevents a sanitization bypass via carefully crafted foreign content' do
226
+ %w[iframe noembed noframes noscript plaintext script style xmp].each do |tag_name|
227
+ @s.fragment(%[<math><#{tag_name}>/*&lt;/#{tag_name}&gt;&lt;img src onerror=alert(1)>*/]).
228
+ must_equal ''
229
+
230
+ @s.fragment(%[<svg><#{tag_name}>/*&lt;/#{tag_name}&gt;&lt;img src onerror=alert(1)>*/]).
231
+ must_equal ''
232
+ end
233
+ end
234
+ end
216
235
  end
data/test/test_parser.rb CHANGED
@@ -55,7 +55,7 @@ describe 'Parser' do
55
55
  siblings << env[:node][:id]
56
56
  end
57
57
 
58
- return {:node_whitelist => [env[:node]]}
58
+ return {:node_allowlist => [env[:node]]}
59
59
  })
60
60
 
61
61
  # All siblings should be traversed, and in the order added.
@@ -53,9 +53,9 @@ describe 'Sanitize' do
53
53
  @s.document("a#{sample_non_chars}z").must_equal "<html>az</html>"
54
54
  end
55
55
 
56
- describe 'when html body exceeds Nokogumbo::DEFAULT_MAX_TREE_DEPTH' do
56
+ describe 'when html body exceeds Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH' do
57
57
  let(:content) do
58
- content = nest_html_content('<b>foo</b>', Nokogumbo::DEFAULT_MAX_TREE_DEPTH)
58
+ content = nest_html_content('<b>foo</b>', Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
59
59
  "<html>#{content}</html>"
60
60
  end
61
61
 
@@ -115,9 +115,9 @@ describe 'Sanitize' do
115
115
  @s.fragment("a#{sample_non_chars}z").must_equal "az"
116
116
  end
117
117
 
118
- describe 'when html body exceeds Nokogumbo::DEFAULT_MAX_TREE_DEPTH' do
118
+ describe 'when html body exceeds Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH' do
119
119
  let(:content) do
120
- content = nest_html_content('<b>foo</b>', Nokogumbo::DEFAULT_MAX_TREE_DEPTH)
120
+ content = nest_html_content('<b>foo</b>', Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
121
121
  "<body>#{content}</body>"
122
122
  end
123
123
 
@@ -150,7 +150,7 @@ describe 'Sanitize' do
150
150
  frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
151
151
  end
152
152
 
153
- describe "when the given node is a document and <html> isn't whitelisted" do
153
+ describe "when the given node is a document and <html> isn't allowlisted" do
154
154
  it 'should raise a Sanitize::Error' do
155
155
  doc = Nokogiri::HTML5.parse('foo')
156
156
  proc { @s.node!(doc) }.must_raise Sanitize::Error
@@ -21,7 +21,7 @@ describe 'Sanitize::CSS' do
21
21
  @custom.properties(css).must_equal 'background: #fff; '
22
22
  end
23
23
 
24
- it 'should allow whitelisted URL protocols' do
24
+ it 'should allow allowlisted URL protocols' do
25
25
  [
26
26
  "background: url(relative.jpg)",
27
27
  "background: url('relative.jpg')",
@@ -36,7 +36,7 @@ describe 'Sanitize::CSS' do
36
36
  end
37
37
  end
38
38
 
39
- it 'should not allow non-whitelisted URL protocols' do
39
+ it 'should not allow non-allowlisted URL protocols' do
40
40
  [
41
41
  "background: url(javascript:alert(0))",
42
42
  "background: url(ja\\56 ascript:alert(0))",
@@ -307,7 +307,7 @@ describe 'Sanitize::CSS' do
307
307
  end
308
308
 
309
309
  describe ":at_rules" do
310
- it "should remove blockless at-rules that aren't whitelisted" do
310
+ it "should remove blockless at-rules that aren't allowlisted" do
311
311
  css = %[
312
312
  @charset 'utf-8';
313
313
  @import url('foo.css');
@@ -319,7 +319,7 @@ describe 'Sanitize::CSS' do
319
319
  ].strip
320
320
  end
321
321
 
322
- describe "when blockless at-rules are whitelisted" do
322
+ describe "when blockless at-rules are allowlisted" do
323
323
  before do
324
324
  @scss = Sanitize::CSS.new(Sanitize::Config.merge(Sanitize::Config::RELAXED[:css], {
325
325
  :at_rules => ['charset', 'import']
@@ -12,11 +12,13 @@ describe 'Transformers' do
12
12
  return unless env[:node].element?
13
13
 
14
14
  env[:config][:foo].must_equal :bar
15
- env[:is_whitelisted].must_equal false
15
+ env[:is_allowlisted].must_equal false
16
+ env[:is_whitelisted].must_equal env[:is_allowlisted]
16
17
  env[:node].must_be_kind_of Nokogiri::XML::Node
17
18
  env[:node_name].must_equal 'span'
18
- env[:node_whitelist].must_be_kind_of Set
19
- env[:node_whitelist].must_be_empty
19
+ env[:node_allowlist].must_be_kind_of Set
20
+ env[:node_allowlist].must_be_empty
21
+ env[:node_whitelist].must_equal env[:node_allowlist]
20
22
  }
21
23
  )
22
24
  end
@@ -43,34 +45,38 @@ describe 'Transformers' do
43
45
  nodes.must_equal %w[div span strong b p]
44
46
  end
45
47
 
46
- it 'should whitelist nodes in the node whitelist' do
48
+ it 'should allowlist nodes in the node allowlist' do
47
49
  Sanitize.fragment('<div class="foo">foo</div><span>bar</span>',
48
50
  :transformers => [
49
51
  proc {|env|
50
- {:node_whitelist => [env[:node]]} if env[:node_name] == 'div'
52
+ {:node_allowlist => [env[:node]]} if env[:node_name] == 'div'
51
53
  },
52
54
 
53
55
  proc {|env|
54
- env[:is_whitelisted].must_equal false unless env[:node_name] == 'div'
55
- env[:is_whitelisted].must_equal true if env[:node_name] == 'div'
56
- env[:node_whitelist].must_include env[:node] if env[:node_name] == 'div'
56
+ env[:is_allowlisted].must_equal false unless env[:node_name] == 'div'
57
+ env[:is_allowlisted].must_equal true if env[:node_name] == 'div'
58
+ env[:node_allowlist].must_include env[:node] if env[:node_name] == 'div'
59
+ env[:is_whitelisted].must_equal env[:is_allowlisted]
60
+ env[:node_whitelist].must_equal env[:node_allowlist]
57
61
  }
58
62
  ]
59
63
  ).must_equal '<div class="foo">foo</div>bar'
60
64
  end
61
65
 
62
- it 'should clear the node whitelist after each fragment' do
66
+ it 'should clear the node allowlist after each fragment' do
63
67
  called = false
64
68
 
65
69
  Sanitize.fragment('<div>foo</div>',
66
- :transformers => proc {|env| {:node_whitelist => [env[:node]]}}
70
+ :transformers => proc {|env| {:node_allowlist => [env[:node]]}}
67
71
  )
68
72
 
69
73
  Sanitize.fragment('<div>foo</div>',
70
74
  :transformers => proc {|env|
71
75
  called = true
72
- env[:is_whitelisted].must_equal false
73
- env[:node_whitelist].must_be_empty
76
+ env[:is_allowlisted].must_equal false
77
+ env[:is_whitelisted].must_equal env[:is_allowlisted]
78
+ env[:node_allowlist].must_be_empty
79
+ env[:node_whitelist].must_equal env[:node_allowlist]
74
80
  }
75
81
  )
76
82
 
@@ -83,10 +89,10 @@ describe 'Transformers' do
83
89
  .must_equal(' foo ')
84
90
  end
85
91
 
86
- describe 'Image whitelist transformer' do
92
+ describe 'Image allowlist transformer' do
87
93
  require 'uri'
88
94
 
89
- image_whitelist_transformer = lambda do |env|
95
+ image_allowlist_transformer = lambda do |env|
90
96
  # Ignore everything except <img> elements.
91
97
  return unless env[:node_name] == 'img'
92
98
 
@@ -103,7 +109,7 @@ describe 'Transformers' do
103
109
 
104
110
  before do
105
111
  @s = Sanitize.new(Sanitize::Config.merge(Sanitize::Config::RELAXED,
106
- :transformers => image_whitelist_transformer))
112
+ :transformers => image_allowlist_transformer))
107
113
  end
108
114
 
109
115
  it 'should allow images with relative URLs' do
@@ -142,8 +148,8 @@ describe 'Transformers' do
142
148
  node = env[:node]
143
149
  node_name = env[:node_name]
144
150
 
145
- # Don't continue if this node is already whitelisted or is not an element.
146
- return if env[:is_whitelisted] || !node.element?
151
+ # Don't continue if this node is already allowlisted or is not an element.
152
+ return if env[:is_allowlisted] || !node.element?
147
153
 
148
154
  # Don't continue unless the node is an iframe.
149
155
  return unless node_name == 'iframe'
@@ -164,8 +170,8 @@ describe 'Transformers' do
164
170
 
165
171
  # Now that we're sure that this is a valid YouTube embed and that there are
166
172
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
167
- # to whitelist the current node.
168
- {:node_whitelist => [node]}
173
+ # to allowlist the current node.
174
+ {:node_allowlist => [node]}
169
175
  end
170
176
 
171
177
  it 'should allow HTTP YouTube video embeds' do
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sanitize
3
3
  version: !ruby/object:Gem::Version
4
- version: 5.1.0
4
+ version: 6.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ryan Grove
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-09-08 00:00:00.000000000 Z
11
+ date: 2021-08-04 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: crass
@@ -30,59 +30,45 @@ dependencies:
30
30
  requirements:
31
31
  - - ">="
32
32
  - !ruby/object:Gem::Version
33
- version: 1.8.0
33
+ version: 1.12.0
34
34
  type: :runtime
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
38
  - - ">="
39
39
  - !ruby/object:Gem::Version
40
- version: 1.8.0
41
- - !ruby/object:Gem::Dependency
42
- name: nokogumbo
43
- requirement: !ruby/object:Gem::Requirement
44
- requirements:
45
- - - "~>"
46
- - !ruby/object:Gem::Version
47
- version: '2.0'
48
- type: :runtime
49
- prerelease: false
50
- version_requirements: !ruby/object:Gem::Requirement
51
- requirements:
52
- - - "~>"
53
- - !ruby/object:Gem::Version
54
- version: '2.0'
40
+ version: 1.12.0
55
41
  - !ruby/object:Gem::Dependency
56
42
  name: minitest
57
43
  requirement: !ruby/object:Gem::Requirement
58
44
  requirements:
59
45
  - - "~>"
60
46
  - !ruby/object:Gem::Version
61
- version: 5.11.3
47
+ version: 5.14.4
62
48
  type: :development
63
49
  prerelease: false
64
50
  version_requirements: !ruby/object:Gem::Requirement
65
51
  requirements:
66
52
  - - "~>"
67
53
  - !ruby/object:Gem::Version
68
- version: 5.11.3
54
+ version: 5.14.4
69
55
  - !ruby/object:Gem::Dependency
70
56
  name: rake
71
57
  requirement: !ruby/object:Gem::Requirement
72
58
  requirements:
73
59
  - - "~>"
74
60
  - !ruby/object:Gem::Version
75
- version: 12.3.1
61
+ version: 13.0.6
76
62
  type: :development
77
63
  prerelease: false
78
64
  version_requirements: !ruby/object:Gem::Requirement
79
65
  requirements:
80
66
  - - "~>"
81
67
  - !ruby/object:Gem::Version
82
- version: 12.3.1
83
- description: Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of
84
- acceptable elements, attributes, and CSS properties, Sanitize will remove all unacceptable
85
- HTML and/or CSS from a string.
68
+ version: 13.0.6
69
+ description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
70
+ HTML and/or CSS from a string except the elements, attributes, and properties you
71
+ choose to allow.
86
72
  email: ryan@wonko.com
87
73
  executables: []
88
74
  extensions: []
@@ -120,7 +106,7 @@ homepage: https://github.com/rgrove/sanitize/
120
106
  licenses:
121
107
  - MIT
122
108
  metadata: {}
123
- post_install_message:
109
+ post_install_message:
124
110
  rdoc_options: []
125
111
  require_paths:
126
112
  - lib
@@ -128,15 +114,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
128
114
  requirements:
129
115
  - - ">="
130
116
  - !ruby/object:Gem::Version
131
- version: 2.1.0
117
+ version: 2.5.0
132
118
  required_rubygems_version: !ruby/object:Gem::Requirement
133
119
  requirements:
134
120
  - - ">="
135
121
  - !ruby/object:Gem::Version
136
122
  version: 1.2.0
137
123
  requirements: []
138
- rubygems_version: 3.0.3
139
- signing_key:
124
+ rubygems_version: 3.2.22
125
+ signing_key:
140
126
  specification_version: 4
141
- summary: Whitelist-based HTML and CSS sanitizer.
127
+ summary: Allowlist-based HTML and CSS sanitizer.
142
128
  test_files: []
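Note on the metadata changes: the gemspec now requires Ruby >= 2.5.0 and raises the `>=` runtime requirement from 1.8.0 to 1.12.0, which per the 6.0.0 changelog entry is Nokogiri. A quick way to check whether an environment meets these constraints before upgrading is sketched below, using RubyGems' own version classes. The `REQUIREMENTS` table and `satisfies?` helper are hypothetical names for illustration.

```ruby
require 'rubygems'

# Requirements taken from the 6.0.0 gemspec metadata shown above.
REQUIREMENTS = {
  'ruby'     => Gem::Requirement.new('>= 2.5.0'),
  'nokogiri' => Gem::Requirement.new('>= 1.12.0')
}.freeze

# Returns true if the given version string satisfies the named requirement.
def satisfies?(name, version)
  REQUIREMENTS.fetch(name).satisfied_by?(Gem::Version.new(version))
end

puts satisfies?('ruby', RUBY_VERSION)  # depends on the running interpreter
puts satisfies?('nokogiri', '1.11.7')  # false
puts satisfies?('nokogiri', '1.12.0')  # true
```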