sanitize 5.2.1 → 6.1.3

This diff shows the changes between two publicly available versions of this package, as they appear in the public registry they were released to. It is provided for informational purposes only.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3d1290690a9d32db9e06b8fb19c7e285c94a1d91ed51a4eb7e96389e427348d9
4
- data.tar.gz: 5131063daf1763c83978954bed9ee3a783099e40aa71e50de26d06b8ae0c1054
3
+ metadata.gz: 8811451060f77afcf698da8e589994af3c5683d08a0032a279f76b3b556b5f33
4
+ data.tar.gz: d2617a785428b5b99717ef1743cc75dc1f8c53bda53fea59050725a1218b5fe8
5
5
  SHA512:
6
- metadata.gz: bfcb7cda6aa70590f642583b41936bc09d8929210046cebdd0d0ff452ccb3213844b4c40d4e205e79c0cd64a2a0d56e16790e38f4c8f247b8abfa32dbec22297
7
- data.tar.gz: 0ea5a6d6848f9a125f17e4e23145adff4d3c4ccfe30a3407466fae074ed33cbd4b1869eb5a9f0a72b808449b8cf166a3695c2a6d63b16a83b047fd260bfe50bd
6
+ metadata.gz: 33b4b13b4369ba159031a1298bd5965b9dbe15921121b58d55155d3e717dd7cadf3495e10683613cd1439055f6d5a57249e540824fd9f98a11ae62db08167573
7
+ data.tar.gz: 40f149e0e3c51283b72332efb4598a81263fba029ea121ede31bb578634de339ed5c162fd49355601568c5cbc08f617879f058bcdfe5ce35afa6322e155cff8b
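The checksums.yaml hunk records new SHA256 and SHA512 digests for the two archives packed inside the .gem file (metadata.gz and data.tar.gz). A minimal sketch of recomputing the SHA256 values for comparison follows. It assumes a locally downloaded .gem file; the helper name `gem_entry_digests` is hypothetical and not part of RubyGems or Sanitize.

```ruby
require 'digest'
require 'rubygems/package'

# Hypothetical helper: reads a .gem archive (a plain tar file) from `io` and
# returns the SHA256 hex digest of each entry that checksums.yaml covers.
def gem_entry_digests(io)
  digests = {}
  Gem::Package::TarReader.new(io) do |tar|
    tar.each do |entry|
      next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

      digests[entry.full_name] = Digest::SHA256.hexdigest(entry.read)
    end
  end
  digests
end

# Usage, assuming the gem was fetched with `gem fetch sanitize -v 6.1.3`:
#   File.open('sanitize-6.1.3.gem', 'rb') { |f| gem_entry_digests(f) }
```

If the archive is intact, the two values returned should match the `+` SHA256 lines above.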
data/HISTORY.md CHANGED
@@ -1,5 +1,135 @@
1
1
  # Sanitize History
2
2
 
3
+ ## 6.1.3 (2024-08-14)
4
+
5
+ ### Bug Fixes
6
+
7
+ * The CSS URL protocol allowlist is now enforced on the nonstandard `-webkit-image-set` CSS function. [@ltk - #242][242]
8
+
9
+ [242]:https://github.com/rgrove/sanitize/pull/242
10
+
11
+ ## 6.1.2 (2024-07-27)
12
+
13
+ ### Bug Fixes
14
+
15
+ * The CSS URL protocol allowlist is now properly enforced in [CSS Images Module Level 4](https://drafts.csswg.org/css-images-4/) `image` and `image-set` functions. [@ltk - #240][240]
16
+
17
+ [240]:https://github.com/rgrove/sanitize/pull/240
18
+
19
+ ## 6.1.1 (2024-06-12)
20
+
21
+ ### Bug Fixes
22
+
23
+ * Proactively fixed a compatibility issue with libxml >= 2.13.0 (which will be used in an upcoming version of Nokogiri) that caused HTML doctype sanitization to fail. [@flavorjones - #238][238]
24
+
25
+ [238]:https://github.com/rgrove/sanitize/pull/238
26
+
27
+ ## 6.1.0 (2023-09-14)
28
+
29
+ ### Features
30
+
31
+ * Added the `text-decoration-skip-ink` and `text-decoration-thickness` CSS properties to the relaxed config. [@martineriksson - #228][228]
32
+
33
+ [228]:https://github.com/rgrove/sanitize/pull/228
34
+
35
+ ## 6.0.2 (2023-07-06)
36
+
37
+ ### Bug Fixes
38
+
39
+ * CVE-2023-36823: Fixed an HTML+CSS sanitization bypass that could allow XSS
40
+ (cross-site scripting). This issue affects Sanitize versions 3.0.0 through
41
+ 6.0.1.
42
+
43
+ When using Sanitize's relaxed config or a custom config that allows `<style>`
44
+ elements and one or more CSS at-rules, carefully crafted input could be used
45
+ to sneak arbitrary HTML through Sanitize.
46
+
47
+ See the following security advisory for additional details:
48
+ [GHSA-f5ww-cq3m-q3g7](https://github.com/rgrove/sanitize/security/advisories/GHSA-f5ww-cq3m-q3g7)
49
+
50
+ Thanks to @cure53 for finding this issue.
51
+
52
+ ## 6.0.1 (2023-01-27)
53
+
54
+ ### Bug Fixes
55
+
56
+ * Sanitize now always removes `<noscript>` elements and their contents, even
57
+ when `noscript` is in the allowlist.
58
+
59
+ This fixes a sanitization bypass that could occur when `noscript` was allowed
60
+ by a custom allowlist. In this scenario, carefully crafted input could sneak
61
+ arbitrary HTML through Sanitize, potentially enabling an XSS (cross-site
62
+ scripting) attack.
63
+
64
+ Sanitize's default configs don't allow `<noscript>` elements and are not
65
+ vulnerable. This issue only affects users who are using a custom config that
66
+ adds `noscript` to the element allowlist.
67
+
68
+ The root cause of this issue is that HTML parsing rules treat the contents of
69
+ a `<noscript>` element differently depending on whether scripting is enabled
70
+ in the user agent. Nokogiri doesn't support scripting so it follows the
71
+ "scripting disabled" rules, but a web browser with scripting enabled will
72
+ follow the "scripting enabled" rules. This means that Sanitize can't reliably
73
+ make the contents of a `<noscript>` element safe for scripting enabled
74
+ browsers, so the safest thing to do is to remove the element and its contents
75
+ entirely.
76
+
77
+ See the following security advisory for additional details:
78
+ [GHSA-fw3g-2h3j-qmm7](https://github.com/rgrove/sanitize/security/advisories/GHSA-fw3g-2h3j-qmm7)
79
+
80
+ Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
81
+ (@leeN) for reporting this issue.
82
+
83
+ * Fixed an edge case in which the contents of an "unescaped text" element (such
84
+ as `<noembed>` or `<xmp>`) were not properly escaped if that element was
85
+ allowlisted and was also inside an allowlisted `<math>` or `<svg>` element.
86
+
87
+ The only way to encounter this situation was to ignore multiple warnings in
88
+ the readme and create a custom config that allowlisted all the elements
89
+ involved, including `<math>` or `<svg>`. If you're using a default config or
90
+ if you heeded the warnings about MathML and SVG not being supported, you're
91
+ not affected by this issue.
92
+
93
+ Please let this be a reminder that Sanitize cannot safely sanitize MathML or
94
+ SVG content and does not support this use case. The default configs don't
95
+ allow MathML or SVG elements, and allowlisting MathML or SVG elements in a
96
+ custom config may create a security vulnerability in your application.
97
+
98
+ Documentation has been updated to add more warnings and to make the existing
99
+ warnings about this more prominent.
100
+
101
+ Thanks to David Klein from [TU Braunschweig](https://www.tu-braunschweig.de/en/ias)
102
+ (@leeN) for reporting this issue.
103
+
104
+ ## 6.0.0 (2021-08-03)
105
+
106
+ ### Potentially Breaking Changes
107
+
108
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
109
+
110
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
111
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
112
+
113
+ [211]:https://github.com/rgrove/sanitize/pull/211
114
+
115
+ ## 5.2.3 (2021-01-11)
116
+
117
+ ### Bug Fixes
118
+
119
+ * Ensure protocol sanitization is applied to data attributes.
120
+ [@ccutrer - #207][207]
121
+
122
+ [207]:https://github.com/rgrove/sanitize/pull/207
123
+
124
+ ## 5.2.2 (2021-01-06)
125
+
126
+ ### Bug Fixes
127
+
128
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
129
+ custom transformer. [@mscrivo - #206][206]
130
+
131
+ [206]:https://github.com/rgrove/sanitize/pull/206
132
+
3
133
  ## 5.2.1 (2020-06-16)
4
134
 
5
135
  ### Bug Fixes
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
1
- Copyright (c) 2015 Ryan Grove <ryan@wonko.com>
1
+ Copyright (c) 2021 Ryan Grove <ryan@wonko.com>
2
2
 
3
3
  Permission is hereby granted, free of charge, to any person obtaining a copy of
4
4
  this software and associated documentation files (the 'Software'), to deal in
data/README.md CHANGED
@@ -11,27 +11,26 @@ protocols within attributes that contain URLs. You can also allow specific CSS
11
11
  properties, @ rules, and URL protocols in elements or attributes containing CSS.
12
12
  Any HTML or CSS that you don't explicitly allow will be removed.
13
13
 
14
- Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
15
- exactly the same way modern browsers do, and [Crass][crass], which parses CSS
16
- exactly the same way modern browsers do. As long as your allowlist config only
17
- allows safe markup and CSS, even the most malformed or malicious input will be
18
- transformed into safe output.
14
+ Sanitize is based on the [Nokogiri HTML5 parser][nokogiri], which parses HTML
15
+ the same way modern browsers do, and [Crass][crass], which parses CSS the same
16
+ way modern browsers do. As long as your allowlist config only allows safe markup
17
+ and CSS, even the most malformed or malicious input will be transformed into
18
+ safe output.
19
19
 
20
- [![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
21
20
  [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
21
+ [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
22
22
 
23
23
  [crass]:https://github.com/rgrove/crass
24
- [gumbo]:https://github.com/google/gumbo-parser
24
+ [nokogiri]:https://github.com/sparklemotion/nokogiri
25
25
 
26
26
  Links
27
27
  -----
28
28
 
29
29
  * [Home](https://github.com/rgrove/sanitize/)
30
- * [API Docs](http://rubydoc.info/github/rgrove/sanitize/master)
30
+ * [API Docs](https://rubydoc.info/github/rgrove/sanitize/Sanitize)
31
31
  * [Issues](https://github.com/rgrove/sanitize/issues)
32
- * [Release History](https://github.com/rgrove/sanitize/blob/master/HISTORY.md#sanitize-history)
33
- * [Online Demo](https://sanitize.herokuapp.com/)
34
- * [Biased comparison of Ruby HTML sanitization libraries](https://github.com/rgrove/sanitize/blob/master/COMPARISON.md)
32
+ * [Release History](https://github.com/rgrove/sanitize/releases)
33
+ * [Online Demo](https://sanitize-web.fly.dev/)
35
34
 
36
35
  Installation
37
36
  -------------
@@ -72,10 +71,11 @@ Sanitize can sanitize the following types of input:
72
71
  * Standalone CSS stylesheets
73
72
  * Standalone CSS properties
74
73
 
75
- However, please note that Sanitize _cannot_ fully sanitize the contents of
76
- `<math>` or `<svg>` elements, since these elements don't follow the same parsing
77
- rules as the rest of HTML. If this is something you need, you may want to look
78
- for another solution.
74
+ > **Warning**
75
+ >
76
+ > Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
77
+ >
78
+ > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you may create a security vulnerability in your application.
79
79
 
80
80
  ### HTML Fragments
81
81
 
@@ -118,11 +118,10 @@ Sanitize.document(html,
118
118
  :elements => ['html']
119
119
  )
120
120
  # => %[
121
- # <!DOCTYPE html>
122
- # <html>foo
121
+ # <!DOCTYPE html><html>foo
123
122
  #
124
- # </html>
125
- # ]
123
+ # </html>
124
+ # ]
126
125
  ```
127
126
 
128
127
  ### CSS in HTML
@@ -420,15 +419,21 @@ elements not in this array will be removed.
420
419
  ]
421
420
  ```
422
421
 
423
- **Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
424
- elements, since these elements don't follow the same parsing rules as the rest
425
- of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
426
- content inside them will be allowed, even if that content would otherwise be
427
- removed by Sanitize.
422
+ > **Warning**
423
+ >
424
+ > Sanitize cannot fully sanitize the contents of `<math>` or `<svg>` elements. MathML and SVG elements are [foreign elements](https://html.spec.whatwg.org/multipage/syntax.html#foreign-elements) that don't follow normal HTML parsing rules.
425
+ >
426
+ > By default, Sanitize will remove all MathML and SVG elements. If you add MathML or SVG elements to a custom element allowlist, you must assume that any content inside them will be allowed, even if that content would otherwise be removed or escaped by Sanitize. This may create a security vulnerability in your application.
427
+
428
+ > **Note**
429
+ >
430
+ > Sanitize always removes `<noscript>` elements and their contents, even if `noscript` is in the allowlist.
431
+ >
432
+ > This is because a `<noscript>` element's content is parsed differently in browsers depending on whether or not scripting is enabled. Since Nokogiri doesn't support scripting, it always parses `<noscript>` elements as if scripting is disabled. This results in edge cases where it's not possible to reliably sanitize the contents of a `<noscript>` element because Nokogiri can't fully replicate the parsing behavior of a scripting-enabled browser.
428
433
 
429
434
  #### :parser_options (Hash)
430
435
 
431
- [Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
436
+ [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
432
437
 
433
438
  ```ruby
434
439
  :parser_options => {
@@ -469,7 +474,7 @@ If this is an Array or Set of element names, then only the contents of the
469
474
  specified elements (when filtered) will be removed, and the contents of all
470
475
  other filtered elements will be left behind.
471
476
 
472
- The default value is `false`.
477
+ The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
473
478
 
474
479
  #### :transformers (Array or callable)
475
480
 
@@ -667,25 +672,3 @@ html = %[
667
672
  Sanitize.fragment(html, :transformers => youtube_transformer)
668
673
  # => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
669
674
  ```
670
-
671
- License
672
- -------
673
-
674
- Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
675
-
676
- Permission is hereby granted, free of charge, to any person obtaining a copy of
677
- this software and associated documentation files (the 'Software'), to deal in
678
- the Software without restriction, including without limitation the rights to
679
- use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
680
- the Software, and to permit persons to whom the Software is furnished to do so,
681
- subject to the following conditions:
682
-
683
- The above copyright notice and this permission notice shall be included in all
684
- copies or substantial portions of the Software.
685
-
686
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
687
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
688
- FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
689
- COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
690
- IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
691
- CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -54,6 +54,11 @@ class Sanitize
54
54
 
55
55
  # HTML elements to allow. By default, no elements are allowed (which means
56
56
  # that all HTML will be stripped).
57
+ #
58
+ # Warning: Sanitize cannot safely sanitize the contents of foreign
59
+ # elements (elements in the MathML or SVG namespaces). Do not add `math`
60
+ # or `svg` to this list! If you do, you may create a security
61
+ # vulnerability in your application.
57
62
  :elements => [],
58
63
 
59
64
  # HTML parsing options to pass to Nokogumbo.
@@ -6,7 +6,7 @@ class Sanitize
6
6
  :elements => BASIC[:elements] + %w[
7
7
  address article aside bdi bdo body caption col colgroup data del div
8
8
  figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
9
- img ins main nav rp rt ruby section span style summary sup table tbody
9
+ img ins main nav rp rt ruby section span style summary table tbody
10
10
  td tfoot th thead title tr wbr
11
11
  ],
12
12
 
@@ -666,7 +666,9 @@ class Sanitize
666
666
  text-decoration-color
667
667
  text-decoration-line
668
668
  text-decoration-skip
669
+ text-decoration-skip-ink
669
670
  text-decoration-style
671
+ text-decoration-thickness
670
672
  text-emphasis
671
673
  text-emphasis-color
672
674
  text-emphasis-position
data/lib/sanitize/css.rb CHANGED
@@ -229,6 +229,12 @@ class Sanitize; class CSS
229
229
  rule
230
230
  end
231
231
 
232
+ # Returns `true` if the given CSS function name is an image-related function
233
+ # that may contain image URLs that need to be validated.
234
+ def image_function?(name)
235
+ ['image', 'image-set', '-webkit-image-set'].include?(name)
236
+ end
237
+
232
238
  # Passes the URL value of an @import rule to a block to ensure
233
239
  # it's an allowed URL
234
240
  def import_url_allowed?(rule)
@@ -272,6 +278,10 @@ class Sanitize; class CSS
272
278
  return nil unless valid_url?(child)
273
279
  end
274
280
 
281
+ if image_function?(name)
282
+ return nil unless valid_image?(child)
283
+ end
284
+
275
285
  combined_value << name
276
286
  return nil if name == 'expression' || combined_value == 'expression'
277
287
  end
@@ -345,4 +355,27 @@ class Sanitize; class CSS
345
355
  false
346
356
  end
347
357
 
358
+ # Returns `true` if the given node is an image-related function and contains
359
+ # only strings that use an allowlisted protocol.
360
+ def valid_image?(node)
361
+ return false unless node[:node] == :function
362
+ return false unless node.key?(:name) && image_function?(node[:name].downcase)
363
+ return false unless Array === node[:value]
364
+
365
+ node[:value].each do |token|
366
+ return false unless Hash === token
367
+
368
+ case token[:node]
369
+ when :string
370
+ if token[:value] =~ Sanitize::REGEX_PROTOCOL
371
+ return false unless @config[:protocols].include?($1.downcase)
372
+ else
373
+ return false unless @config[:protocols].include?(:relative)
374
+ end
375
+ else
376
+ next
377
+ end
378
+ end
379
+ end
380
+
348
381
  end; end
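The new `valid_image?` method walks the arguments of `image()`, `image-set()`, and `-webkit-image-set()` and checks each string token against `config[:protocols]`, treating a string with no protocol as relative. The sketch below reproduces only that protocol decision on plain strings so it runs without Crass. The `PROTOCOL` regex is a simplified stand-in (an assumption) for `Sanitize::REGEX_PROTOCOL` and does not handle entity-encoded colons; `image_strings_allowed?` is a hypothetical name.

```ruby
require 'set'

# Simplified stand-in for Sanitize::REGEX_PROTOCOL: captures the scheme
# before the first ':' (entity-encoded colons are not handled here).
PROTOCOL = /\A\s*([^\/#]*?):/i

# Returns true if every string argument of an image-related CSS function uses
# an allowlisted protocol. `protocols` mirrors config[:protocols], where the
# symbol :relative permits strings that have no protocol at all.
def image_strings_allowed?(strings, protocols)
  strings.all? do |value|
    if value =~ PROTOCOL
      protocols.include?($1.downcase)
    else
      protocols.include?(:relative)
    end
  end
end

allowed = Set['http', 'https', :relative]
image_strings_allowed?(['https://example.com/a.png', 'b.png'], allowed) # => true
image_strings_allowed?(['javascript:alert(1)'], allowed)                # => false
```

As in the hunk, a single disallowed string is enough to reject the whole function value.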
@@ -48,6 +48,7 @@ class CleanElement
48
48
  if css.strip.empty?
49
49
  node.unlink
50
50
  else
51
+ css.gsub!('</', '<\/')
51
52
  node.children.unlink
52
53
  node << Nokogiri::XML::Text.new(css, node.document)
53
54
  end
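This hunk escapes every `</` in sanitized `<style>` content before writing it back. A browser ends a `<style>` element's raw text at the first `</style`, so leaving `</` intact would let CSS text (for example, a string inside an at-rule) close the element early and smuggle HTML after it. Inside a CSS string, `\/` is just an escaped `/`, so the CSS value itself is unchanged. A standalone sketch; `escape_style_text` is a hypothetical name.

```ruby
# Hypothetical helper mirroring the gsub! in this hunk: escape every "</" so
# the text can't contain a literal "</style" that would end the element early.
def escape_style_text(css)
  css.gsub('</', '<\/')
end

css = '@media screen { a { content: "</style><img src=x onerror=alert(1)>" } }'
puts escape_style_text(css)
# Prints: @media screen { a { content: "<\/style><img src=x onerror=alert(1)>" } }
```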
@@ -9,7 +9,11 @@ class Sanitize; module Transformers
9
9
 
10
10
  if node.type == Nokogiri::XML::Node::DTD_NODE
11
11
  if env[:config][:allow_doctype]
12
- node.name = 'html'
12
+ if node.name != "html"
13
+ document = node.document
14
+ node.unlink
15
+ document.create_internal_subset("html", nil, nil)
16
+ end
13
17
  else
14
18
  node.unlink
15
19
  end
@@ -1,5 +1,6 @@
1
1
  # encoding: utf-8
2
2
 
3
+ require 'cgi'
3
4
  require 'set'
4
5
 
5
6
  class Sanitize; module Transformers; class CleanElement
@@ -18,6 +19,18 @@ class Sanitize; module Transformers; class CleanElement
18
19
  # http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
19
20
  REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
20
21
 
22
+ # Elements whose content is treated as unescaped text by HTML parsers.
23
+ UNESCAPED_TEXT_ELEMENTS = Set.new(%w[
24
+ iframe
25
+ noembed
26
+ noframes
27
+ noscript
28
+ plaintext
29
+ script
30
+ style
31
+ xmp
32
+ ])
33
+
21
34
  # Attributes that need additional escaping on `<a>` elements due to unsafe
22
35
  # libxml2 behavior.
23
36
  UNSAFE_LIBXML_ATTRS_A = Set.new(%w[
@@ -120,18 +133,15 @@ class Sanitize; module Transformers; class CleanElement
120
133
  attr_name = attr.name.downcase
121
134
 
122
135
  unless attr_allowlist.include?(attr_name)
123
- # The attribute isn't allowed.
124
-
125
- if allow_data_attributes && attr_name.start_with?('data-')
126
- # Arbitrary data attributes are allowed. If this is a data
127
- # attribute, continue.
128
- next if attr_name =~ REGEX_DATA_ATTR
136
+ # The attribute isn't in the allowlist, but may still be allowed if
137
+ # it's a data attribute.
138
+
139
+ unless allow_data_attributes && attr_name.start_with?('data-') && attr_name =~ REGEX_DATA_ATTR
140
+ # Either the attribute isn't a data attribute or arbitrary data
141
+ # attributes aren't allowed. Remove the attribute.
142
+ attr.unlink
143
+ next
129
144
  end
130
-
131
- # Either the attribute isn't a data attribute or arbitrary data
132
- # attributes aren't allowed. Remove the attribute.
133
- attr.unlink
134
- next
135
145
  end
136
146
 
137
147
  # The attribute is allowed.
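The refactor folds the data-attribute exception into one condition: an attribute outside the allowlist survives only if arbitrary data attributes are allowed, its name starts with `data-`, and it matches `REGEX_DATA_ATTR`. Because a surviving data attribute no longer hits `next`, it now falls through to the checks after "The attribute is allowed.", which appears to be how the 5.2.3 fix ("Ensure protocol sanitization is applied to data attributes") works. A standalone sketch of the keep/remove decision; `keep_attribute?` is a hypothetical name and the regex is copied from the hunk.

```ruby
require 'set'

# Copied from the hunk above.
REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u

# Hypothetical helper: true if the attribute stays (and goes on to the
# remaining checks, such as protocol sanitization); false if it's removed.
def keep_attribute?(attr_name, attr_allowlist, allow_data_attributes)
  return true if attr_allowlist.include?(attr_name)

  allow_data_attributes && attr_name.start_with?('data-') && attr_name.match?(REGEX_DATA_ATTR)
end

allowlist = Set['href']
keep_attribute?('href', allowlist, false)      # => true
keep_attribute?('data-id', allowlist, true)    # => true
keep_attribute?('data-id', allowlist, false)   # => false
keep_attribute?('data-xmlns', allowlist, true) # => false (the "xml" prefix is reserved)
keep_attribute?('onclick', allowlist, true)    # => false
```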
@@ -188,6 +198,28 @@ class Sanitize; module Transformers; class CleanElement
188
198
  @add_attributes[name].each {|key, val| node[key] = val }
189
199
  end
190
200
 
201
+ # Make a best effort to ensure that text nodes in invalid "unescaped text"
202
+ # elements that are inside a math or svg namespace are properly escaped so
203
+ # that they don't get parsed as HTML.
204
+ #
205
+ # Sanitize is explicitly documented as not supporting MathML or SVG, but
206
+ # people sometimes allow `<math>` and `<svg>` elements in their custom
207
+ # configs without realizing that it's not safe. This workaround makes it
208
+ # slightly less unsafe, but you still shouldn't allow `<math>` or `<svg>`
209
+ # because Nokogiri doesn't parse them the same way browsers do and Sanitize
210
+ # can't guarantee that their contents are safe.
211
+ unless node.namespace.nil?
212
+ prefix = node.namespace.prefix
213
+
214
+ if (prefix == 'math' || prefix == 'svg') && UNESCAPED_TEXT_ELEMENTS.include?(name)
215
+ node.children.each do |child|
216
+ if child.type == Nokogiri::XML::Node::TEXT_NODE
217
+ child.content = CGI.escapeHTML(child.content)
218
+ end
219
+ end
220
+ end
221
+ end
222
+
191
223
  # Element-specific special cases.
192
224
  case name
193
225
 
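The best-effort workaround in this hunk HTML-escapes the text children of an "unescaped text" element (such as `<style>` or `<xmp>`) when that element sits in the `math` or `svg` namespace, so the text can't be re-parsed as markup. A standalone sketch of the decision on plain strings; `escape_foreign_text` is a hypothetical name, and `prefix` stands in for `node.namespace.prefix`. As the comments in the hunk stress, this does not make allowlisting `<math>` or `<svg>` safe.

```ruby
require 'cgi'
require 'set'

# Copied from the hunk above.
UNESCAPED_TEXT_ELEMENTS = Set.new(%w[iframe noembed noframes noscript plaintext script style xmp])

# Hypothetical helper: escape `text` only for unescaped-text elements that
# are inside the math or svg namespace; leave everything else unchanged.
def escape_foreign_text(prefix, name, text)
  return text unless (prefix == 'math' || prefix == 'svg') && UNESCAPED_TEXT_ELEMENTS.include?(name)

  CGI.escapeHTML(text)
end

escape_foreign_text('svg', 'style', '<img src=x onerror=alert(1)>')
# => "&lt;img src=x onerror=alert(1)&gt;"
escape_foreign_text(nil, 'style', '<b>')
# => "<b>"
```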
@@ -220,6 +252,16 @@ class Sanitize; module Transformers; class CleanElement
220
252
 
221
253
  node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
222
254
  end
255
+
256
+ # A `<noscript>` element's content is parsed differently in browsers
257
+ # depending on whether or not scripting is enabled. Since Nokogiri doesn't
258
+ # support scripting, it always parses `<noscript>` elements as if scripting
259
+ # is disabled. This results in edge cases where it's not possible to
260
+ # reliably sanitize the contents of a `<noscript>` element because Nokogiri
261
+ # can't fully replicate the parsing behavior of a scripting-enabled browser.
262
+ # The safest thing to do is to simply remove all `<noscript>` elements.
263
+ when 'noscript'
264
+ node.unlink
223
265
  end
224
266
  end
225
267
 
@@ -1,5 +1,3 @@
1
- # encoding: utf-8
2
-
3
1
  class Sanitize
4
- VERSION = '5.2.1'
2
+ VERSION = '6.1.3'
5
3
  end
data/lib/sanitize.rb CHANGED
@@ -1,6 +1,6 @@
1
1
  # encoding: utf-8
2
2
 
3
- require 'nokogumbo'
3
+ require 'nokogiri'
4
4
  require 'set'
5
5
 
6
6
  require_relative 'sanitize/version'
@@ -204,7 +204,7 @@ class Sanitize
204
204
  config[:node_name] = node.name.downcase
205
205
  config[:node_allowlist] = config[:node_whitelist] = node_allowlist
206
206
 
207
- result = transformer.call(config)
207
+ result = transformer.call(**config)
208
208
 
209
209
  if result.is_a?(Hash)
210
210
  result_allowlist = result[:node_allowlist] || result[:node_whitelist]
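Sanitize now splats the env hash into each transformer (`transformer.call(**config)`), matching the 5.2.2 entry about keyword-argument deprecation warnings in Ruby 2.7+. The sketch below shows why both transformer styles keep working on Ruby 3: a lambda that declares keywords receives them directly, and a lambda with a single positional parameter receives the keywords as one Hash. The transformer bodies are illustrative, and the env hash is a reduced assumption (Sanitize passes more keys).

```ruby
require 'set'

# Reduced, illustrative env hash; Sanitize's real env has more keys.
env = { node_name: 'span', node_allowlist: Set.new }

# A transformer declared with keyword parameters; **_rest absorbs other keys.
keyword_transformer = lambda do |node_name:, **_rest|
  node_name == 'span' ? { node_allowlist: [] } : nil
end

# A transformer that takes the whole env as one positional Hash.
positional_transformer = lambda do |env_hash|
  env_hash[:node_name].upcase
end

keyword_transformer.call(**env)    # => { node_allowlist: [] }
positional_transformer.call(**env) # => "SPAN"
```

On Ruby 3.0 and later, calling `keyword_transformer.call(env)` without the splat raises an ArgumentError, which is why the splat matters for keyword-style transformers.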
@@ -11,18 +11,18 @@ describe 'Sanitize::Transformers::CleanComment' do
11
11
  end
12
12
 
13
13
  it 'should remove comments' do
14
- @s.fragment('foo <!-- comment --> bar').must_equal 'foo bar'
15
- @s.fragment('foo <!-- ').must_equal 'foo '
16
- @s.fragment('foo <!-- - -> bar').must_equal 'foo '
17
- @s.fragment("foo <!--\n\n\n\n-->bar").must_equal 'foo bar'
18
- @s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo --&gt; --&gt;bar'
19
- @s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>&gt;bar</div>'
14
+ _(@s.fragment('foo <!-- comment --> bar')).must_equal 'foo bar'
15
+ _(@s.fragment('foo <!-- ')).must_equal 'foo '
16
+ _(@s.fragment('foo <!-- - -> bar')).must_equal 'foo '
17
+ _(@s.fragment("foo <!--\n\n\n\n-->bar")).must_equal 'foo bar'
18
+ _(@s.fragment("foo <!-- <!-- <!-- --> --> -->bar")).must_equal 'foo --&gt; --&gt;bar'
19
+ _(@s.fragment("foo <div <!-- comment -->>bar</div>")).must_equal 'foo <div>&gt;bar</div>'
20
20
 
21
21
  # Special case: the comment markup is inside a <script>, which makes it
22
22
  # text content and not an actual HTML comment.
23
- @s.fragment("<script><!-- comment --></script>").must_equal ''
23
+ _(@s.fragment("<script><!-- comment --></script>")).must_equal ''
24
24
 
25
- Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script'])
25
+ _(Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script']))
26
26
  .must_equal '<script><!-- comment --></script>'
27
27
  end
28
28
  end
@@ -33,14 +33,14 @@ describe 'Sanitize::Transformers::CleanComment' do
33
33
  end
34
34
 
35
35
  it 'should allow comments' do
36
- @s.fragment('foo <!-- comment --> bar').must_equal 'foo <!-- comment --> bar'
37
- @s.fragment('foo <!-- ').must_equal 'foo <!-- -->'
38
- @s.fragment('foo <!-- - -> bar').must_equal 'foo <!-- - -> bar-->'
39
- @s.fragment("foo <!--\n\n\n\n-->bar").must_equal "foo <!--\n\n\n\n-->bar"
40
- @s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo <!-- <!-- <!-- --> --&gt; --&gt;bar'
41
- @s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>&gt;bar</div>'
42
-
43
- Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script'])
36
+ _(@s.fragment('foo <!-- comment --> bar')).must_equal 'foo <!-- comment --> bar'
37
+ _(@s.fragment('foo <!-- ')).must_equal 'foo <!-- -->'
38
+ _(@s.fragment('foo <!-- - -> bar')).must_equal 'foo <!-- - -> bar-->'
39
+ _(@s.fragment("foo <!--\n\n\n\n-->bar")).must_equal "foo <!--\n\n\n\n-->bar"
40
+ _(@s.fragment("foo <!-- <!-- <!-- --> --> -->bar")).must_equal 'foo <!-- <!-- <!-- --> --&gt; --&gt;bar'
41
+ _(@s.fragment("foo <div <!-- comment -->>bar</div>")).must_equal 'foo <div>&gt;bar</div>'
42
+
43
+ _(Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script']))
44
44
  .must_equal '<script><!-- comment --></script>'
45
45
  end
46
46
  end
@@ -10,15 +10,15 @@ describe 'Sanitize::Transformers::CSS::CleanAttribute' do
10
10
  end
11
11
 
12
12
  it 'should sanitize CSS properties in style attributes' do
13
- @s.fragment(%[
13
+ _(@s.fragment(%[
14
14
  <div style="color: #fff; width: expression(alert(1)); /* <-- evil! */"></div>
15
- ].strip).must_equal %[
15
+ ].strip)).must_equal %[
16
16
  <div style="color: #fff; /* <-- evil! */"></div>
17
17
  ].strip
18
18
  end
19
19
 
20
20
  it 'should remove the style attribute if the sanitized CSS is empty' do
21
- @s.fragment('<div style="width: expression(alert(1))"></div>').
21
+ _(@s.fragment('<div style="width: expression(alert(1))"></div>')).
22
22
  must_equal '<div></div>'
23
23
  end
24
24
  end
@@ -46,7 +46,7 @@ describe 'Sanitize::Transformers::CSS::CleanElement' do
46
46
  </style>
47
47
  ].strip
48
48
 
49
- @s.fragment(html).must_equal %[
49
+ _(@s.fragment(html)).must_equal %[
50
50
  <style>
51
51
  /* Yay CSS! */
52
52
  .foo { color: #fff; }
@@ -62,6 +62,6 @@ describe 'Sanitize::Transformers::CSS::CleanElement' do
62
62
  end
63
63
 
64
64
  it 'should remove the <style> element if the sanitized CSS is empty' do
65
- @s.fragment('<style></style>').must_equal ''
65
+ _(@s.fragment('<style></style>')).must_equal ''
66
66
  end
67
67
  end