sanitize 5.2.0 → 6.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4f01a992746ecc3f28e9c1fd14c08c99456fb98a59c0b7ba6a8c6f01d0ab07cf
- data.tar.gz: 4f379538b26db4d239078ea7e54fea3b106e7801d093ed7407e9b71282f6c4d3
+ metadata.gz: 94a37503617774f9317150c834cc3025cd32a718be754fb72eea1b9dd7347571
+ data.tar.gz: 597c76746d742db21842377bafab2911e7b84f389baf4dffafb2e53ecf67de92
  SHA512:
- metadata.gz: 52d96c5f73eea8d738fe23d816d5aec856f9f37ca37cf88d88d385fcffbf242605d13494ab531b517af7bdea44bfae2569f27bc2d5fb005dbeee85a54211d674
- data.tar.gz: 897e95c05448509cfeb455bb4ec156ff7557495987e1d058ff63b888f9c0069a821a9b3684e0fe0463f78e4f28faf9fe2089760ad59bbbd1b5a5390fe9632154
+ metadata.gz: c6d2dedfa9d6a589788d4156babae09cf14b3bebc765a9bb04a492aa5b5702f82dc3ae26d45199da3e8f9c096dfd191d15c53fea8d62084a3679604be5f7ddba
+ data.tar.gz: 70bbb00756f1a4a085ad5901b27fd91ebc4308d5f42bfa57ec54c8cc7982ded8395eff9b59546ca62f3dba6e7a012351d62f9ec81b06aa8ccbb563211f39bd3c
data/HISTORY.md CHANGED
@@ -1,5 +1,70 @@
  # Sanitize History
 
+ ## 6.0.0 (2021-08-03)
+
+ ### Potentially Breaking Changes
+
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
+
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
+
+ [211]:https://github.com/rgrove/sanitize/pull/211
+
+ ## 5.2.3 (2021-01-11)
+
+ ### Bug Fixes
+
+ * Ensure protocol sanitization is applied to data attributes.
+ [@ccutrer - #207][207]
+
+ [207]:https://github.com/rgrove/sanitize/pull/207
+
+ ## 5.2.2 (2021-01-06)
+
+ ### Bug Fixes
+
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
+ custom transformer. [@mscrivo - #206][206]
+
+ [206]:https://github.com/rgrove/sanitize/pull/206
+
+ ## 5.2.1 (2020-06-16)
+
+ ### Bug Fixes
+
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
+ Sanitize versions 3.0.0 through 5.2.0.
+
+ When HTML was sanitized using the "relaxed" config or a custom config that
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
+ have been sanitized correctly even if `math` and `svg` were not in the
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
+
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
+ config or a custom config that allows one or more of the following HTML
+ elements:
+
+ - `iframe`
+ - `math`
+ - `noembed`
+ - `noframes`
+ - `noscript`
+ - `plaintext`
+ - `script`
+ - `style`
+ - `svg`
+ - `xmp`
+
+ See the security advisory for more details, including a workaround if you're
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
+
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
+ helping to verify the fix.
+
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+
  ## 5.2.0 (2020-06-06)
 
  ### Changes
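The 5.2.1 entry lists the elements whose presence in a custom allowlist makes a config likely to be affected by GHSA-p4x4-rw2p-8j8m. As an auditing aid, here is a minimal sketch, assuming a config is a plain Hash whose `:elements` key holds element names, as in Sanitize's built-in configs. The constant `ADVISORY_ELEMENTS` and the method `risky_elements` are hypothetical and not part of Sanitize's API.

```ruby
require 'set'

# Elements listed in the 5.2.1 changelog entry (GHSA-p4x4-rw2p-8j8m).
# Hypothetical constant: Sanitize does not export this list.
ADVISORY_ELEMENTS = Set.new(%w[
  iframe math noembed noframes noscript plaintext script style svg xmp
]).freeze

# Hypothetical helper: returns the advisory elements a config allows, sorted.
# Assumes config[:elements] is an Array or Set of element names (or absent).
def risky_elements(config)
  allowed = Set.new(Array(config[:elements]).map { |name| name.to_s.downcase })
  (allowed & ADVISORY_ELEMENTS).to_a.sort
end

custom_config = { elements: %w[a b em strong style svg] }
p risky_elements(custom_config)
```

An empty result only means none of the listed elements is allowed; the advisory remains the authority on whether a given config is affected.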
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2015 Ryan Grove <ryan@wonko.com>
+ Copyright (c) 2021 Ryan Grove <ryan@wonko.com>
 
  Permission is hereby granted, free of charge, to any person obtaining a copy of
  this software and associated documentation files (the 'Software'), to deal in
data/README.md CHANGED
@@ -11,17 +11,17 @@ protocols within attributes that contain URLs. You can also allow specific CSS
  properties, @ rules, and URL protocols in elements or attributes containing CSS.
  Any HTML or CSS that you don't explicitly allow will be removed.
 
- Sanitize is based on [Google's Gumbo HTML5 parser][gumbo], which parses HTML
+ Sanitize is based on the [Nokogumbo HTML5 parser][nokogumbo], which parses HTML
  exactly the same way modern browsers do, and [Crass][crass], which parses CSS
  exactly the same way modern browsers do. As long as your allowlist config only
  allows safe markup and CSS, even the most malformed or malicious input will be
  transformed into safe output.
 
- [![Build Status](https://travis-ci.org/rgrove/sanitize.svg?branch=master)](https://travis-ci.org/rgrove/sanitize)
  [![Gem Version](https://badge.fury.io/rb/sanitize.svg)](http://badge.fury.io/rb/sanitize)
+ [![Tests](https://github.com/rgrove/sanitize/workflows/Tests/badge.svg)](https://github.com/rgrove/sanitize/actions?query=workflow%3ATests)
 
  [crass]:https://github.com/rgrove/crass
- [gumbo]:https://github.com/google/gumbo-parser
+ [nokogumbo]:https://github.com/rubys/nokogumbo
 
  Links
  -----
@@ -72,6 +72,11 @@ Sanitize can sanitize the following types of input:
  * Standalone CSS stylesheets
  * Standalone CSS properties
 
+ However, please note that Sanitize _cannot_ fully sanitize the contents of
+ `<math>` or `<svg>` elements, since these elements don't follow the same parsing
+ rules as the rest of HTML. If this is something you need, you may want to look
+ for another solution.
+
  ### HTML Fragments
 
  A fragment is a snippet of HTML that doesn't contain a root-level `<html>`
@@ -415,9 +420,15 @@ elements not in this array will be removed.
  ]
  ```
 
+ **Warning:** Sanitize cannot fully sanitize the contents of `<math>` or `<svg>`
+ elements, since these elements don't follow the same parsing rules as the rest
+ of HTML. If you add `math` or `svg` to the allowlist, you must assume that any
+ content inside them will be allowed, even if that content would otherwise be
+ removed by Sanitize.
+
  #### :parser_options (Hash)
 
- [Parsing options](https://github.com/rubys/nokogumbo/tree/v2.0.1#parsing-options) supplied to `nokogumbo`.
+ [Parsing options](https://github.com/rubys/nokogumbo/tree/master#parsing-options) to be supplied to `nokogumbo`.
 
  ```ruby
  :parser_options => {
@@ -458,7 +469,7 @@ If this is an Array or Set of element names, then only the contents of the
  specified elements (when filtered) will be removed, and the contents of all
  other filtered elements will be left behind.
 
- The default value is `false`.
+ The default value is `%w[iframe math noembed noframes noscript plaintext script style svg xmp]`.
 
  #### :transformers (Array or callable)
 
@@ -656,25 +667,3 @@ html = %[
  Sanitize.fragment(html, :transformers => youtube_transformer)
  # => '<iframe width="420" height="315" src="//www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allowfullscreen=""></iframe>'
  ```
-
- License
- -------
-
- Copyright (c) 2015 Ryan Grove (ryan@wonko.com)
-
- Permission is hereby granted, free of charge, to any person obtaining a copy of
- this software and associated documentation files (the 'Software'), to deal in
- the Software without restriction, including without limitation the rights to
- use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
- the Software, and to permit persons to whom the Software is furnished to do so,
- subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
- FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
- COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
- IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/lib/sanitize.rb CHANGED
@@ -1,6 +1,6 @@
  # encoding: utf-8
 
- require 'nokogumbo'
+ require 'nokogiri'
  require 'set'
 
  require_relative 'sanitize/version'
@@ -204,7 +204,7 @@ class Sanitize
  config[:node_name] = node.name.downcase
  config[:node_allowlist] = config[:node_whitelist] = node_allowlist
 
- result = transformer.call(config)
+ result = transformer.call(**config)
 
  if result.is_a?(Hash)
  result_allowlist = result[:node_allowlist] || result[:node_whitelist]
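The `transformer.call(**config)` line above is the code side of the 5.2.2 entry about keyword-argument deprecation warnings on Ruby 2.7+. The plain-Ruby sketch below, which does not load Sanitize, shows why the splat matters for a transformer lambda that declares keyword parameters. The lambda and the `config` Hash are hypothetical stand-ins; only the `:node_name` and `:node_allowlist` key names come from the hunk.

```ruby
# Hypothetical transformer that declares keyword parameters instead of
# taking a single positional config Hash.
keyword_transformer = lambda do |node_name:, **_rest|
  puts "visiting #{node_name}"
  nil # returning nil requests no changes in this sketch
end

# Hypothetical stand-in for the per-node Hash; the real one has more keys.
config = { node_name: 'div', node_allowlist: [] }

# Splatting passes the Symbol keys as keywords: no warning on Ruby 2.7,
# and it works on Ruby 3.x.
keyword_transformer.call(**config)

# Passing the Hash positionally triggers a deprecation warning on Ruby 2.7
# (when those warnings are enabled) and raises ArgumentError on Ruby 3.0+,
# because a lambda does not convert a positional Hash into keywords.
begin
  keyword_transformer.call(config)
  puts 'positional Hash accepted (Ruby 2.7 behavior)'
rescue ArgumentError => e
  puts "positional Hash rejected: #{e.message}"
end
```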
@@ -74,7 +74,7 @@ class Sanitize
  # the specified elements (when filtered) will be removed, and the contents
  # of all other filtered elements will be left behind.
  :remove_contents => %w[
- iframe noembed noframes noscript script style
+ iframe math noembed noframes noscript plaintext script style svg xmp
  ],
 
  # Transformers allow you to filter or alter nodes using custom logic. See
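The hunk above extends the default `:remove_contents` list with `math`, `plaintext`, `svg`, and `xmp`. An application that sets `:remove_contents` to its own literal list, such as the 5.2.0 value in the removed line, does not pick up those additions. A minimal sketch of merging a custom list with the new defaults, assuming the custom value is an Array or Set of names (the `true`/`false` forms are rejected); `DEFAULT_REMOVE_CONTENTS_6_0` and `remove_contents_with_defaults` are hypothetical names.

```ruby
require 'set'

# The 6.0.0 default, copied from the added line above (hypothetical name).
DEFAULT_REMOVE_CONTENTS_6_0 = %w[
  iframe math noembed noframes noscript plaintext script style svg xmp
].freeze

# Hypothetical helper: union of the 6.0.0 defaults and an application's own
# list, lowercased and sorted. Only the Array/Set form is supported.
def remove_contents_with_defaults(custom)
  unless custom.is_a?(Array) || custom.is_a?(Set)
    raise ArgumentError, 'expected an Array or Set of element names'
  end

  names = Set.new(DEFAULT_REMOVE_CONTENTS_6_0)
  custom.each { |name| names << name.to_s.downcase }
  names.to_a.sort
end

# A list pinned to the 5.2.0 default, as in the removed line above.
old_list = %w[iframe noembed noframes noscript script style]
p DEFAULT_REMOVE_CONTENTS_6_0 - old_list
p remove_contents_with_defaults(old_list + %w[object])
```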
@@ -6,7 +6,7 @@ class Sanitize
  :elements => BASIC[:elements] + %w[
  address article aside bdi bdo body caption col colgroup data del div
  figcaption figure footer h1 h2 h3 h4 h5 h6 head header hgroup hr html
- img ins main nav rp rt ruby section span style summary sup table tbody
+ img ins main nav rp rt ruby section span style summary table tbody
  td tfoot th thead title tr wbr
  ],
 
@@ -120,18 +120,15 @@ class Sanitize; module Transformers; class CleanElement
  attr_name = attr.name.downcase
 
  unless attr_allowlist.include?(attr_name)
- # The attribute isn't allowed.
-
- if allow_data_attributes && attr_name.start_with?('data-')
- # Arbitrary data attributes are allowed. If this is a data
- # attribute, continue.
- next if attr_name =~ REGEX_DATA_ATTR
+ # The attribute isn't in the allowlist, but may still be allowed if
+ # it's a data attribute.
+
+ unless allow_data_attributes && attr_name.start_with?('data-') && attr_name =~ REGEX_DATA_ATTR
+ # Either the attribute isn't a data attribute or arbitrary data
+ # attributes aren't allowed. Remove the attribute.
+ attr.unlink
+ next
  end
-
- # Either the attribute isn't a data attribute or arbitrary data
- # attributes aren't allowed. Remove the attribute.
- attr.unlink
- next
  end
 
  # The attribute is allowed.
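Together with the 5.2.3 entry, the rewritten branch above means an allowed `data-*` attribute no longer hits an early `next`; it reaches the code after `# The attribute is allowed.`, and the new test in this release shows protocol checks now apply to it. The sketch below is a simplified standalone model of that decision, not Sanitize's code. Assumptions: `DATA_ATTR` and `PROTOCOL` only approximate Sanitize's `REGEX_DATA_ATTR` and its protocol matching; treating a protocol-less value as allowed only when the list contains `:relative` mirrors the `:relative` entries in Sanitize's built-in configs; `attribute_allowed?` is hypothetical.

```ruby
# Assumed approximation of a valid data attribute name.
DATA_ATTR = /\Adata-[a-z_][\w.-]*\z/

# Assumed, simplified protocol match: text before the first ":" that contains
# no "/" or "#", so relative URLs such as "/path" have no protocol.
PROTOCOL = %r{\A\s*([^/#]*?):}

# Hypothetical model of the 6.0.0 per-attribute decision.
#   allowlist  - Array of allowed attribute names
#   allow_data - whether arbitrary data-* attributes are allowed
#   protocols  - Hash of attribute name => allowed protocols (may include :relative)
def attribute_allowed?(name, value, allowlist:, allow_data:, protocols:)
  name = name.downcase

  unless allowlist.include?(name)
    # Not allowlisted: only a well-formed data attribute may continue.
    return false unless allow_data && name.start_with?('data-') && name =~ DATA_ATTR
  end

  # Allowed so far; protocol checks now apply to data attributes too.
  allowed_protocols = protocols[name]
  return true if allowed_protocols.nil?

  match = value.match(PROTOCOL)
  return allowed_protocols.include?(:relative) if match.nil?

  allowed_protocols.include?(match[1].downcase)
end

value = 'mailto:someone@example.com'
p attribute_allowed?('data-url', value, allowlist: [], allow_data: true,
                     protocols: { 'data-url' => ['https'] })
p attribute_allowed?('data-url', value, allowlist: [], allow_data: true,
                     protocols: { 'data-url' => ['mailto'] })
```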
@@ -1,5 +1,5 @@
  # encoding: utf-8
 
  class Sanitize
- VERSION = '5.2.0'
+ VERSION = '6.0.0'
  end
@@ -192,21 +192,16 @@ describe 'Sanitize::Transformers::CleanElement' do
  .must_equal ''
  end
 
- it 'should escape the content of removed `plaintext` elements' do
- Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
- .must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
- end
-
- it 'should escape the content of removed `xmp` elements' do
- Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
- .must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
- end
-
  it 'should not preserve the content of removed `iframe` elements' do
  Sanitize.fragment('<iframe>hello! <script>alert(0)</script></iframe>')
  .must_equal ''
  end
 
+ it 'should not preserve the content of removed `math` elements' do
+ Sanitize.fragment('<math>hello! <script>alert(0)</script></math>')
+ .must_equal ''
+ end
+
  it 'should not preserve the content of removed `noembed` elements' do
  Sanitize.fragment('<noembed>hello! <script>alert(0)</script></noembed>')
  .must_equal ''
@@ -222,6 +217,11 @@ describe 'Sanitize::Transformers::CleanElement' do
  .must_equal ''
  end
 
+ it 'should not preserve the content of removed `plaintext` elements' do
+ Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
+ .must_equal ''
+ end
+
  it 'should not preserve the content of removed `script` elements' do
  Sanitize.fragment('<script>hello! <script>alert(0)</script></script>')
  .must_equal ''
@@ -232,6 +232,16 @@ describe 'Sanitize::Transformers::CleanElement' do
  .must_equal ''
  end
 
+ it 'should not preserve the content of removed `svg` elements' do
+ Sanitize.fragment('<svg>hello! <script>alert(0)</script></svg>')
+ .must_equal ''
+ end
+
+ it 'should not preserve the content of removed `xmp` elements' do
+ Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
+ .must_equal ''
+ end
+
  strings.each do |name, data|
  it "should clean #{name} HTML" do
  Sanitize.fragment(data[:html]).must_equal(data[:default])
@@ -481,6 +491,22 @@ describe 'Sanitize::Transformers::CleanElement' do
  }).must_equal "<a>Text</a>"
  end
 
+ it 'should sanitize protocols in data attributes even if data attributes are generically allowed' do
+ input = '<a data-url="mailto:someone@example.com">Text</a>'
+
+ Sanitize.fragment(input, {
+ :elements => ['a'],
+ :attributes => {'a' => [:data]},
+ :protocols => {'a' => {'data-url' => ['https']}}
+ }).must_equal "<a>Text</a>"
+
+ Sanitize.fragment(input, {
+ :elements => ['a'],
+ :attributes => {'a' => [:data]},
+ :protocols => {'a' => {'data-url' => ['mailto']}}
+ }).must_equal input
+ end
+
  it 'should prevent `<meta>` tags from being used to set a non-UTF-8 charset' do
  Sanitize.document('<html><head><meta charset="utf-8"></head><body>Howdy!</body></html>',
  :elements => %w[html head meta body],
@@ -219,4 +219,17 @@ describe 'Malicious HTML' do
  end
  end
  end
+
+ # https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
+ describe 'foreign content bypass in relaxed config' do
+ it 'prevents a sanitization bypass via carefully crafted foreign content' do
+ %w[iframe noembed noframes noscript plaintext script style xmp].each do |tag_name|
+ @s.fragment(%[<math><#{tag_name}>/*&lt;/#{tag_name}&gt;&lt;img src onerror=alert(1)>*/]).
+ must_equal ''
+
+ @s.fragment(%[<svg><#{tag_name}>/*&lt;/#{tag_name}&gt;&lt;img src onerror=alert(1)>*/]).
+ must_equal ''
+ end
+ end
+ end
  end
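The new test wraps raw-text elements such as `style` in `<math>` or `<svg>` and escapes the inner markup with `&lt;`/`&gt;`. One way to picture the general mutation-XSS mechanism (a simplified illustration, not taken from the advisory): in foreign content the element is not raw text, so the entities decode to a literal `</style><img ...>` inside a text node; if that text is later written out inside an HTML raw-text element without escaping, a browser re-parsing the output sees the element close early. The helpers below are hypothetical and handle only the three entities involved.

```ruby
# Simplified entity decoding for the three entities involved; a real HTML5
# parser decodes many more.
def decode_basic_entities(text)
  text.gsub('&lt;', '<').gsub('&gt;', '>').gsub('&amp;', '&')
end

# Simplified text escaping, as a serializer would apply to ordinary elements.
def escape_basic(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Hypothetical serializer that treats `tag` as an HTML raw-text element and
# writes its text unescaped.
def serialize_raw_text(tag, text)
  "<#{tag}>#{text}</#{tag}>"
end

# Hypothetical serializer that escapes the text first.
def serialize_escaped(tag, text)
  "<#{tag}>#{escape_basic(text)}</#{tag}>"
end

# Inner text of <style> from the test input; inside <math> it is parsed as
# foreign content, so the entities decode to literal markup characters.
payload = '/*&lt;/style&gt;&lt;img src onerror=alert(1)>*/'
text = decode_basic_entities(payload)

unsafe = serialize_raw_text('style', text)
safe = serialize_escaped('style', text)

puts unsafe
puts "raw-text output closes <style> early: #{unsafe.include?('</style><img')}"
puts "escaped output contains an <img tag: #{safe.include?('<img')}"
```

In this release Sanitize removes the contents of `math`, `svg`, and the other listed elements outright, which is consistent with every case in the test expecting an empty string.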
@@ -53,9 +53,9 @@ describe 'Sanitize' do
  @s.document("a#{sample_non_chars}z").must_equal "<html>az</html>"
  end
 
- describe 'when html body exceeds Nokogumbo::DEFAULT_MAX_TREE_DEPTH' do
+ describe 'when html body exceeds Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH' do
  let(:content) do
- content = nest_html_content('<b>foo</b>', Nokogumbo::DEFAULT_MAX_TREE_DEPTH)
+ content = nest_html_content('<b>foo</b>', Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
  "<html>#{content}</html>"
  end
 
@@ -115,9 +115,9 @@ describe 'Sanitize' do
  @s.fragment("a#{sample_non_chars}z").must_equal "az"
  end
 
- describe 'when html body exceeds Nokogumbo::DEFAULT_MAX_TREE_DEPTH' do
+ describe 'when html body exceeds Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH' do
  let(:content) do
- content = nest_html_content('<b>foo</b>', Nokogumbo::DEFAULT_MAX_TREE_DEPTH)
+ content = nest_html_content('<b>foo</b>', Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH)
  "<body>#{content}</body>"
  end
 
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: sanitize
  version: !ruby/object:Gem::Version
- version: 5.2.0
+ version: 6.0.0
  platform: ruby
  authors:
  - Ryan Grove
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-06-06 00:00:00.000000000 Z
+ date: 2021-08-04 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: crass
@@ -30,56 +30,42 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.8.0
+ version: 1.12.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.8.0
- - !ruby/object:Gem::Dependency
- name: nokogumbo
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '2.0'
- type: :runtime
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - "~>"
- - !ruby/object:Gem::Version
- version: '2.0'
+ version: 1.12.0
  - !ruby/object:Gem::Dependency
  name: minitest
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 5.11.3
+ version: 5.14.4
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 5.11.3
+ version: 5.14.4
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 12.3.1
+ version: 13.0.6
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 12.3.1
+ version: 13.0.6
  description: Sanitize is an allowlist-based HTML and CSS sanitizer. It removes all
  HTML and/or CSS from a string except the elements, attributes, and properties you
  choose to allow.
@@ -120,7 +106,7 @@ homepage: https://github.com/rgrove/sanitize/
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -128,15 +114,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 2.1.0
+ version: 2.5.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: 1.2.0
  requirements: []
- rubygems_version: 3.0.3
- signing_key:
+ rubygems_version: 3.2.22
+ signing_key:
  specification_version: 4
  summary: Allowlist-based HTML and CSS sanitizer.
  test_files: []
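The gemspec hunks raise the runtime floors to Ruby 2.5.0 and Nokogiri 1.12.0 and drop the `nokogumbo` dependency. A minimal sketch using RubyGems' `Gem::Requirement` and `Gem::Version` to report which of those floors a set of installed versions misses, for example in a pre-upgrade check; `unmet_requirements` is hypothetical and the version strings are examples, not data from this release.

```ruby
require 'rubygems'

# Floors taken from the 6.0.0 changelog and gemspec metadata above.
REQUIREMENTS = {
  'ruby'     => Gem::Requirement.new('>= 2.5.0'),
  'nokogiri' => Gem::Requirement.new('>= 1.12.0')
}.freeze

# Hypothetical helper: names whose version misses the 6.0.0 floor.
# `versions` maps a name to a version String; a missing entry counts as unmet.
def unmet_requirements(versions)
  unmet = REQUIREMENTS.reject do |name, requirement|
    version = versions[name]
    version && requirement.satisfied_by?(Gem::Version.new(version))
  end
  unmet.keys
end

p unmet_requirements('ruby' => RUBY_VERSION, 'nokogiri' => '1.11.7')
p unmet_requirements('ruby' => '2.4.10', 'nokogiri' => '1.12.0')
```

RubyGems already enforces these requirements when the gem is installed, so a check like this is only useful for reporting ahead of an upgrade.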