sanitize 4.6.6 → 5.0.0
- checksums.yaml +4 -4
- data/HISTORY.md +56 -0
- data/README.md +4 -4
- data/lib/sanitize.rb +2 -44
- data/lib/sanitize/config/default.rb +6 -4
- data/lib/sanitize/transformers/clean_element.rb +44 -3
- data/lib/sanitize/version.rb +1 -1
- data/test/test_clean_comment.rb +1 -5
- data/test/test_clean_css.rb +1 -1
- data/test/test_clean_doctype.rb +8 -8
- data/test/test_clean_element.rb +108 -23
- data/test/test_malicious_html.rb +15 -6
- data/test/test_parser.rb +2 -31
- data/test/test_sanitize.rb +4 -4
- data/test/test_transformers.rb +4 -4
- data/test/test_unicode.rb +12 -12
- metadata +12 -12
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c88243234986bc11c6e1da92e05f9ea153d6016f5e5c3c8e8ad6602b7225e07f
|
4
|
+
data.tar.gz: abf83048949361fbcaf7fdb1d03066c9787303ceee39c42d69a245d300bc4453
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: f72364a3ec7939a07d30f681c58f4bd4bafa804dff0ecef69a8fb31b16d2e77439c4b1e18c756e370b756067e1bacd7bd8ea8943d447ad144396068da57798a2
|
7
|
+
data.tar.gz: 1ac997e7ae3f0ffc65d002e439b63bf755acda220bb295a7d648d474333e9d9747259f4cca2af715da8df9f425c17eb8a8148ba5cf12c91cbfee71a74da15eda
|
data/HISTORY.md
CHANGED
@@ -1,5 +1,41 @@
|
|
1
1
|
# Sanitize History
|
2
2
|
|
3
|
+
## 5.0.0 (2018-10-14)
|
4
|
+
|
5
|
+
For most users, upgrading from 4.x shouldn't require any changes. However, the
|
6
|
+
minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
|
7
|
+
differ in some small ways from 4.x's output. If this matters to you, please
|
8
|
+
review the changes below carefully.
|
9
|
+
|
10
|
+
### Potentially Breaking Changes
|
11
|
+
|
12
|
+
* Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
|
13
|
+
work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
|
14
|
+
no longer works in Ruby 1.9.x.
|
15
|
+
|
16
|
+
* Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
|
17
|
+
standard-compliant HTML serialization. [@stevecheckoway - #189][189]
|
18
|
+
|
19
|
+
* Children of the following elements are now removed by default when these
|
20
|
+
elements are removed, rather than being preserved and escaped:
|
21
|
+
|
22
|
+
- `iframe`
|
23
|
+
- `noembed`
|
24
|
+
- `noframes`
|
25
|
+
- `noscript`
|
26
|
+
- `script`
|
27
|
+
- `style`
|
28
|
+
|
29
|
+
* Children of whitelisted `iframe` elements are now always removed. In modern
|
30
|
+
HTML, `iframe` elements should never have children. In HTML 4 and earlier
|
31
|
+
`iframe` elements were allowed to contain fallback content for legacy
|
32
|
+
browsers, but it's been almost two decades since that was useful.
|
33
|
+
|
34
|
+
* Fixed a bug that caused `:remove_contents` to behave as if it were set to
|
35
|
+
`true` when it was actually an Array.
|
36
|
+
|
37
|
+
[189]:https://github.com/rgrove/sanitize/pull/189
|
38
|
+
|
3
39
|
## 4.6.6 (2018-07-23)
|
4
40
|
|
5
41
|
* Improved performance and memory usage by optimizing `Sanitize#transform_node!`
|
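The default `:remove_contents` change described in the 5.0.0 notes above can be sketched without the gem. This is a toy model under stated assumptions, not Sanitize's implementation: `Node`, `text`, `el`, `clean`, and `render` are hypothetical helpers invented for illustration, and the model ignores attributes, whitespace handling, and escaping.

```ruby
require 'set'

# Minimal stand-in for a DOM node (hypothetical; this is not Nokogiri's API).
Node = Struct.new(:name, :children, :text)

def text(str)
  Node.new('#text', [], str)
end

def el(name, *children)
  Node.new(name, children, nil)
end

# Element names taken from the 5.0.0 changelog entry.
DEFAULT_REMOVE_CONTENTS = Set.new(%w[iframe noembed noframes noscript script style]).freeze

# Returns the list of nodes that should replace `node` after cleaning.
def clean(node, allowed, remove_contents = DEFAULT_REMOVE_CONTENTS)
  return [node] if node.name == '#text'

  unless allowed.include?(node.name)
    # Filtered element listed in remove_contents: drop it and its children.
    return [] if remove_contents.include?(node.name)
    # Any other filtered element: drop the element, keep its cleaned children.
    return node.children.flat_map { |child| clean(child, allowed, remove_contents) }
  end

  [el(node.name, *node.children.flat_map { |child| clean(child, allowed, remove_contents) })]
end

def render(nodes)
  nodes.map do |n|
    n.name == '#text' ? n.text : "<#{n.name}>#{render(n.children)}</#{n.name}>"
  end.join
end

tree = el('div', text('baz'), el('script', text('alert(1)')), el('b', text('hi')))

render(clean(tree, Set['b']))           # => "baz<b>hi</b>"         (5.x-style default)
render(clean(tree, Set['b'], Set.new))  # => "bazalert(1)<b>hi</b>" (contents preserved)
```

With an empty set the `script` text survives as plain text, which roughly corresponds to the 4.x default of preserving and escaping it. The non-empty default drops it entirely.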
@@ -324,6 +360,26 @@ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
|
324
360
|
[n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
|
325
361
|
|
326
362
|
|
363
|
+
## 2.1.1 (2018-09-30)
|
364
|
+
|
365
|
+
* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
|
366
|
+
XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
|
367
|
+
|
368
|
+
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
|
369
|
+
specially crafted HTML fragment can cause libxml2 to generate improperly
|
370
|
+
escaped output, allowing non-whitelisted attributes to be used on whitelisted
|
371
|
+
elements.
|
372
|
+
|
373
|
+
Sanitize now performs additional escaping on affected attributes to prevent
|
374
|
+
this.
|
375
|
+
|
376
|
+
Many thanks to the Shopify Application Security Team for responsibly reporting
|
377
|
+
this issue.
|
378
|
+
|
379
|
+
[176]:https://github.com/rgrove/sanitize/issues/176
|
380
|
+
[188]:https://github.com/rgrove/sanitize/pull/188
|
381
|
+
|
382
|
+
|
327
383
|
## 2.1.0 (2014-01-13)
|
328
384
|
|
329
385
|
* Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
|
data/README.md
CHANGED
@@ -441,13 +441,13 @@ include the symbol `:relative` in the protocol array:
|
|
441
441
|
|
442
442
|
#### :remove_contents (boolean or Array or Set)
|
443
443
|
|
444
|
-
If
|
444
|
+
If this is `true`, Sanitize will remove the contents of any non-whitelisted
|
445
445
|
elements in addition to the elements themselves. By default, Sanitize leaves the
|
446
446
|
safe parts of an element's contents behind when the element is removed.
|
447
447
|
|
448
|
-
If
|
449
|
-
elements (when filtered) will be removed, and the contents of all
|
450
|
-
elements will be left behind.
|
448
|
+
If this is an Array or Set of element names, then only the contents of the
|
449
|
+
specified elements (when filtered) will be removed, and the contents of all
|
450
|
+
other filtered elements will be left behind.
|
451
451
|
|
452
452
|
The default value is `false`.
|
453
453
|
|
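The three accepted shapes of `:remove_contents` (boolean, Array, Set) can be illustrated with a small gem-free sketch. The helper names `remove_contents_policy` and `remove_contents?` are hypothetical and are not part of Sanitize's API. The `Enumerable` check mirrors the one that appears in the `CleanElement` hunk of this diff.

```ruby
require 'set'

# Normalizes a :remove_contents value into a policy hash (hypothetical helper).
def remove_contents_policy(value)
  if value.is_a?(Enumerable)
    # Array or Set of element names: strings and symbols are both accepted.
    { all: false, names: Set.new(value.map(&:to_s)) }
  else
    # true/false (or nil): either every filtered element's contents go, or none.
    { all: !!value, names: Set.new }
  end
end

# Should the contents of a filtered element with this name be removed?
def remove_contents?(policy, element_name)
  policy[:all] || policy[:names].include?(element_name.to_s)
end

policy = remove_contents_policy(Set.new([:script, :span]))
remove_contents?(policy, 'script')  # => true
remove_contents?(policy, 'div')     # => false
```

Converting names with `to_s` is what lets string and symbol lists behave identically, as the updated tests in this release expect.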
data/lib/sanitize.rb
CHANGED
@@ -121,19 +121,7 @@ class Sanitize
|
|
121
121
|
return '' unless html
|
122
122
|
|
123
123
|
html = preprocess(html)
|
124
|
-
|
125
|
-
|
126
|
-
# Hack to allow fragments containing <body>. Borrowed from
|
127
|
-
# Nokogiri::HTML::DocumentFragment.
|
128
|
-
if html =~ /\A<body(?:\s|>)/i
|
129
|
-
path = '/html/body'
|
130
|
-
else
|
131
|
-
path = '/html/body/node()'
|
132
|
-
end
|
133
|
-
|
134
|
-
frag = doc.fragment
|
135
|
-
frag << doc.xpath(path)
|
136
|
-
|
124
|
+
frag = Nokogiri::HTML5.fragment(html)
|
137
125
|
node!(frag)
|
138
126
|
to_html(frag)
|
139
127
|
end
|
@@ -184,37 +172,7 @@ class Sanitize
|
|
184
172
|
end
|
185
173
|
|
186
174
|
def to_html(node)
|
187
|
-
|
188
|
-
|
189
|
-
# Hacky workaround for a libxml2 bug that adds an undesired Content-Type
|
190
|
-
# meta tag to all serialized HTML documents.
|
191
|
-
#
|
192
|
-
# https://github.com/sparklemotion/nokogiri/issues/1008
|
193
|
-
if node.type == Nokogiri::XML::Node::DOCUMENT_NODE ||
|
194
|
-
node.type == Nokogiri::XML::Node::HTML_DOCUMENT_NODE
|
195
|
-
|
196
|
-
regex_meta = %r|(<html[^>]*>\s*<head[^>]*>\s*)<meta http-equiv="Content-Type" content="text/html; charset=utf-8">|i
|
197
|
-
|
198
|
-
# Only replace the content-type meta tag if <meta> isn't whitelisted or
|
199
|
-
# the original document didn't actually include a content-type meta tag.
|
200
|
-
replace_meta = !@config[:elements].include?('meta') ||
|
201
|
-
node.xpath('/html/head/meta[@http-equiv]').none? do |meta|
|
202
|
-
meta['http-equiv'].casecmp('content-type').zero?
|
203
|
-
end
|
204
|
-
end
|
205
|
-
|
206
|
-
so = Nokogiri::XML::Node::SaveOptions
|
207
|
-
|
208
|
-
# Serialize to HTML without any formatting to prevent Nokogiri from adding
|
209
|
-
# newlines after certain tags.
|
210
|
-
html = node.to_html(
|
211
|
-
:encoding => 'utf-8',
|
212
|
-
:indent => 0,
|
213
|
-
:save_with => so::NO_DECLARATION | so::NO_EMPTY_TAGS | so::AS_HTML
|
214
|
-
)
|
215
|
-
|
216
|
-
html.gsub!(regex_meta, '\1') if replace_meta
|
217
|
-
html
|
175
|
+
node.to_html(preserve_newline: true)
|
218
176
|
end
|
219
177
|
|
220
178
|
def transform_node!(node, node_whitelist)
|
data/lib/sanitize/config/default.rb
CHANGED
@@ -66,10 +66,12 @@ class Sanitize
|
|
66
66
|
# leaves the safe parts of an element's contents behind when the element
|
67
67
|
# is removed.
|
68
68
|
#
|
69
|
-
# If this is an Array of element names, then only the contents of
|
70
|
-
# specified elements (when filtered) will be removed, and the contents
|
71
|
-
# all other filtered elements will be left behind.
|
72
|
-
:remove_contents =>
|
69
|
+
# If this is an Array or Set of element names, then only the contents of
|
70
|
+
# the specified elements (when filtered) will be removed, and the contents
|
71
|
+
# of all other filtered elements will be left behind.
|
72
|
+
:remove_contents => %w[
|
73
|
+
iframe noembed noframes noscript script style
|
74
|
+
],
|
73
75
|
|
74
76
|
# Transformers allow you to filter or alter nodes using custom logic. See
|
75
77
|
# README.md for details and examples.
|
data/lib/sanitize/transformers/clean_element.rb
CHANGED
@@ -67,7 +67,7 @@ class Sanitize; module Transformers; class CleanElement
|
|
67
67
|
@whitespace_elements = config[:whitespace_elements]
|
68
68
|
end
|
69
69
|
|
70
|
-
if config[:remove_contents].is_a?(
|
70
|
+
if config[:remove_contents].is_a?(Enumerable)
|
71
71
|
@remove_element_contents.merge(config[:remove_contents].map(&:to_s))
|
72
72
|
else
|
73
73
|
@remove_all_contents = !!config[:remove_contents]
|
@@ -97,8 +97,10 @@ class Sanitize; module Transformers; class CleanElement
|
|
97
97
|
end
|
98
98
|
end
|
99
99
|
|
100
|
-
unless
|
101
|
-
|
100
|
+
unless node.children.empty?
|
101
|
+
unless @remove_all_contents || @remove_element_contents.include?(name)
|
102
|
+
node.add_previous_sibling(node.children)
|
103
|
+
end
|
102
104
|
end
|
103
105
|
|
104
106
|
node.unlink
|
@@ -166,6 +168,11 @@ class Sanitize; module Transformers; class CleanElement
|
|
166
168
|
# affected attributes, some of which can exist on any element and some
|
167
169
|
# of which can only exist on `<a>` elements.
|
168
170
|
#
|
171
|
+
# This fix is technically no longer necessary with Nokogumbo >= 2.0
|
172
|
+
# since it no longer uses libxml2's serializer, but it's retained to
|
173
|
+
# avoid breaking use cases where people might be sanitizing individual
|
174
|
+
# Nokogiri nodes and then serializing them manually without Nokogumbo.
|
175
|
+
#
|
169
176
|
# The relevant libxml2 code is here:
|
170
177
|
# <https://github.com/GNOME/libxml2/commit/960f0e275616cadc29671a218d7fb9b69eb35588>
|
171
178
|
if UNSAFE_LIBXML_ATTRS_GLOBAL.include?(attr_name) ||
|
@@ -180,6 +187,40 @@ class Sanitize; module Transformers; class CleanElement
|
|
180
187
|
if @add_attributes.include?(name)
|
181
188
|
@add_attributes[name].each {|key, val| node[key] = val }
|
182
189
|
end
|
190
|
+
|
191
|
+
# Element-specific special cases.
|
192
|
+
case name
|
193
|
+
|
194
|
+
# If this is a whitelisted iframe that has children, remove all its
|
195
|
+
# children. The HTML standard says iframes shouldn't have content, but when
|
196
|
+
# they do, this content is parsed as text and is serialized verbatim without
|
197
|
+
# being escaped, which is unsafe because legacy browsers may still render it
|
198
|
+
# and execute `<script>` content. So the safe and correct thing to do is to
|
199
|
+
# always remove iframe content.
|
200
|
+
when 'iframe'
|
201
|
+
if !node.children.empty?
|
202
|
+
node.children.each do |child|
|
203
|
+
child.unlink
|
204
|
+
end
|
205
|
+
end
|
206
|
+
|
207
|
+
# Prevent the use of `<meta>` elements that set a charset other than UTF-8,
|
208
|
+
# since Sanitize's output is always UTF-8.
|
209
|
+
when 'meta'
|
210
|
+
if node.has_attribute?('charset') &&
|
211
|
+
node['charset'].downcase != 'utf-8'
|
212
|
+
|
213
|
+
node['charset'] = 'utf-8'
|
214
|
+
end
|
215
|
+
|
216
|
+
if node.has_attribute?('http-equiv') &&
|
217
|
+
node.has_attribute?('content') &&
|
218
|
+
node['http-equiv'].downcase == 'content-type' &&
|
219
|
+
node['content'].downcase =~ /;\s*charset\s*=\s*(?!utf-8)/
|
220
|
+
|
221
|
+
node['content'] = node['content'].gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
|
222
|
+
end
|
223
|
+
end
|
183
224
|
end
|
184
225
|
|
185
226
|
end; end; end
|
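The `content` rewrite applied to `<meta http-equiv="content-type">` elements can be tried in isolation. In this minimal sketch the two regular expressions are copied from the hunk above, while the wrapper `force_utf8_charset` is a hypothetical name introduced only for illustration.

```ruby
# Rewrites a content-type value so that any non-UTF-8 charset becomes UTF-8.
# Hypothetical helper; only the regular expressions come from the diff.
def force_utf8_charset(content)
  if content.downcase =~ /;\s*charset\s*=\s*(?!utf-8)/
    # Replace everything from the charset parameter to the end of the value.
    content.gsub(/;\s*charset\s*=.+\z/, ';charset=utf-8')
  else
    # Already UTF-8, or no charset parameter at all: leave the value untouched.
    content
  end
end

force_utf8_charset(' text/html; charset=us-ascii')   # => " text/html;charset=utf-8"
force_utf8_charset('text/plain;charset = us-ascii')  # => "text/plain;charset=utf-8"
force_utf8_charset('text/html;charset=utf-8')        # => "text/html;charset=utf-8"
```

The inputs and outputs match the `content` attribute values exercised by the new `<meta>` tests in `test_clean_element.rb`.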
data/lib/sanitize/version.rb
CHANGED
data/test/test_clean_comment.rb
CHANGED
@@ -20,7 +20,7 @@ describe 'Sanitize::Transformers::CleanComment' do
|
|
20
20
|
|
21
21
|
# Special case: the comment markup is inside a <script>, which makes it
|
22
22
|
# text content and not an actual HTML comment.
|
23
|
-
@s.fragment("<script><!-- comment --></script>").must_equal '
|
23
|
+
@s.fragment("<script><!-- comment --></script>").must_equal ''
|
24
24
|
|
25
25
|
Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => false, :elements => ['script'])
|
26
26
|
.must_equal '<script><!-- comment --></script>'
|
@@ -40,10 +40,6 @@ describe 'Sanitize::Transformers::CleanComment' do
|
|
40
40
|
@s.fragment("foo <!-- <!-- <!-- --> --> -->bar").must_equal 'foo <!-- <!-- <!-- --> --&gt; --&gt;bar'
|
41
41
|
@s.fragment("foo <div <!-- comment -->>bar</div>").must_equal 'foo <div>&gt;bar</div>'
|
42
42
|
|
43
|
-
# Special case: the comment markup is inside a <script>, which makes it
|
44
|
-
# text content and not an actual HTML comment.
|
45
|
-
@s.fragment("<script><!-- comment --></script>").must_equal '<!-- comment -->'
|
46
|
-
|
47
43
|
Sanitize.fragment("<script><!-- comment --></script>", :allow_comments => true, :elements => ['script'])
|
48
44
|
.must_equal '<script><!-- comment --></script>'
|
49
45
|
end
|
data/test/test_clean_css.rb
CHANGED
@@ -13,7 +13,7 @@ describe 'Sanitize::Transformers::CSS::CleanAttribute' do
|
|
13
13
|
@s.fragment(%[
|
14
14
|
<div style="color: #fff; width: expression(alert(1)); /* <-- evil! */"></div>
|
15
15
|
].strip).must_equal %[
|
16
|
-
<div style="color: #fff; /*
|
16
|
+
<div style="color: #fff; /* <-- evil! */"></div>
|
17
17
|
].strip
|
18
18
|
end
|
19
19
|
|
data/test/test_clean_doctype.rb
CHANGED
@@ -11,7 +11,7 @@ describe 'Sanitize::Transformers::CleanDoctype' do
|
|
11
11
|
end
|
12
12
|
|
13
13
|
it 'should remove doctype declarations' do
|
14
|
-
@s.document('<!DOCTYPE html><html>foo</html>').must_equal "<html>foo</html
|
14
|
+
@s.document('<!DOCTYPE html><html>foo</html>').must_equal "<html>foo</html>"
|
15
15
|
@s.fragment('<!DOCTYPE html>foo').must_equal 'foo'
|
16
16
|
end
|
17
17
|
|
@@ -34,27 +34,27 @@ describe 'Sanitize::Transformers::CleanDoctype' do
|
|
34
34
|
|
35
35
|
it 'should allow doctype declarations in documents' do
|
36
36
|
@s.document('<!DOCTYPE html><html>foo</html>')
|
37
|
-
.must_equal "<!DOCTYPE html
|
37
|
+
.must_equal "<!DOCTYPE html><html>foo</html>"
|
38
38
|
|
39
39
|
@s.document('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><html>foo</html>')
|
40
|
-
.must_equal "<!DOCTYPE html
|
40
|
+
.must_equal "<!DOCTYPE html><html>foo</html>"
|
41
41
|
|
42
42
|
@s.document("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html>foo</html>")
|
43
|
-
.must_equal "<!DOCTYPE html
|
43
|
+
.must_equal "<!DOCTYPE html><html>foo</html>"
|
44
44
|
end
|
45
45
|
|
46
46
|
it 'should not allow obviously invalid doctype declarations in documents' do
|
47
47
|
@s.document('<!DOCTYPE blah blah blah><html>foo</html>')
|
48
|
-
.must_equal "<!DOCTYPE html
|
48
|
+
.must_equal "<!DOCTYPE html><html>foo</html>"
|
49
49
|
|
50
50
|
@s.document('<!DOCTYPE blah><html>foo</html>')
|
51
|
-
.must_equal "<!DOCTYPE html
|
51
|
+
.must_equal "<!DOCTYPE html><html>foo</html>"
|
52
52
|
|
53
53
|
@s.document('<!DOCTYPE html BLAH "-//W3C//DTD HTML 4.01//EN"><html>foo</html>')
|
54
|
-
.must_equal "<!DOCTYPE html
|
54
|
+
.must_equal "<!DOCTYPE html><html>foo</html>"
|
55
55
|
|
56
56
|
@s.document('<!whatever><html>foo</html>')
|
57
|
-
.must_equal "<html>foo</html
|
57
|
+
.must_equal "<html>foo</html>"
|
58
58
|
end
|
59
59
|
|
60
60
|
it 'should not allow doctype definitions in fragments' do
|
data/test/test_clean_element.rb
CHANGED
@@ -8,25 +8,22 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
8
8
|
strings = {
|
9
9
|
:basic => {
|
10
10
|
:html => '<b>Lo<!-- comment -->rem</b> <a href="pants" title="foo" style="text-decoration: underline;">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <style>.foo { color: #fff; }</style> <script>alert("hello world");</script>',
|
11
|
-
|
12
|
-
:
|
13
|
-
:
|
14
|
-
:
|
15
|
-
:relaxed => '<b>Lorem</b> <a href="pants" title="foo" style="text-decoration: underline;">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet <style>.foo { color: #fff; }</style> alert("hello world");'
|
11
|
+
:default => 'Lorem ipsum dolor sit amet ',
|
12
|
+
:restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet ',
|
13
|
+
:basic => '<b>Lorem</b> <a href="pants" rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet ',
|
14
|
+
:relaxed => '<b>Lorem</b> <a href="pants" title="foo" style="text-decoration: underline;">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet <style>.foo { color: #fff; }</style> '
|
16
15
|
},
|
17
16
|
|
18
17
|
:malformed => {
|
19
18
|
:html => 'Lo<!-- comment -->rem</b> <a href=pants title="foo>ipsum <a href="http://foo.com/"><strong>dolor</a></strong> sit<br/>amet <script>alert("hello world");',
|
20
|
-
|
21
|
-
:
|
22
|
-
:
|
23
|
-
:
|
24
|
-
:relaxed => 'Lorem <a href="pants" title="foo>ipsum <a href="><strong>dolor</strong></a> sit<br>amet alert("hello world");',
|
19
|
+
:default => 'Lorem dolor sit amet ',
|
20
|
+
:restricted => 'Lorem <strong>dolor</strong> sit amet ',
|
21
|
+
:basic => 'Lorem <a href="pants" rel="nofollow"><strong>dolor</strong></a> sit<br>amet ',
|
22
|
+
:relaxed => 'Lorem <a href="pants" title="foo>ipsum <a href="><strong>dolor</strong></a> sit<br>amet ',
|
25
23
|
},
|
26
24
|
|
27
25
|
:unclosed => {
|
28
26
|
:html => '<p>a</p><blockquote>b',
|
29
|
-
|
30
27
|
:default => ' a b ',
|
31
28
|
:restricted => ' a b ',
|
32
29
|
:basic => '<p>a</p><blockquote>b</blockquote>',
|
@@ -35,7 +32,6 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
35
32
|
|
36
33
|
:malicious => {
|
37
34
|
:html => '<b>Lo<!-- comment -->rem</b> <a href="javascript:pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <<foo>script>alert("hello world");</script>',
|
38
|
-
|
39
35
|
:default => 'Lorem ipsum dolor sit amet &lt;script&gt;alert("hello world");',
|
40
36
|
:restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet &lt;script&gt;alert("hello world");',
|
41
37
|
:basic => '<b>Lorem</b> <a rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet &lt;script&gt;alert("hello world");',
|
@@ -171,10 +167,10 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
171
167
|
.must_equal 'foo bar baz quux'
|
172
168
|
|
173
169
|
Sanitize.fragment('<script>alert("<xss>");</script>')
|
174
|
-
.must_equal '
|
170
|
+
.must_equal ''
|
175
171
|
|
176
172
|
Sanitize.fragment('<<script>script>alert("<xss>");</<script>>')
|
177
|
-
.must_equal '<
|
173
|
+
.must_equal '&lt;'
|
178
174
|
|
179
175
|
Sanitize.fragment('< script <>> alert("<xss>");</script>')
|
180
176
|
.must_equal '&lt; script &lt;&gt;&gt; alert("");'
|
@@ -196,6 +192,46 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
196
192
|
.must_equal ''
|
197
193
|
end
|
198
194
|
|
195
|
+
it 'should escape the content of removed `plaintext` elements' do
|
196
|
+
Sanitize.fragment('<plaintext>hello! <script>alert(0)</script>')
|
197
|
+
.must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
|
198
|
+
end
|
199
|
+
|
200
|
+
it 'should escape the content of removed `xmp` elements' do
|
201
|
+
Sanitize.fragment('<xmp>hello! <script>alert(0)</script></xmp>')
|
202
|
+
.must_equal 'hello! &lt;script&gt;alert(0)&lt;/script&gt;'
|
203
|
+
end
|
204
|
+
|
205
|
+
it 'should not preserve the content of removed `iframe` elements' do
|
206
|
+
Sanitize.fragment('<iframe>hello! <script>alert(0)</script></iframe>')
|
207
|
+
.must_equal ''
|
208
|
+
end
|
209
|
+
|
210
|
+
it 'should not preserve the content of removed `noembed` elements' do
|
211
|
+
Sanitize.fragment('<noembed>hello! <script>alert(0)</script></noembed>')
|
212
|
+
.must_equal ''
|
213
|
+
end
|
214
|
+
|
215
|
+
it 'should not preserve the content of removed `noframes` elements' do
|
216
|
+
Sanitize.fragment('<noframes>hello! <script>alert(0)</script></noframes>')
|
217
|
+
.must_equal ''
|
218
|
+
end
|
219
|
+
|
220
|
+
it 'should not preserve the content of removed `noscript` elements' do
|
221
|
+
Sanitize.fragment('<noscript>hello! <script>alert(0)</script></noscript>')
|
222
|
+
.must_equal ''
|
223
|
+
end
|
224
|
+
|
225
|
+
it 'should not preserve the content of removed `script` elements' do
|
226
|
+
Sanitize.fragment('<script>hello! <script>alert(0)</script></script>')
|
227
|
+
.must_equal ''
|
228
|
+
end
|
229
|
+
|
230
|
+
it 'should not preserve the content of removed `style` elements' do
|
231
|
+
Sanitize.fragment('<style>hello! <script>alert(0)</script></style>')
|
232
|
+
.must_equal ''
|
233
|
+
end
|
234
|
+
|
199
235
|
strings.each do |name, data|
|
200
236
|
it "should clean #{name} HTML" do
|
201
237
|
Sanitize.fragment(data[:html]).must_equal(data[:default])
|
@@ -234,7 +270,7 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
234
270
|
|
235
271
|
it 'should not choke on valueless attributes' do
|
236
272
|
@s.fragment('foo <a href>foo</a> bar')
|
237
|
-
.must_equal 'foo <a href rel="nofollow">foo</a> bar'
|
273
|
+
.must_equal 'foo <a href="" rel="nofollow">foo</a> bar'
|
238
274
|
end
|
239
275
|
|
240
276
|
it 'should downcase attribute names' do
|
@@ -262,7 +298,7 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
262
298
|
|
263
299
|
it 'should encode special chars in attribute values' do
|
264
300
|
@s.fragment('<a href="http://example.com" title="<b>éxamples</b> & things">foo</a>')
|
265
|
-
.must_equal '<a href="http://example.com" title="
|
301
|
+
.must_equal '<a href="http://example.com" title="<b>éxamples</b> &amp; things">foo</a>'
|
266
302
|
end
|
267
303
|
|
268
304
|
strings.each do |name, data|
|
@@ -344,16 +380,30 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
344
380
|
).must_equal 'foo bar '
|
345
381
|
end
|
346
382
|
|
347
|
-
it 'should remove the contents of specified nodes when :remove_contents is an Array of element names as strings' do
|
348
|
-
Sanitize.fragment('foo bar <div>baz<span>quux</span><script>alert("hello!");</script></div>',
|
383
|
+
it 'should remove the contents of specified nodes when :remove_contents is an Array or Set of element names as strings' do
|
384
|
+
Sanitize.fragment('foo bar <div>baz<span>quux</span> <b>hi</b><script>alert("hello!");</script></div>',
|
349
385
|
:remove_contents => ['script', 'span']
|
350
|
-
).must_equal 'foo bar baz '
|
386
|
+
).must_equal 'foo bar baz hi '
|
387
|
+
|
388
|
+
Sanitize.fragment('foo bar <div>baz<span>quux</span> <b>hi</b><script>alert("hello!");</script></div>',
|
389
|
+
:remove_contents => Set.new(['script', 'span'])
|
390
|
+
).must_equal 'foo bar baz hi '
|
351
391
|
end
|
352
392
|
|
353
|
-
it 'should remove the contents of specified nodes when :remove_contents is an Array of element names as symbols' do
|
354
|
-
Sanitize.fragment('foo bar <div>baz<span>quux</span><script>alert("hello!");</script></div>',
|
393
|
+
it 'should remove the contents of specified nodes when :remove_contents is an Array or Set of element names as symbols' do
|
394
|
+
Sanitize.fragment('foo bar <div>baz<span>quux</span> <b>hi</b><script>alert("hello!");</script></div>',
|
355
395
|
:remove_contents => [:script, :span]
|
356
|
-
).must_equal 'foo bar baz '
|
396
|
+
).must_equal 'foo bar baz hi '
|
397
|
+
|
398
|
+
Sanitize.fragment('foo bar <div>baz<span>quux</span> <b>hi</b><script>alert("hello!");</script></div>',
|
399
|
+
:remove_contents => Set.new([:script, :span])
|
400
|
+
).must_equal 'foo bar baz hi '
|
401
|
+
end
|
402
|
+
|
403
|
+
it 'should remove the contents of whitelisted iframes' do
|
404
|
+
Sanitize.fragment('<iframe>hi <script>hello</script></iframe>',
|
405
|
+
:elements => ['iframe']
|
406
|
+
).must_equal '<iframe></iframe>'
|
357
407
|
end
|
358
408
|
|
359
409
|
it 'should not allow arbitrary HTML5 data attributes by default' do
|
@@ -413,7 +463,7 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
413
463
|
s.fragment('foo<br>bar<br>baz').must_equal "foo\nbar\nbaz"
|
414
464
|
end
|
415
465
|
|
416
|
-
it '
|
466
|
+
it 'should handle protocols correctly regardless of case' do
|
417
467
|
input = '<a href="hTTpS://foo.com/">Text</a>'
|
418
468
|
|
419
469
|
Sanitize.fragment(input, {
|
@@ -430,5 +480,40 @@ describe 'Sanitize::Transformers::CleanElement' do
|
|
430
480
|
:protocols => {'a' => {'href' => ['https']}}
|
431
481
|
}).must_equal "<a>Text</a>"
|
432
482
|
end
|
483
|
+
|
484
|
+
it 'should prevent `<meta>` tags from being used to set a non-UTF-8 charset' do
|
485
|
+
Sanitize.document('<html><head><meta charset="utf-8"></head><body>Howdy!</body></html>',
|
486
|
+
:elements => %w[html head meta body],
|
487
|
+
:attributes => {'meta' => ['charset']}
|
488
|
+
).must_equal "<html><head><meta charset=\"utf-8\"></head><body>Howdy!</body></html>"
|
489
|
+
|
490
|
+
Sanitize.document('<html><meta charset="utf-8">Howdy!</html>',
|
491
|
+
:elements => %w[html meta],
|
492
|
+
:attributes => {'meta' => ['charset']}
|
493
|
+
).must_equal "<html><meta charset=\"utf-8\">Howdy!</html>"
|
494
|
+
|
495
|
+
Sanitize.document('<html><meta charset="us-ascii">Howdy!</html>',
|
496
|
+
:elements => %w[html meta],
|
497
|
+
:attributes => {'meta' => ['charset']}
|
498
|
+
).must_equal "<html><meta charset=\"utf-8\">Howdy!</html>"
|
499
|
+
|
500
|
+
Sanitize.document('<html><meta http-equiv="content-type" content=" text/html; charset=us-ascii">Howdy!</html>',
|
501
|
+
:elements => %w[html meta],
|
502
|
+
:attributes => {'meta' => %w[content http-equiv]}
|
503
|
+
).must_equal "<html><meta http-equiv=\"content-type\" content=\" text/html;charset=utf-8\">Howdy!</html>"
|
504
|
+
|
505
|
+
Sanitize.document('<html><meta http-equiv="Content-Type" content="text/plain;charset = us-ascii">Howdy!</html>',
|
506
|
+
:elements => %w[html meta],
|
507
|
+
:attributes => {'meta' => %w[content http-equiv]}
|
508
|
+
).must_equal "<html><meta http-equiv=\"Content-Type\" content=\"text/plain;charset=utf-8\">Howdy!</html>"
|
509
|
+
end
|
510
|
+
|
511
|
+
it 'should not modify `<meta>` tags that already set a UTF-8 charset' do
|
512
|
+
Sanitize.document('<html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"></head><body>Howdy!</body></html>',
|
513
|
+
:elements => %w[html head meta body],
|
514
|
+
:attributes => {'meta' => %w[content http-equiv]}
|
515
|
+
).must_equal "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head><body>Howdy!</body></html>"
|
516
|
+
end
|
517
|
+
|
433
518
|
end
|
434
519
|
end
|
data/test/test_malicious_html.rb
CHANGED
@@ -43,7 +43,7 @@ describe 'Malicious HTML' do
|
|
43
43
|
describe '<body>' do
|
44
44
|
it 'should not be possible to inject JS via a malformed event attribute' do
|
45
45
|
@s.document('<html><head></head><body onload!#$%&()*~+-_.,:;?@[/|\\]^`=alert("XSS")></body></html>').
|
46
|
-
must_equal "<html><head></head><body></body></html
|
46
|
+
must_equal "<html><head></head><body></body></html>"
|
47
47
|
end
|
48
48
|
end
|
49
49
|
|
@@ -65,7 +65,7 @@ describe 'Malicious HTML' do
|
|
65
65
|
|
66
66
|
it 'should not be possible to inject <script> via a malformed <img> tag' do
|
67
67
|
@s.fragment('<img """><script>alert("XSS")</script>">').
|
68
|
-
must_equal '<img>
|
68
|
+
must_equal '<img>"&gt;'
|
69
69
|
end
|
70
70
|
|
71
71
|
it 'should not be possible to inject protocol-based JS' do
|
@@ -117,12 +117,12 @@ describe 'Malicious HTML' do
|
|
117
117
|
describe '<script>' do
|
118
118
|
it 'should not be possible to inject <script> using a malformed non-alphanumeric tag name' do
|
119
119
|
@s.fragment(%[<script/xss src="http://ha.ckers.org/xss.js">alert(1)</script>]).
|
120
|
-
must_equal '
|
120
|
+
must_equal ''
|
121
121
|
end
|
122
122
|
|
123
123
|
it 'should not be possible to inject <script> via extraneous open brackets' do
|
124
124
|
@s.fragment(%[<<script>alert("XSS");//<</script>]).
|
125
|
-
must_equal '<
|
125
|
+
must_equal '&lt;'
|
126
126
|
end
|
127
127
|
end
|
128
128
|
|
@@ -166,7 +166,12 @@ describe 'Malicious HTML' do
|
|
166
166
|
input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
|
167
167
|
|
168
168
|
it 'should escape unsafe characters in attributes' do
|
169
|
-
|
169
|
+
output = %[<#{tag_name} #{attr_name}="examp<!--%22%20onmouseover=alert(1)>-->le.com">foo</#{tag_name}>]
|
170
|
+
@s.fragment(input).must_equal(output)
|
171
|
+
|
172
|
+
fragment = Nokogiri::HTML.fragment(input)
|
173
|
+
@s.node!(fragment)
|
174
|
+
fragment.to_html.must_equal(output)
|
170
175
|
end
|
171
176
|
|
172
177
|
it 'should round-trip to the same output' do
|
@@ -179,7 +184,11 @@ describe 'Malicious HTML' do
|
|
179
184
|
input = %[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>]
|
180
185
|
|
181
186
|
it 'should not escape characters unnecessarily' do
|
182
|
-
@s.fragment(input).must_equal(
|
187
|
+
@s.fragment(input).must_equal(%[<#{tag_name} #{attr_name}="examp<!--&quot; onmouseover=alert(1)>-->le.com">foo</#{tag_name}>])
|
188
|
+
|
189
|
+
fragment = Nokogiri::HTML.fragment(input)
|
190
|
+
@s.node!(fragment)
|
191
|
+
fragment.to_html.must_equal(%[<#{tag_name} #{attr_name}='examp<!--" onmouseover=alert(1)>-->le.com'>foo</#{tag_name}>])
|
183
192
|
end
|
184
193
|
|
185
194
|
it 'should round-trip to the same output' do
|
data/test/test_parser.rb
CHANGED
@@ -19,8 +19,8 @@ describe 'Parser' do
|
|
19
19
|
end
|
20
20
|
|
21
21
|
it 'should not have the Nokogiri 1.4.2+ unterminated script/style element bug' do
|
22
|
-
Sanitize.fragment('foo <script>bar').must_equal 'foo
|
23
|
-
Sanitize.fragment('foo <style>bar').must_equal 'foo
|
22
|
+
Sanitize.fragment('foo <script>bar').must_equal 'foo '
|
23
|
+
Sanitize.fragment('foo <style>bar').must_equal 'foo '
|
24
24
|
end
|
25
25
|
|
26
26
|
it 'ambiguous non-tag brackets like "1 > 2 and 2 < 1" should be parsed correctly' do
|
@@ -28,35 +28,6 @@ describe 'Parser' do
|
|
28
28
|
Sanitize.fragment('OMG HAPPY BIRTHDAY! *<:-D').must_equal 'OMG HAPPY BIRTHDAY! *&lt;:-D'
  end

- # https://github.com/sparklemotion/nokogiri/issues/1008
- it 'should work around the libxml2 content-type meta tag bug' do
- Sanitize.document('<html><head></head><body>Howdy!</body></html>',
- :elements => %w[html head body]
- ).must_equal "<html><head></head><body>Howdy!</body></html>\n"
-
- Sanitize.document('<html><head></head><body>Howdy!</body></html>',
- :elements => %w[html head meta body]
- ).must_equal "<html><head></head><body>Howdy!</body></html>\n"
-
- Sanitize.document('<html><head><meta charset="utf-8"></head><body>Howdy!</body></html>',
- :elements => %w[html head meta body],
- :attributes => {'meta' => ['charset']}
- ).must_equal "<html><head><meta charset=\"utf-8\"></head><body>Howdy!</body></html>\n"
-
- Sanitize.document('<html><head><meta http-equiv="Content-Type" content="text/html;charset=utf-8"></head><body>Howdy!</body></html>',
- :elements => %w[html head meta body],
- :attributes => {'meta' => %w[charset content http-equiv]}
- ).must_equal "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head><body>Howdy!</body></html>\n"
-
- # Edge case: an existing content-type meta tag with a non-UTF-8 content type
- # will be converted to UTF-8, since that's the only output encoding we
- # support.
- Sanitize.document('<html><head><meta http-equiv="content-type" content="text/html;charset=us-ascii"></head><body>Howdy!</body></html>',
- :elements => %w[html head meta body],
- :attributes => {'meta' => %w[charset content http-equiv]}
- ).must_equal "<html><head><meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\"></head><body>Howdy!</body></html>\n"
- end
-
  describe 'when siblings are added after a node during traversal' do
  it 'the added siblings should be traversed' do
  html = %[
data/test/test_sanitize.rb
CHANGED
@@ -25,7 +25,7 @@ describe 'Sanitize' do

  it 'should sanitize an HTML document' do
  @s.document('<!doctype html><html><b>Lo<!-- comment -->rem</b> <a href="pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <script>alert("hello world");</script></html>')
- .must_equal "<html>Lorem ipsum dolor sit amet
+ .must_equal "<html>Lorem ipsum dolor sit amet </html>"
  end

  it 'should not modify the input string' do
@@ -35,14 +35,14 @@ describe 'Sanitize' do
  end

  it 'should not choke on frozen documents' do
- @s.document('<!doctype html><html><b>foo</b>'.freeze).must_equal "<html>foo</html
+ @s.document('<!doctype html><html><b>foo</b>'.freeze).must_equal "<html>foo</html>"
  end
  end

  describe '#fragment' do
  it 'should sanitize an HTML fragment' do
  @s.fragment('<b>Lo<!-- comment -->rem</b> <a href="pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <script>alert("hello world");</script>')
- .must_equal 'Lorem ipsum dolor sit amet
+ .must_equal 'Lorem ipsum dolor sit amet '
  end

  it 'should not modify the input string' do
@@ -71,7 +71,7 @@ describe 'Sanitize' do
  doc.xpath('/html/body/node()').each {|node| frag << node }

  @s.node!(frag)
- frag.to_html.must_equal 'Lorem ipsum dolor sit amet
+ frag.to_html.must_equal 'Lorem ipsum dolor sit amet '
  end

  describe "when the given node is a document and <html> isn't whitelisted" do
data/test/test_transformers.rb
CHANGED
@@ -172,28 +172,28 @@ describe 'Transformers' do
  input = '<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'

  Sanitize.fragment(input, :transformers => youtube_transformer)
- .must_equal '<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""
+ .must_equal '<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""></iframe>'
  end

  it 'should allow HTTPS YouTube video embeds' do
  input = '<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'

  Sanitize.fragment(input, :transformers => youtube_transformer)
- .must_equal '<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""
+ .must_equal '<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""></iframe>'
  end

  it 'should allow protocol-relative YouTube video embeds' do
  input = '<iframe width="420" height="315" src="//www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'

  Sanitize.fragment(input, :transformers => youtube_transformer)
- .must_equal '<iframe width="420" height="315" src="//www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""
+ .must_equal '<iframe width="420" height="315" src="//www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""></iframe>'
  end

  it 'should allow privacy-enhanced YouTube video embeds' do
  input = '<iframe width="420" height="315" src="https://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'

  Sanitize.fragment(input, :transformers => youtube_transformer)
- .must_equal '<iframe width="420" height="315" src="https://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""
+ .must_equal '<iframe width="420" height="315" src="https://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen=""></iframe>'
  end

  it 'should not allow non-YouTube video embeds' do
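The four passing cases in this hunk differ only in the `src` scheme and host: HTTP, HTTPS, protocol-relative, and the privacy-enhanced `youtube-nocookie.com` domain. The `youtube_transformer` under test is defined outside the lines shown here, so the following is only a sketch of the kind of URL check such a transformer might perform. `YOUTUBE_EMBED_SRC` and `youtube_embed_src?` are hypothetical names, not part of Sanitize, and the `example.com` URL is an invented negative case.

```ruby
# Hypothetical helper (not part of Sanitize): decides whether an iframe `src`
# points at a YouTube embed, covering the URL shapes exercised by the tests.
# The scheme is optional so protocol-relative URLs ("//www.youtube.com/...")
# are accepted; the host must be youtube.com or youtube-nocookie.com.
YOUTUBE_EMBED_SRC = %r{\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/embed/}

def youtube_embed_src?(src)
  # to_s guards against a missing attribute (nil).
  YOUTUBE_EMBED_SRC.match?(src.to_s)
end

youtube_embed_src?('http://www.youtube.com/embed/QH2-TGUlwu4')           # => true
youtube_embed_src?('//www.youtube.com/embed/QH2-TGUlwu4')                # => true
youtube_embed_src?('https://www.youtube-nocookie.com/embed/QH2-TGUlwu4') # => true
youtube_embed_src?('https://www.example.com/embed/QH2-TGUlwu4')          # => false
```

Anchoring the pattern with `\A` and requiring `/embed/` directly after the host keeps look-alike hosts such as `www.youtube.com.example.org` from matching.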
data/test/test_unicode.rb
CHANGED
@@ -23,61 +23,61 @@ describe 'Unicode' do
  end

  it 'should strip deprecated grave and acute clones' do
- @s.document("a\u0340b\u0341c").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\u0340b\u0341c").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\u0340b\u0341c").must_equal 'abc'
  end

  it 'should strip deprecated Khmer characters' do
- @s.document("a\u17a3b\u17d3c").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\u17a3b\u17d3c").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\u17a3b\u17d3c").must_equal 'abc'
  end

  it 'should strip line and paragraph separator punctuation' do
- @s.document("a\u2028b\u2029c").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\u2028b\u2029c").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\u2028b\u2029c").must_equal 'abc'
  end

  it 'should strip bidi embedding control characters' do
  @s.document("a\u202ab\u202bc\u202cd\u202de\u202e")
- .must_equal "<html><head></head><body>abcde</body></html
+ .must_equal "<html><head></head><body>abcde</body></html>"

  @s.fragment("a\u202ab\u202bc\u202cd\u202de\u202e")
  .must_equal 'abcde'
  end

  it 'should strip deprecated symmetric swapping characters' do
- @s.document("a\u206ab\u206bc").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\u206ab\u206bc").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\u206ab\u206bc").must_equal 'abc'
  end

  it 'should strip deprecated Arabic form shaping characters' do
- @s.document("a\u206cb\u206dc").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\u206cb\u206dc").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\u206cb\u206dc").must_equal 'abc'
  end

  it 'should strip deprecated National digit shape characters' do
- @s.document("a\u206eb\u206fc").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\u206eb\u206fc").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\u206eb\u206fc").must_equal 'abc'
  end

  it 'should strip interlinear annotation characters' do
- @s.document("a\ufff9b\ufffac\ufffb").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\ufff9b\ufffac\ufffb").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\ufff9b\ufffac\ufffb").must_equal 'abc'
  end

  it 'should strip BOM/zero-width non-breaking space characters' do
- @s.document("a\ufeffbc").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\ufeffbc").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\ufeffbc").must_equal 'abc'
  end

  it 'should strip object replacement characters' do
- @s.document("a\ufffcbc").must_equal "<html><head></head><body>abc</body></html
+ @s.document("a\ufffcbc").must_equal "<html><head></head><body>abc</body></html>"
  @s.fragment("a\ufffcbc").must_equal 'abc'
  end

  it 'should strip musical notation scoping characters' do
  @s.document("a\u{1d173}b\u{1d174}c\u{1d175}d\u{1d176}e\u{1d177}f\u{1d178}g\u{1d179}h\u{1d17a}")
- .must_equal "<html><head></head><body>abcdefgh</body></html
+ .must_equal "<html><head></head><body>abcdefgh</body></html>"

  @s.fragment("a\u{1d173}b\u{1d174}c\u{1d175}d\u{1d176}e\u{1d177}f\u{1d178}g\u{1d179}h\u{1d17a}")
  .must_equal 'abcdefgh'
@@ -88,7 +88,7 @@ describe 'Unicode' do
  (0xE0000..0xE007F).each {|n| str << [n].pack('U') }
  str << 'b'

- @s.document(str).must_equal "<html><head></head><body>ab</body></html
+ @s.document(str).must_equal "<html><head></head><body>ab</body></html>"
  @s.fragment(str).must_equal 'ab'
  end
  end
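The Unicode tests above all assert the same behavior: a fixed set of code points is removed from the text before output. As a rough illustration of that stripping step, the sketch below builds a character class from the code points these tests exercise and removes them with `gsub`. `UNSUITABLE_CHARS` and `strip_unsuitable` are names chosen for this example; the pattern is assembled from the test inputs and is an assumption about the behavior, not a copy of Sanitize's implementation.

```ruby
# Illustrative pattern only: assembled from the code points used in the tests
# (grave/acute clones, Khmer, line/paragraph separators, bidi controls,
# deprecated formatting characters, interlinear annotation, BOM, object
# replacement, musical scoping, and the U+E0000..U+E007F tag block).
UNSUITABLE_CHARS = /[\u0340\u0341\u17a3\u17d3\u2028\u2029\u202a-\u202e\u206a-\u206f\ufff9-\ufffb\ufeff\ufffc\u{1d173}-\u{1d17a}\u{e0000}-\u{e007f}]/

# Returns a copy of `text` with every matching character removed.
# The input string itself is left unmodified.
def strip_unsuitable(text)
  text.gsub(UNSUITABLE_CHARS, '')
end

strip_unsuitable("a\u0340b\u0341c")   # => "abc"
strip_unsuitable("a\u2028b\u2029c")   # => "abc"
strip_unsuitable("a\ufeffbc")         # => "abc"
```

Using `gsub` rather than `gsub!` matches the tests elsewhere in the suite that require the input string to stay unmodified.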
metadata
CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: sanitize
  version: !ruby/object:Gem::Version
- version:
+ version: 5.0.0
  platform: ruby
  authors:
  - Ryan Grove
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2018-
+ date: 2018-10-15 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: crass
@@ -30,56 +30,56 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.
+ version: 1.8.0
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.
+ version: 1.8.0
  - !ruby/object:Gem::Dependency
  name: nokogumbo
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '
+ version: '2.0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: '
+ version: '2.0'
  - !ruby/object:Gem::Dependency
  name: minitest
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 5.
+ version: 5.11.3
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 5.
+ version: 5.11.3
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 12.
+ version: 12.3.1
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - "~>"
  - !ruby/object:Gem::Version
- version: 12.
+ version: 12.3.1
  description: Sanitize is a whitelist-based HTML and CSS sanitizer. Given a list of
  acceptable elements, attributes, and CSS properties, Sanitize will remove all unacceptable
  HTML and/or CSS from a string.
@@ -129,7 +129,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 1.
+ version: 2.1.0
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -137,7 +137,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: 1.2.0
  requirements: []
  rubyforge_project:
- rubygems_version: 2.7.
+ rubygems_version: 2.7.6
  signing_key:
  specification_version: 4
  summary: Whitelist-based HTML and CSS sanitizer.
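The added metadata lines pair an operator (`>=` or `~>`) with a version, and it may help to see how RubyGems evaluates those pairs. The sketch below uses the standard `Gem::Requirement` and `Gem::Version` classes with constraint values copied from the "+" lines above. `satisfies?` is a helper defined only for this example, and the candidate versions being tested are arbitrary.

```ruby
require 'rubygems'

# Example-only helper: true when `version` satisfies the requirement `req`.
def satisfies?(req, version)
  Gem::Requirement.new(req).satisfied_by?(Gem::Version.new(version))
end

# nokogumbo is constrained with "~> 2.0": any 2.x release, but not 3.0.
satisfies?('~> 2.0', '2.4.1')      # => true
satisfies?('~> 2.0', '3.0.0')      # => false

# minitest uses "~> 5.11.3": patch releases of 5.11 only.
satisfies?('~> 5.11.3', '5.11.9')  # => true
satisfies?('~> 5.11.3', '5.12.0')  # => false

# required_ruby_version is ">= 2.1.0".
satisfies?('>= 2.1.0', '2.3.0')    # => true
satisfies?('>= 2.1.0', '2.0.0')    # => false
```

The pessimistic operator `~>` fixes every segment except the last one given, which is why `~> 2.0` allows minor upgrades while `~> 5.11.3` allows only patch upgrades.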