sanitize 2.0.4 → 2.0.5
Potentially problematic release.
This version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +109 -100
- data/lib/sanitize.rb +1 -0
- data/lib/sanitize/config.rb +9 -56
- data/lib/sanitize/config/basic.rb +2 -2
- data/lib/sanitize/config/default.rb +85 -0
- data/lib/sanitize/config/relaxed.rb +2 -2
- data/lib/sanitize/config/restricted.rb +2 -2
- data/lib/sanitize/version.rb +1 -1
- data/test/test_sanitize.rb +623 -0
- metadata +8 -6
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e5c69e88d2fe8117cf4ef82ff0d1469c67444b9d
+  data.tar.gz: 7708f7ca071901b213f06a542626e246c6d60409
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 90de33c0a818467a0df40564c9230a111744bd7cb064b1c3f68a4403d605970f33af2f5604b87a2c6103e7fb847061227d196675a4089f9e7f29b1748611e2fa
+  data.tar.gz: 1ae481892f1a1b7dbcdd20f02eebad70d320bb1e8f8ec888c6da61b8274b931074ceea562c46c02d90955fcda5118f4b69aa7332ff042162a4e5ef23f6760623
data/HISTORY.md
CHANGED
@@ -1,188 +1,197 @@
 Sanitize History
 ================================================================================

+Version 2.0.5 (2013-07-10)
+--------------------------
+
+* Loosened the Nokogiri dependency back to >= 1.4.4 to allow Sanitize to coexist
+  in newer Rubies with other libraries that restrict Nokogiri to 1.5.x for 1.8.7
+  compatibility. Sanitize still no longer supports 1.8.7, but this should make
+  life easier for people who need those other libs.
+
+
 Version 2.0.4 (2013-06-12)
 --------------------------

-
-
+* Added `Sanitize.clean_document`, which sanitizes a full HTML document rather
+  than just a fragment. [Ben Anderson]

-
+* Nokogiri dependency bumped to 1.6.x.

-
+* Dropped support for Ruby versions older than 1.9.2.


 Version 2.0.3 (2011-07-01)
 --------------------------

-
+* Loosened the Nokogiri dependency to allow Nokogiri 1.5.x.


 Version 2.0.2 (2011-05-21)
 --------------------------

-
-
-
-
+* Fixed a bug in which a protocol like "java\script:" would be translated to
+  "java%5Cscript:" and allowed through the filter when relative URLs were
+  enabled. This didn't actually allow malicious code to run, but it is
+  undesired behavior.


 Version 2.0.1 (2011-03-16)
 --------------------------

-
-
+* Updated the protocol regex to anchor at the beginning of the string rather
+  than the beginning of a line. [Eaden McKee]


 Version 2.0.0 (2011-01-15)
 --------------------------

-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+* The environment data passed into transformers and the return values expected
+  from transformers have changed. Old transformers will need to be updated.
+  See the README for details.
+* Transformers now receive nodes of all types, not just element nodes.
+* Sanitize's own core filtering logic is now implemented as a set of always-on
+  transformers.
+* The default value for the `:output` config is now `:html`. Previously it was
+  `:xhtml`.
+* Added a `:whitespace_elements` config, which specifies elements (such as
+  `<br>` and `<p>`) that should be replaced with whitespace when removed in
+  order to preserve readability. See the README for the default list of
+  elements that will be replaced with whitespace when removed.
+* Added a `:transformers_breadth` config, which may be used to specify
+  transformers that should traverse nodes in a breadth-first mode rather than
+  the default depth-first mode.
+* Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
+  elements to the whitelists for the basic and relaxed configs.
+* Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
+  `ruby`, and `wbr` elements to the whitelist for the relaxed config.
+* The `dir`, `lang`, and `title` attributes are now whitelisted for all
+  elements in the relaxed config.
+* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
+  (issue #315) that caused `</body></html>` to be appended to the CDATA inside
+  unterminated script and style elements.


 Version 1.2.1 (2010-04-20)
 --------------------------

-
-
-
-
-
-
-
-
-
-
-
-
-
+* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
+  remove the contents of all non-whitelisted elements in addition to the
+  elements themselves. If set to an array of element names, Sanitize will
+  remove the contents of only those elements (when filtered), and leave the
+  contents of other filtered elements. [Thanks to Rafael Souza for the array
+  option]
+* Added an `:output_encoding` config setting to allow the character encoding
+  for HTML output to be specified. The default is utf-8.
+* The environment hash passed into transformers now includes a `:node_name`
+  item containing the lowercase name of the current HTML node (e.g. "div").
+* Returning anything other than a Hash or nil from a transformer will now
+  raise a meaningful `Sanitize::Error` exception rather than an unintended
+  `NameError`.


 Version 1.2.0 (2010-01-17)
 --------------------------

-
-
-
-
-
-
-
-
+* Requires Nokogiri ~> 1.4.1.
+* Added support for transformers, which allow you to filter and alter nodes
+  using your own custom logic, on top of (or instead of) Sanitize's core
+  filter. See the README for details and examples.
+* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
+  all its children.
+* Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
+  David Reese]


 Version 1.1.0 (2009-10-11)
 --------------------------

-
-
-
-
-
-
+* Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
+* Added an `:output` config setting to allow the output format to be
+  specified. Supported formats are `:xhtml` (the default) and `:html` (which
+  outputs HTML4).
+* Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
+  path segments. [Peter Cooper]


 Version 1.0.8 (2009-04-23)
 --------------------------

-
-
-
-
+* Added a workaround for an Hpricot bug that prevents attribute names from
+  being downcased in recent versions of Hpricot. This was exploitable to
+  prevent non-whitelisted protocols from being cleaned. [Reported by Ben
+  Wanicur]


 Version 1.0.7 (2009-04-11)
 --------------------------

-
-
-
-
+* Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
+* Fixed a bug that caused named character entities containing digits (like
+  `&sup2;`) to be escaped when they shouldn't have been. [Reported by
+  Sebastian Steinmetz]


 Version 1.0.6 (2009-02-23)
 --------------------------

-
-
-
-
-
-
-
-
+* Removed htmlentities gem dependency.
+* Existing well-formed character entity references in the input string are now
+  preserved rather than being decoded and re-encoded.
+* The `'` character is now encoded as `&#39;` instead of `&apos;` to prevent
+  problems in IE6.
+* You can now specify the symbol `:all` in place of an element name in the
+  attributes config hash to allow certain attributes on all elements. [Thanks
+  to Mutwin Kraus]


 Version 1.0.5 (2009-02-05)
 --------------------------

-
-
-
-
-
+* Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
+  protocols from being cleaned when relative URLs were allowed. [Reported by
+  Dev Purkayastha]
+* Fixed "undefined method `parent='" exceptions caused by parser changes in
+  edge Hpricot.


 Version 1.0.4 (2009-01-16)
 --------------------------

-
-
-
+* Fixed a bug that made it possible to sneak a non-whitelisted element through
+  by repeating it several times in a row. All versions of Sanitize prior to
+  1.0.4 are vulnerable. [Reported by Cristobal]


 Version 1.0.3 (2009-01-15)
 --------------------------

-
-
-
-
-
+* Fixed a bug whereby incomplete Unicode or hex entities could be used to
+  prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
+  still decode the incomplete entities, users of those browsers may be
+  vulnerable to malicious script injection on websites using versions of
+  Sanitize prior to 1.0.3.


 Version 1.0.2 (2009-01-04)
 --------------------------

-
-
+* Fixed a bug that caused an exception to be thrown when parsing a valueless
+  attribute that's expected to contain a URL.


 Version 1.0.1 (2009-01-01)
 --------------------------

-
-
-
-
-
-
+* You can now specify `:relative` in a protocol config array to allow
+  attributes containing relative URLs with no protocol. The Basic and Relaxed
+  configs have been updated to allow relative URLs.
+* Added a workaround for an Hpricot bug that causes HTML entities for
+  non-ASCII characters to be replaced by question marks, and all other
+  entities to be destructively decoded.


 Version 1.0.0 (2008-12-25)
 --------------------------

-
+* First release.
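Review note on the 2.0.1 entry above: in Ruby, `^` matches at the start of every line, while `\A` matches only at the start of the string, so the two anchors can disagree about which "protocol" a multi-line attribute value carries. The sketch below illustrates that difference with a simplified, hypothetical pattern; it is not Sanitize's actual protocol regex, and the `protocol` helper is invented for this example.

```ruby
# Simplified, hypothetical patterns (NOT the regex used by Sanitize).
line_anchored   = /^([a-z]+):/i   # ^ also matches right after a newline
string_anchored = /\A([a-z]+):/i  # \A matches only at the very start

# Hypothetical helper: return the lowercased scheme the pattern sees, or nil.
def protocol(pattern, value)
  match = pattern.match(value)
  match && match[1].downcase
end

# An attribute value with an embedded newline.
value = "pants\njavascript:alert(1)"

line_result   = protocol(line_anchored, value)
string_result = protocol(string_anchored, value)

puts line_result.inspect
puts string_result.inspect
```

With the line anchor, the scheme is picked up from the second line; with the string anchor, the value has no scheme at its start and no match is found.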
data/lib/sanitize.rb
CHANGED
data/lib/sanitize/config.rb
CHANGED
@@ -22,64 +22,17 @@

 class Sanitize
   module Config
-    DEFAULT = {

-
-
-
-
+    # Deeply freeze and return a configuration Hash.
+    def self.freeze_config(config)
+      if Array === config
+        config.each { |c| freeze_config(c) }
+      elsif Hash === config
+        config.each_value { |c| freeze_config(c) }
+      end

-
-
-      :add_attributes => {},
+      config.freeze
+    end

-      # HTML attributes to allow in specific elements. By default, no attributes
-      # are allowed.
-      :attributes => {},
-
-      # HTML elements to allow. By default, no elements are allowed (which means
-      # that all HTML will be stripped).
-      :elements => [],
-
-      # Output format. Supported formats are :html and :xhtml. Default is :html.
-      :output => :html,
-
-      # Character encoding to use for HTML output. Default is 'utf-8'.
-      :output_encoding => 'utf-8',
-
-      # URL handling protocols to allow in specific attributes. By default, no
-      # protocols are allowed. Use :relative in place of a protocol if you want
-      # to allow relative URLs sans protocol.
-      :protocols => {},
-
-      # If this is true, Sanitize will remove the contents of any filtered
-      # elements in addition to the elements themselves. By default, Sanitize
-      # leaves the safe parts of an element's contents behind when the element
-      # is removed.
-      #
-      # If this is an Array of element names, then only the contents of the
-      # specified elements (when filtered) will be removed, and the contents of
-      # all other filtered elements will be left behind.
-      :remove_contents => false,
-
-      # Transformers allow you to filter or alter nodes using custom logic. See
-      # README.rdoc for details and examples.
-      :transformers => [],
-
-      # By default, transformers perform depth-first traversal (deepest node
-      # upward). This setting allows you to specify transformers that should
-      # perform breadth-first traversal (top node downward).
-      :transformers_breadth => [],
-
-      # Elements which, when removed, should have their contents surrounded by
-      # space characters to preserve readability. For example,
-      # `foo<div>bar</div>baz` will become 'foo bar baz' when the <div> is
-      # removed.
-      :whitespace_elements => %w[
-        address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
-        h6 header hgroup hr li nav ol p pre section ul
-      ]
-
-    }
   end
 end
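Review note on `freeze_config`: a plain `freeze` on a Hash is shallow, so nested arrays and hashes stay mutable; the recursive helper in the hunk above also freezes everything it contains. The sketch below uses a standalone copy of that helper (defined at top level rather than on `Sanitize::Config`) with made-up sample data to show the difference.

```ruby
# Standalone copy of the helper from the diff, for illustration only.
def freeze_config(config)
  if Array === config
    config.each { |c| freeze_config(c) }
  elsif Hash === config
    config.each_value { |c| freeze_config(c) }
  end

  config.freeze
end

# Sample data invented for this example.
shallow = { :elements => %w[a b] }.freeze
deep    = freeze_config({ :elements => %w[a b], :attributes => { 'a' => ['href'] } })

# Shallow freeze leaves the nested array mutable; deep freeze does not.
shallow_inner_frozen = shallow[:elements].frozen?
deep_inner_frozen    = deep[:attributes]['a'].frozen?

# Mutating a deeply frozen config raises (FrozenError on newer Rubies,
# which is a RuntimeError subclass; plain RuntimeError on older ones).
mutation_blocked =
  begin
    deep[:elements] << 'script'
    false
  rescue RuntimeError
    true
  end

puts shallow_inner_frozen
puts deep_inner_frozen
puts mutation_blocked
```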
data/lib/sanitize/config/basic.rb
CHANGED
@@ -22,7 +22,7 @@

 class Sanitize
   module Config
-    BASIC =
+    BASIC = freeze_config(
       :elements => %w[
         a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
         q s samp small strike strong sub sup time u ul var
@@ -46,6 +46,6 @@ class Sanitize
         'blockquote' => {'cite' => ['http', 'https', :relative]},
         'q' => {'cite' => ['http', 'https', :relative]}
       }
-
+    )
   end
 end
data/lib/sanitize/config/default.rb
ADDED
@@ -0,0 +1,85 @@
+#--
+# Copyright (c) 2013 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    DEFAULT = freeze_config(
+
+      # Whether or not to allow HTML comments. Allowing comments is strongly
+      # discouraged, since IE allows script execution within conditional
+      # comments.
+      :allow_comments => false,
+
+      # HTML attributes to add to specific elements. By default, no attributes
+      # are added.
+      :add_attributes => {},
+
+      # HTML attributes to allow in specific elements. By default, no attributes
+      # are allowed.
+      :attributes => {},
+
+      # HTML elements to allow. By default, no elements are allowed (which means
+      # that all HTML will be stripped).
+      :elements => [],
+
+      # Output format. Supported formats are :html and :xhtml. Default is :html.
+      :output => :html,
+
+      # Character encoding to use for HTML output. Default is 'utf-8'.
+      :output_encoding => 'utf-8',
+
+      # URL handling protocols to allow in specific attributes. By default, no
+      # protocols are allowed. Use :relative in place of a protocol if you want
+      # to allow relative URLs sans protocol.
+      :protocols => {},
+
+      # If this is true, Sanitize will remove the contents of any filtered
+      # elements in addition to the elements themselves. By default, Sanitize
+      # leaves the safe parts of an element's contents behind when the element
+      # is removed.
+      #
+      # If this is an Array of element names, then only the contents of the
+      # specified elements (when filtered) will be removed, and the contents of
+      # all other filtered elements will be left behind.
+      :remove_contents => false,
+
+      # Transformers allow you to filter or alter nodes using custom logic. See
+      # README.rdoc for details and examples.
+      :transformers => [],
+
+      # By default, transformers perform depth-first traversal (deepest node
+      # upward). This setting allows you to specify transformers that should
+      # perform breadth-first traversal (top node downward).
+      :transformers_breadth => [],
+
+      # Elements which, when removed, should have their contents surrounded by
+      # space characters to preserve readability. For example,
+      # `foo<div>bar</div>baz` will become 'foo bar baz' when the <div> is
+      # removed.
+      :whitespace_elements => %w[
+        address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
+        h6 header hgroup hr li nav ol p pre section ul
+      ]
+
+    )
+  end
+end
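Review note on the frozen `DEFAULT`: once a config is wrapped in `freeze_config(...)`, code that used to tweak it in place would presumably need to derive a new Hash instead. The sketch below shows one way to do that with `Hash#merge`, which returns a fresh, unfrozen Hash even when the receiver is frozen. The `defaults` hash is a small hypothetical stand-in, not the real `Sanitize::Config::DEFAULT`, and this is not a claim about how Sanitize itself combines configs.

```ruby
# Hypothetical stand-in for a deeply frozen default config.
defaults = {
  :elements        => [].freeze,
  :remove_contents => false
}.freeze

# Hash#merge does not modify the receiver, so it works on a frozen Hash.
custom = defaults.merge(:elements => %w[b em strong])

puts custom[:elements].inspect
puts defaults[:elements].inspect
puts custom.frozen?
```

Keys that are not overridden (here `:remove_contents`) carry over unchanged, and the shared defaults stay untouched.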
data/lib/sanitize/config/relaxed.rb
CHANGED
@@ -22,7 +22,7 @@

 class Sanitize
   module Config
-    RELAXED =
+    RELAXED = freeze_config(
       :elements => %w[
         a abbr b bdo blockquote br caption cite code col colgroup dd del dfn dl
         dt em figcaption figure h1 h2 h3 h4 h5 h6 hgroup i img ins kbd li mark
@@ -56,6 +56,6 @@ class Sanitize
         'ins' => {'cite' => ['http', 'https', :relative]},
         'q' => {'cite' => ['http', 'https', :relative]}
       }
-
+    )
   end
 end
data/lib/sanitize/version.rb
CHANGED
data/test/test_sanitize.rb
ADDED
@@ -0,0 +1,623 @@
+# encoding: utf-8
+#--
+# Copyright (c) 2013 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+require 'rubygems'
+gem 'minitest'
+
+require 'minitest/autorun'
+require 'sanitize'
+
+strings = {
+  :basic => {
+    :html => '<b>Lo<!-- comment -->rem</b> <a href="pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <script>alert("hello world");</script>',
+    :default => 'Lorem ipsum dolor sit amet alert("hello world");',
+    :restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet alert("hello world");',
+    :basic => '<b>Lorem</b> <a href="pants" rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet alert("hello world");',
+    :relaxed => '<b>Lorem</b> <a href="pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet alert("hello world");'
+  },
+
+  :malformed => {
+    :html => 'Lo<!-- comment -->rem</b> <a href=pants title="foo>ipsum <a href="http://foo.com/"><strong>dolor</a></strong> sit<br/>amet <script>alert("hello world");',
+    :default => 'Lorem dolor sit amet alert("hello world");',
+    :restricted => 'Lorem <strong>dolor</strong> sit amet alert("hello world");',
+    :basic => 'Lorem <a href="pants" rel="nofollow"><strong>dolor</strong></a> sit<br>amet alert("hello world");',
+    :relaxed => 'Lorem <a href="pants" title="foo>ipsum <a href="><strong>dolor</strong></a> sit<br>amet alert("hello world");',
+    :document => ' Lorem dolor sit amet alert("hello world"); '
+  },
+
+  :unclosed => {
+    :html => '<p>a</p><blockquote>b',
+    :default => ' a b ',
+    :restricted => ' a b ',
+    :basic => '<p>a</p><blockquote>b</blockquote>',
+    :relaxed => '<p>a</p><blockquote>b</blockquote>'
+  },
+
+  :malicious => {
+    :html => '<b>Lo<!-- comment -->rem</b> <a href="javascript:pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <<foo>script>alert("hello world");</script>',
+    :default => 'Lorem ipsum dolor sit amet script>alert("hello world");',
+    :restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet script>alert("hello world");',
+    :basic => '<b>Lorem</b> <a rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet script>alert("hello world");',
+    :relaxed => '<b>Lorem</b> <a title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet script>alert("hello world");'
+  },
+
+  :raw_comment => {
+    :html => '<!-- comment -->Hello',
+    :default => 'Hello',
+    :restricted => 'Hello',
+    :basic => 'Hello',
+    :relaxed => 'Hello',
+    :document => ' Hello ',
+  }
+}
+
+tricky = {
+  'protocol-based JS injection: simple, no spaces' => {
+    :html => '<a href="javascript:alert(\'XSS\');">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: simple, spaces before' => {
+    :html => '<a href="javascript :alert(\'XSS\');">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: simple, spaces after' => {
+    :html => '<a href="javascript: alert(\'XSS\');">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: simple, spaces before and after' => {
+    :html => '<a href="javascript : alert(\'XSS\');">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: preceding colon' => {
+    :html => '<a href=":javascript:alert(\'XSS\');">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: UTF-8 encoding' => {
+    :html => '<a href="javascript&#58;">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: long UTF-8 encoding' => {
+    :html => '<a href="javascript&#0058;">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: long UTF-8 encoding without semicolons' => {
+    :html => '<a href=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: hex encoding' => {
+    :html => '<a href="javascript&#x3A;">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: long hex encoding' => {
+    :html => '<a href="javascript&#x003A;">foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: hex encoding without semicolons' => {
+    :html => '<a href=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>foo</a>',
+    :default => 'foo',
+    :restricted => 'foo',
+    :basic => '<a rel="nofollow">foo</a>',
+    :relaxed => '<a>foo</a>'
+  },
+
+  'protocol-based JS injection: null char' => {
+    :html => "<img src=java\0script:alert(\"XSS\")>",
+    :default => '',
+    :restricted => '',
+    :basic => '',
+    :relaxed => '<img src="java">' # everything following the null char gets stripped, and URL is considered relative
+  },
+
+  'protocol-based JS injection: invalid URL char' => {
+    :html => '<img src=java\script:alert("XSS")>',
+    :default => '',
+    :restricted => '',
+    :basic => '',
+    :relaxed => '<img>'
+  },
+
+  'protocol-based JS injection: spaces and entities' => {
+    :html => '<img src="  javascript:alert(\'XSS\');">',
+    :default => '',
+    :restricted => '',
+    :basic => '',
+    :relaxed => '<img src="">'
+  }
+}
+
+describe 'Config::DEFAULT' do
+  it 'should translate valid HTML entities' do
+    Sanitize.clean("Don&apos;t tas&eacute; me &amp; bro!").must_equal("Don't tasé me &amp; bro!")
+  end
+
+  it 'should translate valid HTML entities while encoding unencoded ampersands' do
+    Sanitize.clean("cookies&sup2; & &frac14; cr&eacute;me").must_equal("cookies² &amp; ¼ créme")
+  end
+
+  it 'should never output &apos;' do
+    Sanitize.clean("<a href='&apos;' class=\"' &#39;\">IE6 isn't a real browser</a>").wont_match(/&apos;/)
+  end
+
+  it 'should not choke on several instances of the same element in a row' do
+    Sanitize.clean('<img src="http://www.google.com/intl/en_ALL/images/logo.gif"><img src="http://www.google.com/intl/en_ALL/images/logo.gif"><img src="http://www.google.com/intl/en_ALL/images/logo.gif"><img src="http://www.google.com/intl/en_ALL/images/logo.gif">').must_equal('')
+  end
+
+  it 'should surround the contents of :whitespace_elements with space characters when removing the element' do
+    Sanitize.clean('foo<div>bar</div>baz').must_equal('foo bar baz')
+    Sanitize.clean('foo<br>bar<br>baz').must_equal('foo bar baz')
+    Sanitize.clean('foo<hr>bar<hr>baz').must_equal('foo bar baz')
+  end
+
+  strings.each do |name, data|
+    it "should clean #{name} HTML" do
+      Sanitize.clean(data[:html]).must_equal(data[:default])
+    end
+  end
+
+  tricky.each do |name, data|
+    it "should not allow #{name}" do
+      Sanitize.clean(data[:html]).must_equal(data[:default])
+    end
+  end
+end
+
+describe 'Config::RESTRICTED' do
+  before { @s = Sanitize.new(Sanitize::Config::RESTRICTED) }
+
+  strings.each do |name, data|
+    it "should clean #{name} HTML" do
+      @s.clean(data[:html]).must_equal(data[:restricted])
+    end
+  end
+
+  tricky.each do |name, data|
+    it "should not allow #{name}" do
+      @s.clean(data[:html]).must_equal(data[:restricted])
+    end
+  end
+end
+
+describe 'Config::BASIC' do
+  before { @s = Sanitize.new(Sanitize::Config::BASIC) }
+
+  it 'should not choke on valueless attributes' do
+    @s.clean('foo <a href>foo</a> bar').must_equal('foo <a href rel="nofollow">foo</a> bar')
+  end
+
+  it 'should downcase attribute names' do
+    @s.clean('<a HREF="javascript:alert(\'foo\')">bar</a>').must_equal('<a rel="nofollow">bar</a>')
+  end
+
+  strings.each do |name, data|
+    it "should clean #{name} HTML" do
+      @s.clean(data[:html]).must_equal(data[:basic])
+    end
+  end
+
+  tricky.each do |name, data|
+    it "should not allow #{name}" do
+      @s.clean(data[:html]).must_equal(data[:basic])
+    end
+  end
+end
+
+describe 'Config::RELAXED' do
+  before { @s = Sanitize.new(Sanitize::Config::RELAXED) }
+
+  it 'should encode special chars in attribute values' do
+    input = '<a href="http://example.com" title="<b>éxamples</b> & things">foo</a>'
+    output = Nokogiri::HTML.fragment('<a href="http://example.com" title="<b>éxamples</b> & things">foo</a>').to_xhtml(:encoding => 'utf-8', :indent => 0, :save_with => Nokogiri::XML::Node::SaveOptions::AS_XHTML)
+    @s.clean(input).must_equal(output)
+  end
+
+  strings.each do |name, data|
+    it "should clean #{name} HTML" do
+      @s.clean(data[:html]).must_equal(data[:relaxed])
+    end
+  end
+
+  tricky.each do |name, data|
+    it "should not allow #{name}" do
+      @s.clean(data[:html]).must_equal(data[:relaxed])
+    end
+  end
+end
+
+describe 'Full Document parser (using clean_document)' do
+  before {
+    @s = Sanitize.new({:elements => %w[!DOCTYPE html]})
+    @default_doctype = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">"
+  }
+
+  it 'should require HTML element is whitelisted to prevent parser errors' do
+    assert_raises(RuntimeError, 'You must have the HTML element whitelisted') {
+      Sanitize.clean_document!('', {:elements => [], :remove_contents => false})
+    }
+  end
+
+  it 'should NOT require HTML element to be whitelisted if remove_contents is true' do
+    output = '<!DOCTYPE html><html>foo</html>'
+    Sanitize.clean_document!(output, {:remove_contents => true}).must_equal "<!DOCTYPE html>\n\n"
+  end
+
+  it 'adds a doctype tag if not included' do
+    @s.clean_document('').must_equal("#{@default_doctype}\n\n")
+  end
+
+  it 'should apply whitelist filtering to HTML element' do
+    output = "<!DOCTYPE html>\n<html anything='false'></html>\n\n"
+    @s.clean_document(output).must_equal("<!DOCTYPE html>\n<html></html>\n")
+  end
+
+  strings.each do |name, data|
+    it "should wrap #{name} with DOCTYPE and HTML tag" do
+      output = data[:document] || data[:default]
+      @s.clean_document(data[:html]).must_equal("#{@default_doctype}\n<html>#{output}</html>\n")
+    end
+  end
+
+  tricky.each do |name, data|
+    it "should wrap #{name} with DOCTYPE and HTML tag" do
+      @s.clean_document(data[:html]).must_equal("#{@default_doctype}\n<html>#{data[:default]}</html>\n")
+    end
+  end
+end
+
+describe 'Custom configs' do
+  it 'should allow attributes on all elements if whitelisted under :all' do
+    input = '<p class="foo">bar</p>'
+
+    Sanitize.clean(input).must_equal(' bar ')
+    Sanitize.clean(input, {:elements => ['p'], :attributes => {:all => ['class']}}).must_equal(input)
+    Sanitize.clean(input, {:elements => ['p'], :attributes => {'div' => ['class']}}).must_equal('<p>bar</p>')
+
Sanitize.clean(input, {:elements => ['p'], :attributes => {'p' => ['title'], :all => ['class']}}).must_equal(input)
|
334
|
+
end
|
335
|
+
|
336
|
+
it 'should allow comments when :allow_comments == true' do
|
337
|
+
input = 'foo <!-- bar --> baz'
|
338
|
+
Sanitize.clean(input).must_equal('foo  baz')
|
339
|
+
Sanitize.clean(input, :allow_comments => true).must_equal(input)
|
340
|
+
end
|
341
|
+
|
342
|
+
it 'should allow relative URLs containing colons where the colon is not in the first path segment' do
|
343
|
+
input = '<a href="/wiki/Special:Random">Random Page</a>'
|
344
|
+
Sanitize.clean(input, { :elements => ['a'], :attributes => {'a' => ['href']}, :protocols => { 'a' => { 'href' => [:relative] }} }).must_equal(input)
|
345
|
+
end
|
346
|
+
|
347
|
+
it 'should output HTML when :output == :html' do
|
348
|
+
input = 'foo<br/>bar<br>baz'
|
349
|
+
Sanitize.clean(input, :elements => ['br'], :output => :html).must_equal('foo<br>bar<br>baz')
|
350
|
+
end
|
351
|
+
|
352
|
+
it 'should remove the contents of filtered nodes when :remove_contents == true' do
|
353
|
+
Sanitize.clean('foo bar <div>baz<span>quux</span></div>', :remove_contents => true).must_equal('foo bar   ')
|
354
|
+
end
|
355
|
+
|
356
|
+
it 'should remove the contents of specified nodes when :remove_contents is an Array of element names as strings' do
|
357
|
+
Sanitize.clean('foo bar <div>baz<span>quux</span><script>alert("hello!");</script></div>', :remove_contents => ['script', 'span']).must_equal('foo bar  baz ')
|
358
|
+
end
|
359
|
+
|
360
|
+
it 'should remove the contents of specified nodes when :remove_contents is an Array of element names as symbols' do
|
361
|
+
Sanitize.clean('foo bar <div>baz<span>quux</span><script>alert("hello!");</script></div>', :remove_contents => [:script, :span]).must_equal('foo bar  baz ')
|
362
|
+
end
|
363
|
+
|
364
|
+
it 'should support encodings other than utf-8' do
|
365
|
+
html = 'foo&nbsp;bar'
|
366
|
+
Sanitize.clean(html).must_equal("foo\302\240bar")
|
367
|
+
Sanitize.clean(html, :output_encoding => 'ASCII').must_equal("foo&#160;bar")
|
368
|
+
end
|
369
|
+
end
|
370
|
+
|
371
|
+
describe 'Sanitize.clean' do
|
372
|
+
it 'should not modify the input string' do
|
373
|
+
input = '<b>foo</b>'
|
374
|
+
Sanitize.clean(input)
|
375
|
+
input.must_equal('<b>foo</b>')
|
376
|
+
end
|
377
|
+
|
378
|
+
it 'should return a new string' do
|
379
|
+
input = '<b>foo</b>'
|
380
|
+
Sanitize.clean(input).must_equal('foo')
|
381
|
+
end
|
382
|
+
end
|
383
|
+
|
384
|
+
describe 'Sanitize.clean!' do
|
385
|
+
it 'should modify the input string' do
|
386
|
+
input = '<b>foo</b>'
|
387
|
+
Sanitize.clean!(input)
|
388
|
+
input.must_equal('foo')
|
389
|
+
end
|
390
|
+
|
391
|
+
it 'should return the string if it was modified' do
|
392
|
+
input = '<b>foo</b>'
|
393
|
+
Sanitize.clean!(input).must_equal('foo')
|
394
|
+
end
|
395
|
+
|
396
|
+
it 'should return nil if the string was not modified' do
|
397
|
+
input = 'foo'
|
398
|
+
Sanitize.clean!(input).must_equal(nil)
|
399
|
+
end
|
400
|
+
end
|
401
|
+
|
402
|
+
describe 'Sanitize.clean_document' do
|
403
|
+
before { @config = { :elements => ['html', 'p'] } }
|
404
|
+
|
405
|
+
it 'should be idempotent' do
|
406
|
+
input = '<!DOCTYPE html><html><p>foo</p></html>'
|
407
|
+
first = Sanitize.clean_document(input, @config)
|
408
|
+
second = Sanitize.clean_document(first, @config)
|
409
|
+
second.must_equal first
|
410
|
+
second.wont_be_nil
|
411
|
+
end
|
412
|
+
|
413
|
+
it 'should handle nil without raising' do
|
414
|
+
Sanitize.clean_document(nil).must_equal nil
|
415
|
+
end
|
416
|
+
|
417
|
+
it 'should not modify the input string' do
|
418
|
+
input = '<!DOCTYPE html><b>foo</b>'
|
419
|
+
Sanitize.clean_document(input, @config)
|
420
|
+
input.must_equal('<!DOCTYPE html><b>foo</b>')
|
421
|
+
end
|
422
|
+
|
423
|
+
it 'should return a new string' do
|
424
|
+
input = '<!DOCTYPE html><b>foo</b>'
|
425
|
+
Sanitize.clean_document(input, @config).must_equal("<!DOCTYPE html>\n<html>foo</html>\n")
|
426
|
+
end
|
427
|
+
end
|
428
|
+
|
429
|
+
describe 'Sanitize.clean_document!' do
|
430
|
+
before { @config = { :elements => ['html'] } }
|
431
|
+
|
432
|
+
it 'should modify the input string' do
|
433
|
+
input = '<!DOCTYPE html><html><body><b>foo</b></body></html>'
|
434
|
+
Sanitize.clean_document!(input, @config)
|
435
|
+
input.must_equal("<!DOCTYPE html>\n<html>foo</html>\n")
|
436
|
+
end
|
437
|
+
|
438
|
+
it 'should return the string if it was modified' do
|
439
|
+
input = '<!DOCTYPE html><html><body><b>foo</b></body></html>'
|
440
|
+
Sanitize.clean_document!(input, @config).must_equal("<!DOCTYPE html>\n<html>foo</html>\n")
|
441
|
+
end
|
442
|
+
|
443
|
+
it 'should return nil if the string was not modified' do
|
444
|
+
input = "<!DOCTYPE html>\n<html></html>\n"
|
445
|
+
Sanitize.clean_document!(input, @config).must_equal(nil)
|
446
|
+
end
|
447
|
+
end
|
448
|
+
|
449
|
+
describe 'transformers' do
|
450
|
+
# YouTube embed transformer.
|
451
|
+
youtube = lambda do |env|
|
452
|
+
node = env[:node]
|
453
|
+
node_name = env[:node_name]
|
454
|
+
|
455
|
+
# Don't continue if this node is already whitelisted or is not an element.
|
456
|
+
return if env[:is_whitelisted] || !node.element?
|
457
|
+
|
458
|
+
# Don't continue unless the node is an iframe.
|
459
|
+
return unless node_name == 'iframe'
|
460
|
+
|
461
|
+
# Verify that the video URL is actually a valid YouTube video URL.
|
462
|
+
return unless node['src'] =~ /\Ahttps?:\/\/(?:www\.)?youtube(?:-nocookie)?\.com\//
|
463
|
+
|
464
|
+
# We're now certain that this is a YouTube embed, but we still need to run
|
465
|
+
# it through a special Sanitize step to ensure that no unwanted elements or
|
466
|
+
# attributes that don't belong in a YouTube embed can sneak in.
|
467
|
+
Sanitize.clean_node!(node, {
|
468
|
+
:elements => %w[iframe],
|
469
|
+
|
470
|
+
:attributes => {
|
471
|
+
'iframe' => %w[allowfullscreen frameborder height src width]
|
472
|
+
}
|
473
|
+
})
|
474
|
+
|
475
|
+
# Now that we're sure that this is a valid YouTube embed and that there are
|
476
|
+
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
|
477
|
+
# to whitelist the current node.
|
478
|
+
{:node_whitelist => [node]}
|
479
|
+
end
|
480
|
+
|
481
|
+
it 'should receive a complete env Hash as input' do
|
482
|
+
Sanitize.clean!('<SPAN>foo</SPAN>', :foo => :bar, :transformers => lambda {|env|
|
483
|
+
return unless env[:node].element?
|
484
|
+
|
485
|
+
env[:config][:foo].must_equal(:bar)
|
486
|
+
env[:is_whitelisted].must_equal(false)
|
487
|
+
env[:node].must_be_kind_of(Nokogiri::XML::Node)
|
488
|
+
env[:node_name].must_equal('span')
|
489
|
+
env[:node_whitelist].must_be_kind_of(Set)
|
490
|
+
env[:node_whitelist].must_be_empty
|
491
|
+
})
|
492
|
+
end
|
493
|
+
|
494
|
+
it 'should traverse all node types, including the fragment itself' do
|
495
|
+
nodes = []
|
496
|
+
|
497
|
+
Sanitize.clean!('<div>foo</div><!--bar--><script>cdata!</script>', :transformers => proc {|env|
|
498
|
+
nodes << env[:node_name]
|
499
|
+
})
|
500
|
+
|
501
|
+
nodes.must_equal(%w[
|
502
|
+
text div comment #cdata-section script #document-fragment
|
503
|
+
])
|
504
|
+
end
|
505
|
+
|
506
|
+
it 'should traverse in depth-first mode by default' do
|
507
|
+
nodes = []
|
508
|
+
|
509
|
+
Sanitize.clean!('<div><span>foo</span></div><p>bar</p>', :transformers => proc {|env|
|
510
|
+
env[:traversal_mode].must_equal(:depth)
|
511
|
+
nodes << env[:node_name] if env[:node].element?
|
512
|
+
})
|
513
|
+
|
514
|
+
nodes.must_equal(['span', 'div', 'p'])
|
515
|
+
end
|
516
|
+
|
517
|
+
it 'should traverse in breadth-first mode when using :transformers_breadth' do
|
518
|
+
nodes = []
|
519
|
+
|
520
|
+
Sanitize.clean!('<div><span>foo</span></div><p>bar</p>', :transformers_breadth => proc {|env|
|
521
|
+
env[:traversal_mode].must_equal(:breadth)
|
522
|
+
nodes << env[:node_name] if env[:node].element?
|
523
|
+
})
|
524
|
+
|
525
|
+
nodes.must_equal(['div', 'span', 'p'])
|
526
|
+
end
|
527
|
+
|
528
|
+
it 'should whitelist nodes in the node whitelist' do
|
529
|
+
Sanitize.clean!('<div class="foo">foo</div><span>bar</span>', :transformers => [
|
530
|
+
proc {|env|
|
531
|
+
{:node_whitelist => [env[:node]]} if env[:node_name] == 'div'
|
532
|
+
},
|
533
|
+
|
534
|
+
proc {|env|
|
535
|
+
env[:is_whitelisted].must_equal(false) unless env[:node_name] == 'div'
|
536
|
+
env[:is_whitelisted].must_equal(true) if env[:node_name] == 'div'
|
537
|
+
env[:node_whitelist].must_include(env[:node]) if env[:node_name] == 'div'
|
538
|
+
}
|
539
|
+
]).must_equal('<div class="foo">foo</div>bar')
|
540
|
+
end
|
541
|
+
|
542
|
+
it 'should clear the node whitelist after each fragment' do
|
543
|
+
called = false
|
544
|
+
|
545
|
+
Sanitize.clean!('<div>foo</div>', :transformers => proc {|env|
|
546
|
+
{:node_whitelist => [env[:node]]}
|
547
|
+
})
|
548
|
+
|
549
|
+
Sanitize.clean!('<div>foo</div>', :transformers => proc {|env|
|
550
|
+
called = true
|
551
|
+
env[:is_whitelisted].must_equal(false)
|
552
|
+
env[:node_whitelist].must_be_empty
|
553
|
+
})
|
554
|
+
|
555
|
+
called.must_equal(true)
|
556
|
+
end
|
557
|
+
|
558
|
+
it 'should allow youtube video embeds via the youtube transformer' do
|
559
|
+
input = '<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'
|
560
|
+
output = Nokogiri::HTML::DocumentFragment.parse('<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen>alert()</iframe>').to_html(:encoding => 'utf-8', :indent => 0)
|
561
|
+
|
562
|
+
Sanitize.clean!(input, :transformers => youtube).must_equal(output)
|
563
|
+
end
|
564
|
+
|
565
|
+
it 'should allow https youtube video embeds via the youtube transformer' do
|
566
|
+
input = '<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'
|
567
|
+
output = Nokogiri::HTML::DocumentFragment.parse('<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen>alert()</iframe>').to_html(:encoding => 'utf-8', :indent => 0)
|
568
|
+
|
569
|
+
Sanitize.clean!(input, :transformers => youtube).must_equal(output)
|
570
|
+
end
|
571
|
+
|
572
|
+
it 'should allow privacy-enhanced youtube video embeds via the youtube transformer' do
|
573
|
+
input = '<iframe width="420" height="315" src="http://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'
|
574
|
+
output = Nokogiri::HTML::DocumentFragment.parse('<iframe width="420" height="315" src="http://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen>alert()</iframe>').to_html(:encoding => 'utf-8', :indent => 0)
|
575
|
+
|
576
|
+
Sanitize.clean!(input, :transformers => youtube).must_equal(output)
|
577
|
+
end
|
578
|
+
|
579
|
+
it 'should not allow non-youtube video embeds via the youtube transformer' do
|
580
|
+
input = '<iframe width="420" height="315" src="http://www.fake-youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen></iframe>'
|
581
|
+
output = ''
|
582
|
+
|
583
|
+
Sanitize.clean!(input, :transformers => youtube).must_equal(output)
|
584
|
+
end
|
585
|
+
end
|
586
|
+
|
587
|
+
describe 'bugs' do
|
588
|
+
it 'should not have Nokogiri 1.4.2+ unterminated script/style element bug' do
|
589
|
+
Sanitize.clean!('foo <script>bar').must_equal('foo bar')
|
590
|
+
Sanitize.clean!('foo <style>bar').must_equal('foo bar')
|
591
|
+
end
|
592
|
+
end
|
593
|
+
|
594
|
+
describe "default configurations" do
|
595
|
+
def assert_deep_frozen(config)
|
596
|
+
if Hash === config
|
597
|
+
config.each_value { |c| assert_deep_frozen(c) }
|
598
|
+
config.frozen?.must_equal(true)
|
599
|
+
elsif Array === config
|
600
|
+
config.each { |c| assert_deep_frozen(c) }
|
601
|
+
config.frozen?.must_equal(true)
|
602
|
+
end
|
603
|
+
end
|
604
|
+
|
605
|
+
{
|
606
|
+
"DEFAULT" => Sanitize::Config::DEFAULT,
|
607
|
+
"RESTRICTED" => Sanitize::Config::RESTRICTED,
|
608
|
+
"BASIC" => Sanitize::Config::BASIC,
|
609
|
+
"RELAXED" => Sanitize::Config::RELAXED,
|
610
|
+
}.each do |name, config|
|
611
|
+
describe name do
|
612
|
+
it "should be frozen" do
|
613
|
+
assert_deep_frozen(config)
|
614
|
+
end
|
615
|
+
end
|
616
|
+
end
|
617
|
+
|
618
|
+
it "cannot be modified" do
|
619
|
+
assert_raises(RuntimeError, "can't modify frozen") {
|
620
|
+
Sanitize::Config::RESTRICTED.dup[:elements].push("script")
|
621
|
+
}
|
622
|
+
end
|
623
|
+
end
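The "cannot be modified" test above calls `dup` on a frozen config and still expects a `RuntimeError`. A minimal sketch of why that holds, using a hypothetical stand-in hash rather than the gem's real config data: `dup` returns a shallow, unfrozen copy of the outer Hash, but the nested Array is the same frozen object. The helper names `deep_freeze` and `deep_frozen?` are assumptions made for this illustration and are not part of Sanitize.

```ruby
# Hypothetical helper: recursively freeze a Hash/Array structure.
def deep_freeze(obj)
  case obj
  when Hash  then obj.each_value { |v| deep_freeze(v) }
  when Array then obj.each { |v| deep_freeze(v) }
  end
  obj.freeze
end

# Hypothetical helper: true only if the object and everything nested in it is frozen.
def deep_frozen?(obj)
  children = case obj
             when Hash  then obj.values
             when Array then obj
             else []
             end
  obj.frozen? && children.all? { |c| deep_frozen?(c) }
end

# Stand-in for a config constant; the contents are illustrative only.
example_config = deep_freeze({ :elements => ['b', 'em'], :attributes => { 'a' => ['href'] } })

copy = example_config.dup      # shallow copy: the outer Hash is no longer frozen
copy_frozen = copy.frozen?

begin
  copy[:elements].push('script') # the nested Array is still the frozen original
  push_raised = false
rescue RuntimeError              # FrozenError (Ruby 2.5+) is a RuntimeError subclass
  push_raised = true
end
```

Under these assumptions, `copy_frozen` is `false` while `push_raised` is `true`, which is the behavior the test relies on.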
|
metadata
CHANGED
@@ -1,29 +1,29 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: sanitize
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 2.0.
|
4
|
+
version: 2.0.5
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Ryan Grove
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2013-
|
11
|
+
date: 2013-07-10 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: nokogiri
|
15
15
|
requirement: !ruby/object:Gem::Requirement
|
16
16
|
requirements:
|
17
|
-
- -
|
17
|
+
- - '>='
|
18
18
|
- !ruby/object:Gem::Version
|
19
|
-
version: 1.
|
19
|
+
version: 1.4.4
|
20
20
|
type: :runtime
|
21
21
|
prerelease: false
|
22
22
|
version_requirements: !ruby/object:Gem::Requirement
|
23
23
|
requirements:
|
24
|
-
- -
|
24
|
+
- - '>='
|
25
25
|
- !ruby/object:Gem::Version
|
26
|
-
version: 1.
|
26
|
+
version: 1.4.4
|
27
27
|
- !ruby/object:Gem::Dependency
|
28
28
|
name: minitest
|
29
29
|
requirement: !ruby/object:Gem::Requirement
|
@@ -62,6 +62,7 @@ files:
|
|
62
62
|
- LICENSE
|
63
63
|
- README.rdoc
|
64
64
|
- lib/sanitize/config/basic.rb
|
65
|
+
- lib/sanitize/config/default.rb
|
65
66
|
- lib/sanitize/config/relaxed.rb
|
66
67
|
- lib/sanitize/config/restricted.rb
|
67
68
|
- lib/sanitize/config.rb
|
@@ -70,6 +71,7 @@ files:
|
|
70
71
|
- lib/sanitize/transformers/clean_element.rb
|
71
72
|
- lib/sanitize/version.rb
|
72
73
|
- lib/sanitize.rb
|
74
|
+
- test/test_sanitize.rb
|
73
75
|
homepage: https://github.com/rgrove/sanitize/
|
74
76
|
licenses: []
|
75
77
|
metadata: {}
|