sanitize 2.0.4 → 2.0.5

Potentially problematic release: this version of sanitize might be problematic.

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 80fc1e1df6ce565080a348c635645027839d8fbf
- data.tar.gz: ae3afc1372f0db565647ab7f46310531a0f8b940
+ metadata.gz: e5c69e88d2fe8117cf4ef82ff0d1469c67444b9d
+ data.tar.gz: 7708f7ca071901b213f06a542626e246c6d60409
  SHA512:
- metadata.gz: fc144309e028fcc172f3e4066dd34789c1a1b7f8abd341009052daa12672710a1e6ca52befd7f0cf39308c56fca70317e11c459f4b086d10c071d744477da70d
- data.tar.gz: 962f5668cf47a557cc17d3f6eca26bcee0df23f93f76d11014fb53f5650560a5a8bbc171b63ddb00d926fa847154958f27d93c9b2e0a46cd4cc585f27cf4a309
+ metadata.gz: 90de33c0a818467a0df40564c9230a111744bd7cb064b1c3f68a4403d605970f33af2f5604b87a2c6103e7fb847061227d196675a4089f9e7f29b1748611e2fa
+ data.tar.gz: 1ae481892f1a1b7dbcdd20f02eebad70d320bb1e8f8ec888c6da61b8274b931074ceea562c46c02d90955fcda5118f4b69aa7332ff042162a4e5ef23f6760623
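checksums.yaml records a SHA1 and a SHA512 digest for the two members of the packaged gem, `metadata.gz` and `data.tar.gz`. A minimal sketch of recomputing them with Ruby's standard `digest` library, assuming the two members have already been extracted from the `.gem` archive (the helper name `gem_member_checksums` is hypothetical, not part of RubyGems):

```ruby
require 'digest'

# Hypothetical helper: compute SHA1 and SHA512 hex digests for each extracted
# gem member, keyed the same way checksums.yaml is laid out.
def gem_member_checksums(paths)
  {
    'SHA1'   => paths.map { |p| [File.basename(p), Digest::SHA1.file(p).hexdigest] }.to_h,
    'SHA512' => paths.map { |p| [File.basename(p), Digest::SHA512.file(p).hexdigest] }.to_h
  }
end
```

Comparing `gem_member_checksums(%w[metadata.gz data.tar.gz])['SHA512']['data.tar.gz']` against the `+` value above is one way to confirm that a downloaded 2.0.5 package matches what was published.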
data/HISTORY.md CHANGED
@@ -1,188 +1,197 @@
  Sanitize History
  ================================================================================

+ Version 2.0.5 (2013-07-10)
+ --------------------------
+
+ * Loosened the Nokogiri dependency back to >= 1.4.4 to allow Sanitize to coexist
+ in newer Rubies with other libraries that restrict Nokogiri to 1.5.x for 1.8.7
+ compatibility. Sanitize still no longer supports 1.8.7, but this should make
+ life easier for people who need those other libs.
+
+
  Version 2.0.4 (2013-06-12)
  --------------------------

- * Added `Sanitize.clean_document`, which sanitizes a full HTML document rather
- than just a fragment. [Ben Anderson]
+ * Added `Sanitize.clean_document`, which sanitizes a full HTML document rather
+ than just a fragment. [Ben Anderson]

- * Nokogiri dependency bumped to 1.6.x.
+ * Nokogiri dependency bumped to 1.6.x.

- * Dropped support for Ruby versions older than 1.9.2.
+ * Dropped support for Ruby versions older than 1.9.2.


  Version 2.0.3 (2011-07-01)
  --------------------------

- * Loosened the Nokogiri dependency to allow Nokogiri 1.5.x.
+ * Loosened the Nokogiri dependency to allow Nokogiri 1.5.x.


  Version 2.0.2 (2011-05-21)
  --------------------------

- * Fixed a bug in which a protocol like "java\script:" would be translated to
- "java%5Cscript:" and allowed through the filter when relative URLs were
- enabled. This didn't actually allow malicious code to run, but it is
- undesired behavior.
+ * Fixed a bug in which a protocol like "java\script:" would be translated to
+ "java%5Cscript:" and allowed through the filter when relative URLs were
+ enabled. This didn't actually allow malicious code to run, but it is
+ undesired behavior.


  Version 2.0.1 (2011-03-16)
  --------------------------

- * Updated the protocol regex to anchor at the beginning of the string rather
- than the beginning of a line. [Eaden McKee]
+ * Updated the protocol regex to anchor at the beginning of the string rather
+ than the beginning of a line. [Eaden McKee]


  Version 2.0.0 (2011-01-15)
  --------------------------

- * The environment data passed into transformers and the return values expected
- from transformers have changed. Old transformers will need to be updated.
- See the README for details.
- * Transformers now receive nodes of all types, not just element nodes.
- * Sanitize's own core filtering logic is now implemented as a set of always-on
- transformers.
- * The default value for the `:output` config is now `:html`. Previously it was
- `:xhtml`.
- * Added a `:whitespace_elements` config, which specifies elements (such as
- `<br>` and `<p>`) that should be replaced with whitespace when removed in
- order to preserve readability. See the README for the default list of
- elements that will be replaced with whitespace when removed.
- * Added a `:transformers_breadth` config, which may be used to specify
- transformers that should traverse nodes in a breadth-first mode rather than
- the default depth-first mode.
- * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
- elements to the whitelists for the basic and relaxed configs.
- * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
- elements in the relaxed config.
- * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
- (issue #315) that caused `</body></html>` to be appended to the CDATA inside
- unterminated script and style elements.
+ * The environment data passed into transformers and the return values expected
+ from transformers have changed. Old transformers will need to be updated.
+ See the README for details.
+ * Transformers now receive nodes of all types, not just element nodes.
+ * Sanitize's own core filtering logic is now implemented as a set of always-on
+ transformers.
+ * The default value for the `:output` config is now `:html`. Previously it was
+ `:xhtml`.
+ * Added a `:whitespace_elements` config, which specifies elements (such as
+ `<br>` and `<p>`) that should be replaced with whitespace when removed in
+ order to preserve readability. See the README for the default list of
+ elements that will be replaced with whitespace when removed.
+ * Added a `:transformers_breadth` config, which may be used to specify
+ transformers that should traverse nodes in a breadth-first mode rather than
+ the default depth-first mode.
+ * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
+ elements to the whitelists for the basic and relaxed configs.
+ * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
+ `ruby`, and `wbr` elements to the whitelist for the relaxed config.
+ * The `dir`, `lang`, and `title` attributes are now whitelisted for all
+ elements in the relaxed config.
+ * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
+ (issue #315) that caused `</body></html>` to be appended to the CDATA inside
+ unterminated script and style elements.


  Version 1.2.1 (2010-04-20)
  --------------------------

- * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
- remove the contents of all non-whitelisted elements in addition to the
- elements themselves. If set to an array of element names, Sanitize will
- remove the contents of only those elements (when filtered), and leave the
- contents of other filtered elements. [Thanks to Rafael Souza for the array
- option]
- * Added an `:output_encoding` config setting to allow the character encoding
- for HTML output to be specified. The default is utf-8.
- * The environment hash passed into transformers now includes a `:node_name`
- item containing the lowercase name of the current HTML node (e.g. "div").
- * Returning anything other than a Hash or nil from a transformer will now
- raise a meaningful `Sanitize::Error` exception rather than an unintended
- `NameError`.
+ * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
+ remove the contents of all non-whitelisted elements in addition to the
+ elements themselves. If set to an array of element names, Sanitize will
+ remove the contents of only those elements (when filtered), and leave the
+ contents of other filtered elements. [Thanks to Rafael Souza for the array
+ option]
+ * Added an `:output_encoding` config setting to allow the character encoding
+ for HTML output to be specified. The default is utf-8.
+ * The environment hash passed into transformers now includes a `:node_name`
+ item containing the lowercase name of the current HTML node (e.g. "div").
+ * Returning anything other than a Hash or nil from a transformer will now
+ raise a meaningful `Sanitize::Error` exception rather than an unintended
+ `NameError`.


  Version 1.2.0 (2010-01-17)
  --------------------------

- * Requires Nokogiri ~> 1.4.1.
- * Added support for transformers, which allow you to filter and alter nodes
- using your own custom logic, on top of (or instead of) Sanitize's core
- filter. See the README for details and examples.
- * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
- all its children.
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
- David Reese]
+ * Requires Nokogiri ~> 1.4.1.
+ * Added support for transformers, which allow you to filter and alter nodes
+ using your own custom logic, on top of (or instead of) Sanitize's core
+ filter. See the README for details and examples.
+ * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
+ all its children.
+ * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
+ David Reese]


  Version 1.1.0 (2009-10-11)
  --------------------------

- * Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
- * Added an `:output` config setting to allow the output format to be
- specified. Supported formats are `:xhtml` (the default) and `:html` (which
- outputs HTML4).
- * Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
- path segments. [Peter Cooper]
+ * Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
+ * Added an `:output` config setting to allow the output format to be
+ specified. Supported formats are `:xhtml` (the default) and `:html` (which
+ outputs HTML4).
+ * Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
+ path segments. [Peter Cooper]


  Version 1.0.8 (2009-04-23)
  --------------------------

- * Added a workaround for an Hpricot bug that prevents attribute names from
- being downcased in recent versions of Hpricot. This was exploitable to
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
- Wanicur]
+ * Added a workaround for an Hpricot bug that prevents attribute names from
+ being downcased in recent versions of Hpricot. This was exploitable to
+ prevent non-whitelisted protocols from being cleaned. [Reported by Ben
+ Wanicur]


  Version 1.0.7 (2009-04-11)
  --------------------------

- * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
- * Fixed a bug that caused named character entities containing digits (like
- `&sup2;`) to be escaped when they shouldn't have been. [Reported by
- Sebastian Steinmetz]
+ * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
+ * Fixed a bug that caused named character entities containing digits (like
+ `&sup2;`) to be escaped when they shouldn't have been. [Reported by
+ Sebastian Steinmetz]


  Version 1.0.6 (2009-02-23)
  --------------------------

- * Removed htmlentities gem dependency.
- * Existing well-formed character entity references in the input string are now
- preserved rather than being decoded and re-encoded.
- * The `'` character is now encoded as `&#39;` instead of `&apos;` to prevent
- problems in IE6.
- * You can now specify the symbol `:all` in place of an element name in the
- attributes config hash to allow certain attributes on all elements. [Thanks
- to Mutwin Kraus]
+ * Removed htmlentities gem dependency.
+ * Existing well-formed character entity references in the input string are now
+ preserved rather than being decoded and re-encoded.
+ * The `'` character is now encoded as `&#39;` instead of `&apos;` to prevent
+ problems in IE6.
+ * You can now specify the symbol `:all` in place of an element name in the
+ attributes config hash to allow certain attributes on all elements. [Thanks
+ to Mutwin Kraus]


  Version 1.0.5 (2009-02-05)
  --------------------------

- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
- protocols from being cleaned when relative URLs were allowed. [Reported by
- Dev Purkayastha]
- * Fixed "undefined method `parent='" exceptions caused by parser changes in
- edge Hpricot.
+ * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
+ protocols from being cleaned when relative URLs were allowed. [Reported by
+ Dev Purkayastha]
+ * Fixed "undefined method `parent='" exceptions caused by parser changes in
+ edge Hpricot.


  Version 1.0.4 (2009-01-16)
  --------------------------

- * Fixed a bug that made it possible to sneak a non-whitelisted element through
- by repeating it several times in a row. All versions of Sanitize prior to
- 1.0.4 are vulnerable. [Reported by Cristobal]
+ * Fixed a bug that made it possible to sneak a non-whitelisted element through
+ by repeating it several times in a row. All versions of Sanitize prior to
+ 1.0.4 are vulnerable. [Reported by Cristobal]


  Version 1.0.3 (2009-01-15)
  --------------------------

- * Fixed a bug whereby incomplete Unicode or hex entities could be used to
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
- still decode the incomplete entities, users of those browsers may be
- vulnerable to malicious script injection on websites using versions of
- Sanitize prior to 1.0.3.
+ * Fixed a bug whereby incomplete Unicode or hex entities could be used to
+ prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
+ still decode the incomplete entities, users of those browsers may be
+ vulnerable to malicious script injection on websites using versions of
+ Sanitize prior to 1.0.3.


  Version 1.0.2 (2009-01-04)
  --------------------------

- * Fixed a bug that caused an exception to be thrown when parsing a valueless
- attribute that's expected to contain a URL.
+ * Fixed a bug that caused an exception to be thrown when parsing a valueless
+ attribute that's expected to contain a URL.


  Version 1.0.1 (2009-01-01)
  --------------------------

- * You can now specify `:relative` in a protocol config array to allow
- attributes containing relative URLs with no protocol. The Basic and Relaxed
- configs have been updated to allow relative URLs.
- * Added a workaround for an Hpricot bug that causes HTML entities for
- non-ASCII characters to be replaced by question marks, and all other
- entities to be destructively decoded.
+ * You can now specify `:relative` in a protocol config array to allow
+ attributes containing relative URLs with no protocol. The Basic and Relaxed
+ configs have been updated to allow relative URLs.
+ * Added a workaround for an Hpricot bug that causes HTML entities for
+ non-ASCII characters to be replaced by question marks, and all other
+ entities to be destructively decoded.


  Version 1.0.0 (2008-12-25)
  --------------------------

- * First release.
+ * First release.
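The 2.0.5 entry turns on the difference between a `>= 1.4.4` constraint and the "1.6.x" requirement from 2.0.4. A minimal sketch using RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the `'~> 1.6'` string is an assumed spelling of "1.6.x", not a quote from the gemspec, and `'1.5.9'` simply stands in for any Nokogiri 1.5.x release:

```ruby
require 'rubygems'

# Constraint described in the 2.0.5 entry.
loose = Gem::Requirement.new('>= 1.4.4')

# Assumed spelling of the "1.6.x" constraint from the 2.0.4 entry
# (~> 1.6 means >= 1.6 and < 2.0).
strict = Gem::Requirement.new('~> 1.6')

# A 1.5.x version, as pinned by libraries that keep Ruby 1.8.7 support.
nokogiri_15 = Gem::Version.new('1.5.9')

loose.satisfied_by?(nokogiri_15)   # => true
strict.satisfied_by?(nokogiri_15)  # => false
```

Under the looser constraint, Bundler can resolve a single Nokogiri 1.5.x that satisfies both Sanitize and a library pinned to 1.5.x, which is the coexistence the entry describes.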
data/lib/sanitize.rb CHANGED
@@ -26,6 +26,7 @@ require 'set'
  require 'nokogiri'
  require 'sanitize/version'
  require 'sanitize/config'
+ require 'sanitize/config/default'
  require 'sanitize/config/restricted'
  require 'sanitize/config/basic'
  require 'sanitize/config/relaxed'
@@ -22,64 +22,17 @@

  class Sanitize
  module Config
- DEFAULT = {

- # Whether or not to allow HTML comments. Allowing comments is strongly
- # discouraged, since IE allows script execution within conditional
- # comments.
- :allow_comments => false,
+ # Deeply freeze and return a configuration Hash.
+ def self.freeze_config(config)
+ if Array === config
+ config.each { |c| freeze_config(c) }
+ elsif Hash === config
+ config.each_value { |c| freeze_config(c) }
+ end

- # HTML attributes to add to specific elements. By default, no attributes
- # are added.
- :add_attributes => {},
+ config.freeze
+ end

- # HTML attributes to allow in specific elements. By default, no attributes
- # are allowed.
- :attributes => {},
-
- # HTML elements to allow. By default, no elements are allowed (which means
- # that all HTML will be stripped).
- :elements => [],
-
- # Output format. Supported formats are :html and :xhtml. Default is :html.
- :output => :html,
-
- # Character encoding to use for HTML output. Default is 'utf-8'.
- :output_encoding => 'utf-8',
-
- # URL handling protocols to allow in specific attributes. By default, no
- # protocols are allowed. Use :relative in place of a protocol if you want
- # to allow relative URLs sans protocol.
- :protocols => {},
-
- # If this is true, Sanitize will remove the contents of any filtered
- # elements in addition to the elements themselves. By default, Sanitize
- # leaves the safe parts of an element's contents behind when the element
- # is removed.
- #
- # If this is an Array of element names, then only the contents of the
- # specified elements (when filtered) will be removed, and the contents of
- # all other filtered elements will be left behind.
- :remove_contents => false,
-
- # Transformers allow you to filter or alter nodes using custom logic. See
- # README.rdoc for details and examples.
- :transformers => [],
-
- # By default, transformers perform depth-first traversal (deepest node
- # upward). This setting allows you to specify transformers that should
- # perform breadth-first traversal (top node downward).
- :transformers_breadth => [],
-
- # Elements which, when removed, should have their contents surrounded by
- # space characters to preserve readability. For example,
- # `foo<div>bar</div>baz` will become 'foo bar baz' when the <div> is
- # removed.
- :whitespace_elements => %w[
- address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
- h6 header hgroup hr li nav ol p pre section ul
- ]
-
- }
  end
  end
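The new `freeze_config` walks nested Arrays and Hashes and freezes every value, so code that used to mutate a built-in config constant in place will now raise. A minimal sketch that copies the helper above into a hypothetical `ConfigDemo` module (so it runs without the gem installed) and builds an illustrative config shaped like the ones in this diff; the nested `:protocols` entry is an assumption added only to show deep freezing:

```ruby
# Standalone copy of the freeze_config helper, in a hypothetical module.
module ConfigDemo
  def self.freeze_config(config)
    if Array === config
      config.each { |c| freeze_config(c) }
    elsif Hash === config
      config.each_value { |c| freeze_config(c) }
    end

    config.freeze
  end

  # Illustrative config; the :protocols entry is an assumption for the demo.
  RESTRICTED = freeze_config(
    :elements  => %w[b em i strong u],
    :protocols => { 'a' => { 'href' => ['http', 'https'] } }
  )
end

config = ConfigDemo::RESTRICTED
config.frozen?                           # => true
config[:elements].frozen?                # => true
config[:protocols]['a']['href'].frozen?  # => true

begin
  config[:elements] << 'script'          # mutating a nested Array...
rescue RuntimeError => e                 # ...raises (FrozenError on Ruby >= 2.5)
  puts e.class
end

# To customize, build a new Hash rather than editing the frozen one.
custom = config.merge(:elements => config[:elements] + %w[p])
custom[:elements]  # => ["b", "em", "i", "strong", "u", "p"]
```

`Hash#merge` returns a new, unfrozen Hash, so derived configs stay editable while the shared constants are protected from accidental global changes.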
@@ -22,7 +22,7 @@

  class Sanitize
  module Config
- BASIC = {
+ BASIC = freeze_config(
  :elements => %w[
  a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
  q s samp small strike strong sub sup time u ul var
@@ -46,6 +46,6 @@ class Sanitize
  'blockquote' => {'cite' => ['http', 'https', :relative]},
  'q' => {'cite' => ['http', 'https', :relative]}
  }
- }
+ )
  end
  end
@@ -0,0 +1,85 @@
+ #--
+ # Copyright (c) 2013 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ class Sanitize
+ module Config
+ DEFAULT = freeze_config(
+
+ # Whether or not to allow HTML comments. Allowing comments is strongly
+ # discouraged, since IE allows script execution within conditional
+ # comments.
+ :allow_comments => false,
+
+ # HTML attributes to add to specific elements. By default, no attributes
+ # are added.
+ :add_attributes => {},
+
+ # HTML attributes to allow in specific elements. By default, no attributes
+ # are allowed.
+ :attributes => {},
+
+ # HTML elements to allow. By default, no elements are allowed (which means
+ # that all HTML will be stripped).
+ :elements => [],
+
+ # Output format. Supported formats are :html and :xhtml. Default is :html.
+ :output => :html,
+
+ # Character encoding to use for HTML output. Default is 'utf-8'.
+ :output_encoding => 'utf-8',
+
+ # URL handling protocols to allow in specific attributes. By default, no
+ # protocols are allowed. Use :relative in place of a protocol if you want
+ # to allow relative URLs sans protocol.
+ :protocols => {},
+
+ # If this is true, Sanitize will remove the contents of any filtered
+ # elements in addition to the elements themselves. By default, Sanitize
+ # leaves the safe parts of an element's contents behind when the element
+ # is removed.
+ #
+ # If this is an Array of element names, then only the contents of the
+ # specified elements (when filtered) will be removed, and the contents of
+ # all other filtered elements will be left behind.
+ :remove_contents => false,
+
+ # Transformers allow you to filter or alter nodes using custom logic. See
+ # README.rdoc for details and examples.
+ :transformers => [],
+
+ # By default, transformers perform depth-first traversal (deepest node
+ # upward). This setting allows you to specify transformers that should
+ # perform breadth-first traversal (top node downward).
+ :transformers_breadth => [],
+
+ # Elements which, when removed, should have their contents surrounded by
+ # space characters to preserve readability. For example,
+ # `foo<div>bar</div>baz` will become 'foo bar baz' when the <div> is
+ # removed.
+ :whitespace_elements => %w[
+ address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
+ h6 header hgroup hr li nav ol p pre section ul
+ ]
+
+ )
+ end
+ end
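The `:protocols` setting above whitelists URL protocols per attribute, with `:relative` standing for URLs that have no protocol at all, and the tricky test cases further down rely on spotting a protocol even when its colon is hidden behind a numeric entity. A minimal sketch of that kind of check, assuming hypothetical helpers `decode_numeric_entities`, `url_protocol` and `url_allowed?`; this is an illustration of the idea, not Sanitize's actual implementation:

```ruby
# Hypothetical helper: decode decimal (&#58; &#0058) and hex (&#x3A;) numeric
# character references, with or without the trailing semicolon, so that an
# encoded colon cannot hide the protocol. Invalid code points become U+FFFD.
def decode_numeric_entities(str)
  str.gsub(/&#(?:x([0-9a-f]+)|(\d+));?/i) do
    code = $1 ? $1.to_i(16) : $2.to_i
    (code.between?(0xD800, 0xDFFF) || code > 0x10FFFF) ? "\uFFFD" : [code].pack('U')
  end
end

# Anchored with \A (start of string) rather than ^, which in Ruby also matches
# right after every newline. The class excludes / # ? so that a colon later in
# a path segment ("foo/bar:baz") is not mistaken for a protocol separator.
def url_protocol(url)
  match = decode_numeric_entities(url).match(/\A\s*([^\/#?]*?)\s*:/)
  match && match[1].downcase
end

# nil protocol means a relative URL, allowed only if :relative is listed.
def url_allowed?(url, allowed)
  protocol = url_protocol(url)
  protocol.nil? ? allowed.include?(:relative) : allowed.include?(protocol)
end

allowed = ['http', 'https', :relative]
url_allowed?('http://foo.com/', allowed)          # => true
url_allowed?('pants', allowed)                    # => true (relative)
url_allowed?('javascript&#58;alert(1)', allowed)  # => false
url_allowed?('javascript :alert(1)', allowed)     # => false
```

Decoding before matching is the design choice that matters here: a check that looks only for a literal `:` would treat `javascript&#x3A;` as a harmless relative URL, which is the class of bypass the 1.0.3 history entry describes.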
@@ -22,7 +22,7 @@

  class Sanitize
  module Config
- RELAXED = {
+ RELAXED = freeze_config(
  :elements => %w[
  a abbr b bdo blockquote br caption cite code col colgroup dd del dfn dl
  dt em figcaption figure h1 h2 h3 h4 h5 h6 hgroup i img ins kbd li mark
@@ -56,6 +56,6 @@ class Sanitize
  'ins' => {'cite' => ['http', 'https', :relative]},
  'q' => {'cite' => ['http', 'https', :relative]}
  }
- }
+ )
  end
  end
@@ -22,8 +22,8 @@

  class Sanitize
  module Config
- RESTRICTED = {
+ RESTRICTED = freeze_config(
  :elements => %w[b em i strong u]
- }
+ )
  end
  end
@@ -1,3 +1,3 @@
  class Sanitize
- VERSION = '2.0.4'
+ VERSION = '2.0.5'
  end
@@ -0,0 +1,623 @@
+ # encoding: utf-8
+ #--
+ # Copyright (c) 2013 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ require 'rubygems'
+ gem 'minitest'
+
+ require 'minitest/autorun'
+ require 'sanitize'
+
+ strings = {
+ :basic => {
+ :html => '<b>Lo<!-- comment -->rem</b> <a href="pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <script>alert("hello world");</script>',
+ :default => 'Lorem ipsum dolor sit amet alert("hello world");',
+ :restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet alert("hello world");',
+ :basic => '<b>Lorem</b> <a href="pants" rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet alert("hello world");',
+ :relaxed => '<b>Lorem</b> <a href="pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet alert("hello world");'
+ },
+
+ :malformed => {
+ :html => 'Lo<!-- comment -->rem</b> <a href=pants title="foo>ipsum <a href="http://foo.com/"><strong>dolor</a></strong> sit<br/>amet <script>alert("hello world");',
+ :default => 'Lorem dolor sit amet alert("hello world");',
+ :restricted => 'Lorem <strong>dolor</strong> sit amet alert("hello world");',
+ :basic => 'Lorem <a href="pants" rel="nofollow"><strong>dolor</strong></a> sit<br>amet alert("hello world");',
+ :relaxed => 'Lorem <a href="pants" title="foo&gt;ipsum &lt;a href="><strong>dolor</strong></a> sit<br>amet alert("hello world");',
+ :document => ' Lorem dolor sit amet alert("hello world"); '
+ },
+
+ :unclosed => {
+ :html => '<p>a</p><blockquote>b',
+ :default => ' a b ',
+ :restricted => ' a b ',
+ :basic => '<p>a</p><blockquote>b</blockquote>',
+ :relaxed => '<p>a</p><blockquote>b</blockquote>'
+ },
+
+ :malicious => {
+ :html => '<b>Lo<!-- comment -->rem</b> <a href="javascript:pants" title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br/>amet <<foo>script>alert("hello world");</script>',
+ :default => 'Lorem ipsum dolor sit amet script&gt;alert("hello world");',
+ :restricted => '<b>Lorem</b> ipsum <strong>dolor</strong> sit amet script&gt;alert("hello world");',
+ :basic => '<b>Lorem</b> <a rel="nofollow">ipsum</a> <a href="http://foo.com/" rel="nofollow"><strong>dolor</strong></a> sit<br>amet script&gt;alert("hello world");',
+ :relaxed => '<b>Lorem</b> <a title="foo">ipsum</a> <a href="http://foo.com/"><strong>dolor</strong></a> sit<br>amet script&gt;alert("hello world");'
+ },
+
+ :raw_comment => {
+ :html => '<!-- comment -->Hello',
+ :default => 'Hello',
+ :restricted => 'Hello',
+ :basic => 'Hello',
+ :relaxed => 'Hello',
+ :document => ' Hello ',
+ }
+ }
+
+ tricky = {
+ 'protocol-based JS injection: simple, no spaces' => {
+ :html => '<a href="javascript:alert(\'XSS\');">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: simple, spaces before' => {
+ :html => '<a href="javascript :alert(\'XSS\');">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: simple, spaces after' => {
+ :html => '<a href="javascript: alert(\'XSS\');">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: simple, spaces before and after' => {
+ :html => '<a href="javascript : alert(\'XSS\');">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: preceding colon' => {
+ :html => '<a href=":javascript:alert(\'XSS\');">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: UTF-8 encoding' => {
+ :html => '<a href="javascript&#58;">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: long UTF-8 encoding' => {
+ :html => '<a href="javascript&#0058;">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: long UTF-8 encoding without semicolons' => {
+ :html => '<a href=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: hex encoding' => {
+ :html => '<a href="javascript&#x3A;">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: long hex encoding' => {
+ :html => '<a href="javascript&#x003A;">foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: hex encoding without semicolons' => {
+ :html => '<a href=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>foo</a>',
+ :default => 'foo',
+ :restricted => 'foo',
+ :basic => '<a rel="nofollow">foo</a>',
+ :relaxed => '<a>foo</a>'
+ },
+
+ 'protocol-based JS injection: null char' => {
+ :html => "<img src=java\0script:alert(\"XSS\")>",
+ :default => '',
+ :restricted => '',
+ :basic => '',
+ :relaxed => '<img src="java">' # everything following the null char gets stripped, and URL is considered relative
+ },
+
+ 'protocol-based JS injection: invalid URL char' => {
+ :html => '<img src=java\script:alert("XSS")>',
+ :default => '',
+ :restricted => '',
+ :basic => '',
+ :relaxed => '<img>'
+ },
+
+ 'protocol-based JS injection: spaces and entities' => {
+ :html => '<img src=" &#14; javascript:alert(\'XSS\');">',
+ :default => '',
+ :restricted => '',
+ :basic => '',
+ :relaxed => '<img src="">'
+ }
+ }
+
+ describe 'Config::DEFAULT' do
+ it 'should translate valid HTML entities' do
+ Sanitize.clean("Don&apos;t tas&eacute; me &amp; bro!").must_equal("Don't tasé me &amp; bro!")
+ end
+
+ it 'should translate valid HTML entities while encoding unencoded ampersands' do
+ Sanitize.clean("cookies&sup2; & &frac14; cr&eacute;me").must_equal("cookies² &amp; ¼ créme")
+ end
+
+ it 'should never output &apos;' do
+ Sanitize.clean("<a href='&apos;' class=\"' &#39;\">IE6 isn't a real browser</a>").wont_match(/&apos;/)
+ end
+
+ it 'should not choke on several instances of the same element in a row' do
+ Sanitize.clean('<img src="http://www.google.com/intl/en_ALL/images/logo.gif"><img src="http://www.google.com/intl/en_ALL/images/logo.gif"><img src="http://www.google.com/intl/en_ALL/images/logo.gif"><img src="http://www.google.com/intl/en_ALL/images/logo.gif">').must_equal('')
+ end
+
+ it 'should surround the contents of :whitespace_elements with space characters when removing the element' do
+ Sanitize.clean('foo<div>bar</div>baz').must_equal('foo bar baz')
+ Sanitize.clean('foo<br>bar<br>baz').must_equal('foo bar baz')
+ Sanitize.clean('foo<hr>bar<hr>baz').must_equal('foo bar baz')
+ end
+
+ strings.each do |name, data|
+ it "should clean #{name} HTML" do
+ Sanitize.clean(data[:html]).must_equal(data[:default])
+ end
+ end
+
+ tricky.each do |name, data|
+ it "should not allow #{name}" do
+ Sanitize.clean(data[:html]).must_equal(data[:default])
+ end
+ end
+ end
+
+ describe 'Config::RESTRICTED' do
+ before { @s = Sanitize.new(Sanitize::Config::RESTRICTED) }
+
+ strings.each do |name, data|
+ it "should clean #{name} HTML" do
+ @s.clean(data[:html]).must_equal(data[:restricted])
+ end
+ end
232
+
233
+ tricky.each do |name, data|
234
+ it "should not allow #{name}" do
235
+ @s.clean(data[:html]).must_equal(data[:restricted])
236
+ end
237
+ end
238
+ end
239
+
240
+ describe 'Config::BASIC' do
241
+ before { @s = Sanitize.new(Sanitize::Config::BASIC) }
242
+
243
+ it 'should not choke on valueless attributes' do
244
+ @s.clean('foo <a href>foo</a> bar').must_equal('foo <a href rel="nofollow">foo</a> bar')
245
+ end
246
+
247
+ it 'should downcase attribute names' do
248
+ @s.clean('<a HREF="javascript:alert(\'foo\')">bar</a>').must_equal('<a rel="nofollow">bar</a>')
249
+ end
250
+
251
+ strings.each do |name, data|
252
+ it "should clean #{name} HTML" do
253
+ @s.clean(data[:html]).must_equal(data[:basic])
254
+ end
255
+ end
256
+
257
+ tricky.each do |name, data|
258
+ it "should not allow #{name}" do
259
+ @s.clean(data[:html]).must_equal(data[:basic])
260
+ end
261
+ end
262
+ end
263
+
264
+ describe 'Config::RELAXED' do
265
+ before { @s = Sanitize.new(Sanitize::Config::RELAXED) }
266
+
267
+ it 'should encode special chars in attribute values' do
268
+ input = '<a href="http://example.com" title="<b>&eacute;xamples</b> & things">foo</a>'
269
+ output = Nokogiri::HTML.fragment('<a href="http://example.com" title="&lt;b&gt;éxamples&lt;/b&gt; &amp; things">foo</a>').to_xhtml(:encoding => 'utf-8', :indent => 0, :save_with => Nokogiri::XML::Node::SaveOptions::AS_XHTML)
270
+ @s.clean(input).must_equal(output)
271
+ end
272
+
273
+ strings.each do |name, data|
274
+ it "should clean #{name} HTML" do
275
+ @s.clean(data[:html]).must_equal(data[:relaxed])
276
+ end
277
+ end
278
+
279
+ tricky.each do |name, data|
280
+ it "should not allow #{name}" do
281
+ @s.clean(data[:html]).must_equal(data[:relaxed])
282
+ end
283
+ end
284
+ end
285
+
286
+ describe 'Full Document parser (using clean_document)' do
287
+ before {
288
+ @s = Sanitize.new({:elements => %w[!DOCTYPE html]})
289
+ @default_doctype = "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">"
290
+ }
291
+
292
+ it 'should require the HTML element to be whitelisted to prevent parser errors' do
293
+ assert_raises(RuntimeError, 'You must have the HTML element whitelisted') {
294
+ Sanitize.clean_document!('', {:elements => [], :remove_contents => false})
295
+ }
296
+ end
297
+
298
+ it 'should NOT require the HTML element to be whitelisted if remove_contents is true' do
299
+ input = '<!DOCTYPE html><html>foo</html>'
300
+ Sanitize.clean_document!(input, {:remove_contents => true}).must_equal "<!DOCTYPE html>\n\n"
301
+ end
302
+
303
+ it 'adds a doctype tag if not included' do
304
+ @s.clean_document('').must_equal("#{@default_doctype}\n\n")
305
+ end
306
+
307
+ it 'should apply whitelist filtering to HTML element' do
308
+ input = "<!DOCTYPE html>\n<html anything='false'></html>\n\n"
309
+ @s.clean_document(input).must_equal("<!DOCTYPE html>\n<html></html>\n")
310
+ end
311
+
312
+ strings.each do |name, data|
313
+ it "should wrap #{name} with DOCTYPE and HTML tag" do
314
+ output = data[:document] || data[:default]
315
+ @s.clean_document(data[:html]).must_equal("#{@default_doctype}\n<html>#{output}</html>\n")
316
+ end
317
+ end
318
+
319
+ tricky.each do |name, data|
320
+ it "should wrap #{name} with DOCTYPE and HTML tag" do
321
+ @s.clean_document(data[:html]).must_equal("#{@default_doctype}\n<html>#{data[:default]}</html>\n")
322
+ end
323
+ end
324
+ end
325
+
326
+ describe 'Custom configs' do
327
+ it 'should allow attributes on all elements if whitelisted under :all' do
328
+ input = '<p class="foo">bar</p>'
329
+
330
+ Sanitize.clean(input).must_equal(' bar ')
331
+ Sanitize.clean(input, {:elements => ['p'], :attributes => {:all => ['class']}}).must_equal(input)
332
+ Sanitize.clean(input, {:elements => ['p'], :attributes => {'div' => ['class']}}).must_equal('<p>bar</p>')
333
+ Sanitize.clean(input, {:elements => ['p'], :attributes => {'p' => ['title'], :all => ['class']}}).must_equal(input)
334
+ end
335
+
336
+ it 'should allow comments when :allow_comments == true' do
337
+ input = 'foo <!-- bar --> baz'
338
+ Sanitize.clean(input).must_equal('foo baz')
339
+ Sanitize.clean(input, :allow_comments => true).must_equal(input)
340
+ end
341
+
342
+ it 'should allow relative URLs containing colons where the colon is not in the first path segment' do
343
+ input = '<a href="/wiki/Special:Random">Random Page</a>'
344
+ Sanitize.clean(input, { :elements => ['a'], :attributes => {'a' => ['href']}, :protocols => { 'a' => { 'href' => [:relative] }} }).must_equal(input)
345
+ end
346
+
347
+ it 'should output HTML when :output == :html' do
348
+ input = 'foo<br/>bar<br>baz'
349
+ Sanitize.clean(input, :elements => ['br'], :output => :html).must_equal('foo<br>bar<br>baz')
350
+ end
351
+
352
+ it 'should remove the contents of filtered nodes when :remove_contents == true' do
353
+ Sanitize.clean('foo bar <div>baz<span>quux</span></div>', :remove_contents => true).must_equal('foo bar ')
354
+ end
355
+
356
+ it 'should remove the contents of specified nodes when :remove_contents is an Array of element names as strings' do
357
+ Sanitize.clean('foo bar <div>baz<span>quux</span><script>alert("hello!");</script></div>', :remove_contents => ['script', 'span']).must_equal('foo bar baz ')
358
+ end
359
+
360
+ it 'should remove the contents of specified nodes when :remove_contents is an Array of element names as symbols' do
361
+ Sanitize.clean('foo bar <div>baz<span>quux</span><script>alert("hello!");</script></div>', :remove_contents => [:script, :span]).must_equal('foo bar baz ')
362
+ end
363
+
364
+ it 'should support encodings other than utf-8' do
365
+ html = 'foo&nbsp;bar'
366
+ Sanitize.clean(html).must_equal("foo\302\240bar")
367
+ Sanitize.clean(html, :output_encoding => 'ASCII').must_equal("foo&#160;bar")
368
+ end
369
+ end
370
+
371
+ describe 'Sanitize.clean' do
372
+ it 'should not modify the input string' do
373
+ input = '<b>foo</b>'
374
+ Sanitize.clean(input)
375
+ input.must_equal('<b>foo</b>')
376
+ end
377
+
378
+ it 'should return a new string' do
379
+ input = '<b>foo</b>'
380
+ Sanitize.clean(input).must_equal('foo')
381
+ end
382
+ end
383
+
384
+ describe 'Sanitize.clean!' do
385
+ it 'should modify the input string' do
386
+ input = '<b>foo</b>'
387
+ Sanitize.clean!(input)
388
+ input.must_equal('foo')
389
+ end
390
+
391
+ it 'should return the string if it was modified' do
392
+ input = '<b>foo</b>'
393
+ Sanitize.clean!(input).must_equal('foo')
394
+ end
395
+
396
+ it 'should return nil if the string was not modified' do
397
+ input = 'foo'
398
+ Sanitize.clean!(input).must_equal(nil)
399
+ end
400
+ end
401
+
402
+ describe 'Sanitize.clean_document' do
403
+ before { @config = { :elements => ['html', 'p'] } }
404
+
405
+ it 'should be idempotent' do
406
+ input = '<!DOCTYPE html><html><p>foo</p></html>'
407
+ first = Sanitize.clean_document(input, @config)
408
+ second = Sanitize.clean_document(first, @config)
409
+ second.must_equal first
410
+ second.wont_be_nil
411
+ end
412
+
413
+ it 'should handle nil without raising' do
414
+ Sanitize.clean_document(nil).must_equal nil
415
+ end
416
+
417
+ it 'should not modify the input string' do
418
+ input = '<!DOCTYPE html><b>foo</b>'
419
+ Sanitize.clean_document(input, @config)
420
+ input.must_equal('<!DOCTYPE html><b>foo</b>')
421
+ end
422
+
423
+ it 'should return a new string' do
424
+ input = '<!DOCTYPE html><b>foo</b>'
425
+ Sanitize.clean_document(input, @config).must_equal("<!DOCTYPE html>\n<html>foo</html>\n")
426
+ end
427
+ end
428
+
429
+ describe 'Sanitize.clean_document!' do
430
+ before { @config = { :elements => ['html'] } }
431
+
432
+ it 'should modify the input string' do
433
+ input = '<!DOCTYPE html><html><body><b>foo</b></body></html>'
434
+ Sanitize.clean_document!(input, @config)
435
+ input.must_equal("<!DOCTYPE html>\n<html>foo</html>\n")
436
+ end
437
+
438
+ it 'should return the string if it was modified' do
439
+ input = '<!DOCTYPE html><html><body><b>foo</b></body></html>'
440
+ Sanitize.clean_document!(input, @config).must_equal("<!DOCTYPE html>\n<html>foo</html>\n")
441
+ end
442
+
443
+ it 'should return nil if the string was not modified' do
444
+ input = "<!DOCTYPE html>\n<html></html>\n"
445
+ Sanitize.clean_document!(input, @config).must_equal(nil)
446
+ end
447
+ end
448
+
449
+ describe 'transformers' do
450
+ # YouTube embed transformer.
451
+ youtube = lambda do |env|
452
+ node = env[:node]
453
+ node_name = env[:node_name]
454
+
455
+ # Don't continue if this node is already whitelisted or is not an element.
456
+ return if env[:is_whitelisted] || !node.element?
457
+
458
+ # Don't continue unless the node is an iframe.
459
+ return unless node_name == 'iframe'
460
+
461
+ # Verify that the video URL is actually a valid YouTube video URL.
462
+ return unless node['src'] =~ /\Ahttps?:\/\/(?:www\.)?youtube(?:-nocookie)?\.com\//
463
+
464
+ # We're now certain that this is a YouTube embed, but we still need to run
465
+ # it through a special Sanitize step to ensure that no unwanted elements or
466
+ # attributes that don't belong in a YouTube embed can sneak in.
467
+ Sanitize.clean_node!(node, {
468
+ :elements => %w[iframe],
469
+
470
+ :attributes => {
471
+ 'iframe' => %w[allowfullscreen frameborder height src width]
472
+ }
473
+ })
474
+
475
+ # Now that we're sure that this is a valid YouTube embed and that there are
476
+ # no unwanted elements or attributes hidden inside it, we can tell Sanitize
477
+ # to whitelist the current node.
478
+ {:node_whitelist => [node]}
479
+ end
480
+
481
+ it 'should receive a complete env Hash as input' do
482
+ Sanitize.clean!('<SPAN>foo</SPAN>', :foo => :bar, :transformers => lambda {|env|
483
+ return unless env[:node].element?
484
+
485
+ env[:config][:foo].must_equal(:bar)
486
+ env[:is_whitelisted].must_equal(false)
487
+ env[:node].must_be_kind_of(Nokogiri::XML::Node)
488
+ env[:node_name].must_equal('span')
489
+ env[:node_whitelist].must_be_kind_of(Set)
490
+ env[:node_whitelist].must_be_empty
491
+ })
492
+ end
493
+
494
+ it 'should traverse all node types, including the fragment itself' do
495
+ nodes = []
496
+
497
+ Sanitize.clean!('<div>foo</div><!--bar--><script>cdata!</script>', :transformers => proc {|env|
498
+ nodes << env[:node_name]
499
+ })
500
+
501
+ nodes.must_equal(%w[
502
+ text div comment #cdata-section script #document-fragment
503
+ ])
504
+ end
505
+
506
+ it 'should traverse in depth-first mode by default' do
507
+ nodes = []
508
+
509
+ Sanitize.clean!('<div><span>foo</span></div><p>bar</p>', :transformers => proc {|env|
510
+ env[:traversal_mode].must_equal(:depth)
511
+ nodes << env[:node_name] if env[:node].element?
512
+ })
513
+
514
+ nodes.must_equal(['span', 'div', 'p'])
515
+ end
516
+
517
+ it 'should traverse in breadth-first mode when using :transformers_breadth' do
518
+ nodes = []
519
+
520
+ Sanitize.clean!('<div><span>foo</span></div><p>bar</p>', :transformers_breadth => proc {|env|
521
+ env[:traversal_mode].must_equal(:breadth)
522
+ nodes << env[:node_name] if env[:node].element?
523
+ })
524
+
525
+ nodes.must_equal(['div', 'span', 'p'])
526
+ end
527
+
528
+ it 'should whitelist nodes in the node whitelist' do
529
+ Sanitize.clean!('<div class="foo">foo</div><span>bar</span>', :transformers => [
530
+ proc {|env|
531
+ {:node_whitelist => [env[:node]]} if env[:node_name] == 'div'
532
+ },
533
+
534
+ proc {|env|
535
+ env[:is_whitelisted].must_equal(false) unless env[:node_name] == 'div'
536
+ env[:is_whitelisted].must_equal(true) if env[:node_name] == 'div'
537
+ env[:node_whitelist].must_include(env[:node]) if env[:node_name] == 'div'
538
+ }
539
+ ]).must_equal('<div class="foo">foo</div>bar')
540
+ end
541
+
542
+ it 'should clear the node whitelist after each fragment' do
543
+ called = false
544
+
545
+ Sanitize.clean!('<div>foo</div>', :transformers => proc {|env|
546
+ {:node_whitelist => [env[:node]]}
547
+ })
548
+
549
+ Sanitize.clean!('<div>foo</div>', :transformers => proc {|env|
550
+ called = true
551
+ env[:is_whitelisted].must_equal(false)
552
+ env[:node_whitelist].must_be_empty
553
+ })
554
+
555
+ called.must_equal(true)
556
+ end
557
+
558
+ it 'should allow youtube video embeds via the youtube transformer' do
559
+ input = '<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'
560
+ output = Nokogiri::HTML::DocumentFragment.parse('<iframe width="420" height="315" src="http://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen>alert()</iframe>').to_html(:encoding => 'utf-8', :indent => 0)
561
+
562
+ Sanitize.clean!(input, :transformers => youtube).must_equal(output)
563
+ end
564
+
565
+ it 'should allow https youtube video embeds via the youtube transformer' do
566
+ input = '<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'
567
+ output = Nokogiri::HTML::DocumentFragment.parse('<iframe width="420" height="315" src="https://www.youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen>alert()</iframe>').to_html(:encoding => 'utf-8', :indent => 0)
568
+
569
+ Sanitize.clean!(input, :transformers => youtube).must_equal(output)
570
+ end
571
+
572
+ it 'should allow privacy-enhanced youtube video embeds via the youtube transformer' do
573
+ input = '<iframe width="420" height="315" src="http://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen bogus="bogus"><script>alert()</script></iframe>'
574
+ output = Nokogiri::HTML::DocumentFragment.parse('<iframe width="420" height="315" src="http://www.youtube-nocookie.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen>alert()</iframe>').to_html(:encoding => 'utf-8', :indent => 0)
575
+
576
+ Sanitize.clean!(input, :transformers => youtube).must_equal(output)
577
+ end
578
+
579
+ it 'should not allow non-youtube video embeds via the youtube transformer' do
580
+ input = '<iframe width="420" height="315" src="http://www.fake-youtube.com/embed/QH2-TGUlwu4" frameborder="0" allowfullscreen></iframe>'
581
+ output = ''
582
+
583
+ Sanitize.clean!(input, :transformers => youtube).must_equal(output)
584
+ end
585
+ end
586
+
587
+ describe 'bugs' do
588
+ it 'should not have Nokogiri 1.4.2+ unterminated script/style element bug' do
589
+ Sanitize.clean!('foo <script>bar').must_equal('foo bar')
590
+ Sanitize.clean!('foo <style>bar').must_equal('foo bar')
591
+ end
592
+ end
593
+
594
+ describe "default configurations" do
595
+ def assert_deep_frozen(config)
596
+ if Hash === config
597
+ config.each_value { |c| assert_deep_frozen(c) }
598
+ config.frozen?.must_equal(true)
599
+ elsif Array === config
600
+ config.each { |c| assert_deep_frozen(c) }
601
+ config.frozen?.must_equal(true)
602
+ end
603
+ end
604
+
605
+ {
606
+ "DEFAULT" => Sanitize::Config::DEFAULT,
607
+ "RESTRICTED" => Sanitize::Config::RESTRICTED,
608
+ "BASIC" => Sanitize::Config::BASIC,
609
+ "RELAXED" => Sanitize::Config::RELAXED,
610
+ }.each do |name, config|
611
+ describe name do
612
+ it "should be frozen" do
613
+ assert_deep_frozen(config)
614
+ end
615
+ end
616
+ end
617
+
618
+ it "cannot be modified" do
619
+ assert_raises(RuntimeError, "can't modify frozen") {
620
+ Sanitize::Config::RESTRICTED.dup[:elements].push("script")
621
+ }
622
+ end
623
+ end
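The "protocol-based JS injection" cases in this test file all rely on the same idea. Numeric character references must be decoded before the URL scheme is checked. Zero-padded decimal references, hex references, and references with no trailing semicolon all decode to `javascript:`. The sketch below shows that decode-then-check step using only the Ruby standard library.

A few caveats apply:
- `decode_numeric_refs`, `url_scheme`, and `allowed_url?` are hypothetical helpers, not Sanitize's API.
- The sketch handles only numeric references, not named ones.
- It compares the scheme against an allowlist rather than blocklisting `javascript:`. This is in the spirit of the `:protocols` option used in the custom-config test above.

```ruby
# Hypothetical helpers for illustration only; NOT part of Sanitize's API.

# Matches decimal (&#106, &#0000106;) and hex (&#x6A, &#x003A;) numeric
# character references, with or without the trailing semicolon.
NUMERIC_REF = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/

# Replaces each numeric character reference with the character it encodes.
# Out-of-range and surrogate code points become U+FFFD instead of raising.
def decode_numeric_refs(str)
  str.gsub(NUMERIC_REF) do
    code = $1 ? $1.to_i(16) : $2.to_i(10)
    valid = code <= 0x10FFFF && !(0xD800..0xDFFF).cover?(code)
    valid ? [code].pack('U') : "\uFFFD"
  end
end

# Returns the lowercased URL scheme, or nil if the URL has no scheme
# (i.e. it is relative). Everything before the first colon that is not
# preceded by "/" or "#" counts as the scheme. So "/wiki/Special:Random"
# is relative, while java\script:... yields the bogus scheme java\script.
def url_scheme(value)
  # Conservatively drop all C0 control characters and spaces so that
  # obfuscations such as "java\0script:" or " &#14; javascript:" are caught.
  decoded = decode_numeric_refs(value).gsub(/[\x00-\x20]/, '')
  match = decoded.match(%r{\A([^/#]*?):})
  match ? match[1].downcase : nil
end

# Allowlist check: a URL passes only if its scheme is explicitly allowed,
# or it has no scheme and :relative is allowed.
def allowed_url?(value, allowed)
  scheme = url_scheme(value)
  scheme.nil? ? allowed.include?(:relative) : allowed.include?(scheme)
end
```

With an allowlist such as `['http', 'https', :relative]`, every encoded variant in the test cases resolves to the scheme `javascript` and is rejected. A real sanitizer must also cope with named references (for example `&colon;`) and other obfuscations browsers tolerate. The specs above exercise those behaviors against Sanitize itself.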
metadata CHANGED
@@ -1,29 +1,29 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: sanitize
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.0.4
4
+ version: 2.0.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Ryan Grove
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2013-06-12 00:00:00.000000000 Z
11
+ date: 2013-07-10 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - ~>
17
+ - - '>='
18
18
  - !ruby/object:Gem::Version
19
- version: 1.6.0
19
+ version: 1.4.4
20
20
  type: :runtime
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - ~>
24
+ - - '>='
25
25
  - !ruby/object:Gem::Version
26
- version: 1.6.0
26
+ version: 1.4.4
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: minitest
29
29
  requirement: !ruby/object:Gem::Requirement
@@ -62,6 +62,7 @@ files:
62
62
  - LICENSE
63
63
  - README.rdoc
64
64
  - lib/sanitize/config/basic.rb
65
+ - lib/sanitize/config/default.rb
65
66
  - lib/sanitize/config/relaxed.rb
66
67
  - lib/sanitize/config/restricted.rb
67
68
  - lib/sanitize/config.rb
@@ -70,6 +71,7 @@ files:
70
71
  - lib/sanitize/transformers/clean_element.rb
71
72
  - lib/sanitize/version.rb
72
73
  - lib/sanitize.rb
74
+ - test/test_sanitize.rb
73
75
  homepage: https://github.com/rgrove/sanitize/
74
76
  licenses: []
75
77
  metadata: {}
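The first `metadata` hunk above replaces the pessimistic `~> 1.6.0` Nokogiri constraint with `>= 1.4.4`. RubyGems' own `Gem::Requirement#satisfied_by?` gives a quick way to see which versions each constraint admits. The version numbers below are chosen for illustration, not as a list of actual Nokogiri releases.

```ruby
require 'rubygems'

# Nokogiri constraint in the sanitize 2.0.4 gemspec: >= 1.6.0 and < 1.7.
OLD_REQ = Gem::Requirement.new('~> 1.6.0')
# Nokogiri constraint in the sanitize 2.0.5 gemspec: any version >= 1.4.4.
NEW_REQ = Gem::Requirement.new('>= 1.4.4')

# Print whether each sample version satisfies the old and new constraints.
%w[1.4.4 1.5.9 1.6.0 1.6.9 1.7.0].each do |v|
  version = Gem::Version.new(v)
  puts format('%-6s 2.0.4: %-5s 2.0.5: %s',
              v, OLD_REQ.satisfied_by?(version), NEW_REQ.satisfied_by?(version))
end
```

The 1.4.x and 1.5.x samples fail the old constraint but pass the new one. That is the coexistence case the 2.0.5 changelog entry describes. The trade-off is that `>= 1.4.4` has no upper bound, so it also admits any future Nokogiri release.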