glebm-sanitize 1.2.1.1
- data/HISTORY +90 -0
- data/LICENSE +18 -0
- data/README.rdoc +334 -0
- data/lib/sanitize.rb +247 -0
- data/lib/sanitize/config.rb +70 -0
- data/lib/sanitize/config/basic.rb +49 -0
- data/lib/sanitize/config/relaxed.rb +57 -0
- data/lib/sanitize/config/restricted.rb +29 -0
- data/lib/sanitize/version.rb +3 -0
- metadata +124 -0
data/HISTORY
ADDED
@@ -0,0 +1,90 @@
|
|
1
|
+
Sanitize History
|
2
|
+
================================================================================
|
3
|
+
|
4
|
+
Version 1.2.1 (2010-04-20)
|
5
|
+
* Added a :remove_contents config setting. If set to true, Sanitize will
|
6
|
+
remove the contents of all non-whitelisted elements in addition to the
|
7
|
+
elements themselves. If set to an Array of element names, Sanitize will
|
8
|
+
remove the contents of only those elements (when filtered), and leave the
|
9
|
+
contents of other filtered elements. [Thanks to Rafael Souza for the Array
|
10
|
+
option]
|
11
|
+
* Added an :output_encoding config setting to allow the character encoding for
|
12
|
+
HTML output to be specified. The default is 'utf-8'.
|
13
|
+
* The environment hash passed into transformers now includes a :node_name item
|
14
|
+
containing the lowercase name of the current HTML node (e.g. "div").
|
15
|
+
* Returning anything other than a Hash or nil from a transformer will now
|
16
|
+
raise a meaningful Sanitize::Error exception rather than an unintended
|
17
|
+
NameError.
|
18
|
+
|
19
|
+
Version 1.2.0 (2010-01-17)
|
20
|
+
* Requires Nokogiri ~> 1.4.1.
|
21
|
+
* Added support for transformers, which allow you to filter and alter nodes
|
22
|
+
using your own custom logic, on top of (or instead of) Sanitize's core
|
23
|
+
filter. See the README for details and examples.
|
24
|
+
* Added Sanitize.clean_node!, which sanitizes a Nokogiri::XML::Node and all
|
25
|
+
its children.
|
26
|
+
* Added elements <h1> through <h6> to the Relaxed whitelist. [Suggested by
|
27
|
+
David Reese]
|
28
|
+
|
29
|
+
Version 1.1.0 (2009-10-11)
|
30
|
+
* Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
|
31
|
+
* Added an :output config setting to allow the output format to be specified.
|
32
|
+
Supported formats are :xhtml (the default) and :html (which outputs HTML4).
|
33
|
+
* Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
|
34
|
+
path segments. [Peter Cooper]
|
35
|
+
|
36
|
+
Version 1.0.8 (2009-04-23)
|
37
|
+
* Added a workaround for an Hpricot bug that prevents attribute names from
|
38
|
+
being downcased in recent versions of Hpricot. This was exploitable to
|
39
|
+
prevent non-whitelisted protocols from being cleaned. [Reported by Ben
|
40
|
+
Wanicur]
|
41
|
+
|
42
|
+
Version 1.0.7 (2009-04-11)
|
43
|
+
* Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
|
44
|
+
* Fixed a bug that caused named character entities containing digits (like
|
45
|
+
&sup2;) to be escaped when they shouldn't have been. [Reported by Sebastian
|
46
|
+
Steinmetz]
|
47
|
+
|
48
|
+
Version 1.0.6 (2009-02-23)
|
49
|
+
* Removed htmlentities gem dependency.
|
50
|
+
* Existing well-formed character entity references in the input string are now
|
51
|
+
preserved rather than being decoded and re-encoded.
|
52
|
+
* The ' character is now encoded as &#39; instead of &apos; to prevent
|
53
|
+
problems in IE6.
|
54
|
+
* You can now specify the symbol :all in place of an element name in the
|
55
|
+
attributes config hash to allow certain attributes on all elements. [Thanks
|
56
|
+
to Mutwin Kraus]
|
57
|
+
|
58
|
+
Version 1.0.5 (2009-02-05)
|
59
|
+
* Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
|
60
|
+
protocols from being cleaned when relative URLs were allowed. [Reported by
|
61
|
+
Dev Purkayastha]
|
62
|
+
* Fixed "undefined method `parent='" exceptions caused by parser changes in
|
63
|
+
edge Hpricot.
|
64
|
+
|
65
|
+
Version 1.0.4 (2009-01-16)
|
66
|
+
* Fixed a bug that made it possible to sneak a non-whitelisted element through
|
67
|
+
by repeating it several times in a row. All versions of Sanitize prior to
|
68
|
+
1.0.4 are vulnerable. [Reported by Cristobal]
|
69
|
+
|
70
|
+
Version 1.0.3 (2009-01-15)
|
71
|
+
* Fixed a bug whereby incomplete Unicode or hex entities could be used to
|
72
|
+
prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
|
73
|
+
still decode the incomplete entities, users of those browsers may be
|
74
|
+
vulnerable to malicious script injection on websites using versions of
|
75
|
+
Sanitize prior to 1.0.3.
|
76
|
+
|
77
|
+
Version 1.0.2 (2009-01-04)
|
78
|
+
* Fixed a bug that caused an exception to be thrown when parsing a valueless
|
79
|
+
attribute that's expected to contain a URL.
|
80
|
+
|
81
|
+
Version 1.0.1 (2009-01-01)
|
82
|
+
* You can now specify :relative in a protocol config array to allow attributes
|
83
|
+
containing relative URLs with no protocol. The Basic and Relaxed configs
|
84
|
+
have been updated to allow relative URLs.
|
85
|
+
* Added a workaround for an Hpricot bug that causes HTML entities for
|
86
|
+
non-ASCII characters to be replaced by question marks, and all other
|
87
|
+
entities to be destructively decoded.
|
88
|
+
|
89
|
+
Version 1.0.0 (2008-12-25)
|
90
|
+
* First release.
|
data/LICENSE
ADDED
@@ -0,0 +1,18 @@
|
|
1
|
+
Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
4
|
+
this software and associated documentation files (the 'Software'), to deal in
|
5
|
+
the Software without restriction, including without limitation the rights to
|
6
|
+
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
|
7
|
+
the Software, and to permit persons to whom the Software is furnished to do so,
|
8
|
+
subject to the following conditions:
|
9
|
+
|
10
|
+
The above copyright notice and this permission notice shall be included in all
|
11
|
+
copies or substantial portions of the Software.
|
12
|
+
|
13
|
+
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
14
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
15
|
+
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
16
|
+
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
17
|
+
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
18
|
+
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.rdoc
ADDED
@@ -0,0 +1,334 @@
|
|
1
|
+
= Sanitize
|
2
|
+
|
3
|
+
Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
|
4
|
+
elements and attributes, Sanitize will remove all unacceptable HTML from a
|
5
|
+
string.
|
6
|
+
|
7
|
+
Using a simple configuration syntax, you can tell Sanitize to allow certain
|
8
|
+
elements, certain attributes within those elements, and even certain URL
|
9
|
+
protocols within attributes that contain URLs. Any HTML elements or attributes
|
10
|
+
that you don't explicitly allow will be removed.
|
11
|
+
|
12
|
+
Because it's based on Nokogiri, a full-fledged HTML parser, rather than a bunch
|
13
|
+
of fragile regular expressions, Sanitize has no trouble dealing with malformed
|
14
|
+
or maliciously-formed HTML, and will always output valid HTML or XHTML.
|
15
|
+
|
16
|
+
*Author*:: Ryan Grove (mailto:ryan@wonko.com)
|
17
|
+
*Version*:: 1.2.1 (2010-04-20)
|
18
|
+
*Copyright*:: Copyright (c) 2010 Ryan Grove. All rights reserved.
|
19
|
+
*License*:: MIT License (http://opensource.org/licenses/mit-license.php)
|
20
|
+
*Website*:: http://github.com/rgrove/sanitize
|
21
|
+
|
22
|
+
== Requires
|
23
|
+
|
24
|
+
* Nokogiri ~> 1.4.1
|
25
|
+
* libxml2 >= 2.7.2
|
26
|
+
|
27
|
+
== Installation
|
28
|
+
|
29
|
+
Latest stable release:
|
30
|
+
|
31
|
+
gem install sanitize
|
32
|
+
|
33
|
+
Latest development version:
|
34
|
+
|
35
|
+
gem install sanitize --pre
|
36
|
+
|
37
|
+
== Usage
|
38
|
+
|
39
|
+
If you don't specify any configuration options, Sanitize will use its strictest
|
40
|
+
settings by default, which means it will strip all HTML and leave only text
|
41
|
+
behind.
|
42
|
+
|
43
|
+
require 'rubygems'
|
44
|
+
require 'sanitize'
|
45
|
+
|
46
|
+
html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
|
47
|
+
|
48
|
+
Sanitize.clean(html) # => 'foo'
|
49
|
+
|
50
|
+
== Configuration
|
51
|
+
|
52
|
+
In addition to the ultra-safe default settings, Sanitize comes with three other
|
53
|
+
built-in modes.
|
54
|
+
|
55
|
+
=== Sanitize::Config::RESTRICTED
|
56
|
+
|
57
|
+
Allows only very simple inline formatting markup. No links, images, or block
|
58
|
+
elements.
|
59
|
+
|
60
|
+
Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
|
61
|
+
|
62
|
+
=== Sanitize::Config::BASIC
|
63
|
+
|
64
|
+
Allows a variety of markup including formatting tags, links, and lists. Images
|
65
|
+
and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
|
66
|
+
protocols, and a <code>rel="nofollow"</code> attribute is added to all links to
|
67
|
+
mitigate SEO spam.
|
68
|
+
|
69
|
+
Sanitize.clean(html, Sanitize::Config::BASIC)
|
70
|
+
# => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
|
71
|
+
|
72
|
+
=== Sanitize::Config::RELAXED
|
73
|
+
|
74
|
+
Allows an even wider variety of markup than BASIC, including images and tables.
|
75
|
+
Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
|
76
|
+
are limited to HTTP and HTTPS. In this mode, <code>rel="nofollow"</code> is not
|
77
|
+
added to links.
|
78
|
+
|
79
|
+
Sanitize.clean(html, Sanitize::Config::RELAXED)
|
80
|
+
# => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
|
81
|
+
|
82
|
+
=== Custom Configuration
|
83
|
+
|
84
|
+
If the built-in modes don't meet your needs, you can easily specify a custom
|
85
|
+
configuration:
|
86
|
+
|
87
|
+
Sanitize.clean(html, :elements => ['a', 'span'],
|
88
|
+
:attributes => {'a' => ['href', 'title'], 'span' => ['class']},
|
89
|
+
:protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
|
90
|
+
|
91
|
+
==== :add_attributes (Hash)
|
92
|
+
|
93
|
+
Attributes to add to specific elements. If the attribute already exists, it will
|
94
|
+
be replaced with the value specified here. Specify all element names and
|
95
|
+
attributes in lowercase.
|
96
|
+
|
97
|
+
:add_attributes => {
|
98
|
+
'a' => {'rel' => 'nofollow'}
|
99
|
+
}
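As a rough illustration of the "replace if it already exists" rule, here is a minimal sketch that works on a plain Hash of attributes rather than a real Nokogiri node. The helper name +apply_add_attributes+ is hypothetical and is not part of Sanitize.

```ruby
# Hypothetical helper (not part of Sanitize): applies an :add_attributes
# config to a plain Hash standing in for an element's attributes.
def apply_add_attributes(node_attrs, add_attributes, name)
  # Elements that are not listed in the config are left untouched.
  return node_attrs unless add_attributes.key?(name)

  # Values from the config win over attributes that already exist.
  node_attrs.merge(add_attributes[name])
end

config = { 'a' => { 'rel' => 'nofollow' } }
link   = { 'href' => 'http://foo.com/', 'rel' => 'friend' }

apply_add_attributes(link, config, 'a')
# => {"href"=>"http://foo.com/", "rel"=>"nofollow"}
```

Sanitize itself assigns each configured attribute directly on the node, so the effect on a real element should be the same overwrite.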
|
100
|
+
|
101
|
+
==== :attributes (Hash)
|
102
|
+
|
103
|
+
Attributes to allow for specific elements. Specify all element names and
|
104
|
+
attributes in lowercase.
|
105
|
+
|
106
|
+
:attributes => {
|
107
|
+
'a' => ['href', 'title'],
|
108
|
+
'blockquote' => ['cite'],
|
109
|
+
'img' => ['alt', 'src', 'title']
|
110
|
+
}
|
111
|
+
|
112
|
+
If you'd like to allow certain attributes on all elements, use the symbol
|
113
|
+
<code>:all</code> instead of an element name.
|
114
|
+
|
115
|
+
:attributes => {
|
116
|
+
:all => ['class'],
|
117
|
+
'a' => ['href', 'title']
|
118
|
+
}
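The way per-element and <code>:all</code> entries combine can be sketched in plain Ruby. The helper name +attr_whitelist_for+ is an assumption made for this illustration; it follows the expression used in lib/sanitize.rb, where attributes added by transformers come first.

```ruby
# Hypothetical helper (not part of Sanitize): computes the effective attribute
# whitelist for one element from an :attributes config Hash.
def attr_whitelist_for(attributes, name, extra = [])
  # extra stands in for attribute names contributed by transformers.
  (extra + (attributes[name] || []) + (attributes[:all] || [])).uniq
end

attributes = { :all => ['class'], 'a' => ['href', 'title'] }

attr_whitelist_for(attributes, 'a')    # => ["href", "title", "class"]
attr_whitelist_for(attributes, 'span') # => ["class"]
```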
|
119
|
+
|
120
|
+
==== :allow_comments (boolean)
|
121
|
+
|
122
|
+
Whether or not to allow HTML comments. Allowing comments is strongly
|
123
|
+
discouraged, since IE allows script execution within conditional comments. The
|
124
|
+
default value is <code>false</code>.
|
125
|
+
|
126
|
+
==== :elements (Array)
|
127
|
+
|
128
|
+
Array of element names to allow. Specify all names in lowercase.
|
129
|
+
|
130
|
+
:elements => [
|
131
|
+
'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
|
132
|
+
'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
|
133
|
+
'sup', 'u', 'ul'
|
134
|
+
]
|
135
|
+
|
136
|
+
==== :output (Symbol)
|
137
|
+
|
138
|
+
Output format. Supported formats are <code>:html</code> and <code>:xhtml</code>,
|
139
|
+
defaulting to <code>:xhtml</code>.
|
140
|
+
|
141
|
+
==== :output_encoding (String)
|
142
|
+
|
143
|
+
Character encoding to use for HTML output. Default is <code>'utf-8'</code>.
|
144
|
+
|
145
|
+
==== :protocols (Hash)
|
146
|
+
|
147
|
+
URL protocols to allow in specific attributes. If an attribute is listed here
|
148
|
+
and contains a protocol other than those specified (or if it contains no
|
149
|
+
protocol at all), it will be removed.
|
150
|
+
|
151
|
+
:protocols => {
|
152
|
+
'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
|
153
|
+
'img' => {'src' => ['http', 'https']}
|
154
|
+
}
|
155
|
+
|
156
|
+
If you'd like to allow the use of relative URLs which don't have a protocol,
|
157
|
+
include the symbol <code>:relative</code> in the protocol array:
|
158
|
+
|
159
|
+
:protocols => {
|
160
|
+
'a' => {'href' => ['http', 'https', :relative]}
|
161
|
+
}
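The protocol decision can be sketched without Nokogiri. The regular expression below is the REGEX_PROTOCOL constant from lib/sanitize.rb; the helper name +protocol_allowed?+ is hypothetical and only mirrors the branch Sanitize takes for each attribute value.

```ruby
# Copied from lib/sanitize.rb: matches a protocol prefix, including a colon
# written as a (possibly incomplete) decimal or hex entity.
REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper (not part of Sanitize): true if the attribute value may
# stay, given the list of allowed protocols for that attribute.
def protocol_allowed?(value, allowed)
  if value.to_s.downcase =~ REGEX_PROTOCOL
    # A protocol was found, so it must be listed explicitly.
    allowed.include?($1.downcase)
  else
    # No protocol at all: only acceptable when :relative is listed.
    allowed.include?(:relative)
  end
end

allowed = ['http', 'https', :relative]

protocol_allowed?('http://foo.com/', allowed)         # => true
protocol_allowed?('/foo/bar.html', allowed)           # => true
protocol_allowed?('javascript:alert(1)', allowed)     # => false
protocol_allowed?('javascript&#58;alert(1)', allowed) # => false
```

Without <code>:relative</code> in the list, the second call would return +false+, which matches the rule that attributes with no protocol are removed.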
|
162
|
+
|
163
|
+
==== :remove_contents (boolean or Array)
|
164
|
+
|
165
|
+
If set to +true+, Sanitize will remove the contents of any non-whitelisted
|
166
|
+
elements in addition to the elements themselves. By default, Sanitize leaves the
|
167
|
+
safe parts of an element's contents behind when the element is removed.
|
168
|
+
|
169
|
+
If set to an Array of element names, then only the contents of the specified
|
170
|
+
elements (when filtered) will be removed, and the contents of all other filtered
|
171
|
+
elements will be left behind.
|
172
|
+
|
173
|
+
The default value is <code>false</code>.
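The three accepted forms of <code>:remove_contents</code> can be sketched as follows. Both helper names are hypothetical; the logic is modeled on the normalization done in Sanitize#initialize, using plain values instead of a Sanitize instance.

```ruby
# Hypothetical helper (not part of Sanitize): normalizes a :remove_contents
# setting into [remove_all, per_element_lookup].
def remove_contents_lookup(setting)
  if setting.is_a?(Array)
    per_element = {}
    setting.each { |el| per_element[el] = true }
    [false, per_element]
  else
    # Any truthy non-Array value means "remove the contents of everything".
    [!!setting, {}]
  end
end

# Hypothetical helper: should the contents of a filtered element be dropped?
def remove_contents?(setting, name)
  remove_all, per_element = remove_contents_lookup(setting)
  remove_all || per_element.key?(name)
end

remove_contents?(false, 'script')               # => false
remove_contents?(true, 'script')                # => true
remove_contents?(['script', 'style'], 'script') # => true
remove_contents?(['script', 'style'], 'div')    # => false
```

Note that this only decides what happens to the contents of an element that is already being filtered; whitelisted elements are not affected.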
|
174
|
+
|
175
|
+
==== :transformers
|
176
|
+
|
177
|
+
See below.
|
178
|
+
|
179
|
+
=== Transformers
|
180
|
+
|
181
|
+
Transformers allow you to filter and alter nodes using your own custom logic, on
|
182
|
+
top of (or instead of) Sanitize's core filter. A transformer is any object that
|
183
|
+
responds to <code>call()</code> (such as a lambda or proc) and returns either
|
184
|
+
<code>nil</code> or a Hash containing certain optional response values.
|
185
|
+
|
186
|
+
To use one or more transformers, pass them to the <code>:transformers</code>
|
187
|
+
config setting:
|
188
|
+
|
189
|
+
Sanitize.clean(html, :transformers => [transformer_one, transformer_two])
|
190
|
+
|
191
|
+
==== Input
|
192
|
+
|
193
|
+
Each registered transformer's <code>call()</code> method will be called once for
|
194
|
+
each element node in the HTML, and will receive as an argument an environment
|
195
|
+
Hash that contains the following items:
|
196
|
+
|
197
|
+
[<code>:config</code>]
|
198
|
+
The current Sanitize configuration Hash.
|
199
|
+
|
200
|
+
[<code>:node</code>]
|
201
|
+
A Nokogiri::XML::Node object representing an HTML element.
|
202
|
+
|
203
|
+
[<code>:node_name</code>]
|
204
|
+
The name of the current HTML node, always lowercase (e.g. "div" or "span").
|
205
|
+
|
206
|
+
==== Processing
|
207
|
+
|
208
|
+
Each transformer has full access to the Nokogiri::XML::Node that's passed into
|
209
|
+
it and to the rest of the document via the node's <code>document()</code>
|
210
|
+
method. Any changes will be reflected instantly in the document and passed on to
|
211
|
+
subsequently-called transformers and to Sanitize itself. A transformer may even
|
212
|
+
call Sanitize internally to perform custom sanitization if needed.
|
213
|
+
|
214
|
+
Nodes are passed into transformers in the order in which they're traversed. It's
|
215
|
+
important to note that Nokogiri traverses markup from the deepest node upward,
|
216
|
+
not from the first node to the last node:
|
217
|
+
|
218
|
+
html = '<div><span>foo</span></div>'
|
219
|
+
transformer = lambda{|env| puts env[:node].name }
|
220
|
+
|
221
|
+
# Prints "span", then "div".
|
222
|
+
Sanitize.clean(html, :transformers => transformer)
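To make the ordering concrete without Nokogiri, here is a small sketch of a children-first traversal over a stand-in tree. +SketchNode+ and +traverse+ are names invented for this illustration, on the assumption that Nokogiri visits an element's children before the element itself, as described above.

```ruby
# Stand-in for an element node; not a Nokogiri class.
SketchNode = Struct.new(:name, :children)

# Visits every child (recursively) before yielding the node itself.
def traverse(node, &block)
  node.children.each { |child| traverse(child, &block) }
  block.call(node)
end

# Equivalent of '<div><span>foo</span></div>'.
tree = SketchNode.new('div', [SketchNode.new('span', [])])

names = []
traverse(tree) { |node| names << node.name }
names # => ["span", "div"]
```

A practical consequence is that a transformer sees the inner elements before it sees their container, so it cannot rely on the parent having been cleaned already.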
|
223
|
+
|
224
|
+
Transformers have a tremendous amount of power, including the power to
|
225
|
+
completely bypass Sanitize's built-in filtering. Be careful!
|
226
|
+
|
227
|
+
==== Output
|
228
|
+
|
229
|
+
A transformer may return either +nil+ or a Hash. A return value of +nil+
|
230
|
+
indicates that the transformer does not wish to act on the current node in any
|
231
|
+
way. A returned Hash may contain the following items, all of which are optional:
|
232
|
+
|
233
|
+
[<code>:attr_whitelist</code>]
|
234
|
+
Array of attribute names to add to the whitelist for the current node, in
|
235
|
+
addition to any whitelisted attributes already defined in the current config.
|
236
|
+
|
237
|
+
[<code>:node</code>]
|
238
|
+
A Nokogiri::XML::Node object that should replace the current node. All
|
239
|
+
subsequent transformers and Sanitize itself will receive this new node.
|
240
|
+
|
241
|
+
[<code>:whitelist</code>]
|
242
|
+
If _true_, the current node (and only the current node) will be whitelisted,
|
243
|
+
regardless of the current Sanitize config.
|
244
|
+
|
245
|
+
[<code>:whitelist_nodes</code>]
|
246
|
+
Array of specific Nokogiri::XML::Node objects to whitelist, anywhere in the
|
247
|
+
document, regardless of the current Sanitize config.
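The way several transformer return values are combined can be sketched in plain Ruby. This is a simplified model, not the library's code: it ignores <code>:node</code> replacement, uses symbols in place of Nokogiri nodes, and +TransformerError+ and +merge_transformer_results+ are hypothetical names.

```ruby
# Stand-in for Sanitize::Error in this sketch.
class TransformerError < StandardError; end

# Hypothetical helper: folds the return values of several transformers into
# one result. nil means "no opinion"; anything but nil or a Hash is an error.
def merge_transformer_results(results)
  output = { :attr_whitelist => [], :whitelist => false, :whitelist_nodes => [] }

  results.each do |result|
    next if result.nil?

    unless result.is_a?(Hash)
      raise TransformerError, 'transformer output must be a Hash or nil'
    end

    output[:attr_whitelist] += result[:attr_whitelist] if result[:attr_whitelist].is_a?(Array)
    # Array union keeps each whitelisted node only once.
    output[:whitelist_nodes] |= result[:whitelist_nodes] if result[:whitelist_nodes].is_a?(Array)
    output[:whitelist] = true if result[:whitelist]
  end

  output
end

merge_transformer_results([
  nil,
  { :attr_whitelist => ['class'] },
  { :whitelist => true, :whitelist_nodes => [:object_node, :object_node] }
])
# => {:attr_whitelist=>["class"], :whitelist=>true, :whitelist_nodes=>[:object_node]}
```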
|
248
|
+
|
249
|
+
==== Example: Transformer to whitelist YouTube video embeds
|
250
|
+
|
251
|
+
The following example demonstrates how to create a Sanitize transformer that
|
252
|
+
will safely whitelist valid YouTube video embeds without having to blindly allow
|
253
|
+
other kinds of embedded content, which would be the case if you tried to do this
|
254
|
+
by just whitelisting all <code><object></code>, <code><embed></code>, and
|
255
|
+
<code><param></code> elements:
|
256
|
+
|
257
|
+
lambda do |env|
|
258
|
+
node = env[:node]
|
259
|
+
node_name = env[:node_name]
|
260
|
+
parent = node.parent
|
261
|
+
|
262
|
+
# Since the transformer receives the deepest nodes first, we look for a
|
263
|
+
# <param> element or an <embed> element whose parent is an <object>.
|
264
|
+
return nil unless (node_name == 'param' || node_name == 'embed') &&
|
265
|
+
parent.name.to_s.downcase == 'object'
|
266
|
+
|
267
|
+
if node_name == 'param'
|
268
|
+
# Quick XPath search to find the <param> node that contains the video URL.
|
269
|
+
return nil unless movie_node = parent.search('param[@name="movie"]')[0]
|
270
|
+
url = movie_node['value']
|
271
|
+
else
|
272
|
+
# Since this is an <embed>, the video URL is in the "src" attribute. No
|
273
|
+
# extra work needed.
|
274
|
+
url = node['src']
|
275
|
+
end
|
276
|
+
|
277
|
+
# Verify that the video URL is actually a valid YouTube video URL.
|
278
|
+
return nil unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//
|
279
|
+
|
280
|
+
# We're now certain that this is a YouTube embed, but we still need to run
|
281
|
+
# it through a special Sanitize step to ensure that no unwanted elements or
|
282
|
+
# attributes that don't belong in a YouTube embed can sneak in.
|
283
|
+
Sanitize.clean_node!(parent, {
|
284
|
+
:elements => ['embed', 'object', 'param'],
|
285
|
+
:attributes => {
|
286
|
+
'embed' => ['allowfullscreen', 'allowscriptaccess', 'height', 'src', 'type', 'width'],
|
287
|
+
'object' => ['height', 'width'],
|
288
|
+
'param' => ['name', 'value']
|
289
|
+
}
|
290
|
+
})
|
291
|
+
|
292
|
+
# Now that we're sure that this is a valid YouTube embed and that there are
|
293
|
+
# no unwanted elements or attributes hidden inside it, we can tell Sanitize
|
294
|
+
# to whitelist the current node (<param> or <embed>) and its parent
|
295
|
+
# (<object>).
|
296
|
+
{:whitelist_nodes => [node, parent]}
|
297
|
+
end
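One detail worth checking when adapting the URL test above: in Ruby, <code>^</code> matches at the start of any line, not only at the start of the string. The sketch below compares the pattern from the example with a variant anchored with <code>\A</code>. The constant and helper names are hypothetical, and whether a multi-line attribute value can reach the transformer in practice is an assumption that depends on the parser.

```ruby
# Pattern as written in the example above (line-anchored).
LINE_ANCHORED = /^http:\/\/(?:www\.)?youtube\.com\/v\//

# Stricter variant for illustration: \A anchors to the start of the string.
STRING_ANCHORED = /\Ahttp:\/\/(?:www\.)?youtube\.com\/v\//

# Hypothetical helper: true if url matches the given pattern.
def youtube_embed_url?(url, pattern = STRING_ANCHORED)
  !!(url.to_s =~ pattern)
end

youtube_embed_url?('http://www.youtube.com/v/abc123')         # => true
youtube_embed_url?('http://evil.example/?http://youtube.com') # => false

# A value with an embedded newline passes the line-anchored pattern only.
tricky = "javascript:alert(1)\nhttp://youtube.com/v/x"
youtube_embed_url?(tricky, LINE_ANCHORED)   # => true
youtube_embed_url?(tricky, STRING_ANCHORED) # => false
```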
|
298
|
+
|
299
|
+
== Contributors
|
300
|
+
|
301
|
+
The following lovely people have contributed to Sanitize in the form of patches
|
302
|
+
or ideas that later became code:
|
303
|
+
|
304
|
+
* Wilson Bilkovich <wilson@supremetyrant.com>
|
305
|
+
* Peter Cooper <git@peterc.org>
|
306
|
+
* Gabe da Silveira <gabe@websaviour.com>
|
307
|
+
* Ryan Grove <ryan@wonko.com>
|
308
|
+
* Adam Hooper <adam@adamhooper.com>
|
309
|
+
* Mutwin Kraus <mutle@blogage.de>
|
310
|
+
* Dev Purkayastha <dev.purkayastha@gmail.com>
|
311
|
+
* David Reese <work@whatcould.com>
|
312
|
+
* Rafael Souza <me@rafaelss.com>
|
313
|
+
* Ben Wanicur <bwanicur@verticalresponse.com>
|
314
|
+
|
315
|
+
== License
|
316
|
+
|
317
|
+
Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
|
318
|
+
|
319
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
320
|
+
this software and associated documentation files (the 'Software'), to deal in
|
321
|
+
the Software without restriction, including without limitation the rights to
|
322
|
+
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
|
323
|
+
the Software, and to permit persons to whom the Software is furnished to do so,
|
324
|
+
subject to the following conditions:
|
325
|
+
|
326
|
+
The above copyright notice and this permission notice shall be included in all
|
327
|
+
copies or substantial portions of the Software.
|
328
|
+
|
329
|
+
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
330
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
331
|
+
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
332
|
+
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
333
|
+
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
334
|
+
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/lib/sanitize.rb
ADDED
@@ -0,0 +1,247 @@
|
|
1
|
+
# encoding: utf-8
|
2
|
+
#--
|
3
|
+
# Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
|
4
|
+
#
|
5
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
# of this software and associated documentation files (the 'Software'), to deal
|
7
|
+
# in the Software without restriction, including without limitation the rights
|
8
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
# copies of the Software, and to permit persons to whom the Software is
|
10
|
+
# furnished to do so, subject to the following conditions:
|
11
|
+
#
|
12
|
+
# The above copyright notice and this permission notice shall be included in all
|
13
|
+
# copies or substantial portions of the Software.
|
14
|
+
#
|
15
|
+
# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21
|
+
# SOFTWARE.
|
22
|
+
#++
|
23
|
+
|
24
|
+
require 'nokogiri'
|
25
|
+
require 'sanitize/version'
|
26
|
+
require 'sanitize/config'
|
27
|
+
require 'sanitize/config/restricted'
|
28
|
+
require 'sanitize/config/basic'
|
29
|
+
require 'sanitize/config/relaxed'
|
30
|
+
|
31
|
+
class Sanitize
|
32
|
+
attr_reader :config
|
33
|
+
|
34
|
+
# Matches an attribute value that could be treated by a browser as a URL
|
35
|
+
# with a protocol prefix, such as "http:" or "javascript:". Any string of zero
|
36
|
+
# or more characters followed by a colon is considered a match, even if the
|
37
|
+
# colon is encoded as an entity and even if it's an incomplete entity (which
|
38
|
+
# IE6 and Opera will still parse).
|
39
|
+
REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i
|
40
|
+
|
41
|
+
#--
|
42
|
+
# Class Methods
|
43
|
+
#++
|
44
|
+
|
45
|
+
# Returns a sanitized copy of _html_, using the settings in _config_ if
|
46
|
+
# specified.
|
47
|
+
def self.clean(html, config = {})
|
48
|
+
sanitize = Sanitize.new(config)
|
49
|
+
sanitize.clean(html)
|
50
|
+
end
|
51
|
+
|
52
|
+
# Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
|
53
|
+
# were made.
|
54
|
+
def self.clean!(html, config = {})
|
55
|
+
sanitize = Sanitize.new(config)
|
56
|
+
sanitize.clean!(html)
|
57
|
+
end
|
58
|
+
|
59
|
+
# Sanitizes the specified Nokogiri::XML::Node and all its children.
|
60
|
+
def self.clean_node!(node, config = {})
|
61
|
+
sanitize = Sanitize.new(config)
|
62
|
+
sanitize.clean_node!(node)
|
63
|
+
end
|
64
|
+
|
65
|
+
#--
|
66
|
+
# Instance Methods
|
67
|
+
#++
|
68
|
+
|
69
|
+
# Returns a new Sanitize object initialized with the settings in _config_.
|
70
|
+
def initialize(config = {})
|
71
|
+
# Sanitize configuration.
|
72
|
+
@config = Config::DEFAULT.merge(config)
|
73
|
+
@config[:transformers] = Array(@config[:transformers].dup)
|
74
|
+
|
75
|
+
# Convert the list of allowed elements to a Hash for faster lookup.
|
76
|
+
@allowed_elements = {}
|
77
|
+
@config[:elements].each {|el| @allowed_elements[el] = true }
|
78
|
+
|
79
|
+
# Convert the list of :remove_contents elements to a Hash for faster lookup.
|
80
|
+
@remove_all_contents = false
|
81
|
+
@remove_element_contents = {}
|
82
|
+
|
83
|
+
if @config[:remove_contents].is_a?(Array)
|
84
|
+
@config[:remove_contents].each {|el| @remove_element_contents[el] = true }
|
85
|
+
else
|
86
|
+
@remove_all_contents = !!@config[:remove_contents]
|
87
|
+
end
|
88
|
+
|
89
|
+
# Specific nodes to whitelist (along with all their attributes). This array
|
90
|
+
# is generated at runtime by transformers, and is cleared before and after
|
91
|
+
# a fragment is cleaned (so it applies only to a specific fragment).
|
92
|
+
@whitelist_nodes = []
|
93
|
+
end
|
94
|
+
|
95
|
+
# Returns a sanitized copy of _html_.
|
96
|
+
def clean(html)
|
97
|
+
if html
|
98
|
+
dupe = html.dup
|
99
|
+
clean!(dupe) || dupe
|
100
|
+
end
|
101
|
+
end
|
102
|
+
|
103
|
+
# Performs clean in place, returning _html_, or +nil+ if no changes were
|
104
|
+
# made.
|
105
|
+
def clean!(html)
|
106
|
+
fragment = Nokogiri::HTML::DocumentFragment.parse(html)
|
107
|
+
clean_node!(fragment)
|
108
|
+
|
109
|
+
output_method_params = {:encoding => @config[:output_encoding], :indent => 0}
|
110
|
+
|
111
|
+
if @config[:output] == :xhtml
|
112
|
+
output_method = fragment.method(:to_xhtml)
|
113
|
+
output_method_params[:save_with] = Nokogiri::XML::Node::SaveOptions::AS_XHTML
|
114
|
+
elsif @config[:output] == :html
|
115
|
+
output_method = fragment.method(:to_html)
|
116
|
+
else
|
117
|
+
raise Error, "unsupported output format: #{@config[:output]}"
|
118
|
+
end
|
119
|
+
|
120
|
+
result = output_method.call(output_method_params)
|
121
|
+
|
122
|
+
return result == html ? nil : html[0, html.length] = result
|
123
|
+
end
|
124
|
+
|
125
|
+
# Sanitizes the specified Nokogiri::XML::Node and all its children.
|
126
|
+
def clean_node!(node)
|
127
|
+
raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
|
128
|
+
|
129
|
+
@whitelist_nodes = []
|
130
|
+
|
131
|
+
node.traverse do |child|
|
132
|
+
if child.element?
|
133
|
+
clean_element!(child)
|
134
|
+
elsif child.comment?
|
135
|
+
child.unlink unless @config[:allow_comments]
|
136
|
+
elsif child.cdata?
|
137
|
+
child.replace(Nokogiri::XML::Text.new(child.text, child.document))
|
138
|
+
end
|
139
|
+
end
|
140
|
+
|
141
|
+
@whitelist_nodes = []
|
142
|
+
|
143
|
+
node
|
144
|
+
end
|
145
|
+
|
146
|
+
private
|
147
|
+
|
148
|
+
def clean_element!(node)
|
149
|
+
# Run this node through all configured transformers.
|
150
|
+
transform = transform_element!(node)
|
151
|
+
|
152
|
+
# If this node is in the dynamic whitelist array (built at runtime by
|
153
|
+
# transformers), let it live with all of its attributes intact.
|
154
|
+
return if @whitelist_nodes.include?(node)
|
155
|
+
|
156
|
+
name = node.name.to_s.downcase
|
157
|
+
|
158
|
+
# Delete any element that isn't in the whitelist.
|
159
|
+
unless transform[:whitelist] || @allowed_elements[name]
|
160
|
+
unless @remove_all_contents || @remove_element_contents[name]
|
161
|
+
node.children.each { |n| node.add_previous_sibling(n) }
|
162
|
+
end
|
163
|
+
|
164
|
+
node.unlink
|
165
|
+
|
166
|
+
return
|
167
|
+
end
|
168
|
+
|
169
|
+
attr_whitelist = (transform[:attr_whitelist] +
|
170
|
+
(@config[:attributes][name] || []) +
|
171
|
+
(@config[:attributes][:all] || [])).uniq
|
172
|
+
|
173
|
+
if attr_whitelist.empty?
|
174
|
+
# Delete all attributes from elements with no whitelisted attributes.
|
175
|
+
node.attribute_nodes.each {|attr| attr.remove }
|
176
|
+
else
|
177
|
+
# Delete any attribute that isn't in the whitelist for this element.
|
178
|
+
node.attribute_nodes.each do |attr|
|
179
|
+
attr.unlink unless attr_whitelist.include?(attr.name.downcase)
|
180
|
+
end
|
181
|
+
|
182
|
+
# Delete remaining attributes that use unacceptable protocols.
|
183
|
+
if @config[:protocols].has_key?(name)
|
184
|
+
protocol = @config[:protocols][name]
|
185
|
+
|
186
|
+
node.attribute_nodes.each do |attr|
|
187
|
+
attr_name = attr.name.downcase
|
188
|
+
next false unless protocol.has_key?(attr_name)
|
189
|
+
|
190
|
+
del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
|
191
|
+
!protocol[attr_name].include?($1.downcase)
|
192
|
+
else
|
193
|
+
!protocol[attr_name].include?(:relative)
|
194
|
+
end
|
195
|
+
|
196
|
+
attr.unlink if del
|
197
|
+
end
|
198
|
+
end
|
199
|
+
end
|
200
|
+
|
201
|
+
# Add required attributes.
|
202
|
+
if @config[:add_attributes].has_key?(name)
|
203
|
+
@config[:add_attributes][name].each do |key, val|
|
204
|
+
node[key] = val
|
205
|
+
end
|
206
|
+
end
|
207
|
+
|
208
|
+
transform
|
209
|
+
end
|
210
|
+
|
211
|
+
def transform_element!(node)
|
212
|
+
output = {
|
213
|
+
:attr_whitelist => [],
|
214
|
+
:node => node,
|
215
|
+
:whitelist => false
|
216
|
+
}
|
217
|
+
|
218
|
+
@config[:transformers].inject(node) do |transformer_node, transformer|
|
219
|
+
transform = transformer.call({
|
220
|
+
:config => @config,
|
221
|
+
:node => transformer_node,
|
222
|
+
:node_name => transformer_node.name.downcase
|
223
|
+
})
|
224
|
+
|
225
|
+
if transform.nil?
|
226
|
+
transformer_node
|
227
|
+
elsif transform.is_a?(Hash)
|
228
|
+
if transform[:whitelist_nodes].is_a?(Array)
|
229
|
+
@whitelist_nodes += transform[:whitelist_nodes]
|
230
|
+
@whitelist_nodes.uniq!
|
231
|
+
end
|
232
|
+
|
233
|
+
output[:attr_whitelist] += transform[:attr_whitelist] if transform[:attr_whitelist].is_a?(Array)
|
234
|
+
output[:whitelist] ||= true if transform[:whitelist]
|
235
|
+
output[:node] = transform[:node].is_a?(Nokogiri::XML::Node) ? transform[:node] : output[:node]
|
236
|
+
else
|
237
|
+
raise Error, "transformer output must be a Hash or nil"
|
238
|
+
end
|
239
|
+
end
|
240
|
+
|
241
|
+
node.replace(output[:node]) if node != output[:node]
|
242
|
+
|
243
|
+
return output
|
244
|
+
end
|
245
|
+
|
246
|
+
class Error < StandardError; end
|
247
|
+
end
|
data/lib/sanitize/config.rb
ADDED
@@ -0,0 +1,70 @@
+#--
+# Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    DEFAULT = {
+      # Whether or not to allow HTML comments. Allowing comments is strongly
+      # discouraged, since IE allows script execution within conditional
+      # comments.
+      :allow_comments => false,
+
+      # HTML attributes to add to specific elements. By default, no attributes
+      # are added.
+      :add_attributes => {},
+
+      # HTML attributes to allow in specific elements. By default, no attributes
+      # are allowed.
+      :attributes => {},
+
+      # HTML elements to allow. By default, no elements are allowed (which means
+      # that all HTML will be stripped).
+      :elements => [],
+
+      # Output format. Supported formats are :html and :xhtml (which is the
+      # default).
+      :output => :xhtml,
+
+      # Character encoding to use for HTML output. Default is 'utf-8'.
+      :output_encoding => 'utf-8',
+
+      # URL handling protocols to allow in specific attributes. By default, no
+      # protocols are allowed. Use :relative in place of a protocol if you want
+      # to allow relative URLs sans protocol.
+      :protocols => {},
+
+      # If this is true, Sanitize will remove the contents of any filtered
+      # elements in addition to the elements themselves. By default, Sanitize
+      # leaves the safe parts of an element's contents behind when the element
+      # is removed.
+      #
+      # If this is an Array of element names, then only the contents of the
+      # specified elements (when filtered) will be removed, and the contents of
+      # all other filtered elements will be left behind.
+      :remove_contents => false,
+
+      # Transformers allow you to filter or alter nodes using custom logic. See
+      # README.rdoc for details and examples.
+      :transformers => []
+    }
+  end
+end
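To illustrate how the defaults above might combine with a caller's settings, here is a minimal sketch. It assumes a shallow `Hash#merge` of the custom config over the defaults, which this part of the diff does not show, and it uses a trimmed, hypothetical copy of the hash (`SKETCH_DEFAULT`). The `remove_contents?` helper is also hypothetical and only mirrors the documented meaning of `:remove_contents`.

```ruby
# Trimmed, hypothetical copy of the defaults shown above.
SKETCH_DEFAULT = {
  :allow_comments  => false,
  :elements        => [],
  :output          => :xhtml,
  :remove_contents => false
}

# A caller-supplied config: allow two elements, and drop the contents of
# <script> and <style> when those elements are filtered out.
custom = { :elements => ['b', 'em'], :remove_contents => ['script', 'style'] }

# Shallow merge (assumption): keys in `custom` win, all others fall back
# to the defaults.
config = SKETCH_DEFAULT.merge(custom)

# Mirrors the documented semantics of :remove_contents: true applies to every
# filtered element, an Array only to the elements it names.
def remove_contents?(config, element_name)
  setting = config[:remove_contents]
  return setting.include?(element_name) if setting.is_a?(Array)
  setting == true
end
```

With this config, `remove_contents?(config, 'script')` is true while `remove_contents?(config, 'div')` is false, so a filtered `div` would keep its safe contents.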
@@ -0,0 +1,49 @@
+#--
+# Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    BASIC = {
+      :elements => [
+        'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+        'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+        'sup', 'u', 'ul'],
+
+      :attributes => {
+        'a' => ['href'],
+        'blockquote' => ['cite'],
+        'q' => ['cite']
+      },
+
+      :add_attributes => {
+        'a' => {'rel' => 'nofollow'}
+      },
+
+      :protocols => {
+        'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
+                           :relative]},
+        'blockquote' => {'cite' => ['http', 'https', :relative]},
+        'q' => {'cite' => ['http', 'https', :relative]}
+      }
+    }
+  end
+end
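The `:protocols` table above maps an element and attribute to the URL schemes it may use, with `:relative` standing for URLs that have no scheme. A rough sketch of how such a table could be consulted follows. The `protocol_allowed?` helper and its regex are assumptions made for illustration; the check is deliberately simplified (it ignores entity-encoded colons, for instance) and is not the gem's implementation or a security-grade filter.

```ruby
# Hypothetical excerpt of a :protocols table like the one in BASIC.
SKETCH_PROTOCOLS = {
  'a' => { 'href' => ['ftp', 'http', 'https', 'mailto', :relative] }
}

# Returns true if `value` may be used for `attribute` on `element`.
# Simplified: a leading "scheme:" is compared case-insensitively against the
# allowed list; a value without a scheme needs :relative in the list.
def protocol_allowed?(protocols, element, attribute, value)
  allowed = protocols.fetch(element, {})[attribute]
  return true if allowed.nil? # no protocol restriction for this attribute

  if value =~ /\A\s*([A-Za-z][A-Za-z0-9+.\-]*):/
    allowed.include?($1.downcase)
  else
    allowed.include?(:relative)
  end
end
```

Under these assumptions, `protocol_allowed?(SKETCH_PROTOCOLS, 'a', 'href', 'javascript:alert(1)')` is false, while the same call with `'/about'` or `'http://example.com'` is true.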
@@ -0,0 +1,57 @@
+#--
+# Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    RELAXED = {
+      :elements => [
+        'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
+        'colgroup', 'dd', 'dl', 'dt', 'em', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
+        'i', 'img', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong',
+        'sub', 'sup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr', 'u',
+        'ul'],
+
+      :attributes => {
+        'a' => ['href', 'title'],
+        'blockquote' => ['cite'],
+        'col' => ['span', 'width'],
+        'colgroup' => ['span', 'width'],
+        'img' => ['align', 'alt', 'height', 'src', 'title', 'width'],
+        'ol' => ['start', 'type'],
+        'q' => ['cite'],
+        'table' => ['summary', 'width'],
+        'td' => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
+        'th' => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
+                 'width'],
+        'ul' => ['type']
+      },
+
+      :protocols => {
+        'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
+                           :relative]},
+        'blockquote' => {'cite' => ['http', 'https', :relative]},
+        'img' => {'src' => ['http', 'https', :relative]},
+        'q' => {'cite' => ['http', 'https', :relative]}
+      }
+    }
+  end
+end
@@ -0,0 +1,29 @@
+#--
+# Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    RESTRICTED = {
+      :elements => ['b', 'em', 'i', 'strong', 'u']
+    }
+  end
+end
metadata
ADDED
@@ -0,0 +1,124 @@
+--- !ruby/object:Gem::Specification
+name: glebm-sanitize
+version: !ruby/object:Gem::Version
+  hash: 73
+  prerelease: false
+  segments:
+  - 1
+  - 2
+  - 1
+  - 1
+  version: 1.2.1.1
+platform: ruby
+authors:
+- Ryan Grove
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2010-07-19 00:00:00 +02:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: glebm-nokogiri
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        hash: 7
+        segments:
+        - 1
+        - 4
+        version: "1.4"
+  type: :runtime
+  version_requirements: *id001
+- !ruby/object:Gem::Dependency
+  name: bacon
+  prerelease: false
+  requirement: &id002 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        hash: 19
+        segments:
+        - 1
+        - 1
+        - 0
+        version: 1.1.0
+  type: :development
+  version_requirements: *id002
+- !ruby/object:Gem::Dependency
+  name: rake
+  prerelease: false
+  requirement: &id003 !ruby/object:Gem::Requirement
+    none: false
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        hash: 63
+        segments:
+        - 0
+        - 8
+        - 0
+        version: 0.8.0
+  type: :development
+  version_requirements: *id003
+description:
+email: glex.spb@gmail.com
+executables: []
+
+extensions: []
+
+extra_rdoc_files: []
+
+files:
+- HISTORY
+- LICENSE
+- README.rdoc
+- lib/sanitize/config/restricted.rb
+- lib/sanitize/config/basic.rb
+- lib/sanitize/config/relaxed.rb
+- lib/sanitize/config.rb
+- lib/sanitize/version.rb
+- lib/sanitize.rb
+has_rdoc: true
+homepage: http://github.com/rgrove/sanitize/
+licenses: []
+
+post_install_message:
+rdoc_options: []
+
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 59
+      segments:
+      - 1
+      - 8
+      - 6
+      version: 1.8.6
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+requirements: []
+
+rubyforge_project: riposte
+rubygems_version: 1.3.7
+signing_key:
+specification_version: 3
+summary: Whitelist-based HTML sanitizer.
+test_files: []
+