sanitize 1.0.0
- data/HISTORY +5 -0
- data/LICENSE +18 -0
- data/README +142 -0
- data/lib/sanitize/config/basic.rb +48 -0
- data/lib/sanitize/config/relaxed.rb +55 -0
- data/lib/sanitize/config/restricted.rb +29 -0
- data/lib/sanitize/config.rb +48 -0
- data/lib/sanitize.rb +132 -0
- metadata +69 -0
data/HISTORY
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,18 @@
+Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README
ADDED
@@ -0,0 +1,142 @@
+= Sanitize
+
+Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
+elements and attributes, Sanitize will remove all unacceptable HTML from a
+string.
+
+Using a simple configuration syntax, you can tell Sanitize to allow certain
+elements, certain attributes within those elements, and even certain URL
+protocols within attributes that contain URLs. Any HTML elements or attributes
+that you don't explicitly allow will be removed.
+
+Because it's based on Hpricot, a full-fledged HTML parser, rather than a bunch
+of fragile regular expressions, Sanitize has no trouble dealing with malformed
+or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
+caution.
+
+*Author*:: Ryan Grove (mailto:ryan@wonko.com)
+*Version*:: 1.0.0 (2008-12-25)
+*Copyright*:: Copyright (c) 2008 Ryan Grove. All rights reserved.
+*License*:: MIT License (http://opensource.org/licenses/mit-license.php)
+*Website*:: http://github.com/rgrove/sanitize
+
+== Requires
+
+* RubyGems
+* Hpricot 0.6+
+
+== Usage
+
+If you don't specify any configuration options, Sanitize will use its strictest
+settings by default, which means it will strip all HTML.
+
+  require 'rubygems'
+  require 'sanitize'
+
+  html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+
+  Sanitize.clean(html) # => 'foo'
+
+== Configuration
+
+In addition to the ultra-safe default settings, Sanitize comes with three other
+built-in modes.
+
+=== Sanitize::Config::RESTRICTED
+
+Allows only very simple inline formatting markup. No links, images, or block
+elements.
+
+  Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
+
+=== Sanitize::Config::BASIC
+
+Allows a variety of markup including formatting tags, links, and lists. Images
+and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
+protocols, and a <code>rel="nofollow"</code> attribute is added to all links to
+mitigate SEO spam.
+
+  Sanitize.clean(html, Sanitize::Config::BASIC)
+  # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
+
+=== Sanitize::Config::RELAXED
+
+Allows an even wider variety of markup than BASIC, including images and tables.
+Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
+are limited to HTTP and HTTPS. In this mode, <code>rel="nofollow"</code> is not
+added to links.
+
+  Sanitize.clean(html, Sanitize::Config::RELAXED)
+  # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+
+=== Custom Configuration
+
+If the built-in modes don't meet your needs, you can easily specify a custom
+configuration:
+
+  Sanitize.clean(html, :elements => ['a', 'span'],
+      :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
+      :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
+
+==== :elements
+
+Array of element names to allow. Specify all names in lowercase.
+
+  :elements => [
+    'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+    'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+    'sup', 'u', 'ul'
+  ]
+
+==== :attributes
+
+Attributes to allow for specific elements. Specify all element names and
+attributes in lowercase.
+
+  :attributes => {
+    'a' => ['href', 'title'],
+    'blockquote' => ['cite'],
+    'img' => ['alt', 'src', 'title']
+  }
+
+==== :add_attributes
+
+Attributes to add to specific elements. If the attribute already exists, it will
+be replaced with the value specified here. Specify all element names and
+attributes in lowercase.
+
+  :add_attributes => {
+    'a' => {'rel' => 'nofollow'}
+  }
+
+==== :protocols
+
+URL protocols to allow in specific attributes. If an attribute is listed here
+and contains a protocol other than those specified (or if it contains no
+protocol at all), it will be removed.
+
+  :protocols => {
+    'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
+    'img' => {'src' => ['http', 'https']}
+  }
+
+== License
+
+Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
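Review note: the :protocols rule in the README (an attribute is removed when its protocol is not listed, or when it has no protocol at all) can be sketched in plain Ruby without Hpricot. This is a minimal illustration under stated assumptions, not the gem's API: `allowed_protocol?` is a hypothetical helper, and only the regular expression is taken from lib/sanitize.rb.

```ruby
# Hypothetical helper (not part of Sanitize): returns true when +value+
# starts with a protocol that appears in +allowed+.
def allowed_protocol?(value, allowed)
  # Capture everything before the first colon. No colon means no protocol.
  match = value.downcase.match(/^([^:]+):/)
  !match.nil? && allowed.include?(match[1])
end

allowed = ['ftp', 'http', 'https', 'mailto']

allowed_protocol?('http://foo.com/', allowed)      # => true
allowed_protocol?('JavaScript:alert(1)', allowed)  # => false
allowed_protocol?('/relative/path', allowed)       # => false
```

Under this rule a relative URL is rejected as well, which matches the "no protocol at all" wording in the README.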
data/lib/sanitize/config/basic.rb
ADDED
@@ -0,0 +1,48 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    BASIC = {
+      :elements => [
+        'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+        'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+        'sup', 'u', 'ul'],
+
+      :attributes => {
+        'a' => ['href'],
+        'blockquote' => ['cite'],
+        'q' => ['cite']
+      },
+
+      :add_attributes => {
+        'a' => {'rel' => 'nofollow'}
+      },
+
+      :protocols => {
+        'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
+        'blockquote' => {'cite' => ['http', 'https']},
+        'q' => {'cite' => ['http', 'https']}
+      }
+    }
+  end
+end
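Review note: how the :attributes and :add_attributes entries of BASIC combine for a link can be sketched on a plain Hash. The `config` and `attrs` values below are illustrative stand-ins (a trimmed copy of the BASIC settings and a made-up attribute hash), not objects the gem exposes. The two steps mirror the `delete_if` and `merge!` calls in lib/sanitize.rb.

```ruby
# Trimmed copy of the BASIC settings for 'a' elements, for illustration only.
config = {
  :attributes     => {'a' => ['href']},
  :add_attributes => {'a' => {'rel' => 'nofollow'}}
}

# Hypothetical stand-in for the raw attributes of one <a> element.
attrs = {'href' => 'http://foo.com/', 'onclick' => 'evil()', 'rel' => 'friend'}

# Step 1: drop every attribute that is not whitelisted for this element.
attrs.delete_if { |key, _value| !config[:attributes]['a'].include?(key.downcase) }

# Step 2: merge the forced attributes last, so they replace existing values.
attrs.merge!(config[:add_attributes]['a'])

attrs # => {'href' => 'http://foo.com/', 'rel' => 'nofollow'}
```

Because the forced attributes are merged after filtering, `rel` does not need to appear in the :attributes whitelist for it to end up on the element.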
data/lib/sanitize/config/relaxed.rb
ADDED
@@ -0,0 +1,55 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    RELAXED = {
+      :elements => [
+        'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
+        'colgroup', 'dd', 'dl', 'dt', 'em', 'i', 'img', 'li', 'ol', 'p', 'pre',
+        'q', 'small', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td',
+        'tfoot', 'th', 'thead', 'tr', 'u', 'ul'],
+
+      :attributes => {
+        'a' => ['href', 'title'],
+        'blockquote' => ['cite'],
+        'col' => ['span', 'width'],
+        'colgroup' => ['span', 'width'],
+        'img' => ['align', 'alt', 'height', 'src', 'title', 'width'],
+        'ol' => ['start', 'type'],
+        'q' => ['cite'],
+        'table' => ['summary', 'width'],
+        'td' => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
+        'th' => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
+            'width'],
+        'ul' => ['type']
+      },
+
+      :protocols => {
+        'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
+        'blockquote' => {'cite' => ['http', 'https']},
+        'img' => {'src' => ['http', 'https']},
+        'q' => {'cite' => ['http', 'https']}
+      }
+    }
+  end
+end
data/lib/sanitize/config/restricted.rb
ADDED
@@ -0,0 +1,29 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    RESTRICTED = {
+      :elements => ['b', 'em', 'i', 'strong', 'u']
+    }
+  end
+end
data/lib/sanitize/config.rb
ADDED
@@ -0,0 +1,48 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    DEFAULT = {
+      # Whether or not to allow HTML comments. Allowing comments is strongly
+      # discouraged, since IE allows script execution within conditional
+      # comments.
+      :allow_comments => false,
+
+      # HTML elements to allow. By default, no elements are allowed (which means
+      # that all HTML will be stripped).
+      :elements => [],
+
+      # HTML attributes to allow in specific elements. By default, no attributes
+      # are allowed.
+      :attributes => {},
+
+      # HTML attributes to add to specific elements. By default, no attributes
+      # are added.
+      :add_attributes => {},
+
+      # URL handling protocols to allow in specific attributes. By default, no
+      # protocols are allowed.
+      :protocols => {}
+    }
+  end
+end
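Review note: lib/sanitize.rb combines these defaults with the caller's settings through `Config::DEFAULT.merge(config)`. The sketch below shows what that shallow merge does, using a local copy of the defaults (the `defaults` and `custom` variables are illustrative names, not part of the gem) so that it runs without Sanitize installed.

```ruby
# Local copy of the DEFAULT settings, so the sketch runs without the gem.
defaults = {
  :allow_comments => false,
  :elements       => [],
  :attributes     => {},
  :add_attributes => {},
  :protocols      => {}
}

# Hypothetical caller-supplied configuration.
custom = {:elements => ['a'], :attributes => {'a' => ['href']}}

# Hash#merge is shallow: keys present in custom replace the default value
# wholesale, and keys that custom omits keep their default.
merged = defaults.merge(custom)

merged[:elements]       # => ['a']
merged[:allow_comments] # => false
merged[:protocols]      # => {}
```

Since `merge` returns a new Hash, the shared defaults are left untouched. A custom configuration that omits :protocols ends up with an empty :protocols hash, so a caller who wants protocol filtering on `href` has to list it explicitly.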
data/lib/sanitize.rb
ADDED
@@ -0,0 +1,132 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+# Append this file's directory to the include path if it's not there already.
+$:.unshift(File.dirname(File.expand_path(__FILE__)))
+$:.uniq!
+
+require 'rubygems'
+gem 'hpricot', '~> 0.6'
+
+require 'hpricot'
+require 'sanitize/config'
+require 'sanitize/config/restricted'
+require 'sanitize/config/basic'
+require 'sanitize/config/relaxed'
+
+class Sanitize
+  #--
+  # Class Methods
+  #++
+
+  # Returns a sanitized copy of _html_, using the settings in _config_ if
+  # specified.
+  def self.clean(html, config = {})
+    sanitize = Sanitize.new(config)
+    sanitize.clean(html)
+  end
+
+  # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
+  # were necessary.
+  def self.clean!(html, config = {})
+    sanitize = Sanitize.new(config)
+    sanitize.clean!(html)
+  end
+
+  #--
+  # Instance Methods
+  #++
+
+  # Returns a new Sanitize object initialized with the settings in _config_.
+  def initialize(config = {})
+    @config = Config::DEFAULT.merge(config)
+  end
+
+  # Returns a sanitized copy of _html_.
+  def clean(html)
+    dupe = html.dup
+    clean!(dupe) || dupe
+  end
+
+  # Performs clean in place, returning _html_, or +nil+ if no changes were
+  # necessary.
+  def clean!(html)
+    fragment = Hpricot(html)
+
+    fragment.traverse_element do |node|
+      if node.bogusetag? || node.doctype? || node.procins? || node.xmldecl?
+        node.swap('')
+        next
+      end
+
+      if node.comment?
+        node.swap('') unless @config[:allow_comments]
+      elsif node.elem?
+        name = node.name.downcase
+
+        # Delete any element that isn't in the whitelist.
+        unless @config[:elements].include?(name)
+          node.parent.replace_child(node, node.children)
+          next
+        end
+
+        if @config[:attributes].has_key?(name)
+          # Delete any attribute that isn't in the whitelist for this element.
+          node.raw_attributes.delete_if do |key, value|
+            !@config[:attributes][name].include?(key.downcase)
+          end
+
+          # Delete remaining attributes that use unacceptable protocols.
+          if @config[:protocols].has_key?(name)
+            protocol = @config[:protocols][name]
+
+            node.raw_attributes.delete_if do |key, value|
+              protocol.has_key?(key) && (!(value.downcase =~ /^([^:]+):/) ||
+                  !protocol[key].include?($1.downcase))
+            end
+          end
+        else
+          # Delete all attributes from elements with no whitelisted
+          # attributes.
+          node.raw_attributes = {}
+        end
+
+        # Add required attributes.
+        if @config[:add_attributes].has_key?(name)
+          node.raw_attributes.merge!(@config[:add_attributes][name])
+        end
+      end
+    end
+
+    # Make one last pass through the fragment and replace angle brackets with
+    # entities in all text nodes. This helps eliminate certain types of
+    # maliciously-malformed nested tags.
+    fragment.traverse_element do |node|
+      if node.text?
+        node.swap(node.inner_text.gsub('<', '&lt;').gsub('>', '&gt;'))
+      end
+    end
+
+    result = fragment.to_s
+    return result == html ? nil : html[0, html.length] = result
+  end
+end
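Review note: `clean` and `clean!` follow the same contract as Ruby's `String#gsub` and `String#gsub!`: the bang method mutates its argument and returns +nil+ when nothing changed, and the non-bang method works on a copy. The sketch below reproduces that contract with hypothetical helpers (`escape_brackets!` and `escape_brackets` are not part of Sanitize) that only perform the final angle-bracket escaping pass, so it runs without Hpricot.

```ruby
# Hypothetical stand-in for Sanitize#clean!: escapes angle brackets in place
# and returns nil when no change was necessary, like String#gsub!.
def escape_brackets!(text)
  result = text.gsub('<', '&lt;').gsub('>', '&gt;')
  return nil if result == text
  text.replace(result)
end

# Non-destructive variant: operate on a copy and fall back to that copy when
# the bang method reports that nothing changed.
def escape_brackets(text)
  copy = text.dup
  escape_brackets!(copy) || copy
end

escape_brackets('1 < 2')   # => '1 &lt; 2'
escape_brackets!('plain')  # => nil
```

The `|| copy` fallback is what lets the non-bang method always return a String, even though the bang method signals "unchanged" with +nil+.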
metadata
ADDED
@@ -0,0 +1,69 @@
+--- !ruby/object:Gem::Specification
+name: sanitize
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Ryan Grove
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2008-12-24 00:00:00 -08:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: hpricot
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: "0.6"
+    version:
+description:
+email: ryan@wonko.com
+executables: []
+
+extensions: []
+
+extra_rdoc_files: []
+
+files:
+- HISTORY
+- LICENSE
+- README
+- lib/sanitize.rb
+- lib/sanitize/config.rb
+- lib/sanitize/config/basic.rb
+- lib/sanitize/config/relaxed.rb
+- lib/sanitize/config/restricted.rb
+has_rdoc: false
+homepage: http://github.com/rgrove/sanitize/
+post_install_message:
+rdoc_options: []
+
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 1.8.6
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.2.0
+signing_key:
+specification_version: 2
+summary: Whitelist-based HTML sanitizer.
+test_files: []
+