sanitize 1.0.0
- data/HISTORY +5 -0
- data/LICENSE +18 -0
- data/README +142 -0
- data/lib/sanitize/config/basic.rb +48 -0
- data/lib/sanitize/config/relaxed.rb +55 -0
- data/lib/sanitize/config/restricted.rb +29 -0
- data/lib/sanitize/config.rb +48 -0
- data/lib/sanitize.rb +132 -0
- metadata +69 -0
data/HISTORY
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,18 @@
+Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README
ADDED
@@ -0,0 +1,142 @@
+= Sanitize
+
+Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
+elements and attributes, Sanitize will remove all unacceptable HTML from a
+string.
+
+Using a simple configuration syntax, you can tell Sanitize to allow certain
+elements, certain attributes within those elements, and even certain URL
+protocols within attributes that contain URLs. Any HTML elements or attributes
+that you don't explicitly allow will be removed.
+
+Because it's based on Hpricot, a full-fledged HTML parser, rather than a bunch
+of fragile regular expressions, Sanitize has no trouble dealing with malformed
+or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
+caution.
+
+*Author*:: Ryan Grove (mailto:ryan@wonko.com)
+*Version*:: 1.0.0 (2008-12-25)
+*Copyright*:: Copyright (c) 2008 Ryan Grove. All rights reserved.
+*License*:: MIT License (http://opensource.org/licenses/mit-license.php)
+*Website*:: http://github.com/rgrove/sanitize
+
+== Requires
+
+* RubyGems
+* Hpricot 0.6+
+
+== Usage
+
+If you don't specify any configuration options, Sanitize will use its strictest
+settings by default, which means it will strip all HTML.
+
+  require 'rubygems'
+  require 'sanitize'
+
+  html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+
+  Sanitize.clean(html) # => 'foo'
+
+== Configuration
+
+In addition to the ultra-safe default settings, Sanitize comes with three other
+built-in modes.
+
+=== Sanitize::Config::RESTRICTED
+
+Allows only very simple inline formatting markup. No links, images, or block
+elements.
+
+  Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
+
+=== Sanitize::Config::BASIC
+
+Allows a variety of markup including formatting tags, links, and lists. Images
+and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
+protocols, and a <code>rel="nofollow"</code> attribute is added to all links to
+mitigate SEO spam.
+
+  Sanitize.clean(html, Sanitize::Config::BASIC)
+  # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
+
+=== Sanitize::Config::RELAXED
+
+Allows an even wider variety of markup than BASIC, including images and tables.
+Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
+are limited to HTTP and HTTPS. In this mode, <code>rel="nofollow"</code> is not
+added to links.
+
+  Sanitize.clean(html, Sanitize::Config::RELAXED)
+  # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+
+=== Custom Configuration
+
+If the built-in modes don't meet your needs, you can easily specify a custom
+configuration:
+
+  Sanitize.clean(html, :elements => ['a', 'span'],
+      :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
+      :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
+
+==== :elements
+
+Array of element names to allow. Specify all names in lowercase.
+
+  :elements => [
+    'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+    'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+    'sup', 'u', 'ul'
+  ]
+
+==== :attributes
+
+Attributes to allow for specific elements. Specify all element names and
+attributes in lowercase.
+
+  :attributes => {
+    'a' => ['href', 'title'],
+    'blockquote' => ['cite'],
+    'img' => ['alt', 'src', 'title']
+  }
+
+==== :add_attributes
+
+Attributes to add to specific elements. If the attribute already exists, it will
+be replaced with the value specified here. Specify all element names and
+attributes in lowercase.
+
+  :add_attributes => {
+    'a' => {'rel' => 'nofollow'}
+  }
+
+==== :protocols
+
+URL protocols to allow in specific attributes. If an attribute is listed here
+and contains a protocol other than those specified (or if it contains no
+protocol at all), it will be removed.
+
+  :protocols => {
+    'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
+    'img' => {'src' => ['http', 'https']}
+  }
+
+== License
+
+Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
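The `:protocols` rule documented in the README can be illustrated with a small stand-alone sketch. The helper name `allowed_protocol?` is hypothetical and is not part of the gem; the sketch only assumes the documented behaviour that a URL with no protocol, or with a protocol outside the list, is rejected.

```ruby
# Hypothetical helper (not part of Sanitize): returns true when the URL in
# +value+ starts with one of the protocols listed in +allowed+.
def allowed_protocol?(value, allowed)
  # Treat everything before the first colon as the protocol.
  match = value.downcase.match(/\A([^:]+):/)
  return false if match.nil? # no protocol at all, so the value is rejected
  allowed.include?(match[1])
end

allowed = ['ftp', 'http', 'https', 'mailto']

puts allowed_protocol?('http://foo.com/', allowed)     # => true
puts allowed_protocol?('JavaScript:alert(1)', allowed) # => false
puts allowed_protocol?('/relative/path', allowed)      # => false
```

Lowercasing the value before matching keeps mixed-case schemes such as `JavaScript:` from slipping past a lowercase whitelist.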
data/lib/sanitize/config/basic.rb
ADDED
@@ -0,0 +1,48 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    BASIC = {
+      :elements => [
+        'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+        'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+        'sup', 'u', 'ul'],
+
+      :attributes => {
+        'a' => ['href'],
+        'blockquote' => ['cite'],
+        'q' => ['cite']
+      },
+
+      :add_attributes => {
+        'a' => {'rel' => 'nofollow'}
+      },
+
+      :protocols => {
+        'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
+        'blockquote' => {'cite' => ['http', 'https']},
+        'q' => {'cite' => ['http', 'https']}
+      }
+    }
+  end
+end
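BASIC forces `rel="nofollow"` onto links through its `:add_attributes` entry. A rough sketch of that effect follows, using a plain Hash as a stand-in for an element's attributes; the helper `with_added_attributes` is hypothetical and is not gem code.

```ruby
# Hypothetical helper: returns a copy of +attrs+ with the forced attributes
# from +added+ applied. Hash#merge lets the right-hand side win, so a value
# supplied in the markup is replaced rather than kept.
def with_added_attributes(attrs, added)
  attrs.merge(added)
end

add_attributes = {'a' => {'rel' => 'nofollow'}}
link = {'href' => 'http://foo.com/', 'rel' => 'friend'}

result = with_added_attributes(link, add_attributes['a'])
puts result['rel']  # => nofollow
puts result['href'] # => http://foo.com/
```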
data/lib/sanitize/config/relaxed.rb
ADDED
@@ -0,0 +1,55 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    RELAXED = {
+      :elements => [
+        'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
+        'colgroup', 'dd', 'dl', 'dt', 'em', 'i', 'img', 'li', 'ol', 'p', 'pre',
+        'q', 'small', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td',
+        'tfoot', 'th', 'thead', 'tr', 'u', 'ul'],
+
+      :attributes => {
+        'a' => ['href', 'title'],
+        'blockquote' => ['cite'],
+        'col' => ['span', 'width'],
+        'colgroup' => ['span', 'width'],
+        'img' => ['align', 'alt', 'height', 'src', 'title', 'width'],
+        'ol' => ['start', 'type'],
+        'q' => ['cite'],
+        'table' => ['summary', 'width'],
+        'td' => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
+        'th' => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
+          'width'],
+        'ul' => ['type']
+      },
+
+      :protocols => {
+        'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
+        'blockquote' => {'cite' => ['http', 'https']},
+        'img' => {'src' => ['http', 'https']},
+        'q' => {'cite' => ['http', 'https']}
+      }
+    }
+  end
+end
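How an `:attributes` whitelist such as the `img` entry in RELAXED narrows an element's attributes can be sketched as follows. `filter_attributes` is a hypothetical helper that works on a plain Hash instead of a parsed element; the lowercase comparison is assumed from the instruction to write all whitelist names in lowercase.

```ruby
# Hypothetical helper: keeps only the attributes whose lowercased name
# appears in the +allowed+ list for the element.
def filter_attributes(attrs, allowed)
  attrs.select { |key, _value| allowed.include?(key.downcase) }
end

img_whitelist = ['align', 'alt', 'height', 'src', 'title', 'width']
img = {
  'SRC'     => 'http://foo.com/bar.jpg',
  'alt'     => 'bar',
  'onclick' => 'alert(1)'
}

kept = filter_attributes(img, img_whitelist)
puts kept.keys.sort.join(', ') # => SRC, alt
```

Anything not named in the whitelist, including event handlers such as `onclick`, is dropped without needing to be listed explicitly.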
data/lib/sanitize/config/restricted.rb
ADDED
@@ -0,0 +1,29 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    RESTRICTED = {
+      :elements => ['b', 'em', 'i', 'strong', 'u']
+    }
+  end
+end
data/lib/sanitize/config.rb
ADDED
@@ -0,0 +1,48 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+class Sanitize
+  module Config
+    DEFAULT = {
+      # Whether or not to allow HTML comments. Allowing comments is strongly
+      # discouraged, since IE allows script execution within conditional
+      # comments.
+      :allow_comments => false,
+
+      # HTML elements to allow. By default, no elements are allowed (which means
+      # that all HTML will be stripped).
+      :elements => [],
+
+      # HTML attributes to allow in specific elements. By default, no attributes
+      # are allowed.
+      :attributes => {},
+
+      # HTML attributes to add to specific elements. By default, no attributes
+      # are added.
+      :add_attributes => {},
+
+      # URL handling protocols to allow in specific attributes. By default, no
+      # protocols are allowed.
+      :protocols => {}
+    }
+  end
+end
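`Sanitize#initialize` in lib/sanitize.rb combines the caller's settings with these defaults through `Config::DEFAULT.merge(config)`. The sketch below shows what a plain `Hash#merge` implies, using an illustrative copy of the defaults named `DEFAULT_SKETCH` and a hypothetical helper `effective_config`. A key supplied by the caller replaces the default value wholesale, and keys the caller leaves out keep their defaults.

```ruby
# Illustrative copy of the default settings; the name is an assumption.
DEFAULT_SKETCH = {
  :allow_comments => false,
  :elements       => [],
  :attributes     => {},
  :add_attributes => {},
  :protocols      => {}
}.freeze

# Hypothetical helper: the caller's settings win key by key (shallow merge).
def effective_config(custom)
  DEFAULT_SKETCH.merge(custom)
end

config = effective_config(:elements => ['b', 'i'])

puts config[:elements].join(', ') # => b, i
puts config[:allow_comments]      # => false
puts config[:attributes].empty?   # => true
```

Because the merge is shallow, passing `:attributes => {'a' => ['href']}` would replace the whole `:attributes` hash, not add to an existing one.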
data/lib/sanitize.rb
ADDED
@@ -0,0 +1,132 @@
+#--
+# Copyright (c) 2008 Ryan Grove <ryan@wonko.com>
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the 'Software'), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#++
+
+# Append this file's directory to the include path if it's not there already.
+$:.unshift(File.dirname(File.expand_path(__FILE__)))
+$:.uniq!
+
+require 'rubygems'
+gem 'hpricot', '~> 0.6'
+
+require 'hpricot'
+require 'sanitize/config'
+require 'sanitize/config/restricted'
+require 'sanitize/config/basic'
+require 'sanitize/config/relaxed'
+
+class Sanitize
+  #--
+  # Class Methods
+  #++
+
+  # Returns a sanitized copy of _html_, using the settings in _config_ if
+  # specified.
+  def self.clean(html, config = {})
+    sanitize = Sanitize.new(config)
+    sanitize.clean(html)
+  end
+
+  # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
+  # were necessary.
+  def self.clean!(html, config = {})
+    sanitize = Sanitize.new(config)
+    sanitize.clean!(html)
+  end
+
+  #--
+  # Instance Methods
+  #++
+
+  # Returns a new Sanitize object initialized with the settings in _config_.
+  def initialize(config = {})
+    @config = Config::DEFAULT.merge(config)
+  end
+
+  # Returns a sanitized copy of _html_.
+  def clean(html)
+    dupe = html.dup
+    clean!(dupe) || dupe
+  end
+
+  # Performs clean in place, returning _html_, or +nil+ if no changes were
+  # necessary.
+  def clean!(html)
+    fragment = Hpricot(html)
+
+    fragment.traverse_element do |node|
+      if node.bogusetag? || node.doctype? || node.procins? || node.xmldecl?
+        node.swap('')
+        next
+      end
+
+      if node.comment?
+        node.swap('') unless @config[:allow_comments]
+      elsif node.elem?
+        name = node.name.downcase
+
+        # Delete any element that isn't in the whitelist.
+        unless @config[:elements].include?(name)
+          node.parent.replace_child(node, node.children)
+          next
+        end
+
+        if @config[:attributes].has_key?(name)
+          # Delete any attribute that isn't in the whitelist for this element.
+          node.raw_attributes.delete_if do |key, value|
+            !@config[:attributes][name].include?(key.downcase)
+          end
+
+          # Delete remaining attributes that use unacceptable protocols.
+          if @config[:protocols].has_key?(name)
+            protocol = @config[:protocols][name]
+
+            node.raw_attributes.delete_if do |key, value|
+              protocol.has_key?(key) && (!(value.downcase =~ /^([^:]+):/) ||
+                  !protocol[key].include?($1.downcase))
+            end
+          end
+        else
+          # Delete all attributes from elements with no whitelisted
+          # attributes.
+          node.raw_attributes = {}
+        end
+
+        # Add required attributes.
+        if @config[:add_attributes].has_key?(name)
+          node.raw_attributes.merge!(@config[:add_attributes][name])
+        end
+      end
+    end
+
+    # Make one last pass through the fragment and replace angle brackets with
+    # entities in all text nodes. This helps eliminate certain types of
+    # maliciously-malformed nested tags.
+    fragment.traverse_element do |node|
+      if node.text?
+        node.swap(node.inner_text.gsub('<', '&lt;').gsub('>', '&gt;'))
+      end
+    end
+
+    result = fragment.to_s
+    return result == html ? nil : html[0, html.length] = result
+  end
+end
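The `clean` / `clean!` pair follows the usual Ruby bang-method convention: the in-place method returns `nil` when nothing changed, and the copying method falls back to the untouched copy. Below is a stand-alone sketch of that convention, with a toy bracket-escaping step standing in for the Hpricot pass. The names `escape_brackets` and `escape_brackets!` are hypothetical and do not exist in the gem.

```ruby
# Toy in-place transformation: escapes angle brackets in +text+.
# Returns +text+ when it was modified, or nil when no change was needed.
def escape_brackets!(text)
  result = text.gsub('<', '&lt;').gsub('>', '&gt;')
  return nil if result == text
  text.replace(result) # String#replace mutates the receiver and returns it
end

# Non-destructive variant: works on a copy and always returns a String.
def escape_brackets(text)
  copy = text.dup
  escape_brackets!(copy) || copy
end

original = 'a < b'
puts escape_brackets(original)              # => a &lt; b
puts original                               # => a < b
puts escape_brackets!('plain'.dup).inspect  # => nil
```

The `|| copy` fallback is what lets the non-bang method return a usable string even when the bang method reports "no change" with `nil`.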
metadata
ADDED
@@ -0,0 +1,69 @@
+--- !ruby/object:Gem::Specification
+name: sanitize
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Ryan Grove
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2008-12-24 00:00:00 -08:00
+default_executable:
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: hpricot
+  type: :runtime
+  version_requirement:
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: "0.6"
+    version:
+description:
+email: ryan@wonko.com
+executables: []
+
+extensions: []
+
+extra_rdoc_files: []
+
+files:
+- HISTORY
+- LICENSE
+- README
+- lib/sanitize.rb
+- lib/sanitize/config.rb
+- lib/sanitize/config/basic.rb
+- lib/sanitize/config/relaxed.rb
+- lib/sanitize/config/restricted.rb
+has_rdoc: false
+homepage: http://github.com/rgrove/sanitize/
+post_install_message:
+rdoc_options: []
+
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: 1.8.6
+  version:
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+  version:
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.2.0
+signing_key:
+specification_version: 2
+summary: Whitelist-based HTML sanitizer.
+test_files: []
+