adamh-sanitize 1.0.4.2

data/HISTORY ADDED
@@ -0,0 +1,29 @@
+ Sanitize History
+ ================================================================================
+
+ Version 1.0.4 (2009-01-16)
+   * Fixed a bug that made it possible to sneak a non-whitelisted element through
+     by repeating it several times in a row. All versions of Sanitize prior to
+     1.0.4 are vulnerable. [Reported by Cristobal]
+
+ Version 1.0.3 (2009-01-15)
+   * Fixed a bug whereby incomplete Unicode or hex entities could be used to
+     prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
+     still decode the incomplete entities, users of those browsers may be
+     vulnerable to malicious script injection on websites using versions of
+     Sanitize prior to 1.0.3.
+
+ Version 1.0.2 (2009-01-04)
+   * Fixed a bug that caused an exception to be thrown when parsing a valueless
+     attribute that's expected to contain a URL.
+
+ Version 1.0.1 (2009-01-01)
+   * You can now specify :relative in a protocol config array to allow attributes
+     containing relative URLs with no protocol. The Basic and Relaxed configs
+     have been updated to allow relative URLs.
+   * Added a workaround for an Hpricot bug that causes HTML entities for
+     non-ASCII characters to be replaced by question marks, and all other
+     entities to be destructively decoded.
+
+ Version 1.0.0 (2008-12-25)
+   * First release.
data/LICENSE ADDED
@@ -0,0 +1,18 @@
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the 'Software'), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+ the Software, and to permit persons to whom the Software is furnished to do so,
+ subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,150 @@
+ = Sanitize
+
+ Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
+ elements and attributes, Sanitize will remove all unacceptable HTML from a
+ string.
+
+ Using a simple configuration syntax, you can tell Sanitize to allow certain
+ elements, certain attributes within those elements, and even certain URL
+ protocols within attributes that contain URLs. Any HTML elements or attributes
+ that you don't explicitly allow will be removed.
+
+ Because it's based on Hpricot, a full-fledged HTML parser, rather than a bunch
+ of fragile regular expressions, Sanitize has no trouble dealing with malformed
+ or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
+ caution.
+
+ *Author*:: Ryan Grove (mailto:ryan@wonko.com)
+ *Version*:: 1.0.4 (2009-01-16)
+ *Copyright*:: Copyright (c) 2009 Ryan Grove. All rights reserved.
+ *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
+ *Website*:: http://github.com/rgrove/sanitize
+
+ == Requires
+
+ * RubyGems
+ * Hpricot 0.6+
+ * HTMLEntities 4.0.0+
+
+ == Usage
+
+ If you don't specify any configuration options, Sanitize will use its strictest
+ settings by default, which means it will strip all HTML.
+
+   require 'rubygems'
+   require 'sanitize'
+
+   html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+
+   Sanitize.clean(html) # => 'foo'
+
+ == Configuration
+
+ In addition to the ultra-safe default settings, Sanitize comes with three other
+ built-in modes.
+
+ === Sanitize::Config::RESTRICTED
+
+ Allows only very simple inline formatting markup. No links, images, or block
+ elements.
+
+   Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
+
+ === Sanitize::Config::BASIC
+
+ Allows a variety of markup including formatting tags, links, and lists. Images
+ and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
+ protocols, and a <code>rel="nofollow"</code> attribute is added to all links to
+ mitigate SEO spam.
+
+   Sanitize.clean(html, Sanitize::Config::BASIC)
+   # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
+
+ === Sanitize::Config::RELAXED
+
+ Allows an even wider variety of markup than BASIC, including images and tables.
+ Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
+ are limited to HTTP and HTTPS. In this mode, <code>rel="nofollow"</code> is not
+ added to links.
+
+   Sanitize.clean(html, Sanitize::Config::RELAXED)
+   # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+
+ === Custom Configuration
+
+ If the built-in modes don't meet your needs, you can easily specify a custom
+ configuration:
+
+   Sanitize.clean(html, :elements => ['a', 'span'],
+       :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
+       :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
+
+ ==== :elements
+
+ Array of element names to allow. Specify all names in lowercase.
+
+   :elements => [
+     'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+     'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+     'sup', 'u', 'ul'
+   ]
+
+ ==== :attributes
+
+ Attributes to allow for specific elements. Specify all element names and
+ attributes in lowercase.
+
+   :attributes => {
+     'a'          => ['href', 'title'],
+     'blockquote' => ['cite'],
+     'img'        => ['alt', 'src', 'title']
+   }
+
+ ==== :add_attributes
+
+ Attributes to add to specific elements. If the attribute already exists, it will
+ be replaced with the value specified here. Specify all element names and
+ attributes in lowercase.
+
+   :add_attributes => {
+     'a' => {'rel' => 'nofollow'}
+   }
+
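As a rough illustration of the replacement behaviour described for :add_attributes, the sketch below uses a plain Ruby hash in place of an element's attributes. The helper name `apply_added_attributes` is hypothetical and not part of Sanitize; the example only assumes that configured attributes are merged over the existing ones the way `Hash#merge!` does.

```ruby
# Hypothetical helper (not part of Sanitize): merges the configured
# attributes over an element's existing ones. Hash#merge! lets the
# incoming value win whenever a key is already present.
def apply_added_attributes(existing, additions)
  existing.merge!(additions)
end

attrs = {'href' => 'http://foo.com/', 'rel' => 'author'}
apply_added_attributes(attrs, 'rel' => 'nofollow')

p attrs['rel']   # 'rel' now holds the configured value, not 'author'
p attrs['href']  # attributes that are not configured are left alone
```

Because the merge mutates the hash it is given, the element's own attribute hash is updated in place rather than copied.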
+ ==== :protocols
+
+ URL protocols to allow in specific attributes. If an attribute is listed here
+ and contains a protocol other than those specified (or if it contains no
+ protocol at all), it will be removed.
+
+   :protocols => {
+     'a'   => {'href' => ['ftp', 'http', 'https', 'mailto']},
+     'img' => {'src'  => ['http', 'https']}
+   }
+
+ If you'd like to allow the use of relative URLs which don't have a protocol,
+ include the special value <code>:relative</code> in the protocol array:
+
+   :protocols => {
+     'a' => {'href' => ['http', 'https', :relative]}
+   }
+
+ == License
+
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the 'Software'), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+ the Software, and to permit persons to whom the Software is furnished to do so,
+ subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,49 @@
+ #--
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ class Sanitize
+   module Config
+     BASIC = {
+       :elements => [
+         'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+         'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+         'sup', 'u', 'ul'],
+
+       :attributes => {
+         'a'          => ['href'],
+         'blockquote' => ['cite'],
+         'q'          => ['cite']
+       },
+
+       :add_attributes => {
+         'a' => {'rel' => 'nofollow'}
+       },
+
+       :protocols => {
+         'a'          => {'href' => ['ftp', 'http', 'https', 'mailto',
+                                     :relative]},
+         'blockquote' => {'cite' => ['http', 'https', :relative]},
+         'q'          => {'cite' => ['http', 'https', :relative]}
+       }
+     }
+   end
+ end
@@ -0,0 +1,56 @@
+ #--
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ class Sanitize
+   module Config
+     RELAXED = {
+       :elements => [
+         'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
+         'colgroup', 'dd', 'dl', 'dt', 'em', 'i', 'img', 'li', 'ol', 'p', 'pre',
+         'q', 'small', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td',
+         'tfoot', 'th', 'thead', 'tr', 'u', 'ul'],
+
+       :attributes => {
+         'a'          => ['href', 'title'],
+         'blockquote' => ['cite'],
+         'col'        => ['span', 'width'],
+         'colgroup'   => ['span', 'width'],
+         'img'        => ['align', 'alt', 'height', 'src', 'title', 'width'],
+         'ol'         => ['start', 'type'],
+         'q'          => ['cite'],
+         'table'      => ['summary', 'width'],
+         'td'         => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
+         'th'         => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
+                          'width'],
+         'ul'         => ['type']
+       },
+
+       :protocols => {
+         'a'          => {'href' => ['ftp', 'http', 'https', 'mailto',
+                                     :relative]},
+         'blockquote' => {'cite' => ['http', 'https', :relative]},
+         'img'        => {'src'  => ['http', 'https', :relative]},
+         'q'          => {'cite' => ['http', 'https', :relative]}
+       }
+     }
+   end
+ end
@@ -0,0 +1,29 @@
+ #--
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ class Sanitize
+   module Config
+     RESTRICTED = {
+       :elements => ['b', 'em', 'i', 'strong', 'u']
+     }
+   end
+ end
@@ -0,0 +1,49 @@
+ #--
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ class Sanitize
+   module Config
+     DEFAULT = {
+       # Whether or not to allow HTML comments. Allowing comments is strongly
+       # discouraged, since IE allows script execution within conditional
+       # comments.
+       :allow_comments => false,
+
+       # HTML elements to allow. By default, no elements are allowed (which means
+       # that all HTML will be stripped).
+       :elements => [],
+
+       # HTML attributes to allow in specific elements. By default, no attributes
+       # are allowed.
+       :attributes => {},
+
+       # HTML attributes to add to specific elements. By default, no attributes
+       # are added.
+       :add_attributes => {},
+
+       # URL handling protocols to allow in specific attributes. By default, no
+       # protocols are allowed. Use :relative in place of a protocol if you want
+       # to allow relative URLs sans protocol.
+       :protocols => {}
+     }
+   end
+ end
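Custom settings are combined with these defaults through `Config::DEFAULT.merge(config)` in `Sanitize#initialize`. `Hash#merge` is shallow, so a key supplied by the caller replaces the default value wholesale instead of being combined with it. The sketch below shows that behaviour with a stand-in hash; `DEFAULT_SKETCH` and `effective_config` are hypothetical names used only for this example, so that it runs without the gem installed.

```ruby
# Stand-in for Sanitize::Config::DEFAULT (assumption: same keys and values).
DEFAULT_SKETCH = {
  :allow_comments => false,
  :elements       => [],
  :attributes     => {},
  :add_attributes => {},
  :protocols      => {}
}

# Hypothetical helper mirroring the Hash#merge call in Sanitize#initialize.
def effective_config(overrides = {})
  DEFAULT_SKETCH.merge(overrides)
end

config = effective_config(:elements => ['a'], :attributes => {'a' => ['href']})

p config[:elements]          # the override replaces the default array
p config[:protocols]         # keys that were not supplied keep their defaults
p DEFAULT_SKETCH[:elements]  # merge returns a new hash; the defaults are untouched
```

The same rule would apply when starting from a built-in mode: merging `:elements => ['img']` into BASIC would replace its element list, so extending it means building the full array first, for example `Sanitize::Config::BASIC[:elements] + ['img']`.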
data/lib/sanitize.rb ADDED
@@ -0,0 +1,156 @@
+ #--
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
+ #
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
+ # of this software and associated documentation files (the 'Software'), to deal
+ # in the Software without restriction, including without limitation the rights
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ # copies of the Software, and to permit persons to whom the Software is
+ # furnished to do so, subject to the following conditions:
+ #
+ # The above copyright notice and this permission notice shall be included in all
+ # copies or substantial portions of the Software.
+ #
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ # SOFTWARE.
+ #++
+
+ # Prepend this file's directory to the load path if it's not there already.
+ $:.unshift(File.dirname(File.expand_path(__FILE__)))
+ $:.uniq!
+
+ require 'rubygems'
+
+ gem 'adamh-hpricot', '~> 0.6'
+ gem 'htmlentities', '~> 4.0.0'
+
+ require 'hpricot'
+ require 'htmlentities'
+ require 'sanitize/config'
+ require 'sanitize/config/restricted'
+ require 'sanitize/config/basic'
+ require 'sanitize/config/relaxed'
+
+ class Sanitize
+
+   # Matches an attribute value that could be treated by a browser as a URL
+   # with a protocol prefix, such as "http:" or "javascript:". Any string of one
+   # or more characters followed by a colon is considered a match, even if the
+   # colon is encoded as an entity and even if it's an incomplete entity (which
+   # IE6 and Opera will still parse). The lookaheads only keep a longer numeric
+   # entity such as "&#580" from being mistaken for an encoded colon.
+   REGEX_PROTOCOL = /^([^:]+)(?:\:|&#0*58(?!\d)|&#x0*3a(?![0-9a-f]))/i
+
+   #--
+   # Class Methods
+   #++
+
+   # Returns a sanitized copy of _html_, using the settings in _config_ if
+   # specified.
+   def self.clean(html, config = {})
+     sanitize = Sanitize.new(config)
+     sanitize.clean(html)
+   end
+
+   # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
+   # were made.
+   def self.clean!(html, config = {})
+     sanitize = Sanitize.new(config)
+     sanitize.clean!(html)
+   end
+
+   #--
+   # Instance Methods
+   #++
+
+   # Returns a new Sanitize object initialized with the settings in _config_.
+   def initialize(config = {})
+     @config = Config::DEFAULT.merge(config)
+   end
+
+   # Returns a sanitized copy of _html_.
+   def clean(html)
+     dupe = html.dup
+     clean!(dupe) || dupe
+   end
+
+   # Performs clean in place, returning _html_, or +nil+ if no changes were
+   # made.
+   def clean!(html)
+     fragment = Hpricot(html)
+
+     fragment.search('*') do |node|
+       if node.bogusetag? || node.doctype? || node.procins? || node.xmldecl?
+         node.parent.altered!
+         node.parent.children[node.parent.children.index(node), 1] = []
+         next
+       end
+
+       if node.comment?
+         unless @config[:allow_comments]
+           node.parent.altered!
+           node.parent.children[node.parent.children.index(node), 1] = []
+         end
+       elsif node.elem?
+         name = node.name.to_s.downcase
+
+         # Delete any element that isn't in the whitelist.
+         unless @config[:elements].include?(name)
+           node.parent.replace_child(node, node.children || [])
+           next
+         end
+
+         if @config[:attributes].has_key?(name)
+           # Delete any attribute that isn't in the whitelist for this element.
+           node.raw_attributes.delete_if do |key, value|
+             !@config[:attributes][name].include?(key.to_s.downcase)
+           end
+
+           # Delete remaining attributes that use unacceptable protocols.
+           if @config[:protocols].has_key?(name)
+             protocol = @config[:protocols][name]
+
+             node.raw_attributes.delete_if do |key, value|
+               next false unless protocol.has_key?(key)
+               next true if value.nil?
+
+               if value.to_s.downcase =~ REGEX_PROTOCOL
+                 !protocol[key].include?($1.downcase)
+               else
+                 !protocol[key].include?(:relative)
+               end
+             end
+           end
+         else
+           # Delete all attributes from elements with no whitelisted
+           # attributes.
+           node.raw_attributes = {}
+         end
+
+         # Add required attributes.
+         if @config[:add_attributes].has_key?(name)
+           node.raw_attributes.merge!(@config[:add_attributes][name])
+         end
+       end
+     end
+
+     # Make one last pass through the fragment and encode all special HTML chars
+     # and non-ASCII chars as entities. This eliminates certain types of
+     # maliciously-malformed nested tags and also compensates for Hpricot's
+     # burning desire to decode all entities.
+     coder = HTMLEntities.new
+
+     fragment.traverse_element do |node|
+       if node.text?
+         node.swap(coder.encode(node.inner_text, :named))
+       end
+     end
+
+     result = fragment.to_s
+     return result == html ? nil : html[0, html.length] = result
+   end
+ end
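The protocol check in `clean!` can be tried in isolation without Hpricot. The sketch below is a minimal stand-alone version under stated assumptions: `PROTOCOL_SKETCH` is a pattern written in the spirit of `REGEX_PROTOCOL` (the exact scoping of its lookaheads is a choice made for this example, not a quotation of the gem), and `url_allowed?` is a hypothetical helper that follows the same three rules as the `delete_if` block: a valueless attribute is rejected, an explicit protocol must be listed, and a value with no protocol needs `:relative`.

```ruby
# Assumption: a protocol is any run of non-colon characters followed by a
# literal colon or by a (possibly unterminated) decimal or hex colon entity.
PROTOCOL_SKETCH = /^([^:]+)(?:\:|&#0*58(?!\d)|&#x0*3a(?![0-9a-f]))/i

# Hypothetical helper: true when an attribute value may be kept.
def url_allowed?(value, allowed)
  return false if value.nil?           # valueless attributes are dropped

  if value.to_s.downcase =~ PROTOCOL_SKETCH
    allowed.include?($1)               # an explicit protocol must be listed
  else
    allowed.include?(:relative)        # no protocol: requires :relative
  end
end

allowed = ['http', 'https', :relative]

p url_allowed?('http://foo.com/', allowed)           # listed protocol
p url_allowed?('/images/logo.png', allowed)          # relative URL
p url_allowed?('javascript:alert(1)', allowed)       # unlisted protocol
p url_allowed?('javascript&#58alert(1)', allowed)    # incomplete decimal entity
p url_allowed?('JAVASCRIPT&#x3A;alert(1)', allowed)  # hex entity, mixed case
```

Lowercasing the value before matching means the allowed list only ever needs lowercase protocol names, which matches the README's advice to write configuration in lowercase.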
metadata ADDED
@@ -0,0 +1,77 @@
+ --- !ruby/object:Gem::Specification
+ name: adamh-sanitize
+ version: !ruby/object:Gem::Version
+   version: 1.0.4.2
+ platform: ruby
+ authors:
+ - Ryan Grove
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2009-01-11 00:00:00 -08:00
+ default_executable:
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: adamh-hpricot
+   version_requirement:
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: "0.6"
+     version:
+ - !ruby/object:Gem::Dependency
+   name: htmlentities
+   version_requirement:
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ~>
+       - !ruby/object:Gem::Version
+         version: 4.0.0
+     version:
+ description:
+ email: ryan@wonko.com
+ executables: []
+
+ extensions: []
+
+ extra_rdoc_files: []
+
+ files:
+ - HISTORY
+ - LICENSE
+ - README.rdoc
+ - lib/sanitize.rb
+ - lib/sanitize/config.rb
+ - lib/sanitize/config/basic.rb
+ - lib/sanitize/config/relaxed.rb
+ - lib/sanitize/config/restricted.rb
+ has_rdoc: false
+ homepage: http://github.com/rgrove/sanitize/
+ post_install_message:
+ rdoc_options: []
+
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: 1.8.6
+   version:
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+   version:
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.2.0
+ signing_key:
+ specification_version: 2
+ summary: Whitelist-based HTML sanitizer.
+ test_files: []
+
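The `~>` operator used for both dependencies is RubyGems' pessimistic constraint: it allows the last listed version component to grow but nothing above it. That makes the gemspec narrower than the "0.6+" and "4.0.0+" wording in the README suggests. The sketch below checks a few version strings with RubyGems' own `Gem::Requirement` and `Gem::Version` classes; `satisfies?` is a hypothetical helper name, and the version numbers tested are illustrative rather than a list of real releases.

```ruby
require 'rubygems'

# Hypothetical helper: checks a version string against a requirement string.
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

p satisfies?('~> 0.6', '0.8.1')    # still within 0.x
p satisfies?('~> 0.6', '1.0')      # the next major version is excluded
p satisfies?('~> 4.0.0', '4.0.9')  # still within 4.0.x
p satisfies?('~> 4.0.0', '4.1.0')  # the next minor version is excluded
```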