peterc-sanitize 1.0.8

data/HISTORY ADDED
@@ -0,0 +1,58 @@
1
+ Sanitize History
2
+ ================================================================================
3
+
4
+ Version 1.0.8 (2009-04-23)
5
+ * Added a workaround for an Hpricot bug that prevents attribute names from
6
+ being downcased in recent versions of Hpricot. This was exploitable to
7
+ prevent non-whitelisted protocols from being cleaned. [Reported by Ben
8
+ Wanicur]
9
+
10
+ Version 1.0.7 (2009-04-11)
11
+ * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
12
+ * Fixed a bug that caused named character entities containing digits (like
13
+ &sup2;) to be escaped when they shouldn't have been. [Reported by Sebastian
14
+ Steinmetz]
15
+
16
+ Version 1.0.6 (2009-02-23)
17
+ * Removed htmlentities gem dependency.
18
+ * Existing well-formed character entity references in the input string are now
19
+ preserved rather than being decoded and re-encoded.
20
+ * The ' character is now encoded as &#39; instead of &apos; to prevent
21
+ problems in IE6.
22
+ * You can now specify the symbol :all in place of an element name in the
23
+ attributes config hash to allow certain attributes on all elements. [Thanks
24
+ to Mutwin Kraus]
25
+
26
+ Version 1.0.5 (2009-02-05)
27
+ * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
28
+ protocols from being cleaned when relative URLs were allowed. [Reported by
29
+ Dev Purkayastha]
30
+ * Fixed "undefined method `parent='" exceptions caused by parser changes in
31
+ edge Hpricot.
32
+
33
+ Version 1.0.4 (2009-01-16)
34
+ * Fixed a bug that made it possible to sneak a non-whitelisted element through
35
+ by repeating it several times in a row. All versions of Sanitize prior to
36
+ 1.0.4 are vulnerable. [Reported by Cristobal]
37
+
38
+ Version 1.0.3 (2009-01-15)
39
+ * Fixed a bug whereby incomplete Unicode or hex entities could be used to
40
+ prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
41
+ still decode the incomplete entities, users of those browsers may be
42
+ vulnerable to malicious script injection on websites using versions of
43
+ Sanitize prior to 1.0.3.
44
+
45
+ Version 1.0.2 (2009-01-04)
46
+ * Fixed a bug that caused an exception to be thrown when parsing a valueless
47
+ attribute that's expected to contain a URL.
48
+
49
+ Version 1.0.1 (2009-01-01)
50
+ * You can now specify :relative in a protocol config array to allow attributes
51
+ containing relative URLs with no protocol. The Basic and Relaxed configs
52
+ have been updated to allow relative URLs.
53
+ * Added a workaround for an Hpricot bug that causes HTML entities for
54
+ non-ASCII characters to be replaced by question marks, and all other
55
+ entities to be destructively decoded.
56
+
57
+ Version 1.0.0 (2008-12-25)
58
+ * First release.
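
The 1.0.3 entry above is easier to picture with a concrete check. The following is a minimal sketch, not part of the gem: it copies `REGEX_PROTOCOL` from `lib/sanitize.rb` in this release and wraps it in a hypothetical helper, `detected_protocol`, to show that a colon written as an unterminated entity is still treated as a protocol separator.

```ruby
# Pattern copied from lib/sanitize.rb (1.0.8). It matches a protocol prefix
# even when the colon is written as a decimal or hex entity without the
# terminating semicolon.
REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper (not in the gem): returns the lowercased protocol
# prefix, or nil when the value has no protocol-like prefix.
def detected_protocol(value)
  match = REGEX_PROTOCOL.match(value.to_s.downcase)
  match && match[1]
end

detected_protocol('javascript:alert(1)')      # => "javascript"
detected_protocol('javascript&#58alert(1)')   # => "javascript"
detected_protocol('javascript&#x3Aalert(1)')  # => "javascript"
detected_protocol('/relative/path')           # => nil
```

Because the incomplete entity forms yield the same prefix as a literal colon, a whitelist lookup on the result rejects them in the same way.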
data/LICENSE ADDED
@@ -0,0 +1,18 @@
1
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
4
+ this software and associated documentation files (the 'Software'), to deal in
5
+ the Software without restriction, including without limitation the rights to
6
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
7
+ the Software, and to permit persons to whom the Software is furnished to do so,
8
+ subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
15
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
16
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
17
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
18
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,169 @@
1
+ = Sanitize
2
+
3
+ Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
4
+ elements and attributes, Sanitize will remove all unacceptable HTML from a
5
+ string.
6
+
7
+ Using a simple configuration syntax, you can tell Sanitize to allow certain
8
+ elements, certain attributes within those elements, and even certain URL
9
+ protocols within attributes that contain URLs. Any HTML elements or attributes
10
+ that you don't explicitly allow will be removed.
11
+
12
+ Because it's based on Hpricot, a full-fledged HTML parser, rather than a bunch
13
+ of fragile regular expressions, Sanitize has no trouble dealing with malformed
14
+ or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
15
+ caution.
16
+
17
+ *Author*:: Ryan Grove (mailto:ryan@wonko.com)
18
+ *Version*:: 1.0.8 (2009-04-23)
19
+ *Copyright*:: Copyright (c) 2009 Ryan Grove. All rights reserved.
20
+ *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
21
+ *Website*:: http://github.com/rgrove/sanitize
22
+
23
+ == Requires
24
+
25
+ * RubyGems
26
+ * Hpricot 0.8.1+
27
+
28
+ == Usage
29
+
30
+ If you don't specify any configuration options, Sanitize will use its strictest
31
+ settings by default, which means it will strip all HTML.
32
+
33
+ require 'rubygems'
34
+ require 'sanitize'
35
+
36
+ html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
37
+
38
+ Sanitize.clean(html) # => 'foo'
39
+
40
+ == Configuration
41
+
42
+ In addition to the ultra-safe default settings, Sanitize comes with three other
43
+ built-in modes.
44
+
45
+ === Sanitize::Config::RESTRICTED
46
+
47
+ Allows only very simple inline formatting markup. No links, images, or block
48
+ elements.
49
+
50
+ Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
51
+
52
+ === Sanitize::Config::BASIC
53
+
54
+ Allows a variety of markup including formatting tags, links, and lists. Images
55
+ and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
56
+ protocols (plus relative URLs), and a <code>rel="nofollow"</code> attribute is added to all links to
57
+ mitigate SEO spam.
58
+
59
+ Sanitize.clean(html, Sanitize::Config::BASIC)
60
+ # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
61
+
62
+ === Sanitize::Config::RELAXED
63
+
64
+ Allows an even wider variety of markup than BASIC, including images and tables.
65
+ Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
66
+ are limited to HTTP and HTTPS (relative URLs are allowed for both). In this mode, <code>rel="nofollow"</code> is not
67
+ added to links.
68
+
69
+ Sanitize.clean(html, Sanitize::Config::RELAXED)
70
+ # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
71
+
72
+ === Custom Configuration
73
+
74
+ If the built-in modes don't meet your needs, you can easily specify a custom
75
+ configuration:
76
+
77
+ Sanitize.clean(html, :elements => ['a', 'span'],
78
+ :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
79
+ :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
80
+
81
+ ==== :elements
82
+
83
+ Array of element names to allow. Specify all names in lowercase.
84
+
85
+ :elements => [
86
+ 'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
87
+ 'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
88
+ 'sup', 'u', 'ul'
89
+ ]
90
+
91
+ ==== :attributes
92
+
93
+ Attributes to allow for specific elements. Specify all element names and
94
+ attributes in lowercase.
95
+
96
+ :attributes => {
97
+ 'a' => ['href', 'title'],
98
+ 'blockquote' => ['cite'],
99
+ 'img' => ['alt', 'src', 'title']
100
+ }
101
+
102
+ If you'd like to allow certain attributes on all elements, use the symbol
103
+ <code>:all</code> instead of an element name.
104
+
105
+ :attributes => {
106
+ :all => ['class'],
107
+ 'a' => ['href', 'title']
108
+ }
109
+
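
The `:all` rule above can be illustrated without loading the gem. This is a minimal sketch: the config hash is the example from this section, and `allowed_attributes` is a hypothetical helper that mirrors the whitelist expression used in `lib/sanitize.rb`.

```ruby
# Example config taken from the :all section of the README.
ATTRIBUTES_CONFIG = {
  :all => ['class'],
  'a'  => ['href', 'title']
}

# Hypothetical helper (not in the gem) mirroring the expression in
# lib/sanitize.rb: element-specific attributes followed by the :all list,
# with duplicates removed.
def allowed_attributes(attributes, element)
  ((attributes[element] || []) + (attributes[:all] || [])).uniq
end

allowed_attributes(ATTRIBUTES_CONFIG, 'a')     # => ["href", "title", "class"]
allowed_attributes(ATTRIBUTES_CONFIG, 'span')  # => ["class"]
```

An element with no entry of its own still receives the `:all` attributes, which is why `span` ends up with `class` here.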
110
+ ==== :add_attributes
111
+
112
+ Attributes to add to specific elements. If the attribute already exists, it will
113
+ be replaced with the value specified here. Specify all element names and
114
+ attributes in lowercase.
115
+
116
+ :add_attributes => {
117
+ 'a' => {'rel' => 'nofollow'}
118
+ }
119
+
120
+ ==== :protocols
121
+
122
+ URL protocols to allow in specific attributes. If an attribute is listed here
123
+ and contains a protocol other than those specified (or if it contains no
124
+ protocol at all), it will be removed.
125
+
126
+ :protocols => {
127
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
128
+ 'img' => {'src' => ['http', 'https']}
129
+ }
130
+
131
+ If you'd like to allow the use of relative URLs which don't have a protocol,
132
+ include the symbol <code>:relative</code> in the protocol array:
133
+
134
+ :protocols => {
135
+ 'a' => {'href' => ['http', 'https', :relative]}
136
+ }
137
+
138
+
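
The `:protocols` rules above, including `:relative`, can be sketched as a single predicate. This is an illustration under assumptions rather than the gem's API: `protocol_allowed?` is a hypothetical name, and its body follows the attribute check in `lib/sanitize.rb`, with `REGEX_PROTOCOL` copied from that file.

```ruby
# Pattern copied from lib/sanitize.rb (1.0.8).
REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper (not in the gem): true when +value+ may stay in an
# attribute whose allowed protocol list is +allowed+.
def protocol_allowed?(value, allowed)
  return false if value.nil? # valueless URL attributes are removed

  if value.to_s.downcase =~ REGEX_PROTOCOL
    allowed.include?($1.downcase)  # explicit protocol: must be whitelisted
  else
    allowed.include?(:relative)    # no protocol: needs :relative
  end
end

protocol_allowed?('http://foo.com/', ['http', 'https', :relative])      # => true
protocol_allowed?('javascript:alert(1)', ['http', 'https', :relative])  # => false
protocol_allowed?('/about', ['http', 'https', :relative])               # => true
protocol_allowed?('/about', ['http', 'https'])                          # => false
```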
139
+ == Contributors
140
+
141
+ The following lovely people have contributed to Sanitize in the form of patches
142
+ or ideas that later became code:
143
+
144
+ * Ryan Grove <ryan@wonko.com>
145
+ * Adam Hooper <adam@adamhooper.com>
146
+ * Mutwin Kraus <mutle@blogage.de>
147
+ * Dev Purkayastha <dev.purkayastha@gmail.com>
148
+ * Ben Wanicur <bwanicur@verticalresponse.com>
149
+
150
+ == License
151
+
152
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
153
+
154
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
155
+ this software and associated documentation files (the 'Software'), to deal in
156
+ the Software without restriction, including without limitation the rights to
157
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
158
+ the Software, and to permit persons to whom the Software is furnished to do so,
159
+ subject to the following conditions:
160
+
161
+ The above copyright notice and this permission notice shall be included in all
162
+ copies or substantial portions of the Software.
163
+
164
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
165
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
166
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
167
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
168
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
169
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/lib/sanitize.rb ADDED
@@ -0,0 +1,187 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ # Prepend this file's directory to the load path, then drop duplicate entries.
24
+ $:.unshift(File.dirname(File.expand_path(__FILE__)))
25
+ $:.uniq!
26
+
27
+ require 'rubygems'
28
+
29
+ gem 'hpricot', '~> 0.8.1'
30
+
31
+ require 'hpricot'
32
+ require 'sanitize/config'
33
+ require 'sanitize/config/restricted'
34
+ require 'sanitize/config/basic'
35
+ require 'sanitize/config/relaxed'
36
+
37
+ class Sanitize
38
+
39
+ # Characters that should be replaced with entities in text nodes.
40
+ ENTITY_MAP = {
41
+ '<' => '&lt;',
42
+ '>' => '&gt;',
43
+ '"' => '&quot;',
44
+ "'" => '&#39;'
45
+ }
46
+
47
+ # Matches an unencoded ampersand that is not part of a valid character entity
48
+ # reference.
49
+ REGEX_AMPERSAND = /&(?!(?:[a-z]+[0-9]{0,2}|#[0-9]+|#x[0-9a-f]+);)/i
50
+
51
+ # Matches an attribute value that could be treated by a browser as a URL
52
+ # with a protocol prefix, such as "http:" or "javascript:". Any string of zero
53
+ # or more characters followed by a colon is considered a match, even if the
54
+ # colon is encoded as an entity and even if it's an incomplete entity (which
55
+ # IE6 and Opera will still parse).
56
+ REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i
57
+
58
+ #--
59
+ # Instance Methods
60
+ #++
61
+
62
+ # Returns a new Sanitize object initialized with the settings in _config_.
63
+ def initialize(config = {})
64
+ @config = Config::DEFAULT.merge(config)
65
+ end
66
+
67
+ # Returns a sanitized copy of _html_.
68
+ def clean(html)
69
+ dupe = html.dup
70
+ clean!(dupe) || dupe
71
+ end
72
+
73
+ # Performs clean in place, returning _html_, or +nil+ if no changes were
74
+ # made.
75
+ def clean!(html)
76
+ fragment = Hpricot(html)
77
+
78
+ fragment.search('*') do |node|
79
+ if node.bogusetag? || node.doctype? || node.procins? || node.xmldecl?
80
+ node.parent.replace_child(node, '')
81
+ next
82
+ end
83
+
84
+ if node.comment?
85
+ node.parent.replace_child(node, '') unless @config[:allow_comments]
86
+ elsif node.elem?
87
+ name = node.name.to_s.downcase
88
+
89
+ # Delete any element that isn't in the whitelist.
90
+ unless @config[:elements].include?(name)
91
+ node.parent.replace_child(node, node.children || '')
92
+ next
93
+ end
94
+
95
+ node.raw_attributes ||= {}
96
+
97
+ attr_whitelist = ((@config[:attributes][name] || []) +
98
+ (@config[:attributes][:all] || [])).uniq
99
+
100
+ if attr_whitelist.empty?
101
+ # Delete all attributes from elements with no whitelisted
102
+ # attributes.
103
+ node.raw_attributes = {}
104
+ else
105
+ # Delete any attribute that isn't in the whitelist for this element.
106
+ node.raw_attributes.delete_if do |key, value|
107
+ !attr_whitelist.include?(key.to_s.downcase)
108
+ end
109
+
110
+ # Delete remaining attributes that use unacceptable protocols.
111
+ if @config[:protocols].has_key?(name)
112
+ protocol = @config[:protocols][name]
113
+
114
+ node.raw_attributes.delete_if do |key, value|
115
+ key = key.to_s.downcase
116
+ next false unless protocol.has_key?(key)
117
+ next true if value.nil?
118
+
119
+ if value.to_s.downcase =~ REGEX_PROTOCOL
120
+ !protocol[key].include?($1.downcase)
121
+ else
122
+ !protocol[key].include?(:relative)
123
+ end
124
+ end
125
+ end
126
+ end
127
+
128
+ # Add required attributes.
129
+ if @config[:add_attributes].has_key?(name)
130
+ node.raw_attributes.merge!(@config[:add_attributes][name])
131
+ end
132
+
133
+ unless @config[:skip_encoding]
134
+ # Escape special chars in attribute values.
135
+ node.raw_attributes.each do |key, value|
136
+ node.raw_attributes[key] = Sanitize.encode_html(value)
137
+ end
138
+ end
139
+ end
140
+ end
141
+
142
+ # Make one last pass through the fragment and encode all special HTML chars
143
+ # as entities. This eliminates certain types of maliciously-malformed nested
144
+ # tags.
145
+ unless @config[:skip_encoding]
146
+ fragment.search('*') do |node|
147
+ node.swap(Sanitize.encode_html(node.to_original_html)) if node.text?
148
+ end
149
+ end
150
+
151
+ result = fragment.to_s
152
+ return result == html ? nil : html[0, html.length] = result
153
+ end
154
+
155
+ #--
156
+ # Class Methods
157
+ #++
158
+
159
+ class << self
160
+ # Returns a sanitized copy of _html_, using the settings in _config_ if
161
+ # specified.
162
+ def clean(html, config = {})
163
+ sanitize = Sanitize.new(config)
164
+ sanitize.clean(html)
165
+ end
166
+
167
+ # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
168
+ # were made.
169
+ def clean!(html, config = {})
170
+ sanitize = Sanitize.new(config)
171
+ sanitize.clean!(html)
172
+ end
173
+
174
+ # Encodes special HTML characters (<, >, ", ', and &) in _html_ as entity
175
+ # references and returns the encoded string.
176
+ def encode_html(html)
177
+ str = html.dup
178
+
179
+ # Encode special chars.
180
+ ENTITY_MAP.each {|char, entity| str.gsub!(char, entity) }
181
+
182
+ # Convert unencoded ampersands to entity references.
183
+ str.gsub(REGEX_AMPERSAND, '&amp;')
184
+ end
185
+ end
186
+
187
+ end
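
The entity handling in `Sanitize.encode_html` can be tried in isolation, without Hpricot. The sketch below copies `ENTITY_MAP`, `REGEX_AMPERSAND`, and the method body from the file above into a top-level method; only the sample inputs are assumptions.

```ruby
# Constants copied from lib/sanitize.rb (1.0.8).
ENTITY_MAP = {
  '<' => '&lt;',
  '>' => '&gt;',
  '"' => '&quot;',
  "'" => '&#39;'
}
REGEX_AMPERSAND = /&(?!(?:[a-z]+[0-9]{0,2}|#[0-9]+|#x[0-9a-f]+);)/i

# Same steps as Sanitize.encode_html, defined at top level for the sketch.
def encode_html(html)
  str = html.dup

  # Encode <, >, ", and ' first.
  ENTITY_MAP.each { |char, entity| str.gsub!(char, entity) }

  # Then encode only the ampersands that do not start a well-formed entity.
  str.gsub(REGEX_AMPERSAND, '&amp;')
end

encode_html('Tom & Jerry')         # => "Tom &amp; Jerry"
encode_html('x&sup2; &copy; <b>')  # => "x&sup2; &copy; &lt;b&gt;"
encode_html("it's")                # => "it&#39;s"
```

Running the ampersand pass last is what keeps the entities produced by `ENTITY_MAP`, and any well-formed entities already in the input, from being double-encoded.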
data/lib/sanitize/config.rb ADDED
@@ -0,0 +1,49 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ DEFAULT = {
26
+ # Whether or not to allow HTML comments. Allowing comments is strongly
27
+ # discouraged, since IE allows script execution within conditional
28
+ # comments.
29
+ :allow_comments => false,
30
+
31
+ # HTML elements to allow. By default, no elements are allowed (which means
32
+ # that all HTML will be stripped).
33
+ :elements => [],
34
+
35
+ # HTML attributes to allow in specific elements. By default, no attributes
36
+ # are allowed.
37
+ :attributes => {},
38
+
39
+ # HTML attributes to add to specific elements. By default, no attributes
40
+ # are added.
41
+ :add_attributes => {},
42
+
43
+ # URL handling protocols to allow in specific attributes. By default, no
44
+ # protocols are allowed. Use :relative in place of a protocol if you want
45
+ # to allow relative URLs sans protocol.
46
+ :protocols => {}
47
+ }
48
+ end
49
+ end
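
`Sanitize#initialize` combines these defaults with the caller's config using `Hash#merge`, which is a shallow merge: a key present in the custom config replaces the default value wholesale, and missing keys fall back to the defaults. A minimal sketch, with the two hashes copied from `config.rb` and `config/restricted.rb` into top-level constants for illustration:

```ruby
# Copied from Sanitize::Config::DEFAULT.
DEFAULT = {
  :allow_comments => false,
  :elements       => [],
  :attributes     => {},
  :add_attributes => {},
  :protocols      => {}
}

# Copied from Sanitize::Config::RESTRICTED.
RESTRICTED = {
  :elements => ['b', 'em', 'i', 'strong', 'u']
}

# The same call Sanitize#initialize makes.
config = DEFAULT.merge(RESTRICTED)

config[:elements]        # => ["b", "em", "i", "strong", "u"]
config[:attributes]      # => {} (falls back to the default)
config[:allow_comments]  # => false
DEFAULT[:elements]       # => [] (merge returns a new hash)
```

One consequence of the shallow merge is that passing your own `:attributes` hash does not extend a built-in mode's hash; it replaces it.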
data/lib/sanitize/config/basic.rb ADDED
@@ -0,0 +1,49 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ BASIC = {
26
+ :elements => [
27
+ 'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
28
+ 'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
29
+ 'sup', 'u', 'ul'],
30
+
31
+ :attributes => {
32
+ 'a' => ['href'],
33
+ 'blockquote' => ['cite'],
34
+ 'q' => ['cite']
35
+ },
36
+
37
+ :add_attributes => {
38
+ 'a' => {'rel' => 'nofollow'}
39
+ },
40
+
41
+ :protocols => {
42
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
43
+ :relative]},
44
+ 'blockquote' => {'cite' => ['http', 'https', :relative]},
45
+ 'q' => {'cite' => ['http', 'https', :relative]}
46
+ }
47
+ }
48
+ end
49
+ end
data/lib/sanitize/config/relaxed.rb ADDED
@@ -0,0 +1,56 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ RELAXED = {
26
+ :elements => [
27
+ 'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
28
+ 'colgroup', 'dd', 'dl', 'dt', 'em', 'i', 'img', 'li', 'ol', 'p', 'pre',
29
+ 'q', 'small', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td',
30
+ 'tfoot', 'th', 'thead', 'tr', 'u', 'ul'],
31
+
32
+ :attributes => {
33
+ 'a' => ['href', 'title'],
34
+ 'blockquote' => ['cite'],
35
+ 'col' => ['span', 'width'],
36
+ 'colgroup' => ['span', 'width'],
37
+ 'img' => ['align', 'alt', 'height', 'src', 'title', 'width'],
38
+ 'ol' => ['start', 'type'],
39
+ 'q' => ['cite'],
40
+ 'table' => ['summary', 'width'],
41
+ 'td' => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
42
+ 'th' => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
43
+ 'width'],
44
+ 'ul' => ['type']
45
+ },
46
+
47
+ :protocols => {
48
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
49
+ :relative]},
50
+ 'blockquote' => {'cite' => ['http', 'https', :relative]},
51
+ 'img' => {'src' => ['http', 'https', :relative]},
52
+ 'q' => {'cite' => ['http', 'https', :relative]}
53
+ }
54
+ }
55
+ end
56
+ end
data/lib/sanitize/config/restricted.rb ADDED
@@ -0,0 +1,29 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ RESTRICTED = {
26
+ :elements => ['b', 'em', 'i', 'strong', 'u']
27
+ }
28
+ end
29
+ end
metadata ADDED
@@ -0,0 +1,71 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: peterc-sanitize
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.8
5
+ platform: ruby
6
+ authors:
7
+ - Ryan Grove
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+
12
+ date: 2009-10-05 00:00:00 +01:00
13
+ default_executable:
14
+ dependencies:
15
+ - !ruby/object:Gem::Dependency
16
+ name: hpricot
17
+ type: :runtime
18
+ version_requirement:
19
+ version_requirements: !ruby/object:Gem::Requirement
20
+ requirements:
21
+ - - ~>
22
+ - !ruby/object:Gem::Version
23
+ version: 0.8.1
24
+ version:
25
+ description:
26
+ email: ryan@wonko.com
27
+ executables: []
28
+
29
+ extensions: []
30
+
31
+ extra_rdoc_files: []
32
+
33
+ files:
34
+ - HISTORY
35
+ - LICENSE
36
+ - README.rdoc
37
+ - lib/sanitize.rb
38
+ - lib/sanitize/config.rb
39
+ - lib/sanitize/config/basic.rb
40
+ - lib/sanitize/config/relaxed.rb
41
+ - lib/sanitize/config/restricted.rb
42
+ has_rdoc: true
43
+ homepage: http://github.com/peterc/sanitize/
44
+ licenses: []
45
+
46
+ post_install_message:
47
+ rdoc_options: []
48
+
49
+ require_paths:
50
+ - lib
51
+ required_ruby_version: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: 1.8.6
56
+ version:
57
+ required_rubygems_version: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: "0"
62
+ version:
63
+ requirements: []
64
+
65
+ rubyforge_project:
66
+ rubygems_version: 1.3.5
67
+ signing_key:
68
+ specification_version: 3
69
+ summary: Whitelist-based HTML sanitizer.
70
+ test_files: []
71
+
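
The `~> 0.8.1` requirement on hpricot in the metadata above is a pessimistic constraint: it accepts 0.8.1 and later patch releases but not 0.9.0. A short sketch using the standard RubyGems classes shows the range; the version numbers checked are arbitrary examples, not a list of real Hpricot releases.

```ruby
require 'rubygems'

# The same requirement string that appears in the gemspec and in
# lib/sanitize.rb.
HPRICOT_REQUIREMENT = Gem::Requirement.new('~> 0.8.1')

HPRICOT_REQUIREMENT.satisfied_by?(Gem::Version.new('0.8.1'))  # => true
HPRICOT_REQUIREMENT.satisfied_by?(Gem::Version.new('0.8.6'))  # => true
HPRICOT_REQUIREMENT.satisfied_by?(Gem::Version.new('0.8.0'))  # => false
HPRICOT_REQUIREMENT.satisfied_by?(Gem::Version.new('0.9.0'))  # => false
```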