dasil003-sanitize 1.1.0

data/HISTORY ADDED
@@ -0,0 +1,65 @@
1
+ Sanitize History
2
+ ================================================================================
3
+
4
+ Version 1.1.0 (2009-10-11)
5
+ * Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
6
+ * Added an :output config setting to allow the output format to be specified.
7
+ Supported formats are :xhtml (the default) and :html (which outputs HTML4).
8
+ * Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
9
+ path segments. [Peter Cooper]
10
+
11
+ Version 1.0.8 (2009-04-23)
12
+ * Added a workaround for an Hpricot bug that prevents attribute names from
13
+ being downcased in recent versions of Hpricot. This was exploitable to
14
+ prevent non-whitelisted protocols from being cleaned. [Reported by Ben
15
+ Wanicur]
16
+
17
+ Version 1.0.7 (2009-04-11)
18
+ * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
19
+ * Fixed a bug that caused named character entities containing digits (like
20
+ &sup2;) to be escaped when they shouldn't have been. [Reported by Sebastian
21
+ Steinmetz]
22
+
23
+ Version 1.0.6 (2009-02-23)
24
+ * Removed htmlentities gem dependency.
25
+ * Existing well-formed character entity references in the input string are now
26
+ preserved rather than being decoded and re-encoded.
27
+ * The ' character is now encoded as &#39; instead of &apos; to prevent
28
+ problems in IE6.
29
+ * You can now specify the symbol :all in place of an element name in the
30
+ attributes config hash to allow certain attributes on all elements. [Thanks
31
+ to Mutwin Kraus]
32
+
33
+ Version 1.0.5 (2009-02-05)
34
+ * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
35
+ protocols from being cleaned when relative URLs were allowed. [Reported by
36
+ Dev Purkayastha]
37
+ * Fixed "undefined method `parent='" exceptions caused by parser changes in
38
+ edge Hpricot.
39
+
40
+ Version 1.0.4 (2009-01-16)
41
+ * Fixed a bug that made it possible to sneak a non-whitelisted element through
42
+ by repeating it several times in a row. All versions of Sanitize prior to
43
+ 1.0.4 are vulnerable. [Reported by Cristobal]
44
+
45
+ Version 1.0.3 (2009-01-15)
46
+ * Fixed a bug whereby incomplete Unicode or hex entities could be used to
47
+ prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
48
+ still decode the incomplete entities, users of those browsers may be
49
+ vulnerable to malicious script injection on websites using versions of
50
+ Sanitize prior to 1.0.3.
51
+
52
+ Version 1.0.2 (2009-01-04)
53
+ * Fixed a bug that caused an exception to be thrown when parsing a valueless
54
+ attribute that's expected to contain a URL.
55
+
56
+ Version 1.0.1 (2009-01-01)
57
+ * You can now specify :relative in a protocol config array to allow attributes
58
+ containing relative URLs with no protocol. The Basic and Relaxed configs
59
+ have been updated to allow relative URLs.
60
+ * Added a workaround for an Hpricot bug that causes HTML entities for
61
+ non-ASCII characters to be replaced by question marks, and all other
62
+ entities to be destructively decoded.
63
+
64
+ Version 1.0.0 (2008-12-25)
65
+ * First release.
data/LICENSE ADDED
@@ -0,0 +1,18 @@
1
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
4
+ this software and associated documentation files (the 'Software'), to deal in
5
+ the Software without restriction, including without limitation the rights to
6
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
7
+ the Software, and to permit persons to whom the Software is furnished to do so,
8
+ subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
15
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
16
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
17
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
18
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,212 @@
1
+ = Sanitize
2
+
3
+ Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
4
+ elements and attributes, Sanitize will remove all unacceptable HTML from a
5
+ string.
6
+
7
+ Using a simple configuration syntax, you can tell Sanitize to allow certain
8
+ elements, certain attributes within those elements, and even certain URL
9
+ protocols within attributes that contain URLs. Any HTML elements or attributes
10
+ that you don't explicitly allow will be removed.
11
+
12
+ Because it's based on Nokogiri, a full-fledged HTML parser, rather than a bunch
13
+ of fragile regular expressions, Sanitize has no trouble dealing with malformed
14
+ or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
15
+ caution.
16
+
17
+ *Author*:: Ryan Grove (mailto:ryan@wonko.com)
18
+ *Version*:: 1.1.0 (2009-10-11)
19
+ *Copyright*:: Copyright (c) 2009 Ryan Grove. All rights reserved.
20
+ *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
21
+ *Website*:: http://github.com/rgrove/sanitize
22
+
23
+ == Requires
24
+
25
+ * Nokogiri
26
+ * libxml2 >= 2.7.2
27
+
28
+ == Installation
29
+
30
+ Latest stable release:
31
+
32
+ gem install sanitize
33
+
34
+ Latest development version:
35
+
36
+ gem install sanitize -s http://gemcutter.org --prerelease
37
+
38
+ == Usage
39
+
40
+ If you don't specify any configuration options, Sanitize will use its strictest
41
+ settings by default, which means it will strip all HTML.
42
+
43
+ require 'rubygems'
44
+ require 'sanitize'
45
+
46
+ html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
47
+
48
+ Sanitize.clean(html) # => 'foo'
49
+
50
+ == Configuration
51
+
52
+ In addition to the ultra-safe default settings, Sanitize comes with three other
53
+ built-in modes.
54
+
55
+ === Sanitize::Config::RESTRICTED
56
+
57
+ Allows only very simple inline formatting markup. No links, images, or block
58
+ elements.
59
+
60
+ Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
61
+
62
+ === Sanitize::Config::BASIC
63
+
64
+ Allows a variety of markup including formatting tags, links, and lists. Images
65
+ and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
66
+ protocols (relative URLs are also accepted), and a <code>rel="nofollow"</code> attribute is added to all links to
67
+ mitigate SEO spam.
68
+
69
+ Sanitize.clean(html, Sanitize::Config::BASIC)
70
+ # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
71
+
72
+ === Sanitize::Config::RELAXED
73
+
74
+ Allows an even wider variety of markup than BASIC, including images and tables.
75
+ Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
76
+ are limited to HTTP and HTTPS; both also accept relative URLs. In this mode, <code>rel="nofollow"</code> is not
77
+ added to links.
78
+
79
+ Sanitize.clean(html, Sanitize::Config::RELAXED)
80
+ # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
81
+
82
+ === Custom Configuration
83
+
84
+ If the built-in modes don't meet your needs, you can easily specify a custom
85
+ configuration:
86
+
87
+ Sanitize.clean(html, :elements => ['a', 'span'],
88
+ :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
89
+ :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
90
+
91
+ ==== :elements
92
+
93
+ Array of element names to allow. Specify all names in lowercase.
94
+
95
+ :elements => [
96
+ 'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
97
+ 'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
98
+ 'sup', 'u', 'ul'
99
+ ]
100
+
101
+ ==== :attributes
102
+
103
+ Attributes to allow for specific elements. Specify all element names and
104
+ attributes in lowercase.
105
+
106
+ :attributes => {
107
+ 'a' => ['href', 'title'],
108
+ 'blockquote' => ['cite'],
109
+ 'img' => ['alt', 'src', 'title']
110
+ }
111
+
112
+ If you'd like to allow certain attributes on all elements, use the symbol
113
+ <code>:all</code> instead of an element name.
114
+
115
+ :attributes => {
116
+ :all => ['class'],
117
+ 'a' => ['href', 'title']
118
+ }
119
+
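As a rough illustration of how <code>:all</code> combines with per-element entries, the sketch below mirrors the whitelist union performed in lib/sanitize.rb. The helper name attr_whitelist_for is hypothetical and is not part of Sanitize's API.

```ruby
# Hypothetical helper (not part of Sanitize): computes the attributes allowed
# on one element by combining its own entry with the :all entry.
def attr_whitelist_for(attributes, element)
  ((attributes[element] || []) + (attributes[:all] || [])).uniq
end

attributes = {
  :all => ['class'],
  'a'  => ['href', 'title']
}

p attr_whitelist_for(attributes, 'a')    # per-element entries first, then :all
p attr_whitelist_for(attributes, 'span') # only the :all entries apply
```

An element with no entry of its own still receives whatever is listed under <code>:all</code>.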
120
+ ==== :add_attributes
121
+
122
+ Attributes to add to specific elements. If the attribute already exists, it will
123
+ be replaced with the value specified here. Specify all element names and
124
+ attributes in lowercase.
125
+
126
+ :add_attributes => {
127
+ 'a' => {'rel' => 'nofollow'}
128
+ }
129
+
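The replace-on-conflict behaviour described above can be sketched with a plain Hash standing in for an element's attributes. The helper apply_add_attributes is hypothetical; the library itself assigns each value directly on the Nokogiri node.

```ruby
# Hypothetical helper (not part of Sanitize): applies :add_attributes to a
# plain Hash that stands in for an element's attributes.
def apply_add_attributes(element, attrs, add_attributes)
  forced = add_attributes[element] || {}
  attrs.merge(forced) # values from :add_attributes win over existing ones
end

add_attributes = { 'a' => { 'rel' => 'nofollow' } }

# An existing rel value is overwritten.
p apply_add_attributes('a', { 'href' => 'http://foo.com/', 'rel' => 'friend' }, add_attributes)
# Elements without an entry are left alone.
p apply_add_attributes('b', {}, add_attributes)
```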
130
+ ==== :protocols
131
+
132
+ URL protocols to allow in specific attributes. If an attribute is listed here
133
+ and contains a protocol other than those specified (or if it contains no
134
+ protocol at all), it will be removed.
135
+
136
+ :protocols => {
137
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
138
+ 'img' => {'src' => ['http', 'https']}
139
+ }
140
+
141
+ If you'd like to allow the use of relative URLs which don't have a protocol,
142
+ include the symbol <code>:relative</code> in the protocol array:
143
+
144
+ :protocols => {
145
+ 'a' => {'href' => ['http', 'https', :relative]}
146
+ }
147
+
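The protocol check can be tried in isolation. In the sketch below the regular expression is copied from lib/sanitize.rb in this release, while the helper name url_allowed? is hypothetical and only mirrors the decision the library makes for each listed attribute.

```ruby
# Copied from lib/sanitize.rb in this release.
REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper (not part of Sanitize): true when the attribute value
# would be kept for the given list of allowed protocols.
def url_allowed?(value, allowed)
  if value.to_s.downcase =~ REGEX_PROTOCOL
    allowed.include?($1.downcase)   # explicit protocol: must be listed
  else
    allowed.include?(:relative)     # no protocol: needs :relative
  end
end

p url_allowed?('http://foo.com/', ['http', 'https'])                # => true
p url_allowed?('javascript:alert(1)', ['http', 'https', :relative]) # => false
p url_allowed?('javascript&#58;alert(1)', ['http', :relative])      # => false
p url_allowed?('/foo/bar', ['http', 'https'])                       # => false
p url_allowed?('foo/bar:baz', ['http', :relative])                  # => true
```

The last call shows a colon inside a path segment: because a slash cannot be part of the protocol match, the value is treated as a relative URL.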
148
+ ==== :object_urls
149
+
150
+ URL prefixes that identify which Flash embed codes to allow. This can be used to allow
151
+ standard video embeds such as those provided by YouTube:
152
+
153
+ :object_urls => ['http://www.youtube.com']
154
+
155
+ *Warning* Do not under any circumstances add 'object' or 'embed' to the standard
156
+ config. It is unnecessary and will open an XSS hole.
157
+
158
+ Because object tags are more complex than most other tags and include many
159
+ XSS attack vectors, this functionality follows a completely different code path
160
+ from the regular filtering. There is a secondary configuration variable that
161
+ controls what is allowed on the object tag and its descendants. By default it
162
+ is:
163
+
164
+ :object_config => {
165
+ :elements => ['object', 'param', 'embed'],
166
+ :attributes => {
167
+ 'object' => ['width', 'height'],
168
+ 'param' => ['name', 'value'],
169
+ 'embed' => ['src', 'type', 'allowscriptaccess', 'allowfullscreen',
170
+ 'width', 'height']
171
+ }}
172
+
173
+ This config is applied to the object tag and all of its immediate descendants
174
+ instead of the standard config. This initial configuration was crafted specifically
175
+ to allow YouTube and Vimeo embed codes in this format:
176
+
177
+ <object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/qVaEPx_VyXs&hl=en&fs=1&color1=0xcc2550&color2=0xe87a9f"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/qVaEPx_VyXs&hl=en&fs=1&color1=0xcc2550&color2=0xe87a9f" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object>
178
+
179
+
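The prefix matching behind <code>:object_urls</code> can be sketched without Nokogiri. The helper object_url_allowed? is hypothetical, and the evil.example host is made up for illustration; the comparison itself is the same index(...) == 0 test used in lib/sanitize.rb.

```ruby
# Hypothetical helper (not part of Sanitize): true when url starts with one of
# the configured prefixes.
def object_url_allowed?(url, object_urls)
  object_urls.any? { |good| url.index(good) == 0 }
end

prefixes = ['http://www.youtube.com']

p object_url_allowed?('http://www.youtube.com/v/qVaEPx_VyXs', prefixes)    # => true
p object_url_allowed?('http://evil.example/v/x', prefixes)                 # => false
# A prefix without a trailing slash also matches look-alike hosts:
p object_url_allowed?('http://www.youtube.com.evil.example/v/x', prefixes) # => true
p object_url_allowed?('http://www.youtube.com.evil.example/v/x',
                      ['http://www.youtube.com/'])                         # => false
```

Because this is a plain string-prefix comparison, ending each prefix with a slash may be the safer choice. Treat that as a suggestion drawn from the sketch rather than documented guidance.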
180
+ == Contributors
181
+
182
+ The following lovely people have contributed to Sanitize in the form of patches
183
+ or ideas that later became code:
184
+
185
+ * Peter Cooper <git@peterc.org>
186
+ * Gabe da Silveira <gabe@websaviour.com>
187
+ * Ryan Grove <ryan@wonko.com>
188
+ * Adam Hooper <adam@adamhooper.com>
189
+ * Mutwin Kraus <mutle@blogage.de>
190
+ * Dev Purkayastha <dev.purkayastha@gmail.com>
191
+ * Ben Wanicur <bwanicur@verticalresponse.com>
192
+
193
+ == License
194
+
195
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
196
+
197
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
198
+ this software and associated documentation files (the 'Software'), to deal in
199
+ the Software without restriction, including without limitation the rights to
200
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
201
+ the Software, and to permit persons to whom the Software is furnished to do so,
202
+ subject to the following conditions:
203
+
204
+ The above copyright notice and this permission notice shall be included in all
205
+ copies or substantial portions of the Software.
206
+
207
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
208
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
209
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
210
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
211
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
212
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/lib/sanitize.rb ADDED
@@ -0,0 +1,188 @@
1
+ # encoding: utf-8
2
+ #--
3
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
4
+ #
5
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ # of this software and associated documentation files (the 'Software'), to deal
7
+ # in the Software without restriction, including without limitation the rights
8
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ # copies of the Software, and to permit persons to whom the Software is
10
+ # furnished to do so, subject to the following conditions:
11
+ #
12
+ # The above copyright notice and this permission notice shall be included in all
13
+ # copies or substantial portions of the Software.
14
+ #
15
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ # SOFTWARE.
22
+ #++
23
+
24
+ require 'nokogiri'
25
+ require 'sanitize/version'
26
+ require 'sanitize/config'
27
+ require 'sanitize/config/restricted'
28
+ require 'sanitize/config/basic'
29
+ require 'sanitize/config/relaxed'
30
+
31
+ class Sanitize
32
+
33
+ # Matches an attribute value that could be treated by a browser as a URL
34
+ # with a protocol prefix, such as "http:" or "javascript:". Any string of zero
35
+ # or more characters followed by a colon is considered a match, even if the
36
+ # colon is encoded as an entity and even if it's an incomplete entity (which
37
+ # IE6 and Opera will still parse).
38
+ REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i
39
+
40
+ #--
41
+ # Instance Methods
42
+ #++
43
+
44
+ # Returns a new Sanitize object initialized with the settings in _config_.
45
+ def initialize(config = {})
46
+ @config = Config::DEFAULT.merge(config)
47
+ end
48
+
49
+ # Returns a sanitized copy of _html_.
50
+ def clean(html)
51
+ dupe = html.dup
52
+ clean!(dupe) || dupe
53
+ end
54
+
55
+ # Performs clean in place, returning _html_, or +nil+ if no changes were
56
+ # made.
57
+ def clean!(html)
58
+ fragment = Nokogiri::HTML::DocumentFragment.parse(html)
59
+
60
+ fragment.traverse do |node|
61
+ if node.comment?
62
+ node.unlink unless @config[:allow_comments]
63
+ elsif node.element?
64
+ name = node.name.to_s.downcase
65
+ parent_name = node.parent ? node.parent.name.to_s.downcase : nil
66
+
67
+ # Special handling of objects is necessary to limit by specific domains.
68
+ if @config[:object_urls].any? &&
69
+ [name, parent_name].include?('object')
70
+ unless @config[:object_config][:elements].include?(name)
71
+ node.unlink
72
+ next
73
+ end
74
+
75
+ attr_whitelist = @config[:object_config][:attributes][name] || []
76
+
77
+ # Remove non-whitelisted object interior tag attributes
78
+ node.attribute_nodes.each do |attr|
79
+ attr.unlink unless attr_whitelist.include?(attr.name.downcase)
80
+ end
81
+
82
+ # Remove non-whitelisted object URLs.
83
+ object_url = if name == 'param' && node['name'] == 'movie'
84
+ node['value']
85
+ elsif name == 'embed'
86
+ node['src']
87
+ end
88
+
89
+ if object_url &&
90
+ !@config[:object_urls].any?{|good| object_url.index(good) == 0}
91
+ node.parent.unlink
92
+ end
93
+
94
+ next
95
+ end
96
+
97
+ # Delete any element that isn't in the whitelist.
98
+ unless @config[:elements].include?(name)
99
+ node.children.each { |n| node.add_previous_sibling(n) }
100
+ node.unlink
101
+ next
102
+ end
103
+
104
+ attr_whitelist = ((@config[:attributes][name] || []) +
105
+ (@config[:attributes][:all] || [])).uniq
106
+
107
+ if attr_whitelist.empty?
108
+ # Delete all attributes from elements with no whitelisted
109
+ # attributes.
110
+ node.attribute_nodes.each { |attr| attr.remove }
111
+ else
112
+ # Delete any attribute that isn't in the whitelist for this element.
113
+ node.attribute_nodes.each do |attr|
114
+ attr.unlink unless attr_whitelist.include?(attr.name.downcase)
115
+ end
116
+
117
+ # Delete remaining attributes that use unacceptable protocols.
118
+ if @config[:protocols].has_key?(name)
119
+ protocol = @config[:protocols][name]
120
+
121
+ node.attribute_nodes.each do |attr|
122
+ attr_name = attr.name.downcase
123
+ next false unless protocol.has_key?(attr_name)
124
+
125
+ del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
126
+ !protocol[attr_name].include?($1.downcase)
127
+ else
128
+ !protocol[attr_name].include?(:relative)
129
+ end
130
+
131
+ attr.unlink if del
132
+ end
133
+ end
134
+ end
135
+
136
+ # Add required attributes.
137
+ if @config[:add_attributes].has_key?(name)
138
+ @config[:add_attributes][name].each do |key, val|
139
+ node[key] = val
140
+ end
141
+ end
142
+ elsif node.cdata?
143
+ node.replace(Nokogiri::XML::Text.new(node.text, node.document))
144
+ end
145
+ end
146
+
147
+ if @config[:output] == :xhtml
148
+ output_method = fragment.method(:to_xhtml)
149
+ elsif @config[:output] == :html
150
+ output_method = fragment.method(:to_html)
151
+ else
152
+ raise ArgumentError, "unsupported output format: #{@config[:output]}"
153
+ end
154
+
155
+ if RUBY_VERSION >= '1.9'
156
+ # Nokogiri 1.3.3 (and possibly earlier versions) always returns a US-ASCII
157
+ # string no matter what we ask for. This will be fixed in 1.4.0, but for
158
+ # now we have to hack around it to prevent errors.
159
+ result = output_method.call(:encoding => 'utf-8', :indent => 0).force_encoding('utf-8')
160
+ result.gsub!(">\n", '>')
161
+ else
162
+ result = output_method.call(:encoding => 'utf-8', :indent => 0).gsub(">\n", '>')
163
+ end
164
+
165
+ return result == html ? nil : html[0, html.length] = result
166
+ end
167
+
168
+ #--
169
+ # Class Methods
170
+ #++
171
+
172
+ class << self
173
+ # Returns a sanitized copy of _html_, using the settings in _config_ if
174
+ # specified.
175
+ def clean(html, config = {})
176
+ sanitize = Sanitize.new(config)
177
+ sanitize.clean(html)
178
+ end
179
+
180
+ # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
181
+ # were made.
182
+ def clean!(html, config = {})
183
+ sanitize = Sanitize.new(config)
184
+ sanitize.clean!(html)
185
+ end
186
+ end
187
+
188
+ end
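The return contract of clean and clean! in the file above can be illustrated without Nokogiri. In this sketch the scrub method is a stand-in assumption that only strips b tags, and String#replace is swapped in for the slice assignment used in the library so that the in-place variant returns the mutated string itself.

```ruby
# Stand-in for the real Nokogiri-based sanitizing step (assumption for this
# sketch): it only strips <b> and </b> tags.
def scrub(html)
  html.gsub(%r{</?b>}, '')
end

# In-place variant: mutates html and returns it, or nil when nothing changed.
def clean!(html)
  result = scrub(html)
  return nil if result == html
  html.replace(result)
end

# Copying variant: never touches the caller's string.
def clean(html)
  dupe = html.dup
  clean!(dupe) || dupe
end

dirty = '<b>foo</b>'.dup
p clean(dirty)      # sanitized copy; dirty is untouched
p clean!(dirty)     # dirty itself now holds the sanitized text
p clean!('foo'.dup) # nil, because nothing needed to change
```

The `clean!(dupe) || dupe` idiom is what lets clean always return a string even when clean! reports no change with nil.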
@@ -0,0 +1,75 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ FLASH_VIDEO_OBJECT = {
26
+ :elements => ['object', 'param', 'embed'],
27
+ :attributes => {
28
+ 'object' => ['width', 'height'],
29
+ 'param' => ['name', 'value'],
30
+ 'embed' => ['src', 'type', 'allowscriptaccess', 'allowfullscreen',
31
+ 'width', 'height']
32
+ }
33
+ }
34
+
35
+ DEFAULT = {
36
+ # Whether or not to allow HTML comments. Allowing comments is strongly
37
+ # discouraged, since IE allows script execution within conditional
38
+ # comments.
39
+ :allow_comments => false,
40
+
41
+ # HTML attributes to add to specific elements. By default, no attributes
42
+ # are added.
43
+ :add_attributes => {},
44
+
45
+ # HTML attributes to allow in specific elements. By default, no attributes
46
+ # are allowed.
47
+ :attributes => {},
48
+
49
+ # HTML elements to allow. By default, no elements are allowed (which means
50
+ # that all HTML will be stripped).
51
+ :elements => [],
52
+
53
+ # URL prefixes to be allowed in object embeds. Note that any kind of arbitrary
54
+ # object embed would be insecure, therefore this is locked down pretty tight
55
+ # to allow only YouTube-style embed codes. Under no circumstances should you
56
+ # add 'object' to the allowed elements above; objects are handled by a separate code
57
+ # path in the sanitizer. You must include the fully qualified URL prefix, including
58
+ # the protocol, since it is matched directly against the attribute value.
59
+ :object_urls => [],
60
+
61
+ # This specifies the elements and attributes on an object and its immediate
62
+ # descendants. The default configuration is for standard Flash video embeds.
63
+ :object_config => FLASH_VIDEO_OBJECT,
64
+
65
+ # Output format. Supported formats are :html and :xhtml (which is the
66
+ # default).
67
+ :output => :xhtml,
68
+
69
+ # URL handling protocols to allow in specific attributes. By default, no
70
+ # protocols are allowed. Use :relative in place of a protocol if you want
71
+ # to allow relative URLs sans protocol.
72
+ :protocols => {}
73
+ }
74
+ end
75
+ end
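Sanitize#initialize combines these defaults with the caller's settings through Hash#merge, which is a shallow merge. The sketch below uses a trimmed-down stand-in for the DEFAULT hash (only three keys, chosen for illustration) to show what that implies.

```ruby
# Trimmed-down stand-in for Sanitize::Config::DEFAULT (only three keys shown).
DEFAULT = {
  :elements   => [],
  :attributes => {},
  :output     => :xhtml
}

custom = { :elements => ['a'], :attributes => { 'a' => ['href'] } }
config = DEFAULT.merge(custom)

p config[:elements]   # => ["a"], taken from custom
p config[:output]     # => :xhtml, falls back to the default
p DEFAULT[:elements]  # => [], merge returns a new Hash and leaves DEFAULT alone
```

Since the merge is shallow, a caller who supplies a nested setting such as :object_config replaces the default value for that key wholesale instead of extending it.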
@@ -0,0 +1,49 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ BASIC = {
26
+ :elements => [
27
+ 'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
28
+ 'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
29
+ 'sup', 'u', 'ul'],
30
+
31
+ :attributes => {
32
+ 'a' => ['href'],
33
+ 'blockquote' => ['cite'],
34
+ 'q' => ['cite']
35
+ },
36
+
37
+ :add_attributes => {
38
+ 'a' => {'rel' => 'nofollow'}
39
+ },
40
+
41
+ :protocols => {
42
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
43
+ :relative]},
44
+ 'blockquote' => {'cite' => ['http', 'https', :relative]},
45
+ 'q' => {'cite' => ['http', 'https', :relative]}
46
+ }
47
+ }
48
+ end
49
+ end
@@ -0,0 +1,56 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ RELAXED = {
26
+ :elements => [
27
+ 'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
28
+ 'colgroup', 'dd', 'dl', 'dt', 'em', 'i', 'img', 'li', 'ol', 'p', 'pre',
29
+ 'q', 'small', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td',
30
+ 'tfoot', 'th', 'thead', 'tr', 'u', 'ul'],
31
+
32
+ :attributes => {
33
+ 'a' => ['href', 'title'],
34
+ 'blockquote' => ['cite'],
35
+ 'col' => ['span', 'width'],
36
+ 'colgroup' => ['span', 'width'],
37
+ 'img' => ['align', 'alt', 'height', 'src', 'title', 'width'],
38
+ 'ol' => ['start', 'type'],
39
+ 'q' => ['cite'],
40
+ 'table' => ['summary', 'width'],
41
+ 'td' => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
42
+ 'th' => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
43
+ 'width'],
44
+ 'ul' => ['type']
45
+ },
46
+
47
+ :protocols => {
48
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
49
+ :relative]},
50
+ 'blockquote' => {'cite' => ['http', 'https', :relative]},
51
+ 'img' => {'src' => ['http', 'https', :relative]},
52
+ 'q' => {'cite' => ['http', 'https', :relative]}
53
+ }
54
+ }
55
+ end
56
+ end
@@ -0,0 +1,29 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ RESTRICTED = {
26
+ :elements => ['b', 'em', 'i', 'strong', 'u']
27
+ }
28
+ end
29
+ end
@@ -0,0 +1,3 @@
1
+ class Sanitize
2
+ VERSION = '1.1.0'
3
+ end
metadata ADDED
@@ -0,0 +1,93 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: dasil003-sanitize
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Ryan Grove
8
+ - Gabe da Silveira
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2009-10-13 00:00:00 -07:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: nokogiri
18
+ type: :runtime
19
+ version_requirement:
20
+ version_requirements: !ruby/object:Gem::Requirement
21
+ requirements:
22
+ - - ~>
23
+ - !ruby/object:Gem::Version
24
+ version: 1.3.3
25
+ version:
26
+ - !ruby/object:Gem::Dependency
27
+ name: bacon
28
+ type: :development
29
+ version_requirement:
30
+ version_requirements: !ruby/object:Gem::Requirement
31
+ requirements:
32
+ - - ~>
33
+ - !ruby/object:Gem::Version
34
+ version: 1.1.0
35
+ version:
36
+ - !ruby/object:Gem::Dependency
37
+ name: rake
38
+ type: :development
39
+ version_requirement:
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ requirements:
42
+ - - ~>
43
+ - !ruby/object:Gem::Version
44
+ version: 0.8.0
45
+ version:
46
+ description:
47
+ email: gabe@websaviour.com
48
+ executables: []
49
+
50
+ extensions: []
51
+
52
+ extra_rdoc_files: []
53
+
54
+ files:
55
+ - HISTORY
56
+ - LICENSE
57
+ - README.rdoc
58
+ - lib/sanitize/config/basic.rb
59
+ - lib/sanitize/config/relaxed.rb
60
+ - lib/sanitize/config/restricted.rb
61
+ - lib/sanitize/config.rb
62
+ - lib/sanitize/version.rb
63
+ - lib/sanitize.rb
64
+ has_rdoc: true
65
+ homepage: http://github.com/dasil003/sanitize/
66
+ licenses: []
67
+
68
+ post_install_message:
69
+ rdoc_options: []
70
+
71
+ require_paths:
72
+ - lib
73
+ required_ruby_version: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: 1.8.6
78
+ version:
79
+ required_rubygems_version: !ruby/object:Gem::Requirement
80
+ requirements:
81
+ - - ">="
82
+ - !ruby/object:Gem::Version
83
+ version: "0"
84
+ version:
85
+ requirements: []
86
+
87
+ rubyforge_project:
88
+ rubygems_version: 1.3.5
89
+ signing_key:
90
+ specification_version: 3
91
+ summary: Whitelist-based HTML sanitizer.
92
+ test_files: []
93
+
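The nokogiri dependency in the metadata above uses the pessimistic version operator. As a small sketch, RubyGems' own Gem::Requirement class can show which versions that constraint accepts; the version numbers probed here are illustrative.

```ruby
require 'rubygems'

requirement = Gem::Requirement.new('~> 1.3.3')

p requirement.satisfied_by?(Gem::Version.new('1.3.3')) # => true
p requirement.satisfied_by?(Gem::Version.new('1.3.9')) # => true
p requirement.satisfied_by?(Gem::Version.new('1.4.0')) # => false
```

Note that the constraint excludes 1.4.0, the release that the comment in lib/sanitize.rb expects to fix the US-ASCII encoding issue.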