dasil003-sanitize 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/HISTORY ADDED
@@ -0,0 +1,65 @@
1
+ Sanitize History
2
+ ================================================================================
3
+
4
+ Version 1.1.0 (2009-10-11)
5
+ * Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
6
+ * Added an :output config setting to allow the output format to be specified.
7
+ Supported formats are :xhtml (the default) and :html (which outputs HTML4).
8
+ * Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
9
+ path segments. [Peter Cooper]
10
+
11
+ Version 1.0.8 (2009-04-23)
12
+ * Added a workaround for an Hpricot bug that prevents attribute names from
13
+ being downcased in recent versions of Hpricot. This was exploitable to
14
+ prevent non-whitelisted protocols from being cleaned. [Reported by Ben
15
+ Wanicur]
16
+
17
+ Version 1.0.7 (2009-04-11)
18
+ * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
19
+ * Fixed a bug that caused named character entities containing digits (like
20
+ &sup2;) to be escaped when they shouldn't have been. [Reported by Sebastian
21
+ Steinmetz]
22
+
23
+ Version 1.0.6 (2009-02-23)
24
+ * Removed htmlentities gem dependency.
25
+ * Existing well-formed character entity references in the input string are now
26
+ preserved rather than being decoded and re-encoded.
27
+ * The ' character is now encoded as &#39; instead of &apos; to prevent
28
+ problems in IE6.
29
+ * You can now specify the symbol :all in place of an element name in the
30
+ attributes config hash to allow certain attributes on all elements. [Thanks
31
+ to Mutwin Kraus]
32
+
33
+ Version 1.0.5 (2009-02-05)
34
+ * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
35
+ protocols from being cleaned when relative URLs were allowed. [Reported by
36
+ Dev Purkayastha]
37
+ * Fixed "undefined method `parent='" exceptions caused by parser changes in
38
+ edge Hpricot.
39
+
40
+ Version 1.0.4 (2009-01-16)
41
+ * Fixed a bug that made it possible to sneak a non-whitelisted element through
42
+ by repeating it several times in a row. All versions of Sanitize prior to
43
+ 1.0.4 are vulnerable. [Reported by Cristobal]
44
+
45
+ Version 1.0.3 (2009-01-15)
46
+ * Fixed a bug whereby incomplete Unicode or hex entities could be used to
47
+ prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
48
+ still decode the incomplete entities, users of those browsers may be
49
+ vulnerable to malicious script injection on websites using versions of
50
+ Sanitize prior to 1.0.3.
51
+
52
+ Version 1.0.2 (2009-01-04)
53
+ * Fixed a bug that caused an exception to be thrown when parsing a valueless
54
+ attribute that's expected to contain a URL.
55
+
56
+ Version 1.0.1 (2009-01-01)
57
+ * You can now specify :relative in a protocol config array to allow attributes
58
+ containing relative URLs with no protocol. The Basic and Relaxed configs
59
+ have been updated to allow relative URLs.
60
+ * Added a workaround for an Hpricot bug that causes HTML entities for
61
+ non-ASCII characters to be replaced by question marks, and all other
62
+ entities to be destructively decoded.
63
+
64
+ Version 1.0.0 (2008-12-25)
65
+ * First release.
data/LICENSE ADDED
@@ -0,0 +1,18 @@
1
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
4
+ this software and associated documentation files (the 'Software'), to deal in
5
+ the Software without restriction, including without limitation the rights to
6
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
7
+ the Software, and to permit persons to whom the Software is furnished to do so,
8
+ subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
15
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
16
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
17
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
18
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.rdoc ADDED
@@ -0,0 +1,212 @@
1
+ = Sanitize
2
+
3
+ Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
4
+ elements and attributes, Sanitize will remove all unacceptable HTML from a
5
+ string.
6
+
7
+ Using a simple configuration syntax, you can tell Sanitize to allow certain
8
+ elements, certain attributes within those elements, and even certain URL
9
+ protocols within attributes that contain URLs. Any HTML elements or attributes
10
+ that you don't explicitly allow will be removed.
11
+
12
+ Because it's based on Nokogiri, a full-fledged HTML parser, rather than a bunch
13
+ of fragile regular expressions, Sanitize has no trouble dealing with malformed
14
+ or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
15
+ caution.
16
+
17
+ *Author*:: Ryan Grove (mailto:ryan@wonko.com)
18
+ *Version*:: 1.1.0 (2009-10-11)
19
+ *Copyright*:: Copyright (c) 2009 Ryan Grove. All rights reserved.
20
+ *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
21
+ *Website*:: http://github.com/rgrove/sanitize
22
+
23
+ == Requires
24
+
25
+ * Nokogiri
26
+ * libxml2 >= 2.7.2
27
+
28
+ == Installation
29
+
30
+ Latest stable release:
31
+
32
+ gem install sanitize
33
+
34
+ Latest development version:
35
+
36
+ gem install sanitize -s http://gemcutter.org --prerelease
37
+
38
+ == Usage
39
+
40
+ If you don't specify any configuration options, Sanitize will use its strictest
41
+ settings by default, which means it will strip all HTML.
42
+
43
+ require 'rubygems'
44
+ require 'sanitize'
45
+
46
+ html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
47
+
48
+ Sanitize.clean(html) # => 'foo'
49
+
50
+ == Configuration
51
+
52
+ In addition to the ultra-safe default settings, Sanitize comes with three other
53
+ built-in modes.
54
+
55
+ === Sanitize::Config::RESTRICTED
56
+
57
+ Allows only very simple inline formatting markup. No links, images, or block
58
+ elements.
59
+
60
+ Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
61
+
62
+ === Sanitize::Config::BASIC
63
+
64
+ Allows a variety of markup including formatting tags, links, and lists. Images
65
+ and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
66
+ protocols (plus relative URLs), and a <code>rel="nofollow"</code> attribute is
67
+ added to all links to mitigate SEO spam.
68
+
69
+ Sanitize.clean(html, Sanitize::Config::BASIC)
70
+ # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
71
+
72
+ === Sanitize::Config::RELAXED
73
+
74
+ Allows an even wider variety of markup than BASIC, including images and tables.
75
+ Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
76
+ are limited to HTTP and HTTPS (relative URLs are allowed for both). In this
77
+ mode, <code>rel="nofollow"</code> is not added to links.
78
+
79
+ Sanitize.clean(html, Sanitize::Config::RELAXED)
80
+ # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
81
+
82
+ === Custom Configuration
83
+
84
+ If the built-in modes don't meet your needs, you can easily specify a custom
85
+ configuration:
86
+
87
+ Sanitize.clean(html, :elements => ['a', 'span'],
88
+ :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
89
+ :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
90
+
91
+ ==== :elements
92
+
93
+ Array of element names to allow. Specify all names in lowercase.
94
+
95
+ :elements => [
96
+ 'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
97
+ 'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
98
+ 'sup', 'u', 'ul'
99
+ ]
100
+
101
+ ==== :attributes
102
+
103
+ Attributes to allow for specific elements. Specify all element names and
104
+ attributes in lowercase.
105
+
106
+ :attributes => {
107
+ 'a' => ['href', 'title'],
108
+ 'blockquote' => ['cite'],
109
+ 'img' => ['alt', 'src', 'title']
110
+ }
111
+
112
+ If you'd like to allow certain attributes on all elements, use the symbol
113
+ <code>:all</code> instead of an element name.
114
+
115
+ :attributes => {
116
+ :all => ['class'],
117
+ 'a' => ['href', 'title']
118
+ }
119
+
120
+ ==== :add_attributes
121
+
122
+ Attributes to add to specific elements. If the attribute already exists, it will
123
+ be replaced with the value specified here. Specify all element names and
124
+ attributes in lowercase.
125
+
126
+ :add_attributes => {
127
+ 'a' => {'rel' => 'nofollow'}
128
+ }
129
+
130
+ ==== :protocols
131
+
132
+ URL protocols to allow in specific attributes. If an attribute is listed here
133
+ and contains a protocol other than those specified (or if it contains no
134
+ protocol at all), it will be removed.
135
+
136
+ :protocols => {
137
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto']},
138
+ 'img' => {'src' => ['http', 'https']}
139
+ }
140
+
141
+ If you'd like to allow the use of relative URLs which don't have a protocol,
142
+ include the symbol <code>:relative</code> in the protocol array:
143
+
144
+ :protocols => {
145
+ 'a' => {'href' => ['http', 'https', :relative]}
146
+ }
147
+
148
+ ==== :object_urls
149
+
150
+ URL prefixes from which Flash object embeds are allowed. Use this to allow
151
+ standard video embeds, such as those provided by YouTube:
152
+
153
+ :object_urls => ['http://www.youtube.com']
154
+
155
+ *Warning*: Do not under any circumstances add 'object' or 'embed' to the
156
+ :elements list. It is unnecessary and will open an XSS hole.
157
+
158
+ Because object tags are more complex than most other tags and open up many
159
+ XSS attack vectors, this functionality follows a completely separate code path
160
+ from the regular filtering. A secondary configuration setting, :object_config,
161
+ controls which elements and attributes are allowed on the object tag and its
162
+ immediate children. By default it is:
163
+
164
+ :object_config => {
165
+ :elements => ['object', 'param', 'embed'],
166
+ :attributes => {
167
+ 'object' => ['width', 'height'],
168
+ 'param' => ['name', 'value'],
169
+ 'embed' => ['src', 'type', 'allowscriptaccess', 'allowfullscreen',
170
+ 'width', 'height']
171
+ }}
172
+
173
+ This config is applied to the object tag and its immediate children instead of
174
+ the standard config, and only when :object_urls is non-empty. It was crafted
175
+ specifically to allow YouTube and Vimeo embed codes in this format:
176
+
177
+ <object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/qVaEPx_VyXs&hl=en&fs=1&color1=0xcc2550&color2=0xe87a9f"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/qVaEPx_VyXs&hl=en&fs=1&color1=0xcc2550&color2=0xe87a9f" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object>
178
+
179
+
180
+ == Contributors
181
+
182
+ The following lovely people have contributed to Sanitize in the form of patches
183
+ or ideas that later became code:
184
+
185
+ * Peter Cooper <git@peterc.org>
186
+ * Gabe da Silveira <gabe@websaviour.com>
187
+ * Ryan Grove <ryan@wonko.com>
188
+ * Adam Hooper <adam@adamhooper.com>
189
+ * Mutwin Kraus <mutle@blogage.de>
190
+ * Dev Purkayastha <dev.purkayastha@gmail.com>
191
+ * Ben Wanicur <bwanicur@verticalresponse.com>
192
+
193
+ == License
194
+
195
+ Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
196
+
197
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
198
+ this software and associated documentation files (the 'Software'), to deal in
199
+ the Software without restriction, including without limitation the rights to
200
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
201
+ the Software, and to permit persons to whom the Software is furnished to do so,
202
+ subject to the following conditions:
203
+
204
+ The above copyright notice and this permission notice shall be included in all
205
+ copies or substantial portions of the Software.
206
+
207
+ THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
208
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
209
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
210
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
211
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
212
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
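The README describes :object_urls as URL prefixes. In lib/sanitize.rb the check is `object_url.index(good) == 0`, applied to a `<param name="movie">` value or an `<embed>` src. The following is a minimal, stdlib-only sketch of that test. The helper name `object_url_allowed?` is hypothetical, and `evil.example` is an illustrative host. The sketch suggests that a prefix without a trailing slash also admits look-alike hosts.

```ruby
# Hypothetical helper mirroring the :object_urls test in Sanitize#clean!:
# a URL passes if it *starts with* any allowed prefix.
def object_url_allowed?(url, object_urls)
  object_urls.any? { |good| url.index(good) == 0 }
end

prefixes = ['http://www.youtube.com']

object_url_allowed?('http://www.youtube.com/v/qVaEPx_VyXs', prefixes)     # => true
object_url_allowed?('http://evil.example/v/qVaEPx_VyXs', prefixes)         # => false
# A bare host prefix also matches a look-alike host name:
object_url_allowed?('http://www.youtube.com.evil.example/x.swf', prefixes) # => true

# Ending the prefix with "/" closes that gap.
strict = ['http://www.youtube.com/']
object_url_allowed?('http://www.youtube.com.evil.example/x.swf', strict)   # => false
object_url_allowed?('http://www.youtube.com/v/qVaEPx_VyXs', strict)        # => true
```

If this reading of the code is right, listing prefixes with a trailing slash (for example `'http://www.youtube.com/'`) is the safer way to configure :object_urls.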
data/lib/sanitize.rb ADDED
@@ -0,0 +1,188 @@
1
+ # encoding: utf-8
2
+ #--
3
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
4
+ #
5
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ # of this software and associated documentation files (the 'Software'), to deal
7
+ # in the Software without restriction, including without limitation the rights
8
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ # copies of the Software, and to permit persons to whom the Software is
10
+ # furnished to do so, subject to the following conditions:
11
+ #
12
+ # The above copyright notice and this permission notice shall be included in all
13
+ # copies or substantial portions of the Software.
14
+ #
15
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ # SOFTWARE.
22
+ #++
23
+
24
+ require 'nokogiri'
25
+ require 'sanitize/version'
26
+ require 'sanitize/config'
27
+ require 'sanitize/config/restricted'
28
+ require 'sanitize/config/basic'
29
+ require 'sanitize/config/relaxed'
30
+
31
+ class Sanitize
32
+
33
+ # Matches an attribute value that could be treated by a browser as a URL
34
+ # with a protocol prefix, such as "http:" or "javascript:". Any run of zero or
35
+ # more letters, digits, whitespace, or + - . & ; # characters followed by a
36
+ # colon is considered a match, even if the colon is encoded as an entity and
37
+ # even if it's an incomplete entity (which IE6 and Opera will still parse).
38
+ REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i
39
+
40
+ #--
41
+ # Instance Methods
42
+ #++
43
+
44
+ # Returns a new Sanitize object initialized with the settings in _config_.
45
+ def initialize(config = {})
46
+ @config = Config::DEFAULT.merge(config)
47
+ end
48
+
49
+ # Returns a sanitized copy of _html_.
50
+ def clean(html)
51
+ dupe = html.dup
52
+ clean!(dupe) || dupe
53
+ end
54
+
55
+ # Performs clean in place, returning _html_, or +nil+ if no changes were
56
+ # made.
57
+ def clean!(html)
58
+ fragment = Nokogiri::HTML::DocumentFragment.parse(html)
59
+
60
+ fragment.traverse do |node|
61
+ if node.comment?
62
+ node.unlink unless @config[:allow_comments]
63
+ elsif node.element?
64
+ name = node.name.to_s.downcase
65
+ parent_name = node.parent ? node.parent.name.to_s.downcase : nil
66
+
67
+ # Special handling of objects is necessary to limit by specific domains.
68
+ if @config[:object_urls].any? &&
69
+ [name, parent_name].include?('object')
70
+ unless @config[:object_config][:elements].include?(name)
71
+ node.unlink
72
+ next
73
+ end
74
+
75
+ attr_whitelist = @config[:object_config][:attributes][name] || []
76
+
77
+ # Remove non-whitelisted object interior tag attributes
78
+ node.attribute_nodes.each do |attr|
79
+ attr.unlink unless attr_whitelist.include?(attr.name.downcase)
80
+ end
81
+
82
+ # Remove non-whitelisted object URLs.
83
+ object_url = if name == 'param' && node['name'] == 'movie'
84
+ node['value']
85
+ elsif name == 'embed'
86
+ node['src']
87
+ end
88
+
89
+ if object_url &&
90
+ !@config[:object_urls].any?{|good| object_url.index(good) == 0}
91
+ node.parent.unlink
92
+ end
93
+
94
+ next
95
+ end
96
+
97
+ # Delete any element that isn't in the whitelist.
98
+ unless @config[:elements].include?(name)
99
+ node.children.each { |n| node.add_previous_sibling(n) }
100
+ node.unlink
101
+ next
102
+ end
103
+
104
+ attr_whitelist = ((@config[:attributes][name] || []) +
105
+ (@config[:attributes][:all] || [])).uniq
106
+
107
+ if attr_whitelist.empty?
108
+ # Delete all attributes from elements with no whitelisted
109
+ # attributes.
110
+ node.attribute_nodes.each { |attr| attr.remove }
111
+ else
112
+ # Delete any attribute that isn't in the whitelist for this element.
113
+ node.attribute_nodes.each do |attr|
114
+ attr.unlink unless attr_whitelist.include?(attr.name.downcase)
115
+ end
116
+
117
+ # Delete remaining attributes that use unacceptable protocols.
118
+ if @config[:protocols].has_key?(name)
119
+ protocol = @config[:protocols][name]
120
+
121
+ node.attribute_nodes.each do |attr|
122
+ attr_name = attr.name.downcase
123
+ next false unless protocol.has_key?(attr_name)
124
+
125
+ del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
126
+ !protocol[attr_name].include?($1.downcase)
127
+ else
128
+ !protocol[attr_name].include?(:relative)
129
+ end
130
+
131
+ attr.unlink if del
132
+ end
133
+ end
134
+ end
135
+
136
+ # Add required attributes.
137
+ if @config[:add_attributes].has_key?(name)
138
+ @config[:add_attributes][name].each do |key, val|
139
+ node[key] = val
140
+ end
141
+ end
142
+ elsif node.cdata?
143
+ node.replace(Nokogiri::XML::Text.new(node.text, node.document))
144
+ end
145
+ end
146
+
147
+ if @config[:output] == :xhtml
148
+ output_method = fragment.method(:to_xhtml)
149
+ elsif @config[:output] == :html
150
+ output_method = fragment.method(:to_html)
151
+ else
152
+ raise ArgumentError, "unsupported output format: #{@config[:output]}"
153
+ end
154
+
155
+ if RUBY_VERSION >= '1.9'
156
+ # Nokogiri 1.3.3 (and possibly earlier versions) always returns a US-ASCII
157
+ # string no matter what we ask for. This will be fixed in 1.4.0, but for
158
+ # now we have to hack around it to prevent errors.
159
+ result = output_method.call(:encoding => 'utf-8', :indent => 0).force_encoding('utf-8')
160
+ result.gsub!(">\n", '>')
161
+ else
162
+ result = output_method.call(:encoding => 'utf-8', :indent => 0).gsub(">\n", '>')
163
+ end
164
+
165
+ return result == html ? nil : html[0, html.length] = result
166
+ end
167
+
168
+ #--
169
+ # Class Methods
170
+ #++
171
+
172
+ class << self
173
+ # Returns a sanitized copy of _html_, using the settings in _config_ if
174
+ # specified.
175
+ def clean(html, config = {})
176
+ sanitize = Sanitize.new(config)
177
+ sanitize.clean(html)
178
+ end
179
+
180
+ # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
181
+ # were made.
182
+ def clean!(html, config = {})
183
+ sanitize = Sanitize.new(config)
184
+ sanitize.clean!(html)
185
+ end
186
+ end
187
+
188
+ end
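The protocol whitelist in clean! comes down to one branch. If the lowercased attribute value matches REGEX_PROTOCOL, the captured scheme must appear in the allowed list. Otherwise the value is treated as relative and is kept only when `:relative` is allowed. This stdlib-only sketch copies the regex and mirrors that branch on plain strings. The real code works on Nokogiri attribute nodes and deletes them, and `keep_url_attribute?` is a hypothetical helper name.

```ruby
# Copied from Sanitize::REGEX_PROTOCOL in this release.
REGEX_PROTOCOL = /^([A-Za-z0-9\+\-\.\&\;\#\s]*?)(?:\:|&#0*58|&#x0*3a)/i

# Returns true when the attribute would be kept. `allowed` is one entry
# from :protocols, e.g. ['http', 'https', :relative].
def keep_url_attribute?(value, allowed)
  if value.to_s.downcase =~ REGEX_PROTOCOL
    allowed.include?($1.downcase)
  else
    # No protocol prefix found: treated as a relative URL.
    allowed.include?(:relative)
  end
end

allowed = ['ftp', 'http', 'https', 'mailto', :relative]

keep_url_attribute?('http://foo.com/', allowed)          # => true
keep_url_attribute?('javascript:alert(1)', allowed)      # => false
keep_url_attribute?('javascript&#58;alert(1)', allowed)  # => false (entity-encoded colon)
keep_url_attribute?('JavaScript&#x3A;alert(1)', allowed) # => false (hex entity, mixed case)
keep_url_attribute?('/foo/bar:baz', allowed)             # => true  ("/" stops the scheme match)
keep_url_attribute?('/foo', ['http', 'https'])           # => false (relative not allowed)
```

The `/foo/bar:baz` case corresponds to the 1.1.0 change that stopped URLs with colons in path segments from being treated as having a protocol.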
data/lib/sanitize/config.rb ADDED
@@ -0,0 +1,75 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ FLASH_VIDEO_OBJECT = {
26
+ :elements => ['object', 'param', 'embed'],
27
+ :attributes => {
28
+ 'object' => ['width', 'height'],
29
+ 'param' => ['name', 'value'],
30
+ 'embed' => ['src', 'type', 'allowscriptaccess', 'allowfullscreen',
31
+ 'width', 'height']
32
+ }
33
+ }
34
+
35
+ DEFAULT = {
36
+ # Whether or not to allow HTML comments. Allowing comments is strongly
37
+ # discouraged, since IE allows script execution within conditional
38
+ # comments.
39
+ :allow_comments => false,
40
+
41
+ # HTML attributes to add to specific elements. By default, no attributes
42
+ # are added.
43
+ :add_attributes => {},
44
+
45
+ # HTML attributes to allow in specific elements. By default, no attributes
46
+ # are allowed.
47
+ :attributes => {},
48
+
49
+ # HTML elements to allow. By default, no elements are allowed (which means
50
+ # that all HTML will be stripped).
51
+ :elements => [],
52
+
53
+ # URL prefixes to be allowed in object embeds. Because arbitrary object embeds
54
+ # would be insecure, this is locked down tightly to allow only YouTube-style
55
+ # embed codes. Under no circumstances should you add 'object' or 'embed' to
56
+ # the :elements list above; these elements are handled by a separate code path
57
+ # in the sanitizer. Include the fully qualified URL prefix, protocol and all,
58
+ # since each prefix is matched literally against the start of the attribute value.
59
+ :object_urls => [],
60
+
61
+ # The elements and attributes allowed on an object element and its immediate
62
+ # children. The default configuration is for standard Flash video embeds.
63
+ :object_config => FLASH_VIDEO_OBJECT,
64
+
65
+ # Output format. Supported formats are :html and :xhtml (which is the
66
+ # default).
67
+ :output => :xhtml,
68
+
69
+ # URL handling protocols to allow in specific attributes. By default, no
70
+ # protocols are allowed. Use :relative in place of a protocol if you want
71
+ # to allow relative URLs sans protocol.
72
+ :protocols => {}
73
+ }
74
+ end
75
+ end
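Sanitize.new builds its settings with `Config::DEFAULT.merge(config)`. Hash#merge is only one level deep. The following is a minimal sketch using a trimmed, hypothetical stand-in for DEFAULT (only a few of its keys). Top-level keys you omit fall back to the defaults, but a nested hash you pass, such as :attributes, replaces the existing entry wholesale instead of being combined with it.

```ruby
# Trimmed, hypothetical stand-in for Sanitize::Config::DEFAULT.
DEFAULT = {
  :allow_comments => false,
  :elements       => [],
  :attributes     => {},
  :output         => :xhtml,
  :protocols      => {}
}

user_config = {
  :elements   => ['a'],
  :attributes => {'a' => ['href']}
}

# Equivalent of what Sanitize#initialize does with the config it is given.
effective = DEFAULT.merge(user_config)

effective[:output]         # => :xhtml (default filled in)
effective[:allow_comments] # => false
effective[:elements]       # => ["a"]

# Shallow merge: the new :attributes hash replaces the old one entirely.
more = effective.merge(:attributes => {'img' => ['src']})
more[:attributes].key?('a') # => false
```

In practice this suggests that customizing a built-in config means building the full nested value yourself, for example `BASIC[:elements] + ['span']`, rather than passing only the additions.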
data/lib/sanitize/config/basic.rb ADDED
@@ -0,0 +1,49 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ BASIC = {
26
+ :elements => [
27
+ 'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
28
+ 'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
29
+ 'sup', 'u', 'ul'],
30
+
31
+ :attributes => {
32
+ 'a' => ['href'],
33
+ 'blockquote' => ['cite'],
34
+ 'q' => ['cite']
35
+ },
36
+
37
+ :add_attributes => {
38
+ 'a' => {'rel' => 'nofollow'}
39
+ },
40
+
41
+ :protocols => {
42
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
43
+ :relative]},
44
+ 'blockquote' => {'cite' => ['http', 'https', :relative]},
45
+ 'q' => {'cite' => ['http', 'https', :relative]}
46
+ }
47
+ }
48
+ end
49
+ end
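For each allowed element, clean! computes the attribute whitelist as the element's own list plus any `:all` list. It drops every other attribute and then applies :add_attributes, overwriting existing values. The following is a minimal sketch on plain hashes. It copies the BASIC :attributes and :add_attributes entries and adds a hypothetical `:all => ['class']` entry to show the union. The helper names are hypothetical, and the protocol check that clean! also runs is omitted.

```ruby
# BASIC's :attributes plus a hypothetical :all entry for illustration.
ATTRIBUTES = {
  :all         => ['class'],
  'a'          => ['href'],
  'blockquote' => ['cite'],
  'q'          => ['cite']
}
ADD_ATTRIBUTES = { 'a' => {'rel' => 'nofollow'} }

# Same union clean! computes for an element name.
def attribute_whitelist(name)
  ((ATTRIBUTES[name] || []) + (ATTRIBUTES[:all] || [])).uniq
end

# Keeps whitelisted attributes (compared in lowercase), then applies
# :add_attributes, which overwrite any existing value.
def filter_attributes(name, attrs)
  allowed = attribute_whitelist(name)
  kept = attrs.select { |k, _| allowed.include?(k.downcase) }
  kept.merge(ADD_ATTRIBUTES[name] || {})
end

attribute_whitelist('q') # => ["cite", "class"]
attribute_whitelist('b') # => ["class"]

# Keeps href and class, drops onclick, and forces rel="nofollow".
filter_attributes('a', {'href' => 'http://foo.com/', 'class' => 'x',
                        'onclick' => 'x()', 'rel' => 'me'})
```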
data/lib/sanitize/config/relaxed.rb ADDED
@@ -0,0 +1,56 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ RELAXED = {
26
+ :elements => [
27
+ 'a', 'b', 'blockquote', 'br', 'caption', 'cite', 'code', 'col',
28
+ 'colgroup', 'dd', 'dl', 'dt', 'em', 'i', 'img', 'li', 'ol', 'p', 'pre',
29
+ 'q', 'small', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td',
30
+ 'tfoot', 'th', 'thead', 'tr', 'u', 'ul'],
31
+
32
+ :attributes => {
33
+ 'a' => ['href', 'title'],
34
+ 'blockquote' => ['cite'],
35
+ 'col' => ['span', 'width'],
36
+ 'colgroup' => ['span', 'width'],
37
+ 'img' => ['align', 'alt', 'height', 'src', 'title', 'width'],
38
+ 'ol' => ['start', 'type'],
39
+ 'q' => ['cite'],
40
+ 'table' => ['summary', 'width'],
41
+ 'td' => ['abbr', 'axis', 'colspan', 'rowspan', 'width'],
42
+ 'th' => ['abbr', 'axis', 'colspan', 'rowspan', 'scope',
43
+ 'width'],
44
+ 'ul' => ['type']
45
+ },
46
+
47
+ :protocols => {
48
+ 'a' => {'href' => ['ftp', 'http', 'https', 'mailto',
49
+ :relative]},
50
+ 'blockquote' => {'cite' => ['http', 'https', :relative]},
51
+ 'img' => {'src' => ['http', 'https', :relative]},
52
+ 'q' => {'cite' => ['http', 'https', :relative]}
53
+ }
54
+ }
55
+ end
56
+ end
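The built-in configs such as RELAXED are ordinary, unfrozen hashes. Extending one in place, for example with `Sanitize::Config::RELAXED[:elements] << 'span'`, would therefore change it for every caller in the process. The following is a minimal sketch with a trimmed, hypothetical stand-in for RELAXED. It shows that `dup` still shares the nested arrays, and it uses a Marshal round-trip as one stdlib way to take an independent copy before customizing it.

```ruby
# Trimmed, hypothetical stand-in for Sanitize::Config::RELAXED.
RELAXED = {
  :elements   => ['a', 'b', 'img'],
  :attributes => {'a' => ['href', 'title'], 'img' => ['alt', 'src']}
}

# A shallow dup shares nested arrays/hashes with the original.
shallow = RELAXED.dup
shallow[:elements] << 'span'
RELAXED[:elements].include?('span') # => true (the "constant" changed)
RELAXED[:elements].delete('span')   # undo the accidental change

# Marshal round-trip: an independent deep copy of plain hashes,
# arrays, strings, and symbols.
custom = Marshal.load(Marshal.dump(RELAXED))
custom[:elements] << 'span'
custom[:attributes]['span'] = ['class']

RELAXED[:elements].include?('span') # => false
RELAXED[:attributes].key?('span')   # => false
```

The resulting `custom` hash could then be passed as the config argument to Sanitize.clean.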
data/lib/sanitize/config/restricted.rb ADDED
@@ -0,0 +1,29 @@
1
+ #--
2
+ # Copyright (c) 2009 Ryan Grove <ryan@wonko.com>
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the 'Software'), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in all
12
+ # copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
20
+ # SOFTWARE.
21
+ #++
22
+
23
+ class Sanitize
24
+ module Config
25
+ RESTRICTED = {
26
+ :elements => ['b', 'em', 'i', 'strong', 'u']
27
+ }
28
+ end
29
+ end
data/lib/sanitize/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ class Sanitize
2
+ VERSION = '1.1.0'
3
+ end
metadata ADDED
@@ -0,0 +1,93 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: dasil003-sanitize
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Ryan Grove
8
+ - Gabe da Silveira
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2009-10-13 00:00:00 -07:00
14
+ default_executable:
15
+ dependencies:
16
+ - !ruby/object:Gem::Dependency
17
+ name: nokogiri
18
+ type: :runtime
19
+ version_requirement:
20
+ version_requirements: !ruby/object:Gem::Requirement
21
+ requirements:
22
+ - - ~>
23
+ - !ruby/object:Gem::Version
24
+ version: 1.3.3
25
+ version:
26
+ - !ruby/object:Gem::Dependency
27
+ name: bacon
28
+ type: :development
29
+ version_requirement:
30
+ version_requirements: !ruby/object:Gem::Requirement
31
+ requirements:
32
+ - - ~>
33
+ - !ruby/object:Gem::Version
34
+ version: 1.1.0
35
+ version:
36
+ - !ruby/object:Gem::Dependency
37
+ name: rake
38
+ type: :development
39
+ version_requirement:
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ requirements:
42
+ - - ~>
43
+ - !ruby/object:Gem::Version
44
+ version: 0.8.0
45
+ version:
46
+ description:
47
+ email: gabe@websaviour.com
48
+ executables: []
49
+
50
+ extensions: []
51
+
52
+ extra_rdoc_files: []
53
+
54
+ files:
55
+ - HISTORY
56
+ - LICENSE
57
+ - README.rdoc
58
+ - lib/sanitize/config/basic.rb
59
+ - lib/sanitize/config/relaxed.rb
60
+ - lib/sanitize/config/restricted.rb
61
+ - lib/sanitize/config.rb
62
+ - lib/sanitize/version.rb
63
+ - lib/sanitize.rb
64
+ has_rdoc: true
65
+ homepage: http://github.com/dasil003/sanitize/
66
+ licenses: []
67
+
68
+ post_install_message:
69
+ rdoc_options: []
70
+
71
+ require_paths:
72
+ - lib
73
+ required_ruby_version: !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ version: 1.8.6
78
+ version:
79
+ required_rubygems_version: !ruby/object:Gem::Requirement
80
+ requirements:
81
+ - - ">="
82
+ - !ruby/object:Gem::Version
83
+ version: "0"
84
+ version:
85
+ requirements: []
86
+
87
+ rubyforge_project:
88
+ rubygems_version: 1.3.5
89
+ signing_key:
90
+ specification_version: 3
91
+ summary: Whitelist-based HTML sanitizer.
92
+ test_files: []
93
+
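The gemspec declares the runtime dependency `nokogiri ~> 1.3.3`. This is a pessimistic constraint, meaning at least 1.3.3 but below 1.4. The encoding comment in lib/sanitize.rb says the US-ASCII issue "will be fixed in 1.4.0". Under this constraint, that release is excluded, so the force_encoding workaround is the path that runs. RubyGems' own requirement classes can check this directly:

```ruby
require 'rubygems'

# The runtime requirement declared in the gemspec above.
nokogiri_req = Gem::Requirement.new('~> 1.3.3')

nokogiri_req.satisfied_by?(Gem::Version.new('1.3.3')) # => true
nokogiri_req.satisfied_by?(Gem::Version.new('1.3.9')) # => true
nokogiri_req.satisfied_by?(Gem::Version.new('1.4.0')) # => false
```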