sanitize 1.2.0 → 1.2.1.dev.20100122
- data/HISTORY +9 -0
- data/README.rdoc +43 -19
- data/lib/sanitize.rb +9 -3
- data/lib/sanitize/config.rb +5 -0
- data/lib/sanitize/version.rb +1 -1
- metadata +4 -4
data/HISTORY
CHANGED
@@ -1,6 +1,15 @@
 Sanitize History
 ================================================================================
 
+Version 1.2.1 (git)
+  * Added a :remove_contents config setting. If set to true, Sanitize will
+    remove the contents of filtered nodes in addition to the nodes themselves.
+  * The environment hash passed into transformers now includes a :node_name item
+    containing the lowercase name of the current HTML node (e.g. "div").
+  * Returning anything other than a Hash or nil from a transformer will now
+    raise a meaningful Sanitize::Error exception rather than an unintended
+    NameError.
+
 Version 1.2.0 (2010-01-17)
   * Requires Nokogiri ~> 1.4.1.
   * Added support for transformers, which allow you to filter and alter nodes
data/README.rdoc
CHANGED
@@ -15,7 +15,7 @@ or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
 caution.
 
 *Author*:: Ryan Grove (mailto:ryan@wonko.com)
-*Version*:: 1.2.
+*Version*:: 1.2.1.dev (git)
 *Copyright*:: Copyright (c) 2010 Ryan Grove. All rights reserved.
 *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
 *Website*:: http://github.com/rgrove/sanitize
@@ -33,7 +33,7 @@ Latest stable release:
 
 Latest development version:
 
-  gem install sanitize --
+  gem install sanitize --pre
 
 == Usage
 
@@ -89,17 +89,17 @@ configuration:
       :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
       :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
 
-==== :
+==== :add_attributes (Hash)
 
-
+Attributes to add to specific elements. If the attribute already exists, it will
+be replaced with the value specified here. Specify all element names and
+attributes in lowercase.
 
-  :
-    'a'
-
-    'sup', 'u', 'ul'
-  ]
+  :add_attributes => {
+    'a' => {'rel' => 'nofollow'}
+  }
 
-==== :attributes
+==== :attributes (Hash)
 
 Attributes to allow for specific elements. Specify all element names and
 attributes in lowercase.
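The replacement rule described for :add_attributes can be sketched in plain Ruby. This is only an illustration of the documented behaviour using ordinary hashes. The constant `ADD_ATTRIBUTES` and the helper `with_added_attributes` are hypothetical names and are not part of Sanitize.

```ruby
# Hypothetical stand-in for the :add_attributes setting shown in the README.
ADD_ATTRIBUTES = {'a' => {'rel' => 'nofollow'}}

# Returns the element's attributes with the configured ones added.
# Hash#merge lets the configured value win when the attribute already exists.
def with_added_attributes(element, attrs, add_attributes = ADD_ATTRIBUTES)
  attrs.merge(add_attributes.fetch(element, {}))
end

link = with_added_attributes('a', {'href' => 'http://example.com/', 'rel' => 'me'})
span = with_added_attributes('span', {'class' => 'x'})
```

Here `link` ends up with `rel` set to `nofollow`, while `span` is returned unchanged because no attributes are configured for it.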
@@ -118,17 +118,28 @@ If you'd like to allow certain attributes on all elements, use the symbol
     'a' => ['href', 'title']
   }
 
-==== :
+==== :allow_comments (boolean)
 
-
-
-
+Whether or not to allow HTML comments. Allowing comments is strongly
+discouraged, since IE allows script execution within conditional comments. The
+default value is <code>false</code>.
 
-
-
-
+==== :elements (Array)
+
+Array of element names to allow. Specify all names in lowercase.
+
+  :elements => [
+    'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
+    'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
+    'sup', 'u', 'ul'
+  ]
+
+==== :output (Symbol)
 
-
+Output format. Supported formats are <code>:html</code> and <code>:xhtml</code>,
+defaulting to <code>:xhtml</code>.
+
+==== :protocols (Hash)
 
 URL protocols to allow in specific attributes. If an attribute is listed here
 and contains a protocol other than those specified (or if it contains no
@@ -146,6 +157,16 @@ include the symbol <code>:relative</code> in the protocol array:
     'a' => {'href' => ['http', 'https', :relative]}
   }
 
+==== :remove_contents (boolean)
+
+If set to <code>true</code>, Sanitize will remove the contents of any filtered
+nodes in addition to the nodes themselves. By default, Sanitize leaves the safe
+parts of a node's contents behind when the node is removed.
+
+==== :transformers
+
+See below.
+
 === Transformers
 
 Transformers allow you to filter and alter nodes using your own custom logic, on
@@ -170,6 +191,9 @@ Hash that contains the following items:
 [<code>:node</code>]
   A Nokogiri::XML::Node object representing an HTML element.
 
+[<code>:node_name</code>]
+  The name of the current HTML node, always lowercase (e.g. "div" or "span").
+
 ==== Processing
 
 Each transformer has full access to the Nokogiri::XML::Node that's passed into
@@ -223,7 +247,7 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
 
   lambda do |env|
     node = env[:node]
-    node_name =
+    node_name = env[:node_name]
     parent = node.parent
 
     # Since the transformer receives the deepest nodes first, we look for a
data/lib/sanitize.rb
CHANGED
@@ -144,7 +144,10 @@ class Sanitize
 
     # Delete any element that isn't in the whitelist.
     unless transform[:whitelist] || @config[:elements].include?(name)
-
+      unless @config[:remove_contents]
+        node.children.each { |n| node.add_previous_sibling(n) }
+      end
+
       node.unlink
       return
     end
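The effect of the hunk above can be illustrated without Nokogiri. The sketch below is a toy model, not Sanitize's implementation: a node is either a String (text) or a two-item Array of tag name and children, and `clean` and `render` are hypothetical helpers. With the flag off, a filtered node's children are hoisted into its place before the node is dropped. With the flag on, the node and everything inside it disappear.

```ruby
# Toy tree: a node is either a String (text) or [tag_name, children_array].
# clean/render are hypothetical helpers, not part of Sanitize.
def clean(nodes, elements, remove_contents)
  nodes.flat_map do |node|
    next [node] if node.is_a?(String)             # text is always kept

    name, children = node
    if elements.include?(name)
      [[name, clean(children, elements, remove_contents)]]
    elsif remove_contents
      []                                          # drop the node and its contents
    else
      clean(children, elements, remove_contents)  # drop the node, hoist its children
    end
  end
end

def render(nodes)
  nodes.map { |n| n.is_a?(String) ? n : "<#{n[0]}>#{render(n[1])}</#{n[0]}>" }.join
end

tree    = [['p', ['hello ', ['script', ['alert(1)']], ['b', ['world']]]]]
kept    = render(clean(tree, %w[p b], false))
removed = render(clean(tree, %w[p b], true))
```

With `remove_contents` off, the text inside the filtered `script` node survives as plain text. With it on, that text is removed along with the node.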
@@ -200,8 +203,9 @@ class Sanitize
 
     @config[:transformers].inject(node) do |transformer_node, transformer|
       transform = transformer.call({
-        :config
-        :node
+        :config => @config,
+        :node => transformer_node,
+        :node_name => transformer_node.name.downcase
       })
 
       if transform.nil?
@@ -224,4 +228,6 @@ class Sanitize
 
     return output
   end
+
+  class Error < StandardError; end
 end
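The two transformer-related changes (the new :node_name item and the dedicated error class) can be sketched together. This is a minimal stand-alone harness under stated assumptions: `DemoError` stands in for Sanitize::Error, `FakeNode` stands in for Nokogiri::XML::Node, and `run_transformer` is a hypothetical helper. The actual check inside Sanitize is not shown in this diff.

```ruby
# Stand-ins so the sketch runs without Sanitize or Nokogiri (all hypothetical).
class DemoError < StandardError; end
FakeNode = Struct.new(:name)

# Calls a transformer with an environment hash that includes a lowercase
# :node_name, then insists on a Hash or nil result, as the changelog describes.
def run_transformer(transformer, node)
  result = transformer.call({
    :node      => node,
    :node_name => node.name.downcase
  })
  unless result.nil? || result.is_a?(Hash)
    raise DemoError, "transformer returned #{result.class}, expected Hash or nil"
  end
  result
end

# A transformer that whitelists <div> and ignores everything else.
allow_div = lambda do |env|
  env[:node_name] == 'div' ? {:whitelist => true} : nil
end

div_result  = run_transformer(allow_div, FakeNode.new('DIV'))
span_result = run_transformer(allow_div, FakeNode.new('span'))

rejected = begin
  run_transformer(lambda { |_env| 'oops' }, FakeNode.new('p'))
  false
rescue DemoError
  true
end
```

Because the name is lowercased before the transformer sees it, the comparison against `'div'` matches a node named `DIV` as well.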
data/lib/sanitize/config.rb
CHANGED
@@ -49,6 +49,11 @@ class Sanitize
       # to allow relative URLs sans protocol.
       :protocols => {},
 
+      # If this is true, Sanitize will remove the contents of any filtered nodes
+      # in addition to the nodes themselves. By default, Sanitize leaves the
+      # safe parts of a node's contents behind when the node is removed.
+      :remove_contents => false,
+
       # Transformers allow you to filter or alter nodes using custom logic. See
       # README.rdoc for details and examples.
       :transformers => []
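Since the new key defaults to <code>false</code>, existing callers keep the old behaviour unless they opt in. A rough sketch of that default-plus-override pattern, assuming user settings are merged over a defaults hash. `DEMO_DEFAULTS` and `demo_config` are hypothetical names that mirror only the keys visible in this hunk.

```ruby
# Hypothetical defaults mirroring the keys visible in the hunk above.
DEMO_DEFAULTS = {
  :protocols       => {},
  :remove_contents => false,
  :transformers    => []
}.freeze

# Merges caller-supplied settings over the defaults; unspecified keys keep
# their default values.
def demo_config(overrides = {})
  DEMO_DEFAULTS.merge(overrides)
end

default_cfg = demo_config
strict_cfg  = demo_config(:remove_contents => true)
```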
data/lib/sanitize/version.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-  version: 1.2.
+  version: 1.2.1.dev.20100122
 platform: ruby
 authors:
 - Ryan Grove
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-01-
+date: 2010-01-22 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -77,9 +77,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
   version:
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - "
+  - - ">"
     - !ruby/object:Gem::Version
-      version:
+      version: 1.3.1
   version:
 requirements: []
 
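The version string in this metadata is a prerelease, which is why the README now tells users to install with <code>--pre</code>. A small check using RubyGems' own <code>Gem::Version</code> class shows how such a version is classified and ordered. The variable names are illustrative only.

```ruby
require 'rubygems'

dev    = Gem::Version.new('1.2.1.dev.20100122')
stable = Gem::Version.new('1.2.0')

# A version containing letters is treated as a prerelease.
is_prerelease = dev.prerelease?

# Prereleases sort after the previous stable release but before the final
# release of the same number.
newer_than_stable = dev > stable
older_than_final  = dev < Gem::Version.new('1.2.1')
```

All three values are true here, so a plain `gem install sanitize` would still pick the stable 1.2.0 release.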