sanitize 2.0.0.dev.20101225 → 2.0.0.dev.20110105
- data/HISTORY.md +5 -2
- data/LICENSE +1 -1
- data/README.rdoc +31 -13
- data/lib/sanitize.rb +34 -15
- data/lib/sanitize/config.rb +7 -1
- data/lib/sanitize/config/basic.rb +1 -1
- data/lib/sanitize/config/relaxed.rb +1 -1
- data/lib/sanitize/config/restricted.rb +1 -1
- data/lib/sanitize/transformers/clean_element.rb +7 -10
- data/lib/sanitize/version.rb +1 -1
- metadata +3 -3
data/HISTORY.md
CHANGED
@@ -16,14 +16,17 @@ Version 2.0.0 (git)
   `<br>` and `<p>`) that should be replaced with whitespace when removed in
   order to preserve readability. See the README for the default list of
   elements that will be replaced with whitespace when removed.
+* Added a `:transformers_breadth` config, which may be used to specify
+  transformers that should traverse nodes in a breadth-first mode rather than
+  the default depth-first mode.
 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
   elements to the whitelists for the basic and relaxed configs.
 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
   `ruby`, and `wbr` elements to the whitelist for the relaxed config.
 * The `dir`, `lang`, and `title` attributes are now whitelisted for all
   elements in the relaxed config.
-* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
-  #315) that caused `</body></html>` to be appended to the CDATA inside
+* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
+  (issue #315) that caused `</body></html>` to be appended to the CDATA inside
   unterminated script and style elements.
 
 
data/LICENSE
CHANGED
data/README.rdoc
CHANGED
@@ -15,7 +15,7 @@ or maliciously-formed HTML, and will always output valid HTML or XHTML.
 
 *Author*:: Ryan Grove (mailto:ryan@wonko.com)
 *Version*:: 2.0.0 (git)
-*Copyright*:: Copyright (c)
+*Copyright*:: Copyright (c) 2011 Ryan Grove. All rights reserved.
 *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
 *Website*:: http://github.com/rgrove/sanitize
 
@@ -173,8 +173,13 @@ The default value is <code>false</code>.
 
 ==== :transformers
 
-Custom transformer or array of custom transformers
-below for details.
+Custom transformer or array of custom transformers to run using depth-first
+traversal. See the Transformers section below for details.
+
+=== :transformers_breadth
+
+Custom transformer or array of custom transformers to run using breadth-first
+traversal. See the Transformers section below for details.
 
 ==== :whitespace_elements (Array)
 
@@ -230,6 +235,10 @@ receive as an argument an environment Hash that contains the following items:
   whitelisted by previous transformers, if any. It's generally bad form to
   remove a node that a previous transformer has whitelisted.
 
+[<code>:traversal_mode</code>]
+  Current node traversal mode, either <code>:depth</code> for depth-first (the
+  default mode) or <code>:breadth</code> for breadth-first.
+
 ==== Output
 
 A transformer doesn't have to return anything, but may optionally return a Hash,
@@ -252,9 +261,9 @@ reflected instantly in the document and passed on to subsequently-called
 transformers and to Sanitize itself. A transformer may even call Sanitize
 internally to perform custom sanitization if needed.
 
-Nodes are passed into transformers in the order in which they're traversed.
-
-not from the first node to the last node:
+Nodes are passed into transformers in the order in which they're traversed. By
+default, depth-first traversal is used, meaning that markup is traversed from
+the deepest node upward (not from the first node to the last node):
 
     html = '<div><span>foo</span></div>'
     transformer = lambda{|env| puts env[:node_name] }
@@ -262,17 +271,26 @@ not from the first node to the last node:
     # Prints "text", "span", "div", "#document-fragment".
     Sanitize.clean(html, :transformers => transformer)
 
+You may use the <code>:transformers_breadth</code> config to specify one or more
+transformers that should traverse nodes in breadth-first mode:
+
+    html = '<div><span>foo</span></div>'
+    transformer = lambda{|env| puts env[:node_name] }
+
+    # Prints "#document-fragment", "div", "span", "text".
+    Sanitize.clean(html, :transformers_breadth => transformer)
+
 Transformers have a tremendous amount of power, including the power to
 completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
 your own hands.
 
 ==== Example: Transformer to whitelist YouTube video embeds
 
-The following example demonstrates how to create a Sanitize
-will safely whitelist valid YouTube video embeds without having
-other kinds of embedded content, which would be the case if you
-by just whitelisting all <code><object></code>,
-<code><param></code> elements:
+The following example demonstrates how to create a depth-first Sanitize
+transformer that will safely whitelist valid YouTube video embeds without having
+to blindly allow other kinds of embedded content, which would be the case if you
+tried to do this by just whitelisting all <code><object></code>,
+<code><embed></code>, and <code><param></code> elements:
 
     lambda do |env|
       node = env[:node]
@@ -323,7 +341,7 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
 
 == Contributors
 
-Sanitize was created and is
+Sanitize was created and is maintained by Ryan Grove (ryan@wonko.com).
 
 The following lovely people have also contributed to Sanitize:
 
@@ -341,7 +359,7 @@ The following lovely people have also contributed to Sanitize:
 
 == License
 
-Copyright (c)
+Copyright (c) 2011 Ryan Grove (ryan@wonko.com)
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of
 this software and associated documentation files (the 'Software'), to deal in
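The README changes above describe a new `:traversal_mode` entry in the environment Hash passed to each transformer. As a minimal sketch (not taken from the gem), a single transformer could be registered for both modes and record which mode delivered each node. The names `seen` and `logger` are illustrative assumptions, and the lambda is called here with hand-built environment Hashes so that the snippet runs without Sanitize or Nokogiri installed.

```ruby
# Records which traversal mode delivered each node. `seen` and `logger` are
# hypothetical names used only for this illustration.
seen = []

logger = lambda do |env|
  # :traversal_mode is :depth or :breadth, per the README text in this diff.
  seen << [env[:traversal_mode], env[:node_name]]
  nil # a transformer doesn't have to return anything
end

# With the gem loaded, the lambda could be registered for both modes:
#   Sanitize.clean(html, :transformers => logger, :transformers_breadth => logger)
# Here it is invoked directly with minimal stand-in environment Hashes.
logger.call(:traversal_mode => :breadth, :node_name => 'div')
logger.call(:traversal_mode => :depth,   :node_name => 'div')

seen.each { |mode, name| puts "#{mode}: #{name}" }
```

Branching on `env[:traversal_mode]` inside the lambda would let one transformer behave differently on the top-down and bottom-up passes.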
data/lib/sanitize.rb
CHANGED
@@ -1,6 +1,6 @@
 # encoding: utf-8
 #--
-# Copyright (c)
+# Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the 'Software'), to deal
@@ -70,14 +70,18 @@ class Sanitize
 
   # Returns a new Sanitize object initialized with the settings in _config_.
   def initialize(config = {})
-    @config
-    @transformers = Array(@config[:transformers].dup)
+    @config = Config::DEFAULT.merge(config)
 
-
-
-
+    @transformers = {
+      :breadth => Array(@config[:transformers_breadth].dup),
+      :depth => Array(@config[:transformers]) + Array(@config[:transformers_depth])
+    }
 
-
+    # Default depth transformers. These always run at the end of the chain,
+    # after any custom transformers.
+    @transformers[:depth] << Transformers::CleanComment unless @config[:allow_comments]
+
+    @transformers[:depth] <<
       Transformers::CleanCDATA <<
       Transformers::CleanElement.new(@config)
   end
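The constructor change above assembles one transformer chain per traversal mode, with the built-in cleaners appended after any custom depth-first transformers. A rough standalone sketch of that assembly follows. The `DEFAULTS` Hash, the `build_chains` helper, and the symbols standing in for `Transformers::CleanComment`, `Transformers::CleanCDATA`, and `Transformers::CleanElement` are placeholders invented for illustration, not the gem's objects.

```ruby
# Stand-in for Config::DEFAULT, reduced to the keys relevant here
# (the values are assumptions for this sketch).
DEFAULTS = {
  :allow_comments       => false,
  :transformers         => [],
  :transformers_breadth => []
}

# Hypothetical helper mirroring the shape of the new initialize body.
def build_chains(config = {})
  config = DEFAULTS.merge(config)

  chains = {
    :breadth => Array(config[:transformers_breadth]),
    # Array(nil) is [], so a missing :transformers_depth key adds nothing.
    :depth   => Array(config[:transformers]) + Array(config[:transformers_depth])
  }

  # Built-in cleaners (plain symbols here) run after the custom transformers.
  chains[:depth] << :clean_comment unless config[:allow_comments]
  chains[:depth] << :clean_cdata << :clean_element
  chains
end

custom = lambda { |env| nil }
chains = build_chains(:transformers => custom)

puts chains[:depth].length    # custom transformer plus the stand-in cleaners
puts chains[:breadth].empty?  # no breadth-first transformers were supplied
```

Because `Array()` wraps a single lambda and passes an Array through unchanged, the config accepts either form, which matches the README's "transformer or array of custom transformers" wording.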
@@ -117,34 +121,49 @@ class Sanitize
     raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
 
     node_whitelist = Set.new
-    node.traverse {|child| transform_node!(child, node_whitelist) }
 
+    unless @transformers[:breadth].empty?
+      traverse_breadth(node) {|n| transform_node!(n, node_whitelist, :breadth) }
+    end
+
+    traverse_depth(node) {|n| transform_node!(n, node_whitelist, :depth) }
     node
   end
 
   private
 
-  def transform_node!(node, node_whitelist)
-    @transformers.each do |transformer|
+  def transform_node!(node, node_whitelist, mode)
+    @transformers[mode].each do |transformer|
       result = transformer.call({
         :config => @config,
         :is_whitelisted => node_whitelist.include?(node),
         :node => node,
         :node_name => node.name.downcase,
-        :node_whitelist => node_whitelist
+        :node_whitelist => node_whitelist,
+        :traversal_mode => mode
       })
 
       if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
         node_whitelist.merge(result[:node_whitelist])
       end
-
-      # If the node has been unlinked or replaced, there's no point running
-      # subsequent transformers.
-      break if node.parent.nil? && !node.fragment?
     end
 
     node
   end
 
+  # Performs breadth-first traversal, operating first on the root node, then
+  # traversing downwards.
+  def traverse_breadth(node, &block)
+    block.call(node)
+    node.children.each {|child| traverse_breadth(child, &block) }
+  end
+
+  # Performs depth-first traversal, operating first on the deepest nodes in the
+  # document, then traversing upwards to the root.
+  def traverse_depth(node, &block)
+    node.children.each {|child| traverse_depth(child, &block) }
+    block.call(node)
+  end
+
   class Error < StandardError; end
 end
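To see the visiting order produced by the two helpers added above, the same recursion can be run on a small plain-Ruby tree. This is a sketch only: `TreeNode` is a hypothetical Struct standing in for `Nokogiri::XML::Node`, and the tree mirrors the README's `<div><span>foo</span></div>` example.

```ruby
# Hypothetical stand-in for Nokogiri::XML::Node with just the members used here.
TreeNode = Struct.new(:name, :children)

# Same shape as traverse_breadth in the diff: visit the node, then descend.
def traverse_breadth(node, &block)
  block.call(node)
  node.children.each { |child| traverse_breadth(child, &block) }
end

# Same shape as traverse_depth in the diff: descend first, then visit the node.
def traverse_depth(node, &block)
  node.children.each { |child| traverse_depth(child, &block) }
  block.call(node)
end

text     = TreeNode.new('text', [])
span     = TreeNode.new('span', [text])
div      = TreeNode.new('div', [span])
fragment = TreeNode.new('#document-fragment', [div])

top_down  = []
bottom_up = []
traverse_breadth(fragment) { |n| top_down << n.name }
traverse_depth(fragment)   { |n| bottom_up << n.name }

puts top_down.join(', ')   # #document-fragment, div, span, text
puts bottom_up.join(', ')  # text, span, div, #document-fragment
```

On a single chain of nodes like this one, the two orders are exact reverses of each other. As written in the diff, `traverse_breadth` finishes each child's subtree before moving to the next sibling, so on wider trees it visits parents before their children rather than strictly level by level.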
data/lib/sanitize/config.rb
CHANGED
@@ -1,5 +1,5 @@
 #--
-# Copyright (c)
+# Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the 'Software'), to deal
@@ -66,6 +66,11 @@ class Sanitize
       # README.rdoc for details and examples.
       :transformers => [],
 
+      # By default, transformers perform depth-first traversal (deepest node
+      # upward). This setting allows you to specify transformers that should
+      # perform breadth-first traversal (top node downward).
+      :transformers_breadth => [],
+
       # Elements which, when removed, should have their contents surrounded by
       # space characters to preserve readability. For example,
       # `foo<div>bar</div>baz` will become 'foo bar baz' when the <div> is
@@ -74,6 +79,7 @@ class Sanitize
         address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
         h6 header hgroup hr li nav ol p pre section ul
       ]
+
     }
   end
 end
data/lib/sanitize/config/basic.rb
CHANGED
@@ -1,5 +1,5 @@
 #--
-# Copyright (c)
+# Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the 'Software'), to deal
data/lib/sanitize/config/relaxed.rb
CHANGED
@@ -1,5 +1,5 @@
 #--
-# Copyright (c)
+# Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the 'Software'), to deal
data/lib/sanitize/config/restricted.rb
CHANGED
@@ -1,5 +1,5 @@
 #--
-# Copyright (c)
+# Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the 'Software'), to deal
data/lib/sanitize/transformers/clean_element.rb
CHANGED
@@ -6,18 +6,15 @@ class Sanitize; module Transformers
 
       # For faster lookups.
       @add_attributes = config[:add_attributes]
-      @allowed_elements =
+      @allowed_elements = Set.new(config[:elements])
       @attributes = config[:attributes]
       @protocols = config[:protocols]
       @remove_all_contents = false
-      @remove_element_contents =
-      @whitespace_elements =
-
-      config[:elements].each {|el| @allowed_elements[el] = true }
-      config[:whitespace_elements].each {|el| @whitespace_elements[el] = true }
+      @remove_element_contents = Set.new
+      @whitespace_elements = Set.new(config[:whitespace_elements])
 
       if config[:remove_contents].is_a?(Array)
-        config[:remove_contents]
+        @remove_element_contents.merge(config[:remove_contents])
       else
         @remove_all_contents = !!config[:remove_contents]
       end
@@ -30,10 +27,10 @@ class Sanitize; module Transformers
       return if env[:is_whitelisted] || !node.element?
 
       # Delete any element that isn't in the config whitelist.
-      unless @allowed_elements
+      unless @allowed_elements.include?(name)
         # Elements like br, div, p, etc. need to be replaced with whitespace in
         # order to preserve readability.
-        if @whitespace_elements
+        if @whitespace_elements.include?(name)
           node.add_previous_sibling(Nokogiri::XML::Text.new(' ', node.document))
 
           unless node.children.empty?
@@ -41,7 +38,7 @@ class Sanitize; module Transformers
           end
         end
 
-        unless @remove_all_contents || @remove_element_contents
+        unless @remove_all_contents || @remove_element_contents.include?(name)
           node.children.each {|n| node.add_previous_sibling(n) }
         end
 
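The `clean_element.rb` hunks above build `Set` objects from the config and test membership with `include?`. A small sketch of that pattern follows, assuming a trimmed-down `config` Hash invented for illustration (the gem's real configs contain more keys and longer element lists).

```ruby
require 'set'

# Illustrative config only; the element lists are made up for this example.
config = {
  :elements            => %w[a b em strong],
  :whitespace_elements => %w[br div p],
  :remove_contents     => %w[script style]
}

allowed_elements        = Set.new(config[:elements])
whitespace_elements     = Set.new(config[:whitespace_elements])
remove_element_contents = Set.new
remove_all_contents     = false

# :remove_contents may be an Array of element names or a truthy/falsy flag.
if config[:remove_contents].is_a?(Array)
  remove_element_contents.merge(config[:remove_contents])
else
  remove_all_contents = !!config[:remove_contents]
end

# Report, per element name: allowed? whitespace-replaced? contents removed?
%w[em div script].each do |name|
  puts [
    name,
    allowed_elements.include?(name),
    whitespace_elements.include?(name),
    remove_all_contents || remove_element_contents.include?(name)
  ].join(' ')
end
```

`Set#merge` accepts any enumerable, so the Array of element names can be folded in directly without a per-element loop.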
data/lib/sanitize/version.rb
CHANGED
metadata
CHANGED
@@ -7,8 +7,8 @@ version: !ruby/object:Gem::Version
   - 0
   - 0
   - dev
-  -
-  version: 2.0.0.dev.
+  - 20110105
+  version: 2.0.0.dev.20110105
 platform: ruby
 authors:
 - Ryan Grove
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date:
+date: 2011-01-05 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency