sanitize 2.0.0.dev.20101225 → 2.0.0.dev.20110105


data/HISTORY.md CHANGED
@@ -16,14 +16,17 @@ Version 2.0.0 (git)
  `<br>` and `<p>`) that should be replaced with whitespace when removed in
  order to preserve readability. See the README for the default list of
  elements that will be replaced with whitespace when removed.
+ * Added a `:transformers_breadth` config, which may be used to specify
+ transformers that should traverse nodes in a breadth-first mode rather than
+ the default depth-first mode.
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
  elements to the whitelists for the basic and relaxed configs.
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
  `ruby`, and `wbr` elements to the whitelist for the relaxed config.
  * The `dir`, `lang`, and `title` attributes are now whitelisted for all
  elements in the relaxed config.
- * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+ (issue
- #315) that caused `</body></html>` to be appended to the CDATA inside
+ * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
+ (issue #315) that caused `</body></html>` to be appended to the CDATA inside
  unterminated script and style elements.
 
 
data/LICENSE CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+ Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
 
  Permission is hereby granted, free of charge, to any person obtaining a copy of
  this software and associated documentation files (the 'Software'), to deal in
data/README.rdoc CHANGED
@@ -15,7 +15,7 @@ or maliciously-formed HTML, and will always output valid HTML or XHTML.
 
  *Author*:: Ryan Grove (mailto:ryan@wonko.com)
  *Version*:: 2.0.0 (git)
- *Copyright*:: Copyright (c) 2010 Ryan Grove. All rights reserved.
+ *Copyright*:: Copyright (c) 2011 Ryan Grove. All rights reserved.
  *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
  *Website*:: http://github.com/rgrove/sanitize
 
@@ -173,8 +173,13 @@ The default value is <code>false</code>.
 
  ==== :transformers
 
- Custom transformer or array of custom transformers. See the Transformers section
- below for details.
+ Custom transformer or array of custom transformers to run using depth-first
+ traversal. See the Transformers section below for details.
+
+ === :transformers_breadth
+
+ Custom transformer or array of custom transformers to run using breadth-first
+ traversal. See the Transformers section below for details.
 
  ==== :whitespace_elements (Array)
 
@@ -230,6 +235,10 @@ receive as an argument an environment Hash that contains the following items:
  whitelisted by previous transformers, if any. It's generally bad form to
  remove a node that a previous transformer has whitelisted.
 
+ [<code>:traversal_mode</code>]
+ Current node traversal mode, either <code>:depth</code> for depth-first (the
+ default mode) or <code>:breadth</code> for breadth-first.
+
  ==== Output
 
  A transformer doesn't have to return anything, but may optionally return a Hash,
@@ -252,9 +261,9 @@ reflected instantly in the document and passed on to subsequently-called
  transformers and to Sanitize itself. A transformer may even call Sanitize
  internally to perform custom sanitization if needed.
 
- Nodes are passed into transformers in the order in which they're traversed. It's
- important to note that Nokogiri traverses markup from the deepest node upward,
- not from the first node to the last node:
+ Nodes are passed into transformers in the order in which they're traversed. By
+ default, depth-first traversal is used, meaning that markup is traversed from
+ the deepest node upward (not from the first node to the last node):
 
  html = '<div><span>foo</span></div>'
  transformer = lambda{|env| puts env[:node_name] }
@@ -262,17 +271,26 @@ not from the first node to the last node:
  # Prints "text", "span", "div", "#document-fragment".
  Sanitize.clean(html, :transformers => transformer)
 
+ You may use the <code>:transformers_breadth</code> config to specify one or more
+ transformers that should traverse nodes in breadth-first mode:
+
+ html = '<div><span>foo</span></div>'
+ transformer = lambda{|env| puts env[:node_name] }
+
+ # Prints "#document-fragment", "div", "span", "text".
+ Sanitize.clean(html, :transformers_breadth => transformer)
+
  Transformers have a tremendous amount of power, including the power to
  completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
  your own hands.
 
  ==== Example: Transformer to whitelist YouTube video embeds
 
- The following example demonstrates how to create a Sanitize transformer that
- will safely whitelist valid YouTube video embeds without having to blindly allow
- other kinds of embedded content, which would be the case if you tried to do this
- by just whitelisting all <code><object></code>, <code><embed></code>, and
- <code><param></code> elements:
+ The following example demonstrates how to create a depth-first Sanitize
+ transformer that will safely whitelist valid YouTube video embeds without having
+ to blindly allow other kinds of embedded content, which would be the case if you
+ tried to do this by just whitelisting all <code><object></code>,
+ <code><embed></code>, and <code><param></code> elements:
 
  lambda do |env|
  node = env[:node]
@@ -323,7 +341,7 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
 
  == Contributors
 
- Sanitize was created and is currently maintained by Ryan Grove (ryan@wonko.com).
+ Sanitize was created and is maintained by Ryan Grove (ryan@wonko.com).
 
  The following lovely people have also contributed to Sanitize:
 
@@ -341,7 +359,7 @@ The following lovely people have also contributed to Sanitize:
 
  == License
 
- Copyright (c) 2010 Ryan Grove (ryan@wonko.com)
+ Copyright (c) 2011 Ryan Grove (ryan@wonko.com)
 
  Permission is hereby granted, free of charge, to any person obtaining a copy of
  this software and associated documentation files (the 'Software'), to deal in
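
As a rough illustration of the new `:traversal_mode` environment item described in the README changes above, the sketch below shows a transformer that branches on the mode it is called in. The `seen` array and the hand-built environment hashes are assumptions made for illustration only; in real use Sanitize builds the environment Hash itself and passes it to the transformer.

```ruby
# Collects a label for each call so the behaviour can be inspected afterwards.
# `seen` is a hypothetical helper, not part of Sanitize.
seen = []

# A transformer is any object that responds to #call and accepts the env Hash.
mode_aware = lambda do |env|
  case env[:traversal_mode]
  when :breadth then seen << "top-down:#{env[:node_name]}"
  when :depth   then seen << "bottom-up:#{env[:node_name]}"
  end
  nil # a transformer doesn't have to return anything
end

# Simulated calls (assumption: Sanitize is not loaded in this sketch, so the
# env hashes are built by hand with only the keys the transformer reads).
mode_aware.call({ :node_name => 'div',  :traversal_mode => :breadth })
mode_aware.call({ :node_name => 'span', :traversal_mode => :depth })

p seen
```

Registering the same lambda under both `:transformers` and `:transformers_breadth` would presumably let one callable handle both passes, which is the case where checking the mode is most useful.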
data/lib/sanitize.rb CHANGED
@@ -1,6 +1,6 @@
  # encoding: utf-8
  #--
- # Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+ # Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the 'Software'), to deal
@@ -70,14 +70,18 @@ class Sanitize
 
    # Returns a new Sanitize object initialized with the settings in _config_.
    def initialize(config = {})
-     @config = Config::DEFAULT.merge(config)
-     @transformers = Array(@config[:transformers].dup)
+     @config = Config::DEFAULT.merge(config)
 
-     # Default transformers. These always run at the end of the transformer
-     # chain, after any custom transformers.
-     @transformers << Transformers::CleanComment unless @config[:allow_comments]
+     @transformers = {
+       :breadth => Array(@config[:transformers_breadth].dup),
+       :depth => Array(@config[:transformers]) + Array(@config[:transformers_depth])
+     }
 
-     @transformers <<
+     # Default depth transformers. These always run at the end of the chain,
+     # after any custom transformers.
+     @transformers[:depth] << Transformers::CleanComment unless @config[:allow_comments]
+
+     @transformers[:depth] <<
          Transformers::CleanCDATA <<
          Transformers::CleanElement.new(@config)
    end
@@ -117,34 +121,49 @@ class Sanitize
      raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
 
      node_whitelist = Set.new
-     node.traverse {|child| transform_node!(child, node_whitelist) }
 
+     unless @transformers[:breadth].empty?
+       traverse_breadth(node) {|n| transform_node!(n, node_whitelist, :breadth) }
+     end
+
+     traverse_depth(node) {|n| transform_node!(n, node_whitelist, :depth) }
      node
    end
 
    private
 
-   def transform_node!(node, node_whitelist)
-     @transformers.each do |transformer|
+   def transform_node!(node, node_whitelist, mode)
+     @transformers[mode].each do |transformer|
        result = transformer.call({
          :config => @config,
          :is_whitelisted => node_whitelist.include?(node),
          :node => node,
          :node_name => node.name.downcase,
-         :node_whitelist => node_whitelist
+         :node_whitelist => node_whitelist,
+         :traversal_mode => mode
        })
 
        if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
          node_whitelist.merge(result[:node_whitelist])
        end
-
-       # If the node has been unlinked or replaced, there's no point running
-       # subsequent transformers.
-       break if node.parent.nil? && !node.fragment?
      end
 
      node
    end
 
+   # Performs breadth-first traversal, operating first on the root node, then
+   # traversing downwards.
+   def traverse_breadth(node, &block)
+     block.call(node)
+     node.children.each {|child| traverse_breadth(child, &block) }
+   end
+
+   # Performs depth-first traversal, operating first on the deepest nodes in the
+   # document, then traversing upwards to the root.
+   def traverse_depth(node, &block)
+     node.children.each {|child| traverse_depth(child, &block) }
+     block.call(node)
+   end
+
    class Error < StandardError; end
  end
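
To make the two traversal orders added in this file concrete, here is a minimal standalone sketch. It copies the bodies of `traverse_breadth` and `traverse_depth` from the diff, but runs them on a hypothetical `Node` struct (an assumption standing in for `Nokogiri::XML::Node`, which only needs to expose `children` here).

```ruby
# Hypothetical stand-in for a Nokogiri node: just a name and a child list.
Node = Struct.new(:name, :children)

# Calls the block on the parent first, then recurses into each child subtree
# (same body as traverse_breadth in the diff).
def traverse_breadth(node, &block)
  block.call(node)
  node.children.each { |child| traverse_breadth(child, &block) }
end

# Recurses into each child subtree first, then calls the block on the parent
# (same body as traverse_depth in the diff).
def traverse_depth(node, &block)
  node.children.each { |child| traverse_depth(child, &block) }
  block.call(node)
end

# Tree mirroring the README example '<div><span>foo</span></div>'.
text = Node.new('text', [])
span = Node.new('span', [text])
div  = Node.new('div', [span])
root = Node.new('#document-fragment', [div])

top_down  = []
bottom_up = []
traverse_breadth(root) { |n| top_down << n.name }
traverse_depth(root)   { |n| bottom_up << n.name }

p top_down   # ["#document-fragment", "div", "span", "text"]
p bottom_up  # ["text", "span", "div", "#document-fragment"]

# A tree with siblings: a has children b and c, and b has child d.
d = Node.new('d', [])
b = Node.new('b', [d])
c = Node.new('c', [])
a = Node.new('a', [b, c])

sibling_order = []
traverse_breadth(a) { |n| sibling_order << n.name }
p sibling_order  # ["a", "b", "d", "c"]
```

On the single-chain README example the two orders are exact reverses of each other. On the tree with siblings, `traverse_breadth` as written finishes the whole `b` subtree before visiting `c`, so it guarantees that a parent is seen before its children rather than a strict level-by-level order.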
@@ -1,5 +1,5 @@
  #--
- # Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+ # Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the 'Software'), to deal
@@ -66,6 +66,11 @@ class Sanitize
        # README.rdoc for details and examples.
        :transformers => [],
 
+       # By default, transformers perform depth-first traversal (deepest node
+       # upward). This setting allows you to specify transformers that should
+       # perform breadth-first traversal (top node downward).
+       :transformers_breadth => [],
+
        # Elements which, when removed, should have their contents surrounded by
        # space characters to preserve readability. For example,
        # `foo<div>bar</div>baz` will become 'foo bar baz' when the <div> is
@@ -74,6 +79,7 @@ class Sanitize
          address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
          h6 header hgroup hr li nav ol p pre section ul
        ]
+
      }
    end
  end
@@ -1,5 +1,5 @@
  #--
- # Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+ # Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the 'Software'), to deal
@@ -1,5 +1,5 @@
  #--
- # Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+ # Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the 'Software'), to deal
@@ -1,5 +1,5 @@
  #--
- # Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+ # Copyright (c) 2011 Ryan Grove <ryan@wonko.com>
  #
  # Permission is hereby granted, free of charge, to any person obtaining a copy
  # of this software and associated documentation files (the 'Software'), to deal
@@ -6,18 +6,15 @@ class Sanitize; module Transformers
 
        # For faster lookups.
        @add_attributes = config[:add_attributes]
-       @allowed_elements = {}
+       @allowed_elements = Set.new(config[:elements])
        @attributes = config[:attributes]
        @protocols = config[:protocols]
        @remove_all_contents = false
-       @remove_element_contents = {}
-       @whitespace_elements = {}
-
-       config[:elements].each {|el| @allowed_elements[el] = true }
-       config[:whitespace_elements].each {|el| @whitespace_elements[el] = true }
+       @remove_element_contents = Set.new
+       @whitespace_elements = Set.new(config[:whitespace_elements])
 
        if config[:remove_contents].is_a?(Array)
-         config[:remove_contents].each {|el| @remove_element_contents[el] = true }
+         @remove_element_contents.merge(config[:remove_contents])
        else
          @remove_all_contents = !!config[:remove_contents]
        end
@@ -30,10 +27,10 @@ class Sanitize; module Transformers
        return if env[:is_whitelisted] || !node.element?
 
        # Delete any element that isn't in the config whitelist.
-       unless @allowed_elements[name]
+       unless @allowed_elements.include?(name)
          # Elements like br, div, p, etc. need to be replaced with whitespace in
          # order to preserve readability.
-         if @whitespace_elements[name]
+         if @whitespace_elements.include?(name)
            node.add_previous_sibling(Nokogiri::XML::Text.new(' ', node.document))
 
            unless node.children.empty?
@@ -41,7 +38,7 @@ class Sanitize; module Transformers
            end
          end
 
-         unless @remove_all_contents || @remove_element_contents[name]
+         unless @remove_all_contents || @remove_element_contents.include?(name)
            node.children.each {|n| node.add_previous_sibling(n) }
          end
 
@@ -1,3 +1,3 @@
  class Sanitize
-   VERSION = '2.0.0.dev.20101225'
+   VERSION = '2.0.0.dev.20110105'
  end
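
The transformer hunks earlier in this diff replace Hash-as-set lookups (`hash[name] = true`) with `Set` membership tests. A minimal sketch of that pattern follows, using a hypothetical config Hash whose keys mirror the ones in the diff; the element lists are invented for illustration and are not Sanitize's defaults.

```ruby
require 'set'

# Hypothetical config fragment (values are illustrative assumptions).
config = {
  :elements            => %w[a b em],
  :whitespace_elements => %w[div p],
  :remove_contents     => %w[script style]
}

# Build the lookup sets once, up front, as the new initializer does.
allowed    = Set.new(config[:elements])
whitespace = Set.new(config[:whitespace_elements])

# :remove_contents may be an Array of element names or a plain truthy flag.
remove_all = false
remove_for = Set.new
if config[:remove_contents].is_a?(Array)
  remove_for.merge(config[:remove_contents])
else
  remove_all = !!config[:remove_contents]
end

p allowed.include?('em')                        # true
p allowed.include?('script')                    # false
p whitespace.include?('div')                    # true
p remove_all || remove_for.include?('script')   # true
```

Both forms give constant-time lookups; the `Set` version mainly states the intent more directly and drops the separate population loops.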
metadata CHANGED
@@ -7,8 +7,8 @@ version: !ruby/object:Gem::Version
  - 0
  - 0
  - dev
- - 20101225
- version: 2.0.0.dev.20101225
+ - 20110105
+ version: 2.0.0.dev.20110105
  platform: ruby
  authors:
  - Ryan Grove
@@ -16,7 +16,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-12-25 00:00:00 -08:00
+ date: 2011-01-05 00:00:00 -08:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency