sanitize 1.2.1 → 1.2.2.dev.20100822

data/HISTORY CHANGED
@@ -1,6 +1,18 @@
  Sanitize History
  ================================================================================
 
+ Version 1.2.? (git)
+ * The environment hash passed into transformers now includes an
+   :allowed_elements Hash to facilitate faster lookups when attempting to
+   determine whether an element is in the whitelist. [Suggested by Nicholas
+   Evans]
+ * The environment hash passed into transformers now includes a
+   :whitelist_nodes Array, so transformers now have insight into what nodes
+   have been whitelisted by other transformers. [Suggested by Nicholas Evans]
+ * Added a workaround for a bug in Nokogiri 1.4.2 and higher (issue #315) that
+   causes "</body></html>" to be appended to the CDATA inside unterminated
+   script and style elements.
+
  Version 1.2.1 (2010-04-20)
  * Added a :remove_contents config setting. If set to true, Sanitize will
    remove the contents of all non-whitelisted elements in addition to the
data/README.rdoc CHANGED
@@ -14,7 +14,7 @@ of fragile regular expressions, Sanitize has no trouble dealing with malformed
  or maliciously-formed HTML, and will always output valid HTML or XHTML.
 
  *Author*:: Ryan Grove (mailto:ryan@wonko.com)
- *Version*:: 1.2.1 (2010-04-20)
+ *Version*:: 1.2.? (git)
  *Copyright*:: Copyright (c) 2010 Ryan Grove. All rights reserved.
  *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
  *Website*:: http://github.com/rgrove/sanitize
@@ -194,6 +194,10 @@ Each registered transformer's <code>call()</code> method will be called once for
  each element node in the HTML, and will receive as an argument an environment
  Hash that contains the following items:
 
+ [<code>:allowed_elements</code>]
+   Hash with whitelisted element names as keys, to facilitate fast lookups of
+   whitelisted elements.
+
  [<code>:config</code>]
    The current Sanitize configuration Hash.
 
@@ -203,6 +207,10 @@ Hash that contains the following items:
  [<code>:node_name</code>]
    The name of the current HTML node, always lowercase (e.g. "div" or "span").
 
+ [<code>:whitelist_nodes</code>]
+   Array of Nokogiri::XML::Node instances that have already been whitelisted by
+   previous transformers, if any.
+
  ==== Processing
 
  Each transformer has full access to the Nokogiri::XML::Node that's passed into
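To make the two new environment keys concrete, here is a minimal sketch of a transformer that reads both of them. FakeNode, whitelist_status, and LOG_STATUS are hypothetical names used only for illustration, the environment Hash is built by hand rather than by Sanitize, and the true values stored in :allowed_elements are an assumption (the README only says that element names are the keys).

```ruby
# Hypothetical stand-in for Nokogiri::XML::Node so the sketch runs without
# Nokogiri. Instances compare by identity, as distinct nodes would.
class FakeNode
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Classifies the current node using only the environment Hash.
def whitelist_status(env)
  if env[:whitelist_nodes].include?(env[:node])
    :whitelisted_by_transformer   # an earlier transformer vouched for this node
  elsif env[:allowed_elements].key?(env[:node_name])
    :allowed_by_config            # Hash key lookup instead of an Array scan
  else
    :not_allowed
  end
end

# A transformer is anything that responds to call(env). This one only
# observes, and returns nil the way FIX_FRAGMENT_CDATA does.
LOG_STATUS = lambda do |env|
  puts "#{env[:node_name]}: #{whitelist_status(env)}"
  nil
end

embed = FakeNode.new('EMBED')
bold  = FakeNode.new('b')
blink = FakeNode.new('blink')

[embed, bold, blink].each do |node|
  LOG_STATUS.call({
    :allowed_elements => { 'b' => true },   # assumed value type
    :config           => {},
    :node             => node,
    :node_name        => node.name.downcase,
    :whitelist_nodes  => [embed]
  })
end
```

The sketch checks :whitelist_nodes first because a node whitelisted by an earlier transformer may not appear in the configured element list at all.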
@@ -301,9 +309,11 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
  The following lovely people have contributed to Sanitize in the form of patches
  or ideas that later became code:
 
+ * Ryan Grove <ryan@wonko.com>
+ * Wilson Bilkovich <wilson@supremetyrant.com>
  * Peter Cooper <git@peterc.org>
  * Gabe da Silveira <gabe@websaviour.com>
- * Ryan Grove <ryan@wonko.com>
+ * Nicholas Evans <owlmanatt@gmail.com>
  * Adam Hooper <adam@adamhooper.com>
  * Mutwin Kraus <mutle@blogage.de>
  * Dev Purkayastha <dev.purkayastha@gmail.com>
data/lib/sanitize/transformers/fix_fragment_cdata.rb ADDED
@@ -0,0 +1,27 @@
+ class Sanitize; module Transformers
+
+   # Nokogiri 1.4.2 and higher contain a fragment parsing bug that causes the
+   # string "</body></html>" to be appended to the CDATA inside an unterminated
+   # <script> or <style> element. This transformer works around this bug by
+   # finding affected elements and removing the spurious text.
+   #
+   # See http://github.com/tenderlove/nokogiri/issues#issue/315
+   FIX_FRAGMENT_CDATA = lambda do |env|
+     node_name = env[:node_name]
+
+     if node_name == 'script' || node_name == 'style'
+       node = env[:node]
+
+       unless node.children.empty?
+         last_child = node.children.last
+
+         if last_child.text? && last_child.content =~ %r|</body></html>$|
+           last_child.content = last_child.content.chomp('</body></html>')
+         end
+       end
+     end
+
+     nil
+   end
+
+ end; end
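The string handling inside FIX_FRAGMENT_CDATA can be tried in isolation. The following is a rough sketch on plain strings, with no Nokogiri involved; strip_spurious_suffix and SPURIOUS_SUFFIX are hypothetical names, not part of the gem. It also shows how the regex guard and chomp differ on multi-line text.

```ruby
SPURIOUS_SUFFIX = '</body></html>'

# Hypothetical helper: the same string operation the transformer applies to
# the last text child of a <script> or <style> element.
def strip_spurious_suffix(text)
  # String#chomp(suffix) removes the suffix only when the string ends with it,
  # so unaffected text passes through unchanged.
  text.chomp(SPURIOUS_SUFFIX)
end

puts strip_spurious_suffix('alert(1);</body></html>')
puts strip_spurious_suffix('alert(1);')

# In Ruby, `$` matches at the end of any line, while `\z` matches only at the
# end of the string. A suffix in the middle of multi-line text therefore
# passes the transformer's regex guard, but chomp still leaves it alone.
multiline = "a();</body></html>\nb();"
puts((multiline =~ %r|</body></html>$|).inspect)
puts((multiline =~ %r|</body></html>\z|).inspect)
puts strip_spurious_suffix(multiline) == multiline
```

Because chomp is already a no-op when the suffix is absent, the regex in the transformer acts as a cheap guard that avoids reassigning node content, not as the thing that decides what gets removed.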
data/lib/sanitize/version.rb CHANGED
@@ -1,3 +1,3 @@
  class Sanitize
-   VERSION = '1.2.1'
+   VERSION = '1.2.2.dev.20100822'
  end
data/lib/sanitize.rb CHANGED
@@ -27,6 +27,7 @@ require 'sanitize/config'
  require 'sanitize/config/restricted'
  require 'sanitize/config/basic'
  require 'sanitize/config/relaxed'
+ require 'sanitize/transformers/fix_fragment_cdata'
 
  class Sanitize
    attr_reader :config
@@ -90,12 +91,22 @@ class Sanitize
      # is generated at runtime by transformers, and is cleared before and after
      # a fragment is cleaned (so it applies only to a specific fragment).
      @whitelist_nodes = []
+
+     # Workaround for a fragment parsing bug in Nokogiri >= 1.4.2. The naïve
+     # version check is fine here; there are no side effects for unaffected
+     # versions except slightly worse performance, and I plan to remove this hack
+     # as soon as Nokogiri fixes the bug on their end.
+     if Nokogiri::VERSION > '1.4.1'
+       @config[:transformers] << Transformers::FIX_FRAGMENT_CDATA
+     end
    end
 
    # Returns a sanitized copy of _html_.
    def clean(html)
-     dupe = html.dup
-     clean!(dupe) || dupe
+     if html
+       dupe = html.dup
+       clean!(dupe) || dupe
+     end
    end
 
    # Performs clean in place, returning _html_, or +nil+ if no changes were
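The version check in this hunk compares Strings, so the ordering is lexical, not numeric. That holds for the 1.4.x range the gemspec allows, but it would misorder a version such as 1.10.0. As a sketch only (this is not what the gem does, and newer_than? is a hypothetical helper), a segment-aware comparison could use Gem::Version from RubyGems:

```ruby
require 'rubygems'

# Hypothetical helper: segment-aware alternative to `version > '1.4.1'`.
def newer_than?(version, baseline)
  Gem::Version.new(version) > Gem::Version.new(baseline)
end

%w[1.4.1 1.4.2 1.4.10 1.10.0].each do |v|
  lexical = v > '1.4.1'               # String comparison, as in the diff
  numeric = newer_than?(v, '1.4.1')   # compares segment by segment
  puts "#{v}: lexical=#{lexical} segment-aware=#{numeric}"
end
```

The two approaches agree for every version in the list except 1.10.0, where the String comparison reports that it is not newer than 1.4.1.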
@@ -215,9 +226,11 @@ class Sanitize
 
    @config[:transformers].inject(node) do |transformer_node, transformer|
      transform = transformer.call({
-       :config => @config,
-       :node => transformer_node,
-       :node_name => transformer_node.name.downcase
+       :allowed_elements => @allowed_elements,
+       :config => @config,
+       :node => transformer_node,
+       :node_name => transformer_node.name.downcase,
+       :whitelist_nodes => @whitelist_nodes
      })
 
      if transform.nil?
metadata CHANGED
@@ -1,7 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: sanitize
  version: !ruby/object:Gem::Version
-   version: 1.2.1
+   prerelease: true
+   segments:
+   - 1
+   - 2
+   - 2
+   - dev
+   - 20100822
+   version: 1.2.2.dev.20100822
  platform: ruby
  authors:
  - Ryan Grove
@@ -9,39 +16,54 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-04-20 00:00:00 -07:00
+ date: 2010-08-22 00:00:00 -07:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri
-   type: :runtime
-   version_requirement:
-   version_requirements: !ruby/object:Gem::Requirement
+   prerelease: false
+   requirement: &id001 !ruby/object:Gem::Requirement
+     none: false
      requirements:
      - - ~>
        - !ruby/object:Gem::Version
+         segments:
+         - 1
+         - 4
+         - 1
          version: 1.4.1
-     version:
+   type: :runtime
+   version_requirements: *id001
  - !ruby/object:Gem::Dependency
    name: bacon
-   type: :development
-   version_requirement:
-   version_requirements: !ruby/object:Gem::Requirement
+   prerelease: false
+   requirement: &id002 !ruby/object:Gem::Requirement
+     none: false
      requirements:
      - - ~>
        - !ruby/object:Gem::Version
+         segments:
+         - 1
+         - 1
+         - 0
          version: 1.1.0
-     version:
+   type: :development
+   version_requirements: *id002
  - !ruby/object:Gem::Dependency
    name: rake
-   type: :development
-   version_requirement:
-   version_requirements: !ruby/object:Gem::Requirement
+   prerelease: false
+   requirement: &id003 !ruby/object:Gem::Requirement
+     none: false
      requirements:
      - - ~>
        - !ruby/object:Gem::Version
+         segments:
+         - 0
+         - 8
+         - 0
          version: 0.8.0
-     version:
+   type: :development
+   version_requirements: *id003
  description:
  email: ryan@wonko.com
  executables: []
@@ -58,6 +80,7 @@ files:
  - lib/sanitize/config/relaxed.rb
  - lib/sanitize/config/restricted.rb
  - lib/sanitize/config.rb
+ - lib/sanitize/transformers/fix_fragment_cdata.rb
  - lib/sanitize/version.rb
  - lib/sanitize.rb
  has_rdoc: true
@@ -70,21 +93,29 @@ rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
+       segments:
+       - 1
+       - 8
+       - 6
        version: 1.8.6
-   version:
  required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
    requirements:
-   - - ">="
+   - - ">"
      - !ruby/object:Gem::Version
-       version: "0"
-   version:
+       segments:
+       - 1
+       - 3
+       - 1
+       version: 1.3.1
  requirements: []
 
  rubyforge_project: riposte
- rubygems_version: 1.3.5
+ rubygems_version: 1.3.7
  signing_key:
  specification_version: 3
  summary: Whitelist-based HTML sanitizer.
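The version constraints recorded in this metadata can be explored with the RubyGems classes that produce them. The sketch below assumes only that RubyGems is available; the constant names are hypothetical, and the probe versions are picked for illustration.

```ruby
require 'rubygems'

NOKOGIRI_REQ = Gem::Requirement.new('~> 1.4.1')
RUBYGEMS_REQ = Gem::Requirement.new('> 1.3.1')
DEV_VERSION  = Gem::Version.new('1.2.2.dev.20100822')

# "~> 1.4.1" accepts 1.4.x releases from 1.4.1 upward, but not 1.5.0.
%w[1.4.0 1.4.1 1.4.3 1.5.0].each do |v|
  puts "nokogiri #{v}: #{NOKOGIRI_REQ.satisfied_by?(Gem::Version.new(v))}"
end

# The "dev" segment makes this a prerelease that sorts between 1.2.1 and 1.2.2.
puts DEV_VERSION.prerelease?
puts DEV_VERSION.segments.inspect
puts DEV_VERSION > Gem::Version.new('1.2.1')
puts DEV_VERSION < Gem::Version.new('1.2.2')

# The strict "> 1.3.1" bound excludes RubyGems 1.3.1 itself.
puts RUBYGEMS_REQ.satisfied_by?(Gem::Version.new('1.3.1'))
puts RUBYGEMS_REQ.satisfied_by?(Gem::Version.new('1.3.7'))
```

The segments printed for DEV_VERSION correspond to the segments list that appears under version in the metadata above.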