sanitize 1.2.1 → 1.2.2.dev.20100822
- data/HISTORY +12 -0
- data/README.rdoc +12 -2
- data/lib/sanitize/transformers/fix_fragment_cdata.rb +27 -0
- data/lib/sanitize/version.rb +1 -1
- data/lib/sanitize.rb +18 -5
- metadata +50 -19
data/HISTORY
CHANGED
@@ -1,6 +1,18 @@
 Sanitize History
 ================================================================================
 
+Version 1.2.? (git)
+  * The environment hash passed into transformers now includes an
+    :allowed_elements Hash to facilitate faster lookups when attempting to
+    determine whether an element is in the whitelist. [Suggested by Nicholas
+    Evans]
+  * The environment hash passed into transformers now includes a
+    :whitelist_nodes Array, so transformers now have insight into what nodes
+    have been whitelisted by other transformers. [Suggested by Nicholas Evans]
+  * Added a workaround for a bug in Nokogiri 1.4.2 and higher (issue #315) that
+    causes "</body></html>" to be appended to the CDATA inside unterminated
+    script and style elements.
+
 Version 1.2.1 (2010-04-20)
   * Added a :remove_contents config setting. If set to true, Sanitize will
     remove the contents of all non-whitelisted elements in addition to the
data/README.rdoc
CHANGED
@@ -14,7 +14,7 @@ of fragile regular expressions, Sanitize has no trouble dealing with malformed
 or maliciously-formed HTML, and will always output valid HTML or XHTML.
 
 *Author*:: Ryan Grove (mailto:ryan@wonko.com)
-*Version*:: 1.2
+*Version*:: 1.2.? (git)
 *Copyright*:: Copyright (c) 2010 Ryan Grove. All rights reserved.
 *License*:: MIT License (http://opensource.org/licenses/mit-license.php)
 *Website*:: http://github.com/rgrove/sanitize
@@ -194,6 +194,10 @@ Each registered transformer's <code>call()</code> method will be called once for
 each element node in the HTML, and will receive as an argument an environment
 Hash that contains the following items:
 
+[<code>:allowed_elements</code>]
+  Hash with whitelisted element names as keys, to facilitate fast lookups of
+  whitelisted elements.
+
 [<code>:config</code>]
   The current Sanitize configuration Hash.
 
@@ -203,6 +207,10 @@ Hash that contains the following items:
 [<code>:node_name</code>]
   The name of the current HTML node, always lowercase (e.g. "div" or "span").
 
+[<code>:whitelist_nodes</code>]
+  Array of Nokogiri::XML::Node instances that have already been whitelisted by
+  previous transformers, if any.
+
 ==== Processing
 
 Each transformer has full access to the Nokogiri::XML::Node that's passed into
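The two new environment keys described above could be used along these lines. This is a minimal sketch, not code from the gem: `REJECTED`, `LOG_REJECTED`, and the hand-built `env` are hypothetical, the `true` values in `:allowed_elements` are an assumption (the README only says that element names are the keys), and a plain `Object` stands in for a `Nokogiri::XML::Node`.

```ruby
# Hypothetical transformer that records element names which are neither in
# the whitelist nor already whitelisted by an earlier transformer.
REJECTED = []

LOG_REJECTED = lambda do |env|
  already_ok = env[:whitelist_nodes].include?(env[:node])
  # Hash#key? is a constant-time lookup, unlike Array#include?.
  allowed = env[:allowed_elements].key?(env[:node_name])

  REJECTED << env[:node_name] unless already_ok || allowed
  nil # returning nil, as FIX_FRAGMENT_CDATA does
end

# Hand-built environment for illustration; Sanitize normally supplies it.
node = Object.new # stand-in for a Nokogiri::XML::Node
env = {
  :allowed_elements => { 'b' => true, 'em' => true }, # values are assumed
  :config           => {},
  :node             => node,
  :node_name        => 'script',
  :whitelist_nodes  => []
}

LOG_REJECTED.call(env)                                   # not allowed
LOG_REJECTED.call(env.merge(:node_name => 'b'))          # in the whitelist
LOG_REJECTED.call(env.merge(:whitelist_nodes => [node])) # already whitelisted

p REJECTED
```

Only the first call adds an entry, because the other two nodes are either listed in `:allowed_elements` or present in `:whitelist_nodes`.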
@@ -301,9 +309,11 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
 The following lovely people have contributed to Sanitize in the form of patches
 or ideas that later became code:
 
+* Ryan Grove <ryan@wonko.com>
+* Wilson Bilkovich <wilson@supremetyrant.com>
 * Peter Cooper <git@peterc.org>
 * Gabe da Silveira <gabe@websaviour.com>
-*
+* Nicholas Evans <owlmanatt@gmail.com>
 * Adam Hooper <adam@adamhooper.com>
 * Mutwin Kraus <mutle@blogage.de>
 * Dev Purkayastha <dev.purkayastha@gmail.com>
data/lib/sanitize/transformers/fix_fragment_cdata.rb
ADDED
@@ -0,0 +1,27 @@
+class Sanitize; module Transformers
+
+  # Nokogiri 1.4.2 and higher contain a fragment parsing bug that causes the
+  # string "</body></html>" to be appended to the CDATA inside an unterminated
+  # <script> or <style> element. This transformer works around this bug by
+  # finding affected elements and removing the spurious text.
+  #
+  # See http://github.com/tenderlove/nokogiri/issues#issue/315
+  FIX_FRAGMENT_CDATA = lambda do |env|
+    node_name = env[:node_name]
+
+    if node_name == 'script' || node_name == 'style'
+      node = env[:node]
+
+      unless node.children.empty?
+        last_child = node.children.last
+
+        if last_child.text? && last_child.content =~ %r|</body></html>$|
+          last_child.content = last_child.content.chomp('</body></html>')
+        end
+      end
+    end
+
+    nil
+  end
+
+end; end
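To see what the workaround does without installing Nokogiri, the same steps can be tried on stand-in objects. This is a condensed sketch under stated assumptions: `FakeText`, `FakeElement`, and `strip_spurious_close_tags` are hypothetical names that only imitate the few node methods the transformer touches, and the regular-expression guard is folded into `String#chomp`, which already leaves the string alone when the suffix is absent.

```ruby
# Hypothetical stand-ins for the Nokogiri node types the transformer touches.
FakeText = Struct.new(:content) do
  def text?
    true
  end
end
FakeElement = Struct.new(:name, :children)

# Condensed version of the transformer's logic: only the last child of a
# script or style element is inspected, and only a trailing
# "</body></html>" is removed.
def strip_spurious_close_tags(element)
  return unless %w[script style].include?(element.name)

  last_child = element.children.last
  return unless last_child && last_child.text?

  last_child.content = last_child.content.chomp('</body></html>')
end

script = FakeElement.new('script', [FakeText.new('alert(1);</body></html>')])
div    = FakeElement.new('div',    [FakeText.new('hello</body></html>')])

strip_spurious_close_tags(script)
strip_spurious_close_tags(div)

puts script.children.last.content # alert(1);
puts div.children.last.content    # hello</body></html>
```

The `div` is left untouched because the bug described in the comment only affects `script` and `style` elements.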
data/lib/sanitize/version.rb
CHANGED
data/lib/sanitize.rb
CHANGED
@@ -27,6 +27,7 @@ require 'sanitize/config'
 require 'sanitize/config/restricted'
 require 'sanitize/config/basic'
 require 'sanitize/config/relaxed'
+require 'sanitize/transformers/fix_fragment_cdata'
 
 class Sanitize
   attr_reader :config
@@ -90,12 +91,22 @@ class Sanitize
     # is generated at runtime by transformers, and is cleared before and after
     # a fragment is cleaned (so it applies only to a specific fragment).
     @whitelist_nodes = []
+
+    # Workaround for a fragment parsing bug in Nokogiri >= 1.4.2. The naïve
+    # version check is fine here; there are no side effects for unaffected
+    # versions except slightly worse performance, and I plan to remove this hack
+    # as soon as Nokogiri fixes the bug on their end.
+    if Nokogiri::VERSION > '1.4.1'
+      @config[:transformers] << Transformers::FIX_FRAGMENT_CDATA
+    end
   end
 
   # Returns a sanitized copy of _html_.
   def clean(html)
-
-
+    if html
+      dupe = html.dup
+      clean!(dupe) || dupe
+    end
   end
 
   # Performs clean in place, returning _html_, or +nil+ if no changes were
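The comment in this hunk calls the version check naïve. One likely reason (an assumption; the source does not spell it out) is that `Nokogiri::VERSION` is a String, so `>` compares character by character rather than by version segment. The sketch below contrasts that with `Gem::Version`; the two helper method names are hypothetical.

```ruby
require 'rubygems' # provides Gem::Version; loaded by default on modern Rubies

# Mirrors the check in the diff: plain String comparison.
def newer_by_string?(version)
  version > '1.4.1'
end

# Segment-aware alternative using RubyGems' own version class.
def newer_by_gem_version?(version)
  Gem::Version.new(version) > Gem::Version.new('1.4.1')
end

%w[1.4.0 1.4.1 1.4.2 1.10.0].each do |v|
  puts "#{v}: string=#{newer_by_string?(v)} gem=#{newer_by_gem_version?(v)}"
end
```

The two checks agree for the 1.4.x versions and differ only for something like 1.10.0, which sorts before 1.4.1 as a String. As the comment notes, a wrong answer costs little here because the transformer has no side effects on unaffected versions.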
@@ -215,9 +226,11 @@ class Sanitize
 
     @config[:transformers].inject(node) do |transformer_node, transformer|
       transform = transformer.call({
-        :
-        :
-        :
+        :allowed_elements => @allowed_elements,
+        :config => @config,
+        :node => transformer_node,
+        :node_name => transformer_node.name.downcase,
+        :whitelist_nodes => @whitelist_nodes
       })
 
       if transform.nil?
metadata
CHANGED
@@ -1,7 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-
+  prerelease: true
+  segments:
+  - 1
+  - 2
+  - 2
+  - dev
+  - 20100822
+  version: 1.2.2.dev.20100822
 platform: ruby
 authors:
 - Ryan Grove
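The `prerelease: true` flag and the `segments` list in this hunk correspond to how RubyGems parses a version string that contains letters. A short sketch using the public `Gem::Version` API shows the same values; the comparison targets `1.2.1` and `1.2.2` are chosen here only for illustration.

```ruby
require 'rubygems' # provides Gem::Version; loaded by default on modern Rubies

version = Gem::Version.new('1.2.2.dev.20100822')

puts version.prerelease?                 # true
puts version.segments.inspect            # [1, 2, 2, "dev", 20100822]

# A prerelease sorts after the previous release and before the final one.
puts version > Gem::Version.new('1.2.1') # true
puts version < Gem::Version.new('1.2.2') # true
```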
@@ -9,39 +16,54 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-
+date: 2010-08-22 00:00:00 -07:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
-
-
-
+  prerelease: false
+  requirement: &id001 !ruby/object:Gem::Requirement
+    none: false
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
+        segments:
+        - 1
+        - 4
+        - 1
         version: 1.4.1
-
+  type: :runtime
+  version_requirements: *id001
 - !ruby/object:Gem::Dependency
   name: bacon
-
-
-
+  prerelease: false
+  requirement: &id002 !ruby/object:Gem::Requirement
+    none: false
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
+        segments:
+        - 1
+        - 1
+        - 0
         version: 1.1.0
-
+  type: :development
+  version_requirements: *id002
 - !ruby/object:Gem::Dependency
   name: rake
-
-
-
+  prerelease: false
+  requirement: &id003 !ruby/object:Gem::Requirement
+    none: false
     requirements:
     - - ~>
       - !ruby/object:Gem::Version
+        segments:
+        - 0
+        - 8
+        - 0
         version: 0.8.0
-
+  type: :development
+  version_requirements: *id003
 description:
 email: ryan@wonko.com
 executables: []
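The runtime dependency above is `nokogiri ~> 1.4.1`, which is why the affected Nokogiri releases (1.4.2 and higher) can be installed alongside this gem. A small sketch with `Gem::Requirement` shows which versions that constraint accepts; the list of candidate versions is made up for illustration.

```ruby
require 'rubygems' # provides Gem::Requirement; loaded by default on modern Rubies

requirement = Gem::Requirement.new('~> 1.4.1')

%w[1.4.0 1.4.1 1.4.2 1.4.10 1.5.0].each do |candidate|
  ok = requirement.satisfied_by?(Gem::Version.new(candidate))
  puts "#{candidate}: #{ok}"
end
```

The constraint is equivalent to `>= 1.4.1` combined with `< 1.5`, so only 1.4.0 and 1.5.0 in this list are rejected.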
@@ -58,6 +80,7 @@ files:
 - lib/sanitize/config/relaxed.rb
 - lib/sanitize/config/restricted.rb
 - lib/sanitize/config.rb
+- lib/sanitize/transformers/fix_fragment_cdata.rb
 - lib/sanitize/version.rb
 - lib/sanitize.rb
 has_rdoc: true
@@ -70,21 +93,29 @@ rdoc_options: []
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
+      segments:
+      - 1
+      - 8
+      - 6
       version: 1.8.6
-  version:
 required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
   requirements:
-  - - "
+  - - ">"
     - !ruby/object:Gem::Version
-
-
+      segments:
+      - 1
+      - 3
+      - 1
+      version: 1.3.1
 requirements: []
 
 rubyforge_project: riposte
-rubygems_version: 1.3.
+rubygems_version: 1.3.7
 signing_key:
 specification_version: 3
 summary: Whitelist-based HTML sanitizer.