sanitize 2.0.0.dev.20101213 → 2.0.0.dev.20101225
- data/{HISTORY → HISTORY.md} +70 -32
- data/lib/sanitize.rb +6 -5
- data/lib/sanitize/version.rb +1 -1
- metadata +4 -4
data/{HISTORY → HISTORY.md}
RENAMED
@@ -2,113 +2,151 @@ Sanitize History
 ================================================================================
 
 Version 2.0.0 (git)
+-------------------
+
 * The environment data passed into transformers and the return values expected
   from transformers have changed. Old transformers will need to be updated.
   See the README for details.
 * Transformers now receive nodes of all types, not just element nodes.
 * Sanitize's own core filtering logic is now implemented as a set of always-on
   transformers.
-* The default value for the
-
-* Added a
-  and
-  preserve readability. See the README for the default list of
-  will be replaced with whitespace when removed.
+* The default value for the `:output` config is now `:html`. Previously it was
+  `:xhtml`.
+* Added a `:whitespace_elements` config, which specifies elements (such as
+  `<br>` and `<p>`) that should be replaced with whitespace when removed in
+  order to preserve readability. See the README for the default list of
+  elements that will be replaced with whitespace when removed.
 * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
-  elements to the whitelists for
-  `Sanitize::Config::RELAXED`.
+  elements to the whitelists for the basic and relaxed configs.
 * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
-  `ruby`, and `wbr` elements to the whitelist for
+  `ruby`, and `wbr` elements to the whitelist for the relaxed config.
 * The `dir`, `lang`, and `title` attributes are now whitelisted for all
-  elements in
+  elements in the relaxed config.
 * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+ (issue
-  #315) that caused
+  #315) that caused `</body></html>` to be appended to the CDATA inside
   unterminated script and style elements.
 
+
 Version 1.2.1 (2010-04-20)
-
+--------------------------
+
+* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
   remove the contents of all non-whitelisted elements in addition to the
-  elements themselves. If set to an
+  elements themselves. If set to an array of element names, Sanitize will
   remove the contents of only those elements (when filtered), and leave the
-  contents of other filtered elements. [Thanks to Rafael Souza for the
+  contents of other filtered elements. [Thanks to Rafael Souza for the array
   option]
-* Added an
-  HTML output to be specified. The default is
-* The environment hash passed into transformers now includes a
-  containing the lowercase name of the current HTML node (e.g. "div").
+* Added an `:output_encoding` config setting to allow the character encoding
+  for HTML output to be specified. The default is utf-8.
+* The environment hash passed into transformers now includes a `:node_name`
+  item containing the lowercase name of the current HTML node (e.g. "div").
 * Returning anything other than a Hash or nil from a transformer will now
-  raise a meaningful Sanitize::Error exception rather than an unintended
-  NameError
+  raise a meaningful `Sanitize::Error` exception rather than an unintended
+  `NameError`.
+
 
 Version 1.2.0 (2010-01-17)
+--------------------------
+
 * Requires Nokogiri ~> 1.4.1.
 * Added support for transformers, which allow you to filter and alter nodes
   using your own custom logic, on top of (or instead of) Sanitize's core
   filter. See the README for details and examples.
-* Added Sanitize.clean_node
-  its children.
-* Added elements
+* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
+  all its children.
+* Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
   David Reese]
 
+
 Version 1.1.0 (2009-10-11)
+--------------------------
+
 * Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
-* Added an
-  Supported formats are
+* Added an `:output` config setting to allow the output format to be
+  specified. Supported formats are `:xhtml` (the default) and `:html` (which
+  outputs HTML4).
 * Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
   path segments. [Peter Cooper]
 
+
 Version 1.0.8 (2009-04-23)
+--------------------------
+
 * Added a workaround for an Hpricot bug that prevents attribute names from
   being downcased in recent versions of Hpricot. This was exploitable to
   prevent non-whitelisted protocols from being cleaned. [Reported by Ben
   Wanicur]
 
+
 Version 1.0.7 (2009-04-11)
+--------------------------
+
 * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
 * Fixed a bug that caused named character entities containing digits (like
-
-  Steinmetz]
+  `&sup2;`) to be escaped when they shouldn't have been. [Reported by
+  Sebastian Steinmetz]
+
 
 Version 1.0.6 (2009-02-23)
+--------------------------
+
 * Removed htmlentities gem dependency.
 * Existing well-formed character entity references in the input string are now
   preserved rather than being decoded and re-encoded.
-* The ' character is now encoded as
+* The `'` character is now encoded as `&#39;` instead of `&apos;` to prevent
   problems in IE6.
-* You can now specify the symbol
+* You can now specify the symbol `:all` in place of an element name in the
   attributes config hash to allow certain attributes on all elements. [Thanks
   to Mutwin Kraus]
 
+
 Version 1.0.5 (2009-02-05)
+--------------------------
+
 * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
   protocols from being cleaned when relative URLs were allowed. [Reported by
   Dev Purkayastha]
 * Fixed "undefined method `parent='" exceptions caused by parser changes in
   edge Hpricot.
 
+
 Version 1.0.4 (2009-01-16)
+--------------------------
+
 * Fixed a bug that made it possible to sneak a non-whitelisted element through
   by repeating it several times in a row. All versions of Sanitize prior to
   1.0.4 are vulnerable. [Reported by Cristobal]
 
+
 Version 1.0.3 (2009-01-15)
+--------------------------
+
 * Fixed a bug whereby incomplete Unicode or hex entities could be used to
   prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
   still decode the incomplete entities, users of those browsers may be
   vulnerable to malicious script injection on websites using versions of
   Sanitize prior to 1.0.3.
 
+
 Version 1.0.2 (2009-01-04)
+--------------------------
+
 * Fixed a bug that caused an exception to be thrown when parsing a valueless
   attribute that's expected to contain a URL.
 
+
 Version 1.0.1 (2009-01-01)
-
-
-
+--------------------------
+
+* You can now specify `:relative` in a protocol config array to allow
+  attributes containing relative URLs with no protocol. The Basic and Relaxed
+  configs have been updated to allow relative URLs.
 * Added a workaround for an Hpricot bug that causes HTML entities for
   non-ASCII characters to be replaced by question marks, and all other
   entities to be destructively decoded.
 
+
 Version 1.0.0 (2008-12-25)
+--------------------------
+
 * First release.
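The changelog entries above name several config keys (`:output`, `:output_encoding`, `:remove_contents`, `:whitespace_elements`). As a rough illustration only, a config hash using them might look like the sketch below. The key names come from the changelog; the constant name and the element lists are made up for this example and are not Sanitize's defaults. The hash is plain Ruby, so it can be inspected without loading the gem.

```ruby
# Illustrative config only. Key names are taken from the changelog entries;
# EXAMPLE_CONFIG and its element lists are hypothetical example values.
EXAMPLE_CONFIG = {
  :elements            => %w[a b em],        # example whitelist
  :output              => :html,             # new default in 2.0.0 (was :xhtml)
  :output_encoding     => 'utf-8',           # default per the 1.2.1 entry
  :remove_contents     => %w[script style],  # array form added in 1.2.1
  :whitespace_elements => %w[br p]           # replaced with whitespace when removed
}.freeze

# Assumed usage: with the gem installed, a hash like this would be passed
# to Sanitize alongside the HTML to clean. See the README for the real API.
p EXAMPLE_CONFIG[:output]
```

Because `br` and `p` are not in the example whitelist, they are the kind of element the `:whitespace_elements` entry describes: removed, but replaced with whitespace so adjacent text does not run together.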
data/lib/sanitize.rb
CHANGED
@@ -75,8 +75,9 @@ class Sanitize
 
     # Default transformers. These always run at the end of the transformer
     # chain, after any custom transformers.
+    @transformers << Transformers::CleanComment unless @config[:allow_comments]
+
     @transformers <<
-        Transformers::CleanComment <<
         Transformers::CleanCDATA <<
         Transformers::CleanElement.new(@config)
   end
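The hunk above makes the comment-stripping transformer conditional on `:allow_comments`. A minimal sketch of that assembly logic, using symbols as stand-ins for the real transformer objects, might look like this. The `build_chain` helper, the constant names, and the `:transformers` key for custom transformers are assumptions made for illustration, not code from the gem.

```ruby
# Hypothetical stand-ins: the real transformers live in
# Sanitize::Transformers and operate on Nokogiri nodes.
CLEAN_COMMENT = :clean_comment
CLEAN_CDATA   = :clean_cdata
CLEAN_ELEMENT = :clean_element

# Builds the chain in the order the new code uses: custom transformers
# first, CleanComment only when comments are not allowed, then the
# always-on CDATA and element filters.
def build_chain(config)
  chain = Array(config[:transformers]).dup  # :transformers key is assumed
  chain << CLEAN_COMMENT unless config[:allow_comments]
  chain << CLEAN_CDATA << CLEAN_ELEMENT
  chain
end

p build_chain({})                           # => [:clean_comment, :clean_cdata, :clean_element]
p build_chain({ :allow_comments => true })  # => [:clean_cdata, :clean_element]
```

Leaving the transformer out of the chain entirely, rather than having it check the flag on every node, keeps the per-node loop free of config lookups.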
@@ -133,13 +134,13 @@ class Sanitize
         :node_whitelist => node_whitelist
       })
 
-      # If the node has been unlinked, there's no point running subsequent
-      # transformers.
-      break if node.parent.nil? && !node.fragment?
-
       if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
         node_whitelist.merge(result[:node_whitelist])
       end
+
+      # If the node has been unlinked or replaced, there's no point running
+      # subsequent transformers.
+      break if node.parent.nil? && !node.fragment?
     end
 
     node
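The hunk above moves the `break` below the whitelist merge. With the old order, a transformer that unlinked the node and also returned a `:node_whitelist` appears to have had that whitelist dropped, because the loop exited before merging. The sketch below models the new order with a fake node. `FakeNode`, `run_transformers`, and the use of a `Set` for the whitelist are assumptions made for illustration; the real loop works on Nokogiri nodes.

```ruby
require 'set'

# Hypothetical stand-in for a Nokogiri node: only the two methods the
# loop consults are modelled.
FakeNode = Struct.new(:parent) do
  def fragment?
    false
  end
end

# Runs each transformer, merging any returned :node_whitelist BEFORE
# checking whether the node was unlinked (the order in the new hunk).
def run_transformers(node, transformers)
  node_whitelist = Set.new
  ran = []

  transformers.each do |name, transformer|
    result = transformer.call(node)
    ran << name

    if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
      node_whitelist.merge(result[:node_whitelist])
    end

    # An unlinked (or replaced) node has no parent, so stop here.
    break if node.parent.nil? && !node.fragment?
  end

  [ran, node_whitelist]
end

node = FakeNode.new(:root)

transformers = [
  # Unlinks the node but still reports a whitelist entry.
  [:unlinker, lambda { |n| n.parent = nil; { :node_whitelist => [:replacement] } }],
  # Never reached, because the node above was unlinked.
  [:later, lambda { |_n| nil }]
]

ran, whitelist = run_transformers(node, transformers)
p ran             # => [:unlinker]
p whitelist.to_a  # => [:replacement]
```

The second transformer never runs, yet the whitelist entry returned by the first one survives, which is the behaviour the reordering is meant to preserve.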
data/lib/sanitize/version.rb
CHANGED
metadata
CHANGED
@@ -7,8 +7,8 @@ version: !ruby/object:Gem::Version
   - 0
   - 0
   - dev
-  -
-  version: 2.0.0.dev.
+  - 20101225
+  version: 2.0.0.dev.20101225
 platform: ruby
 authors:
 - Ryan Grove
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2010-12-
+date: 2010-12-25 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -73,7 +73,7 @@ extensions: []
 extra_rdoc_files: []
 
 files:
-- HISTORY
+- HISTORY.md
 - LICENSE
 - README.rdoc
 - lib/sanitize/config/basic.rb
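Both versions compared on this page are dated dev builds of 2.0.0. As a small sketch of how RubyGems orders such version strings, the snippet below uses `Gem::Version` from the standard RubyGems library. The variable names are chosen for this example only.

```ruby
require 'rubygems'

# Version strings taken from the page title.
old_version = Gem::Version.new('2.0.0.dev.20101213')
new_version = Gem::Version.new('2.0.0.dev.20101225')

p new_version > old_version                # => true
p new_version.prerelease?                  # => true
p new_version < Gem::Version.new('2.0.0')  # => true
```

The `dev` segment marks both builds as prereleases, so they sort below a final 2.0.0 release, while the trailing date segment orders the two builds relative to each other.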