sanitize 2.1.1 → 6.0.0
Potentially problematic release: this version of sanitize might be problematic.
- checksums.yaml +4 -4
- data/HISTORY.md +520 -55
- data/LICENSE +1 -1
- data/README.md +438 -168
- data/lib/sanitize/config/basic.rb +12 -32
- data/lib/sanitize/config/default.rb +118 -0
- data/lib/sanitize/config/relaxed.rb +716 -53
- data/lib/sanitize/config/restricted.rb +3 -23
- data/lib/sanitize/config.rb +53 -79
- data/lib/sanitize/css.rb +348 -0
- data/lib/sanitize/transformers/clean_cdata.rb +3 -3
- data/lib/sanitize/transformers/clean_comment.rb +6 -3
- data/lib/sanitize/transformers/clean_css.rb +57 -0
- data/lib/sanitize/transformers/clean_doctype.rb +19 -0
- data/lib/sanitize/transformers/clean_element.rb +192 -124
- data/lib/sanitize/version.rb +3 -1
- data/lib/sanitize.rb +172 -143
- data/test/common.rb +3 -0
- data/test/test_clean_comment.rb +47 -0
- data/test/test_clean_css.rb +67 -0
- data/test/test_clean_doctype.rb +71 -0
- data/test/test_clean_element.rb +545 -0
- data/test/test_config.rb +65 -0
- data/test/test_malicious_css.rb +42 -0
- data/test/test_malicious_html.rb +235 -0
- data/test/test_parser.rb +75 -0
- data/test/test_sanitize.rb +151 -675
- data/test/test_sanitize_css.rb +424 -0
- data/test/test_transformers.rb +230 -0
- metadata +44 -41
data/HISTORY.md
CHANGED
@@ -1,15 +1,477 @@
|
|
1
|
-
Sanitize History
|
2
|
-
================================================================================
|
1
|
+
# Sanitize History
|
3
2
|
|
4
|
-
|
5
|
-
|
3
|
+
## 6.0.0 (2021-08-03)
|
4
|
+
|
5
|
+
### Potentially Breaking Changes
|
6
|
+
|
7
|
+
* Ruby 2.5.0 is now the oldest officially supported Ruby version.
|
8
|
+
|
9
|
+
* Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
|
10
|
+
The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
|
11
|
+
|
12
|
+
[211]:https://github.com/rgrove/sanitize/pull/211
|
13
|
+
|
14
|
+
## 5.2.3 (2021-01-11)
|
15
|
+
|
16
|
+
### Bug Fixes
|
17
|
+
|
18
|
+
* Ensure protocol sanitization is applied to data attributes.
|
19
|
+
[@ccutrer - #207][207]
|
20
|
+
|
21
|
+
[207]:https://github.com/rgrove/sanitize/pull/207
|
22
|
+
|
23
|
+
## 5.2.2 (2021-01-06)
|
24
|
+
|
25
|
+
### Bug Fixes
|
26
|
+
|
27
|
+
* Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
|
28
|
+
custom transformer. [@mscrivo - #206][206]
|
29
|
+
|
30
|
+
[206]:https://github.com/rgrove/sanitize/pull/206
|
31
|
+
|
32
|
+
## 5.2.1 (2020-06-16)
|
33
|
+
|
34
|
+
### Bug Fixes
|
35
|
+
|
36
|
+
* Fixed an HTML sanitization bypass that could allow XSS. This issue affects
|
37
|
+
Sanitize versions 3.0.0 through 5.2.0.
|
38
|
+
|
39
|
+
When HTML was sanitized using the "relaxed" config or a custom config that
|
40
|
+
allows certain elements, some content in a `<math>` or `<svg>` element may not
|
41
|
+
have been sanitized correctly even if `math` and `svg` were not in the
|
42
|
+
allowlist. This could allow carefully crafted input to sneak arbitrary HTML
|
43
|
+
through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
|
44
|
+
|
45
|
+
You are likely to be vulnerable to this issue if you use Sanitize's relaxed
|
46
|
+
config or a custom config that allows one or more of the following HTML
|
47
|
+
elements:
|
48
|
+
|
49
|
+
- `iframe`
|
50
|
+
- `math`
|
51
|
+
- `noembed`
|
52
|
+
- `noframes`
|
53
|
+
- `noscript`
|
54
|
+
- `plaintext`
|
55
|
+
- `script`
|
56
|
+
- `style`
|
57
|
+
- `svg`
|
58
|
+
- `xmp`
|
59
|
+
|
60
|
+
See the security advisory for more details, including a workaround if you're
|
61
|
+
not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
|
62
|
+
|
63
|
+
Many thanks to Michał Bentkowski of Securitum for reporting this issue and
|
64
|
+
helping to verify the fix.
|
65
|
+
|
66
|
+
[GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
|
67
|
+
|
68
|
+
## 5.2.0 (2020-06-06)
|
69
|
+
|
70
|
+
### Changes
|
71
|
+
|
72
|
+
* The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
|
73
|
+
source and documentation.
|
74
|
+
|
75
|
+
While the etymology of "whitelist" may not be explicitly racist in origin or
|
76
|
+
intent, there are inherent racial connotations in the implication that white
|
77
|
+
is good and black (as in "blacklist") is not.
|
78
|
+
|
79
|
+
This is a change I should have made long ago, and I apologize for not making
|
80
|
+
it sooner.
|
81
|
+
|
82
|
+
* In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
|
83
|
+
deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
|
84
|
+
The old keys will continue to work in order to avoid breaking existing code,
|
85
|
+
but they are no longer documented and may be removed in a future semver major
|
86
|
+
release.
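A custom transformer that has to run against both older and newer Sanitize versions could read the new key and fall back to the deprecated one. This is a minimal sketch, not code from Sanitize itself: the `allowlisted` lambda is a hypothetical helper, and the env hashes below are stand-ins for what Sanitize passes to a transformer.

```ruby
# Hypothetical helper: prefer the new :is_allowlisted key and fall back to the
# deprecated :is_whitelisted key when it is the only one present.
allowlisted = lambda do |env|
  env.fetch(:is_allowlisted) { env.fetch(:is_whitelisted, false) }
end

# Stand-in env hashes (real ones also carry :node, :node_name, :config, etc.).
new_style = { is_allowlisted: true }
old_style = { is_whitelisted: true }

puts allowlisted.call(new_style) # true
puts allowlisted.call(old_style) # true
puts allowlisted.call({})        # false
```

Using `fetch` with a block keeps an explicit `false` under the new key from being overridden by the old key.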
|
87
|
+
|
88
|
+
## 5.1.0 (2019-09-07)
|
89
|
+
|
90
|
+
### Features
|
91
|
+
|
92
|
+
* Added a `:parser_options` config hash, which makes it possible to pass custom
|
93
|
+
parsing options to Nokogumbo. [@austin-wang - #194][194]
|
94
|
+
|
95
|
+
### Bug Fixes
|
96
|
+
|
97
|
+
* Non-characters and non-whitespace control characters are now stripped from
|
98
|
+
HTML input before parsing to comply with the HTML Standard's [preprocessing
|
99
|
+
guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
|
100
|
+
W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
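The preprocessing step described above can be approximated in plain Ruby. This is a simplified sketch under stated assumptions: the constant and method names are hypothetical, and the non-character pattern only covers the Basic Multilingual Plane, whereas the HTML Standard also lists the last two code points of every supplementary plane.

```ruby
# Control characters other than ASCII whitespace (tab, LF, FF, CR are kept).
CONTROL_CHARS = /[\u0001-\u0008\u000b\u000e-\u001f\u007f-\u009f]+/

# Non-characters in the BMP only (assumption: supplementary planes omitted).
BMP_NONCHARS = /[\ufdd0-\ufdef\ufffe\uffff]+/

# Hypothetical helper: strip both classes before handing HTML to a parser.
def preprocess(html)
  html.gsub(CONTROL_CHARS, '').gsub(BMP_NONCHARS, '')
end

p preprocess("a\u0007b\ufdd0c\tend") # "abc\tend"
```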
|
101
|
+
|
102
|
+
[179]:https://github.com/rgrove/sanitize/issues/179
|
103
|
+
[194]:https://github.com/rgrove/sanitize/pull/194
|
104
|
+
[html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
|
105
|
+
[unicode-xml]:https://www.w3.org/TR/unicode-xml/
|
106
|
+
|
107
|
+
## 5.0.0 (2018-10-14)
|
108
|
+
|
109
|
+
For most users, upgrading from 4.x shouldn't require any changes. However, the
|
110
|
+
minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
|
111
|
+
differ in some small ways from 4.x's output. If this matters to you, please
|
112
|
+
review the changes below carefully.
|
113
|
+
|
114
|
+
### Potentially Breaking Changes
|
115
|
+
|
116
|
+
* Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
|
117
|
+
work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
|
118
|
+
no longer works in Ruby 1.9.x.
|
119
|
+
|
120
|
+
* Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
|
121
|
+
standard-compliant HTML serialization. [@stevecheckoway - #189][189]
|
122
|
+
|
123
|
+
* Children of the following elements are now removed by default when these
|
124
|
+
elements are removed, rather than being preserved and escaped:
|
125
|
+
|
126
|
+
- `iframe`
|
127
|
+
- `noembed`
|
128
|
+
- `noframes`
|
129
|
+
- `noscript`
|
130
|
+
- `script`
|
131
|
+
- `style`
|
132
|
+
|
133
|
+
* Children of allowlisted `iframe` elements are now always removed. In modern
|
134
|
+
HTML, `iframe` elements should never have children. In HTML 4 and earlier
|
135
|
+
`iframe` elements were allowed to contain fallback content for legacy
|
136
|
+
browsers, but it's been almost two decades since that was useful.
|
137
|
+
|
138
|
+
* Fixed a bug that caused `:remove_contents` to behave as if it were set to
|
139
|
+
`true` when it was actually an Array.
|
140
|
+
|
141
|
+
[189]:https://github.com/rgrove/sanitize/pull/189
|
142
|
+
|
143
|
+
## 4.6.6 (2018-07-23)
|
144
|
+
|
145
|
+
* Improved performance and memory usage by optimizing `Sanitize#transform_node!`
|
146
|
+
[@stanhu - #183][183]
|
147
|
+
|
148
|
+
[183]:https://github.com/rgrove/sanitize/pull/183
|
149
|
+
|
150
|
+
## 4.6.5 (2018-05-16)
|
151
|
+
|
152
|
+
* Improved performance slightly by tweaking the order of built-in transformers.
|
153
|
+
[@rafbm - #180][180]
|
154
|
+
|
155
|
+
[180]:https://github.com/rgrove/sanitize/pull/180
|
156
|
+
|
157
|
+
## 4.6.4 (2018-03-20)
|
158
|
+
|
159
|
+
* Fixed: A change introduced in 4.6.2 broke certain transformers that relied on
|
160
|
+
being able to mutate the name of an HTML node. That change has been reverted
|
161
|
+
and a test has been added to cover this case. [@zetter - #177][177]
|
162
|
+
|
163
|
+
[177]:https://github.com/rgrove/sanitize/issues/177
|
164
|
+
|
165
|
+
## 4.6.3 (2018-03-19)
|
166
|
+
|
167
|
+
* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
|
168
|
+
XSS.
|
169
|
+
|
170
|
+
When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
|
171
|
+
specially crafted HTML fragment can cause libxml2 to generate improperly
|
172
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
173
|
+
elements.
|
174
|
+
|
175
|
+
Sanitize now performs additional escaping on affected attributes to prevent
|
176
|
+
this.
|
177
|
+
|
178
|
+
Many thanks to the Shopify Application Security Team for responsibly reporting
|
179
|
+
this issue.
|
180
|
+
|
181
|
+
[176]:https://github.com/rgrove/sanitize/issues/176
|
182
|
+
|
183
|
+
## 4.6.2 (2018-03-19)
|
184
|
+
|
185
|
+
* Reduced string allocations to optimize memory usage. [@janklimo - #175][175]
|
186
|
+
|
187
|
+
[175]:https://github.com/rgrove/sanitize/pull/175
|
188
|
+
|
189
|
+
## 4.6.1 (2018-03-15)
|
190
|
+
|
191
|
+
* Added support for frozen string literals in Ruby 2.4+.
|
192
|
+
[@flavorjones - #174][174]
|
193
|
+
|
194
|
+
[174]:https://github.com/rgrove/sanitize/pull/174
|
195
|
+
|
196
|
+
## 4.6.0 (2018-01-29)
|
197
|
+
|
198
|
+
* Loosened the Nokogumbo dependency to allow installing semver-compatible
|
199
|
+
versions greater than or equal to v1.4. [@rafbm - #171][171]
|
200
|
+
|
201
|
+
[171]:https://github.com/rgrove/sanitize/pull/171
|
202
|
+
|
203
|
+
## 4.5.0 (2017-06-04)
|
204
|
+
|
205
|
+
* Added SVG-related CSS properties to the relaxed config. See [the diff][161]
|
206
|
+
for the full list of added properties. [@louim - #161][161]
|
207
|
+
|
208
|
+
* Fixed: Sanitize now strips null bytes (`\u0000`) before passing input to
|
209
|
+
Nokogumbo, since they can cause recent versions to crash with a failed
|
210
|
+
assertion in the Gumbo parser.
|
211
|
+
|
212
|
+
[161]:https://github.com/rgrove/sanitize/pull/161
|
213
|
+
|
214
|
+
## 4.4.0 (2016-09-29)
|
215
|
+
|
216
|
+
* Added `srcset` to the attribute allowlist for `img` elements in the relaxed
|
217
|
+
config. [@ejtttje - #156][156]
|
218
|
+
|
219
|
+
[156]:https://github.com/rgrove/sanitize/pull/156
|
220
|
+
|
221
|
+
|
222
|
+
## 4.3.0 (2016-09-20)
|
223
|
+
|
224
|
+
* Methods can now be used as transformers. [@Skipants - #155][155]
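As an illustration, a `Method` object responds to `#call`, which is what lets it stand in for a Proc or lambda. The sketch below is hypothetical: `NodeAudit` is an invented module, and the env hashes are stand-ins for what Sanitize would pass during traversal.

```ruby
# Hypothetical module that records the name of every node it is shown.
module NodeAudit
  SEEN = []

  # Transformers receive an env hash; returning nil leaves the node alone.
  def self.record(env)
    SEEN << env[:node_name]
    nil
  end
end

# A Method object is callable, so it can be listed like any other transformer.
transformer = NodeAudit.method(:record)

# In real use this would go into a config, e.g. transformers: [transformer].
# Here it is called directly with stand-in env hashes.
transformer.call({ node_name: 'div' })
transformer.call({ node_name: 'p' })

p NodeAudit::SEEN # ["div", "p"]
```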
|
225
|
+
|
226
|
+
[155]:https://github.com/rgrove/sanitize/pull/155
|
227
|
+
|
228
|
+
|
229
|
+
## 4.2.0 (2016-08-22)
|
230
|
+
|
231
|
+
* Added `-webkit-font-smoothing` to the relaxed CSS config. [@louim - #154][154]
|
232
|
+
|
233
|
+
* Fixed: Nokogumbo >=1.4.9 changed its behavior in a way that allowed invalid
|
234
|
+
doctypes (like `<!DOCTYPE nonsense>`) when the `:allow_doctype` config setting
|
235
|
+
was `true`. Invalid doctypes are now coerced to valid ones as they were prior
|
236
|
+
to this Nokogumbo change.
|
237
|
+
|
238
|
+
[154]:https://github.com/rgrove/sanitize/pull/154
|
239
|
+
|
240
|
+
|
241
|
+
## 4.1.0 (2016-06-17)
|
242
|
+
|
243
|
+
* Added a new CSS config setting, `:import_url_validator`. This is a Proc or
|
244
|
+
other callable object that will be called with each `@import` URL, and should
|
245
|
+
return `true` to allow the URL or `false` to remove it. [@nikz - #153][153]
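A validator of this kind could look like the sketch below. The host list and the HTTPS-only policy are assumptions chosen for illustration, and the surrounding config hash only shows where the setting is expected to live (under `:css`).

```ruby
require 'uri'

# Hypothetical policy: only allow @import URLs served over HTTPS from one host.
ALLOWED_IMPORT_HOSTS = ['fonts.googleapis.com'].freeze

import_url_validator = lambda do |url|
  uri = URI.parse(url)
  uri.scheme == 'https' && ALLOWED_IMPORT_HOSTS.include?(uri.host)
rescue URI::InvalidURIError
  false # unparseable URLs are rejected
end

# Where the validator would be plugged in (other CSS settings omitted).
config = { css: { import_url_validator: import_url_validator } }

puts import_url_validator.call('https://fonts.googleapis.com/css?family=Lato') # true
puts import_url_validator.call('http://fonts.googleapis.com/css?family=Lato')  # false
puts import_url_validator.call('https://evil.example/x.css')                   # false
```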
|
246
|
+
|
247
|
+
[153]:https://github.com/rgrove/sanitize/pull/153/
|
248
|
+
|
249
|
+
|
250
|
+
## 4.0.1 (2015-12-09)
|
251
|
+
|
252
|
+
* Unpinned the Nokogumbo dependency. [@rubys - #141][141]
|
253
|
+
|
254
|
+
[141]:https://github.com/rgrove/sanitize/pull/141
|
255
|
+
|
256
|
+
|
257
|
+
## 4.0.0 (2015-04-20)
|
258
|
+
|
259
|
+
### Potentially breaking changes
|
260
|
+
|
261
|
+
* Added two new CSS config settings, `:at_rules_with_properties` and
|
262
|
+
`:at_rules_with_styles`. These allow you to define which at-rules should be
|
263
|
+
allowed to contain properties and which should be allowed to contain style
|
264
|
+
rules. Previously this was hard-coded internally. [#111][111]
|
265
|
+
|
266
|
+
The previous `:at_rules` setting still exists, and defines at-rules that may
|
267
|
+
not have associated blocks, such as `@import`. If you have a custom config
|
268
|
+
that contains an `:at_rules` setting, you may need to move rules that can have
|
269
|
+
blocks to either `:at_rules_with_properties` or `:at_rules_with_styles`.
|
270
|
+
|
271
|
+
See Sanitize's relaxed config for an example.
|
272
|
+
|
273
|
+
### Other changes
|
274
|
+
|
275
|
+
* Added full support for CSS `@page` rules in the relaxed config, including
|
276
|
+
support for all page-margin box rules (such as `@top-left`, `@bottom-center`,
|
277
|
+
etc.)
|
278
|
+
|
279
|
+
* Added the following CSS at-rules to the relaxed config:
|
280
|
+
|
281
|
+
- `@-moz-keyframes`
|
282
|
+
- `@-o-keyframes`
|
283
|
+
- `@-webkit-keyframes`
|
284
|
+
- `@document`
|
285
|
+
|
286
|
+
* Added a whole bunch of CSS properties to the relaxed config. View the complete
|
287
|
+
list [here](https://gist.github.com/rgrove/044cc7e9a5b44f583c05).
|
288
|
+
|
289
|
+
* Small performance improvements.
|
290
|
+
|
291
|
+
* Fixed: Upgraded Crass to 1.0.2 to pick up a fix that affected the parsing of
|
292
|
+
CSS `@page` rules.
|
293
|
+
|
294
|
+
[111]:https://github.com/rgrove/sanitize/issues/111
|
295
|
+
|
296
|
+
|
297
|
+
## 3.1.2 (2015-02-22)
|
298
|
+
|
299
|
+
* Fixed: Deleting a node in a custom transformer could trigger a memory leak
|
300
|
+
in Nokogiri if that node's children were later reparented, which the built-in
|
301
|
+
CleanElement transformer did by default. The CleanElement transformer is now
|
302
|
+
careful not to reparent the children of deleted nodes. [#129][129]
|
303
|
+
|
304
|
+
[129]:https://github.com/rgrove/sanitize/issues/129
|
305
|
+
|
306
|
+
|
307
|
+
## 3.1.1 (2015-02-04)
|
308
|
+
|
309
|
+
* Fixed: `#document` and `#fragment` failed on frozen strings, and could
|
310
|
+
unintentionally modify unfrozen strings if they used an encoding other than
|
311
|
+
UTF-8 or if they contained characters not allowed in HTML.
|
312
|
+
[@AnchorCat - #128][128]
|
313
|
+
|
314
|
+
[128]:https://github.com/rgrove/sanitize/pull/128
|
315
|
+
|
316
|
+
|
317
|
+
## 3.1.0 (2014-12-22)
|
318
|
+
|
319
|
+
* Added the following CSS properties to the relaxed config. [@ehudc - #120][120]
|
320
|
+
|
321
|
+
- `-moz-text-size-adjust`
|
322
|
+
- `-ms-text-size-adjust`
|
323
|
+
- `-webkit-text-size-adjust`
|
324
|
+
- `text-size-adjust`
|
325
|
+
|
326
|
+
* Updated Nokogumbo to 1.2.0 to pick up a fix for a Gumbo bug where the
|
327
|
+
entity `&AElig;` left its semicolon behind when it was converted to a
|
328
|
+
character during parsing. [#119][119]
|
329
|
+
|
330
|
+
[119]:https://github.com/rgrove/sanitize/issues/119
|
331
|
+
[120]:https://github.com/rgrove/sanitize/pull/120
|
332
|
+
|
333
|
+
|
334
|
+
## 3.0.4 (2014-12-12)
|
335
|
+
|
336
|
+
* Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
|
337
|
+
caused the URL to be removed even when the protocol was allowlisted.
|
338
|
+
[@benubois - #126][126]
|
339
|
+
|
340
|
+
[126]:https://github.com/rgrove/sanitize/pull/126
|
341
|
+
|
342
|
+
|
343
|
+
## 3.0.3 (2014-10-29)
|
344
|
+
|
345
|
+
* Fixed: Some CSS selectors weren't parsed correctly inside the body of a
|
346
|
+
`@media` block, causing them to be removed even when allowlist rules should
|
347
|
+
have allowed them to remain. [#121][121]
|
348
|
+
|
349
|
+
[121]:https://github.com/rgrove/sanitize/issues/121
|
350
|
+
|
351
|
+
|
352
|
+
## 3.0.2 (2014-09-02)
|
353
|
+
|
354
|
+
* Updated Nokogumbo to 1.1.12, because 1.1.11 silently reverted the change we
|
355
|
+
were trying to pick up in the last release. Now issue [#114][114] is
|
356
|
+
_actually_ fixed.
|
357
|
+
|
358
|
+
|
359
|
+
## 3.0.1 (2014-09-02)
|
360
|
+
|
361
|
+
* Updated Nokogumbo to 1.1.11 to pick up a fix for a Gumbo bug in which certain
|
362
|
+
HTML character entities, such as `&Ouml;`, were parsed incorrectly, leaving
|
363
|
+
the semicolon behind in the output. [#114][114]
|
364
|
+
|
365
|
+
[114]:https://github.com/rgrove/sanitize/issues/114
|
366
|
+
|
367
|
+
|
368
|
+
## 3.0.0 (2014-06-21)
|
369
|
+
|
370
|
+
As of this version, Sanitize adheres strictly to the [SemVer 2.0.0][semver]
|
371
|
+
versioning standard. This release contains API and output changes that are
|
372
|
+
incompatible with previous releases, as indicated by the major version
|
373
|
+
increment.
|
374
|
+
|
375
|
+
[semver]:http://semver.org/
|
376
|
+
|
377
|
+
### Backwards-incompatible changes
|
378
|
+
|
379
|
+
* HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the
|
380
|
+
HTML5 parsing spec and behaves much more like modern browser parsers than the
|
381
|
+
previous libxml2-based parser. As a result, HTML output may differ from that
|
382
|
+
of previous versions of Sanitize.
|
383
|
+
|
384
|
+
* All transformers now traverse the document from the top down, starting with
|
385
|
+
the first node, then its first child, and so on. The `:transformers_breadth`
|
386
|
+
config has been removed, and old bottom-up transformers (the previous default)
|
387
|
+
may need to be rewritten.
|
388
|
+
|
389
|
+
* Sanitize's built-in configs are now deeply frozen to prevent people from
|
390
|
+
modifying them (either accidentally or maliciously). To customize a built-in
|
391
|
+
config, create a new copy using `Sanitize::Config.merge()`, like so:
|
392
|
+
|
393
|
+
```ruby
|
394
|
+
Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
|
395
|
+
:elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
|
396
|
+
:remove_contents => true
|
397
|
+
))
|
398
|
+
```
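The effect of freezing can be seen without Sanitize at all. The sketch below uses a stand-in hash, not the real `Sanitize::Config::BASIC`, and plain `Hash#merge`, which is shallow; `Sanitize::Config.merge` in the snippet above is the library's own helper.

```ruby
# Stand-in for a deeply frozen built-in config (hypothetical contents).
BASIC_STANDIN = { elements: %w[a b i].freeze, remove_contents: false }.freeze

# Mutating a frozen config in place raises (FrozenError on Ruby 2.5+,
# which is a subclass of RuntimeError).
begin
  BASIC_STANDIN[:elements] << 'div'
rescue RuntimeError => e
  puts "refused: #{e.class}"
end

# Building a new Hash leaves the original untouched.
custom = BASIC_STANDIN.merge(elements: BASIC_STANDIN[:elements] + %w[div table])

p custom[:elements]        # ["a", "b", "i", "div", "table"]
p BASIC_STANDIN[:elements] # ["a", "b", "i"]
```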
|
399
|
+
|
400
|
+
* The `clean!` and `clean_document!` methods were removed, since they weren't
|
401
|
+
useful and tended to confuse people.
|
402
|
+
|
403
|
+
* The `clean` method was renamed to `fragment` to more clearly indicate that its
|
404
|
+
intended use is to sanitize an HTML fragment.
|
405
|
+
|
406
|
+
* The `clean_document` method was renamed to `document`.
|
407
|
+
|
408
|
+
* The `clean_node!` method was renamed to `node!`.
|
409
|
+
|
410
|
+
* The `document` method now raises a `Sanitize::Error` if the `<html>` element
|
411
|
+
isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
|
412
|
+
regardless of the `:remove_contents` config setting.
|
413
|
+
|
414
|
+
* The `:output` config has been removed. Output is now always HTML, not XHTML.
|
415
|
+
|
416
|
+
* The `:output_encoding` config has been removed. Output is now always UTF-8.
|
417
|
+
|
418
|
+
### Other changes
|
419
|
+
|
420
|
+
* Added advanced CSS sanitization support using [Crass][crass], which is fully
|
421
|
+
compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
|
422
|
+
allowlisted `<style>` elements and `style` attributes in HTML will be
|
423
|
+
sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
|
424
|
+
sanitize CSS stylesheets or properties.
|
425
|
+
|
426
|
+
* Added an `:allow_doctype` setting. When `true`, well-formed doctype
|
427
|
+
definitions will be allowed in documents. When `false` (the default), doctype
|
428
|
+
definitions will be removed from documents. Doctype definitions are never
|
429
|
+
allowed in fragments, regardless of this setting.
|
430
|
+
|
431
|
+
* Added the following elements to the relaxed config, in addition to various
|
432
|
+
attributes: `article`, `aside`, `body`, `data`, `div`, `footer`, `head`,
|
433
|
+
`header`, `html`, `main`, `nav`, `section`, `span`, `style`, `title`.
|
434
|
+
|
435
|
+
* The `:whitespace_elements` config is now a Hash, and allows you to specify the
|
436
|
+
text that should be inserted before and after these elements when they're
|
437
|
+
removed. The old-style Array-based config value is still supported for
|
438
|
+
backwards compatibility. [@alperkokmen - #94][94]
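For comparison, the two accepted shapes might look like this. The element choices and replacement strings are illustrative assumptions, not the library's defaults.

```ruby
# Hash form: choose the text inserted before and after each removed element.
hash_style = {
  whitespace_elements: {
    'br' => { before: "\n", after: '' },
    'p'  => { before: "\n", after: "\n" }
  }
}

# Older Array form, still accepted for backwards compatibility.
array_style = { whitespace_elements: ['br', 'p'] }

p hash_style[:whitespace_elements]['p']  # {:before=>"\n", :after=>"\n"} on Ruby < 3.4
p array_style[:whitespace_elements]      # ["br", "p"]
```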
|
439
|
+
|
440
|
+
* Unsuitable Unicode characters are now removed from HTML before it's parsed.
|
441
|
+
[#106][106]
|
442
|
+
|
443
|
+
* Fixed: Non-tag brackets in input like `"1 > 2 and 2 < 1"` are now parsed and
|
444
|
+
escaped correctly in accordance with the HTML5 spec, becoming
|
445
|
+
`"1 &gt; 2 and 2 &lt; 1"`. [#83][83]
|
446
|
+
|
447
|
+
* Fixed: Siblings added after the current node during traversal are now
|
448
|
+
also traversed. In previous versions they were simply skipped. [#91][91]
|
449
|
+
|
450
|
+
* Fixed: Nokogiri has been smacked and instructed to stop adding newlines after
|
451
|
+
certain elements, because if people wanted newlines there they'd have put them
|
452
|
+
there, dammit. [#103][103]
|
453
|
+
|
454
|
+
* Fixed: Added a workaround for a libxml2 bug that caused an undesired
|
455
|
+
content-type meta tag to be added to all documents with `<head>` elements.
|
456
|
+
[Nokogiri #1008][n1008]
|
457
|
+
|
458
|
+
[crass]:https://github.com/rgrove/crass
|
459
|
+
[83]:https://github.com/rgrove/sanitize/issues/83
|
460
|
+
[91]:https://github.com/rgrove/sanitize/issues/91
|
461
|
+
[94]:https://github.com/rgrove/sanitize/pull/94/
|
462
|
+
[103]:https://github.com/rgrove/sanitize/issues/103
|
463
|
+
[106]:https://github.com/rgrove/sanitize/issues/106
|
464
|
+
[n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
|
465
|
+
|
466
|
+
|
467
|
+
## 2.1.1 (2018-09-30)
|
6
468
|
|
7
469
|
* [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
|
8
470
|
XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
|
9
471
|
|
10
472
|
When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
|
11
473
|
specially crafted HTML fragment can cause libxml2 to generate improperly
|
12
|
-
escaped output, allowing non-
|
474
|
+
escaped output, allowing non-allowlisted attributes to be used on allowlisted
|
13
475
|
elements.
|
14
476
|
|
15
477
|
Sanitize now performs additional escaping on affected attributes to prevent
|
@@ -22,10 +484,9 @@ Version 2.1.1 (2018-09-30)
|
|
22
484
|
[188]:https://github.com/rgrove/sanitize/pull/188
|
23
485
|
|
24
486
|
|
25
|
-
|
26
|
-
--------------------------
|
487
|
+
## 2.1.0 (2014-01-13)
|
27
488
|
|
28
|
-
* Added support for
|
489
|
+
* Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
|
29
490
|
symbol `:data` instead of an attribute name in the `:attributes` config to
|
30
491
|
indicate that arbitrary data attributes should be allowed on an element.
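A config fragment using the symbol might look like the following sketch; the element and the `class` attribute are arbitrary choices for illustration.

```ruby
config = {
  elements:   ['div'],
  attributes: {
    # :data permits any data-* attribute on <div>; 'class' is listed by name.
    'div' => ['class', :data]
  }
}

p config[:attributes]['div'] # ["class", :data]
```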
|
31
492
|
|
@@ -38,16 +499,14 @@ Version 2.1.0 (2014-01-13)
|
|
38
499
|
[87]:https://github.com/rgrove/sanitize/pull/87
|
39
500
|
|
40
501
|
|
41
|
-
|
42
|
-
--------------------------
|
502
|
+
## 2.0.6 (2013-07-10)
|
43
503
|
|
44
504
|
* Fixed: Version 2.0.5 inadvertently included some work-in-progress changes that
|
45
505
|
shouldn't have made their way into the master branch. This is what happens
|
46
506
|
when I release before coffee instead of after.
|
47
507
|
|
48
508
|
|
49
|
-
|
50
|
-
--------------------------
|
509
|
+
## 2.0.5 (2013-07-10)
|
51
510
|
|
52
511
|
* Loosened the Nokogiri dependency back to >= 1.4.4 to allow Sanitize to coexist
|
53
512
|
in newer Rubies with other libraries that restrict Nokogiri to 1.5.x for 1.8.7
|
@@ -55,8 +514,7 @@ Version 2.0.5 (2013-07-10)
|
|
55
514
|
life easier for people who need those other libs.
|
56
515
|
|
57
516
|
|
58
|
-
|
59
|
-
--------------------------
|
517
|
+
## 2.0.4 (2013-06-12)
|
60
518
|
|
61
519
|
* Added `Sanitize.clean_document`, which sanitizes a full HTML document rather
|
62
520
|
than just a fragment. [Ben Anderson]
|
@@ -66,14 +524,12 @@ Version 2.0.4 (2013-06-12)
|
|
66
524
|
* Dropped support for Ruby versions older than 1.9.2.
|
67
525
|
|
68
526
|
|
69
|
-
|
70
|
-
--------------------------
|
527
|
+
## 2.0.3 (2011-07-01)
|
71
528
|
|
72
529
|
* Loosened the Nokogiri dependency to allow Nokogiri 1.5.x.
|
73
530
|
|
74
531
|
|
75
|
-
|
76
|
-
--------------------------
|
532
|
+
## 2.0.2 (2011-05-21)
|
77
533
|
|
78
534
|
* Fixed a bug in which a protocol like "java\script:" would be translated to
|
79
535
|
"java%5Cscript:" and allowed through the filter when relative URLs were
|
@@ -81,162 +537,171 @@ Version 2.0.2 (2011-05-21)
|
|
81
537
|
undesired behavior.
|
82
538
|
|
83
539
|
|
84
|
-
|
85
|
-
--------------------------
|
540
|
+
## 2.0.1 (2011-03-16)
|
86
541
|
|
87
542
|
* Updated the protocol regex to anchor at the beginning of the string rather
|
88
543
|
than the beginning of a line. [Eaden McKee]
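The difference between the two anchors matters because Ruby's `^` matches after every newline. The patterns below are simplified stand-ins, not Sanitize's actual protocol regex.

```ruby
# Simplified stand-ins for a protocol check (assumption: http/https only).
LINE_ANCHORED   = /^https?:/i   # ^ matches at the start of ANY line
STRING_ANCHORED = /\Ahttps?:/i  # \A matches only at the start of the string

sneaky = "javascript:alert(1)\nhttp://example.com/"

puts LINE_ANCHORED.match?(sneaky)   # true  (fooled by the second line)
puts STRING_ANCHORED.match?(sneaky) # false
```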
|
89
544
|
|
90
545
|
|
91
|
-
|
92
|
-
--------------------------
|
546
|
+
## 2.0.0 (2011-01-15)
|
93
547
|
|
94
548
|
* The environment data passed into transformers and the return values expected
|
95
549
|
from transformers have changed. Old transformers will need to be updated.
|
96
550
|
See the README for details.
|
551
|
+
|
97
552
|
* Transformers now receive nodes of all types, not just element nodes.
|
553
|
+
|
98
554
|
* Sanitize's own core filtering logic is now implemented as a set of always-on
|
99
555
|
transformers.
|
556
|
+
|
100
557
|
* The default value for the `:output` config is now `:html`. Previously it was
|
101
558
|
`:xhtml`.
|
559
|
+
|
102
560
|
* Added a `:whitespace_elements` config, which specifies elements (such as
|
103
561
|
`<br>` and `<p>`) that should be replaced with whitespace when removed in
|
104
562
|
order to preserve readability. See the README for the default list of
|
105
563
|
elements that will be replaced with whitespace when removed.
|
564
|
+
|
106
565
|
* Added a `:transformers_breadth` config, which may be used to specify
|
107
566
|
transformers that should traverse nodes in a breadth-first mode rather than
|
108
567
|
the default depth-first mode.
|
568
|
+
|
109
569
|
* Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
|
110
|
-
elements to the
|
570
|
+
elements to the allowlists for the basic and relaxed configs.
|
571
|
+
|
111
572
|
* Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
|
112
|
-
`ruby`, and `wbr` elements to the
|
113
|
-
|
573
|
+
`ruby`, and `wbr` elements to the allowlist for the relaxed config.
|
574
|
+
|
575
|
+
* The `dir`, `lang`, and `title` attributes are now allowlisted for all
|
114
576
|
elements in the relaxed config.
|
577
|
+
|
115
578
|
* Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
|
116
579
|
(issue #315) that caused `</body></html>` to be appended to the CDATA inside
|
117
580
|
unterminated script and style elements.
|
118
581
|
|
119
582
|
|
120
|
-
|
121
|
-
--------------------------
|
583
|
+
## 1.2.1 (2010-04-20)
|
122
584
|
|
123
585
|
* Added a `:remove_contents` config setting. If set to `true`, Sanitize will
|
124
|
-
remove the contents of all non-
|
586
|
+
remove the contents of all non-allowlisted elements in addition to the
|
125
587
|
elements themselves. If set to an array of element names, Sanitize will
|
126
588
|
remove the contents of only those elements (when filtered), and leave the
|
127
589
|
contents of other filtered elements. [Thanks to Rafael Souza for the array
|
128
590
|
option]
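The two forms of the setting could be written as below. The element lists are illustrative assumptions.

```ruby
# Remove the contents of every filtered (non-allowlisted) element.
strip_all = { elements: ['p'], remove_contents: true }

# Remove the contents of only these elements when they are filtered; the
# contents of other filtered elements are left in place.
strip_some = { elements: ['p'], remove_contents: ['script', 'style'] }

p strip_all[:remove_contents]  # true
p strip_some[:remove_contents] # ["script", "style"]
```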
|
591
|
+
|
129
592
|
* Added an `:output_encoding` config setting to allow the character encoding
|
130
593
|
for HTML output to be specified. The default is utf-8.
|
594
|
+
|
131
595
|
* The environment hash passed into transformers now includes a `:node_name`
|
132
596
|
item containing the lowercase name of the current HTML node (e.g. "div").
|
597
|
+
|
133
598
|
* Returning anything other than a Hash or nil from a transformer will now
|
134
599
|
raise a meaningful `Sanitize::Error` exception rather than an unintended
|
135
600
|
`NameError`.
|
136
601
|
|
137
602
|
|
138
|
-
|
139
|
-
--------------------------
|
603
|
+
## 1.2.0 (2010-01-17)
|
140
604
|
|
141
605
|
* Requires Nokogiri ~> 1.4.1.
|
606
|
+
|
142
607
|
* Added support for transformers, which allow you to filter and alter nodes
|
143
608
|
using your own custom logic, on top of (or instead of) Sanitize's core
|
144
609
|
filter. See the README for details and examples.
|
610
|
+
|
145
611
|
* Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
|
146
612
|
all its children.
|
147
|
-
|
613
|
+
|
614
|
+
* Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
|
148
615
|
David Reese]
|
149
616
|
|
150
617
|
|
151
|
-
|
152
|
-
--------------------------
|
618
|
+
## 1.1.0 (2009-10-11)
|
153
619
|
|
154
620
|
* Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
|
621
|
+
|
155
622
|
* Added an `:output` config setting to allow the output format to be
|
156
623
|
specified. Supported formats are `:xhtml` (the default) and `:html` (which
|
157
624
|
outputs HTML4).
|
625
|
+
|
158
626
|
* Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
|
159
627
|
path segments. [Peter Cooper]
|
160
628
|
|
161
629
|
|
162
|
-
|
163
|
-
--------------------------
|
630
|
+
## 1.0.8 (2009-04-23)
|
164
631
|
|
165
632
|
* Added a workaround for an Hpricot bug that prevents attribute names from
|
166
633
|
being downcased in recent versions of Hpricot. This was exploitable to
|
167
|
-
prevent non-
|
634
|
+
prevent non-allowlisted protocols from being cleaned. [Reported by Ben
|
168
635
|
Wanicur]
|
169
636
|
|
170
637
|
|
171
|
-
|
172
|
-
--------------------------
|
638
|
+
## 1.0.7 (2009-04-11)
|
173
639
|
|
174
640
|
* Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
|
641
|
+
|
175
642
|
* Fixed a bug that caused named character entities containing digits (like
|
176
643
|
`&sup2;`) to be escaped when they shouldn't have been. [Reported by
|
177
644
|
Sebastian Steinmetz]
|
178
645
|
|
179
646
|
|
180
|
-
|
181
|
-
--------------------------
|
647
|
+
## 1.0.6 (2009-02-23)
|
182
648
|
|
183
649
|
* Removed htmlentities gem dependency.
|
650
|
+
|
184
651
|
* Existing well-formed character entity references in the input string are now
|
185
652
|
preserved rather than being decoded and re-encoded.
|
653
|
+
|
186
654
|
* The `'` character is now encoded as `&#39;` instead of `&apos;` to prevent
|
187
655
|
problems in IE6.
|
656
|
+
|
188
657
|
* You can now specify the symbol `:all` in place of an element name in the
|
189
658
|
attributes config hash to allow certain attributes on all elements. [Thanks
|
190
659
|
to Mutwin Kraus]
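An attributes hash using `:all` might look like this sketch; the specific elements and attributes are assumptions chosen for illustration.

```ruby
config = {
  elements:   ['a', 'p', 'span'],
  attributes: {
    :all => ['class', 'title'],  # allowed on every element
    'a'  => ['href']             # allowed only on <a>
  }
}

p config[:attributes][:all] # ["class", "title"]
```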
|
191
660
|
|
192
661
|
|
193
|
-
|
194
|
-
--------------------------
|
662
|
+
## 1.0.5 (2009-02-05)
|
195
663
|
|
196
|
-
* Fixed a bug introduced in version 1.0.3 that prevented non-
|
664
|
+
* Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
|
197
665
|
protocols from being cleaned when relative URLs were allowed. [Reported by
|
198
666
|
Dev Purkayastha]
|
667
|
+
|
199
668
|
* Fixed "undefined method `parent='" exceptions caused by parser changes in
|
200
669
|
edge Hpricot.
|
201
670
|
|
202
671
|
|
203
|
-
|
204
|
-
--------------------------
|
672
|
+
## 1.0.4 (2009-01-16)
|
205
673
|
|
206
|
-
* Fixed a bug that made it possible to sneak a non-
|
674
|
+
* Fixed a bug that made it possible to sneak a non-allowlisted element through
|
207
675
|
by repeating it several times in a row. All versions of Sanitize prior to
|
208
676
|
1.0.4 are vulnerable. [Reported by Cristobal]
|
209
677
|
|
210
678
|
|
211
|
-
|
212
|
-
--------------------------
|
679
|
+
## 1.0.3 (2009-01-15)
|
213
680
|
|
214
681
|
* Fixed a bug whereby incomplete Unicode or hex entities could be used to
|
215
|
-
prevent non-
|
682
|
+
prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
|
216
683
|
still decode the incomplete entities, users of those browsers may be
|
217
684
|
vulnerable to malicious script injection on websites using versions of
|
218
685
|
Sanitize prior to 1.0.3.
|
219
686
|
|
220
687
|
|
221
|
-
|
222
|
-
--------------------------
|
688
|
+
## 1.0.2 (2009-01-04)
|
223
689
|
|
224
690
|
* Fixed a bug that caused an exception to be thrown when parsing a valueless
|
225
691
|
attribute that's expected to contain a URL.
|
226
692
|
|
227
693
|
|
228
|
-
|
229
|
-
--------------------------
|
694
|
+
## 1.0.1 (2009-01-01)
|
230
695
|
|
231
696
|
* You can now specify `:relative` in a protocol config array to allow
|
232
697
|
attributes containing relative URLs with no protocol. The Basic and Relaxed
|
233
698
|
configs have been updated to allow relative URLs.
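A protocol config that permits relative URLs could be sketched as follows; the protocol list is an illustrative assumption.

```ruby
config = {
  elements:   ['a'],
  attributes: { 'a' => ['href'] },
  protocols:  {
    # :relative allows href values with no protocol, such as "/about".
    'a' => { 'href' => ['http', 'https', 'mailto', :relative] }
  }
}

p config[:protocols]['a']['href'] # ["http", "https", "mailto", :relative]
```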
|
699
|
+
|
234
700
|
* Added a workaround for an Hpricot bug that causes HTML entities for
|
235
701
|
non-ASCII characters to be replaced by question marks, and all other
|
236
702
|
entities to be destructively decoded.
|
237
703
|
|
238
704
|
|
239
|
-
|
240
|
-
--------------------------
|
705
|
+
## 1.0.0 (2008-12-25)
|
241
706
|
|
242
707
|
* First release.
|