sanitize 2.1.1 → 6.0.0

data/HISTORY.md CHANGED
@@ -1,15 +1,477 @@
1
- Sanitize History
2
- ================================================================================
1
+ # Sanitize History
3
2
 
4
- Version 2.1.1 (2018-09-30)
5
- --------------------------
3
+ ## 6.0.0 (2021-08-03)
4
+
5
+ ### Potentially Breaking Changes
6
+
7
+ * Ruby 2.5.0 is now the oldest officially supported Ruby version.
8
+
9
+ * Sanitize now requires Nokogiri 1.12.0 or higher, which includes Nokogumbo.
10
+ The separate dependency on Nokogumbo has been removed. [@lis2 - #211][211]
11
+
12
+ [211]:https://github.com/rgrove/sanitize/pull/211
13
+
14
+ ## 5.2.3 (2021-01-11)
15
+
16
+ ### Bug Fixes
17
+
18
+ * Ensure protocol sanitization is applied to data attributes.
19
+ [@ccutrer - #207][207]
20
+
21
+ [207]:https://github.com/rgrove/sanitize/pull/207
22
+
23
+ ## 5.2.2 (2021-01-06)
24
+
25
+ ### Bug Fixes
26
+
27
+ * Fixed a deprecation warning in Ruby 2.7+ when using keyword arguments in a
28
+ custom transformer. [@mscrivo - #206][206]
29
+
30
+ [206]:https://github.com/rgrove/sanitize/pull/206
31
+
32
+ ## 5.2.1 (2020-06-16)
33
+
34
+ ### Bug Fixes
35
+
36
+ * Fixed an HTML sanitization bypass that could allow XSS. This issue affects
37
+ Sanitize versions 3.0.0 through 5.2.0.
38
+
39
+ When HTML was sanitized using the "relaxed" config or a custom config that
40
+ allows certain elements, some content in a `<math>` or `<svg>` element may not
41
+ have been sanitized correctly even if `math` and `svg` were not in the
42
+ allowlist. This could allow carefully crafted input to sneak arbitrary HTML
43
+ through Sanitize, potentially enabling an XSS (cross-site scripting) attack.
44
+
45
+ You are likely to be vulnerable to this issue if you use Sanitize's relaxed
46
+ config or a custom config that allows one or more of the following HTML
47
+ elements:
48
+
49
+ - `iframe`
50
+ - `math`
51
+ - `noembed`
52
+ - `noframes`
53
+ - `noscript`
54
+ - `plaintext`
55
+ - `script`
56
+ - `style`
57
+ - `svg`
58
+ - `xmp`
59
+
60
+ See the security advisory for more details, including a workaround if you're
61
+ not able to upgrade: [GHSA-p4x4-rw2p-8j8m]
62
+
63
+ Many thanks to Michał Bentkowski of Securitum for reporting this issue and
64
+ helping to verify the fix.
65
+
66
+ [GHSA-p4x4-rw2p-8j8m]:https://github.com/rgrove/sanitize/security/advisories/GHSA-p4x4-rw2p-8j8m
67
+
68
+ ## 5.2.0 (2020-06-06)
69
+
70
+ ### Changes
71
+
72
+ * The term "whitelist" has been replaced with "allowlist" throughout Sanitize's
73
+ source and documentation.
74
+
75
+ While the etymology of "whitelist" may not be explicitly racist in origin or
76
+ intent, there are inherent racial connotations in the implication that white
77
+ is good and black (as in "blacklist") is not.
78
+
79
+ This is a change I should have made long ago, and I apologize for not making
80
+ it sooner.
81
+
82
+ * In transformer input, the `:is_whitelisted` and `:node_whitelist` keys are now
83
+ deprecated. New `:is_allowlisted` and `:node_allowlist` keys have been added.
84
+ The old keys will continue to work in order to avoid breaking existing code,
85
+ but they are no longer documented and may be removed in a future semver major
86
+ release.
87
+
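The key rename described for 5.2.0 can be bridged inside a custom transformer. The following is a minimal sketch, not code from Sanitize itself: `node_allowlisted?`, `UNLISTED_NAMES`, and `LOG_UNLISTED` are hypothetical names, and it assumes the environment hash passed to a transformer carries `:node_name` plus either the new `:is_allowlisted` key or the deprecated `:is_whitelisted` key.

```ruby
# Hypothetical helper (not part of Sanitize): prefers the :is_allowlisted key
# added in 5.2.0 and falls back to the deprecated :is_whitelisted key.
def node_allowlisted?(env)
  env.fetch(:is_allowlisted) { env.fetch(:is_whitelisted, false) }
end

# Names of nodes that no earlier transformer has allowlisted (illustration only).
UNLISTED_NAMES = []

# Example transformer: records the node name and returns nil, so it does not
# change how the node is filtered afterwards.
LOG_UNLISTED = lambda do |env|
  UNLISTED_NAMES << env[:node_name] unless node_allowlisted?(env)
  nil
end
```

A transformer like this would normally be listed in the `:transformers` array of a Sanitize config. Reading the new key first means the fallback can simply be deleted once the deprecated keys are removed.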
88
+ ## 5.1.0 (2019-09-07)
89
+
90
+ ### Features
91
+
92
+ * Added a `:parser_options` config hash, which makes it possible to pass custom
93
+ parsing options to Nokogumbo. [@austin-wang - #194][194]
94
+
95
+ ### Bug Fixes
96
+
97
+ * Non-characters and non-whitespace control characters are now stripped from
98
+ HTML input before parsing to comply with the HTML Standard's [preprocessing
99
+ guidelines][html-preprocessing]. Prior to this Sanitize had adhered to [older
100
+ W3C guidelines][unicode-xml] that have since been withdrawn. [#179][179]
101
+
102
+ [179]:https://github.com/rgrove/sanitize/issues/179
103
+ [194]:https://github.com/rgrove/sanitize/pull/194
104
+ [html-preprocessing]:https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream
105
+ [unicode-xml]:https://www.w3.org/TR/unicode-xml/
106
+
107
+ ## 5.0.0 (2018-10-14)
108
+
109
+ For most users, upgrading from 4.x shouldn't require any changes. However, the
110
+ minimum required Ruby version has changed, and Sanitize 5.x's HTML output may
111
+ differ in some small ways from 4.x's output. If this matters to you, please
112
+ review the changes below carefully.
113
+
114
+ ### Potentially Breaking Changes
115
+
116
+ * Ruby 2.3.0 is now the oldest officially supported Ruby version. Sanitize may
117
+ work in older 2.x Rubies, but they aren't actively tested. Sanitize definitely
118
+ no longer works in Ruby 1.9.x.
119
+
120
+ * Upgraded to Nokogumbo 2.x, which fixes various bugs and adds
121
+ standard-compliant HTML serialization. [@stevecheckoway - #189][189]
122
+
123
+ * Children of the following elements are now removed by default when these
124
+ elements are removed, rather than being preserved and escaped:
125
+
126
+ - `iframe`
127
+ - `noembed`
128
+ - `noframes`
129
+ - `noscript`
130
+ - `script`
131
+ - `style`
132
+
133
+ * Children of allowlisted `iframe` elements are now always removed. In modern
134
+ HTML, `iframe` elements should never have children. In HTML 4 and earlier
135
+ `iframe` elements were allowed to contain fallback content for legacy
136
+ browsers, but it's been almost two decades since that was useful.
137
+
138
+ * Fixed a bug that caused `:remove_contents` to behave as if it were set to
139
+ `true` when it was actually an Array.
140
+
141
+ [189]:https://github.com/rgrove/sanitize/pull/189
142
+
143
+ ## 4.6.6 (2018-07-23)
144
+
145
+ * Improved performance and memory usage by optimizing `Sanitize#transform_node!`
146
+ [@stanhu - #183][183]
147
+
148
+ [183]:https://github.com/rgrove/sanitize/pull/183
149
+
150
+ ## 4.6.5 (2018-05-16)
151
+
152
+ * Improved performance slightly by tweaking the order of built-in transformers.
153
+ [@rafbm - #180][180]
154
+
155
+ [180]:https://github.com/rgrove/sanitize/pull/180
156
+
157
+ ## 4.6.4 (2018-03-20)
158
+
159
+ * Fixed: A change introduced in 4.6.2 broke certain transformers that relied on
160
+ being able to mutate the name of an HTML node. That change has been reverted
161
+ and a test has been added to cover this case. [@zetter - #177][177]
162
+
163
+ [177]:https://github.com/rgrove/sanitize/issues/177
164
+
165
+ ## 4.6.3 (2018-03-19)
166
+
167
+ * [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
168
+ XSS.
169
+
170
+ When Sanitize <= 4.6.2 is used in combination with libxml2 >= 2.9.2, a
171
+ specially crafted HTML fragment can cause libxml2 to generate improperly
172
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
173
+ elements.
174
+
175
+ Sanitize now performs additional escaping on affected attributes to prevent
176
+ this.
177
+
178
+ Many thanks to the Shopify Application Security Team for responsibly reporting
179
+ this issue.
180
+
181
+ [176]:https://github.com/rgrove/sanitize/issues/176
182
+
183
+ ## 4.6.2 (2018-03-19)
184
+
185
+ * Reduced string allocations to optimize memory usage. [@janklimo - #175][175]
186
+
187
+ [175]:https://github.com/rgrove/sanitize/pull/175
188
+
189
+ ## 4.6.1 (2018-03-15)
190
+
191
+ * Added support for frozen string literals in Ruby 2.4+.
192
+ [@flavorjones - #174][174]
193
+
194
+ [174]:https://github.com/rgrove/sanitize/pull/174
195
+
196
+ ## 4.6.0 (2018-01-29)
197
+
198
+ * Loosened the Nokogumbo dependency to allow installing semver-compatible
199
+ versions greater than or equal to v1.4. [@rafbm - #171][171]
200
+
201
+ [171]:https://github.com/rgrove/sanitize/pull/171
202
+
203
+ ## 4.5.0 (2017-06-04)
204
+
205
+ * Added SVG-related CSS properties to the relaxed config. See [the diff][161]
206
+ for the full list of added properties. [@louim - #161][161]
207
+
208
+ * Fixed: Sanitize now strips null bytes (`\u0000`) before passing input to
209
+ Nokogumbo, since they can cause recent versions to crash with a failed
210
+ assertion in the Gumbo parser.
211
+
212
+ [161]:https://github.com/rgrove/sanitize/pull/161
213
+
214
+ ## 4.4.0 (2016-09-29)
215
+
216
+ * Added `srcset` to the attribute allowlist for `img` elements in the relaxed
217
+ config. [@ejtttje - #156][156]
218
+
219
+ [156]:https://github.com/rgrove/sanitize/pull/156
220
+
221
+
222
+ ## 4.3.0 (2016-09-20)
223
+
224
+ * Methods can now be used as transformers. [@Skipants - #155][155]
225
+
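As an illustration of the 4.3.0 change, a transformer can be written as an ordinary method and passed as a `Method` object. This is a sketch under assumptions: `strip_inline_style` and `STYLE_STRIPPING_CONFIG` are hypothetical names, and the node in `env[:node]` is assumed to respond to `remove_attribute` the way a Nokogiri node does.

```ruby
# Hypothetical transformer defined as a plain method (not part of Sanitize).
# It removes the inline `style` attribute from any node that supports
# attribute removal and returns nil.
def strip_inline_style(env)
  node = env[:node]
  node.remove_attribute('style') if node.respond_to?(:remove_attribute)
  nil
end

# Config fragment: `method(:strip_inline_style)` wraps the method in a Method
# object, which responds to `call` like a lambda or Proc does.
STYLE_STRIPPING_CONFIG = {
  :transformers => [method(:strip_inline_style)]
}.freeze
```

Keeping transformers as named methods makes them easier to unit test in isolation than anonymous lambdas embedded in a config hash.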
226
+ [155]:https://github.com/rgrove/sanitize/pull/155
227
+
228
+
229
+ ## 4.2.0 (2016-08-22)
230
+
231
+ * Added `-webkit-font-smoothing` to the relaxed CSS config. [@louim - #154][154]
232
+
233
+ * Fixed: Nokogumbo >=1.4.9 changed its behavior in a way that allowed invalid
234
+ doctypes (like `<!DOCTYPE nonsense>`) when the `:allow_doctype` config setting
235
+ was `true`. Invalid doctypes are now coerced to valid ones as they were prior
236
+ to this Nokogumbo change.
237
+
238
+ [154]:https://github.com/rgrove/sanitize/pull/154
239
+
240
+
241
+ ## 4.1.0 (2016-06-17)
242
+
243
+ * Added a new CSS config setting, `:import_url_validator`. This is a Proc or
244
+ other callable object that will be called with each `@import` URL, and should
245
+ return `true` to allow the URL or `false` to remove it. [@nikz - #153][153]
246
+
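A validator for `:import_url_validator` might look like the sketch below. It is an illustration only: `TRUSTED_IMPORT_HOSTS`, `IMPORT_URL_VALIDATOR`, and `CSS_CONFIG_FRAGMENT` are hypothetical names, the host list is made up, and it assumes the validator receives each `@import` URL as a String.

```ruby
require 'uri'

# Hypothetical list of hosts that stylesheets may be imported from.
TRUSTED_IMPORT_HOSTS = %w[fonts.googleapis.com].freeze

# Returns true only for well-formed https URLs on a trusted host;
# anything else, including unparseable input, returns false.
IMPORT_URL_VALIDATOR = lambda do |url|
  begin
    uri = URI.parse(url)
    uri.is_a?(URI::HTTPS) && TRUSTED_IMPORT_HOSTS.include?(uri.host)
  rescue URI::InvalidURIError
    false
  end
end

# Config fragment (assumption: this hash is merged into the CSS section of a
# Sanitize config).
CSS_CONFIG_FRAGMENT = { :import_url_validator => IMPORT_URL_VALIDATOR }.freeze
```

Rejecting on parse failure keeps the validator fail-closed, so a malformed `@import` URL is removed rather than allowed through.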
247
+ [153]:https://github.com/rgrove/sanitize/pull/153/
248
+
249
+
250
+ ## 4.0.1 (2015-12-09)
251
+
252
+ * Unpinned the Nokogumbo dependency. [@rubys - #141][141]
253
+
254
+ [141]:https://github.com/rgrove/sanitize/pull/141
255
+
256
+
257
+ ## 4.0.0 (2015-04-20)
258
+
259
+ ### Potentially breaking changes
260
+
261
+ * Added two new CSS config settings, `:at_rules_with_properties` and
262
+ `:at_rules_with_styles`. These allow you to define which at-rules should be
263
+ allowed to contain properties and which should be allowed to contain style
264
+ rules. Previously this was hard-coded internally. [#111][111]
265
+
266
+ The previous `:at_rules` setting still exists, and defines at-rules that may
267
+ not have associated blocks, such as `@import`. If you have a custom config
268
+ that contains an `:at_rules` setting, you may need to move rules that can have
269
+ blocks to either `:at_rules_with_properties` or `:at_rules_with_styles`.
270
+
271
+ See Sanitize's relaxed config for an example.
272
+
273
+ ### Other changes
274
+
275
+ * Added full support for CSS `@page` rules in the relaxed config, including
276
+ support for all page-margin box rules (such as `@top-left`, `@bottom-center`,
277
+ etc.)
278
+
279
+ * Added the following CSS at-rules to the relaxed config:
280
+
281
+ - `@-moz-keyframes`
282
+ - `@-o-keyframes`
283
+ - `@-webkit-keyframes`
284
+ - `@document`
285
+
286
+ * Added a whole bunch of CSS properties to the relaxed config. View the complete
287
+ list [here](https://gist.github.com/rgrove/044cc7e9a5b44f583c05).
288
+
289
+ * Small performance improvements.
290
+
291
+ * Fixed: Upgraded Crass to 1.0.2 to pick up a fix that affected the parsing of
292
+ CSS `@page` rules.
293
+
294
+ [111]:https://github.com/rgrove/sanitize/issues/111
295
+
296
+
297
+ ## 3.1.2 (2015-02-22)
298
+
299
+ * Fixed: Deleting a node in a custom transformer could trigger a memory leak
300
+ in Nokogiri if that node's children were later reparented, which the built-in
301
+ CleanElement transformer did by default. The CleanElement transformer is now
302
+ careful not to reparent the children of deleted nodes. [#129][129]
303
+
304
+ [129]:https://github.com/rgrove/sanitize/issues/129
305
+
306
+
307
+ ## 3.1.1 (2015-02-04)
308
+
309
+ * Fixed: `#document` and `#fragment` failed on frozen strings, and could
310
+ unintentionally modify unfrozen strings if they used an encoding other than
311
+ UTF-8 or if they contained characters not allowed in HTML.
312
+ [@AnchorCat - #128][128]
313
+
314
+ [128]:https://github.com/rgrove/sanitize/pull/128
315
+
316
+
317
+ ## 3.1.0 (2014-12-22)
318
+
319
+ * Added the following CSS properties to the relaxed config. [@ehudc - #120][120]
320
+
321
+ - `-moz-text-size-adjust`
322
+ - `-ms-text-size-adjust`
323
+ - `-webkit-text-size-adjust`
324
+ - `text-size-adjust`
325
+
326
+ * Updated Nokogumbo to 1.2.0 to pick up a fix for a Gumbo bug where the
327
+ entity `&AElig;` left its semicolon behind when it was converted to a
328
+ character during parsing. [#119][119]
329
+
330
+ [119]:https://github.com/rgrove/sanitize/issues/119
331
+ [120]:https://github.com/rgrove/sanitize/pull/120
332
+
333
+
334
+ ## 3.0.4 (2014-12-12)
335
+
336
+ * Fixed: Harmless whitespace preceding a URL protocol (such as " http://")
337
+ caused the URL to be removed even when the protocol was allowlisted.
338
+ [@benubois - #126][126]
339
+
340
+ [126]:https://github.com/rgrove/sanitize/pull/126
341
+
342
+
343
+ ## 3.0.3 (2014-10-29)
344
+
345
+ * Fixed: Some CSS selectors weren't parsed correctly inside the body of a
346
+ `@media` block, causing them to be removed even when allowlist rules should
347
+ have allowed them to remain. [#121][121]
348
+
349
+ [121]:https://github.com/rgrove/sanitize/issues/121
350
+
351
+
352
+ ## 3.0.2 (2014-09-02)
353
+
354
+ * Updated Nokogumbo to 1.1.12, because 1.1.11 silently reverted the change we
355
+ were trying to pick up in the last release. Now issue [#114][114] is
356
+ _actually_ fixed.
357
+
358
+
359
+ ## 3.0.1 (2014-09-02)
360
+
361
+ * Updated Nokogumbo to 1.1.11 to pick up a fix for a Gumbo bug in which certain
362
+ HTML character entities, such as `&Ouml;`, were parsed incorrectly, leaving
363
+ the semicolon behind in the output. [#114][114]
364
+
365
+ [114]:https://github.com/rgrove/sanitize/issues/114
366
+
367
+
368
+ ## 3.0.0 (2014-06-21)
369
+
370
+ As of this version, Sanitize adheres strictly to the [SemVer 2.0.0][semver]
371
+ versioning standard. This release contains API and output changes that are
372
+ incompatible with previous releases, as indicated by the major version
373
+ increment.
374
+
375
+ [semver]:http://semver.org/
376
+
377
+ ### Backwards-incompatible changes
378
+
379
+ * HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the
380
+ HTML5 parsing spec and behaves much more like modern browser parsers than the
381
+ previous libxml2-based parser. As a result, HTML output may differ from that
382
+ of previous versions of Sanitize.
383
+
384
+ * All transformers now traverse the document from the top down, starting with
385
+ the first node, then its first child, and so on. The `:transformers_breadth`
386
+ config has been removed, and old bottom-up transformers (the previous default)
387
+ may need to be rewritten.
388
+
389
+ * Sanitize's built-in configs are now deeply frozen to prevent people from
390
+ modifying them (either accidentally or maliciously). To customize a built-in
391
+ config, create a new copy using `Sanitize::Config.merge()`, like so:
392
+
393
+ ```ruby
394
+ Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
395
+ :elements => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
396
+ :remove_contents => true
397
+ ))
398
+ ```
399
+
400
+ * The `clean!` and `clean_document!` methods were removed, since they weren't
401
+ useful and tended to confuse people.
402
+
403
+ * The `clean` method was renamed to `fragment` to more clearly indicate that its
404
+ intended use is to sanitize an HTML fragment.
405
+
406
+ * The `clean_document` method was renamed to `document`.
407
+
408
+ * The `clean_node!` method was renamed to `node!`.
409
+
410
+ * The `document` method now raises a `Sanitize::Error` if the `<html>` element
411
+ isn't allowlisted, rather than a `RuntimeError`. This error is also now raised
412
+ regardless of the `:remove_contents` config setting.
413
+
414
+ * The `:output` config has been removed. Output is now always HTML, not XHTML.
415
+
416
+ * The `:output_encoding` config has been removed. Output is now always UTF-8.
417
+
418
+ ### Other changes
419
+
420
+ * Added advanced CSS sanitization support using [Crass][crass], which is fully
421
+ compliant with the CSS Syntax Module Level 3 parsing spec. The contents of
422
+ allowlisted `<style>` elements and `style` attributes in HTML will be
423
+ sanitized as CSS, or you can use the `Sanitize::CSS` class to manually
424
+ sanitize CSS stylesheets or properties.
425
+
426
+ * Added an `:allow_doctype` setting. When `true`, well-formed doctype
427
+ definitions will be allowed in documents. When `false` (the default), doctype
428
+ definitions will be removed from documents. Doctype definitions are never
429
+ allowed in fragments, regardless of this setting.
430
+
431
+ * Added the following elements to the relaxed config, in addition to various
432
+ attributes: `article`, `aside`, `body`, `data`, `div`, `footer`, `head`,
433
+ `header`, `html`, `main`, `nav`, `section`, `span`, `style`, `title`.
434
+
435
+ * The `:whitespace_elements` config is now a Hash, and allows you to specify the
436
+ text that should be inserted before and after these elements when they're
437
+ removed. The old-style Array-based config value is still supported for
438
+ backwards compatibility. [@alperkokmen - #94][94]
439
+
440
+ * Unsuitable Unicode characters are now removed from HTML before it's parsed.
441
+ [#106][106]
442
+
443
+ * Fixed: Non-tag brackets in input like `"1 > 2 and 2 < 1"` are now parsed and
444
+ escaped correctly in accordance with the HTML5 spec, becoming
445
+ `"1 &gt; 2 and 2 &lt; 1"`. [#83][83]
446
+
447
+ * Fixed: Siblings added after the current node during traversal are now
448
+ also traversed. In previous versions they were simply skipped. [#91][91]
449
+
450
+ * Fixed: Nokogiri has been smacked and instructed to stop adding newlines after
451
+ certain elements, because if people wanted newlines there they'd have put them
452
+ there, dammit. [#103][103]
453
+
454
+ * Fixed: Added a workaround for a libxml2 bug that caused an undesired
455
+ content-type meta tag to be added to all documents with `<head>` elements.
456
+ [Nokogiri #1008][n1008]
457
+
458
+ [crass]:https://github.com/rgrove/crass
459
+ [83]:https://github.com/rgrove/sanitize/issues/83
460
+ [91]:https://github.com/rgrove/sanitize/issues/91
461
+ [94]:https://github.com/rgrove/sanitize/pull/94/
462
+ [103]:https://github.com/rgrove/sanitize/issues/103
463
+ [106]:https://github.com/rgrove/sanitize/issues/106
464
+ [n1008]:https://github.com/sparklemotion/nokogiri/issues/1008
465
+
466
+
467
+ ## 2.1.1 (2018-09-30)
6
468
 
7
469
  * [CVE-2018-3740][176]: Fixed an HTML injection vulnerability that could allow
8
470
  XSS (backported from Sanitize 4.6.3). [@dometto - #188][188]
9
471
 
10
472
  When Sanitize <= 2.1.0 is used in combination with libxml2 >= 2.9.2, a
11
473
  specially crafted HTML fragment can cause libxml2 to generate improperly
12
- escaped output, allowing non-whitelisted attributes to be used on whitelisted
474
+ escaped output, allowing non-allowlisted attributes to be used on allowlisted
13
475
  elements.
14
476
 
15
477
  Sanitize now performs additional escaping on affected attributes to prevent
@@ -22,10 +484,9 @@ Version 2.1.1 (2018-09-30)
22
484
  [188]:https://github.com/rgrove/sanitize/pull/188
23
485
 
24
486
 
25
- Version 2.1.0 (2014-01-13)
26
- --------------------------
487
+ ## 2.1.0 (2014-01-13)
27
488
 
28
- * Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
489
+ * Added support for allowlisting arbitrary HTML5 `data-*` attributes. Use the
29
490
  symbol `:data` instead of an attribute name in the `:attributes` config to
30
491
  indicate that arbitrary data attributes should be allowed on an element.
31
492
 
@@ -38,16 +499,14 @@ Version 2.1.0 (2014-01-13)
38
499
  [87]:https://github.com/rgrove/sanitize/pull/87
39
500
 
40
501
 
41
- Version 2.0.6 (2013-07-10)
42
- --------------------------
502
+ ## 2.0.6 (2013-07-10)
43
503
 
44
504
  * Fixed: Version 2.0.5 inadvertently included some work-in-progress changes that
45
505
  shouldn't have made their way into the master branch. This is what happens
46
506
  when I release before coffee instead of after.
47
507
 
48
508
 
49
- Version 2.0.5 (2013-07-10)
50
- --------------------------
509
+ ## 2.0.5 (2013-07-10)
51
510
 
52
511
  * Loosened the Nokogiri dependency back to >= 1.4.4 to allow Sanitize to coexist
53
512
  in newer Rubies with other libraries that restrict Nokogiri to 1.5.x for 1.8.7
@@ -55,8 +514,7 @@ Version 2.0.5 (2013-07-10)
55
514
  life easier for people who need those other libs.
56
515
 
57
516
 
58
- Version 2.0.4 (2013-06-12)
59
- --------------------------
517
+ ## 2.0.4 (2013-06-12)
60
518
 
61
519
  * Added `Sanitize.clean_document`, which sanitizes a full HTML document rather
62
520
  than just a fragment. [Ben Anderson]
@@ -66,14 +524,12 @@ Version 2.0.4 (2013-06-12)
66
524
  * Dropped support for Ruby versions older than 1.9.2.
67
525
 
68
526
 
69
- Version 2.0.3 (2011-07-01)
70
- --------------------------
527
+ ## 2.0.3 (2011-07-01)
71
528
 
72
529
  * Loosened the Nokogiri dependency to allow Nokogiri 1.5.x.
73
530
 
74
531
 
75
- Version 2.0.2 (2011-05-21)
76
- --------------------------
532
+ ## 2.0.2 (2011-05-21)
77
533
 
78
534
  * Fixed a bug in which a protocol like "java\script:" would be translated to
79
535
  "java%5Cscript:" and allowed through the filter when relative URLs were
@@ -81,162 +537,171 @@ Version 2.0.2 (2011-05-21)
81
537
  undesired behavior.
82
538
 
83
539
 
84
- Version 2.0.1 (2011-03-16)
85
- --------------------------
540
+ ## 2.0.1 (2011-03-16)
86
541
 
87
542
  * Updated the protocol regex to anchor at the beginning of the string rather
88
543
  than the beginning of a line. [Eaden McKee]
89
544
 
90
545
 
91
- Version 2.0.0 (2011-01-15)
92
- --------------------------
546
+ ## 2.0.0 (2011-01-15)
93
547
 
94
548
  * The environment data passed into transformers and the return values expected
95
549
  from transformers have changed. Old transformers will need to be updated.
96
550
  See the README for details.
551
+
97
552
  * Transformers now receive nodes of all types, not just element nodes.
553
+
98
554
  * Sanitize's own core filtering logic is now implemented as a set of always-on
99
555
  transformers.
556
+
100
557
  * The default value for the `:output` config is now `:html`. Previously it was
101
558
  `:xhtml`.
559
+
102
560
  * Added a `:whitespace_elements` config, which specifies elements (such as
103
561
  `<br>` and `<p>`) that should be replaced with whitespace when removed in
104
562
  order to preserve readability. See the README for the default list of
105
563
  elements that will be replaced with whitespace when removed.
564
+
106
565
  * Added a `:transformers_breadth` config, which may be used to specify
107
566
  transformers that should traverse nodes in a breadth-first mode rather than
108
567
  the default depth-first mode.
568
+
109
569
  * Added the `abbr`, `dfn`, `kbd`, `mark`, `s`, `samp`, `time`, and `var`
110
- elements to the whitelists for the basic and relaxed configs.
570
+ elements to the allowlists for the basic and relaxed configs.
571
+
111
572
  * Added the `bdo`, `del`, `figcaption`, `figure`, `hgroup`, `ins`, `rp`, `rt`,
112
- `ruby`, and `wbr` elements to the whitelist for the relaxed config.
113
- * The `dir`, `lang`, and `title` attributes are now whitelisted for all
573
+ `ruby`, and `wbr` elements to the allowlist for the relaxed config.
574
+
575
+ * The `dir`, `lang`, and `title` attributes are now allowlisted for all
114
576
  elements in the relaxed config.
577
+
115
578
  * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+
116
579
  (issue #315) that caused `</body></html>` to be appended to the CDATA inside
117
580
  unterminated script and style elements.
118
581
 
119
582
 
120
- Version 1.2.1 (2010-04-20)
121
- --------------------------
583
+ ## 1.2.1 (2010-04-20)
122
584
 
123
585
  * Added a `:remove_contents` config setting. If set to `true`, Sanitize will
124
- remove the contents of all non-whitelisted elements in addition to the
586
+ remove the contents of all non-allowlisted elements in addition to the
125
587
  elements themselves. If set to an array of element names, Sanitize will
126
588
  remove the contents of only those elements (when filtered), and leave the
127
589
  contents of other filtered elements. [Thanks to Rafael Souza for the array
128
590
  option]
591
+
129
592
  * Added an `:output_encoding` config setting to allow the character encoding
130
593
  for HTML output to be specified. The default is utf-8.
594
+
131
595
  * The environment hash passed into transformers now includes a `:node_name`
132
596
  item containing the lowercase name of the current HTML node (e.g. "div").
597
+
133
598
  * Returning anything other than a Hash or nil from a transformer will now
134
599
  raise a meaningful `Sanitize::Error` exception rather than an unintended
135
600
  `NameError`.
136
601
 
137
602
 
138
- Version 1.2.0 (2010-01-17)
139
- --------------------------
603
+ ## 1.2.0 (2010-01-17)
140
604
 
141
605
  * Requires Nokogiri ~> 1.4.1.
606
+
142
607
  * Added support for transformers, which allow you to filter and alter nodes
143
608
  using your own custom logic, on top of (or instead of) Sanitize's core
144
609
  filter. See the README for details and examples.
610
+
145
611
  * Added `Sanitize.clean_node!`, which sanitizes a `Nokogiri::XML::Node` and
146
612
  all its children.
147
- * Added elements `<h1>` through `<h6>` to the Relaxed whitelist. [Suggested by
613
+
614
+ * Added elements `<h1>` through `<h6>` to the Relaxed allowlist. [Suggested by
148
615
  David Reese]
149
616
 
150
617
 
151
- Version 1.1.0 (2009-10-11)
152
- --------------------------
618
+ ## 1.1.0 (2009-10-11)
153
619
 
154
620
  * Migrated from Hpricot to Nokogiri. Requires libxml2 >= 2.7.2 [Adam Hooper]
621
+
155
622
  * Added an `:output` config setting to allow the output format to be
156
623
  specified. Supported formats are `:xhtml` (the default) and `:html` (which
157
624
  outputs HTML4).
625
+
158
626
  * Changed protocol regex to ensure Sanitize doesn't kill URLs with colons in
159
627
  path segments. [Peter Cooper]
160
628
 
161
629
 
162
- Version 1.0.8 (2009-04-23)
163
- --------------------------
630
+ ## 1.0.8 (2009-04-23)
164
631
 
165
632
  * Added a workaround for an Hpricot bug that prevents attribute names from
166
633
  being downcased in recent versions of Hpricot. This was exploitable to
167
- prevent non-whitelisted protocols from being cleaned. [Reported by Ben
634
+ prevent non-allowlisted protocols from being cleaned. [Reported by Ben
168
635
  Wanicur]
169
636
 
170
637
 
171
- Version 1.0.7 (2009-04-11)
172
- --------------------------
638
+ ## 1.0.7 (2009-04-11)
173
639
 
174
640
  * Requires Hpricot 0.8.1+, which is finally compatible with Ruby 1.9.1.
641
+
175
642
  * Fixed a bug that caused named character entities containing digits (like
176
643
  `&sup2;`) to be escaped when they shouldn't have been. [Reported by
177
644
  Sebastian Steinmetz]
178
645
 
179
646
 
180
- Version 1.0.6 (2009-02-23)
181
- --------------------------
647
+ ## 1.0.6 (2009-02-23)
182
648
 
183
649
  * Removed htmlentities gem dependency.
650
+
184
651
  * Existing well-formed character entity references in the input string are now
185
652
  preserved rather than being decoded and re-encoded.
653
+
186
654
  * The `'` character is now encoded as `&#39;` instead of `&apos;` to prevent
187
655
  problems in IE6.
656
+
188
657
  * You can now specify the symbol `:all` in place of an element name in the
189
658
  attributes config hash to allow certain attributes on all elements. [Thanks
190
659
  to Mutwin Kraus]
191
660
 
192
661
 
193
- Version 1.0.5 (2009-02-05)
194
- --------------------------
662
+ ## 1.0.5 (2009-02-05)
195
663
 
196
- * Fixed a bug introduced in version 1.0.3 that prevented non-whitelisted
664
+ * Fixed a bug introduced in version 1.0.3 that prevented non-allowlisted
197
665
  protocols from being cleaned when relative URLs were allowed. [Reported by
198
666
  Dev Purkayastha]
667
+
199
668
  * Fixed "undefined method `parent='" exceptions caused by parser changes in
200
669
  edge Hpricot.
201
670
 
202
671
 
203
- Version 1.0.4 (2009-01-16)
204
- --------------------------
672
+ ## 1.0.4 (2009-01-16)
205
673
 
206
- * Fixed a bug that made it possible to sneak a non-whitelisted element through
674
+ * Fixed a bug that made it possible to sneak a non-allowlisted element through
207
675
  by repeating it several times in a row. All versions of Sanitize prior to
208
676
  1.0.4 are vulnerable. [Reported by Cristobal]
209
677
 
210
678
 
211
- Version 1.0.3 (2009-01-15)
212
- --------------------------
679
+ ## 1.0.3 (2009-01-15)
213
680
 
214
681
  * Fixed a bug whereby incomplete Unicode or hex entities could be used to
215
- prevent non-whitelisted protocols from being cleaned. Since IE6 and Opera
682
+ prevent non-allowlisted protocols from being cleaned. Since IE6 and Opera
216
683
  still decode the incomplete entities, users of those browsers may be
217
684
  vulnerable to malicious script injection on websites using versions of
218
685
  Sanitize prior to 1.0.3.
219
686
 
220
687
 
221
- Version 1.0.2 (2009-01-04)
222
- --------------------------
688
+ ## 1.0.2 (2009-01-04)
223
689
 
224
690
  * Fixed a bug that caused an exception to be thrown when parsing a valueless
225
691
  attribute that's expected to contain a URL.
226
692
 
227
693
 
228
- Version 1.0.1 (2009-01-01)
229
- --------------------------
694
+ ## 1.0.1 (2009-01-01)
230
695
 
231
696
  * You can now specify `:relative` in a protocol config array to allow
232
697
  attributes containing relative URLs with no protocol. The Basic and Relaxed
233
698
  configs have been updated to allow relative URLs.
699
+
234
700
  * Added a workaround for an Hpricot bug that causes HTML entities for
235
701
  non-ASCII characters to be replaced by question marks, and all other
236
702
  entities to be destructively decoded.
237
703
 
238
704
 
239
- Version 1.0.0 (2008-12-25)
240
- --------------------------
705
+ ## 1.0.0 (2008-12-25)
241
706
 
242
707
  * First release.