loofah 2.2.3 → 2.19.0

Potentially problematic release: this version of loofah might be problematic.

Files changed (42)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +212 -31
  3. data/README.md +18 -24
  4. data/lib/loofah/elements.rb +79 -75
  5. data/lib/loofah/helpers.rb +18 -7
  6. data/lib/loofah/html/document.rb +1 -0
  7. data/lib/loofah/html/document_fragment.rb +4 -2
  8. data/lib/loofah/html5/libxml2_workarounds.rb +8 -7
  9. data/lib/loofah/html5/safelist.rb +1043 -0
  10. data/lib/loofah/html5/scrub.rb +73 -48
  11. data/lib/loofah/instance_methods.rb +14 -8
  12. data/lib/loofah/metahelpers.rb +2 -1
  13. data/lib/loofah/scrubber.rb +8 -7
  14. data/lib/loofah/scrubbers.rb +19 -13
  15. data/lib/loofah/version.rb +5 -0
  16. data/lib/loofah/xml/document.rb +1 -0
  17. data/lib/loofah/xml/document_fragment.rb +2 -1
  18. data/lib/loofah.rb +35 -18
  19. metadata +52 -138
  20. data/.gemtest +0 -0
  21. data/Gemfile +0 -22
  22. data/Manifest.txt +0 -40
  23. data/Rakefile +0 -79
  24. data/benchmark/benchmark.rb +0 -149
  25. data/benchmark/fragment.html +0 -96
  26. data/benchmark/helper.rb +0 -73
  27. data/benchmark/www.slashdot.com.html +0 -2560
  28. data/lib/loofah/html5/whitelist.rb +0 -186
  29. data/test/assets/msword.html +0 -63
  30. data/test/assets/testdata_sanitizer_tests1.dat +0 -502
  31. data/test/helper.rb +0 -18
  32. data/test/html5/test_sanitizer.rb +0 -382
  33. data/test/integration/test_ad_hoc.rb +0 -204
  34. data/test/integration/test_helpers.rb +0 -43
  35. data/test/integration/test_html.rb +0 -72
  36. data/test/integration/test_scrubbers.rb +0 -400
  37. data/test/integration/test_xml.rb +0 -55
  38. data/test/unit/test_api.rb +0 -142
  39. data/test/unit/test_encoding.rb +0 -20
  40. data/test/unit/test_helpers.rb +0 -62
  41. data/test/unit/test_scrubber.rb +0 -229
  42. data/test/unit/test_scrubbers.rb +0 -14
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: c22c1a749ff878b96f0c4a53e789834fa8072775c5abdccb68c388d6218b1bce
4
- data.tar.gz: e8d00e6ff5d623b3f3d03ce83ee780a88e92138fcb71efff28194f8a7d87e5fc
3
+ metadata.gz: 3d59ed56910860de60170e919b3ab77b382f00eadc5d37518a7a395edabc8a4f
4
+ data.tar.gz: d0ed6a2362ec8b366f4739a67c2197a24c45e0681cba6e5bd6b7b55617d492dc
5
5
  SHA512:
6
- metadata.gz: 0d5a0160010d61a51dad8e31bc644e03454311b99b1d71c6eaea5458cfaaa228671b82db52cf2369b42c48b636b912ca0d812191ac886a5c1499c44fc5221239
7
- data.tar.gz: ac479e283ef08b0df14938ec577a3aa4008d07ba3288232541928794cd0b9fe2512da88ac7fd2d123666dcad67d09c1a07307442610f61adbfd65f143ae339b5
6
+ metadata.gz: dabaf4204cf846132d0b2962cef11534e3043ae8b2be39cbf23dea2fabc3722d83fba8805a5453fca6f2ec80f13c48c62726751f6acf06d3fdfd427297f07968
7
+ data.tar.gz: 84d3442b65227346d62df8ea24ef0febe3212b1a1bdb61266f22cafc356467637f2a3a050d4c52672d55e081a3e040d2cb423961d571cf364978265398742f47
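The `checksums.yaml` entries above record SHA256 and SHA512 digests for the `metadata.gz` and `data.tar.gz` members of the gem archive. As a minimal sketch of how such a digest could be checked locally, assuming the `.gem` file has already been unpacked so that `data.tar.gz` is on disk, the helper below compares a file's SHA256 digest with an expected hex string. The helper name `sha256_matches?` is hypothetical and is not part of RubyGems or Loofah.

```ruby
require "digest"

# Hypothetical helper for illustration only.
# Returns true when the SHA256 hex digest of the file at `path`
# equals `expected_hex` (compared case-insensitively, ignoring
# surrounding whitespace).
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end

# Example call (assumes data.tar.gz was extracted from the 2.19.0 gem):
#   sha256_matches?(
#     "data.tar.gz",
#     "d0ed6a2362ec8b366f4739a67c2197a24c45e0681cba6e5bd6b7b55617d492dc"
#   )
```

The same approach should work for the SHA512 entries by swapping in `Digest::SHA512`.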
data/CHANGELOG.md CHANGED
@@ -1,12 +1,193 @@
1
1
  # Changelog
2
2
 
3
+ ## 2.19.0 / 2022-09-14
4
+
5
+ ### Features
6
+
7
+ * Allow SVG 1.0 color keyword names in CSS attributes. These colors are part of the [CSS Color Module Level 3](https://www.w3.org/TR/css-color-3/#svg-color) recommendation released 2022-01-18. [[#243](https://github.com/flavorjones/loofah/issues/243)]
8
+
9
+
10
+ ## 2.18.0 / 2022-05-11
11
+
12
+ ### Features
13
+
14
+ * Allow CSS property `aspect-ratio`. [[#236](https://github.com/flavorjones/loofah/issues/236)] (Thanks, [@louim](https://github.com/louim)!)
15
+
16
+
17
+ ## 2.17.0 / 2022-04-28
18
+
19
+ ### Features
20
+
21
+ * Allow ARIA attributes. [[#232](https://github.com/flavorjones/loofah/issues/232), [#233](https://github.com/flavorjones/loofah/issues/233)] (Thanks, [@nick-desteffen](https://github.com/nick-desteffen)!)
22
+
23
+
24
+ ## 2.16.0 / 2022-04-01
25
+
26
+ ### Features
27
+
28
+ * Allow MathML elements `menclose` and `ms`, and MathML attributes `dir`, `href`, `lquote`, `mathsize`, `notation`, and `rquote`. [[#231](https://github.com/flavorjones/loofah/issues/231)] (Thanks, [@nick-desteffen](https://github.com/nick-desteffen)!)
29
+
30
+
31
+ ## 2.15.0 / 2022-03-14
32
+
33
+ ### Features
34
+
35
+ * Expand set of allowed protocols to include `sms:`. [[#228](https://github.com/flavorjones/loofah/issues/228)] (Thanks, [@brendon](https://github.com/brendon)!)
36
+
37
+
38
+ ## 2.14.0 / 2022-02-11
39
+
40
+ ### Features
41
+
42
+ * The `#to_text` method on `Loofah::HTML::{Document,DocumentFragment}` replaces `<br>` line break elements with a newline. [[#225](https://github.com/flavorjones/loofah/issues/225)]
43
+
44
+
45
+ ## 2.13.0 / 2021-12-10
46
+
47
+ ### Bug fixes
48
+
49
+ * Loofah::HTML::DocumentFragment#text no longer serializes top-level comment children. [[#221](https://github.com/flavorjones/loofah/issues/221)]
50
+
51
+
52
+ ## 2.12.0 / 2021-08-11
53
+
54
+ ### Features
55
+
56
+ * Support empty HTML5 data attributes. [[#215](https://github.com/flavorjones/loofah/issues/215)]
57
+
58
+
59
+ ## 2.11.0 / 2021-07-31
60
+
61
+ ### Features
62
+
63
+ * Allow HTML5 element `wbr`.
64
+ * Allow all CSS property values for `border-collapse`. [[#201](https://github.com/flavorjones/loofah/issues/201)]
65
+
66
+
67
+ ### Changes
68
+
69
+ * Deprecating `Loofah::HTML5::SafeList::VOID_ELEMENTS` which is not a canonical list of void HTML4 or HTML5 elements.
70
+ * Removed some elements from `Loofah::HTML5::SafeList::VOID_ELEMENTS` that either are not acceptable elements or aren't considered "void" by libxml2.
71
+
72
+
73
+ ## 2.10.0 / 2021-06-06
74
+
75
+ ### Features
76
+
77
+ * Allow CSS properties `overflow-x` and `overflow-y`. [[#206](https://github.com/flavorjones/loofah/issues/206)] (Thanks, [@sampokuokkanen](https://github.com/sampokuokkanen)!)
78
+
79
+
80
+ ## 2.9.1 / 2021-04-07
81
+
82
+ ### Bug fixes
83
+
84
+ * Fix a regression in v2.9.0 which inappropriately removed CSS properties with quoted string values. [[#202](https://github.com/flavorjones/loofah/issues/202)]
85
+
86
+
87
+ ## 2.9.0 / 2021-01-14
88
+
89
+ ### Features
90
+
91
+ * Handle CSS functions in a CSS shorthand property (like `background`). [[#199](https://github.com/flavorjones/loofah/issues/199), [#200](https://github.com/flavorjones/loofah/issues/200)]
92
+
93
+
94
+ ## 2.8.0 / 2020-11-25
95
+
96
+ ### Features
97
+
98
+ * Allow CSS properties `order`, `flex-direction`, `flex-grow`, `flex-wrap`, `flex-shrink`, `flex-flow`, `flex-basis`, `flex`, `justify-content`, `align-self`, `align-items`, and `align-content`. [[#197](https://github.com/flavorjones/loofah/issues/197)] (Thanks, [@miguelperez](https://github.com/miguelperez)!)
99
+
100
+
101
+ ## 2.7.0 / 2020-08-26
102
+
103
+ ### Features
104
+
105
+ * Allow CSS properties `page-break-before`, `page-break-inside`, and `page-break-after`. [[#190](https://github.com/flavorjones/loofah/issues/190)] (Thanks, [@ahorek](https://github.com/ahorek)!)
106
+
107
+
108
+ ### Fixes
109
+
110
+ * Don't drop the `!important` rule from some CSS properties. [[#191](https://github.com/flavorjones/loofah/issues/191)] (Thanks, [@b7kich](https://github.com/b7kich)!)
111
+
112
+
113
+ ## 2.6.0 / 2020-06-16
114
+
115
+ ### Features
116
+
117
+ * Allow CSS `border-style` keywords. [[#188](https://github.com/flavorjones/loofah/issues/188)] (Thanks, [@tarcisiozf](https://github.com/tarcisiozf)!)
118
+
119
+
120
+ ## 2.5.0 / 2020-04-05
121
+
122
+ ### Features
123
+
124
+ * Allow more CSS length units: "ch", "vw", "vh", "Q", "lh", "vmin", "vmax". [[#178](https://github.com/flavorjones/loofah/issues/178)] (Thanks, [@JuanitoFatas](https://github.com/JuanitoFatas)!)
125
+
126
+
127
+ ### Fixes
128
+
129
+ * Remove comments from `Loofah::HTML::Document`s that exist outside the `html` element. [[#80](https://github.com/flavorjones/loofah/issues/80)]
130
+
131
+
132
+ ### Other changes
133
+
134
+ * Gem metadata being set [[#181](https://github.com/flavorjones/loofah/issues/181)] (Thanks, [@JuanitoFatas](https://github.com/JuanitoFatas)!)
135
+ * Test files removed from gem file [[#180](https://github.com/flavorjones/loofah/issues/180),[#166](https://github.com/flavorjones/loofah/issues/166),[#159](https://github.com/flavorjones/loofah/issues/159)] (Thanks, [@JuanitoFatas](https://github.com/JuanitoFatas) and [@greysteil](https://github.com/greysteil)!)
136
+
137
+
138
+ ## 2.4.0 / 2019-11-25
139
+
140
+ ### Features
141
+
142
+ * Allow CSS property `max-width` [[#175](https://github.com/flavorjones/loofah/issues/175)] (Thanks, [@bchaney](https://github.com/bchaney)!)
143
+ * Allow CSS sizes expressed in `rem` [[#176](https://github.com/flavorjones/loofah/issues/176), [#177](https://github.com/flavorjones/loofah/issues/177)]
144
+ * Add `frozen_string_literal: true` magic comment to all `lib` files. [[#118](https://github.com/flavorjones/loofah/issues/118)]
145
+
146
+
147
+ ## 2.3.1 / 2019-10-22
148
+
149
+ ### Security
150
+
151
+ Address CVE-2019-15587: Unsanitized JavaScript may occur in sanitized output when a crafted SVG element is republished.
152
+
153
+ This CVE's public notice is at [#171](https://github.com/flavorjones/loofah/issues/171)
154
+
155
+
156
+ ## 2.3.0 / 2019-09-28
157
+
158
+ ### Features
159
+
160
+ * Expand set of allowed protocols to include `tel:` and `line:`. [[#104](https://github.com/flavorjones/loofah/issues/104), [#147](https://github.com/flavorjones/loofah/issues/147)]
161
+ * Expand set of allowed CSS functions. [related to [#122](https://github.com/flavorjones/loofah/issues/122)]
162
+ * Allow greater precision in shorthand CSS values. [[#149](https://github.com/flavorjones/loofah/issues/149)] (Thanks, [@danfstucky](https://github.com/danfstucky)!)
163
+ * Allow CSS property `list-style` [[#162](https://github.com/flavorjones/loofah/issues/162)] (Thanks, [@jaredbeck](https://github.com/jaredbeck)!)
164
+ * Allow CSS keywords `thick` and `thin` [[#168](https://github.com/flavorjones/loofah/issues/168)] (Thanks, [@georgeclaghorn](https://github.com/georgeclaghorn)!)
165
+ * Allow HTML property `contenteditable` [[#167](https://github.com/flavorjones/loofah/issues/167)] (Thanks, [@andreynering](https://github.com/andreynering)!)
166
+
167
+
168
+ ### Bug fixes
169
+
170
+ * CSS hex values are no longer limited to lowercase hex. Previously uppercase hex were scrubbed. [[#165](https://github.com/flavorjones/loofah/issues/165)] (Thanks, [@asok](https://github.com/asok)!)
171
+
172
+
173
+ ### Deprecations / Name Changes
174
+
175
+ The following method and constants are hereby deprecated, and will be completely removed in a future release:
176
+
177
+ * Deprecate `Loofah::Helpers::ActionView.white_list_sanitizer`, please use `Loofah::Helpers::ActionView.safe_list_sanitizer` instead.
178
+ * Deprecate `Loofah::Helpers::ActionView::WhiteListSanitizer`, please use `Loofah::Helpers::ActionView::SafeListSanitizer` instead.
179
+ * Deprecate `Loofah::HTML5::WhiteList`, please use `Loofah::HTML5::SafeList` instead.
180
+
181
+ Thanks to [@JuanitoFatas](https://github.com/JuanitoFatas) for submitting these changes in [#164](https://github.com/flavorjones/loofah/issues/164) and for making the language used in Loofah more inclusive.
182
+
183
+
3
184
  ## 2.2.3 / 2018-10-30
4
185
 
5
186
  ### Security
6
187
 
7
188
  Address CVE-2018-16468: Unsanitized JavaScript may occur in sanitized output when a crafted SVG element is republished.
8
189
 
9
- This CVE's public notice is at https://github.com/flavorjones/loofah/issues/154
190
+ This CVE's public notice is at [#154](https://github.com/flavorjones/loofah/issues/154)
10
191
 
11
192
 
12
193
  ## Meta / 2018-10-27
@@ -33,76 +214,76 @@ attribute scrubbers should they need to address CVE-2018-8048.
33
214
 
34
215
  Addresses CVE-2018-8048. Loofah allowed non-whitelisted attributes to be present in sanitized output when input with specially-crafted HTML fragments.
35
216
 
36
- This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
217
+ This CVE's public notice is at [#144](https://github.com/flavorjones/loofah/issues/144)
37
218
 
38
219
 
39
220
  ## 2.2.0 / 2018-02-11
40
221
 
41
222
  ### Features:
42
223
 
43
- * Support HTML5 `<main>` tag. #133 (Thanks, @MothOnMars!)
44
- * Recognize HTML5 block elements. #136 (Thanks, @MothOnMars!)
45
- * Support SVG `<symbol>` tag. #131 (Thanks, @baopham!)
46
- * Support for whitelisting CSS functions, initially just `calc` and `rgb`. #122/#123/#129 (Thanks, @NikoRoberts!)
47
- * Whitelist CSS property `list-style-type`. #68/#137/#142 (Thanks, @andela-ysanni and @NikoRoberts!)
224
+ * Support HTML5 `<main>` tag. [#133](https://github.com/flavorjones/loofah/issues/133) (Thanks, [@MothOnMars](https://github.com/MothOnMars)!)
225
+ * Recognize HTML5 block elements. [#136](https://github.com/flavorjones/loofah/issues/136) (Thanks, [@MothOnMars](https://github.com/MothOnMars)!)
226
+ * Support SVG `<symbol>` tag. [#131](https://github.com/flavorjones/loofah/issues/131) (Thanks, [@baopham](https://github.com/baopham)!)
227
+ * Support for whitelisting CSS functions, initially just `calc` and `rgb`. [#122](https://github.com/flavorjones/loofah/issues/122)/[#123](https://github.com/flavorjones/loofah/issues/123)/[#129](https://github.com/flavorjones/loofah/issues/129) (Thanks, [@NikoRoberts](https://github.com/NikoRoberts)!)
228
+ * Whitelist CSS property `list-style-type`. [#68](https://github.com/flavorjones/loofah/issues/68)/[#137](https://github.com/flavorjones/loofah/issues/137)/[#142](https://github.com/flavorjones/loofah/issues/142) (Thanks, [@andela-ysanni](https://github.com/andela-ysanni) and [@NikoRoberts](https://github.com/NikoRoberts)!)
48
229
 
49
230
  ### Bugfixes:
50
231
 
51
- * Properly handle nested `script` tags. #127.
232
+ * Properly handle nested `script` tags. [#127](https://github.com/flavorjones/loofah/issues/127).
52
233
 
53
234
 
54
235
  ## 2.1.1 / 2017-09-24
55
236
 
56
237
  ### Bugfixes:
57
238
 
58
- * Removed warning for unused variable. #124 (Thanks, @y-yagi!)
239
+ * Removed warning for unused variable. [#124](https://github.com/flavorjones/loofah/issues/124) (Thanks, [@y-yagi](https://github.com/y-yagi)!)
59
240
 
60
241
 
61
242
  ## 2.1.0 / 2017-09-24
62
243
 
63
244
  ### Notes:
64
245
 
65
- * Re-implemented CSS parsing and sanitization using the [crass](https://github.com/rgrove/crass) library. #91
246
+ * Re-implemented CSS parsing and sanitization using the [crass](https://github.com/rgrove/crass) library. [#91](https://github.com/flavorjones/loofah/issues/91)
66
247
 
67
248
 
68
249
  ### Features:
69
250
 
70
- * Added :noopener HTML scrubber (Thanks, @tastycode!)
71
- * Support `data` URIs with the following media types: text/plain, text/css, image/png, image/gif, image/jpeg, image/svg+xml. #101, #120. (Thanks, @mrpasquini!)
251
+ * Added :noopener HTML scrubber (Thanks, [@tastycode](https://github.com/tastycode)!)
252
+ * Support `data` URIs with the following media types: text/plain, text/css, image/png, image/gif, image/jpeg, image/svg+xml. [#101](https://github.com/flavorjones/loofah/issues/101), [#120](https://github.com/flavorjones/loofah/issues/120). (Thanks, [@mrpasquini](https://github.com/mrpasquini)!)
72
253
 
73
254
 
74
255
  ### Bugfixes:
75
256
 
76
- * The :unprintable scrubber now scrubs unprintable characters in CDATA nodes (like `<script>`). #124
77
- * Allow negative values in CSS properties. Restores functionality that was reverted in v2.0.3. #91
257
+ * The :unprintable scrubber now scrubs unprintable characters in CDATA nodes (like `<script>`). [#124](https://github.com/flavorjones/loofah/issues/124)
258
+ * Allow negative values in CSS properties. Restores functionality that was reverted in v2.0.3. [#91](https://github.com/flavorjones/loofah/issues/91)
78
259
 
79
260
 
80
261
  ## 2.0.3 / 2015-08-17
81
262
 
82
263
  ### Bug fixes:
83
264
 
84
- * Revert support for negative values in CSS properties due to slow performance. #90 (Related to #85.)
265
+ * Revert support for negative values in CSS properties due to slow performance. [#90](https://github.com/flavorjones/loofah/issues/90) (Related to [#85](https://github.com/flavorjones/loofah/issues/85).)
85
266
 
86
267
 
87
268
  ## 2.0.2 / 2015-05-05
88
269
 
89
270
  ### Bug fixes:
90
271
 
91
- * Fix error with `#to_text` when Loofah::Helpers hadn't been required. #75
92
- * Allow multi-word data attributes. #84 (Thanks, @jstorimer!)
93
- * Allow negative values in CSS properties. #85 (Thanks, @siddhartham!)
272
+ * Fix error with `#to_text` when Loofah::Helpers hadn't been required. [#75](https://github.com/flavorjones/loofah/issues/75)
273
+ * Allow multi-word data attributes. [#84](https://github.com/flavorjones/loofah/issues/84) (Thanks, [@jstorimer](https://github.com/jstorimer)!)
274
+ * Allow negative values in CSS properties. [#85](https://github.com/flavorjones/loofah/issues/85) (Thanks, [@siddhartham](https://github.com/siddhartham)!)
94
275
 
95
276
 
96
277
  ## 2.0.1 / 2014-08-21
97
278
 
98
279
  ### Bug fixes:
99
280
 
100
- * Load RR correctly when running test files directly. (Thanks, @ktdreyer!)
281
+ * Load RR correctly when running test files directly. (Thanks, [@ktdreyer](https://github.com/ktdreyer)!)
101
282
 
102
283
 
103
284
  ### Notes:
104
285
 
105
- * Extracted HTML5::Scrub#scrub_css_attribute to accommodate the Rails integration work. (Thanks, @kaspth!)
286
+ * Extracted HTML5::Scrub#scrub_css_attribute to accommodate the Rails integration work. (Thanks, [@kaspth](https://github.com/kaspth)!)
106
287
 
107
288
 
108
289
  ## 2.0.0 / 2014-05-09
@@ -118,19 +299,19 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
118
299
  * tags: `article`, `aside`, `bdi`, `bdo`, `canvas`, `command`, `datalist`, `details`, `figcaption`, `figure`, `footer`, `header`, `mark`, `meter`, `nav`, `output`, `section`, `summary`, `time`
119
300
  * attributes: `data-*` (Thanks, Rafael Franca!)
120
301
  * URI attributes: `poster` and `preload`
121
- * Addition of the `:unprintable` scrubber to remove unprintable characters from text nodes. #65 (Thanks, Matt Swanson!)
122
- * `Loofah.fragment` accepts an optional encoding argument, compatible with `Nokogiri::HTML::DocumentFragment.parse`. #62 (Thanks, Ben Atkins!)
302
+ * Addition of the `:unprintable` scrubber to remove unprintable characters from text nodes. [#65](https://github.com/flavorjones/loofah/issues/65) (Thanks, Matt Swanson!)
303
+ * `Loofah.fragment` accepts an optional encoding argument, compatible with `Nokogiri::HTML::DocumentFragment.parse`. [#62](https://github.com/flavorjones/loofah/issues/62) (Thanks, Ben Atkins!)
123
304
  * HTML5 sanitizers now remove attributes without values. (Thanks, Kasper Timm Hansen!)
124
305
 
125
306
  ### Bug fixes:
126
307
 
127
308
  * HTML5 sanitizers' CSS keyword check now actually works (broken in v2.0). Additional regression tests added. (Thanks, Kasper Timm Hansen!)
128
- * HTML5 sanitizers now allow negative arguments to CSS. #64 (Thanks, Jon Calhoun!)
309
+ * HTML5 sanitizers now allow negative arguments to CSS. [#64](https://github.com/flavorjones/loofah/issues/64) (Thanks, Jon Calhoun!)
129
310
 
130
311
 
131
312
  ## 1.2.1 (2012-04-14)
132
313
 
133
- * Declaring encoding in html5/scrub.rb. Without this, use of the ruby -KU option would cause havoc. (#32)
314
+ * Declaring encoding in html5/scrub.rb. Without this, use of the ruby -KU option would cause havoc. ([#32](https://github.com/flavorjones/loofah/issues/32))
134
315
 
135
316
 
136
317
  ## 1.2.0 (2011-08-08)
@@ -148,7 +329,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
148
329
  * Additional HTML5lib whitelist elements (from html5lib 1524:80b5efe26230).
149
330
  Up to date with HTML5lib ruby code as of 1723:7ee6a0331856.
150
331
  * Whitelists (which are not part of the public API) are now Sets (were previously Arrays).
151
- * Don't explode when encountering UTF-8 URIs. (#25, #29)
332
+ * Don't explode when encountering UTF-8 URIs. ([#25](https://github.com/flavorjones/loofah/issues/25), [#29](https://github.com/flavorjones/loofah/issues/29))
152
333
 
153
334
 
154
335
  ## 1.0.0 (2010-10-26)
@@ -166,7 +347,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
166
347
  * New methods Loofah::HTML::Document#to_text and
167
348
  Loofah::HTML::DocumentFragment#to_text do the right thing with
168
349
  whitespace. Note that these methods are significantly slower than
169
- #text. GH #12
350
+ #text. GH [#12](https://github.com/flavorjones/loofah/issues/12)
170
351
  * Loofah::Elements::BLOCK_LEVEL contains a canonical list of HTML4 block-level4 elements.
171
352
  * Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text
172
353
  will return unescaped HTML entities by passing :encode_special_chars => false.
@@ -180,7 +361,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
180
361
 
181
362
  ### Bug fixes:
182
363
 
183
- * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH #17
364
+ * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH [#17](https://github.com/flavorjones/loofah/issues/17)
184
365
 
185
366
 
186
367
  ## 0.4.3 (2010-01-29)
@@ -208,7 +389,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
208
389
 
209
390
  ### Bug fixes:
210
391
 
211
- * Supporting Rails apps that aren't loading ActiveRecord. GH #10
392
+ * Supporting Rails apps that aren't loading ActiveRecord. GH [#10](https://github.com/flavorjones/loofah/issues/10)
212
393
 
213
394
  ### Miscellaneous:
214
395
 
@@ -269,13 +450,13 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
269
450
  ### Enhancements:
270
451
 
271
452
  * when loaded in a Rails app, automatically extend ActiveRecord::Base
272
- with html_fragment and html_document. GH #6 (Thanks Josh Nichols!)
453
+ with html_fragment and html_document. GH [#6](https://github.com/flavorjones/loofah/issues/6) (Thanks Josh Nichols!)
273
454
 
274
455
  ### Bugfixes:
275
456
 
276
457
  * ActiveRecord scrubbing should generate strings instead of Document or
277
- DocumentFragment objects. GH #5
278
- * init.rb fixed to support installation as a Rails plugin. GH #6
458
+ DocumentFragment objects. GH [#5](https://github.com/flavorjones/loofah/issues/5)
459
+ * init.rb fixed to support installation as a Rails plugin. GH [#6](https://github.com/flavorjones/loofah/issues/6)
279
460
  (Thanks Josh Nichols!)
280
461
 
281
462
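The 2.3.0 entry in this changelog deprecates `Loofah::HTML5::WhiteList` in favor of `Loofah::HTML5::SafeList`. The sketch below shows one general way a renamed constant can stay reachable under its old name in plain Ruby, using `Module#deprecate_constant`. It is an illustration under stated assumptions, not Loofah's actual implementation: the module `SanitizerDemo` and its contents are hypothetical.

```ruby
require "set"

# Hypothetical module for illustration; not part of Loofah.
module SanitizerDemo
  # New, preferred name.
  SafeList = Set.new(%w[a b em strong]).freeze

  # Old name kept as an alias so existing callers keep working.
  WhiteList = SafeList

  # Mark the old name as deprecated. Ruby emits a warning on access
  # when deprecation warnings are enabled (for example, with `ruby -w`).
  deprecate_constant :WhiteList
end

# Callers migrate by switching to the new constant name.
SanitizerDemo::SafeList.include?("em")
```

Because both names point at the same object, code that still references the old constant sees identical behavior until the alias is removed.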
 
data/README.md CHANGED
@@ -1,36 +1,27 @@
1
1
  # Loofah
2
2
 
3
3
  * https://github.com/flavorjones/loofah
4
- * Docs: http://rubydoc.info/github/flavorjones/loofah/master/frames
4
+ * Docs: http://rubydoc.info/github/flavorjones/loofah/main/frames
5
5
  * Mailing list: [loofah-talk@googlegroups.com](https://groups.google.com/forum/#!forum/loofah-talk)
6
6
 
7
7
  ## Status
8
8
 
9
- |System|Status|
10
- |--|--|
11
- | Concourse | [![Concourse CI](https://ci.nokogiri.org/api/v1/teams/nokogiri-core/pipelines/loofah/jobs/ruby-2.5/badge)](https://ci.nokogiri.org/teams/nokogiri-core/pipelines/loofah?groups=master) |
12
- | Code Climate | [![Code Climate](https://codeclimate.com/github/flavorjones/loofah.svg)](https://codeclimate.com/github/flavorjones/loofah) |
13
- | Version Eye | [![Version Eye](https://www.versioneye.com/ruby/loofah/badge.png)](https://www.versioneye.com/ruby/loofah) |
9
+ [![ci](https://github.com/flavorjones/loofah/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/flavorjones/loofah/actions/workflows/ci.yml)
10
+ [![Tidelift dependencies](https://tidelift.com/badges/package/rubygems/loofah)](https://tidelift.com/subscription/pkg/rubygems-loofah?utm_source=rubygems-loofah&utm_medium=referral&utm_campaign=readme)
14
11
 
15
12
 
16
13
  ## Description
17
14
 
18
- Loofah is a general library for manipulating and transforming HTML/XML
19
- documents and fragments. It's built on top of Nokogiri and libxml2, so
20
- it's fast and has a nice API.
15
+ Loofah is a general library for manipulating and transforming HTML/XML documents and fragments, built on top of Nokogiri.
21
16
 
22
- Loofah excels at HTML sanitization (XSS prevention). It includes some
23
- nice HTML sanitizers, which are based on HTML5lib's whitelist, so it
24
- most likely won't make your codes less secure. (These statements have
25
- not been evaluated by Netexperts.)
17
+ Loofah excels at HTML sanitization (XSS prevention). It includes some nice HTML sanitizers, which are based on HTML5lib's safelist, so it most likely won't make your codes less secure. (These statements have not been evaluated by Netexperts.)
26
18
 
27
- ActiveRecord extensions for sanitization are available in the
28
- [`loofah-activerecord` gem](https://github.com/flavorjones/loofah-activerecord).
19
+ ActiveRecord extensions for sanitization are available in the [`loofah-activerecord` gem](https://github.com/flavorjones/loofah-activerecord).
29
20
 
30
21
 
31
22
  ## Features
32
23
 
33
- * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's whitelists).
24
+ * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's safelists).
34
25
  * Common HTML sanitizing tasks are built-in:
35
26
  * _Strip_ unsafe tags, leaving behind only the inner text.
36
27
  * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
@@ -142,13 +133,12 @@ and `text` to return plain text:
142
133
  doc.text # => "ohai! div is safe "
143
134
  ```
144
135
 
145
- Also, `to_text` is available, which does the right thing with
146
- whitespace around block-level elements.
136
+ Also, `to_text` is available, which does the right thing with whitespace around block-level and line break elements.
147
137
 
148
138
  ``` ruby
149
- doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
150
- doc.text # => "TitleContent" # probably not what you want
151
- doc.to_text # => "\nTitle\n\nContent\n" # better
139
+ doc = Loofah.fragment("<h1>Title</h1><div>Content<br>Next line</div>")
140
+ doc.text # => "TitleContentNext line" # probably not what you want
141
+ doc.to_text # => "\nTitle\n\nContent\nNext line\n" # better
152
142
  ```
153
143
 
154
144
  ### Loofah::XML::Document and Loofah::XML::DocumentFragment
@@ -219,10 +209,10 @@ end
219
209
  Loofah.xml_document(File.read('plague.xml')).scrub!(bring_out_your_dead)
220
210
  ```
221
211
 
222
- === Built-In HTML Scrubbers
212
+ ### Built-In HTML Scrubbers
223
213
 
224
214
  Loofah comes with a set of sanitizing scrubbers that use HTML5lib's
225
- whitelist algorithm:
215
+ safelist algorithm:
226
216
 
227
217
  ``` ruby
228
218
  doc.scrub!(:strip) # replaces unknown/unsafe tags with their inner text
@@ -308,6 +298,10 @@ And the mailing list is on Google Groups:
308
298
 
309
299
  And the IRC channel is \#loofah on freenode.
310
300
 
301
+ Consider subscribing to [Tidelift][tidelift] which provides license assurances and timely security notifications for your open source dependencies, including Loofah. [Tidelift][tidelift] subscriptions also help the Loofah maintainers fund our [automated testing](https://ci.nokogiri.org) which in turn allows us to ship releases, bugfixes, and security updates more often.
302
+
303
+ [tidelift]: https://tidelift.com/subscription/pkg/rubygems-loofah?utm_source=undefined&utm_medium=referral&utm_campaign=enterprise
304
+
311
305
 
312
306
  ## Security
313
307
 
@@ -354,7 +348,7 @@ And a big shout-out to Corey Innis for the name, and feedback on the API.
354
348
 
355
349
  ## Thank You
356
350
 
357
- The following people have generously donated via the [Pledgie](http://pledgie.com) badge on the [Loofah github page](https://github.com/flavorjones/loofah):
351
+ The following people have generously funded Loofah:
358
352
 
359
353
  * Bill Harding
360
354
 
data/lib/loofah/elements.rb CHANGED
@@ -1,91 +1,95 @@
1
- require 'set'
1
+ # frozen_string_literal: true
2
+ require "set"
2
3
 
3
4
  module Loofah
4
5
  module Elements
5
6
  STRICT_BLOCK_LEVEL_HTML4 = Set.new %w[
6
- address
7
- blockquote
8
- center
9
- dir
10
- div
11
- dl
12
- fieldset
13
- form
14
- h1
15
- h2
16
- h3
17
- h4
18
- h5
19
- h6
20
- hr
21
- isindex
22
- menu
23
- noframes
24
- noscript
25
- ol
26
- p
27
- pre
28
- table
29
- ul
30
- ]
7
+ address
8
+ blockquote
9
+ center
10
+ dir
11
+ div
12
+ dl
13
+ fieldset
14
+ form
15
+ h1
16
+ h2
17
+ h3
18
+ h4
19
+ h5
20
+ h6
21
+ hr
22
+ isindex
23
+ menu
24
+ noframes
25
+ noscript
26
+ ol
27
+ p
28
+ pre
29
+ table
30
+ ul
31
+ ]
31
32
 
32
33
  # https://developer.mozilla.org/en-US/docs/Web/HTML/Block-level_elements
33
34
  STRICT_BLOCK_LEVEL_HTML5 = Set.new %w[
34
- address
35
- article
36
- aside
37
- blockquote
38
- canvas
39
- dd
40
- div
41
- dl
42
- dt
43
- fieldset
44
- figcaption
45
- figure
46
- footer
47
- form
48
- h1
49
- h2
50
- h3
51
- h4
52
- h5
53
- h6
54
- header
55
- hgroup
56
- hr
57
- li
58
- main
59
- nav
60
- noscript
61
- ol
62
- output
63
- p
64
- pre
65
- section
66
- table
67
- tfoot
68
- ul
69
- video
70
- ]
71
-
72
- STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5
35
+ address
36
+ article
37
+ aside
38
+ blockquote
39
+ canvas
40
+ dd
41
+ div
42
+ dl
43
+ dt
44
+ fieldset
45
+ figcaption
46
+ figure
47
+ footer
48
+ form
49
+ h1
50
+ h2
51
+ h3
52
+ h4
53
+ h5
54
+ h6
55
+ header
56
+ hgroup
57
+ hr
58
+ li
59
+ main
60
+ nav
61
+ noscript
62
+ ol
63
+ output
64
+ p
65
+ pre
66
+ section
67
+ table
68
+ tfoot
69
+ ul
70
+ video
71
+ ]
73
72
 
74
73
  # The following elements may also be considered block-level
75
74
  # elements since they may contain block-level elements
76
75
  LOOSE_BLOCK_LEVEL = Set.new %w[dd
77
- dt
78
- frameset
79
- li
80
- tbody
81
- td
82
- tfoot
83
- th
84
- thead
85
- tr
86
- ]
76
+ dt
77
+ frameset
78
+ li
79
+ tbody
80
+ td
81
+ tfoot
82
+ th
83
+ thead
84
+ tr
85
+ ]
87
86
 
87
+ # Elements that aren't block but should generate a newline in #to_text
88
+ INLINE_LINE_BREAK = Set.new(["br"])
89
+
90
+ STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5
88
91
  BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
92
+ LINEBREAKERS = BLOCK_LEVEL + INLINE_LINE_BREAK
89
93
  end
90
94
 
91
95
  ::Loofah::MetaHelpers.add_downcased_set_members_to_all_set_constants ::Loofah::Elements
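The diff above builds `LINEBREAKERS` by combining the block-level sets with the new `INLINE_LINE_BREAK` set. A minimal sketch of that composition, using abbreviated stand-in sets (the module `ElementsSketch` and its shortened lists are hypothetical; the real constants contain many more tags), shows how `Set#+` merges the sets and collapses duplicates:

```ruby
require "set"

# Hypothetical, abbreviated stand-ins for the constants in the diff.
module ElementsSketch
  STRICT_BLOCK_LEVEL_HTML4 = Set.new(%w[address div p table])
  STRICT_BLOCK_LEVEL_HTML5 = Set.new(%w[address article div main p])
  LOOSE_BLOCK_LEVEL        = Set.new(%w[dd dt li td])

  # Not a block element, but should still produce a newline in text output.
  INLINE_LINE_BREAK = Set.new(["br"])

  # Set#+ is an alias for Set#|, so tags present in both sets appear once.
  STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5
  BLOCK_LEVEL        = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
  LINEBREAKERS       = BLOCK_LEVEL + INLINE_LINE_BREAK
end

puts ElementsSketch::BLOCK_LEVEL.include?("br")   # false
puts ElementsSketch::LINEBREAKERS.include?("br")  # true
```

Keeping `br` in its own set means `BLOCK_LEVEL` retains its meaning for callers that rely on it, while code that only cares about line breaks can consult `LINEBREAKERS`.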