loofah 2.2.3 → 2.9.0

This version of loofah might be problematic.

Files changed (42)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +124 -31
  3. data/README.md +12 -16
  4. data/lib/loofah.rb +35 -18
  5. data/lib/loofah/elements.rb +74 -73
  6. data/lib/loofah/helpers.rb +18 -7
  7. data/lib/loofah/html/document.rb +1 -0
  8. data/lib/loofah/html/document_fragment.rb +4 -2
  9. data/lib/loofah/html5/libxml2_workarounds.rb +8 -7
  10. data/lib/loofah/html5/safelist.rb +819 -0
  11. data/lib/loofah/html5/scrub.rb +63 -46
  12. data/lib/loofah/instance_methods.rb +5 -3
  13. data/lib/loofah/metahelpers.rb +2 -1
  14. data/lib/loofah/scrubber.rb +8 -7
  15. data/lib/loofah/scrubbers.rb +12 -11
  16. data/lib/loofah/version.rb +5 -0
  17. data/lib/loofah/xml/document.rb +1 -0
  18. data/lib/loofah/xml/document_fragment.rb +2 -1
  19. metadata +40 -112
  20. data/.gemtest +0 -0
  21. data/Gemfile +0 -22
  22. data/Manifest.txt +0 -40
  23. data/Rakefile +0 -79
  24. data/benchmark/benchmark.rb +0 -149
  25. data/benchmark/fragment.html +0 -96
  26. data/benchmark/helper.rb +0 -73
  27. data/benchmark/www.slashdot.com.html +0 -2560
  28. data/lib/loofah/html5/whitelist.rb +0 -186
  29. data/test/assets/msword.html +0 -63
  30. data/test/assets/testdata_sanitizer_tests1.dat +0 -502
  31. data/test/helper.rb +0 -18
  32. data/test/html5/test_sanitizer.rb +0 -382
  33. data/test/integration/test_ad_hoc.rb +0 -204
  34. data/test/integration/test_helpers.rb +0 -43
  35. data/test/integration/test_html.rb +0 -72
  36. data/test/integration/test_scrubbers.rb +0 -400
  37. data/test/integration/test_xml.rb +0 -55
  38. data/test/unit/test_api.rb +0 -142
  39. data/test/unit/test_encoding.rb +0 -20
  40. data/test/unit/test_helpers.rb +0 -62
  41. data/test/unit/test_scrubber.rb +0 -229
  42. data/test/unit/test_scrubbers.rb +0 -14
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: c22c1a749ff878b96f0c4a53e789834fa8072775c5abdccb68c388d6218b1bce
-   data.tar.gz: e8d00e6ff5d623b3f3d03ce83ee780a88e92138fcb71efff28194f8a7d87e5fc
+   metadata.gz: 10f6e8ff06a760da3400cdf8660e6768cfc2e7bbcb34a3ae6aaadea5e29ff924
+   data.tar.gz: 7e0accdb26147612bd7da3abc8fa98a6fac850dbb7a0ee99d20375400de4b877
  SHA512:
-   metadata.gz: 0d5a0160010d61a51dad8e31bc644e03454311b99b1d71c6eaea5458cfaaa228671b82db52cf2369b42c48b636b912ca0d812191ac886a5c1499c44fc5221239
-   data.tar.gz: ac479e283ef08b0df14938ec577a3aa4008d07ba3288232541928794cd0b9fe2512da88ac7fd2d123666dcad67d09c1a07307442610f61adbfd65f143ae339b5
+   metadata.gz: 11b0f4dcad5a9f38444e9eebd45cb09705e536468c901c03d792711133536812f8b0579533eb54a305311d5303fdd4cf510761a9a0d42d0af46bb153d3402a3c
+   data.tar.gz: d6032694eaaaddd47c02868ecb037dc2673b5ebd749a7d8846c2a55e13744f9455a66b41ec16a4cf3c4905e6019df3493972ec462cd04404365ebe202e15e211
data/CHANGELOG.md CHANGED
@@ -1,12 +1,105 @@
  # Changelog

+ ### 2.9.0 / 2021-01-14
+
+ * Handle CSS functions in a CSS shorthand property (like `background`). [[#199](https://github.com/flavorjones/loofah/issues/199), [#200](https://github.com/flavorjones/loofah/issues/200)]
+
+
+ ### 2.8.0 / 2020-11-25
+
+ * Allow CSS properties `order`, `flex-direction`, `flex-grow`, `flex-wrap`, `flex-shrink`, `flex-flow`, `flex-basis`, `flex`, `justify-content`, `align-self`, `align-items`, and `align-content`. [[#197](https://github.com/flavorjones/loofah/issues/197)] (Thanks, [@miguelperez](https://github.com/miguelperez)!)
+
+
+ ## 2.7.0 / 2020-08-26
+
+ ### Features
+
+ * Allow CSS properties `page-break-before`, `page-break-inside`, and `page-break-after`. [[#190](https://github.com/flavorjones/loofah/issues/190)] (Thanks, [@ahorek](https://github.com/ahorek)!)
+
+
+ ### Fixes
+
+ * Don't drop the `!important` rule from some CSS properties. [[#191](https://github.com/flavorjones/loofah/issues/191)] (Thanks, [@b7kich](https://github.com/b7kich)!)
+
+
+ ## 2.6.0 / 2020-06-16
+
+ ### Features
+
+ * Allow CSS `border-style` keywords. [[#188](https://github.com/flavorjones/loofah/issues/188)] (Thanks, [@tarcisiozf](https://github.com/tarcisiozf)!)
+
+
+ ## 2.5.0 / 2020-04-05
+
+ ### Features
+
+ * Allow more CSS length units: "ch", "vw", "vh", "Q", "lh", "vmin", "vmax". [[#178](https://github.com/flavorjones/loofah/issues/178)] (Thanks, [@JuanitoFatas](https://github.com/JuanitoFatas)!)
+
+
+ ### Fixes
+
+ * Remove comments from `Loofah::HTML::Document`s that exist outside the `html` element. [[#80](https://github.com/flavorjones/loofah/issues/80)]
+
+
+ ### Other changes
+
+ * Gem metadata being set [[#181](https://github.com/flavorjones/loofah/issues/181)] (Thanks, [@JuanitoFatas](https://github.com/JuanitoFatas)!)
+ * Test files removed from gem file [[#180](https://github.com/flavorjones/loofah/issues/180),[#166](https://github.com/flavorjones/loofah/issues/166),[#159](https://github.com/flavorjones/loofah/issues/159)] (Thanks, [@JuanitoFatas](https://github.com/JuanitoFatas) and [@greysteil](https://github.com/greysteil)!)
+
+
+ ## 2.4.0 / 2019-11-25
+
+ ### Features
+
+ * Allow CSS property `max-width` [[#175](https://github.com/flavorjones/loofah/issues/175)] (Thanks, [@bchaney](https://github.com/bchaney)!)
+ * Allow CSS sizes expressed in `rem` [[#176](https://github.com/flavorjones/loofah/issues/176), [#177](https://github.com/flavorjones/loofah/issues/177)]
+ * Add `frozen_string_literal: true` magic comment to all `lib` files. [[#118](https://github.com/flavorjones/loofah/issues/118)]
+
+
+ ## 2.3.1 / 2019-10-22
+
+ ### Security
+
+ Address CVE-2019-15587: Unsanitized JavaScript may occur in sanitized output when a crafted SVG element is republished.
+
+ This CVE's public notice is at [#171](https://github.com/flavorjones/loofah/issues/171)
+
+
+ ## 2.3.0 / 2019-09-28
+
+ ### Features
+
+ * Expand set of allowed protocols to include `tel:` and `line:`. [[#104](https://github.com/flavorjones/loofah/issues/104), [#147](https://github.com/flavorjones/loofah/issues/147)]
+ * Expand set of allowed CSS functions. [related to [#122](https://github.com/flavorjones/loofah/issues/122)]
+ * Allow greater precision in shorthand CSS values. [[#149](https://github.com/flavorjones/loofah/issues/149)] (Thanks, [@danfstucky](https://github.com/danfstucky)!)
+ * Allow CSS property `list-style` [[#162](https://github.com/flavorjones/loofah/issues/162)] (Thanks, [@jaredbeck](https://github.com/jaredbeck)!)
+ * Allow CSS keywords `thick` and `thin` [[#168](https://github.com/flavorjones/loofah/issues/168)] (Thanks, [@georgeclaghorn](https://github.com/georgeclaghorn)!)
+ * Allow HTML property `contenteditable` [[#167](https://github.com/flavorjones/loofah/issues/167)] (Thanks, [@andreynering](https://github.com/andreynering)!)
+
+
+ ### Bug fixes
+
+ * CSS hex values are no longer limited to lowercase hex. Previously uppercase hex were scrubbed. [[#165](https://github.com/flavorjones/loofah/issues/165)] (Thanks, [@asok](https://github.com/asok)!)
+
+
+ ### Deprecations / Name Changes
+
+ The following method and constants are hereby deprecated, and will be completely removed in a future release:
+
+ * Deprecate `Loofah::Helpers::ActionView.white_list_sanitizer`, please use `Loofah::Helpers::ActionView.safe_list_sanitizer` instead.
+ * Deprecate `Loofah::Helpers::ActionView::WhiteListSanitizer`, please use `Loofah::Helpers::ActionView::SafeListSanitizer` instead.
+ * Deprecate `Loofah::HTML5::WhiteList`, please use `Loofah::HTML5::SafeList` instead.
+
+ Thanks to [@JuanitoFatas](https://github.com/JuanitoFatas) for submitting these changes in [#164](https://github.com/flavorjones/loofah/issues/164) and for making the language used in Loofah more inclusive.
+
+
  ## 2.2.3 / 2018-10-30

  ### Security

  Address CVE-2018-16468: Unsanitized JavaScript may occur in sanitized output when a crafted SVG element is republished.

- This CVE's public notice is at https://github.com/flavorjones/loofah/issues/154
+ This CVE's public notice is at [#154](https://github.com/flavorjones/loofah/issues/154)


  ## Meta / 2018-10-27
@@ -33,76 +126,76 @@ attribute scrubbers should they need to address CVE-2018-8048.

  Addresses CVE-2018-8048. Loofah allowed non-whitelisted attributes to be present in sanitized output when input with specially-crafted HTML fragments.

- This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
+ This CVE's public notice is at [#144](https://github.com/flavorjones/loofah/issues/144)


  ## 2.2.0 / 2018-02-11

  ### Features:

- * Support HTML5 `<main>` tag. #133 (Thanks, @MothOnMars!)
- * Recognize HTML5 block elements. #136 (Thanks, @MothOnMars!)
- * Support SVG `<symbol>` tag. #131 (Thanks, @baopham!)
- * Support for whitelisting CSS functions, initially just `calc` and `rgb`. #122/#123/#129 (Thanks, @NikoRoberts!)
- * Whitelist CSS property `list-style-type`. #68/#137/#142 (Thanks, @andela-ysanni and @NikoRoberts!)
+ * Support HTML5 `<main>` tag. [#133](https://github.com/flavorjones/loofah/issues/133) (Thanks, [@MothOnMars](https://github.com/MothOnMars)!)
+ * Recognize HTML5 block elements. [#136](https://github.com/flavorjones/loofah/issues/136) (Thanks, [@MothOnMars](https://github.com/MothOnMars)!)
+ * Support SVG `<symbol>` tag. [#131](https://github.com/flavorjones/loofah/issues/131) (Thanks, [@baopham](https://github.com/baopham)!)
+ * Support for whitelisting CSS functions, initially just `calc` and `rgb`. [#122](https://github.com/flavorjones/loofah/issues/122)/[#123](https://github.com/flavorjones/loofah/issues/123)/[#129](https://github.com/flavorjones/loofah/issues/129) (Thanks, [@NikoRoberts](https://github.com/NikoRoberts)!)
+ * Whitelist CSS property `list-style-type`. [#68](https://github.com/flavorjones/loofah/issues/68)/[#137](https://github.com/flavorjones/loofah/issues/137)/[#142](https://github.com/flavorjones/loofah/issues/142) (Thanks, [@andela-ysanni](https://github.com/andela-ysanni) and [@NikoRoberts](https://github.com/NikoRoberts)!)

  ### Bugfixes:

- * Properly handle nested `script` tags. #127.
+ * Properly handle nested `script` tags. [#127](https://github.com/flavorjones/loofah/issues/127).


  ## 2.1.1 / 2017-09-24

  ### Bugfixes:

- * Removed warning for unused variable. #124 (Thanks, @y-yagi!)
+ * Removed warning for unused variable. [#124](https://github.com/flavorjones/loofah/issues/124) (Thanks, [@y-yagi](https://github.com/y-yagi)!)


  ## 2.1.0 / 2017-09-24

  ### Notes:

- * Re-implemented CSS parsing and sanitization using the [crass](https://github.com/rgrove/crass) library. #91
+ * Re-implemented CSS parsing and sanitization using the [crass](https://github.com/rgrove/crass) library. [#91](https://github.com/flavorjones/loofah/issues/91)


  ### Features:

- * Added :noopener HTML scrubber (Thanks, @tastycode!)
- * Support `data` URIs with the following media types: text/plain, text/css, image/png, image/gif, image/jpeg, image/svg+xml. #101, #120. (Thanks, @mrpasquini!)
+ * Added :noopener HTML scrubber (Thanks, [@tastycode](https://github.com/tastycode)!)
+ * Support `data` URIs with the following media types: text/plain, text/css, image/png, image/gif, image/jpeg, image/svg+xml. [#101](https://github.com/flavorjones/loofah/issues/101), [#120](https://github.com/flavorjones/loofah/issues/120). (Thanks, [@mrpasquini](https://github.com/mrpasquini)!)


  ### Bugfixes:

- * The :unprintable scrubber now scrubs unprintable characters in CDATA nodes (like `<script>`). #124
- * Allow negative values in CSS properties. Restores functionality that was reverted in v2.0.3. #91
+ * The :unprintable scrubber now scrubs unprintable characters in CDATA nodes (like `<script>`). [#124](https://github.com/flavorjones/loofah/issues/124)
+ * Allow negative values in CSS properties. Restores functionality that was reverted in v2.0.3. [#91](https://github.com/flavorjones/loofah/issues/91)


  ## 2.0.3 / 2015-08-17

  ### Bug fixes:

- * Revert support for negative values in CSS properties due to slow performance. #90 (Related to #85.)
+ * Revert support for negative values in CSS properties due to slow performance. [#90](https://github.com/flavorjones/loofah/issues/90) (Related to [#85](https://github.com/flavorjones/loofah/issues/85).)


  ## 2.0.2 / 2015-05-05

  ### Bug fixes:

- * Fix error with `#to_text` when Loofah::Helpers hadn't been required. #75
- * Allow multi-word data attributes. #84 (Thanks, @jstorimer!)
- * Allow negative values in CSS properties. #85 (Thanks, @siddhartham!)
+ * Fix error with `#to_text` when Loofah::Helpers hadn't been required. [#75](https://github.com/flavorjones/loofah/issues/75)
+ * Allow multi-word data attributes. [#84](https://github.com/flavorjones/loofah/issues/84) (Thanks, [@jstorimer](https://github.com/jstorimer)!)
+ * Allow negative values in CSS properties. [#85](https://github.com/flavorjones/loofah/issues/85) (Thanks, [@siddhartham](https://github.com/siddhartham)!)


  ## 2.0.1 / 2014-08-21

  ### Bug fixes:

- * Load RR correctly when running test files directly. (Thanks, @ktdreyer!)
+ * Load RR correctly when running test files directly. (Thanks, [@ktdreyer](https://github.com/ktdreyer)!)


  ### Notes:

- * Extracted HTML5::Scrub#scrub_css_attribute to accommodate the Rails integration work. (Thanks, @kaspth!)
+ * Extracted HTML5::Scrub#scrub_css_attribute to accommodate the Rails integration work. (Thanks, [@kaspth](https://github.com/kaspth)!)


  ## 2.0.0 / 2014-05-09
@@ -118,19 +211,19 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
  * tags: `article`, `aside`, `bdi`, `bdo`, `canvas`, `command`, `datalist`, `details`, `figcaption`, `figure`, `footer`, `header`, `mark`, `meter`, `nav`, `output`, `section`, `summary`, `time`
  * attributes: `data-*` (Thanks, Rafael Franca!)
  * URI attributes: `poster` and `preload`
- * Addition of the `:unprintable` scrubber to remove unprintable characters from text nodes. #65 (Thanks, Matt Swanson!)
- * `Loofah.fragment` accepts an optional encoding argument, compatible with `Nokogiri::HTML::DocumentFragment.parse`. #62 (Thanks, Ben Atkins!)
+ * Addition of the `:unprintable` scrubber to remove unprintable characters from text nodes. [#65](https://github.com/flavorjones/loofah/issues/65) (Thanks, Matt Swanson!)
+ * `Loofah.fragment` accepts an optional encoding argument, compatible with `Nokogiri::HTML::DocumentFragment.parse`. [#62](https://github.com/flavorjones/loofah/issues/62) (Thanks, Ben Atkins!)
  * HTML5 sanitizers now remove attributes without values. (Thanks, Kasper Timm Hansen!)

  ### Bug fixes:

  * HTML5 sanitizers' CSS keyword check now actually works (broken in v2.0). Additional regression tests added. (Thanks, Kasper Timm Hansen!)
- * HTML5 sanitizers now allow negative arguments to CSS. #64 (Thanks, Jon Calhoun!)
+ * HTML5 sanitizers now allow negative arguments to CSS. [#64](https://github.com/flavorjones/loofah/issues/64) (Thanks, Jon Calhoun!)


  ## 1.2.1 (2012-04-14)

- * Declaring encoding in html5/scrub.rb. Without this, use of the ruby -KU option would cause havoc. (#32)
+ * Declaring encoding in html5/scrub.rb. Without this, use of the ruby -KU option would cause havoc. ([#32](https://github.com/flavorjones/loofah/issues/32))


  ## 1.2.0 (2011-08-08)
@@ -148,7 +241,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
  * Additional HTML5lib whitelist elements (from html5lib 1524:80b5efe26230).
  Up to date with HTML5lib ruby code as of 1723:7ee6a0331856.
  * Whitelists (which are not part of the public API) are now Sets (were previously Arrays).
- * Don't explode when encountering UTF-8 URIs. (#25, #29)
+ * Don't explode when encountering UTF-8 URIs. ([#25](https://github.com/flavorjones/loofah/issues/25), [#29](https://github.com/flavorjones/loofah/issues/29))


  ## 1.0.0 (2010-10-26)
@@ -166,7 +259,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
  * New methods Loofah::HTML::Document#to_text and
  Loofah::HTML::DocumentFragment#to_text do the right thing with
  whitespace. Note that these methods are significantly slower than
- #text. GH #12
+ #text. GH [#12](https://github.com/flavorjones/loofah/issues/12)
  * Loofah::Elements::BLOCK_LEVEL contains a canonical list of HTML4 block-level4 elements.
  * Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text
  will return unescaped HTML entities by passing :encode_special_chars => false.
@@ -180,7 +273,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144

  ### Bug fixes:

- * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH #17
+ * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH [#17](https://github.com/flavorjones/loofah/issues/17)


  ## 0.4.3 (2010-01-29)
@@ -208,7 +301,7 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144

  ### Bug fixes:

- * Supporting Rails apps that aren't loading ActiveRecord. GH #10
+ * Supporting Rails apps that aren't loading ActiveRecord. GH [#10](https://github.com/flavorjones/loofah/issues/10)

  ### Miscellaneous:

@@ -269,13 +362,13 @@ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
  ### Enhancements:

  * when loaded in a Rails app, automatically extend ActiveRecord::Base
- with html_fragment and html_document. GH #6 (Thanks Josh Nichols!)
+ with html_fragment and html_document. GH [#6](https://github.com/flavorjones/loofah/issues/6) (Thanks Josh Nichols!)

  ### Bugfixes:

  * ActiveRecord scrubbing should generate strings instead of Document or
- DocumentFragment objects. GH #5
- * init.rb fixed to support installation as a Rails plugin. GH #6
+ DocumentFragment objects. GH [#5](https://github.com/flavorjones/loofah/issues/5)
+ * init.rb fixed to support installation as a Rails plugin. GH [#6](https://github.com/flavorjones/loofah/issues/6)
  (Thanks Josh Nichols!)
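
The 2.3.0 entry in the changelog diff above deprecates the `WhiteList` names in favour of `SafeList`, but this part of the diff does not show how the old names are kept working. As a rough illustration only, one common Ruby pattern for such a rename is to alias the old constant to the new one and mark it with `Module#deprecate_constant`. The `DemoSanitizer` module and its `ALLOWED_PROTOCOLS` list are hypothetical stand-ins, not Loofah's actual code.

```ruby
# Hypothetical stand-in for a library namespace; this is NOT Loofah's code.
module DemoSanitizer
  module SafeList
    # Illustrative values only
    ALLOWED_PROTOCOLS = %w[http https mailto tel].freeze
  end

  # Keep the old name pointing at the same module so existing callers still work...
  WhiteList = SafeList
  # ...and ask Ruby to warn when the old name is referenced. Whether the
  # warning is printed depends on the Ruby version and warning settings.
  deprecate_constant :WhiteList
end

old_ref = DemoSanitizer::WhiteList
new_ref = DemoSanitizer::SafeList

puts old_ref.equal?(new_ref)                      # both names refer to one module
puts new_ref::ALLOWED_PROTOCOLS.include?("tel")
```

With this pattern callers can migrate at their own pace, and the alias can be deleted in a later release.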
data/README.md CHANGED
@@ -6,31 +6,23 @@

  ## Status

- |System|Status|
- |--|--|
- | Concourse | [![Concourse CI](https://ci.nokogiri.org/api/v1/teams/nokogiri-core/pipelines/loofah/jobs/ruby-2.5/badge)](https://ci.nokogiri.org/teams/nokogiri-core/pipelines/loofah?groups=master) |
- | Code Climate | [![Code Climate](https://codeclimate.com/github/flavorjones/loofah.svg)](https://codeclimate.com/github/flavorjones/loofah) |
- | Version Eye | [![Version Eye](https://www.versioneye.com/ruby/loofah/badge.png)](https://www.versioneye.com/ruby/loofah) |
+ [![Concourse CI](https://ci.nokogiri.org/api/v1/teams/nokogiri-core/pipelines/loofah/jobs/ruby-2.5/badge)](https://ci.nokogiri.org/teams/nokogiri-core/pipelines/loofah?groups=master)
+ [![Code Climate](https://codeclimate.com/github/flavorjones/loofah.svg)](https://codeclimate.com/github/flavorjones/loofah)
+ [![Tidelift dependencies](https://tidelift.com/badges/package/rubygems/loofah)](https://tidelift.com/subscription/pkg/rubygems-loofah?utm_source=rubygems-loofah&utm_medium=referral&utm_campaign=readme)


  ## Description

- Loofah is a general library for manipulating and transforming HTML/XML
- documents and fragments. It's built on top of Nokogiri and libxml2, so
- it's fast and has a nice API.
+ Loofah is a general library for manipulating and transforming HTML/XML documents and fragments, built on top of Nokogiri.

- Loofah excels at HTML sanitization (XSS prevention). It includes some
- nice HTML sanitizers, which are based on HTML5lib's whitelist, so it
- most likely won't make your codes less secure. (These statements have
- not been evaluated by Netexperts.)
+ Loofah excels at HTML sanitization (XSS prevention). It includes some nice HTML sanitizers, which are based on HTML5lib's safelist, so it most likely won't make your codes less secure. (These statements have not been evaluated by Netexperts.)

- ActiveRecord extensions for sanitization are available in the
- [`loofah-activerecord` gem](https://github.com/flavorjones/loofah-activerecord).
+ ActiveRecord extensions for sanitization are available in the [`loofah-activerecord` gem](https://github.com/flavorjones/loofah-activerecord).


  ## Features

- * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's whitelists).
+ * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's safelists).
  * Common HTML sanitizing tasks are built-in:
  * _Strip_ unsafe tags, leaving behind only the inner text.
  * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
@@ -222,7 +214,7 @@ Loofah.xml_document(File.read('plague.xml')).scrub!(bring_out_your_dead)
  === Built-In HTML Scrubbers

  Loofah comes with a set of sanitizing scrubbers that use HTML5lib's
- whitelist algorithm:
+ safelist algorithm:

  ``` ruby
  doc.scrub!(:strip) # replaces unknown/unsafe tags with their inner text
@@ -308,6 +300,10 @@ And the mailing list is on Google Groups:

  And the IRC channel is \#loofah on freenode.

+ Consider subscribing to [Tidelift][tidelift] which provides license assurances and timely security notifications for your open source dependencies, including Loofah. [Tidelift][tidelift] subscriptions also help the Loofah maintainers fund our [automated testing](https://ci.nokogiri.org) which in turn allows us to ship releases, bugfixes, and security updates more often.
+
+ [tidelift]: https://tidelift.com/subscription/pkg/rubygems-loofah?utm_source=undefined&utm_medium=referral&utm_campaign=enterprise
+

  ## Security

data/lib/loofah.rb CHANGED
@@ -1,22 +1,24 @@
+ # frozen_string_literal: true
  $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__))) unless $LOAD_PATH.include?(File.expand_path(File.dirname(__FILE__)))

- require 'nokogiri'
+ require "nokogiri"

- require 'loofah/metahelpers'
- require 'loofah/elements'
+ require_relative "loofah/version"
+ require_relative "loofah/metahelpers"
+ require_relative "loofah/elements"

- require 'loofah/html5/whitelist'
- require 'loofah/html5/libxml2_workarounds'
- require 'loofah/html5/scrub'
+ require_relative "loofah/html5/safelist"
+ require_relative "loofah/html5/libxml2_workarounds"
+ require_relative "loofah/html5/scrub"

- require 'loofah/scrubber'
- require 'loofah/scrubbers'
+ require_relative "loofah/scrubber"
+ require_relative "loofah/scrubbers"

- require 'loofah/instance_methods'
- require 'loofah/xml/document'
- require 'loofah/xml/document_fragment'
- require 'loofah/html/document'
- require 'loofah/html/document_fragment'
+ require_relative "loofah/instance_methods"
+ require_relative "loofah/xml/document"
+ require_relative "loofah/xml/document_fragment"
+ require_relative "loofah/html/document"
+ require_relative "loofah/html/document_fragment"

  # == Strings and IO Objects as Input
  #
@@ -27,14 +29,11 @@ require 'loofah/html/document_fragment'
  # quantities of docs.
  #
  module Loofah
-   # The version of Loofah you are using
-   VERSION = '2.2.3'
-
    class << self
      # Shortcut for Loofah::HTML::Document.parse
      # This method accepts the same parameters as Nokogiri::HTML::Document.parse
      def document(*args, &block)
-       Loofah::HTML::Document.parse(*args, &block)
+       remove_comments_before_html_element Loofah::HTML::Document.parse(*args, &block)
      end

      # Shortcut for Loofah::HTML::DocumentFragment.parse
@@ -77,7 +76,25 @@ module Loofah

      # A helper to remove extraneous whitespace from text-ified HTML
      def remove_extraneous_whitespace(string)
-       string.gsub(/\n\s*\n\s*\n/,"\n\n")
+       string.gsub(/\n\s*\n\s*\n/, "\n\n")
+     end
+
+     private
+
+     # remove comments that exist outside of the HTML element.
+     #
+     # these comments are allowed by the HTML spec:
+     #
+     # https://www.w3.org/TR/html401/struct/global.html#h-7.1
+     #
+     # but are not scrubbed by Loofah because these nodes don't meet
+     # the contract that scrubbers expect of a node (e.g., it can be
+     # replaced, sibling and children nodes can be created).
+     def remove_comments_before_html_element(doc)
+       doc.children.each do |child|
+         child.unlink if child.comment?
+       end
+       doc
      end
    end
  end
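
The `remove_extraneous_whitespace` helper in the hunk above only gains a space after the comma, so its behaviour is unchanged. As a small standalone sketch of what that regular expression does, the method body below is copied from the diff and defined at top level (outside `Loofah`) purely for illustration; the sample string is made up.

```ruby
# Same regular expression as in the diff, wrapped in a standalone method.
# It collapses any run of three or more newlines (optionally separated by
# other whitespace) into exactly two newlines, i.e. a single blank line.
def remove_extraneous_whitespace(string)
  string.gsub(/\n\s*\n\s*\n/, "\n\n")
end

text = "Title\n\n\n  \nBody\n\nFooter"
cleaned = remove_extraneous_whitespace(text)

p cleaned                                  # inspect the collapsed string
p cleaned == "Title\n\nBody\n\nFooter"
```

A run of exactly two newlines is left alone, because the pattern needs at least three newline characters to match.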
data/lib/loofah/elements.rb CHANGED
@@ -1,89 +1,90 @@
- require 'set'
+ # frozen_string_literal: true
+ require "set"

  module Loofah
  module Elements
  STRICT_BLOCK_LEVEL_HTML4 = Set.new %w[
- address
- blockquote
- center
- dir
- div
- dl
- fieldset
- form
- h1
- h2
- h3
- h4
- h5
- h6
- hr
- isindex
- menu
- noframes
- noscript
- ol
- p
- pre
- table
- ul
- ]
+ address
+ blockquote
+ center
+ dir
+ div
+ dl
+ fieldset
+ form
+ h1
+ h2
+ h3
+ h4
+ h5
+ h6
+ hr
+ isindex
+ menu
+ noframes
+ noscript
+ ol
+ p
+ pre
+ table
+ ul
+ ]

  # https://developer.mozilla.org/en-US/docs/Web/HTML/Block-level_elements
  STRICT_BLOCK_LEVEL_HTML5 = Set.new %w[
- address
- article
- aside
- blockquote
- canvas
- dd
- div
- dl
- dt
- fieldset
- figcaption
- figure
- footer
- form
- h1
- h2
- h3
- h4
- h5
- h6
- header
- hgroup
- hr
- li
- main
- nav
- noscript
- ol
- output
- p
- pre
- section
- table
- tfoot
- ul
- video
- ]
+ address
+ article
+ aside
+ blockquote
+ canvas
+ dd
+ div
+ dl
+ dt
+ fieldset
+ figcaption
+ figure
+ footer
+ form
+ h1
+ h2
+ h3
+ h4
+ h5
+ h6
+ header
+ hgroup
+ hr
+ li
+ main
+ nav
+ noscript
+ ol
+ output
+ p
+ pre
+ section
+ table
+ tfoot
+ ul
+ video
+ ]

  STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5

  # The following elements may also be considered block-level
  # elements since they may contain block-level elements
  LOOSE_BLOCK_LEVEL = Set.new %w[dd
- dt
- frameset
- li
- tbody
- td
- tfoot
- th
- thead
- tr
- ]
+ dt
+ frameset
+ li
+ tbody
+ td
+ tfoot
+ th
+ thead
+ tr
+ ]

  BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
  end
  end
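
The constants in `elements.rb` are combined with `Set#+`, which is a union, so element names that appear in more than one list (such as `dd` or `li`) end up in `BLOCK_LEVEL` only once. A minimal sketch of that composition follows, using abbreviated lists inside a hypothetical `DemoElements` module rather than the full sets from the diff.

```ruby
require "set"

# Hypothetical module with shortened lists; the real constants are much longer.
module DemoElements
  STRICT_HTML4 = Set.new %w[address div p ul]
  STRICT_HTML5 = Set.new %w[address article dd div li p ul]
  LOOSE        = Set.new %w[dd li td tr]

  # Set#+ is an alias for Set#| (union): shared members collapse to one entry
  STRICT      = STRICT_HTML4 + STRICT_HTML5
  BLOCK_LEVEL = STRICT + LOOSE
end

puts DemoElements::STRICT.size               # 7 distinct names
puts DemoElements::BLOCK_LEVEL.size          # 9 distinct names
puts DemoElements::BLOCK_LEVEL.include?("td")
puts DemoElements::BLOCK_LEVEL.include?("span")
```

Using a `Set` rather than an `Array` keeps membership checks cheap and makes the overlap between the lists harmless.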