loofah 2.1.1 → 2.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
- SHA1:
3
- metadata.gz: ca2c20c5ce7252fd9d37c4588ce3d505e39480dc
4
- data.tar.gz: eec13222402419857eb4d1cffbde0fcf837b59c3
2
+ SHA256:
3
+ metadata.gz: 521948af26b151c0584b5eabd8e60c8c31ff451d2b134da4bc632256feeb87f4
4
+ data.tar.gz: 9b699d079c84a6c498fcb5be0e56f7c68ad7049bb0aa498e3413343803fcf585
5
5
  SHA512:
6
- metadata.gz: 2d351e77bef7101ba64be53b98e5630597aa2e72af8e2940d10c79f6479325560069b8daff12091fd4f2eb5c562b8bcbfd6bef53501a0c80d42c9850ebfd4f13
7
- data.tar.gz: 8261bcadedc0943141eb21b198553a7151f030d1f381cf9427a9b9985ff0a6c59e65c95141afc4d48e284d0a38f9b2ce53e02d96cfdf3e99475ab5ee222f9e68
6
+ metadata.gz: 7781d0db35620637fd69051e3729db36f4d10712bab60038df78f523d72b991b8e8f86009655495b56ef69d5b97aa5a621cc22698bc4eaec06577bece6841ec6
7
+ data.tar.gz: e42ab470cc2f3fbb5d0c3965b6a60fe698d0d076b3d87d58f6c4fa209531eac82188bef01c8005a94f3caa3f342ae7df4a850a4107fa043b618bdbd9f98c8d86
@@ -1,26 +1,105 @@
1
1
  # Changelog
2
2
 
3
+ ## 2.3.0 / unreleased
4
+
5
+ ### Features
6
+
7
+ * Expand set of allowed protocols to include `tel:` and `line:`. [#104, #147]
8
+ * Expand set of allowed CSS functions. [related to #122]
9
+ * Allow greater precision in shorthand CSS values. [#149] (Thanks, @danfstucky!)
10
+ * Allow CSS property `list-style` [#162] (Thanks, @jaredbeck!)
11
+ * Allow CSS keywords `thick` and `thin` [#168] (Thanks, @georgeclaghorn!)
12
+ * Allow HTML property `contenteditable` [#167] (Thanks, @andreynering!)
13
+
14
+
15
+ ### Bug fixes
16
+
17
+ * CSS hex values are no longer limited to lowercase hex. Previously, uppercase hex values were scrubbed. [#165] (Thanks, @asok!)
18
+
19
+
20
+ ### Deprecations / Name Changes
21
+
22
+ The following method and constants are hereby deprecated, and will be completely removed in a future release:
23
+
24
+ * Deprecate `Loofah::Helpers::ActionView.white_list_sanitizer`, please use `Loofah::Helpers::ActionView.safe_list_sanitizer` instead.
25
+ * Deprecate `Loofah::Helpers::ActionView::WhiteListSanitizer`, please use `Loofah::Helpers::ActionView::SafeListSanitizer` instead.
26
+ * Deprecate `Loofah::HTML5::WhiteList`, please use `Loofah::HTML5::SafeList` instead.
27
+
28
+ Thanks to @JuanitoFatas for submitting these changes in #164 and for making the language used in Loofah more inclusive.
29
+
30
+
31
+ ## 2.2.3 / 2018-10-30
32
+
33
+ ### Security
34
+
35
+ Address CVE-2018-16468: Unsanitized JavaScript may occur in sanitized output when a crafted SVG element is republished.
36
+
37
+ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/154
38
+
39
+
40
+ ## Meta / 2018-10-27
41
+
42
+ The mailing list is now on Google Groups [#146](https://github.com/flavorjones/loofah/issues/146):
43
+
44
+ * Mail: loofah-talk@googlegroups.com
45
+ * Archive: https://groups.google.com/forum/#!forum/loofah-talk
46
+
47
+ This change was made because librelist no longer appears to be maintained.
48
+
49
+
50
+ ## 2.2.2 / 2018-03-22
51
+
52
+ Make public `Loofah::HTML5::Scrub.force_correct_attribute_escaping!`,
53
+ which was previously a private method. This is so that downstream gems
54
+ (like rails-html-sanitizer) can use this logic directly for their own
55
+ attribute scrubbers should they need to address CVE-2018-8048.
56
+
57
+
58
+ ## 2.2.1 / 2018-03-19
59
+
60
+ ### Security
61
+
62
+ Addresses CVE-2018-8048. Loofah allowed non-whitelisted attributes to be present in sanitized output when given specially-crafted HTML fragments as input.
63
+
64
+ This CVE's public notice is at https://github.com/flavorjones/loofah/issues/144
65
+
66
+
67
+ ## 2.2.0 / 2018-02-11
68
+
69
+ ### Features:
70
+
71
+ * Support HTML5 `<main>` tag. #133 (Thanks, @MothOnMars!)
72
+ * Recognize HTML5 block elements. #136 (Thanks, @MothOnMars!)
73
+ * Support SVG `<symbol>` tag. #131 (Thanks, @baopham!)
74
+ * Support for whitelisting CSS functions, initially just `calc` and `rgb`. #122/#123/#129 (Thanks, @NikoRoberts!)
75
+ * Whitelist CSS property `list-style-type`. #68/#137/#142 (Thanks, @andela-ysanni and @NikoRoberts!)
76
+
77
+ ### Bugfixes:
78
+
79
+ * Properly handle nested `script` tags. #127.
80
+
81
+
3
82
  ## 2.1.1 / 2017-09-24
4
83
 
5
- Bugfixes:
84
+ ### Bugfixes:
6
85
 
7
86
  * Removed warning for unused variable. #124 (Thanks, @y-yagi!)
8
87
 
9
88
 
10
89
  ## 2.1.0 / 2017-09-24
11
90
 
12
- Notes:
91
+ ### Notes:
13
92
 
14
- * Re-implemented CSS parsing and sanitization using the {crass}[https://github.com/rgrove/crass] library. #91
93
+ * Re-implemented CSS parsing and sanitization using the [crass](https://github.com/rgrove/crass) library. #91
15
94
 
16
95
 
17
- Features:
96
+ ### Features:
18
97
 
19
98
  * Added :noopener HTML scrubber (Thanks, @tastycode!)
20
99
  * Support `data` URIs with the following media types: text/plain, text/css, image/png, image/gif, image/jpeg, image/svg+xml. #101, #120. (Thanks, @mrpasquini!)
21
100
 
22
101
 
23
- Bugfixes:
102
+ ### Bugfixes:
24
103
 
25
104
  * The :unprintable scrubber now scrubs unprintable characters in CDATA nodes (like `<script>`). #124
26
105
  * Allow negative values in CSS properties. Restores functionality that was reverted in v2.0.3. #91
@@ -28,14 +107,14 @@ Bugfixes:
28
107
 
29
108
  ## 2.0.3 / 2015-08-17
30
109
 
31
- Bug fixes:
110
+ ### Bug fixes:
32
111
 
33
112
  * Revert support for negative values in CSS properties due to slow performance. #90 (Related to #85.)
34
113
 
35
114
 
36
115
  ## 2.0.2 / 2015-05-05
37
116
 
38
- Bug fixes:
117
+ ### Bug fixes:
39
118
 
40
119
  * Fix error with `#to_text` when Loofah::Helpers hadn't been required. #75
41
120
  * Allow multi-word data attributes. #84 (Thanks, @jstorimer!)
@@ -44,24 +123,24 @@ Bug fixes:
44
123
 
45
124
  ## 2.0.1 / 2014-08-21
46
125
 
47
- Bug fixes:
126
+ ### Bug fixes:
48
127
 
49
128
  * Load RR correctly when running test files directly. (Thanks, @ktdreyer!)
50
129
 
51
130
 
52
- Notes:
131
+ ### Notes:
53
132
 
54
133
  * Extracted HTML5::Scrub#scrub_css_attribute to accommodate the Rails integration work. (Thanks, @kaspth!)
55
134
 
56
135
 
57
136
  ## 2.0.0 / 2014-05-09
58
137
 
59
- Compatibility notes:
138
+ ### Compatibility notes:
60
139
 
61
140
  * ActionView helpers now must be required explicitly: `require "loofah/helpers"`
62
141
  * Support for Ruby 1.8.7 and prior has been dropped
63
142
 
64
- Enhancements:
143
+ ### Enhancements:
65
144
 
66
145
  * HTML5 whitelist allows the following ...
67
146
  * tags: `article`, `aside`, `bdi`, `bdo`, `canvas`, `command`, `datalist`, `details`, `figcaption`, `figure`, `footer`, `header`, `mark`, `meter`, `nav`, `output`, `section`, `summary`, `time`
@@ -71,7 +150,7 @@ Enhancements:
71
150
  * `Loofah.fragment` accepts an optional encoding argument, compatible with `Nokogiri::HTML::DocumentFragment.parse`. #62 (Thanks, Ben Atkins!)
72
151
  * HTML5 sanitizers now remove attributes without values. (Thanks, Kasper Timm Hansen!)
73
152
 
74
- Bug fixes:
153
+ ### Bug fixes:
75
154
 
76
155
  * HTML5 sanitizers' CSS keyword check now actually works (broken in v2.0). Additional regression tests added. (Thanks, Kasper Timm Hansen!)
77
156
  * HTML5 sanitizers now allow negative arguments to CSS. #64 (Thanks, Jon Calhoun!)
@@ -84,7 +163,7 @@ Bug fixes:
84
163
 
85
164
  ## 1.2.0 (2011-08-08)
86
165
 
87
- Enhancements:
166
+ ### Enhancements:
88
167
 
89
168
  * Loofah::Helpers.sanitize_css is a replacement for Rails's built-in sanitize_css helper.
90
169
  * Improving ActionView integration.
@@ -92,7 +171,7 @@ Enhancements:
92
171
 
93
172
  ## 1.1.0 (2011-08-08)
94
173
 
95
- Enhancements:
174
+ ### Enhancements:
96
175
 
97
176
  * Additional HTML5lib whitelist elements (from html5lib 1524:80b5efe26230).
98
177
  Up to date with HTML5lib ruby code as of 1723:7ee6a0331856.
@@ -102,7 +181,7 @@ Enhancements:
102
181
 
103
182
  ## 1.0.0 (2010-10-26)
104
183
 
105
- Notes:
184
+ ### Notes:
106
185
 
107
186
  * Moved ActiveRecord functionality into `loofah-activerecord` gem.
108
187
  * Removed DEPRECATIONS.rdoc documenting 0.3.0 API changes.
@@ -110,7 +189,7 @@ Notes:
110
189
 
111
190
  ## 0.4.7 (2010-03-09)
112
191
 
113
- Enhancements:
192
+ ### Enhancements:
114
193
 
115
194
  * New methods Loofah::HTML::Document#to_text and
116
195
  Loofah::HTML::DocumentFragment#to_text do the right thing with
@@ -123,23 +202,23 @@ Enhancements:
123
202
 
124
203
  ## 0.4.4, 0.4.5, 0.4.6 (2010-02-01)
125
204
 
126
- Enhancements:
205
+ ### Enhancements:
127
206
 
128
207
  * Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text now escape HTML entities.
129
208
 
130
- Bug fixes:
209
+ ### Bug fixes:
131
210
 
132
211
  * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH #17
133
212
 
134
213
 
135
214
  ## 0.4.3 (2010-01-29)
136
215
 
137
- Enhancements:
216
+ ### Enhancements:
138
217
 
139
218
  * All built-in scrubbers are accepted by ActiveRecord::Base.xss_foliate
140
219
  * Loofah::XssFoliate.xss_foliate_all_models replaces use of the constant LOOFAH_XSS_FOLIATE_ALL_MODELS
141
220
 
142
- Miscellaneous:
221
+ ### Miscellaneous:
143
222
 
144
223
  * Modified documentation for bootstrapping XssFoliate in a Rails app,
145
224
  since the use of Bundler breaks the previously-documented method. To
@@ -148,18 +227,18 @@ Miscellaneous:
148
227
 
149
228
  ## 0.4.2 (2010-01-22)
150
229
 
151
- Enhancements:
230
+ ### Enhancements:
152
231
 
153
232
  * Implemented Node#scrub! for scrubbing subtrees.
154
233
  * Implemented NodeSet#scrub! for scrubbing a set of subtrees.
155
234
  * Document.text now only serializes <body> contents (ignores <head>)
156
235
  * <head>, <html> and <body> added to the HTML5lib whitelist.
157
236
 
158
- Bug fixes:
237
+ ### Bug fixes:
159
238
 
160
239
  * Supporting Rails apps that aren't loading ActiveRecord. GH #10
161
240
 
162
- Miscellaneous:
241
+ ### Miscellaneous:
163
242
 
164
243
  * Mailing list is now loofah@librelist.com / http://librelist.com
165
244
  * IRC channel is now \#loofah on freenode.
@@ -167,14 +246,14 @@ Miscellaneous:
167
246
 
168
247
  ## 0.4.1 (2009-11-23)
169
248
 
170
- Bugfix:
249
+ ### Bugfix:
171
250
 
172
251
  * Manifest fixed. Whoops.
173
252
 
174
253
 
175
254
  ## 0.4.0 (2009-11-21)
176
255
 
177
- Enhancements:
256
+ ### Enhancements:
178
257
 
179
258
  * Scrubber class introduced, allowing development of custom scrubbers.
180
259
  * Added support for XML documents and fragments.
@@ -185,20 +264,20 @@ Enhancements:
185
264
 
186
265
  ## 0.3.1 (2009-10-12)
187
266
 
188
- Bug fixes:
267
+ ### Bug fixes:
189
268
 
190
269
  * Scrubbed Documents properly render html, head and body tags when serialized.
191
270
 
192
271
 
193
272
  ## 0.3.0 (2009-10-06)
194
273
 
195
- Enhancements:
274
+ ### Enhancements:
196
275
 
197
276
  * New ActiveRecord extension `xss_foliate`, a drop-in replacement for [xss_terminate](http://github.com/look/xss_terminate/tree/master).
198
277
  * Replacement methods for Rails's helpers, Loofah::Rails.sanitize and Loofah::Rails.strip_tags.
199
278
  * Official support (and test coverage) for Rails versions 2.3, 2.2, 2.1, 2.0 and 1.2.
200
279
 
201
- Deprecations:
280
+ ### Deprecations:
202
281
 
203
282
  * The methods strip_tags, whitewash, whitewash_document, sanitize, and
204
283
  sanitize_document have been deprecated. See DEPRECATED.rdoc for
@@ -207,7 +286,7 @@ Deprecations:
207
286
 
208
287
  ## 0.2.2 (2009-09-30)
209
288
 
210
- Enhancements:
289
+ ### Enhancements:
211
290
 
212
291
  * ActiveRecord extension scrubs fields in a before_validation callback
213
292
  (was previously in a before_save)
@@ -215,12 +294,12 @@ Enhancements:
215
294
 
216
295
  ## 0.2.1 (2009-09-19)
217
296
 
218
- Enhancements:
297
+ ### Enhancements:
219
298
 
220
299
  * when loaded in a Rails app, automatically extend ActiveRecord::Base
221
300
  with html_fragment and html_document. GH #6 (Thanks Josh Nichols!)
222
301
 
223
- Bugfixes:
302
+ ### Bugfixes:
224
303
 
225
304
  * ActiveRecord scrubbing should generate strings instead of Document or
226
305
  DocumentFragment objects. GH #5
data/Gemfile CHANGED
@@ -15,8 +15,8 @@ gem "hoe-gemspec", ">=0", :group => [:development, :test]
15
15
  gem "hoe-debugging", ">=0", :group => [:development, :test]
16
16
  gem "hoe-bundler", ">=0", :group => [:development, :test]
17
17
  gem "hoe-git", ">=0", :group => [:development, :test]
18
- gem "concourse", ">=0.14.0", :group => [:development, :test]
19
- gem "rdoc", "~>4.0", :group => [:development, :test]
20
- gem "hoe", "~>3.16", :group => [:development, :test]
18
+ gem "concourse", ">=0.26.0", :group => [:development, :test]
19
+ gem "rdoc", ">=4.0", "<7", :group => [:development, :test]
20
+ gem "hoe", "~>3.17", :group => [:development, :test]
21
21
 
22
22
  # vim: syntax=ruby
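The effect of relaxing the `rdoc` constraint in this hunk can be checked with RubyGems' own requirement classes. This is a minimal sketch; the version `6.1.0` is an arbitrary example value, not something taken from the gem:

```ruby
require "rubygems"

old_requirement = Gem::Requirement.new("~> 4.0")        # means >= 4.0 and < 5.0
new_requirement = Gem::Requirement.new(">= 4.0", "< 7") # any 4.x, 5.x or 6.x

candidate = Gem::Version.new("6.1.0")

old_ok = old_requirement.satisfied_by?(candidate)
new_ok = new_requirement.satisfied_by?(candidate)

puts old_ok # false
puts new_ok # true
```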
@@ -2,7 +2,7 @@ The MIT License
2
2
 
3
3
  The MIT License
4
4
 
5
- Copyright (c) 2009 -- 2014 by Mike Dalessio, Bryan Helmkamp
5
+ Copyright (c) 2009 -- 2018 by Mike Dalessio, Bryan Helmkamp
6
6
 
7
7
  Permission is hereby granted, free of charge, to any person obtaining a copy
8
8
  of this software and associated documentation files (the "Software"), to deal
@@ -3,8 +3,9 @@ CHANGELOG.md
3
3
  Gemfile
4
4
  MIT-LICENSE.txt
5
5
  Manifest.txt
6
- README.rdoc
6
+ README.md
7
7
  Rakefile
8
+ SECURITY.md
8
9
  benchmark/benchmark.rb
9
10
  benchmark/fragment.html
10
11
  benchmark/helper.rb
@@ -14,17 +15,20 @@ lib/loofah/elements.rb
14
15
  lib/loofah/helpers.rb
15
16
  lib/loofah/html/document.rb
16
17
  lib/loofah/html/document_fragment.rb
18
+ lib/loofah/html5/libxml2_workarounds.rb
19
+ lib/loofah/html5/safelist.rb
17
20
  lib/loofah/html5/scrub.rb
18
- lib/loofah/html5/whitelist.rb
19
21
  lib/loofah/instance_methods.rb
20
22
  lib/loofah/metahelpers.rb
21
23
  lib/loofah/scrubber.rb
22
24
  lib/loofah/scrubbers.rb
23
25
  lib/loofah/xml/document.rb
24
26
  lib/loofah/xml/document_fragment.rb
27
+ test/assets/msword.html
25
28
  test/assets/testdata_sanitizer_tests1.dat
26
29
  test/helper.rb
27
30
  test/html5/test_sanitizer.rb
31
+ test/html5/test_scrub.rb
28
32
  test/integration/test_ad_hoc.rb
29
33
  test/integration/test_helpers.rb
30
34
  test/integration/test_html.rb
@@ -0,0 +1,369 @@
1
+ # Loofah
2
+
3
+ * https://github.com/flavorjones/loofah
4
+ * Docs: http://rubydoc.info/github/flavorjones/loofah/master/frames
5
+ * Mailing list: [loofah-talk@googlegroups.com](https://groups.google.com/forum/#!forum/loofah-talk)
6
+
7
+ ## Status
8
+
9
+ |System|Status|
10
+ |--|--|
11
+ | Concourse CI | [![Concourse CI](https://ci.nokogiri.org/api/v1/teams/nokogiri-core/pipelines/loofah/jobs/ruby-2.5/badge)](https://ci.nokogiri.org/teams/nokogiri-core/pipelines/loofah?groups=master) |
12
+ | Code Climate | [![Code Climate](https://codeclimate.com/github/flavorjones/loofah.svg)](https://codeclimate.com/github/flavorjones/loofah) |
13
+
14
+
15
+ ## Description
16
+
17
+ Loofah is a general library for manipulating and transforming HTML/XML
18
+ documents and fragments. It's built on top of Nokogiri and libxml2, so
19
+ it's fast and has a nice API.
20
+
21
+ Loofah excels at HTML sanitization (XSS prevention). It includes some
22
+ nice HTML sanitizers, which are based on HTML5lib's safelist, so it
23
+ most likely won't make your codes less secure. (These statements have
24
+ not been evaluated by Netexperts.)
25
+
26
+ ActiveRecord extensions for sanitization are available in the
27
+ [`loofah-activerecord` gem](https://github.com/flavorjones/loofah-activerecord).
28
+
29
+
30
+ ## Features
31
+
32
+ * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's safelists).
33
+ * Common HTML sanitizing tasks are built-in:
34
+ * _Strip_ unsafe tags, leaving behind only the inner text.
35
+ * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
36
+ * _Escape_ unsafe tags and their subtrees, leaving behind lots of `&lt;` and `&gt;` entities.
37
+ * _Whitewash_ the markup, removing all attributes and namespaced nodes.
38
+ * Common HTML transformation tasks are built-in:
39
+ * Add the _nofollow_ attribute to all hyperlinks.
40
+ * Format markup as plain text, with or without sensible whitespace handling around block elements.
41
+ * Replace Rails's `strip_tags` and `sanitize` view helper methods.
42
+
43
+
44
+ ## Compare and Contrast
45
+
46
+ Loofah is one of two known Ruby XSS/sanitization solutions that
47
+ guarantee well-formed and valid markup (the other is Sanitize, which
48
+ also uses Nokogiri).
49
+
50
+ Loofah works on XML, XHTML and HTML documents.
51
+
52
+ Also, it's pretty fast. Here is a benchmark comparing Loofah to other
53
+ commonly-used libraries (ActionView, Sanitize, HTML5lib and HTMLfilter):
54
+
55
+ * https://gist.github.com/170193
56
+
57
+ Lastly, Loofah is extensible. It's super-easy to write your own custom
58
+ scrubbers for whatever document manipulation you need. You don't like
59
+ the built-in scrubbers? Build your own, like a boss.
60
+
61
+
62
+ ## The Basics
63
+
64
+ Loofah wraps [Nokogiri](http://nokogiri.org) in a loving
65
+ embrace. Nokogiri is an excellent HTML/XML parser. If you don't know
66
+ how Nokogiri works, you might want to pause for a moment and go check
67
+ it out. I'll wait.
68
+
69
+ Loofah presents the following classes:
70
+
71
+ * `Loofah::HTML::Document` and `Loofah::HTML::DocumentFragment`
72
+ * `Loofah::XML::Document` and `Loofah::XML::DocumentFragment`
73
+ * `Loofah::Scrubber`
74
+
75
+ The documents and fragments are subclasses of the similar Nokogiri classes.
76
+
77
+ The Scrubber represents the document manipulation, either by wrapping
78
+ a block,
79
+
80
+ ``` ruby
81
+ span2div = Loofah::Scrubber.new do |node|
82
+ node.name = "div" if node.name == "span"
83
+ end
84
+ ```
85
+
86
+ or by implementing a method.
87
+
88
+
89
+ ### Side Note: Fragments vs Documents
90
+
91
+ Generally speaking, unless you expect to have a DOCTYPE and a single
92
+ root node, you don't have a *document*, you have a *fragment*. For
93
+ HTML, another rule of thumb is that *documents* have `html` and `body`
94
+ tags, and *fragments* usually do not.
95
+
96
+ HTML fragments should be parsed with Loofah.fragment. The result won't
97
+ be wrapped in `html` or `body` tags, won't have a DOCTYPE declaration,
98
+ `head` elements will be silently ignored, and multiple root nodes are
99
+ allowed.
100
+
101
+ XML fragments should be parsed with Loofah.xml_fragment. The result
102
+ won't have a DOCTYPE declaration, and multiple root nodes are allowed.
103
+
104
+ HTML documents should be parsed with Loofah.document. The result will
105
+ have a DOCTYPE declaration, along with `html`, `head` and `body` tags.
106
+
107
+ XML documents should be parsed with Loofah.xml_document. The result
108
+ will have a DOCTYPE declaration and a single root node.
109
+
110
+
111
+ ### Loofah::HTML::Document and Loofah::HTML::DocumentFragment
112
+
113
+ These classes are subclasses of Nokogiri::HTML::Document and
114
+ Nokogiri::HTML::DocumentFragment, so you get all the markup
115
+ fixer-uppery and API goodness of Nokogiri.
116
+
117
+ The module methods Loofah.document and Loofah.fragment will parse an
118
+ HTML document and an HTML fragment, respectively.
119
+
120
+ ``` ruby
121
+ Loofah.document(unsafe_html).is_a?(Nokogiri::HTML::Document) # => true
122
+ Loofah.fragment(unsafe_html).is_a?(Nokogiri::HTML::DocumentFragment) # => true
123
+ ```
124
+
125
+ Loofah injects a `scrub!` method, which takes either a symbol (for
126
+ built-in scrubbers) or a Loofah::Scrubber object (for custom
127
+ scrubbers), and modifies the document in-place.
128
+
129
+ Loofah overrides `to_s` to return HTML:
130
+
131
+ ``` ruby
132
+ unsafe_html = "ohai! <div>div is safe</div> <script>but script is not</script>"
133
+
134
+ doc = Loofah.fragment(unsafe_html).scrub!(:prune)
135
+ doc.to_s # => "ohai! <div>div is safe</div> "
136
+ ```
137
+
138
+ and `text` to return plain text:
139
+
140
+ ``` ruby
141
+ doc.text # => "ohai! div is safe "
142
+ ```
143
+
144
+ Also, `to_text` is available, which does the right thing with
145
+ whitespace around block-level elements.
146
+
147
+ ``` ruby
148
+ doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
149
+ doc.text # => "TitleContent" # probably not what you want
150
+ doc.to_text # => "\nTitle\n\nContent\n" # better
151
+ ```
152
+
153
+ ### Loofah::XML::Document and Loofah::XML::DocumentFragment
154
+
155
+ These classes are subclasses of Nokogiri::XML::Document and
156
+ Nokogiri::XML::DocumentFragment, so you get all the markup
157
+ fixer-uppery and API goodness of Nokogiri.
158
+
159
+ The module methods Loofah.xml_document and Loofah.xml_fragment will
160
+ parse an XML document and an XML fragment, respectively.
161
+
162
+ ``` ruby
163
+ Loofah.xml_document(bad_xml).is_a?(Nokogiri::XML::Document) # => true
164
+ Loofah.xml_fragment(bad_xml).is_a?(Nokogiri::XML::DocumentFragment) # => true
165
+ ```
166
+
167
+ ### Nodes and NodeSets
168
+
169
+ Nokogiri::XML::Node and Nokogiri::XML::NodeSet also get a `scrub!`
170
+ method, which makes it easy to scrub subtrees.
171
+
172
+ The following code will apply the `employee_scrubber` only to the
173
+ `employee` nodes (and their subtrees) in the document:
174
+
175
+ ``` ruby
176
+ Loofah.xml_document(bad_xml).xpath("//employee").scrub!(employee_scrubber)
177
+ ```
178
+
179
+ And this code will only scrub the first `employee` node and its subtree:
180
+
181
+ ``` ruby
182
+ Loofah.xml_document(bad_xml).at_xpath("//employee").scrub!(employee_scrubber)
183
+ ```
184
+
185
+ ### Loofah::Scrubber
186
+
187
+ A Scrubber wraps up a block (or method) that is run on a document node:
188
+
189
+ ``` ruby
190
+ # change all <span> tags to <div> tags
191
+ span2div = Loofah::Scrubber.new do |node|
192
+ node.name = "div" if node.name == "span"
193
+ end
194
+ ```
195
+
196
+ This can then be run on a document:
197
+
198
+ ``` ruby
199
+ Loofah.fragment("<span>foo</span><p>bar</p>").scrub!(span2div).to_s
200
+ # => "<div>foo</div><p>bar</p>"
201
+ ```
202
+
203
+ Scrubbers can be run on a document in either a top-down traversal (the
204
+ default) or bottom-up. Top-down scrubbers can optionally return
205
+ Scrubber::STOP to terminate the traversal of a subtree. Read below and
206
+ in the Loofah::Scrubber class for more detailed usage.
207
+
208
+ Here's an XML example:
209
+
210
+ ``` ruby
211
+ # remove all <employee> tags that have a "deceased" attribute set to true
212
+ bring_out_your_dead = Loofah::Scrubber.new do |node|
213
+ if node.name == "employee" and node["deceased"] == "true"
214
+ node.remove
215
+ Loofah::Scrubber::STOP # don't bother with the rest of the subtree
216
+ end
217
+ end
218
+ Loofah.xml_document(File.read('plague.xml')).scrub!(bring_out_your_dead)
219
+ ```
220
+
221
+ ### Built-In HTML Scrubbers
222
+
223
+ Loofah comes with a set of sanitizing scrubbers that use HTML5lib's
224
+ safelist algorithm:
225
+
226
+ ``` ruby
227
+ doc.scrub!(:strip) # replaces unknown/unsafe tags with their inner text
228
+ doc.scrub!(:prune) # removes unknown/unsafe tags and their children
229
+ doc.scrub!(:escape) # escapes unknown/unsafe tags, like this: &lt;script&gt;
230
+ doc.scrub!(:whitewash) # removes unknown/unsafe/namespaced tags and their children,
231
+ # and strips all node attributes
232
+ ```
233
+
234
+ Loofah also comes with some common transformation tasks:
235
+
236
+ ``` ruby
237
+ doc.scrub!(:nofollow) # adds rel="nofollow" attribute to links
238
+ doc.scrub!(:unprintable) # removes unprintable characters from text nodes
239
+ ```
240
+
241
+ See Loofah::Scrubbers for more details and example usage.
242
+
243
+
244
+ ### Chaining Scrubbers
245
+
246
+ You can chain scrubbers:
247
+
248
+ ``` ruby
249
+ Loofah.fragment("<span>hello</span> <script>alert('OHAI')</script>") \
250
+ .scrub!(:prune) \
251
+ .scrub!(span2div).to_s
252
+ # => "<div>hello</div> "
253
+ ```
254
+
255
+ ### Shorthand
256
+
257
+ The class methods Loofah.scrub_fragment and Loofah.scrub_document are
258
+ shorthand.
259
+
260
+ ``` ruby
261
+ Loofah.scrub_fragment(unsafe_html, :prune)
262
+ Loofah.scrub_document(unsafe_html, :prune)
263
+ Loofah.scrub_xml_fragment(bad_xml, custom_scrubber)
264
+ Loofah.scrub_xml_document(bad_xml, custom_scrubber)
265
+ ```
266
+
267
+ are the same thing as (and arguably semantically clearer than):
268
+
269
+ ``` ruby
270
+ Loofah.fragment(unsafe_html).scrub!(:prune)
271
+ Loofah.document(unsafe_html).scrub!(:prune)
272
+ Loofah.xml_fragment(bad_xml).scrub!(custom_scrubber)
273
+ Loofah.xml_document(bad_xml).scrub!(custom_scrubber)
274
+ ```
275
+
276
+
277
+ ### View Helpers
278
+
279
+ Loofah has two "view helpers": Loofah::Helpers.sanitize and
280
+ Loofah::Helpers.strip_tags, both of which are drop-in replacements for
281
+ the Rails ActionView helpers of the same name.
282
+ These are no longer required automatically. You must require `loofah/helpers`.
283
+
284
+
285
+ ## Requirements
286
+
287
+ * Nokogiri >= 1.5.9
288
+
289
+
290
+ ## Installation
291
+
292
+ Unsurprisingly:
293
+
294
+ * gem install loofah
295
+
296
+
297
+ ## Support
298
+
299
+ The bug tracker is available here:
300
+
301
+ * https://github.com/flavorjones/loofah/issues
302
+
303
+ And the mailing list is on Google Groups:
304
+
305
+ * Mail: loofah-talk@googlegroups.com
306
+ * Archive: https://groups.google.com/forum/#!forum/loofah-talk
307
+
308
+ And the IRC channel is \#loofah on freenode.
309
+
310
+
311
+ ## Security
312
+
313
+ See [`SECURITY.md`](SECURITY.md) for vulnerability reporting details.
314
+
315
+
316
+ ### "Secure by Default"
317
+
318
+ Some tools may incorrectly report Loofah as a potential security
319
+ vulnerability.
320
+
321
+ Loofah depends on Nokogiri, and it's _possible_ to use Nokogiri in a
322
+ dangerous way (by enabling its DTDLOAD option and disabling its NONET
323
+ option). This specifically allows the opportunity for an XML External
324
+ Entity (XXE) vulnerability if the XML data is untrusted.
325
+
326
+ However, Loofah __never enables this Nokogiri configuration__; Loofah
327
+ never enables DTDLOAD, and it never disables NONET, thereby protecting
328
+ you by default from this XXE vulnerability.
329
+
330
+
331
+ ## Related Links
332
+
333
+ * Nokogiri: http://nokogiri.org
334
+ * libxml2: http://xmlsoft.org
335
+ * html5lib: https://code.google.com/p/html5lib
336
+
337
+
338
+ ## Authors
339
+
340
+ * [Mike Dalessio](http://mike.daless.io) ([@flavorjones](https://twitter.com/flavorjones))
341
+ * Bryan Helmkamp
342
+
343
+ Featuring code contributed by:
344
+
345
+ * Aaron Patterson
346
+ * John Barnette
347
+ * Josh Owens
348
+ * Paul Dix
349
+ * Luke Melia
350
+
351
+ And a big shout-out to Corey Innis for the name, and feedback on the API.
352
+
353
+
354
+ ## Thank You
355
+
356
+ The following people have generously donated via the [Pledgie](http://pledgie.com) badge on the [Loofah github page](https://github.com/flavorjones/loofah):
357
+
358
+ * Bill Harding
359
+
360
+
361
+ ## Historical Note
362
+
363
+ This library was formerly known as Dryopteris, which was a very bad
364
+ name that nobody could spell properly.
365
+
366
+
367
+ ## License
368
+
369
+ Distributed under the MIT License. See `MIT-LICENSE.txt` for details.