loofah 2.1.1 → 2.2.0

Potentially problematic release.


checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: ca2c20c5ce7252fd9d37c4588ce3d505e39480dc
4
- data.tar.gz: eec13222402419857eb4d1cffbde0fcf837b59c3
3
+ metadata.gz: 6656f9e5edc815b2c5ee676d1c4fb818b2dc03f4
4
+ data.tar.gz: 7bea1d04f8af479fd825c7adf687f0ca0c624830
5
5
  SHA512:
6
- metadata.gz: 2d351e77bef7101ba64be53b98e5630597aa2e72af8e2940d10c79f6479325560069b8daff12091fd4f2eb5c562b8bcbfd6bef53501a0c80d42c9850ebfd4f13
7
- data.tar.gz: 8261bcadedc0943141eb21b198553a7151f030d1f381cf9427a9b9985ff0a6c59e65c95141afc4d48e284d0a38f9b2ce53e02d96cfdf3e99475ab5ee222f9e68
6
+ metadata.gz: 42f030b7228867ebf322c9d8e286349e1288ef3d60f90fe404b0d9250cc626ea6fad84ff1325cd2754ea4a7fdf80802a4bdae5a9b7121ac312e56d96c280d1a3
7
+ data.tar.gz: 8a67c56281a65b6e89d8623f40423ae41ed2628eeb0a90193196cfb87aeb4efccbe23c961b05ab26a247bac0117a55b68dea97ab6b67076e272ebad8471e33cb
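The digests recorded above can be compared against a locally downloaded archive. This is a minimal sketch, assuming the archive bytes have already been read into memory; `digest_matches?` is a hypothetical helper for illustration, not part of RubyGems or loofah.

```ruby
require 'digest/sha2'

# Hypothetical helper: true when the SHA-512 hex digest of `bytes`
# equals `expected_hex` (for example, a value copied from checksums.yaml).
def digest_matches?(bytes, expected_hex)
  Digest::SHA512.hexdigest(bytes) == expected_hex.strip.downcase
end

# Illustrative usage (the file path is an assumption):
#   digest_matches?(File.binread("data.tar.gz"), expected_sha512)
```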
@@ -1,5 +1,20 @@
1
1
  # Changelog
2
2
 
3
+ ## 2.2.0 / 2018-02-11
4
+
5
+ Features:
6
+
7
+ * Support HTML5 `<main>` tag. #133 (Thanks, @MothOnMars!)
8
+ * Recognize HTML5 block elements. #136 (Thanks, @MothOnMars!)
9
+ * Support SVG `<symbol>` tag. #131 (Thanks, @baopham!)
10
+ * Support for whitelisting CSS functions, initially just `calc` and `rgb`. #122/#123/#129 (Thanks, @NikoRoberts!)
11
+ * Whitelist CSS property `list-style-type`. #68/#137/#142 (Thanks, @andela-ysanni and @NikoRoberts!)
12
+
13
+ Bugfixes:
14
+
15
+ * Properly handle nested `script` tags. #127.
16
+
17
+
3
18
  ## 2.1.1 / 2017-09-24
4
19
 
5
20
  Bugfixes:
@@ -11,7 +26,7 @@ Bugfixes:
11
26
 
12
27
  Notes:
13
28
 
14
- * Re-implemented CSS parsing and sanitization using the {crass}[https://github.com/rgrove/crass] library. #91
29
+ * Re-implemented CSS parsing and sanitization using the [crass](https://github.com/rgrove/crass) library. #91
15
30
 
16
31
 
17
32
  Features:
data/Gemfile CHANGED
@@ -15,7 +15,7 @@ gem "hoe-gemspec", ">=0", :group => [:development, :test]
15
15
  gem "hoe-debugging", ">=0", :group => [:development, :test]
16
16
  gem "hoe-bundler", ">=0", :group => [:development, :test]
17
17
  gem "hoe-git", ">=0", :group => [:development, :test]
18
- gem "concourse", ">=0.14.0", :group => [:development, :test]
18
+ gem "concourse", ">=0.15.0", :group => [:development, :test]
19
19
  gem "rdoc", "~>4.0", :group => [:development, :test]
20
20
  gem "hoe", "~>3.16", :group => [:development, :test]
21
21
 
@@ -2,7 +2,7 @@ The MIT License
2
2
 
3
3
  The MIT License
4
4
 
5
- Copyright (c) 2009 -- 2014 by Mike Dalessio, Bryan Helmkamp
5
+ Copyright (c) 2009 -- 2018 by Mike Dalessio, Bryan Helmkamp
6
6
 
7
7
  Permission is hereby granted, free of charge, to any person obtaining a copy
8
8
  of this software and associated documentation files (the "Software"), to deal
@@ -3,7 +3,7 @@ CHANGELOG.md
3
3
  Gemfile
4
4
  MIT-LICENSE.txt
5
5
  Manifest.txt
6
- README.rdoc
6
+ README.md
7
7
  Rakefile
8
8
  benchmark/benchmark.rb
9
9
  benchmark/fragment.html
@@ -0,0 +1,361 @@
1
+ # Loofah
2
+
3
+ * https://github.com/flavorjones/loofah
4
+ * http://rubydoc.info/github/flavorjones/loofah/master/frames
5
+ * http://librelist.com/browser/loofah
6
+
7
+ ## Status
8
+
9
+ |System|Status|
10
+ |--|--|
11
+ | Concourse | [![Concourse CI](https://ci.nokogiri.org/api/v1/teams/nokogiri-core/pipelines/loofah/jobs/ruby-2.5/badge)](https://ci.nokogiri.org/teams/nokogiri-core/pipelines/loofah?groups=master) |
12
+ | Code Climate | [![Code Climate](https://codeclimate.com/github/flavorjones/loofah.svg)](https://codeclimate.com/github/flavorjones/loofah) |
13
+ | Version Eye | [![Version Eye](https://www.versioneye.com/ruby/loofah/badge.png)](https://www.versioneye.com/ruby/loofah) |
14
+
15
+
16
+ ## Description
17
+
18
+ Loofah is a general library for manipulating and transforming HTML/XML
19
+ documents and fragments. It's built on top of Nokogiri and libxml2, so
20
+ it's fast and has a nice API.
21
+
22
+ Loofah excels at HTML sanitization (XSS prevention). It includes some
23
+ nice HTML sanitizers, which are based on HTML5lib's whitelist, so it
24
+ most likely won't make your codes less secure. (These statements have
25
+ not been evaluated by Netexperts.)
26
+
27
+ ActiveRecord extensions for sanitization are available in the
28
+ [`loofah-activerecord` gem](https://github.com/flavorjones/loofah-activerecord).
29
+
30
+
31
+ ## Features
32
+
33
+ * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's whitelists).
34
+ * Common HTML sanitizing tasks are built-in:
35
+ * _Strip_ unsafe tags, leaving behind only the inner text.
36
+ * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
37
+ * _Escape_ unsafe tags and their subtrees, leaving behind lots of <tt>&lt;</tt> and <tt>&gt;</tt> entities.
38
+ * _Whitewash_ the markup, removing all attributes and namespaced nodes.
39
+ * Common HTML transformation tasks are built-in:
40
+ * Add the _nofollow_ attribute to all hyperlinks.
41
+ * Format markup as plain text, with or without sensible whitespace handling around block elements.
42
+ * Replace Rails's `strip_tags` and `sanitize` view helper methods.
43
+
44
+
45
+ ## Compare and Contrast
46
+
47
+ Loofah is one of two known Ruby XSS/sanitization solutions that
48
+ guarantees well-formed and valid markup (the other is Sanitize, which
49
+ also uses Nokogiri).
50
+
51
+ Loofah works on XML, XHTML and HTML documents.
52
+
53
+ Also, it's pretty fast. Here is a benchmark comparing Loofah to other
54
+ commonly-used libraries (ActionView, Sanitize, HTML5lib and HTMLfilter):
55
+
56
+ * https://gist.github.com/170193
57
+
58
+ Lastly, Loofah is extensible. It's super-easy to write your own custom
59
+ scrubbers for whatever document manipulation you need. You don't like
60
+ the built-in scrubbers? Build your own, like a boss.
61
+
62
+
63
+ ## The Basics
64
+
65
+ Loofah wraps [Nokogiri](http://nokogiri.org) in a loving
66
+ embrace. Nokogiri is an excellent HTML/XML parser. If you don't know
67
+ how Nokogiri works, you might want to pause for a moment and go check
68
+ it out. I'll wait.
69
+
70
+ Loofah presents the following classes:
71
+
72
+ * `Loofah::HTML::Document` and `Loofah::HTML::DocumentFragment`
73
+ * `Loofah::XML::Document` and `Loofah::XML::DocumentFragment`
74
+ * `Loofah::Scrubber`
75
+
76
+ The documents and fragments are subclasses of the similar Nokogiri classes.
77
+
78
+ The Scrubber represents the document manipulation, either by wrapping
79
+ a block,
80
+
81
+ ``` ruby
82
+ span2div = Loofah::Scrubber.new do |node|
83
+ node.name = "div" if node.name == "span"
84
+ end
85
+ ```
86
+
87
+ or by implementing a method.
88
+
89
+
90
+ ### Side Note: Fragments vs Documents
91
+
92
+ Generally speaking, unless you expect to have a DOCTYPE and a single
93
+ root node, you don't have a *document*, you have a *fragment*. For
94
+ HTML, another rule of thumb is that *documents* have `html` and `body`
95
+ tags, and *fragments* usually do not.
96
+
97
+ HTML fragments should be parsed with Loofah.fragment. The result won't
98
+ be wrapped in `html` or `body` tags, won't have a DOCTYPE declaration,
99
+ `head` elements will be silently ignored, and multiple root nodes are
100
+ allowed.
101
+
102
+ XML fragments should be parsed with Loofah.xml_fragment. The result
103
+ won't have a DOCTYPE declaration, and multiple root nodes are allowed.
104
+
105
+ HTML documents should be parsed with Loofah.document. The result will
106
+ have a DOCTYPE declaration, along with `html`, `head` and `body` tags.
107
+
108
+ XML documents should be parsed with Loofah.xml_document. The result
109
+ will have a DOCTYPE declaration and a single root node.
110
+
111
+
112
+ ### Loofah::HTML::Document and Loofah::HTML::DocumentFragment
113
+
114
+ These classes are subclasses of Nokogiri::HTML::Document and
115
+ Nokogiri::HTML::DocumentFragment, so you get all the markup
116
+ fixer-uppery and API goodness of Nokogiri.
117
+
118
+ The module methods Loofah.document and Loofah.fragment will parse an
119
+ HTML document and an HTML fragment, respectively.
120
+
121
+ ``` ruby
122
+ Loofah.document(unsafe_html).is_a?(Nokogiri::HTML::Document) # => true
123
+ Loofah.fragment(unsafe_html).is_a?(Nokogiri::HTML::DocumentFragment) # => true
124
+ ```
125
+
126
+ Loofah injects a `scrub!` method, which takes either a symbol (for
127
+ built-in scrubbers) or a Loofah::Scrubber object (for custom
128
+ scrubbers), and modifies the document in-place.
129
+
130
+ Loofah overrides `to_s` to return HTML:
131
+
132
+ ``` ruby
133
+ unsafe_html = "ohai! <div>div is safe</div> <script>but script is not</script>"
134
+
135
+ doc = Loofah.fragment(unsafe_html).scrub!(:prune)
136
+ doc.to_s # => "ohai! <div>div is safe</div> "
137
+ ```
138
+
139
+ and `text` to return plain text:
140
+
141
+ ``` ruby
142
+ doc.text # => "ohai! div is safe "
143
+ ```
144
+
145
+ Also, `to_text` is available, which does the right thing with
146
+ whitespace around block-level elements.
147
+
148
+ ``` ruby
149
+ doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
150
+ doc.text # => "TitleContent" # probably not what you want
151
+ doc.to_text # => "\nTitle\n\nContent\n" # better
152
+ ```
153
+
154
+ ### Loofah::XML::Document and Loofah::XML::DocumentFragment
155
+
156
+ These classes are subclasses of Nokogiri::XML::Document and
157
+ Nokogiri::XML::DocumentFragment, so you get all the markup
158
+ fixer-uppery and API goodness of Nokogiri.
159
+
160
+ The module methods Loofah.xml_document and Loofah.xml_fragment will
161
+ parse an XML document and an XML fragment, respectively.
162
+
163
+ ``` ruby
164
+ Loofah.xml_document(bad_xml).is_a?(Nokogiri::XML::Document) # => true
165
+ Loofah.xml_fragment(bad_xml).is_a?(Nokogiri::XML::DocumentFragment) # => true
166
+ ```
167
+
168
+ ### Nodes and NodeSets
169
+
170
+ Nokogiri::XML::Node and Nokogiri::XML::NodeSet also get a `scrub!`
171
+ method, which makes it easy to scrub subtrees.
172
+
173
+ The following code will apply the `employee_scrubber` only to the
174
+ `employee` nodes (and their subtrees) in the document:
175
+
176
+ ``` ruby
177
+ Loofah.xml_document(bad_xml).xpath("//employee").scrub!(employee_scrubber)
178
+ ```
179
+
180
+ And this code will only scrub the first `employee` node and its subtree:
181
+
182
+ ``` ruby
183
+ Loofah.xml_document(bad_xml).at_xpath("//employee").scrub!(employee_scrubber)
184
+ ```
185
+
186
+ ### Loofah::Scrubber
187
+
188
+ A Scrubber wraps up a block (or method) that is run on a document node:
189
+
190
+ ``` ruby
191
+ # change all <span> tags to <div> tags
192
+ span2div = Loofah::Scrubber.new do |node|
193
+ node.name = "div" if node.name == "span"
194
+ end
195
+ ```
196
+
197
+ This can then be run on a document:
198
+
199
+ ``` ruby
200
+ Loofah.fragment("<span>foo</span><p>bar</p>").scrub!(span2div).to_s
201
+ # => "<div>foo</div><p>bar</p>"
202
+ ```
203
+
204
+ Scrubbers can be run on a document in either a top-down traversal (the
205
+ default) or bottom-up. Top-down scrubbers can optionally return
206
+ Scrubber::STOP to terminate the traversal of a subtree. Read below and
207
+ in the Loofah::Scrubber class for more detailed usage.
208
+
209
+ Here's an XML example:
210
+
211
+ ``` ruby
212
+ # remove all <employee> tags that have a "deceased" attribute set to true
213
+ bring_out_your_dead = Loofah::Scrubber.new do |node|
214
+ if node.name == "employee" and node["deceased"] == "true"
215
+ node.remove
216
+ Loofah::Scrubber::STOP # don't bother with the rest of the subtree
217
+ end
218
+ end
219
+ Loofah.xml_document(File.read('plague.xml')).scrub!(bring_out_your_dead)
220
+ ```
221
+
222
+ ### Built-In HTML Scrubbers
223
+
224
+ Loofah comes with a set of sanitizing scrubbers that use HTML5lib's
225
+ whitelist algorithm:
226
+
227
+ ``` ruby
228
+ doc.scrub!(:strip) # replaces unknown/unsafe tags with their inner text
229
+ doc.scrub!(:prune) # removes unknown/unsafe tags and their children
230
+ doc.scrub!(:escape) # escapes unknown/unsafe tags, like this: &lt;script&gt;
231
+ doc.scrub!(:whitewash) # removes unknown/unsafe/namespaced tags and their children,
232
+ # and strips all node attributes
233
+ ```
234
+
235
+ Loofah also comes with some common transformation tasks:
236
+
237
+ ``` ruby
238
+ doc.scrub!(:nofollow) # adds rel="nofollow" attribute to links
239
+ doc.scrub!(:unprintable) # removes unprintable characters from text nodes
240
+ ```
241
+
242
+ See Loofah::Scrubbers for more details and example usage.
243
+
244
+
245
+ ### Chaining Scrubbers
246
+
247
+ You can chain scrubbers:
248
+
249
+ ``` ruby
250
+ Loofah.fragment("<span>hello</span> <script>alert('OHAI')</script>") \
251
+ .scrub!(:prune) \
252
+ .scrub!(span2div).to_s
253
+ # => "<div>hello</div> "
254
+ ```
255
+
256
+ ### Shorthand
257
+
258
+ The class methods Loofah.scrub_fragment and Loofah.scrub_document are
259
+ shorthand.
260
+
261
+ ``` ruby
262
+ Loofah.scrub_fragment(unsafe_html, :prune)
263
+ Loofah.scrub_document(unsafe_html, :prune)
264
+ Loofah.scrub_xml_fragment(bad_xml, custom_scrubber)
265
+ Loofah.scrub_xml_document(bad_xml, custom_scrubber)
266
+ ```
267
+
268
+ are the same thing as (and arguably semantically clearer than):
269
+
270
+ ``` ruby
271
+ Loofah.fragment(unsafe_html).scrub!(:prune)
272
+ Loofah.document(unsafe_html).scrub!(:prune)
273
+ Loofah.xml_fragment(bad_xml).scrub!(custom_scrubber)
274
+ Loofah.xml_document(bad_xml).scrub!(custom_scrubber)
275
+ ```
276
+
277
+
278
+ ### View Helpers
279
+
280
+ Loofah has two "view helpers": Loofah::Helpers.sanitize and
281
+ Loofah::Helpers.strip_tags, both of which are drop-in replacements for
282
+ the Rails ActionView helpers of the same name.
283
+ These are no longer required automatically. You must require `loofah/helpers`.
284
+
285
+
286
+ ## Requirements
287
+
288
+ * Nokogiri >= 1.5.9
289
+
290
+
291
+ ## Installation
292
+
293
+ Unsurprisingly:
294
+
295
+ * gem install loofah
296
+
297
+
298
+ ## Support
299
+
300
+ The bug tracker is available here:
301
+
302
+ * https://github.com/flavorjones/loofah/issues
303
+
304
+ And the mailing list is on librelist:
305
+
306
+ * loofah@librelist.com / http://librelist.com
307
+
308
+ And the IRC channel is \#loofah on freenode.
309
+
310
+
311
+ ## Security
312
+
313
+ Some tools may incorrectly report loofah is a potential security
314
+ vulnerability. Loofah depends on Nokogiri, and it's possible to use
315
+ Nokogiri in a dangerous way (by enabling its DTDLOAD option and
316
+ disabling its NONET option). This dangerous Nokogiri configuration,
317
+ which is sometimes used by other components, can create an XML
318
+ External Entity (XXE) vulnerability if the XML data is not trusted.
319
+ However, loofah never enables this dangerous Nokogiri configuration;
320
+ loofah never enables DTDLOAD, and it never disables NONET.
321
+
322
+
323
+ ## Related Links
324
+
325
+ * Nokogiri: http://nokogiri.org
326
+ * libxml2: http://xmlsoft.org
327
+ * html5lib: https://code.google.com/p/html5lib
328
+
329
+
330
+ ## Authors
331
+
332
+ * [Mike Dalessio](http://mike.daless.io) ([@flavorjones](https://twitter.com/flavorjones))
333
+ * Bryan Helmkamp
334
+
335
+ Featuring code contributed by:
336
+
337
+ * Aaron Patterson
338
+ * John Barnette
339
+ * Josh Owens
340
+ * Paul Dix
341
+ * Luke Melia
342
+
343
+ And a big shout-out to Corey Innis for the name, and feedback on the API.
344
+
345
+
346
+ ## Thank You
347
+
348
+ The following people have generously donated via the [Pledgie](http://pledgie.com) badge on the [Loofah github page](https://github.com/flavorjones/loofah):
349
+
350
+ * Bill Harding
351
+
352
+
353
+ ## Historical Note
354
+
355
+ This library was formerly known as Dryopteris, which was a very bad
356
+ name that nobody could spell properly.
357
+
358
+
359
+ ## License
360
+
361
+ Distributed under the MIT License. See `MIT-LICENSE.txt` for details.
data/Rakefile CHANGED
@@ -28,7 +28,7 @@ Hoe.spec "loofah" do
28
28
  extra_dev_deps << ["hoe-debugging", ">=0"]
29
29
  extra_dev_deps << ["hoe-bundler", ">=0"]
30
30
  extra_dev_deps << ["hoe-git", ">=0"]
31
- extra_dev_deps << ["concourse", ">=0.14.0"]
31
+ extra_dev_deps << ["concourse", ">=0.15.0"]
32
32
  end
33
33
 
34
34
  task :gemspec do
@@ -27,7 +27,7 @@ require 'loofah/html/document_fragment'
27
27
  #
28
28
  module Loofah
29
29
  # The version of Loofah you are using
30
- VERSION = '2.1.1'
30
+ VERSION = '2.2.0'
31
31
 
32
32
  class << self
33
33
  # Shortcut for Loofah::HTML::Document.parse
@@ -2,13 +2,88 @@ require 'set'
2
2
 
3
3
  module Loofah
4
4
  module Elements
5
- # Block elements in HTML4
6
- STRICT_BLOCK_LEVEL = Set.new %w[address blockquote center dir div dl
7
- fieldset form h1 h2 h3 h4 h5 h6 hr isindex menu noframes
8
- noscript ol p pre table ul]
5
+ STRICT_BLOCK_LEVEL_HTML4 = Set.new %w[
6
+ address
7
+ blockquote
8
+ center
9
+ dir
10
+ div
11
+ dl
12
+ fieldset
13
+ form
14
+ h1
15
+ h2
16
+ h3
17
+ h4
18
+ h5
19
+ h6
20
+ hr
21
+ isindex
22
+ menu
23
+ noframes
24
+ noscript
25
+ ol
26
+ p
27
+ pre
28
+ table
29
+ ul
30
+ ]
9
31
 
10
- # The following elements may also be considered block-level elements since they may contain block-level elements
11
- LOOSE_BLOCK_LEVEL = Set.new %w[dd dt frameset li tbody td tfoot th thead tr]
32
+ # https://developer.mozilla.org/en-US/docs/Web/HTML/Block-level_elements
33
+ STRICT_BLOCK_LEVEL_HTML5 = Set.new %w[
34
+ address
35
+ article
36
+ aside
37
+ blockquote
38
+ canvas
39
+ dd
40
+ div
41
+ dl
42
+ dt
43
+ fieldset
44
+ figcaption
45
+ figure
46
+ footer
47
+ form
48
+ h1
49
+ h2
50
+ h3
51
+ h4
52
+ h5
53
+ h6
54
+ header
55
+ hgroup
56
+ hr
57
+ li
58
+ main
59
+ nav
60
+ noscript
61
+ ol
62
+ output
63
+ p
64
+ pre
65
+ section
66
+ table
67
+ tfoot
68
+ ul
69
+ video
70
+ ]
71
+
72
+ STRICT_BLOCK_LEVEL = STRICT_BLOCK_LEVEL_HTML4 + STRICT_BLOCK_LEVEL_HTML5
73
+
74
+ # The following elements may also be considered block-level
75
+ # elements since they may contain block-level elements
76
+ LOOSE_BLOCK_LEVEL = Set.new %w[dd
77
+ dt
78
+ frameset
79
+ li
80
+ tbody
81
+ td
82
+ tfoot
83
+ th
84
+ thead
85
+ tr
86
+ ]
12
87
 
13
88
  BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
14
89
  end
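Since `STRICT_BLOCK_LEVEL` is now built with `Set#+`, elements that appear in both the HTML4 and HTML5 lists are stored only once, and HTML4-only elements such as `center` remain members. A small sketch with abbreviated lists (these are not the full sets from the hunk above):

```ruby
require 'set'

# Abbreviated stand-ins for the two constants in the diff.
html4 = Set.new %w[address center div p]
html5 = Set.new %w[address article div main p section]

# Set#+ is set union, so shared names are not duplicated.
strict = html4 + html5

strict.size               # => 7
strict.include?("center") # => true (HTML4 only)
strict.include?("main")   # => true (HTML5 only)
```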
@@ -79,7 +79,7 @@ module Loofah
79
79
  style_tree.each do |node|
80
80
  next unless node[:node] == :property
81
81
  next if node[:children].any? do |child|
82
- [:url, :bad_url, :function].include? child[:node]
82
+ [:url, :bad_url].include?(child[:node]) || (child[:node] == :function && !WhiteList::ALLOWED_CSS_FUNCTIONS.include?(child[:name].downcase))
83
83
  end
84
84
  name = node[:name].downcase
85
85
  if WhiteList::ALLOWED_CSS_PROPERTIES.include?(name) || WhiteList::ALLOWED_SVG_PROPERTIES.include?(name)
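The changed predicate can be tried in isolation. The sketch below is an illustration under assumptions: the hashes are hand-built stand-ins for Crass child nodes (real nodes carry more keys), `allowed_css_functions` mirrors the new whitelist locally, and `rejects_property?` is a hypothetical name.

```ruby
require 'set'

# Local copy of the whitelist introduced in this release.
def allowed_css_functions
  Set.new(%w[calc rgb])
end

# True when a property should be skipped: it contains a URL, or a
# function whose (case-insensitive) name is not whitelisted.
def rejects_property?(children)
  children.any? do |child|
    [:url, :bad_url].include?(child[:node]) ||
      (child[:node] == :function && !allowed_css_functions.include?(child[:name].downcase))
  end
end

rejects_property?([{ node: :function, name: "calc" }])        # => false
rejects_property?([{ node: :function, name: "ATTR" }])        # => true
rejects_property?([{ node: :url, value: "http://foo.com/" }]) # => true
rejects_property?([{ node: :ident, value: "red" }])           # => false
```

Before this change any `:function` child caused the property to be dropped, so `calc(5%)` and `rgb(255, 0, 0)` were removed along with `attr(...)`.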
@@ -51,7 +51,7 @@ module Loofah
51
51
  caption center cite code col colgroup command datalist dd del
52
52
  details dfn dir div dl dt em fieldset figcaption figure footer
53
53
  font form h1 h2 h3 h4 h5 h6 header hr i img input ins kbd label
54
- legend li map mark menu meter nav ol output optgroup option p
54
+ legend li main map mark menu meter nav ol output optgroup option p
55
55
  pre q s samp section select small span strike strong sub summary
56
56
  sup table tbody td textarea tfoot th thead time tr tt u ul var
57
57
  video]
@@ -65,7 +65,7 @@ module Loofah
65
65
  circle clipPath defs desc ellipse feGaussianBlur filter font-face
66
66
  font-face-name font-face-src foreignObject
67
67
  g glyph hkern linearGradient line marker mask metadata missing-glyph
68
- mpath path polygon polyline radialGradient rect set stop svg switch
68
+ mpath path polygon polyline radialGradient rect set stop svg switch symbol
69
69
  text textPath title tspan use]
70
70
 
71
71
  ACCEPTABLE_ATTRIBUTES = Set.new %w[abbr accept accept-charset accesskey action
@@ -125,8 +125,8 @@ module Loofah
125
125
  border-bottom-color border-collapse border-color border-left-color
126
126
  border-right-color border-top-color clear color cursor direction
127
127
  display elevation float font font-family font-size font-style
128
- font-variant font-weight height letter-spacing line-height overflow
129
- pause pause-after pause-before pitch pitch-range richness speak
128
+ font-variant font-weight height letter-spacing line-height list-style-type
129
+ overflow pause pause-after pause-before pitch pitch-range richness speak
130
130
  speak-header speak-numeral speak-punctuation speech-rate stress
131
131
  text-align text-decoration text-indent unicode-bidi vertical-align
132
132
  voice-family volume white-space width]
@@ -137,6 +137,8 @@ module Loofah
137
137
  purple red right solid silver teal top transparent underline white
138
138
  yellow]
139
139
 
140
+ ACCEPTABLE_CSS_FUNCTIONS = Set.new %w[calc rgb]
141
+
140
142
  SHORTHAND_CSS_PROPERTIES = Set.new %w[background border margin padding]
141
143
 
142
144
  ACCEPTABLE_SVG_PROPERTIES = Set.new %w[fill fill-opacity fill-rule stroke
@@ -155,6 +157,7 @@ module Loofah
155
157
  ALLOWED_ATTRIBUTES = ACCEPTABLE_ATTRIBUTES + MATHML_ATTRIBUTES + SVG_ATTRIBUTES
156
158
  ALLOWED_CSS_PROPERTIES = ACCEPTABLE_CSS_PROPERTIES
157
159
  ALLOWED_CSS_KEYWORDS = ACCEPTABLE_CSS_KEYWORDS
160
+ ALLOWED_CSS_FUNCTIONS = ACCEPTABLE_CSS_FUNCTIONS
158
161
  ALLOWED_SVG_PROPERTIES = ACCEPTABLE_SVG_PROPERTIES
159
162
  ALLOWED_PROTOCOLS = ACCEPTABLE_PROTOCOLS
160
163
  ALLOWED_URI_DATA_MEDIATYPES = ACCEPTABLE_URI_DATA_MEDIATYPES
@@ -99,7 +99,12 @@ module Loofah
99
99
 
100
100
  def scrub(node)
101
101
  return CONTINUE if html5lib_sanitize(node) == CONTINUE
102
- node.before node.children
102
+ if node.children.length == 1 && node.children.first.cdata?
103
+ sanitized_text = Loofah.fragment(node.children.first.to_html).scrub!(:strip).to_html
104
+ node.before Nokogiri::XML::Text.new(sanitized_text, node.document)
105
+ else
106
+ node.before node.children
107
+ end
103
108
  node.remove
104
109
  end
105
110
  end
@@ -20,9 +20,9 @@ class Html5TestSanitizer < Loofah::TestCase
20
20
  def check_sanitization(input, htmloutput, xhtmloutput, rexmloutput)
21
21
  ## libxml uses double-quotes, so let's swappo-boppo our quotes before comparing.
22
22
  sane = sanitize_html(input).gsub('"',"'")
23
- htmloutput.gsub!('"',"'")
24
- xhtmloutput.gsub!('"',"'")
25
- rexmloutput.gsub!('"',"'")
23
+ htmloutput = htmloutput.gsub('"',"'")
24
+ xhtmloutput = xhtmloutput.gsub('"',"'")
25
+ rexmloutput = rexmloutput.gsub('"',"'")
26
26
 
27
27
  ## HTML5's parsers are shit. there's so much inconsistency with what has closing tags, etc, that
28
28
  ## it would require a lot of manual hacking to make the tests match libxml's output.
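The move from `gsub!` to `gsub` means the helper no longer mutates the strings its callers pass in. The diff does not state the motivation, so the sketch below only demonstrates the behavioral difference; both method names are hypothetical.

```ruby
# Mutates the caller's object (gsub! returns nil when nothing changes,
# so the string itself is returned explicitly).
def normalize_in_place(s)
  s.gsub!('"', "'")
  s
end

# Rebinds only the local variable; the caller's string is untouched.
def normalize_copy(s)
  s = s.gsub('"', "'")
  s
end

original = %(<a href="x">).dup
normalize_copy(original)      # => "<a href='x'>"
original                      # => "<a href=\"x\">"  (unchanged)

normalize_in_place(original)
original                      # => "<a href='x'>"    (changed)

# On a frozen string only the copying version works.
frozen = %(<a href="x">).freeze
normalize_copy(frozen)        # => "<a href='x'>"
begin
  normalize_in_place(frozen)
rescue RuntimeError => e      # FrozenError is a RuntimeError subclass on Ruby >= 2.5
  frozen_error = e
end
```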
@@ -136,7 +136,7 @@ class Html5TestSanitizer < Loofah::TestCase
136
136
  check_sanitization(input, output, output, output)
137
137
  end
138
138
  end
139
-
139
+
140
140
  HTML5::WhiteList::ALLOWED_URI_DATA_MEDIATYPES.each do |data_uri_type|
141
141
  define_method "test_should_allow_data_#{data_uri_type}_uris" do
142
142
  input = %(<a href="data:#{data_uri_type}">foo</a>)
@@ -275,6 +275,38 @@ class Html5TestSanitizer < Loofah::TestCase
275
275
  assert_match %r/-0.05em/, sane.inner_html
276
276
  end
277
277
 
278
+ def test_css_function_sanitization_leaves_whitelisted_functions_calc
279
+ html = "<span style=\"width:calc(5%)\">"
280
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :strip).to_html)
281
+ assert_match %r/calc\(5%\)/, sane.inner_html
282
+
283
+ html = "<span style=\"width: calc(5%)\">"
284
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :strip).to_html)
285
+ assert_match %r/calc\(5%\)/, sane.inner_html
286
+ end
287
+
288
+ def test_css_function_sanitization_leaves_whitelisted_functions_rgb
289
+ html = '<span style="color: rgb(255, 0, 0)">'
290
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :strip).to_html)
291
+ assert_match %r/rgb\(255, 0, 0\)/, sane.inner_html
292
+ end
293
+
294
+ def test_css_function_sanitization_leaves_whitelisted_list_style_type
295
+ html = "<ol style='list-style-type:lower-greek;'></ol>"
296
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :strip).to_html)
297
+ assert_match %r/list-style-type:lower-greek/, sane.inner_html
298
+ end
299
+
300
+ def test_css_function_sanitization_strips_style_attributes_with_unsafe_functions
301
+ html = "<span style=\"width:attr(data-evil-attr)\">"
302
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :strip).to_html)
303
+ assert_match %r/<span><\/span>/, sane.inner_html
304
+
305
+ html = "<span style=\"width: attr(data-evil-attr)\">"
306
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :strip).to_html)
307
+ assert_match %r/<span><\/span>/, sane.inner_html
308
+ end
309
+
278
310
  def test_issue_90_slow_regex
279
311
  skip("timing tests are hard to make pass and have little regression-testing value")
280
312
 
@@ -16,66 +16,67 @@ class IntegrationTestAdHoc < Loofah::TestCase
16
16
  end
17
17
  end
18
18
 
19
- def test_removal_of_illegal_tag
20
- html = <<-HTML
19
+ context "tests" do
20
+ def test_removal_of_illegal_tag
21
+ html = <<-HTML
21
22
  following this there should be no jim tag
22
23
  <jim>jim</jim>
23
24
  was there?
24
25
  HTML
25
- sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
26
- assert sane.xpath("//jim").empty?
27
- end
26
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
27
+ assert sane.xpath("//jim").empty?
28
+ end
28
29
 
29
- def test_removal_of_illegal_attribute
30
- html = "<p class=bar foo=bar abbr=bar />"
31
- sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
32
- node = sane.xpath("//p").first
33
- assert node.attributes['class']
34
- assert node.attributes['abbr']
35
- assert_nil node.attributes['foo']
36
- end
30
+ def test_removal_of_illegal_attribute
31
+ html = "<p class=bar foo=bar abbr=bar />"
32
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
33
+ node = sane.xpath("//p").first
34
+ assert node.attributes['class']
35
+ assert node.attributes['abbr']
36
+ assert_nil node.attributes['foo']
37
+ end
37
38
 
38
- def test_removal_of_illegal_url_in_href
39
- html = <<-HTML
39
+ def test_removal_of_illegal_url_in_href
40
+ html = <<-HTML
40
41
  <a href='jimbo://jim.jim/'>this link should have its href removed because of illegal url</a>
41
42
  <a href='http://jim.jim/'>this link should be fine</a>
42
43
  HTML
43
- sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
44
- nodes = sane.xpath("//a")
45
- assert_nil nodes.first.attributes['href']
46
- assert nodes.last.attributes['href']
47
- end
44
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
45
+ nodes = sane.xpath("//a")
46
+ assert_nil nodes.first.attributes['href']
47
+ assert nodes.last.attributes['href']
48
+ end
48
49
 
49
- def test_css_sanitization
50
- html = "<p style='background-color: url(\"http://foo.com/\") ; background-color: #000 ;' />"
51
- sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
52
- assert_match %r/#000/, sane.inner_html
53
- refute_match %r/foo\.com/, sane.inner_html
54
- end
50
+ def test_css_sanitization
51
+ html = "<p style='background-color: url(\"http://foo.com/\") ; background-color: #000 ;' />"
52
+ sane = Nokogiri::HTML(Loofah.scrub_fragment(html, :escape).to_xml)
53
+ assert_match %r/#000/, sane.inner_html
54
+ refute_match %r/foo\.com/, sane.inner_html
55
+ end
55
56
 
56
- def test_fragment_with_no_tags
57
- assert_equal "This fragment has no tags.", Loofah.scrub_fragment("This fragment has no tags.", :escape).to_xml
58
- end
57
+ def test_fragment_with_no_tags
58
+ assert_equal "This fragment has no tags.", Loofah.scrub_fragment("This fragment has no tags.", :escape).to_xml
59
+ end
59
60
 
60
- def test_fragment_in_p_tag
61
- assert_equal "<p>This fragment is in a p.</p>", Loofah.scrub_fragment("<p>This fragment is in a p.</p>", :escape).to_xml
62
- end
61
+ def test_fragment_in_p_tag
62
+ assert_equal "<p>This fragment is in a p.</p>", Loofah.scrub_fragment("<p>This fragment is in a p.</p>", :escape).to_xml
63
+ end
63
64
 
64
- def test_fragment_in_p_tag_plus_stuff
65
- assert_equal "<p>This fragment is in a p.</p>foo<strong>bar</strong>", Loofah.scrub_fragment("<p>This fragment is in a p.</p>foo<strong>bar</strong>", :escape).to_xml
66
- end
65
+ def test_fragment_in_p_tag_plus_stuff
66
+ assert_equal "<p>This fragment is in a p.</p>foo<strong>bar</strong>", Loofah.scrub_fragment("<p>This fragment is in a p.</p>foo<strong>bar</strong>", :escape).to_xml
67
+ end
67
68
 
68
- def test_fragment_with_text_nodes_leading_and_trailing
69
- assert_equal "text<p>fragment</p>text", Loofah.scrub_fragment("text<p>fragment</p>text", :escape).to_xml
70
- end
69
+ def test_fragment_with_text_nodes_leading_and_trailing
70
+ assert_equal "text<p>fragment</p>text", Loofah.scrub_fragment("text<p>fragment</p>text", :escape).to_xml
71
+ end
71
72
 
72
- def test_whitewash_on_fragment
73
- html = "safe<frameset rows=\"*\"><frame src=\"http://example.com\"></frameset> <b>description</b>"
74
- whitewashed = Loofah.scrub_document(html, :whitewash).xpath("/html/body/*").to_s
75
- assert_equal "<p>safe</p><b>description</b>", whitewashed.gsub("\n","")
76
- end
73
+ def test_whitewash_on_fragment
74
+ html = "safe<frameset rows=\"*\"><frame src=\"http://example.com\"></frameset> <b>description</b>"
75
+ whitewashed = Loofah.scrub_document(html, :whitewash).xpath("/html/body/*").to_s
76
+ assert_equal "<p>safe</p><b>description</b>", whitewashed.gsub("\n","")
77
+ end
77
78
 
78
- MSWORD_HTML = <<-EOHTML
79
+ MSWORD_HTML = <<-EOHTML
79
80
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta name="ProgId" content="Word.Document"><meta name="Generator" content="Microsoft Word 11"><meta name="Originator" content="Microsoft Word 11"><link rel="File-List" href="file:///C:%5CDOCUME%7E1%5CNICOLE%7E1%5CLOCALS%7E1%5CTemp%5Cmsohtml1%5C01%5Cclip_filelist.xml"><!--[if gte mso 9]><xml>
80
81
  <w:WordDocument>
81
82
  <w:View>Normal</w:View>
@@ -141,36 +142,52 @@ mso-bidi-language:#0400;}
141
142
  <p class="MsoNormal">Foo <b style="">BOLD<o:p></o:p></b></p>
142
143
  EOHTML
143
144
 
144
- def test_fragment_whitewash_on_microsofty_markup
145
- whitewashed = Loofah.fragment(MSWORD_HTML).scrub!(:whitewash)
146
- assert_equal "<p>Foo <b>BOLD</b></p>", whitewashed.to_s.strip
147
- end
145
+ def test_fragment_whitewash_on_microsofty_markup
146
+ whitewashed = Loofah.fragment(MSWORD_HTML).scrub!(:whitewash)
147
+ assert_equal "<p>Foo <b>BOLD</b></p>", whitewashed.to_s.strip
148
+ end
148
149
 
149
- def test_document_whitewash_on_microsofty_markup
150
- whitewashed = Loofah.document(MSWORD_HTML).scrub!(:whitewash)
151
- assert_match %r(<p>Foo <b>BOLD</b></p>), whitewashed.to_s
152
- assert_equal "<p>Foo <b>BOLD</b></p>", whitewashed.xpath("/html/body/*").to_s
153
- end
150
+ def test_document_whitewash_on_microsofty_markup
151
+ whitewashed = Loofah.document(MSWORD_HTML).scrub!(:whitewash)
152
+ assert_match %r(<p>Foo <b>BOLD</b></p>), whitewashed.to_s
153
+ assert_equal "<p>Foo <b>BOLD</b></p>", whitewashed.xpath("/html/body/*").to_s
154
+ end
154
155
 
155
- def test_return_empty_string_when_nothing_left
156
- assert_equal "", Loofah.scrub_document('<script>test</script>', :prune).text
157
- end
156
+ def test_return_empty_string_when_nothing_left
157
+ assert_equal "", Loofah.scrub_document('<script>test</script>', :prune).text
158
+ end
159
+
160
+ def test_nested_script_cdata_tags_should_be_scrubbed
161
+ html = "<script><script src='malicious.js'></script>"
162
+ stripped = Loofah.fragment(html).scrub!(:strip)
163
+ assert_empty stripped.xpath("//script")
164
+ refute_match("<script", stripped.to_html)
165
+ end
158
166
 
- def test_removal_of_all_tags
- html = <<-HTML
+ def test_nested_script_cdata_tags_should_be_scrubbed_2
+ html = "<script><script>alert('a');</script></script>"
+ stripped = Loofah.fragment(html).scrub!(:strip)
+ assert_empty stripped.xpath("//script")
+ refute_match("<script", stripped.to_html)
+ end
+
+ def test_removal_of_all_tags
+ html = <<-HTML
  What's up <strong>doc</strong>?
  HTML
- stripped = Loofah.scrub_document(html, :prune).text
- assert_equal %Q(What\'s up doc?).strip, stripped.strip
- end
+ stripped = Loofah.scrub_document(html, :prune).text
+ assert_equal %Q(What\'s up doc?).strip, stripped.strip
+ end
 
- def test_dont_remove_whitespace
- html = "Foo\nBar"
- assert_equal html, Loofah.scrub_document(html, :prune).text
- end
+ def test_dont_remove_whitespace
+ html = "Foo\nBar"
+ assert_equal html, Loofah.scrub_document(html, :prune).text
+ end
 
- def test_dont_remove_whitespace_between_tags
- html = "<p>Foo</p>\n<p>Bar</p>"
- assert_equal "Foo\nBar", Loofah.scrub_document(html, :prune).text
+ def test_dont_remove_whitespace_between_tags
+ html = "<p>Foo</p>\n<p>Bar</p>"
+ assert_equal "Foo\nBar", Loofah.scrub_document(html, :prune).text
+ end
  end
  end
+
@@ -19,11 +19,16 @@ class IntegrationTestHtml < Loofah::TestCase
  end
 
  context "#to_text" do
- it "add newlines before and after block elements" do
+ it "add newlines before and after html4 block elements" do
  html = Loofah.fragment "<div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
  assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
  end
 
+ it "add newlines before and after html5 block elements" do
+ html = Loofah.fragment "<div>tweedle<section>beetle</section>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
+ assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
+ end
+
  it "remove extraneous whitespace" do
  html = Loofah.fragment "<div>tweedle\n\n\t\n\s\nbeetle</div>"
  assert_equal "\ntweedle\n\nbeetle\n", html.to_text
@@ -47,11 +52,16 @@ class IntegrationTestHtml < Loofah::TestCase
  end
 
  context "#to_text" do
- it "add newlines before and after block elements" do
+ it "add newlines before and after html4 block elements" do
  html = Loofah.document "<div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
  assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
  end
 
+ it "add newlines before and after html5 block elements" do
+ html = Loofah.document "<div>tweedle<section>beetle</section>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
+ assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
+ end
+
  it "remove extraneous whitespace" do
  html = Loofah.document "<div>tweedle\n\n\t\n\s\nbeetle</div>"
  assert_equal "\ntweedle\n\nbeetle\n", html.to_text
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: loofah
  version: !ruby/object:Gem::Version
- version: 2.1.1
+ version: 2.2.0
  platform: ruby
  authors:
  - Mike Dalessio
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-09-25 00:00:00.000000000 Z
+ date: 2018-02-11 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
@@ -157,14 +157,14 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.14.0
+ version: 0.15.0
  type: :development
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.14.0
+ version: 0.15.0
  - !ruby/object:Gem::Dependency
  name: rdoc
  requirement: !ruby/object:Gem::Requirement
@@ -193,19 +193,7 @@ dependencies:
  - - "~>"
  - !ruby/object:Gem::Version
  version: '3.16'
- description: |-
- Loofah is a general library for manipulating and transforming HTML/XML
- documents and fragments. It's built on top of Nokogiri and libxml2, so
- it's fast and has a nice API.
-
- Loofah excels at HTML sanitization (XSS prevention). It includes some
- nice HTML sanitizers, which are based on HTML5lib's whitelist, so it
- most likely won't make your codes less secure. (These statements have
- not been evaluated by Netexperts.)
-
- ActiveRecord extensions for sanitization are available in the
- `loofah-activerecord` gem (see
- https://github.com/flavorjones/loofah-activerecord).
+ description: ''
  email:
  - mike.dalessio@gmail.com
  - bryan@brynary.com
@@ -215,14 +203,14 @@ extra_rdoc_files:
  - CHANGELOG.md
  - MIT-LICENSE.txt
  - Manifest.txt
- - README.rdoc
+ - README.md
  files:
  - ".gemtest"
  - CHANGELOG.md
  - Gemfile
  - MIT-LICENSE.txt
  - Manifest.txt
- - README.rdoc
+ - README.md
  - Rakefile
  - benchmark/benchmark.rb
  - benchmark/fragment.html
@@ -254,7 +242,7 @@ files:
  - test/unit/test_helpers.rb
  - test/unit/test_scrubber.rb
  - test/unit/test_scrubbers.rb
- homepage: https://github.com/flavorjones/loofah
+ homepage:
  licenses:
  - MIT
  metadata: {}
@@ -279,6 +267,5 @@ rubyforge_project:
  rubygems_version: 2.6.12
  signing_key:
  specification_version: 4
- summary: Loofah is a general library for manipulating and transforming HTML/XML documents
- and fragments
+ summary: ''
  test_files: []
@@ -1,314 +0,0 @@
- = Loofah {<img src="https://travis-ci.org/flavorjones/loofah.svg?branch=master" alt="Build Status" />}[https://travis-ci.org/flavorjones/loofah]
-
- * https://github.com/flavorjones/loofah
- * http://rubydoc.info/github/flavorjones/loofah/master/frames
- * http://librelist.com/browser/loofah
-
- == Description
-
- Loofah is a general library for manipulating and transforming HTML/XML
- documents and fragments. It's built on top of Nokogiri and libxml2, so
- it's fast and has a nice API.
-
- Loofah excels at HTML sanitization (XSS prevention). It includes some
- nice HTML sanitizers, which are based on HTML5lib's whitelist, so it
- most likely won't make your codes less secure. (These statements have
- not been evaluated by Netexperts.)
-
- ActiveRecord extensions for sanitization are available in the
- `loofah-activerecord` gem (see
- https://github.com/flavorjones/loofah-activerecord).
-
- == Features
-
- * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's whitelists).
- * Common HTML sanitizing tasks are built-in:
- * _Strip_ unsafe tags, leaving behind only the inner text.
- * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
- * _Escape_ unsafe tags and their subtrees, leaving behind lots of <tt>&lt;</tt> and <tt>&gt;</tt> entities.
- * _Whitewash_ the markup, removing all attributes and namespaced nodes.
- * Common HTML transformation tasks are built-in:
- * Add the _nofollow_ attribute to all hyperlinks.
- * Format markup as plain text, with or without sensible whitespace handling around block elements.
- * Replace Rails's +strip_tags+ and +sanitize+ view helper methods.
-
- == Compare and Contrast
-
- Loofah is one of two known Ruby XSS/sanitization solutions that
- guarantees well-formed and valid markup (the other is Sanitize, which
- also uses Nokogiri).
-
- Loofah works on XML, XHTML and HTML documents.
-
- Also, it's pretty fast. Here is a benchmark comparing Loofah to other
- commonly-used libraries (ActionView, Sanitize, HTML5lib and HTMLfilter):
-
- * https://gist.github.com/170193
-
- Lastly, Loofah is extensible. It's super-easy to write your own custom
- scrubbers for whatever document manipulation you need. You don't like
- the built-in scrubbers? Build your own, like a boss.
-
- == The Basics
-
- Loofah wraps Nokogiri[http://nokogiri.org] in a loving
- embrace. Nokogiri[http://nokogiri.org] is an excellent HTML/XML
- parser. If you don't know how Nokogiri[http://nokogiri.org] works, you
- might want to pause for a moment and go check it out. I'll wait.
-
- Loofah presents the following classes:
-
- * Loofah::HTML::Document and Loofah::HTML::DocumentFragment
- * Loofah::XML::Document and Loofah::XML::DocumentFragment
- * Loofah::Scrubber
-
- The documents and fragments are subclasses of the similar Nokogiri classes.
-
- The Scrubber represents the document manipulation, either by wrapping
- a block,
-
- span2div = Loofah::Scrubber.new do |node|
- node.name = "div" if node.name == "span"
- end
-
- or by implementing a method.
-
- === Side Note: Fragments vs Documents
-
- Generally speaking, unless you expect to have a DOCTYPE and a single
- root node, you don't have a *document*, you have a *fragment*. For
- HTML, another rule of thumb is that *documents* have +html+ and +body+
- tags, and *fragments* usually do not.
-
- HTML fragments should be parsed with Loofah.fragment. The result won't
- be wrapped in +html+ or +body+ tags, won't have a DOCTYPE declaration,
- +head+ elements will be silently ignored, and multiple root nodes are
- allowed.
-
- XML fragments should be parsed with Loofah.xml_fragment. The result
- won't have a DOCTYPE declaration, and multiple root nodes are allowed.
-
- HTML documents should be parsed with Loofah.document. The result will
- have a DOCTYPE declaration, along with +html+, +head+ and +body+ tags.
-
- XML documents should be parsed with Loofah.xml_document. The result
- will have a DOCTYPE declaration and a single root node.
-
- === Loofah::HTML::Document and Loofah::HTML::DocumentFragment
-
- These classes are subclasses of Nokogiri::HTML::Document and
- Nokogiri::HTML::DocumentFragment, so you get all the markup
- fixer-uppery and API goodness of Nokogiri.
-
- The module methods Loofah.document and Loofah.fragment will parse an
- HTML document and an HTML fragment, respectively.
-
- Loofah.document(unsafe_html).is_a?(Nokogiri::HTML::Document) # => true
- Loofah.fragment(unsafe_html).is_a?(Nokogiri::HTML::DocumentFragment) # => true
-
- Loofah injects a +scrub!+ method, which takes either a symbol (for
- built-in scrubbers) or a Loofah::Scrubber object (for custom
- scrubbers), and modifies the document in-place.
-
- Loofah overrides +to_s+ to return HTML:
-
- unsafe_html = "ohai! <div>div is safe</div> <script>but script is not</script>"
-
- doc = Loofah.fragment(unsafe_html).scrub!(:strip)
- doc.to_s # => "ohai! <div>div is safe</div> "
-
- and +text+ to return plain text:
-
- doc.text # => "ohai! div is safe "
-
- Also, +to_text+ is available, which does the right thing with
- whitespace around block-level elements.
-
- doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
- doc.text # => "TitleContent" # probably not what you want
- doc.to_text # => "\nTitle\n\nContent\n" # better
-
- === Loofah::XML::Document and Loofah::XML::DocumentFragment
-
- These classes are subclasses of Nokogiri::XML::Document and
- Nokogiri::XML::DocumentFragment, so you get all the markup
- fixer-uppery and API goodness of Nokogiri.
-
- The module methods Loofah.xml_document and Loofah.xml_fragment will
- parse an XML document and an XML fragment, respectively.
-
- Loofah.xml_document(bad_xml).is_a?(Nokogiri::XML::Document) # => true
- Loofah.xml_fragment(bad_xml).is_a?(Nokogiri::XML::DocumentFragment) # => true
-
- === Nodes and NodeSets
-
- Nokogiri::XML::Node and Nokogiri::XML::NodeSet also get a +scrub!+
- method, which makes it easy to scrub subtrees.
-
- The following code will apply the +employee_scrubber+ only to the
- +employee+ nodes (and their subtrees) in the document:
-
- Loofah.xml_document(bad_xml).xpath("//employee").scrub!(employee_scrubber)
-
- And this code will only scrub the first +employee+ node and its subtree:
-
- Loofah.xml_document(bad_xml).at_xpath("//employee").scrub!(employee_scrubber)
-
- === Loofah::Scrubber
-
- A Scrubber wraps up a block (or method) that is run on a document node:
-
- # change all <span> tags to <div> tags
- span2div = Loofah::Scrubber.new do |node|
- node.name = "div" if node.name == "span"
- end
-
- This can then be run on a document:
-
- Loofah.fragment("<span>foo</span><p>bar</p>").scrub!(span2div).to_s
- # => "<div>foo</div><p>bar</p>"
-
- Scrubbers can be run on a document in either a top-down traversal (the
- default) or bottom-up. Top-down scrubbers can optionally return
- Scrubber::STOP to terminate the traversal of a subtree. Read below and
- in the Loofah::Scrubber class for more detailed usage.
-
- Here's an XML example:
-
- # remove all <employee> tags that have a "deceased" attribute set to true
- bring_out_your_dead = Loofah::Scrubber.new do |node|
- if node.name == "employee" and node["deceased"] == "true"
- node.remove
- Loofah::Scrubber::STOP # don't bother with the rest of the subtree
- end
- end
- Loofah.xml_document(File.read('plague.xml')).scrub!(bring_out_your_dead)
-
- === Built-In HTML Scrubbers
-
- Loofah comes with a set of sanitizing scrubbers that use HTML5lib's
- whitelist algorithm:
-
- doc.scrub!(:strip) # replaces unknown/unsafe tags with their inner text
- doc.scrub!(:prune) # removes unknown/unsafe tags and their children
- doc.scrub!(:escape) # escapes unknown/unsafe tags, like this: &lt;script&gt;
- doc.scrub!(:whitewash) # removes unknown/unsafe/namespaced tags and their children,
- # and strips all node attributes
-
- Loofah also comes with some common transformation tasks:
-
- doc.scrub!(:nofollow) # adds rel="nofollow" attribute to links
- doc.scrub!(:unprintable) # removes unprintable characters from text nodes
-
- See Loofah::Scrubbers for more details and example usage.
-
- === Chaining Scrubbers
-
- You can chain scrubbers:
-
- Loofah.fragment("<span>hello</span> <script>alert('OHAI')</script>") \
- .scrub!(:prune) \
- .scrub!(span2div).to_s
- # => "<div>hello</div> "
-
- === Shorthand
-
- The class methods Loofah.scrub_fragment and Loofah.scrub_document are
- shorthand.
-
- Loofah.scrub_fragment(unsafe_html, :prune)
- Loofah.scrub_document(unsafe_html, :prune)
- Loofah.scrub_xml_fragment(bad_xml, custom_scrubber)
- Loofah.scrub_xml_document(bad_xml, custom_scrubber)
-
- are the same thing as (and arguably semantically clearer than):
-
- Loofah.fragment(unsafe_html).scrub!(:prune)
- Loofah.document(unsafe_html).scrub!(:prune)
- Loofah.xml_fragment(bad_xml).scrub!(custom_scrubber)
- Loofah.xml_document(bad_xml).scrub!(custom_scrubber)
-
- === View Helpers
-
- Loofah has two "view helpers": Loofah::Helpers.sanitize and
- Loofah::Helpers.strip_tags, both of which are drop-in replacements for
- the Rails ActionView helpers of the same name.
- These are no longer required automatically. You must require `loofah/helpers`.
-
- == Requirements
-
- * Nokogiri >= 1.4.4
-
- == Installation
-
- Unsurprisingly:
-
- * gem install loofah
-
- == Support
-
- The bug tracker is available here:
-
- * https://github.com/flavorjones/loofah/issues
-
- And the mailing list is on librelist:
-
- * loofah@librelist.com / http://librelist.com
-
- And the IRC channel is \#loofah on freenode.
-
- == Related Links
-
- * Nokogiri: http://nokogiri.org
- * libxml2: http://xmlsoft.org
- * html5lib: https://code.google.com/p/html5lib
-
- == Authors
-
- * {Mike Dalessio}[http://mike.daless.io] (@flavorjones[https://twitter.com/flavorjones])
- * Bryan Helmkamp
-
- Featuring code contributed by:
-
- * Aaron Patterson
- * John Barnette
- * Josh Owens
- * Paul Dix
- * Luke Melia
-
- And a big shout-out to Corey Innis for the name, and feedback on the API.
-
- == Thank You
-
- The following people have generously donated via the Pledgie[http://pledgie.com] badge on the {Loofah github page}[https://github.com/flavorjones/loofah]:
-
- * Bill Harding
-
- == Historical Note
-
- This library was formerly known as Dryopteris, which was a very bad
- name that nobody could spell properly.
-
- == License
-
- The MIT License
-
- Copyright (c) 2009 -- 2014 by Mike Dalessio, Bryan Helmkamp
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in
- all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- THE SOFTWARE.
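
Note on the removed README text: it says scrubbers run top-down by default or bottom-up, and that a top-down scrubber may return `Scrubber::STOP` to skip a subtree. The following is a toy model of that traversal order in plain Ruby, not Loofah's implementation. `Node`, `STOP`, `traverse_top_down`, and `traverse_bottom_up` are hypothetical names invented for this sketch.

```ruby
# Toy model only: Node, STOP and both traverse_* methods are hypothetical
# and do not come from Loofah or Nokogiri.
Node = Struct.new(:name, :children) do
  def initialize(name, children = [])
    super(name, children)
  end
end

STOP = :stop

# Top-down: visit the node first, then descend unless the block returned STOP.
def traverse_top_down(node, &block)
  return if block.call(node) == STOP
  node.children.each { |child| traverse_top_down(child, &block) }
end

# Bottom-up: descend first, then visit the node. The block's return value
# is ignored in this sketch.
def traverse_bottom_up(node, &block)
  node.children.each { |child| traverse_bottom_up(child, &block) }
  block.call(node)
end

tree = Node.new("div", [
  Node.new("script", [Node.new("b")]),
  Node.new("p", [Node.new("span")]),
])

top_down = []
traverse_top_down(tree) do |node|
  top_down << node.name
  STOP if node.name == "script" # do not descend into the script subtree
end

bottom_up = []
traverse_bottom_up(tree) { |node| bottom_up << node.name }

puts top_down.inspect
puts bottom_up.inspect
```

In the top-down run the `b` child of `script` is never visited, because the block returned `STOP` for its parent. The bottom-up run visits every node, children before parents.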