loofah 0.3.1 → 0.4.0


data.tar.gz.sig CHANGED
Binary file
@@ -1,5 +1,14 @@
1
1
  = Changelog
2
2
 
3
+ == 0.4.0
4
+
5
+ Enhancements:
6
+
7
+ * Scrubber class introduced, allowing development of custom scrubbers.
8
+ * Added support for XML documents and fragments.
9
+ * Added :nofollow HTML scrubber (thanks Luke Melia!)
10
+ * Built-in scrubbing methods refactored to use Scrubber.
11
+
3
12
  == 0.3.1 (2009-10-12)
4
13
 
5
14
  Bug fixes:
@@ -17,14 +17,16 @@ lib/loofah/html/document.rb
17
17
  lib/loofah/html/document_fragment.rb
18
18
  lib/loofah/html5/scrub.rb
19
19
  lib/loofah/html5/whitelist.rb
20
+ lib/loofah/instance_methods.rb
20
21
  lib/loofah/scrubber.rb
22
+ lib/loofah/scrubbers.rb
21
23
  lib/loofah/xss_foliate.rb
22
24
  test/helper.rb
23
25
  test/html5/test_sanitizer.rb
24
- test/html5/testdata/tests1.dat
25
26
  test/test_active_record.rb
26
27
  test/test_ad_hoc.rb
27
28
  test/test_api.rb
28
29
  test/test_helpers.rb
29
30
  test/test_scrubber.rb
31
+ test/test_scrubbers.rb
30
32
  test/test_xss_foliate.rb
@@ -4,139 +4,253 @@
4
4
  * http://rubyforge.org/projects/loofah
5
5
  * http://github.com/flavorjones/loofah
6
6
 
7
- == DESCRIPTION
7
+ == Description
8
+
9
+ Loofah is a general library for manipulating HTML/XML documents and
10
+ fragments. It's built on top of Nokogiri and libxml2, so it's fast and
11
+ has a nice API.
12
+
13
+ Loofah excels at HTML sanitization (XSS prevention). It includes some
14
+ nice HTML sanitizers, which are based on HTML5lib's whitelist, so it
15
+ most likely won't make your codes less secure. (These statements have
16
+ not been evaluated by Netexperts.)
17
+
18
+ == Features
19
+
20
+ * Easily write custom scrubbers for HTML/XML leveraging the sweetness of Nokogiri (and HTML5lib's whitelists).
21
+ * Common HTML sanitizing tasks are built-in:
22
+ * _Strip_ unsafe tags, leaving behind only the inner text.
23
+ * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
24
+ * _Escape_ unsafe tags and their subtrees, leaving behind lots of <tt>&lt;</tt> and <tt>&gt;</tt> entities.
25
+ * _Whitewash_ the markup, removing all attributes and namespaced nodes.
26
+ * Common HTML transformation tasks are built-in:
27
+ * Add the _nofollow_ attribute to all hyperlinks.
28
+ * Format markup as plain text.
29
+ * Replace Rails's +strip_tags+ and +sanitize+ helper methods.
30
+ * Two ActiveRecord extensions:
31
+ * Loofah::XssFoliate, an XssTerminate[http://github.com/look/xss_terminate/tree/master] drop-in replacement, is an *opt-out* sanitizer. By default all models and attributes are sanitized.
32
+ * Loofah::ActiveRecordExtension is an *opt-in* sanitizer. You must explicitly declare attributes to be sanitized.
33
+
34
+ == Compare and Contrast
35
+
36
+ Loofah is the only Ruby XSS/sanitization solution that guarantees
37
+ well-formed and valid markup.
8
38
 
9
- Loofah is an HTML sanitizer. It will always fix broken markup, but
10
- can also sanitize unsafe tags in a few different ways, and transform
11
- the markup for storage or display.
39
+ Loofah works fine on XML, XHTML and HTML documents.
12
40
 
13
- It's built on top of Nokogiri and libxml2, so it's fast. And it uses
14
- html5lib's whitelist, so it most likely won't make your codes less
15
- secure. \*
41
+ Also, it's pretty fast. Here is a benchmark comparing Loofah to other
42
+ commonly-used libraries (ActionView, Sanitize and HTML5lib):
16
43
 
17
- \* These statements have not been evaluated by Netexperts.
44
+ * http://gist.github.com/170193
18
45
 
19
- == FEATURES
46
+ Lastly, Loofah is extensible. It's super-easy to write your own custom
47
+ scrubbers for whatever document manipulation you need. You don't like
48
+ the built-in scrubbers? Build your own, like a boss.
20
49
 
21
- * _Strip_ unsafe tags, leaving behind only the inner text.
22
- * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
23
- * _Escape_ unsafe tags and their subtrees, leaving behind lots of <tt>&lt;</tt> and <tt>&gt;</tt> entities.
24
- * _Whitewash_ the markup, removing all attributes and namespaced nodes.
25
- * Format the markup as plain text.
26
- * Replacements for Rails's +strip_tags+ and +sanitize+ helper methods.
27
- * TWO! Count them, TWO! ActiveRecord extensions:
28
- * Loofah::XssFoliate (an XssTerminate[http://github.com/look/xss_terminate/tree/master] drop-in replacement) is an *opt-out* sanitizer; by default all models and attributes are sanitized.
29
- * Loofah::ActiveRecordExtension is an *opt-in* sanitizer; you must explicitly declare attributes to be sanitized.
30
- * 99 44/100 % pure
50
+ == The Basics
31
51
 
32
- == COMPARE AND CONTRAST
52
+ Loofah wraps Nokogiri[http://nokogiri.org] in a loving
53
+ embrace. Nokogiri[http://nokogiri.org] is an excellent HTML/XML
54
+ parser. If you don't know how Nokogiri[http://nokogiri.org] works, you
55
+ might want to pause for a moment and go check it out. I'll wait.
33
56
 
34
- Loofah is the only ruby XSS/sanitization library that guarantees
35
- well-formed and valid markup.
57
+ Loofah presents the following classes:
36
58
 
37
- Also, it's pretty fast. Here is a benchmark comparing Loofah to other
38
- commonly-used libraries:
59
+ * Loofah::HTML::Document and Loofah::HTML::DocumentFragment
60
+ * Loofah::XML::Document and Loofah::XML::DocumentFragment
61
+ * Loofah::Scrubber
39
62
 
40
- * http://gist.github.com/170193
63
+ The documents and fragments are subclasses of the similar Nokogiri classes.
41
64
 
42
- == SYNOPSIS
65
+ The Scrubber represents the document manipulation, either by wrapping
66
+ a block,
43
67
 
44
- For a full explanation, see the documentation for Loofah.
68
+ span2div = Loofah::Scrubber.new do |node|
69
+ node.name = "div" if node.name == "span"
70
+ end
45
71
 
46
- require 'loofah'
72
+ or by implementing a method.
47
73
 
48
- unsafe_html = "ohai! <div>a div is safe</div> <script>but script is not</script>"
74
+ === Side Note: Fragments vs Documents
49
75
 
50
- Loofah.scrub_fragment(unsafe_html, :prune).to_s # => "ohai! <div>div is safe</div> "
76
+ Generally speaking, unless you expect to have a DOCTYPE and a single
77
+ root node, you don't have a *document*, you have a *fragment*. For
78
+ HTML, another rule of thumb is that *documents* have \&lt;html\&gt;
79
+ and \&lt;body\&gt; tags, and *fragments* usually do not.
51
80
 
52
- OR
81
+ HTML fragments should be parsed with Loofah.fragment. Loofah won't
82
+ wrap the result in +html+ and +body+ tags, won't add a DOCTYPE
83
+ declaration, and will ignore +head+ elements.
53
84
 
54
- doc = Loofah.fragment(unsafe_html) # returns a Nokogiri document ...
55
- doc.scrub!(:prune) # ... with one extra method
56
- doc.to_s # => "ohai! <div>div is safe</div> "
57
- doc.text # => "ohai! div is safe "
85
+ XML fragments should be parsed with Loofah.xml_fragment. Loofah won't
86
+ add a DOCTYPE declaration and will allow multiple root nodes.
58
87
 
59
- === ACTIVERECORD EXTENSION \#1: OPT-IN
88
+ HTML documents should be parsed with Loofah.document, which will add
89
+ the DOCTYPE declaration, and properly handle +head+ and +body+
90
+ elements.
60
91
 
61
- See Loofah::ActiveRecordExtension for more documentation. The methods
62
- mixed into ActiveRecord are:
92
+ XML documents should be parsed with Loofah.xml_document. Loofah will
93
+ make sure there's a DOCTYPE declaration and a single root node.
63
94
 
64
- * Loofah::ActiveRecordExtension.html_document
65
- * Loofah::ActiveRecordExtension.html_fragment
95
+ === Loofah::HTML::Document and Loofah::HTML::DocumentFragment
66
96
 
67
- Example:
97
+ These classes are subclasses of Nokogiri::HTML::Document and
98
+ Nokogiri::HTML::DocumentFragment, so you get all the markup
99
+ fixer-uppery and API goodness of Nokogiri.
68
100
 
69
- # config/environment.rb
70
- Rails::Initializer.run do |config|
71
- config.gem 'loofah'
72
- end
101
+ The module methods Loofah.document and Loofah.fragment will parse an
102
+ HTML document and an HTML fragment, respectively.
73
103
 
74
- # db/schema.rb
75
- create_table "posts" do |t|
76
- t.string "title"
77
- t.text "body"
78
- t.string "author"
79
- end
104
+ Loofah.document(unsafe_html).is_a?(Nokogiri::HTML::Document) # => true
105
+ Loofah.fragment(unsafe_html).is_a?(Nokogiri::HTML::DocumentFragment) # => true
80
106
 
81
- # app/model/post.rb
82
- class Post < ActiveRecord::Base
83
- html_fragment :body, :scrub => :prune # scrubs 'body' in a before_validation
84
- end
107
+ Loofah injects a +scrub!+ method, which takes either a symbol (for
108
+ built-in scrubbers) or a Loofah::Scrubber object (for custom
109
+ scrubbers), and modifies the document in-place.
85
110
 
86
- === ACTIVERECORD EXTENSION \#2: OPT-OUT (XSS_TERMINATE DROP-IN REPLACEMENT)
111
+ Loofah overrides +to_s+ to return HTML:
87
112
 
88
- See Loofah::XssFoliate::ClassMethods for more documentation. The methods mixed into ActiveRecord are:
113
+ unsafe_html = "ohai! <div>div is safe</div> <script>but script is not</script>"
89
114
 
90
- * Loofah::XssFoliate::ClassMethods.xss_foliate
91
- * Loofah::XssFoliate::ClassMethods.xss_foliated?
115
+ doc = Loofah.fragment(unsafe_html).scrub!(:prune)
116
+ doc.to_s # => "ohai! <div>div is safe</div> "
92
117
 
93
- If the constant LOOFAH_XSS_FOLIATE_ALL_MODELS is set, then all models
94
- inheriting from ActiveRecord::Base will sanitize their string and text
95
- attributes in a before_validate. Otherwise, the xss_foliate method is
96
- available for opting in to sanitization.
118
+ and +text+ to return plain text:
97
119
 
98
- Example:
120
+ doc.text # => "ohai! div is safe "
99
121
 
100
- # config/environment
101
- LOOFAH_XSS_FOLIATE_ALL_MODELS = true
102
- Rails::Initializer.run do |config|
103
- config.gem "loofah"
104
- end
122
+ === Loofah::XML::Document and Loofah::XML::DocumentFragment
123
+
124
+ These classes are subclasses of Nokogiri::XML::Document and
125
+ Nokogiri::XML::DocumentFragment, so you get all the markup
126
+ fixer-uppery and API goodness of Nokogiri.
127
+
128
+ The module methods Loofah.xml_document and Loofah.xml_fragment will
129
+ parse an XML document and an XML fragment, respectively.
130
+
131
+ Loofah.xml_document(bad_xml).is_a?(Nokogiri::XML::Document) # => true
132
+ Loofah.xml_fragment(bad_xml).is_a?(Nokogiri::XML::DocumentFragment) # => true
133
+
134
+ === Loofah::Scrubber
105
135
 
106
- # db/schema.rb
107
- create_table "posts" do |t|
108
- t.string "title"
109
- t.text "body"
110
- t.string "author"
136
+ A Scrubber wraps up a block (or method) that is run on a document node:
137
+
138
+ # change all <span> tags to <div> tags
139
+ span2div = Loofah::Scrubber.new do |node|
140
+ node.name = "div" if node.name == "span"
111
141
  end
112
142
 
113
- # app/model/post.rb
114
- class Post < ActiveRecord::Base
115
- # by default, 'title', 'body' and 'author' will all be sanitized in a before_validation,
116
- # without additional declarations.
143
+ This can then be run on a document:
144
+
145
+ Loofah.fragment("<span>foo</span><p>bar</p>").scrub!(span2div).to_s
146
+ # => "<div>foo</div><p>bar</p>"
147
+
148
+ Scrubbers can be run on a document in either a top-down traversal (the
149
+ default) or bottom-up. Top-down scrubbers can optionally return
150
+ Scrubber::STOP to terminate the traversal of a subtree. Read below and
151
+ in the Loofah::Scrubber class for more detailed usage.
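To make the two traversal orders concrete, here is a toy model in plain Ruby. It is not Loofah's implementation: the Struct-based tree, the +stop+ marker (standing in for Loofah::Scrubber::STOP) and the lambdas are all invented for this sketch.

```ruby
# Toy model of scrubber traversal order; this is NOT Loofah's implementation.
node_type = Struct.new(:name, :children)
stop = :stop # plays the role of Loofah::Scrubber::STOP

# Top-down: visit the node first, then descend unless the visitor returned stop.
top_down = lambda do |node, visitor|
  next if visitor.call(node) == stop
  node.children.each { |child| top_down.call(child, visitor) }
end

# Bottom-up: descend first, then visit the node; there is no early exit.
bottom_up = lambda do |node, visitor|
  node.children.each { |child| bottom_up.call(child, visitor) }
  visitor.call(node)
end

tree = node_type.new("staff", [
  node_type.new("employee", [node_type.new("name", [])]),
  node_type.new("contractor", [node_type.new("name", [])]),
])

top_down_order = []
top_down.call(tree, lambda { |node|
  top_down_order << node.name
  stop if node.name == "employee" # skip the employee subtree
})

bottom_up_order = []
bottom_up.call(tree, lambda { |node| bottom_up_order << node.name })

p top_down_order  # => ["staff", "employee", "contractor", "name"]
p bottom_up_order # => ["name", "employee", "name", "contractor", "staff"]
```

The +name+ under +employee+ never shows up in the top-down run, which is the whole point of returning STOP.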
152
+
153
+ Here's an XML example:
154
+
155
+ # remove all <employee> tags that have a "deceased" attribute set to true
156
+ bring_out_your_dead = Loofah::Scrubber.new do |node|
157
+ if node.name == "employee" and node["deceased"] == "true"
158
+ node.remove
159
+ Loofah::Scrubber::STOP # don't bother with the rest of the subtree
160
+ end
117
161
  end
162
+ Loofah.xml_document(File.read('plague.xml')).scrub!(bring_out_your_dead)
118
163
 
119
- OR
164
+ === Built-In HTML Scrubbers
165
+
166
+ Loofah comes with a set of sanitizing scrubbers that use HTML5lib's
167
+ whitelist algorithm:
168
+
169
+ doc.scrub!(:strip) # replaces unknown/unsafe tags with their inner text
170
+ doc.scrub!(:prune) # removes unknown/unsafe tags and their children
171
+ doc.scrub!(:escape) # escapes unknown/unsafe tags, like this: &lt;script&gt;
172
+ doc.scrub!(:whitewash) # removes unknown/unsafe/namespaced tags and their children,
173
+ # and strips all node attributes
174
+
175
+ Loofah also comes with some common transformation tasks:
176
+
177
+ doc.scrub!(:nofollow) # adds rel="nofollow" attribute to links
178
+
179
+ See Loofah::Scrubbers for more details and example usage.
180
+
181
+ === Chaining Scrubbers
182
+
183
+ You can chain scrubbers:
184
+
185
+ Loofah.fragment("<span>hello</span> <script>alert('OHAI')</script>") \
186
+ .scrub!(:prune) \
187
+ .scrub!(span2div).to_s
188
+ # => "<div>hello</div> "
189
+
190
+ === Shorthand
191
+
192
+ The module methods Loofah.scrub_fragment, Loofah.scrub_document,
193
+ Loofah.scrub_xml_fragment and Loofah.scrub_xml_document are shorthand.
194
+
195
+ Loofah.scrub_fragment(unsafe_html, :prune)
196
+ Loofah.scrub_document(unsafe_html, :prune)
197
+ Loofah.scrub_xml_fragment(bad_xml, custom_scrubber)
198
+ Loofah.scrub_xml_document(bad_xml, custom_scrubber)
199
+
200
+ are the same thing as (and arguably semantically clearer than):
201
+
202
+ Loofah.fragment(unsafe_html).scrub!(:prune)
203
+ Loofah.document(unsafe_html).scrub!(:prune)
204
+ Loofah.xml_fragment(bad_xml).scrub!(custom_scrubber)
205
+ Loofah.xml_document(bad_xml).scrub!(custom_scrubber)
206
+
207
+ === ActiveRecord Extension \#1: Opt-In
208
+
209
+ See Loofah::ActiveRecordExtension for full documentation. The methods
210
+ mixed into ActiveRecord are:
211
+
212
+ * Loofah::ActiveRecordExtension.html_document
213
+ * Loofah::ActiveRecordExtension.html_fragment
214
+
215
+ which are used to declare how specific string and text attributes
216
+ should be scrubbed at +before_validation+.
120
217
 
121
218
  # app/model/post.rb
122
219
  class Post < ActiveRecord::Base
123
- # opt-out and/or modify the sanitization method used
124
- xss_foliate :except => [:title, :author], :prune => :body
220
+ html_fragment :body, :scrub => :prune # scrubs 'body' at before_validation
125
221
  end
126
222
 
127
- == REQUIREMENTS
223
+ === ActiveRecord Extension \#2: Opt-Out
224
+
225
+ See Loofah::XssFoliate::ClassMethods for more documentation. The methods mixed into ActiveRecord are:
226
+
227
+ * Loofah::XssFoliate::ClassMethods.xss_foliate
228
+ * Loofah::XssFoliate::ClassMethods.xss_foliated?
229
+
230
+ which are used to declare how specific string and text attributes
231
+ should be scrubbed at +before_validation+.
232
+
233
+ Attributes are stripped by default, unless another scrubber is
234
+ specified or the attribute is present in an +:except+ clause.
235
+
236
+ === View Helpers
237
+
238
+ Loofah has two "view helpers": Loofah::Helpers.sanitize and
239
+ Loofah::Helpers.strip_tags, both of which are drop-in replacements for
240
+ the ActionView helpers of the same name.
241
+
242
+ == Requirements
128
243
 
129
244
  * Nokogiri >= 1.3.3
130
- * ruby 1.8.6, 1.8.7 or 1.9
131
- * Rails 2.3, 2.2, 2.1, 2.0 or 1.2 (for ActiveRecord extensions)
245
+ * Rails 2.3, 2.2, 2.1, 2.0 or 1.2 (if you're using the ActiveRecord extensions)
132
246
 
133
- == INSTALLATION
247
+ == Installation
134
248
 
135
249
  Unsurprisingly:
136
250
 
137
251
  * gem install loofah
138
252
 
139
- == SUPPORT
253
+ == Support
140
254
 
141
255
  The bug tracker is available here:
142
256
 
@@ -148,16 +262,16 @@ For now, we're piggybacking on the Nokogiri mailing list:
148
262
 
149
263
  And the IRC channel is #nokogiri on freenode.
150
264
 
151
- == RELATED LINKS
265
+ == Related Links
152
266
 
153
267
  * Nokogiri: http://nokogiri.org
154
268
  * libxml2: http://xmlsoft.org
155
269
  * html5lib: http://code.google.com/p/html5lib
156
270
  * XssTerminate: http://github.com/look/xss_terminate/tree/master
157
271
 
158
- == AUTHORS
272
+ == Authors
159
273
 
160
- * {Mike Dalessio}[mailto:mike.dalessio@gmail.com]
274
+ * {Mike Dalessio}[mailto:mike.dalessio@gmail.com] (@flavorjones)
161
275
  * {Bryan Helmkamp}[mailto:bryan@brynary.com]
162
276
 
163
277
  Featuring code contributed by:
@@ -167,18 +281,35 @@ Featuring code contributed by:
167
281
  * Josh Owens
168
282
  * Paul Dix
169
283
  * Josh Nichols
284
+ * Luke Melia
170
285
 
171
286
  And a big shout-out to Corey Innis for the name, and feedback on the API.
172
287
 
173
- == HISTORICAL NOTE
288
+ == Historical Note
174
289
 
175
290
  This library was formerly known as Dryopteris, which was a very bad
176
291
  name that nobody could spell properly.
177
292
 
178
- == LICENSE
293
+ == License
179
294
 
180
295
  The MIT License
181
296
 
182
297
  Copyright (c) 2009 Mike Dalessio, Bryan Helmkamp
183
298
 
184
- See MIT-LICENSE.txt in this directory.
299
+ Permission is hereby granted, free of charge, to any person obtaining a copy
300
+ of this software and associated documentation files (the "Software"), to deal
301
+ in the Software without restriction, including without limitation the rights
302
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
303
+ copies of the Software, and to permit persons to whom the Software is
304
+ furnished to do so, subject to the following conditions:
305
+
306
+ The above copyright notice and this permission notice shall be included in
307
+ all copies or substantial portions of the Software.
308
+
309
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
310
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
311
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
312
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
313
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
314
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
315
+ THE SOFTWARE.
data/Rakefile CHANGED
@@ -13,6 +13,9 @@ Hoe.spec "loofah" do
13
13
  self.readme_file = "README.rdoc"
14
14
 
15
15
  extra_deps << ["nokogiri", ">= 1.3.3"]
16
+ extra_dev_deps << ["mocha", ">=0.9"]
17
+ extra_dev_deps << ["thoughtbot-shoulda", ">=2.10"]
18
+ extra_dev_deps << ["acts_as_fu", ">=0.0.5"]
16
19
 
17
20
  # note: .hoerc should have the following line to omit rails tests and tmp
18
21
  # exclude: !ruby/regexp /\/tmp\/|\/rails_tests\/|CVS|TAGS|\.(svn|git|DS_Store)/
@@ -44,10 +47,15 @@ task :fix_css do
44
47
  margin-top: .5em ;
45
48
  }
46
49
 
47
- div#main ul {
48
- list-style-type: disc ;
49
- list-style-position: inside ;
50
+ #main ul, div#documentation ul {
51
+ list-style-type: disc ! IMPORTANT ;
52
+ list-style-position: inside ! IMPORTANT ;
50
53
  }
54
+
55
+ h2 + ul {
56
+ margin-top: 1em;
57
+ }
58
+
51
59
  EOT
52
60
  puts "* fixing css"
53
61
  File.open("doc/rdoc.css", "a") { |f| f.write better_css }