loofah 0.2.2 → 0.3.0

data.tar.gz.sig CHANGED
Binary file
@@ -1,5 +1,19 @@
  = Changelog
 
+ == 0.3.0 (2009-10-06)
+
+ Enhancements:
+
+ * New ActiveRecord extension `xss_foliate`, a drop-in replacement for xss_terminate[http://github.com/look/xss_terminate/tree/master].
+ * Replacement methods for Rails's helpers, Loofah::Rails.sanitize and Loofah::Rails.strip_tags.
+ * Official support (and test coverage) for Rails versions 2.3, 2.2, 2.1, 2.0 and 1.2.
+
+ Deprecations:
+
+ * The methods strip_tags, whitewash, whitewash_document, sanitize, and
+ sanitize_document have been deprecated. See DEPRECATED.rdoc for
+ details on the equivalent calls with the post-0.2 API.
+
  == 0.2.2 (2009-09-30)
 
  Enhancements:
@@ -0,0 +1,12 @@
+ = Deprecations
+
+ In Loofah 0.3.0, some methods have been deprecated. The following
+ lists the equivalent calls with the post-0.2 API:
+
+ * <tt>strip_tags(string_or_io)</tt> is now <tt>scrub_document(string_or_io, :prune).text</tt>
+ * <tt>whitewash(string_or_io)</tt> is now <tt>scrub_fragment(string_or_io, :whitewash).to_s</tt>
+ * <tt>whitewash_document(string_or_io)</tt> is now <tt>scrub_document(string_or_io, :whitewash).to_s</tt>
+ * <tt>sanitize(string_or_io)</tt> is now <tt>scrub_fragment(string_or_io, :escape).to_xml</tt>
+ * <tt>sanitize_document(string_or_io)</tt> is now <tt>scrub_document(string_or_io, :escape).to_xml</tt>
+
+ Have a nice day.
@@ -1,4 +1,5 @@
  CHANGELOG.rdoc
+ DEPRECATED.rdoc
  MIT-LICENSE.txt
  Manifest.txt
  README.rdoc
@@ -11,19 +12,19 @@ benchmark/www.slashdot.com.html
  init.rb
  lib/loofah.rb
  lib/loofah/active_record.rb
- lib/loofah/deprecated.rb
+ lib/loofah/helpers.rb
  lib/loofah/html/document.rb
  lib/loofah/html/document_fragment.rb
  lib/loofah/html5/scrub.rb
  lib/loofah/html5/whitelist.rb
  lib/loofah/scrubber.rb
+ lib/loofah/xss_foliate.rb
  test/helper.rb
- test/html5/test_deprecated_sanitizer.rb
  test/html5/test_sanitizer.rb
  test/html5/testdata/tests1.dat
  test/test_active_record.rb
+ test/test_ad_hoc.rb
  test/test_api.rb
- test/test_deprecated_basic.rb
- test/test_microsofty.rb
+ test/test_helpers.rb
  test/test_scrubber.rb
- test/test_strip_tags.rb
+ test/test_xss_foliate.rb
@@ -1,34 +1,41 @@
  = Loofah
 
- * http://loofah.rubyforge.org/
+ * http://loofah.rubyforge.org
  * http://rubyforge.org/projects/loofah
  * http://github.com/flavorjones/loofah
 
  == DESCRIPTION
 
- Loofah is an HTML sanitizer. It will *always* fix broken markup, but
+ Loofah is an HTML sanitizer. It will always fix broken markup, but
  can also sanitize unsafe tags in a few different ways, and transform
  the markup for storage or display.
 
  It's built on top of Nokogiri and libxml2, so it's fast. And it uses
  html5lib's whitelist, so it most likely won't make your codes less
- secure.
+ secure. \*
 
- (These statements have not been evaluated by Internet Experts.)
-
- This library was formerly known as Dryopteris.
+ \* These statements have not been evaluated by Netexperts.
 
  == FEATURES
 
- * Strip unsafe tags, leaving behind only the inner text.
- * Prune unsafe tags and their subtrees, removing all traces that they ever existed.
- * Escape unsafe tags and their subtrees, leaving behind lots of <tt>&lt;</tt> and <tt>&gt;</tt> entities.
- * Whitewash the markup, removing all attributes and namespaced nodes.
+ * _Strip_ unsafe tags, leaving behind only the inner text.
+ * _Prune_ unsafe tags and their subtrees, removing all traces that they ever existed.
+ * _Escape_ unsafe tags and their subtrees, leaving behind lots of <tt>&lt;</tt> and <tt>&gt;</tt> entities.
+ * _Whitewash_ the markup, removing all attributes and namespaced nodes.
  * Format the markup as plain text.
- * ActiveRecord extension.
- * 99 44/100 % Tenderlove-free!
+ * Replacements for Rails's +strip_tags+ and +sanitize+ helper methods.
+ * TWO! Count them, TWO! ActiveRecord extensions:
+   * Loofah::XssFoliate (an XssTerminate[http://github.com/look/xss_terminate/tree/master] drop-in replacement) is an *opt-out* sanitizer; by default all models and attributes are sanitized.
+   * Loofah::ActiveRecordExtension is an *opt-in* sanitizer; you must explicitly declare attributes to be sanitized.
+ * 99 44/100 % pure
+
+ == COMPARE AND CONTRAST
 
- Here is a speed test comparing Loofah to other commonly-used sanitization libraries:
+ Loofah is the only ruby XSS/sanitization library that guarantees
+ well-formed and valid markup.
+
+ Also, it's pretty fast. Here is a benchmark comparing Loofah to other
+ commonly-used libraries:
 
  * http://gist.github.com/170193
 
@@ -49,7 +56,15 @@ OR
  doc.to_s # => "ohai! <div>div is safe</div> "
  doc.text # => "ohai! div is safe "
 
- === ACTIVERECORD EXTENSION
+ === ACTIVERECORD EXTENSION \#1: OPT-IN
+
+ See Loofah::ActiveRecordExtension for more documentation. The methods
+ mixed into ActiveRecord are:
+
+ * Loofah::ActiveRecordExtension.html_document
+ * Loofah::ActiveRecordExtension.html_fragment
+
+ Example:
 
  # config/environment.rb
  Rails::Initializer.run do |config|
  Rails::Initializer.run do |config|
@@ -59,7 +74,8 @@ OR
  # db/schema.rb
  create_table "posts" do |t|
    t.string "title"
-   t.string "body"
+   t.text "body"
+   t.string "author"
  end
 
  # app/model/post.rb
  # app/model/post.rb
@@ -67,10 +83,52 @@ OR
    html_fragment :body, :scrub => :prune # scrubs 'body' in a before_validation
  end
 
+ === ACTIVERECORD EXTENSION \#2: OPT-OUT (XSS_TERMINATE DROP-IN REPLACEMENT)
+
+ See Loofah::XssFoliate::ClassMethods for more documentation. The methods mixed into ActiveRecord are:
+
+ * Loofah::XssFoliate::ClassMethods.xss_foliate
+ * Loofah::XssFoliate::ClassMethods.xss_foliated?
+
+ If the constant LOOFAH_XSS_FOLIATE_ALL_MODELS is set, then all models
+ inheriting from ActiveRecord::Base will sanitize their string and text
+ attributes in a before_validate. Otherwise, the xss_foliate method is
+ available for opting in to sanitization.
+
+ Example:
+
+ # config/environment
+ LOOFAH_XSS_FOLIATE_ALL_MODELS = true
+ Rails::Initializer.run do |config|
+   config.gem "loofah"
+ end
+
+ # db/schema.rb
+ create_table "posts" do |t|
+   t.string "title"
+   t.text "body"
+   t.string "author"
+ end
+
+ # app/model/post.rb
+ class Post < ActiveRecord::Base
+   # by default, 'title', 'body' and 'author' will all be sanitized in a before_validation,
+   # without additional declarations.
+ end
+
+ OR
+
+ # app/model/post.rb
+ class Post < ActiveRecord::Base
+   # opt-out and/or modify the sanitization method used
+   xss_foliate :except => [:title, :author], :prune => :body
+ end
+
  == REQUIREMENTS
 
- * ruby 1.8 or 1.9
  * Nokogiri >= 1.3.3
+ * ruby 1.8.6, 1.8.7 or 1.9
+ * Rails 2.3, 2.2, 2.1, 2.0 or 1.2 (for ActiveRecord extensions)
 
  == INSTALLATION
 
 
@@ -84,7 +142,7 @@ The bug tracker is available here:
 
  * http://github.com/flavorjones/loofah/issues
 
- You can also try the Nokogiri mailing list:
+ For now, we're piggybacking on the Nokogiri mailing list:
 
  * http://groups.google.com/group/nokogiri-talk
 
 
@@ -92,9 +150,10 @@ And the IRC channel is #nokogiri on freenode.
 
  == RELATED LINKS
 
- * Nokogiri: http://nokogiri.org/
- * libxml2: http://xmlsoft.org/
- * html5lib: http://code.google.com/p/html5lib/
+ * Nokogiri: http://nokogiri.org
+ * libxml2: http://xmlsoft.org
+ * html5lib: http://code.google.com/p/html5lib
+ * XssTerminate: http://github.com/look/xss_terminate/tree/master
 
  == AUTHORS
 
@@ -109,6 +168,13 @@ Featuring code contributed by:
  * Paul Dix
  * Josh Nichols
 
+ And a big shout-out to Corey Innis for the name, and feedback on the API.
+
+ == HISTORICAL NOTE
+
+ This library was formerly known as Dryopteris, which was a very bad
+ name that nobody could spell properly.
+
  == LICENSE
 
  The MIT License
data/Rakefile CHANGED
@@ -13,4 +13,42 @@ Hoe.spec "loofah" do
    self.readme_file = "README.rdoc"
 
    extra_deps << ["nokogiri", ">= 1.3.3"]
+
+   # note: .hoerc should have the following line to omit rails tests and tmp
+   # exclude: !ruby/regexp /\/tmp\/|\/rails_tests\/|CVS|TAGS|\.(svn|git|DS_Store)/
+ end
+
+ if File.exist?("rails_test/Rakefile")
+   load "rails_test/Rakefile"
+ else
+   task :test do
+     puts "----------"
+     puts "-- NOTE: An additional Rails regression test suite is available in source repository"
+     puts "----------"
+   end
  end
+
+ task :redocs => :fix_css
+ task :fix_css do
+   better_css = <<-EOT
+     .method-description pre {
+       margin: 1em 0 ;
+     }
+
+     .method-description ul {
+       padding: .5em 0 .5em 2em ;
+     }
+
+     .method-description p {
+       margin-top: .5em ;
+     }
+
+     div#main ul {
+       list-style-type: disc ;
+       list-style-position: inside ;
+     }
+   EOT
+   puts "* fixing css"
+   File.open("doc/rdoc.css", "a") { |f| f.write better_css }
+ end
+
@@ -1,49 +1,129 @@
  #!/usr/bin/env ruby
  require "#{File.dirname(__FILE__)}/helper.rb"
 
- BIG_FILE = File.read(File.join(File.dirname(__FILE__), "www.slashdot.com.html"))
- FRAGMENT = File.read(File.join(File.dirname(__FILE__), "fragment.html"))
- SNIPPET = "This is typical form field input in <b>length and content."
-
- def bench(content, ntimes, fragment_p)
-   Benchmark.bm(15) do |x|
-     x.report('Loofah') do
-       ntimes.times do
-         if fragment_p
-           Loofah.scrub_fragment(content, :escape)
-         else
-           Loofah.scrub_document(content, :escape)
-         end
-       end
+ def compare_scrub_methods
+   snip = "<div>foo</div><foo>fuxx <b>quux</b></foo><script>i have a chair</script>"
+   puts "starting with: #{snip}"
+   puts
+   puts RailsSanitize.new.sanitize(snip) # => Rails.sanitize / scrub!(:prune).to_s
+   puts Loofah::Helpers.sanitize(snip)
+   puts "--"
+   puts RailsSanitize.new.strip_tags(snip) # => Rails.strip_tags / parse().text
+   puts Loofah::Helpers.strip_tags(snip)
+   puts "--"
+   puts Sanitize.clean(snip, Sanitize::Config::RELAXED) # => scrub!(:strip).to_s
+   puts Loofah.scrub_fragment(snip, :strip).to_s
+   puts "--"
+   puts HTML5libSanitize.new.sanitize(snip) # => scrub!(:escape).to_s
+   puts Loofah.scrub_fragment(snip, :escape).to_s
+   puts "--"
+ end
+
+ module TestSet
+   def test_set options={}
+     scale = options[:rehearse] ? 10 : 1
+     puts self.class.name
+
+     n = 100 / scale
+     puts " Large document, #{BIG_FILE.length} bytes (x#{n})"
+     bench BIG_FILE, n, false
+     puts
+
+     n = 1000 / scale
+     puts " Small fragment, #{FRAGMENT.length} bytes (x#{n})"
+     bench FRAGMENT, n, true
+     puts
+
+     n = 10_000 / scale
+     puts " Text snippet, #{SNIPPET.length} bytes (x#{n})"
+     bench SNIPPET, n, true
+     puts
+   end
+ end
+
+ class HeadToHead < Measure
+ end
+
+ class HeadToHeadRailsSanitize < Measure
+   include TestSet
+   def bench(content, ntimes, fragment_p)
+     clear_measure
+
+     measure "Loofah::Helpers.sanitize", ntimes do
+       Loofah::Helpers.sanitize content
      end
-
-     x.report('ActionView') do
-       sanitizer = RailsSanitize.new
-
-       ntimes.times do
-         sanitizer.sanitize(content)
-       end
+
+     sanitizer = RailsSanitize.new
+     measure "ActionView sanitize", ntimes do
+       sanitizer.sanitize(content)
+     end
+   end
+ end
+
+ class HeadToHeadRailsStripTags < Measure
+   include TestSet
+   def bench(content, ntimes, fragment_p)
+     clear_measure
+
+     measure "Loofah::Helpers.strip_tags", ntimes do
+       Loofah::Helpers.strip_tags content
      end
-
-     x.report('Sanitize') do
-       ntimes.times do
-         Sanitize.clean(content, Sanitize::Config::RELAXED)
+
+     sanitizer = RailsSanitize.new
+     measure "ActionView strip_tags", ntimes do
+       sanitizer.strip_tags(content)
+     end
+   end
+ end
+
+ class HeadToHeadSanitizerSanitize < Measure
+   include TestSet
+   def bench(content, ntimes, fragment_p)
+     clear_measure
+
+     measure "Loofah :strip", ntimes do
+       if fragment_p
+         Loofah.scrub_fragment(content, :strip).to_s
+       else
+         Loofah.scrub_document(content, :strip).to_s
        end
      end
-
-     x.report('HTML5lib') do
-       sanitizer = HTML5libSanitize.new
-
-       ntimes.times do
-         sanitizer.sanitize(content)
+
+     measure "Sanitize.clean", ntimes do
+       Sanitize.clean(content, Sanitize::Config::RELAXED)
+     end
+   end
+ end
+
+ class HeadToHeadHtml5LibSanitize < Measure
+   include TestSet
+   def bench(content, ntimes, fragment_p)
+     clear_measure
+
+     measure "Loofah :escape", ntimes do
+       if fragment_p
+         Loofah.scrub_fragment(content, :escape).to_s
+       else
+         Loofah.scrub_document(content, :escape).to_s
        end
      end
+
+     html5_sanitizer = HTML5libSanitize.new
+     measure "HTML5lib.sanitize", ntimes do
+       html5_sanitizer.sanitize(content)
+     end
    end
  end
 
- puts "Large document, #{BIG_FILE.length} bytes (x100)"
- bench BIG_FILE, 100, false
- puts "Small fragment, #{FRAGMENT.length} bytes (x1000)"
- bench FRAGMENT, 1000, true
- puts "Text snippet, #{SNIPPET.length} bytes (x10000)"
- bench SNIPPET, 10000, true
+ puts "Nokogiri version: #{Nokogiri::VERSION_INFO.inspect}"
+ puts "Loofah version: #{Loofah::VERSION.inspect}"
+
+ benches = []
+ benches << HeadToHeadRailsSanitize.new
+ benches << HeadToHeadRailsStripTags.new
+ benches << HeadToHeadSanitizerSanitize.new
+ benches << HeadToHeadHtml5LibSanitize.new
+ puts "---------- rehearsal ----------"
+ benches.each { |bench| bench.test_set :rehearse => true }
+ puts "---------- realsies ----------"
+ benches.each { |bench| bench.test_set }
@@ -6,6 +6,7 @@ require 'benchmark'
  require "action_view"
  require "action_controller/vendor/html-scanner"
  require "sanitize"
+ require 'hitimes'
 
  class RailsSanitize
    include ActionView::Helpers::SanitizeHelper
@@ -30,3 +31,38 @@ class HTML5libSanitize
      }).to_s
    end
  end
+
+ BIG_FILE = File.read(File.join(File.dirname(__FILE__), "www.slashdot.com.html"))
+ FRAGMENT = File.read(File.join(File.dirname(__FILE__), "fragment.html"))
+ SNIPPET = "This is typical form field input in <b>length and content."
+
+ class Measure
+   def initialize
+     clear_measure
+   end
+
+   def clear_measure
+     @first_time = true
+     @baseline = nil
+   end
+
+   def measure(name, ntimes)
+     if @first_time
+       printf " %-30s %7s %8s %5s\n", "", "total", "single", "rel"
+       @first_time = false
+     end
+     timer = Hitimes::TimedMetric.new(name)
+     timer.start
+     ntimes.times do |j|
+       yield
+     end
+     timer.stop
+     if @baseline
+       printf " %30s %7.3f (%8.6f) %5.2fx\n", timer.name, timer.sum, timer.sum / ntimes, timer.sum / @baseline
+     else
+       @baseline = timer.sum
+       printf " %30s %7.3f (%8.6f) %5s\n", timer.name, timer.sum, timer.sum / ntimes, "-"
+     end
+     timer.sum
+   end
+ end
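
The Measure class added above treats the first measured block as a baseline and reports each later block relative to it. A minimal stdlib-only sketch of that idea follows, assuming Process.clock_gettime as a stand-in for Hitimes::TimedMetric; the BaselineTimer name and the two sample workloads are hypothetical and are not part of the gem.

```ruby
# Sketch only: baseline-relative timing in the spirit of Measure#measure.
# Assumption: Process.clock_gettime replaces Hitimes::TimedMetric.
# BaselineTimer is a hypothetical name, not part of Loofah.
class BaselineTimer
  attr_reader :baseline

  def initialize
    @baseline = nil
  end

  # Runs the block ntimes and returns [total_seconds, ratio].
  # ratio is nil for the first call, which becomes the baseline.
  def measure(name, ntimes)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ntimes.times { yield }
    total = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    if @baseline
      relative = total / @baseline
      printf " %-30s %7.3f (%8.6f) %5.2fx\n", name, total, total / ntimes, relative
    else
      @baseline = total
      relative = nil
      printf " %-30s %7.3f (%8.6f) %5s\n", name, total, total / ntimes, "-"
    end
    [total, relative]
  end
end

# Hypothetical workloads; the real benchmark calls Loofah and its competitors here.
timer = BaselineTimer.new
text = "This is typical form field input in <b>length and content."
first = timer.measure("gsub with regexp", 1_000) { text.gsub(/<[^>]*>/, "") }
second = timer.measure("delete chars", 1_000) { text.delete("<>") }
```

Because the baseline is whichever block runs first, the order of the measure calls decides what "1.00x" means, which is why each HeadToHead class in the diff calls clear_measure at the top of bench.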