loofah 0.4.6 → 0.4.7


data.tar.gz.sig CHANGED
Binary file
data/CHANGELOG.rdoc CHANGED
@@ -1,60 +1,72 @@
1
1
  = Changelog
2
2
 
3
+ == 0.4.7 (2010-03-09)
4
+
5
+ Enhancements:
6
+
7
+ * New methods Loofah::HTML::Document#to_text and
8
+ Loofah::HTML::DocumentFragment#to_text add newlines around
9
+ block-level elements and collapse runs of blank lines. Note that these methods are significantly slower than
10
+ #text. GH #12
11
+ * Loofah::Elements::BLOCK_LEVEL contains a canonical list of HTML4 block-level elements.
12
+ * Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text
13
+ return unescaped HTML entities when called with :encode_special_chars => false.
14
+
3
15
  == 0.4.4, 0.4.5, 0.4.6 (2010-02-01)
4
16
 
5
17
  Enhancements:
6
18
 
7
- * Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text now escape HTML entities.
19
+ * Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text now escape HTML entities.
8
20
 
9
21
  Bug fixes:
10
22
 
11
- * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH #17
23
+ * Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH #17
12
24
 
13
25
  == 0.4.3 (2010-01-29)
14
26
 
15
27
  Enhancements:
16
28
 
17
- * All built-in scrubbers are accepted by ActiveRecord::Base.xss_foliate
18
- * Loofah::XssFoliate.xss_foliate_all_models replaces use of the constant LOOFAH_XSS_FOLIATE_ALL_MODELS
29
+ * All built-in scrubbers are accepted by ActiveRecord::Base.xss_foliate
30
+ * Loofah::XssFoliate.xss_foliate_all_models replaces use of the constant LOOFAH_XSS_FOLIATE_ALL_MODELS
19
31
 
20
32
  Miscellaneous:
21
33
 
22
- * Modified documentation for bootstrapping XssFoliate in a Rails
23
- app, since the use of Bundler breaks the previously-documented
24
- method. To be safe, always use an initializer file.
34
+ * Modified documentation for bootstrapping XssFoliate in a Rails app,
35
+ since the use of Bundler breaks the previously-documented method. To
36
+ be safe, always use an initializer file.
25
37
 
26
38
  == 0.4.2 (2010-01-22)
27
39
 
28
40
  Enhancements:
29
41
 
30
- * Implemented Node#scrub! for scrubbing subtrees.
31
- * Implemented NodeSet#scrub! for scrubbing a set of subtrees.
32
- * Document.text now only serializes <body> contents (ignores <head>)
33
- * <head>, <html> and <body> added to the HTML5lib whitelist.
42
+ * Implemented Node#scrub! for scrubbing subtrees.
43
+ * Implemented NodeSet#scrub! for scrubbing a set of subtrees.
44
+ * Document.text now only serializes <body> contents (ignores <head>)
45
+ * <head>, <html> and <body> added to the HTML5lib whitelist.
34
46
 
35
47
  Bug fixes:
36
48
 
37
- * Supporting Rails apps that aren't loading ActiveRecord. GH #10
49
+ * Supporting Rails apps that aren't loading ActiveRecord. GH #10
38
50
 
39
51
  Miscellaneous:
40
52
 
41
- * Mailing list is now loofah@librelist.com / http://librelist.com
42
- * IRC channel is now \#loofah on freenode.
53
+ * Mailing list is now loofah@librelist.com / http://librelist.com
54
+ * IRC channel is now \#loofah on freenode.
43
55
 
44
56
  == 0.4.1 (2009-11-23)
45
57
 
46
58
  Bugfix:
47
59
 
48
- * Manifest fixed. Whoops.
60
+ * Manifest fixed. Whoops.
49
61
 
50
62
  == 0.4.0 (2009-11-21)
51
63
 
52
64
  Enhancements:
53
65
 
54
- * Scrubber class introduced, allowing development of custom scrubbers.
55
- * Added support for XML documents and fragments.
56
- * Added :nofollow HTML scrubber (thanks Luke Melia!)
57
- * Built-in scrubbing methods refactored to use Scrubber.
66
+ * Scrubber class introduced, allowing development of custom scrubbers.
67
+ * Added support for XML documents and fragments.
68
+ * Added :nofollow HTML scrubber (thanks Luke Melia!)
69
+ * Built-in scrubbing methods refactored to use Scrubber.
58
70
 
59
71
  == 0.3.1 (2009-10-12)
60
72
 
data/Manifest.txt CHANGED
@@ -12,12 +12,14 @@ benchmark/www.slashdot.com.html
12
12
  init.rb
13
13
  lib/loofah.rb
14
14
  lib/loofah/active_record.rb
15
+ lib/loofah/elements.rb
15
16
  lib/loofah/helpers.rb
16
17
  lib/loofah/html/document.rb
17
18
  lib/loofah/html/document_fragment.rb
18
19
  lib/loofah/html5/scrub.rb
19
20
  lib/loofah/html5/whitelist.rb
20
21
  lib/loofah/instance_methods.rb
22
+ lib/loofah/metahelpers.rb
21
23
  lib/loofah/scrubber.rb
22
24
  lib/loofah/scrubbers.rb
23
25
  lib/loofah/xml/document.rb
@@ -27,7 +29,9 @@ test/helper.rb
27
29
  test/html5/test_sanitizer.rb
28
30
  test/integration/test_ad_hoc.rb
29
31
  test/integration/test_helpers.rb
32
+ test/integration/test_html.rb
30
33
  test/integration/test_scrubbers.rb
34
+ test/integration/test_xml.rb
31
35
  test/unit/test_active_record.rb
32
36
  test/unit/test_api.rb
33
37
  test/unit/test_helpers.rb
data/README.rdoc CHANGED
@@ -120,6 +120,13 @@ and +text+ to return plain text:
120
120
 
121
121
  doc.text # => "ohai! div is safe "
122
122
 
123
+ Also, +to_text+ is available, which inserts newlines around
124
+ block-level elements and collapses runs of blank lines.
125
+
126
+ doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
127
+ doc.text # => "TitleContent" # probably not what you want
128
+ doc.to_text # => "\nTitle\n\nContent\n" # better
129
+
123
130
  === Loofah::XML::Document and Loofah::XML::DocumentFragment
124
131
 
125
132
  These classes are subclasses of Nokogiri::XML::Document and
data/lib/loofah.rb CHANGED
@@ -2,6 +2,9 @@ $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__))) unless $LOAD_PATH.i
2
2
 
3
3
  require 'nokogiri'
4
4
 
5
+ require 'loofah/metahelpers'
6
+ require 'loofah/elements'
7
+
5
8
  require 'loofah/html5/whitelist'
6
9
  require 'loofah/html5/scrub'
7
10
 
@@ -26,7 +29,7 @@ require 'loofah/helpers'
26
29
  #
27
30
  module Loofah
28
31
  # The version of Loofah you are using
29
- VERSION = '0.4.6'
32
+ VERSION = '0.4.7'
30
33
 
31
34
  # The minimum required version of Nokogiri
32
35
  REQUIRED_NOKOGIRI_VERSION = '1.3.3'
@@ -0,0 +1,19 @@
1
+ module Loofah
2
+ module Elements
3
+ # Block elements in HTML4
4
+ STRICT_BLOCK_LEVEL = %w[address blockquote center dir div dl
5
+ fieldset form h1 h2 h3 h4 h5 h6 hr isindex menu noframes
6
+ noscript ol p pre table ul]
7
+
8
+ # The following elements may also be considered block-level elements since they may contain block-level elements
9
+ LOOSE_BLOCK_LEVEL = %w[dd dt frameset li tbody td tfoot th thead tr]
10
+
11
+ BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
12
+ end
13
+
14
+ module HashedElements
15
+ include Loofah::MetaHelpers::HashifiedConstants(Elements)
16
+ end
17
+ end
18
+
19
+
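The new lib/loofah/elements.rb splits HTML4 block-level elements into a strict list and a loose list (elements that are not block-level themselves but may contain block-level content), and BLOCK_LEVEL is their union. A minimal pure-Ruby sketch of how a caller might tell the two apart, with the arrays copied from the diff; `block_level_kind` is a hypothetical helper, not part of Loofah:

```ruby
# Arrays copied from lib/loofah/elements.rb in this release.
STRICT_BLOCK_LEVEL = %w[address blockquote center dir div dl
  fieldset form h1 h2 h3 h4 h5 h6 hr isindex menu noframes
  noscript ol p pre table ul].freeze
LOOSE_BLOCK_LEVEL = %w[dd dt frameset li tbody td tfoot th thead tr].freeze
BLOCK_LEVEL = (STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL).freeze

# Hypothetical helper: classify a lowercase tag name against the two lists.
def block_level_kind(tag_name)
  if STRICT_BLOCK_LEVEL.include?(tag_name)
    :strict
  elsif LOOSE_BLOCK_LEVEL.include?(tag_name)
    :loose
  else
    :not_block_level
  end
end

block_level_kind("div")  # => :strict
block_level_kind("li")   # => :loose
block_level_kind("span") # => :not_block_level
BLOCK_LEVEL.size         # => 34
```

Array#include? is a linear scan; that is why the diff also defines Loofah::HashedElements, so the scrubber can check membership with a hash lookup instead.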
@@ -18,6 +18,13 @@ module Loofah
18
18
  def sanitize(string_or_io)
19
19
  Loofah.scrub_fragment(string_or_io, :strip).to_s
20
20
  end
21
+
22
+ #
23
+ # A helper to remove extraneous whitespace from text-ified HTML
24
+ #
25
+ def remove_extraneous_whitespace(string)
26
+ string.gsub(/\n\s*\n\s*\n/,"\n\n")
27
+ end
21
28
  end
22
29
  end
23
30
  end
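Loofah::Helpers.remove_extraneous_whitespace collapses any run of three or more newlines, with only whitespace between them, into a single blank line. The sketch below repeats the same regular expression as a standalone method so its effect can be checked without Nokogiri. The first input is, assuming NewlineBlockElements wraps the outer div in newlines, roughly what the new "remove extraneous whitespace" test produces before collapsing:

```ruby
# Same regular expression as the new Loofah::Helpers.remove_extraneous_whitespace.
# \s also matches "\n", so one match can swallow a long run of blank lines.
def remove_extraneous_whitespace(string)
  string.gsub(/\n\s*\n\s*\n/, "\n\n")
end

remove_extraneous_whitespace("\ntweedle\n\n\t\n \nbeetle\n") # => "\ntweedle\n\nbeetle\n"
remove_extraneous_whitespace("a\n\n\n\n\nb")                 # => "a\n\nb"
remove_extraneous_whitespace("a\n\nb")                       # => "a\n\nb" (only two newlines, unchanged)
```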
@@ -3,21 +3,16 @@ module Loofah
3
3
  #
4
4
  # Subclass of Nokogiri::HTML::Document.
5
5
  #
6
- # See Loofah::ScrubBehavior and Loofah::DocumentDecorator for additional methods.
6
+ # See Loofah::ScrubBehavior and Loofah::TextBehavior for additional methods.
7
7
  #
8
8
  class Document < Nokogiri::HTML::Document
9
9
  include Loofah::ScrubBehavior::Node
10
10
  include Loofah::DocumentDecorator
11
+ include Loofah::TextBehavior
11
12
 
12
- #
13
- # Returns a plain-text version of the markup contained by the document,
14
- # with HTML entities encoded.
15
- #
16
- def text
17
- encode_special_chars xpath("/html/body").inner_text
13
+ def serialize_root
14
+ at_xpath("/html/body")
18
15
  end
19
- alias :inner_text :text
20
- alias :to_str :text
21
16
  end
22
17
  end
23
18
  end
@@ -3,9 +3,11 @@ module Loofah
3
3
  #
4
4
  # Subclass of Nokogiri::HTML::DocumentFragment.
5
5
  #
6
- # See Loofah::ScrubBehavior for additional methods.
6
+ # See Loofah::ScrubBehavior and Loofah::TextBehavior for additional methods.
7
7
  #
8
8
  class DocumentFragment < Nokogiri::HTML::DocumentFragment
9
+ include Loofah::TextBehavior
10
+
9
11
  class << self
10
12
  #
11
13
  # Overridden Nokogiri::HTML::DocumentFragment
@@ -21,23 +23,12 @@ module Loofah
21
23
  # Returns the HTML markup contained by the fragment
22
24
  #
23
25
  def to_s
24
- serialize_roots.children.to_s
26
+ serialize_root.children.to_s
25
27
  end
26
28
  alias :serialize :to_s
27
29
 
28
- #
29
- # Returns a plain-text version of the markup contained by the fragment
30
- #
31
- def text
32
- encode_special_chars serialize_roots.children.inner_text
33
- end
34
- alias :inner_text :text
35
- alias :to_str :text
36
-
37
- private
38
-
39
- def serialize_roots # :nodoc:
40
- xpath("./body").first || self
30
+ def serialize_root
31
+ at_xpath("./body") || self
41
32
  end
42
33
  end
43
34
  end
@@ -162,13 +162,7 @@ module Loofah
162
162
  # The HTML5lib whitelist arrays, transformed into hashes for faster lookup.
163
163
  #
164
164
  module HashedWhiteList
165
- WhiteList.constants.each do |constant|
166
- next unless WhiteList.module_eval("#{constant}").is_a?(Array)
167
- module_eval <<-CODE
168
- #{constant} = {}
169
- WhiteList::#{constant}.each { |c| #{constant}[c] = true ; #{constant}[c.downcase] = true }
170
- CODE
171
- end
165
+ include Loofah::MetaHelpers::HashifiedConstants(WhiteList)
172
166
  end
173
167
  end
174
168
  end
@@ -27,8 +27,7 @@ module Loofah
27
27
  # README.rdoc for more example usage.
28
28
  #
29
29
  module ScrubBehavior
30
- # see Loofah::ScrubBehavior
31
- module Node
30
+ module Node # :nodoc:
32
31
  def scrub!(scrubber)
33
32
  #
34
33
  # yes. this should be three separate methods. but nokogiri
@@ -50,8 +49,7 @@ module Loofah
50
49
  end
51
50
  end
52
51
 
53
- # see Loofah::ScrubBehavior
54
- module NodeSet
52
+ module NodeSet # :nodoc:
55
53
  def scrub!(scrubber)
56
54
  each { |node| node.scrub!(scrubber) }
57
55
  self
@@ -67,6 +65,58 @@ module Loofah
67
65
  end
68
66
  end
69
67
 
68
+ #
69
+ # Overrides +text+ in HTML::Document and HTML::DocumentFragment,
70
+ # and mixes in +to_text+.
71
+ #
72
+ module TextBehavior
73
+ #
74
+ # Returns a plain-text version of the markup contained by the document,
75
+ # with HTML entities encoded.
76
+ #
77
+ # This method is significantly faster than #to_text, but isn't
78
+ # clever about whitespace around block elements.
79
+ #
80
+ # Loofah.document("<h1>Title</h1><div>Content</div>").text
81
+ # # => "TitleContent"
82
+ #
83
+ # By default, the returned text will have HTML entities
84
+ # escaped. If you want unescaped entities, and you understand
85
+ # that the result is unsafe to render in a browser, then you
86
+ # can pass an argument as shown:
87
+ #
88
+ # frag = Loofah.fragment("&lt;script&gt;alert('EVIL');&lt;/script&gt;")
89
+ # # ok for browser:
90
+ # frag.text # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
91
+ # # decidedly not ok for browser:
92
+ # frag.text(:encode_special_chars => false) # => "<script>alert('EVIL');</script>"
93
+ #
94
+ def text(options={})
95
+ result = serialize_root.children.inner_text rescue ""
96
+ if options[:encode_special_chars] == false
97
+ result # possibly dangerous if rendered in a browser
98
+ else
99
+ encode_special_chars result
100
+ end
101
+ end
102
+ alias :inner_text :text
103
+ alias :to_str :text
104
+
105
+ #
106
+ # Returns a plain-text version of the markup contained by the
107
+ # fragment, with HTML entities encoded.
108
+ #
109
+ # This method is slower than #text, but is clever about
110
+ # whitespace around block elements.
111
+ #
112
+ # Loofah.document("<h1>Title</h1><div>Content</div>").to_text
113
+ # # => "\nTitle\n\nContent\n"
114
+ #
115
+ def to_text(options={})
116
+ Loofah::Helpers.remove_extraneous_whitespace self.dup.scrub!(:newline_block_elements).text(options)
117
+ end
118
+ end
119
+
70
120
  module DocumentDecorator # :nodoc:
71
121
  def initialize(*args, &block)
72
122
  super
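TextBehavior#text skips entity encoding only when :encode_special_chars is exactly false; any other value, including nil or an omitted key, keeps the safe default. A minimal sketch of that branch, using a hypothetical `sketch_text` method and a simplified encoder that escapes only &, < and > (Nokogiri's encode_special_chars, which the real code calls, may escape more characters):

```ruby
# Simplified stand-in for Nokogiri's encode_special_chars (an assumption,
# enough to reproduce the examples in the TextBehavior comments).
SKETCH_SPECIAL_CHARS = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def sketch_text(raw_text, options = {})
  # Mirrors `options[:encode_special_chars] == false`: only an explicit
  # false disables encoding; nil or a missing key still encodes.
  if options[:encode_special_chars] == false
    raw_text # unsafe to render in a browser
  else
    raw_text.gsub(/[&<>]/, SKETCH_SPECIAL_CHARS)
  end
end

raw = "<script>alert('EVIL');</script>"
sketch_text(raw)                                 # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
sketch_text(raw, :encode_special_chars => false) # => "<script>alert('EVIL');</script>"
sketch_text(raw, :encode_special_chars => nil)   # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
```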
@@ -0,0 +1,15 @@
1
+ module Loofah
2
+ module MetaHelpers
3
+ def self.HashifiedConstants(orig_module)
4
+ hashed_module = Module.new
5
+ orig_module.constants.each do |constant|
6
+ next unless orig_module.module_eval("#{constant}").is_a?(Array)
7
+ hashed_module.module_eval <<-CODE
8
+ #{constant} = {}
9
+ #{orig_module.name}::#{constant}.each { |c| #{constant}[c] = true ; #{constant}[c.downcase] = true }
10
+ CODE
11
+ end
12
+ hashed_module
13
+ end
14
+ end
15
+ end
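MetaHelpers.HashifiedConstants builds an anonymous module that mirrors every Array constant of the original module as a Hash keyed by each entry and its downcased form, so membership tests become hash lookups. The sketch below reproduces that shape with const_get/const_set instead of the string-based module_eval used in the diff (a swapped-in technique, which also avoids interpolating orig_module.name). `HashifySketch`, `ExampleWhiteList` and its constants are hypothetical names:

```ruby
module HashifySketch
  # Returns a new anonymous module holding one Hash per Array constant
  # of orig_module; non-Array constants are skipped.
  def self.hashified_constants(orig_module)
    hashed_module = Module.new
    orig_module.constants.each do |name|
      value = orig_module.const_get(name)
      next unless value.is_a?(Array)

      lookup = {}
      value.each do |item|
        lookup[item] = true          # original spelling
        lookup[item.downcase] = true # lowercase spelling
      end
      hashed_module.const_set(name, lookup.freeze)
    end
    hashed_module
  end
end

module ExampleWhiteList
  ELEMENTS = %w[a div foreignObject].freeze
  NOTE = "not an array, so it is not hashified"
end

# Same usage pattern as HashedWhiteList and HashedElements in the diff.
module HashedExampleWhiteList
  include HashifySketch.hashified_constants(ExampleWhiteList)
end

HashedExampleWhiteList::ELEMENTS["foreignobject"] # => true
HashedExampleWhiteList::ELEMENTS["span"]          # => nil
```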
@@ -58,7 +58,6 @@ module Loofah
58
58
  # Loofah.fragment(link_farmers_markup).scrub!(:nofollow)
59
59
  # => "ohai! <a href='http://www.myswarmysite.com/' rel="nofollow">I like your blog post</a>"
60
60
  #
61
- #
62
61
  module Scrubbers
63
62
  #
64
63
  # === scrub!(:strip)
@@ -184,15 +183,30 @@ module Loofah
184
183
  end
185
184
  end
186
185
 
186
+ # This class probably isn't useful publicly, but is used for #to_text's current implementation
187
+ class NewlineBlockElements < Scrubber # :nodoc:
188
+ def initialize
189
+ @direction = :bottom_up
190
+ end
191
+
192
+ def scrub(node)
193
+ return CONTINUE unless Loofah::HashedElements::BLOCK_LEVEL[node.name]
194
+ replacement_killer = Nokogiri::XML::Text.new("\n#{node.content}\n", node.document)
195
+ node.add_next_sibling replacement_killer
196
+ node.remove
197
+ end
198
+ end
199
+
187
200
  #
188
201
  # A hash that maps a symbol (like +:prune+) to the appropriate Scrubber (Loofah::Scrubbers::Prune).
189
202
  #
190
203
  MAP = {
191
- :escape => Escape,
192
- :prune => Prune,
204
+ :escape => Escape,
205
+ :prune => Prune,
193
206
  :whitewash => Whitewash,
194
- :strip => Strip,
195
- :nofollow => NoFollow
207
+ :strip => Strip,
208
+ :nofollow => NoFollow,
209
+ :newline_block_elements => NewlineBlockElements
196
210
  }
197
211
 
198
212
  #
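NewlineBlockElements scrubs bottom-up and replaces each block-level element with a text node that wraps the element's content in newline characters, so by the time a parent is visited its block-level children have already been flattened. As a sketch without Nokogiri, the toy tree below (a hypothetical `ToyElement` struct and an assumed three-tag subset of BLOCK_LEVEL) applies the same rule to the markup from the new test_html.rb case:

```ruby
# Hypothetical toy tree: an element has a tag name and children;
# a text node is just a String. (Loofah itself works on Nokogiri nodes.)
ToyElement = Struct.new(:name, :children)

# Assumed subset of Loofah::Elements::BLOCK_LEVEL, enough for this example.
TOY_BLOCK_LEVEL = { "div" => true, "h1" => true, "p" => true }.freeze

# Children are flattened before their parent (bottom-up); each block-level
# element becomes "\n" + its text + "\n", like NewlineBlockElements#scrub.
def toy_newline_text(node)
  return node if node.is_a?(String)

  inner = node.children.map { |child| toy_newline_text(child) }.join
  TOY_BLOCK_LEVEL[node.name] ? "\n#{inner}\n" : inner
end

# <div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>
tree = ToyElement.new("div", [
  "tweedle",
  ToyElement.new("h1", ["beetle"]),
  "bottle",
  ToyElement.new("span", ["puddle"]),
  "paddle",
  ToyElement.new("div", ["battle"]),
  "muddle"
])

toy_newline_text(tree) # => "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n"
```

The inline span contributes its text with no newlines, which is why "bottlepuddlepaddle" stays on one line in the expected output of the real test.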
@@ -16,81 +16,6 @@ class TestAdHoc < Test::Unit::TestCase
16
16
  end
17
17
  end
18
18
 
19
- context "integration test" do
20
- context "xml document" do
21
- context "custom scrubber" do
22
- should "act as expected" do
23
- xml = Loofah.xml_document <<-EOXML
24
- <root>
25
- <employee deceased='true'>Abraham Lincoln</employee>
26
- <employee deceased='false'>Abe Vigoda</employee>
27
- </root>
28
- EOXML
29
- bring_out_your_dead = Loofah::Scrubber.new do |node|
30
- if node.name == "employee" and node["deceased"] == "true"
31
- node.remove
32
- Loofah::Scrubber::STOP # don't bother with the rest of the subtree
33
- end
34
- end
35
- assert_equal 2, xml.css("employee").length
36
-
37
- xml.scrub!(bring_out_your_dead)
38
-
39
- employees = xml.css "employee"
40
- assert_equal 1, employees.length
41
- assert_equal "Abe Vigoda", employees.first.inner_text
42
- end
43
- end
44
- end
45
-
46
- context "xml fragment" do
47
- context "custom scrubber" do
48
- should "act as expected" do
49
- xml = Loofah.xml_fragment <<-EOXML
50
- <employee deceased='true'>Abraham Lincoln</employee>
51
- <employee deceased='false'>Abe Vigoda</employee>
52
- EOXML
53
- bring_out_your_dead = Loofah::Scrubber.new do |node|
54
- if node.name == "employee" and node["deceased"] == "true"
55
- node.remove
56
- Loofah::Scrubber::STOP # don't bother with the rest of the subtree
57
- end
58
- end
59
- assert_equal 2, xml.css("employee").length
60
-
61
- xml.scrub!(bring_out_your_dead)
62
-
63
- employees = xml.css "employee"
64
- assert_equal 1, employees.length
65
- assert_equal "Abe Vigoda", employees.first.inner_text
66
- end
67
- end
68
- end
69
-
70
- context "html fragment" do
71
- context "#to_s" do
72
- should "not include head tags (like style)" do
73
- html = Loofah.fragment "<style>foo</style><div>bar</div>"
74
- assert_equal "<div>bar</div>", html.to_s
75
- end
76
- end
77
-
78
- context "#text" do
79
- should "not include head tags (like style)" do
80
- html = Loofah.fragment "<style>foo</style><div>bar</div>"
81
- assert_equal "bar", html.text
82
- end
83
- end
84
- end
85
-
86
- context "html document" do
87
- should "not include head tags (like style)" do
88
- html = Loofah.document "<style>foo</style><div>bar</div>"
89
- assert_equal "bar", html.text
90
- end
91
- end
92
- end
93
-
94
19
  def test_removal_of_illegal_tag
95
20
  html = <<-HTML
96
21
  following this there should be no jim tag
@@ -0,0 +1,51 @@
1
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', 'helper'))
2
+
3
+ class TestHtml < Test::Unit::TestCase
4
+ context "html fragment" do
5
+ context "#to_s" do
6
+ should "not include head tags (like style)" do
7
+ html = Loofah.fragment "<style>foo</style><div>bar</div>"
8
+ assert_equal "<div>bar</div>", html.to_s
9
+ end
10
+ end
11
+
12
+ context "#text" do
13
+ should "not include head tags (like style)" do
14
+ html = Loofah.fragment "<style>foo</style><div>bar</div>"
15
+ assert_equal "bar", html.text
16
+ end
17
+ end
18
+
19
+ context "#to_text" do
20
+ should "add newlines before and after block elements" do
21
+ html = Loofah.fragment "<div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
22
+ assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
23
+ end
24
+
25
+ should "remove extraneous whitespace" do
26
+ html = Loofah.fragment "<div>tweedle\n\n\t\n\s\nbeetle</div>"
27
+ assert_equal "\ntweedle\n\nbeetle\n", html.to_text
28
+ end
29
+ end
30
+ end
31
+
32
+ context "html document" do
33
+ should "not include head tags (like style)" do
34
+ html = Loofah.document "<style>foo</style><div>bar</div>"
35
+ assert_equal "bar", html.text
36
+ end
37
+
38
+ context "#to_text" do
39
+ should "add newlines before and after block elements" do
40
+ html = Loofah.document "<div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
41
+ assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
42
+ end
43
+
44
+ should "remove extraneous whitespace" do
45
+ html = Loofah.document "<div>tweedle\n\n\t\n\s\nbeetle</div>"
46
+ assert_equal "\ntweedle\n\nbeetle\n", html.to_text
47
+ end
48
+ end
49
+ end
50
+ end
51
+
@@ -18,6 +18,7 @@ class TestScrubbers < Test::Unit::TestCase
18
18
 
19
19
  ENTITY_HACK_ATTACK = "<div><div>Hack attack!</div><div>&lt;script&gt;alert('evil')&lt;/script&gt;</div></div>"
20
20
  ENTITY_HACK_ATTACK_TEXT_SCRUB = "Hack attack!&lt;script&gt;alert('evil')&lt;/script&gt;"
21
+ ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC = "Hack attack!<script>alert('evil')</script>"
21
22
 
22
23
  context "Document" do
23
24
  context "#scrub!" do
@@ -89,6 +90,24 @@ class TestScrubbers < Test::Unit::TestCase
89
90
 
90
91
  assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
91
92
  end
93
+
94
+ context "with encode_special_chars => false" do
95
+ should "leave behind only inner text with html entities unescaped" do
96
+ doc = Loofah::HTML::Document.parse "<html><body>#{ENTITY_HACK_ATTACK}</body></html>"
97
+ result = doc.text(:encode_special_chars => false)
98
+
99
+ assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC, result
100
+ end
101
+ end
102
+
103
+ context "with encode_special_chars => true" do
104
+ should "leave behind only inner text with html entities still escaped" do
105
+ doc = Loofah::HTML::Document.parse "<html><body>#{ENTITY_HACK_ATTACK}</body></html>"
106
+ result = doc.text(:encode_special_chars => true)
107
+
108
+ assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
109
+ end
110
+ end
92
111
  end
93
112
 
94
113
  context "#to_s" do
@@ -239,6 +258,24 @@ class TestScrubbers < Test::Unit::TestCase
239
258
 
240
259
  assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
241
260
  end
261
+
262
+ context "with encode_special_chars => false" do
263
+ should "leave behind only inner text with html entities unescaped" do
264
+ doc = Loofah::HTML::DocumentFragment.parse "<div>#{ENTITY_HACK_ATTACK}</div>"
265
+ result = doc.text(:encode_special_chars => false)
266
+
267
+ assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC, result
268
+ end
269
+ end
270
+
271
+ context "with encode_special_chars => true" do
272
+ should "leave behind only inner text with html entities still escaped" do
273
+ doc = Loofah::HTML::DocumentFragment.parse "<div>#{ENTITY_HACK_ATTACK}</div>"
274
+ result = doc.text(:encode_special_chars => true)
275
+
276
+ assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
277
+ end
278
+ end
242
279
  end
243
280
 
244
281
  context "#to_s" do
@@ -0,0 +1,55 @@
1
+ require File.expand_path(File.join(File.dirname(__FILE__), '..', 'helper'))
2
+
3
+ class TestXml < Test::Unit::TestCase
4
+ context "integration test" do
5
+ context "xml document" do
6
+ context "custom scrubber" do
7
+ should "act as expected" do
8
+ xml = Loofah.xml_document <<-EOXML
9
+ <root>
10
+ <employee deceased='true'>Abraham Lincoln</employee>
11
+ <employee deceased='false'>Abe Vigoda</employee>
12
+ </root>
13
+ EOXML
14
+ bring_out_your_dead = Loofah::Scrubber.new do |node|
15
+ if node.name == "employee" and node["deceased"] == "true"
16
+ node.remove
17
+ Loofah::Scrubber::STOP # don't bother with the rest of the subtree
18
+ end
19
+ end
20
+ assert_equal 2, xml.css("employee").length
21
+
22
+ xml.scrub!(bring_out_your_dead)
23
+
24
+ employees = xml.css "employee"
25
+ assert_equal 1, employees.length
26
+ assert_equal "Abe Vigoda", employees.first.inner_text
27
+ end
28
+ end
29
+ end
30
+
31
+ context "xml fragment" do
32
+ context "custom scrubber" do
33
+ should "act as expected" do
34
+ xml = Loofah.xml_fragment <<-EOXML
35
+ <employee deceased='true'>Abraham Lincoln</employee>
36
+ <employee deceased='false'>Abe Vigoda</employee>
37
+ EOXML
38
+ bring_out_your_dead = Loofah::Scrubber.new do |node|
39
+ if node.name == "employee" and node["deceased"] == "true"
40
+ node.remove
41
+ Loofah::Scrubber::STOP # don't bother with the rest of the subtree
42
+ end
43
+ end
44
+ assert_equal 2, xml.css("employee").length
45
+
46
+ xml.scrub!(bring_out_your_dead)
47
+
48
+ employees = xml.css "employee"
49
+ assert_equal 1, employees.length
50
+ assert_equal "Abe Vigoda", employees.first.inner_text
51
+ end
52
+ end
53
+ end
54
+ end
55
+ end
@@ -81,13 +81,13 @@ class TestApi < Test::Unit::TestCase
81
81
  end
82
82
 
83
83
  def test_loofah_xml_document_node_scrub!
84
- doc = Loofah.document(XML)
84
+ doc = Loofah.xml_document(XML)
85
85
  assert(node = doc.at_css("div"))
86
86
  node.scrub!(:strip)
87
87
  end
88
88
 
89
89
  def test_loofah_xml_fragment_node_scrub!
90
- doc = Loofah.fragment(XML)
90
+ doc = Loofah.xml_fragment(XML)
91
91
  assert(node = doc.at_css("div"))
92
92
  node.scrub!(:strip)
93
93
  end
@@ -99,6 +99,16 @@ class TestApi < Test::Unit::TestCase
99
99
  node_set.scrub!(:strip)
100
100
  end
101
101
 
102
+ should "HTML::DocumentFragment exposes serialize_root" do
103
+ doc = Loofah.fragment(HTML)
104
+ assert_equal HTML, doc.serialize_root.to_html
105
+ end
106
+
107
+ should "HTML::Document exposes serialize_root" do
108
+ doc = Loofah.document(HTML)
109
+ assert_equal HTML, doc.serialize_root.children.to_html
110
+ end
111
+
102
112
  private
103
113
 
104
114
  def assert_html_documentish(doc)
metadata CHANGED
@@ -1,7 +1,12 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: loofah
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.4.6
4
+ prerelease: false
5
+ segments:
6
+ - 0
7
+ - 4
8
+ - 7
9
+ version: 0.4.7
5
10
  platform: ruby
6
11
  authors:
7
12
  - Mike Dalessio
@@ -31,59 +36,105 @@ cert_chain:
31
36
  FlqnTjy13J3nD30uxy9a1g==
32
37
  -----END CERTIFICATE-----
33
38
 
34
- date: 2010-02-02 00:00:00 -05:00
39
+ date: 2010-03-09 00:00:00 -05:00
35
40
  default_executable:
36
41
  dependencies:
37
42
  - !ruby/object:Gem::Dependency
38
43
  name: nokogiri
39
- type: :runtime
40
- version_requirement:
41
- version_requirements: !ruby/object:Gem::Requirement
44
+ prerelease: false
45
+ requirement: &id001 !ruby/object:Gem::Requirement
42
46
  requirements:
43
47
  - - ">="
44
48
  - !ruby/object:Gem::Version
49
+ segments:
50
+ - 1
51
+ - 3
52
+ - 3
45
53
  version: 1.3.3
46
- version:
54
+ type: :runtime
55
+ version_requirements: *id001
47
56
  - !ruby/object:Gem::Dependency
48
- name: mocha
57
+ name: rubyforge
58
+ prerelease: false
59
+ requirement: &id002 !ruby/object:Gem::Requirement
60
+ requirements:
61
+ - - ">="
62
+ - !ruby/object:Gem::Version
63
+ segments:
64
+ - 2
65
+ - 0
66
+ - 3
67
+ version: 2.0.3
68
+ type: :development
69
+ version_requirements: *id002
70
+ - !ruby/object:Gem::Dependency
71
+ name: gemcutter
72
+ prerelease: false
73
+ requirement: &id003 !ruby/object:Gem::Requirement
74
+ requirements:
75
+ - - ">="
76
+ - !ruby/object:Gem::Version
77
+ segments:
78
+ - 0
79
+ - 3
80
+ - 0
81
+ version: 0.3.0
49
82
  type: :development
50
- version_requirement:
51
- version_requirements: !ruby/object:Gem::Requirement
83
+ version_requirements: *id003
84
+ - !ruby/object:Gem::Dependency
85
+ name: mocha
86
+ prerelease: false
87
+ requirement: &id004 !ruby/object:Gem::Requirement
52
88
  requirements:
53
89
  - - ">="
54
90
  - !ruby/object:Gem::Version
91
+ segments:
92
+ - 0
93
+ - 9
55
94
  version: "0.9"
56
- version:
95
+ type: :development
96
+ version_requirements: *id004
57
97
  - !ruby/object:Gem::Dependency
58
98
  name: thoughtbot-shoulda
59
- type: :development
60
- version_requirement:
61
- version_requirements: !ruby/object:Gem::Requirement
99
+ prerelease: false
100
+ requirement: &id005 !ruby/object:Gem::Requirement
62
101
  requirements:
63
102
  - - ">="
64
103
  - !ruby/object:Gem::Version
104
+ segments:
105
+ - 2
106
+ - 10
65
107
  version: "2.10"
66
- version:
108
+ type: :development
109
+ version_requirements: *id005
67
110
  - !ruby/object:Gem::Dependency
68
111
  name: acts_as_fu
69
- type: :development
70
- version_requirement:
71
- version_requirements: !ruby/object:Gem::Requirement
112
+ prerelease: false
113
+ requirement: &id006 !ruby/object:Gem::Requirement
72
114
  requirements:
73
115
  - - ">="
74
116
  - !ruby/object:Gem::Version
117
+ segments:
118
+ - 0
119
+ - 0
120
+ - 5
75
121
  version: 0.0.5
76
- version:
122
+ type: :development
123
+ version_requirements: *id006
77
124
  - !ruby/object:Gem::Dependency
78
125
  name: hoe
79
- type: :development
80
- version_requirement:
81
- version_requirements: !ruby/object:Gem::Requirement
126
+ prerelease: false
127
+ requirement: &id007 !ruby/object:Gem::Requirement
82
128
  requirements:
83
129
  - - ">="
84
130
  - !ruby/object:Gem::Version
85
- version: 2.3.3
86
- version:
131
+ segments:
132
+ - 2
133
+ - 5
134
+ - 0
135
+ version: 2.5.0
136
+ type: :development
137
+ version_requirements: *id007
87
138
  description: |-
88
139
  Loofah is a general library for manipulating HTML/XML documents and
89
140
  fragments. It's built on top of Nokogiri and libxml2, so it's fast and
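The regenerated metadata now writes each dependency's requirement once under an anchor (&id001 and so on) and points version_requirements at it with an alias (*id001), presumably because the gem was rebuilt with RubyGems 1.3.6. A small sketch with Ruby's bundled YAML library (assuming Ruby 2.6 or later for the `aliases:` keyword) shows that an alias loads as the same object as its anchor. The document is a simplified stand-in: safe_load rejects the real file's !ruby/object tags and the :runtime symbol, so both are replaced with plain strings:

```ruby
require 'yaml'

# Simplified stand-in for one dependency entry from the gemspec metadata.
GEMSPEC_FRAGMENT = <<-YAML
name: nokogiri
requirement: &id001
  requirements:
  - - ">="
    - "1.3.3"
type: runtime
version_requirements: *id001
YAML

dep = YAML.safe_load(GEMSPEC_FRAGMENT, aliases: true)
dep["requirement"]["requirements"]                     # => [[">=", "1.3.3"]]
dep["version_requirements"].equal?(dep["requirement"]) # => true (same object)
```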
@@ -122,12 +173,14 @@ files:
122
173
  - init.rb
123
174
  - lib/loofah.rb
124
175
  - lib/loofah/active_record.rb
176
+ - lib/loofah/elements.rb
125
177
  - lib/loofah/helpers.rb
126
178
  - lib/loofah/html/document.rb
127
179
  - lib/loofah/html/document_fragment.rb
128
180
  - lib/loofah/html5/scrub.rb
129
181
  - lib/loofah/html5/whitelist.rb
130
182
  - lib/loofah/instance_methods.rb
183
+ - lib/loofah/metahelpers.rb
131
184
  - lib/loofah/scrubber.rb
132
185
  - lib/loofah/scrubbers.rb
133
186
  - lib/loofah/xml/document.rb
@@ -137,7 +190,9 @@ files:
137
190
  - test/html5/test_sanitizer.rb
138
191
  - test/integration/test_ad_hoc.rb
139
192
  - test/integration/test_helpers.rb
193
+ - test/integration/test_html.rb
140
194
  - test/integration/test_scrubbers.rb
195
+ - test/integration/test_xml.rb
141
196
  - test/unit/test_active_record.rb
142
197
  - test/unit/test_api.rb
143
198
  - test/unit/test_helpers.rb
@@ -158,18 +213,20 @@ required_ruby_version: !ruby/object:Gem::Requirement
158
213
  requirements:
159
214
  - - ">="
160
215
  - !ruby/object:Gem::Version
216
+ segments:
217
+ - 0
161
218
  version: "0"
162
- version:
163
219
  required_rubygems_version: !ruby/object:Gem::Requirement
164
220
  requirements:
165
221
  - - ">="
166
222
  - !ruby/object:Gem::Version
223
+ segments:
224
+ - 0
167
225
  version: "0"
168
- version:
169
226
  requirements: []
170
227
 
171
228
  rubyforge_project: loofah
172
- rubygems_version: 1.3.5
229
+ rubygems_version: 1.3.6
173
230
  signing_key:
174
231
  specification_version: 3
175
232
  summary: Loofah is a general library for manipulating HTML/XML documents and fragments
@@ -177,6 +234,8 @@ test_files:
177
234
  - test/integration/test_helpers.rb
178
235
  - test/integration/test_scrubbers.rb
179
236
  - test/integration/test_ad_hoc.rb
237
+ - test/integration/test_xml.rb
238
+ - test/integration/test_html.rb
180
239
  - test/unit/test_xss_foliate.rb
181
240
  - test/unit/test_helpers.rb
182
241
  - test/unit/test_scrubber.rb
metadata.gz.sig CHANGED
Binary file