loofah 0.4.6 → 0.4.7
Potentially problematic release.
This version of loofah might be problematic.
- data.tar.gz.sig +0 -0
- data/CHANGELOG.rdoc +31 -19
- data/Manifest.txt +4 -0
- data/README.rdoc +7 -0
- data/lib/loofah.rb +4 -1
- data/lib/loofah/elements.rb +19 -0
- data/lib/loofah/helpers.rb +7 -0
- data/lib/loofah/html/document.rb +4 -9
- data/lib/loofah/html/document_fragment.rb +6 -15
- data/lib/loofah/html5/whitelist.rb +1 -7
- data/lib/loofah/instance_methods.rb +54 -4
- data/lib/loofah/metahelpers.rb +15 -0
- data/lib/loofah/scrubbers.rb +19 -5
- data/test/integration/test_ad_hoc.rb +0 -75
- data/test/integration/test_html.rb +51 -0
- data/test/integration/test_scrubbers.rb +37 -0
- data/test/integration/test_xml.rb +55 -0
- data/test/unit/test_api.rb +12 -2
- metadata +85 -26
- metadata.gz.sig +0 -0
data.tar.gz.sig
CHANGED
Binary file
|
data/CHANGELOG.rdoc
CHANGED
@@ -1,60 +1,72 @@
|
|
1
1
|
= Changelog
|
2
2
|
|
3
|
+
== 0.4.7 (2010-03-09)
|
4
|
+
|
5
|
+
Enhancements:
|
6
|
+
|
7
|
+
* New methods Loofah::HTML::Document#to_text and
|
8
|
+
Loofah::HTML::DocumentFragment#to_text do the right thing with
|
9
|
+
whitespace. Note that these methods are significantly slower than
|
10
|
+
#text. GH #12
|
11
|
+
* Loofah::Elements::BLOCK_LEVEL contains a canonical list of HTML4 block-level elements.
|
12
|
+
* Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text
|
13
|
+
will return unescaped HTML entities by passing :encode_special_chars => false.
|
14
|
+
|
3
15
|
== 0.4.4, 0.4.5, 0.4.6 (2010-02-01)
|
4
16
|
|
5
17
|
Enhancements:
|
6
18
|
|
7
|
-
|
19
|
+
* Loofah::HTML::Document#text and Loofah::HTML::DocumentFragment#text now escape HTML entities.
|
8
20
|
|
9
21
|
Bug fixes:
|
10
22
|
|
11
|
-
|
23
|
+
* Loofah::XssFoliate was not properly escaping HTML entities when implicitly scrubbing a string attribute. GH #17
|
12
24
|
|
13
25
|
== 0.4.3 (2010-01-29)
|
14
26
|
|
15
27
|
Enhancements:
|
16
28
|
|
17
|
-
|
18
|
-
|
29
|
+
* All built-in scrubbers are accepted by ActiveRecord::Base.xss_foliate
|
30
|
+
* Loofah::XssFoliate.xss_foliate_all_models replaces use of the constant LOOFAH_XSS_FOLIATE_ALL_MODELS
|
19
31
|
|
20
32
|
Miscellaneous:
|
21
33
|
|
22
|
-
|
23
|
-
|
24
|
-
|
34
|
+
* Modified documentation for bootstrapping XssFoliate in a Rails app,
|
35
|
+
since the use of Bundler breaks the previously-documented method. To
|
36
|
+
be safe, always use an initializer file.
|
25
37
|
|
26
38
|
== 0.4.2 (2010-01-22)
|
27
39
|
|
28
40
|
Enhancements:
|
29
41
|
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
42
|
+
* Implemented Node#scrub! for scrubbing subtrees.
|
43
|
+
* Implemented NodeSet#scrub! for scrubbing a set of subtrees.
|
44
|
+
* Document.text now only serializes <body> contents (ignores <head>)
|
45
|
+
* <head>, <html> and <body> added to the HTML5lib whitelist.
|
34
46
|
|
35
47
|
Bug fixes:
|
36
48
|
|
37
|
-
|
49
|
+
* Supporting Rails apps that aren't loading ActiveRecord. GH #10
|
38
50
|
|
39
51
|
Miscellaneous:
|
40
52
|
|
41
|
-
|
42
|
-
|
53
|
+
* Mailing list is now loofah@librelist.com / http://librelist.com
|
54
|
+
* IRC channel is now \#loofah on freenode.
|
43
55
|
|
44
56
|
== 0.4.1 (2009-11-23)
|
45
57
|
|
46
58
|
Bugfix:
|
47
59
|
|
48
|
-
|
60
|
+
* Manifest fixed. Whoops.
|
49
61
|
|
50
62
|
== 0.4.0 (2009-11-21)
|
51
63
|
|
52
64
|
Enhancements:
|
53
65
|
|
54
|
-
|
55
|
-
|
56
|
-
|
57
|
-
|
66
|
+
* Scrubber class introduced, allowing development of custom scrubbers.
|
67
|
+
* Added support for XML documents and fragments.
|
68
|
+
* Added :nofollow HTML scrubber (thanks Luke Melia!)
|
69
|
+
* Built-in scrubbing methods refactored to use Scrubber.
|
58
70
|
|
59
71
|
== 0.3.1 (2009-10-12)
|
60
72
|
|
data/Manifest.txt
CHANGED
@@ -12,12 +12,14 @@ benchmark/www.slashdot.com.html
|
|
12
12
|
init.rb
|
13
13
|
lib/loofah.rb
|
14
14
|
lib/loofah/active_record.rb
|
15
|
+
lib/loofah/elements.rb
|
15
16
|
lib/loofah/helpers.rb
|
16
17
|
lib/loofah/html/document.rb
|
17
18
|
lib/loofah/html/document_fragment.rb
|
18
19
|
lib/loofah/html5/scrub.rb
|
19
20
|
lib/loofah/html5/whitelist.rb
|
20
21
|
lib/loofah/instance_methods.rb
|
22
|
+
lib/loofah/metahelpers.rb
|
21
23
|
lib/loofah/scrubber.rb
|
22
24
|
lib/loofah/scrubbers.rb
|
23
25
|
lib/loofah/xml/document.rb
|
@@ -27,7 +29,9 @@ test/helper.rb
|
|
27
29
|
test/html5/test_sanitizer.rb
|
28
30
|
test/integration/test_ad_hoc.rb
|
29
31
|
test/integration/test_helpers.rb
|
32
|
+
test/integration/test_html.rb
|
30
33
|
test/integration/test_scrubbers.rb
|
34
|
+
test/integration/test_xml.rb
|
31
35
|
test/unit/test_active_record.rb
|
32
36
|
test/unit/test_api.rb
|
33
37
|
test/unit/test_helpers.rb
|
data/README.rdoc
CHANGED
@@ -120,6 +120,13 @@ and +text+ to return plain text:
|
|
120
120
|
|
121
121
|
doc.text # => "ohai! div is safe "
|
122
122
|
|
123
|
+
Also, +to_text+ is available, which does the right thing with
|
124
|
+
whitespace around block-level elements.
|
125
|
+
|
126
|
+
doc = Loofah.fragment("<h1>Title</h1><div>Content</div>")
|
127
|
+
doc.text # => "TitleContent" # probably not what you want
|
128
|
+
doc.to_text # => "\nTitle\n\nContent\n" # better
|
129
|
+
|
123
130
|
=== Loofah::XML::Document and Loofah::XML::DocumentFragment
|
124
131
|
|
125
132
|
These classes are subclasses of Nokogiri::XML::Document and
|
data/lib/loofah.rb
CHANGED
@@ -2,6 +2,9 @@ $LOAD_PATH.unshift(File.expand_path(File.dirname(__FILE__))) unless $LOAD_PATH.i
|
|
2
2
|
|
3
3
|
require 'nokogiri'
|
4
4
|
|
5
|
+
require 'loofah/metahelpers'
|
6
|
+
require 'loofah/elements'
|
7
|
+
|
5
8
|
require 'loofah/html5/whitelist'
|
6
9
|
require 'loofah/html5/scrub'
|
7
10
|
|
@@ -26,7 +29,7 @@ require 'loofah/helpers'
|
|
26
29
|
#
|
27
30
|
module Loofah
|
28
31
|
# The version of Loofah you are using
|
29
|
-
VERSION = '0.4.
|
32
|
+
VERSION = '0.4.7'
|
30
33
|
|
31
34
|
# The minimum required version of Nokogiri
|
32
35
|
REQUIRED_NOKOGIRI_VERSION = '1.3.3'
|
data/lib/loofah/elements.rb
ADDED
@@ -0,0 +1,19 @@
|
|
1
|
+
module Loofah
|
2
|
+
module Elements
|
3
|
+
# Block elements in HTML4
|
4
|
+
STRICT_BLOCK_LEVEL = %w[address blockquote center dir div dl
|
5
|
+
fieldset form h1 h2 h3 h4 h5 h6 hr isindex menu noframes
|
6
|
+
noscript ol p pre table ul]
|
7
|
+
|
8
|
+
# The following elements may also be considered block-level elements since they may contain block-level elements
|
9
|
+
LOOSE_BLOCK_LEVEL = %w[dd dt frameset li tbody td tfoot th thead tr]
|
10
|
+
|
11
|
+
BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
|
12
|
+
end
|
13
|
+
|
14
|
+
module HashedElements
|
15
|
+
include Loofah::MetaHelpers::HashifiedConstants(Elements)
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
|
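As a rough illustration of how the three lists in elements.rb compose, the sketch below copies them into a stand-alone module and checks membership. `ElementsSketch` and `block_level?` are hypothetical names used only here, not part of Loofah, and nothing beyond core Ruby is assumed.

```ruby
# Stand-alone copy of the lists from the diff above; ElementsSketch is a
# hypothetical name used only for this illustration.
module ElementsSketch
  STRICT_BLOCK_LEVEL = %w[address blockquote center dir div dl
    fieldset form h1 h2 h3 h4 h5 h6 hr isindex menu noframes
    noscript ol p pre table ul]

  LOOSE_BLOCK_LEVEL = %w[dd dt frameset li tbody td tfoot th thead tr]

  # Array#+ concatenates without de-duplicating; the two lists are disjoint.
  BLOCK_LEVEL = STRICT_BLOCK_LEVEL + LOOSE_BLOCK_LEVEL
end

# Hypothetical helper: case-insensitive membership test against the list.
def block_level?(name)
  ElementsSketch::BLOCK_LEVEL.include?(name.downcase)
end

puts block_level?("DIV")   # true
puts block_level?("span")  # false
```

A linear `include?` is fine for a sketch; the hashed variant generated by the metahelpers module trades a little memory for constant-time lookups.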
data/lib/loofah/helpers.rb
CHANGED
@@ -18,6 +18,13 @@ module Loofah
|
|
18
18
|
def sanitize(string_or_io)
|
19
19
|
Loofah.scrub_fragment(string_or_io, :strip).to_s
|
20
20
|
end
|
21
|
+
|
22
|
+
#
|
23
|
+
# A helper to remove extraneous whitespace from text-ified HTML
|
24
|
+
#
|
25
|
+
def remove_extraneous_whitespace(string)
|
26
|
+
string.gsub(/\n\s*\n\s*\n/,"\n\n")
|
27
|
+
end
|
21
28
|
end
|
22
29
|
end
|
23
30
|
end
|
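To make the effect of the new whitespace helper concrete, here is a minimal stand-alone sketch using the same regular expression. The method name `squeeze_blank_lines` is hypothetical; only the expression itself comes from the diff.

```ruby
# Same regular expression as the helper in the diff, wrapped in a
# stand-alone method (squeeze_blank_lines is a hypothetical name).
def squeeze_blank_lines(string)
  # A run of three or more newlines, separated only by whitespace,
  # collapses to exactly two newlines (one blank line).
  string.gsub(/\n\s*\n\s*\n/, "\n\n")
end

p squeeze_blank_lines("tweedle\n\n\t\n \nbeetle")  # => "tweedle\n\nbeetle"
p squeeze_blank_lines("a\n\nb")                    # => "a\n\nb" (unchanged)
```

Because `\s` also matches newlines, a longer run of blank lines is consumed by a single match rather than being reduced step by step.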
data/lib/loofah/html/document.rb
CHANGED
@@ -3,21 +3,16 @@ module Loofah
|
|
3
3
|
#
|
4
4
|
# Subclass of Nokogiri::HTML::Document.
|
5
5
|
#
|
6
|
-
# See Loofah::ScrubBehavior and Loofah::
|
6
|
+
# See Loofah::ScrubBehavior and Loofah::TextBehavior for additional methods.
|
7
7
|
#
|
8
8
|
class Document < Nokogiri::HTML::Document
|
9
9
|
include Loofah::ScrubBehavior::Node
|
10
10
|
include Loofah::DocumentDecorator
|
11
|
+
include Loofah::TextBehavior
|
11
12
|
|
12
|
-
|
13
|
-
|
14
|
-
# with HTML entities encoded.
|
15
|
-
#
|
16
|
-
def text
|
17
|
-
encode_special_chars xpath("/html/body").inner_text
|
13
|
+
def serialize_root
|
14
|
+
at_xpath("/html/body")
|
18
15
|
end
|
19
|
-
alias :inner_text :text
|
20
|
-
alias :to_str :text
|
21
16
|
end
|
22
17
|
end
|
23
18
|
end
|
data/lib/loofah/html/document_fragment.rb
CHANGED
@@ -3,9 +3,11 @@ module Loofah
|
|
3
3
|
#
|
4
4
|
# Subclass of Nokogiri::HTML::DocumentFragment.
|
5
5
|
#
|
6
|
-
# See Loofah::ScrubBehavior for additional methods.
|
6
|
+
# See Loofah::ScrubBehavior and Loofah::TextBehavior for additional methods.
|
7
7
|
#
|
8
8
|
class DocumentFragment < Nokogiri::HTML::DocumentFragment
|
9
|
+
include Loofah::TextBehavior
|
10
|
+
|
9
11
|
class << self
|
10
12
|
#
|
11
13
|
# Overridden Nokogiri::HTML::DocumentFragment
|
@@ -21,23 +23,12 @@ module Loofah
|
|
21
23
|
# Returns the HTML markup contained by the fragment
|
22
24
|
#
|
23
25
|
def to_s
|
24
|
-
|
26
|
+
serialize_root.children.to_s
|
25
27
|
end
|
26
28
|
alias :serialize :to_s
|
27
29
|
|
28
|
-
|
29
|
-
|
30
|
-
#
|
31
|
-
def text
|
32
|
-
encode_special_chars serialize_roots.children.inner_text
|
33
|
-
end
|
34
|
-
alias :inner_text :text
|
35
|
-
alias :to_str :text
|
36
|
-
|
37
|
-
private
|
38
|
-
|
39
|
-
def serialize_roots # :nodoc:
|
40
|
-
xpath("./body").first || self
|
30
|
+
def serialize_root
|
31
|
+
at_xpath("./body") || self
|
41
32
|
end
|
42
33
|
end
|
43
34
|
end
|
data/lib/loofah/html5/whitelist.rb
CHANGED
@@ -162,13 +162,7 @@ module Loofah
|
|
162
162
|
# The HTML5lib whitelist arrays, transformed into hashes for faster lookup.
|
163
163
|
#
|
164
164
|
module HashedWhiteList
|
165
|
-
WhiteList
|
166
|
-
next unless WhiteList.module_eval("#{constant}").is_a?(Array)
|
167
|
-
module_eval <<-CODE
|
168
|
-
#{constant} = {}
|
169
|
-
WhiteList::#{constant}.each { |c| #{constant}[c] = true ; #{constant}[c.downcase] = true }
|
170
|
-
CODE
|
171
|
-
end
|
165
|
+
include Loofah::MetaHelpers::HashifiedConstants(WhiteList)
|
172
166
|
end
|
173
167
|
end
|
174
168
|
end
|
data/lib/loofah/instance_methods.rb
CHANGED
@@ -27,8 +27,7 @@ module Loofah
|
|
27
27
|
# README.rdoc for more example usage.
|
28
28
|
#
|
29
29
|
module ScrubBehavior
|
30
|
-
#
|
31
|
-
module Node
|
30
|
+
module Node # :nodoc:
|
32
31
|
def scrub!(scrubber)
|
33
32
|
#
|
34
33
|
# yes. this should be three separate methods. but nokogiri
|
@@ -50,8 +49,7 @@ module Loofah
|
|
50
49
|
end
|
51
50
|
end
|
52
51
|
|
53
|
-
#
|
54
|
-
module NodeSet
|
52
|
+
module NodeSet # :nodoc:
|
55
53
|
def scrub!(scrubber)
|
56
54
|
each { |node| node.scrub!(scrubber) }
|
57
55
|
self
|
@@ -67,6 +65,58 @@ module Loofah
|
|
67
65
|
end
|
68
66
|
end
|
69
67
|
|
68
|
+
#
|
69
|
+
# Overrides +text+ in HTML::Document and HTML::DocumentFragment,
|
70
|
+
# and mixes in +to_text+.
|
71
|
+
#
|
72
|
+
module TextBehavior
|
73
|
+
#
|
74
|
+
# Returns a plain-text version of the markup contained by the document,
|
75
|
+
# with HTML entities encoded.
|
76
|
+
#
|
77
|
+
# This method is significantly faster than #to_text, but isn't
|
78
|
+
# clever about whitespace around block elements.
|
79
|
+
#
|
80
|
+
# Loofah.document("<h1>Title</h1><div>Content</div>").text
|
81
|
+
# # => "TitleContent"
|
82
|
+
#
|
83
|
+
# By default, the returned text will have HTML entities
|
84
|
+
# escaped. If you want unescaped entities, and you understand
|
85
|
+
# that the result is unsafe to render in a browser, then you
|
86
|
+
# can pass an argument as shown:
|
87
|
+
#
|
88
|
+
# frag = Loofah.fragment("<script>alert('EVIL');</script>")
|
89
|
+
# # ok for browser:
|
90
|
+
# frag.text # => "&lt;script&gt;alert('EVIL');&lt;/script&gt;"
|
91
|
+
# # decidedly not ok for browser:
|
92
|
+
# frag.text(:encode_special_chars => false) # => "<script>alert('EVIL');</script>"
|
93
|
+
#
|
94
|
+
def text(options={})
|
95
|
+
result = serialize_root.children.inner_text rescue ""
|
96
|
+
if options[:encode_special_chars] == false
|
97
|
+
result # possibly dangerous if rendered in a browser
|
98
|
+
else
|
99
|
+
encode_special_chars result
|
100
|
+
end
|
101
|
+
end
|
102
|
+
alias :inner_text :text
|
103
|
+
alias :to_str :text
|
104
|
+
|
105
|
+
#
|
106
|
+
# Returns a plain-text version of the markup contained by the
|
107
|
+
# fragment, with HTML entities encoded.
|
108
|
+
#
|
109
|
+
# This method is slower than #text, but is clever about
|
110
|
+
# whitespace around block elements.
|
111
|
+
#
|
112
|
+
# Loofah.document("<h1>Title</h1><div>Content</div>").to_text
|
113
|
+
# # => "\nTitle\n\nContent\n"
|
114
|
+
#
|
115
|
+
def to_text(options={})
|
116
|
+
Loofah::Helpers.remove_extraneous_whitespace self.dup.scrub!(:newline_block_elements).text(options)
|
117
|
+
end
|
118
|
+
end
|
119
|
+
|
70
120
|
module DocumentDecorator # :nodoc:
|
71
121
|
def initialize(*args, &block)
|
72
122
|
super
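The option handling in `TextBehavior#text` can be sketched without Nokogiri. In the sketch below, `escape_entities` is a hypothetical stand-in for the escaping step (Loofah calls Nokogiri's `encode_special_chars`, which is not reproduced here), and `text_sketch` is a hypothetical name that takes the already-extracted inner text as an argument.

```ruby
# Hypothetical stand-in for the escaping step; it only covers &, < and >.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_entities(string)
  string.gsub(/[&<>]/, ESCAPES)
end

# Mirrors the branch in the diff: escaping is skipped only when
# :encode_special_chars is explicitly false, so a missing key, nil,
# or true all produce escaped output.
def text_sketch(raw, options = {})
  if options[:encode_special_chars] == false
    raw # possibly dangerous if rendered in a browser
  else
    escape_entities(raw)
  end
end

raw = "<script>alert('EVIL');</script>"
puts text_sketch(raw)                                  # &lt;script&gt;alert('EVIL');&lt;/script&gt;
puts text_sketch(raw, :encode_special_chars => false)  # <script>alert('EVIL');</script>
```

Comparing against `false` rather than testing truthiness keeps the safe behavior as the default: callers have to opt out of escaping deliberately.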
|
data/lib/loofah/metahelpers.rb
ADDED
@@ -0,0 +1,15 @@
|
|
1
|
+
module Loofah
|
2
|
+
module MetaHelpers
|
3
|
+
def self.HashifiedConstants(orig_module)
|
4
|
+
hashed_module = Module.new
|
5
|
+
orig_module.constants.each do |constant|
|
6
|
+
next unless orig_module.module_eval("#{constant}").is_a?(Array)
|
7
|
+
hashed_module.module_eval <<-CODE
|
8
|
+
#{constant} = {}
|
9
|
+
#{orig_module.name}::#{constant}.each { |c| #{constant}[c] = true ; #{constant}[c.downcase] = true }
|
10
|
+
CODE
|
11
|
+
end
|
12
|
+
hashed_module
|
13
|
+
end
|
14
|
+
end
|
15
|
+
end
|
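The idea behind `HashifiedConstants` is to turn every array constant of a module into a lookup hash inside a fresh module that can then be included. The following is a hedged re-implementation for illustration only: it uses `const_get` and `const_set` instead of evaluating generated source strings, and `hashified_constants`, `ToyElements`, and `ToyHashed` are hypothetical names.

```ruby
# Hypothetical re-implementation: builds a module of lookup hashes from a
# module of array constants, without string-based module_eval.
def hashified_constants(orig_module)
  hashed_module = Module.new
  orig_module.constants.each do |constant|
    values = orig_module.const_get(constant)
    next unless values.is_a?(Array) # non-array constants are skipped

    lookup = {}
    values.each do |value|
      lookup[value] = true
      lookup[value.downcase] = true # also register the lower-cased form
    end
    hashed_module.const_set(constant, lookup)
  end
  hashed_module
end

module ToyElements # hypothetical sample data
  BLOCK = %w[DIV p]
  LABEL = "not an array, so it is skipped"
end

module ToyHashed
  include hashified_constants(ToyElements)
end

p ToyHashed::BLOCK["div"]   # => true
p ToyHashed::BLOCK["span"]  # => nil
```

A method whose name starts with a capital letter, as in the diff, lets the call site read like a parameterized module; the lower-case name here is just the more conventional Ruby spelling.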
data/lib/loofah/scrubbers.rb
CHANGED
@@ -58,7 +58,6 @@ module Loofah
|
|
58
58
|
# Loofah.fragment(link_farmers_markup).scrub!(:nofollow)
|
59
59
|
# => "ohai! <a href='http://www.myswarmysite.com/' rel="nofollow">I like your blog post</a>"
|
60
60
|
#
|
61
|
-
#
|
62
61
|
module Scrubbers
|
63
62
|
#
|
64
63
|
# === scrub!(:strip)
|
@@ -184,15 +183,30 @@ module Loofah
|
|
184
183
|
end
|
185
184
|
end
|
186
185
|
|
186
|
+
# This class probably isn't useful publicly, but is used for #to_text's current implementation
|
187
|
+
class NewlineBlockElements < Scrubber # :nodoc:
|
188
|
+
def initialize
|
189
|
+
@direction = :bottom_up
|
190
|
+
end
|
191
|
+
|
192
|
+
def scrub(node)
|
193
|
+
return CONTINUE unless Loofah::HashedElements::BLOCK_LEVEL[node.name]
|
194
|
+
replacement_killer = Nokogiri::XML::Text.new("\n#{node.content}\n", node.document)
|
195
|
+
node.add_next_sibling replacement_killer
|
196
|
+
node.remove
|
197
|
+
end
|
198
|
+
end
|
199
|
+
|
187
200
|
#
|
188
201
|
# A hash that maps a symbol (like +:prune+) to the appropriate Scrubber (Loofah::Scrubbers::Prune).
|
189
202
|
#
|
190
203
|
MAP = {
|
191
|
-
:escape
|
192
|
-
:prune
|
204
|
+
:escape => Escape,
|
205
|
+
:prune => Prune,
|
193
206
|
:whitewash => Whitewash,
|
194
|
-
:strip
|
195
|
-
:nofollow
|
207
|
+
:strip => Strip,
|
208
|
+
:nofollow => NoFollow,
|
209
|
+
:newline_block_elements => NewlineBlockElements
|
196
210
|
}
|
197
211
|
|
198
212
|
#
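The `NewlineBlockElements` scrubber replaces each block-level element with its text content wrapped in newlines, working bottom-up. The sketch below imitates that on a toy tree so it runs without Nokogiri. `Element`, `newline_blocks`, and `to_text_sketch` are hypothetical names, and the block-level list is abbreviated for the example.

```ruby
# Toy stand-in for a parsed document: an element is a Struct holding a tag
# name and children; text nodes are plain Strings. These types are
# hypothetical and unrelated to Nokogiri's node classes.
Element = Struct.new(:name, :children)

BLOCK_LEVEL = { "div" => true, "h1" => true, "p" => true }.freeze # abbreviated

# Children are rendered before their parent is wrapped, which mirrors the
# bottom-up direction the scrubber sets for itself.
def newline_blocks(node)
  return node if node.is_a?(String)

  inner = node.children.map { |child| newline_blocks(child) }.join
  BLOCK_LEVEL[node.name] ? "\n#{inner}\n" : inner
end

def to_text_sketch(node)
  # Same clean-up expression as the remove_extraneous_whitespace helper.
  newline_blocks(node).gsub(/\n\s*\n\s*\n/, "\n\n")
end

tree = Element.new("div", [
  "tweedle",
  Element.new("h1", ["beetle"]),
  "bottle",
  Element.new("span", ["puddle"]),
  "paddle",
  Element.new("div", ["battle"]),
  "muddle"
])

p to_text_sketch(tree)
# => "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n"
```

The input tree corresponds to the markup used in the new `#to_text` integration tests, and the result matches the string those tests expect; inline elements such as `span` contribute their text without extra newlines.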
|
data/test/integration/test_ad_hoc.rb
CHANGED
@@ -16,81 +16,6 @@ class TestAdHoc < Test::Unit::TestCase
|
|
16
16
|
end
|
17
17
|
end
|
18
18
|
|
19
|
-
context "integration test" do
|
20
|
-
context "xml document" do
|
21
|
-
context "custom scrubber" do
|
22
|
-
should "act as expected" do
|
23
|
-
xml = Loofah.xml_document <<-EOXML
|
24
|
-
<root>
|
25
|
-
<employee deceased='true'>Abraham Lincoln</employee>
|
26
|
-
<employee deceased='false'>Abe Vigoda</employee>
|
27
|
-
</root>
|
28
|
-
EOXML
|
29
|
-
bring_out_your_dead = Loofah::Scrubber.new do |node|
|
30
|
-
if node.name == "employee" and node["deceased"] == "true"
|
31
|
-
node.remove
|
32
|
-
Loofah::Scrubber::STOP # don't bother with the rest of the subtree
|
33
|
-
end
|
34
|
-
end
|
35
|
-
assert_equal 2, xml.css("employee").length
|
36
|
-
|
37
|
-
xml.scrub!(bring_out_your_dead)
|
38
|
-
|
39
|
-
employees = xml.css "employee"
|
40
|
-
assert_equal 1, employees.length
|
41
|
-
assert_equal "Abe Vigoda", employees.first.inner_text
|
42
|
-
end
|
43
|
-
end
|
44
|
-
end
|
45
|
-
|
46
|
-
context "xml fragment" do
|
47
|
-
context "custom scrubber" do
|
48
|
-
should "act as expected" do
|
49
|
-
xml = Loofah.xml_fragment <<-EOXML
|
50
|
-
<employee deceased='true'>Abraham Lincoln</employee>
|
51
|
-
<employee deceased='false'>Abe Vigoda</employee>
|
52
|
-
EOXML
|
53
|
-
bring_out_your_dead = Loofah::Scrubber.new do |node|
|
54
|
-
if node.name == "employee" and node["deceased"] == "true"
|
55
|
-
node.remove
|
56
|
-
Loofah::Scrubber::STOP # don't bother with the rest of the subtree
|
57
|
-
end
|
58
|
-
end
|
59
|
-
assert_equal 2, xml.css("employee").length
|
60
|
-
|
61
|
-
xml.scrub!(bring_out_your_dead)
|
62
|
-
|
63
|
-
employees = xml.css "employee"
|
64
|
-
assert_equal 1, employees.length
|
65
|
-
assert_equal "Abe Vigoda", employees.first.inner_text
|
66
|
-
end
|
67
|
-
end
|
68
|
-
end
|
69
|
-
|
70
|
-
context "html fragment" do
|
71
|
-
context "#to_s" do
|
72
|
-
should "not include head tags (like style)" do
|
73
|
-
html = Loofah.fragment "<style>foo</style><div>bar</div>"
|
74
|
-
assert_equal "<div>bar</div>", html.to_s
|
75
|
-
end
|
76
|
-
end
|
77
|
-
|
78
|
-
context "#text" do
|
79
|
-
should "not include head tags (like style)" do
|
80
|
-
html = Loofah.fragment "<style>foo</style><div>bar</div>"
|
81
|
-
assert_equal "bar", html.text
|
82
|
-
end
|
83
|
-
end
|
84
|
-
end
|
85
|
-
|
86
|
-
context "html document" do
|
87
|
-
should "not include head tags (like style)" do
|
88
|
-
html = Loofah.document "<style>foo</style><div>bar</div>"
|
89
|
-
assert_equal "bar", html.text
|
90
|
-
end
|
91
|
-
end
|
92
|
-
end
|
93
|
-
|
94
19
|
def test_removal_of_illegal_tag
|
95
20
|
html = <<-HTML
|
96
21
|
following this there should be no jim tag
|
data/test/integration/test_html.rb
ADDED
@@ -0,0 +1,51 @@
|
|
1
|
+
require File.expand_path(File.join(File.dirname(__FILE__), '..', 'helper'))
|
2
|
+
|
3
|
+
class TestHtml < Test::Unit::TestCase
|
4
|
+
context "html fragment" do
|
5
|
+
context "#to_s" do
|
6
|
+
should "not include head tags (like style)" do
|
7
|
+
html = Loofah.fragment "<style>foo</style><div>bar</div>"
|
8
|
+
assert_equal "<div>bar</div>", html.to_s
|
9
|
+
end
|
10
|
+
end
|
11
|
+
|
12
|
+
context "#text" do
|
13
|
+
should "not include head tags (like style)" do
|
14
|
+
html = Loofah.fragment "<style>foo</style><div>bar</div>"
|
15
|
+
assert_equal "bar", html.text
|
16
|
+
end
|
17
|
+
end
|
18
|
+
|
19
|
+
context "#to_text" do
|
20
|
+
should "add newlines before and after block elements" do
|
21
|
+
html = Loofah.fragment "<div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
|
22
|
+
assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
|
23
|
+
end
|
24
|
+
|
25
|
+
should "remove extraneous whitespace" do
|
26
|
+
html = Loofah.fragment "<div>tweedle\n\n\t\n\s\nbeetle</div>"
|
27
|
+
assert_equal "\ntweedle\n\nbeetle\n", html.to_text
|
28
|
+
end
|
29
|
+
end
|
30
|
+
end
|
31
|
+
|
32
|
+
context "html document" do
|
33
|
+
should "not include head tags (like style)" do
|
34
|
+
html = Loofah.document "<style>foo</style><div>bar</div>"
|
35
|
+
assert_equal "bar", html.text
|
36
|
+
end
|
37
|
+
|
38
|
+
context "#to_text" do
|
39
|
+
should "add newlines before and after block elements" do
|
40
|
+
html = Loofah.document "<div>tweedle<h1>beetle</h1>bottle<span>puddle</span>paddle<div>battle</div>muddle</div>"
|
41
|
+
assert_equal "\ntweedle\nbeetle\nbottlepuddlepaddle\nbattle\nmuddle\n", html.to_text
|
42
|
+
end
|
43
|
+
|
44
|
+
should "remove extraneous whitespace" do
|
45
|
+
html = Loofah.document "<div>tweedle\n\n\t\n\s\nbeetle</div>"
|
46
|
+
assert_equal "\ntweedle\n\nbeetle\n", html.to_text
|
47
|
+
end
|
48
|
+
end
|
49
|
+
end
|
50
|
+
end
|
51
|
+
|
data/test/integration/test_scrubbers.rb
CHANGED
@@ -18,6 +18,7 @@ class TestScrubbers < Test::Unit::TestCase
|
|
18
18
|
|
19
19
|
ENTITY_HACK_ATTACK = "<div><div>Hack attack!</div><div>&lt;script&gt;alert('evil')&lt;/script&gt;</div></div>"
|
20
20
|
ENTITY_HACK_ATTACK_TEXT_SCRUB = "Hack attack!&lt;script&gt;alert('evil')&lt;/script&gt;"
|
21
|
+
ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC = "Hack attack!<script>alert('evil')</script>"
|
21
22
|
|
22
23
|
context "Document" do
|
23
24
|
context "#scrub!" do
|
@@ -89,6 +90,24 @@ class TestScrubbers < Test::Unit::TestCase
|
|
89
90
|
|
90
91
|
assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
|
91
92
|
end
|
93
|
+
|
94
|
+
context "with encode_special_chars => false" do
|
95
|
+
should "leave behind only inner text with html entities unescaped" do
|
96
|
+
doc = Loofah::HTML::Document.parse "<html><body>#{ENTITY_HACK_ATTACK}</body></html>"
|
97
|
+
result = doc.text(:encode_special_chars => false)
|
98
|
+
|
99
|
+
assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC, result
|
100
|
+
end
|
101
|
+
end
|
102
|
+
|
103
|
+
context "with encode_special_chars => true" do
|
104
|
+
should "leave behind only inner text with html entities still escaped" do
|
105
|
+
doc = Loofah::HTML::Document.parse "<html><body>#{ENTITY_HACK_ATTACK}</body></html>"
|
106
|
+
result = doc.text(:encode_special_chars => true)
|
107
|
+
|
108
|
+
assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
|
109
|
+
end
|
110
|
+
end
|
92
111
|
end
|
93
112
|
|
94
113
|
context "#to_s" do
|
@@ -239,6 +258,24 @@ class TestScrubbers < Test::Unit::TestCase
|
|
239
258
|
|
240
259
|
assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
|
241
260
|
end
|
261
|
+
|
262
|
+
context "with encode_special_chars => false" do
|
263
|
+
should "leave behind only inner text with html entities unescaped" do
|
264
|
+
doc = Loofah::HTML::DocumentFragment.parse "<div>#{ENTITY_HACK_ATTACK}</div>"
|
265
|
+
result = doc.text(:encode_special_chars => false)
|
266
|
+
|
267
|
+
assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC, result
|
268
|
+
end
|
269
|
+
end
|
270
|
+
|
271
|
+
context "with encode_special_chars => true" do
|
272
|
+
should "leave behind only inner text with html entities still escaped" do
|
273
|
+
doc = Loofah::HTML::DocumentFragment.parse "<div>#{ENTITY_HACK_ATTACK}</div>"
|
274
|
+
result = doc.text(:encode_special_chars => true)
|
275
|
+
|
276
|
+
assert_equal ENTITY_HACK_ATTACK_TEXT_SCRUB, result
|
277
|
+
end
|
278
|
+
end
|
242
279
|
end
|
243
280
|
|
244
281
|
context "#to_s" do
|
data/test/integration/test_xml.rb
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
require File.expand_path(File.join(File.dirname(__FILE__), '..', 'helper'))
|
2
|
+
|
3
|
+
class TestXml < Test::Unit::TestCase
|
4
|
+
context "integration test" do
|
5
|
+
context "xml document" do
|
6
|
+
context "custom scrubber" do
|
7
|
+
should "act as expected" do
|
8
|
+
xml = Loofah.xml_document <<-EOXML
|
9
|
+
<root>
|
10
|
+
<employee deceased='true'>Abraham Lincoln</employee>
|
11
|
+
<employee deceased='false'>Abe Vigoda</employee>
|
12
|
+
</root>
|
13
|
+
EOXML
|
14
|
+
bring_out_your_dead = Loofah::Scrubber.new do |node|
|
15
|
+
if node.name == "employee" and node["deceased"] == "true"
|
16
|
+
node.remove
|
17
|
+
Loofah::Scrubber::STOP # don't bother with the rest of the subtree
|
18
|
+
end
|
19
|
+
end
|
20
|
+
assert_equal 2, xml.css("employee").length
|
21
|
+
|
22
|
+
xml.scrub!(bring_out_your_dead)
|
23
|
+
|
24
|
+
employees = xml.css "employee"
|
25
|
+
assert_equal 1, employees.length
|
26
|
+
assert_equal "Abe Vigoda", employees.first.inner_text
|
27
|
+
end
|
28
|
+
end
|
29
|
+
end
|
30
|
+
|
31
|
+
context "xml fragment" do
|
32
|
+
context "custom scrubber" do
|
33
|
+
should "act as expected" do
|
34
|
+
xml = Loofah.xml_fragment <<-EOXML
|
35
|
+
<employee deceased='true'>Abraham Lincoln</employee>
|
36
|
+
<employee deceased='false'>Abe Vigoda</employee>
|
37
|
+
EOXML
|
38
|
+
bring_out_your_dead = Loofah::Scrubber.new do |node|
|
39
|
+
if node.name == "employee" and node["deceased"] == "true"
|
40
|
+
node.remove
|
41
|
+
Loofah::Scrubber::STOP # don't bother with the rest of the subtree
|
42
|
+
end
|
43
|
+
end
|
44
|
+
assert_equal 2, xml.css("employee").length
|
45
|
+
|
46
|
+
xml.scrub!(bring_out_your_dead)
|
47
|
+
|
48
|
+
employees = xml.css "employee"
|
49
|
+
assert_equal 1, employees.length
|
50
|
+
assert_equal "Abe Vigoda", employees.first.inner_text
|
51
|
+
end
|
52
|
+
end
|
53
|
+
end
|
54
|
+
end
|
55
|
+
end
|
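The custom scrubber in these tests returns `Loofah::Scrubber::STOP` to skip the rest of a removed subtree. The general pattern, a top-down walk whose visitor can veto descent, can be sketched on a toy tree. Everything below (`Node`, `STOP`, `walk`) is a hypothetical miniature, not Loofah's actual traversal code.

```ruby
# Hypothetical miniature of a top-down traversal with an early-out signal.
Node = Struct.new(:name, :attrs, :children)
STOP = :stop

# Visits children top-down; a visitor result of STOP means "do not descend
# into this node". Iterating over a copy keeps removal during the walk safe.
def walk(parent, &visitor)
  parent.children.dup.each do |child|
    result = visitor.call(child, parent)
    walk(child, &visitor) unless result == STOP
  end
end

root = Node.new("root", {}, [
  Node.new("employee", { "deceased" => "true" },  [Node.new("name", {}, [])]),
  Node.new("employee", { "deceased" => "false" }, [Node.new("name", {}, [])])
])

visited = []
walk(root) do |node, parent|
  visited << node.name
  if node.name == "employee" && node.attrs["deceased"] == "true"
    parent.children.delete_if { |c| c.equal?(node) } # remove this exact node
    STOP # don't bother with the rest of the subtree
  end
end

p root.children.length  # => 1
p visited               # => ["employee", "employee", "name"]
```

The `visited` list shows that the child of the removed element was never inspected, which is the saving the STOP signal is meant to provide.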
data/test/unit/test_api.rb
CHANGED
@@ -81,13 +81,13 @@ class TestApi < Test::Unit::TestCase
|
|
81
81
|
end
|
82
82
|
|
83
83
|
def test_loofah_xml_document_node_scrub!
|
84
|
-
doc = Loofah.
|
84
|
+
doc = Loofah.xml_document(XML)
|
85
85
|
assert(node = doc.at_css("div"))
|
86
86
|
node.scrub!(:strip)
|
87
87
|
end
|
88
88
|
|
89
89
|
def test_loofah_xml_fragment_node_scrub!
|
90
|
-
doc = Loofah.
|
90
|
+
doc = Loofah.xml_fragment(XML)
|
91
91
|
assert(node = doc.at_css("div"))
|
92
92
|
node.scrub!(:strip)
|
93
93
|
end
|
@@ -99,6 +99,16 @@ class TestApi < Test::Unit::TestCase
|
|
99
99
|
node_set.scrub!(:strip)
|
100
100
|
end
|
101
101
|
|
102
|
+
should "HTML::DocumentFragment exposes serialize_root" do
|
103
|
+
doc = Loofah.fragment(HTML)
|
104
|
+
assert_equal HTML, doc.serialize_root.to_html
|
105
|
+
end
|
106
|
+
|
107
|
+
should "HTML::Document exposes serialize_root" do
|
108
|
+
doc = Loofah.document(HTML)
|
109
|
+
assert_equal HTML, doc.serialize_root.children.to_html
|
110
|
+
end
|
111
|
+
|
102
112
|
private
|
103
113
|
|
104
114
|
def assert_html_documentish(doc)
|
metadata
CHANGED
@@ -1,7 +1,12 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: loofah
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
|
4
|
+
prerelease: false
|
5
|
+
segments:
|
6
|
+
- 0
|
7
|
+
- 4
|
8
|
+
- 7
|
9
|
+
version: 0.4.7
|
5
10
|
platform: ruby
|
6
11
|
authors:
|
7
12
|
- Mike Dalessio
|
@@ -31,59 +36,105 @@ cert_chain:
|
|
31
36
|
FlqnTjy13J3nD30uxy9a1g==
|
32
37
|
-----END CERTIFICATE-----
|
33
38
|
|
34
|
-
date: 2010-
|
39
|
+
date: 2010-03-09 00:00:00 -05:00
|
35
40
|
default_executable:
|
36
41
|
dependencies:
|
37
42
|
- !ruby/object:Gem::Dependency
|
38
43
|
name: nokogiri
|
39
|
-
|
40
|
-
|
41
|
-
version_requirements: !ruby/object:Gem::Requirement
|
44
|
+
prerelease: false
|
45
|
+
requirement: &id001 !ruby/object:Gem::Requirement
|
42
46
|
requirements:
|
43
47
|
- - ">="
|
44
48
|
- !ruby/object:Gem::Version
|
49
|
+
segments:
|
50
|
+
- 1
|
51
|
+
- 3
|
52
|
+
- 3
|
45
53
|
version: 1.3.3
|
46
|
-
|
54
|
+
type: :runtime
|
55
|
+
version_requirements: *id001
|
47
56
|
- !ruby/object:Gem::Dependency
|
48
|
-
name:
|
57
|
+
name: rubyforge
|
58
|
+
prerelease: false
|
59
|
+
requirement: &id002 !ruby/object:Gem::Requirement
|
60
|
+
requirements:
|
61
|
+
- - ">="
|
62
|
+
- !ruby/object:Gem::Version
|
63
|
+
segments:
|
64
|
+
- 2
|
65
|
+
- 0
|
66
|
+
- 3
|
67
|
+
version: 2.0.3
|
68
|
+
type: :development
|
69
|
+
version_requirements: *id002
|
70
|
+
- !ruby/object:Gem::Dependency
|
71
|
+
name: gemcutter
|
72
|
+
prerelease: false
|
73
|
+
requirement: &id003 !ruby/object:Gem::Requirement
|
74
|
+
requirements:
|
75
|
+
- - ">="
|
76
|
+
- !ruby/object:Gem::Version
|
77
|
+
segments:
|
78
|
+
- 0
|
79
|
+
- 3
|
80
|
+
- 0
|
81
|
+
version: 0.3.0
|
49
82
|
type: :development
|
50
|
-
|
51
|
-
|
83
|
+
version_requirements: *id003
|
84
|
+
- !ruby/object:Gem::Dependency
|
85
|
+
name: mocha
|
86
|
+
prerelease: false
|
87
|
+
requirement: &id004 !ruby/object:Gem::Requirement
|
52
88
|
requirements:
|
53
89
|
- - ">="
|
54
90
|
- !ruby/object:Gem::Version
|
91
|
+
segments:
|
92
|
+
- 0
|
93
|
+
- 9
|
55
94
|
version: "0.9"
|
56
|
-
|
95
|
+
type: :development
|
96
|
+
version_requirements: *id004
|
57
97
|
- !ruby/object:Gem::Dependency
|
58
98
|
name: thoughtbot-shoulda
|
59
|
-
|
60
|
-
|
61
|
-
version_requirements: !ruby/object:Gem::Requirement
|
99
|
+
prerelease: false
|
100
|
+
requirement: &id005 !ruby/object:Gem::Requirement
|
62
101
|
requirements:
|
63
102
|
- - ">="
|
64
103
|
- !ruby/object:Gem::Version
|
104
|
+
segments:
|
105
|
+
- 2
|
106
|
+
- 10
|
65
107
|
version: "2.10"
|
66
|
-
|
108
|
+
type: :development
|
109
|
+
version_requirements: *id005
|
67
110
|
- !ruby/object:Gem::Dependency
|
68
111
|
name: acts_as_fu
|
69
|
-
|
70
|
-
|
71
|
-
version_requirements: !ruby/object:Gem::Requirement
|
112
|
+
prerelease: false
|
113
|
+
requirement: &id006 !ruby/object:Gem::Requirement
|
72
114
|
requirements:
|
73
115
|
- - ">="
|
74
116
|
- !ruby/object:Gem::Version
|
117
|
+
segments:
|
118
|
+
- 0
|
119
|
+
- 0
|
120
|
+
- 5
|
75
121
|
version: 0.0.5
|
76
|
-
|
122
|
+
type: :development
|
123
|
+
version_requirements: *id006
|
77
124
|
- !ruby/object:Gem::Dependency
|
78
125
|
name: hoe
|
79
|
-
|
80
|
-
|
81
|
-
version_requirements: !ruby/object:Gem::Requirement
|
126
|
+
prerelease: false
|
127
|
+
requirement: &id007 !ruby/object:Gem::Requirement
|
82
128
|
requirements:
|
83
129
|
- - ">="
|
84
130
|
- !ruby/object:Gem::Version
|
85
|
-
|
86
|
-
|
131
|
+
segments:
|
132
|
+
- 2
|
133
|
+
- 5
|
134
|
+
- 0
|
135
|
+
version: 2.5.0
|
136
|
+
type: :development
|
137
|
+
version_requirements: *id007
|
87
138
|
description: |-
|
88
139
|
Loofah is a general library for manipulating HTML/XML documents and
|
89
140
|
fragments. It's built on top of Nokogiri and libxml2, so it's fast and
|
@@ -122,12 +173,14 @@ files:
|
|
122
173
|
- init.rb
|
123
174
|
- lib/loofah.rb
|
124
175
|
- lib/loofah/active_record.rb
|
176
|
+
- lib/loofah/elements.rb
|
125
177
|
- lib/loofah/helpers.rb
|
126
178
|
- lib/loofah/html/document.rb
|
127
179
|
- lib/loofah/html/document_fragment.rb
|
128
180
|
- lib/loofah/html5/scrub.rb
|
129
181
|
- lib/loofah/html5/whitelist.rb
|
130
182
|
- lib/loofah/instance_methods.rb
|
183
|
+
- lib/loofah/metahelpers.rb
|
131
184
|
- lib/loofah/scrubber.rb
|
132
185
|
- lib/loofah/scrubbers.rb
|
133
186
|
- lib/loofah/xml/document.rb
|
@@ -137,7 +190,9 @@ files:
|
|
137
190
|
- test/html5/test_sanitizer.rb
|
138
191
|
- test/integration/test_ad_hoc.rb
|
139
192
|
- test/integration/test_helpers.rb
|
193
|
+
- test/integration/test_html.rb
|
140
194
|
- test/integration/test_scrubbers.rb
|
195
|
+
- test/integration/test_xml.rb
|
141
196
|
- test/unit/test_active_record.rb
|
142
197
|
- test/unit/test_api.rb
|
143
198
|
- test/unit/test_helpers.rb
|
@@ -158,18 +213,20 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
158
213
|
requirements:
|
159
214
|
- - ">="
|
160
215
|
- !ruby/object:Gem::Version
|
216
|
+
segments:
|
217
|
+
- 0
|
161
218
|
version: "0"
|
162
|
-
version:
|
163
219
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
164
220
|
requirements:
|
165
221
|
- - ">="
|
166
222
|
- !ruby/object:Gem::Version
|
223
|
+
segments:
|
224
|
+
- 0
|
167
225
|
version: "0"
|
168
|
-
version:
|
169
226
|
requirements: []
|
170
227
|
|
171
228
|
rubyforge_project: loofah
|
172
|
-
rubygems_version: 1.3.
|
229
|
+
rubygems_version: 1.3.6
|
173
230
|
signing_key:
|
174
231
|
specification_version: 3
|
175
232
|
summary: Loofah is a general library for manipulating HTML/XML documents and fragments
|
@@ -177,6 +234,8 @@ test_files:
|
|
177
234
|
- test/integration/test_helpers.rb
|
178
235
|
- test/integration/test_scrubbers.rb
|
179
236
|
- test/integration/test_ad_hoc.rb
|
237
|
+
- test/integration/test_xml.rb
|
238
|
+
- test/integration/test_html.rb
|
180
239
|
- test/unit/test_xss_foliate.rb
|
181
240
|
- test/unit/test_helpers.rb
|
182
241
|
- test/unit/test_scrubber.rb
|
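The `segments` entries that appear throughout the new metadata are the numeric parts of each version string, and the `requirements` entries are ordinary RubyGems requirements. A short sketch using the RubyGems API shows how they relate; the version numbers are taken from the metadata above.

```ruby
require "rubygems" # normally loaded already; harmless to repeat

version = Gem::Version.new("0.4.7")
p version.segments     # => [0, 4, 7]
p version.prerelease?  # => false

# The nokogiri runtime dependency from the metadata: ">= 1.3.3".
nokogiri_req = Gem::Requirement.new(">= 1.3.3")
p nokogiri_req.satisfied_by?(Gem::Version.new("1.3.3"))  # => true
p nokogiri_req.satisfied_by?(Gem::Version.new("1.3.2"))  # => false
```

Versions compare segment by segment rather than as strings, which is why the serialized form records them explicitly.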
metadata.gz.sig
CHANGED
Binary file
|