schwabsauce-merb_dm_xss_terminate 0.6.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2008 Mike Schwab
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README ADDED
@@ -0,0 +1,89 @@
1
+ This fork hooks into DataMapper::Resource instead of ActiveRecord::Base.
2
+
3
+ Open question: what is the Merb/DataMapper equivalent of ENV['MODELS'] in the rake task? Apart from that, I think the port is complete.
4
+
5
+ = merb_dm_xss_terminate
6
+
7
+ Plugin that auto-sanitizes data before it is saved in your DataMapper models.
8
+
9
+ merb_dm_xss_terminate is a port of merb_xss_terminate by Ben Chiu, which is
10
+ a port of xss_terminate by Luke Francl. The white list sanitizer and full
11
+ sanitizer were lifted from Rails so you don't have to install ActionPack.
12
+
13
+ merb_dm_xss_terminate makes stripping and sanitizing HTML automatic. Install
14
+ it and forget about it. You won't need to remember to escape your output
15
+ anymore; just remember the fields where HTML is allowed.
16
+
17
+ *By default, it will strip all HTML tags from user input.* But merb_dm_xss_terminate
18
+ is also flexible. When you need users to be able to enter HTML, the plugin
19
+ allows you to remove bad HTML with your choice of two whitelist-based sanitizers,
20
+ or to skip HTML sanitization entirely on a per-field basis.
21
+
22
+ == Installation
23
+
24
+ git clone git://github.com/schwabsauce/merb_xss_terminate.git
25
+ cd merb_xss_terminate
26
+ rake install
27
+ Then add dependency 'merb_dm_xss_terminate' to init.rb.
28
+
29
+ == HTML sanitization
30
+
31
+ * Full-sanitizer: removes all HTML by stripping all the tags. Tags are
32
+ removed, but their content is not.
33
+ * White-list sanitizer: removes bad HTML with Rails' HTML sanitizer methods.
34
+ Bad tags are removed completely, including their content.
35
+ * HTML5lib sanitization: removes bad HTML after parsing it with
36
+ {HTML5lib}[http://code.google.com/p/html5lib/], a library that parses HTML
37
+ like browsers do. It should be very tolerant of invalid HTML. Bad tags are
38
+ escaped, not removed.
39
+ * Do nothing. You can choose not to process given fields.
40
+
41
+ == Usage
42
+
43
+ Installing the plugin creates a +before :save+ hook that will strip HTML tags
44
+ from all string and text fields. No further configuration is necessary if this
45
+ is what you want. To customize the behavior, you use the xss_terminate
46
+ class method in your models.
47
+
48
+ To exempt some fields from sanitization, use the <tt>:except</tt> option
49
+ with a list of fields not to process. Note: although Merb uses :exclude, this plugin expects :except.
50
+
51
+ class Comment
52
+ xss_terminate :except => [ :body ]
53
+ end
54
+
55
+ To sanitize HTML with Rails' sanitization, use the <tt>:sanitize</tt> option:
56
+
57
+ class Review
58
+ xss_terminate :sanitize => [ :body, :author_name ]
59
+ end
60
+
61
+ To sanitize HTML with {HTML5Lib}[http://code.google.com/p/html5lib/]
62
+ use the <tt>:html5lib_sanitize</tt> option with a list of fields to sanitize:
63
+
64
+ class Entry
65
+ xss_terminate :html5lib_sanitize => [ :body, :author_name ]
66
+ end
67
+
68
+ You can combine multiple options if you have some fields you would like skipped
69
+ and others sanitized. Fields not listed in the option arrays will be stripped.
70
+
71
+ class Message
72
+ xss_terminate :except => [ :body ], :sanitize => [ :title ]
73
+ end
74
+
75
+ == Sanitizing existing records
76
+
77
+ After installing merb_dm_xss_terminate and configuring it to your liking, you can
78
+ run <tt>rake merb_xss_terminate:db:sanitize MODELS=Foo,Bar,Baz</tt> to apply
79
+ it to your existing records. This loads the records of each listed model and
80
+ saves them again to invoke the before :save hook.
81
+
82
+ == Credits
83
+
84
+ merb_xss_terminate by Ben Chiu.
85
+
86
+ xss_terminate by {Luke Francl}[http://railspikes.com] and acts_as_sanitized by
87
+ {Alex Payne}[http://www.al3x.net].
88
+
89
+ HTML5Lib sanitization by {Jacques Distler}[http://golem.ph.utexas.edu/~distler].
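The per-field rules in the Usage section can be sketched in plain Ruby, without DataMapper or Merb. This is a minimal sketch under stated assumptions, not the plugin's implementation: `XssSketch`, `sanitized_attributes` and the three lambda sanitizers are hypothetical stand-ins, and their regexes are far too naive for real XSS protection; they only show which branch each field takes. The check order when a field appears in more than one list is also an assumption.

```ruby
# Hypothetical sketch of xss_terminate's per-field dispatch. Not the plugin's
# code: the sanitizers below are naive placeholders, unsafe for real input.
module XssSketch
  STRIP    = ->(s) { s.gsub(/<[^>]*>/, '') }                         # remove all tags, keep text
  SANITIZE = ->(s) { s.gsub(%r{<script\b[^>]*>.*?</script>}mi, '') } # drop script elements
  HTML5LIB = ->(s) { s.gsub('<', '&lt;').gsub('>', '&gt;') }         # escape instead of remove

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Records the option lists, using the option names from the README.
    def xss_terminate(options = {})
      @xss_options = {
        :except            => Array(options[:except]),
        :sanitize          => Array(options[:sanitize]),
        :html5lib_sanitize => Array(options[:html5lib_sanitize])
      }
    end

    def xss_options
      @xss_options || { :except => [], :sanitize => [], :html5lib_sanitize => [] }
    end
  end

  # What a before :save hook might compute: String values are processed per
  # the option lists; fields in no list are fully stripped.
  def sanitized_attributes(attributes)
    opts = self.class.xss_options
    attributes.each_with_object({}) do |(field, value), out|
      out[field] =
        if !value.is_a?(String) || opts[:except].include?(field)
          value
        elsif opts[:sanitize].include?(field)
          SANITIZE.call(value)
        elsif opts[:html5lib_sanitize].include?(field)
          HTML5LIB.call(value)
        else
          STRIP.call(value)
        end
    end
  end
end

class Message
  include XssSketch
  xss_terminate :except => [:body], :sanitize => [:title]
end

clean = Message.new.sanitized_attributes(
  :body   => "<b>hi</b>",
  :title  => "<script>alert(1)</script>Hello",
  :author => "<i>me</i>",
  :views  => 3
)
```

Here `:body` passes through untouched, `:title` goes to the (placeholder) whitelist sanitizer, `:author` is in no list and is fully stripped, and the non-string `:views` is left alone.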
data/Rakefile ADDED
@@ -0,0 +1,64 @@
1
+ require 'rubygems'
2
+ require 'rubygems/specification'
3
+ require 'rake/gempackagetask'
4
+ require 'spec/rake/spectask'
5
+ require 'merb-core/test/tasks/spectasks'
6
+ require 'date'
7
+ require 'merb_rake_helper'
8
+
9
+ PLUGIN = "merb_dm_xss_terminate"
10
+ NAME = "merb_dm_xss_terminate"
11
+ GEM_VERSION = "0.5.3"
12
+ AUTHOR = "Mike Schwab"
13
+ EMAIL = "mike.schwab@gmail.com"
14
+ HOMEPAGE = "http://github.com/schwabsauce/merb_xss_terminate"
15
+ SUMMARY = "Plugin that auto-sanitizes data before it is saved in your DataMapper models"
16
+
17
+ spec = Gem::Specification.new do |s|
18
+ s.name = NAME
19
+ s.version = GEM_VERSION
20
+ s.platform = Gem::Platform::RUBY
21
+ s.has_rdoc = true
22
+ s.extra_rdoc_files = ["README", "LICENSE", 'TODO']
23
+ s.summary = SUMMARY
24
+ s.description = s.summary
25
+ s.author = AUTHOR
26
+ s.email = EMAIL
27
+ s.homepage = HOMEPAGE
28
+ s.add_dependency('merb-core', '>= 0.9.0')
29
+ s.add_dependency('html5', '>= 0.10.0')
30
+ s.require_path = 'lib'
31
+ s.files = %w(LICENSE README Rakefile TODO) + Dir.glob("{lib,spec}/**/*")
32
+ end
33
+
34
+ Rake::GemPackageTask.new(spec) do |pkg|
35
+ pkg.gem_spec = spec
36
+ end
37
+
38
+ desc "install the plugin locally"
39
+ task :install => [:package] do
40
+ sh %{#{sudo} #{gemx} install pkg/#{NAME}-#{GEM_VERSION} --local --no-update-sources}
41
+ end
42
+
43
+ desc "install frozen (source must be located somewhere inside main frozen gems folder)"
44
+ task :install_frozen => [:package] do
45
+ if !path = gems_path
46
+ puts "source must be located somewhere inside main frozen gems folder"
47
+ else
48
+ sh %{#{sudo} #{gemx} install pkg/#{NAME}-#{GEM_VERSION} -i #{path} --local --no-update-sources}
49
+ end
50
+ end
51
+
52
+ desc "create a gemspec file"
53
+ task :make_spec do
54
+ File.open("#{NAME}.gemspec", "w") do |file|
55
+ file.puts spec.to_ruby
56
+ end
57
+ end
58
+
59
+ namespace :jruby do
60
+ desc "Run :package and install the resulting .gem with jruby"
61
+ task :install => :package do
62
+ sh %{#{sudo} jruby -S gem install pkg/#{NAME}-#{GEM_VERSION}.gem --no-rdoc --no-ri}
63
+ end
64
+ end
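The make_spec task writes `spec.to_ruby` to a .gemspec file. The same serialization can be tried with plain RubyGems, as sketched below. The field values are copied from the Rakefile, while the file list is shortened and `has_rdoc`, the packaging task and the merb_rake_helper-based install tasks are left out so the snippet runs on its own.

```ruby
require 'rubygems'

# Same metadata as the Rakefile's spec; only the file list is trimmed.
spec = Gem::Specification.new do |s|
  s.name         = "merb_dm_xss_terminate"
  s.version      = "0.5.3"
  s.platform     = Gem::Platform::RUBY
  s.summary      = "Plugin that auto-sanitizes data before it is saved in your DataMapper models"
  s.description  = s.summary
  s.author       = "Mike Schwab"
  s.email        = "mike.schwab@gmail.com"
  s.homepage     = "http://github.com/schwabsauce/merb_xss_terminate"
  s.add_dependency('merb-core', '>= 0.9.0')
  s.add_dependency('html5', '>= 0.10.0')
  s.require_path = 'lib'
  s.files        = %w(LICENSE README Rakefile TODO)
end

# Roughly what `rake make_spec` would write to merb_dm_xss_terminate.gemspec.
gemspec_source = spec.to_ruby
```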
data/TODO ADDED
@@ -0,0 +1 @@
1
+ TODO:
@@ -0,0 +1,12 @@
1
+ if defined?(Merb::Plugins)
2
+ require 'merb_dm_xss_terminate/xss_terminate'
3
+ require 'merb_dm_xss_terminate/rails_sanitize'
4
+ require 'merb_dm_xss_terminate/html5lib_sanitize'
5
+
6
+ Merb::BootLoader.before_app_loads do
7
+ DataMapper::Resource.append_inclusions XssTerminate
8
+ DataMapper::Model.append_extensions XssTerminate::ClassMethods
9
+ end
10
+
11
+ Merb::Plugins.add_rakefiles "merb_dm_xss_terminate/merbtasks"
12
+ end
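The boot hook above registers XssTerminate so that every DataMapper model picks up its instance methods and its class methods (such as `xss_terminate`). A plain-Ruby sketch of that registration pattern follows. `FakeResource` is a hypothetical stand-in: real DataMapper splits the two registries between DataMapper::Resource and DataMapper::Model, and the contents of `XssTerminate` here are placeholders, not the plugin's code.

```ruby
# Hypothetical stand-in for the DataMapper hooks used above; one module plays
# the roles of both DataMapper::Resource and DataMapper::Model.
module FakeResource
  @inclusions = []
  @extensions = []

  class << self
    def append_inclusions(*mods)
      @inclusions.concat(mods)
    end

    def append_extensions(*mods)
      @extensions.concat(mods)
    end

    # Runs when a model does `include FakeResource`: mix in whatever has been
    # registered so far. Modules registered later are not applied.
    def included(model)
      @inclusions.each { |m| model.send(:include, m) }
      @extensions.each { |m| model.extend(m) }
    end
  end
end

# Minimal placeholder for the plugin's module (hypothetical contents).
module XssTerminate
  def xss_terminated?
    true
  end

  module ClassMethods
    def xss_terminate(options = {})
      @xss_options = options
    end

    def xss_options
      @xss_options
    end
  end
end

# Equivalent of the before_app_loads block: register before models load.
FakeResource.append_inclusions XssTerminate
FakeResource.append_extensions XssTerminate::ClassMethods

class Comment
  include FakeResource
  xss_terminate :except => [:body]
end
```

In this sketch a model only receives modules registered before it includes FakeResource, which is presumably why the plugin registers its hooks in before_app_loads.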
@@ -0,0 +1,68 @@
1
+ require File.dirname(__FILE__) + '/tokenizer'
2
+ require File.dirname(__FILE__) + '/node'
3
+ require File.dirname(__FILE__) + '/selector'
4
+ require File.dirname(__FILE__) + '/sanitizer'
5
+
6
+ module HTML #:nodoc:
7
+ # A top-level HTML document. You give it a body of text, and it will parse that
8
+ # text into a tree of nodes.
9
+ class Document #:nodoc:
10
+
11
+ # The root of the parsed document.
12
+ attr_reader :root
13
+
14
+ # Create a new Document from the given text.
15
+ def initialize(text, strict=false, xml=false)
16
+ tokenizer = Tokenizer.new(text)
17
+ @root = Node.new(nil)
18
+ node_stack = [ @root ]
19
+ while token = tokenizer.next
20
+ node = Node.parse(node_stack.last, tokenizer.line, tokenizer.position, token)
21
+
22
+ node_stack.last.children << node unless node.tag? && node.closing == :close
23
+ if node.tag?
24
+ if node_stack.length > 1 && node.closing == :close
25
+ if node_stack.last.name == node.name
26
+ if node_stack.last.children.empty?
27
+ node_stack.last.children << Text.new(node_stack.last, node.line, node.position, "")
28
+ end
29
+ node_stack.pop
30
+ else
31
+ open_start = node_stack.last.position - 20
32
+ open_start = 0 if open_start < 0
33
+ close_start = node.position - 20
34
+ close_start = 0 if close_start < 0
35
+ msg = <<EOF.strip
36
+ ignoring attempt to close #{node_stack.last.name} with #{node.name}
37
+ opened at byte #{node_stack.last.position}, line #{node_stack.last.line}
38
+ closed at byte #{node.position}, line #{node.line}
39
+ attributes at open: #{node_stack.last.attributes.inspect}
40
+ text around open: #{text[open_start,40].inspect}
41
+ text around close: #{text[close_start,40].inspect}
42
+ EOF
43
+ strict ? raise(msg) : warn(msg)
44
+ end
45
+ elsif !node.childless?(xml) && node.closing != :close
46
+ node_stack.push node
47
+ end
48
+ end
49
+ end
50
+ end
51
+
52
+ # Search the tree for (and return) the first node that matches the given
53
+ # conditions. The conditions are interpreted differently for different node
54
+ # types, see HTML::Text#find and HTML::Tag#find.
55
+ def find(conditions)
56
+ @root.find(conditions)
57
+ end
58
+
59
+ # Search the tree for (and return) all nodes that match the given
60
+ # conditions. The conditions are interpreted differently for different node
61
+ # types, see HTML::Text#find and HTML::Tag#find.
62
+ def find_all(conditions)
63
+ @root.find_all(conditions)
64
+ end
65
+
66
+ end
67
+
68
+ end
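HTML::Document#initialize keeps a stack of open tags: a closing tag pops the stack when it matches the innermost open tag, and otherwise produces the "ignoring attempt to close" message (raised in strict mode, warned otherwise). Void elements and self-closing tags are never pushed. The sketch below restates only that bookkeeping, with tag names instead of Node objects. The regex tokenizer and `check_nesting` are hypothetical simplifications, not HTML::Tokenizer, and they ignore comments, CDATA and `>` inside quoted attribute values.

```ruby
# Void tags, as listed in HTML::Tag#childless?
VOID_TAGS = %w(img br hr link meta area base basefont col frame input isindex param)

# Simplified restatement of the node-stack loop: returns the tags still open
# and the warnings for mismatched closing tags.
def check_nesting(text, strict = false)
  stack = []
  warnings = []
  text.scan(%r{<(/?)([\w:-]+)[^>]*?(/?)>}) do |close, name, self_close|
    name = name.downcase
    if close == "/"
      if stack.last == name
        stack.pop                      # well-nested close: pop the open tag
      elsif !stack.empty?
        msg = "ignoring attempt to close #{stack.last} with #{name}"
        raise msg if strict            # strict mode raises ...
        warnings << msg                # ... otherwise the close is ignored
      end
    elsif self_close != "/" && !VOID_TAGS.include?(name)
      stack.push(name)                 # only tags that can have children
    end
  end
  [stack, warnings]
end

ok_stack, ok_warnings = check_nesting("<div><p>hi</p><br><img src='a.png'/></div>")
bad_stack, bad_warnings = check_nesting("<div><span>x</div>")
```

For the mismatched input, the `</div>` is ignored with a warning, so both `div` and `span` remain open.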
@@ -0,0 +1,530 @@
1
+ require 'strscan'
2
+
3
+ module HTML #:nodoc:
4
+
5
+ class Conditions < Hash #:nodoc:
6
+ def initialize(hash)
7
+ super()
8
+ hash = { :content => hash } unless Hash === hash
9
+ hash = keys_to_symbols(hash)
10
+ hash.each do |k,v|
11
+ case k
12
+ when :tag, :content then
13
+ # keys are valid, and require no further processing
14
+ when :attributes then
15
+ hash[k] = keys_to_strings(v)
16
+ when :parent, :child, :ancestor, :descendant, :sibling, :before,
17
+ :after
18
+ hash[k] = Conditions.new(v)
19
+ when :children
20
+ hash[k] = v = keys_to_symbols(v)
21
+ v.each do |k,v2|
22
+ case k
23
+ when :count, :greater_than, :less_than
24
+ # keys are valid, and require no further processing
25
+ when :only
26
+ v[k] = Conditions.new(v2)
27
+ else
28
+ raise "illegal key #{k.inspect} => #{v2.inspect}"
29
+ end
30
+ end
31
+ else
32
+ raise "illegal key #{k.inspect} => #{v.inspect}"
33
+ end
34
+ end
35
+ update hash
36
+ end
37
+
38
+ private
39
+
40
+ def keys_to_strings(hash)
41
+ hash.keys.inject({}) do |h,k|
42
+ h[k.to_s] = hash[k]
43
+ h
44
+ end
45
+ end
46
+
47
+ def keys_to_symbols(hash)
48
+ hash.keys.inject({}) do |h,k|
49
+ raise "illegal key #{k.inspect}" unless k.respond_to?(:to_sym)
50
+ h[k.to_sym] = hash[k]
51
+ h
52
+ end
53
+ end
54
+ end
55
+
56
+ # The base class of all nodes, textual and otherwise, in an HTML document.
57
+ class Node #:nodoc:
58
+ # The array of children of this node. Not all nodes have children.
59
+ attr_reader :children
60
+
61
+ # The parent node of this node. All nodes have a parent, except for the
62
+ # root node.
63
+ attr_reader :parent
64
+
65
+ # The line number of the input where this node was begun
66
+ attr_reader :line
67
+
68
+ # The byte position in the input where this node was begun
69
+ attr_reader :position
70
+
71
+ # Create a new node as a child of the given parent.
72
+ def initialize(parent, line=0, pos=0)
73
+ @parent = parent
74
+ @children = []
75
+ @line, @position = line, pos
76
+ end
77
+
78
+ # Return a textual representation of the node.
79
+ def to_s
80
+ s = ""
81
+ @children.each { |child| s << child.to_s }
82
+ s
83
+ end
84
+
85
+ # Return false (subclasses must override this to provide specific matching
86
+ # behavior.) +conditions+ may be of any type.
87
+ def match(conditions)
88
+ false
89
+ end
90
+
91
+ # Search the children of this node for the first node for which #find
92
+ # returns non +nil+. Returns the result of the #find call that succeeded.
93
+ def find(conditions)
94
+ conditions = validate_conditions(conditions)
95
+ @children.each do |child|
96
+ node = child.find(conditions)
97
+ return node if node
98
+ end
99
+ nil
100
+ end
101
+
102
+ # Search for all nodes that match the given conditions, and return them
103
+ # as an array.
104
+ def find_all(conditions)
105
+ conditions = validate_conditions(conditions)
106
+
107
+ matches = []
108
+ matches << self if match(conditions)
109
+ @children.each do |child|
110
+ matches.concat child.find_all(conditions)
111
+ end
112
+ matches
113
+ end
114
+
115
+ # Returns +false+. Subclasses may override this if they define a kind of
116
+ # tag.
117
+ def tag?
118
+ false
119
+ end
120
+
121
+ def validate_conditions(conditions)
122
+ Conditions === conditions ? conditions : Conditions.new(conditions)
123
+ end
124
+
125
+ def ==(node)
126
+ return false unless self.class == node.class && children.size == node.children.size
127
+
128
+ equivalent = true
129
+
130
+ children.size.times do |i|
131
+ equivalent &&= children[i] == node.children[i]
132
+ end
133
+
134
+ equivalent
135
+ end
136
+
137
+ class <<self
138
+ def parse(parent, line, pos, content, strict=true)
139
+ if content !~ /^<\S/
140
+ Text.new(parent, line, pos, content)
141
+ else
142
+ scanner = StringScanner.new(content)
143
+
144
+ unless scanner.skip(/</)
145
+ if strict
146
+ raise "expected <"
147
+ else
148
+ return Text.new(parent, line, pos, content)
149
+ end
150
+ end
151
+
152
+ if scanner.skip(/!\[CDATA\[/)
153
+ scanner.scan_until(/\]\]>/)
154
+ return CDATA.new(parent, line, pos, scanner.pre_match.gsub(/<!\[CDATA\[/, ''))
155
+ end
156
+
157
+ closing = ( scanner.scan(/\//) ? :close : nil )
158
+ return Text.new(parent, line, pos, content) unless name = scanner.scan(/[\w:-]+/)
159
+ name.downcase!
160
+
161
+ unless closing
162
+ scanner.skip(/\s*/)
163
+ attributes = {}
164
+ while attr = scanner.scan(/[-\w:]+/)
165
+ value = true
166
+ if scanner.scan(/\s*=\s*/)
167
+ if delim = scanner.scan(/['"]/)
168
+ value = ""
169
+ while text = scanner.scan(/[^#{delim}\\]+|./)
170
+ case text
171
+ when "\\" then
172
+ value << text
173
+ value << scanner.getch
174
+ when delim
175
+ break
176
+ else value << text
177
+ end
178
+ end
179
+ else
180
+ value = scanner.scan(/[^\s>\/]+/)
181
+ end
182
+ end
183
+ attributes[attr.downcase] = value
184
+ scanner.skip(/\s*/)
185
+ end
186
+
187
+ closing = ( scanner.scan(/\//) ? :self : nil )
188
+ end
189
+
190
+ unless scanner.scan(/\s*>/)
191
+ if strict
192
+ raise "expected > (got #{scanner.rest.inspect} for #{content}, #{attributes.inspect})"
193
+ else
194
+ # throw away all text until we find what we're looking for
195
+ scanner.skip_until(/>/) or scanner.terminate
196
+ end
197
+ end
198
+
199
+ Tag.new(parent, line, pos, name, attributes, closing)
200
+ end
201
+ end
202
+ end
203
+ end
204
+
205
+ # A node that represents text, rather than markup.
206
+ class Text < Node #:nodoc:
207
+
208
+ attr_reader :content
209
+
210
+ # Creates a new text node as a child of the given parent, with the given
211
+ # content.
212
+ def initialize(parent, line, pos, content)
213
+ super(parent, line, pos)
214
+ @content = content
215
+ end
216
+
217
+ # Returns the content of this node.
218
+ def to_s
219
+ @content
220
+ end
221
+
222
+ # Returns +self+ if this node meets the given conditions. Text nodes support
223
+ # conditions of the following kinds:
224
+ #
225
+ # * if +conditions+ is a string, it must be a substring of the node's
226
+ # content
227
+ # * if +conditions+ is a regular expression, it must match the node's
228
+ # content
229
+ # * if +conditions+ is a hash, it must contain a <tt>:content</tt> key that
230
+ # is either a string or a regexp, and which is interpreted as described
231
+ # above.
232
+ def find(conditions)
233
+ match(conditions) && self
234
+ end
235
+
236
+ # Returns non-+nil+ if this node meets the given conditions, or +nil+
237
+ # otherwise. See the discussion of #find for the valid conditions.
238
+ def match(conditions)
239
+ case conditions
240
+ when String
241
+ @content == conditions
242
+ when Regexp
243
+ @content =~ conditions
244
+ when Hash
245
+ conditions = validate_conditions(conditions)
246
+
247
+ # Text nodes only have :content, :parent, :ancestor
248
+ unless (conditions.keys - [:content, :parent, :ancestor]).empty?
249
+ return false
250
+ end
251
+
252
+ match(conditions[:content])
253
+ else
254
+ nil
255
+ end
256
+ end
257
+
258
+ def ==(node)
259
+ return false unless super
260
+ content == node.content
261
+ end
262
+ end
263
+
264
+ # A CDATA node is simply a text node with a specialized way of displaying
265
+ # itself.
266
+ class CDATA < Text #:nodoc:
267
+ def to_s
268
+ "<![CDATA[#{super}]]>"
269
+ end
270
+ end
271
+
272
+ # A Tag is any node that represents markup. It may be an opening tag, a
273
+ # closing tag, or a self-closing tag. It has a name, and may have a hash of
274
+ # attributes.
275
+ class Tag < Node #:nodoc:
276
+
277
+ # Either +nil+, <tt>:close</tt>, or <tt>:self</tt>
278
+ attr_reader :closing
279
+
280
+ # Either +nil+, or a hash of attributes for this node.
281
+ attr_reader :attributes
282
+
283
+ # The name of this tag.
284
+ attr_reader :name
285
+
286
+ # Create a new tag node as a child of the given parent, with the name,
287
+ # attributes hash and closing status that Node.parse extracted from the
288
+ # tag's source text.
289
+ def initialize(parent, line, pos, name, attributes, closing)
290
+ super(parent, line, pos)
291
+ @name = name
292
+ @attributes = attributes
293
+ @closing = closing
294
+ end
295
+
296
+ # A convenience for obtaining an attribute of the node. Returns +nil+ if
297
+ # the node has no attributes.
298
+ def [](attr)
299
+ @attributes ? @attributes[attr] : nil
300
+ end
301
+
302
+ # Returns non-+nil+ if this tag cannot contain child nodes.
303
+ def childless?(xml = false)
304
+ return false if xml && @closing.nil?
305
+ !@closing.nil? ||
306
+ @name =~ /^(img|br|hr|link|meta|area|base|basefont|
307
+ col|frame|input|isindex|param)$/ox
308
+ end
309
+
310
+ # Returns a textual representation of the node
311
+ def to_s
312
+ if @closing == :close
313
+ "</#{@name}>"
314
+ else
315
+ s = "<#{@name}"
316
+ @attributes.each do |k,v|
317
+ s << " #{k}"
318
+ s << "=\"#{v}\"" if String === v
319
+ end
320
+ s << " /" if @closing == :self
321
+ s << ">"
322
+ @children.each { |child| s << child.to_s }
323
+ s << "</#{@name}>" if @closing != :self && !@children.empty?
324
+ s
325
+ end
326
+ end
327
+
328
+ # If either the node or any of its children meet the given conditions, the
329
+ # matching node is returned. Otherwise, +nil+ is returned. (See the
330
+ # description of the valid conditions in the +match+ method.)
331
+ def find(conditions)
332
+ match(conditions) && self || super
333
+ end
334
+
335
+ # Returns +true+, indicating that this node represents an HTML tag.
336
+ def tag?
337
+ true
338
+ end
339
+
340
+ # Returns +true+ if the node meets all of the given conditions. The
341
+ # +conditions+ parameter must be a hash of any of the following keys
342
+ # (all are optional):
343
+ #
344
+ # * <tt>:tag</tt>: the node name must match the corresponding value
345
+ # * <tt>:attributes</tt>: a hash. The node's attribute values must match the
346
+ # corresponding values in the hash.
347
+ # * <tt>:parent</tt>: a hash. The node's parent must match the
348
+ # corresponding hash.
349
+ # * <tt>:child</tt>: a hash. At least one of the node's immediate children
350
+ # must meet the criteria described by the hash.
351
+ # * <tt>:ancestor</tt>: a hash. At least one of the node's ancestors must
352
+ # meet the criteria described by the hash.
353
+ # * <tt>:descendant</tt>: a hash. At least one of the node's descendants
354
+ # must meet the criteria described by the hash.
355
+ # * <tt>:sibling</tt>: a hash. At least one of the node's siblings must
356
+ # meet the criteria described by the hash.
357
+ # * <tt>:after</tt>: a hash. The node must be after any sibling meeting
358
+ # the criteria described by the hash, and at least one sibling must match.
359
+ # * <tt>:before</tt>: a hash. The node must be before any sibling meeting
360
+ # the criteria described by the hash, and at least one sibling must match.
361
+ # * <tt>:children</tt>: a hash, for counting children of a node. Accepts the
362
+ # keys:
363
+ # ** <tt>:count</tt>: either a number or a range which must equal (or
364
+ # include) the number of children that match.
365
+ # ** <tt>:less_than</tt>: the number of matching children must be less than
366
+ # this number.
367
+ # ** <tt>:greater_than</tt>: the number of matching children must be
368
+ # greater than this number.
369
+ # ** <tt>:only</tt>: another hash consisting of the keys to use
370
+ # to match on the children, and only matching children will be
371
+ # counted.
372
+ #
373
+ # Conditions are matched using the following algorithm:
374
+ #
375
+ # * if the condition is a string, it must be a substring of the value.
376
+ # * if the condition is a regexp, it must match the value.
377
+ # * if the condition is a number, the value must match number.to_s.
378
+ # * if the condition is +true+, the value must not be +nil+.
379
+ # * if the condition is +false+ or +nil+, the value must be +nil+.
380
+ #
381
+ # Usage:
382
+ #
383
+ # # test if the node is a "span" tag
384
+ # node.match :tag => "span"
385
+ #
386
+ # # test if the node's parent is a "div"
387
+ # node.match :parent => { :tag => "div" }
388
+ #
389
+ # # test if any of the node's ancestors are "table" tags
390
+ # node.match :ancestor => { :tag => "table" }
391
+ #
392
+ # # test if any of the node's immediate children are "em" tags
393
+ # node.match :child => { :tag => "em" }
394
+ #
395
+ # # test if any of the node's descendants are "strong" tags
396
+ # node.match :descendant => { :tag => "strong" }
397
+ #
398
+ # # test if the node has between 2 and 4 span tags as immediate children
399
+ # node.match :children => { :count => 2..4, :only => { :tag => "span" } }
400
+ #
401
+ # # get funky: test to see if the node is a "div", has a "ul" ancestor
402
+ # # and an "li" parent (with "class" = "enum"), and whether or not it has
403
+ # # a "span" descendant that contains text matching /hello world/:
404
+ # node.match :tag => "div",
405
+ # :ancestor => { :tag => "ul" },
406
+ # :parent => { :tag => "li",
407
+ # :attributes => { :class => "enum" } },
408
+ # :descendant => { :tag => "span",
409
+ # :child => /hello world/ }
410
+ def match(conditions)
411
+ conditions = validate_conditions(conditions)
412
+ # check content of child nodes
413
+ if conditions[:content]
414
+ if children.empty?
415
+ return false unless match_condition("", conditions[:content])
416
+ else
417
+ return false unless children.find { |child| child.match(conditions[:content]) }
418
+ end
419
+ end
420
+
421
+ # test the name
422
+ return false unless match_condition(@name, conditions[:tag]) if conditions[:tag]
423
+
424
+ # test attributes
425
+ (conditions[:attributes] || {}).each do |key, value|
426
+ return false unless match_condition(self[key], value)
427
+ end
428
+
429
+ # test parent
430
+ return false unless parent.match(conditions[:parent]) if conditions[:parent]
431
+
432
+ # test children
433
+ return false unless children.find { |child| child.match(conditions[:child]) } if conditions[:child]
434
+
435
+ # test ancestors
436
+ if conditions[:ancestor]
437
+ return false unless catch :found do
438
+ p = self
439
+ throw :found, true if p.match(conditions[:ancestor]) while p = p.parent
440
+ end
441
+ end
442
+
443
+ # test descendants
444
+ if conditions[:descendant]
445
+ return false unless children.find do |child|
446
+ # test the child
447
+ child.match(conditions[:descendant]) ||
448
+ # test the child's descendants
449
+ child.match(:descendant => conditions[:descendant])
450
+ end
451
+ end
452
+
453
+ # count children
454
+ if opts = conditions[:children]
455
+ matches = children.select do |c|
456
+ (c.kind_of?(HTML::Tag) and (c.closing == :self or ! c.childless?))
457
+ end
458
+
459
+ matches = matches.select { |c| c.match(opts[:only]) } if opts[:only]
460
+ opts.each do |key, value|
461
+ next if key == :only
462
+ case key
463
+ when :count
464
+ if Integer === value
465
+ return false if matches.length != value
466
+ else
467
+ return false unless value.include?(matches.length)
468
+ end
469
+ when :less_than
470
+ return false unless matches.length < value
471
+ when :greater_than
472
+ return false unless matches.length > value
473
+ else raise "unknown count condition #{key}"
474
+ end
475
+ end
476
+ end
477
+
478
+ # test siblings
479
+ if conditions[:sibling] || conditions[:before] || conditions[:after]
480
+ siblings = parent ? parent.children : []
481
+ self_index = siblings.index(self)
482
+
483
+ if conditions[:sibling]
484
+ return false unless siblings.detect do |s|
485
+ s != self && s.match(conditions[:sibling])
486
+ end
487
+ end
488
+
489
+ if conditions[:before]
490
+ return false unless siblings[self_index+1..-1].detect do |s|
491
+ s != self && s.match(conditions[:before])
492
+ end
493
+ end
494
+
495
+ if conditions[:after]
496
+ return false unless siblings[0,self_index].detect do |s|
497
+ s != self && s.match(conditions[:after])
498
+ end
499
+ end
500
+ end
501
+
502
+ true
503
+ end
504
+
505
+ def ==(node)
506
+ return false unless super
507
+ return false unless closing == node.closing && self.name == node.name
508
+ attributes == node.attributes
509
+ end
510
+
511
+ private
512
+ # Match the given value to the given condition.
513
+ def match_condition(value, condition)
514
+ case condition
515
+ when String
516
+ value && value == condition
517
+ when Regexp
518
+ value && value.match(condition)
519
+ when Numeric
520
+ value == condition.to_s
521
+ when true
522
+ !value.nil?
523
+ when false, nil
524
+ value.nil?
525
+ else
526
+ false
527
+ end
528
+ end
529
+ end
530
+ end
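Tag#match compares each value with the private match_condition method, whose rules are easy to misread: a String condition needs an exact match, not a substring. The following standalone copy of those rules is a sketch defined outside the HTML module (so it is not the private method itself), useful for checking how a condition will behave before passing it to `find` or `find_all`.

```ruby
# Standalone copy of the rules in HTML::Tag#match_condition (private there).
def match_condition(value, condition)
  case condition
  when String     then value && value == condition   # exact match only
  when Regexp     then value && value.match(condition)
  when Numeric    then value == condition.to_s       # compared as a string
  when true       then !value.nil?                   # value must be present
  when false, nil then value.nil?                    # value must be absent
  else false
  end
end

examples = {
  'exact string'     => match_condition("enum", "enum"),
  'substring string' => match_condition("enumerable", "enum"),
  'regexp'           => !!match_condition("hello world", /world/),
  'number'           => match_condition("3", 3),
  'present'          => match_condition("x", true),
  'absent'           => match_condition(nil, false)
}
```

A condition such as `:attributes => { :class => "enum" }` therefore matches only when the attribute is exactly "enum"; use a Regexp for partial matches.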