schwabsauce-merb_dm_xss_terminate 0.6.1

data/LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2008 Mike Schwab
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README ADDED
@@ -0,0 +1,89 @@
1
+ This fork hooks into DataMapper::Resource instead of ActiveRecord::Base.
2
+
3
+ Open question: what is the Merb/DataMapper equivalent of ENV['MODELS'] in the rake task? Apart from that, I think the port is complete.
4
+
5
+ = merb_dm_xss_terminate
6
+
7
+ Plugin that auto-sanitizes data before it is saved in your DataMapper models.
8
+
9
+ merb_dm_xss_terminate is a port of merb_xss_terminate by Ben Chiu, which is
10
+ a port of xss_terminate by Luke Francl. The white list sanitizer and full
11
+ sanitizer were lifted from Rails so you don't have to install ActionPack.
12
+
13
+ merb_dm_xss_terminate makes stripping and sanitizing HTML automatic. Install
14
+ and forget. And forget about remembering to escape your output, because you
15
+ won't need to anymore. Just remember the cases where HTML is allowed.
16
+
17
+ *By default, it will strip all HTML tags from user input.* But merb_dm_xss_terminate
18
+ is also flexible. When you need users to be able to enter HTML, the plugin
19
+ allows you to remove bad HTML with your choice of two whitelist-based sanitizers,
20
+ or to skip HTML sanitization entirely on a per-field basis.
21
+
22
+ == Installation
23
+
24
+ git clone git://github.com/schwabsauce/merb_xss_terminate.git
25
+ cd merb_xss_terminate
26
+ rake install
27
+ add: dependency 'merb_dm_xss_terminate' to init.rb
28
+
29
+ == HTML sanitization
30
+
31
+ * Full-sanitizer: removes all HTML by stripping all the tags. Tags are
32
+ removed, but their content is not.
33
+ * White-list sanitizer: removes bad HTML with Rails' HTML sanitizer methods.
34
+ Bad tags are removed completely, including their content.
35
+ * HTML5lib sanitization: removes bad HTML after parsing it with
36
+ {HTML5lib}[http://code.google.com/p/html5lib/], a library that parses HTML
37
+ like browsers do. It should be very tolerant of invalid HTML. Bad tags are
38
+ escaped, not removed.
39
+ * Do nothing. You can choose not to process given fields.
40
+
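To make the difference between stripping and escaping concrete, here is a rough sketch in plain Ruby. It is not the plugin's code: `strip_tags_sketch` and `escape_tags_sketch` are hypothetical helpers, one built on a naive regular expression and the other on the standard library's `CGI.escapeHTML`. A regex like this is not a safe sanitizer, and the sketch escapes all markup, whereas the HTML5lib mode is described above as escaping only bad tags.

```ruby
require 'cgi'

# Hypothetical helper: drop anything that looks like a tag and keep the
# text between tags (the behaviour described for the full sanitizer).
def strip_tags_sketch(input)
  input.gsub(/<[^>]*>/, '')
end

# Hypothetical helper: keep the text but escape the markup so a browser
# would render it literally ("escaped, not removed").
def escape_tags_sketch(input)
  CGI.escapeHTML(input)
end

sample = 'Hi <b>there</b>'
puts strip_tags_sketch(sample)
puts escape_tags_sketch(sample)
```

The white-list mode sits between the two: allowed tags pass through unchanged, and only disallowed ones are removed.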
41
+ == Usage
42
+
43
+ Installing the plugin creates a +before :save+ hook that will strip HTML tags
44
+ from all string and text fields. No further configuration is necessary if this
45
+ is what you want. To customize the behavior, you use the xss_terminate
46
+ class method in your models.
47
+
48
+ To exempt some fields from sanitization, use the <tt>:except</tt> option
49
+ with a list of fields not to process. Note: Merb uses <tt>:exclude</tt>, but this plugin uses <tt>:except</tt>.
50
+
51
+ class Comment
52
+ xss_terminate :except => [ :body ]
53
+ end
54
+
55
+ To sanitize HTML with Rails' sanitization, use the <tt>:sanitize</tt> option:
56
+
57
+ class Review
58
+ xss_terminate :sanitize => [ :body, :author_name ]
59
+ end
60
+
61
+ To sanitize HTML with {HTML5Lib}[http://code.google.com/p/html5lib/]
62
+ use the <tt>:html5lib_sanitize</tt> option with a list of fields to sanitize:
63
+
64
+ class Entry
65
+ xss_terminate :html5lib_sanitize => [ :body, :author_name ]
66
+ end
67
+
68
+ You can combine multiple options if you have some fields you would like skipped
69
+ and others sanitized. Fields not listed in the option arrays will be stripped.
70
+
71
+ class Message
72
+ xss_terminate :except => [ :body ], :sanitize => [ :title ]
73
+ end
74
+
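The rule that unlisted fields are stripped can be sketched as a small lookup. This is only an illustration: `treatment_for` is a hypothetical helper, and the order in which the options are checked is an assumption, since the README does not say what happens when a field appears in more than one list.

```ruby
# Hypothetical helper: decide how a field is treated given the options
# passed to xss_terminate. The precedence (:except first) is assumed.
def treatment_for(field, options = {})
  return :skip              if Array(options[:except]).include?(field)
  return :html5lib_sanitize if Array(options[:html5lib_sanitize]).include?(field)
  return :sanitize          if Array(options[:sanitize]).include?(field)
  :strip # default: remove all tags
end

options = { :except => [ :body ], :sanitize => [ :title ] }
[ :body, :title, :subject ].each do |field|
  puts "#{field}: #{treatment_for(field, options)}"
end
```

With the Message options above, `:body` is skipped, `:title` goes through the white-list sanitizer, and any other string field falls back to stripping.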
75
+ == Sanitizing existing records
76
+
77
+ After installing merb_xss_terminate and configuring it to your liking, you can
78
+ run <tt>rake merb_xss_terminate:db:sanitize MODELS=Foo,Bar,Baz</tt> to execute
79
+ it against your existing records. This will load each model found and save it
80
+ again to invoke the +before :save+ hook.
81
+
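As a rough idea of what such a task has to do, the sketch below splits a <tt>MODELS</tt>-style value and re-saves every record. Both helpers are hypothetical and do not come from the plugin's rake file; in particular, `lookup` is an assumed name-to-class mapping standing in for however the real task resolves model constants.

```ruby
# Hypothetical helper: turn "Foo,Bar, Baz" into ["Foo", "Bar", "Baz"].
def model_names_from(value)
  value.to_s.split(',').map { |name| name.strip }.reject { |name| name.empty? }
end

# Hypothetical driver: load every record of each named model and save it
# again so that the save hook runs. Each model only needs to respond to
# `all`, and each record to `save`.
def resanitize(names, lookup)
  names.each do |name|
    lookup.fetch(name).all.each { |record| record.save }
  end
end

p model_names_from("Foo,Bar, Baz")
```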
82
+ == Credits
83
+
84
+ merb_xss_terminate by {Ben Chiu}
85
+
86
+ xss_terminate by {Luke Francl}[http://railspikes.com] and acts_as_sanitized by
87
+ {Alex Payne}[http://www.al3x.net].
88
+
89
+ HTML5Lib sanitization by {Jacques Distler}[http://golem.ph.utexas.edu/~distler].
data/Rakefile ADDED
@@ -0,0 +1,64 @@
1
+ require 'rubygems'
2
+ require 'rubygems/specification'
3
+ require 'rake/gempackagetask'
4
+ require 'spec/rake/spectask'
5
+ require 'merb-core/test/tasks/spectasks'
6
+ require 'date'
7
+ require 'merb_rake_helper'
8
+
9
+ PLUGIN = "merb_dm_xss_terminate"
10
+ NAME = "merb_dm_xss_terminate"
11
+ GEM_VERSION = "0.5.3"
12
+ AUTHOR = "Mike Schwab"
13
+ EMAIL = "mike.schwab@gmail.com"
14
+ HOMEPAGE = "http://github.com/schwabsauce/merb_xss_terminate"
15
+ SUMMARY = "Plugin that auto-sanitizes data before it is saved in your DataMapper models"
16
+
17
+ spec = Gem::Specification.new do |s|
18
+ s.name = NAME
19
+ s.version = GEM_VERSION
20
+ s.platform = Gem::Platform::RUBY
21
+ s.has_rdoc = true
22
+ s.extra_rdoc_files = ["README", "LICENSE", 'TODO']
23
+ s.summary = SUMMARY
24
+ s.description = s.summary
25
+ s.author = AUTHOR
26
+ s.email = EMAIL
27
+ s.homepage = HOMEPAGE
28
+ s.add_dependency('merb-core', '>= 0.9.0')
29
+ s.add_dependency('html5', '>= 0.10.0')
30
+ s.require_path = 'lib'
31
+ s.files = %w(LICENSE README Rakefile TODO) + Dir.glob("{lib,spec}/**/*")
32
+ end
33
+
34
+ Rake::GemPackageTask.new(spec) do |pkg|
35
+ pkg.gem_spec = spec
36
+ end
37
+
38
+ desc "install the plugin locally"
39
+ task :install => [:package] do
40
+ sh %{#{sudo} #{gemx} install pkg/#{NAME}-#{GEM_VERSION} --local --no-update-sources}
41
+ end
42
+
43
+ desc "install frozen (source must be located somewhere inside main frozen gems folder)"
44
+ task :install_frozen => [:package] do
45
+ if !path = gems_path
46
+ puts "source must be located somewhere inside main frozen gems folder"
47
+ else
48
+ sh %{#{sudo} #{gemx} install pkg/#{NAME}-#{GEM_VERSION} -i #{path} --local --no-update-sources}
49
+ end
50
+ end
51
+
52
+ desc "create a gemspec file"
53
+ task :make_spec do
54
+ File.open("#{NAME}.gemspec", "w") do |file|
55
+ file.puts spec.to_ruby
56
+ end
57
+ end
58
+
59
+ namespace :jruby do
60
+ desc "Run :package and install the resulting .gem with jruby"
61
+ task :install => :package do
62
+ sh %{#{sudo} jruby -S gem install pkg/#{NAME}-#{GEM_VERSION}.gem --no-rdoc --no-ri}
63
+ end
64
+ end
data/TODO ADDED
@@ -0,0 +1 @@
1
+ TODO:
@@ -0,0 +1,12 @@
1
+ if defined?(Merb::Plugins)
2
+ require 'merb_dm_xss_terminate/xss_terminate'
3
+ require 'merb_dm_xss_terminate/rails_sanitize'
4
+ require 'merb_dm_xss_terminate/html5lib_sanitize'
5
+
6
+ Merb::BootLoader.before_app_loads do
7
+ DataMapper::Resource.append_inclusions XssTerminate
8
+ DataMapper::Model.append_extensions XssTerminate::ClassMethods
9
+ end
10
+
11
+ Merb::Plugins.add_rakefiles "merb_dm_xss_terminate/merbtasks"
12
+ end
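The boot hook above registers one module for instances and one for model classes. A plain-Ruby sketch of that include-plus-extend pattern follows. `XssTerminateSketch` and `CommentSketch` are hypothetical names, and the method bodies are assumptions made for illustration; the plugin's actual XssTerminate module is not shown in this part of the diff.

```ruby
# Hypothetical module; it only mimics the include-plus-extend pattern.
module XssTerminateSketch
  # Instance side: each record can look up its class's options.
  def xss_options_for_record
    self.class.xss_options
  end

  # Class side: the xss_terminate macro stores its options on the class.
  module ClassMethods
    def xss_terminate(options = {})
      @xss_options = options
    end

    def xss_options
      @xss_options ||= {}
    end
  end
end

class CommentSketch
  include XssTerminateSketch               # compare append_inclusions
  extend  XssTerminateSketch::ClassMethods # compare append_extensions

  xss_terminate :except => [ :body ]
end

p CommentSketch.xss_options
```

Registering the two modules before the application loads means every model defined afterwards can call the macro without an explicit include.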
@@ -0,0 +1,68 @@
1
+ require File.dirname(__FILE__) + '/tokenizer'
2
+ require File.dirname(__FILE__) + '/node'
3
+ require File.dirname(__FILE__) + '/selector'
4
+ require File.dirname(__FILE__) + '/sanitizer'
5
+
6
+ module HTML #:nodoc:
7
+ # A top-level HTML document. You give it a body of text, and it will parse that
8
+ # text into a tree of nodes.
9
+ class Document #:nodoc:
10
+
11
+ # The root of the parsed document.
12
+ attr_reader :root
13
+
14
+ # Create a new Document from the given text.
15
+ def initialize(text, strict=false, xml=false)
16
+ tokenizer = Tokenizer.new(text)
17
+ @root = Node.new(nil)
18
+ node_stack = [ @root ]
19
+ while token = tokenizer.next
20
+ node = Node.parse(node_stack.last, tokenizer.line, tokenizer.position, token)
21
+
22
+ node_stack.last.children << node unless node.tag? && node.closing == :close
23
+ if node.tag?
24
+ if node_stack.length > 1 && node.closing == :close
25
+ if node_stack.last.name == node.name
26
+ if node_stack.last.children.empty?
27
+ node_stack.last.children << Text.new(node_stack.last, node.line, node.position, "")
28
+ end
29
+ node_stack.pop
30
+ else
31
+ open_start = node_stack.last.position - 20
32
+ open_start = 0 if open_start < 0
33
+ close_start = node.position - 20
34
+ close_start = 0 if close_start < 0
35
+ msg = <<EOF.strip
36
+ ignoring attempt to close #{node_stack.last.name} with #{node.name}
37
+ opened at byte #{node_stack.last.position}, line #{node_stack.last.line}
38
+ closed at byte #{node.position}, line #{node.line}
39
+ attributes at open: #{node_stack.last.attributes.inspect}
40
+ text around open: #{text[open_start,40].inspect}
41
+ text around close: #{text[close_start,40].inspect}
42
+ EOF
43
+ strict ? raise(msg) : warn(msg)
44
+ end
45
+ elsif !node.childless?(xml) && node.closing != :close
46
+ node_stack.push node
47
+ end
48
+ end
49
+ end
50
+ end
51
+
52
+ # Search the tree for (and return) the first node that matches the given
53
+ # conditions. The conditions are interpreted differently for different node
54
+ # types, see HTML::Text#find and HTML::Tag#find.
55
+ def find(conditions)
56
+ @root.find(conditions)
57
+ end
58
+
59
+ # Search the tree for (and return) all nodes that match the given
60
+ # conditions. The conditions are interpreted differently for different node
61
+ # types, see HTML::Text#find and HTML::Tag#find.
62
+ def find_all(conditions)
63
+ @root.find_all(conditions)
64
+ end
65
+
66
+ end
67
+
68
+ end
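The constructor above builds the tree with a stack of open elements: an opening tag is pushed, a matching closing tag pops. A stripped-down sketch of that idea in standalone Ruby is shown here. It is an illustration only: `build_tree_sketch` is a hypothetical helper that takes pre-split tokens and uses plain hashes, it has no notion of attributes or childless tags, and it silently ignores a mismatched closing tag where the real class warns or raises.

```ruby
# Hypothetical helper: build a nested structure from pre-split tokens.
def build_tree_sketch(tokens)
  root  = { :name => nil, :children => [] }
  stack = [ root ]

  tokens.each do |token|
    if (closing = token.match(%r{\A</(\w+)>\z}))
      # Pop only when the closing tag matches the innermost open element.
      stack.pop if stack.length > 1 && stack.last[:name] == closing[1]
    elsif (opening = token.match(/\A<(\w+)>\z/))
      node = { :name => opening[1], :children => [] }
      stack.last[:children] << node
      stack.push(node)
    else
      # Plain text becomes a child of the innermost open element.
      stack.last[:children] << token
    end
  end

  root
end

tree = build_tree_sketch(%w[<p> hello <b> world </b> </p>])
p tree
```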
@@ -0,0 +1,530 @@
1
+ require 'strscan'
2
+
3
+ module HTML #:nodoc:
4
+
5
+ class Conditions < Hash #:nodoc:
6
+ def initialize(hash)
7
+ super()
8
+ hash = { :content => hash } unless Hash === hash
9
+ hash = keys_to_symbols(hash)
10
+ hash.each do |k,v|
11
+ case k
12
+ when :tag, :content then
13
+ # keys are valid, and require no further processing
14
+ when :attributes then
15
+ hash[k] = keys_to_strings(v)
16
+ when :parent, :child, :ancestor, :descendant, :sibling, :before,
17
+ :after
18
+ hash[k] = Conditions.new(v)
19
+ when :children
20
+ hash[k] = v = keys_to_symbols(v)
21
+ v.each do |k,v2|
22
+ case k
23
+ when :count, :greater_than, :less_than
24
+ # keys are valid, and require no further processing
25
+ when :only
26
+ v[k] = Conditions.new(v2)
27
+ else
28
+ raise "illegal key #{k.inspect} => #{v2.inspect}"
29
+ end
30
+ end
31
+ else
32
+ raise "illegal key #{k.inspect} => #{v.inspect}"
33
+ end
34
+ end
35
+ update hash
36
+ end
37
+
38
+ private
39
+
40
+ def keys_to_strings(hash)
41
+ hash.keys.inject({}) do |h,k|
42
+ h[k.to_s] = hash[k]
43
+ h
44
+ end
45
+ end
46
+
47
+ def keys_to_symbols(hash)
48
+ hash.keys.inject({}) do |h,k|
49
+ raise "illegal key #{k.inspect}" unless k.respond_to?(:to_sym)
50
+ h[k.to_sym] = hash[k]
51
+ h
52
+ end
53
+ end
54
+ end
55
+
56
+ # The base class of all nodes, textual and otherwise, in an HTML document.
57
+ class Node #:nodoc:
58
+ # The array of children of this node. Not all nodes have children.
59
+ attr_reader :children
60
+
61
+ # The parent node of this node. All nodes have a parent, except for the
62
+ # root node.
63
+ attr_reader :parent
64
+
65
+ # The line number of the input where this node was begun
66
+ attr_reader :line
67
+
68
+ # The byte position in the input where this node was begun
69
+ attr_reader :position
70
+
71
+ # Create a new node as a child of the given parent.
72
+ def initialize(parent, line=0, pos=0)
73
+ @parent = parent
74
+ @children = []
75
+ @line, @position = line, pos
76
+ end
77
+
78
+ # Return a textual representation of the node.
79
+ def to_s
80
+ s = ""
81
+ @children.each { |child| s << child.to_s }
82
+ s
83
+ end
84
+
85
+ # Return false (subclasses must override this to provide specific matching
86
+ # behavior.) +conditions+ may be of any type.
87
+ def match(conditions)
88
+ false
89
+ end
90
+
91
+ # Search the children of this node for the first node for which #find
92
+ # returns non +nil+. Returns the result of the #find call that succeeded.
93
+ def find(conditions)
94
+ conditions = validate_conditions(conditions)
95
+ @children.each do |child|
96
+ node = child.find(conditions)
97
+ return node if node
98
+ end
99
+ nil
100
+ end
101
+
102
+ # Search for all nodes that match the given conditions, and return them
103
+ # as an array.
104
+ def find_all(conditions)
105
+ conditions = validate_conditions(conditions)
106
+
107
+ matches = []
108
+ matches << self if match(conditions)
109
+ @children.each do |child|
110
+ matches.concat child.find_all(conditions)
111
+ end
112
+ matches
113
+ end
114
+
115
+ # Returns +false+. Subclasses may override this if they define a kind of
116
+ # tag.
117
+ def tag?
118
+ false
119
+ end
120
+
121
+ def validate_conditions(conditions)
122
+ Conditions === conditions ? conditions : Conditions.new(conditions)
123
+ end
124
+
125
+ def ==(node)
126
+ return false unless self.class == node.class && children.size == node.children.size
127
+
128
+ equivalent = true
129
+
130
+ children.size.times do |i|
131
+ equivalent &&= children[i] == node.children[i]
132
+ end
133
+
134
+ equivalent
135
+ end
136
+
137
+ class <<self
138
+ def parse(parent, line, pos, content, strict=true)
139
+ if content !~ /^<\S/
140
+ Text.new(parent, line, pos, content)
141
+ else
142
+ scanner = StringScanner.new(content)
143
+
144
+ unless scanner.skip(/</)
145
+ if strict
146
+ raise "expected <"
147
+ else
148
+ return Text.new(parent, line, pos, content)
149
+ end
150
+ end
151
+
152
+ if scanner.skip(/!\[CDATA\[/)
153
+ scanner.scan_until(/\]\]>/)
154
+ return CDATA.new(parent, line, pos, scanner.pre_match.gsub(/<!\[CDATA\[/, ''))
155
+ end
156
+
157
+ closing = ( scanner.scan(/\//) ? :close : nil )
158
+ return Text.new(parent, line, pos, content) unless name = scanner.scan(/[\w:-]+/)
159
+ name.downcase!
160
+
161
+ unless closing
162
+ scanner.skip(/\s*/)
163
+ attributes = {}
164
+ while attr = scanner.scan(/[-\w:]+/)
165
+ value = true
166
+ if scanner.scan(/\s*=\s*/)
167
+ if delim = scanner.scan(/['"]/)
168
+ value = ""
169
+ while text = scanner.scan(/[^#{delim}\\]+|./)
170
+ case text
171
+ when "\\" then
172
+ value << text
173
+ value << scanner.getch
174
+ when delim
175
+ break
176
+ else value << text
177
+ end
178
+ end
179
+ else
180
+ value = scanner.scan(/[^\s>\/]+/)
181
+ end
182
+ end
183
+ attributes[attr.downcase] = value
184
+ scanner.skip(/\s*/)
185
+ end
186
+
187
+ closing = ( scanner.scan(/\//) ? :self : nil )
188
+ end
189
+
190
+ unless scanner.scan(/\s*>/)
191
+ if strict
192
+ raise "expected > (got #{scanner.rest.inspect} for #{content}, #{attributes.inspect})"
193
+ else
194
+ # throw away all text until we find what we're looking for
195
+ scanner.skip_until(/>/) or scanner.terminate
196
+ end
197
+ end
198
+
199
+ Tag.new(parent, line, pos, name, attributes, closing)
200
+ end
201
+ end
202
+ end
203
+ end
204
+
205
+ # A node that represents text, rather than markup.
206
+ class Text < Node #:nodoc:
207
+
208
+ attr_reader :content
209
+
210
+ # Creates a new text node as a child of the given parent, with the given
211
+ # content.
212
+ def initialize(parent, line, pos, content)
213
+ super(parent, line, pos)
214
+ @content = content
215
+ end
216
+
217
+ # Returns the content of this node.
218
+ def to_s
219
+ @content
220
+ end
221
+
222
+ # Returns +self+ if this node meets the given conditions. Text nodes support
223
+ # conditions of the following kinds:
224
+ #
225
+ # * if +conditions+ is a string, it must equal the node's
226
+ # content
227
+ # * if +conditions+ is a regular expression, it must match the node's
228
+ # content
229
+ # * if +conditions+ is a hash, it must contain a <tt>:content</tt> key that
230
+ # is either a string or a regexp, and which is interpreted as described
231
+ # above.
232
+ def find(conditions)
233
+ match(conditions) && self
234
+ end
235
+
236
+ # Returns non-+nil+ if this node meets the given conditions, or +nil+
237
+ # otherwise. See the discussion of #find for the valid conditions.
238
+ def match(conditions)
239
+ case conditions
240
+ when String
241
+ @content == conditions
242
+ when Regexp
243
+ @content =~ conditions
244
+ when Hash
245
+ conditions = validate_conditions(conditions)
246
+
247
+ # Text nodes only have :content, :parent, :ancestor
248
+ unless (conditions.keys - [:content, :parent, :ancestor]).empty?
249
+ return false
250
+ end
251
+
252
+ match(conditions[:content])
253
+ else
254
+ nil
255
+ end
256
+ end
257
+
258
+ def ==(node)
259
+ return false unless super
260
+ content == node.content
261
+ end
262
+ end
263
+
264
+ # A CDATA node is simply a text node with a specialized way of displaying
265
+ # itself.
266
+ class CDATA < Text #:nodoc:
267
+ def to_s
268
+ "<![CDATA[#{super}]]>"
269
+ end
270
+ end
271
+
272
+ # A Tag is any node that represents markup. It may be an opening tag, a
273
+ # closing tag, or a self-closing tag. It has a name, and may have a hash of
274
+ # attributes.
275
+ class Tag < Node #:nodoc:
276
+
277
+ # Either +nil+, <tt>:close</tt>, or <tt>:self</tt>
278
+ attr_reader :closing
279
+
280
+ # Either +nil+, or a hash of attributes for this node.
281
+ attr_reader :attributes
282
+
283
+ # The name of this tag.
284
+ attr_reader :name
285
+
286
+ # Create a new node as a child of the given parent, using the given content
287
+ # to describe the node. It will be parsed and the node name, attributes and
288
+ # closing status extracted.
289
+ def initialize(parent, line, pos, name, attributes, closing)
290
+ super(parent, line, pos)
291
+ @name = name
292
+ @attributes = attributes
293
+ @closing = closing
294
+ end
295
+
296
+ # A convenience for obtaining an attribute of the node. Returns +nil+ if
297
+ # the node has no attributes.
298
+ def [](attr)
299
+ @attributes ? @attributes[attr] : nil
300
+ end
301
+
302
+ # Returns non-+nil+ if this tag cannot contain child nodes.
303
+ def childless?(xml = false)
304
+ return false if xml && @closing.nil?
305
+ !@closing.nil? ||
306
+ @name =~ /^(img|br|hr|link|meta|area|base|basefont|
307
+ col|frame|input|isindex|param)$/ox
308
+ end
309
+
310
+ # Returns a textual representation of the node
311
+ def to_s
312
+ if @closing == :close
313
+ "</#{@name}>"
314
+ else
315
+ s = "<#{@name}"
316
+ @attributes.each do |k,v|
317
+ s << " #{k}"
318
+ s << "=\"#{v}\"" if String === v
319
+ end
320
+ s << " /" if @closing == :self
321
+ s << ">"
322
+ @children.each { |child| s << child.to_s }
323
+ s << "</#{@name}>" if @closing != :self && !@children.empty?
324
+ s
325
+ end
326
+ end
327
+
328
+ # If either the node or any of its children meet the given conditions, the
329
+ # matching node is returned. Otherwise, +nil+ is returned. (See the
330
+ # description of the valid conditions in the +match+ method.)
331
+ def find(conditions)
332
+ match(conditions) && self || super
333
+ end
334
+
335
+ # Returns +true+, indicating that this node represents an HTML tag.
336
+ def tag?
337
+ true
338
+ end
339
+
340
+ # Returns +true+ if the node meets all of the given conditions. The
341
+ # +conditions+ parameter must be a hash of any of the following keys
342
+ # (all are optional):
343
+ #
344
+ # * <tt>:tag</tt>: the node name must match the corresponding value
345
+ # * <tt>:attributes</tt>: a hash. The node's attribute values must match the
346
+ # corresponding values in the hash.
347
+ # * <tt>:parent</tt>: a hash. The node's parent must match the
348
+ # corresponding hash.
349
+ # * <tt>:child</tt>: a hash. At least one of the node's immediate children
350
+ # must meet the criteria described by the hash.
351
+ # * <tt>:ancestor</tt>: a hash. At least one of the node's ancestors must
352
+ # meet the criteria described by the hash.
353
+ # * <tt>:descendant</tt>: a hash. At least one of the node's descendants
354
+ # must meet the criteria described by the hash.
355
+ # * <tt>:sibling</tt>: a hash. At least one of the node's siblings must
356
+ # meet the criteria described by the hash.
357
+ # * <tt>:after</tt>: a hash. The node must be after any sibling meeting
358
+ # the criteria described by the hash, and at least one sibling must match.
359
+ # * <tt>:before</tt>: a hash. The node must be before any sibling meeting
360
+ # the criteria described by the hash, and at least one sibling must match.
361
+ # * <tt>:children</tt>: a hash, for counting children of a node. Accepts the
362
+ # keys:
363
+ # ** <tt>:count</tt>: either a number or a range which must equal (or
364
+ # include) the number of children that match.
365
+ # ** <tt>:less_than</tt>: the number of matching children must be less than
366
+ # this number.
367
+ # ** <tt>:greater_than</tt>: the number of matching children must be
368
+ # greater than this number.
369
+ # ** <tt>:only</tt>: another hash consisting of the keys to use
370
+ # to match on the children, and only matching children will be
371
+ # counted.
372
+ #
373
+ # Conditions are matched using the following algorithm:
374
+ #
375
+ # * if the condition is a string, it must equal the value.
376
+ # * if the condition is a regexp, it must match the value.
377
+ # * if the condition is a number, the value must match number.to_s.
378
+ # * if the condition is +true+, the value must not be +nil+.
379
+ # * if the condition is +false+ or +nil+, the value must be +nil+.
380
+ #
381
+ # Usage:
382
+ #
383
+ # # test if the node is a "span" tag
384
+ # node.match :tag => "span"
385
+ #
386
+ # # test if the node's parent is a "div"
387
+ # node.match :parent => { :tag => "div" }
388
+ #
389
+ # # test if any of the node's ancestors are "table" tags
390
+ # node.match :ancestor => { :tag => "table" }
391
+ #
392
+ # # test if any of the node's immediate children are "em" tags
393
+ # node.match :child => { :tag => "em" }
394
+ #
395
+ # # test if any of the node's descendants are "strong" tags
396
+ # node.match :descendant => { :tag => "strong" }
397
+ #
398
+ # # test if the node has between 2 and 4 span tags as immediate children
399
+ # node.match :children => { :count => 2..4, :only => { :tag => "span" } }
400
+ #
401
+ # # get funky: test to see if the node is a "div", has a "ul" ancestor
402
+ # # and an "li" parent (with "class" = "enum"), and whether or not it has
403
+ # # a "span" descendant that contains text matching /hello world/:
404
+ # node.match :tag => "div",
405
+ # :ancestor => { :tag => "ul" },
406
+ # :parent => { :tag => "li",
407
+ # :attributes => { :class => "enum" } },
408
+ # :descendant => { :tag => "span",
409
+ # :child => /hello world/ }
410
+ def match(conditions)
411
+ conditions = validate_conditions(conditions)
412
+ # check content of child nodes
413
+ if conditions[:content]
414
+ if children.empty?
415
+ return false unless match_condition("", conditions[:content])
416
+ else
417
+ return false unless children.find { |child| child.match(conditions[:content]) }
418
+ end
419
+ end
420
+
421
+ # test the name
422
+ return false unless match_condition(@name, conditions[:tag]) if conditions[:tag]
423
+
424
+ # test attributes
425
+ (conditions[:attributes] || {}).each do |key, value|
426
+ return false unless match_condition(self[key], value)
427
+ end
428
+
429
+ # test parent
430
+ return false unless parent.match(conditions[:parent]) if conditions[:parent]
431
+
432
+ # test children
433
+ return false unless children.find { |child| child.match(conditions[:child]) } if conditions[:child]
434
+
435
+ # test ancestors
436
+ if conditions[:ancestor]
437
+ return false unless catch :found do
438
+ p = self
439
+ throw :found, true if p.match(conditions[:ancestor]) while p = p.parent
440
+ end
441
+ end
442
+
443
+ # test descendants
444
+ if conditions[:descendant]
445
+ return false unless children.find do |child|
446
+ # test the child
447
+ child.match(conditions[:descendant]) ||
448
+ # test the child's descendants
449
+ child.match(:descendant => conditions[:descendant])
450
+ end
451
+ end
452
+
453
+ # count children
454
+ if opts = conditions[:children]
455
+ matches = children.select do |c|
456
+ (c.kind_of?(HTML::Tag) and (c.closing == :self or ! c.childless?))
457
+ end
458
+
459
+ matches = matches.select { |c| c.match(opts[:only]) } if opts[:only]
460
+ opts.each do |key, value|
461
+ next if key == :only
462
+ case key
463
+ when :count
464
+ if Integer === value
465
+ return false if matches.length != value
466
+ else
467
+ return false unless value.include?(matches.length)
468
+ end
469
+ when :less_than
470
+ return false unless matches.length < value
471
+ when :greater_than
472
+ return false unless matches.length > value
473
+ else raise "unknown count condition #{key}"
474
+ end
475
+ end
476
+ end
477
+
478
+ # test siblings
479
+ if conditions[:sibling] || conditions[:before] || conditions[:after]
480
+ siblings = parent ? parent.children : []
481
+ self_index = siblings.index(self)
482
+
483
+ if conditions[:sibling]
484
+ return false unless siblings.detect do |s|
485
+ s != self && s.match(conditions[:sibling])
486
+ end
487
+ end
488
+
489
+ if conditions[:before]
490
+ return false unless siblings[self_index+1..-1].detect do |s|
491
+ s != self && s.match(conditions[:before])
492
+ end
493
+ end
494
+
495
+ if conditions[:after]
496
+ return false unless siblings[0,self_index].detect do |s|
497
+ s != self && s.match(conditions[:after])
498
+ end
499
+ end
500
+ end
501
+
502
+ true
503
+ end
504
+
505
+ def ==(node)
506
+ return false unless super
507
+ return false unless closing == node.closing && self.name == node.name
508
+ attributes == node.attributes
509
+ end
510
+
511
+ private
512
+ # Match the given value to the given condition.
513
+ def match_condition(value, condition)
514
+ case condition
515
+ when String
516
+ value && value == condition
517
+ when Regexp
518
+ value && value.match(condition)
519
+ when Numeric
520
+ value == condition.to_s
521
+ when true
522
+ !value.nil?
523
+ when false, nil
524
+ value.nil?
525
+ else
526
+ false
527
+ end
528
+ end
529
+ end
530
+ end
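The attribute loop in Node.parse relies on StringScanner to walk through a tag one piece at a time. The sketch below isolates that technique in a standalone, simplified form. `scan_attributes_sketch` is a hypothetical helper, not part of this library: it handles a single opening tag, skips the backslash handling and strict mode of the original, and stores +true+ for attributes that have no value.

```ruby
require 'strscan'

# Hypothetical helper: parse the attributes of one opening tag, such as
# '<a href="x" disabled>', into a hash keyed by lowercase name.
def scan_attributes_sketch(tag_source)
  scanner = StringScanner.new(tag_source)
  return {} unless scanner.skip(/<[\w:-]+/) # consume "<name"

  attributes = {}
  scanner.skip(/\s*/)
  while (name = scanner.scan(/[-\w:]+/))
    value = true # attribute present without a value
    if scanner.scan(/\s*=\s*/)
      if (delim = scanner.scan(/['"]/))
        # Quoted value: read up to the matching quote, then consume it.
        value = scanner.scan(/[^#{delim}]*/)
        scanner.skip(/#{delim}/)
      else
        # Unquoted value: read up to whitespace, ">" or "/".
        value = scanner.scan(/[^\s>\/]+/)
      end
    end
    attributes[name.downcase] = value
    scanner.skip(/\s*/)
  end
  attributes
end

p scan_attributes_sketch(%q{<a HREF="/home" class=nav disabled>})
```

Because the scanner only ever moves forward, each branch consumes exactly the text it recognised, which is what lets the loop stop cleanly at the closing ">".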