RubyGems - sanitize - Versions diffs - 1.3.0.dev.20101210 → 2.0.0.dev.20101211 - Mend

sanitize 1.3.0.dev.20101210 → 2.0.0.dev.20101211

Potentially problematic release.

This version of sanitize might be problematic.

Files changed (9)

data/HISTORY +7 -10
data/README.rdoc +82 -83
data/lib/sanitize.rb +33 -143
data/lib/sanitize/config.rb +0 -4
data/lib/sanitize/transformers/clean_cdata.rb +13 -0
data/lib/sanitize/transformers/clean_comment.rb +10 -0
data/lib/sanitize/transformers/clean_element.rb +87 -0
data/lib/sanitize/version.rb +1 -1
metadata +10 -7

data/HISTORY CHANGED

@@ -1,7 +1,13 @@
 Sanitize History
 ================================================================================
-Version 1.3.0 (git)
+Version 2.0.0 (git)
+  * The environment data passed into transformers and the return values expected
+    from transformers have changed. Old transformers will need to be updated.
+    See the README for details.
+  * Transformers now receive nodes of all types, not just element nodes.
+  * Sanitize's own core filtering logic is now implemented as a set of always-on
+    transformers.
   * The default value for the :output config is now :html. Previously it was
     :xhtml.
   * Added a :whitespace_elements config, which specifies elements (such as <br>
@@ -15,15 +21,6 @@ Version 1.3.0 (git)
     `ruby`, and `wbr` elements to the whitelist for `Sanitize::Config::RELAXED`.
   * The `dir`, `lang`, and `title` attributes are now whitelisted for all
     elements in `Sanitize::Config::RELAXED`.
-  * The environment hash passed into transformers now includes an
-    :allowed_elements Hash to facilitate faster lookups when attempting to
-    determine whether an element is in the whitelist. [Suggested by Nicholas
-    Evans]
-  * The environment hash passed into transformers now includes a
-    :whitelist_nodes Array, so transformers now have insight into what nodes
-    have been whitelisted by other transformers. [Suggested by Nicholas Evans]
-  * Added a :process_text_nodes config setting. If set to true, Sanitize will
-    pass text nodes to transformers. The default is false. [Ardie Saeidi]
   * Bumped minimum Nokogiri version to 1.4.4 to avoid a bug in 1.4.2+ (issue
     #315) that caused "</body></html>" to be appended to the CDATA inside
     unterminated script and style elements.

data/README.rdoc CHANGED

@@ -14,7 +14,7 @@ of fragile regular expressions, Sanitize has no trouble dealing with malformed
 or maliciously-formed HTML, and will always output valid HTML or XHTML.
 *Author*::    Ryan Grove (mailto:ryan@wonko.com)
-*Version*::   1.3.0 (git)
+*Version*::   2.0.0 (git)
 *Copyright*:: Copyright (c) 2010 Ryan Grove. All rights reserved.
 *License*::   MIT License (http://opensource.org/licenses/mit-license.php)
 *Website*::   http://github.com/rgrove/sanitize
@@ -43,7 +43,7 @@ behind.
   require 'rubygems'
   require 'sanitize'
-  html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+  html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'
   Sanitize.clean(html) # => 'foo'
@@ -77,7 +77,7 @@ are limited to HTTP and HTTPS. In this mode, <code>rel="nofollow"</code> is not
 added to links.
   Sanitize.clean(html, Sanitize::Config::RELAXED)
-  # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'
+  # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'
 === Custom Configuration
@@ -127,10 +127,9 @@ default value is <code>false</code>.
 Array of element names to allow. Specify all names in lowercase.
-  :elements => [
-    'a', 'b', 'blockquote', 'br', 'cite', 'code', 'dd', 'dl', 'dt', 'em',
-    'i', 'li', 'ol', 'p', 'pre', 'q', 'small', 'strike', 'strong', 'sub',
-    'sup', 'u', 'ul'
+  :elements => %w[
+    a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
+    q s samp small strike strong sub sup time u ul var
   ]
 ==== :output (Symbol)
@@ -140,12 +139,7 @@ defaulting to <code>:html</code>.
 ==== :output_encoding (String)
-Character encoding to use for HTML output. Default is <code>'utf-8'</code>.
-==== :process_text_nodes (Boolean)
-Whether or not to process text nodes. Enabling this will allow text nodes to be
-processed by transformers. The default is <code>false</code>.
+Character encoding to use for HTML output. Default is <code>utf-8</code>.
 ==== :protocols (Hash)
@@ -171,7 +165,7 @@ If set to +true+, Sanitize will remove the contents of any non-whitelisted
 elements in addition to the elements themselves. By default, Sanitize leaves the
 safe parts of an element's contents behind when the element is removed.
-If set to an Array of element names, then only the contents of the specified
+If set to an array of element names, then only the contents of the specified
 elements (when filtered) will be removed, and the contents of all other filtered
 elements will be left behind.
@@ -179,7 +173,8 @@ The default value is <code>false</code>.
 ==== :transformers
-See below.
+Custom transformer or array of custom transformers. See the Transformers section
+below for details.
 ==== :whitespace_elements (Array)
@@ -196,81 +191,80 @@ By default, the following elements are included in the
 === Transformers
-Transformers allow you to filter and alter nodes using your own custom logic, on
-top of (or instead of) Sanitize's core filter. A transformer is any object that
-responds to <code>call()</code> (such as a lambda or proc) and returns either
-<code>nil</code> or a Hash containing certain optional response values.
+Transformers allow you to filter and modify nodes using your own custom logic,
+on top of (or instead of) Sanitize's core filter. A transformer is any object
+that responds to <code>call()</code> (such as a lambda or proc).
 To use one or more transformers, pass them to the <code>:transformers</code>
-config setting:
+config setting. You may pass a single transformer or an array of transformers.
   Sanitize.clean(html, :transformers => [transformer_one, transformer_two])
 ==== Input
 Each registered transformer's <code>call()</code> method will be called once for
-each element node in the HTML, and will receive as an argument an environment
-Hash that contains the following items:
-[<code>:allowed_elements</code>]
-  Hash with whitelisted element names as keys, to facilitate fast lookups of
-  whitelisted elements.
+each node in the HTML (including elements, text nodes, comments, etc.), and will
+receive as an argument an environment Hash that contains the following items:
 [<code>:config</code>]
   The current Sanitize configuration Hash.
+[<code>:is_whitelisted</code>]
+  <code>true</code> if the current node has been whitelisted by a previous
+  transformer, <code>false</code> otherwise. It's generally bad form to remove a
+  node that a previous transformer has whitelisted.
 [<code>:node</code>]
-  A Nokogiri::XML::Node object representing an HTML element.
+  A Nokogiri::XML::Node object representing an HTML node. The node may be an
+  element, a text node, a comment, a CDATA node, or a document fragment. Use
+  Nokogiri's inspection methods (<code>element?</code>, <code>text?</code>,
+  etc.) to selectively ignore node types you aren't interested in.
 [<code>:node_name</code>]
   The name of the current HTML node, always lowercase (e.g. "div" or "span").
+  For non-element nodes, the name will be something like "text", "comment",
+  "#cdata-section", "#document-fragment", etc.
+[<code>:node_whitelist</code>]
+  Set of Nokogiri::XML::Node objects in the current document that have been
+  whitelisted by previous transformers, if any. It's generally bad form to
+  remove a node that a previous transformer has whitelisted.
-[<code>:whitelist_nodes</code>]
-  Array of Nokogiri::XML::Node instances that have already been whitelisted by
-  previous transformers, if any.
+==== Output
+A transformer doesn't have to return anything, but may optionally return a Hash,
+which may contain the following items:
+[<code>:node_whitelist</code>]
+  Array or Set of specific Nokogiri::XML::Node objects to add to the document's
+  whitelist, bypassing the current Sanitize config. These specific nodes and all
+  their attributes will be whitelisted, but their children will not be.
+If a transformer returns anything other than a Hash, the return value will be
+ignored.
 ==== Processing
 Each transformer has full access to the Nokogiri::XML::Node that's passed into
 it and to the rest of the document via the node's <code>document()</code>
-method. Any changes will be reflected instantly in the document and passed on to
-subsequently-called transformers and to Sanitize itself. A transformer may even
-call Sanitize internally to perform custom sanitization if needed.
+method. Any changes made to the current node or to the document will be
+reflected instantly in the document and passed on to subsequently-called
+transformers and to Sanitize itself. A transformer may even call Sanitize
+internally to perform custom sanitization if needed.
 Nodes are passed into transformers in the order in which they're traversed. It's
 important to note that Nokogiri traverses markup from the deepest node upward,
 not from the first node to the last node:
   html        = '<div><span>foo</span></div>'
-  transformer = lambda{|env| puts env[:node].name }
+  transformer = lambda{|env| puts env[:node_name] }
-  # Prints "span", then "div".
+  # Prints "text", "span", "div", "#document-fragment".
   Sanitize.clean(html, :transformers => transformer)
 Transformers have a tremendous amount of power, including the power to
-completely bypass Sanitize's built-in filtering. Be careful!
-==== Output
-A transformer may return either +nil+ or a Hash. A return value of +nil+
-indicates that the transformer does not wish to act on the current node in any
-way. A returned Hash may contain the following items, all of which are optional:
-[<code>:attr_whitelist</code>]
-  Array of attribute names to add to the whitelist for the current node, in
-  addition to any whitelisted attributes already defined in the current config.
-[<code>:node</code>]
-  A Nokogiri::XML::Node object that should replace the current node. All
-  subsequent transformers and Sanitize itself will receive this new node.
-[<code>:whitelist</code>]
-  If _true_, the current node (and only the current node) will be whitelisted,
-  regardless of the current Sanitize config.
-[<code>:whitelist_nodes</code>]
-  Array of specific Nokogiri::XML::Node objects to whitelist, anywhere in the
-  document, regardless of the current Sanitize config.
+completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
+your own hands.
 ==== Example: Transformer to whitelist YouTube video embeds
@@ -283,16 +277,20 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
   lambda do |env|
     node      = env[:node]
     node_name = env[:node_name]
-    parent    = node.parent
+    # Don't continue if this node is already whitelisted or is not an element.
+    return if env[:is_whitelisted] || !node.element?
+    parent = node.parent
     # Since the transformer receives the deepest nodes first, we look for a
     # <param> element or an <embed> element whose parent is an <object>.
-    return nil unless (node_name == 'param' || node_name == 'embed') &&
+    return unless (node_name == 'param' || node_name == 'embed') &&
         parent.name.to_s.downcase == 'object'
     if node_name == 'param'
       # Quick XPath search to find the <param> node that contains the video URL.
-      return nil unless movie_node = parent.search('param[@name="movie"]')[0]
+      return unless movie_node = parent.search('param[@name="movie"]')[0]
       url = movie_node['value']
     else
       # Since this is an <embed>, the video URL is in the "src" attribute. No
@@ -301,17 +299,18 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
     end
     # Verify that the video URL is actually a valid YouTube video URL.
-    return nil unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//
+    return unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//
     # We're now certain that this is a YouTube embed, but we still need to run
     # it through a special Sanitize step to ensure that no unwanted elements or
     # attributes that don't belong in a YouTube embed can sneak in.
     Sanitize.clean_node!(parent, {
-      :elements   => ['embed', 'object', 'param'],
+      :elements => %w[embed object param],
       :attributes => {
-        'embed'  => ['allowfullscreen', 'allowscriptaccess', 'height', 'src', 'type', 'width'],
-        'object' => ['height', 'width'],
-        'param'  => ['name', 'value']
+        'embed'  => %w[allowfullscreen allowscriptaccess height src type width],
+        'object' => %w[height width],
+        'param'  => %w[name value]
       }
     })
@@ -319,30 +318,30 @@ by just whitelisting all <code><object></code>, <code><embed></code>, and
     # no unwanted elements or attributes hidden inside it, we can tell Sanitize
     # to whitelist the current node (<param> or <embed>) and its parent
     # (<object>).
-    {:whitelist_nodes => [node, parent]}
+    {:node_whitelist => [node, parent]}
   end
 == Contributors
-The following lovely people have contributed to Sanitize in the form of patches
-or ideas that later became code:
-* Ryan Grove <ryan@wonko.com>
-* Wilson Bilkovich <wilson@supremetyrant.com>
-* Peter Cooper <git@peterc.org>
-* Gabe da Silveira <gabe@websaviour.com>
-* Nicholas Evans <owlmanatt@gmail.com>
-* Adam Hooper <adam@adamhooper.com>
-* Mutwin Kraus <mutle@blogage.de>
-* Dev Purkayastha <dev.purkayastha@gmail.com>
-* David Reese <work@whatcould.com>
-* Ardie Saeidi <ardalan.saeidi@gmail.com>
-* Rafael Souza <me@rafaelss.com>
-* Ben Wanicur <bwanicur@verticalresponse.com>
+Sanitize was created and is currently maintained by Ryan Grove (ryan@wonko.com).
+The following lovely people have also contributed to Sanitize:
+* Wilson Bilkovich (wilson@supremetyrant.com)
+* Peter Cooper (git@peterc.org)
+* Gabe da Silveira (gabe@websaviour.com)
+* Nicholas Evans (owlmanatt@gmail.com)
+* Adam Hooper (adam@adamhooper.com)
+* Mutwin Kraus (mutle@blogage.de)
+* Dev Purkayastha (dev.purkayastha@gmail.com)
+* David Reese (work@whatcould.com)
+* Ardie Saeidi (ardalan.saeidi@gmail.com)
+* Rafael Souza (me@rafaelss.com)
+* Ben Wanicur (bwanicur@verticalresponse.com)
 == License
-Copyright (c) 2010 Ryan Grove <ryan@wonko.com>
+Copyright (c) 2010 Ryan Grove (ryan@wonko.com)
 Permission is hereby granted, free of charge, to any person obtaining a copy of
 this software and associated documentation files (the 'Software'), to deal in

data/lib/sanitize.rb CHANGED

@@ -21,12 +21,17 @@
 # SOFTWARE.
 #++
+require 'set'
 require 'nokogiri'
 require 'sanitize/version'
 require 'sanitize/config'
 require 'sanitize/config/restricted'
 require 'sanitize/config/basic'
 require 'sanitize/config/relaxed'
+require 'sanitize/transformers/clean_cdata'
+require 'sanitize/transformers/clean_comment'
+require 'sanitize/transformers/clean_element'
 class Sanitize
   attr_reader :config
@@ -45,21 +50,18 @@ class Sanitize
   # Returns a sanitized copy of _html_, using the settings in _config_ if
   # specified.
   def self.clean(html, config = {})
-    sanitize = Sanitize.new(config)
-    sanitize.clean(html)
+    Sanitize.new(config).clean(html)
   end
   # Performs Sanitize#clean in place, returning _html_, or +nil+ if no changes
   # were made.
   def self.clean!(html, config = {})
-    sanitize = Sanitize.new(config)
-    sanitize.clean!(html)
+    Sanitize.new(config).clean!(html)
   end
   # Sanitizes the specified Nokogiri::XML::Node and all its children.
   def self.clean_node!(node, config = {})
-    sanitize = Sanitize.new(config)
-    sanitize.clean_node!(node)
+    Sanitize.new(config).clean_node!(node)
   end
   #--
@@ -68,31 +70,15 @@ class Sanitize
   # Returns a new Sanitize object initialized with the settings in _config_.
   def initialize(config = {})
-    # Sanitize configuration.
-    @config = Config::DEFAULT.merge(config)
-    @config[:transformers] = Array(@config[:transformers].dup)
-    # Convert arrays to hashes for faster lookups.
-    @allowed_elements    = {}
-    @whitespace_elements = {}
-    @config[:elements].each {|el| @allowed_elements[el] = true }
-    @config[:whitespace_elements].each {|el| @whitespace_elements[el] = true }
-    # Convert the list of :remove_contents elements to a Hash for faster lookup.
-    @remove_all_contents     = false
-    @remove_element_contents = {}
-    if @config[:remove_contents].is_a?(Array)
-      @config[:remove_contents].each {|el| @remove_element_contents[el] = true }
-    else
-      @remove_all_contents = !!@config[:remove_contents]
-    end
-    # Specific nodes to whitelist (along with all their attributes). This array
-    # is generated at runtime by transformers, and is cleared before and after
-    # a fragment is cleaned (so it applies only to a specific fragment).
-    @whitelist_nodes = []
+    @config       = Config::DEFAULT.merge(config)
+    @transformers = Array(@config[:transformers].dup)
+    # Default transformers. These always run at the end of the transformer
+    # chain, after any custom transformers.
+    @transformers <<
+        Transformers::CleanComment <<
+        Transformers::CleanCDATA <<
+        Transformers::CleanElement.new(@config)
   end
   # Returns a sanitized copy of _html_.
@@ -129,130 +115,34 @@ class Sanitize
   def clean_node!(node)
     raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
-    @whitelist_nodes = []
-    node.traverse do |child|
-      if child.element? || (child.text? && @config[:process_text_nodes])
-        clean_element!(child)
-      elsif child.comment?
-        child.unlink unless @config[:allow_comments]
-      elsif child.cdata?
-        child.replace(Nokogiri::XML::Text.new(child.text, child.document))
-      end
-    end
-    @whitelist_nodes = []
+    node_whitelist = Set.new
+    node.traverse {|child| transform_node!(child, node_whitelist) }
     node
   end
   private
-  def clean_element!(node)
-    # Run this node through all configured transformers.
-    transform = transform_element!(node)
-    # If this node is in the dynamic whitelist array (built at runtime by
-    # transformers), let it live with all of its attributes intact.
-    return if @whitelist_nodes.include?(node)
-    name = node.name.to_s.downcase
-    # Delete any element that isn't in the whitelist.
-    unless transform[:whitelist] || @allowed_elements[name]
-      # Elements like br, div, p, etc. need to be replaced with whitespace in
-      # order to preserve readability.
-      if @whitespace_elements[name]
-        node.add_previous_sibling(' ')
-        node.add_next_sibling(' ') unless node.children.empty?
-      end
-      unless @remove_all_contents || @remove_element_contents[name]
-        node.children.each { |n| node.add_previous_sibling(n) }
-      end
-      node.unlink
-      return
-    end
-    attr_whitelist = (transform[:attr_whitelist] +
-        (@config[:attributes][name] || []) +
-        (@config[:attributes][:all] || [])).uniq
-    if attr_whitelist.empty?
-      # Delete all attributes from elements with no whitelisted attributes.
-      node.attribute_nodes.each {|attr| attr.remove }
-    else
-      # Delete any attribute that isn't in the whitelist for this element.
-      node.attribute_nodes.each do |attr|
-        attr.unlink unless attr_whitelist.include?(attr.name.downcase)
-      end
-      # Delete remaining attributes that use unacceptable protocols.
-      if @config[:protocols].has_key?(name)
-        protocol = @config[:protocols][name]
-        node.attribute_nodes.each do |attr|
-          attr_name = attr.name.downcase
-          next false unless protocol.has_key?(attr_name)
-          del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
-            !protocol[attr_name].include?($1.downcase)
-          else
-            !protocol[attr_name].include?(:relative)
-          end
-          attr.unlink if del
-        end
-      end
-    end
-    # Add required attributes.
-    if @config[:add_attributes].has_key?(name)
-      @config[:add_attributes][name].each do |key, val|
-        node[key] = val
-      end
-    end
-    transform
-  end
-  def transform_element!(node)
-    output = {
-      :attr_whitelist => [],
-      :node           => node,
-      :whitelist      => false
-    }
-    @config[:transformers].inject(node) do |transformer_node, transformer|
-      transform = transformer.call({
-        :allowed_elements => @allowed_elements,
-        :config           => @config,
-        :node             => transformer_node,
-        :node_name        => transformer_node.name.downcase,
-        :whitelist_nodes  => @whitelist_nodes
+  def transform_node!(node, node_whitelist)
+    @transformers.each do |transformer|
+      result = transformer.call({
+        :config         => @config,
+        :is_whitelisted => node_whitelist.include?(node),
+        :node           => node,
+        :node_name      => node.name.downcase,
+        :node_whitelist => node_whitelist
       })
-      if transform.nil?
-        transformer_node
-      elsif transform.is_a?(Hash)
-        if transform[:whitelist_nodes].is_a?(Array)
-          @whitelist_nodes += transform[:whitelist_nodes]
-          @whitelist_nodes.uniq!
-        end
+      # If the node has been unlinked, there's no point running subsequent
+      # transformers.
+      break if node.parent.nil? && !node.fragment?
-        output[:attr_whitelist]  += transform[:attr_whitelist] if transform[:attr_whitelist].is_a?(Array)
-        output[:whitelist]      ||= true if transform[:whitelist]
-        output[:node]             = transform[:node].is_a?(Nokogiri::XML::Node) ? transform[:node] : output[:node]
-      else
-        raise Error, "transformer output must be a Hash or nil"
+      if result.is_a?(Hash) && result[:node_whitelist].respond_to?(:each)
+        node_whitelist.merge(result[:node_whitelist])
       end
     end
-    node.replace(output[:node]) if node != output[:node]
-    return output
+    node
   end
   class Error < StandardError; end

data/lib/sanitize/config.rb CHANGED

@@ -47,10 +47,6 @@ class Sanitize
       # Character encoding to use for HTML output. Default is 'utf-8'.
       :output_encoding => 'utf-8',
-      # Whether or not to process text nodes. Enabling this will allow text
-      # nodes to be processed by transformers.
-      :process_text_nodes => false,
       # URL handling protocols to allow in specific attributes. By default, no
       # protocols are allowed. Use :relative in place of a protocol if you want
       # to allow relative URLs sans protocol.

data/lib/sanitize/transformers/clean_cdata.rb ADDED

@@ -0,0 +1,13 @@
+class Sanitize; module Transformers
+  CleanCDATA = lambda do |env|
+    return if env[:is_whitelisted]
+    node = env[:node]
+    if node.cdata?
+      node.replace(Nokogiri::XML::Text.new(node.text, node.document))
+    end
+  end
+end; end

data/lib/sanitize/transformers/clean_comment.rb ADDED

@@ -0,0 +1,10 @@
+class Sanitize; module Transformers
+  CleanComment = lambda do |env|
+    return if env[:is_whitelisted]
+    node = env[:node]
+    node.unlink if node.comment? && !env[:config][:allow_comments]
+  end
+end; end

data/lib/sanitize/transformers/clean_element.rb ADDED

@@ -0,0 +1,87 @@
+class Sanitize; module Transformers
+  class CleanElement
+    def initialize(config)
+      @config = config
+      # For faster lookups.
+      @add_attributes          = config[:add_attributes]
+      @allowed_elements        = {}
+      @attributes              = config[:attributes]
+      @protocols               = config[:protocols]
+      @remove_all_contents     = false
+      @remove_element_contents = {}
+      @whitespace_elements     = {}
+      config[:elements].each {|el| @allowed_elements[el] = true }
+      config[:whitespace_elements].each {|el| @whitespace_elements[el] = true }
+      if config[:remove_contents].is_a?(Array)
+        config[:remove_contents].each {|el| @remove_element_contents[el] = true }
+      else
+        @remove_all_contents = !!config[:remove_contents]
+      end
+    end
+    def call(env)
+      name = env[:node_name]
+      node = env[:node]
+      return if env[:is_whitelisted] || !node.element?
+      # Delete any element that isn't in the config whitelist.
+      unless @allowed_elements[name]
+        # Elements like br, div, p, etc. need to be replaced with whitespace in
+        # order to preserve readability.
+        if @whitespace_elements[name]
+          node.add_previous_sibling(' ')
+          node.add_next_sibling(' ') unless node.children.empty?
+        end
+        unless @remove_all_contents || @remove_element_contents[name]
+          node.children.each {|n| node.add_previous_sibling(n) }
+        end
+        node.unlink
+        return
+      end
+      attr_whitelist = Set.new((@attributes[name] || []) +
+          (@attributes[:all] || []))
+      if attr_whitelist.empty?
+        # Delete all attributes from elements with no whitelisted attributes.
+        node.attribute_nodes.each {|attr| attr.unlink }
+      else
+        # Delete any attribute that isn't in the whitelist for this element.
+        node.attribute_nodes.each do |attr|
+          attr.unlink unless attr_whitelist.include?(attr.name.downcase)
+        end
+        # Delete remaining attributes that use unacceptable protocols.
+        if @protocols.has_key?(name)
+          protocol = @protocols[name]
+          node.attribute_nodes.each do |attr|
+            attr_name = attr.name.downcase
+            next false unless protocol.has_key?(attr_name)
+            del = if attr.value.to_s.downcase =~ REGEX_PROTOCOL
+              !protocol[attr_name].include?($1.downcase)
+            else
+              !protocol[attr_name].include?(:relative)
+            end
+            attr.unlink if del
+          end
+        end
+      end
+      # Add required attributes.
+      if @add_attributes.has_key?(name)
+        @add_attributes[name].each {|key, val| node[key] = val }
+      end
+    end
+  end
+end; end
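
The protocol check inside `CleanElement#call` can be read as a small predicate: extract the protocol from the attribute value and compare it with the allowed list, falling back to `:relative` when no protocol is present. Below is a minimal sketch of that decision, under the assumption that a much simpler pattern stands in for `REGEX_PROTOCOL`, which this diff does not show. `PROTOCOL_PATTERN` and `protocol_allowed?` are hypothetical names, not part of Sanitize.

```ruby
# Hypothetical, simplified stand-in for Sanitize's REGEX_PROTOCOL (not shown
# in this diff). It captures the text before the first colon, provided no
# slash, question mark, or hash precedes that colon.
PROTOCOL_PATTERN = /\A([^\/?#:]*):/

# Returns true when the attribute value may stay, false when the attribute
# would be unlinked. `allowed` is the list configured for one attribute, e.g.
# config[:protocols]['a']['href'].
def protocol_allowed?(value, allowed)
  if value.to_s.downcase =~ PROTOCOL_PATTERN
    allowed.include?($1)
  else
    allowed.include?(:relative)
  end
end

allowed = ['http', 'https', :relative]

absolute_ok   = protocol_allowed?('http://foo.com/', allowed)
script_ok     = protocol_allowed?('JAVASCRIPT:alert(1)', allowed)
relative_ok   = protocol_allowed?('/bar.jpg', allowed)
relative_deny = protocol_allowed?('/bar.jpg', ['http'])
```

The value is downcased before matching, as in the diff, so mixed-case protocols are compared against the lowercase names in the config. The real pattern is likely more thorough than this stand-in (for example, around encoded colons), so this sketch should not be treated as a replacement for it.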

data/lib/sanitize/version.rb CHANGED

@@ -1,3 +1,3 @@
 class Sanitize
-  VERSION = '1.3.0.dev.20101210'
+  VERSION = '2.0.0.dev.20101211'
 end

metadata CHANGED

@@ -3,12 +3,12 @@ name: sanitize
 version: !ruby/object:Gem::Version
   prerelease: true
   segments:
-  - 1
-  - 3
+  - 2
+  - 0
   - 0
   - dev
-  - 20101210
-  version: 1.3.0.dev.20101210
+  - 20101211
+  version: 2.0.0.dev.20101211
 platform: ruby
 authors:
 - Ryan Grove
@@ -16,7 +16,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2010-12-10 00:00:00 -08:00
+date: 2010-12-11 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
@@ -80,6 +80,9 @@ files:
 - lib/sanitize/config/relaxed.rb
 - lib/sanitize/config/restricted.rb
 - lib/sanitize/config.rb
+- lib/sanitize/transformers/clean_cdata.rb
+- lib/sanitize/transformers/clean_comment.rb
+- lib/sanitize/transformers/clean_element.rb
 - lib/sanitize/version.rb
 - lib/sanitize.rb
 has_rdoc: true
@@ -99,8 +102,8 @@ required_ruby_version: !ruby/object:Gem::Requirement
       segments:
       - 1
       - 8
-      - 6
-      version: 1.8.6
+      - 7
+      version: 1.8.7
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
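
The metadata changes (a prerelease version bump and a higher minimum Ruby version) can be checked with RubyGems' own version classes. This is a small sketch; the `>=` operator is an assumption, since the hunk above shows only the version segments of `required_ruby_version`, not its operator.

```ruby
require 'rubygems'

old_version = Gem::Version.new('1.3.0.dev.20101210')
new_version = Gem::Version.new('2.0.0.dev.20101211')

# Matches the "segments" list in the metadata hunk.
segments = new_version.segments

# A version containing letters (here "dev") is treated as a prerelease.
is_prerelease = new_version.prerelease?

# The new dev build sorts after the old one, but before a final 2.0.0.
is_newer       = new_version > old_version
before_release = new_version < Gem::Version.new('2.0.0')

# Assumed operator: ">=" is a guess, only the 1.8.7 segments appear above.
ruby_requirement = Gem::Requirement.new('>= 1.8.7')
ruby_186_ok = ruby_requirement.satisfied_by?(Gem::Version.new('1.8.6'))
ruby_187_ok = ruby_requirement.satisfied_by?(Gem::Version.new('1.8.7'))
```

Under that assumption, Ruby 1.8.6 would no longer satisfy the gem's requirement after this release.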