RubyGems - sanitize - Versions diffs - 1.2.1.dev.20100122 → 1.2.1.dev.20100124 - Mend

sanitize 1.2.1.dev.20100122 → 1.2.1.dev.20100124

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of sanitize might be problematic.

Files changed (6)

data/HISTORY CHANGED

@@ -2,8 +2,11 @@ Sanitize History
 ================================================================================
 Version 1.2.1 (git)
+  * Added an :escape_only config setting. If set to true, Sanitize will escape
+    non-whitelisted elements and their contents instead of removing them.
   * Added a :remove_contents config setting. If set to true, Sanitize will
-    remove the contents of filtered nodes in addition to the nodes themselves.
+    remove the contents of non-whitelisted elements in addition to the elements
+    themselves.
   * The environment hash passed into transformers now includes a :node_name item
     containing the lowercase name of the current HTML node (e.g. "div").
   * Returning anything other than a Hash or nil from a transformer will now

data/README.rdoc CHANGED

@@ -11,8 +11,7 @@ that you don't explicitly allow will be removed.
 Because it's based on Nokogiri, a full-fledged HTML parser, rather than a bunch
 of fragile regular expressions, Sanitize has no trouble dealing with malformed
-or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of
-caution.
+or maliciously-formed HTML, and will always output valid HTML or XHTML.
 *Author*::    Ryan Grove (mailto:ryan@wonko.com)
 *Version*::   1.2.1.dev (git)
@@ -134,6 +133,11 @@ Array of element names to allow. Specify all names in lowercase.
     'sup', 'u', 'ul'
   ]
+==== :escape_only (boolean)
+If set to +true+, Sanitize will escape non-whitelisted elements and their
+contents rather than removing them.
 ==== :output (Symbol)
 Output format. Supported formats are <code>:html</code> and <code>:xhtml</code>,
@@ -159,9 +163,12 @@ include the symbol <code>:relative</code> in the protocol array:
 ==== :remove_contents (boolean)
-If set to <code>true</code>, Sanitize will remove the contents of any filtered
-nodes in addition to the nodes themselves. By default, Sanitize leaves the safe
-parts of a node's contents behind when the node is removed.
+If set to +true+, Sanitize will remove the contents of any non-whitelisted
+elements in addition to the elements themselves. By default, Sanitize leaves the
+safe parts of an element's contents behind when the element is removed.
+If both <code>:escape_only</code> and <code>:remove_contents</code> are enabled,
+<code>:remove_contents</code> will take precedence.
 ==== :transformers

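The precedence rule documented in the README above can be summarised as a small decision function. This is a plain-Ruby sketch, not part of Sanitize; `filter_mode` and the symbols it returns are hypothetical labels for the three behaviours applied to a non-whitelisted element.

```ruby
# Hypothetical helper (not part of Sanitize): decide what happens to a
# non-whitelisted element, following the README's precedence rule.
def filter_mode(escape_only: false, remove_contents: false)
  if remove_contents
    :remove_element_and_contents  # :remove_contents wins, even if :escape_only is set
  elsif escape_only
    :escape_element_and_contents  # element and contents emitted as escaped text
  else
    :remove_element_keep_contents # default: drop the tag, keep the safe contents
  end
end

filter_mode                                            # => :remove_element_keep_contents
filter_mode(escape_only: true)                         # => :escape_element_and_contents
filter_mode(escape_only: true, remove_contents: true)  # => :remove_element_and_contents
```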
data/lib/sanitize/config.rb CHANGED

@@ -40,6 +40,10 @@ class Sanitize
       # that all HTML will be stripped).
       :elements => [],
+      # If this is true, Sanitize will escape non-whitelisted elements and their
+      # contents rather than removing them.
+      :escape_only => false,
       # Output format. Supported formats are :html and :xhtml (which is the
       # default).
       :output => :xhtml,
@@ -49,9 +53,13 @@ class Sanitize
       # to allow relative URLs sans protocol.
       :protocols => {},
-      # If this is true, Sanitize will remove the contents of any filtered nodes
-      # in addition to the nodes themselves. By default, Sanitize leaves the
-      # safe parts of a node's contents behind when the node is removed.
+      # If this is true, Sanitize will remove the contents of any filtered
+      # elements in addition to the elements themselves. By default, Sanitize
+      # leaves the safe parts of an element's contents behind when the element
+      # is removed.
+      #
+      # If both :escape_only and :remove_contents are true, :remove_contents
+      # will take precedence.
       :remove_contents => false,
       # Transformers allow you to filter or alter nodes using custom logic. See

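The new `:escape_only` default sits alongside the other defaults, and the constructor in `lib/sanitize.rb` combines them with the caller's options via `Config::DEFAULT.merge(config)`. A minimal sketch of that merge, using an abbreviated, hypothetical `DEFAULTS` hash rather than the real `Sanitize::Config::DEFAULT` (which has more keys):

```ruby
# Abbreviated, hypothetical copy of the defaults shown in config.rb.
DEFAULTS = {
  :elements        => [],
  :escape_only     => false,
  :output          => :xhtml,
  :protocols       => {},
  :remove_contents => false
}.freeze

# Hash#merge returns a new hash: keys in +user_config+ override the defaults,
# and every key the caller omits keeps its default value.
def effective_config(user_config = {})
  DEFAULTS.merge(user_config)
end

config = effective_config(:elements => ['b', 'i'], :escape_only => true)
config[:escape_only]     # => true
config[:remove_contents] # => false (default)
config[:output]          # => :xhtml (default)
```

`Hash#merge` is shallow: supplying `:protocols` replaces the whole default hash for that key rather than merging into it.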
data/lib/sanitize/version.rb CHANGED

@@ -1,3 +1,3 @@
 class Sanitize
-  VERSION = '1.2.1.dev.20100122'
+  VERSION = '1.2.1.dev.20100124'
 end

data/lib/sanitize.rb CHANGED

@@ -72,6 +72,15 @@ class Sanitize
     @config = Config::DEFAULT.merge(config)
     @config[:transformers] = Array(@config[:transformers])
+    # :remove_contents takes precedence over :escape_only.
+    if @config[:remove_contents] && @config[:escape_only]
+      @config[:escape_only] = false
+    end
+    # Convert the list of allowed elements to a Hash for faster lookup.
+    @allowed_elements = {}
+    @config[:elements].each {|el| @allowed_elements[el] = true }
     # Specific nodes to whitelist (along with all their attributes). This array
     # is generated at runtime by transformers, and is cleared before and after
     # a fragment is cleaned (so it applies only to a specific fragment).
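The constructor now builds `@allowed_elements`, replacing the `Array#include?` scan of `@config[:elements]` with a Hash lookup. A minimal stand-alone sketch of the two approaches with a made-up element list; the `Set` variant is shown only for comparison and is not what this release uses. Names are compared in lowercase, since `clean_element!` downcases the node name before the lookup.

```ruby
require 'set'

elements = ['a', 'b', 'i', 'span'] # hypothetical whitelist, lowercase names

# Same construction as in the constructor: each allowed name maps to true.
allowed_elements = {}
elements.each { |el| allowed_elements[el] = true }

# Array#include? scans the list (linear time); Hash#[] is an average
# constant-time lookup and returns nil (falsy) for names not in the list.
allowed_elements['b']       # => true
allowed_elements['script']  # => nil
elements.include?('script') # => false

# An equivalent alternative using Set.
allowed_set = Set.new(elements)
allowed_set.include?('b')   # => true
```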
@@ -87,10 +96,8 @@ class Sanitize
   # Performs clean in place, returning _html_, or +nil+ if no changes were
   # made.
   def clean!(html)
-    @whitelist_nodes = []
     fragment = Nokogiri::HTML::DocumentFragment.parse(html)
     clean_node!(fragment)
-    @whitelist_nodes = []
     output_method_params = {:encoding => 'utf-8', :indent => 0}
@@ -116,17 +123,26 @@ class Sanitize
   def clean_node!(node)
     raise ArgumentError unless node.is_a?(Nokogiri::XML::Node)
-    node.traverse do |traversed_node|
-      if traversed_node.element?
-        clean_element!(traversed_node)
-      elsif traversed_node.comment?
-        traversed_node.unlink unless @config[:allow_comments]
-      elsif traversed_node.cdata?
-        traversed_node.replace(Nokogiri::XML::Text.new(traversed_node.text,
-            traversed_node.document))
+    @whitelist_nodes = []
+    node.traverse do |child|
+      if child.element?
+        clean_element!(child)
+      elsif child.comment?
+        unless @config[:allow_comments]
+          if @config[:escape_only]
+            child.replace(Nokogiri::XML::Text.new(child.to_s, child.document))
+          else
+            child.unlink
+          end
+        end
+      elsif child.cdata?
+        child.replace(Nokogiri::XML::Text.new(child.text, child.document))
       end
     end
+    @whitelist_nodes = []
     node
   end
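In `clean_node!`, comments now have three outcomes depending on `:allow_comments` and `:escape_only`, while CDATA sections are always converted to plain text nodes. A minimal stand-alone sketch of the comment decision, assuming `escape_only` has already been resolved against `:remove_contents` as the constructor does; `handle_comment` and `escape_text` are hypothetical names, and the escaping only approximates what the serializer does with a text node.

```ruby
# Hypothetical model of comment handling in clean_node!. Returns the markup
# to emit for a comment whose serialized form is +markup+, or nil if dropped.
def handle_comment(markup, allow_comments: false, escape_only: false)
  return markup if allow_comments           # comment kept untouched
  return escape_text(markup) if escape_only # replaced by a text node, so escaped
  nil                                       # unlinked (removed)
end

# Approximates the entity escaping applied to a text node on output.
# '&' is replaced first so the entities added afterwards are not re-escaped.
def escape_text(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

handle_comment('<!-- hi -->', allow_comments: true) # => "<!-- hi -->"
handle_comment('<!-- hi -->', escape_only: true)    # => "&lt;!-- hi --&gt;"
handle_comment('<!-- hi -->')                       # => nil
```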
@@ -143,12 +159,17 @@ class Sanitize
     name = node.name.to_s.downcase
     # Delete any element that isn't in the whitelist.
-    unless transform[:whitelist] || @config[:elements].include?(name)
-      unless @config[:remove_contents]
-        node.children.each { |n| node.add_previous_sibling(n) }
+    unless transform[:whitelist] || @allowed_elements[name]
+      if @config[:escape_only]
+        node.replace(Nokogiri::XML::Text.new(node.to_s, node.document))
+      else
+        unless @config[:remove_contents]
+          node.children.each { |n| node.add_previous_sibling(n) }
+        end
+        node.unlink
       end
-      node.unlink
       return
     end
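The rewritten branch in `clean_element!` gives a non-whitelisted element one of three fates: escaped, removed with its contents, or unwrapped. The sketch below models them on a hypothetical toy tree; `Element`, `clean`, `clean_fragment` and `serialize` are illustrative names, not Sanitize or Nokogiri APIs. It ignores attributes, transformers and `transform[:whitelist]`, and its escaping only approximates the real serializer.

```ruby
# Hypothetical toy model, not Nokogiri: an element has a lowercase +name+ and
# an Array of +children+; a text node is a plain String holding raw text.
Element = Struct.new(:name, :children)

# Approximates the entity escaping applied to text when it is serialized.
def escape_text(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

def serialize(node)
  return escape_text(node) if node.is_a?(String)
  "<#{node.name}>#{node.children.map { |c| serialize(c) }.join}</#{node.name}>"
end

# Returns the nodes that take +node+'s place in its parent. Children are
# cleaned before their parent (post-order), as Nokogiri's traverse does.
def clean(node, allowed, escape_only: false, remove_contents: false)
  return [node] if node.is_a?(String)

  escape_only = false if remove_contents # :remove_contents takes precedence
  children = node.children.flat_map do |child|
    clean(child, allowed, escape_only: escape_only, remove_contents: remove_contents)
  end
  node = Element.new(node.name, children)

  return [node] if allowed[node.name]     # whitelisted: keep the element
  return [serialize(node)] if escape_only # replace with a text node of its markup
  return [] if remove_contents            # drop the element and its contents
  children                                # default: unwrap, keeping the children
end

def clean_fragment(nodes, allowed, **options)
  nodes.flat_map { |n| clean(n, allowed, **options) }.map { |n| serialize(n) }.join
end

fragment = [Element.new('b', ['bold']), Element.new('script', ['alert(1)'])]
allowed  = { 'b' => true }

clean_fragment(fragment, allowed)                        # => "<b>bold</b>alert(1)"
clean_fragment(fragment, allowed, escape_only: true)     # => "<b>bold</b>&lt;script&gt;alert(1)&lt;/script&gt;"
clean_fragment(fragment, allowed, remove_contents: true) # => "<b>bold</b>"
```

Returning an array of replacement nodes lets one function express keep, replace, remove and unwrap uniformly, much as the real code uses `replace`, `unlink` and `add_previous_sibling` on the Nokogiri node.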

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-  version: 1.2.1.dev.20100122
+  version: 1.2.1.dev.20100124
 platform: ruby
 authors:
 - Ryan Grove
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2010-01-22 00:00:00 -08:00
+date: 2010-01-24 00:00:00 -08:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency