RubyGems - sanitize - Versions diffs - 2.0.6 → 2.1.0 - Mend

sanitize 2.0.6 → 2.1.0

Potentially problematic release: this version of sanitize might be problematic.

Files changed (12)

checksums.yaml +4 -4
data/HISTORY.md +16 -0
data/LICENSE +1 -1
data/README.md +399 -0
data/lib/sanitize.rb +19 -2
data/lib/sanitize/config.rb +2 -1
data/lib/sanitize/config/relaxed.rb +4 -4
data/lib/sanitize/transformers/clean_element.rb +18 -2
data/lib/sanitize/version.rb +1 -1
data/test/test_sanitize.rb +55 -0
metadata +46 -18
data/README.rdoc +0 -367

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 240d390dd3a6813197ab1e3ccafb42f2103bf136
-  data.tar.gz: 5813a179d76ec2e44a7eb0bd0a7582c23ea0696b
+  metadata.gz: a1be4f7e5790c7e0fa8943b793803e507bbaa2ce
+  data.tar.gz: a879b798b76f4bfff12532e4779bb418a89d4500
 SHA512:
-  metadata.gz: dbc7db8d41dbac5be557a50ab69d096fe5373cd8310196b4666e7e0d7fb3f12c5138e7605c86b4ccc713b500e9ad2c0e3e06374b891c12b0fed7b2949c90868c
-  data.tar.gz: 401cdf8549edce7742fb6b0498aa82da817926537f9eca32ec87b400290541aa1051645c7dea61de8edd128f8302a42d39359e5b1bc30d4a3fb25de673024c96
+  metadata.gz: ecdbc579a9ed3f737539118ac5b6c17612a736268263fafd03b9daf39da433309a11e090494c2008859edc16c278dcc1ea63ea52b5693479c625b825bbbfbc80
+  data.tar.gz: 4fff69ad6c6812fb6aac4c492a7644f196faeb82039096dcd204461b07872a05d97c02e0b92237fc65b36891783256e84ee335fc83b03365e92ec5e07a2af57e
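
The digests above cover the `metadata.gz` and `data.tar.gz` members packed inside the `.gem` archive. As a rough sketch, an extracted member could be checked against a recorded digest with Ruby's standard `digest` library. The helper name `digest_matches?` is hypothetical and is not part of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper (not a RubyGems API): returns true when the file at
# +path+ hashes to +expected_hex+ under the given algorithm. The two
# algorithms offered are the ones listed in checksums.yaml.
def digest_matches?(path, expected_hex, algorithm = :sha512)
  digest_class = { :sha1 => Digest::SHA1, :sha512 => Digest::SHA512 }.fetch(algorithm)
  digest_class.file(path).hexdigest == expected_hex.downcase
end
```

Assuming `data.tar.gz` has been extracted from the 2.1.0 gem into the current directory, `digest_matches?('data.tar.gz', 'a879b798b76f4bfff12532e4779bb418a89d4500', :sha1)` would be the corresponding SHA1 check.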

data/HISTORY.md CHANGED

@@ -1,6 +1,22 @@
 Sanitize History
 ================================================================================
+Version 2.1.0 (2014-01-13)
+--------------------------
+* Added support for whitelisting arbitrary HTML5 `data-*` attributes. Use the
+  symbol `:data` instead of an attribute name in the `:attributes` config to
+  indicate that arbitrary data attributes should be allowed on an element.
+* Added the following elements to the relaxed config: `address`, `bdi`, `hr`,
+  and `summary`.
+* Fixed: A colon (`:`) character in a URL fragment identifier such as `#foo:1`
+  was incorrectly treated as a protocol delimiter. [@heathd - #87][87]
+[87]:https://github.com/rgrove/sanitize/pull/87
 Version 2.0.6 (2013-07-10)
 --------------------------

data/LICENSE CHANGED

@@ -1,4 +1,4 @@
-Copyright (c) 2013 Ryan Grove <ryan@wonko.com>
+Copyright (c) 2014 Ryan Grove <ryan@wonko.com>
 Permission is hereby granted, free of charge, to any person obtaining a copy of
 this software and associated documentation files (the 'Software'), to deal in

data/README.md ADDED

@@ -0,0 +1,399 @@
+Sanitize
+========
+Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
+elements and attributes, Sanitize will remove all unacceptable HTML from a
+string.
+Using a simple configuration syntax, you can tell Sanitize to allow certain
+elements, certain attributes within those elements, and even certain URL
+protocols within attributes that contain URLs. Any HTML elements or attributes
+that you don't explicitly allow will be removed.
+Because it's based on Nokogiri, a full-fledged HTML parser, rather than a bunch
+of fragile regular expressions, Sanitize has no trouble dealing with malformed
+or maliciously-formed HTML and returning safe output.
+[![Build Status](https://travis-ci.org/rgrove/sanitize.png?branch=master)](https://travis-ci.org/rgrove/sanitize?branch=master)
+Installation
+-------------
+```
+gem install sanitize
+```
+Usage
+-----
+If you don't specify any configuration options, Sanitize will use its strictest
+settings by default, which means it will strip all HTML and leave only text
+behind.
+```ruby
+require 'rubygems'
+require 'sanitize'
+html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'
+Sanitize.clean(html) # => 'foo'
+# or sanitize an entire HTML document (example assumes _html_ is whitelisted)
+html = '<!DOCTYPE html><html><b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg"></html>'
+Sanitize.clean_document(html) # => '<!DOCTYPE html>\n<html>foo</html>\n'
+```
+Configuration
+-------------
+In addition to the ultra-safe default settings, Sanitize comes with three other
+built-in modes.
+### Sanitize::Config::RESTRICTED
+Allows only very simple inline formatting markup. No links, images, or block
+elements.
+```ruby
+Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
+```
+### Sanitize::Config::BASIC
+Allows a variety of markup including formatting tags, links, and lists. Images
+and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
+protocols, and a `rel="nofollow"` attribute is added to all links to
+mitigate SEO spam.
+```ruby
+Sanitize.clean(html, Sanitize::Config::BASIC)
+# => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
+```
+### Sanitize::Config::RELAXED
+Allows an even wider variety of markup than BASIC, including images and tables.
+Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
+are limited to HTTP and HTTPS. In this mode, `rel="nofollow"` is not added to
+links.
+```ruby
+Sanitize.clean(html, Sanitize::Config::RELAXED)
+# => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'
+```
+### Custom Configuration
+If the built-in modes don't meet your needs, you can easily specify a custom
+configuration:
+```ruby
+Sanitize.clean(html, :elements => ['a', 'span'],
+    :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
+    :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
+```
+#### :add_attributes (Hash)
+Attributes to add to specific elements. If the attribute already exists, it will
+be replaced with the value specified here. Specify all element names and
+attributes in lowercase.
+```ruby
+:add_attributes => {
+  'a' => {'rel' => 'nofollow'}
+}
+```
+#### :allow_comments (boolean)
+Whether or not to allow HTML comments. Allowing comments is strongly
+discouraged, since IE allows script execution within conditional comments. The
+default value is `false`.
+#### :attributes (Hash)
+Attributes to allow for specific elements. Specify all element names and
+attributes in lowercase.
+```ruby
+:attributes => {
+  'a'          => ['href', 'title'],
+  'blockquote' => ['cite'],
+  'img'        => ['alt', 'src', 'title']
+}
+```
+If you'd like to allow certain attributes on all elements, use the symbol
+`:all` instead of an element name.
+```ruby
+# Allow the class attribute on all elements.
+:attributes => {
+  :all => ['class'],
+  'a'  => ['href', 'title']
+}
+```
+To allow arbitrary HTML5 `data-*` attributes, use the symbol
+`:data` in place of an attribute name.
+```ruby
+# Allow arbitrary HTML5 data-* attributes on <div> elements.
+:attributes => {
+  'div' => [:data]
+}
+```
+#### :elements (Array)
+Array of element names to allow. Specify all names in lowercase.
+```ruby
+:elements => %w[
+  a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
+  q s samp small strike strong sub sup time u ul var
+]
+```
+#### :output (Symbol)
+Output format. Supported formats are `:html` and `:xhtml`,
+defaulting to `:html`.
+#### :output_encoding (String)
+Character encoding to use for HTML output. Default is `utf-8`.
+#### :protocols (Hash)
+URL protocols to allow in specific attributes. If an attribute is listed here
+and contains a protocol other than those specified (or if it contains no
+protocol at all), it will be removed.
+```ruby
+:protocols => {
+  'a'   => {'href' => ['ftp', 'http', 'https', 'mailto']},
+  'img' => {'src'  => ['http', 'https']}
+}
+```
+If you'd like to allow the use of relative URLs which don't have a protocol,
+include the symbol `:relative` in the protocol array:
+```ruby
+:protocols => {
+  'a' => {'href' => ['http', 'https', :relative]}
+}
+```
+#### :remove_contents (boolean or Array)
+If set to +true+, Sanitize will remove the contents of any non-whitelisted
+elements in addition to the elements themselves. By default, Sanitize leaves the
+safe parts of an element's contents behind when the element is removed.
+If set to an array of element names, then only the contents of the specified
+elements (when filtered) will be removed, and the contents of all other filtered
+elements will be left behind.
+The default value is `false`.
+#### :transformers
+Custom transformer or array of custom transformers to run using depth-first
+traversal. See the Transformers section below for details.
+#### :transformers_breadth
+Custom transformer or array of custom transformers to run using breadth-first
+traversal. See the Transformers section below for details.
+#### :whitespace_elements (Array)
+Array of lowercase element names that should be replaced with whitespace when
+removed in order to preserve readability. For example,
+`foo<div>bar</div>baz` will become
+`foo bar baz` when the `<div>` is removed.
+By default, the following elements are included in the
+`:whitespace_elements` array:
+```
+address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
+h6 header hgroup hr li nav ol p pre section ul
+```
+### Transformers
+Transformers allow you to filter and modify nodes using your own custom logic,
+on top of (or instead of) Sanitize's core filter. A transformer is any object
+that responds to `call()` (such as a lambda or proc).
+To use one or more transformers, pass them to the `:transformers`
+config setting. You may pass a single transformer or an array of transformers.
+```ruby
+Sanitize.clean(html, :transformers => [transformer_one, transformer_two])
+```
+#### Input
+Each registered transformer's `call()` method will be called once for
+each node in the HTML (including elements, text nodes, comments, etc.), and will
+receive as an argument an environment Hash that contains the following items:
+  * **:config** - The current Sanitize configuration Hash.
+  * **:is_whitelisted** - `true` if the current node has been whitelisted by a
+    previous transformer, `false` otherwise. It's generally bad form to remove
+    a node that a previous transformer has whitelisted.
+  * **:node** - A `Nokogiri::XML::Node` object representing an HTML node. The
+    node may be an element, a text node, a comment, a CDATA node, or a document
+    fragment. Use Nokogiri's inspection methods (`element?`, `text?`, etc.) to
+    selectively ignore node types you aren't interested in.
+  * **:node_name** - The name of the current HTML node, always lowercase (e.g.
+    "div" or "span"). For non-element nodes, the name will be something like
+    "text", "comment", "#cdata-section", "#document-fragment", etc.
+  * **:node_whitelist** - Set of `Nokogiri::XML::Node` objects in the current
+    document that have been whitelisted by previous transformers, if any. It's
+    generally bad form to remove a node that a previous transformer has
+    whitelisted.
+  * **:traversal_mode** - Current node traversal mode, either `:depth` for
+    depth-first (the default mode) or `:breadth` for breadth-first.
+#### Output
+A transformer doesn't have to return anything, but may optionally return a Hash,
+which may contain the following items:
+  * **:node_whitelist** -  Array or Set of specific Nokogiri::XML::Node objects
+    to add to the document's whitelist, bypassing the current Sanitize config.
+    These specific nodes and all their attributes will be whitelisted, but
+    their children will not be.
+If a transformer returns anything other than a Hash, the return value will be
+ignored.
+#### Processing
+Each transformer has full access to the `Nokogiri::XML::Node` that's passed into
+it and to the rest of the document via the node's `document()` method. Any
+changes made to the current node or to the document will be reflected instantly
+in the document and passed on to subsequently called transformers and to
+Sanitize itself. A transformer may even call Sanitize internally to perform
+custom sanitization if needed.
+Nodes are passed into transformers in the order in which they're traversed. By
+default, depth-first traversal is used, meaning that markup is traversed from
+the deepest node upward (not from the first node to the last node):
+```ruby
+html        = '<div><span>foo</span></div>'
+transformer = lambda{|env| puts env[:node_name] }
+# Prints "text", "span", "div", "#document-fragment".
+Sanitize.clean(html, :transformers => transformer)
+```
+You may use the `:transformers_breadth` config to specify one or more
+transformers that should traverse nodes in breadth-first mode:
+```ruby
+html        = '<div><span>foo</span></div>'
+transformer = lambda{|env| puts env[:node_name] }
+# Prints "#document-fragment", "div", "span", "text".
+Sanitize.clean(html, :transformers_breadth => transformer)
+```
+Transformers have a tremendous amount of power, including the power to
+completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
+your own hands.
+#### Example: Transformer to whitelist YouTube video embeds
+The following example demonstrates how to create a depth-first Sanitize
+transformer that will safely whitelist valid YouTube video embeds without having
+to blindly allow other kinds of embedded content, which would be the case if you
+tried to do this by just whitelisting all `<iframe>` elements:
+```ruby
+lambda do |env|
+  node      = env[:node]
+  node_name = env[:node_name]
+  # Don't continue if this node is already whitelisted or is not an element.
+  return if env[:is_whitelisted] || !node.element?
+  # Don't continue unless the node is an iframe.
+  return unless node_name == 'iframe'
+  # Verify that the video URL is actually a valid YouTube video URL.
+  return unless node['src'] =~ /\A(https?:)?\/\/(?:www\.)?youtube(?:-nocookie)?\.com\//
+  # We're now certain that this is a YouTube embed, but we still need to run
+  # it through a special Sanitize step to ensure that no unwanted elements or
+  # attributes that don't belong in a YouTube embed can sneak in.
+  Sanitize.clean_node!(node, {
+    :elements => %w[iframe],
+    :attributes => {
+      'iframe'  => %w[allowfullscreen frameborder height src width]
+    }
+  })
+  # Now that we're sure that this is a valid YouTube embed and that there are
+  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
+  # to whitelist the current node.
+  {:node_whitelist => [node]}
+end
+```
+Contributors
+------------
+Sanitize was created and is maintained by Ryan Grove (ryan@wonko.com).
+The following lovely people have also contributed to Sanitize:
+* Ben Anderson
+* Wilson Bilkovich
+* Peter Cooper
+* Gabe da Silveira
+* Nicholas Evans
+* Nils Gemeinhardt
+* Adam Hooper
+* Mutwin Kraus
+* Eaden McKee
+* Dev Purkayastha
+* David Reese
+* Ardie Saeidi
+* Rafael Souza
+* Ben Wanicur
+License
+-------
+Copyright (c) 2014 Ryan Grove (ryan@wonko.com)
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the 'Software'), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/lib/sanitize.rb CHANGED

@@ -36,12 +36,26 @@ require 'sanitize/transformers/clean_element'
 class Sanitize
   attr_reader :config
+  # Matches a valid HTML5 data attribute name. The unicode ranges included here
+  # are a conservative subset of the full range of characters that are
+  # technically allowed, with the intent of matching the most common characters
+  # used in data attribute names while excluding uncommon or potentially
+  # misleading characters, or characters with the potential to be normalized
+  # into unsafe or confusing forms.
+  #
+  # If you need data attr names with characters that aren't included here (such
+  # as combining marks, full-width characters, or CJK), please consider creating
+  # a custom transformer to validate attributes according to your needs.
+  #
+  # http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#embedding-custom-non-visible-data-with-the-data-*-attributes
+  REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u
   # Matches an attribute value that could be treated by a browser as a URL
   # with a protocol prefix, such as "http:" or "javascript:". Any string of zero
   # or more characters followed by a colon is considered a match, even if the
   # colon is encoded as an entity and even if it's an incomplete entity (which
   # IE6 and Opera will still parse).
-  REGEX_PROTOCOL = /\A([^\/]*?)(?:\:|&#0*58|&#x0*3a)/i
+  REGEX_PROTOCOL = /\A([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i
   #--
   # Class Methods
@@ -99,7 +113,7 @@ class Sanitize
         Transformers::CleanElement.new(@config)
   end
-  # Returns a sanitized copy of _html_.
+  # Returns a sanitized copy of the given _html_ fragment.
   def clean(html)
     if html
       dupe = html.dup
@@ -129,12 +143,15 @@ class Sanitize
     return result == html ? nil : html[0, html.length] = result
   end
+  # Returns a sanitized copy of the given full _html_ document.
   def clean_document(html)
     unless html.nil?
       clean_document!(html.dup) || html
     end
   end
+  # Performs clean_document in place, returning _html_, or +nil+ if no changes
+  # were made.
   def clean_document!(html)
     if !@config[:elements].include?('html') && !@config[:remove_contents]
       raise 'You must have the HTML element whitelisted to call #clean_document unless remove_contents is set to true'
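
The `REGEX_PROTOCOL` change can be tried in isolation. In the sketch below, both patterns are copied from the hunk above. The constant names `OLD_REGEX_PROTOCOL` and `NEW_REGEX_PROTOCOL` and the helper `detected_protocol` are hypothetical, introduced only for this comparison. Adding `#` to the negated character class stops the pattern from scanning past the start of a fragment identifier, so a colon inside the fragment is no longer read as a protocol delimiter.

```ruby
# Both patterns are copied verbatim from the diff; only the names are new.
OLD_REGEX_PROTOCOL = /\A([^\/]*?)(?:\:|&#0*58|&#x0*3a)/i
NEW_REGEX_PROTOCOL = /\A([^\/#]*?)(?:\:|&#0*58|&#x0*3a)/i

# Hypothetical helper: returns the lowercased text captured before the first
# colon (or encoded colon), or nil when the pattern finds no protocol prefix.
def detected_protocol(regex, value)
  match = regex.match(value)
  match && match[1].downcase
end

detected_protocol(OLD_REGEX_PROTOCOL, '#fn:1')               # => "#fn"
detected_protocol(NEW_REGEX_PROTOCOL, '#fn:1')               # => nil
detected_protocol(OLD_REGEX_PROTOCOL, 'somepage#fn:1')       # => "somepage#fn"
detected_protocol(NEW_REGEX_PROTOCOL, 'somepage#fn:1')       # => nil
detected_protocol(NEW_REGEX_PROTOCOL, 'javascript:alert(1)') # => "javascript"
```

Values with a real protocol prefix, including an entity-encoded colon such as `&#58;`, are still detected by the new pattern.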

data/lib/sanitize/config.rb CHANGED

@@ -34,7 +34,8 @@ class Sanitize
       :add_attributes => {},
       # HTML attributes to allow in specific elements. By default, no attributes
-      # are allowed.
+      # are allowed. Use the symbol :data to indicate that arbitrary HTML5
+      # data-* attributes should be allowed.
       :attributes => {},
       # HTML elements to allow. By default, no elements are allowed (which means

data/lib/sanitize/config/relaxed.rb CHANGED

@@ -24,10 +24,10 @@ class Sanitize
   module Config
     RELAXED = {
       :elements => %w[
-        a abbr b bdo blockquote br caption cite code col colgroup dd del dfn dl
-        dt em figcaption figure h1 h2 h3 h4 h5 h6 hgroup i img ins kbd li mark
-        ol p pre q rp rt ruby s samp small strike strong sub sup table tbody td
-        tfoot th thead time tr u ul var wbr
+        a abbr address b bdi bdo blockquote br caption cite code col colgroup dd
+        del dfn dl dt em figcaption figure h1 h2 h3 h4 h5 h6 hgroup hr i img ins
+        kbd li mark ol p pre q rp rt ruby s samp small strike strong sub summary
+        sup table tbody td tfoot th thead time tr u ul var wbr
       ],
       :attributes => {
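
As a quick cross-check of the HISTORY entry, the two element lists from this hunk can be compared with plain array difference. The constant names `OLD_ELEMENTS` and `NEW_ELEMENTS` are hypothetical; the lists themselves are copied from the removed and added lines above.

```ruby
# Element list removed by the hunk (2.0.6).
OLD_ELEMENTS = %w[
  a abbr b bdo blockquote br caption cite code col colgroup dd del dfn dl
  dt em figcaption figure h1 h2 h3 h4 h5 h6 hgroup i img ins kbd li mark
  ol p pre q rp rt ruby s samp small strike strong sub sup table tbody td
  tfoot th thead time tr u ul var wbr
]

# Element list added by the hunk (2.1.0).
NEW_ELEMENTS = %w[
  a abbr address b bdi bdo blockquote br caption cite code col colgroup dd
  del dfn dl dt em figcaption figure h1 h2 h3 h4 h5 h6 hgroup hr i img ins
  kbd li mark ol p pre q rp rt ruby s samp small strike strong sub summary
  sup table tbody td tfoot th thead time tr u ul var wbr
]

NEW_ELEMENTS - OLD_ELEMENTS # => ["address", "bdi", "hr", "summary"]
OLD_ELEMENTS - NEW_ELEMENTS # => []
```

The result matches the four elements named in HISTORY.md, and nothing was dropped from the relaxed config.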

data/lib/sanitize/transformers/clean_element.rb CHANGED

@@ -49,13 +49,29 @@ class Sanitize; module Transformers
       attr_whitelist = Set.new((@attributes[name] || []) +
           (@attributes[:all] || []))
+      allow_data_attributes = attr_whitelist.include?(:data)
       if attr_whitelist.empty?
         # Delete all attributes from elements with no whitelisted attributes.
         node.attribute_nodes.each {|attr| attr.unlink }
       else
-        # Delete any attribute that isn't in the whitelist for this element.
+        # Delete any attribute that isn't allowed on this element.
         node.attribute_nodes.each do |attr|
-          attr.unlink unless attr_whitelist.include?(attr.name.downcase)
+          attr_name = attr.name.downcase
+          unless attr_whitelist.include?(attr_name)
+            # The attribute isn't explicitly whitelisted.
+            if allow_data_attributes && attr_name.start_with?('data-')
+              # Arbitrary data attributes are allowed. Verify that the attribute
+              # is a valid data attribute.
+              attr.unlink unless attr_name =~ REGEX_DATA_ATTR
+            else
+              # Either the attribute isn't a data attribute, or arbitrary data
+              # attributes aren't allowed. Remove the attribute.
+              attr.unlink
+            end
+          end
         end
         # Delete remaining attributes that use unacceptable protocols.
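
The branching added above can be sketched without Nokogiri by treating an element's attributes as a plain Hash. This is an approximation for illustration only: `filter_attributes` is a hypothetical helper, not part of Sanitize, and it returns the surviving attributes instead of unlinking nodes. `REGEX_DATA_ATTR` is copied from the `lib/sanitize.rb` hunk.

```ruby
require 'set'

# Copied from the lib/sanitize.rb hunk in this diff.
REGEX_DATA_ATTR = /\Adata-(?!xml)[a-z_][\w.\u00E0-\u00F6\u00F8-\u017F\u01DD-\u02AF-]*\z/u

# Hypothetical helper mirroring the CleanElement logic: keeps an attribute
# when its lowercased name is explicitly whitelisted, or when :data is in the
# whitelist and the name is a valid data-* attribute name.
def filter_attributes(attrs, whitelist)
  whitelist = Set.new(whitelist)
  return {} if whitelist.empty? # no whitelisted attributes: drop everything

  allow_data_attributes = whitelist.include?(:data)

  attrs.select do |name, _value|
    attr_name = name.downcase
    whitelist.include?(attr_name) ||
      (allow_data_attributes &&
        attr_name.start_with?('data-') &&
        !(attr_name =~ REGEX_DATA_ATTR).nil?)
  end
end

attrs = { 'class' => 'x', 'data-foo' => '1', 'data-xml' => '2', 'onclick' => 'evil()' }
filter_attributes(attrs, ['class', :data]) # => {"class"=>"x", "data-foo"=>"1"}
filter_attributes(attrs, ['class'])        # => {"class"=>"x"}
```

Because `:data` is a Symbol and attribute names are Strings, the marker cannot collide with a real attribute name in the whitelist.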

data/lib/sanitize/version.rb CHANGED

@@ -1,3 +1,3 @@
 class Sanitize
-  VERSION = '2.0.6'
+  VERSION = '2.1.0'
 end

data/test/test_sanitize.rb CHANGED

@@ -344,6 +344,16 @@ describe 'Custom configs' do
     Sanitize.clean(input, { :elements => ['a'], :attributes => {'a' => ['href']}, :protocols => { 'a' => { 'href' => [:relative] }} }).must_equal(input)
   end
+  it 'should allow relative URLs containing colons where the colon is part of an anchor' do
+    input = '<a href="#fn:1">Footnote 1</a>'
+    Sanitize.clean(input, { :elements => ['a'], :attributes => {'a' => ['href']}, :protocols => { 'a' => { 'href' => [:relative] }} }).must_equal(input)
+  end
+  it 'should allow relative URLs containing colons where the colon is part of an anchor' do
+    input = '<a href="somepage#fn:1">Footnote 1</a>'
+    Sanitize.clean(input, { :elements => ['a'], :attributes => {'a' => ['href']}, :protocols => { 'a' => { 'href' => [:relative] }} }).must_equal(input)
+  end
   it 'should output HTML when :output == :html' do
     input = 'foo<br/>bar<br>baz'
     Sanitize.clean(input, :elements => ['br'], :output => :html).must_equal('foo<br>bar<br>baz')
@@ -366,6 +376,51 @@ describe 'Custom configs' do
     Sanitize.clean(html).must_equal("foo\302\240bar")
     Sanitize.clean(html, :output_encoding => 'ASCII').must_equal("foo&#160;bar")
   end
+  it 'should not allow arbitrary HTML5 data attributes by default' do
+    config = {
+      :elements => ['b']
+    }
+    Sanitize.clean('<b data-foo="bar"></b>', config)
+      .must_equal('<b></b>')
+    config[:attributes] = {'b' => ['class']}
+    Sanitize.clean('<b class="foo" data-foo="bar"></b>', config)
+      .must_equal('<b class="foo"></b>')
+  end
+  it 'should allow arbitrary HTML5 data attributes when the :attributes config includes :data' do
+    config = {
+      :attributes => {'b' => [:data]},
+      :elements   => ['b']
+    }
+    Sanitize.clean('<b data-foo="valid" data-bar="valid"></b>', config)
+      .must_equal('<b data-foo="valid" data-bar="valid"></b>')
+    Sanitize.clean('<b data-="invalid"></b>', config)
+      .must_equal('<b></b>')
+    Sanitize.clean('<b data-="invalid"></b>', config)
+      .must_equal('<b></b>')
+    Sanitize.clean('<b data-xml="invalid"></b>', config)
+      .must_equal('<b></b>')
+    Sanitize.clean('<b data-xmlfoo="invalid"></b>', config)
+      .must_equal('<b></b>')
+    Sanitize.clean('<b data-f:oo="valid"></b>', config)
+      .must_equal('<b></b>')
+    Sanitize.clean('<b data-f/oo="partial"></b>', config)
+      .must_equal('<b data-f></b>') # Nokogiri quirk; not ideal, but harmless
+    Sanitize.clean('<b data-éfoo="valid"></b>', config)
+      .must_equal('<b></b>') # Another annoying Nokogiri quirk.
+  end
 end
 describe 'Sanitize.clean' do

metadata CHANGED

@@ -1,57 +1,85 @@
 --- !ruby/object:Gem::Specification
 name: sanitize
 version: !ruby/object:Gem::Version
-  version: 2.0.6
+  version: 2.1.0
 platform: ruby
 authors:
 - Ryan Grove
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2013-07-11 00:00:00.000000000 Z
+date: 2014-01-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: nokogiri
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: 1.4.4
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - ">="
       - !ruby/object:Gem::Version
         version: 1.4.4
 - !ruby/object:Gem::Dependency
   name: minitest
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: 2.0.0
+        version: '4.7'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: 2.0.0
+        version: '4.7'
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.9'
+        version: '10.1'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - '>='
+    - - "~>"
       - !ruby/object:Gem::Version
-        version: '0.9'
+        version: '10.1'
+- !ruby/object:Gem::Dependency
+  name: redcarpet
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 3.0.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 3.0.0
+- !ruby/object:Gem::Dependency
+  name: yard
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.8.7
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.8.7
 description:
 email: ryan@wonko.com
 executables: []
@@ -60,16 +88,16 @@ extra_rdoc_files: []
 files:
 - HISTORY.md
 - LICENSE
-- README.rdoc
+- README.md
+- lib/sanitize.rb
+- lib/sanitize/config.rb
 - lib/sanitize/config/basic.rb
 - lib/sanitize/config/relaxed.rb
 - lib/sanitize/config/restricted.rb
-- lib/sanitize/config.rb
 - lib/sanitize/transformers/clean_cdata.rb
 - lib/sanitize/transformers/clean_comment.rb
 - lib/sanitize/transformers/clean_element.rb
 - lib/sanitize/version.rb
-- lib/sanitize.rb
 - test/test_sanitize.rb
 homepage: https://github.com/rgrove/sanitize/
 licenses: []
@@ -80,17 +108,17 @@ require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
-  - - '>='
+  - - ">="
     - !ruby/object:Gem::Version
       version: 1.9.2
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
-  - - '>='
+  - - ">="
     - !ruby/object:Gem::Version
       version: 1.2.0
 requirements: []
 rubyforge_project:
-rubygems_version: 2.0.0
+rubygems_version: 2.2.0
 signing_key:
 specification_version: 4
 summary: Whitelist-based HTML sanitizer.
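
The development dependencies in this metadata moved from open-ended `>=` constraints to pessimistic `~>` constraints. A small sketch of what that means in practice, using the `Gem::Requirement` and `Gem::Version` classes that ship with RubyGems; the helper name `satisfies?` and the sample version numbers are hypothetical:

```ruby
require 'rubygems'

# Hypothetical helper: does +version+ satisfy the +requirement+ string?
def satisfies?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end

satisfies?('>= 2.0.0', '5.0.0') # => true  (old minitest constraint, no upper bound)
satisfies?('~> 4.7', '4.7.5')   # => true  (new minitest constraint)
satisfies?('~> 4.7', '5.0.0')   # => false (next major version is excluded)
satisfies?('~> 3.0.0', '3.0.4') # => true  (redcarpet constraint)
satisfies?('~> 3.0.0', '3.1.0') # => false (three-segment form pins the minor version)
```

The runtime dependency on nokogiri keeps its `>= 1.4.4` constraint, so only development tooling is affected by the tighter bounds.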

data/README.rdoc DELETED

@@ -1,367 +0,0 @@
-= Sanitize
-Sanitize is a whitelist-based HTML sanitizer. Given a list of acceptable
-elements and attributes, Sanitize will remove all unacceptable HTML from a
-string.
-Using a simple configuration syntax, you can tell Sanitize to allow certain
-elements, certain attributes within those elements, and even certain URL
-protocols within attributes that contain URLs. Any HTML elements or attributes
-that you don't explicitly allow will be removed.
-Because it's based on Nokogiri, a full-fledged HTML parser, rather than a bunch
-of fragile regular expressions, Sanitize has no trouble dealing with malformed
-or maliciously-formed HTML, and will always output valid HTML or XHTML.
-*Author*::    Ryan Grove (mailto:ryan@wonko.com)
-*Version*::   2.0.6 (2013-07-10)
-*Copyright*:: Copyright (c) 2013 Ryan Grove. All rights reserved.
-*License*::   MIT License (http://opensource.org/licenses/mit-license.php)
-*Website*::   http://github.com/rgrove/sanitize
-{<img src="https://secure.travis-ci.org/rgrove/sanitize.png?branch=master" alt="Build Status" />}[http://travis-ci.org/rgrove/sanitize]
-== Installation
-Latest stable release:
-  gem install sanitize
-Latest development version:
-  gem install sanitize --pre
-== Usage
-If you don't specify any configuration options, Sanitize will use its strictest
-settings by default, which means it will strip all HTML and leave only text
-behind.
-  require 'rubygems'
-  require 'sanitize'
-  html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'
-  Sanitize.clean(html) # => 'foo'
-  ...
-  # or sanitize an entire HTML document (example assumes _html_ is whitelisted)
-  html = '<!DOCTYPE html><html><b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg"></html>'
-  Sanitize.clean_document(html) # => '<!DOCTYPE html>\n<html>foo</html>\n'
-== Configuration
-In addition to the ultra-safe default settings, Sanitize comes with three other
-built-in modes.
-=== Sanitize::Config::RESTRICTED
-Allows only very simple inline formatting markup. No links, images, or block
-elements.
-  Sanitize.clean(html, Sanitize::Config::RESTRICTED) # => '<b>foo</b>'
-=== Sanitize::Config::BASIC
-Allows a variety of markup including formatting tags, links, and lists. Images
-and tables are not allowed, links are limited to FTP, HTTP, HTTPS, and mailto
-protocols, and a <code>rel="nofollow"</code> attribute is added to all links to
-mitigate SEO spam.
-  Sanitize.clean(html, Sanitize::Config::BASIC)
-  # => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'
-=== Sanitize::Config::RELAXED
-Allows an even wider variety of markup than BASIC, including images and tables.
-Links are still limited to FTP, HTTP, HTTPS, and mailto protocols, while images
-are limited to HTTP and HTTPS. In this mode, <code>rel="nofollow"</code> is not
-added to links.
-  Sanitize.clean(html, Sanitize::Config::RELAXED)
-  # => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg">'
-=== Custom Configuration
-If the built-in modes don't meet your needs, you can easily specify a custom
-configuration:
-  Sanitize.clean(html, :elements => ['a', 'span'],
-      :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
-      :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})
-==== :add_attributes (Hash)
-Attributes to add to specific elements. If the attribute already exists, it will
-be replaced with the value specified here. Specify all element names and
-attributes in lowercase.
-  :add_attributes => {
-    'a' => {'rel' => 'nofollow'}
-  }
-==== :attributes (Hash)
-Attributes to allow for specific elements. Specify all element names and
-attributes in lowercase.
-  :attributes => {
-    'a'          => ['href', 'title'],
-    'blockquote' => ['cite'],
-    'img'        => ['alt', 'src', 'title']
-  }
-If you'd like to allow certain attributes on all elements, use the symbol
-<code>:all</code> instead of an element name.
-  :attributes => {
-    :all => ['class'],
-    'a'  => ['href', 'title']
-  }
-==== :allow_comments (boolean)
-Whether or not to allow HTML comments. Allowing comments is strongly
-discouraged, since IE allows script execution within conditional comments. The
-default value is <code>false</code>.
-==== :elements (Array)
-Array of element names to allow. Specify all names in lowercase.
-  :elements => %w[
-    a abbr b blockquote br cite code dd dfn dl dt em i kbd li mark ol p pre
-    q s samp small strike strong sub sup time u ul var
-  ]
-==== :output (Symbol)
-Output format. Supported formats are <code>:html</code> and <code>:xhtml</code>,
-defaulting to <code>:html</code>.
-==== :output_encoding (String)
-Character encoding to use for HTML output. Default is <code>utf-8</code>.
-==== :protocols (Hash)
-URL protocols to allow in specific attributes. If an attribute is listed here
-and contains a protocol other than those specified (or if it contains no
-protocol at all), it will be removed.
-  :protocols => {
-    'a'   => {'href' => ['ftp', 'http', 'https', 'mailto']},
-    'img' => {'src'  => ['http', 'https']}
-  }
-If you'd like to allow the use of relative URLs which don't have a protocol,
-include the symbol <code>:relative</code> in the protocol array:
-  :protocols => {
-    'a' => {'href' => ['http', 'https', :relative]}
-  }
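The protocol rules above can be sketched in plain Ruby. This is a simplified model under stated assumptions, not Sanitize's own code: `protocol_allowed?` is a hypothetical helper, and the regular expression is a rough approximation of how a URL scheme might be detected.

```ruby
# Illustrative only: a hypothetical helper with a simplified scheme check.
def protocol_allowed?(url, allowed)
  # A scheme is a letter followed by letters, digits, "+", "." or "-",
  # terminated by a colon at the very start of the value.
  scheme_pattern = /\A([a-z][a-z0-9+.\-]*):/i

  if (match = scheme_pattern.match(url.strip))
    allowed.include?(match[1].downcase)
  else
    # No scheme found, so treat the URL as relative.
    allowed.include?(:relative)
  end
end

allowed = ['http', 'https', :relative]

protocol_allowed?('https://example.com/', allowed) # => true
protocol_allowed?('/about', allowed)               # => true
protocol_allowed?('javascript:alert(1)', allowed)  # => false
protocol_allowed?('/about', ['http', 'https'])     # => false
```

Note that without `:relative` in the list, a URL with no protocol is rejected, which matches the behavior described for the `:protocols` setting.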
-==== :remove_contents (boolean or Array)
-If set to +true+, Sanitize will remove the contents of any non-whitelisted
-elements in addition to the elements themselves. By default, Sanitize leaves the
-safe parts of an element's contents behind when the element is removed.
-If set to an array of element names, then only the contents of the specified
-elements (when filtered) will be removed, and the contents of all other filtered
-elements will be left behind.
-The default value is <code>false</code>.
-==== :transformers
-Custom transformer or array of custom transformers to run using depth-first
-traversal. See the Transformers section below for details.
-==== :transformers_breadth
-Custom transformer or array of custom transformers to run using breadth-first
-traversal. See the Transformers section below for details.
-==== :whitespace_elements (Array)
-Array of lowercase element names that should be replaced with whitespace when
-removed in order to preserve readability. For example,
-<code>foo<div>bar</div>baz</code> will become
-<code>foo bar baz</code> when the <code><div></code> is removed.
-By default, the following elements are included in the
-<code>:whitespace_elements</code> array:
-  address article aside blockquote br dd div dl dt footer h1 h2 h3 h4 h5
-  h6 header hgroup hr li nav ol p pre section ul
-=== Transformers
-Transformers allow you to filter and modify nodes using your own custom logic,
-on top of (or instead of) Sanitize's core filter. A transformer is any object
-that responds to <code>call()</code> (such as a lambda or proc).
-To use one or more transformers, pass them to the <code>:transformers</code>
-config setting. You may pass a single transformer or an array of transformers.
-  Sanitize.clean(html, :transformers => [transformer_one, transformer_two])
-==== Input
-Each registered transformer's <code>call()</code> method will be called once for
-each node in the HTML (including elements, text nodes, comments, etc.), and will
-receive as an argument an environment Hash that contains the following items:
-[<code>:config</code>]
-  The current Sanitize configuration Hash.
-[<code>:is_whitelisted</code>]
-  <code>true</code> if the current node has been whitelisted by a previous
-  transformer, <code>false</code> otherwise. It's generally bad form to remove a
-  node that a previous transformer has whitelisted.
-[<code>:node</code>]
-  A Nokogiri::XML::Node object representing an HTML node. The node may be an
-  element, a text node, a comment, a CDATA node, or a document fragment. Use
-  Nokogiri's inspection methods (<code>element?</code>, <code>text?</code>,
-  etc.) to selectively ignore node types you aren't interested in.
-[<code>:node_name</code>]
-  The name of the current HTML node, always lowercase (e.g. "div" or "span").
-  For non-element nodes, the name will be something like "text", "comment",
-  "#cdata-section", "#document-fragment", etc.
-[<code>:node_whitelist</code>]
-  Set of Nokogiri::XML::Node objects in the current document that have been
-  whitelisted by previous transformers, if any. It's generally bad form to
-  remove a node that a previous transformer has whitelisted.
-[<code>:traversal_mode</code>]
-  Current node traversal mode, either <code>:depth</code> for depth-first (the
-  default mode) or <code>:breadth</code> for breadth-first.
-==== Output
-A transformer doesn't have to return anything, but may optionally return a Hash,
-which may contain the following items:
-[<code>:node_whitelist</code>]
-  Array or Set of specific Nokogiri::XML::Node objects to add to the document's
-  whitelist, bypassing the current Sanitize config. These specific nodes and all
-  their attributes will be whitelisted, but their children will not be.
-If a transformer returns anything other than a Hash, the return value will be
-ignored.
-==== Processing
-Each transformer has full access to the Nokogiri::XML::Node that's passed into
-it and to the rest of the document via the node's <code>document()</code>
-method. Any changes made to the current node or to the document will be
-reflected instantly in the document and passed on to subsequently-called
-transformers and to Sanitize itself. A transformer may even call Sanitize
-internally to perform custom sanitization if needed.
-Nodes are passed into transformers in the order in which they're traversed. By
-default, depth-first traversal is used, meaning that markup is traversed from
-the deepest node upward (not from the first node to the last node):
-  html        = '<div><span>foo</span></div>'
-  transformer = lambda{|env| puts env[:node_name] }
-  # Prints "text", "span", "div", "#document-fragment".
-  Sanitize.clean(html, :transformers => transformer)
-You may use the <code>:transformers_breadth</code> config to specify one or more
-transformers that should traverse nodes in breadth-first mode:
-  html        = '<div><span>foo</span></div>'
-  transformer = lambda{|env| puts env[:node_name] }
-  # Prints "#document-fragment", "div", "span", "text".
-  Sanitize.clean(html, :transformers_breadth => transformer)
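To make the two visiting orders concrete without depending on Nokogiri, here is a minimal sketch over a toy tree built from plain hashes. The helper names (`toy_tree`, `depth_order`, `breadth_order`) are hypothetical, and this is one plausible reading of the orders described above rather than Sanitize's traversal code.

```ruby
# Toy tree standing in for '<div><span>foo</span></div>'. Plain hashes are
# used for illustration; Sanitize itself works on Nokogiri nodes.
def toy_tree
  text = { :name => 'text', :children => [] }
  span = { :name => 'span', :children => [text] }
  div  = { :name => 'div',  :children => [span] }
  { :name => '#document-fragment', :children => [div] }
end

# Depth-first as described above: visit children before the node itself.
def depth_order(node, names = [])
  node[:children].each { |child| depth_order(child, names) }
  names << node[:name]
end

# Breadth-first: visit nodes level by level, starting from the root.
def breadth_order(root)
  queue = [root]
  names = []
  until queue.empty?
    node = queue.shift
    names << node[:name]
    queue.concat(node[:children])
  end
  names
end

depth_order(toy_tree)   # => ["text", "span", "div", "#document-fragment"]
breadth_order(toy_tree) # => ["#document-fragment", "div", "span", "text"]
```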
-Transformers have a tremendous amount of power, including the power to
-completely bypass Sanitize's built-in filtering. Be careful! Your safety is in
-your own hands.
-==== Example: Transformer to whitelist YouTube video embeds
-The following example demonstrates how to create a depth-first Sanitize
-transformer that will safely whitelist valid YouTube video embeds without having
-to blindly allow other kinds of embedded content, which would be the case if you
-tried to do this by just whitelisting all <code><iframe></code> elements:
-  lambda do |env|
-    node      = env[:node]
-    node_name = env[:node_name]
-    # Don't continue if this node is already whitelisted or is not an element.
-    return if env[:is_whitelisted] || !node.element?
-    # Don't continue unless the node is an iframe.
-    return unless node_name == 'iframe'
-    # Verify that the video URL is actually a valid YouTube video URL.
-    return unless node['src'] =~ /\Ahttps?:\/\/(?:www\.)?youtube(?:-nocookie)?\.com\//
-    # We're now certain that this is a YouTube embed, but we still need to run
-    # it through a special Sanitize step to ensure that no unwanted elements or
-    # attributes that don't belong in a YouTube embed can sneak in.
-    Sanitize.clean_node!(node, {
-      :elements => %w[iframe],
-      :attributes => {
-        'iframe'  => %w[allowfullscreen frameborder height src width]
-      }
-    })
-    # Now that we're sure that this is a valid YouTube embed and that there are
-    # no unwanted elements or attributes hidden inside it, we can tell Sanitize
-    # to whitelist the current node.
-    {:node_whitelist => [node]}
-  end
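The URL check is the part of this transformer that decides what gets through, so it can help to exercise the pattern on its own. The sketch below copies the regular expression from the example above into a hypothetical helper, `youtube_embed_src?`, and shows a few inputs it accepts and rejects.

```ruby
# Hypothetical helper wrapping the pattern used in the transformer above.
def youtube_embed_src?(src)
  pattern = /\Ahttps?:\/\/(?:www\.)?youtube(?:-nocookie)?\.com\//
  # to_s guards against a missing src attribute (nil).
  pattern.match?(src.to_s)
end

youtube_embed_src?('http://www.youtube.com/embed/abc123')           # => true
youtube_embed_src?('https://www.youtube-nocookie.com/embed/abc123') # => true
youtube_embed_src?('http://evil.example/?u=http://www.youtube.com/') # => false
youtube_embed_src?('http://www.youtube.com.evil.example/')          # => false
youtube_embed_src?('//www.youtube.com/embed/abc123')                # => false
youtube_embed_src?(nil)                                             # => false
```

The `\A` anchor is what rejects URLs that merely contain a YouTube address later in the string, and the trailing `\/` is what rejects look-alike hosts such as `youtube.com.evil.example`. Protocol-relative URLs are also rejected because the pattern requires an explicit `http` or `https` scheme.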
-== Contributors
-Sanitize was created and is maintained by Ryan Grove (ryan@wonko.com).
-The following lovely people have also contributed to Sanitize:
-* Ben Anderson
-* Wilson Bilkovich
-* Peter Cooper
-* Gabe da Silveira
-* Nicholas Evans
-* Nils Gemeinhardt
-* Adam Hooper
-* Mutwin Kraus
-* Eaden McKee
-* Dev Purkayastha
-* David Reese
-* Ardie Saeidi
-* Rafael Souza
-* Ben Wanicur
-== License
-Copyright (c) 2013 Ryan Grove (ryan@wonko.com)
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the 'Software'), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
-the Software, and to permit persons to whom the Software is furnished to do so,
-subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
-FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
-COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
-IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.