RubyGems - owasp-esapi-ruby - Versions diffs - 0.30.0 - Mend

owasp-esapi-ruby 0.30.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (56)

data/.document +5 -0
data/AUTHORS +5 -0
data/ChangeLog +69 -0
data/ISSUES +0 -0
data/LICENSE +24 -0
data/README +51 -0
data/Rakefile +63 -0
data/VERSION +1 -0
data/lib/codec/base_codec.rb +99 -0
data/lib/codec/css_codec.rb +101 -0
data/lib/codec/encoder.rb +330 -0
data/lib/codec/html_codec.rb +424 -0
data/lib/codec/javascript_codec.rb +119 -0
data/lib/codec/mysql_codec.rb +131 -0
data/lib/codec/oracle_codec.rb +46 -0
data/lib/codec/os_codec.rb +78 -0
data/lib/codec/percent_codec.rb +53 -0
data/lib/codec/pushable_string.rb +114 -0
data/lib/codec/vbscript_codec.rb +64 -0
data/lib/codec/xml_codec.rb +173 -0
data/lib/esapi.rb +68 -0
data/lib/exceptions.rb +37 -0
data/lib/executor.rb +20 -0
data/lib/owasp-esapi-ruby.rb +13 -0
data/lib/sanitizer/xss.rb +59 -0
data/lib/validator/base_rule.rb +90 -0
data/lib/validator/date_rule.rb +92 -0
data/lib/validator/email.rb +29 -0
data/lib/validator/float_rule.rb +76 -0
data/lib/validator/generic_validator.rb +26 -0
data/lib/validator/integer_rule.rb +61 -0
data/lib/validator/string_rule.rb +146 -0
data/lib/validator/validator_error_list.rb +48 -0
data/lib/validator/zipcode.rb +27 -0
data/spec/codec/css_codec_spec.rb +61 -0
data/spec/codec/html_codec_spec.rb +87 -0
data/spec/codec/javascript_codec_spec.rb +45 -0
data/spec/codec/mysql_codec_spec.rb +44 -0
data/spec/codec/oracle_codec_spec.rb +23 -0
data/spec/codec/os_codec_spec.rb +51 -0
data/spec/codec/percent_codec_spec.rb +34 -0
data/spec/codec/vbcript_codec_spec.rb +23 -0
data/spec/codec/xml_codec_spec.rb +83 -0
data/spec/owasp_esapi_encoder_spec.rb +226 -0
data/spec/owasp_esapi_executor_spec.rb +9 -0
data/spec/owasp_esapi_ruby_email_validator_spec.rb +39 -0
data/spec/owasp_esapi_ruby_xss_sanitizer_spec.rb +66 -0
data/spec/owasp_esapi_ruby_zipcode_validator_spec.rb +42 -0
data/spec/spec_helper.rb +10 -0
data/spec/validator/base_rule_spec.rb +29 -0
data/spec/validator/date_rule_spec.rb +40 -0
data/spec/validator/float_rule_spec.rb +31 -0
data/spec/validator/integer_rule_spec.rb +51 -0
data/spec/validator/string_rule_spec.rb +103 -0
data/spec/validator_skeleton.rb +150 -0
metadata +235 -0

data/.document ADDED

@@ -0,0 +1,5 @@
+README.rdoc
+lib/**/*.rb
+bin/*
+features/**/*.feature
+LICENSE

data/AUTHORS ADDED

@@ -0,0 +1,5 @@
+Owasp Esapi Ruby core
+---------------------
+* Paolo Perego <thesp0nge@owasp.org>
+* Sal Scotto <sal.scotto@gmail.com>

data/ChangeLog ADDED

@@ -0,0 +1,69 @@
+2011-03-02 12:47:57 -0500 Sal Scotto Renamed validators to rule, the container class Validator will be the delegate ot those classes. Also fixed rake file
+2011-03-02 12:37:56 -0500 Sal Scotto Added nokogiri dependency. Nokogiri will be used for HTML/CSS scanning
+2011-02-28 20:35:29 -0500 Sal Scotto Added an int and float validators.
+2011-02-28 17:20:51 -0500 Sal Scotto Remove old date validator code, that is now superceeded by new DateValidator object
+2011-02-28 17:19:49 -0500 Sal Scotto Added date validator. you pass it a dateformat string and it will return a valid Time object.
+2011-02-28 16:08:45 -0500 Sal Scotto Remove old validator spec file
+2011-02-28 11:24:34 +0100 Paolo Perego Merge remote branch 'washu/master'
+2011-02-13 09:54:46 -0500 Paolo Perego Added a baseline validator spec
+2011-02-27 12:18:20 -0500 Sal Scotto Added base validator rule and string validator rule
+2011-02-26 13:51:27 -0500 Sal Scotto Fixed up a funny looking doc entry
+2011-02-26 13:42:25 -0500 Sal Scotto Added in last of the codecs. Ive also gone back and updated the rdoc for all the codecs and the encoder. Formatting and whitespace clean was also performed as well asn upper level formatting and rodc inclusions. I have cpied a good bit of the java esapi docs for class headers, methods since I implmented them to give the same results as it would be in the java world
+2011-02-26 09:44:38 -0500 Sal Scotto Added mysql and oracle codecs
+2011-02-26 09:30:34 -0500 Sal Scotto moved percent codec
+2011-02-26 09:28:58 -0500 Sal Scotto moved some codecs around
+2011-02-26 09:27:36 -0500 Sal Scotto update percent codec
+2011-02-24 23:55:24 -0500 Sal Scotto Stubbing in the executor class
+2011-02-24 23:54:38 -0500 Sal Scotto Added a vbscript codec
+2011-02-24 17:52:52 -0500 Sal Scotto Stubbed in vbscript_codec
+2011-02-24 17:50:54 -0500 Sal Scotto Fixed up more codec to more ruby stylish
+2011-02-23 22:56:05 -0500 Sal Scotto added in more test examples
+2011-02-23 22:08:28 -0500 Sal Scotto more encoder tests
+2011-02-23 20:00:27 -0500 Sal Scotto Changed the overally convuluted tests into dynamic tests do each sequence makes a dyanimc test now
+2011-02-21 10:38:39 -0500 Sal Scotto added os and javascript codecs. Added in spec file for thos codecs and updated encoder spec. TODO: add in some convience methods for encode_for_os and encode_for_js. Refactored some things inside pushable string to be more ruby like in method names. Will keep going over code and refactoing as time permits. Still need a vbscript, oracle, and mysql codecs
+2011-02-20 11:19:04 -0500 Sal Scotto Updated codecs for whitespace
+2011-02-20 11:18:23 -0500 Sal Scotto Renamed url_codec to percent_codec
+2011-02-20 10:54:04 -0500 Sal Scotto Added URL codec and test cases
+2011-02-19 23:22:36 -0500 Sal Scotto Added a HTML entity codec. Added a spec file to test the encoder Added a spec fiel for the codec Cleaned up encoder code and added mroe docs
+2011-02-19 16:17:40 -0500 Sal Scotto Finished cleaning up encoding stuff, strings should be pushed to UTF_8 as they are scanned for processing
+2011-02-19 11:00:58 -0500 Sal Scotto Fixed css codec to properly add a space after encoding a value to terminate properly
+2011-02-19 10:44:42 -0500 Sal Scotto Added some more documentation to teh code
+2011-02-19 09:53:31 -0500 Sal Scotto Added the Encoder Added a top level ESPI module definition that will be used to get references to the currecntly configured esapi setup Added an encoder spec, currently it has enough setup to test css as the only codec available Added an exceptions module, will house the various exception classes that can be raised
+2011-02-19 08:22:55 -0500 Sal Scotto Merge branch 'master' of https://github.com/thesp0nge/owasp-esapi-ruby
+2011-02-18 09:51:58 +0100 Paolo Perego Working on validating EU date formatted
+2011-02-18 00:16:22 -0500 Sal Scotto Added a CSS codec. Flow should go from Validator --> execute all relevant codecs to decode/encode the inputs BEFORE Applying all other rules. More codecs to come i.e. Base64, HTMLEntity, Hex, JavaScript, XMLEntity, Os specific i.e. Windows,Unix and Database level codecs to force escapes
+2011-02-17 19:27:50 -0500 Sal Scotto Merge branch 'master' of https://github.com/thesp0nge/owasp-esapi-ruby
+2011-02-17 18:02:55 +0100 Paolo Perego Now also dates written in US long format are recognized
+2011-02-17 09:14:39 +0100 Paolo Perego Now date validates MMM DD, YYY Added an ISSUE file to track remotely issues
+2011-02-17 08:05:01 +0100 Paolo Perego Added a ChangeLog and written some more stuff into README Zipcode had a wrong optional argument check that caused a null pointer exception. Date now validates good 'MM/DD/YYYY'
+2011-02-16 21:32:07 -0500 Sal Scotto Merge branch 'master' of https://github.com/thesp0nge/owasp-esapi-ruby
+2011-02-16 19:18:12 +0100 Paolo Perego Work over validators
+2011-02-16 09:47:00 +0100 Paolo Perego Fixed boolean operators
+2011-02-16 09:21:23 +0100 Paolo Perego Changed validator method from validate to valid? Added basic date validator
+2011-02-15 14:18:29 +0100 Paolo Perego Fixed typo
+2011-02-15 13:06:13 +0100 Paolo Perego Owasp Esapi Ruby will require at least 1.9.2 ruby version due to the usage of regex patterns only available with the new regex engine
+2011-02-15 12:59:06 +0100 Paolo Perego Now generic_validator handles validation method and both email than zipcode validators are run against it
+2011-02-15 11:56:08 +0100 Paolo Perego Removed a redundant method since matcher is an attr_accessor
+2011-02-15 01:53:09 -0800 Paolo Perego Added Daniele and Sal email addresses
+2011-02-15 09:08:07 +0100 Paolo Perego Added a generic validator class with a validate method. All specific validator will inehrit code from this class.
+2011-02-15 08:28:32 +0100 Paolo Perego Added a generic validator class with a validate method. All specific validator will inehrit code from this class.
+2011-02-15 08:25:39 +0100 Paolo Perego Modified boolean validation test
+2011-02-15 08:23:18 +0100 Paolo Perego Version bumped to 0.5.0. It means approx 5% of the work done.
+2011-02-15 08:21:42 +0100 Paolo Perego Renamed Sal Scotto rspec file with a filename that does not include it into running tasks (I want to see true failing tests). Let's use this good rspec as skeleton. Added an email address pattern rspec file. Implemented email address pattern validation.
+2011-02-14 18:41:49 -0500 Sal Scotto Merge branch 'master' of https://github.com/thesp0nge/owasp-esapi-ruby
+2011-02-14 18:26:36 +0100 Paolo Perego Fixed an initialization issue in XSS Added some Zip code spec Renamed Sal's validator skeleton not to be included in rake spec task
+2011-02-14 18:23:26 +0100 Paolo Perego Fixed (C) statement. Added a private filtering routine called by the public API
+2011-02-14 17:05:59 +0100 Paolo Perego (C) must be given to Owasp foundation
+2011-02-13 09:54:46 -0500 Paolo Perego Added a baseline validator spec
+2011-02-14 16:47:14 +0100 Paolo Perego Modified namespace. Now it's Owasp::Esapi
+2011-02-14 07:30:46 -0500 Sal Scotto Merge branch 'master' of https://github.com/thesp0nge/owasp-esapi-ruby
+2011-02-14 09:20:02 +0100 Paolo Perego Zipcode validator now works with Italian regular expression, must fix the US one
+2011-02-14 09:19:01 +0100 Paolo Perego Added AUTHORS file. Zipcode validator now works with Italian regular expression. Not the US one right now
+2011-02-13 16:56:26 +0100 Paolo Perego Renamed XSS sanitizer in a proper namespace. Added more test cases and created a basic (and not working right now) zip code validator.
+2011-02-13 09:54:46 -0500 Sal Scotto Added a baseline validator spec
+2011-02-12 17:35:01 +0100 Paolo Perego First real commit with 2 xss rspec and first xss sanitizing implementation. This is *just the beginning*
+2011-01-18 12:47:01 +0100 Paolo Perego Added _site and pixelmator file
+2011-01-14 14:58:47 +0100 Paolo Perego Added kickstarting info for Owasp Summit
+2010-06-01 13:25:38 +0200 Paolo Perego Some Typos
+2010-05-31 12:29:52 +0200 Paolo Perego Licensed as "new BSD" project with a starting README information
+2010-05-31 12:21:17 +0200 Paolo Perego Initial commit to owasp-esapi-ruby.

data/ISSUES ADDED

File without changes

data/LICENSE ADDED

@@ -0,0 +1,24 @@
+Copyright (c) 2010-2011, The OWASP Foundation
+All rights reserved.
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    * Neither the name of the <organization> nor the
+      names of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
+DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

data/README ADDED

@@ -0,0 +1,51 @@
+= The Owasp ESAPI Ruby project
+== Introduction
+The Owasp ESAPI Ruby is a port for outstanding release quality Owasp ESAPI
+project to the Ruby programming language.
+Ruby is now a well-known programming language thanks to the Rails framework, developed by David Heinemeier Hansson (http://twitter.com/dhh), which simplifies the creation of web applications through a convention-over-configuration approach.
+Despite Rails' popularity, there are many other web frameworks that let people write web apps in Ruby (merb, sinatra, vintage) [http://accidentaltechnologist.com/ruby/10-alternative-ruby-web-frameworks/]. Owasp Esapi Ruby wants to bring all Ruby developers a gem full of secure APIs they can use whatever framework they choose.
+== Why supporting only Ruby 1.9.2 and beyond?
+The OWASP Esapi Ruby gem will require at least version 1.9.2 of Ruby interpreter to make sure to have full advantages of the newer language APIs.
+In particular version 1.9.2 introduces radical changes in the following areas:
+=== Regular expression engine
+(to be written)
+=== UTF-8 support
+Unicode support in 1.9.2 is much better and provides better support for character set encoding/decoding
+* All strings have an additional chunk of info attached: Encoding
+* String#size takes encoding into account – returns the encoded character count
+* You can get the raw datasize
+* Indexed access is by encoded data – characters, not bytes
+* You can change encoding by force but it doesn’t convert the data
+=== Dates and Time
+From "Programming Ruby 1.9"
+"As of Ruby 1.9.2, the range of dates that can be represented is no longer limited by the underlying operating system’s time representation (so there’s no year 2038 problem). As a result, the year passed to the methods gm, local, new, mktime, and utc must now include the century—a year of 90 now represents 90 and not 1990."
+== Roadmap
+Please see ChangeLog file.
+== Note on Patches/Pull Requests
+* Fork the project.
+* Create documentation with rake yard task
+* Make your feature addition or bug fix.
+* Add tests for it. This is important so I don't break it in a
+  future version unintentionally.
+* Commit, do not mess with rakefile, version, or history.
+  (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
+* Send me a pull request. Bonus points for topic branches.
+== Copyright
+Copyright (c) 2011 the OWASP Foundation. See LICENSE for details.
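
The UTF-8 and Time points in the README above can be checked with a few lines of core Ruby. This is only an illustrative sketch: it uses standard String and Time methods, nothing from the gem itself, and the sample string is an arbitrary choice.

```ruby
# Sketch only: plain core-Ruby calls, nothing from owasp-esapi-ruby.
s = "caf\u00E9"                      # "café"; a \u escape yields a UTF-8 string

puts s.encoding                      # every String carries an Encoding
puts s.size                          # character count: 4
puts s.bytesize                      # raw data size in bytes: 5
puts s[3]                            # indexing is by character, not by byte

# force_encoding relabels the same bytes; it does not convert the data.
relabeled = s.dup.force_encoding(Encoding::BINARY)
puts relabeled.bytes == s.bytes      # same bytes, different label
puts relabeled.size                  # now counted per byte: 5

# encode, by contrast, transcodes the data.
latin1 = s.encode(Encoding::ISO_8859_1)
puts latin1.bytesize                 # 4, since é is one byte in ISO-8859-1

# A two-digit year is taken literally, as the quoted passage says.
puts Time.utc(90).year               # 90
```

The contrast between force_encoding and encode is the one a codec has to respect: relabeling bytes changes how they are counted and indexed, while transcoding changes the bytes themselves.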

data/Rakefile ADDED

@@ -0,0 +1,63 @@
+require 'rubygems'
+require 'rake'
+begin
+  require 'jeweler'
+  Jeweler::Tasks.new do |gem|
+    gem.name = "owasp-esapi-ruby"
+    gem.summary = %Q{Owasp Enterprise Security APIs for Ruby language}
+    gem.description = File.read(File.join(File.dirname(__FILE__), 'README'))
+    gem.email = "thesp0nge@owasp.org"
+    gem.version = File.read(File.join(File.dirname(__FILE__), 'VERSION'))
+    gem.homepage = "http://github.com/thesp0nge/owasp-esapi-ruby"
+    gem.authors = File.read(File.join(File.dirname(__FILE__), 'AUTHORS'))
+    gem.required_ruby_version = '>= 1.9.2'
+    gem.add_development_dependency "rspec", ">= 1.2.9"
+    gem.add_development_dependency "yard", ">= 0"
+    gem.add_development_dependency "nokogiri",">= 1.4.4"
+    gem.add_dependency "nokogiri",">= 1.4.4"
+    # gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
+  end
+  Jeweler::GemcutterTasks.new
+rescue LoadError
+  puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
+end
+require 'rspec/core/rake_task'
+RSpec::Core::RakeTask.new(:spec) do |t|
+  t.pattern = "./spec/**/*_spec.rb"
+  # Put spec opts in a file named .rspec in root
+end
+# require 'spec/rake/spectask'
+# Spec::Rake::SpecTask.new(:spec) do |spec|
+#   spec.libs << 'lib' << 'spec'
+#   spec.spec_files = FileList['spec/**/*_spec.rb']
+# end
+# Spec::Rake::SpecTask.new(:rcov) do |spec|
+#   spec.libs << 'lib' << 'spec'
+#   spec.pattern = 'spec/**/*_spec.rb'
+#   spec.rcov = true
+# end
+task :spec => :check_dependencies
+task :default => :spec
+begin
+  require 'yard'
+  YARD::Rake::YardocTask.new
+rescue LoadError
+  task :yardoc do
+    abort "YARD is not available. In order to run yardoc, you must: sudo gem install yard"
+  end
+end
+namespace :prepare do
+  desc 'Generate ChangeLog'
+  task :changelog do
+    system('git log --format="%ai %cn %s" > ChangeLog')
+  end
+end

data/VERSION ADDED

@@ -0,0 +1 @@
+0.30.0

data/lib/codec/base_codec.rb ADDED

@@ -0,0 +1,99 @@
+# The Codec interface defines a set of methods for encoding and decoding application level encoding schemes,
+# such as HTML entity encoding and percent encoding (aka URL encoding). Codecs are used in output encoding
+# and canonicalization. The design of these codecs allows for character-by-character decoding, which is
+# necessary to detect double-encoding and the use of multiple encoding schemes, both of which are techniques
+# used by attackers to bypass validation and bury encoded attacks in data.
+class Fixnum
+  def to_h
+    to_s(16)
+  end
+end
+class Bignum
+  def to_h
+    to_s(16)
+  end
+end
+module Owasp
+  module Esapi
+    # The Codec module, houses Codec implementations
+    module Codec
+      class BaseCodec
+        # start range of valid code points
+        START_CODE_POINT = 0x000
+        # ending range of valid code points
+        END_CODE_POINT = 0x10ffff
+        @@hex_codes = [] #:nodoc:
+        for c in (0..255) do
+          if (c >= 0x30 and c <= 0x39) or (c >= 0x41 and c <= 0x5A) or (c >= 0x61 and c <= 0x7A)
+            @@hex_codes[c] = nil
+          else
+            @@hex_codes[c] = c.to_h
+          end
+        end
+        # Encode a String so that it can be safely used in a specific context.
+        # immune is an array or string containing the characters to be left unencoded
+        def encode(immune, input)
+          return nil if input.nil?
+          encoded_string = ''
+          encoded_string.encode!(Encoding::UTF_8)
+          input.encode(Encoding::UTF_8).chars do |c|
+            encoded_string << encode_char(immune,c)
+          end
+          encoded_string
+        end
+        # Default implementation that should be overridden in specific codecs.
+        def encode_char(immune, input)
+          input
+        end
+        #  Helper method for codecs to get the hex value of a character
+        def hex(c)
+          return nil if c.nil?
+          b = c[0].ord
+          if b < 0xff
+            @@hex_codes[b]
+          else
+            b.to_h
+          end
+        end
+        # Decode a String that was encoded using the encode method in this Class
+        def decode(input)
+          decoded_string = ''
+          seekable = PushableString.new(input.dup)
+          while seekable.next?
+            t = decode_char(seekable)
+            if t.nil?
+              decoded_string << seekable.next
+            else
+              decoded_string << t
+            end
+          end
+          decoded_string
+        end
+        # Returns the decoded version of the next character from the input string and advances the
+        # current character in the PushableString.  If the current character is not encoded, this
+        # method MUST reset the PushableString.
+        def decode_char(input)
+          input
+        end
+        # Basic min method
+        def min(a,b) #:nodoc:
+          if a > b
+            return b
+          else
+            return a
+          end
+        end
+      end
+    end
+  end
+end
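
The encode path in base_codec.rb (a 0..255 lookup table that is nil for ASCII alphanumerics, an immune list, and a per-character encode_char hook) can be sketched in a stand-alone form. Everything below is hypothetical: SketchCodec is not a class in the gem, it uses Integer#to_s(16) instead of the to_h monkey patch, and its encode_char borrows the CSS-style backslash form purely as an example of what a subclass override might do.

```ruby
# Hypothetical stand-alone sketch; SketchCodec is not part of the gem.
class SketchCodec
  # For code points 0..255: nil for ASCII alphanumerics, else the hex string.
  HEX_CODES = (0..255).map do |c|
    alnum = (0x30..0x39).cover?(c) || (0x41..0x5A).cover?(c) || (0x61..0x7A).cover?(c)
    alnum ? nil : c.to_s(16)
  end.freeze

  # Hex value of a single character, or nil if it never needs encoding.
  def hex(char)
    code = char.ord
    code <= 0xff ? HEX_CODES[code] : code.to_s(16)
  end

  # Walk the input one character at a time and delegate to encode_char.
  def encode(immune, input)
    return nil if input.nil?
    input.encode(Encoding::UTF_8).each_char.map { |c| encode_char(immune, c) }.join
  end

  # Example override: backslash + hex + trailing space (CSS style).
  def encode_char(immune, char)
    return char if immune.include?(char)
    h = hex(char)
    h.nil? ? char : "\\#{h} "
  end
end

codec = SketchCodec.new
puts codec.encode([], "a<b")       # a\3c b
puts codec.encode(["<"], "a<b")    # a<b  ("<" is immune here)
```

Keeping the per-character decision in one overridable method is what lets each concrete codec reuse the same loop and immune-list handling.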

data/lib/codec/css_codec.rb ADDED

@@ -0,0 +1,101 @@
+#
+# Implementation of the Codec interface for backslash encoding used in CSS.
+module Owasp
+  module Esapi
+    module Codec
+      class CssCodec < BaseCodec
+        # Returns backslash encoded character.
+        def encode_char(immune, input)
+          # check immune
+          return input if immune.include?(input)
+          # check for alpha numeric
+          hex = hex(input)
+          # add a space at end to terminate under css
+          return "\\#{hex} " unless hex.nil? or hex.empty?
+          input
+        end
+        # decode a character from the PushableString
+        # We follow the rules defined for CSS by w3
+        # http://www.w3.org/TR/CSS21/syndata.html#characters
+        # All CSS syntax is case-insensitive within the ASCII range (i.e., [a-z] and [A-Z] are equivalent), except for parts that are not under the control of CSS. For example,
+        # the case-sensitivity of values of the HTML attributes "id" and "class", of font names, and of URIs lies outside the scope of this specification.
+        # Note in particular that element names are case-insensitive in HTML, but case-sensitive in XML. In CSS, identifiers (including element names, classes, and IDs in selectors)
+        # can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-)  and the underscore (_); they cannot start with a digit,
+        # two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance,
+        # the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F". Note that Unicode is code-by-code equivalent to ISO 10646 (see [UNICODE] and [ISO10646]).
+        # In CSS 2.1, a backslash (\) character can indicate one of three types of character escape. Inside a CSS comment, a backslash stands for itself, and if a backslash is
+        # immediately followed by the end of the style sheet, it also stands for itself (i.e., a DELIM token).
+        #
+        # First, inside a string, a backslash followed by a newline is ignored (i.e., the string is deemed not to contain either the backslash or the newline).
+        # Outside a string, a backslash followed by a newline stands for itself (i.e., a DELIM followed  by a newline).
+        # <P>
+        # Second, it cancels the meaning of special CSS characters. Any character (except a hexadecimal digit, linefeed, carriage return, or form feed) can be escaped
+        # with a backslash to remove its special meaning. For example, "\"" is a string consisting of one double quote. Style sheet preprocessors must not remove these backslashes
+        # from a style sheet since that would change the style sheet's meaning.
+        # <P>
+        # Third, backslash escapes allow authors to refer to characters they cannot easily put in a document.  In this case, the backslash is followed by at most six
+        # hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number, which must not be zero. (It is undefined in CSS 2.1 what happens
+        # if a style sheet does contain a character with Unicode codepoint zero.) If a character in the range [0-9a-fA-F] follows the hexadecimal number, the end of the number
+        # needs to be made clear. There are two ways to do that:
+        # 1. with a space (or other white space character): "\26 B" ("&B"). In this case, user agents should treat a "CR/LF" pair (U+000D/U+000A) as a single white space character.
+        # 2. by providing exactly 6 hexadecimal digits: "\000026B" ("&B")
+        # In fact, these two methods may be combined. Only one white space character is ignored after a hexadecimal escape. Note that this means that a "real" space after the
+        # escape sequence must be doubled.If the number is outside the range allowed by Unicode (e.g., "\110000" is above the maximum 10FFFF allowed in current Unicode), the UA
+        # may replace the escape with the "replacement character" (U+FFFD). If the character is to be displayed, the UA should show a visible symbol, such as a
+        # "missing character" glyph (cf. 15.2, point 5). Note: Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation,
+        # even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not). The identifier "te\st" is exactly the same identifier as "test".
+        def decode_char(input)
+          input.mark
+          first = input.next
+          if first.nil? or !first.eql?('\\')
+            input.reset
+            return nil
+          end
+          second = input.next
+          if second.nil?
+            input.reset
+            return nil
+          end
+          # rule execution
+          fallthrough = false
+          if second == "\r"
+            # special whitespace cases
+            if input.peek?("\n")
+              input.next
+              fallthrough = true
+            end
+          end
+          # handle the skip ahead. Ruby case doesnt allow for fall through so we inlined the small setup
+          return decode_char(input) if second == "\n" || second == "\f" || second == "\u0000" || fallthrough
+          # non hex test
+          return second if !input.hex?(second)
+          # check for 6 hex digits for rule 3
+          tmp = second
+          for i in 1..5 do
+            c = input.next
+            if c.nil? or c =~ /\s/
+              break
+            end
+            if input.hex?(c)
+              tmp << c
+            else
+              input.push(c)
+            end
+          end
+          # check the codepoint and, if it is outside the valid range, return the replacement character
+          begin
+            i = tmp.hex
+            return i.chr(Encoding::UTF_8) if i >= START_CODE_POINT and i <= END_CODE_POINT
+            return "\ufffd"
+          rescue Exception => e
+            raise EncodingError.new("Received an exception while parsing a string verified to be hex")
+          end
+        end
+      end
+    end
+  end
+end
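
The CSS 2.1 escape rules quoted in the comment above can be exercised with a compact regex-based decoder. This is a sketch under stated assumptions, not the gem's PushableString implementation: css_unescape and MAX_CODE_POINT are hypothetical names, code point zero and surrogates are mapped to U+FFFD by choice, and a backslash followed by a newline is simply left untouched.

```ruby
# Hypothetical helper; not the gem's CssCodec#decode_char.
MAX_CODE_POINT = 0x10FFFF

def css_unescape(input)
  # Branch 1: backslash + 1..6 hex digits + at most one optional whitespace.
  # Branch 2: backslash + any other character except newline/CR/form feed.
  input.gsub(/\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|([^\r\n\f]))/) do
    hex, literal = $1, $2
    next literal if literal          # e.g. \& stands for &
    code = hex.to_i(16)
    valid = code > 0 && code <= MAX_CODE_POINT && !(0xD800..0xDFFF).cover?(code)
    valid ? [code].pack("U") : "\uFFFD"
  end
end

puts css_unescape('B\26 W\3F')   # B&W?
puts css_unescape('B\&W\?')      # B&W?
puts css_unescape('te\st')       # test
```

The single optional whitespace after the hex digits is what makes `\26 B` decode to `&B` rather than swallowing the `B` as a third hex digit.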

data/lib/codec/encoder.rb ADDED

@@ -0,0 +1,330 @@
+# The Encoder interface contains a number of methods for decoding input and encoding output
+# so that it will be safe for a variety of interpreters. To prevent
+# double-encoding, callers should make sure input does not already contain encoded characters
+# by calling canonicalize. Validator implementations should call canonicalize on user input
+# <b>before</b> validating to prevent encoded attacks.
+# All of the methods must use a "whitelist" or "positive" security model.
+# For the encoding methods, this means that all characters should be encoded, except for a specific list of
+# "immune" characters that are known to be safe.
+# The Encoder performs two key functions, encoding and decoding. These functions rely
+# on a set of codecs that can be found in the Owasp::Esapi::Codec module. These include:
+# * CSS Escaping
+# * HTMLEntity Encoding
+# * JavaScript Escaping
+# * MySQL Escaping
+# * Oracle Escaping
+# * Percent Encoding (aka URL Encoding)
+# * Unix Escaping
+# * VBScript Escaping
+# * Windows Encoding
+require 'cgi'
+require 'base64'
+require 'codec/base_codec'
+require 'codec/pushable_string'
+require 'codec/base_codec'
+require 'codec/css_codec'
+require 'codec/html_codec'
+require 'codec/percent_codec'
+require 'codec/javascript_codec'
+require 'codec/os_codec'
+require 'codec/vbscript_codec'
+require 'codec/oracle_codec'
+require 'codec/mysql_codec'
+require 'codec/xml_codec'
+module Owasp
+  module Esapi
+    class Encoder
+      #
+      # == Immune Character fields
+      #
+      IMMUNE_CSS        = [ ]
+      IMMUNE_HTMLATTR   = [ ',', '.', '-', '_' ]
+      IMMUNE_HTML       = [ ',', '.', '-', '_', ' ' ]
+      IMMUNE_JAVASCRIPT = [ ',', '.', '_' ]
+      IMMUNE_VBSCRIPT   = [ ',', '.', '_' ]
+      IMMUNE_XML        = [ ',', '.', '-', '_', ' ' ]
+      IMMUNE_SQL        = [ ' ' ]
+      IMMUNE_OS         = [ '-' ]
+      IMMUNE_XMLATTR    = [ ',', '.', '-', '_' ]
+      IMMUNE_XPATH      = [ ',', '.', '-', '_', ' ' ]
+      PASSWORD_SPECIALS = "!$*-.=?@_"
+      # == Standard Character Sets
+      CHAR_LCASE = "abcdefghijklmnopqrstuvwxyz"
+      CHAR_UCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+      CHAR_DIGITS = "0123456789"
+      CHAR_SPECIALS = "!$*+-.=?@^_|~"
+      CHAR_LETTERS = "#{CHAR_LCASE}#{CHAR_UCASE}"
+      CHAR_ALPHANUMERIC = "#{CHAR_LETTERS}#{CHAR_DIGITS}"
+      # Create the encoder, optionally pass in a list of codecs to use
+      def initialize(configured_codecs = nil)
+        # codec list
+        @codecs = []
+        # default codecs
+        @html_codec = Owasp::Esapi::Codec::HtmlCodec.new
+        @percent_codec = Owasp::Esapi::Codec::PercentCodec.new
+        @js_codec = Owasp::Esapi::Codec::JavascriptCodec.new
+        @vb_codec = Owasp::Esapi::Codec::VbScriptCodec.new
+        @css_codec = Owasp::Esapi::Codec::CssCodec.new
+        @xml_codec = Owasp::Esapi::Codec::XmlCodec.new
+        unless configured_codecs.nil?
+          configured_codecs.each do |c|
+            @codecs << c
+          end
+        else
+          # setup some defaults codecs
+          @codecs << @html_codec
+          @codecs << @percent_codec
+          @codecs << @js_codec
+        end
+      end
+      # This method is equivalent to calling sanitize(input, strict), where strict is taken from the ESAPI security configuration
+      def canonicalize(input)
+        # if the input is nil, just return nil
+        return nil if input.nil?
+        # check the ESAPI config and figure out if we want strict encoding
+        sanitize(input,Owasp::Esapi.security_config.ids?)
+      end
+      # Sanitization is simply the operation of reducing a possibly encoded
+      # string down to its simplest form. This is important, because attackers
+      # frequently use encoding to change their input in a way that will bypass
+      # validation filters, but still be interpreted properly by the target of
+      # the attack. Note that data encoded more than once is not something that a
+      # normal user would generate and should be regarded as an attack.
+      # Everyone says[http://cwe.mitre.org/data/definitions/180.html] you shouldn't do validation
+      # without canonicalizing the data first. This is easier said than done. The canonicalize method can
+      # be used to simplify just about any input down to its most basic form. Note that sanitization doesn't
+      # handle Unicode issues, it focuses on higher level encoding and escaping schemes. In addition to simple
+      # decoding, sanitize also handles:
+      # * Perverse but legal variants of escaping schemes
+      # * Multiple escaping (%2526 or &#x26;lt;)
+      # * Mixed escaping (%26lt;)
+      # * Nested escaping (%%316 or &%6ct;)
+      # * All combinations of multiple, mixed, and nested encoding/escaping (%2&#x35;3c or &#x2526gt;)
+      #
+      # Although ESAPI is able to canonicalize multiple, mixed, or nested encoding, it's safer to not accept
+      # this stuff in the first place. In ESAPI, the default is "strict" mode that throws an IntrusionException
+      # if it receives anything not single-encoded with a single scheme. Even if you disable "strict" mode,
+      # you'll still get warning messages in the log about each multiple encoding and mixed encoding received.
+      #
+      def sanitize(input, strict)
+        # check input again, as someone may want to call sanitize directly
+        return nil if input.nil?
+        working = input
+        found_codec = nil
+        mixed_count = 1
+        found_count = 0
+        clean = false
+        while !clean
+          clean = true
+          @codecs.each do |codec|
+            old = working
+            working = codec.decode(working)
+            if !old.eql?(working)
+              if !found_codec.nil? and found_codec != codec
+                mixed_count += 1
+              end
+              found_codec = codec
+              if clean
+                found_count += 1
+              end
+              clean = false
+            end
+          end
+        end
+        # test for strict encoding, and indicate mixed and multiple errors
+        if found_count >= 2 and mixed_count > 1
+          if strict
+            raise Owasp::Esapi::IntrustionException.new("Input validation failure", "Multiple (#{found_count}x) and mixed encoding (#{mixed_count}x) detected in #{input}")
+          else
+            Owasp::Esapi.logger.warn("Multiple (#{found_count}x) and mixed encoding (#{mixed_count}x) detected in #{input}")
+          end
+        elsif found_count >= 2
+          if strict
+            raise Owasp::Esapi::IntrustionException.new("Input validation failure", "Multiple (#{found_count}x) detected in #{input}")
+          else
+            Owasp::Esapi.logger.warn("Multiple (#{found_count}x) detected in #{input}")
+          end
+        elsif mixed_count > 1
+          if strict
+            raise Owasp::Esapi::IntrustionException.new("Input validation failure", "Mixed encoding (#{mixed_count}x) detected in #{input}")
+          else
+            Owasp::Esapi.logger.warn("Mixed encoding (#{mixed_count}x) detected in #{input}")
+          end
+        end
+        working
+      end
+      # Encode for Base64, using the URL-safe alphabet.
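+      #
+      # For illustration, the standard-library call this method delegates to behaves as follows
+      # (the URL-safe alphabet uses - and _ where standard Base64 uses + and /):
+      #
+      #  Base64.urlsafe_encode64("\xFF\xFE") # => "__4="
+      #  Base64.strict_encode64("\xFF\xFE")  # => "//4="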
+      def encode_for_base64(input)
+        return nil if input.nil?
+        Base64.urlsafe_encode64(input)
+      end
+      # Decode data encoded with Base64 encoding.
+      # The input is assumed to use the URL-safe alphabet.
+      def decode_for_base64(input)
+        return nil if input.nil?
+        Base64.urlsafe_decode64(input)
+      end
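+      # Encode data for use in an LDAP query. Not yet implemented; currently returns nil.
+      def encode_for_ldap(input)
+      end
+      # Encode data for use in an LDAP distinguished name. Not yet implemented; currently returns nil.
+      def encode_for_dn(input)
+      end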
+      # Encode for use in a URL. This method performs URL encoding[http://en.wikipedia.org/wiki/Percent-encoding]
+      # on the entire string.
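+      #
+      # For illustration, the underlying CGI.escape call encodes spaces as + and
+      # reserved characters as percent escapes:
+      #
+      #  CGI.escape("a b&c=d") # => "a+b%26c%3Dd"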
+      def encode_for_url(input)
+        return nil if input.nil?
+        CGI::escape(input)
+      end
+      # Decode from URL. First canonicalize and detect any double-encoding.
+      # If this check passes, then the data is decoded using URL decoding.
+      def decode_for_url(input)
+        return nil if input.nil?
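+        # sanitize requires the strict flag; strict mode is assumed here so that
+        # multiple or mixed encoding raises before any URL decoding happens
+        clean = sanitize(input, true)
+        CGI::unescape(clean, Owasp::Esapi.security_config.encoding)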
+      end
+      # Encode data for use in Cascading Style Sheets (CSS) content.
+      # CSS Syntax[http://www.w3.org/TR/CSS21/syndata.html#escaped-characters] (w3.org)
+      def encode_for_css(input)
+        return nil if input.nil?
+        @css_codec.encode(IMMUNE_CSS,input)
+      end
+      # Encode data for insertion inside a data value or function argument in JavaScript. Including user data
+      # directly inside a script is quite dangerous. Great care must be taken to prevent including user data
+      # directly into script code itself, as no amount of encoding will prevent attacks there.
+      #
+      # Please note there are some JavaScript functions that can never safely receive untrusted data
+      # as input – even if the user input is encoded.
+      #
+      # For example:
+      #
+      #  <script>
+      #  window.setInterval('<%= EVEN IF YOU ENCODE UNTRUSTED DATA YOU ARE XSSED HERE %>');
+      #  </script>
+      #
+      def encode_for_javascript(input)
+        return nil if input.nil?
+        @js_codec.encode(IMMUNE_JAVASCRIPT,input)
+      end
+      # Encode data for use in HTML using HTML entity encoding
+      #
+      # Note that the following characters:
+      # 00-08, 0B-0C, 0E-1F, and 7F-9F
+      # cannot be used in HTML.
+      #
+      # * HTML Encodings[http://en.wikipedia.org/wiki/Character_encodings_in_HTML] (wikipedia.org)
+      # * SGML Specification[http://www.w3.org/TR/html4/sgml/sgmldecl.html] (w3.org)
+      # * XML Specification[http://www.w3.org/TR/REC-xml/#charsets] (w3.org)
+      def encode_for_html(input)
+        return nil if input.nil?
+        @html_codec.encode(IMMUNE_HTML,input)
+      end
+      # Decodes HTML entities.
+      def dencode_for_html(input)
+        return nil if input.nil?
+        @html_codec.decode(input)
+      end
+      # Encode data for use in HTML attributes.
+      def encode_for_html_attr(input)
+        return nil if input.nil?
+        @html_codec.encode(IMMUNE_HTMLATTR,input)
+      end
+      # Encode for an operating system command shell according to the configured OS codec
+      #
+      # Please note the following recommendations before choosing to use this method:
+      #
+      # 1. It is strongly recommended that applications avoid making direct OS system calls if possible as such calls are not portable, and they are potentially unsafe. Please use language provided features if at all possible, rather than native OS calls to implement the desired feature.
+      # 2. If an OS call cannot be avoided, then it is recommended that the program be invoked directly (e.g., Kernel.system("nameofcommand","parameterstocommand")) as this avoids the use of the command shell. The "parameterstocommand" should of course be validated before being passed to the OS command.
+      # 3. If you must use this method, then we recommend validating all user supplied input passed to the command shell as well, in addition to using this method in order to make the command shell invocation safe.
+      #
+      # An example use of this method would be: Kernel.system("dir", encode_for_os(WindowsCodec, "parameter(s)tocommandwithuserinput"))
+      def encode_for_os(codec,input)
+        return nil if input.nil?
+        codec.encode(IMMUNE_OS,input)
+      end
+      # Encode data for insertion inside a data value in a Visual Basic script. Putting user data directly
+      # inside a script is quite dangerous. Great care must be taken to prevent putting user data
+      # directly into script code itself, as no amount of encoding will prevent attacks there.
+      #
+      # This method is not recommended as VBScript is only supported by Internet Explorer
+      def encode_for_vbscript(input)
+        return nil if input.nil?
+        @vb_codec.encode(IMMUNE_VBSCRIPT,input)
+      end
+      # Encode input for use in a SQL query, according to the selected codec
+      # (appropriate codecs include the MySQLCodec and OracleCodec).
+      #
+      # This method is not recommended. The use of prepared statements
+      # (parameterized queries) is the preferred approach. However, if for some reason
+      # this is impossible, then this method is provided as a weaker
+      # alternative.
+      #
+      # When encoding is unavoidable, the best approach is to make sure any single quotes are doubled (' becomes '').
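+      #
+      # Doubling single quotes can be sketched in plain Ruby as follows. This is an
+      # illustration only, not necessarily what the codecs do:
+      #
+      #  "O'Brien".gsub("'", "''") # => "O''Brien"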
+      def encode_for_sql(codec,input)
+        return nil if input.nil?
+        codec.encode(IMMUNE_SQL,input)
+      end
+      # Encode data for use in an XPath query.
+      #
+      # NB: The reference implementation encodes almost everything and may over-encode.
+      #
+      # The difficulty with XPath encoding is that XPath has no built in mechanism for escaping
+      # characters. It is possible to use XQuery in a parameterized way to
+      # prevent injection.
+      #
+      # For more information, refer to this article[http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html]
+      # which specifies the following list of characters as the most dangerous: ^&"*';<>().
+      #
+      # This[http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf] paper suggests disallowing ' and " in queries.
+      #
+      # * XPath Injection[http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html] (ibm.com)
+      # * Blind XPath Injection[http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf] (packetstormsecurity.org)
+      def encode_for_xpath(input)
+        return nil if input.nil?
+        @xml_codec.encode(IMMUNE_XPATH,input)
+      end
+      # Encode data for use in an XML element. The implementation should follow the
+      # XML Encoding Standard[http://www.w3schools.com/xml/xml_encoding.asp] (w3schools.com).
+      #
+      # The use of a real XML parser is strongly encouraged. However, in the
+      # hopefully rare case that you need to make sure that data is safe for
+      # inclusion in an XML document and cannot use a parser, this method provides
+      # a safe mechanism to do so.
+      def encode_for_xml(input)
+        return nil if input.nil?
+        @xml_codec.encode(IMMUNE_XML,input)
+      end
+      # Encode data for use in an XML attribute. The implementation should follow
+      # the XML Encoding Standard[http://www.w3schools.com/xml/xml_encoding.asp] (w3schools.com).
+      #
+      # The use of a real XML parser is strongly encouraged. However, in the
+      # hopefully rare case that you need to make sure that data is safe for
+      # inclusion in an XML document and cannot use a parser, this method provides
+      # a safe mechanism to do so.
+      def encode_for_xml_attr(input)
+        return nil if input.nil?
+        @xml_codec.encode(IMMUNE_XMLATTR,input)
+      end
+    end
+  end
+end