sanitize-url 0.1.1
- data/.document +5 -0
- data/.gitignore +21 -0
- data/LICENSE +20 -0
- data/README.markdown +67 -0
- data/Rakefile +53 -0
- data/VERSION +1 -0
- data/lib/sanitize-url.rb +109 -0
- data/sanitize-url.gemspec +55 -0
- data/spec/sanitize_url_spec.rb +164 -0
- data/spec/spec_helper.rb +7 -0
- data/test.rb +16 -0
- metadata +76 -0
data/.document
ADDED
data/.gitignore
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2009 Jarrett Colby
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining
|
4
|
+
a copy of this software and associated documentation files (the
|
5
|
+
"Software"), to deal in the Software without restriction, including
|
6
|
+
without limitation the rights to use, copy, modify, merge, publish,
|
7
|
+
distribute, sublicense, and/or sell copies of the Software, and to
|
8
|
+
permit persons to whom the Software is furnished to do so, subject to
|
9
|
+
the following conditions:
|
10
|
+
|
11
|
+
The above copyright notice and this permission notice shall be
|
12
|
+
included in all copies or substantial portions of the Software.
|
13
|
+
|
14
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
15
|
+
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
16
|
+
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
17
|
+
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
18
|
+
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
19
|
+
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
20
|
+
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
data/README.markdown
ADDED
@@ -0,0 +1,67 @@
|
|
1
|
+
sanitize-url
|
2
|
+
============
|
3
|
+
|
4
|
+
This gem provides a module called `SanitizeUrl`, which you can mix-in anywhere you like. It provides a single method: `sanitize_url`, which accepts a URL and returns one with JavaScript removed. It also prepends the `http://` scheme if no valid scheme is found.
|
5
|
+
|
6
|
+
Why do you need this? Because attackers can sneak JavaScript into URLs, and some browsers may execute it. Say, for example, you have a web app that lets users post links. If you don't sanitize the URLs, you may have a cross-site-scripting vulnerability on your hands. More commonly, well-intentioned users will type URLs without prepending a protocol. If you render these URLs as-is in your links, the browser will interpret them as links within your own site, e.g. `http://your-site.com/www.site-they-linked-to.com`.
|
7
|
+
|
8
|
+
Rails mitigates some of the danger by automatically URL-encoding in the `link_to` helper, but this does not solve every problem. For example, it doesn't remove plain old `javascript:alert("xss")`, and URLs with numeric character references come out broken. This gem fixes those and other problems.
|
9
|
+
|
10
|
+
Basic Usage
|
11
|
+
===========
|
12
|
+
|
13
|
+
require 'rubygems'
|
14
|
+
require 'sanitize-url'
|
15
|
+
|
16
|
+
include SanitizeUrl
|
17
|
+
|
18
|
+
sanitize_url('www.example.com')
|
19
|
+
|
20
|
+
Advanced
|
21
|
+
========
|
22
|
+
|
23
|
+
This gem uses a whitelist approach, killing any schemes that aren't in the list. This should block `javascript:` and `data:` URLs, both of which can be used for XSS. The default list of allowed schemes is:
|
24
|
+
|
25
|
+
http://
|
26
|
+
https://
|
27
|
+
ftp://
|
28
|
+
ftps://
|
29
|
+
mailto://
|
30
|
+
svn://
|
31
|
+
svn+ssh://
|
32
|
+
git://
|
33
|
+
mailto:
|
34
|
+
|
35
|
+
You can pass in your own whitelist like this:
|
36
|
+
|
37
|
+
sanitize_url('http://example.com', :schemes => ['http', 'https'])
|
38
|
+
|
39
|
+
If `sanitize_url` receives a URL with a forbidden scheme, it wipes out the entire URL and returns a blank string. You can override this behavior and have it return a string of your choosing like this:
|
40
|
+
|
41
|
+
sanitize_url('javascript:alert("XSS")', :replace_evil_with => 'my replacement')
|
42
|
+
# => 'my replacement'
|
43
|
+
|
44
|
+
See spec/sanitize_url_spec.rb for some examples of how this gem transforms URLs.
|
45
|
+
|
46
|
+
Installation
|
47
|
+
============
|
48
|
+
|
49
|
+
gem install sanitize-url
|
50
|
+
|
51
|
+
If that doesn't work, it's probably because the gem is hosted on Gemcutter, and your computer doesn't know about Gemcutter yet. To fix that:
|
52
|
+
|
53
|
+
gem install gemcutter
|
54
|
+
gem tumble
|
55
|
+
|
56
|
+
Bug Reports
|
57
|
+
===========
|
58
|
+
|
59
|
+
Since this is a security-related gem, you'll rack up mad karma by reporting a bug. If you find a way to sneak executable JavaScript (or any other form of evil) past the filter, please send me a message on GitHub:
|
60
|
+
|
61
|
+
http://github.com/inbox/new/jarrett
|
62
|
+
|
63
|
+
For most projects, I prefer that people use GitHub's issue tracker. But given the sensitive nature of security vulnerabilities, I prefer private messages for this one.
|
64
|
+
|
65
|
+
== Copyright
|
66
|
+
|
67
|
+
Copyright (c) 2010 Jarrett Colby. See LICENSE for details.
|
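The whitelist behaviour described in the README can be sketched independently of the gem. The following is a minimal illustration for a current Ruby, not the gem's implementation: `whitelist_url` and `DEFAULT_SCHEMES` are hypothetical names, and the sketch deliberately skips the decoding of percent-encoded and numeric-reference schemes that `lib/sanitize-url.rb` performs.

```ruby
# Hypothetical sketch of a scheme whitelist check; not the gem's code.
DEFAULT_SCHEMES = %w[http https ftp ftps mailto svn svn+ssh git].freeze

# Returns the URL with a cleaned scheme if that scheme is whitelisted,
# prepends http:// if no scheme is present, and returns the replacement
# string for everything else.
def whitelist_url(url, schemes: DEFAULT_SCHEMES, replacement: '')
  match = url.match(/\A([^:]+?):(.*)\z/m)
  return "http://#{url}" unless match

  # Drop characters that are not legal in a scheme before comparing.
  scheme = match[1].gsub(/[^A-Za-z0-9+.\-]/, '').downcase
  schemes.include?(scheme) ? "#{scheme}:#{match[2]}" : replacement
end

puts whitelist_url('www.example.com')
puts whitelist_url('javascript:alert("XSS")', replacement: 'blocked')
```

Cleaning the scheme before the lookup is what defeats inputs such as `"java\nscript:..."`: the comparison is made against what a lenient browser might end up interpreting, not against the raw text.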
data/Rakefile
ADDED
@@ -0,0 +1,53 @@
|
|
1
|
+
require 'rubygems'
|
2
|
+
require 'rake'
|
3
|
+
|
4
|
+
begin
|
5
|
+
require 'jeweler'
|
6
|
+
Jeweler::Tasks.new do |gem|
|
7
|
+
gem.name = "sanitize-url"
|
8
|
+
gem.summary = %Q{Sanitizes untrusted URLs}
|
9
|
+
gem.description = %Q{This gem provides a module called SanitizeUrl, which you can mix-in anywhere you like. It provides a single method: sanitize_url, which accepts a URL and returns one with JavaScript removed. It also prepends the http:// scheme if no valid scheme is found.}
|
10
|
+
gem.email = "jarrett@uchicago.edu"
|
11
|
+
gem.homepage = "http://github.com/jarrett/sanitize-url"
|
12
|
+
gem.authors = ["jarrett"]
|
13
|
+
gem.add_development_dependency "thoughtbot-shoulda", ">= 0"
|
14
|
+
# gem is a Gem::Specification... see http://www.rubygems.org/read/chapter/20 for additional settings
|
15
|
+
end
|
16
|
+
Jeweler::GemcutterTasks.new
|
17
|
+
rescue LoadError
|
18
|
+
puts "Jeweler (or a dependency) not available. Install it with: gem install jeweler"
|
19
|
+
end
|
20
|
+
|
21
|
+
require 'rake/testtask'
|
22
|
+
Rake::TestTask.new(:test) do |test|
|
23
|
+
test.libs << 'lib' << 'test'
|
24
|
+
test.pattern = 'test/**/test_*.rb'
|
25
|
+
test.verbose = true
|
26
|
+
end
|
27
|
+
|
28
|
+
begin
|
29
|
+
require 'rcov/rcovtask'
|
30
|
+
Rcov::RcovTask.new do |test|
|
31
|
+
test.libs << 'test'
|
32
|
+
test.pattern = 'test/**/test_*.rb'
|
33
|
+
test.verbose = true
|
34
|
+
end
|
35
|
+
rescue LoadError
|
36
|
+
task :rcov do
|
37
|
+
abort "RCov is not available. In order to run rcov, you must: sudo gem install spicycode-rcov"
|
38
|
+
end
|
39
|
+
end
|
40
|
+
|
41
|
+
task :test => :check_dependencies
|
42
|
+
|
43
|
+
task :default => :test
|
44
|
+
|
45
|
+
require 'rake/rdoctask'
|
46
|
+
Rake::RDocTask.new do |rdoc|
|
47
|
+
version = File.exist?('VERSION') ? File.read('VERSION') : ""
|
48
|
+
|
49
|
+
rdoc.rdoc_dir = 'rdoc'
|
50
|
+
rdoc.title = "sanitize-url #{version}"
|
51
|
+
rdoc.rdoc_files.include('README*')
|
52
|
+
rdoc.rdoc_files.include('lib/**/*.rb')
|
53
|
+
end
|
data/VERSION
ADDED
@@ -0,0 +1 @@
|
|
1
|
+
0.1.1
|
data/lib/sanitize-url.rb
ADDED
@@ -0,0 +1,109 @@
|
|
1
|
+
# Helper methods in this module are module methods so that they won't
|
2
|
+
# pollute the namespace into which the module is mixed in.
|
3
|
+
module SanitizeUrl
|
4
|
+
ALPHANUMERIC_CHAR_CODES = (48..57).to_a + (65..90).to_a + (97..122).to_a
|
5
|
+
|
6
|
+
VALID_OPAQUE_SPECIAL_CHARS = ['!', '*', "'", '(', ')', ';', ':', '@', '&', '=', '+', '$', ',', '/', '?', '%', '#', '[', ']', '-', '_', '.', '~']
|
7
|
+
VALID_OPAQUE_SPECIAL_CHAR_CODES = VALID_OPAQUE_SPECIAL_CHARS.collect { |c| c[0] }
|
8
|
+
VALID_OPAQUE_CHAR_CODES = ALPHANUMERIC_CHAR_CODES + VALID_OPAQUE_SPECIAL_CHAR_CODES
|
9
|
+
|
10
|
+
VALID_SCHEME_SPECIAL_CHARS = ['+', '.', '-']
|
11
|
+
VALID_SCHEME_SPECIAL_CHAR_CODES = VALID_SCHEME_SPECIAL_CHARS.collect { |c| c[0] }
|
12
|
+
VALID_SCHEME_CHAR_CODES = ALPHANUMERIC_CHAR_CODES + VALID_SCHEME_SPECIAL_CHAR_CODES
|
13
|
+
|
14
|
+
HTTP_STYLE_SCHEMES = ['http', 'https', 'ftp', 'ftps', 'svn', 'svn+ssh', 'git'] # Common schemes whose format should be "scheme://" instead of "scheme:"
|
15
|
+
|
16
|
+
def sanitize_url(url, options = {})
|
17
|
+
raise(ArgumentError, 'options[:schemes] must be an array') if options.has_key?(:schemes) and !options[:schemes].is_a?(Array)
|
18
|
+
options = {
|
19
|
+
:replace_evil_with => '',
|
20
|
+
:schemes => ['http', 'https', 'ftp', 'ftps', 'mailto', 'svn', 'svn+ssh', 'git']
|
21
|
+
}.merge(options)
|
22
|
+
|
23
|
+
url = SanitizeUrl.dereference_numerics(url)
|
24
|
+
|
25
|
+
# Schemes can consist of letters, digits, or any of the following special chars: + . -
|
26
|
+
# The scheme must begin with a letter and be terminated by a colon.
|
27
|
+
# Everything after the scheme is opaque for our purposes. (See http://www.w3.org/DesignIssues/Axioms.html#opaque)
|
28
|
+
|
29
|
+
# Try to match a URI with a scheme. We check for percent-encoded characters in the scheme.
|
30
|
+
url.match(/^(.+?)(:|%3A)(.*)$/)
|
31
|
+
dirty_scheme = $1
|
32
|
+
if dirty_scheme
|
33
|
+
unescaped_opaque = $3
|
34
|
+
return options[:replace_evil_with] if unescaped_opaque.nil? or unescaped_opaque.empty? or unescaped_opaque.match(/^\/+$/)
|
35
|
+
else
|
36
|
+
# Use http as the best guess, and the rest of the URL will be considered opaque
|
37
|
+
dirty_scheme = 'http'
|
38
|
+
unescaped_opaque = url
|
39
|
+
end
|
40
|
+
# Remove URL encoding from the scheme
|
41
|
+
dirty_scheme.gsub!(/%([a-zA-Z0-9]{2})/) do
|
42
|
+
code = $1.to_i(16)
|
43
|
+
VALID_SCHEME_CHAR_CODES.include?(code) ? code.chr : ''
|
44
|
+
end
|
45
|
+
|
46
|
+
# Clean the scheme by removing invalid characters
|
47
|
+
scheme = ''
|
48
|
+
dirty_scheme.each_byte do |code|
|
49
|
+
scheme << code.chr if VALID_SCHEME_CHAR_CODES.include?(code)
|
50
|
+
end
|
51
|
+
|
52
|
+
# URL-encode the opaque portion as necessary. Only encode those bytes that are absolutely not allowed in URLs.
|
53
|
+
opaque = ''
|
54
|
+
unescaped_opaque.each_byte do |code|
|
55
|
+
if SanitizeUrl.url_encode?(code)
|
56
|
+
opaque << '%' << code.to_s(16).upcase
|
57
|
+
else
|
58
|
+
opaque << code.chr
|
59
|
+
end
|
60
|
+
end
|
61
|
+
|
62
|
+
if options[:schemes].include?(scheme.downcase)
|
63
|
+
if HTTP_STYLE_SCHEMES.include?(scheme.downcase) and !opaque.match(/^\/\//)
|
64
|
+
# It's an HTTP-like scheme, but the two slashes are missing. We'll fix that as a courtesy.
|
65
|
+
url = scheme + '://' + opaque
|
66
|
+
else
|
67
|
+
# Either the scheme doesn't need the two slashes, or the opaque portion already has them.
|
68
|
+
url = scheme + ':' + opaque
|
69
|
+
end
|
70
|
+
return url
|
71
|
+
else
|
72
|
+
return options[:replace_evil_with]
|
73
|
+
end
|
74
|
+
end
|
75
|
+
|
76
|
+
def self.dereference_numerics(str)
|
77
|
+
# Decimal code points, e.g. &#106; &#106 &#0000106; &#0000106
|
78
|
+
str = str.gsub(/&#([0-9]+);?/) do
|
79
|
+
char_or_url_encoded($1.to_i)
|
80
|
+
end
|
81
|
+
# Hex code points, e.g. &#x6A; &#x6A
|
82
|
+
str.gsub(/&#[xX]([a-fA-F0-9]+);?/) do
|
83
|
+
char_or_url_encoded($1.to_i(16))
|
84
|
+
end
|
85
|
+
end
|
86
|
+
|
87
|
+
# Return either the literal char or the URL-encoded equivalent,
|
88
|
+
# depending on our normalization rules. Requires a decimal
|
89
|
+
# code point. Code point can be outside the single-byte range.
|
90
|
+
def self.char_or_url_encoded(code)
|
91
|
+
if url_encode?(code)
|
92
|
+
utf_8_str = ([code.to_i].pack('U'))
|
93
|
+
'%' + utf_8_str.unpack('H2' * utf_8_str.length).join('%').upcase
|
94
|
+
else
|
95
|
+
code.chr
|
96
|
+
end
|
97
|
+
end
|
98
|
+
|
99
|
+
# Should we URL-encode the byte?
|
100
|
+
# Must receive an integer code point
|
101
|
+
def self.url_encode?(code)
|
102
|
+
!(
|
103
|
+
(code >= 48 and code <= 57) or # Numbers
|
104
|
+
(code >= 65 and code <= 90) or # Uppercase
|
105
|
+
(code >= 97 and code <= 122) or # Lowercase
|
106
|
+
VALID_OPAQUE_CHAR_CODES.include?(code)
|
107
|
+
)
|
108
|
+
end
|
109
|
+
end
|
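The dereferencing step in `lib/sanitize-url.rb` can be illustrated with a short standalone sketch. It assumes Ruby 1.9 or later, where `String#length` counts characters rather than bytes, so it iterates over `String#bytes` instead of using `unpack('H2' * length)`. The helper names are hypothetical and not part of the gem, and unlike the gem this sketch keeps only unreserved ASCII characters literal, percent-encoding everything else (including `:` and `/`).

```ruby
# Hypothetical helper, not part of the gem:
# percent-encode one Unicode code point as its UTF-8 bytes.
def percent_encode_code_point(code_point)
  [code_point].pack('U').bytes.map { |byte| format('%%%02X', byte) }.join
end

# Hypothetical helper: replace decimal (&#106;) and hex (&#x6A;) numeric
# character references, with or without the trailing semicolon.
def dereference_numerics_sketch(str)
  str.gsub(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?/) do
    code_point = $1 ? $1.to_i(16) : $2.to_i
    # Keep unreserved ASCII as literal characters; percent-encode the rest.
    if code_point < 128 && code_point.chr.match?(/[A-Za-z0-9\-._~]/)
      code_point.chr
    else
      percent_encode_code_point(code_point)
    end
  end
end

puts dereference_numerics_sketch('&#106;avascript')
puts dereference_numerics_sketch('http://&#1044;')
```

Handling both forms in a single pass avoids the ordering question that arises when the decimal and hex substitutions run one after the other.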
data/sanitize-url.gemspec
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
# Generated by jeweler
|
2
|
+
# DO NOT EDIT THIS FILE DIRECTLY
|
3
|
+
# Instead, edit Jeweler::Tasks in rakefile, and run the gemspec command
|
4
|
+
# -*- encoding: utf-8 -*-
|
5
|
+
|
6
|
+
Gem::Specification.new do |s|
|
7
|
+
s.name = %q{sanitize-url}
|
8
|
+
s.version = "0.1.1"
|
9
|
+
|
10
|
+
s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
|
11
|
+
s.authors = ["jarrett"]
|
12
|
+
s.date = %q{2010-02-25}
|
13
|
+
s.description = %q{This gem provides a module called SanitizeUrl, which you can mix-in anywhere you like. It provides a single method: sanitize_url, which accepts a URL and returns one with JavaScript removed. It also prepends the http:// scheme if no valid scheme is found.}
|
14
|
+
s.email = %q{jarrett@uchicago.edu}
|
15
|
+
s.extra_rdoc_files = [
|
16
|
+
"LICENSE",
|
17
|
+
"README.markdown"
|
18
|
+
]
|
19
|
+
s.files = [
|
20
|
+
".document",
|
21
|
+
".gitignore",
|
22
|
+
"LICENSE",
|
23
|
+
"README.markdown",
|
24
|
+
"Rakefile",
|
25
|
+
"VERSION",
|
26
|
+
"lib/sanitize-url.rb",
|
27
|
+
"sanitize-url.gemspec",
|
28
|
+
"spec/sanitize_url_spec.rb",
|
29
|
+
"spec/spec_helper.rb",
|
30
|
+
"test.rb"
|
31
|
+
]
|
32
|
+
s.homepage = %q{http://github.com/jarrett/sanitize-url}
|
33
|
+
s.rdoc_options = ["--charset=UTF-8"]
|
34
|
+
s.require_paths = ["lib"]
|
35
|
+
s.rubygems_version = %q{1.3.5}
|
36
|
+
s.summary = %q{Sanitizes untrusted URLs}
|
37
|
+
s.test_files = [
|
38
|
+
"spec/sanitize_url_spec.rb",
|
39
|
+
"spec/spec_helper.rb"
|
40
|
+
]
|
41
|
+
|
42
|
+
if s.respond_to? :specification_version then
|
43
|
+
current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
|
44
|
+
s.specification_version = 3
|
45
|
+
|
46
|
+
if Gem::Version.new(Gem::RubyGemsVersion) >= Gem::Version.new('1.2.0') then
|
47
|
+
s.add_development_dependency(%q<thoughtbot-shoulda>, [">= 0"])
|
48
|
+
else
|
49
|
+
s.add_dependency(%q<thoughtbot-shoulda>, [">= 0"])
|
50
|
+
end
|
51
|
+
else
|
52
|
+
s.add_dependency(%q<thoughtbot-shoulda>, [">= 0"])
|
53
|
+
end
|
54
|
+
end
|
55
|
+
|
data/spec/sanitize_url_spec.rb
ADDED
@@ -0,0 +1,164 @@
|
|
1
|
+
require File.expand_path(File.dirname(__FILE__) + '/spec_helper')
|
2
|
+
|
3
|
+
describe SanitizeUrl do
|
4
|
+
include SanitizeUrl
|
5
|
+
|
6
|
+
describe '#sanitize_url' do
|
7
|
+
it 'replaces JavaScript URLs with options[:replace_evil_with]' do
|
8
|
+
urls = [
|
9
|
+
'javascript:alert("1");',
|
10
|
+
'javascript//:alert("2");',
|
11
|
+
'javascript://alert("3");',
|
12
|
+
'javascript/:/alert("4");',
|
13
|
+
'j a v a script:alert("5");',
|
14
|
+
' javascript:alert("6");',
|
15
|
+
'JavaScript:alert("7");',
|
16
|
+
"java\nscript:alert(\"8\");",
|
17
|
+
"java\rscript:alert(\"9\");"
|
18
|
+
].each do |evil_url|
|
19
|
+
sanitize_url(evil_url, :replace_evil_with => 'replaced').should == 'replaced'
|
20
|
+
end
|
21
|
+
end
|
22
|
+
|
23
|
+
it 'replaces data: URLs with options[:replace_evil_with]' do
|
24
|
+
urls = [
|
25
|
+
'data:text/html;base64,PHNjcmlwdD5hbGVydCgnMScpPC9zY3JpcHQ+',
|
26
|
+
'data://text/html;base64,PHNjcmlwdD5hbGVydCgnMicpPC9zY3JpcHQ+',
|
27
|
+
'data//:text/html;base64,PHNjcmlwdD5hbGVydCgnMycpPC9zY3JpcHQ+',
|
28
|
+
'data/:/text/html;base64,PHNjcmlwdD5hbGVydCgnNCcpPC9zY3JpcHQ+',
|
29
|
+
' data:text/html;base64,PHNjcmlwdD5hbGVydCgnNScpPC9zY3JpcHQ+',
|
30
|
+
'da ta:text/html;base64,PHNjcmlwdD5hbGVydCgnNicpPC9zY3JpcHQ+',
|
31
|
+
'Data:text/html;base64,PHNjcmlwdD5hbGVydCgnNycpPC9zY3JpcHQ+',
|
32
|
+
"da\nta:text/html;base64,PHNjcmlwdD5hbGVydCgnOCcpPC9zY3JpcHQ+",
|
33
|
+
"da\rta:text/html;base64,PHNjcmlwdD5hbGVydCgnOScpPC9zY3JpcHQ+",
|
34
|
+
].each do |evil_url|
|
35
|
+
sanitize_url(evil_url, :replace_evil_with => 'replaced').should == 'replaced'
|
36
|
+
end
|
37
|
+
end
|
38
|
+
|
39
|
+
context 'with :schemes whitelist' do
|
40
|
+
it 'kills anything not on the list' do
|
41
|
+
[
|
42
|
+
'https://example.com',
|
43
|
+
'https:example.com',
|
44
|
+
'ftp://example.com',
|
45
|
+
'ftp:example.com',
|
46
|
+
'data://example.com',
|
47
|
+
'data:example.com',
|
48
|
+
'javascript://example.com',
|
49
|
+
'javascript:example.com',
|
50
|
+
].each do |evil_url|
|
51
|
+
sanitize_url(evil_url, :schemes => ['http'], :replace_evil_with => 'replaced').should == 'replaced'
|
52
|
+
end
|
53
|
+
end
|
54
|
+
|
55
|
+
it 'allows anything on the list' do
|
56
|
+
[
|
57
|
+
'http://example.com',
|
58
|
+
'https://example.com'
|
59
|
+
].each do |good_url|
|
60
|
+
sanitize_url(good_url, :schemes => ['http', 'https']).should == good_url
|
61
|
+
end
|
62
|
+
end
|
63
|
+
end
|
64
|
+
|
65
|
+
it 'prepends http:// if no scheme is given' do
|
66
|
+
sanitize_url('www.example.com').should == 'http://www.example.com'
|
67
|
+
end
|
68
|
+
|
69
|
+
it 'replaces evil URLs that are encoded with Unicode numerical character references' do
|
70
|
+
[
|
71
|
+
'&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#49;&#39;&#41;',
|
72
|
+
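'&#x6A;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3A;&#x61;&#x6C;&#x65;&#x72;&#x74;&#x28;&#x27;&#x32;&#x27;&#x29;'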
|
73
|
+
].each do |evil_url|
|
74
|
+
sanitize_url(evil_url, :replace_evil_with => 'replaced').should == 'replaced'
|
75
|
+
end
|
76
|
+
end
|
77
|
+
|
78
|
+
it 'replaces evil URLs that are URL-encoded (hex with %)' do
|
79
|
+
sanitize_url('%6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%22%58%53%53%22%29', :replace_evil_with => 'replaced').should == 'replaced'
|
80
|
+
end
|
81
|
+
|
82
|
+
it 'does not try to fix broken schemes after the start of the string' do
|
83
|
+
sanitize_url('http://example.com/http/foo').should == 'http://example.com/http/foo'
|
84
|
+
end
|
85
|
+
|
86
|
+
it 'does not prepend an extra http:// if a valid scheme is given' do
|
87
|
+
sanitize_url('http://www.example.com').should == 'http://www.example.com'
|
88
|
+
sanitize_url('https://www.example.com').should == 'https://www.example.com'
|
89
|
+
sanitize_url('ftp://www.example.com').should == 'ftp://www.example.com'
|
90
|
+
end
|
91
|
+
|
92
|
+
it 'dereferences URL-encoded characters in the scheme' do
|
93
|
+
sanitize_url('h%74tp://example.com').should == 'http://example.com'
|
94
|
+
end
|
95
|
+
|
96
|
+
it 'dereferences decimal numeric character references in the scheme' do
|
97
|
+
sanitize_url('h&#116;tp://example.com').should == 'http://example.com'
|
98
|
+
end
|
99
|
+
|
100
|
+
it 'dereferences hex numeric character references in the scheme' do
|
101
|
+
sanitize_url('h&#x74;tp://example.com').should == 'http://example.com'
|
102
|
+
end
|
103
|
+
|
104
|
+
it 'retains URL-encoded characters in the opaque portion' do
|
105
|
+
sanitize_url('http://someone%40gmail.com:password@example.com').should == 'http://someone%40gmail.com:password@example.com'
|
106
|
+
end
|
107
|
+
|
108
|
+
it 'URL-encodes code points outside ASCII' do
|
109
|
+
# Percent-encoding should be in UTF-8 (RFC 3986).
|
110
|
+
# http://en.wikipedia.org/wiki/Percent-encoding#Current_standard
|
111
|
+
sanitize_url('http://&#1044;').should == 'http://%D0%94'
|
112
|
+
sanitize_url('http://&#x0414;').should == 'http://%D0%94'
|
113
|
+
sanitize_url("http://\xD0\x94").should == 'http://%D0%94' # UTF-8 version of the same.
|
114
|
+
end
|
115
|
+
|
116
|
+
it 'replaces URLs without the opaque portion' do
|
117
|
+
sanitize_url('http://', :replace_evil_with => 'replaced').should == 'replaced'
|
118
|
+
sanitize_url('mailto:', :replace_evil_with => 'replaced').should == 'replaced'
|
119
|
+
end
|
120
|
+
|
121
|
+
it 'adds the two slashes for known schemes that require it' do
|
122
|
+
sanitize_url('http:example.com').should == 'http://example.com'
|
123
|
+
sanitize_url('ftp:example.com').should == 'ftp://example.com'
|
124
|
+
sanitize_url('svn+ssh:example.com').should == 'svn+ssh://example.com'
|
125
|
+
end
|
126
|
+
|
127
|
+
it 'does not add slashes for schemes that do not require it' do
|
128
|
+
sanitize_url('mailto:someone@example.com').should == 'mailto:someone@example.com'
|
129
|
+
end
|
130
|
+
|
131
|
+
it 'strips invalid characters from the scheme and then evaluates the scheme according to the normal rules' do
|
132
|
+
sanitize_url("ht\xD0\x94tp://example.com").should == 'http://example.com'
|
133
|
+
sanitize_url('htt$p://example.com').should == 'http://example.com'
|
134
|
+
sanitize_url('j%avascript:alert("XSS")', :replace_evil_with => 'replaced').should == 'replaced'
|
135
|
+
end
|
136
|
+
end
|
137
|
+
|
138
|
+
|
139
|
+
describe '.dereference_numerics' do
|
140
|
+
it 'decodes short-form decimal UTF-8 character references with a semicolon' do
|
141
|
+
SanitizeUrl.dereference_numerics('&#106;').should == 'j'
|
142
|
+
end
|
143
|
+
|
144
|
+
it 'decodes short-form decimal UTF-8 character references without a semicolon' do
|
145
|
+
SanitizeUrl.dereference_numerics('&#106').should == 'j'
|
146
|
+
end
|
147
|
+
|
148
|
+
it 'decodes long-form decimal UTF-8 character references with a semicolon' do
|
149
|
+
SanitizeUrl.dereference_numerics('&#0000106;').should == 'j'
|
150
|
+
end
|
151
|
+
|
152
|
+
it 'decodes long-form decimal UTF-8 character references without a semicolon' do
|
153
|
+
SanitizeUrl.dereference_numerics('&#0000106').should == 'j'
|
154
|
+
end
|
155
|
+
|
156
|
+
it 'decodes hex UTF-8 character references with a semicolon' do
|
157
|
+
SanitizeUrl.dereference_numerics('&#x6A;').should == 'j'
|
158
|
+
end
|
159
|
+
|
160
|
+
it 'decodes hex UTF-8 character references without a semicolon' do
|
161
|
+
SanitizeUrl.dereference_numerics('&#x6A').should == 'j'
|
162
|
+
end
|
163
|
+
end
|
164
|
+
end
|
data/spec/spec_helper.rb
ADDED
data/test.rb
ADDED
@@ -0,0 +1,16 @@
|
|
1
|
+
# Copyright sign
|
2
|
+
|
3
|
+
#def decimal_code_point_to_url_encoded(code_point)
|
4
|
+
# utf_8_str = ([code_point.to_i].pack('U'))
|
5
|
+
# '%' + utf_8_str.unpack('H2' * utf_8_str.length).join('%').upcase
|
6
|
+
#end
|
7
|
+
|
8
|
+
hex_code_point = 'A9'
|
9
|
+
decimal_code_point = '169'
|
10
|
+
hex_utf_8_bytes = '%C2%A9'
|
11
|
+
|
12
|
+
#puts 'Expected: ' + hex_utf_8_bytes
|
13
|
+
#puts 'Actual: ' + decimal_code_point_to_url_encoded(decimal_code_point)
|
14
|
+
|
15
|
+
evil = 'javascript:alert("XSS")'
|
16
|
+
puts evil.unpack('H2' * evil.length).join('%').upcase
|
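The scratch script above joins the hex bytes with `%` but does not emit the leading `%`. A minimal sketch of the same idea for a current Ruby follows; `percent_encode_all` is a hypothetical helper name, not something the gem defines. It reproduces the fully percent-encoded URL used in spec/sanitize_url_spec.rb.

```ruby
# Hypothetical helper, not part of the gem: percent-encode every byte of a string.
def percent_encode_all(str)
  str.bytes.map { |byte| format('%%%02X', byte) }.join
end

evil = 'javascript:alert("XSS")'
puts percent_encode_all(evil)
# prints %6A%61%76%61%73%63%72%69%70%74%3A%61%6C%65%72%74%28%22%58%53%53%22%29
```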
metadata
ADDED
@@ -0,0 +1,76 @@
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
2
|
+
name: sanitize-url
|
3
|
+
version: !ruby/object:Gem::Version
|
4
|
+
version: 0.1.1
|
5
|
+
platform: ruby
|
6
|
+
authors:
|
7
|
+
- jarrett
|
8
|
+
autorequire:
|
9
|
+
bindir: bin
|
10
|
+
cert_chain: []
|
11
|
+
|
12
|
+
date: 2010-02-25 00:00:00 -06:00
|
13
|
+
default_executable:
|
14
|
+
dependencies:
|
15
|
+
- !ruby/object:Gem::Dependency
|
16
|
+
name: thoughtbot-shoulda
|
17
|
+
type: :development
|
18
|
+
version_requirement:
|
19
|
+
version_requirements: !ruby/object:Gem::Requirement
|
20
|
+
requirements:
|
21
|
+
- - ">="
|
22
|
+
- !ruby/object:Gem::Version
|
23
|
+
version: "0"
|
24
|
+
version:
|
25
|
+
description: "This gem provides a module called SanitizeUrl, which you can mix-in anywhere you like. It provides a single method: sanitize_url, which accepts a URL and returns one with JavaScript removed. It also prepends the http:// scheme if no valid scheme is found."
|
26
|
+
email: jarrett@uchicago.edu
|
27
|
+
executables: []
|
28
|
+
|
29
|
+
extensions: []
|
30
|
+
|
31
|
+
extra_rdoc_files:
|
32
|
+
- LICENSE
|
33
|
+
- README.markdown
|
34
|
+
files:
|
35
|
+
- .document
|
36
|
+
- .gitignore
|
37
|
+
- LICENSE
|
38
|
+
- README.markdown
|
39
|
+
- Rakefile
|
40
|
+
- VERSION
|
41
|
+
- lib/sanitize-url.rb
|
42
|
+
- sanitize-url.gemspec
|
43
|
+
- spec/sanitize_url_spec.rb
|
44
|
+
- spec/spec_helper.rb
|
45
|
+
- test.rb
|
46
|
+
has_rdoc: true
|
47
|
+
homepage: http://github.com/jarrett/sanitize-url
|
48
|
+
licenses: []
|
49
|
+
|
50
|
+
post_install_message:
|
51
|
+
rdoc_options:
|
52
|
+
- --charset=UTF-8
|
53
|
+
require_paths:
|
54
|
+
- lib
|
55
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
56
|
+
requirements:
|
57
|
+
- - ">="
|
58
|
+
- !ruby/object:Gem::Version
|
59
|
+
version: "0"
|
60
|
+
version:
|
61
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
62
|
+
requirements:
|
63
|
+
- - ">="
|
64
|
+
- !ruby/object:Gem::Version
|
65
|
+
version: "0"
|
66
|
+
version:
|
67
|
+
requirements: []
|
68
|
+
|
69
|
+
rubyforge_project:
|
70
|
+
rubygems_version: 1.3.5
|
71
|
+
signing_key:
|
72
|
+
specification_version: 3
|
73
|
+
summary: Sanitizes untrusted URLs
|
74
|
+
test_files:
|
75
|
+
- spec/sanitize_url_spec.rb
|
76
|
+
- spec/spec_helper.rb
|