disinfect_url 0.2.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: fe51acef20c9a6e1ddf2648f9d242e6d23d5225bb626d48c42bed908c0eea734
+   data.tar.gz: 0a39172196fa928c1eb10b606c08af12d1d2fd9b624b54aabf8b4ed4b18cc9c8
+ SHA512:
+   metadata.gz: e65bdbe96f9ff1f2f25fa988ec2c94f7f16418ee2b42767f6e6029a8dedc61c1c02854f1f112a36629a62bfcbe3613b6970a7f40949ed074b5caa679d9a86dfb
+   data.tar.gz: bebc66b0ca7aba77c83589ef64403c9fb0673519096fec2d1cd2f635237d30ecfb24b78f2c7105b525cb3725214a299bea20b17141dcccfe8055b64b2ff35d94
data/CHANGELOG.md ADDED
@@ -0,0 +1,8 @@
+ # Changelog
+
+ ## [0.1.0] - 2023-07-04
+
+ ### Added
+
+ - Implemented sanitization on URLs and HTML `href` attributes in `<a>` tags.
+ - Added RSpec tests for the sanitization functionality.
data/README.md ADDED
@@ -0,0 +1,94 @@
+ # DisinfectUrl
+
+ This gem was _heavily_ influenced by Braintree's [sanitize-url](https://github.com/braintree/sanitize-url/tree/main).
+
+ ## Installation
+
+ ```ruby
+ gem 'disinfect_url'
+ ```
+
+ ## Usage
+
+ ### Requirements
+
+ This gem requires Ruby 2.5+.
+
+ ### Basic Usage
+
+ Converts unsafe URLs to `about:blank`:
+
+ ```ruby
+ DisinfectUrl.sanitize("https://example.com")
+ # => https://example.com
+
+ DisinfectUrl.sanitize("http://example.com")
+ # => http://example.com
+
+ DisinfectUrl.sanitize("www.example.com")
+ # => www.example.com
+
+ DisinfectUrl.sanitize("mailto:hello@example.com")
+ # => mailto:hello@example.com
+
+ DisinfectUrl.sanitize("&#104;&#116;&#116;&#112;&#115;&#0000058//&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;")
+ # => https://example.com
+
+ DisinfectUrl.sanitize("javascript:alert(document.domain)")
+ DisinfectUrl.sanitize("jAvasCrIPT:alert(document.domain)")
+ DisinfectUrl.sanitize("JaVaScRiP%0at:alert(document.domain)")
+ # => about:blank
+
+ # HTML-encoded javascript:alert('XSS')
+ DisinfectUrl.sanitize("&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041")
+ # => about:blank
+
+ # Works the same way for the href attribute within <a> tags
+ DisinfectUrl.sanitize(%q(<a href="javascript:alert('attack')">Example</a>))
+ # => <a href="about:blank">Example</a>
+ ```
+
+ ### ActiveRecord Callbacks
+
+ ```ruby
+ before_validation :disinfect_website_url
+ before_validation :disinfect_biography
+
+ private
+
+ def disinfect_website_url
+   self.website_url = DisinfectUrl.sanitize(self.website_url)
+ end
+
+ def disinfect_biography
+   self.biography = DisinfectUrl.sanitize(self.biography)
+ end
+ ```
+
+ ### Validation Note
+
+ This gem does not distinguish HTML from URLs: both are accepted and sanitized. If a field should contain only a URL, you may need to validate it separately. For example, the HTML below is accepted even though `website_url` is meant to hold a URL:
+
+ ```ruby
+ def disinfect_website_url
+   self.website_url = DisinfectUrl.sanitize(%q(<a href="https://example.com">Example</a>))
+ end
+ ```
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ Bug reports and pull requests are welcome on GitHub at https://github.com/stevenjcumming/disinfect_url. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [code of conduct](https://github.com/stevenjcumming/disinfect_url/blob/main/CODE_OF_CONDUCT.md).
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+
+ ## Code of Conduct
+
+ Everyone interacting in the DisinfectUrl project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](https://github.com/stevenjcumming/disinfect_url/blob/main/CODE_OF_CONDUCT.md).
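The entity-encoded examples in the README use zero-padded decimal character references, some without a trailing semicolon. The sketch below shows one way such references could be decoded in plain Ruby. It is an illustration under stated assumptions, not the gem's implementation: `EntityDecodeSketch` and `decode_numeric_entities` are hypothetical names, and only decimal references are handled.

```ruby
# frozen_string_literal: true

# Hypothetical helper, not part of disinfect_url: decodes decimal numeric
# character references such as "&#104;" or "&#0000058" (semicolon optional).
module EntityDecodeSketch
  NUMERIC_ENTITY = /&#(\d+);?/.freeze

  def self.decode_numeric_entities(str)
    str.gsub(NUMERIC_ENTITY) do |match|
      codepoint = Regexp.last_match(1).to_i
      begin
        codepoint.chr(Encoding::UTF_8)
      rescue RangeError
        match # leave references to invalid code points untouched
      end
    end
  end
end

encoded = "&#104;&#116;&#116;&#112;&#115;&#0000058//&#101;&#120;&#97;&#109;&#112;&#108;&#101;&#46;&#99;&#111;&#109;"
puts EntityDecodeSketch.decode_numeric_entities(encoded) # prints https://example.com
```

Passing `Encoding::UTF_8` to `Integer#chr` lets code points above 255 decode, and the `rescue` keeps a malformed reference from raising.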
data/lib/disinfect_url/html_sanitizer.rb ADDED
@@ -0,0 +1,28 @@
+ # frozen_string_literal: true
+
+ require "nokogiri"
+ require "disinfect_url"
+
+ # A class responsible for sanitizing HTML by cleaning up URLs within <a> tags.
+ # Not recommended to be used directly.
+ module DisinfectUrl
+   class HTMLSanitizer
+     class << self
+       # Sanitizes the given HTML by cleaning up URLs in href attributes within <a> tags.
+       #
+       # @param html [String] The HTML to sanitize.
+       # @return [String, nil] The sanitized HTML, or nil if the input is nil or blank.
+       def sanitize(html)
+         return nil if html.nil? || html.to_s.strip.empty?
+
+         fragment = Nokogiri::HTML.fragment(html)
+
+         fragment.css("a").each do |link|
+           link["href"] = DisinfectUrl::URLSanitizer.sanitize(link["href"])
+         end
+
+         fragment.to_html
+       end
+     end
+   end
+ end
data/lib/disinfect_url/url_sanitizer.rb ADDED
@@ -0,0 +1,64 @@
+ # frozen_string_literal: true
+
+ require "cgi"
+
+ # A class responsible for sanitizing URLs.
+ # Not recommended to be used directly.
+ module DisinfectUrl
+   class URLSanitizer
+     INVALID_PROTOCOL_REGEX = /^([^\w]*)(javascript|data|vbscript)/im.freeze
+     HTML_ENTITIES_REGEX = /&#(\w+)(?!\w|;)?/.freeze
+     HTML_CTRL_ENTITY_REGEX = /&(newline|tab);/i.freeze
+     CTRL_CHARACTERS_REGEX = /[\u0000-\u001F\u007F-\u009F\u2000-\u200D\uFEFF]/m.freeze
+     URL_SCHEME_REGEX = /^.+(:|&colon;)/mi.freeze
+     RELATIVE_FIRST_CHARACTERS = [".", "/"].freeze
+
+     class << self
+       # Sanitizes the given URL by removing invalid parts and characters.
+       #
+       # @param url [String] The URL to sanitize.
+       # @return [String, nil] The sanitized URL, or nil if the input is nil or blank.
+       def sanitize(url)
+         return nil if url.nil? || url.to_s.strip.empty?
+
+         sanitized_url = decode_html_characters(url || "")
+                         .gsub(HTML_CTRL_ENTITY_REGEX, "")
+                         .gsub(CTRL_CHARACTERS_REGEX, "")
+                         .strip
+
+         return "about:blank" if sanitized_url.empty?
+
+         return sanitized_url if relative_url_without_protocol?(sanitized_url)
+
+         url_scheme_parse_results = sanitized_url.match(URL_SCHEME_REGEX)
+
+         return sanitized_url unless url_scheme_parse_results
+
+         url_scheme = url_scheme_parse_results[0]
+
+         return "about:blank" if INVALID_PROTOCOL_REGEX.match(url_scheme)
+
+         sanitized_url
+       end
+
+       private
+
+       # Checks if the URL is a relative URL without a protocol.
+       #
+       # @param url [String] The URL to check.
+       # @return [Boolean] `true` if the URL is a relative URL without a protocol, `false` otherwise.
+       def relative_url_without_protocol?(url)
+         RELATIVE_FIRST_CHARACTERS.include?(url[0])
+       end
+
+       # Decodes HTML entities in the given string.
+       #
+       # @param str [String] The string to decode.
+       # @return [String] The decoded string.
+       def decode_html_characters(str)
+         str = CGI.unescapeHTML(str)
+         str.gsub(HTML_ENTITIES_REGEX) { |match| match[2..-1].to_i.chr }
+       end
+     end
+   end
+ end
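The scheme check in `URLSanitizer.sanitize` follows a short pipeline: strip control characters, let relative URLs through, then reject anything whose scheme starts with a blocked name. The sketch below restates that pipeline without the HTML-entity decoding step so the control flow is easier to see. `SchemeCheckSketch`, `safe_url`, and the constant names are hypothetical and not part of the gem's API, and the sketch anchors with `\A` rather than `^`, which is a deliberate difference from the released code.

```ruby
# frozen_string_literal: true

# Hypothetical stand-alone sketch of a scheme blocklist check; the module and
# method names are illustrative and are not part of the gem's API.
module SchemeCheckSketch
  CONTROL_CHARS = /[\u0000-\u001F\u007F-\u009F\u2000-\u200D\uFEFF]/.freeze
  BLOCKED_SCHEME = /\A[^\w]*(javascript|data|vbscript)/i.freeze
  FALLBACK = "about:blank"

  def self.safe_url(url)
    # Remove control and zero-width characters that could split a scheme name.
    cleaned = url.gsub(CONTROL_CHARS, "").strip
    return FALLBACK if cleaned.empty?
    # Relative URLs carry no scheme, so they pass through unchanged.
    return cleaned if cleaned.start_with?(".", "/")

    # Everything up to the last colon; nil when the string has no colon.
    scheme = cleaned[/\A.+:/m]
    return cleaned unless scheme

    scheme.match?(BLOCKED_SCHEME) ? FALLBACK : cleaned
  end
end

puts SchemeCheckSketch.safe_url("jAvasCrIPT:alert(document.domain)") # prints about:blank
puts SchemeCheckSketch.safe_url("mailto:hello@example.com")          # prints mailto:hello@example.com
```

Because the blocklist is matched as a prefix, the sketch (like the regex it mirrors) also rejects any scheme that merely begins with a blocked name.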
data/lib/disinfect_url/version.rb ADDED
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ module DisinfectUrl
+   VERSION = "0.2.0"
+ end
data/lib/disinfect_url.rb ADDED
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ require_relative "disinfect_url/version"
+ require_relative "disinfect_url/html_sanitizer"
+ require_relative "disinfect_url/url_sanitizer"
+
+ # Sanitizes a URL or an HTML string depending on the input.
+ module DisinfectUrl
+   class << self
+     # If the input is a URL (a string), it sanitizes the URL by removing
+     # potentially dangerous elements and characters.
+     #
+     # If the input is an HTML string (a string), it sanitizes the URLs within
+     # <a> tags by removing potentially dangerous elements and attributes.
+     #
+     # @param input [String] The URL or HTML string to be sanitized.
+     # @return [String, nil] The sanitized URL or HTML string, or nil if the input is not a String.
+     def sanitize(input)
+       return unless input.is_a?(String)
+
+       HTMLSanitizer.sanitize(URLSanitizer.sanitize(input))
+     end
+   end
+ end
metadata ADDED
@@ -0,0 +1,138 @@
+ --- !ruby/object:Gem::Specification
+ name: disinfect_url
+ version: !ruby/object:Gem::Version
+   version: 0.2.0
+ platform: ruby
+ authors:
+ - stevenjcumming
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2023-07-04 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: nokogiri
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.11'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.11'
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: rubocop
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.21'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.21'
+ - !ruby/object:Gem::Dependency
+   name: rubocop-rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.22'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.22'
+ - !ruby/object:Gem::Dependency
+   name: yard
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.9.34
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.9.34
+ description: A gem to sanitize URLs or HTML href attributes within <a> tags to help
+   prevent XSS attacks.
+ email:
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - README.md
+ - lib/disinfect_url.rb
+ - lib/disinfect_url/html_sanitizer.rb
+ - lib/disinfect_url/url_sanitizer.rb
+ - lib/disinfect_url/version.rb
+ homepage: https://github.com/stevenjcumming/disinfect_url
+ licenses:
+ - MIT
+ metadata:
+   homepage_uri: https://github.com/stevenjcumming/disinfect_url
+   documentation_uri: https://rubydoc.info/github/stevenjcumming/disinfect_url
+   changelog_uri: https://github.com/stevenjcumming/disinfect_url/blob/main/CHANGELOG.md
+   source_code_uri: https://github.com/stevenjcumming/disinfect_url
+   bug_tracker_uri: https://github.com/stevenjcumming/disinfect_url/issues
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: 2.4.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.4.6
+ signing_key:
+ specification_version: 4
+ summary: A gem to sanitize URLs or HTML
+ test_files: []
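The metadata above pins each dependency with the pessimistic operator `~>`. The sketch below shows how such a constraint could be declared and evaluated with RubyGems. It is a hypothetical reconstruction from the metadata fields: the gem's actual `.gemspec` file is not part of this diff, and only two of the listed dependencies are reproduced.

```ruby
require "rubygems"

# Hypothetical reconstruction from the metadata above; the gem's actual
# .gemspec file is not included in this diff.
spec = Gem::Specification.new do |s|
  s.name     = "disinfect_url"
  s.version  = "0.2.0"
  s.authors  = ["stevenjcumming"]
  s.summary  = "A gem to sanitize URLs or HTML"
  s.licenses = ["MIT"]
  s.required_ruby_version = ">= 2.4.0"

  s.add_dependency "nokogiri", "~> 1.11"
  s.add_development_dependency "rspec", "~> 3.0"
end

# "~> 1.11" accepts any 1.x release from 1.11 upward, but not 2.0.
nokogiri = spec.runtime_dependencies.first.requirement
puts nokogiri.satisfied_by?(Gem::Version.new("1.15.4")) # prints true
puts nokogiri.satisfied_by?(Gem::Version.new("2.0.0"))  # prints false
```

Note that the metadata declares `required_ruby_version` as `>= 2.4.0`, while the README states Ruby 2.5+.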