url_regex 0.0.2 → 0.0.3
- checksums.yaml +4 -4
- data/README.md +24 -14
- data/lib/url_regex.rb +52 -36
- data/lib/url_regex/version.rb +1 -1
- data/spec/url_regex_spec.rb +18 -1
- data/url_regex.gemspec +1 -1
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 0908e7a609c4471a328ebe341b1cbfab0c231439
|
4
|
+
data.tar.gz: 21ad64e6a6f5f69f864c7a41d87c7559e6c7d714
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 0ce85ec9ee8f114b56637b32ce42ff78a0072806b3c99442f4317675d0917b1f70de123e876e754c812add2d00497711c6aeaf8d5d36fc110ff00ecff6581e33
|
7
|
+
data.tar.gz: bad8abe5797ca1790f1be3cc190ec6ca4d0f04ab3c9f0ee7fc8fb34868473c88662d5d5cc44d778c59b96715c7f27a5eaf70db582cb39d69cd6ac0be7a33b03b
|
data/README.md
CHANGED
@@ -1,18 +1,18 @@
|
|
1
1
|
[](https://travis-ci.org/amogil/url_regex)
|
2
2
|
[](https://codeclimate.com/github/amogil/url_regex)
|
3
|
-
[](https://badge.fury.io/rb/url_regex)
|
4
4
|
[](https://gemnasium.com/github.com/amogil/url_regex)
|
5
5
|
|
6
6
|
# UrlRegex
|
7
7
|
|
8
8
|
Provides the best known regex for validating and extracting URLs.
|
9
|
-
It
|
9
|
+
It builds on amazing work done by [Diego Perini](https://gist.github.com/dperini/729294)
|
10
10
|
and [Mathias Bynens](https://mathiasbynens.be/demo/url-regex).
|
11
11
|
|
12
|
-
Why do we need a gem for this regex?
|
12
|
+
Why do we need a gem for this regex?
|
13
13
|
|
14
|
-
- You don't need to
|
15
|
-
- You
|
14
|
+
- You don't need to follow changes and improvements of original regex.
|
15
|
+
- You can slightly customize the regex: a scheme can be optional, and you can get the regex for validation or parsing.
|
16
16
|
|
17
17
|
## Installation
|
18
18
|
|
@@ -33,12 +33,12 @@ Or install it yourself as:
|
|
33
33
|
Get the regex:
|
34
34
|
|
35
35
|
UrlRegex.get(options)
|
36
|
-
|
36
|
+
|
37
37
|
where options are:
|
38
38
|
|
39
39
|
- `scheme_required` indicates that schema is required, defaults to `true`.
|
40
40
|
|
41
|
-
- `mode` can gets either `:validation` or `:
|
41
|
+
- `mode` can gets either `:validation`, `:parsing` or `:javascript`, defaults to `:validation`.
|
42
42
|
|
43
43
|
`:validation` asks to return the regex for validation, namely, with `\A` prefix, and with `\z` postfix.
|
44
44
|
That means, it matches whole text:
|
@@ -47,17 +47,27 @@ That means, it matches whole text:
|
|
47
47
|
# => false
|
48
48
|
UrlRegex.get(mode: :validation).match('link: https://www.google.com').nil?
|
49
49
|
# => true
|
50
|
-
|
50
|
+
|
51
51
|
`:parsing` asks to return the regex for parsing:
|
52
52
|
|
53
53
|
str = 'links: google.com https://google.com?t=1'
|
54
54
|
str.scan(UrlRegex.get(mode: :parsing))
|
55
55
|
# => ["https://google.com?t=1"]
|
56
|
-
|
56
|
+
|
57
57
|
# schema is not required
|
58
58
|
str.scan(UrlRegex.get(scheme_required: false, mode: :parsing))
|
59
59
|
# => ["google.com", "https://google.com?t=1"]
|
60
60
|
|
61
|
+
`:javascript` asks to return the regex formatted for use in Javascript files or as `pattern` attribute values on HTML inputs. For this purpose, you'd use the `source` method on the Regexp object instance in order to produce a string that Javascript will understand. These examples make use of the Rails `text_field` method to generate HTML input elements.
|
62
|
+
|
63
|
+
regex = UrlRegex.get(mode: :javascript)
|
64
|
+
text_field(:site, :url, pattern: regex.source)
|
65
|
+
# => <input type="text" id="site_url" name="site[url]" pattern="[javascript URL regex]" />
|
66
|
+
|
67
|
+
regex = UrlRegex.get(scheme_required: false, mode: :javascript)
|
68
|
+
text_field(:site, :url, pattern: regex.source)
|
69
|
+
# => <input type="text" id="site_url" name="site[url]" pattern="[javascript URL regex with optional scheme]" />
|
70
|
+
|
61
71
|
`UrlRegex.get` returns regular Ruby's [Regex](http://ruby-doc.org/core-2.0.0/Regexp.html) object,
|
62
72
|
so you can use it as usual.
|
63
73
|
|
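As a rough illustration of why the `:javascript` example above calls `source`: `Regexp#source` returns only the pattern text, without delimiters or flags. The sketch below uses toy patterns (assumptions for illustration, not the gem's regexes) to show that whitespace written under the `x` flag stays in the source string, and that the `i` flag is not carried along.

```ruby
# Toy patterns for illustration only; these are NOT the gem's regexes.
extended = /\A foo \d+ \z/xi   # extended mode: whitespace is ignored when matching in Ruby
compact  = /^foo\d+$/          # no flags, no insignificant whitespace

# Regexp#source returns the pattern text without delimiters or flags.
extended_source = extended.source
compact_source  = compact.source

puts extended_source   # the spaces are still present in the string
puts compact_source

# Flags live on the Regexp object, not in the source string.
case_insensitive = (extended.options & Regexp::IGNORECASE) != 0
puts case_insensitive
```

A string produced this way carries no flags, so a consumer that needs case-insensitive matching on the JavaScript side would presumably have to arrange that separately.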
@@ -66,18 +76,18 @@ All regexes are case-insensitive.
|
|
66
76
|
## FAQ
|
67
77
|
|
68
78
|
Q: Hey, I want to parse HTML, but it doesn't work:
|
69
|
-
|
79
|
+
|
70
80
|
str = '<a href="http://google.com?t=1">Link</a>'
|
71
81
|
str.scan(UrlRegex.get(mode: :parsing))
|
72
82
|
# => "http://google.com?t=1">Link</a>"
|
73
|
-
|
74
|
-
A: Well, you probably know that parsing HTML with regex is
|
83
|
+
|
84
|
+
A: Well, you probably know that parsing HTML with regex is
|
75
85
|
[a bad idea](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags).
|
76
86
|
It requires matching corresponding open and close brackets, that makes the regex even more complicated.
|
77
87
|
|
78
88
|
Q: How can I speed up processing?
|
79
89
|
|
80
|
-
A: Generated regex depends only on options, so you can get the regex only once and cache it.
|
90
|
+
A: Generated regex depends only on options, so you can get the regex only once and cache it.
|
81
91
|
|
82
92
|
## Contributing
|
83
93
|
|
@@ -85,4 +95,4 @@ A: Generated regex depends only on options, so you can get the regex only once a
|
|
85
95
|
2. Create your feature branch (`git checkout -b my-new-feature`)
|
86
96
|
3. Commit your changes (`git commit -am 'Add some feature'`)
|
87
97
|
4. Push to the branch (`git push origin my-new-feature`)
|
88
|
-
5. Create a new Pull Request
|
98
|
+
5. Create a new Pull Request
|
data/lib/url_regex.rb
CHANGED
@@ -3,7 +3,6 @@ require 'url_regex/version'
|
|
3
3
|
# Provides the best known regex for validating and extracting URLs.
|
4
4
|
# It uses amazing job done by [Diego Perini](https://gist.github.com/dperini/729294)
|
5
5
|
# and [Mathias Bynens](https://mathiasbynens.be/demo/url-regex).
|
6
|
-
|
7
6
|
module UrlRegex
|
8
7
|
# Returns the regex for URLs parsing or validating.
|
9
8
|
#
|
@@ -12,49 +11,66 @@ module UrlRegex
|
|
12
11
|
# @return [Regex] regex for parsing or validating
|
13
12
|
def self.get(scheme_required: true, mode: :validation)
|
14
13
|
raise ArgumentError, "wrong mode: #{mode}" if MODES.index(mode).nil?
|
14
|
+
|
15
15
|
scheme = scheme_required ? PROTOCOL_IDENTIFIER : PROTOCOL_IDENTIFIER_OPTIONAL
|
16
|
-
|
16
|
+
|
17
|
+
case mode
|
18
|
+
when :validation
|
19
|
+
/\A#{scheme} #{BASE}\z/xi
|
20
|
+
when :parsing
|
21
|
+
/#{scheme} #{BASE}/xi
|
22
|
+
when :javascript
|
23
|
+
/^#{scheme}#{JAVASCRIPT_BASE}$/
|
24
|
+
end
|
17
25
|
end
|
18
26
|
|
19
27
|
BASE = '
|
20
|
-
|
21
|
-
|
28
|
+
# user:pass authentication
|
29
|
+
(?:\S+(?::\S*)?@)?
|
22
30
|
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
|
51
|
-
|
52
|
-
|
53
|
-
|
31
|
+
(?:
|
32
|
+
# IP address exclusion
|
33
|
+
# private & local networks
|
34
|
+
(?!(?:10|127)(?:\.\d{1,3}){3})
|
35
|
+
(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})
|
36
|
+
(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})
|
37
|
+
# IP address dotted notation octets
|
38
|
+
# excludes loopback network 0.0.0.0
|
39
|
+
# excludes reserved space >= 224.0.0.0
|
40
|
+
# excludes network & broadcast addresses
|
41
|
+
# (first & last IP address of each class)
|
42
|
+
(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])
|
43
|
+
(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}
|
44
|
+
(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))
|
45
|
+
|
|
46
|
+
# host name
|
47
|
+
(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)
|
48
|
+
# domain name
|
49
|
+
(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*
|
50
|
+
# TLD identifier
|
51
|
+
(?:\.(?:[a-z\u00a1-\uffff]{2,}))
|
52
|
+
# TLD may end with dot
|
53
|
+
\.?
|
54
|
+
)
|
55
|
+
|
56
|
+
# port number
|
57
|
+
(?::\d{2,5})?
|
58
|
+
|
59
|
+
# resource path
|
60
|
+
(?:[/?#]\S*)?
|
61
|
+
'.freeze
|
62
|
+
|
63
|
+
JAVASCRIPT_BASE = '
|
64
|
+
(?:\S+(?::\S*)?@)?
|
65
|
+
(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})
|
66
|
+
(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])
|
67
|
+
(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|
|
68
|
+
(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*
|
69
|
+
(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?'.gsub(/\s+/, '').freeze
|
54
70
|
|
55
71
|
PROTOCOL_IDENTIFIER = '(?:(?:https?|ftp)://)'.freeze
|
56
72
|
PROTOCOL_IDENTIFIER_OPTIONAL = '(?:(?:https?|ftp)://)?'.freeze
|
57
|
-
MODES = [:validation, :parsing].freeze
|
73
|
+
MODES = [:validation, :parsing, :javascript].freeze
|
58
74
|
|
59
75
|
private_constant :BASE, :PROTOCOL_IDENTIFIER, :PROTOCOL_IDENTIFIER_OPTIONAL, :MODES
|
60
76
|
end
|
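The three branches of `UrlRegex.get` differ mainly in anchoring and flags: `:validation` uses `\A`/`\z` with `xi`, `:parsing` uses no anchors, and `:javascript` uses `^`/`$` with no flags. In Ruby, `^` and `$` match at line boundaries rather than string boundaries, so the `:javascript` regex behaves differently from the `:validation` one when it is applied to multi-line input from Ruby code. A minimal sketch of that difference, assuming a much simplified stand-in pattern (`SIMPLE_URL` is hypothetical and far less strict than the gem's `BASE`):

```ruby
# Hypothetical, simplified stand-in for the gem's pattern (NOT the real BASE).
SIMPLE_URL = '(?:https?|ftp)://[a-z0-9.-]+\.[a-z]{2,}(?:[/?#]\S*)?'.freeze

validation_like = /\A#{SIMPLE_URL}\z/i  # anchored to the whole string, case-insensitive
javascript_like = /^#{SIMPLE_URL}$/     # anchored to each line in Ruby, case-sensitive

multi_line = "not a url\nhttp://example.com"
upper_case = 'HTTP://EXAMPLE.COM'

multi_line_validation = validation_like.match?(multi_line)  # string does not start with a URL
multi_line_javascript = javascript_like.match?(multi_line)  # second line alone satisfies ^...$
upper_validation      = validation_like.match?(upper_case)
upper_javascript      = javascript_like.match?(upper_case)

puts [multi_line_validation, multi_line_javascript, upper_validation, upper_javascript].inspect
```

If these assumptions carry over to the real patterns, the `:validation` mode would be the more appropriate choice for checks performed in Ruby, with `:javascript` reserved for export via `source`.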
data/lib/url_regex/version.rb
CHANGED
data/spec/url_regex_spec.rb
CHANGED
@@ -6,8 +6,9 @@ describe UrlRegex do
|
|
6
6
|
expect(UrlRegex.get(mode: :parsing)).to be
|
7
7
|
end
|
8
8
|
|
9
|
-
it 'should raise ArgumentError if mode neither :validation nor :parsing' do
|
9
|
+
it 'should raise ArgumentError if mode neither :validation nor :parsing nor :javascript' do
|
10
10
|
expect { UrlRegex.get(mode: nil) }.to raise_error ArgumentError
|
11
|
+
expect { UrlRegex.get(mode: :hahaha) }.to raise_error ArgumentError
|
11
12
|
end
|
12
13
|
end
|
13
14
|
|
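The guard exercised by this spec can be sketched in isolation. The module and method names below are hypothetical; only the `MODES` list and the `raise` line mirror what the diff of `lib/url_regex.rb` shows.

```ruby
# Hypothetical stand-in module; only the guard clause mirrors the diff.
module ModeGuardSketch
  MODES = %i[validation parsing javascript].freeze

  # Returns the mode unchanged, or raises ArgumentError for unknown modes.
  def self.check!(mode)
    raise ArgumentError, "wrong mode: #{mode}" if MODES.index(mode).nil?
    mode
  end
end

# The two invalid values used in the spec above.
rejected = [nil, :hahaha].map do |mode|
  begin
    ModeGuardSketch.check!(mode)
    false
  rescue ArgumentError
    true
  end
end

accepted = ModeGuardSketch::MODES.map { |mode| ModeGuardSketch.check!(mode) }

puts rejected.inspect
puts accepted.inspect
```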
@@ -56,6 +57,10 @@ describe UrlRegex do
|
|
56
57
|
it "should match #{valid_url}" do
|
57
58
|
expect(UrlRegex.get(scheme_required: true)).to match valid_url
|
58
59
|
end
|
60
|
+
|
61
|
+
it "should match #{valid_url} against Javascript regex" do
|
62
|
+
expect(UrlRegex.get(scheme_required: true, mode: :javascript)).to match valid_url
|
63
|
+
end
|
59
64
|
end
|
60
65
|
|
61
66
|
[
|
@@ -100,6 +105,10 @@ describe UrlRegex do
|
|
100
105
|
it "should not match #{invalid_url}" do
|
101
106
|
expect(UrlRegex.get(scheme_required: true)).to_not match invalid_url
|
102
107
|
end
|
108
|
+
|
109
|
+
it "should not match #{invalid_url} against Javascript regex" do
|
110
|
+
expect(UrlRegex.get(scheme_required: true, mode: :javascript)).to_not match invalid_url
|
111
|
+
end
|
103
112
|
end
|
104
113
|
end
|
105
114
|
|
@@ -149,6 +158,10 @@ describe UrlRegex do
|
|
149
158
|
it "should match #{valid_url}" do
|
150
159
|
expect(UrlRegex.get(scheme_required: false)).to match valid_url
|
151
160
|
end
|
161
|
+
|
162
|
+
it "should match #{valid_url} against Javascript regex" do
|
163
|
+
expect(UrlRegex.get(scheme_required: false, mode: :javascript)).to match valid_url
|
164
|
+
end
|
152
165
|
end
|
153
166
|
|
154
167
|
[
|
@@ -191,6 +204,10 @@ describe UrlRegex do
|
|
191
204
|
it "should not match #{invalid_url}" do
|
192
205
|
expect(UrlRegex.get(scheme_required: false)).to_not match invalid_url
|
193
206
|
end
|
207
|
+
|
208
|
+
it "should not match #{invalid_url} against Javascript regex" do
|
209
|
+
expect(UrlRegex.get(scheme_required: false, mode: :javascript)).to_not match invalid_url
|
210
|
+
end
|
194
211
|
end
|
195
212
|
end
|
196
213
|
|
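The IPv4 portion of `BASE` (private-network lookaheads followed by the octet ranges) can also be exercised on its own. In the sketch below the regex fragments are copied from the diff of `lib/url_regex.rb`; the constant names and the sample addresses are assumptions chosen for illustration.

```ruby
# Negative lookaheads copied from BASE: reject private and local networks.
PRIVATE_NETWORKS = [
  '(?!(?:10|127)(?:\.\d{1,3}){3})',
  '(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})',
  '(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})'
].join.freeze

# Octet ranges copied from BASE: first octet 1..223, last octet 1..254.
OCTETS = [
  '(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])',
  '(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}',
  '(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))'
].join.freeze

# Hypothetical helper regex: the IPv4 fragment anchored to the whole string.
PUBLIC_IPV4 = /\A#{PRIVATE_NETWORKS}#{OCTETS}\z/

samples = %w[
  8.8.8.8 172.15.0.1
  10.1.2.3 127.0.0.1 169.254.10.20 192.168.1.1 172.16.0.1
  224.0.0.1 1.2.3.0 1.2.3.255
]

accepted_ips = samples.select { |ip| PUBLIC_IPV4.match?(ip) }
puts accepted_ips.inspect
```

Only the first two samples pass: the others fall into an excluded private range, the reserved space from 224.0.0.0 upward, or end in `.0` or `.255`.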
data/url_regex.gemspec
CHANGED
@@ -6,7 +6,7 @@ require 'url_regex/version'
|
|
6
6
|
Gem::Specification.new do |spec|
|
7
7
|
spec.name = 'url_regex'
|
8
8
|
spec.version = UrlRegex::VERSION
|
9
|
-
spec.authors = ['Alexey Mogilnikov']
|
9
|
+
spec.authors = ['Alexey Mogilnikov', 'Michael Bester']
|
10
10
|
spec.email = ['alexey@mogilnikov.name']
|
11
11
|
spec.summary = 'Provides the best regex for validating or extracting URLs.'
|
12
12
|
spec.description = 'Provides the best regex for validating or extracting URLs.'
|
metadata
CHANGED
@@ -1,14 +1,15 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: url_regex
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Alexey Mogilnikov
|
8
|
+
- Michael Bester
|
8
9
|
autorequire:
|
9
10
|
bindir: bin
|
10
11
|
cert_chain: []
|
11
|
-
date:
|
12
|
+
date: 2017-01-22 00:00:00.000000000 Z
|
12
13
|
dependencies:
|
13
14
|
- !ruby/object:Gem::Dependency
|
14
15
|
name: bundler
|
@@ -104,7 +105,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
104
105
|
version: '0'
|
105
106
|
requirements: []
|
106
107
|
rubyforge_project:
|
107
|
-
rubygems_version: 2.
|
108
|
+
rubygems_version: 2.4.8
|
108
109
|
signing_key:
|
109
110
|
specification_version: 4
|
110
111
|
summary: Provides the best regex for validating or extracting URLs.
|