url_regex 0.0.2 → 0.0.3
- checksums.yaml +4 -4
- data/README.md +24 -14
- data/lib/url_regex.rb +52 -36
- data/lib/url_regex/version.rb +1 -1
- data/spec/url_regex_spec.rb +18 -1
- data/url_regex.gemspec +1 -1
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 0908e7a609c4471a328ebe341b1cbfab0c231439
+data.tar.gz: 21ad64e6a6f5f69f864c7a41d87c7559e6c7d714
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 0ce85ec9ee8f114b56637b32ce42ff78a0072806b3c99442f4317675d0917b1f70de123e876e754c812add2d00497711c6aeaf8d5d36fc110ff00ecff6581e33
+data.tar.gz: bad8abe5797ca1790f1be3cc190ec6ca4d0f04ab3c9f0ee7fc8fb34868473c88662d5d5cc44d778c59b96715c7f27a5eaf70db582cb39d69cd6ac0be7a33b03b
data/README.md
CHANGED
@@ -1,18 +1,18 @@
 [![Build Status](https://travis-ci.org/amogil/url_regex.svg?branch=master)](https://travis-ci.org/amogil/url_regex)
 [![Code Climate](https://codeclimate.com/github/amogil/url_regex/badges/gpa.svg)](https://codeclimate.com/github/amogil/url_regex)
-[![
+[![Gem Version](https://badge.fury.io/rb/url_regex.svg)](https://badge.fury.io/rb/url_regex)
 [![Dependency Status](https://gemnasium.com/badges/github.com/amogil/url_regex.svg)](https://gemnasium.com/github.com/amogil/url_regex)
 
 # UrlRegex
 
 Provides the best known regex for validating and extracting URLs.
-It
+It builds on amazing work done by [Diego Perini](https://gist.github.com/dperini/729294)
 and [Mathias Bynens](https://mathiasbynens.be/demo/url-regex).
 
-Why do we need a gem for this regex?
+Why do we need a gem for this regex?
 
-- You don't need to
-- You
+- You don't need to follow changes and improvements of original regex.
+- You can slightly customize the regex: a scheme can be optional, and you can get the regex for validation or parsing.
 
 ## Installation
 
@@ -33,12 +33,12 @@ Or install it yourself as:
 Get the regex:
 
 UrlRegex.get(options)
-
+
 where options are:
 
 - `scheme_required` indicates that schema is required, defaults to `true`.
 
-- `mode` can gets either `:validation` or `:
+- `mode` can gets either `:validation`, `:parsing` or `:javascript`, defaults to `:validation`.
 
 `:validation` asks to return the regex for validation, namely, with `\A` prefix, and with `\z` postfix.
 That means, it matches whole text:
@@ -47,17 +47,27 @@ That means, it matches whole text:
 # => false
 UrlRegex.get(mode: :validation).match('link: https://www.google.com').nil?
 # => true
-
+
 `:parsing` asks to return the regex for parsing:
 
 str = 'links: google.com https://google.com?t=1'
 str.scan(UrlRegex.get(mode: :parsing))
 # => ["https://google.com?t=1"]
-
+
 # schema is not required
 str.scan(UrlRegex.get(scheme_required: false, mode: :parsing))
 # => ["google.com", "https://google.com?t=1"]
 
+`:javascript` asks to return the regex formatted for use in Javascript files or as `pattern` attribute values on HTML inputs. For this purpose, you'd use the `source` method on the Regexp object instance in order to produce a string that Javascript will understand. These examples make use of the Rails `text_field` method to generate HTML input elements.
+
+regex = UrlRegex.get(mode: :javascript)
+text_field(:site, :url, pattern: regex.source)
+# => <input type="text" id="site_url" name="site[url]" pattern="[javascript URL regex]" />
+
+regex = UrlRegex.get(scheme_required: false, mode: :javascript)
+text_field(:site, :url, pattern: regex.source)
+# => <input type="text" id="site_url" name="site[url]" pattern="[javascript URL regex with optional scheme]" />
+
 `UrlRegex.get` returns regular Ruby's [Regex](http://ruby-doc.org/core-2.0.0/Regexp.html) object,
 so you can use it as usual.
 
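Since the new README text relies on `Regexp#source` for the `:javascript` mode, it may help to see what that method actually returns. The sketch below uses a hypothetical, heavily simplified pattern (not the gem's regex) and a hand-built HTML string instead of the Rails `text_field` helper, so it runs without the gem or Rails. `source` returns only the pattern text, so flags such as `i` are not carried along.

```ruby
# Hypothetical, simplified stand-in for the regex returned by
# UrlRegex.get(mode: :javascript); the real pattern is far longer.
simplified = %r{^(?:(?:https?|ftp)://)\S+$}i

# Regexp#source returns the pattern text without delimiters or flags.
pattern_attribute = simplified.source

# Build the kind of HTML a form helper might emit (plain string here, no Rails).
html = %(<input type="text" name="site[url]" pattern="#{pattern_attribute}" />)

# Flags are not part of `source`, so a regex rebuilt from it is case-sensitive.
rebuilt = Regexp.new(pattern_attribute)

puts html
puts rebuilt.match?('HTTP://EXAMPLE.COM')
```

If case-insensitive matching matters on the client side, it would have to be handled there, because the string handed over by `source` carries no flags.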
@@ -66,18 +76,18 @@ All regexes are case-insensitive.
 ## FAQ
 
 Q: Hey, I want to parse HTML, but it doesn't work:
-
+
 str = '<a href="http://google.com?t=1">Link</a>'
 str.scan(UrlRegex.get(mode: :parsing))
 # => "http://google.com?t=1">Link</a>"
-
-A: Well, you probably know that parsing HTML with regex is
+
+A: Well, you probably know that parsing HTML with regex is
 [a bad idea](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags).
 It requires matching corresponding open and close brackets, that makes the regex even more complicated.
 
 Q: How can I speed up processing?
 
-A: Generated regex depends only on options, so you can get the regex only once and cache it.
+A: Generated regex depends only on options, so you can get the regex only once and cache it.
 
 ## Contributing
 
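The FAQ's caching advice in this hunk can be sketched as a small memoizing wrapper keyed by the two options. Everything here is an assumption made for illustration: `RegexCache`, `fetch`, `build` and the simplified pattern are hypothetical names, and `build` stands in for `UrlRegex.get` so the example runs without the gem.

```ruby
# Hypothetical cache keyed by [scheme_required, mode]; each distinct option
# pair is built once and the same Regexp object is reused afterwards.
module RegexCache
  @cache = {}
  @builds = 0

  class << self
    # Number of times a regex was actually built (for demonstration only).
    attr_reader :builds

    def fetch(scheme_required: true, mode: :validation)
      key = [scheme_required, mode]
      @cache[key] ||= begin
        @builds += 1
        build(scheme_required, mode)
      end
    end

    private

    # Stand-in for UrlRegex.get with a much simpler pattern.
    def build(scheme_required, mode)
      scheme = scheme_required ? '(?:(?:https?|ftp)://)' : '(?:(?:https?|ftp)://)?'
      body = "#{scheme}\\S+"
      mode == :validation ? /\A#{body}\z/i : /#{body}/i
    end
  end
end
```

When only one option combination is ever needed, assigning the result of `UrlRegex.get` to a constant once would likely be simpler than a keyed cache.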
@@ -85,4 +95,4 @@ A: Generated regex depends only on options, so you can get the regex only once a
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
-5. Create a new Pull Request
+5. Create a new Pull Request
data/lib/url_regex.rb
CHANGED
@@ -3,7 +3,6 @@ require 'url_regex/version'
 # Provides the best known regex for validating and extracting URLs.
 # It uses amazing job done by [Diego Perini](https://gist.github.com/dperini/729294)
 # and [Mathias Bynens](https://mathiasbynens.be/demo/url-regex).
-
 module UrlRegex
   # Returns the regex for URLs parsing or validating.
   #
@@ -12,49 +11,66 @@ module UrlRegex
   # @return [Regex] regex for parsing or validating
   def self.get(scheme_required: true, mode: :validation)
     raise ArgumentError, "wrong mode: #{mode}" if MODES.index(mode).nil?
+
     scheme = scheme_required ? PROTOCOL_IDENTIFIER : PROTOCOL_IDENTIFIER_OPTIONAL
-
+
+    case mode
+    when :validation
+      /\A#{scheme} #{BASE}\z/xi
+    when :parsing
+      /#{scheme} #{BASE}/xi
+    when :javascript
+      /^#{scheme}#{JAVASCRIPT_BASE}$/
+    end
   end
 
   BASE = '
-[old lines 20-21 removed; content not shown in this view]
+    # user:pass authentication
+    (?:\S+(?::\S*)?@)?
 
-[old lines 23-53 removed; content not shown in this view]
+    (?:
+      # IP address exclusion
+      # private & local networks
+      (?!(?:10|127)(?:\.\d{1,3}){3})
+      (?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})
+      (?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})
+      # IP address dotted notation octets
+      # excludes loopback network 0.0.0.0
+      # excludes reserved space >= 224.0.0.0
+      # excludes network & broadcast addresses
+      # (first & last IP address of each class)
+      (?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])
+      (?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}
+      (?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))
+    |
+      # host name
+      (?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)
+      # domain name
+      (?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*
+      # TLD identifier
+      (?:\.(?:[a-z\u00a1-\uffff]{2,}))
+      # TLD may end with dot
+      \.?
+    )
+
+    # port number
+    (?::\d{2,5})?
+
+    # resource path
+    (?:[/?#]\S*)?
+  '.freeze
+
+  JAVASCRIPT_BASE = '
+    (?:\S+(?::\S*)?@)?
+    (?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})
+    (?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])
+    (?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|
+    (?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*
+    (?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?'.gsub(/\s+/, '').freeze
 
   PROTOCOL_IDENTIFIER = '(?:(?:https?|ftp)://)'.freeze
   PROTOCOL_IDENTIFIER_OPTIONAL = '(?:(?:https?|ftp)://)?'.freeze
-  MODES = [:validation, :parsing].freeze
+  MODES = [:validation, :parsing, :javascript].freeze
 
   private_constant :BASE, :PROTOCOL_IDENTIFIER, :PROTOCOL_IDENTIFIER_OPTIONAL, :MODES
 end
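To illustrate how the negative lookaheads in `BASE` reject private and reserved addresses, here is a minimal sketch that copies only the IPv4 portion of the pattern from this diff and anchors it with `\A` and `\z` so it can be tried on bare addresses. The constant name `PUBLIC_IPV4` is hypothetical and is not part of the gem.

```ruby
# IPv4 portion copied from the BASE constant shown in the diff.
# PUBLIC_IPV4 is a hypothetical name used only for this sketch.
PUBLIC_IPV4 = /\A
  (?!(?:10|127)(?:\.\d{1,3}){3})                   # rejects 10.x.x.x and 127.x.x.x
  (?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})        # rejects 169.254.x.x and 192.168.x.x
  (?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})   # rejects 172.16.x.x to 172.31.x.x
  (?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])               # first octet: 1 to 223
  (?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}          # middle octets: 0 to 255
  (?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))        # last octet: 1 to 254
\z/x

%w[8.8.8.8 10.0.0.1 192.168.1.1 172.16.0.1 224.0.0.1 8.8.8.0].each do |ip|
  puts "#{ip}: #{PUBLIC_IPV4.match?(ip)}"
end
```

Of the sample addresses, only `8.8.8.8` passes: the others are caught either by a lookahead (private ranges) or by the octet limits (first octet above 223, last octet 0).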
data/lib/url_regex/version.rb
CHANGED
data/spec/url_regex_spec.rb
CHANGED
@@ -6,8 +6,9 @@ describe UrlRegex do
       expect(UrlRegex.get(mode: :parsing)).to be
     end
 
-    it 'should raise ArgumentError if mode neither :validation nor :parsing' do
+    it 'should raise ArgumentError if mode neither :validation nor :parsing nor :javascript' do
       expect { UrlRegex.get(mode: nil) }.to raise_error ArgumentError
+      expect { UrlRegex.get(mode: :hahaha) }.to raise_error ArgumentError
     end
   end
 
@@ -56,6 +57,10 @@ describe UrlRegex do
       it "should match #{valid_url}" do
         expect(UrlRegex.get(scheme_required: true)).to match valid_url
       end
+
+      it "should match #{valid_url} against Javascript regex" do
+        expect(UrlRegex.get(scheme_required: true, mode: :javascript)).to match valid_url
+      end
     end
 
     [
@@ -100,6 +105,10 @@ describe UrlRegex do
       it "should not match #{invalid_url}" do
         expect(UrlRegex.get(scheme_required: true)).to_not match invalid_url
       end
+
+      it "should not match #{invalid_url} against Javascript regex" do
+        expect(UrlRegex.get(scheme_required: true, mode: :javascript)).to_not match invalid_url
+      end
     end
   end
 
@@ -149,6 +158,10 @@ describe UrlRegex do
       it "should match #{valid_url}" do
         expect(UrlRegex.get(scheme_required: false)).to match valid_url
       end
+
+      it "should match #{valid_url} against Javascript regex" do
+        expect(UrlRegex.get(scheme_required: false, mode: :javascript)).to match valid_url
+      end
     end
 
     [
@@ -191,6 +204,10 @@ describe UrlRegex do
       it "should not match #{invalid_url}" do
         expect(UrlRegex.get(scheme_required: false)).to_not match invalid_url
       end
+
+      it "should not match #{invalid_url} against Javascript regex" do
+        expect(UrlRegex.get(scheme_required: false, mode: :javascript)).to_not match invalid_url
+      end
     end
   end
 
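The new examples run each URL against the `:javascript` regex as well, which is built with `^` and `$` rather than `\A` and `\z`. In Ruby those anchors match at line boundaries, whereas in JavaScript (without the `m` flag) they match only at the start and end of the input. Assuming the spec URLs are single-line strings, the examples would not reveal that difference. The sketch below shows it with a hypothetical, simplified URL body rather than the gem's pattern; the `i` flag is added to both regexes only to keep the comparison about anchors.

```ruby
# Hypothetical, simplified URL body; the gem's BASE pattern is far stricter.
body = '(?:https?|ftp)://\S+'

whole_string = /\A#{body}\z/i   # anchors used by the :validation branch
per_line     = /^#{body}$/i     # anchors used by the :javascript branch

single_line = 'http://example.com'
multi_line  = "not a url\nhttp://example.com"

puts whole_string.match?(single_line)  # both anchor styles accept a lone URL
puts per_line.match?(single_line)
puts whole_string.match?(multi_line)   # \A and \z require the whole string to be a URL
puts per_line.match?(multi_line)       # ^ and $ accept a URL on any single line
```

For that reason the `:javascript` regex is probably best kept for its stated purpose (exporting `source` to JavaScript or HTML) and `:validation` used for server-side checks in Ruby.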
data/url_regex.gemspec
CHANGED
@@ -6,7 +6,7 @@ require 'url_regex/version'
 Gem::Specification.new do |spec|
   spec.name = 'url_regex'
   spec.version = UrlRegex::VERSION
-  spec.authors = ['Alexey Mogilnikov']
+  spec.authors = ['Alexey Mogilnikov', 'Michael Bester']
   spec.email = ['alexey@mogilnikov.name']
   spec.summary = 'Provides the best regex for validating or extracting URLs.'
   spec.description = 'Provides the best regex for validating or extracting URLs.'
metadata
CHANGED
@@ -1,14 +1,15 @@
 --- !ruby/object:Gem::Specification
 name: url_regex
 version: !ruby/object:Gem::Version
-version: 0.0.
+version: 0.0.3
 platform: ruby
 authors:
 - Alexey Mogilnikov
+- Michael Bester
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2017-01-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: bundler
@@ -104,7 +105,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.4.8
 signing_key:
 specification_version: 4
 summary: Provides the best regex for validating or extracting URLs.