regexp-examples 1.3.1 → 1.3.2
- checksums.yaml +4 -4
- data/.travis.yml +3 -2
- data/README.md +9 -10
- data/lib/regexp-examples/chargroup_parser.rb +3 -3
- data/lib/regexp-examples/constants.rb +9 -7
- data/lib/regexp-examples/groups.rb +3 -3
- data/lib/regexp-examples/max_results_limiter.rb +1 -1
- data/lib/regexp-examples/parser_helpers/parse_after_backslash_group_helper.rb +22 -20
- data/lib/regexp-examples/parser_helpers/parse_multi_group_helper.rb +10 -11
- data/lib/regexp-examples/repeaters.rb +1 -1
- data/lib/regexp-examples/unicode_char_ranges.rb +3 -3
- data/lib/regexp-examples/version.rb +1 -1
- data/spec/regexp-examples_spec.rb +8 -2
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7ce9ae460670e7c7525a38992990271751c311d4
+  data.tar.gz: f9501da52ba2b57b6ef92324f43ba7252a8d70f8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5fabfe8bf9dcb41d4e1a6ceba7b8c98278ea44f12ea657409821abd25fbf856b11effc726d67d06c0d2c67e728caf24853a89e2e73f26f1c5fba58ea397478e9
+  data.tar.gz: 717755a835ca5f0a611c4a62ba4d8f4b90b7a422ce8d80136f79a97a71ee2f439b027c3fbd42f68ac79e65884940f0004b9f35a72197e8748cc6c7fef2680891
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -15,7 +15,7 @@ or a huge number of possible matches, such as `/.\w/`, then only a subset of the
 
 For more detail on this, see [configuration options](#configuration-options).
 
-If you'd like to understand how/why this gem works, please check out my [blog post](
+If you'd like to understand how/why this gem works, please check out my [blog post](https://tom-lord.github.io/Reverse-Engineering-Regular-Expressions/) about it.
 
 ## Usage
 
@@ -100,12 +100,14 @@ Long answer:
 * Negation, e.g. `/[^a-z]/`
 * Escaped characters, e.g. `/[\w\s\b]/`
 * POSIX bracket expressions, e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
+* ...Taking the current ruby version into account - e.g. the definition of `/[[:punct:]]/`
+[changed](https://bugs.ruby-lang.org/issues/12577) in version `2.4.0`.
 * Set intersection, e.g. `/[[a-h]&&[f-z]]/`
 * Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
 * Capture groups, e.g. `/(group)/`
 * Including named groups, e.g. `/(?<name>group)/`
 * And backreferences(!!!), e.g. `/(this|that) \1/` `/(?<name>foo) \k<name>/`
-* ...even for the more "obscure" syntax, e.g. `/(?<future>the) \k'future'/`, `/(a)(b) \k<-1
+* ...even for the more "obscure" syntax, e.g. `/(?<future>the) \k'future'/`, `/(a)(b) \k<-1>/`
 * ...and even if nested or optional, e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
 * Non-capture groups, e.g. `/(?:foo)/`
 * Comment groups, e.g. `/foo(?#comment)bar/`
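The backreference bullets in this hunk use standard Onigmo syntax. As a quick check of what the corrected `\k<-1>` example means, here is a sketch using only core Ruby's `Regexp#match?` (Ruby 2.4 or later, no gem involved); the `\A`/`\z` anchors are added only so that whole strings are compared:

```ruby
# Relative backreference: \k<-1> refers to the closest group before it, here (b).
RELATIVE = /\A(a)(b) \k<-1>\z/
# Named backreference written with quotes instead of angle brackets.
NAMED = /\A(?<future>the) \k'future'\z/
# Numbered backreferences to nested groups, as in the "nested or optional" bullet.
NESTED = /\A(even(this(works?))) \1 \2 \3\z/

p RELATIVE.match?('ab b') # => true
p RELATIVE.match?('ab a') # => false (\k<-1> is group 2, not group 1)
p NAMED.match?('the the') # => true
p NESTED.match?('eventhisworks eventhisworks thisworks works') # => true
```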
@@ -178,15 +180,12 @@ For instance, the following takes no more than ~ 1 second on my machine:
 There are no known major bugs with this library. However, there are a few obscure issues that you *may* encounter:
 
 * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` are not yet supported. (This example *should* return: `["group1 yes", " no"]`)
-* `\Z` should be interpreted like `\n?\z`; it's currently just interpreted like `\z`. (This basically just means you'll be missing a few examples.)
-* Ideally, `regexp#examples` should always return up to `max_results_limit`. Currenty, it usually "aborts" before this limit is reached.
-(I.e. the exact number of examples generated can be hard to predict, for complex patterns.)
-* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` -
-which includes `"aaaa aa"`. This is because each repeater is not context-aware, so the "greediness" logic is flawed.
-(E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy.)
-However, patterns like this are highly unusual...
 * Nested repeat operators are incorrectly parsed, e.g. `/b{2}{3}/` - which *should* be interpreted like `/b{6}/`. (However, there is probably no reason
 to ever write regexes like this!)
+* A new ["absent operator" (`/(?~exp)/`)](https://medium.com/rubyinside/the-new-absent-operator-in-ruby-s-regular-expressions-7c3ef6cd0b99)
+was added to Ruby version `2.4.1`. This gem does not yet support it (or gracefully fail when used).
+* Ideally, `regexp#examples` should always return up to `max_results_limit`. Currenty, it usually "aborts" before this limit is reached.
+(I.e. the exact number of examples generated can be hard to predict, for complex patterns.)
 
 Some of the most obscure regexp features are not even mentioned in [the ruby docs](http://ruby-doc.org/core/Regexp.html).
 However, full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found
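For reference, the absent operator named in the new README bullet can be tried in plain Ruby 2.4.1 or later, independent of this gem. A minimal sketch: `(?~abc)` matches any string that does not contain `abc`, and the C-style comment pattern is the usual illustration of why that is useful:

```ruby
# (?~abc) matches any string that does NOT contain "abc" as a substring.
NO_ABC = /\A(?~abc)\z/

# A C-style comment: "/*", then anything not containing "*/", then "*/".
C_COMMENT = %r{/\*(?~\*/)\*/}

p NO_ABC.match?('xabx')  # => true
p NO_ABC.match?('xabcx') # => false
p '/* one */ code /* two */'[C_COMMENT] # => "/* one */"
```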
@@ -195,7 +194,7 @@ However, full documentation on all the intricate obscurities in the ruby (versio
 ## Impossible features ("illegal syntax")
 
 The following features in the regex language can never be properly implemented into this gem because, put simply, they are not technically "regular"!
-If you'd like to understand this in more detail, check out what I had to say in [my blog post](
+If you'd like to understand this in more detail, check out what I had to say in [my blog post](https://tom-lord.github.io/Reverse-Engineering-Regular-Expressions/) about this gem.
 
 Using any of the following will raise a `RegexpExamples::IllegalSyntax` exception:
 
@@ -15,7 +15,7 @@ module RegexpExamples
     include CharsetNegationHelper
 
     attr_reader :regexp_string, :current_position
-
+    alias length current_position
 
     def initialize(regexp_string, is_sub_group: false)
       @regexp_string = regexp_string
@@ -85,10 +85,10 @@ module RegexpExamples
 
     def parse_after_backslash
       case next_char
-      when *BackslashCharMap.keys
-        BackslashCharMap[next_char]
       when 'b'
         ["\b"]
+      when *BackslashCharMap.keys
+        BackslashCharMap[next_char]
       else
         [next_char]
       end
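In a Ruby `case`, branches are tried top to bottom and `when *array` tests each element in turn, so moving the literal `'b'` branch ahead of the splat only changes behavior if the map also had a `'b'` key. Putting splat branches last is what RuboCop's `Performance/CaseWhenSplat` cop recommends; whether that motivated this change is an assumption. A minimal sketch with a hypothetical two-entry `CHAR_MAP` standing in for `BackslashCharMap`:

```ruby
# Hypothetical stand-in for BackslashCharMap; the real map has more entries.
CHAR_MAP = {
  'n' => ["\n"],
  't' => ["\t"]
}.freeze

# Returns the characters an escaped char `c` can stand for inside a [...] group.
def chars_after_backslash(c)
  case c
  when 'b'            # literal branch, checked first
    ["\b"]            # inside a character class, \b means backspace
  when *CHAR_MAP.keys # splat branch: tries 'n', then 't'
    CHAR_MAP[c]
  else
    [c]               # any other escaped char stands for itself
  end
end

p chars_after_backslash('b') # => ["\b"]
p chars_after_backslash('n') # => ["\n"]
p chars_after_backslash('.') # => ["."]
```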
@@ -21,10 +21,12 @@ module RegexpExamples
     # This is to prevent the system "freezing" when given instructions like:
     # /[ab]{30}/.examples
     # (Which would attempt to generate 2**30 == 1073741824 examples!!!)
-    MAX_RESULTS_LIMIT_DEFAULT =
+    MAX_RESULTS_LIMIT_DEFAULT = 10_000
     class << self
       attr_reader :max_repeater_variance, :max_group_results, :max_results_limit
-      def configure!(max_repeater_variance: nil,
+      def configure!(max_repeater_variance: nil,
+                     max_group_results: nil,
+                     max_results_limit: nil)
         @max_repeater_variance = (max_repeater_variance || MAX_REPEATER_VARIANCE_DEFAULT)
         @max_group_results = (max_group_results || MAX_GROUP_RESULTS_DEFAULT)
         @max_results_limit = (max_results_limit || MAX_RESULTS_LIMIT_DEFAULT)
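The new `configure!` signature relies on `nil` keyword defaults plus `||`, so omitting an option (or calling `configure!` with no arguments) restores its default. A minimal sketch of that pattern; `LimitsSketch` is a hypothetical class, and only `MAX_RESULTS_LIMIT_DEFAULT = 10_000` comes from the hunk, the other two defaults being placeholders:

```ruby
# Hypothetical stand-in for RegexpExamples::ResultCountLimiters.
class LimitsSketch
  MAX_REPEATER_VARIANCE_DEFAULT = 2  # placeholder value, not taken from the diff
  MAX_GROUP_RESULTS_DEFAULT = 5      # placeholder value, not taken from the diff
  MAX_RESULTS_LIMIT_DEFAULT = 10_000 # the value set in the hunk above

  class << self
    attr_reader :max_repeater_variance, :max_group_results, :max_results_limit

    # Any keyword left as nil falls back to its default via `||`.
    def configure!(max_repeater_variance: nil,
                   max_group_results: nil,
                   max_results_limit: nil)
      @max_repeater_variance = (max_repeater_variance || MAX_REPEATER_VARIANCE_DEFAULT)
      @max_group_results = (max_group_results || MAX_GROUP_RESULTS_DEFAULT)
      @max_results_limit = (max_results_limit || MAX_RESULTS_LIMIT_DEFAULT)
    end
  end
end

LimitsSketch.configure!(max_results_limit: 500)
p LimitsSketch.max_results_limit # => 500
p LimitsSketch.max_group_results # => 5 (the placeholder default)

LimitsSketch.configure! # no arguments: every limit back to its default
p LimitsSketch.max_results_limit # => 10000
```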
@@ -35,9 +37,11 @@ module RegexpExamples
   def self.max_repeater_variance
     ResultCountLimiters.max_repeater_variance
   end
+
   def self.max_group_results
     ResultCountLimiters.max_group_results
   end
+
   def self.max_results_limit
     ResultCountLimiters.max_results_limit
   end
@@ -48,13 +52,11 @@ module RegexpExamples
     Lower = Array('a'..'z')
     Upper = Array('A'..'Z')
     Digit = Array('0'..'9')
-    #
-
-    # However, due to a ruby bug (!!) these do not work properly at the moment!
-    Punct = %w(! " # % & ' ( ) * , - . / : ; ? @ [ \\ ] _ { })
+    Punct = %w[! " # % & ' ( ) * , - . / : ; ? @ [ \\ ] _ { }] \
+            | (RUBY_VERSION >= '2.4.0' ? %w[$ + < = > ^ ` | ~] : [])
     Hex = Array('a'..'f') | Array('A'..'F') | Digit
     Word = Lower | Upper | Digit | ['_']
-    Whitespace = [' ', "\t", "\n", "\r", "\v", "\f"]
+    Whitespace = [' ', "\t", "\n", "\r", "\v", "\f"].freeze
     Control = (0..31).map(&:chr) | ["\x7f"]
     # Ensure that the "common" characters appear first in the array
     # Also, ensure "\n" comes first, to make it obvious when included
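The new `Punct` constant adds nine ASCII symbols (`$`, `+`, `<`, `=`, `>`, `^`, the backtick, `|` and `~`) only on Ruby 2.4.0 or later, matching the README note about the `[[:punct:]]` change. A sketch that rebuilds the same two lists and compares them with what the running interpreter's `/[[:punct:]]/` actually matches over printable ASCII. `expected_punct` is a hypothetical helper, and it uses `Gem::Version` rather than the diff's plain string comparison, which would misorder a version string such as `2.10.0`:

```ruby
require 'rubygems' # for Gem::Version (normally already loaded)

# The same two lists as in the constants.rb hunk above.
BASE_PUNCT  = %w[! " # % & ' ( ) * , - . / : ; ? @ [ \\ ] _ { }].freeze
EXTRA_PUNCT = %w[$ + < = > ^ ` | ~].freeze

# Hypothetical helper: the punctuation set expected for a given Ruby version.
def expected_punct(ruby_version = RUBY_VERSION)
  newer = Gem::Version.new(ruby_version) >= Gem::Version.new('2.4.0')
  BASE_PUNCT | (newer ? EXTRA_PUNCT : [])
end

# What this interpreter's regexp engine treats as [[:punct:]] in printable ASCII.
actual = (32..126).map { |i| i.chr(Encoding::UTF_8) }.grep(/[[:punct:]]/)

p actual.sort == expected_punct.sort # => true on Ruby 2.4.0 or later
p expected_punct('2.3.8').size       # => 23
```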
@@ -131,7 +131,7 @@ module RegexpExamples
       end
     end
 
-
+    alias random_result result
   end
 
   # A boolean "or" group.
@@ -162,7 +162,7 @@ module RegexpExamples
       max_results_limiter = MaxResultsLimiterBySum.new
       repeaters_list
         .map { |repeaters| RegexpExamples.generic_map_result(repeaters, method) }
-        .map { |result| max_results_limiter.limit_results(result)}
+        .map { |result| max_results_limiter.limit_results(result) }
         .inject(:concat)
         .map { |result| GroupResult.new(result) }
         .uniq
@@ -184,7 +184,7 @@ module RegexpExamples
   # of /\1/ as being "__1__". It later gets updated.
   class BackReferenceGroup
     include RandomResultBySample
-    PLACEHOLDER_FORMAT = '__%s__'
+    PLACEHOLDER_FORMAT = '__%s__'.freeze
     attr_reader :id
     def initialize(id)
       @id = id
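`PLACEHOLDER_FORMAT` is a `format` string: per the comment above, a backreference such as `\1` is first represented as `__1__` and filled in later. A minimal sketch of that use and of what the added `.freeze` buys; the `sub` call is illustrative only, not how the gem performs the update:

```ruby
# Same constant as in the groups.rb hunk above.
PLACEHOLDER_FORMAT = '__%s__'.freeze

# Temporary text for backreference \1, before the group's value is known.
placeholder = format(PLACEHOLDER_FORMAT, 1)
p placeholder # => "__1__"

# Illustrative only: swap the placeholder for the captured text afterwards.
p "ab #{placeholder}".sub(placeholder, 'ab') # => "ab ab"

# Without .freeze an in-place edit such as << would silently change the
# constant for every caller; with it, the edit raises instead.
begin
  PLACEHOLDER_FORMAT << 'oops'
rescue RuntimeError => e # FrozenError (Ruby 2.5+) is a RuntimeError subclass
  puts "refused: #{e.class}"
end
```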
@@ -44,7 +44,7 @@ module RegexpExamples
   end
 
   # For example:
-  # Needed when generating examples for /[ab]{10}|{cd}{11}/
+  # Needed when generating examples for /[ab]{10}|{cd}{11}/
   # (here, results_count will reach 1024 + 2048 == 3072)
   class MaxResultsLimiterBySum < MaxResultsLimiter
     def initialize
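Reading `{cd}` in the comment above as the character class `[cd]`, the two alternatives yield `2**10 + 2**11` results, i.e. `1024 + 2048 == 3072`, and a by-sum limiter caps that running total across alternatives. A minimal sketch with a hypothetical `SumLimiterSketch` that takes its limit as an argument; the gem's actual `MaxResultsLimiterBySum` internals are not part of this diff:

```ruby
# Hypothetical sum-based limiter: each call may only use what is left of the budget.
class SumLimiterSketch
  attr_reader :results_count

  def initialize(limit)
    @limit = limit
    @results_count = 0
  end

  # Returns as many of `results` as still fit under the limit.
  def limit_results(results)
    allowed = [@limit - @results_count, 0].max
    kept = results.first(allowed)
    @results_count += kept.length
    kept
  end
end

# Two alternatives producing 2**10 and 2**11 results respectively.
left  = Array.new(2**10, 'a')
right = Array.new(2**11, 'c')

roomy = SumLimiterSketch.new(10_000)
roomy.limit_results(left)
roomy.limit_results(right)
p roomy.results_count # => 3072

tight = SumLimiterSketch.new(2_000)
p tight.limit_results(left).length  # => 1024
p tight.limit_results(right).length # => 976
```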
@@ -5,35 +5,33 @@ module RegexpExamples
 
     def parse_after_backslash_group
       @current_position += 1
-
-      when rest_of_string =~ /\A(\d{1,3})/
+      if rest_of_string =~ /\A(\d{1,3})/
         parse_regular_backreference_group(Regexp.last_match(1))
-
+      elsif rest_of_string =~ /\Ak['<]([\w-]+)['>]/
         parse_named_backreference_group(Regexp.last_match(1))
-
+      elsif BackslashCharMap.keys.include?(next_char)
         parse_backslash_special_char
-
+      elsif rest_of_string =~ /\A(c|C-)(.)/
         parse_backslash_control_char(Regexp.last_match(1), Regexp.last_match(2))
-
+      elsif rest_of_string =~ /\Ax(\h{1,2})/
         parse_backslash_escape_sequence(Regexp.last_match(1))
-
+      elsif rest_of_string =~ /\Au(\h{4}|\{\h{1,4}\})/
         parse_backslash_unicode_sequence(Regexp.last_match(1))
-
+      elsif rest_of_string =~ /\A(p)\{(\^?)([^}]+)\}/i
         parse_backslash_named_property(
           Regexp.last_match(1), Regexp.last_match(2), Regexp.last_match(3)
         )
-
+      elsif next_char == 'K' # Keep (special lookbehind that CAN be supported safely!)
         PlaceHolderGroup.new
-
+      elsif next_char == 'R'
         parse_backslash_linebreak
-
+      elsif next_char == 'g'
         parse_backslash_subexpresion_call
-
+      elsif next_char =~ /[bB]/
         parse_backslash_anchor
-
+      elsif next_char =~ /[AG]/
         parse_backslash_start_of_string
-
-      # TODO: /\Z/ should be treated as /\n?/
+      elsif next_char =~ /[zZ]/
         parse_backslash_end_of_string
       else
         parse_single_char_group(next_char)
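The rewrite above replaces a subject-less `case`/`when` with `if`/`elsif`; both evaluate the conditions top to bottom and run the first truthy branch, so only the style changes. Each `=~` also sets `Regexp.last_match`, which the chosen branch reads. A reduced sketch reusing four of the hunk's patterns; `classify_escape` is hypothetical and returns a tag instead of building groups:

```ruby
# Hypothetical helper: `rest` is the text immediately after a backslash.
def classify_escape(rest)
  if rest =~ /\A(\d{1,3})/
    [:backreference, Regexp.last_match(1)]
  elsif rest =~ /\Ak['<]([\w-]+)['>]/
    [:named_backreference, Regexp.last_match(1)]
  elsif rest =~ /\Ax(\h{1,2})/
    [:hex_escape, Regexp.last_match(1)]
  elsif rest =~ /\Au(\h{4}|\{\h{1,4}\})/
    [:unicode_escape, Regexp.last_match(1)]
  else
    [:single_char, rest[0]]
  end
end

p classify_escape('12 tail') # => [:backreference, "12"]
p classify_escape("k'name'") # => [:named_backreference, "name"]
p classify_escape('x4F')     # => [:hex_escape, "4F"]
p classify_escape('u{e9}')   # => [:unicode_escape, "{e9}"]
p classify_escape('q')       # => [:single_char, "q"]
```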
@@ -112,8 +110,8 @@ module RegexpExamples
     end
 
     def parse_backslash_subexpresion_call
-
-
+      raise IllegalSyntaxError,
+            'Subexpression calls (\\g) cannot be supported, as they are not regular'
     end
 
     def parse_backslash_anchor
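The new `\g` error message says subexpression calls are not regular. The standard illustration, in plain core Ruby: a recursive `\g<name>` call lets a single pattern accept exactly the balanced parenthesis strings, a language that no regular expression in the formal sense can describe. A minimal sketch, no gem involved:

```ruby
# (?<paren> ... ) defines a named group; \g<paren> calls it recursively.
# Accepts one non-empty, balanced nesting of "(" and ")".
BALANCED = /\A(?<paren>\((?:\g<paren>)*\))\z/

p BALANCED.match?('()')     # => true
p BALANCED.match?('(()())') # => true
p BALANCED.match?('(()')    # => false
p BALANCED.match?('())(')   # => false
```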
@@ -130,15 +128,19 @@ module RegexpExamples
 
     def parse_backslash_end_of_string
       if @current_position == (regexp_string.length - 1)
-
+        if next_char == 'z'
+          PlaceHolderGroup.new
+        else # next_char == 'Z'
+          QuestionMarkRepeater.new(SingleCharGroup.new("\n", @ignorecase))
+        end
       else
         raise_anchors_exception!
       end
     end
 
     def raise_anchors_exception!
-
-
+      raise IllegalSyntaxError,
+            "Anchors ('#{next_char}') cannot be supported, as they are not regular"
     end
   end
 end
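The new `'Z'` branch models a trailing `\Z` as an optional newline, i.e. `\n?\z`, which is why the spec further down expects `['test', "test\n"]` for `/test\Z/`. A sketch comparing that reading with Ruby's own anchors over a few candidate strings (core Ruby only):

```ruby
# \z: end of string only. \Z: end of string, or just before a final "\n".
candidates = ['test', "test\n", "test\n\n", 'tests']

z_matches     = candidates.select { |s| s.match?(/\Atest\z/) }
big_z_matches = candidates.select { |s| s.match?(/\Atest\Z/) }

p z_matches     # => ["test"]
p big_z_matches # => ["test", "test\n"]

# Reading a trailing \Z as \n?\z agrees with Ruby on every candidate here.
p candidates.all? { |s| s.match?(/\Atest\Z/) == s.match?(/\Atest\n?\z/) } # => true
```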
@@ -28,15 +28,14 @@ module RegexpExamples
           )?
         /x
       ) do |match|
-
-        when match[1].nil? # e.g. /(normal)/
+        if match[1].nil? # e.g. /(normal)/
           group_id = @num_groups.to_s
-
+        elsif match[2] == ':' # e.g. /(?:nocapture)/
           @current_position += 2
-
+        elsif match[2] == '#' # e.g. /(?#comment)/
           comment_group = rest_of_string.match(/.*?[^\\](?:\\{2})*\)/)[0]
           @current_position += comment_group.length
-
+        elsif match[2] =~ /\A(?=[mix-]+)([mix]*)-?([mix]*)/ # e.g. /(?i-mx)/
           regexp_options_toggle(Regexp.last_match(1), Regexp.last_match(2))
           @num_groups -= 1 # Toggle "groups" should not increase backref group count
           @current_position += $&.length + 1
@@ -45,12 +44,12 @@ module RegexpExamples
           else
             return PlaceHolderGroup.new
           end
-
-
-
-
-
-
+        elsif %w[! =].include?(match[2]) # e.g. /(?=lookahead)/, /(?!neglookahead)/
+          raise IllegalSyntaxError,
+                'Lookaheads are not regular; cannot generate examples'
+        elsif %w[! =].include?(match[3]) # e.g. /(?<=lookbehind)/, /(?<!neglookbehind)/
+          raise IllegalSyntaxError,
+                'Lookbehinds are not regular; cannot generate examples'
         else # e.g. /(?<name>namedgroup)/
           @current_position += (match[3].length + 3)
           group_id = match[3]
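The two new `elsif` branches reject lookaheads and lookbehinds with `IllegalSyntaxError`. Lookbehinds (`(?<=`, `(?<!`) share the `(?<` prefix with named groups, so it is safest to recognise them first. A sketch of that ordering using plain string prefixes; `classify_group` and the local `IllegalSyntaxError` class are hypothetical stand-ins, not the gem's `MatchData`-based parsing:

```ruby
# Hypothetical stand-in for the gem's IllegalSyntaxError.
class IllegalSyntaxError < StandardError; end

# Hypothetical classifier for the start of a "(...)" group. Lookbehinds are
# checked before named groups because both begin with "(?<".
def classify_group(source)
  if source.start_with?('(?=', '(?!')
    raise IllegalSyntaxError, 'Lookaheads are not regular; cannot generate examples'
  elsif source.start_with?('(?<=', '(?<!')
    raise IllegalSyntaxError, 'Lookbehinds are not regular; cannot generate examples'
  elsif source =~ /\A\(\?<([^>]+)>/
    [:named, Regexp.last_match(1)]
  elsif source.start_with?('(?:')
    :non_capture
  else
    :capture
  end
end

p classify_group('(?<word>\w+)') # => [:named, "word"]
p classify_group('(?:x)')        # => :non_capture

begin
  classify_group('(?<=a)b')
rescue IllegalSyntaxError => e
  puts e.message # prints "Lookbehinds are not regular; cannot generate examples"
end
```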
@@ -10,7 +10,7 @@ module RegexpExamples
     # Note: Only the first 128 results are listed, for performance.
     # Also, some groups seem to have no matches (weird!)
     # (Don't care about ruby micro version number)
-    STORE_FILENAME = "unicode_ranges_#{RUBY_VERSION[0..2]}.pstore"
+    STORE_FILENAME = "unicode_ranges_#{RUBY_VERSION[0..2]}.pstore".freeze
 
     attr_reader :range_store
 
@@ -24,7 +24,7 @@ module RegexpExamples
       end
     end
 
-
+    alias [] get
 
     private
 
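`alias [] get` lets callers write `store[key]` in place of `store.get(key)`. A minimal sketch of the idiom with a hypothetical `RangeStoreSketch` class; only the aliasing line mirrors the hunk:

```ruby
# Hypothetical stand-in for UnicodeCharRanges; only the alias mirrors the diff.
class RangeStoreSketch
  def initialize(store)
    @store = store
  end

  def get(key)
    @store.fetch(key, [])
  end

  # `alias` is a keyword: bare method names, no commas or symbols needed.
  alias [] get
end

store = RangeStoreSketch.new('digit' => %w[0 1 2])
p store['digit']    # => ["0", "1", "2"]
p store.get('nope') # => []
```

Unlike a wrapper method, `alias` copies the method as it exists at that point, so redefining `get` afterwards would not change what `[]` does.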
@@ -40,7 +40,7 @@ module RegexpExamples
     def ranges_to_unicode(ranges)
       result = []
       ranges.each do |range|
-        if range.is_a?
+        if range.is_a? Integer # Small hack to increase data compression
           result << hex_to_unicode(range.to_s(16))
         else
           range.each { |num| result << hex_to_unicode(num.to_s(16)) }
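The `Integer` check lets the stored data list a single codepoint as a bare number rather than a one-element range, which is presumably the "data compression" the comment refers to. A sketch of the expansion; `hex_to_unicode` here is a hypothetical implementation via `Array#pack('U')`, since the gem's own version is not shown in this diff:

```ruby
# Hypothetical converter: hex codepoint string -> one UTF-8 character.
def hex_to_unicode(hex)
  [hex.to_i(16)].pack('U')
end

# Expands a compact list where each entry is either a bare Integer codepoint
# or a Range of codepoints (same branching as the hunk above).
def ranges_to_unicode(ranges)
  result = []
  ranges.each do |range|
    if range.is_a? Integer
      result << hex_to_unicode(range.to_s(16))
    else
      range.each { |num| result << hex_to_unicode(num.to_s(16)) }
    end
  end
  result
end

p ranges_to_unicode([0x41, 0x61..0x63]) # => ["A", "a", "b", "c"]
```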
@@ -170,8 +170,9 @@ RSpec.describe Regexp, '#examples' do
         /\Glast-match/,
         /^start/,
         /end$/,
-        /end\z
-        /end\Z/
+        /end\z/
+        # Cannot test /end\Z/ with the generic method here,
+        # as it's a special case. Tested specially below.
       )
     end
 
@@ -303,6 +304,11 @@ RSpec.describe Regexp, '#examples' do
       it { expect(/a{1}?/.examples).to match_array ['', 'a'] }
     end
 
+    context 'end of string' do
+      it { expect(/test\z/.examples).to match_array %w(test) }
+      it { expect(/test\Z/.examples).to match_array ['test', "test\n"] }
+    end
+
     context 'backreferences and escaped octal combined' do
       it do
         expect(/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)? \10\9\8\7\6\5\4\3\2\1/.examples)
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp-examples
 version: !ruby/object:Gem::Version
-  version: 1.3.
+  version: 1.3.2
 platform: ruby
 authors:
 - Tom Lord
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2017-06-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -100,7 +100,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.6.
+rubygems_version: 2.6.12
 signing_key:
 specification_version: 4
 summary: Extends the Regexp class with '#examples' and '#random_example'