regexp-examples 1.3.1 → 1.3.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.travis.yml +3 -2
- data/README.md +9 -10
- data/lib/regexp-examples/chargroup_parser.rb +3 -3
- data/lib/regexp-examples/constants.rb +9 -7
- data/lib/regexp-examples/groups.rb +3 -3
- data/lib/regexp-examples/max_results_limiter.rb +1 -1
- data/lib/regexp-examples/parser_helpers/parse_after_backslash_group_helper.rb +22 -20
- data/lib/regexp-examples/parser_helpers/parse_multi_group_helper.rb +10 -11
- data/lib/regexp-examples/repeaters.rb +1 -1
- data/lib/regexp-examples/unicode_char_ranges.rb +3 -3
- data/lib/regexp-examples/version.rb +1 -1
- data/spec/regexp-examples_spec.rb +8 -2
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7ce9ae460670e7c7525a38992990271751c311d4
+  data.tar.gz: f9501da52ba2b57b6ef92324f43ba7252a8d70f8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 5fabfe8bf9dcb41d4e1a6ceba7b8c98278ea44f12ea657409821abd25fbf856b11effc726d67d06c0d2c67e728caf24853a89e2e73f26f1c5fba58ea397478e9
+  data.tar.gz: 717755a835ca5f0a611c4a62ba4d8f4b90b7a422ce8d80136f79a97a71ee2f439b027c3fbd42f68ac79e65884940f0004b9f35a72197e8748cc6c7fef2680891
data/.travis.yml
CHANGED
data/README.md
CHANGED
@@ -15,7 +15,7 @@ or a huge number of possible matches, such as `/.\w/`, then only a subset of the
 
 For more detail on this, see [configuration options](#configuration-options).
 
-If you'd like to understand how/why this gem works, please check out my [blog post](
+If you'd like to understand how/why this gem works, please check out my [blog post](https://tom-lord.github.io/Reverse-Engineering-Regular-Expressions/) about it.
 
 ## Usage
 
@@ -100,12 +100,14 @@ Long answer:
 * Negation, e.g. `/[^a-z]/`
 * Escaped characters, e.g. `/[\w\s\b]/`
 * POSIX bracket expressions, e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
+* ...Taking the current ruby version into account - e.g. the definition of `/[[:punct:]]/`
+[changed](https://bugs.ruby-lang.org/issues/12577) in version `2.4.0`.
 * Set intersection, e.g. `/[[a-h]&&[f-z]]/`
 * Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
 * Capture groups, e.g. `/(group)/`
 * Including named groups, e.g. `/(?<name>group)/`
 * And backreferences(!!!), e.g. `/(this|that) \1/` `/(?<name>foo) \k<name>/`
-* ...even for the more "obscure" syntax, e.g. `/(?<future>the) \k'future'/`, `/(a)(b) \k<-1
+* ...even for the more "obscure" syntax, e.g. `/(?<future>the) \k'future'/`, `/(a)(b) \k<-1>/`
 * ...and even if nested or optional, e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
 * Non-capture groups, e.g. `/(?:foo)/`
 * Comment groups, e.g. `/foo(?#comment)bar/`
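The backreference forms listed in this README hunk are ordinary Ruby regexp syntax. The sketch below uses only plain Ruby, not the gem. It checks which strings a few of these patterns accept, anchored with `\A`/`\z` here so that only whole strings count.

```ruby
# Plain Ruby, no gem involved: strings each README-style pattern should accept.
accepted = {
  /\A(this|that) \1\z/ => ['this this', 'that that'],
  /\A(?<future>the) \k'future'\z/ => ['the the'],
  /\A(a)(b) \k<-1>\z/ => ['ab b'], # \k<-1> refers back to the nearest preceding group, (b)
  /\A(even(this(works?))) \1 \2 \3\z/ => ['eventhisworks eventhisworks thisworks works']
}

all_accepted = accepted.all? { |re, strings| strings.all? { |s| re.match?(s) } }

# A backreference repeats the captured text; it does not re-run the alternation.
mixed_rejected = !/\A(this|that) \1\z/.match?('this that')

puts all_accepted && mixed_rejected # => true
```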
@@ -178,15 +180,12 @@ For instance, the following takes no more than ~ 1 second on my machine:
 There are no known major bugs with this library. However, there are a few obscure issues that you *may* encounter:
 
 * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` are not yet supported. (This example *should* return: `["group1 yes", " no"]`)
-* `\Z` should be interpreted like `\n?\z`; it's currently just interpreted like `\z`. (This basically just means you'll be missing a few examples.)
-* Ideally, `regexp#examples` should always return up to `max_results_limit`. Currenty, it usually "aborts" before this limit is reached.
-(I.e. the exact number of examples generated can be hard to predict, for complex patterns.)
-* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` -
-which includes `"aaaa aa"`. This is because each repeater is not context-aware, so the "greediness" logic is flawed.
-(E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy.)
-However, patterns like this are highly unusual...
 * Nested repeat operators are incorrectly parsed, e.g. `/b{2}{3}/` - which *should* be interpreted like `/b{6}/`. (However, there is probably no reason
 to ever write regexes like this!)
+* A new ["absent operator" (`/(?~exp)/`)](https://medium.com/rubyinside/the-new-absent-operator-in-ruby-s-regular-expressions-7c3ef6cd0b99)
+was added to Ruby version `2.4.1`. This gem does not yet support it (or gracefully fail when used).
+* Ideally, `regexp#examples` should always return up to `max_results_limit`. Currenty, it usually "aborts" before this limit is reached.
+(I.e. the exact number of examples generated can be hard to predict, for complex patterns.)
 
 Some of the most obscure regexp features are not even mentioned in [the ruby docs](http://ruby-doc.org/core/Regexp.html).
 However, full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found
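Two of the unsupported features named in this hunk are ordinary Ruby regexp syntax: conditional groups and the absent operator. As a plain-Ruby sketch, independent of the gem, this is what they accept. The conditional pattern reproduces the README's expected `["group1 yes", " no"]`. The absent-operator line needs Ruby 2.4.1 or later to even parse.

```ruby
# Plain Ruby; the gem is not involved. The absent operator needs Ruby >= 2.4.1.
# Conditional group: "yes" is required only if group 1 took part in the match.
conditional = /\A(group1)? (?(1)yes|no)\z/
candidates = ['group1 yes', ' no', 'group1 no', ' yes']
conditional_matches = candidates.select { |s| conditional.match?(s) }
puts conditional_matches.inspect # => ["group1 yes", " no"]

# Absent operator: (?~\*\/) matches any run of text that does not contain "*/".
c_comment = /\A\/\*(?~\*\/)\*\/\z/
single_ok = c_comment.match?('/* one comment */')
double_ok = c_comment.match?('/* a */ b */')
puts single_ok # => true
puts double_ok # => false
```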
@@ -195,7 +194,7 @@ However, full documentation on all the intricate obscurities in the ruby (versio
 ## Impossible features ("illegal syntax")
 
 The following features in the regex language can never be properly implemented into this gem because, put simply, they are not technically "regular"!
-If you'd like to understand this in more detail, check out what I had to say in [my blog post](
+If you'd like to understand this in more detail, check out what I had to say in [my blog post](https://tom-lord.github.io/Reverse-Engineering-Regular-Expressions/) about this gem.
 
 Using any of the following will raise a `RegexpExamples::IllegalSyntax` exception:
 
data/lib/regexp-examples/chargroup_parser.rb
CHANGED
@@ -15,7 +15,7 @@ module RegexpExamples
 include CharsetNegationHelper
 
 attr_reader :regexp_string, :current_position
-
+alias length current_position
 
 def initialize(regexp_string, is_sub_group: false)
 @regexp_string = regexp_string
@@ -85,10 +85,10 @@ module RegexpExamples
 
 def parse_after_backslash
 case next_char
-when *BackslashCharMap.keys
-BackslashCharMap[next_char]
 when 'b'
 ["\b"]
+when *BackslashCharMap.keys
+BackslashCharMap[next_char]
 else
 [next_char]
 end
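This hunk moves `when 'b'` above the `when *BackslashCharMap.keys` splat. In a Ruby `case`, branches are tested top to bottom and the first match wins. Whether the real `BackslashCharMap` contains `'b'` is not shown in this diff. The table below is therefore a purely hypothetical stand-in, used only to show how branch order decides the result when keys overlap.

```ruby
# Hypothetical lookup table: NOT the gem's real BackslashCharMap.
CHAR_MAP = {
  'd' => %w[0 1 2],
  'b' => ['<boundary>'] # pretend the table also had a meaning for 'b'
}.freeze

# New ordering: the specific 'b' branch is checked first, so it wins.
def specific_first(char)
  case char
  when 'b' then ["\b"]
  when *CHAR_MAP.keys then CHAR_MAP[char]
  else [char]
  end
end

# Old ordering: the splat branch matches 'b' first and shadows the branch below it.
def splat_first(char)
  case char
  when *CHAR_MAP.keys then CHAR_MAP[char]
  when 'b' then ["\b"]
  else [char]
  end
end

p specific_first('b') # => ["\b"]
p splat_first('b')    # => ["<boundary>"]
```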
data/lib/regexp-examples/constants.rb
CHANGED
@@ -21,10 +21,12 @@ module RegexpExamples
 # This is to prevent the system "freezing" when given instructions like:
 # /[ab]{30}/.examples
 # (Which would attempt to generate 2**30 == 1073741824 examples!!!)
-MAX_RESULTS_LIMIT_DEFAULT =
+MAX_RESULTS_LIMIT_DEFAULT = 10_000
 class << self
 attr_reader :max_repeater_variance, :max_group_results, :max_results_limit
-def configure!(max_repeater_variance: nil,
+def configure!(max_repeater_variance: nil,
+max_group_results: nil,
+max_results_limit: nil)
 @max_repeater_variance = (max_repeater_variance || MAX_REPEATER_VARIANCE_DEFAULT)
 @max_group_results = (max_group_results || MAX_GROUP_RESULTS_DEFAULT)
 @max_results_limit = (max_results_limit || MAX_RESULTS_LIMIT_DEFAULT)
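The `configure!` method in this hunk falls back to a default via `||` for every keyword left as `nil`. One consequence is easy to miss: each call resets any omitted setting to its default rather than keeping the previous value. The sketch below shows this with a hypothetical stand-in module. Only `10_000` comes from the diff; the other two defaults are assumed values for illustration.

```ruby
# Hypothetical stand-in for the configuration module shown in the diff.
module LimitsSketch
  MAX_REPEATER_VARIANCE_DEFAULT = 2  # assumed value, for illustration only
  MAX_GROUP_RESULTS_DEFAULT = 5      # assumed value, for illustration only
  MAX_RESULTS_LIMIT_DEFAULT = 10_000 # value introduced by this diff

  class << self
    attr_reader :max_repeater_variance, :max_group_results, :max_results_limit

    # Any keyword left as nil falls back to its default via `||`.
    def configure!(max_repeater_variance: nil,
                   max_group_results: nil,
                   max_results_limit: nil)
      @max_repeater_variance = (max_repeater_variance || MAX_REPEATER_VARIANCE_DEFAULT)
      @max_group_results = (max_group_results || MAX_GROUP_RESULTS_DEFAULT)
      @max_results_limit = (max_results_limit || MAX_RESULTS_LIMIT_DEFAULT)
    end
  end
end

LimitsSketch.configure!(max_group_results: 20)
LimitsSketch.configure!(max_results_limit: 500)
p LimitsSketch.max_group_results # => 5 (reset, because the second call omitted it)
p LimitsSketch.max_results_limit # => 500

# The comment's example, /[ab]{30}/, would otherwise need 2**30 results.
p 2**30 > LimitsSketch::MAX_RESULTS_LIMIT_DEFAULT # => true
```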
@@ -35,9 +37,11 @@ module RegexpExamples
 def self.max_repeater_variance
 ResultCountLimiters.max_repeater_variance
 end
+
 def self.max_group_results
 ResultCountLimiters.max_group_results
 end
+
 def self.max_results_limit
 ResultCountLimiters.max_results_limit
 end
@@ -48,13 +52,11 @@ module RegexpExamples
 Lower = Array('a'..'z')
 Upper = Array('A'..'Z')
 Digit = Array('0'..'9')
-#
-
-# However, due to a ruby bug (!!) these do not work properly at the moment!
-Punct = %w(! " # % & ' ( ) * , - . / : ; ? @ [ \\ ] _ { })
+Punct = %w[! " # % & ' ( ) * , - . / : ; ? @ [ \\ ] _ { }] \
+| (RUBY_VERSION >= '2.4.0' ? %w[$ + < = > ^ ` | ~] : [])
 Hex = Array('a'..'f') | Array('A'..'F') | Digit
 Word = Lower | Upper | Digit | ['_']
-Whitespace = [' ', "\t", "\n", "\r", "\v", "\f"]
+Whitespace = [' ', "\t", "\n", "\r", "\v", "\f"].freeze
 Control = (0..31).map(&:chr) | ["\x7f"]
 # Ensure that the "common" characters appear first in the array
 # Also, ensure "\n" comes first, to make it obvious when included
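The new `Punct` constant picks its contents by Ruby version, following the `[[:punct:]]` change referenced in the README hunk. As an assumption-checking sketch, not part of the gem, the code below rebuilds the list exactly as in the diff. It then compares the list with what the running interpreter's own `[[:punct:]]` class matches among printable ASCII characters.

```ruby
# Rebuilds the 1.3.2 Punct list (copied from the diff) on the running Ruby.
base  = %w[! " # % & ' ( ) * , - . / : ; ? @ [ \\ ] _ { }]
extra = RUBY_VERSION >= '2.4.0' ? %w[$ + < = > ^ ` | ~] : []
punct = base | extra

# What this interpreter itself treats as [[:punct:]] among printable ASCII.
native = (33..126).map(&:chr).grep(/[[:punct:]]/)

puts punct.sort == native.sort # => true if the version switch matches this Ruby
puts punct.size                # 32 on Ruby >= 2.4.0
```

The diff compares `RUBY_VERSION` as plain strings. That works while the major and minor numbers are single digits. `Gem::Version.new(RUBY_VERSION) >= Gem::Version.new('2.4.0')` would be the more robust comparison if that ever mattered.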
data/lib/regexp-examples/groups.rb
CHANGED
@@ -131,7 +131,7 @@ module RegexpExamples
 end
 end
 
-
+alias random_result result
 end
 
 # A boolean "or" group.
@@ -162,7 +162,7 @@ module RegexpExamples
 max_results_limiter = MaxResultsLimiterBySum.new
 repeaters_list
 .map { |repeaters| RegexpExamples.generic_map_result(repeaters, method) }
-.map { |result| max_results_limiter.limit_results(result)}
+.map { |result| max_results_limiter.limit_results(result) }
 .inject(:concat)
 .map { |result| GroupResult.new(result) }
 .uniq
@@ -184,7 +184,7 @@ module RegexpExamples
 # of /\1/ as being "__1__". It later gets updated.
 class BackReferenceGroup
 include RandomResultBySample
-PLACEHOLDER_FORMAT = '__%s__'
+PLACEHOLDER_FORMAT = '__%s__'.freeze
 attr_reader :id
 def initialize(id)
 @id = id
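Several constants in this release gain `.freeze`, including `PLACEHOLDER_FORMAT`. A minimal sketch, reusing only the format string from the diff, shows two things. First, how the format string yields the `"__1__"` placeholder described in the comment. Second, that the frozen constant now refuses in-place mutation, while `format` still returns a fresh, unfrozen string.

```ruby
PLACEHOLDER_FORMAT = '__%s__'.freeze

placeholder = format(PLACEHOLDER_FORMAT, 1)  # => "__1__", as described for /\1/
named = format(PLACEHOLDER_FORMAT, 'name')   # => "__name__"

mutation_error = nil
begin
  PLACEHOLDER_FORMAT << 'oops' # in-place mutation of the shared constant is refused
rescue RuntimeError => e       # FrozenError (Ruby >= 2.5) is a RuntimeError subclass
  mutation_error = e.class
end

p placeholder
p mutation_error
```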
data/lib/regexp-examples/max_results_limiter.rb
CHANGED
@@ -44,7 +44,7 @@ module RegexpExamples
 end
 
 # For example:
-# Needed when generating examples for /[ab]{10}|{cd}{11}/
+# Needed when generating examples for /[ab]{10}|{cd}{11}/
 # (here, results_count will reach 1024 + 2048 == 3072)
 class MaxResultsLimiterBySum < MaxResultsLimiter
 def initialize
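The comment above describes limiting by a running *sum* of results across the branches of an alternation: 1024 + 2048 == 3072. The class below is a hypothetical sketch of that idea, not the gem's actual `MaxResultsLimiterBySum`. Each branch may keep only as many results as still fit under the shared limit.

```ruby
# Hypothetical sketch of a limiter that caps the running total of results
# across alternation branches; not the gem's real implementation.
class SumLimiterSketch
  def initialize(limit)
    @limit = limit
    @results_count = 0
  end

  # Keeps as many of `partial_results` as still fit under the limit and
  # adds the number kept to the running total.
  def limit_results(partial_results)
    remaining = [@limit - @results_count, 0].max
    kept = partial_results.first(remaining)
    @results_count += kept.length
    kept
  end
end

# Two branches with 2**10 and 2**11 possible matches, as in the comment above.
left_branch  = Array.new(2**10) { |i| "left#{i}" }
right_branch = Array.new(2**11) { |i| "right#{i}" }

roomy = SumLimiterSketch.new(10_000)
roomy_total = [left_branch, right_branch].sum { |r| roomy.limit_results(r).length }
p roomy_total # => 3072

tight = SumLimiterSketch.new(2_000)
tight_sizes = [left_branch, right_branch].map { |r| tight.limit_results(r).length }
p tight_sizes # => [1024, 976]
```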
data/lib/regexp-examples/parser_helpers/parse_after_backslash_group_helper.rb
CHANGED
@@ -5,35 +5,33 @@ module RegexpExamples
 
 def parse_after_backslash_group
 @current_position += 1
-
-when rest_of_string =~ /\A(\d{1,3})/
+if rest_of_string =~ /\A(\d{1,3})/
 parse_regular_backreference_group(Regexp.last_match(1))
-
+elsif rest_of_string =~ /\Ak['<]([\w-]+)['>]/
 parse_named_backreference_group(Regexp.last_match(1))
-
+elsif BackslashCharMap.keys.include?(next_char)
 parse_backslash_special_char
-
+elsif rest_of_string =~ /\A(c|C-)(.)/
 parse_backslash_control_char(Regexp.last_match(1), Regexp.last_match(2))
-
+elsif rest_of_string =~ /\Ax(\h{1,2})/
 parse_backslash_escape_sequence(Regexp.last_match(1))
-
+elsif rest_of_string =~ /\Au(\h{4}|\{\h{1,4}\})/
 parse_backslash_unicode_sequence(Regexp.last_match(1))
-
+elsif rest_of_string =~ /\A(p)\{(\^?)([^}]+)\}/i
 parse_backslash_named_property(
 Regexp.last_match(1), Regexp.last_match(2), Regexp.last_match(3)
 )
-
+elsif next_char == 'K' # Keep (special lookbehind that CAN be supported safely!)
 PlaceHolderGroup.new
-
+elsif next_char == 'R'
 parse_backslash_linebreak
-
+elsif next_char == 'g'
 parse_backslash_subexpresion_call
-
+elsif next_char =~ /[bB]/
 parse_backslash_anchor
-
+elsif next_char =~ /[AG]/
 parse_backslash_start_of_string
-
-# TODO: /\Z/ should be treated as /\n?/
+elsif next_char =~ /[zZ]/
 parse_backslash_end_of_string
 else
 parse_single_char_group(next_char)
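This hunk rewrites a subject-less `case`/`when` chain as `if`/`elsif`, a style RuboCop flags as `Style/EmptyCaseCondition`. Each branch still uses `=~` followed by `Regexp.last_match(n)`. The cut-down dispatcher below is hypothetical and returns labels instead of building groups. It reuses four of the diff's regexes to show which capture each branch sees.

```ruby
# Hypothetical, cut-down dispatcher in the same if/elsif + Regexp.last_match style.
def classify_escape(rest_of_string)
  if rest_of_string =~ /\A(\d{1,3})/
    [:backreference, Regexp.last_match(1)]
  elsif rest_of_string =~ /\Ak['<]([\w-]+)['>]/
    [:named_backreference, Regexp.last_match(1)]
  elsif rest_of_string =~ /\Ax(\h{1,2})/
    [:hex_escape, Regexp.last_match(1)]
  elsif rest_of_string =~ /\Au(\h{4}|\{\h{1,4}\})/
    [:unicode_escape, Regexp.last_match(1)]
  else
    [:single_char, rest_of_string[0]]
  end
end

p classify_escape('12abc')      # => [:backreference, "12"]
p classify_escape("k'future'")  # => [:named_backreference, "future"]
p classify_escape('u{1F6}')     # => [:unicode_escape, "{1F6}"]
```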
@@ -112,8 +110,8 @@ module RegexpExamples
 end
 
 def parse_backslash_subexpresion_call
-
-
+raise IllegalSyntaxError,
+'Subexpression calls (\\g) cannot be supported, as they are not regular'
 end
 
 def parse_backslash_anchor
@@ -130,15 +128,19 @@ module RegexpExamples
 
 def parse_backslash_end_of_string
 if @current_position == (regexp_string.length - 1)
-
+if next_char == 'z'
+PlaceHolderGroup.new
+else # next_char == 'Z'
+QuestionMarkRepeater.new(SingleCharGroup.new("\n", @ignorecase))
+end
 else
 raise_anchors_exception!
 end
 end
 
 def raise_anchors_exception!
-
-
+raise IllegalSyntaxError,
+"Anchors ('#{next_char}') cannot be supported, as they are not regular"
 end
 end
 end
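The new branch treats `\z` as contributing nothing and `\Z` as an optional trailing `"\n"`, which matches Ruby's `\Z` semantics (`\n?\z`). Either anchor is accepted only as the last character of the pattern. The helper below is a hypothetical sketch of that rule, with a stand-in error class. It returns suffix strings instead of the gem's group objects and checks the generated strings against Ruby's own anchors.

```ruby
class IllegalSyntaxError < StandardError; end # hypothetical stand-in for the gem's error

# Hypothetical helper: which suffixes can an end-of-string anchor at `position`
# in the regexp source contribute to a generated example?
def end_anchor_suffixes(regexp_string, position)
  unless position == regexp_string.length - 1
    raise IllegalSyntaxError,
          "Anchors ('#{regexp_string[position]}') cannot be supported, as they are not regular"
  end
  regexp_string[position] == 'z' ? [''] : ['', "\n"] # \Z behaves like \n?\z
end

source = 'test\Z' # single quotes keep the backslash: t, e, s, t, \, Z
examples = end_anchor_suffixes(source, source.length - 1).map { |suffix| "test#{suffix}" }
p examples                                # => ["test", "test\n"]
p examples.all? { |s| s.match?(/test\Z/) } # => true
p examples.map { |s| s.match?(/test\z/) }  # => [true, false]
```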
data/lib/regexp-examples/parser_helpers/parse_multi_group_helper.rb
CHANGED
@@ -28,15 +28,14 @@ module RegexpExamples
 )?
 /x
 ) do |match|
-
-when match[1].nil? # e.g. /(normal)/
+if match[1].nil? # e.g. /(normal)/
 group_id = @num_groups.to_s
-
+elsif match[2] == ':' # e.g. /(?:nocapture)/
 @current_position += 2
-
+elsif match[2] == '#' # e.g. /(?#comment)/
 comment_group = rest_of_string.match(/.*?[^\\](?:\\{2})*\)/)[0]
 @current_position += comment_group.length
-
+elsif match[2] =~ /\A(?=[mix-]+)([mix]*)-?([mix]*)/ # e.g. /(?i-mx)/
 regexp_options_toggle(Regexp.last_match(1), Regexp.last_match(2))
 @num_groups -= 1 # Toggle "groups" should not increase backref group count
 @current_position += $&.length + 1
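Two of the regexes in this hunk do more than their one-line comments suggest. The comment-group regex stops at the first `)` that is not escaped by an odd number of backslashes. The options regex splits a toggle such as `i-mx` into the flags switched on and the flags switched off. The inputs below are hypothetical, since what `rest_of_string` and `match[2]` hold at this point is not shown in the diff.

```ruby
# Regexes copied from the hunk above, exercised on hypothetical inputs.
COMMENT_END = /.*?[^\\](?:\\{2})*\)/
OPTIONS_TOGGLE = /\A(?=[mix-]+)([mix]*)-?([mix]*)/

# Everything up to and including the first ")" not escaped by an odd number of backslashes.
def comment_prefix(rest_of_string)
  rest_of_string.match(COMMENT_END)[0]
end

# Splits an options string such as "i-mx" into [switched_on, switched_off], or nil.
def options_split(options)
  m = options.match(OPTIONS_TOGGLE)
  m && [m[1], m[2]]
end

p comment_prefix('?#plain)bar')          # => "?#plain)"
p comment_prefix('?#has \) inside)bar')  # escaped ")" is skipped
p options_split('i-mx')                  # => ["i", "mx"]
p options_split(':abc')                  # => nil
```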
@@ -45,12 +44,12 @@ module RegexpExamples
 else
 return PlaceHolderGroup.new
 end
-
-
-
-
-
-
+elsif %w[! =].include?(match[2]) # e.g. /(?=lookahead)/, /(?!neglookahead)/
+raise IllegalSyntaxError,
+'Lookaheads are not regular; cannot generate examples'
+elsif %w[! =].include?(match[3]) # e.g. /(?<=lookbehind)/, /(?<!neglookbehind)/
+raise IllegalSyntaxError,
+'Lookbehinds are not regular; cannot generate examples'
 else # e.g. /(?<name>namedgroup)/
 @current_position += (match[3].length + 3)
 group_id = match[3]
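The dispatch above relies on `match[2]` holding the character after `(?` and `match[3]` holding what follows `(?<`. That is a name for named groups, or `=`/`!` for lookbehinds. The full group-opening regex is not shown in this diff. The `GROUP_OPEN` pattern below is therefore a hypothetical approximation with the same capture layout, used to sketch how the branches classify each kind of group. Option toggles are deliberately left unmodelled.

```ruby
class IllegalSyntaxError < StandardError; end # hypothetical stand-in

# Hypothetical approximation of the group-opening regex: [1] = "?" if present,
# [2] = the character after "(?", [3] = what follows "(?<".
GROUP_OPEN = /\A\((\?)?(?:([#:=!])|<([=!]|\w+))?/

def classify_group(source)
  match = source.match(GROUP_OPEN)
  if match[1].nil?
    :capture                                    # e.g. "(normal)"
  elsif match[2] == ':'
    :non_capture                                # e.g. "(?:nocapture)"
  elsif match[2] == '#'
    :comment                                    # e.g. "(?#comment)"
  elsif %w[! =].include?(match[2])
    raise IllegalSyntaxError, 'Lookaheads are not regular; cannot generate examples'
  elsif %w[! =].include?(match[3])
    raise IllegalSyntaxError, 'Lookbehinds are not regular; cannot generate examples'
  elsif match[3]
    :named                                      # e.g. "(?<name>namedgroup)"
  else
    :other                                      # e.g. option toggles, not modelled here
  end
end

p classify_group('(?<name>x)') # => :named
```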
data/lib/regexp-examples/unicode_char_ranges.rb
CHANGED
@@ -10,7 +10,7 @@ module RegexpExamples
 # Note: Only the first 128 results are listed, for performance.
 # Also, some groups seem to have no matches (weird!)
 # (Don't care about ruby micro version number)
-STORE_FILENAME = "unicode_ranges_#{RUBY_VERSION[0..2]}.pstore"
+STORE_FILENAME = "unicode_ranges_#{RUBY_VERSION[0..2]}.pstore".freeze
 
 attr_reader :range_store
 
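`RUBY_VERSION[0..2]` keeps the first three characters of the version string, so the micro version is dropped, as the comment intends. The sketch below parameterises that expression over a version string so it can be tried with several inputs. It also shows a caveat: the slice assumes one-digit major and minor numbers. The `'2.10.0'` input is purely hypothetical.

```ruby
# Mirrors the STORE_FILENAME expression, parameterised on the version string.
def store_filename(ruby_version)
  "unicode_ranges_#{ruby_version[0..2]}.pstore"
end

p store_filename('2.4.1') # => "unicode_ranges_2.4.pstore"
p store_filename('2.4.0') # => "unicode_ranges_2.4.pstore" (micro version ignored)

# Caveat: a two-digit minor number would be cut short by [0..2].
p store_filename('2.10.0') # => "unicode_ranges_2.1.pstore"

# A variant that keeps exactly the first two dotted components.
def robust_store_filename(ruby_version)
  "unicode_ranges_#{ruby_version.split('.').first(2).join('.')}.pstore"
end
p robust_store_filename('2.10.0') # => "unicode_ranges_2.10.pstore"
```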
@@ -24,7 +24,7 @@ module RegexpExamples
 end
 end
 
-
+alias [] get
 
 private
 
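This release switches several method aliases to the `alias` keyword, for example `alias [] get` here and `alias length current_position` in the char-group parser. The small class below is a hypothetical stand-in, not the gem's range store. It shows that `alias` takes the new name first and the existing name second, with no comma, and that it works for operator names like `[]`.

```ruby
# Hypothetical stand-in for a store object; not the gem's real class.
class RangeStoreSketch
  def initialize(ranges)
    @ranges = ranges
  end

  def get(key)
    @ranges.fetch(key, [])
  end

  # `alias` keyword: new name first, existing name second, no comma.
  # store['key'] now runs the same method body as store.get('key').
  alias [] get
end

store = RangeStoreSketch.new('digit' => %w[0 1 2])
p store['digit']    # => ["0", "1", "2"]
p store.get('nope') # => []
```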
@@ -40,7 +40,7 @@ module RegexpExamples
 def ranges_to_unicode(ranges)
 result = []
 ranges.each do |range|
-if range.is_a?
+if range.is_a? Integer # Small hack to increase data compression
 range.each { |num| result << hex_to_unicode(num.to_s(16)) }
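The `Integer` check lets a single code point be stored bare instead of as a one-element range such as `0xE9..0xE9`. The "compression" comment suggests this shrinks the stored data. The sketch below assumes a hypothetical `hex_to_unicode`, since the gem's helper body is not shown in this diff. It reproduces the loop and compares the `Marshal` sizes of the two encodings; `PStore` serialises with `Marshal`.

```ruby
# Hypothetical helper: the gem's real hex_to_unicode is not shown in this diff.
def hex_to_unicode(hex)
  [hex.to_i(16)].pack('U')
end

def ranges_to_unicode(ranges)
  result = []
  ranges.each do |range|
    if range.is_a? Integer # a lone code point, stored bare instead of as n..n
      result << hex_to_unicode(range.to_s(16))
    else
      range.each { |num| result << hex_to_unicode(num.to_s(16)) }
    end
  end
  result
end

p ranges_to_unicode([0x41..0x43, 0xE9]) # => ["A", "B", "C", "é"]

# Why bare integers help: they marshal to far fewer bytes than a Range.
p Marshal.dump(0xE9).bytesize < Marshal.dump(0xE9..0xE9).bytesize # => true
```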
data/spec/regexp-examples_spec.rb
CHANGED
@@ -170,8 +170,9 @@ RSpec.describe Regexp, '#examples' do
 /\Glast-match/,
 /^start/,
 /end$/,
-/end\z
-/end\Z/
+/end\z/
+# Cannot test /end\Z/ with the generic method here,
+# as it's a special case. Tested specially below.
 )
 end
 
@@ -303,6 +304,11 @@ RSpec.describe Regexp, '#examples' do
 it { expect(/a{1}?/.examples).to match_array ['', 'a'] }
 end
 
+context 'end of string' do
+it { expect(/test\z/.examples).to match_array %w(test) }
+it { expect(/test\Z/.examples).to match_array ['test', "test\n"] }
+end
+
 context 'backreferences and escaped octal combined' do
 it do
 expect(/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)? \10\9\8\7\6\5\4\3\2\1/.examples)
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp-examples
 version: !ruby/object:Gem::Version
-  version: 1.3.
+  version: 1.3.2
 platform: ruby
 authors:
 - Tom Lord
 autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2017-06-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -100,7 +100,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.6.
+rubygems_version: 2.6.12
 signing_key:
 specification_version: 4
 summary: Extends the Regexp class with '#examples' and '#random_example'