regexp-examples 1.0.1 → 1.0.2
- checksums.yaml +4 -4
- data/README.md +20 -9
- data/lib/regexp-examples/parser.rb +98 -72
- data/lib/regexp-examples/version.rb +1 -1
- data/spec/regexp-examples_spec.rb +5 -2
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: e4648ff5cf5c73b7916f58099a989ad58619e2d1
+  data.tar.gz: 9a9d53ceaf5a89f1f363124fad033b46b1489774
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 77997419f70d44cde2181c9a61f81a7e34456de573ab0bfbe46d1dcb35a6350472b3f09b4f9853aa657bca17af8b220d6179649227fda29e69a203f4a467b668
+  data.tar.gz: 5c830560b6485f7a02bb9ad41fdc0f316ca4b0685d079ea8706cf170a4298e88f113b8bb8eb9bfc0cce18e22e733e235f03ab42120c60bd7828628ae779f5666
data/README.md
CHANGED
@@ -5,7 +5,7 @@

 Extends the Regexp class with the method: Regexp#examples

-This method generates a list of (some\*) strings that will match the given regular expression
+This method generates a list of (some\*) strings that will match the given regular expression.

 \* If the regex has an infinite number of possible srings that match it, such as `/a*b+c{2,}/`,
 or a huge number of possible matches, such as `/.\w/`, then only a subset of these will be listed.
@@ -22,9 +22,15 @@ For more detail on this, see [configuration options](#configuration-options).
 # 'http://www.github.com', 'https://github.com', 'https://www.github.com']
 /(I(N(C(E(P(T(I(O(N)))))))))*/.examples #=> ["", "INCEPTION", "INCEPTIONINCEPTION"]
 /\x74\x68\x69\x73/.examples #=> ["this"]
-/\u6829/.examples #=> ["栩"]
 /what about (backreferences\?) \1/.examples
 #=> ['what about backreferences? backreferences?']
+/
+\u{28}\u2022\u{5f}\u2022\u{29}
+|
+\u{28}\u{20}\u2022\u{5f}\u2022\u{29}\u{3e}\u2310\u25a0\u{2d}\u25a0\u{20}
+|
+\u{28}\u2310\u25a0\u{5f}\u25a0\u{29}
+/x.examples #=> ["(•_•)", "( •_•)>⌐■-■ ", "(⌐■_■)"]
 ```

 ## Installation
@@ -45,6 +51,10 @@ Or install it yourself as:

 ## Supported syntax

+Short answer: **Everything** is supported, apart from "irregular" aspects of the regexp language -- see [impossible features](#impossible-features-illegal-syntax)
+
+Long answer:
+
 * All forms of repeaters (quantifiers), e.g. `/a*/`, `/a+/`, `/a?/`, `/a{1,4}/`, `/a{3,}/`, `/a{,2}/`
 * Reluctant and possissive repeaters work fine, too, e.g. `/a*?/`, `/a*+/`
 * Boolean "Or" groups, e.g. `/a|b|c/`
@@ -57,8 +67,9 @@ Or install it yourself as:
 * Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
 * Capture groups, e.g. `/(group)/`
 * Including named groups, e.g. `/(?<name>group)/`
-*
-*
+* And backreferences(!!!), e.g. `/(this|that) \1/` `/(?<name>foo) \k<name>/`
+* ...even for the more "obscure" syntax, e.g. `/(?<future>the) \k'future'/`, `/(a)(b) \k<-1>/``
+* ...and even if nested or optional, e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
 * Non-capture groups, e.g. `/(?:foo)/`
 * Comment groups, e.g. `/foo(?#comment)bar/`
 * Control characters, e.g. `/\ca/`, `/\cZ/`, `/\C-9/`
@@ -66,7 +77,7 @@ Or install it yourself as:
 * Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
 * Octal characters, e.g. `/\10/`, `/\177/`
 * Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character")
-, `/\p{^Ll}/` ("Not a lowercase letter"),
+, `/\p{^Ll}/` ("Not a lowercase letter"), `/\P{^Canadian_Aboriginal}/` ("Not not a Canadian aboriginal character")
 * **Arbitrarily complex combinations of all the above!**

 * Regexp options can also be used:
@@ -77,13 +88,13 @@ Or install it yourself as:

 ## Bugs and Not-Yet-Supported syntax

-* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...
+* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...)
 * Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.0 (see the pending pull request)!

-
+Since the Regexp language is so vast, it's quite likely I've missed something (please raise an issue if you find something)! The only missing feature that I'm currently aware of is:
 * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
-
-
+
+Some of the most obscure regexp features are not even mentioned in the ruby docs! However, full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE).

 ## Impossible features ("illegal syntax")

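The backreference forms added to the "Supported syntax" list can be checked against Ruby's own regexp engine, without the gem installed. The sketch below is only an illustration: `BACKREFERENCE_SAMPLES` and `full_match?` are hypothetical names, and the sample strings are written by hand rather than taken from `Regexp#examples` output.

```ruby
# Each README pattern paired with a hand-written string it should match.
# These strings are illustrations, not output of the gem.
BACKREFERENCE_SAMPLES = {
  /(this|that) \1/                => "this this",
  /(?<name>foo) \k<name>/         => "foo foo",
  /(?<future>the) \k'future'/     => "the the",
  /(a)(b) \k<-1>/                 => "ab b",
  /(even(this(works?))) \1 \2 \3/ => "eventhiswork eventhiswork thiswork work"
}.freeze

# True when the regexp matches the WHOLE string, not just a substring.
def full_match?(regexp, string)
  match = regexp.match(string)
  !match.nil? && match[0] == string
end

BACKREFERENCE_SAMPLES.each do |regexp, sample|
  puts "#{regexp.inspect} matches #{sample.inspect}: #{full_match?(regexp, sample)}"
end
```

In the relative form `\k<-1>`, the reference points at the most recently opened capture group, which is why the fourth sample ends in `b` rather than `a`.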
data/lib/regexp-examples/parser.rb
CHANGED
@@ -46,31 +46,60 @@ module RegexpExamples
       when '\\'
         group = parse_after_backslash_group
       when '^'
-
-          group = PlaceHolderGroup.new # Ignore the "illegal" character
-        else
-          raise IllegalSyntaxError, "Anchors ('#{next_char}') cannot be supported, as they are not regular"
-        end
+        group = parse_caret
       when '$'
-
-          group = PlaceHolderGroup.new # Ignore the "illegal" character
-        else
-          raise IllegalSyntaxError, "Anchors ('#{next_char}') cannot be supported, as they are not regular"
-        end
+        group = parse_dollar
       when /[#\s]/
-
-          parse_extended_whitespace
-          group = PlaceHolderGroup.new # Ignore the whitespace/comment
-        else
-          group = parse_single_char_group(next_char)
-        end
+        group = parse_extended_whitespace
       else
         group = parse_single_char_group(next_char)
       end
       group
     end

+    def parse_repeater(group)
+      case next_char
+      when '*'
+        repeater = parse_star_repeater(group)
+      when '+'
+        repeater = parse_plus_repeater(group)
+      when '?'
+        repeater = parse_question_mark_repeater(group)
+      when '{'
+        repeater = parse_range_repeater(group)
+      else
+        repeater = parse_one_time_repeater(group)
+      end
+      repeater
+    end
+
+    def parse_caret
+      if @current_position == 0
+        return PlaceHolderGroup.new # Ignore the "illegal" character
+      else
+        raise_anchors_exception!
+      end
+    end
+
+    def parse_dollar
+      if @current_position == (regexp_string.length - 1)
+        return PlaceHolderGroup.new # Ignore the "illegal" character
+      else
+        raise_anchors_exception!
+      end
+    end
+
     def parse_extended_whitespace
+      if @extended
+        skip_whitespace
+        group = PlaceHolderGroup.new # Ignore the whitespace/comment
+      else
+        group = parse_single_char_group(next_char)
+      end
+      group
+    end
+
+    def skip_whitespace
       whitespace_chars = rest_of_string.match(/#.*|\s+/)[0]
       @current_position += whitespace_chars.length - 1
     end
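The rule that `parse_caret` and `parse_dollar` encode (an anchor is tolerated only at the very edge of the pattern, and rejected anywhere else) can be sketched in isolation. This is a simplified stand-in and not the gem's parser: `strip_edge_anchors` and `IllegalAnchorError` are hypothetical names, and escaped or bracketed anchors such as `\^` or `[$]` are deliberately not handled.

```ruby
# Hypothetical stand-in for the gem's IllegalSyntaxError.
class IllegalAnchorError < StandardError; end

# Walks the pattern source one character at a time. A caret at index 0 or a
# dollar at the last index is dropped; either character elsewhere raises.
# Simplification: escapes and character sets are NOT recognised here.
def strip_edge_anchors(source)
  last_index = source.length - 1
  result = String.new
  source.each_char.with_index do |char, position|
    at_edge = (char == "^" && position.zero?) ||
              (char == "$" && position == last_index)
    next if at_edge # Ignore the "illegal" character, as the diff does

    if char == "^" || char == "$"
      raise IllegalAnchorError,
            "Anchors ('#{char}') cannot be supported, as they are not regular"
    end
    result << char
  end
  result
end

puts strip_edge_anchors("^abc$")
```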
@@ -81,9 +110,11 @@ module RegexpExamples
       when rest_of_string =~ /\A(\d{1,3})/
         @current_position += ($1.length - 1) # In case of 10+ backrefs!
         group = parse_backreference_group($1)
-      when rest_of_string =~ /\Ak<([
+      when rest_of_string =~ /\Ak['<]([\w-]+)['>]/ # Named capture group
         @current_position += ($1.length + 2)
-        group
+        # Check for RELATIVE group number, e.g. /(a)(b)(c)(d) \k<-2>/
+        group_id = ($1.to_i < 0) ? (@num_groups + $1.to_i + 1) : $1
+        group = parse_backreference_group(group_id)
       when BackslashCharMap.keys.include?(next_char)
         group = CharGroup.new(
           BackslashCharMap[next_char].dup,
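The relative-reference arithmetic in the added `group_id` line is easier to follow as a standalone function. The sketch below copies that one expression; `resolve_group_id` is a hypothetical name, and `num_groups` stands for the parser's `@num_groups` counter (capture groups opened so far).

```ruby
# group_ref:  the text captured between \k<...> or \k'...'
# num_groups: how many capture groups have been opened so far
#
# A negative reference counts backwards from the latest group, so -1 is the
# most recent group. Names and positive numbers are returned unchanged. As in
# the diff, the relative case yields an Integer and the others a String.
def resolve_group_id(group_ref, num_groups)
  relative = group_ref.to_i # non-numeric names convert to 0
  relative < 0 ? (num_groups + relative + 1) : group_ref
end

puts resolve_group_id("-2", 4)   # for /(a)(b)(c)(d) \k<-2>/ this is group 3
puts resolve_group_id("name", 1)
```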
@@ -117,18 +148,18 @@ module RegexpExamples
       when next_char == 'g' # Subexpression call
         raise IllegalSyntaxError, "Subexpression calls (\\g) cannot be supported, as they are not regular"
       when next_char =~ /[bB]/ # Anchors
-
+        raise_anchors_exception!
       when next_char =~ /[AG]/ # Start of string
         if @current_position == 1
           group = PlaceHolderGroup.new
         else
-
+          raise_anchors_exception!
         end
       when next_char =~ /[zZ]/ # End of string
         if @current_position == (regexp_string.length - 1)
           group = PlaceHolderGroup.new
         else
-
+          raise_anchors_exception!
         end
       else
         group = parse_single_char_group( next_char )
@@ -136,31 +167,13 @@ module RegexpExamples
       group
     end

-    def parse_repeater(group)
-      case next_char
-      when '*'
-        repeater = parse_star_repeater(group)
-      when '+'
-        repeater = parse_plus_repeater(group)
-      when '?'
-        repeater = parse_question_mark_repeater(group)
-      when '{'
-        repeater = parse_range_repeater(group)
-      else
-        repeater = parse_one_time_repeater(group)
-      end
-      repeater
-    end
-
     def parse_multi_group
       @current_position += 1
       @num_groups += 1
-
-
-
-
-      rest_of_string.match(
-        /
+      remember_old_regexp_options do
+        group_id = nil # init
+        rest_of_string.match(
+          /
           \A
           (\?)? # Is it a "special" group, i.e. starts with a "?"?
           (
@@ -175,39 +188,48 @@ module RegexpExamples
               |[^>]+ # Named capture
             )
             |[mix]*-?[mix]* # Option toggle
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+          )?
+          /x
+        ) do |match|
+          case
+          when match[1].nil? # e.g. /(normal)/
+            group_id = @num_groups.to_s
+          when match[2] == ':' # e.g. /(?:nocapture)/
+            @current_position += 2
+          when match[2] == '#' # e.g. /(?#comment)/
+            comment_group = rest_of_string.match(/.*?[^\\](?:\\{2})*\)/)[0]
+            @current_position += comment_group.length
+          when match[2] =~ /\A(?=[mix-]+)([mix]*)-?([mix]*)/ # e.g. /(?i-mx)/
+            regexp_options_toggle($1, $2)
+            @num_groups -= 1 # Toggle "groups" should not increase backref group count
+            @current_position += $&.length + 1
+            if next_char == ':' # e.g. /(?i:subexpr)/
+              @current_position += 1
+            else
+              return PlaceHolderGroup.new
+            end
+          when %w(! =).include?(match[2]) # e.g. /(?=lookahead)/, /(?!neglookahead)/
+            raise IllegalSyntaxError, "Lookaheads are not regular; cannot generate examples"
+          when %w(! =).include?(match[3]) # e.g. /(?<=lookbehind)/, /(?<!neglookbehind)/
+            raise IllegalSyntaxError, "Lookbehinds are not regular; cannot generate examples"
+          else # e.g. /(?<name>namedgroup)/
+            @current_position += (match[3].length + 3)
+            group_id = match[3]
           end
-        when %w(! =).include?(match[2]) # e.g. /(?=lookahead)/, /(?!neglookahead)/
-          raise IllegalSyntaxError, "Lookaheads are not regular; cannot generate examples"
-        when %w(! =).include?(match[3]) # e.g. /(?<=lookbehind)/, /(?<!neglookbehind)/
-          raise IllegalSyntaxError, "Lookbehinds are not regular; cannot generate examples"
-        else # e.g. /(?<name>namedgroup)/
-          @current_position += (match[3].length + 3)
-          group_id = match[3]
         end
+        MultiGroup.new(parse, group_id)
       end
-
+    end
+
+    def remember_old_regexp_options
+      previous_ignorecase = @ignorecase
+      previous_multiline = @multiline
+      previous_extended = @extended
+      group = yield
       @ignorecase = previous_ignorecase
       @multiline = previous_multiline
       @extended = previous_extended
-
+      group
     end

     def regexp_options_toggle(on, off)
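The save/yield/restore shape of `remember_old_regexp_options` has one Ruby subtlety worth seeing in isolation: because the restore lines are not in an `ensure` clause, a `return` inside the block leaves the enclosing method and skips them. The sketch below is a minimal model of that behaviour with a single flag; `OptionScope` and its method names are hypothetical and are not part of the gem.

```ruby
# Minimal model of a parser that tracks one option flag.
class OptionScope
  attr_reader :ignorecase

  def initialize
    @ignorecase = false
  end

  # Saves the flag, runs the block, restores the flag, returns the block's value.
  # No `ensure` is used, so a non-local exit from the block skips the restore.
  def remember_old_options
    previous_ignorecase = @ignorecase
    result = yield
    @ignorecase = previous_ignorecase
    result
  end

  # Models a scoped toggle such as /(?i:subexpr)/: the flag is restored afterwards.
  def scoped_toggle
    remember_old_options do
      @ignorecase = true
      :group
    end
  end

  # Models a bare toggle such as /(?i)/: `return` exits this method directly,
  # so the restore lines never run and the flag stays switched on.
  def persistent_toggle
    remember_old_options do
      @ignorecase = true
      return :placeholder
    end
  end
end

scope = OptionScope.new
scope.scoped_toggle
puts scope.ignorecase
scope.persistent_toggle
puts scope.ignorecase
```

Whether the diff's author relied on this on purpose is not stated in the source; the sketch only shows what plain Ruby does with this control flow.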
@@ -246,8 +268,8 @@ module RegexpExamples
       SingleCharGroup.new(char, @ignorecase)
     end

-    def parse_backreference_group(
-      BackReferenceGroup.new(
+    def parse_backreference_group(group_id)
+      BackReferenceGroup.new(group_id)
     end

     def parse_control_character(char)
@@ -308,6 +330,10 @@ module RegexpExamples
       repeater
     end

+    def raise_anchors_exception!
+      raise IllegalSyntaxError, "Anchors ('#{next_char}') cannot be supported, as they are not regular"
+    end
+
     def parse_one_time_repeater(group)
       OneTimeRepeater.new(group)
     end
data/spec/regexp-examples_spec.rb
CHANGED
@@ -98,7 +98,8 @@ RSpec.describe Regexp, "#examples" do
       /(normal)/,
       /(?:nocapture)/,
       /(?<name>namedgroup)/,
-      /(?<name>namedgroup) \k<name
+      /(?<name>namedgroup) \k<name>/,
+      /(?<name>namedgroup) \k'name'/
     )
   end

@@ -124,7 +125,8 @@ RSpec.describe Regexp, "#examples" do
       /(a?(b?(c?(d?(e?)))))/,
       /(a)? \1/,
       /(a|(b)) \2/,
-      /([ab]){2} \1
+      /([ab]){2} \1/, # \1 should always be the LAST result of the capture group
+      /(ref1) (ref2) \k'1' \k<-1>/, # RELATIVE backref!
     )
   end

@@ -326,6 +328,7 @@ RSpec.describe Regexp, "#examples" do
         it { expect(/a(?i)b(?-i)c/.examples).to eq %w{abc aBc}}
         it { expect(/a(?x) b(?-x) c/.examples).to eq %w{ab\ c}}
         it { expect(/(?m)./.examples(max_group_results: 999)).to include "\n" }
+        it { expect(/(?i)(a)-\1/.examples).to eq %w{a-a A-A}} # Toggle "groups" should not increase backref group count
       end
       context "subexpression" do
         it { expect(/a(?i:b)c/.examples).to eq %w{abc aBc}}
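The new toggle spec depends on `(?i)` not being counted as a capture group, so that `\1` still refers to `(a)`. Ruby's own engine numbers groups the same way, which can be confirmed with core Ruby alone. The constant and variable names below are hypothetical, chosen only for this illustration.

```ruby
# The pattern from the added spec line: an option toggle, then one capture
# group, then a backreference to that group.
TOGGLE_THEN_GROUP = /(?i)(a)-\1/

match = TOGGLE_THEN_GROUP.match("A-A")

# Only (a) is a capture group; the (?i) toggle contributes no capture.
capture_count = match.captures.length
first_capture = match[1]

puts "captures: #{capture_count}, first: #{first_capture.inspect}"
```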
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: regexp-examples
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - Tom Lord
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2015-03-
+date: 2015-03-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -85,7 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.
+rubygems_version: 2.4.5
 signing_key:
 specification_version: 4
 summary: Extends the Regexp class with '#examples'