regexp-examples 1.0.0 → 1.0.1
- checksums.yaml +4 -4
- data/README.md +16 -12
- data/lib/regexp-examples/parser.rb +7 -6
- data/lib/regexp-examples/version.rb +1 -1
- data/spec/regexp-examples_spec.rb +5 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: fc845182adb1adaeed70de6139d27711a69dc81f
|
4
|
+
data.tar.gz: 3d1850382acaf7ee4c96c9acf924a585cec6939b
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 0b2a8ff8619ba8bc4186a27491dac7140ff8e0d7e4cb87ccfd8e9047b0f392c7152e2988666edaf4df6a1d2db0961ac1dfdc0af32b9d50c3ddbda6ff60814c97
|
7
|
+
data.tar.gz: f3690a8f6d2089b57a57246d9ec2b25252a2bee5972536ef095bf45358348ce1c322e3d3b04b69121e8530e155c2c9c4542f5f48cdc6a83ea6366eb246af3f94
|
data/README.md
CHANGED
@@ -15,7 +15,7 @@ For more detail on this, see [configuration options](#configuration-options).
|
|
15
15
|
## Usage
|
16
16
|
|
17
17
|
```ruby
|
18
|
-
/a*/.examples #=> [''
|
18
|
+
/a*/.examples #=> ['', 'a', 'aa']
|
19
19
|
/ab+/.examples #=> ['ab', 'abb', 'abbb']
|
20
20
|
/this|is|awesome/.examples #=> ['this', 'is', 'awesome']
|
21
21
|
/https?:\/\/(www\.)?github\.com/.examples #=> ['http://github.com',
|
@@ -23,7 +23,8 @@ For more detail on this, see [configuration options](#configuration-options).
|
|
23
23
|
/(I(N(C(E(P(T(I(O(N)))))))))*/.examples #=> ["", "INCEPTION", "INCEPTIONINCEPTION"]
|
24
24
|
/\x74\x68\x69\x73/.examples #=> ["this"]
|
25
25
|
/\u6829/.examples #=> ["栩"]
|
26
|
-
/what about (backreferences\?) \1/.examples
|
26
|
+
/what about (backreferences\?) \1/.examples
|
27
|
+
#=> ['what about backreferences? backreferences?']
|
27
28
|
```
|
28
29
|
|
29
30
|
## Installation
|
@@ -45,9 +46,9 @@ Or install it yourself as:
|
|
45
46
|
## Supported syntax
|
46
47
|
|
47
48
|
* All forms of repeaters (quantifiers), e.g. `/a*/`, `/a+/`, `/a?/`, `/a{1,4}/`, `/a{3,}/`, `/a{,2}/`
|
48
|
-
* Reluctant and possissive repeaters work fine, too
|
49
|
+
* Reluctant and possissive repeaters work fine, too, e.g. `/a*?/`, `/a*+/`
|
49
50
|
* Boolean "Or" groups, e.g. `/a|b|c/`
|
50
|
-
* Character sets e.g. `/[abc]/` - including:
|
51
|
+
* Character sets, e.g. `/[abc]/` - including:
|
51
52
|
* Ranges, e.g.`/[A-Z0-9]/`
|
52
53
|
* Negation, e.g. `/[^a-z]/`
|
53
54
|
* Escaped characters, e.g. `/[\w\s\b]/`
|
@@ -57,14 +58,15 @@ Or install it yourself as:
|
|
57
58
|
* Capture groups, e.g. `/(group)/`
|
58
59
|
* Including named groups, e.g. `/(?<name>group)/`
|
59
60
|
* ...And backreferences(!!!), e.g. `/(this|that) \1/` `/(?<name>foo) \k<name>/`
|
60
|
-
* Groups work fine, even if nested or optional e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
|
61
|
+
* Groups work fine, even if nested or optional, e.g. `/(even(this(works?))) \1 \2 \3/`, `/what about (this)? \1/`
|
61
62
|
* Non-capture groups, e.g. `/(?:foo)/`
|
62
63
|
* Comment groups, e.g. `/foo(?#comment)bar/`
|
63
64
|
* Control characters, e.g. `/\ca/`, `/\cZ/`, `/\C-9/`
|
64
65
|
* Escape sequences, e.g. `/\x42/`, `/\x5word/`, `/#{"\x80".force_encoding("ASCII-8BIT")}/`
|
65
66
|
* Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
|
66
67
|
* Octal characters, e.g. `/\10/`, `/\177/`
|
67
|
-
* Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character")
|
68
|
+
* Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character")
|
69
|
+
, `/\p{^Ll}/` ("Not a lowercase letter"), `\P{^Canadian_Aboriginal}` ("Not not a Canadian aboriginal character")
|
68
70
|
* **Arbitrarily complex combinations of all the above!**
|
69
71
|
|
70
72
|
* Regexp options can also be used:
|
@@ -76,11 +78,12 @@ Or install it yourself as:
|
|
76
78
|
## Bugs and Not-Yet-Supported syntax
|
77
79
|
|
78
80
|
* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...
|
79
|
-
* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This
|
81
|
+
* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.0 (see the pending pull request)!
|
80
82
|
|
81
|
-
There are also some various (increasingly obscure) unsupported bits of syntax
|
83
|
+
There are also some various (increasingly obscure) unsupported bits of syntax; some of which I haven't yet investigated. Much of this is not even mentioned in the ruby docs! Full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE). To name a few:
|
82
84
|
* Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
|
83
|
-
* Back reference by
|
85
|
+
* Back reference by relative group number, e.g. `/(a)(b)(c)(d) \k<-2>/.examples` (which *should* return: `["abcd c"]`)
|
86
|
+
* Back reference using single quotes, and for group numbers, e.g. `/(a) \k'1'/.examples` (which is really just alternative syntax for `/(a) \1/`!)
|
84
87
|
|
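For reference, Ruby's own regexp engine accepts both backreference forms that the README lists as not yet supported by the gem. The following is a minimal sketch in plain Ruby (it does not call `Regexp#examples`) that checks the strings the README says the gem *should* eventually return. The variable names are illustrative only.

```ruby
# Plain Ruby only; the gem is not required for this check.
# \k<-2> counts back two capture groups from the backreference, i.e. group 3 ("c").
relative = /\A(a)(b)(c)(d) \k<-2>\z/
# \k'1' is alternative syntax for \1.
quoted = /\A(a) \k'1'\z/

relative_ok = relative.match?("abcd c")
quoted_ok   = quoted.match?("a a")

puts "relative backreference matches: #{relative_ok}"
puts "quoted backreference matches:   #{quoted_ok}"
```

Both checks use anchors (`\A`, `\z`) so that only the whole string counts as a match.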
85
88
|
## Impossible features ("illegal syntax")
|
86
89
|
|
@@ -92,7 +95,7 @@ Using any of the following will raise a RegexpExamples::IllegalSyntax exception:
|
|
92
95
|
* Lookarounds, e.g. `/foo(?=bar)/`, `/foo(?!bar)/`, `/(?<=foo)bar/`, `/(?<!foo)bar/`
|
93
96
|
* [Anchors](http://ruby-doc.org/core-2.2.0/Regexp.html#class-Regexp-label-Anchors) (`\b`, `\B`, `\G`, `^`, `\A`, `$`, `\z`, `\Z`), e.g. `/\bword\b/`, `/line1\n^line2/`
|
94
97
|
* However, a special case has been made to allow `^`, `\A` and `\G` at the start of a pattern; and to allow `$`, `\z` and `\Z` at the end of pattern. In such cases, the characters are effectively just ignored.
|
95
|
-
* Subexpression calls, e.g. `/(?<name> ... \g<name>* )/`
|
98
|
+
* Subexpression calls (`\g`), e.g. `/(?<name> ... \g<name>* )/`
|
96
99
|
|
97
100
|
(Note: Backreferences are not really "regular" either, but I got these to work with a bit of hackery!)
|
98
101
|
|
@@ -137,8 +140,9 @@ A more sensible use case might be, for example, to generate one random 1-4 digit
|
|
137
140
|
## TODO
|
138
141
|
|
139
142
|
* Performance improvements:
|
140
|
-
* Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time.
|
141
|
-
* (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes"
|
143
|
+
* Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time. See the pending pull request.
|
144
|
+
* (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes".
|
145
|
+
* Potential future feature: `Regexp#random_example` - but implementing this properly is non-trivial, due to performance issues that need addressing first!
|
142
146
|
* Write a blog post about how this amazing gem works! :)
|
143
147
|
|
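The `max_examples` idea in the TODO list could be sketched roughly as follows. This is not the gem's code: `lazy_repetitions` and the local `max_examples` variable are hypothetical names, and the sketch only covers the single pattern `/a*/`. It shows how a lazy enumerator lets the caller stop after a fixed number of results, even though the underlying sequence is infinite.

```ruby
# Hypothetical helper: lazily yields "", "a", "aa", ... for a repeated character.
def lazy_repetitions(char)
  (0..Float::INFINITY).lazy.map { |count| char * count }
end

# Hypothetical configuration value; the gem does not define this option in 1.0.1.
max_examples = 3

# Only the first max_examples strings are ever built.
examples = lazy_repetitions("a").first(max_examples)
p examples
```

With `max_examples = 3` this yields the same three strings the README shows for `/a*/.examples`.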
144
148
|
## Contributing
|
data/lib/regexp-examples/parser.rb
CHANGED
@@ -99,13 +99,14 @@ module RegexpExamples
|
|
99
99
|
@current_position += $1.length
|
100
100
|
sequence = $1.match(/\h{1,4}/)[0] # Strip off "{" and "}"
|
101
101
|
group = parse_single_char_group( parse_unicode_sequence(sequence) )
|
102
|
-
when rest_of_string =~ /\
|
103
|
-
@current_position += ($
|
102
|
+
when rest_of_string =~ /\A(p)\{(\^?)([^}]+)\}/i # Named properties
|
103
|
+
@current_position += ($2.length + $3.length + 2)
|
104
|
+
is_negative = ($1 == "P") ^ ($2 == "^") # Beware of double negatives! E.g. /\P{^Space}/
|
104
105
|
group = CharGroup.new(
|
105
|
-
if
|
106
|
-
CharSets::Any.dup - NamedPropertyCharMap[$
|
106
|
+
if is_negative
|
107
|
+
CharSets::Any.dup - NamedPropertyCharMap[$3.downcase]
|
107
108
|
else
|
108
|
-
NamedPropertyCharMap[$
|
109
|
+
NamedPropertyCharMap[$3.downcase]
|
109
110
|
end,
|
110
111
|
@ignorecase
|
111
112
|
)
|
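The double-negative handling in this hunk can be tried in isolation. The sketch below reuses the regex and the XOR expression from the added lines, but wraps them in a hypothetical `parse_named_property` method that is not part of the gem. It assumes the input is the text immediately after the backslash, which is how the hunk appears to use `rest_of_string`.

```ruby
# Same pattern as in the hunk: $1 = "p" or "P", $2 = optional "^", $3 = property name.
NAMED_PROPERTY = /\A(p)\{(\^?)([^}]+)\}/i

# Hypothetical helper for illustration; returns nil when the input is not a named property.
def parse_named_property(rest_of_string)
  match = NAMED_PROPERTY.match(rest_of_string)
  return nil unless match

  letter, caret, name = match.captures
  {
    name: name.downcase,                         # lookup key is case-insensitive
    negative: (letter == "P") ^ (caret == "^"),  # XOR: two negations cancel out
    consumed: caret.length + name.length + 2     # the 2 accounts for "{" and "}"
  }
end

%w[p{Ll} p{^Ll} P{Ll} P{^Ll}].each do |input|
  puts "\\#{input} negative? #{parse_named_property(input)[:negative]}"
end
```

XOR is the natural operator here because exactly one of the two negation markers must be present for the set to be inverted.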
@@ -114,7 +115,7 @@ module RegexpExamples
|
|
114
115
|
when next_char == 'R' # Linebreak
|
115
116
|
group = CharGroup.new(["\r\n", "\n", "\v", "\f", "\r"], @ignorecase) # A bit hacky...
|
116
117
|
when next_char == 'g' # Subexpression call
|
117
|
-
raise IllegalSyntaxError, "Subexpression calls (
|
118
|
+
raise IllegalSyntaxError, "Subexpression calls (\\g) cannot be supported, as they are not regular"
|
118
119
|
when next_char =~ /[bB]/ # Anchors
|
119
120
|
raise IllegalSyntaxError, "Anchors ('\\#{next_char}') cannot be supported, as they are not regular"
|
120
121
|
when next_char =~ /[AG]/ # Start of string
|
data/spec/regexp-examples_spec.rb
CHANGED
@@ -80,6 +80,8 @@ RSpec.describe Regexp, "#examples" do
|
|
80
80
|
/[\\\]]/,
|
81
81
|
/[\n-\r]/,
|
82
82
|
/[\-]/,
|
83
|
+
/[-abc]/,
|
84
|
+
/[abc-]/,
|
83
85
|
/[%-+]/, # This regex is "supposed to" match some surprising things!!!
|
84
86
|
/['-.]/, # Test to ensure no "infinite loop" on character set expansion
|
85
87
|
/[[abc]]/, # Nested groups
|
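The character sets in this hunk rely on how Ruby itself treats a hyphen inside brackets. The following plain-Ruby sketch (no gem required, variable names are illustrative) shows that a leading or trailing hyphen is a literal character, while `%-+` is a range over character codes and therefore matches more than the two characters that are written.

```ruby
# A hyphen at the start or end of a set is a literal "-", not a range operator.
leading  = /\A[-abc]\z/
trailing = /\A[abc-]\z/
hyphen_is_literal = leading.match?("-") && trailing.match?("-")

# [%-+] is a range from "%" (0x25) to "+" (0x2B), so it also covers the characters in between.
range_chars = ("%".ord.."+".ord).map(&:chr).select { |char| /\A[%-+]\z/.match?(char) }

p hyphen_is_literal
p range_chars
```

This is why the spec comment calls the matches of `/[%-+]/` surprising: the set contains seven characters, not two.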
@@ -180,7 +182,9 @@ RSpec.describe Regexp, "#examples" do
|
|
180
182
|
/\p{L}/,
|
181
183
|
/\p{Space}/,
|
182
184
|
/\p{AlPhA}/, # Checking case insensitivity
|
183
|
-
/\p{^Ll}
|
185
|
+
/\p{^Ll}/,
|
186
|
+
/\P{Ll}/,
|
187
|
+
/\P{^Ll}/ # Double negative!!
|
184
188
|
)
|
185
189
|
|
186
190
|
end
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: regexp-examples
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 1.0.
|
4
|
+
version: 1.0.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Tom Lord
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-03-
|
11
|
+
date: 2015-03-04 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|