regexp-examples 0.7.0 → 1.0.0
- checksums.yaml +4 -4
- data/.gitignore +1 -0
- data/Gemfile +5 -4
- data/README.md +33 -23
- data/lib/regexp-examples/chargroup_parser.rb +97 -48
- data/lib/regexp-examples/constants.rb +130 -130
- data/lib/regexp-examples/parser.rb +6 -26
- data/lib/regexp-examples/version.rb +1 -1
- data/scripts/unicode_lister.rb +1 -1
- data/spec/regexp-examples_spec.rb +9 -3
- data/spec/spec_helper.rb +2 -9
- metadata +3 -5
- data/coverage/.gitignore +0 -4
- data/coverage/coverage-badge.png +0 -0
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 39d4ef9f2ee3e17541118580954e0a9f137f969b
|
4
|
+
data.tar.gz: 9e573acd0b5cdcf700bfe4f49a838933e6a01737
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 50ee2b66ced2a4878309566088a493b8cc7654d44db6b91458c0f04f4015cbc9a2e437b7444670b28a78475a5df715d89dd8a81fa1e9d508095c234da830c782
|
7
|
+
data.tar.gz: 4785f7967bc3549629c2fbb5f74f0d7343076a028d33265344b9f1cb0b74c39df2c4f402ab97baeb42fb536dbd31132eb3fb69d462cab5a3ffb0f568ae91bed5
|
data/.gitignore
CHANGED
data/Gemfile
CHANGED
@@ -1,9 +1,10 @@
|
|
1
1
|
source 'https://rubygems.org'
|
2
2
|
|
3
|
-
|
4
|
-
gem '
|
5
|
-
gem '
|
6
|
-
gem 'pry'
|
3
|
+
group :test do
|
4
|
+
gem 'rspec'
|
5
|
+
gem 'coveralls', require: false
|
6
|
+
gem 'pry'
|
7
|
+
end
|
7
8
|
|
8
9
|
# Specify your gem's dependencies in regexp-examples.gemspec
|
9
10
|
gemspec
|
data/README.md
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
# regexp-examples
|
2
2
|
[](http://badge.fury.io/rb/regexp-examples)
|
3
3
|
[](https://travis-ci.org/tom-lord/regexp-examples/builds)
|
4
|
-
](https://coveralls.io/r/tom-lord/regexp-examples?branch=master)
|
5
5
|
|
6
6
|
Extends the Regexp class with the method: Regexp#examples
|
7
7
|
|
@@ -26,12 +26,33 @@ For more detail on this, see [configuration options](#configuration-options).
|
|
26
26
|
/what about (backreferences\?) \1/.examples #=> ['what about backreferences? backreferences?']
|
27
27
|
```
|
28
28
|
|
29
|
+
## Installation
|
30
|
+
|
31
|
+
Add this line to your application's Gemfile:
|
32
|
+
|
33
|
+
```ruby
|
34
|
+
gem 'regexp-examples'
|
35
|
+
```
|
36
|
+
|
37
|
+
And then execute:
|
38
|
+
|
39
|
+
$ bundle
|
40
|
+
|
41
|
+
Or install it yourself as:
|
42
|
+
|
43
|
+
$ gem install regexp-examples
|
44
|
+
|
29
45
|
## Supported syntax
|
30
46
|
|
31
47
|
* All forms of repeaters (quantifiers), e.g. `/a*/`, `/a+/`, `/a?/`, `/a{1,4}/`, `/a{3,}/`, `/a{,2}/`
|
32
48
|
* Reluctant and possessive repeaters work fine, too - e.g. `/a*?/`, `/a*+/`
|
33
49
|
* Boolean "Or" groups, e.g. `/a|b|c/`
|
34
|
-
* Character sets
|
50
|
+
* Character sets e.g. `/[abc]/` - including:
|
51
|
+
* Ranges, e.g. `/[A-Z0-9]/`
|
52
|
+
* Negation, e.g. `/[^a-z]/`
|
53
|
+
* Escaped characters, e.g. `/[\w\s\b]/`
|
54
|
+
* POSIX bracket expressions, e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
|
55
|
+
* Set intersection, e.g. `/[[a-h]&&[f-z]]/`
|
35
56
|
* Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
|
36
57
|
* Capture groups, e.g. `/(group)/`
|
37
58
|
* Including named groups, e.g. `/(?<name>group)/`
|
@@ -43,7 +64,6 @@ For more detail on this, see [configuration options](#configuration-options).
|
|
43
64
|
* Escape sequences, e.g. `/\x42/`, `/\x5word/`, `/#{"\x80".force_encoding("ASCII-8BIT")}/`
|
44
65
|
* Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
|
45
66
|
* Octal characters, e.g. `/\10/`, `/\177/`
|
46
|
-
* POSIX bracket expressions (including negation), e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
|
47
67
|
* Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character"), `/\p{^Ll}/` ("Not a lowercase letter")
|
48
68
|
* **Arbitrarily complex combinations of all the above!**
|
49
69
|
|
@@ -55,13 +75,12 @@ For more detail on this, see [configuration options](#configuration-options).
|
|
55
75
|
|
56
76
|
## Bugs and Not-Yet-Supported syntax
|
57
77
|
|
58
|
-
*
|
59
|
-
|
60
|
-
* `/[[a-d]&&[c-f]]/.examples` (which _should_ return: `["c", "d"]`)
|
78
|
+
* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy!) However, patterns like this are highly unusual...
|
79
|
+
* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This would be "easy" to fix, but I can't be bothered... Feel free to make a pull request!
|
61
80
|
|
62
|
-
|
63
|
-
|
64
|
-
|
81
|
+
There are also some various (increasingly obscure) unsupported bits of syntax, which I cannot be bothered to write out fully here. Full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE). To name a couple:
|
82
|
+
* Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
|
83
|
+
* Back reference by relative group number, e.g. `/(a)(b)(c)(d) \k<-2>/.examples` (which *should* return: `["abcd c"]`)
|
65
84
|
|
66
85
|
## Impossible features ("illegal syntax")
|
67
86
|
|
@@ -115,21 +134,12 @@ A more sensible use case might be, for example, to generate one random 1-4 digit
|
|
115
134
|
|
116
135
|
(Note: I may develop a much more efficient way to "generate one example" in a later release of this gem.)
|
117
136
|
|
118
|
-
##
|
119
|
-
|
120
|
-
Add this line to your application's Gemfile:
|
121
|
-
|
122
|
-
```ruby
|
123
|
-
gem 'regexp-examples'
|
124
|
-
```
|
137
|
+
## TODO
|
125
138
|
|
126
|
-
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
Or install it yourself as:
|
131
|
-
|
132
|
-
$ gem install regexp-examples
|
139
|
+
* Performance improvements:
|
140
|
+
* Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time.
|
141
|
+
* (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes"
|
142
|
+
* Write a blog post about how this amazing gem works! :)
|
133
143
|
|
134
144
|
## Contributing
|
135
145
|
|
data/lib/regexp-examples/chargroup_parser.rb
CHANGED
@@ -1,69 +1,118 @@
|
|
1
1
|
module RegexpExamples
|
2
|
-
#
|
3
|
-
#
|
4
|
-
#
|
5
|
-
#
|
6
|
-
#
|
7
|
-
#
|
2
|
+
# A "sub-parser", for char groups in a regular expression
|
3
|
+
# Some examples of what this class needs to parse:
|
4
|
+
# [abc] - plain characters
|
5
|
+
# [a-z] - ranges
|
6
|
+
# [\n\b\d] - escaped characters (which may represent character sets)
|
7
|
+
# [^abc] - negated group
|
8
|
+
# [[a][bc]] - sub-groups (should match "a", "b" or "c")
|
9
|
+
# [[:lower:]] - POSIX group
|
10
|
+
# [[a-f]&&[d-z]] - set intersection (should match "d", "e" or "f")
|
11
|
+
# [[^:alpha:]&&[\n]a-c] - all of the above!!!! (should match "\n")
|
8
12
|
class ChargroupParser
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
13
|
+
attr_reader :regexp_string
|
14
|
+
def initialize(regexp_string, is_sub_group: false)
|
15
|
+
@regexp_string = regexp_string
|
16
|
+
@is_sub_group = is_sub_group
|
17
|
+
@current_position = 0
|
18
|
+
parse
|
19
|
+
end
|
20
|
+
|
21
|
+
def parse
|
22
|
+
@charset = []
|
23
|
+
@negative = false
|
24
|
+
parse_first_chars
|
25
|
+
until next_char == "]" do
|
26
|
+
case next_char
|
27
|
+
when "["
|
28
|
+
@current_position += 1
|
29
|
+
sub_group_parser = self.class.new(rest_of_string, is_sub_group: true)
|
30
|
+
@charset.concat sub_group_parser.result
|
31
|
+
@current_position += sub_group_parser.length
|
32
|
+
when "-"
|
33
|
+
if regexp_string[@current_position + 1] == "]" # e.g. /[abc-]/ -- not a range!
|
34
|
+
@charset << "-"
|
35
|
+
@current_position += 1
|
36
|
+
else
|
37
|
+
@current_position += 1
|
38
|
+
@charset.concat (@charset.last .. parse_checking_backlash.first).to_a
|
39
|
+
@current_position += 1
|
40
|
+
end
|
41
|
+
when "&"
|
42
|
+
if regexp_string[@current_position + 1] == "&"
|
43
|
+
@current_position += 2
|
44
|
+
sub_group_parser = self.class.new(rest_of_string, is_sub_group: @is_sub_group)
|
45
|
+
@charset &= sub_group_parser.result
|
46
|
+
@current_position += (sub_group_parser.length - 1)
|
47
|
+
else
|
48
|
+
@charset << "&"
|
49
|
+
@current_position += 1
|
50
|
+
end
|
51
|
+
else
|
52
|
+
@charset.concat parse_checking_backlash
|
53
|
+
@current_position += 1
|
54
|
+
end
|
16
55
|
end
|
17
56
|
|
18
|
-
|
19
|
-
|
57
|
+
@charset.uniq!
|
58
|
+
@current_position += 1 # To account for final "]"
|
59
|
+
end
|
60
|
+
|
61
|
+
def length
|
62
|
+
@current_position
|
20
63
|
end
|
21
64
|
|
22
65
|
def result
|
23
|
-
@negative ? (CharSets::Any - @
|
66
|
+
@negative ? (CharSets::Any - @charset) : @charset
|
24
67
|
end
|
25
68
|
|
26
69
|
private
|
27
|
-
def
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
|
70
|
+
def parse_first_chars
|
71
|
+
if next_char == '^'
|
72
|
+
@negative = true
|
73
|
+
@current_position += 1
|
74
|
+
end
|
75
|
+
|
76
|
+
case rest_of_string
|
77
|
+
when /\A[-\]]/ # e.g. /[]]/ (match "]") or /[-]/ (match "-")
|
78
|
+
@charset << next_char
|
79
|
+
@current_position += 1
|
80
|
+
when /\A:(\^?)([^:]+):\]/ # e.g. [[:alpha:]] - POSIX group
|
81
|
+
if @is_sub_group
|
82
|
+
chars = $1.empty? ? POSIXCharMap[$2] : (CharSets::Any - POSIXCharMap[$2])
|
83
|
+
@charset.concat chars
|
84
|
+
@current_position += ($1.length + $2.length + 2)
|
39
85
|
end
|
40
86
|
end
|
41
87
|
end
|
42
88
|
|
43
|
-
|
44
|
-
|
45
|
-
|
46
|
-
|
47
|
-
|
48
|
-
|
49
|
-
|
50
|
-
# Prevent infinite loops from expanding [",", "-", "."] to itself
|
51
|
-
# (Since ",".ord = 44, "-".ord = 45, ".".ord = 46)
|
52
|
-
if (@chars[i-1] == ',' && @chars[i+1] == '.')
|
53
|
-
hyphen = @chars.delete_at(i)
|
54
|
-
else
|
55
|
-
@chars[i-1..i+1] = (@chars[i-1]..@chars[i+1]).to_a
|
56
|
-
end
|
89
|
+
# Always returns an Array, for consistency
|
90
|
+
def parse_checking_backlash
|
91
|
+
if next_char == "\\"
|
92
|
+
@current_position += 1
|
93
|
+
parse_after_backslash
|
94
|
+
else
|
95
|
+
[next_char]
|
57
96
|
end
|
58
|
-
# restore hyphen, if stripped out earlier
|
59
|
-
@chars.unshift(hyphen) if hyphen
|
60
97
|
end
|
61
98
|
|
62
|
-
def
|
63
|
-
|
64
|
-
|
99
|
+
def parse_after_backslash
|
100
|
+
case next_char
|
101
|
+
when *BackslashCharMap.keys
|
102
|
+
BackslashCharMap[next_char]
|
103
|
+
when 'b'
|
104
|
+
["\b"]
|
105
|
+
else
|
106
|
+
[next_char]
|
65
107
|
end
|
66
|
-
|
108
|
+
end
|
109
|
+
|
110
|
+
def rest_of_string
|
111
|
+
regexp_string[@current_position..-1]
|
112
|
+
end
|
113
|
+
|
114
|
+
def next_char
|
115
|
+
regexp_string[@current_position]
|
67
116
|
end
|
68
117
|
end
|
69
118
|
end
|
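Note: the set logic in the new `ChargroupParser` (concatenation for nested groups, `&=` for `&&`, subtraction from the full set for `^`) can be illustrated on plain arrays. This is only a minimal sketch, not the gem's code: `ANY` is a hypothetical stand-in for `CharSets::Any`, limited to lowercase letters so the output stays small.

```ruby
# Hypothetical stand-in for CharSets::Any (assumption: lowercase letters only).
ANY = ('a'..'z').to_a

# [[a][bc]] - nested groups are concatenated, i.e. a set union
union = (%w[a] + %w[b c]).uniq

# [[a-h]&&[f-z]] - "&&" keeps only the characters present on both sides
intersection = ('a'..'h').to_a & ('f'..'z').to_a

# [^a-w] - negation subtracts the parsed set from the full set
negated = ANY - ('a'..'w').to_a

p union        # ["a", "b", "c"]
p intersection # ["f", "g", "h"]
p negated      # ["x", "y", "z"]
```

Array `&` and `-` preserve the order of the left-hand operand, which is presumably why the parser can rely on them without re-sorting.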
data/lib/regexp-examples/constants.rb
CHANGED
@@ -105,136 +105,136 @@ module RegexpExamples
|
|
105
105
|
# Note: Only the first 128 results are listed, for performance.
|
106
106
|
# Also, some groups seem to have no matches (weird!)
|
107
107
|
NamedPropertyCharMap = {
|
108
|
-
'
|
109
|
-
'
|
110
|
-
'
|
111
|
-
'
|
112
|
-
'
|
113
|
-
'
|
114
|
-
'
|
115
|
-
'
|
116
|
-
'
|
117
|
-
'
|
118
|
-
'
|
119
|
-
'
|
120
|
-
'
|
121
|
-
'
|
122
|
-
'
|
123
|
-
'
|
124
|
-
'
|
125
|
-
'
|
126
|
-
'
|
127
|
-
'
|
128
|
-
'
|
129
|
-
'
|
130
|
-
'
|
131
|
-
'
|
132
|
-
'
|
133
|
-
'
|
134
|
-
'
|
135
|
-
'
|
136
|
-
'
|
137
|
-
'
|
138
|
-
'
|
139
|
-
'
|
140
|
-
'
|
141
|
-
'
|
142
|
-
'
|
143
|
-
'
|
144
|
-
'
|
145
|
-
'
|
146
|
-
'
|
147
|
-
'
|
148
|
-
'
|
149
|
-
'
|
150
|
-
'
|
151
|
-
'
|
152
|
-
'
|
153
|
-
'
|
154
|
-
'
|
155
|
-
'
|
156
|
-
'
|
157
|
-
'
|
158
|
-
'
|
159
|
-
'
|
160
|
-
'
|
161
|
-
'
|
162
|
-
'
|
163
|
-
'
|
164
|
-
'
|
165
|
-
'
|
166
|
-
'
|
167
|
-
'
|
168
|
-
'
|
169
|
-
'
|
170
|
-
'
|
171
|
-
'
|
172
|
-
'
|
173
|
-
'
|
174
|
-
'
|
175
|
-
'
|
176
|
-
'
|
177
|
-
'
|
178
|
-
'
|
179
|
-
'
|
180
|
-
'
|
181
|
-
'
|
182
|
-
'
|
183
|
-
'
|
184
|
-
'
|
185
|
-
'
|
186
|
-
'
|
187
|
-
'
|
188
|
-
'
|
189
|
-
'
|
190
|
-
'
|
191
|
-
'
|
192
|
-
'
|
193
|
-
'
|
194
|
-
'
|
195
|
-
'
|
196
|
-
'
|
197
|
-
'
|
198
|
-
'
|
199
|
-
'
|
200
|
-
'
|
201
|
-
'
|
202
|
-
'
|
203
|
-
'
|
204
|
-
'
|
205
|
-
'
|
206
|
-
'
|
207
|
-
'
|
208
|
-
'
|
209
|
-
'
|
210
|
-
'
|
211
|
-
'
|
212
|
-
'
|
213
|
-
'
|
214
|
-
'
|
215
|
-
'
|
216
|
-
'
|
217
|
-
'
|
218
|
-
'
|
219
|
-
'
|
220
|
-
'
|
221
|
-
'
|
222
|
-
'
|
223
|
-
'
|
224
|
-
'
|
225
|
-
'
|
226
|
-
'
|
227
|
-
'
|
228
|
-
'
|
229
|
-
'
|
230
|
-
'
|
231
|
-
'
|
232
|
-
'
|
233
|
-
'
|
234
|
-
'
|
235
|
-
'
|
236
|
-
'
|
237
|
-
'
|
108
|
+
'alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
|
109
|
+
'alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
|
110
|
+
'blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
|
111
|
+
'cntrl' => ranges_to_unicode(0..31, 127..159),
|
112
|
+
'digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
|
113
|
+
'graph' => ranges_to_unicode(33..126, 161..194),
|
114
|
+
'lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
|
115
|
+
'print' => ranges_to_unicode(32..126, 160..192),
|
116
|
+
'punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
|
117
|
+
'space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
|
118
|
+
'upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
|
119
|
+
'xdigit' => ranges_to_unicode(48..57, 65..70, 97..102),
|
120
|
+
'word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
|
121
|
+
'ascii' => ranges_to_unicode(0..127),
|
122
|
+
'any' => ranges_to_unicode(0..127),
|
123
|
+
'assigned' => ranges_to_unicode(0..127),
|
124
|
+
'l' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
|
125
|
+
'll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
|
126
|
+
'lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
|
127
|
+
'lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
|
128
|
+
'lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
|
129
|
+
'lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
|
130
|
+
'm' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
|
131
|
+
'mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
|
132
|
+
'mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
|
133
|
+
'me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
|
134
|
+
'n' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
|
135
|
+
'nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
|
136
|
+
'nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
|
137
|
+
'no' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
|
138
|
+
'p' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
|
139
|
+
'pc' => ranges_to_unicode(95, 8255..8256, 8276),
|
140
|
+
'pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
|
141
|
+
'ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
|
142
|
+
'pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
|
143
|
+
'pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
|
144
|
+
'pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
|
145
|
+
'po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
|
146
|
+
's' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
|
147
|
+
'sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
|
148
|
+
'sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
|
149
|
+
'sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
|
150
|
+
'so' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
|
151
|
+
'z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
|
152
|
+
'zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
|
153
|
+
'zl' => ranges_to_unicode(8232),
|
154
|
+
'zp' => ranges_to_unicode(8233),
|
155
|
+
'c' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
|
156
|
+
'cc' => ranges_to_unicode(0..31, 127..159),
|
157
|
+
'cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
|
158
|
+
'cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
|
159
|
+
'co' => ranges_to_unicode(),
|
160
|
+
'cs' => ranges_to_unicode(),
|
161
|
+
'arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
|
162
|
+
'armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
|
163
|
+
'balinese' => ranges_to_unicode(6912..6987, 6992..7036),
|
164
|
+
'bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
|
165
|
+
'bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
|
166
|
+
'braille' => ranges_to_unicode(10240..10367),
|
167
|
+
'buginese' => ranges_to_unicode(6656..6683, 6686..6687),
|
168
|
+
'buhid' => ranges_to_unicode(5952..5971),
|
169
|
+
'canadian_aboriginal' => ranges_to_unicode(5120..5247),
|
170
|
+
'carian' => ranges_to_unicode(),
|
171
|
+
'cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
|
172
|
+
'cherokee' => ranges_to_unicode(5024..5108),
|
173
|
+
'common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
|
174
|
+
'coptic' => ranges_to_unicode(994..1007, 11392..11505),
|
175
|
+
'cuneiform' => ranges_to_unicode(),
|
176
|
+
'cypriot' => ranges_to_unicode(),
|
177
|
+
'cyrillic' => ranges_to_unicode(1024..1151),
|
178
|
+
'deseret' => ranges_to_unicode(),
|
179
|
+
'devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
|
180
|
+
'ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
|
181
|
+
'georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
|
182
|
+
'glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
|
183
|
+
'gothic' => ranges_to_unicode(),
|
184
|
+
'greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
|
185
|
+
'gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
|
186
|
+
'gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
|
187
|
+
'han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
|
188
|
+
'hangul' => ranges_to_unicode(4352..4479),
|
189
|
+
'hanunoo' => ranges_to_unicode(5920..5940),
|
190
|
+
'hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
|
191
|
+
'hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
|
192
|
+
'inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
|
193
|
+
'kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
|
194
|
+
'katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
|
195
|
+
'kayah_li' => ranges_to_unicode(43264..43309, 43311),
|
196
|
+
'kharoshthi' => ranges_to_unicode(),
|
197
|
+
'khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
|
198
|
+
'lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
|
199
|
+
'latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
|
200
|
+
'lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
|
201
|
+
'limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
|
202
|
+
'linear_b' => ranges_to_unicode(),
|
203
|
+
'lycian' => ranges_to_unicode(),
|
204
|
+
'lydian' => ranges_to_unicode(),
|
205
|
+
'malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
|
206
|
+
'mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
|
207
|
+
'myanmar' => ranges_to_unicode(4096..4223),
|
208
|
+
'new_tai_lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
|
209
|
+
'nko' => ranges_to_unicode(1984..2042),
|
210
|
+
'ogham' => ranges_to_unicode(5760..5788),
|
211
|
+
'ol_chiki' => ranges_to_unicode(7248..7295),
|
212
|
+
'old_italic' => ranges_to_unicode(),
|
213
|
+
'old_persian' => ranges_to_unicode(),
|
214
|
+
'oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
|
215
|
+
'osmanya' => ranges_to_unicode(),
|
216
|
+
'phags_pa' => ranges_to_unicode(43072..43127),
|
217
|
+
'phoenician' => ranges_to_unicode(),
|
218
|
+
'rejang' => ranges_to_unicode(43312..43347, 43359),
|
219
|
+
'runic' => ranges_to_unicode(5792..5866, 5870..5880),
|
220
|
+
'saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
|
221
|
+
'shavian' => ranges_to_unicode(),
|
222
|
+
'sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
|
223
|
+
'sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
|
224
|
+
'syloti_nagri' => ranges_to_unicode(43008..43051),
|
225
|
+
'syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
|
226
|
+
'tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
|
227
|
+
'tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
|
228
|
+
'tai_le' => ranges_to_unicode(6480..6509, 6512..6516),
|
229
|
+
'tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
|
230
|
+
'telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
|
231
|
+
'thaana' => ranges_to_unicode(1920..1969),
|
232
|
+
'thai' => ranges_to_unicode(3585..3642, 3648..3675),
|
233
|
+
'tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
|
234
|
+
'tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
|
235
|
+
'ugaritic' => ranges_to_unicode(),
|
236
|
+
'vai' => ranges_to_unicode(42240..42367),
|
237
|
+
'yi' => ranges_to_unicode(40960..41087),
|
238
238
|
}.freeze
|
239
239
|
end
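Note: the definition of `ranges_to_unicode` is not part of this diff. Judging only from how it is called above (a mix of integer ranges and single integers, possibly none), a helper of that shape might look roughly like the sketch below. The name `ranges_to_unicode_sketch` and its body are assumptions for illustration, not the gem's implementation.

```ruby
# Hypothetical re-creation of a helper shaped like ranges_to_unicode.
# Accepts any mix of Integer ranges and single Integers (code points)
# and returns the corresponding UTF-8 characters as an Array of Strings.
def ranges_to_unicode_sketch(*ranges)
  ranges
    .flat_map { |r| Array(r) }                  # 48..50 -> [48, 49, 50]; 170 -> [170]
    .map { |codepoint| [codepoint].pack('U') }  # code point -> UTF-8 character
end

p ranges_to_unicode_sketch(48..50, 65) # ["0", "1", "2", "A"]
p ranges_to_unicode_sketch()           # []
```

The empty call mirrors entries such as `'carian' => ranges_to_unicode()`, where no matching code points were listed.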
|
240
240
|
|
data/lib/regexp-examples/parser.rb
CHANGED
@@ -103,9 +103,9 @@ module RegexpExamples
|
|
103
103
|
@current_position += ($1.length + $2.length + 2)
|
104
104
|
group = CharGroup.new(
|
105
105
|
if($1 == "^")
|
106
|
-
CharSets::Any.dup - NamedPropertyCharMap[$2]
|
106
|
+
CharSets::Any.dup - NamedPropertyCharMap[$2.downcase]
|
107
107
|
else
|
108
|
-
NamedPropertyCharMap[$2]
|
108
|
+
NamedPropertyCharMap[$2.downcase]
|
109
109
|
end,
|
110
110
|
@ignorecase
|
111
111
|
)
|
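Note: the hunk above makes named-property lookup case-insensitive by storing lowercase keys and calling `downcase` on the captured name. A minimal sketch of that idea follows; `PROPERTY_MAP` and `lookup_property` are hypothetical names, and the map is a heavily trimmed stand-in for `NamedPropertyCharMap`.

```ruby
# Hypothetical, trimmed stand-in for NamedPropertyCharMap (lowercase keys only).
PROPERTY_MAP = {
  'alpha' => %w[A B c d],
  'space' => [' ', "\t"]
}.freeze

# Normalise the property name before the lookup, so \p{AlPhA} and \p{alpha}
# resolve to the same entry.
def lookup_property(name)
  PROPERTY_MAP.fetch(name.downcase) do
    raise ArgumentError, "unknown property: #{name}"
  end
end

p lookup_property('AlPhA') # ["A", "B", "c", "d"]
p lookup_property('Space') # [" ", "\t"]
```

The sketch uses `fetch` so that an unknown name fails loudly; the code in the diff uses plain `[]`, which would return `nil` instead.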
@@ -223,30 +223,10 @@ module RegexpExamples
|
|
223
223
|
end
|
224
224
|
|
225
225
|
def parse_char_group
|
226
|
-
|
227
|
-
|
228
|
-
|
229
|
-
|
230
|
-
return CharGroup.new(chars, @ignorecase)
|
231
|
-
end
|
232
|
-
chars = []
|
233
|
-
@current_position += 1
|
234
|
-
if next_char == ']'
|
235
|
-
# Beware of the sneaky edge case:
|
236
|
-
# /[]]/ (match "]")
|
237
|
-
chars << ']'
|
238
|
-
@current_position += 1
|
239
|
-
end
|
240
|
-
until next_char == ']' \
|
241
|
-
&& !regexp_string[0..@current_position-1].match(/[^\\](\\{2})*\\\z/)
|
242
|
-
# Beware of having an ODD number of "\" before the "]", e.g.
|
243
|
-
# /[\]]/ (match "]")
|
244
|
-
# /[\\]/ (match "\")
|
245
|
-
# /[\\\]]/ (match "\" or "]")
|
246
|
-
chars << next_char
|
247
|
-
@current_position += 1
|
248
|
-
end
|
249
|
-
parsed_chars = ChargroupParser.new(chars).result
|
226
|
+
@current_position += 1 # Skip past opening "["
|
227
|
+
chargroup_parser = ChargroupParser.new(rest_of_string)
|
228
|
+
parsed_chars = chargroup_parser.result
|
229
|
+
@current_position += (chargroup_parser.length - 1) # Step back to closing "]"
|
250
230
|
CharGroup.new(parsed_chars, @ignorecase)
|
251
231
|
end
|
252
232
|
|
data/scripts/unicode_lister.rb
CHANGED
@@ -171,7 +171,7 @@ File.open(OutputFilename, 'w') do |f|
|
|
171
171
|
NamedGroups.each do |name|
|
172
172
|
count += 1
|
173
173
|
matching_codes = (0..55295).lazy.select { |x| /\p{#{name}}/ =~ eval("?\\u{#{x.to_s(16)}}") }.first(128)
|
174
|
-
f.puts "'#{name}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
|
174
|
+
f.puts "'#{name.downcase}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
|
175
175
|
puts "(#{count}/#{NamedGroups.length}) Finished property: #{name}"
|
176
176
|
end
|
177
177
|
puts "*"*50
|
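Note: `calculate_ranges` is called in this hunk but defined outside it. From the generated output in constants.rb (for example `48..57, 65..90, 170`), it appears to collapse consecutive code points into `a..b` ranges. One way to do that is sketched below; the name `calculate_ranges_sketch` and the use of `slice_when` (Ruby 2.2+) are assumptions, not the script's actual code.

```ruby
# Hypothetical sketch: collapse a list of code points into a string of
# ranges and single values, in the style seen in constants.rb.
def calculate_ranges_sketch(codes)
  codes.sort.uniq
       .slice_when { |prev, curr| curr != prev + 1 } # split where the run breaks
       .map { |run| run.size == 1 ? run.first.to_s : "#{run.first}..#{run.last}" }
       .join(', ')
end

puts calculate_ranges_sketch([48, 49, 50, 65, 97, 98]) # 48..50, 65, 97..98
```

An empty input yields an empty string, which would interpolate to `ranges_to_unicode()` as seen for properties with no listed matches.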
data/spec/regexp-examples_spec.rb
CHANGED
@@ -69,7 +69,6 @@ RSpec.describe Regexp, "#examples" do
|
|
69
69
|
|
70
70
|
context "for complex char groups (square brackets)" do
|
71
71
|
examples_exist_and_match(
|
72
|
-
|
73
72
|
/[abc]/,
|
74
73
|
/[a-c]/,
|
75
74
|
/[abc-e]/,
|
@@ -82,7 +81,13 @@ RSpec.describe Regexp, "#examples" do
|
|
82
81
|
/[\n-\r]/,
|
83
82
|
/[\-]/,
|
84
83
|
/[%-+]/, # This regex is "supposed to" match some surprising things!!!
|
85
|
-
/['-.]
|
84
|
+
/['-.]/, # Test to ensure no "infinite loop" on character set expansion
|
85
|
+
/[[abc]]/, # Nested groups
|
86
|
+
/[[[[abc]]]]/,
|
87
|
+
/[[a][b][c]]/,
|
88
|
+
/[[a-h]&&[f-z]]/, # Set intersection
|
89
|
+
/[[a-h]&&ab[c]]/, # Set intersection
|
90
|
+
/[[a-h]&[f-z]]/, # NOT set intersection
|
86
91
|
)
|
87
92
|
end
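Note: the new spec cases distinguish `&&` (set intersection) from a single `&` (a literal character). This can be checked against Ruby's own regexp engine without the gem; the snippet below is a small sketch using only core Ruby.

```ruby
letters = ('a'..'z').to_a

# "&&" intersects the two bracketed sets: only f, g and h are in both.
intersection = letters.select { |c| c =~ /[[a-h]&&[f-z]]/ }
p intersection # ["f", "g", "h"]

# A single "&" is just a literal, so this class is the union of a-h, "&" and f-z.
p '&' =~ /[[a-h]&[f-z]]/                     # 0
p letters.all? { |c| c =~ /[[a-h]&[f-z]]/ }  # true
```

Any examples the gem generates for these patterns are expected to fall within the same sets, which is what `examples_exist_and_match` asserts.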
|
88
93
|
|
@@ -173,7 +178,8 @@ RSpec.describe Regexp, "#examples" do
|
|
173
178
|
context "for named properties" do
|
174
179
|
examples_exist_and_match(
|
175
180
|
/\p{L}/,
|
176
|
-
/\p{
|
181
|
+
/\p{Space}/,
|
182
|
+
/\p{AlPhA}/, # Checking case insensitivity
|
177
183
|
/\p{^Ll}/
|
178
184
|
)
|
179
185
|
|
data/spec/spec_helper.rb
CHANGED
@@ -1,12 +1,5 @@
|
|
1
|
-
require '
|
2
|
-
|
3
|
-
require 'simplecov-badge'
|
4
|
-
SimpleCov::Formatter::BadgeFormatter.strength_foreground = true
|
5
|
-
SimpleCov.formatter = SimpleCov::Formatter::MultiFormatter[
|
6
|
-
SimpleCov::Formatter::HTMLFormatter,
|
7
|
-
SimpleCov::Formatter::BadgeFormatter,
|
8
|
-
]
|
9
|
-
end
|
1
|
+
require 'coveralls'
|
2
|
+
Coveralls.wear!
|
10
3
|
|
11
4
|
require './lib/regexp-examples.rb'
|
12
5
|
require 'pry'
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: regexp-examples
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 1.0.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Tom Lord
|
8
8
|
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date: 2015-02
|
11
|
+
date: 2015-03-02 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: bundler
|
@@ -51,8 +51,6 @@ files:
|
|
51
51
|
- LICENSE.txt
|
52
52
|
- README.md
|
53
53
|
- Rakefile
|
54
|
-
- coverage/.gitignore
|
55
|
-
- coverage/coverage-badge.png
|
56
54
|
- lib/regexp-examples.rb
|
57
55
|
- lib/regexp-examples/backreferences.rb
|
58
56
|
- lib/regexp-examples/chargroup_parser.rb
|
@@ -87,7 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
87
85
|
version: '0'
|
88
86
|
requirements: []
|
89
87
|
rubyforge_project:
|
90
|
-
rubygems_version: 2.
|
88
|
+
rubygems_version: 2.2.2
|
91
89
|
signing_key:
|
92
90
|
specification_version: 4
|
93
91
|
summary: Extends the Regexp class with '#examples'
|
data/coverage/.gitignore
DELETED
data/coverage/coverage-badge.png
DELETED
Binary file
|