regexp-examples 1.1.0 → 1.1.2
- checksums.yaml +4 -4
- data/README.md +9 -9
- data/Rakefile +3 -3
- data/db/unicode_ranges_2.0.pstore +0 -0
- data/db/unicode_ranges_2.1.pstore +0 -0
- data/db/unicode_ranges_2.2.pstore +0 -0
- data/lib/{regexp-examples/core_extensions → core_extensions}/regexp/examples.rb +3 -3
- data/lib/regexp-examples.rb +11 -2
- data/lib/regexp-examples/backreferences.rb +3 -4
- data/lib/regexp-examples/chargroup_parser.rb +14 -14
- data/lib/regexp-examples/constants.rb +5 -156
- data/lib/regexp-examples/groups.rb +20 -12
- data/lib/regexp-examples/helpers.rb +5 -5
- data/lib/regexp-examples/parser.rb +52 -42
- data/lib/regexp-examples/repeaters.rb +5 -5
- data/lib/regexp-examples/unicode_char_ranges.rb +45 -0
- data/lib/regexp-examples/version.rb +1 -1
- data/regexp-examples.gemspec +4 -4
- data/scripts/unicode_lister.rb +34 -150
- data/spec/regexp-examples_spec.rb +81 -59
- data/spec/regexp-random_example_spec.rb +2 -2
- data/spec/spec_helper.rb +1 -1
- metadata +8 -4
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 96c7fa5e2bb981f8a2bb79debed55724f7c2bf57
+data.tar.gz: abbc72305949312b0b3847c583b13c4b5dd09b8a
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: c18e91f5585b7ba5d494c0a26af766c0e7c7e39a9244e3a211d23ad2ec96b512046d01cf8f4e11839bfb15717dc24215661644a8de33095b9ce929e483b37429
+data.tar.gz: 5661b852fb34cd1c7c2948c5558d4263520b996645e806cdd2635ea08466d019365a7998f0f8eedf7681dd107c5bdf5e909a09a16ebed640e83be38eadd71f5c
data/README.md
CHANGED
@@ -14,6 +14,8 @@ or a huge number of possible matches, such as `/.\w/`, then only a subset of the
 
 For more detail on this, see [configuration options](#configuration-options).
 
+If you'd like to understand how/why this gem works, please check out my [blog post](http://tom-lord.weebly.com/blog/reverse-engineering-regular-expressions) about it!
+
 ## Usage
 
 ```ruby
@@ -88,6 +90,7 @@ Long answer:
 * Octal characters, e.g. `/\10/`, `/\177/`
 * Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character")
 , `/\p{^Ll}/` ("Not a lowercase letter"), `/\P{^Canadian_Aboriginal}/` ("Not not a Canadian aboriginal character")
+* ...Even between different ruby versions!! (e.g. `/\p{Arabic}/.examples(max_group_results: 999)` will give you a different answer in ruby v2.1.x and v2.2.x)
 * **Arbitrarily complex combinations of all the above!**
 
 * Regexp options can also be used:
@@ -98,8 +101,7 @@ Long answer:
 
 ## Bugs and Not-Yet-Supported syntax
 
-* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...
-* Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.1 (see the pending pull request)!
+* There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy!) However, patterns like this are highly unusual...
 
 Since the Regexp language is so vast, it's quite likely I've missed something (please raise an issue if you find something)! The only missing feature that I'm currently aware of is:
 * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
@@ -109,7 +111,7 @@ Some of the most obscure regexp features are not even mentioned in the ruby docs
 ## Impossible features ("illegal syntax")
 
 The following features in the regex language can never be properly implemented into this gem because, put simply, they are not technically "regular"!
-If you'd like to understand this in more detail,
+If you'd like to understand this in more detail, check out what I had to say in [my blog post](http://tom-lord.weebly.com/blog/reverse-engineering-regular-expressions) about this gem!
 
 Using any of the following will raise a RegexpExamples::IllegalSyntax exception:
 
@@ -137,7 +139,7 @@ When generating examples, the gem uses 2 configurable values to limit how many e
 * `[h-s]` is equivalent to `[hijkl]`
 * `(1|2|3|4|5|6|7|8)` is equivalent to `[12345]`
 
-Rexexp#examples makes use of *both* these options; Rexexp#random_example only uses `max_repeater_variance`, since the other option is redundant!
+`Rexexp#examples` makes use of *both* these options; `Rexexp#random_example` only uses `max_repeater_variance`, since the other option is redundant!
 
 To use an alternative value, simply pass the configuration option as follows:
 
@@ -150,9 +152,9 @@ To use an alternative value, simply pass the configuration option as follows:
 #=> "A very unlikely result!"
 ```
 
-_**WARNING**: Choosing huge numbers for `Regexp#examples
+_**WARNING**: Choosing huge numbers for `Regexp#examples` and/or a sufficiently "complex" regex, could easily cause your system to freeze!_
 
-For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, numbers and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1
+For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, numbers and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1 _billion_) examples!
 
 In other words, think twice before playing around with this config!
 
@@ -167,13 +169,11 @@ Due to code optimisation, this is not something you need to worry about (much) f
 ## TODO
 
 * Performance improvements:
-* Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time. See the pending pull request.
 * (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes".
-* Write a blog post about how this amazing gem works! :)
 
 ## Contributing
 
-1. Fork it ( https://github.com/
+1. Fork it ( https://github.com/tom-lord/regexp-examples/fork )
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
 4. Push to the branch (`git push origin my-new-feature`)
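The README's warning about `max_group_results` comes down to simple multiplication. The sketch below reproduces that arithmetic in plain Ruby. `WORD_CHARS`, `capped` and `example_count` are hypothetical names for illustration and are not part of the gem. The capping rule is inferred from the README's own `[h-s]` and `/\w{5}/` examples.

```ruby
# The 63 ASCII "word" characters that \w matches: letters, digits and "_".
WORD_CHARS = (('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a + ['_']).freeze

# Hypothetical helper: keep only the first max_group_results characters of a group.
def capped(chars, max_group_results)
  chars.first(max_group_results)
end

# A fixed repeat such as \w{5} multiplies the choices available at each position.
def example_count(group_size, repeats)
  group_size**repeats
end

p WORD_CHARS.size                              # 63
p example_count(WORD_CHARS.size, 5)            # 992436543
p capped(('h'..'s').to_a, 5).join              # "hijkl"
p example_count(capped(WORD_CHARS, 5).size, 5) # 3125
```

With the default cap of 5, the same pattern produces a few thousand examples instead of nearly a billion.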
data/Rakefile
CHANGED
Binary file
Binary file
Binary file
@@ -17,17 +17,17 @@ module CoreExtensions
 end
 
 private
-
+
+def examples_by_method(method)
 full_examples = RegexpExamples.public_send(
 method,
 RegexpExamples::Parser.new(source, options).parse
 )
 RegexpExamples::BackReferenceReplacer.new.substitute_backreferences(full_examples)
-
+end
 end
 end
 end
 
 # Regexp#include is private for ruby 2.0 and below
 Regexp.send(:include, CoreExtensions::Regexp::Examples)
-
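The refactored `examples_by_method` chooses a generator by name with `public_send`. The file then calls `include` through `send`, because `Module#include` was private before Ruby 2.1. The toy sketch below shows both mechanisms in isolation. `ToyGenerators`, `ToyExtensions` and `toy_examples` are hypothetical, and splitting the source on `|` stands in for the gem's real parser.

```ruby
# Hypothetical generator module: both methods take the same argument, so a
# caller can pick one by name.
module ToyGenerators
  def self.map_results(alternatives)
    alternatives
  end

  def self.random_result(alternatives)
    [alternatives.sample]
  end
end

# Hypothetical extension module with the same shape as CoreExtensions::Regexp::Examples.
module ToyExtensions
  def toy_examples(method = :map_results)
    examples_by_method(method)
  end

  private

  def examples_by_method(method)
    # Toy stand-in for the parser: split the regexp source on "|".
    alternatives = source.split('|')
    # public_send only dispatches to public methods, unlike send.
    ToyGenerators.public_send(method, alternatives)
  end
end

# Module#include was private before Ruby 2.1, so call it through send.
Regexp.send(:include, ToyExtensions)

p(/cat|dog/.toy_examples) # ["cat", "dog"]
```

Because `examples_by_method` is declared under `private`, it becomes a private instance method of `Regexp` after the include. Only the public wrapper can be called from outside.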
data/lib/regexp-examples.rb
CHANGED
@@ -1,2 +1,11 @@
-
-
+require_relative 'regexp-examples/unicode_char_ranges'
+require_relative 'regexp-examples/backreferences'
+require_relative 'regexp-examples/chargroup_parser'
+require_relative 'regexp-examples/constants'
+require_relative 'regexp-examples/groups'
+require_relative 'regexp-examples/helpers'
+require_relative 'regexp-examples/parser'
+require_relative 'regexp-examples/repeaters'
+require_relative 'regexp-examples/unicode_char_ranges'
+require_relative 'regexp-examples/version'
+require_relative 'core_extensions/regexp/examples'
@@ -6,7 +6,7 @@ module RegexpExamples
 full_examples.map do |full_example|
 begin
 while full_example.match(/__(\w+?)__/)
-full_example.sub!(/__(\w+?)__/, find_backref_for(full_example,
+full_example.sub!(/__(\w+?)__/, find_backref_for(full_example, Regexp.last_match(1)))
 end
 full_example
 rescue BackrefNotFound
@@ -18,6 +18,7 @@ module RegexpExamples
 end
 
 private
+
 def find_backref_for(full_example, group_id)
 full_example.all_subgroups.detect do |subgroup|
 subgroup.group_id == group_id
@@ -29,10 +30,8 @@ module RegexpExamples
 if octal_chars =~ /\A[01]?[0-7]{1,2}\z/ && octal_chars.to_i >= 10
 Integer(octal_chars, 8).chr
 else
-
+fail(BackrefNotFound)
 end
 end
-
 end
-
 end
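Taken together, the backreference hunks describe a loop that keeps replacing `__id__` placeholders until none remain. Judging by the `octal_chars` branch, ids such as `10` or `177` can also resolve to octal characters. Below is a minimal sketch of that idea. It assumes the captured text for each group sits in a plain Hash, a hypothetical stand-in for `all_subgroups.detect`. It also assumes that examples whose backreference cannot be resolved are simply dropped. `octal_char_for`, `substitute_backrefs` and `substitute_all` are illustrative names.

```ruby
# Mirrors the gem's BackrefNotFound error (defined locally for this sketch).
class BackrefNotFound < StandardError; end

# Octal branch as in the diff: ids such as "10" or "177" become characters.
def octal_char_for(octal_chars)
  if octal_chars =~ /\A[01]?[0-7]{1,2}\z/ && octal_chars.to_i >= 10
    Integer(octal_chars, 8).chr
  else
    raise BackrefNotFound
  end
end

# Replaces every __id__ placeholder. `groups` (id => captured text) is a
# hypothetical stand-in for full_example.all_subgroups.
def substitute_backrefs(example, groups)
  example = example.dup
  while example.match(/__(\w+?)__/)
    id = Regexp.last_match(1)
    replacement = groups.fetch(id) { octal_char_for(id) }
    # Block form, so backslashes in the replacement are taken literally.
    example.sub!(/__(\w+?)__/) { replacement }
  end
  example
end

# Assumption: examples with an unresolvable backreference are discarded.
def substitute_all(examples, groups)
  examples.map do |example|
    begin
      substitute_backrefs(example, groups)
    rescue BackrefNotFound
      nil
    end
  end.compact
end

p substitute_all(['__1__ __1__', 'x__10__', '__2__'], '1' => 'ab') # ["ab ab", "x\b"]
```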
@@ -22,30 +22,30 @@ module RegexpExamples
 @charset = []
 @negative = false
 parse_first_chars
-until next_char ==
+until next_char == ']'
 case next_char
-when
+when '['
 @current_position += 1
 sub_group_parser = self.class.new(rest_of_string, is_sub_group: true)
 @charset.concat sub_group_parser.result
 @current_position += sub_group_parser.length
-when
-if regexp_string[@current_position + 1] ==
-@charset <<
+when '-'
+if regexp_string[@current_position + 1] == ']' # e.g. /[abc-]/ -- not a range!
+@charset << '-'
 @current_position += 1
 else
 @current_position += 1
-@charset.concat (@charset.last
+@charset.concat (@charset.last..parse_checking_backlash.first).to_a
 @current_position += 1
 end
-when
-if regexp_string[@current_position + 1] ==
+when '&'
+if regexp_string[@current_position + 1] == '&'
 @current_position += 2
 sub_group_parser = self.class.new(rest_of_string, is_sub_group: @is_sub_group)
 @charset &= sub_group_parser.result
 @current_position += (sub_group_parser.length - 1)
 else
-@charset <<
+@charset << '&'
 @current_position += 1
 end
 else
@@ -67,28 +67,29 @@ module RegexpExamples
 end
 
 private
+
 def parse_first_chars
 if next_char == '^'
 @negative = true
 @current_position += 1
 end
-
+
 case rest_of_string
 when /\A[-\]]/ # e.g. /[]]/ (match "]") or /[-]/ (match "-")
 @charset << next_char
 @current_position += 1
 when /\A:(\^?)([^:]+):\]/ # e.g. [[:alpha:]] - POSIX group
 if @is_sub_group
-chars =
+chars = Regexp.last_match(1).empty? ? POSIXCharMap[Regexp.last_match(2)] : (CharSets::Any - POSIXCharMap[Regexp.last_match(2)])
 @charset.concat chars
-@current_position += (
+@current_position += (Regexp.last_match(1).length + Regexp.last_match(2).length + 2)
 end
 end
 end
 
 # Always returns an Array, for consistency
 def parse_checking_backlash
-if next_char ==
+if next_char == '\\'
 @current_position += 1
 parse_after_backslash
 else
@@ -116,4 +117,3 @@ module RegexpExamples
 end
 end
 end
-
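The `when '-'` and `when '&'` branches in the char-group parser handle three cases: ranges, a literal `-` just before the closing bracket, and `&&` intersection. The hypothetical `expand_char_class` below is a much-simplified, standalone sketch of those rules. Unlike the gem's parser, it ignores escapes, negation and nested classes, and it takes only the text between the brackets as input.

```ruby
# Hypothetical, simplified expander for the text inside [...].
# Handles literals, ranges like "a-e", a literal "-" at either end, and "&&".
def expand_char_class(body)
  charset = []
  i = 0
  while i < body.length
    char = body[i]
    if char == '-' && !charset.empty? && i + 1 < body.length
      # Range from the previous character to the next one,
      # like (@charset.last..next_char).to_a in the diff
      charset.concat((charset.last..body[i + 1]).to_a)
      i += 2
    elsif char == '&' && body[i + 1] == '&'
      # "&&": intersect what we have so far with the rest of the class
      return charset & expand_char_class(body[(i + 2)..-1])
    else
      # Plain literal, including a leading/trailing "-" or a lone "&"
      charset << char
      i += 1
    end
  end
  charset.uniq
end

p expand_char_class('a-z&&d-f') # ["d", "e", "f"]
```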
@@ -11,8 +11,8 @@ module RegexpExamples
 
 # Maximum number of characters returned from a char set, to reduce output spam
 # For example, if @@max_group_results = 5 then:
-# \d
-# \w
+# \d is equivalent to [01234]
+# \w is equivalent to [abcde]
 MaxGroupResultsDefault = 5
 
 class << self
@@ -72,10 +72,10 @@ module RegexpExamples
 POSIXCharMap = {
 'alnum' => CharSets::Upper | CharSets::Lower | CharSets::Digit,
 'alpha' => CharSets::Upper | CharSets::Lower,
-'blank' => [
+'blank' => [' ', "\t"],
 'cntrl' => CharSets::Control,
 'digit' => CharSets::Digit,
-'graph' => (CharSets::Any - CharSets::Control) - [
+'graph' => (CharSets::Any - CharSets::Control) - [' '], # Visible chars
 'lower' => CharSets::Lower,
 'print' => CharSets::Any - CharSets::Control,
 'punct' => CharSets::Punct,
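The POSIX bracket branch of the char-group parser works in three steps. It looks the name up in `POSIXCharMap`, subtracts the result from the full set for `[:^name:]`, and then advances past the two colons, the optional caret and the name. The sketch below reproduces that logic against a small toy map. `ANY`, `POSIX_MAP` and `parse_posix_group` are hypothetical, and printable ASCII is assumed as the universe instead of the gem's `CharSets::Any`.

```ruby
# Toy universe of characters: printable ASCII (an assumption, not CharSets::Any).
ANY = (32..126).map(&:chr).freeze

# A few POSIX classes, modelled on the POSIXCharMap entries in the diff.
POSIX_MAP = {
  'digit' => ('0'..'9').to_a,
  'blank' => [' ', "\t"],
  'graph' => ANY - [' '] # visible chars (ANY has no control chars here)
}.freeze

# Parses the text that follows an inner "[", e.g. ":^digit:]]".
# Returns [chars, characters_consumed], or nil if it is not a POSIX group.
def parse_posix_group(rest_of_string)
  match = rest_of_string.match(/\A:(\^?)([^:]+):\]/)
  return nil unless match

  negated = match[1]
  name = match[2]
  members = POSIX_MAP.fetch(name)
  chars = negated.empty? ? members : ANY - members
  # Two colons, the optional caret and the name; the closing "]" is left for the caller.
  [chars, negated.length + name.length + 2]
end

chars, consumed = parse_posix_group(':^digit:]]')
p [chars.size, consumed] # [85, 8]
```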
@@ -86,156 +86,5 @@ module RegexpExamples
 'ascii' => CharSets::Any
 }.freeze
 
-
-result = []
-ranges.each do |range|
-if range.is_a? Fixnum # Small hack to improve readability below
-result << hex_to_unicode(range.to_s(16))
-else
-range.each { |num| result << hex_to_unicode(num.to_s(16)) }
-end
-end
-result
-end
-
-def self.hex_to_unicode(hex)
-eval("?\\u{#{hex}}")
-end
-
-# These values were generated by: scripts/unicode_lister.rb
-# Note: Only the first 128 results are listed, for performance.
-# Also, some groups seem to have no matches (weird!)
-NamedPropertyCharMap = {
-'alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
-'alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
-'blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
-'cntrl' => ranges_to_unicode(0..31, 127..159),
-'digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
-'graph' => ranges_to_unicode(33..126, 161..194),
-'lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
-'print' => ranges_to_unicode(32..126, 160..192),
-'punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
-'space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
-'upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
-'xdigit' => ranges_to_unicode(48..57, 65..70, 97..102),
-'word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
-'ascii' => ranges_to_unicode(0..127),
-'any' => ranges_to_unicode(0..127),
-'assigned' => ranges_to_unicode(0..127),
-'l' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
-'ll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
-'lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
-'lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
-'lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
-'lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
-'m' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
-'mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
-'mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
-'me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
-'n' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
-'nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
-'nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
-'no' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
-'p' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
-'pc' => ranges_to_unicode(95, 8255..8256, 8276),
-'pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
-'ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
-'pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
-'pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
-'pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
-'po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
-'s' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
-'sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
-'sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
-'sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
-'so' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
-'z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
-'zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
-'zl' => ranges_to_unicode(8232),
-'zp' => ranges_to_unicode(8233),
-'c' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
-'cc' => ranges_to_unicode(0..31, 127..159),
-'cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
-'cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
-'co' => ranges_to_unicode(),
-'cs' => ranges_to_unicode(),
-'arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
-'armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
-'balinese' => ranges_to_unicode(6912..6987, 6992..7036),
-'bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
-'bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
-'braille' => ranges_to_unicode(10240..10367),
-'buginese' => ranges_to_unicode(6656..6683, 6686..6687),
-'buhid' => ranges_to_unicode(5952..5971),
-'canadian_aboriginal' => ranges_to_unicode(5120..5247),
-'carian' => ranges_to_unicode(),
-'cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
-'cherokee' => ranges_to_unicode(5024..5108),
-'common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
-'coptic' => ranges_to_unicode(994..1007, 11392..11505),
-'cuneiform' => ranges_to_unicode(),
-'cypriot' => ranges_to_unicode(),
-'cyrillic' => ranges_to_unicode(1024..1151),
-'deseret' => ranges_to_unicode(),
-'devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
-'ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
-'georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
-'glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
-'gothic' => ranges_to_unicode(),
-'greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
-'gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
-'gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
-'han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
-'hangul' => ranges_to_unicode(4352..4479),
-'hanunoo' => ranges_to_unicode(5920..5940),
-'hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
-'hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
-'inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
-'kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
-'katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
-'kayah_li' => ranges_to_unicode(43264..43309, 43311),
-'kharoshthi' => ranges_to_unicode(),
-'khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
-'lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
-'latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
-'lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
-'limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
-'linear_b' => ranges_to_unicode(),
-'lycian' => ranges_to_unicode(),
-'lydian' => ranges_to_unicode(),
-'malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
-'mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
-'myanmar' => ranges_to_unicode(4096..4223),
-'new_tai_lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
-'nko' => ranges_to_unicode(1984..2042),
-'ogham' => ranges_to_unicode(5760..5788),
-'ol_chiki' => ranges_to_unicode(7248..7295),
-'old_italic' => ranges_to_unicode(),
-'old_persian' => ranges_to_unicode(),
-'oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
-'osmanya' => ranges_to_unicode(),
-'phags_pa' => ranges_to_unicode(43072..43127),
-'phoenician' => ranges_to_unicode(),
-'rejang' => ranges_to_unicode(43312..43347, 43359),
-'runic' => ranges_to_unicode(5792..5866, 5870..5880),
-'saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
-'shavian' => ranges_to_unicode(),
-'sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
-'sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
-'syloti_nagri' => ranges_to_unicode(43008..43051),
-'syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
-'tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
-'tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
-'tai_le' => ranges_to_unicode(6480..6509, 6512..6516),
-'tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
-'telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
-'thaana' => ranges_to_unicode(1920..1969),
-'thai' => ranges_to_unicode(3585..3642, 3648..3675),
-'tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
-'tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
-'ugaritic' => ranges_to_unicode(),
-'vai' => ranges_to_unicode(42240..42367),
-'yi' => ranges_to_unicode(40960..41087),
-}.freeze
+NamedPropertyCharMap = UnicodeCharRanges.new
 end
-
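The removed `ranges_to_unicode`/`hex_to_unicode` helpers built characters by `eval`-ing `?\u{...}` literals, and they relied on a `Fixnum` check (`Fixnum` was merged into `Integer` in Ruby 2.4). If you need the same expansion elsewhere, `Integer#chr` with an explicit encoding does it without `eval`, and `Kernel#Array` handles both single code points and ranges. This is an alternative sketch, not what the gem now does. The diff replaces the table with `UnicodeCharRanges.new`, whose source is not shown on this page.

```ruby
# Expands code points and code-point ranges into UTF-8 characters, without eval.
def ranges_to_unicode(*ranges)
  ranges.flat_map do |range|
    # Array(170) => [170]; Array(48..50) => [48, 49, 50]
    Array(range).map { |code_point| code_point.chr(Encoding::UTF_8) }
  end
end

p ranges_to_unicode(48..50, 65) # ["0", "1", "2", "A"]
```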
@@ -7,9 +7,7 @@ module RegexpExamples
 def initialize(result, group_id = nil, subgroups = [])
 @group_id = group_id
 @subgroups = subgroups
-if result.respond_to?(:group_id)
-@subgroups = result.all_subgroups
-end
+@subgroups = result.all_subgroups if result.respond_to?(:group_id)
 super(result)
 end
 
@@ -23,13 +21,21 @@ module RegexpExamples
 end
 end
 
+module ForceLazyEnumerators
+def force_if_lazy(arr_or_enum)
+arr_or_enum.respond_to?(:force) ? arr_or_enum.force : arr_or_enum
+end
+end
+
 module GroupWithIgnoreCase
+include ForceLazyEnumerators
 attr_reader :ignorecase
 def result
 group_result = super
 if ignorecase
-group_result
-
+group_result_array = force_if_lazy(group_result)
+group_result_array
+.concat(group_result_array.map(&:swapcase))
 .uniq
 else
 group_result
@@ -38,8 +44,9 @@ module RegexpExamples
 end
 
 module RandomResultBySample
+include ForceLazyEnumerators
 def random_result
-result.sample(1)
+force_if_lazy(result).sample(1)
 end
 end
 
@@ -50,6 +57,7 @@ module RegexpExamples
 @char = char
 @ignorecase = ignorecase
 end
+
 def result
 [GroupResult.new(@char)]
 end
@@ -75,11 +83,10 @@ module RegexpExamples
 end
 
 def result
-@chars.map do |result|
+@chars.lazy.map do |result|
 GroupResult.new(result)
 end
 end
-
 end
 
 class DotGroup
@@ -91,7 +98,7 @@ module RegexpExamples
 
 def result
 chars = multiline ? CharSets::Any : CharSets::AnyNoNewLine
-chars.map do |result|
+chars.lazy.map do |result|
 GroupResult.new(result)
 end
 end
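After this change, `CharGroup` and `DotGroup` return `lazy.map` enumerators. `force_if_lazy` turns them back into Arrays wherever an Array method such as `sample` or `concat` is needed, since `Enumerator::Lazy` has neither. The toy class below shows that elements are only built when requested, and how the two call sites use the helper. Apart from the copied `ForceLazyEnumerators` module, the class is hypothetical, and plain strings stand in for `GroupResult` objects.

```ruby
# Same helper as in the diff: force a lazy enumerator, pass Arrays through.
module ForceLazyEnumerators
  def force_if_lazy(arr_or_enum)
    arr_or_enum.respond_to?(:force) ? arr_or_enum.force : arr_or_enum
  end
end

# Hypothetical stand-in for CharGroup; strings replace GroupResult objects.
class ToyCharGroup
  include ForceLazyEnumerators
  attr_reader :build_count

  def initialize(chars)
    @chars = chars
    @build_count = 0
  end

  # Like the new CharGroup#result: each element is built only when requested.
  def result
    @chars.lazy.map do |char|
      @build_count += 1
      char
    end
  end

  # Enumerator::Lazy has no #sample, so force it first (cf. RandomResultBySample).
  def random_result
    force_if_lazy(result).sample(1)
  end

  # Cf. GroupWithIgnoreCase: add swapped-case variants, then drop duplicates.
  def ignorecase_result
    chars = force_if_lazy(result)
    chars.concat(chars.map(&:swapcase)).uniq
  end
end

group = ToyCharGroup.new(('a'..'z').to_a)
p group.result.first(2) # ["a", "b"]
p group.build_count     # 2
```

This matches the README TODO's idea of using lazy evaluation so that large character sets are not expanded up front.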
@@ -113,10 +120,11 @@ module RegexpExamples
 end
 
 private
+
 # Generates the result of each contained group
 # and adds the filled group of each result to itself
 def result_by_method(method)
-strings = @groups.map {|repeater| repeater.public_send(method)}
+strings = @groups.map { |repeater| repeater.public_send(method) }
 RegexpExamples.permutations_of_strings(strings).map do |result|
 GroupResult.new(result, group_id)
 end
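`result_by_method` passes one list of strings per repeater to `RegexpExamples.permutations_of_strings`, whose implementation is not part of this diff. If it produces every way of picking one string from each list, in order, a sketch could look like this. `permutations_of_strings_sketch` is a hypothetical name, and the gem's real helper may differ (for example, by also limiting the number of results).

```ruby
# Hypothetical sketch: every way of picking one string from each list, in order.
def permutations_of_strings_sketch(lists_of_strings)
  lists_of_strings.inject(['']) do |partials, options|
    partials.product(options).map(&:join)
  end
end

p permutations_of_strings_sketch([%w[a b], %w[1 2]]) # ["a1", "a2", "b1", "b2"]
```

The output size is the product of the list sizes. This is why the README caps each group with `max_group_results`.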
@@ -143,6 +151,7 @@ module RegexpExamples
 end
 
 private
+
 def result_by_method(method)
 left_result = RegexpExamples.public_send(method, @left_repeaters)
 right_result = RegexpExamples.public_send(method, @right_repeaters)
@@ -160,8 +169,7 @@ module RegexpExamples
 end
 
 def result
-[
+[GroupResult.new("__#{@id}__")]
 end
 end
-
 end