regexp-examples 1.1.0 → 1.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: df7027b67d35ab27ac6577ffad4ccb2edc9852a9
- data.tar.gz: 9b5d3556ec049d0ccde7d6e1d8ed8bcdced2d1dd
+ metadata.gz: 96c7fa5e2bb981f8a2bb79debed55724f7c2bf57
+ data.tar.gz: abbc72305949312b0b3847c583b13c4b5dd09b8a
  SHA512:
- metadata.gz: ddcd3c9c084020ea7003cb86363ed40d7b511f743f0bb9169e9ba32dcb2aecce65d03819dee59a8889148a7c399b1a547ee07bfa9800f30fbf28d47ed7a50eb3
- data.tar.gz: bf2714b21a7c81db479f5fda2b23d378abcea98d2f2843429006ae39402db96fc09d15cd233fbea8f11c17e3bca93062cef01f3ac51a947beac00076beb4acbb
+ metadata.gz: c18e91f5585b7ba5d494c0a26af766c0e7c7e39a9244e3a211d23ad2ec96b512046d01cf8f4e11839bfb15717dc24215661644a8de33095b9ce929e483b37429
+ data.tar.gz: 5661b852fb34cd1c7c2948c5558d4263520b996645e806cdd2635ea08466d019365a7998f0f8eedf7681dd107c5bdf5e909a09a16ebed640e83be38eadd71f5c
data/README.md CHANGED
@@ -14,6 +14,8 @@ or a huge number of possible matches, such as `/.\w/`, then only a subset of the

  For more detail on this, see [configuration options](#configuration-options).

+ If you'd like to understand how/why this gem works, please check out my [blog post](http://tom-lord.weebly.com/blog/reverse-engineering-regular-expressions) about it!
+
  ## Usage

  ```ruby
@@ -88,6 +90,7 @@ Long answer:
  * Octal characters, e.g. `/\10/`, `/\177/`
  * Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character")
  , `/\p{^Ll}/` ("Not a lowercase letter"), `/\P{^Canadian_Aboriginal}/` ("Not not a Canadian aboriginal character")
+ * ...Even between different ruby versions!! (e.g. `/\p{Arabic}/.examples(max_group_results: 999)` will give you a different answer in ruby v2.1.x and v2.2.x)
  * **Arbitrarily complex combinations of all the above!**

  * Regexp options can also be used:
@@ -98,8 +101,7 @@ Long answer:

  ## Bugs and Not-Yet-Supported syntax

- * There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy! However, patterns like this are highly unusual...)
- * Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This will be fixed in version 1.1.1 (see the pending pull request)!
+ * There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy!) However, patterns like this are highly unusual...

  Since the Regexp language is so vast, it's quite likely I've missed something (please raise an issue if you find something)! The only missing feature that I'm currently aware of is:
  * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
@@ -109,7 +111,7 @@ Some of the most obscure regexp features are not even mentioned in the ruby docs
  ## Impossible features ("illegal syntax")

  The following features in the regex language can never be properly implemented into this gem because, put simply, they are not technically "regular"!
- If you'd like to understand this in more detail, there are many good blog posts out on the internet. The [wikipedia entry](http://en.wikipedia.org/wiki/Regular_expression)'s not bad either.
+ If you'd like to understand this in more detail, check out what I had to say in [my blog post](http://tom-lord.weebly.com/blog/reverse-engineering-regular-expressions) about this gem!

  Using any of the following will raise a RegexpExamples::IllegalSyntax exception:

@@ -137,7 +139,7 @@ When generating examples, the gem uses 2 configurable values to limit how many e
  * `[h-s]` is equivalent to `[hijkl]`
  * `(1|2|3|4|5|6|7|8)` is equivalent to `[12345]`

- Rexexp#examples makes use of *both* these options; Rexexp#random_example only uses `max_repeater_variance`, since the other option is redundant!
+ `Rexexp#examples` makes use of *both* these options; `Rexexp#random_example` only uses `max_repeater_variance`, since the other option is redundant!

  To use an alternative value, simply pass the configuration option as follows:

@@ -150,9 +152,9 @@ To use an alternative value, simply pass the configuration option as follows:
  #=> "A very unlikely result!"
  ```

- _**WARNING**: Choosing huge numbers for `Regexp#examples`, along with a "complex" regex, could easily cause your system to freeze!_
+ _**WARNING**: Choosing huge numbers for `Regexp#examples` and/or a sufficiently "complex" regex, could easily cause your system to freeze!_

- For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, numbers and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1 _trillion_) examples!
+ For example, if you try to generate a list of _all_ 5-letter words: `/\w{5}/.examples(max_group_results: 999)`, then since there are actually `63` "word" characters (upper/lower case letters, numbers and "\_"), this will try to generate `63**5 #=> 992436543` (almost 1 _billion_) examples!

  In other words, think twice before playing around with this config!

@@ -167,13 +169,11 @@ Due to code optimisation, this is not something you need to worry about (much) f
  ## TODO

  * Performance improvements:
- * Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time. See the pending pull request.
  * (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes".
- * Write a blog post about how this amazing gem works! :)

  ## Contributing

- 1. Fork it ( https://github.com/[my-github-username]/regexp-examples/fork )
+ 1. Fork it ( https://github.com/tom-lord/regexp-examples/fork )
  2. Create your feature branch (`git checkout -b my-new-feature`)
  3. Commit your changes (`git commit -am 'Add some feature'`)
  4. Push to the branch (`git push origin my-new-feature`)
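Note on the README change from "trillion" to "billion": the corrected figure can be checked with plain Ruby, without loading the gem. The sketch below assumes ASCII-only input (which is what the README's count of `63` implies); the variable names are illustrative only.

```ruby
# Collect every ASCII character that \w matches:
# 26 upper-case + 26 lower-case letters + 10 digits + "_".
word_chars = (0..127).map(&:chr).grep(/\w/)
word_char_count = word_chars.size

# Number of distinct 5-character strings over that alphabet.
total_examples = word_char_count**5

puts word_char_count # prints 63
puts total_examples  # prints 992436543, i.e. just under one billion
```

So the added README line is the accurate one: the result is roughly 10^9, not 10^12.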
data/Rakefile CHANGED
@@ -1,7 +1,7 @@
  require 'rake'
  require 'rspec/core/rake_task'
  require 'bundler/gem_tasks'
-
+
  RSpec::Core::RakeTask.new(:spec)
-
- task :default => :spec
+
+ task default: :spec
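Note on the Rakefile change: the only non-whitespace edit swaps the hash-rocket form for the symbol-key shorthand. As a quick check in plain Ruby (no Rake required), both spellings build the same hash, so the default task should behave identically:

```ruby
old_style = { :default => :spec } # form used on the removed line
new_style = { default: :spec }    # form used on the added line

puts old_style == new_style # prints true
```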
Binary file
Binary file
Binary file
@@ -17,17 +17,17 @@ module CoreExtensions
  end

  private
- def examples_by_method(method)
+
+ def examples_by_method(method)
  full_examples = RegexpExamples.public_send(
  method,
  RegexpExamples::Parser.new(source, options).parse
  )
  RegexpExamples::BackReferenceReplacer.new.substitute_backreferences(full_examples)
- end
+ end
  end
  end
  end

  # Regexp#include is private for ruby 2.0 and below
  Regexp.send(:include, CoreExtensions::Regexp::Examples)
-
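Note on `Regexp.send(:include, ...)`: the comment in this file says `include` is private on Ruby 2.0 and below, which is why the mixin is attached through `send`. A minimal sketch of the same pattern follows; `SourceLength` is a hypothetical module invented for this illustration and is not part of the gem.

```ruby
# Hypothetical mixin, used only to demonstrate the technique.
module SourceLength
  def source_length
    source.length
  end
end

# `send` bypasses method visibility, so this call works whether
# Module#include is private (old Rubies) or public (newer Rubies).
Regexp.send(:include, SourceLength)

puts(/ab+c/.source_length) # prints 4
```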
@@ -1,2 +1,11 @@
- Dir[File.dirname(__FILE__) + '/regexp-examples/**/*.rb'].each {|file| require file }
-
+ require_relative 'regexp-examples/unicode_char_ranges'
+ require_relative 'regexp-examples/backreferences'
+ require_relative 'regexp-examples/chargroup_parser'
+ require_relative 'regexp-examples/constants'
+ require_relative 'regexp-examples/groups'
+ require_relative 'regexp-examples/helpers'
+ require_relative 'regexp-examples/parser'
+ require_relative 'regexp-examples/repeaters'
+ require_relative 'regexp-examples/unicode_char_ranges'
+ require_relative 'regexp-examples/version'
+ require_relative 'core_extensions/regexp/examples'
@@ -6,7 +6,7 @@ module RegexpExamples
  full_examples.map do |full_example|
  begin
  while full_example.match(/__(\w+?)__/)
- full_example.sub!(/__(\w+?)__/, find_backref_for(full_example, $1))
+ full_example.sub!(/__(\w+?)__/, find_backref_for(full_example, Regexp.last_match(1)))
  end
  full_example
  rescue BackrefNotFound
@@ -18,6 +18,7 @@ module RegexpExamples
  end

  private
+
  def find_backref_for(full_example, group_id)
  full_example.all_subgroups.detect do |subgroup|
  subgroup.group_id == group_id
@@ -29,10 +30,8 @@ module RegexpExamples
  if octal_chars =~ /\A[01]?[0-7]{1,2}\z/ && octal_chars.to_i >= 10
  Integer(octal_chars, 8).chr
  else
- raise(BackrefNotFound)
+ fail(BackrefNotFound)
  end
  end
-
  end
-
  end
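Note on the backreference hunks: the loop replaces `__id__` placeholders one at a time, and the change from `$1` to `Regexp.last_match(1)` is stylistic, since both read the first capture of the most recent match. The sketch below imitates that loop in isolation. The `captured` hash is a hypothetical stand-in for the gem's subgroup lookup (`find_backref_for`), whose real data structures are not shown in this diff.

```ruby
# Hypothetical lookup table standing in for the gem's captured subgroups.
captured = { '1' => 'foo', 'name' => 'bar' }

example = 'foo __1__ bar __name__'.dup
while example.match(/__(\w+?)__/)
  # Same value that the Perl-style global $1 would hold here.
  group_id = Regexp.last_match(1)
  example.sub!(/__(\w+?)__/, captured.fetch(group_id))
end

puts example # prints "foo foo bar bar"

# The octal fallback in the last hunk converts digits such as "101"
# into the character with that octal code point.
octal_char = Integer('101', 8).chr
puts octal_char # prints "A"
```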
@@ -22,30 +22,30 @@ module RegexpExamples
  @charset = []
  @negative = false
  parse_first_chars
- until next_char == "]" do
+ until next_char == ']'
  case next_char
- when "["
+ when '['
  @current_position += 1
  sub_group_parser = self.class.new(rest_of_string, is_sub_group: true)
  @charset.concat sub_group_parser.result
  @current_position += sub_group_parser.length
- when "-"
- if regexp_string[@current_position + 1] == "]" # e.g. /[abc-]/ -- not a range!
- @charset << "-"
+ when '-'
+ if regexp_string[@current_position + 1] == ']' # e.g. /[abc-]/ -- not a range!
+ @charset << '-'
  @current_position += 1
  else
  @current_position += 1
- @charset.concat (@charset.last .. parse_checking_backlash.first).to_a
+ @charset.concat (@charset.last..parse_checking_backlash.first).to_a
  @current_position += 1
  end
- when "&"
- if regexp_string[@current_position + 1] == "&"
+ when '&'
+ if regexp_string[@current_position + 1] == '&'
  @current_position += 2
  sub_group_parser = self.class.new(rest_of_string, is_sub_group: @is_sub_group)
  @charset &= sub_group_parser.result
  @current_position += (sub_group_parser.length - 1)
  else
- @charset << "&"
+ @charset << '&'
  @current_position += 1
  end
  else
@@ -67,28 +67,29 @@ module RegexpExamples
  end

  private
+
  def parse_first_chars
  if next_char == '^'
  @negative = true
  @current_position += 1
  end
-
+
  case rest_of_string
  when /\A[-\]]/ # e.g. /[]]/ (match "]") or /[-]/ (match "-")
  @charset << next_char
  @current_position += 1
  when /\A:(\^?)([^:]+):\]/ # e.g. [[:alpha:]] - POSIX group
  if @is_sub_group
- chars = $1.empty? ? POSIXCharMap[$2] : (CharSets::Any - POSIXCharMap[$2])
+ chars = Regexp.last_match(1).empty? ? POSIXCharMap[Regexp.last_match(2)] : (CharSets::Any - POSIXCharMap[Regexp.last_match(2)])
  @charset.concat chars
- @current_position += ($1.length + $2.length + 2)
+ @current_position += (Regexp.last_match(1).length + Regexp.last_match(2).length + 2)
  end
  end
  end

  # Always returns an Array, for consistency
  def parse_checking_backlash
- if next_char == "\\"
+ if next_char == '\\'
  @current_position += 1
  parse_after_backslash
  else
@@ -116,4 +117,3 @@ module RegexpExamples
  end
  end
  end
-
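Note on the character-group parser: the `'-'` branch expands a range starting from the last character already collected, and the `'&&'` branch intersects the current set with a nested one using `&=`. The sketch below reproduces just those two array operations for `/[a-e&&[c-f]]/`; it is a simplified illustration and does not use the parser class itself.

```ruby
charset = ['a']

# '-' branch: the range starts at charset.last, so 'a' is briefly duplicated.
charset.concat((charset.last..'e').to_a)

# '&&' branch: Array#& keeps common elements and also drops duplicates.
charset &= ('c'..'f').to_a

# Cross-check against Ruby's own regexp engine.
matched = ('a'..'z').grep(/[a-e&&[c-f]]/)

puts charset.inspect # prints ["c", "d", "e"]
puts matched.inspect # prints ["c", "d", "e"]
```

Because the intersection deduplicates, the repeated first character from the range expansion does no harm in this case.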
@@ -11,8 +11,8 @@ module RegexpExamples

  # Maximum number of characters returned from a char set, to reduce output spam
  # For example, if @@max_group_results = 5 then:
- # \d = ["0", "1", "2", "3", "4"]
- # \w = ["a", "b", "c", "d", "e"]
+ # \d is equivalent to [01234]
+ # \w is equivalent to [abcde]
  MaxGroupResultsDefault = 5

  class << self
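Note on `max_group_results`: the reworded comment says that with a limit of 5, `\d` behaves like `[01234]`. A rough way to picture this, assuming the limit simply keeps the first N candidates of each set (the actual truncation code is not part of this diff), is:

```ruby
# Assumed default, taken from MaxGroupResultsDefault in the hunk above.
max_group_results = 5

# Hypothetical truncation step: keep only the first N candidates.
limited_digits = ('0'..'9').first(max_group_results)
limited_range = ('h'..'s').first(max_group_results)

puts limited_digits.join # prints "01234"
puts limited_range.join  # prints "hijkl"
```

The second line matches the README's statement that `[h-s]` is equivalent to `[hijkl]` under the default setting.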
@@ -72,10 +72,10 @@ module RegexpExamples
  POSIXCharMap = {
  'alnum' => CharSets::Upper | CharSets::Lower | CharSets::Digit,
  'alpha' => CharSets::Upper | CharSets::Lower,
- 'blank' => [" ", "\t"],
+ 'blank' => [' ', "\t"],
  'cntrl' => CharSets::Control,
  'digit' => CharSets::Digit,
- 'graph' => (CharSets::Any - CharSets::Control) - [" "], # Visible chars
+ 'graph' => (CharSets::Any - CharSets::Control) - [' '], # Visible chars
  'lower' => CharSets::Lower,
  'print' => CharSets::Any - CharSets::Control,
  'punct' => CharSets::Punct,
@@ -86,156 +86,5 @@ module RegexpExamples
  'ascii' => CharSets::Any
  }.freeze

- def self.ranges_to_unicode(*ranges)
- result = []
- ranges.each do |range|
- if range.is_a? Fixnum # Small hack to improve readability below
- result << hex_to_unicode(range.to_s(16))
- else
- range.each { |num| result << hex_to_unicode(num.to_s(16)) }
- end
- end
- result
- end
-
- def self.hex_to_unicode(hex)
- eval("?\\u{#{hex}}")
- end
-
- # These values were generated by: scripts/unicode_lister.rb
- # Note: Only the first 128 results are listed, for performance.
- # Also, some groups seem to have no matches (weird!)
- NamedPropertyCharMap = {
- 'alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
110
- 'alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
111
- 'blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
112
- 'cntrl' => ranges_to_unicode(0..31, 127..159),
113
- 'digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
114
- 'graph' => ranges_to_unicode(33..126, 161..194),
115
- 'lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
116
- 'print' => ranges_to_unicode(32..126, 160..192),
117
- 'punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
118
- 'space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
119
- 'upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
120
- 'xdigit' => ranges_to_unicode(48..57, 65..70, 97..102),
121
- 'word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
122
- 'ascii' => ranges_to_unicode(0..127),
123
- 'any' => ranges_to_unicode(0..127),
124
- 'assigned' => ranges_to_unicode(0..127),
125
- 'l' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
126
- 'll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
127
- 'lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
128
- 'lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
129
- 'lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
130
- 'lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
131
- 'm' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
132
- 'mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
133
- 'mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
134
- 'me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
135
- 'n' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
136
- 'nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
137
- 'nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
138
- 'no' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
139
- 'p' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
140
- 'pc' => ranges_to_unicode(95, 8255..8256, 8276),
141
- 'pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
142
- 'ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
143
- 'pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
144
- 'pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
145
- 'pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
146
- 'po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
147
- 's' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
148
- 'sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
149
- 'sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
150
- 'sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
151
- 'so' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
152
- 'z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
153
- 'zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
154
- 'zl' => ranges_to_unicode(8232),
155
- 'zp' => ranges_to_unicode(8233),
156
- 'c' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
157
- 'cc' => ranges_to_unicode(0..31, 127..159),
158
- 'cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
159
- 'cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
160
- 'co' => ranges_to_unicode(),
161
- 'cs' => ranges_to_unicode(),
162
- 'arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
163
- 'armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
164
- 'balinese' => ranges_to_unicode(6912..6987, 6992..7036),
165
- 'bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
166
- 'bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
167
- 'braille' => ranges_to_unicode(10240..10367),
168
- 'buginese' => ranges_to_unicode(6656..6683, 6686..6687),
169
- 'buhid' => ranges_to_unicode(5952..5971),
170
- 'canadian_aboriginal' => ranges_to_unicode(5120..5247),
171
- 'carian' => ranges_to_unicode(),
172
- 'cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
173
- 'cherokee' => ranges_to_unicode(5024..5108),
174
- 'common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
175
- 'coptic' => ranges_to_unicode(994..1007, 11392..11505),
176
- 'cuneiform' => ranges_to_unicode(),
177
- 'cypriot' => ranges_to_unicode(),
178
- 'cyrillic' => ranges_to_unicode(1024..1151),
179
- 'deseret' => ranges_to_unicode(),
180
- 'devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
181
- 'ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
182
- 'georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
183
- 'glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
184
- 'gothic' => ranges_to_unicode(),
185
- 'greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
186
- 'gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
187
- 'gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
188
- 'han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
189
- 'hangul' => ranges_to_unicode(4352..4479),
190
- 'hanunoo' => ranges_to_unicode(5920..5940),
191
- 'hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
192
- 'hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
193
- 'inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
194
- 'kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
195
- 'katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
196
- 'kayah_li' => ranges_to_unicode(43264..43309, 43311),
197
- 'kharoshthi' => ranges_to_unicode(),
198
- 'khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
199
- 'lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
200
- 'latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
201
- 'lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
202
- 'limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
203
- 'linear_b' => ranges_to_unicode(),
204
- 'lycian' => ranges_to_unicode(),
205
- 'lydian' => ranges_to_unicode(),
206
- 'malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
207
- 'mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
208
- 'myanmar' => ranges_to_unicode(4096..4223),
209
- 'new_tai_lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
210
- 'nko' => ranges_to_unicode(1984..2042),
211
- 'ogham' => ranges_to_unicode(5760..5788),
212
- 'ol_chiki' => ranges_to_unicode(7248..7295),
213
- 'old_italic' => ranges_to_unicode(),
214
- 'old_persian' => ranges_to_unicode(),
215
- 'oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
216
- 'osmanya' => ranges_to_unicode(),
217
- 'phags_pa' => ranges_to_unicode(43072..43127),
218
- 'phoenician' => ranges_to_unicode(),
219
- 'rejang' => ranges_to_unicode(43312..43347, 43359),
220
- 'runic' => ranges_to_unicode(5792..5866, 5870..5880),
221
- 'saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
222
- 'shavian' => ranges_to_unicode(),
223
- 'sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
224
- 'sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
225
- 'syloti_nagri' => ranges_to_unicode(43008..43051),
226
- 'syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
227
- 'tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
228
- 'tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
229
- 'tai_le' => ranges_to_unicode(6480..6509, 6512..6516),
230
- 'tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
231
- 'telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
232
- 'thaana' => ranges_to_unicode(1920..1969),
233
- 'thai' => ranges_to_unicode(3585..3642, 3648..3675),
234
- 'tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
235
- 'tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
236
- 'ugaritic' => ranges_to_unicode(),
237
- 'vai' => ranges_to_unicode(42240..42367),
238
- 'yi' => ranges_to_unicode(40960..41087),
239
- }.freeze
+ NamedPropertyCharMap = UnicodeCharRanges.new
  end
-
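Note on the removed `ranges_to_unicode` helper: it built each character by `eval`-ing a `?\u{...}` literal and tested for `Fixnum`, a constant that no longer exists in current Ruby. This diff does not show how the new `UnicodeCharRanges` class works, so the following is only a hypothetical, eval-free rewrite of the removed helper, using `Array#pack('U')` to turn a code point into a UTF-8 string.

```ruby
# Hypothetical replacement for the removed helper; NOT the gem's actual code.
# Accepts a mix of single code points and ranges of code points.
def ranges_to_unicode(*ranges)
  ranges.flat_map do |range|
    codepoints = range.is_a?(Integer) ? [range] : range.to_a
    # pack('U') encodes one code point as a UTF-8 string, no eval needed.
    codepoints.map { |codepoint| [codepoint].pack('U') }
  end
end

sample = ranges_to_unicode(97..99, 181)
puts sample.inspect # prints ["a", "b", "c", "µ"]
```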
@@ -7,9 +7,7 @@ module RegexpExamples
  def initialize(result, group_id = nil, subgroups = [])
  @group_id = group_id
  @subgroups = subgroups
- if result.respond_to?(:group_id)
- @subgroups = result.all_subgroups
- end
+ @subgroups = result.all_subgroups if result.respond_to?(:group_id)
  super(result)
  end

@@ -23,13 +21,21 @@ module RegexpExamples
  end
  end

+ module ForceLazyEnumerators
+ def force_if_lazy(arr_or_enum)
+ arr_or_enum.respond_to?(:force) ? arr_or_enum.force : arr_or_enum
+ end
+ end
+
  module GroupWithIgnoreCase
+ include ForceLazyEnumerators
  attr_reader :ignorecase
  def result
  group_result = super
  if ignorecase
- group_result
- .concat( group_result.map(&:swapcase) )
+ group_result_array = force_if_lazy(group_result)
+ group_result_array
+ .concat(group_result_array.map(&:swapcase))
  .uniq
  else
  group_result
@@ -38,8 +44,9 @@ module RegexpExamples
  end

  module RandomResultBySample
+ include ForceLazyEnumerators
  def random_result
- result.sample(1)
+ force_if_lazy(result).sample(1)
  end
  end

@@ -50,6 +57,7 @@ module RegexpExamples
  @char = char
  @ignorecase = ignorecase
  end
+
  def result
  [GroupResult.new(@char)]
  end
@@ -75,11 +83,10 @@ module RegexpExamples
  end

  def result
- @chars.map do |result|
+ @chars.lazy.map do |result|
  GroupResult.new(result)
  end
  end
-
  end

  class DotGroup
@@ -91,7 +98,7 @@ module RegexpExamples

  def result
  chars = multiline ? CharSets::Any : CharSets::AnyNoNewLine
- chars.map do |result|
+ chars.lazy.map do |result|
  GroupResult.new(result)
  end
  end
@@ -113,10 +120,11 @@ module RegexpExamples
  end

  private
+
  # Generates the result of each contained group
  # and adds the filled group of each result to itself
  def result_by_method(method)
- strings = @groups.map {|repeater| repeater.public_send(method)}
+ strings = @groups.map { |repeater| repeater.public_send(method) }
  RegexpExamples.permutations_of_strings(strings).map do |result|
  GroupResult.new(result, group_id)
  end
@@ -143,6 +151,7 @@ module RegexpExamples
  end

  private
+
  def result_by_method(method)
  left_result = RegexpExamples.public_send(method, @left_repeaters)
  right_result = RegexpExamples.public_send(method, @right_repeaters)
@@ -160,8 +169,7 @@ module RegexpExamples
  end

  def result
- [ GroupResult.new("__#{@id}__") ]
+ [GroupResult.new("__#{@id}__")]
  end
  end
-
  end
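Note on the lazy enumerators in this file: `result` now returns `@chars.lazy.map { ... }`, an `Enumerator::Lazy`, which has no Array-only methods such as `sample` or `concat`. That appears to be why `force_if_lazy` was introduced. The sketch below copies the helper from the diff as a top-level method and shows its effect on plain strings; the sample data is illustrative only.

```ruby
# Same logic as ForceLazyEnumerators#force_if_lazy in the diff,
# defined at top level here so the sketch is self-contained.
def force_if_lazy(arr_or_enum)
  arr_or_enum.respond_to?(:force) ? arr_or_enum.force : arr_or_enum
end

lazy_result = %w[a b c].lazy.map(&:upcase) # nothing is computed yet
eager_result = %w[x y]

puts lazy_result.respond_to?(:sample)    # prints false
puts force_if_lazy(lazy_result).inspect  # prints ["A", "B", "C"]
puts force_if_lazy(eager_result).inspect # prints ["x", "y"]
```

Arrays pass through unchanged, so callers can treat both kinds of result the same way.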