regexp-examples 0.7.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: f7dacce756110dd70823630de898a8c9f55d12b1
4
- data.tar.gz: d3ee78e2ed48d91aacc9cb916d8ab71dd25e326d
3
+ metadata.gz: 39d4ef9f2ee3e17541118580954e0a9f137f969b
4
+ data.tar.gz: 9e573acd0b5cdcf700bfe4f49a838933e6a01737
5
5
  SHA512:
6
- metadata.gz: 2655f9c1b1bbb8452a06d7debdba232ba53354131776bbf23a69fc1dc3b62d950600093b4581d2f4c5304f161db421ed95204e8c40cee2adc75c95670dcf42a1
7
- data.tar.gz: da2dd9829aa3f5f2415f4a5ca4182133c19b1a481a40172140858ba72f65e05824eebdbff8899c6f0d84a90c93b0539c86c68dd7a23371fe6f577da738746824
6
+ metadata.gz: 50ee2b66ced2a4878309566088a493b8cc7654d44db6b91458c0f04f4015cbc9a2e437b7444670b28a78475a5df715d89dd8a81fa1e9d508095c234da830c782
7
+ data.tar.gz: 4785f7967bc3549629c2fbb5f74f0d7343076a028d33265344b9f1cb0b74c39df2c4f402ab97baeb42fb536dbd31132eb3fb69d462cab5a3ffb0f568ae91bed5
data/.gitignore CHANGED
@@ -12,3 +12,4 @@
12
12
  *.a
13
13
  mkmf.log
14
14
  tags
15
+ /coverage/
data/Gemfile CHANGED
@@ -1,9 +1,10 @@
1
1
  source 'https://rubygems.org'
2
2
 
3
- gem 'rspec', group: :test
4
- gem 'simplecov', require: false, group: :test
5
- gem 'simplecov-badge', require: false, group: :test
6
- gem 'pry', group: :test
3
+ group :test do
4
+ gem 'rspec'
5
+ gem 'coveralls', require: false
6
+ gem 'pry'
7
+ end
7
8
 
8
9
  # Specify your gem's dependencies in regexp-examples.gemspec
9
10
  gemspec
data/README.md CHANGED
@@ -1,7 +1,7 @@
1
1
  # regexp-examples
2
2
  [![Gem Version](https://badge.fury.io/rb/regexp-examples.svg)](http://badge.fury.io/rb/regexp-examples)
3
3
  [![Build Status](https://travis-ci.org/tom-lord/regexp-examples.svg?branch=master)](https://travis-ci.org/tom-lord/regexp-examples/builds)
4
- ![Code Coverage](coverage/coverage-badge.png)
4
+ [![Coverage Status](https://coveralls.io/repos/tom-lord/regexp-examples/badge.svg?branch=master)](https://coveralls.io/r/tom-lord/regexp-examples?branch=master)
5
5
 
6
6
  Extends the Regexp class with the method: Regexp#examples
7
7
 
@@ -26,12 +26,33 @@ For more detail on this, see [configuration options](#configuration-options).
26
26
  /what about (backreferences\?) \1/.examples #=> ['what about backreferences? backreferences?']
27
27
  ```
28
28
 
29
+ ## Installation
30
+
31
+ Add this line to your application's Gemfile:
32
+
33
+ ```ruby
34
+ gem 'regexp-examples'
35
+ ```
36
+
37
+ And then execute:
38
+
39
+ $ bundle
40
+
41
+ Or install it yourself as:
42
+
43
+ $ gem install regexp-examples
44
+
29
45
  ## Supported syntax
30
46
 
31
47
  * All forms of repeaters (quantifiers), e.g. `/a*/`, `/a+/`, `/a?/`, `/a{1,4}/`, `/a{3,}/`, `/a{,2}/`
32
48
  * Reluctant and possessive repeaters work fine, too - e.g. `/a*?/`, `/a*+/`
33
49
  * Boolean "Or" groups, e.g. `/a|b|c/`
34
- * Character sets (including ranges and negation!), e.g. `/[abc]/`, `/[A-Z0-9]/`, `/[^a-z]/`, `/[\w\s\b]/`
50
+ * Character sets e.g. `/[abc]/` - including:
51
+ * Ranges, e.g. `/[A-Z0-9]/`
52
+ * Negation, e.g. `/[^a-z]/`
53
+ * Escaped characters, e.g. `/[\w\s\b]/`
54
+ * POSIX bracket expressions, e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
55
+ * Set intersection, e.g. `/[[a-h]&&[f-z]]/`
35
56
  * Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
36
57
  * Capture groups, e.g. `/(group)/`
37
58
  * Including named groups, e.g. `/(?<name>group)/`
@@ -43,7 +64,6 @@ For more detail on this, see [configuration options](#configuration-options).
43
64
  * Escape sequences, e.g. `/\x42/`, `/\x5word/`, `/#{"\x80".force_encoding("ASCII-8BIT")}/`
44
65
  * Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
45
66
  * Octal characters, e.g. `/\10/`, `/\177/`
46
- * POSIX bracket expressions (including negation), e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
47
67
  * Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character"), `/\p{^Ll}/` ("Not a lowercase letter")
48
68
  * **Arbitrarily complex combinations of all the above!**
49
69
 
@@ -55,13 +75,12 @@ For more detail on this, see [configuration options](#configuration-options).
55
75
 
56
76
  ## Bugs and Not-Yet-Supported syntax
57
77
 
58
- * Nested character classes, and the use of set intersection ([See here](http://www.ruby-doc.org/core-2.2.0/Regexp.html#class-Regexp-label-Character+Classes) for the official documentation on this.) For example:
59
- * `/[[abc]de]/.examples` (which _should_ return `["a", "b", "c", "d", "e"]`)
60
- * `/[[a-d]&&[c-f]]/.examples` (which _should_ return: `["c", "d"]`)
78
+ * There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy!) However, patterns like this are highly unusual...
79
+ * Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This would be "easy" to fix, but I can't be bothered... Feel free to make a pull request!
61
80
 
62
- * Conditional capture groups, such as `/(group1) (?(1)yes|no)`
63
-
64
- There are loads more (increasingly obscure) unsupported bits of syntax, which I cannot be bothered to write out here. Full documentation on all the various other obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE).
81
+ There are also various (increasingly obscure) unsupported bits of syntax, which I cannot be bothered to write out fully here. Full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE). To name a couple:
82
+ * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
83
+ * Back reference by relative group number, e.g. `/(a)(b)(c)(d) \k<-2>/.examples` (which *should* return: `["abcd c"]`)
65
84
 
66
85
  ## Impossible features ("illegal syntax")
67
86
 
@@ -115,21 +134,12 @@ A more sensible use case might be, for example, to generate one random 1-4 digit
115
134
 
116
135
  (Note: I may develop a much more efficient way to "generate one example" in a later release of this gem.)
117
136
 
118
- ## Installation
119
-
120
- Add this line to your application's Gemfile:
121
-
122
- ```ruby
123
- gem 'regexp-examples'
124
- ```
137
+ ## TODO
125
138
 
126
- And then execute:
127
-
128
- $ bundle
129
-
130
- Or install it yourself as:
131
-
132
- $ gem install regexp-examples
139
+ * Performance improvements:
140
+ * Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time.
141
+ * (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes"
142
+ * Write a blog post about how this amazing gem works! :)
133
143
 
134
144
  ## Contributing
135
145
 
@@ -1,69 +1,118 @@
1
1
  module RegexpExamples
2
- # Given an array of chars from inside a character set,
3
- # Interprets all backslashes, ranges and negations
4
- # TODO: This needs a bit of a rewrite because:
5
- # A) It's ugly
6
- # B) It doesn't take into account nested character groups, or set intersection
7
- # To achieve this, the algorithm needs to be recursive, like the main Parser.
2
+ # A "sub-parser", for char groups in a regular expression
3
+ # Some examples of what this class needs to parse:
4
+ # [abc] - plain characters
5
+ # [a-z] - ranges
6
+ # [\n\b\d] - escaped characters (which may represent character sets)
7
+ # [^abc] - negated group
8
+ # [[a][bc]] - sub-groups (should match "a", "b" or "c")
9
+ # [[:lower:]] - POSIX group
10
+ # [[a-f]&&[d-z]] - set intersection (should match "d", "e" or "f")
11
+ # [[:^alpha:]&&[\n]a-c] - all of the above!!!! (should match "\n")
8
12
  class ChargroupParser
9
- def initialize(chars)
10
- @chars = chars
11
- if @chars[0] == "^"
12
- @negative = true
13
- @chars = @chars[1..-1]
14
- else
15
- @negative = false
13
+ attr_reader :regexp_string
14
+ def initialize(regexp_string, is_sub_group: false)
15
+ @regexp_string = regexp_string
16
+ @is_sub_group = is_sub_group
17
+ @current_position = 0
18
+ parse
19
+ end
20
+
21
+ def parse
22
+ @charset = []
23
+ @negative = false
24
+ parse_first_chars
25
+ until next_char == "]" do
26
+ case next_char
27
+ when "["
28
+ @current_position += 1
29
+ sub_group_parser = self.class.new(rest_of_string, is_sub_group: true)
30
+ @charset.concat sub_group_parser.result
31
+ @current_position += sub_group_parser.length
32
+ when "-"
33
+ if regexp_string[@current_position + 1] == "]" # e.g. /[abc-]/ -- not a range!
34
+ @charset << "-"
35
+ @current_position += 1
36
+ else
37
+ @current_position += 1
38
+ @charset.concat (@charset.last .. parse_checking_backlash.first).to_a
39
+ @current_position += 1
40
+ end
41
+ when "&"
42
+ if regexp_string[@current_position + 1] == "&"
43
+ @current_position += 2
44
+ sub_group_parser = self.class.new(rest_of_string, is_sub_group: @is_sub_group)
45
+ @charset &= sub_group_parser.result
46
+ @current_position += (sub_group_parser.length - 1)
47
+ else
48
+ @charset << "&"
49
+ @current_position += 1
50
+ end
51
+ else
52
+ @charset.concat parse_checking_backlash
53
+ @current_position += 1
54
+ end
16
55
  end
17
56
 
18
- init_backslash_chars
19
- init_ranges
57
+ @charset.uniq!
58
+ @current_position += 1 # To account for final "]"
59
+ end
60
+
61
+ def length
62
+ @current_position
20
63
  end
21
64
 
22
65
  def result
23
- @negative ? (CharSets::Any - @chars) : @chars
66
+ @negative ? (CharSets::Any - @charset) : @charset
24
67
  end
25
68
 
26
69
  private
27
- def init_backslash_chars
28
- @chars.each_with_index do |char, i|
29
- if char == "\\"
30
- if BackslashCharMap.keys.include?(@chars[i+1])
31
- @chars[i..i+1] = move_backslash_to_front( BackslashCharMap[@chars[i+1]] )
32
- elsif @chars[i+1] == 'b'
33
- @chars[i..i+1] = "\b"
34
- elsif @chars[i+1] == "\\"
35
- @chars.delete_at(i+1)
36
- else
37
- @chars.delete_at(i)
38
- end
70
+ def parse_first_chars
71
+ if next_char == '^'
72
+ @negative = true
73
+ @current_position += 1
74
+ end
75
+
76
+ case rest_of_string
77
+ when /\A[-\]]/ # e.g. /[]]/ (match "]") or /[-]/ (match "-")
78
+ @charset << next_char
79
+ @current_position += 1
80
+ when /\A:(\^?)([^:]+):\]/ # e.g. [[:alpha:]] - POSIX group
81
+ if @is_sub_group
82
+ chars = $1.empty? ? POSIXCharMap[$2] : (CharSets::Any - POSIXCharMap[$2])
83
+ @charset.concat chars
84
+ @current_position += ($1.length + $2.length + 2)
39
85
  end
40
86
  end
41
87
  end
42
88
 
43
- def init_ranges
44
- # remove hyphen ("-") from front/back, if present
45
- hyphen = nil
46
- hyphen = @chars.shift if @chars.first == "-"
47
- hyphen ||= @chars.pop if @chars.last == "-"
48
- # Replace all instances of e.g. ["a", "-", "z"] with ["a", "b", ..., "z"]
49
- while i = @chars.index("-")
50
- # Prevent infinite loops from expanding [",", "-", "."] to itself
51
- # (Since ",".ord = 44, "-".ord = 45, ".".ord = 46)
52
- if (@chars[i-1] == ',' && @chars[i+1] == '.')
53
- hyphen = @chars.delete_at(i)
54
- else
55
- @chars[i-1..i+1] = (@chars[i-1]..@chars[i+1]).to_a
56
- end
89
+ # Always returns an Array, for consistency
90
+ def parse_checking_backlash
91
+ if next_char == "\\"
92
+ @current_position += 1
93
+ parse_after_backslash
94
+ else
95
+ [next_char]
57
96
  end
58
- # restore hyphen, if stripped out earlier
59
- @chars.unshift(hyphen) if hyphen
60
97
  end
61
98
 
62
- def move_backslash_to_front(chars)
63
- if index = chars.index { |char| char == '\\' }
64
- chars.unshift chars.delete_at(index)
99
+ def parse_after_backslash
100
+ case next_char
101
+ when *BackslashCharMap.keys
102
+ BackslashCharMap[next_char]
103
+ when 'b'
104
+ ["\b"]
105
+ else
106
+ [next_char]
65
107
  end
66
- chars
108
+ end
109
+
110
+ def rest_of_string
111
+ regexp_string[@current_position..-1]
112
+ end
113
+
114
+ def next_char
115
+ regexp_string[@current_position]
67
116
  end
68
117
  end
69
118
  end
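
The set semantics that the rewritten `ChargroupParser` aims to reproduce can be checked against Ruby's own regexp engine. The sketch below is not part of the gem: `chars_matching` is a hypothetical helper that brute-forces which lowercase ASCII letters a character class accepts, using only core `Regexp` behaviour (`Regexp#match?` assumes Ruby 2.4 or later).

```ruby
# Hypothetical helper (not part of regexp-examples): list the candidate
# characters that a single-character pattern accepts.
def chars_matching(pattern, candidates = ("a".."z"))
  candidates.select { |char| pattern.match?(char) }
end

nested       = chars_matching(/\A[[a][bc]]\z/)      # sub-groups behave as a union
intersection = chars_matching(/\A[[a-f]&&[d-z]]\z/) # && keeps only the shared characters
negated      = chars_matching(/\A[^a-w]\z/)         # negation, limited to the candidates

p nested        # ["a", "b", "c"]
p intersection  # ["d", "e", "f"]
p negated       # ["x", "y", "z"]
```

Comparing brute-force results like these with the output of `Regexp#examples` is one possible way to spot-check the parser on small character classes.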
@@ -105,136 +105,136 @@ module RegexpExamples
105
105
  # Note: Only the first 128 results are listed, for performance.
106
106
  # Also, some groups seem to have no matches (weird!)
107
107
  NamedPropertyCharMap = {
108
- 'Alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
109
- 'Alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
110
- 'Blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
111
- 'Cntrl' => ranges_to_unicode(0..31, 127..159),
112
- 'Digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
113
- 'Graph' => ranges_to_unicode(33..126, 161..194),
114
- 'Lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
115
- 'Print' => ranges_to_unicode(32..126, 160..192),
116
- 'Punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
117
- 'Space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
118
- 'Upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
119
- 'XDigit' => ranges_to_unicode(48..57, 65..70, 97..102),
120
- 'Word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
121
- 'ASCII' => ranges_to_unicode(0..127),
122
- 'Any' => ranges_to_unicode(0..127),
123
- 'Assigned' => ranges_to_unicode(0..127),
124
- 'L' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
125
- 'Ll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
126
- 'Lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
127
- 'Lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
128
- 'Lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
129
- 'Lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
130
- 'M' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
131
- 'Mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
132
- 'Mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
133
- 'Me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
134
- 'N' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
135
- 'Nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
136
- 'Nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
137
- 'No' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
138
- 'P' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
139
- 'Pc' => ranges_to_unicode(95, 8255..8256, 8276),
140
- 'Pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
141
- 'Ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
142
- 'Pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
143
- 'Pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
144
- 'Pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
145
- 'Po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
146
- 'S' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
147
- 'Sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
148
- 'Sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
149
- 'Sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
150
- 'So' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
151
- 'Z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
152
- 'Zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
153
- 'Zl' => ranges_to_unicode(8232),
154
- 'Zp' => ranges_to_unicode(8233),
155
- 'C' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
156
- 'Cc' => ranges_to_unicode(0..31, 127..159),
157
- 'Cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
158
- 'Cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
159
- 'Co' => ranges_to_unicode(),
160
- 'Cs' => ranges_to_unicode(),
161
- 'Arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
162
- 'Armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
163
- 'Balinese' => ranges_to_unicode(6912..6987, 6992..7036),
164
- 'Bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
165
- 'Bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
166
- 'Braille' => ranges_to_unicode(10240..10367),
167
- 'Buginese' => ranges_to_unicode(6656..6683, 6686..6687),
168
- 'Buhid' => ranges_to_unicode(5952..5971),
169
- 'Canadian_Aboriginal' => ranges_to_unicode(5120..5247),
170
- 'Carian' => ranges_to_unicode(),
171
- 'Cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
172
- 'Cherokee' => ranges_to_unicode(5024..5108),
173
- 'Common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
174
- 'Coptic' => ranges_to_unicode(994..1007, 11392..11505),
175
- 'Cuneiform' => ranges_to_unicode(),
176
- 'Cypriot' => ranges_to_unicode(),
177
- 'Cyrillic' => ranges_to_unicode(1024..1151),
178
- 'Deseret' => ranges_to_unicode(),
179
- 'Devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
180
- 'Ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
181
- 'Georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
182
- 'Glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
183
- 'Gothic' => ranges_to_unicode(),
184
- 'Greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
185
- 'Gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
186
- 'Gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
187
- 'Han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
188
- 'Hangul' => ranges_to_unicode(4352..4479),
189
- 'Hanunoo' => ranges_to_unicode(5920..5940),
190
- 'Hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
191
- 'Hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
192
- 'Inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
193
- 'Kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
194
- 'Katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
195
- 'Kayah_Li' => ranges_to_unicode(43264..43309, 43311),
196
- 'Kharoshthi' => ranges_to_unicode(),
197
- 'Khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
198
- 'Lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
199
- 'Latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
200
- 'Lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
201
- 'Limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
202
- 'Linear_B' => ranges_to_unicode(),
203
- 'Lycian' => ranges_to_unicode(),
204
- 'Lydian' => ranges_to_unicode(),
205
- 'Malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
206
- 'Mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
207
- 'Myanmar' => ranges_to_unicode(4096..4223),
208
- 'New_Tai_Lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
209
- 'Nko' => ranges_to_unicode(1984..2042),
210
- 'Ogham' => ranges_to_unicode(5760..5788),
211
- 'Ol_Chiki' => ranges_to_unicode(7248..7295),
212
- 'Old_Italic' => ranges_to_unicode(),
213
- 'Old_Persian' => ranges_to_unicode(),
214
- 'Oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
215
- 'Osmanya' => ranges_to_unicode(),
216
- 'Phags_Pa' => ranges_to_unicode(43072..43127),
217
- 'Phoenician' => ranges_to_unicode(),
218
- 'Rejang' => ranges_to_unicode(43312..43347, 43359),
219
- 'Runic' => ranges_to_unicode(5792..5866, 5870..5880),
220
- 'Saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
221
- 'Shavian' => ranges_to_unicode(),
222
- 'Sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
223
- 'Sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
224
- 'Syloti_Nagri' => ranges_to_unicode(43008..43051),
225
- 'Syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
226
- 'Tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
227
- 'Tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
228
- 'Tai_Le' => ranges_to_unicode(6480..6509, 6512..6516),
229
- 'Tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
230
- 'Telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
231
- 'Thaana' => ranges_to_unicode(1920..1969),
232
- 'Thai' => ranges_to_unicode(3585..3642, 3648..3675),
233
- 'Tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
234
- 'Tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
235
- 'Ugaritic' => ranges_to_unicode(),
236
- 'Vai' => ranges_to_unicode(42240..42367),
237
- 'Yi' => ranges_to_unicode(40960..41087),
108
+ 'alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
109
+ 'alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
110
+ 'blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
111
+ 'cntrl' => ranges_to_unicode(0..31, 127..159),
112
+ 'digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
113
+ 'graph' => ranges_to_unicode(33..126, 161..194),
114
+ 'lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
115
+ 'print' => ranges_to_unicode(32..126, 160..192),
116
+ 'punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
117
+ 'space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
118
+ 'upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
119
+ 'xdigit' => ranges_to_unicode(48..57, 65..70, 97..102),
120
+ 'word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
121
+ 'ascii' => ranges_to_unicode(0..127),
122
+ 'any' => ranges_to_unicode(0..127),
123
+ 'assigned' => ranges_to_unicode(0..127),
124
+ 'l' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
125
+ 'll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
126
+ 'lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
127
+ 'lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
128
+ 'lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
129
+ 'lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
130
+ 'm' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
131
+ 'mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
132
+ 'mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
133
+ 'me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
134
+ 'n' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
135
+ 'nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
136
+ 'nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
137
+ 'no' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
138
+ 'p' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
139
+ 'pc' => ranges_to_unicode(95, 8255..8256, 8276),
140
+ 'pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
141
+ 'ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
142
+ 'pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
143
+ 'pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
144
+ 'pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
145
+ 'po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
146
+ 's' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
147
+ 'sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
148
+ 'sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
149
+ 'sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
150
+ 'so' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
151
+ 'z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
152
+ 'zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
153
+ 'zl' => ranges_to_unicode(8232),
154
+ 'zp' => ranges_to_unicode(8233),
155
+ 'c' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
156
+ 'cc' => ranges_to_unicode(0..31, 127..159),
157
+ 'cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
158
+ 'cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
159
+ 'co' => ranges_to_unicode(),
160
+ 'cs' => ranges_to_unicode(),
161
+ 'arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
162
+ 'armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
163
+ 'balinese' => ranges_to_unicode(6912..6987, 6992..7036),
164
+ 'bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
165
+ 'bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
166
+ 'braille' => ranges_to_unicode(10240..10367),
167
+ 'buginese' => ranges_to_unicode(6656..6683, 6686..6687),
168
+ 'buhid' => ranges_to_unicode(5952..5971),
169
+ 'canadian_aboriginal' => ranges_to_unicode(5120..5247),
170
+ 'carian' => ranges_to_unicode(),
171
+ 'cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
172
+ 'cherokee' => ranges_to_unicode(5024..5108),
173
+ 'common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
174
+ 'coptic' => ranges_to_unicode(994..1007, 11392..11505),
175
+ 'cuneiform' => ranges_to_unicode(),
176
+ 'cypriot' => ranges_to_unicode(),
177
+ 'cyrillic' => ranges_to_unicode(1024..1151),
178
+ 'deseret' => ranges_to_unicode(),
179
+ 'devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
180
+ 'ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
181
+ 'georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
182
+ 'glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
183
+ 'gothic' => ranges_to_unicode(),
184
+ 'greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
185
+ 'gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
186
+ 'gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
187
+ 'han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
188
+ 'hangul' => ranges_to_unicode(4352..4479),
189
+ 'hanunoo' => ranges_to_unicode(5920..5940),
190
+ 'hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
191
+ 'hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
192
+ 'inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
193
+ 'kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
194
+ 'katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
195
+ 'kayah_li' => ranges_to_unicode(43264..43309, 43311),
196
+ 'kharoshthi' => ranges_to_unicode(),
197
+ 'khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
198
+ 'lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
199
+ 'latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
200
+ 'lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
201
+ 'limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
202
+ 'linear_b' => ranges_to_unicode(),
203
+ 'lycian' => ranges_to_unicode(),
204
+ 'lydian' => ranges_to_unicode(),
205
+ 'malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
206
+ 'mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
207
+ 'myanmar' => ranges_to_unicode(4096..4223),
208
+ 'new_tai_lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
209
+ 'nko' => ranges_to_unicode(1984..2042),
210
+ 'ogham' => ranges_to_unicode(5760..5788),
211
+ 'ol_chiki' => ranges_to_unicode(7248..7295),
212
+ 'old_italic' => ranges_to_unicode(),
213
+ 'old_persian' => ranges_to_unicode(),
214
+ 'oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
215
+ 'osmanya' => ranges_to_unicode(),
216
+ 'phags_pa' => ranges_to_unicode(43072..43127),
217
+ 'phoenician' => ranges_to_unicode(),
218
+ 'rejang' => ranges_to_unicode(43312..43347, 43359),
219
+ 'runic' => ranges_to_unicode(5792..5866, 5870..5880),
220
+ 'saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
221
+ 'shavian' => ranges_to_unicode(),
222
+ 'sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
223
+ 'sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
224
+ 'syloti_nagri' => ranges_to_unicode(43008..43051),
225
+ 'syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
226
+ 'tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
227
+ 'tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
228
+ 'tai_le' => ranges_to_unicode(6480..6509, 6512..6516),
229
+ 'tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
230
+ 'telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
231
+ 'thaana' => ranges_to_unicode(1920..1969),
232
+ 'thai' => ranges_to_unicode(3585..3642, 3648..3675),
233
+ 'tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
234
+ 'tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
235
+ 'ugaritic' => ranges_to_unicode(),
236
+ 'vai' => ranges_to_unicode(42240..42367),
237
+ 'yi' => ranges_to_unicode(40960..41087),
238
238
  }.freeze
239
239
  end
240
240
 
@@ -103,9 +103,9 @@ module RegexpExamples
103
103
  @current_position += ($1.length + $2.length + 2)
104
104
  group = CharGroup.new(
105
105
  if($1 == "^")
106
- CharSets::Any.dup - NamedPropertyCharMap[$2]
106
+ CharSets::Any.dup - NamedPropertyCharMap[$2.downcase]
107
107
  else
108
- NamedPropertyCharMap[$2]
108
+ NamedPropertyCharMap[$2.downcase]
109
109
  end,
110
110
  @ignorecase
111
111
  )
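Note on the hunk above: the property table is now keyed by lowercase names and the parser downcases the captured name before the lookup, so `\p{AlPhA}` and `\p{alpha}` resolve to the same entry. The sketch below illustrates that idea only. `ranges_to_unicode_sketch`, `PROPERTY_MAP_SKETCH` and `property_chars_sketch` are hypothetical names, the helper body is a guess at what `ranges_to_unicode` does (the gem's real implementation is not shown in this diff), and the 0..127 universe is a stand-in for `CharSets::Any`.

```ruby
# Hypothetical stand-in for the gem's ranges_to_unicode helper:
# expands a mix of Integers and Ranges into single-character strings.
def ranges_to_unicode_sketch(*ranges)
  ranges.flat_map { |r| r.is_a?(Range) ? r.to_a : [r] }
        .map { |code| [code].pack('U') }
end

# Tiny illustrative table with lowercase keys. The code points are
# copied from the 'xdigit' and 'zl' entries in this diff.
PROPERTY_MAP_SKETCH = {
  'xdigit' => ranges_to_unicode_sketch(48..57, 65..70, 97..102),
  'zl'     => ranges_to_unicode_sketch(8232)
}.freeze

# Case-insensitive lookup, with optional negation against an assumed
# universe of characters (0..127 here, purely for illustration).
def property_chars_sketch(name, negated: false,
                          universe: ranges_to_unicode_sketch(0..127))
  chars = PROPERTY_MAP_SKETCH.fetch(name.downcase)
  negated ? universe - chars : chars
end
```

With this sketch, `property_chars_sketch('XDigit')` and `property_chars_sketch('xdigit')` return the same array, which is the behaviour the `/\p{AlPhA}/` spec added in this release checks for.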
@@ -223,30 +223,10 @@ module RegexpExamples
223
223
  end
224
224
 
225
225
  def parse_char_group
226
- # TODO: Extract all this logic into ChargroupParser
227
- if rest_of_string =~ /\A\[\[:(\^?)([^:]+):\]\]/
228
- @current_position += (6 + $1.length + $2.length)
229
- chars = $1.empty? ? POSIXCharMap[$2] : CharSets::Any - POSIXCharMap[$2]
230
- return CharGroup.new(chars, @ignorecase)
231
- end
232
- chars = []
233
- @current_position += 1
234
- if next_char == ']'
235
- # Beware of the sneaky edge case:
236
- # /[]]/ (match "]")
237
- chars << ']'
238
- @current_position += 1
239
- end
240
- until next_char == ']' \
241
- && !regexp_string[0..@current_position-1].match(/[^\\](\\{2})*\\\z/)
242
- # Beware of having an ODD number of "\" before the "]", e.g.
243
- # /[\]]/ (match "]")
244
- # /[\\]/ (match "\")
245
- # /[\\\]]/ (match "\" or "]")
246
- chars << next_char
247
- @current_position += 1
248
- end
249
- parsed_chars = ChargroupParser.new(chars).result
226
+ @current_position += 1 # Skip past opening "["
227
+ chargroup_parser = ChargroupParser.new(rest_of_string)
228
+ parsed_chars = chargroup_parser.result
229
+ @current_position += (chargroup_parser.length - 1) # Step back to closing "]"
250
230
  CharGroup.new(parsed_chars, @ignorecase)
251
231
  end
252
232
 
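Note on the hunk above: `parse_char_group` no longer scans for the closing bracket itself. It skips the opening "[", hands the rest of the string to `ChargroupParser`, and then advances by the parser's reported `length` minus one so that the cursor rests on the closing "]". The sketch below walks through that position arithmetic with a toy parser. `ToyChargroupParser` and `toy_parse_char_group` are hypothetical, the toy handles only flat groups with no escapes or nesting, and it assumes `length` counts the closing "]" (which is what the "Step back" comment suggests, but the real class is not shown here).

```ruby
# Toy stand-in for ChargroupParser: only understands flat groups such
# as "abc]", with no escapes, ranges, nesting or negation.
class ToyChargroupParser
  attr_reader :result, :length

  # source is the regexp text starting just AFTER the opening "["
  def initialize(source)
    close = source.index(']')
    raise ArgumentError, 'unterminated char group' if close.nil?

    @result = source[0...close].chars
    @length = close + 1 # characters consumed, including the "]" (assumed)
  end
end

# Mirrors the position bookkeeping in the new parse_char_group.
# position must point at the opening "[" on entry.
def toy_parse_char_group(source, position)
  position += 1                       # skip past opening "["
  parser = ToyChargroupParser.new(source[position..-1])
  position += parser.length - 1       # step back to closing "]"
  [parser.result, position]
end

chars, pos = toy_parse_char_group('x[abc]y', 1)
# chars holds the group's characters; pos indexes the closing "]"
```

Keeping the cursor on the closing bracket rather than past it presumably lets the caller's normal "advance one character" step apply uniformly after every token.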
@@ -1,3 +1,3 @@
1
1
  module RegexpExamples
2
- VERSION = '0.7.0'
2
+ VERSION = '1.0.0'
3
3
  end
@@ -171,7 +171,7 @@ File.open(OutputFilename, 'w') do |f|
171
171
  NamedGroups.each do |name|
172
172
  count += 1
173
173
  matching_codes = (0..55295).lazy.select { |x| /\p{#{name}}/ =~ eval("?\\u{#{x.to_s(16)}}") }.first(128)
174
- f.puts "'#{name}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
174
+ f.puts "'#{name.downcase}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
175
175
  puts "(#{count}/#{NamedGroups.length}) Finished property: #{name}"
176
176
  end
177
177
  puts "*"*50
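Note on the hunk above: the generator script collects up to 128 matching code points per property (`.first(128)`) and writes them out as compact ranges under a downcased key. That cap would explain why entries such as 'ascii', 'any' and 'assigned' all come out as `0..127` in the generated table. The body of `calculate_ranges` is not part of this diff, so the following is only a guess at how such a helper could collapse consecutive code points. `calculate_ranges_sketch` is a hypothetical name.

```ruby
# Collapses a list of integers into the "a..b, c, d..e" text used as
# arguments to ranges_to_unicode in the generated table.
def calculate_ranges_sketch(codes)
  runs = codes.sort.uniq.slice_when { |a, b| b != a + 1 }
  runs.map do |run|
    run.size == 1 ? run.first.to_s : "#{run.first}..#{run.last}"
  end.join(', ')
end

# Formats one generated line with a lowercase key, as the script now does.
def table_line_sketch(name, codes)
  "'#{name.downcase}' => ranges_to_unicode(#{calculate_ranges_sketch(codes)}),"
end
```

An empty list of code points yields `ranges_to_unicode()`, which matches the shape of the empty entries (for example 'carian' or 'gothic') in the table.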
@@ -69,7 +69,6 @@ RSpec.describe Regexp, "#examples" do
69
69
 
70
70
  context "for complex char groups (square brackets)" do
71
71
  examples_exist_and_match(
72
-
73
72
  /[abc]/,
74
73
  /[a-c]/,
75
74
  /[abc-e]/,
@@ -82,7 +81,13 @@ RSpec.describe Regexp, "#examples" do
82
81
  /[\n-\r]/,
83
82
  /[\-]/,
84
83
  /[%-+]/, # This regex is "supposed to" match some surprising things!!!
85
- /['-.]/ # Test to ensure no "infinite loop" on character set expansion
84
+ /['-.]/, # Test to ensure no "infinite loop" on character set expansion
85
+ /[[abc]]/, # Nested groups
86
+ /[[[[abc]]]]/,
87
+ /[[a][b][c]]/,
88
+ /[[a-h]&&[f-z]]/, # Set intersection
89
+ /[[a-h]&&ab[c]]/, # Set intersection
90
+ /[[a-h]&[f-z]]/, # NOT set intersection
86
91
  )
87
92
  end
88
93
 
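Note on the hunk above: the new specs cover nested character groups and the `&&` set-intersection operator, plus a single `&`, which is just a literal character. The sketch below uses plain Ruby regexps (no gem code) to show what each of the three patterns is expected to match. `matching_chars` is a hypothetical helper written for this note.

```ruby
# Returns the candidates (single characters) that the regexp matches.
def matching_chars(regexp, candidates = ('a'..'z').to_a + ['&'])
  candidates.select { |c| regexp =~ c }
end

# "&&" intersects the sets on either side: [a-h] AND [f-z]
intersection = matching_chars(/[[a-h]&&[f-z]]/)

# The right-hand side may itself be a union: [a-h] AND (a, b or [c])
mixed = matching_chars(/[[a-h]&&ab[c]]/)

# A single "&" is a literal, so this is a union of [a-h], "&" and [f-z]
union_with_ampersand = matching_chars(/[[a-h]&[f-z]]/)
```

Here `intersection` contains only "f", "g" and "h", `mixed` contains "a", "b" and "c", and `union_with_ampersand` contains every lowercase letter plus "&". Any examples the gem generates for these patterns should fall inside those sets.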
@@ -173,7 +178,8 @@ RSpec.describe Regexp, "#examples" do
173
178
  context "for named properties" do
174
179
  examples_exist_and_match(
175
180
  /\p{L}/,
176
- /\p{Arabic}/,
181
+ /\p{Space}/,
182
+ /\p{AlPhA}/, # Checking case insensitivity
177
183
  /\p{^Ll}/
178
184
  )
179
185
 
data/spec/spec_helper.rb CHANGED
@@ -1,12 +1,5 @@
1
- require 'simplecov'
2
- SimpleCov.start do
3
- require 'simplecov-badge'
4
- SimpleCov::Formatter::BadgeFormatter.strength_foreground = true
5
- SimpleCov.formatter = SimpleCov::Formatter::MultiFormatter[
6
- SimpleCov::Formatter::HTMLFormatter,
7
- SimpleCov::Formatter::BadgeFormatter,
8
- ]
9
- end
1
+ require 'coveralls'
2
+ Coveralls.wear!
10
3
 
11
4
  require './lib/regexp-examples.rb'
12
5
  require 'pry'
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: regexp-examples
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.7.0
4
+ version: 1.0.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Tom Lord
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2015-02-28 00:00:00.000000000 Z
11
+ date: 2015-03-02 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -51,8 +51,6 @@ files:
51
51
  - LICENSE.txt
52
52
  - README.md
53
53
  - Rakefile
54
- - coverage/.gitignore
55
- - coverage/coverage-badge.png
56
54
  - lib/regexp-examples.rb
57
55
  - lib/regexp-examples/backreferences.rb
58
56
  - lib/regexp-examples/chargroup_parser.rb
@@ -87,7 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
87
85
  version: '0'
88
86
  requirements: []
89
87
  rubyforge_project:
90
- rubygems_version: 2.4.5
88
+ rubygems_version: 2.2.2
91
89
  signing_key:
92
90
  specification_version: 4
93
91
  summary: Extends the Regexp class with '#examples'
data/coverage/.gitignore DELETED
@@ -1,4 +0,0 @@
1
- # Ignore any file in this directory except for this file and coverage-badge.png files
2
- *
3
- !/.gitignore
4
- !coverage-badge.png
Binary file