regexp-examples 0.7.0 → 1.0.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: f7dacce756110dd70823630de898a8c9f55d12b1
-   data.tar.gz: d3ee78e2ed48d91aacc9cb916d8ab71dd25e326d
+   metadata.gz: 39d4ef9f2ee3e17541118580954e0a9f137f969b
+   data.tar.gz: 9e573acd0b5cdcf700bfe4f49a838933e6a01737
  SHA512:
-   metadata.gz: 2655f9c1b1bbb8452a06d7debdba232ba53354131776bbf23a69fc1dc3b62d950600093b4581d2f4c5304f161db421ed95204e8c40cee2adc75c95670dcf42a1
-   data.tar.gz: da2dd9829aa3f5f2415f4a5ca4182133c19b1a481a40172140858ba72f65e05824eebdbff8899c6f0d84a90c93b0539c86c68dd7a23371fe6f577da738746824
+   metadata.gz: 50ee2b66ced2a4878309566088a493b8cc7654d44db6b91458c0f04f4015cbc9a2e437b7444670b28a78475a5df715d89dd8a81fa1e9d508095c234da830c782
+   data.tar.gz: 4785f7967bc3549629c2fbb5f74f0d7343076a028d33265344b9f1cb0b74c39df2c4f402ab97baeb42fb536dbd31132eb3fb69d462cab5a3ffb0f568ae91bed5
data/.gitignore CHANGED
@@ -12,3 +12,4 @@
  *.a
  mkmf.log
  tags
+ /coverage/
data/Gemfile CHANGED
@@ -1,9 +1,10 @@
  source 'https://rubygems.org'
 
- gem 'rspec', group: :test
- gem 'simplecov', require: false, group: :test
- gem 'simplecov-badge', require: false, group: :test
- gem 'pry', group: :test
+ group :test do
+   gem 'rspec'
+   gem 'coveralls', require: false
+   gem 'pry'
+ end
 
  # Specify your gem's dependencies in regexp-examples.gemspec
  gemspec
data/README.md CHANGED
@@ -1,7 +1,7 @@
  # regexp-examples
  [![Gem Version](https://badge.fury.io/rb/regexp-examples.svg)](http://badge.fury.io/rb/regexp-examples)
  [![Build Status](https://travis-ci.org/tom-lord/regexp-examples.svg?branch=master)](https://travis-ci.org/tom-lord/regexp-examples/builds)
- ![Code Coverage](coverage/coverage-badge.png)
+ [![Coverage Status](https://coveralls.io/repos/tom-lord/regexp-examples/badge.svg?branch=master)](https://coveralls.io/r/tom-lord/regexp-examples?branch=master)
 
  Extends the Regexp class with the method: Regexp#examples
 
@@ -26,12 +26,33 @@ For more detail on this, see [configuration options](#configuration-options).
  /what about (backreferences\?) \1/.examples #=> ['what about backreferences? backreferences?']
  ```
 
+ ## Installation
+
+ Add this line to your application's Gemfile:
+
+ ```ruby
+ gem 'regexp-examples'
+ ```
+
+ And then execute:
+
+     $ bundle
+
+ Or install it yourself as:
+
+     $ gem install regexp-examples
+
  ## Supported syntax
 
  * All forms of repeaters (quantifiers), e.g. `/a*/`, `/a+/`, `/a?/`, `/a{1,4}/`, `/a{3,}/`, `/a{,2}/`
    * Reluctant and possessive repeaters work fine, too - e.g. `/a*?/`, `/a*+/`
  * Boolean "Or" groups, e.g. `/a|b|c/`
- * Character sets (including ranges and negation!), e.g. `/[abc]/`, `/[A-Z0-9]/`, `/[^a-z]/`, `/[\w\s\b]/`
+ * Character sets e.g. `/[abc]/` - including:
+   * Ranges, e.g. `/[A-Z0-9]/`
+   * Negation, e.g. `/[^a-z]/`
+   * Escaped characters, e.g. `/[\w\s\b]/`
+   * POSIX bracket expressions, e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
+   * Set intersection, e.g. `/[[a-h]&&[f-z]]/`
  * Escaped characters, e.g. `/\n/`, `/\w/`, `/\D/` (and so on...)
  * Capture groups, e.g. `/(group)/`
    * Including named groups, e.g. `/(?<name>group)/`
@@ -43,7 +64,6 @@ For more detail on this, see [configuration options](#configuration-options).
  * Escape sequences, e.g. `/\x42/`, `/\x5word/`, `/#{"\x80".force_encoding("ASCII-8BIT")}/`
  * Unicode characters, e.g. `/\u0123/`, `/\uabcd/`, `/\u{789}/`
  * Octal characters, e.g. `/\10/`, `/\177/`
- * POSIX bracket expressions (including negation), e.g. `/[[:alnum:]]/`, `/[[:^space:]]/`
  * Named properties, e.g. `/\p{L}/` ("Letter"), `/\p{Arabic}/` ("Arabic character"), `/\p{^Ll}/` ("Not a lowercase letter")
  * **Arbitrarily complex combinations of all the above!**
 
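The character-set forms in the list above (sub-groups, POSIX brackets, set intersection, a trailing hyphen) can be checked against Ruby's own regexp engine without the gem. The following is a minimal sketch using only core Ruby; the helper name `matching_chars` is hypothetical, and the candidate alphabet is restricted to printable ASCII for brevity.

```ruby
# Hypothetical helper (not part of regexp-examples): lists the printable
# ASCII characters that a single-character regexp matches.
def matching_chars(regexp)
  (' '..'~').select { |char| regexp.match?(char) }
end

p matching_chars(/[[abc]de]/)       # nested groups behave as a union
p matching_chars(/[[a-h]&&[f-z]]/)  # set intersection
p matching_chars(/[[:digit:]]/)     # POSIX bracket expression
p matching_chars(/[abc-]/)          # trailing hyphen is a literal "-"
```

Comparing these lists with the output of `Regexp#examples` for the same patterns is one way to sanity-check the gem's character-set handling.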
@@ -55,13 +75,12 @@ For more detail on this, see [configuration options](#configuration-options).
 
  ## Bugs and Not-Yet-Supported syntax
 
- * Nested character classes, and the use of set intersection ([See here](http://www.ruby-doc.org/core-2.2.0/Regexp.html#class-Regexp-label-Character+Classes) for the official documentation on this.) For example:
-   * `/[[abc]de]/.examples` (which _should_ return `["a", "b", "c", "d", "e"]`)
-   * `/[[a-d]&&[c-f]]/.examples` (which _should_ return: `["c", "d"]`)
+ * There are some (rare) edge cases where backreferences do not work properly, e.g. `/(a*)a* \1/.examples` - which includes "aaaa aa". This is because each repeater is not context-aware, so the "greediness" logic is flawed. (E.g. in this case, the second `a*` should always evaluate to an empty string, because the previous `a*` was greedy!) However, patterns like this are highly unusual...
+ * Some named properties, e.g. `/\p{Arabic}/`, list non-matching examples for ruby 2.0/2.1 (as the definitions changed in ruby 2.2). This would be "easy" to fix, but I can't be bothered... Feel free to make a pull request!
 
- * Conditional capture groups, such as `/(group1) (?(1)yes|no)/`
-
- There are loads more (increasingly obscure) unsupported bits of syntax, which I cannot be bothered to write out here. Full documentation on all the various other obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE).
+ There are also various (increasingly obscure) unsupported bits of syntax, which I cannot be bothered to write out fully here. Full documentation on all the intricate obscurities in the ruby (version 2.x) regexp parser can be found [here](https://raw.githubusercontent.com/k-takata/Onigmo/master/doc/RE). To name a couple:
+ * Conditional capture groups, e.g. `/(group1)? (?(1)yes|no)/.examples` (which *should* return: `["group1 yes", " no"]`)
+ * Back reference by relative group number, e.g. `/(a)(b)(c)(d) \k<-2>/.examples` (which *should* return: `["abcd c"]`)
 
  ## Impossible features ("illegal syntax")
 
@@ -115,21 +134,12 @@ A more sensible use case might be, for example, to generate one random 1-4 digit
 
  (Note: I may develop a much more efficient way to "generate one example" in a later release of this gem.)
 
- ## Installation
-
- Add this line to your application's Gemfile:
-
- ```ruby
- gem 'regexp-examples'
- ```
+ ## TODO
 
- And then execute:
-
-     $ bundle
-
- Or install it yourself as:
-
-     $ gem install regexp-examples
+ * Performance improvements:
+   * Use of lambdas/something (in [constants.rb](lib/regexp-examples/constants.rb)) to improve the library load time.
+   * (Maybe?) add a `max_examples` configuration option and use lazy evaluation, to ensure the method never "freezes"
+ * Write a blog post about how this amazing gem works! :)
 
  ## Contributing
 
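The `max_examples` item in the TODO list depends on lazy evaluation: combinations are produced only as they are requested, so a cap can be applied before the full list is ever built. The following is a minimal sketch using core Ruby's `Enumerator::Lazy`. The method `lazy_examples` and its `max_examples` keyword are hypothetical names for illustration, not the gem's API, and the input is assumed to be a non-empty list of per-position character choices.

```ruby
# Hypothetical sketch: build strings position by position, lazily, and
# stop as soon as max_examples results have been produced.
def lazy_examples(choices_per_position, max_examples: 5)
  first, *rest = choices_per_position
  combos = rest.reduce(first.lazy) do |prefixes, choices|
    # flat_map on a lazy enumerator only expands a prefix when it is needed
    prefixes.flat_map { |prefix| choices.map { |char| prefix + char } }
  end
  combos.first(max_examples)
end

# Three positions, roughly like /[a-z][0-9][a-z]/
p lazy_examples([('a'..'z'), ('0'..'9'), ('a'..'z')], max_examples: 3)

# Ten positions would give 26**10 strings if built eagerly; only two are built here.
p lazy_examples(Array.new(10) { ('a'..'z') }, max_examples: 2)
```

The eager equivalent would materialise every combination before truncating, which is the "freeze" the TODO item wants to avoid.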
@@ -1,69 +1,118 @@
  module RegexpExamples
-   # Given an array of chars from inside a character set,
-   # Interprets all backslashes, ranges and negations
-   # TODO: This needs a bit of a rewrite because:
-   # A) It's ugly
-   # B) It doesn't take into account nested character groups, or set intersection
-   # To achieve this, the algorithm needs to be recursive, like the main Parser.
+   # A "sub-parser", for char groups in a regular expression
+   # Some examples of what this class needs to parse:
+   # [abc] - plain characters
+   # [a-z] - ranges
+   # [\n\b\d] - escaped characters (which may represent character sets)
+   # [^abc] - negated group
+   # [[a][bc]] - sub-groups (should match "a", "b" or "c")
+   # [[:lower:]] - POSIX group
+   # [[a-f]&&[d-z]] - set intersection (should match "d", "e" or "f")
+   # [[^:alpha:]&&[\n]a-c] - all of the above!!!! (should match "\n")
    class ChargroupParser
-     def initialize(chars)
-       @chars = chars
-       if @chars[0] == "^"
-         @negative = true
-         @chars = @chars[1..-1]
-       else
-         @negative = false
+     attr_reader :regexp_string
+     def initialize(regexp_string, is_sub_group: false)
+       @regexp_string = regexp_string
+       @is_sub_group = is_sub_group
+       @current_position = 0
+       parse
+     end
+
+     def parse
+       @charset = []
+       @negative = false
+       parse_first_chars
+       until next_char == "]" do
+         case next_char
+         when "["
+           @current_position += 1
+           sub_group_parser = self.class.new(rest_of_string, is_sub_group: true)
+           @charset.concat sub_group_parser.result
+           @current_position += sub_group_parser.length
+         when "-"
+           if regexp_string[@current_position + 1] == "]" # e.g. /[abc-]/ -- not a range!
+             @charset << "-"
+             @current_position += 1
+           else
+             @current_position += 1
+             @charset.concat (@charset.last .. parse_checking_backlash.first).to_a
+             @current_position += 1
+           end
+         when "&"
+           if regexp_string[@current_position + 1] == "&"
+             @current_position += 2
+             sub_group_parser = self.class.new(rest_of_string, is_sub_group: @is_sub_group)
+             @charset &= sub_group_parser.result
+             @current_position += (sub_group_parser.length - 1)
+           else
+             @charset << "&"
+             @current_position += 1
+           end
+         else
+           @charset.concat parse_checking_backlash
+           @current_position += 1
+         end
        end
 
-       init_backslash_chars
-       init_ranges
+       @charset.uniq!
+       @current_position += 1 # To account for final "]"
+     end
+
+     def length
+       @current_position
      end
 
      def result
-       @negative ? (CharSets::Any - @chars) : @chars
+       @negative ? (CharSets::Any - @charset) : @charset
      end
 
      private
-     def init_backslash_chars
-       @chars.each_with_index do |char, i|
-         if char == "\\"
-           if BackslashCharMap.keys.include?(@chars[i+1])
-             @chars[i..i+1] = move_backslash_to_front( BackslashCharMap[@chars[i+1]] )
-           elsif @chars[i+1] == 'b'
-             @chars[i..i+1] = "\b"
-           elsif @chars[i+1] == "\\"
-             @chars.delete_at(i+1)
-           else
-             @chars.delete_at(i)
-           end
+     def parse_first_chars
+       if next_char == '^'
+         @negative = true
+         @current_position += 1
+       end
+
+       case rest_of_string
+       when /\A[-\]]/ # e.g. /[]]/ (match "]") or /[-]/ (match "-")
+         @charset << next_char
+         @current_position += 1
+       when /\A:(\^?)([^:]+):\]/ # e.g. [[:alpha:]] - POSIX group
+         if @is_sub_group
+           chars = $1.empty? ? POSIXCharMap[$2] : (CharSets::Any - POSIXCharMap[$2])
+           @charset.concat chars
+           @current_position += ($1.length + $2.length + 2)
          end
        end
      end
 
-     def init_ranges
-       # remove hyphen ("-") from front/back, if present
-       hyphen = nil
-       hyphen = @chars.shift if @chars.first == "-"
-       hyphen ||= @chars.pop if @chars.last == "-"
-       # Replace all instances of e.g. ["a", "-", "z"] with ["a", "b", ..., "z"]
-       while i = @chars.index("-")
-         # Prevent infinite loops from expanding [",", "-", "."] to itself
-         # (Since ",".ord = 44, "-".ord = 45, ".".ord = 46)
-         if (@chars[i-1] == ',' && @chars[i+1] == '.')
-           hyphen = @chars.delete_at(i)
-         else
-           @chars[i-1..i+1] = (@chars[i-1]..@chars[i+1]).to_a
-         end
+     # Always returns an Array, for consistency
+     def parse_checking_backlash
+       if next_char == "\\"
+         @current_position += 1
+         parse_after_backslash
+       else
+         [next_char]
        end
-       # restore hyphen, if stripped out earlier
-       @chars.unshift(hyphen) if hyphen
      end
 
-     def move_backslash_to_front(chars)
-       if index = chars.index { |char| char == '\\' }
-         chars.unshift chars.delete_at(index)
+     def parse_after_backslash
+       case next_char
+       when *BackslashCharMap.keys
+         BackslashCharMap[next_char]
+       when 'b'
+         ["\b"]
+       else
+         [next_char]
        end
-       chars
+     end
+
+     def rest_of_string
+       regexp_string[@current_position..-1]
+     end
+
+     def next_char
+       regexp_string[@current_position]
      end
    end
  end
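The parser above keeps each group's result as a plain array of characters, so the set operations reduce to array operations: `concat` for sub-groups, `&` for intersection and `-` for negation. The following is a minimal standalone sketch of that idea, not the gem's code. `ANY` is a stand-in for the gem's `CharSets::Any` and is assumed here to hold printable ASCII only.

```ruby
# Stand-in for the gem's CharSets::Any (assumption: printable ASCII only).
ANY = (' '..'~').to_a.freeze

# [a-f]: a range expands to every character between its endpoints.
range = ('a'..'f').to_a

# [[a][bc]]: sub-group results are concatenated, then de-duplicated.
union = (['a'] + ['b', 'c']).uniq

# [[a-f]&&[d-z]]: intersection keeps characters present on both sides.
intersection = range & ('d'..'z').to_a

# [^abc]: negation subtracts the group from the full character set.
negated = ANY - ['a', 'b', 'c']

p union                  # ["a", "b", "c"]
p intersection           # ["d", "e", "f"]
p negated.include?('a')  # false
```

Because every branch yields an array, a recursive sub-parser can hand its result straight back to the parent group, as the `"["` and `"&"` branches of `parse` do.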
@@ -105,136 +105,136 @@ module RegexpExamples
105
105
  # Note: Only the first 128 results are listed, for performance.
106
106
  # Also, some groups seem to have no matches (weird!)
107
107
  NamedPropertyCharMap = {
108
- 'Alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
109
- 'Alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
110
- 'Blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
111
- 'Cntrl' => ranges_to_unicode(0..31, 127..159),
112
- 'Digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
113
- 'Graph' => ranges_to_unicode(33..126, 161..194),
114
- 'Lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
115
- 'Print' => ranges_to_unicode(32..126, 160..192),
116
- 'Punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
117
- 'Space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
118
- 'Upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
119
- 'XDigit' => ranges_to_unicode(48..57, 65..70, 97..102),
120
- 'Word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
121
- 'ASCII' => ranges_to_unicode(0..127),
122
- 'Any' => ranges_to_unicode(0..127),
123
- 'Assigned' => ranges_to_unicode(0..127),
124
- 'L' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
125
- 'Ll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
126
- 'Lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
127
- 'Lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
128
- 'Lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
129
- 'Lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
130
- 'M' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
131
- 'Mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
132
- 'Mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
133
- 'Me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
134
- 'N' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
135
- 'Nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
136
- 'Nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
137
- 'No' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
138
- 'P' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
139
- 'Pc' => ranges_to_unicode(95, 8255..8256, 8276),
140
- 'Pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
141
- 'Ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
142
- 'Pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
143
- 'Pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
144
- 'Pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
145
- 'Po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
146
- 'S' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
147
- 'Sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
148
- 'Sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
149
- 'Sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
150
- 'So' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
151
- 'Z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
152
- 'Zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
153
- 'Zl' => ranges_to_unicode(8232),
154
- 'Zp' => ranges_to_unicode(8233),
155
- 'C' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
156
- 'Cc' => ranges_to_unicode(0..31, 127..159),
157
- 'Cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
158
- 'Cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
159
- 'Co' => ranges_to_unicode(),
160
- 'Cs' => ranges_to_unicode(),
161
- 'Arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
162
- 'Armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
163
- 'Balinese' => ranges_to_unicode(6912..6987, 6992..7036),
164
- 'Bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
165
- 'Bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
166
- 'Braille' => ranges_to_unicode(10240..10367),
167
- 'Buginese' => ranges_to_unicode(6656..6683, 6686..6687),
168
- 'Buhid' => ranges_to_unicode(5952..5971),
169
- 'Canadian_Aboriginal' => ranges_to_unicode(5120..5247),
170
- 'Carian' => ranges_to_unicode(),
171
- 'Cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
172
- 'Cherokee' => ranges_to_unicode(5024..5108),
173
- 'Common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
174
- 'Coptic' => ranges_to_unicode(994..1007, 11392..11505),
175
- 'Cuneiform' => ranges_to_unicode(),
176
- 'Cypriot' => ranges_to_unicode(),
177
- 'Cyrillic' => ranges_to_unicode(1024..1151),
178
- 'Deseret' => ranges_to_unicode(),
179
- 'Devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
180
- 'Ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
181
- 'Georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
182
- 'Glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
183
- 'Gothic' => ranges_to_unicode(),
184
- 'Greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
185
- 'Gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
186
- 'Gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
187
- 'Han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
188
- 'Hangul' => ranges_to_unicode(4352..4479),
189
- 'Hanunoo' => ranges_to_unicode(5920..5940),
190
- 'Hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
191
- 'Hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
192
- 'Inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
193
- 'Kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
194
- 'Katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
195
- 'Kayah_Li' => ranges_to_unicode(43264..43309, 43311),
196
- 'Kharoshthi' => ranges_to_unicode(),
197
- 'Khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
198
- 'Lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
199
- 'Latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
200
- 'Lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
201
- 'Limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
202
- 'Linear_B' => ranges_to_unicode(),
203
- 'Lycian' => ranges_to_unicode(),
204
- 'Lydian' => ranges_to_unicode(),
205
- 'Malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
206
- 'Mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
207
- 'Myanmar' => ranges_to_unicode(4096..4223),
208
- 'New_Tai_Lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
209
- 'Nko' => ranges_to_unicode(1984..2042),
210
- 'Ogham' => ranges_to_unicode(5760..5788),
211
- 'Ol_Chiki' => ranges_to_unicode(7248..7295),
212
- 'Old_Italic' => ranges_to_unicode(),
213
- 'Old_Persian' => ranges_to_unicode(),
214
- 'Oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
215
- 'Osmanya' => ranges_to_unicode(),
216
- 'Phags_Pa' => ranges_to_unicode(43072..43127),
217
- 'Phoenician' => ranges_to_unicode(),
218
- 'Rejang' => ranges_to_unicode(43312..43347, 43359),
219
- 'Runic' => ranges_to_unicode(5792..5866, 5870..5880),
220
- 'Saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
221
- 'Shavian' => ranges_to_unicode(),
222
- 'Sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
223
- 'Sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
224
- 'Syloti_Nagri' => ranges_to_unicode(43008..43051),
225
- 'Syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
226
- 'Tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
227
- 'Tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
228
- 'Tai_Le' => ranges_to_unicode(6480..6509, 6512..6516),
229
- 'Tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
230
- 'Telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
231
- 'Thaana' => ranges_to_unicode(1920..1969),
232
- 'Thai' => ranges_to_unicode(3585..3642, 3648..3675),
233
- 'Tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
234
- 'Tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
235
- 'Ugaritic' => ranges_to_unicode(),
236
- 'Vai' => ranges_to_unicode(42240..42367),
237
- 'Yi' => ranges_to_unicode(40960..41087),
108
+ 'alnum' => ranges_to_unicode(48..57, 65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..256),
109
+ 'alpha' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
110
+ 'blank' => ranges_to_unicode(9, 32, 160, 5760, 8192..8202, 8239, 8287, 12288),
111
+ 'cntrl' => ranges_to_unicode(0..31, 127..159),
112
+ 'digit' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
113
+ 'graph' => ranges_to_unicode(33..126, 161..194),
114
+ 'lower' => ranges_to_unicode(97..122, 170, 181, 186, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387),
115
+ 'print' => ranges_to_unicode(32..126, 160..192),
116
+ 'punct' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
117
+ 'space' => ranges_to_unicode(9..13, 32, 133, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
118
+ 'upper' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
119
+ 'xdigit' => ranges_to_unicode(48..57, 65..70, 97..102),
120
+ 'word' => ranges_to_unicode(48..57, 65..90, 95, 97..122, 170, 181, 186, 192..214, 216..246, 248..255),
121
+ 'ascii' => ranges_to_unicode(0..127),
122
+ 'any' => ranges_to_unicode(0..127),
123
+ 'assigned' => ranges_to_unicode(0..127),
124
+ 'l' => ranges_to_unicode(65..90, 97..122, 170, 181, 186, 192..214, 216..246, 248..266),
125
+ 'll' => ranges_to_unicode(97..122, 181, 223..246, 248..255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311..312, 314, 316, 318, 320, 322, 324, 326, 328..329, 331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 375, 378, 380, 382..384, 387, 389, 392),
126
+ 'lm' => ranges_to_unicode(688..705, 710..721, 736..740, 748, 750, 884, 890, 1369, 1600, 1765..1766, 2036..2037, 2042, 2074, 2084, 2088, 2417, 3654, 3782, 4348, 6103, 6211, 6823, 7288..7293, 7468..7530, 7544, 7579..7580),
127
+ 'lo' => ranges_to_unicode(170, 186, 443, 448..451, 660, 1488..1514, 1520..1522, 1568..1599, 1601..1610, 1646..1647, 1649..1694),
128
+ 'lt' => ranges_to_unicode(453, 456, 459, 498, 8072..8079, 8088..8095, 8104..8111, 8124, 8140, 8188),
129
+ 'lu' => ranges_to_unicode(65..90, 192..214, 216..222, 256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 304, 306, 308, 310, 313, 315, 317, 319, 321, 323, 325, 327, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376..377, 379, 381, 385..386, 388, 390..391, 393..395, 398),
+ 'm' => ranges_to_unicode(768..879, 1155..1161, 1425..1433),
+ 'mn' => ranges_to_unicode(768..879, 1155..1159, 1425..1435),
+ 'mc' => ranges_to_unicode(2307, 2363, 2366..2368, 2377..2380, 2382..2383, 2434..2435, 2494..2496, 2503..2504, 2507..2508, 2519, 2563, 2622..2624, 2691, 2750..2752, 2761, 2763..2764, 2818..2819, 2878, 2880, 2887..2888, 2891..2892, 2903, 3006..3007, 3009..3010, 3014..3016, 3018..3020, 3031, 3073..3075, 3137..3140, 3202..3203, 3262, 3264..3268, 3271..3272, 3274..3275, 3285..3286, 3330..3331, 3390..3392, 3398..3400, 3402..3404, 3415, 3458..3459, 3535..3537, 3544..3551, 3570..3571, 3902..3903, 3967, 4139..4140, 4145, 4152, 4155..4156, 4182..4183, 4194..4196, 4199..4205, 4227..4228, 4231..4235),
+ 'me' => ranges_to_unicode(1160..1161, 6846, 8413..8416, 8418..8420, 42608..42610),
+ 'n' => ranges_to_unicode(48..57, 178..179, 185, 188..190, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2548..2553, 2662..2671, 2790..2799, 2918..2927, 2930..2935, 3046..3058, 3174..3180),
+ 'nd' => ranges_to_unicode(48..57, 1632..1641, 1776..1785, 1984..1993, 2406..2415, 2534..2543, 2662..2671, 2790..2799, 2918..2927, 3046..3055, 3174..3183, 3302..3311, 3430..3437),
+ 'nl' => ranges_to_unicode(5870..5872, 8544..8578, 8581..8584, 12295, 12321..12329, 12344..12346, 42726..42735),
+ 'no' => ranges_to_unicode(178..179, 185, 188..190, 2548..2553, 2930..2935, 3056..3058, 3192..3198, 3440..3445, 3882..3891, 4969..4988, 6128..6137, 6618, 8304, 8308..8313, 8320..8329, 8528..8543, 8585, 9312..9330),
+ 'p' => ranges_to_unicode(33..35, 37..42, 44..47, 58..59, 63..64, 91..93, 95, 123, 125, 161, 167, 171, 182..183, 187, 191, 894, 903, 1370..1375, 1417..1418, 1470, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3898..3901, 3973, 4048..4052, 4057..4058, 4170),
+ 'pc' => ranges_to_unicode(95, 8255..8256, 8276),
+ 'pd' => ranges_to_unicode(45, 1418, 1470, 5120, 6150, 8208..8213, 11799, 11802, 11834..11835, 11840, 12316, 12336, 12448),
+ 'ps' => ranges_to_unicode(40, 91, 123, 3898, 3900, 5787, 8218, 8222, 8261, 8317, 8333, 8968, 8970, 9001, 10088, 10090, 10092, 10094, 10096, 10098, 10100, 10181, 10214, 10216, 10218, 10220, 10222, 10627, 10629, 10631, 10633, 10635, 10637, 10639, 10641, 10643, 10645, 10647, 10712, 10714, 10748, 11810, 11812, 11814, 11816, 11842, 12296, 12298, 12300, 12302, 12304, 12308, 12310, 12312, 12314, 12317),
+ 'pe' => ranges_to_unicode(41, 93, 125, 3899, 3901, 5788, 8262, 8318, 8334, 8969, 8971, 9002, 10089, 10091, 10093, 10095, 10097, 10099, 10101, 10182, 10215, 10217, 10219, 10221, 10223, 10628, 10630, 10632, 10634, 10636, 10638, 10640, 10642, 10644, 10646, 10648, 10713, 10715, 10749, 11811, 11813, 11815, 11817, 12297, 12299, 12301, 12303, 12305, 12309, 12311, 12313, 12315, 12318..12319),
+ 'pi' => ranges_to_unicode(171, 8216, 8219..8220, 8223, 8249, 11778, 11780, 11785, 11788, 11804, 11808),
+ 'pf' => ranges_to_unicode(187, 8217, 8221, 8250, 11779, 11781, 11786, 11789, 11805, 11809),
+ 'po' => ranges_to_unicode(33..35, 37..39, 42, 44, 46..47, 58..59, 63..64, 92, 161, 167, 182..183, 191, 894, 903, 1370..1375, 1417, 1472, 1475, 1478, 1523..1524, 1545..1546, 1548..1549, 1563, 1566..1567, 1642..1645, 1748, 1792..1805, 2039..2041, 2096..2110, 2142, 2404..2405, 2416, 2800, 3572, 3663, 3674..3675, 3844..3858, 3860, 3973, 4048..4052, 4057..4058, 4170..4175, 4347, 4960..4968, 5741),
+ 's' => ranges_to_unicode(36, 43, 60..62, 94, 96, 124, 126, 162..166, 168..169, 172, 174..177, 180, 184, 215, 247, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 1014, 1154, 1421..1423, 1542..1544, 1547, 1550..1551, 1758, 1769, 1789..1790, 2038, 2546..2547, 2554..2555, 2801, 2928, 3059..3066, 3199, 3449, 3647, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037),
+ 'sm' => ranges_to_unicode(43, 60..62, 124, 126, 172, 177, 215, 247, 1014, 1542..1544, 8260, 8274, 8314..8316, 8330..8332, 8472, 8512..8516, 8523, 8592..8596, 8602..8603, 8608, 8611, 8614, 8622, 8654..8655, 8658, 8660, 8692..8775),
+ 'sc' => ranges_to_unicode(36, 162..165, 1423, 1547, 2546..2547, 2555, 2801, 3065, 3647, 6107, 8352..8381, 43064),
+ 'sk' => ranges_to_unicode(94, 96, 168, 175, 180, 184, 706..709, 722..735, 741..747, 749, 751..767, 885, 900..901, 8125, 8127..8129, 8141..8143, 8157..8159, 8173..8175, 8189..8190, 12443..12444, 42752..42774, 42784..42785, 42889..42890, 43867),
+ 'so' => ranges_to_unicode(166, 169, 174, 176, 1154, 1421..1422, 1550..1551, 1758, 1769, 1789..1790, 2038, 2554, 2928, 3059..3064, 3066, 3199, 3449, 3841..3843, 3859, 3861..3863, 3866..3871, 3892, 3894, 3896, 4030..4037, 4039..4044, 4046..4047, 4053..4056, 4254..4255, 5008..5017, 6464, 6622..6655, 7009..7018, 7028..7036, 8448),
+ 'z' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8232..8233, 8239, 8287, 12288),
+ 'zs' => ranges_to_unicode(32, 160, 5760, 8192..8202, 8239, 8287, 12288),
+ 'zl' => ranges_to_unicode(8232),
+ 'zp' => ranges_to_unicode(8233),
+ 'c' => ranges_to_unicode(0..31, 127..159, 173, 888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1541, 1564..1565, 1757, 1806..1807, 1867..1868, 1970..1977),
+ 'cc' => ranges_to_unicode(0..31, 127..159),
+ 'cf' => ranges_to_unicode(173, 1536..1541, 1564, 1757, 1807, 6158, 8203..8207, 8234..8238, 8288..8292, 8294..8303),
+ 'cn' => ranges_to_unicode(888..889, 896..899, 907, 909, 930, 1328, 1367..1368, 1376, 1416, 1419..1420, 1424, 1480..1487, 1515..1519, 1525..1535, 1565, 1806, 1867..1868, 1970..1983, 2043..2047, 2094..2095, 2111, 2140..2141, 2143..2201),
+ 'co' => ranges_to_unicode(),
+ 'cs' => ranges_to_unicode(),
+ 'arabic' => ranges_to_unicode(1536..1540, 1542..1547, 1549..1562, 1566, 1568..1599, 1601..1610, 1622..1631, 1642..1647, 1649..1692),
+ 'armenian' => ranges_to_unicode(1329..1366, 1369..1375, 1377..1415, 1418, 1421..1423),
+ 'balinese' => ranges_to_unicode(6912..6987, 6992..7036),
+ 'bengali' => ranges_to_unicode(2432..2435, 2437..2444, 2447..2448, 2451..2472, 2474..2480, 2482, 2486..2489, 2492..2500, 2503..2504, 2507..2510, 2519, 2524..2525, 2527..2531, 2534..2555),
+ 'bopomofo' => ranges_to_unicode(746..747, 12549..12589, 12704..12730),
+ 'braille' => ranges_to_unicode(10240..10367),
+ 'buginese' => ranges_to_unicode(6656..6683, 6686..6687),
+ 'buhid' => ranges_to_unicode(5952..5971),
+ 'canadian_aboriginal' => ranges_to_unicode(5120..5247),
+ 'carian' => ranges_to_unicode(),
+ 'cham' => ranges_to_unicode(43520..43574, 43584..43597, 43600..43609, 43612..43615),
+ 'cherokee' => ranges_to_unicode(5024..5108),
+ 'common' => ranges_to_unicode(0..64, 91..96, 123..169, 171..180),
+ 'coptic' => ranges_to_unicode(994..1007, 11392..11505),
+ 'cuneiform' => ranges_to_unicode(),
+ 'cypriot' => ranges_to_unicode(),
+ 'cyrillic' => ranges_to_unicode(1024..1151),
+ 'deseret' => ranges_to_unicode(),
+ 'devanagari' => ranges_to_unicode(2304..2384, 2387..2403, 2406..2431, 43232..43235),
+ 'ethiopic' => ranges_to_unicode(4608..4680, 4682..4685, 4688..4694, 4696, 4698..4701, 4704..4742),
+ 'georgian' => ranges_to_unicode(4256..4293, 4295, 4301, 4304..4346, 4348..4351, 11520..11557, 11559, 11565),
+ 'glagolitic' => ranges_to_unicode(11264..11310, 11312..11358),
+ 'gothic' => ranges_to_unicode(),
+ 'greek' => ranges_to_unicode(880..883, 885..887, 890..893, 895, 900, 902, 904..906, 908, 910..929, 931..993, 1008..1023, 7462..7466, 7517..7521, 7526),
+ 'gujarati' => ranges_to_unicode(2689..2691, 2693..2701, 2703..2705, 2707..2728, 2730..2736, 2738..2739, 2741..2745, 2748..2757, 2759..2761, 2763..2765, 2768, 2784..2787, 2790..2801),
+ 'gurmukhi' => ranges_to_unicode(2561..2563, 2565..2570, 2575..2576, 2579..2600, 2602..2608, 2610..2611, 2613..2614, 2616..2617, 2620, 2622..2626, 2631..2632, 2635..2637, 2641, 2649..2652, 2654, 2662..2677),
+ 'han' => ranges_to_unicode(11904..11929, 11931..12019, 12032..12044),
+ 'hangul' => ranges_to_unicode(4352..4479),
+ 'hanunoo' => ranges_to_unicode(5920..5940),
+ 'hebrew' => ranges_to_unicode(1425..1479, 1488..1514, 1520..1524),
+ 'hiragana' => ranges_to_unicode(12353..12438, 12445..12447),
+ 'inherited' => ranges_to_unicode(768..879, 1157..1158, 1611..1621, 1648, 2385..2386),
+ 'kannada' => ranges_to_unicode(3201..3203, 3205..3212, 3214..3216, 3218..3240, 3242..3251, 3253..3257, 3260..3268, 3270..3272, 3274..3277, 3285..3286, 3294, 3296..3299, 3302..3311, 3313..3314),
+ 'katakana' => ranges_to_unicode(12449..12538, 12541..12543, 12784..12799, 13008..13026),
+ 'kayah_li' => ranges_to_unicode(43264..43309, 43311),
+ 'kharoshthi' => ranges_to_unicode(),
+ 'khmer' => ranges_to_unicode(6016..6109, 6112..6121, 6128..6137, 6624..6637),
+ 'lao' => ranges_to_unicode(3713..3714, 3716, 3719..3720, 3722, 3725, 3732..3735, 3737..3743, 3745..3747, 3749, 3751, 3754..3755, 3757..3769, 3771..3773, 3776..3780, 3782, 3784..3789, 3792..3801, 3804..3807),
+ 'latin' => ranges_to_unicode(65..90, 97..122, 170, 186, 192..214, 216..246, 248..267),
+ 'lepcha' => ranges_to_unicode(7168..7223, 7227..7241, 7245..7247),
+ 'limbu' => ranges_to_unicode(6400..6430, 6432..6443, 6448..6459, 6464, 6468..6479),
+ 'linear_b' => ranges_to_unicode(),
+ 'lycian' => ranges_to_unicode(),
+ 'lydian' => ranges_to_unicode(),
+ 'malayalam' => ranges_to_unicode(3329..3331, 3333..3340, 3342..3344, 3346..3386, 3389..3396, 3398..3400, 3402..3406, 3415, 3424..3427, 3430..3445, 3449..3455),
+ 'mongolian' => ranges_to_unicode(6144..6145, 6148, 6150..6158, 6160..6169, 6176..6263, 6272..6289),
+ 'myanmar' => ranges_to_unicode(4096..4223),
+ 'new_tai_lue' => ranges_to_unicode(6528..6571, 6576..6601, 6608..6618, 6622..6623),
+ 'nko' => ranges_to_unicode(1984..2042),
+ 'ogham' => ranges_to_unicode(5760..5788),
+ 'ol_chiki' => ranges_to_unicode(7248..7295),
+ 'old_italic' => ranges_to_unicode(),
+ 'old_persian' => ranges_to_unicode(),
+ 'oriya' => ranges_to_unicode(2817..2819, 2821..2828, 2831..2832, 2835..2856, 2858..2864, 2866..2867, 2869..2873, 2876..2884, 2887..2888, 2891..2893, 2902..2903, 2908..2909, 2911..2915, 2918..2935),
+ 'osmanya' => ranges_to_unicode(),
+ 'phags_pa' => ranges_to_unicode(43072..43127),
+ 'phoenician' => ranges_to_unicode(),
+ 'rejang' => ranges_to_unicode(43312..43347, 43359),
+ 'runic' => ranges_to_unicode(5792..5866, 5870..5880),
+ 'saurashtra' => ranges_to_unicode(43136..43204, 43214..43225),
+ 'shavian' => ranges_to_unicode(),
+ 'sinhala' => ranges_to_unicode(3458..3459, 3461..3478, 3482..3505, 3507..3515, 3517, 3520..3526, 3530, 3535..3540, 3542, 3544..3551, 3558..3567, 3570..3572),
+ 'sundanese' => ranges_to_unicode(7040..7103, 7360..7367),
+ 'syloti_nagri' => ranges_to_unicode(43008..43051),
+ 'syriac' => ranges_to_unicode(1792..1805, 1807..1866, 1869..1871),
+ 'tagalog' => ranges_to_unicode(5888..5900, 5902..5908),
+ 'tagbanwa' => ranges_to_unicode(5984..5996, 5998..6000, 6002..6003),
+ 'tai_le' => ranges_to_unicode(6480..6509, 6512..6516),
+ 'tamil' => ranges_to_unicode(2946..2947, 2949..2954, 2958..2960, 2962..2965, 2969..2970, 2972, 2974..2975, 2979..2980, 2984..2986, 2990..3001, 3006..3010, 3014..3016, 3018..3021, 3024, 3031, 3046..3066),
+ 'telugu' => ranges_to_unicode(3072..3075, 3077..3084, 3086..3088, 3090..3112, 3114..3129, 3133..3140, 3142..3144, 3146..3149, 3157..3158, 3160..3161, 3168..3171, 3174..3183, 3192..3199),
+ 'thaana' => ranges_to_unicode(1920..1969),
+ 'thai' => ranges_to_unicode(3585..3642, 3648..3675),
+ 'tibetan' => ranges_to_unicode(3840..3911, 3913..3948, 3953..3972),
+ 'tifinagh' => ranges_to_unicode(11568..11623, 11631..11632, 11647),
+ 'ugaritic' => ranges_to_unicode(),
+ 'vai' => ranges_to_unicode(42240..42367),
+ 'yi' => ranges_to_unicode(40960..41087),
  }.freeze
  end
 
@@ -103,9 +103,9 @@ module RegexpExamples
  @current_position += ($1.length + $2.length + 2)
  group = CharGroup.new(
    if($1 == "^")
-     CharSets::Any.dup - NamedPropertyCharMap[$2]
+     CharSets::Any.dup - NamedPropertyCharMap[$2.downcase]
    else
-     NamedPropertyCharMap[$2]
+     NamedPropertyCharMap[$2.downcase]
    end,
    @ignorecase
  )
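The change above works because the generated map keys are now lowercase, so the parser lowercases the captured property name before the lookup. A minimal sketch of that idea, assuming a tiny stand-in map (`PROPERTY_MAP` and `lookup_property` are hypothetical names, not part of the gem):

```ruby
# Hypothetical stand-in for NamedPropertyCharMap: every key is stored lowercase.
PROPERTY_MAP = {
  'alpha' => ('a'..'z').to_a + ('A'..'Z').to_a,
  'digit' => ('0'..'9').to_a
}.freeze

# Look up a property name regardless of the case used in the pattern.
def lookup_property(name)
  PROPERTY_MAP.fetch(name.downcase) do
    raise ArgumentError, "unknown property: #{name}"
  end
end

# 'AlPhA', 'ALPHA' and 'alpha' all resolve to the same entry.
lookup_property('AlPhA')
```

Using `fetch` with a block is a design choice of this sketch only: it fails loudly on an unknown name, whereas a plain `[]` lookup, as in the hunk, returns `nil`.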
@@ -223,30 +223,10 @@ module RegexpExamples
    end
 
    def parse_char_group
-     # TODO: Extract all this logic into ChargroupParser
-     if rest_of_string =~ /\A\[\[:(\^?)([^:]+):\]\]/
-       @current_position += (6 + $1.length + $2.length)
-       chars = $1.empty? ? POSIXCharMap[$2] : CharSets::Any - POSIXCharMap[$2]
-       return CharGroup.new(chars, @ignorecase)
-     end
-     chars = []
-     @current_position += 1
-     if next_char == ']'
-       # Beware of the sneaky edge case:
-       # /[]]/ (match "]")
-       chars << ']'
-       @current_position += 1
-     end
-     until next_char == ']' \
-       && !regexp_string[0..@current_position-1].match(/[^\\](\\{2})*\\\z/)
-       # Beware of having an ODD number of "\" before the "]", e.g.
-       # /[\]]/ (match "]")
-       # /[\\]/ (match "\")
-       # /[\\\]]/ (match "\" or "]")
-       chars << next_char
-       @current_position += 1
-     end
-     parsed_chars = ChargroupParser.new(chars).result
+     @current_position += 1 # Skip past opening "["
+     chargroup_parser = ChargroupParser.new(rest_of_string)
+     parsed_chars = chargroup_parser.result
+     @current_position += (chargroup_parser.length - 1) # Step back to closing "]"
      CharGroup.new(parsed_chars, @ignorecase)
    end
 
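The rewritten `parse_char_group` relies on the sub-parser reporting how many characters it consumed, so the caller can advance its own position. The sketch below illustrates that contract only. `TinyGroupScanner` is a hypothetical class, not the gem's `ChargroupParser`: it handles just literal characters, backslash escapes and nested brackets, and ignores ranges, negation, `&&`, POSIX classes and the `/[]]/` edge case.

```ruby
# Hypothetical miniature of a parser that reports how much input it consumed.
class TinyGroupScanner
  attr_reader :result, :length

  # source: the text immediately AFTER an opening "["
  def initialize(source)
    @result = []
    @length = 0
    scan(source)
  end

  private

  def scan(source)
    until source[@length] == ']'
      raise ArgumentError, 'unterminated group' if source[@length].nil?
      case source[@length]
      when '\\' # escaped character: take the next character literally
        @result << source[@length + 1]
        @length += 2
      when '['  # nested group: delegate, then skip what the child consumed
        @length += 1
        inner = TinyGroupScanner.new(source[@length..-1])
        @result.concat(inner.result)
        @length += inner.length
      else
        @result << source[@length]
        @length += 1
      end
    end
    @length += 1 # count the closing "]"
  end
end

pattern  = '[a[bc]]x'
position = 0
position += 1                                   # skip past the opening "["
scanner  = TinyGroupScanner.new(pattern[position..-1])
position += scanner.length - 1                  # now at the closing "]"
```

Because `length` includes the closing bracket, subtracting one leaves the position on that bracket, mirroring the "Step back to closing" comment in the hunk.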
@@ -1,3 +1,3 @@
  module RegexpExamples
-   VERSION = '0.7.0'
+   VERSION = '1.0.0'
  end
@@ -171,7 +171,7 @@ File.open(OutputFilename, 'w') do |f|
  NamedGroups.each do |name|
    count += 1
    matching_codes = (0..55295).lazy.select { |x| /\p{#{name}}/ =~ eval("?\\u{#{x.to_s(16)}}") }.first(128)
-   f.puts "'#{name}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
+   f.puts "'#{name.downcase}' => ranges_to_unicode(#{calculate_ranges(matching_codes)}),"
    puts "(#{count}/#{NamedGroups.length}) Finished property: #{name}"
  end
  puts "*"*50
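The generator loop above scans code points below the surrogate range, keeps at most the first 128 that match each property, and writes them out as ranges. A rough, self-contained sketch of that pipeline follows. It makes two substitutions: `Array#pack('U')` builds each character instead of the script's `eval`, and `compress_ranges` is a hypothetical stand-in for `calculate_ranges`, whose body is not shown in this diff.

```ruby
# Collect up to `limit` code points in 0..0xD7FF (55295) that match \p{name}.
def matching_codes(name, limit = 128)
  regexp = Regexp.new("\\p{#{name}}")
  (0..0xD7FF).lazy.select { |code| regexp =~ [code].pack('U') }.first(limit)
end

# Hypothetical helper: collapse sorted code points into ranges and single values.
def compress_ranges(codes)
  codes.slice_when { |a, b| b != a + 1 }.map do |run|
    run.size > 1 ? (run.first..run.last) : run.first
  end
end

compress_ranges(matching_codes('XDigit'))
```

For `'XDigit'` this should reproduce the three ranges listed under `'xdigit'` in the generated table (48..57, 65..70, 97..102). The 128-item cap is also why many entries in that table stop partway through a script's block.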
@@ -69,7 +69,6 @@ RSpec.describe Regexp, "#examples" do
 
  context "for complex char groups (square brackets)" do
    examples_exist_and_match(
-
      /[abc]/,
      /[a-c]/,
      /[abc-e]/,
@@ -82,7 +81,13 @@ RSpec.describe Regexp, "#examples" do
      /[\n-\r]/,
      /[\-]/,
      /[%-+]/, # This regex is "supposed to" match some surprising things!!!
-     /['-.]/ # Test to ensure no "infinite loop" on character set expansion
+     /['-.]/, # Test to ensure no "infinite loop" on character set expansion
+     /[[abc]]/, # Nested groups
+     /[[[[abc]]]]/,
+     /[[a][b][c]]/,
+     /[[a-h]&&[f-z]]/, # Set intersection
+     /[[a-h]&&ab[c]]/, # Set intersection
+     /[[a-h]&[f-z]]/, # NOT set intersection
    )
  end
 
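The new spec cases cover three behaviours of Ruby character classes: nested groups are unioned, `&&` intersects its left and right operands, and a single `&` is an ordinary literal. A small plain-Ruby check of those semantics, independent of the gem (`matched` is a hypothetical helper, and the last pattern uses non-overlapping ranges rather than the spec's exact regex):

```ruby
letters = ('a'..'z').to_a + ['&']

# Return the candidates that the single-character pattern matches.
def matched(regexp, candidates)
  candidates.select { |char| regexp.match(char) }
end

nested       = matched(/\A[[a][b][c]]\z/, letters)     # union of nested groups
intersection = matched(/\A[[a-h]&&[f-z]]\z/, letters)  # "&&" intersects both sides
mixed        = matched(/\A[[a-h]&&ab[c]]\z/, letters)  # right side is "ab" plus "[c]"
literal_amp  = matched(/\A[[a-c]&[x-z]]\z/, letters)   # single "&" is a literal
```

Here `intersection` keeps only the letters common to both ranges, while `literal_amp` contains both ranges plus the `&` character itself.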
@@ -173,7 +178,8 @@ RSpec.describe Regexp, "#examples" do
  context "for named properties" do
    examples_exist_and_match(
      /\p{L}/,
-     /\p{Arabic}/,
+     /\p{Space}/,
+     /\p{AlPhA}/, # Checking case insensitivity
      /\p{^Ll}/
    )
 
data/spec/spec_helper.rb CHANGED
@@ -1,12 +1,5 @@
- require 'simplecov'
- SimpleCov.start do
-   require 'simplecov-badge'
-   SimpleCov::Formatter::BadgeFormatter.strength_foreground = true
-   SimpleCov.formatter = SimpleCov::Formatter::MultiFormatter[
-     SimpleCov::Formatter::HTMLFormatter,
-     SimpleCov::Formatter::BadgeFormatter,
-   ]
- end
+ require 'coveralls'
+ Coveralls.wear!
 
  require './lib/regexp-examples.rb'
  require 'pry'
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: regexp-examples
  version: !ruby/object:Gem::Version
-   version: 0.7.0
+   version: 1.0.0
  platform: ruby
  authors:
  - Tom Lord
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-02-28 00:00:00.000000000 Z
+ date: 2015-03-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -51,8 +51,6 @@ files:
  - LICENSE.txt
  - README.md
  - Rakefile
- - coverage/.gitignore
- - coverage/coverage-badge.png
  - lib/regexp-examples.rb
  - lib/regexp-examples/backreferences.rb
  - lib/regexp-examples/chargroup_parser.rb
@@ -87,7 +85,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.5
+ rubygems_version: 2.2.2
  signing_key:
  specification_version: 4
  summary: Extends the Regexp class with '#examples'
data/coverage/.gitignore DELETED
@@ -1,4 +0,0 @@
- # Ignore any file in this directory except for this file and coverage-badge.png files
- *
- !/.gitignore
- !coverage-badge.png
Binary file