oniguruma 1.0.1-mswin32

@@ -0,0 +1,49 @@
1
+ == 1.0.1 / 2007-03-28
2
+ * Minimal recommended version of oniglib changed to be compatible with Ruby 1.9, now is 4.6 or higher.
3
+ * Restore check for onig version to build with 4.6
4
+ * In getting replacement do not create temp string object, but directly add to resulting buffer (performance impr.)
5
+ * Included gem support for windows.
6
+ * Modified Rakefile to support win32 gems.
7
+
8
+ == 1.0.0 / 2007-03-27
9
+ * Added documentation for MatchData.
10
+ * Added ogsub, ogsub!, sub and sub! to ::String.
11
+ * Removed ::String definitions from tests.
12
+ * Now the minimal recommended version of oniglib is 5.5 or higher.
13
+ * Removed ugly #if statements from c code.
14
+ * Do not create @named_captures hash if the regexp has no named groups -- somewhat improves speed for repetitive calls
15
+ * Fixed usage of named backreferences in gsub with non-ascii names
16
+ * Move ORegexp#=~ to C code, make it work just like Regexp#=~, i.e. set $~. Throw ArgumentError instead of Exception if pattern does not compile
17
+ * Fix implementation of ORegexp#===, so it now does not raise errors in case statement anymore
18
+ (resembles plain Ruby Regexp#=== behaviour)
19
+ * Modified begin, end and offset methods in MatchData to handle named groups and default to group 0.
20
+ * Exception is no longer thrown in oregexp_make_match_data.
21
+ * Removed references to MultiMatchData from documentation
22
+ * Removed class MultiMatchData
23
+ * Fix off by one error in region->num_regs usage
24
+ * Fix dumb bug with zero-width matches that caused infinite loops. gsub and scan now consume at least one char
25
+ * ORegexp API changes:
26
+ * Pass only MatchData to sub/gsub with blocks
27
+ oregexp.sub( str ) {|match_data| ... }
28
+ oregexp.gsub( str ) {|match_data| ... }
29
+ * Add ORegexp#scan instead of match_all
30
+ oregexp.scan(str) {|match_data| ... } # => MultiMatchData
31
+ * Friendly way to set options
32
+ ORegexp.new( pattern, options_str, encoding, syntax)
33
+ ORegexp.new('\w+', 'imsx', 'koi8r', 'perl')
34
+ * Named backreferences in substitutions
35
+ ORegexp.new('(?<pre>\w+)\d+(?<after>\w+)').sub('abc123def', '\<after>123\<pre>') #=> 'def123abc'
36
+ * couple of bugfixes with region's num_regs
37
+ * some docs for substitution methods added
38
+
39
+ == 0.9.1 / 2007-03-25
40
+ * FIX: Buggy resolution of numeric codes for encoding and syntax options (Nikolai Lugovoi)
41
+ * FIX: Buggy implementation of ORegexp#sub and ORegexp#gsub methods. Now code is all C (Nikolai Lugovoi)
42
+ * Added documentation for class ORegexp
43
+ * Added regexp syntax documentation.
44
+
45
+ == 0.9.0 / 2007-03-19
46
+
47
+ * 1 major enhancement
48
+ * Birthday!
49
+
@@ -0,0 +1,10 @@
1
+ History.txt
2
+ Manifest.txt
3
+ README.txt
4
+ Syntax.txt
5
+ Rakefile
6
+ lib/oniguruma.rb
7
+ ext/oregexp.c
8
+ ext/extconf.rb
9
+ test/test_oniguruma.rb
10
+ win/oregexp.so
@@ -0,0 +1,71 @@
1
+ == ONIGURUMA FOR RUBY:
2
+
3
+ Ruby bindings to the Oniguruma[http://www.geocities.jp/kosako3/oniguruma/] regular expression library (no need to recompile Ruby).
4
+
5
+ == FEATURES:
6
+
7
+ * Increased performance.
8
+ * Same interface as the standard Regexp class (easy transition!).
9
+ * Support for named groups, look-ahead, look-behind, and other
10
+ cool features!
11
+ * Support for other regexp syntaxes (Perl, Python, Java, etc.)
12
+
13
+ == SYNOPSIS:
14
+
15
+ reg = Oniguruma::ORegexp.new( '(?<before>.*)(a)(?<after>.*)' )
16
+ match = reg.match( 'terraforming' )
17
+ puts match[0] # prints: terraforming
18
+ puts match[:before] # prints: terr
19
+ puts match[:after] # prints: forming
20
+
21
+ == SYNTAX
22
+
23
+ Consult the Syntax.txt[link:files/Syntax_txt.html] page.
24
+
25
+ == REQUIREMENTS:
26
+
27
+ * Oniguruma[http://www.geocities.jp/kosako3/oniguruma/] library v. 5.5 or higher
28
+
29
+ == INSTALL:
30
+
31
+ sudo gem install -r oniguruma
32
+
33
+ == BUGS/PROBLEMS/INCOMPATIBILITIES:
34
+
35
+ * <code>ORegexp#~</code> is not implemented.
36
+ * <code>ORegexp#kcode</code> results are not compatible with <code>Regexp</code>.
37
+ * <code>ORegexp</code> options set in the string are not visible; this affects
38
+ <code>ORegexp#options</code>, <code>ORegexp#to_s</code>, <code>ORegexp#inspect</code>
39
+ and <code>ORegexp#==</code>.
40
+
41
+ == TODO:
42
+
43
+ * Complete documentation (methods, oniguruma syntax).
44
+
45
+ == CREDITS:
46
+
47
+ * N. Lugovoi. ORegexp.sub and ORegexp.gsub code and lots of other stuff.
48
+ * K. Kosako. For his great library.
49
+ * A lot of the documentation has been copied from the original Ruby Regex documentation.
50
+
51
+ == LICENSE:
52
+
53
+ New BSD License
54
+
55
+ Copyright (c) 2007, Dizan Vasquez
56
+ All rights reserved.
57
+
58
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
59
+
60
+ * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
61
+ * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the
62
+ documentation and/or other materials provided with the distribution.
63
+ * Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this
64
+ software without specific prior written permission.
65
+
66
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
67
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
68
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
69
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
70
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
71
+ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
@@ -0,0 +1,41 @@
1
+ require 'rubygems'
2
+ require 'hoe'
3
+
4
+ class Hoe
5
+ # Dirty hack to eliminate Hoe from gem dependencies
6
+ def extra_deps
7
+ @extra_deps.reject { |x| Array(x).first == 'hoe' }
8
+ end
9
+
10
+ # Dirty hack to package only the required files per platform
11
+ def spec= s
12
+ if ENV['PLATFORM'] =~ /win32/
13
+ s.files = s.files.reject {|f| f =~ /extconf\.rb/}
14
+ else
15
+ s.files = s.files.reject {|f| f =~ /win\//}
16
+ end
17
+ @spec = s
18
+ end
19
+ end
20
+
21
+ version = /^== *(\d+\.\d+\.\d+)/.match( File.read( 'History.txt' ) )[1]
22
+
23
+ Hoe.new('oniguruma', version) do |p|
24
+ p.rubyforge_name = 'oniguruma'
25
+ p.author = ['Dizan Vasquez', 'Nikolai Lugovoi']
26
+ p.email = 'dichodaemon@gmail.com'
27
+ p.summary = 'Bindings for the oniguruma regular expression library'
28
+ p.description = p.paragraphs_of('README.txt', 1 ).join("\n\n")
29
+ p.url = 'http://oniguruma.rubyforge.org'
30
+ if ENV['PLATFORM'] =~ /win32/
31
+ p.lib_files = ["win/oregexp.so"]
32
+ p.spec_extras[:require_paths] = ["win", "lib", "ext" ]
33
+ p.spec_extras[:platform] = Gem::Platform::WIN32
34
+ else
35
+ p.spec_extras[:extensions] = ["ext/extconf.rb"]
36
+ end
37
+ p.rdoc_pattern = /^(lib|bin|ext)|txt$/
38
+ p.changes = p.paragraphs_of('History.txt', 0).join("\n\n")
39
+ end
40
+
41
+
@@ -0,0 +1,396 @@
1
+ = RUBY REGULAR EXPRESSION SYNTAX
2
+
3
+
4
+ == Syntax Elements
5
+
6
+ [\] escape (enable or disable meta character meaning)
7
+ [|] alternation
8
+ [(...)] group
9
+ [[...]] character class
10
+
11
+
12
+ == Characters
13
+
14
+ [\t] horizontal tab (0x09)
15
+ [\v] vertical tab (0x0B)
16
+ [\n] newline (0x0A)
17
+ [\r] return (0x0D)
18
+ [\b] back space (0x08)
19
+
20
+ \b is effective in character class [...] only
21
+ [\f] form feed (0x0C)
22
+ [\a] bell (0x07)
23
+ [\e] escape (0x1B)
24
+ [\nnn] octal char (encoded byte value)
25
+ [\xHH] hexadecimal char (encoded byte value)
26
+ [\x{7HHHHHHH}] wide hexadecimal char (character code point value)
27
+ [\cx] control char (character code point value)
28
+ [\C-x] control char (character code point value)
29
+ [\M-x] meta (x|0x80) (character code point value)
30
+ [\M-\C-x] meta control char (character code point value)
31
+
32
+
33
+
34
+ == Character types
35
+
36
+ [.] any character (except newline)
37
+ [\w] word character
38
+
39
+ Not Unicode:
40
+ * alphanumeric, "_" and multibyte char.
41
+ Unicode:
42
+ * General_Category -- (Letter|Mark|Number|Connector_Punctuation)
43
+ [\W] non word char
44
+ [\s] whitespace char
45
+
46
+ Not Unicode:
47
+ * \t, \n, \v, \f, \r, \x20
48
+ Unicode:
49
+ * 0009, 000A, 000B, 000C, 000D, 0085(NEL),
50
+ * General_Category:
51
+ * -- Line_Separator
52
+ * -- Paragraph_Separator
53
+ * -- Space_Separator
54
+ [\S] non whitespace char
55
+ [\d] decimal digit char
56
+
57
+ Unicode: General_Category -- Decimal_Number
58
+ [\D] non decimal digit char
59
+ [\h] hexadecimal digit char [0-9a-fA-F]
60
+ [\H] non hexadecimal digit char
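As a rough illustration of the character types above, the sketch below uses Ruby's built-in Regexp as a stand-in for ORegexp. That is an assumption: it needs Ruby 1.9 or later, whose regexp engine is derived from Oniguruma. The constant and method names are made up for the example.

```ruby
# Assumption: built-in Regexp (Ruby 1.9+) stands in for Oniguruma::ORegexp.
# HEX_COLOR and hex_color? are hypothetical names used only for this sketch.
HEX_COLOR = /\A#\h{6}\z/   # \h is a hexadecimal digit: [0-9a-fA-F]

def hex_color?(text)
  !!(HEX_COLOR =~ text)
end

puts hex_color?('#1a2B3c')            # true
puts hex_color?('#1a2B3g')            # false, 'g' is not a hexadecimal digit
puts 'a1_b2 c3'.scan(/\w+/).inspect   # ["a1_b2", "c3"]
puts 'a1_b2 c3'.scan(/\d/).inspect    # ["1", "2", "3"]
```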
61
+
62
+
63
+ == Character Properties
64
+
65
+ \p{property-name}
66
+ \p{^property-name} (negative)
67
+ \P{property-name} (negative)
68
+
69
+ === property-name:
70
+
71
+ Works on all encodings:
72
+ * Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
73
+ Print, Punct, Space, Upper, XDigit, Word, ASCII,
74
+ Works on EUC_JP, Shift_JIS:
75
+ * Hiragana, Katakana
76
+ Works on UTF8, UTF16, UTF32:
77
+ * Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu,
78
+ M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps,
79
+ S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs,
80
+ Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese,
81
+ Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic,
82
+ Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian,
83
+ Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul,
84
+ Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
85
+ Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
86
+ Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian,
87
+ Oriya, Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac,
88
+ Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan,
89
+ Tifinagh, Ugaritic, Yi
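A minimal sketch of the property syntax, assuming UTF-8 strings and Ruby's built-in Regexp (1.9 or later) in place of ORegexp. The constant and method names are illustrative only.

```ruby
# Assumption: built-in Regexp on UTF-8 strings stands in for ORegexp.
GREEK_WORD = /\p{Greek}+/
NON_ALPHA  = /\p{^Alpha}+/   # negative form, same meaning as /\P{Alpha}+/

# "abc", three Greek letters (alpha, beta, gamma), then "123"
SAMPLE_TEXT = "abc \u03b1\u03b2\u03b3 123"

# Returns the first run of Greek-script characters, or nil.
def greek_part(text)
  text[GREEK_WORD]
end

# Returns every run of characters that are not alphabetic.
def non_alpha_runs(text)
  text.scan(NON_ALPHA)
end

puts greek_part(SAMPLE_TEXT)
puts non_alpha_runs(SAMPLE_TEXT).inspect   # [" ", " 123"]
```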
90
+
91
+ == Quantifiers
92
+
93
+ === Greedy
94
+
95
+ [?] 1 or 0 times
96
+ [*] 0 or more times
97
+ [+] 1 or more times
98
+ [{n,m}] at least n but not more than m times
99
+ [{n,}] at least n times
100
+ [{,n}] at least 0 but not more than n times ({0,n})
101
+ [{n}] n times
102
+
103
+ === Reluctant
104
+
105
+ [??] 1 or 0 times
106
+ [*?] 0 or more times
107
+ [+?] 1 or more times
108
+ [{n,m}?] at least n but not more than m times
109
+ [{n,}?] at least n times
110
+ [{,n}?] at least 0 but not more than n times (== {0,n}?)
111
+
112
+ === Possessive (greedy and does not backtrack after repeated)
113
+
114
+ [?+] 1 or 0 times
115
+ [*+] 0 or more times
116
+ [++] 1 or more times
117
+
118
+ ({n,m}+, {n,}+, {n}+ are possessive op. in ONIG_SYNTAX_JAVA only)
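The difference between the three quantifier families can be sketched as follows. This assumes Ruby's built-in Regexp (1.9 or later) rather than ORegexp, and the method names are hypothetical.

```ruby
# Assumption: built-in Regexp stands in for ORegexp.
TAGS = '<a><b>'

# Greedy: .+ takes as much as it can, so the match runs to the last '>'.
def greedy(text)
  text[/<.+>/]
end

# Reluctant: .+? takes as little as it can, so the match stops at the first '>'.
def reluctant(text)
  text[/<.+?>/]
end

# Possessive: a++ never gives back what it consumed, so the trailing
# literal 'a' has nothing left to match.
def possessive_match?(text)
  !!(/a++a/ =~ text)
end

puts greedy(TAGS)               # <a><b>
puts reluctant(TAGS)            # <a>
puts possessive_match?('aaa')   # false
puts !!(/a+a/ =~ 'aaa')         # true, the greedy form backtracks
```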
119
+
120
+
121
+ == Anchors
122
+
123
+ [^] beginning of the line
124
+ [$] end of the line
125
+ [\b] word boundary
126
+ [\B] not word boundary
127
+ [\A] beginning of string
128
+ [\Z] end of string, or before newline at the end
129
+ [\z] end of string
130
+ [\G] matching start position
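To make the line anchors and string anchors easier to tell apart, here is a small sketch using Ruby's built-in Regexp (an assumption, in place of ORegexp). Each entry is the match offset, or nil when the anchor prevents a match.

```ruby
# Assumption: built-in Regexp stands in for ORegexp; names are illustrative.
TEXT_LINES = "foo\nbar\n"

def anchor_positions(text)
  {
    :line_start           => (/^bar/   =~ text),  # ^ matches after a newline
    :string_start         => (/\Abar/  =~ text),  # \A only at offset 0
    :line_end             => (/bar$/   =~ text),  # $ matches before a newline
    :before_final_newline => (/bar\Z/  =~ text),  # \Z tolerates one final newline
    :string_end           => (/bar\z/  =~ text)   # \z only at the very end
  }
end

p anchor_positions(TEXT_LINES)
```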
131
+
132
+
133
+ == Character class
134
+
135
+ [^...] negative class (lowest precedence operator)
136
+ [x-y] range from x to y
137
+ [[...]] set (character class in character class)
138
+ [..&&..] intersection (low precedence; only ^ is lower)
139
+
140
+ If you want to use '[', '-', ']' as a normal character
141
+ in a character class, you should escape these characters by '\'.
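A short sketch of intersection and escaping inside a character class, assuming Ruby's built-in Regexp in place of ORegexp. The names are made up for the example.

```ruby
# Assumption: built-in Regexp stands in for ORegexp.
# Intersection: lowercase letters that are not vowels.
CONSONANT = /[a-z&&[^aeiou]]/

def consonants(text)
  text.scan(CONSONANT)
end

# '[', ']' and '-' escaped so they are treated as normal characters.
def bracket_chars(text)
  text.scan(/[\[\]\-]/)
end

puts consonants('hello world').inspect   # ["h", "l", "l", "w", "r", "l", "d"]
puts bracket_chars('a[b]-c').inspect     # ["[", "]", "-"]
```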
142
+
143
+
144
+ POSIX bracket ([:xxxxx:], negate [:^xxxxx:])
145
+
146
+ === Not Unicode Case:
147
+
148
+ [alnum] alphabet or digit char
149
+ [alpha] alphabet
150
+ [ascii] code value: [0 - 127]
151
+ [blank] \t, \x20
152
+ [cntrl] control
153
+ [digit] 0-9
154
+ [graph] include all of multibyte encoded characters
155
+ [lower] lower case
156
+ [print] include all of multibyte encoded characters
157
+ [punct] punctuation
158
+ [space] \t, \n, \v, \f, \r, \x20
159
+ [upper] upper case
160
+ [xdigit] 0-9, a-f, A-F
161
+ [word] alphanumeric, "_" and multibyte characters
162
+
163
+
164
+ === Unicode Case:
165
+
166
+ [alnum] Letter | Mark | Decimal_Number
167
+ [alpha] Letter | Mark
168
+ [ascii] 0000 - 007F
169
+ [blank] Space_Separator | 0009
170
+ [cntrl] Control | Format | Unassigned | Private_Use | Surrogate
171
+ [digit] Decimal_Number
172
+ [graph] [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
173
+ [lower] Lowercase_Letter
174
+ [print] [[:graph:]] | [[:space:]]
175
+ [punct] Connector_Punctuation | Dash_Punctuation | Close_Punctuation |
176
+ Final_Punctuation | Initial_Punctuation | Other_Punctuation |
177
+ Open_Punctuation
178
+ [space] Space_Separator | Line_Separator | Paragraph_Separator |
179
+ 0009 | 000A | 000B | 000C | 000D | 0085
180
+ [upper] Uppercase_Letter
181
+ [xdigit] 0030 - 0039 | 0041 - 0046 | 0061 - 0066
182
+ (0-9, a-f, A-F)
183
+ [word] Letter | Mark | Decimal_Number | Connector_Punctuation
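POSIX brackets are written inside a character class, so the full form is [[:alpha:]] or [[:^digit:]]. A minimal sketch, again assuming Ruby's built-in Regexp as a stand-in for ORegexp:

```ruby
# Assumption: built-in Regexp stands in for ORegexp; method names are hypothetical.

# Runs of alphabetic characters.
def letters(text)
  text.scan(/[[:alpha:]]+/)
end

# Everything except digits, using the negated bracket [:^digit:].
def without_digits(text)
  text.scan(/[[:^digit:]]+/).join
end

puts letters('abc123 def').inspect   # ["abc", "def"]
puts without_digits('a1b22c')        # abc
```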
184
+
185
+
186
+
187
+ == Extended groups
188
+
189
+ [(?#...)] comment
190
+ [(?imx-imx)] option on/off:
191
+ * i: ignore case
192
+ * m: multi-line (dot(.) match newline)
193
+ * x: extended form
194
+ [(?imx-imx:subexp)] option on/off for subexp
195
+ [(?:subexp)] not captured group
196
+ [(subexp)] captured group
197
+ [(?=subexp)] look-ahead
198
+ [(?!subexp)] negative look-ahead
199
+ [(?<=subexp)] look-behind
200
+ [(?<!subexp)] negative look-behind
201
+
202
+ Subexp of look-behind must be fixed character length.
203
+ But different character length is allowed in top level
204
+ alternatives only.
205
+ ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
206
+
207
+ In negative-look-behind, captured group isn't allowed,
208
+ but shy group(?:) is allowed.
209
+ [(?>subexp)] atomic group
210
+ don't backtrack in subexp.
211
+ [(?<name>subexp)] define named group
212
+ (All characters of the name must be a word character.)
213
+
214
+ Not only a name but a number is assigned like a captured
215
+ group.
216
+
217
+ Assigning the same name to two or more subexps is allowed.
218
+ In this case, a subexp call can not be performed although
219
+ the back reference is possible.
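A few of the extended groups in action. This is only a sketch under the assumption that Ruby's built-in Regexp (1.9 or later) behaves like ORegexp for these constructs; all names are illustrative.

```ruby
# Assumption: built-in Regexp stands in for ORegexp.

# Look-behind: digits that directly follow a dollar sign.
def amount(text)
  text[/(?<=\$)\d+/]
end

# Atomic group: (?>a+) keeps all the a's, so the following "ab" cannot match.
def atomic_match?(text)
  !!(/(?>a+)ab/ =~ text)
end

# Named groups, read back by name from the MatchData.
DATE = /(?<year>\d{4})-(?<month>\d{2})/

def year_of(text)
  m = DATE.match(text)
  m && m[:year]
end

# Option switched on for a sub-expression only: just the "b" ignores case.
def mixed_case?(text)
  !!(/\Aa(?i:b)c\z/ =~ text)
end

puts amount('cost: $42')       # 42
puts atomic_match?('aaab')     # false
puts year_of('on 2007-03-28')  # 2007
puts mixed_case?('aBc')        # true
puts mixed_case?('ABc')        # false
```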
220
+
221
+
222
+ == Back reference
223
+
224
+ [\n] back reference by group number (n >= 1)
225
+ [\k<name>] back reference by group name
226
+ When a name is defined more than once, a back reference by that name
227
+ refers preferentially to the subexp with the largest group number.
228
+ (If that one did not match, a group with a smaller number is used.)
229
+
230
+ * Back reference by group number is forbidden if named group is defined
231
+ in the pattern and ONIG_OPTION_CAPTURE_GROUP is not set.
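The sketch below shows both kinds of back reference, plus the restriction on mixing them. It assumes Ruby's built-in Regexp, which behaves as if ONIG_OPTION_CAPTURE_GROUP were not set; the method names are hypothetical.

```ruby
# Assumption: built-in Regexp stands in for ORegexp.

# Back reference by name: the closing quote must equal the opening one.
QUOTED = /(?<q>['"])(?<body>.*?)\k<q>/

def quoted_body(text)
  m = QUOTED.match(text)
  m && m[:body]
end

# Back reference by number: the same word character twice in a row.
def doubled_letter(text)
  text[/(\w)\1/]
end

# A numbered back reference in a pattern that defines a named group
# is rejected when the pattern is compiled.
def numbered_backref_with_named_group_ok?
  Regexp.new('(?<a>.)\1')
  true
rescue RegexpError
  false
end

puts quoted_body(%q{say "it's" now})          # it's
puts doubled_letter('hello')                  # ll
puts numbered_backref_with_named_group_ok?    # false
```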
232
+
233
+
234
+ === Back reference with nest level
235
+
236
+ [\k<name+n>] n: 0, 1, 2, ...
237
+ [\k<name-n>] n: 0, 1, 2, ...
238
+
239
+ Specifies the nest level relative to the back reference position.
240
+
241
+ Examples:
242
+ /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/.match("reer")
243
+
244
+ r = ORegexp.compile(<<'__REGEXP__'.strip, :options => Oniguruma::EXTENDED)
245
+ (?<element> \g<stag> \g<content>* \g<etag> ){0}
246
+ (?<stag> < \g<name> \s* > ){0}
247
+ (?<name> [a-zA-Z_:]+ ){0}
248
+ (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
249
+ (?<etag> </ \k<name+1> >){0}
250
+ \g<element>
251
+ __REGEXP__
252
+
253
+ p r.match('<foo>f<bar>bbb</bar>f</foo>').captures
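The first example above is a palindrome matcher: \k<b+0> refers to the value group b captured at the same recursion level. A sketch of how it could be wrapped, assuming Ruby's built-in Regexp supports the nest-level syntax the same way (the method name is made up):

```ruby
# Assumption: built-in Regexp stands in for ORegexp.
# <a> is empty, one character, or: a character, a nested <a>, and the
# same character again (as captured at this nest level).
PALINDROME = /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/

def palindrome?(text)
  !!(PALINDROME =~ text)
end

puts palindrome?('reer')   # true
puts palindrome?('reef')   # false
```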
254
+
255
+
256
+
257
+ === Subexp call ("Tanaka Akira special")
258
+
259
+ [\g<name>] call by group name
260
+ [\g<n>] call by group number (n >= 1)
261
+
262
+ * left-most recursive call is not allowed.
263
+
264
+ Example:
265
+ (?<name>a|\g<name>b) => error
266
+ (?<name>a|b\g<name>c) => OK
267
+ * Call by group number is forbidden if named group is defined in the pattern
268
+ and Oniguruma::OPTION_CAPTURE_GROUP is not set.
269
+ * If the option status of called group is different from calling position
270
+ then the group's option is effective.
271
+
272
+ Example:
273
+ (?-i:\g<name>)(?i:(?<name>a)){0} <i>matches "A"</i>
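Subexp calls make recursive patterns possible. The following sketch matches balanced parentheses and also checks that the left-most recursive form is rejected. It assumes Ruby's built-in Regexp in place of ORegexp, and the names are illustrative.

```ruby
# Assumption: built-in Regexp stands in for ORegexp.
# <paren> is "(", then any mix of non-parenthesis characters and nested
# <paren> groups, then ")".
BALANCED = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

def balanced?(text)
  !!(BALANCED =~ text)
end

# Left-most recursion: the group would call itself before consuming anything.
def left_recursion_ok?
  Regexp.new('(?<name>a|\g<name>b)')
  true
rescue RegexpError
  false
end

puts balanced?('(a(b)c)')   # true
puts balanced?('(a(b)c')    # false
puts left_recursion_ok?     # false
```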
274
+
275
+
276
+ == Captured group
277
+
278
+ Behavior of the unnamed group (...) changes with the following conditions.
279
+ (But named group is not changed.)
280
+
281
+ [case 1] <code>ORegexp.new( '...' )</code> (named group is not used, no option)
282
+
283
+ (...) is treated as a captured group.
284
+ [case 2] <code>ORegexp.new( '...', :options => OPTION_DONT_CAPTURE_GROUP )</code> (named group is not used, 'g' option)
285
+
286
+ (...) is treated as a non-captured group (?:...).
287
+
288
+ [case 3] <code>ORegexp.new( '...(?<name>...)...' )</code> (named group is used, no option)
289
+
290
+ (...) is treated as a non-captured group (?:...).
291
+
292
+ numbered-backref/call is not allowed.
293
+
294
+ [case 4] <code>ORegexp.new( '...(?<name>...)...', :options => OPTION_CAPTURE_GROUP )</code> (named group is used, 'G' option)
295
+
296
+ (...) is treated as a captured group.
297
+
298
+ numbered-backref/call is allowed.
299
+
300
+ where
301
+ * g: OPTION_DONT_CAPTURE_GROUP
302
+ * G: OPTION_CAPTURE_GROUP
303
+
304
+ ('g' and 'G' options are under discussion on the ruby-dev ML)
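Cases 1 and 3 can be observed with Ruby's built-in Regexp, which is used here as a stand-in for ORegexp (an assumption). The option constants for cases 2 and 4 belong to this gem and are not exercised in the sketch.

```ruby
# Assumption: built-in Regexp stands in for ORegexp with no options set.

# Case 1: no named group, so every (...) is captured.
def plain_captures(text)
  /(\d+)-(\d+)/.match(text).captures
end

# Case 3: a named group is present, so the plain (...) is not captured.
def mixed_captures(text)
  /(?<first>\d+)-(\d+)/.match(text).captures
end

puts plain_captures('12-34').inspect   # ["12", "34"]
puts mixed_captures('12-34').inspect   # ["12"]
```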
305
+
306
+
307
+ == Syntax dependent options
308
+
309
+ === ONIG_SYNTAX_RUBY
310
+
311
+ [(?m)] dot(.) match newline
312
+
313
+ === ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA
314
+
315
+ [(?s)] dot(.) match newline
316
+ [(?m)] ^ match after newline, $ match before newline
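Under the Ruby syntax, (?m) is the option that lets the dot match a newline. A small sketch using Ruby's built-in Regexp (assumed here to follow ONIG_SYNTAX_RUBY); the Perl and Java syntaxes are not covered by it.

```ruby
# Assumption: built-in Regexp follows ONIG_SYNTAX_RUBY; the method name is hypothetical.

# Compiles the pattern and reports whether it matches across a line break.
def matches_across_newline?(pattern)
  !!(Regexp.new(pattern) =~ "a\nb")
end

puts matches_across_newline?('a.b')       # false, dot stops at the newline
puts matches_across_newline?('(?m)a.b')   # true, (?m) lets dot match newline
```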
317
+
318
+ == Original extensions
319
+
320
+ * hexadecimal digit char type \h, \H
321
+ * named group (?<name>...)
322
+ * named backref \k<name>
323
+ * subexp call \g<name>, \g<group-num>
324
+
325
+
326
+ == Features lacking compared with Perl 5.8.0
327
+
328
+ * \N{name}
329
+ * \l,\u,\L,\U, \X, \C
330
+ * (?{code})
331
+ * (??{code})
332
+ * (?(condition)yes-pat|no-pat)
333
+ * \Q...\E
334
+
335
+ (\Q...\E is available under ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA.)
336
+
337
+
338
+ == Differences from Japanized GNU regex (version 0.12) of Ruby 1.8
339
+
340
+ * add character property (\p{property}, \P{property})
341
+ * add hexadecimal digit char type (\h, \H)
342
+ * add look-behind
343
+
344
+ (?<=fixed-char-length-pattern), (?<!fixed-char-length-pattern)
345
+ * add possessive quantifier. ?+, *+, ++
346
+ * add operations in character class. [], &&
347
+
348
+ ('[' must be escaped to be used as an ordinary char in a character class.)
349
+ * add named group and subexp call.
350
+ * octal or hexadecimal number sequence can be treated as
351
+ a multibyte code char in character class if multibyte encoding
352
+ is specified.
353
+
354
+ (ex. <code>[\xa1\xa2], [\xa1\xa7-\xa4\xa1]</code>)
355
+ * allow the range of single byte char and multibyte char in character
356
+ class.
357
+
358
+ ex. <code>[a-<<any EUC-JP character>>]</code> in EUC-JP encoding.
359
+ * effect range of isolated option is to next ')'.
360
+ ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b).
361
+ * isolated option is not transparent to previous pattern.
362
+ ex. <code>a(?i)*</code> is a syntax error pattern.
363
+ * an incomplete left brace is allowed as an ordinary string.
364
+ ex. /{/, /({)/, /a{2,3/ etc...
365
+ * negative POSIX bracket [:^xxxx:] is supported.
366
+ * POSIX bracket [:ascii:] is added.
367
+ * repeat of look-ahead is not allowed.
368
+ ex. <code>(?=a)*</code>, <code>(?!b){5}</code>
369
+ * Ignore case option also applies to characters written as numeric codes.
370
+ ex. <code>/\x61/i =~ "A"</code>
371
+ * In the range quantifier, the minimum may be omitted.
372
+
373
+ <code>/a{,n}/ == /a{0,n}/</code>
374
+
375
+ Omitting both the minimum and the maximum at the same time
376
+ is not allowed. (/a{,}/)
377
+ * <code>a{n}?</code> is not a non-greedy operator.
378
+ <code>/a{n}?/ == /(?:a{n})?/</code>
379
+ * invalid back reference is checked and causes an error.
380
+ /\1/, /(a)\2/
381
+ * Zero-length match in infinite repeat stops the repeat,
382
+ then changes of the capture group status are checked as stop condition.
383
+ /(?:()|())*\1\2/ =~ ""
384
+ /(?:\1a|())*/ =~ "a"
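Some of the differences listed above can be checked directly. The sketch assumes Ruby's built-in Regexp (1.9 or later, Ruby syntax) as a stand-in for ORegexp; the method names are made up.

```ruby
# Assumption: built-in Regexp with Ruby syntax stands in for ORegexp.

# {,n} is the same as {0,n}: at most two a's are taken.
def up_to_two(text)
  text[/a{,2}/]
end

# The ignore-case option also applies to a character given as \x61 ('a').
def numeric_code_ignores_case?
  !!(/\x61/i =~ 'A')
end

# a{2}? means (?:a{2})?, an optional pair, not a reluctant repeat.
def optional_pair(text)
  text[/a{2}?/]
end

puts up_to_two('aaa')                # aa
puts numeric_code_ignores_case?      # true
puts optional_pair('aa').inspect     # "aa"
puts optional_pair('a').inspect      # "", the optional pair matched zero times
```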
385
+
386
+
387
+ == Problems
388
+
389
+ * Invalid encoding byte sequence is not checked in UTF-8.
390
+
391
+ * Invalid first byte is treated as a character.
392
+ /./u =~ "\xa3"
393
+
394
+ * Incomplete byte sequence is not checked.
395
+ /\w+/ =~ "a\xf3\x8ec"
396
+