jasherai-oniguruma 1.1.1

data/History.txt ADDED
@@ -0,0 +1,56 @@
1
+ == 1.1.1 /
2
+ * Fixed oregexp.c issue with UTF16/32 etc. patch found at http://howflow.com/tricks/ruby_how_to_install_the_gem_oniguruma
3
+
4
+ == 1.1.0 /
5
+ * Fixed string escaping in ORegexp#to_str and ORegexp#inspect.
6
+ * Added begin parameter to ORegexp#match.
7
+
8
+ == 1.0.1 / 2007-03-28
9
+ * Minimal recommended version of oniglib changed to be compatible with Ruby 1.9, now is 4.6 or higher.
10
+ * Restore check for onig version to build with 4.6
11
+ * In getting replacement do not create temp string object, but directly add to resulting buffer (performance impr.)
12
+ * Included binary gems for windows.
13
+ * Modified Rakefile to support win32 gems.
14
+
15
+ == 1.0.0 / 2007-03-27
16
+ * Added documentation for MatchData.
17
+ * Added ogsub, ogsub!, sub and sub! to ::String.
18
+ * Removed ::String definitions from tests.
19
+ * Now the minimal recommended version of oniglib is 5.5 or higher.
20
+ * Removed ugly #if statements from c code.
21
+ * Do not create @named_captures hash if there are no named groups for regexp; this somewhat improves speed for repetitive calls
22
+ * Fixed usage of named backreferences in gsub with non-ascii names
23
+ * Move ORegexp#=~ to C code, make it work just like Regexp#=~, i.e. set $~. Throw ArgumentError instead of Exception if pattern does not compile
24
+ * Fix implementation of ORegexp#===, so it now does not raise errors in case statement anymore
25
+ (resembles plain Ruby Regexp#=== behaviour)
26
+ * Modified begin, end and offset methods in MatchData to handle named groups and default to group 0.
27
+ * Exception is no longer thrown in oregexp_make_match_data.
28
+ * Removed references to MultiMatchData from documentation
29
+ * Removed class MultiMatchData
30
+ * Fix off by one error in region->num_regs usage
31
+ * Fix dumb bug with zero-width matches that caused infinite loops. Now consume at least one char in gsub and scan
32
+ * ORegexp API changes:
33
+ * Pass only MatchData to sub/gsub with blocks
34
+ oregexp.sub( str ) {|match_data| ... }
35
+ oregexp.gsub( str ) {|match_data| ... }
36
+ * Add ORegexp#scan instead of match_all
37
+ oregexp.scan(str) {|match_data| ... } # => MultiMatchData
38
+ * Friendly way to set options
39
+ ORegexp.new( pattern, options_str, encoding, syntax)
40
+ ORegexp.new('\w+', 'imsx', 'koi8r', 'perl')
41
+ * Named backreferences in substitutions
42
+ ORegexp.new('(?<pre>\w+)\d+(?<after>\w+)').sub('abc123def', '\<after>123\<pre>') #=> 'def123abc'
43
+ * couple of bugfixes with region's num_regs
44
+ * some docs for substitution methods added
45
+
46
+ == 0.9.1 / 2007-03-25
47
+ * FIX: Buggy resolution of numeric codes for encoding and syntax options (Nikolai Lugovoi)
48
+ * FIX: Buggy implementation of ORegexp#sub and ORegexp#gsub methods. Now code is all C (Nikolai Lugovoi)
49
+ * Added documentation for class ORegexp
50
+ * Added regexp syntax documentation.
51
+
52
+ == 0.9.0 / 2007-03-19
53
+
54
+ * 1 major enhancement
55
+ * Birthday!
56
+
data/Manifest.txt ADDED
@@ -0,0 +1,10 @@
1
+ History.txt
2
+ Manifest.txt
3
+ README.txt
4
+ Syntax.txt
5
+ Rakefile
6
+ lib/oniguruma.rb
7
+ ext/oregexp.c
8
+ ext/extconf.rb
9
+ test/test_oniguruma.rb
10
+ win/oregexp.so
data/README.txt ADDED
@@ -0,0 +1,71 @@
1
+ == ONIGURUMA FOR RUBY:
2
+
3
+ Ruby bindings to the Oniguruma[http://www.geocities.jp/kosako3/oniguruma/] regular expression library (no need to recompile Ruby).
4
+
5
+ == FEATURES:
6
+
7
+ * Increased performance.
8
+ * Same interface as the standard Regexp class (easy transition!).
9
+ * Support for named groups, look-ahead, look-behind, and other
10
+ cool features!
11
+ * Support for other regexp syntaxes (Perl, Python, Java, etc.)
12
+
13
+ == SYNOPSIS:
14
+
15
+ reg = Oniguruma::ORegexp.new( '(?<before>.*)(a)(?<after>.*)' )
16
+ match = reg.match( 'terraforming' )
17
+ puts match[0] #=> 'terraforming'
18
+ puts match[:before] #=> 'terr'
19
+ puts match[:after] #=> 'forming'
20
+
21
+ == SYNTAX
22
+
23
+ Consult the Syntax.txt[link:files/Syntax_txt.html] page.
24
+
25
+ == REQUIREMENTS:
26
+
27
+ * Oniguruma[http://www.geocities.jp/kosako3/oniguruma/] library v. 4.6 or higher
28
+
29
+ == INSTALL:
30
+
31
+ sudo gem install -r oniguruma
32
+
33
+ == BUGS/PROBLEMS/INCOMPATIBILITIES:
34
+
35
+ * <code>ORegexp#~</code> is not implemented.
36
+ * <code>ORegexp#kcode</code> results are not compatible with <code>Regexp</code>.
37
+ * <code>ORegexp</code> options set in the string are not visible, this affects
38
+ <code>ORegexp#options</code>, <code>ORegexp#to_s</code>, <code>ORegexp#inspect</code>
39
+ and <code>ORegexp#==</code>.
40
+
41
+ == TODO:
42
+
43
+ * Complete documentation (methods, oniguruma syntax).
44
+
45
+ == CREDITS:
46
+
47
+ * N. Lugovoi. ORegexp.sub and ORegexp.gsub code and lots of other stuff.
48
+ * K. Kosako. For his great library.
49
+ * A lot of the documentation has been copied from the original Ruby Regex documentation.
50
+
51
+ == LICENSE:
52
+
53
+ New BSD License
54
+
55
+ Copyright (c) 2007, Dizan Vasquez
56
+ All rights reserved.
57
+
58
+ Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
59
+
60
+ * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
61
+ * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the
62
+ documentation and/or other materials provided with the distribution.
63
+ * Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this
64
+ software without specific prior written permission.
65
+
66
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
67
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
68
+ OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
69
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
70
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
71
+ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/Rakefile ADDED
@@ -0,0 +1,67 @@
1
+ require 'rubygems'
2
+
3
+ rubyforge_name = "oniguruma"
4
+ begin
5
+ # Load Hoe inside begin so the rescue LoadError clause below can take effect.
6
+ require 'hoe'
7
+
8
+ class Hoe
9
+ # Dirty hack to eliminate Hoe from gem dependencies
10
+ def extra_deps
11
+ @extra_deps.delete_if{ |x| x.first == 'hoe' }
12
+ end
13
+
14
+ # Dirty hack to package only the required files per platform
15
+ def spec= s
16
+ if ENV['PLATFORM'] =~ /win32/
17
+ s.files = s.files.reject {|f| f =~ /extconf\.rb/} # reject, not reject!, which returns nil when nothing is removed
18
+ else
19
+ s.files = s.files.reject {|f| f =~ /win\//}
20
+ end
21
+ @spec = s
22
+ end
23
+ end
24
+
25
+ version = /^== *(\d+\.\d+\.\d+)/.match( File.read( 'History.txt' ) )[1]
26
+
27
+ h = Hoe.new('oniguruma', version) do |p|
28
+ p.rubyforge_name = 'oniguruma'
29
+ p.author = ['Dizan Vasquez', 'Nikolai Lugovoi']
30
+ p.email = 'dichodaemon@gmail.com'
31
+ p.summary = 'Bindings for the oniguruma regular expression library'
32
+ p.description = p.paragraphs_of('README.txt', 1 ).join("\n\n")
33
+ p.url = 'http://oniguruma.rubyforge.org'
34
+ if ENV['PLATFORM'] =~ /win32/
35
+ p.lib_files = ["win/oregexp.so"]
36
+ p.spec_extras[:require_paths] = ["win", "lib", "ext" ]
37
+ p.spec_extras[:platform] = Gem::Platform::WIN32
38
+ else
39
+ p.spec_extras[:extensions] = ["ext/extconf.rb"]
40
+ end
41
+ p.rdoc_pattern = /^(lib|bin|ext)|txt$/
42
+ p.changes = p.paragraphs_of('History.txt', 0).join("\n\n")
43
+ p.clean_globs = ["manual/*"]
44
+ end
45
+
46
+ desc 'Create MaMa documentation'
47
+ task :mama => :clean do
48
+ system "mm -c -t refresh -o manual mm/manual.mm"
49
+ end
50
+
51
+ desc 'Publish MaMa documentation to RubyForge'
52
+ task :mama_publish => [:clean, :mama] do
53
+ config = YAML.load(File.read(File.expand_path("~/.rubyforge/user-config.yml")))
54
+ host = "#{config["username"]}@rubyforge.org"
55
+ remote_dir = "/var/www/gforge-projects/#{h.rubyforge_name}"
56
+ local_dir = 'manual'
57
+ system "rsync -av #{local_dir}/ #{host}:#{remote_dir}"
58
+ end
59
+
60
+ rescue LoadError => e
61
+ desc 'Run the test suite.'
62
+ task :test do
63
+ system "ruby -Ibin:lib:test test/test_#{rubyforge_name}.rb"
64
+ end
65
+ end
66
+
67
+
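The Rakefile above derives the gem version from the first heading in History.txt and filters the file list per platform. The following is a minimal sketch of those two steps in isolation. The helper names `extract_version` and `platform_files` are hypothetical and do not appear in the Rakefile. It uses `Array#reject`, which always returns an array, whereas `Array#reject!` returns nil when no element is removed.

```ruby
# Hypothetical helpers that mirror the Rakefile logic; names are illustrative only.

# Return the first "== x.y.z" version found in a History.txt-style string.
def extract_version(history_text)
  match = /^== *(\d+\.\d+\.\d+)/.match(history_text)
  match && match[1]
end

# Drop the files that do not belong in the gem for the given platform.
# Array#reject always returns an array, even when nothing is removed.
def platform_files(files, platform)
  if platform.to_s =~ /win32/
    files.reject { |f| f =~ /extconf\.rb/ }
  else
    files.reject { |f| f =~ %r{win/} }
  end
end

history = "== 1.1.1 /\n* Fixed oregexp.c issue\n\n== 1.1.0 /\n"
files = ["lib/oniguruma.rb", "ext/oregexp.c", "ext/extconf.rb", "win/oregexp.so"]

puts extract_version(history)               # prints 1.1.1
puts platform_files(files, "win32").inspect # keeps win/oregexp.so, drops ext/extconf.rb
puts platform_files(files, nil).inspect     # keeps ext/extconf.rb, drops win/oregexp.so
```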
data/Syntax.txt ADDED
@@ -0,0 +1,396 @@
1
+ = RUBY REGULAR EXPRESSION SYNTAX
2
+
3
+
4
+ == Syntax Elements
5
+
6
+ [\] escape (enable or disable meta character meaning)
7
+ [|] alternation
8
+ [(...)] group
9
+ [[...]] character class
10
+
11
+
12
+ == Characters
13
+
14
+ [\t] horizontal tab (0x09)
15
+ [\v] vertical tab (0x0B)
16
+ [\n] newline (0x0A)
17
+ [\r] return (0x0D)
18
+ [\b] back space (0x08)
19
+
20
+ \b is effective in character class [...] only
21
+ [\f] form feed (0x0C)
22
+ [\a] bell (0x07)
23
+ [\e] escape (0x1B)
24
+ [\nnn] octal char (encoded byte value)
25
+ [\xHH] hexadecimal char (encoded byte value)
26
+ [\x{7HHHHHHH}] wide hexadecimal char (character code point value)
27
+ [\cx] control char (character code point value)
28
+ [\C-x] control char (character code point value)
29
+ [\M-x] meta (x|0x80) (character code point value)
30
+ [\M-\C-x] meta control char (character code point value)
31
+
32
+
33
+
34
+ == Character types
35
+
36
+ [.] any character (except newline)
37
+ [\w] word character
38
+
39
+ Not Unicode:
40
+ * alphanumeric, "_" and multibyte char.
41
+ Unicode:
42
+ * General_Category -- (Letter|Mark|Number|Connector_Punctuation)
43
+ [\W] non word char
44
+ [\s] whitespace char
45
+
46
+ Not Unicode:
47
+ * \t, \n, \v, \f, \r, \x20
48
+ Unicode:
49
+ * 0009, 000A, 000B, 000C, 000D, 0085(NEL),
50
+ * General_Category:
51
+ * -- Line_Separator
52
+ * -- Paragraph_Separator
53
+ * -- Space_Separator
54
+ [\S] non whitespace char
55
+ [\d] decimal digit char
56
+
57
+ Unicode: General_Category -- Decimal_Number
58
+ [\D] non decimal digit char
59
+ [\h] hexadecimal digit char [0-9a-fA-F]
60
+ [\H] non hexadecimal digit char
61
+
62
+
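As a rough illustration of the character types listed above, the sketch below uses Ruby's built-in Regexp (Oniguruma-derived since Ruby 1.9) as a stand-in for ORegexp. That substitution is an assumption made so the example runs without the gem.

```ruby
# Built-in Regexp is used here in place of ORegexp (assumption).
text = "cafe 42 zzz"

words   = text.scan(/\w+/) # runs of word characters
digits  = text.scan(/\d+/) # runs of decimal digits
hex     = text.scan(/\h+/) # runs of hexadecimal digits [0-9a-fA-F]
non_hex = text.scan(/\H+/) # runs of anything that is not a hex digit

p words   # => ["cafe", "42", "zzz"]
p digits  # => ["42"]
p hex     # => ["cafe", "42"]
p non_hex # => [" ", " zzz"]
```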
63
+ == Character Properties
64
+
65
+ \p{property-name}
66
+ \p{^property-name} (negative)
67
+ \P{property-name} (negative)
68
+
69
+ === property-name:
70
+
71
+ Works on all encodings:
72
+ * Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
73
+ Print, Punct, Space, Upper, XDigit, Word, ASCII,
74
+ Works on EUC_JP, Shift_JIS:
75
+ * Hiragana, Katakana
76
+ Works on UTF8, UTF16, UTF32:
77
+ * Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu,
78
+ M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps,
79
+ S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs,
80
+ Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese,
81
+ Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic,
82
+ Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian,
83
+ Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul,
84
+ Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
85
+ Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
86
+ Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian,
87
+ Oriya, Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac,
88
+ Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan,
89
+ Tifinagh, Ugaritic, Yi
90
+
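The property syntax above can be tried with a short sketch. It assumes Ruby's built-in Regexp (Oniguruma-derived) and a UTF-8 string rather than ORegexp with an explicit encoding argument.

```ruby
# Built-in Regexp on a UTF-8 string stands in for ORegexp (assumption).
text = "abc \u03b1\u03b2\u03b3 123" # "abc", three Greek letters, "123"

greek     = text.scan(/\p{Greek}+/)  # script property
alpha     = text.scan(/\p{Alpha}+/)  # works on all encodings
not_alpha = text.scan(/\P{Alpha}+/)  # negation with \P
caret_neg = text.scan(/\p{^Alpha}+/) # negation with ^ inside the braces
numbers   = text.scan(/\p{Nd}+/)     # General_Category Decimal_Number

p greek, alpha, not_alpha, caret_neg, numbers
```

Both negative forms, `\P{...}` and `\p{^...}`, are expected to give the same result.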
91
+ == Quantifiers
92
+
93
+ === Greedy
94
+
95
+ [?] 1 or 0 times
96
+ [*] 0 or more times
97
+ [+] 1 or more times
98
+ [{n,m}] at least n but not more than m times
99
+ [{n,}] at least n times
100
+ [{,n}] at least 0 but not more than n times ({0,n})
101
+ [{n}] n times
102
+
103
+ === Reluctant
104
+
105
+ [??] 1 or 0 times
106
+ [*?] 0 or more times
107
+ [+?] 1 or more times
108
+ [{n,m}?] at least n but not more than m times
109
+ [{n,}?] at least n times
110
+ [{,n}?] at least 0 but not more than n times (== {0,n}?)
111
+
112
+ === Possessive (greedy and does not backtrack after repeated)
113
+
114
+ [?+] 1 or 0 times
115
+ [*+] 0 or more times
116
+ [++] 1 or more times
117
+
118
+ ({n,m}+, {n,}+, {n}+ are possessive op. in ONIG_SYNTAX_JAVA only)
119
+
120
+
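The difference between the three quantifier families is easiest to see side by side. This is a small sketch using Ruby's built-in Regexp, assumed here to behave like ORegexp with the default Ruby syntax.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
html = "<a><b>"

greedy    = html[/<.+>/]  # takes as much as possible
reluctant = html[/<.+?>/] # takes as little as possible

# A possessive quantifier never gives characters back, so after a++ has
# consumed every "a" there is nothing left for the trailing "a" to match.
backtracking = "aaa" =~ /a+a/
possessive   = "aaa" =~ /a++a/

p greedy       # => "<a><b>"
p reluctant    # => "<a>"
p backtracking # => 0
p possessive   # => nil
```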
121
+ == Anchors
122
+
123
+ [^] beginning of the line
124
+ [$] end of the line
125
+ [\b] word boundary
126
+ [\B] not word boundary
127
+ [\A] beginning of string
128
+ [\Z] end of string, or before newline at the end
129
+ [\z] end of string
130
+ [\G] matching start position
131
+
132
+
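The anchors above differ mainly in how they treat newlines. The sketch below shows this with Ruby's built-in Regexp, used as a stand-in for ORegexp (an assumption).

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
text = "foo\nbar\n"

line_start     = text =~ /^bar/   # ^ matches at the beginning of any line
string_start   = text =~ /\Abar/  # \A matches only at the beginning of the string
before_newline = text =~ /bar\Z/  # \Z also matches before a final newline
very_end       = text =~ /bar\z/  # \z matches only at the very end
word_boundary  = "foobar bar" =~ /\bbar/

p line_start     # => 4
p string_start   # => nil
p before_newline # => 4
p very_end       # => nil
p word_boundary  # => 7
```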
133
+ == Character class
134
+
135
+ [^...] negative class (lowest precedence operator)
136
+ [x-y] range from x to y
137
+ [[...]] set (character class in character class)
138
+ [..&&..] intersection (low precedence at the next of ^)
139
+
140
+ If you want to use '[', '-', ']' as a normal character
141
+ in a character class, you should escape these characters by '\'.
142
+
143
+
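Nested sets, intersection and escaping inside a character class can be sketched as follows, assuming Ruby's built-in Regexp in place of ORegexp.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
consonants = "abcdef".scan(/[a-z&&[^aeiou]]/) # intersection with a negated set
nested     = "a1-b2".scan(/[a-c[0-9]]/)       # a set inside a set acts as a union
literals   = "x[y]-z".scan(/[\[\]\-]/)        # '[', ']' and '-' escaped with '\'

p consonants # => ["b", "c", "d", "f"]
p nested     # => ["a", "1", "b", "2"]
p literals   # => ["[", "]", "-"]
```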
144
+ POSIX bracket ([:xxxxx:], negate [:^xxxxx:])
145
+
146
+ === Not Unicode Case:
147
+
148
+ [alnum] alphabet or digit char
149
+ [alpha] alphabet
150
+ [ascii] code value: [0 - 127]
151
+ [blank] \t, \x20
152
+ [cntrl] control
153
+ [digit] 0-9
154
+ [graph] include all of multibyte encoded characters
155
+ [lower] lower case
156
+ [print] include all of multibyte encoded characters
157
+ [punct] punctuation
158
+ [space] \t, \n, \v, \f, \r, \x20
159
+ [upper] upper case
160
+ [xdigit] 0-9, a-f, A-F
161
+ [word] alphanumeric, "_" and multibyte characters
162
+
163
+
164
+ === Unicode Case:
165
+
166
+ [alnum] Letter | Mark | Decimal_Number
167
+ [alpha] Letter | Mark
168
+ [ascii] 0000 - 007F
169
+ [blank] Space_Separator | 0009
170
+ [cntrl] Control | Format | Unassigned | Private_Use | Surrogate
171
+ [digit] Decimal_Number
172
+ [graph] [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
173
+ [lower] Lowercase_Letter
174
+ [print] [[:graph:]] | [[:space:]]
175
+ [punct] Connector_Punctuation | Dash_Punctuation | Close_Punctuation |
176
+ Final_Punctuation | Initial_Punctuation | Other_Punctuation |
177
+ Open_Punctuation
178
+ [space] Space_Separator | Line_Separator | Paragraph_Separator |
179
+ 0009 | 000A | 000B | 000C | 000D | 0085
180
+ [upper] Uppercase_Letter
181
+ [xdigit] 0030 - 0039 | 0041 - 0046 | 0061 - 0066
182
+ (0-9, a-f, A-F)
183
+ [word] Letter | Mark | Decimal_Number | Connector_Punctuation
184
+
185
+
186
+
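A short sketch of POSIX brackets, including the negated form, follows. It assumes Ruby's built-in Regexp rather than ORegexp, and uses ASCII input so that the Unicode and non-Unicode definitions agree.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
text = "Hello, World 42!"

letters   = text.scan(/[[:alpha:]]+/)
digits    = text.scan(/[[:digit:]]+/)
punct     = text.scan(/[[:punct:]]/)
non_alpha = text.scan(/[[:^alpha:]]+/) # negated POSIX bracket

p letters   # => ["Hello", "World"]
p digits    # => ["42"]
p punct     # => [",", "!"]
p non_alpha # => [", ", " 42!"]
```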
187
+ == Extended groups
188
+
189
+ [(?#...)] comment
190
+ [(?imx-imx)] option on/off:
191
+ * i: ignore case
192
+ * m: multi-line (dot(.) match newline)
193
+ * x: extended form
194
+ [(?imx-imx:subexp)] option on/off for subexp
195
+ [(?:subexp)] not captured group
196
+ [(subexp)] captured group
197
+ [(?=subexp)] look-ahead
198
+ [(?!subexp)] negative look-ahead
199
+ [(?<=subexp)] look-behind
200
+ [(?<!subexp)] negative look-behind
201
+
202
+ Subexp of look-behind must be fixed character length.
203
+ But different character length is allowed in top level
204
+ alternatives only.
205
+ ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
206
+
207
+ In negative-look-behind, captured group isn't allowed,
208
+ but shy group(?:) is allowed.
209
+ [(?>subexp)] atomic group
210
+ don't backtrack in subexp.
211
+ [(?<name>subexp)] define named group
212
+ (All characters of the name must be a word character.)
213
+
214
+ Not only a name but a number is assigned like a captured
215
+ group.
216
+
217
+ Assigning the same name to two or more subexps is allowed.
218
+ In this case, a subexp call can not be performed although
219
+ the back reference is possible.
220
+
221
+
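The extended groups above can be exercised with a brief sketch. It assumes Ruby's built-in Regexp (Oniguruma-derived) behaves like ORegexp for these constructs.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
look_ahead     = "foobar"[/foo(?=bar)/]     # "bar" is required but not consumed
neg_look_ahead = "foobaz" =~ /foo(?!bar)/
look_behind    = "price: $42"[/(?<=\$)\d+/]
top_level_alt  = "xbc1"[/(?<=a|bc)\d/]      # different lengths allowed at top level

# An atomic group does not backtrack: a+ keeps every "a" it consumed.
atomic = "aaab" =~ /(?>a+)ab/
plain  = "aaab" =~ /a+ab/

# A named group also receives a number, like an ordinary captured group.
m = /(?<year>\d{4})-(?<month>\d{2})/.match("released 2007-03-28")

p look_ahead, neg_look_ahead, look_behind, top_level_alt
p atomic, plain
p m[:year], m[1], m[:month]
```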
222
+ == Back reference
223
+
224
+ [\n] back reference by group number (n >= 1)
225
+ [\k<name>] back reference by group name
226
+ When a name is defined more than once, the back reference
227
+ refers preferentially to the subexp with the largest group number.
228
+ (If that group did not match, a lower-numbered group is used.)
229
+
230
+ * Back reference by group number is forbidden if named group is defined
231
+ in the pattern and ONIG_OPTION_CAPTURE_GROUP is not set.
232
+
233
+
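The following sketch illustrates back references by name and by number, and the restriction on mixing numbered references with named groups. It assumes Ruby's built-in Regexp, which has no equivalent of the capture-group option, in place of ORegexp.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
quoted = /(?<q>['"]).*?\k<q>/ # the closing quote must equal the opening one
by_name = %q{say "it's" now}[quoted]

by_number = "abab" =~ /(ab)\1/

# A numbered back reference combined with a named group is rejected.
mixed_error = begin
  Regexp.new('(?<a>x)\1')
  nil
rescue RegexpError => e
  e.message
end

p by_name   # => "\"it's\""
p by_number # => 0
p mixed_error
```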
234
+ === Back reference with nest level
235
+
236
+ [\k<name+n>] n: 0, 1, 2, ...
237
+ [\k<name-n>] n: 0, 1, 2, ...
238
+
239
+ Specifies the nest level relative to the back reference position.
240
+
241
+ Examples:
242
+ /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/.match("reer")
243
+
244
+ r = ORegexp.compile(<<'__REGEXP__'.strip, :options => Oniguruma::EXTENDED)
245
+ (?<element> \g<stag> \g<content>* \g<etag> ){0}
246
+ (?<stag> < \g<name> \s* > ){0}
247
+ (?<name> [a-zA-Z_:]+ ){0}
248
+ (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
249
+ (?<etag> </ \k<name+1> >){0}
250
+ \g<element>
251
+ __REGEXP__
252
+
253
+ p r.match('<foo>f<bar>bbb</bar>f</foo>').captures
254
+
255
+
256
+
257
+ === Subexp call ("Tanaka Akira special")
258
+
259
+ [\g<name>] call by group name
260
+ [\g<n>] call by group number (n >= 1)
261
+
262
+ * left-most recursive call is not allowed.
263
+
264
+ Example:
265
+ (?<name>a|\g<name>b) => error
266
+ (?<name>a|b\g<name>c) => OK
267
+ * Call by group number is forbidden if named group is defined in the pattern
268
+ and Oniguruma::OPTION_CAPTURE_GROUP is not set.
269
+ * If the option status of called group is different from calling position
270
+ then the group's option is effective.
271
+
272
+ Example:
273
+ (?-i:\g<name>)(?i:(?<name>a)){0} <i>matches "A"</i>
274
+
275
+
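Subexp calls make recursive patterns possible. The sketch below matches balanced parentheses and shows that a left-most recursive call is rejected. It assumes Ruby's built-in Regexp in place of ORegexp, and the group name `paren` is illustrative.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).
balanced = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

ok  = !balanced.match("(a(b)c)").nil?
bad = !balanced.match("(a(b)c").nil?

# A left-most recursive call can never make progress, so it fails to compile.
left_recursion_error = begin
  Regexp.new('(?<name>a|\g<name>b)')
  nil
rescue RegexpError => e
  e.message
end

p ok  # => true
p bad # => false
p left_recursion_error
```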
276
+ == Captured group
277
+
278
+ Behavior of the unnamed group (...) depends on the following conditions.
279
+ (The behavior of named groups does not change.)
280
+
281
+ [case 1] <code>ORegexp.new( '...' )</code> (named group is not used, no option)
282
+
283
+ (...) is treated as a captured group.
284
+ [case 2] <code>ORegexp.new( '...', :options => OPTION_DONT_CAPTURE_GROUP )</code> (named group is not used, 'g' option)
285
+
286
+ (...) is treated as a no-captured group (?:...).
287
+
288
+ [case 3] <code>ORegexp.new( '...(?<name>...)...' )</code> (named group is used, no option)
289
+
290
+ (...) is treated as a no-captured group (?:...)
291
+
292
+ numbered-backref/call is not allowed.
293
+
294
+ [case 4] <code>ORegexp.new( '...(?<name>...)...', :options => OPTION_CAPTURE_GROUP )</code> (named group is used, 'G' option)
295
+
296
+ (...) is treated as a captured group.
297
+
298
+ numbered-backref/call is allowed.
299
+
300
+ where
301
+ * g: OPTION_DONT_CAPTURE_GROUP
302
+ * G: OPTION_CAPTURE_GROUP
303
+
304
+ ('g' and 'G' options are under discussion on the ruby-dev ML)
305
+
306
+
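Two of the cases above can be reproduced with Ruby's built-in Regexp, used here as a stand-in for ORegexp (an assumption). The built-in class has no 'g' or 'G' option, so only the default behavior with and without a named group is shown.

```ruby
# Built-in Regexp stands in for ORegexp (assumption); no 'g'/'G' options here.

# No named group: plain parentheses capture.
unnamed_only = /(a)(b)/.match("ab").captures

# A named group is present: plain parentheses stop capturing.
mixed = /(a)(?<n>b)/.match("ab")

p unnamed_only   # => ["a", "b"]
p mixed.captures # => ["b"]
p mixed.names    # => ["n"]
```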
307
+ == Syntax dependent options
308
+
309
+ === ONIG_SYNTAX_RUBY
310
+
311
+ [(?m)] dot(.) match newline
312
+
313
+ === ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA
314
+
315
+ [(?s)] dot(.) match newline
316
+ [(?m)] ^ match after newline, $ match before newline
317
+
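Under the Ruby syntax, (?m) changes only the meaning of the dot, while ^ and $ always work per line. A minimal sketch follows, assuming Ruby's built-in Regexp behaves like ORegexp with ONIG_SYNTAX_RUBY.

```ruby
# Built-in Regexp stands in for ORegexp with the Ruby syntax (assumption).
text = "a\nb"

default_dot = text =~ /a.b/     # dot does not match newline by default
inline_m    = text =~ /(?m)a.b/ # (?m) lets dot match newline
flag_m      = text =~ /a.b/m    # same effect with the trailing flag
line_anchor = "x\ny" =~ /^y$/   # ^ and $ match at line boundaries regardless of m

p default_dot # => nil
p inline_m    # => 0
p flag_m      # => 0
p line_anchor # => 2
```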
318
+ == Original extensions
319
+
320
+ * hexadecimal digit char type \h, \H
321
+ * named group (?<name>...)
322
+ * named backref \k<name>
323
+ * subexp call \g<name>, \g<group-num>
324
+
325
+
326
+ == Features lacking compared with Perl 5.8.0
327
+
328
+ * \N{name}
329
+ * \l,\u,\L,\U, \X, \C
330
+ * (?{code})
331
+ * (??{code})
332
+ * (?(condition)yes-pat|no-pat)
333
+ * \Q...\E
334
+
335
+ \Q...\E is effective on ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA.
336
+
337
+
338
+ == Differences with Japanized GNU regex(version 0.12) of Ruby 1.8
339
+
340
+ * add character property (\p{property}, \P{property})
341
+ * add hexadecimal digit char type (\h, \H)
342
+ * add look-behind
343
+
344
+ (?<=fixed-char-length-pattern), (?<!fixed-char-length-pattern)
345
+ * add possessive quantifier. ?+, *+, ++
346
+ * add operations in character class. [], &&
347
+
348
+ ('[' must be escaped to be used as an ordinary char in a character class.)
349
+ * add named group and subexp call.
350
+ * octal or hexadecimal number sequence can be treated as
351
+ a multibyte code char in character class if multibyte encoding
352
+ is specified.
353
+
354
+ (ex. <code>[\xa1\xa2], [\xa1\xa7-\xa4\xa1]</code>)
355
+ * allow the range of single byte char and multibyte char in character
356
+ class.
357
+
358
+ ex. <code>[a-<<any EUC-JP character>>]</code> in EUC-JP encoding.
359
+ * the effect of an isolated option extends to the next ')'.
360
+ ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b).
361
+ * isolated option is not transparent to previous pattern.
362
+ ex. <code>a(?i)*</code> is a syntax error pattern.
363
+ * an incomplete left brace is allowed as an ordinary string.
364
+ ex. /{/, /({)/, /a{2,3/ etc...
365
+ * negative POSIX bracket [:^xxxx:] is supported.
366
+ * POSIX bracket [:ascii:] is added.
367
+ * repeat of look-ahead is not allowed.
368
+ ex. <code>(?=a)*</code>, <code>(?!b){5}</code>
369
+ * Ignore case option is effective for characters specified by number.
370
+ ex. <code>/\x61/i =~ "A"</code>
371
+ * In the range quantifier, the minimum may be omitted.
372
+
373
+ <code>/a{,n}/ == /a{0,n}/</code>
374
+
375
+ Omitting both the minimum and the maximum at the same time
376
+ is not allowed. (/a{,}/)
377
+ * <code>a{n}?</code> is not a non-greedy operator.
378
+ <code>/a{n}?/ == /(?:a{n})?/</code>
379
+ * invalid back reference is checked and causes an error.
380
+ /\1/, /(a)\2/
381
+ * Zero-length match in infinite repeat stops the repeat,
382
+ then changes of the capture group status are checked as stop condition.
383
+ /(?:()|())*\1\2/ =~ ""
384
+ /(?:\1a|())*/ =~ "a"
385
+
386
+
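A few of the differences listed above can be checked directly. The sketch below assumes Ruby's built-in Regexp (Oniguruma-derived since Ruby 1.9) in place of ORegexp and covers only a subset of the list.

```ruby
# Built-in Regexp stands in for ORegexp (assumption).

# The minimum of a range quantifier may be omitted: a{,2} means a{0,2}.
omitted_min = "aaa"[/a{,2}/]

# The ignore case option applies to characters given by numeric code.
case_hex = "A" =~ /\x61/i

# An incomplete left brace is taken as an ordinary character.
literal_brace = "{" =~ Regexp.new('{')

# An invalid back reference is detected when the pattern is compiled.
backref_error = begin
  Regexp.new('(a)\2')
  nil
rescue RegexpError => e
  e.message
end

p omitted_min   # => "aa"
p case_hex      # => 0
p literal_brace # => 0
p backref_error
```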
387
+ == Problems
388
+
389
+ * Invalid encoding byte sequence is not checked in UTF-8.
390
+
391
+ * Invalid first byte is treated as a character.
392
+ /./u =~ "\xa3"
393
+
394
+ * Incomplete byte sequence is not checked.
395
+ /\w+/ =~ "a\xf3\x8ec"
396
+