oniguruma 1.0.1-mswin32
- data/History.txt +49 -0
- data/Manifest.txt +10 -0
- data/README.txt +71 -0
- data/Rakefile +41 -0
- data/Syntax.txt +396 -0
- data/ext/oregexp.c +712 -0
- data/lib/oniguruma.rb +479 -0
- data/test/test_oniguruma.rb +361 -0
- data/win/oregexp.so +0 -0
- metadata +57 -0
data/History.txt
ADDED
@@ -0,0 +1,49 @@
== 1.0.1 / 2007-03-28
* Minimal recommended version of oniglib changed to be compatible with Ruby 1.9, now is 4.6 or higher.
* Restore check for onig version to build with 4.6
* In getting replacement do not create temp string object, but directly add to resulting buffer (performance improvement)
* Included gem support for windows.
* Modified Rakefile to support win32 gems.

== 1.0.0 / 2007-03-27
* Added documentation for MatchData.
* Added ogsub, ogsub!, sub and sub! to ::String.
* Removed ::String definitions from tests.
* Now the minimal recommended version of oniglib is 5.5 or higher.
* Removed ugly #if statements from c code.
* Do not create @named_captures hash if there are no named groups for regexp -- somewhat improves speed for repetitive calls
* Fixed usage of named backreferences in gsub with non-ascii names
* Move ORegexp#=~ to C code, make it work just like Regexp#=~, i.e. set $~. Throw ArgumentError instead of Exception if pattern does not compile
* Fix implementation of ORegexp#===, so it now does not raise errors in case statement anymore
  (resembles plain Ruby Regexp#=== behaviour)
* Modified begin, end and offset methods in MatchData to handle named groups and default to group 0.
* Exception is no longer thrown in oregexp_make_match_data.
* Removed references to MultiMatchData from documentation
* Removed class MultiMatchData
* Fix off by one error in region->num_regs usage
* Fix dumb bug with zero-width matches that made infinite loops. Now consume at least one char in gsub and scan
* ORegexp API changes:
  * Pass only MatchData to sub/gsub with blocks
      oregexp.sub( str ) {|match_data| ... }
      oregexp.gsub( str ) {|match_data| ... }
  * Add ORegexp#scan instead of match_all
      oregexp.scan(str) {|match_data| ... } # => MultiMatchData
  * Friendly way to set options
      ORegexp.new( pattern, options_str, encoding, syntax)
      ORegexp.new('\w+', 'imsx', 'koi8r', 'perl')
  * Named backreferences in substitutions
      ORegexp.new('(?<pre>\w+?)\d+(?<after>\w+)').sub('abc123def', '\<after>123\<pre>') #=> 'def123abc'
* couple of bugfixes with region's num_regs
* some docs for substitution methods added

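The substitution behaviour described in the 1.0.0 notes (named backreferences, blocks that work on match data, and zero-width matches that still advance) can be tried without the gem. The following is only a sketch using Ruby's built-in Regexp (1.9 or later), not ORegexp itself. The constant names are made up for illustration, and the built-in spells named backreferences in replacement strings as <code>\k<name></code> rather than <code>\<name></code>.

```ruby
# Sketch with Ruby's built-in Regexp (assumption: Ruby 1.9+), not ORegexp.
PATTERN = /(?<pre>\w+?)\d+(?<after>\w+)/

# Named backreferences in the replacement string (built-in spelling: \k<name>).
SWAPPED = 'abc123def'.sub(PATTERN, '\k<after>123\k<pre>')

# Block form: the built-in yields the matched text and exposes the MatchData
# through $~, whereas the notes above say ORegexp yields the MatchData itself.
UPCASED = 'abc123def'.sub(PATTERN) { $~[:pre].upcase }

# A zero-width match still has to advance by one character,
# otherwise gsub would loop forever at the same position.
DASHED = 'abc'.gsub(/x*/, '-')

puts SWAPPED  # def123abc
puts UPCASED  # ABC
puts DASHED   # -a-b-c-
```

The reluctant <code>\w+?</code> matters here: a greedy <code>\w+</code> would also swallow leading digits, because <code>\w</code> includes digits.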
== 0.9.1 / 2007-03-25
* FIX: Buggy resolution of numeric codes for encoding and syntax options (Nikolai Lugovoi)
* FIX: Buggy implementation of ORegexp#sub and ORegexp#gsub methods. Now code is all C (Nikolai Lugovoi)
* Added documentation for class ORegexp
* Added regexp syntax documentation.

== 0.9.0 / 2007-03-19

* 1 major enhancement
  * Birthday!

data/Manifest.txt
ADDED
data/README.txt
ADDED
@@ -0,0 +1,71 @@
== ONIGURUMA FOR RUBY:

Ruby bindings to the Oniguruma[http://www.geocities.jp/kosako3/oniguruma/] regular expression library (no need to recompile Ruby).

== FEATURES:

* Increased performance.
* Same interface as the standard Regexp class (easy transition!).
* Support for named groups, look-ahead, look-behind, and other
  cool features!
* Support for other regexp syntaxes (Perl, Python, Java, etc.)

== SYNOPSIS:

  reg = Oniguruma::ORegexp.new( '(?<before>.*)(a)(?<after>.*)' )
  match = reg.match( 'terraforming' )
  puts match[0]         # prints terraforming
  puts match[:before]   # prints terr
  puts match[:after]    # prints forming

== SYNTAX

Consult the Syntax.txt[link:files/Syntax_txt.html] page.

== REQUIREMENTS:

* Oniguruma[http://www.geocities.jp/kosako3/oniguruma/] library v. 5.5 or higher

== INSTALL:

  sudo gem install -r oniguruma

== BUGS/PROBLEMS/INCOMPATIBILITIES:

* <code>ORegexp#~</code> is not implemented.
* <code>ORegexp#kcode</code> results are not compatible with <code>Regexp</code>.
* <code>ORegexp</code> options set in the string are not visible, this affects
  <code>ORegexp#options</code>, <code>ORegexp#to_s</code>, <code>ORegexp#inspect</code>
  and <code>ORegexp#==</code>.

== TODO:

* Complete documentation (methods, oniguruma syntax).

== CREDITS:

* N. Lugovoi. ORegexp.sub and ORegexp.gsub code and lots of other stuff.
* K. Kosako. For his great library.
* A lot of the documentation has been copied from the original Ruby Regexp documentation.

== LICENSE:

New BSD License

Copyright (c) 2007, Dizan Vasquez
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the
  documentation and/or other materials provided with the distribution.
* Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this
  software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/Rakefile
ADDED
@@ -0,0 +1,41 @@
require 'rubygems'
require 'hoe'

class Hoe
  # Dirty hack to eliminate Hoe from gem dependencies
  def extra_deps
    @extra_deps.reject { |x| Array(x).first == 'hoe' }
  end

  # Dirty hack to package only the required files per platform.
  # Uses reject rather than reject!, because reject! returns nil
  # when nothing is removed and would then wipe the file list.
  def spec= s
    if ENV['PLATFORM'] =~ /win32/
      s.files = s.files.reject {|f| f =~ /extconf\.rb/}
    else
      s.files = s.files.reject {|f| f =~ /win\//}
    end
    @spec = s
  end
end

# The version is taken from the first "== x.y.z" heading in History.txt
version = /^== *(\d+\.\d+\.\d+)/.match( File.read( 'History.txt' ) )[1]

Hoe.new('oniguruma', version) do |p|
  p.rubyforge_name = 'oniguruma'
  p.author = ['Dizan Vasquez', 'Nikolai Lugovoi']
  p.email = 'dichodaemon@gmail.com'
  p.summary = 'Bindings for the oniguruma regular expression library'
  # Double quotes so that \n\n is a real paragraph break
  p.description = p.paragraphs_of('README.txt', 1 ).join("\n\n")
  p.url = 'http://oniguruma.rubyforge.org'
  if ENV['PLATFORM'] =~ /win32/
    p.lib_files = ["win/oregexp.so"]
    p.spec_extras[:require_paths] = ["win", "lib", "ext" ]
    p.spec_extras[:platform] = Gem::Platform::WIN32
  else
    p.spec_extras[:extensions] = ["ext/extconf.rb"]
  end
  p.rdoc_pattern = /^(lib|bin|ext)|txt$/
  p.changes = p.paragraphs_of('History.txt', 0).join("\n\n")
end

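Two details of the Rakefile are easy to check in isolation: how the version is read from the first heading of History.txt, and why the result of <code>reject!</code> should not be assigned. The sketch below is illustrative only. <code>latest_version</code>, <code>SAMPLE_HISTORY</code> and the file list are hypothetical names and data, not part of the gem.

```ruby
# Hypothetical helper mirroring the Rakefile's version lookup.
def latest_version(history_text)
  match = /^== *(\d+\.\d+\.\d+)/.match(history_text)
  match && match[1]
end

SAMPLE_HISTORY = "== 1.0.1 / 2007-03-28\n* Included gem support for windows.\n\n" \
                 "== 1.0.0 / 2007-03-27\n* Added documentation for MatchData.\n"

# reject always returns an array; reject! returns nil when nothing was removed,
# so assigning its result can replace a file list with nil.
FILES     = ['lib/oniguruma.rb', 'ext/oregexp.c']
KEPT      = FILES.reject { |f| f =~ /win\// }
BANG_KEPT = FILES.dup.reject! { |f| f =~ /win\// }

puts latest_version(SAMPLE_HISTORY)  # 1.0.1
p KEPT                               # ["lib/oniguruma.rb", "ext/oregexp.c"]
p BANG_KEPT                          # nil
```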
data/Syntax.txt
ADDED
@@ -0,0 +1,396 @@
= RUBY REGULAR EXPRESSION SYNTAX


== Syntax Elements

[\] escape (enable or disable meta character meaning)
[|] alternation
[(...)] group
[[...]] character class


== Characters

[\t] horizontal tab (0x09)
[\v] vertical tab (0x0B)
[\n] newline (0x0A)
[\r] return (0x0D)
[\b] back space (0x08)

     \b is effective in character class [...] only
[\f] form feed (0x0C)
[\a] bell (0x07)
[\e] escape (0x1B)
[\nnn] octal char (encoded byte value)
[\xHH] hexadecimal char (encoded byte value)
[\x{7HHHHHHH}] wide hexadecimal char (character code point value)
[\cx] control char (character code point value)
[\C-x] control char (character code point value)
[\M-x] meta (x|0x80) (character code point value)
[\M-\C-x] meta control char (character code point value)



== Character types

[.] any character (except newline)
[\w] word character

     Not Unicode:
     * alphanumeric, "_" and multibyte char.
     Unicode:
     * General_Category -- (Letter|Mark|Number|Connector_Punctuation)
[\W] non word char
[\s] whitespace char

     Not Unicode:
     * \t, \n, \v, \f, \r, \x20
     Unicode:
     * 0009, 000A, 000B, 000C, 000D, 0085(NEL),
     * General_Category:
     * -- Line_Separator
     * -- Paragraph_Separator
     * -- Space_Separator
[\S] non whitespace char
[\d] decimal digit char

     Unicode: General_Category -- Decimal_Number
[\D] non decimal digit char
[\h] hexadecimal digit char [0-9a-fA-F]
[\H] non hexadecimal digit char


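A few of the character types above, tried with Ruby's built-in Regexp as a stand-in (an assumption: Ruby 1.9 and later ship an Oniguruma-derived engine, so these behave the same there, but this is not ORegexp). The constant names are illustrative only.

```ruby
# \h matches one hexadecimal digit.
HEX_RUNS = 'x=1F, y=zz'.scan(/\h+/)

# \w covers letters, digits and "_", so "-" splits the words.
WORDS = 'foo_1 bar-2'.scan(/\w+/)

# "." does not match a newline by default.
FIRST_LINE = "a1\nb2"[/.+/]

# Inside a character class, \b means backspace (0x08), not a word boundary.
BACKSPACE_AT = "ab\bc" =~ /[\b]/

p HEX_RUNS      # ["1F"]
p WORDS         # ["foo_1", "bar", "2"]
p FIRST_LINE    # "a1"
p BACKSPACE_AT  # 2
```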
== Character Properties

  \p{property-name}
  \p{^property-name} (negative)
  \P{property-name} (negative)

=== property-name:

Works on all encodings:
* Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
  Print, Punct, Space, Upper, XDigit, Word, ASCII
Works on EUC_JP, Shift_JIS:
* Hiragana, Katakana
Works on UTF8, UTF16, UTF32:
* Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu,
  M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps,
  S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs,
  Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese,
  Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic,
  Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian,
  Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul,
  Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
  Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
  Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian,
  Oriya, Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac,
  Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan,
  Tifinagh, Ugaritic, Yi

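The three property forms can be compared on a short UTF-8 string. This is a sketch with Ruby's built-in Regexp (assuming a UTF-8 source file and a Ruby version with Unicode property support), not ORegexp. The sample text and constant names are made up.

```ruby
# "Ruby 1.9 " followed by a Cyrillic word, written with escapes to stay ASCII-only.
TEXT = "Ruby 1.9 \u0440\u0443\u0431\u0438\u043d"

LETTER_RUNS   = TEXT.scan(/\p{Alpha}+/)     # runs of alphabetic characters
CYRILLIC_RUNS = TEXT.scan(/\p{Cyrillic}+/)  # script property (Unicode encodings only)
NEGATED_CARET = TEXT.scan(/\p{^Alpha}+/)    # negation with ^
NEGATED_UPPER = TEXT.scan(/\P{Alpha}+/)     # negation with capital P

p LETTER_RUNS.length   # 2
p CYRILLIC_RUNS.length # 1
p NEGATED_CARET        # [" 1.9 "]
p NEGATED_UPPER        # [" 1.9 "]
```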
== Quantifiers

=== Greedy

[?] 1 or 0 times
[*] 0 or more times
[+] 1 or more times
[{n,m}] at least n but not more than m times
[{n,}] at least n times
[{,n}] at least 0 but not more than n times ({0,n})
[{n}] n times

=== Reluctant

[??] 1 or 0 times
[*?] 0 or more times
[+?] 1 or more times
[{n,m}?] at least n but not more than m times
[{n,}?] at least n times
[{,n}?] at least 0 but not more than n times (== {0,n}?)

=== Possessive (greedy and does not backtrack after repeated)

[?+] 1 or 0 times
[*+] 0 or more times
[++] 1 or more times

({n,m}+, {n,}+, {n}+ are possessive op. in ONIG_SYNTAX_JAVA only)


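The difference between the three quantifier families is easiest to see on the same input. A minimal sketch with Ruby's built-in Regexp (used here as a stand-in for ORegexp, which is an assumption); the names are illustrative.

```ruby
SUBJECT = '<b>bold</b>'

GREEDY    = SUBJECT[/<.+>/]   # takes as much as possible
RELUCTANT = SUBJECT[/<.+?>/]  # takes as little as possible

# Possessive: a*+ consumes every "a" and never gives one back,
# so the trailing "a" can no longer match.
POSSESSIVE   = 'aaa'[/a*+a/]
BACKTRACKING = 'aaa'[/a*a/]

# {,n} is shorthand for {0,n}.
UP_TO_TWO = 'aaaa'[/a{,2}/]

p GREEDY        # "<b>bold</b>"
p RELUCTANT     # "<b>"
p POSSESSIVE    # nil
p BACKTRACKING  # "aaa"
p UP_TO_TWO     # "aa"
```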
== Anchors

[^] beginning of the line
[$] end of the line
[\b] word boundary
[\B] not word boundary
[\A] beginning of string
[\Z] end of string, or before newline at the end
[\z] end of string
[\G] matching start position


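Line anchors and string anchors differ only on multi-line input, and <code>\Z</code> differs from <code>\z</code> only when the string ends in a newline. A short sketch with Ruby's built-in Regexp (a stand-in for ORegexp; names are illustrative):

```ruby
LINES = "first\nsecond\n"

LINE_STARTS  = LINES.scan(/^\w+/)   # ^ matches at the start of every line
STRING_START = LINES.scan(/\A\w+/)  # \A matches only at the start of the string

BEFORE_FINAL_NEWLINE = LINES =~ /second\Z/  # \Z tolerates one trailing newline
AT_VERY_END          = LINES =~ /second\z/  # \z does not

WHOLE_WORDS = 'cat concat'.scan(/\bcat\b/)  # \b keeps "concat" out

p LINE_STARTS           # ["first", "second"]
p STRING_START          # ["first"]
p BEFORE_FINAL_NEWLINE  # 6
p AT_VERY_END           # nil
p WHOLE_WORDS           # ["cat"]
```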
== Character class

[^...] negative class (lowest precedence operator)
[x-y] range from x to y
[[...]] set (character class in character class)
[..&&..] intersection (low precedence at the next of ^)

If you want to use '[', '-', ']' as a normal character
in a character class, you should escape these characters by '\'.


POSIX bracket ([:xxxxx:], negate [:^xxxxx:])

=== Not Unicode Case:

[alnum] alphabet or digit char
[alpha] alphabet
[ascii] code value: [0 - 127]
[blank] \t, \x20
[cntrl] control
[digit] 0-9
[graph] include all of multibyte encoded characters
[lower] lower case
[print] include all of multibyte encoded characters
[punct] punctuation
[space] \t, \n, \v, \f, \r, \x20
[upper] upper case
[xdigit] 0-9, a-f, A-F
[word] alphanumeric, "_" and multibyte characters


=== Unicode Case:

[alnum] Letter | Mark | Decimal_Number
[alpha] Letter | Mark
[ascii] 0000 - 007F
[blank] Space_Separator | 0009
[cntrl] Control | Format | Unassigned | Private_Use | Surrogate
[digit] Decimal_Number
[graph] [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
[lower] Lowercase_Letter
[print] [[:graph:]] | [[:space:]]
[punct] Connector_Punctuation | Dash_Punctuation | Close_Punctuation |
        Final_Punctuation | Initial_Punctuation | Other_Punctuation |
        Open_Punctuation
[space] Space_Separator | Line_Separator | Paragraph_Separator |
        0009 | 000A | 000B | 000C | 000D | 0085
[upper] Uppercase_Letter
[xdigit] 0030 - 0039 | 0041 - 0046 | 0061 - 0066
         (0-9, a-f, A-F)
[word] Letter | Mark | Decimal_Number | Connector_Punctuation



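The character class operators above (nested sets, intersection, POSIX brackets and escaping) can be exercised as follows. This is a sketch with Ruby's built-in Regexp, assumed to behave like ORegexp for these constructs; the names are illustrative.

```ruby
# Intersection: lowercase letters that are not vowels.
CONSONANTS = 'regexp'.scan(/[a-z&&[^aeiou]]/)

# A class nested inside a class acts as a union.
NESTED = 'a1-b2'.scan(/[a-c[0-9]]/)

# POSIX brackets, plain and negated.
DIGIT_RUNS = 'tel: 555-0100'.scan(/[[:digit:]]+/)
NON_SPACE  = 'a b'.scan(/[[:^space:]]/)

# '[', ']' and '-' are escaped to be taken literally.
LITERALS = 'x[y]-z'.scan(/[\[\]\-]/)

p CONSONANTS  # ["r", "g", "x", "p"]
p NESTED      # ["a", "1", "b", "2"]
p DIGIT_RUNS  # ["555", "0100"]
p NON_SPACE   # ["a", "b"]
p LITERALS    # ["[", "]", "-"]
```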
== Extended groups

[(?#...)] comment
[(?imx-imx)] option on/off:
             * i: ignore case
             * m: multi-line (dot(.) match newline)
             * x: extended form
[(?imx-imx:subexp)] option on/off for subexp
[(?:subexp)] not captured group
[(subexp)] captured group
[(?=subexp)] look-ahead
[(?!subexp)] negative look-ahead
[(?<=subexp)] look-behind
[(?<!subexp)] negative look-behind

              Subexp of look-behind must be fixed character length.
              But different character length is allowed in top level
              alternatives only.
              ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.

              In negative-look-behind, captured group isn't allowed,
              but shy group(?:) is allowed.
[(?>subexp)] atomic group
             don't backtrack in subexp.
[(?<name>subexp)] define named group
                  (All characters of the name must be a word character.)

                  Not only a name but a number is assigned like a captured
                  group.

                  Assigning the same name as two or more subexps is allowed.
                  In this case, a subexp call can not be performed although
                  the back reference is possible.


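The look-around, atomic and named groups listed above can be checked with a handful of one-liners. The sketch uses Ruby's built-in Regexp as a stand-in for ORegexp (an assumption), and the sample strings and constant names are made up.

```ruby
PRICES = 'cost $30, tax $4, id 77'

AFTER_DOLLAR     = PRICES.scan(/(?<=\$)\d+/)       # look-behind
NOT_AFTER_DOLLAR = PRICES.scan(/(?<![\$\d])\d+/)   # negative look-behind
BEFORE_BAR       = 'foobar foobaz'.scan(/foo(?=bar)/)      # look-ahead
NOT_BEFORE_BAR   = 'foobar foobaz'.scan(/foo(?!bar)\w+/)   # negative look-ahead

# Atomic group: a+ keeps all three a's, so "ab" cannot follow.
ATOMIC_MISS = 'aaab'[/(?>a+)ab/]
PLAIN_HIT   = 'aaab'[/a+ab/]

# Named groups are reachable by name and by number.
DATE = /(?<year>\d{4})-(?<month>\d\d)/.match('released 2007-03')

# Option switched on for a sub-expression only.
MIXED_CASE = 'ABC abc'.scan(/(?i:abc)/)

p AFTER_DOLLAR      # ["30", "4"]
p NOT_AFTER_DOLLAR  # ["77"]
p BEFORE_BAR        # ["foo"]
p NOT_BEFORE_BAR    # ["foobaz"]
p ATOMIC_MISS       # nil
p PLAIN_HIT         # "aaab"
p DATE[:year]       # "2007"
p DATE[1]           # "2007"
p MIXED_CASE        # ["ABC", "abc"]
```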
== Back reference

[\n] back reference by group number (n >= 1)
[\k<name>] back reference by group name
           In the back reference by the multiplex definition name,
           a subexp with a large number is referred to preferentially.
           (When not matched, a group of the small number is referred to.)

* Back reference by group number is forbidden if named group is defined
  in the pattern and ONIG_OPTION_CAPTURE_GROUP is not set.


=== Back reference with nest level

[\k<name+n>] n: 0, 1, 2, ...
[\k<name-n>] n: 0, 1, 2, ...

Designates the nest level relative to the back reference position.

Examples:
  /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/.match("reer")

  r = ORegexp.compile(<<'__REGEXP__'.strip, :options => Oniguruma::EXTENDED)
  (?<element> \g<stag> \g<content>* \g<etag> ){0}
  (?<stag> < \g<name> \s* > ){0}
  (?<name> [a-zA-Z_:]+ ){0}
  (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
  (?<etag> </ \k<name+1> >){0}
  \g<element>
  __REGEXP__

  p r.match('<foo>f<bar>bbb</bar>f</foo>').captures



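A named back reference and the nest-level form can be tried with Ruby's built-in Regexp, which is assumed here to accept the same <code>\k<name+n></code> syntax as ORegexp. <code>palindrome?</code> is a hypothetical helper wrapped around the first example above.

```ruby
# Plain named back reference: find a doubled word.
DOUBLED = 'this is is a test'[/\b(?<word>\w+) \k<word>\b/]

# Nest-level back reference: \k<b+0> refers to the <b> captured at the
# same recursion depth, so each level closes with the character it opened with.
PALINDROME = /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/

def palindrome?(text)
  !(PALINDROME =~ text).nil?
end

p DOUBLED               # "is is"
p palindrome?('reer')   # true
p palindrome?('level')  # true
p palindrome?('reef')
```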
=== Subexp call ("Tanaka Akira special")

[\g<name>] call by group name
[\g<n>] call by group number (n >= 1)

* left-most recursive call is not allowed.

  Example:
    (?<name>a|\g<name>b) => error
    (?<name>a|b\g<name>c) => OK
* Call by group number is forbidden if named group is defined in the pattern
  and Oniguruma::OPTION_CAPTURE_GROUP is not set.
* If the option status of called group is different from calling position
  then the group's option is effective.

  Example:
    (?-i:\g<name>)(?i:(?<name>a)){0} <i>matches "A"</i>


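A subexp call lets a group refer to itself, which is the usual way to match nested structures. The sketch below uses Ruby's built-in Regexp as a stand-in for ORegexp (an assumption); <code>balanced?</code> and the constants are hypothetical names. The left-recursive pattern is compiled inside a rescue block so the script runs whichever way the engine decides.

```ruby
# A parenthesised group that may contain further parenthesised groups.
BALANCED = /\A(?<paren>\((?:[^()]|\g<paren>)*\))\z/

def balanced?(text)
  !(BALANCED =~ text).nil?
end

# Left-most recursion, as in the "=> error" example above. The message (if any)
# is captured rather than assumed.
LEFT_RECURSION_MESSAGE =
  begin
    Regexp.new('(?<name>a|\g<name>b)')
    nil
  rescue RegexpError => e
    e.message
  end

p balanced?('(a(b)c)')  # true
p balanced?('(a(b)c')   # false
p LEFT_RECURSION_MESSAGE
```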
== Captured group

Behavior of the no-named group (...) changes with the following conditions.
(But named group is not changed.)

[case 1] <code>ORegexp.new( '...' )</code> (named group is not used, no option)

         (...) is treated as a captured group.
[case 2] <code>ORegexp.new( '...', :options => OPTION_DONT_CAPTURE_GROUP )</code> (named group is not used, 'g' option)

         (...) is treated as a no-captured group (?:...).

[case 3] <code>ORegexp.new( '...(?<name>...)...' )</code> (named group is used, no option)

         (...) is treated as a no-captured group (?:...).

         numbered-backref/call is not allowed.

[case 4] <code>ORegexp.new( '...(?<name>...)...', :options => OPTION_CAPTURE_GROUP )</code> (named group is used, 'G' option)

         (...) is treated as a captured group.

         numbered-backref/call is allowed.

where
* g: OPTION_DONT_CAPTURE_GROUP
* G: OPTION_CAPTURE_GROUP

('g' and 'G' options are discussed in ruby-dev ML)


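Cases 1 and 3 can be observed with Ruby's built-in Regexp, which follows the same rule (this sketch assumes Ruby 1.9 or later and does not use ORegexp; the 'g' and 'G' options have no built-in equivalent and are not shown). The constant names are illustrative.

```ruby
# Case 1: no named group, so every (...) captures.
PLAIN = /(\d+)-(\d+)/.match('10-20').captures

# Case 3: a named group is present, so the plain (...) no longer captures.
MIXED = /(\d+)-(?<last>\d+)/.match('10-20').captures

# Case 3 also forbids numbered back references next to named groups.
NUMBERED_BACKREF_MESSAGE =
  begin
    Regexp.new('(?<x>a)\1')
    nil
  rescue RegexpError => e
    e.message
  end

p PLAIN  # ["10", "20"]
p MIXED  # ["20"]
p NUMBERED_BACKREF_MESSAGE
```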
== Syntax dependent options

=== ONIG_SYNTAX_RUBY

[(?m)] dot(.) match newline

=== ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA

[(?s)] dot(.) match newline
[(?m)] ^ match after newline, $ match before newline

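Under the Ruby syntax, <code>(?m)</code> only changes what the dot matches, while <code>^</code> and <code>$</code> are line-based regardless. A small sketch with Ruby's built-in Regexp (Ruby syntax only; the Perl and Java syntaxes are not reachable from the built-in class). The names are illustrative.

```ruby
TWO_LINES = "one\ntwo"

DEFAULT_DOT   = TWO_LINES[/one.two/]      # "." stops at the newline
MULTILINE_DOT = TWO_LINES[/(?m)one.two/]  # (?m) lets "." match it

# ^ already matches after a newline without any option.
SECOND_LINE = TWO_LINES[/^two/]

p DEFAULT_DOT    # nil
p MULTILINE_DOT  # "one\ntwo"
p SECOND_LINE    # "two"
```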
== Original extensions

* hexadecimal digit char type \h, \H
* named group (?<name>...)
* named backref \k<name>
* subexp call \g<name>, \g<group-num>


== Lacking features compared with Perl 5.8.0

* \N{name}
* \l,\u,\L,\U, \X, \C
* (?{code})
* (??{code})
* (?(condition)yes-pat|no-pat)
* \Q...\E

  \Q...\E is effective on ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA.


== Differences with Japanized GNU regex(version 0.12) of Ruby 1.8

* add character property (\p{property}, \P{property})
* add hexadecimal digit char type (\h, \H)
* add look-behind

  (?<=fixed-char-length-pattern), (?<!fixed-char-length-pattern)
* add possessive quantifier. ?+, *+, ++
* add operations in character class. [], &&

  ('[' must be escaped as a usual char in character class.)
* add named group and subexp call.
* octal or hexadecimal number sequence can be treated as
  a multibyte code char in character class if multibyte encoding
  is specified.

  (ex. <code>[\xa1\xa2], [\xa1\xa7-\xa4\xa1]</code>)
* allow the range of single byte char and multibyte char in character
  class.

  ex. <code>[a-<<any EUC-JP character>>]</code> in EUC-JP encoding.
* effect range of isolated option is to next ')'.
  ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b).
* isolated option is not transparent to previous pattern.
  ex. <code>a(?i)*</code> is a syntax error pattern.
* allow an incomplete left brace as a usual string.
  ex. /{/, /({)/, /a{2,3/ etc...
* negative POSIX bracket [:^xxxx:] is supported.
* POSIX bracket [:ascii:] is added.
* repeat of look-ahead is not allowed.
  ex. <code>(?=a)*</code>, <code>(?!b){5}</code>
* Ignore case option is effective to numbered character.
  ex. <code>/\x61/i =~ "A"</code>
* In the range quantifier, the minimum may be omitted.

  <code>/a{,n}/ == /a{0,n}/</code>

  The simultaneous abbreviation of the number of times of the minimum
  and the maximum is not allowed. (/a{,}/)
* <code>a{n}?</code> is not a non-greedy operator.
  <code>/a{n}?/ == /(?:a{n})?/</code>
* invalid back reference is checked and causes an error.
  /\1/, /(a)\2/
* Zero-length match in infinite repeat stops the repeat,
  then changes of the capture group status are checked as stop condition.
  /(?:()|())*\1\2/ =~ ""
  /(?:\1a|())*/ =~ "a"


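Several of the differences above are compile-time checks, so they can be probed by compiling a pattern and seeing whether an error comes back. The sketch uses Ruby's built-in Regexp, assumed to follow the same rules as ORegexp. <code>compile_error</code> is a hypothetical helper, and the last two results are printed rather than asserted because they depend on the engine version.

```ruby
# Hypothetical helper: returns nil when the pattern compiles, else the error message.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

# Ignore-case applies to a character written as a numeric escape.
CASELESS_ESCAPE = (/\x61/i =~ 'A')

# An invalid back reference is rejected when the pattern is compiled.
INVALID_BACKREF = compile_error('(a)\2')

# Engine-dependent probes, shown without an expected result.
REPEATED_LOOKAHEAD = compile_error('(?=a)*')
OPEN_BRACE         = compile_error('a{2,3')

p CASELESS_ESCAPE  # 0
p INVALID_BACKREF
p REPEATED_LOOKAHEAD
p OPEN_BRACE
```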
== Problems

* Invalid encoding byte sequence is not checked in UTF-8.

  * Invalid first byte is treated as a character.
    /./u =~ "\xa3"

  * Incomplete byte sequence is not checked.
    /\w+/ =~ "a\xf3\x8ec"
