re 0.0.4 → 0.0.5

Files changed (5)
  1. data/README.rdoc +71 -7
  2. data/Rakefile +31 -3
  3. data/lib/re.rb +129 -38
  4. data/test/re_test.rb +58 -2
  5. metadata +2 -2
data/README.rdoc CHANGED
@@ -1,5 +1,5 @@
 
- = Regular Expression Construction.
+ = Regular Expression Construction
 
  Complex regular expressions are hard to construct and even harder to
  read. The Re library allows users to construct complex regular
@@ -8,7 +8,7 @@ following regular expression that will parse dates:
 
    /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
 
- Using the Re library, That regular expression can be built
+ Using the Re library, that regular expression can be built
  incrementally from smaller, easier to understand expressions.
  Perhaps something like this:
 
@@ -38,9 +38,11 @@ groups can be retrieved by name:
    result.data(:month) # => "01"
    result.data(:day) # => "23"
 
- == Version: 0.0.4
+ == Version
 
- == Usage:
+ This document describes Re version 0.0.5.
+
+ == Usage
 
    include Re
 
@@ -51,7 +53,7 @@ groups can be retrieved by name:
      puts "No Match"
    end
 
- == Examples:
+ == Examples
 
    re("a") -- matches "a"
    re("a") + re("b") -- matches "ab"
@@ -83,9 +85,70 @@ and character class functions.
 
  See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
 
- == License and Copyright:
+ == Performance
+
+ We should say a word or two about performance.
+
+ First of all, building regular expressions using Re is slow. If you
+ use Re to build regular expressions, you are encouraged to build the
+ regular expression once and reuse it as needed. This means you
+ won't do a lot of inline expressions using Re, but rather assign the
+ generated Re regular expression to a constant. For example:
+
+   PHONE_RE = re.digit.repeat(3).capture(:area) +
+              re("-") +
+              re.digit.repeat(3).capture(:exchange) +
+              re("-") +
+              re.digit.repeat(4).capture(:subscriber)
+
+ Alternatively, you can arrange for the regular expression to be
+ constructed only when actually needed. Something like:
+
+   def phone_re
+     @phone_re ||= re.digit.repeat(3).capture(:area) +
+                   re("-") +
+                   re.digit.repeat(3).capture(:exchange) +
+                   re("-") +
+                   re.digit.repeat(4).capture(:subscriber)
+   end
+
+ That method constructs the phone number regular expression once and
+ returns a cached value thereafter. Just make sure you put the
+ method in an object that is instantiated once (e.g. a class method).
+
+ When used in matching, Re regular expressions perform fairly well
+ compared to native regular expressions. The overhead is a small
+ number of extra method calls and the creation of a Re::Result object
+ to return the match results.
+
+ If regular expression performance is at a premium in your application,
+ then you can still use Re to construct the regular expression and
+ extract the raw Ruby Regexp object to be used for the actual
+ matching. You lose the ability to use named capture groups easily,
+ but you get raw Ruby regular expression matching performance.
+
+ For example, if you wanted to use the raw regular expression from
+ PHONE_RE defined above, you could extract the regular expression
+ like this:
+
+   PHONE_REGEXP = PHONE_RE.regexp
+
+ And then use it directly:
+
+   if PHONE_REGEXP =~ string
+     # blah blah blah
+   end
+
+ The above match runs at full Ruby matching speed. If you still
+ want named capture groups, you can do something like this:
+
+   match_data = PHONE_REGEXP.match(string)
+   area_code = match_data[PHONE_RE.name_map[:area]]
+
+ == License and Copyright
 
- Copyright 2009 by Jim Weirich (jim.weirich@gmail.com)
+ Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
+ All rights reserved.
 
  Re is provided under the MIT open source license (see MIT-LICENSE)
 
@@ -94,6 +157,7 @@ Re is provided under the MIT open source license (see MIT-LICENSE)
  Documentation :: http://re-lib.rubyforge.org
  Source :: http://github.com/jimweirich/re
  GemCutter :: http://gemcutter.org/gems/re
+ Download :: http://rubyforge.org/frs/?group_id=9329
  Bug Tracker :: http://www.pivotaltracker.com/projects/47758
  Author :: jim.weirich@gmail.com
 
data/Rakefile CHANGED
@@ -14,8 +14,36 @@ Rake::TestTask.new(:test) do |t|
    t.test_files = FileList['test/*_test.rb']
  end
 
- task :release => [:check_non_beta, :readme, :gem, "publish:rdoc"]
+ namespace "release" do
+   task :new => [
+     :readme,
+     :check_non_beta,
+     :check_all_committed,
+     :tag_version,
+     :gem,
+     "publish:rdoc"
+   ]
+
+   task :check_all_committed do
+     status = `git status`
+     unless status =~ /nothing to commit/
+       fail "Outstanding Git Changes:\n#{status}"
+     end
+   end
+
+   task :commit_new_version do
+     sh "git commit -m 'bumped to version #{Re::VERSION}'"
+   end
+
+   task :not_already_tagged
 
- task :check_non_beta do
-   fail "Must not be a beta version! Version is #{Re::VERSION}" if Re::Version::BETA
+   task :tag_version => :not_already_tagged do
+     sh "git tag re-#{Re::VERSION}"
+     sh "git push --tags"
+   end
+
+   task :check_non_beta do
+     fail "Must not be a beta version! Version is #{Re::VERSION}" if Re::Version::BETA
+   end
  end
+ task :release => "release:new"
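Review note: `task :not_already_tagged` is declared without a body in this hunk, so `release:tag_version` has no guard yet. The sketch below shows how that guard and the existing `check_all_committed` test could be written as plain functions that take command output as strings. The helper names `clean_worktree?` and `already_tagged?` are hypothetical and do not appear in the Rakefile; the `re-<version>` tag format is taken from `tag_version` above.

```ruby
# Hypothetical helpers; neither exists in the Rakefile in this diff.

# True when `git status` output reports nothing left to commit,
# mirroring the check in release:check_all_committed.
def clean_worktree?(status_output)
  !!(status_output =~ /nothing to commit/)
end

# True when the output of `git tag` already lists the tag that
# release:tag_version would create (re-<version>).
def already_tagged?(tag_list_output, version)
  tag_list_output.split("\n").map(&:strip).include?("re-#{version}")
end

clean_worktree?("On branch master\nnothing to commit\n")  # => true
already_tagged?("re-0.0.3\nre-0.0.4\n", "0.0.5")           # => false
```

Keeping the checks free of shell calls makes them easy to test; a task body would only need to pass in the output of `git status` or `git tag`.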
data/lib/re.rb CHANGED
@@ -1,6 +1,6 @@
  #!/usr/bin/ruby -wKU
  #
- # = Regular Expression Construction.
+ # = Regular Expression Construction
  #
  # Complex regular expressions are hard to construct and even harder to
  # read. The Re library allows users to construct complex regular
@@ -9,7 +9,7 @@
  #
  #   /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
  #
- # Using the Re library, That regular expression can be built
+ # Using the Re library, that regular expression can be built
  # incrementally from smaller, easier to understand expressions.
  # Perhaps something like this:
  #
@@ -39,7 +39,7 @@
  #   result.data(:month) # => "01"
  #   result.data(:day) # => "23"
  #
- # == Usage:
+ # == Usage
  #
  #   include Re
  #
@@ -50,7 +50,7 @@
  #     puts "No Match"
  #   end
  #
- # == Examples:
+ # == Examples
  #
  #   re("a") -- matches "a"
  #   re("a") + re("b") -- matches "ab"
@@ -82,9 +82,70 @@
  #
  # See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
  #
- # == License and Copyright:
+ # == Performance
  #
- # Copyright 2009 by Jim Weirich (jim.weirich@gmail.com)
+ # We should say a word or two about performance.
+ #
+ # First of all, building regular expressions using Re is slow. If you
+ # use Re to build regular expressions, you are encouraged to build the
+ # regular expression once and reuse it as needed. This means you
+ # won't do a lot of inline expressions using Re, but rather assign the
+ # generated Re regular expression to a constant. For example:
+ #
+ #   PHONE_RE = re.digit.repeat(3).capture(:area) +
+ #              re("-") +
+ #              re.digit.repeat(3).capture(:exchange) +
+ #              re("-") +
+ #              re.digit.repeat(4).capture(:subscriber)
+ #
+ # Alternatively, you can arrange for the regular expression to be
+ # constructed only when actually needed. Something like:
+ #
+ #   def phone_re
+ #     @phone_re ||= re.digit.repeat(3).capture(:area) +
+ #                   re("-") +
+ #                   re.digit.repeat(3).capture(:exchange) +
+ #                   re("-") +
+ #                   re.digit.repeat(4).capture(:subscriber)
+ #   end
+ #
+ # That method constructs the phone number regular expression once and
+ # returns a cached value thereafter. Just make sure you put the
+ # method in an object that is instantiated once (e.g. a class method).
+ #
+ # When used in matching, Re regular expressions perform fairly well
+ # compared to native regular expressions. The overhead is a small
+ # number of extra method calls and the creation of a Re::Result object
+ # to return the match results.
+ #
+ # If regular expression performance is at a premium in your application,
+ # then you can still use Re to construct the regular expression and
+ # extract the raw Ruby Regexp object to be used for the actual
+ # matching. You lose the ability to use named capture groups easily,
+ # but you get raw Ruby regular expression matching performance.
+ #
+ # For example, if you wanted to use the raw regular expression from
+ # PHONE_RE defined above, you could extract the regular expression
+ # like this:
+ #
+ #   PHONE_REGEXP = PHONE_RE.regexp
+ #
+ # And then use it directly:
+ #
+ #   if PHONE_REGEXP =~ string
+ #     # blah blah blah
+ #   end
+ #
+ # The above match runs at full Ruby matching speed. If you still
+ # want named capture groups, you can do something like this:
+ #
+ #   match_data = PHONE_REGEXP.match(string)
+ #   area_code = match_data[PHONE_RE.name_map[:area]]
+ #
+ # == License and Copyright
+ #
+ # Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
+ # All rights reserved.
  #
  # Re is provided under the MIT open source license (see MIT-LICENSE)
  #
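Review note: the last documentation example above indexes raw `MatchData` through `name_map`. The lookup can be illustrated without the Re library. In the sketch below the pattern and the map are written by hand as stand-ins for `PHONE_RE.regexp` and `PHONE_RE.name_map`; both are assumptions about what Re would produce, and `phone_part` is a hypothetical helper.

```ruby
# Hand-written stand-ins for PHONE_RE.regexp and PHONE_RE.name_map.
# The pattern is an assumption, not output captured from Re.
PHONE_REGEXP   = /(\d{3})-(\d{3})-(\d{4})/
PHONE_NAME_MAP = { :area => 1, :exchange => 2, :subscriber => 3 }

# Look up a named group in raw MatchData via the name-to-index map.
# Returns nil when the string does not match at all.
def phone_part(string, name)
  match_data = PHONE_REGEXP.match(string)
  match_data && match_data[PHONE_NAME_MAP[name]]
end

phone_part("513-555-1234", :area)        # => "513"
phone_part("513-555-1234", :subscriber)  # => "1234"
phone_part("not a phone", :area)         # => nil
```

Unlike the two-line example in the documentation, this version guards against a failed match, since `Regexp#match` returns nil in that case.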
@@ -93,6 +154,7 @@
  # Documentation :: http://re-lib.rubyforge.org
  # Source :: http://github.com/jimweirich/re
  # GemCutter :: http://gemcutter.org/gems/re
+ # Download :: http://rubyforge.org/frs/?group_id=9329
  # Bug Tracker :: http://www.pivotaltracker.com/projects/47758
  # Author :: jim.weirich@gmail.com
  #
@@ -102,7 +164,7 @@ module Re
      NUMBERS = [
        MAJOR = 0,
        MINOR = 0,
-       BUILD = 4,
+       BUILD = 5,
        BETA = nil,
      ].compact
    end
@@ -125,8 +187,15 @@ module Re
 
      # Return the text of the named capture data.
      def [](name)
-       index = @rexp.capture_keys.index(name)
-       index ? @match_data[index+1] : nil
+       index = name_map[name]
+       index ? @match_data[index] : nil
+     end
+
+     private
+
+     # Lazy eval map of names to capture indices.
+     def name_map
+       @name_map ||= @rexp.name_map
      end
    end
 
@@ -154,29 +223,47 @@ module Re
        @level = level
        @capture_keys = keys
        @options = options
+       @greedy = true
      end
 
+     # Does it match a string? Returns Re::Result if match, nil
+     # otherwise.
+     def match(string)
+       md = regexp.match(string)
+       md ? Result.new(md, self) : nil
+     end
+     alias =~ match
+
      # Return a Regexp from the constructed regular expression.
      def regexp
        @regexp ||= Regexp.new(encoding)
      end
 
-     # Does it match a string? (returns Re::Result if match, nil otherwise)
-     def match(string)
-       md = regexp.match(string)
-       md ? Result.new(md, self) : nil
+     # Is the current regular expression marked to be treated as greedy
+     # when repeat operators are applied?
+     def greedy?
+       @greedy
      end
-     alias =~ match
 
-     # New regular expresion that matches the concatenation of self and
-     # other.
+     # Map of names to capture indices. Use this to look up names in
+     # the match data returned from a regular expression match.
+     def name_map
+       result = {}
+       capture_keys.each_with_index do |key, i|
+         result[key] = i + 1
+       end
+       result
+     end
+
+     # New regular expression that matches the concatenation of self
+     # and other.
      def +(other)
        Rexp.new(parenthesized_encoding(CONCAT) + other.parenthesized_encoding(CONCAT),
          CONCAT,
          capture_keys + other.capture_keys)
      end
 
-     # New regular expresion that matches either self or other.
+     # New regular expression that matches either self or other.
      def |(other)
        Rexp.new(parenthesized_encoding(ALT) + "|" + other.parenthesized_encoding(ALT),
          ALT,
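Review note: the `name_map` method added in this hunk turns the ordered list of capture keys into 1-based group indices, which lines up with `MatchData#[]`, where index 0 is the whole match. A standalone sketch of the same mapping follows; `name_map_for` is a hypothetical name, and the capture keys are passed in as a plain array instead of being read from a `Rexp`.

```ruby
# Standalone version of the mapping; name_map_for is a hypothetical
# name, and capture_keys is passed in rather than read from a Rexp.
def name_map_for(capture_keys)
  result = {}
  capture_keys.each_with_index do |key, i|
    result[key] = i + 1   # group 0 is the full match, so names start at 1
  end
  result
end

map = name_map_for([:year, :month, :day])
md  = /(\d{4})-(\d{2})-(\d{2})/.match("2009-12-31")
md[map[:month]]  # => "12"
```

Since `Rexp#name_map` in the diff builds a fresh hash on every call, the memoized `Result#name_map` added earlier in this file avoids repeating that work for each named lookup.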
@@ -187,28 +274,26 @@ module Re
      def optional
        Rexp.new(parenthesized_encoding(POSTFIX) + "?", POSTFIX, capture_keys)
      end
+
+     # Mark the current regular expression with the non-greedy flag.
+     # Repeats applied to this regular expression will be treated as
+     # non-greedy repeats. Note that +non_greedy+ has no effect unless
+     # immediately followed by +many+, +one_or_more+, +repeat+,
+     # +at_least+ or +at_most+.
+     def non_greedy
+       @greedy = false
+       self
+     end
 
      # New regular expression that matches self many (zero or more)
      # times.
      def many
-       Rexp.new(parenthesized_encoding(POSTFIX) + "*", POSTFIX, capture_keys)
-     end
-
-     # New regular expression that matches self many (zero or more)
-     # times (non-greedy version).
-     def many!
-       Rexp.new(parenthesized_encoding(POSTFIX) + "*?", POSTFIX, capture_keys)
+       Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("*"), POSTFIX, capture_keys)
      end
-
+
      # New regular expression that matches self one or more times.
      def one_or_more
-       Rexp.new(parenthesized_encoding(POSTFIX) + "+", POSTFIX, capture_keys)
-     end
-
-     # New regular expression that matches self one or more times
-     # (non-greedy version).
-     def one_or_more!
-       Rexp.new(parenthesized_encoding(POSTFIX) + "+?", POSTFIX, capture_keys)
+       Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("+"), POSTFIX, capture_keys)
      end
 
      # New regular expression that matches self between +min+ and +max+
@@ -216,7 +301,7 @@ module Re
      # exactly +min+ times.
      def repeat(min, max=nil)
        if min && max
-         Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min},#{max}}", POSTFIX, capture_keys)
+         Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{#{min},#{max}}"), POSTFIX, capture_keys)
        else
          Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min}}", POSTFIX, capture_keys)
        end
@@ -224,12 +309,12 @@ module Re
 
      # New regular expression that matches self at least +min+ times.
      def at_least(min)
-       Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min},}", POSTFIX, capture_keys)
+       Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{#{min},}"), POSTFIX, capture_keys)
      end
 
      # New regular expression that matches self at most +max+ times.
      def at_most(max)
-       Rexp.new(parenthesized_encoding(POSTFIX) + "{0,#{max}}", POSTFIX, capture_keys)
+       Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{0,#{max}}"), POSTFIX, capture_keys)
      end
 
      # New regular expression that matches self across the complete
@@ -323,7 +408,13 @@ module Re
 
      protected
 
-     # String representation with grouping if needed.
+     # Return the repeat op in either greedy or non-greedy form, as
+     # determined by the greedy flag on the current regular expression.
+     def apply_greedy(op)
+       greedy? ? op : "#{op}?"
+     end
+
+     # String encoding with grouping if needed.
      #
      # If the precedence of the current Regexp is less than the new
      # precedence level, return the encoding wrapped in a non-capturing
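Review note: `apply_greedy` replaces the separate `many!` and `one_or_more!` methods by appending `?` to the repeat operator when the greedy flag is off. A standalone sketch of that transformation is below. It takes the flag as an explicit argument, which is an assumption made so the function runs outside the `Rexp` class; the patterns are assembled by hand, not by Re.

```ruby
# Standalone version of apply_greedy; the greedy flag is passed in
# explicitly instead of being read from a Rexp instance.
def apply_greedy(op, greedy)
  greedy ? op : "#{op}?"
end

apply_greedy("*", true)       # => "*"
apply_greedy("*", false)      # => "*?"
apply_greedy("{2,4}", false)  # => "{2,4}?"

# Hand-assembled patterns showing the effect on a match.
greedy_re     = Regexp.new("." + apply_greedy("*", true) + "b")
non_greedy_re = Regexp.new("." + apply_greedy("*", false) + "b")
greedy_re.match("xbxb")[0]      # => "xbxb"
non_greedy_re.match("xbxb")[0]  # => "xb"
```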
@@ -457,8 +548,8 @@ module Re
      # Examples:
      #
      #   re.none("aieouy") -- matches non-vowels
-     #   re.any("0-9") -- matches non-digits
-     #   re.any("A-Z", "a-z", "0-9") -- matches non-alphanumerics
+     #   re.none("0-9") -- matches non-digits
+     #   re.none("A-Z", "a-z", "0-9") -- matches non-alphanumerics
      #
      def none(*chars)
        Rexp.new("[^" + char_class(chars) + "]", GROUPED, [])
data/test/re_test.rb CHANGED
@@ -1,5 +1,10 @@
  #!/usr/bin/env ruby
 
+ # Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
+ # All rights reserved.
+ #
+ # Re is provided under the MIT open source license (see MIT-LICENSE)
+
  require 'test/unit'
  require 're'
 
@@ -59,7 +64,7 @@ class ReTest < Test::Unit::TestCase
    end
 
    def test_non_greedy_many
-     r = re.any.many!.capture(:x) + re("b")
+     r = re.any.non_greedy.many.capture(:x) + re("b")
      result = r.match("xbxb")
      assert result
      assert_equal "x", result[:x]
@@ -80,7 +85,7 @@ class ReTest < Test::Unit::TestCase
    end
 
    def test_non_greedy_one_or_more
-     r = re.any.one_or_more!.capture(:any) + re("b")
+     r = re.any.non_greedy.one_or_more.capture(:any) + re("b")
      result = r.match("xbxb")
      assert result
      assert_equal "x", result[:any]
@@ -102,6 +107,18 @@ class ReTest < Test::Unit::TestCase
      assert r !~ "aaaaa"
    end
 
+   def test_repeat_greedy
+     r = re("a").repeat(2, 4)
+     result = r =~ "aaaaa"
+     assert_equal "aaaa", result.full_match
+   end
+
+   def test_repeat_non_greedy
+     r = re("a").non_greedy.repeat(2, 4)
+     result = r =~ "aaaaa"
+     assert_equal "aa", result.full_match
+   end
+
    def test_at_least
      r = re("a").at_least(2).all
      assert r !~ "a"
@@ -109,6 +126,18 @@ class ReTest < Test::Unit::TestCase
      assert r =~ "aaaaaaaaaaaaaaaaaaaa"
    end
 
+   def test_at_least_greedy
+     r = re("a").at_least(2)
+     result = r =~ "aaaa"
+     assert_equal "aaaa", result.full_match
+   end
+
+   def test_at_least_non_greedy
+     r = re("a").non_greedy.at_least(2)
+     result = r =~ "aaa"
+     assert_equal "aa", result.full_match
+   end
+
    def test_at_most
      r = re("a").at_most(4).all
      assert r =~ ""
@@ -119,6 +148,24 @@ class ReTest < Test::Unit::TestCase
      assert r !~ "aaaaa"
    end
 
+   def test_at_most_greedy
+     r = re("a").at_most(4)
+     result = r =~ "aaaa"
+     assert_equal "aaaa", result.full_match
+   end
+
+   def test_at_most_non_greedy
+     r = re("a").non_greedy.at_most(4)
+     result = r =~ "aaaa"
+     if RUBY_VERSION < "1.9"
+       # Ruby 1.8.x seems to have a bug where non-greedy matches with
+       # intervals match at least one character.
+       assert_equal "a", result.full_match
+     else
+       assert_equal "", result.full_match
+     end
+   end
+
    def test_optional
      r = re("a").optional.all
      assert r =~ ""
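Review note: the expectations in the new greedy and non-greedy interval tests can be cross-checked against plain Ruby regular expressions. The literals below are hand-written equivalents, an assumption about the patterns Re generates and not output captured from the library. `String#[]` with a Regexp returns the matched text.

```ruby
# Hand-written equivalents of the patterns exercised by the new tests.
"aaaaa"[/a{2,4}/]   # greedy repeat(2, 4)      => "aaaa"
"aaaaa"[/a{2,4}?/]  # non-greedy repeat(2, 4)  => "aa"
"aaaa"[/a{2,}/]     # greedy at_least(2)       => "aaaa"
"aaa"[/a{2,}?/]     # non-greedy at_least(2)   => "aa"
"aaaa"[/a{0,4}/]    # greedy at_most(4)        => "aaaa"
"aaaa"[/a{0,4}?/]   # non-greedy at_most(4)    => "" on Ruby 1.9 and later
```

The last line is the case the test branches on: a non-greedy interval with a minimum of zero matches the empty string, which is why `test_at_most_non_greedy` expects `""` outside Ruby 1.8.x.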
@@ -494,6 +541,15 @@ class ReTest < Test::Unit::TestCase
      assert_equal "02", result[:month]
      assert_equal "14", result[:day]
    end
+
+   def test_name_map_returns_map_of_keywords
+     r = re("a").capture(:a) + re("b").capture(:b) + re("c").capture(:c)
+     result = r.match("abc")
+     assert result
+     assert_equal 1, r.name_map[:a]
+     assert_equal 2, r.name_map[:b]
+     assert_equal 3, r.name_map[:c]
+   end
 
    private
 
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: re
  version: !ruby/object:Gem::Version
-   version: 0.0.4
+   version: 0.0.5
  platform: ruby
  authors:
  - Jim Weirich
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2009-12-29 00:00:00 -05:00
+ date: 2009-12-31 00:00:00 -05:00
  default_executable:
  dependencies: []