re 0.0.4 → 0.0.5

Files changed (5)
  1. data/README.rdoc +71 -7
  2. data/Rakefile +31 -3
  3. data/lib/re.rb +129 -38
  4. data/test/re_test.rb +58 -2
  5. metadata +2 -2
data/README.rdoc CHANGED
@@ -1,5 +1,5 @@
1
1
 
2
- = Regular Expression Construction.
2
+ = Regular Expression Construction
3
3
 
4
4
  Complex regular expressions are hard to construct and even harder to
5
5
  read. The Re library allows users to construct complex regular
@@ -8,7 +8,7 @@ following regular expression that will parse dates:
8
8
 
9
9
  /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
10
10
 
11
- Using the Re library, That regular expression can be built
11
+ Using the Re library, that regular expression can be built
12
12
  incrementally from smaller, easier to understand expressions.
13
13
  Perhaps something like this:
14
14
 
@@ -38,9 +38,11 @@ groups can be retrieved by name:
38
38
  result.data(:month) # => "01"
39
39
  result.data(:day) # => "23"
40
40
 
41
- == Version: 0.0.4
41
+ == Version
42
42
 
43
- == Usage:
43
+ This document describes Re version 0.0.5.
44
+
45
+ == Usage
44
46
 
45
47
  include Re
46
48
 
@@ -51,7 +53,7 @@ groups can be retrieved by name:
51
53
  puts "No Match"
52
54
  end
53
55
 
54
- == Examples:
56
+ == Examples
55
57
 
56
58
  re("a") -- matches "a"
57
59
  re("a") + re("b") -- matches "ab"
@@ -83,9 +85,70 @@ and character class functions.
83
85
 
84
86
  See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
85
87
 
86
- == License and Copyright:
88
+ == Performance
89
+
90
+ We should say a word or two about performance.
91
+
92
+ First of all, building regular expressions using Re is slow. If you
93
+ use Re to build regular expressions, you are encouraged to build the
94
+ regular expression once and reuse it as needed. This means you
95
+ won't do a lot of inline expressions using Re, but rather assign the
96
+ generated Re regular expression to a constant. For example:
97
+
98
+ PHONE_RE = re.digit.repeat(3).capture(:area) +
99
+ re("-") +
100
+ re.digit.repeat(3).capture(:exchange) +
101
+ re("-") +
102
+ re.digit.repeat(4).capture(:subscriber)
103
+
104
+ Alternatively, you can arrange for the regular expression to be
105
+ constructed only when actually needed. Something like:
106
+
107
+ def phone_re
108
+ @phone_re ||= re.digit.repeat(3).capture(:area) +
109
+ re("-") +
110
+ re.digit.repeat(3).capture(:exchange) +
111
+ re("-") +
112
+ re.digit.repeat(4).capture(:subscriber)
113
+ end
114
+
115
+ That method constructs the phone number regular expression once and
116
+ returns a cached value thereafter. Just make sure you put the
117
+ method in an object that is instantiated once (e.g. a class method).
118
+
119
+ When used in matching, Re regular expressions perform fairly well
120
+ compared to native regular expressions. The overhead is a small
121
+ number of extra method calls and the creation of a Re::Result object
122
+ to return the match results.
123
+
124
+ If regular expression performance is at a premium in your application,
125
+ then you can still use Re to construct the regular expression and
126
+ extract the raw Ruby Regexp object to be used for the actual
127
+ matching. You lose the ability to use named capture groups easily,
128
+ but you get raw Ruby regular expression matching performance.
129
+
130
+ For example, if you wanted to use the raw regular expression from
131
+ PHONE_RE defined above, you could extract the regular expression
132
+ like this:
133
+
134
+ PHONE_REGEXP = PHONE_RE.regexp
135
+
136
+ And then use it directly:
137
+
138
+ if PHONE_REGEXP =~ string
139
+ # blah blah blah
140
+ end
141
+
142
+ The above match runs at full Ruby matching speed. If you still
143
+ wanted named capture groups, you can do something like this:
144
+
145
+ match_data = PHONE_REGEXP.match(string)
146
+ area_code = match_data[PHONE_RE.name_map[:area]]
147
+
148
+ == License and Copyright
87
149
 
88
- Copyright 2009 by Jim Weirich (jim.weirich@gmail.com)
150
+ Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
151
+ All rights reserved.
89
152
 
90
153
  Re is provided under the MIT open source license (see MIT-LICENSE)
91
154
 
@@ -94,6 +157,7 @@ Re is provided under the MIT open source license (see MIT-LICENSE)
94
157
  Documentation :: http://re-lib.rubyforge.org
95
158
  Source :: http://github.com/jimweirich/re
96
159
  GemCutter :: http://gemcutter.org/gems/re
160
+ Download :: http://rubyforge.org/frs/?group_id=9329
97
161
  Bug Tracker :: http://www.pivotaltracker.com/projects/47758
98
162
  Author :: jim.weirich@gmail.com
99
163
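The caching advice in the Performance section can be illustrated without the Re library. The following is a minimal sketch that uses a plain Ruby Regexp as a stand-in for an expensive Re construction. `PhoneMatcher`, `build_phone_regexp`, and `build_count` are hypothetical names introduced only for this illustration.

```ruby
# Sketch only: a plain Regexp stands in for an expensive Re construction.
# PhoneMatcher, build_phone_regexp, and build_count are hypothetical names.
class PhoneMatcher
  class << self
    attr_reader :build_count

    # Build on first use, then return the cached object thereafter.
    def phone_regexp
      @phone_regexp ||= build_phone_regexp
    end

    private

    # Counts how often construction runs so the caching is observable.
    def build_phone_regexp
      @build_count = (@build_count || 0) + 1
      /(\d{3})-(\d{3})-(\d{4})/
    end
  end
end

first  = PhoneMatcher.phone_regexp
second = PhoneMatcher.phone_regexp
puts first.equal?(second)       # same object both times
puts PhoneMatcher.build_count   # construction ran once
```

Placing the memoized method on the class, rather than on short-lived instances, is what keeps the cache alive between calls, which matches the README's advice to put the method in an object that is instantiated once.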
 
data/Rakefile CHANGED
@@ -14,8 +14,36 @@ Rake::TestTask.new(:test) do |t|
14
14
  t.test_files = FileList['test/*_test.rb']
15
15
  end
16
16
 
17
- task :release => [:check_non_beta, :readme, :gem, "publish:rdoc"]
17
+ namespace "release" do
18
+ task :new => [
19
+ :readme,
20
+ :check_non_beta,
21
+ :check_all_committed,
22
+ :tag_version,
23
+ :gem,
24
+ "publish:rdoc"
25
+ ]
26
+
27
+ task :check_all_committed do
28
+ status = `git status`
29
+ unless status =~ /nothing to commit/
30
+ fail "Outstanding Git Changes:\n#{status}"
31
+ end
32
+ end
33
+
34
+ task :commit_new_version do
35
+ sh "git commit -m 'bumped to version #{Re::VERSION}'"
36
+ end
37
+
38
+ task :not_already_tagged
18
39
 
19
- task :check_non_beta do
20
- fail "Must not be a beta version! Version is #{Re::VERSION}" if Re::Version::BETA
40
+ task :tag_version => :not_already_tagged do
41
+ sh "git tag re-#{Re::VERSION}"
42
+ sh "git push --tags"
43
+ end
44
+
45
+ task :check_non_beta do
46
+ fail "Must not be a beta version! Version is #{Re::VERSION}" if Re::Version::BETA
47
+ end
21
48
  end
49
+ task :release => "release:new"
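The `release:check_all_committed` task above decides whether the working tree is clean by scanning the text printed by `git status`. Below is a minimal sketch of that check as a plain function, so it can be exercised without a Git repository. `clean_working_tree?` and the sample status strings are hypothetical and written only for illustration.

```ruby
# Sketch only: mirrors the /nothing to commit/ test from the Rakefile task.
# clean_working_tree? is a hypothetical helper, not part of the Rakefile.
def clean_working_tree?(status_text)
  !!(status_text =~ /nothing to commit/)
end

clean = "On branch master\nnothing to commit, working tree clean\n"
dirty = "On branch master\nChanges not staged for commit:\n  modified: lib/re.rb\n"

puts clean_working_tree?(clean)   # true
puts clean_working_tree?(dirty)   # false
```

A check like this depends on the wording of Git's status message. `git status --porcelain`, which prints nothing for a clean tree, may be a less wording-sensitive alternative.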
data/lib/re.rb CHANGED
@@ -1,6 +1,6 @@
1
1
  #!/usr/bin/ruby -wKU
2
2
  #
3
- # = Regular Expression Construction.
3
+ # = Regular Expression Construction
4
4
  #
5
5
  # Complex regular expressions are hard to construct and even harder to
6
6
  # read. The Re library allows users to construct complex regular
@@ -9,7 +9,7 @@
9
9
  #
10
10
  # /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
11
11
  #
12
- # Using the Re library, That regular expression can be built
12
+ # Using the Re library, that regular expression can be built
13
13
  # incrementally from smaller, easier to understand expressions.
14
14
  # Perhaps something like this:
15
15
  #
@@ -39,7 +39,7 @@
39
39
  # result.data(:month) # => "01"
40
40
  # result.data(:day) # => "23"
41
41
  #
42
- # == Usage:
42
+ # == Usage
43
43
  #
44
44
  # include Re
45
45
  #
@@ -50,7 +50,7 @@
50
50
  # puts "No Match"
51
51
  # end
52
52
  #
53
- # == Examples:
53
+ # == Examples
54
54
  #
55
55
  # re("a") -- matches "a"
56
56
  # re("a") + re("b") -- matches "ab"
@@ -82,9 +82,70 @@
82
82
  #
83
83
  # See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
84
84
  #
85
- # == License and Copyright:
85
+ # == Performance
86
86
  #
87
- # Copyright 2009 by Jim Weirich (jim.weirich@gmail.com)
87
+ # We should say a word or two about performance.
88
+ #
89
+ # First of all, building regular expressions using Re is slow. If you
90
+ # use Re to build regular expressions, you are encouraged to build the
91
+ # regular expression once and reuse it as needed. This means you
92
+ # won't do a lot of inline expressions using Re, but rather assign the
93
+ # generated Re regular expression to a constant. For example:
94
+ #
95
+ # PHONE_RE = re.digit.repeat(3).capture(:area) +
96
+ # re("-") +
97
+ # re.digit.repeat(3).capture(:exchange) +
98
+ # re("-") +
99
+ # re.digit.repeat(4).capture(:subscriber)
100
+ #
101
+ # Alternatively, you can arrange for the regular expression to be
102
+ # constructed only when actually needed. Something like:
103
+ #
104
+ # def phone_re
105
+ # @phone_re ||= re.digit.repeat(3).capture(:area) +
106
+ # re("-") +
107
+ # re.digit.repeat(3).capture(:exchange) +
108
+ # re("-") +
109
+ # re.digit.repeat(4).capture(:subscriber)
110
+ # end
111
+ #
112
+ # That method constructs the phone number regular expression once and
113
+ # returns a cached value thereafter. Just make sure you put the
114
+ # method in an object that is instantiated once (e.g. a class method).
115
+ #
116
+ # When used in matching, Re regular expressions perform fairly well
117
+ # compared to native regular expressions. The overhead is a small
118
+ # number of extra method calls and the creation of a Re::Result object
119
+ # to return the match results.
120
+ #
121
+ # If regular expression performance is at a premium in your application,
122
+ # then you can still use Re to construct the regular expression and
123
+ # extract the raw Ruby Regexp object to be used for the actual
124
+ # matching. You lose the ability to use named capture groups easily,
125
+ # but you get raw Ruby regular expression matching performance.
126
+ #
127
+ # For example, if you wanted to use the raw regular expression from
128
+ # PHONE_RE defined above, you could extract the regular expression
129
+ # like this:
130
+ #
131
+ # PHONE_REGEXP = PHONE_RE.regexp
132
+ #
133
+ # And then use it directly:
134
+ #
135
+ # if PHONE_REGEXP =~ string
136
+ # # blah blah blah
137
+ # end
138
+ #
139
+ # The above match runs at full Ruby matching speed. If you still
140
+ # wanted named capture groups, you can do something like this:
141
+ #
142
+ # match_data = PHONE_REGEXP.match(string)
143
+ # area_code = match_data[PHONE_RE.name_map[:area]]
144
+ #
145
+ # == License and Copyright
146
+ #
147
+ # Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
148
+ # All rights reserved.
88
149
  #
89
150
  # Re is provided under the MIT open source license (see MIT-LICENSE)
90
151
  #
@@ -93,6 +154,7 @@
93
154
  # Documentation :: http://re-lib.rubyforge.org
94
155
  # Source :: http://github.com/jimweirich/re
95
156
  # GemCutter :: http://gemcutter.org/gems/re
157
+ # Download :: http://rubyforge.org/frs/?group_id=9329
96
158
  # Bug Tracker :: http://www.pivotaltracker.com/projects/47758
97
159
  # Author :: jim.weirich@gmail.com
98
160
  #
@@ -102,7 +164,7 @@ module Re
102
164
  NUMBERS = [
103
165
  MAJOR = 0,
104
166
  MINOR = 0,
105
- BUILD = 4,
167
+ BUILD = 5,
106
168
  BETA = nil,
107
169
  ].compact
108
170
  end
@@ -125,8 +187,15 @@ module Re
125
187
 
126
188
  # Return the text of the named capture data.
127
189
  def [](name)
128
- index = @rexp.capture_keys.index(name)
129
- index ? @match_data[index+1] : nil
190
+ index = name_map[name]
191
+ index ? @match_data[index] : nil
192
+ end
193
+
194
+ private
195
+
196
+ # Lazy eval map of names to capture indices.
197
+ def name_map
198
+ @name_map ||= @rexp.name_map
130
199
  end
131
200
  end
132
201
 
@@ -154,29 +223,47 @@ module Re
154
223
  @level = level
155
224
  @capture_keys = keys
156
225
  @options = options
226
+ @greedy = true
157
227
  end
158
228
 
229
+ # Does it match a string? Returns Re::Result if match, nil
230
+ # otherwise.
231
+ def match(string)
232
+ md = regexp.match(string)
233
+ md ? Result.new(md, self) : nil
234
+ end
235
+ alias =~ match
236
+
159
237
  # Return a Regexp from the constructed regular expression.
160
238
  def regexp
161
239
  @regexp ||= Regexp.new(encoding)
162
240
  end
163
241
 
164
- # Does it match a string? (returns Re::Result if match, nil otherwise)
165
- def match(string)
166
- md = regexp.match(string)
167
- md ? Result.new(md, self) : nil
242
+ # Is the current regular expression marked to be treated as greedy
243
+ # when repeat operators are applied?
244
+ def greedy?
245
+ @greedy
168
246
  end
169
- alias =~ match
170
247
 
171
- # New regular expresion that matches the concatenation of self and
172
- # other.
248
+ # Map of names to capture indices. Use this to lookup names in
249
+ # the match data returned from a regular expression match.
250
+ def name_map
251
+ result = {}
252
+ capture_keys.each_with_index do |key, i|
253
+ result[key] = i + 1
254
+ end
255
+ result
256
+ end
257
+
258
+ # New regular expression that matches the concatenation of self
259
+ # and other.
173
260
  def +(other)
174
261
  Rexp.new(parenthesized_encoding(CONCAT) + other.parenthesized_encoding(CONCAT),
175
262
  CONCAT,
176
263
  capture_keys + other.capture_keys)
177
264
  end
178
265
 
179
- # New regular expresion that matches either self or other.
266
+ # New regular expression that matches either self or other.
180
267
  def |(other)
181
268
  Rexp.new(parenthesized_encoding(ALT) + "|" + other.parenthesized_encoding(ALT),
182
269
  ALT,
@@ -187,28 +274,26 @@ module Re
187
274
  def optional
188
275
  Rexp.new(parenthesized_encoding(POSTFIX) + "?", POSTFIX, capture_keys)
189
276
  end
277
+
278
+ # Mark the current regular expression with the non-greedy flag.
279
+ # Repeats applied to this regular expression will be treated as
280
+ # non-greedy repeats. Note that +non_greedy+ has no effect unless
281
+ # immediately followed by +many+, +one_or_more+, +repeat+,
282
+ # +at_least+ or +at_most+.
283
+ def non_greedy
284
+ @greedy = false
285
+ self
286
+ end
190
287
 
191
288
  # New regular expression that matches self many (zero or more)
192
289
  # times.
193
290
  def many
194
- Rexp.new(parenthesized_encoding(POSTFIX) + "*", POSTFIX, capture_keys)
195
- end
196
-
197
- # New regular expression that matches self many (zero or more)
198
- # times (non-greedy version).
199
- def many!
200
- Rexp.new(parenthesized_encoding(POSTFIX) + "*?", POSTFIX, capture_keys)
291
+ Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("*"), POSTFIX, capture_keys)
201
292
  end
202
-
293
+
203
294
  # New regular expression that matches self one or more times.
204
295
  def one_or_more
205
- Rexp.new(parenthesized_encoding(POSTFIX) + "+", POSTFIX, capture_keys)
206
- end
207
-
208
- # New regular expression that matches self one or more times
209
- # (non-greedy version).
210
- def one_or_more!
211
- Rexp.new(parenthesized_encoding(POSTFIX) + "+?", POSTFIX, capture_keys)
296
+ Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("+"), POSTFIX, capture_keys)
212
297
  end
213
298
 
214
299
  # New regular expression that matches self between +min+ and +max+
@@ -216,7 +301,7 @@ module Re
216
301
  # exactly +min+ times.
217
302
  def repeat(min, max=nil)
218
303
  if min && max
219
- Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min},#{max}}", POSTFIX, capture_keys)
304
+ Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{#{min},#{max}}"), POSTFIX, capture_keys)
220
305
  else
221
306
  Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min}}", POSTFIX, capture_keys)
222
307
  end
@@ -224,12 +309,12 @@ module Re
224
309
 
225
310
  # New regular expression that matches self at least +min+ times.
226
311
  def at_least(min)
227
- Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min},}", POSTFIX, capture_keys)
312
+ Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{#{min},}"), POSTFIX, capture_keys)
228
313
  end
229
314
 
230
315
  # New regular expression that matches self at most +max+ times.
231
316
  def at_most(max)
232
- Rexp.new(parenthesized_encoding(POSTFIX) + "{0,#{max}}", POSTFIX, capture_keys)
317
+ Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{0,#{max}}"), POSTFIX, capture_keys)
233
318
  end
234
319
 
235
320
  # New regular expression that matches self across the complete
@@ -323,7 +408,13 @@ module Re
323
408
 
324
409
  protected
325
410
 
326
- # String representation with grouping if needed.
411
+ # Return the repeat op in either greedy or non-greedy form, as
412
+ # determined by the greedy flag on the current regular expression.
413
+ def apply_greedy(op)
414
+ greedy? ? op : "#{op}?"
415
+ end
416
+
417
+ # String encoding with grouping if needed.
327
418
  #
328
419
  # If the precedence of the current Regexp is less than the new
329
420
  # precedence level, return the encoding wrapped in a non-capturing
@@ -457,8 +548,8 @@ module Re
457
548
  # Examples:
458
549
  #
459
550
  # re.none("aieouy") -- matches non-vowels
460
- # re.any("0-9") -- matches non-digits
461
- # re.any("A-Z", "a-z", "0-9") -- matches non-alphanumerics
551
+ # re.none("0-9") -- matches non-digits
552
+ # re.none("A-Z", "a-z", "0-9") -- matches non-alphanumerics
462
553
  #
463
554
  def none(*chars)
464
555
  Rexp.new("[^" + char_class(chars) + "]", GROUPED, [])
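The `apply_greedy` helper introduced in this diff appends `?` to a repeat operator when the expression is marked non-greedy. Below is a minimal standalone sketch of that idea with plain Ruby regular expressions. Here `apply_greedy` is rewritten as a free function that takes the flag as an argument, which is an adaptation for illustration and not the library's method signature.

```ruby
# Sketch only: free-function adaptation of the helper shown in the diff.
# The greedy flag is passed in explicitly instead of read from the object.
def apply_greedy(op, greedy)
  greedy ? op : "#{op}?"
end

greedy_re     = Regexp.new("a" + apply_greedy("{2,4}", true))
non_greedy_re = Regexp.new("a" + apply_greedy("{2,4}", false))

puts greedy_re.source                  # a{2,4}
puts non_greedy_re.source              # a{2,4}?
puts greedy_re.match("aaaaa")[0]       # aaaa
puts non_greedy_re.match("aaaaa")[0]   # aa
```

The two matches correspond to the expectations in `test_repeat_greedy` and `test_repeat_non_greedy`: the greedy form takes as many repetitions as the interval allows, the non-greedy form as few.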
data/test/re_test.rb CHANGED
@@ -1,5 +1,10 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
+ # Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
4
+ # All rights reserved.
5
+ #
6
+ # Re is provided under the MIT open source license (see MIT-LICENSE)
7
+
3
8
  require 'test/unit'
4
9
  require 're'
5
10
 
@@ -59,7 +64,7 @@ class ReTest < Test::Unit::TestCase
59
64
  end
60
65
 
61
66
  def test_non_greedy_many
62
- r = re.any.many!.capture(:x) + re("b")
67
+ r = re.any.non_greedy.many.capture(:x) + re("b")
63
68
  result = r.match("xbxb")
64
69
  assert result
65
70
  assert_equal "x", result[:x]
@@ -80,7 +85,7 @@ class ReTest < Test::Unit::TestCase
80
85
  end
81
86
 
82
87
  def test_non_greedy_one_or_more
83
- r = re.any.one_or_more!.capture(:any) + re("b")
88
+ r = re.any.non_greedy.one_or_more.capture(:any) + re("b")
84
89
  result = r.match("xbxb")
85
90
  assert result
86
91
  assert_equal "x", result[:any]
@@ -102,6 +107,18 @@ class ReTest < Test::Unit::TestCase
102
107
  assert r !~ "aaaaa"
103
108
  end
104
109
 
110
+ def test_repeat_greedy
111
+ r = re("a").repeat(2, 4)
112
+ result = r =~ "aaaaa"
113
+ assert_equal "aaaa", result.full_match
114
+ end
115
+
116
+ def test_repeat_non_greedy
117
+ r = re("a").non_greedy.repeat(2, 4)
118
+ result = r =~ "aaaaa"
119
+ assert_equal "aa", result.full_match
120
+ end
121
+
105
122
  def test_at_least
106
123
  r = re("a").at_least(2).all
107
124
  assert r !~ "a"
@@ -109,6 +126,18 @@ class ReTest < Test::Unit::TestCase
109
126
  assert r =~ "aaaaaaaaaaaaaaaaaaaa"
110
127
  end
111
128
 
129
+ def test_at_least_greedy
130
+ r = re("a").at_least(2)
131
+ result = r =~ "aaaa"
132
+ assert_equal "aaaa", result.full_match
133
+ end
134
+
135
+ def test_at_least_non_greedy
136
+ r = re("a").non_greedy.at_least(2)
137
+ result = r =~ "aaa"
138
+ assert_equal "aa", result.full_match
139
+ end
140
+
112
141
  def test_at_most
113
142
  r = re("a").at_most(4).all
114
143
  assert r =~ ""
@@ -119,6 +148,24 @@ class ReTest < Test::Unit::TestCase
119
148
  assert r !~ "aaaaa"
120
149
  end
121
150
 
151
+ def test_at_most_greedy
152
+ r = re("a").at_most(4)
153
+ result = r =~ "aaaa"
154
+ assert_equal "aaaa", result.full_match
155
+ end
156
+
157
+ def test_at_most_non_greedy
158
+ r = re("a").non_greedy.at_most(4)
159
+ result = r =~ "aaaa"
160
+ if RUBY_VERSION < "1.9"
161
+ # Ruby 1.8.x seems to have a bug where non-greedy matches with
162
+ # intervals match at least one character.
163
+ assert_equal "a", result.full_match
164
+ else
165
+ assert_equal "", result.full_match
166
+ end
167
+ end
168
+
122
169
  def test_optional
123
170
  r = re("a").optional.all
124
171
  assert r =~ ""
@@ -494,6 +541,15 @@ class ReTest < Test::Unit::TestCase
494
541
  assert_equal "02", result[:month]
495
542
  assert_equal "14", result[:day]
496
543
  end
544
+
545
+ def test_name_map_returns_map_of_keywords
546
+ r = re("a").capture(:a) + re("b").capture(:b) + re("c").capture(:c)
547
+ result = r.match("abc")
548
+ assert result
549
+ assert_equal 1, r.name_map[:a]
550
+ assert_equal 2, r.name_map[:b]
551
+ assert_equal 3, r.name_map[:c]
552
+ end
497
553
 
498
554
  private
499
555
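`test_name_map_returns_map_of_keywords` checks that capture names map to 1-based group indices. The same mapping can be sketched with a plain Regexp and an ordered list of keys. `CAPTURE_KEYS`, `ABC_REGEXP`, and the free-function `name_map` are stand-ins assumed for this illustration; in the library the map comes from `Re::Rexp#name_map`.

```ruby
# Sketch only: stand-ins for a Re expression's capture keys and raw Regexp.
CAPTURE_KEYS = [:a, :b, :c]
ABC_REGEXP   = /(a)(b)(c)/

# Map each capture name to its 1-based group index, as in the diff.
def name_map(keys)
  result = {}
  keys.each_with_index { |key, i| result[key] = i + 1 }
  result
end

map        = name_map(CAPTURE_KEYS)
match_data = ABC_REGEXP.match("abc")

puts map[:b]               # 2
puts match_data[map[:c]]   # c
```

The `i + 1` offset is needed because index 0 of a Ruby MatchData holds the full match and capture groups start at index 1.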
 
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: re
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.4
4
+ version: 0.0.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jim Weirich
@@ -9,7 +9,7 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-12-29 00:00:00 -05:00
12
+ date: 2009-12-31 00:00:00 -05:00
13
13
  default_executable:
14
14
  dependencies: []
15
15