RubyGems - re - Versions diffs - 0.0.4 → 0.0.5 - Mend

re 0.0.4 → 0.0.5

Files changed (5)

data/README.rdoc CHANGED

@@ -1,5 +1,5 @@
-= Regular Expression Construction.
+= Regular Expression Construction
 Complex regular expressions are hard to construct and even harder to
 read.  The Re library allows users to construct complex regular
@@ -8,7 +8,7 @@ following regular expression that will parse dates:
    /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
-Using the Re library, That regular expression can be built
+Using the Re library, that regular expression can be built
 incrementally from smaller, easier to understand expressions.
 Perhaps something like this:
@@ -38,9 +38,11 @@ groups can be retrieved by name:
   result.data(:month)  # => "01"
   result.data(:day)    # => "23"
-== Version: 0.0.4
+== Version
-== Usage:
+This document describes Re version 0.0.5.
+== Usage
   include Re
@@ -51,7 +53,7 @@ groups can be retrieved by name:
     puts "No Match"
   end
-== Examples:
+== Examples
   re("a")                -- matches "a"
   re("a") + re("b")      -- matches "ab"
@@ -83,9 +85,70 @@ and character class functions.
 See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
-== License and Copyright:
+== Performance
+We should say a word or two about performance.
+First of all, building regular expressions using Re is slow.  If you
+use Re to build regular expressions, you are encouraged to build the
+regular expression once and reuse it as needed.  This means you
+won't do a lot of inline expressions using Re, but rather assign the
+generated Re regular expression to a constant.  For example:
+  PHONE_RE = re.digit.repeat(3).capture(:area) +
+               re("-") +
+               re.digit.repeat(3).capture(:exchange) +
+               re("-") +
+               re.digit.repeat(4).capture(:subscriber)
+Alternatively, you can arrange for the regular expression to be
+constructed only when actually needed.  Something like:
+  def phone_re
+    @phone_re ||= re.digit.repeat(3).capture(:area) +
+                    re("-") +
+                    re.digit.repeat(3).capture(:exchange) +
+                    re("-") +
+                    re.digit.repeat(4).capture(:subscriber)
+  end
+That method constructs the phone number regular expression once and
+returns a cached value thereafter.  Just make sure you put the
+method in an object that is instantiated once (e.g. a class method).
+When used in matching, Re regular expressions perform fairly well
+compared to native regular expressions.  The overhead is a small
+number of extra method calls and the creation of a Re::Result object
+to return the match results.
+If regular expression performance is at a premium in your application,
+then you can still use Re to construct the regular expression and
+extract the raw Ruby Regexp object to be used for the actual
+matching.  You lose the ability to use named capture groups easily,
+but you get raw Ruby regular expression matching performance.
+For example, if you wanted to use the raw regular expression from
+PHONE_RE defined above, you could extract the regular expression
+like this:
+  PHONE_REGEXP = PHONE_RE.regexp
+And then use it directly:
+  if PHONE_REGEXP =~ string
+    # blah blah blah
+  end
+The above match runs at full Ruby matching speed.  If you still
+want named capture groups, you can do something like this:
+  match_data = PHONE_REGEXP.match(string)
+  area_code = match_data[PHONE_RE.name_map[:area]]
+== License and Copyright
-Copyright 2009 by Jim Weirich (jim.weirich@gmail.com)
+Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
+All rights reserved.
 Re is provided under the MIT open source license (see MIT-LICENSE)
@@ -94,6 +157,7 @@ Re is provided under the MIT open source license (see MIT-LICENSE)
 Documentation :: http://re-lib.rubyforge.org
 Source        :: http://github.com/jimweirich/re
 GemCutter     :: http://gemcutter.org/gems/re
+Download      :: http://rubyforge.org/frs/?group_id=9329
 Bug Tracker   :: http://www.pivotaltracker.com/projects/47758
 Author        :: jim.weirich@gmail.com
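
The raw-Regexp route described in the new Performance section depends on a map from capture names to group indices. The following is a minimal sketch of that lookup, assuming a plain Ruby Regexp in place of an Re expression so that it runs without the gem. `CAPTURE_KEYS`, `NAME_MAP`, and `build_name_map` are illustrative names and not part of Re; only the idea of a name-to-index map comes from the diff.

```ruby
# Stand-in for PHONE_RE.regexp: a plain Regexp with three positional groups.
PHONE_REGEXP = /(\d{3})-(\d{3})-(\d{4})/

# Stand-in for the capture keys an Re expression would carry, in group order.
CAPTURE_KEYS = [:area, :exchange, :subscriber]

# Illustrative helper: map each name to its 1-based capture group index,
# which is the shape of the name_map introduced in this release.
def build_name_map(keys)
  map = {}
  keys.each_with_index { |key, i| map[key] = i + 1 }
  map
end

# Build the map once and reuse it for every match.
NAME_MAP = build_name_map(CAPTURE_KEYS)

match_data = PHONE_REGEXP.match("call 555-123-4567 today")
area_code  = match_data[NAME_MAP[:area]]        # group 1
subscriber = match_data[NAME_MAP[:subscriber]]  # group 3
```

Building the map once keeps the per-match cost to a hash lookup plus an indexed read on the MatchData.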

data/Rakefile CHANGED

@@ -14,8 +14,36 @@ Rake::TestTask.new(:test) do |t|
   t.test_files = FileList['test/*_test.rb']
 end
-task :release => [:check_non_beta, :readme, :gem, "publish:rdoc"]
+namespace "release" do
+  task :new => [
+    :readme,
+    :check_non_beta,
+    :check_all_committed,
+    :tag_version,
+    :gem,
+    "publish:rdoc"
+  ]
+  task :check_all_committed do
+    status = `git status`
+    unless status =~ /nothing to commit/
+      fail "Outstanding Git Changes:\n#{status}"
+    end
+  end
+  task :commit_new_version do
+    sh "git commit -m 'bumped to version #{Re::VERSION}'"
+  end
+  task :not_already_tagged
-task :check_non_beta do
-  fail "Must not be a beta version! Version is #{Re::VERSION}" if Re::Version::BETA
+  task :tag_version => :not_already_tagged do
+    sh "git tag re-#{Re::VERSION}"
+    sh "git push --tags"
+  end
+  task :check_non_beta do
+    fail "Must not be a beta version! Version is #{Re::VERSION}" if Re::Version::BETA
+  end
 end
+task :release => "release:new"
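
The `check_all_committed` guard above can be exercised without a repository by separating the decision from the `git status` call. This is a sketch under that assumption: `clean_working_tree?` and `ensure_all_committed` are hypothetical helper names, and the sample status strings are made up for illustration. The only detail taken from the diff is the match on the phrase "nothing to commit".

```ruby
# Hypothetical helper: true when the status text reports a clean tree.
# Like the Rake task in the diff, it only looks for "nothing to commit".
def clean_working_tree?(status_text)
  !!(status_text =~ /nothing to commit/)
end

# Hypothetical helper: raise with the status text when changes are outstanding.
def ensure_all_committed(status_text)
  unless clean_working_tree?(status_text)
    raise "Outstanding Git Changes:\n#{status_text}"
  end
  true
end

# Sample inputs (illustrative strings, not captured from a real repository).
CLEAN_STATUS = "On branch master\nnothing to commit, working tree clean\n"
DIRTY_STATUS = "On branch master\nChanges not staged for commit:\n  modified: lib/re.rb\n"
```

In a Rakefile the status text would come from the `git status` call shown in the diff; keeping the check in a plain method makes the release guard easy to test.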

data/lib/re.rb CHANGED

@@ -1,6 +1,6 @@
 #!/usr/bin/ruby -wKU
 #
-# = Regular Expression Construction.
+# = Regular Expression Construction
 #
 # Complex regular expressions are hard to construct and even harder to
 # read.  The Re library allows users to construct complex regular
@@ -9,7 +9,7 @@
 #
 #    /\A((?:19|20)[0-9]{2})[\- \/.](0[1-9]|1[012])[\- \/.](0[1-9]|[12][0-9]|3[01])\z/
 #
-# Using the Re library, That regular expression can be built
+# Using the Re library, that regular expression can be built
 # incrementally from smaller, easier to understand expressions.
 # Perhaps something like this:
 #
@@ -39,7 +39,7 @@
 #   result.data(:month)  # => "01"
 #   result.data(:day)    # => "23"
 #
-# == Usage:
+# == Usage
 #
 #   include Re
 #
@@ -50,7 +50,7 @@
 #     puts "No Match"
 #   end
 #
-# == Examples:
+# == Examples
 #
 #   re("a")                -- matches "a"
 #   re("a") + re("b")      -- matches "ab"
@@ -82,9 +82,70 @@
 #
 # See Re.re, Re::Rexp, and Re::ConstructionMethods for details.
 #
-# == License and Copyright:
+# == Performance
 #
-# Copyright 2009 by Jim Weirich (jim.weirich@gmail.com)
+# We should say a word or two about performance.
+#
+# First of all, building regular expressions using Re is slow.  If you
+# use Re to build regular expressions, you are encouraged to build the
+# regular expression once and reuse it as needed.  This means you
+# won't do a lot of inline expressions using Re, but rather assign the
+# generated Re regular expression to a constant.  For example:
+#
+#   PHONE_RE = re.digit.repeat(3).capture(:area) +
+#                re("-") +
+#                re.digit.repeat(3).capture(:exchange) +
+#                re("-") +
+#                re.digit.repeat(4).capture(:subscriber)
+#
+# Alternatively, you can arrange for the regular expression to be
+# constructed only when actually needed.  Something like:
+#
+#   def phone_re
+#     @phone_re ||= re.digit.repeat(3).capture(:area) +
+#                     re("-") +
+#                     re.digit.repeat(3).capture(:exchange) +
+#                     re("-") +
+#                     re.digit.repeat(4).capture(:subscriber)
+#   end
+#
+# That method constructs the phone number regular expression once and
+# returns a cached value thereafter.  Just make sure you put the
+# method in an object that is instantiated once (e.g. a class method).
+#
+# When used in matching, Re regular expressions perform fairly well
+# compared to native regular expressions.  The overhead is a small
+# number of extra method calls and the creation of a Re::Result object
+# to return the match results.
+#
+# If regular expression performance is at a premium in your application,
+# then you can still use Re to construct the regular expression and
+# extract the raw Ruby Regexp object to be used for the actual
+# matching.  You lose the ability to use named capture groups easily,
+# but you get raw Ruby regular expression matching performance.
+#
+# For example, if you wanted to use the raw regular expression from
+# PHONE_RE defined above, you could extract the regular expression
+# like this:
+#
+#   PHONE_REGEXP = PHONE_RE.regexp
+#
+# And then use it directly:
+#
+#   if PHONE_REGEXP =~ string
+#     # blah blah blah
+#   end
+#
+# The above match runs at full Ruby matching speed.  If you still
+# want named capture groups, you can do something like this:
+#
+#   match_data = PHONE_REGEXP.match(string)
+#   area_code = match_data[PHONE_RE.name_map[:area]]
+#
+# == License and Copyright
+#
+# Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
+# All rights reserved.
 #
 # Re is provided under the MIT open source license (see MIT-LICENSE)
 #
@@ -93,6 +154,7 @@
 # Documentation :: http://re-lib.rubyforge.org
 # Source        :: http://github.com/jimweirich/re
 # GemCutter     :: http://gemcutter.org/gems/re
+# Download      :: http://rubyforge.org/frs/?group_id=9329
 # Bug Tracker   :: http://www.pivotaltracker.com/projects/47758
 # Author        :: jim.weirich@gmail.com
 #
@@ -102,7 +164,7 @@ module Re
     NUMBERS = [
       MAJOR = 0,
       MINOR = 0,
-      BUILD = 4,
+      BUILD = 5,
       BETA  = nil,
     ].compact
   end
@@ -125,8 +187,15 @@ module Re
     # Return the text of the named capture data.
     def [](name)
-      index = @rexp.capture_keys.index(name)
-      index ? @match_data[index+1] : nil
+      index = name_map[name]
+      index ? @match_data[index] : nil
+    end
+    private
+    # Lazy eval map of names to capture indices.
+    def name_map
+      @name_map ||= @rexp.name_map
     end
   end
@@ -154,29 +223,47 @@ module Re
       @level = level
       @capture_keys = keys
       @options = options
+      @greedy = true
     end
+    # Does it match a string?  Returns Re::Result if match, nil
+    # otherwise.
+    def match(string)
+      md = regexp.match(string)
+      md ? Result.new(md, self) : nil
+    end
+    alias =~ match
     # Return a Regexp from the constructed regular expression.
     def regexp
       @regexp ||= Regexp.new(encoding)
     end
-    # Does it match a string? (returns Re::Result if match, nil otherwise)
-    def match(string)
-      md = regexp.match(string)
-      md ? Result.new(md, self) : nil
+    # Is the current regular expression marked to be treated as greedy
+    # when repeat operators are applied?
+    def greedy?
+      @greedy
     end
-    alias =~ match
-    # New regular expresion that matches the concatenation of self and
-    # other.
+    # Map of names to capture indices.  Use this to look up names in
+    # the match data returned from a regular expression match.
+    def name_map
+      result = {}
+      capture_keys.each_with_index do |key, i|
+        result[key] = i + 1
+      end
+      result
+    end
+    # New regular expression that matches the concatenation of self
+    # and other.
     def +(other)
       Rexp.new(parenthesized_encoding(CONCAT) + other.parenthesized_encoding(CONCAT),
         CONCAT,
         capture_keys + other.capture_keys)
     end
-    # New regular expresion that matches either self or other.
+    # New regular expression that matches either self or other.
     def |(other)
       Rexp.new(parenthesized_encoding(ALT) + "|" + other.parenthesized_encoding(ALT),
         ALT,
@@ -187,28 +274,26 @@ module Re
     def optional
       Rexp.new(parenthesized_encoding(POSTFIX) + "?", POSTFIX, capture_keys)
     end
+    # Mark the current regular expression with the non-greedy flag.
+    # Repeats applied to this regular expression will be treated as
+    # non-greedy repeats.  Note that +non_greedy+ has no effect unless
+    # immediately followed by +many+, +one_or_more+, +repeat+,
+    # +at_least+ or +at_most+.
+    def non_greedy
+      @greedy = false
+      self
+    end
     # New regular expression that matches self many (zero or more)
     # times.
     def many
-      Rexp.new(parenthesized_encoding(POSTFIX) + "*", POSTFIX, capture_keys)
-    end
-    # New regular expression that matches self many (zero or more)
-    # times (non-greedy version).
-    def many!
-      Rexp.new(parenthesized_encoding(POSTFIX) + "*?", POSTFIX, capture_keys)
+      Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("*"), POSTFIX, capture_keys)
     end
     # New regular expression that matches self one or more times.
     def one_or_more
-      Rexp.new(parenthesized_encoding(POSTFIX) + "+", POSTFIX, capture_keys)
-    end
-    # New regular expression that matches self one or more times
-    # (non-greedy version).
-    def one_or_more!
-      Rexp.new(parenthesized_encoding(POSTFIX) + "+?", POSTFIX, capture_keys)
+      Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("+"), POSTFIX, capture_keys)
     end
     # New regular expression that matches self between +min+ and +max+
@@ -216,7 +301,7 @@ module Re
     # exactly +min+ times.
     def repeat(min, max=nil)
       if min && max
-        Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min},#{max}}", POSTFIX, capture_keys)
+        Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{#{min},#{max}}"), POSTFIX, capture_keys)
       else
         Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min}}", POSTFIX, capture_keys)
       end
@@ -224,12 +309,12 @@ module Re
     # New regular expression that matches self at least +min+ times.
     def at_least(min)
-      Rexp.new(parenthesized_encoding(POSTFIX) + "{#{min},}", POSTFIX, capture_keys)
+      Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{#{min},}"), POSTFIX, capture_keys)
     end
     # New regular expression that matches self at most +max+ times.
     def at_most(max)
-      Rexp.new(parenthesized_encoding(POSTFIX) + "{0,#{max}}", POSTFIX, capture_keys)
+      Rexp.new(parenthesized_encoding(POSTFIX) + apply_greedy("{0,#{max}}"), POSTFIX, capture_keys)
     end
     # New regular expression that matches self across the complete
@@ -323,7 +408,13 @@ module Re
     protected
-    # String representation with grouping if needed.
+    # Return the repeat op in either greedy or non-greedy form, as
+    # determined by the greedy flag on the current regular expression.
+    def apply_greedy(op)
+      greedy? ? op : "#{op}?"
+    end
+    # String encoding with grouping if needed.
     #
     # If the precedence of the current Regexp is less than the new
     # precedence level, return the encoding wrapped in a non-capturing
@@ -457,8 +548,8 @@ module Re
     # Examples:
     #
     #   re.none("aieouy")                 -- matches non-vowels
-    #   re.any("0-9")                     -- matches non-digits
-    #   re.any("A-Z", "a-z", "0-9")       -- matches non-alphanumerics
+    #   re.none("0-9")                    -- matches non-digits
+    #   re.none("A-Z", "a-z", "0-9")      -- matches non-alphanumerics
     #
     def none(*chars)
       Rexp.new("[^" + char_class(chars)  + "]", GROUPED, [])
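
The greedy flag introduced in this file reduces to one string operation: append `?` to the repeat operator when a non-greedy match is wanted. Below is a minimal sketch of that idea using plain Ruby Regexp objects. In the diff the flag lives on the Rexp instance and `non_greedy` returns `self`; here it is passed as an argument so the example stays self-contained. `repeat_pattern` is a hypothetical helper, not part of Re.

```ruby
# Return the repeat operator unchanged when greedy, or with a trailing "?"
# (the non-greedy form) otherwise.
def apply_greedy(op, greedy)
  greedy ? op : "#{op}?"
end

# Hypothetical helper: build a Regexp that repeats +atom+ with +op+.
# +atom+ is assumed to be a single already-escaped regexp atom.
def repeat_pattern(atom, op, greedy = true)
  Regexp.new(atom + apply_greedy(op, greedy))
end

greedy_re = repeat_pattern("a", "{2,4}")         # /a{2,4}/
lazy_re   = repeat_pattern("a", "{2,4}", false)  # /a{2,4}?/

greedy_text = greedy_re.match("aaaaa")[0]  # longest allowed run: "aaaa"
lazy_text   = lazy_re.match("aaaaa")[0]    # shortest allowed run: "aa"
```

The two results mirror `test_repeat_greedy` and `test_repeat_non_greedy` in the test changes of this release.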

data/test/re_test.rb CHANGED

@@ -1,5 +1,10 @@
 #!/usr/bin/env ruby
+# Copyright 2009 by Jim Weirich (jim.weirich@gmail.com).
+# All rights reserved.
+#
+# Re is provided under the MIT open source license (see MIT-LICENSE)
 require 'test/unit'
 require 're'
@@ -59,7 +64,7 @@ class ReTest < Test::Unit::TestCase
   end
   def test_non_greedy_many
-    r =  re.any.many!.capture(:x) + re("b")
+    r =  re.any.non_greedy.many.capture(:x) + re("b")
     result = r.match("xbxb")
     assert result
     assert_equal "x", result[:x]
@@ -80,7 +85,7 @@ class ReTest < Test::Unit::TestCase
   end
   def test_non_greedy_one_or_more
-    r = re.any.one_or_more!.capture(:any) + re("b")
+    r = re.any.non_greedy.one_or_more.capture(:any) + re("b")
     result = r.match("xbxb")
     assert result
     assert_equal "x", result[:any]
@@ -102,6 +107,18 @@ class ReTest < Test::Unit::TestCase
     assert r !~ "aaaaa"
   end
+  def test_repeat_greedy
+    r = re("a").repeat(2, 4)
+    result = r =~ "aaaaa"
+    assert_equal "aaaa", result.full_match
+  end
+  def test_repeat_non_greedy
+    r = re("a").non_greedy.repeat(2, 4)
+    result = r =~ "aaaaa"
+    assert_equal "aa", result.full_match
+  end
   def test_at_least
     r = re("a").at_least(2).all
     assert r !~ "a"
@@ -109,6 +126,18 @@ class ReTest < Test::Unit::TestCase
     assert r =~ "aaaaaaaaaaaaaaaaaaaa"
   end
+  def test_at_least_greedy
+    r = re("a").at_least(2)
+    result =  r =~ "aaaa"
+    assert_equal "aaaa", result.full_match
+  end
+  def test_at_least_non_greedy
+    r = re("a").non_greedy.at_least(2)
+    result =  r =~ "aaa"
+    assert_equal "aa", result.full_match
+  end
   def test_at_most
     r = re("a").at_most(4).all
     assert r =~ ""
@@ -119,6 +148,24 @@ class ReTest < Test::Unit::TestCase
     assert r !~ "aaaaa"
   end
+  def test_at_most_greedy
+    r = re("a").at_most(4)
+    result = r =~ "aaaa"
+    assert_equal "aaaa", result.full_match
+  end
+  def test_at_most_non_greedy
+    r = re("a").non_greedy.at_most(4)
+    result = r =~ "aaaa"
+    if RUBY_VERSION < "1.9"
+      # Ruby 1.8.x seems to have a bug where non-greedy matches with
+      # intervals match at least one character.
+      assert_equal "a", result.full_match
+    else
+      assert_equal "", result.full_match
+    end
+  end
   def test_optional
     r = re("a").optional.all
     assert r =~ ""
@@ -494,6 +541,15 @@ class ReTest < Test::Unit::TestCase
     assert_equal "02", result[:month]
     assert_equal "14", result[:day]
   end
+  def test_name_map_returns_map_of_keywords
+    r = re("a").capture(:a) + re("b").capture(:b) + re("c").capture(:c)
+    result = r.match("abc")
+    assert result
+    assert_equal 1, r.name_map[:a]
+    assert_equal 2, r.name_map[:b]
+    assert_equal 3, r.name_map[:c]
+  end
   private

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: re
 version: !ruby/object:Gem::Version
-  version: 0.0.4
+  version: 0.0.5
 platform: ruby
 authors:
 - Jim Weirich
@@ -9,7 +9,7 @@ autorequire:
 bindir: bin
 cert_chain: []
-date: 2009-12-29 00:00:00 -05:00
+date: 2009-12-31 00:00:00 -05:00
 default_executable:
 dependencies: []