object_regex 1.0.0 → 1.0.1

Files changed (3)
  1. data/README.md +217 -246
  2. data/VERSION +1 -1
  3. metadata +4 -8
data/README.md CHANGED
@@ -1,3 +1,32 @@
+ # Super-Quick Introduction
+
+ Ruby 1.9+ only. Not for a strict technical reason, but because I like 1.9's standard-library features and didn't
+ want to rewrite the gem to avoid them.
+
+     gem install object_regex
+
+     require 'object_regex'
+     class Token < Struct.new(:type, :contents)
+       def reg_desc
+         type.to_s
+       end
+     end
+     input = [Token.new(:str, '"hello"'),
+              Token.new(:str, '"there"'),
+              Token.new(:int, '2'),
+              Token.new(:str, '"worldagain"'),
+              Token.new(:str, '"highfive"'),
+              Token.new(:int, '5'),
+              Token.new(:str, 'jklkjl'),
+              Token.new(:int, '3'),
+              Token.new(:comment, '#lol'),
+              Token.new(:str, ''),
+              Token.new(:comment, '#no pairs'),
+              Token.new(:str, 'jkl'),
+              Token.new(:eof, '')]
+     # all contiguous string tokens, and any number that follows them (if any)
+     ObjectRegex.new('str+ int?').all_matches(input)
+
  ## Introduction
 
 I present a small Ruby class which provides full Ruby Regexp matching on sequences of (potentially) heterogeneous objects, conditioned on those objects implementing a single, no-argument method returning a String. I propose it should be used to implement the desired behavior in the Ruby standard library.
@@ -12,35 +41,29 @@ I decided a while ago I wouldn't use [YARD](http://yardoc.org/)'s Ripper-based p
 
 Since Ripper strips the comments out when you use `Ripper.sexp`, and I'm not going to switch to the SAX model of parsing just for comments, I had to use `Ripper.lex` to grab the comments. I immediately found this would prove annoying:
 
- {{{
- pp Ripper.lex(" # some comment\n # another comment\n def abc; end")
- }}}
+     pp Ripper.lex(" # some comment\n # another comment\n def abc; end")
 
 gives
 
- {{{
- [[[1, 0], :on_sp, " "],
- [[1, 2], :on_comment, "# some comment\n"],
- [[2, 0], :on_sp, " "],
- [[2, 2], :on_comment, "# another comment\n"],
- [[3, 0], :on_sp, " "],
- [[3, 1], :on_kw, "def"],
- [[3, 4], :on_sp, " "],
- [[3, 5], :on_ident, "abc"],
- [[3, 8], :on_semicolon, ";"],
- [[3, 9], :on_sp, " "],
- [[3, 10], :on_kw, "end"]]
- }}}
+     [[[1, 0], :on_sp, " "],
+      [[1, 2], :on_comment, "# some comment\n"],
+      [[2, 0], :on_sp, " "],
+      [[2, 2], :on_comment, "# another comment\n"],
+      [[3, 0], :on_sp, " "],
+      [[3, 1], :on_kw, "def"],
+      [[3, 4], :on_sp, " "],
+      [[3, 5], :on_ident, "abc"],
+      [[3, 8], :on_semicolon, ";"],
+      [[3, 9], :on_sp, " "],
+      [[3, 10], :on_kw, "end"]]
 
 Naturally, Ripper is separating each line comment into its own token, even those that follow on subsequent lines. I'd have to combine those comment tokens to get what a typical programmer considers one logical comment.
 
 I didn't want to write an ugly, imperative algorithm to do this: part of the beauty of writing Ruby is that you don't often have to actually write a `while` loop. I described my frustration to my roommate, and he quickly observed the obvious connection to regular expressions. That's when I remembered [Ripper.slice and Ripper.token_match](http://ruby-doc.org/ruby-1.9/classes/Ripper.html#M001274) (token_match is undocumented), which provide almost exactly what I needed:
 
- {{{
- Ripper.slice(" # some comment\n # another comment\n def abc; end",
- 'comment (sp? comment)*')
- # => "# some comment\n # another comment\n"
- }}}
+     Ripper.slice(" # some comment\n # another comment\n def abc; end",
+                  'comment (sp? comment)*')
+     # => "# some comment\n # another comment\n"
 
 A few problems: `Ripper.slice` lexes its input on each invocation and then searches it from the start for one match. I need *all* matches. `Ripper.slice` also returns the exact string, and not the location in the source text of the match, which I need - how else will I know where the comments are? The lexer output includes line and column locations, so it should be easy to retrieve.
 
@@ -52,10 +75,8 @@ The core of regular expressions - the [actually "regular" kind](http://en.wikipe
 
 We could construct a separate DFA engine for searching sequences of our new alphabet, but we'd much rather piggyback on an existing (and more-featured) implementation. Since the set of token types is countable, one can create a one-to-one mapping from token types to finite strings of an alphabet that Ruby's `Regexp` class can search, namely regular old characters. If we replace each occurrence of a member of our alphabet with a member of the target Regexp alphabet, then we should be able to use Regexp to do regex searching on our token sequence. That transformation on the token sequence is easy: just map each token's type onto some string using a 1-to-1 function. However, one important bit that remains is how the search pattern is specified. As you saw above, we used:
 
- {{{
- 'comment (sp? comment)*'
- }}}
-
+     'comment (sp? comment)*'
+
 to specify a search for "a comment token, followed by zero or more groups, where each group is an optional space token followed by a comment token." This departs from traditional Regexp syntax because our alphabet is no longer composed of individual characters; it is composed of tokens. For this implementation's sake, we can observe that we require whitespace to be insignificant, and that the `?` and `*` operators apply to tokens, not to characters. We could specify this input however we like, as long as we can generate the correct string-searching pattern from it.
 
 One last observation allows us to use Regexp to search our tokens: we must be able to specify a one-to-one function from a token name to the set of tokens that it should match. In other words, no two tokens that we consider "different" can have the same token type. For a normal Regex, this is a trivial condition, as a character matches only that character. However, 'comment' must match the infinite set of all comment tokens. If we satisfy that condition, then there exists a function from a regex on token types to a regex on strings. This is still pretty trivial to show for tokens, but later, when we generalize this approach further, it becomes even more important to do correctly.
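 As a quick illustration of that transformation, the whole trick fits in a couple of lines of plain Ruby. The two-entry `MAP` here is made up for the example, not taken from Ripper:
 
     # Illustrative only: a hand-rolled two-token alphabet.
     MAP = { 'comment' => 'f', 'sp' => 'L' }
     
     pattern = 'comment (sp? comment)*'
     # Swap each token name for its character, then drop the (insignificant) whitespace.
     translated = pattern.gsub(/[A-Za-z]+/) { |tok| MAP[tok] }.gsub(/\s/, '')
     translated  # => "f(L?f)*"

```ruby
# Illustrative only: a hand-rolled two-token alphabet.
MAP = { 'comment' => 'f', 'sp' => 'L' }

pattern = 'comment (sp? comment)*'
# Swap each token name for its character, then drop the (insignificant) whitespace.
translated = pattern.gsub(/[A-Za-z]+/) { |tok| MAP[tok] }.gsub(/\s/, '')
translated  # => "f(L?f)*"
```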
@@ -78,31 +99,27 @@ Let's run through the previous example:
 
 Ripper runs this code at load-time:
 
- {{{
- seed = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
- SCANNER_EVENT_TABLE.each do |ev, |
- raise CompileError, "[RIPPER FATAL] too many system token" if seed.empty?
- MAP[ev.to_s.sub(/\Aon_/,'')] = seed.shift
- end
- }}}
+     seed = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
+     SCANNER_EVENT_TABLE.each do |ev, |
+       raise CompileError, "[RIPPER FATAL] too many system token" if seed.empty?
+       MAP[ev.to_s.sub(/\Aon_/,'')] = seed.shift
+     end
 
 I fired up an `irb` instance and checked the result:
 
- {{{
- Ripper::TokenPattern::MAP
- # => {"CHAR"=>"a", "__end__"=>"b", "backref"=>"c", "backtick"=>"d",
- "comma"=>"e", "comment"=>"f", "const"=>"g", "cvar"=>"h", "embdoc"=>"i",
- "embdoc_beg"=>"j", "embdoc_end"=>"k", "embexpr_beg"=>"l",
- "embexpr_end"=>"m", "embvar"=>"n", "float"=>"o", "gvar"=>"p",
- "heredoc_beg"=>"q", "heredoc_end"=>"r", "ident"=>"s", "ignored_nl"=>"t",
- "int"=>"u", "ivar"=>"v", "kw"=>"w", "label"=>"x", "lbrace"=>"y",
- "lbracket"=>"z", "lparen"=>"A", "nl"=>"B", "op"=>"C", "period"=>"D",
- "qwords_beg"=>"E", "rbrace"=>"F", "rbracket"=>"G", "regexp_beg"=>"H",
- "regexp_end"=>"I", "rparen"=>"J", "semicolon"=>"K", "sp"=>"L",
- "symbeg"=>"M", "tlambda"=>"N", "tlambeg"=>"O", "tstring_beg"=>"P",
- "tstring_content"=>"Q", "tstring_end"=>"R", "words_beg"=>"S",
- "words_sep"=>"T"}
- }}}
+     Ripper::TokenPattern::MAP
+     # => {"CHAR"=>"a", "__end__"=>"b", "backref"=>"c", "backtick"=>"d",
+     "comma"=>"e", "comment"=>"f", "const"=>"g", "cvar"=>"h", "embdoc"=>"i",
+     "embdoc_beg"=>"j", "embdoc_end"=>"k", "embexpr_beg"=>"l",
+     "embexpr_end"=>"m", "embvar"=>"n", "float"=>"o", "gvar"=>"p",
+     "heredoc_beg"=>"q", "heredoc_end"=>"r", "ident"=>"s", "ignored_nl"=>"t",
+     "int"=>"u", "ivar"=>"v", "kw"=>"w", "label"=>"x", "lbrace"=>"y",
+     "lbracket"=>"z", "lparen"=>"A", "nl"=>"B", "op"=>"C", "period"=>"D",
+     "qwords_beg"=>"E", "rbrace"=>"F", "rbracket"=>"G", "regexp_beg"=>"H",
+     "regexp_end"=>"I", "rparen"=>"J", "semicolon"=>"K", "sp"=>"L",
+     "symbeg"=>"M", "tlambda"=>"N", "tlambeg"=>"O", "tstring_beg"=>"P",
+     "tstring_content"=>"Q", "tstring_end"=>"R", "words_beg"=>"S",
+     "words_sep"=>"T"}
 
 This is completely implementation-dependent, but these characters are an implementation detail of the algorithm anyway.
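 That load-time loop is easy to replay in isolation; here is a sketch with a stand-in event list in place of `SCANNER_EVENT_TABLE` (the first five events, in the order that yields the table above):

```ruby
seed = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
events = [:on_CHAR, :on___end__, :on_backref, :on_backtick, :on_comma]

map = {}
events.each do |ev|
  # Ripper raises CompileError here; a plain RuntimeError suffices for the sketch.
  raise "too many system token" if seed.empty?
  map[ev.to_s.sub(/\Aon_/, '')] = seed.shift
end

map  # => {"CHAR"=>"a", "__end__"=>"b", "backref"=>"c", "backtick"=>"d", "comma"=>"e"}
```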
 
@@ -110,27 +127,21 @@ This is completely implementation-dependent, but these characters are an impleme
 
 Ripper implements this as follows:
 
- {{{
- def map_tokens(tokens)
- tokens.map {|pos,type,str| map_token(type.to_s.sub(/\Aon_/,'')) }.join
- end
- }}}
+     def map_tokens(tokens)
+       tokens.map {|pos,type,str| map_token(type.to_s.sub(/\Aon_/,'')) }.join
+     end
 
 Running this on our token stream from before (markdown doesn't support anchors, so scroll up if necessary), we get this:
 
- {{{
- "LfLfLwLsKLw"
- }}}
-
+     "LfLfLwLsKLw"
+
 This is what we will eventually run our modified Regexp against.
 
 ### The search pattern is transformed into a pattern that can search this mapped representation of the token sequence. Each token found in the search pattern is replaced by its corresponding single character, and whitespace is removed.
 
 What we want is `comment (sp? comment)*`. In this mapped representation, a quick look at the table above shows the regex we need is
 
- {{{
- /f(L?f)*/
- }}}
+     /f(L?f)*/
 
 Ripper implements this in a somewhat roundabout fashion, as it seems they wanted to experiment with slightly different syntax. Since my implementation (which I'll present shortly) does not retain these syntax changes, I choose not to list the Ripper version here.
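 The same mapped string can be produced without touching Ripper's private `map_token`; a self-contained sketch using only the subset of the (implementation-dependent) table we need, under a hypothetical `CHAR_FOR` name:

```ruby
# Subset of the character table shown above; the name CHAR_FOR is illustrative.
CHAR_FOR = { 'sp' => 'L', 'comment' => 'f', 'kw' => 'w',
             'ident' => 's', 'semicolon' => 'K' }

def map_tokens(tokens)
  tokens.map { |pos, type, str| CHAR_FOR[type.to_s.sub(/\Aon_/, '')] }.join
end

tokens = [[[1, 0], :on_sp, " "],
          [[1, 2], :on_comment, "# some comment\n"],
          [[2, 0], :on_sp, " "],
          [[2, 2], :on_comment, "# another comment\n"],
          [[3, 0], :on_sp, " "],
          [[3, 1], :on_kw, "def"],
          [[3, 4], :on_sp, " "],
          [[3, 5], :on_ident, "abc"],
          [[3, 8], :on_semicolon, ";"],
          [[3, 9], :on_sp, " "],
          [[3, 10], :on_kw, "end"]]

map_tokens(tokens)  # => "LfLfLwLsKLw"
```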
 
@@ -140,33 +151,27 @@ We run `/f(L?f)*/` on `"LfLfLwLsKLw"`. It matches `fLf` at position 1.
 
 As expected, the implementation is quite simple for Ripper:
 
- {{{
- def match_list(tokens)
- if m = @re.match(map_tokens(tokens))
- then MatchData.new(tokens, m)
- else nil
- end
- end
- }}}
+     def match_list(tokens)
+       if m = @re.match(map_tokens(tokens))
+       then MatchData.new(tokens, m)
+       else nil
+       end
+     end
 
 ### Since each character in the mapped sequence corresponds to a single token, we can index into the original token sequence using the exact boundaries of the match result.
 
 The boundaries returned were `[1..4)` in mathematical notation, or `(1...4)`/`(1..3)` as Ruby ranges. We then use this range on the original sequence, which returns:
 
- {{{
- [[[1, 2], :on_comment, "# some comment\n"],
- [[2, 0], :on_sp, " "],
- [[2, 2], :on_comment, "# another comment\n"]]
- }}}
+     [[[1, 2], :on_comment, "# some comment\n"],
+      [[2, 0], :on_sp, " "],
+      [[2, 2], :on_comment, "# another comment\n"]]
 
 The implementation is again quite simple in Ripper, yet for some reason it immediately extracts the token contents:
 
- {{{
- def match(n = 0)
- return [] unless @match
- @tokens[@match.begin(n)...@match.end(n)].map {|pos,type,str| str }
- end
- }}}
+     def match(n = 0)
+       return [] unless @match
+       @tokens[@match.begin(n)...@match.end(n)].map {|pos,type,str| str }
+     end
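 The boundary arithmetic is easy to check in plain Ruby (token symbols stand in for the full lexer tuples):

```ruby
mapped = "LfLfLwLsKLw"
m = /f(L?f)*/.match(mapped)

# One character per token, so the match offsets index the token sequence directly.
tokens = [:sp, :comment, :sp, :comment, :sp, :kw,
          :sp, :ident, :semicolon, :sp, :kw]

m.begin(0)                     # => 1
m.end(0)                       # => 4
tokens[m.begin(0)...m.end(0)]  # => [:comment, :sp, :comment]
```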
 
  ## Generalization
 
@@ -187,74 +192,72 @@ For lack of a better name, we'll call this an `ObjectRegex`.
 
 The full listing follows. You'll quickly notice that I haven't yet implemented the API that I actually need for Wool. Keeping focused seems incompatible with curiosity in my case, unfortunately.
 
- {{{
- class ObjectRegex
-   def initialize(pattern)
-     @map = generate_map(pattern)
-     @pattern = generate_pattern(pattern)
-   end
-
-   def mapped_value(reg_desc)
-     @map[reg_desc] || @map[:FAILBOAT]
-   end
-
-   MAPPING_CHARS = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
-   def generate_map(pattern)
-     alphabet = pattern.scan(/[A-Za-z]+/).uniq
-     repr_size = Math.log(alphabet.size + 1, MAPPING_CHARS.size).ceil
-     @item_size = repr_size + 1
-
-     map = Hash[alphabet.map.with_index do |symbol, idx|
-       [symbol, mapping_for_idx(repr_size, idx)]
-     end]
-     map.merge!(FAILBOAT: mapping_for_idx(repr_size, map.size))
-   end
-
-   def mapping_for_idx(repr_size, idx)
-     convert_to_mapping_radix(repr_size, idx).map do |char|
-       MAPPING_CHARS[char]
-     end.join + ';'
-   end
-
-   def convert_to_mapping_radix(repr_size, num)
-     result = []
-     repr_size.times do
-       result.unshift(num % MAPPING_CHARS.size)
-       num /= MAPPING_CHARS.size
-     end
-     result
-   end
-
-   def generate_pattern(pattern)
-     replace_tokens(fix_dots(remove_ranges(pattern)))
-   end
-
-   def remove_ranges(pattern)
-     pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
-       '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
-     end
-   end
-
-   def fix_dots(pattern)
-     pattern.gsub('.', '.' * (@item_size - 1) + ';')
-   end
-
-   def replace_tokens(pattern)
-     pattern.gsub(/[A-Za-z]+/) do |match|
-       '(?:' + mapped_value(match) + ')'
-     end.gsub(/\s/, '')
-   end
-
-   def match(input)
-     new_input = input.map { |object| object.reg_desc }.
-                       map { |desc| mapped_value(desc) }.join
-     if (match = new_input.match(@pattern))
-       start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
-       input[start...stop]
-     end
-   end
- end
- }}}
+     class ObjectRegex
+       def initialize(pattern)
+         @map = generate_map(pattern)
+         @pattern = generate_pattern(pattern)
+       end
+
+       def mapped_value(reg_desc)
+         @map[reg_desc] || @map[:FAILBOAT]
+       end
+
+       MAPPING_CHARS = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a
+       def generate_map(pattern)
+         alphabet = pattern.scan(/[A-Za-z]+/).uniq
+         repr_size = Math.log(alphabet.size + 1, MAPPING_CHARS.size).ceil
+         @item_size = repr_size + 1
+
+         map = Hash[alphabet.map.with_index do |symbol, idx|
+           [symbol, mapping_for_idx(repr_size, idx)]
+         end]
+         map.merge!(FAILBOAT: mapping_for_idx(repr_size, map.size))
+       end
+
+       def mapping_for_idx(repr_size, idx)
+         convert_to_mapping_radix(repr_size, idx).map do |char|
+           MAPPING_CHARS[char]
+         end.join + ';'
+       end
+
+       def convert_to_mapping_radix(repr_size, num)
+         result = []
+         repr_size.times do
+           result.unshift(num % MAPPING_CHARS.size)
+           num /= MAPPING_CHARS.size
+         end
+         result
+       end
+
+       def generate_pattern(pattern)
+         replace_tokens(fix_dots(remove_ranges(pattern)))
+       end
+
+       def remove_ranges(pattern)
+         pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
+           '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
+         end
+       end
+
+       def fix_dots(pattern)
+         pattern.gsub('.', '.' * (@item_size - 1) + ';')
+       end
+
+       def replace_tokens(pattern)
+         pattern.gsub(/[A-Za-z]+/) do |match|
+           '(?:' + mapped_value(match) + ')'
+         end.gsub(/\s/, '')
+       end
+
+       def match(input)
+         new_input = input.map { |object| object.reg_desc }.
+                           map { |desc| mapped_value(desc) }.join
+         if (match = new_input.match(@pattern))
+           start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
+           input[start...stop]
+         end
+       end
+     end
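 The listing is complete enough to exercise end to end. The sketch below repeats the class verbatim (condensed only in block style) so it runs standalone, then drives it with the `Token` struct from the Super-Quick Introduction; `all_matches` is not part of this listing, so only `#match` - which returns the first (leftmost, longest) matching subsequence - is shown:

```ruby
# Copied from the listing above, condensed only in block style.
class ObjectRegex
  def initialize(pattern)
    @map = generate_map(pattern)
    @pattern = generate_pattern(pattern)
  end

  def mapped_value(reg_desc)
    @map[reg_desc] || @map[:FAILBOAT]
  end

  MAPPING_CHARS = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a

  def generate_map(pattern)
    alphabet = pattern.scan(/[A-Za-z]+/).uniq
    repr_size = Math.log(alphabet.size + 1, MAPPING_CHARS.size).ceil
    @item_size = repr_size + 1
    map = Hash[alphabet.map.with_index do |symbol, idx|
      [symbol, mapping_for_idx(repr_size, idx)]
    end]
    map.merge!(FAILBOAT: mapping_for_idx(repr_size, map.size))
  end

  def mapping_for_idx(repr_size, idx)
    convert_to_mapping_radix(repr_size, idx).map { |c| MAPPING_CHARS[c] }.join + ';'
  end

  def convert_to_mapping_radix(repr_size, num)
    result = []
    repr_size.times do
      result.unshift(num % MAPPING_CHARS.size)
      num /= MAPPING_CHARS.size
    end
    result
  end

  def generate_pattern(pattern)
    replace_tokens(fix_dots(remove_ranges(pattern)))
  end

  def remove_ranges(pattern)
    pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
      '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
    end
  end

  def fix_dots(pattern)
    pattern.gsub('.', '.' * (@item_size - 1) + ';')
  end

  def replace_tokens(pattern)
    pattern.gsub(/[A-Za-z]+/) { |m| '(?:' + mapped_value(m) + ')' }.gsub(/\s/, '')
  end

  def match(input)
    new_input = input.map { |obj| obj.reg_desc }.map { |desc| mapped_value(desc) }.join
    if (match = new_input.match(@pattern))
      start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
      input[start...stop]
    end
  end
end

# Drive it with the Token struct from the Super-Quick Introduction.
Token = Struct.new(:type, :contents) do
  def reg_desc
    type.to_s
  end
end

input = [Token.new(:str, '"hello"'),
         Token.new(:str, '"there"'),
         Token.new(:int, '2'),
         Token.new(:comment, '#lol'),
         Token.new(:str, 'jkl')]

result = ObjectRegex.new('str+ int?').match(input)
result.map(&:type)  # => [:str, :str, :int]
```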
 
  ## Generalized Map Generation
 
@@ -262,61 +265,47 @@ Generating the map is the primary interest here, so I'll start there.
 
 First, we discover the alphabet by extracting all matches for `/[A-Za-z]+/` from the input pattern.
 
- {{{
- alphabet = pattern.scan(/[A-Za-z]+/).uniq
- }}}
+     alphabet = pattern.scan(/[A-Za-z]+/).uniq
 
 We figure out how many characters we need to represent that many elements, and save that for later:
 
- {{{
- # alphabet.size + 1 because of the catch-all, "not-in-pattern" mapping
- repr_size = Math.log(alphabet.size + 1, MAPPING_CHARS.size).ceil
- # repr_size + 1 because we will be inserting a terminator in a moment
- @item_size = repr_size + 1
- }}}
+     # alphabet.size + 1 because of the catch-all, "not-in-pattern" mapping
+     repr_size = Math.log(alphabet.size + 1, MAPPING_CHARS.size).ceil
+     # repr_size + 1 because we will be inserting a terminator in a moment
+     @item_size = repr_size + 1
 
 Now, we just calculate the [symbol, mapped\_symbol] pairs for each symbol in the input alphabet:
 
- {{{
- map = Hash[alphabet.map.with_index do |symbol, idx|
- [symbol, mapping_for_idx(repr_size, idx)]
- end]
- }}}
+     map = Hash[alphabet.map.with_index do |symbol, idx|
+       [symbol, mapping_for_idx(repr_size, idx)]
+     end]
 
 We'll come back to how this works, but we must add the catch-all map entry: the entry that is triggered if we see a token in the searched sequence that didn't appear in the search pattern:
 
- {{{
- map.merge!(FAILBOAT: mapping_for_idx(repr_size, map.size))
- }}}
+     map.merge!(FAILBOAT: mapping_for_idx(repr_size, map.size))
 
 Note that we avoid the use of the `inject({})` idiom common for constructing Hashes, since the computation of each tuple is independent of the others. `mapping_for_idx` is responsible for finding the mapped string for the given element. In Ripper, this was just an index into an array. However, if we want more than 62 possible elements in our alphabet, we instead need to convert the index into a base-62 number first. `convert_to_mapping_radix` does this, using the size of the `MAPPING_CHARS` constant as the new radix:
 
- {{{
- # Standard radix conversion.
- def convert_to_mapping_radix(repr_size, num)
- result = []
- repr_size.times do
- result.unshift(num % MAPPING_CHARS.size)
- num /= MAPPING_CHARS.size
- end
- result
- end
- }}}
+     # Standard radix conversion.
+     def convert_to_mapping_radix(repr_size, num)
+       result = []
+       repr_size.times do
+         result.unshift(num % MAPPING_CHARS.size)
+         num /= MAPPING_CHARS.size
+       end
+       result
+     end
 
 If MAPPING\_CHARS.size = 62, then:
 
- {{{
- convert_to_mapping_radix(3, 12498)
- # => [3, 15, 36]
- }}}
+     convert_to_mapping_radix(3, 12498)
+     # => [3, 15, 36]
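 A quick sanity check of that conversion (the method is copied verbatim from the listing; the arithmetic is spelled out in the comment):

```ruby
MAPPING_CHARS = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a

def convert_to_mapping_radix(repr_size, num)
  result = []
  repr_size.times do
    result.unshift(num % MAPPING_CHARS.size)
    num /= MAPPING_CHARS.size
  end
  result
end

# 12498 = 3 * 62**2 + 15 * 62 + 36, hence the three base-62 digits:
convert_to_mapping_radix(3, 12498)  # => [3, 15, 36]
```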
 
 After we convert each number into the necessary radix, we can then convert that array of place-value integers into a string by mapping each place value to its corresponding character in the MAPPING\_CHARS array:
 
- {{{
- def mapping_for_idx(repr_size, idx)
- convert_to_mapping_radix(repr_size, idx).map { |char| MAPPING_CHARS[char] }.join + ';'
- end
- }}}
+     def mapping_for_idx(repr_size, idx)
+       convert_to_mapping_radix(repr_size, idx).map { |char| MAPPING_CHARS[char] }.join + ';'
+     end
 
 Notice that we added a semicolon at the end there. The choice of semicolon was arbitrary - it could be any valid character that isn't in MAPPING\_CHARS. Why'd I add that?
 
@@ -326,97 +315,79 @@ Imagine we were searching for a long input sequence that needed 2 characters per
 
 After building the new map, constructing the corresponding search pattern is quite simple:
 
- {{{
- def generate_pattern(pattern)
-   replace_tokens(fix_dots(remove_ranges(pattern)))
- end
-
- def remove_ranges(pattern)
-   pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
-     '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
-   end
- end
-
- def fix_dots(pattern)
-   pattern.gsub('.', '.' * (@item_size - 1) + ';')
- end
-
- def replace_tokens(pattern)
-   pattern.gsub(/[A-Za-z]+/) do |match|
-     '(?:' + mapped_value(match) + ')'
-   end.gsub(/\s/, '')
- end
- }}}
+     def generate_pattern(pattern)
+       replace_tokens(fix_dots(remove_ranges(pattern)))
+     end
+
+     def remove_ranges(pattern)
+       pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
+         '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
+       end
+     end
+
+     def fix_dots(pattern)
+       pattern.gsub('.', '.' * (@item_size - 1) + ';')
+     end
+
+     def replace_tokens(pattern)
+       pattern.gsub(/[A-Za-z]+/) do |match|
+         '(?:' + mapped_value(match) + ')'
+       end.gsub(/\s/, '')
+     end
 
 First, we have to account for this regex syntax:
 
- {{{
- [comment embdoc_beg int]
- }}}
+     [comment embdoc_beg int]
 
 which we assume to mean "comment or embdoc_beg or int", much like `[Acf]` means "A or c or f". Since constructs such as `A-Z` don't make sense with an arbitrary alphabet, we don't need to concern ourselves with that syntax. However, if we simply replace "comment" with its mapped string, and do the same with embdoc_beg and int, we get something like this:
 
- {{{
- [f;j;u;]
- }}}
+     [f;j;u;]
 
 which won't work: it'll match any semicolon! So we manually replace all instances of `[tok1 tok2 ... tokn]` with `(?:tok1|tok2|...|tokn)`. A simple gsub does the trick, since nested ranges don't really make much sense. This is implemented in #remove\_ranges:
 
- {{{
- def remove_ranges(pattern)
-   pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
-     '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
-   end
- end
- }}}
+     def remove_ranges(pattern)
+       pattern.gsub(/\[([A-Za-z ]*)\]/) do |match|
+         '(?:' + match[1..-2].split(/\s+/).join('|') + ')'
+       end
+     end
 
 Next, we replace the '.' matcher with a sequence of dots equal to the size of our token mapping, followed by a semicolon: this is how we properly match "any alphabet element" in our mapped form.
 
- {{{
- def fix_dots(pattern)
-   pattern.gsub('.', '.' * (@item_size - 1) + ';')
- end
- }}}
+     def fix_dots(pattern)
+       pattern.gsub('.', '.' * (@item_size - 1) + ';')
+     end
 
 Then, we simply replace each alphabet element with its mapped value. Since those mapped values could be more than one character, we must group them for other Regex features such as `+` or `*` to work properly; since we may want to extract subexpressions, we must make the group we introduce here non-capturing. Then we just strip whitespace.
 
- {{{
- def replace_tokens(pattern)
-   pattern.gsub(/[A-Za-z]+/) do |match|
-     '(?:' + mapped_value(match) + ')'
-   end.gsub(/\s/, '')
- end
- }}}
+     def replace_tokens(pattern)
+       pattern.gsub(/[A-Za-z]+/) do |match|
+         '(?:' + mapped_value(match) + ')'
+       end.gsub(/\s/, '')
+     end
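 In isolation, the dot rewrite looks like this. The sketch passes `@item_size` in as an explicit parameter so it runs standalone; `item_size` is 2 here, i.e. one mapping character plus the `';'` terminator:

```ruby
# Adapted from fix_dots in the listing: item_size is a parameter here,
# rather than the @item_size instance variable, so the sketch is standalone.
def fix_dots(pattern, item_size)
  pattern.gsub('.', '.' * (item_size - 1) + ';')
end

fix_dots('comment . comment', 2)  # => "comment .; comment"
```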
 
  ## Generalized Matching
 
 Lastly, we have a simple #match method:
 
- {{{
- def match(input)
-   new_input = input.map { |object| object.reg_desc }.map { |desc| mapped_value(desc) }.join
-   if (match = new_input.match(@pattern))
-     start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
-     input[start...stop]
-   end
- end
- }}}
+     def match(input)
+       new_input = input.map { |object| object.reg_desc }.map { |desc| mapped_value(desc) }.join
+       if (match = new_input.match(@pattern))
+         start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
+         input[start...stop]
+       end
+     end
 
 While there are many ways of extracting results from a Regex match, here we do the simplest: return the subsequence of the original sequence that matches first (using the usual leftmost, longest rule, of course). Here comes the one part where you have to modify the objects that are in the sequence: in the first line, you'll see:
 
- {{{
- input.map { |object| object.reg_desc }.map { |desc| mapped_value(desc) }
- }}}
+     input.map { |object| object.reg_desc }.map { |desc| mapped_value(desc) }
 
 This interrogates each object for its string representation: the string you typed into your search pattern if you wanted to find it. The method name (`reg_desc` in this case) is arbitrary, and this could also be implemented by providing a `Proc` to the ObjectRegex at initialization, and having the Proc be responsible for determining string representations.
 
 We also see, on the 3rd and 4th lines of the method, why we stored @item\_size earlier: for boundary calculations:
 
- {{{
- start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
- input[start...stop]
- }}}
-
+     start, stop = match.begin(0) / @item_size, match.end(0) / @item_size
+     input[start...stop]
+
 Sometimes I wish `begin` and `end` could be local variable names in Ruby. Alas.
 
  ## Conclusion
data/VERSION CHANGED
@@ -1 +1 @@
- 1.0.0
+ 1.0.1
metadata CHANGED
@@ -5,8 +5,8 @@ version: !ruby/object:Gem::Version
 segments:
 - 1
 - 0
- - 0
- version: 1.0.0
+ - 1
+ version: 1.0.1
 platform: ruby
 authors:
 - Michael Edgar
@@ -14,14 +14,13 @@ autorequire:
 bindir: bin
 cert_chain: []
 
- date: 2011-01-25 00:00:00 -05:00
+ date: 2011-01-31 00:00:00 -05:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
 name: rspec
 prerelease: false
 requirement: &id001 !ruby/object:Gem::Requirement
- none: false
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
@@ -36,7 +35,6 @@ dependencies:
 name: yard
 prerelease: false
 requirement: &id002 !ruby/object:Gem::Requirement
- none: false
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
@@ -78,7 +76,6 @@ rdoc_options:
 require_paths:
 - lib
 required_ruby_version: !ruby/object:Gem::Requirement
- none: false
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
@@ -86,7 +83,6 @@ required_ruby_version: !ruby/object:Gem::Requirement
 - 0
 version: "0"
 required_rubygems_version: !ruby/object:Gem::Requirement
- none: false
 requirements:
 - - ">="
 - !ruby/object:Gem::Version
@@ -96,7 +92,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 
 rubyforge_project:
- rubygems_version: 1.3.7
+ rubygems_version: 1.3.6
 signing_key:
 specification_version: 3
 summary: Perform regex searches on arbitrary sequences.