RubyGems - scanner - Versions diffs - 0.0.2 → 0.0.3 - Mend

scanner 0.0.2 → 0.0.3

Files changed (5)

data/README.md +131 -4
data/lib/scanner/scanner.rb +29 -14
data/lib/scanner/version.rb +1 -1
data/spec/scanner/scanner_spec.rb +15 -0
metadata +4 -4

data/README.md CHANGED

@@ -24,19 +24,146 @@ Scanner is a module that you can include in your classes. It defines a
 token function that accepts the regular expression that the token
 matches.
-Example code
+For example
     class TestScanner
       include Scanner
-      ignore /\s+/
-      token :number, /\d+/
-      token :id, /\w+/
+      ignore '\s+'
+      token :number, '\d+'
+      token :id, '[a-z]+'
     end
     @scanner = TestScanner.new
     @scanner.parse("123")
     @scanner.look_ahead.is?(:number) # Should be true
+### Token definition
+Each token is defined by a symbol, used to identify the token, and a
+regular expression that the token should match. An optional third
+parameter accepts a hash of options that we will explore later. For
+example
+    token :number, '\d+'
+will match a run of one or more digits.
+Some care is needed when defining tokens that collide with other
+tokens. For instance, a language may define the token '==' and the
+token '='. You need to define the double equals before the single
+equals, otherwise the string '==' will be identified as two '=' tokens,
+instead of a '==' token.
+### Ignoring characters
+For many scanning needs, there is a set of characters that can safely
+be ignored; for instance, spaces and newlines in many programming
+languages. You can define the set of characters to ignore with the
+following definition:
+    ignore '[\s|\n]+'
+### Defining keywords
+For many scanning needs, there is a set of tokens that define the
+reserved words or keywords of a language. For instance, in Ruby, the
+tokens 'def', 'class', 'module', and so on, are language reserved words.
+Usually, these tokens are a subset of a larger token group, called
+identifiers or ids. You can define a family of reserved words by using
+the 'keywords' function.
+    ignore '[\s|\n]+'
+    token :id, '[a-z]+'
+    keywords %w{def class module}
+    @scanner.parse("other def")
+    @scanner.lookahead.is?(:id)
+    @scanner.lookahead(2).is?(:def)
+Note that you need a token definition that matches those keywords,
+such as the token :id in the example above.
+### Consuming tokens and looking ahead
+The Scanner method consume will try to match the first token remaining
+in the input string. If successful, it will return the token, and remove
+it from the input string.
+    ignore '[\s|\n]+'
+    token :id, '[a-z]+'
+    @scanner.parse("one two")
+    @scanner.consume.content == "one"
+    @scanner.consume.content == "two"
+Lookahead performs a similar function, but without removing the token
+from the string. It accepts an optional parameter indicating the number
+of tokens to look ahead.
+    @scanner.parse("one two")
+    @scanner.lookahead.content == "one"
+    @scanner.lookahead(2).content == "two"
+### End of file
+You know you have reached the end of the parse string when you receive
+the :eof token. For instance
+    ignore '\s+'
+    token :number, '\d+'
+    token :id, '[a-z]+'
+    @scanner = TestScanner.new
+    @scanner.parse("123 abc 456 other")
+    begin
+      token = @scanner.consume
+      puts token.content
+    end while token.is_not? :eof
+### Looping through tokens
+A scanner instance is a Ruby Enumerable, so you can use each, map, and
+the other Enumerable methods.
+      @scanner.parse("123 456")
+      @scanner.map { |tok| "-#{tok.content}-" }
+### Token separation
+Sometimes it is necessary to indicate that a given token needs to be
+followed by a token separator. For instance, in this example
+      token :number, '\d+'
+      token :id, '[a-z]+'
+The string "abc123" will be parsed as an :id followed by a :number,
+which may be undesirable. You may want to indicate that a token
+separator (commonly spaces, arithmetic operators, punctuation marks,
+etc) needs to occur after :id or :number.
+The following code requires a space after ids and numbers:
+      token :number, '\d+', check_for_token_separator: true
+      token :id, '[a-z]+', check_for_token_separator: true
+      token_separator '\s'
+### Looking ahead for token types
+When scanning strings, it is often necessary to look ahead to check what
+types of tokens are coming. For instance:
+    if @scanner.lookahead.is?(:id) && @scanner.lookahead(2).is?(:equal)
+      # variable assignment
+Scanner provides a few utility functions to make this type of check
+easier. For instance, the previous code could be refactored to:
+    if @scanner.tokens_are?(:id, :equal)
+The other two methods available are token_is? and token_is_not?.
+### Tokens
+The tokens returned by consume and lookahead have a few methods, which
+should be self-explanatory:
+* content
+* line
+* column
+* is? => Checks that the token is of a given type
+* is_not? => The opposite
 ## Contributing
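
Note on the README's token-ordering advice ('==' must be defined before '='): the effect can be reproduced outside the gem. The following is a minimal sketch that uses Ruby's standard StringScanner rather than the scanner gem itself; the helper `tokenize` and the token names `:double_equal` and `:equal` are hypothetical and only illustrate first-match-wins rule ordering.

```ruby
require 'strscan'

# Hypothetical helper: tries each [type, pattern] rule in the order given
# and records the first rule that matches at the current position.
def tokenize(input, rules)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    type, _pattern = rules.find { |_, pattern| ss.scan(pattern) }
    raise "no rule matches at position #{ss.pos}" unless type
    tokens << [type, ss.matched]
  end
  tokens
end

double_first = [[:double_equal, /==/], [:equal, /=/]]
single_first = [[:equal, /=/], [:double_equal, /==/]]

p tokenize("==", double_first) # one :double_equal token
p tokenize("==", single_first) # two :equal tokens
```

With the single equals listed first, the longer rule is never reached, which is the collision the README warns about.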

data/lib/scanner/scanner.rb CHANGED

@@ -50,38 +50,38 @@ module Scanner
   def check_for_token_separator
     self.class.instance_eval { @check_for_token_separator }
-  end
+  end
   def separator
     self.class.instance_eval { @separator }
-  end
+  end
   public
+  include Enumerable
   def parse(program)
     @program = program
     @token_list = []
     @line_number = 1
     @column_number = 1
+    @token_number = 0
   end
   def consume
-    if @token_list.empty?
+    if @token_number >= @token_list.size
       consume_next_token
-    else
-      @token_list.shift
     end
+    token = @token_list[@token_number]
+    @token_number += 1
+    token
   end
   def look_ahead(number_of_tokens = 1)
-    end_of_file_met = false
-    while @token_list.size < number_of_tokens
-      throw :scanner_exception if end_of_file_met
-      token = consume_next_token
-      @token_list << token
-      end_of_file_met = token.is? :eof
+    while @token_list.size < @token_number + number_of_tokens
+      consume_next_token
     end
-    @token_list[-1]
+    @token_list[@token_number + number_of_tokens - 1]
   end
   def token_is?(token_type)
@@ -101,6 +101,20 @@ module Scanner
     return true
   end
+  def each
+    local_index = 0
+    begin
+      if local_index >= @token_list.size
+        consume_next_token
+      end
+      current_token = @token_list[local_index]
+      if current_token.is_not? :eof
+        yield current_token
+      end
+      local_index += 1
+    end while current_token.is_not? :eof
+  end
   private
@@ -114,7 +128,8 @@ module Scanner
         if check_for_token_separator[symbol]
           check_for_separator
         end
-        return Token.new(token_type, content, @line_number, currently_at_column)
+        @token_list << Token.new(token_type, content, @line_number, currently_at_column)
+        return
       end
     end
@@ -128,7 +143,7 @@ module Scanner
   def get_token_from_reg_exp(reg_exp, symbol)
     content = consume_regular_expression(reg_exp)
-    if keywords.include? content
+    if keywords && keywords.include?(content)
       token_type = content.to_sym
     else
       token_type = symbol
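
Note on the buffering change in this file: consume no longer shifts tokens off @token_list but advances a @token_number cursor, look_ahead indexes relative to that cursor, and each walks the list from index 0. The sketch below models that behaviour in isolation. The class `TokenBuffer` and its private `fill` method are hypothetical stand-ins, and tokens come from a plain array instead of regular-expression matching, so this is an illustration of the cursor idea and not the gem's implementation.

```ruby
# Hypothetical stand-in for the scanner's token buffer.
class TokenBuffer
  include Enumerable

  def initialize(source)
    @source = source.dup # tokens not yet scanned
    @token_list = []     # every token scanned so far; nothing is shifted off
    @token_number = 0    # index of the next token to consume
  end

  # Returns the token under the cursor and advances the cursor.
  def consume
    fill(@token_number + 1)
    token = @token_list[@token_number]
    @token_number += 1
    token
  end

  # Returns the nth upcoming token without moving the cursor.
  def look_ahead(number_of_tokens = 1)
    fill(@token_number + number_of_tokens)
    @token_list[@token_number + number_of_tokens - 1]
  end

  # Iterates from the first token, independently of the consume cursor.
  def each
    index = 0
    loop do
      fill(index + 1)
      token = @token_list[index]
      break if token == :eof
      yield token
      index += 1
    end
  end

  private

  # Scans until the buffer holds `size` tokens; appends :eof once the
  # source is exhausted.
  def fill(size)
    @token_list << (@source.shift || :eof) while @token_list.size < size
  end
end

buffer = TokenBuffer.new(%w[123 abc 456])
p buffer.look_ahead(2)  # second upcoming token, nothing consumed
p buffer.consume        # first token
p buffer.map(&:upcase)  # enumerates from the start, consumed token included
```

One consequence of keeping the full list, at least in this model, is that enumeration after a consume still starts at the first token; callers that mix consume with each or map may want to be aware of that.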

data/lib/scanner/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Scanner
-  VERSION = "0.0.2"
+  VERSION = "0.0.3"
 end

data/spec/scanner/scanner_spec.rb CHANGED

@@ -35,6 +35,21 @@ describe Scanner do
     end
   end
+  describe "has enumerable functions" do
+    it "has each" do
+      @scanner.parse("123 456")
+      @scanner.each do |tok|
+        tok.content.should match /123|456/
+      end
+    end
+    it "has map" do
+      @scanner.parse("123 456")
+      map_results = @scanner.map { |tok| "-#{tok.content}-" }
+      map_results.should eq ["-123-","-456-"]
+    end
+  end
   describe "lookahead" do
     it "returns the next token without arguments" do
       @scanner.parse("123")

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: scanner
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.0.3
   prerelease:
 platform: ruby
 authors:
@@ -9,7 +9,7 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2012-08-03 00:00:00.000000000 Z
+date: 2012-08-09 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
@@ -77,7 +77,7 @@ required_ruby_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash: 1008594902208819548
+      hash: -2266866493885490648
 required_rubygems_version: !ruby/object:Gem::Requirement
   none: false
   requirements:
@@ -86,7 +86,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
       segments:
       - 0
-      hash: 1008594902208819548
+      hash: -2266866493885490648
 requirements: []
 rubyforge_project:
 rubygems_version: 1.8.24