minilex 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/LICENSE +19 -0
- data/README.md +103 -0
- data/Rakefile +29 -0
- data/lib/minilex.rb +144 -0
- data/spec/lexer_spec.rb +78 -0
- metadata +63 -0
data/LICENSE
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
1
|
+
Copyright (c) 2012 Arun Srinivasan
|
|
2
|
+
|
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
|
4
|
+
this software and associated documentation files (the "Software"), to deal in
|
|
5
|
+
the Software without restriction, including without limitation the rights to
|
|
6
|
+
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
|
|
7
|
+
of the Software, and to permit persons to whom the Software is furnished to do
|
|
8
|
+
so, subject to the following conditions:
|
|
9
|
+
|
|
10
|
+
The above copyright notice and this permission notice shall be included in all
|
|
11
|
+
copies or substantial portions of the Software.
|
|
12
|
+
|
|
13
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
14
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
15
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
16
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
17
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
18
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
19
|
+
SOFTWARE.
|
data/README.md
ADDED
|
@@ -0,0 +1,103 @@
|
|
|
1
|
+
# Minilex
|
|
2
|
+
|
|
3
|
+
A little lexer toolkit, for basic lexing needs.
|
|
4
|
+
|
|
5
|
+
It's designed for the cases where parsers do the parsing, and all you need from
|
|
6
|
+
your lexer is an array of simple tokens.
|
|
7
|
+
|
|
8
|
+
## Usage
|
|
9
|
+
|
|
10
|
+
```ruby
|
|
11
|
+
Expression = Minilex::Lexer.new do
|
|
12
|
+
skip :whitespace, /\s+/
|
|
13
|
+
tok :number, /\d+(?:\.\d+)?/
|
|
14
|
+
tok :operator, /[\+\=\/\*]/
|
|
15
|
+
end
|
|
16
|
+
|
|
17
|
+
Expression.lex('1 + 2.34')
|
|
18
|
+
# => [[:number, '1', 1, 0],
|
|
19
|
+
# [:operator, '+', 1, 3],
|
|
20
|
+
# [:number, '2.34', 1, 5],
|
|
21
|
+
# [:eos]]
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
To create a lexer with Minilex, instantiate a `Minilex::Lexer` and define rules.
|
|
25
|
+
|
|
26
|
+
There are two methods for defining rules, `skip` and `tok`:
|
|
27
|
+
|
|
28
|
+
`skip` takes an `id` and a `pattern`. The lexer will ignore all occurrences of
|
|
29
|
+
the pattern in the input text. The `id` isn't strictly necessary, but it's nice
|
|
30
|
+
for readability and is a required argument.
|
|
31
|
+
|
|
32
|
+
`tok` also takes an `id` and a `pattern`. The lexer will turn all occurrences
|
|
33
|
+
of the pattern into a token of the form:
|
|
34
|
+
|
|
35
|
+
```ruby
|
|
36
|
+
[id, value, line, offset]
|
|
37
|
+
|
|
38
|
+
# id - the id you provided
|
|
39
|
+
# value - the matched value
|
|
40
|
+
# line - line number
|
|
41
|
+
# offset - character position in the line
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
## Overriding the token format
|
|
45
|
+
|
|
46
|
+
If you'd like to customize the token format, override `append_token`:
|
|
47
|
+
|
|
48
|
+
```ruby
|
|
49
|
+
Digits = Minilex::Lexer.new do
|
|
50
|
+
skip :whitespace, /\s+/
|
|
51
|
+
tok :digit, /\d/
|
|
52
|
+
|
|
53
|
+
# id - the id of the matched rule
|
|
54
|
+
# value - the value that was matched
|
|
55
|
+
#
|
|
56
|
+
# You have access to the array of tokens via `tokens` and the current
|
|
57
|
+
# token's position information via `pos`.
|
|
58
|
+
def append_token(id, value)
|
|
59
|
+
tokens << Integer(value)
|
|
60
|
+
end
|
|
61
|
+
|
|
62
|
+
# By default, the lexer will append an end-of-stream token to the end of
|
|
63
|
+
# the tokens array. You can override what the eos token is or even suppress
|
|
64
|
+
# it altogether with the append_eos callback.
|
|
65
|
+
#
|
|
66
|
+
# Here we'll suppress it by doing nothing
|
|
67
|
+
def append_eos
|
|
68
|
+
end
|
|
69
|
+
end
|
|
70
|
+
|
|
71
|
+
Digits.lex('1 2 3 4')
|
|
72
|
+
# => [1, 2, 3, 4]
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
## Processing values
|
|
76
|
+
|
|
77
|
+
There's one more thing you can do. It's just for convenience, though I'm not
|
|
78
|
+
sure it really belongs in something that's supposed to do as little as
|
|
79
|
+
possible. I might remove it.
|
|
80
|
+
|
|
81
|
+
The `tok` method accepts a third optional `processor` argument, which should
|
|
82
|
+
name a method on the lexer (you'll have to write the method, of course).
|
|
83
|
+
|
|
84
|
+
What this will do is give you a chance to get at the matched text before it
|
|
85
|
+
gets stuffed into a token:
|
|
86
|
+
|
|
87
|
+
```ruby
|
|
88
|
+
DigitsConverter = Minilex::Lexer.new do
|
|
89
|
+
skip :whitespace, /\s+/
|
|
90
|
+
tok :digit, /\d/, :integer
|
|
91
|
+
|
|
92
|
+
def integer(str)
|
|
93
|
+
Integer(str)
|
|
94
|
+
end
|
|
95
|
+
end
|
|
96
|
+
|
|
97
|
+
DigitsConverter.lex('123')
|
|
98
|
+
# => [[:digit, 1, 1, 0], [:digit, 2, 1, 1], [:digit, 3, 1, 2], [:eos]]
|
|
99
|
+
# ^ ^ ^
|
|
100
|
+
# ^ ^ ^
|
|
101
|
+
# These are Integers (would have been Strings)
|
|
102
|
+
```
|
|
103
|
+
|
data/Rakefile
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
1
|
+
require File.expand_path('../lib/minilex', __FILE__)

version = Minilex::VERSION
name = 'minilex'

desc "Build minilex gem"
task :build => :clean do
  sh "mkdir -p pkg"
  sh "gem build minilex.gemspec"
end

desc "Create tag v#{version}, build, and push to Rubygems"
task :release => :build do
  # Refuse to release from any branch other than master — tagging the
  # wrong commit is hard to undo once pushed.
  #
  # `abort` prints the message to stderr and exits with status 1; the
  # previous `puts` + `exit!` wrote to stdout and skipped at_exit hooks.
  unless `git branch` =~ /^\* master$/
    abort "You must be on the master branch to release!"
  end
  sh "git commit --allow-empty -a -m 'Release #{version}'"
  sh "git tag v#{version}"
  sh "git push origin master"
  sh "git push origin v#{version}"
  sh "gem push #{name}-#{version}.gem"
end

desc "Clean up generated files"
task :clean do
  sh "rm -f *.gem"
end
|
|
29
|
+
|
data/lib/minilex.rb
ADDED
|
@@ -0,0 +1,144 @@
|
|
|
1
|
+
require 'strscan'

module Minilex
  # A single lexing rule.
  #
  # id        - identifier attached to tokens produced by this rule
  # pattern   - the Regexp this rule matches
  # processor - optional Symbol naming a method on the Lexer that
  #             transforms the matched text into the token's value
  # skip      - truthy when matches should be discarded instead of
  #             turned into tokens
  Rule = Struct.new(:id, :pattern, :processor, :skip)

  # The lexer's position in the input: 1-based line number and
  # 0-based character offset within that line.
  Pos = Struct.new(:line, :offset)

  class Lexer
    attr_reader :rules, :tokens, :pos, :scanner

    # Creates a Lexer instance
    #
    #   Expression = Minilex::Lexer.new do
    #     skip :whitespace, /\s+/
    #     tok :number, /\d+(?:\.\d+)?/
    #     tok :operator, /[\+\=\/\*]/
    #   end
    #
    # You don't have to pass a block. This also works:
    #
    #   Expression = Minilex::Lexer.new
    #   Expression.skip :whitespace, /\s+/
    #   Expression.tok :number, /\d+(?:\.\d+)?/
    #   Expression.tok :operator, /[\+\=\/\*]/
    def initialize(&block)
      @rules = []
      # Parenthesized to avoid Ruby's "`&' interpreted as argument
      # prefix" ambiguity warning from `instance_eval &block`.
      instance_eval(&block) if block
    end

    # Defines a token-matching rule
    #
    # id        - this token's identifier
    # pattern   - a Regexp to match this token
    # processor - a Sym that references a method on this Lexer
    #             instance, which will be called to produce the
    #             `value` for this token (defaults to nil)
    def tok(id, pattern, processor = nil)
      rules << Rule.new(id, pattern, processor)
    end

    # Defines patterns to ignore
    #
    # id      - an identifier, it's nice to name things
    # pattern - the Regexp to skip
    def skip(id, pattern)
      rules << Rule.new(id, pattern, nil, true)
    end

    # Runs the lexer on the given input
    #
    # input - the String to tokenize
    #
    # returns an Array of tokens, ending with the end-of-stream
    # token appended by `append_eos` (by default `[:eos]`)
    def lex(input)
      @tokens = []
      @pos = Pos.new(1, 0)
      @scanner = StringScanner.new(input)

      until scanner.eos?
        rule, text = match
        value = rule.processor ? send(rule.processor, text) : text
        append_token(rule.id, value) unless rule.skip
        update_pos(text)
      end

      append_eos
      tokens
    end

    # Makes a token
    #
    # id    - the id of the matched rule
    # value - the value that was matched
    #
    # Called when a rule is matched to build the resulting token.
    #
    # Override this method if you'd like your tokens in a different
    # form. You have access to the array of tokens via `tokens` and
    # the current token's position information via `pos`.
    #
    # returns an Array of [id, value, line, offset]
    def append_token(id, value)
      tokens << [id, value, pos.line, pos.offset]
    end

    # Makes the end-of-stream token
    #
    # Similar to `append_token`, used to make the final token.
    # Appends [:eos] to the `tokens` array. Override with an empty
    # method to suppress the eos token entirely.
    def append_eos
      tokens << [:eos]
    end

    # [internal] Finds the matching rule
    #
    # Tries the rules in defined order until there's a match — so
    # earlier rules take priority. Raises an UnrecognizedInput error
    # if there isn't one.
    #
    # returns a 2-element Array of [rule, matched_text]
    def match
      rules.each do |rule|
        next unless text = scanner.scan(rule.pattern)
        return [rule, text]
      end
      raise UnrecognizedInput.new(scanner, pos)
    end

    # [internal] Updates the position information
    #
    # text - the String that was matched by `match`
    #
    # Inspects the matched text for newlines and updates the line
    # number and offset accordingly.
    def update_pos(text)
      newlines = text.count("\n")
      pos.line += newlines
      if newlines > 0
        # Offset restarts after the last newline in the match.
        pos.offset = text.rpartition("\n")[2].length
      else
        pos.offset += text.length
      end
    end
  end

  # The error raised when a Lexer can't match some input
  #
  # It will show the offending characters and tell you
  # where in the input it was when it got confused.
  class UnrecognizedInput < StandardError
    attr_reader :scanner, :pos

    def initialize(scanner, pos)
      @scanner = scanner
      @pos = pos
    end

    def to_s
      "\"#{scanner.peek(10)}\" at line:#{pos.line}, offset:#{pos.offset}"
    end
  end
end

Minilex::VERSION = '0.1.0'
|
|
144
|
+
|
data/spec/lexer_spec.rb
ADDED
|
@@ -0,0 +1,78 @@
|
|
|
1
|
+
require 'rspec'
require 'minilex'

describe "A simple lexer" do
  # Digits separated by whitespace; whitespace is discarded.
  let(:lexer) do
    Minilex::Lexer.new do
      skip :whitespace, /\s+/
      tok :digit, /\d/
    end
  end

  it "raises an error on unrecognized input" do
    expect { lexer.lex('123abc') }.to raise_error(Minilex::UnrecognizedInput)
  end

  it "returns a single :eos token on empty input" do
    lexer.lex('').should == [[:eos]]
  end

  it "recognizes a single digit" do
    lexer.lex('1').should == [[:digit, '1', 1, 0], [:eos]]
  end

  it "recognizes multiple digits" do
    expected = [[:digit, '1', 1, 0], [:digit, '2', 1, 1], [:digit, '3', 1, 2], [:eos]]
    lexer.lex('123').should == expected
  end

  it "skips whitespace" do
    expected = [[:digit, '1', 1, 0], [:digit, '2', 1, 2], [:digit, '3', 1, 4], [:eos]]
    lexer.lex('1 2 3').should == expected
  end

  it "keeps track of line numbers" do
    expected = [[:digit, '1', 1, 0], [:digit, '2', 2, 0], [:digit, '3', 3, 0], [:eos]]
    lexer.lex("1\n2\n3").should == expected
  end

  it "keeps track of line offsets" do
    expected = [[:digit, '1', 1, 1], [:digit, '2', 2, 2], [:digit, '3', 3, 3], [:eos]]
    lexer.lex(" 1\n 2\n  3").should == expected
  end
end

describe "A simple lexer with converter" do
  # Same digit lexer, but values pass through the :integer processor.
  let(:lexer) do
    Minilex::Lexer.new do
      skip :whitespace, /\s+/
      tok :digit, /\d/, :integer

      def integer(str)
        Integer(str)
      end
    end
  end

  it "convert the digit value to an integer" do
    expected = [[:digit, 1, 1, 0], [:digit, 2, 1, 1], [:digit, 3, 1, 2], [:eos]]
    lexer.lex('123').should == expected
  end
end

describe "Overriding how tokens are made" do
  # Custom token construction: squared integers, sentinel eos value.
  let(:lexer) do
    Minilex::Lexer.new do
      skip :whitespace, /\s+/
      tok :digit, /\d/

      def append_token(id, value)
        tokens << Integer(value) ** 2
      end

      def append_eos
        tokens << "Zanzibar!"
      end
    end
  end

  it "returns tokens from the overwritten :append_{token|eos} methods" do
    lexer.lex('123').should == [1, 4, 9, "Zanzibar!"]
  end
end
|
|
78
|
+
|
metadata
ADDED
|
@@ -0,0 +1,63 @@
|
|
|
1
|
+
--- !ruby/object:Gem::Specification
|
|
2
|
+
name: minilex
|
|
3
|
+
version: !ruby/object:Gem::Version
|
|
4
|
+
version: 0.1.0
|
|
5
|
+
prerelease:
|
|
6
|
+
platform: ruby
|
|
7
|
+
authors:
|
|
8
|
+
- satchmorun
|
|
9
|
+
autorequire:
|
|
10
|
+
bindir: bin
|
|
11
|
+
cert_chain: []
|
|
12
|
+
date: 2012-04-30 00:00:00.000000000 Z
|
|
13
|
+
dependencies:
|
|
14
|
+
- !ruby/object:Gem::Dependency
|
|
15
|
+
name: rspec
|
|
16
|
+
requirement: &70325614561220 !ruby/object:Gem::Requirement
|
|
17
|
+
none: false
|
|
18
|
+
requirements:
|
|
19
|
+
- - ! '>='
|
|
20
|
+
- !ruby/object:Gem::Version
|
|
21
|
+
version: '0'
|
|
22
|
+
type: :development
|
|
23
|
+
prerelease: false
|
|
24
|
+
version_requirements: *70325614561220
|
|
25
|
+
description: A little lexer toolkit, designed for the cases where parsers do the parsing
|
|
26
|
+
and lexers do the lexing.
|
|
27
|
+
email: rulfzid@gmail.com
|
|
28
|
+
executables: []
|
|
29
|
+
extensions: []
|
|
30
|
+
extra_rdoc_files: []
|
|
31
|
+
files:
|
|
32
|
+
- LICENSE
|
|
33
|
+
- README.md
|
|
34
|
+
- Rakefile
|
|
35
|
+
- lib/minilex.rb
|
|
36
|
+
- spec/lexer_spec.rb
|
|
37
|
+
homepage: http://github.com/satchmorun/minilex
|
|
38
|
+
licenses:
|
|
39
|
+
- MIT
|
|
40
|
+
post_install_message:
|
|
41
|
+
rdoc_options: []
|
|
42
|
+
require_paths:
|
|
43
|
+
- lib
|
|
44
|
+
required_ruby_version: !ruby/object:Gem::Requirement
|
|
45
|
+
none: false
|
|
46
|
+
requirements:
|
|
47
|
+
- - ! '>='
|
|
48
|
+
- !ruby/object:Gem::Version
|
|
49
|
+
version: '0'
|
|
50
|
+
required_rubygems_version: !ruby/object:Gem::Requirement
|
|
51
|
+
none: false
|
|
52
|
+
requirements:
|
|
53
|
+
- - ! '>='
|
|
54
|
+
- !ruby/object:Gem::Version
|
|
55
|
+
version: '0'
|
|
56
|
+
requirements: []
|
|
57
|
+
rubyforge_project:
|
|
58
|
+
rubygems_version: 1.8.10
|
|
59
|
+
signing_key:
|
|
60
|
+
specification_version: 3
|
|
61
|
+
summary: A little lexer toolkit.
|
|
62
|
+
test_files:
|
|
63
|
+
- spec/lexer_spec.rb
|