RubyGems - whittle - Versions diffs - 0.0.6 → 0.0.7 - Mend

whittle 0.0.6 → 0.0.7

Files changed (11)

data/README.md +49 -7
data/examples/calculator.rb +10 -0
data/lib/whittle/parse_error_builder.rb +92 -0
data/lib/whittle/parser.rb +57 -64
data/lib/whittle/rule.rb +0 -39
data/lib/whittle/terminal.rb +46 -0
data/lib/whittle/version.rb +1 -1
data/lib/whittle.rb +1 -0
data/spec/unit/parse_error_builder_spec.rb +78 -0
data/spec/unit/parser/one_off_start_rule_spec.rb +26 -0
metadata +9 -4

data/README.md CHANGED

@@ -91,9 +91,7 @@ be.  We use `#as` to provide an action that actually does something meaningful w
 inputs.
 We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
-we declare terminal rules too, as we have done with the `:int` rule above.  Note that the
-longer way around defining terminal rules is to do like we have done for `:expr` and define a
-block, but since this is such a common use-case, Whittle offers the shorthand.
+we declare terminal rules too, as we have done with the `:int` rule above.
 As the input string is parsed, it *must* match the start rule `:expr`.
@@ -438,6 +436,17 @@ end
 The following would return the array `["a", "b", "c"]` given the input string "a, b, c", or
 given the input string "" (nothing) it would return the empty array.
+## You can use a different start rule on-demand
+While this is not advised in production (requiring such a thing in production would suggest
+you need to re-think your grammar), during development you may wish to specify any of your
+smaller rules as the start rule for a parse.  This is particularly useful in debugging, and
+in writing unit tests.
+``` ruby
+parser.parse(input_string, :rule => :something_else)
+```
 ## Parse errors
 ### The default error reporting
@@ -464,10 +473,23 @@ class ListParser < Whittle::Parser
   start(:list)
 end
-ListParser.new.parse("a, \nb, \nc- \nd")
+str = <<-END
+one, two, three, four, five,
+six, seven, eight, nine, ten,
+eleven, twelve, thirteen,
+fourteen, fifteen - sixteen, seventeen
+END
+ListParser.new.parse(str)
 # =>
-# Parse error: expected "," but got "-" on line 3
+# Parse error: expected "," but got "-" on line 4. (Whittle::ParseError)
+#
+# Exact region marked...
+#
+# fourteen, fifteen - sixteen, seventeen
+#               ... ^ ... right here
+#
 ```
 You can also access `#line`, `#expected` and `#received` if you catch the exception.
@@ -500,6 +522,16 @@ rule("keyword")
 rule(:name => /pattern/)
 ```
+### Matching with case-insensitivity
+You can use the Hash notation from above, with a String as the key, mapping to a Regexp.
+``` ruby
+rule("function" => /function/i)
+```
+Now in all rules that allow case-insensitive "function", just use the String `"function"`.
 ### Providing a semantic action for a terminal rule
 ``` ruby
@@ -580,14 +612,20 @@ rule("+") ^ 1
 rule("*") ^ 2
 ```
-### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
+### How do I make two expressions mutually reference each other?
+Let's say you have two types of expression, `:binary_expr` (like "a + b") and `:invocation_expr` (like "foo(bar)").
+What you're saying is that any argument in the invocation expression should support either another invocation, or
+a `:binary_expr`.  Likewise, you want any operand of `:binary_expr` to support either another `:binary_expr` or
+an `:invocation_expr`.
 If you can explain it this simply on paper, you can explain it formally in your grammar.  If `:binary_expr`
 allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
 what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
 and use that new rule where you want to support both types of expression.
-Assuming your grammar looked something like this:
+Assuming your grammar looked something like this pseudo-example:
 ``` ruby
 rule("+")
@@ -648,6 +686,10 @@ Now we can parse the more complex expression "1 + foo(2, 3) + 4" without any iss
 ### How do I track state to store variables etc with Whittle?
+In general you build a complete AST to be interpreted if you're writing a program, rather than interpreting the input as
+it is parsed (what would happen if something had been written to disk and then a parse error occurred?).  That said, in
+simple cases it may be useful to simply interpret the input as it is read.
 One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
 something.  Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
 and do as you need during the parse.  For example, you could add a method to the class called something like `runtime`,

data/examples/calculator.rb CHANGED

@@ -57,3 +57,13 @@ p calculator.parse("5 - -2").to_f
 p calculator.parse("5 * 2 - -2").to_f
 # => 12
+p calculator.parse(
+  <<-END
+    5 * 2 - -2
+  + 3 / 7 * 90
+  / 45 * 8 * 8 * 8 - 6 * (1 - 2 - 3 ) (45 / 5)) / (7 * 3)
+  - 6
+  END
+).to_f

data/lib/whittle/parse_error_builder.rb ADDED

@@ -0,0 +1,92 @@
+# Whittle: A little LALR(1) parser in pure ruby, without a generator.
+#
+# Copyright (c) Chris Corbyn, 2011
+module Whittle
+  # Since parse errors diagram the region where the error occurred,
+  # this logic is split out from the main Parser
+  class ParseErrorBuilder
+    class << self
+      # Generates a ParseError for the given set of error conditions
+      #
+      # A ParseError always specifies the line number, the expected inputs and
+      # the received input.
+      #
+      # If possible, it also draws a diagram indicating the point where the
+      # error occurred.
+      #
+      # @param [Hash] state
+      #   all the instructions for the current parser state
+      #
+      # @param [Hash] token
+      #   the unexpected input token
+      #
+      # @param [Hash] context
+      #   the current parser context, providing line number, input string + stack etc
+      #
+      # @return [ParseError]
+      #   a detailed Exception to be raised
+      def exception(state, token, context)
+        region   = extract_error_region(token[:offset], context[:input])
+        expected = extract_expected_tokens(state)
+        message  = <<-ERROR.gsub(/\n(?!\n)\s+/, " ").strip
+          Parse error:
+          #{expected.count > 1 ? 'expected one of' : 'expected'}
+          #{expected.map { |k| k.inspect }.join(", ")}
+          but got
+          #{token[:name].inspect}
+          on line
+          #{token[:line]}.
+        ERROR
+        unless region.nil?
+          region = "\n\nExact region marked...\n\n#{region}"
+        end
+        ParseError.new(message + region.to_s, token[:line], expected, token[:name])
+      end
+      private
+      def extract_error_region(offset, input)
+        return if offset.nil?
+        # FIXME: If anybody has a cleaner way to insert the ^ marker, please do :-)
+        width        = 100
+        start_offset = [offset - width, 0].max
+        end_offset   = offset + width
+        before       = input[start_offset, [offset, width].min]
+        after        = input[offset, width]
+        before_lines = "~#{before}~".lines.to_a
+        after_lines  = "~#{after}~".lines.to_a
+        before_lines.first.slice!(0)
+        before_lines.last.chop!
+        after_lines.first.slice!(0)
+        after_lines.last.chop!
+        region_before = before_lines.pop
+        region_after  = after_lines.shift
+        error_line = region_before + region_after
+        padding = if region_before.length > 5
+          (" " * (region_before.length - 5)) + " ... "
+        else
+          " " * region_before.length
+        end
+        marker = "#{padding}^ ... right here\n\n"
+        unless error_line =~ /[\r\n]\Z/
+          marker = "\n#{marker}"
+        end
+        "#{error_line}#{marker}"
+      end
+      def extract_expected_tokens(state)
+        state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
+      end
+    end
+  end
+end
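
Note on the region marker: `extract_error_region` above works on a 100-character window either side of the offset. The underlying idea, finding the line that contains a character offset and putting a caret under that column, can be sketched more simply. The following is a minimal stand-alone illustration, not Whittle's code; `mark_error` is a hypothetical helper name, and it skips the window truncation and the ` ... ` padding used by the gem.

```ruby
# Hypothetical helper (not part of Whittle): returns the line of `input`
# that contains character `offset`, followed by a caret under that column.
def mark_error(input, offset)
  # Index of the newline before the offset, if any (guard offset 0, because
  # a negative start position would make rindex search from the end).
  line_start = input.rindex("\n", offset - 1) if offset > 0
  line_start = line_start ? line_start + 1 : 0
  # Index of the newline that ends the line, or the end of the input.
  line_end   = input.index("\n", offset) || input.length
  column     = offset - line_start

  "#{input[line_start...line_end]}\n#{' ' * column}^ ... right here"
end

input = "one two three four five\nsix seven eight nine ten\neleven twelve"
puts mark_error(input, 34)
# six seven eight nine ten
#           ^ ... right here
```

The input string and offset 34 are the same ones used in the gem's own spec for the "middle of a line" case, so the caret lands under `eight`.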

data/lib/whittle/parser.rb CHANGED

@@ -107,19 +107,6 @@ module Whittle
         @start
       end
-      # Returns the numeric value for the initial state (the state ID associated with the start
-      # rule).
-      #
-      # In most LALR(1) parsers, this would be zero, but for implementation reasons, this will
-      # be an unpredictably large (or small) number.
-      #
-      # @return [Fixnum]
-      #   the ID for the initial state in the parse table
-      def initial_state
-        prepare_start_rule
-        [rules[start], 0].hash
-      end
       # Returns the entire parse table used to interpret input into the parser.
       #
       # You should not need to call this method, though you may wish to inspect its contents
@@ -133,34 +120,38 @@ module Whittle
       # @return [Hash]
       #   a 2-dimensional Hash representing states with actions to perform for a given lookahead
       def parse_table
-        @parse_table ||= begin
-          prepare_start_rule
-          rules[start].build_parse_table(
-            {},
-            self,
-            {
-              :initial => true,
-              :state   => initial_state,
-              :seen    => [],
-              :offset  => 0,
-              :prec    => 0
-            }
-          )
-        end
+        @parse_table ||= parse_table_for_rule(start)
       end
-      private
-      def prepare_start_rule
-        raise GrammarError, "Undefined start rule #{start.inspect}" unless rules.key?(start)
-        if rules[start].terminal?
-          rule(:$start) do |r|
-            r[start].as { |prog| prog }
-          end
+      # Prepare the parse table for a given rule instead of the start rule.
+      #
+      # Warning: this method does not memoize the result, so you should not use it in production.
+      #
+      # @param [Symbol, String] name
+      #   the name of the Rule to use as the start rule
+      #
+      # @return [Hash]
+      #   the complete parse table for this rule
+      def parse_table_for_rule(name)
+        raise GrammarError, "Undefined start rule #{name.inspect}" unless rules.key?(name)
-          start(:$start)
+        rule = if rules[name].terminal?
+          RuleSet.new(:$start, false).tap { |r| r[name].as { |prog| prog } }
+        else
+          rules[name]
         end
+        rule.build_parse_table(
+          {},
+          self,
+          {
+            :initial => true,
+            :state   => [rule, 0].hash,
+            :seen    => [],
+            :offset  => 0,
+            :prec    => 0
+          }
+        )
       end
     end
@@ -185,14 +176,28 @@ module Whittle
     #
     # A successful parse returns the result of evaluating the start rule, whatever that may be.
     #
+    # It is possible to specify a different start rule during development.
+    #
+    # @example Using a different start rule
+    #
+    #   parser.parse(str, :rule => :another_rule)
+    #
     # @param [String] input
     #   a complete input string to parse according to the grammar
     #
+    # @param [Hash] options
+    #   currently the only supported option is :rule, to specify a different once-off start rule
+    #
     # @return [Object]
     #   whatever the grammar defines
-    def parse(input)
-      table  = self.class.parse_table
-      states = [self.class.initial_state]
+    def parse(input, options = {})
+      table  = if options.key?(:rule)
+        self.class.parse_table_for_rule(options[:rule])
+      else
+        self.class.parse_table
+      end
+      states = [table.keys.first]
       args   = []
       line   = 1
@@ -223,7 +228,7 @@ module Whittle
             end
           end
-          error(state, token, :states => states, :args => args)
+          error(state, token, :input => input, :states => states, :args => args)
         end
       end
     end
@@ -246,13 +251,14 @@ module Whittle
           raise UnconsumedInputError,
             "Unmatched input #{input[offset..-1].inspect} on line #{line}" if token.nil?
-          offset += token[:value].length
+          token[:offset] = offset
           line, token[:line] = token[:line], line
+          offset += token[:value].length
           yield token unless token[:discarded]
         end
       end
-      yield ({ :name => :$end, :line => line, :value => nil })
+      yield ({ :name => :$end, :line => line, :value => nil, :offset => offset })
     end
     # Invoked when the parser detects an error.
@@ -267,30 +273,21 @@ module Whittle
     # @param [Hash] state
     #   the possible actions for the current parser state
     #
-    # @param [Hash] input
-    #   the received token (or, unlikely, a nonterminal symbol)
+    # @param [Hash] token
+    #   the received token
     #
-    # @param [Hash] stack
-    #   the current parse context (arg stack + state stack)
-    def error(state, input, stack)
-      expected = extract_expected_tokens(state)
-      message  = <<-ERROR.gsub(/\n\s+/, " ").strip
-        Parse error:
-        expected
-        #{expected.map { |k| k.inspect }.join("; or ")}
-        but got
-        #{input[:name].inspect}
-        on line
-        #{input[:line]}
-      ERROR
-      raise ParseError.new(message, input[:line], expected, input[:name])
+    # @param [Hash] context
+    #   the current parse context (input + arg stack + state stack)
+    def error(state, token, context)
+      raise ParseErrorBuilder.exception(state, token, context)
     end
     private
     def next_token(source, offset, line)
       rules.each do |name, rule|
+        next unless rule.terminal?
         if token = rule.scan(source, offset, line)
           token[:name] = name
           return token
@@ -299,9 +296,5 @@ module Whittle
       nil
     end
-    def extract_expected_tokens(state)
-      state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
-    end
   end
 end
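
Note on the lexer change: the hunk at `@@ -246,13 +251,14 @@` stores `token[:offset]` before `offset` is advanced, so each token remembers where it starts rather than where it ends, and the final `:$end` token carries the end-of-input offset. A minimal stand-alone sketch of that ordering follows. The `TOKENS` table and `each_token` method are hypothetical and far simpler than Whittle's rule-driven lexer.

```ruby
# Hypothetical token table for illustration only; each pattern is anchored
# with \G so it can only match exactly at the offset passed to String#match.
TOKENS = { :int => /\G[0-9]+/, "+" => /\G\+/, :ws => /\G\s+/ }

def each_token(input)
  return enum_for(:each_token, input) unless block_given?

  offset = 0
  while offset < input.length
    name, match = nil
    TOKENS.each do |candidate, pattern|
      if m = input.match(pattern, offset)
        name, match = candidate, m
        break
      end
    end
    raise ArgumentError, "Unmatched input #{input[offset..-1].inspect}" if match.nil?

    # Record the start offset first, then advance past the matched text.
    token = { :name => name, :value => match[0], :offset => offset }
    offset += match[0].length

    yield token unless name == :ws # whitespace is discarded
  end

  yield({ :name => :$end, :value => nil, :offset => offset })
end

p each_token("1 + 22").map { |t| [t[:name], t[:offset]] }
# [[:int, 0], ["+", 2], [:int, 4], [:$end, 6]]
```

If the two statements were swapped, every token would report the offset just past its own text, and an error marker built from it would point one token too far to the right.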

data/lib/whittle/rule.rb CHANGED

@@ -36,16 +36,6 @@ module Whittle
           raise ArgumentError, "Unsupported rule component #{c.class}"
         end
       end
-      pattern = @components.first
-      if terminal?
-        @pattern = if pattern.kind_of?(Regexp)
-          Regexp.new("\\G#{pattern}")
-        else
-          Regexp.new("\\G#{Regexp.escape(pattern)}")
-        end
-      end
     end
     # Predicate check for  whether or not the Rule represents a terminal symbol.
@@ -213,35 +203,6 @@ module Whittle
       tap { @prec = prec.to_i }
     end
-    # Invoked for terminal rules during lexing, ignored for nonterminal rules.
-    #
-    # @param [String] source
-    #   the input String the scan
-    #
-    # @param [Fixnum] offset
-    #   the current index in the search
-    #
-    # @param [Fixnum] line
-    #   the line the lexer was up to when the previous token was matched
-    #
-    # @return [Hash]
-    #   a Hash representing the token, containing :rule, :value, :line and
-    #   :discarded, if the token is to be skipped.
-    #
-    # Returns nil if nothing is matched.
-    def scan(source, offset, line)
-      return nil unless terminal?
-      if match = source.match(@pattern, offset)
-        {
-          :rule      => self,
-          :value     => match[0],
-          :line      => line + match[0].count("\r\n", "\n"),
-          :discarded => @action.equal?(NULL_ACTION)
-        }
-      end
-    end
     private
     def resolve_conflicts(instructions)

data/lib/whittle/terminal.rb CHANGED

@@ -5,8 +5,54 @@
 module Whittle
   # Represents a terminal Rule, matching a pattern in the input String
   class Terminal < Rule
+    # Hard-coded to always return true
     def terminal?
       true
     end
+    # Invoked for terminal rules during lexing, ignored for nonterminal rules.
+    #
+    # @param [String] source
+    #   the input String to scan
+    #
+    # @param [Fixnum] offset
+    #   the current index in the search
+    #
+    # @param [Fixnum] line
+    #   the line the lexer was up to when the previous token was matched
+    #
+    # @return [Hash]
+    #   a Hash representing the token, containing :rule, :value, :line and
+    #   :discarded, if the token is to be skipped.
+    #
+    # Returns nil if nothing is matched.
+    def scan(source, offset, line)
+      if match = source.match(@pattern, offset)
+        {
+          :rule      => self,
+          :value     => match[0],
+          :line      => line + match[0].count("\r\n", "\n"),
+          :discarded => @action.equal?(NULL_ACTION)
+        }
+      end
+    end
+    private
+    def initialize(name, *components)
+      raise ArgumentError, \
+        "Rule #{name.inspect} is terminal and can only have one rule component" \
+        unless components.length == 1
+      super
+      pattern = components.first
+      @pattern = if pattern.kind_of?(Regexp)
+        Regexp.new("\\G#{pattern}")
+      else
+        Regexp.new("\\G#{Regexp.escape(pattern)}")
+      end
+    end
   end
 end
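
Note on `\G`: the constructor above prefixes every terminal pattern with `\G`, and `scan` passes the current offset to `String#match`. Together these mean a pattern can only match exactly at the offset, never further along the input. The sketch below isolates that behaviour along with the newline counting used for `:line`. The helper names `anchored`, `scan_at` and `newlines_in` are hypothetical and not part of Whittle.

```ruby
# Hypothetical helper: anchor a String or Regexp at the match start position.
def anchored(pattern)
  source = pattern.kind_of?(Regexp) ? pattern.to_s : Regexp.escape(pattern)
  Regexp.new("\\G#{source}")
end

# Hypothetical helper: the text matched exactly at `offset`, or nil.
def scan_at(source, offset, pattern)
  match = source.match(anchored(pattern), offset)
  match && match[0]
end

# String#count with several arguments counts characters in the intersection
# of the sets, so this counts "\n" only and a "\r\n" pair counts once.
def newlines_in(text)
  text.count("\r\n", "\n")
end

p scan_at("1 + 22", 4, /[0-9]+/) # "22"
p scan_at("1 + 22", 1, /[0-9]+/) # nil, because offset 1 is a space
p scan_at("1 + 22", 2, "+")      # "+"
p newlines_in("a\r\nb\nc")       # 2
```

Without the `\G` prefix, the second call would skip ahead and return `"22"`, which is why an unanchored pattern would let the lexer silently jump over input it cannot tokenize.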

data/lib/whittle/version.rb CHANGED

@@ -3,5 +3,5 @@
 # Copyright (c) Chris Corbyn, 2011
 module Whittle
-  VERSION = "0.0.6"
+  VERSION = "0.0.7"
 end

data/lib/whittle.rb CHANGED

@@ -7,6 +7,7 @@ require "whittle/error"
 require "whittle/errors/unconsumed_input_error"
 require "whittle/errors/parse_error"
 require "whittle/errors/grammar_error"
+require "whittle/parse_error_builder"
 require "whittle/rule"
 require "whittle/terminal"
 require "whittle/non_terminal"

data/spec/unit/parse_error_builder_spec.rb ADDED

@@ -0,0 +1,78 @@
+require "spec_helper"
+describe Whittle::ParseErrorBuilder do
+  let(:context) do
+    {
+      :input => "one two three four five\nsix seven eight nine ten\neleven twelve"
+    }
+  end
+  let(:state) do
+    {
+      "gazillion" => { :action => :shift, :state => 7 }
+    }
+  end
+  context "given an error region in the middle of a line" do
+    let(:token) do
+      {
+        :name   => "eight",
+        :value  => "eight",
+        :offset => 34
+      }
+    end
+    let(:indicator) do
+      Regexp.escape(
+        "six seven eight nine ten\n" <<
+        "      ... ^ ..."
+      )
+    end
+    it "indicates the exact region" do
+      Whittle::ParseErrorBuilder.exception(state, token, context).message.should =~ /#{indicator}/
+    end
+  end
+  context "given an error region near the start of a line" do
+    let(:token) do
+      {
+        :name   => "two",
+        :value  => "two",
+        :offset => 4
+      }
+    end
+    let(:indicator) do
+      Regexp.escape(
+        "one two three four five\n" <<
+        "    ^ ..."
+      )
+    end
+    it "indicates the exact region" do
+      Whittle::ParseErrorBuilder.exception(state, token, context).message.should =~ /#{indicator}/
+    end
+  end
+  context "given an error region near the end of a line" do
+    let(:token) do
+      {
+        :name   => "five",
+        :value  => "five",
+        :offset => 19
+      }
+    end
+    let(:indicator) do
+      Regexp.escape(
+        "one two three four five\n" <<
+        "               ... ^ ..."
+      )
+    end
+    it "indicates the exact region" do
+      Whittle::ParseErrorBuilder.exception(state, token, context).message.should =~ /#{indicator}/
+    end
+  end
+end
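
Note on the message text: the specs above only assert on the marked region, but the first line of the message is built by `ParseErrorBuilder.exception` from an indented heredoc that is folded onto a single line. The new `(?!\n)` lookahead stops the fold from swallowing a blank line. A minimal stand-alone sketch of the folding, using hypothetical local values in place of the parser state and token:

```ruby
# Hypothetical values standing in for the parser state and the bad token.
expected = [","]
received = "-"
line     = 4

# Each newline plus the indentation that follows it becomes one space.
# The trailing newline has no indentation after it, so strip removes it.
message = <<-ERROR.gsub(/\n(?!\n)\s+/, " ").strip
  Parse error:
  #{expected.count > 1 ? 'expected one of' : 'expected'}
  #{expected.map { |k| k.inspect }.join(", ")}
  but got
  #{received.inspect}
  on line
  #{line}.
ERROR

puts message
# Parse error: expected "," but got "-" on line 4.
```

With more than one expected token, the same template switches to `expected one of` and joins the names with commas, replacing the `"; or "` separator used in 0.0.6.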

data/spec/unit/parser/one_off_start_rule_spec.rb ADDED

@@ -0,0 +1,26 @@
+require "spec_helper"
+describe "parsing according to a different start rule" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("+")
+      rule("-")
+      rule(:int => /[0-9]+/).as { |i| Integer(i) }
+      rule(:sum) do |r|
+        r[:int, "+", :int].as { |a, _, b| a + b }
+      end
+      rule(:sub) do |r|
+        r[:sum, "-", :sum].as { |a, _, b| a - b }
+      end
+      start(:sub)
+    end
+  end
+  it "ignores the defined start rule and uses the specified one" do
+    parser.new.parse("1+2", :rule => :sum).should == 3
+  end
+end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-  version: 0.0.6
+  version: 0.0.7
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-12-02 00:00:00.000000000 Z
+date: 2011-12-03 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &70361567853620 !ruby/object:Gem::Requirement
+  requirement: &70312516285540 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -21,7 +21,7 @@ dependencies:
         version: '2.6'
   type: :development
   prerelease: false
-  version_requirements: *70361567853620
+  version_requirements: *70312516285540
 description: ! "Write powerful parsers by defining a series of very simple rules\n
   \                    and operations to perform as those rules are matched.  Whittle\n
   \                    parsers are written in pure ruby and as such are extremely
@@ -47,12 +47,14 @@ files:
 - lib/whittle/errors/parse_error.rb
 - lib/whittle/errors/unconsumed_input_error.rb
 - lib/whittle/non_terminal.rb
+- lib/whittle/parse_error_builder.rb
 - lib/whittle/parser.rb
 - lib/whittle/rule.rb
 - lib/whittle/rule_set.rb
 - lib/whittle/terminal.rb
 - lib/whittle/version.rb
 - spec/spec_helper.rb
+- spec/unit/parse_error_builder_spec.rb
 - spec/unit/parser/empty_rule_spec.rb
 - spec/unit/parser/empty_string_spec.rb
 - spec/unit/parser/error_reporting_spec.rb
@@ -60,6 +62,7 @@ files:
 - spec/unit/parser/multiple_precedence_spec.rb
 - spec/unit/parser/non_terminal_ambiguity_spec.rb
 - spec/unit/parser/noop_spec.rb
+- spec/unit/parser/one_off_start_rule_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
 - spec/unit/parser/premature_eof_spec.rb
@@ -96,6 +99,7 @@ specification_version: 3
 summary: An efficient, easy to use, LALR parser for Ruby
 test_files:
 - spec/spec_helper.rb
+- spec/unit/parse_error_builder_spec.rb
 - spec/unit/parser/empty_rule_spec.rb
 - spec/unit/parser/empty_string_spec.rb
 - spec/unit/parser/error_reporting_spec.rb
@@ -103,6 +107,7 @@ test_files:
 - spec/unit/parser/multiple_precedence_spec.rb
 - spec/unit/parser/non_terminal_ambiguity_spec.rb
 - spec/unit/parser/noop_spec.rb
+- spec/unit/parser/one_off_start_rule_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
 - spec/unit/parser/premature_eof_spec.rb