RubyGems - whittle - Versions diffs - 0.0.2 → 0.0.3 - Mend

whittle 0.0.2 → 0.0.3

Files changed (5)

data/README.md +41 -13
data/lib/whittle/parser.rb +38 -48
data/lib/whittle/version.rb +1 -1
data/spec/unit/parser/surplus_input_spec.rb +37 -0
metadata +6 -4

data/README.md CHANGED

@@ -83,9 +83,10 @@ program, which in this case is the `:expr` rule that can add two numbers togethe
 There are two terminal rules (`"+"` and `:int`) and one nonterminal (`:expr`) in the above
 grammar.  Each rule can have a block attached to it.  The block is invoked with the result
-evaluating the blocks attached to each of its inputs (in a depth-first manner).  The default
+evaluating each of its inputs via their own blocks (in a depth-first manner).  The default
 action if no block is given, is to return whatever the leftmost input to the rule happens to
-be.
+be.  We use `#as` to provide an action that actually does something meaningful with the
+inputs.
 We can optionally use the Hash notation to map a name with a pattern (or a fixed string) when
 we declare terminal rules too, as we have done with the `:int` rule above.  Note that the
@@ -94,16 +95,26 @@ block, but since this is such a common use-case, Whittle offers the shorthand.
 As the input string is parsed, it *must* match the start rule `:expr`.
-Let's step through the parse for the above input "1+2".  When the parser starts, it looks at
-the start rule `:expr` and decides what tokens would be valid if they were encountered. Since
-`:expr` starts with `:int`, the only thing that would be valid is anything matching
-`/[0-9]+/`. When the parser reads the "1", it recognizes it as an `:int`, puts at aside (puts
-it on the stack, in technical terms).  Now it advances through the rule for `:expr` and
-decides the only possible valid input would be a "+", and finally the last `:int`.  Upon
-having read the sequence `:int`, "+", `:int`, our block attached to that rule is invoked to
-return a result.  First the three inputs are passed through their respective blocks (so the
-"1" and the "2" are cast to integers, according to the rule for `:int`), then they are passed
-to the `:expr`, which adds the 1 and the 2 to make 3.  Magic!
+Let's step through the parse for the above input "1+2".
+  - When the parser starts, it looks at the start rule `:expr` and decides what tokens would
+    be valid if they were encountered.
+  - Since `:expr` starts with `:int`, the only thing that would be valid is anything matching
+    `/[0-9]+/`.
+  - When the parser reads the "1", it recognizes it as an `:int`, evaluates its block (thus
+    casting it to an Integer), and moves it aside (puts it on the stack, to be precise).
+  - Now it advances through the rule for `:expr` and decides the only valid input would be a
+    "+"
+  - Upon reading the "+", the rule for "+" is invoked (which does nothing) and the "+" is put
+    on the stack, along with the `:int` we already have.
+  - Now the parser's only valid input is another `:int`, which it gets from the "2", casting
+    it to an Integer according to its block, and putting it on the stack.
+  - Finally, upon having read the sequence `:int`, "+", `:int`, our block attached to that
+    particular rule is invoked to return a result by summing the 1 and the 2 to make 3. Magic!
+This was a simple parse. At each point there was only one valid input.  As we'll see, parses
+can be arbitrarily complex, without increasing the amount of work needed to process the input
+string.
 ## Nonterminal rules can have more than one valid sequence
@@ -474,12 +485,29 @@ would probably be a useful exercise.
 If you have any examples you'd like to contribute, I will gladly add them to the repository.
+## Issues & Questions
+Any issues, I will address them quickly as it is still early days, though I am pretty confident,
+since this is based on a scientific algorithm, issues would be relatively minor.  Post them to
+the issue tracker:
+  - https://github.com/d11wtq/whittle/issues
+If you have any suggestions for how I might improve the DSL in order to be more human-friendly,
+you can suggest those in the issue tracker too.
+For any "how do I do this?" type questions, you can message me directly (via my github profile
+page):
+  - https://github.com/d11wtq
+Or simply post an issue.
 ## TODO
   - Provide a more powerful (state based) lexer algorithm, or at least document how users can
     override `#lex`.
   - Allow inspection of the parse table (it is not very human friendly right now).
-  - Allow inspection of the AST (maybe).
   - Given in an input String, provide a human readble explanation of the parse.
 ## License & Copyright
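
The step-by-step walk-through added to the README in this version can be sketched in plain Ruby. This is only an illustration and not Whittle's implementation: `TERMINALS`, `EXPR_SEQUENCE`, `EXPR_ACTION` and `walk_through` are hypothetical names, and the sketch only handles the single sequence `:int`, "+", `:int` described above.

```ruby
# Hypothetical stand-ins for the README's terminal rules; not Whittle's API.
TERMINALS = {
  :int => [/\A[0-9]+/, ->(text) { Integer(text) }], # block casts to an Integer
  "+"  => [/\A\+/,     ->(text) { text }]           # no block: input returned as-is
}.freeze

# The only sequence this sketch accepts for :expr, and the block attached to it.
EXPR_SEQUENCE = [:int, "+", :int].freeze
EXPR_ACTION   = ->(left, _plus, right) { left + right }

def walk_through(input)
  stack = []
  rest  = input
  EXPR_SEQUENCE.each do |name|
    pattern, action = TERMINALS.fetch(name)
    match = pattern.match(rest)
    raise ArgumentError, "expected #{name.inspect} at #{rest.inspect}" unless match

    stack << action.call(match[0]) # evaluate the terminal's block, then stack it
    rest = match.post_match
  end
  raise ArgumentError, "surplus input #{rest.inspect}" unless rest.empty?

  EXPR_ACTION.call(*stack) # the whole sequence was read: invoke the :expr block
end

walk_through("1+2") # => 3
```

Each terminal is evaluated as soon as it is read, which matches the revised wording of the README: by the time the `:expr` block runs, its inputs are already Integers.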

data/lib/whittle/parser.rb CHANGED

@@ -19,25 +19,21 @@ module Whittle
   # @example A simple Whittle Parser
   #
   #   class Calculator < Whittle::Parser
-  #     rule(:wsp) do |r|
-  #       r[/s+/] # skip whitespace
-  #     end
+  #     rule(:wsp => /\s+/).skip!
   #
-  #     rule(:int) do |r|
-  #       r[/[0-9]+/].as { |i| Integer(i) }
-  #     end
+  #     rule(:int => /[0-9]+/).as { |i| Integer(i) }
   #
-  #     rule("+") % :left
-  #     rule("-") % :left
-  #     rule("/") % :left
-  #     rule("*") % :left
+  #     rule("+") % :left ^ 1
+  #     rule("-") % :left ^ 1
+  #     rule("/") % :left ^ 2
+  #     rule("*") % :left ^ 2
   #
   #     rule(:expr) do |r|
   #       r[:expr, "+", :expr].as { |left, _, right| left + right }
   #       r[:expr, "-", :expr].as { |left, _, right| left - right }
   #       r[:expr, "/", :expr].as { |left, _, right| left / right }
   #       r[:expr, "*", :expr].as { |left, _, right| left * right }
-  #       r[:int].as(:value)
+  #       r[:int]
   #     end
   #
   #     start(:expr)
@@ -158,11 +154,11 @@ module Whittle
         raise GrammarError, "Undefined start rule #{start.inspect}" unless rules.key?(start)
         if rules[start].terminal?
-          rule(:*) do |r|
+          rule(:$start) do |r|
             r[start].as { |prog| prog }
           end
-          start(:*)
+          start(:$start)
         end
       end
     end
@@ -177,12 +173,13 @@ module Whittle
     # Accepts input in the form of a String and attempts to parse it according to the grammar.
     #
     # The input is scanned using a lexical analysis routine, defined by the #lex method. Each
-    # token detected by the routine is used to pick an action from the parse table.  Each
-    # reduction initially builds a branch in an AST (abstract syntax tree), until all input has
-    # been read and the start rule has been recognized, at which point the AST is evaluated by
-    # invoking the callbacks defined in the grammar in a depth-first fashion.
+    # token detected by the routine is used to pick an action from the parse table.
     #
-    # If the parser encounters a token it does not recognise, a parse error will be raised,
+    # Each time a sequence of inputs has been read that concludes a rule in the grammar, the
+    # inputs are passed as arguments to the block for that rule, converting the sequence into
+    # single input before the parse continues.
+    #
+    # If the parser encounters a token it does not expect, a parse error will be raised,
     # specifying what was expected, what was received, and on which line the error occurred.
     #
     # A successful parse returns the result of evaluating the start rule, whatever that may be.
@@ -200,39 +197,32 @@ module Whittle
       lex(input) do |token|
         line  = token[:line]
-        input = token
-        catch(:shifted) do
-          loop do
-            state = table[states.last]
+        loop do
+          state = table[states.last]
-            if ins = state[input[:name]] || state[nil]
-              case ins[:action]
-                when :shift
-                  input[:args] = [input.delete(:value)]
-                  states << ins[:state]
-                  args << input
-                  throw :shifted
-                when :reduce
-                  size = ins[:rule].components.length
-                  input = {
-                    :rule => ins[:rule],
-                    :name => ins[:rule].name,
-                    :line => line,
-                    :args => args.pop(size)
-                  }
-                  states.pop(size)
-                  args << input
+          if instruction = state[token[:name]] || state[nil]
+            case instruction[:action]
+            when :shift
+              states << instruction[:state]
+              args   << token[:rule].action.call(token[:value])
+              break
+            when :reduce
+              rule = instruction[:rule]
+              size = rule.components.length
+              args << rule.action.call(*args.pop(size))
+              states.pop(size)
-                  return accept(args.pop) if states.length == 1 && token[:name] == :$end
-                when :goto
-                  input = token
-                  states << ins[:state]
+              if states.length == 1 && token[:name] == :$end
+                return args.pop
+              elsif goto = table[states.last][rule.name]
+                states << goto[:state]
+                next
               end
-            else
-              error(state, input, :states => states, :args => args)
             end
           end
+          error(state, token, :states => states, :args => args)
         end
       end
     end
@@ -282,7 +272,7 @@ module Whittle
     # @param [Hash] stack
     #   the current parse context (arg stack + state stack)
     def error(state, input, stack)
-      expected = state.reject { |s, i| i[:action] == :goto }.keys
+      expected = extract_expected_tokens(state)
       message  = <<-ERROR.gsub(/\n\s+/, " ").strip
         Parse error:
         expected
@@ -309,8 +299,8 @@ module Whittle
       nil
     end
-    def accept(tree)
-      tree[:rule].action.call(*tree[:args].map { |arg| Hash === arg ? accept(arg) : arg })
+    def extract_expected_tokens(state)
+      state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
     end
   end
 end
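
The main behavioural change in this file is that rule blocks are now invoked at shift and reduce time instead of building a tree that `accept` evaluated afterwards. The sketch below replays the new loop against a hand-written table for the single rule `expr := int "+" int`. It is an assumption-laden illustration: `Rule`, `TABLE`, `lex` and `parse` here are hypothetical stand-ins, the table is written by hand rather than generated, and the real error reporting is replaced by a plain `ArgumentError`.

```ruby
# Hypothetical stand-in for a grammar rule (name, components, action block).
Rule = Struct.new(:name, :components, :action)

INT  = Rule.new(:int,  [], ->(text) { Integer(text) })
PLUS = Rule.new("+",   [], ->(text) { text })
EXPR = Rule.new(:expr, [:int, "+", :int], ->(left, _, right) { left + right })

# Hand-written table; a nil key means "any lookahead", as in the diff above.
TABLE = {
  0 => { :int => { action: :shift, state: 1 }, :expr => { action: :goto, state: 4 } },
  1 => { "+"  => { action: :shift, state: 2 } },
  2 => { :int => { action: :shift, state: 3 } },
  3 => { nil  => { action: :reduce, rule: EXPR } },
  4 => {}
}.freeze

# Simplistic lexer for the sketch: it silently ignores unrecognised characters.
def lex(input)
  tokens = input.scan(/[0-9]+|\+/).map do |text|
    text == "+" ? { name: "+", rule: PLUS, value: text } : { name: :int, rule: INT, value: text }
  end
  tokens << { name: :$end, rule: nil, value: nil }
end

def parse(input)
  states = [0]
  args   = []
  lex(input).each do |token|
    loop do
      state       = TABLE.fetch(states.last)
      instruction = state[token[:name]] || state[nil]
      raise ArgumentError, "unexpected #{token[:name].inspect}" unless instruction

      case instruction[:action]
      when :shift
        states << instruction[:state]
        args   << token[:rule].action.call(token[:value]) # evaluated immediately
        break
      when :reduce
        rule = instruction[:rule]
        size = rule.components.length
        args << rule.action.call(*args.pop(size)) # sequence collapses to one value
        states.pop(size)
        return args.pop if states.length == 1 && token[:name] == :$end

        goto = TABLE.fetch(states.last)[rule.name]
        raise ArgumentError, "no goto for #{rule.name.inspect}" unless goto
        states << goto[:state] # then retry the same token in the new state
      end
    end
  end
  nil
end

parse("1+2") # => 3
```

Because each reduction replaces its inputs on the argument stack with a single computed value, nothing resembling an AST is retained, which is consistent with the removal of `accept` in this version.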

data/lib/whittle/version.rb CHANGED

@@ -3,5 +3,5 @@
 # Copyright (c) Chris Corbyn, 2011
 module Whittle
-  VERSION = "0.0.2"
+  VERSION = "0.0.3"
 end

data/spec/unit/parser/surplus_input_spec.rb ADDED

@@ -0,0 +1,37 @@
+require "spec_helper"
+describe "a parser expecting a fixed amount of input" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("a")
+      rule("b")
+      rule("c")
+      rule(:prog) do |r|
+        r["a", "b", "c"]
+      end
+      start(:prog)
+    end
+  end
+  it "raises a parse error if additional input is encountered" do
+    expect { parser.new.parse("abcabc") }.to raise_error(Whittle::ParseError)
+  end
+  it "indicates that :$end is the expected token" do
+    begin
+      parser.new.parse("abcabc")
+    rescue Whittle::ParseError => e
+      e.expected.should == [:$end]
+    end
+  end
+  it "indicates that the first surplus token is the received input" do
+    begin
+      parser.new.parse("abcabc")
+    rescue Whittle::ParseError => e
+      e.received.should == "a"
+    end
+  end
+end
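
The `:$end` expectation in this spec relies on the new `extract_expected_tokens` helper, which reports a `nil` table key as `:$end`. The snippet below reproduces that mapping outside the parser so it can be tried in isolation. The two state hashes are made-up examples in the shape the diff suggests (token name mapped to an instruction hash), not states taken from a real Whittle parse table.

```ruby
# Same logic as the helper added in parser.rb, lifted out as a free-standing method.
def extract_expected_tokens(state)
  state.reject { |_name, instruction| instruction[:action] == :goto }
       .keys
       .map { |name| name.nil? ? :$end : name }
end

# Hypothetical state reached once "a", "b", "c" have all been read:
# only the end of input is acceptable, and goto entries are not tokens.
final_state = {
  nil   => { action: :reduce },
  :prog => { action: :goto, state: 1 }
}

# Hypothetical state reached after reading "a": only "b" may follow.
middle_state = { "b" => { action: :shift, state: 2 } }

extract_expected_tokens(final_state)  # => [:$end]
extract_expected_tokens(middle_state) # => ["b"]
```

Mapping `nil` to `:$end` gives the error message a nameable token for "end of input", which is what the second example in the spec asserts.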

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-  version: 0.0.2
+  version: 0.0.3
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-11-28 00:00:00.000000000 Z
+date: 2011-11-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &70351976364700 !ruby/object:Gem::Requirement
+  requirement: &70110140399280 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -21,7 +21,7 @@ dependencies:
         version: '2.6'
   type: :development
   prerelease: false
-  version_requirements: *70351976364700
+  version_requirements: *70110140399280
 description: ! "Write powerful parsers by defining a series of very simple rules\n
   \                    and operations to perform as those rules are matched.  Whittle\n
   \                    parsers are written in pure ruby and as such are extremely
@@ -62,6 +62,7 @@ files:
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
+- spec/unit/parser/surplus_input_spec.rb
 - spec/unit/parser/typecast_parser_spec.rb
 - whittle.gemspec
 homepage: https://github.com/d11wtq/whittle
@@ -101,5 +102,6 @@ test_files:
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
+- spec/unit/parser/surplus_input_spec.rb
 - spec/unit/parser/typecast_parser_spec.rb
 has_rdoc: