whittle 0.0.3 → 0.0.4

Files changed (6)

data/README.md +194 -0
data/lib/whittle/parser.rb +8 -7
data/lib/whittle/rule.rb +13 -6
data/lib/whittle/version.rb +1 -1
data/spec/unit/parser/premature_eof_spec.rb +43 -0
metadata +6 -4

data/README.md CHANGED

@@ -5,6 +5,8 @@ it's 100% ruby.  You write parsers by specifying sequences of allowable rules (w
 other rules, or even to themselves).  For each rule in your grammar, you provide a block that
 is invoked when the grammar is recognized.
+**TL;DR** (Skip to 'Summary & FAQ')
 If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
 parser.
@@ -485,6 +487,198 @@ would probably be a useful exercise.
 If you have any examples you'd like to contribute, I will gladly add them to the repository.
+## Summary & FAQ
+### Defining a rule to match a chunk of the input string
+These are called "terminal rules", since they don't lead anywhere beyond themselves.  A word of
+caution here: the ordering matters. They are scanned in order from top to bottom.
+``` ruby
+rule("keyword")
+# or
+rule(:name => /pattern/)
+```
+### Providing a semantic action for a terminal rule
+``` ruby
+rule(:int => /[0-9]+/).as { |str| Integer(str) }
+```
+### Defining a rule to match a sequence of other rules
+These are called "nonterminal rules", since they require chaining to other rules.
+``` ruby
+rule(:sum) do |r|
+  r[:int, "+", :int].as { |a, _, b| a + b }
+end
+```
+Where `:int` and `"+"` have been previously declared by other rules. Arguments `a` and `b` in the
+block are the two integers.  Argument `_` is the "+" (which we're not using, hence the argument
+name).
+### Defining alternatives for the same rule
+Call `[](*args)` more than once.
+``` ruby
+rule(:expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b }
+  r[:expr, "-", :expr].as { |a, _, b| a - b }
+  r[:int]
+end
+```
+### Skipping whitespace and comments
+``` ruby
+rule(:wsp     => /\s+/).skip!
+rule(:comment => /#.*$/).skip!
+```
+### Looking for the same thing multiple times
+Define an alternative that matches a single item, then add a recursive alternative that matches the rule itself followed by a single item.
+``` ruby
+rule(:list) do |r|
+  r[:list, :id].as { |list, id| list << id }
+  r[:id].as        { |id| [id] }
+end
+```
+If you want to allow zero of something, add an additional `r[]`.
+### Looking for a comma separated list of something
+As above, but with a comma in the recursive alternative.
+``` ruby
+rule(:list) do |r|
+  r[:list, ",", :id].as { |list, _, id| list << id }
+  r[:id].as        { |id| [id] }
+end
+```
+### Evaluate the left hand side of binary expressions as early as possible
+This is called left association. Tag the operators with `% :left`.  They are tagged `% :right` by default.
+``` ruby
+rule("+") % :left
+```
+### Give one operator a higher precedence than another
+Attach a precedence number to any operator that needs one. The higher the number, the higher the precedence.
+``` ruby
+rule("+") ^ 1
+rule("*") ^ 2
+```
+### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
+If you can explain it this simply on paper, you can explain it formally in your grammar.  If `:binary_expr`
+allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
+what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
+and use that new rule where you want to support both types of expression.
+Assuming your grammar looked something like this:
+``` ruby
+rule("+")
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id  => /\w+/)
+rule(:binary_expr) do |r|
+  r[:binary_expr, "+", :binary_expr].as { |a, _, b| a + b}
+  r[:int]
+end
+rule(:args) do |r|
+  r[].as            { [] } # empty list
+  r[:args, :int].as { |args, i| args << i }
+  r[:int].as        { |i| [i] }
+end
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+This grammar can parse things like "1 + 2 + 3" and "foo(1, 2, 3)", but it can't parse something like
+"1 + foo(2 + 3) + 4".
+The goal is to replace `:int` in the `:args` rule and `:binary_expr` in the `:binary_expr` rule, with
+something that represents both types of expression.
+``` ruby
+rule("+")
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id  => /\w+/)
+rule(:expr) do |r|
+  r[:binary_expr]
+  r[:invocation_expr]
+end
+rule(:binary_expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b}
+  r[:int]
+end
+rule(:args) do |r|
+  r[].as             { [] } # empty list
+  r[:args, :expr].as { |args, expr| args << expr }
+  r[:expr].as        { |expr| [expr] }
+end
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+Now we can parse the more complex expression "1 + foo(2 + 3) + 4" without any issues.
+### How do I track state to store variables etc with Whittle?
+One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
+something.  Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
+and do as you need during the parse.  For example, you could add a method to the class called something like `runtime`,
+which is accessible from each block.
+### I just want Whittle to give me an AST of my input
+AST (abstract syntax tree) is a loose term.  Early versions created an AST, but the format you want the AST
+in probably differs from the format the next developer wants it in.  It's really easy to use your grammar to make one
+however you please:
+``` ruby
+class Parser < Whittle::Parser
+  rule("+")
+  rule(:int => /[0-9]+/).as { |int| { :int => int } }
+  rule(:sum) do |r|
+    r[:int, "+", :int].as { |a, _, b| { :sum => [a, b] } }
+  end
+  start(:sum)
+end
+p Parser.new.parse("1+2")
+# =>
+# {:sum=>[{:int=>"1"}, {:int=>"2"}]}
+```
+(There could be a side-project in this if somebody thinks a "generic AST" is useful enough).
 ## Issues & Questions
 Any issues, I will address them quickly as it is still early days, though I am pretty confident,

data/lib/whittle/parser.rb CHANGED

@@ -139,10 +139,11 @@ module Whittle
             {},
             self,
             {
-              :state  => initial_state,
-              :seen   => [],
-              :offset => 0,
-              :prec   => 0
+              :initial => true,
+              :state   => initial_state,
+              :seen    => [],
+              :offset  => 0,
+              :prec    => 0
             }
           )
         end
@@ -207,13 +208,13 @@ module Whittle
               states << instruction[:state]
               args   << token[:rule].action.call(token[:value])
               break
-            when :reduce
+            when :reduce, :accept
               rule = instruction[:rule]
               size = rule.components.length
               args << rule.action.call(*args.pop(size))
               states.pop(size)
-              if states.length == 1 && token[:name] == :$end
+              if states.length == 1 && instruction[:action] == :accept
                 return args.pop
               elsif goto = table[states.last][rule.name]
                 states << goto[:state]
@@ -300,7 +301,7 @@ module Whittle
     end
     def extract_expected_tokens(state)
-      state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
+      state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
     end
   end
 end
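
Note on the parser.rb change: the hunks above make `:accept` an explicit table instruction instead of inferring acceptance from an `:$end` token, and limit the "expected" list to `:shift` and `:accept` entries. The sketch below is an illustration of that idea only, not Whittle's code. The hand-built `TABLE`, the `Rule` struct, `SketchParseError`, `sketch_parse`, and the use of a `nil` key as a default reduction are all assumptions made for the example; the grammar mirrors the one in the new spec (`prog -> abc ";"`, `abc -> "a" "b" "c"`).

```ruby
# Hypothetical stand-ins; none of these names come from Whittle.
Rule = Struct.new(:name, :size)
ABC  = Rule.new(:abc, 3)   # abc  -> "a" "b" "c"
PROG = Rule.new(:prog, 2)  # prog -> abc ";"

class SketchParseError < StandardError
  attr_reader :expected, :received

  def initialize(expected, received)
    super("expected #{expected.inspect}, got #{received.inspect}")
    @expected = expected
    @received = received
  end
end

# Hand-built table: state => { lookahead => instruction }.
# Assumption: a nil key acts as a default reduction for any lookahead.
TABLE = {
  0 => { "a"   => { :action => :shift,  :state => 1 },
         :abc  => { :action => :goto,   :state => 4 } },
  1 => { "b"   => { :action => :shift,  :state => 2 } },
  2 => { "c"   => { :action => :shift,  :state => 3 } },
  3 => { nil   => { :action => :reduce, :rule  => ABC } },
  4 => { ";"   => { :action => :shift,  :state => 5 } },
  5 => { :$end => { :action => :accept, :rule  => PROG } }
}

def sketch_parse(tokens)
  states = [0]
  args   = []
  input  = tokens + [:$end]
  pos    = 0

  loop do
    token       = input[pos]
    row         = TABLE[states.last]
    instruction = row[token] || row[nil]

    if instruction.nil?
      # Only entries that consume input (or accept) count as "expected".
      expected = row.select { |_, i| [:shift, :accept].include?(i[:action]) }.keys
      raise SketchParseError.new(expected, token)
    end

    case instruction[:action]
    when :shift
      states << instruction[:state]
      args   << token
      pos    += 1
    when :reduce, :accept
      rule = instruction[:rule]
      args << [rule.name, args.pop(rule.size)]
      states.pop(rule.size)
      # Acceptance is decided by the instruction, not by the token seen.
      return args.pop if states.length == 1 && instruction[:action] == :accept
      states << TABLE[states.last][rule.name][:state]
    end
  end
end

p sketch_parse(%w[a b c ;])
# => [:prog, [[:abc, ["a", "b", "c"]], ";"]]
```

With the truncated input `%w[a b c]`, the sketch reduces `abc`, then fails in the state that only knows how to shift `";"`. It raises an error whose `expected` is `[";"]` and whose `received` is `:$end`, which is the behaviour the new `premature_eof_spec.rb` describes.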

data/lib/whittle/rule.rb CHANGED

@@ -88,6 +88,13 @@ module Whittle
           :rule   => self,
           :prec   => context[:prec]
         }
+        if context[:initial]
+          state[:$end] = {
+            :action => :accept,
+            :rule   => self
+          }
+        end
       else
         raise GrammarError, "Unreferenced rule #{sym.inspect}" if rule.nil?
@@ -126,10 +133,11 @@ module Whittle
           table,
           parser,
           {
-            :state  => new_state,
-            :seen   => context[:seen],
-            :offset => new_offset,
-            :prec   => new_prec
+            :initial => context[:initial],
+            :state   => new_state,
+            :seen    => context[:seen],
+            :offset  => new_offset,
+            :prec    => new_prec
           }
         )
       end
@@ -227,8 +235,7 @@ module Whittle
         {
           :rule      => self,
           :value     => match[0],
-          # FIXME: Optimize this line count in a cross-platform compatible way
-          :line      => line + ("~" + match[0] + "~").lines.count - 1,
+          :line      => line + match[0].count("\r\n", "\n"),
           :discarded => @action.equal?(NULL_ACTION)
         }
       end
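
Note on the `:line` change: `String#count` with several arguments counts only characters that appear in every argument set, so `count("\r\n", "\n")` counts `"\n"` characters and nothing else. The sketch below compares the old and new expressions on a few inputs; the helper names `newlines_in` and `newlines_in_old` are hypothetical and exist only for this comparison.

```ruby
# New expression from the diff: the two character sets intersect to just "\n".
def newlines_in(text)
  text.count("\r\n", "\n")
end

# Old expression from the diff: pad both ends so a trailing newline still
# produces one extra element from String#lines, then subtract one.
def newlines_in_old(text)
  ("~" + text + "~").lines.count - 1
end

["no newline", "one\n", "two\r\nlines\r\n", "lone cr\r"].each do |sample|
  puts "#{sample.inspect}: new=#{newlines_in(sample)} old=#{newlines_in_old(sample)}"
end
```

Both expressions agree on these samples, including the CRLF case (each `"\r\n"` contributes one `"\n"`) and the lone `"\r"`, which neither treats as a line break.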

data/lib/whittle/version.rb CHANGED

@@ -3,5 +3,5 @@
 # Copyright (c) Chris Corbyn, 2011
 module Whittle
-  VERSION = "0.0.3"
+  VERSION = "0.0.4"
 end

data/spec/unit/parser/premature_eof_spec.rb ADDED

@@ -0,0 +1,43 @@
+require "spec_helper"
+describe "a parser receiving only partial input" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("a")
+      rule("b")
+      rule("c")
+      rule(";")
+      rule(:abc) do |r|
+        r["a", "b", "c"]
+      end
+      rule(:prog) do |r|
+        r[:abc, ";"]
+      end
+      start(:prog)
+    end
+  end
+  it "raises a parse error" do
+    expect { parser.new.parse("abc") }.to raise_error(Whittle::ParseError)
+  end
+  it "reports the expected token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.expected.should == [";"]
+    end
+  end
+  it "indicates :$end as the received token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.received.should == :$end
+    end
+  end
+end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-  version: 0.0.3
+  version: 0.0.4
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-11-29 00:00:00.000000000 Z
+date: 2011-11-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &70110140399280 !ruby/object:Gem::Requirement
+  requirement: &70302063205940 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -21,7 +21,7 @@ dependencies:
         version: '2.6'
   type: :development
   prerelease: false
-  version_requirements: *70110140399280
+  version_requirements: *70302063205940
 description: ! "Write powerful parsers by defining a series of very simple rules\n
   \                    and operations to perform as those rules are matched.  Whittle\n
   \                    parsers are written in pure ruby and as such are extremely
@@ -59,6 +59,7 @@ files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
@@ -99,6 +100,7 @@ test_files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb