RubyGems - whittle - Versions diffs - 0.0.3 → 0.0.4 - Mend

whittle 0.0.3 → 0.0.4

Files changed (6)

data/README.md +194 -0
data/lib/whittle/parser.rb +8 -7
data/lib/whittle/rule.rb +13 -6
data/lib/whittle/version.rb +1 -1
data/spec/unit/parser/premature_eof_spec.rb +43 -0
metadata +6 -4

data/README.md CHANGED

@@ -5,6 +5,8 @@ it's 100% ruby.  You write parsers by specifying sequences of allowable rules (w
 other rules, or even to themselves).  For each rule in your grammar, you provide a block that
 is invoked when the grammar is recognized.
+**TL;DR** (Skip to 'Summary & FAQ')
 If you're *not* familiar with parsing, you should find Whittle to be a very friendly little
 parser.
@@ -485,6 +487,198 @@ would probably be a useful exercise.
 If you have any examples you'd like to contribute, I will gladly add them to the repository.
+## Summary & FAQ
+### Defining a rule to match a chunk of the input string
+These are called "terminal rules", since they don't lead anywhere beyond themselves.  A word of
+caution here: the ordering matters. They are scanned in order from top to bottom.
+``` ruby
+rule("keyword")
+# or
+rule(:name => /pattern/)
+```
+### Providing a semantic action for a terminal rule
+``` ruby
+rule(:int => /[0-9]+/).as { |str| Integer(str) }
+```
+### Defining a rule to match a sequence of other rules
+These are called "nonterminal rules", since they require chaining to other rules.
+``` ruby
+rule(:sum) do |r|
+  r[:int, "+", :int].as { |a, _, b| a + b }
+end
+```
+Where `:int` and `"+"` have been previously declared by other rules. Arguments `a` and `b` in the
+block are the two integers.  Argument `_` is the "+" (which we're not using, hence the argument
+name).
+### Defining alternatives for the same rule
+Call `[](*args)` more than once.
+``` ruby
+rule(:expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b }
+  r[:expr, "-", :expr].as { |a, _, b| a - b }
+  r[:int]
+end
+```
+### Skipping whitespace and comments
+``` ruby
+rule(:wsp     => /\s+/).skip!
+rule(:comment => /#.*$/m).skip!
+```
+### Looking for the same thing multiple times
+Define the rule for the single item, then add another rule for itself, followed by the single item.
+``` ruby
+rule(:list) do |r|
+  r[:list, :id].as { |list, id| list << id }
+  r[:id].as        { |id| [id] }
+end
+```
+If you want to allow zero of something, add an additional `r[]`.
+### Looking for a comma separated list of something
+Just like for above, but with a comma in our recursive rule.
+``` ruby
+rule(:list) do |r|
+  r[:list, ",", :id].as { |list, _, id| list << id }
+  r[:id].as        { |id| [id] }
+end
+```
+### Evaluate the left hand side of binary expressions as early as possible
+This is called left association. Tag the operators with `% :left`.  They are tagged `% :right` by default.
+``` ruby
+rule("+") % :left
+```
+### Give one operator a higher precedence than another
+Attach a precedence number to any operators that need them. The higher the number, the higher the precedence.
+``` ruby
+rule("+") ^ 1
+rule("*") ^ 2
+```
+### I have two types of expression: binary and function call. How can I allow a binary expression in a function call argument, and a function call in a binary expression?
+If you can explain it this simply on paper, you can explain it formally in your grammar.  If `:binary_expr`
+allows `:invocation_expr` as an operand, and if `:invocation_expr` allows `:binary_expr` as an argument, then
+what you're saying is they can be used in place of each other; thus, define a rule that represents the two of them
+and use that new rule where you want to support both types of expression.
+Assuming your grammar looked something like this:
+``` ruby
+rule("+")
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id  => /\w+/)
+rule(:binary_expr) do |r|
+  r[:binary_expr, "+", :binary_expr].as { |a, _, b| a + b}
+  r[:int]
+end
+rule(:args) do |r|
+  r[].as            { [] } # empty list
+  r[:args, :int].as { |args, i| args << i }
+  r[:int].as        { |i| [i] }
+end
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+This grammar can parse things like "1 + 2 + 3" and "foo(1, 2, 3)", but it can't parse something like
+"1 + foo(2 + 3) + 4".
+The goal is to replace `:int` in the `:args` rule and `:binary_expr` in the `:binary_expr` rule, with
+something that represents both types of expression.
+``` ruby
+rule("+")
+rule(:int => /[0-9]+/).as { |i| Integer(i) }
+rule(:id  => /\w+/)
+rule(:expr) do |r|
+  r[:binary_expr]
+  r[:invocation_expr]
+end
+rule(:binary_expr) do |r|
+  r[:expr, "+", :expr].as { |a, _, b| a + b}
+  r[:int]
+end
+rule(:args) do |r|
+  r[].as             { [] } # empty list
+  r[:args, :expr].as { |args, expr| args << expr }
+  r[:expr].as        { |expr| [expr] }
+end
+rule(:invocation_expr) do |r|
+  r[:id, "(", :args, ")"].as { |name, _, args, _| FuncCall.new(name, args) }
+end
+```
+Now we can parse the more complex expression "1 + foo(2, 3) + 4" without any issues.
+### How do I track state to store variables etc with Whittle?
+One of the goals of making Whittle all ruby was that I wouldn't have to tie people into any particular way of doing
+something.  Your blocks can call any ruby code they like, so create an object of some sort those blocks can reference
+and do as you need during the parse.  For example, you could add a method to the class called something like `runtime`,
+which is accessible from each block.
+### I just want Whittle to give me an AST of my input
+AST (abstract syntax tree) is a loose term.  Early versions originally created an AST, but the format you want the AST
+in probably differs from the format the next developer wants it in.  It's really easy to use your grammar to make one
+however you please:
+``` ruby
+class Parser < Whittle::Parser
+  rule("+")
+  rule(:int => /[0-9]+/).as { |int| { :int => int } }
+  rule(:sum) do |r|
+    r[:int, "+", :int].as { |a, _, b| { :sum => [a, b] } }
+  end
+  start(:sum)
+end
+p Parser.new.parse("1+2")
+# =>
+# {:sum=>[{:int=>"1"}, {:int=>"2"}]}
+```
+(There could be a side-project in this if somebody thinks a "generic AST" is useful enough).
 ## Issues & Questions
 Any issues, I will address them quickly as it is still early days, though I am pretty confident,
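
Note on the README hunk above: the new section on `% :left` describes left association as evaluating the left hand side of a binary expression as early as possible. The plain-Ruby sketch below does not use Whittle at all; it only shows how the two groupings of "1 - 2 - 3" differ. The helper names `fold_left` and `fold_right` are hypothetical and exist only for this illustration.

```ruby
# Plain-Ruby illustration only; nothing here calls into Whittle.

# Left association groups as ((a - b) - c): fold from the first operand.
def fold_left(values)
  values.drop(1).inject(values.first) { |acc, v| acc - v }
end

# Right association groups as (a - (b - c)): fold from the last operand.
def fold_right(values)
  values[0...-1].reverse.inject(values.last) { |acc, v| v - acc }
end

operands = [1, 2, 3] # the operands of "1 - 2 - 3"
puts "left:  #{fold_left(operands)}"
puts "right: #{fold_right(operands)}"
```

For subtraction the two groupings give different results, which is why the README recommends tagging such operators explicitly.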

data/lib/whittle/parser.rb CHANGED

@@ -139,10 +139,11 @@ module Whittle
             {},
             self,
             {
-              :state  => initial_state,
-              :seen   => [],
-              :offset => 0,
-              :prec   => 0
+              :initial => true,
+              :state   => initial_state,
+              :seen    => [],
+              :offset  => 0,
+              :prec    => 0
             }
           )
         end
@@ -207,13 +208,13 @@ module Whittle
               states << instruction[:state]
               args   << token[:rule].action.call(token[:value])
               break
-            when :reduce
+            when :reduce, :accept
               rule = instruction[:rule]
               size = rule.components.length
               args << rule.action.call(*args.pop(size))
               states.pop(size)
-              if states.length == 1 && token[:name] == :$end
+              if states.length == 1 && instruction[:action] == :accept
                 return args.pop
               elsif goto = table[states.last][rule.name]
                 states << goto[:state]
@@ -300,7 +301,7 @@ module Whittle
     end
     def extract_expected_tokens(state)
-      state.reject { |s, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
+      state.select { |s, i| [:shift, :accept].include?(i[:action]) }.keys
     end
   end
 end
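
Note on the `extract_expected_tokens` change above: the two one-liners can be compared outside the gem on a hand-built state. This is a minimal sketch, assuming (as the removed `k.nil?` branch suggests) that a reduce instruction may be stored under a `nil` key. The `sample_state` hash, its rule names, and its state numbers are made up for illustration and are not taken from a real Whittle parse table.

```ruby
# Hypothetical parse-table state, shaped like the hashes in the hunk above.
def sample_state
  {
    ";"  => { :action => :shift,  :state => 4 },
    nil  => { :action => :reduce, :rule  => :abc },
    :abc => { :action => :goto,   :state => 2 }
  }
end

# 0.0.3 logic: drop only :goto entries, then report a nil key as :$end.
def expected_tokens_old(state)
  state.reject { |_, i| i[:action] == :goto }.keys.collect { |k| k.nil? ? :$end : k }
end

# 0.0.4 logic: keep only entries the parser can shift or accept on.
def expected_tokens_new(state)
  state.select { |_, i| [:shift, :accept].include?(i[:action]) }.keys
end

p expected_tokens_old(sample_state)
p expected_tokens_new(sample_state)
```

On this state the old logic reports `:$end` alongside `";"`, while the new logic reports only `";"`, which is consistent with what the added spec expects. The sketch relies on `Hash#select` returning a Hash, which holds on Ruby 1.9 and later.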

data/lib/whittle/rule.rb CHANGED

@@ -88,6 +88,13 @@ module Whittle
           :rule   => self,
           :prec   => context[:prec]
         }
+        if context[:initial]
+          state[:$end] = {
+            :action => :accept,
+            :rule   => self
+          }
+        end
       else
         raise GrammarError, "Unreferenced rule #{sym.inspect}" if rule.nil?
@@ -126,10 +133,11 @@ module Whittle
           table,
           parser,
           {
-            :state  => new_state,
-            :seen   => context[:seen],
-            :offset => new_offset,
-            :prec   => new_prec
+            :initial => context[:initial],
+            :state   => new_state,
+            :seen    => context[:seen],
+            :offset  => new_offset,
+            :prec    => new_prec
           }
         )
       end
@@ -227,8 +235,7 @@ module Whittle
         {
           :rule      => self,
           :value     => match[0],
-          # FIXME: Optimize this line count in a cross-platform compatible way
-          :line      => line + ("~" + match[0] + "~").lines.count - 1,
+          :line      => line + match[0].count("\r\n", "\n"),
           :discarded => @action.equal?(NULL_ACTION)
         }
       end
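
Note on the `:line` change above: `String#count` with several arguments counts characters in the intersection of the given sets, so `count("\r\n", "\n")` counts only `"\n"` characters. The sketch below compares the removed and added expressions on a few inputs. The method names `line_delta_old` and `line_delta_new` are hypothetical wrappers added for illustration.

```ruby
# 0.0.3 expression: pad the text so a trailing newline still produces a
# final element, split into lines, and subtract one.
def line_delta_old(text)
  ("~" + text + "~").lines.count - 1
end

# 0.0.4 expression: count characters present in both sets, i.e. "\n" only.
def line_delta_new(text)
  text.count("\r\n", "\n")
end

["foo", "foo\nbar", "foo\r\nbar\n", "a\rb"].each do |text|
  puts "#{text.inspect}: old=#{line_delta_old(text)} new=#{line_delta_new(text)}"
end
```

Both expressions agree on these inputs, including the lone `"\r"` case, which neither treats as a line break.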

data/lib/whittle/version.rb CHANGED

@@ -3,5 +3,5 @@
 # Copyright (c) Chris Corbyn, 2011
 module Whittle
-  VERSION = "0.0.3"
+  VERSION = "0.0.4"
 end

data/spec/unit/parser/premature_eof_spec.rb ADDED

@@ -0,0 +1,43 @@
+require "spec_helper"
+describe "a parser receiving only partial input" do
+  let(:parser) do
+    Class.new(Whittle::Parser) do
+      rule("a")
+      rule("b")
+      rule("c")
+      rule(";")
+      rule(:abc) do |r|
+        r["a", "b", "c"]
+      end
+      rule(:prog) do |r|
+        r[:abc, ";"]
+      end
+      start(:prog)
+    end
+  end
+  it "raises a parse error" do
+    expect { parser.new.parse("abc") }.to raise_error(Whittle::ParseError)
+  end
+  it "reports the expected token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.expected.should == [";"]
+    end
+  end
+  it "indicates :$end as the received token" do
+    begin
+      parser.new.parse("abc")
+    rescue Whittle::ParseError => e
+      e.received.should == :$end
+    end
+  end
+end
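
Note on the spec above: it checks two readers on `Whittle::ParseError`, namely `expected` and `received`. Calling code could use them to build a message, roughly as sketched below. The helper `describe_parse_error` is hypothetical, and the `Struct` is only a stand-in so the example runs without the gem installed; it is not part of Whittle.

```ruby
# Formats a message from any error object that exposes #expected and
# #received, the two readers exercised by the spec above.
def describe_parse_error(error)
  expected = error.expected.map { |token| token.inspect }.join(", ")
  "expected one of [#{expected}] but received #{error.received.inspect}"
end

# Hypothetical stand-in for a rescued Whittle::ParseError.
stand_in = Struct.new(:expected, :received).new([";"], :$end)
puts describe_parse_error(stand_in)
```

With the real gem, the same helper would presumably be called from a `rescue Whittle::ParseError => e` clause, as the spec does.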

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: whittle
 version: !ruby/object:Gem::Version
-  version: 0.0.3
+  version: 0.0.4
   prerelease:
 platform: ruby
 authors:
@@ -9,11 +9,11 @@ authors:
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2011-11-29 00:00:00.000000000 Z
+date: 2011-11-30 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: rspec
-  requirement: &70110140399280 !ruby/object:Gem::Requirement
+  requirement: &70302063205940 !ruby/object:Gem::Requirement
     none: false
     requirements:
     - - ~>
@@ -21,7 +21,7 @@ dependencies:
         version: '2.6'
   type: :development
   prerelease: false
-  version_requirements: *70110140399280
+  version_requirements: *70302063205940
 description: ! "Write powerful parsers by defining a series of very simple rules\n
   \                    and operations to perform as those rules are matched.  Whittle\n
   \                    parsers are written in pure ruby and as such are extremely
@@ -59,6 +59,7 @@ files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb
@@ -99,6 +100,7 @@ test_files:
 - spec/unit/parser/noop_spec.rb
 - spec/unit/parser/pass_through_parser_spec.rb
 - spec/unit/parser/precedence_spec.rb
+- spec/unit/parser/premature_eof_spec.rb
 - spec/unit/parser/self_referential_expr_spec.rb
 - spec/unit/parser/skipped_tokens_spec.rb
 - spec/unit/parser/sum_parser_spec.rb