tdp4r 1.3.3

data/doc/faq.txt ADDED
@@ -0,0 +1,37 @@
+ * How do I write a rule that represents left/right-associative
+   infix operators?
+
+   A good example is an arithmetic expression with "*", "/",
+   "+" and "-". With racc (a yacc-style parser generator for Ruby), you
+   would write the following rule:
+
+     prechigh
+       left '*','/'
+       left '+','-'
+     preclow
+     ...
+     expr : expr '*' expr { result = val[0] * val[2] }
+          | expr '/' expr { result = val[0] / val[2] }
+          | expr '+' expr { result = val[0] + val[2] }
+          | expr '-' expr { result = val[0] - val[2] }
+          | NUMBER        { result = val[0].to_i() }
+
+   In TDP4R, you can write the above rule as follows:
+
+     TDParser.define{|g|
+       g.expr = chainl(NUMBER >> Proc.new{|x| x[0].to_i},
+                       token("*")|token("/"),
+                       token("+")|token("-")){|x|
+         case x[1]
+         when "*"
+           x[0] * x[2]
+         when "/"
+           x[0] / x[2]
+         when "+"
+           x[0] + x[2]
+         when "-"
+           x[0] - x[2]
+         end
+       }
+       ...
+     }
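For readers who want to see the shape of computation a chainl-style rule performs without installing anything, here is a plain-Ruby sketch. It does not use TDP4R; `OPS` and `eval_chainl` are illustrative names of my own, and a single precedence level is assumed.

```ruby
# Left-associative evaluation of a flat token list such as
# ["10", "-", "4", "-", "3"] -- the kind of fold a chainl-style
# rule applies. Plain Ruby, no TDP4R required.
OPS = {
  "+" => ->(a, b) { a + b },
  "-" => ->(a, b) { a - b },
  "*" => ->(a, b) { a * b },
  "/" => ->(a, b) { a / b },
}

def eval_chainl(tokens)
  acc = tokens.first.to_i
  # Fold the remaining (operator, operand) pairs in from the left.
  tokens.drop(1).each_slice(2) do |op, num|
    acc = OPS[op].call(acc, num.to_i)
  end
  acc
end

eval_chainl(["10", "-", "4", "-", "3"])  # => 3, i.e. (10 - 4) - 3
```

Because the fold consumes pairs left to right, "-" comes out left-associative, matching the racc `left` declarations above.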
data/doc/guide.txt ADDED
@@ -0,0 +1,150 @@
+ TDP4R Programmers Guide
+
+ Introduction
+ ------------
+ TDP4R is a Ruby component that helps us construct a top-down parser using
+ method calls. This document describes how to use TDP4R in two styles, both
+ of which are superficially similar to the style of JavaCC. In the first
+ style, we define the rules of a grammar as methods (as shown in sample4.rb).
+ In the second, each rule is defined as if it were a property of a grammar
+ (see also sample5.rb).
+
+ Defining Rules in a Module
+ --------------------------
+ The following parser class accepts expressions that consist of digits
+ and "+".
+
+   class MyParser
+     include TDParser
+
+     def expr
+       token(/\d+/) - token("+") - rule(:expr) >> proc{|x| x[0].to_i + x[2] } |
+       token(/\d+/) >> proc{|x| x[0].to_i }
+     end
+   end
+
+ In this class, the method expr represents the following production rule:
+
+   expr := int '+' expr
+         | int
+
+ In the first alternative, the values accepted by token(/\d+/), token("+"),
+ and rule(:expr) are assigned to x[0], x[1], and x[2], respectively.
+ In order to parse "1 + 2", we first split it into an array of tokens,
+ ["1", "+", "2"], and then call the parse method of a parser object created
+ by MyParser.new(), as follows:
+
+   parser = MyParser.new()
+   parser.expr.parse(["1", "+", "2"])
+
+ Note that we can pass any of the following objects to the parse method.
+
+ - an Enumerable object
+   E.g.: expr.parse(["1", "+", "2"])
+
+ - an object that has the methods 'shift' and 'unshift'
+   E.g.: expr.parse(TDParser::TokenGenerator.new{|x|
+           x.yield("1"); x.yield("+"); x.yield("2")
+         })
+
+ - a block
+   E.g.: expr.parse{|x| x.yield("1"); x.yield("+"); x.yield("2") }
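The shift/unshift contract in the second case is easy to satisfy with a small class of one's own. The following sketch (TokenSource is a hypothetical name, not part of TDP4R) assumes only that shift returns nil at end of input:

```ruby
# A minimal token source with the shift/unshift contract that a
# parse method can consume: shift pops the next token (nil when
# exhausted) and unshift pushes tokens back for backtracking.
class TokenSource
  def initialize(tokens)
    @buffer = tokens.dup
  end

  def shift
    @buffer.shift            # Array#shift returns nil when empty
  end

  def unshift(*tokens)
    @buffer.unshift(*tokens) # put tokens back at the front
  end
end

src = TokenSource.new(["1", "+", "2"])
t = src.shift     # => "1"
src.unshift(t)    # peek: push it back
src.shift         # => "1" again
```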
53
+
+ In that rule, '+' is right-associative. However, we *can't* write it as
+ follows.
+
+   def expr
+     rule(:expr) - token("+") - token(/\d+/) >> proc{|x| x[0].to_i + x[2].to_i } |
+     token(/\d+/) >> proc{|x| x[0].to_i }
+   end
+
+ This problem is called the left-recursion problem, so we have to use one of
+ the following rules instead.
+
+   def expr
+     token(/\d+/) - (token("+") - token(/\d+/))*0 >> proc{|x|
+       x[1].inject(x[0].to_i){|acc,y|
+         case y[0]
+         when "+"
+           acc + y[1].to_i
+         end
+       }
+     }
+   end
+
+   def expr # JavaCC style
+     n = nil
+     (token(/\d+/) >> proc{|x| n = x[0].to_i }) -
+     (token("+") - token(/\d+/) >> proc{|y|
+       case y[0]
+       when "+"
+         n += y[1].to_i
+       end
+     })*0 >> proc{|x| n }
+   end
+
+ In these rules, '(...)*N' represents N or more repetitions of '(...)'. x[1]
+ holds the sequences of tokens accepted by '(...)*0'. For example, if
+ ["1", "+", "1", "+", "2"] is parsed by the rule
+
+   token(/\d+/) - (token("+") - token(/\d+/))*0,
+
+ then x[1] is [["+", "1"], ["+", "2"]].
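The inject in the first rewritten rule is an ordinary left fold. Independently of the library, the fold over x[1] behaves like this in plain Ruby (the pairs value mirrors the example above; the "-" branch is added for illustration):

```ruby
# The repetition part of the rule yields pairs like ["+", "1"];
# inject folds them into the accumulator from the left, which is
# exactly what makes the result left-associative.
pairs = [["+", "1"], ["+", "2"]]   # what x[1] holds for "1 + 1 + 2"
result = pairs.inject(1) do |acc, (op, num)|
  case op
  when "+" then acc + num.to_i
  when "-" then acc - num.to_i
  end
end
result  # => 4
```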
92
+
+
+ Defining Rules using TDParser.define()
+ ---------------------------------------
+ The rule defined in the first sample script, shown in the previous section,
+ can also be defined as follows.
+
+   parser = TDParser.define{|g|
+     g.expr =
+       g.token(/\d+/) - g.token("+") - g.expr >> proc{|x| x[0].to_i + x[2] } |
+       g.token(/\d+/) >> proc{|x| x[0].to_i }
+   }
+
+ (See also sample5.rb and sample6.rb.)
+
+ Parser Combinators
+ -------------------
+
+ * Constructors
+     token(obj)
+     rule(method)
+     any()             any token
+     none()            no more tokens
+     empty()           empty
+     fail()            failure
+     backref(label)    back reference
+     stackref(stack)   stack reference
+
+ * Operators
+     rule - rule       sequence
+     rule | rule       choice
+     rule * n          iteration
+     rule * (n..m)     iteration
+     rule / label      label
+     rule % stack      stack
+     ~ rule            negative lookahead
+
+ * Utility Functions
+     leftrec(base, rule1, ..., ruleN, &action)
+       This constructs the following rule:
+         base - ruleN* >> action' |
+         ... |
+         base - rule1* >> action' |
+         fail()
+     rightrec(rule1, ..., ruleN, base, &action)
+       This constructs the following rule:
+         ruleN* - base >> action' |
+         ... |
+         rule1* - base >> action' |
+         fail()
+     chainl(base, infix1, ..., infixN, &action)
+     chainr(base, infix1, ..., infixN, &action)
+
+
+ StringTokenizer
+ -----------------
+ There is a simple tokenizer called TDPUtils::StringTokenizer in the library
+ "tdputils".
+ (See MyParser#parse in sample2.rb.)
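The guide does not show StringTokenizer's API, so here is a minimal regexp-based stand-in for the arithmetic examples (tokenize is a hypothetical helper of mine, not tdputils code):

```ruby
# A minimal tokenizer for the arithmetic examples: integers and the
# operators + - * /, with whitespace skipped. String#scan returns
# the matches in source order, ready to hand to a parse method.
def tokenize(str)
  str.scan(%r{\d+|[-+*/]})
end

tokenize("1 + 2*30 - 4")  # => ["1", "+", "2", "*", "30", "-", "4"]
```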
data/lib/tdp.rb ADDED
@@ -0,0 +1,463 @@
+ # -*- ruby -*-
+ #
+ # A top-down parser to be embedded in a Ruby script.
+ #
+
+ require 'generator'
+
+ module TDParser
+   class ParserException < RuntimeError
+   end
+
+   class TokenGenerator < Generator
+     def initialize(*args)
+       super(*args)
+       @buffer = []
+     end
+
+     def shift()
+       if( @buffer.empty? )
+         if( self.next? )
+           token = self.next()
+         else
+           token = nil
+         end
+       else
+         token = @buffer.shift()
+       end
+       token
+     end
+
+     def unshift(*token)
+       @buffer.unshift(*token)
+     end
+   end
+
+   class TokenBuffer < Array
+     attr_accessor :map
+
+     def initialize(*args)
+       super(*args)
+       @map = {}
+     end
+
+     def [](idx)
+       case idx
+       when Symbol, String
+         @map[idx]
+       else
+         super(idx)
+       end
+     end
+
+     def []=(idx, val)
+       case idx
+       when Symbol, String
+         @map[idx] = val
+       else
+         super(idx, val)
+       end
+     end
+
+     def state()
+       @map[:__state__]
+     end
+
+     def state=(s)
+       @map[:__state__] = s
+     end
+
+     def clear()
+       super()
+       @map.clear()
+     end
+   end
+
+   class Sequence < Array
+     def +(seq)
+       self.dup.concat(seq)
+     end
+   end
81
+
+   module BufferUtils
+     def prepare(buff)
+       b = TokenBuffer.new()
+       b.map = buff.map
+       b
+     end
+
+     def recover(buff, ts)
+       buff.each{|b| ts.unshift(b)}
+       buff.clear()
+     end
+   end
+   include BufferUtils
+
+   class Rule < Proc
+     include BufferUtils
+
+     def -(r)
+       Rule.new{|ts, buff|
+         if( (x = self[ts, buff]).nil? )
+           nil
+         else
+           if( (y = r[ts, buff]).nil? )
+             nil
+           else
+             x + y
+           end
+         end
+       }
+     end
+
+     def |(r)
+       Rule.new{|ts, buff|
+         b = prepare(buff)
+         if( (x = self[ts, b]).nil? )
+           recover(b, ts)
+           r[ts, buff]
+         else
+           buff.insert(0, *b)
+           x
+         end
+       }
+     end
+
+     def *(n)
+       if( n.is_a?(Range) )
+         range = n
+         n = range.min
+       else
+         range = nil
+       end
+       Rule.new{|ts, buff|
+         x = true
+         xs = []
+         while( n > 0 )
+           n -= 1
+           b = prepare(buff)
+           if( (x = self[ts, b]).nil? )
+             recover(b, ts)
+             break
+           else
+             buff.insert(0, *b)
+             xs.push(x)
+           end
+         end
+         if( x.nil? )
+           nil
+         else
+           if( range )
+             range.each{
+               while( true )
+                 y = x
+                 b = prepare(buff)
+                 if( (x = self[ts, b]).nil? )
+                   recover(b, ts)
+                   x = y
+                   break
+                 else
+                   buff.insert(0, *b)
+                   xs.push(x)
+                 end
+               end
+             }
+           else
+             while( true )
+               y = x
+               b = prepare(buff)
+               if( (x = self[ts, b]).nil? )
+                 recover(b, ts)
+                 x = y
+                 break
+               else
+                 buff.insert(0, *b)
+                 xs.push(x)
+               end
+             end
+           end
+           Sequence[xs]
+         end
+       }
+     end
+
+     def >>(act)
+       Rule.new{|tokens, buff|
+         if( (x = self[tokens, buff]).nil? )
+           nil
+         else
+           x = TokenBuffer[*x]
+           x.map = buff.map
+           Sequence[act[x]]
+         end
+       }
+     end
+
+     def /(symbol)
+       Rule.new{|tokens, buff|
+         x = self[tokens, buff]
+         buff.map[symbol] = x
+         x
+       }
+     end
+
+     def %(stack)
+       Rule.new{|tokens, buff|
+         x = self[tokens, buff]
+         stack.push(x)
+         x
+       }
+     end
+
+     def >(symbol)
+       Rule.new{|tokens, buff|
+         buff[symbol] = buff.dup()
+         self[tokens, buff]
+       }
+     end
+
+     def ~@()
+       Rule.new{|tokens, buff|
+         b = prepare(buff)
+         r = self[tokens, b]
+         rev = b.reverse
+         recover(b, tokens)
+         if( r.nil? )
+           Sequence[Sequence[*rev]]
+         else
+           nil
+         end
+       }
+     end
+
+     def parse(tokens=nil, &blk)
+       if( blk.nil? )
+         if( tokens.respond_to?(:shift) && tokens.respond_to?(:unshift) )
+           @tokens = tokens
+         elsif( tokens.respond_to?(:each) )
+           @tokens = TokenGenerator.new(tokens)
+         else
+           @tokens = tokens
+         end
+       else
+         @tokens = TokenGenerator.new(&blk)
+       end
+       r = self[@tokens, TokenBuffer.new()]
+       if( r.nil? )
+         nil
+       else
+         r[0]
+       end
+     end
+
+     def peek()
+       t = @tokens.shift()
+       if( ! t.nil? )
+         @tokens.unshift(t)
+       end
+       t
+     end
+
+     def do(&block)
+       self >> block
+     end
+   end
+   # end of Rule
266
+
+   def rule(sym, *opts)
+     Rule.new{|tokens, buff|
+       res = nil
+       case sym
+       when Symbol, String
+         res = __send__(sym, *opts)[tokens, buff]
+       when Rule
+         res = sym[tokens, buff]
+       end
+       if( block_given? && !res.nil? )
+         res = yield(res)
+       end
+       res
+     }
+   end
+
+   def token(x, eqsym=:===)
+     Rule.new{|tokens, buff|
+       t = tokens.shift
+       buff.unshift(t)
+       if( x.__send__(eqsym, t) || t.__send__(eqsym, x) )
+         t = yield(t) if( block_given? )
+         Sequence[t]
+       else
+         nil
+       end
+     }
+   end
+
+   def __backref__(xs, eqsym)
+     x = xs.shift()
+     xs.inject(token(x, eqsym)){|acc,x|
+       case x
+       when Sequence
+         acc - __backref__(x, eqsym)
+       else
+         acc - token(x, eqsym)
+       end
+     }
+   end
+
+   def backref(x, eqsym=:===)
+     Rule.new{|tokens, buff|
+       ys = buff.map[x]
+       if( ys.nil? || ys.empty? )
+         nil
+       else
+         __backref__(ys.dup(), eqsym)[tokens, buff]
+       end
+     }
+   end
+
+   def stackref(stack, eqsym=:===)
+     Rule.new{|tokens, buff|
+       ys = stack.pop()
+       if( ys.nil? || ys.empty? )
+         nil
+       else
+         __backref__(ys.dup(), eqsym)[tokens, buff]
+       end
+     }
+   end
+
+   def state(s)
+     Rule.new{|tokens, buff|
+       if( buff.map[:state] == s )
+         Sequence[s]
+       else
+         nil
+       end
+     }
+   end
+
+   def empty_rule()
+     Rule.new{|tokens, buff| Sequence[nil] }
+   end
+   alias empty empty_rule
+
+   def any_rule()
+     Rule.new{|tokens, buff|
+       t = tokens.shift
+       if( t.nil? )
+         nil
+       else
+         Sequence[t]
+       end
+     }
+   end
+   alias any any_rule
+
+   def none_rule()
+     Rule.new{|tokens, buff|
+       t = tokens.shift
+       if( t.nil? )
+         Sequence[nil]
+       else
+         nil
+       end
+     }
+   end
+   alias none none_rule
+
+   def fail_rule()
+     Rule.new{|tokens, buff| nil }
+   end
+   alias fail fail_rule
+
+   def leftrec(*rules, &act)
+     f = Proc.new{|x|
+       x[1].inject(x[0]){|acc,y|
+         act.call(Sequence[acc, *y])
+       }
+     }
+     base = rules.shift()
+     rules.collect{|r| base - r*0 >> f }.inject(fail()){|acc,r| r | acc }
+   end
+
+   def rightrec(*rules, &act)
+     f = Proc.new{|x|
+       x[0].reverse.inject(x[1]){|acc,y|
+         ys = y.dup()
+         ys.push(acc)
+         act.call(Sequence[*ys])
+       }
+     }
+     base = rules.pop()
+     rules.collect{|r| r*0 - base >> f }.inject(fail()){|acc,r| r | acc }
+   end
+
+   def chainl(base, *infixes, &act)
+     infixes.inject(base){|acc,r|
+       leftrec(acc, r - acc, &act)
+     }
+   end
+
+   def chainr(base, *infixes, &act)
+     infixes.inject(base){|acc,r|
+       rightrec(acc - r, acc, &act)
+     }
+   end
407
+
+   class Grammar
+     include TDParser
+
+     def define(&block)
+       instance_eval{
+         begin
+           alias method_missing g_method_missing
+           block.call(self)
+         ensure
+           undef method_missing
+         end
+       }
+     end
+
+     def g_method_missing(sym, *args)
+       arg0 = args[0]
+       sym = sym.to_s()
+       if( sym[-1,1] == "=" )
+         case arg0
+         when Rule
+           self.class.instance_eval{
+             define_method(sym[0..-2]){ arg0 }
+           }
+         else
+           t = token(arg0)
+           self.class.instance_eval{
+             define_method(sym[0..-2]){ t }
+           }
+         end
+       elsif( args.size == 0 )
+         rule(sym)
+       else
+         raise(NoMethodError, "undefined method `#{sym}' for #{self.inspect}")
+       end
+     end
+
+     alias method_missing g_method_missing
+   end
+
+   def TDParser.define(*args, &block)
+     klass = Class.new(Grammar)
+     g = klass.new()
+     begin
+       if defined?(g.instance_exec)
+         g.instance_exec(g, &block)
+       else
+         g.instance_eval(&block)
+       end
+     ensure
+       g.instance_eval{
+         undef method_missing
+       }
+     end
+     g
+   end
+ end