ebnf 0.0.1 → 0.1.0
- data/README.md +21 -7
- data/VERSION +1 -1
- data/bin/ebnf +73 -16
- data/etc/{ebnf.bnf → ebnf.ebnf} +2 -2
- data/etc/ebnf.ll1 +1010 -0
- data/etc/turtle.ebnf +70 -0
- data/etc/turtle.ll1 +1565 -0
- data/etc/turtle.rb +1375 -0
- data/lib/ebnf.rb +16 -1023
- data/lib/ebnf/base.rb +266 -0
- data/lib/ebnf/bnf.rb +50 -0
- data/lib/ebnf/ll1.rb +321 -0
- data/lib/ebnf/ll1/lexer.rb +11 -11
- data/lib/ebnf/ll1/parser.rb +28 -32
- data/lib/ebnf/ll1/scanner.rb +1 -1
- data/lib/ebnf/parser.rb +297 -0
- data/lib/ebnf/rule.rb +362 -0
- metadata +12 -3
data/lib/ebnf/base.rb
ADDED
@@ -0,0 +1,266 @@
```ruby
require 'strscan'

# Extended Backus-Naur Form (EBNF), in the W3C variation originally
# defined in the
# [W3C XML 1.0 Spec](http://www.w3.org/TR/REC-xml/#sec-notation).
#
# This version attempts to be less strict than the strict definition
# to allow for colloquial variations (such as in the Turtle syntax).
#
# A rule takes the following form:
#     \[1\] symbol ::= expression
#
# Comments include the content between '/*' and '*/'
#
# @see http://www.w3.org/2000/10/swap/grammar/ebnf2turtle.py
# @see http://www.w3.org/2000/10/swap/grammar/ebnf2bnf.n3
#
# Based on bnf2turtle by Dan Connolly.
#
# Motivation
# ----------
#
# Many specifications include grammars that look formal but are not
# actually checked, by machine, against test data sets. Debugging the
# grammar in the XML specification has been a long, tedious manual
# process. Only when the loop is closed between a fully formal grammar
# and a large test data set can we be confident that we have an accurate
# specification of a language (and even then, only the syntax of the language).
#
# The grammar in the [N3 design note][] has evolved based on the original
# manual transcription into a python recursive-descent parser and
# subsequent development of test cases. Rather than maintain the grammar
# and the parser independently, our [goal] is to formalize the language
# syntax sufficiently to replace the manual implementation with one
# derived mechanically from the specification.
#
# [N3 design note]: http://www.w3.org/DesignIssues/Notation3
#
# Related Work
# ------------
#
# Sean Palmer's [n3p announcement][] demonstrated the feasibility of the
# approach, though that work did not cover some aspects of N3.
#
# In development of the [SPARQL specification][], Eric Prud'hommeaux
# developed [Yacker][], which converts EBNF syntax to perl and C and C++
# yacc grammars. It includes an interactive facility for checking
# strings against the resulting grammars.
# Yosi Scharf used it in [cwm Release 1.1.0rc1][], which includes
# a SPARQL parser that is *almost* completely mechanically generated.
#
# The N3/turtle output from yacker is lower level than the EBNF notation
# from the XML specification; it has the ?, +, and * operators compiled
# down to pure context-free rules, obscuring the grammar
# structure. Since that transformation is straightforwardly expressed in
# semantic web rules (see [bnf-rules.n3][]), it seems best to keep the RDF
# expression of the grammar in terms of the higher level EBNF
# constructs.
#
# [goal]: http://www.w3.org/2002/02/mid/1086902566.21030.1479.camel@dirk;list=public-cwm-bugs
# [n3p announcement]: http://lists.w3.org/Archives/Public/public-cwm-talk/2004OctDec/0029.html
# [Yacker]: http://www.w3.org/1999/02/26-modules/User/Yacker
# [SPARQL specification]: http://www.w3.org/TR/rdf-sparql-query/
# [cwm Release 1.1.0rc1]: http://lists.w3.org/Archives/Public/public-cwm-announce/2005JulSep/0000.html
# [bnf-rules.n3]: http://www.w3.org/2000/10/swap/grammar/bnf-rules.n3
#
# Open Issues and Future Work
# ---------------------------
#
# The yacker output also has the terminals compiled to elaborate regular
# expressions. The best strategy for dealing with lexical tokens is not
# yet clear. Many tokens in SPARQL are case insensitive; this is not yet
# captured formally.
#
# The schema for the EBNF vocabulary used here (``g:seq``, ``g:alt``, ...)
# is not yet published; it should be aligned with [swap/grammar/bnf][]
# and the [bnf2html.n3][] rules (and/or the style of linked XHTML grammar
# in the SPARQL and XML specifications).
#
# It would be interesting to corroborate the claim in the SPARQL spec
# that the grammar is LL(1) with a mechanical proof based on N3 rules.
#
# [swap/grammar/bnf]: http://www.w3.org/2000/10/swap/grammar/bnf
# [bnf2html.n3]: http://www.w3.org/2000/10/swap/grammar/bnf2html.n3
#
# Background
# ----------
#
# The [N3 Primer] by Tim Berners-Lee introduces RDF and the Semantic
# web using N3, a teaching and scribbling language. Turtle is a subset
# of N3 that maps directly to (and from) the standard XML syntax for
# RDF.
#
# [N3 Primer]: http://www.w3.org/2000/10/swap/Primer.html
#
# @author Gregg Kellogg
module EBNF
  class Base
    include BNF
    include LL1
    include Parser

    # Abstract syntax tree from parse
    # @!attribute [r] ast
    # @return [Array<Rule>]
    attr_reader :ast

    # Grammar errors, or errors found generating parse tables
    # @!attribute [r] errors
    # @return [Array<String>]
    attr_accessor :errors

    # Parse the string or file input generating an abstract syntax tree
    # in S-Expressions (similar to SPARQL SSE)
    #
    # @param [#read, #to_s] input
    # @param [Hash{Symbol => Object}] options
    # @option options [Boolean, Array] :debug
    #   Output debug information to an array or STDOUT.
    def initialize(input, options = {})
      @options = options
      @lineno, @depth, @errors = 1, 0, []
      terminal = false
      @ast = []

      input = input.respond_to?(:read) ? input.read : input.to_s
      scanner = StringScanner.new(input)

      eachRule(scanner) do |r|
        debug("rule string") {r.inspect}
        case r
        when /^@terminals/
          # Switch mode to parsing terminals
          terminal = true
        when /^@pass\s*(.*)$/m
          rule = depth {ruleParts("[0] " + r)}
          rule.kind = :pass
          rule.orig = r
          @ast << rule
        else
          rule = depth {ruleParts(r)}

          rule.kind = :terminal if terminal # Override after we've parsed @terminals
          rule.orig = r
          @ast << rule
        end
      end
    end

    # Iterate over each rule or terminal
    # @param [:terminal, :rule] kind
    # @yield rule
    # @yieldparam [Rule] rule
    def each(kind, &block)
      ast.each {|r| block.call(r) if r.kind == kind}
    end

    ##
    # Write out parsed syntax string as an S-Expression
    # @return [String]
    def to_sxp
      begin
        require 'sxp'
        SXP::Generator.string(ast.sort)
      rescue LoadError
        ast.to_sxp
      end
    end
    def to_s; to_sxp; end

    def dup
      new_obj = super
      new_obj.instance_variable_set(:@ast, @ast.dup)
      new_obj
    end

    ##
    # Find a rule given a symbol
    # @param [Symbol] sym
    # @return [Rule]
    def find_rule(sym)
      (@find ||= {})[sym] ||= ast.detect {|r| r.sym == sym}
    end

    ##
    # Write out syntax tree as Turtle
    # @param [String] prefix for language
    # @param [String] ns URI for language
    # @return [String]
    def to_ttl(prefix, ns)
      unless ast.empty?
        [
          "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.",
          "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.",
          "@prefix #{prefix}: <#{ns}>.",
          "@prefix : <#{ns}>.",
          "@prefix re: <http://www.w3.org/2000/10/swap/grammar/regex#>.",
          "@prefix g: <http://www.w3.org/2000/10/swap/grammar/ebnf#>.",
          "",
          ":language rdfs:isDefinedBy <>; g:start :#{ast.first.id}.",
          "",
        ]
      end.join("\n") +

      ast.sort.
        select {|a| [:rule, :terminal].include?(a.kind)}.
        map(&:to_ttl).
        join("\n")
    end

    def depth
      @depth += 1
      ret = yield
      @depth -= 1
      ret
    end

    # Progress output, less than debugging
    def progress(*args)
      return unless @options[:progress]
      options = args.last.is_a?(Hash) ? args.pop : {}
      depth = options[:depth] || @depth
      args << yield if block_given?
      message = "#{args.join(': ')}"
      str = "[#{@lineno}]#{' ' * depth}#{message}"
      @options[:debug] << str if @options[:debug].is_a?(Array)
      $stderr.puts(str)
    end

    # Error output
    def error(*args)
      options = args.last.is_a?(Hash) ? args.pop : {}
      depth = options[:depth] || @depth
      args << yield if block_given?
      message = "#{args.join(': ')}"
      @errors << message
      str = "[#{@lineno}]#{' ' * depth}#{message}"
      @options[:debug] << str if @options[:debug].is_a?(Array)
      $stderr.puts(str)
    end

    ##
    # Progress output when debugging
    #
    # @overload debug(node, message)
    #   @param [String] node relative location in input
    #   @param [String] message ("")
    #
    # @overload debug(message)
    #   @param [String] message ("")
    #
    # @yieldreturn [String] added to message
    def debug(*args)
      return unless @options[:debug]
      options = args.last.is_a?(Hash) ? args.pop : {}
      depth = options[:depth] || @depth
      args << yield if block_given?
      message = "#{args.join(': ')}"
      str = "[#{@lineno}]#{' ' * depth}#{message}"
      @options[:debug] << str if @options[:debug].is_a?(Array)
      $stderr.puts(str) if @options[:debug] == true
    end
  end
end
```
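Taken together, `Base#initialize`, `#to_sxp`, and `#to_ttl` suggest the following usage. This is a hedged sketch: the toy grammar and the `ex` prefix are made up here, and the actual rule parsing lives in `lib/ebnf/parser.rb`:

```ruby
require 'ebnf'

# A toy grammar in the W3C notation described above:
# "[n] symbol ::= expression", with /* ... */ comments allowed.
ebnf = EBNF::Base.new(<<-GRAMMAR, :debug => [])
  /* a two-rule toy language */
  [1] sentence ::= noun 'runs'
  [2] noun     ::= 'cat' | 'dog'
GRAMMAR

puts ebnf.to_sxp                                  # AST as an S-Expression
puts ebnf.to_ttl("ex", "http://example.org/ns#")  # same AST as Turtle
warn ebnf.errors.inspect unless ebnf.errors.empty?
```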
data/lib/ebnf/bnf.rb
ADDED
@@ -0,0 +1,50 @@
```ruby
module EBNF
  module BNF
    ##
    # Transform EBNF Rule set to BNF:
    #
    # * Add rule [0], `_empty`, with the expression `(seq)`
    # * Transform each rule into a set of rules that are just BNF, using {Rule#to_bnf}.
    # @return [EBNF] self
    def make_bnf
      progress("make_bnf") {"Start: #{@ast.length} rules"}
      new_ast = [Rule.new(:_empty, "0", [:seq], :kind => :rule)]

      ast.each do |rule|
        debug("make_bnf") {"expand from: #{rule.inspect}"}
        new_rules = rule.to_bnf
        debug(" => ") {new_rules.map(&:sym).join(', ')}
        new_ast += new_rules
      end

      # Consolidate equivalent terminal rules
      to_rewrite = {}
      new_ast.select {|r| r.kind == :terminal}.each do |src_rule|
        new_ast.select {|r| r.kind == :terminal}.each do |dst_rule|
          if src_rule.equivalent?(dst_rule) && src_rule != dst_rule
            debug("make_bnf") {"equivalent rules: #{src_rule.inspect} and #{dst_rule.inspect}"}
            (to_rewrite[src_rule] ||= []) << dst_rule
          end
        end
      end

      # Replace references to equivalent rules with canonical rule
      to_rewrite.each do |src_rule, dst_rules|
        dst_rules.each do |dst_rule|
          new_ast.each do |mod_rule|
            debug("make_bnf") {"rewrite #{mod_rule.inspect} from #{dst_rule.sym} to #{src_rule.sym}"}
            mod_rule.rewrite(dst_rule, src_rule)
          end
        end
      end

      # AST now has just rewritten rules
      compacted_ast = new_ast - to_rewrite.values.flatten.compact

      # Sort AST by number
      @ast = compacted_ast
      progress("make_bnf") {"End: #{@ast.length} rules"}
      self
    end
  end
end
```
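To see what this buys you, consider a rule using the EBNF `?` operator. A hedged sketch of the transformation; the `_list_1` helper symbol and the numbering are illustrative, since the real shapes come from `Rule#to_bnf` in `lib/ebnf/rule.rb`:

```ruby
require 'ebnf'

ebnf = EBNF::Base.new("[1] list ::= item? rest")
ebnf.make_bnf
puts ebnf.to_sxp
# The optional item? is compiled down to pure context-free rules,
# along the lines of:
#   (rule _empty "0" (seq))
#   (rule list "1" (seq _list_1 rest))
#   (rule _list_1 "1.1" (alt _empty item))
```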
data/lib/ebnf/ll1.rb
ADDED
@@ -0,0 +1,321 @@
```ruby
module EBNF
  module LL1
    autoload :Lexer,   "ebnf/ll1/lexer"
    autoload :Parser,  "ebnf/ll1/parser"
    autoload :Scanner, "ebnf/ll1/scanner"

    # Branch table, represented as a recursive hash.
    # The table is indexed by rule symbol, which in turn references a hash of
    # terminals (the first terminals of the production), which in turn
    # reference the sequence of rules that follow, given that terminal as input.
    # @!attribute [r] branch
    # @return [Hash{Symbol => Hash{String, Symbol => Array<Symbol>}}]
    attr_reader :branch

    # First table
    # @!attribute [r] first
    # @return [Hash{Symbol, String => Symbol}]
    attr_reader :first

    # Follow table
    # @!attribute [r] follow
    # @return [Hash{Symbol, String => Symbol}]
    attr_reader :follow

    # Terminal table
    # The list of terminals used in the grammar.
    # @!attribute [r] terminals
    # @return [Array<String, Symbol>]
    attr_reader :terminals

    # Start symbol
    # The rule which starts the grammar
    # @!attribute [r] start
    # @return [Symbol]
    attr_reader :start

    ##
    # Create first/follow for each rule using techniques defined for LL(1) parsers.
    #
    # @return [EBNF] self
    # @see http://en.wikipedia.org/wiki/LL_parser#Constructing_an_LL.281.29_parsing_table
    # @param [Symbol] start
    #   The symbol of the grammar's start rule
    def first_follow(start)
      # Add _eof to follow the start rule
      if @start = start
        start_rule = find_rule(@start)
        raise "No rule found for start symbol #{@start}" unless start_rule
        start_rule.add_follow([:_eof])
        start_rule.start = true
      end

      # Comprehension rules: create shorter versions of all non-terminal sequences
      comprehensions = []
      begin
        comprehensions = []
        ast.select {|r| r.seq? && r.kind == :rule && r.expr.length > 2}.each do |rule|
          new_expr = rule.expr[2..-1].unshift(:seq)
          unless ast.any? {|r| r.expr == new_expr}
            debug("first_follow") {"add comprehension rule for #{rule.sym} => #{new_expr.inspect}"}
            new_rule = rule.build(new_expr)
            rule.comp = new_rule
            comprehensions << new_rule
          end
        end

        @ast += comprehensions
        progress("first_follow") {"comprehensions #{comprehensions.length}"}
      end while !comprehensions.empty?

      # Fi(a w') = { a } for every terminal a
      # For each rule whose expr is a seq starting with a terminal, or an alt
      # containing a terminal, add that terminal to the first set for the rule
      each(:rule) do |rule|
        each(:terminal) do |terminal|
          rule.add_first([terminal.sym]) if rule.starts_with(terminal.sym)
        end

        # Add strings to first for strings which are start elements
        start_strs = rule.starts_with(String)
        rule.add_first(start_strs) if start_strs
      end

      # Fi(ε) = { ε }
      # Add _eps as a first of _empty
      empty = ast.detect {|r| r.sym == :_empty}
      empty.add_first([:_eps])

      # Loop until no more first/follow elements are added
      firsts, follows = 0, 0
      begin
        firsts, follows = 0, 0
        each(:rule) do |rule|
          each(:rule) do |first_rule|
            next if first_rule == rule || first_rule.first.nil?

            # Fi(A w') = Fi(A) for every nonterminal A with ε not in Fi(A)
            # For each rule that starts with another rule having firsts, add
            # the firsts of that rule to this rule, unless it already has
            # those terminals in its first
            if rule.starts_with(first_rule.sym)
              depth {debug("FF.1") {"add first #{first_rule.first.inspect} to #{rule.sym}"}}
              firsts += rule.add_first(first_rule.first)
            end

            # Fi(A w') = Fi(A) \ { ε } ∪ Fi(w') for every nonterminal A with ε in Fi(A)
            # For each rule whose first member can produce ε, add that
            # member's firsts to the comprehension of this rule
            # (Note: compared against first_rule.sym, since expr holds symbols.)
            if rule.seq? &&
               rule.expr.fetch(1, nil) == first_rule.sym &&
               first_rule.first.include?(:_eps) &&
               (comp = rule.comp)

              depth {debug("FF.2") {"add first #{first_rule.first.inspect} to #{comp.sym}"}}
              firsts += comp.add_first(first_rule.first)
            end
          end

          # Only run these rules if the rule is a sequence having two or more
          # elements and therefore has a comprehension (comp)
          if rule.seq? && (comp = rule.comp)
            # If there is a rule of the form Aj → w Ai w', then
            if (ai = find_rule(rule.expr[1])) && ai.kind == :rule && comp.first
              # * if the terminal a is in Fi(w'), then add a to Fo(Ai)
              #
              # Add follow terminals based on the first terminals
              # of a comprehension of this rule (having the same
              # sequence other than the first rule in the sequence)
              #
              # @example
              #   rule: (seq a b c)
              #   comp: (seq b c)
              #   if comp.first == [T]
              #   => a.follow += [T]
              depth {debug("FF.3") {"add follow #{comp.first.inspect} to #{ai.sym}"}}
              follows += ai.add_follow(comp.first)
            end

            # Follows of a rule are also follows of the comprehension of the rule.
            if rule.follow
              depth {debug("FF.4") {"add follow #{rule.follow.inspect} to #{comp.sym}"}}
              follows += comp.add_follow(rule.follow)
            end

            # * if ε is in Fi(w'), then add Fo(Aj) to Fo(Ai)
            #
            # If the comprehension of a sequence has an _eps first, then the
            # follows of the rule also become the follows of the first member
            # of the rule
            if comp.first && comp.first.include?(:_eps) && rule.follow &&
               (member = find_rule(rule.expr.fetch(1, nil))) &&
               member.kind == :rule

              depth {debug("FF.5") {"add follow #{rule.follow.inspect} to #{member.sym}"}}
              follows += member.add_follow(rule.follow)
            end
          end

          # Follows of a rule are also follows of the last production in the rule
          if rule.seq? && rule.follow &&
             (member = find_rule(rule.expr.last)) &&
             member.kind == :rule

            depth {debug("FF.6") {"add follow #{rule.follow.inspect} to #{member.sym}"}}
            follows += member.add_follow(rule.follow)
          end

          # For alts, anything that follows the rule follows each member of the rule
          if rule.alt? && rule.follow
            rule.expr[1..-1].map {|s| find_rule(s)}.each do |mem|
              if mem && mem.kind == :rule
                depth {debug("FF.7") {"add follow #{rule.follow.inspect} to #{mem.sym}"}}
                follows += mem.add_follow(rule.follow)
              end
            end
          end
        end

        progress("first_follow") {"firsts #{firsts}, follows #{follows}"}
      end while (firsts + follows) > 0
    end
```
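The fixed-point iteration is easiest to follow on a small grammar. A hedged sketch; the set contents in the comments are worked out by hand from the FF rules above, and the intermediate symbols depend on `Rule#to_bnf`:

```ruby
require 'ebnf'

ebnf = EBNF::Base.new(<<-GRAMMAR)
  [1] S ::= A 'b'
  [2] A ::= 'a'?
GRAMMAR
ebnf.make_bnf          # adds the _empty rule that first_follow expects
ebnf.first_follow(:S)

# Roughly, the rules above compute:
#   FIRST(A)  = {'a', _eps}  -- 'a'? expands to an alternation with _empty
#   FIRST(S)  = {'a', 'b'}   -- FF.1, plus FF.2 since _eps is in FIRST(A)
#   FOLLOW(A) = {'b'}        -- FF.3: 'b' is FIRST of S's comprehension
#   FOLLOW(S) = {_eof}       -- the start rule is followed by end of input
```

The remainder of the file turns these sets into the parser tables: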
```ruby
    ##
    # Generate parser tables, {#branch}, {#first}, {#follow}, and {#terminals}
    def build_tables
      progress("build_tables") {
        "Terminals: #{ast.count {|r| r.kind == :terminal}} " +
        "Non-Terminals: #{ast.count {|r| r.kind == :rule}}"
      }

      @first = ast.
        select(&:first).
        inject({}) {|memo, r|
          memo[r.sym] = r.first.reject {|t| t == :_eps}
          memo
        }
      @follow = ast.
        select(&:follow).
        inject({}) {|memo, r|
          memo[r.sym] = r.follow.reject {|t| t == :_eps}
          memo
        }
      @terminals = ast.map do |r|
        (r.first || []) + (r.follow || [])
      end.flatten.uniq
      @terminals = (@terminals - [:_eps, :_eof, :_empty]).sort_by(&:to_s)

      @branch = {}
      @already = []
      @agenda = []
      do_production(@start)
      while !@agenda.empty?
        x = @agenda.shift
        do_production(x)
      end

      if !@errors.empty?
        progress("###### FAILED with #{errors.length} errors.")
        @errors.each {|s| progress("  #{s}")}
        raise "Table creation failed with errors"
      else
        progress("Ok for predictive parsing")
      end
    end

    # Generate an output table in Ruby format
    # @param [IO, StringIO] io
    # @param [String] name of the table constant
    # @param [Hash, Array] table
    #   to output, one of {#branch}, {#first}, {#follow}, or {#terminals}
    # @param [Integer] indent (0)
    def outputTable(io, name, table, indent = 0)
      ind0 = '  ' * indent
      ind1 = ind0 + '  '
      ind2 = ind1 + '  '

      if table.is_a?(Hash)
        io.puts "#{ind0}#{name} = {"
        table.keys.sort_by(&:to_s).each do |prod|
          case table[prod]
          when Array
            list = table[prod].map(&:inspect).join(",\n#{ind2}")
            io.puts "#{ind1}#{prod.inspect} => [\n#{ind2}#{list}],"
          when Hash
            io.puts "#{ind1}#{prod.inspect} => {"
            table[prod].keys.sort_by(&:to_s).each do |term|
              list = table[prod][term].map(&:inspect).join(", ")
              io.puts "#{ind2}#{term.inspect} => [#{list}],"
            end
            io.puts "#{ind1}},"
          else
            error("outputTable") {"Unknown table entry type: #{table[prod].class}"}
          end
        end
        io.puts "#{ind0}}.freeze\n"
      else
        io.puts "#{ind0}#{name} = [\n#{ind1}" +
          table.sort_by(&:to_s).map(&:inspect).join(",\n#{ind1}") +
          "\n#{ind0}].freeze\n"
      end
    end

    private
    def do_production(lhs)
      rule = find_rule(lhs)
      if rule.nil? || rule.kind != :rule || rule.sym == :_empty
        progress("prod") {"Skip: #{lhs.inspect}"}
        return
      end
      @already << lhs

      branchDict = {}

      progress("prod") {"Production #{lhs.inspect}"}

      if rule.expr.first == :matches
        debug("prod") {"Rule is regexp: #{rule}"}

        error("No record of what token #{lhs} can start with") unless rule.first
        return
      end

      if rule.alt?
        # Add entries for each alternative, based on the alternative's first/seq
        rule.expr[1..-1].each do |prod|
          prod_rule = find_rule(prod)
          debug("  Alt", prod)
          @agenda << prod unless @already.include?(prod) || @agenda.include?(prod)
          if prod == :_empty
            debug("    empty")
            branchDict[prod] = []
          elsif prod_rule.nil? || prod_rule.first.nil?
            debug("    no first =>", prod)
            branchDict[prod] = [prod]
          else
            prod_rule.first.each do |f|
              if branchDict.has_key?(f)
                error("First/First Conflict: #{f} is also the condition for #{branchDict[f]}")
              end
              debug("    alt") {"[#{f}] => #{prod}"}
              branchDict[f] = [prod]
            end
          end
        end
      else
        error("prod") {"Expected lhs to be alt or seq, was: #{rule}"} unless rule.seq?
        debug("  Seq", rule)
        # Entries for each first element referencing the sequence
        (rule.first || []).each do |f|
          debug("    seq") {"[#{f}] => #{rule.expr[1..-1].inspect}"}
          branchDict[f] = rule.expr[1..-1]
        end

        # Add each production to the agenda
        rule.expr[1..-1].each do |prod|
          @agenda << prod unless @already.include?(prod) || @agenda.include?(prod)
        end
      end

      # Add follow rules
      (rule.follow || []).each do |f|
        debug("  Follow") {f.inspect}
        branchDict[f] ||= []
      end

      @branch[lhs] = branchDict
    end
  end
end
```
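Tying the pipeline together, `build_tables` plus `outputTable` is what produces files like `etc/turtle.rb` from `etc/turtle.ebnf` (both added in this release). A hedged sketch, assuming `turtleDoc` is the start rule of that grammar; the `Meta` module name, output path, and the sample `BRANCH` entry are illustrative:

```ruby
require 'ebnf'

ebnf = EBNF::Base.new(File.read("etc/turtle.ebnf"))
ebnf.make_bnf
ebnf.first_follow(:turtleDoc)
ebnf.build_tables

# Emit the tables as Ruby source, one frozen constant per table
File.open("turtle_meta.rb", "w") do |io|
  io.puts "module Meta"
  ebnf.outputTable(io, 'BRANCH',    ebnf.branch,    1)
  ebnf.outputTable(io, 'TERMINALS', ebnf.terminals, 1)
  ebnf.outputTable(io, 'FIRST',     ebnf.first,     1)
  ebnf.outputTable(io, 'FOLLOW',    ebnf.follow,    1)
  io.puts "end"
end

# BRANCH is keyed production => {lookahead => RHS sequence}; a predictive
# parser looks up BRANCH[production][current_token] to choose the rules to
# push next, e.g. something like:
#   BRANCH[:statement]["@prefix"]  #=> [:directive, "."]
```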