tdparser 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 0b0d873d5eee490ac295ad3a31d7df50c8477335aeac7f94d1614e429de40b05
4
+ data.tar.gz: 4e247c68e1d59d4931d97a32990cda0f0d708a291d09a50ba4b8ac0dffb777d1
5
+ SHA512:
6
+ metadata.gz: 80fd063170357063e7c2bc1ad0479246cd3ffd85b5f08586403b7523f42dc2d974d21f9f398d71553bc2848f9bc425461e1cf1696dfdf218e87e2aae26ef38fc
7
+ data.tar.gz: dbecb62bcc8104bc21fa5c08dd6bd6896c6430d473617f0b171df379e707939d0266718abaa8271cfda7244ac8a7fda1586a64e0b5ea2216c02cb0c0858e1bce
data/.dir-locals.el ADDED
@@ -0,0 +1,9 @@
1
+ ;;; Directory Local Variables -*- no-byte-compile: t -*-
2
+ ;;; For more information see (info "(emacs) Directory Variables")
3
+
4
+ ((nil
5
+ . ((eval
6
+ . (progn
7
+ (require 'grep)
8
+ (add-to-list 'grep-find-ignored-directories "html")
9
+ (add-to-list 'grep-find-ignored-directories "coverage"))))))
data/.envrc ADDED
@@ -0,0 +1,2 @@
1
+ watch_file manifest.scm
2
+ use guix
data/.rubocop.yml ADDED
@@ -0,0 +1,8 @@
1
+ AllCops:
2
+ TargetRubyVersion: 3.1
3
+ NewCops: enable
4
+ DisabledByDefault: true
5
+
6
+ # Method "fail" defined.
7
+ Style/SignalException:
8
+ Enabled: false
data/CHANGELOG.md ADDED
@@ -0,0 +1,10 @@
1
+ ## [Unreleased]
2
+
3
+ ## [1.5.0] - 2024-11-08
4
+
5
+ * Support Ruby 3.1 or later.
6
+ * Fix and add tests.
7
+ * Format documents.
8
+ * Rename some modules: moved `TDPUtils` and `TDPXML` into `TDParser`.
9
+ * Rename require path from `tdp` to `tdparser`.
10
+ * Rename the gem from TDP4R to TDParser.
data/COPYING ADDED
@@ -0,0 +1,25 @@
1
+ Copyright (c) 2003,2004,2005,2006 Takaaki Tateishi <ttate@ttsky.net>
2
+ Copyright (c) 2024 gemmaro <gemmaro.dev@gmail.com>
3
+ All rights reserved.
4
+
5
+ Redistribution and use in source and binary forms, with or without
6
+ modification, are permitted provided that the following conditions
7
+ are met:
8
+ 1. Redistributions of source code must retain the above copyright
9
+ notice, this list of conditions and the following disclaimer.
10
+ 2. Redistributions in binary form must reproduce the above copyright
11
+ notice, this list of conditions and the following disclaimer in the
12
+ documentation and/or other materials provided with the distribution.
13
+ 3. The name of the author may not be used to endorse or promote products
14
+ derived from this software without specific prior written permission.
15
+
16
+ THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
17
+ IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
18
+ OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
19
+ IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
20
+ INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
21
+ NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
22
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
23
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
24
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
25
+ THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
data/README ADDED
@@ -0,0 +1,31 @@
1
+ = TDParser
2
+
3
+ This is a top-down parser combinator library for Ruby (an LL(k)
4
+ parser) and the successor of TDP4R.
5
+
6
+ == Description
7
+
8
+ TDParser is a Ruby library that helps us construct a top-down
9
+ parser using recursive method calls, also known as a recursive
10
+ descent parser. Its main features are
11
+
12
+ 1. constructing a parser using combinators as in Parsec (Daan Leijen:
13
+ Parsec (Monadic Parser Combinator Library for Haskell),
14
+ http://www.cs.uu.nl/~daan/parsec.html),
15
+ 2. backtracking parse algorithm with unlimited lookahead (Bryan Ford:
16
+ "Packrat Parsing: Simple, Powerful, Lazy, Linear Time", ICFP,
17
+ 2002.), and
18
+ 3. writing EBNF grammars using Ruby's objects.
19
+
20
+ Feature (1) lets us change some production rules of a grammar at
21
+ runtime and package a set of production rules as a component. Thanks
22
+ to feature (2), we do not have to worry about preventing conflicts
23
+ among production rules. Because of (3), TDParser can also be viewed
24
+ as an internal DSL for writing LL(k) grammars.
25
+
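As a rough illustration of point (1), production rules are ordinary methods, so a set of rules can be grouped in a module and individual rules overridden. This is only a sketch; the module, class, and rule names below (Literals, SumParser, number) are made up for this example and are not part of the gem.

  require 'tdparser'

  # A reusable set of production rules.
  module Literals
    def number
      token(/\d+/) >> proc{|x| x[0].to_i }
    end
  end

  class SumParser
    include TDParser
    include Literals

    # expr := number ('+' number)*
    def expr
      rule(:number) - (token("+") - rule(:number))*0 >> proc{|x|
        x[1].inject(x[0]){|sum, y| sum + y[1] }
      }
    end
  end

  SumParser.new.expr.parse(%w[1 + 2 + 3])   # expected to evaluate to 6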
26
+ == License
27
+
28
+ Copyright(C) 2003, 2004, 2005, 2006 Takaaki Tateishi <ttate@ttsky.net>
29
+ Copyright(C) 2024 gemmaro <gemmaro.dev@gmail.com>
30
+
31
+ See COPYING.
data/Rakefile ADDED
@@ -0,0 +1,18 @@
1
+ # frozen_string_literal: true
2
+
3
+ $LOAD_PATH << File.join(__dir__, '../lib')
4
+
5
+ require 'bundler/gem_tasks'
6
+ require 'rdoc/task'
7
+ require 'rake/testtask'
8
+
9
+ RDoc::Task.new do |rdoc|
10
+ readme = 'README'
11
+ rdoc.main = readme
12
+ rdoc.rdoc_files.include('lib/**/*.rb', readme, 'doc/*.rdoc')
13
+ end
14
+
15
+ Rake::TestTask.new do |t|
16
+ t.libs << 'samples' << 'test'
17
+ t.test_files = FileList['test/*_test.rb']
18
+ end
data/doc/faq.rdoc ADDED
@@ -0,0 +1,36 @@
1
+ = How do I write a rule for left/right-associative infix operators?
2
+
3
+ A good example is an arithmetic expression over <tt>*</tt>,
4
+ <tt>/</tt>, <tt>+</tt> and <tt>-</tt>. With Racc (a Yacc-style
5
+ parser generator for Ruby), you would write the following rule:
6
+
7
+ prechigh
8
+ left '*','/'
9
+ left '+','-'
10
+ preclow
11
+ ...
12
+ expr : expr '*' expr { result = val[0] * val[2]}
13
+ | expr '/' expr { result = val[0] / val[2]}
14
+ | expr '+' expr { result = val[0] + val[2]}
15
+ | expr '-' expr { result = val[0] - val[2]}
16
+ | NUMBER { result = val[0].to_i() }
17
+
18
+ In TDParser, you can write the above rule as follows:
19
+
20
+ TDParser.define{|g|
21
+ g.expr = chainl(NUMBER >> Proc.new{|x| x[0].to_i},
22
+ token("*")|token("/"),
23
+ token("+")|token("-")){|x|
24
+ case x[1]
25
+ when "*"
26
+ x[0] * x[2]
27
+ when "/"
28
+ x[0] / x[2]
29
+ when "+"
30
+ x[0] + x[2]
31
+ when "-"
32
+ x[0] - x[2]
33
+ end
34
+ }
35
+ # ...
36
+ }
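As a hedged usage sketch: NUMBER is not defined in the snippet above, so a plain digit rule stands in for it here, and the variable names are illustrative only. Driving the resulting grammar could look like this:

  parser = TDParser.define{|g|
    number = g.token(/\d+/) >> proc{|x| x[0].to_i }   # stands in for NUMBER
    g.expr = g.chainl(number,
                      g.token("*") | g.token("/"),
                      g.token("+") | g.token("-")){|x|
      case x[1]
      when "*" then x[0] * x[2]
      when "/" then x[0] / x[2]
      when "+" then x[0] + x[2]
      when "-" then x[0] - x[2]
      end
    }
  }

  # chainl builds a left-associative chain, so "8 - 2 - 1" should
  # group as (8 - 2) - 1 and evaluate to 5.
  parser.expr.parse(%w[8 - 2 - 1])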
data/doc/guide.rdoc ADDED
@@ -0,0 +1,150 @@
1
+ = TDParser Programmers Guide
2
+
3
+ TDParser is a Ruby component that helps us construct a top-down
3
+ parser using method calls. This document describes how to use TDParser
4
+ in two styles, both of which look similar to the style of JavaCC on
5
+ the surface. In the first style, we define the rules of a grammar as
6
+ methods (as shown in +sample4.rb+). In the second style, each rule is
7
+ defined as if it were a property of a grammar object (see also
8
+ +sample5.rb+).
10
+
11
+ == Defining Rules in Module
12
+
13
+ The following class is a parser that accepts expressions
14
+ consisting of digits and <tt>+</tt>.
15
+
16
+ class MyParser
17
+ include TDParser
18
+
19
+ def expr
20
+ token(/\d+/) - token("+") - rule(:expr) >> proc{|x| x[0].to_i + x[2] } |
21
+ token(/\d+/) >> proc{|x| x[0].to_i }
22
+ end
23
+ end
24
+
25
+ In this class, the method +expr+ represents the following production
26
+ rule.
27
+
28
+ expr := int '+' expr
29
+ | int
30
+
31
+ In addition, at the first line of the method +expr+, values accepted
32
+ by <tt>token(/\d+/)</tt>, <tt>token("+")</tt> and <tt>rule(:expr)</tt>
33
+ are assigned to <tt>x[0]</tt>, <tt>x[1]</tt> and <tt>x[2]</tt>
34
+ respectively. Then, to parse <tt>1 + 2</tt>, we first
35
+ split it into an array of tokens like <tt>["1", "+", "2"]</tt>, and
36
+ then call the +parse+ method of a parser object, which is created by
37
+ <tt>MyParser.new()</tt>, as follows.
38
+
39
+ parser = MyParser.new()
40
+ parser.expr.parse(["1", "+", "2"])
41
+
42
+ Note that we can pass one of the following objects to the parse method.
43
+
44
+ - an Enumerable object. E.g.: <tt>expr.parse(["1", "+", "2"])</tt>
45
+
46
+ - an object that has the methods <tt>shift</tt> and <tt>unshift</tt>.
47
+ E.g.:
48
+
49
+ expr.parse(TDParser::TokenGenerator.new{|x|
50
+ x.yield("1"); x.yield("+"); x.yield("2")
51
+ })
52
+
53
+ - a block. E.g.: <tt>expr.parse{|x| x.yield("1"); x.yield("+");
54
+ x.yield("2") }</tt>
55
+
56
+ In that grammar, <tt>+</tt> is right-associative. Note, however,
57
+ that we <i>cannot</i> write the rule as follows.
58
+
59
+ def expr
60
+ rule(:expr) - token("+") - token(/\d+/) >> proc{|x| x[0].to_i + x[2].to_i } |
61
+ token(/\d+/) >> proc{|x| x[0].to_i }
62
+ end
63
+
64
+ This is known as the left-recursion problem, so we have to use one
65
+ of the following rules instead.
66
+
67
+ def expr
68
+ token(/\d+/) - (token("+") - token(/\d+/))*0 >> proc{|x|
69
+ x[1].inject(x[0]){|acc,y|
70
+ case y[0]
71
+ when "+"
72
+ acc + y[1]
73
+ end
74
+ }
75
+ }
76
+ end
77
+
78
+ def expr # javacc style
79
+ n = nil
80
+ (token(/\d+/) >> proc{|x| n = x[0].to_i }) -
81
+ (token("+") - token(/\d+/) >> proc{|y|
82
+ case y[0]
83
+ when "+"
84
+ n += y[1].to_i
85
+ end
86
+ })*0 >> proc{|x| n }
87
+ end
88
+
89
+ In these rules, <tt>(...)*N</tt> represents <i>N</i> or more repetitions
90
+ of the rule <tt>(...)</tt>, and <tt>x[1]</tt> holds the sequences of
91
+ tokens accepted by <tt>(...)*0</tt>. For example, if <tt>["1",
92
+ "+","1","+","2"]</tt> is parsed by the rule <tt>token(/\d+/) -
93
+ (token("+") - token(/\d+/))*0</tt>, then <tt>x[1]</tt> is
94
+ <tt>[["+", "1"], ["+", "2"]]</tt>.
95
+
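The same iteration pattern extends to several operators. The following is a hedged sketch (it is not one of the gem's samples) that handles both "+" and "-" left-associatively:

  def expr
    token(/\d+/) - ((token("+") | token("-")) - token(/\d+/))*0 >> proc{|x|
      x[1].inject(x[0].to_i){|acc, y|
        case y[0]
        when "+" then acc + y[1].to_i
        when "-" then acc - y[1].to_i
        end
      }
    }
  end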
96
+ == Defining Rules using <tt>TDParser.define()</tt>
97
+
98
+ The rule defined in the previous section can also be written as
99
+ follows.
100
+
101
+ parser = TDParser.define{|g|
102
+ g.expr =
103
+ g.token(/\d+/) - g.token("+") - g.expr >> proc{|x| x[0].to_i + x[2] } |
104
+ g.token(/\d+/) >> proc{|x| x[0].to_i }
105
+ }
106
+
107
+ (See also <tt>sample5.rb</tt> and <tt>sample6.rb</tt>)
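As with the module style, the object returned by TDParser.define exposes each rule, so (as a hedged sketch, mirroring the earlier example) it is used the same way:

  parser.expr.parse(["1", "+", "2"])   # expected to evaluate to 3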
108
+
109
+ == Parser Combinators
110
+
111
+ * Constructors
112
+ * <tt>token(obj)</tt>
113
+ * <tt>rule(method)</tt>
114
+ * <tt>any()</tt>:: any token
115
+ * <tt>none()</tt>:: no more token
116
+ * <tt>empty()</tt>:: empty
117
+ * <tt>fail()</tt>:: failure
118
+ * <tt>backref(label)</tt>:: back reference
119
+ * <tt>stackref(stack)</tt>:: stack reference
120
+ * Operators
121
+ * <tt>rule - rule</tt>:: sequence
122
+ * <tt>rule | rule</tt>:: choice
123
+ * <tt>rule * n</tt>:: iteration
124
+ * <tt>rule * n..m</tt>:: iteration
125
+ * <tt>rule / label</tt>:: label
126
+ * <tt>rule % stack</tt>:: stack
127
+ * <tt>~ rule</tt>:: negative lookahead
128
+ * Utility Functions
129
+ * <tt>leftrec(base, rule1, ..., ruleN, &action)</tt>:: This constructs the following rule:
130
+
131
+ base - ruleN* >> action' |
132
+ ... |
133
+ base - rule1* >> action' |
134
+ fail()
135
+
136
+ * <tt>rightrec(rule1, ..., ruleN, base, &action)</tt>:: This constructs the following rule:
137
+
138
+ ruleN* - base >> action' |
139
+ ... |
140
+ rule1* - base >> action' |
141
+ fail()
142
+
143
+ * <tt>chainl(base, infix1, ..., infixN, &action)</tt>
144
+ * <tt>chainr(base, infix1, ..., infixN, &action)</tt>
145
+
146
+ == <tt>StringTokenizer</tt>
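A short hedged sketch combining the sequence, choice, and iteration operators listed above; the rule name and token patterns are illustrative, not taken from the gem:

  # zero or more comma-separated words, e.g. ["foo", ",", "bar"]
  def word_list
    token(/\w+/) - (token(",") - token(/\w+/))*0 >> proc{|x|
      [x[0]] + x[1].map{|y| y[1] }
    }
  end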
147
+
148
+ There is a simple tokenizer called TDParser::StringTokenizer in the
149
+ library <tt>tdparser/utils</tt>. (See <tt>MyParser#parse</tt> in
150
+ <tt>sample2.rb</tt>)
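Its interface can be read off the implementation shipped in this release: the constructor takes a hash mapping regexps to token kinds, and generate returns a token generator that a rule's parse method accepts. A hedged sketch follows; the rule hash and the comments are assumptions, while the require path comes from the guide text above.

  require 'tdparser'
  require 'tdparser/utils'

  tokenizer = TDParser::StringTokenizer[
    /\d+/     => :int,
    /[+\-*\/]/ => :op
  ]
  tokens = tokenizer.generate("1 + 2 * 3")   # whitespace is skipped by default
  # each matched token is a TDParser::Token with a kind (:int or :op) and a value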
@@ -0,0 +1,91 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'tdparser'
4
+
5
+ module TDParser
6
+ class Token
7
+ attr_accessor :kind, :value
8
+
9
+ def initialize(kind, value)
10
+ @kind = kind
11
+ @value = value
12
+ end
13
+
14
+ def ==(other)
15
+ (other.class == self.class) &&
16
+ (@kind == other.kind) &&
17
+ (@value == other.value)
18
+ end
19
+
20
+ def ===(other)
21
+ super(other) || (@kind == other)
22
+ end
23
+
24
+ def =~(other)
25
+ @kind == other
26
+ end
27
+ end
28
+
29
+ class BasicStringTokenizer
30
+ def self.[](rule, ignore = nil)
31
+ new(rule, ignore)
32
+ end
33
+
34
+ def initialize(rule, ignore = nil)
35
+ require('strscan')
36
+ @rule = rule
37
+ @scan_pattern = Regexp.new(@rule.keys.join('|'))
38
+ @ignore_pattern = ignore
39
+ end
40
+
41
+ def generate(str)
42
+ scanner = StringScanner.new(str)
43
+ TDParser::TokenGenerator.new do |x|
44
+ until scanner.empty?
45
+ if @ignore_pattern
46
+ while scanner.scan(@ignore_pattern)
47
+ end
48
+ end
49
+ sstr = scanner.scan(@scan_pattern)
50
+ if sstr
51
+ @rule.each do |reg, kind|
52
+ next unless reg =~ sstr
53
+
54
+ x.yield(Token.new(kind, sstr))
55
+ yielded = true
56
+ break
57
+ end
58
+ else
59
+ c = scanner.scan(/./)
60
+ x.yield(c)
61
+ end
62
+ end
63
+ end
64
+ end
65
+ end
66
+
67
+ class StringTokenizer < BasicStringTokenizer
68
+ def initialize(rule, ignore = nil)
69
+ super(rule, ignore || /\s+/)
70
+ end
71
+ end
72
+
73
+ class WaitingTokenGenerator < TDParser::TokenGenerator
74
+ def initialize(*args)
75
+ super(*args)
76
+ @terminated = false
77
+ end
78
+
79
+ def terminate
80
+ @terminated = true
81
+ end
82
+
83
+ def shift
84
+ return nil if @terminated
85
+
86
+ while empty?
87
+ end
88
+ super()
89
+ end
90
+ end
91
+ end
@@ -0,0 +1,5 @@
1
+ # frozen_string_literal: true
2
+
3
+ module TDParser
4
+ VERSION = '1.5.0'
5
+ end
@@ -0,0 +1,180 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'tdparser'
4
+ require 'rexml/parsers/pullparser'
5
+ require 'rexml/document'
6
+
7
+ module TDParser
8
+ module XMLParser
9
+ class XMLTokenGenerator < TDParser::TokenGenerator
10
+ def initialize(src)
11
+ @xparser = REXML::Parsers::BaseParser.new(src)
12
+ super() do |g|
13
+ while @xparser.has_next?
14
+ e = @xparser.pull
15
+ g.yield(e)
16
+ end
17
+ end
18
+ end
19
+ end
20
+
21
+ class XArray < Array
22
+ def ===(ary)
23
+ return true if super(ary)
24
+ return false unless ary.is_a?(Array)
25
+
26
+ each_with_index do |v, idx|
27
+ case ary[idx]
28
+ when v
29
+ else
30
+ return false
31
+ end
32
+ end
33
+ true
34
+ end
35
+ end
36
+
37
+ class XHash < Hash
38
+ def ===(h)
39
+ return true if super(h)
40
+ return false unless h.is_a?(Hash)
41
+
42
+ each do |k, v|
43
+ case h[k]
44
+ when v
45
+ else
46
+ return false
47
+ end
48
+ end
49
+ true
50
+ end
51
+ end
52
+
53
+ def start_element(name = String)
54
+ token(XArray[:start_element, name, Hash])
55
+ end
56
+
57
+ def end_element(name = String)
58
+ token(XArray[:end_element, name])
59
+ end
60
+
61
+ def element(elem = String, &inner)
62
+ crule = if inner
63
+ inner.call | empty
64
+ else
65
+ empty
66
+ end
67
+ (start_element(elem) - crule - end_element(elem)) >> proc do |x|
68
+ name = x[0][1]
69
+ attrs = x[0][2]
70
+ node = REXML::Element.new
71
+ node.name = name
72
+ node.attributes.merge!(attrs)
73
+ [node, x[1]]
74
+ end
75
+ end
76
+
77
+ def text(match = String)
78
+ token(XArray[:text, match]) >> proc do |x|
79
+ REXML::Text.new(x[0][1])
80
+ end
81
+ end
82
+
83
+ def pi
84
+ token(XArray[:processing_instruction, String, String]) >> proc do |x|
85
+ REXML::Instruction.new(x[0][1], x[0][2])
86
+ end
87
+ end
88
+
89
+ def cdata(match = String)
90
+ token(XArray[:cdata, match]) >> proc do |x|
91
+ REXML::CData.new(x[0][1])
92
+ end
93
+ end
94
+
95
+ def comment(match = String)
96
+ token(XArray[:comment, match]) >> proc do |x|
97
+ REXML::Comment.new(x[0][1])
98
+ end
99
+ end
100
+
101
+ def xmldecl
102
+ token(XArray[:xmldecl]) >> proc do |x|
103
+ REXML::XMLDecl.new(x[0][1], x[0][2], x[0][3])
104
+ end
105
+ end
106
+
107
+ def start_doctype(name = String)
108
+ token(XArray[:start_doctype, name])
109
+ end
110
+
111
+ def end_doctype
112
+ token(XArray[:end_doctype])
113
+ end
114
+
115
+ def doctype(name = String, &inner)
116
+ crule = if inner
117
+ inner.call | empty
118
+ else
119
+ empty
120
+ end
121
+ (start_doctype(name) - crule - end_doctype) >> proc do |x|
122
+ node = REXML::DocType.new(x[0][1..])
123
+ [node, x[1]]
124
+ end
125
+ end
126
+
127
+ def externalentity(entity = String)
128
+ token(XArray[:externalentity, entity]) >> proc do |x|
129
+ REXML::ExternalEntity.new(x[0][1])
130
+ end
131
+ end
132
+
133
+ def elementdecl(elem = String)
134
+ token(XArray[:elementdecl, elem]) >> proc do |x|
135
+ REXML::ElementDecl.new(x[0][1])
136
+ end
137
+ end
138
+
139
+ def entitydecl(_entity = String)
140
+ token(XArray[:entitydecl, elem]) >> proc do |x|
141
+ REXML::Entity.new(x[0])
142
+ end
143
+ end
144
+
145
+ def attlistdecl(_decl = String)
146
+ token(XArray[:attlistdecl]) >> proc do |x|
147
+ REXML::AttlistDecl.new(x[0][1..])
148
+ end
149
+ end
150
+
151
+ def notationdecl(_decl = String)
152
+ token(XArray[:notationdecl]) >> proc do |x|
153
+ REXML::NotationDecl.new(*x[0][1..])
154
+ end
155
+ end
156
+
157
+ def any_node(&)
158
+ (element(&) | doctype(&) | text | pi | cdata |
159
+ comment | xmldecl | externalentity | elementdecl |
160
+ entitydecl | attlistdecl | notationdecl) >> proc { |x| x[2] }
161
+ end
162
+
163
+ def dom_constructor(&act)
164
+ proc do |x|
165
+ node = x[0][0]
166
+ child = x[0][1]
167
+ if child.is_a?(Array)
168
+ child.each { |c| node.add(c) }
169
+ else
170
+ node.add(child)
171
+ end
172
+ if act
173
+ act[node]
174
+ else
175
+ node
176
+ end
177
+ end
178
+ end
179
+ end
180
+ end
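A hedged sketch of how these XML combinators might be used. The require path, the mixin pattern, and the sample document are assumptions based on the code above, not documented usage:

  require 'tdparser'
  require 'tdparser/xml'   # assumed require path for this file

  class TitleParser
    include TDParser
    include TDParser::XMLParser

    # matches <title>...</title> and builds an REXML::Element from it
    def title
      element("title") { text } >> dom_constructor
    end
  end

  tokens = TDParser::XMLParser::XMLTokenGenerator.new("<title>TDParser</title>")
  TitleParser.new.title.parse(tokens)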