RubyGems - pegparse - Versions diffs - 0.1.0 - Mend

pegparse 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

checksums.yaml +7 -0
data/.gitignore +10 -0
data/.rubocop.yml +13 -0
data/Gemfile +14 -0
data/LICENSE.txt +21 -0
data/README.md +133 -0
data/Rakefile +16 -0
data/bin/console +15 -0
data/bin/setup +8 -0
data/lib/pegparse/biop_rule_chain.rb +113 -0
data/lib/pegparse/borrowed_areas.rb +35 -0
data/lib/pegparse/line_counter.rb +61 -0
data/lib/pegparse/parser_base.rb +139 -0
data/lib/pegparse/parser_context.rb +19 -0
data/lib/pegparse/parser_core.rb +243 -0
data/lib/pegparse/parser_errors.rb +97 -0
data/lib/pegparse/version.rb +5 -0
data/lib/pegparse.rb +9 -0
data/pegparse.gemspec +37 -0
data/samples/bsh_parser.rb +337 -0
data/samples/calc_parser.rb +55 -0
data/samples/json_parser.rb +92 -0
data/samples/xml_parser.rb +182 -0
metadata +67 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: c35eb89599b2cf50fb3076cd4b48873624940fc5e0ad8c07aaaa1e05c0992640
+  data.tar.gz: d02335bee92250709be7ab1e04871d6fff127548acb20979fe273738c3ccd8fa
+SHA512:
+  metadata.gz: 763f0630ece3da62bb793259033307135776f02afe7cb312f9e730e207b8a9ae8ceee501a73fdea45c4ccd426f2df96220697e3bee98e608fadbdfc8b5d4da52
+  data.tar.gz: 89b15e6e06db3659ea15d97aa83de2eb4f37a59cc12be0f0337a85e665b46eae34ba7b344ce00a3bbfc62ada052f32f2fe3b6917a18e42e892214d9d485a13d9

data/.gitignore ADDED

@@ -0,0 +1,10 @@
+/.bundle/
+/.yardoc
+/_yardoc/
+/coverage/
+/doc/
+/pkg/
+/spec/reports/
+/tmp/
+.vscode
+Gemfile.lock

data/.rubocop.yml ADDED

@@ -0,0 +1,13 @@
+AllCops:
+  TargetRubyVersion: 2.4
+Style/StringLiterals:
+  Enabled: true
+  EnforcedStyle: double_quotes
+Style/StringLiteralsInInterpolation:
+  Enabled: true
+  EnforcedStyle: double_quotes
+Layout/LineLength:
+  Max: 120

data/Gemfile ADDED

@@ -0,0 +1,14 @@
+# frozen_string_literal: true
+source "https://rubygems.org"
+# Specify your gem's dependencies in pegparse.gemspec
+gemspec
+gem "rake", "~> 13.0"
+gem "minitest", "~> 5.0"
+gem "rubocop", "~> 1.7"
+gem "debug"

data/LICENSE.txt ADDED

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+Copyright (c) 2021 Riki Ishikawa
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/README.md ADDED

@@ -0,0 +1,133 @@
+# Pegparse
+Pegparse is a library for creating recursive descent parsers.
+It provides a parser base class with helper methods for:
+- PEG semantics
+- binary-operations
+- quoted-strings
+- comment-aware skipping
+- indent level checking
+- here-documents
+## Installation
+Add this line to your application's Gemfile:
+```ruby
+gem 'pegparse'
+```
+And then execute:
+    $ bundle install
+Or install it yourself as:
+    $ gem install pegparse
+## Usage
+1. Create a class that inherits from `Pegparse::ParserBase`.
+2. Set the entry point with `start_rule_symbol`.
+3. Write each parsing rule as a method.
+```ruby
+require 'pegparse'
+class MyParser < Pegparse::ParserBase
+  def initialize(scanner_or_context)
+    super(scanner_or_context)
+    self.start_rule_symbol = :number_rule
+  end
+  def number_rule
+    digits = one_or_more {  # digits becomes ['1', '2']
+      read(/[0-9]/)
+    }
+    decimal = optional {  # decimal is '34'
+      decimal_rule()
+    }
+    return [digits.join.to_i, decimal&.to_i]
+  end
+  def decimal_rule
+    read('.')
+    read(/[0-9]+/)  # decimal_rule returns '34'
+  end
+end
+MyParser.new(nil).parse(StringScanner.new('12.34'))  # => [12, 34]
+```
+### Core methods
+- `read(str_or_regexp)` : Try to consume input. On success, return the matched string. On failure, backtrack.
+- `peek(str_or_regexp)` : Peek at the input without consuming it. On success, return the matched string.
+- `peek{ ... }` : Peek at the input without consuming it. On success, return the block result.
+- `optional{ ... }` : Match only if available. (PEG's option operator('?'))
+- `zero_or_more{ ... }` : Repeat matching zero or more times. (PEG's repeat operator('*'))
+- `one_or_more{ ... }` : Repeat matching one or more times. (PEG's repeat operator('+'))
+- `choice(proc, proc, ...)` : Ordered choice matching. (PEG's choice operator('/'))
+- `backtrack()` : Force a backtrack.
+### Helper methods
+- `sp()` : Spaces. (Space characters or comments)
+- `inline_sp()` : Spaces without a line feed.
+- `deeper_sp()` : Spaces without a line feed, or spaces that continue onto a line with deeper indent than the previous line.
+- `lf()` : Spaces that contain a line feed.
+- `separative(separator){ ... }` : Repeat matching with a separator.
+- `string_like(end_pattern, normal_pattern){ ... }` : String literals like "" and ''. The block handles special characters such as escapes.
+- `borrow_next_line{ ... }` : Skip the current line and temporarily parse the next line. Lines used this way become unmatchable by normal processing. (For here-documents)
+- `borrowed_area()` : Only matches lines used by `borrow_next_line`.
+- `Pegparse::BiopRuleChain` : Binary operator helper class.
+You can see sample parser implementations under `/samples`.
+### debug
+Use `Pegparse::ParserCore#best_errors` to find the location of a parsing error.
+`best_errors` returns the farthest location where parsing failed.
+It also returns the name of the deepest rule.
+You can improve the message by decorating your rule methods with `rule`.
+```ruby
+  rule def your_rule
+    ...
+  end
+```
+### VSCode
+To debug your parser in VSCode with breakpoints or step-by-step execution, add this config to your launch.json.
+(Requires a debug gem newer than 1.4.0.)
+With this config, all code inside the gem is skipped during VSCode step-by-step execution.
+```json
+    {
+        "type": "rdbg",
+        "name": "Debug specified user program with rdbg",
+        "request": "launch",
+        "script": "${workspaceFolder}/YOUR_PARSER_HERE.rb",
+        "args": [],
+        "env": {
+            "RUBY_DEBUG_SKIP_PATH": [
+                "YOUR_GEM_DIRECTORY_HERE",
+            ],
+        }
+    }
+```
+## Contributing
+Bug reports and pull requests are welcome on GitHub at https://github.com/jljse/pegparse.
+## License
+The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
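
Note on the README's "Core methods": the backtracking behaviour of `read`, `optional`, `zero_or_more` and `one_or_more` can be illustrated with a small standalone toy. This is only a sketch built on Ruby's `StringScanner`. The class `ToyPeg`, its `Backtrack` exception and the helper `toy_parse_number` are hypothetical names introduced here. `parser_core.rb` is not shown in this part of the diff, so the code below should not be read as the gem's implementation.

```ruby
require "strscan"

# Toy illustration only; NOT the pegparse implementation.
class ToyPeg
  Backtrack = Class.new(StandardError)

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Consume a String or Regexp at the current position, or signal failure.
  def read(pattern)
    pattern = Regexp.new(Regexp.escape(pattern)) if pattern.is_a?(String)
    @scanner.scan(pattern) || raise(Backtrack)
  end

  # Run the block; on failure restore the position and return nil.
  def optional
    pos = @scanner.pos
    yield
  rescue Backtrack
    @scanner.pos = pos
    nil
  end

  # Repeat the block until it fails, collecting each result.
  def zero_or_more
    results = []
    loop do
      pos = @scanner.pos
      begin
        results << yield
      rescue Backtrack
        @scanner.pos = pos
        break
      end
      break if @scanner.pos == pos # stop if nothing was consumed
    end
    results
  end

  # Like zero_or_more, but the first match is mandatory.
  def one_or_more(&block)
    [block.call] + zero_or_more(&block)
  end
end

# Hypothetical helper mirroring the README's number_rule.
def toy_parse_number(input)
  toy = ToyPeg.new(input)
  digits = toy.one_or_more { toy.read(/[0-9]/) }
  decimal = toy.optional do
    toy.read(".")
    toy.read(/[0-9]+/)
  end
  [digits, decimal]
end

toy_parse_number("12.34") # => [["1", "2"], "34"]
```

When the optional part fails midway (for example on the input `12.x`), the toy restores the scanner position saved before the block, which is the essence of PEG backtracking.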

data/Rakefile ADDED

@@ -0,0 +1,16 @@
+# frozen_string_literal: true
+require "bundler/gem_tasks"
+require "rake/testtask"
+Rake::TestTask.new(:test) do |t|
+  t.libs << "test"
+  t.libs << "lib"
+  t.test_files = FileList["test/**/*_test.rb"]
+end
+require "rubocop/rake_task"
+RuboCop::RakeTask.new
+task default: %i[test rubocop]

data/bin/console ADDED

@@ -0,0 +1,15 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require "bundler/setup"
+require "pegparse"
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+require "irb"
+IRB.start(__FILE__)

data/bin/setup ADDED

@@ -0,0 +1,8 @@
+#!/usr/bin/env bash
+set -euo pipefail
+IFS=$'\n\t'
+set -vx
+bundle install
+# Do any other automated setup that you need to do here

data/lib/pegparse/biop_rule_chain.rb ADDED

@@ -0,0 +1,113 @@
+require_relative 'parser_base'
+# Binary operator rule helper.
+module Pegparse::BiopRuleChain
+  # @!parse include Pegparse::ParserCore
+  # Create new parser class derived from passed one.
+  # If you want to customize parser behavior, override method in exec_block.
+  # @return [Class<Pegparse::BiopRuleChainImitation>]
+  def self.based_on(parser_class, &exec_block)
+    raise ArgumentError unless parser_class.ancestors.include?(Pegparse::ParserBase)
+    klass = Class.new(parser_class) do
+      include Pegparse::BiopRuleChain
+    end
+    klass.class_exec(&exec_block) if exec_block
+    klass
+  end
+  def initialize(scanner_or_context)
+    super(scanner_or_context)
+    @start_rule_symbol = :start_rule
+    @operators = []
+    @term = nil
+  end
+  # Default construction of matching result. (override this if you want)
+  def construct_result(lhs, op, rhs)
+    [op, lhs, rhs]
+  end
+  # Default matching rule of spaces before operator. (override this if you want)
+  # This rule will be used when you pass string to #left_op.
+  def operator_sp
+    sp()
+  end
+  # Default matching rule of spaces before operand. (override this if you want)
+  def operand_sp
+    sp()
+  end
+  # Create match proc for operator.
+  # @param operator_matcher [Array, Proc, String, Regexp]
+  # @return [Proc]
+  private def get_operator_matcher(operator_matcher)
+    if operator_matcher.is_a? Array
+      ops = operator_matcher.map{|x| get_operator_matcher(x)}
+      return ->{
+        choice(*ops)
+      }
+    end
+    if operator_matcher.is_a? Proc
+      return operator_matcher
+    end
+    if operator_matcher.is_a?(String) || operator_matcher.is_a?(Regexp)
+      return ->{
+        operator_sp()
+        op = read(operator_matcher)
+      }
+    end
+    raise ArgumentError
+  end
+  # Add left-associative binary operators.
+  # Call in order of operators precedence.
+  # If you have multiple operators in same precedence, pass Array as parameter.
+  # @param operator_matcher [String, Regexp, Array, Proc]
+  # @return [Pegparse::BiopRuleChainImitation]
+  def left_op(operator_matcher)
+    @operators << get_operator_matcher(operator_matcher)
+    self
+  end
+  # Set terminal matching rule.
+  # @param term_block [Proc]
+  def term(term_block)
+    @term = term_block
+    nil
+  end
+  # Match expression of the operators which have specified precedence level.
+  private def match(operator_level)
+    return @term.call if operator_level >= @operators.size
+    lhs = match(operator_level + 1)
+    operands = zero_or_more {
+      op = choice(*@operators[operator_level])
+      operand_sp()
+      rhs = match(operator_level + 1)
+      [op, rhs]
+    }
+    tree = operands.inject(lhs) {|subtree, operand|
+      construct_result(subtree, operand[0], operand[1])
+    }
+  end
+  # entry point
+  private def start_rule
+    match(0)
+  end
+end
+# This is an imitation class that exists only for documentation.
+# It is never instantiated at runtime.
+class Pegparse::BiopRuleChainImitation < Pegparse::ParserBase
+  include Pegparse::BiopRuleChain
+end
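
Note on `match` in biop_rule_chain.rb: the method recurses one precedence level at a time and then folds the collected `[op, rhs]` pairs to the left with `inject`. Since `match(0)` is the outermost call, the operators registered first appear to bind most loosely. The standalone toy below models that recursion over a pre-split token array instead of a scanner. `ToyChain` and `toy_chain` are hypothetical names, and the result shape `[op, lhs, rhs]` follows the default `construct_result`.

```ruby
# Toy model of the level-by-level recursion; works on tokens, not on a scanner.
class ToyChain
  # levels: operator groups, lowest precedence first, e.g. [["+", "-"], ["*", "/"]]
  def initialize(tokens, levels)
    @tokens = tokens.dup
    @levels = levels
  end

  def parse
    match(0)
  end

  private

  def match(level)
    # Past the last operator level, the next token is taken as a terminal.
    return @tokens.shift if level >= @levels.size

    lhs = match(level + 1)
    operands = []
    while @levels[level].include?(@tokens.first)
      op = @tokens.shift
      operands << [op, match(level + 1)]
    end
    # Left fold: a - b - c becomes ["-", ["-", a, b], c].
    operands.inject(lhs) { |subtree, (op, rhs)| [op, subtree, rhs] }
  end
end

# Hypothetical convenience wrapper with two precedence levels.
def toy_chain(tokens)
  ToyChain.new(tokens, [["+", "-"], ["*", "/"]]).parse
end

toy_chain([1, "-", 2, "*", 3, "-", 4])
# => ["-", ["-", 1, ["*", 2, 3]], 4]
```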

data/lib/pegparse/borrowed_areas.rb ADDED

@@ -0,0 +1,35 @@
+module Pegparse
+  BorrowedArea = Struct.new(
+    :marker_pos,
+    :start_pos,
+    :end_pos,
+    keyword_init: true,
+  )
+end
+class Pegparse::BorrowedAreas
+  def initialize
+    @areas = []
+  end
+  def add_area(area)
+    @areas << area
+  end
+  def conflicted_area(pos)
+    conflicted = @areas.find{|area| area.start_pos <= pos && pos < area.end_pos }
+  end
+  def backtracked(pos)
+    @areas.reject!{|area| area.marker_pos > pos }
+  end
+  def borrowed_area_start_pos
+    @areas.first ? @areas.first.start_pos : nil
+  end
+  def borrowed_area_end_pos
+    @areas.last ? @areas.last.end_pos : nil
+  end
+end
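
Note on borrowed_areas.rb: `conflicted_area` treats each area as a half-open range (`start_pos` inclusive, `end_pos` exclusive), and `backtracked` drops areas whose marker lies after the position being returned to. The sketch below reproduces both checks on plain arrays so the boundary cases are visible. `Area`, `find_conflict`, `prune_after_backtrack` and `demo_areas` are hypothetical names for this illustration, and the positions are made up.

```ruby
# Standalone stand-in for Pegparse::BorrowedArea (same three fields).
Area = Struct.new(:marker_pos, :start_pos, :end_pos, keyword_init: true)

# Half-open interval test: start_pos <= pos < end_pos.
def find_conflict(areas, pos)
  areas.find { |area| area.start_pos <= pos && pos < area.end_pos }
end

# Keep only areas whose marker was read at or before pos.
# (Non-destructive here; the class in the diff uses reject! instead.)
def prune_after_backtrack(areas, pos)
  areas.reject { |area| area.marker_pos > pos }
end

# Made-up sample data: two consecutive borrowed areas.
def demo_areas
  [
    Area.new(marker_pos: 5, start_pos: 20, end_pos: 30),
    Area.new(marker_pos: 12, start_pos: 30, end_pos: 42)
  ]
end

find_conflict(demo_areas, 30).marker_pos     # => 12 (start is inclusive)
find_conflict(demo_areas, 42)                # => nil (end is exclusive)
prune_after_backtrack(demo_areas, 10).size   # => 1
```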

data/lib/pegparse/line_counter.rb ADDED

@@ -0,0 +1,61 @@
+# count line number and indent level
+class Pegparse::LineCounter
+  def initialize
+    @line_start_pos = [0]
+    @line_start_pos_noindent = [0]
+    @farthest_pos = 0
+  end
+  # update with partial string
+  # @param pos [Integer]  position of str relative to whole input
+  # @param str [String]  partial string
+  def memo(pos, str)
+    return if pos + str.size < @farthest_pos
+    raise ArgumentError if pos > @farthest_pos
+    row, * = position(pos)
+    str.each_byte.with_index do |ch, index|
+      if ch == ' '.ord || ch == "\t".ord
+        # If whitespace continues right after the known indent, increase the indent depth.
+        if (pos + index) == (@line_start_pos_noindent[row])
+          @line_start_pos_noindent[row] += 1
+        end
+      end
+      if ch == "\n".ord
+        next_line_start_pos = pos + index + 1
+        if @line_start_pos.last < next_line_start_pos
+          @line_start_pos << next_line_start_pos
+          @line_start_pos_noindent << next_line_start_pos
+        end
+        row += 1
+      end
+    end
+    if @farthest_pos < pos + str.size
+      @farthest_pos = pos + str.size
+    end
+  end
+  # get line number and char offset for pos
+  # @param pos [Integer]
+  # @return [Array<Integer>]  zero-based line index and offset within that line
+  def position(pos)
+    if pos >= @line_start_pos.last
+      line_count = @line_start_pos.size - 1
+    else
+      after_pos_line_head = @line_start_pos.bsearch_index{|x| x > pos}
+      line_count = after_pos_line_head - 1
+    end
+    char_count = pos - @line_start_pos[line_count]
+    [line_count, char_count]
+  end
+  # get indent level for the line including pos
+  # @param pos [Integer]
+  # @return [Integer]
+  def indent(pos)
+    line_count, * = position(pos)
+    @line_start_pos_noindent[line_count] - @line_start_pos[line_count]
+  end
+end
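
Note on `LineCounter#position`: the lookup is a binary search over the sorted array of line start offsets, with a shortcut for positions on the last known line. The standalone sketch below shows the same idea. `line_starts_of` and `line_and_column` are hypothetical helper names, and the sketch indexes by character rather than by byte, which is equivalent for ASCII input.

```ruby
# Offsets at which each line begins (line 0 always starts at 0).
def line_starts_of(text)
  starts = [0]
  text.each_char.with_index { |ch, i| starts << (i + 1) if ch == "\n" }
  starts
end

# Zero-based [line, column] for pos, given sorted line start offsets.
def line_and_column(line_starts, pos)
  if pos >= line_starts.last
    line = line_starts.size - 1
  else
    # The first line that starts after pos; pos belongs to the line before it.
    line = line_starts.bsearch_index { |start| start > pos } - 1
  end
  [line, pos - line_starts[line]]
end

starts = line_starts_of("ab\n  cd\nef") # => [0, 3, 8]
line_and_column(starts, 5)              # => [1, 2]
line_and_column(starts, 9)              # => [2, 1]
```

Both results are zero-based, matching how `indent` in the diff uses the line index directly as an array index.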

data/lib/pegparse/parser_base.rb ADDED

@@ -0,0 +1,139 @@
+require_relative 'parser_core'
+# Parser base class (reusable rules)
+class Pegparse::ParserBase < Pegparse::ParserCore
+  def initialize(scanner_or_context)
+    super(scanner_or_context)
+  end
+  # match for spaces
+  def _
+    one_or_more {
+      choice(
+        ->{ read(/[ \t\r]+/) },
+        ->{ read(/\n/) },
+        ->{ borrowed_area() },
+        ->{ line_comment() },
+        ->{ block_comment() },
+      )
+    }
+  end
+  def line_comment
+    # read(/#[^\n]*/)
+    backtrack
+  end
+  rule def block_comment
+    # ret = ""
+    # ret << read('/*')
+    # ret << zero_or_more {
+    #   part = read(/[^*]*/)
+    #   break if peek('*/')
+    #   part << '*' if optional('*')
+    # }.join
+    # ret << read('*/')
+    # ret
+    backtrack
+  end
+  # match for spaces
+  def sp
+    optional{ _ }
+  end
+  # match for spaces without newline
+  def inline_sp
+    before_line, * = @context.line_counter.position(@context.scanner.pos)
+    ret = optional{ _ }
+    after_line, * = @context.line_counter.position(@context.scanner.pos)
+    backtrack() if before_line != after_line
+    ret
+  end
+  # match for spaces (if spaces cross to the next line, it must have deeper indent than previous line)
+  def deeper_sp
+    base_line, * = @context.line_counter.position(@context.scanner.pos)
+    base_indent = @indent_stack.last
+    raise StandardError unless base_indent
+    ret = optional{ _ }
+    new_line, * = @context.line_counter.position(@context.scanner.pos)
+    new_indent = @context.line_counter.indent(@context.scanner.pos)
+    backtrack() if base_line != new_line && base_indent >= new_indent
+    ret
+  end
+  # match for spaces (must contain newline)
+  def lf
+    before_line, * = @context.line_counter.position(@context.scanner.pos)
+    ret = optional{ _ }
+    after_line, * = @context.line_counter.position(@context.scanner.pos)
+    backtrack() if before_line == after_line
+    ret
+  end
+  # loop with separator
+  # @param separator_matcher [Regexp, String, Proc]
+  # @param allow_additional_separator [Boolean]  Allow redundant separator at tail.
+  def separative(separator_matcher, allow_additional_separator: false, &repeat_block)
+    if separator_matcher.is_a? Proc
+      separator_proc = separator_matcher
+    else
+      separator_proc = ->{
+        sp()
+        read(separator_matcher)
+        sp()
+      }
+    end
+    ret = []
+    optional {
+      ret << repeat_block.call()
+      rest = zero_or_more {
+        separator_proc.call()
+        repeat_block.call()
+      }
+      ret.concat(rest)
+      if allow_additional_separator
+        optional {
+          separator_proc.call()
+        }
+      end
+    }
+    ret
+  end
+  # string literal
+  # @param end_pattern [String, Regexp] End of literal (e.g. "'", "\"")
+  # @param normal_pattern [Regexp] Pattern for string without special process (e.g. /[^'\\]*/)
+  # @param special_process [Proc] Process for special characters. Block should return processed result.
+  # @return [Array<String,Object>]  Match result. Result has one or more elements.
+  #   If block returned non-string result, array has multiple elements.
+  def string_like(end_pattern, normal_pattern, &special_process)
+    ret = []
+    str = ''
+    while true
+      str << read(normal_pattern)
+      break if peek(end_pattern)
+      break if eos?
+      break unless special_process
+      processed = special_process.call()
+      break unless processed
+      if processed.is_a? String
+        str << processed
+      else
+        ret << str if str.size > 0
+        ret << processed
+        str = ''
+      end
+    end
+    ret << str if str.size > 0
+    if ret.size > 0
+      ret
+    else
+      ['']
+    end
+  end
+end

data/lib/pegparse/parser_context.rb ADDED

@@ -0,0 +1,19 @@
+require_relative "parser_errors"
+require_relative "line_counter"
+require_relative "borrowed_areas"
+class Pegparse::ParserContext
+  attr_accessor :scanner
+  attr_accessor :rule_stack
+  attr_accessor :errors
+  attr_accessor :line_counter
+  attr_accessor :borrowed_areas
+  def initialize(scanner)
+    @scanner = scanner
+    @rule_stack = []
+    @errors = Pegparse::ParserErrors.new
+    @line_counter = Pegparse::LineCounter.new
+    @borrowed_areas = Pegparse::BorrowedAreas.new
+  end
+end