RubyGems - lrama - Versions diffs - 0.6.2 → 0.6.3 - Mend

lrama 0.6.2 → 0.6.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31)

checksums.yaml +4 -4
data/NEWS.md +34 -0
data/README.md +23 -0
data/Steepfile +2 -0
data/lib/lrama/context.rb +4 -4
data/lib/lrama/grammar/code/initial_action_code.rb +6 -0
data/lib/lrama/grammar/code/no_reference_code.rb +4 -0
data/lib/lrama/grammar/code/printer_code.rb +6 -0
data/lib/lrama/grammar/code/rule_action.rb +11 -1
data/lib/lrama/grammar/reference.rb +4 -3
data/lib/lrama/grammar/rule_builder.rb +8 -1
data/lib/lrama/grammar/symbol.rb +1 -1
data/lib/lrama/grammar/symbols/resolver.rb +276 -0
data/lib/lrama/grammar/symbols.rb +1 -0
data/lib/lrama/grammar.rb +25 -244
data/lib/lrama/lexer/token/user_code.rb +13 -2
data/lib/lrama/lexer.rb +6 -0
data/lib/lrama/output.rb +56 -2
data/lib/lrama/parser.rb +520 -457
data/lib/lrama/state.rb +4 -4
data/lib/lrama/states/item.rb +6 -8
data/lib/lrama/states_reporter.rb +2 -2
data/lib/lrama/version.rb +1 -1
data/lrama.gemspec +7 -0
data/parser.y +20 -0
data/sig/lrama/grammar/reference.rbs +2 -1
data/sig/lrama/grammar/symbol.rbs +4 -4
data/sig/lrama/grammar/symbols/resolver.rbs +41 -0
data/sig/lrama/grammar/type.rbs +11 -0
data/template/bison/yacc.c +6 -0
metadata +12 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: e4158de45c42ff62eacfb00737261feaa49d8f0cc646004e30da74ba4e2e69c6
-  data.tar.gz: 734830227f701e18df2e9e8bc3da55d15f49c890e08530e6ac55ef87ae5f952d
+  metadata.gz: ecd30d3fab4dd73442ed6d3b2802db5b463159cb6ddf1f1835d9b8e860d4c9dd
+  data.tar.gz: 79b6087e68d3c2e95db81fa1d25f58280a5543e9c5fa91f5b6ecc7c40b5599d7
 SHA512:
-  metadata.gz: 52ebbe4d099ae63d73aa995bddc8e966f989a4d00ad3b39634d2abe2448da404dd9bff8f15e0dedd0577716089329c804ef2c4edcadd39ca6ba47f8d293d101d
-  data.tar.gz: 72e91c79618071b5850c85335cfe3f1b63ff89f11cd332b0141623a4e2a7e2c2c389dd0db9afe38a5a84b7aa891ac1834fc7a2d6c6eed8f62d87734f6b99cbbf
+  metadata.gz: f3302156423399987015deb90afbaa0d6916e5e61b14c5297ff6a0e01ab9db3bbd164b334a11e26d38cf626425b7faee404e5d1cdec236a9b08b576ced4fe201
+  data.tar.gz: 380d8d31c93e5ae6c5a406c2b2eedad0d4b52dd311ecd742ca1412bd1acade7dae8fe12f7a6325ef8a0b4922ffd927b0dfa2caf3cd54c095669ab2b2cab85516

data/NEWS.md CHANGED

@@ -1,5 +1,39 @@
 # NEWS for Lrama
+## Lrama 0.6.3 (2024-02-15)
+### Bring Your Own Stack
+Provide functionalities for Bring Your Own Stack.
+Ruby’s Ripper library requires their own semantic value stack to manage Ruby Objects returned by user defined callback method. Currently Ripper uses semantic value stack (`yyvsa`) which is used by parser to manage Node. This hack introduces some limitation on Ripper. For example, Ripper can not execute semantic analysis depending on Node structure.
+Lrama introduces two features to support another semantic value stack by parser generator users.
+1. Callback entry points
+User can emulate semantic value stack by these callbacks.
+Lrama provides these five callbacks. Registered functions are called when each event happen. For example %after-shift function is called when shift happens on original semantic value stack.
+* `%after-shift` function_name
+* `%before-reduce` function_name
+* `%after-reduce` function_name
+* `%after-shift-error-token` function_name
+* `%after-pop-stack` function_name
+2. `$:n` variable to access index of each grammar symbols
+User also needs to access semantic value of their stack in grammar action. `$:n` provides the way to access to it. `$:n` is translated to the minus index from the top of the stack.
+For example
+```
+primary: k_if expr_value then compstmt if_tail k_end
+          {
+          /*% ripper: if!($:2, $:4, $:5) %*/
+          /* $:2 = -5, $:4 = -3, $:5 = -2. */
+          }
+```
 ## Lrama 0.6.2 (2024-01-27)
 ### %no-stdlib directive
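
The index arithmetic in the NEWS example above can be checked with a small Ruby sketch. It assumes, purely for illustration, that the user-managed stack is modeled as a Ruby Array whose last element is the top of the stack. `USER_STACK` and `index_from_top` are hypothetical names and are not part of Lrama or Ripper.

```ruby
# Hypothetical model of a user-managed semantic value stack right before the
# rule `primary: k_if expr_value then compstmt if_tail k_end` is reduced.
# The last element is the top of the stack.
USER_STACK = %w[k_if expr_value then compstmt if_tail k_end].freeze

# Translate `$:n` (1-based position in the right-hand side) into a negative
# index from the top of the stack, for an action placed at the end of a rule
# whose right-hand side has `rhs_length` symbols.
def index_from_top(n, rhs_length)
  n - rhs_length - 1
end

[2, 4, 5].each do |n|
  i = index_from_top(n, USER_STACK.length)
  puts "$:#{n} = #{i} -> #{USER_STACK[i]}"
end
# prints:
#   $:2 = -5 -> expr_value
#   $:4 = -3 -> compstmt
#   $:5 = -2 -> if_tail
```

The printed indexes match the `/* $:2 = -5, $:4 = -3, $:5 = -2. */` comment in the NEWS entry.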

data/README.md CHANGED

@@ -1,7 +1,23 @@
 # Lrama
+[![Gem Version](https://badge.fury.io/rb/lrama.svg)](https://badge.fury.io/rb/lrama)
+[![build](https://github.com/ruby/lrama/actions/workflows/test.yaml/badge.svg)](https://github.com/ruby/lrama/actions/workflows/test.yaml)
 Lrama is LALR (1) parser generator written by Ruby. The first goal of this project is providing error tolerant parser for CRuby with minimal changes on CRuby parse.y file.
+* [Features](#features)
+* [Installation](#installation)
+* [Usage](#usage)
+* [Versions and Branches](#versions-and-branches)
+* [Supported Ruby version](#supported-ruby-version)
+* [Development](#development)
+  * [How to generate parser.rb](#how-to-generate-parserrb)
+  * [Test](#test)
+  * [Profiling Lrama](#profiling-lrama)
+  * [Build Ruby](#build-ruby)
+* [Release flow](#release-flow)
+* [License](#license)
 ## Features
 * Bison style grammar file is supported with some assumptions
@@ -11,6 +27,9 @@ Lrama is LALR (1) parser generator written by Ruby. The first goal of this proje
   * b4_lac_if is always false
 * Error Tolerance parser
   * Subset of [Repairing Syntax Errors in LR Parsers (Corchuelo et al.)](https://idus.us.es/bitstream/handle/11441/65631/Repairing%20syntax%20errors.pdf) algorithm is supported
+* Parameterizing rules
+  * The definition of a non-terminal symbol can be parameterized with other (terminal or non-terminal) symbols.
+  * Providing a generic definition of parameterizing rules as a [standard library](lib/lrama/grammar/stdlib.y).
 ## Installation
@@ -85,6 +104,8 @@ Running tests:
 ```shell
 $ bundle install
 $ bundle exec rspec
+# or
+$ bundle exec rake spec
 ```
 Running type check:
@@ -93,6 +114,8 @@ Running type check:
 $ bundle install
 $ bundle exec rbs collection install
 $ bundle exec steep check
+# or
+$ bundle exec rake steep
 ```
 Running both of them:

data/Steepfile CHANGED

@@ -11,12 +11,14 @@ target :lib do
   check "lib/lrama/grammar/error_token.rb"
   check "lib/lrama/grammar/parameterizing_rule"
   check "lib/lrama/grammar/parameterizing_rules"
+  check "lib/lrama/grammar/symbols"
   check "lib/lrama/grammar/percent_code.rb"
   check "lib/lrama/grammar/precedence.rb"
   check "lib/lrama/grammar/printer.rb"
   check "lib/lrama/grammar/reference.rb"
   check "lib/lrama/grammar/rule_builder.rb"
   check "lib/lrama/grammar/symbol.rb"
+  check "lib/lrama/grammar/type.rb"
   check "lib/lrama/lexer"
   check "lib/lrama/report"
   check "lib/lrama/bitmap.rb"

data/lib/lrama/context.rb CHANGED

@@ -265,9 +265,9 @@ module Lrama
         s = actions.each_with_index.map do |n, i|
           [i, n]
-        end.select do |i, n|
+        end.reject do |i, n|
           # Remove default_reduction_rule entries
-          n != 0
+          n == 0
         end
         if s.count != 0
@@ -462,7 +462,7 @@ module Lrama
       @yylast = high
       # replace_ninf
-      @yypact_ninf = (@base.select {|i| i != BaseMin } + [0]).min - 1
+      @yypact_ninf = (@base.reject {|i| i == BaseMin } + [0]).min - 1
       @base.map! do |i|
         case i
         when BaseMin
@@ -472,7 +472,7 @@ module Lrama
         end
       end
-      @yytable_ninf = (@table.compact.select {|i| i != ErrorActionNumber } + [0]).min - 1
+      @yytable_ninf = (@table.compact.reject {|i| i == ErrorActionNumber } + [0]).min - 1
       @table.map! do |i|
         case i
         when nil
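
The three hunks above swap `select` with a negated condition for `reject` with the positive one, which should not change behavior. A minimal sketch of that equivalence follows. `BASE_MIN` is an arbitrary placeholder for the `BaseMin` sentinel (its real value is not shown in this diff), and the sample array and method names are invented for illustration.

```ruby
# Placeholder sentinel and sample data; neither is taken from Lrama.
BASE_MIN = -1_000_000
SAMPLE_BASE = [3, BASE_MIN, -7, 12, BASE_MIN].freeze

# Old form: keep every element that is not the sentinel.
def ninf_with_select(base)
  (base.select { |i| i != BASE_MIN } + [0]).min - 1
end

# New form: drop every element that is the sentinel.
def ninf_with_reject(base)
  (base.reject { |i| i == BASE_MIN } + [0]).min - 1
end

puts ninf_with_select(SAMPLE_BASE) # prints -8
puts ninf_with_reject(SAMPLE_BASE) # prints -8
```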

data/lib/lrama/grammar/code/initial_action_code.rb CHANGED

@@ -6,18 +6,24 @@ module Lrama
         # * ($$) yylval
         # * (@$) yylloc
+        # * ($:$) error
         # * ($1) error
         # * (@1) error
+        # * ($:1) error
         def reference_to_c(ref)
           case
           when ref.type == :dollar && ref.name == "$" # $$
             "yylval"
           when ref.type == :at && ref.name == "$" # @$
             "yylloc"
+          when ref.type == :index && ref.name == "$" # $:$
+            raise "$:#{ref.value} can not be used in initial_action."
           when ref.type == :dollar # $n
             raise "$#{ref.value} can not be used in initial_action."
           when ref.type == :at # @n
             raise "@#{ref.value} can not be used in initial_action."
+          when ref.type == :index # $:n
+            raise "$:#{ref.value} can not be used in initial_action."
           else
             raise "Unexpected. #{self}, #{ref}"
           end

data/lib/lrama/grammar/code/no_reference_code.rb CHANGED

@@ -6,14 +6,18 @@ module Lrama
         # * ($$) error
         # * (@$) error
+        # * ($:$) error
         # * ($1) error
         # * (@1) error
+        # * ($:1) error
         def reference_to_c(ref)
           case
           when ref.type == :dollar # $$, $n
             raise "$#{ref.value} can not be used in #{type}."
           when ref.type == :at # @$, @n
             raise "@#{ref.value} can not be used in #{type}."
+          when ref.type == :index # $:$, $:n
+            raise "$:#{ref.value} can not be used in #{type}."
           else
             raise "Unexpected. #{self}, #{ref}"
           end

data/lib/lrama/grammar/code/printer_code.rb CHANGED

@@ -11,8 +11,10 @@ module Lrama
         # * ($$) *yyvaluep
         # * (@$) *yylocationp
+        # * ($:$) error
         # * ($1) error
         # * (@1) error
+        # * ($:1) error
         def reference_to_c(ref)
           case
           when ref.type == :dollar && ref.name == "$" # $$
@@ -20,10 +22,14 @@ module Lrama
             "((*yyvaluep).#{member})"
           when ref.type == :at && ref.name == "$" # @$
             "(*yylocationp)"
+          when ref.type == :index && ref.name == "$" # $:$
+            raise "$:#{ref.value} can not be used in #{type}."
           when ref.type == :dollar # $n
             raise "$#{ref.value} can not be used in #{type}."
           when ref.type == :at # @n
             raise "@#{ref.value} can not be used in #{type}."
+          when ref.type == :index # $:n
+            raise "$:#{ref.value} can not be used in #{type}."
           else
             raise "Unexpected. #{self}, #{ref}"
           end

data/lib/lrama/grammar/code/rule_action.rb CHANGED

@@ -11,8 +11,10 @@ module Lrama
         # * ($$) yyval
         # * (@$) yyloc
+        # * ($:$) error
         # * ($1) yyvsp[i]
         # * (@1) yylsp[i]
+        # * ($:1) i - 1
         #
         #
         # Consider a rule like
@@ -24,6 +26,8 @@ module Lrama
         # "Rule"                class: keyword_class { $1 } tSTRING { $2 + $3 } keyword_end { $class = $1 + $keyword_end }
         # "Position in grammar"                   $1     $2      $3          $4          $5
         # "Index for yyvsp"                       -4     -3      -2          -1           0
+        # "$:n"                                  $:1    $:2     $:3         $:4         $:5
+        # "index of $:n"                          -5     -4      -3          -2          -1
         #
         #
         # For the first midrule action:
@@ -31,6 +35,7 @@ module Lrama
         # "Rule"                class: keyword_class { $1 } tSTRING { $2 + $3 } keyword_end { $class = $1 + $keyword_end }
         # "Position in grammar"                   $1
         # "Index for yyvsp"                        0
+        # "$:n"                                  $:1
         def reference_to_c(ref)
           case
           when ref.type == :dollar && ref.name == "$" # $$
@@ -39,6 +44,8 @@ module Lrama
             "(yyval.#{tag.member})"
           when ref.type == :at && ref.name == "$" # @$
             "(yyloc)"
+          when ref.type == :index && ref.name == "$" # $:$
+            raise "$:$ is not supported"
           when ref.type == :dollar # $n
             i = -position_in_rhs + ref.index
             tag = ref.ex_tag || rhs[ref.index - 1].tag
@@ -47,6 +54,9 @@ module Lrama
           when ref.type == :at # @n
             i = -position_in_rhs + ref.index
             "(yylsp[#{i}])"
+          when ref.type == :index # $:n
+            i = -position_in_rhs + ref.index
+            "(#{i} - 1)"
           else
             raise "Unexpected. #{self}, #{ref}"
           end
@@ -70,7 +80,7 @@ module Lrama
         end
         def raise_tag_not_found_error(ref)
-          raise "Tag is not specified for '$#{ref.value}' in '#{@rule.to_s}'"
+          raise "Tag is not specified for '$#{ref.value}' in '#{@rule}'"
         end
       end
     end
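
The table in the comment above can be reproduced with a simplified stand-in for the arithmetic in `reference_to_c`. This sketch assumes that `position_in_rhs` is the number of right-hand-side symbols preceding the action, as read off the comment table. It also omits the tag member that the real method appends for `$n`. The helper names are hypothetical.

```ruby
# Offset shared by `$n`, `@n` and `$:n` in the hunk above.
def stack_offset(position_in_rhs, n)
  -position_in_rhs + n
end

# C text for `$n`, without the `.member` suffix the real code adds.
def dollar_ref(position_in_rhs, n)
  "yyvsp[#{stack_offset(position_in_rhs, n)}]"
end

# C text for `$:n`: the same offset shifted down by one.
def index_ref(position_in_rhs, n)
  "(#{stack_offset(position_in_rhs, n)} - 1)"
end

# Final action of the example rule: five symbols precede it.
puts dollar_ref(5, 1) # prints yyvsp[-4]
puts index_ref(5, 1)  # prints (-4 - 1)
puts index_ref(5, 5)  # prints (0 - 1)

# First midrule action: only `keyword_class` precedes it.
puts dollar_ref(1, 1) # prints yyvsp[0]
puts index_ref(1, 1)  # prints (0 - 1)
```

Evaluated as C expressions, `(-4 - 1)` and `(0 - 1)` give the `-5` and `-1` shown in the "index of $:n" row.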

data/lib/lrama/grammar/reference.rb CHANGED

@@ -2,11 +2,12 @@ module Lrama
   class Grammar
     # type: :dollar or :at
     # name: String (e.g. $$, $foo, $expr.right)
-    # index: Integer (e.g. $1)
+    # number: Integer (e.g. $1)
+    # index: Integer
     # ex_tag: "$<tag>1" (Optional)
-    class Reference < Struct.new(:type, :name, :index, :ex_tag, :first_column, :last_column, keyword_init: true)
+    class Reference < Struct.new(:type, :name, :number, :index, :ex_tag, :first_column, :last_column, keyword_init: true)
       def value
-        name || index
+        name || number
       end
     end
   end
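
To see how the new `number` field feeds `value`, and therefore the `$:#{ref.value}` error messages in the hunks above, here is a self-contained sketch. `DemoReference` is a hypothetical copy of the struct's field list and is not the class shipped in the gem.

```ruby
# Stand-in struct with the same fields as the 0.6.3 Reference.
DemoReference = Struct.new(:type, :name, :number, :index, :ex_tag,
                           :first_column, :last_column, keyword_init: true) do
  # A named reference reports its name; otherwise the written number is used.
  def value
    name || number
  end
end

NAMED_REF = DemoReference.new(type: :dollar, name: "$")
NUMBERED_REF = DemoReference.new(type: :index, number: 2, index: 2)

puts "$#{NAMED_REF.value}"     # prints $$
puts "$:#{NUMBERED_REF.value}" # prints $:2
```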

data/lib/lrama/grammar/rule_builder.rb CHANGED

@@ -181,11 +181,18 @@ module Lrama
                 if referring_symbol[1] == 0 # Refers to LHS
                   ref.name = '$'
                 else
-                  ref.index = referring_symbol[1]
+                  ref.number = referring_symbol[1]
                 end
               end
             end
+            if ref.number
+              # TODO: When Inlining is implemented, for example, if `$1` is expanded to multiple RHS tokens,
+              #       `$2` needs to access `$2 + n` to actually access it. So, after the Inlining implementation,
+              #       it needs resolves from number to index.
+              ref.index = ref.number
+            end
             # TODO: Need to check index of @ too?
             next if ref.type == :at

data/lib/lrama/grammar/symbol.rb CHANGED

@@ -11,7 +11,7 @@ module Lrama
       attr_reader :term
       attr_writer :eof_symbol, :error_symbol, :undef_symbol, :accept_symbol
-      def initialize(id:, alias_name: nil, number: nil, tag: nil, term:, token_id: nil, nullable: nil, precedence: nil, printer: nil)
+      def initialize(id:, term:, alias_name: nil, number: nil, tag: nil, token_id: nil, nullable: nil, precedence: nil, printer: nil)
         @id = id
         @alias_name = alias_name
         @number = number
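
The signature change above only moves the required keyword `term:` ahead of the optional ones. Because keyword arguments are matched by name, callers should be unaffected. A minimal sketch with two hypothetical methods (not Lrama code):

```ruby
# Required keyword `term:` placed after an optional keyword (old layout).
def old_order(id:, alias_name: nil, term:, token_id: nil)
  [id, alias_name, term, token_id]
end

# Required keywords grouped first (new layout).
def new_order(id:, term:, alias_name: nil, token_id: nil)
  [id, alias_name, term, token_id]
end

# The same call works against both layouts, in any argument order.
puts old_order(term: true, id: "tNUMBER") == new_order(id: "tNUMBER", term: true) # prints true
```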

data/lib/lrama/grammar/symbols/resolver.rb ADDED

@@ -0,0 +1,276 @@
+module Lrama
+  class Grammar
+    class Symbols
+      class Resolver
+        attr_reader :terms, :nterms
+        def initialize
+          @terms = []
+          @nterms = []
+        end
+        def symbols
+          @symbols ||= (@terms + @nterms)
+        end
+        def sort_by_number!
+          symbols.sort_by!(&:number)
+        end
+        def add_term(id:, alias_name: nil, tag: nil, token_id: nil, replace: false)
+          if token_id && (sym = find_symbol_by_token_id(token_id))
+            if replace
+              sym.id = id
+              sym.alias_name = alias_name
+              sym.tag = tag
+            end
+            return sym
+          end
+          if (sym = find_symbol_by_id(id))
+            return sym
+          end
+          @symbols = nil
+          term = Symbol.new(
+            id: id, alias_name: alias_name, number: nil, tag: tag,
+            term: true, token_id: token_id, nullable: false
+          )
+          @terms << term
+          term
+        end
+        def add_nterm(id:, alias_name: nil, tag: nil)
+          return if find_symbol_by_id(id)
+          @symbols = nil
+          nterm = Symbol.new(
+            id: id, alias_name: alias_name, number: nil, tag: tag,
+            term: false, token_id: nil, nullable: nil,
+          )
+          @nterms << nterm
+          nterm
+        end
+        def find_symbol_by_s_value(s_value)
+          symbols.find { |s| s.id.s_value == s_value }
+        end
+        def find_symbol_by_s_value!(s_value)
+          find_symbol_by_s_value(s_value) || (raise "Symbol not found: #{s_value}")
+        end
+        def find_symbol_by_id(id)
+          symbols.find do |s|
+            s.id == id || s.alias_name == id.s_value
+          end
+        end
+        def find_symbol_by_id!(id)
+          find_symbol_by_id(id) || (raise "Symbol not found: #{id}")
+        end
+        def find_symbol_by_token_id(token_id)
+          symbols.find {|s| s.token_id == token_id }
+        end
+        def find_symbol_by_number!(number)
+          sym = symbols[number]
+          raise "Symbol not found: #{number}" unless sym
+          raise "[BUG] Symbol number mismatch. #{number}, #{sym}" if sym.number != number
+          sym
+        end
+        def fill_symbol_number
+          # YYEMPTY = -2
+          # YYEOF   =  0
+          # YYerror =  1
+          # YYUNDEF =  2
+          @number = 3
+          fill_terms_number
+          fill_nterms_number
+        end
+        def fill_nterm_type(types)
+          types.each do |type|
+            nterm = find_nterm_by_id!(type.id)
+            nterm.tag = type.tag
+          end
+        end
+        def fill_printer(printers)
+          symbols.each do |sym|
+            printers.each do |printer|
+              printer.ident_or_tags.each do |ident_or_tag|
+                case ident_or_tag
+                when Lrama::Lexer::Token::Ident
+                  sym.printer = printer if sym.id == ident_or_tag
+                when Lrama::Lexer::Token::Tag
+                  sym.printer = printer if sym.tag == ident_or_tag
+                else
+                  raise "Unknown token type. #{printer}"
+                end
+              end
+            end
+          end
+        end
+        def fill_error_token(error_tokens)
+          symbols.each do |sym|
+            error_tokens.each do |token|
+              token.ident_or_tags.each do |ident_or_tag|
+                case ident_or_tag
+                when Lrama::Lexer::Token::Ident
+                  sym.error_token = token if sym.id == ident_or_tag
+                when Lrama::Lexer::Token::Tag
+                  sym.error_token = token if sym.tag == ident_or_tag
+                else
+                  raise "Unknown token type. #{token}"
+                end
+              end
+            end
+          end
+        end
+        def token_to_symbol(token)
+          case token
+          when Lrama::Lexer::Token
+            find_symbol_by_id!(token)
+          else
+            raise "Unknown class: #{token}"
+          end
+        end
+        def validate!
+          validate_number_uniqueness!
+          validate_alias_name_uniqueness!
+        end
+        private
+        def find_nterm_by_id!(id)
+          @nterms.find do |s|
+            s.id == id
+          end || (raise "Symbol not found: #{id}")
+        end
+        def fill_terms_number
+          # Character literal in grammar file has
+          # token id corresponding to ASCII code by default,
+          # so start token_id from 256.
+          token_id = 256
+          @terms.each do |sym|
+            while used_numbers[@number] do
+              @number += 1
+            end
+            if sym.number.nil?
+              sym.number = @number
+              used_numbers[@number] = true
+              @number += 1
+            end
+            # If id is Token::Char, it uses ASCII code
+            if sym.token_id.nil?
+              if sym.id.is_a?(Lrama::Lexer::Token::Char)
+                # Ignore ' on the both sides
+                case sym.id.s_value[1..-2]
+                when "\\b"
+                  sym.token_id = 8
+                when "\\f"
+                  sym.token_id = 12
+                when "\\n"
+                  sym.token_id = 10
+                when "\\r"
+                  sym.token_id = 13
+                when "\\t"
+                  sym.token_id = 9
+                when "\\v"
+                  sym.token_id = 11
+                when "\""
+                  sym.token_id = 34
+                when "'"
+                  sym.token_id = 39
+                when "\\\\"
+                  sym.token_id = 92
+                when /\A\\(\d+)\z/
+                  unless (id = Integer($1, 8)).nil?
+                    sym.token_id = id
+                  else
+                    raise "Unknown Char s_value #{sym}"
+                  end
+                when /\A(.)\z/
+                  unless (id = $1&.bytes&.first).nil?
+                    sym.token_id = id
+                  else
+                    raise "Unknown Char s_value #{sym}"
+                  end
+                else
+                  raise "Unknown Char s_value #{sym}"
+                end
+              else
+                sym.token_id = token_id
+                token_id += 1
+              end
+            end
+          end
+        end
+        def fill_nterms_number
+          token_id = 0
+          @nterms.each do |sym|
+            while used_numbers[@number] do
+              @number += 1
+            end
+            if sym.number.nil?
+              sym.number = @number
+              used_numbers[@number] = true
+              @number += 1
+            end
+            if sym.token_id.nil?
+              sym.token_id = token_id
+              token_id += 1
+            end
+          end
+        end
+        def used_numbers
+          return @used_numbers if defined?(@used_numbers)
+          @used_numbers = {}
+          symbols.map(&:number).each do |n|
+            @used_numbers[n] = true
+          end
+          @used_numbers
+        end
+        def validate_number_uniqueness!
+          invalid = symbols.group_by(&:number).select do |number, syms|
+            syms.count > 1
+          end
+          return if invalid.empty?
+          raise "Symbol number is duplicated. #{invalid}"
+        end
+        def validate_alias_name_uniqueness!
+          invalid = symbols.select(&:alias_name).group_by(&:alias_name).select do |alias_name, syms|
+            syms.count > 1
+          end
+          return if invalid.empty?
+          raise "Symbol alias name is duplicated. #{invalid}"
+        end
+      end
+    end
+  end
+end
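
The character-literal branch of `fill_terms_number` above maps a quoted character to its byte value. The following is a condensed restatement of that branch, not the gem's code. `CHAR_ESCAPES` and `char_token_id` are hypothetical names, and the explicit `"\""` and `"'"` cases are folded into the single-character fallback, which yields the same ids.

```ruby
# Escape sequences as they appear in the grammar text (backslash plus letter).
CHAR_ESCAPES = {
  "\\b" => 8, "\\f" => 12, "\\n" => 10, "\\r" => 13,
  "\\t" => 9, "\\v" => 11, "\\\\" => 92
}.freeze

# `literal` is the token text including the surrounding single quotes.
def char_token_id(literal)
  body = literal[1..-2] # ignore the quote on both sides
  return CHAR_ESCAPES[body] if CHAR_ESCAPES.key?(body)

  case body
  when /\A\\(\d+)\z/ then Integer($1, 8) # octal escape such as \101
  when /\A(.)\z/     then $1.bytes.first # any other single character
  else raise "Unknown Char s_value #{literal}"
  end
end

puts char_token_id("'+'")     # prints 43
puts char_token_id("'\\n'")   # prints 10
puts char_token_id("'\\101'") # prints 65
```

Terminals that are not character literals are numbered from 256 in the original method, so they cannot collide with these byte values.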

data/lib/lrama/grammar/symbols.rb ADDED

@@ -0,0 +1 @@
+require_relative "symbols/resolver"