RubyGems - prism - Versions diffs - 1.8.0 → 1.9.0 - Mend

prism 1.8.0 → 1.9.0

Files changed (24)

checksums.yaml +4 -4
data/CHANGELOG.md +17 -3
data/config.yml +4 -4
data/docs/ripper_translation.md +8 -17
data/ext/prism/extension.h +1 -1
data/include/prism/ast.h +4 -4
data/include/prism/version.h +2 -2
data/lib/prism/lex_compat.rb +135 -94
data/lib/prism/node.rb +27 -19
data/lib/prism/parse_result.rb +9 -0
data/lib/prism/serialize.rb +1 -1
data/lib/prism/translation/ripper/filter.rb +53 -0
data/lib/prism/translation/ripper/lexer.rb +90 -1
data/lib/prism/translation/ripper.rb +59 -36
data/lib/prism.rb +1 -14
data/prism.gemspec +2 -2
data/rbi/prism/node.rbi +3 -0
data/rbi/prism.rbi +0 -3
data/sig/prism/node.rbs +3 -0
data/sig/prism/parse_result.rbs +1 -0
data/sig/prism.rbs +54 -40
data/src/prism.c +1 -1
metadata +2 -2
data/lib/prism/lex_ripper.rb +0 -64

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 2344922b08aa30076aab32c5a92e0d7f21a05f03bb85d089cc69e228a71e3b20
-  data.tar.gz: 7a44533dd2827ec9f6c31fc69c533c9f90b88520aef71219a348aa61fa460fbd
+  metadata.gz: 1f7205a73da36d10903c1495fd475e014a30c875568c932568cac22351eb3059
+  data.tar.gz: d59eac5f6e8e8955ea9b259b63eec423a0af06c79a748edaa9f64f24463dc874
 SHA512:
-  metadata.gz: c278e7b881b89f51150850df09062d9a9bd5e5076746e4b86ab199729a588f8aa346e12b6ba9b95dc0cf53592866bf216977bbb7da10cef6bd8a018681e9f45f
-  data.tar.gz: '009dee7695d1f7d58adb68d3829f97ecff445cc0eb8509526979f938932e553abdc2ba601049df67b8b29a75b7d542a9fb06d4c1984131d6f60e2ac9e3cf83c2'
+  metadata.gz: 59571b78dba19ec01a6f52094cc6d30e5648e633cd24f3abfbbc765145bed778f88fea75eada071d780ecb3dbe77a1bebab563d676289fed0ea6d21e50056290
+  data.tar.gz: 3c5b40536f98839fefb0afcefd49d5e78a3a31f9a55bd33b78b80665b2ca3bd1c2262c1836f9c6f1d3ddc7ad2d9b7dd12f06b3dff2b4aea9475a7171340d95b4

data/CHANGELOG.md CHANGED

@@ -4,6 +4,21 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+## [1.9.0] - 2026-01-27
+### Added
+- Lots of work on the Ripper translation layer to make it more compatible and efficient.
+- Alias `Prism::Node#breadth_first_search` to `Prism::Node#find`.
+- Add `Prism::Node#breadth_first_search_all`/`Prism::Node#find_all` for finding all nodes matching a condition.
+### Changed
+- Fixed location of opening tokens when invalid syntax is parsed.
+- Fix RBI for parsing options.
 ## [1.8.0] - 2026-01-12
 ### Added
@@ -19,8 +34,6 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
 - Decouple ripper translator from ripper library.
 - Sync Prism::Translation::ParserCurrent with Ruby 4.0.
-## [Unreleased]
 ## [1.7.0] - 2025-12-18
 ### Added
@@ -731,7 +744,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
 - 🎉 Initial release! 🎉
-[unreleased]: https://github.com/ruby/prism/compare/v1.8.0...HEAD
+[unreleased]: https://github.com/ruby/prism/compare/v1.9.0...HEAD
+[1.9.0]: https://github.com/ruby/prism/compare/v1.8.0...v1.9.0
 [1.8.0]: https://github.com/ruby/prism/compare/v1.7.0...v1.8.0
 [1.7.0]: https://github.com/ruby/prism/compare/v1.6.0...v1.7.0
 [1.6.0]: https://github.com/ruby/prism/compare/v1.5.2...v1.6.0

data/config.yml CHANGED

@@ -1269,17 +1269,17 @@ nodes:
       - name: opening_loc
         type: location
         comment: |
-          Represents the location of the opening `|`.
+          Represents the location of the opening `{` or `do`.
               [1, 2, 3].each { |i| puts x }
-                               ^
+                             ^
       - name: closing_loc
         type: location
         comment: |
-          Represents the location of the closing `|`.
+          Represents the location of the closing `}` or `end`.
               [1, 2, 3].each { |i| puts x }
-                                 ^
+                                          ^
     comment: |
       Represents a block of ruby code.

data/docs/ripper_translation.md CHANGED

@@ -1,22 +1,6 @@
 # Ripper translation
-Prism provides the ability to mirror the `Ripper` standard library. You can do this by:
-```ruby
-require "prism/translation/ripper/shim"
-```
-This provides the APIs like:
-```ruby
-Ripper.lex
-Ripper.parse
-Ripper.sexp_raw
-Ripper.sexp
-Ripper::SexpBuilder
-Ripper::SexpBuilderPP
-```
+Prism provides the ability to mirror the `Ripper` standard library. It is available under `Prism::Translation::Ripper`. You can use the entire public API, and also some undocumented features that are commonly used.
 Briefly, `Ripper` is a streaming parser that allows you to construct your own syntax tree. As an example:
@@ -49,6 +33,13 @@ ArithmeticRipper.new("1 + 2 - 3").parse # => [0]
 The exact names of the `on_*` methods are listed in the `Ripper` source.
+You can can also automatically use the ripper translation in places that don't explicitly use the translation layer by doing the following:
+```ruby
+# Will redirect access of the `Ripper` constant to `Prism::Translation::Ripper`.
+require "prism/translation/ripper/shim"
+```
 ## Background
 It is helpful to understand the differences between the `Ripper` library and the `Prism` library. Both libraries perform parsing and provide you with APIs to manipulate and understand the resulting syntax tree. However, there are a few key differences.

data/ext/prism/extension.h CHANGED

@@ -1,7 +1,7 @@
 #ifndef PRISM_EXT_NODE_H
 #define PRISM_EXT_NODE_H
-#define EXPECTED_PRISM_VERSION "1.8.0"
+#define EXPECTED_PRISM_VERSION "1.9.0"
 #include <ruby.h>
 #include <ruby/encoding.h>

data/include/prism/ast.h CHANGED

@@ -1826,20 +1826,20 @@ typedef struct pm_block_node {
     /**
      * BlockNode#opening_loc
      *
-     * Represents the location of the opening `|`.
+     * Represents the location of the opening `{` or `do`.
      *
      *     [1, 2, 3].each { |i| puts x }
-     *                      ^
+     *                    ^
      */
     pm_location_t opening_loc;
     /**
      * BlockNode#closing_loc
      *
-     * Represents the location of the closing `|`.
+     * Represents the location of the closing `}` or `end`.
      *
      *     [1, 2, 3].each { |i| puts x }
-     *                        ^
+     *                                 ^
      */
     pm_location_t closing_loc;
 } pm_block_node_t;

data/include/prism/version.h CHANGED

@@ -14,7 +14,7 @@
 /**
  * The minor version of the Prism library as an int.
  */
-#define PRISM_VERSION_MINOR 8
+#define PRISM_VERSION_MINOR 9
 /**
  * The patch version of the Prism library as an int.
@@ -24,6 +24,6 @@
 /**
  * The version of the Prism library as a constant string.
  */
-#define PRISM_VERSION "1.8.0"
+#define PRISM_VERSION "1.9.0"
 #endif

data/lib/prism/lex_compat.rb CHANGED

@@ -1,8 +1,6 @@
 # frozen_string_literal: true
 # :markup: markdown
-require "delegate"
 module Prism
   # This class is responsible for lexing the source using prism and then
   # converting those tokens to be compatible with Ripper. In the vast majority
@@ -201,87 +199,51 @@ module Prism
     # When we produce tokens, we produce the same arrays that Ripper does.
     # However, we add a couple of convenience methods onto them to make them a
     # little easier to work with. We delegate all other methods to the array.
-    class Token < SimpleDelegator
-      # @dynamic initialize, each, []
+    class Token < BasicObject
+      # Create a new token object with the given ripper-compatible array.
+      def initialize(array)
+        @array = array
+      end
       # The location of the token in the source.
       def location
-        self[0]
+        @array[0]
       end
       # The type of the token.
       def event
-        self[1]
+        @array[1]
       end
       # The slice of the source that this token represents.
       def value
-        self[2]
+        @array[2]
       end
       # The state of the lexer when this token was produced.
       def state
-        self[3]
+        @array[3]
       end
-    end
-    # Ripper doesn't include the rest of the token in the event, so we need to
-    # trim it down to just the content on the first line when comparing.
-    class EndContentToken < Token
+      # We want to pretend that this is just an Array.
       def ==(other) # :nodoc:
-        [self[0], self[1], self[2][0..self[2].index("\n")], self[3]] == other
+        @array == other
       end
-    end
-    # Tokens where state should be ignored
-    # used for :on_comment, :on_heredoc_end, :on_embexpr_end
-    class IgnoreStateToken < Token
-      def ==(other) # :nodoc:
-        self[0...-1] == other[0...-1]
+      def respond_to_missing?(name, include_private = false) # :nodoc:
+        @array.respond_to?(name, include_private)
       end
-    end
-    # Ident tokens for the most part are exactly the same, except sometimes we
-    # know an ident is a local when ripper doesn't (when they are introduced
-    # through named captures in regular expressions). In that case we don't
-    # compare the state.
-    class IdentToken < Token
-      def ==(other) # :nodoc:
-        (self[0...-1] == other[0...-1]) && (
-          (other[3] == Translation::Ripper::EXPR_LABEL | Translation::Ripper::EXPR_END) ||
-          (other[3] & (Translation::Ripper::EXPR_ARG | Translation::Ripper::EXPR_CMDARG) != 0)
-        )
+      def method_missing(name, ...) # :nodoc:
+        @array.send(name, ...)
       end
     end
-    # Ignored newlines can occasionally have a LABEL state attached to them, so
-    # we compare the state differently here.
-    class IgnoredNewlineToken < Token
-      def ==(other) # :nodoc:
-        return false unless self[0...-1] == other[0...-1]
-        if self[3] == Translation::Ripper::EXPR_ARG | Translation::Ripper::EXPR_LABELED
-          other[3] & Translation::Ripper::EXPR_ARG | Translation::Ripper::EXPR_LABELED != 0
-        else
-          self[3] == other[3]
-        end
-      end
-    end
-    # If we have an identifier that follows a method name like:
-    #
-    #     def foo bar
-    #
-    # then Ripper will mark bar as END|LABEL if there is a local in a parent
-    # scope named bar because it hasn't pushed the local table yet. We do this
-    # more accurately, so we need to allow comparing against both END and
-    # END|LABEL.
-    class ParamToken < Token
+    # Tokens where state should be ignored
+    # used for :on_sp, :on_comment, :on_heredoc_end, :on_embexpr_end
+    class IgnoreStateToken < Token
       def ==(other) # :nodoc:
-        (self[0...-1] == other[0...-1]) && (
-          (other[3] == Translation::Ripper::EXPR_END) ||
-          (other[3] == Translation::Ripper::EXPR_END | Translation::Ripper::EXPR_LABEL)
-        )
+        self[0...-1] == other[0...-1]
       end
     end
@@ -619,10 +581,10 @@ module Prism
     BOM_FLUSHED = RUBY_VERSION >= "3.3.0"
     private_constant :BOM_FLUSHED
-    attr_reader :source, :options
+    attr_reader :options
-    def initialize(source, **options)
-      @source = source
+    def initialize(code, **options)
+      @code = code
       @options = options
     end
@@ -632,12 +594,14 @@ module Prism
       state = :default
       heredoc_stack = [[]] #: Array[Array[Heredoc::PlainHeredoc | Heredoc::DashHeredoc | Heredoc::DedentingHeredoc]]
-      result = Prism.lex(source, **options)
+      result = Prism.lex(@code, **options)
+      source = result.source
       result_value = result.value
       previous_state = nil #: State?
       last_heredoc_end = nil #: Integer?
+      eof_token = nil
-      bom = source.byteslice(0..2) == "\xEF\xBB\xBF"
+      bom = source.slice(0, 3) == "\xEF\xBB\xBF"
       result_value.each_with_index do |(token, lex_state), index|
         lineno = token.location.start_line
@@ -675,12 +639,15 @@ module Prism
         event = RIPPER.fetch(token.type)
         value = token.value
-        lex_state = Translation::Ripper::Lexer::State.new(lex_state)
+        lex_state = Translation::Ripper::Lexer::State.cached(lex_state)
         token =
           case event
           when :on___end__
-            EndContentToken.new([[lineno, column], event, value, lex_state])
+            # Ripper doesn't include the rest of the token in the event, so we need to
+            # trim it down to just the content on the first line.
+            value = value[0..value.index("\n")]
+            Token.new([[lineno, column], event, value, lex_state])
           when :on_comment
             IgnoreStateToken.new([[lineno, column], event, value, lex_state])
           when :on_heredoc_end
@@ -688,33 +655,18 @@ module Prism
             # want to bother comparing the state on them.
             last_heredoc_end = token.location.end_offset
             IgnoreStateToken.new([[lineno, column], event, value, lex_state])
-          when :on_ident
-            if lex_state == Translation::Ripper::EXPR_END
-              # If we have an identifier that follows a method name like:
-              #
-              #     def foo bar
-              #
-              # then Ripper will mark bar as END|LABEL if there is a local in a
-              # parent scope named bar because it hasn't pushed the local table
-              # yet. We do this more accurately, so we need to allow comparing
-              # against both END and END|LABEL.
-              ParamToken.new([[lineno, column], event, value, lex_state])
-            elsif lex_state == Translation::Ripper::EXPR_END | Translation::Ripper::EXPR_LABEL
-              # In the event that we're comparing identifiers, we're going to
-              # allow a little divergence. Ripper doesn't account for local
-              # variables introduced through named captures in regexes, and we
-              # do, which accounts for this difference.
-              IdentToken.new([[lineno, column], event, value, lex_state])
-            else
-              Token.new([[lineno, column], event, value, lex_state])
-            end
           when :on_embexpr_end
             IgnoreStateToken.new([[lineno, column], event, value, lex_state])
-          when :on_ignored_nl
-            # Ignored newlines can occasionally have a LABEL state attached to
-            # them which doesn't actually impact anything. We don't mirror that
-            # state so we ignored it.
-            IgnoredNewlineToken.new([[lineno, column], event, value, lex_state])
+          when :on_words_sep
+            # Ripper emits one token each per line.
+            value.each_line.with_index do |line, index|
+              if index > 0
+                lineno += 1
+                column = 0
+              end
+              tokens << Token.new([[lineno, column], event, line, lex_state])
+            end
+            tokens.pop
           when :on_regexp_end
             # On regex end, Ripper scans and then sets end state, so the ripper
             # lexed output is begin, when it should be end. prism sets lex state
@@ -739,13 +691,14 @@ module Prism
                   counter += { on_embexpr_beg: -1, on_embexpr_end: 1 }[current_event] || 0
                 end
-                Translation::Ripper::Lexer::State.new(result_value[current_index][1])
+                Translation::Ripper::Lexer::State.cached(result_value[current_index][1])
               else
                 previous_state
               end
             Token.new([[lineno, column], event, value, lex_state])
           when :on_eof
+            eof_token = token
             previous_token = result_value[index - 1][0]
             # If we're at the end of the file and the previous token was a
@@ -768,7 +721,7 @@ module Prism
                   end_offset += 3
                 end
-                tokens << Token.new([[lineno, 0], :on_nl, source.byteslice(start_offset...end_offset), lex_state])
+                tokens << Token.new([[lineno, 0], :on_nl, source.slice(start_offset, end_offset - start_offset), lex_state])
               end
             end
@@ -859,10 +812,98 @@ module Prism
       # Drop the EOF token from the list
       tokens = tokens[0...-1]
-      # We sort by location to compare against Ripper's output
-      tokens.sort_by!(&:location)
+      # We sort by location because Ripper.lex sorts.
+      # Manually implemented instead of `sort_by!(&:location)` for performance.
+      tokens.sort_by! do |token|
+        line, column = token.location
+        source.byte_offset(line, column)
+      end
+      # Add :on_sp tokens
+      tokens = add_on_sp_tokens(tokens, source, result.data_loc, bom, eof_token)
+      Result.new(tokens, result.comments, result.magic_comments, result.data_loc, result.errors, result.warnings, source)
+    end
+    def add_on_sp_tokens(tokens, source, data_loc, bom, eof_token)
+      new_tokens = []
+      prev_token_state = Translation::Ripper::Lexer::State.cached(Translation::Ripper::EXPR_BEG)
+      prev_token_end = bom ? 3 : 0
+      tokens.each do |token|
+        line, column = token.location
+        start_offset = source.byte_offset(line, column)
+        # Ripper reports columns on line 1 without counting the BOM, so we
+        # adjust to get the real offset
+        start_offset += 3 if line == 1 && bom
+        if start_offset > prev_token_end
+          sp_value = source.slice(prev_token_end, start_offset - prev_token_end)
+          sp_line = source.line(prev_token_end)
+          sp_column = source.column(prev_token_end)
+          # Ripper reports columns on line 1 without counting the BOM
+          sp_column -= 3 if sp_line == 1 && bom
+          continuation_index = sp_value.byteindex("\\")
+          # ripper emits up to three :on_sp tokens when line continuations are used
+          if continuation_index
+            next_whitespace_index = continuation_index + 1
+            next_whitespace_index += 1 if sp_value.byteslice(next_whitespace_index) == "\r"
+            next_whitespace_index += 1
+            first_whitespace = sp_value[0...continuation_index]
+            continuation = sp_value[continuation_index...next_whitespace_index]
+            second_whitespace = sp_value[next_whitespace_index..]
+            new_tokens << IgnoreStateToken.new([
+              [sp_line, sp_column],
+              :on_sp,
+              first_whitespace,
+              prev_token_state
+            ]) unless first_whitespace.empty?
+            new_tokens << IgnoreStateToken.new([
+              [sp_line, sp_column + continuation_index],
+              :on_sp,
+              continuation,
+              prev_token_state
+            ])
+            new_tokens << IgnoreStateToken.new([
+              [sp_line + 1, 0],
+              :on_sp,
+              second_whitespace,
+              prev_token_state
+            ]) unless second_whitespace.empty?
+          else
+            new_tokens << IgnoreStateToken.new([
+              [sp_line, sp_column],
+              :on_sp,
+              sp_value,
+              prev_token_state
+            ])
+          end
+        end
+        new_tokens << token
+        prev_token_state = token.state
+        prev_token_end = start_offset + token.value.bytesize
+      end
+      unless data_loc # no trailing :on_sp with __END__ as it is always preceded by :on_nl
+        end_offset = eof_token.location.end_offset
+        if prev_token_end < end_offset
+          new_tokens << IgnoreStateToken.new([
+            [source.line(prev_token_end), source.column(prev_token_end)],
+            :on_sp,
+            source.slice(prev_token_end, end_offset - prev_token_end),
+            prev_token_state
+          ])
+        end
+      end
-      Result.new(tokens, result.comments, result.magic_comments, result.data_loc, result.errors, result.warnings, Source.for(source))
+      new_tokens
     end
   end
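
The `Token` change above replaces `SimpleDelegator` with a `BasicObject` subclass that holds the Ripper-style array and forwards unknown messages to it. A minimal sketch of that pattern follows; `ToyToken` is a hypothetical name, and the integer `0` stands in for the lexer state object that the real code stores in the fourth slot.

```ruby
# Hypothetical stand-in for the wrapper pattern shown in the diff.
# BasicObject has almost no methods of its own, so nearly every call
# falls through to method_missing and reaches the wrapped array.
class ToyToken < BasicObject
  def initialize(array)
    @array = array
  end

  # Named accessors over the fixed array layout.
  def location
    @array[0]
  end

  def event
    @array[1]
  end

  def value
    @array[2]
  end

  def state
    @array[3]
  end

  # Compare as if this object were the array itself.
  def ==(other)
    @array == other
  end

  def respond_to_missing?(name, include_private = false)
    @array.respond_to?(name, include_private)
  end

  # Everything else (such as [] or size) is forwarded to the array.
  def method_missing(name, ...)
    @array.send(name, ...)
  end
end

# Placeholder state value 0; the real code uses a lexer state object.
token = ToyToken.new([[1, 0], :on_int, "42", 0])
event_name = token.event
third_slot = token[2]
slot_count = token.size
```

Because `==` is defined on the wrapper, `token == [[1, 0], :on_int, "42", 0]` holds with the token on the left-hand side; the reverse comparison goes through `Array#==` and is not covered by this sketch.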

data/lib/prism/node.rb CHANGED

@@ -194,25 +194,13 @@ module Prism
     def tunnel(line, column)
       queue = [self] #: Array[Prism::node]
       result = [] #: Array[Prism::node]
+      offset = source.byte_offset(line, column)
       while (node = queue.shift)
         result << node
         node.each_child_node do |child_node|
-          child_location = child_node.location
-          start_line = child_location.start_line
-          end_line = child_location.end_line
-          if start_line == end_line
-            if line == start_line && column >= child_location.start_column && column < child_location.end_column
-              queue << child_node
-              break
-            end
-          elsif (line == start_line && column >= child_location.start_column) || (line == end_line && column < child_location.end_column)
-            queue << child_node
-            break
-          elsif line > start_line && line < end_line
+          if child_node.start_offset <= offset && offset < child_node.end_offset
             queue << child_node
             break
           end
@@ -223,7 +211,7 @@ module Prism
     end
     # Returns the first node that matches the given block when visited in a
-    # depth-first search. This is useful for finding a node that matches a
+    # breadth-first search. This is useful for finding a node that matches a
     # particular condition.
     #
     #     node.breadth_first_search { |node| node.node_id == node_id }
@@ -238,6 +226,26 @@ module Prism
       nil
     end
+    alias find breadth_first_search
+    # Returns all of the nodes that match the given block when visited in a
+    # breadth-first search. This is useful for finding all nodes that match a
+    # particular condition.
+    #
+    #     node.breadth_first_search_all { |node| node.is_a?(Prism::CallNode) }
+    #
+    def breadth_first_search_all(&block)
+      queue = [self] #: Array[Prism::node]
+      results = [] #: Array[Prism::node]
+      while (node = queue.shift)
+        results << node if yield node
+        queue.concat(node.compact_child_nodes)
+      end
+      results
+    end
+    alias find_all breadth_first_search_all
     # Returns a list of the fields that exist for this node class. Fields
     # describe the structure of the node. This kind of reflection is useful for
@@ -2025,10 +2033,10 @@ module Prism
     #                          ^^^^^^
     attr_reader :body
-    # Represents the location of the opening `|`.
+    # Represents the location of the opening `{` or `do`.
     #
     #     [1, 2, 3].each { |i| puts x }
-    #                      ^
+    #                    ^
     def opening_loc
       location = @opening_loc
       return location if location.is_a?(Location)
@@ -2041,10 +2049,10 @@ module Prism
       repository.enter(node_id, :opening_loc)
     end
-    # Represents the location of the closing `|`.
+    # Represents the location of the closing `}` or `end`.
     #
     #     [1, 2, 3].each { |i| puts x }
-    #                        ^
+    #                                 ^
     def closing_loc
       location = @closing_loc
       return location if location.is_a?(Location)
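
The new `breadth_first_search_all` method and the `find` / `find_all` aliases can be illustrated without the gem. The sketch below uses a hypothetical `ToyNode` class that models only `compact_child_nodes`; the first-match method is a reconstruction, since the diff shows only the tail of the existing `breadth_first_search` body.

```ruby
# Hypothetical stand-in for a Prism node; only compact_child_nodes is modeled.
class ToyNode
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  def compact_child_nodes
    children.compact
  end

  # First node for which the block is truthy, visiting level by level.
  # (Reconstructed for illustration; not copied from the diff.)
  def breadth_first_search
    queue = [self]
    while (node = queue.shift)
      return node if yield node
      queue.concat(node.compact_child_nodes)
    end
    nil
  end
  alias find breadth_first_search

  # All matching nodes, in breadth-first order (same shape as the added method).
  def breadth_first_search_all
    queue = [self]
    results = []
    while (node = queue.shift)
      results << node if yield node
      queue.concat(node.compact_child_nodes)
    end
    results
  end
  alias find_all breadth_first_search_all
end

tree = ToyNode.new("root", [
  ToyNode.new("a", [ToyNode.new("call_1")]),
  ToyNode.new("call_2"),
])

first_call = tree.find { |node| node.name.start_with?("call") }
all_calls = tree.find_all { |node| node.name.start_with?("call") }.map(&:name)
```

In this toy tree the shallower `call_2` is visited before the deeper `call_1`, which is the difference from a depth-first walk that the corrected doc comment describes.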

data/lib/prism/parse_result.rb CHANGED

@@ -76,6 +76,15 @@ module Prism
       source.byteslice(byte_offset, length) or raise
     end
+    # Converts the line number and column in bytes to a byte offset.
+    def byte_offset(line, column)
+      normal = line - @start_line
+      raise IndexError if normal < 0
+      offsets.fetch(normal) + column
+    rescue IndexError
+      raise ArgumentError, "line #{line} is out of range"
+    end
     # Binary search through the offsets to find the line number for the given
     # byte offset.
     def line(byte_offset)
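
The added `byte_offset(line, column)` converts a line and byte column into an absolute byte offset using the table of line start offsets. A minimal sketch under assumptions: `ToySource` is a hypothetical class, and it builds its own offsets table by scanning for newlines, which may differ from how Prism populates `offsets`.

```ruby
# Hypothetical stand-in for Prism::Source, modeling only what byte_offset needs.
class ToySource
  def initialize(text, start_line = 1)
    @start_line = start_line
    # Byte offset at which each line begins; line 1 starts at 0.
    @offsets = [0]
    text.each_byte.with_index do |byte, index|
      @offsets << index + 1 if byte == 10 # 10 is "\n"
    end
  end

  # Same shape as the method added in the diff.
  def byte_offset(line, column)
    normal = line - @start_line
    # Array#fetch accepts negative indexes, so reject them explicitly.
    raise IndexError if normal < 0
    @offsets.fetch(normal) + column
  rescue IndexError
    raise ArgumentError, "line #{line} is out of range"
  end
end

text = "foo\nbar baz\n"
source = ToySource.new(text)
offset = source.byte_offset(2, 4)
word = text.byteslice(offset, 3)
```

The explicit negative check matters because `fetch(-1)` would otherwise return the last line's offset instead of raising.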

data/lib/prism/serialize.rb CHANGED

@@ -21,7 +21,7 @@ module Prism
     # The minor version of prism that we are expecting to find in the serialized
     # strings.
-    MINOR_VERSION = 8
+    MINOR_VERSION = 9
     # The patch version of prism that we are expecting to find in the serialized
     # strings.

data/lib/prism/translation/ripper/filter.rb ADDED

@@ -0,0 +1,53 @@
+# frozen_string_literal: true
+module Prism
+  module Translation
+    class Ripper
+      class Filter # :nodoc:
+        # :stopdoc:
+        def initialize(src, filename = '-', lineno = 1)
+          @__lexer = Lexer.new(src, filename, lineno)
+          @__line = nil
+          @__col = nil
+          @__state = nil
+        end
+        def filename
+          @__lexer.filename
+        end
+        def lineno
+          @__line
+        end
+        def column
+          @__col
+        end
+        def state
+          @__state
+        end
+        def parse(init = nil)
+          data = init
+          @__lexer.lex.each do |pos, event, tok, state|
+            @__line, @__col = *pos
+            @__state = state
+            data = if respond_to?(event, true)
+                  then __send__(event, tok, data)
+                  else on_default(event, tok, data)
+                  end
+          end
+          data
+        end
+        private
+        def on_default(event, token, data)
+          data
+        end
+        # :startdoc:
+      end
+    end
+  end
+end
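
The added `Filter` class has the same shape as the standard library's `Ripper::Filter`: `parse` folds over the lexed tokens, dispatching each event to a same-named method if one exists and to `on_default` otherwise. The sketch below uses the stdlib class to show that interface; `CommentCollector` is a hypothetical subclass, and whether `Prism::Translation::Ripper::Filter` behaves identically as the superclass is an assumption that is not tested here.

```ruby
require "ripper"

# Hypothetical subclass: collects comment tokens and ignores everything else.
class CommentCollector < Ripper::Filter
  private

  # Called for each :on_comment token; the return value becomes the new data.
  def on_comment(token, data)
    data << token
  end
end

code = "x = 1 # first\ny = 2 # second\n"
comments = CommentCollector.new(code).parse([])
```

Handlers may be private because dispatch uses `respond_to?(event, true)`, as in the `parse` method shown in the diff.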