liquid2 0.3.0 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 42973b8aae08cf4321586ac4cdaf39ecab29e0a2a4b7f26aed278291e41b7645
- data.tar.gz: 9c2bca82b7f589cfdccd4a2a5c091ac3cd32613e0c40354b6313d23a6c35bbec
+ metadata.gz: 3a68d0ef0f934b9b4fd68d99591e5b0faf9df0e4d408e35c4df1aa2b7b98f4a1
+ data.tar.gz: 41d881fe5f30b1f390e2c8297e36ca08f6eb70c1b70225f8418ba255f6297759
  SHA512:
- metadata.gz: fbb3917b6b68ba37aaffbf158aac61d85a577ddf244b12fdad9eaa99e5ea54a0f94b255b3d21381514bb42bbb7c3543c247b7bba38255e95dacc195aedf2f10a
- data.tar.gz: a61fd0b11ff0d4ed92bdcd1b8eca60b5997f6881caad131132256f561a0c039cdc33e758601b7dd9da0e02534f53c21a38da6c09e38bc947a4a93834a0413210
+ metadata.gz: 53ad1737b2ae742366a0fc26e038c971d18a4f500ce104faac21a94547ac61a9926a5683a5150539e1b968d144b2cb15aa93823db163cbaa07d785b2e9ed3c31
+ data.tar.gz: 25e214ff840aacacb4ffed35160295d8fd7dd04ea301c62aec2a490d4c5d54ba72b73f9a7318b2bc0fb1f8c3ed7eea26a737ba5f8ea182b3a36e44996a8b06f4
checksums.yaml.gz.sig CHANGED
Binary file
data/CHANGELOG.md CHANGED
@@ -1,3 +1,8 @@
+ ## [0.3.1] - 25-06-24
+
+ - Added support for custom markup delimiters. See [#16](https://github.com/jg-rp/ruby-liquid2/pull/16).
+ - Added the `range` filter. `range` is an array slicing filter that takes optional start and end indexes, and an optional step argument, any of which can be negative. See [#18](https://github.com/jg-rp/ruby-liquid2/pull/18).
+
  ## [0.3.0] - 25-05-29
 
  - Fixed static analysis of lambda expressions (arrow functions). Previously we were not including lambda parameters in the scope of the expression. See [#12](https://github.com/jg-rp/ruby-liquid2/issues/12).
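To make the new `range` filter concrete, here is a minimal sketch of it in a template. The data and argument values are arbitrary examples, and the render call assumes `Template#render` accepts a hash of globals, which is not shown in this diff.

```ruby
require "liquid2"

# Hypothetical usage of the `range` filter added in 0.3.1.
template = Liquid2.parse("{{ letters | range: 1, -1, 2 | join: ' ' }}")

# Start index 1, stop index -1 (one before the end) and step 2 selects
# indexes 1, 3 and 5 of the example array.
puts template.render("letters" => %w[a b c d e f g]) # => "b d f"
```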
data/README.md CHANGED
@@ -36,7 +36,7 @@ Liquid templates for Ruby, with some extra features.
  Add `'liquid2'` to your Gemfile:
 
  ```
- gem 'liquid2', '~> 0.3.0'
+ gem 'liquid2', '~> 0.3.1'
  ```
 
  Or
@@ -231,7 +231,13 @@ Integer and float literals can use scientific notation, like `1.2e3` or `1e-2`.
 
  Liquid2 includes implementations of `{% extends %}` and `{% block %}` for template inheritance, `{% with %}` for block scoped variables and `{% macro %}` and `{% call %}` for defining parameterized blocks.
 
- There's also built-in implementations of `sort_numeric` and `json` filters.
+ The following filters are included in Liquid2's default environment:
+
+ - `sort_numeric` - Sorts array elements by runs of digits found in their string representation.
+ - `json` - Outputs objects serialized in JSON format.
+ - `range` - An alternative to the standard `slice` filter that takes optional start and stop indexes, and an optional step, all of which can be negative.
+
+ See [Tags and filters](#tags-and-filters) for how to add, remove or alias tags and/or filters from your own Liquid2 environment.
 
  ## API
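As a rough illustration of the first two filters listed above (the exact JSON formatting and the render call are assumptions, not taken from this diff):

```ruby
require "liquid2"

# Hypothetical data; assumes Template#render takes a hash of globals.
source = "{{ releases | sort_numeric | join: ', ' }} {{ user | json }}"
template = Liquid2.parse(source)

puts template.render(
  "releases" => ["v10", "v2", "v1"],
  "user" => { "name" => "Sally" }
)
# Something like: v1, v2, v10 {"name":"Sally"}
```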
@@ -38,14 +38,20 @@ require_relative "nodes/tags/with"
  module Liquid2
  # Template parsing and rendering configuration.
  #
- # A Liquid::Environment is where you might register custom tags and filters,
+ # A Liquid2::Environment is where you might register custom tags and filters,
  # or store global context data that should be available to all templates.
  #
  # `Liquid2.parse(source)` is equivalent to `Liquid2::Environment.new.parse(source)`.
  class Environment
  attr_reader :tags, :local_namespace_limit, :context_depth_limit, :loop_iteration_limit,
  :output_stream_limit, :filters, :suppress_blank_control_flow_blocks,
- :shorthand_indexes, :falsy_undefined, :arithmetic_operators
+ :shorthand_indexes, :falsy_undefined, :arithmetic_operators, :markup_comment_prefix,
+ :markup_comment_suffix, :markup_out_end, :markup_out_start, :markup_tag_end,
+ :markup_tag_start, :re_tag_name, :re_word, :re_int, :re_float,
+ :re_double_quote_string_special, :re_single_quote_string_special, :re_markup_start,
+ :re_markup_end, :re_markup_end_chars, :re_up_to_markup_start, :re_punctuation,
+ :re_up_to_inline_comment_end, :re_up_to_raw_end, :re_block_comment_chunk,
+ :re_up_to_doc_end, :re_line_statement_comment
 
  # @param context_depth_limit [Integer] The maximum number of times a render context can
  # be extended or copied before a `Liquid2::LiquidResourceLimitError`` is raised.
@@ -59,8 +65,23 @@ module Liquid2
  # `Liquid2::LiquidResourceLimitError`` is raised.
  # @param loop_iteration_limit [Integer?] The maximum number of loop iterations allowed
  # before a `LiquidResourceLimitError` is raised.
+ # @param markup_comment_prefix [String] The string of characters that indicate the start of a
+ # Liquid comment. This should include a single trailing `#`. Additional, variable length
+ # hashes will be handled by the tokenizer. It is not possible to change comment syntax to not
+ # use `#`.
+ # @param markup_comment_suffix [String] The string of characters that indicate the end of a
+ # Liquid comment, excluding any hashes.
+ # @param markup_out_end [String] The string of characters that indicate the end of a Liquid
+ # output statement.
+ # @param markup_out_start [String] The string of characters that indicate the start of a Liquid
+ # output statement.
+ # @param markup_tag_end [String] The string of characters that indicate the end of a Liquid tag.
+ # @param markup_tag_start [String] The string of characters that indicate the start of a Liquid
+ # tag.
  # @param output_stream_limit [Integer?] The maximum number of bytes that can be written
  # to a template's output buffer before a `LiquidResourceLimitError` is raised.
+ # @param parser [singleton(Parser)] `Liquid2::Parser` or a subclass of it.
+ # @param scanner [singleton(Scanner)] `Liquid2::Scanner` or a subclass of it.
  # @param shorthand_indexes [bool] When `true`, allow shorthand dotted array indexes as
  # well as bracketed indexes in variable paths. Defaults to `false`.
  # @param suppress_blank_control_flow_blocks [bool] When `true`, suppress blank control
@@ -70,15 +91,23 @@ module Liquid2
  def initialize(
  arithmetic_operators: false,
  context_depth_limit: 30,
+ falsy_undefined: true,
  globals: nil,
  loader: nil,
  local_namespace_limit: nil,
  loop_iteration_limit: nil,
+ markup_comment_prefix: "{#",
+ markup_comment_suffix: "}",
+ markup_out_end: "}}",
+ markup_out_start: "{{",
+ markup_tag_end: "%}",
+ markup_tag_start: "{%",
  output_stream_limit: nil,
+ parser: Parser,
+ scanner: Scanner,
  shorthand_indexes: false,
  suppress_blank_control_flow_blocks: true,
- undefined: Undefined,
- falsy_undefined: true
+ undefined: Undefined
  )
  # A mapping of tag names to objects responding to `parse(token, parser)`.
  @tags = {}
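For illustration, a sketch of the new markup delimiter keyword arguments shown above. The square-bracket delimiters are arbitrary examples, rendering assumes `Template#render` takes a hash of globals, and delimiters that collide with standard punctuation may also require overriding `setup_scanner`, as the environment's own comments note.

```ruby
require "liquid2"

# An environment with square-bracket style delimiters (arbitrary example strings).
env = Liquid2::Environment.new(
  markup_out_start: "[[",
  markup_out_end: "]]",
  markup_tag_start: "[%",
  markup_tag_end: "%]"
)

template = env.parse("[% if user %]Hello, [[ user ]]![% endif %]")
puts template.render("user" => "Sally") # => "Hello, Sally!"
```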
@@ -116,9 +145,17 @@ module Liquid2
  # before a `LiquidResourceLimitError` is raised.
  @output_stream_limit = output_stream_limit
 
+ # Liquid2::Scanner or a subclass of it. This is used to tokenize Liquid source
+ # text before parsing it.
+ @scanner = scanner
+
+ # Liquid2::Parser or a subclass of it. The parser takes tokens from the scanner
+ # and produces an abstract syntax tree.
+ @parser = parser
+
  # We reuse the same string scanner when parsing templates for improved performance.
  # TODO: Is this going to cause issues in multi threaded environments?
- @scanner = StringScanner.new("")
+ @string_scanner = StringScanner.new("")
 
  # When `true`, allow shorthand dotted array indexes as well as bracketed indexes
  # in variable paths. Defaults to `false`.
@@ -136,6 +173,31 @@ module Liquid2
  # raise an error when tested for truthiness.
  @falsy_undefined = falsy_undefined
 
+ # The string of characters that indicate the start of a Liquid output statement.
+ @markup_out_start = markup_out_start
+
+ # The string of characters that indicate the end of a Liquid output statement.
+ @markup_out_end = markup_out_end
+
+ # The string of characters that indicate the start of a Liquid tag.
+ @markup_tag_start = markup_tag_start
+
+ # The string of characters that indicate the end of a Liquid tag.
+ @markup_tag_end = markup_tag_end
+
+ # The string of characters that indicate the start of a Liquid comment. This should
+ # include a single trailing `#`. Additional, variable length hashes will be handled
+ # by the tokenizer. It is not possible to change comment syntax to not use `#`.
+ @markup_comment_prefix = markup_comment_prefix
+
+ # The string of characters that indicate the end of a Liquid comment, excluding any
+ # hashes.
+ @markup_comment_suffix = markup_comment_suffix
+
+ # You might need to override `setup_scanner` if you've specified custom markup
+ # delimiters and they conflict with standard punctuation.
+ setup_scanner
+
  # Override `setup_tags_and_filters` in environment subclasses to configure custom
  # tags and/or filters.
  setup_tags_and_filters
@@ -145,11 +207,13 @@ module Liquid2
  # @param source [String] template source text.
  # @return [Template]
  def parse(source, name: "", path: nil, up_to_date: nil, globals: nil, overlay: nil)
- Template.new(self,
- source,
- Parser.parse(self, source, scanner: @scanner),
- name: name, path: path, up_to_date: up_to_date,
- globals: make_globals(globals), overlay: overlay)
+ Template.new(
+ self,
+ source,
+ @parser.new(self, @scanner.tokenize(self, source, @string_scanner), source.length).parse,
+ name: name, path: path, up_to_date: up_to_date,
+ globals: make_globals(globals), overlay: overlay
+ )
  rescue LiquidError => e
  e.source = source unless e.source
  e.template_name = name unless e.template_name || name.empty?
@@ -262,6 +326,7 @@ module Liquid2
  register_filter("newline_to_br", Liquid2::Filters.method(:newline_to_br))
  register_filter("plus", Liquid2::Filters.method(:plus))
  register_filter("prepend", Liquid2::Filters.method(:prepend))
+ register_filter("range", Liquid2::Filters.method(:better_slice))
  register_filter("reject", Liquid2::Filters.method(:reject))
  register_filter("remove_first", Liquid2::Filters.method(:remove_first))
  register_filter("remove_last", Liquid2::Filters.method(:remove_last))
@@ -292,6 +357,51 @@ module Liquid2
  register_filter("where", Liquid2::Filters.method(:where))
  end
 
+ # Compile regular expressions for use by the tokenizer attached to this environment.
+ def setup_scanner
+ # A regex pattern matching Liquid tag names. Should include `#` for inline comments.
+ @re_tag_name = /(?:[a-z][a-z_0-9]*|#)/
+
+ # A regex pattern matching keywords and/or variable/path names. Replace this if
+ # you want to disable Unicode characters in identifiers, for example.
+ @re_word = /[\u0080-\uFFFFa-zA-Z_][\u0080-\uFFFFa-zA-Z0-9_-]*/
+
+ # Patterns matching literal integers and floats, possibly in scientific notation.
+ # You could simplify these to disable scientific notation.
+ @re_int = /-?\d+(?:[eE]\+?\d+)?/
+ @re_float = /((?:-?\d+\.\d+(?:[eE][+-]?\d+)?)|(-?\d+[eE]-\d+))/
+
+ # Patterns matching escape sequences, interpolation and end of string in string literals.
+ # You could remove `\$` from these to disable string interpolation.
+ @re_double_quote_string_special = /[\\"\$]/
+ @re_single_quote_string_special = /[\\'\$]/
+
+ # rubocop: disable Layout/LineLength
+
+ # A regex pattern matching the start of some Liquid markup. Could be the start of an
+ # output statement, tag or comment. Traditionally `{{`, `{%` and `{#`, respectively.
+ @re_markup_start = /#{Regexp.escape(@markup_out_start)}|#{Regexp.escape(@markup_tag_start)}|#{Regexp.escape(@markup_comment_prefix)}/
+
+ # A regex pattern matching the end of some Liquid markup. Could be the end of
+ # an output statement or tag. Traditionally `}}`, `%}`, respectively.
+ @re_markup_end = /#{Regexp.escape(@markup_out_end)}|#{Regexp.escape(@markup_tag_end)}/
+
+ # A regex pattern matching any one of the possible characters ending some Liquid
+ # markup. This is used to detect incomplete and malformed markup and provide
+ # helpful error messages.
+ @re_markup_end_chars = /[#{Regexp.escape((@markup_out_end + @markup_tag_end).each_char.uniq.join)}]/
+
+ @re_up_to_markup_start = /(?=#{Regexp.escape(@markup_out_start)}|#{Regexp.escape(@markup_tag_start)}|#{Regexp.escape(@markup_comment_prefix)})/
+ @re_punctuation = %r{(?!#{@re_markup_end})(\?|\[|\]|\|{1,2}|\.{1,2}|,|:|\(|\)|[<>=!]+|[+\-%*/]+(?!#{@re_markup_end_chars}))}
+ @re_up_to_inline_comment_end = /(?=([+\-~])?#{Regexp.escape(@markup_tag_end)})/
+ @re_up_to_raw_end = /(?=(#{Regexp.escape(@markup_tag_start)}[+\-~]?\s*endraw\s*[+\-~]?#{Regexp.escape(@markup_tag_end)}))/
+ @re_block_comment_chunk = /(#{Regexp.escape(@markup_tag_start)}[+\-~]?\s*(comment|raw|endcomment|endraw)\s*[+\-~]?#{Regexp.escape(@markup_tag_end)})/
+ @re_up_to_doc_end = /(?=(#{Regexp.escape(@markup_tag_start)}[+\-~]?\s*enddoc\s*[+\-~]?#{Regexp.escape(@markup_tag_end)}))/
+ @re_line_statement_comment = /(?=([\r\n]+|-?#{Regexp.escape(@markup_tag_end)}))/
+
+ # rubocop: enable Layout/LineLength
+ end
+
  def undefined(name, node: nil)
  @undefined.new(name, node: node)
  end
@@ -13,5 +13,45 @@ module Liquid2
  Liquid2.to_s(left).slice(to_integer(start), to_integer(length)) || ""
  end
  end
+
+ def self.better_slice(
+ left,
+ start_ = :undefined, stop_ = :undefined, step_ = :undefined,
+ start: :undefined, stop: :undefined, step: :undefined
+ )
+ # Give priority to keyword arguments, default to nil if neither are given.
+ start = start_ == :undefined ? nil : start_ if start == :undefined
+ stop = stop_ == :undefined ? nil : stop_ if stop == :undefined
+ step = step_ == :undefined ? nil : step_ if step == :undefined
+
+ step = to_integer(step || 1)
+ length = left.length
+ return [] if length.zero? || step.zero?
+
+ start = to_integer(start) unless start.nil?
+ stop = to_integer(stop) unless stop.nil?
+
+ normalized_start = if start.nil?
+ step.negative? ? length - 1 : 0
+ elsif start&.negative?
+ [length + start, 0].max
+ else
+ [start, length - 1].min
+ end
+
+ normalized_stop = if stop.nil?
+ step.negative? ? -1 : length
+ elsif stop&.negative?
+ [length + stop, -1].max
+ else
+ [stop, length].min
+ end
+
+ # This does not work with Ruby 3.1
+ # left[(normalized_start...normalized_stop).step(step)]
+ #
+ # But this does.
+ (normalized_start...normalized_stop).step(step).map { |i| left[i] }
+ end
  end
 end
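To make the index normalization above concrete, here is a small standalone sketch of the same arithmetic with arbitrary example values (no start, a negative stop, and a step of 2):

```ruby
# Standalone illustration of the index arithmetic used by `better_slice`,
# with arbitrary example values.
left = %w[a b c d e f]
stop = -2
step = 2
length = left.length # => 6

normalized_start = 0                        # start is nil and step is positive
normalized_stop = [length + stop, -1].max   # a negative stop counts from the end => 4

(normalized_start...normalized_stop).step(step).map { |i| left[i] }
# => ["a", "c"]  (indexes 0 and 2)
```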
@@ -25,11 +25,12 @@ module Liquid2
  # Liquid template parser.
  class Parser
  # Parse Liquid template text into a syntax tree.
+ # @param env [Environment]
  # @param source [String]
  # @return [Array[Node | String]]
  def self.parse(env, source, scanner: nil)
  new(env,
- Liquid2::Scanner.tokenize(source, scanner || StringScanner.new("")),
+ Liquid2::Scanner.tokenize(env, source, scanner || StringScanner.new("")),
  source.length).parse
  end
 
@@ -824,15 +825,7 @@ module Liquid2
  return parse_partial_arrow_function(expr)
  end
 
- unless TERMINATE_GROUPED_EXPRESSION.member?(kind)
- unless BINARY_OPERATORS.member?(kind)
- raise LiquidSyntaxError.new("expected an infix operator, found #{kind}", current)
- end
-
- expr = parse_infix_expression(expr)
- end
-
- eat(:token_rparen)
+ eat(:token_rparen, "unbalanced parentheses")
  expr
  end
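The class-level entry point now threads the environment through to the scanner. A minimal sketch of calling it directly (the template text is an arbitrary example):

```ruby
require "liquid2"

# Parser.parse now passes the environment on to Scanner.tokenize.
env = Liquid2::Environment.new
nodes = Liquid2::Parser.parse(env, "Hello, {{ you | upcase }}!")
# `nodes` is the parsed syntax tree (an array of nodes and strings).
```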
@@ -12,14 +12,6 @@ module Liquid2
  class Scanner
  attr_reader :tokens
 
- RE_LINE_SPACE = /[ \t]+/
- RE_WORD = /[\u0080-\uFFFFa-zA-Z_][\u0080-\uFFFFa-zA-Z0-9_-]*/
- RE_INT = /-?\d+(?:[eE]\+?\d+)?/
- RE_FLOAT = /((?:-?\d+\.\d+(?:[eE][+-]?\d+)?)|(-?\d+[eE]-\d+))/
- RE_PUNCTUATION = %r{\?|\[|\]|\|{1,2}|\.{1,2}|,|:|\(|\)|[<>=!]+|[+\-%*/]+(?![\}%])}
- RE_SINGLE_QUOTE_STRING_SPECIAL = /[\\'\$]/
- RE_DOUBLE_QUOTE_STRING_SPECIAL = /[\\"\$]/
-
  # Keywords and symbols that get their own token kind.
  TOKEN_MAP = {
  "true" => :token_true,
@@ -68,15 +60,16 @@ module Liquid2
  "**" => :token_pow
  }.freeze
 
- def self.tokenize(source, scanner)
- lexer = new(source, scanner)
+ def self.tokenize(env, source, scanner)
+ lexer = new(env, source, scanner)
  lexer.run
  lexer.tokens
  end
 
+ # @param env [Environment]
  # @param source [String]
  # @param scanner [StringScanner]
- def initialize(source, scanner)
+ def initialize(env, source, scanner)
  @source = source
  @scanner = scanner
  @scanner.string = @source
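A minimal sketch of the updated `tokenize` signature, which now takes the environment first so the scanner can read its delimiter strings and compiled patterns. The template text and exact token positions are illustrative only.

```ruby
require "liquid2"
require "strscan"

env = Liquid2::Environment.new
tokens = Liquid2::Scanner.tokenize(env, "{{ a }}", StringScanner.new(""))
# Tokens are [kind, value, start index] arrays, something like:
#   [:token_output_start, nil, 0], [:token_word, "a", 3], [:token_output_end, nil, 5]
```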
@@ -84,8 +77,33 @@ module Liquid2
  # A pointer to the start of the current token.
  @start = 0
 
- # Tokens are arrays of (kind, value, start index)
+ # Tokens are arrays of (kind, value, start index).
+ # Sometimes we set value to `nil` when the symbol is unambiguous.
  @tokens = [] # : Array[[Symbol, String?, Integer]]
+
+ @s_out_start = env.markup_out_start
+ @s_out_end = env.markup_out_end
+ @s_tag_start = env.markup_tag_start
+ @s_tag_end = env.markup_tag_end
+ @s_comment_prefix = env.markup_comment_prefix
+ @s_comment_suffix = env.markup_comment_suffix
+
+ @re_tag_name = env.re_tag_name
+ @re_word = env.re_word
+ @re_int = env.re_int
+ @re_float = env.re_float
+ @re_double_quote_string_special = env.re_double_quote_string_special
+ @re_single_quote_string_special = env.re_single_quote_string_special
+ @re_markup_start = env.re_markup_start
+ @re_markup_end = env.re_markup_end
+ @re_markup_end_chars = env.re_markup_end_chars
+ @re_up_to_markup_start = env.re_up_to_markup_start
+ @re_punctuation = env.re_punctuation
+ @re_up_to_inline_comment_end = env.re_up_to_inline_comment_end
+ @re_up_to_raw_end = env.re_up_to_raw_end
+ @re_block_comment_chunk = env.re_block_comment_chunk
+ @re_up_to_doc_end = env.re_up_to_doc_end
+ @re_line_statement_comment = env.re_line_statement_comment
  end
 
  def run
@@ -108,14 +126,13 @@ module Liquid2
  end
 
  def skip_line_trivia
- @start = @scanner.pos if @scanner.skip(RE_LINE_SPACE)
+ @start = @scanner.pos if @scanner.skip(/[ \t]+/)
  end
 
  def accept_whitespace_control
  ch = @scanner.peek(1)
 
- case ch
- when "-", "+", "~"
+ if ch == "-" || ch == "+" || ch == "~" # rubocop: disable Style/MultipleComparison
  @scanner.pos += 1
  @tokens << [:token_whitespace_control, ch, @start]
  @start = @scanner.pos
@@ -126,22 +143,22 @@ module Liquid2
  end
 
  def lex_markup
- case @scanner.scan(/\{[\{%#]/)
- when "{#"
+ case @scanner.scan(@re_markup_start)
+ when @s_comment_prefix
  :lex_comment
- when "{{"
+ when @s_out_start
  @tokens << [:token_output_start, nil, @start]
  @start = @scanner.pos
  accept_whitespace_control
  skip_trivia
  :lex_expression
- when "{%"
+ when @s_tag_start
  @tokens << [:token_tag_start, nil, @start]
  @start = @scanner.pos
  accept_whitespace_control
  skip_trivia
 
- if (tag_name = @scanner.scan(/(?:[a-z][a-z_0-9]*|#)/))
+ if (tag_name = @scanner.scan(@re_tag_name))
  @tokens << [:token_tag_name, tag_name, @start]
  @start = @scanner.pos
 
@@ -173,8 +190,7 @@ module Liquid2
  :lex_expression
  end
  else
- if @scanner.skip_until(/\{[\{%#]/)
- @scanner.pos -= 2
+ if @scanner.skip_until(@re_up_to_markup_start)
  @tokens << [:token_other, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
  :lex_markup
@@ -192,26 +208,27 @@ module Liquid2
  def lex_expression
  loop do
  skip_trivia
- if (value = @scanner.scan(RE_FLOAT))
+ if (value = @scanner.scan(@re_float))
  @tokens << [:token_float, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_INT))
+ elsif (value = @scanner.scan(@re_int))
  @tokens << [:token_int, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_PUNCTUATION))
+ elsif (value = @scanner.scan(@re_punctuation))
  @tokens << [TOKEN_MAP[value] || :token_unknown, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_WORD))
+ elsif (value = @scanner.scan(@re_word))
  @tokens << [TOKEN_MAP[value] || :token_word, value, @start]
  @start = @scanner.pos
  else
  case @scanner.get_byte
  when "'"
  @start = @scanner.pos
- scan_string("'", :token_single_quote_string, RE_SINGLE_QUOTE_STRING_SPECIAL)
+ scan_string("'", :token_single_quote_string, @re_single_quote_string_special)
  when "\""
  @start = @scanner.pos
- scan_string("\"", :token_double_quote_string, RE_DOUBLE_QUOTE_STRING_SPECIAL)
+ scan_string("\"", :token_double_quote_string,
+ @re_double_quote_string_special)
  else
  @scanner.pos -= 1
  break
@@ -222,17 +239,17 @@ module Liquid2
  accept_whitespace_control
 
  # Miro benchmarks show no performance gain using scan_byte and peek_byte over scan here.
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  else
  # Unexpected token
  return nil if @scanner.eos?
 
- if (ch = @scanner.scan(/[\}%]/))
- raise LiquidSyntaxError.new("missing \"}\" or \"%\" detected",
+ if (ch = @scanner.scan(@re_markup_end_chars))
+ raise LiquidSyntaxError.new("missing markup delimiter detected",
  [:token_unknown, ch, @start])
  end
 
@@ -255,8 +272,7 @@ module Liquid2
 
  wc = accept_whitespace_control
 
- if @scanner.skip_until(/([+\-~]?)(\#{#{hash_count}}\})/)
- @scanner.pos -= @scanner[0]&.length || 0
+ if @scanner.skip_until(/(?=([+\-~]?)(\#{#{hash_count}}#{Regexp.escape(@s_comment_suffix)}))/)
  @tokens << [:token_comment, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
 
@@ -282,18 +298,17 @@ module Liquid2
  end
 
  def lex_inside_inline_comment
- if @scanner.skip_until(/([+\-~])?%\}/)
- @scanner.pos -= @scanner.captures&.first.nil? ? 2 : 3
+ if @scanner.skip_until(@re_up_to_inline_comment_end)
  @tokens << [:token_comment, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
  end
 
  accept_whitespace_control
 
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  else
  # Unexpected token
@@ -310,17 +325,16 @@ module Liquid2
  skip_trivia
  accept_whitespace_control
 
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
  @start = @scanner.pos
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  @start = @scanner.pos
  end
 
- if @scanner.skip_until(/(\{%[+\-~]?\s*endraw\s*[+\-~]?%\})/)
- @scanner.pos -= @scanner.captures&.first&.length || raise
+ if @scanner.skip_until(@re_up_to_raw_end)
  @tokens << [:token_raw, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
  end
@@ -332,11 +346,11 @@ module Liquid2
  skip_trivia
  accept_whitespace_control
 
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
  @start = @scanner.pos
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  @start = @scanner.pos
  end
@@ -345,9 +359,7 @@ module Liquid2
  raw_depth = 0
 
  loop do
- unless @scanner.skip_until(/(\{%[+\-~]?\s*(comment|raw|endcomment|endraw)\s*[+\-~]?%\})/)
- break
- end
+ break unless @scanner.skip_until(@re_block_comment_chunk)
 
  tag_name = @scanner.captures&.last || raise
 
@@ -380,17 +392,16 @@ module Liquid2
  skip_trivia
  accept_whitespace_control
 
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
  @start = @scanner.pos
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  @start = @scanner.pos
  end
 
- if @scanner.skip_until(/(\{%[+\-~]?\s*enddoc\s*[+\-~]?%\})/)
- @scanner.pos -= @scanner.captures&.first&.length || raise
+ if @scanner.skip_until(@re_up_to_doc_end)
  @tokens << [:token_doc, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
  end
@@ -401,21 +412,19 @@ module Liquid2
  def lex_line_statements
  skip_trivia # Leading newlines are OK
 
- if (tag_name = @scanner.scan(/(?:[a-z][a-z_0-9]*|#)/))
+ if (tag_name = @scanner.scan(@re_tag_name))
  @tokens << [:token_tag_start, nil, @start]
  @tokens << [:token_tag_name, tag_name, @start]
  @start = @scanner.pos
 
- if tag_name == "#" && @scanner.scan_until(/([\r\n]+|-?%\})/)
- @scanner.pos -= @scanner.captures&.first&.length || raise
+ if tag_name == "#" && @scanner.scan_until(@re_line_statement_comment)
  @tokens << [:token_comment, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
  @tokens << [:token_tag_end, nil, @start]
  :lex_line_statements
 
- elsif tag_name == "comment" && @scanner.scan_until(/(endcomment)/)
+ elsif tag_name == "comment" && @scanner.scan_until(/(?=endcomment)/)
  @tokens << [:token_tag_end, nil, @start]
- @scanner.pos -= @scanner.captures&.first&.length || raise
  @tokens << [:token_comment, @source.byteslice(@start...@scanner.pos), @start]
  @start = @scanner.pos
  :lex_line_statements
@@ -424,11 +433,11 @@ module Liquid2
  end
  else
  accept_whitespace_control
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
  @start = @scanner.pos
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  @start = @scanner.pos
  end
@@ -444,26 +453,26 @@ module Liquid2
  case @scanner.get_byte
  when "'"
  @start = @scanner.pos
- scan_string("'", :token_single_quote_string, RE_SINGLE_QUOTE_STRING_SPECIAL)
+ scan_string("'", :token_single_quote_string, @re_single_quote_string_special)
  when "\""
  @start = @scanner.pos
- scan_string("\"", :token_double_quote_string, RE_DOUBLE_QUOTE_STRING_SPECIAL)
+ scan_string("\"", :token_double_quote_string, @re_double_quote_string_special)
  when nil
  # End of scanner. Unclosed expression or string literal.
  break
 
  else
  @scanner.pos -= 1
- if (value = @scanner.scan(RE_FLOAT))
+ if (value = @scanner.scan(@re_float))
  @tokens << [:token_float, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_INT))
+ elsif (value = @scanner.scan(@re_int))
  @tokens << [:token_int, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_PUNCTUATION))
+ elsif (value = @scanner.scan(@re_punctuation))
  @tokens << [TOKEN_MAP[value] || raise, nil, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_WORD))
+ elsif (value = @scanner.scan(@re_word))
  @tokens << [TOKEN_MAP[value] || :token_word, value, @start]
  @start = @scanner.pos
  elsif @scanner.scan(/(\r?\n)+/)
@@ -475,11 +484,11 @@ module Liquid2
  # End of the line statement and enclosing `liquid` tag.
  @tokens << [:token_tag_end, nil, @start]
  accept_whitespace_control
- case @scanner.scan(/[\}%]\}/)
- when "}}"
+ case @scanner.scan(@re_markup_end)
+ when @s_out_end
  @tokens << [:token_output_end, nil, @start]
  @start = @scanner.pos
- when "%}"
+ when @s_tag_end
  @tokens << [:token_tag_end, nil, @start]
  @start = @scanner.pos
  end
@@ -536,10 +545,12 @@ module Liquid2
  case @scanner.get_byte
  when "'"
  @start = @scanner.pos
- scan_string("'", :token_single_quote_string, RE_SINGLE_QUOTE_STRING_SPECIAL)
+ scan_string("'", :token_single_quote_string,
+ @re_single_quote_string_special)
  when "\""
  @start = @scanner.pos
- scan_string("\"", :token_double_quote_string, RE_DOUBLE_QUOTE_STRING_SPECIAL)
+ scan_string("\"", :token_double_quote_string,
+ @re_double_quote_string_special)
  when "}"
  @tokens << [:token_string_interpol_end, nil, @start]
  @start = @scanner.pos
@@ -550,16 +561,16 @@ module Liquid2
  [symbol, nil, start_of_string])
  else
  @scanner.pos -= 1
- if (value = @scanner.scan(RE_FLOAT))
+ if (value = @scanner.scan(@re_float))
  @tokens << [:token_float, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_INT))
+ elsif (value = @scanner.scan(@re_int))
  @tokens << [:token_int, value, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_PUNCTUATION))
+ elsif (value = @scanner.scan(@re_punctuation))
  @tokens << [TOKEN_MAP[value] || raise, nil, @start]
  @start = @scanner.pos
- elsif (value = @scanner.scan(RE_WORD))
+ elsif (value = @scanner.scan(@re_word))
  @tokens << [TOKEN_MAP[value] || :token_word, value, @start]
  @start = @scanner.pos
  else
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module Liquid2
- VERSION = "0.3.0"
+ VERSION = "0.3.1"
  end
@@ -48,17 +48,11 @@ env = fixture.env
  source = fixture.templates["index.liquid"]
  template = env.get_template("index.liquid")
 
- # scanner = StringScanner.new("")
-
  Benchmark.ips do |x|
  # Configure the number of seconds used during
  # the warmup phase (default 2) and calculation phase (default 5)
  x.config(warmup: 2, time: 5)
 
- # x.report("tokenize (#{fixture.name}):") do
- # Liquid2::Scanner.tokenize(source, scanner)
- # end
-
  x.report("parse (#{fixture.name}):") do
  env.parse(source)
  end
data/sig/liquid2.rbs CHANGED
@@ -82,10 +82,76 @@ module Liquid2
 
  @globals: Hash[String, untyped]?
 
- @scanner: StringScanner
+ @scanner: singleton(Scanner)
+
+ @parser: singleton(Parser)
+
+ @string_scanner: StringScanner
 
  @arithmetic_operators: bool
 
+ # The string of characters that indicate the start of a Liquid output statement.
+ @markup_out_start: String
+
+ # The string of characters that indicate the end of a Liquid output statement.
+ @markup_out_end: String
+
+ # The string of characters that indicate the start of a Liquid tag.
+ @markup_tag_start: String
+
+ # The string of characters that indicate the end of a Liquid tag.
+ @markup_tag_end: String
+
+ # The string of characters that indicate the start of a Liquid comment. This should
+ # include a single trailing `#`. Additional, variable length hashes will be handled
+ # by the tokenizer. It is not possible to change comment syntax to not use `#`.
+ @markup_comment_prefix: String
+
+ # The string of characters that indicate the end of a Liquid comment, excluding any
+ # hashes.
+ @markup_comment_suffix: String
+
+ # A regex pattern matching Liquid tag names. Should include `#` for inline comments.
+ @re_tag_name: Regexp
+
+ @re_word: Regexp
+
+ @re_int: Regexp
+
+ @re_float: Regexp
+
+ @re_double_quote_string_special: Regexp
+
+ @re_single_quote_string_special: Regexp
+
+ # A regex pattern matching the start of some Liquid markup. Could be the start of an
+ # output statement, tag or comment. Traditionally `{{`, `{%` and `{#`, respectively.
+ @re_markup_start: Regexp
+
+ # A regex pattern matching the end of some Liquid markup. Could be the end of
+ # an output statement or tag. Traditionally `}}`, `%}`, respectively.
+ # respectively.
+ @re_markup_end: Regexp
+
+ # A regex pattern matching any one of the possible characters ending some Liquid
+ # markup. This is used to detect incomplete and malformed markup and provide
+ # helpful error messages.
+ @re_markup_end_chars: Regexp
+
+ @re_up_to_markup_start: Regexp
+
+ @re_punctuation: Regexp
+
+ @re_up_to_inline_comment_end: Regexp
+
+ @re_up_to_raw_end: Regexp
+
+ @re_block_comment_chunk: Regexp
+
+ @re_up_to_doc_end: Regexp
+
+ @re_line_statement_comment: Regexp
+
  attr_reader tags: Hash[String, _Tag]
 
  attr_reader local_namespace_limit: Integer?
@@ -106,7 +172,51 @@ module Liquid2
 
  attr_reader arithmetic_operators: bool
 
- def initialize: (?context_depth_limit: ::Integer, ?globals: Hash[String, untyped]?, ?loader: TemplateLoader?, ?local_namespace_limit: Integer?, ?loop_iteration_limit: Integer?, ?output_stream_limit: Integer?, ?shorthand_indexes: bool, ?suppress_blank_control_flow_blocks: bool, ?undefined: singleton(Undefined), ?falsy_undefined: bool) -> void
+ attr_reader markup_comment_prefix: String
+
+ attr_reader markup_comment_suffix: String
+
+ attr_reader markup_out_end: String
+
+ attr_reader markup_out_start: String
+
+ attr_reader markup_tag_end: String
+
+ attr_reader markup_tag_start: String
+
+ attr_reader re_tag_name: Regexp
+
+ attr_reader re_word: Regexp
+
+ attr_reader re_int: Regexp
+
+ attr_reader re_float: Regexp
+
+ attr_reader re_double_quote_string_special: Regexp
+
+ attr_reader re_single_quote_string_special: Regexp
+
+ attr_reader re_markup_start: Regexp
+
+ attr_reader re_markup_end: Regexp
+
+ attr_reader re_markup_end_chars: Regexp
+
+ attr_reader re_up_to_markup_start: Regexp
+
+ attr_reader re_punctuation: Regexp
+
+ attr_reader re_up_to_inline_comment_end: Regexp
+
+ attr_reader re_up_to_raw_end: Regexp
+
+ attr_reader re_block_comment_chunk: Regexp
+
+ attr_reader re_up_to_doc_end: Regexp
+
+ attr_reader re_line_statement_comment: Regexp
+
+ def initialize: (?arithmetic_operators: bool, ?context_depth_limit: ::Integer, ?falsy_undefined: bool, ?globals: untyped?, ?loader: TemplateLoader?, ?local_namespace_limit: Integer?, ?loop_iteration_limit: Integer?, ?markup_comment_prefix: ::String, ?markup_comment_suffix: ::String, ?markup_out_end: ::String, ?markup_out_start: ::String, ?markup_tag_end: ::String, ?markup_tag_start: ::String, ?output_stream_limit: Integer?, ?parser: singleton(Parser), ?scanner: singleton(Scanner), ?shorthand_indexes: bool, ?suppress_blank_control_flow_blocks: bool, ?undefined: singleton(Undefined)) -> void
 
  # @param source [String] template source text.
  # @return [Template]
@@ -136,6 +246,8 @@ module Liquid2
  def delete_tag: (String name) -> (_Tag | nil)
 
  def setup_tags_and_filters: () -> void
+
+ def setup_scanner: () -> void
 
  def undefined: (String name, ?node: _HasToken?) -> Undefined
 
@@ -171,37 +283,59 @@ module Liquid2
  # A pointer to the start of the current token.
  @start: Integer
 
- # Tokens are arrays of (kind, value, start index)
- @tokens: Array[[Symbol, String?, Integer]]
+ @s_out_start: String
 
- attr_reader tokens: Array[[Symbol, String?, Integer]]
+ @s_out_end: String
+
+ @s_tag_start: String
+
+ @s_tag_end: String
+
+ @s_comment_prefix: String
+
+ @s_comment_suffix: String
+
+ @re_tag_name: Regexp
+
+ @re_word: Regexp
 
- RE_MARKUP_START: ::Regexp
+ @re_int: Regexp
 
- RE_WHITESPACE: ::Regexp
+ @re_float: Regexp
 
- RE_LINE_SPACE: ::Regexp
+ @re_double_quote_string_special: Regexp
 
- RE_WORD: ::Regexp
+ @re_single_quote_string_special: Regexp
 
- RE_INT: ::Regexp
+ @re_markup_start: Regexp
 
- RE_FLOAT: ::Regexp
+ @re_markup_end: Regexp
 
- RE_PUNCTUATION: ::Regexp
+ @re_markup_end_chars: Regexp
 
- RE_SINGLE_QUOTE_STRING_SPECIAL: ::Regexp
+ @re_up_to_markup_start: Regexp
 
- RE_DOUBLE_QUOTE_STRING_SPECIAL: ::Regexp
+ @re_punctuation: Regexp
+
+ @re_up_to_inline_comment_end: Regexp
+ @re_up_to_raw_end: Regexp
+ @re_block_comment_chunk: Regexp
+ @re_up_to_doc_end: Regexp
+ @re_line_statement_comment: Regexp
+
+ # Tokens are arrays of (kind, value, start index)
+ @tokens: Array[[Symbol, String?, Integer]]
+
+ attr_reader tokens: Array[[Symbol, String?, Integer]]
 
  # Keywords and symbols that get their own token kind.
  TOKEN_MAP: Hash[String, Symbol]
 
- def self.tokenize: (String source, StringScanner scanner) -> Array[[Symbol, String?, Integer]]
+ def self.tokenize: (Environment env, String source, StringScanner scanner) -> Array[[Symbol, String?, Integer]]
 
  # @param source [String]
  # @param scanner [StringScanner]
- def initialize: (String source, StringScanner scanner) -> void
+ def initialize: (Environment env, String source, StringScanner scanner) -> void
 
  def run: () -> void
 
@@ -1760,6 +1894,8 @@ module Liquid2
 
  # Return the subsequence of _left_ starting at _start_ up to _length_.
  def self.slice: (untyped left, untyped start, ?untyped length) -> untyped
+
+ def self.better_slice: (untyped left, ?untyped start_, ?untyped stop_, ?untyped step_, ?start: untyped, ?stop: untyped, ?step: untyped) -> untyped
 
  # Return _left_ with all characters converted to uppercase.
  # Coerce _left_ to a string if it is not one already.
data.tar.gz.sig CHANGED
Binary file
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: liquid2
  version: !ruby/object:Gem::Version
- version: 0.3.0
+ version: 0.3.1
  platform: ruby
  authors:
  - James Prior
metadata.gz.sig CHANGED
Binary file