RubyGems - parsanol - Versions diffs - 1.2.2-aarch64-linux → 1.3.2-aarch64-linux - Mend

parsanol 1.2.2-aarch64-linux → 1.3.2-aarch64-linux

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/HISTORY.txt +33 -3
data/README.adoc +103 -9
data/lib/parsanol/3.2/parsanol_native.so +0 -0
data/lib/parsanol/3.3/parsanol_native.so +0 -0
data/lib/parsanol/3.4/parsanol_native.so +0 -0
data/lib/parsanol/4.0/parsanol_native.so +0 -0
data/lib/parsanol/native/batch_decoder.rb +252 -0
data/lib/parsanol/native/parser.rb +28 -574
data/lib/parsanol/native/transformer.rb +125 -58
data/lib/parsanol/native.rb +107 -183
data/lib/parsanol/parser.rb +2 -6
data/lib/parsanol/slice.rb +51 -105
data/lib/parsanol/version.rb +1 -1
metadata +3 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c8db2fae51762a0e0b70dc2ab9bed59a653ad3cb6f244ba1e38473e5c7c3de4a
-  data.tar.gz: 2a7217e553f6f857347459477d1df17356f20891ec21f85f95e2512e5bd9ba72
+  metadata.gz: a4e1c8284b1cdc7d254fbd5e7af15417dc56e8a631ce8a1891591bc40ef3d420
+  data.tar.gz: 0ae7e373ebda15f3cab39f1da332a53f1c9abe67e10260847e718817960a671a
 SHA512:
-  metadata.gz: 60e95d9a28a1acd0c3fa636a47886c4fe4a358edc69042d4dfa925d6bd9b8bc71b8d176b44130f8a28947305d531f5a30be87fe6cde5591253be6e97998caf6c
-  data.tar.gz: 1d04982bc519a1845c6a0d49a148b37f835dc839c9013927ac8ab381453c3baa4ae7b241012a2721c5a976d176f12c9501378192b92fdf26d862e57721e9ff7f
+  metadata.gz: f26c1803a829cfb6b9753a7e4ed27fdafbef8df350b043e11c866384d08b01949a8dfa06c287d94b03beae8f94cac76de23f2c4b20e4eda1830b0a98cbdad843
+  data.tar.gz: 137b12b8eb6f2ef8c5ddc811a160a4f6f91bf00a6a34c1a5f9a4bf6978c928744729acc8779a543828e4245f88da6922f5298ae92e447185f819a847c1409523
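
The digests above can be checked locally. The following is a minimal sketch, assuming the `.gem` archive has already been unpacked so that `data.tar.gz` and `metadata.gz` exist as files on disk. The helper name `sha256_matches?` is hypothetical and not part of RubyGems or parsanol; the demonstration uses a temporary file rather than a real archive.

```ruby
require 'digest'
require 'tempfile'

# Hypothetical helper: true when the file's SHA-256 hex digest equals the
# expected value copied from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Demonstration with a temporary file standing in for data.tar.gz.
Tempfile.create('data.tar.gz') do |f|
  f.write('example payload')
  f.flush
  expected = Digest::SHA256.hexdigest('example payload')
  puts sha256_matches?(f.path, expected)  # digest of the same bytes
  puts sha256_matches?(f.path, '0' * 64)  # deliberately wrong digest
end
```

For a real check, pass the path of the unpacked `data.tar.gz` and the `SHA256` value listed for it in the new `checksums.yaml`.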

data/HISTORY.txt CHANGED

@@ -1,3 +1,33 @@
+== Parsanol 1.3.0 (unreleased)
+Breaking changes:
+* Simplified API: Single `parse()` method with lazy line/column
+* Removed deprecated methods: `parse_parslet`, `parse_parslet_with_positions`,
+  `parse_with_transform`, `parse_to_objects`, `parse_raw`
+New features:
+* Lazy line/column computation in `Slice#line_and_column` - computed only when accessed
+* `BatchDecoder` module for efficient batch AST processing
+* Grammar accepts Ruby atoms directly - no JSON serialization step needed
+Bug fixes:
+* Fixed grammar cache cycle detection to prevent stack overflow on recursive grammars
+* Fixed wrapper vs repetition pattern detection in `AstTransformer`:
+  - Same inner keys → Repetition pattern (keep as array)
+  - Different inner keys → Wrapper pattern (merge inner hashes)
+* Correct entity extraction for grammars with multiple declarations
+== Parsanol 1.2.2 (2026-03-07)
+Syncs with parsanol-rs 0.3.0:
+* Full FFI support for capture, scope, and dynamic atoms
+* Native extension performance for all new features
+* BuilderCallbacks module for streaming parsing
 == Parsanol 1.2.1 (2026-03-07)
 Bug fix:
@@ -34,12 +64,12 @@ Runtime-determined parsing via FFI callbacks:
 === Native Extension Updates
-* Updated to parsanol-rs 0.2.0
+* Updated to parsanol-rs 0.3.0
 * New backend abstraction (Packrat, Bytecode, Auto)
 * Streaming parser with capture extraction
 * Performance improvements
-== Parsanol 1.1.0 (2025-03-15)
+== Parsanol 1.1.0 (2026-03-05)
 Position information is now returned by default:
@@ -54,7 +84,7 @@ Performance improvements:
 * ZeroCopy API for direct FFI object construction
 * Parallel batch parsing with `Parsanol::Parallel`
-== Parsanol 1.0.0 (2025-03-02)
+== Parsanol 1.0.0 (2026-03-03)
 Initial release of Parsanol, a high-performance PEG parser library for Ruby.
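
The wrapper-versus-repetition rule listed under the 1.3.0 bug fixes can be illustrated with a small sketch. This is not the gem's `AstTransformer` code: the helper `merge_or_keep` is hypothetical, and it assumes that "same inner keys" means every hash in the array has an identical key list and that "different inner keys" means no key appears in more than one hash. Anything in between is left untouched.

```ruby
# Hypothetical illustration of the rule from HISTORY.txt:
#   same inner keys      -> repetition pattern, keep the array
#   different inner keys -> wrapper pattern, merge the inner hashes
def merge_or_keep(items)
  return items unless items.is_a?(Array) && items.length > 1
  return items unless items.all? { |item| item.is_a?(Hash) }

  key_lists = items.map(&:keys)
  all_keys = key_lists.flatten

  if key_lists.uniq.length == 1
    items                                             # repetition: keep as array
  elsif all_keys.uniq.length == all_keys.length
    items.reduce({}) { |acc, hash| acc.merge(hash) }  # wrapper: merge hashes
  else
    items                                             # ambiguous overlap: leave as-is
  end
end

p merge_or_keep([{ digit: '1' }, { digit: '2' }])
p merge_or_keep([{ name: 'x' }, { value: '1' }])
```

The first call keeps the two-element array because both hashes share the key `:digit`; the second merges the hashes into one because their keys are disjoint.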

data/README.adoc CHANGED

@@ -21,7 +21,7 @@ While maintaining full API compatibility with Parslet, Parsanol features a compl
 * <<basic-parsing,PEG-based Parser Construction>> - Declarative grammar definition
 * <<error-reporting,Detailed Error Reporting>> - Precise failure location and context
-* <<native-extension,Rust Native Extension>> - Up to 29x faster parsing
+* <<native-extension,Rust Native Extension>> - **Up to 1300x faster parsing**
 * <<slice-support,Slice Support>> - Source position preservation for linters and IDEs
 * <<transformation,Tree Transformation>> - Pattern-based AST construction
 * <<streaming-builder,Streaming Builder API>> - Single-pass parsing with callbacks
@@ -29,6 +29,7 @@ While maintaining full API compatibility with Parslet, Parsanol features a compl
 * <<infix-expressions,Infix Expression Parsing>> - Built-in operator precedence support
 * <<security-features,Security Features>> - Input size and recursion limits
 * <<debug-tools,Debug Tools>> - Tracing and grammar visualization
+* <<benchmarking,Benchmarking>> - Built-in performance testing tools
 == Installation
@@ -53,6 +54,23 @@ Or install it yourself as:
 gem install parsanol
 ----
+== Examples
+The repository includes several example files demonstrating different use cases:
+* link:examples/benchmark_examples.rb[] - Basic performance benchmarks
+* link:examples/benchmark_full.rb[] - Comprehensive benchmark suite
+* link:examples/parsing_modes.rb[] - Demonstrates all parsing modes
+* link:examples/parslet_migration.rb[] - Migration guide from Parslet
+[source,ruby]
+----
+# Run examples
+bundle exec ruby examples/benchmark_examples.rb
+bundle exec ruby examples/parsing_modes.rb
+bundle exec ruby examples/parslet_migration.rb
+----
 == Usage
 [[basic-parsing]]
@@ -103,6 +121,23 @@ ast = MyTransform.new.apply(parse_tree)
 [[native-extension]]
 === Native Extension
+The Rust native extension provides significant performance improvements over pure Ruby parsing:
+[cols="2,1,1,1"]
+|===
+| Pattern Type | Ruby (i/s) | Native (i/s) | Speedup
+| Simple string match | ~575 | ~775,000 | **1,340x**
+| Sequence (3 parts) | ~580 | ~530,000 | **910x**
+| Named capture | ~575 | ~510,000 | **880x**
+| Repetition (10x digits) | ~560 | ~740,000 | **1,310x**
+| Alternative (3 options) | ~575 | ~720,000 | **1,250x**
+| Calculator expression | ~570 | ~180,000 | **315x**
+| Repetition with name | ~575 | ~125,000 | **220x**
+| Repetition (20x) | ~560 | ~720,000 | **1,280x**
+|===
 For maximum performance, compile the Rust native extension:
 [source,shell]
@@ -114,6 +149,13 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 bundle exec rake compile
 ----
+Run your own benchmarks with:
+[source,ruby]
+----
+bundle exec ruby examples/benchmark_examples.rb
+----
 [[slice-support]]
 === Slice Support
@@ -247,9 +289,9 @@ parser.parse('123')  # Works exactly the same
 | `sequence(:x)` | ✅ | Match array of values
 | `subtree(:x)` | ✅ | Match any subtree
 | `Parslet::Slice` | ✅ | Parsanol::Slice compatible
-| `.capture(:name)` | ✅ | Named capture extraction (NEW in 1.1.0)
-| `scope { }` | ✅ | Isolated capture context (NEW in 1.1.0)
-| `dynamic { \|ctx\| }` | ✅ | Runtime-determined parsing (NEW in 1.1.0)
+| `.capture(:name)` | ✅ | Named capture extraction (NEW in 1.2.0)
+| `scope { }` | ✅ | Isolated capture context (NEW in 1.2.0)
+| `dynamic { \|ctx\| }` | ✅ | Runtime-determined parsing (NEW in 1.2.0)
 |===
 NOTE: The new capture, scope, and dynamic atoms provide powerful extraction and context-sensitive parsing capabilities. See the <<captures,Captures>> section for details.
@@ -475,7 +517,7 @@ For more details on backend selection and grammar analysis, see the https://pars
 [[captures]]
 == Captures, Scopes, and Dynamic Atoms
-Parsanol 1.1.0 introduces powerful new features for extracting and managing parsed data.
+Parsanol 1.2.0 introduces powerful new features for extracting and managing parsed data.
 [[capture-atoms]]
 === Capture Atoms
@@ -789,12 +831,64 @@ ruby -I lib -e "require 'parsanol'; puts Parsanol::Native.available?"
 [source,shell]
 ----
-# Quick benchmarks
-bundle exec rake benchmark
+# Quick benchmarks (examples provided in the repository)
+bundle exec ruby examples/benchmark_examples.rb
+# Full benchmark suite
+bundle exec ruby examples/benchmark_full.rb
+# Compare parsing modes
+bundle exec ruby examples/parsing_modes.rb
+# Parslet migration examples
+bundle exec ruby examples/parslet_migration.rb
+----
+[[benchmarking]]
+== Benchmarking Your Own Code
-# Comprehensive benchmark suite
-bundle exec rake benchmark:all
+Parsanol includes built-in benchmarking tools to measure performance:
+[source,ruby]
 ----
+require 'parsanol'
+require 'benchmark/ips'
+# Your grammar
+grammar = Parsanol.str("hello").as(:greeting)
+json = Parsanol::Native.serialize_grammar(grammar)
+# Benchmark
+Benchmark.ips do |x|
+  x.config(warmup: 5, time: 3)
+  x.report("Ruby mode") { grammar.parse("hello", mode: :ruby) }
+  x.report("Native mode") { Parsanol::Native.parse(json, "hello") }
+  x.compare!
+end
+----
+=== Performance Factors
+The actual speedup depends on several factors:
+1. **Grammar complexity** - More complex grammars have more transformation overhead
+2. **Named captures** - More named captures mean more hash construction
+3. **Repetition patterns** - Named repetitions have more overhead than unnamed
+4. **Input size** - Larger inputs benefit more from native parsing
+[cols="1,2,2"]
+|===
+| Pattern | Speedup | Reason for Difference
+| Simple strings | ~1,300x | Minimal transformation
+| Sequences | ~900x | String joining overhead
+| Alternatives | ~1,250x | Fast path in Rust
+| Repetitions (unnamed) | ~1,300x | Minimal transformation
+| Repetitions (named) | ~220x | Array of hashes construction
+| Complex expressions | ~315x | More grammar elements
+|===
 == License

data/lib/parsanol/3.2/parsanol_native.so CHANGED

Binary file

data/lib/parsanol/3.3/parsanol_native.so CHANGED

Binary file

data/lib/parsanol/3.4/parsanol_native.so CHANGED

Binary file

data/lib/parsanol/4.0/parsanol_native.so CHANGED

Binary file

data/lib/parsanol/native/batch_decoder.rb ADDED

@@ -0,0 +1,252 @@
+# frozen_string_literal: true
+require 'parsanol/native/transformer'
+module Parsanol
+  module Native
+    # Decodes flat u64 arrays from Rust batch parser into Ruby AST
+    #
+    # The batch format uses tagged u64 values:
+    # - 0x00 = nil
+    # - 0x01 + value = bool (0 or 1)
+    # - 0x02 + value = int
+    # - 0x03 + bits = float (IEEE 754 bits)
+    # - 0x04 + offset + length = input string reference
+    # - 0x05 ... 0x06 = array (start ... end)
+    # - 0x07 ... 0x08 = hash (start ... end)
+    # - 0x09 + len + data... = hash key
+    # - 0x0A + len + data... = inline string
+    module BatchDecoder
+      TAG_NIL = 0x00
+      TAG_BOOL = 0x01
+      TAG_INT = 0x02
+      TAG_FLOAT = 0x03
+      TAG_STRING = 0x04
+      TAG_ARRAY_START = 0x05
+      TAG_ARRAY_END = 0x06
+      TAG_HASH_START = 0x07
+      TAG_HASH_END = 0x08
+      TAG_HASH_KEY = 0x09
+      TAG_INLINE_STRING = 0x0A
+      TAG_SYMBOL = 0x0B
+      TAG_REPETITION = 0x0C
+      TAG_SEQUENCE = 0x0D
+      class << self
+        # Decode a flat u64 array into Ruby AST with Slice objects
+        #
+        # @param data [Array<Integer>] Flat u64 array from batch parser
+        # @param input [String] Original input string (for Slice references)
+        # @param slice_class [Class] The Slice class to use
+        # @return [Object] Ruby AST (Hash, Array, Slice, etc.)
+        def decode(data, input, slice_class)
+          @input = input
+          @input_bytes = input.b
+          @slice_class = slice_class
+          @pos = 0
+          @data = data
+          decode_value
+        end
+        # Decode batch format to Ruby AST and apply transformation.
+        #
+        # The Rust parser produces raw AST that needs transformation to match
+        # Ruby parser behavior (merging duplicate keys, etc.)
+        #
+        # @param data [Array<Integer>|Object] Either flat u64 array from batch parser OR
+        #   pre-decoded Ruby value from _parse_raw
+        # @param input [String] Original input string (for Slice references)
+        # @param slice_class [Class] The Slice class to use
+        # @param grammar_atom [Parsanol::Atoms::Base] The grammar atom (unused, kept for API compat)
+        # @return [Object] Transformed Ruby AST
+        def decode_and_flatten(data, input, slice_class, grammar_atom)
+          # Check if data is batch data (flat u64 array) or already a Ruby value
+          if data.is_a?(Integer) || (data.is_a?(Array) && data.first.is_a?(Integer))
+            # Batch data (flat u64 array) - decode first, then transform
+            raw_ast = decode(data, input, slice_class)
+            AstTransformer.transform(raw_ast)
+          else
+            # Already decoded Ruby value from _parse_raw - apply transformer directly
+            AstTransformer.transform(data)
+          end
+        end
+        # Join consecutive Slice objects in arrays into single Slices
+        # This matches what transform_ast does in Rust (join_slices_from_array)
+        #
+        # @param value [Object] AST value
+        # @param slice_class [Class] The Slice class to check for
+        # @param input [String] Original input string
+        # @return [Object] AST with joined slices
+        def join_consecutive_slices(value, slice_class, input)
+          input_bytes = input.b
+          case value
+          when Array
+            # Recursively process array elements
+            processed = value.map { |v| join_consecutive_slices(v, slice_class, input) }
+            # Check if all non-nil elements are Slices
+            non_nil = processed.compact
+            if non_nil.all? { |v| v.is_a?(slice_class) }
+              # Check if slices are consecutive
+              if slices_consecutive?(non_nil)
+                # Join into single slice
+                join_slices(non_nil, slice_class, input_bytes, input)
+              else
+                processed
+              end
+            else
+              processed
+            end
+          when Hash
+            # Process hash values recursively
+            result = {}
+            value.each do |k, v|
+              result[k] = join_consecutive_slices(v, slice_class, input)
+            end
+            result
+          else
+            value
+          end
+        end
+        private
+        def slices_consecutive?(slices)
+          return true if slices.empty?
+          slices.each_cons(2).all? do |a, b|
+            a.offset + a.content.bytesize == b.offset
+          end
+        end
+        def join_slices(slices, slice_class, input_bytes, input)
+          return nil if slices.empty?
+          return slices.first if slices.length == 1
+          first = slices.first
+          last = slices.last
+          total_length = last.offset + last.content.bytesize - first.offset
+          content = input_bytes[first.offset, total_length]
+          content = content.force_encoding('UTF-8') if content
+          slice_class.new(first.offset, content, input)
+        end
+        def decode_value
+          tag = @data[@pos]
+          @pos += 1
+          case tag
+          when TAG_NIL
+            nil
+          when TAG_BOOL
+            val = @data[@pos]
+            @pos += 1
+            val != 0
+          when TAG_INT
+            val = @data[@pos]
+            @pos += 1
+            # Handle negative numbers (signed i64 stored as u64)
+            if val >= 0x8000_0000_0000_0000
+              val = val - 0x1_0000_0000_0000_0000
+            end
+            val
+          when TAG_FLOAT
+            bits = @data[@pos]
+            @pos += 1
+            # Convert IEEE 754 bits to float
+            [bits].pack('Q').unpack1('D')
+          when TAG_STRING
+            offset = @data[@pos]
+            length = @data[@pos + 1]
+            @pos += 2
+            create_slice(offset, length)
+          when TAG_SYMBOL
+            # Symbol is encoded like inline string: len, then u64 chunks
+            len = @data[@pos]
+            @pos += 1
+            str = decode_inline_string_bytes(len)
+            str.to_sym
+          when TAG_REPETITION
+            inner = decode_value
+            [:repetition, inner].compact
+          when TAG_SEQUENCE
+            inner = decode_value
+            [:sequence, inner].compact
+          when TAG_ARRAY_START
+            decode_array
+          when TAG_HASH_START
+            decode_hash
+          else
+            raise "Unknown tag: #{tag} at position #{@pos - 1}"
+          end
+        end
+        def decode_array
+          result = []
+          loop do
+            tag = @data[@pos]
+            break if tag == TAG_ARRAY_END
+            result << decode_value
+          end
+          @pos += 1 # consume TAG_ARRAY_END
+          result
+        end
+        def decode_hash
+          result = {}
+          loop do
+            tag = @data[@pos]
+            break if tag == TAG_HASH_END
+            # Read key
+            raise "Expected TAG_HASH_KEY, got #{tag}" unless tag == TAG_HASH_KEY
+            @pos += 1
+            key = decode_inline_string
+            # Read value
+            value = decode_value
+            # Keep original key format (camelCase) for Ruby parser compatibility
+            result[key.to_sym] = value
+          end
+          @pos += 1 # consume TAG_HASH_END
+          result
+        end
+        def decode_inline_string
+          len = @data[@pos]
+          @pos += 1
+          decode_inline_string_bytes(len)
+        end
+        # Decode inline string bytes given the length
+        # @param len [Integer] Length of the string in bytes
+        # @return [String] Decoded string
+        def decode_inline_string_bytes(len)
+          # Read u64 chunks
+          chunks = (len + 7) / 8
+          bytes = String.new(encoding: 'ASCII-8BIT', capacity: len)
+          chunks.times do
+            chunk = @data[@pos]
+            @pos += 1
+            8.times do |byte_idx|
+              break if bytes.bytesize >= len
+              bytes << ((chunk >> (byte_idx * 8)) & 0xFF)
+            end
+          end
+          bytes.force_encoding('UTF-8')
+        end
+        def create_slice(offset, length)
+          content = @input_bytes[offset, length]
+          content = content.force_encoding('UTF-8') if content
+          @slice_class.new(offset, content, @input)
+        end
+      end
+    end
+  end
+end
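
The word-level conversions used by `decode_value` and `decode_inline_string_bytes` above can be tried in isolation. The sketch below is standalone and does not load parsanol. `u64_to_i64`, `u64_to_f64`, and `unpack_inline` mirror the arithmetic shown in the diff, while `pack_inline` is a hypothetical stand-in for the Rust-side encoder, assumed to store bytes least significant first because that is the order the decoder reads them in.

```ruby
# Reinterpret an unsigned 64-bit word as a signed i64 (two's complement).
def u64_to_i64(val)
  val >= 0x8000_0000_0000_0000 ? val - 0x1_0000_0000_0000_0000 : val
end

# Reinterpret an unsigned 64-bit word as the bits of an IEEE 754 double.
def u64_to_f64(bits)
  [bits].pack('Q').unpack1('D')
end

# Hypothetical encoder: pack a string's bytes into u64 words,
# least significant byte first (assumed to match the Rust side).
def pack_inline(str)
  str.bytes.each_slice(8).map do |chunk|
    chunk.each_with_index.reduce(0) { |acc, (byte, i)| acc | (byte << (i * 8)) }
  end
end

# Decode `len` bytes from u64 words, reading each word low byte first.
def unpack_inline(words, len)
  bytes = String.new(encoding: 'ASCII-8BIT', capacity: len)
  words.each do |word|
    8.times do |i|
      break if bytes.bytesize >= len
      bytes << ((word >> (i * 8)) & 0xFF)
    end
  end
  bytes.force_encoding('UTF-8')
end

p u64_to_i64(0xFFFF_FFFF_FFFF_FFFF)  # => -1
p u64_to_f64(0x3FF8_0000_0000_0000)  # => 1.5
words = pack_inline('hello')
p words == [0x6f6c6c6568]            # => true
p unpack_inline(words, 5)            # => "hello"
```

The length is passed separately because the final word is zero-padded. Without it the decoder could not distinguish padding from trailing NUL bytes in the string.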