parsanol 1.2.2-aarch64-linux → 1.3.2-aarch64-linux

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: c8db2fae51762a0e0b70dc2ab9bed59a653ad3cb6f244ba1e38473e5c7c3de4a
-   data.tar.gz: 2a7217e553f6f857347459477d1df17356f20891ec21f85f95e2512e5bd9ba72
+   metadata.gz: a4e1c8284b1cdc7d254fbd5e7af15417dc56e8a631ce8a1891591bc40ef3d420
+   data.tar.gz: 0ae7e373ebda15f3cab39f1da332a53f1c9abe67e10260847e718817960a671a
  SHA512:
-   metadata.gz: 60e95d9a28a1acd0c3fa636a47886c4fe4a358edc69042d4dfa925d6bd9b8bc71b8d176b44130f8a28947305d531f5a30be87fe6cde5591253be6e97998caf6c
-   data.tar.gz: 1d04982bc519a1845c6a0d49a148b37f835dc839c9013927ac8ab381453c3baa4ae7b241012a2721c5a976d176f12c9501378192b92fdf26d862e57721e9ff7f
+   metadata.gz: f26c1803a829cfb6b9753a7e4ed27fdafbef8df350b043e11c866384d08b01949a8dfa06c287d94b03beae8f94cac76de23f2c4b20e4eda1830b0a98cbdad843
+   data.tar.gz: 137b12b8eb6f2ef8c5ddc811a160a4f6f91bf00a6a34c1a5f9a4bf6978c928744729acc8779a543828e4245f88da6922f5298ae92e447185f819a847c1409523
data/HISTORY.txt CHANGED
@@ -1,3 +1,33 @@
+ == Parsanol 1.3.0 (unreleased)
+
+ Breaking changes:
+
+ * Simplified API: Single `parse()` method with lazy line/column
+ * Removed deprecated methods: `parse_parslet`, `parse_parslet_with_positions`,
+   `parse_with_transform`, `parse_to_objects`, `parse_raw`
+
+ New features:
+
+ * Lazy line/column computation in `Slice#line_and_column` - computed only when accessed
+ * `BatchDecoder` module for efficient batch AST processing
+ * Grammar accepts Ruby atoms directly - no JSON serialization step needed
+
+ Bug fixes:
+
+ * Fixed grammar cache cycle detection to prevent stack overflow on recursive grammars
+ * Fixed wrapper vs repetition pattern detection in `AstTransformer`:
+   - Same inner keys → Repetition pattern (keep as array)
+   - Different inner keys → Wrapper pattern (merge inner hashes)
+ * Correct entity extraction for grammars with multiple declarations
+
+ == Parsanol 1.2.2 (2026-03-07)
+
+ Syncs with parsanol-rs 0.3.0:
+
+ * Full FFI support for capture, scope, and dynamic atoms
+ * Native extension performance for all new features
+ * BuilderCallbacks module for streaming parsing
+
  == Parsanol 1.2.1 (2026-03-07)
 
  Bug fix:
@@ -34,12 +64,12 @@ Runtime-determined parsing via FFI callbacks:
 
  === Native Extension Updates
 
- * Updated to parsanol-rs 0.2.0
+ * Updated to parsanol-rs 0.3.0
  * New backend abstraction (Packrat, Bytecode, Auto)
  * Streaming parser with capture extraction
  * Performance improvements
 
- == Parsanol 1.1.0 (2025-03-15)
+ == Parsanol 1.1.0 (2026-03-05)
 
  Position information is now returned by default:
 
@@ -54,7 +84,7 @@ Performance improvements:
  * ZeroCopy API for direct FFI object construction
  * Parallel batch parsing with `Parsanol::Parallel`
 
- == Parsanol 1.0.0 (2025-03-02)
+ == Parsanol 1.0.0 (2026-03-03)
 
  Initial release of Parsanol, a high-performance PEG parser library for Ruby.
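
The 1.3.0 bug-fix entry for `AstTransformer` in the HISTORY.txt hunk above separates two shapes for an array of hashes: identical inner keys mean a repetition (keep the array), differing inner keys mean a wrapper (merge into one hash). The rule can be sketched as follows. This is only an illustration of the changelog wording, not the gem's implementation: `flatten_hash_array` is a hypothetical helper, and treating partially overlapping key sets as "different" is an assumption.

```ruby
# Hypothetical helper illustrating the changelog rule; this is NOT
# Parsanol's AstTransformer code.
def flatten_hash_array(items)
  # Only non-empty arrays made up entirely of hashes are candidates.
  return items unless items.is_a?(Array) && !items.empty? && items.all?(Hash)

  # Collect the distinct key sets, ignoring key order.
  key_sets = items.map { |h| h.keys.sort_by(&:to_s) }.uniq

  if key_sets.length == 1
    # Same inner keys: repetition pattern, keep as array.
    items
  else
    # Different inner keys: wrapper pattern, merge the inner hashes.
    # (Assumption: any difference in key sets counts as "different".)
    items.reduce({}) { |acc, h| acc.merge(h) }
  end
end

repetition = [{ digit: '1' }, { digit: '2' }]
wrapper    = [{ name: 'x' }, { value: '1' }]

p flatten_hash_array(repetition)
p flatten_hash_array(wrapper)
```

Under these assumptions the first call returns the array unchanged and the second returns a single merged hash.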
 
data/README.adoc CHANGED
@@ -21,7 +21,7 @@ While maintaining full API compatibility with Parslet, Parsanol features a compl
 
  * <<basic-parsing,PEG-based Parser Construction>> - Declarative grammar definition
  * <<error-reporting,Detailed Error Reporting>> - Precise failure location and context
- * <<native-extension,Rust Native Extension>> - Up to 29x faster parsing
+ * <<native-extension,Rust Native Extension>> - **Up to 1300x faster parsing**
  * <<slice-support,Slice Support>> - Source position preservation for linters and IDEs
  * <<transformation,Tree Transformation>> - Pattern-based AST construction
  * <<streaming-builder,Streaming Builder API>> - Single-pass parsing with callbacks
@@ -29,6 +29,7 @@ While maintaining full API compatibility with Parslet, Parsanol features a compl
  * <<infix-expressions,Infix Expression Parsing>> - Built-in operator precedence support
  * <<security-features,Security Features>> - Input size and recursion limits
  * <<debug-tools,Debug Tools>> - Tracing and grammar visualization
+ * <<benchmarking,Benchmarking>> - Built-in performance testing tools
 
  == Installation
 
@@ -53,6 +54,23 @@ Or install it yourself as:
  gem install parsanol
  ----
 
+ == Examples
+
+ The repository includes several example files demonstrating different use cases:
+
+ * link:examples/benchmark_examples.rb[] - Basic performance benchmarks
+ * link:examples/benchmark_full.rb[] - Comprehensive benchmark suite
+ * link:examples/parsing_modes.rb[] - Demonstrates all parsing modes
+ * link:examples/parslet_migration.rb[] - Migration guide from Parslet
+
+ [source,ruby]
+ ----
+ # Run examples
+ bundle exec ruby examples/benchmark_examples.rb
+ bundle exec ruby examples/parsing_modes.rb
+ bundle exec ruby examples/parslet_migration.rb
+ ----
+
  == Usage
 
  [[basic-parsing]]
@@ -103,6 +121,23 @@ ast = MyTransform.new.apply(parse_tree)
 
  [[native-extension]]
  === Native Extension
+
+ The Rust native extension provides significant performance improvements over pure Ruby parsing:
+
+ [cols="2,1,1,1"]
+ |===
+ | Pattern Type | Ruby (i/s) | Native (i/s) | Speedup
+
+ | Simple string match | ~575 | ~775,000 | **1,340x**
+ | Sequence (3 parts) | ~580 | ~530,000 | **910x**
+ | Named capture | ~575 | ~510,000 | **880x**
+ | Repetition (10x digits) | ~560 | ~740,000 | **1,310x**
+ | Alternative (3 options) | ~575 | ~720,000 | **1,250x**
+ | Calculator expression | ~570 | ~180,000 | **315x**
+ | Repetition with name | ~575 | ~125,000 | **220x**
+ | Repetition (20x) | ~560 | ~720,000 | **1,280x**
+ |===
+
  For maximum performance, compile the Rust native extension:
 
  [source,shell]
@@ -114,6 +149,13 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  bundle exec rake compile
  ----
 
+ Run your own benchmarks with:
+
+ [source,ruby]
+ ----
+ bundle exec ruby examples/benchmark_examples.rb
+ ----
+
  [[slice-support]]
  === Slice Support
 
@@ -247,9 +289,9 @@ parser.parse('123') # Works exactly the same
  | `sequence(:x)` | ✅ | Match array of values
  | `subtree(:x)` | ✅ | Match any subtree
  | `Parslet::Slice` | ✅ | Parsanol::Slice compatible
- | `.capture(:name)` | ✅ | Named capture extraction (NEW in 1.1.0)
- | `scope { }` | ✅ | Isolated capture context (NEW in 1.1.0)
- | `dynamic { \|ctx\| }` | ✅ | Runtime-determined parsing (NEW in 1.1.0)
+ | `.capture(:name)` | ✅ | Named capture extraction (NEW in 1.2.0)
+ | `scope { }` | ✅ | Isolated capture context (NEW in 1.2.0)
+ | `dynamic { \|ctx\| }` | ✅ | Runtime-determined parsing (NEW in 1.2.0)
  |===
 
  NOTE: The new capture, scope, and dynamic atoms provide powerful extraction and context-sensitive parsing capabilities. See the <<captures,Captures>> section for details.
@@ -475,7 +517,7 @@ For more details on backend selection and grammar analysis, see the https://pars
  [[captures]]
  == Captures, Scopes, and Dynamic Atoms
 
- Parsanol 1.1.0 introduces powerful new features for extracting and managing parsed data.
+ Parsanol 1.2.0 introduces powerful new features for extracting and managing parsed data.
 
  [[capture-atoms]]
  === Capture Atoms
@@ -789,12 +831,64 @@ ruby -I lib -e "require 'parsanol'; puts Parsanol::Native.available?"
 
  [source,shell]
  ----
- # Quick benchmarks
- bundle exec rake benchmark
+ # Quick benchmarks (examples provided in the repository)
+ bundle exec ruby examples/benchmark_examples.rb
+
+ # Full benchmark suite
+ bundle exec ruby examples/benchmark_full.rb
+
+ # Compare parsing modes
+ bundle exec ruby examples/parsing_modes.rb
+
+ # Parslet migration examples
+ bundle exec ruby examples/parslet_migration.rb
+ ----
+
+ [[benchmarking]]
+ == Benchmarking Your Own Code
 
- # Comprehensive benchmark suite
- bundle exec rake benchmark:all
+ Parsanol includes built-in benchmarking tools to measure performance:
+
+ [source,ruby]
  ----
+ require 'parsanol'
+ require 'benchmark/ips'
+
+ # Your grammar
+ grammar = Parsanol.str("hello").as(:greeting)
+ json = Parsanol::Native.serialize_grammar(grammar)
+
+ # Benchmark
+ Benchmark.ips do |x|
+   x.config(warmup: 5, time: 3)
+
+   x.report("Ruby mode") { grammar.parse("hello", mode: :ruby) }
+   x.report("Native mode") { Parsanol::Native.parse(json, "hello") }
+
+   x.compare!
+ end
+ ----
+
+ === Performance Factors
+
+ The actual speedup depends on several factors:
+
+ 1. **Grammar complexity** - More complex grammars have more transformation overhead
+ 2. **Named captures** - More named captures mean more hash construction
+ 3. **Repetition patterns** - Named repetitions have more overhead than unnamed
+ 4. **Input size** - Larger inputs benefit more from native parsing
+
+ [cols="1,2,2"]
+ |===
+ | Pattern | Speedup | Reason for Difference
+
+ | Simple strings | ~1,300x | Minimal transformation
+ | Sequences | ~900x | String joining overhead
+ | Alternatives | ~1,250x | Fast path in Rust
+ | Repetitions (unnamed) | ~1,300x | Minimal transformation
+ | Repetitions (named) | ~220x | Array of hashes construction
+ | Complex expressions | ~315x | More grammar elements
+ |===
 
  == License
 
Binary file
Binary file
Binary file
Binary file
@@ -0,0 +1,252 @@
+ # frozen_string_literal: true
+
+ require 'parsanol/native/transformer'
+
+ module Parsanol
+   module Native
+     # Decodes flat u64 arrays from Rust batch parser into Ruby AST
+     #
+     # The batch format uses tagged u64 values:
+     # - 0x00 = nil
+     # - 0x01 + value = bool (0 or 1)
+     # - 0x02 + value = int
+     # - 0x03 + bits = float (IEEE 754 bits)
+     # - 0x04 + offset + length = input string reference
+     # - 0x05 ... 0x06 = array (start ... end)
+     # - 0x07 ... 0x08 = hash (start ... end)
+     # - 0x09 + len + data... = hash key
+     # - 0x0A + len + data... = inline string
+     module BatchDecoder
+       TAG_NIL = 0x00
+       TAG_BOOL = 0x01
+       TAG_INT = 0x02
+       TAG_FLOAT = 0x03
+       TAG_STRING = 0x04
+       TAG_ARRAY_START = 0x05
+       TAG_ARRAY_END = 0x06
+       TAG_HASH_START = 0x07
+       TAG_HASH_END = 0x08
+       TAG_HASH_KEY = 0x09
+       TAG_INLINE_STRING = 0x0A
+       TAG_SYMBOL = 0x0B
+       TAG_REPETITION = 0x0C
+       TAG_SEQUENCE = 0x0D
+
+       class << self
+         # Decode a flat u64 array into Ruby AST with Slice objects
+         #
+         # @param data [Array<Integer>] Flat u64 array from batch parser
+         # @param input [String] Original input string (for Slice references)
+         # @param slice_class [Class] The Slice class to use
+         # @return [Object] Ruby AST (Hash, Array, Slice, etc.)
+         def decode(data, input, slice_class)
+           @input = input
+           @input_bytes = input.b
+           @slice_class = slice_class
+           @pos = 0
+           @data = data
+           decode_value
+         end
+
+         # Decode batch format to Ruby AST and apply transformation.
+         #
+         # The Rust parser produces raw AST that needs transformation to match
+         # Ruby parser behavior (merging duplicate keys, etc.)
+         #
+         # @param data [Array<Integer>|Object] Either flat u64 array from batch parser OR
+         #   pre-decoded Ruby value from _parse_raw
+         # @param input [String] Original input string (for Slice references)
+         # @param slice_class [Class] The Slice class to use
+         # @param grammar_atom [Parsanol::Atoms::Base] The grammar atom (unused, kept for API compat)
+         # @return [Object] Transformed Ruby AST
+         def decode_and_flatten(data, input, slice_class, grammar_atom)
+           # Check if data is batch data (flat u64 array) or already a Ruby value
+           if data.is_a?(Integer) || (data.is_a?(Array) && data.first.is_a?(Integer))
+             # Batch data (flat u64 array) - decode first, then transform
+             raw_ast = decode(data, input, slice_class)
+             AstTransformer.transform(raw_ast)
+           else
+             # Already decoded Ruby value from _parse_raw - apply transformer directly
+             AstTransformer.transform(data)
+           end
+         end
+
+         # Join consecutive Slice objects in arrays into single Slices
+         # This matches what transform_ast does in Rust (join_slices_from_array)
+         #
+         # @param value [Object] AST value
+         # @param slice_class [Class] The Slice class to check for
+         # @param input [String] Original input string
+         # @return [Object] AST with joined slices
+         def join_consecutive_slices(value, slice_class, input)
+           input_bytes = input.b
+
+           case value
+           when Array
+             # Recursively process array elements
+             processed = value.map { |v| join_consecutive_slices(v, slice_class, input) }
+
+             # Check if all non-nil elements are Slices
+             non_nil = processed.compact
+             if non_nil.all? { |v| v.is_a?(slice_class) }
+               # Check if slices are consecutive
+               if slices_consecutive?(non_nil)
+                 # Join into single slice
+                 join_slices(non_nil, slice_class, input_bytes, input)
+               else
+                 processed
+               end
+             else
+               processed
+             end
+           when Hash
+             # Process hash values recursively
+             result = {}
+             value.each do |k, v|
+               result[k] = join_consecutive_slices(v, slice_class, input)
+             end
+             result
+           else
+             value
+           end
+         end
+
+         private
+
+         def slices_consecutive?(slices)
+           return true if slices.empty?
+
+           slices.each_cons(2).all? do |a, b|
+             a.offset + a.content.bytesize == b.offset
+           end
+         end
+
+         def join_slices(slices, slice_class, input_bytes, input)
+           return nil if slices.empty?
+           return slices.first if slices.length == 1
+
+           first = slices.first
+           last = slices.last
+           total_length = last.offset + last.content.bytesize - first.offset
+           content = input_bytes[first.offset, total_length]
+           content = content.force_encoding('UTF-8') if content
+           slice_class.new(first.offset, content, input)
+         end
+
+         def decode_value
+           tag = @data[@pos]
+           @pos += 1
+
+           case tag
+           when TAG_NIL
+             nil
+           when TAG_BOOL
+             val = @data[@pos]
+             @pos += 1
+             val != 0
+           when TAG_INT
+             val = @data[@pos]
+             @pos += 1
+             # Handle negative numbers (signed i64 stored as u64)
+             if val >= 0x8000_0000_0000_0000
+               val = val - 0x1_0000_0000_0000_0000
+             end
+             val
+           when TAG_FLOAT
+             bits = @data[@pos]
+             @pos += 1
+             # Convert IEEE 754 bits to float
+             [bits].pack('Q').unpack1('D')
+           when TAG_STRING
+             offset = @data[@pos]
+             length = @data[@pos + 1]
+             @pos += 2
+             create_slice(offset, length)
+           when TAG_SYMBOL
+             # Symbol is encoded like inline string: len, then u64 chunks
+             len = @data[@pos]
+             @pos += 1
+             str = decode_inline_string_bytes(len)
+             str.to_sym
+           when TAG_REPETITION
+             inner = decode_value
+             [:repetition, inner].compact
+           when TAG_SEQUENCE
+             inner = decode_value
+             [:sequence, inner].compact
+           when TAG_ARRAY_START
+             decode_array
+           when TAG_HASH_START
+             decode_hash
+           else
+             raise "Unknown tag: #{tag} at position #{@pos - 1}"
+           end
+         end
+
+         def decode_array
+           result = []
+           loop do
+             tag = @data[@pos]
+             break if tag == TAG_ARRAY_END
+
+             result << decode_value
+           end
+           @pos += 1 # consume TAG_ARRAY_END
+           result
+         end
+
+         def decode_hash
+           result = {}
+           loop do
+             tag = @data[@pos]
+             break if tag == TAG_HASH_END
+
+             # Read key
+             raise "Expected TAG_HASH_KEY, got #{tag}" unless tag == TAG_HASH_KEY
+             @pos += 1
+             key = decode_inline_string
+
+             # Read value
+             value = decode_value
+
+             # Keep original key format (camelCase) for Ruby parser compatibility
+             result[key.to_sym] = value
+           end
+           @pos += 1 # consume TAG_HASH_END
+           result
+         end
+
+         def decode_inline_string
+           len = @data[@pos]
+           @pos += 1
+           decode_inline_string_bytes(len)
+         end
+
+         # Decode inline string bytes given the length
+         # @param len [Integer] Length of the string in bytes
+         # @return [String] Decoded string
+         def decode_inline_string_bytes(len)
+           # Read u64 chunks
+           chunks = (len + 7) / 8
+           bytes = String.new(encoding: 'ASCII-8BIT', capacity: len)
+           chunks.times do
+             chunk = @data[@pos]
+             @pos += 1
+             8.times do |byte_idx|
+               break if bytes.bytesize >= len
+               bytes << ((chunk >> (byte_idx * 8)) & 0xFF)
+             end
+           end
+
+           bytes.force_encoding('UTF-8')
+         end
+
+         def create_slice(offset, length)
+           content = @input_bytes[offset, length]
+           content = content.force_encoding('UTF-8') if content
+           @slice_class.new(offset, content, @input)
+         end
+       end
+     end
+   end
+ end
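
To make the tagged u64 layout in the new decoder easier to follow, here is a small standalone sketch that builds a flat array by hand and decodes it. It does not load the gem: `DemoSlice`, `pack_inline`, `unpack_inline`, and `decode_value` are hypothetical stand-ins written for this illustration. It covers only tags 0x00 to 0x09 as documented in the header comment, and it assumes inline strings are packed lowest byte first, as the shift order in `decode_inline_string_bytes` suggests.

```ruby
# Stand-in for the gem's Slice class (hypothetical, for this sketch only).
DemoSlice = Struct.new(:offset, :content)

# Pack a string as [byte length, u64 words...], lowest byte first.
# Assumption: this mirrors the shift order in decode_inline_string_bytes.
def pack_inline(str)
  bytes = str.bytes
  words = bytes.each_slice(8).map do |chunk|
    chunk.each_with_index.sum { |byte, i| byte << (i * 8) }
  end
  [bytes.length, *words]
end

# Read a length-prefixed inline string starting at pos.
# Returns [string, next position].
def unpack_inline(data, pos)
  len = data[pos]
  pos += 1
  out = String.new(encoding: 'ASCII-8BIT')
  ((len + 7) / 8).times do
    word = data[pos]
    pos += 1
    8.times do |i|
      break if out.bytesize >= len

      out << ((word >> (i * 8)) & 0xFF)
    end
  end
  [out.force_encoding('UTF-8'), pos]
end

# Decode one value starting at pos. Returns [value, next position].
def decode_value(data, pos, input)
  tag = data[pos]
  pos += 1
  case tag
  when 0x00 then [nil, pos]
  when 0x01 then [data[pos] != 0, pos + 1]
  when 0x02
    val = data[pos]
    val -= (1 << 64) if val >= (1 << 63) # signed i64 stored as u64
    [val, pos + 1]
  when 0x03 then [[data[pos]].pack('Q').unpack1('D'), pos + 1]
  when 0x04
    offset, length = data[pos, 2]
    [DemoSlice.new(offset, input.byteslice(offset, length)), pos + 2]
  when 0x05
    items = []
    until data[pos] == 0x06
      item, pos = decode_value(data, pos, input)
      items << item
    end
    [items, pos + 1] # skip the array end tag
  when 0x07
    hash = {}
    until data[pos] == 0x08
      raise "expected hash key tag at #{pos}" unless data[pos] == 0x09

      key, pos = unpack_inline(data, pos + 1)
      value, pos = decode_value(data, pos, input)
      hash[key.to_sym] = value
    end
    [hash, pos + 1] # skip the hash end tag
  else
    raise "Unknown tag: #{tag.inspect} at position #{pos - 1}"
  end
end

input = 'hello 42'
data = [
  0x07,                                          # hash start
  0x09, *pack_inline('greeting'), 0x04, 0, 5,    # greeting: input[0, 5]
  0x09, *pack_inline('n'), 0x02, (1 << 64) - 7,  # n: -7 as two's complement
  0x08                                           # hash end
]
ast, _next_pos = decode_value(data, 0, input)
p ast
```

The sketch threads the position through return values instead of module-level instance variables, which keeps each call independent of earlier ones; the decoder in the diff stores `@pos` and `@data` on the singleton instead.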