parsanol 1.2.2-aarch64-linux → 1.3.2-aarch64-linux
- checksums.yaml +4 -4
- data/HISTORY.txt +33 -3
- data/README.adoc +103 -9
- data/lib/parsanol/3.2/parsanol_native.so +0 -0
- data/lib/parsanol/3.3/parsanol_native.so +0 -0
- data/lib/parsanol/3.4/parsanol_native.so +0 -0
- data/lib/parsanol/4.0/parsanol_native.so +0 -0
- data/lib/parsanol/native/batch_decoder.rb +252 -0
- data/lib/parsanol/native/parser.rb +28 -574
- data/lib/parsanol/native/transformer.rb +125 -58
- data/lib/parsanol/native.rb +107 -183
- data/lib/parsanol/parser.rb +2 -6
- data/lib/parsanol/slice.rb +51 -105
- data/lib/parsanol/version.rb +1 -1
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a4e1c8284b1cdc7d254fbd5e7af15417dc56e8a631ce8a1891591bc40ef3d420
+  data.tar.gz: 0ae7e373ebda15f3cab39f1da332a53f1c9abe67e10260847e718817960a671a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f26c1803a829cfb6b9753a7e4ed27fdafbef8df350b043e11c866384d08b01949a8dfa06c287d94b03beae8f94cac76de23f2c4b20e4eda1830b0a98cbdad843
+  data.tar.gz: 137b12b8eb6f2ef8c5ddc811a160a4f6f91bf00a6a34c1a5f9a4bf6978c928744729acc8779a543828e4245f88da6922f5298ae92e447185f819a847c1409523
data/HISTORY.txt
CHANGED
@@ -1,3 +1,33 @@
+== Parsanol 1.3.0 (unreleased)
+
+Breaking changes:
+
+* Simplified API: Single `parse()` method with lazy line/column
+* Removed deprecated methods: `parse_parslet`, `parse_parslet_with_positions`,
+  `parse_with_transform`, `parse_to_objects`, `parse_raw`
+
+New features:
+
+* Lazy line/column computation in `Slice#line_and_column` - computed only when accessed
+* `BatchDecoder` module for efficient batch AST processing
+* Grammar accepts Ruby atoms directly - no JSON serialization step needed
+
+Bug fixes:
+
+* Fixed grammar cache cycle detection to prevent stack overflow on recursive grammars
+* Fixed wrapper vs repetition pattern detection in `AstTransformer`:
+  - Same inner keys → Repetition pattern (keep as array)
+  - Different inner keys → Wrapper pattern (merge inner hashes)
+* Correct entity extraction for grammars with multiple declarations
+
+== Parsanol 1.2.2 (2026-03-07)
+
+Syncs with parsanol-rs 0.3.0:
+
+* Full FFI support for capture, scope, and dynamic atoms
+* Native extension performance for all new features
+* BuilderCallbacks module for streaming parsing
+
 == Parsanol 1.2.1 (2026-03-07)
 
 Bug fix:
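The 1.3.0 notes describe line/column computation as lazy, "computed only when accessed". A minimal sketch of that idea follows. It assumes a hypothetical `LazySlice` class that stores only a byte offset and memoizes the line/column pair on first access; Parsanol's actual `Slice` is not shown in this part of the diff and may work differently.

```ruby
# Hypothetical illustration only; not Parsanol's real Slice class.
class LazySlice
  attr_reader :offset, :content

  def initialize(offset, content, input)
    @offset = offset          # byte offset of the slice within input
    @content = content
    @input = input
    @line_and_column = nil    # nothing is computed at construction time
  end

  # Returns [line, column], both 1-based. The scan over the input
  # prefix happens on the first call only; later calls reuse the result.
  def line_and_column
    @line_and_column ||= begin
      prefix = @input.byteslice(0, @offset)
      line = prefix.count("\n") + 1
      last_newline = prefix.rindex("\n")
      column = last_newline ? prefix.length - last_newline : prefix.length + 1
      [line, column]
    end
  end
end

slice = LazySlice.new(4, "d", "ab\ncde")
p slice.line_and_column
```

The trade-off in a design like this is that parsing pays nothing for position tracking, while callers that do need positions (linters, error reporters) pay for one prefix scan per slice.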
@@ -34,12 +64,12 @@ Runtime-determined parsing via FFI callbacks:
 
 === Native Extension Updates
 
-* Updated to parsanol-rs 0.
+* Updated to parsanol-rs 0.3.0
 * New backend abstraction (Packrat, Bytecode, Auto)
 * Streaming parser with capture extraction
 * Performance improvements
 
-== Parsanol 1.1.0 (
+== Parsanol 1.1.0 (2026-03-05)
 
 Position information is now returned by default:
 
@@ -54,7 +84,7 @@ Performance improvements:
 * ZeroCopy API for direct FFI object construction
 * Parallel batch parsing with `Parsanol::Parallel`
 
-== Parsanol 1.0.0 (
+== Parsanol 1.0.0 (2026-03-03)
 
 Initial release of Parsanol, a high-performance PEG parser library for Ruby.
 
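The 1.3.0 bug-fix entry for `AstTransformer` distinguishes two shapes: an array of hashes with the same inner keys is a repetition and stays an array, while an array of hashes with different inner keys is a wrapper and is merged. The sketch below illustrates that rule only. The helper name `collapse_hashes` is hypothetical, and treating any non-identical key sets as "different" is an assumption; the real transformer logic is in `transformer.rb`, which is not reproduced here.

```ruby
# Hypothetical helper illustrating the changelog rule; not AstTransformer code.
def collapse_hashes(items)
  # Only arrays made up entirely of hashes are candidates.
  return items unless items.is_a?(Array) && !items.empty?
  return items unless items.all? { |item| item.is_a?(Hash) }

  key_sets = items.map(&:keys).uniq
  if key_sets.length == 1
    items                                          # same keys: repetition, keep array
  else
    items.reduce({}) { |acc, h| acc.merge(h) }     # different keys: wrapper, merge
  end
end

p collapse_hashes([{ digit: "1" }, { digit: "2" }])
p collapse_hashes([{ name: "x" }, { value: "1" }])
```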
data/README.adoc
CHANGED
@@ -21,7 +21,7 @@ While maintaining full API compatibility with Parslet, Parsanol features a compl
 
 * <<basic-parsing,PEG-based Parser Construction>> - Declarative grammar definition
 * <<error-reporting,Detailed Error Reporting>> - Precise failure location and context
-* <<native-extension,Rust Native Extension>> - Up to
+* <<native-extension,Rust Native Extension>> - **Up to 1300x faster parsing**
 * <<slice-support,Slice Support>> - Source position preservation for linters and IDEs
 * <<transformation,Tree Transformation>> - Pattern-based AST construction
 * <<streaming-builder,Streaming Builder API>> - Single-pass parsing with callbacks
@@ -29,6 +29,7 @@ While maintaining full API compatibility with Parslet, Parsanol features a compl
 * <<infix-expressions,Infix Expression Parsing>> - Built-in operator precedence support
 * <<security-features,Security Features>> - Input size and recursion limits
 * <<debug-tools,Debug Tools>> - Tracing and grammar visualization
+* <<benchmarking,Benchmarking>> - Built-in performance testing tools
 
 == Installation
 
@@ -53,6 +54,23 @@ Or install it yourself as:
 gem install parsanol
 ----
 
+== Examples
+
+The repository includes several example files demonstrating different use cases:
+
+* link:examples/benchmark_examples.rb[] - Basic performance benchmarks
+* link:examples/benchmark_full.rb[] - Comprehensive benchmark suite
+* link:examples/parsing_modes.rb[] - Demonstrates all parsing modes
+* link:examples/parslet_migration.rb[] - Migration guide from Parslet
+
+[source,ruby]
+----
+# Run examples
+bundle exec ruby examples/benchmark_examples.rb
+bundle exec ruby examples/parsing_modes.rb
+bundle exec ruby examples/parslet_migration.rb
+----
+
 == Usage
 
 [[basic-parsing]]
@@ -103,6 +121,23 @@ ast = MyTransform.new.apply(parse_tree)
 
 [[native-extension]]
 === Native Extension
+
+The Rust native extension provides significant performance improvements over pure Ruby parsing:
+
+[cols="2,1,1,1"]
+|===
+| Pattern Type | Ruby (i/s) | Native (i/s) | Speedup
+
+| Simple string match | ~575 | ~775,000 | **1,340x**
+| Sequence (3 parts) | ~580 | ~530,000 | **910x**
+| Named capture | ~575 | ~510,000 | **880x**
+| Repetition (10x digits) | ~560 | ~740,000 | **1,310x**
+| Alternative (3 options) | ~575 | ~720,000 | **1,250x**
+| Calculator expression | ~570 | ~180,000 | **315x**
+| Repetition with name | ~575 | ~125,000 | **220x**
+| Repetition (20x) | ~560 | ~720,000 | **1,280x**
+|===
+
 For maximum performance, compile the Rust native extension:
 
 [source,shell]
@@ -114,6 +149,13 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 bundle exec rake compile
 ----
 
+Run your own benchmarks with:
+
+[source,ruby]
+----
+bundle exec ruby examples/benchmark_examples.rb
+----
+
 [[slice-support]]
 === Slice Support
 
@@ -247,9 +289,9 @@ parser.parse('123') # Works exactly the same
 | `sequence(:x)` | ✅ | Match array of values
 | `subtree(:x)` | ✅ | Match any subtree
 | `Parslet::Slice` | ✅ | Parsanol::Slice compatible
-| `.capture(:name)` | ✅ | Named capture extraction (NEW in 1.
-| `scope { }` | ✅ | Isolated capture context (NEW in 1.
-| `dynamic { \|ctx\| }` | ✅ | Runtime-determined parsing (NEW in 1.
+| `.capture(:name)` | ✅ | Named capture extraction (NEW in 1.2.0)
+| `scope { }` | ✅ | Isolated capture context (NEW in 1.2.0)
+| `dynamic { \|ctx\| }` | ✅ | Runtime-determined parsing (NEW in 1.2.0)
 |===
 
 NOTE: The new capture, scope, and dynamic atoms provide powerful extraction and context-sensitive parsing capabilities. See the <<captures,Captures>> section for details.
@@ -475,7 +517,7 @@ For more details on backend selection and grammar analysis, see the https://pars
 [[captures]]
 == Captures, Scopes, and Dynamic Atoms
 
-Parsanol 1.
+Parsanol 1.2.0 introduces powerful new features for extracting and managing parsed data.
 
 [[capture-atoms]]
 === Capture Atoms
@@ -789,12 +831,64 @@ ruby -I lib -e "require 'parsanol'; puts Parsanol::Native.available?"
 
 [source,shell]
 ----
-# Quick benchmarks
-bundle exec
+# Quick benchmarks (examples provided in the repository)
+bundle exec ruby examples/benchmark_examples.rb
+
+# Full benchmark suite
+bundle exec ruby examples/benchmark_full.rb
+
+# Compare parsing modes
+bundle exec ruby examples/parsing_modes.rb
+
+# Parslet migration examples
+bundle exec ruby examples/parslet_migration.rb
+----
+
+[[benchmarking]]
+== Benchmarking Your Own Code
 
-
-
+Parsanol includes built-in benchmarking tools to measure performance:
+
+[source,ruby]
 ----
+require 'parsanol'
+require 'benchmark/ips'
+
+# Your grammar
+grammar = Parsanol.str("hello").as(:greeting)
+json = Parsanol::Native.serialize_grammar(grammar)
+
+# Benchmark
+Benchmark.ips do |x|
+  x.config(warmup: 5, time: 3)
+
+  x.report("Ruby mode") { grammar.parse("hello", mode: :ruby) }
+  x.report("Native mode") { Parsanol::Native.parse(json, "hello") }
+
+  x.compare!
+end
+----
+
+=== Performance Factors
+
+The actual speedup depends on several factors:
+
+1. **Grammar complexity** - More complex grammars have more transformation overhead
+2. **Named captures** - More named captures mean more hash construction
+3. **Repetition patterns** - Named repetitions have more overhead than unnamed
+4. **Input size** - Larger inputs benefit more from native parsing
+
+[cols="1,2,2"]
+|===
+| Pattern | Speedup | Reason for Difference
+
+| Simple strings | ~1,300x | Minimal transformation
+| Sequences | ~900x | String joining overhead
+| Alternatives | ~1,250x | Fast path in Rust
+| Repetitions (unnamed) | ~1,300x | Minimal transformation
+| Repetitions (named) | ~220x | Array of hashes construction
+| Complex expressions | ~315x | More grammar elements
+|===
 
 == License
 
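The speedup column in the new README table is the ratio of native to pure-Ruby iterations per second. The sketch below recomputes that ratio from the approximate figures quoted in the table; the constant and method names are illustrative and not part of Parsanol. Because the README inputs are marked "~", the recomputed values land close to, but not exactly on, the rounded figures in the table (for example, 775,000 / 575 is roughly 1,348 against the quoted 1,340x).

```ruby
# Approximate iterations/second copied from the README table: [ruby, native].
RATES = {
  "Simple string match"     => [575, 775_000],
  "Sequence (3 parts)"      => [580, 530_000],
  "Named capture"           => [575, 510_000],
  "Repetition (10x digits)" => [560, 740_000],
  "Alternative (3 options)" => [575, 720_000],
  "Calculator expression"   => [570, 180_000],
  "Repetition with name"    => [575, 125_000],
  "Repetition (20x)"        => [560, 720_000]
}.freeze

# Speedup is simply native throughput divided by Ruby throughput.
def speedup(ruby_ips, native_ips)
  native_ips.fdiv(ruby_ips)
end

RATES.each do |pattern, (ruby_ips, native_ips)|
  puts format("%-25s %5dx", pattern, speedup(ruby_ips, native_ips).round)
end
```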
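data/lib/parsanol/3.2/parsanol_native.so
CHANGED
Binary file
data/lib/parsanol/3.3/parsanol_native.so
CHANGED
Binary file
data/lib/parsanol/3.4/parsanol_native.so
CHANGED
Binary file
data/lib/parsanol/4.0/parsanol_native.so
CHANGED
Binary file
data/lib/parsanol/native/batch_decoder.rb
ADDED
@@ -0,0 +1,252 @@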
+# frozen_string_literal: true
+
+require 'parsanol/native/transformer'
+
+module Parsanol
+  module Native
+    # Decodes flat u64 arrays from Rust batch parser into Ruby AST
+    #
+    # The batch format uses tagged u64 values:
+    # - 0x00 = nil
+    # - 0x01 + value = bool (0 or 1)
+    # - 0x02 + value = int
+    # - 0x03 + bits = float (IEEE 754 bits)
+    # - 0x04 + offset + length = input string reference
+    # - 0x05 ... 0x06 = array (start ... end)
+    # - 0x07 ... 0x08 = hash (start ... end)
+    # - 0x09 + len + data... = hash key
+    # - 0x0A + len + data... = inline string
+    module BatchDecoder
+      TAG_NIL = 0x00
+      TAG_BOOL = 0x01
+      TAG_INT = 0x02
+      TAG_FLOAT = 0x03
+      TAG_STRING = 0x04
+      TAG_ARRAY_START = 0x05
+      TAG_ARRAY_END = 0x06
+      TAG_HASH_START = 0x07
+      TAG_HASH_END = 0x08
+      TAG_HASH_KEY = 0x09
+      TAG_INLINE_STRING = 0x0A
+      TAG_SYMBOL = 0x0B
+      TAG_REPETITION = 0x0C
+      TAG_SEQUENCE = 0x0D
+
+      class << self
+        # Decode a flat u64 array into Ruby AST with Slice objects
+        #
+        # @param data [Array<Integer>] Flat u64 array from batch parser
+        # @param input [String] Original input string (for Slice references)
+        # @param slice_class [Class] The Slice class to use
+        # @return [Object] Ruby AST (Hash, Array, Slice, etc.)
+        def decode(data, input, slice_class)
+          @input = input
+          @input_bytes = input.b
+          @slice_class = slice_class
+          @pos = 0
+          @data = data
+          decode_value
+        end
+
+        # Decode batch format to Ruby AST and apply transformation.
+        #
+        # The Rust parser produces raw AST that needs transformation to match
+        # Ruby parser behavior (merging duplicate keys, etc.)
+        #
+        # @param data [Array<Integer>|Object] Either flat u64 array from batch parser OR
+        #   pre-decoded Ruby value from _parse_raw
+        # @param input [String] Original input string (for Slice references)
+        # @param slice_class [Class] The Slice class to use
+        # @param grammar_atom [Parsanol::Atoms::Base] The grammar atom (unused, kept for API compat)
+        # @return [Object] Transformed Ruby AST
+        def decode_and_flatten(data, input, slice_class, grammar_atom)
+          # Check if data is batch data (flat u64 array) or already a Ruby value
+          if data.is_a?(Integer) || (data.is_a?(Array) && data.first.is_a?(Integer))
+            # Batch data (flat u64 array) - decode first, then transform
+            raw_ast = decode(data, input, slice_class)
+            AstTransformer.transform(raw_ast)
+          else
+            # Already decoded Ruby value from _parse_raw - apply transformer directly
+            AstTransformer.transform(data)
+          end
+        end
+
+        # Join consecutive Slice objects in arrays into single Slices
+        # This matches what transform_ast does in Rust (join_slices_from_array)
+        #
+        # @param value [Object] AST value
+        # @param slice_class [Class] The Slice class to check for
+        # @param input [String] Original input string
+        # @return [Object] AST with joined slices
+        def join_consecutive_slices(value, slice_class, input)
+          input_bytes = input.b
+
+          case value
+          when Array
+            # Recursively process array elements
+            processed = value.map { |v| join_consecutive_slices(v, slice_class, input) }
+
+            # Check if all non-nil elements are Slices
+            non_nil = processed.compact
+            if non_nil.all? { |v| v.is_a?(slice_class) }
+              # Check if slices are consecutive
+              if slices_consecutive?(non_nil)
+                # Join into single slice
+                join_slices(non_nil, slice_class, input_bytes, input)
+              else
+                processed
+              end
+            else
+              processed
+            end
+          when Hash
+            # Process hash values recursively
+            result = {}
+            value.each do |k, v|
+              result[k] = join_consecutive_slices(v, slice_class, input)
+            end
+            result
+          else
+            value
+          end
+        end
+
+        private
+
+        def slices_consecutive?(slices)
+          return true if slices.empty?
+
+          slices.each_cons(2).all? do |a, b|
+            a.offset + a.content.bytesize == b.offset
+          end
+        end
+
+        def join_slices(slices, slice_class, input_bytes, input)
+          return nil if slices.empty?
+          return slices.first if slices.length == 1
+
+          first = slices.first
+          last = slices.last
+          total_length = last.offset + last.content.bytesize - first.offset
+          content = input_bytes[first.offset, total_length]
+          content = content.force_encoding('UTF-8') if content
+          slice_class.new(first.offset, content, input)
+        end
+
+        def decode_value
+          tag = @data[@pos]
+          @pos += 1
+
+          case tag
+          when TAG_NIL
+            nil
+          when TAG_BOOL
+            val = @data[@pos]
+            @pos += 1
+            val != 0
+          when TAG_INT
+            val = @data[@pos]
+            @pos += 1
+            # Handle negative numbers (signed i64 stored as u64)
+            if val >= 0x8000_0000_0000_0000
+              val = val - 0x1_0000_0000_0000_0000
+            end
+            val
+          when TAG_FLOAT
+            bits = @data[@pos]
+            @pos += 1
+            # Convert IEEE 754 bits to float
+            [bits].pack('Q').unpack1('D')
+          when TAG_STRING
+            offset = @data[@pos]
+            length = @data[@pos + 1]
+            @pos += 2
+            create_slice(offset, length)
+          when TAG_SYMBOL
+            # Symbol is encoded like inline string: len, then u64 chunks
+            len = @data[@pos]
+            @pos += 1
+            str = decode_inline_string_bytes(len)
+            str.to_sym
+          when TAG_REPETITION
+            inner = decode_value
+            [:repetition, inner].compact
+          when TAG_SEQUENCE
+            inner = decode_value
+            [:sequence, inner].compact
+          when TAG_ARRAY_START
+            decode_array
+          when TAG_HASH_START
+            decode_hash
+          else
+            raise "Unknown tag: #{tag} at position #{@pos - 1}"
+          end
+        end
+
+        def decode_array
+          result = []
+          loop do
+            tag = @data[@pos]
+            break if tag == TAG_ARRAY_END
+
+            result << decode_value
+          end
+          @pos += 1 # consume TAG_ARRAY_END
+          result
+        end
+
+        def decode_hash
+          result = {}
+          loop do
+            tag = @data[@pos]
+            break if tag == TAG_HASH_END
+
+            # Read key
+            raise "Expected TAG_HASH_KEY, got #{tag}" unless tag == TAG_HASH_KEY
+            @pos += 1
+            key = decode_inline_string
+
+            # Read value
+            value = decode_value
+
+            # Keep original key format (camelCase) for Ruby parser compatibility
+            result[key.to_sym] = value
+          end
+          @pos += 1 # consume TAG_HASH_END
+          result
+        end
+
+        def decode_inline_string
+          len = @data[@pos]
+          @pos += 1
+          decode_inline_string_bytes(len)
+        end
+
+        # Decode inline string bytes given the length
+        # @param len [Integer] Length of the string in bytes
+        # @return [String] Decoded string
+        def decode_inline_string_bytes(len)
+          # Read u64 chunks
+          chunks = (len + 7) / 8
+          bytes = String.new(encoding: 'ASCII-8BIT', capacity: len)
+          chunks.times do
+            chunk = @data[@pos]
+            @pos += 1
+            8.times do |byte_idx|
+              break if bytes.bytesize >= len
+              bytes << ((chunk >> (byte_idx * 8)) & 0xFF)
+            end
+          end
+
+          bytes.force_encoding('UTF-8')
+        end
+
+        def create_slice(offset, length)
+          content = @input_bytes[offset, length]
+          content = content.force_encoding('UTF-8') if content
+          @slice_class.new(offset, content, @input)
+        end
+      end
+    end
+  end
+end
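The decoder above reads inline strings as a byte length followed by u64 chunks, taking the lowest byte of each chunk first, and it recovers signed integers and floats from their raw 64-bit patterns. The standalone sketch below mirrors those three conversions so they can be tried without the native extension. `pack_inline_string` is a hypothetical encoder written only for this illustration (the real encoder is on the Rust side and is not part of this diff), and all method names here are assumptions rather than Parsanol API.

```ruby
# Hypothetical encoder: byte length, then bytes packed low-byte-first into u64 words.
def pack_inline_string(str)
  bytes = str.b.bytes
  chunks = bytes.each_slice(8).map do |group|
    group.each_with_index.sum { |byte, idx| byte << (idx * 8) }
  end
  [bytes.length, *chunks]
end

# Mirrors the shift-and-mask loop in decode_inline_string_bytes.
def unpack_inline_string(words)
  len, *chunks = words
  bytes = String.new(encoding: Encoding::BINARY)
  chunks.each do |chunk|
    8.times do |idx|
      break if bytes.bytesize >= len
      bytes << ((chunk >> (idx * 8)) & 0xFF)
    end
  end
  bytes.force_encoding('UTF-8')
end

# Two's-complement reinterpretation, as in the TAG_INT branch.
def u64_to_i64(val)
  val >= 0x8000_0000_0000_0000 ? val - 0x1_0000_0000_0000_0000 : val
end

# IEEE 754 bit pattern to Float, as in the TAG_FLOAT branch.
def bits_to_float(bits)
  [bits].pack('Q').unpack1('D')
end

words = pack_inline_string("hello")
p words
p unpack_inline_string(words)
p u64_to_i64(0xFFFF_FFFF_FFFF_FFFF)
p bits_to_float(0x3FF0_0000_0000_0000)
```

Padding bytes in the final chunk are zero in this sketch; the decoder never reads them because it stops once `len` bytes have been collected.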