tree_haver 2.0.0 → 3.0.0

data/README.md CHANGED
@@ -67,7 +67,11 @@ If you've used [Faraday](https://github.com/lostisland/faraday), [multi_json](ht
67
67
  | **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
68
68
  | **TreeHaver** | tree-sitter parsing | ruby_tree_sitter, tree_stump, FFI, Java JARs, Citrus |
69
69
 
70
- **Write once, run anywhere.** Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
70
+ **Write once, run anywhere.**
71
+
72
+ **Learn once, write anywhere.**
73
+
74
+ Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
71
75
 
72
76
  ```ruby
73
77
  # Your code stays the same regardless of backend
@@ -76,7 +80,7 @@ parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
76
80
  tree = parser.parse(source_code)
77
81
 
78
82
  # TreeHaver automatically picks the best backend:
79
- # - MRI → ruby_tree_sitter (C extension)
83
+ # - MRI → ruby_tree_sitter (C extensions)
80
84
  # - JRuby → FFI (system's libtree-sitter)
81
85
  # - TruffleRuby → FFI or MRI backend
82
86
  ```
@@ -97,6 +101,74 @@ tree = parser.parse(source_code)
97
101
  - **Thread-Safe**: Built-in language registry with thread-safe caching
98
102
  - **Minimal API Surface**: Simple, focused API that covers the most common tree-sitter use cases
99
103
 
104
+ ### Backend Requirements
105
+
106
+ TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
107
+
108
+ #### MRI Backend (ruby_tree_sitter, C extensions)
109
+
110
+ **Requires `ruby_tree_sitter` v2.0+**
111
+
112
+ In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from `Exception` (not `StandardError`). This was an intentional breaking change made for thread-safety and signal handling reasons.
113
+
114
+ **Exception Mapping**: TreeHaver catches `TreeSitter::TreeSitterError` and its subclasses, converting them to `TreeHaver::NotAvailable` while preserving the original error message. This provides a consistent exception API across all backends:
115
+
116
+ | ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
117
+ |-------------------------------------|----------------------------|------------------------------------------------|
118
+ | `TreeSitter::ParserNotFoundError` | `TreeHaver::NotAvailable` | Parser library file cannot be loaded |
119
+ | `TreeSitter::LanguageLoadError` | `TreeHaver::NotAvailable` | Language symbol loads but returns nothing |
120
+ | `TreeSitter::SymbolNotFoundError` | `TreeHaver::NotAvailable` | Symbol not found in library |
121
+ | `TreeSitter::ParserVersionError` | `TreeHaver::NotAvailable` | Parser version incompatible with tree-sitter |
122
+ | `TreeSitter::QueryCreationError` | `TreeHaver::NotAvailable` | Query creation fails |
123
+
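The mapping in the table above can be pictured with a small, self-contained sketch. The `FakeTreeSitter` and `FakeTreeHaver` modules and the `map_errors` helper are hypothetical stand-ins invented for illustration; they are not the real classes or TreeHaver's actual implementation. The sketch only shows the general idea of rescuing an `Exception`-derived error and re-raising it as a `StandardError`-derived one with the same message.

```ruby
# Hypothetical stand-ins for illustration only (not the real library classes).
module FakeTreeSitter
  # Inherits from Exception, mirroring what the README says about ruby_tree_sitter v2.0
  class TreeSitterError < Exception; end # rubocop:disable Lint/InheritException
  class ParserNotFoundError < TreeSitterError; end
end

module FakeTreeHaver
  class NotAvailable < StandardError; end

  # Run the block, converting backend errors into NotAvailable
  # while keeping the original message.
  def self.map_errors
    yield
  rescue FakeTreeSitter::TreeSitterError => e
    raise NotAvailable, e.message
  end
end

summary =
  begin
    FakeTreeHaver.map_errors do
      raise FakeTreeSitter::ParserNotFoundError, "cannot load /path/to/grammar.so"
    end
  rescue => e # a bare rescue is enough because NotAvailable is a StandardError
    "#{e.class}: #{e.message}"
  end

puts summary
```

Because the subclass is rescued through its parent class, one `rescue` clause covers every row of the table.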
124
+ ```ruby
125
+ # Add to your Gemfile for MRI backend
126
+ gem "ruby_tree_sitter", "~> 2.0"
127
+ ```
128
+
129
+ #### Rust Backend (tree_stump)
130
+
131
+ Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs are merged.
132
+
133
+ ```ruby
134
+ # Add to your Gemfile for Rust backend
135
+ gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
136
+ ```
137
+
138
+ #### FFI Backend
139
+
140
+ Requires the `ffi` gem and a system installation of `libtree-sitter`:
141
+
142
+ ```ruby
143
+ # Add to your Gemfile for FFI backend
144
+ gem "ffi", ">= 1.15", "< 2.0"
145
+ ```
146
+
147
+ ```bash
148
+ # Install libtree-sitter on your system:
149
+ # macOS
150
+ brew install tree-sitter
151
+
152
+ # Ubuntu/Debian
153
+ apt-get install libtree-sitter0 libtree-sitter-dev
154
+
155
+ # Fedora
156
+ dnf install tree-sitter tree-sitter-devel
157
+ ```
158
+
159
+ #### Citrus Backend
160
+
161
+ Pure Ruby parser with no native dependencies:
162
+
163
+ ```ruby
164
+ # Add to your Gemfile for Citrus backend
165
+ gem "citrus", "~> 3.0"
166
+ ```
167
+
168
+ #### Java Backend (JRuby only)
169
+
170
+ No additional dependencies required beyond grammar JARs built for java-tree-sitter.
171
+
100
172
  ### Why TreeHaver?
101
173
 
102
174
  tree-sitter is a powerful parser generator that creates incremental parsers for many programming languages. However, integrating it into Ruby applications can be challenging:
@@ -281,6 +353,108 @@ NOTE: Be prepared to track down certs for signed gems and add them the same way
281
353
 
282
354
  ## ⚙️ Configuration
283
355
 
356
+ ### Available Backends
357
+
358
+ TreeHaver supports multiple parsing backends, each with different trade-offs. The `auto` backend automatically selects the best available option.
359
+
360
+ | Backend | Description | Performance | Portability | Examples |
361
+ |---------|-------------|-------------|-------------|----------|
362
+ | **Auto** | Auto-selects best backend | Varies | ✅ Universal | [JSON](examples/auto_json.rb) · [JSONC](examples/auto_jsonc.rb) · [Bash](examples/auto_bash.rb) |
363
+ | **MRI** | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | [JSON](examples/mri_json.rb) · [JSONC](examples/mri_jsonc.rb) · ~~Bash~~* |
364
+ | **Rust** | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | [JSON](examples/rust_json.rb) · [JSONC](examples/rust_jsonc.rb) · ~~Bash~~* |
365
+ | **FFI** | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | [JSON](examples/ffi_json.rb) · [JSONC](examples/ffi_jsonc.rb) · [Bash](examples/ffi_bash.rb) |
366
+ | **Java** | JNI bindings | ⚡ Very Fast | JRuby only | [JSON](examples/java_json.rb) · [JSONC](examples/java_jsonc.rb) · [Bash](examples/java_bash.rb) |
367
+ | **Citrus** | Pure Ruby parsing | 🟡 Slower | ✅ Universal | [TOML](examples/citrus_toml.rb) · [Finitio](examples/citrus_finitio.rb) · [Dhall](examples/citrus_dhall.rb) |
368
+
369
+ **Selection Priority (Auto mode):** MRI → Rust → FFI → Java → Citrus
370
+
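A priority order like this is commonly implemented as "first available wins". The sketch below is an assumption about the general shape, not TreeHaver's real selection code: `BackendPicker` and the probe lambdas are hypothetical, and a real probe would more likely attempt to load the backend's gem or library.

```ruby
# Hypothetical sketch: pick the first backend, in priority order, whose
# availability probe returns true. The probes are placeholders.
module BackendPicker
  PRIORITY = %i[mri rust ffi java citrus].freeze

  # probes: Hash of backend name => callable returning true or false.
  # Backends without a probe are treated as unavailable.
  def self.pick(probes)
    PRIORITY.find { |name| probes.fetch(name, -> { false }).call }
  end
end

# Simulate an environment where the MRI and Rust backends cannot load
# but FFI, Java, and Citrus can.
probes = {
  mri: -> { false },
  rust: -> { false },
  ffi: -> { true },
  java: -> { true },
  citrus: -> { true },
}

puts BackendPicker.pick(probes) # prints "ffi"
```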
371
+ **Known Issues:**
372
+ - *MRI + Bash: ABI incompatibility (use FFI instead)
373
+ - *Rust + Bash: Version mismatch (use FFI instead)
374
+
375
+ **Backend Requirements:**
376
+
377
+ ```ruby
378
+ # MRI Backend
379
+ gem 'ruby_tree_sitter'
380
+
381
+ # Rust Backend
382
+ gem 'tree_stump'
383
+
384
+ # FFI Backend
385
+ gem 'ffi'
386
+
387
+ # Citrus Backend
388
+ gem 'citrus'
389
+ # Plus grammar gems: toml-rb, dhall, finitio, etc.
390
+ ```
391
+
392
+ **Force Specific Backend:**
393
+
394
+ ```ruby
395
+ TreeHaver.backend = :ffi # Force FFI backend
396
+ TreeHaver.backend = :mri # Force MRI backend
397
+ TreeHaver.backend = :rust # Force Rust backend
398
+ TreeHaver.backend = :java # Force Java backend (JRuby)
399
+ TreeHaver.backend = :citrus # Force Citrus backend
400
+ TreeHaver.backend = :auto # Auto-select (default)
401
+ ```
402
+
403
+ **Block-based Backend Switching:**
404
+
405
+ Use `with_backend` to temporarily switch backends for a specific block of code.
406
+ This is thread-safe and supports nesting—the previous backend is automatically
407
+ restored when the block exits (even if an exception is raised).
408
+
409
+ ```ruby
410
+ # Temporarily use a specific backend
411
+ TreeHaver.with_backend(:mri) do
412
+ parser = TreeHaver::Parser.new
413
+ tree = parser.parse(source)
414
+ # All operations in this block use the MRI backend
415
+ end
416
+ # Backend is restored to its previous value here
417
+
418
+ # Nested blocks work correctly
419
+ TreeHaver.with_backend(:rust) do
420
+ # Uses :rust
421
+ TreeHaver.with_backend(:citrus) do
422
+ # Uses :citrus
423
+ parser = TreeHaver::Parser.new
424
+ end
425
+ # Back to :rust
426
+ end
427
+ # Back to original backend
428
+ ```
429
+
430
+ This is particularly useful for:
431
+
432
+ - **Testing**: Test the same code with different backends
433
+ - **Performance comparison**: Benchmark different backends
434
+ - **Fallback scenarios**: Try one backend, fall back to another
435
+ - **Thread isolation**: Each thread can use a different backend safely
436
+
437
+ ```ruby
438
+ # Example: Testing with multiple backends
439
+ [:mri, :rust, :citrus].each do |backend_name|
440
+ TreeHaver.with_backend(backend_name) do
441
+ parser = TreeHaver::Parser.new
442
+ result = parser.parse(source)
443
+ puts "#{backend_name}: #{result.root_node.type}"
444
+ end
445
+ end
446
+ ```
447
+
448
+ **Check Backend Capabilities:**
449
+
450
+ ```ruby
451
+ TreeHaver.backend # => :ffi
452
+ TreeHaver.backend_module # => TreeHaver::Backends::FFI
453
+ TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
454
+ ```
455
+
456
+ See the [examples/](examples/) directory for 18 complete working examples demonstrating all backends and languages.
457
+
284
458
  ### Security Considerations
285
459
 
286
460
  **⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.**
@@ -591,6 +765,64 @@ parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
591
765
 
592
766
  This is safe and idempotent—if the real `TreeSitter` module is already loaded, the shim does nothing.
593
767
 
768
+ #### ⚠️ Critical: Exception Hierarchy Incompatibility
769
+
770
+ **ruby_tree_sitter v2+ exceptions inherit from `Exception` (not `StandardError`).**
771
+ **TreeHaver exceptions follow Ruby best practices and inherit from `StandardError`.**
772
+
773
+ This means exception handling behaves **differently** between the two:
774
+
775
+ | Scenario | ruby_tree_sitter v2+ | TreeHaver Compat Mode |
776
+ |----------|---------------------|----------------------|
777
+ | `rescue => e` | Does NOT catch TreeSitter errors | DOES catch TreeHaver errors |
778
+ | Behavior | Errors propagate (inherit Exception) | Errors caught (inherit StandardError) |
779
+
780
+ **Example showing the difference:**
781
+
782
+ ```ruby
783
+ # With real ruby_tree_sitter v2+
784
+ begin
785
+ TreeSitter::Language.load("missing", "/nonexistent.so")
786
+ rescue => e
787
+ puts "Caught!" # Never reached - TreeSitter errors inherit Exception
788
+ end
789
+
790
+ # With TreeHaver compat mode
791
+ require "tree_haver/compat"
792
+ begin
793
+ TreeSitter::Language.load("missing", "/nonexistent.so") # Actually TreeHaver
794
+ rescue => e
795
+ puts "Caught!" # WILL be reached - TreeHaver errors inherit StandardError
796
+ end
797
+ ```
798
+
799
+ **To write compatible exception handling:**
800
+
801
+ ```ruby
802
+ # Option 1: Catch specific exception (works with both)
803
+ begin
804
+ TreeSitter::Language.load(...)
805
+ rescue TreeSitter::TreeSitterError => e # Explicit rescue
806
+ # Works with both ruby_tree_sitter and TreeHaver compat mode
807
+ end
808
+
809
+ # Option 2: Use TreeHaver API directly (recommended)
810
+ begin
811
+ TreeHaver::Language.from_library(...)
812
+ rescue TreeHaver::NotAvailable => e # TreeHaver's unified exception
813
+ # Clear and consistent when using TreeHaver
814
+ end
815
+ ```
816
+
817
+ **Why TreeHaver uses StandardError:**
818
+
819
+ 1. **Ruby Best Practice**: The [Ruby style guide](https://rubystyle.guide/#exception-flow-control) recommends inheriting from `StandardError`
820
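+ 2. **Safety**: Catching errors outside the `StandardError` tree generically takes a broad `rescue Exception`, which also intercepts signals (`SIGTERM`, `SIGINT`, raised as `SignalException`/`Interrupt`) and `exit` (`SystemExit`), which is dangerous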
821
+ 3. **Consistency**: Most Ruby libraries follow this convention
822
+ 4. **Testability**: StandardError exceptions are easier to test and mock
823
+
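The safety point can be checked in plain Ruby without any parsing library. In this sketch, `RescueDemo` is a hypothetical module written for illustration; it relies only on the standard behavior that `Kernel#exit` raises `SystemExit`, which is not a `StandardError`.

```ruby
module RescueDemo
  # A broad rescue intercepts SystemExit, so the exit request is swallowed
  # and the program keeps running.
  def self.broad_rescue
    exit(1)
  rescue Exception => e # rubocop:disable Lint/RescueException
    e.class
  end

  # A bare rescue only handles StandardError, so SystemExit passes through
  # it and reaches the outer handler instead.
  def self.bare_rescue
    begin
      exit(1)
    rescue
      return :caught_by_bare_rescue
    end
  rescue SystemExit
    :propagated
  end
end

puts RescueDemo.broad_rescue # prints "SystemExit"
puts RescueDemo.bare_rescue  # prints "propagated"
```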
824
+ See `lib/tree_haver/compat.rb` for detailed documentation.
825
+
594
826
  ## 🔧 Basic Usage
595
827
 
596
828
  ### Quick Start
@@ -657,6 +889,38 @@ parser.language = toml_language
657
889
  tree = parser.parse(toml_source)
658
890
  ```
659
891
 
892
+ #### Flexible Language Names
893
+
894
+ The `name` parameter in `register_language` is an arbitrary identifier you choose—it doesn't
895
+ need to match the actual language name. The actual grammar identity comes from the `path`
896
+ and `symbol` parameters (for tree-sitter) or `grammar_module` (for Citrus).
897
+
898
+ This flexibility is useful for:
899
+
900
+ - **Aliasing**: Register the same grammar under multiple names
901
+ - **Versioning**: Register different grammar versions (e.g., `:ruby_2`, `:ruby_3`)
902
+ - **Testing**: Use unique names to avoid collisions between tests
903
+ - **Context-specific naming**: Use names that make sense for your application
904
+
905
+ ```ruby
906
+ # Register the same TOML grammar under different names for different purposes
907
+ TreeHaver.register_language(
908
+ :config_parser, # Custom name for your app
909
+ path: "/usr/local/lib/libtree-sitter-toml.so",
910
+ symbol: "tree_sitter_toml",
911
+ )
912
+
913
+ TreeHaver.register_language(
914
+ :toml_v1, # Version-specific name
915
+ path: "/usr/local/lib/libtree-sitter-toml.so",
916
+ symbol: "tree_sitter_toml",
917
+ )
918
+
919
+ # Use your custom names
920
+ config_lang = TreeHaver::Language.config_parser
921
+ versioned_lang = TreeHaver::Language.toml_v1
922
+ ```
923
+
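To see why the name is only a lookup key, here is a minimal registry sketch. `LanguageRegistry` and its `Entry` struct are hypothetical and are not TreeHaver's internal data structures; the sketch assumes nothing more than a mutex-guarded hash keyed by the name you choose.

```ruby
# Hypothetical registry sketch; class and method names are assumptions.
class LanguageRegistry
  # The grammar identity is carried by path and symbol, not by the name.
  Entry = Struct.new(:path, :symbol)

  def initialize
    @entries = {}
    @mutex = Mutex.new
  end

  # name is an arbitrary key chosen by the caller.
  def register(name, path:, symbol:)
    @mutex.synchronize { @entries[name.to_sym] = Entry.new(path, symbol) }
  end

  # Raises KeyError when the name was never registered.
  def fetch(name)
    @mutex.synchronize { @entries.fetch(name.to_sym) }
  end
end

registry = LanguageRegistry.new
toml = { path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml" }
registry.register(:config_parser, **toml)
registry.register(:toml_v1, **toml)

# Two different names resolve to the same grammar identity.
puts registry.fetch(:config_parser) == registry.fetch(:toml_v1) # prints "true"
```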
660
924
  ### Parsing Different Languages
661
925
 
662
926
  TreeHaver works with any tree-sitter grammar:
@@ -875,23 +1139,90 @@ TreeHaver.backend = :mri
875
1139
  TreeHaver.backend = :citrus
876
1140
  ```
877
1141
 
878
- ### Advanced: Testing with Multiple Backends
1142
+ ### Advanced: Thread-Safe Backend Switching
1143
+
1144
+ TreeHaver provides `with_backend` for thread-safe, temporary backend switching. This is
1145
+ essential for testing, benchmarking, and applications that need different backends in
1146
+ different contexts.
879
1147
 
880
- If you're developing a library that uses TreeHaver, you can test against different backends:
1148
+ #### Testing with Multiple Backends
1149
+
1150
+ Test the same code path with different backends using `with_backend`:
881
1151
 
882
1152
  ```ruby
883
1153
  # In your test setup
884
1154
  RSpec.describe("MyParser") do
885
- before do
886
- TreeHaver.reset_backend!(to: :ffi)
1155
+ # Test with each available backend
1156
+ [:mri, :rust, :citrus].each do |backend_name|
1157
+ context "with #{backend_name} backend" do
1158
+ it "parses correctly" do
1159
+ TreeHaver.with_backend(backend_name) do
1160
+ parser = TreeHaver::Parser.new
1161
+ result = parser.parse("x = 42")
1162
+ expect(result.root_node.type).to eq("document")
1163
+ end
1164
+ # Backend automatically restored after block
1165
+ end
1166
+ end
887
1167
  end
1168
+ end
1169
+ ```
1170
+
1171
+ #### Thread Isolation
1172
+
1173
+ Each thread can use a different backend safely—`with_backend` uses thread-local storage:
888
1174
 
889
- after do
890
- TreeHaver.reset_backend!(to: :auto)
1175
+ ```ruby
1176
+ threads = []
1177
+
1178
+ threads << Thread.new do
1179
+ TreeHaver.with_backend(:mri) do
1180
+ # This thread uses MRI backend
1181
+ parser = TreeHaver::Parser.new
1182
+ 100.times { parser.parse("x = 1") }
891
1183
  end
1184
+ end
892
1185
 
893
- it "parses correctly with FFI backend" do
894
- # Your test code
1186
+ threads << Thread.new do
1187
+ TreeHaver.with_backend(:citrus) do
1188
+ # This thread uses Citrus backend simultaneously
1189
+ parser = TreeHaver::Parser.new
1190
+ 100.times { parser.parse("x = 1") }
1191
+ end
1192
+ end
1193
+
1194
+ threads.each(&:join)
1195
+ ```
1196
+
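For readers curious how a thread-local, nestable override can work, the following is a minimal sketch. `BackendContext` is hypothetical and is not TreeHaver's implementation; it assumes the override is kept in a thread variable and restored in an `ensure` clause, which is enough to give both nesting and per-thread isolation.

```ruby
# Hypothetical sketch of a thread-local, nestable backend override.
module BackendContext
  KEY = :backend_context_override
  @default = :auto

  class << self
    attr_accessor :default

    # The override for the current thread, or the global default.
    def effective
      Thread.current.thread_variable_get(KEY) || default
    end

    # Set the override for the duration of the block, then restore the
    # previous value even if the block raises.
    def with_backend(name)
      previous = Thread.current.thread_variable_get(KEY)
      Thread.current.thread_variable_set(KEY, name)
      yield
    ensure
      Thread.current.thread_variable_set(KEY, previous)
    end
  end
end

seen = []
BackendContext.with_backend(:rust) do
  seen << BackendContext.effective
  BackendContext.with_backend(:citrus) { seen << BackendContext.effective }
  seen << BackendContext.effective
  # A different thread does not see this thread's override.
  seen << Thread.new { BackendContext.effective }.value
end
seen << BackendContext.effective

p seen # prints [:rust, :citrus, :rust, :auto, :auto]
```

Thread variables are used here rather than `Thread.current[]`, because the latter is fiber-local in Ruby.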
1197
+ #### Nested Blocks
1198
+
1199
+ `with_backend` supports nesting—inner blocks override outer blocks:
1200
+
1201
+ ```ruby
1202
+ TreeHaver.with_backend(:rust) do
1203
+ puts TreeHaver.effective_backend # => :rust
1204
+
1205
+ TreeHaver.with_backend(:citrus) do
1206
+ puts TreeHaver.effective_backend # => :citrus
1207
+ end
1208
+
1209
+ puts TreeHaver.effective_backend # => :rust (restored)
1210
+ end
1211
+ ```
1212
+
1213
+ #### Fallback Pattern
1214
+
1215
+ Try one backend, fall back to another on failure:
1216
+
1217
+ ```ruby
1218
+ def parse_with_fallback(source)
1219
+ TreeHaver.with_backend(:mri) do
1220
+ TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
1221
+ end
1222
+ rescue TreeHaver::NotAvailable
1223
+ # Fall back to Citrus if MRI backend unavailable
1224
+ TreeHaver.with_backend(:citrus) do
1225
+ TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
895
1226
  end
896
1227
  end
897
1228
  ```
@@ -1292,7 +1623,7 @@ Thanks for RTFM. ☺️
1292
1623
  [📌gitmoji]: https://gitmoji.dev
1293
1624
  [📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
1294
1625
  [🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
1295
- [🧮kloc-img]: https://img.shields.io/badge/KLOC-0.726-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
1626
+ [🧮kloc-img]: https://img.shields.io/badge/KLOC-1.067-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
1296
1627
  [🔐security]: SECURITY.md
1297
1628
  [🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
1298
1629
  [📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year
@@ -86,10 +86,16 @@ module TreeHaver
86
86
  # # For TOML, use toml-rb's grammar
87
87
  # language = TreeHaver::Backends::Citrus::Language.new(TomlRB::Document)
88
88
  class Language
89
+ include Comparable
90
+
89
91
  # The Citrus grammar module
90
92
  # @return [Module] Citrus grammar module (e.g., TomlRB::Document)
91
93
  attr_reader :grammar_module
92
94
 
95
+ # The backend this language is for
96
+ # @return [Symbol]
97
+ attr_reader :backend
98
+
93
99
  # @param grammar_module [Module] A Citrus grammar module with a parse method
94
100
  def initialize(grammar_module)
95
101
  unless grammar_module.respond_to?(:parse)
@@ -98,8 +104,33 @@ module TreeHaver
98
104
  "Expected a Citrus grammar module (e.g., TomlRB::Document)."
99
105
  end
100
106
  @grammar_module = grammar_module
107
+ @backend = :citrus
108
+ end
109
+
110
+ # Compare languages for equality
111
+ #
112
+ # Citrus languages are equal if they have the same backend and grammar_module.
113
+ # Grammar module uniquely identifies a Citrus language.
114
+ #
115
+ # @param other [Object] object to compare with
116
+ # @return [Integer, nil] -1, 0, 1, or nil if not comparable
117
+ def <=>(other)
118
+ return unless other.is_a?(Language)
119
+ return unless other.backend == @backend
120
+
121
+ # Compare by grammar_module name (modules are compared by object_id by default)
122
+ @grammar_module.name <=> other.grammar_module.name
123
+ end
124
+
125
+ # Hash value for this language (for use in Sets/Hashes)
126
+ # @return [Integer]
127
+ def hash
128
+ [@backend, @grammar_module.name].hash
101
129
  end
102
130
 
131
+ # Alias eql? to ==
132
+ alias_method :eql?, :==
133
+
103
134
  # Not applicable for Citrus (tree-sitter-specific)
104
135
  #
105
136
  # Citrus grammars are Ruby modules, not shared libraries.
@@ -131,30 +162,29 @@ module TreeHaver
131
162
 
132
163
  # Set the grammar for this parser
133
164
  #
134
- # @param grammar [Language, Module] Citrus grammar module or Language wrapper
135
- # @return [Language, Module] the grammar that was set
165
+ # Note: TreeHaver::Parser unwraps language objects before calling this method.
166
+ # This backend receives the raw Citrus grammar module (unwrapped), not the Language wrapper.
167
+ #
168
+ # @param grammar [Module] Citrus grammar module with a parse method
169
+ # @return [void]
136
170
  # @example
137
171
  # require "toml-rb"
138
- # parser.language = TomlRB::Document # Pass module directly
139
- # # or
140
- # parser.language = TreeHaver::Backends::Citrus::Language.new(TomlRB::Document)
172
+ # # TreeHaver::Parser unwraps Language.new(TomlRB::Document) to just TomlRB::Document
173
+ # parser.language = TomlRB::Document # Backend receives unwrapped module
141
174
  def language=(grammar)
142
- @grammar = if grammar.respond_to?(:grammar_module)
143
- grammar.grammar_module
144
- elsif grammar.respond_to?(:parse)
145
- grammar
146
- else
175
+ # grammar is already unwrapped by TreeHaver::Parser
176
+ unless grammar.respond_to?(:parse)
147
177
  raise ArgumentError,
148
- "Expected Citrus grammar module or Language wrapper, " \
178
+ "Expected Citrus grammar module with parse method, " \
149
179
  "got #{grammar.class}"
150
180
  end
151
- grammar
181
+ @grammar = grammar
152
182
  end
153
183
 
154
184
  # Parse source code
155
185
  #
156
186
  # @param source [String] the source code to parse
157
- # @return [TreeHaver::Tree] wrapped tree
187
+ # @return [Tree] raw backend tree (wrapping happens in TreeHaver::Parser)
158
188
  # @raise [TreeHaver::NotAvailable] if no grammar is set
159
189
  # @raise [::Citrus::ParseError] if parsing fails
160
190
  def parse(source)
@@ -162,8 +192,8 @@ module TreeHaver
162
192
 
163
193
  begin
164
194
  citrus_match = @grammar.parse(source)
165
- inner_tree = Tree.new(citrus_match, source)
166
- TreeHaver::Tree.new(inner_tree, source: source)
195
+ # Return raw Citrus::Tree - TreeHaver::Parser will wrap it
196
+ Tree.new(citrus_match, source)
167
197
  rescue ::Citrus::ParseError => e
168
198
  # Re-raise with more context
169
199
  raise TreeHaver::Error, "Parse error: #{e.message}"
@@ -176,8 +206,8 @@ module TreeHaver
176
206
  #
177
207
  # @param old_tree [TreeHaver::Tree, nil] ignored (no incremental parsing support)
178
208
  # @param source [String] the source code to parse
179
- # @return [TreeHaver::Tree] wrapped tree
180
- def parse_string(old_tree, source)
209
+ # @return [Tree] raw backend tree (wrapping happens in TreeHaver::Parser)
210
+ def parse_string(old_tree, source) # rubocop:disable Lint/UnusedMethodArgument
181
211
  parse(source) # Citrus doesn't support incremental parsing
182
212
  end
183
213
  end
@@ -213,6 +243,10 @@ module TreeHaver
213
243
  # - matches: child matches
214
244
  # - captures: named groups
215
245
  #
246
+ # Language-specific helpers can be mixed in for convenience:
247
+ # require "tree_haver/backends/citrus/toml_helpers"
248
+ # TreeHaver::Backends::Citrus::Node.include(TreeHaver::Backends::Citrus::TomlHelpers)
249
+ #
216
250
  # @api private
217
251
  class Node
218
252
  attr_reader :match, :source
@@ -224,17 +258,104 @@ module TreeHaver
224
258
 
225
259
  # Get node type from Citrus rule name
226
260
  #
261
+ # Uses Citrus grammar introspection to dynamically determine node types.
262
+ # Works with any Citrus grammar without language-specific knowledge.
263
+ #
264
+ # Strategy:
265
+ # 1. Check if first event has a .name method (returns Symbol) - use that
266
+ # 2. If first event is a Symbol directly - use that
267
+ # 3. For compound rules (Repeat, Choice), fall back to parsing the rule's string representation
268
+ #
227
269
  # @return [String] rule name from grammar
228
270
  def type
229
- # Citrus stores the rule name in events[0]
230
271
  return "unknown" unless @match.respond_to?(:events)
231
272
  return "unknown" unless @match.events.is_a?(Array)
232
273
  return "unknown" if @match.events.empty?
233
274
 
234
- first = @match.events.first
235
- first.is_a?(Symbol) ? first.to_s : "unknown"
275
+ extract_type_from_event(@match.events.first)
276
+ end
277
+
278
+ # Check if this node represents a structural element vs a terminal/token
279
+ #
280
+ # Uses Citrus grammar's terminal? method to determine if this is
281
+ # a structural rule (like "table", "keyvalue") vs a terminal token
282
+ # (like "[", "=", whitespace).
283
+ #
284
+ # @return [Boolean] true if this is a structural (non-terminal) node
285
+ def structural?
286
+ return false unless @match.respond_to?(:events)
287
+ return false if @match.events.empty?
288
+
289
+ first_event = @match.events.first
290
+
291
+ # Check if event has terminal? method (Citrus rule object)
292
+ if first_event.respond_to?(:terminal?)
293
+ return !first_event.terminal?
294
+ end
295
+
296
+ # For Symbol events, try to look up in grammar
297
+ if first_event.is_a?(Symbol) && @match.respond_to?(:grammar)
298
+ grammar = @match.grammar
299
+ if grammar.respond_to?(:rules) && grammar.rules.key?(first_event)
300
+ rule = grammar.rules[first_event]
301
+ return !rule.terminal? if rule.respond_to?(:terminal?)
302
+ end
303
+ end
304
+
305
+ # Default: assume structural if not a simple string/regex terminal
306
+ true
307
+ end
308
+
309
+ private
310
+
311
+ # Extract type name from a Citrus event object
312
+ #
313
+ # Handles different event types:
314
+ # - Objects with .name method (Citrus rule objects) -> use .name
315
+ # - Symbol -> use directly
316
+ # - Compound rules (Repeat, Choice) -> check string representation
317
+ #
318
+ # @param event [Object] Citrus event object
319
+ # @return [String] type name
320
+ def extract_type_from_event(event)
321
+ # Case 1: Event has .name method (returns Symbol)
322
+ if event.respond_to?(:name)
323
+ name = event.name
324
+ return name.to_s if name.is_a?(Symbol)
325
+ end
326
+
327
+ # Case 2: Event is a Symbol directly (most common for child nodes)
328
+ return event.to_s if event.is_a?(Symbol)
329
+
330
+ # Case 3: Event is a String
331
+ return event if event.is_a?(String)
332
+
333
+ # Case 4: For compound rules (Repeat, Choice), try string parsing first
334
+ # This avoids recursion issues
335
+ str = event.to_s
336
+
337
+ # Try to extract rule name from string representation
338
+ # Examples: "table", "(comment | table)*", "space?", etc.
339
+ if str =~ /^([a-z_][a-z0-9_]*)/i
340
+ return $1
341
+ end
342
+
343
+ # If we have a pattern like "(rule1 | rule2)*", we can't determine
344
+ # the type without looking at actual matches, but that causes recursion
345
+ # So just return a generic type based on the pattern
346
+ if /^\(.*\)\*$/.match?(str)
347
+ return "repeat"
348
+ elsif /^\(.*\)\?$/.match?(str)
349
+ return "optional"
350
+ elsif /^.*\|.*$/.match?(str)
351
+ return "choice"
352
+ end
353
+
354
+ "unknown"
236
355
  end
237
356
 
357
+ public
358
+
238
359
  def start_byte
239
360
  @match.offset
240
361
  end