tree_haver 2.0.0 → 3.0.0
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +190 -1
- data/CONTRIBUTING.md +100 -0
- data/README.md +342 -11
- data/lib/tree_haver/backends/citrus.rb +141 -20
- data/lib/tree_haver/backends/ffi.rb +338 -141
- data/lib/tree_haver/backends/java.rb +65 -16
- data/lib/tree_haver/backends/mri.rb +154 -17
- data/lib/tree_haver/backends/rust.rb +59 -16
- data/lib/tree_haver/citrus_grammar_finder.rb +170 -0
- data/lib/tree_haver/grammar_finder.rb +42 -7
- data/lib/tree_haver/language_registry.rb +62 -71
- data/lib/tree_haver/node.rb +150 -0
- data/lib/tree_haver/path_validator.rb +29 -24
- data/lib/tree_haver/tree.rb +63 -9
- data/lib/tree_haver/version.rb +2 -2
- data/lib/tree_haver.rb +697 -56
- data.tar.gz.sig +0 -0
- metadata +5 -4
- metadata.gz.sig +0 -0
data/README.md
CHANGED
|
@@ -67,7 +67,11 @@ If you've used [Faraday](https://github.com/lostisland/faraday), [multi_json](ht
|
|
|
67
67
|
| **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
|
|
68
68
|
| **TreeHaver** | tree-sitter parsing | ruby_tree_sitter, tree_stump, FFI, Java JARs, Citrus |
|
|
69
69
|
|
|
70
|
-
**Write once, run anywhere.**
|
|
70
|
+
**Write once, run anywhere.**
|
|
71
|
+
|
|
72
|
+
**Learn once, write anywhere.**
|
|
73
|
+
|
|
74
|
+
Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
|
|
71
75
|
|
|
72
76
|
```ruby
|
|
73
77
|
# Your code stays the same regardless of backend
|
|
@@ -76,7 +80,7 @@ parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
|
|
|
76
80
|
tree = parser.parse(source_code)
|
|
77
81
|
|
|
78
82
|
# TreeHaver automatically picks the best backend:
|
|
79
|
-
# - MRI → ruby_tree_sitter (C
|
|
83
|
+
# - MRI → ruby_tree_sitter (C extensions)
|
|
80
84
|
# - JRuby → FFI (system's libtree-sitter)
|
|
81
85
|
# - TruffleRuby → FFI or MRI backend
|
|
82
86
|
```
|
|
@@ -97,6 +101,74 @@ tree = parser.parse(source_code)
|
|
|
97
101
|
- **Thread-Safe**: Built-in language registry with thread-safe caching
|
|
98
102
|
- **Minimal API Surface**: Simple, focused API that covers the most common tree-sitter use cases
|
|
99
103
|
|
|
104
|
+
### Backend Requirements
|
|
105
|
+
|
|
106
|
+
TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
|
|
107
|
+
|
|
108
|
+
#### MRI Backend (ruby_tree_sitter, C extensions)
|
|
109
|
+
|
|
110
|
+
**Requires `ruby_tree_sitter` v2.0+**
|
|
111
|
+
|
|
112
|
+
In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from `Exception` (not `StandardError`). This was an intentional breaking change made for thread-safety and signal handling reasons.
|
|
113
|
+
|
|
114
|
+
**Exception Mapping**: TreeHaver catches `TreeSitter::TreeSitterError` and its subclasses, converting them to `TreeHaver::NotAvailable` while preserving the original error message. This provides a consistent exception API across all backends:
|
|
115
|
+
|
|
116
|
+
| ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
|
|
117
|
+
|-------------------------------------|----------------------------|------------------------------------------------|
|
|
118
|
+
| `TreeSitter::ParserNotFoundError` | `TreeHaver::NotAvailable` | Parser library file cannot be loaded |
|
|
119
|
+
| `TreeSitter::LanguageLoadError` | `TreeHaver::NotAvailable` | Language symbol loads but returns nothing |
|
|
120
|
+
| `TreeSitter::SymbolNotFoundError` | `TreeHaver::NotAvailable` | Symbol not found in library |
|
|
121
|
+
| `TreeSitter::ParserVersionError` | `TreeHaver::NotAvailable` | Parser version incompatible with tree-sitter |
|
|
122
|
+
| `TreeSitter::QueryCreationError` | `TreeHaver::NotAvailable` | Query creation fails |
|
|
123
|
+
|
|
124
|
+
```ruby
|
|
125
|
+
# Add to your Gemfile for MRI backend
|
|
126
|
+
gem "ruby_tree_sitter", "~> 2.0"
|
|
127
|
+
```
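The exception mapping described above can be illustrated with a minimal, self-contained sketch. The module `FakeBackend`, the class `UnifiedNotAvailable`, and the helper `with_unified_errors` are hypothetical stand-ins invented for this example; they are not part of ruby_tree_sitter or TreeHaver, and the real implementation may differ.

```ruby
# Hypothetical stand-ins for a backend's exception classes. As described
# above for ruby_tree_sitter v2+, they inherit from Exception.
module FakeBackend
  class BackendError < Exception; end
  class ParserNotFoundError < BackendError; end
end

# Hypothetical unified error that inherits from StandardError.
class UnifiedNotAvailable < StandardError; end

# Runs the block and translates any backend error into the unified
# error, keeping the original message.
def with_unified_errors
  yield
rescue FakeBackend::BackendError => e
  raise UnifiedNotAvailable, e.message
end

begin
  with_unified_errors do
    raise FakeBackend::ParserNotFoundError, "cannot load /nonexistent.so"
  end
rescue UnifiedNotAvailable => e
  puts "#{e.class}: #{e.message}"
end
```

Because the new error is raised inside a `rescue` clause, Ruby also records the original exception as `e.cause`, so the backend-specific class remains available for debugging.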
|
|
128
|
+
|
|
129
|
+
#### Rust Backend (tree_stump)
|
|
130
|
+
|
|
131
|
+
Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs are merged.
|
|
132
|
+
|
|
133
|
+
```ruby
|
|
134
|
+
# Add to your Gemfile for Rust backend
|
|
135
|
+
gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
|
|
136
|
+
```
|
|
137
|
+
|
|
138
|
+
#### FFI Backend
|
|
139
|
+
|
|
140
|
+
Requires the `ffi` gem and a system installation of `libtree-sitter`:
|
|
141
|
+
|
|
142
|
+
```ruby
|
|
143
|
+
# Add to your Gemfile for FFI backend
|
|
144
|
+
gem "ffi", ">= 1.15", "< 2.0"
|
|
145
|
+
```
|
|
146
|
+
|
|
147
|
+
```bash
|
|
148
|
+
# Install libtree-sitter on your system:
|
|
149
|
+
# macOS
|
|
150
|
+
brew install tree-sitter
|
|
151
|
+
|
|
152
|
+
# Ubuntu/Debian
|
|
153
|
+
apt-get install libtree-sitter0 libtree-sitter-dev
|
|
154
|
+
|
|
155
|
+
# Fedora
|
|
156
|
+
dnf install tree-sitter tree-sitter-devel
|
|
157
|
+
```
|
|
158
|
+
|
|
159
|
+
#### Citrus Backend
|
|
160
|
+
|
|
161
|
+
Pure Ruby parser with no native dependencies:
|
|
162
|
+
|
|
163
|
+
```ruby
|
|
164
|
+
# Add to your Gemfile for Citrus backend
|
|
165
|
+
gem "citrus", "~> 3.0"
|
|
166
|
+
```
|
|
167
|
+
|
|
168
|
+
#### Java Backend (JRuby only)
|
|
169
|
+
|
|
170
|
+
No additional dependencies required beyond grammar JARs built for java-tree-sitter.
|
|
171
|
+
|
|
100
172
|
### Why TreeHaver?
|
|
101
173
|
|
|
102
174
|
tree-sitter is a powerful parser generator that creates incremental parsers for many programming languages. However, integrating it into Ruby applications can be challenging:
|
|
@@ -281,6 +353,108 @@ NOTE: Be prepared to track down certs for signed gems and add them the same way
|
|
|
281
353
|
|
|
282
354
|
## ⚙️ Configuration
|
|
283
355
|
|
|
356
|
+
### Available Backends
|
|
357
|
+
|
|
358
|
+
TreeHaver supports multiple parsing backends, each with different trade-offs. The `auto` backend automatically selects the best available option.
|
|
359
|
+
|
|
360
|
+
| Backend | Description | Performance | Portability | Examples |
|
|
361
|
+
|---------|-------------|-------------|-------------|----------|
|
|
362
|
+
| **Auto** | Auto-selects best backend | Varies | ✅ Universal | [JSON](examples/auto_json.rb) · [JSONC](examples/auto_jsonc.rb) · [Bash](examples/auto_bash.rb) |
|
|
363
|
+
| **MRI** | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | [JSON](examples/mri_json.rb) · [JSONC](examples/mri_jsonc.rb) · ~~Bash~~* |
|
|
364
|
+
| **Rust** | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | [JSON](examples/rust_json.rb) · [JSONC](examples/rust_jsonc.rb) · ~~Bash~~* |
|
|
365
|
+
| **FFI** | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | [JSON](examples/ffi_json.rb) · [JSONC](examples/ffi_jsonc.rb) · [Bash](examples/ffi_bash.rb) |
|
|
366
|
+
| **Java** | JNI bindings | ⚡ Very Fast | JRuby only | [JSON](examples/java_json.rb) · [JSONC](examples/java_jsonc.rb) · [Bash](examples/java_bash.rb) |
|
|
367
|
+
| **Citrus** | Pure Ruby parsing | 🟡 Slower | ✅ Universal | [TOML](examples/citrus_toml.rb) · [Finitio](examples/citrus_finitio.rb) · [Dhall](examples/citrus_dhall.rb) |
|
|
368
|
+
|
|
369
|
+
**Selection Priority (Auto mode):** MRI → Rust → FFI → Java → Citrus
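As a rough illustration of this priority order, the sketch below picks the first backend whose availability probe succeeds. `PRIORITY`, `select_backend`, and the probe lambdas are assumptions made for the example, not TreeHaver's actual detection logic.

```ruby
# Hypothetical sketch: priority-ordered backend selection.
# PRIORITY mirrors the order stated above; the probes are stand-ins.
PRIORITY = %i[mri rust ffi java citrus].freeze

# Returns the first backend in PRIORITY whose probe reports true,
# or nil when no backend is available.
def select_backend(probes)
  PRIORITY.find { |name| probes.fetch(name, -> { false }).call }
end

# Example: pretend only FFI and Citrus can be loaded.
probes = {
  ffi: -> { true },
  citrus: -> { true },
}
puts select_backend(probes).inspect
```

Here FFI wins over Citrus because it appears earlier in the list, even though both probes succeed.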
|
|
370
|
+
|
|
371
|
+
**Known Issues:**
|
|
372
|
+
- *MRI + Bash: ABI incompatibility (use FFI instead)
|
|
373
|
+
- *Rust + Bash: Version mismatch (use FFI instead)
|
|
374
|
+
|
|
375
|
+
**Backend Requirements:**
|
|
376
|
+
|
|
377
|
+
```ruby
|
|
378
|
+
# MRI Backend
|
|
379
|
+
gem 'ruby_tree_sitter'
|
|
380
|
+
|
|
381
|
+
# Rust Backend
|
|
382
|
+
gem 'tree_stump'
|
|
383
|
+
|
|
384
|
+
# FFI Backend
|
|
385
|
+
gem 'ffi'
|
|
386
|
+
|
|
387
|
+
# Citrus Backend
|
|
388
|
+
gem 'citrus'
|
|
389
|
+
# Plus grammar gems: toml-rb, dhall, finitio, etc.
|
|
390
|
+
```
|
|
391
|
+
|
|
392
|
+
**Force Specific Backend:**
|
|
393
|
+
|
|
394
|
+
```ruby
|
|
395
|
+
TreeHaver.backend = :ffi # Force FFI backend
|
|
396
|
+
TreeHaver.backend = :mri # Force MRI backend
|
|
397
|
+
TreeHaver.backend = :rust # Force Rust backend
|
|
398
|
+
TreeHaver.backend = :java # Force Java backend (JRuby)
|
|
399
|
+
TreeHaver.backend = :citrus # Force Citrus backend
|
|
400
|
+
TreeHaver.backend = :auto # Auto-select (default)
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
**Block-based Backend Switching:**
|
|
404
|
+
|
|
405
|
+
Use `with_backend` to temporarily switch backends for a specific block of code.
|
|
406
|
+
This is thread-safe and supports nesting—the previous backend is automatically
|
|
407
|
+
restored when the block exits (even if an exception is raised).
|
|
408
|
+
|
|
409
|
+
```ruby
|
|
410
|
+
# Temporarily use a specific backend
|
|
411
|
+
TreeHaver.with_backend(:mri) do
|
|
412
|
+
parser = TreeHaver::Parser.new
|
|
413
|
+
tree = parser.parse(source)
|
|
414
|
+
# All operations in this block use the MRI backend
|
|
415
|
+
end
|
|
416
|
+
# Backend is restored to its previous value here
|
|
417
|
+
|
|
418
|
+
# Nested blocks work correctly
|
|
419
|
+
TreeHaver.with_backend(:rust) do
|
|
420
|
+
# Uses :rust
|
|
421
|
+
TreeHaver.with_backend(:citrus) do
|
|
422
|
+
# Uses :citrus
|
|
423
|
+
parser = TreeHaver::Parser.new
|
|
424
|
+
end
|
|
425
|
+
# Back to :rust
|
|
426
|
+
end
|
|
427
|
+
# Back to original backend
|
|
428
|
+
```
|
|
429
|
+
|
|
430
|
+
This is particularly useful for:
|
|
431
|
+
|
|
432
|
+
- **Testing**: Test the same code with different backends
|
|
433
|
+
- **Performance comparison**: Benchmark different backends
|
|
434
|
+
- **Fallback scenarios**: Try one backend, fall back to another
|
|
435
|
+
- **Thread isolation**: Each thread can use a different backend safely
|
|
436
|
+
|
|
437
|
+
```ruby
|
|
438
|
+
# Example: Testing with multiple backends
|
|
439
|
+
[:mri, :rust, :citrus].each do |backend_name|
|
|
440
|
+
TreeHaver.with_backend(backend_name) do
|
|
441
|
+
parser = TreeHaver::Parser.new
|
|
442
|
+
result = parser.parse(source)
|
|
443
|
+
puts "#{backend_name}: #{result.root_node.type}"
|
|
444
|
+
end
|
|
445
|
+
end
|
|
446
|
+
```
|
|
447
|
+
|
|
448
|
+
**Check Backend Capabilities:**
|
|
449
|
+
|
|
450
|
+
```ruby
|
|
451
|
+
TreeHaver.backend # => :ffi
|
|
452
|
+
TreeHaver.backend_module # => TreeHaver::Backends::FFI
|
|
453
|
+
TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
|
|
454
|
+
```
|
|
455
|
+
|
|
456
|
+
See the [examples/](examples/) directory for 18 complete working examples demonstrating all backends and languages.
|
|
457
|
+
|
|
284
458
|
### Security Considerations
|
|
285
459
|
|
|
286
460
|
**⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.**
|
|
@@ -591,6 +765,64 @@ parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
|
|
|
591
765
|
|
|
592
766
|
This is safe and idempotent—if the real `TreeSitter` module is already loaded, the shim does nothing.
|
|
593
767
|
|
|
768
|
+
#### ⚠️ Critical: Exception Hierarchy Incompatibility
|
|
769
|
+
|
|
770
|
+
**ruby_tree_sitter v2+ exceptions inherit from `Exception` (not `StandardError`).**
|
|
771
|
+
**TreeHaver exceptions follow Ruby best practices and inherit from `StandardError`.**
|
|
772
|
+
|
|
773
|
+
This means exception handling behaves **differently** between the two:
|
|
774
|
+
|
|
775
|
+
| Scenario | ruby_tree_sitter v2+ | TreeHaver Compat Mode |
|
|
776
|
+
|----------|---------------------|----------------------|
|
|
777
|
+
| `rescue => e` | Does NOT catch TreeSitter errors | DOES catch TreeHaver errors |
|
|
778
|
+
| Behavior | Errors propagate (inherit Exception) | Errors caught (inherit StandardError) |
|
|
779
|
+
|
|
780
|
+
**Example showing the difference:**
|
|
781
|
+
|
|
782
|
+
```ruby
|
|
783
|
+
# With real ruby_tree_sitter v2+
|
|
784
|
+
begin
|
|
785
|
+
TreeSitter::Language.load("missing", "/nonexistent.so")
|
|
786
|
+
rescue => e
|
|
787
|
+
puts "Caught!" # Never reached - TreeSitter errors inherit Exception
|
|
788
|
+
end
|
|
789
|
+
|
|
790
|
+
# With TreeHaver compat mode
|
|
791
|
+
require "tree_haver/compat"
|
|
792
|
+
begin
|
|
793
|
+
TreeSitter::Language.load("missing", "/nonexistent.so") # Actually TreeHaver
|
|
794
|
+
rescue => e
|
|
795
|
+
puts "Caught!" # WILL be reached - TreeHaver errors inherit StandardError
|
|
796
|
+
end
|
|
797
|
+
```
|
|
798
|
+
|
|
799
|
+
**To write compatible exception handling:**
|
|
800
|
+
|
|
801
|
+
```ruby
|
|
802
|
+
# Option 1: Catch specific exception (works with both)
|
|
803
|
+
begin
|
|
804
|
+
TreeSitter::Language.load(...)
|
|
805
|
+
rescue TreeSitter::TreeSitterError => e # Explicit rescue
|
|
806
|
+
# Works with both ruby_tree_sitter and TreeHaver compat mode
|
|
807
|
+
end
|
|
808
|
+
|
|
809
|
+
# Option 2: Use TreeHaver API directly (recommended)
|
|
810
|
+
begin
|
|
811
|
+
TreeHaver::Language.from_library(...)
|
|
812
|
+
rescue TreeHaver::NotAvailable => e # TreeHaver's unified exception
|
|
813
|
+
# Clear and consistent when using TreeHaver
|
|
814
|
+
end
|
|
815
|
+
```
|
|
816
|
+
|
|
817
|
+
**Why TreeHaver uses StandardError:**
|
|
818
|
+
|
|
819
|
+
1. **Ruby Best Practice**: The [Ruby style guide](https://rubystyle.guide/#exception-flow-control) recommends inheriting from `StandardError`
|
|
820
|
+
2. **Safety**: Errors that inherit from `Exception` push callers toward `rescue Exception`, which also catches system signals (`SIGTERM`, `SIGINT`) and `exit`, which is dangerous
|
|
821
|
+
3. **Consistency**: Most Ruby libraries follow this convention
|
|
822
|
+
4. **Testability**: StandardError exceptions are easier to test and mock
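The safety point in the list above can be checked with plain Ruby, independent of any parsing library. In this sketch, `run_with_rescue` is a hypothetical helper written for the example; `Interrupt` is the standard exception Ruby raises for `SIGINT`.

```ruby
# Hypothetical helper: runs the block and rescues only the given class.
def run_with_rescue(rescued_class)
  yield
  :completed
rescue rescued_class => e
  "swallowed #{e.class}"
end

# A broad `rescue Exception` also swallows Ctrl-C (Interrupt) ...
puts run_with_rescue(Exception) { raise Interrupt }

# ... while a StandardError rescue lets it propagate to the caller.
begin
  run_with_rescue(StandardError) { raise Interrupt }
rescue Interrupt
  puts "Interrupt propagated"
end
```

A library whose errors inherit from `StandardError` never forces its callers into the first, broader form.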
|
|
823
|
+
|
|
824
|
+
See `lib/tree_haver/compat.rb` for detailed documentation.
|
|
825
|
+
|
|
594
826
|
## 🔧 Basic Usage
|
|
595
827
|
|
|
596
828
|
### Quick Start
|
|
@@ -657,6 +889,38 @@ parser.language = toml_language
|
|
|
657
889
|
tree = parser.parse(toml_source)
|
|
658
890
|
```
|
|
659
891
|
|
|
892
|
+
#### Flexible Language Names
|
|
893
|
+
|
|
894
|
+
The `name` parameter in `register_language` is an arbitrary identifier you choose—it doesn't
|
|
895
|
+
need to match the actual language name. The actual grammar identity comes from the `path`
|
|
896
|
+
and `symbol` parameters (for tree-sitter) or `grammar_module` (for Citrus).
|
|
897
|
+
|
|
898
|
+
This flexibility is useful for:
|
|
899
|
+
|
|
900
|
+
- **Aliasing**: Register the same grammar under multiple names
|
|
901
|
+
- **Versioning**: Register different grammar versions (e.g., `:ruby_2`, `:ruby_3`)
|
|
902
|
+
- **Testing**: Use unique names to avoid collisions between tests
|
|
903
|
+
- **Context-specific naming**: Use names that make sense for your application
|
|
904
|
+
|
|
905
|
+
```ruby
|
|
906
|
+
# Register the same TOML grammar under different names for different purposes
|
|
907
|
+
TreeHaver.register_language(
|
|
908
|
+
:config_parser, # Custom name for your app
|
|
909
|
+
path: "/usr/local/lib/libtree-sitter-toml.so",
|
|
910
|
+
symbol: "tree_sitter_toml",
|
|
911
|
+
)
|
|
912
|
+
|
|
913
|
+
TreeHaver.register_language(
|
|
914
|
+
:toml_v1, # Version-specific name
|
|
915
|
+
path: "/usr/local/lib/libtree-sitter-toml.so",
|
|
916
|
+
symbol: "tree_sitter_toml",
|
|
917
|
+
)
|
|
918
|
+
|
|
919
|
+
# Use your custom names
|
|
920
|
+
config_lang = TreeHaver::Language.config_parser
|
|
921
|
+
versioned_lang = TreeHaver::Language.toml_v1
|
|
922
|
+
```
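To make the separation between the registered name and the grammar identity concrete, here is a minimal sketch of a name-keyed registry. `SketchRegistry` and its `Entry` struct are hypothetical and only model the idea; TreeHaver's own language registry may store and look up entries differently.

```ruby
# Hypothetical registry: the key is a caller-chosen label, the value
# carries what actually identifies the grammar (path and symbol).
class SketchRegistry
  Entry = Struct.new(:path, :symbol)

  def initialize
    @entries = {}
    @lock = Mutex.new
  end

  # Stores the grammar details under an arbitrary name.
  def register(name, path:, symbol:)
    @lock.synchronize { @entries[name.to_sym] = Entry.new(path, symbol) }
  end

  # Looks up the grammar details; raises KeyError for unknown names.
  def fetch(name)
    @lock.synchronize { @entries.fetch(name.to_sym) }
  end
end

registry = SketchRegistry.new
toml = { path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml" }
registry.register(:config_parser, **toml)
registry.register(:toml_v1, **toml)

# Two labels, one grammar identity.
puts registry.fetch(:config_parser) == registry.fetch(:toml_v1)
```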
|
|
923
|
+
|
|
660
924
|
### Parsing Different Languages
|
|
661
925
|
|
|
662
926
|
TreeHaver works with any tree-sitter grammar:
|
|
@@ -875,23 +1139,90 @@ TreeHaver.backend = :mri
|
|
|
875
1139
|
TreeHaver.backend = :citrus
|
|
876
1140
|
```
|
|
877
1141
|
|
|
878
|
-
### Advanced:
|
|
1142
|
+
### Advanced: Thread-Safe Backend Switching
|
|
1143
|
+
|
|
1144
|
+
TreeHaver provides `with_backend` for thread-safe, temporary backend switching. This is
|
|
1145
|
+
essential for testing, benchmarking, and applications that need different backends in
|
|
1146
|
+
different contexts.
|
|
879
1147
|
|
|
880
|
-
|
|
1148
|
+
#### Testing with Multiple Backends
|
|
1149
|
+
|
|
1150
|
+
Test the same code path with different backends using `with_backend`:
|
|
881
1151
|
|
|
882
1152
|
```ruby
|
|
883
1153
|
# In your test setup
|
|
884
1154
|
RSpec.describe("MyParser") do
|
|
885
|
-
|
|
886
|
-
|
|
1155
|
+
# Test with each available backend
|
|
1156
|
+
[:mri, :rust, :citrus].each do |backend_name|
|
|
1157
|
+
context "with #{backend_name} backend" do
|
|
1158
|
+
it "parses correctly" do
|
|
1159
|
+
TreeHaver.with_backend(backend_name) do
|
|
1160
|
+
parser = TreeHaver::Parser.new
|
|
1161
|
+
result = parser.parse("x = 42")
|
|
1162
|
+
expect(result.root_node.type).to eq("document")
|
|
1163
|
+
end
|
|
1164
|
+
# Backend automatically restored after block
|
|
1165
|
+
end
|
|
1166
|
+
end
|
|
887
1167
|
end
|
|
1168
|
+
end
|
|
1169
|
+
```
|
|
1170
|
+
|
|
1171
|
+
#### Thread Isolation
|
|
1172
|
+
|
|
1173
|
+
Each thread can use a different backend safely—`with_backend` uses thread-local storage:
|
|
888
1174
|
|
|
889
|
-
|
|
890
|
-
|
|
1175
|
+
```ruby
|
|
1176
|
+
threads = []
|
|
1177
|
+
|
|
1178
|
+
threads << Thread.new do
|
|
1179
|
+
TreeHaver.with_backend(:mri) do
|
|
1180
|
+
# This thread uses MRI backend
|
|
1181
|
+
parser = TreeHaver::Parser.new
|
|
1182
|
+
100.times { parser.parse("x = 1") }
|
|
891
1183
|
end
|
|
1184
|
+
end
|
|
892
1185
|
|
|
893
|
-
|
|
894
|
-
|
|
1186
|
+
threads << Thread.new do
|
|
1187
|
+
TreeHaver.with_backend(:citrus) do
|
|
1188
|
+
# This thread uses Citrus backend simultaneously
|
|
1189
|
+
parser = TreeHaver::Parser.new
|
|
1190
|
+
100.times { parser.parse("x = 1") }
|
|
1191
|
+
end
|
|
1192
|
+
end
|
|
1193
|
+
|
|
1194
|
+
threads.each(&:join)
|
|
1195
|
+
```
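One way such thread-local switching could work is a per-thread stack that is popped in an `ensure` clause. The sketch below is an assumption about the general technique, not TreeHaver's implementation; the module `BackendSwitch`, its constants, and its methods are hypothetical names chosen for the example.

```ruby
# Hypothetical sketch of thread-local, nestable backend switching.
module BackendSwitch
  KEY = :sketch_backend_stack
  DEFAULT = :auto

  # Backend in effect for the calling thread.
  def self.effective_backend
    stack.last || DEFAULT
  end

  # Pushes a backend for the duration of the block. The `ensure` clause
  # pops it even when the block raises, so nesting restores correctly.
  def self.with_backend(name)
    stack.push(name)
    yield
  ensure
    stack.pop
  end

  # Per-thread stack kept in a thread-local variable.
  def self.stack
    current = Thread.current
    current.thread_variable_set(KEY, []) unless current.thread_variable?(KEY)
    current.thread_variable_get(KEY)
  end
  private_class_method :stack
end

seen = %i[mri citrus].map do |name|
  Thread.new do
    BackendSwitch.with_backend(name) do
      sleep 0.01 # give the other thread a chance to run
      BackendSwitch.effective_backend
    end
  end
end.map(&:value)

puts seen.inspect
puts BackendSwitch.effective_backend
```

Each thread sees only the backend it pushed, and the main thread still reports the default because its own stack was never touched.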
|
|
1196
|
+
|
|
1197
|
+
#### Nested Blocks
|
|
1198
|
+
|
|
1199
|
+
`with_backend` supports nesting—inner blocks override outer blocks:
|
|
1200
|
+
|
|
1201
|
+
```ruby
|
|
1202
|
+
TreeHaver.with_backend(:rust) do
|
|
1203
|
+
puts TreeHaver.effective_backend # => :rust
|
|
1204
|
+
|
|
1205
|
+
TreeHaver.with_backend(:citrus) do
|
|
1206
|
+
puts TreeHaver.effective_backend # => :citrus
|
|
1207
|
+
end
|
|
1208
|
+
|
|
1209
|
+
puts TreeHaver.effective_backend # => :rust (restored)
|
|
1210
|
+
end
|
|
1211
|
+
```
|
|
1212
|
+
|
|
1213
|
+
#### Fallback Pattern
|
|
1214
|
+
|
|
1215
|
+
Try one backend, fall back to another on failure:
|
|
1216
|
+
|
|
1217
|
+
```ruby
|
|
1218
|
+
def parse_with_fallback(source)
|
|
1219
|
+
TreeHaver.with_backend(:mri) do
|
|
1220
|
+
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
|
|
1221
|
+
end
|
|
1222
|
+
rescue TreeHaver::NotAvailable
|
|
1223
|
+
# Fall back to Citrus if MRI backend unavailable
|
|
1224
|
+
TreeHaver.with_backend(:citrus) do
|
|
1225
|
+
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
|
|
895
1226
|
end
|
|
896
1227
|
end
|
|
897
1228
|
```
|
|
@@ -1292,7 +1623,7 @@ Thanks for RTFM. ☺️
|
|
|
1292
1623
|
[📌gitmoji]: https://gitmoji.dev
|
|
1293
1624
|
[📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
|
|
1294
1625
|
[🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
|
|
1295
|
-
[🧮kloc-img]: https://img.shields.io/badge/KLOC-
|
|
1626
|
+
[🧮kloc-img]: https://img.shields.io/badge/KLOC-1.067-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
|
|
1296
1627
|
[🔐security]: SECURITY.md
|
|
1297
1628
|
[🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
|
|
1298
1629
|
[📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year
|
data/lib/tree_haver/backends/citrus.rb
CHANGED
|
@@ -86,10 +86,16 @@ module TreeHaver
|
|
|
86
86
|
# # For TOML, use toml-rb's grammar
|
|
87
87
|
# language = TreeHaver::Backends::Citrus::Language.new(TomlRB::Document)
|
|
88
88
|
class Language
|
|
89
|
+
include Comparable
|
|
90
|
+
|
|
89
91
|
# The Citrus grammar module
|
|
90
92
|
# @return [Module] Citrus grammar module (e.g., TomlRB::Document)
|
|
91
93
|
attr_reader :grammar_module
|
|
92
94
|
|
|
95
|
+
# The backend this language is for
|
|
96
|
+
# @return [Symbol]
|
|
97
|
+
attr_reader :backend
|
|
98
|
+
|
|
93
99
|
# @param grammar_module [Module] A Citrus grammar module with a parse method
|
|
94
100
|
def initialize(grammar_module)
|
|
95
101
|
unless grammar_module.respond_to?(:parse)
|
|
@@ -98,8 +104,33 @@ module TreeHaver
|
|
|
98
104
|
"Expected a Citrus grammar module (e.g., TomlRB::Document)."
|
|
99
105
|
end
|
|
100
106
|
@grammar_module = grammar_module
|
|
107
|
+
@backend = :citrus
|
|
108
|
+
end
|
|
109
|
+
|
|
110
|
+
# Compare languages for equality
|
|
111
|
+
#
|
|
112
|
+
# Citrus languages are equal if they have the same backend and grammar_module.
|
|
113
|
+
# Grammar module uniquely identifies a Citrus language.
|
|
114
|
+
#
|
|
115
|
+
# @param other [Object] object to compare with
|
|
116
|
+
# @return [Integer, nil] -1, 0, 1, or nil if not comparable
|
|
117
|
+
def <=>(other)
|
|
118
|
+
return unless other.is_a?(Language)
|
|
119
|
+
return unless other.backend == @backend
|
|
120
|
+
|
|
121
|
+
# Compare by grammar_module name (modules are compared by object_id by default)
|
|
122
|
+
@grammar_module.name <=> other.grammar_module.name
|
|
123
|
+
end
|
|
124
|
+
|
|
125
|
+
# Hash value for this language (for use in Sets/Hashes)
|
|
126
|
+
# @return [Integer]
|
|
127
|
+
def hash
|
|
128
|
+
[@backend, @grammar_module.name].hash
|
|
101
129
|
end
|
|
102
130
|
|
|
131
|
+
# Alias eql? to ==
|
|
132
|
+
alias_method :eql?, :==
|
|
133
|
+
|
|
103
134
|
# Not applicable for Citrus (tree-sitter-specific)
|
|
104
135
|
#
|
|
105
136
|
# Citrus grammars are Ruby modules, not shared libraries.
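The equality changes in this hunk follow a common Ruby value-object pattern: `Comparable` supplies `==` from `<=>`, and `hash` plus `eql?` make instances usable as Hash keys and Set members. The sketch below reproduces that pattern with a hypothetical `SketchLanguage` class that stores a grammar name string instead of a real Citrus grammar module.

```ruby
require "set"

# Hypothetical value object: identity is (backend, grammar name).
class SketchLanguage
  include Comparable

  attr_reader :backend, :grammar_name

  def initialize(grammar_name, backend: :citrus)
    @grammar_name = grammar_name
    @backend = backend
  end

  # Returning nil signals "not comparable"; Comparable#== then yields false.
  def <=>(other)
    return unless other.is_a?(SketchLanguage)
    return unless other.backend == backend

    grammar_name <=> other.grammar_name
  end

  # Must agree with eql? so equal objects land in the same hash bucket.
  def hash
    [backend, grammar_name].hash
  end

  alias_method :eql?, :==
end

a = SketchLanguage.new("TomlRB::Document")
b = SketchLanguage.new("TomlRB::Document")
c = SketchLanguage.new("TomlRB::Document", backend: :other)

puts a == b
puts a == c
puts Set[a, b].size
```

One consequence of comparing by name is worth keeping in mind: an anonymous Ruby module has a `nil` name, and `nil <=> nil` is `0`, so two distinct anonymous grammar modules would presumably compare as equal under this scheme.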
|
|
@@ -131,30 +162,29 @@ module TreeHaver
|
|
|
131
162
|
|
|
132
163
|
# Set the grammar for this parser
|
|
133
164
|
#
|
|
134
|
-
#
|
|
135
|
-
#
|
|
165
|
+
# Note: TreeHaver::Parser unwraps language objects before calling this method.
|
|
166
|
+
# This backend receives the raw Citrus grammar module (unwrapped), not the Language wrapper.
|
|
167
|
+
#
|
|
168
|
+
# @param grammar [Module] Citrus grammar module with a parse method
|
|
169
|
+
# @return [void]
|
|
136
170
|
# @example
|
|
137
171
|
# require "toml-rb"
|
|
138
|
-
#
|
|
139
|
-
# #
|
|
140
|
-
# parser.language = TreeHaver::Backends::Citrus::Language.new(TomlRB::Document)
|
|
172
|
+
# # TreeHaver::Parser unwraps Language.new(TomlRB::Document) to just TomlRB::Document
|
|
173
|
+
# parser.language = TomlRB::Document # Backend receives unwrapped module
|
|
141
174
|
def language=(grammar)
|
|
142
|
-
|
|
143
|
-
|
|
144
|
-
elsif grammar.respond_to?(:parse)
|
|
145
|
-
grammar
|
|
146
|
-
else
|
|
175
|
+
# grammar is already unwrapped by TreeHaver::Parser
|
|
176
|
+
unless grammar.respond_to?(:parse)
|
|
147
177
|
raise ArgumentError,
|
|
148
|
-
"Expected Citrus grammar module
|
|
178
|
+
"Expected Citrus grammar module with parse method, " \
|
|
149
179
|
"got #{grammar.class}"
|
|
150
180
|
end
|
|
151
|
-
grammar
|
|
181
|
+
@grammar = grammar
|
|
152
182
|
end
|
|
153
183
|
|
|
154
184
|
# Parse source code
|
|
155
185
|
#
|
|
156
186
|
# @param source [String] the source code to parse
|
|
157
|
-
# @return [
|
|
187
|
+
# @return [Tree] raw backend tree (wrapping happens in TreeHaver::Parser)
|
|
158
188
|
# @raise [TreeHaver::NotAvailable] if no grammar is set
|
|
159
189
|
# @raise [::Citrus::ParseError] if parsing fails
|
|
160
190
|
def parse(source)
|
|
@@ -162,8 +192,8 @@ module TreeHaver
|
|
|
162
192
|
|
|
163
193
|
begin
|
|
164
194
|
citrus_match = @grammar.parse(source)
|
|
165
|
-
|
|
166
|
-
|
|
195
|
+
# Return raw Citrus::Tree - TreeHaver::Parser will wrap it
|
|
196
|
+
Tree.new(citrus_match, source)
|
|
167
197
|
rescue ::Citrus::ParseError => e
|
|
168
198
|
# Re-raise with more context
|
|
169
199
|
raise TreeHaver::Error, "Parse error: #{e.message}"
|
|
@@ -176,8 +206,8 @@ module TreeHaver
|
|
|
176
206
|
#
|
|
177
207
|
# @param old_tree [TreeHaver::Tree, nil] ignored (no incremental parsing support)
|
|
178
208
|
# @param source [String] the source code to parse
|
|
179
|
-
# @return [
|
|
180
|
-
def parse_string(old_tree, source)
|
|
209
|
+
# @return [Tree] raw backend tree (wrapping happens in TreeHaver::Parser)
|
|
210
|
+
def parse_string(old_tree, source) # rubocop:disable Lint/UnusedMethodArgument
|
|
181
211
|
parse(source) # Citrus doesn't support incremental parsing
|
|
182
212
|
end
|
|
183
213
|
end
|
|
@@ -213,6 +243,10 @@ module TreeHaver
|
|
|
213
243
|
# - matches: child matches
|
|
214
244
|
# - captures: named groups
|
|
215
245
|
#
|
|
246
|
+
# Language-specific helpers can be mixed in for convenience:
|
|
247
|
+
# require "tree_haver/backends/citrus/toml_helpers"
|
|
248
|
+
# TreeHaver::Backends::Citrus::Node.include(TreeHaver::Backends::Citrus::TomlHelpers)
|
|
249
|
+
#
|
|
216
250
|
# @api private
|
|
217
251
|
class Node
|
|
218
252
|
attr_reader :match, :source
|
|
@@ -224,17 +258,104 @@ module TreeHaver
|
|
|
224
258
|
|
|
225
259
|
# Get node type from Citrus rule name
|
|
226
260
|
#
|
|
261
|
+
# Uses Citrus grammar introspection to dynamically determine node types.
|
|
262
|
+
# Works with any Citrus grammar without language-specific knowledge.
|
|
263
|
+
#
|
|
264
|
+
# Strategy:
|
|
265
|
+
# 1. Check if first event has a .name method (returns Symbol) - use that
|
|
266
|
+
# 2. If first event is a Symbol directly - use that
|
|
267
|
+
# 3. For compound rules (Repeat, Choice), fall back to the rule's string representation
|
|
268
|
+
#
|
|
227
269
|
# @return [String] rule name from grammar
|
|
228
270
|
def type
|
|
229
|
-
# Citrus stores the rule name in events[0]
|
|
230
271
|
return "unknown" unless @match.respond_to?(:events)
|
|
231
272
|
return "unknown" unless @match.events.is_a?(Array)
|
|
232
273
|
return "unknown" if @match.events.empty?
|
|
233
274
|
|
|
234
|
-
|
|
235
|
-
|
|
275
|
+
extract_type_from_event(@match.events.first)
|
|
276
|
+
end
|
|
277
|
+
|
|
278
|
+
# Check if this node represents a structural element vs a terminal/token
|
|
279
|
+
#
|
|
280
|
+
# Uses Citrus grammar's terminal? method to determine if this is
|
|
281
|
+
# a structural rule (like "table", "keyvalue") vs a terminal token
|
|
282
|
+
# (like "[", "=", whitespace).
|
|
283
|
+
#
|
|
284
|
+
# @return [Boolean] true if this is a structural (non-terminal) node
|
|
285
|
+
def structural?
|
|
286
|
+
return false unless @match.respond_to?(:events)
|
|
287
|
+
return false if @match.events.empty?
|
|
288
|
+
|
|
289
|
+
first_event = @match.events.first
|
|
290
|
+
|
|
291
|
+
# Check if event has terminal? method (Citrus rule object)
|
|
292
|
+
if first_event.respond_to?(:terminal?)
|
|
293
|
+
return !first_event.terminal?
|
|
294
|
+
end
|
|
295
|
+
|
|
296
|
+
# For Symbol events, try to look up in grammar
|
|
297
|
+
if first_event.is_a?(Symbol) && @match.respond_to?(:grammar)
|
|
298
|
+
grammar = @match.grammar
|
|
299
|
+
if grammar.respond_to?(:rules) && grammar.rules.key?(first_event)
|
|
300
|
+
rule = grammar.rules[first_event]
|
|
301
|
+
return !rule.terminal? if rule.respond_to?(:terminal?)
|
|
302
|
+
end
|
|
303
|
+
end
|
|
304
|
+
|
|
305
|
+
# Default: assume structural if not a simple string/regex terminal
|
|
306
|
+
true
|
|
307
|
+
end
|
|
308
|
+
|
|
309
|
+
private
|
|
310
|
+
|
|
311
|
+
# Extract type name from a Citrus event object
|
|
312
|
+
#
|
|
313
|
+
# Handles different event types:
|
|
314
|
+
# - Objects with .name method (Citrus rule objects) -> use .name
|
|
315
|
+
# - Symbol -> use directly
|
|
316
|
+
# - Compound rules (Repeat, Choice) -> check string representation
|
|
317
|
+
#
|
|
318
|
+
# @param event [Object] Citrus event object
|
|
319
|
+
# @return [String] type name
|
|
320
|
+
def extract_type_from_event(event)
|
|
321
|
+
# Case 1: Event has .name method (returns Symbol)
|
|
322
|
+
if event.respond_to?(:name)
|
|
323
|
+
name = event.name
|
|
324
|
+
return name.to_s if name.is_a?(Symbol)
|
|
325
|
+
end
|
|
326
|
+
|
|
327
|
+
# Case 2: Event is a Symbol directly (most common for child nodes)
|
|
328
|
+
return event.to_s if event.is_a?(Symbol)
|
|
329
|
+
|
|
330
|
+
# Case 3: Event is a String
|
|
331
|
+
return event if event.is_a?(String)
|
|
332
|
+
|
|
333
|
+
# Case 4: For compound rules (Repeat, Choice), try string parsing first
|
|
334
|
+
# This avoids recursion issues
|
|
335
|
+
str = event.to_s
|
|
336
|
+
|
|
337
|
+
# Try to extract rule name from string representation
|
|
338
|
+
# Examples: "table", "(comment | table)*", "space?", etc.
|
|
339
|
+
if str =~ /^([a-z_][a-z0-9_]*)/i
|
|
340
|
+
return $1
|
|
341
|
+
end
|
|
342
|
+
|
|
343
|
+
# If we have a pattern like "(rule1 | rule2)*", we can't determine
|
|
344
|
+
# the type without looking at actual matches, but that causes recursion
|
|
345
|
+
# So just return a generic type based on the pattern
|
|
346
|
+
if /^\(.*\)\*$/.match?(str)
|
|
347
|
+
return "repeat"
|
|
348
|
+
elsif /^\(.*\)\?$/.match?(str)
|
|
349
|
+
return "optional"
|
|
350
|
+
elsif /^.*\|.*$/.match?(str)
|
|
351
|
+
return "choice"
|
|
352
|
+
end
|
|
353
|
+
|
|
354
|
+
"unknown"
|
|
236
355
|
end
|
|
237
356
|
|
|
357
|
+
public
|
|
358
|
+
|
|
238
359
|
def start_byte
|
|
239
360
|
@match.offset
|
|
240
361
|
end
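The string-based fallback in `extract_type_from_event` (Case 4 above) can be exercised in isolation. The sketch below copies the regular expressions from the diff into a hypothetical standalone method, `type_from_rule_string`, which takes the rule's string form directly; the sample rule strings are illustrative and not taken from a real Citrus grammar.

```ruby
# Hypothetical standalone version of the string-based fallback, using
# the same regexes as the diff so each branch can be tried directly.
def type_from_rule_string(str)
  # A leading identifier wins: "table" -> "table", "space?" -> "space".
  return Regexp.last_match(1) if str =~ /^([a-z_][a-z0-9_]*)/i

  if /^\(.*\)\*$/.match?(str)
    "repeat"    # e.g. "(comment | table)*"
  elsif /^\(.*\)\?$/.match?(str)
    "optional"  # e.g. "(space)?"
  elsif /^.*\|.*$/.match?(str)
    "choice"    # any remaining string containing "|"
  else
    "unknown"
  end
end

samples = ["table", "space?", "(comment | table)*", "(space)?", "\"a\" | \"b\"", "[0-9]+"]
samples.each do |rule|
  puts "#{rule.inspect} -> #{type_from_rule_string(rule)}"
end
```

Note the ordering effect: because the identifier check runs first, a rule string such as `space?` reports `space` rather than `optional`; only parenthesized patterns reach the repeat and optional branches.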
|