tree_haver 2.0.0 → 3.1.0
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +285 -1
- data/CONTRIBUTING.md +132 -0
- data/README.md +529 -36
- data/lib/tree_haver/backends/citrus.rb +177 -20
- data/lib/tree_haver/backends/commonmarker.rb +490 -0
- data/lib/tree_haver/backends/ffi.rb +341 -142
- data/lib/tree_haver/backends/java.rb +65 -16
- data/lib/tree_haver/backends/markly.rb +559 -0
- data/lib/tree_haver/backends/mri.rb +183 -17
- data/lib/tree_haver/backends/prism.rb +624 -0
- data/lib/tree_haver/backends/psych.rb +597 -0
- data/lib/tree_haver/backends/rust.rb +60 -17
- data/lib/tree_haver/citrus_grammar_finder.rb +170 -0
- data/lib/tree_haver/grammar_finder.rb +115 -11
- data/lib/tree_haver/language_registry.rb +62 -71
- data/lib/tree_haver/node.rb +220 -4
- data/lib/tree_haver/path_validator.rb +29 -24
- data/lib/tree_haver/tree.rb +63 -9
- data/lib/tree_haver/version.rb +2 -2
- data/lib/tree_haver.rb +835 -75
- data/sig/tree_haver.rbs +18 -1
- data.tar.gz.sig +0 -0
- metadata +9 -4
- metadata.gz.sig +0 -0
data/README.md
CHANGED
|
@@ -54,20 +54,24 @@
|
|
|
54
54
|
|
|
55
55
|
## 🌻 Synopsis
|
|
56
56
|
|
|
57
|
-
TreeHaver is a cross-Ruby adapter for the [tree-sitter](https://tree-sitter.github.io/tree-sitter/) parsing
|
|
57
|
+
TreeHaver is a cross-Ruby adapter for the [tree-sitter](https://tree-sitter.github.io/tree-sitter/) and [Citrus](https://github.com/mjackson/citrus) parsing libraries, as well as other dedicated parsing tools, that works across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using grammars, regardless of your Ruby implementation.
|
|
58
58
|
|
|
59
59
|
### The Adapter Pattern: Like Faraday, but for Parsing
|
|
60
60
|
|
|
61
61
|
If you've used [Faraday](https://github.com/lostisland/faraday), [multi_json](https://github.com/intridea/multi_json), or [multi_xml](https://github.com/sferik/multi_xml), you'll feel right at home with TreeHaver. These gems share a common philosophy:
|
|
62
62
|
|
|
63
|
-
| Gem | Unified API for | Backend Examples
|
|
64
|
-
|
|
65
|
-
| **Faraday** | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon
|
|
66
|
-
| **multi_json** | JSON parsing | Oj, Yajl, JSON gem
|
|
67
|
-
| **multi_xml** | XML parsing | Nokogiri, LibXML, Ox
|
|
68
|
-
| **TreeHaver** |
|
|
63
|
+
| Gem | Unified API for | Backend Examples |
|
|
64
|
+
|----------------|---------------------|--------------------------------------------------------------------------|
|
|
65
|
+
| **Faraday** | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
|
|
66
|
+
| **multi_json** | JSON parsing | Oj, Yajl, JSON gem |
|
|
67
|
+
| **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
|
|
68
|
+
| **TreeHaver** | Code parsing | MRI, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus (& Co.) |
|
|
69
69
|
|
|
70
|
-
**Write once, run anywhere.**
|
|
70
|
+
**Write once, run anywhere.**
|
|
71
|
+
|
|
72
|
+
**Learn once, write anywhere.**
|
|
73
|
+
|
|
74
|
+
Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap parsing backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
|
|
71
75
|
|
|
72
76
|
```ruby
|
|
73
77
|
# Your code stays the same regardless of backend
|
|
@@ -76,7 +80,7 @@ parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
|
|
|
76
80
|
tree = parser.parse(source_code)
|
|
77
81
|
|
|
78
82
|
# TreeHaver automatically picks the best backend:
|
|
79
|
-
# - MRI → ruby_tree_sitter (C
|
|
83
|
+
# - MRI → ruby_tree_sitter (C extensions)
|
|
80
84
|
# - JRuby → FFI (system's libtree-sitter)
|
|
81
85
|
# - TruffleRuby → FFI or MRI backend
|
|
82
86
|
```
|
|
@@ -84,18 +88,94 @@ tree = parser.parse(source_code)
|
|
|
84
88
|
### Key Features
|
|
85
89
|
|
|
86
90
|
- **Universal Ruby Support**: Works on MRI Ruby, JRuby, and TruffleRuby
|
|
87
|
-
- **
|
|
88
|
-
- **
|
|
89
|
-
|
|
90
|
-
- **
|
|
91
|
-
|
|
92
|
-
|
|
93
|
-
|
|
91
|
+
- **10 Parsing Backends** - Choose the right backend for your needs:
|
|
92
|
+
- **Tree-sitter Backends** (high-performance, incremental parsing):
|
|
93
|
+
- **MRI Backend**: Leverages [`ruby_tree_sitter`](https://github.com/Faveod/ruby-tree-sitter) gem (C extension, fastest on MRI)
|
|
94
|
+
- **Rust Backend**: Uses [`tree_stump`](https://github.com/joker1007/tree_stump) gem (Rust with precompiled binaries)
|
|
95
|
+
- **Note**: Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13](https://github.com/joker1007/tree_stump/pull/13) are merged
|
|
96
|
+
- **FFI Backend**: Pure Ruby FFI bindings to `libtree-sitter` (ideal for JRuby, TruffleRuby)
|
|
97
|
+
- **Java Backend**: Native Java integration for JRuby with java-tree-sitter grammar JARs
|
|
98
|
+
- **Language-Specific Backends** (native parser integration):
|
|
99
|
+
- **Prism Backend**: Ruby's official parser ([Prism](https://github.com/ruby/prism), stdlib in Ruby 3.4+)
|
|
100
|
+
- **Psych Backend**: Ruby's YAML parser ([Psych](https://github.com/ruby/psych), stdlib)
|
|
101
|
+
- **Commonmarker Backend**: Fast Markdown parser ([Commonmarker](https://github.com/gjtorikian/commonmarker), comrak Rust)
|
|
102
|
+
- **Markly Backend**: GitHub Flavored Markdown ([Markly](https://github.com/ioquatix/markly), cmark-gfm C)
|
|
103
|
+
- **Pure Ruby Fallback**:
|
|
104
|
+
- **Citrus Backend**: Pure Ruby parsing via [`citrus`](https://github.com/mjackson/citrus) (no native dependencies)
|
|
94
105
|
- **Automatic Backend Selection**: Intelligently selects the best backend for your Ruby implementation
|
|
95
|
-
- **Language Agnostic**:
|
|
106
|
+
- **Language Agnostic**: Parse any language - Ruby, Markdown, YAML, JSON, Bash, TOML, JavaScript, etc.
|
|
96
107
|
- **Grammar Discovery**: Built-in `GrammarFinder` utility for platform-aware grammar library discovery
|
|
108
|
+
- **Unified Position API**: Consistent `start_line`, `end_line`, `source_position` across all backends
|
|
97
109
|
- **Thread-Safe**: Built-in language registry with thread-safe caching
|
|
98
|
-
- **Minimal API Surface**: Simple, focused API that covers the most common
|
|
110
|
+
- **Minimal API Surface**: Simple, focused API that covers the most common use cases
|
|
111
|
+
|
|
112
|
+
### Backend Requirements
|
|
113
|
+
|
|
114
|
+
TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
|
|
115
|
+
|
|
116
|
+
#### MRI Backend (ruby_tree_sitter, C extensions)
|
|
117
|
+
|
|
118
|
+
**Requires `ruby_tree_sitter` v2.0+**
|
|
119
|
+
|
|
120
|
+
In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from `Exception` (not `StandardError`). This was an intentional breaking change made for thread-safety and signal handling reasons.
|
|
121
|
+
|
|
122
|
+
**Exception Mapping**: TreeHaver catches `TreeSitter::TreeSitterError` and its subclasses, converting them to `TreeHaver::NotAvailable` while preserving the original error message. This provides a consistent exception API across all backends:
|
|
123
|
+
|
|
124
|
+
| ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
|
|
125
|
+
|-------------------------------------|----------------------------|------------------------------------------------|
|
|
126
|
+
| `TreeSitter::ParserNotFoundError` | `TreeHaver::NotAvailable` | Parser library file cannot be loaded |
|
|
127
|
+
| `TreeSitter::LanguageLoadError` | `TreeHaver::NotAvailable` | Language symbol loads but returns nothing |
|
|
128
|
+
| `TreeSitter::SymbolNotFoundError` | `TreeHaver::NotAvailable` | Symbol not found in library |
|
|
129
|
+
| `TreeSitter::ParserVersionError` | `TreeHaver::NotAvailable` | Parser version incompatible with tree-sitter |
|
|
130
|
+
| `TreeSitter::QueryCreationError` | `TreeHaver::NotAvailable` | Query creation fails |
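As a rough illustration of the mapping in the table above, the sketch below wraps a backend call and re-raises any backend-specific error as a single unified error type while keeping the original message. `DemoBackend`, `DemoUnified`, and `with_unified_errors` are hypothetical names invented for this example; they are not part of TreeHaver or ruby_tree_sitter, and the real implementation may differ.

```ruby
# Hypothetical stand-ins for a backend's error classes. Like ruby_tree_sitter
# v2.0, they inherit from Exception rather than StandardError.
module DemoBackend
  class BackendError < Exception; end
  class ParserNotFoundError < BackendError; end
  class SymbolNotFoundError < BackendError; end
end

# Hypothetical unified errors, playing the role of TreeHaver::NotAvailable.
module DemoUnified
  class Error < Exception; end
  class NotAvailable < Error; end
end

# Run the block and convert any backend error into the unified error,
# preserving the original message.
def with_unified_errors
  yield
rescue DemoBackend::BackendError => e
  raise DemoUnified::NotAvailable, e.message
end

begin
  with_unified_errors do
    raise DemoBackend::ParserNotFoundError, "cannot load /nonexistent.so"
  end
rescue DemoUnified::NotAvailable => e
  puts "#{e.class}: #{e.message}"
end
```

Because the re-raise happens inside a `rescue` clause, Ruby also records the original backend error as `cause` on the new exception, which can help when debugging.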
|
|
131
|
+
|
|
132
|
+
```ruby
|
|
133
|
+
# Add to your Gemfile for MRI backend
|
|
134
|
+
gem "ruby_tree_sitter", "~> 2.0"
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
#### Rust Backend (tree_stump)
|
|
138
|
+
|
|
139
|
+
Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs are merged.
|
|
140
|
+
|
|
141
|
+
```ruby
|
|
142
|
+
# Add to your Gemfile for Rust backend
|
|
143
|
+
gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
#### FFI Backend
|
|
147
|
+
|
|
148
|
+
Requires the `ffi` gem and a system installation of `libtree-sitter`:
|
|
149
|
+
|
|
150
|
+
```ruby
|
|
151
|
+
# Add to your Gemfile for FFI backend
|
|
152
|
+
gem "ffi", ">= 1.15", "< 2.0"
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
```bash
|
|
156
|
+
# Install libtree-sitter on your system:
|
|
157
|
+
# macOS
|
|
158
|
+
brew install tree-sitter
|
|
159
|
+
|
|
160
|
+
# Ubuntu/Debian
|
|
161
|
+
apt-get install libtree-sitter0 libtree-sitter-dev
|
|
162
|
+
|
|
163
|
+
# Fedora
|
|
164
|
+
dnf install tree-sitter tree-sitter-devel
|
|
165
|
+
```
|
|
166
|
+
|
|
167
|
+
#### Citrus Backend
|
|
168
|
+
|
|
169
|
+
Pure Ruby parser with no native dependencies:
|
|
170
|
+
|
|
171
|
+
```ruby
|
|
172
|
+
# Add to your Gemfile for Citrus backend
|
|
173
|
+
gem "citrus", "~> 3.0"
|
|
174
|
+
```
|
|
175
|
+
|
|
176
|
+
#### Java Backend (JRuby only)
|
|
177
|
+
|
|
178
|
+
No additional dependencies required beyond grammar JARs built for java-tree-sitter.
|
|
99
179
|
|
|
100
180
|
### Why TreeHaver?
|
|
101
181
|
|
|
@@ -132,7 +212,7 @@ TreeHaver solves these problems by providing a unified API that automatically se
|
|
|
132
212
|
|
|
133
213
|
**Note:** Java backend works with grammar JARs built specifically for java-tree-sitter, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.
|
|
134
214
|
|
|
135
|
-
**Note:** TreeHaver can use `ruby_tree_sitter` or `tree_stump` as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
|
|
215
|
+
**Note:** TreeHaver can use `ruby_tree_sitter` (MRI), `tree_stump` (MRI, JRuby?), or `jruby-tree-sitter` (JRuby) as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
|
|
136
216
|
|
|
137
217
|
**Note:** `tree_stump` currently requires [pboling's fork (tree_haver branch)](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13](https://github.com/joker1007/tree_stump/pull/13) are merged.
|
|
138
218
|
|
|
@@ -281,6 +361,133 @@ NOTE: Be prepared to track down certs for signed gems and add them the same way
|
|
|
281
361
|
|
|
282
362
|
## ⚙️ Configuration
|
|
283
363
|
|
|
364
|
+
### Available Backends
|
|
365
|
+
|
|
366
|
+
TreeHaver supports 10 parsing backends, each with different trade-offs. The `auto` backend automatically selects the best available option.
|
|
367
|
+
|
|
368
|
+
#### Tree-sitter Backends (Universal Parsing)
|
|
369
|
+
|
|
370
|
+
| Backend | Description | Performance | Portability | Examples |
|
|
371
|
+
|---------|-------------|-------------|-------------|----------|
|
|
372
|
+
| **Auto** | Auto-selects best backend | Varies | ✅ Universal | [JSON](examples/auto_json.rb) · [JSONC](examples/auto_jsonc.rb) · [Bash](examples/auto_bash.rb) · [TOML](examples/auto_toml.rb) |
|
|
373
|
+
| **MRI** | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | [JSON](examples/mri_json.rb) · [JSONC](examples/mri_jsonc.rb) · ~~Bash~~* · [TOML](examples/mri_toml.rb) |
|
|
374
|
+
| **Rust** | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | [JSON](examples/rust_json.rb) · [JSONC](examples/rust_jsonc.rb) · ~~Bash~~* · [TOML](examples/rust_toml.rb) |
|
|
375
|
+
| **FFI** | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | [JSON](examples/ffi_json.rb) · [JSONC](examples/ffi_jsonc.rb) · [Bash](examples/ffi_bash.rb) · [TOML](examples/ffi_toml.rb) |
|
|
376
|
+
| **Java** | JNI bindings | ⚡ Very Fast | JRuby only | [JSON](examples/java_json.rb) · [JSONC](examples/java_jsonc.rb) · [Bash](examples/java_bash.rb) · [TOML](examples/java_toml.rb) |
|
|
377
|
+
|
|
378
|
+
#### Language-Specific Backends (Native Parser Integration)
|
|
379
|
+
|
|
380
|
+
| Backend | Description | Performance | Portability | Examples |
|
|
381
|
+
|---------|-------------|-------------|-------------|----------|
|
|
382
|
+
| **Prism** | Ruby's official parser | ⚡ Very Fast | ✅ Universal | [Ruby](examples/prism_ruby.rb) |
|
|
383
|
+
| **Psych** | Ruby's YAML parser (stdlib) | ⚡ Very Fast | ✅ Universal | [YAML](examples/psych_yaml.rb) |
|
|
384
|
+
| **Commonmarker** | Markdown via comrak (Rust) | ⚡ Very Fast | ✅ Good | [Markdown](examples/commonmarker_markdown.rb) · [Merge](examples/commonmarker_merge_example.rb) |
|
|
385
|
+
| **Markly** | GFM via cmark-gfm (C) | ⚡ Very Fast | ✅ Good | [Markdown](examples/markly_markdown.rb) · [Merge](examples/markly_merge_example.rb) |
|
|
386
|
+
| **Citrus** | Pure Ruby parsing | 🟡 Slower | ✅ Universal | [TOML](examples/citrus_toml.rb) · [Finitio](examples/citrus_finitio.rb) · [Dhall](examples/citrus_dhall.rb) |
|
|
387
|
+
|
|
388
|
+
**Selection Priority (Auto mode):** MRI → Rust → FFI → Java → Prism → Psych → Commonmarker → Markly → Citrus
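
A priority order like the one above can be read as "take the first backend that is usable". The following is only a sketch of that idea, not TreeHaver's actual selection code: `PRIORITY` and `pick_backend` are hypothetical names, and the availability check is passed in as a block so the example runs without any backend gems installed.

```ruby
# Priority order as listed above for auto mode.
PRIORITY = %i[mri rust ffi java prism psych commonmarker markly citrus].freeze

# Return the first backend in the priority list for which the given block
# returns true, or nil when none is available.
def pick_backend(priority = PRIORITY)
  priority.find { |name| yield(name) }
end

# Pretend only FFI and Citrus are usable in this environment.
available = %i[citrus ffi]
puts pick_backend { |name| available.include?(name) }
```

With the pretend environment above, the sketch picks `ffi` because it appears before `citrus` in the priority list.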
|
|
389
|
+
|
|
390
|
+
**Known Issues:**
|
|
391
|
+
- *MRI + Bash: ABI incompatibility (use FFI instead)
|
|
392
|
+
- *Rust + Bash: Version mismatch (use FFI instead)
|
|
393
|
+
|
|
394
|
+
**Backend Requirements:**
|
|
395
|
+
|
|
396
|
+
```ruby
|
|
397
|
+
# Tree-sitter backends
|
|
398
|
+
gem "ruby_tree_sitter", "~> 2.0" # MRI backend
|
|
399
|
+
gem "tree_stump" # Rust backend
|
|
400
|
+
gem "ffi", ">= 1.15", "< 2.0" # FFI backend
|
|
401
|
+
# Java backend: no gem required (uses JRuby's built-in JNI)
|
|
402
|
+
|
|
403
|
+
# Language-specific backends
|
|
404
|
+
gem "prism", "~> 1.0" # Ruby parsing (stdlib in Ruby 3.4+)
|
|
405
|
+
# Psych: no gem required (Ruby stdlib)
|
|
406
|
+
gem "commonmarker", ">= 0.23" # Markdown parsing (comrak)
|
|
407
|
+
gem "markly", "~> 0.11" # GFM parsing (cmark-gfm)
|
|
408
|
+
|
|
409
|
+
# Pure Ruby fallback
|
|
410
|
+
gem "citrus", "~> 3.0" # Citrus backend
|
|
411
|
+
# Plus grammar gems: toml-rb, dhall, finitio, etc.
|
|
412
|
+
```
|
|
413
|
+
|
|
414
|
+
**Force Specific Backend:**
|
|
415
|
+
|
|
416
|
+
```ruby
|
|
417
|
+
# Tree-sitter backends
|
|
418
|
+
TreeHaver.backend = :mri # Force MRI backend (ruby_tree_sitter)
|
|
419
|
+
TreeHaver.backend = :rust # Force Rust backend (tree_stump)
|
|
420
|
+
TreeHaver.backend = :ffi # Force FFI backend
|
|
421
|
+
TreeHaver.backend = :java # Force Java backend (JRuby only)
|
|
422
|
+
|
|
423
|
+
# Language-specific backends
|
|
424
|
+
TreeHaver.backend = :prism # Force Prism (Ruby parsing)
|
|
425
|
+
TreeHaver.backend = :psych # Force Psych (YAML parsing)
|
|
426
|
+
TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
|
|
427
|
+
TreeHaver.backend = :markly # Force Markly (GFM Markdown)
|
|
428
|
+
|
|
429
|
+
# Pure Ruby fallback
|
|
430
|
+
TreeHaver.backend = :citrus # Force Citrus backend
|
|
431
|
+
|
|
432
|
+
# Auto-selection (default)
|
|
433
|
+
TreeHaver.backend = :auto # Let TreeHaver choose
|
|
434
|
+
```
|
|
435
|
+
|
|
436
|
+
**Block-based Backend Switching:**
|
|
437
|
+
|
|
438
|
+
Use `with_backend` to temporarily switch backends for a specific block of code.
|
|
439
|
+
This is thread-safe and supports nesting—the previous backend is automatically
|
|
440
|
+
restored when the block exits (even if an exception is raised).
|
|
441
|
+
|
|
442
|
+
```ruby
|
|
443
|
+
# Temporarily use a specific backend
|
|
444
|
+
TreeHaver.with_backend(:mri) do
|
|
445
|
+
parser = TreeHaver::Parser.new
|
|
446
|
+
tree = parser.parse(source)
|
|
447
|
+
# All operations in this block use the MRI backend
|
|
448
|
+
end
|
|
449
|
+
# Backend is restored to its previous value here
|
|
450
|
+
|
|
451
|
+
# Nested blocks work correctly
|
|
452
|
+
TreeHaver.with_backend(:rust) do
|
|
453
|
+
# Uses :rust
|
|
454
|
+
TreeHaver.with_backend(:citrus) do
|
|
455
|
+
# Uses :citrus
|
|
456
|
+
parser = TreeHaver::Parser.new
|
|
457
|
+
end
|
|
458
|
+
# Back to :rust
|
|
459
|
+
end
|
|
460
|
+
# Back to original backend
|
|
461
|
+
```
|
|
462
|
+
|
|
463
|
+
This is particularly useful for:
|
|
464
|
+
|
|
465
|
+
- **Testing**: Test the same code with different backends
|
|
466
|
+
- **Performance comparison**: Benchmark different backends
|
|
467
|
+
- **Fallback scenarios**: Try one backend, fall back to another
|
|
468
|
+
- **Thread isolation**: Each thread can use a different backend safely
|
|
469
|
+
|
|
470
|
+
```ruby
|
|
471
|
+
# Example: Testing with multiple backends
|
|
472
|
+
[:mri, :rust, :citrus].each do |backend_name|
|
|
473
|
+
TreeHaver.with_backend(backend_name) do
|
|
474
|
+
parser = TreeHaver::Parser.new
|
|
475
|
+
result = parser.parse(source)
|
|
476
|
+
puts "#{backend_name}: #{result.root_node.type}"
|
|
477
|
+
end
|
|
478
|
+
end
|
|
479
|
+
```
|
|
480
|
+
|
|
481
|
+
**Check Backend Capabilities:**
|
|
482
|
+
|
|
483
|
+
```ruby
|
|
484
|
+
TreeHaver.backend # => :ffi
|
|
485
|
+
TreeHaver.backend_module # => TreeHaver::Backends::FFI
|
|
486
|
+
TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
|
|
487
|
+
```
|
|
488
|
+
|
|
489
|
+
See [examples/](examples/) directory for **26 complete working examples** demonstrating all 10 backends with multiple languages (JSON, JSONC, Bash, TOML, Ruby, YAML, Markdown) plus markdown-merge integration examples.
|
|
490
|
+
|
|
284
491
|
### Security Considerations
|
|
285
492
|
|
|
286
493
|
**⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.**
|
|
@@ -591,16 +798,103 @@ parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
|
|
|
591
798
|
|
|
592
799
|
This is safe and idempotent—if the real `TreeSitter` module is already loaded, the shim does nothing.
|
|
593
800
|
|
|
801
|
+
#### ⚠️ Important: Exception Hierarchy
|
|
802
|
+
|
|
803
|
+
**Both ruby_tree_sitter v2+ and TreeHaver exceptions inherit from `Exception` (not `StandardError`).**
|
|
804
|
+
|
|
805
|
+
This design decision follows ruby_tree_sitter's lead for thread-safety and signal handling reasons. See [ruby_tree_sitter PR #83](https://github.com/Faveod/ruby-tree-sitter/pull/83) for the rationale.
|
|
806
|
+
|
|
807
|
+
**What this means for exception handling:**
|
|
808
|
+
|
|
809
|
+
```ruby
|
|
810
|
+
# ⚠️ This will NOT catch TreeHaver errors
|
|
811
|
+
begin
|
|
812
|
+
TreeHaver::Language.from_library("/nonexistent.so")
|
|
813
|
+
rescue => e
|
|
814
|
+
puts "Caught!" # Never reached - TreeHaver::Error inherits Exception
|
|
815
|
+
end
|
|
816
|
+
|
|
817
|
+
# ✅ Explicit rescue is required
|
|
818
|
+
begin
|
|
819
|
+
TreeHaver::Language.from_library("/nonexistent.so")
|
|
820
|
+
rescue TreeHaver::Error => e
|
|
821
|
+
puts "Caught!" # This works
|
|
822
|
+
end
|
|
823
|
+
|
|
824
|
+
# ✅ Or rescue specific exceptions
|
|
825
|
+
begin
|
|
826
|
+
TreeHaver::Language.from_library("/nonexistent.so")
|
|
827
|
+
rescue TreeHaver::NotAvailable => e
|
|
828
|
+
puts "Grammar not available: #{e.message}"
|
|
829
|
+
end
|
|
830
|
+
```
|
|
831
|
+
|
|
832
|
+
**TreeHaver Exception Hierarchy:**
|
|
833
|
+
|
|
834
|
+
```
|
|
835
|
+
Exception
|
|
836
|
+
└── TreeHaver::Error # Base error class
|
|
837
|
+
├── TreeHaver::NotAvailable # Backend/grammar not available
|
|
838
|
+
└── TreeHaver::BackendConflict # Backend incompatibility detected
|
|
839
|
+
```
|
|
840
|
+
|
|
841
|
+
**Compatibility Mode Behavior:**
|
|
842
|
+
|
|
843
|
+
The compat mode (`require "tree_haver/compat"`) creates aliases but **does not change the exception hierarchy**:
|
|
844
|
+
|
|
845
|
+
```ruby
|
|
846
|
+
require "tree_haver/compat"
|
|
847
|
+
|
|
848
|
+
# TreeSitter constants are now aliases to TreeHaver
|
|
849
|
+
TreeSitter::Error # => TreeHaver::Error (still inherits Exception)
|
|
850
|
+
TreeSitter::Parser # => TreeHaver::Parser
|
|
851
|
+
TreeSitter::Language # => TreeHaver::Language
|
|
852
|
+
|
|
853
|
+
# Exception handling remains the same
|
|
854
|
+
begin
|
|
855
|
+
TreeSitter::Language.load("missing", "/nonexistent.so")
|
|
856
|
+
rescue TreeSitter::Error => e # Still requires explicit rescue
|
|
857
|
+
puts "Error: #{e.message}"
|
|
858
|
+
end
|
|
859
|
+
```
|
|
860
|
+
|
|
861
|
+
**Best Practices:**
|
|
862
|
+
|
|
863
|
+
1. **Always use explicit rescue** for TreeHaver errors:
|
|
864
|
+
```ruby
|
|
865
|
+
begin
|
|
866
|
+
finder = TreeHaver::GrammarFinder.new(:toml)
|
|
867
|
+
finder.register! if finder.available?
|
|
868
|
+
language = TreeHaver::Language.toml
|
|
869
|
+
rescue TreeHaver::NotAvailable => e
|
|
870
|
+
warn("TOML grammar not available: #{e.message}")
|
|
871
|
+
# Fallback to another backend or fail gracefully
|
|
872
|
+
end
|
|
873
|
+
```
|
|
874
|
+
|
|
875
|
+
2. **Never rely on `rescue => e`** to catch TreeHaver errors (a bare `rescue` only catches `StandardError` and its subclasses)
|
|
876
|
+
|
|
877
|
+
**Why inherit from Exception?**
|
|
878
|
+
|
|
879
|
+
Following ruby_tree_sitter's reasoning:
|
|
880
|
+
- **Thread safety**: Prevents accidental catching in thread cleanup code
|
|
881
|
+
- **Signal handling**: Ensures parsing errors don't interfere with SIGTERM/SIGINT
|
|
882
|
+
- **Intentional handling**: Forces developers to explicitly handle parsing errors
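
The rescue behaviour described in this section can be checked without TreeHaver installed. The sketch below uses a hypothetical `DemoParseError` class that, like `TreeHaver::Error`, inherits directly from `Exception`; `handled_by` is likewise an illustrative helper.

```ruby
# Hypothetical error class that, like TreeHaver::Error, inherits from Exception.
class DemoParseError < Exception; end

# Returns a symbol describing which rescue clause handled the error.
def handled_by
  begin
    raise DemoParseError, "boom"
  rescue => _e # equivalent to `rescue StandardError => _e`
    :bare_rescue
  end
rescue DemoParseError
  :explicit_rescue
end

puts handled_by
puts DemoParseError.ancestors.include?(StandardError)
```

The inner bare `rescue` is skipped because `DemoParseError` is not a `StandardError`, so the error propagates to the explicit `rescue DemoParseError` clause.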
|
|
883
|
+
|
|
884
|
+
See `lib/tree_haver/compat.rb` for compatibility layer documentation.
|
|
885
|
+
|
|
594
886
|
## 🔧 Basic Usage
|
|
595
887
|
|
|
596
888
|
### Quick Start
|
|
597
889
|
|
|
598
|
-
Here
|
|
890
|
+
TreeHaver works with any language through its 10 backends. Here are examples for different parsing needs:
|
|
891
|
+
|
|
892
|
+
#### Parsing with Tree-sitter (Universal Languages)
|
|
599
893
|
|
|
600
894
|
```ruby
|
|
601
895
|
require "tree_haver"
|
|
602
896
|
|
|
603
|
-
# Load a
|
|
897
|
+
# Load a tree-sitter grammar (works with MRI, Rust, FFI, or Java backend)
|
|
604
898
|
language = TreeHaver::Language.from_library(
|
|
605
899
|
"/usr/local/lib/libtree-sitter-toml.so",
|
|
606
900
|
symbol: "tree_sitter_toml",
|
|
@@ -610,7 +904,7 @@ language = TreeHaver::Language.from_library(
|
|
|
610
904
|
parser = TreeHaver::Parser.new
|
|
611
905
|
parser.language = language
|
|
612
906
|
|
|
613
|
-
# Parse
|
|
907
|
+
# Parse source code
|
|
614
908
|
source = <<~TOML
|
|
615
909
|
[package]
|
|
616
910
|
name = "my-app"
|
|
@@ -619,16 +913,116 @@ TOML
|
|
|
619
913
|
|
|
620
914
|
tree = parser.parse(source)
|
|
621
915
|
|
|
622
|
-
# Access the
|
|
916
|
+
# Access the unified Position API (works across all backends)
|
|
623
917
|
root = tree.root_node
|
|
624
|
-
puts "Root
|
|
918
|
+
puts "Root type: #{root.type}" # => "document"
|
|
919
|
+
puts "Start line: #{root.start_line}" # => 1 (1-based)
|
|
920
|
+
puts "End line: #{root.end_line}" # => 3
|
|
921
|
+
puts "Position: #{root.source_position}" # => {start_line: 1, end_line: 3, ...}
|
|
625
922
|
|
|
626
923
|
# Traverse the tree
|
|
627
924
|
root.each do |child|
|
|
628
|
-
puts "Child
|
|
629
|
-
|
|
630
|
-
|
|
925
|
+
puts "Child: #{child.type} at line #{child.start_line}"
|
|
926
|
+
end
|
|
927
|
+
```
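
For context on the 1-based `start_line` and `end_line` values shown above: tree-sitter itself reports node positions as points whose rows and columns count from zero. A minimal sketch of the conversion, assuming a hypothetical `Point` struct and a hypothetical `source_position_for` helper (neither is TreeHaver's API, and only the two keys visible in the Quick Start output are produced), might look like this:

```ruby
# Hypothetical stand-in for a tree-sitter point: row and column are 0-based.
Point = Struct.new(:row, :column)

# Build a hash with 1-based line numbers from 0-based start/end points.
def source_position_for(start_point, end_point)
  {
    start_line: start_point.row + 1,
    end_line: end_point.row + 1,
  }
end

pos = source_position_for(Point.new(0, 0), Point.new(2, 15))
puts "Lines #{pos[:start_line]}-#{pos[:end_line]}"
```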
|
|
928
|
+
|
|
929
|
+
#### Parsing Ruby with Prism
|
|
930
|
+
|
|
931
|
+
```ruby
|
|
932
|
+
require "tree_haver"
|
|
933
|
+
|
|
934
|
+
TreeHaver.backend = :prism
|
|
935
|
+
parser = TreeHaver::Parser.new
|
|
936
|
+
parser.language = TreeHaver::Backends::Prism::Language.ruby
|
|
937
|
+
|
|
938
|
+
source = <<~RUBY
|
|
939
|
+
class Example
|
|
940
|
+
def hello
|
|
941
|
+
puts "Hello, world!"
|
|
942
|
+
end
|
|
631
943
|
end
|
|
944
|
+
RUBY
|
|
945
|
+
|
|
946
|
+
tree = parser.parse(source)
|
|
947
|
+
root = tree.root_node
|
|
948
|
+
|
|
949
|
+
# Find all method definitions
|
|
950
|
+
def find_methods(node, results = [])
|
|
951
|
+
results << node if node.type == "def_node"
|
|
952
|
+
node.children.each { |child| find_methods(child, results) }
|
|
953
|
+
results
|
|
954
|
+
end
|
|
955
|
+
|
|
956
|
+
methods = find_methods(root)
|
|
957
|
+
methods.each do |method_node|
|
|
958
|
+
pos = method_node.source_position
|
|
959
|
+
puts "Method at lines #{pos[:start_line]}-#{pos[:end_line]}"
|
|
960
|
+
end
|
|
961
|
+
```
|
|
962
|
+
|
|
963
|
+
#### Parsing YAML with Psych
|
|
964
|
+
|
|
965
|
+
```ruby
|
|
966
|
+
require "tree_haver"
|
|
967
|
+
|
|
968
|
+
TreeHaver.backend = :psych
|
|
969
|
+
parser = TreeHaver::Parser.new
|
|
970
|
+
parser.language = TreeHaver::Backends::Psych::Language.yaml
|
|
971
|
+
|
|
972
|
+
source = <<~YAML
|
|
973
|
+
database:
|
|
974
|
+
host: localhost
|
|
975
|
+
port: 5432
|
|
976
|
+
YAML
|
|
977
|
+
|
|
978
|
+
tree = parser.parse(source)
|
|
979
|
+
root = tree.root_node
|
|
980
|
+
|
|
981
|
+
# Navigate YAML structure
|
|
982
|
+
def show_structure(node, indent = 0)
|
|
983
|
+
prefix = " " * indent
|
|
984
|
+
puts "#{prefix}#{node.type} (line #{node.start_line})"
|
|
985
|
+
node.children.each { |child| show_structure(child, indent + 1) }
|
|
986
|
+
end
|
|
987
|
+
|
|
988
|
+
show_structure(root)
|
|
989
|
+
```
|
|
990
|
+
|
|
991
|
+
#### Parsing Markdown with Commonmarker or Markly
|
|
992
|
+
|
|
993
|
+
```ruby
|
|
994
|
+
require "tree_haver"
|
|
995
|
+
|
|
996
|
+
# Choose your backend
|
|
997
|
+
TreeHaver.backend = :commonmarker # or :markly for GFM
|
|
998
|
+
|
|
999
|
+
parser = TreeHaver::Parser.new
|
|
1000
|
+
parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
|
|
1001
|
+
|
|
1002
|
+
source = <<~MARKDOWN
|
|
1003
|
+
# My Document
|
|
1004
|
+
|
|
1005
|
+
## Section
|
|
1006
|
+
|
|
1007
|
+
- Item 1
|
|
1008
|
+
- Item 2
|
|
1009
|
+
MARKDOWN
|
|
1010
|
+
|
|
1011
|
+
tree = parser.parse(source)
|
|
1012
|
+
root = tree.root_node
|
|
1013
|
+
|
|
1014
|
+
# Find all headings
|
|
1015
|
+
def find_headings(node, results = [])
|
|
1016
|
+
results << node if node.type == "heading"
|
|
1017
|
+
node.children.each { |child| find_headings(child, results) }
|
|
1018
|
+
results
|
|
1019
|
+
end
|
|
1020
|
+
|
|
1021
|
+
headings = find_headings(root)
|
|
1022
|
+
headings.each do |heading|
|
|
1023
|
+
level = heading.header_level
|
|
1024
|
+
text = heading.children.map(&:text).join
|
|
1025
|
+
puts "H#{level}: #{text} (line #{heading.start_line})"
|
|
632
1026
|
end
|
|
633
1027
|
```
|
|
634
1028
|
|
|
@@ -657,6 +1051,38 @@ parser.language = toml_language
|
|
|
657
1051
|
tree = parser.parse(toml_source)
|
|
658
1052
|
```
|
|
659
1053
|
|
|
1054
|
+
#### Flexible Language Names
|
|
1055
|
+
|
|
1056
|
+
The `name` parameter in `register_language` is an arbitrary identifier you choose—it doesn't
|
|
1057
|
+
need to match the actual language name. The actual grammar identity comes from the `path`
|
|
1058
|
+
and `symbol` parameters (for tree-sitter) or `grammar_module` (for Citrus).
|
|
1059
|
+
|
|
1060
|
+
This flexibility is useful for:
|
|
1061
|
+
|
|
1062
|
+
- **Aliasing**: Register the same grammar under multiple names
|
|
1063
|
+
- **Versioning**: Register different grammar versions (e.g., `:ruby_2`, `:ruby_3`)
|
|
1064
|
+
- **Testing**: Use unique names to avoid collisions between tests
|
|
1065
|
+
- **Context-specific naming**: Use names that make sense for your application
|
|
1066
|
+
|
|
1067
|
+
```ruby
|
|
1068
|
+
# Register the same TOML grammar under different names for different purposes
|
|
1069
|
+
TreeHaver.register_language(
|
|
1070
|
+
:config_parser, # Custom name for your app
|
|
1071
|
+
path: "/usr/local/lib/libtree-sitter-toml.so",
|
|
1072
|
+
symbol: "tree_sitter_toml",
|
|
1073
|
+
)
|
|
1074
|
+
|
|
1075
|
+
TreeHaver.register_language(
|
|
1076
|
+
:toml_v1, # Version-specific name
|
|
1077
|
+
path: "/usr/local/lib/libtree-sitter-toml.so",
|
|
1078
|
+
symbol: "tree_sitter_toml",
|
|
1079
|
+
)
|
|
1080
|
+
|
|
1081
|
+
# Use your custom names
|
|
1082
|
+
config_lang = TreeHaver::Language.config_parser
|
|
1083
|
+
versioned_lang = TreeHaver::Language.toml_v1
|
|
1084
|
+
```
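
To make the idea of arbitrary names concrete, here is a small sketch of a name-to-grammar registry guarded by a `Mutex`. `DemoRegistry` and its methods are hypothetical and exist only for this illustration; TreeHaver's own language registry may be structured differently.

```ruby
# Hypothetical registry: maps an arbitrary name to the details that actually
# identify a grammar (library path and exported symbol).
class DemoRegistry
  Entry = Struct.new(:path, :symbol)

  def initialize
    @entries = {}
    @mutex = Mutex.new
  end

  # Register a grammar under any name; the same path/symbol pair may be
  # registered several times under different names.
  def register(name, path:, symbol:)
    @mutex.synchronize { @entries[name.to_sym] = Entry.new(path, symbol) }
  end

  # Look up a registration, or return nil when the name is unknown.
  def fetch(name)
    @mutex.synchronize { @entries[name.to_sym] }
  end
end

registry = DemoRegistry.new
registry.register(:config_parser, path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
registry.register(:toml_v1, path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml")

# Both names resolve to the same grammar details.
puts registry.fetch(:config_parser) == registry.fetch(:toml_v1)
```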
|
|
1085
|
+
|
|
660
1086
|
### Parsing Different Languages
|
|
661
1087
|
|
|
662
1088
|
TreeHaver works with any tree-sitter grammar:
|
|
@@ -875,23 +1301,90 @@ TreeHaver.backend = :mri
|
|
|
875
1301
|
TreeHaver.backend = :citrus
|
|
876
1302
|
```
|
|
877
1303
|
|
|
878
|
-
### Advanced:
|
|
1304
|
+
### Advanced: Thread-Safe Backend Switching
|
|
1305
|
+
|
|
1306
|
+
TreeHaver provides `with_backend` for thread-safe, temporary backend switching. This is
|
|
1307
|
+
essential for testing, benchmarking, and applications that need different backends in
|
|
1308
|
+
different contexts.
|
|
879
1309
|
|
|
880
|
-
|
|
1310
|
+
#### Testing with Multiple Backends
|
|
1311
|
+
|
|
1312
|
+
Test the same code path with different backends using `with_backend`:
|
|
881
1313
|
|
|
882
1314
|
```ruby
|
|
883
1315
|
# In your test setup
|
|
884
1316
|
RSpec.describe("MyParser") do
|
|
885
|
-
|
|
886
|
-
|
|
1317
|
+
# Test with each available backend
|
|
1318
|
+
[:mri, :rust, :citrus].each do |backend_name|
|
|
1319
|
+
context "with #{backend_name} backend" do
|
|
1320
|
+
it "parses correctly" do
|
|
1321
|
+
TreeHaver.with_backend(backend_name) do
|
|
1322
|
+
parser = TreeHaver::Parser.new
|
|
1323
|
+
result = parser.parse("x = 42")
|
|
1324
|
+
expect(result.root_node.type).to(eq("document"))
|
|
1325
|
+
end
|
|
1326
|
+
# Backend automatically restored after block
|
|
1327
|
+
end
|
|
1328
|
+
end
|
|
887
1329
|
end
|
|
1330
|
+
end
|
|
1331
|
+
```
|
|
1332
|
+
|
|
1333
|
+
#### Thread Isolation
|
|
1334
|
+
|
|
1335
|
+
Each thread can use a different backend safely—`with_backend` uses thread-local storage:
|
|
888
1336
|
|
|
889
|
-
|
|
890
|
-
|
|
1337
|
+
```ruby
|
|
1338
|
+
threads = []
|
|
1339
|
+
|
|
1340
|
+
threads << Thread.new do
|
|
1341
|
+
TreeHaver.with_backend(:mri) do
|
|
1342
|
+
# This thread uses MRI backend
|
|
1343
|
+
parser = TreeHaver::Parser.new
|
|
1344
|
+
100.times { parser.parse("x = 1") }
|
|
891
1345
|
end
|
|
1346
|
+
end
|
|
1347
|
+
|
|
1348
|
+
threads << Thread.new do
|
|
1349
|
+
TreeHaver.with_backend(:citrus) do
|
|
1350
|
+
# This thread uses Citrus backend simultaneously
|
|
1351
|
+
parser = TreeHaver::Parser.new
|
|
1352
|
+
100.times { parser.parse("x = 1") }
|
|
1353
|
+
end
|
|
1354
|
+
end
|
|
1355
|
+
|
|
1356
|
+
threads.each(&:join)
|
|
1357
|
+
```
|
|
1358
|
+
|
|
1359
|
+
#### Nested Blocks
|
|
892
1360
|
|
|
893
|
-
|
|
894
|
-
|
|
1361
|
+
`with_backend` supports nesting—inner blocks override outer blocks:
|
|
1362
|
+
|
|
1363
|
+
```ruby
|
|
1364
|
+
TreeHaver.with_backend(:rust) do
|
|
1365
|
+
puts TreeHaver.effective_backend # => :rust
|
|
1366
|
+
|
|
1367
|
+
TreeHaver.with_backend(:citrus) do
|
|
1368
|
+
puts TreeHaver.effective_backend # => :citrus
|
|
1369
|
+
end
|
|
1370
|
+
|
|
1371
|
+
puts TreeHaver.effective_backend # => :rust (restored)
|
|
1372
|
+
end
|
|
1373
|
+
```
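
The restore-on-exit and per-thread behaviour described in this section can be approximated with Ruby's thread variables and an `ensure` clause. The module below is a hypothetical sketch written for this explanation (`DemoBackendSwitch`, `with`, and `effective` are invented names); it is not how TreeHaver is necessarily implemented.

```ruby
# Hypothetical module illustrating block-scoped, thread-local overrides.
module DemoBackendSwitch
  KEY = :demo_backend_override

  # Global default used when no override is active in the current thread.
  @default = :auto

  class << self
    attr_accessor :default

    # The override for the current thread if present, else the default.
    def effective
      Thread.current.thread_variable_get(KEY) || default
    end

    # Set the override for the duration of the block, then restore the
    # previous value, even if the block raises.
    def with(name)
      previous = Thread.current.thread_variable_get(KEY)
      Thread.current.thread_variable_set(KEY, name)
      yield
    ensure
      Thread.current.thread_variable_set(KEY, previous)
    end
  end
end

DemoBackendSwitch.with(:rust) do
  DemoBackendSwitch.with(:citrus) { puts DemoBackendSwitch.effective }
  puts DemoBackendSwitch.effective
end
puts DemoBackendSwitch.effective
```

Saving the previous value before overriding it is what makes nesting work: each block restores exactly the state it found, and other threads never see the override.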
|
|
1374
|
+
|
|
1375
|
+
#### Fallback Pattern
|
|
1376
|
+
|
|
1377
|
+
Try one backend, fall back to another on failure:
|
|
1378
|
+
|
|
1379
|
+
```ruby
|
|
1380
|
+
def parse_with_fallback(source)
|
|
1381
|
+
TreeHaver.with_backend(:mri) do
|
|
1382
|
+
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
|
|
1383
|
+
end
|
|
1384
|
+
rescue TreeHaver::NotAvailable
|
|
1385
|
+
# Fall back to Citrus if MRI backend unavailable
|
|
1386
|
+
TreeHaver.with_backend(:citrus) do
|
|
1387
|
+
TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
|
|
895
1388
|
end
|
|
896
1389
|
end
|
|
897
1390
|
```
|
|
@@ -1292,7 +1785,7 @@ Thanks for RTFM. ☺️
|
|
|
1292
1785
|
[📌gitmoji]: https://gitmoji.dev
|
|
1293
1786
|
[📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
|
|
1294
1787
|
[🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
|
|
1295
|
-
[🧮kloc-img]: https://img.shields.io/badge/KLOC-
|
|
1788
|
+
[🧮kloc-img]: https://img.shields.io/badge/KLOC-1.141-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
|
|
1296
1789
|
[🔐security]: SECURITY.md
|
|
1297
1790
|
[🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
|
|
1298
1791
|
[📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year
|