tree_haver 3.0.0 → 3.1.1

data/README.md CHANGED
@@ -54,20 +54,20 @@
54
54
 
55
55
  ## 🌻 Synopsis
56
56
 
57
- TreeHaver is a cross-Ruby adapter for the [tree-sitter](https://tree-sitter.github.io/tree-sitter/) parsing library that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using tree-sitter grammars, regardless of your Ruby implementation.
57
+ TreeHaver is a cross-Ruby adapter for the [tree-sitter](https://tree-sitter.github.io/tree-sitter/) and [Citrus](https://github.com/mjackson/citrus) parsing libraries, as well as other dedicated parsing tools. It works seamlessly across MRI Ruby, JRuby, and TruffleRuby, and provides a unified API for parsing source code using grammars, regardless of your Ruby implementation.
58
58
 
59
59
  ### The Adapter Pattern: Like Faraday, but for Parsing
60
60
 
61
61
  If you've used [Faraday](https://github.com/lostisland/faraday), [multi_json](https://github.com/intridea/multi_json), or [multi_xml](https://github.com/sferik/multi_xml), you'll feel right at home with TreeHaver. These gems share a common philosophy:
62
62
 
63
- | Gem | Unified API for | Backend Examples |
64
- |----------------|---------------------|------------------------------------------------------|
65
- | **Faraday** | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
66
- | **multi_json** | JSON parsing | Oj, Yajl, JSON gem |
67
- | **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
68
- | **TreeHaver** | tree-sitter parsing | ruby_tree_sitter, tree_stump, FFI, Java JARs, Citrus |
63
+ | Gem | Unified API for | Backend Examples |
64
+ |----------------|---------------------|--------------------------------------------------------------------------|
65
+ | **Faraday** | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
66
+ | **multi_json** | JSON parsing | Oj, Yajl, JSON gem |
67
+ | **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
68
+ | **TreeHaver** | Code parsing | MRI, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus (& Co.) |
69
69
 
70
- **Write once, run anywhere.**
70
+ **Write once, run anywhere.**
71
71
 
72
72
  **Learn once, write anywhere.**
73
73
 
@@ -88,18 +88,26 @@ tree = parser.parse(source_code)
88
88
  ### Key Features
89
89
 
90
90
  - **Universal Ruby Support**: Works on MRI Ruby, JRuby, and TruffleRuby
91
- - **Multiple Backends**:
92
- - **MRI Backend**: Leverages the excellent [`ruby_tree_sitter`](https://github.com/Faveod/ruby-tree-sitter) gem (C extension)
93
- - **Rust Backend**: Uses [`tree_stump`](https://github.com/anthropics/tree_stump) gem (Rust extension with precompiled binaries)
94
- - **Note**: Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13 (inclusive of the others)](https://github.com/joker1007/tree_stump/pull/13) are merged
95
- - **FFI Backend**: Pure Ruby FFI bindings to `libtree-sitter` (ideal for JRuby)
96
- - **Java Backend**: Support for JRuby's native Java integration, and native java-tree-sitter grammar JARs
97
- - **Citrus Backend**: Pure Ruby parser using [`citrus`](https://github.com/mjackson/citrus) gem (no native dependencies, portable)
91
+ - **10 Parsing Backends** - Choose the right backend for your needs:
92
+ - **Tree-sitter Backends** (high-performance, incremental parsing):
93
+ - **MRI Backend**: Leverages [`ruby_tree_sitter`](https://github.com/Faveod/ruby-tree-sitter) gem (C extension, fastest on MRI)
94
+ - **Rust Backend**: Uses [`tree_stump`][tree_stump] gem (Rust with precompiled binaries)
95
+ - **Note**: `tree_stump` currently requires unreleased fixes in the `main` branch.
96
+ - **FFI Backend**: Pure Ruby FFI bindings to `libtree-sitter` (ideal for JRuby, TruffleRuby)
97
+ - **Java Backend**: Native Java integration for JRuby with [`java-tree-sitter`](https://github.com/tree-sitter/java-tree-sitter) / [`jtreesitter`](https://central.sonatype.com/artifact/io.github.tree-sitter/jtreesitter) grammar JARs
98
+ - **Language-Specific Backends** (native parser integration):
99
+ - **Prism Backend**: Ruby's official parser ([Prism][prism], stdlib in Ruby 3.4+)
100
+ - **Psych Backend**: Ruby's YAML parser ([Psych][psych], stdlib)
101
+ - **Commonmarker Backend**: Fast Markdown parser ([Commonmarker][commonmarker], comrak Rust)
102
+ - **Markly Backend**: GitHub Flavored Markdown ([Markly][markly], cmark-gfm C)
103
+ - **Pure Ruby Fallback**:
104
+ - **Citrus Backend**: Pure Ruby parsing via [`citrus`][citrus] (no native dependencies)
98
105
  - **Automatic Backend Selection**: Intelligently selects the best backend for your Ruby implementation
99
- - **Language Agnostic**: Load any tree-sitter grammar dynamically (TOML, JSON, Ruby, JavaScript, etc.)
106
+ - **Language Agnostic**: Parse any language - Ruby, Markdown, YAML, JSON, Bash, TOML, JavaScript, etc.
100
107
  - **Grammar Discovery**: Built-in `GrammarFinder` utility for platform-aware grammar library discovery
108
+ - **Unified Position API**: Consistent `start_line`, `end_line`, `source_position` across all backends
101
109
  - **Thread-Safe**: Built-in language registry with thread-safe caching
102
- - **Minimal API Surface**: Simple, focused API that covers the most common tree-sitter use cases
110
+ - **Minimal API Surface**: Simple, focused API that covers the most common use cases
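
The thread-safe caching mentioned in the feature list can be pictured as a small mutex-guarded registry. This is a minimal sketch of the general technique, not TreeHaver's actual implementation; `DemoLanguageRegistry` and its methods are hypothetical names used only for illustration.

```ruby
# Hypothetical sketch of a thread-safe registry with caching.
# DemoLanguageRegistry is NOT part of TreeHaver; it only illustrates the idea.
class DemoLanguageRegistry
  def initialize
    @mutex = Mutex.new
    @cache = {}
  end

  # Returns the cached value for name, or runs the block once to build it.
  def fetch(name)
    @mutex.synchronize do
      @cache.fetch(name) { @cache[name] = yield }
    end
  end

  def registered?(name)
    @mutex.synchronize { @cache.key?(name) }
  end
end

registry = DemoLanguageRegistry.new
load_count = 0

# Eight threads ask for the same language; the loader block runs only once
# because the lookup and the store happen under the same lock.
threads = Array.new(8) do
  Thread.new do
    registry.fetch(:toml) do
      load_count += 1
      "toml-language-handle" # stand-in for an expensive grammar load
    end
  end
end
threads.each(&:join)

puts "loads: #{load_count}"
```

Holding the lock for the whole lookup-then-store step keeps the sketch simple, at the cost of serializing loads for unrelated languages.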
103
111
 
104
112
  ### Backend Requirements
105
113
 
@@ -128,11 +136,11 @@ gem "ruby_tree_sitter", "~> 2.0"
128
136
 
129
137
  #### Rust Backend (tree_stump)
130
138
 
131
- Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs are merged.
139
+ NOTE: `tree_stump` currently requires unreleased fixes in the `main` branch.
132
140
 
133
141
  ```ruby
134
142
  # Add to your Gemfile for Rust backend
135
- gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
143
+ gem "tree_stump", github: "joker1007/tree_stump", branch: "main"
136
144
  ```
137
145
 
138
146
  #### FFI Backend
@@ -167,7 +175,7 @@ gem "citrus", "~> 3.0"
167
175
 
168
176
  #### Java Backend (JRuby only)
169
177
 
170
- No additional dependencies required beyond grammar JARs built for java-tree-sitter.
178
+ No additional dependencies required beyond grammar JARs built for java-tree-sitter / jtreesitter.
171
179
 
172
180
  ### Why TreeHaver?
173
181
 
@@ -179,6 +187,58 @@ tree-sitter is a powerful parser generator that creates incremental parsers for
179
187
 
180
188
  TreeHaver solves these problems by providing a unified API that automatically selects the appropriate backend for your Ruby implementation, allowing you to write code once and run it anywhere.
181
189
 
190
+ ### The `*-merge` Gem Family
191
+
192
+ The `*-merge` gem family provides intelligent, AST-based merging for various file formats. At the foundation is [tree_haver][tree_haver], which provides a unified cross-Ruby parsing API that works seamlessly across MRI, JRuby, and TruffleRuby.
193
+
194
+ | Gem | Format | Parser Backend(s) | Description |
195
+ |------------------------------------------|----------|-----------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
196
+ | [tree_haver][tree_haver] | Multi | MRI C, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus | **Foundation**: Cross-Ruby adapter for parsing libraries (like Faraday for HTTP) |
197
+ | [ast-merge][ast-merge] | Text | internal | **Infrastructure**: Shared base classes and merge logic for all `*-merge` gems |
198
+ | [prism-merge][prism-merge] | Ruby | [Prism][prism] | Smart merge for Ruby source files |
199
+ | [psych-merge][psych-merge] | YAML | [Psych][psych] | Smart merge for YAML files |
200
+ | [json-merge][json-merge] | JSON | [tree-sitter-json][ts-json] (via tree_haver) | Smart merge for JSON files |
201
+ | [jsonc-merge][jsonc-merge] | JSONC | [tree-sitter-jsonc][ts-jsonc] (via tree_haver) | ⚠️ Proof of concept; Smart merge for JSON with Comments |
202
+ | [bash-merge][bash-merge] | Bash | [tree-sitter-bash][ts-bash] (via tree_haver) | Smart merge for Bash scripts |
203
+ | [rbs-merge][rbs-merge] | RBS | [RBS][rbs] | Smart merge for Ruby type signatures |
204
+ | [dotenv-merge][dotenv-merge] | Dotenv | internal | Smart merge for `.env` files |
205
+ | [toml-merge][toml-merge] | TOML | [Citrus + toml-rb][toml-rb] (default, via tree_haver), [tree-sitter-toml][ts-toml] (via tree_haver) | Smart merge for TOML files |
206
+ | [markdown-merge][markdown-merge] | Markdown | [Commonmarker][commonmarker] / [Markly][markly] (via tree_haver) | **Foundation**: Shared base for Markdown mergers with inner code block merging |
207
+ | [markly-merge][markly-merge] | Markdown | [Markly][markly] (via tree_haver) | Smart merge for Markdown (CommonMark via cmark-gfm C) |
208
+ | [commonmarker-merge][commonmarker-merge] | Markdown | [Commonmarker][commonmarker] (via tree_haver) | Smart merge for Markdown (CommonMark via comrak Rust) |
209
+
210
+ **Example implementations** for the gem templating use case:
211
+
212
+ | Gem | Purpose | Description |
213
+ |-----|---------|-------------|
214
+ | [kettle-dev][kettle-dev] | Gem Development | Gem templating tool using `*-merge` gems |
215
+ | [kettle-jem][kettle-jem] | Gem Templating | Gem template library with smart merge support |
216
+
217
+ [tree_haver]: https://github.com/kettle-rb/tree_haver
218
+ [ast-merge]: https://github.com/kettle-rb/ast-merge
219
+ [prism-merge]: https://github.com/kettle-rb/prism-merge
220
+ [psych-merge]: https://github.com/kettle-rb/psych-merge
221
+ [json-merge]: https://github.com/kettle-rb/json-merge
222
+ [jsonc-merge]: https://github.com/kettle-rb/jsonc-merge
223
+ [bash-merge]: https://github.com/kettle-rb/bash-merge
224
+ [rbs-merge]: https://github.com/kettle-rb/rbs-merge
225
+ [dotenv-merge]: https://github.com/kettle-rb/dotenv-merge
226
+ [toml-merge]: https://github.com/kettle-rb/toml-merge
227
+ [markdown-merge]: https://github.com/kettle-rb/markdown-merge
228
+ [markly-merge]: https://github.com/kettle-rb/markly-merge
229
+ [commonmarker-merge]: https://github.com/kettle-rb/commonmarker-merge
230
+ [kettle-dev]: https://github.com/kettle-rb/kettle-dev
231
+ [kettle-jem]: https://github.com/kettle-rb/kettle-jem
232
+ [prism]: https://github.com/ruby/prism
233
+ [psych]: https://github.com/ruby/psych
234
+ [ts-json]: https://github.com/tree-sitter/tree-sitter-json
235
+ [ts-bash]: https://github.com/tree-sitter/tree-sitter-bash
236
+ [ts-toml]: https://github.com/tree-sitter-grammars/tree-sitter-toml
237
+ [rbs]: https://github.com/ruby/rbs
238
+ [toml-rb]: https://github.com/emancu/toml-rb
239
+ [markly]: https://github.com/ioquatix/markly
240
+ [commonmarker]: https://github.com/gjtorikian/commonmarker
241
+
182
242
  ### Comparison with Other Ruby AST / Parser Bindings
183
243
 
184
244
  | Feature | [tree_haver] (this gem) | [ruby_tree_sitter] | [tree_stump] | [citrus] |
@@ -198,15 +258,15 @@ TreeHaver solves these problems by providing a unified API that automatically se
198
258
  | **Minimum Ruby** | 3.2+ | 3.0+ | 3.1+ | 0+ |
199
259
 
200
260
  [ruby_tree_sitter]: https://github.com/Faveod/ruby-tree-sitter
201
- [tree_stump]: https://github.com/anthropics/tree_stump
261
+ [tree_stump]: https://github.com/joker1007/tree_stump
202
262
  [citrus]: https://github.com/mjackson/citrus
203
263
  [tree_haver]: https://github.com/kettle-rb/tree_haver
204
264
 
205
- **Note:** Java backend works with grammar JARs built specifically for java-tree-sitter, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.
265
+ **Note:** Java backend works with grammar JARs built specifically for `java-tree-sitter` / `jtreesitter`, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.
206
266
 
207
- **Note:** TreeHaver can use `ruby_tree_sitter` or `tree_stump` as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
267
+ **Note:** TreeHaver can use `ruby_tree_sitter` (MRI), `tree_stump` (MRI, JRuby?), `java-tree-sitter` (JRuby; [docs](https://tree-sitter.github.io/java-tree-sitter/), [maven](https://central.sonatype.com/artifact/io.github.tree-sitter/jtreesitter), [source](https://github.com/tree-sitter/java-tree-sitter)), or FFI (any Ruby implementation) as backends. Each gives you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing where the backend supports it.
208
268
 
209
- **Note:** `tree_stump` currently requires [pboling's fork (tree_haver branch)](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13](https://github.com/joker1007/tree_stump/pull/13) are merged.
269
+ **Note:** `tree_stump` currently requires unreleased fixes in the `main` branch.
210
270
 
211
271
  #### When to Use Each
212
272
 
@@ -231,7 +291,7 @@ TreeHaver solves these problems by providing a unified API that automatically se
231
291
  - You prefer Rust-based native extensions
232
292
  - You want precompiled binaries without system dependencies
233
293
  - You don't need TreeHaver's grammar discovery
234
- - **Note:** Use [pboling's fork (tree_haver branch)](https://github.com/pboling/tree_stump/tree/tree_haver) until PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), [#13](https://github.com/joker1007/tree_stump/pull/13) are merged
294
+ - **Note:** `tree_stump` currently requires unreleased fixes in the `main` branch.
235
295
 
236
296
  **Choose citrus directly when:**
237
297
 
@@ -355,18 +415,29 @@ NOTE: Be prepared to track down certs for signed gems and add them the same way
355
415
 
356
416
  ### Available Backends
357
417
 
358
- TreeHaver supports multiple parsing backends, each with different trade-offs. The `auto` backend automatically selects the best available option.
418
+ TreeHaver supports 10 parsing backends, each with different trade-offs. The `auto` backend automatically selects the best available option.
419
+
420
+ #### Tree-sitter Backends (Universal Parsing)
421
+
422
+ | Backend | Description | Performance | Portability | Examples |
423
+ |---------|-------------|-------------|-------------|----------|
424
+ | **Auto** | Auto-selects best backend | Varies | ✅ Universal | [JSON](examples/auto_json.rb) · [JSONC](examples/auto_jsonc.rb) · [Bash](examples/auto_bash.rb) · [TOML](examples/auto_toml.rb) |
425
+ | **MRI** | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | [JSON](examples/mri_json.rb) · [JSONC](examples/mri_jsonc.rb) · ~~Bash~~* · [TOML](examples/mri_toml.rb) |
426
+ | **Rust** | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | [JSON](examples/rust_json.rb) · [JSONC](examples/rust_jsonc.rb) · ~~Bash~~* · [TOML](examples/rust_toml.rb) |
427
+ | **FFI** | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | [JSON](examples/ffi_json.rb) · [JSONC](examples/ffi_jsonc.rb) · [Bash](examples/ffi_bash.rb) · [TOML](examples/ffi_toml.rb) |
428
+ | **Java** | JNI bindings | ⚡ Very Fast | JRuby only | [JSON](examples/java_json.rb) · [JSONC](examples/java_jsonc.rb) · [Bash](examples/java_bash.rb) · [TOML](examples/java_toml.rb) |
429
+
430
+ #### Language-Specific Backends (Native Parser Integration)
359
431
 
360
432
  | Backend | Description | Performance | Portability | Examples |
361
433
  |---------|-------------|-------------|-------------|----------|
362
- | **Auto** | Auto-selects best backend | Varies | ✅ Universal | [JSON](examples/auto_json.rb) · [JSONC](examples/auto_jsonc.rb) · [Bash](examples/auto_bash.rb) |
363
- | **MRI** | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | [JSON](examples/mri_json.rb) · [JSONC](examples/mri_jsonc.rb) · ~~Bash~~* |
364
- | **Rust** | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | [JSON](examples/rust_json.rb) · [JSONC](examples/rust_jsonc.rb) · ~~Bash~~* |
365
- | **FFI** | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | [JSON](examples/ffi_json.rb) · [JSONC](examples/ffi_jsonc.rb) · [Bash](examples/ffi_bash.rb) |
366
- | **Java** | JNI bindings | ⚡ Very Fast | JRuby only | [JSON](examples/java_json.rb) · [JSONC](examples/java_jsonc.rb) · [Bash](examples/java_bash.rb) |
434
+ | **Prism** | Ruby's official parser | ⚡ Very Fast | ✅ Universal | [Ruby](examples/prism_ruby.rb) |
435
+ | **Psych** | Ruby's YAML parser (stdlib) | ⚡ Very Fast | ✅ Universal | [YAML](examples/psych_yaml.rb) |
436
+ | **Commonmarker** | Markdown via comrak (Rust) | ⚡ Very Fast | ✅ Good | [Markdown](examples/commonmarker_markdown.rb) · [Merge](examples/commonmarker_merge_example.rb) |
437
+ | **Markly** | GFM via cmark-gfm (C) | ⚡ Very Fast | ✅ Good | [Markdown](examples/markly_markdown.rb) · [Merge](examples/markly_merge_example.rb) |
367
438
  | **Citrus** | Pure Ruby parsing | 🟡 Slower | ✅ Universal | [TOML](examples/citrus_toml.rb) · [Finitio](examples/citrus_finitio.rb) · [Dhall](examples/citrus_dhall.rb) |
368
439
 
369
- **Selection Priority (Auto mode):** MRI → Rust → FFI → Java → Citrus
440
+ **Selection Priority (Auto mode):** MRI → Rust → FFI → Java → Prism → Psych → Commonmarker → Markly → Citrus
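
The priority order above amounts to "take the first backend that is usable". The following is a minimal sketch of that idea, assuming each backend can answer a simple availability check. The `select_backend` helper and the lambda checks are hypothetical and are not TreeHaver's real selection code.

```ruby
# Hypothetical sketch of priority-ordered backend selection.
# The order is taken from the README; the availability checks are stand-ins.
PRIORITY = %i[mri rust ffi java prism psych commonmarker markly citrus].freeze

# Returns the first backend in PRIORITY whose check reports it as usable,
# or nil when none is. Backends without a check count as unavailable.
def select_backend(checks)
  PRIORITY.find { |name| checks.fetch(name, -> { false }).call }
end

# Example: pretend only FFI and Citrus are usable on this machine.
checks = {
  ffi: -> { true },
  citrus: -> { true },
}

selected = select_backend(checks)
puts selected.inspect
```

In a real setup each check would probe for the gem or native library (for example by attempting a `require` and rescuing `LoadError`) instead of returning a constant.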
370
441
 
371
442
  **Known Issues:**
372
443
  - *MRI + Bash: ABI incompatibility (use FFI instead)
@@ -375,29 +446,43 @@ TreeHaver supports multiple parsing backends, each with different trade-offs. Th
375
446
  **Backend Requirements:**
376
447
 
377
448
  ```ruby
378
- # MRI Backend
379
- gem 'ruby_tree_sitter'
380
-
381
- # Rust Backend
382
- gem 'tree_stump'
383
-
384
- # FFI Backend
385
- gem 'ffi'
386
-
387
- # Citrus Backend
388
- gem 'citrus'
449
+ # Tree-sitter backends
450
+ gem "ruby_tree_sitter", "~> 2.0" # MRI backend
451
+ gem "tree_stump" # Rust backend
452
+ gem "ffi", ">= 1.15", "< 2.0" # FFI backend
453
+ # Java backend: no gem required (uses JRuby's built-in JNI)
454
+
455
+ # Language-specific backends
456
+ gem "prism", "~> 1.0" # Ruby parsing (stdlib in Ruby 3.4+)
457
+ # Psych: no gem required (Ruby stdlib)
458
+ gem "commonmarker", ">= 0.23" # Markdown parsing (comrak)
459
+ gem "markly", "~> 0.11" # GFM parsing (cmark-gfm)
460
+
461
+ # Pure Ruby fallback
462
+ gem "citrus", "~> 3.0" # Citrus backend
389
463
  # Plus grammar gems: toml-rb, dhall, finitio, etc.
390
464
  ```
391
465
 
392
466
  **Force Specific Backend:**
393
467
 
394
468
  ```ruby
469
+ # Tree-sitter backends
470
+ TreeHaver.backend = :mri # Force MRI backend (ruby_tree_sitter)
471
+ TreeHaver.backend = :rust # Force Rust backend (tree_stump)
395
472
  TreeHaver.backend = :ffi # Force FFI backend
396
- TreeHaver.backend = :mri # Force MRI backend
397
- TreeHaver.backend = :rust # Force Rust backend
398
- TreeHaver.backend = :java # Force Java backend (JRuby)
473
+ TreeHaver.backend = :java # Force Java backend (JRuby only)
474
+
475
+ # Language-specific backends
476
+ TreeHaver.backend = :prism # Force Prism (Ruby parsing)
477
+ TreeHaver.backend = :psych # Force Psych (YAML parsing)
478
+ TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
479
+ TreeHaver.backend = :markly # Force Markly (GFM Markdown)
480
+
481
+ # Pure Ruby fallback
399
482
  TreeHaver.backend = :citrus # Force Citrus backend
400
- TreeHaver.backend = :auto # Auto-select (default)
483
+
484
+ # Auto-selection (default)
485
+ TreeHaver.backend = :auto # Let TreeHaver choose
401
486
  ```
402
487
 
403
488
  **Block-based Backend Switching:**
@@ -453,7 +538,7 @@ TreeHaver.backend_module # => TreeHaver::Backends::FFI
453
538
  TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
454
539
  ```
455
540
 
456
- See [examples/](examples/) directory for 18 complete working examples demonstrating all backends and languages.
541
+ See the [examples/](examples/) directory for **26 complete working examples** demonstrating all 10 backends with multiple languages (JSON, JSONC, Bash, TOML, Ruby, YAML, Markdown), plus markdown-merge integration examples.
457
542
 
458
543
  ### Security Considerations
459
544
 
@@ -549,8 +634,8 @@ TreeHaver.backend = :auto
549
634
  # Force a specific backend
550
635
  TreeHaver.backend = :mri # Use ruby_tree_sitter (MRI only, C extension)
551
636
  TreeHaver.backend = :rust # Use tree_stump (MRI, Rust extension with precompiled binaries)
552
- # Note: Requires pboling's fork until PRs #5, #7, #11, #13 are merged
553
- # See: https://github.com/pboling/tree_stump/tree/tree_haver
637
+ # Note: `tree_stump` currently requires unreleased fixes in the `main` branch.
638
+ # See: https://github.com/joker1007/tree_stump
554
639
  TreeHaver.backend = :ffi # Use FFI bindings (works on MRI and JRuby)
555
640
  TreeHaver.backend = :java # Use Java bindings (JRuby only, coming soon)
556
641
  TreeHaver.backend = :citrus # Use Citrus pure Ruby parser
@@ -627,6 +712,8 @@ For the Java backend on JRuby:
627
712
  export TREE_SITTER_JAVA_JARS_DIR=/path/to/java-tree-sitter/jars
628
713
  ```
629
714
 
715
+ For more information, see the [docs](https://tree-sitter.github.io/java-tree-sitter/), [maven](https://central.sonatype.com/artifact/io.github.tree-sitter/jtreesitter), and [source](https://github.com/tree-sitter/java-tree-sitter).
716
+
630
717
  ### Language Registration
631
718
 
632
719
  Register languages once at application startup for convenient access:
@@ -765,74 +852,141 @@ parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
765
852
 
766
853
  This is safe and idempotent—if the real `TreeSitter` module is already loaded, the shim does nothing.
767
854
 
768
- #### ⚠️ Critical: Exception Hierarchy Incompatibility
769
-
770
- **ruby_tree_sitter v2+ exceptions inherit from `Exception` (not `StandardError`).**
771
- **TreeHaver exceptions follow Ruby best practices and inherit from `StandardError`.**
855
+ #### ⚠️ Important: Exception Hierarchy
772
856
 
773
- This means exception handling behaves **differently** between the two:
857
+ **Both ruby_tree_sitter v2+ and TreeHaver exceptions inherit from `Exception` (not `StandardError`).**
774
858
 
775
- | Scenario | ruby_tree_sitter v2+ | TreeHaver Compat Mode |
776
- |----------|---------------------|----------------------|
777
- | `rescue => e` | Does NOT catch TreeSitter errors | DOES catch TreeHaver errors |
778
- | Behavior | Errors propagate (inherit Exception) | Errors caught (inherit StandardError) |
859
+ This design decision follows ruby_tree_sitter's lead for thread-safety and signal handling reasons. See [ruby_tree_sitter PR #83](https://github.com/Faveod/ruby-tree-sitter/pull/83) for the rationale.
779
860
 
780
- **Example showing the difference:**
861
+ **What this means for exception handling:**
781
862
 
782
863
  ```ruby
783
- # With real ruby_tree_sitter v2+
864
+ # ⚠️ This will NOT catch TreeHaver errors
784
865
  begin
785
- TreeSitter::Language.load("missing", "/nonexistent.so")
866
+ TreeHaver::Language.from_library("/nonexistent.so")
786
867
  rescue => e
787
- puts "Caught!" # Never reached - TreeSitter errors inherit Exception
868
+ puts "Caught!" # Never reached - TreeHaver::Error inherits Exception
788
869
  end
789
870
 
790
- # With TreeHaver compat mode
791
- require "tree_haver/compat"
871
+ # ✅ Explicit rescue is required
792
872
  begin
793
- TreeSitter::Language.load("missing", "/nonexistent.so") # Actually TreeHaver
794
- rescue => e
795
- puts "Caught!" # WILL be reached - TreeHaver errors inherit StandardError
873
+ TreeHaver::Language.from_library("/nonexistent.so")
874
+ rescue TreeHaver::Error => e
875
+ puts "Caught!" # This works
876
+ end
877
+
878
+ # ✅ Or rescue specific exceptions
879
+ begin
880
+ TreeHaver::Language.from_library("/nonexistent.so")
881
+ rescue TreeHaver::NotAvailable => e
882
+ puts "Grammar not available: #{e.message}"
796
883
  end
797
884
  ```
798
885
 
799
- **To write compatible exception handling:**
886
+ **TreeHaver Exception Hierarchy:**
887
+
888
+ ```
889
+ Exception
890
+ └── TreeHaver::Error # Base error class
891
+ ├── TreeHaver::NotAvailable # Backend/grammar not available
892
+ └── TreeHaver::BackendConflict # Backend incompatibility detected
893
+ ```
894
+
895
+ **Compatibility Mode Behavior:**
896
+
897
+ The compat mode (`require "tree_haver/compat"`) creates aliases but **does not change the exception hierarchy**:
800
898
 
801
899
  ```ruby
802
- # Option 1: Catch specific exception (works with both)
803
- begin
804
- TreeSitter::Language.load(...)
805
- rescue TreeSitter::TreeSitterError => e # Explicit rescue
806
- # Works with both ruby_tree_sitter and TreeHaver compat mode
807
- end
900
+ require "tree_haver/compat"
901
+
902
+ # TreeSitter constants are now aliases to TreeHaver
903
+ TreeSitter::Error # => TreeHaver::Error (still inherits Exception)
904
+ TreeSitter::Parser # => TreeHaver::Parser
905
+ TreeSitter::Language # => TreeHaver::Language
808
906
 
809
- # Option 2: Use TreeHaver API directly (recommended)
907
+ # Exception handling remains the same
810
908
  begin
811
- TreeHaver::Language.from_library(...)
812
- rescue TreeHaver::NotAvailable => e # TreeHaver's unified exception
813
- # Clear and consistent when using TreeHaver
909
+ TreeSitter::Language.load("missing", "/nonexistent.so")
910
+ rescue TreeSitter::Error => e # Still requires explicit rescue
911
+ puts "Error: #{e.message}"
814
912
  end
815
913
  ```
816
914
 
817
- **Why TreeHaver uses StandardError:**
915
+ **Best Practices:**
916
+
917
+ 1. **Always use explicit rescue** for TreeHaver errors:
918
+ ```ruby
919
+ begin
920
+ finder = TreeHaver::GrammarFinder.new(:toml)
921
+ finder.register! if finder.available?
922
+ language = TreeHaver::Language.toml
923
+ rescue TreeHaver::NotAvailable => e
924
+ warn("TOML grammar not available: #{e.message}")
925
+ # Fallback to another backend or fail gracefully
926
+ end
927
+ ```
928
+
929
+ 2. **Never rely on `rescue => e`** to catch TreeHaver errors (it won't work)
930
+
931
+ **Why inherit from Exception?**
818
932
 
819
- 1. **Ruby Best Practice**: The [Ruby style guide](https://rubystyle.guide/#exception-flow-control) recommends inheriting from `StandardError`
820
- 2. **Safety**: Inheriting from `Exception` can catch system signals (`SIGTERM`, `SIGINT`) and `exit`, which is dangerous
821
- 3. **Consistency**: Most Ruby libraries follow this convention
822
- 4. **Testability**: StandardError exceptions are easier to test and mock
933
+ Following ruby_tree_sitter's reasoning:
934
+ - **Thread safety**: Prevents accidental catching in thread cleanup code
935
+ - **Signal handling**: Ensures parsing errors don't interfere with SIGTERM/SIGINT
936
+ - **Intentional handling**: Forces developers to explicitly handle parsing errors
823
937
 
824
- See `lib/tree_haver/compat.rb` for detailed documentation.
938
+ See `lib/tree_haver/compat.rb` for compatibility layer documentation.
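
The rescue behavior described in this section is plain Ruby semantics and can be checked without any parsing library. The sketch below uses stand-in classes (`DemoError`, `DemoNotAvailable`) that mirror the documented hierarchy; they are defined here for illustration and are not TreeHaver's own classes.

```ruby
# Stand-in classes that mirror the hierarchy described above.
# They are defined here for illustration; they are not TreeHaver's classes.
class DemoError < Exception; end
class DemoNotAvailable < DemoError; end

def load_grammar
  raise DemoNotAvailable, "grammar not found"
end

results = []

# A bare rescue only handles StandardError, so DemoNotAvailable passes through
# the inner rescue and is caught by the explicit outer one.
begin
  begin
    load_grammar
  rescue => e
    results << "bare rescue caught #{e.class}"
  end
rescue DemoError => e
  results << "explicit rescue caught #{e.class}"
end

puts results
```

Only the explicit `rescue DemoError` branch runs, which is why code that wraps parsing calls needs to name the error class it expects.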
825
939
 
826
940
  ## 🔧 Basic Usage
827
941
 
828
942
  ### Quick Start
829
943
 
830
- Here's a complete example of parsing TOML with TreeHaver:
944
+ The simplest way to parse code is with `TreeHaver.parser_for`, which handles all the complexity of language loading, grammar discovery, and backend selection:
831
945
 
832
946
  ```ruby
833
947
  require "tree_haver"
834
948
 
835
- # Load a language grammar
949
+ # Parse TOML - auto-discovers grammar and falls back to Citrus if needed
950
+ parser = TreeHaver.parser_for(:toml)
951
+ tree = parser.parse("[package]\nname = \"my-app\"")
952
+
953
+ # Parse JSON
954
+ parser = TreeHaver.parser_for(:json)
955
+ tree = parser.parse('{"key": "value"}')
956
+
957
+ # Parse Bash
958
+ parser = TreeHaver.parser_for(:bash)
959
+ tree = parser.parse("#!/bin/bash\necho hello")
960
+
961
+ # With explicit library path
962
+ parser = TreeHaver.parser_for(:toml, library_path: "/custom/path/libtree-sitter-toml.so")
963
+
964
+ # With Citrus fallback configuration
965
+ parser = TreeHaver.parser_for(
966
+ :toml,
967
+ citrus_config: {gem_name: "toml-rb", grammar_const: "TomlRB::Document"},
968
+ )
969
+ ```
970
+
971
+ `TreeHaver.parser_for` handles:
972
+ 1. Checking if the language is already registered
973
+ 2. Auto-discovering tree-sitter grammar via `GrammarFinder`
974
+ 3. Falling back to Citrus grammar if tree-sitter is unavailable
975
+ 4. Creating and configuring the parser
976
+ 5. Raising `NotAvailable` with a helpful message if nothing works
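
The steps listed above form a fallback chain. Here is a minimal sketch of that control flow, assuming discovery and fallback can each be modeled as a callable that returns a parser or `nil`. `sketch_parser_for`, `SketchNotAvailable`, and the lambdas are hypothetical names and do not reflect TreeHaver's internals.

```ruby
# Hypothetical sketch of the lookup order described above.
# None of these names are TreeHaver APIs; they only model the control flow.
class SketchNotAvailable < Exception; end

# registered: languages already set up (name => parser description)
# discover:   callable standing in for tree-sitter grammar discovery
# fallback:   callable standing in for a Citrus grammar lookup
def sketch_parser_for(name, registered:, discover:, fallback:)
  # Reuse an existing registration when there is one.
  return registered[name] if registered.key?(name)

  # Try discovery first, then the pure Ruby fallback.
  found = discover.call(name) || fallback.call(name)
  raise SketchNotAvailable, "no parser available for #{name}" unless found

  # Remember the result so later calls take the fast path.
  registered[name] = found
end

registered = {}
discover = ->(name) { name == :json ? "tree-sitter parser for json" : nil }
fallback = ->(name) { name == :toml ? "citrus parser for toml" : nil }

json_parser = sketch_parser_for(:json, registered: registered, discover: discover, fallback: fallback)
toml_parser = sketch_parser_for(:toml, registered: registered, discover: discover, fallback: fallback)

missing_message =
  begin
    sketch_parser_for(:cobol, registered: registered, discover: discover, fallback: fallback)
  rescue SketchNotAvailable => e
    e.message
  end

puts json_parser
puts toml_parser
puts missing_message
```

The error class inherits from `Exception` here only to mirror the hierarchy this README describes for TreeHaver errors.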
977
+
978
+ ### Manual Parser Setup
979
+
980
+ For more control, you can create parsers manually.
981
+
982
+ TreeHaver works with any language through its 10 backends. Here are examples for different parsing needs:
983
+
984
+ #### Parsing with Tree-sitter (Universal Languages)
985
+
986
+ ```ruby
987
+ require "tree_haver"
988
+
989
+ # Load a tree-sitter grammar (works with MRI, Rust, FFI, or Java backend)
836
990
  language = TreeHaver::Language.from_library(
837
991
  "/usr/local/lib/libtree-sitter-toml.so",
838
992
  symbol: "tree_sitter_toml",
@@ -842,7 +996,7 @@ language = TreeHaver::Language.from_library(
842
996
  parser = TreeHaver::Parser.new
843
997
  parser.language = language
844
998
 
845
- # Parse some source code
999
+ # Parse source code
846
1000
  source = <<~TOML
847
1001
  [package]
848
1002
  name = "my-app"
@@ -851,16 +1005,116 @@ TOML
851
1005
 
852
1006
  tree = parser.parse(source)
853
1007
 
854
- # Access the root node
1008
+ # Access the unified Position API (works across all backends)
855
1009
  root = tree.root_node
856
- puts "Root node type: #{root.type}" # => "document"
1010
+ puts "Root type: #{root.type}" # => "document"
1011
+ puts "Start line: #{root.start_line}" # => 1 (1-based)
1012
+ puts "End line: #{root.end_line}" # => 3
1013
+ puts "Position: #{root.source_position}" # => {start_line: 1, end_line: 3, ...}
857
1014
 
858
1015
  # Traverse the tree
859
1016
  root.each do |child|
860
- puts "Child type: #{child.type}"
861
- child.each do |grandchild|
862
- puts " Grandchild type: #{grandchild.type}"
1017
+ puts "Child: #{child.type} at line #{child.start_line}"
1018
+ end
1019
+ ```
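
The 1-based line numbers in the example above differ from tree-sitter's native points, which use zero-based rows. One way to picture a unified position layer is a thin wrapper that normalizes the raw values. This is a sketch under stated assumptions: `RawPoint`, `RawNode`, and `DemoNode` are hypothetical, and the `start_column` / `end_column` keys are guesses at what the elided part of `source_position` might hold.

```ruby
# Hypothetical raw node, shaped like what a tree-sitter binding might report:
# zero-based row/column points.
RawPoint = Struct.new(:row, :column)
RawNode = Struct.new(:type, :start_point, :end_point)

# Hypothetical wrapper; NOT TreeHaver's Node class.
class DemoNode
  def initialize(raw)
    @raw = raw
  end

  def type
    @raw.type
  end

  # Convert zero-based rows to the 1-based lines that editors display.
  def start_line
    @raw.start_point.row + 1
  end

  def end_line
    @raw.end_point.row + 1
  end

  # Column keys are an assumption made for this sketch.
  def source_position
    {
      start_line: start_line,
      end_line: end_line,
      start_column: @raw.start_point.column,
      end_column: @raw.end_point.column,
    }
  end
end

raw = RawNode.new("document", RawPoint.new(0, 0), RawPoint.new(2, 15))
node = DemoNode.new(raw)
puts node.source_position.inspect
```

Keeping the conversion in one wrapper means callers never need to know which backend produced the node or how it counts lines.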
1020
+
1021
+ #### Parsing Ruby with Prism
1022
+
1023
+ ```ruby
1024
+ require "tree_haver"
1025
+
1026
+ TreeHaver.backend = :prism
1027
+ parser = TreeHaver::Parser.new
1028
+ parser.language = TreeHaver::Backends::Prism::Language.ruby
1029
+
1030
+ source = <<~RUBY
1031
+ class Example
1032
+ def hello
1033
+ puts "Hello, world!"
1034
+ end
863
1035
  end
1036
+ RUBY
1037
+
1038
+ tree = parser.parse(source)
1039
+ root = tree.root_node
1040
+
1041
+ # Find all method definitions
1042
+ def find_methods(node, results = [])
1043
+ results << node if node.type == "def_node"
1044
+ node.children.each { |child| find_methods(child, results) }
1045
+ results
1046
+ end
1047
+
1048
+ methods = find_methods(root)
1049
+ methods.each do |method_node|
1050
+ pos = method_node.source_position
1051
+ puts "Method at lines #{pos[:start_line]}-#{pos[:end_line]}"
1052
+ end
1053
+ ```
1054
+
1055
+ #### Parsing YAML with Psych
1056
+
1057
+ ```ruby
1058
+ require "tree_haver"
1059
+
1060
+ TreeHaver.backend = :psych
1061
+ parser = TreeHaver::Parser.new
1062
+ parser.language = TreeHaver::Backends::Psych::Language.yaml
1063
+
1064
+ source = <<~YAML
1065
+ database:
1066
+ host: localhost
1067
+ port: 5432
1068
+ YAML
1069
+
1070
+ tree = parser.parse(source)
1071
+ root = tree.root_node
1072
+
1073
+ # Navigate YAML structure
1074
+ def show_structure(node, indent = 0)
1075
+ prefix = " " * indent
1076
+ puts "#{prefix}#{node.type} (line #{node.start_line})"
1077
+ node.children.each { |child| show_structure(child, indent + 1) }
1078
+ end
1079
+
1080
+ show_structure(root)
1081
+ ```
1082
+
1083
+ #### Parsing Markdown with Commonmarker or Markly
1084
+
1085
+ ```ruby
1086
+ require "tree_haver"
1087
+
1088
+ # Choose your backend
1089
+ TreeHaver.backend = :commonmarker # or :markly for GFM
1090
+
1091
+ parser = TreeHaver::Parser.new
1092
+ parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
1093
+
1094
+ source = <<~MARKDOWN
1095
+ # My Document
1096
+
1097
+ ## Section
1098
+
1099
+ - Item 1
1100
+ - Item 2
1101
+ MARKDOWN
1102
+
1103
+ tree = parser.parse(source)
1104
+ root = tree.root_node
1105
+
1106
+ # Find all headings
1107
+ def find_headings(node, results = [])
1108
+ results << node if node.type == "heading"
1109
+ node.children.each { |child| find_headings(child, results) }
1110
+ results
1111
+ end
1112
+
1113
+ headings = find_headings(root)
1114
+ headings.each do |heading|
1115
+ level = heading.header_level
1116
+ text = heading.children.map(&:text).join
1117
+ puts "H#{level}: #{text} (line #{heading.start_line})"
864
1118
  end
865
1119
  ```
866
1120
 
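The `find_methods` and `find_headings` helpers added in this hunk share one shape: a depth-first walk over `children` that collects nodes by `type`. A minimal sketch of that pattern as a single generic helper follows, combined with the 1-based `start_line` / `end_line` values to slice a node's text out of the source. `FakeNode`, `find_nodes`, and `lines_for` are hypothetical names used only for illustration, and the `"program_node"` / `"class_node"` types are stand-ins; real nodes would come from `parser.parse(source).root_node`.

```ruby
# Stand-in node type for illustration only; it mimics the four readers the
# README examples rely on (type, start_line, end_line, children).
FakeNode = Struct.new(:type, :start_line, :end_line, :children)

# Depth-first search collecting every node whose type matches.
def find_nodes(node, type, results = [])
  results << node if node.type == type
  node.children.each { |child| find_nodes(child, type, results) }
  results
end

# Return the source text a node spans, assuming 1-based, inclusive line numbers.
def lines_for(source, node)
  source.lines[(node.start_line - 1)..(node.end_line - 1)].join
end

SOURCE = <<~RUBY
  class Example
    def hello
      puts "Hello, world!"
    end
  end
RUBY

# Hand-built tree standing in for a parsed result.
def_node = FakeNode.new("def_node", 2, 4, [])
class_node = FakeNode.new("class_node", 1, 5, [def_node])
root = FakeNode.new("program_node", 1, 5, [class_node])

METHOD_NODES = find_nodes(root, "def_node")
METHOD_NODES.each { |node| puts lines_for(SOURCE, node) }
```

Because the helper only touches `type` and `children`, the same walk should apply to any backend that exposes those two readers, which is the point of the unified node API shown in the hunk above.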
@@ -989,9 +1243,9 @@ tree.edit(
989
1243
  new_tree = parser.parse_string(tree, "x = 42")
990
1244
  ```
991
1245
 
992
- **Note:** Incremental parsing requires the MRI (`ruby_tree_sitter`), Rust (`tree_stump`), or Java (`java-tree-sitter`) backend. The FFI and Citrus backends do not currently support incremental parsing. You can check support with:
1246
+ **Note:** Incremental parsing requires the MRI (`ruby_tree_sitter`), Rust (`tree_stump`), or Java (`java-tree-sitter` / `jtreesitter`) backend. The FFI and Citrus backends do not currently support incremental parsing. You can check support with:
993
1247
 
994
- **Note:** `tree_stump` requires [pboling's fork (tree_haver branch)](https://github.com/pboling/tree_stump/tree/tree_haver) until PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), [#13](https://github.com/joker1007/tree_stump/pull/13) are merged.
1248
+ **Note:** `tree_stump` currently requires unreleased fixes in the `main` branch.
995
1249
 
996
1250
  ```ruby
997
1251
  tree.supports_editing? # => true if edit() is available
@@ -1095,7 +1349,7 @@ The FFI backend uses Ruby's FFI gem which relies on the system's dynamic linker,
1095
1349
 
1096
1350
  The Java backend will work with:
1097
1351
 
1098
- - Grammar JARs built specifically for java-tree-sitter (self-contained)
1352
+ - Grammar JARs built specifically for java-tree-sitter / jtreesitter (self-contained, [docs](https://tree-sitter.github.io/java-tree-sitter/), [maven](https://central.sonatype.com/artifact/io.github.tree-sitter/jtreesitter), [source](https://github.com/tree-sitter/java-tree-sitter))
1099
1353
  - Grammar `.so` files that statically link tree-sitter
1100
1354
 
1101
1355
  **Option 3: Citrus Backend (pure Ruby, portable)**
@@ -1159,7 +1413,7 @@ RSpec.describe("MyParser") do
1159
1413
  TreeHaver.with_backend(backend_name) do
1160
1414
  parser = TreeHaver::Parser.new
1161
1415
  result = parser.parse("x = 42")
1162
- expect(result.root_node.type).to eq("document")
1416
+ expect(result.root_node.type).to(eq("document"))
1163
1417
  end
1164
1418
  # Backend automatically restored after block
1165
1419
  end
@@ -1274,6 +1528,76 @@ TOML
1274
1528
  package_name = extract_package_name(toml)
1275
1529
  ```
1276
1530
 
1531
+ ### 🧪 RSpec Integration
1532
+
1533
+ TreeHaver provides shared RSpec helpers for conditional test execution based on dependency availability. This is useful for testing code that uses optional backends.
1534
+
1535
+ ```ruby
1536
+ # In your spec_helper.rb
1537
+ require "tree_haver/rspec"
1538
+ ```
1539
+
1540
+ Requiring `tree_haver/rspec` automatically configures RSpec with exclusion filters for all TreeHaver dependencies. Use tags to run tests conditionally:
1541
+
1542
+ ```ruby
1543
+ # Runs only when FFI backend is available
1544
+ it "parses with FFI", :ffi do
1545
+ # ...
1546
+ end
1547
+
1548
+ # Runs only when ruby_tree_sitter gem is available
1549
+ it "uses MRI backend", :mri_backend do
1550
+ # ...
1551
+ end
1552
+
1553
+ # Runs only when tree-sitter-toml grammar works
1554
+ it "parses TOML", :tree_sitter_toml do
1555
+ # ...
1556
+ end
1557
+
1558
+ # Runs only when any markdown backend is available
1559
+ it "parses markdown", :markdown_backend do
1560
+ # ...
1561
+ end
1562
+ ```
1563
+
1564
+ **Available Tags:**
1565
+
1566
+ | Tag | Description |
1567
+ |-----|-------------|
1568
+ | `:ffi` | FFI backend available (dynamic check) |
1569
+ | `:mri_backend` | ruby_tree_sitter gem available |
1570
+ | `:rust_backend` | tree_stump gem available |
1571
+ | `:java_backend` | Java backend available (JRuby) |
1572
+ | `:prism_backend` | Prism gem available |
1573
+ | `:psych_backend` | Psych available (stdlib) |
1574
+ | `:commonmarker` | commonmarker gem available |
1575
+ | `:markly` | markly gem available |
1576
+ | `:citrus_toml` | toml-rb with Citrus grammar available |
1577
+ | `:jruby` | Running on JRuby |
1578
+ | `:truffleruby` | Running on TruffleRuby |
1579
+ | `:mri` | Running on MRI (CRuby) |
1580
+ | `:tree_sitter_bash` | Bash grammar available and working |
1581
+ | `:tree_sitter_toml` | TOML grammar available and working |
1582
+ | `:tree_sitter_json` | JSON grammar available and working |
1583
+ | `:tree_sitter_jsonc` | JSONC grammar available and working |
1584
+ | `:toml_backend` | Any TOML backend available |
1585
+ | `:markdown_backend` | Any markdown backend available |
1586
+ | `:toml_merge` | toml-merge gem functional |
1587
+ | `:json_merge` | json-merge gem functional |
1588
+ | `:prism_merge` | prism-merge gem functional |
1589
+ | `:psych_merge` | psych-merge gem functional |
1590
+
1591
+ All tags have negated versions (e.g., `:not_mri_backend`, `:not_jruby`) for testing fallback behavior.
1592
+
1593
+ **Debug Output:**
1594
+
1595
+ Set `TREE_HAVER_DEBUG=1` to print a dependency summary at the start of your test suite:
1596
+
1597
+ ```bash
1598
+ TREE_HAVER_DEBUG=1 bundle exec rspec
1599
+ ```
1600
+
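The relationship between a tag and its negated `:not_*` counterpart can be sketched without RSpec loaded. The helper below is a hypothetical illustration, not TreeHaver's implementation: given a map of tag to availability, it builds the set of tags to exclude, so that `:ffi` examples are skipped when FFI is missing and `:not_ffi` examples are skipped when it is present. The `exclusion_filters` name and the sample availability values are assumptions.

```ruby
# Hypothetical helper, NOT TreeHaver's actual code: derive exclusion filters
# from an availability map of tag => true/false.
def exclusion_filters(availability)
  availability.each_with_object({}) do |(tag, available), filters|
    # Available dependency: skip the negated examples.
    # Missing dependency: skip the examples that need it.
    excluded = available ? :"not_#{tag}" : tag
    filters[excluded] = true
  end
end

# Sample values for illustration; a real suite would probe the environment.
AVAILABILITY = { ffi: false, mri_backend: true, jruby: false }.freeze
FILTERS = exclusion_filters(AVAILABILITY)
p FILTERS

# With RSpec loaded, a hash like this could be applied with the standard
# filtering API, for example:
#   RSpec.configure { |config| config.filter_run_excluding(FILTERS) }
```

Under this scheme each example carries at most one of the two tags, so exactly one side of every pair runs in a given environment.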
1277
1601
  ## 🦷 FLOSS Funding
1278
1602
 
1279
1603
  While kettle-rb tools are free software and always will be, the project would benefit immensely from some funding.
@@ -1304,9 +1628,7 @@ Support us with a monthly donation and help us continue our activities. [[Become
1304
1628
  NOTE: [kettle-readme-backers][kettle-readme-backers] updates this list every day, automatically.
1305
1629
 
1306
1630
  <!-- OPENCOLLECTIVE-INDIVIDUALS:START -->
1307
-
1308
1631
  No backers yet. Be the first!
1309
-
1310
1632
  <!-- OPENCOLLECTIVE-INDIVIDUALS:END -->
1311
1633
 
1312
1634
  ### Open Collective for Organizations
@@ -1316,9 +1638,7 @@ Become a sponsor and get your logo on our README on GitHub with a link to your s
1316
1638
  NOTE: [kettle-readme-backers][kettle-readme-backers] updates this list every day, automatically.
1317
1639
 
1318
1640
  <!-- OPENCOLLECTIVE-ORGANIZATIONS:START -->
1319
-
1320
1641
  No sponsors yet. Be the first!
1321
-
1322
1642
  <!-- OPENCOLLECTIVE-ORGANIZATIONS:END -->
1323
1643
 
1324
1644
  [kettle-readme-backers]: https://github.com/kettle-rb/tree_haver/blob/main/exe/kettle-readme-backers
@@ -1623,7 +1943,7 @@ Thanks for RTFM. ☺️
1623
1943
  [📌gitmoji]: https://gitmoji.dev
1624
1944
  [📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
1625
1945
  [🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
1626
- [🧮kloc-img]: https://img.shields.io/badge/KLOC-1.067-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
1946
+ [🧮kloc-img]: https://img.shields.io/badge/KLOC-2.461-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
1627
1947
  [🔐security]: SECURITY.md
1628
1948
  [🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
1629
1949
  [📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year
@@ -1643,3 +1963,6 @@ Thanks for RTFM. ☺️
1643
1963
  [💎appraisal2]: https://github.com/appraisal-rb/appraisal2
1644
1964
  [💎appraisal2-img]: https://img.shields.io/badge/appraised_by-appraisal2-34495e.svg?plastic&logo=ruby&logoColor=white
1645
1965
  [💎d-in-dvcs]: https://railsbling.com/posts/dvcs/put_the_d_in_dvcs/
1966
+
1967
+ [ts-jsonc]: https://gitlab.com/WhyNotHugo/tree-sitter-jsonc
1968
+ [dotenv]: https://github.com/bkeepers/dotenv