tree_haver 2.0.0 → 3.1.0

data/README.md CHANGED
@@ -54,20 +54,24 @@
54
54
 
55
55
  ## 🌻 Synopsis
56
56
 
57
- TreeHaver is a cross-Ruby adapter for the [tree-sitter](https://tree-sitter.github.io/tree-sitter/) parsing library that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using tree-sitter grammars, regardless of your Ruby implementation.
57
+ TreeHaver is a cross-Ruby adapter for the [tree-sitter](https://tree-sitter.github.io/tree-sitter/) and [Citrus](https://github.com/mjackson/citrus) parsing libraries and other dedicated parsing tools that works seamlessly across MRI Ruby, JRuby, and TruffleRuby. It provides a unified API for parsing source code using grammars, regardless of your Ruby implementation.
58
58
 
59
59
  ### The Adapter Pattern: Like Faraday, but for Parsing
60
60
 
61
61
  If you've used [Faraday](https://github.com/lostisland/faraday), [multi_json](https://github.com/intridea/multi_json), or [multi_xml](https://github.com/sferik/multi_xml), you'll feel right at home with TreeHaver. These gems share a common philosophy:
62
62
 
63
- | Gem | Unified API for | Backend Examples |
64
- |----------------|---------------------|------------------------------------------------------|
65
- | **Faraday** | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
66
- | **multi_json** | JSON parsing | Oj, Yajl, JSON gem |
67
- | **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
68
- | **TreeHaver** | tree-sitter parsing | ruby_tree_sitter, tree_stump, FFI, Java JARs, Citrus |
63
+ | Gem | Unified API for | Backend Examples |
64
+ |----------------|---------------------|--------------------------------------------------------------------------|
65
+ | **Faraday** | HTTP requests | Net::HTTP, Typhoeus, Patron, Excon |
66
+ | **multi_json** | JSON parsing | Oj, Yajl, JSON gem |
67
+ | **multi_xml** | XML parsing | Nokogiri, LibXML, Ox |
68
+ | **TreeHaver** | Code parsing | MRI, Rust, FFI, Java, Prism, Psych, Commonmarker, Markly, Citrus (& Co.) |
69
69
 
70
- **Write once, run anywhere.** Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap tree-sitter backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
70
+ **Write once, run anywhere.**
71
+
72
+ **Learn once, write anywhere.**
73
+
74
+ Just as Faraday lets you swap HTTP adapters without changing your code, TreeHaver lets you swap parsing backends. Your parsing code remains the same whether you're running on MRI with native C extensions, JRuby with FFI, or TruffleRuby.
71
75
 
72
76
  ```ruby
73
77
  # Your code stays the same regardless of backend
@@ -76,7 +80,7 @@ parser.language = TreeHaver::Language.from_library("/path/to/grammar.so")
76
80
  tree = parser.parse(source_code)
77
81
 
78
82
  # TreeHaver automatically picks the best backend:
79
- # - MRI → ruby_tree_sitter (C extension)
83
+ # - MRI → ruby_tree_sitter (C extensions)
80
84
  # - JRuby → FFI (system's libtree-sitter)
81
85
  # - TruffleRuby → FFI or MRI backend
82
86
  ```
@@ -84,18 +88,94 @@ tree = parser.parse(source_code)
84
88
  ### Key Features
85
89
 
86
90
  - **Universal Ruby Support**: Works on MRI Ruby, JRuby, and TruffleRuby
87
- - **Multiple Backends**:
88
- - **MRI Backend**: Leverages the excellent [`ruby_tree_sitter`](https://github.com/Faveod/ruby-tree-sitter) gem (C extension)
89
- - **Rust Backend**: Uses [`tree_stump`](https://github.com/anthropics/tree_stump) gem (Rust extension with precompiled binaries)
90
- - **Note**: Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13 (inclusive of the others)](https://github.com/joker1007/tree_stump/pull/13) are merged
91
- - **FFI Backend**: Pure Ruby FFI bindings to `libtree-sitter` (ideal for JRuby)
92
- - **Java Backend**: Support for JRuby's native Java integration, and native java-tree-sitter grammar JARs
93
- - **Citrus Backend**: Pure Ruby parser using [`citrus`](https://github.com/mjackson/citrus) gem (no native dependencies, portable)
91
+ - **10 Parsing Backends** - Choose the right backend for your needs:
92
+ - **Tree-sitter Backends** (high-performance, incremental parsing):
93
+ - **MRI Backend**: Leverages [`ruby_tree_sitter`](https://github.com/Faveod/ruby-tree-sitter) gem (C extension, fastest on MRI)
94
+ - **Rust Backend**: Uses [`tree_stump`](https://github.com/anthropics/tree_stump) gem (Rust with precompiled binaries)
95
+ - **Note**: Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13](https://github.com/joker1007/tree_stump/pull/13) are merged
96
+ - **FFI Backend**: Pure Ruby FFI bindings to `libtree-sitter` (ideal for JRuby, TruffleRuby)
97
+ - **Java Backend**: Native Java integration for JRuby with java-tree-sitter grammar JARs
98
+ - **Language-Specific Backends** (native parser integration):
99
+ - **Prism Backend**: Ruby's official parser ([Prism](https://github.com/ruby/prism), stdlib in Ruby 3.4+)
100
+ - **Psych Backend**: Ruby's YAML parser ([Psych](https://github.com/ruby/psych), stdlib)
101
+ - **Commonmarker Backend**: Fast Markdown parser ([Commonmarker](https://github.com/gjtorikian/commonmarker), comrak Rust)
102
+ - **Markly Backend**: GitHub Flavored Markdown ([Markly](https://github.com/ioquatix/markly), cmark-gfm C)
103
+ - **Pure Ruby Fallback**:
104
+ - **Citrus Backend**: Pure Ruby parsing via [`citrus`](https://github.com/mjackson/citrus) (no native dependencies)
94
105
  - **Automatic Backend Selection**: Intelligently selects the best backend for your Ruby implementation
95
- - **Language Agnostic**: Load any tree-sitter grammar dynamically (TOML, JSON, Ruby, JavaScript, etc.)
106
+ - **Language Agnostic**: Parse any language - Ruby, Markdown, YAML, JSON, Bash, TOML, JavaScript, etc.
96
107
  - **Grammar Discovery**: Built-in `GrammarFinder` utility for platform-aware grammar library discovery
108
+ - **Unified Position API**: Consistent `start_line`, `end_line`, `source_position` across all backends
97
109
  - **Thread-Safe**: Built-in language registry with thread-safe caching
98
- - **Minimal API Surface**: Simple, focused API that covers the most common tree-sitter use cases
110
+ - **Minimal API Surface**: Simple, focused API that covers the most common use cases
111
+
112
+ ### Backend Requirements
113
+
114
+ TreeHaver has minimal dependencies and automatically selects the best backend for your Ruby implementation. Each backend has specific version requirements:
115
+
116
+ #### MRI Backend (ruby_tree_sitter, C extensions)
117
+
118
+ **Requires `ruby_tree_sitter` v2.0+**
119
+
120
+ In ruby_tree_sitter v2.0, all TreeSitter exceptions were changed to inherit from `Exception` (not `StandardError`). This was an intentional breaking change made for thread-safety and signal handling reasons.
121
+
122
+ **Exception Mapping**: TreeHaver catches `TreeSitter::TreeSitterError` and its subclasses, converting them to `TreeHaver::NotAvailable` while preserving the original error message. This provides a consistent exception API across all backends:
123
+
124
+ | ruby_tree_sitter Exception | TreeHaver Exception | When It Occurs |
125
+ |-------------------------------------|----------------------------|------------------------------------------------|
126
+ | `TreeSitter::ParserNotFoundError` | `TreeHaver::NotAvailable` | Parser library file cannot be loaded |
127
+ | `TreeSitter::LanguageLoadError` | `TreeHaver::NotAvailable` | Language symbol loads but returns nothing |
128
+ | `TreeSitter::SymbolNotFoundError` | `TreeHaver::NotAvailable` | Symbol not found in library |
129
+ | `TreeSitter::ParserVersionError` | `TreeHaver::NotAvailable` | Parser version incompatible with tree-sitter |
130
+ | `TreeSitter::QueryCreationError` | `TreeHaver::NotAvailable` | Query creation fails |
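The mapping in the table above can be illustrated with a minimal, self-contained sketch. `DemoBackend`, `DemoAdapter`, and `translate_errors` are hypothetical names invented for this example; they are not the real ruby_tree_sitter or TreeHaver classes, and the real conversion may be implemented differently. The sketch only shows the general technique: rescue the backend's error family and re-raise a single adapter error that keeps the original message.

```ruby
# Hypothetical stand-ins; NOT the real ruby_tree_sitter or TreeHaver classes.
module DemoBackend
  # Like ruby_tree_sitter v2.0 errors, these inherit from Exception.
  class BackendError < Exception; end
  class ParserNotFoundError < BackendError; end
end

module DemoAdapter
  class Error < Exception; end
  class NotAvailable < Error; end

  # Runs the block and re-raises any backend error as NotAvailable,
  # keeping the original message. Ruby records the original error as #cause.
  def self.translate_errors
    yield
  rescue DemoBackend::BackendError => e
    raise NotAvailable, e.message
  end
end

begin
  DemoAdapter.translate_errors do
    raise DemoBackend::ParserNotFoundError, "cannot load /path/to/grammar.so"
  end
rescue DemoAdapter::NotAvailable => e
  puts "#{e.class}: #{e.message}"
  # prints "DemoAdapter::NotAvailable: cannot load /path/to/grammar.so"
end
```

Callers then only need to rescue one error class, whichever backend raised the underlying failure.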
131
+
132
+ ```ruby
133
+ # Add to your Gemfile for MRI backend
134
+ gem "ruby_tree_sitter", "~> 2.0"
135
+ ```
136
+
137
+ #### Rust Backend (tree_stump)
138
+
139
+ Currently requires [pboling's fork](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs are merged.
140
+
141
+ ```ruby
142
+ # Add to your Gemfile for Rust backend
143
+ gem "tree_stump", github: "pboling/tree_stump", branch: "tree_haver"
144
+ ```
145
+
146
+ #### FFI Backend
147
+
148
+ Requires the `ffi` gem and a system installation of `libtree-sitter`:
149
+
150
+ ```ruby
151
+ # Add to your Gemfile for FFI backend
152
+ gem "ffi", ">= 1.15", "< 2.0"
153
+ ```
154
+
155
+ ```bash
156
+ # Install libtree-sitter on your system:
157
+ # macOS
158
+ brew install tree-sitter
159
+
160
+ # Ubuntu/Debian
161
+ apt-get install libtree-sitter0 libtree-sitter-dev
162
+
163
+ # Fedora
164
+ dnf install tree-sitter tree-sitter-devel
165
+ ```
166
+
167
+ #### Citrus Backend
168
+
169
+ Pure Ruby parser with no native dependencies:
170
+
171
+ ```ruby
172
+ # Add to your Gemfile for Citrus backend
173
+ gem "citrus", "~> 3.0"
174
+ ```
175
+
176
+ #### Java Backend (JRuby only)
177
+
178
+ No additional dependencies required beyond grammar JARs built for java-tree-sitter.
99
179
 
100
180
  ### Why TreeHaver?
101
181
 
@@ -132,7 +212,7 @@ TreeHaver solves these problems by providing a unified API that automatically se
132
212
 
133
213
  **Note:** Java backend works with grammar JARs built specifically for java-tree-sitter, or grammar .so files that statically link tree-sitter. This is why FFI is recommended for JRuby & TruffleRuby.
134
214
 
135
- **Note:** TreeHaver can use `ruby_tree_sitter` or `tree_stump` as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
215
+ **Note:** TreeHaver can use `ruby_tree_sitter` (MRI), `tree_stump` (MRI, JRuby?), or `jruby-tree-sitter` (JRuby) as backends, giving you TreeHaver's unified API, grammar discovery, and security features, plus full access to incremental parsing when using those backends.
136
216
 
137
217
  **Note:** `tree_stump` currently requires [pboling's fork (tree_haver branch)](https://github.com/pboling/tree_stump/tree/tree_haver) until upstream PRs [#5](https://github.com/joker1007/tree_stump/pull/5), [#7](https://github.com/joker1007/tree_stump/pull/7), [#11](https://github.com/joker1007/tree_stump/pull/11), and [#13](https://github.com/joker1007/tree_stump/pull/13) are merged.
138
218
 
@@ -281,6 +361,133 @@ NOTE: Be prepared to track down certs for signed gems and add them the same way
281
361
 
282
362
  ## ⚙️ Configuration
283
363
 
364
+ ### Available Backends
365
+
366
+ TreeHaver supports 10 parsing backends, each with different trade-offs. The `auto` backend automatically selects the best available option.
367
+
368
+ #### Tree-sitter Backends (Universal Parsing)
369
+
370
+ | Backend | Description | Performance | Portability | Examples |
371
+ |---------|-------------|-------------|-------------|----------|
372
+ | **Auto** | Auto-selects best backend | Varies | ✅ Universal | [JSON](examples/auto_json.rb) · [JSONC](examples/auto_jsonc.rb) · [Bash](examples/auto_bash.rb) · [TOML](examples/auto_toml.rb) |
373
+ | **MRI** | C extension via ruby_tree_sitter | ⚡ Fastest | MRI only | [JSON](examples/mri_json.rb) · [JSONC](examples/mri_jsonc.rb) · ~~Bash~~* · [TOML](examples/mri_toml.rb) |
374
+ | **Rust** | Precompiled via tree_stump | ⚡ Very Fast | ✅ Good | [JSON](examples/rust_json.rb) · [JSONC](examples/rust_jsonc.rb) · ~~Bash~~* · [TOML](examples/rust_toml.rb) |
375
+ | **FFI** | Dynamic linking via FFI | 🔵 Fast | ✅ Universal | [JSON](examples/ffi_json.rb) · [JSONC](examples/ffi_jsonc.rb) · [Bash](examples/ffi_bash.rb) · [TOML](examples/ffi_toml.rb) |
376
+ | **Java** | JNI bindings | ⚡ Very Fast | JRuby only | [JSON](examples/java_json.rb) · [JSONC](examples/java_jsonc.rb) · [Bash](examples/java_bash.rb) · [TOML](examples/java_toml.rb) |
377
+
378
+ #### Language-Specific Backends (Native Parser Integration)
379
+
380
+ | Backend | Description | Performance | Portability | Examples |
381
+ |---------|-------------|-------------|-------------|----------|
382
+ | **Prism** | Ruby's official parser | ⚡ Very Fast | ✅ Universal | [Ruby](examples/prism_ruby.rb) |
383
+ | **Psych** | Ruby's YAML parser (stdlib) | ⚡ Very Fast | ✅ Universal | [YAML](examples/psych_yaml.rb) |
384
+ | **Commonmarker** | Markdown via comrak (Rust) | ⚡ Very Fast | ✅ Good | [Markdown](examples/commonmarker_markdown.rb) · [Merge](examples/commonmarker_merge_example.rb) |
385
+ | **Markly** | GFM via cmark-gfm (C) | ⚡ Very Fast | ✅ Good | [Markdown](examples/markly_markdown.rb) · [Merge](examples/markly_merge_example.rb) |
386
+ | **Citrus** | Pure Ruby parsing | 🟡 Slower | ✅ Universal | [TOML](examples/citrus_toml.rb) · [Finitio](examples/citrus_finitio.rb) · [Dhall](examples/citrus_dhall.rb) |
387
+
388
+ **Selection Priority (Auto mode):** MRI → Rust → FFI → Java → Prism → Psych → Commonmarker → Markly → Citrus
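As a rough illustration of a priority-ordered selection, the sketch below walks a list in the order quoted above and returns the first entry that passes an availability check. `DEMO_PRIORITY` and `pick_backend` are hypothetical names, and the availability check is supplied by the caller; TreeHaver's real auto-selection may also account for the Ruby implementation and the language being parsed, which this sketch ignores.

```ruby
# Hypothetical priority list mirroring the order quoted above.
DEMO_PRIORITY = %i[mri rust ffi java prism psych commonmarker markly citrus].freeze

# Returns the first backend in the priority list for which the
# caller-supplied availability check is true, or nil if none qualifies.
def pick_backend(priority = DEMO_PRIORITY)
  priority.find { |name| yield(name) }
end

# Pretend only these two backends are installed.
installed = %i[citrus ffi]
chosen = pick_backend { |name| installed.include?(name) }
puts chosen.inspect # prints ":ffi" because :ffi precedes :citrus in the list
```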
389
+
390
+ **Known Issues:**
391
+ - *MRI + Bash: ABI incompatibility (use FFI instead)
392
+ - *Rust + Bash: Version mismatch (use FFI instead)
393
+
394
+ **Backend Requirements:**
395
+
396
+ ```ruby
397
+ # Tree-sitter backends
398
+ gem "ruby_tree_sitter", "~> 2.0" # MRI backend
399
+ gem "tree_stump" # Rust backend
400
+ gem "ffi", ">= 1.15", "< 2.0" # FFI backend
401
+ # Java backend: no gem required (uses JRuby's built-in JNI)
402
+
403
+ # Language-specific backends
404
+ gem "prism", "~> 1.0" # Ruby parsing (stdlib in Ruby 3.4+)
405
+ # Psych: no gem required (Ruby stdlib)
406
+ gem "commonmarker", ">= 0.23" # Markdown parsing (comrak)
407
+ gem "markly", "~> 0.11" # GFM parsing (cmark-gfm)
408
+
409
+ # Pure Ruby fallback
410
+ gem "citrus", "~> 3.0" # Citrus backend
411
+ # Plus grammar gems: toml-rb, dhall, finitio, etc.
412
+ ```
413
+
414
+ **Force Specific Backend:**
415
+
416
+ ```ruby
417
+ # Tree-sitter backends
418
+ TreeHaver.backend = :mri # Force MRI backend (ruby_tree_sitter)
419
+ TreeHaver.backend = :rust # Force Rust backend (tree_stump)
420
+ TreeHaver.backend = :ffi # Force FFI backend
421
+ TreeHaver.backend = :java # Force Java backend (JRuby only)
422
+
423
+ # Language-specific backends
424
+ TreeHaver.backend = :prism # Force Prism (Ruby parsing)
425
+ TreeHaver.backend = :psych # Force Psych (YAML parsing)
426
+ TreeHaver.backend = :commonmarker # Force Commonmarker (Markdown)
427
+ TreeHaver.backend = :markly # Force Markly (GFM Markdown)
428
+
429
+ # Pure Ruby fallback
430
+ TreeHaver.backend = :citrus # Force Citrus backend
431
+
432
+ # Auto-selection (default)
433
+ TreeHaver.backend = :auto # Let TreeHaver choose
434
+ ```
435
+
436
+ **Block-based Backend Switching:**
437
+
438
+ Use `with_backend` to temporarily switch backends for a specific block of code.
439
+ This is thread-safe and supports nesting—the previous backend is automatically
440
+ restored when the block exits (even if an exception is raised).
441
+
442
+ ```ruby
443
+ # Temporarily use a specific backend
444
+ TreeHaver.with_backend(:mri) do
445
+ parser = TreeHaver::Parser.new
446
+ tree = parser.parse(source)
447
+ # All operations in this block use the MRI backend
448
+ end
449
+ # Backend is restored to its previous value here
450
+
451
+ # Nested blocks work correctly
452
+ TreeHaver.with_backend(:rust) do
453
+ # Uses :rust
454
+ TreeHaver.with_backend(:citrus) do
455
+ # Uses :citrus
456
+ parser = TreeHaver::Parser.new
457
+ end
458
+ # Back to :rust
459
+ end
460
+ # Back to original backend
461
+ ```
462
+
463
+ This is particularly useful for:
464
+
465
+ - **Testing**: Test the same code with different backends
466
+ - **Performance comparison**: Benchmark different backends
467
+ - **Fallback scenarios**: Try one backend, fall back to another
468
+ - **Thread isolation**: Each thread can use a different backend safely
469
+
470
+ ```ruby
471
+ # Example: Testing with multiple backends
472
+ [:mri, :rust, :citrus].each do |backend_name|
473
+ TreeHaver.with_backend(backend_name) do
474
+ parser = TreeHaver::Parser.new
475
+ result = parser.parse(source)
476
+ puts "#{backend_name}: #{result.root_node.type}"
477
+ end
478
+ end
479
+ ```
480
+
481
+ **Check Backend Capabilities:**
482
+
483
+ ```ruby
484
+ TreeHaver.backend # => :ffi
485
+ TreeHaver.backend_module # => TreeHaver::Backends::FFI
486
+ TreeHaver.capabilities # => { backend: :ffi, parse: true, query: false, ... }
487
+ ```
488
+
489
+ See [examples/](examples/) directory for **26 complete working examples** demonstrating all 10 backends with multiple languages (JSON, JSONC, Bash, TOML, Ruby, YAML, Markdown) plus markdown-merge integration examples.
490
+
284
491
  ### Security Considerations
285
492
 
286
493
  **⚠️ Loading shared libraries (.so/.dylib/.dll) executes arbitrary native code.**
@@ -591,16 +798,103 @@ parser = TreeSitter::Parser.new # Actually creates TreeHaver::Parser
591
798
 
592
799
  This is safe and idempotent—if the real `TreeSitter` module is already loaded, the shim does nothing.
593
800
 
801
+ #### ⚠️ Important: Exception Hierarchy
802
+
803
+ **Both ruby_tree_sitter v2+ and TreeHaver exceptions inherit from `Exception` (not `StandardError`).**
804
+
805
+ This design decision follows ruby_tree_sitter's lead for thread-safety and signal handling reasons. See [ruby_tree_sitter PR #83](https://github.com/Faveod/ruby-tree-sitter/pull/83) for the rationale.
806
+
807
+ **What this means for exception handling:**
808
+
809
+ ```ruby
810
+ # ⚠️ This will NOT catch TreeHaver errors
811
+ begin
812
+ TreeHaver::Language.from_library("/nonexistent.so")
813
+ rescue => e
814
+ puts "Caught!" # Never reached - TreeHaver::Error inherits Exception
815
+ end
816
+
817
+ # ✅ Explicit rescue is required
818
+ begin
819
+ TreeHaver::Language.from_library("/nonexistent.so")
820
+ rescue TreeHaver::Error => e
821
+ puts "Caught!" # This works
822
+ end
823
+
824
+ # ✅ Or rescue specific exceptions
825
+ begin
826
+ TreeHaver::Language.from_library("/nonexistent.so")
827
+ rescue TreeHaver::NotAvailable => e
828
+ puts "Grammar not available: #{e.message}"
829
+ end
830
+ ```
831
+
832
+ **TreeHaver Exception Hierarchy:**
833
+
834
+ ```
835
+ Exception
836
+ └── TreeHaver::Error # Base error class
837
+ ├── TreeHaver::NotAvailable # Backend/grammar not available
838
+ └── TreeHaver::BackendConflict # Backend incompatibility detected
839
+ ```
840
+
841
+ **Compatibility Mode Behavior:**
842
+
843
+ The compat mode (`require "tree_haver/compat"`) creates aliases but **does not change the exception hierarchy**:
844
+
845
+ ```ruby
846
+ require "tree_haver/compat"
847
+
848
+ # TreeSitter constants are now aliases to TreeHaver
849
+ TreeSitter::Error # => TreeHaver::Error (still inherits Exception)
850
+ TreeSitter::Parser # => TreeHaver::Parser
851
+ TreeSitter::Language # => TreeHaver::Language
852
+
853
+ # Exception handling remains the same
854
+ begin
855
+ TreeSitter::Language.load("missing", "/nonexistent.so")
856
+ rescue TreeSitter::Error => e # Still requires explicit rescue
857
+ puts "Error: #{e.message}"
858
+ end
859
+ ```
860
+
861
+ **Best Practices:**
862
+
863
+ 1. **Always use explicit rescue** for TreeHaver errors:
864
+ ```ruby
865
+ begin
866
+ finder = TreeHaver::GrammarFinder.new(:toml)
867
+ finder.register! if finder.available?
868
+ language = TreeHaver::Language.toml
869
+ rescue TreeHaver::NotAvailable => e
870
+ warn("TOML grammar not available: #{e.message}")
871
+ # Fallback to another backend or fail gracefully
872
+ end
873
+ ```
874
+
875
+ 2. **Never rely on `rescue => e`** to catch TreeHaver errors (it won't work)
876
+
877
+ **Why inherit from Exception?**
878
+
879
+ Following ruby_tree_sitter's reasoning:
880
+ - **Thread safety**: Prevents accidental catching in thread cleanup code
881
+ - **Signal handling**: Ensures parsing errors don't interfere with SIGTERM/SIGINT
882
+ - **Intentional handling**: Forces developers to explicitly handle parsing errors
883
+
884
+ See `lib/tree_haver/compat.rb` for compatibility layer documentation.
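The rescue behavior described in this section is plain Ruby and can be checked without TreeHaver installed. The sketch below uses a hypothetical `DemoParseError` class that inherits from `Exception`; it shows that a bare `rescue` lets such an error through, and that listing `StandardError` alongside the explicit class is one way to handle both kinds of error in a single clause.

```ruby
# DemoParseError is a hypothetical stand-in that, like TreeHaver::Error,
# inherits from Exception rather than StandardError.
class DemoParseError < Exception; end

# A bare `rescue => e` is shorthand for `rescue StandardError => e`,
# so DemoParseError passes straight through it.
def bare_rescue
  raise DemoParseError, "boom"
rescue => e
  "bare rescue caught #{e.class}"
end

# Listing both classes handles ordinary errors and the parsing error together.
def combined_rescue(error_class)
  raise error_class, "boom"
rescue StandardError, DemoParseError => e
  "combined rescue caught #{e.class}"
end

puts combined_rescue(ArgumentError)   # prints "combined rescue caught ArgumentError"
puts combined_rescue(DemoParseError)  # prints "combined rescue caught DemoParseError"

begin
  bare_rescue
rescue DemoParseError => e
  puts "bare rescue missed #{e.class}" # prints "bare rescue missed DemoParseError"
end
```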
885
+
594
886
  ## 🔧 Basic Usage
595
887
 
596
888
  ### Quick Start
597
889
 
598
- Here's a complete example of parsing TOML with TreeHaver:
890
+ TreeHaver works with any language through its 10 backends. Here are examples for different parsing needs:
891
+
892
+ #### Parsing with Tree-sitter (Universal Languages)
599
893
 
600
894
  ```ruby
601
895
  require "tree_haver"
602
896
 
603
- # Load a language grammar
897
+ # Load a tree-sitter grammar (works with MRI, Rust, FFI, or Java backend)
604
898
  language = TreeHaver::Language.from_library(
605
899
  "/usr/local/lib/libtree-sitter-toml.so",
606
900
  symbol: "tree_sitter_toml",
@@ -610,7 +904,7 @@ language = TreeHaver::Language.from_library(
610
904
  parser = TreeHaver::Parser.new
611
905
  parser.language = language
612
906
 
613
- # Parse some source code
907
+ # Parse source code
614
908
  source = <<~TOML
615
909
  [package]
616
910
  name = "my-app"
@@ -619,16 +913,116 @@ TOML
619
913
 
620
914
  tree = parser.parse(source)
621
915
 
622
- # Access the root node
916
+ # Access the unified Position API (works across all backends)
623
917
  root = tree.root_node
624
- puts "Root node type: #{root.type}" # => "document"
918
+ puts "Root type: #{root.type}" # => "document"
919
+ puts "Start line: #{root.start_line}" # => 1 (1-based)
920
+ puts "End line: #{root.end_line}" # => 3
921
+ puts "Position: #{root.source_position}" # => {start_line: 1, end_line: 3, ...}
625
922
 
626
923
  # Traverse the tree
627
924
  root.each do |child|
628
- puts "Child type: #{child.type}"
629
- child.each do |grandchild|
630
- puts " Grandchild type: #{grandchild.type}"
925
+ puts "Child: #{child.type} at line #{child.start_line}"
926
+ end
927
+ ```
928
+
929
+ #### Parsing Ruby with Prism
930
+
931
+ ```ruby
932
+ require "tree_haver"
933
+
934
+ TreeHaver.backend = :prism
935
+ parser = TreeHaver::Parser.new
936
+ parser.language = TreeHaver::Backends::Prism::Language.ruby
937
+
938
+ source = <<~RUBY
939
+ class Example
940
+ def hello
941
+ puts "Hello, world!"
942
+ end
631
943
  end
944
+ RUBY
945
+
946
+ tree = parser.parse(source)
947
+ root = tree.root_node
948
+
949
+ # Find all method definitions
950
+ def find_methods(node, results = [])
951
+ results << node if node.type == "def_node"
952
+ node.children.each { |child| find_methods(child, results) }
953
+ results
954
+ end
955
+
956
+ methods = find_methods(root)
957
+ methods.each do |method_node|
958
+ pos = method_node.source_position
959
+ puts "Method at lines #{pos[:start_line]}-#{pos[:end_line]}"
960
+ end
961
+ ```
962
+
963
+ #### Parsing YAML with Psych
964
+
965
+ ```ruby
966
+ require "tree_haver"
967
+
968
+ TreeHaver.backend = :psych
969
+ parser = TreeHaver::Parser.new
970
+ parser.language = TreeHaver::Backends::Psych::Language.yaml
971
+
972
+ source = <<~YAML
973
+ database:
974
+ host: localhost
975
+ port: 5432
976
+ YAML
977
+
978
+ tree = parser.parse(source)
979
+ root = tree.root_node
980
+
981
+ # Navigate YAML structure
982
+ def show_structure(node, indent = 0)
983
+ prefix = " " * indent
984
+ puts "#{prefix}#{node.type} (line #{node.start_line})"
985
+ node.children.each { |child| show_structure(child, indent + 1) }
986
+ end
987
+
988
+ show_structure(root)
989
+ ```
990
+
991
+ #### Parsing Markdown with Commonmarker or Markly
992
+
993
+ ```ruby
994
+ require "tree_haver"
995
+
996
+ # Choose your backend
997
+ TreeHaver.backend = :commonmarker # or :markly for GFM
998
+
999
+ parser = TreeHaver::Parser.new
1000
+ parser.language = TreeHaver::Backends::Commonmarker::Language.markdown
1001
+
1002
+ source = <<~MARKDOWN
1003
+ # My Document
1004
+
1005
+ ## Section
1006
+
1007
+ - Item 1
1008
+ - Item 2
1009
+ MARKDOWN
1010
+
1011
+ tree = parser.parse(source)
1012
+ root = tree.root_node
1013
+
1014
+ # Find all headings
1015
+ def find_headings(node, results = [])
1016
+ results << node if node.type == "heading"
1017
+ node.children.each { |child| find_headings(child, results) }
1018
+ results
1019
+ end
1020
+
1021
+ headings = find_headings(root)
1022
+ headings.each do |heading|
1023
+ level = heading.header_level
1024
+ text = heading.children.map(&:text).join
1025
+ puts "H#{level}: #{text} (line #{heading.start_line})"
632
1026
  end
633
1027
  ```
634
1028
 
@@ -657,6 +1051,38 @@ parser.language = toml_language
657
1051
  tree = parser.parse(toml_source)
658
1052
  ```
659
1053
 
1054
+ #### Flexible Language Names
1055
+
1056
+ The `name` parameter in `register_language` is an arbitrary identifier you choose—it doesn't
1057
+ need to match the actual language name. The actual grammar identity comes from the `path`
1058
+ and `symbol` parameters (for tree-sitter) or `grammar_module` (for Citrus).
1059
+
1060
+ This flexibility is useful for:
1061
+
1062
+ - **Aliasing**: Register the same grammar under multiple names
1063
+ - **Versioning**: Register different grammar versions (e.g., `:ruby_2`, `:ruby_3`)
1064
+ - **Testing**: Use unique names to avoid collisions between tests
1065
+ - **Context-specific naming**: Use names that make sense for your application
1066
+
1067
+ ```ruby
1068
+ # Register the same TOML grammar under different names for different purposes
1069
+ TreeHaver.register_language(
1070
+ :config_parser, # Custom name for your app
1071
+ path: "/usr/local/lib/libtree-sitter-toml.so",
1072
+ symbol: "tree_sitter_toml",
1073
+ )
1074
+
1075
+ TreeHaver.register_language(
1076
+ :toml_v1, # Version-specific name
1077
+ path: "/usr/local/lib/libtree-sitter-toml.so",
1078
+ symbol: "tree_sitter_toml",
1079
+ )
1080
+
1081
+ # Use your custom names
1082
+ config_lang = TreeHaver::Language.config_parser
1083
+ versioned_lang = TreeHaver::Language.toml_v1
1084
+ ```
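To make the idea of arbitrary registration names concrete, here is a minimal sketch of a name-keyed registry guarded by a `Mutex`. `DemoLanguageRegistry` and its methods are hypothetical and exist only for this example; TreeHaver's actual registry may store different data and use a different locking strategy.

```ruby
# Hypothetical registry; the real TreeHaver registry may be structured differently.
class DemoLanguageRegistry
  def initialize
    @mutex = Mutex.new
    @entries = {}
  end

  # Stores the grammar location under any name the caller chooses.
  def register(name, path:, symbol:)
    @mutex.synchronize do
      @entries[name.to_sym] = { path: path, symbol: symbol }.freeze
    end
  end

  # Looks up a registration; raises KeyError for unknown names.
  def fetch(name)
    @mutex.synchronize { @entries.fetch(name.to_sym) }
  end
end

registry = DemoLanguageRegistry.new
registry.register(:config_parser, path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
registry.register(:toml_v1, path: "/usr/local/lib/libtree-sitter-toml.so", symbol: "tree_sitter_toml")

# Two different names resolve to the same grammar details.
puts registry.fetch(:config_parser) == registry.fetch(:toml_v1) # prints "true"
```

Because the name is only a lookup key, the grammar identity stays with the `path` and `symbol` values stored under it.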
1085
+
660
1086
  ### Parsing Different Languages
661
1087
 
662
1088
  TreeHaver works with any tree-sitter grammar:
@@ -875,23 +1301,90 @@ TreeHaver.backend = :mri
875
1301
  TreeHaver.backend = :citrus
876
1302
  ```
877
1303
 
878
- ### Advanced: Testing with Multiple Backends
1304
+ ### Advanced: Thread-Safe Backend Switching
1305
+
1306
+ TreeHaver provides `with_backend` for thread-safe, temporary backend switching. This is
1307
+ essential for testing, benchmarking, and applications that need different backends in
1308
+ different contexts.
879
1309
 
880
- If you're developing a library that uses TreeHaver, you can test against different backends:
1310
+ #### Testing with Multiple Backends
1311
+
1312
+ Test the same code path with different backends using `with_backend`:
881
1313
 
882
1314
  ```ruby
883
1315
  # In your test setup
884
1316
  RSpec.describe("MyParser") do
885
- before do
886
- TreeHaver.reset_backend!(to: :ffi)
1317
+ # Test with each available backend
1318
+ [:mri, :rust, :citrus].each do |backend_name|
1319
+ context "with #{backend_name} backend" do
1320
+ it "parses correctly" do
1321
+ TreeHaver.with_backend(backend_name) do
1322
+ parser = TreeHaver::Parser.new
1323
+ result = parser.parse("x = 42")
1324
+ expect(result.root_node.type).to(eq("document"))
1325
+ end
1326
+ # Backend automatically restored after block
1327
+ end
1328
+ end
887
1329
  end
1330
+ end
1331
+ ```
1332
+
1333
+ #### Thread Isolation
1334
+
1335
+ Each thread can use a different backend safely—`with_backend` uses thread-local storage:
888
1336
 
889
- after do
890
- TreeHaver.reset_backend!(to: :auto)
1337
+ ```ruby
1338
+ threads = []
1339
+
1340
+ threads << Thread.new do
1341
+ TreeHaver.with_backend(:mri) do
1342
+ # This thread uses MRI backend
1343
+ parser = TreeHaver::Parser.new
1344
+ 100.times { parser.parse("x = 1") }
891
1345
  end
1346
+ end
1347
+
1348
+ threads << Thread.new do
1349
+ TreeHaver.with_backend(:citrus) do
1350
+ # This thread uses Citrus backend simultaneously
1351
+ parser = TreeHaver::Parser.new
1352
+ 100.times { parser.parse("x = 1") }
1353
+ end
1354
+ end
1355
+
1356
+ threads.each(&:join)
1357
+ ```
1358
+
1359
+ #### Nested Blocks
892
1360
 
893
- it "parses correctly with FFI backend" do
894
- # Your test code
1361
+ `with_backend` supports nesting—inner blocks override outer blocks:
1362
+
1363
+ ```ruby
1364
+ TreeHaver.with_backend(:rust) do
1365
+ puts TreeHaver.effective_backend # => :rust
1366
+
1367
+ TreeHaver.with_backend(:citrus) do
1368
+ puts TreeHaver.effective_backend # => :citrus
1369
+ end
1370
+
1371
+ puts TreeHaver.effective_backend # => :rust (restored)
1372
+ end
1373
+ ```
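The nesting and thread-isolation behavior described above can be sketched with Ruby's thread-local variables and an `ensure` clause. `DemoBackends` is a hypothetical module written for this example, not TreeHaver's implementation; it assumes a thread-local slot holding the current override and a fixed default of `:auto`.

```ruby
# Hypothetical module illustrating block-scoped, thread-local switching.
module DemoBackends
  DEFAULT = :auto

  # The backend in effect for the calling thread.
  def self.current
    Thread.current.thread_variable_get(:demo_backend) || DEFAULT
  end

  # Switches the backend for the duration of the block and restores the
  # previous value afterwards, even if the block raises.
  def self.with_backend(name)
    previous = Thread.current.thread_variable_get(:demo_backend)
    Thread.current.thread_variable_set(:demo_backend, name)
    yield
  ensure
    Thread.current.thread_variable_set(:demo_backend, previous)
  end
end

DemoBackends.with_backend(:rust) do
  DemoBackends.with_backend(:citrus) { puts DemoBackends.current } # prints "citrus"
  puts DemoBackends.current                                        # prints "rust"
  # A new thread does not see the main thread's override.
  puts Thread.new { DemoBackends.current }.value                   # prints "auto"
end
puts DemoBackends.current                                          # prints "auto"
```

Saving the previous value on entry and restoring it in `ensure` is what makes nesting work: each block only needs to know the value it replaced.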
1374
+
1375
+ #### Fallback Pattern
1376
+
1377
+ Try one backend, fall back to another on failure:
1378
+
1379
+ ```ruby
1380
+ def parse_with_fallback(source)
1381
+ TreeHaver.with_backend(:mri) do
1382
+ TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
1383
+ end
1384
+ rescue TreeHaver::NotAvailable
1385
+ # Fall back to Citrus if MRI backend unavailable
1386
+ TreeHaver.with_backend(:citrus) do
1387
+ TreeHaver::Parser.new.tap { |p| p.language = load_language }.parse(source)
895
1388
  end
896
1389
  end
897
1390
  ```
@@ -1292,7 +1785,7 @@ Thanks for RTFM. ☺️
1292
1785
  [📌gitmoji]: https://gitmoji.dev
1293
1786
  [📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
1294
1787
  [🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
1295
- [🧮kloc-img]: https://img.shields.io/badge/KLOC-0.726-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
1788
+ [🧮kloc-img]: https://img.shields.io/badge/KLOC-1.141-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
1296
1789
  [🔐security]: SECURITY.md
1297
1790
  [🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
1298
1791
  [📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year