tree_haver 3.2.0 → 3.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4086bd0349e2cfd205314110a7af3b8548530cc8d5f4328d84ecc38f840744f9
- data.tar.gz: 01c27844a370a8235518002d6947226061931a127de6473b495d99ac4549cefd
+ metadata.gz: 3da36f96aa78d0995c76a3cbd6451889066b65d219414938e098ef9949df317f
+ data.tar.gz: 11e11da6c9f53e439d4226700c981466c3b47f0b9411d2e97a5211edade8d81d
  SHA512:
- metadata.gz: 1a77445cd016d4cf0d5cb0cf28aee79e6a664578ac2da92babc663e3df483f024f1c17c434ef19d4941cf25dd6674aea68177d71c096b75e3d83315d4d0e222a
- data.tar.gz: a53b817c9a24181d7b2b17810977f0978085c0603ca07b41f9189a221d1e3a7cd0b2e7751e53f4440883ea02e22c262de6b965ca9dedb4bf7bc6282ff29d57c3
+ metadata.gz: 1dce9daf70fc5573d579d699724601756db9758170fdb0e7be759ea9572a3d02fa491766786c708172e0f66c2473ae9e0da97fb20bddf7b185d93e285cc8bd3e
+ data.tar.gz: eaf3a3649233933b523a7f8e49831b7d5c7f00da268d805c7cbc043176424b5b4214da607bbe7e57e03dcd9072269dfedf3feba4383a828e6cb1abb89394b5c8
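The `checksums.yaml` hunk above records new SHA256 and SHA512 digests for the gem's `metadata.gz` and `data.tar.gz` members. As a minimal sketch, assuming the YAML shape shown in that hunk (the helper name `checksum_matches?` is hypothetical), a member's raw bytes could be checked against the recorded digest with Ruby's standard `digest` and `yaml` libraries:

```ruby
require "digest"
require "yaml"

# Hypothetical helper: compares the digest of a gem member's raw bytes with
# the value recorded in checksums.yaml for that member.
# Assumes the shape shown above: { "SHA256" => { "data.tar.gz" => "<hex>", ... } }
def checksum_matches?(checksums_yaml, member, bytes, algorithm: "SHA256")
  recorded = YAML.safe_load(checksums_yaml).dig(algorithm, member)
  return false unless recorded

  actual =
    case algorithm
    when "SHA256" then Digest::SHA256.hexdigest(bytes)
    when "SHA512" then Digest::SHA512.hexdigest(bytes)
    else raise ArgumentError, "unsupported algorithm: #{algorithm}"
    end
  actual == recorded.to_s
end
```

A missing entry is treated as a mismatch rather than an error, so a caller can fail closed.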
checksums.yaml.gz.sig CHANGED
Binary file
data/CHANGELOG.md CHANGED
@@ -30,6 +30,85 @@ Please file a bug if you notice a violation of semantic versioning.
 
  ### Security
 
+ ## [3.2.1] - 2025-12-31
+
+ - TAG: [v3.2.1][3.2.1t]
+ - COVERAGE: 94.75% -- 2075/2190 lines in 22 files
+ - BRANCH COVERAGE: 81.35% -- 733/901 branches in 22 files
+ - 90.14% documented
+
+ ### Added
+
+ - `TreeHaver::LibraryPathUtils` module for consistent path parsing across all backends
+ - `derive_symbol_from_path(path)` - derives tree-sitter symbol (e.g., `tree_sitter_toml`) from library path
+ - `derive_language_name_from_path(path)` - derives language name (e.g., `toml`) from library path
+ - `derive_language_name_from_symbol(symbol)` - strips `tree_sitter_` prefix from symbol
+ - Handles various naming conventions: `libtree-sitter-toml.so`, `libtree_sitter_toml.so`, `tree-sitter-toml.so`, `toml.so`
+ - Isolated backend RSpec tags for running tests without loading conflicting backends
+ - `:ffi_backend_only` - runs FFI tests without triggering `mri_backend_available?` check
+ - `:mri_backend_only` - runs MRI tests without triggering `ffi_available?` check
+ - Uses `TreeHaver::Backends::BLOCKED_BY` to dynamically determine which availability checks to skip
+ - Enables `rake ffi_specs` to run FFI tests before MRI is loaded
+ - `DependencyTags.ffi_backend_only_available?` - checks FFI availability without loading MRI
+ - `DependencyTags.mri_backend_only_available?` - checks MRI availability without checking FFI
+
+ ### Changed
+
+ - All backends now use shared `LibraryPathUtils` for path parsing
+ - MRI, Rust, FFI, and Java backends updated for consistency
+ - Ensures identical behavior across all tree-sitter backends
+ - `TreeHaver::Language` class extracted to `lib/tree_haver/language.rb`
+ - No API changes, just file organization
+ - Loaded via autoload for lazy loading
+ - `TreeHaver::Parser` class extracted to `lib/tree_haver/parser.rb`
+ - No API changes, just file organization
+ - Loaded via autoload for lazy loading
+ - Backend availability exclusions in `dependency_tags.rb` are now dynamic
+ - Uses `TreeHaver::Backends::BLOCKED_BY` to skip availability checks for blocked backends
+ - When running with `--tag ffi_backend_only`, MRI availability is not checked
+ - Prevents MRI from being loaded before FFI tests can run
+ - Rakefile `ffi_specs` task now uses `:ffi_backend_only` tag
+ - Ensures FFI tests run without loading MRI backend first
+
+ ### Fixed
+
+ - Rakefile now uses correct RSpec tags for FFI isolation
+ - The `ffi_specs` task uses `:ffi_backend_only` to prevent MRI from loading
+ - The `remaining_specs` task excludes `:ffi_backend_only` tests
+ - Tags in Rakefile align with canonical tags from `dependency_tags.rb`
+ - `TreeHaver::RSpec::DependencyTags.mri_backend_available?` now uses correct require path
+ - Was: `require "ruby_tree_sitter"` (wrong - causes LoadError)
+ - Now: `require "tree_sitter"` (correct - gem name is ruby_tree_sitter but require path is tree_sitter)
+ - This fix ensures the MRI backend is correctly detected as available in CI environments
+ - `TreeHaver::Backends::MRI::Language.from_library` now properly derives symbol from path
+ - Previously, calling `from_library(path)` without `symbol:` would fail because `language_name` was nil
+ - Now delegates to private `from_path` after deriving symbol, ensuring proper language name derivation
+ - `from_path` is now private (but still accessible via `send` for testing if needed)
+ - Extracts language name from paths like `/usr/lib/libtree-sitter-toml.so` → `tree_sitter_toml`
+ - Handles both dash and underscore separators in filenames
+ - Handles simple language names like `toml.so` → `tree_sitter_toml`
+ - `TreeHaver::Backends::MRI::Parser#language=` now unwraps `TreeHaver::Backends::MRI::Language` wrappers
+ - Accepts both raw `TreeSitter::Language` and wrapped `TreeHaver::Backends::MRI::Language`
+ - `TreeHaver::GrammarFinder.tree_sitter_runtime_usable?` no longer references `FFI::NotFoundError` directly
+ - Prevents `NameError` when FFI gem is not loaded
+ - `TreeHaver::Parser#initialize` no longer references `FFI::NotFoundError` directly in rescue clause
+ - Uses `defined?(::FFI::NotFoundError)` check to safely handle FFI errors when FFI is loaded
+ - Prevents `NameError: uninitialized constant TreeHaver::Parser::FFI` when FFI gem is not available
+ - Extracted error handling to `handle_parser_creation_failure` private method for clarity
+ - RSpec `dependency_tags.rb` now correctly detects `--tag` options during configuration
+ - RSpec's `config.inclusion_filter.rules` is empty during configuration phase
+ - Now parses `ARGV` directly to detect `--tag ffi_backend_only` and similar tags
+ - Skips grammar availability checks (which load MRI) when running isolated backend tests
+ - Skips full dependency summary in `before(:suite)` when backends are blocked
+ - `TreeHaver::Backends::FFI.reset!` now uses consistent pattern with other backends
+ - Was using `@ffi_gem_available` with `defined?()` check, which returned truthy after `reset!` set it to nil
+ - Now uses `@load_attempted` / `@loaded` pattern like MRI, Rust, Citrus, Prism, Psych, etc.
+ - This fixes FFI tests failing after the first test when `reset!` was called in `after` blocks
+ - `TreeHaver::Language.method_missing` no longer references `FFI::NotFoundError` directly in rescue clause
+ - Uses `defined?(::FFI::NotFoundError)` check to safely handle FFI errors when FFI is loaded
+ - Prevents `NameError` when FFI gem is not available but tree-sitter backends are used
+ - Extracted Citrus fallback logic to `handle_tree_sitter_load_failure` private method
+
  ## [3.2.0] - 2025-12-30
 
  - TAG: [v3.2.0][3.2.0t]
@@ -580,7 +659,9 @@ Despite the major version bump to 3.0.0 (following semver due to the breaking `L
 
  - Initial release
 
- [Unreleased]: https://github.com/kettle-rb/tree_haver/compare/v3.2.0...HEAD
+ [Unreleased]: https://github.com/kettle-rb/tree_haver/compare/v3.2.1...HEAD
+ [3.2.1]: https://github.com/kettle-rb/tree_haver/compare/v3.2.0...v3.2.1
+ [3.2.1t]: https://github.com/kettle-rb/tree_haver/releases/tag/v3.2.1
  [3.2.0]: https://github.com/kettle-rb/tree_haver/compare/v3.1.2...v3.2.0
  [3.2.0t]: https://github.com/kettle-rb/tree_haver/releases/tag/v3.2.0
  [3.1.2]: https://github.com/kettle-rb/tree_haver/compare/v3.1.1...v3.1.2
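The changelog says `dependency_tags.rb` now reads `ARGV` directly, because `config.inclusion_filter.rules` is still empty while RSpec is being configured. The gem's actual parser is not part of this diff; the following is a hypothetical sketch (`included_tags` and `skip_mri_check?` are illustrative names) of pulling tag names out of an argument vector, covering RSpec's `--tag NAME`, `--tag=NAME`, and `-t NAME` spellings:

```ruby
# Hypothetical helper (not the gem's code): collects tag names passed via
# --tag NAME, --tag=NAME, or -t NAME. Exclusions (~NAME) are skipped and a
# ":VALUE" suffix is dropped, so "--tag ffi_backend_only" yields :ffi_backend_only.
def included_tags(argv)
  tags = []
  argv.each_with_index do |arg, i|
    value =
      if arg == "--tag" || arg == "-t"
        argv[i + 1]
      elsif arg.start_with?("--tag=")
        arg.delete_prefix("--tag=")
      end
    next if value.nil? || value.empty? || value.start_with?("~")

    tags << value.split(":", 2).first.to_sym
  end
  tags
end

# Hypothetical decision helper: skip the MRI availability check when the
# isolated FFI tag (name taken from the changelog) was requested.
def skip_mri_check?(argv)
  included_tags(argv).include?(:ffi_backend_only)
end
```

Reading `ARGV` this way works before RSpec has populated its filters, at the cost of duplicating a small part of RSpec's option parsing.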
data/README.md CHANGED
@@ -1977,7 +1977,7 @@ Thanks for RTFM. ☺️
  [📌gitmoji]: https://gitmoji.dev
  [📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
  [🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
- [🧮kloc-img]: https://img.shields.io/badge/KLOC-2.496-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
+ [🧮kloc-img]: https://img.shields.io/badge/KLOC-2.190-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
  [🔐security]: SECURITY.md
  [🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
  [📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year
@@ -56,9 +56,10 @@ module TreeHaver
  # @note Returns false on TruffleRuby because TruffleRuby's FFI doesn't support
  # STRUCT_BY_VALUE return types (used by ts_tree_root_node, ts_node_child, etc.)
  def ffi_gem_available?
- return @ffi_gem_available if defined?(@ffi_gem_available)
+ return @loaded if @load_attempted
 
- @ffi_gem_available = begin
+ @load_attempted = true
+ @loaded = begin
  # TruffleRuby's FFI doesn't support STRUCT_BY_VALUE return types
  # which tree-sitter uses extensively (ts_tree_root_node, ts_node_child, etc.)
  return false if RUBY_ENGINE == "truffleruby"
@@ -75,7 +76,8 @@ module TreeHaver
  # @return [void]
  # @api private
  def reset!
- @ffi_gem_available = nil
+ @load_attempted = false
+ @loaded = false
  end
 
  # Get capabilities supported by this backend
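The `reset!` fix above hinges on a Ruby detail: `defined?(@ivar)` returns the truthy string `"instance-variable"` once the ivar has been assigned, even if it was assigned `nil`. So after the old `reset!` set `@ffi_gem_available = nil`, the guard still short-circuited and returned the stale `nil` instead of re-probing. The sketch below contrasts the two shapes using hypothetical classes and a counting probe in place of the real FFI check (the gem implements these as module-level methods, not classes):

```ruby
# Stand-in for a real availability probe; it only counts how often it runs.
module CountingProbe
  def probes
    @probes || 0
  end

  private

  def probe
    @probes = probes + 1
    true
  end
end

# Pre-3.2.1 shape: memoize behind defined?(@ivar).
class DefinedGuardCache
  include CountingProbe

  def available?
    return @available if defined?(@available)

    @available = probe
  end

  def reset!
    # defined?(@available) stays truthy after this assignment,
    # so the next available? returns the stale nil without probing.
    @available = nil
  end
end

# 3.2.1 shape: an explicit "attempted" flag that reset! clears.
class AttemptFlagCache
  include CountingProbe

  def available?
    return @loaded if @load_attempted

    @load_attempted = true
    @loaded = probe
  end

  def reset!
    @load_attempted = false
    @loaded = false
  end
end
```

With the old shape, the first test passes and every test after an `after { reset! }` sees a falsy `nil`, which matches the failure described in the changelog.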
@@ -395,14 +397,14 @@ module TreeHaver
  end
 
  requested = symbol || ENV["TREE_SITTER_LANG_SYMBOL"]
- base = File.basename(path)
- guessed_lang = base.sub(/^libtree[-_]sitter[-_]/, "").sub(/\.(so(\.\d+)?)|\.dylib|\.dll\z/, "")
+ # Use shared utility for consistent symbol derivation across backends
+ guessed_symbol = LibraryPathUtils.derive_symbol_from_path(path)
  # If an override was provided (arg or ENV), treat it as strict and do not fall back.
  # Only when no override is provided do we attempt guessed and default candidates.
  candidates = if requested && !requested.to_s.empty?
  [requested]
  else
- [(guessed_lang.empty? ? nil : "tree_sitter_#{guessed_lang}"), "tree_sitter_toml"].compact
+ [guessed_symbol, "tree_sitter_toml"].compact.uniq
  end
 
  func = nil
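In the hunk above, an explicit symbol (argument or `TREE_SITTER_LANG_SYMBOL`) is strict, while otherwise the path-derived guess is tried before the `tree_sitter_toml` default. The added `.uniq` matters because, for a TOML library, the old code built `["tree_sitter_toml", "tree_sitter_toml"]`. A hypothetical standalone restatement of that candidate list, with the path-derived guess passed in rather than computed:

```ruby
# Hypothetical restatement of the candidate selection shown above.
# requested: explicit symbol or ENV override (strict, no fallback)
# guessed:   symbol derived from the library path (may be nil)
def symbol_candidates(requested, guessed, default: "tree_sitter_toml")
  if requested && !requested.to_s.empty?
    [requested]
  else
    # Drop a nil guess, and avoid trying the same symbol twice.
    [guessed, default].compact.uniq
  end
end
```

An empty-string override is treated as "no override", mirroring the `to_s.empty?` check in the hunk.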
@@ -316,9 +316,11 @@ module TreeHaver
  def from_library(path, symbol: nil, name: nil)
  raise TreeHaver::NotAvailable, "Java backend not available" unless Java.available?
 
- # Derive symbol from name or path if not provided
- base_name = File.basename(path, ".*").sub(/^lib/, "")
- sym = symbol || "tree_sitter_#{name || base_name.sub(/^tree-sitter-/, "")}"
+ # Use shared utility for consistent symbol derivation across backends
+ # If symbol not provided, derive from name or path
+ sym = symbol || LibraryPathUtils.derive_symbol_from_path(path)
+ # If name was provided, use it to override the derived symbol
+ sym = "tree_sitter_#{name}" if name && !symbol
 
  begin
  arena = ::Java::JavaLangForeign::Arena.global
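The two assignments in the Java hunk amount to a three-level precedence: an explicit `symbol:` wins, otherwise `name:` produces `tree_sitter_<name>`, and only when neither is given does the path-derived symbol apply. A hypothetical sketch of that ordering as a single method (the path-derived value is supplied by the caller instead of calling `LibraryPathUtils`):

```ruby
# Hypothetical equivalent of:
#   sym = symbol || derived
#   sym = "tree_sitter_#{name}" if name && !symbol
def resolve_symbol(symbol: nil, name: nil, derived_from_path: nil)
  return symbol if symbol
  return "tree_sitter_#{name}" if name

  derived_from_path
end
```

Writing it as early returns makes the precedence explicit instead of relying on a later assignment overriding an earlier one.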
@@ -166,6 +166,21 @@ module TreeHaver
  # lang = TreeHaver::Backends::MRI::Language.from_library("/path/to/lib.so", symbol: "tree_sitter_json")
  class << self
  def from_library(path, symbol: nil, name: nil)
+ # Derive symbol from path if not provided using shared utility
+ symbol ||= LibraryPathUtils.derive_symbol_from_path(path)
+ from_path(path, symbol: symbol, name: name)
+ end
+
+ private
+
+ # Load a language from a shared library path (internal implementation)
+ #
+ # @param path [String] absolute path to the language shared library
+ # @param symbol [String] the exported symbol name (e.g., "tree_sitter_json")
+ # @param name [String, nil] optional language name
+ # @return [Language] wrapped language handle
+ # @api private
+ def from_path(path, symbol: nil, name: nil)
  raise TreeHaver::NotAvailable, "ruby_tree_sitter not available" unless MRI.available?
 
  # ruby_tree_sitter's TreeSitter::Language.load takes (language_name, path_to_so)
@@ -173,8 +188,8 @@ module TreeHaver
  # NOT the full symbol name (e.g., NOT "tree_sitter_toml")
  # and path_to_so is the full path to the .so file
  #
- # If name is not provided, derive it from symbol by stripping "tree_sitter_" prefix
- language_name = name || symbol&.sub(/\Atree_sitter_/, "")
+ # If name is not provided, derive it from symbol using shared utility
+ language_name = name || LibraryPathUtils.derive_language_name_from_symbol(symbol)
  ts_lang = ::TreeSitter::Language.load(language_name, path)
  new(ts_lang, path: path, symbol: symbol)
  rescue NameError => e
@@ -190,16 +205,6 @@ module TreeHaver
  raise # Re-raise if it's not a TreeSitter error
  end
  end
-
- # Load a language from a shared library path (legacy method)
- #
- # @param path [String] absolute path to the language shared library
- # @param symbol [String] the exported symbol name (e.g., "tree_sitter_json")
- # @return [Language] wrapped language handle
- # @deprecated Use {from_library} instead
- def from_path(path, symbol: nil)
- from_library(path, symbol: symbol)
- end
  end
  end
 
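The MRI backend now declares `from_path` after `private` inside `class << self`, which makes it a private singleton method. It remains callable without a receiver from `from_library`, and from tests via `send`, but an explicit `Language.from_path(...)` call raises `NoMethodError`. Because `TreeHaver::Language.from_library` falls back to `mod::Language.from_path` only for backends that do not respond to `from_library`, the MRI backend never reaches that private method through the fallback. A hypothetical, self-contained class showing the visibility rules (the symbol default is a simplified stand-in, not `LibraryPathUtils`):

```ruby
class GrammarLoader
  class << self
    # Public entry point: fills in a default symbol, then delegates.
    def from_library(path, symbol: nil)
      # Simplified stand-in for LibraryPathUtils.derive_symbol_from_path.
      symbol ||= "tree_sitter_#{File.basename(path, ".*")}"
      from_path(path, symbol: symbol)
    end

    private

    # Private singleton method: callable without a receiver inside the
    # singleton class, but not as GrammarLoader.from_path from outside.
    def from_path(path, symbol:)
      { path: path, symbol: symbol }
    end
  end
end
```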
@@ -229,19 +234,17 @@ module TreeHaver
 
  # Set the language for this parser
  #
- # Note: TreeHaver::Parser unwraps language objects before calling this method.
- # This backend receives raw ::TreeSitter::Language objects, never wrapped ones.
- #
- # @param lang [::TreeSitter::Language] the language to use (already unwrapped)
- # @return [::TreeSitter::Language] the language that was set
+ # @param lang [::TreeSitter::Language, TreeHaver::Backends::MRI::Language] the language to use
+ # @return [::TreeSitter::Language, TreeHaver::Backends::MRI::Language] the language that was set
  # @raise [TreeHaver::NotAvailable] if setting language fails
  def language=(lang)
- # lang is already unwrapped by TreeHaver::Parser, use directly
- @parser.language = lang
+ # Unwrap if it's a TreeHaver wrapper
+ inner_lang = lang.respond_to?(:inner_language) ? lang.inner_language : lang
+ @parser.language = inner_lang
  # Verify it was set
  raise TreeHaver::NotAvailable, "Language not set correctly" if @parser.language.nil?
 
- # Return the language object
+ # Return the original language object (wrapped or unwrapped)
  lang
  rescue Exception => e # rubocop:disable Lint/RescueException
  # TreeSitter errors inherit from Exception (not StandardError) in ruby_tree_sitter v2+
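`language=` now accepts either a raw `::TreeSitter::Language` or the `TreeHaver::Backends::MRI::Language` wrapper, unwrapping by duck typing on `inner_language` rather than by a class check. A self-contained sketch of that approach, using hypothetical stand-in classes in place of the real ruby_tree_sitter objects:

```ruby
# Stand-in for a raw grammar object (hypothetical; not ::TreeSitter::Language).
RawLanguage = Struct.new(:name)

# Stand-in for a TreeHaver-style wrapper that exposes the raw object.
class WrappedLanguage
  attr_reader :inner_language

  def initialize(inner_language)
    @inner_language = inner_language
  end
end

class FakeParser
  attr_reader :native_language

  # Accept either form; store the unwrapped object and return the
  # caller's original argument, as the hunk above does.
  def language=(lang)
    @native_language = lang.respond_to?(:inner_language) ? lang.inner_language : lang
    lang
  end
end
```

In Ruby, an assignment expression such as `parser.language = x` always evaluates to `x` regardless of what the setter returns, so the explicit `lang` at the end of the real method only affects callers that use `send(:language=, x)`.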
@@ -149,16 +149,15 @@ module TreeHaver
 
  # tree_stump uses TreeStump.register_lang(name, path) to register languages
  # The name is used to derive the symbol automatically (tree_sitter_<name>)
- lang_name = name || File.basename(path, ".*").sub(/^libtree-sitter-/, "")
+ # Use shared utility for consistent path parsing across backends
+ lang_name = name || LibraryPathUtils.derive_language_name_from_path(path)
  ::TreeStump.register_lang(lang_name, path)
  new(lang_name, path: path)
  rescue RuntimeError => e
  raise TreeHaver::NotAvailable, "Failed to load language from #{path}: #{e.message}"
  end
 
- # Alias for compatibility
- #
- # @see from_library
+ # Backward-compatible alias for from_library
  alias_method :from_path, :from_library
  end
  end
@@ -270,7 +270,10 @@ module TreeHaver
  # Try to instantiate a parser - this will fail if runtime isn't available
  mod::Parser.new
  true
- rescue NoMethodError, FFI::NotFoundError, LoadError, NotAvailable => _e
+ rescue NoMethodError, LoadError, NotAvailable => _e
+ false
+ rescue StandardError => _e
+ # Catch FFI::NotFoundError and other errors when FFI is loaded
  false
  end
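Several fixes in this release remove direct references to `FFI::NotFoundError` from `rescue` clauses. Ruby evaluates a rescue clause's class list only when an exception reaches it, so a missing `::FFI` constant surfaces as a confusing `NameError` exactly when some other error is being handled. `defined?(::FFI::NotFoundError)` evaluates to `nil` instead of raising when the constant is absent, so it can guard the check unconditionally. A minimal sketch of the predicate, plus a hypothetical `with_fallback` wrapper that mirrors the rescue structure of `TreeHaver::Language.method_missing`:

```ruby
# True only when ::FFI::NotFoundError exists and the error is an instance of it.
# defined? returns nil (rather than raising NameError) when ::FFI is absent.
def ffi_not_found_error?(error)
  return false unless defined?(::FFI::NotFoundError)

  error.is_a?(::FFI::NotFoundError)
end

# Hypothetical wrapper: known load failures go to the fallback, FFI lookup
# failures are recognised only when FFI is loaded, and anything else is re-raised.
def with_fallback(fallback)
  yield
rescue ArgumentError, LoadError => e
  fallback.call(e)
rescue => e
  raise unless ffi_not_found_error?(e)

  fallback.call(e)
end
```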
@@ -0,0 +1,255 @@
+ # frozen_string_literal: true
+
+ module TreeHaver
+ # Represents a language grammar for parsing source code
+ #
+ # Language is the entry point for loading and using grammars. It provides
+ # a unified interface that works across all backends (MRI, Rust, FFI, Java, Citrus).
+ #
+ # For tree-sitter backends, languages are loaded from shared library files (.so/.dylib/.dll).
+ # For pure-Ruby backends (Citrus, Prism, Psych), languages are built-in or provided by gems.
+ #
+ # == Loading Languages
+ #
+ # The primary way to load a language is via registration:
+ #
+ # TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
+ # language = TreeHaver::Language.toml
+ #
+ # For explicit loading without registration:
+ #
+ # language = TreeHaver::Language.from_library(
+ # "/path/to/libtree-sitter-toml.so",
+ # symbol: "tree_sitter_toml"
+ # )
+ #
+ # For ruby_tree_sitter compatibility:
+ #
+ # language = TreeHaver::Language.load("toml", "/path/to/libtree-sitter-toml.so")
+ #
+ # @example Register and load a language
+ # TreeHaver.register_language(:toml, path: "/path/to/grammar.so")
+ # language = TreeHaver::Language.toml
+ class Language
+ class << self
+ # Load a language grammar from a shared library (ruby_tree_sitter compatibility)
+ #
+ # This method provides API compatibility with ruby_tree_sitter which uses
+ # `Language.load(name, path)`.
+ #
+ # @param name [String] the language name (e.g., "toml")
+ # @param path [String] absolute path to the language shared library
+ # @param validate [Boolean] if true, validates the path for safety (default: true)
+ # @return [Language] loaded language handle
+ # @raise [NotAvailable] if the library cannot be loaded
+ # @raise [ArgumentError] if the path fails security validation
+ # @example
+ # language = TreeHaver::Language.load("toml", "/usr/local/lib/libtree-sitter-toml.so")
+ def load(name, path, validate: true)
+ from_library(path, symbol: "tree_sitter_#{name}", name: name, validate: validate)
+ end
+
+ # Load a language grammar from a shared library
+ #
+ # The library must export a function that returns a pointer to a TSLanguage struct.
+ # By default, TreeHaver looks for a symbol named "tree_sitter_<name>".
+ #
+ # == Security
+ #
+ # By default, paths are validated using {PathValidator} to prevent path traversal
+ # and other attacks. Set `validate: false` to skip validation (not recommended
+ # unless you've already validated the path).
+ #
+ # @param path [String] absolute path to the language shared library (.so/.dylib/.dll)
+ # @param symbol [String, nil] name of the exported function (defaults to auto-detection)
+ # @param name [String, nil] logical name for the language (used in caching)
+ # @param validate [Boolean] if true, validates path and symbol for safety (default: true)
+ # @param backend [Symbol, String, nil] optional backend to use (overrides context/global)
+ # @return [Language] loaded language handle
+ # @raise [NotAvailable] if the library cannot be loaded or the symbol is not found
+ # @raise [ArgumentError] if path or symbol fails security validation
+ # @example
+ # language = TreeHaver::Language.from_library(
+ # "/usr/local/lib/libtree-sitter-toml.so",
+ # symbol: "tree_sitter_toml",
+ # name: "toml"
+ # )
+ # @example With explicit backend
+ # language = TreeHaver::Language.from_library(
+ # "/usr/local/lib/libtree-sitter-toml.so",
+ # symbol: "tree_sitter_toml",
+ # backend: :ffi
+ # )
+ def from_library(path, symbol: nil, name: nil, validate: true, backend: nil)
+ if validate
+ unless PathValidator.safe_library_path?(path)
+ errors = PathValidator.validation_errors(path)
+ raise ArgumentError, "Unsafe library path: #{path.inspect}. Errors: #{errors.join("; ")}"
+ end
+
+ if symbol && !PathValidator.safe_symbol_name?(symbol)
+ raise ArgumentError, "Unsafe symbol name: #{symbol.inspect}. " \
+ "Symbol names must be valid C identifiers."
+ end
+ end
+
+ # from_library only works with tree-sitter backends that support .so files
+ # Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) don't support from_library
+ mod = TreeHaver.resolve_native_backend_module(backend)
+
+ if mod.nil?
+ if backend
+ raise NotAvailable, "Requested backend #{backend.inspect} is not available or does not support shared libraries"
+ else
+ raise NotAvailable,
+ "No native tree-sitter backend is available for loading shared libraries. " \
+ "Available native backends (MRI, Rust, FFI, Java) require platform-specific setup. " \
+ "For pure-Ruby parsing, use backend-specific Language classes directly (e.g., Prism, Psych, Citrus)."
+ end
+ end
+
+ # Backend must implement .from_library; fallback to .from_path for older impls
+ # Include effective backend AND ENV vars in cache key since they affect loading
+ effective_b = TreeHaver.resolve_effective_backend(backend)
+ key = [effective_b, path, symbol, name, ENV["TREE_SITTER_LANG_SYMBOL"]]
+ LanguageRegistry.fetch(key) do
+ if mod::Language.respond_to?(:from_library)
+ mod::Language.from_library(path, symbol: symbol, name: name)
+ else
+ mod::Language.from_path(path)
+ end
+ end
+ end
+ # Alias for {from_library}
+ # @see from_library
+ alias_method :from_path, :from_library
+
+ # Dynamic helper to load a registered language by name
+ #
+ # After registering a language with {TreeHaver.register_language},
+ # you can load it using a method call. The appropriate backend will be
+ # used based on registration and current backend.
+ #
+ # @example With tree-sitter
+ # TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
+ # language = TreeHaver::Language.toml
+ #
+ # @example With both backends
+ # TreeHaver.register_language(:toml,
+ # path: "/path/to/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
+ # TreeHaver.register_language(:toml,
+ # grammar_module: TomlRB::Document)
+ # language = TreeHaver::Language.toml # Uses appropriate grammar for active backend
+ #
+ # @param method_name [Symbol] the registered language name
+ # @param args [Array] positional arguments
+ # @param kwargs [Hash] keyword arguments
+ # @return [Language] loaded language handle
+ # @raise [NoMethodError] if the language name is not registered
+ def method_missing(method_name, *args, **kwargs, &block)
+ # Resolve only if the language name was registered
+ all_backends = TreeHaver.registered_language(method_name)
+ return super unless all_backends
+
+ # Check current backend
+ current_backend = TreeHaver.backend_module
+
+ # Determine which backend type to use
+ backend_type = if current_backend == Backends::Citrus
+ :citrus
+ else
+ :tree_sitter # MRI, Rust, FFI, Java all use tree-sitter
+ end
+
+ # Get backend-specific registration
+ reg = all_backends[backend_type]
+
+ # If Citrus backend is active
+ if backend_type == :citrus
+ if reg && reg[:grammar_module]
+ return Backends::Citrus::Language.new(reg[:grammar_module])
+ end
+
+ # Fall back to error if no Citrus grammar registered
+ raise NotAvailable,
+ "Citrus backend is active but no Citrus grammar registered for :#{method_name}. " \
+ "Either register a Citrus grammar or use a tree-sitter backend. " \
+ "Registered backends: #{all_backends.keys.inspect}"
+ end
+
+ # For tree-sitter backends, try to load from path
+ # If that fails, fall back to Citrus if available
+ if reg && reg[:path]
+ path = kwargs[:path] || args.first || reg[:path]
+ # Symbol priority: kwargs override > registration > derive from method_name
+ symbol = if kwargs.key?(:symbol)
+ kwargs[:symbol]
+ elsif reg[:symbol]
+ reg[:symbol]
+ else
+ "tree_sitter_#{method_name}"
+ end
+ # Name priority: kwargs override > derive from symbol (strip tree_sitter_ prefix)
+ # Using symbol-derived name ensures ruby_tree_sitter gets the correct language name
+ # e.g., "toml" not "toml_both" when symbol is "tree_sitter_toml"
+ name = kwargs[:name] || symbol&.sub(/\Atree_sitter_/, "")
+
+ begin
+ return from_library(path, symbol: symbol, name: name)
+ rescue NotAvailable, ArgumentError, LoadError => e
+ # Tree-sitter failed to load - check for Citrus fallback
+ handle_tree_sitter_load_failure(e, all_backends)
+ rescue => e
+ # Also catch FFI::NotFoundError if FFI is loaded (can't reference directly as FFI may not exist)
+ if defined?(::FFI::NotFoundError) && e.is_a?(::FFI::NotFoundError)
+ handle_tree_sitter_load_failure(e, all_backends)
+ else
+ raise
+ end
+ end
+ end
+
+ # No tree-sitter path registered - check for Citrus fallback
+ # This enables auto-fallback when tree-sitter grammar is not installed
+ # but a Citrus grammar (pure Ruby) is available
+ citrus_reg = all_backends[:citrus]
+ if citrus_reg && citrus_reg[:grammar_module]
+ return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
+ end
+
+ # No appropriate registration found
+ raise ArgumentError,
+ "No grammar registered for :#{method_name} compatible with #{backend_type} backend. " \
+ "Registered backends: #{all_backends.keys.inspect}"
+ end
+
+ # @api private
+ def respond_to_missing?(method_name, include_private = false)
+ !!TreeHaver.registered_language(method_name) || super
+ end
+
+ private
+
+ # Handle tree-sitter load failure with optional Citrus fallback
+ #
+ # This handles cases where:
+ # - The .so file doesn't exist or can't be loaded (NotAvailable, LoadError)
+ # - FFI can't find required symbols like ts_parser_new (FFI::NotFoundError)
+ # - Invalid arguments were provided (ArgumentError)
+ #
+ # @param error [Exception] the original error
+ # @param all_backends [Hash] all registered backends for the language
+ # @return [Backends::Citrus::Language] if Citrus fallback available
+ # @raise [Exception] re-raises original error if no fallback
+ # @api private
+ def handle_tree_sitter_load_failure(error, all_backends)
+ citrus_reg = all_backends[:citrus]
+ if citrus_reg && citrus_reg[:grammar_module]
+ return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
+ end
+ # No Citrus fallback available, re-raise the original error
+ raise error
+ end
+ end
+ end
+ end
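`TreeHaver::Language` resolves calls such as `Language.toml` through `method_missing`, pairs it with `respond_to_missing?` so that `respond_to?(:toml)` agrees with what can actually be called, and falls back to a registered Citrus grammar when the tree-sitter load fails. The sketch below reproduces only that dispatch-and-fallback shape, using a hypothetical in-memory catalog with loader lambdas. None of these names are the gem's API.

```ruby
class LanguageCatalog
  def initialize
    @registrations = {}
  end

  # Hypothetical registration: each backend is a zero-argument loader lambda.
  def register(name, tree_sitter: nil, citrus: nil)
    @registrations[name.to_sym] = { tree_sitter: tree_sitter, citrus: citrus }.compact
  end

  def method_missing(method_name, *args, &block)
    backends = @registrations[method_name]
    return super unless backends # unregistered names raise NoMethodError

    if (loader = backends[:tree_sitter])
      begin
        return loader.call
      rescue LoadError, ArgumentError => e
        # Native load failed: use the pure-Ruby grammar if one is registered.
        return backends[:citrus].call if backends[:citrus]

        raise e
      end
    end

    return backends[:citrus].call if backends[:citrus]

    raise ArgumentError, "No grammar registered for :#{method_name}"
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(method_name, include_private = false)
    @registrations.key?(method_name) || super
  end
end
```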
@@ -0,0 +1,80 @@
+ # frozen_string_literal: true
+
+ module TreeHaver
+ # Utility methods for deriving tree-sitter symbol and language names from library paths
+ #
+ # This module provides consistent path parsing across all backends that load
+ # tree-sitter grammar libraries from shared object files (.so/.dylib/.dll).
+ #
+ # @example
+ # TreeHaver::LibraryPathUtils.derive_symbol_from_path("/usr/lib/libtree-sitter-toml.so")
+ # # => "tree_sitter_toml"
+ #
+ # TreeHaver::LibraryPathUtils.derive_language_name_from_path("/usr/lib/libtree-sitter-toml.so")
+ # # => "toml"
+ module LibraryPathUtils
+ module_function
+
+ # Derive the tree-sitter symbol name from a library path
+ #
+ # Symbol names are the exported C function names (e.g., "tree_sitter_toml")
+ # that return a pointer to the TSLanguage struct.
+ #
+ # Handles various naming conventions:
+ # - libtree-sitter-toml.so → tree_sitter_toml
+ # - libtree_sitter_toml.so → tree_sitter_toml
+ # - tree-sitter-toml.so → tree_sitter_toml
+ # - tree_sitter_toml.so → tree_sitter_toml
+ # - toml.so → tree_sitter_toml (assumes simple language name)
+ #
+ # @param path [String, nil] path like "/usr/lib/libtree-sitter-toml.so"
+ # @return [String, nil] symbol like "tree_sitter_toml", or nil if path is nil
+ def derive_symbol_from_path(path)
+ return unless path
+
+ # Extract filename without extension: "libtree-sitter-toml" or "toml"
+ filename = File.basename(path, ".*")
+
+ # Handle multi-part extensions like .so.0.24
+ filename = filename.sub(/\.so(\.\d+)*\z/, "")
+
+ # Match patterns and normalize to tree_sitter_<lang>
+ case filename
+ when /\Alib[-_]?tree[-_]sitter[-_](.+)\z/
+ "tree_sitter_#{Regexp.last_match(1).tr("-", "_")}"
+ when /\Atree[-_]sitter[-_](.+)\z/
+ "tree_sitter_#{Regexp.last_match(1).tr("-", "_")}"
+ else
+ # Assume filename is just the language name (e.g., "toml.so" -> "tree_sitter_toml")
+ # Also strip "lib" prefix if present (e.g., "libtoml.so" -> "tree_sitter_toml")
+ lang = filename.sub(/\Alib/, "").tr("-", "_")
+ "tree_sitter_#{lang}"
+ end
+ end
+
+ # Derive the language name from a library path
+ #
+ # Language names are the short identifiers (e.g., "toml", "json", "ruby")
+ # used by some backends (like tree_stump/Rust) to register grammars.
+ #
+ # @param path [String, nil] path like "/usr/lib/libtree-sitter-toml.so"
+ # @return [String, nil] language name like "toml", or nil if path is nil
+ def derive_language_name_from_path(path)
+ symbol = derive_symbol_from_path(path)
+ return unless symbol
+
+ # Strip the "tree_sitter_" prefix to get the language name
+ symbol.sub(/\Atree_sitter_/, "")
+ end
+
+ # Derive language name from a symbol
+ #
+ # @param symbol [String, nil] symbol like "tree_sitter_toml"
+ # @return [String, nil] language name like "toml", or nil if symbol is nil
+ def derive_language_name_from_symbol(symbol)
+ return unless symbol
+
+ symbol.sub(/\Atree_sitter_/, "")
+ end
+ end
+ end
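`derive_symbol_from_path` strips extensions in two steps because `File.basename(path, ".*")` removes only the final extension. For a versioned soname such as `libtree-sitter-toml.so.0.24`, it leaves `libtree-sitter-toml.so.0`, and the follow-up `sub` then removes the remaining `.so` suffix together with its version digits. A small sketch that isolates that step (the method name `library_stem` is hypothetical):

```ruby
# Hypothetical extraction of the first two steps of derive_symbol_from_path.
def library_stem(path)
  # Removes only the last extension:
  # "libtree-sitter-toml.so.0.24" -> "libtree-sitter-toml.so.0"
  stem = File.basename(path, ".*")
  # Removes the leftover ".so" plus any numeric version parts.
  stem.sub(/\.so(\.\d+)*\z/, "")
end
```

Plain `.so`, `.dylib`, and `.dll` names are fully handled by the first step, so the second `sub` only changes versioned sonames.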