tree_haver 3.2.0 → 3.2.1
- checksums.yaml +4 -4
- checksums.yaml.gz.sig +0 -0
- data/CHANGELOG.md +82 -1
- data/README.md +1 -1
- data/lib/tree_haver/backends/ffi.rb +8 -6
- data/lib/tree_haver/backends/java.rb +5 -3
- data/lib/tree_haver/backends/mri.rb +23 -20
- data/lib/tree_haver/backends/rust.rb +3 -4
- data/lib/tree_haver/grammar_finder.rb +4 -1
- data/lib/tree_haver/language.rb +255 -0
- data/lib/tree_haver/library_path_utils.rb +80 -0
- data/lib/tree_haver/parser.rb +352 -0
- data/lib/tree_haver/rspec/dependency_tags.rb +264 -47
- data/lib/tree_haver/version.rb +1 -1
- data/lib/tree_haver.rb +14 -553
- data.tar.gz.sig +0 -0
- metadata +7 -4
- metadata.gz.sig +0 -0
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 3da36f96aa78d0995c76a3cbd6451889066b65d219414938e098ef9949df317f
|
|
4
|
+
data.tar.gz: 11e11da6c9f53e439d4226700c981466c3b47f0b9411d2e97a5211edade8d81d
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 1dce9daf70fc5573d579d699724601756db9758170fdb0e7be759ea9572a3d02fa491766786c708172e0f66c2473ae9e0da97fb20bddf7b185d93e285cc8bd3e
|
|
7
|
+
data.tar.gz: eaf3a3649233933b523a7f8e49831b7d5c7f00da268d805c7cbc043176424b5b4214da607bbe7e57e03dcd9072269dfedf3feba4383a828e6cb1abb89394b5c8
|
checksums.yaml.gz.sig
CHANGED
|
Binary file
|
data/CHANGELOG.md
CHANGED
|
@@ -30,6 +30,85 @@ Please file a bug if you notice a violation of semantic versioning.
|
|
|
30
30
|
|
|
31
31
|
### Security
|
|
32
32
|
|
|
33
|
+
## [3.2.1] - 2025-12-31
|
|
34
|
+
|
|
35
|
+
- TAG: [v3.2.1][3.2.1t]
|
|
36
|
+
- COVERAGE: 94.75% -- 2075/2190 lines in 22 files
|
|
37
|
+
- BRANCH COVERAGE: 81.35% -- 733/901 branches in 22 files
|
|
38
|
+
- 90.14% documented
|
|
39
|
+
|
|
40
|
+
### Added
|
|
41
|
+
|
|
42
|
+
- `TreeHaver::LibraryPathUtils` module for consistent path parsing across all backends
|
|
43
|
+
- `derive_symbol_from_path(path)` - derives tree-sitter symbol (e.g., `tree_sitter_toml`) from library path
|
|
44
|
+
- `derive_language_name_from_path(path)` - derives language name (e.g., `toml`) from library path
|
|
45
|
+
- `derive_language_name_from_symbol(symbol)` - strips `tree_sitter_` prefix from symbol
|
|
46
|
+
- Handles various naming conventions: `libtree-sitter-toml.so`, `libtree_sitter_toml.so`, `tree-sitter-toml.so`, `toml.so`
|
|
47
|
+
- Isolated backend RSpec tags for running tests without loading conflicting backends
|
|
48
|
+
- `:ffi_backend_only` - runs FFI tests without triggering `mri_backend_available?` check
|
|
49
|
+
- `:mri_backend_only` - runs MRI tests without triggering `ffi_available?` check
|
|
50
|
+
- Uses `TreeHaver::Backends::BLOCKED_BY` to dynamically determine which availability checks to skip
|
|
51
|
+
- Enables `rake ffi_specs` to run FFI tests before MRI is loaded
|
|
52
|
+
- `DependencyTags.ffi_backend_only_available?` - checks FFI availability without loading MRI
|
|
53
|
+
- `DependencyTags.mri_backend_only_available?` - checks MRI availability without checking FFI
|
|
54
|
+
|
|
55
|
+
### Changed
|
|
56
|
+
|
|
57
|
+
- All backends now use shared `LibraryPathUtils` for path parsing
|
|
58
|
+
- MRI, Rust, FFI, and Java backends updated for consistency
|
|
59
|
+
- Ensures identical behavior across all tree-sitter backends
|
|
60
|
+
- `TreeHaver::Language` class extracted to `lib/tree_haver/language.rb`
|
|
61
|
+
- No API changes, just file organization
|
|
62
|
+
- Loaded via autoload for lazy loading
|
|
63
|
+
- `TreeHaver::Parser` class extracted to `lib/tree_haver/parser.rb`
|
|
64
|
+
- No API changes, just file organization
|
|
65
|
+
- Loaded via autoload for lazy loading
|
|
66
|
+
- Backend availability exclusions in `dependency_tags.rb` are now dynamic
|
|
67
|
+
- Uses `TreeHaver::Backends::BLOCKED_BY` to skip availability checks for blocked backends
|
|
68
|
+
- When running with `--tag ffi_backend_only`, MRI availability is not checked
|
|
69
|
+
- Prevents MRI from being loaded before FFI tests can run
|
|
70
|
+
- Rakefile `ffi_specs` task now uses `:ffi_backend_only` tag
|
|
71
|
+
- Ensures FFI tests run without loading MRI backend first
|
|
72
|
+
|
|
73
|
+
### Fixed
|
|
74
|
+
|
|
75
|
+
- Rakefile now uses correct RSpec tags for FFI isolation
|
|
76
|
+
- The `ffi_specs` task uses `:ffi_backend_only` to prevent MRI from loading
|
|
77
|
+
- The `remaining_specs` task excludes `:ffi_backend_only` tests
|
|
78
|
+
- Tags in Rakefile align with canonical tags from `dependency_tags.rb`
|
|
79
|
+
- `TreeHaver::RSpec::DependencyTags.mri_backend_available?` now uses correct require path
|
|
80
|
+
- Was: `require "ruby_tree_sitter"` (wrong - causes LoadError)
|
|
81
|
+
- Now: `require "tree_sitter"` (correct - gem name is ruby_tree_sitter but require path is tree_sitter)
|
|
82
|
+
- This fix ensures the MRI backend is correctly detected as available in CI environments
|
|
83
|
+
- `TreeHaver::Backends::MRI::Language.from_library` now properly derives symbol from path
|
|
84
|
+
- Previously, calling `from_library(path)` without `symbol:` would fail because `language_name` was nil
|
|
85
|
+
- Now delegates to private `from_path` after deriving symbol, ensuring proper language name derivation
|
|
86
|
+
- `from_path` is now private (but still accessible via `send` for testing if needed)
|
|
87
|
+
- Extracts language name from paths like `/usr/lib/libtree-sitter-toml.so` → `tree_sitter_toml`
|
|
88
|
+
- Handles both dash and underscore separators in filenames
|
|
89
|
+
- Handles simple language names like `toml.so` → `tree_sitter_toml`
|
|
90
|
+
- `TreeHaver::Backends::MRI::Parser#language=` now unwraps `TreeHaver::Backends::MRI::Language` wrappers
|
|
91
|
+
- Accepts both raw `TreeSitter::Language` and wrapped `TreeHaver::Backends::MRI::Language`
|
|
92
|
+
- `TreeHaver::GrammarFinder.tree_sitter_runtime_usable?` no longer references `FFI::NotFoundError` directly
|
|
93
|
+
- Prevents `NameError` when FFI gem is not loaded
|
|
94
|
+
- `TreeHaver::Parser#initialize` no longer references `FFI::NotFoundError` directly in rescue clause
|
|
95
|
+
- Uses `defined?(::FFI::NotFoundError)` check to safely handle FFI errors when FFI is loaded
|
|
96
|
+
- Prevents `NameError: uninitialized constant TreeHaver::Parser::FFI` when FFI gem is not available
|
|
97
|
+
- Extracted error handling to `handle_parser_creation_failure` private method for clarity
|
|
98
|
+
- RSpec `dependency_tags.rb` now correctly detects `--tag` options during configuration
|
|
99
|
+
- RSpec's `config.inclusion_filter.rules` is empty during configuration phase
|
|
100
|
+
- Now parses `ARGV` directly to detect `--tag ffi_backend_only` and similar tags
|
|
101
|
+
- Skips grammar availability checks (which load MRI) when running isolated backend tests
|
|
102
|
+
- Skips full dependency summary in `before(:suite)` when backends are blocked
|
|
103
|
+
- `TreeHaver::Backends::FFI.reset!` now uses consistent pattern with other backends
|
|
104
|
+
- Was using `@ffi_gem_available` with `defined?()` check, which returned truthy after `reset!` set it to nil
|
|
105
|
+
- Now uses `@load_attempted` / `@loaded` pattern like MRI, Rust, Citrus, Prism, Psych, etc.
|
|
106
|
+
- This fixes FFI tests failing after the first test when `reset!` was called in `after` blocks
|
|
107
|
+
- `TreeHaver::Language.method_missing` no longer references `FFI::NotFoundError` directly in rescue clause
|
|
108
|
+
- Uses `defined?(::FFI::NotFoundError)` check to safely handle FFI errors when FFI is loaded
|
|
109
|
+
- Prevents `NameError` when FFI gem is not available but tree-sitter backends are used
|
|
110
|
+
- Extracted Citrus fallback logic to `handle_tree_sitter_load_failure` private method
|
|
111
|
+
|
|
33
112
|
## [3.2.0] - 2025-12-30
|
|
34
113
|
|
|
35
114
|
- TAG: [v3.2.0][3.2.0t]
|
|
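The changelog entries above say that `dependency_tags.rb` now parses `ARGV` directly, because RSpec's inclusion filter is still empty during configuration. The gem's actual parsing code is not part of this hunk, so the following is only a minimal sketch of that idea. The helper name `requested_tags` and its handling of the `--tag=NAME` and `~NAME` forms are assumptions made for illustration.

```ruby
# Hypothetical helper (not from the gem): collect tag names passed on the
# command line as `--tag NAME`, `-t NAME`, or `--tag=NAME`.
def requested_tags(argv)
  tags = []
  argv.each_with_index do |arg, index|
    case arg
    when "--tag", "-t"
      value = argv[index + 1]
      tags << value if value
    when /\A--tag=(.+)\z/
      tags << Regexp.last_match(1)
    end
  end

  # Skip exclusions ("~slow") and drop any ":value" suffix ("focus:true").
  tags
    .reject { |tag| tag.start_with?("~") }
    .map { |tag| tag[/\A[^:]+/] }
    .compact
    .map(&:to_sym)
end

requested_tags(%w[--tag ffi_backend_only spec/ffi_spec.rb]) # => [:ffi_backend_only]
```

A configuration block could then consult the result, for example to skip availability checks for backends that the selected tag blocks.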
@@ -580,7 +659,9 @@ Despite the major version bump to 3.0.0 (following semver due to the breaking `L
|
|
|
580
659
|
|
|
581
660
|
- Initial release
|
|
582
661
|
|
|
583
|
-
[Unreleased]: https://github.com/kettle-rb/tree_haver/compare/v3.2.
|
|
662
|
+
[Unreleased]: https://github.com/kettle-rb/tree_haver/compare/v3.2.1...HEAD
|
|
663
|
+
[3.2.1]: https://github.com/kettle-rb/tree_haver/compare/v3.2.0...v3.2.1
|
|
664
|
+
[3.2.1t]: https://github.com/kettle-rb/tree_haver/releases/tag/v3.2.1
|
|
584
665
|
[3.2.0]: https://github.com/kettle-rb/tree_haver/compare/v3.1.2...v3.2.0
|
|
585
666
|
[3.2.0t]: https://github.com/kettle-rb/tree_haver/releases/tag/v3.2.0
|
|
586
667
|
[3.1.2]: https://github.com/kettle-rb/tree_haver/compare/v3.1.1...v3.1.2
|
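The 3.2.1 changelog notes that `mri_backend_available?` previously required `ruby_tree_sitter` (the gem name) instead of `tree_sitter` (the require path). The sketch below shows the general shape of such an availability probe. The helper name `loadable?` is hypothetical and is not the gem's API.

```ruby
# Hypothetical helper (not from the gem): report whether a require path can
# be loaded. A gem's name and its require path may differ, so the probe must
# use the require path.
def loadable?(require_path)
  require require_path
  true
rescue LoadError
  false
end

loadable?("set") # true on a standard Ruby install
```

Probing with a gem name that is not also a require path takes the `LoadError` branch and reports the library as unavailable, which matches the symptom described in the changelog.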
data/README.md
CHANGED
|
@@ -1977,7 +1977,7 @@ Thanks for RTFM. ☺️
|
|
|
1977
1977
|
[📌gitmoji]: https://gitmoji.dev
|
|
1978
1978
|
[📌gitmoji-img]: https://img.shields.io/badge/gitmoji_commits-%20%F0%9F%98%9C%20%F0%9F%98%8D-34495e.svg?style=flat-square
|
|
1979
1979
|
[🧮kloc]: https://www.youtube.com/watch?v=dQw4w9WgXcQ
|
|
1980
|
-
[🧮kloc-img]: https://img.shields.io/badge/KLOC-2.
|
|
1980
|
+
[🧮kloc-img]: https://img.shields.io/badge/KLOC-2.190-FFDD67.svg?style=for-the-badge&logo=YouTube&logoColor=blue
|
|
1981
1981
|
[🔐security]: SECURITY.md
|
|
1982
1982
|
[🔐security-img]: https://img.shields.io/badge/security-policy-259D6C.svg?style=flat
|
|
1983
1983
|
[📄copyright-notice-explainer]: https://opensource.stackexchange.com/questions/5778/why-do-licenses-such-as-the-mit-license-specify-a-single-year
|
|
data/lib/tree_haver/backends/ffi.rb
CHANGED
|
@@ -56,9 +56,10 @@ module TreeHaver
|
|
|
56
56
|
# @note Returns false on TruffleRuby because TruffleRuby's FFI doesn't support
|
|
57
57
|
# STRUCT_BY_VALUE return types (used by ts_tree_root_node, ts_node_child, etc.)
|
|
58
58
|
def ffi_gem_available?
|
|
59
|
-
return @
|
|
59
|
+
return @loaded if @load_attempted
|
|
60
60
|
|
|
61
|
-
@
|
|
61
|
+
@load_attempted = true
|
|
62
|
+
@loaded = begin
|
|
62
63
|
# TruffleRuby's FFI doesn't support STRUCT_BY_VALUE return types
|
|
63
64
|
# which tree-sitter uses extensively (ts_tree_root_node, ts_node_child, etc.)
|
|
64
65
|
return false if RUBY_ENGINE == "truffleruby"
|
|
@@ -75,7 +76,8 @@ module TreeHaver
|
|
|
75
76
|
# @return [void]
|
|
76
77
|
# @api private
|
|
77
78
|
def reset!
|
|
78
|
-
@
|
|
79
|
+
@load_attempted = false
|
|
80
|
+
@loaded = false
|
|
79
81
|
end
|
|
80
82
|
|
|
81
83
|
# Get capabilities supported by this backend
|
|
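The two hunks above replace a `defined?`-based memo with the `@load_attempted` / `@loaded` pair. The removed lines are truncated in this diff, so the "buggy" half of the sketch below is reconstructed from the changelog description only, and the module `MemoDemo` with its `probe` method is a hypothetical stand-in for the real availability check.

```ruby
# Hypothetical stand-in module; `probe` fakes the expensive availability check.
module MemoDemo
  class << self
    # Variant described in the changelog as broken: once the ivar has been
    # assigned (even to nil), defined? stays truthy, so the probe never reruns.
    def buggy_available?
      return @cached if defined?(@cached)

      @cached = probe
    end

    def buggy_reset!
      @cached = nil
    end

    # Pattern introduced by the diff: an explicit flag plus the stored result.
    def available?
      return @loaded if @load_attempted

      @load_attempted = true
      @loaded = probe
    end

    def reset!
      @load_attempted = false
      @loaded = false
    end

    private

    def probe
      true
    end
  end
end

MemoDemo.buggy_available? # => true
MemoDemo.buggy_reset!
MemoDemo.buggy_available? # => nil (the stale nil is returned, probe is skipped)

MemoDemo.available?       # => true
MemoDemo.reset!
MemoDemo.available?       # => true (probe runs again after reset!)
```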
@@ -395,14 +397,14 @@ module TreeHaver
|
|
|
395
397
|
end
|
|
396
398
|
|
|
397
399
|
requested = symbol || ENV["TREE_SITTER_LANG_SYMBOL"]
|
|
398
|
-
|
|
399
|
-
|
|
400
|
+
# Use shared utility for consistent symbol derivation across backends
|
|
401
|
+
guessed_symbol = LibraryPathUtils.derive_symbol_from_path(path)
|
|
400
402
|
# If an override was provided (arg or ENV), treat it as strict and do not fall back.
|
|
401
403
|
# Only when no override is provided do we attempt guessed and default candidates.
|
|
402
404
|
candidates = if requested && !requested.to_s.empty?
|
|
403
405
|
[requested]
|
|
404
406
|
else
|
|
405
|
-
[
|
|
407
|
+
[guessed_symbol, "tree_sitter_toml"].compact.uniq
|
|
406
408
|
end
|
|
407
409
|
|
|
408
410
|
func = nil
|
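The candidate selection in the hunk above can be read in isolation: an explicit override (argument or `TREE_SITTER_LANG_SYMBOL`) is strict, otherwise the path-derived guess is tried before the default. The sketch below restates that logic as a standalone function. The name `symbol_candidates` and the `default:` keyword are assumptions for illustration only.

```ruby
# Hypothetical standalone restatement of the candidate list built in ffi.rb.
def symbol_candidates(requested, guessed_symbol, default: "tree_sitter_toml")
  if requested && !requested.to_s.empty?
    # An override is strict: no fallback candidates are added.
    [requested]
  else
    # compact drops a nil guess, uniq collapses a guess equal to the default.
    [guessed_symbol, default].compact.uniq
  end
end

symbol_candidates(nil, "tree_sitter_json")      # => ["tree_sitter_json", "tree_sitter_toml"]
symbol_candidates(nil, "tree_sitter_toml")      # => ["tree_sitter_toml"]
symbol_candidates("my_symbol", "tree_sitter_json") # => ["my_symbol"]
symbol_candidates("", nil)                      # => ["tree_sitter_toml"]
```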
|
data/lib/tree_haver/backends/java.rb
CHANGED
|
@@ -316,9 +316,11 @@ module TreeHaver
|
|
|
316
316
|
def from_library(path, symbol: nil, name: nil)
|
|
317
317
|
raise TreeHaver::NotAvailable, "Java backend not available" unless Java.available?
|
|
318
318
|
|
|
319
|
-
#
|
|
320
|
-
|
|
321
|
-
sym = symbol ||
|
|
319
|
+
# Use shared utility for consistent symbol derivation across backends
|
|
320
|
+
# If symbol not provided, derive from name or path
|
|
321
|
+
sym = symbol || LibraryPathUtils.derive_symbol_from_path(path)
|
|
322
|
+
# If name was provided, use it to override the derived symbol
|
|
323
|
+
sym = "tree_sitter_#{name}" if name && !symbol
|
|
322
324
|
|
|
323
325
|
begin
|
|
324
326
|
arena = ::Java::JavaLangForeign::Arena.global
|
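The two assignments to `sym` in the hunk above imply a precedence: an explicit `symbol:` wins, then a symbol built from `name:`, then the path-derived symbol. The sketch below mirrors those two lines in a free function so the precedence can be checked directly. `resolve_symbol` is a hypothetical name, and the path-derived value is passed in rather than computed.

```ruby
# Hypothetical restatement of the symbol resolution in java.rb.
# derived_from_path stands in for LibraryPathUtils.derive_symbol_from_path(path).
def resolve_symbol(derived_from_path, symbol: nil, name: nil)
  sym = symbol || derived_from_path
  sym = "tree_sitter_#{name}" if name && !symbol
  sym
end

resolve_symbol("tree_sitter_toml")                               # => "tree_sitter_toml"
resolve_symbol("tree_sitter_toml", name: "json")                 # => "tree_sitter_json"
resolve_symbol("tree_sitter_toml", symbol: "custom", name: "json") # => "custom"
```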
|
data/lib/tree_haver/backends/mri.rb
CHANGED
|
@@ -166,6 +166,21 @@ module TreeHaver
|
|
|
166
166
|
# lang = TreeHaver::Backends::MRI::Language.from_library("/path/to/lib.so", symbol: "tree_sitter_json")
|
|
167
167
|
class << self
|
|
168
168
|
def from_library(path, symbol: nil, name: nil)
|
|
169
|
+
# Derive symbol from path if not provided using shared utility
|
|
170
|
+
symbol ||= LibraryPathUtils.derive_symbol_from_path(path)
|
|
171
|
+
from_path(path, symbol: symbol, name: name)
|
|
172
|
+
end
|
|
173
|
+
|
|
174
|
+
private
|
|
175
|
+
|
|
176
|
+
# Load a language from a shared library path (internal implementation)
|
|
177
|
+
#
|
|
178
|
+
# @param path [String] absolute path to the language shared library
|
|
179
|
+
# @param symbol [String] the exported symbol name (e.g., "tree_sitter_json")
|
|
180
|
+
# @param name [String, nil] optional language name
|
|
181
|
+
# @return [Language] wrapped language handle
|
|
182
|
+
# @api private
|
|
183
|
+
def from_path(path, symbol: nil, name: nil)
|
|
169
184
|
raise TreeHaver::NotAvailable, "ruby_tree_sitter not available" unless MRI.available?
|
|
170
185
|
|
|
171
186
|
# ruby_tree_sitter's TreeSitter::Language.load takes (language_name, path_to_so)
|
|
@@ -173,8 +188,8 @@ module TreeHaver
|
|
|
173
188
|
# NOT the full symbol name (e.g., NOT "tree_sitter_toml")
|
|
174
189
|
# and path_to_so is the full path to the .so file
|
|
175
190
|
#
|
|
176
|
-
# If name is not provided, derive it from symbol
|
|
177
|
-
language_name = name || symbol
|
|
191
|
+
# If name is not provided, derive it from symbol using shared utility
|
|
192
|
+
language_name = name || LibraryPathUtils.derive_language_name_from_symbol(symbol)
|
|
178
193
|
ts_lang = ::TreeSitter::Language.load(language_name, path)
|
|
179
194
|
new(ts_lang, path: path, symbol: symbol)
|
|
180
195
|
rescue NameError => e
|
|
@@ -190,16 +205,6 @@ module TreeHaver
|
|
|
190
205
|
raise # Re-raise if it's not a TreeSitter error
|
|
191
206
|
end
|
|
192
207
|
end
|
|
193
|
-
|
|
194
|
-
# Load a language from a shared library path (legacy method)
|
|
195
|
-
#
|
|
196
|
-
# @param path [String] absolute path to the language shared library
|
|
197
|
-
# @param symbol [String] the exported symbol name (e.g., "tree_sitter_json")
|
|
198
|
-
# @return [Language] wrapped language handle
|
|
199
|
-
# @deprecated Use {from_library} instead
|
|
200
|
-
def from_path(path, symbol: nil)
|
|
201
|
-
from_library(path, symbol: symbol)
|
|
202
|
-
end
|
|
203
208
|
end
|
|
204
209
|
end
|
|
205
210
|
|
|
@@ -229,19 +234,17 @@ module TreeHaver
|
|
|
229
234
|
|
|
230
235
|
# Set the language for this parser
|
|
231
236
|
#
|
|
232
|
-
#
|
|
233
|
-
#
|
|
234
|
-
#
|
|
235
|
-
# @param lang [::TreeSitter::Language] the language to use (already unwrapped)
|
|
236
|
-
# @return [::TreeSitter::Language] the language that was set
|
|
237
|
+
# @param lang [::TreeSitter::Language, TreeHaver::Backends::MRI::Language] the language to use
|
|
238
|
+
# @return [::TreeSitter::Language, TreeHaver::Backends::MRI::Language] the language that was set
|
|
237
239
|
# @raise [TreeHaver::NotAvailable] if setting language fails
|
|
238
240
|
def language=(lang)
|
|
239
|
-
#
|
|
240
|
-
|
|
241
|
+
# Unwrap if it's a TreeHaver wrapper
|
|
242
|
+
inner_lang = lang.respond_to?(:inner_language) ? lang.inner_language : lang
|
|
243
|
+
@parser.language = inner_lang
|
|
241
244
|
# Verify it was set
|
|
242
245
|
raise TreeHaver::NotAvailable, "Language not set correctly" if @parser.language.nil?
|
|
243
246
|
|
|
244
|
-
# Return the language object
|
|
247
|
+
# Return the original language object (wrapped or unwrapped)
|
|
245
248
|
lang
|
|
246
249
|
rescue Exception => e # rubocop:disable Lint/RescueException
|
|
247
250
|
# TreeSitter errors inherit from Exception (not StandardError) in ruby_tree_sitter v2+
|
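The `language=` change above unwraps a wrapper via duck typing (`respond_to?(:inner_language)`) instead of checking a class. A minimal sketch of that check follows. `RawLanguage` and `WrappedLanguage` are hypothetical stand-ins, not the real `TreeSitter::Language` or `TreeHaver::Backends::MRI::Language` classes.

```ruby
# Hypothetical stand-ins for a raw language handle and a wrapper around it.
RawLanguage = Struct.new(:name)
WrappedLanguage = Struct.new(:inner_language)

# Return the underlying handle whether or not the argument is wrapped.
def unwrap_language(lang)
  lang.respond_to?(:inner_language) ? lang.inner_language : lang
end

raw = RawLanguage.new("toml")
unwrap_language(raw).name                      # => "toml"
unwrap_language(WrappedLanguage.new(raw)).name # => "toml"
```

Checking for the method rather than the class means any object that exposes `inner_language` is accepted, which is a deliberate trade-off of strictness for flexibility.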
|
data/lib/tree_haver/backends/rust.rb
CHANGED
|
@@ -149,16 +149,15 @@ module TreeHaver
|
|
|
149
149
|
|
|
150
150
|
# tree_stump uses TreeStump.register_lang(name, path) to register languages
|
|
151
151
|
# The name is used to derive the symbol automatically (tree_sitter_<name>)
|
|
152
|
-
|
|
152
|
+
# Use shared utility for consistent path parsing across backends
|
|
153
|
+
lang_name = name || LibraryPathUtils.derive_language_name_from_path(path)
|
|
153
154
|
::TreeStump.register_lang(lang_name, path)
|
|
154
155
|
new(lang_name, path: path)
|
|
155
156
|
rescue RuntimeError => e
|
|
156
157
|
raise TreeHaver::NotAvailable, "Failed to load language from #{path}: #{e.message}"
|
|
157
158
|
end
|
|
158
159
|
|
|
159
|
-
#
|
|
160
|
-
#
|
|
161
|
-
# @see from_library
|
|
160
|
+
# Backward-compatible alias for from_library
|
|
162
161
|
alias_method :from_path, :from_library
|
|
163
162
|
end
|
|
164
163
|
end
|
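The Rust backend keeps `from_path` as an alias of `from_library` inside `class << self`. The sketch below shows how such a singleton-level alias behaves. `DemoLanguage` and its return value are hypothetical and do not call `TreeStump`.

```ruby
# Hypothetical class showing a backward-compatible alias on a class method.
class DemoLanguage
  class << self
    def from_library(path, name: nil)
      lang_name = name || File.basename(path, ".*")
      "loaded #{lang_name} from #{path}"
    end

    # alias_method inside `class << self` aliases the class-level method.
    alias_method :from_path, :from_library
  end
end

DemoLanguage.from_library("/tmp/toml.so")           # => "loaded toml from /tmp/toml.so"
DemoLanguage.from_path("/tmp/toml.so", name: "ini") # => "loaded ini from /tmp/toml.so"
```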
|
data/lib/tree_haver/grammar_finder.rb
CHANGED
|
@@ -270,7 +270,10 @@ module TreeHaver
|
|
|
270
270
|
# Try to instantiate a parser - this will fail if runtime isn't available
|
|
271
271
|
mod::Parser.new
|
|
272
272
|
true
|
|
273
|
-
rescue NoMethodError,
|
|
273
|
+
rescue NoMethodError, LoadError, NotAvailable => _e
|
|
274
|
+
false
|
|
275
|
+
rescue StandardError => _e
|
|
276
|
+
# Catch FFI::NotFoundError and other errors when FFI is loaded
|
|
274
277
|
false
|
|
275
278
|
end
|
|
276
279
|
end
|
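The removed `rescue` line in this hunk is truncated, but the changelog states that it referenced `FFI::NotFoundError` directly and raised `NameError` when the FFI gem was not loaded. Ruby resolves the constants in a `rescue` list only when an exception reaches that clause, which is why the failure appears at rescue time. The sketch below demonstrates this with a deliberately undefined constant, `MissingGemDemo::NotFoundError`, which is hypothetical, as are both method names.

```ruby
# Naive variant: names an error class from a library that is not loaded.
def naive_check
  raise ArgumentError, "parser could not be created"
rescue MissingGemDemo::NotFoundError
  false
end

# Variant in the spirit of the new hunk: rescue only classes that always
# exist, then fall back to StandardError for library-specific errors.
def broad_check
  raise ArgumentError, "parser could not be created"
rescue NoMethodError, LoadError
  false
rescue StandardError
  false
end

outcome =
  begin
    naive_check
  rescue NameError => e
    e.class
  end

outcome     # => NameError (raised while evaluating the rescue clause)
broad_check # => false
```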
|
data/lib/tree_haver/language.rb
ADDED
|
@@ -0,0 +1,255 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module TreeHaver
|
|
4
|
+
# Represents a language grammar for parsing source code
|
|
5
|
+
#
|
|
6
|
+
# Language is the entry point for loading and using grammars. It provides
|
|
7
|
+
# a unified interface that works across all backends (MRI, Rust, FFI, Java, Citrus).
|
|
8
|
+
#
|
|
9
|
+
# For tree-sitter backends, languages are loaded from shared library files (.so/.dylib/.dll).
|
|
10
|
+
# For pure-Ruby backends (Citrus, Prism, Psych), languages are built-in or provided by gems.
|
|
11
|
+
#
|
|
12
|
+
# == Loading Languages
|
|
13
|
+
#
|
|
14
|
+
# The primary way to load a language is via registration:
|
|
15
|
+
#
|
|
16
|
+
# TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
|
|
17
|
+
# language = TreeHaver::Language.toml
|
|
18
|
+
#
|
|
19
|
+
# For explicit loading without registration:
|
|
20
|
+
#
|
|
21
|
+
# language = TreeHaver::Language.from_library(
|
|
22
|
+
# "/path/to/libtree-sitter-toml.so",
|
|
23
|
+
# symbol: "tree_sitter_toml"
|
|
24
|
+
# )
|
|
25
|
+
#
|
|
26
|
+
# For ruby_tree_sitter compatibility:
|
|
27
|
+
#
|
|
28
|
+
# language = TreeHaver::Language.load("toml", "/path/to/libtree-sitter-toml.so")
|
|
29
|
+
#
|
|
30
|
+
# @example Register and load a language
|
|
31
|
+
# TreeHaver.register_language(:toml, path: "/path/to/grammar.so")
|
|
32
|
+
# language = TreeHaver::Language.toml
|
|
33
|
+
class Language
|
|
34
|
+
class << self
|
|
35
|
+
# Load a language grammar from a shared library (ruby_tree_sitter compatibility)
|
|
36
|
+
#
|
|
37
|
+
# This method provides API compatibility with ruby_tree_sitter which uses
|
|
38
|
+
# `Language.load(name, path)`.
|
|
39
|
+
#
|
|
40
|
+
# @param name [String] the language name (e.g., "toml")
|
|
41
|
+
# @param path [String] absolute path to the language shared library
|
|
42
|
+
# @param validate [Boolean] if true, validates the path for safety (default: true)
|
|
43
|
+
# @return [Language] loaded language handle
|
|
44
|
+
# @raise [NotAvailable] if the library cannot be loaded
|
|
45
|
+
# @raise [ArgumentError] if the path fails security validation
|
|
46
|
+
# @example
|
|
47
|
+
# language = TreeHaver::Language.load("toml", "/usr/local/lib/libtree-sitter-toml.so")
|
|
48
|
+
def load(name, path, validate: true)
|
|
49
|
+
from_library(path, symbol: "tree_sitter_#{name}", name: name, validate: validate)
|
|
50
|
+
end
|
|
51
|
+
|
|
52
|
+
# Load a language grammar from a shared library
|
|
53
|
+
#
|
|
54
|
+
# The library must export a function that returns a pointer to a TSLanguage struct.
|
|
55
|
+
# By default, TreeHaver looks for a symbol named "tree_sitter_<name>".
|
|
56
|
+
#
|
|
57
|
+
# == Security
|
|
58
|
+
#
|
|
59
|
+
# By default, paths are validated using {PathValidator} to prevent path traversal
|
|
60
|
+
# and other attacks. Set `validate: false` to skip validation (not recommended
|
|
61
|
+
# unless you've already validated the path).
|
|
62
|
+
#
|
|
63
|
+
# @param path [String] absolute path to the language shared library (.so/.dylib/.dll)
|
|
64
|
+
# @param symbol [String, nil] name of the exported function (defaults to auto-detection)
|
|
65
|
+
# @param name [String, nil] logical name for the language (used in caching)
|
|
66
|
+
# @param validate [Boolean] if true, validates path and symbol for safety (default: true)
|
|
67
|
+
# @param backend [Symbol, String, nil] optional backend to use (overrides context/global)
|
|
68
|
+
# @return [Language] loaded language handle
|
|
69
|
+
# @raise [NotAvailable] if the library cannot be loaded or the symbol is not found
|
|
70
|
+
# @raise [ArgumentError] if path or symbol fails security validation
|
|
71
|
+
# @example
|
|
72
|
+
# language = TreeHaver::Language.from_library(
|
|
73
|
+
# "/usr/local/lib/libtree-sitter-toml.so",
|
|
74
|
+
# symbol: "tree_sitter_toml",
|
|
75
|
+
# name: "toml"
|
|
76
|
+
# )
|
|
77
|
+
# @example With explicit backend
|
|
78
|
+
# language = TreeHaver::Language.from_library(
|
|
79
|
+
# "/usr/local/lib/libtree-sitter-toml.so",
|
|
80
|
+
# symbol: "tree_sitter_toml",
|
|
81
|
+
# backend: :ffi
|
|
82
|
+
# )
|
|
83
|
+
def from_library(path, symbol: nil, name: nil, validate: true, backend: nil)
|
|
84
|
+
if validate
|
|
85
|
+
unless PathValidator.safe_library_path?(path)
|
|
86
|
+
errors = PathValidator.validation_errors(path)
|
|
87
|
+
raise ArgumentError, "Unsafe library path: #{path.inspect}. Errors: #{errors.join("; ")}"
|
|
88
|
+
end
|
|
89
|
+
|
|
90
|
+
if symbol && !PathValidator.safe_symbol_name?(symbol)
|
|
91
|
+
raise ArgumentError, "Unsafe symbol name: #{symbol.inspect}. " \
|
|
92
|
+
"Symbol names must be valid C identifiers."
|
|
93
|
+
end
|
|
94
|
+
end
|
|
95
|
+
|
|
96
|
+
# from_library only works with tree-sitter backends that support .so files
|
|
97
|
+
# Pure Ruby backends (Citrus, Prism, Psych, Commonmarker, Markly) don't support from_library
|
|
98
|
+
mod = TreeHaver.resolve_native_backend_module(backend)
|
|
99
|
+
|
|
100
|
+
if mod.nil?
|
|
101
|
+
if backend
|
|
102
|
+
raise NotAvailable, "Requested backend #{backend.inspect} is not available or does not support shared libraries"
|
|
103
|
+
else
|
|
104
|
+
raise NotAvailable,
|
|
105
|
+
"No native tree-sitter backend is available for loading shared libraries. " \
|
|
106
|
+
"Available native backends (MRI, Rust, FFI, Java) require platform-specific setup. " \
|
|
107
|
+
"For pure-Ruby parsing, use backend-specific Language classes directly (e.g., Prism, Psych, Citrus)."
|
|
108
|
+
end
|
|
109
|
+
end
|
|
110
|
+
|
|
111
|
+
# Backend must implement .from_library; fallback to .from_path for older impls
|
|
112
|
+
# Include effective backend AND ENV vars in cache key since they affect loading
|
|
113
|
+
effective_b = TreeHaver.resolve_effective_backend(backend)
|
|
114
|
+
key = [effective_b, path, symbol, name, ENV["TREE_SITTER_LANG_SYMBOL"]]
|
|
115
|
+
LanguageRegistry.fetch(key) do
|
|
116
|
+
if mod::Language.respond_to?(:from_library)
|
|
117
|
+
mod::Language.from_library(path, symbol: symbol, name: name)
|
|
118
|
+
else
|
|
119
|
+
mod::Language.from_path(path)
|
|
120
|
+
end
|
|
121
|
+
end
|
|
122
|
+
end
|
|
123
|
+
# Alias for {from_library}
|
|
124
|
+
# @see from_library
|
|
125
|
+
alias_method :from_path, :from_library
|
|
126
|
+
|
|
127
|
+
# Dynamic helper to load a registered language by name
|
|
128
|
+
#
|
|
129
|
+
# After registering a language with {TreeHaver.register_language},
|
|
130
|
+
# you can load it using a method call. The appropriate backend will be
|
|
131
|
+
# used based on registration and current backend.
|
|
132
|
+
#
|
|
133
|
+
# @example With tree-sitter
|
|
134
|
+
# TreeHaver.register_language(:toml, path: "/path/to/libtree-sitter-toml.so")
|
|
135
|
+
# language = TreeHaver::Language.toml
|
|
136
|
+
#
|
|
137
|
+
# @example With both backends
|
|
138
|
+
# TreeHaver.register_language(:toml,
|
|
139
|
+
# path: "/path/to/libtree-sitter-toml.so", symbol: "tree_sitter_toml")
|
|
140
|
+
# TreeHaver.register_language(:toml,
|
|
141
|
+
# grammar_module: TomlRB::Document)
|
|
142
|
+
# language = TreeHaver::Language.toml # Uses appropriate grammar for active backend
|
|
143
|
+
#
|
|
144
|
+
# @param method_name [Symbol] the registered language name
|
|
145
|
+
# @param args [Array] positional arguments
|
|
146
|
+
# @param kwargs [Hash] keyword arguments
|
|
147
|
+
# @return [Language] loaded language handle
|
|
148
|
+
# @raise [NoMethodError] if the language name is not registered
|
|
149
|
+
def method_missing(method_name, *args, **kwargs, &block)
|
|
150
|
+
# Resolve only if the language name was registered
|
|
151
|
+
all_backends = TreeHaver.registered_language(method_name)
|
|
152
|
+
return super unless all_backends
|
|
153
|
+
|
|
154
|
+
# Check current backend
|
|
155
|
+
current_backend = TreeHaver.backend_module
|
|
156
|
+
|
|
157
|
+
# Determine which backend type to use
|
|
158
|
+
backend_type = if current_backend == Backends::Citrus
|
|
159
|
+
:citrus
|
|
160
|
+
else
|
|
161
|
+
:tree_sitter # MRI, Rust, FFI, Java all use tree-sitter
|
|
162
|
+
end
|
|
163
|
+
|
|
164
|
+
# Get backend-specific registration
|
|
165
|
+
reg = all_backends[backend_type]
|
|
166
|
+
|
|
167
|
+
# If Citrus backend is active
|
|
168
|
+
if backend_type == :citrus
|
|
169
|
+
if reg && reg[:grammar_module]
|
|
170
|
+
return Backends::Citrus::Language.new(reg[:grammar_module])
|
|
171
|
+
end
|
|
172
|
+
|
|
173
|
+
# Fall back to error if no Citrus grammar registered
|
|
174
|
+
raise NotAvailable,
|
|
175
|
+
"Citrus backend is active but no Citrus grammar registered for :#{method_name}. " \
|
|
176
|
+
"Either register a Citrus grammar or use a tree-sitter backend. " \
|
|
177
|
+
"Registered backends: #{all_backends.keys.inspect}"
|
|
178
|
+
end
|
|
179
|
+
|
|
180
|
+
# For tree-sitter backends, try to load from path
|
|
181
|
+
# If that fails, fall back to Citrus if available
|
|
182
|
+
if reg && reg[:path]
|
|
183
|
+
path = kwargs[:path] || args.first || reg[:path]
|
|
184
|
+
# Symbol priority: kwargs override > registration > derive from method_name
|
|
185
|
+
symbol = if kwargs.key?(:symbol)
|
|
186
|
+
kwargs[:symbol]
|
|
187
|
+
elsif reg[:symbol]
|
|
188
|
+
reg[:symbol]
|
|
189
|
+
else
|
|
190
|
+
"tree_sitter_#{method_name}"
|
|
191
|
+
end
|
|
192
|
+
# Name priority: kwargs override > derive from symbol (strip tree_sitter_ prefix)
|
|
193
|
+
# Using symbol-derived name ensures ruby_tree_sitter gets the correct language name
|
|
194
|
+
# e.g., "toml" not "toml_both" when symbol is "tree_sitter_toml"
|
|
195
|
+
name = kwargs[:name] || symbol&.sub(/\Atree_sitter_/, "")
|
|
196
|
+
|
|
197
|
+
begin
|
|
198
|
+
return from_library(path, symbol: symbol, name: name)
|
|
199
|
+
rescue NotAvailable, ArgumentError, LoadError => e
|
|
200
|
+
# Tree-sitter failed to load - check for Citrus fallback
|
|
201
|
+
handle_tree_sitter_load_failure(e, all_backends)
|
|
202
|
+
rescue => e
|
|
203
|
+
# Also catch FFI::NotFoundError if FFI is loaded (can't reference directly as FFI may not exist)
|
|
204
|
+
if defined?(::FFI::NotFoundError) && e.is_a?(::FFI::NotFoundError)
|
|
205
|
+
handle_tree_sitter_load_failure(e, all_backends)
|
|
206
|
+
else
|
|
207
|
+
raise
|
|
208
|
+
end
|
|
209
|
+
end
|
|
210
|
+
end
|
|
211
|
+
|
|
212
|
+
# No tree-sitter path registered - check for Citrus fallback
|
|
213
|
+
# This enables auto-fallback when tree-sitter grammar is not installed
|
|
214
|
+
# but a Citrus grammar (pure Ruby) is available
|
|
215
|
+
citrus_reg = all_backends[:citrus]
|
|
216
|
+
if citrus_reg && citrus_reg[:grammar_module]
|
|
217
|
+
return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
|
|
218
|
+
end
|
|
219
|
+
|
|
220
|
+
# No appropriate registration found
|
|
221
|
+
raise ArgumentError,
|
|
222
|
+
"No grammar registered for :#{method_name} compatible with #{backend_type} backend. " \
|
|
223
|
+
"Registered backends: #{all_backends.keys.inspect}"
|
|
224
|
+
end
|
|
225
|
+
|
|
226
|
+
# @api private
|
|
227
|
+
def respond_to_missing?(method_name, include_private = false)
|
|
228
|
+
!!TreeHaver.registered_language(method_name) || super
|
|
229
|
+
end
|
|
230
|
+
|
|
231
|
+
private
|
|
232
|
+
|
|
233
|
+
# Handle tree-sitter load failure with optional Citrus fallback
|
|
234
|
+
#
|
|
235
|
+
# This handles cases where:
|
|
236
|
+
# - The .so file doesn't exist or can't be loaded (NotAvailable, LoadError)
|
|
237
|
+
# - FFI can't find required symbols like ts_parser_new (FFI::NotFoundError)
|
|
238
|
+
# - Invalid arguments were provided (ArgumentError)
|
|
239
|
+
#
|
|
240
|
+
# @param error [Exception] the original error
|
|
241
|
+
# @param all_backends [Hash] all registered backends for the language
|
|
242
|
+
# @return [Backends::Citrus::Language] if Citrus fallback available
|
|
243
|
+
# @raise [Exception] re-raises original error if no fallback
|
|
244
|
+
# @api private
|
|
245
|
+
def handle_tree_sitter_load_failure(error, all_backends)
|
|
246
|
+
citrus_reg = all_backends[:citrus]
|
|
247
|
+
if citrus_reg && citrus_reg[:grammar_module]
|
|
248
|
+
return Backends::Citrus::Language.new(citrus_reg[:grammar_module])
|
|
249
|
+
end
|
|
250
|
+
# No Citrus fallback available, re-raise the original error
|
|
251
|
+
raise error
|
|
252
|
+
end
|
|
253
|
+
end
|
|
254
|
+
end
|
|
255
|
+
end
|
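`method_missing` in the file above combines two ideas: a `defined?(::FFI::NotFoundError)` guard so the constant is touched only when FFI is loaded, and a fallback to a Citrus grammar when tree-sitter loading fails. The sketch below isolates that control flow. `load_with_fallback` and its `fallback:` keyword are hypothetical, and a plain value stands in for the Citrus language object.

```ruby
# Hypothetical loader: run the block, and on a recognised load failure
# return the fallback if one was given, otherwise re-raise.
def load_with_fallback(fallback: nil)
  yield
rescue LoadError, ArgumentError => e
  fallback || raise(e)
rescue => e
  # Reference FFI::NotFoundError only if the constant exists, so this
  # code is safe when the FFI gem has not been loaded.
  if defined?(::FFI::NotFoundError) && e.is_a?(::FFI::NotFoundError)
    fallback || raise(e)
  else
    raise
  end
end

load_with_fallback(fallback: :citrus) { :native }                       # => :native
load_with_fallback(fallback: :citrus) { raise LoadError, "missing .so" } # => :citrus
```

Errors outside the recognised set propagate unchanged even when a fallback exists, which keeps unrelated bugs from being silently masked.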
|
data/lib/tree_haver/library_path_utils.rb
ADDED
|
@@ -0,0 +1,80 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module TreeHaver
|
|
4
|
+
# Utility methods for deriving tree-sitter symbol and language names from library paths
|
|
5
|
+
#
|
|
6
|
+
# This module provides consistent path parsing across all backends that load
|
|
7
|
+
# tree-sitter grammar libraries from shared object files (.so/.dylib/.dll).
|
|
8
|
+
#
|
|
9
|
+
# @example
|
|
10
|
+
# TreeHaver::LibraryPathUtils.derive_symbol_from_path("/usr/lib/libtree-sitter-toml.so")
|
|
11
|
+
# # => "tree_sitter_toml"
|
|
12
|
+
#
|
|
13
|
+
# TreeHaver::LibraryPathUtils.derive_language_name_from_path("/usr/lib/libtree-sitter-toml.so")
|
|
14
|
+
# # => "toml"
|
|
15
|
+
module LibraryPathUtils
|
|
16
|
+
module_function
|
|
17
|
+
|
|
18
|
+
# Derive the tree-sitter symbol name from a library path
|
|
19
|
+
#
|
|
20
|
+
# Symbol names are the exported C function names (e.g., "tree_sitter_toml")
|
|
21
|
+
# that return a pointer to the TSLanguage struct.
|
|
22
|
+
#
|
|
23
|
+
# Handles various naming conventions:
|
|
24
|
+
# - libtree-sitter-toml.so → tree_sitter_toml
|
|
25
|
+
# - libtree_sitter_toml.so → tree_sitter_toml
|
|
26
|
+
# - tree-sitter-toml.so → tree_sitter_toml
|
|
27
|
+
# - tree_sitter_toml.so → tree_sitter_toml
|
|
28
|
+
# - toml.so → tree_sitter_toml (assumes simple language name)
|
|
29
|
+
#
|
|
30
|
+
# @param path [String, nil] path like "/usr/lib/libtree-sitter-toml.so"
|
|
31
|
+
# @return [String, nil] symbol like "tree_sitter_toml", or nil if path is nil
|
|
32
|
+
def derive_symbol_from_path(path)
|
|
33
|
+
return unless path
|
|
34
|
+
|
|
35
|
+
# Extract filename without extension: "libtree-sitter-toml" or "toml"
|
|
36
|
+
filename = File.basename(path, ".*")
|
|
37
|
+
|
|
38
|
+
# Handle multi-part extensions like .so.0.24
|
|
39
|
+
filename = filename.sub(/\.so(\.\d+)*\z/, "")
|
|
40
|
+
|
|
41
|
+
# Match patterns and normalize to tree_sitter_<lang>
|
|
42
|
+
case filename
|
|
43
|
+
when /\Alib[-_]?tree[-_]sitter[-_](.+)\z/
|
|
44
|
+
"tree_sitter_#{Regexp.last_match(1).tr("-", "_")}"
|
|
45
|
+
when /\Atree[-_]sitter[-_](.+)\z/
|
|
46
|
+
"tree_sitter_#{Regexp.last_match(1).tr("-", "_")}"
|
|
47
|
+
else
|
|
48
|
+
# Assume filename is just the language name (e.g., "toml.so" -> "tree_sitter_toml")
|
|
49
|
+
# Also strip "lib" prefix if present (e.g., "libtoml.so" -> "tree_sitter_toml")
|
|
50
|
+
lang = filename.sub(/\Alib/, "").tr("-", "_")
|
|
51
|
+
"tree_sitter_#{lang}"
|
|
52
|
+
end
|
|
53
|
+
end
|
|
54
|
+
|
|
55
|
+
# Derive the language name from a library path
|
|
56
|
+
#
|
|
57
|
+
# Language names are the short identifiers (e.g., "toml", "json", "ruby")
|
|
58
|
+
# used by some backends (like tree_stump/Rust) to register grammars.
|
|
59
|
+
#
|
|
60
|
+
# @param path [String, nil] path like "/usr/lib/libtree-sitter-toml.so"
|
|
61
|
+
# @return [String, nil] language name like "toml", or nil if path is nil
|
|
62
|
+
def derive_language_name_from_path(path)
|
|
63
|
+
symbol = derive_symbol_from_path(path)
|
|
64
|
+
return unless symbol
|
|
65
|
+
|
|
66
|
+
# Strip the "tree_sitter_" prefix to get the language name
|
|
67
|
+
symbol.sub(/\Atree_sitter_/, "")
|
|
68
|
+
end
|
|
69
|
+
|
|
70
|
+
# Derive language name from a symbol
|
|
71
|
+
#
|
|
72
|
+
# @param symbol [String, nil] symbol like "tree_sitter_toml"
|
|
73
|
+
# @return [String, nil] language name like "toml", or nil if symbol is nil
|
|
74
|
+
def derive_language_name_from_symbol(symbol)
|
|
75
|
+
return unless symbol
|
|
76
|
+
|
|
77
|
+
symbol.sub(/\Atree_sitter_/, "")
|
|
78
|
+
end
|
|
79
|
+
end
|
|
80
|
+
end
|
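To make the filename handling above easier to follow, the sketch below copies the derivation logic into a standalone function and runs it on a few paths, including a versioned `.so.0.24` name. The function name `derive_symbol` is hypothetical. It is a condensed copy for illustration, not a replacement for `TreeHaver::LibraryPathUtils.derive_symbol_from_path`.

```ruby
# Standalone, condensed copy of the derivation logic for experimentation.
def derive_symbol(path)
  return unless path

  # File.basename(path, ".*") strips only the last extension, so a versioned
  # name such as "x.so.0.24" still ends in ".so.0" and needs the second step.
  filename = File.basename(path, ".*").sub(/\.so(\.\d+)*\z/, "")

  case filename
  when /\Alib[-_]?tree[-_]sitter[-_](.+)\z/, /\Atree[-_]sitter[-_](.+)\z/
    "tree_sitter_#{Regexp.last_match(1).tr("-", "_")}"
  else
    "tree_sitter_#{filename.sub(/\Alib/, "").tr("-", "_")}"
  end
end

derive_symbol("/usr/lib/libtree-sitter-toml.so")      # => "tree_sitter_toml"
derive_symbol("/usr/lib/libtree-sitter-toml.so.0.24") # => "tree_sitter_toml"
derive_symbol("tree-sitter-c-sharp.so")               # => "tree_sitter_c_sharp"
derive_symbol("toml.so")                              # => "tree_sitter_toml"
derive_symbol(nil)                                    # => nil
```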