fast_regexp 0.5.0 → 0.6.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a8403654241b531547ec912289ab9420155c9e85a33caa4069583bea84fb7608
- data.tar.gz: 76f09be683665969fc78fe6bd69fb4b818b886bc634f241d584224ec5595a42e
+ metadata.gz: b7e4b114057d201c62b2d8ec4405b3d2554c1cebf1399433d275e111b93864dd
+ data.tar.gz: da33511a662b556bcd9d085d0d3a52e2a138120b040a79b37015970486630ee5
  SHA512:
- metadata.gz: fe1ab72d32fc050eabee5cd970970455ac61d468d3a3b9948d74e97956d87b0008c405e82b6504540327865cc4279696ba9eb2018ed070ca06fdaad85577547c
- data.tar.gz: ad1862de8f1fd879ef5cc6a8b2baf109a97b404f855d1954135f67553c958f1f1b1e140f7c8c5188aa8db87e5827b32b5869cb4daf71647c42ae7e8d2ce93732
+ metadata.gz: 6417817f643dd1f81979e02f75b5c211c2921e87fa448fdf33c2a7da4c44980ebe6e53f9a91feb18e494e2feafbbfb03d004d3c97e51a91e3de182d0b09132a1
+ data.tar.gz: 6ad069ffca51038fe5f74a6ad8496a672c25c8044e76cb084dd965a8cd380beaebba99908460431e41d223f96ac9168d0f02d6fca2e05f3d3fcf3c5661e8b1e2
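Note for readers who want to check these digests locally: a `.gem` file is a tar archive whose `checksums.yaml.gz` entry records SHA256 and SHA512 digests for `metadata.gz` and `data.tar.gz`. The sketch below uses a hypothetical helper, `verify_checksums` (not part of RubyGems or fast_regexp), that recomputes the digests with Ruby's standard `digest` library and compares them. It runs on stand-in bytes rather than a real gem, so the inputs are assumptions for illustration only.

```ruby
require "digest/sha2"
require "yaml"

# Hypothetical helper (not part of RubyGems or fast_regexp): compares each
# entry of a checksums.yaml document against the digest of the given bytes.
# `files` maps entry names ("metadata.gz", "data.tar.gz") to their contents.
def verify_checksums(checksums_yaml, files)
  YAML.safe_load(checksums_yaml).flat_map do |algo, entries|
    digest_class = Digest.const_get(algo) # "SHA256" -> Digest::SHA256
    entries.map do |name, expected_hex|
      actual_hex = digest_class.hexdigest(files.fetch(name))
      [algo, name, actual_hex == expected_hex.to_s]
    end
  end
end

# Stand-in bytes so the sketch runs without a real .gem file.
files = { "metadata.gz" => "fake metadata", "data.tar.gz" => "fake data" }
checksums_yaml = YAML.dump(
  "SHA256" => files.transform_values { |bytes| Digest::SHA256.hexdigest(bytes) },
  "SHA512" => files.transform_values { |bytes| Digest::SHA512.hexdigest(bytes) }
)

results = verify_checksums(checksums_yaml, files)
puts results.all? { |_algo, _name, ok| ok }
```

With a real gem, the two entries would first be extracted from the archive and passed in as binary strings.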
data/Cargo.lock CHANGED
@@ -72,7 +72,6 @@ name = "fast_regexp"
  version = "0.1.0"
  dependencies = [
  "magnus",
- "rb-sys",
  "regex",
  ]
 
data/README.md CHANGED
@@ -5,12 +5,10 @@
 
  Fast, drop-in regex for Ruby — backed by [rust/regex](https://docs.rs/regex/latest/regex/) with transparent fallback to the stdlib `::Regexp` engine for features rust/regex doesn't support (lookaround, backreferences, possessive quantifiers, etc.).
 
- You get rust/regex's speed and GVL-releasing matching on the common path, and a single uniform API (`Fast::Regexp`, `Fast::Regexp::MatchData`) regardless of which engine actually ran underneath.
+ You get rust/regex's speed on the common path, and a single uniform API (`Fast::Regexp`, `Fast::Regexp::MatchData`) regardless of which engine actually ran underneath.
 
  ## Installation
 
- Install [Rust](https://www.rust-lang.org/) via [rustup](https://rustup.rs/) or in any other way.
-
  Add as a dependency:
 
  ```ruby
@@ -21,6 +19,14 @@ gem "fast_regexp"
  gem install fast_regexp
  ```
 
+ Precompiled native gems are published for **arm64-darwin**, **x86_64-linux**,
+ and **aarch64-linux** against Ruby **3.3**, **3.4**, and **4.0** — no Rust
+ toolchain required on those platforms.
+
+ On any other platform/Ruby combo, Bundler/RubyGems falls back to the source
+ gem and compiles the extension at install time. That path needs
+ [Rust](https://www.rust-lang.org/) (install via [rustup](https://rustup.rs/)).
+
  Include in your code:
 
  ```ruby
@@ -155,6 +161,14 @@ slow.match?("foobar") # => true
  / `#stdlib` accessors. Replacement templates use rust/regex syntax (`$1`,
  `${name}`, `$$`) on both paths; the stdlib fallback translates them for you.
 
+ You can force a specific engine via the `backend:` kwarg:
+
+ ```ruby
+ Fast::Regexp.new('\w+', backend: :fast) # rust/regex only; raises on unsupported
+ Fast::Regexp.new(pat, backend: :stdlib) # skip rust/regex; use ::Regexp directly
+ Fast::Regexp.new('\w+', backend: :auto) # default — try rust, fall back on reject
+ ```
+
  > [!NOTE]
  > The fast path is byte-based (rust/regex's `regex::bytes`), so `#=~` returns
  > a *byte* offset. The stdlib fallback path returns the byte offset too, for
@@ -230,7 +244,7 @@ In-depth docs live under [`docs/`](docs/README.md), organized via the
  - **Tutorial:** [Getting started](docs/tutorials/getting-started.md)
  - **How-to:** [Migrate from stdlib `::Regexp`](docs/how-to/migrate-from-stdlib-regexp.md), [Handle unsupported syntax](docs/how-to/handle-unsupported-syntax.md)
  - **Reference:** [`Fast::Regexp`](docs/reference/fast-regexp.md), [`MatchData`](docs/reference/fast-regexp-matchdata.md), [`Set`](docs/reference/fast-regexp-set.md)
- - **Explainers:** [Engine fallback](docs/explainers/engine-fallback.md), [Concurrency and GVL](docs/explainers/concurrency-and-gvl.md)
+ - **Explainers:** [Engine fallback](docs/explainers/engine-fallback.md)
 
  ## Development
 
@@ -252,8 +266,7 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/jetpks
  huge thanks for the original bindings and the clean magnus integration that
  made this work easy to extend. This fork rebrands the gem, reshapes the
  public API (`Fast::Regexp`, real `MatchData`, `sub`/`gsub`, `===`/`=~`,
- `Regexp`-constructor coercion), releases the GVL around regex execution for
- thread/fiber-friendly matching, and adds transparent fallback to stdlib
+ `Regexp`-constructor coercion), and adds transparent fallback to stdlib
  `::Regexp` for patterns rust/regex can't compile.
 
  ## License
@@ -9,4 +9,3 @@ crate-type = ["cdylib"]
  [dependencies]
  magnus = "0.8"
  regex = "1"
- rb-sys = "0.9"
@@ -6,67 +6,9 @@ use magnus::{
  };
  use regex::bytes::{NoExpand, Regex, RegexBuilder, RegexSet, RegexSetBuilder};
  use std::collections::HashMap;
- use std::ffi::c_void;
- use std::ptr;
  use std::sync::Arc;
 
- /// Skip GVL release for haystacks smaller than this — the release/reacquire
- /// overhead (~µs) dwarfs the cost of a regex match on a tiny string.
- const GVL_RELEASE_THRESHOLD: usize = 1024;
-
- /// Run `func` with the GVL released so other Ruby threads (and the fiber
- /// scheduler on its host thread) can make progress during a long regex match.
- ///
- /// The callback runs on the same OS thread — `Send` is not required, and the
- /// borrow checker enforces that any references stay valid for the call.
- fn without_gvl<F, R>(func: F) -> R
- where
- F: FnOnce() -> R,
- {
- struct Pack<F, R> {
- func: Option<F>,
- result: Option<R>,
- }
-
- unsafe extern "C" fn trampoline<F, R>(data: *mut c_void) -> *mut c_void
- where
- F: FnOnce() -> R,
- {
- let pack = &mut *(data as *mut Pack<F, R>);
- let func = pack.func.take().expect("trampoline called twice");
- pack.result = Some(func());
- ptr::null_mut()
- }
-
- let mut pack: Pack<F, R> = Pack {
- func: Some(func),
- result: None,
- };
- unsafe {
- rb_sys::rb_thread_call_without_gvl(
- Some(trampoline::<F, R>),
- &mut pack as *mut _ as *mut c_void,
- None,
- ptr::null_mut(),
- );
- }
- pack.result.take().expect("callback did not run")
- }
-
- fn run_regex<F, R>(haystack_len: usize, func: F) -> R
- where
- F: FnOnce() -> R,
- {
- if haystack_len >= GVL_RELEASE_THRESHOLD {
- without_gvl(func)
- } else {
- func()
- }
- }
-
  fn haystack_bytes(haystack: &RString) -> Vec<u8> {
- // Copy out of the Ruby heap so the bytes are safe to read after the GVL is
- // released (Ruby 4's compacting GC can otherwise move the string).
  unsafe { haystack.as_slice() }.to_vec()
  }
 
@@ -157,12 +99,10 @@ impl FastRegexp {
  let regex = &self.0.regex;
  let bytes = haystack_bytes(&haystack);
 
- let offsets = run_regex(bytes.len(), || {
- regex.captures(&bytes).map(|caps| {
- (0..regex.captures_len())
- .map(|i| caps.get(i).map(|m| (m.start(), m.end())))
- .collect::<Vec<_>>()
- })
+ let offsets = regex.captures(&bytes).map(|caps| {
+ (0..regex.captures_len())
+ .map(|i| caps.get(i).map(|m| (m.start(), m.end())))
+ .collect::<Vec<_>>()
  })?;
 
  Some(self.build_match_data(Arc::new(bytes), offsets))
@@ -173,29 +113,25 @@ impl FastRegexp {
  let bytes = haystack_bytes(&haystack);
 
  if regex.captures_len() == 1 {
- let ranges: Vec<(usize, usize)> = run_regex(bytes.len(), || {
- regex
- .find_iter(&bytes)
- .map(|m| (m.start(), m.end()))
- .collect()
- });
+ let ranges: Vec<(usize, usize)> = regex
+ .find_iter(&bytes)
+ .map(|m| (m.start(), m.end()))
+ .collect();
  let result = ruby.ary_new_capa(ranges.len());
  for (s, e) in ranges {
  result.push(utf8_string(ruby, &bytes[s..e]))?;
  }
  Ok(result)
  } else {
- let groups: Vec<Vec<CaptureOffset>> = run_regex(bytes.len(), || {
- regex
- .captures_iter(&bytes)
- .map(|caps| {
- caps.iter()
- .skip(1)
- .map(|c| c.map(|m| (m.start(), m.end())))
- .collect()
- })
- .collect()
- });
+ let groups: Vec<Vec<CaptureOffset>> = regex
+ .captures_iter(&bytes)
+ .map(|caps| {
+ caps.iter()
+ .skip(1)
+ .map(|c| c.map(|m| (m.start(), m.end())))
+ .collect()
+ })
+ .collect();
  let result = ruby.ary_new_capa(groups.len());
  for group_ranges in groups {
  let group = ruby.ary_new_capa(group_ranges.len());
@@ -216,16 +152,14 @@ impl FastRegexp {
  let bytes = haystack_bytes(&haystack);
  let n_groups = regex.captures_len();
 
- let all: Vec<Vec<CaptureOffset>> = run_regex(bytes.len(), || {
- regex
- .captures_iter(&bytes)
- .map(|caps| {
- (0..n_groups)
- .map(|i| caps.get(i).map(|m| (m.start(), m.end())))
- .collect()
- })
- .collect()
- });
+ let all: Vec<Vec<CaptureOffset>> = regex
+ .captures_iter(&bytes)
+ .map(|caps| {
+ (0..n_groups)
+ .map(|i| caps.get(i).map(|m| (m.start(), m.end())))
+ .collect()
+ })
+ .collect();
 
  let shared = Arc::new(bytes);
  let result = ruby.ary_new_capa(all.len());
@@ -238,7 +172,7 @@ impl FastRegexp {
  pub fn is_match(&self, haystack: RString) -> bool {
  let regex = &self.0.regex;
  let bytes = haystack_bytes(&haystack);
- run_regex(bytes.len(), || regex.is_match(&bytes))
+ regex.is_match(&bytes)
  }
 
  pub fn sub_str(
@@ -251,13 +185,11 @@ impl FastRegexp {
  let regex = &rb_self.0.regex;
  let bytes = haystack_bytes(&haystack);
  let repl = haystack_bytes(&replacement);
- let out = run_regex(bytes.len(), || -> Vec<u8> {
- if literal {
- regex.replace(&bytes, NoExpand(&repl)).into_owned()
- } else {
- regex.replace(&bytes, &repl[..]).into_owned()
- }
- });
+ let out: Vec<u8> = if literal {
+ regex.replace(&bytes, NoExpand(&repl)).into_owned()
+ } else {
+ regex.replace(&bytes, &repl[..]).into_owned()
+ };
  utf8_string(ruby, &out)
  }
 
@@ -271,13 +203,11 @@ impl FastRegexp {
  let regex = &rb_self.0.regex;
  let bytes = haystack_bytes(&haystack);
  let repl = haystack_bytes(&replacement);
- let out = run_regex(bytes.len(), || -> Vec<u8> {
- if literal {
- regex.replace_all(&bytes, NoExpand(&repl)).into_owned()
- } else {
- regex.replace_all(&bytes, &repl[..]).into_owned()
- }
- });
+ let out: Vec<u8> = if literal {
+ regex.replace_all(&bytes, NoExpand(&repl)).into_owned()
+ } else {
+ regex.replace_all(&bytes, &repl[..]).into_owned()
+ };
  utf8_string(ruby, &out)
  }
 
@@ -482,13 +412,13 @@ impl FastRegexpSet {
  pub fn matches(&self, haystack: RString) -> Vec<usize> {
  let set = &self.0;
  let bytes = haystack_bytes(&haystack);
- run_regex(bytes.len(), || set.matches(&bytes).iter().collect())
+ set.matches(&bytes).iter().collect()
  }
 
  pub fn is_match(&self, haystack: RString) -> bool {
  let set = &self.0;
  let bytes = haystack_bytes(&haystack);
- run_regex(bytes.len(), || set.is_match(&bytes))
+ set.is_match(&bytes)
  }
 
  pub fn patterns(&self) -> Vec<String> {
@@ -2,6 +2,6 @@
 
  module Fast
  class Regexp
- VERSION = "0.5.0"
+ VERSION = "0.6.1"
  end
  end
data/lib/fast_regexp.rb CHANGED
@@ -6,16 +6,27 @@ module Fast
  # Façade over rust/regex with a transparent fallback to stdlib `::Regexp`.
  #
  # `Fast::Regexp.new(pattern)` first tries to compile with rust/regex (fast,
- # GVL-releasing, byte-based). If the pattern uses features rust/regex does
- # not support (lookaround, backreferences, possessive quantifiers, etc.) we
- # fall back to `::Regexp` so consumers don't have to juggle two libraries.
+ # byte-based). If the pattern uses features rust/regex does not support
+ # (lookaround, backreferences, possessive quantifiers, etc.) we fall back
+ # to `::Regexp` so consumers don't have to juggle two libraries.
  class Regexp
+ NATIVE_EXTENSIONS = %w[.bundle .so .rb].freeze
+
+ # Precompiled native gems ship per-ABI subdirs (`fast_regexp/4.0/...`),
+ # the source-gem `rake compile` build lands flat (`fast_regexp/...`).
+ # Pick whichever exists for the current Ruby ABI, with the per-ABI path
+ # winning when both are present.
+ def self.locate_native(base, ruby_version: RUBY_VERSION)
+ abi = ruby_version[/\d+\.\d+/]
+ candidates = [File.join(base, abi, "fast_regexp"), File.join(base, "fast_regexp")]
+ candidates.find { |stem| NATIVE_EXTENSIONS.any? { |ext| File.exist?(stem + ext) } }
+ end
  end
  end
 
- # Load the native extension AFTER the Fast::Regexp class shell exists — the
- # Rust init() looks up Fast::Regexp and registers Native under it.
- require_relative "fast_regexp/fast_regexp"
+ native = Fast::Regexp.locate_native(File.expand_path("fast_regexp", __dir__))
+ raise LoadError, "could not locate fast_regexp native extension" unless native
+ require native
 
  module Fast
  class Regexp
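Note on the lookup added in this hunk: the per-ABI candidate is listed first, so it wins whenever both layouts exist, and the returned value is an extension-less stem that `require` can resolve. The sketch below lifts the same logic into a free function and exercises it against a temporary directory. The directory layout and the file names touched are stand-ins for illustration, not the gem's real install tree.

```ruby
require "tmpdir"
require "fileutils"

NATIVE_EXTENSIONS = %w[.bundle .so .rb].freeze

# Same lookup as the diff's Fast::Regexp.locate_native, written as a free
# function so it runs without the gem installed.
def locate_native(base, ruby_version: RUBY_VERSION)
  abi = ruby_version[/\d+\.\d+/] # "3.4.1" -> "3.4"
  candidates = [File.join(base, abi, "fast_regexp"), File.join(base, "fast_regexp")]
  candidates.find { |stem| NATIVE_EXTENSIONS.any? { |ext| File.exist?(stem + ext) } }
end

Dir.mktmpdir do |base|
  # Flat layout only, as a source-gem build would produce.
  FileUtils.touch(File.join(base, "fast_regexp.so"))
  puts locate_native(base, ruby_version: "3.4.1")

  # Add a per-ABI build next to it; the per-ABI stem is now returned.
  FileUtils.mkdir_p(File.join(base, "3.4"))
  FileUtils.touch(File.join(base, "3.4", "fast_regexp.bundle"))
  puts locate_native(base, ruby_version: "3.4.1")
end
```

When neither layout is present the function returns `nil`, which is the case the new `raise LoadError` guard covers.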
@@ -34,6 +45,15 @@ module Fast
  allocate.tap { |re| re.send(:initialize, translated, original: pattern, **opts) }
  end
 
+ # Bulk-compile a symbol-keyed hash of patterns. Handy for defining a set
+ # of regex constants in one shot:
+ #
+ # RE = Fast::Regexp.create_many(word: '\w+', num: '\d+').freeze
+ # RE[:word].match("hello")
+ def create_many(**patterns)
+ patterns.transform_values { |pat| new(pat) }
+ end
+
  private
 
  def translate_regexp(regexp)
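Note on `create_many`: it is a thin wrapper over `Hash#transform_values`, so keys are preserved and each value is compiled once. The sketch below shows the same shape with stdlib `::Regexp` standing in for `Fast::Regexp.new`, which is an assumption made so the example runs without the gem.

```ruby
# Stand-in for Fast::Regexp.create_many that compiles with stdlib ::Regexp,
# so the sketch runs without the gem. The method in the diff calls `new(pat)`.
def create_many(**patterns)
  patterns.transform_values { |pat| Regexp.new(pat) }
end

RE = create_many(word: '\w+', num: '\d+').freeze

puts RE[:word].match("hello world")[0] # prints "hello"
puts RE[:num].match("order 42")[0]     # prints "42"
```

The trailing `.freeze` freezes only the hash, which prevents adding or replacing entries after definition.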
@@ -46,11 +66,17 @@ module Fast
 
  attr_reader :pattern, :backend
 
+ BACKENDS = %i[auto fast stdlib].freeze
+
  # Internal — use `Fast::Regexp.new`. `original` is the unmodified input
  # (String or ::Regexp) so we can build an accurate stdlib fallback.
- def initialize(pattern, original: pattern, **opts)
+ # `backend:` forces a specific engine: `:auto` (default) tries rust/regex
+ # and falls back to stdlib, `:fast` raises if rust/regex rejects the
+ # pattern, `:stdlib` skips rust/regex entirely.
+ def initialize(pattern, original: pattern, backend: :auto, **opts)
+ raise ArgumentError, "backend must be one of #{BACKENDS.inspect}" unless BACKENDS.include?(backend)
  @pattern = pattern
- @backend = compile_backend(pattern, original, opts)
+ @backend = compile_backend(pattern, original, backend, opts)
  end
 
  def fast? = @backend.is_a?(Native)
@@ -178,17 +204,22 @@ module Fast
  count
  end
 
- def compile_backend(pattern, original, opts)
+ def compile_backend(pattern, original, backend, opts)
+ return compile_stdlib(pattern, original) if backend == :stdlib
  Native._native_new(pattern, **opts)
  rescue ArgumentError => e
+ raise if backend == :fast
+ compile_stdlib(pattern, original, fallback_from: e)
+ end
+
+ def compile_stdlib(pattern, original, fallback_from: nil)
  return original if original.is_a?(::Regexp)
- begin
- ::Regexp.new(pattern)
- rescue ::RegexpError
- # Pattern is malformed in both engines surface the original
- # rust/regex error so the user sees the more detailed message.
- raise ArgumentError, e.message
- end
+ ::Regexp.new(pattern)
+ rescue ::RegexpError => e
+ # Pattern is malformed in both engines (auto path) — surface the
+ # original rust/regex error since it's typically more detailed.
+ # Otherwise propagate the stdlib error as-is.
+ raise(fallback_from ? ArgumentError.new(fallback_from.message) : e)
  end
 
  # Maps positional capture indices to names for the stdlib backend so we
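Note on the dispatch in this hunk: validation of `backend:` happens in `initialize`, outside the `rescue ArgumentError` in `compile_backend`, so an invalid backend is never mistaken for an engine rejection. The sketch below mirrors that control flow without the native extension. `FakeNative`, `fake_native_new`, and the `UNSUPPORTED` pattern are hypothetical stand-ins for the Rust engine, and `compile` is a made-up entry point. Only the branching follows the diff.

```ruby
BACKENDS = %i[auto fast stdlib].freeze

# Stand-in for the Rust engine (an assumption for this sketch): it rejects
# lookaround and numeric backreferences with ArgumentError, and otherwise
# returns a tagged placeholder instead of a compiled regex.
FakeNative = Struct.new(:pattern)
UNSUPPORTED = /\(\?<?[=!]|\\[1-9]/

def fake_native_new(pattern)
  raise ArgumentError, "fake engine: unsupported syntax in #{pattern.inspect}" if pattern.match?(UNSUPPORTED)

  FakeNative.new(pattern)
end

def compile_stdlib(pattern, fallback_from: nil)
  ::Regexp.new(pattern)
rescue ::RegexpError => e
  # On the auto path, re-raise the first engine's message; otherwise let
  # the stdlib error through unchanged.
  raise(fallback_from ? ArgumentError.new(fallback_from.message) : e)
end

def compile_backend(pattern, backend)
  return compile_stdlib(pattern) if backend == :stdlib

  fake_native_new(pattern)
rescue ArgumentError => e
  raise if backend == :fast

  compile_stdlib(pattern, fallback_from: e)
end

# Hypothetical entry point; validates before dispatching, as initialize does.
def compile(pattern, backend: :auto)
  raise ArgumentError, "backend must be one of #{BACKENDS.inspect}" unless BACKENDS.include?(backend)

  compile_backend(pattern, backend)
end

p compile('\w+', backend: :fast).class # FakeNative
p compile('foo(?=bar)').class          # Regexp
```

With `backend: :fast` the lookahead pattern raises instead of falling back, which is the behavior the README hunk describes as "raises on unsupported".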
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: fast_regexp
  version: !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.6.1
  platform: ruby
  authors:
  - Eric Jacobs