fast_regexp 0.4.0-aarch64-linux → 0.6.0-aarch64-linux

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 9e7f80619e2ffc99b8abe00ecf291d8df9f79806eef7c6a3de30e08233619c6c
-   data.tar.gz: f834be706bed41f45c93156ca97ba78876b021027ce00f5187e26a23d1641874
+   metadata.gz: '039084596a2999534c4e78c9c3fc008ddcf0712a573c93234f4191d9b8e6fd3d'
+   data.tar.gz: 8f2629a2a40c5a82c26f73ce9826954bf8de24ab879757c7e672808a3b1edf26
  SHA512:
-   metadata.gz: daea2f2211cca44de646668960d31175dc7ce26eead28da2aae6ad92ad4d1761b82227ead275f76bb5c41c6a2b87f31fd38310a633175498312b416f48b48af5
-   data.tar.gz: 27fcf044024abbb1b835afb0668dac487f5d9896e45a6840d2e2e46c90f3944a2caf39e2d08d4ecf6fb6484da75c1d09f80b170c69ddc1bdde099c8ed61aaa21
+   metadata.gz: 415c361fe5761dfcb68d734c970e120189cd2f7770dc8997af51e4572c42ba17f72fa1d04465d3b67d9f191ffc58054b8bff98498a42ff41ad25145ecad0eed1
+   data.tar.gz: 2a011e1aa4c1b1c45107fbe93ed9dffe1e5d94e61bfb429e57daa58feeeabf8dca954d85c041dee7cdb5844e5cd7df1b39be1f696fa419e2c65477036cb2a950
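`checksums.yaml` records SHA256 and SHA512 digests for the two archives inside a `.gem` (`metadata.gz` and `data.tar.gz`). A minimal sketch of checking them with Ruby's stdlib `digest` library follows. It assumes the `.gem` (a plain tar archive) has already been extracted into the working directory; `verify_archive` is a hypothetical helper, not a RubyGems API.

```ruby
require "digest/sha2"

# Hypothetical helper (not part of RubyGems): true only if the file's SHA256
# and SHA512 hex digests both equal the values recorded in checksums.yaml.
def verify_archive(path, sha256:, sha512:)
  Digest::SHA256.file(path).hexdigest == sha256 &&
    Digest::SHA512.file(path).hexdigest == sha512
end

# Usage with the 0.6.0 data.tar.gz digests from the hunk above:
#
#   verify_archive(
#     "data.tar.gz",
#     sha256: "8f2629a2a40c5a82c26f73ce9826954bf8de24ab879757c7e672808a3b1edf26",
#     sha512: "2a011e1aa4c1b1c45107fbe93ed9dffe1e5d94e61bfb429e57daa58feeeabf8dca954d85c041dee7cdb5844e5cd7df1b39be1f696fa419e2c65477036cb2a950"
#   )
```

The single quotes around the new `metadata.gz` SHA256 value are YAML quoting and are not part of the digest.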
data/README.md CHANGED
@@ -3,7 +3,9 @@
  [![Gem Version](https://badge.fury.io/rb/fast_regexp.svg)](https://badge.fury.io/rb/fast_regexp)
  [![Test](https://github.com/jetpks/fast_regexp/workflows/CI/badge.svg)](https://github.com/jetpks/fast_regexp/actions)

- Ruby bindings for [rust/regex](https://docs.rs/regex/latest/regex/) library.
+ Fast, drop-in regex for Ruby, backed by [rust/regex](https://docs.rs/regex/latest/regex/) with transparent fallback to the stdlib `::Regexp` engine for features rust/regex doesn't support (lookaround, backreferences, possessive quantifiers, etc.).
+
+ You get rust/regex's speed on the common path, and a single uniform API (`Fast::Regexp`, `Fast::Regexp::MatchData`) regardless of which engine actually ran underneath.

  ## Installation

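The intro describes a compile step that tries rust/regex first and falls back to stdlib `::Regexp` when rust/regex rejects the pattern. Below is a dependency-free sketch of that control flow only. `StrictEngine` is a hypothetical stand-in for rust/regex that rejects lookaround by raising `ArgumentError`, which is the error class `compile_backend` rescues in the library hunk further down. `compile_with_fallback` is likewise a hypothetical name.

```ruby
# Hypothetical stand-in for rust/regex: rejects lookaround (by raising) and
# otherwise compiles with stdlib ::Regexp.
module StrictEngine
  LOOKAROUND = /\(\?<?[=!]/ # spots (?=  (?!  (?<=  (?<!

  def self.compile(source)
    raise ArgumentError, "unsupported syntax: #{source}" if source.match?(LOOKAROUND)

    Regexp.new(source)
  end
end

# Try the strict engine first; if it rejects the pattern, fall back to
# stdlib ::Regexp. Returns [engine_name, compiled_regexp].
def compile_with_fallback(source)
  [:fast, StrictEngine.compile(source)]
rescue ArgumentError
  [:stdlib, Regexp.new(source)]
end

engine, re = compile_with_fallback('foo(?=bar)')
puts engine              # prints stdlib
puts re.match?("foobar") # prints true
```

Only the compile step branches. After it, the caller holds a single object whichever engine succeeded, and that property is what the uniform `Fast::Regexp` API builds on.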
@@ -128,11 +130,50 @@ Fast::Regexp.new('(?P<n>\w+)').names # => ["n"]
  Fast::Regexp.new('(a)(b)').captures_count # => 2
  ```

+ ### Engine fallback
+
+ rust/regex doesn't support lookaround, backreferences, or possessive
+ quantifiers. Rather than make you manage two regex libraries, `Fast::Regexp`
+ silently falls back to stdlib `::Regexp` when it sees something rust/regex
+ can't compile. The public API (`#match`, `#sub`, `#gsub`, `#===`, `#=~`,
+ `MatchData`) is identical on both paths, so callers don't have to care which
+ engine ran, but you can inspect or reach the underlying object when you
+ need to:
+
+ ```ruby
+ fast = Fast::Regexp.new('\w+')
+ fast.fast? # => true
+ fast.native # => #<Fast::Regexp::Native ...> (rust-backed)
+
+ slow = Fast::Regexp.new('foo(?=bar)') # lookahead: rust/regex rejects it
+ slow.stdlib? # => true
+ slow.stdlib # => /foo(?=bar)/ (the real ::Regexp)
+ slow.match?("foobar") # => true
+ ```
+
+ `Fast::Regexp::MatchData` has equivalent accessors: `#native?` (its name for
+ `#fast?`), `#stdlib?`, `#native`, and `#stdlib`. Replacement templates use
+ rust/regex syntax (`$1`, `${name}`, `$$`) on both paths; the stdlib fallback translates them.
+
+ You can force a specific engine via the `backend:` kwarg:
+
+ ```ruby
+ Fast::Regexp.new('\w+', backend: :fast) # rust/regex only; raises ArgumentError if rust/regex rejects it
+ Fast::Regexp.new('foo(?=bar)', backend: :stdlib) # skip rust/regex; use ::Regexp directly
+ Fast::Regexp.new('\w+', backend: :auto) # default: try rust/regex, fall back on rejection
+ ```
+
+ > [!NOTE]
+ > The fast path is byte-based (rust/regex's `regex::bytes`), so `#=~` returns
+ > a *byte* offset. The stdlib fallback path also returns a byte offset (not
+ > `::Regexp#=~`'s character offset), for API consistency.
+
  > [!WARNING]
- > `rust/regex` regular expression syntax differs from Ruby's built-in
- > [`Regexp`](https://docs.ruby-lang.org/en/3.4/Regexp.html) library, see the
- > [official syntax page](https://docs.rs/regex/latest/regex/index.html#syntax) for more
- > details.
+ > `rust/regex` syntax differs from Ruby's built-in
+ > [`Regexp`](https://docs.ruby-lang.org/en/3.4/Regexp.html); see the
+ > [rust/regex syntax page](https://docs.rs/regex/latest/regex/index.html#syntax).
+ > When fallback kicks in, your pattern is interpreted by stdlib `::Regexp`
+ > instead, so Ruby's syntax applies for that compile.

  ### Searching simultaneously

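The NOTE says `#=~` reports byte offsets on both paths, whereas stdlib `String#=~` reports character offsets. The two numbers only diverge on non-ASCII input. This stdlib-only snippet shows both for the same match. It does not call `Fast::Regexp`, and it needs Ruby 3.2+ for `MatchData#byteoffset`.

```ruby
haystack = "héllo" # "é" is two bytes in UTF-8

char_offset = haystack =~ /l/   # stdlib: character index of the match
byte_range  = $~.byteoffset(0)  # [start, end] of the match in bytes

puts char_offset # prints 2
p byte_range     # prints [3, 4]

# Byte offsets are what byte-level slicing expects:
puts haystack.byteslice(byte_range[0], byte_range[1] - byte_range[0]) # prints l
```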
@@ -176,11 +217,11 @@ It also supports parsing of strings with invalid UTF-8 characters by default. It
  To disable Unicode awareness in matchers, both `Fast::Regexp` and `Fast::Regexp::Set` support the `unicode: false` option:

  ```ruby
- Fast::Regexp.new('\w+').match('ю٤夏')
- # => ["ю٤夏"]
+ Fast::Regexp.new('\w+').match('ю٤夏')[0]
+ # => "ю٤夏"

  Fast::Regexp.new('\w+', unicode: false).match('ю٤夏')
- # => []
+ # => nil

  Fast::Regexp::Set.new(['\w', '\d', '\s']).match("ю٤\u2000")
  # => [0, 1, 2]
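The `unicode:` option controls rust/regex's class semantics. When a pattern lands on the stdlib fallback instead, Ruby's own class rules apply, and the `compile_stdlib` code in this diff does not pass `opts` through to `::Regexp.new`. Ruby's Regexp documentation lists `\w` as the ASCII class `[a-zA-Z0-9_]`, while the POSIX bracket `[[:word:]]` is Unicode-aware. The stdlib-only comparison below uses the README's haystack and does not exercise `Fast::Regexp`.

```ruby
haystack = "ю٤夏" # Cyrillic letter, Arabic-Indic digit four, CJK ideograph

# [[:word:]] covers the Unicode Letter, Mark, Number and Connector_Punctuation
# categories, so all three characters match.
unicode_word = haystack[/[[:word:]]+/]

# An explicit ASCII word class finds nothing in this string.
ascii_word = haystack[/[a-zA-Z0-9_]+/]

puts unicode_word == haystack # prints true
puts ascii_word.nil?          # prints true
```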
@@ -189,6 +230,16 @@ Fast::Regexp::Set.new(['\w', '\d', '\s'], unicode: false).match("ю٤\u2000")
  # => []
  ```

+ ## Documentation
+
+ In-depth docs live under [`docs/`](docs/README.md), organized via the
+ [Diátaxis](https://diataxis.fr/) framework:
+
+ - **Tutorial:** [Getting started](docs/tutorials/getting-started.md)
+ - **How-to:** [Migrate from stdlib `::Regexp`](docs/how-to/migrate-from-stdlib-regexp.md), [Handle unsupported syntax](docs/how-to/handle-unsupported-syntax.md)
+ - **Reference:** [`Fast::Regexp`](docs/reference/fast-regexp.md), [`MatchData`](docs/reference/fast-regexp-matchdata.md), [`Set`](docs/reference/fast-regexp-set.md)
+ - **Explainers:** [Engine fallback](docs/explainers/engine-fallback.md)
+
  ## Development

  ```sh
@@ -209,8 +260,8 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/jetpks
  huge thanks for the original bindings and the clean magnus integration that
  made this work easy to extend. This fork rebrands the gem, reshapes the
  public API (`Fast::Regexp`, real `MatchData`, `sub`/`gsub`, `===`/`=~`,
- `Regexp`-constructor coercion), and releases the GVL around regex execution
- for thread/fiber-friendly matching.
+ `Regexp`-constructor coercion), and adds transparent fallback to stdlib
+ `::Regexp` for patterns rust/regex can't compile.

  ## License

Binary file
Binary file
Binary file
@@ -2,6 +2,6 @@

  module Fast
    class Regexp
-     VERSION = "0.4.0"
+     VERSION = "0.6.0"
    end
  end
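`VERSION` moves from 0.4.0 to 0.6.0 here, and the fallback API (`fast?`, `stdlib?`, `backend:`) appears only on the 0.6.0 side of this diff. Code that must run against either release could gate on the version. `fallback_api?` is a hypothetical helper. The 0.6.0 threshold is used only because it is the earliest version this diff confirms; 0.5.0 is not covered here.

```ruby
# Gem::Version ships with RubyGems, which modern Ruby loads by default.
require "rubygems"

# Hypothetical helper: true when a version string is at least 0.6.0, the first
# release this diff shows with fast?/stdlib?/backend:.
def fallback_api?(version_string)
  Gem::Version.new(version_string) >= Gem::Version.new("0.6.0")
end

puts fallback_api?("0.4.0")  # prints false
puts fallback_api?("0.6.0")  # prints true
puts fallback_api?("0.10.0") # prints true
```

With the gem loaded, the argument would be `Fast::Regexp::VERSION`. Comparing `Gem::Version` objects rather than raw strings matters once a component reaches two digits, because `"0.10.0" >= "0.6.0"` is false as a string comparison. Feature detection such as `Fast::Regexp.method_defined?(:fast?)` is an alternative that avoids hard-coding a threshold.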
data/lib/fast_regexp.rb CHANGED
@@ -1,12 +1,22 @@
  # frozen_string_literal: true

  require_relative "fast_regexp/version"
+
+ module Fast
+   # Façade over rust/regex with a transparent fallback to stdlib `::Regexp`.
+   #
+   # `Fast::Regexp.new(pattern)` first tries to compile with rust/regex (fast,
+   # byte-based). If the pattern uses features rust/regex does not support
+   # (lookaround, backreferences, possessive quantifiers, etc.) we fall back
+   # to `::Regexp` so consumers don't have to juggle two libraries.
+   class Regexp
+   end
+ end
+
+ # Load the native extension AFTER the Fast::Regexp class shell exists: the
+ # Rust init() looks up Fast::Regexp and registers Native under it.
  require_relative "fast_regexp/fast_regexp"

- # Ruby-side conveniences on top of the Rust extension. The native class only
- # exposes raw primitives (`_native_new`, `_native_match`, `_native_sub`,
- # `_native_gsub`); everything user-facing lives here so the API can grow
- # without touching FFI.
  module Fast
    class Regexp
      RUBY_FLAG_MAP = {
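The new preamble defines an empty `Fast::Regexp` before requiring the compiled extension because, per its comment, the extension's `init()` looks up `Fast::Regexp` and registers `Native` under it. The pure-Ruby sketch below mimics only that ordering. It uses a hypothetical `Demo::Regexp` and a stand-in `fake_extension_init`. A real extension performs the lookup from Rust through its binding layer, which is not shown.

```ruby
# Step 1: create the class shell so the constant exists.
module Demo
  class Regexp
  end
end

# Step 2: stand-in for the native init(). It looks up the existing class and
# nests a new class under it. If the shell were missing, const_get would
# raise NameError, which is the failure this load order avoids.
def fake_extension_init
  parent = Object.const_get("Demo::Regexp")
  parent.const_set(:Native, Class.new)
end

fake_extension_init

# Step 3: reopen the class to layer Ruby-side behavior on top of Native.
module Demo
  class Regexp
    def native_class = Native
  end
end

p Demo::Regexp.new.native_class # prints Demo::Regexp::Native
```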
@@ -16,14 +26,21 @@ module Fast
      }.freeze

      class << self
-       # Accept either a pattern string or an existing `::Regexp`. When given a
-       # Regexp the flags (`/i`, `/x`, `/m`) are translated into a leading
-       # inline group so the rust/regex engine sees them. Unsupported features
-       # (lookaround, backrefs) still raise from the engine with a clear
-       # message.
+       # Compile a pattern. Accepts a String or a `::Regexp` (flags are
+       # translated into a leading `(?...)` group so rust/regex sees them).
+       # Falls back to `::Regexp` if rust/regex rejects the pattern.
        def new(pattern, **opts)
-         pattern = translate_regexp(pattern) if pattern.is_a?(::Regexp)
-         _native_new(pattern, **opts)
+         translated = pattern.is_a?(::Regexp) ? translate_regexp(pattern) : pattern
+         allocate.tap { |re| re.send(:initialize, translated, original: pattern, **opts) }
+       end
+
+       # Bulk-compile a symbol-keyed hash of patterns. Handy for defining a set
+       # of regex constants in one shot:
+       #
+       #   RE = Fast::Regexp.create_many(word: '\w+', num: '\d+').freeze
+       #   RE[:word].match("hello")
+       def create_many(**patterns)
+         patterns.transform_values { |pat| new(pat) }
        end

        private
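The `new` override above normalizes its input, then runs `initialize` on an `allocate`d instance while passing the untouched input along as `original:`. `create_many` maps that constructor over a keyword hash. Below is a self-contained sketch of the same shape. `Pattern` and its translation step, which just takes `Regexp#source`, are hypothetical simplifications and not the gem's `translate_regexp`.

```ruby
class Pattern
  attr_reader :source, :original

  class << self
    # Custom constructor: normalize the input, then run initialize by hand so
    # the untouched input can be passed along as `original:`.
    def new(input, **opts)
      translated = input.is_a?(::Regexp) ? input.source : input
      allocate.tap { |obj| obj.send(:initialize, translated, original: input, **opts) }
    end

    # Compile a symbol-keyed hash of patterns in one call.
    def create_many(**patterns)
      patterns.transform_values { |pat| new(pat) }
    end
  end

  def initialize(source, original: source)
    @source = source
    @original = original
  end
end

RE = Pattern.create_many(word: '\w+', num: /\d+/).freeze
p RE[:num].source   # prints "\\d+"
p RE[:num].original # prints /\d+/
```

Inside `new`, `super(translated, original: input, **opts)` would do the same allocate-then-initialize work through `Class#new`. The explicit `allocate` form simply mirrors the gem's code.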
@@ -36,34 +53,66 @@ module Fast
        end
      end

-     # Returns a `Fast::Regexp::MatchData` on hit, `nil` on miss, matching
-     # Ruby's `Regexp#match` shape so the old `[]`-truthy-but-empty trap is
-     # gone.
+     attr_reader :pattern, :backend
+
+     BACKENDS = %i[auto fast stdlib].freeze
+
+     # Internal: use `Fast::Regexp.new`. `original` is the unmodified input
+     # (String or ::Regexp) so we can build an accurate stdlib fallback.
+     # `backend:` forces a specific engine: `:auto` (default) tries rust/regex
+     # and falls back to stdlib, `:fast` raises if rust/regex rejects the
+     # pattern, `:stdlib` skips rust/regex entirely.
+     def initialize(pattern, original: pattern, backend: :auto, **opts)
+       raise ArgumentError, "backend must be one of #{BACKENDS.inspect}" unless BACKENDS.include?(backend)
+       @pattern = pattern
+       @backend = compile_backend(pattern, original, backend, opts)
+     end
+
+     def fast? = @backend.is_a?(Native)
+     def stdlib? = !fast?
+
+     # Escape hatches for callers that need the underlying object directly.
+     def native = fast? ? @backend : nil
+     def stdlib = stdlib? ? @backend : nil
+
      def match(haystack)
-       _native_match(coerce_string(haystack))
+       haystack = coerce_string(haystack)
+       raw = fast? ? @backend._native_match(haystack) : @backend.match(haystack)
+       raw && MatchData.new(raw, haystack)
+     end
+
+     def match?(haystack)
+       @backend.match?(coerce_string(haystack))
      end

-     # Case-equality. Enables `case/when` and RSpec's
-     # `expect(str).to match(re)`.
      def ===(other)
        return false unless other.respond_to?(:to_str)
        match?(other.to_str)
      end

-     # Returns the byte offset of the first match, or nil. Matches the
-     # semantics of `Regexp#=~` except positions are in **bytes** (rust/regex
-     # is byte-based).
+     # Byte offset of the first match (rust/regex is byte-based; stdlib path
+     # also returns bytes here for API consistency).
      def =~(other)
        return nil unless other.respond_to?(:to_str)
        m = match(other.to_str)
        m && m.byte_begin(0)
      end

-     # `sub(haystack, replacement)` or `sub(haystack) { |m| ... }`.
-     #
-     # The string form uses rust/regex's native replacement template: `$1`,
-     # `${name}`, `$$` for a literal `$`. To pass a replacement string that
-     # contains `$` literally, pass `literal: true`.
+     def scan(haystack)
+       fast? ? @backend.scan(coerce_string(haystack)) : coerce_string(haystack).scan(@backend)
+     end
+
+     def scan_matches(haystack)
+       haystack = coerce_string(haystack)
+       if fast?
+         @backend.scan_matches(haystack).map { |m| MatchData.new(m, haystack) }
+       else
+         results = []
+         haystack.scan(@backend) { results << MatchData.new(::Regexp.last_match, haystack) }
+         results
+       end
+     end
+
      def sub(haystack, replacement = nil, literal: false, &block)
        haystack = coerce_string(haystack)
        if block
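On the stdlib path, `scan_matches` collects one match object per hit by passing a block to `String#scan` and reading `::Regexp.last_match` inside it. Here is a stdlib-only illustration of that technique. `all_matches` is a hypothetical name, and it returns raw `::MatchData` objects rather than the gem's wrapper.

```ruby
# Collect a ::MatchData for every non-overlapping match. String#scan sets $~
# (Regexp.last_match) before each yield, so the block can capture it.
def all_matches(haystack, regexp)
  results = []
  haystack.scan(regexp) { results << Regexp.last_match }
  results
end

ms = all_matches("a1 b2 c3", /(?<letter>[a-z])(?<digit>\d)/)
p ms.map { |m| m[:letter] } # prints ["a", "b", "c"]
p ms.map { |m| m.begin(0) } # prints [0, 3, 6]
```

Without a block, `String#scan` returns strings or capture arrays and discards the positions. The block form keeps full match objects, which offset-based code such as `pre_match`/`post_match` depends on.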
@@ -73,32 +122,123 @@ module Fast
          "#{m.pre_match}#{block.call(m)}#{m.post_match}"
        else
          raise ArgumentError, "wrong number of arguments (given 1, expected 2)" if replacement.nil?
-         _native_sub(haystack, coerce_string(replacement), literal)
+         replacement = coerce_string(replacement)
+         if fast?
+           @backend._native_sub(haystack, replacement, literal)
+         else
+           haystack.sub(@backend, stdlib_replacement(replacement, literal))
+         end
        end
      end

-     # `gsub(haystack, replacement)` or `gsub(haystack) { |m| ... }`. See
-     # `#sub` for the template syntax.
      def gsub(haystack, replacement = nil, literal: false, &block)
        haystack = coerce_string(haystack)
        if block
          raise ArgumentError, "wrong number of arguments (given 2, expected 1 with block)" if replacement
-         gsub_with_block(haystack, &block)
+         fast? ? fast_gsub_with_block(haystack, &block) : stdlib_gsub_with_block(haystack, &block)
        else
          raise ArgumentError, "wrong number of arguments (given 1, expected 2)" if replacement.nil?
-         _native_gsub(haystack, coerce_string(replacement), literal)
+         replacement = coerce_string(replacement)
+         if fast?
+           @backend._native_gsub(haystack, replacement, literal)
+         else
+           haystack.gsub(@backend, stdlib_replacement(replacement, literal))
+         end
        end
      end

-     def inspect
-       "#<Fast::Regexp #{pattern.inspect}>"
+     # On the fast path this comes from rust/regex directly. On the stdlib
+     # fallback we walk the source counting capturing groups while honoring
+     # escapes, character classes, and non-capturing / lookaround prefixes.
+     def captures_count
+       return @backend.captures_count if fast?
+       count_stdlib_captures(@backend.source)
      end

+     def names
+       @backend.names
+     end
+
+     def inspect = "#<Fast::Regexp #{@pattern.inspect}#{stdlib? ? " (stdlib)" : ""}>"
      alias_method :to_s, :pattern

      private

-     def gsub_with_block(haystack)
+     def count_stdlib_captures(source)
+       count = named_count = 0
+       i = 0
+       len = source.length
+       while i < len
+         c = source[i]
+         if c == "\\"
+           i += 2
+         elsif c == "["
+           i += 1
+           i += (source[i] == "\\" ? 2 : 1) while i < len && source[i] != "]"
+           i += 1
+         elsif c == "("
+           # A bare `(` captures, as does a named group `(?<name>` / `(?'name'`.
+           # Other `(?` forms (non-capturing, lookaround, atomic, inline options,
+           # comments) do not. Ruby stops capturing bare groups once one is named.
+           if source[i + 1] != "?"
+             count += 1
+           elsif source[i + 2] == "'" || (source[i + 2] == "<" && !%w[= !].include?(source[i + 3]))
+             named_count += 1
+           end
+           i += 1
+         else
+           i += 1
+         end
+       end
+       named_count.positive? ? named_count : count
+     end
+
+     def compile_backend(pattern, original, backend, opts)
+       return compile_stdlib(pattern, original) if backend == :stdlib
+       Native._native_new(pattern, **opts)
+     rescue ArgumentError => e
+       raise if backend == :fast
+       compile_stdlib(pattern, original, fallback_from: e)
+     end
+
+     def compile_stdlib(pattern, original, fallback_from: nil)
+       return original if original.is_a?(::Regexp)
+       ::Regexp.new(pattern)
+     rescue ::RegexpError => e
+       # Pattern is malformed in both engines (auto path): surface the
+       # original rust/regex error since it's typically more detailed.
+       # Otherwise propagate the stdlib error as-is.
+       raise(fallback_from ? ArgumentError.new(fallback_from.message) : e)
+     end
+
+     # Maps positional capture indices to names for the stdlib backend so we
+     # can translate `$N` to `\k<name>` (Ruby's gsub ignores `\N` for named
+     # groups). Only invoked from the stdlib path.
+     def stdlib_name_by_index
+       @stdlib_name_by_index ||= @backend.named_captures.flat_map { |n, idxs| idxs.map { |i| [i, n] } }.to_h
+     end
+
+     # Translate rust/regex replacement syntax ($N, ${name}, $$) into Ruby
+     # syntax (\N, \k<name>, $) for the stdlib fallback path.
+     def stdlib_replacement(template, literal)
+       return template.gsub('\\', '\\\\\\\\') if literal
+       # rust/regex has no `\` escapes in templates, so keep backslashes literal.
+       template.gsub(/\\|\$(\$|\d+|\{[^}]+\})/) do
+         token = ::Regexp.last_match(1) or next "\\\\"
+         case token
+         when "$" then "$"
+         when /\A\d+\z/
+           name = stdlib_name_by_index[token.to_i]
+           name ? "\\k<#{name}>" : "\\#{token}"
+         else "\\k<#{token[1..-2]}>"
+         end
+       end
+     end
+
+     # Fast path: rust/regex has no native iterate-with-replace, so we scan all
+     # match positions up-front, then splice the result by byte offset in one
+     # pass. Stays UTF-8 since haystack and match slices are.
+     def fast_gsub_with_block(haystack)
        matches = scan_matches(haystack)
        return haystack.dup if matches.empty?

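The stdlib path rewrites rust/regex replacement templates (`$1`, `${name}`, `$$`) into Ruby's (`\1`, `\k<name>`, literal `$`) before calling `String#sub`/`#gsub`. A reduced, standalone version of that translation is shown below. `translate_template` is a hypothetical name, and unlike `stdlib_replacement` it skips the index-to-name lookup for `$N` on patterns with named groups. It also doubles every bare backslash: rust/regex treats `\` in a template literally, while Ruby's replacement strings would otherwise interpret it.

```ruby
# Hypothetical, reduced translator: rust/regex replacement template ->
# String#sub/gsub template.
def translate_template(template)
  template.gsub(/\\|\$(\$|\d+|\{[^}]+\})/) do
    token = Regexp.last_match(1)
    case token
    when nil then "\\\\"              # bare backslash: stays literal
    when "$" then "$"                 # $$      -> literal "$"
    when /\A\d+\z/ then "\\#{token}"  # $1      -> \1
    else "\\k<#{token[1..-2]}>"       # ${name} -> \k<name>
    end
  end
end

swap = translate_template('$2 $1')
p swap                                     # prints "\\2 \\1"
puts "john smith".sub(/(\w+) (\w+)/, swap) # prints smith john

named = translate_template('${last} costs $$5')
puts "john smith".sub(/(?<first>\w+) (?<last>\w+)/, named) # prints smith costs $5
```

The block form of `gsub` inserts each return value verbatim, so the doubled backslashes and `\k<...>` sequences survive this step. `String#sub` interprets them later, when the translated template is applied to the haystack.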
@@ -114,31 +254,64 @@ module Fast
        out
      end

+     # Stdlib path: String#gsub already does single-pass iterate-and-replace
+     # and sets $~ inside the block, so wrap the current ::MatchData and yield.
+     def stdlib_gsub_with_block(haystack)
+       haystack.gsub(@backend) { yield(MatchData.new(::Regexp.last_match, haystack)).to_s }
+     end
+
      def coerce_string(value)
        return value if value.is_a?(String)
        return value.to_str if value.respond_to?(:to_str)
        raise TypeError, "no implicit conversion of #{value.class} into String"
      end

+     # Wraps either a Fast::Regexp::Native::MatchData or a stdlib ::MatchData
+     # so callers see one type regardless of which backend ran.
      class MatchData
        include Enumerable

-       def each(&block)
-         to_a.each(&block)
-       end
+       attr_reader :backend, :string

-       def values_at(*indices)
-         indices.map { |i| self[i] }
+       def initialize(backend, haystack)
+         @backend = backend
+         @string = haystack
        end

+       def native? = @backend.is_a?(Fast::Regexp::Native::MatchData)
+       def stdlib? = !native?
+
+       def native = native? ? @backend : nil
+       def stdlib = stdlib? ? @backend : nil
+
+       def [](key) = @backend[key]
+       def to_a = @backend.to_a
+       def captures = @backend.captures
+       def named_captures = @backend.named_captures
+       def names = @backend.names
+       def size = @backend.size
+       alias_method :length, :size
+       def pre_match = @backend.pre_match
+       def post_match = @backend.post_match
+       def to_s = @backend.to_s
+
+       # Byte-based offsets. Stdlib ::MatchData has `byteoffset` (Ruby 3.2+) but
+       # no `byte_begin`/`byte_end`, so derive those from it on that path.
+       def byteoffset(key) = @backend.byteoffset(key)
+       def byte_begin(key) = native? ? @backend.byte_begin(key) : @backend.byteoffset(key)[0]
+       def byte_end(key) = native? ? @backend.byte_end(key) : @backend.byteoffset(key)[1]
+
+       def each(&block) = to_a.each(&block)
+       def values_at(*indices) = indices.map { |i| self[i] }
+
        def ==(other)
          other.is_a?(MatchData) && to_a == other.to_a && string == other.string
        end
        alias_method :eql?, :==

-       def hash
-         [to_a, string].hash
-       end
+       def hash = [to_a, string].hash
+
+       def inspect = "#<Fast::Regexp::MatchData #{to_s.inspect}>"
      end
    end
  end
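`Fast::Regexp::MatchData` forwards to whichever match object the engine produced, including byte-offset readers. Stdlib `::MatchData` provides `#byteoffset` from Ruby 3.2, so a wrapper that must serve 3.2+ can derive begin/end byte positions from it. `ByteMatch` below is a hypothetical, stdlib-only illustration of that idea and is not the gem's class.

```ruby
require "forwardable"

# Hypothetical wrapper around a stdlib ::MatchData: forwards a few readers and
# derives byte_begin/byte_end from MatchData#byteoffset (Ruby 3.2+).
class ByteMatch
  extend Forwardable
  def_delegators :@md, :[], :pre_match, :post_match, :to_s

  def initialize(md)
    @md = md
  end

  # byteoffset returns [start, end] in bytes, or [nil, nil] for a group that
  # did not take part in the match.
  def byte_begin(group) = @md.byteoffset(group)[0]
  def byte_end(group) = @md.byteoffset(group)[1]
end

m = ByteMatch.new("naïve café".match(/caf(é)/))
puts m[0]            # prints café
puts m.byte_begin(0) # prints 7
puts m.byte_end(1)   # prints 12
```

Character offsets for the same match would be 6 and 10, because "ï" and "é" each take two bytes in UTF-8.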
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: fast_regexp
  version: !ruby/object:Gem::Version
-   version: 0.4.0
+   version: 0.6.0
  platform: aarch64-linux
  authors:
  - Eric Jacobs