fast_regexp 0.4.0-aarch64-linux → 0.5.0-aarch64-linux

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9e7f80619e2ffc99b8abe00ecf291d8df9f79806eef7c6a3de30e08233619c6c
- data.tar.gz: f834be706bed41f45c93156ca97ba78876b021027ce00f5187e26a23d1641874
+ metadata.gz: 7d582628d35a655bddbc2957b25874b21ee2c7a53c3cde023c1cf7a2a7570ed2
+ data.tar.gz: 161d29c08b3897f77248b624904fa26ddf7512ae73f74f31d845ac545d70bfea
  SHA512:
- metadata.gz: daea2f2211cca44de646668960d31175dc7ce26eead28da2aae6ad92ad4d1761b82227ead275f76bb5c41c6a2b87f31fd38310a633175498312b416f48b48af5
- data.tar.gz: 27fcf044024abbb1b835afb0668dac487f5d9896e45a6840d2e2e46c90f3944a2caf39e2d08d4ecf6fb6484da75c1d09f80b170c69ddc1bdde099c8ed61aaa21
+ metadata.gz: d53c46c5cc47cc38d4b1b64c1c93473850ee5b9ae864538c0792a2675b9bf64496eb7328193678bff35990810be6dbe80340cd74d6dd099fedb6f4f79607a262
+ data.tar.gz: 280a5bc0c14b490089cc6a3098bceba67bd58dfb347a526c3ee979ad11561888fdb7342a53173c7a854727d352b8ab0cf4f900740f448958e867308ee47834de
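Digests like the ones listed in `checksums.yaml` can be recomputed locally with Ruby's standard `digest` library. The following is a minimal sketch and not part of the gem: `checksums_match?` is a hypothetical helper, and the payload is a stand-in string rather than the real `data.tar.gz`.

```ruby
require "digest"
require "yaml"

# Hypothetical helper (illustration only): true when both the SHA256 and the
# SHA512 entries recorded for `name` match the digests of `bytes`.
def checksums_match?(checksums, name, bytes)
  checksums.fetch("SHA256").fetch(name) == Digest::SHA256.hexdigest(bytes) &&
    checksums.fetch("SHA512").fetch(name) == Digest::SHA512.hexdigest(bytes)
end

# Stand-in payload; a real check would read the bytes of data.tar.gz instead.
payload = "example payload"

# Build a checksums.yaml-shaped document for the stand-in payload. The digest
# values are quoted so YAML always parses them as strings.
checksums = YAML.safe_load(<<~YAML)
  ---
  SHA256:
    data.tar.gz: "#{Digest::SHA256.hexdigest(payload)}"
  SHA512:
    data.tar.gz: "#{Digest::SHA512.hexdigest(payload)}"
YAML

puts checksums_match?(checksums, "data.tar.gz", payload)        # intact payload
puts checksums_match?(checksums, "data.tar.gz", payload + "x")  # tampered payload
```

Checking both digests mirrors the structure of the file, which records a SHA256 and a SHA512 entry for each archive.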
data/README.md CHANGED
@@ -3,7 +3,9 @@
  [![Gem Version](https://badge.fury.io/rb/fast_regexp.svg)](https://badge.fury.io/rb/fast_regexp)
  [![Test](https://github.com/jetpks/fast_regexp/workflows/CI/badge.svg)](https://github.com/jetpks/fast_regexp/actions)

- Ruby bindings for [rust/regex](https://docs.rs/regex/latest/regex/) library.
+ Fast, drop-in regex for Ruby — backed by [rust/regex](https://docs.rs/regex/latest/regex/) with transparent fallback to the stdlib `::Regexp` engine for features rust/regex doesn't support (lookaround, backreferences, possessive quantifiers, etc.).
+
+ You get rust/regex's speed and GVL-releasing matching on the common path, and a single uniform API (`Fast::Regexp`, `Fast::Regexp::MatchData`) regardless of which engine actually ran underneath.

  ## Installation

@@ -128,11 +130,42 @@ Fast::Regexp.new('(?P<n>\w+)').names # => ["n"]
  Fast::Regexp.new('(a)(b)').captures_count # => 2
  ```

+ ### Engine fallback
+
+ rust/regex doesn't support lookaround, backreferences, or possessive
+ quantifiers. Rather than make you manage two regex libraries, `Fast::Regexp`
+ silently falls back to stdlib `::Regexp` when it sees something rust/regex
+ can't compile. The public API (`#match`, `#sub`, `#gsub`, `#===`, `#=~`,
+ `MatchData`) is identical on both paths, so callers don't have to care which
+ engine ran — but you can inspect or reach the underlying object when you
+ need to:
+
+ ```ruby
+ fast = Fast::Regexp.new('\w+')
+ fast.fast? # => true
+ fast.native # => #<Fast::Regexp::Native ...> (rust-backed)
+
+ slow = Fast::Regexp.new('foo(?=bar)') # lookahead — rust/regex rejects
+ slow.stdlib? # => true
+ slow.stdlib # => /foo(?=bar)/ (the real ::Regexp)
+ slow.match?("foobar") # => true
+ ```
+
+ `Fast::Regexp::MatchData` exposes the same `#native?` / `#stdlib?` / `#native`
+ / `#stdlib` accessors. Replacement templates use rust/regex syntax (`$1`,
+ `${name}`, `$$`) on both paths; the stdlib fallback translates them for you.
+
+ > [!NOTE]
+ > The fast path is byte-based (rust/regex's `regex::bytes`), so `#=~` returns
+ > a *byte* offset. The stdlib fallback path returns the byte offset too, for
+ > API consistency.
+
  > [!WARNING]
- > `rust/regex` regular expression syntax differs from Ruby's built-in
- > [`Regexp`](https://docs.ruby-lang.org/en/3.4/Regexp.html) library, see the
- > [official syntax page](https://docs.rs/regex/latest/regex/index.html#syntax) for more
- > details.
+ > `rust/regex` syntax differs from Ruby's built-in
+ > [`Regexp`](https://docs.ruby-lang.org/en/3.4/Regexp.html) see the
+ > [rust/regex syntax page](https://docs.rs/regex/latest/regex/index.html#syntax).
+ > When fallback kicks in, your pattern is interpreted by stdlib `::Regexp`
+ > instead, so Ruby's syntax applies for that compile.

  ### Searching simultaneously

@@ -176,11 +209,11 @@ It also supports parsing of strings with invalid UTF-8 characters by default. It
  In case unicode awarness of matchers should be disabled, both `Fast::Regexp` and `Fast::Regexp::Set` support `unicode: false` option:

  ```ruby
- Fast::Regexp.new('\w+').match('ю٤夏')
- # => ["ю٤夏"]
+ Fast::Regexp.new('\w+').match('ю٤夏')[0]
+ # => "ю٤夏"

  Fast::Regexp.new('\w+', unicode: false).match('ю٤夏')
- # => []
+ # => nil

  Fast::Regexp::Set.new(['\w', '\d', '\s']).match("ю٤\u2000")
  # => [0, 1, 2]
@@ -189,6 +222,16 @@ Fast::Regexp::Set.new(['\w', '\d', '\s'], unicode: false).match("ю٤\u2000")
  # => []
  ```

+ ## Documentation
+
+ In-depth docs live under [`docs/`](docs/README.md), organized via the
+ [Diátaxis](https://diataxis.fr/) framework:
+
+ - **Tutorial:** [Getting started](docs/tutorials/getting-started.md)
+ - **How-to:** [Migrate from stdlib `::Regexp`](docs/how-to/migrate-from-stdlib-regexp.md), [Handle unsupported syntax](docs/how-to/handle-unsupported-syntax.md)
+ - **Reference:** [`Fast::Regexp`](docs/reference/fast-regexp.md), [`MatchData`](docs/reference/fast-regexp-matchdata.md), [`Set`](docs/reference/fast-regexp-set.md)
+ - **Explainers:** [Engine fallback](docs/explainers/engine-fallback.md), [Concurrency and GVL](docs/explainers/concurrency-and-gvl.md)
+
  ## Development

  ```sh
@@ -209,8 +252,9 @@ Bug reports and pull requests are welcome on GitHub at https://github.com/jetpks
  huge thanks for the original bindings and the clean magnus integration that
  made this work easy to extend. This fork rebrands the gem, reshapes the
  public API (`Fast::Regexp`, real `MatchData`, `sub`/`gsub`, `===`/`=~`,
- `Regexp`-constructor coercion), and releases the GVL around regex execution
- for thread/fiber-friendly matching.
+ `Regexp`-constructor coercion), releases the GVL around regex execution for
+ thread/fiber-friendly matching, and adds transparent fallback to stdlib
+ `::Regexp` for patterns rust/regex can't compile.

  ## License

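The README note about `#=~` returning a byte offset matters for non-ASCII haystacks, where character and byte positions diverge. The sketch below shows the difference using only stdlib `::Regexp`, with no `Fast::Regexp` calls. `byte_offset_of_match` is a hypothetical helper name introduced for illustration; it derives the byte offset from the text before the match, which should work on any Ruby version.

```ruby
# Hypothetical helper (not part of the gem): byte offset of the first match,
# or nil when there is no match.
def byte_offset_of_match(regexp, haystack)
  m = regexp.match(haystack)
  m && m.pre_match.bytesize
end

haystack = "ю٤夏 abc" # 2 + 2 + 3 bytes of UTF-8, then a 1-byte space

char_offset = haystack =~ /abc/                      # stdlib =~ counts characters
byte_offset = byte_offset_of_match(/abc/, haystack)  # counts bytes

puts char_offset                         # 4
puts byte_offset                         # 8
puts haystack.byteslice(byte_offset, 3)  # abc
```

Code that slices the haystack with the returned position needs a byte-based accessor such as `String#byteslice`; character-based indexing with a byte offset would land on the wrong position.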
Binary file
Binary file
Binary file
@@ -2,6 +2,6 @@

  module Fast
  class Regexp
- VERSION = "0.4.0"
+ VERSION = "0.5.0"
  end
  end
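The engine fallback that this release adds follows a compile-then-rescue pattern: try the strict engine first, and only build a stdlib `::Regexp` when the strict engine rejects the pattern. The sketch below illustrates the idea in plain Ruby under stated assumptions. `StrictEngine` is a hypothetical stand-in for the rust/regex binding (it rejects lookaround and backreferences with a crude textual check), and `compile_with_fallback` is an illustrative name, not the gem's API.

```ruby
# Hypothetical stand-in for a strict primary engine. The textual check is
# deliberately crude and only approximates what rust/regex would reject.
class StrictEngine
  UNSUPPORTED = /\(\?<?[=!]|\\[1-9]|\\k</

  def self.compile(source)
    raise ArgumentError, "unsupported syntax in #{source.inspect}" if source.match?(UNSUPPORTED)

    new(source)
  end

  def initialize(source)
    @regexp = ::Regexp.new(source)
  end

  def match?(haystack)
    @regexp.match?(haystack)
  end
end

# Try the strict engine first; fall back to ::Regexp when it rejects the
# pattern. If both engines reject it, re-raise the strict engine's message.
def compile_with_fallback(source)
  StrictEngine.compile(source)
rescue ArgumentError => e
  begin
    ::Regexp.new(source)
  rescue ::RegexpError
    raise ArgumentError, e.message
  end
end

fast = compile_with_fallback('\w+')
slow = compile_with_fallback('foo(?=bar)') # lookahead: the strict engine rejects it

puts fast.class             # StrictEngine
puts slow.class             # Regexp
puts slow.match?("foobar")  # true
```

Both return values respond to `match?`, so a caller that only needs a boolean does not have to know which engine compiled the pattern.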
data/lib/fast_regexp.rb CHANGED
@@ -1,12 +1,22 @@
  # frozen_string_literal: true

  require_relative "fast_regexp/version"
+
+ module Fast
+ # Façade over rust/regex with a transparent fallback to stdlib `::Regexp`.
+ #
+ # `Fast::Regexp.new(pattern)` first tries to compile with rust/regex (fast,
+ # GVL-releasing, byte-based). If the pattern uses features rust/regex does
+ # not support (lookaround, backreferences, possessive quantifiers, etc.) we
+ # fall back to `::Regexp` so consumers don't have to juggle two libraries.
+ class Regexp
+ end
+ end
+
+ # Load the native extension AFTER the Fast::Regexp class shell exists — the
+ # Rust init() looks up Fast::Regexp and registers Native under it.
  require_relative "fast_regexp/fast_regexp"

- # Ruby-side conveniences on top of the Rust extension. The native class only
- # exposes raw primitives (`_native_new`, `_native_match`, `_native_sub`,
- # `_native_gsub`); everything user-facing lives here so the API can grow
- # without touching FFI.
  module Fast
  class Regexp
  RUBY_FLAG_MAP = {
@@ -16,14 +26,12 @@ module Fast
  }.freeze

  class << self
- # Accept either a pattern string or an existing `::Regexp`. When given a
- # Regexp the flags (`/i`, `/x`, `/m`) are translated into a leading
- # inline group so the rust/regex engine sees them. Unsupported features
- # (lookaround, backrefs) still raise from the engine with a clear
- # message.
+ # Compile a pattern. Accepts a String or a `::Regexp` (flags are
+ # translated into a leading `(?...)` group so rust/regex sees them).
+ # Falls back to `::Regexp` if rust/regex rejects the pattern.
  def new(pattern, **opts)
- pattern = translate_regexp(pattern) if pattern.is_a?(::Regexp)
- _native_new(pattern, **opts)
+ translated = pattern.is_a?(::Regexp) ? translate_regexp(pattern) : pattern
+ allocate.tap { |re| re.send(:initialize, translated, original: pattern, **opts) }
  end

  private
@@ -36,34 +44,60 @@ module Fast
  end
  end

- # Returns a `Fast::Regexp::MatchData` on hit, `nil` on miss — matching
- # Ruby's `Regexp#match` shape so the old `[]`-truthy-but-empty trap is
- # gone.
+ attr_reader :pattern, :backend
+
+ # Internal — use `Fast::Regexp.new`. `original` is the unmodified input
+ # (String or ::Regexp) so we can build an accurate stdlib fallback.
+ def initialize(pattern, original: pattern, **opts)
+ @pattern = pattern
+ @backend = compile_backend(pattern, original, opts)
+ end
+
+ def fast? = @backend.is_a?(Native)
+ def stdlib? = !fast?
+
+ # Escape hatches for callers that need the underlying object directly.
+ def native = fast? ? @backend : nil
+ def stdlib = stdlib? ? @backend : nil
+
  def match(haystack)
- _native_match(coerce_string(haystack))
+ haystack = coerce_string(haystack)
+ raw = fast? ? @backend._native_match(haystack) : @backend.match(haystack)
+ raw && MatchData.new(raw, haystack)
+ end
+
+ def match?(haystack)
+ @backend.match?(coerce_string(haystack))
  end

- # Case-equality. Enables `case/when` and RSpec's
- # `expect(str).to match(re)`.
  def ===(other)
  return false unless other.respond_to?(:to_str)
  match?(other.to_str)
  end

- # Returns the byte offset of the first match, or nil. Matches the
- # semantics of `Regexp#=~` except positions are in **bytes** (rust/regex
- # is byte-based).
+ # Byte offset of the first match (rust/regex is byte-based; stdlib path
+ # also returns bytes here for API consistency).
  def =~(other)
  return nil unless other.respond_to?(:to_str)
  m = match(other.to_str)
  m && m.byte_begin(0)
  end

- # `sub(haystack, replacement)` or `sub(haystack) { |m| ... }`.
- #
- # The string form uses rust/regex's native replacement template: `$1`,
- # `${name}`, `$$` for a literal `$`. To pass a replacement string that
- # contains `$` literally, pass `literal: true`.
+ def scan(haystack)
+ @backend.scan(coerce_string(haystack))
+ end
+
+ def scan_matches(haystack)
+ haystack = coerce_string(haystack)
+ if fast?
+ @backend.scan_matches(haystack).map { |m| MatchData.new(m, haystack) }
+ else
+ results = []
+ haystack.scan(@backend) { results << MatchData.new(::Regexp.last_match, haystack) }
+ results
+ end
+ end
+
  def sub(haystack, replacement = nil, literal: false, &block)
  haystack = coerce_string(haystack)
  if block
@@ -73,32 +107,118 @@ module Fast
  "#{m.pre_match}#{block.call(m)}#{m.post_match}"
  else
  raise ArgumentError, "wrong number of arguments (given 1, expected 2)" if replacement.nil?
- _native_sub(haystack, coerce_string(replacement), literal)
+ replacement = coerce_string(replacement)
+ if fast?
+ @backend._native_sub(haystack, replacement, literal)
+ else
+ haystack.sub(@backend, stdlib_replacement(replacement, literal))
+ end
  end
  end

- # `gsub(haystack, replacement)` or `gsub(haystack) { |m| ... }`. See
- # `#sub` for the template syntax.
  def gsub(haystack, replacement = nil, literal: false, &block)
  haystack = coerce_string(haystack)
  if block
  raise ArgumentError, "wrong number of arguments (given 2, expected 1 with block)" if replacement
- gsub_with_block(haystack, &block)
+ fast? ? fast_gsub_with_block(haystack, &block) : stdlib_gsub_with_block(haystack, &block)
  else
  raise ArgumentError, "wrong number of arguments (given 1, expected 2)" if replacement.nil?
- _native_gsub(haystack, coerce_string(replacement), literal)
+ replacement = coerce_string(replacement)
+ if fast?
+ @backend._native_gsub(haystack, replacement, literal)
+ else
+ haystack.gsub(@backend, stdlib_replacement(replacement, literal))
+ end
  end
  end

- def inspect
- "#<Fast::Regexp #{pattern.inspect}>"
+ # On the fast path this comes from rust/regex directly. On the stdlib
+ # fallback we walk the source counting capturing groups while honoring
+ # escapes, character classes, and non-capturing / lookaround prefixes.
+ def captures_count
+ return @backend.captures_count if fast?
+ count_stdlib_captures(@backend.source)
+ end
+
+ def names
+ @backend.names
  end

+ def inspect = "#<Fast::Regexp #{@pattern.inspect}#{stdlib? ? " (stdlib)" : ""}>"
  alias_method :to_s, :pattern

  private

- def gsub_with_block(haystack)
+ def count_stdlib_captures(source)
+ count = 0
+ i = 0
+ len = source.length
+ while i < len
+ c = source[i]
+ if c == "\\"
+ i += 2
+ elsif c == "["
+ i += 1
+ i += 1 while i < len && source[i] != "]"
+ i += 1
+ elsif c == "("
+ # Capturing unless followed by (?:, (?=, (?!, (?#, (?<=, (?<!.
+ prefix = source[i + 1, 4] || ""
+ if prefix.start_with?("?:") || prefix.start_with?("?=") || prefix.start_with?("?!") ||
+ prefix.start_with?("?#") || prefix.start_with?("?<=") || prefix.start_with?("?<!")
+ i += 1
+ else
+ count += 1
+ i += 1
+ end
+ else
+ i += 1
+ end
+ end
+ count
+ end
+
+ def compile_backend(pattern, original, opts)
+ Native._native_new(pattern, **opts)
+ rescue ArgumentError => e
+ return original if original.is_a?(::Regexp)
+ begin
+ ::Regexp.new(pattern)
+ rescue ::RegexpError
+ # Pattern is malformed in both engines — surface the original
+ # rust/regex error so the user sees the more detailed message.
+ raise ArgumentError, e.message
+ end
+ end
+
+ # Maps positional capture indices to names for the stdlib backend so we
+ # can translate `$N` to `\k<name>` (Ruby's gsub ignores `\N` for named
+ # groups). Only invoked from the stdlib path.
+ def stdlib_name_by_index
+ @stdlib_name_by_index ||= @backend.named_captures.flat_map { |n, idxs| idxs.map { |i| [i, n] } }.to_h
+ end
+
+ # Translate rust/regex replacement syntax ($N, ${name}, $$) into Ruby
+ # syntax (\N, \k<name>, $) for the stdlib fallback path.
+ def stdlib_replacement(template, literal)
+ return template.gsub('\\', '\\\\\\\\') if literal
+
+ template.gsub(/\$(\$|\d+|\{[^}]+\})/) do
+ token = ::Regexp.last_match(1)
+ case token
+ when "$" then "$"
+ when /\A\d+\z/
+ name = stdlib_name_by_index[token.to_i]
+ name ? "\\k<#{name}>" : "\\#{token}"
+ else "\\k<#{token[1..-2]}>"
+ end
+ end
+ end
+
+ # Fast path: rust/regex has no native iterate-with-replace, so we scan all
+ # match positions up-front, then splice the result by byte offset in one
+ # pass. Stays UTF-8 since haystack and match slices are.
+ def fast_gsub_with_block(haystack)
  matches = scan_matches(haystack)
  return haystack.dup if matches.empty?

@@ -114,31 +234,64 @@ module Fast
  out
  end

+ # Stdlib path: String#gsub already does single-pass iterate-and-replace
+ # and sets $~ inside the block, so wrap the current ::MatchData and yield.
+ def stdlib_gsub_with_block(haystack)
+ haystack.gsub(@backend) { yield(MatchData.new(::Regexp.last_match, haystack)).to_s }
+ end
+
  def coerce_string(value)
  return value if value.is_a?(String)
  return value.to_str if value.respond_to?(:to_str)
  raise TypeError, "no implicit conversion of #{value.class} into String"
  end

+ # Wraps either a Fast::Regexp::Native::MatchData or a stdlib ::MatchData
+ # so callers see one type regardless of which backend ran.
  class MatchData
  include Enumerable

- def each(&block)
- to_a.each(&block)
- end
+ attr_reader :backend, :string

- def values_at(*indices)
- indices.map { |i| self[i] }
+ def initialize(backend, haystack)
+ @backend = backend
+ @string = haystack
  end

+ def native? = @backend.is_a?(Fast::Regexp::Native::MatchData)
+ def stdlib? = !native?
+
+ def native = native? ? @backend : nil
+ def stdlib = stdlib? ? @backend : nil
+
+ def [](key) = @backend[key]
+ def to_a = @backend.to_a
+ def captures = @backend.captures
+ def named_captures = @backend.named_captures
+ def names = @backend.names
+ def size = @backend.size
+ alias_method :length, :size
+ def pre_match = @backend.pre_match
+ def post_match = @backend.post_match
+ def to_s = @backend.to_s
+
+ # Byte-based offsets. Both backends expose these (stdlib MatchData has
+ # `byteoffset` / `byte_begin` / `byte_end` since Ruby 3.2).
+ def byteoffset(key) = @backend.byteoffset(key)
+ def byte_begin(key) = @backend.byte_begin(key)
+ def byte_end(key) = @backend.byte_end(key)
+
+ def each(&block) = to_a.each(&block)
+ def values_at(*indices) = indices.map { |i| self[i] }
+
  def ==(other)
  other.is_a?(MatchData) && to_a == other.to_a && string == other.string
  end
  alias_method :eql?, :==

- def hash
- [to_a, string].hash
- end
+ def hash = [to_a, string].hash
+
+ def inspect = "#<Fast::Regexp::MatchData #{to_s.inspect}>"
  end
  end
  end
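The replacement-template translation performed by `stdlib_replacement` in the diff above can be tried in isolation. The following is a standalone sketch that mirrors that logic using only stdlib Ruby. `to_ruby_template` and `names_by_index` are hypothetical top-level names introduced here, and the literal-replacement branch is left out for brevity.

```ruby
# Hypothetical helper: map capture index => group name for a ::Regexp,
# e.g. { 1 => "first", 2 => "last" }.
def names_by_index(regexp)
  regexp.named_captures.flat_map { |name, idxs| idxs.map { |i| [i, name] } }.to_h
end

# Hypothetical helper: translate a rust/regex-style template ($N, ${name},
# $$) into the form String#sub / String#gsub expect (\N, \k<name>, $).
def to_ruby_template(template, index_to_name = {})
  template.gsub(/\$(\$|\d+|\{[^}]+\})/) do
    token = Regexp.last_match(1)
    case token
    when "$" then "$"                       # $$ is an escaped dollar sign
    when /\A\d+\z/                          # $N, positional reference
      name = index_to_name[token.to_i]
      name ? "\\k<#{name}>" : "\\#{token}"
    else "\\k<#{token[1..-2]}>"             # ${name}, strip the braces
    end
  end
end

puts "ab".sub(/(a)(b)/, to_ruby_template('$2-$1'))       # b-a
puts "cost 5".sub(/(\d+)/, to_ruby_template('$$$1'))     # cost $5

re = /(?<first>\w+) (?<last>\w+)/
puts "John Smith".sub(re, to_ruby_template('${last}, ${first}'))           # Smith, John
puts "John Smith".sub(re, to_ruby_template('$2, $1', names_by_index(re)))  # Smith, John
```

Passing the index-to-name map lets positional references such as `$2` resolve to `\k<last>` when the pattern uses named groups, which is the case the diff's `stdlib_name_by_index` helper exists for.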
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: fast_regexp
  version: !ruby/object:Gem::Version
- version: 0.4.0
+ version: 0.5.0
  platform: aarch64-linux
  authors:
  - Eric Jacobs