js_regex_to_ruby 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 4aafea531008742c38c529eee7b2145db1dbffa69b3fb1d4b6ff5d6c8931a0f8
4
- data.tar.gz: 3c26adad1068ded864447c384fab39b339b8abb12a401617fcaaecde385b19ea
3
+ metadata.gz: 1fae9505fb2087b5f0600ae3600911debc5d9f492f74798ad25772645d9e8ce0
4
+ data.tar.gz: 9e316aab02a15003581d5d45cf863e4368a55dd81038fb0965de7f1f52296229
5
5
  SHA512:
6
- metadata.gz: ca43d3056ef25e4cf29f630b11dd5bbaf21a210b5e77756d45f572ca5339e9c22941c29eb3131da0c688a56d41e111bdff5643a80097f13385a09b9cc6a84b6f
7
- data.tar.gz: eca9a62564569a9a8e4413804e85c6c94254cd98fca8e0d9ecd2c91e7301028c2ef2eea884d16db1a0b1f4d9ed99e3efd64c17da2e5dcbc6776e02e1508e4836
6
+ metadata.gz: 71113df052913508f1fcce6df020cce4c10de0974469e032b9d000aec4f4fd0b4081d8b07b3e9500117d01cc90c182d5aa6808cbd41325e17b66b4d4b4cdcf2a
7
+ data.tar.gz: dd24f145589a79e09eaf99ba7a663a4c0bde8e134de7871f70a3e467667a5769132c57b298c9ddb61e854e2ecd531bb18fc1db8f0dd6344a6ae89ba9dcae18f9
data/README.md CHANGED
@@ -12,7 +12,8 @@ JavaScript and Ruby regular expressions have subtle but important differences:
12
12
  | `/s` flag (dotAll) | Makes `.` match newlines | N/A (use `/m` in Ruby) |
13
13
  | `/m` flag (multiline) | Makes `^`/`$` match line boundaries | N/A (already default behavior) |
14
14
  | `[^]` (any character) | Matches any char including `\n` | Invalid syntax (use `[\s\S]`) |
15
- | `/g`, `/y`, `/d`, `/u`, `/v` flags | Various features | No direct equivalents |
15
+ | `/g`, `/y` flags | Global / sticky runtime semantics | Supported via `JsRegexToRuby::JsRegExp` (stateful `Regexp` subclass) |
16
+ | `/d`, `/u`, `/v` flags | Various features | No direct equivalents |
16
17
 
17
18
  This gem handles these conversions automatically, emitting warnings when perfect conversion isn't possible.
18
19
 
@@ -102,10 +103,31 @@ flags #=> "gi"
102
103
  result = JsRegexToRuby.convert("/test/guy")
103
104
 
104
105
  result.warnings
105
- #=> ["JS flag(s) not representable as Ruby Regexp options: g, u, y"]
106
+ #=> ["JS flag(s) not representable as Ruby Regexp options: u"]
106
107
 
107
108
  result.ignored_js_flags
108
- #=> ["g", "u", "y"]
109
+ #=> ["u"]
110
+ ```
111
+
112
+ ### Global / Sticky (`g` / `y`) Runtime Semantics
113
+
114
+ When the JS flags include `g` and/or `y`, `convert` returns a `JsRegexToRuby::JsRegExp` (a `Regexp` subclass) that tracks `last_index` and provides JS-like methods:
115
+
116
+ ```ruby
117
+ res = JsRegexToRuby.convert("/foo/g")
118
+ re = res.regexp
119
+
120
+ re.last_index = 2
121
+ re.exec("foo foo")&.begin(0) #=> 4
122
+ re.last_index #=> 7
123
+ ```
124
+
125
+ For safe global iteration (avoids empty-match infinite loops), use `match_all`:
126
+
127
+ ```ruby
128
+ re = JsRegexToRuby.convert("/.*/g").regexp
129
+ re.match_all("a").map { |m| [m[0], m.begin(0)] }
130
+ #=> [["a", 0], ["", 1]]
109
131
  ```
110
132
 
111
133
  ### Result Object
@@ -114,7 +136,7 @@ The `Result` struct provides comprehensive information:
114
136
 
115
137
  | Method | Description |
116
138
  |--------|-------------|
117
- | `regexp` | The compiled `Regexp` object (or `nil` if compilation failed) |
139
+ | `regexp` | The compiled `Regexp` object (or `JsRegexToRuby::JsRegExp` when `g`/`y` are present), or `nil` if compilation failed |
118
140
  | `success?` | Returns `true` if `regexp` is not `nil` |
119
141
  | `ruby_source` | The converted Ruby regex pattern string |
120
142
  | `ruby_options` | Integer flags (`Regexp::IGNORECASE`, `Regexp::MULTILINE`, etc.) |
@@ -144,8 +166,8 @@ result.regexp #=> nil
144
166
  | `i` | `Regexp::IGNORECASE` | Case-insensitive matching |
145
167
  | `s` | `Regexp::MULTILINE` | JS dotAll → Ruby multiline (`.` matches `\n`) |
146
168
  | `m` | *(behavior change)* | Keeps `^`/`$` as-is instead of converting to `\A`/`\z` |
147
- | `g` | *(ignored)* | Global matching - handle in application code |
148
- | `y` | *(ignored)* | Sticky matching - no equivalent |
169
+ | `g` | `JsRegexToRuby::JsRegExp` | JS-like `lastIndex`/`exec`/`test` behavior (not a Ruby Regexp option) |
170
+ | `y` | `\G` + `JsRegexToRuby::JsRegExp` | Sticky matching via `\G` prefix + `lastIndex` runtime behavior |
149
171
  | `u` | *(ignored)* | Unicode mode - Ruby handles Unicode differently |
150
172
  | `v` | *(ignored)* | Unicode sets mode - no equivalent |
151
173
  | `d` | *(ignored)* | Indices for matches - no equivalent |
@@ -197,9 +219,15 @@ result.regexp.match?("axb") #=> true
197
219
 
198
220
  Note: Negated character classes like `[^abc]` are NOT affected and work as expected.
199
221
 
222
+ ### Ruby-Only Escape Sequences
223
+
224
+ JavaScript allows many "identity escapes" where `\X` means `"X"` (outside Unicode modes).
225
+ Ruby has additional special sequences like `\A`, `\z`, `\G`, and `\Q...\E`.
226
+ To preserve JS behavior, the converter rewrites these to literal characters where appropriate (e.g., JS `\A` becomes Ruby `A`).
227
+
200
228
  ## Limitations
201
229
 
202
- 1. **No runtime flags**: JS flags like `g` (global) and `y` (sticky) affect matching behavior at runtime and have no Ruby `Regexp` equivalent. Handle these in your application logic.
230
+ 1. **Runtime flags (`g`, `y`) are stateful**: When present, `convert` returns a `JsRegexToRuby::JsRegExp` (subclass of `Regexp`) that tracks `last_index` and provides JS-like `exec`/`test`/`match` behavior. Use `match_all` for safe iteration. Note: `String#match?`, `String#scan`, and `String#=~` bypass `Regexp` method dispatch and will not update `last_index`.
203
231
 
204
232
  2. **Unicode properties**: `\p{...}` syntax exists in both JS and Ruby but with different property names and semantics. No automatic conversion is performed.
205
233
 
data/lib/js_regex_to_ruby/converter.rb CHANGED
@@ -9,13 +9,19 @@ module JsRegexToRuby
9
9
  # - JS /s (dotAll) => Ruby /m (dot-all in Ruby)
10
10
  # - JS /m (multiline anchors) => Ruby has ^/$ multiline by default, so we rewrite ^/$
11
11
  # to \A/\z when JS multiline is NOT enabled.
12
+ # - JS /y (sticky) => Ruby \G prefix (requires runtime lastIndex management for full fidelity)
12
13
  # - JS inline modifiers (?ims-ims:...) are supported, with mapping s->m and special handling for m.
13
14
  # - JS [^] (match any character including newline) => Ruby [\s\S]
14
- # - JS /g, /y, /u, /v, /d have no direct Regexp equivalent; we report them in Result#ignored_js_flags.
15
+ # - JS /g and /y are supported via JsRegexToRuby::JsRegExp runtime semantics (last_index).
16
+ # - JS /u, /v, /d have no direct Regexp equivalent; we report them in Result#ignored_js_flags.
15
17
  class Converter
16
18
  JS_KNOWN_FLAGS = %w[d g i m s u v y].freeze
17
19
  JS_GROUP_MOD_FLAGS = %w[i m s].freeze
18
20
 
21
+ # Ruby has additional backslash escapes that JavaScript treats as identity escapes (i.e., \X == "X").
22
+ # To preserve JS behavior, we drop the backslash for these sequences during rewriting.
23
+ RUBY_IDENTITY_ESCAPE_CHARS = %w[A Z z G K Q E R X h H a e C M].freeze
24
+
19
25
  # Tracks modifier state during source rewriting (immutable).
20
26
  Context = Data.define(:js_multiline_anchors, :ruby_ignorecase, :ruby_dotall)
21
27
 
@@ -86,7 +92,10 @@ module JsRegexToRuby
86
92
  warnings << "Unknown JS RegExp flag(s): #{unknown_flags.uniq.join(', ')}" unless unknown_flags.empty?
87
93
  warnings << "Duplicate JS RegExp flag(s) ignored: #{duplicate_flags.uniq.join(', ')}" unless duplicate_flags.empty?
88
94
 
89
- ignored_js_flags = (seen_flags.keys - %w[i m s]).sort
95
+ # Flags that remain unhandled in the conversion output.
96
+ handled_js_flags = %w[i m s y]
97
+ handled_js_flags << "g" if compile
98
+ ignored_js_flags = (seen_flags.keys - handled_js_flags).sort
90
99
 
91
100
  unless ignored_js_flags.empty?
92
101
  warnings << "JS flag(s) not representable as Ruby Regexp options: #{ignored_js_flags.join(', ')}"
@@ -95,6 +104,8 @@ module JsRegexToRuby
95
104
  base_js_multiline = seen_flags["m"]
96
105
  base_js_ignorecase = seen_flags["i"]
97
106
  base_js_dotall = seen_flags["s"]
107
+ base_js_global = seen_flags["g"]
108
+ base_js_sticky = seen_flags["y"]
98
109
 
99
110
  ruby_options = 0
100
111
  ruby_options |= Regexp::IGNORECASE if base_js_ignorecase
@@ -107,11 +118,17 @@ module JsRegexToRuby
107
118
  )
108
119
 
109
120
  ruby_source = rewrite_source(js_source, base_ctx, warnings)
121
+ ruby_source = "\\G(?:#{ruby_source})" if base_js_sticky
110
122
 
111
123
  regexp = nil
112
124
  if compile
113
125
  begin
114
- regexp = Regexp.new(ruby_source, ruby_options)
126
+ regexp =
127
+ if base_js_global || base_js_sticky
128
+ JsRegExp.new(ruby_source, ruby_options, js_flags: js_flags)
129
+ else
130
+ Regexp.new(ruby_source, ruby_options)
131
+ end
115
132
  rescue RegexpError => e
116
133
  warnings << "Ruby RegexpError: #{e.message}"
117
134
  regexp = nil
@@ -178,12 +195,18 @@ module JsRegexToRuby
178
195
  next
179
196
  end
180
197
 
181
- out << ch
182
- if i + 1 < src.length
183
- out << src[i + 1]
198
+ next_ch = src[i + 1]
199
+ if next_ch && ruby_identity_escape_char?(next_ch)
200
+ out << next_ch
184
201
  i += 2
185
202
  else
186
- i += 1
203
+ out << ch
204
+ if next_ch
205
+ out << next_ch
206
+ i += 2
207
+ else
208
+ i += 1
209
+ end
187
210
  end
188
211
  next
189
212
  end
@@ -200,12 +223,18 @@ module JsRegexToRuby
200
223
  out << control_char(src[i + 2])
201
224
  i += 3
202
225
  else
203
- out << ch
204
- if i + 1 < src.length
205
- out << src[i + 1]
226
+ next_ch = src[i + 1]
227
+ if next_ch && ruby_identity_escape_char?(next_ch)
228
+ out << next_ch
206
229
  i += 2
207
230
  else
208
- i += 1
231
+ out << ch
232
+ if next_ch
233
+ out << next_ch
234
+ i += 2
235
+ else
236
+ i += 1
237
+ end
209
238
  end
210
239
  end
211
240
 
@@ -375,9 +404,13 @@ module JsRegexToRuby
375
404
  end
376
405
  end
377
406
 
407
+ def self.ruby_identity_escape_char?(ch)
408
+ RUBY_IDENTITY_ESCAPE_CHARS.include?(ch)
409
+ end
410
+
378
411
  private_class_method :looks_like_literal?, :normalize_flags,
379
412
  :rewrite_source, :control_escape_at?, :control_char,
380
413
  :parse_js_modifier_group, :apply_js_group_modifiers,
381
- :build_ruby_modifier_prefix
414
+ :build_ruby_modifier_prefix, :ruby_identity_escape_char?
382
415
  end
383
416
  end
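The identity-escape handling in the converter hunks above can be illustrated in isolation. The following is a minimal sketch, assuming a simplified scanner that ignores character classes, `\cX` control escapes, and named groups; `drop_ruby_only_escapes` and `RUBY_ONLY_ESCAPES` are hypothetical names used for illustration and are not part of the gem.

```ruby
# Hypothetical, simplified illustration; not the gem's actual rewrite_source.
# Characters that Ruby treats as special after a backslash but that JavaScript
# (outside Unicode modes) treats as identity escapes.
RUBY_ONLY_ESCAPES = %w[A Z z G K Q E R X h H a e C M].freeze

def drop_ruby_only_escapes(js_source)
  out = +""
  i = 0
  while i < js_source.length
    ch = js_source[i]
    nxt = js_source[i + 1]
    if ch == "\\" && nxt
      # Keep the backslash unless the escaped character is Ruby-only.
      out << ch unless RUBY_ONLY_ESCAPES.include?(nxt)
      out << nxt
      i += 2
    else
      out << ch
      i += 1
    end
  end
  out
end

rewritten = drop_ruby_only_escapes('\Afoo\d+\z')
puts rewritten                                  # Afoo\d+z
puts Regexp.new(rewritten).match?("Afoo42z")    # true
```

Consuming the backslash and the following character as a pair means an escaped backslash followed by `A` is left untouched, which is the same pairwise stepping (`i += 2`) the diff uses.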
data/lib/js_regex_to_ruby/js_regexp.rb ADDED
@@ -0,0 +1,150 @@
1
+ # frozen_string_literal: true
2
+
3
+ module JsRegexToRuby
4
+ # A Regexp subclass that emulates JavaScript's runtime RegExp semantics for:
5
+ # - g (global): searches from last_index and updates it on success; resets to 0 on failure
6
+ # - y (sticky): requires a match at last_index (implemented via a leading \G)
7
+ #
8
+ # Note: Some Ruby String methods (e.g., String#match?, String#=~, String#scan) do not dispatch
9
+ # to Regexp methods and therefore cannot update last_index.
10
+ class JsRegExp < Regexp
11
+ POS_UNSET = Object.new
12
+ private_constant :POS_UNSET
13
+
14
+ attr_reader :js_flags
15
+
16
+ def initialize(source, options = 0, js_flags: "")
17
+ @js_flags = js_flags.to_s
18
+ @global = @js_flags.include?("g")
19
+ @sticky = @js_flags.include?("y")
20
+ @last_index = 0
21
+ super(source, options)
22
+ end
23
+
24
+ def global?
25
+ @global
26
+ end
27
+
28
+ def sticky?
29
+ @sticky
30
+ end
31
+
32
+ def last_index
33
+ @last_index
34
+ end
35
+
36
+ def last_index=(value)
37
+ i = value.to_i
38
+ @last_index = i < 0 ? 0 : i
39
+ end
40
+
41
+ alias lastIndex last_index
42
+
43
+ def lastIndex=(value)
44
+ self.last_index = value
45
+ end
46
+
47
+ def reset
48
+ @last_index = 0
49
+ self
50
+ end
51
+
52
+ # JS-like exec: returns MatchData (or nil) and updates last_index for g/y.
53
+ def exec(str)
54
+ match_internal(str, POS_UNSET)
55
+ end
56
+
57
+ # JS-like test: boolean wrapper around exec.
58
+ def test(str)
59
+ !exec(str).nil?
60
+ end
61
+
62
+ alias test? test
63
+
64
+ # Safe global iteration (similar to JS String#matchAll):
65
+ # - does not permanently mutate last_index
66
+ # - avoids infinite loops for empty-string matches by advancing 1 char
67
+ def match_all(str)
68
+ return enum_for(:match_all, str) unless block_given?
69
+
70
+ str = str.to_s
71
+ saved_last_index = @last_index
72
+
73
+ begin
74
+ @last_index = 0
75
+ while (m = exec(str))
76
+ yield m
77
+ if m[0].empty?
78
+ @last_index += 1
79
+ @last_index = str.length + 1 if @last_index > str.length
80
+ end
81
+ end
82
+ ensure
83
+ @last_index = saved_last_index
84
+ end
85
+ end
86
+
87
+ # If g/y and pos is omitted, behaves like #exec (uses last_index and updates it).
88
+ # If pos is provided, behaves like Ruby Regexp#match and does not touch last_index.
89
+ def match(str, pos = POS_UNSET, &block)
90
+ m = match_internal(str, pos)
91
+ return yield(m) if block && m
92
+ m
93
+ end
94
+
95
+ # If g/y and pos is omitted, behaves like #test (uses last_index and updates it).
96
+ # If pos is provided, behaves like Ruby Regexp#match? and does not touch last_index.
97
+ def match?(str, pos = POS_UNSET)
98
+ !!match_internal(str, pos)
99
+ end
100
+
101
+ # Used by `case`/`when`.
102
+ def ===(other)
103
+ !!match_internal(other, POS_UNSET)
104
+ end
105
+
106
+ # JS-like semantics when regexp is on the LHS.
107
+ def =~(other)
108
+ m = match_internal(other, POS_UNSET)
109
+ m ? m.begin(0) : nil
110
+ end
111
+
112
+ private
113
+
114
+ def uses_last_index?
115
+ @global || @sticky
116
+ end
117
+
118
+ def raw_match(str, pos)
119
+ Regexp.instance_method(:match).bind_call(self, str, pos)
120
+ end
121
+
122
+ def match_internal(str, pos)
123
+ str = str.to_s
124
+
125
+ if uses_last_index?
126
+ if pos == POS_UNSET
127
+ start = @last_index
128
+ if start < 0 || start > str.length
129
+ @last_index = 0
130
+ return nil
131
+ end
132
+
133
+ m = raw_match(str, start)
134
+ if m
135
+ @last_index = m.end(0)
136
+ return m
137
+ end
138
+
139
+ @last_index = 0
140
+ return nil
141
+ end
142
+
143
+ return raw_match(str, pos)
144
+ end
145
+
146
+ pos = 0 if pos == POS_UNSET
147
+ raw_match(str, pos)
148
+ end
149
+ end
150
+ end
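The `last_index` bookkeeping in `match_internal` above can be tried without installing the gem. The following is a rough sketch, assuming a plain wrapper object rather than a `Regexp` subclass; `LastIndexMatcher` is a hypothetical name, and only the global-flag path (search from `last_index`, advance on success, reset on failure) is modeled.

```ruby
# Hypothetical stand-in for the stateful behavior shown in the diff; it wraps
# a plain Regexp instead of subclassing it and is not the gem's JsRegExp.
class LastIndexMatcher
  attr_accessor :last_index

  def initialize(regexp)
    @regexp = regexp
    @last_index = 0
  end

  # Search from last_index; advance it on success, reset it to 0 on failure.
  def exec(str)
    if @last_index > str.length
      @last_index = 0
      return nil
    end

    m = @regexp.match(str, @last_index)
    @last_index = m ? m.end(0) : 0
    m
  end
end

matcher = LastIndexMatcher.new(/foo/)
matcher.last_index = 2
first = matcher.exec("foo foo")
puts first.begin(0)       # 4
puts matcher.last_index   # 7
matcher.exec("foo foo")   # nothing left to match after index 7, returns nil
puts matcher.last_index   # 0
```

The first three results mirror the README example in this diff (`begin(0)` of 4, then `last_index` of 7); the final reset to 0 follows the failure branch of `match_internal`.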
data/lib/js_regex_to_ruby/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module JsRegexToRuby
4
- VERSION = "0.1.2"
4
+ VERSION = "0.1.3"
5
5
  end
data/lib/js_regex_to_ruby.rb CHANGED
@@ -2,6 +2,7 @@
2
2
 
3
3
  require_relative "js_regex_to_ruby/version"
4
4
  require_relative "js_regex_to_ruby/result"
5
+ require_relative "js_regex_to_ruby/js_regexp"
5
6
  require_relative "js_regex_to_ruby/converter"
6
7
 
7
8
  module JsRegexToRuby
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: js_regex_to_ruby
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - jasl
@@ -25,6 +25,7 @@ files:
25
25
  - Rakefile
26
26
  - lib/js_regex_to_ruby.rb
27
27
  - lib/js_regex_to_ruby/converter.rb
28
+ - lib/js_regex_to_ruby/js_regexp.rb
28
29
  - lib/js_regex_to_ruby/result.rb
29
30
  - lib/js_regex_to_ruby/version.rb
30
31
  - sig/js_regex_to_ruby.rbs