js_regex_to_ruby 0.1.2 → 0.1.3
- checksums.yaml +4 -4
- data/README.md +35 -7
- data/lib/js_regex_to_ruby/converter.rb +45 -12
- data/lib/js_regex_to_ruby/js_regexp.rb +150 -0
- data/lib/js_regex_to_ruby/version.rb +1 -1
- data/lib/js_regex_to_ruby.rb +1 -0
- metadata +2 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1fae9505fb2087b5f0600ae3600911debc5d9f492f74798ad25772645d9e8ce0
+  data.tar.gz: 9e316aab02a15003581d5d45cf863e4368a55dd81038fb0965de7f1f52296229
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 71113df052913508f1fcce6df020cce4c10de0974469e032b9d000aec4f4fd0b4081d8b07b3e9500117d01cc90c182d5aa6808cbd41325e17b66b4d4b4cdcf2a
+  data.tar.gz: dd24f145589a79e09eaf99ba7a663a4c0bde8e134de7871f70a3e467667a5769132c57b298c9ddb61e854e2ecd531bb18fc1db8f0dd6344a6ae89ba9dcae18f9
data/README.md
CHANGED
@@ -12,7 +12,8 @@ JavaScript and Ruby regular expressions have subtle but important differences:
 | `/s` flag (dotAll) | Makes `.` match newlines | N/A (use `/m` in Ruby) |
 | `/m` flag (multiline) | Makes `^`/`$` match line boundaries | N/A (already default behavior) |
 | `[^]` (any character) | Matches any char including `\n` | Invalid syntax (use `[\s\S]`) |
-| `/g`, `/y
+| `/g`, `/y` flags | Global / sticky runtime semantics | Supported via `JsRegexToRuby::JsRegExp` (stateful `Regexp` subclass) |
+| `/d`, `/u`, `/v` flags | Various features | No direct equivalents |
 
 This gem handles these conversions automatically, emitting warnings when perfect conversion isn't possible.
 
@@ -102,10 +103,31 @@ flags #=> "gi"
 result = JsRegexToRuby.convert("/test/guy")
 
 result.warnings
-#=> ["JS flag(s) not representable as Ruby Regexp options:
+#=> ["JS flag(s) not representable as Ruby Regexp options: u"]
 
 result.ignored_js_flags
-#=> ["
+#=> ["u"]
+```
+
+### Global / Sticky (`g` / `y`) Runtime Semantics
+
+When the JS flags include `g` and/or `y`, `convert` returns a `JsRegexToRuby::JsRegExp` (a `Regexp` subclass) that tracks `last_index` and provides JS-like methods:
+
+```ruby
+res = JsRegexToRuby.convert("/foo/g")
+re = res.regexp
+
+re.last_index = 2
+re.exec("foo foo")&.begin(0) #=> 4
+re.last_index #=> 7
+```
+
+For safe global iteration (avoids empty-match infinite loops), use `match_all`:
+
+```ruby
+re = JsRegexToRuby.convert("/.*/g").regexp
+re.match_all("a").map { |m| [m[0], m.begin(0)] }
+#=> [["a", 0], ["", 1]]
 ```
 
 ### Result Object
@@ -114,7 +136,7 @@ The `Result` struct provides comprehensive information:
 
 | Method | Description |
 |--------|-------------|
-| `regexp` | The compiled `Regexp` object (or `nil` if compilation failed
+| `regexp` | The compiled `Regexp` object (or `JsRegexToRuby::JsRegExp` when `g`/`y` are present), or `nil` if compilation failed |
 | `success?` | Returns `true` if `regexp` is not `nil` |
 | `ruby_source` | The converted Ruby regex pattern string |
 | `ruby_options` | Integer flags (`Regexp::IGNORECASE`, `Regexp::MULTILINE`, etc.) |
@@ -144,8 +166,8 @@ result.regexp #=> nil
 | `i` | `Regexp::IGNORECASE` | Case-insensitive matching |
 | `s` | `Regexp::MULTILINE` | JS dotAll → Ruby multiline (`.` matches `\n`) |
 | `m` | *(behavior change)* | Keeps `^`/`$` as-is instead of converting to `\A`/`\z` |
-| `g` |
-| `y` |
+| `g` | `JsRegexToRuby::JsRegExp` | JS-like `lastIndex`/`exec`/`test` behavior (not a Ruby Regexp option) |
+| `y` | `\G` + `JsRegexToRuby::JsRegExp` | Sticky matching via `\G` prefix + `lastIndex` runtime behavior |
 | `u` | *(ignored)* | Unicode mode - Ruby handles Unicode differently |
 | `v` | *(ignored)* | Unicode sets mode - no equivalent |
 | `d` | *(ignored)* | Indices for matches - no equivalent |
@@ -197,9 +219,15 @@ result.regexp.match?("axb") #=> true
 
 Note: Negated character classes like `[^abc]` are NOT affected and work as expected.
 
+### Ruby-Only Escape Sequences
+
+JavaScript allows many "identity escapes" where `\X` means `"X"` (outside Unicode modes).
+Ruby has additional special sequences like `\A`, `\z`, `\G`, and `\Q...\E`.
+To preserve JS behavior, the converter rewrites these to literal characters where appropriate (e.g., JS `\A` becomes Ruby `A`).
+
 ## Limitations
 
-1. **
+1. **Runtime flags (`g`, `y`) are stateful**: When present, `convert` returns a `JsRegexToRuby::JsRegExp` (subclass of `Regexp`) that tracks `last_index` and provides JS-like `exec`/`test`/`match` behavior. Use `match_all` for safe iteration. Note: `String#match?`, `String#scan`, and `String#=~` bypass `Regexp` method dispatch and will not update `last_index`.
 
 2. **Unicode properties**: `\p{...}` syntax exists in both JS and Ruby but with different property names and semantics. No automatic conversion is performed.
 
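The `last_index` behavior described in the README changes above can be approximated without the gem. The following is a minimal sketch that assumes only core Ruby's `Regexp#match(str, pos)`; `exec_step` is a hypothetical helper name used for illustration and is not part of `js_regex_to_ruby`.

```ruby
# Hypothetical helper, not part of js_regex_to_ruby: performs one JS-style
# `exec` step and returns [match_data_or_nil, new_last_index].
def exec_step(regexp, str, last_index)
  # JS resets lastIndex to 0 and fails when it points past the end of the input.
  return [nil, 0] if last_index > str.length

  m = regexp.match(str, last_index) # the search starts at last_index
  m ? [m, m.end(0)] : [nil, 0]      # advance on success, reset on failure
end

m, last_index = exec_step(/foo/, "foo foo", 2)
m.begin(0)  #=> 4
last_index  #=> 7

m, last_index = exec_step(/foo/, "foo foo", last_index)
m           #=> nil
last_index  #=> 0
```

Keeping the index outside the regexp, as this sketch does, avoids shared mutable state; the gem instead stores it on the `JsRegExp` instance to mirror the JavaScript API.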
data/lib/js_regex_to_ruby/converter.rb
CHANGED
@@ -9,13 +9,19 @@ module JsRegexToRuby
   # - JS /s (dotAll) => Ruby /m (dot-all in Ruby)
   # - JS /m (multiline anchors) => Ruby has ^/$ multiline by default, so we rewrite ^/$
   # to \A/\z when JS multiline is NOT enabled.
+  # - JS /y (sticky) => Ruby \G prefix (requires runtime lastIndex management for full fidelity)
   # - JS inline modifiers (?ims-ims:...) are supported, with mapping s->m and special handling for m.
   # - JS [^] (match any character including newline) => Ruby [\s\S]
-  # - JS /g
+  # - JS /g and /y are supported via JsRegexToRuby::JsRegExp runtime semantics (last_index).
+  # - JS /u, /v, /d have no direct Regexp equivalent; we report them in Result#ignored_js_flags.
   class Converter
     JS_KNOWN_FLAGS = %w[d g i m s u v y].freeze
     JS_GROUP_MOD_FLAGS = %w[i m s].freeze
 
+    # Ruby has additional backslash escapes that JavaScript treats as identity escapes (i.e., \X == "X").
+    # To preserve JS behavior, we drop the backslash for these sequences during rewriting.
+    RUBY_IDENTITY_ESCAPE_CHARS = %w[A Z z G K Q E R X h H a e C M].freeze
+
     # Tracks modifier state during source rewriting (immutable).
     Context = Data.define(:js_multiline_anchors, :ruby_ignorecase, :ruby_dotall)
 
@@ -86,7 +92,10 @@ module JsRegexToRuby
       warnings << "Unknown JS RegExp flag(s): #{unknown_flags.uniq.join(', ')}" unless unknown_flags.empty?
       warnings << "Duplicate JS RegExp flag(s) ignored: #{duplicate_flags.uniq.join(', ')}" unless duplicate_flags.empty?
 
-
+      # Flags that remain unhandled in the conversion output.
+      handled_js_flags = %w[i m s y]
+      handled_js_flags << "g" if compile
+      ignored_js_flags = (seen_flags.keys - handled_js_flags).sort
 
       unless ignored_js_flags.empty?
         warnings << "JS flag(s) not representable as Ruby Regexp options: #{ignored_js_flags.join(', ')}"
@@ -95,6 +104,8 @@ module JsRegexToRuby
       base_js_multiline = seen_flags["m"]
       base_js_ignorecase = seen_flags["i"]
       base_js_dotall = seen_flags["s"]
+      base_js_global = seen_flags["g"]
+      base_js_sticky = seen_flags["y"]
 
       ruby_options = 0
       ruby_options |= Regexp::IGNORECASE if base_js_ignorecase
@@ -107,11 +118,17 @@ module JsRegexToRuby
       )
 
       ruby_source = rewrite_source(js_source, base_ctx, warnings)
+      ruby_source = "\\G(?:#{ruby_source})" if base_js_sticky
 
       regexp = nil
       if compile
         begin
-          regexp =
+          regexp =
+            if base_js_global || base_js_sticky
+              JsRegExp.new(ruby_source, ruby_options, js_flags: js_flags)
+            else
+              Regexp.new(ruby_source, ruby_options)
+            end
         rescue RegexpError => e
           warnings << "Ruby RegexpError: #{e.message}"
           regexp = nil
@@ -178,12 +195,18 @@ module JsRegexToRuby
               next
             end
 
-
-            if
-              out <<
+            next_ch = src[i + 1]
+            if next_ch && ruby_identity_escape_char?(next_ch)
+              out << next_ch
               i += 2
             else
-
+              out << ch
+              if next_ch
+                out << next_ch
+                i += 2
+              else
+                i += 1
+              end
             end
             next
           end
@@ -200,12 +223,18 @@ module JsRegexToRuby
             out << control_char(src[i + 2])
             i += 3
           else
-
-            if
-              out <<
+            next_ch = src[i + 1]
+            if next_ch && ruby_identity_escape_char?(next_ch)
+              out << next_ch
               i += 2
             else
-
+              out << ch
+              if next_ch
+                out << next_ch
+                i += 2
+              else
+                i += 1
+              end
             end
           end
 
@@ -375,9 +404,13 @@ module JsRegexToRuby
       end
     end
 
+    def self.ruby_identity_escape_char?(ch)
+      RUBY_IDENTITY_ESCAPE_CHARS.include?(ch)
+    end
+
     private_class_method :looks_like_literal?, :normalize_flags,
                          :rewrite_source, :control_escape_at?, :control_char,
                          :parse_js_modifier_group, :apply_js_group_modifiers,
-                         :build_ruby_modifier_prefix
+                         :build_ruby_modifier_prefix, :ruby_identity_escape_char?
   end
 end
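The identity-escape handling in the converter changes above can be illustrated in isolation. The following is a simplified sketch, not the gem's `rewrite_source`: `RUBY_ONLY_ESCAPES` and `drop_ruby_only_escapes` are hypothetical names, and the sketch ignores everything else the converter handles (character classes, control escapes, modifier groups, anchors).

```ruby
# Characters that Ruby treats specially after a backslash but that
# JavaScript (outside Unicode modes) treats as identity escapes.
# The list is copied from the diff; the constant name is hypothetical.
RUBY_ONLY_ESCAPES = %w[A Z z G K Q E R X h H a e C M].freeze

# Hypothetical helper: drops the backslash in front of Ruby-only escapes
# and copies every other escape pair through unchanged.
def drop_ruby_only_escapes(src)
  out = +""
  i = 0
  while i < src.length
    ch = src[i]
    if ch == "\\"
      next_ch = src[i + 1]
      if next_ch && RUBY_ONLY_ESCAPES.include?(next_ch)
        out << next_ch            # JS \A means the literal "A"
      else
        out << ch                 # keep the backslash
        out << next_ch if next_ch # and the escaped character, if any
      end
      i += next_ch ? 2 : 1        # consume the pair (or a trailing backslash)
    else
      out << ch
      i += 1
    end
  end
  out
end

drop_ruby_only_escapes('\Afoo\d\z') #=> "Afoo\\dz"  (i.e. Afoo\dz)
```

Consuming the backslash and the following character as a pair matters: an escaped backslash followed by `A` (`\\A` in the pattern) must stay untouched, which a plain `gsub` over `\A` would get wrong.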
data/lib/js_regex_to_ruby/js_regexp.rb
ADDED
@@ -0,0 +1,150 @@
+# frozen_string_literal: true
+
+module JsRegexToRuby
+  # A Regexp subclass that emulates JavaScript's runtime RegExp semantics for:
+  # - g (global): searches from last_index and updates it on success; resets to 0 on failure
+  # - y (sticky): requires a match at last_index (implemented via a leading \G)
+  #
+  # Note: Some Ruby String methods (e.g., String#match?, String#=~, String#scan) do not dispatch
+  # to Regexp methods and therefore cannot update last_index.
+  class JsRegExp < Regexp
+    POS_UNSET = Object.new
+    private_constant :POS_UNSET
+
+    attr_reader :js_flags
+
+    def initialize(source, options = 0, js_flags: "")
+      @js_flags = js_flags.to_s
+      @global = @js_flags.include?("g")
+      @sticky = @js_flags.include?("y")
+      @last_index = 0
+      super(source, options)
+    end
+
+    def global?
+      @global
+    end
+
+    def sticky?
+      @sticky
+    end
+
+    def last_index
+      @last_index
+    end
+
+    def last_index=(value)
+      i = value.to_i
+      @last_index = i < 0 ? 0 : i
+    end
+
+    alias lastIndex last_index
+
+    def lastIndex=(value)
+      self.last_index = value
+    end
+
+    def reset
+      @last_index = 0
+      self
+    end
+
+    # JS-like exec: returns MatchData (or nil) and updates last_index for g/y.
+    def exec(str)
+      match_internal(str, POS_UNSET)
+    end
+
+    # JS-like test: boolean wrapper around exec.
+    def test(str)
+      !exec(str).nil?
+    end
+
+    alias test? test
+
+    # Safe global iteration (similar to JS String#matchAll):
+    # - does not permanently mutate last_index
+    # - avoids infinite loops for empty-string matches by advancing 1 char
+    def match_all(str)
+      return enum_for(:match_all, str) unless block_given?
+
+      str = str.to_s
+      saved_last_index = @last_index
+
+      begin
+        @last_index = 0
+        while (m = exec(str))
+          yield m
+          if m[0].empty?
+            @last_index += 1
+            @last_index = str.length + 1 if @last_index > str.length
+          end
+        end
+      ensure
+        @last_index = saved_last_index
+      end
+    end
+
+    # If g/y and pos is omitted, behaves like #exec (uses last_index and updates it).
+    # If pos is provided, behaves like Ruby Regexp#match and does not touch last_index.
+    def match(str, pos = POS_UNSET, &block)
+      m = match_internal(str, pos)
+      return yield(m) if block && m
+      m
+    end
+
+    # If g/y and pos is omitted, behaves like #test (uses last_index and updates it).
+    # If pos is provided, behaves like Ruby Regexp#match? and does not touch last_index.
+    def match?(str, pos = POS_UNSET)
+      !!match_internal(str, pos)
+    end
+
+    # Used by `case`/`when`.
+    def ===(other)
+      !!match_internal(other, POS_UNSET)
+    end
+
+    # JS-like semantics when regexp is on the LHS.
+    def =~(other)
+      m = match_internal(other, POS_UNSET)
+      m ? m.begin(0) : nil
+    end
+
+    private
+
+    def uses_last_index?
+      @global || @sticky
+    end
+
+    def raw_match(str, pos)
+      Regexp.instance_method(:match).bind_call(self, str, pos)
+    end
+
+    def match_internal(str, pos)
+      str = str.to_s
+
+      if uses_last_index?
+        if pos == POS_UNSET
+          start = @last_index
+          if start < 0 || start > str.length
+            @last_index = 0
+            return nil
+          end
+
+          m = raw_match(str, start)
+          if m
+            @last_index = m.end(0)
+            return m
+          end
+
+          @last_index = 0
+          return nil
+        end
+
+        return raw_match(str, pos)
+      end
+
+      pos = 0 if pos == POS_UNSET
+      raw_match(str, pos)
+    end
+  end
+end
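The iteration strategy used by `match_all` (advance one character after an empty match) and the effect of the leading `\G` for sticky patterns can be tried with a plain `Regexp`. This is a minimal stateless sketch under those assumptions; `global_matches` is a hypothetical helper and not part of the gem.

```ruby
# Hypothetical helper, not part of js_regex_to_ruby: collects successive
# matches the way a JS global regexp would, without mutating any state.
def global_matches(regexp, str)
  matches = []
  pos = 0
  while pos <= str.length && (m = regexp.match(str, pos))
    matches << m
    pos = m.end(0)
    pos += 1 if m[0].empty? # step past an empty match to avoid looping forever
  end
  matches
end

# Empty matches terminate instead of repeating at the same position.
global_matches(/.*/, "a").map { |m| [m[0], m.begin(0)] }
#=> [["a", 0], ["", 1]]

# Global search skips over non-matching text between matches.
global_matches(/foo/, "foofoo foo").map { |m| m.begin(0) }
#=> [0, 3, 7]

# With a leading \G each match must begin exactly at pos (sticky behavior),
# so iteration stops at the space.
global_matches(/\G(?:foo)/, "foofoo foo").map { |m| m.begin(0) }
#=> [0, 3]
```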
data/lib/js_regex_to_ruby.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: js_regex_to_ruby
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.3
 platform: ruby
 authors:
 - jasl
@@ -25,6 +25,7 @@ files:
 - Rakefile
 - lib/js_regex_to_ruby.rb
 - lib/js_regex_to_ruby/converter.rb
+- lib/js_regex_to_ruby/js_regexp.rb
 - lib/js_regex_to_ruby/result.rb
 - lib/js_regex_to_ruby/version.rb
 - sig/js_regex_to_ruby.rbs