re2js 0.1.0
- package/LICENSE +21 -0
- package/README.md +273 -0
- package/build/index.cjs.js +12 -0
- package/build/index.cjs.js.map +1 -0
- package/build/index.esm.js +12 -0
- package/build/index.esm.js.map +1 -0
- package/build/index.umd.js +12 -0
- package/build/index.umd.js.map +1 -0
- package/package.json +63 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Alexey Vasiliev

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,273 @@
# RE2JS is the JavaScript port of RE2, a regular expression engine that provides linear time matching

## TLDR

The built-in JavaScript regular expression engine can, for certain combinations of pattern and input, run in exponential time. This can be exploited in a [Regular Expression Denial of Service (ReDoS)](https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS) attack. RE2, a different regular expression engine, can effectively safeguard your Node.js applications from ReDoS attacks. With RE2JS, this protection extends to browser environments as well, so you can use the RE2 engine more widely.

## What is RE2?

RE2 is a regular expression engine designed to run in time proportional to the size of the input, which guarantees linear time complexity. RE2JS is a pure JavaScript port of the [RE2 library](https://github.com/google/re2); more specifically, it is a port of the [RE2/J library](https://github.com/google/re2j).

JavaScript's standard regular expression package, [RegExp](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions), and many other widely used regular expression packages, such as PCRE, Perl and Python, use a backtracking implementation strategy: when a pattern presents two alternatives such as `a|b`, the engine tries to match subpattern `a` first and, if that yields no match, resets the input stream and tries to match `b` instead.

If such choices are deeply nested, this strategy requires an exponential number of passes over the input data before it can detect whether the input matches. If the input is large, it is easy to construct a pattern whose running time would exceed the lifetime of the universe. This creates a security risk when accepting regular expression patterns from untrusted sources, such as users of a web application.
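
To make the blow-up concrete, the sketch below hand-codes a naive backtracking matcher for the single pattern `(a+)+b` and counts how often it retries. It is an illustration only: `countBacktrackingSteps` is a hypothetical helper, not part of RE2JS or of any real engine, and production engines apply optimizations this toy leaves out.

```js
// Hypothetical toy: counts the attempts a naive backtracker makes while
// matching the pattern (a+)+b against `input`, starting at position 0.
function countBacktrackingSteps(input) {
  let steps = 0

  // Tries to match one or more a+ groups starting at `pos`, followed by 'b'.
  function matchGroups(pos) {
    steps++
    // Find how far the current run of 'a' characters extends.
    let end = pos
    while (end < input.length && input[end] === 'a') end++
    // Try every possible length for this a+ group, longest first.
    for (let stop = end; stop > pos; stop--) {
      if (input[stop] === 'b') return true // the trailing 'b' was found
      if (matchGroups(stop)) return true // otherwise start another group
    }
    return false
  }

  matchGroups(0)
  return steps
}

console.log(countBacktrackingSteps('aab')) // 1
console.log(countBacktrackingSteps('a'.repeat(10))) // 1024
console.log(countBacktrackingSteps('a'.repeat(20))) // 1048576
```

When the trailing `b` is missing, every extra `a` doubles the work, because each way of cutting the run of `a` characters into groups is tried before the matcher gives up.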

In contrast, the RE2 algorithm explores all matches simultaneously in a single pass over the input data by using a nondeterministic finite automaton.
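
The single-pass idea can be sketched with a hand-built automaton. The example below simulates an NFA for the pattern `(a|b)*abb` (matched against the whole input) by tracking the set of all states the automaton could be in after each character. The names `transitions` and `simulateNfa` are hypothetical, and this is a minimal sketch of the general technique, not a description of RE2JS internals.

```js
// Hand-built NFA for (a|b)*abb. State 3 is the accepting state.
// State 0 is nondeterministic on 'a': it can stay in 0 or move to 1.
const transitions = {
  0: { a: [0, 1], b: [0] },
  1: { b: [2] },
  2: { b: [3] },
  3: {}
}

// Advances every possible state in lockstep, one input character at a time.
function simulateNfa(input) {
  let current = new Set([0])
  let steps = 0
  for (const ch of input) {
    const next = new Set()
    for (const state of current) {
      steps++
      for (const target of transitions[state][ch] ?? []) next.add(target)
    }
    current = next
  }
  return { matched: current.has(3), steps }
}

console.log(simulateNfa('babb')) // { matched: true, steps: 6 }
console.log(simulateNfa('abab')) // { matched: false, steps: 7 }
```

Because the set of live states can never exceed the number of states in the automaton (four here), the work per character is bounded, and the total work grows linearly with the input length.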

There are certain features of PCRE or Perl regular expressions that cannot be implemented in linear time, for example, backreferences, but the vast majority of regular expression patterns in practice avoid such features.

## Usage

This document provides a series of examples demonstrating how to use RE2JS in your code. For details on the regex syntax, see the [Google RE2 Syntax Documentation](https://github.com/google/re2/wiki/Syntax).

### Compiling Patterns

You can compile a regex pattern using the `RE2JS.compile()` function:

```js
import { RE2JS } from 're2js'

const p = RE2JS.compile('abc');
console.log(p.pattern()); // Outputs: 'abc'
console.log(p.flags()); // Outputs: 0
```

The `RE2JS.compile()` function also supports flags:

```js
import { RE2JS } from 're2js'

const p = RE2JS.compile('abc', RE2JS.CASE_INSENSITIVE | RE2JS.MULTILINE);
console.log(p.pattern()); // Outputs: 'abc'
console.log(p.flags()); // Outputs: 5
```

Supported flags:

```js
/**
 * Flag: case insensitive matching.
 */
RE2JS.CASE_INSENSITIVE
/**
 * Flag: dot ({@code .}) matches all characters, including newline.
 */
RE2JS.DOTALL
/**
 * Flag: multiline matching: {@code ^} and {@code $} match at beginning and end of line, not just
 * beginning and end of input.
 */
RE2JS.MULTILINE
/**
 * Flag: Unicode groups (e.g. {@code \p{Greek}}) will be syntax errors.
 */
RE2JS.DISABLE_UNICODE_GROUPS
/**
 * Flag: matches longest possible string.
 */
RE2JS.LONGEST_MATCH
```
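
Since flags are combined with bitwise OR (the `compile()` example reports `5` for `CASE_INSENSITIVE | MULTILINE`), an individual flag can be tested with bitwise AND. The sketch below shows the idea in plain JavaScript. The numeric values are assumptions chosen to be consistent with that result, and `hasFlag` and `describeFlags` are hypothetical helpers; in real code, read the values from the `RE2JS.*` constants instead.

```js
// Assumed values for illustration only; use RE2JS.CASE_INSENSITIVE etc. in real code.
const FLAGS = {
  CASE_INSENSITIVE: 1,
  DOTALL: 2,
  MULTILINE: 4
}

// Returns true when every bit of `flag` is set in `flags`.
function hasFlag(flags, flag) {
  return (flags & flag) === flag
}

// Lists the names of all known flags set in a combined value.
function describeFlags(flags) {
  return Object.keys(FLAGS).filter((name) => hasFlag(flags, FLAGS[name]))
}

const combined = FLAGS.CASE_INSENSITIVE | FLAGS.MULTILINE
console.log(combined) // 5
console.log(hasFlag(combined, FLAGS.MULTILINE)) // true
console.log(hasFlag(combined, FLAGS.DOTALL)) // false
console.log(describeFlags(combined)) // [ 'CASE_INSENSITIVE', 'MULTILINE' ]
```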

### Checking for Matches

RE2JS allows you to check if a string matches a given regex pattern using the `RE2JS.matches()` function:

```js
import { RE2JS } from 're2js'

RE2JS.matches('ab+c', 'abbbc') // true
RE2JS.matches('ab+c', 'cbbba') // false
// or
RE2JS.compile('ab+c').matches('abbbc') // true
RE2JS.compile('ab+c').matches('cbbba') // false
// with flags
RE2JS.compile('ab+c', RE2JS.CASE_INSENSITIVE).matches('AbBBc') // true
```
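
Assuming `matches()` follows the whole-input semantics of RE2/J (this README does not state it explicitly), the closest equivalent with the built-in `RegExp` is an anchored pattern rather than a plain `test()`. The helper `fullMatch` below is hypothetical and only illustrates the difference between whole-input matching and searching.

```js
// Hypothetical helper: whole-input matching with the built-in RegExp,
// done by wrapping the pattern source in ^(?:...)$.
function fullMatch(source, input) {
  return new RegExp(`^(?:${source})$`).test(input)
}

console.log(fullMatch('ab+c', 'abbbc')) // true
console.log(fullMatch('ab+c', 'xxabbbc')) // false: extra leading characters
console.log(/ab+c/.test('xxabbbc')) // true: unanchored search
```

Note that this helper still runs on the backtracking engine, so it does not provide the linear-time guarantee discussed earlier.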

### Finding Matches

To find a match for a given regex pattern in a string, you can use the `find()` function:

```js
import { RE2JS } from 're2js'

RE2JS.compile('ab+c').matcher('xxabbbc').find() // true
RE2JS.compile('ab+c').matcher('cbbba').find() // false
// with flags
RE2JS.compile('ab+c', RE2JS.CASE_INSENSITIVE).matcher('abBBc').find() // true
```

The `find()` method can also search for a pattern match starting from a specific index:

```js
import { RE2JS } from 're2js'

const p = RE2JS.compile('.*[aeiou]')
const matchString = p.matcher('abcdefgh')
matchString.find(0) // true
matchString.group() // 'abcde'
matchString.find(1) // true
matchString.group() // 'bcde'
matchString.find(4) // true
matchString.group() // 'e'
matchString.find(7) // false
```

### Splitting Strings

You can split a string based on a regex pattern using the `split()` function:

```js
import { RE2JS } from 're2js'

RE2JS.compile('/').split('abcde') // ['abcde']
RE2JS.compile('/').split('a/b/cc//d/e//') // ['a', 'b', 'cc', '', 'd', 'e']
RE2JS.compile(':').split(':a::b') // ['', 'a', '', 'b']
```

The `split()` function also supports a limit parameter:

```js
import { RE2JS } from 're2js'

RE2JS.compile('/').split('a/b/cc//d/e//', 3) // ['a', 'b', 'cc//d/e//']
RE2JS.compile('/').split('a/b/cc//d/e//', 4) // ['a', 'b', 'cc', '/d/e//']
RE2JS.compile('/').split('a/b/cc//d/e//', 9) // ['a', 'b', 'cc', '', 'd', 'e', '', '']
RE2JS.compile(':').split('boo:and:foo', 2) // ['boo', 'and:foo']
RE2JS.compile(':').split('boo:and:foo', 5) // ['boo', 'and', 'foo']
```
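
The results above are consistent with Java-style `split` semantics: without a limit, trailing empty strings are dropped; with a positive limit, at most that many pieces are returned and trailing empty strings are kept. The sketch below reproduces this behavior for a literal (non-regex) separator. `splitWithLimit` is a hypothetical helper written for this explanation, not the RE2JS implementation, and it does not try to cover every edge case.

```js
// Hypothetical helper: Java-style split on a literal separator.
// limit === 0: no cap on pieces, trailing empty strings are removed.
// limit > 0: at most `limit` pieces, the last one holds the unsplit remainder.
function splitWithLimit(input, separator, limit = 0) {
  if (separator === '') throw new Error('separator must not be empty')
  const pieces = []
  let start = 0
  while (limit <= 0 || pieces.length < limit - 1) {
    const index = input.indexOf(separator, start)
    if (index === -1) break
    pieces.push(input.slice(start, index))
    start = index + separator.length
  }
  pieces.push(input.slice(start)) // whatever is left after the last split
  if (limit === 0) {
    while (pieces.length > 1 && pieces[pieces.length - 1] === '') pieces.pop()
  }
  return pieces
}

console.log(splitWithLimit('a/b/cc//d/e//', '/')) // ['a', 'b', 'cc', '', 'd', 'e']
console.log(splitWithLimit('a/b/cc//d/e//', '/', 3)) // ['a', 'b', 'cc//d/e//']
console.log(splitWithLimit('a/b/cc//d/e//', '/', 9)) // ['a', 'b', 'cc', '', 'd', 'e', '', '']
console.log(splitWithLimit(':a::b', ':')) // ['', 'a', '', 'b']
```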

### Working with Groups

RE2JS supports capturing groups in regex patterns.

#### Group Count

You can get the count of groups in a pattern using the `groupCount()` function:

```js
import { RE2JS } from 're2js'

RE2JS.compile('(.*)ab(.*)a').groupCount() // 2
RE2JS.compile('(.*)((a)b)(.*)a').groupCount() // 4
RE2JS.compile('(.*)(\\(a\\)b)(.*)a').groupCount() // 3
```

#### Named Groups

You can access the named groups in a pattern using the `namedGroups()` function:

```js
import { RE2JS } from 're2js'

RE2JS.compile('(?P<foo>\\d{2})').namedGroups() // { foo: 1 }
RE2JS.compile('\\d{2}').namedGroups() // {}
RE2JS.compile('(?P<foo>.*)(?P<bar>.*)').namedGroups() // { foo: 1, bar: 2 }
```

#### Group Content

The `group()` method retrieves the content matched by a specific capturing group:

```js
import { RE2JS } from 're2js'

const p = RE2JS.compile('(a)(b(c)?)d?(e)')
const matchString = p.matcher('xabdez')
if (matchString.find()) {
  matchString.group(0) // 'abde'
  matchString.group(1) // 'a'
  matchString.group(2) // 'b'
  matchString.group(3) // null
  matchString.group(4) // 'e'
}
```
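
When porting code from the built-in `RegExp`, note that an unmatched optional group is reported differently: the example above returns `null` for group 3, whereas `RegExp.prototype.exec` yields `undefined`. The snippet below runs the same pattern through the built-in engine for comparison.

```js
// Same pattern and input as above, using the built-in RegExp.
const match = /(a)(b(c)?)d?(e)/.exec('xabdez')

console.log(match[0]) // 'abde'
console.log(match[2]) // 'b'
console.log(match[3]) // undefined (the optional group did not participate)
console.log(match[3] ?? null) // null, normalized to the convention shown above
console.log(match[4]) // 'e'
```

Code that checks a group with `=== undefined` or `=== null` may therefore need adjusting when moving between the two APIs; a loose `== null` check covers both.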

#### Named Group Content

The `group()` method also accepts a group name and retrieves the content matched by that named capturing group:

```js
import { RE2JS } from 're2js'

const p = RE2JS.compile(
  '(?P<baz>f(?P<foo>b*a(?P<another>r+)){0,10})(?P<bag>bag)?(?P<nomatch>zzz)?'
)
const matchString = p.matcher('fbbarrrrrbag')
if (matchString.matches()) {
  matchString.group('baz') // 'fbbarrrrr'
  matchString.group('foo') // 'bbarrrrr'
  matchString.group('another') // 'rrrrr'
  matchString.group('bag') // 'bag'
  matchString.group('nomatch') // null
}
```

### Replacing Matches

RE2JS allows you to replace either all occurrences or only the first occurrence of a pattern match in a string with a replacement string.

#### Replacing All Occurrences

The `replaceAll()` method replaces all occurrences of a pattern match in a string with the given replacement:

```js
import { RE2JS } from 're2js'

RE2JS.compile('Frog')
  .matcher("What the Frog's Eye Tells the Frog's Brain")
  .replaceAll('Lizard') // "What the Lizard's Eye Tells the Lizard's Brain"
RE2JS.compile('(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
  .matcher('abcdefghijklmnopqrstuvwxyz123')
  .replaceAll('$10$20') // 'jb0wo0123'
```

Note that the replacement string can include references to capturing groups from the pattern, such as `$10` above.
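
The `'$10$20'` result suggests how multi-digit references are read: digits are consumed only while they still name an existing group, so with 13 groups `$10` is group 10, while `$20` is group 2 followed by a literal `0`. The sketch below implements that rule in plain JavaScript. `expandReplacement` is a hypothetical helper written for this explanation, not RE2JS code; it ignores escapes and named references, and it treats unknown groups as empty, which is a simplifying assumption.

```js
// Hypothetical helper: expands $N group references in a replacement template.
// groups[0] is the whole match and groups[1..] are the capturing groups.
function expandReplacement(template, groups) {
  const groupCount = groups.length - 1
  const isDigit = (ch) => ch !== undefined && ch >= '0' && ch <= '9'
  let out = ''
  let i = 0
  while (i < template.length) {
    if (template[i] !== '$' || !isDigit(template[i + 1])) {
      out += template[i] // literal character
      i++
      continue
    }
    // Take the first digit, then keep extending the number only while it
    // still refers to an existing group.
    let group = Number(template[i + 1])
    i += 2
    while (isDigit(template[i])) {
      const candidate = group * 10 + Number(template[i])
      if (candidate > groupCount) break
      group = candidate
      i++
    }
    out += groups[group] ?? '' // assumption: unknown or unmatched groups expand to ''
  }
  return out
}

// The built-in RegExp is used here only to obtain the 13 captured groups.
const thirteen = /(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)/.exec('abcdefghijklmnopqrstuvwxyz123')
console.log(expandReplacement('$10$20', Array.from(thirteen))) // 'jb0'
```

The first match covers `abcdefghijklm`, so group 10 is `j` and group 2 is `b`, which gives the `jb0` prefix seen in the README output.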

#### Replacing the First Occurrence

The `replaceFirst()` method replaces the first occurrence of a pattern match in a string with the given replacement:

```js
import { RE2JS } from 're2js'

RE2JS.compile('Frog')
  .matcher("What the Frog's Eye Tells the Frog's Brain")
  .replaceFirst('Lizard') // "What the Lizard's Eye Tells the Frog's Brain"
RE2JS.compile('(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
  .matcher('abcdefghijklmnopqrstuvwxyz123')
  .replaceFirst('$10$20') // 'jb0nopqrstuvwxyz123'
```

## Performance

The RE2JS engine runs more slowly than native RegExp objects, and it is also slower than the original RE2 engine. The primary reason is the lack of a synchronous threads solution within the browser environment, which matters because the regex engine requires a synchronous API to operate optimally.

The C++ implementation of the RE2 engine includes both NFA (Nondeterministic Finite Automaton) and DFA (Deterministic Finite Automaton) engines, as well as a variety of optimizations. Russ Cox ported a simplified version of the NFA engine to Go. Later, Alan Donovan ported the NFA-based Go implementation to Java. I then ported the NFA-based Java implementation to a pure JS version. This is another reason why the pure JS version performs more slowly than the original RE2 engine.

If you need high performance on the server side when using RE2, consider the following packages for JS:

- [Node-RE2](https://github.com/uhop/node-re2/): A powerful RE2 package for Node.js
- [RE2-WASM](https://github.com/google/re2-wasm/): A WASM wrapper for RE2. Please note that, as of now, it does not work in browsers

## Justification for this JS port's existence

There are several reasons why a vanilla JavaScript (JS) port of RE2 is worth having.

Firstly, it enables RE2 validation on the client side, within the browser. Regular expression operations can run directly in the browser, which improves performance by reducing the need for server-side computation and back-and-forth communication.

Secondly, it provides a platform for simple RE2 parsing, specifically for the extraction of regex groups. This is particularly useful when dealing with complex regular expressions, as it allows regex patterns to be broken down into manageable and identifiable segments, or "groups".

Together, these factors make the RE2 vanilla JS port a valuable tool for developers who need to work with complex regular expressions in a browser environment.

## Development

Some files, such as `CharGroup.js` and `UnicodeTables.js`, are generated and should be changed through their generator scripts:

```bash
./tools/scripts/make_perl_groups.pl > src/CharGroup.js
yarn node ./tools/scripts/genUnicodeTable.js > src/UnicodeTables.js
```

To run `make_perl_groups.pl` you need to have Perl installed (the version is listed in `.tool-versions`).