re2 1.24.0 → 1.24.1
- package/AGENTS.md +131 -0
- package/ARCHITECTURE.md +152 -0
- package/README.md +1 -0
- package/lib/addon.cc +2 -2
- package/lib/new.cc +6 -6
- package/lib/set.cc +10 -10
- package/lib/wrapped_re2.h +29 -0
- package/llms-full.txt +467 -0
- package/llms.txt +132 -0
- package/package.json +17 -10
- package/re2.js +1 -0
- package/vendor/abseil-cpp/MODULE.bazel +1 -1
- package/vendor/abseil-cpp/absl/base/config.h +1 -1
- package/vendor/abseil-cpp/absl/hash/internal/hash.h +4 -1
- package/vendor/abseil-cpp/absl/strings/escaping.cc +7 -5
- package/vendor/abseil-cpp/absl/strings/escaping_test.cc +4 -0
package/AGENTS.md
ADDED
@@ -0,0 +1,131 @@
+# AGENTS.md — node-re2
+
+> `node-re2` provides Node.js bindings for [RE2](https://github.com/google/re2): a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`.
+
+For project structure, module dependencies, and the architecture overview see [ARCHITECTURE.md](./ARCHITECTURE.md).
+For detailed usage docs see the [README](./README.md) and the [wiki](https://github.com/uhop/node-re2/wiki).
+
+## Setup
+
+This project uses git submodules for vendored dependencies (RE2 and Abseil):
+
+```bash
+git clone --recursive https://github.com/uhop/node-re2.git
+cd node-re2
+npm install
+```
+
+If the native addon fails to download a prebuilt artifact, it builds locally via `node-gyp`.
+
+## Commands
+
+- **Install:** `npm install` (downloads prebuilt artifact or builds from source)
+- **Build (release):** `npm run rebuild` (or `node-gyp -j max rebuild`)
+- **Build (debug):** `npm run rebuild:dev` (or `node-gyp -j max rebuild --debug`)
+- **Test:** `npm test` (runs `tape6 --flags FO`, worker threads)
+- **Test (sequential):** `npm run test:seq`
+- **Test (multi-process):** `npm run test:proc`
+- **Test (single file):** `node tests/test-<name>.mjs`
+- **TypeScript check:** `npm run ts-check`
+- **Lint:** `npm run lint` (Prettier check)
+- **Lint fix:** `npm run lint:fix` (Prettier write)
+- **Verify build:** `npm run verify-build`
+
+## Project structure
+
+```
+node-re2/
+├── package.json # Package config; "tape6" section configures test discovery
+├── binding.gyp # node-gyp build configuration for the C++ addon
+├── re2.js # Main entry point: loads native addon, sets up Symbol aliases
+├── re2.d.ts # TypeScript declarations for the public API
+├── tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
+├── lib/ # C++ source code (native addon)
+│ ├── addon.cc # Node.js addon initialization, method registration
+│ ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
+│ ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
+│ ├── isolate_data.h # Per-isolate data struct for thread-safe addon state
+│ ├── new.cc # Constructor: parse pattern/flags, create RE2 instance
+│ ├── exec.cc # RE2.prototype.exec() implementation
+│ ├── test.cc # RE2.prototype.test() implementation
+│ ├── match.cc # RE2.prototype.match() implementation
+│ ├── replace.cc # RE2.prototype.replace() implementation
+│ ├── search.cc # RE2.prototype.search() implementation
+│ ├── split.cc # RE2.prototype.split() implementation
+│ ├── to_string.cc # RE2.prototype.toString() implementation
+│ ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
+│ ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
+│ ├── set.cc # RE2.Set implementation (multi-pattern matching)
+│ ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
+│ ├── util.h # Utility declarations
+│ └── pattern.h # Pattern translation declarations
+├── scripts/
+│ └── verify-build.js # Quick smoke test for the built addon
+├── tests/ # Test files (test-*.mjs using tape-six)
+├── ts-tests/ # TypeScript type-checking tests
+│ └── test-types.ts # Verifies type declarations compile correctly
+├── bench/ # Benchmarks
+├── vendor/ # Vendored C++ dependencies (git submodules)
+│ ├── re2/ # Google RE2 library source
+│ └── abseil-cpp/ # Abseil C++ library (RE2 dependency)
+└── .github/ # CI workflows, Dependabot config, actions
+```
+
+## Code style
+
+- **CommonJS** throughout (`"type": "commonjs"` in package.json).
+- **No transpilation** — JavaScript code runs directly.
+- **C++ code** uses tabs for indentation, 4-wide. JavaScript uses 2-space indentation.
+- **Prettier** for JS/TS formatting (see `.prettierrc`): 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid".
+- **nan** (Native Abstractions for Node.js) for the C++ addon API.
+- Semicolons are enforced by Prettier (default `semi: true`).
+- Imports use `require()` syntax in source, `import` in tests (`.mjs`).
+
+## Critical rules
+
+- **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules.
+- **Do not modify or delete test expectations** without understanding why they changed.
+- **Do not add comments or remove comments** unless explicitly asked.
+- **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`.
+- **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64).
+- **RE2 is always Unicode-mode.** The `u` flag is always added implicitly.
+- **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input.
+
+## Architecture
+
+- `re2.js` is the main entry point. It loads the native C++ addon from `build/Release/re2.node` and sets up `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` on the prototype.
+- The C++ addon (`lib/*.cc`) wraps Google's RE2 library via nan. Each RegExp method has its own `.cc` file.
+- `lib/new.cc` handles construction: parsing patterns, translating RegExp syntax to RE2 syntax (via `lib/pattern.cc`), and creating the underlying `re2::RE2` instance.
+- `lib/pattern.cc` translates JavaScript RegExp features to RE2 equivalents, including Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`).
+- `lib/set.cc` implements `RE2.Set` for multi-pattern matching using `re2::RE2::Set`.
+- `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion helpers and buffer utilities.
+- Prebuilt native artifacts are hosted on GitHub Releases and downloaded at install time via `install-artifact-from-github`.
+
+## Writing tests
+
+```js
+import test from 'tape-six';
+import {RE2} from '../re2.js';
+
+test('example', t => {
+  const re = new RE2('a(b*)', 'i');
+  const result = re.exec('aBbC');
+  t.ok(result);
+  t.equal(result[0], 'aBb');
+  t.equal(result[1], 'Bb');
+});
+```
+
+- Test files use `tape-six`: `.mjs` for runtime tests, `.ts` for TypeScript typing tests.
+- Test file naming convention: `test-*.mjs` in `tests/`, `test-*.ts` in `ts-tests/`.
+- Tests are configured in `package.json` under the `"tape6"` section.
+- Test files should be directly executable: `node tests/test-foo.mjs`.
+
+## Key conventions
+
+- The library is a drop-in replacement for `RegExp` — the `RE2` object emulates the standard `RegExp` API.
+- `RE2.Set` provides multi-pattern matching: `new RE2.Set(patterns, flags, options)`.
+- Static helpers: `RE2.getUtf8Length(str)`, `RE2.getUtf16Length(buf)`.
+- `RE2.unicodeWarningLevel` controls behavior when non-Unicode regexps are created.
+- The `install` script tries to download a prebuilt `.node` artifact before falling back to `node-gyp rebuild`.
+- All C++ source is in `lib/`, all vendored third-party C++ is in `vendor/`.
package/ARCHITECTURE.md
ADDED
@@ -0,0 +1,152 @@
+# Architecture
+
+`node-re2` provides Node.js bindings for Google's [RE2](https://github.com/google/re2) regular expression engine. It is a C++ native addon built with `node-gyp` and `nan`. The `RE2` object is a drop-in replacement for `RegExp` with guaranteed linear-time matching (no ReDoS).
+
+## Project layout
+
+```
+package.json # Package config; "tape6" section configures test discovery
+binding.gyp # node-gyp build configuration for the C++ addon
+re2.js # Main entry point: loads native addon, sets up Symbol aliases
+re2.d.ts # TypeScript declarations for the public API
+tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
+lib/ # C++ source code (native addon)
+├── addon.cc # Node.js addon initialization, method registration
+├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
+├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
+├── isolate_data.h # Per-isolate data struct for thread-safe addon state
+├── new.cc # Constructor: parse pattern/flags, create RE2 instance
+├── exec.cc # RE2.prototype.exec() implementation
+├── test.cc # RE2.prototype.test() implementation
+├── match.cc # RE2.prototype.match() implementation
+├── replace.cc # RE2.prototype.replace() implementation
+├── search.cc # RE2.prototype.search() implementation
+├── split.cc # RE2.prototype.split() implementation
+├── to_string.cc # RE2.prototype.toString() implementation
+├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
+├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
+├── pattern.h # Pattern translation declarations
+├── set.cc # RE2.Set implementation (multi-pattern matching)
+├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
+└── util.h # Utility declarations
+scripts/
+└── verify-build.js # Quick smoke test for the built addon
+tests/ # Test files (test-*.mjs using tape-six)
+ts-tests/ # TypeScript type-checking tests
+└── test-types.ts # Verifies type declarations compile correctly
+bench/ # Benchmarks
+vendor/ # Vendored C++ dependencies (git submodules) — DO NOT MODIFY
+├── re2/ # Google RE2 library source
+└── abseil-cpp/ # Abseil C++ library (RE2 dependency)
+.github/ # CI workflows, Dependabot config, actions
+```
+
+## Core concepts
+
+### How the addon works
+
+1. `re2.js` is the entry point. It loads the compiled C++ addon from `build/Release/re2.node`.
+2. The addon exposes an `RE2` constructor that wraps `re2::RE2` from Google's RE2 library.
+3. `re2.js` adds `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` to the prototype so `RE2` instances work with ES6 string methods.
+4. The `RE2` constructor can be called with or without `new` (factory mode).
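The Symbol mechanism in step 3 is a standard ECMAScript protocol rather than anything specific to this addon: string methods delegate to whatever object carries the matching well-known symbol. The sketch below illustrates that delegation with a hypothetical `LiteralMatcher` class, which is an assumption made for illustration and is not part of `re2`.

```js
// Hypothetical stand-in for a native matcher: it only finds a literal substring.
class LiteralMatcher {
  constructor(needle) {
    this.needle = needle;
  }

  // Invoked by String.prototype.search(matcher).
  [Symbol.search](str) {
    return str.indexOf(this.needle);
  }

  // Invoked by String.prototype.match(matcher).
  [Symbol.match](str) {
    const index = str.indexOf(this.needle);
    if (index < 0) return null;
    return Object.assign([this.needle], {index, input: str});
  }

  // Invoked by String.prototype.split(matcher, limit).
  [Symbol.split](str, limit) {
    return str.split(this.needle, limit);
  }
}

const matcher = new LiteralMatcher('world');
console.log('hello world'.search(matcher)); // 6
console.log('hello world'.match(matcher).index); // 6
console.log('a-b-c'.split(new LiteralMatcher('-'))); // [ 'a', 'b', 'c' ]
```

Because the string methods only look for the symbol-keyed method, an object does not need to inherit from `RegExp` to take part in `str.match(...)` and its siblings.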
+
+### C++ addon structure
+
+Each RegExp method has its own `.cc` file for maintainability:
+
+| File | Purpose |
+| --------------- | ---------------------------------------------------------------- |
+| `addon.cc` | Node.js module initialization, registers all methods/accessors |
+| `isolate_data.h` | Per-isolate data struct (`AddonData`) for thread-safe addon state |
+| `wrapped_re2.h` | `WrappedRE2` class: holds `re2::RE2*`, flags, lastIndex, source |
+| `new.cc` | Constructor: parses pattern + flags, translates syntax, creates RE2 instance |
+| `exec.cc` | `exec()` — find match with capture groups |
+| `test.cc` | `test()` — boolean match check |
+| `match.cc` | `match()` — String.prototype.match equivalent |
+| `replace.cc` | `replace()` — substitution with string or function replacer |
+| `search.cc` | `search()` — find index of first match |
+| `split.cc` | `split()` — split string by pattern |
+| `to_string.cc` | `toString()` — `/pattern/flags` representation |
+| `accessors.cc` | Property getters: `source`, `flags`, `lastIndex`, `global`, `ignoreCase`, `multiline`, `dotAll`, `unicode`, `sticky`, `hasIndices`, `internalSource` |
+| `pattern.cc` | Translates JS RegExp syntax to RE2 syntax, maps Unicode property names |
+| `set.cc` | `RE2.Set` — multi-pattern matching via `re2::RE2::Set` |
+| `util.cc` | UTF-8 ↔ UTF-16 conversion, buffer/string helpers |
+
+### Pattern translation (pattern.cc)
+
+JavaScript RegExp features are translated to RE2 equivalents:
+
+- Named groups: `(?<name>...)` syntax is preserved (RE2 supports it natively).
+- Unicode classes: long names like `\p{Letter}` are mapped to short names `\p{L}`. Script names like `\p{Script=Latin}` are mapped to `\p{Latin}`.
+- Backreferences and lookahead assertions are **not supported** — RE2 throws `SyntaxError`.
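The Unicode class mapping described above can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not the code in `lib/pattern.cc` (which is C++ and not shown in this diff): the function name `translateUnicodeClasses` and the three-entry table are assumptions, and the sketch does not handle an escaped backslash in front of `p`.

```js
// Illustrative subset only; the real table in lib/pattern.cc is not shown here.
const LONG_TO_SHORT = new Map([
  ['Letter', 'L'],
  ['Number', 'N'],
  ['Punctuation', 'P']
]);

// Rewrites \p{...} and \P{...} property names into the short forms RE2 expects.
function translateUnicodeClasses(pattern) {
  return pattern.replace(/\\([pP])\{([^}]+)\}/g, (whole, p, name) => {
    // \p{Script=Latin} becomes \p{Latin}
    if (name.startsWith('Script=')) return `\\${p}{${name.slice(7)}}`;
    // \p{Letter} becomes \p{L}; unknown names pass through unchanged
    const short = LONG_TO_SHORT.get(name);
    return short ? `\\${p}{${short}}` : whole;
  });
}

console.log(translateUnicodeClasses('\\p{Letter}+')); // \p{L}+
console.log(translateUnicodeClasses('\\P{Script=Latin}')); // \P{Latin}
```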
+
+### Buffer support
+
+All methods accept both strings and Node.js Buffers:
+
+- Buffer inputs are assumed UTF-8 encoded.
+- Buffer inputs produce Buffer outputs (in composite result objects too).
+- Offsets and lengths are in bytes (not characters) when using Buffers.
+- The `useBuffers` property on replacer functions controls offset reporting in `replace()`.
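The difference between byte offsets and character offsets is easy to see with Node.js built-ins alone. The sketch below does not call `re2` at all; it only shows why the same match position is reported differently for a string and for its UTF-8 Buffer when the text contains multi-byte characters.

```js
const text = 'тест матч';
const needle = 'матч';

// String offsets count UTF-16 code units.
const charOffset = text.indexOf(needle);

// Buffer offsets count UTF-8 bytes; each Cyrillic letter here takes 2 bytes.
const buf = Buffer.from(text, 'utf8');
const byteOffset = buf.indexOf(Buffer.from(needle, 'utf8'));

console.log(charOffset); // 5
console.log(byteOffset); // 9
console.log(text.length, buf.length); // 9 17
```

For pure ASCII input the two offsets coincide, which is why the distinction tends to surface only with non-Latin text.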
+
+### RE2.Set (set.cc)
+
+Multi-pattern matching using `re2::RE2::Set`:
+
+- `new RE2.Set(patterns, flags?, options?)` — compile multiple patterns into a single automaton.
+- `set.test(str)` — returns `true` if any pattern matches.
+- `set.match(str)` — returns array of indices of matching patterns.
+- Properties: `size`, `source`, `sources`, `flags`, `anchor`.
+
+### Build system
+
+- `binding.gyp` defines the node-gyp build: compiles all `.cc` files in `lib/` plus vendored RE2 and Abseil sources.
+- Platform-specific compiler flags are set for GCC, Clang, and MSVC.
+- The `install` npm script first tries to download a prebuilt `re2.node` from GitHub Releases via `install-artifact-from-github`, falling back to a local `node-gyp rebuild`.
+- Prebuilt artifacts cover: Linux (x64, arm64, Alpine/musl), macOS (x64, arm64), Windows (x64, arm64).
+
+## Module dependency graph
+
+```
+re2.js ──→ build/Release/re2.node (compiled C++ addon)
+│
+├── lib/addon.cc (init)
+│ ├── lib/new.cc ──→ lib/pattern.cc
+│ ├── lib/exec.cc
+│ ├── lib/test.cc
+│ ├── lib/match.cc
+│ ├── lib/replace.cc
+│ ├── lib/search.cc
+│ ├── lib/split.cc
+│ ├── lib/to_string.cc
+│ ├── lib/accessors.cc
+│ └── lib/set.cc
+│
+├── lib/wrapped_re2.h (shared class definition)
+├── lib/wrapped_re2_set.h (RE2.Set class)
+├── lib/util.cc / lib/util.h (shared utilities)
+│
+└── vendor/ (re2 + abseil-cpp)
+```
+
+## Testing
+
+- **Framework**: tape-six (`tape6`)
+- **Run all**: `npm test` (worker threads via `tape6 --flags FO`)
+- **Run sequential**: `npm run test:seq`
+- **Run multi-process**: `npm run test:proc`
+- **Run single file**: `node tests/test-<name>.mjs`
+- **TypeScript check**: `npm run ts-check`
+- **Lint**: `npm run lint` (Prettier check)
+- **Lint fix**: `npm run lint:fix` (Prettier write)
+- **Verify build**: `npm run verify-build` (quick smoke test)
+
+## Import paths
+
+```js
+// CommonJS (source, scripts)
+const RE2 = require('re2');
+
+// ESM (tests)
+import {RE2} from '../re2.js';
+```
package/README.md
CHANGED
@@ -385,6 +385,7 @@ The same applies to `\P{...}`.
 
 ## Release history
 
+- 1.24.1 *Support for Node 22, 24, 26 + precompiled binaries.*
 - 1.24.0 *Fixed multi-threaded crash in worker threads (#235). Added named import: `import {RE2} from 're2'`. Added CJS test. Updated docs and dependencies.*
 - 1.23.3 *Updated Abseil and dev dependencies.*
 - 1.23.2 *Updated dev dependencies.*
package/lib/addon.cc
CHANGED
@@ -40,7 +40,7 @@ static NAN_METHOD(GetUtf8Length)
 return;
 }
 auto s = t.ToLocalChecked();
-info.GetReturnValue().Set(static_cast<int>(s
+info.GetReturnValue().Set(static_cast<int>(utf8Length(s, v8::Isolate::GetCurrent())));
 }
 
 static NAN_METHOD(GetUtf16Length)
@@ -197,7 +197,7 @@ const StrVal &WrappedRE2::prepareArgument(const v8::Local<v8::Value> &arg, bool
 auto isolate = v8::Isolate::GetCurrent();
 
 auto s = t.ToLocalChecked();
-auto argLength = s
+auto argLength = utf8Length(s, isolate);
 
 auto buffer = node::Buffer::New(isolate, s).ToLocalChecked();
 lastCache.Reset(buffer);
package/lib/new.cc
CHANGED
@@ -76,10 +76,10 @@ NAN_METHOD(WrappedRE2::New)
 auto isolate = v8::Isolate::GetCurrent();
 auto t = info[1]->ToString(Nan::GetCurrentContext());
 auto s = t.ToLocalChecked();
-size = s
+size = utf8Length(s, isolate);
 buffer.resize(size + 1);
 data = &buffer[0];
-s
+writeUtf8(s, isolate, data, buffer.size());
 buffer[size] = '\0';
 }
 else if (node::Buffer::HasInstance(info[1]))
@@ -134,10 +134,10 @@ NAN_METHOD(WrappedRE2::New)
 auto isolate = v8::Isolate::GetCurrent();
 auto t = re->GetSource()->ToString(Nan::GetCurrentContext());
 auto s = t.ToLocalChecked();
-size = s
+size = utf8Length(s, isolate);
 buffer.resize(size + 1);
 data = &buffer[0];
-s
+writeUtf8(s, isolate, data, buffer.size());
 buffer[size] = '\0';
 
 source = escapeRegExp(data, size);
@@ -192,10 +192,10 @@ NAN_METHOD(WrappedRE2::New)
 auto isolate = v8::Isolate::GetCurrent();
 auto t = info[0]->ToString(Nan::GetCurrentContext());
 auto s = t.ToLocalChecked();
-size = s
+size = utf8Length(s, isolate);
 buffer.resize(size + 1);
 data = &buffer[0];
-s
+writeUtf8(s, isolate, data, buffer.size());
 buffer[size] = '\0';
 
 source = escapeRegExp(data, size);
package/lib/set.cc
CHANGED
@@ -34,9 +34,9 @@ static bool parseFlags(const v8::Local<v8::Value> &arg, SetFlags &flags)
 return false;
 }
 auto s = t.ToLocalChecked();
-size = s
+size = utf8Length(s, isolate);
 buffer.resize(size + 1);
-s
+writeUtf8(s, isolate, &buffer[0], buffer.size());
 buffer[buffer.size() - 1] = '\0';
 data = &buffer[0];
 }
@@ -287,10 +287,10 @@ static bool fillInput(const v8::Local<v8::Value> &arg, StrVal &str, v8::Local<v8
 return false;
 }
 auto s = t.ToLocalChecked();
-auto
+auto len = utf8Length(s, isolate);
 auto buffer = node::Buffer::New(isolate, s).ToLocalChecked();
 keepAlive = buffer;
-str.reset(buffer, node::Buffer::Length(buffer),
+str.reset(buffer, node::Buffer::Length(buffer), len, 0);
 return true;
 }
 
@@ -331,7 +331,7 @@ static const char setDeprecationMessage[] = "BMP patterns aren't supported by no
 NAN_METHOD(WrappedRE2Set::New)
 {
 auto context = Nan::GetCurrentContext();
-auto isolate =
+auto isolate = v8::Isolate::GetCurrent();
 
 if (!info.IsConstructCall())
 {
@@ -340,7 +340,7 @@ NAN_METHOD(WrappedRE2Set::New)
 {
 parameters[i] = info[i];
 }
-auto isolate =
+auto isolate = v8::Isolate::GetCurrent();
 auto addonData = getAddonData(isolate);
 if (!addonData) return;
 auto maybeNew = Nan::NewInstance(Nan::GetFunction(addonData->re2SetTpl.Get(isolate)).ToLocalChecked(), parameters.size(), &parameters[0]);
@@ -513,9 +513,9 @@ NAN_METHOD(WrappedRE2Set::New)
 return;
 }
 auto s = t.ToLocalChecked();
-size = s
+size = utf8Length(s, isolate);
 buffer.resize(size + 1);
-s
+writeUtf8(s, isolate, &buffer[0], buffer.size());
 buffer[size] = '\0';
 data = &buffer[0];
 source = escapeRegExp(data, size);
@@ -528,9 +528,9 @@ NAN_METHOD(WrappedRE2Set::New)
 return;
 }
 auto s = t.ToLocalChecked();
-size = s
+size = utf8Length(s, isolate);
 buffer.resize(size + 1);
-s
+writeUtf8(s, isolate, &buffer[0], buffer.size());
 buffer[size] = '\0';
 data = &buffer[0];
 source = escapeRegExp(data, size);
package/lib/wrapped_re2.h
CHANGED
@@ -225,6 +225,35 @@ inline size_t getUtf8CharSize(char ch)
 return ((0xE5000000 >> ((ch >> 3) & 0x1E)) & 3) + 1;
 }
 
+// V8 13.4 introduced Utf8LengthV2 / WriteUtf8V2; V8 14.6 removed the bare
+// Utf8Length / WriteUtf8. On older V8 (Node 22) only the bare forms exist.
+#if defined(V8_MAJOR_VERSION) && (V8_MAJOR_VERSION > 13 || \
+(V8_MAJOR_VERSION == 13 && defined(V8_MINOR_VERSION) && V8_MINOR_VERSION >= 4))
+
+inline size_t utf8Length(v8::Local<v8::String> s, v8::Isolate *isolate)
+{
+return s->Utf8LengthV2(isolate);
+}
+
+inline void writeUtf8(v8::Local<v8::String> s, v8::Isolate *isolate, char *buffer, size_t capacity)
+{
+s->WriteUtf8V2(isolate, buffer, capacity);
+}
+
+#else
+
+inline size_t utf8Length(v8::Local<v8::String> s, v8::Isolate *isolate)
+{
+return static_cast<size_t>(s->Utf8Length(isolate));
+}
+
+inline void writeUtf8(v8::Local<v8::String> s, v8::Isolate *isolate, char *buffer, size_t capacity)
+{
+s->WriteUtf8(isolate, buffer, static_cast<int>(capacity));
+}
+
+#endif
+
 inline size_t getUtf16PositionByCounter(const char *data, size_t from, size_t n)
 {
 for (; n > 0; --n)
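The preprocessor condition in the `wrapped_re2.h` hunk above selects the `V2` calls when V8 is 13.4 or newer. To check which branch a given Node.js runtime would fall into, the same comparison can be mirrored in JavaScript against `process.versions.v8`. This is a rough sketch: the helper name `usesUtf8V2` is hypothetical, and the result only approximates the compile-time decision, which depends on the V8 headers the addon was built against.

```js
// Mirrors the preprocessor test: V8 major > 13, or major == 13 with minor >= 4.
function usesUtf8V2(v8Version) {
  const [major, minor] = v8Version.split('.').map(part => parseInt(part, 10));
  return major > 13 || (major === 13 && minor >= 4);
}

// process.versions.v8 looks like "12.4.254.21-node.33".
console.log(process.versions.v8, usesUtf8V2(process.versions.v8));
```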
package/llms-full.txt
ADDED
@@ -0,0 +1,467 @@
+# node-re2
+
+> Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS (Regular Expression Denial of Service). Works with strings and Buffers. C++ native addon built with node-gyp and nan.
+
+- Drop-in replacement for RegExp with linear-time matching guarantee
+- Prevents ReDoS by disallowing backreferences and lookahead assertions
+- Full Unicode mode (always on)
+- Buffer support for high-performance binary/UTF-8 processing
+- Named capture groups
+- Symbol-based methods (Symbol.match, Symbol.search, Symbol.replace, Symbol.split, Symbol.matchAll)
+- RE2.Set for multi-pattern matching
+- Prebuilt binaries for Linux, macOS, Windows (x64 + arm64)
+- TypeScript declarations included
+
+## Install
+
+```bash
+npm install re2
+```
+
+Prebuilt native binaries are downloaded automatically. Falls back to building from source via node-gyp if no prebuilt is available.
+
+## Quick start
+
+```js
+const RE2 = require('re2');
+
+// Create and use like RegExp
+const re = new RE2('a(b*)', 'i');
+const result = re.exec('aBbC');
+console.log(result[0]); // "aBb"
+console.log(result[1]); // "Bb"
+
+// Works with ES6 string methods
+'hello world'.match(new RE2('\\w+', 'g')); // ['hello', 'world']
+'hello world'.replace(new RE2('world'), 'RE2'); // 'hello RE2'
+```
+
+## Importing
+
+```js
+// CommonJS
+const RE2 = require('re2');
+
+// ESM
+import { RE2 } from 're2';
+```
+
+## Construction
+
+`new RE2(pattern[, flags])` or `RE2(pattern[, flags])` (factory mode).
+
+Pattern can be:
+- **String**: `new RE2('\\d+')`
+- **String with flags**: `new RE2('\\d+', 'gi')`
+- **RegExp**: `new RE2(/ab*/ig)` — copies pattern and flags.
+- **RE2**: `new RE2(existingRE2)` — copies pattern and flags.
+- **Buffer**: `new RE2(Buffer.from('pattern'))` — pattern from UTF-8 buffer.
+
+Supported flags:
+- `g` — global (find all matches)
+- `i` — ignoreCase
+- `m` — multiline (`^`/`$` match line boundaries)
+- `s` — dotAll (`.` matches `\n`)
+- `u` — unicode (always on, added implicitly)
+- `y` — sticky (match at lastIndex only)
+- `d` — hasIndices (include index info for capture groups)
+
+Invalid patterns throw `SyntaxError`. Patterns with backreferences or lookahead throw `SyntaxError`.
+
+## Properties
+
+### Instance properties
+
+- `re.source` (string) — the pattern string, escaped for use in `new RE2(re.source)` or `new RegExp(re.source)`.
+- `re.flags` (string) — the flags string (e.g., `'giu'`).
+- `re.lastIndex` (number) — the index at which to start the next match (used with `g` or `y` flags).
+- `re.global` (boolean) — whether the `g` flag is set.
+- `re.ignoreCase` (boolean) — whether the `i` flag is set.
+- `re.multiline` (boolean) — whether the `m` flag is set.
+- `re.dotAll` (boolean) — whether the `s` flag is set.
+- `re.unicode` (boolean) — always `true` (RE2 always operates in Unicode mode).
+- `re.sticky` (boolean) — whether the `y` flag is set.
+- `re.hasIndices` (boolean) — whether the `d` flag is set.
+- `re.internalSource` (string) — the RE2-translated pattern (for debugging; may differ from `source`).
+
+### Static properties
+
+- `RE2.unicodeWarningLevel` (string) — controls behavior when a non-Unicode regexp is created:
+  - `'nothing'` (default) — silently add `u` flag.
+  - `'warnOnce'` — warn once, then silently add `u`. Assigning resets the one-time flag.
+  - `'warn'` — warn every time.
+  - `'throw'` — throw `SyntaxError` every time.
+
+## RegExp methods
+
+### re.exec(str)
+
+Executes a search for a match. Returns a result array or `null`.
+
+```js
+const re = new RE2('a(b+)', 'g');
+const result = re.exec('abbc abbc');
+// result[0] === 'abb'
+// result[1] === 'bb'
+// result.index === 0
+// result.input === 'abbc abbc'
+// re.lastIndex === 3
+```
+
+With `d` flag (hasIndices), result has `.indices` property with `[start, end]` pairs for each group.
+
+With `g` or `y` flag, advances `lastIndex`. Call repeatedly to iterate matches.
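Iterating with repeated `exec()` calls follows the usual `RegExp` loop. The sketch below assumes only the standard `exec()`/`lastIndex` contract that `RE2` is described as emulating; the helper `collectMatches` is hypothetical, and a built-in `RegExp` stands in for an `RE2` instance so the snippet runs without the native addon.

```js
// Works with any object that follows the RegExp exec()/lastIndex contract.
function collectMatches(re, str) {
  // Without the g flag exec() never advances, so the loop would not terminate.
  if (!re.global) throw new TypeError('collectMatches needs the g flag');
  const found = [];
  re.lastIndex = 0; // always start from the beginning of the input
  let m;
  while ((m = re.exec(str)) !== null) {
    found.push({text: m[0], index: m.index});
    // Step past zero-length matches, which would otherwise repeat forever.
    if (m[0].length === 0) re.lastIndex++;
  }
  return found;
}

// Built-in RegExp used as a stand-in for an RE2 instance.
console.log(collectMatches(/a(b+)/g, 'abbc abbc'));
```

When `exec()` finally returns `null`, a global regular expression resets `lastIndex` to 0, so the same instance can be reused for another pass.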
|
|
114
|
+
|
|
115
|
+
### re.test(str)
|
|
116
|
+
|
|
117
|
+
Returns `true` if the pattern matches, `false` otherwise.
|
|
118
|
+
|
|
119
|
+
```js
|
|
120
|
+
new RE2('\\d+').test('abc123'); // true
|
|
121
|
+
new RE2('\\d+').test('abcdef'); // false
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
With `g` or `y` flag, advances `lastIndex`.
|
|
125
|
+
|
|
126
|
+
### re.toString()
|
|
127
|
+
|
|
128
|
+
Returns `'/pattern/flags'` string representation.
|
|
129
|
+
|
|
130
|
+
```js
|
|
131
|
+
new RE2('abc', 'gi').toString(); // '/abc/giu'
|
|
132
|
+
```
|
|
133
|
+
|
|
134
|
+
## String methods (via Symbol)
|
|
135
|
+
|
|
136
|
+
RE2 instances implement well-known symbols, so they work directly with ES6 string methods:
|
|
137
|
+
|
|
138
|
+
### str.match(re) / re[Symbol.match](str)
|
|
139
|
+
|
|
140
|
+
```js
|
|
141
|
+
'test 123 test 456'.match(new RE2('\\d+', 'g')); // ['123', '456']
|
|
142
|
+
'test 123'.match(new RE2('(\\d+)')); // ['123', '123', index: 5, input: 'test 123']
|
|
143
|
+
```
|
|
144
|
+
|
|
145
|
+
### str.matchAll(re) / re[Symbol.matchAll](str)
|
|
146
|
+
|
|
147
|
+
Returns an iterator of all matches (requires `g` flag).
|
|
148
|
+
|
|
149
|
+
```js
|
|
150
|
+
const re = new RE2('\\d+', 'g');
|
|
151
|
+
for (const m of '1a2b3c'.matchAll(re)) {
|
|
152
|
+
console.log(m[0]); // '1', '2', '3'
|
|
153
|
+
}
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
### str.search(re) / re[Symbol.search](str)
|
|
157
|
+
|
|
158
|
+
Returns the index of the first match, or `-1`.
|
|
159
|
+
|
|
160
|
+
```js
|
|
161
|
+
'hello world'.search(new RE2('world')); // 6
|
|
162
|
+
```
|
|
163
|
+
|
|
164
|
+
### str.replace(re, replacement) / re[Symbol.replace](str, replacement)
|
|
165
|
+
|
|
166
|
+
Returns a new string with matches replaced.
|
|
167
|
+
|
|
168
|
+
```js
|
|
169
|
+
'aabba'.replace(new RE2('b', 'g'), 'c'); // 'aacca'
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
Replacement string supports:
|
|
173
|
+
- `$1`, `$2`, ... — numbered capture groups.
|
|
174
|
+
- `$<name>` — named capture groups.
|
|
175
|
+
- `$&` — the matched substring.
|
|
176
|
+
- `` $` `` — portion before the match.
|
|
177
|
+
- `$'` — portion after the match.
|
|
178
|
+
- `$$` — literal `$`.
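The replacement tokens listed above can be tried out with the built-in `RegExp` as a stand-in. This is a sketch that assumes RE2 expands the tokens the same way; the sample strings are made up for illustration.

```javascript
// Stand-in sketch: built-in RegExp, assuming RE2 expands these tokens identically.
const numbered = 'john smith'.replace(/(\w+) (\w+)/, '$2 $1');                 // numbered groups
const named = '2024-01'.replace(/(?<y>\d{4})-(?<m>\d{2})/, '$<m>/$<y>');       // named groups
const wrapped = 'aabba'.replace(/b/g, '[$&]');                                 // matched substring
const beforeMatch = 'abc'.replace(/b/, '$`');                                  // text before the match
const afterMatch = 'abc'.replace(/b/, "$'");                                   // text after the match
const literalDollar = 'abc'.replace(/b/, '$$');                                // literal dollar sign

console.log(numbered, named, wrapped, beforeMatch, afterMatch, literalDollar);
// smith john 01/2024 aa[b][b]a aac acc a$c
```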
|
|
179
|
+
|
|
180
|
+
Replacement function receives `(match, ...groups, offset, input)`:
|
|
181
|
+
|
|
182
|
+
```js
|
|
183
|
+
'abc'.replace(new RE2('(b)'), (match, g1, offset) => `[${g1}@${offset}]`);
|
|
184
|
+
// 'a[b@1]c'
|
|
185
|
+
```
|
|
186
|
+
|
|
187
|
+
### str.split(re[, limit]) / re[Symbol.split](str[, limit])
|
|
188
|
+
|
|
189
|
+
Splits the string at each match of the pattern. The optional `limit` caps the number of returned pieces.
|
|
190
|
+
|
|
191
|
+
```js
|
|
192
|
+
'a1b2c3'.split(new RE2('\\d')); // ['a', 'b', 'c', '']
|
|
193
|
+
'a1b2c3'.split(new RE2('\\d'), 2); // ['a', 'b']
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
## String methods (direct)
|
|
197
|
+
|
|
198
|
+
These are convenience methods on the RE2 instance with swapped argument order:
|
|
199
|
+
|
|
200
|
+
- `re.match(str)` — equivalent to `str.match(re)`.
|
|
201
|
+
- `re.search(str)` — equivalent to `str.search(re)`.
|
|
202
|
+
- `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
|
|
203
|
+
- `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.
|
|
204
|
+
|
|
205
|
+
```js
|
|
206
|
+
const re = new RE2('\\d+', 'g');
|
|
207
|
+
re.match('test 123 test 456'); // ['123', '456']
|
|
208
|
+
re.search('test 123'); // 5
|
|
209
|
+
re.replace('test 1 and 2', 'N'); // 'test N and N' (global replaces all)
|
|
210
|
+
re.split('a1b2c'); // ['a', 'b', 'c']
|
|
211
|
+
```
|
|
212
|
+
|
|
213
|
+
## Buffer support
|
|
214
|
+
|
|
215
|
+
All methods accept Node.js Buffers (UTF-8) instead of strings. When given Buffer input, they return Buffer output.
|
|
216
|
+
|
|
217
|
+
```js
|
|
218
|
+
const re = new RE2('матч', 'g');
|
|
219
|
+
const buf = Buffer.from('тест матч тест');
|
|
220
|
+
const result = re.exec(buf);
|
|
221
|
+
// result[0] is a Buffer containing 'матч' in UTF-8
|
|
222
|
+
// result.index is in bytes (not characters)
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
Differences from string mode:
|
|
226
|
+
- All offsets and lengths are in **bytes**, not characters.
|
|
227
|
+
- Results contain Buffers instead of strings.
|
|
228
|
+
- Use `buf.toString()` to convert results back to strings.
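The byte-versus-character distinction can be checked with Node.js built-ins alone. The sketch below does not call RE2; it only shows why the same match position is reported differently for a string and for its UTF-8 Buffer.

```javascript
// Sketch using only Node.js built-ins (no RE2 call): the same substring sits at
// different offsets depending on whether positions are counted in characters or bytes.
const text = 'тест матч тест';
const buf = Buffer.from(text); // UTF-8 is the default encoding

const charOffset = text.indexOf('матч'); // counted in UTF-16 code units
const byteOffset = buf.indexOf('матч');  // counted in bytes; each Cyrillic letter takes 2

// Slice by bytes, then convert back to a string for display.
const matchBytes = Buffer.byteLength('матч');
const slice = buf.subarray(byteOffset, byteOffset + matchBytes).toString();

console.log(charOffset, byteOffset, matchBytes, slice); // 5 9 8 матч
```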
|
|
229
|
+
|
|
230
|
+
### useBuffers on replacer functions
|
|
231
|
+
|
|
232
|
+
When using `re.replace(buf, replacerFn)`, the replacer receives string arguments and character offsets by default. Set `replacerFn.useBuffers = true` to receive byte offsets instead:
|
|
233
|
+
|
|
234
|
+
```js
|
|
235
|
+
function replacer(match, offset, input) {
|
|
236
|
+
return '<' + offset + ' bytes>';
|
|
237
|
+
}
|
|
238
|
+
replacer.useBuffers = true;
|
|
239
|
+
new RE2('б').replace(Buffer.from('абв'), replacer);
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
## RE2.Set
|
|
243
|
+
|
|
244
|
+
Multi-pattern matching — compile many patterns into a single automaton and test/match against all of them at once. Faster than testing individual patterns when the number of patterns is large.
|
|
245
|
+
|
|
246
|
+
### Constructor
|
|
247
|
+
|
|
248
|
+
```js
|
|
249
|
+
new RE2.Set(patterns[, flagsOrOptions][, options])
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
- `patterns` — any iterable of strings, Buffers, RegExp, or RE2 instances.
|
|
253
|
+
- `flagsOrOptions` — optional string/Buffer with flags (apply to all patterns), or options object.
|
|
254
|
+
- `options.anchor` — `'unanchored'` (default), `'start'`, or `'both'`.
|
|
255
|
+
|
|
256
|
+
```js
|
|
257
|
+
const set = new RE2.Set([
|
|
258
|
+
'^/users/\\d+$',
|
|
259
|
+
'^/posts/\\d+$',
|
|
260
|
+
'^/api/.*$'
|
|
261
|
+
], 'i', {anchor: 'start'});
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
### set.test(str)
|
|
265
|
+
|
|
266
|
+
Returns `true` if any pattern matches, `false` otherwise.
|
|
267
|
+
|
|
268
|
+
```js
|
|
269
|
+
set.test('/users/42'); // true
|
|
270
|
+
set.test('/unknown'); // false
|
|
271
|
+
```
|
|
272
|
+
|
|
273
|
+
### set.match(str)
|
|
274
|
+
|
|
275
|
+
Returns an array of indices of matching patterns, sorted ascending. Empty array if none match.
|
|
276
|
+
|
|
277
|
+
```js
|
|
278
|
+
set.match('/users/42'); // [0]
|
|
279
|
+
set.match('/api/users'); // [2]
|
|
280
|
+
set.match('/unknown'); // []
|
|
281
|
+
```
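The return shape of `set.match()` can be pictured with a plain-JavaScript approximation. This is not how `RE2.Set` works internally (it compiles a single automaton); the `matchIndices` helper is hypothetical, loops over built-in `RegExp` objects, and only mirrors the documented result.

```javascript
// Hypothetical approximation of the documented result shape, not RE2.Set itself.
// Non-global regexes are used so that test() carries no lastIndex state.
const patterns = [/^\/users\/\d+$/i, /^\/posts\/\d+$/i, /^\/api\/.*$/i];

function matchIndices(str) {
  const hits = [];
  patterns.forEach((pattern, index) => {
    if (pattern.test(str)) hits.push(index); // indices come out in ascending order
  });
  return hits;
}

console.log(matchIndices('/users/42'));  // [ 0 ]
console.log(matchIndices('/api/users')); // [ 2 ]
console.log(matchIndices('/unknown'));   // []
```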
|
|
282
|
+
|
|
283
|
+
### Properties
|
|
284
|
+
|
|
285
|
+
- `set.size` (number) — number of patterns.
|
|
286
|
+
- `set.source` (string) — all patterns joined with `|`.
|
|
287
|
+
- `set.sources` (string[]) — individual pattern sources.
|
|
288
|
+
- `set.flags` (string) — flags string.
|
|
289
|
+
- `set.anchor` (string) — anchor mode.
|
|
290
|
+
|
|
291
|
+
### set.toString()
|
|
292
|
+
|
|
293
|
+
Returns `'/pattern1|pattern2|.../flags'`.
|
|
294
|
+
|
|
295
|
+
```js
|
|
296
|
+
set.toString(); // '/^/users/\\d+$|^/posts/\\d+$|^/api/.*$/iu'
|
|
297
|
+
```
|
|
298
|
+
|
|
299
|
+
## Static helpers
|
|
300
|
+
|
|
301
|
+
### RE2.getUtf8Length(str)
|
|
302
|
+
|
|
303
|
+
Calculates the number of bytes needed to encode a JavaScript (UTF-16) string as UTF-8.
|
|
304
|
+
|
|
305
|
+
```js
|
|
306
|
+
RE2.getUtf8Length('hello'); // 5
|
|
307
|
+
RE2.getUtf8Length('привет'); // 12
|
|
308
|
+
```
|
|
309
|
+
|
|
310
|
+
### RE2.getUtf16Length(buf)
|
|
311
|
+
|
|
312
|
+
Calculates the number of UTF-16 code units needed to represent a UTF-8 buffer as a JavaScript string.
|
|
313
|
+
|
|
314
|
+
```js
|
|
315
|
+
RE2.getUtf16Length(Buffer.from('hello')); // 5
|
|
316
|
+
RE2.getUtf16Length(Buffer.from('привет')); // 6
|
|
317
|
+
```
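The lengths these helpers report can be cross-checked with Node.js built-ins. The sketch below makes no RE2 calls; it assumes the helpers agree with `Buffer.byteLength` and `String.prototype.length` for well-formed input, which this reference does not state explicitly.

```javascript
// Cross-check sketch using only Node.js built-ins (assumed to agree with the RE2 helpers).
const samples = ['hello', 'привет', '😀'];

const rows = samples.map((text) => ({
  text,
  utf16Units: text.length,                  // UTF-16 code units in the JS string
  utf8Bytes: Buffer.byteLength(text, 'utf8') // bytes once encoded as UTF-8
}));

for (const row of rows) {
  console.log(row.text, row.utf16Units, row.utf8Bytes);
}
// hello 5 5
// привет 6 12
// 😀 2 4
```

The emoji row shows that a single visible character can take two UTF-16 code units and four UTF-8 bytes, so neither length is a count of user-perceived characters.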
|
|
318
|
+
|
|
319
|
+
## Named groups
|
|
320
|
+
|
|
321
|
+
Named capture groups are supported:
|
|
322
|
+
|
|
323
|
+
```js
|
|
324
|
+
const re = new RE2('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})');
|
|
325
|
+
const result = re.exec('2024-01-15');
|
|
326
|
+
result.groups.year; // '2024'
|
|
327
|
+
result.groups.month; // '01'
|
|
328
|
+
result.groups.day; // '15'
|
|
329
|
+
```
|
|
330
|
+
|
|
331
|
+
Named backreferences in replacement strings:
|
|
332
|
+
|
|
333
|
+
```js
|
|
334
|
+
'2024-01-15'.replace(
|
|
335
|
+
new RE2('(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})'),
|
|
336
|
+
'$<d>/$<m>/$<y>'
|
|
337
|
+
); // '15/01/2024'
|
|
338
|
+
```
|
|
339
|
+
|
|
340
|
+
## Unicode classes
|
|
341
|
+
|
|
342
|
+
RE2 supports Unicode property escapes. Long names are translated to RE2 short names:
|
|
343
|
+
|
|
344
|
+
```js
|
|
345
|
+
new RE2('\\p{Letter}+'); // same as \p{L}+
|
|
346
|
+
new RE2('\\p{Number}+'); // same as \p{N}+
|
|
347
|
+
new RE2('\\p{Script=Latin}+'); // same as \p{Latin}+
|
|
348
|
+
new RE2('\\p{sc=Cyrillic}+'); // same as \p{Cyrillic}+
|
|
349
|
+
new RE2('\\P{Letter}+'); // negated: non-letters
|
|
350
|
+
```
|
|
351
|
+
|
|
352
|
+
Only the `\p{name}` form is supported. The general `\p{name=value}` form is not, with one exception: `Script=` and its short alias `sc=`.
|
|
353
|
+
|
|
354
|
+
## Limitations
|
|
355
|
+
|
|
356
|
+
RE2 does **not** support:
|
|
357
|
+
|
|
358
|
+
- **Backreferences** (`\1`, `\2`, etc.) — throw `SyntaxError`.
|
|
359
|
+
- **Lookahead assertions** (`(?=...)`, `(?!...)`) — throw `SyntaxError`.
|
|
360
|
+
- **Lookbehind assertions** (`(?<=...)`, `(?<!...)`) — throw `SyntaxError`.
|
|
361
|
+
|
|
362
|
+
Fallback pattern:
|
|
363
|
+
|
|
364
|
+
```js
|
|
365
|
+
let re = /pattern-with-lookahead(?=foo)/;
|
|
366
|
+
try {
|
|
367
|
+
re = new RE2(re);
|
|
368
|
+
} catch (e) {
|
|
369
|
+
// use original RegExp as fallback
|
|
370
|
+
}
|
|
371
|
+
const result = re.exec(input);
|
|
372
|
+
```
|
|
373
|
+
|
|
374
|
+
## Common patterns
|
|
375
|
+
|
|
376
|
+
### Drop-in RegExp replacement
|
|
377
|
+
|
|
378
|
+
```js
|
|
379
|
+
const RE2 = require('re2');
|
|
380
|
+
|
|
381
|
+
// Before (vulnerable to ReDoS):
|
|
382
|
+
const unsafeRe = new RegExp(userInput);
|
|
383
|
+
|
|
384
|
+
// After (safe):
|
|
385
|
+
const re = new RE2(userInput);
|
|
386
|
+
```
|
|
387
|
+
|
|
388
|
+
### Process Buffer data efficiently
|
|
389
|
+
|
|
390
|
+
```js
|
|
391
|
+
const RE2 = require('re2');
|
|
392
|
+
const fs = require('fs');
|
|
393
|
+
|
|
394
|
+
const data = fs.readFileSync('large-file.txt');
|
|
395
|
+
const re = new RE2('pattern', 'g');
|
|
396
|
+
let match;
|
|
397
|
+
while ((match = re.exec(data)) !== null) {
|
|
398
|
+
console.log('Found at byte offset:', match.index);
|
|
399
|
+
}
|
|
400
|
+
```
|
|
401
|
+
|
|
402
|
+
### Route matching with RE2.Set
|
|
403
|
+
|
|
404
|
+
```js
|
|
405
|
+
const RE2 = require('re2');
|
|
406
|
+
|
|
407
|
+
const routes = new RE2.Set([
|
|
408
|
+
'^/users/\\d+$',
|
|
409
|
+
'^/posts/\\d+$',
|
|
410
|
+
'^/api/v\\d+/.*$'
|
|
411
|
+
], 'i');
|
|
412
|
+
|
|
413
|
+
function findRoute(path) {
|
|
414
|
+
const matches = routes.match(path);
|
|
415
|
+
return matches.length > 0 ? matches[0] : -1;
|
|
416
|
+
}
|
|
417
|
+
|
|
418
|
+
findRoute('/users/42'); // 0
|
|
419
|
+
findRoute('/posts/7'); // 1
|
|
420
|
+
findRoute('/api/v2/foo'); // 2
|
|
421
|
+
findRoute('/unknown'); // -1
|
|
422
|
+
```
|
|
423
|
+
|
|
424
|
+
### Validate user-supplied patterns safely
|
|
425
|
+
|
|
426
|
+
```js
|
|
427
|
+
const RE2 = require('re2');
|
|
428
|
+
|
|
429
|
+
function safeMatch(input, pattern, flags) {
|
|
430
|
+
try {
|
|
431
|
+
const re = new RE2(pattern, flags);
|
|
432
|
+
return re.test(input);
|
|
433
|
+
} catch (e) {
|
|
434
|
+
return false; // invalid pattern
|
|
435
|
+
}
|
|
436
|
+
}
|
|
437
|
+
```
|
|
438
|
+
|
|
439
|
+
## TypeScript
|
|
440
|
+
|
|
441
|
+
```ts
|
|
442
|
+
import RE2 from 're2';
|
|
443
|
+
|
|
444
|
+
const re: RE2 = new RE2('\\d+', 'g');
|
|
445
|
+
const result: RegExpExecArray | null = re.exec('test 123');
|
|
446
|
+
|
|
447
|
+
// Buffer overloads
|
|
448
|
+
const bufResult: RE2BufferExecArray | null = re.exec(Buffer.from('test 123'));
|
|
449
|
+
|
|
450
|
+
// RE2.Set
|
|
451
|
+
const set: RE2Set = new RE2.Set(['a', 'b'], 'i');
|
|
452
|
+
const matches: number[] = set.match('abc');
|
|
453
|
+
```
|
|
454
|
+
|
|
455
|
+
## Project structure notes
|
|
456
|
+
|
|
457
|
+
- Entry point: `re2.js` (loads native addon), types: `re2.d.ts`.
|
|
458
|
+
- C++ addon source: `lib/*.cc`, `lib/*.h`.
|
|
459
|
+
- Tests: `tests/test-*.mjs` (runtime), `ts-tests/test-*.ts` (type-checking).
|
|
460
|
+
- Vendored dependencies: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules) — **never modify files under `vendor/`**.
|
|
461
|
+
|
|
462
|
+
## Links
|
|
463
|
+
|
|
464
|
+
- Docs: https://github.com/uhop/node-re2/wiki
|
|
465
|
+
- npm: https://www.npmjs.com/package/re2
|
|
466
|
+
- Repository: https://github.com/uhop/node-re2
|
|
467
|
+
- RE2 syntax: https://github.com/google/re2/wiki/Syntax
|
package/llms.txt
ADDED
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
# node-re2
|
|
2
|
+
|
|
3
|
+
> Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS. Works with strings and Buffers.
|
|
4
|
+
|
|
5
|
+
## Install
|
|
6
|
+
|
|
7
|
+
npm install re2
|
|
8
|
+
|
|
9
|
+
## Quick start
|
|
10
|
+
|
|
11
|
+
```js
|
|
12
|
+
// CommonJS
|
|
13
|
+
const RE2 = require('re2');
|
|
14
|
+
|
|
15
|
+
// ESM
|
|
16
|
+
import {RE2} from 're2';
|
|
17
|
+
|
|
18
|
+
const re = new RE2('a(b*)', 'i');
|
|
19
|
+
const result = re.exec('aBbC');
|
|
20
|
+
console.log(result[0]); // "aBb"
|
|
21
|
+
console.log(result[1]); // "Bb"
|
|
22
|
+
```
|
|
23
|
+
|
|
24
|
+
## Why use node-re2?
|
|
25
|
+
|
|
26
|
+
The built-in Node.js RegExp engine can run in exponential time with vulnerable patterns (ReDoS). RE2 guarantees linear-time matching by disallowing backreferences and lookahead assertions.
|
|
27
|
+
|
|
28
|
+
## API
|
|
29
|
+
|
|
30
|
+
### Construction
|
|
31
|
+
|
|
32
|
+
```js
|
|
33
|
+
const RE2 = require('re2');
|
|
34
|
+
|
|
35
|
+
const re1 = new RE2('\\d+'); // from string
|
|
36
|
+
const re2 = new RE2('\\d+', 'gi'); // with flags
|
|
37
|
+
const re3 = new RE2(/ab*/ig); // from RegExp
|
|
38
|
+
const re4 = new RE2(re3); // from another RE2
|
|
39
|
+
const re5 = RE2('\\d+'); // factory (no new)
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
Supported flags: `g` (global), `i` (ignoreCase), `m` (multiline), `s` (dotAll), `u` (unicode, always on), `y` (sticky), `d` (hasIndices).
|
|
43
|
+
|
|
44
|
+
### RegExp methods
|
|
45
|
+
|
|
46
|
+
- `re.exec(str)` — find match with capture groups.
|
|
47
|
+
- `re.test(str)` — boolean match check.
|
|
48
|
+
- `re.toString()` — `/pattern/flags` representation.
|
|
49
|
+
|
|
50
|
+
### String methods (via Symbol)
|
|
51
|
+
|
|
52
|
+
RE2 instances work with ES6 string methods:
|
|
53
|
+
|
|
54
|
+
```js
|
|
55
|
+
'abc'.match(re);
|
|
56
|
+
'abc'.search(re);
|
|
57
|
+
'abc'.replace(re, 'x');
|
|
58
|
+
'abc'.split(re);
|
|
59
|
+
Array.from('abc'.matchAll(re));
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
### String methods (direct)
|
|
63
|
+
|
|
64
|
+
- `re.match(str)` — equivalent to `str.match(re)`.
|
|
65
|
+
- `re.search(str)` — equivalent to `str.search(re)`.
|
|
66
|
+
- `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
|
|
67
|
+
- `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.
|
|
68
|
+
|
|
69
|
+
### Properties
|
|
70
|
+
|
|
71
|
+
- `re.source` — pattern string.
|
|
72
|
+
- `re.flags` — flags string.
|
|
73
|
+
- `re.lastIndex` — index for next match (with `g` or `y` flag).
|
|
74
|
+
- `re.global`, `re.ignoreCase`, `re.multiline`, `re.dotAll`, `re.unicode`, `re.sticky`, `re.hasIndices` — boolean flag accessors.
|
|
75
|
+
- `re.internalSource` — RE2-translated pattern (for debugging).
|
|
76
|
+
|
|
77
|
+
### Buffer support
|
|
78
|
+
|
|
79
|
+
All methods accept Buffers (UTF-8) instead of strings. Buffer input produces Buffer output. Offsets are in bytes.
|
|
80
|
+
|
|
81
|
+
```js
|
|
82
|
+
const re = new RE2('матч', 'g');
|
|
83
|
+
const buf = Buffer.from('тест матч тест');
|
|
84
|
+
const result = re.exec(buf);
|
|
85
|
+
// result[0] is a Buffer
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
### RE2.Set
|
|
89
|
+
|
|
90
|
+
Multi-pattern matching — test a string against many patterns at once.
|
|
91
|
+
|
|
92
|
+
```js
|
|
93
|
+
const set = new RE2.Set(['^/users/\\d+$', '^/posts/\\d+$'], 'i');
|
|
94
|
+
set.test('/users/7'); // true
|
|
95
|
+
set.match('/posts/42'); // [1]
|
|
96
|
+
set.sources; // ['^/users/\\d+$', '^/posts/\\d+$']
|
|
97
|
+
```
|
|
98
|
+
|
|
99
|
+
- `new RE2.Set(patterns[, flags][, options])` — compile patterns.
|
|
100
|
+
- `options.anchor`: `'unanchored'` (default), `'start'`, or `'both'`.
|
|
101
|
+
- `set.test(str)` — returns `true` if any pattern matches.
|
|
102
|
+
- `set.match(str)` — returns array of matching pattern indices.
|
|
103
|
+
- Properties: `size`, `source`, `sources`, `flags`, `anchor`.
|
|
104
|
+
|
|
105
|
+
### Static helpers
|
|
106
|
+
|
|
107
|
+
- `RE2.getUtf8Length(str)` — byte size of string as UTF-8.
|
|
108
|
+
- `RE2.getUtf16Length(buf)` — character count of UTF-8 buffer as UTF-16 string.
|
|
109
|
+
- `RE2.unicodeWarningLevel` — `'nothing'` (default), `'warnOnce'`, `'warn'`, or `'throw'`.
|
|
110
|
+
|
|
111
|
+
## Limitations
|
|
112
|
+
|
|
113
|
+
RE2 does not support:
|
|
114
|
+
- **Backreferences** (`\1`, `\2`, etc.)
|
|
115
|
+
- **Lookahead assertions** (`(?=...)`, `(?!...)`)
|
|
116
|
+
|
|
117
|
+
These throw `SyntaxError`. Use try-catch to fall back to RegExp when needed:
|
|
118
|
+
|
|
119
|
+
```js
|
|
120
|
+
let re = /pattern-with-lookahead/;
|
|
121
|
+
try { re = new RE2(re); } catch (e) { /* use original RegExp */ }
|
|
122
|
+
```
|
|
123
|
+
|
|
124
|
+
## Project notes
|
|
125
|
+
|
|
126
|
+
- C++ addon source is in `lib/`. Vendored deps (`vendor/re2/`, `vendor/abseil-cpp/`) are git submodules — **never modify files under `vendor/`**.
|
|
127
|
+
|
|
128
|
+
## Links
|
|
129
|
+
|
|
130
|
+
- Docs: https://github.com/uhop/node-re2/wiki
|
|
131
|
+
- npm: https://www.npmjs.com/package/re2
|
|
132
|
+
- Full LLM reference: https://github.com/uhop/node-re2/blob/master/llms-full.txt
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "re2",
|
|
3
|
-
"version": "1.24.
|
|
3
|
+
"version": "1.24.1",
|
|
4
4
|
"description": "Bindings for RE2: fast, safe alternative to backtracking regular expression engines.",
|
|
5
5
|
"homepage": "https://github.com/uhop/node-re2",
|
|
6
6
|
"bugs": "https://github.com/uhop/node-re2/issues",
|
|
@@ -8,24 +8,28 @@
|
|
|
8
8
|
"main": "re2.js",
|
|
9
9
|
"types": "re2.d.ts",
|
|
10
10
|
"files": [
|
|
11
|
+
"AGENTS.md",
|
|
12
|
+
"ARCHITECTURE.md",
|
|
11
13
|
"binding.gyp",
|
|
12
14
|
"lib",
|
|
15
|
+
"llms-full.txt",
|
|
16
|
+
"llms.txt",
|
|
13
17
|
"re2.d.ts",
|
|
14
18
|
"scripts/*.js",
|
|
15
19
|
"vendor"
|
|
16
20
|
],
|
|
17
21
|
"dependencies": {
|
|
18
|
-
"install-artifact-from-github": "^1.
|
|
19
|
-
"nan": "^2.
|
|
20
|
-
"node-gyp": "^12.
|
|
22
|
+
"install-artifact-from-github": "^1.6.0",
|
|
23
|
+
"nan": "^2.27.0",
|
|
24
|
+
"node-gyp": "^12.3.0"
|
|
21
25
|
},
|
|
22
26
|
"devDependencies": {
|
|
23
|
-
"@types/node": "^25.
|
|
27
|
+
"@types/node": "^25.7.0",
|
|
24
28
|
"nano-benchmark": "^1.0.15",
|
|
25
|
-
"prettier": "^3.8.
|
|
26
|
-
"tape-six": "^1.
|
|
27
|
-
"tape-six-proc": "^1.2.
|
|
28
|
-
"typescript": "^6.0.
|
|
29
|
+
"prettier": "^3.8.3",
|
|
30
|
+
"tape-six": "^1.9.0",
|
|
31
|
+
"tape-six-proc": "^1.2.9",
|
|
32
|
+
"typescript": "^6.0.3"
|
|
29
33
|
},
|
|
30
34
|
"scripts": {
|
|
31
35
|
"test": "tape6 --flags FO",
|
|
@@ -49,7 +53,10 @@
|
|
|
49
53
|
"github": "https://github.com/uhop/node-re2",
|
|
50
54
|
"repository": {
|
|
51
55
|
"type": "git",
|
|
52
|
-
"url": "git://github.com/uhop/node-re2.git"
|
|
56
|
+
"url": "git+https://github.com/uhop/node-re2.git"
|
|
57
|
+
},
|
|
58
|
+
"engines": {
|
|
59
|
+
"node": ">=22"
|
|
53
60
|
},
|
|
54
61
|
"keywords": [
|
|
55
62
|
"RegExp",
|
package/re2.js
CHANGED
|
@@ -118,7 +118,7 @@
|
|
|
118
118
|
// LTS releases can be obtained from
|
|
119
119
|
// https://github.com/abseil/abseil-cpp/releases.
|
|
120
120
|
#define ABSL_LTS_RELEASE_VERSION 20260107
|
|
121
|
-
#define ABSL_LTS_RELEASE_PATCH_LEVEL
|
|
121
|
+
#define ABSL_LTS_RELEASE_PATCH_LEVEL 1
|
|
122
122
|
|
|
123
123
|
// Helper macro to convert a CPP variable to a string literal.
|
|
124
124
|
#define ABSL_INTERNAL_DO_TOKEN_STR(x) #x
|
|
@@ -104,7 +104,10 @@
|
|
|
104
104
|
#define ABSL_HASH_INTERNAL_CRC32_U32 _mm_crc32_u32
|
|
105
105
|
#define ABSL_HASH_INTERNAL_CRC32_U8 _mm_crc32_u8
|
|
106
106
|
|
|
107
|
-
|
|
107
|
+
// 32-bit builds with AVX do not have _mm_crc32_u64, so the _M_X64 condition is
|
|
108
|
+
// necessary.
|
|
109
|
+
#elif defined(_MSC_VER) && !defined(__clang__) && defined(__AVX__) && \
|
|
110
|
+
defined(_M_X64)
|
|
108
111
|
|
|
109
112
|
// MSVC AVX (/arch:AVX) implies SSE 4.2.
|
|
110
113
|
#include <intrin.h>
|
|
@@ -827,7 +827,7 @@ bool Base64UnescapeInternal(const char* absl_nullable src, size_t slen,
|
|
|
827
827
|
}
|
|
828
828
|
|
|
829
829
|
/* clang-format off */
|
|
830
|
-
constexpr std::array<
|
|
830
|
+
constexpr std::array<uint8_t, 256> kHexValueLenient = {
|
|
831
831
|
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
|
832
832
|
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
|
833
833
|
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
|
@@ -846,7 +846,7 @@ constexpr std::array<char, 256> kHexValueLenient = {
|
|
|
846
846
|
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
|
|
847
847
|
};
|
|
848
848
|
|
|
849
|
-
constexpr std::array<
|
|
849
|
+
constexpr std::array<int8_t, 256> kHexValueStrict = {
|
|
850
850
|
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
|
|
851
851
|
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
|
|
852
852
|
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
|
|
@@ -874,7 +874,7 @@ void HexStringToBytesInternal(const char* absl_nullable from, T to,
|
|
|
874
874
|
size_t num) {
|
|
875
875
|
for (size_t i = 0; i < num; i++) {
|
|
876
876
|
to[i] = static_cast<char>(kHexValueLenient[from[i * 2] & 0xFF] << 4) +
|
|
877
|
-
(kHexValueLenient[from[i * 2 + 1] & 0xFF]);
|
|
877
|
+
static_cast<char>(kHexValueLenient[from[i * 2 + 1] & 0xFF]);
|
|
878
878
|
}
|
|
879
879
|
}
|
|
880
880
|
|
|
@@ -992,8 +992,10 @@ bool HexStringToBytes(absl::string_view hex, std::string* absl_nonnull bytes) {
|
|
|
992
992
|
output, num_bytes, [hex](char* buf, size_t buf_size) {
|
|
993
993
|
auto hex_p = hex.cbegin();
|
|
994
994
|
for (size_t i = 0; i < buf_size; ++i) {
|
|
995
|
-
int h1 = absl::kHexValueStrict[static_cast<size_t>(
|
|
996
|
-
|
|
995
|
+
int h1 = absl::kHexValueStrict[static_cast<size_t>(
|
|
996
|
+
static_cast<uint8_t>(*hex_p++))];
|
|
997
|
+
int h2 = absl::kHexValueStrict[static_cast<size_t>(
|
|
998
|
+
static_cast<uint8_t>(*hex_p++))];
|
|
997
999
|
if (h1 == -1 || h2 == -1) {
|
|
998
1000
|
return size_t{0};
|
|
999
1001
|
}
|
|
@@ -733,6 +733,10 @@ TEST(Escaping, HexStringToBytesBackToHex) {
|
|
|
733
733
|
bytes = "abc";
|
|
734
734
|
EXPECT_TRUE(absl::HexStringToBytes("", &bytes));
|
|
735
735
|
EXPECT_EQ("", bytes); // Results in empty output.
|
|
736
|
+
|
|
737
|
+
// Ensure there is no sign extension bug on a signed char.
|
|
738
|
+
hex.assign("\xC8" "b", 2);
|
|
739
|
+
EXPECT_FALSE(absl::HexStringToBytes(hex, &bytes));
|
|
736
740
|
}
|
|
737
741
|
|
|
738
742
|
TEST(HexAndBack, HexStringToBytes_and_BytesToHexString) {
|