re2 1.24.0 → 1.24.1

package/AGENTS.md ADDED
@@ -0,0 +1,131 @@
1
+ # AGENTS.md — node-re2
2
+
3
+ > `node-re2` provides Node.js bindings for [RE2](https://github.com/google/re2): a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`.
4
+
5
+ For project structure, module dependencies, and the architecture overview see [ARCHITECTURE.md](./ARCHITECTURE.md).
6
+ For detailed usage docs see the [README](./README.md) and the [wiki](https://github.com/uhop/node-re2/wiki).
7
+
8
+ ## Setup
9
+
10
+ This project uses git submodules for vendored dependencies (RE2 and Abseil):
11
+
12
+ ```bash
13
+ git clone --recursive https://github.com/uhop/node-re2.git
14
+ cd node-re2
15
+ npm install
16
+ ```
17
+
18
+ If `npm install` cannot download a prebuilt artifact, the addon is built locally via `node-gyp`.
19
+
20
+ ## Commands
21
+
22
+ - **Install:** `npm install` (downloads prebuilt artifact or builds from source)
23
+ - **Build (release):** `npm run rebuild` (or `node-gyp -j max rebuild`)
24
+ - **Build (debug):** `npm run rebuild:dev` (or `node-gyp -j max rebuild --debug`)
25
+ - **Test:** `npm test` (runs `tape6 --flags FO`, worker threads)
26
+ - **Test (sequential):** `npm run test:seq`
27
+ - **Test (multi-process):** `npm run test:proc`
28
+ - **Test (single file):** `node tests/test-<name>.mjs`
29
+ - **TypeScript check:** `npm run ts-check`
30
+ - **Lint:** `npm run lint` (Prettier check)
31
+ - **Lint fix:** `npm run lint:fix` (Prettier write)
32
+ - **Verify build:** `npm run verify-build`
33
+
34
+ ## Project structure
35
+
36
+ ```
37
+ node-re2/
38
+ ├── package.json # Package config; "tape6" section configures test discovery
39
+ ├── binding.gyp # node-gyp build configuration for the C++ addon
40
+ ├── re2.js # Main entry point: loads native addon, sets up Symbol aliases
41
+ ├── re2.d.ts # TypeScript declarations for the public API
42
+ ├── tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
43
+ ├── lib/ # C++ source code (native addon)
44
+ │ ├── addon.cc # Node.js addon initialization, method registration
45
+ │ ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
46
+ │ ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
47
+ │ ├── isolate_data.h # Per-isolate data struct for thread-safe addon state
48
+ │ ├── new.cc # Constructor: parse pattern/flags, create RE2 instance
49
+ │ ├── exec.cc # RE2.prototype.exec() implementation
50
+ │ ├── test.cc # RE2.prototype.test() implementation
51
+ │ ├── match.cc # RE2.prototype.match() implementation
52
+ │ ├── replace.cc # RE2.prototype.replace() implementation
53
+ │ ├── search.cc # RE2.prototype.search() implementation
54
+ │ ├── split.cc # RE2.prototype.split() implementation
55
+ │ ├── to_string.cc # RE2.prototype.toString() implementation
56
+ │ ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
57
+ │ ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
58
+ │ ├── set.cc # RE2.Set implementation (multi-pattern matching)
59
+ │ ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
60
+ │ ├── util.h # Utility declarations
61
+ │ └── pattern.h # Pattern translation declarations
62
+ ├── scripts/
63
+ │ └── verify-build.js # Quick smoke test for the built addon
64
+ ├── tests/ # Test files (test-*.mjs using tape-six)
65
+ ├── ts-tests/ # TypeScript type-checking tests
66
+ │ └── test-types.ts # Verifies type declarations compile correctly
67
+ ├── bench/ # Benchmarks
68
+ ├── vendor/ # Vendored C++ dependencies (git submodules)
69
+ │ ├── re2/ # Google RE2 library source
70
+ │ └── abseil-cpp/ # Abseil C++ library (RE2 dependency)
71
+ └── .github/ # CI workflows, Dependabot config, actions
72
+ ```
73
+
74
+ ## Code style
75
+
76
+ - **CommonJS** throughout (`"type": "commonjs"` in package.json).
77
+ - **No transpilation** — JavaScript code runs directly.
78
+ - **C++ code** uses tabs for indentation, 4-wide. JavaScript uses 2-space indentation.
79
+ - **Prettier** for JS/TS formatting (see `.prettierrc`): 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid".
80
+ - **nan** (Native Abstractions for Node.js) for the C++ addon API.
81
+ - Semicolons are enforced by Prettier (default `semi: true`).
82
+ - Imports use `require()` syntax in source, `import` in tests (`.mjs`).
83
+
84
+ ## Critical rules
85
+
86
+ - **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules.
87
+ - **Do not modify or delete test expectations** without understanding why they changed.
88
+ - **Do not add comments or remove comments** unless explicitly asked.
89
+ - **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`.
90
+ - **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64).
91
+ - **RE2 is always Unicode-mode.** The `u` flag is always added implicitly.
92
+ - **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input.
93
+
94
+ ## Architecture
95
+
96
+ - `re2.js` is the main entry point. It loads the native C++ addon from `build/Release/re2.node` and sets up `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` on the prototype.
97
+ - The C++ addon (`lib/*.cc`) wraps Google's RE2 library via nan. Each RegExp method has its own `.cc` file.
98
+ - `lib/new.cc` handles construction: parsing patterns, translating RegExp syntax to RE2 syntax (via `lib/pattern.cc`), and creating the underlying `re2::RE2` instance.
99
+ - `lib/pattern.cc` translates JavaScript RegExp features to RE2 equivalents, including Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`).
100
+ - `lib/set.cc` implements `RE2.Set` for multi-pattern matching using `re2::RE2::Set`.
101
+ - `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion helpers and buffer utilities.
102
+ - Prebuilt native artifacts are hosted on GitHub Releases and downloaded at install time via `install-artifact-from-github`.
103
+
104
+ ## Writing tests
105
+
106
+ ```js
107
+ import test from 'tape-six';
108
+ import {RE2} from '../re2.js';
109
+
110
+ test('example', t => {
111
+ const re = new RE2('a(b*)', 'i');
112
+ const result = re.exec('aBbC');
113
+ t.ok(result);
114
+ t.equal(result[0], 'aBb');
115
+ t.equal(result[1], 'Bb');
116
+ });
117
+ ```
118
+
119
+ - Test files use `tape-six`: `.mjs` for runtime tests, `.ts` for TypeScript typing tests.
120
+ - Test file naming convention: `test-*.mjs` in `tests/`, `test-*.ts` in `ts-tests/`.
121
+ - Tests are configured in `package.json` under the `"tape6"` section.
122
+ - Test files should be directly executable: `node tests/test-foo.mjs`.
123
+
124
+ ## Key conventions
125
+
126
+ - The library is a drop-in replacement for `RegExp` — the `RE2` object emulates the standard `RegExp` API.
127
+ - `RE2.Set` provides multi-pattern matching: `new RE2.Set(patterns, flags, options)`.
128
+ - Static helpers: `RE2.getUtf8Length(str)`, `RE2.getUtf16Length(buf)`.
129
+ - `RE2.unicodeWarningLevel` controls behavior when non-Unicode regexps are created.
130
+ - The `install` script tries to download a prebuilt `.node` artifact before falling back to `node-gyp rebuild`.
131
+ - All C++ source is in `lib/`, all vendored third-party C++ is in `vendor/`.
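Note on the Symbol aliases described in AGENTS.md: the file says `re2.js` sets up `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` on the prototype. The sketch below illustrates why that makes an object usable with the built-in string methods. `MiniPattern` is a hypothetical stand-in that wraps a built-in `RegExp` where node-re2 would call into its native addon; it is not the package's code.

```javascript
// Hypothetical stand-in class (assumption): delegates to a built-in RegExp
// instead of a native addon, purely to show the symbol dispatch.
class MiniPattern {
  constructor(source, flags = '') {
    this.inner = new RegExp(source, flags);
  }
  // Direct methods take the string as their first argument.
  match(str) {
    return str.match(this.inner);
  }
  search(str) {
    return str.search(this.inner);
  }
  replace(str, replacement) {
    return str.replace(this.inner, replacement);
  }
}

// Alias the direct methods under the well-known symbols so that
// String.prototype.match/search/replace dispatch to them.
MiniPattern.prototype[Symbol.match] = MiniPattern.prototype.match;
MiniPattern.prototype[Symbol.search] = MiniPattern.prototype.search;
MiniPattern.prototype[Symbol.replace] = MiniPattern.prototype.replace;

const words = 'hello world'.match(new MiniPattern('\\w+', 'g'));
const index = 'hello world'.search(new MiniPattern('world'));
const replaced = 'hello world'.replace(new MiniPattern('world'), 'RE2');
```

Because the string methods look up the symbol on their argument, the same method body serves both call styles: `pattern.match(str)` and `str.match(pattern)`.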
@@ -0,0 +1,152 @@
1
+ # Architecture
2
+
3
+ `node-re2` provides Node.js bindings for Google's [RE2](https://github.com/google/re2) regular expression engine. It is a C++ native addon built with `node-gyp` and `nan`. The `RE2` object is a drop-in replacement for `RegExp` with guaranteed linear-time matching (no ReDoS).
4
+
5
+ ## Project layout
6
+
7
+ ```
8
+ package.json # Package config; "tape6" section configures test discovery
9
+ binding.gyp # node-gyp build configuration for the C++ addon
10
+ re2.js # Main entry point: loads native addon, sets up Symbol aliases
11
+ re2.d.ts # TypeScript declarations for the public API
12
+ tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
13
+ lib/ # C++ source code (native addon)
14
+ ├── addon.cc # Node.js addon initialization, method registration
15
+ ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
16
+ ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
17
+ ├── isolate_data.h # Per-isolate data struct for thread-safe addon state
18
+ ├── new.cc # Constructor: parse pattern/flags, create RE2 instance
19
+ ├── exec.cc # RE2.prototype.exec() implementation
20
+ ├── test.cc # RE2.prototype.test() implementation
21
+ ├── match.cc # RE2.prototype.match() implementation
22
+ ├── replace.cc # RE2.prototype.replace() implementation
23
+ ├── search.cc # RE2.prototype.search() implementation
24
+ ├── split.cc # RE2.prototype.split() implementation
25
+ ├── to_string.cc # RE2.prototype.toString() implementation
26
+ ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
27
+ ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
28
+ ├── pattern.h # Pattern translation declarations
29
+ ├── set.cc # RE2.Set implementation (multi-pattern matching)
30
+ ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
31
+ └── util.h # Utility declarations
32
+ scripts/
33
+ └── verify-build.js # Quick smoke test for the built addon
34
+ tests/ # Test files (test-*.mjs using tape-six)
35
+ ts-tests/ # TypeScript type-checking tests
36
+ └── test-types.ts # Verifies type declarations compile correctly
37
+ bench/ # Benchmarks
38
+ vendor/ # Vendored C++ dependencies (git submodules) — DO NOT MODIFY
39
+ ├── re2/ # Google RE2 library source
40
+ └── abseil-cpp/ # Abseil C++ library (RE2 dependency)
41
+ .github/ # CI workflows, Dependabot config, actions
42
+ ```
43
+
44
+ ## Core concepts
45
+
46
+ ### How the addon works
47
+
48
+ 1. `re2.js` is the entry point. It loads the compiled C++ addon from `build/Release/re2.node`.
49
+ 2. The addon exposes an `RE2` constructor that wraps `re2::RE2` from Google's RE2 library.
50
+ 3. `re2.js` adds `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` to the prototype so `RE2` instances work with ES6 string methods.
51
+ 4. The `RE2` constructor can be called with or without `new` (factory mode).
52
+
53
+ ### C++ addon structure
54
+
55
+ Each RegExp method has its own `.cc` file for maintainability:
56
+
57
+ | File | Purpose |
58
+ | --------------- | ---------------------------------------------------------------- |
59
+ | `addon.cc` | Node.js module initialization, registers all methods/accessors |
60
+ | `isolate_data.h` | Per-isolate data struct (`AddonData`) for thread-safe addon state |
61
+ | `wrapped_re2.h` | `WrappedRE2` class: holds `re2::RE2*`, flags, lastIndex, source |
62
+ | `new.cc` | Constructor: parses pattern + flags, translates syntax, creates RE2 instance |
63
+ | `exec.cc` | `exec()` — find match with capture groups |
64
+ | `test.cc` | `test()` — boolean match check |
65
+ | `match.cc` | `match()` — String.prototype.match equivalent |
66
+ | `replace.cc` | `replace()` — substitution with string or function replacer |
67
+ | `search.cc` | `search()` — find index of first match |
68
+ | `split.cc` | `split()` — split string by pattern |
69
+ | `to_string.cc` | `toString()` — `/pattern/flags` representation |
70
+ | `accessors.cc` | Property getters: `source`, `flags`, `lastIndex`, `global`, `ignoreCase`, `multiline`, `dotAll`, `unicode`, `sticky`, `hasIndices`, `internalSource` |
71
+ | `pattern.cc` | Translates JS RegExp syntax to RE2 syntax, maps Unicode property names |
72
+ | `set.cc` | `RE2.Set` — multi-pattern matching via `re2::RE2::Set` |
73
+ | `util.cc` | UTF-8 ↔ UTF-16 conversion, buffer/string helpers |
74
+
75
+ ### Pattern translation (pattern.cc)
76
+
77
+ JavaScript RegExp features are translated to RE2 equivalents:
78
+
79
+ - Named groups: `(?<name>...)` syntax is preserved (RE2 supports it natively).
80
+ - Unicode classes: long names like `\p{Letter}` are mapped to short names `\p{L}`. Script names like `\p{Script=Latin}` are mapped to `\p{Latin}`.
81
+ - Backreferences and lookahead assertions are **not supported** — RE2 throws `SyntaxError`.
82
+
83
+ ### Buffer support
84
+
85
+ All methods accept both strings and Node.js Buffers:
86
+
87
+ - Buffer inputs are assumed UTF-8 encoded.
88
+ - Buffer inputs produce Buffer outputs (in composite result objects too).
89
+ - Offsets and lengths are in bytes (not characters) when using Buffers.
90
+ - The `useBuffers` property on replacer functions controls offset reporting in `replace()`.
91
+
92
+ ### RE2.Set (set.cc)
93
+
94
+ Multi-pattern matching using `re2::RE2::Set`:
95
+
96
+ - `new RE2.Set(patterns, flags?, options?)` — compile multiple patterns into a single automaton.
97
+ - `set.test(str)` — returns `true` if any pattern matches.
98
+ - `set.match(str)` — returns array of indices of matching patterns.
99
+ - Properties: `size`, `source`, `sources`, `flags`, `anchor`.
100
+
101
+ ### Build system
102
+
103
+ - `binding.gyp` defines the node-gyp build: compiles all `.cc` files in `lib/` plus vendored RE2 and Abseil sources.
104
+ - Platform-specific compiler flags are set for GCC, Clang, and MSVC.
105
+ - The `install` npm script first tries to download a prebuilt `re2.node` from GitHub Releases via `install-artifact-from-github`, falling back to a local `node-gyp rebuild`.
106
+ - Prebuilt artifacts cover: Linux (x64, arm64, Alpine/musl), macOS (x64, arm64), Windows (x64, arm64).
107
+
108
+ ## Module dependency graph
109
+
110
+ ```
111
+ re2.js ──→ build/Release/re2.node (compiled C++ addon)
112
+
113
+ ├── lib/addon.cc (init)
114
+ │ ├── lib/new.cc ──→ lib/pattern.cc
115
+ │ ├── lib/exec.cc
116
+ │ ├── lib/test.cc
117
+ │ ├── lib/match.cc
118
+ │ ├── lib/replace.cc
119
+ │ ├── lib/search.cc
120
+ │ ├── lib/split.cc
121
+ │ ├── lib/to_string.cc
122
+ │ ├── lib/accessors.cc
123
+ │ └── lib/set.cc
124
+
125
+ ├── lib/wrapped_re2.h (shared class definition)
126
+ ├── lib/wrapped_re2_set.h (RE2.Set class)
127
+ ├── lib/util.cc / lib/util.h (shared utilities)
128
+
129
+ └── vendor/ (re2 + abseil-cpp)
130
+ ```
131
+
132
+ ## Testing
133
+
134
+ - **Framework**: tape-six (`tape6`)
135
+ - **Run all**: `npm test` (worker threads via `tape6 --flags FO`)
136
+ - **Run sequential**: `npm run test:seq`
137
+ - **Run multi-process**: `npm run test:proc`
138
+ - **Run single file**: `node tests/test-<name>.mjs`
139
+ - **TypeScript check**: `npm run ts-check`
140
+ - **Lint**: `npm run lint` (Prettier check)
141
+ - **Lint fix**: `npm run lint:fix` (Prettier write)
142
+ - **Verify build**: `npm run verify-build` (quick smoke test)
143
+
144
+ ## Import paths
145
+
146
+ ```js
147
+ // CommonJS (source, scripts)
148
+ const RE2 = require('re2');
149
+
150
+ // ESM (tests)
151
+ import {RE2} from '../re2.js';
152
+ ```
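Note on the pattern translation described in the architecture document: long Unicode class names such as `\p{Letter}` are mapped to short names, and `\p{Script=Latin}` is mapped to `\p{Latin}`. A minimal JavaScript sketch of that kind of rewrite follows. The function name `translateUnicodeClasses` and the two-entry table are assumptions made for illustration; the real translation lives in `lib/pattern.cc` and is not reproduced here.

```javascript
// Hypothetical, deliberately tiny table: only long names mentioned in this diff.
const GENERAL_CATEGORIES = new Map([
  ['Letter', 'L'],
  ['Number', 'N']
]);

// Rewrites \p{...} and \P{...} escapes; everything else passes through untouched.
// Simplification: escaping is not tracked, so a literal backslash placed
// directly before "p{" is not handled correctly.
function translateUnicodeClasses(pattern) {
  return pattern.replace(
    /\\([pP])\{((?:Script|sc)=)?(\w+)\}/g,
    (whole, p, scriptPrefix, name) => {
      // Script names keep their name and only lose the "Script=" / "sc=" prefix.
      const short = scriptPrefix ? name : (GENERAL_CATEGORIES.get(name) ?? name);
      return `\\${p}{${short}}`;
    }
  );
}

const letters = translateUnicodeClasses('\\p{Letter}+');
const latin = translateUnicodeClasses('\\p{Script=Latin}+');
```

Names that are not in the table are passed through unchanged, so already-short forms such as `\p{L}` survive the rewrite.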
package/README.md CHANGED
@@ -385,6 +385,7 @@ The same applies to `\P{...}`.
385
385
 
386
386
  ## Release history
387
387
 
388
+ - 1.24.1 *Support for Node 22, 24, 26 + precompiled binaries.*
388
389
  - 1.24.0 *Fixed multi-threaded crash in worker threads (#235). Added named import: `import {RE2} from 're2'`. Added CJS test. Updated docs and dependencies.*
389
390
  - 1.23.3 *Updated Abseil and dev dependencies.*
390
391
  - 1.23.2 *Updated dev dependencies.*
package/lib/addon.cc CHANGED
@@ -40,7 +40,7 @@ static NAN_METHOD(GetUtf8Length)
40
40
  return;
41
41
  }
42
42
  auto s = t.ToLocalChecked();
43
- info.GetReturnValue().Set(static_cast<int>(s->Utf8Length(v8::Isolate::GetCurrent())));
43
+ info.GetReturnValue().Set(static_cast<int>(utf8Length(s, v8::Isolate::GetCurrent())));
44
44
  }
45
45
 
46
46
  static NAN_METHOD(GetUtf16Length)
@@ -197,7 +197,7 @@ const StrVal &WrappedRE2::prepareArgument(const v8::Local<v8::Value> &arg, bool
197
197
  auto isolate = v8::Isolate::GetCurrent();
198
198
 
199
199
  auto s = t.ToLocalChecked();
200
- auto argLength = s->Utf8Length(isolate);
200
+ auto argLength = utf8Length(s, isolate);
201
201
 
202
202
  auto buffer = node::Buffer::New(isolate, s).ToLocalChecked();
203
203
  lastCache.Reset(buffer);
package/lib/new.cc CHANGED
@@ -76,10 +76,10 @@ NAN_METHOD(WrappedRE2::New)
76
76
  auto isolate = v8::Isolate::GetCurrent();
77
77
  auto t = info[1]->ToString(Nan::GetCurrentContext());
78
78
  auto s = t.ToLocalChecked();
79
- size = s->Utf8Length(isolate);
79
+ size = utf8Length(s, isolate);
80
80
  buffer.resize(size + 1);
81
81
  data = &buffer[0];
82
- s->WriteUtf8(isolate, data, buffer.size());
82
+ writeUtf8(s, isolate, data, buffer.size());
83
83
  buffer[size] = '\0';
84
84
  }
85
85
  else if (node::Buffer::HasInstance(info[1]))
@@ -134,10 +134,10 @@ NAN_METHOD(WrappedRE2::New)
134
134
  auto isolate = v8::Isolate::GetCurrent();
135
135
  auto t = re->GetSource()->ToString(Nan::GetCurrentContext());
136
136
  auto s = t.ToLocalChecked();
137
- size = s->Utf8Length(isolate);
137
+ size = utf8Length(s, isolate);
138
138
  buffer.resize(size + 1);
139
139
  data = &buffer[0];
140
- s->WriteUtf8(isolate, data, buffer.size());
140
+ writeUtf8(s, isolate, data, buffer.size());
141
141
  buffer[size] = '\0';
142
142
 
143
143
  source = escapeRegExp(data, size);
@@ -192,10 +192,10 @@ NAN_METHOD(WrappedRE2::New)
192
192
  auto isolate = v8::Isolate::GetCurrent();
193
193
  auto t = info[0]->ToString(Nan::GetCurrentContext());
194
194
  auto s = t.ToLocalChecked();
195
- size = s->Utf8Length(isolate);
195
+ size = utf8Length(s, isolate);
196
196
  buffer.resize(size + 1);
197
197
  data = &buffer[0];
198
- s->WriteUtf8(isolate, data, buffer.size());
198
+ writeUtf8(s, isolate, data, buffer.size());
199
199
  buffer[size] = '\0';
200
200
 
201
201
  source = escapeRegExp(data, size);
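Note on the pattern repeated in these three hunks: measure the UTF-8 size, grow the buffer to `size + 1`, write the bytes, then store a terminating `'\0'` at index `size`. A rough JavaScript equivalent using Node's `Buffer` is sketched below. The function name `toNulTerminatedUtf8` is an assumption; the comments say which C++ call each step stands in for.

```javascript
// Copies a string into a byte buffer one byte larger than its UTF-8 form
// and terminates it with NUL, mirroring the sequence in the hunks above.
function toNulTerminatedUtf8(str) {
  const size = Buffer.byteLength(str, 'utf8'); // stands in for utf8Length(s, isolate)
  const buffer = Buffer.alloc(size + 1); // stands in for buffer.resize(size + 1)
  buffer.write(str, 0, size, 'utf8'); // stands in for writeUtf8(s, isolate, data, ...)
  buffer[size] = 0; // stands in for buffer[size] = '\0'
  return {buffer, size};
}

const {buffer, size} = toNulTerminatedUtf8('a(b*)');
```

`Buffer.alloc` already zero-fills, so the explicit terminator is redundant in JavaScript; it is kept only to mirror the C++ sequence step by step.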
package/lib/set.cc CHANGED
@@ -34,9 +34,9 @@ static bool parseFlags(const v8::Local<v8::Value> &arg, SetFlags &flags)
34
34
  return false;
35
35
  }
36
36
  auto s = t.ToLocalChecked();
37
- size = s->Utf8Length(isolate);
37
+ size = utf8Length(s, isolate);
38
38
  buffer.resize(size + 1);
39
- s->WriteUtf8(isolate, &buffer[0], buffer.size());
39
+ writeUtf8(s, isolate, &buffer[0], buffer.size());
40
40
  buffer[buffer.size() - 1] = '\0';
41
41
  data = &buffer[0];
42
42
  }
@@ -287,10 +287,10 @@ static bool fillInput(const v8::Local<v8::Value> &arg, StrVal &str, v8::Local<v8
287
287
  return false;
288
288
  }
289
289
  auto s = t.ToLocalChecked();
290
- auto utf8Length = s->Utf8Length(isolate);
290
+ auto len = utf8Length(s, isolate);
291
291
  auto buffer = node::Buffer::New(isolate, s).ToLocalChecked();
292
292
  keepAlive = buffer;
293
- str.reset(buffer, node::Buffer::Length(buffer), utf8Length, 0);
293
+ str.reset(buffer, node::Buffer::Length(buffer), len, 0);
294
294
  return true;
295
295
  }
296
296
 
@@ -331,7 +331,7 @@ static const char setDeprecationMessage[] = "BMP patterns aren't supported by no
331
331
  NAN_METHOD(WrappedRE2Set::New)
332
332
  {
333
333
  auto context = Nan::GetCurrentContext();
334
- auto isolate = context->GetIsolate();
334
+ auto isolate = v8::Isolate::GetCurrent();
335
335
 
336
336
  if (!info.IsConstructCall())
337
337
  {
@@ -340,7 +340,7 @@ NAN_METHOD(WrappedRE2Set::New)
340
340
  {
341
341
  parameters[i] = info[i];
342
342
  }
343
- auto isolate = context->GetIsolate();
343
+ auto isolate = v8::Isolate::GetCurrent();
344
344
  auto addonData = getAddonData(isolate);
345
345
  if (!addonData) return;
346
346
  auto maybeNew = Nan::NewInstance(Nan::GetFunction(addonData->re2SetTpl.Get(isolate)).ToLocalChecked(), parameters.size(), &parameters[0]);
@@ -513,9 +513,9 @@ NAN_METHOD(WrappedRE2Set::New)
513
513
  return;
514
514
  }
515
515
  auto s = t.ToLocalChecked();
516
- size = s->Utf8Length(isolate);
516
+ size = utf8Length(s, isolate);
517
517
  buffer.resize(size + 1);
518
- s->WriteUtf8(isolate, &buffer[0], buffer.size());
518
+ writeUtf8(s, isolate, &buffer[0], buffer.size());
519
519
  buffer[size] = '\0';
520
520
  data = &buffer[0];
521
521
  source = escapeRegExp(data, size);
@@ -528,9 +528,9 @@ NAN_METHOD(WrappedRE2Set::New)
528
528
  return;
529
529
  }
530
530
  auto s = t.ToLocalChecked();
531
- size = s->Utf8Length(isolate);
531
+ size = utf8Length(s, isolate);
532
532
  buffer.resize(size + 1);
533
- s->WriteUtf8(isolate, &buffer[0], buffer.size());
533
+ writeUtf8(s, isolate, &buffer[0], buffer.size());
534
534
  buffer[size] = '\0';
535
535
  data = &buffer[0];
536
536
  source = escapeRegExp(data, size);
package/lib/wrapped_re2.h CHANGED
@@ -225,6 +225,35 @@ inline size_t getUtf8CharSize(char ch)
225
225
  return ((0xE5000000 >> ((ch >> 3) & 0x1E)) & 3) + 1;
226
226
  }
227
227
 
228
+ // V8 13.4 introduced Utf8LengthV2 / WriteUtf8V2; V8 14.6 removed the bare
229
+ // Utf8Length / WriteUtf8. On older V8 (Node 22) only the bare forms exist.
230
+ #if defined(V8_MAJOR_VERSION) && (V8_MAJOR_VERSION > 13 || \
231
+ (V8_MAJOR_VERSION == 13 && defined(V8_MINOR_VERSION) && V8_MINOR_VERSION >= 4))
232
+
233
+ inline size_t utf8Length(v8::Local<v8::String> s, v8::Isolate *isolate)
234
+ {
235
+ return s->Utf8LengthV2(isolate);
236
+ }
237
+
238
+ inline void writeUtf8(v8::Local<v8::String> s, v8::Isolate *isolate, char *buffer, size_t capacity)
239
+ {
240
+ s->WriteUtf8V2(isolate, buffer, capacity);
241
+ }
242
+
243
+ #else
244
+
245
+ inline size_t utf8Length(v8::Local<v8::String> s, v8::Isolate *isolate)
246
+ {
247
+ return static_cast<size_t>(s->Utf8Length(isolate));
248
+ }
249
+
250
+ inline void writeUtf8(v8::Local<v8::String> s, v8::Isolate *isolate, char *buffer, size_t capacity)
251
+ {
252
+ s->WriteUtf8(isolate, buffer, static_cast<int>(capacity));
253
+ }
254
+
255
+ #endif
256
+
228
257
  inline size_t getUtf16PositionByCounter(const char *data, size_t from, size_t n)
229
258
  {
230
259
  for (; n > 0; --n)
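Note on the version gate in this hunk: the preprocessor condition selects the `V2` calls when the V8 major version is above 13, or when it is 13 with a minor version of at least 4. The same comparison can be sketched in JavaScript against `process.versions.v8`, for example to check which branch a given Node.js binary would compile. The function name `hasUtf8V2Api` is an assumption made for this illustration.

```javascript
// Mirrors the preprocessor condition above:
// V8 major > 13, or major == 13 with minor >= 4.
function hasUtf8V2Api(v8Version) {
  const [major, minor] = v8Version.split('.').map(Number);
  return major > 13 || (major === 13 && minor >= 4);
}

// process.versions.v8 is a dotted string such as "12.4.254.21-node.33";
// only the first two fields matter for this check.
const currentUsesV2 = hasUtf8V2Api(process.versions.v8);
```

Keeping the check inside two inline wrappers means every call site uses the same `utf8Length` / `writeUtf8` names regardless of which V8 the addon is compiled against.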
package/llms-full.txt ADDED
@@ -0,0 +1,467 @@
1
+ # node-re2
2
+
3
+ > Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS (Regular Expression Denial of Service). Works with strings and Buffers. C++ native addon built with node-gyp and nan.
4
+
5
+ - Drop-in replacement for RegExp with linear-time matching guarantee
6
+ - Prevents ReDoS by disallowing backreferences and lookahead assertions
7
+ - Full Unicode mode (always on)
8
+ - Buffer support for high-performance binary/UTF-8 processing
9
+ - Named capture groups
10
+ - Symbol-based methods (Symbol.match, Symbol.search, Symbol.replace, Symbol.split, Symbol.matchAll)
11
+ - RE2.Set for multi-pattern matching
12
+ - Prebuilt binaries for Linux, macOS, Windows (x64 + arm64)
13
+ - TypeScript declarations included
14
+
15
+ ## Install
16
+
17
+ ```bash
18
+ npm install re2
19
+ ```
20
+
21
+ Prebuilt native binaries are downloaded automatically. If no prebuilt binary is available, installation falls back to building from source via node-gyp.
22
+
23
+ ## Quick start
24
+
25
+ ```js
26
+ const RE2 = require('re2');
27
+
28
+ // Create and use like RegExp
29
+ const re = new RE2('a(b*)', 'i');
30
+ const result = re.exec('aBbC');
31
+ console.log(result[0]); // "aBb"
32
+ console.log(result[1]); // "Bb"
33
+
34
+ // Works with ES6 string methods
35
+ 'hello world'.match(new RE2('\\w+', 'g')); // ['hello', 'world']
36
+ 'hello world'.replace(new RE2('world'), 'RE2'); // 'hello RE2'
37
+ ```
38
+
39
+ ## Importing
40
+
41
+ ```js
42
+ // CommonJS
43
+ const RE2 = require('re2');
44
+
45
+ // ESM
46
+ import { RE2 } from 're2';
47
+ ```
48
+
49
+ ## Construction
50
+
51
+ `new RE2(pattern[, flags])` or `RE2(pattern[, flags])` (factory mode).
52
+
53
+ Pattern can be:
54
+ - **String**: `new RE2('\\d+')`
55
+ - **String with flags**: `new RE2('\\d+', 'gi')`
56
+ - **RegExp**: `new RE2(/ab*/ig)` — copies pattern and flags.
57
+ - **RE2**: `new RE2(existingRE2)` — copies pattern and flags.
58
+ - **Buffer**: `new RE2(Buffer.from('pattern'))` — pattern from UTF-8 buffer.
59
+
60
+ Supported flags:
61
+ - `g` — global (find all matches)
62
+ - `i` — ignoreCase
63
+ - `m` — multiline (`^`/`$` match line boundaries)
64
+ - `s` — dotAll (`.` matches `\n`)
65
+ - `u` — unicode (always on, added implicitly)
66
+ - `y` — sticky (match at lastIndex only)
67
+ - `d` — hasIndices (include index info for capture groups)
68
+
69
+ Invalid patterns throw `SyntaxError`. Patterns with backreferences or lookahead throw `SyntaxError`.
70
+
71
+ ## Properties
72
+
73
+ ### Instance properties
74
+
75
+ - `re.source` (string) — the pattern string, escaped for use in `new RE2(re.source)` or `new RegExp(re.source)`.
76
+ - `re.flags` (string) — the flags string (e.g., `'giu'`).
77
+ - `re.lastIndex` (number) — the index at which to start the next match (used with `g` or `y` flags).
78
+ - `re.global` (boolean) — whether the `g` flag is set.
79
+ - `re.ignoreCase` (boolean) — whether the `i` flag is set.
80
+ - `re.multiline` (boolean) — whether the `m` flag is set.
81
+ - `re.dotAll` (boolean) — whether the `s` flag is set.
82
+ - `re.unicode` (boolean) — always `true` (RE2 always operates in Unicode mode).
83
+ - `re.sticky` (boolean) — whether the `y` flag is set.
84
+ - `re.hasIndices` (boolean) — whether the `d` flag is set.
85
+ - `re.internalSource` (string) — the RE2-translated pattern (for debugging; may differ from `source`).
86
+
87
+ ### Static properties
88
+
89
+ - `RE2.unicodeWarningLevel` (string) — controls behavior when a non-Unicode regexp is created:
90
+ - `'nothing'` (default) — silently add `u` flag.
91
+ - `'warnOnce'` — warn once, then silently add `u`. Assigning resets the one-time flag.
92
+ - `'warn'` — warn every time.
93
+ - `'throw'` — throw `SyntaxError` every time.
94
+
95
+ ## RegExp methods
96
+
97
+ ### re.exec(str)
98
+
99
+ Executes a search for a match. Returns a result array or `null`.
100
+
101
+ ```js
102
+ const re = new RE2('a(b+)', 'g');
103
+ const result = re.exec('abbc abbc');
104
+ // result[0] === 'abb'
105
+ // result[1] === 'bb'
106
+ // result.index === 0
107
+ // result.input === 'abbc abbc'
108
+ // re.lastIndex === 3
109
+ ```
110
+
111
+ With `d` flag (hasIndices), result has `.indices` property with `[start, end]` pairs for each group.
112
+
113
+ With `g` or `y` flag, advances `lastIndex`. Call repeatedly to iterate matches.
114
+
115
+ ### re.test(str)
116
+
117
+ Returns `true` if the pattern matches, `false` otherwise.
118
+
119
+ ```js
120
+ new RE2('\\d+').test('abc123'); // true
121
+ new RE2('\\d+').test('abcdef'); // false
122
+ ```
123
+
124
+ With `g` or `y` flag, advances `lastIndex`.
125
+
126
+ ### re.toString()
127
+
128
+ Returns `'/pattern/flags'` string representation.
129
+
130
+ ```js
131
+ new RE2('abc', 'gi').toString(); // '/abc/giu'
132
+ ```
133
+
134
+ ## String methods (via Symbol)
135
+
136
+ RE2 instances implement well-known symbols, so they work directly with ES6 string methods:
137
+
138
+ ### str.match(re) / re[Symbol.match](str)
139
+
140
+ ```js
141
+ 'test 123 test 456'.match(new RE2('\\d+', 'g')); // ['123', '456']
142
+ 'test 123'.match(new RE2('(\\d+)')); // ['123', '123', index: 5, input: 'test 123']
143
+ ```
144
+
145
+ ### str.matchAll(re) / re[Symbol.matchAll](str)
146
+
147
+ Returns an iterator of all matches (requires `g` flag).
148
+
149
+ ```js
150
+ const re = new RE2('\\d+', 'g');
151
+ for (const m of '1a2b3c'.matchAll(re)) {
152
+ console.log(m[0]); // '1', '2', '3'
153
+ }
154
+ ```
155
+
156
+ ### str.search(re) / re[Symbol.search](str)
157
+
158
+ Returns the index of the first match, or `-1`.
159
+
160
+ ```js
161
+ 'hello world'.search(new RE2('world')); // 6
162
+ ```
163
+
164
+ ### str.replace(re, replacement) / re[Symbol.replace](str, replacement)
165
+
166
+ Returns a new string with matches replaced.
167
+
168
+ ```js
169
+ 'aabba'.replace(new RE2('b', 'g'), 'c'); // 'aacca'
170
+ ```
171
+
172
+ Replacement string supports:
173
+ - `$1`, `$2`, ... — numbered capture groups.
174
+ - `$<name>` — named capture groups.
175
+ - `$&` — the matched substring.
176
+ - `` $` `` — portion before the match.
177
+ - `$'` — portion after the match.
178
+ - `$$` — literal `$`.
179
+
180
+ Replacement function receives `(match, ...groups, offset, input)`:
181
+
182
+ ```js
183
+ 'abc'.replace(new RE2('(b)'), (match, g1, offset) => `[${g1}@${offset}]`);
184
+ // 'a[b@1]c'
185
+ ```
186
+
187
+ ### str.split(re[, limit]) / re[Symbol.split](str[, limit])
188
+
189
+ Splits string by pattern.
190
+
191
+ ```js
192
+ 'a1b2c3'.split(new RE2('\\d')); // ['a', 'b', 'c', '']
193
+ 'a1b2c3'.split(new RE2('\\d'), 2); // ['a', 'b']
194
+ ```
195
+
196
+ ## String methods (direct)
197
+
198
+ These are convenience methods on the RE2 instance with swapped argument order:
199
+
200
+ - `re.match(str)` — equivalent to `str.match(re)`.
201
+ - `re.search(str)` — equivalent to `str.search(re)`.
202
+ - `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
203
+ - `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.
204
+
205
+ ```js
206
+ const re = new RE2('\\d+', 'g');
207
+ re.match('test 123 test 456'); // ['123', '456']
208
+ re.search('test 123'); // 5
209
+ re.replace('test 1 and 2', 'N'); // 'test N and N' (global replaces all)
210
+ re.split('a1b2c'); // ['a', 'b', 'c']
211
+ ```
212
+
213
+ ## Buffer support
214
+
215
+ All methods accept Node.js Buffers (UTF-8) instead of strings. When given Buffer input, they return Buffer output.
216
+
217
+ ```js
218
+ const re = new RE2('матч', 'g');
219
+ const buf = Buffer.from('тест матч тест');
220
+ const result = re.exec(buf);
221
+ // result[0] is a Buffer containing 'матч' in UTF-8
222
+ // result.index is in bytes (not characters)
223
+ ```
224
+
225
+ Differences from string mode:
226
+ - All offsets and lengths are in **bytes**, not characters.
227
+ - Results contain Buffers instead of strings.
228
+ - Use `buf.toString()` to convert results back to strings.
229
+
230
+ ### useBuffers on replacer functions
231
+
232
+ When using `re.replace(buf, replacerFn)`, the replacer receives string arguments and character offsets by default. Set `replacerFn.useBuffers = true` to receive byte offsets instead:
233
+
234
+ ```js
235
+ function replacer(match, offset, input) {
236
+ return '<' + offset + ' bytes>';
237
+ }
238
+ replacer.useBuffers = true;
239
+ new RE2('б').replace(Buffer.from('абв'), replacer);
240
+ ```
241
+
242
+ ## RE2.Set
243
+
244
+ Multi-pattern matching — compile many patterns into a single automaton and test/match against all of them at once. Faster than testing individual patterns when the number of patterns is large.
245
+
246
+ ### Constructor
247
+
248
+ ```js
249
+ new RE2.Set(patterns[, flagsOrOptions][, options])
250
+ ```
251
+
252
+ - `patterns` — any iterable of strings, Buffers, RegExp, or RE2 instances.
253
+ - `flagsOrOptions` — optional string/Buffer with flags (apply to all patterns), or options object.
254
+ - `options.anchor` — `'unanchored'` (default), `'start'`, or `'both'`.
255
+
256
+ ```js
257
+ const set = new RE2.Set([
258
+ '^/users/\\d+$',
259
+ '^/posts/\\d+$',
260
+ '^/api/.*$'
261
+ ], 'i', {anchor: 'start'});
262
+ ```
263
+
264
+ ### set.test(str)
265
+
266
+ Returns `true` if any pattern matches, `false` otherwise.
267
+
268
+ ```js
269
+ set.test('/users/42'); // true
270
+ set.test('/unknown'); // false
271
+ ```
272
+
273
+ ### set.match(str)
274
+
275
+ Returns an array of indices of matching patterns, sorted ascending. Empty array if none match.
276
+
277
+ ```js
278
+ set.match('/users/42'); // [0]
279
+ set.match('/api/users'); // [2]
280
+ set.match('/unknown'); // []
281
+ ```
282
+
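To make the return shape of `set.match` concrete, here is a plain-JavaScript emulation built on ordinary `RegExp` objects. It is only a sketch of the semantics described above (ascending indices, empty array when nothing matches): `matchIndices` is a hypothetical name, and testing patterns one by one is exactly the work that a real `RE2.Set` avoids.

```js
// Emulation only: return the ascending indices of the patterns matching `str`.
// The patterns must not use the `g` or `y` flag, so `test` stays stateless.
function matchIndices(patterns, str) {
  const hits = [];
  patterns.forEach((re, i) => {
    if (re.test(str)) hits.push(i);
  });
  return hits;
}

const routes = [/^\/users\/\d+$/i, /^\/posts\/\d+$/i, /^\/api\/.*$/i];

matchIndices(routes, '/users/42'); // [0]
matchIndices(routes, '/api/users'); // [2]
matchIndices(routes, '/unknown'); // []
```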
283
+ ### Properties
284
+
285
+ - `set.size` (number) — number of patterns.
286
+ - `set.source` (string) — all patterns joined with `|`.
287
+ - `set.sources` (string[]) — individual pattern sources.
288
+ - `set.flags` (string) — flags string.
289
+ - `set.anchor` (string) — anchor mode.
290
+
291
+ ### set.toString()
292
+
293
+ Returns `'/pattern1|pattern2|.../flags'`.
294
+
295
+ ```js
296
+ set.toString(); // '/^/users/\\d+$|^/posts/\\d+$|^/api/.*$/iu'
297
+ ```
298
+
299
+ ## Static helpers
300
+
301
+ ### RE2.getUtf8Length(str)
302
+
303
+ Calculate the byte size needed to encode a UTF-16 string as UTF-8.
304
+
305
+ ```js
306
+ RE2.getUtf8Length('hello'); // 5
307
+ RE2.getUtf8Length('привет'); // 12
308
+ ```
309
+
310
+ ### RE2.getUtf16Length(buf)
311
+
312
+ Calculate the length, in UTF-16 code units, of the string needed to hold a UTF-8 buffer decoded as UTF-16.
313
+
314
+ ```js
315
+ RE2.getUtf16Length(Buffer.from('hello')); // 5
316
+ RE2.getUtf16Length(Buffer.from('привет')); // 6
317
+ ```
318
+
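The same quantities can be cross-checked with Node built-ins. This is a sketch under an assumption: for well-formed text the results should agree with the two helpers above, but that equivalence is not stated by this document, and `utf8Length` and `utf16Length` are hypothetical names.

```js
// Byte size of a string once encoded as UTF-8 (Node built-in).
const utf8Length = str => Buffer.byteLength(str, 'utf8');

// Length in UTF-16 code units of the string a UTF-8 Buffer decodes to.
const utf16Length = buf => buf.toString('utf8').length;

utf8Length('привет'); // 12: six Cyrillic letters, 2 bytes each
utf16Length(Buffer.from('привет')); // 6

// Characters outside the Basic Multilingual Plane take 4 bytes in UTF-8
// and 2 code units (a surrogate pair) in UTF-16.
utf8Length('😀'); // 4
utf16Length(Buffer.from('😀')); // 2
```

Unlike these stand-ins, a dedicated length helper does not need to allocate the decoded string just to measure it.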
319
+ ## Named groups
320
+
321
+ Named capture groups are supported:
322
+
323
+ ```js
324
+ const re = new RE2('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})');
325
+ const result = re.exec('2024-01-15');
326
+ result.groups.year; // '2024'
327
+ result.groups.month; // '01'
328
+ result.groups.day; // '15'
329
+ ```
330
+
331
+ Named backreferences in replacement strings:
332
+
333
+ ```js
334
+ '2024-01-15'.replace(
335
+ new RE2('(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})'),
336
+ '$<d>/$<m>/$<y>'
337
+ ); // '15/01/2024'
338
+ ```
339
+
340
+ ## Unicode classes
341
+
342
+ RE2 supports Unicode property escapes. Long names are translated to RE2 short names:
343
+
344
+ ```js
345
+ new RE2('\\p{Letter}+'); // same as \p{L}+
346
+ new RE2('\\p{Number}+'); // same as \p{N}+
347
+ new RE2('\\p{Script=Latin}+'); // same as \p{Latin}+
348
+ new RE2('\\p{sc=Cyrillic}+'); // same as \p{Cyrillic}+
349
+ new RE2('\\P{Letter}+'); // negated: non-letters
350
+ ```
351
+
352
+ Only the `\p{name}` form is supported; the general `\p{name=value}` form is not. The exceptions are the `Script` and `sc` names, as in `\p{Script=Latin}`.
353
+
354
+ ## Limitations
355
+
356
+ RE2 does **not** support:
357
+
358
+ - **Backreferences** (`\1`, `\2`, etc.) — throw `SyntaxError`.
359
+ - **Lookahead assertions** (`(?=...)`, `(?!...)`) — throw `SyntaxError`.
360
+ - **Lookbehind assertions** (`(?<=...)`, `(?<!...)`) — throw `SyntaxError`.
361
+
362
+ Fallback pattern:
363
+
364
+ ```js
365
+ let re = /pattern-with-lookahead(?=foo)/;
366
+ try {
367
+ re = new RE2(re);
368
+ } catch (e) {
369
+ // use original RegExp as fallback
370
+ }
371
+ const result = re.exec(input);
372
+ ```
373
+
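The fallback above can be wrapped in a reusable helper. The sketch below is an assumption-laden illustration: `compileWithFallback` and `StrictEngine` are hypothetical names, and `StrictEngine` is a stand-in that rejects lookaround so the example runs without the native addon. In real use you would pass the `RE2` constructor as `Engine`.

```js
// Try the strict engine first; fall back to RegExp only on SyntaxError.
function compileWithFallback(Engine, source, flags) {
  try {
    return new Engine(source, flags);
  } catch (e) {
    if (e instanceof SyntaxError) return new RegExp(source, flags);
    throw e; // unrelated failures should not be masked
  }
}

// Stand-in for RE2 (assumption): throws SyntaxError on lookahead/lookbehind.
class StrictEngine {
  constructor(source, flags) {
    if (/\(\?<?[=!]/.test(source)) {
      throw new SyntaxError('assertions are not supported');
    }
    this.re = new RegExp(source, flags);
  }
  test(str) {
    return this.re.test(str);
  }
}

const plain = compileWithFallback(StrictEngine, 'foo\\d+'); // StrictEngine instance
const lookahead = compileWithFallback(StrictEngine, 'foo(?=bar)'); // RegExp fallback
lookahead.test('foobar'); // true
```

Note that a pattern which falls back runs on the backtracking engine again, so this is only appropriate for trusted patterns.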
374
+ ## Common patterns
375
+
376
+ ### Drop-in RegExp replacement
377
+
378
+ ```js
379
+ const RE2 = require('re2');
380
+
381
+ // Before (vulnerable to ReDoS):
382
+ // const re = new RegExp(userInput);
383
+
384
+ // After (safe):
385
+ const re = new RE2(userInput);
386
+ ```
387
+
388
+ ### Process Buffer data efficiently
389
+
390
+ ```js
391
+ const RE2 = require('re2');
392
+ const fs = require('fs');
393
+
394
+ const data = fs.readFileSync('large-file.txt');
395
+ const re = new RE2('pattern', 'g');
396
+ let match;
397
+ while ((match = re.exec(data)) !== null) {
398
+ console.log('Found at byte offset:', match.index);
399
+ }
400
+ ```
401
+
402
+ ### Route matching with RE2.Set
403
+
404
+ ```js
405
+ const RE2 = require('re2');
406
+
407
+ const routes = new RE2.Set([
408
+ '^/users/\\d+$',
409
+ '^/posts/\\d+$',
410
+ '^/api/v\\d+/.*$'
411
+ ], 'i');
412
+
413
+ function findRoute(path) {
414
+ const matches = routes.match(path);
415
+ return matches.length > 0 ? matches[0] : -1;
416
+ }
417
+
418
+ findRoute('/users/42'); // 0
419
+ findRoute('/posts/7'); // 1
420
+ findRoute('/api/v2/foo'); // 2
421
+ findRoute('/unknown'); // -1
422
+ ```
423
+
424
+ ### Validate user-supplied patterns safely
425
+
426
+ ```js
427
+ const RE2 = require('re2');
428
+
429
+ function safeMatch(input, pattern, flags) {
430
+ try {
431
+ const re = new RE2(pattern, flags);
432
+ return re.test(input);
433
+ } catch (e) {
434
+ return false; // invalid pattern
435
+ }
436
+ }
437
+ ```
438
+
439
+ ## TypeScript
440
+
441
+ ```ts
442
+ import RE2 from 're2';
443
+
444
+ const re: RE2 = new RE2('\\d+', 'g');
445
+ const result: RegExpExecArray | null = re.exec('test 123');
446
+
447
+ // Buffer overloads
448
+ const bufResult: RE2BufferExecArray | null = re.exec(Buffer.from('test 123'));
449
+
450
+ // RE2.Set
451
+ const set: RE2Set = new RE2.Set(['a', 'b'], 'i');
452
+ const matches: number[] = set.match('abc');
453
+ ```
454
+
455
+ ## Project structure notes
456
+
457
+ - Entry point: `re2.js` (loads native addon), types: `re2.d.ts`.
458
+ - C++ addon source: `lib/*.cc`, `lib/*.h`.
459
+ - Tests: `tests/test-*.mjs` (runtime), `ts-tests/test-*.ts` (type-checking).
460
+ - Vendored dependencies: `vendor/re2/`, `vendor/abseil-cpp/` (git submodules) — **never modify files under `vendor/`**.
461
+
462
+ ## Links
463
+
464
+ - Docs: https://github.com/uhop/node-re2/wiki
465
+ - npm: https://www.npmjs.com/package/re2
466
+ - Repository: https://github.com/uhop/node-re2
467
+ - RE2 syntax: https://github.com/google/re2/wiki/Syntax
package/llms.txt ADDED
@@ -0,0 +1,132 @@
1
+ # node-re2
2
+
3
+ > Node.js bindings for RE2: a fast, safe alternative to backtracking regular expression engines. Drop-in RegExp replacement that prevents ReDoS. Works with strings and Buffers.
4
+
5
+ ## Install
6
+
7
+ npm install re2
8
+
9
+ ## Quick start
10
+
11
+ ```js
12
+ // CommonJS
13
+ const RE2 = require('re2');
14
+
15
+ // ESM
16
+ // import {RE2} from 're2';
17
+
18
+ const re = new RE2('a(b*)', 'i');
19
+ const result = re.exec('aBbC');
20
+ console.log(result[0]); // "aBb"
21
+ console.log(result[1]); // "Bb"
22
+ ```
23
+
24
+ ## Why use node-re2?
25
+
26
+ The built-in Node.js RegExp engine can run in exponential time with vulnerable patterns (ReDoS). RE2 guarantees linear-time matching by disallowing backreferences and lookahead assertions.
27
+
28
+ ## API
29
+
30
+ ### Construction
31
+
32
+ ```js
33
+ const RE2 = require('re2');
34
+
35
+ const re1 = new RE2('\\d+'); // from string
36
+ const re2 = new RE2('\\d+', 'gi'); // with flags
37
+ const re3 = new RE2(/ab*/ig); // from RegExp
38
+ const re4 = new RE2(re3); // from another RE2
39
+ const re5 = RE2('\\d+'); // factory (no new)
40
+ ```
41
+
42
+ Supported flags: `g` (global), `i` (ignoreCase), `m` (multiline), `s` (dotAll), `u` (unicode, always on), `y` (sticky), `d` (hasIndices).
43
+
44
+ ### RegExp methods
45
+
46
+ - `re.exec(str)` — find match with capture groups.
47
+ - `re.test(str)` — boolean match check.
48
+ - `re.toString()` — `/pattern/flags` representation.
49
+
50
+ ### String methods (via Symbol)
51
+
52
+ RE2 instances work with ES6 string methods:
53
+
54
+ ```js
55
+ 'abc'.match(re);
56
+ 'abc'.search(re);
57
+ 'abc'.replace(re, 'x');
58
+ 'abc'.split(re);
59
+ Array.from('abc'.matchAll(re));
60
+ ```
61
+
62
+ ### String methods (direct)
63
+
64
+ - `re.match(str)` — equivalent to `str.match(re)`.
65
+ - `re.search(str)` — equivalent to `str.search(re)`.
66
+ - `re.replace(str, replacement)` — equivalent to `str.replace(re, replacement)`.
67
+ - `re.split(str[, limit])` — equivalent to `str.split(re, limit)`.
68
+
69
+ ### Properties
70
+
71
+ - `re.source` — pattern string.
72
+ - `re.flags` — flags string.
73
+ - `re.lastIndex` — index for next match (with `g` or `y` flag).
74
+ - `re.global`, `re.ignoreCase`, `re.multiline`, `re.dotAll`, `re.unicode`, `re.sticky`, `re.hasIndices` — boolean flag accessors.
75
+ - `re.internalSource` — RE2-translated pattern (for debugging).
76
+
77
+ ### Buffer support
78
+
79
+ All methods accept Buffers (UTF-8) instead of strings. Buffer input produces Buffer output. Offsets are in bytes.
80
+
81
+ ```js
82
+ const re = new RE2('матч', 'g');
83
+ const buf = Buffer.from('тест матч тест');
84
+ const result = re.exec(buf);
85
+ // result[0] is a Buffer
86
+ ```
87
+
88
+ ### RE2.Set
89
+
90
+ Multi-pattern matching — test a string against many patterns at once.
91
+
92
+ ```js
93
+ const set = new RE2.Set(['^/users/\\d+$', '^/posts/\\d+$'], 'i');
94
+ set.test('/users/7'); // true
95
+ set.match('/posts/42'); // [1]
96
+ set.sources; // ['^/users/\\d+$', '^/posts/\\d+$']
97
+ ```
98
+
99
+ - `new RE2.Set(patterns[, flags][, options])` — compile patterns.
100
+ - `options.anchor`: `'unanchored'` (default), `'start'`, or `'both'`.
101
+ - `set.test(str)` — returns `true` if any pattern matches.
102
+ - `set.match(str)` — returns array of matching pattern indices.
103
+ - Properties: `size`, `source`, `sources`, `flags`, `anchor`.
104
+
105
+ ### Static helpers
106
+
107
+ - `RE2.getUtf8Length(str)` — byte size of string as UTF-8.
108
+ - `RE2.getUtf16Length(buf)` — character count of UTF-8 buffer as UTF-16 string.
109
+ - `RE2.unicodeWarningLevel` — `'nothing'` (default), `'warnOnce'`, `'warn'`, or `'throw'`.
110
+
111
+ ## Limitations
112
+
113
+ RE2 does not support:
114
+ - **Backreferences** (`\1`, `\2`, etc.)
115
+ - **Lookahead assertions** (`(?=...)`, `(?!...)`)
116
+
117
+ These throw `SyntaxError`. Use try-catch to fall back to RegExp when needed:
118
+
119
+ ```js
120
+ let re = /pattern-with-lookahead/;
121
+ try { re = new RE2(re); } catch (e) { /* use original RegExp */ }
122
+ ```
123
+
124
+ ## Project notes
125
+
126
+ - C++ addon source is in `lib/`. Vendored deps (`vendor/re2/`, `vendor/abseil-cpp/`) are git submodules — **never modify files under `vendor/`**.
127
+
128
+ ## Links
129
+
130
+ - Docs: https://github.com/uhop/node-re2/wiki
131
+ - npm: https://www.npmjs.com/package/re2
132
+ - Full LLM reference: https://github.com/uhop/node-re2/blob/master/llms-full.txt
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "re2",
3
- "version": "1.24.0",
3
+ "version": "1.24.1",
4
4
  "description": "Bindings for RE2: fast, safe alternative to backtracking regular expression engines.",
5
5
  "homepage": "https://github.com/uhop/node-re2",
6
6
  "bugs": "https://github.com/uhop/node-re2/issues",
@@ -8,24 +8,28 @@
8
8
  "main": "re2.js",
9
9
  "types": "re2.d.ts",
10
10
  "files": [
11
+ "AGENTS.md",
12
+ "ARCHITECTURE.md",
11
13
  "binding.gyp",
12
14
  "lib",
15
+ "llms-full.txt",
16
+ "llms.txt",
13
17
  "re2.d.ts",
14
18
  "scripts/*.js",
15
19
  "vendor"
16
20
  ],
17
21
  "dependencies": {
18
- "install-artifact-from-github": "^1.4.0",
19
- "nan": "^2.26.2",
20
- "node-gyp": "^12.2.0"
22
+ "install-artifact-from-github": "^1.6.0",
23
+ "nan": "^2.27.0",
24
+ "node-gyp": "^12.3.0"
21
25
  },
22
26
  "devDependencies": {
23
- "@types/node": "^25.5.0",
27
+ "@types/node": "^25.7.0",
24
28
  "nano-benchmark": "^1.0.15",
25
- "prettier": "^3.8.1",
26
- "tape-six": "^1.7.13",
27
- "tape-six-proc": "^1.2.8",
28
- "typescript": "^6.0.2"
29
+ "prettier": "^3.8.3",
30
+ "tape-six": "^1.9.0",
31
+ "tape-six-proc": "^1.2.9",
32
+ "typescript": "^6.0.3"
29
33
  },
30
34
  "scripts": {
31
35
  "test": "tape6 --flags FO",
@@ -49,7 +53,10 @@
49
53
  "github": "https://github.com/uhop/node-re2",
50
54
  "repository": {
51
55
  "type": "git",
52
- "url": "git://github.com/uhop/node-re2.git"
56
+ "url": "git+https://github.com/uhop/node-re2.git"
57
+ },
58
+ "engines": {
59
+ "node": ">=22"
53
60
  },
54
61
  "keywords": [
55
62
  "RegExp",
package/re2.js CHANGED
@@ -1,3 +1,4 @@
1
+ // @ts-self-types="./re2.d.ts"
1
2
  'use strict';
2
3
 
3
4
  const RE2 = require('./build/Release/re2.node');
@@ -16,7 +16,7 @@
16
16
 
17
17
  module(
18
18
  name = "abseil-cpp",
19
- version = "20260107.0",
19
+ version = "20260107.1",
20
20
  compatibility_level = 1,
21
21
  )
22
22
 
@@ -118,7 +118,7 @@
118
118
  // LTS releases can be obtained from
119
119
  // https://github.com/abseil/abseil-cpp/releases.
120
120
  #define ABSL_LTS_RELEASE_VERSION 20260107
121
- #define ABSL_LTS_RELEASE_PATCH_LEVEL 0
121
+ #define ABSL_LTS_RELEASE_PATCH_LEVEL 1
122
122
 
123
123
  // Helper macro to convert a CPP variable to a string literal.
124
124
  #define ABSL_INTERNAL_DO_TOKEN_STR(x) #x
@@ -104,7 +104,10 @@
104
104
  #define ABSL_HASH_INTERNAL_CRC32_U32 _mm_crc32_u32
105
105
  #define ABSL_HASH_INTERNAL_CRC32_U8 _mm_crc32_u8
106
106
 
107
- #elif defined(_MSC_VER) && !defined(__clang__) && defined(__AVX__)
107
+ // 32-bit builds with AVX do not have _mm_crc32_u64, so the _M_X64 condition is
108
+ // necessary.
109
+ #elif defined(_MSC_VER) && !defined(__clang__) && defined(__AVX__) && \
110
+ defined(_M_X64)
108
111
 
109
112
  // MSVC AVX (/arch:AVX) implies SSE 4.2.
110
113
  #include <intrin.h>
@@ -827,7 +827,7 @@ bool Base64UnescapeInternal(const char* absl_nullable src, size_t slen,
827
827
  }
828
828
 
829
829
  /* clang-format off */
830
- constexpr std::array<char, 256> kHexValueLenient = {
830
+ constexpr std::array<uint8_t, 256> kHexValueLenient = {
831
831
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
832
832
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
833
833
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
@@ -846,7 +846,7 @@ constexpr std::array<char, 256> kHexValueLenient = {
846
846
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
847
847
  };
848
848
 
849
- constexpr std::array<signed char, 256> kHexValueStrict = {
849
+ constexpr std::array<int8_t, 256> kHexValueStrict = {
850
850
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
851
851
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
852
852
  -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
@@ -874,7 +874,7 @@ void HexStringToBytesInternal(const char* absl_nullable from, T to,
874
874
  size_t num) {
875
875
  for (size_t i = 0; i < num; i++) {
876
876
  to[i] = static_cast<char>(kHexValueLenient[from[i * 2] & 0xFF] << 4) +
877
- (kHexValueLenient[from[i * 2 + 1] & 0xFF]);
877
+ static_cast<char>(kHexValueLenient[from[i * 2 + 1] & 0xFF]);
878
878
  }
879
879
  }
880
880
 
@@ -992,8 +992,10 @@ bool HexStringToBytes(absl::string_view hex, std::string* absl_nonnull bytes) {
992
992
  output, num_bytes, [hex](char* buf, size_t buf_size) {
993
993
  auto hex_p = hex.cbegin();
994
994
  for (size_t i = 0; i < buf_size; ++i) {
995
- int h1 = absl::kHexValueStrict[static_cast<size_t>(*hex_p++)];
996
- int h2 = absl::kHexValueStrict[static_cast<size_t>(*hex_p++)];
995
+ int h1 = absl::kHexValueStrict[static_cast<size_t>(
996
+ static_cast<uint8_t>(*hex_p++))];
997
+ int h2 = absl::kHexValueStrict[static_cast<size_t>(
998
+ static_cast<uint8_t>(*hex_p++))];
997
999
  if (h1 == -1 || h2 == -1) {
998
1000
  return size_t{0};
999
1001
  }
@@ -733,6 +733,10 @@ TEST(Escaping, HexStringToBytesBackToHex) {
733
733
  bytes = "abc";
734
734
  EXPECT_TRUE(absl::HexStringToBytes("", &bytes));
735
735
  EXPECT_EQ("", bytes); // Results in empty output.
736
+
737
+ // Ensure there is no sign extension bug on a signed char.
738
+ hex.assign("\xC8" "b", 2);
739
+ EXPECT_FALSE(absl::HexStringToBytes(hex, &bytes));
736
740
  }
737
741
 
738
742
  TEST(HexAndBack, HexStringToBytes_and_BytesToHexString) {