re2 1.23.3 → 1.24.1
- package/AGENTS.md +131 -0
- package/ARCHITECTURE.md +152 -0
- package/LICENSE +1 -1
- package/README.md +122 -151
- package/lib/accessors.cc +1 -1
- package/lib/addon.cc +38 -7
- package/lib/isolate_data.h +12 -0
- package/lib/new.cc +10 -9
- package/lib/pattern.cc +1 -1
- package/lib/set.cc +19 -13
- package/lib/split.cc +16 -2
- package/lib/test.cc +1 -1
- package/lib/to_string.cc +4 -0
- package/lib/wrapped_re2.h +37 -6
- package/lib/wrapped_re2_set.h +5 -3
- package/llms-full.txt +467 -0
- package/llms.txt +132 -0
- package/package.json +23 -13
- package/re2.d.ts +5 -0
- package/re2.js +2 -0
- package/vendor/abseil-cpp/MODULE.bazel +1 -1
- package/vendor/abseil-cpp/absl/base/config.h +1 -1
- package/vendor/abseil-cpp/absl/hash/internal/hash.h +4 -1
- package/vendor/abseil-cpp/absl/strings/escaping.cc +7 -5
- package/vendor/abseil-cpp/absl/strings/escaping_test.cc +4 -0
package/AGENTS.md
ADDED
@@ -0,0 +1,131 @@
+# AGENTS.md — node-re2
+
+> `node-re2` provides Node.js bindings for [RE2](https://github.com/google/re2): a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`.
+
+For project structure, module dependencies, and the architecture overview see [ARCHITECTURE.md](./ARCHITECTURE.md).
+For detailed usage docs see the [README](./README.md) and the [wiki](https://github.com/uhop/node-re2/wiki).
+
+## Setup
+
+This project uses git submodules for vendored dependencies (RE2 and Abseil):
+
+```bash
+git clone --recursive https://github.com/uhop/node-re2.git
+cd node-re2
+npm install
+```
+
+If the native addon fails to download a prebuilt artifact, it builds locally via `node-gyp`.
+
+## Commands
+
+- **Install:** `npm install` (downloads prebuilt artifact or builds from source)
+- **Build (release):** `npm run rebuild` (or `node-gyp -j max rebuild`)
+- **Build (debug):** `npm run rebuild:dev` (or `node-gyp -j max rebuild --debug`)
+- **Test:** `npm test` (runs `tape6 --flags FO`, worker threads)
+- **Test (sequential):** `npm run test:seq`
+- **Test (multi-process):** `npm run test:proc`
+- **Test (single file):** `node tests/test-<name>.mjs`
+- **TypeScript check:** `npm run ts-check`
+- **Lint:** `npm run lint` (Prettier check)
+- **Lint fix:** `npm run lint:fix` (Prettier write)
+- **Verify build:** `npm run verify-build`
+
+## Project structure
+
+```
+node-re2/
+├── package.json # Package config; "tape6" section configures test discovery
+├── binding.gyp # node-gyp build configuration for the C++ addon
+├── re2.js # Main entry point: loads native addon, sets up Symbol aliases
+├── re2.d.ts # TypeScript declarations for the public API
+├── tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
+├── lib/ # C++ source code (native addon)
+│   ├── addon.cc # Node.js addon initialization, method registration
+│   ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
+│   ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
+│   ├── isolate_data.h # Per-isolate data struct for thread-safe addon state
+│   ├── new.cc # Constructor: parse pattern/flags, create RE2 instance
+│   ├── exec.cc # RE2.prototype.exec() implementation
+│   ├── test.cc # RE2.prototype.test() implementation
+│   ├── match.cc # RE2.prototype.match() implementation
+│   ├── replace.cc # RE2.prototype.replace() implementation
+│   ├── search.cc # RE2.prototype.search() implementation
+│   ├── split.cc # RE2.prototype.split() implementation
+│   ├── to_string.cc # RE2.prototype.toString() implementation
+│   ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
+│   ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
+│   ├── set.cc # RE2.Set implementation (multi-pattern matching)
+│   ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
+│   ├── util.h # Utility declarations
+│   └── pattern.h # Pattern translation declarations
+├── scripts/
+│   └── verify-build.js # Quick smoke test for the built addon
+├── tests/ # Test files (test-*.mjs using tape-six)
+├── ts-tests/ # TypeScript type-checking tests
+│   └── test-types.ts # Verifies type declarations compile correctly
+├── bench/ # Benchmarks
+├── vendor/ # Vendored C++ dependencies (git submodules)
+│   ├── re2/ # Google RE2 library source
+│   └── abseil-cpp/ # Abseil C++ library (RE2 dependency)
+└── .github/ # CI workflows, Dependabot config, actions
+```
+
+## Code style
+
+- **CommonJS** throughout (`"type": "commonjs"` in package.json).
+- **No transpilation** — JavaScript code runs directly.
+- **C++ code** uses tabs for indentation, 4-wide. JavaScript uses 2-space indentation.
+- **Prettier** for JS/TS formatting (see `.prettierrc`): 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid".
+- **nan** (Native Abstractions for Node.js) for the C++ addon API.
+- Semicolons are enforced by Prettier (default `semi: true`).
+- Imports use `require()` syntax in source, `import` in tests (`.mjs`).
+
+## Critical rules
+
+- **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules.
+- **Do not modify or delete test expectations** without understanding why they changed.
+- **Do not add comments or remove comments** unless explicitly asked.
+- **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`.
+- **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64).
+- **RE2 is always Unicode-mode.** The `u` flag is always added implicitly.
+- **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input.
+
+## Architecture
+
+- `re2.js` is the main entry point. It loads the native C++ addon from `build/Release/re2.node` and sets up `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` on the prototype.
+- The C++ addon (`lib/*.cc`) wraps Google's RE2 library via nan. Each RegExp method has its own `.cc` file.
+- `lib/new.cc` handles construction: parsing patterns, translating RegExp syntax to RE2 syntax (via `lib/pattern.cc`), and creating the underlying `re2::RE2` instance.
+- `lib/pattern.cc` translates JavaScript RegExp features to RE2 equivalents, including Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`).
+- `lib/set.cc` implements `RE2.Set` for multi-pattern matching using `re2::RE2::Set`.
+- `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion helpers and buffer utilities.
+- Prebuilt native artifacts are hosted on GitHub Releases and downloaded at install time via `install-artifact-from-github`.
+
+## Writing tests
+
+```js
+import test from 'tape-six';
+import {RE2} from '../re2.js';
+
+test('example', t => {
+  const re = new RE2('a(b*)', 'i');
+  const result = re.exec('aBbC');
+  t.ok(result);
+  t.equal(result[0], 'aBb');
+  t.equal(result[1], 'Bb');
+});
+```
+
+- Test files use `tape-six`: `.mjs` for runtime tests, `.ts` for TypeScript typing tests.
+- Test file naming convention: `test-*.mjs` in `tests/`, `test-*.ts` in `ts-tests/`.
+- Tests are configured in `package.json` under the `"tape6"` section.
+- Test files should be directly executable: `node tests/test-foo.mjs`.
+
+## Key conventions
+
+- The library is a drop-in replacement for `RegExp` — the `RE2` object emulates the standard `RegExp` API.
+- `RE2.Set` provides multi-pattern matching: `new RE2.Set(patterns, flags, options)`.
+- Static helpers: `RE2.getUtf8Length(str)`, `RE2.getUtf16Length(buf)`.
+- `RE2.unicodeWarningLevel` controls behavior when non-Unicode regexps are created.
+- The `install` script tries to download a prebuilt `.node` artifact before falling back to `node-gyp rebuild`.
+- All C++ source is in `lib/`, all vendored third-party C++ is in `vendor/`.
package/ARCHITECTURE.md
ADDED
@@ -0,0 +1,152 @@
+# Architecture
+
+`node-re2` provides Node.js bindings for Google's [RE2](https://github.com/google/re2) regular expression engine. It is a C++ native addon built with `node-gyp` and `nan`. The `RE2` object is a drop-in replacement for `RegExp` with guaranteed linear-time matching (no ReDoS).
+
+## Project layout
+
+```
+package.json # Package config; "tape6" section configures test discovery
+binding.gyp # node-gyp build configuration for the C++ addon
+re2.js # Main entry point: loads native addon, sets up Symbol aliases
+re2.d.ts # TypeScript declarations for the public API
+tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
+lib/ # C++ source code (native addon)
+├── addon.cc # Node.js addon initialization, method registration
+├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
+├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
+├── isolate_data.h # Per-isolate data struct for thread-safe addon state
+├── new.cc # Constructor: parse pattern/flags, create RE2 instance
+├── exec.cc # RE2.prototype.exec() implementation
+├── test.cc # RE2.prototype.test() implementation
+├── match.cc # RE2.prototype.match() implementation
+├── replace.cc # RE2.prototype.replace() implementation
+├── search.cc # RE2.prototype.search() implementation
+├── split.cc # RE2.prototype.split() implementation
+├── to_string.cc # RE2.prototype.toString() implementation
+├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
+├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
+├── pattern.h # Pattern translation declarations
+├── set.cc # RE2.Set implementation (multi-pattern matching)
+├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
+└── util.h # Utility declarations
+scripts/
+└── verify-build.js # Quick smoke test for the built addon
+tests/ # Test files (test-*.mjs using tape-six)
+ts-tests/ # TypeScript type-checking tests
+└── test-types.ts # Verifies type declarations compile correctly
+bench/ # Benchmarks
+vendor/ # Vendored C++ dependencies (git submodules) — DO NOT MODIFY
+├── re2/ # Google RE2 library source
+└── abseil-cpp/ # Abseil C++ library (RE2 dependency)
+.github/ # CI workflows, Dependabot config, actions
+```
+
+## Core concepts
+
+### How the addon works
+
+1. `re2.js` is the entry point. It loads the compiled C++ addon from `build/Release/re2.node`.
+2. The addon exposes an `RE2` constructor that wraps `re2::RE2` from Google's RE2 library.
+3. `re2.js` adds `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` to the prototype so `RE2` instances work with ES6 string methods.
+4. The `RE2` constructor can be called with or without `new` (factory mode).
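
Steps 3 and 4 can be illustrated with a minimal sketch. `MiniRE` below is a hypothetical stand-in that wraps the built-in `RegExp` instead of the native addon; the real wiring lives in `re2.js`, which this hunk does not show. The sketch only demonstrates how well-known symbols let string methods dispatch to a custom object, and how a constructor can tolerate being called without `new`. `Symbol.matchAll` is left out to keep the example short.

```javascript
// Hypothetical stand-in for the native RE2 constructor; wraps a built-in RegExp.
function MiniRE(pattern, flags) {
  // Factory mode: calling MiniRE(...) without `new` still returns an instance.
  if (!(this instanceof MiniRE)) return new MiniRE(pattern, flags);
  this.inner = new RegExp(pattern, flags);
}

// Ordinary methods, analogous to what an addon would register on the prototype.
MiniRE.prototype.match = function (str) {
  return this.inner[Symbol.match](str);
};
MiniRE.prototype.search = function (str) {
  return this.inner[Symbol.search](str);
};
MiniRE.prototype.replace = function (str, replacer) {
  return this.inner[Symbol.replace](str, replacer);
};
MiniRE.prototype.split = function (str, limit) {
  return this.inner[Symbol.split](str, limit);
};

// Symbol aliases: String.prototype.match/search/replace/split look these up.
MiniRE.prototype[Symbol.match] = MiniRE.prototype.match;
MiniRE.prototype[Symbol.search] = MiniRE.prototype.search;
MiniRE.prototype[Symbol.replace] = MiniRE.prototype.replace;
MiniRE.prototype[Symbol.split] = MiniRE.prototype.split;

const re = MiniRE('b+', 'g'); // called without `new`
const found = 'abbcbd'.match(re); // ['bb', 'b']
const where = 'abbcbd'.search(re); // 1
const swapped = 'abbcbd'.replace(re, '-'); // 'a-c-d'
const parts = 'abbcbd'.split(re); // ['a', 'c', 'd']
```

Because the string methods only look for the symbol-keyed properties, the calling code does not need to know whether it was handed a `RegExp` or a look-alike object.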
+
+### C++ addon structure
+
+Each RegExp method has its own `.cc` file for maintainability:
+
+| File | Purpose |
+| --- | --- |
+| `addon.cc` | Node.js module initialization, registers all methods/accessors |
+| `isolate_data.h` | Per-isolate data struct (`AddonData`) for thread-safe addon state |
+| `wrapped_re2.h` | `WrappedRE2` class: holds `re2::RE2*`, flags, lastIndex, source |
+| `new.cc` | Constructor: parses pattern + flags, translates syntax, creates RE2 instance |
+| `exec.cc` | `exec()` — find match with capture groups |
+| `test.cc` | `test()` — boolean match check |
+| `match.cc` | `match()` — String.prototype.match equivalent |
+| `replace.cc` | `replace()` — substitution with string or function replacer |
+| `search.cc` | `search()` — find index of first match |
+| `split.cc` | `split()` — split string by pattern |
+| `to_string.cc` | `toString()` — `/pattern/flags` representation |
+| `accessors.cc` | Property getters: `source`, `flags`, `lastIndex`, `global`, `ignoreCase`, `multiline`, `dotAll`, `unicode`, `sticky`, `hasIndices`, `internalSource` |
+| `pattern.cc` | Translates JS RegExp syntax to RE2 syntax, maps Unicode property names |
+| `set.cc` | `RE2.Set` — multi-pattern matching via `re2::RE2::Set` |
+| `util.cc` | UTF-8 ↔ UTF-16 conversion, buffer/string helpers |
+
+### Pattern translation (pattern.cc)
+
+JavaScript RegExp features are translated to RE2 equivalents:
+
+- Named groups: `(?<name>...)` syntax is preserved (RE2 supports it natively).
+- Unicode classes: long names like `\p{Letter}` are mapped to short names `\p{L}`. Script names like `\p{Script=Latin}` are mapped to `\p{Latin}`.
+- Backreferences and lookahead assertions are **not supported** — RE2 throws `SyntaxError`.
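
The Unicode class mapping can be pictured with a small JavaScript sketch. The actual translation is done in C++ in `lib/pattern.cc`, which this diff does not show; the function name `translateUnicodeClasses` and the lookup table below are hypothetical and cover only a few names for illustration.

```javascript
// Hypothetical, deliberately tiny lookup table; a real table would be larger.
const GENERAL_CATEGORIES = new Map([
  ['Letter', 'L'],
  ['Number', 'N'],
  ['Punctuation', 'P']
]);
const SCRIPT_PREFIX = 'Script=';

// Rewrites the names inside \p{...} and \P{...} into shorter forms.
function translateUnicodeClasses(pattern) {
  return pattern.replace(/\\([pP])\{([^}]+)\}/g, (whole, kind, name) => {
    // \p{Script=Latin} -> \p{Latin}
    if (name.startsWith(SCRIPT_PREFIX)) {
      return `\\${kind}{${name.slice(SCRIPT_PREFIX.length)}}`;
    }
    // \p{Letter} -> \p{L}; names missing from the table pass through unchanged.
    const short = GENERAL_CATEGORIES.get(name);
    return short ? `\\${kind}{${short}}` : whole;
  });
}

const letters = translateUnicodeClasses('\\p{Letter}+'); // '\\p{L}+'
const mixed = translateUnicodeClasses('\\p{Script=Latin}\\P{Number}'); // '\\p{Latin}\\P{N}'
const untouched = translateUnicodeClasses('\\p{Greek}'); // '\\p{Greek}'
```

This sketch ignores escaping subtleties (for example an escaped backslash directly before `p{`), which a production translator would have to handle.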
+
+### Buffer support
+
+All methods accept both strings and Node.js Buffers:
+
+- Buffer inputs are assumed UTF-8 encoded.
+- Buffer inputs produce Buffer outputs (in composite result objects too).
+- Offsets and lengths are in bytes (not characters) when using Buffers.
+- The `useBuffers` property on replacer functions controls offset reporting in `replace()`.
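
The byte-versus-character distinction is easy to see with Node.js built-ins alone. The sketch below does not call `re2`; it uses `String.prototype.indexOf` and `Buffer.prototype.indexOf` as stand-ins to show why an offset reported for a Buffer input differs from the offset for the equivalent string once non-ASCII text is involved.

```javascript
// Each Cyrillic letter takes 2 bytes in UTF-8 but 1 UTF-16 code unit.
const text = 'привет, world';
const bytes = Buffer.from(text, 'utf8');

// Offset of "world" counted in UTF-16 code units (string semantics).
const charOffset = text.indexOf('world'); // 8
// Offset of "world" counted in bytes (Buffer semantics).
const byteOffset = bytes.indexOf('world'); // 14

// Lengths differ in the same way.
const charLength = text.length; // 13
const byteLength = Buffer.byteLength(text, 'utf8'); // 19

// Converting a character offset to a byte offset: measure the encoded prefix.
const converted = Buffer.byteLength(text.slice(0, charOffset), 'utf8'); // 14
```

Code that mixes string and Buffer inputs needs a conversion like the last line whenever it compares offsets across the two representations.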
+
+### RE2.Set (set.cc)
+
+Multi-pattern matching using `re2::RE2::Set`:
+
+- `new RE2.Set(patterns, flags?, options?)` — compile multiple patterns into a single automaton.
+- `set.test(str)` — returns `true` if any pattern matches.
+- `set.match(str)` — returns array of indices of matching patterns.
+- Properties: `size`, `source`, `sources`, `flags`, `anchor`.
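
The return values of `test` and `match` can be mimicked in plain JavaScript. The `NaiveSet` class below is a hypothetical stand-in, not the addon's API: it loops over built-in `RegExp` objects one by one, whereas `RE2.Set` is described above as compiling all patterns into a single automaton. It also implements only `size` and `sources` from the property list.

```javascript
// Hypothetical stand-in that reproduces the described return values only.
class NaiveSet {
  constructor(patterns, flags = '') {
    // Avoid the 'g' and 'y' flags here: they make RegExp#test stateful.
    this.sources = patterns.map(String);
    this.regexps = this.sources.map(source => new RegExp(source, flags));
  }

  get size() {
    return this.regexps.length;
  }

  // Returns true if any pattern matches.
  test(str) {
    return this.regexps.some(re => re.test(str));
  }

  // Returns the indices of all matching patterns, in pattern order.
  match(str) {
    const hits = [];
    this.regexps.forEach((re, index) => {
      if (re.test(str)) hits.push(index);
    });
    return hits;
  }
}

const routes = new NaiveSet(['^/users/\\d+$', '^/posts/\\d+$', '\\d+$']);
const any = routes.test('/posts/42'); // true
const which = routes.match('/posts/42'); // [1, 2]
const none = routes.match('/about'); // []
```

The loop costs one scan per pattern, which is the overhead a single combined automaton is meant to avoid.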
+
+### Build system
+
+- `binding.gyp` defines the node-gyp build: compiles all `.cc` files in `lib/` plus vendored RE2 and Abseil sources.
+- Platform-specific compiler flags are set for GCC, Clang, and MSVC.
+- The `install` npm script first tries to download a prebuilt `re2.node` from GitHub Releases via `install-artifact-from-github`, falling back to a local `node-gyp rebuild`.
+- Prebuilt artifacts cover: Linux (x64, arm64, Alpine/musl), macOS (x64, arm64), Windows (x64, arm64).
+
+## Module dependency graph
+
+```
+re2.js ──→ build/Release/re2.node (compiled C++ addon)
+│
+├── lib/addon.cc (init)
+│   ├── lib/new.cc ──→ lib/pattern.cc
+│   ├── lib/exec.cc
+│   ├── lib/test.cc
+│   ├── lib/match.cc
+│   ├── lib/replace.cc
+│   ├── lib/search.cc
+│   ├── lib/split.cc
+│   ├── lib/to_string.cc
+│   ├── lib/accessors.cc
+│   └── lib/set.cc
+│
+├── lib/wrapped_re2.h (shared class definition)
+├── lib/wrapped_re2_set.h (RE2.Set class)
+├── lib/util.cc / lib/util.h (shared utilities)
+│
+└── vendor/ (re2 + abseil-cpp)
+```
+
+## Testing
+
+- **Framework**: tape-six (`tape6`)
+- **Run all**: `npm test` (worker threads via `tape6 --flags FO`)
+- **Run sequential**: `npm run test:seq`
+- **Run multi-process**: `npm run test:proc`
+- **Run single file**: `node tests/test-<name>.mjs`
+- **TypeScript check**: `npm run ts-check`
+- **Lint**: `npm run lint` (Prettier check)
+- **Lint fix**: `npm run lint:fix` (Prettier write)
+- **Verify build**: `npm run verify-build` (quick smoke test)
+
+## Import paths
+
+```js
+// CommonJS (source, scripts)
+const RE2 = require('re2');
+
+// ESM (tests)
+import {RE2} from '../re2.js';
+```
package/LICENSE
CHANGED
@@ -7,7 +7,7 @@ The text of the BSD license is reproduced below.
 The "New" BSD License:
 **********************
 
-Copyright (c) 2005-
+Copyright (c) 2005-2026, Eugene Lazutkin
 All rights reserved.
 
 Redistribution and use in source and binary forms, with or without