re2 1.23.3 → 1.24.1

package/AGENTS.md ADDED
@@ -0,0 +1,131 @@
+ # AGENTS.md — node-re2
+
+ > `node-re2` provides Node.js bindings for [RE2](https://github.com/google/re2): a fast, safe alternative to backtracking regular expression engines. The npm package name is `re2`. It is a C++ native addon built with `node-gyp` and `nan`.
+
+ For project structure, module dependencies, and the architecture overview see [ARCHITECTURE.md](./ARCHITECTURE.md).
+ For detailed usage docs see the [README](./README.md) and the [wiki](https://github.com/uhop/node-re2/wiki).
+
+ ## Setup
+
+ This project uses git submodules for vendored dependencies (RE2 and Abseil):
+
+ ```bash
+ git clone --recursive https://github.com/uhop/node-re2.git
+ cd node-re2
+ npm install
+ ```
+
+ If the native addon fails to download a prebuilt artifact, it builds locally via `node-gyp`.
+
+ ## Commands
+
+ - **Install:** `npm install` (downloads prebuilt artifact or builds from source)
+ - **Build (release):** `npm run rebuild` (or `node-gyp -j max rebuild`)
+ - **Build (debug):** `npm run rebuild:dev` (or `node-gyp -j max rebuild --debug`)
+ - **Test:** `npm test` (runs `tape6 --flags FO`, worker threads)
+ - **Test (sequential):** `npm run test:seq`
+ - **Test (multi-process):** `npm run test:proc`
+ - **Test (single file):** `node tests/test-<name>.mjs`
+ - **TypeScript check:** `npm run ts-check`
+ - **Lint:** `npm run lint` (Prettier check)
+ - **Lint fix:** `npm run lint:fix` (Prettier write)
+ - **Verify build:** `npm run verify-build`
+
+ ## Project structure
+
+ ```
+ node-re2/
+ ├── package.json # Package config; "tape6" section configures test discovery
+ ├── binding.gyp # node-gyp build configuration for the C++ addon
+ ├── re2.js # Main entry point: loads native addon, sets up Symbol aliases
+ ├── re2.d.ts # TypeScript declarations for the public API
+ ├── tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
+ ├── lib/ # C++ source code (native addon)
+ │ ├── addon.cc # Node.js addon initialization, method registration
+ │ ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
+ │ ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
+ │ ├── isolate_data.h # Per-isolate data struct for thread-safe addon state
+ │ ├── new.cc # Constructor: parse pattern/flags, create RE2 instance
+ │ ├── exec.cc # RE2.prototype.exec() implementation
+ │ ├── test.cc # RE2.prototype.test() implementation
+ │ ├── match.cc # RE2.prototype.match() implementation
+ │ ├── replace.cc # RE2.prototype.replace() implementation
+ │ ├── search.cc # RE2.prototype.search() implementation
+ │ ├── split.cc # RE2.prototype.split() implementation
+ │ ├── to_string.cc # RE2.prototype.toString() implementation
+ │ ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
+ │ ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
+ │ ├── set.cc # RE2.Set implementation (multi-pattern matching)
+ │ ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
+ │ ├── util.h # Utility declarations
+ │ └── pattern.h # Pattern translation declarations
+ ├── scripts/
+ │ └── verify-build.js # Quick smoke test for the built addon
+ ├── tests/ # Test files (test-*.mjs using tape-six)
+ ├── ts-tests/ # TypeScript type-checking tests
+ │ └── test-types.ts # Verifies type declarations compile correctly
+ ├── bench/ # Benchmarks
+ ├── vendor/ # Vendored C++ dependencies (git submodules)
+ │ ├── re2/ # Google RE2 library source
+ │ └── abseil-cpp/ # Abseil C++ library (RE2 dependency)
+ └── .github/ # CI workflows, Dependabot config, actions
+ ```
+
+ ## Code style
+
+ - **CommonJS** throughout (`"type": "commonjs"` in package.json).
+ - **No transpilation** — JavaScript code runs directly.
+ - **C++ code** uses tabs for indentation, 4-wide. JavaScript uses 2-space indentation.
+ - **Prettier** for JS/TS formatting (see `.prettierrc`): 80 char width, single quotes, no bracket spacing, no trailing commas, arrow parens "avoid".
+ - **nan** (Native Abstractions for Node.js) for the C++ addon API.
+ - Semicolons are enforced by Prettier (default `semi: true`).
+ - Imports use `require()` syntax in source, `import` in tests (`.mjs`).
+
+ ## Critical rules
+
+ - **Do not modify vendored code.** Never edit files under `vendor/`. They are git submodules.
+ - **Do not modify or delete test expectations** without understanding why they changed.
+ - **Do not add comments or remove comments** unless explicitly asked.
+ - **Keep `re2.js` and `re2.d.ts` in sync.** All public API exposed from `re2.js` must be typed in `re2.d.ts`.
+ - **The addon must build on all supported platforms:** Linux (x64, arm64, Alpine), macOS (x64, arm64), Windows (x64, arm64).
+ - **RE2 is always Unicode-mode.** The `u` flag is always added implicitly.
+ - **Buffer support is a first-class feature.** All methods that accept strings must also accept Buffers, returning Buffers when given Buffer input.
+
+ ## Architecture
+
+ - `re2.js` is the main entry point. It loads the native C++ addon from `build/Release/re2.node` and sets up `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` on the prototype.
+ - The C++ addon (`lib/*.cc`) wraps Google's RE2 library via nan. Each RegExp method has its own `.cc` file.
+ - `lib/new.cc` handles construction: parsing patterns, translating RegExp syntax to RE2 syntax (via `lib/pattern.cc`), and creating the underlying `re2::RE2` instance.
+ - `lib/pattern.cc` translates JavaScript RegExp features to RE2 equivalents, including Unicode class names (`\p{Letter}` → `\p{L}`, `\p{Script=Latin}` → `\p{Latin}`).
+ - `lib/set.cc` implements `RE2.Set` for multi-pattern matching using `re2::RE2::Set`.
+ - `lib/util.cc` provides UTF-8 ↔ UTF-16 conversion helpers and buffer utilities.
+ - Prebuilt native artifacts are hosted on GitHub Releases and downloaded at install time via `install-artifact-from-github`.
+
104
+ ## Writing tests
105
+
106
+ ```js
107
+ import test from 'tape-six';
108
+ import {RE2} from '../re2.js';
109
+
110
+ test('example', t => {
111
+ const re = new RE2('a(b*)', 'i');
112
+ const result = re.exec('aBbC');
113
+ t.ok(result);
114
+ t.equal(result[0], 'aBb');
115
+ t.equal(result[1], 'Bb');
116
+ });
117
+ ```
118
+
119
+ - Test files use `tape-six`: `.mjs` for runtime tests, `.ts` for TypeScript typing tests.
120
+ - Test file naming convention: `test-*.mjs` in `tests/`, `test-*.ts` in `ts-tests/`.
121
+ - Tests are configured in `package.json` under the `"tape6"` section.
122
+ - Test files should be directly executable: `node tests/test-foo.mjs`.
123
+
+ ## Key conventions
+
+ - The library is a drop-in replacement for `RegExp` — the `RE2` object emulates the standard `RegExp` API.
+ - `RE2.Set` provides multi-pattern matching: `new RE2.Set(patterns, flags, options)`.
+ - Static helpers: `RE2.getUtf8Length(str)`, `RE2.getUtf16Length(buf)`.
+ - `RE2.unicodeWarningLevel` controls behavior when non-Unicode regexps are created.
+ - The `install` script tries to download a prebuilt `.node` artifact before falling back to `node-gyp rebuild`.
+ - All C++ source is in `lib/`, all vendored third-party C++ is in `vendor/`.
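
Reviewer note: the static helpers `RE2.getUtf8Length(str)` and `RE2.getUtf16Length(buf)` listed under "Key conventions" are not described further in this diff. The sketch below assumes the first reports how many bytes a string occupies as UTF-8 and the second reports how many UTF-16 code units a UTF-8 buffer decodes to. It relies only on Node.js built-ins as stand-ins; the names `utf8LengthOf` and `utf16LengthOf` are hypothetical and are not part of the package.

```javascript
// Stand-ins built on Node.js core APIs; NOT the addon's implementation.

// Number of bytes needed to store `str` as UTF-8.
function utf8LengthOf(str) {
  return Buffer.byteLength(str, 'utf8');
}

// Number of UTF-16 code units produced by decoding a UTF-8 buffer.
function utf16LengthOf(buf) {
  return buf.toString('utf8').length;
}

// "naive" with U+00EF, a space, and one emoji outside the BMP (U+1F600).
const text = 'na\u00EFve \u{1F600}';
const bytes = Buffer.from(text, 'utf8');

console.log(text.length); // UTF-16 code units in the string
console.log(utf8LengthOf(text)); // bytes after UTF-8 encoding
console.log(utf16LengthOf(bytes)); // code units after decoding the buffer
```

The two counts differ as soon as the text leaves ASCII, which is why a size computed for a string cannot be reused for the equivalent Buffer.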
package/ARCHITECTURE.md ADDED
@@ -0,0 +1,152 @@
+ # Architecture
+
+ `node-re2` provides Node.js bindings for Google's [RE2](https://github.com/google/re2) regular expression engine. It is a C++ native addon built with `node-gyp` and `nan`. The `RE2` object is a drop-in replacement for `RegExp` with guaranteed linear-time matching (no ReDoS).
+
+ ## Project layout
+
+ ```
+ package.json # Package config; "tape6" section configures test discovery
+ binding.gyp # node-gyp build configuration for the C++ addon
+ re2.js # Main entry point: loads native addon, sets up Symbol aliases
+ re2.d.ts # TypeScript declarations for the public API
+ tsconfig.json # TypeScript config (noEmit, strict, types: ["node"])
+ lib/ # C++ source code (native addon)
+ ├── addon.cc # Node.js addon initialization, method registration
+ ├── wrapped_re2.h # WrappedRE2 class definition (core C++ wrapper)
+ ├── wrapped_re2_set.h # WrappedRE2Set class definition (RE2.Set wrapper)
+ ├── isolate_data.h # Per-isolate data struct for thread-safe addon state
+ ├── new.cc # Constructor: parse pattern/flags, create RE2 instance
+ ├── exec.cc # RE2.prototype.exec() implementation
+ ├── test.cc # RE2.prototype.test() implementation
+ ├── match.cc # RE2.prototype.match() implementation
+ ├── replace.cc # RE2.prototype.replace() implementation
+ ├── search.cc # RE2.prototype.search() implementation
+ ├── split.cc # RE2.prototype.split() implementation
+ ├── to_string.cc # RE2.prototype.toString() implementation
+ ├── accessors.cc # Property accessors (source, flags, lastIndex, etc.)
+ ├── pattern.cc # Pattern translation (RegExp → RE2 syntax, Unicode classes)
+ ├── pattern.h # Pattern translation declarations
+ ├── set.cc # RE2.Set implementation (multi-pattern matching)
+ ├── util.cc # Shared utilities (UTF-8/UTF-16 conversion, buffer helpers)
+ └── util.h # Utility declarations
+ scripts/
+ └── verify-build.js # Quick smoke test for the built addon
+ tests/ # Test files (test-*.mjs using tape-six)
+ ts-tests/ # TypeScript type-checking tests
+ └── test-types.ts # Verifies type declarations compile correctly
+ bench/ # Benchmarks
+ vendor/ # Vendored C++ dependencies (git submodules) — DO NOT MODIFY
+ ├── re2/ # Google RE2 library source
+ └── abseil-cpp/ # Abseil C++ library (RE2 dependency)
+ .github/ # CI workflows, Dependabot config, actions
+ ```
+
+ ## Core concepts
+
+ ### How the addon works
+
+ 1. `re2.js` is the entry point. It loads the compiled C++ addon from `build/Release/re2.node`.
+ 2. The addon exposes an `RE2` constructor that wraps `re2::RE2` from Google's RE2 library.
+ 3. `re2.js` adds `Symbol.match`, `Symbol.search`, `Symbol.replace`, `Symbol.split`, and `Symbol.matchAll` to the prototype so `RE2` instances work with ES6 string methods.
+ 4. The `RE2` constructor can be called with or without `new` (factory mode).
+
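
Reviewer note: steps 3 and 4 above rely on two standard JavaScript mechanisms, the well-known `Symbol.*` hooks that string methods dispatch to, and `new.target` for constructors that also work as factories. The sketch below illustrates both with a hypothetical `Matcher` type that delegates to a built-in `RegExp` so it runs without the native addon. It is not the code in `re2.js`, which is not shown in this diff.

```javascript
// Hypothetical stand-in for a non-RegExp matcher object.
function Matcher(pattern, flags) {
  // Factory mode: a call without `new` still returns an instance.
  if (!new.target) return new Matcher(pattern, flags);
  this.inner = new RegExp(pattern, flags);
}

// 'str'.match(x) calls x[Symbol.match]('str') when that method exists.
Matcher.prototype[Symbol.match] = function (str) {
  return this.inner[Symbol.match](str);
};

// 'str'.search(x) calls x[Symbol.search]('str').
Matcher.prototype[Symbol.search] = function (str) {
  return this.inner[Symbol.search](str);
};

// 'str'.replace(x, r) calls x[Symbol.replace]('str', r).
Matcher.prototype[Symbol.replace] = function (str, replacement) {
  return this.inner[Symbol.replace](str, replacement);
};

const viaNew = new Matcher('a(b*)', 'i');
const viaFactory = Matcher('a(b*)', 'i');

console.log('xaBbC'.match(viaNew)[1]); // first capture group
console.log('xaBbC'.search(viaFactory)); // index of the first match
console.log('xaBbC'.replace(viaNew, '-')); // input with the match replaced
```

Because the string methods look up the hooks on the argument, any object that defines them can be passed where a `RegExp` is expected.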
+ ### C++ addon structure
+
+ Each RegExp method has its own `.cc` file for maintainability:
+
+ | File | Purpose |
+ | --------------- | ---------------------------------------------------------------- |
+ | `addon.cc` | Node.js module initialization, registers all methods/accessors |
+ | `isolate_data.h` | Per-isolate data struct (`AddonData`) for thread-safe addon state |
+ | `wrapped_re2.h` | `WrappedRE2` class: holds `re2::RE2*`, flags, lastIndex, source |
+ | `new.cc` | Constructor: parses pattern + flags, translates syntax, creates RE2 instance |
+ | `exec.cc` | `exec()` — find match with capture groups |
+ | `test.cc` | `test()` — boolean match check |
+ | `match.cc` | `match()` — String.prototype.match equivalent |
+ | `replace.cc` | `replace()` — substitution with string or function replacer |
+ | `search.cc` | `search()` — find index of first match |
+ | `split.cc` | `split()` — split string by pattern |
+ | `to_string.cc` | `toString()` — `/pattern/flags` representation |
+ | `accessors.cc` | Property getters: `source`, `flags`, `lastIndex`, `global`, `ignoreCase`, `multiline`, `dotAll`, `unicode`, `sticky`, `hasIndices`, `internalSource` |
+ | `pattern.cc` | Translates JS RegExp syntax to RE2 syntax, maps Unicode property names |
+ | `set.cc` | `RE2.Set` — multi-pattern matching via `re2::RE2::Set` |
+ | `util.cc` | UTF-8 ↔ UTF-16 conversion, buffer/string helpers |
+
+ ### Pattern translation (pattern.cc)
+
+ JavaScript RegExp features are translated to RE2 equivalents:
+
+ - Named groups: `(?<name>...)` syntax is preserved (RE2 supports it natively).
+ - Unicode classes: long names like `\p{Letter}` are mapped to short names `\p{L}`. Script names like `\p{Script=Latin}` are mapped to `\p{Latin}`.
+ - Backreferences and lookahead assertions are **not supported** — RE2 throws `SyntaxError`.
+
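
Reviewer note: the Unicode class mapping described above can be pictured as a table lookup over `\p{...}` escapes. The sketch below is a JavaScript illustration under stated assumptions: the function name `translateUnicodeClasses` and the three-entry table are hypothetical, the real logic lives in C++ in `lib/pattern.cc` (not shown in this diff), and the simple scan ignores corner cases such as an escaped backslash in front of `p`.

```javascript
// Illustrative subset of Unicode General_Category long names and their
// standard short aliases. The addon's actual table is not shown in this diff.
const LONG_TO_SHORT = new Map([
  ['Letter', 'L'],
  ['Number', 'N'],
  ['Punctuation', 'P']
]);

const SCRIPT_PREFIX = 'Script=';

// Rewrite \p{Long} / \P{Long} to the short form and drop the "Script=" prefix.
function translateUnicodeClasses(source) {
  return source.replace(/\\([pP])\{([^}]+)\}/g, (whole, kind, name) => {
    if (name.startsWith(SCRIPT_PREFIX)) {
      return `\\${kind}{${name.slice(SCRIPT_PREFIX.length)}}`;
    }
    const short = LONG_TO_SHORT.get(name);
    // Unknown names pass through unchanged.
    return short ? `\\${kind}{${short}}` : whole;
  });
}

console.log(translateUnicodeClasses('\\p{Letter}+\\P{Number}'));
console.log(translateUnicodeClasses('\\p{Script=Latin}*'));
```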
+ ### Buffer support
+
+ All methods accept both strings and Node.js Buffers:
+
+ - Buffer inputs are assumed UTF-8 encoded.
+ - Buffer inputs produce Buffer outputs (in composite result objects too).
+ - Offsets and lengths are in bytes (not characters) when using Buffers.
+ - The `useBuffers` property on replacer functions controls offset reporting in `replace()`.
+
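
Reviewer note: the byte-versus-character distinction above only shows up with non-ASCII input. The sketch below uses Node.js core `Buffer` methods alone (not the addon) to show how the same position is reported differently for a string and for its UTF-8 Buffer.

```javascript
// "hello world" with two accented letters, each 2 bytes long in UTF-8.
const text = 'h\u00E9llo w\u00F6rld';
const bytes = Buffer.from(text, 'utf8');

// Position of "w" counted in UTF-16 code units (string view).
const charOffset = text.indexOf('w');

// Position of "w" counted in bytes (Buffer view).
const byteOffset = bytes.indexOf('w', 0, 'utf8');

console.log(text.length, bytes.length); // string length vs. byte length
console.log(charOffset, byteOffset); // the offsets differ by one here

// A byte offset must be applied to the Buffer, not to the string.
console.log(bytes.subarray(byteOffset).toString('utf8'));
```

Mixing the two views, for example slicing the string with a byte offset, silently yields the wrong substring once multi-byte characters precede the match.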
+ ### RE2.Set (set.cc)
+
+ Multi-pattern matching using `re2::RE2::Set`:
+
+ - `new RE2.Set(patterns, flags?, options?)` — compile multiple patterns into a single automaton.
+ - `set.test(str)` — returns `true` if any pattern matches.
+ - `set.match(str)` — returns array of indices of matching patterns.
+ - Properties: `size`, `source`, `sources`, `flags`, `anchor`.
+
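
Reviewer note: the `test` and `match` semantics listed above can be modelled in plain JavaScript. The `PatternSet` class below is a hypothetical stand-in that loops over built-in `RegExp` objects; it does not compile the patterns into a single automaton as the text says `RE2.Set` does, so it reproduces only the return values, not the performance characteristics. The `options` argument and the `anchor` property are left out because this diff does not describe them.

```javascript
// Hypothetical model of the described behavior; NOT the addon's implementation.
class PatternSet {
  constructor(patterns, flags = '') {
    // Avoid the "g" and "y" flags here: they make RegExp.prototype.test stateful.
    this.regexps = patterns.map(p => new RegExp(p, flags));
  }

  // Number of patterns in the set.
  get size() {
    return this.regexps.length;
  }

  // true if at least one pattern matches.
  test(str) {
    return this.regexps.some(re => re.test(str));
  }

  // Indices of every pattern that matches, in ascending order.
  match(str) {
    const hits = [];
    this.regexps.forEach((re, index) => {
      if (re.test(str)) hits.push(index);
    });
    return hits;
  }
}

const routes = new PatternSet(['^/api/', '\\.json$', '^/static/']);

console.log(routes.size);
console.log(routes.test('/home'));
console.log(routes.match('/api/users.json'));
```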
+ ### Build system
+
+ - `binding.gyp` defines the node-gyp build: compiles all `.cc` files in `lib/` plus vendored RE2 and Abseil sources.
+ - Platform-specific compiler flags are set for GCC, Clang, and MSVC.
+ - The `install` npm script first tries to download a prebuilt `re2.node` from GitHub Releases via `install-artifact-from-github`, falling back to a local `node-gyp rebuild`.
+ - Prebuilt artifacts cover: Linux (x64, arm64, Alpine/musl), macOS (x64, arm64), Windows (x64, arm64).
+
+ ## Module dependency graph
+
+ ```
+ re2.js ──→ build/Release/re2.node (compiled C++ addon)
+
+ ├── lib/addon.cc (init)
+ │ ├── lib/new.cc ──→ lib/pattern.cc
+ │ ├── lib/exec.cc
+ │ ├── lib/test.cc
+ │ ├── lib/match.cc
+ │ ├── lib/replace.cc
+ │ ├── lib/search.cc
+ │ ├── lib/split.cc
+ │ ├── lib/to_string.cc
+ │ ├── lib/accessors.cc
+ │ └── lib/set.cc
+
+ ├── lib/wrapped_re2.h (shared class definition)
+ ├── lib/wrapped_re2_set.h (RE2.Set class)
+ ├── lib/util.cc / lib/util.h (shared utilities)
+
+ └── vendor/ (re2 + abseil-cpp)
+ ```
+
+ ## Testing
+
+ - **Framework**: tape-six (`tape6`)
+ - **Run all**: `npm test` (worker threads via `tape6 --flags FO`)
+ - **Run sequential**: `npm run test:seq`
+ - **Run multi-process**: `npm run test:proc`
+ - **Run single file**: `node tests/test-<name>.mjs`
+ - **TypeScript check**: `npm run ts-check`
+ - **Lint**: `npm run lint` (Prettier check)
+ - **Lint fix**: `npm run lint:fix` (Prettier write)
+ - **Verify build**: `npm run verify-build` (quick smoke test)
+
+ ## Import paths
+
+ ```js
+ // CommonJS (source, scripts)
+ const RE2 = require('re2');
+
+ // ESM (tests)
+ import {RE2} from '../re2.js';
+ ```
package/LICENSE CHANGED
@@ -7,7 +7,7 @@ The text of the BSD license is reproduced below.
  The "New" BSD License:
  **********************
 
- Copyright (c) 2005-2025, Eugene Lazutkin
+ Copyright (c) 2005-2026, Eugene Lazutkin
  All rights reserved.
 
  Redistribution and use in source and binary forms, with or without