easy-regex-lib 0.1.0
- package/README.md +546 -0
- package/dist/index.cjs +1037 -0
- package/dist/index.d.cts +308 -0
- package/dist/index.d.ts +308 -0
- package/dist/index.js +972 -0
- package/package.json +39 -0
package/README.md
ADDED
|
@@ -0,0 +1,546 @@
|
# regex-lib

**Semantic patterns → real regex.** Build composable, type-friendly patterns from primitives (digits, letters, anchors, groups), compile them to JavaScript `RegExp`, and ship extras like plain-English explanations, JSON serialization, and lightweight safety hints.

This library is **AST-first**: everything compiles from an immutable pattern tree. The fluent API is sugar over that tree, so the same structure can power explanations, diagnostics, and tooling later.

---
## Installation

```bash
npm install regex-lib
```

Requires a modern JavaScript runtime with support for `RegExp` named capture groups if you use `.named()` / `namedGroup()` (Node 10+, modern browsers).

---
## Regex coverage (evaluation)

**`regex-lib` does not model “all of regex.”** It models a **structured subset** that maps cleanly to an AST and JavaScript `RegExp`, plus an escape hatch.

| Area | Status in this library | Notes |
| --- | --- | --- |
| Concatenation, alternation (`\|`) | Supported | Via `seq`, fluent `.take`, `alt`. Compiler inserts `(?:…)` when precedence requires it. |
| Quantifiers `?`, `*`, `+`, `{n}`, `{n,}`, `{n,m}` | Supported | Greedy by default. Lazy via `repeat(..., false)` / `optional(..., false)`. Helpers like `digit().oneOrMore()` are **greedy** only; use `repeat` for lazy. |
| Literal text | Supported | `literal()`, `.text()` / `.literal()` on builder; metacharacters escaped on emit. |
| `^` / `$` | Supported | `start()`, `end()`, `.start()`, `.end()`. Meaning follows `m` (multiline) flag like normal JS. |
| `.` (dot) | Supported | `anyChar()` → `.`; with `dotAll` (`s`) matches line breaks like JS. |
| `\d`, `\w`, `\s` | Supported | `digit()`, `word()`, `whitespace()` (Unicode variants set node flags; digit/word still use JS shorthands, and the engine defines the exact sets when `u` is on). |
| Letters (ASCII / Unicode) | Partially structured | ASCII: character classes. Unicode: `\p{L}`, `\p{Ll}`, `\p{Lu}` when `letter({ unicode: true })`. Arbitrary `\p{…}` is **not** first-class; use `raw()`. |
| Hex digits | Supported | `hexDigit()` → `[0-9A-Fa-f]` (or uppercase-only variant). |
| Non-capturing groups `(?:…)` | Supported | `nonCapturing(child)`. |
| Named captures `(?<name>…)` | Supported | `namedGroup`, `.named()`. |
| Numbered captures `(…)` | **Not** as a dedicated node | Use `raw()` if you truly need `(foo)` numbering, or prefer named groups. |
| Custom bracket expressions `[…]`, `[^…]` | **Not** first-class | Compose with `alt`/`literal` for tiny sets; otherwise `raw("[a-z_]+")`. |
| Word boundary `\b` | Supported | `wordBoundary()`, `.boundary()`. |
| Flags `g`, `i`, `m`, `s`, `u`, `y` | Supported | Pass via `compile` / `compilePattern` `flags`; some nodes **infer** `u` (Unicode letters). |
| Lookahead / lookbehind | **Not** first-class | `(?=…)`, `(?!…)`, `(?<=…)`, `(?<!…)` → `raw()` (subject to JS engine rules). |
| Backreferences `\1`, `\k<name>` | **Not** first-class | `raw()`. |
| Atomic groups, possessive quantifiers | **Not** supported | JS `RegExp` has no native possessive/atomic syntax; `raw()` only if you target another engine in the future. |
| Comments, verbose mode, recursion, conditionals | **Not** supported | PCRE-only or other-engine features → `raw()` or another tool. |
| “Full” RFC validators (email, URL, …) | **Not** guaranteed | Use presets or domain-specific logic; regex alone is rarely sufficient for specs. |

**Practical takeaway:** use **`regex-lib`** for readable **structured** patterns and tooling (explain / serialize). When you need arbitrary regex power, use **`raw()`** sparingly and treat it as **unsafe-by-default** (warnings + manual ReDoS review). You are still bound by **JavaScript’s** regex flavor regardless.

---
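The greedy/lazy distinction in the quantifier row is ordinary JavaScript `RegExp` behavior, independent of this library. A minimal sketch using native regex literals only (the sample string and variable names are illustrative, and no `regex-lib` calls are assumed):

```ts
// Native RegExp only; nothing here comes from regex-lib.
const sampleHtml = "<b>one</b><b>two</b>";

// Greedy: `.+` consumes as much as it can, so the match runs to the last ">".
const greedyMatch = /<.+>/.exec(sampleHtml)?.[0];

// Lazy: `.+?` stops at the first position where the rest of the pattern matches.
const lazyMatch = /<.+?>/.exec(sampleHtml)?.[0];

console.log(greedyMatch); // "<b>one</b><b>two</b>"
console.log(lazyMatch); // "<b>"
```

The lazy form is what the table's `repeat(..., false)` / `optional(..., false)` variants are described as emitting (`+?`, `*?`, `??`).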
## Basic examples (start here)

### Digits only (whole string)

```ts
import { digit, regex } from "regex-lib";

const onlyDigits = regex().start().take(digit().oneOrMore()).end().compile();
onlyDigits.test("90210"); // true
onlyDigits.test("42a"); // false
```

**Explanation:** `digit()` is one “atom”; `.oneOrMore()` turns it into `\d+`. Wrapping with `.start()` / `.end()` makes the match cover the **entire** string (same idea as `^\d+$`).

---
### Literal text + optional digits

```ts
import { compile, digit, literal, optional, seq } from "regex-lib";

const ast = seq(literal("id:"), optional(digit().oneOrMore()));
const p = compile(ast);
p.test("id:"); // true
p.test("id:99"); // true
```

**Explanation:** `literal("id:")` emits escaped text. `optional(…)` is the regex `?` applied to `\d+`, so digits are **allowed but not required** after the prefix.

---
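To make "emits escaped text" concrete, here is a minimal sketch of the kind of escaping a literal node needs. The helper `escapeLiteralText` is hypothetical and written for illustration; it is not taken from the library's source.

```ts
// Hypothetical helper: backslash-escape every character that is special in a JS regex.
function escapeLiteralText(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const escapedPrice = escapeLiteralText("$1.50 (approx)");
console.log(escapedPrice); // \$1\.50 \(approx\)

// The escaped text now matches only itself, character for character.
const priceRegex = new RegExp(`^${escapedPrice}$`);
console.log(priceRegex.test("$1.50 (approx)")); // true
console.log(priceRegex.test("$1x50 (approx)")); // false ("." is no longer a wildcard)
```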
### Either “yes” or “no” (case-sensitive)

```ts
import { alt, compile, literal } from "regex-lib";

const p = compile(alt(literal("yes"), literal("no")));
p.test("yes"); // true
p.test("YES"); // false
```

**Explanation:** `alt` is alternation (`|`). There are no anchors here, so this regex **matches substrings** anywhere unless you anchor it with `.start()` / `.end()` (that is, `^…$`).

---
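The substring behavior described above is easy to check with native regex literals. This sketch does not call the library; the two patterns are hand-written stand-ins for the unanchored and anchored forms.

```ts
// Hand-written equivalents; the grouped form is what an anchored alternation needs.
const unanchoredYesNo = /yes|no/;
const anchoredYesNo = /^(?:yes|no)$/;

console.log(unanchoredYesNo.test("I said no thanks")); // true (matches the "no" substring)
console.log(anchoredYesNo.test("I said no thanks")); // false
console.log(anchoredYesNo.test("no")); // true
```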
### Case-insensitive match (flag)

```ts
import { regex } from "regex-lib";

const p = regex()
  .start()
  .literal("hello")
  .end()
  .compile({ flags: { ignoreCase: true } });

p.test("HELLO"); // true
```

**Explanation:** Flags are **not** magic syntax; they’re normal `RegExp` flags. `ignoreCase: true` adds `i`.

---
### One digit **or** one letter

```ts
import { alt, compile, digit, letter } from "regex-lib";

const p = compile(alt(digit(), letter({ case: "both" })));
p.test("5"); // true
p.test("a"); // true
p.test("ab"); // true: the pattern is unanchored, so it matches the single "a" inside "ab"
```

**Explanation:** `alt` chooses between two **single-character** alternatives here, and each match consumes exactly one character. To reject longer inputs, anchor the pattern with `start()` / `end()`. To allow longer runs, quantify the alternatives, e.g. `digit().oneOrMore()` or `letter().oneOrMore()`.

---
### Single digit shortcut on the fluent builder

```ts
import { regex } from "regex-lib";

const p = regex().start().digit().end().compile();
p.test("7"); // true
p.test("77"); // false
```

**Explanation:** `.digit()` appends **exactly one** `\d` (not “some digits”). Use `.take(digit().oneOrMore())` when you need repetition.

---
## Example 1: Fluent builder (readable “starts with / then / ends with”)

```ts
import { digit, letter, regex } from "regex-lib";

const pattern = regex()
  .start()
  .take(digit().exactly(3))
  .dash()
  .take(letter({ case: "upper" }).oneOrMore())
  .end()
  .compile();

pattern.source; // ^\d{3}-[A-Z]+$
pattern.test("123-ABC"); // true
pattern.test("123-ab"); // false
```

**What this example shows**

- **`regex()` / `match()`**: Start a fluent `MatchBuilder`. Both names behave the same; pick whichever reads better in your codebase.
- **`.start()` / `.end()`**: Map to `^` and `$` so the match is anchored to the whole string (unless you change flags or omit anchors).
- **`.take(...)`**: Accepts any composed **pattern** (AST node). Here we pass quantified primitives.
- **`digit().exactly(3)`**: Three digits. Methods like `.oneOrMore()`, `.maybe()`, `.between(min, max)` work the same way on other primitives (`letter`, `hexDigit`, …).
- **`.dash()`**: Inserts a literal `-` (escaped correctly at compile time).
- **`.compile()`**: Produces a **`CompiledPattern`**, a wrapper around the AST with `.test()`, `.toRegExp()`, `.explain()`, etc.

**Why it helps**

You express **intent** (“three digits, a dash, then uppercase letters”) without manually juggling escapes, quantifier precedence, or accidental capturing groups.

---
## Example 2: Functional composition (`seq`, `alt`, `literal`)

```ts
import { alt, compile, digit, letter, literal, seq } from "regex-lib";

const ast = seq(
  literal("sku_"),
  digit().between(2, 5),
  literal("-"),
  alt(letter({ case: "upper" }), digit()),
);

const p = compile(ast);
p.test("sku_999-A"); // true
```

**What this example shows**

- **`seq(...)`**: Concatenates fragments left-to-right (like implicit concatenation in regex).
- **`alt(...)`**: Alternation (`|`). The compiler adds non-capturing groups when needed so precedence stays correct (for example `(?:a|b)c`, not `a|bc`).
- **`literal("...")`**: Plain text with metacharacters escaped on emit.
- **`compile(ast)`**: Same end state as `.compile()` on the builder: a **`CompiledPattern`** from an arbitrary AST.

**Why it helps**

Fluent chains are great for linear shapes; **`seq` / `alt`** shine when patterns are **data-driven** (built from config), recursive, or generated by your own helpers.

---
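The precedence point in the `alt(...)` bullet can be verified with native regex literals. The two patterns below are hand-written for illustration and are not output captured from the compiler.

```ts
// Alternation binds more loosely than concatenation (and than the anchors).
const grouped = /^(?:a|b)c$/; // "a" or "b", then "c"
const ungrouped = /^a|bc$/; // parsed as (^a) | (bc$)

console.log(grouped.test("ac")); // true
console.log(grouped.test("bc")); // true
console.log(grouped.test("a")); // false
console.log(ungrouped.test("a")); // true: the first branch is just ^a
console.log(ungrouped.test("axyz")); // true: still only the ^a branch
```

This is why a compiler that emits alternation inside a sequence has to wrap it in `(?:…)`.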
## Example 3: Named capture groups

```ts
import { digit, letter, match } from "regex-lib";

const p = match()
  .start()
  .named("id", digit().oneOrMore())
  .literal("@")
  .named("host", letter({ case: "lower" }).oneOrMore())
  .end()
  .compile();

const m = p.exec("42@example");
console.log(m?.groups?.id); // "42"
console.log(m?.groups?.host); // "example"
```

**What this example shows**

- **`.named(name, inner)`**: Emits `(?<name>…)` in modern JavaScript.
- **`CompiledPattern.exec`**: Delegates to `RegExp.prototype.exec`, so **`groups`** works like native regex.

**Why it helps**

Named groups document **what** each slice means (“id”, “host”) instead of counting `$1`, `$2`. Downstream code stays readable and refactor-safe.

---
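For comparison, here is the same match written against a native `RegExp`. The source string is an assumption about roughly what the builder above would emit; the README does not show the exact compiled source for this example.

```ts
// Assumed equivalent source; built with the constructor so it is plain string input.
const idAtHost = new RegExp("^(?<id>\\d+)@(?<host>[a-z]+)$");

const hit = idAtHost.exec("42@example");
console.log(hit?.groups?.id); // "42"
console.log(hit?.groups?.host); // "example"

// An uppercase letter falls outside [a-z], so the whole match fails.
console.log(idAtHost.exec("42@Example")); // null
```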
## Example 4: `compilePattern` vs `compile` (string + flags vs wrapper)

```ts
import { compilePattern, digit, regex } from "regex-lib";

const ast = regex().start().take(digit().oneOrMore()).end().build();

const { pattern, flags, warnings } = compilePattern(ast);

console.log(pattern); // ^\d+$
console.log(flags); // ""
console.log(warnings); // e.g. notes about raw fragments if you used `raw()`
```

**What this example shows**

- **`build()`**: Materializes the AST from a fluent builder **without** wrapping it.
- **`compilePattern(ast)`**: Low-level: returns **`pattern`** (body string), **`flags`** string, and **`warnings`** (for example when **`raw()`** fragments appear).

Use **`compile(ast)`** when you want a **`CompiledPattern`** API; use **`compilePattern`** when integrating with libraries that only accept a pattern string and flags (OpenAPI `pattern`, legacy validators, etc.).

---
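A minimal sketch of the interop case, assuming only the `{ pattern, flags, warnings }` shape shown above. The `CompileResultLike` interface and the hard-coded values are stand-ins so the snippet runs without the library.

```ts
// Stand-in for the documented result shape; the warnings element type is left open here.
interface CompileResultLike {
  pattern: string;
  flags: string;
  warnings: unknown[];
}

// Values copied from the example output above.
const compiled: CompileResultLike = { pattern: "^\\d+$", flags: "", warnings: [] };

// Option 1: rebuild a native RegExp from the two strings.
const rebuilt = new RegExp(compiled.pattern, compiled.flags);
console.log(rebuilt.test("90210")); // true

// Option 2: drop the pattern string into a JSON-Schema-style fragment.
const schemaFragment = { type: "string", pattern: compiled.pattern };
console.log(JSON.stringify(schemaFragment)); // {"type":"string","pattern":"^\\d+$"}
```

Schema formats that carry only a pattern string have no slot for flags, so a pattern that depends on `i` or `u` may need separate handling there.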
## Example 5: Plain-English explanation from the AST

```ts
import { explainPattern, literal, seq, digit } from "regex-lib";

const ast = seq(literal("user_"), digit().oneOrMore());

const { clauses, summary } = explainPattern(ast);

console.log(summary);
// Matches the literal "user_". Repeat one or more times: Matches an ASCII digit.

console.log(clauses.map((c) => c.text));
// [ 'Matches the literal "user_".',
//   'Repeat one or more times: Matches an ASCII digit.' ]
```

**What this example shows**

- **`explainPattern`** walks the **same AST** used for codegen and builds short natural-language clauses plus a **`summary`** string.
- **`clauses`**: Structured lines you could feed to a UI, docs generator, or logs.

**Why it helps**

Regex errors are notoriously opaque. Explanations tied to structure give humans (and support teams) a **second representation** that stays in sync with what actually ships.

---
## Example 6: Diagnostics when a string fails (full-string check)

```ts
import { diagnose, regex, digit } from "regex-lib";

const ast = regex().start().take(digit().exactly(3)).end().build();

console.log(diagnose(ast, "12"));
// { ok: false, index: 2, message: "...", expected: "..." }

console.log(diagnose(ast, "123"));
// { ok: true, match: "123", index: 0, groups: {} }
```

**What this example shows**

- **`diagnose`** first tries an **anchored full-string** match (`^(?:…)$` internally), then falls back to a **linear simulation** for many shapes to suggest **where** things went wrong.
- Results include **`expected`** text derived from explanations (best-effort, not a full regex debugger).

**Caveat**

Lazy quantifiers, complex backtracking, and ambiguous alternatives can make simulation and the real engine disagree. Treat **`diagnose`** as a **UX helper**, not a formal verifier.

---
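To give a feel for what a "linear simulation" can report, here is a deliberately tiny sketch for fixed-width patterns only. `StepToken`, `WalkResult`, and `walkLinear` are hypothetical names invented for this illustration; the library's actual algorithm is not shown in this README and may differ substantially.

```ts
// Hypothetical token: one fixed-width step of a linear pattern.
interface StepToken {
  label: string;
  accepts: (ch: string) => boolean;
}

type WalkResult = { ok: true } | { ok: false; index: number; expected: string };

// Walk input and tokens in lockstep and report the first position that disagrees.
function walkLinear(tokens: StepToken[], input: string): WalkResult {
  for (let i = 0; i < tokens.length; i++) {
    if (i >= input.length || !tokens[i].accepts(input[i])) {
      return { ok: false, index: i, expected: tokens[i].label };
    }
  }
  if (input.length > tokens.length) {
    return { ok: false, index: tokens.length, expected: "end of input" };
  }
  return { ok: true };
}

const isDigit = (ch: string): boolean => ch >= "0" && ch <= "9";
const threeDigits: StepToken[] = Array.from({ length: 3 }, () => ({
  label: "a digit",
  accepts: isDigit,
}));

console.log(walkLinear(threeDigits, "12")); // { ok: false, index: 2, expected: 'a digit' }
console.log(walkLinear(threeDigits, "123")); // { ok: true }
console.log(walkLinear(threeDigits, "1a3")); // { ok: false, index: 1, expected: 'a digit' }
```

A lockstep walk like this has no notion of backtracking, which is one way to see why the caveat above singles out lazy quantifiers and ambiguous alternatives.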
## Example 7: Lightweight safety hints (`raw()` and nested quantifiers)

```ts
import { analyzePattern, digit, raw, repeat, seq } from "regex-lib";

const suspicious = seq(
  repeat(digit(), 0, Number.POSITIVE_INFINITY),
  raw("\\d+"),
);

console.log(analyzePattern(suspicious));
// Includes warnings/info about raw regex and nested quantifier shapes
```

**What this example shows**

- **`raw("…")`**: Escape hatch that embeds a regex **fragment** as-is. The compiler emits a **warning** because raw fragments bypass the library's structural guarantees, so they need manual ReDoS review.
- **`analyzePattern`**: Rule-based **hints** (not a full ReDoS solver). Use it to flag patterns worth human review.

**Why it helps**

Interop with existing regex snippets is unavoidable; pairing **`raw()`** with warnings keeps that path **visible** in audits and CI output.

---
## Example 8: JSON serialization (AI/tooling friendly)

```ts
import {
  deserializePattern,
  patternToJsonString,
  presets,
  compilePattern,
} from "regex-lib";

const ast = presets.slug();

const jsonText = patternToJsonString(ast, 2);
console.log(jsonText);

const roundTrip = deserializePattern(JSON.parse(jsonText));
console.log(compilePattern(roundTrip).pattern === compilePattern(ast).pattern); // true
```

**What this example shows**

- **`patternToJsonString` / `serializePattern`**: Stable JSON with **`schemaVersion`** for forward compatibility.
- **`deserializePattern` / `patternFromJsonString`**: Parse and validate payloads back into AST nodes.

**Why it helps**

Agents and services can exchange **structured intent** (JSON AST) instead of brittle plain-regex strings, then compile locally with your approved engine flags.

---
## Example 9: Presets (UUID-shaped, slug, hex color)

```ts
import { compile, presets } from "regex-lib";

const uuid = compile(presets.uuid(), { flags: { ignoreCase: true } });
uuid.test("550e8400-e29b-41d4-a716-446655440000"); // true

const slug = compile(presets.slug());
slug.test("hello-world-2"); // true

const color = compile(presets.hexColor({ alpha: true }));
color.test("#abc");
color.test("#aabbcc");
color.test("#aabbccdd");
```

**What each preset means (practically)**

- **`presets.uuid()`**: Hex digit runs with hyphen layout typical of UUID strings. It does **not** fully validate RFC 4122 variant/version bits; tighten the rules if you need strict validation.
- **`presets.slug()`**: Opinionated ASCII slug: lowercase letters and digits, segments separated by single dashes.
- **`presets.hexColor({ alpha: true })`**: `#RGB`, `#RRGGBB`, and optional `#RRGGBBAA` when **`alpha`** is enabled.

**Why presets exist**

They encode **product defaults** once (tests + explanations), instead of copy-pasting fragile regex across services.

---
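As an illustration of what "tightening" a UUID-shaped check could look like, the sketch below compares a shape-only regex with one that also constrains the version and variant nibbles. Both patterns are hand-written native regexes, not the preset's actual output, and the version range 1-5 reflects RFC 4122 only.

```ts
// Shape only: 8-4-4-4-12 hex digits.
const uuidShape = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Stricter: version nibble 1-5 and RFC 4122 variant nibble 8, 9, a, or b.
const uuidRfc4122 =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

const v4Sample = "550e8400-e29b-41d4-a716-446655440000";
const allZeros = "00000000-0000-0000-0000-000000000000";

console.log(uuidShape.test(v4Sample)); // true
console.log(uuidRfc4122.test(v4Sample)); // true (version 4, variant "a")
console.log(uuidShape.test(allZeros)); // true (right shape)
console.log(uuidRfc4122.test(allZeros)); // false (no version or variant bits set)
```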
## Example 10: Unicode-aware letters (`u` flag inference)

```ts
import { compilePattern, letter } from "regex-lib";

const { pattern, flags } = compilePattern(letter({ unicode: true }));
console.log(pattern); // \p{L}
console.log(flags); // includes "u"
```

**What this example shows**

- Some nodes **infer** engine flags (here **`u`** for Unicode property escapes).
- You can still pass **`flags`** in the compile options (`compile(ast, { flags: { … } })`) to merge **ignoreCase**, **multiline**, etc.

---
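The reason the `u` flag has to be inferred is visible with native `RegExp` alone. This sketch uses hand-written patterns and the character U+00E9 (é) as a sample; none of it calls the library.

```ts
const asciiLetter = /^[A-Za-z]$/;
const unicodeLetter = new RegExp("^\\p{L}$", "u");

console.log(asciiLetter.test("\u00e9")); // false (é is outside A-Z / a-z)
console.log(unicodeLetter.test("\u00e9")); // true
console.log(unicodeLetter.test("7")); // false

// Without "u", \p is just an escaped "p", so the source means the literal text "p{L}".
const withoutUnicodeFlag = new RegExp("^\\p{L}$");
console.log(withoutUnicodeFlag.test("\u00e9")); // false
console.log(withoutUnicodeFlag.test("p{L}")); // true
```

Forgetting the flag therefore fails silently rather than throwing, which makes automatic inference a useful safeguard.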
## Public API reference

Types below use short names; import types from the package as needed (`Pattern`, `CompileOptions`, …).

### Constants

| Name | Type | Description |
| --- | --- | --- |
| `PATTERN_SCHEMA_VERSION` | `number` | Version baked into `serializePattern` / JSON payloads for forward-compatible tooling. |

### AST helpers (combine & shape)

| Function | Returns | Description |
| --- | --- | --- |
| `literal(value)` | `Pattern` | Literal string; regex metacharacters escaped when compiling. |
| `seq(...children)` | `Pattern` | Concatenate fragments left-to-right; flattens nested `seq`. |
| `alt(...children)` | `Pattern` | Alternation (`\|`); flattens nested `alt`. |
| `repeat(child, min, max, greedy?)` | `Pattern` | Bounded repetition; uses `Infinity` for open-ended `{n,}`. Fourth argument `greedy` (default `true`); pass `false` for lazy `*?`, `+?`, `{…}?`. Collapses `min===max===1`. Maps `(0,1)` to `optional`. |
| `optional(child, greedy?)` | `Pattern` | `child` made optional (`?` when greedy, `??` when `greedy` is `false`). |
| `namedGroup(name, child)` | `Pattern` | Named capture `(?<name>…)` in JavaScript. |
| `nonCapturing(child)` | `Pattern` | Non-capturing group `(?:…)` for grouping/precedence. |
| `optimize(pattern)` | `Pattern` | Normalize tree (flatten `seq`/`alt`, simplify structure). Used internally by compile/explain too. |

### Character-class-style primitives

Quantified helpers (`digit`, `letter`, …) return **`Quantified`**: a `Pattern` plus `.exactly`, `.oneOrMore`, `.maybe`, etc. Those chain methods compile to **greedy** quantifiers; use `repeat` / `optional` with `greedy: false` when you need lazy behavior.

| Function | Returns | Description |
| --- | --- | --- |
| `digit(opts?)` | `Quantified` | `\d` atom (see JS `u` for Unicode digit semantics). |
| `word(opts?)` | `Quantified` | `\w` atom. |
| `whitespace(opts?)` | `Quantified` | `\s` atom. |
| `letter(opts?)` | `Quantified` | ASCII classes or `\p{L}` / `\p{Ll}` / `\p{Lu}` when `unicode: true` (infers `u`). |
| `hexDigit(opts?)` | `Quantified` | `[0-9A-Fa-f]` or uppercase-only variant. |
| `anyChar()` | `Quantified` | `.` (dot); respects `s` (`dotAll`) like normal `RegExp`. |

### Anchors & boundaries

| Function | Returns | Description |
| --- | --- | --- |
| `start()` | `Pattern` | `^` |
| `end()` | `Pattern` | `$` |
| `wordBoundary()` | `Pattern` | `\b` |

### Small literals & composites

| Function | Returns | Description |
| --- | --- | --- |
| `dash()` | `Pattern` | Literal `-`. |
| `underscore()` | `Pattern` | Literal `_`. |
| `dot()` | `Pattern` | Literal `.` (not “any character”). |
| `integer()` | `Pattern` | One-or-more digits (`digit().oneOrMore()`). |
| `booleanLiteral()` | `Pattern` | Alternates literal `true` \| `false`. |

### Escape hatch

| Function | Returns | Description |
| --- | --- | --- |
| `raw(source, flags?)` | `Pattern` | Inserts regex source **verbatim**; triggers compiler warnings; merges optional per-fragment flags into compile result. |

### Compilation & execution

| Function | Returns | Description |
| --- | --- | --- |
| `compile(ast, options?)` | `CompiledPattern` | High-level wrapper: regex body, flags, `.test`, `.exec`, explain/diagnose/analyze, JSON helpers. |
| `compilePattern(ast, options?)` | `CompileResult` | `{ pattern, flags, warnings }` for interop (OpenAPI, validators, logging). |
| `toRegExp(ast, options?)` | `RegExp` | Construct native `RegExp` without `CompiledPattern`. |

`CompileOptions` (second arg) includes `flags` (`ignoreCase`, `multiline`, `dotAll`, `unicode`, `global`, `sticky`) and `nonCapturing` (strip named captures for simulation-style use).

### Understanding & safety

| Function | Returns | Description |
| --- | --- | --- |
| `explainPattern(ast)` | `ExplainResult` | `{ clauses, summary }`: human-oriented sentences from AST. |
| `diagnose(ast, input, options?)` | `DiagnoseResult` | Full-string anchored check first; best-effort mismatch hints (not a complete regex debugger). |
| `analyzePattern(ast)` | `AnalysisFinding[]` | Rule-based hints (e.g. `raw()`, nested quantifier shape); **not** a full ReDoS prover. |

### Serialization

| Function | Returns | Description |
| --- | --- | --- |
| `serializePattern(ast)` | `SerializedPattern` | `{ schemaVersion, pattern }` JSON-ready object. |
| `deserializePattern(data)` | `Pattern` | Validate + revive AST from unknown JSON input. |
| `patternToJsonString(ast, space?)` | `string` | `JSON.stringify` of `serializePattern`. |
| `patternFromJsonString(text)` | `Pattern` | Parse JSON text then deserialize. |

### Fluent builder

| Function | Returns | Description |
| --- | --- | --- |
| `match(opts?)` | `MatchBuilder` | Fluent API; optional default `flags` on `.compile()`. |
| `regex(opts?)` | `MatchBuilder` | Alias of `match`. |

### `MatchBuilder` methods

| Method | Returns | Description |
| --- | --- | --- |
| `.start()` | `this` | Append `^`. |
| `.end()` | `this` | Append `$`. |
| `.boundary()` | `this` | Append `\b`. |
| `.text(value)` / `.literal(value)` | `this` | Append escaped literal. |
| `.dash()` | `this` | Append `-`. |
| `.take(fragment)` | `this` | Append any `Pattern`. |
| `.digit()` | `this` | Append **one** `\d` (not quantified). |
| `.lettersUpper()` | `this` | Append one ASCII `A–Z`. |
| `.lettersLower()` | `this` | Append one ASCII `a–z`. |
| `.named(name, inner)` | `MatchBuilder` | Append named group. |
| `.build()` | `Pattern` | Materialize AST from chained fragments. |
| `.compile(options?)` | `CompiledPattern` | Merge default flags + options, return executable wrapper. |

### `CompiledPattern` members

| Member | Returns | Description |
| --- | --- | --- |
| `.ast` | `Pattern` | Underlying AST reference. |
| `.source` | `string` | Regex body (no delimiters). |
| `.flags` | `string` | Concatenated flag letters (`i`, `u`, …). |
| `.warnings` | `CompileWarning[]` | Compiler notices (e.g. raw fragments). |
| `.toRegExp()` | `RegExp` | Fresh native regex instance. |
| `.test(input)` | `boolean` | `RegExp.prototype.test`. |
| `.exec(input)` | `RegExpExecArray \| null` | `RegExp.prototype.exec` (named `groups` when applicable). |
| `.explain()` | `ExplainResult` | Same as `explainPattern(ast)`. |
| `.diagnose(input)` | `DiagnoseResult` | Same as `diagnose(ast, input, opts)`. |
| `.analyze()` | `AnalysisFinding[]` | Same as `analyzePattern(ast)`. |
| `.toJSON()` | `SerializedPattern` | Structured JSON object. |
| `.toJSONString(space?)` | `string` | Pretty-printed JSON text. |

### Presets

| Function | Returns | Description |
| --- | --- | --- |
| `presets.uuid()` | `Pattern` | UUID-shaped hyphenated hex (not full RFC 4122 semantics). |
| `presets.slug()` | `Pattern` | Lowercase alphanumeric segments separated by `-`. |
| `presets.hexColor(opts?)` | `Pattern` | `#RGB` / `#RRGGBB` / optional `#RRGGBBAA` when `alpha: true`; `short: false` disables `#RGB`. |

---