namespace-guard 0.7.0 → 0.8.1
- package/README.md +35 -3
- package/dist/index.d.mts +11 -6
- package/dist/index.d.ts +11 -6
- package/dist/index.js +667 -27
- package/dist/index.mjs +667 -27
- package/package.json +1 -1
package/README.md
CHANGED
@@ -299,7 +299,7 @@ No words are bundled — use any word list you like (e.g., the `bad-words` npm p
 
 ### Built-in Homoglyph Validator
 
-Prevent spoofing attacks where
+Prevent spoofing attacks where visually similar characters from any Unicode script are substituted for Latin letters (e.g., Cyrillic "а" for Latin "a" in "admin"):
 
 ```typescript
 import { createNamespaceGuard, createHomoglyphValidator } from "namespace-guard";
@@ -318,11 +318,43 @@ Options:
 createHomoglyphValidator({
   message: "Custom rejection message.", // optional
   additionalMappings: { "\u0261": "g" }, // extend the built-in map
-  rejectMixedScript: true, // also reject Latin +
+  rejectMixedScript: true, // also reject Latin + non-Latin script mixing
 })
 ```
 
-The built-in `CONFUSABLE_MAP`
+The built-in `CONFUSABLE_MAP` contains 613 character pairs generated from [Unicode TR39 confusables.txt](https://unicode.org/reports/tr39/) plus supplemental Latin small capitals. It covers Cyrillic, Greek, Armenian, Cherokee, IPA, Coptic, Lisu, Canadian Syllabics, Georgian, and 20+ other scripts. The map is exported for inspection or extension, and is regenerable for new Unicode versions with `npx tsx scripts/generate-confusables.ts`.
+
+### How the anti-spoofing pipeline works
+
+Most confusable-detection libraries apply a character map in isolation. namespace-guard uses a three-stage pipeline where each stage is aware of the others:
+
+```
+Input → NFKC normalize → Confusable map → Mixed-script reject
+          (stage 1)        (stage 2)          (stage 3)
+```
+
+**Stage 1: NFKC normalization** collapses full-width characters (`Ｉ` → `I`), ligatures (`ﬁ` → `fi`), superscripts, and other Unicode compatibility forms to their canonical equivalents. This runs first, before any confusable check.
+
+**Stage 2: Confusable map** catches characters that survive NFKC but visually mimic Latin letters — Cyrillic `а` for `a`, Greek `ο` for `o`, Cherokee `Ꭺ` for `A`, and 600+ others from the Unicode Consortium's [confusables.txt](https://unicode.org/Public/security/latest/confusables.txt).
+
+**Stage 3: Mixed-script rejection** (`rejectMixedScript: true`) blocks identifiers that mix Latin with non-Latin scripts (Hebrew, Arabic, Devanagari, Thai, Georgian, Ethiopic, etc.) even if the specific characters aren't in the confusable map. This catches novel homoglyphs that the map doesn't cover.
+
+#### Why NFKC-aware filtering matters
+
+The key insight: TR39's confusables.txt and NFKC normalization sometimes disagree. For example, Unicode says capital `I` (U+0049) is confusable with lowercase `l` — visually true in many fonts. But NFKC maps Mathematical Bold `𝐈` (U+1D408) to `I`, not `l`. If you naively ship the TR39 mapping (`𝐈` → `l`), the confusable check will never see that character — NFKC already converted it to `I` in stage 1.
+
+We found 31 entries where this happens:
+
+| Character | TR39 says | NFKC says | Winner |
+|-----------|-----------|-----------|--------|
+| `ſ` Long S (U+017F) | `f` | `s` | NFKC (`s` is correct) |
+| `Ⅰ` Roman Numeral I (U+2160) | `l` | `i` | NFKC (`i` is correct) |
+| `Ｉ` Fullwidth I (U+FF29) | `l` | `i` | NFKC (`i` is correct) |
+| `𝟎` Math Bold 0 (U+1D7CE) | `o` | `0` | NFKC (`0` is correct) |
+| 11 Mathematical I variants | `l` | `i` | NFKC |
+| 12 Mathematical 0/1 variants | `o`/`l` | `0`/`1` | NFKC |
+
+These entries are dead code in any pipeline that runs NFKC first — and worse, they encode the *wrong* mapping. The generate script (`scripts/generate-confusables.ts`) automatically detects and excludes them.
 
 ## Unicode Normalization
 
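The three stages described in the README hunk above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the package's implementation: `SAMPLE_CONFUSABLES`, `Verdict`, and `checkIdentifier` are hypothetical names, the two-entry map stands in for the real 613-pair `CONFUSABLE_MAP`, and lowercasing after NFKC is an assumption based on the lowercase targets shown in the README table.

```typescript
// Hypothetical two-entry excerpt; the real CONFUSABLE_MAP is far larger.
const SAMPLE_CONFUSABLES = new Map<string, string>([
  ["\u0430", "a"], // Cyrillic small letter a
  ["\u03BF", "o"], // Greek small letter omicron
]);

type Verdict =
  | { ok: true; normalized: string }
  | { ok: false; stage: "confusable" | "mixed-script" };

function checkIdentifier(input: string): Verdict {
  // Stage 1: NFKC folds compatibility forms (fullwidth letters, ligatures,
  // mathematical alphanumerics) before anything else looks at the string.
  // Lowercasing here is an assumption of this sketch.
  const normalized = input.normalize("NFKC").toLowerCase();

  // Stage 2: reject any surviving character that the map lists as a
  // look-alike for a Latin letter or digit.
  for (const ch of Array.from(normalized)) {
    if (SAMPLE_CONFUSABLES.has(ch)) return { ok: false, stage: "confusable" };
  }

  // Stage 3: reject Latin letters mixed with letters of any other script,
  // even when those letters are absent from the map.
  const hasLatin = /\p{Script=Latin}/u.test(normalized);
  const hasNonLatinLetter = /[^\p{Script=Latin}\P{L}]/u.test(normalized);
  if (hasLatin && hasNonLatinLetter) return { ok: false, stage: "mixed-script" };

  return { ok: true, normalized };
}
```

With this sketch, `checkIdentifier("\uFF21dmin")` (fullwidth `Ａ`) is accepted as `admin` because stage 1 folds it, `checkIdentifier("\u0430dmin")` (Cyrillic `а`) is stopped at stage 2, and `checkIdentifier("adm\u05D0n")` (Hebrew alef) is stopped at stage 3.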
package/dist/index.d.mts
CHANGED
@@ -136,22 +136,27 @@ declare function createProfanityValidator(words: string[], options?: {
     message: string;
 } | null>;
 /**
- *
- *
- *
+ * Mapping of visually confusable Unicode characters to their Latin/digit equivalents.
+ * Generated from Unicode TR39 confusables.txt + supplemental Latin small capitals.
+ * Covers every single-character mapping to a lowercase Latin letter or digit,
+ * excluding characters already handled by NFKC normalization (either collapsed
+ * to the same target, or mapped to a different valid Latin char/digit).
+ * Regenerate: `npx tsx scripts/generate-confusables.ts`
  */
 declare const CONFUSABLE_MAP: Record<string, string>;
 /**
  * Create a validator that rejects identifiers containing homoglyph/confusable characters.
  *
- * Catches spoofing attacks where
+ * Catches spoofing attacks where characters from other scripts are substituted for
  * visually identical Latin characters (e.g., Cyrillic "а" for Latin "a" in "admin").
- * Uses a
+ * Uses a comprehensive mapping of 613 character pairs generated from Unicode TR39
+ * confusables.txt, covering Cyrillic, Greek, Armenian, Cherokee, IPA, Latin small
+ * capitals, Canadian Syllabics, Georgian, Lisu, Coptic, and many other scripts.
  *
  * @param options - Optional settings
  * @param options.message - Custom rejection message (default: "That name contains characters that could be confused with other letters.")
  * @param options.additionalMappings - Extra confusable pairs to merge with the built-in map
- * @param options.rejectMixedScript - Also reject identifiers that mix Latin with Cyrillic
+ * @param options.rejectMixedScript - Also reject identifiers that mix Latin with non-Latin characters from any covered script (Cyrillic, Greek, Armenian, Hebrew, Arabic, Georgian, Cherokee, Canadian Syllabics, Ethiopic, Coptic, Lisu, and more) (default: false)
  * @returns An async validator function for use in `config.validators`
  *
  * @example
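The exclusion rule in the `CONFUSABLE_MAP` doc comment (drop entries that NFKC already collapses to the same target, or maps to a different valid Latin letter or digit) could be checked roughly as follows. This is a sketch under stated assumptions, not the contents of `scripts/generate-confusables.ts`: the `Tr39Entry` shape, the `classify` function, and the sample entries are hypothetical, and lowercasing after NFKC is assumed.

```typescript
// Hypothetical shape for one single-character confusables.txt mapping.
interface Tr39Entry {
  source: string; // the look-alike character
  target: string; // the lowercase Latin letter or digit TR39 maps it to
}

type Classification = "keep" | "redundant" | "conflict";

function classify({ source, target }: Tr39Entry): Classification {
  // What stage 1 would turn the character into (lowercasing is assumed).
  const folded = source.normalize("NFKC").toLowerCase();
  // NFKC already yields the TR39 target, so a map entry adds nothing.
  if (folded === target) return "redundant";
  // NFKC yields a different Latin letter or digit, so the TR39 entry
  // could never fire and would encode a different mapping.
  if (/^[a-z0-9]$/.test(folded)) return "conflict";
  // NFKC leaves a non-Latin character behind; the map must catch it.
  return "keep";
}

// Illustrative entries; the conflict cases come from the README table.
const SAMPLE_ENTRIES: Tr39Entry[] = [
  { source: "\u017F", target: "f" },    // Long S: NFKC gives "s"
  { source: "\u2160", target: "l" },    // Roman Numeral I: NFKC gives "I"
  { source: "\u{1D408}", target: "l" }, // Mathematical Bold I: NFKC gives "I"
  { source: "\u{1D7CE}", target: "o" }, // Mathematical Bold 0: NFKC gives "0"
  { source: "\u{1D41A}", target: "a" }, // Mathematical Bold a: NFKC gives "a"
  { source: "\u0430", target: "a" },    // Cyrillic a: unchanged by NFKC
];

// Only entries that survive stage 1 belong in the shipped map.
const kept = SAMPLE_ENTRIES.filter((entry) => classify(entry) === "keep");
```

In this sample only the Cyrillic entry remains in `kept`; the first four are classified as conflicts and the Mathematical Bold `a` as redundant.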
package/dist/index.d.ts
CHANGED
@@ -136,22 +136,27 @@ declare function createProfanityValidator(words: string[], options?: {
     message: string;
 } | null>;
 /**
- *
- *
- *
+ * Mapping of visually confusable Unicode characters to their Latin/digit equivalents.
+ * Generated from Unicode TR39 confusables.txt + supplemental Latin small capitals.
+ * Covers every single-character mapping to a lowercase Latin letter or digit,
+ * excluding characters already handled by NFKC normalization (either collapsed
+ * to the same target, or mapped to a different valid Latin char/digit).
+ * Regenerate: `npx tsx scripts/generate-confusables.ts`
  */
 declare const CONFUSABLE_MAP: Record<string, string>;
 /**
  * Create a validator that rejects identifiers containing homoglyph/confusable characters.
  *
- * Catches spoofing attacks where
+ * Catches spoofing attacks where characters from other scripts are substituted for
  * visually identical Latin characters (e.g., Cyrillic "а" for Latin "a" in "admin").
- * Uses a
+ * Uses a comprehensive mapping of 613 character pairs generated from Unicode TR39
+ * confusables.txt, covering Cyrillic, Greek, Armenian, Cherokee, IPA, Latin small
+ * capitals, Canadian Syllabics, Georgian, Lisu, Coptic, and many other scripts.
  *
  * @param options - Optional settings
  * @param options.message - Custom rejection message (default: "That name contains characters that could be confused with other letters.")
  * @param options.additionalMappings - Extra confusable pairs to merge with the built-in map
- * @param options.rejectMixedScript - Also reject identifiers that mix Latin with Cyrillic
+ * @param options.rejectMixedScript - Also reject identifiers that mix Latin with non-Latin characters from any covered script (Cyrillic, Greek, Armenian, Hebrew, Arabic, Georgian, Cherokee, Canadian Syllabics, Ethiopic, Coptic, Lisu, and more) (default: false)
  * @returns An async validator function for use in `config.validators`
  *
  * @example