namespace-guard 0.7.0 → 0.8.1

Files changed (6)

package/README.md CHANGED

@@ -299,7 +299,7 @@ No words are bundled — use any word list you like (e.g., the `bad-words` npm p
 ### Built-in Homoglyph Validator
-Prevent spoofing attacks where Cyrillic or Greek characters are substituted for visually identical Latin letters (e.g., Cyrillic "а" for Latin "a" in "admin"):
+Prevent spoofing attacks where visually similar characters from any Unicode script are substituted for Latin letters (e.g., Cyrillic "а" for Latin "a" in "admin"):
 ```typescript
 import { createNamespaceGuard, createHomoglyphValidator } from "namespace-guard";
@@ -318,11 +318,43 @@ Options:
 createHomoglyphValidator({
   message: "Custom rejection message.",       // optional
   additionalMappings: { "\u0261": "g" },      // extend the built-in map
-  rejectMixedScript: true,                    // also reject Latin + Cyrillic/Greek mixing
+  rejectMixedScript: true,                    // also reject Latin + non-Latin script mixing
 })
 ```
-The built-in `CONFUSABLE_MAP` covers ~30 Cyrillic-to-Latin and Greek-to-Latin pairs — the most common spoofing vectors. It's exported for inspection or extension.
+The built-in `CONFUSABLE_MAP` contains 613 character pairs generated from [Unicode TR39 confusables.txt](https://unicode.org/reports/tr39/) plus supplemental Latin small capitals. It covers Cyrillic, Greek, Armenian, Cherokee, IPA, Coptic, Lisu, Canadian Syllabics, Georgian, and 20+ other scripts. The map is exported for inspection or extension, and is regenerable for new Unicode versions with `npx tsx scripts/generate-confusables.ts`.
+### How the anti-spoofing pipeline works
+Most confusable-detection libraries apply a character map in isolation. namespace-guard uses a three-stage pipeline where each stage is aware of the others:
+```
+Input  →  NFKC normalize  →  Confusable map  →  Mixed-script reject
+           (stage 1)          (stage 2)           (stage 3)
+```
+**Stage 1: NFKC normalization** collapses full-width characters (`Ｉ` → `I`), ligatures (`ﬁ` → `fi`), superscripts, and other Unicode compatibility forms to their canonical equivalents. This runs first, before any confusable check.
+**Stage 2: Confusable map** catches characters that survive NFKC but visually mimic Latin letters — Cyrillic `а` for `a`, Greek `ο` for `o`, Cherokee `Ꭺ` for `A`, and 600+ others from the Unicode Consortium's [confusables.txt](https://unicode.org/Public/security/latest/confusables.txt).
+**Stage 3: Mixed-script rejection** (`rejectMixedScript: true`) blocks identifiers that mix Latin with non-Latin scripts (Hebrew, Arabic, Devanagari, Thai, Georgian, Ethiopic, etc.) even if the specific characters aren't in the confusable map. This catches novel homoglyphs that the map doesn't cover.
+#### Why NFKC-aware filtering matters
+The key insight: TR39's confusables.txt and NFKC normalization sometimes disagree. For example, Unicode says capital `I` (U+0049) is confusable with lowercase `l` — visually true in many fonts. But NFKC maps Mathematical Bold `𝐈` (U+1D408) to `I`, not `l`. If you naively ship the TR39 mapping (`𝐈` → `l`), the confusable check will never see that character — NFKC already converted it to `I` in stage 1.
+We found 31 entries where this happens:
+| Character | TR39 says | NFKC says | Winner |
+|-----------|-----------|-----------|--------|
+| `ſ` Long S (U+017F) | `f` | `s` | NFKC (`s` is correct) |
+| `Ⅰ` Roman Numeral I (U+2160) | `l` | `i` | NFKC (`i` is correct) |
+| `Ｉ` Fullwidth I (U+FF29) | `l` | `i` | NFKC (`i` is correct) |
+| `𝟎` Math Bold 0 (U+1D7CE) | `o` | `0` | NFKC (`0` is correct) |
+| 11 Mathematical I variants | `l` | `i` | NFKC |
+| 12 Mathematical 0/1 variants | `o`/`l` | `0`/`1` | NFKC |
+These entries are dead code in any pipeline that runs NFKC first — and worse, they encode the *wrong* mapping. The generate script (`scripts/generate-confusables.ts`) automatically detects and excludes them.
 ## Unicode Normalization
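
The three-stage pipeline described in the README hunk above can be sketched roughly as follows. This is an illustration only, not the package's implementation: `SAMPLE_CONFUSABLES`, `Verdict`, and `checkIdentifier` are hypothetical names, the two-entry map stands in for the real 613-entry `CONFUSABLE_MAP`, and the mixed-script test (Latin plus anything outside the Latin, Common, and Inherited scripts) is an assumption about how stage 3 might be approximated.

```typescript
// Hypothetical two-entry stand-in for the package's CONFUSABLE_MAP.
const SAMPLE_CONFUSABLES = new Map<string, string>([
  ["\u0430", "a"], // Cyrillic small letter a
  ["\u03BF", "o"], // Greek small letter omicron
]);

// Built with the RegExp constructor so the Unicode property escapes are
// resolved at runtime (requires an ES2018+ engine).
const HAS_LATIN = new RegExp("\\p{Script=Latin}", "u");
const HAS_OTHER_SCRIPT = new RegExp(
  "[^\\p{Script=Latin}\\p{Script=Common}\\p{Script=Inherited}]",
  "u"
);

type Verdict = { ok: true } | { ok: false; stage: 2 | 3; reason: string };

function checkIdentifier(input: string): Verdict {
  // Stage 1: NFKC folds fullwidth forms, ligatures, and similar
  // compatibility characters before anything else looks at the string.
  const normalized = input.normalize("NFKC");

  // Stage 2: reject any character that is a key in the confusable map.
  for (const ch of normalized) {
    const latin = SAMPLE_CONFUSABLES.get(ch);
    if (latin !== undefined) {
      return { ok: false, stage: 2, reason: `"${ch}" mimics "${latin}"` };
    }
  }

  // Stage 3: reject Latin mixed with any other script, even when the
  // specific characters are not in the map.
  if (HAS_LATIN.test(normalized) && HAS_OTHER_SCRIPT.test(normalized)) {
    return { ok: false, stage: 3, reason: "mixes Latin with another script" };
  }

  return { ok: true };
}
```

In this sketch, `"\u0430dmin"` (Cyrillic "а" followed by Latin "dmin") is stopped at stage 2, while a Latin string containing a Hebrew letter passes stage 2 and is stopped at stage 3. Digits and hyphens belong to the Common script, so an identifier such as `my-name-1` passes all three stages.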

package/dist/index.d.mts CHANGED

@@ -136,22 +136,27 @@ declare function createProfanityValidator(words: string[], options?: {
     message: string;
 } | null>;
 /**
- * Default mapping of visually confusable Unicode characters to their Latin equivalents.
- * Covers Cyrillic-to-Latin and Greek-to-Latin lookalikes — the most common spoofing vectors.
- * Exported for advanced users who need to inspect or extend the mapping.
+ * Mapping of visually confusable Unicode characters to their Latin/digit equivalents.
+ * Generated from Unicode TR39 confusables.txt + supplemental Latin small capitals.
+ * Covers every single-character mapping to a lowercase Latin letter or digit,
+ * excluding characters already handled by NFKC normalization (either collapsed
+ * to the same target, or mapped to a different valid Latin char/digit).
+ * Regenerate: `npx tsx scripts/generate-confusables.ts`
  */
 declare const CONFUSABLE_MAP: Record<string, string>;
 /**
  * Create a validator that rejects identifiers containing homoglyph/confusable characters.
  *
- * Catches spoofing attacks where Cyrillic or Greek characters are substituted for
+ * Catches spoofing attacks where characters from other scripts are substituted for
  * visually identical Latin characters (e.g., Cyrillic "а" for Latin "a" in "admin").
- * Uses a curated mapping of ~30 character pairs that covers 95%+ of real impersonation attempts.
+ * Uses a comprehensive mapping of 613 character pairs generated from Unicode TR39
+ * confusables.txt, covering Cyrillic, Greek, Armenian, Cherokee, IPA, Latin small
+ * capitals, Canadian Syllabics, Georgian, Lisu, Coptic, and many other scripts.
  *
  * @param options - Optional settings
  * @param options.message - Custom rejection message (default: "That name contains characters that could be confused with other letters.")
  * @param options.additionalMappings - Extra confusable pairs to merge with the built-in map
- * @param options.rejectMixedScript - Also reject identifiers that mix Latin with Cyrillic/Greek characters (default: false)
+ * @param options.rejectMixedScript - Also reject identifiers that mix Latin with non-Latin characters from any covered script (Cyrillic, Greek, Armenian, Hebrew, Arabic, Georgian, Cherokee, Canadian Syllabics, Ethiopic, Coptic, Lisu, and more) (default: false)
  * @returns An async validator function for use in `config.validators`
  *
  * @example
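
The exclusion rule in the new `CONFUSABLE_MAP` doc comment (drop entries that NFKC already collapses to a Latin letter or digit, whether or not it matches the TR39 target) could be expressed along these lines. This is a sketch under assumptions, not the contents of `scripts/generate-confusables.ts`: `tr39Sample`, `nfkcTarget`, and `survivesNfkc` are hypothetical names, and the four sample entries are taken from the README table plus one Cyrillic pair.

```typescript
// Hypothetical TR39-style entries: [source character, TR39 Latin/digit target].
const tr39Sample: Array<[string, string]> = [
  ["\u017F", "f"],       // Long S: TR39 says "f"
  ["\u2160", "l"],       // Roman numeral one: TR39 says "l"
  ["\uD835\uDFCE", "o"], // Mathematical bold zero (U+1D7CE): TR39 says "o"
  ["\u0430", "a"],       // Cyrillic small a: NFKC leaves it unchanged
];

const LATIN_OR_DIGIT = /^[a-z0-9]$/;

// What stage 1 would turn the source character into, lowercased.
function nfkcTarget(source: string): string {
  return source.normalize("NFKC").toLowerCase();
}

// An entry is worth keeping only if NFKC does NOT already rewrite the source
// into a single Latin letter or digit. If it does, the confusable check never
// sees the original character, so the entry would be dead code.
function survivesNfkc(source: string): boolean {
  return !LATIN_OR_DIGIT.test(nfkcTarget(source));
}

const kept = tr39Sample.filter(([source]) => survivesNfkc(source));
const dropped = tr39Sample
  .filter(([source]) => !survivesNfkc(source))
  .map(([source, tr39]) => ({ source, tr39, nfkc: nfkcTarget(source) }));
```

With these four inputs, only the Cyrillic entry stays in `kept`. The other three land in `dropped`, each with an `nfkc` value that differs from its `tr39` target, which is the disagreement the README table lists.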

package/dist/index.d.ts CHANGED

@@ -136,22 +136,27 @@ declare function createProfanityValidator(words: string[], options?: {
     message: string;
 } | null>;
 /**
- * Default mapping of visually confusable Unicode characters to their Latin equivalents.
- * Covers Cyrillic-to-Latin and Greek-to-Latin lookalikes — the most common spoofing vectors.
- * Exported for advanced users who need to inspect or extend the mapping.
+ * Mapping of visually confusable Unicode characters to their Latin/digit equivalents.
+ * Generated from Unicode TR39 confusables.txt + supplemental Latin small capitals.
+ * Covers every single-character mapping to a lowercase Latin letter or digit,
+ * excluding characters already handled by NFKC normalization (either collapsed
+ * to the same target, or mapped to a different valid Latin char/digit).
+ * Regenerate: `npx tsx scripts/generate-confusables.ts`
  */
 declare const CONFUSABLE_MAP: Record<string, string>;
 /**
  * Create a validator that rejects identifiers containing homoglyph/confusable characters.
  *
- * Catches spoofing attacks where Cyrillic or Greek characters are substituted for
+ * Catches spoofing attacks where characters from other scripts are substituted for
  * visually identical Latin characters (e.g., Cyrillic "а" for Latin "a" in "admin").
- * Uses a curated mapping of ~30 character pairs that covers 95%+ of real impersonation attempts.
+ * Uses a comprehensive mapping of 613 character pairs generated from Unicode TR39
+ * confusables.txt, covering Cyrillic, Greek, Armenian, Cherokee, IPA, Latin small
+ * capitals, Canadian Syllabics, Georgian, Lisu, Coptic, and many other scripts.
  *
  * @param options - Optional settings
  * @param options.message - Custom rejection message (default: "That name contains characters that could be confused with other letters.")
  * @param options.additionalMappings - Extra confusable pairs to merge with the built-in map
- * @param options.rejectMixedScript - Also reject identifiers that mix Latin with Cyrillic/Greek characters (default: false)
+ * @param options.rejectMixedScript - Also reject identifiers that mix Latin with non-Latin characters from any covered script (Cyrillic, Greek, Armenian, Hebrew, Arabic, Georgian, Cherokee, Canadian Syllabics, Ethiopic, Coptic, Lisu, and more) (default: false)
  * @returns An async validator function for use in `config.validators`
  *
  * @example
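
The `@example` body is cut off in this view. As a stand-in, here is a self-contained sketch of how the documented options (`message`, `additionalMappings`) might combine. It is not the package's code: `BUILT_IN_SAMPLE`, `SketchOptions`, `findConfusable`, and `sketchHomoglyphValidator` are hypothetical, and the `{ message } | null` result shape is an assumption loosely based on the neighbouring `createProfanityValidator` declaration.

```typescript
// Hypothetical stand-in for the built-in map; the real one has 613 entries.
const BUILT_IN_SAMPLE: Record<string, string> = {
  "\u0430": "a", // Cyrillic small letter a
  "\u03BF": "o", // Greek small letter omicron
};

interface SketchOptions {
  message?: string;
  additionalMappings?: Record<string, string>;
}

// Returns the first character of the NFKC-normalized value that is a key in
// the map, or null when there is none.
function findConfusable(
  value: string,
  map: Record<string, string>
): string | null {
  for (const ch of value.normalize("NFKC")) {
    if (Object.prototype.hasOwnProperty.call(map, ch)) return ch;
  }
  return null;
}

function sketchHomoglyphValidator(options: SketchOptions = {}) {
  // Extra pairs are merged over the built-in ones, so callers can extend
  // the map without replacing it.
  const map = { ...BUILT_IN_SAMPLE, ...(options.additionalMappings ?? {}) };
  const message =
    options.message ??
    "That name contains characters that could be confused with other letters.";
  return async (value: string): Promise<{ message: string } | null> =>
    findConfusable(value, map) === null ? null : { message };
}
```

Under these assumptions, `"\u0261it"` passes with the sample map alone and is rejected once `additionalMappings: { "\u0261": "g" }` is supplied, mirroring the option shown in the README hunk.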