npm - namespace-guard - Versions diffs - 0.19.0 → 0.20.0 - Mend

namespace-guard 0.19.0 → 0.20.0

Files changed (18)

package/README.md +36 -10
package/dist/cli.js +2218 -2218
package/dist/cli.mjs +2218 -2218
package/dist/composability-vectors.js +2218 -2218
package/dist/composability-vectors.mjs +2218 -2218
package/dist/confusable-weights.d.mts +3 -5
package/dist/confusable-weights.d.ts +3 -5
package/dist/confusable-weights.js +4905 -2740
package/dist/confusable-weights.mjs +4905 -2740
package/dist/font-specific-weights.d.mts +1 -1
package/dist/font-specific-weights.d.ts +1 -1
package/dist/index.d.mts +90 -9
package/dist/index.d.ts +90 -9
package/dist/index.js +2288 -2222
package/dist/index.mjs +2287 -2222
package/dist/profanity-en.js +2218 -2218
package/dist/profanity-en.mjs +2218 -2218
package/package.json +1 -1

package/README.md CHANGED

@@ -5,7 +5,7 @@
 [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-**The world's first library that detects confusable characters across non-Latin scripts.** Slug claimability, Unicode anti-spoofing, and LLM Denial of Spend defence in one zero-dependency package.
+**The world's first library that detects confusable characters across non-Latin scripts.** Slug claimability, Unicode anti-spoofing, and LLM [Denial of Spend](https://paultendo.github.io/posts/confusable-vision-llm-attack-tests/) defence in one zero-dependency package.
 - Live demo: https://paultendo.github.io/namespace-guard/
 - Blog post: https://paultendo.github.io/posts/namespace-guard-launch/
@@ -14,19 +14,19 @@
 Existing confusable standards (TR39, IDNA) map non-Latin characters to Latin equivalents. They have zero coverage for confusable pairs *between* two non-Latin scripts.
-namespace-guard ships 494 SSIM-measured cross-script pairs from [confusable-vision](https://github.com/paultendo/confusable-vision) (rendered across 230 system fonts, scored by structural similarity). This catches attacks that no other library detects:
+namespace-guard ships 3,525 cross-script pairs from [confusable-vision](https://github.com/paultendo/confusable-vision) (measured across 245 system fonts using vector-outline raycasting — [RaySpace](https://paultendo.github.io/posts/rayspace-methodology/)). This catches attacks that no other library detects:
 ```typescript
 import { areConfusable, detectCrossScriptRisk } from "namespace-guard";
 import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
-// Hangul ᅵ and Han 丨 are visually identical (SSIM 0.999, Arial Unicode MS)
+// Hangul ᅵ and Han 丨 are visually identical (ray distance 0.004, Arial Unicode MS)
 areConfusable("\u1175", "\u4E28", { weights: CONFUSABLE_WEIGHTS }); // true
-// Greek Τ and Han 丅 are near-identical (SSIM 0.930, Hiragino Kaku Gothic ProN)
+// Greek Τ and Han 丅 are near-identical (multiple fonts)
 areConfusable("\u03A4", "\u4E05", { weights: CONFUSABLE_WEIGHTS }); // true
-// Cyrillic І and Greek Ι are pixel-identical (SSIM 1.0, 61 fonts agree)
+// Cyrillic І and Greek Ι are identical outlines (62 fonts)
 areConfusable("\u0406", "\u0399", { weights: CONFUSABLE_WEIGHTS }); // true
 // Without weights, only skeleton-based detection (TR39 coverage)
@@ -37,7 +37,7 @@ const risk = detectCrossScriptRisk("\u1175\u4E28", { weights: CONFUSABLE_WEIGHTS
 // { riskLevel: "high", scripts: ["han", "hangul"], crossScriptPairs: [...] }
 ```
-1,397 total SSIM-scored confusable pairs (110 TR39-confirmed, 793 novel Latin-target, 494 cross-script). Cross-script data licensed CC-BY-4.0.
+4,174 total confusable pairs scored by visual measurement (3,111 TR39-confirmed, 1,063 novel). Each pair carries a `danger` score (0–1) representing geometric similarity across fonts; the shipped dataset uses a 0.5 floor. For higher precision, filter at `danger > 0.7` (574 pairs). Cross-script data licensed CC-BY-4.0.
 ## Installation
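Note on the hunk above: the new README text says each pair carries a `danger` score (0–1), that the shipped dataset uses a 0.5 floor, and that `danger > 0.7` gives higher precision. A minimal sketch of that thresholding follows. The diff does not show how `CONFUSABLE_WEIGHTS` is structured, so the `ScoredPair` shape, the `filterByDanger` helper, and the sample scores are all hypothetical and only illustrate the filter.

```typescript
// Hypothetical record shape; the real layout of CONFUSABLE_WEIGHTS is not shown in this diff.
interface ScoredPair {
  source: string;
  target: string;
  danger: number; // 0–1, higher means more visually similar
}

// Keep only pairs strictly above the threshold, mirroring the README's `danger > 0.7` wording.
function filterByDanger(pairs: readonly ScoredPair[], minDanger: number): ScoredPair[] {
  return pairs.filter((pair) => pair.danger > minDanger);
}

// Placeholder entries with invented scores; these are not taken from the dataset.
const samplePairs: ScoredPair[] = [
  { source: "pairA-left", target: "pairA-right", danger: 0.95 },
  { source: "pairB-left", target: "pairB-right", danger: 0.72 },
  { source: "pairC-left", target: "pairC-right", danger: 0.55 },
];

const highPrecision = filterByDanger(samplePairs, 0.7);
console.log(highPrecision.length); // 2
```

Raising the threshold trades recall for precision: fewer pairs survive, but the ones that do are the closest visual matches.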
@@ -88,7 +88,7 @@ if (!result.claimed) {
 ## What You Get
-- **Cross-script confusable detection** with 494 SSIM-measured pairs between non-Latin scripts
+- **Cross-script confusable detection** with 3,525 measured pairs between non-Latin scripts
 - Cross-table collision checks (users, orgs, teams, etc.)
 - Reserved-name blocking with category-aware messages
 - Unicode anti-spoofing (NFKC + confusable detection + mixed-script/risk controls)
@@ -136,7 +136,7 @@ areConfusable("paypal", "pa\u0443pal"); // true
 confusableDistance("paypal", "pa\u0443pal"); // graded similarity + chainDepth + explainable steps
 ```
-For measured visual scoring, pass the optional weights from confusable-vision (1,397 SSIM-scored pairs across 230 fonts, including 494 cross-script pairs). The `context` filter restricts to identifier-valid, domain-valid, or all pairs.
+For measured visual scoring, pass the optional weights from confusable-vision (4,174 pairs scored across 245 fonts using vector-outline raycasting, including 3,525 cross-script pairs). Each pair has a `danger` score (0–1); the default 0.5 floor favours recall, use `danger > 0.7` for precision. The `context` filter restricts to identifier-valid, domain-valid, or all pairs.
 ```typescript
 import { confusableDistance } from "namespace-guard";
@@ -149,11 +149,37 @@ const result = confusableDistance("paypal", "pa\u0443pal", {
 // result.similarity, result.steps (including "visual-weight" reason for novel pairs)
 ```
+### Realistic Domain Spoof Detection
+For domain name validation, `isDomainSpoof()` only flags threats that could produce registrable domain names. ICANN registrars enforce single-script labels, so mixed-script spoofs (e.g., one Cyrillic letter in a Latin domain) are excluded — they can't actually be registered.
+```typescript
+import { isDomainSpoof } from "namespace-guard";
+import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
+// Full-Cyrillic lookalike — registrable and deceptive
+isDomainSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal", { weights: CONFUSABLE_WEIGHTS });
+// { spoof: true, script: "cyrillic", danger: 0.91, substitutions: [...] }
+// Mixed-script — not registrable, not flagged
+isDomainSpoof("\u0440aypal", "paypal", { weights: CONFUSABLE_WEIGHTS });
+// { spoof: false }
+// Known-legitimate non-Latin domain — skip via allowlist
+isDomainSpoof("\u0430\u0441\u0435", "ace", {
+  weights: CONFUSABLE_WEIGHTS,
+  allowlist: ["\u0430\u0441\u0435"],
+});
+// { spoof: false }
+```
+The `danger` score (0–1) is always returned when a script match is found, even if below the `minDanger` threshold (default 0.5). Set `minDanger: 0.7` for higher precision.
 ## Research
 Two research tracks feed the library:
-**Visual measurement.** 1,397 confusable pairs rendered across 230 system fonts, scored by structural similarity (SSIM). 494 of these are novel cross-script pairs between non-Latin scripts (Hangul/Han, Cyrillic/Greek, Cyrillic/Arabic, and more) with zero coverage in any existing standard. Full dataset published as [confusable-vision](https://github.com/paultendo/confusable-vision) (CC-BY-4.0).
+**Visual measurement.** 4,174 confusable pairs measured across 245 system fonts using vector-outline raycasting ([RaySpace](https://paultendo.github.io/posts/rayspace-methodology/)). 3,525 of these are cross-script pairs between non-Latin scripts (Hangul/Han, Cyrillic/Greek, Cyrillic/Arabic, and more) with zero coverage in any existing standard. Each pair carries a `danger` score (0–1) representing geometric similarity; the shipped floor is 0.5 (for higher precision, try 0.7). Full dataset published as [confusable-vision](https://github.com/paultendo/confusable-vision) (CC-BY-4.0).
 **Normalisation composability.** 31 characters where Unicode's confusables.txt and NFKC normalisation disagree. Two production maps (`CONFUSABLE_MAP` for NFKC-first, `CONFUSABLE_MAP_FULL` for raw-input pipelines), a benchmark corpus, and composability vectors wired into CLI drift baselines. Submitted to [Unicode public review (PRI #540)](https://www.unicode.org/review/pri540/) and published in [accumulated feedback](https://www.unicode.org/review/pri540/feedback.html).
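Note on the hunk above: the added "Realistic Domain Spoof Detection" section rests on the premise that registrable labels are single-script, so mixed-script strings are not flagged. The sketch below shows one way to test that premise with Unicode script properties. It is not how `isDomainSpoof()` is implemented (the diff does not show that), and `scriptsInLabel`, `isSingleScript`, and the three-script list are hypothetical names chosen for illustration.

```typescript
// Small illustrative subset of scripts; a real check would cover many more.
const SCRIPTS = ["Latin", "Cyrillic", "Greek"] as const;
type ScriptName = (typeof SCRIPTS)[number];

// Built via the RegExp constructor so the snippet does not depend on
// compiler target settings for regex literals.
const SCRIPT_PATTERNS: Record<ScriptName, RegExp> = {
  Latin: new RegExp("\\p{Script=Latin}", "u"),
  Cyrillic: new RegExp("\\p{Script=Cyrillic}", "u"),
  Greek: new RegExp("\\p{Script=Greek}", "u"),
};

// Collect every listed script that appears in the label.
// Digits and hyphens belong to Script=Common and are ignored here.
function scriptsInLabel(label: string): Set<ScriptName> {
  const found = new Set<ScriptName>();
  for (const ch of Array.from(label)) {
    for (const name of SCRIPTS) {
      if (SCRIPT_PATTERNS[name].test(ch)) found.add(name);
    }
  }
  return found;
}

// A label counts as single-script when exactly one listed script is present.
function isSingleScript(label: string): boolean {
  return scriptsInLabel(label).size === 1;
}

console.log(isSingleScript("\u0440\u0430\u0443\u0440\u0430\u04CF")); // true (all Cyrillic)
console.log(isSingleScript("\u0440aypal")); // false (Cyrillic + Latin)
```

Under this premise, only the first label would go on to a confusable comparison against the target name; the mixed-script label is set aside before any scoring.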
@@ -243,7 +269,7 @@ Migration guides per adapter: [docs/reference.md#canonical-uniqueness-migration-
 - LLM preprocessing (`canonicalise`, `scan`, `isClean`): [docs/reference.md#llm-pipeline-preprocessing](docs/reference.md#llm-pipeline-preprocessing)
 - Benchmark corpus (`confusable-bench.v1`): [docs/reference.md#confusable-benchmark-corpus-artifact](docs/reference.md#confusable-benchmark-corpus-artifact)
 - Advanced primitives (`skeleton`, `areConfusable`, `confusableDistance`): [docs/reference.md#advanced-security-primitives](docs/reference.md#advanced-security-primitives)
-- Confusable weights (SSIM-scored pairs, including cross-script): [docs/reference.md#confusable-weights-subpath](docs/reference.md#confusable-weights-subpath)
+- Confusable weights (scored pairs, including cross-script): [docs/reference.md#confusable-weights-subpath](docs/reference.md#confusable-weights-subpath)
 - Cross-script detection: [docs/reference.md#cross-script-detection](docs/reference.md#cross-script-detection)
 - CLI reference: [docs/reference.md#cli](docs/reference.md#cli)
 - API reference: [docs/reference.md#api-reference](docs/reference.md#api-reference)