namespace-guard 0.19.0 → 0.20.0

package/README.md CHANGED
@@ -5,7 +5,7 @@
5
5
  [![TypeScript](https://img.shields.io/badge/TypeScript-5.0+-blue.svg)](https://www.typescriptlang.org/)
6
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
7
7
 
8
- **The world's first library that detects confusable characters across non-Latin scripts.** Slug claimability, Unicode anti-spoofing, and LLM Denial of Spend defence in one zero-dependency package.
8
+ **The world's first library that detects confusable characters across non-Latin scripts.** Slug claimability, Unicode anti-spoofing, and LLM [Denial of Spend](https://paultendo.github.io/posts/confusable-vision-llm-attack-tests/) defence in one zero-dependency package.
9
9
 
10
10
  - Live demo: https://paultendo.github.io/namespace-guard/
11
11
  - Blog post: https://paultendo.github.io/posts/namespace-guard-launch/
@@ -14,19 +14,19 @@
14
14
 
15
15
  Existing confusable standards (TR39, IDNA) map non-Latin characters to Latin equivalents. They have zero coverage for confusable pairs *between* two non-Latin scripts.
16
16
 
17
- namespace-guard ships 494 SSIM-measured cross-script pairs from [confusable-vision](https://github.com/paultendo/confusable-vision) (rendered across 230 system fonts, scored by structural similarity). This catches attacks that no other library detects:
17
+ namespace-guard ships 3,525 cross-script pairs from [confusable-vision](https://github.com/paultendo/confusable-vision) (measured across 245 system fonts using vector-outline raycasting — [RaySpace](https://paultendo.github.io/posts/rayspace-methodology/)). This catches attacks that no other library detects:
18
18
 
19
19
  ```typescript
20
20
  import { areConfusable, detectCrossScriptRisk } from "namespace-guard";
21
21
  import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
22
22
 
23
- // Hangul ᅵ and Han 丨 are visually identical (SSIM 0.999, Arial Unicode MS)
23
+ // Hangul ᅵ and Han 丨 are visually identical (ray distance 0.004, Arial Unicode MS)
24
24
  areConfusable("\u1175", "\u4E28", { weights: CONFUSABLE_WEIGHTS }); // true
25
25
 
26
- // Greek Τ and Han 丅 are near-identical (SSIM 0.930, Hiragino Kaku Gothic ProN)
26
+ // Greek Τ and Han 丅 are near-identical (multiple fonts)
27
27
  areConfusable("\u03A4", "\u4E05", { weights: CONFUSABLE_WEIGHTS }); // true
28
28
 
29
- // Cyrillic І and Greek Ι are pixel-identical (SSIM 1.0, 61 fonts agree)
29
+ // Cyrillic І and Greek Ι are identical outlines (62 fonts)
30
30
  areConfusable("\u0406", "\u0399", { weights: CONFUSABLE_WEIGHTS }); // true
31
31
 
32
32
  // Without weights, only skeleton-based detection (TR39 coverage)
@@ -37,7 +37,7 @@ const risk = detectCrossScriptRisk("\u1175\u4E28", { weights: CONFUSABLE_WEIGHTS
37
37
  // { riskLevel: "high", scripts: ["han", "hangul"], crossScriptPairs: [...] }
38
38
  ```
39
39
 
40
- 1,397 total SSIM-scored confusable pairs (110 TR39-confirmed, 793 novel Latin-target, 494 cross-script). Cross-script data licensed CC-BY-4.0.
40
+ 4,174 total confusable pairs scored by visual measurement (3,111 TR39-confirmed, 1,063 novel). Each pair carries a `danger` score (0–1) representing geometric similarity across fonts; the shipped dataset uses a 0.5 floor. For higher precision, filter at `danger > 0.7` (574 pairs). Cross-script data licensed CC-BY-4.0.
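
The thresholds above amount to a filter over scored pairs. The following is a minimal sketch, assuming a hypothetical `ScoredPair` record and a hypothetical `filterByDanger` helper; the actual layout of `CONFUSABLE_WEIGHTS` is not shown in this README and may differ, and the sample scores are illustrative rather than taken from the dataset.

```typescript
// Hypothetical record shape: the real CONFUSABLE_WEIGHTS layout may differ.
interface ScoredPair {
  source: string;
  target: string;
  danger: number; // 0 to 1, geometric similarity across fonts
}

// Keep only pairs whose danger score is strictly above the threshold.
function filterByDanger(pairs: ScoredPair[], minDanger: number): ScoredPair[] {
  return pairs.filter((pair) => pair.danger > minDanger);
}

// Illustrative scores only, not taken from the shipped dataset.
const samplePairs: ScoredPair[] = [
  { source: "\u1175", target: "\u4E28", danger: 0.99 },
  { source: "\u03A4", target: "\u4E05", danger: 0.8 },
  { source: "\u0430", target: "\u03B1", danger: 0.55 },
];

// Trade recall for precision by raising the floor from 0.5 to 0.7.
const highPrecision = filterByDanger(samplePairs, 0.7);
console.log(highPrecision.length); // 2
```

A strict comparison mirrors the `danger > 0.7` wording used here; whether the library itself treats the boundary as inclusive is not stated.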
41
41
 
42
42
  ## Installation
43
43
 
@@ -88,7 +88,7 @@ if (!result.claimed) {
88
88
 
89
89
  ## What You Get
90
90
 
91
- - **Cross-script confusable detection** with 494 SSIM-measured pairs between non-Latin scripts
91
+ - **Cross-script confusable detection** with 3,525 measured pairs between non-Latin scripts
92
92
  - Cross-table collision checks (users, orgs, teams, etc.)
93
93
  - Reserved-name blocking with category-aware messages
94
94
  - Unicode anti-spoofing (NFKC + confusable detection + mixed-script/risk controls)
@@ -136,7 +136,7 @@ areConfusable("paypal", "pa\u0443pal"); // true
136
136
  confusableDistance("paypal", "pa\u0443pal"); // graded similarity + chainDepth + explainable steps
137
137
  ```
138
138
 
139
- For measured visual scoring, pass the optional weights from confusable-vision (1,397 SSIM-scored pairs across 230 fonts, including 494 cross-script pairs). The `context` filter restricts to identifier-valid, domain-valid, or all pairs.
139
+ For measured visual scoring, pass the optional weights from confusable-vision (4,174 pairs scored across 245 fonts using vector-outline raycasting, including 3,525 cross-script pairs). Each pair has a `danger` score (0–1); the default 0.5 floor favours recall, while `danger > 0.7` favours precision. The `context` filter restricts to identifier-valid, domain-valid, or all pairs.
140
140
 
141
141
  ```typescript
142
142
  import { confusableDistance } from "namespace-guard";
@@ -149,11 +149,37 @@ const result = confusableDistance("paypal", "pa\u0443pal", {
149
149
  // result.similarity, result.steps (including "visual-weight" reason for novel pairs)
150
150
  ```
151
151
 
152
+ ### Realistic Domain Spoof Detection
153
+
154
+ For domain name validation, `isDomainSpoof()` only flags threats that could produce registrable domain names. ICANN registrars enforce single-script labels, so mixed-script spoofs (e.g., one Cyrillic letter in a Latin domain) are excluded — they can't actually be registered.
155
+
156
+ ```typescript
157
+ import { isDomainSpoof } from "namespace-guard";
158
+ import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
159
+
160
+ // Full-Cyrillic lookalike — registrable and deceptive
161
+ isDomainSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal", { weights: CONFUSABLE_WEIGHTS });
162
+ // { spoof: true, script: "cyrillic", danger: 0.91, substitutions: [...] }
163
+
164
+ // Mixed-script — not registrable, not flagged
165
+ isDomainSpoof("\u0440aypal", "paypal", { weights: CONFUSABLE_WEIGHTS });
166
+ // { spoof: false }
167
+
168
+ // Known-legitimate non-Latin domain — skip via allowlist
169
+ isDomainSpoof("\u0430\u0441\u0435", "ace", {
170
+ weights: CONFUSABLE_WEIGHTS,
171
+ allowlist: ["\u0430\u0441\u0435"],
172
+ });
173
+ // { spoof: false }
174
+ ```
175
+
176
+ The `danger` score (0–1) is always returned when a script match is found, even if below the `minDanger` threshold (default 0.5). Set `minDanger: 0.7` for higher precision.
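
The single-script rule described in this section can be illustrated with Unicode script property escapes. This is a rough sketch, not the library's implementation: `scriptsInLabel` and `isSingleScript` are hypothetical helper names, only three scripts are checked, and how `isDomainSpoof()` classifies scripts internally is not shown in this README.

```typescript
// Scripts checked in this sketch; a real implementation would cover many more.
const SCRIPT_NAMES = ["Latin", "Cyrillic", "Greek"] as const;
type ScriptName = (typeof SCRIPT_NAMES)[number];

// One Unicode-aware pattern per script, built with the RegExp constructor.
const SCRIPT_PATTERNS: Array<[ScriptName, RegExp]> = SCRIPT_NAMES.map(
  (name): [ScriptName, RegExp] => [name, new RegExp(`\\p{Script=${name}}`, "u")],
);

// Collect the scripts present in a label; digits and hyphens match none of them.
function scriptsInLabel(label: string): Set<ScriptName> {
  const found = new Set<ScriptName>();
  Array.from(label).forEach((char) => {
    for (const [name, pattern] of SCRIPT_PATTERNS) {
      if (pattern.test(char)) found.add(name);
    }
  });
  return found;
}

// A label is treated as single-script when at most one checked script appears.
function isSingleScript(label: string): boolean {
  return scriptsInLabel(label).size <= 1;
}

console.log(isSingleScript("\u0440\u0430\u0443\u0440\u0430\u04CF")); // true (all Cyrillic)
console.log(isSingleScript("\u0440aypal")); // false (Cyrillic mixed with Latin)
```

Under this sketch, the full-Cyrillic lookalike passes the single-script check and so remains a candidate spoof, while the mixed-script string fails it, matching the two outcomes in the example above.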
177
+
152
178
  ## Research
153
179
 
154
180
  Two research tracks feed the library:
155
181
 
156
- **Visual measurement.** 1,397 confusable pairs rendered across 230 system fonts, scored by structural similarity (SSIM). 494 of these are novel cross-script pairs between non-Latin scripts (Hangul/Han, Cyrillic/Greek, Cyrillic/Arabic, and more) with zero coverage in any existing standard. Full dataset published as [confusable-vision](https://github.com/paultendo/confusable-vision) (CC-BY-4.0).
182
+ **Visual measurement.** 4,174 confusable pairs measured across 245 system fonts using vector-outline raycasting ([RaySpace](https://paultendo.github.io/posts/rayspace-methodology/)). 3,525 of these are cross-script pairs between non-Latin scripts (Hangul/Han, Cyrillic/Greek, Cyrillic/Arabic, and more) with zero coverage in any existing standard. Each pair carries a `danger` score (0–1) representing geometric similarity; the shipped floor is 0.5 (for higher precision, try 0.7). Full dataset published as [confusable-vision](https://github.com/paultendo/confusable-vision) (CC-BY-4.0).
157
183
 
158
184
  **Normalisation composability.** 31 characters where Unicode's confusables.txt and NFKC normalisation disagree. Two production maps (`CONFUSABLE_MAP` for NFKC-first, `CONFUSABLE_MAP_FULL` for raw-input pipelines), a benchmark corpus, and composability vectors wired into CLI drift baselines. Submitted to [Unicode public review (PRI #540)](https://www.unicode.org/review/pri540/) and published in [accumulated feedback](https://www.unicode.org/review/pri540/feedback.html).
159
185
 
@@ -243,7 +269,7 @@ Migration guides per adapter: [docs/reference.md#canonical-uniqueness-migration-
243
269
  - LLM preprocessing (`canonicalise`, `scan`, `isClean`): [docs/reference.md#llm-pipeline-preprocessing](docs/reference.md#llm-pipeline-preprocessing)
244
270
  - Benchmark corpus (`confusable-bench.v1`): [docs/reference.md#confusable-benchmark-corpus-artifact](docs/reference.md#confusable-benchmark-corpus-artifact)
245
271
  - Advanced primitives (`skeleton`, `areConfusable`, `confusableDistance`): [docs/reference.md#advanced-security-primitives](docs/reference.md#advanced-security-primitives)
246
- - Confusable weights (SSIM-scored pairs, including cross-script): [docs/reference.md#confusable-weights-subpath](docs/reference.md#confusable-weights-subpath)
272
+ - Confusable weights (scored pairs, including cross-script): [docs/reference.md#confusable-weights-subpath](docs/reference.md#confusable-weights-subpath)
247
273
  - Cross-script detection: [docs/reference.md#cross-script-detection](docs/reference.md#cross-script-detection)
248
274
  - CLI reference: [docs/reference.md#cli](docs/reference.md#cli)
249
275
  - API reference: [docs/reference.md#api-reference](docs/reference.md#api-reference)