namespace-guard 0.19.0 → 0.20.0
- package/README.md +36 -10
- package/dist/cli.js +2218 -2218
- package/dist/cli.mjs +2218 -2218
- package/dist/composability-vectors.js +2218 -2218
- package/dist/composability-vectors.mjs +2218 -2218
- package/dist/confusable-weights.d.mts +3 -5
- package/dist/confusable-weights.d.ts +3 -5
- package/dist/confusable-weights.js +4905 -2740
- package/dist/confusable-weights.mjs +4905 -2740
- package/dist/font-specific-weights.d.mts +1 -1
- package/dist/font-specific-weights.d.ts +1 -1
- package/dist/index.d.mts +90 -9
- package/dist/index.d.ts +90 -9
- package/dist/index.js +2288 -2222
- package/dist/index.mjs +2287 -2222
- package/dist/profanity-en.js +2218 -2218
- package/dist/profanity-en.mjs +2218 -2218
- package/package.json +1 -1
package/README.md
CHANGED
@@ -5,7 +5,7 @@
 [](https://www.typescriptlang.org/)
 [](https://opensource.org/licenses/MIT)
 
-**The world's first library that detects confusable characters across non-Latin scripts.** Slug claimability, Unicode anti-spoofing, and LLM Denial of Spend defence in one zero-dependency package.
+**The world's first library that detects confusable characters across non-Latin scripts.** Slug claimability, Unicode anti-spoofing, and LLM [Denial of Spend](https://paultendo.github.io/posts/confusable-vision-llm-attack-tests/) defence in one zero-dependency package.
 
 - Live demo: https://paultendo.github.io/namespace-guard/
 - Blog post: https://paultendo.github.io/posts/namespace-guard-launch/
@@ -14,19 +14,19 @@
 
 Existing confusable standards (TR39, IDNA) map non-Latin characters to Latin equivalents. They have zero coverage for confusable pairs *between* two non-Latin scripts.
 
-namespace-guard ships
+namespace-guard ships 3,525 cross-script pairs from [confusable-vision](https://github.com/paultendo/confusable-vision) (measured across 245 system fonts using vector-outline raycasting — [RaySpace](https://paultendo.github.io/posts/rayspace-methodology/)). This catches attacks that no other library detects:
 
 ```typescript
 import { areConfusable, detectCrossScriptRisk } from "namespace-guard";
 import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
 
-// Hangul ᅵ and Han 丨 are visually identical (
+// Hangul ᅵ and Han 丨 are visually identical (ray distance 0.004, Arial Unicode MS)
 areConfusable("\u1175", "\u4E28", { weights: CONFUSABLE_WEIGHTS }); // true
 
-// Greek Τ and Han 丅 are near-identical (
+// Greek Τ and Han 丅 are near-identical (multiple fonts)
 areConfusable("\u03A4", "\u4E05", { weights: CONFUSABLE_WEIGHTS }); // true
 
-// Cyrillic І and Greek Ι are
+// Cyrillic І and Greek Ι are identical outlines (62 fonts)
 areConfusable("\u0406", "\u0399", { weights: CONFUSABLE_WEIGHTS }); // true
 
 // Without weights, only skeleton-based detection (TR39 coverage)
@@ -37,7 +37,7 @@ const risk = detectCrossScriptRisk("\u1175\u4E28", { weights: CONFUSABLE_WEIGHTS
 // { riskLevel: "high", scripts: ["han", "hangul"], crossScriptPairs: [...] }
 ```
 
-
+4,174 total confusable pairs scored by visual measurement (3,111 TR39-confirmed, 1,063 novel). Each pair carries a `danger` score (0–1) representing geometric similarity across fonts; the shipped dataset uses a 0.5 floor. For higher precision, filter at `danger > 0.7` (574 pairs). Cross-script data licensed CC-BY-4.0.
 
 ## Installation
 
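The paragraph added in the hunk above describes a score-based cut-off on `danger`. A minimal sketch of that filtering step follows. It assumes a hypothetical pair shape (`ScoredPair` with `a`, `b`, and `danger` fields); this diff does not show the real structure of `CONFUSABLE_WEIGHTS`, so the names and the sample scores below are illustrative only.

```typescript
// Hypothetical shape for illustration only; the real CONFUSABLE_WEIGHTS
// structure is not shown in this diff and may differ.
interface ScoredPair {
  a: string;      // first character of the pair
  b: string;      // second character of the pair
  danger: number; // geometric similarity in [0, 1]
}

// Keep only pairs whose danger score is strictly above the threshold.
function filterByDanger(pairs: ScoredPair[], threshold: number): ScoredPair[] {
  return pairs.filter((pair) => pair.danger > threshold);
}

// Made-up scores, chosen only to exercise the filter.
const samplePairs: ScoredPair[] = [
  { a: "\u1175", b: "\u4E28", danger: 0.95 },
  { a: "\u0406", b: "\u0399", danger: 0.8 },
  { a: "\u03A4", b: "\u4E05", danger: 0.55 },
];

const highPrecision = filterByDanger(samplePairs, 0.7);
console.log(highPrecision.length); // 2
```

Raising the threshold trades recall for precision: the 0.55 pair survives the shipped 0.5 floor but is dropped at 0.7.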
@@ -88,7 +88,7 @@ if (!result.claimed) {
 
 ## What You Get
 
-- **Cross-script confusable detection** with
+- **Cross-script confusable detection** with 3,525 measured pairs between non-Latin scripts
 - Cross-table collision checks (users, orgs, teams, etc.)
 - Reserved-name blocking with category-aware messages
 - Unicode anti-spoofing (NFKC + confusable detection + mixed-script/risk controls)
@@ -136,7 +136,7 @@ areConfusable("paypal", "pa\u0443pal"); // true
 confusableDistance("paypal", "pa\u0443pal"); // graded similarity + chainDepth + explainable steps
 ```
 
-For measured visual scoring, pass the optional weights from confusable-vision (
+For measured visual scoring, pass the optional weights from confusable-vision (4,174 pairs scored across 245 fonts using vector-outline raycasting, including 3,525 cross-script pairs). Each pair has a `danger` score (0–1); the default 0.5 floor favours recall, use `danger > 0.7` for precision. The `context` filter restricts to identifier-valid, domain-valid, or all pairs.
 
 ```typescript
 import { confusableDistance } from "namespace-guard";
@@ -149,11 +149,37 @@ const result = confusableDistance("paypal", "pa\u0443pal", {
 // result.similarity, result.steps (including "visual-weight" reason for novel pairs)
 ```
 
+### Realistic Domain Spoof Detection
+
+For domain name validation, `isDomainSpoof()` only flags threats that could produce registrable domain names. ICANN registrars enforce single-script labels, so mixed-script spoofs (e.g., one Cyrillic letter in a Latin domain) are excluded — they can't actually be registered.
+
+```typescript
+import { isDomainSpoof } from "namespace-guard";
+import { CONFUSABLE_WEIGHTS } from "namespace-guard/confusable-weights";
+
+// Full-Cyrillic lookalike — registrable and deceptive
+isDomainSpoof("\u0440\u0430\u0443\u0440\u0430\u04CF", "paypal", { weights: CONFUSABLE_WEIGHTS });
+// { spoof: true, script: "cyrillic", danger: 0.91, substitutions: [...] }
+
+// Mixed-script — not registrable, not flagged
+isDomainSpoof("\u0440aypal", "paypal", { weights: CONFUSABLE_WEIGHTS });
+// { spoof: false }
+
+// Known-legitimate non-Latin domain — skip via allowlist
+isDomainSpoof("\u0430\u0441\u0435", "ace", {
+  weights: CONFUSABLE_WEIGHTS,
+  allowlist: ["\u0430\u0441\u0435"],
+});
+// { spoof: false }
+```
+
+The `danger` score (0–1) is always returned when a script match is found, even if below the `minDanger` threshold (default 0.5). Set `minDanger: 0.7` for higher precision.
+
 ## Research
 
 Two research tracks feed the library:
 
-**Visual measurement.**
+**Visual measurement.** 4,174 confusable pairs measured across 245 system fonts using vector-outline raycasting ([RaySpace](https://paultendo.github.io/posts/rayspace-methodology/)). 3,525 of these are cross-script pairs between non-Latin scripts (Hangul/Han, Cyrillic/Greek, Cyrillic/Arabic, and more) with zero coverage in any existing standard. Each pair carries a `danger` score (0–1) representing geometric similarity; the shipped floor is 0.5 (for higher precision, try 0.7). Full dataset published as [confusable-vision](https://github.com/paultendo/confusable-vision) (CC-BY-4.0).
 
 **Normalisation composability.** 31 characters where Unicode's confusables.txt and NFKC normalisation disagree. Two production maps (`CONFUSABLE_MAP` for NFKC-first, `CONFUSABLE_MAP_FULL` for raw-input pipelines), a benchmark corpus, and composability vectors wired into CLI drift baselines. Submitted to [Unicode public review (PRI #540)](https://www.unicode.org/review/pri540/) and published in [accumulated feedback](https://www.unicode.org/review/pri540/feedback.html).
 
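The "Realistic Domain Spoof Detection" section added in the hunk above rests on a single-script rule: a label that mixes scripts is treated as unregistrable and is not flagged. The sketch below shows one way such a check could be written with Unicode script property escapes. It is not the library's implementation; `scriptsOf`, `isSingleScript`, and the short script list are hypothetical names and choices made for illustration, and real registry policies can be more permissive than a strict one-script rule.

```typescript
// Scripts checked in this sketch; a real implementation would cover more.
const SCRIPTS = ["Latin", "Cyrillic", "Greek", "Han", "Hangul", "Arabic"];

// Built with the RegExp constructor so the Unicode property escapes are
// resolved at runtime (needs an ES2018+ engine).
const SCRIPT_TESTS = SCRIPTS.map((name) => ({
  name,
  pattern: new RegExp(`^\\p{Script=${name}}$`, "u"),
}));

// Return the set of listed scripts used by the characters in a label.
// Digits and hyphens are Script=Common, match none of the patterns,
// and are therefore ignored.
function scriptsOf(label: string): Set<string> {
  const found = new Set<string>();
  for (const ch of label) {
    for (const { name, pattern } of SCRIPT_TESTS) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return found;
}

// A label passes the single-script rule when exactly one script appears.
function isSingleScript(label: string): boolean {
  return scriptsOf(label).size === 1;
}

console.log(isSingleScript("\u0440\u0430\u0443\u0440\u0430\u04CF")); // true  (all Cyrillic)
console.log(isSingleScript("\u0440aypal"));                          // false (Cyrillic + Latin)
```

Under this sketch the full-Cyrillic lookalike from the README example would proceed to confusable scoring, while the mixed-script string would be rejected before any scoring happens.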
@@ -243,7 +269,7 @@ Migration guides per adapter: [docs/reference.md#canonical-uniqueness-migration-
 - LLM preprocessing (`canonicalise`, `scan`, `isClean`): [docs/reference.md#llm-pipeline-preprocessing](docs/reference.md#llm-pipeline-preprocessing)
 - Benchmark corpus (`confusable-bench.v1`): [docs/reference.md#confusable-benchmark-corpus-artifact](docs/reference.md#confusable-benchmark-corpus-artifact)
 - Advanced primitives (`skeleton`, `areConfusable`, `confusableDistance`): [docs/reference.md#advanced-security-primitives](docs/reference.md#advanced-security-primitives)
-- Confusable weights (
+- Confusable weights (scored pairs, including cross-script): [docs/reference.md#confusable-weights-subpath](docs/reference.md#confusable-weights-subpath)
 - Cross-script detection: [docs/reference.md#cross-script-detection](docs/reference.md#cross-script-detection)
 - CLI reference: [docs/reference.md#cli](docs/reference.md#cli)
 - API reference: [docs/reference.md#api-reference](docs/reference.md#api-reference)