@agulbra/uts58 0.2.3

package/LICENSE ADDED
@@ -0,0 +1,23 @@
1
+
2
+ Copyright (c) 2025, ICANN
3
+
4
+ Redistribution and use in source and binary forms, with or without
5
+ modification, are permitted provided that the following conditions are met:
6
+
7
+ 1. Redistributions of source code must retain the above copyright notice, this
8
+ list of conditions and the following disclaimer.
9
+
10
+ 2. Redistributions in binary form must reproduce the above copyright notice,
11
+ this list of conditions and the following disclaimer in the documentation
12
+ and/or other materials provided with the distribution.
13
+
14
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
16
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17
+ DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18
+ FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19
+ DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20
+ SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21
+ CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22
+ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
package/README.md ADDED
@@ -0,0 +1,257 @@
1
+ # uts58
2
+
3
+ A JavaScript implementation of [UTS58](https://www.unicode.org/reports/tr58/),
4
+ the Unicode spec for finding links in running text. Given a chunk of text, it
5
+ returns the URLs in it along with their UTF-16 offsets. This is a port of
6
+ the [Ruby `uts58` gem](https://github.com/arnt/uts58); the test suite is the
7
+ same, with very minor differences.
8
+
9
+ Tested extensively on relevant OSes: [![CI](https://github.com/arnt/uts58/actions/workflows/javascript.yml/badge.svg)](https://github.com/arnt/uts58/actions/workflows/javascript.yml)
10
+
11
+ ## Install
12
+
13
+ ```sh
14
+ npm install @agulbra/uts58
15
+ ```
16
+
17
+ ESM-only; Node 18+ (uses Unicode property escapes and lookbehinds).
18
+
19
+ ## Usage
20
+
21
+ ```js
22
+ import { extractUrls, extractUrlsWithIndices } from '@agulbra/uts58';
23
+
24
+ extractUrlsWithIndices('see https://example.com/ for details');
25
+ // => [{ url: 'https://example.com/', indices: [4, 24] }]
26
+
27
+ extractUrls('see https://example.com/ for details');
28
+ // => ['https://example.com/']
29
+ ```
30
+
31
+ The API mirrors `twitter-text`'s `Extractor#extractUrlsWithIndices` closely;
32
+ it was written to provide what Mastodon-style consumers need. The two
33
+ top-level functions above also strip partly overlapping matches via
34
+ `Extractor#removeOverlappingEntities`: candidates are sorted by start
35
+ offset and anything that begins inside an earlier survivor's span is
36
+ dropped. Length doesn't enter into it — the earlier-starting candidate
37
+ wins even if a later one is longer. Import `Extractor` directly if you'd
38
+ rather merge with other extractors (mentions, hashtags, …) and resolve
39
+ overlap across all of them yourself.
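The start-wins rule described above can be sketched in a few lines. This is an illustration only: `removeOverlaps` is a hypothetical helper name, not the package's `Extractor#removeOverlappingEntities`, and the entity shape (`{ url, indices: [start, end] }`) is assumed from the README examples.

```javascript
// Hypothetical sketch of start-wins overlap removal (not the package's code).
function removeOverlaps(entities) {
  // Sort a copy by start offset; Array.prototype.sort is stable, so
  // candidates with the same start keep their input order.
  const sorted = [...entities].sort((a, b) => a.indices[0] - b.indices[0]);
  const kept = [];
  let lastEnd = 0; // end offset of the most recent survivor
  for (const entity of sorted) {
    const [start, end] = entity.indices;
    // A candidate that begins inside the previous survivor's span is dropped,
    // however long it is.
    if (start >= lastEnd) {
      kept.push(entity);
      lastEnd = end;
    }
  }
  return kept;
}

// The README's own example: 'ask alice@example.com/02074960909 for details'
const survivors = removeOverlaps([
  { url: 'https://example.com/02074960909', indices: [10, 33] },
  { url: 'mailto:alice@example.com', indices: [4, 21] },
]);
```

With these two candidates only the email entity survives, because it starts first; the longer URL begins inside its span.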
40
+
41
+ Unlike `twitter-text`, the functions take no options object. What counts as
42
+ a link is fixed by [UTS58](https://www.unicode.org/reports/tr58/); there is
43
+ no `extractUrlsWithoutProtocol`-style switch, because the spec already says
44
+ how scheme-less input is handled (see below).
45
+
46
+ Input without a scheme is recognised, and `https://` is prepended in the
47
+ returned `url`:
48
+
49
+ ```js
50
+ extractUrlsWithIndices('blogspot.com is still a thing');
51
+ // => [{ url: 'https://blogspot.com', indices: [0, 12] }]
52
+ ```
53
+
54
+ IDNs are decoded to Unicode (U-labels) in the output, for better readability:
55
+
56
+ ```js
57
+ extractUrls('xn-----ctdbabcfhu9c2b9l1acccr4c.xn--mgbah1a3hjkrd')[0];
58
+ // => 'https://تجربة-القبول-الشامل.موريتانيا'
59
+ ```
60
+
61
+ (Admittedly that output isn't very readable if you can't read Arabic. But
62
+ the input wasn't readable to anyone, no matter what languages they can read.)
63
+
64
+ Trailing punctuation, balanced brackets, ports, paths, queries and
65
+ fragments are handled per the spec. Indices in the output are UTF-16
66
+ code unit offsets — the units used by `String.prototype.slice`,
67
+ `String#length`, the DOM, and text editors — so `text.slice(start,
68
+ end)` returns the matched substring directly, even across characters
69
+ outside the BMP. (The Ruby gem reports codepoint offsets instead, which
70
+ are idiomatic for Ruby strings; `example.com/🐪/#camel` is a good test
71
+ of the difference, since the emoji is one codepoint but two UTF-16
72
+ units.)
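If a consumer needs codepoint offsets (for example, to compare against output from the Ruby gem), a UTF-16 offset can be converted by counting the codepoints in front of it. This is a minimal sketch; `utf16ToCodepointOffset` is a hypothetical helper, and the span below is located with `indexOf` rather than taken from the extractor.

```javascript
// Hypothetical helper: convert a UTF-16 code-unit offset to a codepoint offset.
function utf16ToCodepointOffset(text, utf16Offset) {
  // Spreading a string iterates by codepoint, so the array length is the
  // number of codepoints before the offset. (An offset that falls inside a
  // surrogate pair would count the lone surrogate as one codepoint.)
  return [...text.slice(0, utf16Offset)].length;
}

const text = 'see example.com/🐪/#camel now';
const start = text.indexOf('example.com'); // UTF-16 offset of the span start
const end = text.indexOf(' now');          // UTF-16 offset of the span end

const utf16Length = end - start;
const codepointLength =
  utf16ToCodepointOffset(text, end) - utf16ToCodepointOffset(text, start);
```

The span is 21 UTF-16 units long but 20 codepoints long, because the camel emoji occupies two code units.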
73
+
74
+ ## Email addresses
75
+
76
+ ```js
77
+ import {
78
+ extractEmailAddresses,
79
+ extractEmailAddressesWithIndices,
80
+ extractEntities,
81
+ extractEntitiesWithIndices,
82
+ } from '@agulbra/uts58';
83
+
84
+ extractEmailAddressesWithIndices('contact info@grå.org today');
85
+ // => [{ email: 'info@grå.org', url: 'mailto:info@grå.org', indices: [8, 20] }]
86
+
87
+ extractEmailAddresses('contact info@grå.org today');
88
+ // => ['info@grå.org']
89
+ ```
90
+
91
+ Each result carries both the bare `email` and a `mailto:` `url`, so it
92
+ drops straight into anything that already renders a `url` entity. The
93
+ domain is IDN-decoded the same way as in `extractUrls`, and a leading
94
+ `mailto:` in the input is absorbed into `indices` per UTS58 5.2.
95
+
96
+ A plain address overlaps the bare domain that `extractUrls` would find
97
+ after the `@`. `extractEntitiesWithIndices` runs both extractors,
98
+ sorts by start offset, and removes overlaps — the earlier-starting
99
+ email wins over the domain inside it:
100
+
101
+ ```js
102
+ extractEntities('mail arnt@grå.org or see blogspot.com');
103
+ // => ['mailto:arnt@grå.org', 'https://blogspot.com']
104
+ ```
105
+
106
+ ## Choosing the public-suffix check
107
+
108
+ To decide whether `something.example` is a plausible link, the extractor
109
+ checks the host against a public-suffix table. Which table is a bundle-time
110
+ choice — pick the entry point that fits, and your bundler ships only that
111
+ table:
112
+
113
+ | import | table | gzipped |
114
+ | --- | --- | --- |
115
+ | `@agulbra/uts58` (default) | Public Suffix List, ICANN section | ~5 KB |
116
+ | `@agulbra/uts58/iana` | IANA root-zone TLDs | ~5 KB |
117
+ | `@agulbra/uts58/core` | none — you supply the check | 0 KB |
118
+
119
+ All three expose the same API. The two tables are about the same size and
120
+ agree on nearly every host — the difference is how strict the check is for
121
+ the handful of TLDs that take registrations only under second-level suffixes. `@agulbra/uts58/iana`
122
+ asks only "is the rightmost label a real TLD": enough to tell `blogspot.jp`
123
+ from `blogspot.exe` and to reject typos like `example.cmo`, but it treats a
124
+ bare `foo.za` as plausible. The default `@agulbra/uts58` carries the PSL, which knows
125
+ South Africa registers under `co.za` / `org.za` and so rejects a bare
126
+ `foo.za`. If that distinction doesn't matter to you, the tables are
127
+ interchangeable.
128
+
129
+ Neither reproduces the PSL's wildcard/exception rules: the question here is
130
+ "could this be a link", not "where is the exact registrable boundary", so a
131
+ flat membership test is all it does. The PSL table is folded accordingly —
132
+ `!` exceptions dropped, `*.foo` collapsed to `foo`, and any suffix made
133
+ redundant by a shorter one removed (with `no` present, `møre-og-romsdal.no`
134
+ is dropped). That folding is what keeps it down to ~5 KB.
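The folding steps above can be sketched as follows. This is an assumption-laden illustration, not the package's generator: `foldSuffixes` is a hypothetical name, and `tools/maketlds.js` may differ in detail (for instance in how it restricts itself to the ICANN section or handles A-labels).

```javascript
// Hypothetical sketch of the PSL folding described above.
function foldSuffixes(rules) {
  const flat = new Set();
  for (const raw of rules) {
    const rule = raw.trim().toLowerCase();
    if (rule === '' || rule.startsWith('//')) continue; // blank lines, comments
    if (rule.startsWith('!')) continue;                 // drop exception rules
    flat.add(rule.startsWith('*.') ? rule.slice(2) : rule); // *.foo becomes foo
  }
  // Drop any suffix that already ends in a shorter member of the set.
  return new Set([...flat].filter((suffix) => {
    const labels = suffix.split('.');
    for (let i = 1; i < labels.length; i++) {
      if (flat.has(labels.slice(i).join('.'))) return false;
    }
    return true;
  }));
}

const folded = foldSuffixes(['no', 'møre-og-romsdal.no', '*.ck', '!www.ck', 'co.za']);
```

Here `møre-og-romsdal.no` disappears because `no` is present, `*.ck` collapses to `ck`, `!www.ck` is dropped, and `co.za` stays because `za` itself is not in the input.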
135
+
136
+ `@agulbra/uts58/core` bundles no table at all. Bring your own check — over a suffix
137
+ set, or wrapping a library you already depend on:
138
+
139
+ ```js
140
+ import { Extractor } from '@agulbra/uts58/core';
141
+ import { parse } from 'tldts';
142
+
143
+ const ex = new Extractor({
144
+ isPlausibleHost: (host) => {
145
+ const p = parse(host);
146
+ return !!p.domain && p.isIcann && p.publicSuffix !== 'invalid';
147
+ },
148
+ });
149
+ ex.extractUrls('see example.com here');
150
+ ```
151
+
152
+ That route is also how you get exact PSL semantics back, at the cost of a
153
+ dependency you choose rather than one this package forces on you. (`@agulbra/uts58`
154
+ itself depends only on `punycode`.)
155
+
156
+ ## Suggested test cases and notable behaviour
157
+
158
+ A few sharp edges worth covering in your own tests if you're swapping
159
+ `twitter-text` out, or just using this from scratch.
160
+
161
+ **The `href` and the visible text are not the same string.** For
162
+ `see example.com here`, the `indices` span
163
+ `example.com`, but `url` is the longer `https://example.com`.
164
+ Use `url` for the `href` attribute, slice the original text by
165
+ `indices` for the visible content. A test that compares
166
+ `text.slice(start, end) === url` will fail on every scheme-less input,
167
+ and on every IDN where A-labels were decoded.
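A minimal rendering sketch that keeps the two strings apart might look like this. `renderLinks` and `escapeHtml` are hypothetical helpers (the package does not render), and the entity in the example is hard-coded in the shape the README shows rather than produced by the extractor.

```javascript
// Hypothetical HTML escaping helper.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, (c) => map[c]);
}

// `entities` is assumed to be sorted, non-overlapping, and shaped like
// [{ url, indices: [start, end] }] with UTF-16 offsets.
function renderLinks(text, entities) {
  let html = '';
  let pos = 0;
  for (const { url, indices: [start, end] } of entities) {
    html += escapeHtml(text.slice(pos, start));
    // href comes from `url`; the visible text is sliced from the input.
    html += `<a href="${escapeHtml(url)}">${escapeHtml(text.slice(start, end))}</a>`;
    pos = end;
  }
  return html + escapeHtml(text.slice(pos));
}

const rendered = renderLinks('see example.com here', [
  { url: 'https://example.com', indices: [4, 15] },
]);
```

The link keeps `example.com` as its visible text while the `href` carries the scheme.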
168
+
169
+ **No options object.** There is no `extractUrlsWithoutProtocol: false`
170
+ switch. If you want only scheme-bearing URLs, filter on the matched
171
+ substring (not on `url`, which always carries `https://`):
172
+
173
+ ```js
174
+ extractUrlsWithIndices(text)
175
+ .filter((r) => /^https?:\/\//i.test(text.slice(...r.indices)));
176
+ ```
177
+
178
+ **No `autoLink`.** This package extracts; it doesn't render. There's
179
+ no equivalent of `twitter-text`'s `autoLink`, `autoLinkUrlsCustom`,
180
+ `htmlEscape`, etc. — building HTML is the caller's job, which keeps
181
+ escaping decisions where they belong.
182
+
183
+ **No mentions, hashtags, cashtags, replies.** UTS58 doesn't define
184
+ them, so this package doesn't either. If you need them, run
185
+ `twitter-text` (or another extractor) for those alongside this one and
186
+ merge with `Extractor#removeOverlappingEntities`.
187
+
188
+ **Overlap resolution is start-wins, not longest-wins.** Worth a test
189
+ when you merge entities from multiple extractors. `ask
190
+ alice@example.com/02074960909 for details` shows why. The raw
191
+ extractors find both email `alice@example.com` at `[4, 21]` and url
192
+ `https://example.com/02074960909` at `[10, 33]`. Start-wins keeps
193
+ `alice@example.com`, which is what a reader would call right, at least
194
+ one who reads 02074960909 as a phone number.
195
+ Longest-wins would keep the longer `https://example.com/02074960909`.
196
+
197
+ **`maxLength` measures the matched input span.** Not the returned URL.
198
+ A 12-codepoint cap keeps `blogspot.com` (12) and drops
199
+ `https://example.com` (19), where the input span and `url` happen to
200
+ be identical. The difference shows up for a scheme-less
201
+ `example.com` (11 input cp, 19 in `url`): the cap of 12 keeps it.
202
+
203
+ Even though most of the API counts in terms of UTF-16, like the String
204
+ class, maxLength uses codepoints. The reason is that most of the API
205
+ is for software developers, but maxLength is for end-users, and
206
+ codepoints are closer to what end-users see. "☺" and "😀" are both one
207
+ codepoint long. (This isn't quite perfect: "é" may be either one or
208
+ two codepoints long and "🇳🇴" is two. Life is hard.)
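The difference between the two ways of counting is easy to check in plain JavaScript. The sketch below does not call the package; `codepoints` and `fitsCap` are hypothetical helpers that mimic a cap measured on the matched input span.

```javascript
// Spreading a string iterates codepoint by codepoint, while .length counts
// UTF-16 code units.
const codepoints = (s) => [...s].length;

const samples = [
  '\u263a',               // ☺
  '\u{1F600}',            // 😀
  '\u00e9',               // é, precomposed
  'e\u0301',              // é, e plus combining acute accent
  '\u{1F1F3}\u{1F1F4}',   // 🇳🇴, two regional indicators
];
// Pairs of [UTF-16 length, codepoint length] for each sample.
const counts = samples.map((s) => [s.length, codepoints(s)]);

// Hypothetical cap check on the matched input span, not on the returned url.
function fitsCap(text, indices, cap) {
  const [start, end] = indices;
  return codepoints(text.slice(start, end)) <= cap;
}
```

For example, `fitsCap('see example.com here', [4, 15], 12)` is true because the span is 11 codepoints, even though the corresponding `url` would be 19.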
209
+
210
+ **`mailto:` is absorbed into `indices`.** Per UTS58 5.2, the input
211
+ `mailto:abcd@example.com` returns an entity whose span covers the
212
+ whole 23-codepoint run, not just the address. The `email` field still
213
+ holds the bare address. If your link-rendering code assumes the span
214
+ starts at the local-part, mailto inputs will surprise it.
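If rendering code does need the span of the bare address, one option is to trim the prefix from the front of the span. This is a sketch under assumptions: `addressSpan` is a hypothetical helper, the entity shape follows the README example, and the prefix is matched case-insensitively, which the source does not state.

```javascript
// Hypothetical helper: narrow an email entity's span to the bare address.
function addressSpan(text, entity) {
  const [start, end] = entity.indices;
  const matched = text.slice(start, end);
  // Assumption: an absorbed prefix is literally "mailto:" in any letter case.
  const prefixLength = /^mailto:/i.test(matched) ? 'mailto:'.length : 0;
  return [start + prefixLength, end];
}

const input = 'mailto:abcd@example.com';
const span = addressSpan(input, {
  email: 'abcd@example.com',
  url: 'mailto:abcd@example.com',
  indices: [0, 23],
});
```

Slicing `input` by the narrowed span gives back `abcd@example.com`.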
215
+
216
+ ## What's not here
217
+
218
+ - **Link validation.** Recognised URLs are not fetched or normalised
219
+ beyond IDN decoding, and their hostnames are not checked in the DNS.
220
+ There is no attempt at checking for possible attacks (`Мышκин.рф` and
221
+ `Мышкин.рф` are both detected; note the Greek kappa in the middle of
222
+ the prince's name in the first).
223
+
224
+ ## Regenerating the generated tables
225
+
226
+ `src/constants.js` (the link-termination and bracket tables) is packed from
227
+ the Ruby reference's `constants.rb`:
228
+
229
+ ```sh
230
+ npm run maketables
231
+ ```
232
+
233
+ That reads `../ruby/lib/uts58/constants.rb` by default; pass a path to point
234
+ it elsewhere. `constants.rb` is itself generated on the Ruby side, where
235
+ `tools/maketables.rb` downloads the UTS58 data files (`LinkTerm.txt` and
236
+ `LinkBracket.txt`) straight from unicode.org — so the source of truth is
237
+ Unicode's published data, not a copy kept in this repo.
238
+
239
+ `src/tlds-iana.js` and `src/suffixes-psl.js` are the public-suffix tables.
240
+ With no arguments they're fetched from their canonical sources
241
+ ([IANA](https://data.iana.org/TLD/tlds-alpha-by-domain.txt),
242
+ [PSL](https://publicsuffix.org/list/public_suffix_list.dat)); pass local
243
+ paths to regenerate offline:
244
+
245
+ ```sh
246
+ npm run maketlds
247
+ npm run maketlds -- /path/to/tlds-alpha-by-domain.txt /path/to/public_suffix_list.dat
248
+ ```
249
+
250
+ ## License
251
+
252
+ BSD-2-Clause. See `LICENSE`.
253
+
254
+ FWIW, I wrote this as part of my work at ICANN and will maintain it as
255
+ part of the same work. (I resolve problems relating to Unicode in
256
+ domains, email addresses and similar, so more people, more
257
+ communities, can use the internet in the way they prefer.)
package/package.json ADDED
@@ -0,0 +1,52 @@
1
+ {
2
+ "name": "@agulbra/uts58",
3
+ "version": "0.2.3",
4
+ "description": "UTS #58 web-link extraction for JavaScript.",
5
+ "type": "module",
6
+ "main": "src/index.js",
7
+ "types": "src/index.d.ts",
8
+ "exports": {
9
+ ".": {
10
+ "types": "./src/index.d.ts",
11
+ "default": "./src/index.js"
12
+ },
13
+ "./iana": {
14
+ "types": "./src/index.d.ts",
15
+ "default": "./src/index-iana.js"
16
+ },
17
+ "./core": {
18
+ "types": "./src/core.d.ts",
19
+ "default": "./src/core.js"
20
+ }
21
+ },
22
+ "files": ["src"],
23
+ "sideEffects": false,
24
+ "engines": {
25
+ "node": ">=18"
26
+ },
27
+ "scripts": {
28
+ "test": "node --test test/*.test.js",
29
+ "maketables": "node tools/maketables.js",
30
+ "maketlds": "node tools/maketlds.js",
31
+ "prepack": "cp ../LICENSE LICENSE",
32
+ "postpack": "rm -f LICENSE"
33
+ },
34
+ "keywords": ["uts58", "url", "extractor", "idn", "twitter-text"],
35
+ "author": "Arnt Gulbrandsen <arnt@gulbrandsen.priv.no>",
36
+ "license": "BSD-2-Clause",
37
+ "publishConfig": {
38
+ "access": "public"
39
+ },
40
+ "repository": {
41
+ "type": "git",
42
+ "url": "git+https://github.com/arnt/uts58.git",
43
+ "directory": "javascript"
44
+ },
45
+ "bugs": {
46
+ "url": "https://github.com/arnt/uts58/issues"
47
+ },
48
+ "homepage": "https://github.com/arnt/uts58#readme",
49
+ "dependencies": {
50
+ "punycode": "^2.3.1"
51
+ }
52
+ }
package/src/api.js ADDED
@@ -0,0 +1,37 @@
1
+ // The six module-level convenience functions, built over one shared
2
+ // Extractor. Factored out so each entry point (default/PSL, /iana, and a
3
+ // caller's own /core instance) gets the identical wrappers without copying
4
+ // them. Each wrapper drops overlapping candidates, keeping the
5
+ // earlier-starting one.
6
+
7
+ export function makeApi(shared) {
8
+ const extractUrlsWithIndices = (text) =>
9
+ shared.removeOverlappingEntities(shared.extractUrlsWithIndices(text));
10
+
11
+ const extractUrls = (text) =>
12
+ extractUrlsWithIndices(text).map((r) => r.url);
13
+
14
+ const extractEmailAddressesWithIndices = (text) =>
15
+ shared.removeOverlappingEntities(shared.extractEmailAddressesWithIndices(text));
16
+
17
+ const extractEmailAddresses = (text) =>
18
+ extractEmailAddressesWithIndices(text).map((r) => r.email);
19
+
20
+ const extractEntitiesWithIndices = (text) =>
21
+ shared.removeOverlappingEntities([
22
+ ...shared.extractUrlsWithIndices(text),
23
+ ...shared.extractEmailAddressesWithIndices(text),
24
+ ]);
25
+
26
+ const extractEntities = (text) =>
27
+ extractEntitiesWithIndices(text).map((e) => e.url);
28
+
29
+ return {
30
+ extractUrlsWithIndices,
31
+ extractUrls,
32
+ extractEmailAddressesWithIndices,
33
+ extractEmailAddresses,
34
+ extractEntitiesWithIndices,
35
+ extractEntities,
36
+ };
37
+ }
@@ -0,0 +1,103 @@
1
+ // Generated by tools/maketables.js from the Ruby reference's constants.rb.
2
+ // Do not edit by hand; rerun the generator instead.
3
+
4
+ export const OPENERS = new Map([
5
+ [0x29, 0x28],
6
+ [0x3e, 0x3c],
7
+ [0x5d, 0x5b],
8
+ [0x7d, 0x7b],
9
+ [0xf3b, 0xf3a],
10
+ [0xf3d, 0xf3c],
11
+ [0x169c, 0x169b],
12
+ [0x2046, 0x2045],
13
+ [0x207e, 0x207d],
14
+ [0x208e, 0x208d],
15
+ [0x2309, 0x2308],
16
+ [0x230b, 0x230a],
17
+ [0x2769, 0x2768],
18
+ [0x276b, 0x276a],
19
+ [0x276d, 0x276c],
20
+ [0x276f, 0x276e],
21
+ [0x2771, 0x2770],
22
+ [0x2773, 0x2772],
23
+ [0x2775, 0x2774],
24
+ [0x27c6, 0x27c5],
25
+ [0x27e7, 0x27e6],
26
+ [0x27e9, 0x27e8],
27
+ [0x27eb, 0x27ea],
28
+ [0x27ed, 0x27ec],
29
+ [0x27ef, 0x27ee],
30
+ [0x2984, 0x2983],
31
+ [0x2986, 0x2985],
32
+ [0x2988, 0x2987],
33
+ [0x298a, 0x2989],
34
+ [0x298c, 0x298b],
35
+ [0x298e, 0x298f],
36
+ [0x2990, 0x298d],
37
+ [0x2992, 0x2991],
38
+ [0x2994, 0x2993],
39
+ [0x2996, 0x2995],
40
+ [0x2998, 0x2997],
41
+ [0x29d9, 0x29d8],
42
+ [0x29db, 0x29da],
43
+ [0x29fd, 0x29fc],
44
+ [0x2e23, 0x2e22],
45
+ [0x2e25, 0x2e24],
46
+ [0x2e27, 0x2e26],
47
+ [0x2e29, 0x2e28],
48
+ [0x2e56, 0x2e55],
49
+ [0x2e58, 0x2e57],
50
+ [0x2e5a, 0x2e59],
51
+ [0x2e5c, 0x2e5b],
52
+ [0x3009, 0x3008],
53
+ [0x300b, 0x300a],
54
+ [0x300d, 0x300c],
55
+ [0x300f, 0x300e],
56
+ [0x3011, 0x3010],
57
+ [0x3015, 0x3014],
58
+ [0x3017, 0x3016],
59
+ [0x3019, 0x3018],
60
+ [0x301b, 0x301a],
61
+ [0xfe5a, 0xfe59],
62
+ [0xfe5c, 0xfe5b],
63
+ [0xfe5e, 0xfe5d],
64
+ [0xff09, 0xff08],
65
+ [0xff3d, 0xff3b],
66
+ [0xff5d, 0xff5b],
67
+ [0xff60, 0xff5f],
68
+ [0xff63, 0xff62],
69
+ ]);
70
+
71
+ const KIND_INCLUDE = 0, KIND_SOFT = 1, KIND_OPEN = 2, KIND_CLOSE = 3;
72
+ export const TERMINATION_KIND = Object.freeze({
73
+ include: KIND_INCLUDE, soft: KIND_SOFT, open: KIND_OPEN, close: KIND_CLOSE,
74
+ });
75
+
76
+ const _startsB64 = "IQAAACMAAAAnAAAAKAAAACkAAAAqAAAALAAAAC0AAAAuAAAALwAAADoAAAA8AAAAPQAAAD4AAAA/AAAAQAAAAFsAAABcAAAAXQAAAF4AAAB7AAAAfAAAAH0AAAB+AAAAoQAAAKsAAACsAAAAuwAAALwAAABKAQAAegMAAH4DAAB/AwAAhAMAAIcDAACIAwAAjAMAAI4DAACjAwAAMQUAAFkFAACJBQAAigUAAI0FAACRBQAAwwUAAMQFAADQBQAA7wUAAAAGAAAMBgAADQYAABsGAAAcBgAAHQYAACAGAAB0BgAA1AYAANUGAAAABwAACwcAAAwHAAANBwAADwcAAE0HAADABwAA+AcAAPoHAAD9BwAAMAgAADYIAAA3CAAAQAgAAF4IAABgCAAAcAgAAJcIAABkCQAAZgkAAIUJAACPCQAAkwkAAKoJAACyCQAAtgkAALwJAADHCQAAywkAANcJAADcCQAA3wkAAOYJAAABCgAABQoAAA8KAAATCgAAKgoAADIKAAA1CgAAOAoAADwKAAA+CgAARwoAAEsKAABRCgAAWQoAAF4KAABmCgAAgQoAAIUKAACPCgAAkwoAAKoKAACyCgAAtQoAALwKAADHCgAAywoAANAKAADgCgAA5goAAPkKAAABCwAABQsAAA8LAAATCwAAKgsAADILAAA1CwAAPAsAAEcLAABLCwAAVQsAAFwLAABfCwAAZgsAAIILAACFCwAAjgsAAJILAACZCwAAnAsAAJ4LAACjCwAAqAsAAK4LAAC+CwAAxgsAAMoLAADQCwAA1wsAAOYLAAAADAAADgwAABIMAAAqDAAAPAwAAEYMAABKDAAAVQwAAFgMAABcDAAAYAwAAGYMAAB3DAAAjgwAAJIMAACqDAAAtQwAALwMAADGDAAAygwAANUMAADcDAAA4AwAAOYMAADxDAAAAA0AAA4NAAASDQAARg0AAEoNAABUDQAAZg0AAIENAACFDQAAmg0AALMNAAC9DQAAwA0AAMoNAADPDQAA1g0AANgNAADmDQAA8g0AAAEOAAA/DgAAWg4AAIEOAACEDgAAhg4AAIwOAAClDgAApw4AAMAOAADGDgAAyA4AANAOAADcDgAAAA8AAAgPAAAJDwAADQ8AABMPAAA6DwAAOw8AADwPAAA9DwAAPg8AAEkPAABxDwAAeA8AAHoPAACZDwAAvg8AAM4PAAAAEAAAShAAAEwQAADHEAAAzRAAANAQAABKEgAAUBIAAFgSAABaEgAAYBIAAIoSAACQEgAAshIAALgSAADAEgAAwhIAAMgSAADYEgAAEhMAABgTAABdEwAAYRMAAGkTAACAEwAAoBMAAPgTAAAAFAAAbhYAAG8WAACBFgAAmxYAAJwWAACgFgAA6xYAAO4WAAAAFwAAHxcAADUXAABAFwAAYBcAAG4XAAByFwAAgBcAAKUXAADUFwAA1xcAANoXAADbFwAA4BcAAPAXAAAAGAAAAhgAAAYYAAAIGAAAChgAACAYAACAGAAAsBgAAAAZAAAgGQAAMBkAAEAZAABEGQAARhkAAHAZAACAGQAAsBkAANAZAADeGQAAHhoAAGAaAAB/GgAAkBoAAKAaAACoGgAArBoAALAaAADgGgAAABsAAE4bAABQGwAAWhsAAFwbAABdGwAAYBsAAH0bAACAGwAA/BsAADscAABAHAAATRwAAH4cAACAHAAAkBwAAL0cAADQHAAAAB0AABgfAAAgHwAASB8AAFAfAABZHwAAWx8AAF0fAABfHwAAgB8AALYfAADGHwAA1h8AAN0fAADyHwAA9h8AAAsgAAAYIAAAGiAAABsgAAAeIAAAHyAAACAgAAAkIAAAJSAAACogAAAwIAAAOSAAADsgAAA8IAAAPiAAAEUgAABGIAAARyAAAEogAABgIAAAZiAAAHAgAAB0IAAAfSAAAH4gAAB/IAAAjSAAAI4gAACQIAAAoCAAANAgAA
AAIQAAkCEAAAgjAAAJIwAACiMAAAsjAAAMIwAAKyMAAEAkAABgJAAAWycAAGEnAABoJwAAaScAAGonAABrJwAAbCcAAG0nAABuJwAAbycAAHAnAABxJwAAcicAAHMnAAB0JwAAdScAAHYnAADFJwAAxicAAMcnAADmJwAA5ycAAOgnAADpJwAA6icAAOsnAADsJwAA7ScAAO4nAADvJwAA8CcAAIMpAACEKQAAhSkAAIYpAACHKQAAiCkAAIkpAACKKQAAiykAAIwpAACNKQAAjikAAI8pAACQKQAAkSkAAJIpAACTKQAAlCkAAJUpAACWKQAAlykAAJgpAACZKQAA2CkAANkpAADaKQAA2ykAANwpAAD8KQAA/SkAAP4pAAB2KwAA+SwAAPwsAAAnLQAALS0AADAtAABvLQAAfy0AAKAtAACoLQAAsC0AALgtAADALQAAyC0AANAtAADYLQAA4C0AAAAuAAAOLgAAHC4AAB4uAAAgLgAAIi4AACMuAAAkLgAAJS4AACYuAAAnLgAAKC4AACkuAAAqLgAALi4AAC8uAAA8LgAAPS4AAEEuAABCLgAATC4AAE0uAABOLgAAUC4AAFMuAABVLgAAVi4AAFcuAABYLgAAWS4AAFouAABbLgAAXC4AAF0uAACALgAAmy4AAAAvAADwLwAAATAAAAMwAAAIMAAACTAAAAowAAALMAAADDAAAA0wAAAOMAAADzAAABAwAAARMAAAEjAAABQwAAAVMAAAFjAAABcwAAAYMAAAGTAAABowAAAbMAAAHDAAAEEwAACZMAAABTEAADExAACQMQAA7zEAACAyAACQpAAA0KQAAP6kAAAApQAADaYAABCmAABApgAA86YAAACnAADxpwAAMKgAAECoAAB2qAAAgKgAAM6oAADQqAAA4KgAAC+pAAAwqQAAX6kAAICpAADHqQAAyqkAAM+pAADeqQAAAKoAAECqAABQqgAAXKoAAF2qAABgqgAA26oAAN+qAADgqgAA8KoAAPKqAAABqwAACasAABGrAAAgqwAAKKsAADCrAABwqwAA66sAAOyrAADwqwAAAKwAALDXAADL1wAAAPkAAHD6AAAA+wAAE/sAAB37AAA4+wAAPvsAAED7AABD+wAARvsAAPD9AAAS/gAAE/4AABX+AAAX/gAAIP4AAFD+AABU/gAAWP4AAFn+AABa/gAAW/4AAFz+AABd/gAAXv4AAF/+AABo/gAAcP4AAHb+AAD//gAAAf8AAAL/AAAI/wAACf8AAAr/AAAM/wAADf8AAA7/AAAP/wAAGv8AABz/AAAf/wAAIP8AADv/AAA8/wAAPf8AAD7/AABb/wAAXP8AAF3/AABe/wAAX/8AAGD/AABh/wAAYv8AAGP/AABk/wAAZf8AAML/AADK/wAA0v8AANr/AADg/wAA6P8AAPn/AAAAAAEADQABACgAAQA8AAEAPwABAFAAAQCAAAEAAAEBAAcBAQA3AQEAkAEBAKABAQDQAQEAgAIBAKACAQDgAgEAAAMBAC0DAQBQAwEAgAMBAJ8DAQCgAwEAyAMBANADAQDRAwEAAAQBAKAEAQCwBAEA2AQBAAAFAQAwBQEAbwUBAHwFAQCMBQEAlAUBAJcFAQCjBQEAswUBALsFAQDABQEAAAYBAEAHAQBgBwEAgAcBAIcHAQCyBwEAAAgBAAgIAQAKCAEANwgBADwIAQA/CAEAVwgBAFgIAQCnCAEA4AgBAPQIAQD7CAEAHwkBACAJAQA/CQEAgAkBALwJAQDSCQEABQoBAAwKAQAVCgEAGQoBADgKAQA/CgEAUAoBAFYKAQBYCgEAYAoBAMAKAQDrCgEA8AoBAPYKAQAACwEAOQsBADoLAQBACwEAWAsBAHgLAQCZCwEAqQsBAAAMAQCADAEAwAwBAPoMAQAwDQEAQA0BAGkNAQCODQEAYA4BAIAOAQCrDgEAsA4BAMIOAQDQDgEA+g4BADAPAQBVDwEAcA8BAIYPAQ
CwDwEA4A8BAAAQAQBHEAEAUhABAH8QAQC+EAEAwhABAM0QAQDQEAEA8BABAAARAQA2EQEAQREBAEQRAQBQEQEAgBEBAMURAQDHEQEAzREBAM4RAQDeEQEA4REBAAASAQATEgEAOBIBAD0SAQCAEgEAiBIBAIoSAQCPEgEAnxIBAKkSAQCwEgEA8BIBAAATAQAFEwEADxMBABMTAQAqEwEAMhMBADUTAQA7EwEARxMBAEsTAQBQEwEAVxMBAF0TAQBmEwEAcBMBAIATAQCLEwEAjhMBAJATAQC3EwEAwhMBAMUTAQDHEwEAzBMBANQTAQDXEwEA4RMBAAAUAQBLFAEAThQBAFoUAQBdFAEAgBQBANAUAQCAFQEAuBUBAMIVAQDGFQEAyRUBANgVAQAAFgEAQRYBAEMWAQBQFgEAYBYBAIAWAQDAFgEA0BYBAAAXAQAdFwEAMBcBADwXAQA/FwEAABgBAKAYAQD/GAEACRkBAAwZAQAVGQEAGBkBADcZAQA7GQEARBkBAEUZAQBGGQEAUBkBAKAZAQCqGQEA2hkBAAAaAQBCGgEARBoBAFAaAQCbGgEAnRoBAKEaAQCwGgEAABsBAGAbAQDAGwEA8BsBAAAcAQAKHAEAOBwBAEEcAQBEHAEAUBwBAHAcAQBxHAEAchwBAJIcAQCpHAEAAB0BAAgdAQALHQEAOh0BADwdAQA/HQEAUB0BAGAdAQBnHQEAah0BAJAdAQCTHQEAoB0BALAdAQDgHQEA4B4BAPceAQAAHwEAEh8BAD4fAQBDHwEARR8BALAfAQDAHwEA/x8BAAAkAQBwJAEAgCQBAJAvAQAAMAEAYDQBAABEAQAAYQEAAGgBAEBqAQBgagEAbmoBAHBqAQDAagEA0GoBAPBqAQD1agEAAGsBADdrAQA6awEARGsBAEVrAQBQawEAW2sBAGNrAQB9awEAQG0BAG5tAQBwbQEAQG4BAJduAQCZbgEAoG4BALtuAQAAbwEAT28BAI9vAQDgbwEA8G8BAABwAQD/jAEAgI0BAPCvAQD1rwEA/a8BAACwAQAysQEAULEBAFWxAQBksQEAcLEBAAC8AQBwvAEAgLwBAJC8AQCcvAEAn7wBAKC8AQAAzAEAAM0BALrOAQDgzgEAAM8BADDPAQBQzwEAANABAADRAQAp0QEAANIBAMDSAQDg0gEAANMBAGDTAQAA1AEAVtQBAJ7UAQCi1AEApdQBAKnUAQCu1AEAu9QBAL3UAQDF1AEAB9UBAA3VAQAW1QEAHtUBADvVAQBA1QEARtUBAErVAQBS1QEAqNYBAM7XAQCH2gEAi9oBAJvaAQCh2gEAAN8BACXfAQAA4AEACOABABvgAQAj4AEAJuABADDgAQCP4AEAAOEBADDhAQBA4QEATuEBAJDiAQDA4gEA/+IBANDkAQDQ5QEA/+UBAMDmAQDg5gEA/uYBAODnAQDo5wEA7ecBAPDnAQAA6AEAx+gBAADpAQBQ6QEAXukBAHHsAQAB7QEAAO4BAAXuAQAh7gEAJO4BACfuAQAp7gEANO4BADnuAQA77gEAQu4BAEfuAQBJ7gEAS+4BAE3uAQBR7gEAVO4BAFfuAQBZ7gEAW+4BAF3uAQBf7gEAYe4BAGTuAQBn7gEAbO4BAHTuAQB57gEAfu4BAIDuAQCL7gEAoe4BAKXuAQCr7gEA8O4BAADwAQAw8AEAoPABALHwAQDB8AEA0fABAADxAQDm8QEAEPIBAEDyAQBQ8gEAYPIBAADzAQB29gEAefYBANz2AQDw9gEAAPcBAOD3AQDw9wEAAPgBABD4AQBQ+AEAYPgBAJD4AQCw+AEAwPgBAND4AQAA+QEAYPoBAHD6AQCA+gEAjvoBAMj6AQDN+gEA3/oBAO/6AQAA+wEAlPsBAAAAAgAApwIAILgCALDOAgDw6wIAAPgCAAAAAwBQEwMAIAAOAAABDgA=";
77
+ const _packedB64 = "iQAAAJgAAACdAAAAogAAAKcAAACsAAAAsQAAALQAAAC5AAAA5AAAAO0AAADyAAAA9AAAAPsAAAD9AAAAaAEAAG4BAABwAQAAdwEAAOgBAADuAQAA8AEAAPcBAAD4AQAAqAIAAK0CAADoAgAA7QIAACAFAADcDQAA9A0AAPkNAAD8DQAAGA4AAB0OAAAoDgAAMA4AAIQOAAC8FAAAWBUAACAWAAAlFgAAKBYAADwWAAAIFwAADRcAABwXAACoFwAA0BcAACwYAAAxGAAAaBgAAG0YAABwGAAAfRgAAMgZAABMGwAAURsAAPwbAAApHAAALBwAADEcAAA0HAAAKB0AAMQeAADcHwAA5R8AAOgfAAC0IAAA1SAAANggAAD5IAAAbCEAAHkhAACoIQAARCIAAIwlAACVJQAADCYAADAmAABAJgAAoCYAAMAmAADIJgAA5CYAABAnAAAgJwAAOCcAAFwnAAB0JwAAjCcAAPgnAAAMKAAAKCgAAEAoAACgKAAAwCgAAMwoAADYKAAA5CgAAPAoAAAIKQAAICkAADQpAABEKQAAcCkAAHgpAADYKQAADCoAADQqAABEKgAAoCoAAMAqAADMKgAA5CoAABQrAAAkKwAANCsAAEArAACMKwAAxCsAAPwrAAAMLAAAMCwAAEAsAACgLAAAwCwAAMwsAADkLAAAEC0AACAtAAA0LQAAXC0AAHQtAACMLQAA3C0AAAwuAAAoLgAAQC4AAFQuAABoLgAAcC4AAHwuAACQLgAAqC4AAOQuAAAILwAAIC8AADQvAABALwAAXC8AAOgvAAAwMAAAQDAAAKAwAADkMAAAEDEAACAxAAA0MQAAWDEAAGgxAAB0MQAAjDEAALwxAAAwMgAAQDIAAKAyAADMMgAA5DIAABAzAAAgMwAANDMAAFgzAAB4MwAAjDMAALwzAADMMwAAMDQAAEA0AAAQNQAAIDUAADw1AACMNQAA/DUAAAw2AABYNgAAxDYAAOw2AAD0NgAAGDcAACg3AABQNwAAWDcAAHw3AAC8NwAA0DcAAOg4AABkOQAAbTkAAAg6AAAQOgAAKDoAAIw6AACUOgAA9DoAABA7AAAYOwAAODsAAGQ7AAB8OwAAHDwAACE8AAAwPAAASTwAAOQ8AADqPAAA7zwAAPI8AAD3PAAAHD0AALA9AADYPQAA4D0AAFw+AADwPgAAMD8AAGg/AAAkQQAALUEAABRDAAAcQwAANEMAACBJAAA0SQAAWEkAAGBJAAB0SQAAIEoAADRKAADASgAA1EoAAPhKAAAASwAAFEsAAFhLAABATAAAVEwAAGhNAACATQAAoU0AAPBNAABkTgAA1E8AAPRPAAC0WQAAuVkAAPxZAABoWgAAbloAAHNaAACoWwAAtVsAAOBbAABUXAAA0FwAANlcAABMXQAAsF0AAMBdAADMXQAAiF4AAExfAABZXwAAZF8AAGlfAAB0XwAApF8AAORfAAAEYAAAFWAAABxgAAAlYAAAZGAAAOBhAACoYgAA1GMAAHhkAACsZAAA7GQAAABlAAAVZQAAtGUAANBlAACsZgAAJGcAAGhnAABsaAAAeGkAAPBpAAAkagAAZGoAAJxqAACtagAAtGoAAHRrAACsawAAMG0AAD1tAABkbQAAbW0AAHBtAAB9bQAA8G0AAP1tAADMbwAA3HAAAP1wAAAkcQAA9HEAAP1xAAAocgAA6HIAABxzAADocwAAVHwAAHR8AAAUfQAANH0AAFx9AABkfQAAbH0AAHR9AAD0fQAA0H4AABB/AABMfwAAbH8AALx/AADQfwAA+H8AAFyAAABlgAAAaIAAAHWAAAB4gAAAfYAAAIyAAACRgAAAnIAAALiAAADggAAA6YAAAOyAAAD1gAAAEIEAABaBAAAbgQAAJYEAAHiBAACQgQAApIEAAMSBAADwgQAA9oEAAPuBAAAwggAANoIAADuCAABwggAABIMAAMCDAA
AshgAAHIwAACKMAAAnjAAAKowAAC+MAACgjAAApJAAACiRAABonQAAgZ0AAJydAACinQAAp50AAKqdAACvnQAAsp0AALedAAC6nQAAv50AAMKdAADHnQAAyp0AAM+dAADSnQAA150AABCfAAAWnwAAG58AAJSfAACanwAAn58AAKKfAACnnwAAqp8AAK+fAACynwAAt58AALqfAAC/nwAACKYAAA6mAAATpgAAFqYAABumAAAepgAAI6YAACamAAArpgAALqYAADOmAAA2pgAAO6YAAD6mAABDpgAARqYAAEumAABOpgAAU6YAAFamAABbpgAAXqYAAGOmAABcpwAAYqcAAGenAABqpwAAb6cAAOynAADypwAA96cAAMytAADMswAA7bMAAJS0AACctAAAtLQAAJy1AADAtQAAWLYAAJi2AAC4tgAA2LYAAPi2AAAYtwAAOLcAAFi3AAB4twAA/LcAADW4AABsuAAAdbgAAHy4AACFuAAAirgAAI+4AACSuAAAl7gAAJq4AACfuAAAorgAAKe4AAC0uAAAubgAAOy4AADxuAAAALkAAAW5AAAsuQAAMbkAADS5AAA9uQAASLkAAFG5AABWuQAAW7kAAF65AABjuQAAZrkAAGu5AABuuQAAc7kAAHS5AABkugAAzLsAAFS/AAD8vwAACcAAABzAAAAiwAAAJ8AAACrAAAAvwAAAMsAAADfAAAA6wAAAP8AAAELAAABHwAAATMAAAFLAAABXwAAAWsAAAF/AAABiwAAAZ8AAAGrAAABvwAAA/MAAAFjCAAD8wwAAvMQAADjGAACUxwAAeMgAADCSAgAYkwIA9JMCAP2TAgAwmAIAPZgCAKyYAgDImwIA3ZsCAHCfAgCwoAIA5KACANShAgDdoQIAFKMCAD2jAgBkowIAuKQCAL2kAgBMpQIA8KUCABinAgAlpwIANKcCAGSnAgD4pwIA2KgCADSpAgBkqQIAcKkCAH2pAgAIqwIAeKsCAH2rAgC8qwIAxasCANirAgAYrAIAOKwCAFisAgCYrAIAuKwCAKytAgCorwIAra8CALSvAgDkrwIAjF4DABhfAwDsXwMAtOkDAGTrAwAY7AMAXOwDANjsAwDw7AMA+OwDAATtAwAQ7QMAPPcDAET4AwBJ+AMAUPgDAFn4AwBk+AMAPPkDAEn5AwBd+QMAYPkDAGb5AwBr+QMAbvkDAHP5AwB2+QMAe/kDAJj5AwCs+QMA0PkDAPD7AwD8+wMABfwDABz8AwAi/AMAJ/wDACz8AwAx/AMANPwDADn8AwBk/AMAbfwDAHj8AwB9/AMA6PwDAO78AwDw/AMA9/wDAGj9AwBu/QMAcP0DAHf9AwB4/QMAfv0DAIP9AwCF/QMAiv0DAI/9AwCR/QMA+P4DABz/AwA8/wMAXP8DAHD/AwCY/wMAuP8DAPT/AwAsAAQAmAAEAOgABAD0AAQANAEEAHQBBADoAwQACAQEAMwEBAA4BgQAcAYEAIAGBAD0BwQAcAoEAEALBADsCwQAjAwEACgNBADoDQQAdA4EAH0OBAAMDwQAPA8EAEEPBABUDwQAdBIEAKQSBABMEwQA7BMEAJwUBACMFQQA6BUEACgWBABIFgQAVBYEAIQWBADEFgQA5BYEAPAWBADMFwQA2BwEAFQdBACcHQQAFB4EAMAeBADoHgQAFCAEACAgBADUIAQA4CAEAPAgBABUIQQAXSEEAHgiBAC8IgQAyCMEANQjBABsJAQAfSQEAOQkBABkJQQA3CYEADwnBAAMKAQAGCgEAEwoBABcKAQA1CgEAOgoBAAgKQQAVCkEAF0pBABgKQQAfCoEAJgrBAC8KwQA1SsEANgrBADULAQA5CwEAP0sBABULQQAyC0EAEQuBABxLgQAvC4EACAxBADIMgQAyDMEAJw0BADkNAQAlDUEABQ2BAA8NgQA+DkEAKQ6BAC0OgQAxDoEABw7BABgOwQAnDwEAFA9BABlPQQAFD4EACU+BA
AsPwQA2D8EABhBBAA1QQQA1EEEAPRCBAAFQwQACEMEADRDBACgQwQA5EMEANBEBAAARQQADUUEABxFBADYRQQAEEcEABlHBAAwRwQANUcEAHRHBAB9RwQA0EcEAERIBADcSAQA8UgEAARJBAAYSgQAIEoEADRKBAB0SgQAoEoEAKVKBACoSwQA5EsEAAxMBAAwTAQAQEwEAKBMBADATAQAzEwEAORMBAAQTQQAIE0EADRNBABATQQAXE0EAIxNBACwTQQA0E0EACROBAAsTgQAOE4EANROBAAATwQACE8EABRPBAAoTwQATE8EAFVPBABgTwQAiE8EAChRBAA1UQQAZFEEAG1RBACEUQQAHFMEAGRTBADUVgQABFcEABVXBAAgVwQAXVcEAHRXBAAAWQQACVkEABBZBABkWQQAsFkEAORaBAAkWwQAjFsEAGhcBACsXAQA7FwEAPlcBAAYXQQA7GAEAMhjBAAYZAQAJGQEAExkBABYZAQA1GQEAOBkBAAMZQQAEWUEABRlBAAZZQQAZGUEAJxmBABcZwQAkGcEAARpBAANaQQAHGkEAGhqBABxagQAgGoEAIlqBADgawQAJGwEAJxtBACEbwQA5G8EACBwBADYcAQAAHEEAA1xBAAUcQQAsHEEAMBxBADFcQQAPHIEAJxyBADYcgQAGHQEACR0BADYdAQA6HQEAPR0BAAcdQQAZHUEAJR1BACgdQQAOHYEAER2BABgdgQApHYEAGx3BACkdwQA2HsEAOF7BABAfAQA6HwEAAh9BAARfQQAaH0EAMB+BADEfwQAZI4EALiRBADRkQQADJUEAMi/BABU0QQA6A8FABgZBQDkhAUA4KgFAHipBQCkqQUAvakFAPiqBQAkqwUAtKsFANCrBQDVqwUA2KwFAOWsBQAMrQUAEa0FABStBQBkrQUAhK0FANytBQA8rgUAtLUFAL21BQDktQUAWLoFAGG6BQBougUA4LoFAEy7BQAovQUAHL4FAHy+BQCQvwUA2L8FAFQzBgB4NAYAyDcGAMy/BgDsvwYA+L8GAIjEBgDIxAYASMUGAFTFBgCcxQYA7MsGAKjxBgDw8QYAIPIGAGTyBgB48gYAffIGAIzyBgDwMwcAzDoHAEA7BwDAOwcAtDwHABg9BwAMPwcA1EMHAJhEBwCoRwcAFEkHAExLBwDMSwcAWE0HAOBNBwBQUQcAcFIHAHxSBwCIUgcAmFIHALBSBwDkUgcA7FIHAAxTBwAUVAcAKFQHAFBUBwBwVAcA5FQHAPhUBwAQVQcAGFUHAEBVBwCUWgcALF8HABhqBwApagcALGoHAHxqBwC8agcAeHwHAKh8BwAYgAcAYIAHAISABwCQgAcAqIAHALSBBwA8ggcAsIQHAPSEBwAkhQcAPIUHALiKBwDkiwcA/IsHAOSTBwDolwcA/JcHAHibBwDUmwcA/JsHAJifBwCsnwcAuJ8HAPifBwAQowcAWKMHACylBwBkpQcAfKUHANCyBwD0tAcADLgHAHy4BwCIuAcAkLgHAJy4BwDIuAcA3LgHAOS4BwDsuAcACLkHABy5BwAkuQcALLkHADy5BwBIuQcAULkHAFy5BwBkuQcAbLkHAHS5BwB8uQcAiLkHAJC5BwCouQcAyLkHANy5BwDwuQcA+LkHACS6BwBsugcAjLoHAKS6BwDsugcAxLsHAKzABwBMwgcAuMIHAPzCBwA8wwcA1MMHALTGBwAIyAcA7MgHACDJBwBEyQcAlMkHANTZBwDh2QcAYNsHALDbBwDw2wcAZN8HAKzfBwDA3wcALOAHABzhBwBk4QcAHOIHALTiBwDs4gcABOMHAGDjBwBc6QcAtOkHAPDpBwAo6gcAGOsHACDrBwBw6wcAqOsHAODrBwBI7gcA6O8HAHybCgB04AoAtDoLAICvCwB0uQsAdOgLAChNDADk0QwA/AE4ALwHOAA=";
78
+
79
+ function _decode(b64) {
80
+ const bin = Buffer.from(b64, 'base64');
81
+ return new Uint32Array(bin.buffer, bin.byteOffset, bin.byteLength / 4);
82
+ }
83
+
84
+ const STARTS = _decode(_startsB64);
85
+ const PACKED = _decode(_packedB64);
86
+
87
+ // Returns one of KIND_INCLUDE/SOFT/OPEN/CLOSE for cp, or -1 when cp has no
88
+ // entry — which means Hard, the UTS58 @missing default. Binary search over a
89
+ // flat run-length encoding; generated from constants.rb so the answers match
90
+ // the Ruby reference implementation.
91
+ export function terminationKind(cp) {
92
+ let lo = 0, hi = STARTS.length - 1;
93
+ while (lo <= hi) {
94
+ const mid = (lo + hi) >>> 1;
95
+ const s = STARTS[mid];
96
+ const p = PACKED[mid];
97
+ const end = p >>> 2;
98
+ if (cp < s) hi = mid - 1;
99
+ else if (cp > end) lo = mid + 1;
100
+ else return p & 3;
101
+ }
102
+ return -1;
103
+ }
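The lookup above packs each run as a start codepoint in one array and `(end << 2) | kind` in the other. The same technique can be shown on a toy table; the four runs below are made up for illustration and are not the generated UTS58 data.

```javascript
// Toy data: [start, end, kind] runs, sorted by start and non-overlapping.
const KIND = { include: 0, soft: 1, open: 2, close: 3 };
const runs = [
  [0x28, 0x28, KIND.open],    // (
  [0x29, 0x29, KIND.close],   // )
  [0x2c, 0x2c, KIND.soft],    // ,
  [0x30, 0x39, KIND.include], // 0-9
];

// Two parallel arrays: run starts, and run ends with the kind in the low 2 bits.
const starts = Uint32Array.from(runs, ([s]) => s);
const packed = Uint32Array.from(runs, ([, e, k]) => (e << 2) | k);

// Binary search for the run containing cp; -1 means no run covers it.
function lookup(cp) {
  let lo = 0, hi = starts.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const end = packed[mid] >>> 2;
    if (cp < starts[mid]) hi = mid - 1;
    else if (cp > end) lo = mid + 1;
    else return packed[mid] & 3;
  }
  return -1;
}
```

Two bits are enough for the four kinds, and the largest codepoint (0x10FFFF) shifted left by two still fits in 32 bits, which is why a `Uint32Array` suffices.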
package/src/core.d.ts ADDED
@@ -0,0 +1,32 @@
1
+ // Types for 'uts58/core' — the extractor with no public-suffix table.
2
+
3
+ import type { UrlEntity, EmailEntity, Entity } from './index.js';
4
+
5
+ export {
6
+ Extractor,
7
+ type ExtractorOptions,
8
+ type UrlEntity,
9
+ type EmailEntity,
10
+ type Entity,
11
+ type Indices,
12
+ } from './index.js';
13
+
14
+ /** The six convenience wrappers bound to one extractor. */
15
+ export interface Api {
16
+ extractUrlsWithIndices(text: string): UrlEntity[];
17
+ extractUrls(text: string): string[];
18
+ extractEmailAddressesWithIndices(text: string): EmailEntity[];
19
+ extractEmailAddresses(text: string): string[];
20
+ extractEntitiesWithIndices(text: string): Entity[];
21
+ extractEntities(text: string): string[];
22
+ }
23
+
24
+ /** Build a host-plausibility predicate over a flat set of public suffixes
25
+ * (lower-cased, A-label folded). A host passes when some trailing run of its
26
+ * labels is in the set and at least one label sits in front of it. */
27
+ export declare function makeHostChecker(
28
+ suffixes: Set<string>,
29
+ ): (host: string) => boolean;
30
+
31
+ /** Build the six convenience functions over a shared extractor instance. */
32
+ export declare function makeApi(shared: import('./index.js').Extractor): Api;
package/src/core.js ADDED
@@ -0,0 +1,21 @@
1
+ // 'uts58/core': the extractor with no public-suffix table bundled. Bring
2
+ // your own host validator: either build one over a suffix set with
3
+ // makeHostChecker, or wrap a library you already depend on:
4
+ //
5
+ // import { Extractor } from 'uts58/core';
6
+ // import { parse } from 'tldts';
7
+ // const ex = new Extractor({
8
+ // isPlausibleHost: (h) => {
9
+ // const p = parse(h);
10
+ // return !!p.domain && p.isIcann && p.publicSuffix !== 'invalid';
11
+ // },
12
+ // });
13
+ // ex.extractUrls('see example.com here');
14
+ //
15
+ // With no `isPlausibleHost`, any host that clears the syntax rules (≥2
16
+ // labels, length, etc.) is accepted. This is useful when some later stage
17
+ // does the real validation, perhaps using an actual DNS lookup.
18
+
19
+ export { Extractor } from './extractor.js';
20
+ export { makeHostChecker } from './plausible.js';
21
+ export { makeApi } from './api.js';
@@ -0,0 +1,383 @@
1
+ // JavaScript port of the Ruby `Uts58::Extractor`. UTS58 link
2
+ // extraction; see https://github.com/arnt/uts58 for the Ruby
3
+ // implementation. Behavioural notes that differ from the Ruby are
4
+ // flagged inline; everything else is meant to match.
5
+
6
+ import punycode from 'punycode/punycode.js';
7
+ import {
8
+ OPENERS,
9
+ TERMINATION_KIND,
10
+ terminationKind,
11
+ } from './constants.js';
12
+
13
+ const PATH_CLOSERS = new Set([0x23, 0x2f, 0x3f]); // # / ?
14
+ const QUERY_CLOSERS = new Set([0x23]); // #
15
+ const FRAGMENT_CLOSERS = new Set();
16
+ const NO_SEPARATORS = new Set();
17
+ const QUERY_SEPARATORS = new Set([0x3d, 0x26]); // = &
18
+ const DIRECTIVE_SEPARATORS = new Set([0x2c, 0x3d, 0x26]); // , = &
19
+
20
+ // The Ruby `\p{Alnum}` class is letters + numbers only; JS doesn't
21
+ // ship that as a single property escape, so we spell it out.
22
+ const ALNUM = '\\p{L}\\p{N}';
23
+ const LNM = '\\p{L}\\p{N}\\p{M}';
24
+ const SEP_EXTRAS = '\\u00DF\\u03C2\\u06FD\\u06FE\\u0F0B\\u3007';
25
+
26
+ // The '@' in the negative lookbehind is non-obvious: UTS58 has no
27
+ // userinfo, so a host that immediately follows an '@' may be part of
28
+ // an email address (or perhaps a bluesky/mastodon address) but it cannot
29
+ // be an http(s) URL. The rest of the set just keeps a trigger from
30
+ // firing inside a word, number, or path.
31
+ const TRIGGER_RE = new RegExp(
32
+ `(?<![-${ALNUM}\\p{M}.\\/@])(?=[${ALNUM}][-${LNM}${SEP_EXTRAS}]*[.:。])`,
33
+ 'gu',
34
+ );
35
+
36
+ const SCHEME_RE = new RegExp(
37
+ '^([\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}\\p{Script=Thai}\\p{Script=Lao}\\p{Script=Khmer}\\p{Script=Myanmar}]*?)(https?:\\/\\/)',
38
+ 'iu',
39
+ );
40
+
41
+ const PREFIX_RE = new RegExp(
42
+ `^(?:[-${LNM}${SEP_EXTRAS}]+[.。]){1,4}[-${LNM}]+(?![-${LNM}])`,
43
+ 'u',
44
+ );
45
+
46
+ const PORT_RE = /^:(\d+)/;
47
+
48
+ // The widest slice we ever hand to an anchored host/scheme match. A host is
49
+ // rejected at 254 cp or more and a scheme is tiny, so nothing legitimate reaches
50
+ // this limit. This protects the server against malevolent input.
51
+ const MAX_HOST_SCAN_CP = 262;
52
+
53
+ // One local-part character. We walk left from an '@' testing this rather
54
+ // than slicing the whole preceding text and matching it anchored, which was
55
+ // O(prefix) per '@'. XID_Continue covers the letters/digits/marks UTS58
56
+ // allows; the punctuation set is the dot-atom-plus-extras from the spec.
57
+ const LP_CHAR = new RegExp("[\\p{XID_Continue}.!#$%&'*+\\-/=?^_`{|}~]", 'u');
58
+
59
+ // RFC5321 caps a local-part at 64 octets. If a candidate is longer,
60
+ // we reject the whole candidate, rather than assume that some suffix
61
+ // is an address.
62
+ const MAX_LOCALPART_CP = 64;
63
+
64
+ // Convert a UTF-16 string into an array of codepoint strings, so the
65
+ // rest of the pipeline can index by codepoint exactly like the Ruby
66
+ // version. Each element holds one full codepoint (1 or 2 UTF-16
67
+ // units).
68
+ function toCodepoints(s) {
69
+ return Array.from(s);
70
+ }
71
+
72
+ function cpSlice(cps, start, end) {
73
+ return cps.slice(start, end).join('');
74
+ }
75
+
76
+ // Codepoint index → UTF-16 offset, as a prefix sum of codepoint widths. The
77
+ // scan runs in codepoints to mirror the Ruby, but the public indices are
78
+ // UTF-16 so they drop straight into String.prototype.slice, the DOM, and
79
+ // editors. (Ruby keeps codepoint offsets, idiomatic for Ruby strings.)
80
+ function utf16Offsets(cps) {
81
+ const off = new Int32Array(cps.length + 1);
82
+ for (let c = 0; c < cps.length; c++) off[c + 1] = off[c] + cps[c].length;
83
+ return off;
84
+ }
85
+
86
+ // Returns true if a soft terminator at position i is followed only by
87
+ // more soft terminators and then a hard one (or end of input). Used
88
+ // to decide whether trailing punctuation belongs to the URL or to the
89
+ // surrounding prose.
90
+ function followedByHard(cps, i) {
91
+ let j = i;
92
+ while (j < cps.length) {
93
+ const k = terminationKind(cps[j].codePointAt(0));
94
+ if (k !== TERMINATION_KIND.soft) break;
95
+ j++;
96
+ }
97
+ if (j >= cps.length) return true;
98
+ return terminationKind(cps[j].codePointAt(0)) === -1; // -1 == Hard (default)
99
+ }
100
+
101
+ // Walks one path/query/fragment segment, honouring bracket pairing
102
+ // and the spec's soft/hard terminator distinction. Returns the
103
+ // codepoint index in `cps` at which the segment ends (i.e. the first
104
+ // codepoint not consumed); the caller passes that index back in as
105
+ // the new `start` for the next segment.
106
+ function skipComponent(cps, start, extraClosers, separators = NO_SEPARATORS, directiveSeparators = null) {
107
+ let openers = [];
108
+ let seps = separators;
109
+ for (let i = start; i < cps.length; i++) {
110
+ if (i === start) continue; // the lead-in character (e.g. '/', '?', '#')
111
+ const cp = cps[i].codePointAt(0);
112
+ if (extraClosers.has(cp)) return i;
113
+ // ':~:' begins a fragment text directive; its own separators take over
114
+ // and bracket pairing restarts (the directive is a fresh part).
115
+ if (directiveSeparators && cp === 0x3a && cps[i + 1] === '~' && cps[i + 2] === ':') {
116
+ openers = [];
117
+ seps = directiveSeparators;
118
+ i += 2;
119
+ continue;
120
+ }
121
+ // A separator ends one part of the component. UTS58 pairs brackets per
122
+ // part, so the open-bracket stack restarts here, but the link continues.
123
+ if (seps.has(cp)) {
124
+ openers = [];
125
+ continue;
126
+ }
127
+ const k = terminationKind(cp);
128
+ if (k === TERMINATION_KIND.include) continue; // part of the link
129
+ if (k === TERMINATION_KIND.soft) {
130
+ if (followedByHard(cps, i)) return i;
131
+ } else if (k === TERMINATION_KIND.close) {
132
+ const want = OPENERS.get(cp);
133
+ if (openers.length && openers[openers.length - 1] === want) {
134
+ openers.pop();
135
+ } else {
136
+ return i;
137
+ }
138
+ } else if (k === TERMINATION_KIND.open) {
139
+ openers.push(cp);
140
+ } else {
141
+ return i; // -1 (uncovered) == Hard, the UTS58 default: link ends here
142
+ }
143
+ }
144
+ return cps.length;
145
+ }
146
+
147
+ // IDN A-label → U-label decoding. punycode.toUnicode does the lookup
148
+ // per label, so '.' separation is enough; the caller has already
149
+ // translated the ideographic full stop '。' to '.'.
150
+ function idnToUnicode(host) {
151
+ try {
152
+ return punycode.toUnicode(host);
153
+ } catch {
154
+ return host;
155
+ }
156
+ }
157
+
158
+ // A label may not start or end with a hyphen (the LDH rule), so a host like
159
+ // -foo.example-.com is not a link at all. PREFIX_RE already rejects empty
160
+ // labels; xn-- A-labels pass, since they neither start nor end with '-'.
161
+ function validLabels(host) {
162
+ return host.split('.').every((label) =>
163
+ !label.startsWith('-') && !label.endsWith('-'));
164
+ }
165
+
166
+ export class Extractor {
167
+ // `isPlausibleHost(host)` decides whether a candidate hostname is
168
+ // real enough to linkify. Most users want to determine this
169
+ // locally, without performing e.g. a DNS lookup, but details
170
+ // vary. This is injected so the table (IANA, PSL, or a tldts-backed
171
+ // predicate the caller provides) is a bundle-time choice; the entry
172
+ // points in index.js / index-iana.js wire in a default.
173
+ //
174
+ // With no predicate, any host that matches earlier checks is
175
+ // accepted, which is the table-free 'core' behaviour.
176
+
177
+
178
+ constructor({ isPlausibleHost = () => true } = {}) {
179
+ this.isPlausibleHost = isPlausibleHost;
180
+ // Maximum allowed length of the matched text, in input codepoints,
181
+ // since for a human user, the lengths of ☺ and 😀 ought to be the
182
+ // same. null means no limit.
183
+ this.maxLength = null;
184
+ }
185
+
186
+ extractUrlsWithIndices(text) {
187
+ const cps = toCodepoints(text);
188
+ const u16 = utf16Offsets(cps); // codepoint index → UTF-16 offset for output
189
+ const joined = cps.join(''); // identical to text in content, but we
190
+ // also need a UTF-16 → codepoint index map.
191
+ // Build a map from UTF-16 offset to codepoint index. cpAt[u] is the
192
+ // number of codepoints starting strictly before UTF-16 offset u.
193
+ const cpAt = new Int32Array(joined.length + 1);
194
+ {
195
+ let cp = 0;
196
+ for (let u = 0; u < joined.length; u++) {
197
+ cpAt[u] = cp;
198
+ const code = joined.charCodeAt(u);
199
+ // a high surrogate consumes two UTF-16 units for one codepoint
200
+ if (code >= 0xd800 && code <= 0xdbff && u + 1 < joined.length) {
201
+ cpAt[u + 1] = cp;
202
+ u++;
203
+ }
204
+ cp++;
205
+ }
206
+ cpAt[joined.length] = cp;
207
+ }
208
+
209
+ const result = [];
210
+ TRIGGER_RE.lastIndex = 0;
211
+ let m;
212
+ while ((m = TRIGGER_RE.exec(joined)) !== null) {
213
+ // Zero-width lookahead match: advance lastIndex by one UTF-16
214
+ // unit (or one codepoint) so we don't loop forever.
215
+ const triggerU16 = m.index;
216
+ const triggerCp = cpAt[triggerU16];
217
+ {
218
+ const code = joined.charCodeAt(triggerU16);
219
+ TRIGGER_RE.lastIndex = triggerU16 +
220
+ (code >= 0xd800 && code <= 0xdbff ? 2 : 1);
221
+ }
222
+
223
+ // post_match: everything from the trigger onward, as a
224
+ // codepoint-indexed view. We'll work in codepoint indices and
225
+ // build the result substring from `cps`.
226
+ const postStart = triggerCp;
227
+
228
+ // Allow unspaced-script characters (Han, kana, Hangul, Thai, ...) between the
229
+ // trigger and a scheme like "http://". This is the string SCHEME_RE sees:
230
+ const postStr = cpSlice(cps, postStart,
231
+ Math.min(cps.length, postStart + MAX_HOST_SCAN_CP));
232
+ let schemeOffset = 0; // codepoints between trigger and the actual link start
233
+ let proto = 'https://';
234
+ let prefixStart = postStart;
235
+ const sm = SCHEME_RE.exec(postStr);
236
+ if (sm) {
237
+ schemeOffset = toCodepoints(sm[1]).length;
238
+ proto = sm[2];
239
+ prefixStart = postStart + schemeOffset + toCodepoints(sm[2]).length;
240
+ }
241
+
242
+ // Prefix: the candidate hostname.
243
+ const prefixStr = cpSlice(cps, prefixStart,
244
+ Math.min(cps.length, prefixStart + MAX_HOST_SCAN_CP));
245
+ const pm = PREFIX_RE.exec(prefixStr);
246
+ if (!pm) continue;
247
+ const prefixCpLen = toCodepoints(pm[0]).length;
248
+ if (prefixCpLen >= 254) continue;
249
+
250
+ const hostRaw = pm[0].replace(/。/g, '.');
251
+ if (!validLabels(hostRaw)) continue;
252
+ const hn = idnToUnicode(hostRaw);
253
+ if (!this.isPlausibleHost(hn)) continue;
254
+
255
+ // Walk an optional trailing root-label dot, the optional port, then any
256
+ // number of path segments, then an optional query, then an optional
257
+ // fragment.
258
+ let i = prefixStart + prefixCpLen;
259
+ // "example.com." keeps its trailing dot only when a path, query, or
260
+ // fragment follows; at the end of a sentence the dot is prose (UTS58).
261
+ if (cps[i] === '.' &&
262
+ (cps[i + 1] === '/' || cps[i + 1] === '?' || cps[i + 1] === '#')) {
263
+ i++;
264
+ }
265
+ const restStr = () => cpSlice(cps, i, Math.min(cps.length, i + 8));
266
+
267
+ const pmPort = PORT_RE.exec(restStr());
268
+ if (pmPort) {
269
+ const n = parseInt(pmPort[1], 10);
270
+ if (n < 1 || n > 65535) continue;
271
+ i += pmPort[0].length; // ASCII digits + colon, so length == cp count
272
+ }
273
+
274
+ while (cps[i] === '/') i = skipComponent(cps, i, PATH_CLOSERS);
275
+ if (cps[i] === '?') i = skipComponent(cps, i, QUERY_CLOSERS, QUERY_SEPARATORS);
276
+ if (cps[i] === '#') i = skipComponent(cps, i, FRAGMENT_CLOSERS, NO_SEPARATORS, DIRECTIVE_SEPARATORS);
277
+
278
+ const restLen = i - (prefixStart + prefixCpLen);
279
+ // Length of the matched span in the *input*, measured in
280
+ // codepoints from the start of the link (after any scheme
281
+ // offset) to wherever we stopped.
282
+ const matchLength = (i - postStart) - schemeOffset;
283
+ if (this.maxLength != null && matchLength > this.maxLength) continue;
284
+
285
+ const start = postStart + schemeOffset;
286
+ const tail = cpSlice(cps, prefixStart + prefixCpLen, prefixStart + prefixCpLen + restLen);
287
+ result.push({
288
+ url: `${proto}${hn}${tail}`,
289
+ indices: [u16[start], u16[start + matchLength]],
290
+ });
291
+ }
292
+ return result;
293
+ }
294
+
295
+ extractUrls(text) {
296
+ return this.extractUrlsWithIndices(text).map((r) => r.url);
297
+ }
298
+
299
+ // Returns every email address found in `text` as a list of objects:
300
+ //
301
+ // { email: String, url: String, indices: [start, end] }
302
+ //
303
+ // `email` is the bare address ("info@example.com"); `url` is the same
304
+ // thing as a mailto: URL, so the result drops straight into anything
305
+ // that already renders a `url` entity. Both carry the IDN-decoded
306
+ // domain. `indices` are UTF-16 offsets, `end` exclusive, and absorb a
307
+ // leading "mailto:" if the input had one (UTS58 5.2).
308
+ extractEmailAddressesWithIndices(text) {
309
+ const cps = toCodepoints(text);
310
+ const u16 = utf16Offsets(cps); // codepoint index → UTF-16 offset for output
311
+ const result = [];
312
+ for (let at = 0; at < cps.length; at++) {
313
+ if (cps[at] !== '@') continue;
314
+
315
+ // Walk left over the run of local-part characters. We stop one past
316
+ // the RFC 5321 limit and bail: a run longer than that means this isn't
317
+ // an address, not that its last 64 characters are one. Scanning
318
+ // (rather than slicing the whole prefix and matching anchored) is what
319
+ // keeps '@'-dense input from going O(n²).
320
+ let localStart = at;
321
+ while (localStart > 0 &&
322
+ LP_CHAR.test(cps[localStart - 1]) &&
323
+ at - localStart < MAX_LOCALPART_CP + 1) {
324
+ localStart--;
325
+ }
326
+ const localLen = at - localStart;
327
+ if (localLen === 0 || localLen > MAX_LOCALPART_CP) continue;
328
+ const local = cpSlice(cps, localStart, at);
329
+ if (local.startsWith('.') || local.endsWith('.') || local.includes('..')) {
330
+ continue;
331
+ }
332
+
333
+ const afterStr = cpSlice(cps, at + 1,
334
+ Math.min(cps.length, at + 1 + MAX_HOST_SCAN_CP));
335
+ const pm = PREFIX_RE.exec(afterStr);
336
+ if (!pm) continue;
337
+ const prefixCpLen = toCodepoints(pm[0]).length;
338
+ if (prefixCpLen >= 254) continue;
339
+
340
+ const hostRaw = pm[0].replace(/。/g, '.');
341
+ if (!validLabels(hostRaw)) continue;
342
+ const hn = idnToUnicode(hostRaw);
343
+ if (!this.isPlausibleHost(hn)) continue;
344
+
345
+ const endPos = at + 1 + prefixCpLen;
346
+ // UTS58 5.2 step 6: absorb a leading "mailto:" into the span.
347
+ if (localStart >= 7 &&
348
+ cpSlice(cps, localStart - 7, localStart).toLowerCase() === 'mailto:') {
349
+ localStart -= 7;
350
+ }
351
+ if (this.maxLength != null && (endPos - localStart) > this.maxLength) {
352
+ continue;
353
+ }
354
+ result.push({
355
+ email: `${local}@${hn}`,
356
+ url: `mailto:${local}@${hn}`,
357
+ indices: [u16[localStart], u16[endPos]],
358
+ });
359
+ }
360
+ return result;
361
+ }
362
+
363
+ extractEmailAddresses(text) {
364
+ return this.extractEmailAddressesWithIndices(text).map((r) => r.email);
365
+ }
366
+
367
+ // Sorts `entities` by start offset and drops any whose [start, end)
368
+ // overlaps the survivor before it. The earlier-starting entity wins;
369
+ // length plays no part, so a long candidate is dropped if it begins
370
+ // inside a shorter earlier one. Ties on start are broken by input
371
+ // order. Doesn't mutate `entities`.
372
+ removeOverlappingEntities(entities) {
373
+ const sorted = [...entities].sort((a, b) => a.indices[0] - b.indices[0]);
374
+ const out = [];
375
+ let prev = null;
376
+ for (const e of sorted) {
377
+ if (prev && prev.indices[1] > e.indices[0]) continue;
378
+ out.push(e);
379
+ prev = e;
380
+ }
381
+ return out;
382
+ }
383
+ }
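The extractor scans in codepoints but reports UTF-16 offsets, and the prefix-sum table is what connects the two. The sketch below repeats that one helper on a hand-picked string; the sample text and the codepoint positions `startCp` and `endCp` are assumptions chosen for illustration, not output from the extractor.

```javascript
// off[c] is the UTF-16 offset at which codepoint c starts; off[cps.length]
// is the total length in UTF-16 units.
function utf16Offsets(cps) {
  const off = new Int32Array(cps.length + 1);
  for (let c = 0; c < cps.length; c++) off[c + 1] = off[c] + cps[c].length;
  return off;
}

const text = 'a😀 example.com';
const cps = Array.from(text); // one element per codepoint
const off = utf16Offsets(cps);

// 'example.com' begins at codepoint 3, but the emoji takes two UTF-16 units,
// so its UTF-16 start is one higher.
const startCp = 3;
const endCp = cps.length;
const indices = [off[startCp], off[endCp]];

console.log(indices, text.slice(indices[0], indices[1]));
```

Because the table is built once per input, converting any codepoint position to a UTF-16 offset is a single array read.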
@@ -0,0 +1,27 @@
1
+ // 'uts58/iana': same API as the default entry, but hosts are checked only
2
+ // against the IANA root-zone TLD list (~5 KB gz vs ~27 KB for the PSL). The
3
+ // rightmost label must be a real TLD; that distinguishes blogspot.jp from
4
+ // blogspot.exe and rejects typos like example.cmo, while accepting a bare
5
+ // co.uk / kawasaki.jp as plausible. The smallest sensible table for the
6
+ // browser.
7
+
8
+ import { Extractor as CoreExtractor } from './extractor.js';
9
+ import { isPlausibleHost } from './tlds-iana.js';
10
+ import { makeApi } from './api.js';
11
+
12
+ export class Extractor extends CoreExtractor {
13
+ constructor(options = {}) {
14
+ super({ isPlausibleHost, ...options });
15
+ }
16
+ }
17
+
18
+ const _shared = new Extractor();
19
+
20
+ export const {
21
+ extractUrlsWithIndices,
22
+ extractUrls,
23
+ extractEmailAddressesWithIndices,
24
+ extractEmailAddresses,
25
+ extractEntitiesWithIndices,
26
+ extractEntities,
27
+ } = makeApi(_shared);
package/src/index.d.ts ADDED
@@ -0,0 +1,73 @@
1
+ // Type declarations for uts58 (the default and /iana entries; /core has its
2
+ // own core.d.ts). Hand-written to match the sources; keep them in sync by
3
+ // hand (there's no build step).
4
+
5
+ /** A `[start, end)` span in UTF-16 code units — the same units as
6
+ * `String.prototype.slice`, `String#length`, the DOM, and editors — so
7
+ * `text.slice(start, end)` returns the matched substring directly. (The Ruby
8
+ * port reports codepoint offsets instead, idiomatic for Ruby strings.) */
9
+ export type Indices = [start: number, end: number];
10
+
11
+ /** A web link found in the text. `url` carries the IDN-decoded host and a
12
+ * scheme (`https://` is prepended when the input had none). */
13
+ export interface UrlEntity {
14
+ url: string;
15
+ indices: Indices;
16
+ }
17
+
18
+ /** An email address found in the text. `email` is the bare address;
19
+ * `url` is the same address as a `mailto:` URL, so the result drops into
20
+ * anything that already renders a `url` entity. Both carry the IDN-decoded
21
+ * domain. */
22
+ export interface EmailEntity {
23
+ email: string;
24
+ url: string;
25
+ indices: Indices;
26
+ }
27
+
28
+ /** Either kind of entity, as returned by the combined extractors. Narrow on
29
+ * the presence of `email` to tell them apart. */
30
+ export type Entity = UrlEntity | EmailEntity;
31
+
32
+ export interface ExtractorOptions {
33
+ /** Decides whether a candidate hostname is real enough to linkify — the
34
+ * public-suffix check. The default and /iana entries inject one; on /core
35
+ * it defaults to accepting any host that clears the shape rules. */
36
+ isPlausibleHost?: (host: string) => boolean;
37
+ }
38
+
39
+ /** The underlying extractor. The module-level functions wrap a shared
40
+ * instance; construct your own to set `maxLength`, swap the host check, or
41
+ * get the raw, possibly-overlapping list before merging with other
42
+ * extractors. */
43
+ export declare class Extractor {
44
+ constructor(options?: ExtractorOptions);
45
+
46
+ /** Maximum length of a matched span, in input codepoints, so that
47
+ * "☺" and "😀" are equal in length. `null` (the default) means no
48
+ * limit. Candidates longer than this are dropped. */
49
+
50
+ maxLength: number | null;
51
+
52
+ isPlausibleHost: (host: string) => boolean;
53
+
54
+ extractUrlsWithIndices(text: string): UrlEntity[];
55
+ extractUrls(text: string): string[];
56
+ extractEmailAddressesWithIndices(text: string): EmailEntity[];
57
+ extractEmailAddresses(text: string): string[];
58
+
59
+ /** Drops every entity whose span overlaps an earlier one. Stable; the
60
+ * earlier-starting entity wins. Does not mutate the input. */
61
+ removeOverlappingEntities<T extends { indices: Indices }>(entities: T[]): T[];
62
+ }
63
+
64
+ export declare function extractUrlsWithIndices(text: string): UrlEntity[];
65
+ export declare function extractUrls(text: string): string[];
66
+ export declare function extractEmailAddressesWithIndices(text: string): EmailEntity[];
67
+ export declare function extractEmailAddresses(text: string): string[];
68
+
69
+ /** Both URLs and email addresses, sorted by start offset with overlaps
70
+ * removed — an email beats the bare domain that sits inside it after the
71
+ * `@`. Email addresses appear in their `mailto:` form under `url`. */
72
+ export declare function extractEntitiesWithIndices(text: string): Entity[];
73
+ export declare function extractEntities(text: string): string[];
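The overlap rule described in these declarations ("the earlier-starting entity wins") can be sketched on its own. `removeOverlaps` below is a standalone copy of the logic for illustration, and the two candidates are hand-written to show the rule rather than taken from a real extraction run.

```javascript
// Sort by start offset, then keep an entity only if it begins at or after
// the end of the last one kept. The input array is not mutated.
function removeOverlaps(entities) {
  const sorted = [...entities].sort((a, b) => a.indices[0] - b.indices[0]);
  const out = [];
  let prev = null;
  for (const e of sorted) {
    if (prev && prev.indices[1] > e.indices[0]) continue; // overlaps survivor
    out.push(e);
    prev = e;
  }
  return out;
}

const text = 'mail info@example.com now';

// Hypothetical candidates: a bare-domain URL sitting inside an email address.
const candidates = [
  { url: 'https://example.com', indices: [10, 21] },
  { email: 'info@example.com', url: 'mailto:info@example.com', indices: [5, 21] },
];

const kept = removeOverlaps(candidates);
console.log(kept.map((e) => e.url));

// Narrowing on the presence of `email` tells the two entity kinds apart.
const emails = kept.filter((e) => 'email' in e);
console.log(emails.length);
```

Length plays no part in the decision, so a longer candidate is still dropped if it starts inside a shorter one that began earlier.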
package/src/index.js ADDED
@@ -0,0 +1,34 @@
1
+ // Default entry point: UTS58 web-link extraction, with hosts validated
2
+ // against the Public Suffix List (ICANN section). The six functions are thin
3
+ // wrappers around a shared Extractor; they also drop overlapping candidates,
4
+ // keeping the earlier-starting one.
5
+ //
6
+ // import { extractUrls, extractUrlsWithIndices } from 'uts58';
7
+ // extractUrls("see example.com here") // ["https://example.com"]
8
+ // extractUrlsWithIndices("see example.com here") // [{ url: "https://example.com", indices: [4, 15] }]
9
+ //
10
+ // For a smaller table import 'uts58/iana' (root-zone TLDs only); to bring
11
+ // your own validator (e.g. tldts) import { Extractor } from 'uts58/core'.
12
+
13
+ import { Extractor as CoreExtractor } from './extractor.js';
14
+ import { isPlausibleHost } from './suffixes-psl.js';
15
+ import { makeApi } from './api.js';
16
+
17
+ // Extractor pre-wired with the PSL check, so `new Extractor()` from this
18
+ // entry validates the same way the module-level functions do.
19
+ export class Extractor extends CoreExtractor {
20
+ constructor(options = {}) {
21
+ super({ isPlausibleHost, ...options });
22
+ }
23
+ }
24
+
25
+ const _shared = new Extractor();
26
+
27
+ export const {
28
+ extractUrlsWithIndices,
29
+ extractUrls,
30
+ extractEmailAddressesWithIndices,
31
+ extractEmailAddresses,
32
+ extractEntitiesWithIndices,
33
+ extractEntities,
34
+ } = makeApi(_shared);
@@ -0,0 +1,29 @@
1
+ // Host-plausibility check over a flat set of public suffixes. The
2
+ // question this answers is "could this be a link?", not "what is the
3
+ // exact registrable domain?". Therefore a flat membership test is
4
+ // enough, and the table can be much smaller than if we needed PSL's
5
+ // full wildcard/exception machinery.
6
+
7
+ // The tables are pre-folded for plausibility; see tools/maketlds.js.
8
+
9
+ import punycode from 'punycode/punycode.js';
10
+
11
+ // Build a `(host) => boolean` predicate. A host is plausible when
12
+ // some trailing run of its labels is a known suffix AND has at least
13
+ // one label before the suffix.
14
+ export function makeHostChecker(suffixes) {
15
+ return (host) => {
16
+ const folded = host.normalize('NFC').toLowerCase();
17
+ let ascii;
18
+ try {
19
+ ascii = punycode.toASCII(folded);
20
+ } catch {
21
+ ascii = folded;
22
+ }
23
+ const labels = ascii.split('.');
24
+ for (let i = labels.length - 1; i >= 1; i--) {
25
+ if (suffixes.has(labels.slice(i).join('.'))) return true;
26
+ }
27
+ return false;
28
+ };
29
+ }
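The trailing-suffix test is easy to try in isolation. The sketch below is a simplified stand-in, assuming ASCII-only hosts: `makeSimpleHostChecker` is a hypothetical name, it skips the punycode A-label folding that the real function performs, and the two-entry suffix set is made up for the example.

```javascript
// Simplified variant of the checker: lower-case the host, then test every
// trailing run of labels that leaves at least one label in front of it.
function makeSimpleHostChecker(suffixes) {
  return (host) => {
    const labels = host.normalize('NFC').toLowerCase().split('.');
    for (let i = labels.length - 1; i >= 1; i--) {
      if (suffixes.has(labels.slice(i).join('.'))) return true;
    }
    return false;
  };
}

// Toy suffix set, not the PSL or IANA table.
const check = makeSimpleHostChecker(new Set(['com', 'co.za']));

console.log(check('example.com'));
console.log(check('Shop.Co.Za'));
console.log(check('com'));
console.log(check('example.cmo'));
```

A bare suffix such as `com` fails because the loop stops at index 1, which is what enforces "at least one label before the suffix".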
@@ -0,0 +1,9 @@
1
+ // Generated by tools/maketlds.js. Do not edit by hand; rerun the generator.
2
+ // Public Suffix List, ICANN section, folded for plausibility.
3
+ import { makeHostChecker } from './plausible.js';
4
+
5
+ const SUFFIXES = new Set(
6
+ "aaa aarp abb abbott abbvie abc able abogado abudhabi ac ac.za academy accenture accountant accountants aco actor ad ads adult ae aeg aero aetna af afl africa ag agakhan agency agric.za ai aig airbus airforce airtel akdn al alibaba alipay allfinanz allstate ally alsace alstom alt.za am amazon americanexpress americanfamily amex amfam amica amsterdam analytics android anquan anz ao aol apartments app apple aq aquarelle ar arab aramco archi army arpa art arte as asda asia associates at athleta attorney au auction audi audible audio auspost author auto autos aw aws ax axa az azure ba baby baidu banamex band bank bar barcelona barclaycard barclays barefoot bargains baseball basketball bauhaus bayern bb bbc bbt bbva bcg bcn bd be beats beauty beer berlin best bestbuy bet bf bg bh bharti bi bible bid bike bing bingo bio biz bj black blackfriday blockbuster blog bloomberg blue bm bms bmw bn bnpparibas bo boats boehringer bofa bom bond boo book booking bosch bostik boston bot boutique box br bradesco bridgestone broadway broker brother brussels bs bt build builders business buy buzz bv bw by bz bzh ca cab cafe cal call calvinklein cam camera camp canon capetown capital capitalone car caravan cards care career careers cars casa case cash casino cat catering catholic cba cbn cbre cc cd center ceo cern cf cfa cfd cg ch chanel channel charity chase chat cheap chintai christmas chrome church ci cipriani circle cisco citadel citi citic city ck cl claims cleaning click clinic clinique clothing cloud club clubmed cm cn co co.za coach codes coffee college cologne com commbank community company compare computer comsec condos construction consulting contact contractors cooking cool coop corsica country coupon coupons courses cpa cr credit creditcard creditunion cricket crown crs cruise cruises cu cuisinella cv cw cx cy cymru cyou cz dad dance data date dating datsun day dclk dds de deal dealer deals degree delivery dell deloitte delta democrat dental dentist desi design dev dhl 
diamonds diet digital direct directory discount discover dish diy dj dk dm dnp do docs doctor dog domains dot download drive dtv dubai dupont durban dvag dvr dz earth eat ec eco edeka edu edu.za education ee eg email emerck energy engineer engineering enterprises epson equipment er ericsson erni es esq estate et eu eurovision eus events exchange expert exposed express extraspace fage fail fairwinds faith family fan fans farm farmers fashion fast fedex feedback ferrari ferrero fi fidelity fido film final finance financial fire firestone firmdale fish fishing fit fitness fj fk flickr flights flir florist flowers fly fm fo foo food football ford forex forsale forum foundation fox fr free fresenius frl frogans frontier ftr fujitsu fun fund furniture futbol fyi ga gal gallery gallo gallup game games gap garden gay gb gbiz gd gdn ge gea gent genting george gf gg ggee gh gi gift gifts gives giving gl glass gle global globo gm gmail gmbh gmo gmx gn godaddy gold goldpoint golf goodyear goog google gop got gov gov.za gp gq gr grainger graphics gratis green gripe grocery grondar.za group gs gt gu gucci guge guide guitars guru gw gy hair hamburg hangout haus hbo hdfc hdfcbank health healthcare help helsinki here hermes hiphop hisamitsu hitachi hiv hk hkt hm hn hockey holdings holiday homedepot homegoods homes homesense honda horse hospital host hosting hot hotel hotels hotmail house how hr hsbc ht hu hughes hyatt hyundai ibm icbc ice icu id ie ieee ifm ikano il im imamat imdb immo immobilien in inc industries infiniti info ing ink institute insurance insure int international intuit investments io ipiranga iq ir irish is ismaili ist istanbul it itau itv jaguar java jcb je jeep jetzt jewelry jio jll jm jmp jnj jo jobs joburg jot joy jp jpmorgan jprs juegos juniper kaufen kddi ke kerryhotels kerryproperties kfh kg kh ki kia kids kim kindle kitchen kiwi km kn koeln komatsu kosher kp kpmg kpn kr krd kred kuokgroup kw ky kyoto kz la lacaixa lamborghini lamer land landrover lanxess 
lasalle lat latino latrobe law law.za lawyer lb lc lds lease leclerc lefrak legal lego lexus lgbt li lidl life lifeinsurance lifestyle lighting like lilly limited limo lincoln link live living lk llc llp loan loans locker locus lol london lotte lotto love lpl lplfinancial lr ls lt ltd ltda lu lundbeck luxe luxury lv ly ma madrid maif maison makeup man management mango map market marketing markets marriott marshalls mattel mba mc mckinsey md me med media meet melbourne meme memorial men menu merck merckmsd mg mh miami microsoft mil mil.za mini mint mit mitsubishi mk ml mlb mls mm mma mn mo mobi mobile moda moe moi mom monash money monster mormon mortgage moscow moto motorcycles mov movie mp mq mr ms msd mt mtn mtr mu museum music mv mw mx my mz na nab nagoya name navy nba nc ne nec net net.za netbank netflix network neustar new news next nextdirect nexus nf nfl ng ngo ngo.za nhk ni nic.za nico nike nikon ninja nis.za nissan nissay nl no nokia nom.za norton now nowruz nowtv np nr nra nrw ntt nu nyc nz obi observer office okinawa olayan olayangroup ollo om omega one ong onion onl online ooo open oracle orange org org.za organic origins osaka otsuka ott ovh pa page panasonic paris pars partners parts party pay pccw pe pet pf pfizer pg ph pharmacy phd philips phone photo photography photos physio pics pictet pictures pid pin ping pink pioneer pizza pk pl place play playstation plumbing plus pm pn pnc pohl poker politie porn post pr praxi press prime pro prod productions prof progressive promo properties property protection pru prudential ps pt pub pw pwc py qa qpon quebec quest racing radio re read realestate realtor realty recipes red redumbrella rehab reise reisen reit reliance ren rent rentals repair report republican rest restaurant review reviews rexroth rich richardli ricoh ril rio rip ro rocks rodeo rogers room rs rsvp ru rugby ruhr run rw rwe ryukyu sa saarland safe safety sakura sale salon samsclub samsung sandvik sandvikcoromant sanofi sap sarl sas save saxo 
sb sbi sbs sc scb schaeffler schmidt scholarships school school.za schule schwarz science scot sd se search seat secure security seek select sener services seven sew sex sexy sfr sg sh shangrila sharp shell shia shiksha shoes shop shopping shouji show si silk sina singles site sj sk ski skin sky skype sl sling sm smart smile sn sncf so soccer social softbank software sohu solar solutions song sony soy spa space sport spot sr srl ss st stada staples star statebank statefarm stc stcgroup stockholm storage store stream studio study style su sucks supplies supply support surf surgery suzuki sv swatch swiss sx sy sydney systems sz tab taipei talk taobao target tatamotors tatar tattoo tax taxi tc tci td tdk team tech technology tel temasek tennis teva tf tg th thd theater theatre tiaa tickets tienda tips tires tirol tj tjmaxx tjx tk tkmaxx tl tm tm.za tmall tn to today tokyo tools top toray toshiba total tours town toyota toys tr trade trading training travel travelers travelersinsurance trust trv tt tube tui tunes tushu tv tvs tw tz ua ubank ubs ug uk unicom university uno uol ups us uy uz va vacations vana vanguard vc ve vegas ventures verisign versicherung vet vg vi viajes video vig viking villas vin vip virgin visa vision viva vivo vlaanderen vn vodka volvo vote voting voto voyage vu wales walmart walter wang wanggou watch watches weather weatherchannel web.za webcam weber website wed wedding weibo weir wf whoswho wien wiki williamhill win windows wine winners wme woodside work works world wow ws wtc wtf xbox xerox xihuan xin xn--11b4c3d xn--1ck2e1b xn--1qqw23a xn--2scrj9c xn--30rr7y xn--3bst00m xn--3ds443g xn--3e0b707e xn--3hcrj9c xn--3pxu8k xn--42c2d9a xn--45br5cyl xn--45brj9c xn--45q11c xn--4dbrk0ce xn--4gbrim xn--54b7fta0cc xn--55qw42g xn--55qx5d xn--5su34j936bgsg xn--5tzm5g xn--6frz82g xn--6qq986b3xl xn--80adxhks xn--80ao21a xn--80aqecdr1a xn--80asehdb xn--80aswg xn--8y0a063a xn--90a3ac xn--90ae xn--90ais xn--9dbq2a xn--9et52u xn--9krt00a xn--b4w605ferd 
xn--bck1b9a5dre4c xn--c1avg xn--c2br7g xn--cck2b3b xn--cckwcxetd xn--cg4bki xn--clchc0ea0b2g2a9gcd xn--czr694b xn--czrs0t xn--czru2d xn--d1acj3b xn--d1alf xn--e1a4c xn--eckvdtc9d xn--efvy88h xn--fct429k xn--fhbei xn--fiq228c5hs xn--fiq64b xn--fiqs8s xn--fiqz9s xn--fjq720a xn--flw351e xn--fpcrj9c3d xn--fzc2c9e2c xn--fzys8d69uvgm xn--g2xx48c xn--gckr3f0f xn--gecrj9c xn--gk3at1e xn--h2breg3eve xn--h2brj9c xn--h2brj9c8c xn--hxt814e xn--i1b6b1a6a2e xn--imr513n xn--io0a7i xn--j1aef xn--j1amh xn--j6w193g xn--jlq480n2rg xn--jvr189m xn--kcrx77d1x4a xn--kprw13d xn--kpry57d xn--kput3i xn--l1acc xn--lgbbat1ad8j xn--mgb2ddes xn--mgb9awbf xn--mgba3a3ejt xn--mgba3a4f16a xn--mgba3a4fra xn--mgba7c0bbn0a xn--mgbaam7a8h xn--mgbab2bd xn--mgbah1a3hjkrd xn--mgbai9a5eva00b xn--mgbai9azgqp6j xn--mgbayh7gpa xn--mgbbh1a xn--mgbbh1a71e xn--mgbc0a9azcg xn--mgbca7dzdo xn--mgbcpq6gpa1a xn--mgberp4a5d4a87g xn--mgberp4a5d4ar xn--mgbgu82a xn--mgbi4ecexp xn--mgbpl2fh xn--mgbqly7c0a67fbc xn--mgbqly7cvafr xn--mgbt3dhd xn--mgbtf8fl xn--mgbtx2b xn--mgbx4cd0ab xn--mix082f xn--mix891f xn--mk1bu44c xn--mxtq1m xn--ngbc5azd xn--ngbe9e0a xn--ngbrx xn--nnx388a xn--node xn--nqv7f xn--nqv7fs00ema xn--nyqy26a xn--o3cw4h xn--ogbpf8fl xn--otu796d xn--p1acf xn--p1ai xn--pgbs0dh xn--pssy2u xn--q7ce6a xn--q9jyb4c xn--qcka1pmc xn--qxa6a xn--qxam xn--rhqv96g xn--rovu88b xn--rvc1e0am3e xn--s9brj9c xn--ses554g xn--t60b56a xn--tckwe xn--tiq49xqyj xn--unup4y xn--vermgensberater-ctb xn--vermgensberatung-pwb xn--vhquv xn--vuq861b xn--w4r85el8fhu5dnra xn--w4rs40l xn--wgbh1c xn--wgbl6a xn--xhq521b xn--xkc2al3hye2a xn--xkc2dl3a5ee0h xn--y9a3aq xn--yfro4i67o xn--ygbi2ammx xn--zfr164b xxx xyz yachts yahoo yamaxun yandex ye yodobashi yoga yokohama you youtube yt yun zappos zara zero zip zm zone zuerich zw".split(' '),
7
+ );
8
+
9
+ export const isPlausibleHost = makeHostChecker(SUFFIXES);
@@ -0,0 +1,9 @@
1
+ // Generated by tools/maketlds.js. Do not edit by hand; rerun the generator.
2
+ // IANA root-zone TLDs: the rightmost label must be a real TLD.
3
+ import { makeHostChecker } from './plausible.js';
4
+
5
+ const TLDS = new Set(
6
+ "aaa aarp abb abbott abbvie abc able abogado abudhabi ac academy accenture accountant accountants aco actor ad ads adult ae aeg aero aetna af afl africa ag agakhan agency ai aig airbus airforce airtel akdn al alibaba alipay allfinanz allstate ally alsace alstom am amazon americanexpress americanfamily amex amfam amica amsterdam analytics android anquan anz ao aol apartments app apple aq aquarelle ar arab aramco archi army arpa art arte as asda asia associates at athleta attorney au auction audi audible audio auspost author auto autos aw aws ax axa az azure ba baby baidu banamex band bank bar barcelona barclaycard barclays barefoot bargains baseball basketball bauhaus bayern bb bbc bbt bbva bcg bcn bd be beats beauty beer berlin best bestbuy bet bf bg bh bharti bi bible bid bike bing bingo bio biz bj black blackfriday blockbuster blog bloomberg blue bm bms bmw bn bnpparibas bo boats boehringer bofa bom bond boo book booking bosch bostik boston bot boutique box br bradesco bridgestone broadway broker brother brussels bs bt build builders business buy buzz bv bw by bz bzh ca cab cafe cal call calvinklein cam camera camp canon capetown capital capitalone car caravan cards care career careers cars casa case cash casino cat catering catholic cba cbn cbre cc cd center ceo cern cf cfa cfd cg ch chanel channel charity chase chat cheap chintai christmas chrome church ci cipriani circle cisco citadel citi citic city ck cl claims cleaning click clinic clinique clothing cloud club clubmed cm cn co coach codes coffee college cologne com commbank community company compare computer comsec condos construction consulting contact contractors cooking cool coop corsica country coupon coupons courses cpa cr credit creditcard creditunion cricket crown crs cruise cruises cu cuisinella cv cw cx cy cymru cyou cz dad dance data date dating datsun day dclk dds de deal dealer deals degree delivery dell deloitte delta democrat dental dentist desi design dev dhl diamonds diet digital direct 
directory discount discover dish diy dj dk dm dnp do docs doctor dog domains dot download drive dtv dubai dupont durban dvag dvr dz earth eat ec eco edeka edu education ee eg email emerck energy engineer engineering enterprises epson equipment er ericsson erni es esq estate et eu eurovision eus events exchange expert exposed express extraspace fage fail fairwinds faith family fan fans farm farmers fashion fast fedex feedback ferrari ferrero fi fidelity fido film final finance financial fire firestone firmdale fish fishing fit fitness fj fk flickr flights flir florist flowers fly fm fo foo food football ford forex forsale forum foundation fox fr free fresenius frl frogans frontier ftr fujitsu fun fund furniture futbol fyi ga gal gallery gallo gallup game games gap garden gay gb gbiz gd gdn ge gea gent genting george gf gg ggee gh gi gift gifts gives giving gl glass gle global globo gm gmail gmbh gmo gmx gn godaddy gold goldpoint golf goodyear goog google gop got gov gp gq gr grainger graphics gratis green gripe grocery group gs gt gu gucci guge guide guitars guru gw gy hair hamburg hangout haus hbo hdfc hdfcbank health healthcare help helsinki here hermes hiphop hisamitsu hitachi hiv hk hkt hm hn hockey holdings holiday homedepot homegoods homes homesense honda horse hospital host hosting hot hotels hotmail house how hr hsbc ht hu hughes hyatt hyundai ibm icbc ice icu id ie ieee ifm ikano il im imamat imdb immo immobilien in inc industries infiniti info ing ink institute insurance insure int international intuit investments io ipiranga iq ir irish is ismaili ist istanbul it itau itv jaguar java jcb je jeep jetzt jewelry jio jll jm jmp jnj jo jobs joburg jot joy jp jpmorgan jprs juegos juniper kaufen kddi ke kerryhotels kerryproperties kfh kg kh ki kia kids kim kindle kitchen kiwi km kn koeln komatsu kosher kp kpmg kpn kr krd kred kuokgroup kw ky kyoto kz la lacaixa lamborghini lamer land landrover lanxess lasalle lat latino latrobe law lawyer lb lc lds lease leclerc 
lefrak legal lego lexus lgbt li lidl life lifeinsurance lifestyle lighting like lilly limited limo lincoln link live living lk llc llp loan loans locker locus lol london lotte lotto love lpl lplfinancial lr ls lt ltd ltda lu lundbeck luxe luxury lv ly ma madrid maif maison makeup man management mango map market marketing markets marriott marshalls mattel mba mc mckinsey md me med media meet melbourne meme memorial men menu merck merckmsd mg mh miami microsoft mil mini mint mit mitsubishi mk ml mlb mls mm mma mn mo mobi mobile moda moe moi mom monash money monster mormon mortgage moscow moto motorcycles mov movie mp mq mr ms msd mt mtn mtr mu museum music mv mw mx my mz na nab nagoya name navy nba nc ne nec net netbank netflix network neustar new news next nextdirect nexus nf nfl ng ngo nhk ni nico nike nikon ninja nissan nissay nl no nokia norton now nowruz nowtv np nr nra nrw ntt nu nyc nz obi observer office okinawa olayan olayangroup ollo om omega one ong onl online ooo open oracle orange org organic origins osaka otsuka ott ovh pa page panasonic paris pars partners parts party pay pccw pe pet pf pfizer pg ph pharmacy phd philips phone photo photography photos physio pics pictet pictures pid pin ping pink pioneer pizza pk pl place play playstation plumbing plus pm pn pnc pohl poker politie porn post pr praxi press prime pro prod productions prof progressive promo properties property protection pru prudential ps pt pub pw pwc py qa qpon quebec quest racing radio re read realestate realtor realty recipes red redumbrella rehab reise reisen reit reliance ren rent rentals repair report republican rest restaurant review reviews rexroth rich richardli ricoh ril rio rip ro rocks rodeo rogers room rs rsvp ru rugby ruhr run rw rwe ryukyu sa saarland safe safety sakura sale salon samsclub samsung sandvik sandvikcoromant sanofi sap sarl sas save saxo sb sbi sbs sc scb schaeffler schmidt scholarships school schule schwarz science scot sd se search seat secure security seek 
select sener services seven sew sex sexy sfr sg sh shangrila sharp shell shia shiksha shoes shop shopping shouji show si silk sina singles site sj sk ski skin sky skype sl sling sm smart smile sn sncf so soccer social softbank software sohu solar solutions song sony soy spa space sport spot sr srl ss st stada staples star statebank statefarm stc stcgroup stockholm storage store stream studio study style su sucks supplies supply support surf surgery suzuki sv swatch swiss sx sy sydney systems sz tab taipei talk taobao target tatamotors tatar tattoo tax taxi tc tci td tdk team tech technology tel temasek tennis teva tf tg th thd theater theatre tiaa tickets tienda tips tires tirol tj tjmaxx tjx tk tkmaxx tl tm tmall tn to today tokyo tools top toray toshiba total tours town toyota toys tr trade trading training travel travelers travelersinsurance trust trv tt tube tui tunes tushu tv tvs tw tz ua ubank ubs ug uk unicom university uno uol ups us uy uz va vacations vana vanguard vc ve vegas ventures verisign versicherung vet vg vi viajes video vig viking villas vin vip virgin visa vision viva vivo vlaanderen vn vodka volvo vote voting voto voyage vu wales walmart walter wang wanggou watch watches weather weatherchannel webcam weber website wed wedding weibo weir wf whoswho wien wiki williamhill win windows wine winners wme woodside work works world wow ws wtc wtf xbox xerox xihuan xin xn--11b4c3d xn--1ck2e1b xn--1qqw23a xn--2scrj9c xn--30rr7y xn--3bst00m xn--3ds443g xn--3e0b707e xn--3hcrj9c xn--3pxu8k xn--42c2d9a xn--45br5cyl xn--45brj9c xn--45q11c xn--4dbrk0ce xn--4gbrim xn--54b7fta0cc xn--55qw42g xn--55qx5d xn--5su34j936bgsg xn--5tzm5g xn--6frz82g xn--6qq986b3xl xn--80adxhks xn--80ao21a xn--80aqecdr1a xn--80asehdb xn--80aswg xn--8y0a063a xn--90a3ac xn--90ae xn--90ais xn--9dbq2a xn--9et52u xn--9krt00a xn--b4w605ferd xn--bck1b9a5dre4c xn--c1avg xn--c2br7g xn--cck2b3b xn--cckwcxetd xn--cg4bki xn--clchc0ea0b2g2a9gcd xn--czr694b xn--czrs0t xn--czru2d xn--d1acj3b xn--d1alf 
xn--e1a4c xn--eckvdtc9d xn--efvy88h xn--fct429k xn--fhbei xn--fiq228c5hs xn--fiq64b xn--fiqs8s xn--fiqz9s xn--fjq720a xn--flw351e xn--fpcrj9c3d xn--fzc2c9e2c xn--fzys8d69uvgm xn--g2xx48c xn--gckr3f0f xn--gecrj9c xn--gk3at1e xn--h2breg3eve xn--h2brj9c xn--h2brj9c8c xn--hxt814e xn--i1b6b1a6a2e xn--imr513n xn--io0a7i xn--j1aef xn--j1amh xn--j6w193g xn--jlq480n2rg xn--jvr189m xn--kcrx77d1x4a xn--kprw13d xn--kpry57d xn--kput3i xn--l1acc xn--lgbbat1ad8j xn--mgb9awbf xn--mgba3a3ejt xn--mgba3a4f16a xn--mgba7c0bbn0a xn--mgbaam7a8h xn--mgbab2bd xn--mgbah1a3hjkrd xn--mgbai9azgqp6j xn--mgbayh7gpa xn--mgbbh1a xn--mgbbh1a71e xn--mgbc0a9azcg xn--mgbca7dzdo xn--mgbcpq6gpa1a xn--mgberp4a5d4ar xn--mgbgu82a xn--mgbi4ecexp xn--mgbpl2fh xn--mgbt3dhd xn--mgbtx2b xn--mgbx4cd0ab xn--mix891f xn--mk1bu44c xn--mxtq1m xn--ngbc5azd xn--ngbe9e0a xn--ngbrx xn--node xn--nqv7f xn--nqv7fs00ema xn--nyqy26a xn--o3cw4h xn--ogbpf8fl xn--otu796d xn--p1acf xn--p1ai xn--pgbs0dh xn--pssy2u xn--q7ce6a xn--q9jyb4c xn--qcka1pmc xn--qxa6a xn--qxam xn--rhqv96g xn--rovu88b xn--rvc1e0am3e xn--s9brj9c xn--ses554g xn--t60b56a xn--tckwe xn--tiq49xqyj xn--unup4y xn--vermgensberater-ctb xn--vermgensberatung-pwb xn--vhquv xn--vuq861b xn--w4r85el8fhu5dnra xn--w4rs40l xn--wgbh1c xn--wgbl6a xn--xhq521b xn--xkc2al3hye2a xn--xkc2dl3a5ee0h xn--y9a3aq xn--yfro4i67o xn--ygbi2ammx xn--zfr164b xxx xyz yachts yahoo yamaxun yandex ye yodobashi yoga yokohama you youtube yt yun za zappos zara zero zip zm zone zuerich zw".split(' '),
7
+ );
8
+
9
+ export const isPlausibleHost = makeHostChecker(TLDS);
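
Both generated files above pass a `Set` of lowercase names to `makeHostChecker`, whose implementation lives in `./plausible.js` and is not shown in this part of the diff. As a rough illustration of how such a table could be consulted, here is a minimal sketch. It assumes that a host is plausible when some trailing run of its labels appears in the set, which would cover both the single-label TLD table and the table with multi-label entries such as `org.za`. The name `makeSuffixCheckerSketch` and the tiny demo sets are hypothetical, and the real checker may behave differently.

```javascript
// Hypothetical sketch only: NOT the package's actual plausible.js.
// Assumption: a host is "plausible" if any trailing run of its labels
// is present in the given set of lowercase names.
function makeSuffixCheckerSketch(suffixes) {
  return function isPlausible(host) {
    if (typeof host !== 'string' || host.length === 0) return false;
    // Lowercase and drop a single trailing root dot ("example.com.").
    const name = host.toLowerCase().replace(/\.$/, '');
    const labels = name.split('.');
    // Reject empty labels such as "example..com" or a bare ".".
    if (labels.some((label) => label.length === 0)) return false;
    // Try each trailing run: "a.b.c" -> "a.b.c", then "b.c", then "c".
    for (let i = 0; i < labels.length; i++) {
      if (suffixes.has(labels.slice(i).join('.'))) return true;
    }
    return false;
  };
}

// Tiny demo tables standing in for the generated lists above.
const demoTlds = new Set(['com', 'org', 'za']);
const demoSuffixes = new Set(['com', 'org', 'org.za', 'web.za']);

const tldCheck = makeSuffixCheckerSketch(demoTlds);
const suffixCheck = makeSuffixCheckerSketch(demoSuffixes);

console.log(tldCheck('example.com'));     // true
console.log(tldCheck('example.invalid')); // false
console.log(tldCheck('foo.bar.za'));      // true: "za" is in the TLD table
console.log(suffixCheck('foo.bar.za'));   // false: only "org.za" and "web.za" are listed
console.log(suffixCheck('foo.org.za'));   // true
```

The last three calls show the practical difference between the two tables: the TLD table accepts anything under `za`, while a table that lists only specific second-level names under `za` rejects hosts outside them.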