tldts-icann 6.1.0
- package/LICENSE +13 -0
- package/README.md +290 -0
- package/dist/cjs/index.js +656 -0
- package/dist/cjs/index.js.map +1 -0
- package/dist/cjs/src/data/trie.js +14 -0
- package/dist/cjs/src/data/trie.js.map +1 -0
- package/dist/cjs/src/suffix-trie.js +57 -0
- package/dist/cjs/src/suffix-trie.js.map +1 -0
- package/dist/cjs/tsconfig.tsbuildinfo +1 -0
- package/dist/es6/index.js +33 -0
- package/dist/es6/index.js.map +1 -0
- package/dist/es6/src/data/trie.js +11 -0
- package/dist/es6/src/data/trie.js.map +1 -0
- package/dist/es6/src/suffix-trie.js +54 -0
- package/dist/es6/src/suffix-trie.js.map +1 -0
- package/dist/es6/tsconfig.bundle.tsbuildinfo +1 -0
- package/dist/index.cjs.min.js +2 -0
- package/dist/index.cjs.min.js.map +1 -0
- package/dist/index.esm.min.js +2 -0
- package/dist/index.esm.min.js.map +1 -0
- package/dist/index.umd.min.js +2 -0
- package/dist/index.umd.min.js.map +1 -0
- package/dist/types/index.d.ts +7 -0
- package/dist/types/src/data/trie.d.ts +5 -0
- package/dist/types/src/suffix-trie.d.ts +5 -0
- package/index.ts +62 -0
- package/package.json +88 -0
- package/src/data/trie.ts +14 -0
- package/src/suffix-trie.ts +87 -0
package/LICENSE
ADDED
@@ -0,0 +1,13 @@
Copyright (c) 2017 Thomas Parisot, 2018 Rémi Berson

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,290 @@
# tldts - Blazing Fast URL Parsing (ICANN rules only)

`tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.

**Features**:

1. Tuned for **performance** (order of 0.1 to 1 μs per input)
2. Handles both URLs and hostnames
3. Full Unicode/IDNA support
4. Supports parsing email addresses
5. Detects IPv4 and IPv6 addresses
6. Continuously updated version of the public suffix list
7. **TypeScript**, ships with `umd`, `esm`, `cjs` bundles and _type definitions_
8. Small bundles and small memory footprint
9. Battle tested: full test coverage and production use

# Install

```bash
npm install --save tldts-icann
```

# Usage

Programmatically:

```js
const { parse } = require('tldts-icann');

// Retrieving hostname-related information for a given URL
parse('http://www.writethedocs.org/conf/eu/2017/');
// { domain: 'writethedocs.org',
//   domainWithoutSuffix: 'writethedocs',
//   hostname: 'www.writethedocs.org',
//   isIp: false,
//   publicSuffix: 'org',
//   subdomain: 'www' }
```

Modern _ES6 modules import_ is also supported:

```js
import { parse } from 'tldts-icann';
```

Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts

# API

- `tldts.parse(url | hostname, options)`
- `tldts.getHostname(url | hostname, options)`
- `tldts.getDomain(url | hostname, options)`
- `tldts.getPublicSuffix(url | hostname, options)`
- `tldts.getSubdomain(url | hostname, options)`
- `tldts.getDomainWithoutSuffix(url | hostname, options)`

The behavior of `tldts` can be customized using an `options` argument for all
the functions exposed as part of the public API. This is useful both to change
the behavior of the library and to fine-tune its performance depending on
your inputs.

```js
{
  // Extract and validate hostname (default: true)
  // When set to `false`, inputs will be considered valid hostnames.
  extractHostname: boolean;
  // Validate hostnames after parsing (default: true)
  // If a hostname is not valid, no further processing is performed. When set
  // to `false`, inputs to the library will be considered valid and parsing will
  // proceed regardless.
  validateHostname: boolean;
  // Perform IP address detection (default: true).
  detectIp: boolean;
  // Assume that both URLs and hostnames can be given as input (default: true)
  // If set to `false` we assume only URLs will be given as input, which
  // speeds up processing.
  mixedInputs: boolean;
  // Specifies extra valid suffixes (default: null)
  validHosts: string[] | null;
}
```
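As a rough illustration of how the documented defaults combine with caller-supplied options, the sketch below merges a partial options object over those defaults. The names `DEFAULT_OPTIONS` and `withDefaults` are hypothetical and not part of the `tldts` API; the library's internal handling may well differ.

```js
// Defaults as documented in the options block above.
const DEFAULT_OPTIONS = Object.freeze({
  extractHostname: true,
  validateHostname: true,
  detectIp: true,
  mixedInputs: true,
  validHosts: null,
});

// Hypothetical helper (not part of tldts): fill in any option the caller left out.
function withDefaults(options = {}) {
  return { ...DEFAULT_OPTIONS, ...options };
}

// Example: inputs are known to be plain, non-IP hostnames.
const fast = withDefaults({ extractHostname: false, detectIp: false });
console.log(fast);
```

Options that are not overridden keep their documented default, so callers only need to name the steps they want to skip.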

The `parse` method returns handy **properties about a URL or a hostname**.

```js
const tldts = require('tldts-icann');

tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
// { domain: 'amazonaws.com',
//   domainWithoutSuffix: 'amazonaws',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIp: false,
//   publicSuffix: 'com',
//   subdomain: 'spark-public.s3' }

tldts.parse(
  'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv',
  { allowPrivateDomains: true },
);
// { domain: 'spark-public.s3.amazonaws.com',
//   domainWithoutSuffix: 'spark-public',
//   hostname: 'spark-public.s3.amazonaws.com',
//   isIp: false,
//   publicSuffix: 's3.amazonaws.com',
//   subdomain: '' }

tldts.parse('gopher://domain.unknown/');
// { domain: 'domain.unknown',
//   domainWithoutSuffix: 'domain',
//   hostname: 'domain.unknown',
//   isIp: false,
//   publicSuffix: 'unknown',
//   subdomain: '' }

tldts.parse('https://192.168.0.0'); // IPv4
// { domain: null,
//   domainWithoutSuffix: null,
//   hostname: '192.168.0.0',
//   isIp: true,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('https://[::1]'); // IPv6
// { domain: null,
//   domainWithoutSuffix: null,
//   hostname: '::1',
//   isIp: true,
//   publicSuffix: null,
//   subdomain: null }

tldts.parse('tldts@emailprovider.co.uk'); // email
// { domain: 'emailprovider.co.uk',
//   domainWithoutSuffix: 'emailprovider',
//   hostname: 'emailprovider.co.uk',
//   isIp: false,
//   publicSuffix: 'co.uk',
//   subdomain: '' }
```

| Property Name         | Type   | Description                                     |
| :-------------------- | :----- | :---------------------------------------------- |
| `hostname`            | `str`  | `hostname` of the input extracted automatically |
| `domain`              | `str`  | Domain (tld + sld)                              |
| `domainWithoutSuffix` | `str`  | Domain without public suffix                    |
| `subdomain`           | `str`  | Subdomain (what comes before `domain`)          |
| `publicSuffix`        | `str`  | Public Suffix (tld) of `hostname`               |
| `isIp`                | `bool` | Is `hostname` an IP address?                    |
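
To make the relationship between these properties concrete, here is a minimal sketch that splits a hostname around a public suffix that is already known. The helper `describeHostname` is hypothetical, not part of `tldts`, and it skips validation, IP detection and the suffix lookup itself; it returns `null` when the hostname does not end with the given suffix or consists of the suffix alone.

```js
// Hypothetical helper: split a hostname around an already-known public suffix.
function describeHostname(hostname, publicSuffix) {
  // The suffix must be preceded by a dot; this also rejects hostname === publicSuffix.
  if (!hostname.endsWith(`.${publicSuffix}`)) {
    return null;
  }
  // Everything before the suffix, e.g. 'www.writethedocs' for suffix 'org'.
  const beforeSuffix = hostname.slice(0, hostname.length - publicSuffix.length - 1);
  if (beforeSuffix === '') {
    return null;
  }
  const lastDot = beforeSuffix.lastIndexOf('.');
  // The label right before the suffix is the domain without suffix.
  const domainWithoutSuffix = beforeSuffix.slice(lastDot + 1);
  // Whatever precedes that label is the subdomain ('' when there is none).
  const subdomain = lastDot === -1 ? '' : beforeSuffix.slice(0, lastDot);
  return {
    hostname,
    publicSuffix,
    domain: `${domainWithoutSuffix}.${publicSuffix}`,
    domainWithoutSuffix,
    subdomain,
  };
}

console.log(describeHostname('www.writethedocs.org', 'org'));
console.log(describeHostname('emailprovider.co.uk', 'co.uk'));
```

In other words, `hostname` reads as `subdomain`, then `domainWithoutSuffix`, then `publicSuffix`, joined by dots, and `domain` is the last two of those parts.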

## Single purpose methods

These methods are shorthands if you want to retrieve only a single value (and
will perform better than `parse` because less work will be needed).

### getHostname(url | hostname, options?)

Returns the hostname from a given string.

```javascript
const { getHostname } = require('tldts-icann');

getHostname('google.com'); // returns `google.com`
getHostname('fr.google.com'); // returns `fr.google.com`
getHostname('fr.google.google'); // returns `fr.google.google`
getHostname('foo.google.co.uk'); // returns `foo.google.co.uk`
getHostname('t.co'); // returns `t.co`
getHostname('fr.t.co'); // returns `fr.t.co`
getHostname(
  'https://user:password@example.co.uk:8080/some/path?and&query#hash',
); // returns `example.co.uk`
```
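For comparison, a similar result can be approximated with the standard WHATWG `URL` class. This is a different technique from whatever `tldts` does internally, shown only to illustrate what "extracting the hostname" means; the helper name `hostnameOf` is hypothetical. It falls back to returning the input unchanged when the string is not an absolute URL, and it does not handle scheme-less inputs such as `host:port`.

```js
// Sketch using the WHATWG URL class (not the parser used by tldts).
function hostnameOf(input) {
  try {
    // Drops the scheme, credentials, port, path, query and fragment.
    return new URL(input).hostname;
  } catch {
    // Not an absolute URL: treat the input as a bare hostname.
    return input;
  }
}

console.log(
  hostnameOf('https://user:password@example.co.uk:8080/some/path?and&query#hash'),
);
console.log(hostnameOf('fr.google.com'));
```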

### getDomain(url | hostname, options?)

Returns the fully qualified domain from a given string.

```javascript
const { getDomain } = require('tldts-icann');

getDomain('google.com'); // returns `google.com`
getDomain('fr.google.com'); // returns `google.com`
getDomain('fr.google.google'); // returns `google.google`
getDomain('foo.google.co.uk'); // returns `google.co.uk`
getDomain('t.co'); // returns `t.co`
getDomain('fr.t.co'); // returns `t.co`
getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`
```

### getDomainWithoutSuffix(url | hostname, options?)

Returns the domain (as returned by `getDomain(...)`) without the public suffix part.

```javascript
const { getDomainWithoutSuffix } = require('tldts-icann');

getDomainWithoutSuffix('google.com'); // returns `google`
getDomainWithoutSuffix('fr.google.com'); // returns `google`
getDomainWithoutSuffix('fr.google.google'); // returns `google`
getDomainWithoutSuffix('foo.google.co.uk'); // returns `google`
getDomainWithoutSuffix('t.co'); // returns `t`
getDomainWithoutSuffix('fr.t.co'); // returns `t`
getDomainWithoutSuffix(
  'https://user:password@example.co.uk:8080/some/path?and&query#hash',
); // returns `example`
```

### getSubdomain(url | hostname, options?)

Returns the complete subdomain for a given string.

```javascript
const { getSubdomain } = require('tldts-icann');

getSubdomain('google.com'); // returns ``
getSubdomain('fr.google.com'); // returns `fr`
getSubdomain('google.co.uk'); // returns ``
getSubdomain('foo.google.co.uk'); // returns `foo`
getSubdomain('moar.foo.google.co.uk'); // returns `moar.foo`
getSubdomain('t.co'); // returns ``
getSubdomain('fr.t.co'); // returns `fr`
getSubdomain(
  'https://user:password@secure.example.co.uk:443/some/path?and&query#hash',
); // returns `secure`
```

### getPublicSuffix(url | hostname, options?)

Returns the [public suffix][] for a given string.

```javascript
const { getPublicSuffix } = require('tldts-icann');

getPublicSuffix('google.com'); // returns `com`
getPublicSuffix('fr.google.com'); // returns `com`
getPublicSuffix('google.co.uk'); // returns `co.uk`
getPublicSuffix('s3.amazonaws.com'); // returns `com`
getPublicSuffix('tld.is.unknown'); // returns `unknown`
```
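The idea behind these results is a longest-match lookup against the list of known suffixes. Below is a minimal sketch of that idea, assuming a tiny hand-written rule set; `RULES` and `publicSuffixOf` are hypothetical names, and the real public suffix list also contains wildcard and exception rules that this sketch ignores. The package ships a trie-based implementation, which is not reproduced here.

```js
// Tiny, hypothetical rule set standing in for the public suffix list.
const RULES = new Set(['com', 'org', 'uk', 'co.uk']);

function publicSuffixOf(hostname) {
  const labels = hostname.toLowerCase().split('.');
  // Try the longest candidate first: 'a.b.c', then 'b.c', then 'c'.
  for (let i = 0; i < labels.length; i += 1) {
    const candidate = labels.slice(i).join('.');
    if (RULES.has(candidate)) {
      return candidate;
    }
  }
  // Unknown TLD: fall back to the last label, as in the `tld.is.unknown` example.
  return labels[labels.length - 1];
}

console.log(publicSuffixOf('google.co.uk'));
console.log(publicSuffixOf('tld.is.unknown'));
```

Trying the longest candidate first is what makes `google.co.uk` resolve to `co.uk` instead of `uk`.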

# Troubleshooting

## Retrieving subdomain of `localhost` and custom hostnames

`tldts` methods `getDomain` and `getSubdomain` are designed to **work only with _known and valid_ TLDs**.
This way, you can trust what a domain is.

`localhost` is a valid hostname but not a TLD. You can pass additional options to each method exposed by `tldts`:

```js
const tldts = require('tldts-icann');

tldts.getDomain('localhost'); // returns null
tldts.getSubdomain('vhost.localhost'); // returns null

tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
```
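One way to picture what `validHosts` does is as an extra check before the normal suffix lookup: if the hostname equals a listed host or ends with it after a dot, that host is treated as the domain. The sketch below models this under that assumption; `matchValidHost` and `subdomainFor` are hypothetical helpers and are not taken from the library's source.

```js
// Hypothetical helper: find a validHosts entry that the hostname equals or ends with.
function matchValidHost(hostname, validHosts) {
  for (const host of validHosts || []) {
    if (hostname === host || hostname.endsWith(`.${host}`)) {
      return host;
    }
  }
  return null;
}

// Hypothetical helper: the part of the hostname in front of the matched host.
function subdomainFor(hostname, validHosts) {
  const domain = matchValidHost(hostname, validHosts);
  if (domain === null) {
    return null;
  }
  // Strip the domain and the separating dot; '' when nothing is left.
  return hostname === domain ? '' : hostname.slice(0, -(domain.length + 1));
}

console.log(matchValidHost('localhost', ['localhost']));
console.log(subdomainFor('vhost.localhost', ['localhost']));
```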

## Updating the TLDs List

`tldts` made the opinionated choice of shipping with a list of suffixes directly
in its bundle. There is currently no mechanism to update the lists yourself, but
we make sure that the version shipped is always up-to-date.

If you keep `tldts` updated, the lists should be up-to-date as well!

# Performance

`tldts` is the _fastest JavaScript library_ available for parsing hostnames. It is able to parse _millions of inputs per second_ (typically 2-3M depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library depending on the kind of inputs you are dealing with (e.g.: if you know you only manipulate valid hostnames you can disable the hostname extraction step with `{ extractHostname: false }`).

Please see [this detailed comparison](./comparison/comparison.md) with other available libraries.
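
To check throughput on your own hardware and inputs, a simple timing loop is usually enough for a first impression. The harness below is a rough sketch and not the benchmark behind the numbers above: `measure` is a hypothetical helper, the workload is a stand-in that does not call the library, and the result is sensitive to JIT warm-up and input mix.

```js
// Hypothetical micro-benchmark: returns calls per second for `fn` over `inputs`.
function measure(fn, inputs, rounds = 100) {
  const start = process.hrtime.bigint();
  for (let r = 0; r < rounds; r += 1) {
    for (const input of inputs) {
      fn(input);
    }
  }
  // Guard against a zero reading on very coarse clocks.
  const elapsedNs = Math.max(Number(process.hrtime.bigint() - start), 1);
  return (rounds * inputs.length) / (elapsedNs / 1e9);
}

// Stand-in workload; swap in e.g. `getDomain` from tldts-icann to measure the library.
const inputs = ['https://www.example.com/path', 'fr.google.com', 'foo.google.co.uk'];
const opsPerSecond = measure((s) => s.split('.').length, inputs);
console.log(`${Math.round(opsPerSecond)} calls/s`);
```

Running the same harness with and without options such as `{ extractHostname: false }` gives a quick sense of how much each step costs for your data.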

## Contributors

`tldts` is based upon the excellent `tld.js` library and would not exist without
the many contributors who worked on the project:
<a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a>

This project would not be possible without Mozilla's amazing
[public suffix list][]. Thank you for your hard work!

# License

[MIT License](LICENSE).

[badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
[badge-downloads]: https://img.shields.io/npm/dm/tldts.svg
[public suffix list]: https://publicsuffix.org/list/
[list the recent changes]: https://github.com/publicsuffix/list/commits/master
[changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom
[public suffix]: https://publicsuffix.org/learn/