tldts-icann 6.1.0

package/LICENSE ADDED
@@ -0,0 +1,13 @@
1
+ Copyright (c) 2017 Thomas Parisot, 2018 Rémi Berson
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
4
+ associated documentation files (the "Software"), to deal in the Software without restriction,
5
+ including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
6
+ and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
7
+ subject to the following conditions:
8
+
9
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
10
+
11
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
12
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
13
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,290 @@
1
+ # tldts - Blazing Fast URL Parsing (ICANN rules only)
2
+
3
+ `tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.
4
+
5
+ **Features**:
6
+
7
+ 1. Tuned for **performance** (order of 0.1 to 1 μs per input)
8
+ 2. Handles both URLs and hostnames
9
+ 3. Full Unicode/IDNA support
10
+ 4. Supports parsing email addresses
11
+ 5. Detects IPv4 and IPv6 addresses
12
+ 6. Continuously updated version of the public suffix list
13
+ 7. **TypeScript**, ships with `umd`, `esm`, `cjs` bundles and _type definitions_
14
+ 8. Small bundles and small memory footprint
15
+ 9. Battle tested: full test coverage and production use
16
+
17
+ # Install
18
+
19
+ ```bash
20
+ npm install --save tldts-icann
21
+ ```
22
+
23
+ # Usage
24
+
25
+ Programmatically:
26
+
27
+ ```js
28
+ const { parse } = require('tldts-icann');
29
+
30
+ // Retrieve hostname-related information for a given URL
31
+ parse('http://www.writethedocs.org/conf/eu/2017/');
32
+ // { domain: 'writethedocs.org',
33
+ // domainWithoutSuffix: 'writethedocs',
34
+ // hostname: 'www.writethedocs.org',
35
+ // isIp: false,
36
+ // publicSuffix: 'org',
37
+ // subdomain: 'www' }
38
+ ```
39
+
40
+ Modern _ES6 module imports_ are also supported:
41
+
42
+ ```js
43
+ import { parse } from 'tldts-icann';
44
+ ```
45
+
46
+ Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts
47
+
48
+ # API
49
+
50
+ - `tldts.parse(url | hostname, options)`
51
+ - `tldts.getHostname(url | hostname, options)`
52
+ - `tldts.getDomain(url | hostname, options)`
53
+ - `tldts.getPublicSuffix(url | hostname, options)`
54
+ - `tldts.getSubdomain(url | hostname, options)`
55
+ - `tldts.getDomainWithoutSuffix(url | hostname, options)`
56
+
57
+ The behavior of `tldts` can be customized using an `options` argument for all
58
+ the functions exposed as part of the public API. This is useful to both change
59
+ the behavior of the library as well as fine-tune the performance depending on
60
+ your inputs.
61
+
62
+ ```js
63
+ {
64
+ // Extract and validate hostname (default: true)
65
+ // When set to `false`, inputs will be considered valid hostnames.
66
+ extractHostname: boolean;
67
+ // Validate hostnames after parsing (default: true)
68
+ // If a hostname is not valid, no further processing is performed. When set
69
+ // to `false`, inputs to the library will be considered valid and parsing will
70
+ // proceed regardless.
71
+ validateHostname: boolean;
72
+ // Perform IP address detection (default: true).
73
+ detectIp: boolean;
74
+ // Assume that both URLs and hostnames can be given as input (default: true)
75
+ // If set to `false` we assume only URLs will be given as input, which
76
+ // speeds up processing.
77
+ mixedInputs: boolean;
78
+ // Specifies extra valid suffixes (default: null)
79
+ validHosts: string[] | null;
80
+ }
81
+ ```
82
+
83
+ The `parse` method returns handy **properties about a URL or a hostname**.
84
+
85
+ ```js
86
+ const tldts = require('tldts-icann');
87
+
88
+ tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
89
+ // { domain: 'amazonaws.com',
90
+ // domainWithoutSuffix: 'amazonaws',
91
+ // hostname: 'spark-public.s3.amazonaws.com',
92
+ // isIp: false,
93
+ // publicSuffix: 'com',
94
+ // subdomain: 'spark-public.s3' }
95
+
96
+ tldts.parse(
97
+ 'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv',
98
+ { allowPrivateDomains: true },
99
+ );
100
+ // { domain: 'spark-public.s3.amazonaws.com',
101
+ // domainWithoutSuffix: 'spark-public',
102
+ // hostname: 'spark-public.s3.amazonaws.com',
103
+ // isIp: false,
104
+ // publicSuffix: 's3.amazonaws.com',
105
+ // subdomain: '' }
106
+
107
+ tldts.parse('gopher://domain.unknown/');
108
+ // { domain: 'domain.unknown',
109
+ // domainWithoutSuffix: 'domain',
110
+ // hostname: 'domain.unknown',
111
+ // isIp: false,
112
+ // publicSuffix: 'unknown',
113
+ // subdomain: '' }
114
+
115
+ tldts.parse('https://192.168.0.0'); // IPv4
116
+ // { domain: null,
117
+ // domainWithoutSuffix: null,
118
+ // hostname: '192.168.0.0',
119
+ // isIp: true,
120
+ // publicSuffix: null,
121
+ // subdomain: null }
122
+
123
+ tldts.parse('https://[::1]'); // IPv6
124
+ // { domain: null,
125
+ // domainWithoutSuffix: null,
126
+ // hostname: '::1',
127
+ // isIp: true,
128
+ // publicSuffix: null,
129
+ // subdomain: null }
130
+
131
+ tldts.parse('tldts@emailprovider.co.uk'); // email
132
+ // { domain: 'emailprovider.co.uk',
133
+ // domainWithoutSuffix: 'emailprovider',
134
+ // hostname: 'emailprovider.co.uk',
135
+ // isIp: false,
136
+ // publicSuffix: 'co.uk',
137
+ // subdomain: '' }
138
+ ```
139
+
140
+ | Property Name | Type | Description |
141
+ | :-------------------- | :----- | :---------------------------------------------- |
142
+ | `hostname` | `str` | `hostname` of the input extracted automatically |
143
+ | `domain` | `str` | Domain (tld + sld) |
144
+ | `domainWithoutSuffix` | `str` | Domain without public suffix |
145
+ | `subdomain` | `str` | Subdomain (what comes before `domain`) |
146
+ | `publicSuffix` | `str` | Public Suffix (tld) of `hostname` |
147
+ | `isIp` | `bool` | Is `hostname` an IP address? |
148
+
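The properties in the table are related: `hostname` is `subdomain` + `domain`, and `domain` is `domainWithoutSuffix` + `publicSuffix`. The following is a minimal sketch of that relationship, assuming the public suffix is already known. `splitHostname` is a hypothetical helper written for illustration; it is not part of `tldts` and does not reflect how the library is implemented.

```javascript
// Hypothetical helper (not part of tldts): derive the remaining fields from a
// hostname and an already-known public suffix.
function splitHostname(hostname, publicSuffix) {
  const suffixWithDot = '.' + publicSuffix;
  const empty = { domain: null, domainWithoutSuffix: null, subdomain: null };

  // The hostname must have at least one label in front of the suffix.
  if (!hostname.endsWith(suffixWithDot)) {
    return empty;
  }

  // Everything to the left of the public suffix.
  const prefix = hostname.slice(0, -suffixWithDot.length);
  if (prefix === '') {
    return empty;
  }

  // The label directly before the suffix names the domain;
  // whatever precedes that label is the subdomain.
  const lastDot = prefix.lastIndexOf('.');
  const domainWithoutSuffix = prefix.slice(lastDot + 1);
  const subdomain = lastDot === -1 ? '' : prefix.slice(0, lastDot);

  return {
    domain: domainWithoutSuffix + '.' + publicSuffix,
    domainWithoutSuffix,
    subdomain,
  };
}

console.log(splitHostname('spark-public.s3.amazonaws.com', 'com'));
console.log(splitHostname('spark-public.s3.amazonaws.com', 's3.amazonaws.com'));
```

With the suffix `com` the sketch yields the domain `amazonaws.com` and the subdomain `spark-public.s3`; with the suffix `s3.amazonaws.com` the whole hostname becomes the domain and the subdomain is empty, mirroring the two `parse` outputs shown earlier.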
149
+ ## Single purpose methods
150
+
151
+ These methods are shorthands if you want to retrieve only a single value (and
152
+ will perform better than `parse` because less work will be needed).
153
+
154
+ ### getHostname(url | hostname, options?)
155
+
156
+ Returns the hostname from a given string.
157
+
158
+ ```javascript
159
+ const { getHostname } = require('tldts-icann');
160
+
161
+ getHostname('google.com'); // returns `google.com`
162
+ getHostname('fr.google.com'); // returns `fr.google.com`
163
+ getHostname('fr.google.google'); // returns `fr.google.google`
164
+ getHostname('foo.google.co.uk'); // returns `foo.google.co.uk`
165
+ getHostname('t.co'); // returns `t.co`
166
+ getHostname('fr.t.co'); // returns `fr.t.co`
167
+ getHostname(
168
+ 'https://user:password@example.co.uk:8080/some/path?and&query#hash',
169
+ ); // returns `example.co.uk`
170
+ ```
171
+
172
+ ### getDomain(url | hostname, options?)
173
+
174
+ Returns the fully qualified domain from a given string.
175
+
176
+ ```javascript
177
+ const { getDomain } = require('tldts-icann');
178
+
179
+ getDomain('google.com'); // returns `google.com`
180
+ getDomain('fr.google.com'); // returns `google.com`
181
+ getDomain('fr.google.google'); // returns `google.google`
182
+ getDomain('foo.google.co.uk'); // returns `google.co.uk`
183
+ getDomain('t.co'); // returns `t.co`
184
+ getDomain('fr.t.co'); // returns `t.co`
185
+ getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk`
186
+ ```
187
+
188
+ ### getDomainWithoutSuffix(url | hostname, options?)
189
+
190
+ Returns the domain (as returned by `getDomain(...)`) without the public suffix part.
191
+
192
+ ```javascript
193
+ const { getDomainWithoutSuffix } = require('tldts-icann');
194
+
195
+ getDomainWithoutSuffix('google.com'); // returns `google`
196
+ getDomainWithoutSuffix('fr.google.com'); // returns `google`
197
+ getDomainWithoutSuffix('fr.google.google'); // returns `google`
198
+ getDomainWithoutSuffix('foo.google.co.uk'); // returns `google`
199
+ getDomainWithoutSuffix('t.co'); // returns `t`
200
+ getDomainWithoutSuffix('fr.t.co'); // returns `t`
201
+ getDomainWithoutSuffix(
202
+ 'https://user:password@example.co.uk:8080/some/path?and&query#hash',
203
+ ); // returns `example`
204
+ ```
205
+
206
+ ### getSubdomain(url | hostname, options?)
207
+
208
+ Returns the complete subdomain for a given string.
209
+
210
+ ```javascript
211
+ const { getSubdomain } = require('tldts-icann');
212
+
213
+ getSubdomain('google.com'); // returns ``
214
+ getSubdomain('fr.google.com'); // returns `fr`
215
+ getSubdomain('google.co.uk'); // returns ``
216
+ getSubdomain('foo.google.co.uk'); // returns `foo`
217
+ getSubdomain('moar.foo.google.co.uk'); // returns `moar.foo`
218
+ getSubdomain('t.co'); // returns ``
219
+ getSubdomain('fr.t.co'); // returns `fr`
220
+ getSubdomain(
221
+ 'https://user:password@secure.example.co.uk:443/some/path?and&query#hash',
222
+ ); // returns `secure`
223
+ ```
224
+
225
+ ### getPublicSuffix(url | hostname, options?)
226
+
227
+ Returns the [public suffix][] for a given string.
228
+
229
+ ```javascript
230
+ const { getPublicSuffix } = require('tldts-icann');
231
+
232
+ getPublicSuffix('google.com'); // returns `com`
233
+ getPublicSuffix('fr.google.com'); // returns `com`
234
+ getPublicSuffix('google.co.uk'); // returns `co.uk`
235
+ getPublicSuffix('s3.amazonaws.com'); // returns `com`
236
+ getPublicSuffix('tld.is.unknown'); // returns `unknown`
237
+ ```
238
+
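To illustrate the idea behind the results above, here is a rough sketch of longest-match suffix lookup. It assumes a tiny, made-up rule set (`ICANN_RULES`) and a hypothetical function name (`findPublicSuffix`); the real public suffix list is far larger and also contains wildcard and exception rules, which this sketch ignores. It is not the data structure or algorithm used by `tldts`.

```javascript
// Tiny, hypothetical rule set for illustration only.
const ICANN_RULES = new Set(['com', 'org', 'uk', 'co.uk']);

// Return the longest suffix of `hostname` that appears in `rules`.
// When no rule matches, fall back to the last label, which corresponds
// to the implicit `*` rule of the public suffix list.
function findPublicSuffix(hostname, rules = ICANN_RULES) {
  const labels = hostname.toLowerCase().split('.');

  // Candidates are tried from longest to shortest, so the first hit wins.
  for (let i = 0; i < labels.length; i += 1) {
    const candidate = labels.slice(i).join('.');
    if (rules.has(candidate)) {
      return candidate;
    }
  }

  return labels[labels.length - 1];
}

console.log(findPublicSuffix('google.co.uk'));
console.log(findPublicSuffix('s3.amazonaws.com'));
console.log(findPublicSuffix('tld.is.unknown'));
```

Because the made-up rule set has no entry for `s3.amazonaws.com`, that hostname resolves to `com`, and a hostname with no matching rule falls back to its last label, as in the `tld.is.unknown` example.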
239
+ # Troubleshooting
240
+
241
+ ## Retrieving subdomain of `localhost` and custom hostnames
242
+
243
+ `tldts` methods `getDomain` and `getSubdomain` are designed to **work only with _known and valid_ TLDs**.
244
+ This way, you can trust what a domain is.
245
+
246
+ `localhost` is a valid hostname but not a TLD. You can pass additional options to each method exposed by `tldts`:
247
+
248
+ ```js
249
+ const tldts = require('tldts-icann');
250
+
251
+ tldts.getDomain('localhost'); // returns null
252
+ tldts.getSubdomain('vhost.localhost'); // returns null
253
+
254
+ tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
255
+ tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
256
+ ```
257
+
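One way to picture the effect of `validHosts` is as a list of hostnames that are accepted as domains even though they match no known suffix. The sketch below is inferred only from the example outputs above; `domainFromValidHosts` and `subdomainFromValidHosts` are hypothetical names and this is not the library's internal logic.

```javascript
// Hypothetical helpers (not part of tldts), inferred from the example above.

// Return the entry of `validHosts` that `hostname` equals or ends with.
function domainFromValidHosts(hostname, validHosts) {
  for (const host of validHosts) {
    if (hostname === host || hostname.endsWith('.' + host)) {
      return host;
    }
  }
  return null;
}

// Return the labels to the left of the matched valid host, if any.
function subdomainFromValidHosts(hostname, validHosts) {
  const domain = domainFromValidHosts(hostname, validHosts);
  if (domain === null) {
    return null;
  }
  // Strip the matched host and the dot in front of it.
  return hostname === domain ? '' : hostname.slice(0, -(domain.length + 1));
}

console.log(domainFromValidHosts('localhost', ['localhost']));
console.log(subdomainFromValidHosts('vhost.localhost', ['localhost']));
```

Under these assumptions, `localhost` is returned as the domain and `vhost` as the subdomain, matching the two calls that pass `{ validHosts: ['localhost'] }`.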
258
+ ## Updating the TLDs List
259
+
260
+ `tldts` made the opinionated choice of shipping with a list of suffixes directly
261
+ in its bundle. There is currently no mechanism to update the lists yourself, but
262
+ we make sure that the version shipped is always up-to-date.
263
+
264
+ If you keep `tldts` updated, the lists should be up-to-date as well!
265
+
266
+ # Performance
267
+
268
+ `tldts` is the _fastest JavaScript library_ available for parsing hostnames. It is able to parse _millions of inputs per second_ (typically 2-3M depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library depending on the kind of inputs you are dealing with (e.g.: if you know you only manipulate valid hostnames you can disable the hostname extraction step with `{ extractHostname: false }`).
269
+
270
+ Please see [this detailed comparison](./comparison/comparison.md) with other available libraries.
271
+
272
+ ## Contributors
273
+
274
+ `tldts` is based upon the excellent `tld.js` library and would not exist without
275
+ the many contributors who worked on the project:
276
+ <a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a>
277
+
278
+ This project would not be possible without Mozilla's amazing
279
+ [public suffix list][]. Thank you for your hard work!
280
+
281
+ # License
282
+
283
+ [MIT License](LICENSE).
284
+
285
+ [badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master
286
+ [badge-downloads]: https://img.shields.io/npm/dm/tldts.svg
287
+ [public suffix list]: https://publicsuffix.org/list/
288
+ [list the recent changes]: https://github.com/publicsuffix/list/commits/master
289
+ [changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom
290
+ [public suffix]: https://publicsuffix.org/learn/