PyPI - tldextract - Versions diffs - 5.1.3__tar.gz → 5.3.0__tar.gz - Mend

tldextract 5.1.3__tar.gz → 5.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

{tldextract-5.1.3 → tldextract-5.3.0}/CHANGELOG.md RENAMED

@@ -3,6 +3,38 @@
 After upgrading, update your cache file by deleting it or via `tldextract
 --update`.
+## 5.3.0 (2025-04-21)
+* Features
+  * Add result field `registry_suffix` ([#344](https://github.com/john-kurkowski/tldextract/issues/344))
+    * To complement the existing public suffix field `suffix`
+  * Add result property `top_domain_under_public_suffix` ([#344](https://github.com/john-kurkowski/tldextract/issues/344))
+  * Add result property `top_domain_under_registry_suffix` ([#344](https://github.com/john-kurkowski/tldextract/issues/344))
+  * Deprecate `registered_domain` property
+    * Use `top_domain_under_public_suffix` instead, which has the same behavior
+      but a more accurate name
+* Bugfixes
+  * Fix missing `reverse_domain_name` property in CLI `--json` output ([`a545c67`](https://github.com/john-kurkowski/tldextract/commit/a545c67d87223616fc13e90692886b3ca9af18bb))
+* Misc.
+    * Expand internal `suffix_index` return type to be richer than bools, and
+      include the registry suffix during trie traversal
+      ([#344](https://github.com/john-kurkowski/tldextract/issues/344))
+## 5.2.0 (2025-04-07)
+* Features
+  * Add `reverse_domain_name` result property ([#342](https://github.com/john-kurkowski/tldextract/issues/342))
+* Bugfixes
+  * Extend exported public interface with `ExtractResult` and `update` ([`36ff658`](https://github.com/john-kurkowski/tldextract/commit/36ff658c53b510c5d56f8af235c8b08ce3c512f5))
+    * These were always meant to be public. Eases user import.
+* Docs
+  * Document result fields
+  * Note all return values
+  * Colocate usage in the usage section
+  * Link to private domain docs
+* Misc.
+  * Update bundled snapshot
 ## 5.1.3 (2024-11-04)
 * Bugfixes
@@ -10,7 +42,7 @@ After upgrading, update your cache file by deleting it or via `tldextract
   * Drop support for EOL Python 3.8 ([#340](https://github.com/john-kurkowski/tldextract/issues/340))
   * Support Python 3.13 ([#341](https://github.com/john-kurkowski/tldextract/issues/341))
   * Update bundled snapshot
-* Documentation
+* Docs
   * Clarify how to use your own definitions
   * Clarify first-successful definitions vs. merged definitions
 * Misc.
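
For code that has to run on both sides of this upgrade, the `registered_domain` deprecation listed under 5.3.0 can be absorbed by a small compatibility helper. The following is a minimal sketch and not part of tldextract: `top_domain` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real `ExtractResult` values so that the block runs without the package or a network call.

```python
from types import SimpleNamespace
from typing import Any


def top_domain(result: Any) -> str:
    """Return the top domain under the public suffix for old and new results.

    Hypothetical helper: prefers the property added in 5.3.0 and only falls
    back to the deprecated `registered_domain` when the new one is missing.
    """
    new_value = getattr(result, "top_domain_under_public_suffix", None)
    if new_value is not None:
        return new_value
    return result.registered_domain


# Stand-ins for ExtractResult objects (assumption: only these attributes matter).
old_style = SimpleNamespace(registered_domain="bbc.co.uk")
new_style = SimpleNamespace(
    registered_domain="bbc.co.uk",
    top_domain_under_public_suffix="bbc.co.uk",
)

print(top_domain(old_style))
print(top_domain(new_style))
```

Because the helper checks the new property first, it should not touch the deprecated one on 5.3.0 and later.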

{tldextract-5.1.3 → tldextract-5.3.0}/LICENSE RENAMED

@@ -1,6 +1,6 @@
 BSD 3-Clause License
-Copyright (c) 2013-2024, John Kurkowski
+Copyright (c) 2013-2025, John Kurkowski
 All rights reserved.
 Redistribution and use in source and binary forms, with or without

{tldextract-5.1.3 → tldextract-5.3.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.4
 Name: tldextract
-Version: 5.1.3
+Version: 5.3.0
 Summary: Accurately separates a URL's subdomain, domain, and public suffix, using the Public Suffix List (PSL). By default, this includes the public ICANN TLDs and their exceptions. You can optionally support the Public Suffix List's private domains as well.
 Author-email: John Kurkowski <john.kurkowski@gmail.com>
 License: BSD-3-Clause
@@ -37,6 +37,7 @@ Requires-Dist: tox; extra == "testing"
 Requires-Dist: tox-uv; extra == "testing"
 Requires-Dist: types-filelock; extra == "testing"
 Requires-Dist: types-requests; extra == "testing"
+Dynamic: license-file
 # tldextract [![PyPI version](https://badge.fury.io/py/tldextract.svg)](https://badge.fury.io/py/tldextract) [![Build Status](https://github.com/john-kurkowski/tldextract/actions/workflows/ci.yml/badge.svg)](https://github.com/john-kurkowski/tldextract/actions/workflows/ci.yml)
@@ -89,14 +90,23 @@ To rejoin the original hostname, if it was indeed a valid, registered hostname:
 ```python
 >>> ext = tldextract.extract('http://forums.bbc.co.uk')
->>> ext.registered_domain
+>>> ext.top_domain_under_public_suffix
 'bbc.co.uk'
 >>> ext.fqdn
 'forums.bbc.co.uk'
 ```
+In addition to the Python interface, there is a command-line interface. Split
+the URL components by space:
+```zsh
+$ tldextract 'http://forums.bbc.co.uk'
+forums bbc co.uk
+```
 By default, this package supports the public ICANN TLDs and their exceptions.
-You can optionally support the Public Suffix List's private domains as well.
+You can optionally support the Public Suffix List's [private
+domains](#public-vs-private-domains) as well.
 This package started by implementing the chosen answer from [this StackOverflow question on
 getting the "domain name" from a URL](http://stackoverflow.com/questions/569137/how-to-get-domain-name-from-url/569219#569219).
@@ -118,13 +128,6 @@ Or the latest dev version:
 pip install -e 'git://github.com/john-kurkowski/tldextract.git#egg=tldextract'
 ```
-Command-line usage, splits the URL components by space:
-```zsh
-tldextract http://forums.bbc.co.uk
-# forums bbc co.uk
-```
 ## Note about caching
 Beware when first calling `tldextract`, it updates its TLD list with a live HTTP
@@ -188,15 +191,17 @@ ExtractResult(subdomain='waiterrant', domain='blogspot', suffix='com', is_privat
 ```
 The following overrides this.
 ```python
 >>> extract = tldextract.TLDExtract()
 >>> extract('waiterrant.blogspot.com', include_psl_private_domains=True)
 ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)
 ```
-or to change the default for all extract calls,
+To change the default for all extract calls:
 ```python
->>> extract = tldextract.TLDExtract( include_psl_private_domains=True)
+>>> extract = tldextract.TLDExtract(include_psl_private_domains=True)
 >>> extract('waiterrant.blogspot.com')
 ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)
 ```
@@ -282,7 +287,7 @@ For example:
 extractor = TLDExtract()
 split_url = urllib.parse.urlsplit("https://foo.bar.com:8080")
 split_suffix = extractor.extract_urllib(split_url)
-url_to_crawl = f"{split_url.scheme}://{split_suffix.registered_domain}:{split_url.port}"
+url_to_crawl = f"{split_url.scheme}://{split_suffix.top_domain_under_public_suffix}:{split_url.port}"
 ```
 `tldextract`'s lenient string parsing stance lowers the learning curve of using

{tldextract-5.1.3 → tldextract-5.3.0}/README.md RENAMED

@@ -49,14 +49,23 @@ To rejoin the original hostname, if it was indeed a valid, registered hostname:
 ```python
 >>> ext = tldextract.extract('http://forums.bbc.co.uk')
->>> ext.registered_domain
+>>> ext.top_domain_under_public_suffix
 'bbc.co.uk'
 >>> ext.fqdn
 'forums.bbc.co.uk'
 ```
+In addition to the Python interface, there is a command-line interface. Split
+the URL components by space:
+```zsh
+$ tldextract 'http://forums.bbc.co.uk'
+forums bbc co.uk
+```
 By default, this package supports the public ICANN TLDs and their exceptions.
-You can optionally support the Public Suffix List's private domains as well.
+You can optionally support the Public Suffix List's [private
+domains](#public-vs-private-domains) as well.
 This package started by implementing the chosen answer from [this StackOverflow question on
 getting the "domain name" from a URL](http://stackoverflow.com/questions/569137/how-to-get-domain-name-from-url/569219#569219).
@@ -78,13 +87,6 @@ Or the latest dev version:
 pip install -e 'git://github.com/john-kurkowski/tldextract.git#egg=tldextract'
 ```
-Command-line usage, splits the URL components by space:
-```zsh
-tldextract http://forums.bbc.co.uk
-# forums bbc co.uk
-```
 ## Note about caching
 Beware when first calling `tldextract`, it updates its TLD list with a live HTTP
@@ -148,15 +150,17 @@ ExtractResult(subdomain='waiterrant', domain='blogspot', suffix='com', is_privat
 ```
 The following overrides this.
 ```python
 >>> extract = tldextract.TLDExtract()
 >>> extract('waiterrant.blogspot.com', include_psl_private_domains=True)
 ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)
 ```
-or to change the default for all extract calls,
+To change the default for all extract calls:
 ```python
->>> extract = tldextract.TLDExtract( include_psl_private_domains=True)
+>>> extract = tldextract.TLDExtract(include_psl_private_domains=True)
 >>> extract('waiterrant.blogspot.com')
 ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)
 ```
@@ -242,7 +246,7 @@ For example:
 extractor = TLDExtract()
 split_url = urllib.parse.urlsplit("https://foo.bar.com:8080")
 split_suffix = extractor.extract_urllib(split_url)
-url_to_crawl = f"{split_url.scheme}://{split_suffix.registered_domain}:{split_url.port}"
+url_to_crawl = f"{split_url.scheme}://{split_suffix.top_domain_under_public_suffix}:{split_url.port}"
 ```
 `tldextract`'s lenient string parsing stance lowers the learning curve of using

{tldextract-5.1.3 → tldextract-5.3.0}/pyproject.toml RENAMED

@@ -89,6 +89,12 @@ strict = true
 [tool.pytest.ini_options]
 addopts = "--doctest-modules"
+filterwarnings = [
+    "ignore:The 'registered_domain' property is deprecated:DeprecationWarning:tldextract.*:"
+]
+[tool.ruff.format]
+docstring-code-format = true
 [tool.ruff.lint]
 select = [
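
The `filterwarnings` entry added in this hunk follows the `action:message:category:module:lineno` layout of Python's warning filters. As a rough illustration of what it suppresses, the sketch below applies a comparable filter with the standard `warnings` module. `LegacyResult` and `count_deprecations` are hypothetical names, not tldextract code, and the module part of the filter is left out because the stand-in class lives in the current module rather than under `tldextract`.

```python
import warnings


class LegacyResult:
    """Hypothetical stand-in for a result type with a deprecated property."""

    @property
    def registered_domain(self) -> str:
        warnings.warn(
            "The 'registered_domain' property is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        return "bbc.co.uk"


def count_deprecations(ignore: bool) -> int:
    """Access the property once and count the warnings that were recorded."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if ignore:
            # The message is a regex matched against the start of the warning text.
            warnings.filterwarnings(
                "ignore",
                message="The 'registered_domain' property is deprecated",
                category=DeprecationWarning,
            )
        _ = LegacyResult().registered_domain
    return len(caught)


print(count_deprecations(ignore=False))
print(count_deprecations(ignore=True))
```

The filter added last is consulted first, which is why the `ignore` rule wins over the earlier `always` rule in this sketch.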

{tldextract-5.1.3 → tldextract-5.3.0}/scripts/release.py RENAMED

@@ -13,10 +13,18 @@ It will:
 Prerequisites:
     - This must be run from the root of the repository.
     - The repo must have a clean git working tree.
-    - The user must have the GITHUB_TOKEN environment variable set to a GitHub personal access token with repository "Contents" read and write permission.
-    - The user will need credentials for the PyPI repository, which the user will be prompted for during the upload step. The user will need to paste the token manually from a password manager or similar.
-    - The CHANGELOG.md file must already contain an entry for the version being released.
-    - Install requirements with: pip install --upgrade --editable '.[release]'
+    - The user must have the `GITHUB_TOKEN` environment variable set to a
+      GitHub personal access token with repository "Contents" read and write
+      permission. To generate, see
+      https://github.com/settings/personal-access-tokens
+    - The user will need an API token for the PyPI repository, which the user
+      will be prompted for during the upload step. The user will need to paste
+      the token manually from a password manager or similar. To generate, see
+      https://pypi.org/manage/account/
+    - The CHANGELOG.md file must already contain an entry for the version being
+      released.
+    - Install requirements with: `pip install --upgrade --editable
+      '.[release]'`
 """
@@ -158,7 +166,7 @@ def create_github_release_draft(token: str, version: str) -> None:
         )
         return
-    print(f'Release created successfully: {response.json()["html_url"]}')
+    print(f"Release created successfully: {response.json()['html_url']}")
     if not changelog_notes:
         print(

{tldextract-5.1.3 → tldextract-5.3.0}/tests/cli_test.py RENAMED

@@ -77,12 +77,16 @@ def test_cli_json_output(
     stdout, stderr = capsys.readouterr()
     assert not stderr
     assert json.loads(stdout) == {
-        "subdomain": "www",
         "domain": "bbc",
-        "suffix": "co.uk",
         "fqdn": "www.bbc.co.uk",
         "ipv4": "",
         "ipv6": "",
         "is_private": False,
         "registered_domain": "bbc.co.uk",
+        "registry_suffix": "co.uk",
+        "reverse_domain_name": "co.uk.bbc.www",
+        "subdomain": "www",
+        "suffix": "co.uk",
+        "top_domain_under_public_suffix": "bbc.co.uk",
+        "top_domain_under_registry_suffix": "bbc.co.uk",
     }

{tldextract-5.1.3 → tldextract-5.3.0}/tests/custom_suffix_test.py RENAMED

@@ -32,12 +32,19 @@ def test_private_extraction() -> None:
     """Test this library's uncached, offline, private domain extraction."""
     tld = tldextract.TLDExtract(cache_dir=tempfile.mkdtemp(), suffix_list_urls=[])
-    assert tld("foo.blogspot.com") == ExtractResult("foo", "blogspot", "com", False)
+    assert tld("foo.blogspot.com") == ExtractResult(
+        subdomain="foo",
+        domain="blogspot",
+        suffix="com",
+        is_private=False,
+        registry_suffix="com",
+    )
     assert tld("foo.blogspot.com", include_psl_private_domains=True) == ExtractResult(
-        "",
-        "foo",
-        "blogspot.com",
-        True,
+        subdomain="",
+        domain="foo",
+        suffix="blogspot.com",
+        is_private=True,
+        registry_suffix="com",
     )

{tldextract-5.1.3 → tldextract-5.3.0}/tests/main_test.py RENAMED

@@ -374,6 +374,42 @@ def test_dns_root_label() -> None:
     )
+def test_top_domain_under_public_suffix() -> None:
+    """Test property `top_domain_under_public_suffix`."""
+    assert (
+        tldextract.extract(
+            "http://www.example.auth.us-east-1.amazoncognito.com",
+            include_psl_private_domains=False,
+        ).top_domain_under_public_suffix
+        == "amazoncognito.com"
+    )
+    assert (
+        tldextract.extract(
+            "http://www.example.auth.us-east-1.amazoncognito.com",
+            include_psl_private_domains=True,
+        ).top_domain_under_public_suffix
+        == "example.auth.us-east-1.amazoncognito.com"
+    )
+def test_top_domain_under_registry_suffix() -> None:
+    """Test property `top_domain_under_registry_suffix`."""
+    assert (
+        tldextract.extract(
+            "http://www.example.auth.us-east-1.amazoncognito.com",
+            include_psl_private_domains=False,
+        ).top_domain_under_registry_suffix
+        == "amazoncognito.com"
+    )
+    assert (
+        tldextract.extract(
+            "http://www.example.auth.us-east-1.amazoncognito.com",
+            include_psl_private_domains=True,
+        ).top_domain_under_registry_suffix
+        == "amazoncognito.com"
+    )
 def test_ipv4() -> None:
     """Test IPv4 addresses."""
     assert_extract(
@@ -415,6 +451,46 @@ def test_ipv4_lookalike() -> None:
     )
+def test_reverse_domain_name_notation() -> None:
+    """Test property `reverse_domain_name`."""
+    assert (
+        tldextract.extract("www.example.com").reverse_domain_name == "com.example.www"
+    )
+    assert (
+        tldextract.extract("www.theregister.co.uk").reverse_domain_name
+        == "co.uk.theregister.www"
+    )
+    assert tldextract.extract("example.com").reverse_domain_name == "com.example"
+    assert (
+        tldextract.extract("theregister.co.uk").reverse_domain_name
+        == "co.uk.theregister"
+    )
+    assert (
+        tldextract.extract("media.forums.theregister.co.uk").reverse_domain_name
+        == "co.uk.theregister.forums.media"
+    )
+    assert (
+        tldextract.extract(
+            "foo.uk.com", include_psl_private_domains=False
+        ).reverse_domain_name
+        == "com.uk.foo"
+    )
+    assert (
+        tldextract.extract(
+            "foo.uk.com", include_psl_private_domains=True
+        ).reverse_domain_name
+        == "uk.com.foo"
+    )
+def test_bad_kwargs_no_way_to_fetch() -> None:
+    """Test an impossible combination of kwargs that disable all ways to fetch data."""
+    with pytest.raises(ValueError, match="disable all ways"):
+        tldextract.TLDExtract(
+            cache_dir=None, suffix_list_urls=(), fallback_to_snapshot=False
+        )
 def test_cache_permission(
     mocker: pytest_mock.MockerFixture, monkeypatch: pytest.MonkeyPatch, tmp_path: Path
 ) -> None:
@@ -486,12 +562,22 @@ def test_include_psl_private_domain_attr() -> None:
     extract_public1 = tldextract.TLDExtract()
     extract_public2 = tldextract.TLDExtract(include_psl_private_domains=False)
     assert extract_private("foo.uk.com") == ExtractResult(
-        subdomain="", domain="foo", suffix="uk.com", is_private=True
+        subdomain="",
+        domain="foo",
+        suffix="uk.com",
+        is_private=True,
+        registry_suffix="com",
     )
     assert (
         extract_public1("foo.uk.com")
         == extract_public2("foo.uk.com")
-        == ExtractResult(subdomain="foo", domain="uk", suffix="com", is_private=False)
+        == ExtractResult(
+            subdomain="foo",
+            domain="uk",
+            suffix="com",
+            is_private=False,
+            registry_suffix="com",
+        )
     )
@@ -514,11 +600,21 @@ def test_global_extract() -> None:
     """
     assert tldextract.extract(
         "blogspot.com", include_psl_private_domains=True
-    ) == ExtractResult(subdomain="", domain="", suffix="blogspot.com", is_private=True)
+    ) == ExtractResult(
+        subdomain="",
+        domain="",
+        suffix="blogspot.com",
+        is_private=True,
+        registry_suffix="com",
+    )
     assert tldextract.extract(
         "foo.blogspot.com", include_psl_private_domains=True
     ) == ExtractResult(
-        subdomain="", domain="foo", suffix="blogspot.com", is_private=True
+        subdomain="",
+        domain="foo",
+        suffix="blogspot.com",
+        is_private=True,
+        registry_suffix="com",
     )
@@ -534,15 +630,26 @@ def test_private_domains_depth() -> None:
         domain="amazonaws",
         suffix="com",
         is_private=False,
+        registry_suffix="com",
     )
     assert tldextract.extract(
         "ap-south-1.amazonaws.com", include_psl_private_domains=True
     ) == ExtractResult(
-        subdomain="ap-south-1", domain="amazonaws", suffix="com", is_private=False
+        subdomain="ap-south-1",
+        domain="amazonaws",
+        suffix="com",
+        is_private=False,
+        registry_suffix="com",
     )
     assert tldextract.extract(
         "amazonaws.com", include_psl_private_domains=True
-    ) == ExtractResult(subdomain="", domain="amazonaws", suffix="com", is_private=False)
+    ) == ExtractResult(
+        subdomain="",
+        domain="amazonaws",
+        suffix="com",
+        is_private=False,
+        registry_suffix="com",
+    )
     assert tldextract.extract(
         "the-quick-brown-fox.cn-north-1.amazonaws.com.cn",
         include_psl_private_domains=True,
@@ -551,16 +658,25 @@ def test_private_domains_depth() -> None:
         domain="amazonaws",
         suffix="com.cn",
         is_private=False,
+        registry_suffix="com.cn",
     )
     assert tldextract.extract(
         "cn-north-1.amazonaws.com.cn", include_psl_private_domains=True
     ) == ExtractResult(
-        subdomain="cn-north-1", domain="amazonaws", suffix="com.cn", is_private=False
+        subdomain="cn-north-1",
+        domain="amazonaws",
+        suffix="com.cn",
+        is_private=False,
+        registry_suffix="com.cn",
     )
     assert tldextract.extract(
         "amazonaws.com.cn", include_psl_private_domains=True
     ) == ExtractResult(
-        subdomain="", domain="amazonaws", suffix="com.cn", is_private=False
+        subdomain="",
+        domain="amazonaws",
+        suffix="com.cn",
+        is_private=False,
+        registry_suffix="com.cn",
     )
     assert tldextract.extract(
         "another.icann.compute.amazonaws.com", include_psl_private_domains=True
@@ -569,6 +685,7 @@ def test_private_domains_depth() -> None:
         domain="another",
         suffix="icann.compute.amazonaws.com",
         is_private=True,
+        registry_suffix="com",
     )
     assert tldextract.extract(
         "another.s3.dualstack.us-east-1.amazonaws.com", include_psl_private_domains=True
@@ -577,12 +694,17 @@ def test_private_domains_depth() -> None:
         domain="another",
         suffix="s3.dualstack.us-east-1.amazonaws.com",
         is_private=True,
+        registry_suffix="com",
     )
     assert tldextract.extract(
         "s3.ap-south-1.amazonaws.com", include_psl_private_domains=True
     ) == ExtractResult(
-        subdomain="", domain="", suffix="s3.ap-south-1.amazonaws.com", is_private=True
+        subdomain="",
+        domain="",
+        suffix="s3.ap-south-1.amazonaws.com",
+        is_private=True,
+        registry_suffix="com",
     )
     assert tldextract.extract(
         "s3.cn-north-1.amazonaws.com.cn", include_psl_private_domains=True
@@ -591,11 +713,16 @@ def test_private_domains_depth() -> None:
         domain="",
         suffix="s3.cn-north-1.amazonaws.com.cn",
         is_private=True,
+        registry_suffix="com.cn",
     )
     assert tldextract.extract(
         "icann.compute.amazonaws.com", include_psl_private_domains=True
     ) == ExtractResult(
-        subdomain="", domain="", suffix="icann.compute.amazonaws.com", is_private=True
+        subdomain="",
+        domain="",
+        suffix="icann.compute.amazonaws.com",
+        is_private=True,
+        registry_suffix="com",
     )
     # Entire URL is private suffix which ends with another private suffix
@@ -607,4 +734,5 @@ def test_private_domains_depth() -> None:
         domain="",
         suffix="s3.dualstack.us-east-1.amazonaws.com",
         is_private=True,
+        registry_suffix="com",
     )
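
The assertions in this file pin down how `suffix`, `registry_suffix`, and `reverse_domain_name` relate to each other. The sketch below reproduces a few of those expectations with a hand-written suffix table. It is an illustration only: it ignores wildcard and exception rules and is not tldextract's trie-based implementation. `TOY_SUFFIXES`, `longest_suffix`, `split_host`, and `reverse_name` are hypothetical names, and the table holds only the entries needed here.

```python
# Toy suffix table: suffix -> True if it is a private (non-ICANN) entry.
# Assumption: a handful of entries, enough for this illustration only.
TOY_SUFFIXES = {"com": False, "uk": False, "co.uk": False, "uk.com": True}


def longest_suffix(labels: list[str], allow_private: bool) -> str:
    """Return the longest trailing run of labels found in the toy table."""
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        is_private = TOY_SUFFIXES.get(candidate)
        if is_private is None or (is_private and not allow_private):
            continue
        return candidate
    return ""


def split_host(host: str, include_private: bool = False) -> dict[str, str]:
    """Split a hostname into subdomain, domain, suffix, and registry suffix."""
    labels = host.split(".")
    suffix = longest_suffix(labels, allow_private=include_private)
    # The registry suffix never considers private entries.
    registry_suffix = longest_suffix(labels, allow_private=False)
    rest = labels[: len(labels) - suffix.count(".") - 1] if suffix else labels
    return {
        "subdomain": ".".join(rest[:-1]),
        "domain": rest[-1] if rest else "",
        "suffix": suffix,
        "registry_suffix": registry_suffix,
    }


def reverse_name(parts: dict[str, str]) -> str:
    """Join suffix, domain, then subdomain labels in reverse order."""
    stack = [parts["suffix"], parts["domain"]]
    if parts["subdomain"]:
        stack.extend(reversed(parts["subdomain"].split(".")))
    return ".".join(stack)


print(split_host("foo.uk.com"))
print(split_host("foo.uk.com", include_private=True))
print(reverse_name(split_host("media.forums.theregister.co.uk")))
```

With private entries enabled, `suffix` grows to `uk.com` while `registry_suffix` stays `com`, which matches the pattern the updated `ExtractResult` comparisons expect.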
