web-distiller 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,89 @@
+ Metadata-Version: 2.4
+ Name: web-distiller
+ Version: 0.1.0
+ Summary: Python client for Distiller — agent-ready web extraction with AI cleaning
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/nickandbro/distiller
+ Project-URL: Documentation, https://webdistiller.dev/docs
+ Project-URL: Source, https://github.com/nickandbro/distiller
+ Project-URL: Issues, https://github.com/nickandbro/distiller/issues
+ Keywords: distiller,web-distiller,markdown,agents,llm,web-scraping,url-to-markdown
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Internet :: WWW/HTTP
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: Text Processing :: Markup :: Markdown
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: httpx>=0.27.0
+
+ # web-distiller
+
+ Python client for the [Distiller](https://webdistiller.dev) API — agent-ready web extraction with AI cleaning.
+
+ `pip install web-distiller` installs the client for the hosted Distiller API. It does not install or run the Distiller server.
+
+ Use it with:
+ - free-tier API keys (Trafilatura cleaning),
+ - the Starter paid plan (Stripe-backed AI cleaning),
+ - or alternate x402 payment-compatible deployments.
+
+ ## Install
+
+ ```bash
+ pip install web-distiller
+ export DISTILLER_API_BASE=https://webdistiller.dev
+ ```
+
+ ## Quick start
+
+ ```python
+ from distiller_client import DistillerClient
+
+ client = DistillerClient(api_key="your-key")
+ result = client.markdown("https://example.com")
+ print(result["cleaned_markdown"])
+ ```
+
+ Or as a one-liner:
+
+ ```python
+ import distiller_client
+
+ print(distiller_client.markdown("https://example.com")["cleaned_markdown"])
+ ```
+
+ ## CLI
+
+ ```bash
+ web-distiller https://example.com
+ web-distiller https://example.com --format text
+ web-distiller https://example.com --use-browser
+ web-distiller https://example.com --force-refresh
+ ```
+
+ ## Configuration
+
+ | Env var | Purpose |
+ |---------|---------|
+ | `DISTILLER_API_BASE` | API base URL (default: production) |
+ | `DISTILLER_API_KEY` | Bearer token for authenticated access |
+ | `DISTILLER_REFERRER_TOKEN` | Referral attribution token |
+
+ All env vars can also be passed as constructor arguments.
+
+ Typical flow:
+ - use `web-distiller` without an API key for open/demo deployments,
+ - register for an API key when you want the free 1,000-call tier,
+ - upgrade that account to Starter through Stripe when you want AI cleaning and `/extract`.
+
+ ## Releases
+
+ `web-distiller` is released to PyPI from GitHub Actions via Trusted Publishing.
+
+ - Use the `.github/workflows/publish-web-distiller.yml` workflow for manual publishes.
+ - Or push a tag in the format `web-distiller-vX.Y.Z` after updating `distiller-client/pyproject.toml`.
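The Configuration section above says every environment variable can also be passed as a constructor argument. A minimal sketch of the resulting precedence, assuming an explicit argument wins over the environment and the environment wins over the built-in default (the same `value or os.getenv(...)` pattern the package source uses). `resolve_setting` is a hypothetical helper for illustration, not part of `distiller_client`:

```python
from __future__ import annotations

import os

DEFAULT_BASE_URL = "https://webdistiller.dev"


def resolve_setting(
    explicit: str | None,
    env_name: str,
    default: str | None = None,
    env: dict[str, str] | None = None,
) -> str | None:
    """Return the explicit value if truthy, else the env var, else the default."""
    source = os.environ if env is None else env
    return explicit or source.get(env_name, default)


# A stand-in environment so the example does not depend on the real one.
fake_env = {"DISTILLER_API_BASE": "http://localhost:8000"}

# Explicit argument beats the environment.
print(resolve_setting("https://staging.example", "DISTILLER_API_BASE", DEFAULT_BASE_URL, fake_env))
# Environment beats the default.
print(resolve_setting(None, "DISTILLER_API_BASE", DEFAULT_BASE_URL, fake_env))
# Nothing set: fall back to the default.
print(resolve_setting(None, "DISTILLER_API_BASE", DEFAULT_BASE_URL, {}))
```

Passing the environment in as a plain dictionary keeps the sketch easy to test; the real client reads `os.environ` directly.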
@@ -0,0 +1,66 @@
+ # web-distiller
+
+ Python client for the [Distiller](https://webdistiller.dev) API — agent-ready web extraction with AI cleaning.
+
+ `pip install web-distiller` installs the client for the hosted Distiller API. It does not install or run the Distiller server.
+
+ Use it with:
+ - free-tier API keys (Trafilatura cleaning),
+ - the Starter paid plan (Stripe-backed AI cleaning),
+ - or alternate x402 payment-compatible deployments.
+
+ ## Install
+
+ ```bash
+ pip install web-distiller
+ export DISTILLER_API_BASE=https://webdistiller.dev
+ ```
+
+ ## Quick start
+
+ ```python
+ from distiller_client import DistillerClient
+
+ client = DistillerClient(api_key="your-key")
+ result = client.markdown("https://example.com")
+ print(result["cleaned_markdown"])
+ ```
+
+ Or as a one-liner:
+
+ ```python
+ import distiller_client
+
+ print(distiller_client.markdown("https://example.com")["cleaned_markdown"])
+ ```
+
+ ## CLI
+
+ ```bash
+ web-distiller https://example.com
+ web-distiller https://example.com --format text
+ web-distiller https://example.com --use-browser
+ web-distiller https://example.com --force-refresh
+ ```
+
+ ## Configuration
+
+ | Env var | Purpose |
+ |---------|---------|
+ | `DISTILLER_API_BASE` | API base URL (default: production) |
+ | `DISTILLER_API_KEY` | Bearer token for authenticated access |
+ | `DISTILLER_REFERRER_TOKEN` | Referral attribution token |
+
+ All env vars can also be passed as constructor arguments.
+
+ Typical flow:
+ - use `web-distiller` without an API key for open/demo deployments,
+ - register for an API key when you want the free 1,000-call tier,
+ - upgrade that account to Starter through Stripe when you want AI cleaning and `/extract`.
+
+ ## Releases
+
+ `web-distiller` is released to PyPI from GitHub Actions via Trusted Publishing.
+
+ - Use the `.github/workflows/publish-web-distiller.yml` workflow for manual publishes.
+ - Or push a tag in the format `web-distiller-vX.Y.Z` after updating `distiller-client/pyproject.toml`.
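The tag convention in the Releases list above can be checked mechanically before pushing. A small sketch, assuming the tag must be exactly `web-distiller-v` followed by the three-part version declared in `pyproject.toml`. `tag_matches_version` is a hypothetical helper; the actual workflow is not part of this diff and may validate tags differently, or not at all:

```python
import re

# Assumed shape of a release tag: web-distiller-vX.Y.Z with numeric parts.
TAG_PATTERN = re.compile(r"web-distiller-v(\d+\.\d+\.\d+)")


def tag_matches_version(tag: str, project_version: str) -> bool:
    """True when the tag is well formed and names the given project version."""
    match = TAG_PATTERN.fullmatch(tag)
    return match is not None and match.group(1) == project_version


print(tag_matches_version("web-distiller-v0.1.0", "0.1.0"))  # True
print(tag_matches_version("v0.1.0", "0.1.0"))                # False: wrong prefix
print(tag_matches_version("web-distiller-v0.1.1", "0.1.0"))  # False: version drift
```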
@@ -0,0 +1,38 @@
+ [project]
+ name = "web-distiller"
+ version = "0.1.0"
+ description = "Python client for Distiller — agent-ready web extraction with AI cleaning"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ license = "MIT"
+ keywords = ["distiller", "web-distiller", "markdown", "agents", "llm", "web-scraping", "url-to-markdown"]
+ classifiers = [
+     "Development Status :: 4 - Beta",
+     "Intended Audience :: Developers",
+     "Programming Language :: Python :: 3",
+     "Programming Language :: Python :: 3.10",
+     "Programming Language :: Python :: 3.11",
+     "Programming Language :: Python :: 3.12",
+     "Topic :: Internet :: WWW/HTTP",
+     "Topic :: Software Development :: Libraries :: Python Modules",
+     "Topic :: Text Processing :: Markup :: Markdown",
+ ]
+ dependencies = [
+     "httpx>=0.27.0",
+ ]
+
+ [project.urls]
+ Homepage = "https://github.com/nickandbro/distiller"
+ Documentation = "https://webdistiller.dev/docs"
+ Source = "https://github.com/nickandbro/distiller"
+ Issues = "https://github.com/nickandbro/distiller/issues"
+
+ [project.scripts]
+ web-distiller = "distiller_client:main"
+
+ [build-system]
+ requires = ["setuptools>=70.0"]
+ build-backend = "setuptools.build_meta"
+
+ [tool.setuptools.packages.find]
+ where = ["src"]
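The `[project.scripts]` entry above uses the `module:attribute` form: the installed `web-distiller` command imports `distiller_client` and calls its `main`. A rough sketch of that lookup, demonstrated on a standard-library target so it runs without the package installed. `load_entry_point_target` is an illustrative name; real installers generate a wrapper script rather than calling a helper like this one:

```python
import importlib
import json
from typing import Any


def load_entry_point_target(spec: str) -> Any:
    """Resolve a 'module:attr' (or 'module:attr.sub') spec to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = importlib.import_module(module_name)
    if attr_path:
        # Walk dotted attributes, e.g. "pkg.mod:Class.method".
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


# "json:dumps" stands in for "distiller_client:main" here.
target = load_entry_point_target("json:dumps")
print(target is json.dumps)
```

With the package installed, the same call with `"distiller_client:main"` would return the CLI's `main` function.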
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,173 @@
+ """Lightweight Python client for the Distiller API."""
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ import os
+ from typing import Any
+
+ import httpx
+
+ __all__ = ["DistillerClient", "distill", "markdown", "main"]
+
+ DEFAULT_BASE_URL = "https://webdistiller.dev"
+
+
+ class DistillerClient:
+     """Minimal client for the Distiller web-extraction API.
+
+     Usage::
+
+         client = DistillerClient(api_key="your-key")
+         result = client.markdown("https://example.com")
+         print(result["cleaned_markdown"])
+     """
+
+     def __init__(
+         self,
+         *,
+         api_key: str | None = None,
+         base_url: str | None = None,
+         referrer_token: str | None = None,
+         timeout: float = 30.0,
+     ) -> None:
+         self.base_url = (base_url or os.getenv("DISTILLER_API_BASE", DEFAULT_BASE_URL)).rstrip("/")
+         self.api_key = api_key or os.getenv("DISTILLER_API_KEY")
+         self.referrer_token = referrer_token or os.getenv("DISTILLER_REFERRER_TOKEN")
+         self.timeout = timeout
+
+     def _headers(self) -> dict[str, str]:
+         headers: dict[str, str] = {}
+         if self.api_key:
+             headers["Authorization"] = f"Bearer {self.api_key}"
+         if self.referrer_token:
+             headers["X-Distiller-Referrer"] = self.referrer_token
+         return headers
+
+     def markdown(
+         self,
+         url: str,
+         *,
+         use_browser: bool = False,
+         force_refresh: bool = False,
+         force_refetch: bool = False,
+     ) -> dict[str, Any]:
+         """Fetch cleaned Markdown for *url*."""
+         return self._post(
+             "/markdown", url,
+             use_browser=use_browser,
+             force_refresh=force_refresh,
+             force_refetch=force_refetch,
+         )
+
+     def distill(
+         self,
+         url: str,
+         *,
+         use_browser: bool = False,
+         force_refresh: bool = False,
+         force_refetch: bool = False,
+     ) -> dict[str, Any]:
+         """Fetch full distillation (HTML + Markdown + text) for *url*."""
+         return self._post(
+             "/distill", url,
+             use_browser=use_browser,
+             force_refresh=force_refresh,
+             force_refetch=force_refetch,
+         )
+
+     def start_paid_ai_checkout(self) -> dict[str, Any]:
+         """Create a Stripe checkout session for paid AI tier."""
+         resp = httpx.post(
+             f"{self.base_url}/billing/checkout",
+             headers=self._headers(),
+             timeout=self.timeout,
+         )
+         resp.raise_for_status()
+         return resp.json()
+
+     def _post(
+         self,
+         endpoint: str,
+         url: str,
+         *,
+         use_browser: bool = False,
+         force_refresh: bool = False,
+         force_refetch: bool = False,
+     ) -> dict[str, Any]:
+         headers = self._headers()
+         body: dict[str, Any] = {"url": url, "use_browser": use_browser}
+         if force_refresh:
+             body["force_refresh"] = True
+         if force_refetch:
+             body["force_refetch"] = True
+         resp = httpx.post(
+             f"{self.base_url}{endpoint}",
+             json=body,
+             headers=headers,
+             timeout=self.timeout,
+         )
+         try:
+             resp.raise_for_status()
+         except httpx.HTTPStatusError as exc:
+             if exc.response is not None and exc.response.status_code == 402:
+                 raise RuntimeError("Payment required (402). Configure paid auth or provide payment headers.") from exc
+             raise
+         return resp.json()
+
+
+ def distill(url: str, **kwargs: Any) -> dict[str, Any]:
+     """One-shot helper: ``distiller_client.distill("https://…")``."""
+     return DistillerClient(**kwargs).distill(url)
+
+
+ def markdown(url: str, **kwargs: Any) -> dict[str, Any]:
+     """One-shot helper: ``distiller_client.markdown("https://…")``."""
+     return DistillerClient(**kwargs).markdown(url)
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(
+         prog="web-distiller",
+         description="Fetch cleaned content from the Distiller API.",
+     )
+     parser.add_argument("url", help="URL to distill")
+     parser.add_argument("--api-base", default=None, help="Distiller API base URL")
+     parser.add_argument("--api-key", default=None, help="API key (or set DISTILLER_API_KEY)")
+     parser.add_argument("--use-browser", action="store_true", help="Force browser rendering")
+     parser.add_argument("--force-refresh", action="store_true", help="Bypass response cache (content dedup still applies)")
+     parser.add_argument("--force-refetch", action="store_true", help="Bypass ALL caches and re-fetch from upstream (always billed)")
+     parser.add_argument(
+         "--format",
+         choices=("html", "markdown", "text", "json"),
+         default="markdown",
+         help="Output format (default: markdown)",
+     )
+     args = parser.parse_args()
+
+     client = DistillerClient(api_key=args.api_key, base_url=args.api_base)
+
+     call_kwargs = {
+         "use_browser": args.use_browser,
+         "force_refresh": args.force_refresh,
+         "force_refetch": args.force_refetch,
+     }
+
+     if args.format in ("html", "json"):
+         result = client.distill(args.url, **call_kwargs)
+     else:
+         result = client.markdown(args.url, **call_kwargs)
+
+     if args.format == "html":
+         print(result["cleaned_html"])
+     elif args.format == "markdown":
+         print(result["cleaned_markdown"])
+     elif args.format == "text":
+         print(result["cleaned_text"])
+     else:
+         print(json.dumps(result, indent=2))
+
+
+ if __name__ == "__main__":
+     main()
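To make the wire format concrete, the sketch below restates as pure functions what `_headers` and `_post` above assemble before calling `httpx.post`. The names `build_headers` and `build_body` are illustrative and do not exist in the package; the header names and body keys are taken from the source above.

```python
from __future__ import annotations

from typing import Any


def build_headers(api_key: str | None = None, referrer_token: str | None = None) -> dict[str, str]:
    """Mirror of the header logic: each header is sent only when its value is set."""
    headers: dict[str, str] = {}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if referrer_token:
        headers["X-Distiller-Referrer"] = referrer_token
    return headers


def build_body(
    url: str,
    *,
    use_browser: bool = False,
    force_refresh: bool = False,
    force_refetch: bool = False,
) -> dict[str, Any]:
    """Mirror of the JSON body: cache-bypass flags appear only when True."""
    body: dict[str, Any] = {"url": url, "use_browser": use_browser}
    if force_refresh:
        body["force_refresh"] = True
    if force_refetch:
        body["force_refetch"] = True
    return body


print(build_headers(api_key="your-key"))
# {'Authorization': 'Bearer your-key'}
print(build_body("https://example.com", force_refresh=True))
# {'url': 'https://example.com', 'use_browser': False, 'force_refresh': True}
```

Note that `use_browser` is always present in the body, while `force_refresh` and `force_refetch` are omitted unless set, and an unauthenticated client sends no `Authorization` header at all.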
@@ -0,0 +1,89 @@
+ Metadata-Version: 2.4
+ Name: web-distiller
+ Version: 0.1.0
+ Summary: Python client for Distiller — agent-ready web extraction with AI cleaning
+ License-Expression: MIT
+ Project-URL: Homepage, https://github.com/nickandbro/distiller
+ Project-URL: Documentation, https://webdistiller.dev/docs
+ Project-URL: Source, https://github.com/nickandbro/distiller
+ Project-URL: Issues, https://github.com/nickandbro/distiller/issues
+ Keywords: distiller,web-distiller,markdown,agents,llm,web-scraping,url-to-markdown
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Topic :: Internet :: WWW/HTTP
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
+ Classifier: Topic :: Text Processing :: Markup :: Markdown
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: httpx>=0.27.0
+
+ # web-distiller
+
+ Python client for the [Distiller](https://webdistiller.dev) API — agent-ready web extraction with AI cleaning.
+
+ `pip install web-distiller` installs the client for the hosted Distiller API. It does not install or run the Distiller server.
+
+ Use it with:
+ - free-tier API keys (Trafilatura cleaning),
+ - the Starter paid plan (Stripe-backed AI cleaning),
+ - or alternate x402 payment-compatible deployments.
+
+ ## Install
+
+ ```bash
+ pip install web-distiller
+ export DISTILLER_API_BASE=https://webdistiller.dev
+ ```
+
+ ## Quick start
+
+ ```python
+ from distiller_client import DistillerClient
+
+ client = DistillerClient(api_key="your-key")
+ result = client.markdown("https://example.com")
+ print(result["cleaned_markdown"])
+ ```
+
+ Or as a one-liner:
+
+ ```python
+ import distiller_client
+
+ print(distiller_client.markdown("https://example.com")["cleaned_markdown"])
+ ```
+
+ ## CLI
+
+ ```bash
+ web-distiller https://example.com
+ web-distiller https://example.com --format text
+ web-distiller https://example.com --use-browser
+ web-distiller https://example.com --force-refresh
+ ```
+
+ ## Configuration
+
+ | Env var | Purpose |
+ |---------|---------|
+ | `DISTILLER_API_BASE` | API base URL (default: production) |
+ | `DISTILLER_API_KEY` | Bearer token for authenticated access |
+ | `DISTILLER_REFERRER_TOKEN` | Referral attribution token |
+
+ All env vars can also be passed as constructor arguments.
+
+ Typical flow:
+ - use `web-distiller` without an API key for open/demo deployments,
+ - register for an API key when you want the free 1,000-call tier,
+ - upgrade that account to Starter through Stripe when you want AI cleaning and `/extract`.
+
+ ## Releases
+
+ `web-distiller` is released to PyPI from GitHub Actions via Trusted Publishing.
+
+ - Use the `.github/workflows/publish-web-distiller.yml` workflow for manual publishes.
+ - Or push a tag in the format `web-distiller-vX.Y.Z` after updating `distiller-client/pyproject.toml`.
@@ -0,0 +1,9 @@
+ README.md
+ pyproject.toml
+ src/distiller_client/__init__.py
+ src/web_distiller.egg-info/PKG-INFO
+ src/web_distiller.egg-info/SOURCES.txt
+ src/web_distiller.egg-info/dependency_links.txt
+ src/web_distiller.egg-info/entry_points.txt
+ src/web_distiller.egg-info/requires.txt
+ src/web_distiller.egg-info/top_level.txt
@@ -0,0 +1,2 @@
+ [console_scripts]
+ web-distiller = distiller_client:main
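The console script above resolves to `main`, which picks an endpoint and an output field from `--format`. The sketch below restates that dispatch as read from the source in this diff. `plan_for_format` is a hypothetical name, and whether the `/markdown` response actually carries a `cleaned_text` field depends on the server, which this diff does not show.

```python
from __future__ import annotations

FORMATS = ("html", "markdown", "text", "json")

# Response field printed for each format; "json" has no entry and dumps everything.
RESULT_KEYS = {
    "html": "cleaned_html",
    "markdown": "cleaned_markdown",
    "text": "cleaned_text",
}


def plan_for_format(fmt: str) -> tuple[str, str | None]:
    """Return (endpoint, result key) for a --format value.

    A key of None means the full JSON result is printed.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    endpoint = "/distill" if fmt in ("html", "json") else "/markdown"
    return endpoint, RESULT_KEYS.get(fmt)


for fmt in FORMATS:
    print(fmt, plan_for_format(fmt))
```

Only `html` and `json` go through `/distill`; the default `markdown` and `text` both use the lighter `/markdown` endpoint.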
@@ -0,0 +1 @@
+ httpx>=0.27.0
@@ -0,0 +1 @@
+ distiller_client