PyPI - linktrace - Versions diffs - 0.2.0.tar.gz → 0.2.1.tar.gz - Mend

linktrace 0.2.0.tar.gz → 0.2.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

{linktrace-0.2.0 → linktrace-0.2.1}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: linktrace
-Version: 0.2.0
+Version: 0.2.1
 Summary: Async web crawler with rate limiting, robots.txt support, and broken link tracking
 License-File: LICENSE
 Requires-Python: >=3.12
@@ -123,7 +123,7 @@ spider = Spider(
 ```python
 spider = Spider(
     start_url="https://example.com",
-    cache_dir=".webcrawler_cache"  # Enable disk caching (default: None/disabled)
+    cache_dir=".linktrace_cache"  # Enable disk caching (default: None/disabled)
 )
 # 2nd run will be 10-50x faster for same URLs
 ```
@@ -352,13 +352,23 @@ Spider (orchestrator)
       └─ CookieJar (automatic cookie handling)
 ```
-Spider manages the crawl queue and traversal. Crawler handles individual document fetching/parsing. All requests share one persistent aiohttp session per Spider instance.
+Spider manages the crawl queue and traversal. Crawler handles individual document fetching/parsing. All requests share one persistent aiohttp session per Spider instance, so connection pooling, cookies, SSL configuration, and DNS caching are reused across the crawl.
 ## Why linktrace?
-**vs Scrapy:** Lightweight, focused, simpler API for link analysis. Scrapy is better for complex extraction pipelines.
+Scrapy is an excellent full crawling and extraction framework. `linktrace` is designed for a narrower job: fast async link analysis with minimal setup.
-**vs requests + BeautifulSoup:** Built-in async concurrency, automatic session reuse, retries, caching. Better for crawling multiple pages.
+Instead of building a Scrapy project around spiders, requests, responses, callbacks, items, pipelines, middleware, and settings, `linktrace` gives you a direct document-centric API. Each crawled URL becomes a `Document` object containing the page source, title, status code, response headers, domain, internal links, external links, and crawl status metadata.
+That makes `linktrace` useful when your goal is to inspect site structure, trace links, audit crawl status, or export crawl results to dataframe-oriented tools without creating a larger scraping project.
+`linktrace` also reuses a persistent `aiohttp` session during a crawl. Connection pooling, cookie reuse, SSL configuration, request timeouts, per-host limits, and DNS caching are carried across requests, which can make repeated same-domain crawls much faster than creating a fresh client/session per URL.
+**Use Scrapy when:** you need a mature scraping framework with item pipelines, middleware, schedulers, broad ecosystem support, and complex extraction workflows.
+**Use linktrace when:** you want a focused async crawler that turns URLs into analyzable `Document` objects with automatic link classification and simple exports.
+**vs requests + BeautifulSoup:** Built-in async concurrency, automatic session reuse, retries, caching, rate limiting, and structured document objects. Better for crawling multiple pages.
 **vs Selenium:** Pure HTTP crawler (no JS execution). Faster, lighter, but can't handle dynamic sites.
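The 0.2.1 text says each `Document` carries separate internal and external link lists. This diff does not show how linktrace decides which is which, so the sketch below assumes one plausible rule: a link is internal when its resolved hostname equals the page's hostname. `classify_links` is a hypothetical helper built only on `urllib.parse`, not linktrace's implementation.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def classify_links(page_url: str, hrefs: list[str]) -> tuple[list[str], list[str]]:
    """Split raw href values into (internal, external) absolute URLs.

    Hypothetical helper: "internal" is ASSUMED to mean same hostname as page_url.
    """
    page_host = urlsplit(page_url).hostname
    internal: list[str] = []
    external: list[str] = []
    seen: set[str] = set()
    for href in hrefs:
        # Resolve relative links against the page URL, then drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(page_url, href.strip()))
        parts = urlsplit(absolute)
        # Skip non-HTTP schemes (mailto:, javascript:, tel:) and duplicates.
        if parts.scheme not in ("http", "https") or absolute in seen:
            continue
        seen.add(absolute)
        if parts.hostname == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Under this rule, `www.example.com` and `example.com` count as different hosts. A real crawler might instead compare registered domains, which needs a public-suffix list.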

{linktrace-0.2.0 → linktrace-0.2.1}/README.md RENAMED

@@ -100,7 +100,7 @@ spider = Spider(
 ```python
 spider = Spider(
     start_url="https://example.com",
-    cache_dir=".webcrawler_cache"  # Enable disk caching (default: None/disabled)
+    cache_dir=".linktrace_cache"  # Enable disk caching (default: None/disabled)
 )
 # 2nd run will be 10-50x faster for same URLs
 ```
@@ -329,13 +329,23 @@ Spider (orchestrator)
       └─ CookieJar (automatic cookie handling)
 ```
-Spider manages the crawl queue and traversal. Crawler handles individual document fetching/parsing. All requests share one persistent aiohttp session per Spider instance.
+Spider manages the crawl queue and traversal. Crawler handles individual document fetching/parsing. All requests share one persistent aiohttp session per Spider instance, so connection pooling, cookies, SSL configuration, and DNS caching are reused across the crawl.
 ## Why linktrace?
-**vs Scrapy:** Lightweight, focused, simpler API for link analysis. Scrapy is better for complex extraction pipelines.
+Scrapy is an excellent full crawling and extraction framework. `linktrace` is designed for a narrower job: fast async link analysis with minimal setup.
-**vs requests + BeautifulSoup:** Built-in async concurrency, automatic session reuse, retries, caching. Better for crawling multiple pages.
+Instead of building a Scrapy project around spiders, requests, responses, callbacks, items, pipelines, middleware, and settings, `linktrace` gives you a direct document-centric API. Each crawled URL becomes a `Document` object containing the page source, title, status code, response headers, domain, internal links, external links, and crawl status metadata.
+That makes `linktrace` useful when your goal is to inspect site structure, trace links, audit crawl status, or export crawl results to dataframe-oriented tools without creating a larger scraping project.
+`linktrace` also reuses a persistent `aiohttp` session during a crawl. Connection pooling, cookie reuse, SSL configuration, request timeouts, per-host limits, and DNS caching are carried across requests, which can make repeated same-domain crawls much faster than creating a fresh client/session per URL.
+**Use Scrapy when:** you need a mature scraping framework with item pipelines, middleware, schedulers, broad ecosystem support, and complex extraction workflows.
+**Use linktrace when:** you want a focused async crawler that turns URLs into analyzable `Document` objects with automatic link classification and simple exports.
+**vs requests + BeautifulSoup:** Built-in async concurrency, automatic session reuse, retries, caching, rate limiting, and structured document objects. Better for crawling multiple pages.
 **vs Selenium:** Pure HTTP crawler (no JS execution). Faster, lighter, but can't handle dynamic sites.
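The README positions linktrace for auditing crawl status and exporting results to dataframe-oriented tools, and the package summary mentions broken link tracking. The sketch below shows one way crawl results could be flattened into one row per link for that kind of analysis. `DocumentRecord`, `link_rows` and `broken_links` are hypothetical stand-ins; linktrace's real `Document` attributes are not shown in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """Hypothetical stand-in for linktrace's Document; field names are ASSUMED."""
    url: str
    status_code: int
    title: str = ""
    internal_links: list[str] = field(default_factory=list)
    external_links: list[str] = field(default_factory=list)


def link_rows(docs: list[DocumentRecord]) -> list[dict[str, object]]:
    """Flatten documents into one row per (source page, target link) edge."""
    status_by_url = {d.url: d.status_code for d in docs}
    rows: list[dict[str, object]] = []
    for doc in docs:
        for kind, links in (("internal", doc.internal_links), ("external", doc.external_links)):
            for target in links:
                rows.append({
                    "source": doc.url,
                    "target": target,
                    "kind": kind,
                    # None when the target was never crawled (e.g. external links).
                    "target_status": status_by_url.get(target),
                })
    return rows


def broken_links(rows: list[dict[str, object]]) -> list[dict[str, object]]:
    """Keep rows whose crawled target returned an HTTP 4xx/5xx status."""
    return [r for r in rows if isinstance(r["target_status"], int) and r["target_status"] >= 400]
```

Because each row is a plain dict, a list of them can be passed directly to `pandas.DataFrame(rows)` for grouping by `source` or `kind`.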

{linktrace-0.2.0 → linktrace-0.2.1}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 [project]
 name = "linktrace"
-version = "0.2.0"
+version = "0.2.1"
 description = "Async web crawler with rate limiting, robots.txt support, and broken link tracking"
 readme = "README.md"
 requires-python = ">=3.12"
@@ -44,7 +44,7 @@ dev = [
 packages = ["linktrace"]
 [tool.ruff]
-target-version = "py312"
+target-version = "0.2.1"
 line-length = 88
 [tool.ruff.lint]
@@ -82,7 +82,7 @@ precision = 2
 directory = "htmlcov"
 [tool.mypy]
-python_version = "3.12"
+python_version = "0.2.1"
 warn_return_any = true
 warn_unused_configs = true
 disallow_untyped_defs = false

{linktrace-0.2.0 → linktrace-0.2.1}/uv.lock RENAMED

@@ -796,7 +796,7 @@ wheels = [
 [[package]]
 name = "linktrace"
-version = "0.1.2"
+version = "0.2.0"
 source = { editable = "." }
 dependencies = [
     { name = "aiofiles" },