PyPI - linksanity - Versions diffs - 0.1.0__tar.gz - Mend

linksanity 0.1.0__tar.gz

Files changed (64)

linksanity-0.1.0/.github/ISSUE_TEMPLATE/bug_report.yml +43 -0
linksanity-0.1.0/.github/ISSUE_TEMPLATE/feature_request.yml +24 -0
linksanity-0.1.0/.github/dependabot.yml +13 -0
linksanity-0.1.0/.github/workflows/ci.yml +58 -0
linksanity-0.1.0/.gitignore +31 -0
linksanity-0.1.0/CONTRIBUTING.md +52 -0
linksanity-0.1.0/PKG-INFO +436 -0
linksanity-0.1.0/README.md +392 -0
linksanity-0.1.0/linksanity/__init__.py +3 -0
linksanity-0.1.0/linksanity/__main__.py +3 -0
linksanity-0.1.0/linksanity/checkers/__init__.py +0 -0
linksanity-0.1.0/linksanity/checkers/filesystem.py +136 -0
linksanity-0.1.0/linksanity/checkers/http.py +171 -0
linksanity-0.1.0/linksanity/checkers/playwright.py +228 -0
linksanity-0.1.0/linksanity/cli.py +254 -0
linksanity-0.1.0/linksanity/config.py +104 -0
linksanity-0.1.0/linksanity/crawler.py +125 -0
linksanity-0.1.0/linksanity/parsers/__init__.py +0 -0
linksanity-0.1.0/linksanity/parsers/html.py +42 -0
linksanity-0.1.0/linksanity/parsers/markdown.py +48 -0
linksanity-0.1.0/linksanity/parsers/rst.py +53 -0
linksanity-0.1.0/linksanity/py.typed +0 -0
linksanity-0.1.0/linksanity/queue.py +72 -0
linksanity-0.1.0/linksanity/reporters/__init__.py +26 -0
linksanity-0.1.0/linksanity/reporters/console.py +78 -0
linksanity-0.1.0/linksanity/reporters/csv_reporter.py +39 -0
linksanity-0.1.0/linksanity/reporters/github_reporter.py +108 -0
linksanity-0.1.0/linksanity/reporters/json_reporter.py +28 -0
linksanity-0.1.0/linksanity/reporters/markdown_reporter.py +68 -0
linksanity-0.1.0/linksanity/router.py +72 -0
linksanity-0.1.0/linksanity/scanner.py +77 -0
linksanity-0.1.0/pyproject.toml +95 -0
linksanity-0.1.0/tests/__init__.py +0 -0
linksanity-0.1.0/tests/fixtures/docs/broken.md +7 -0
linksanity-0.1.0/tests/fixtures/docs/external.md +5 -0
linksanity-0.1.0/tests/fixtures/docs/guide.md +5 -0
linksanity-0.1.0/tests/fixtures/docs/index.md +9 -0
linksanity-0.1.0/tests/fixtures/linksanity.toml +6 -0
linksanity-0.1.0/tests/fixtures/sample.html +16 -0
linksanity-0.1.0/tests/fixtures/sample.md +24 -0
linksanity-0.1.0/tests/fixtures/sample.rst +22 -0
linksanity-0.1.0/tests/fixtures/site/index.html +10 -0
linksanity-0.1.0/tests/fixtures/site/page2.html +9 -0
linksanity-0.1.0/tests/integration/__init__.py +0 -0
linksanity-0.1.0/tests/integration/test_crawl_e2e.py +105 -0
linksanity-0.1.0/tests/integration/test_playwright.py +105 -0
linksanity-0.1.0/tests/integration/test_scan_e2e.py +227 -0
linksanity-0.1.0/tests/unit/__init__.py +0 -0
linksanity-0.1.0/tests/unit/test_checkers/__init__.py +0 -0
linksanity-0.1.0/tests/unit/test_checkers/test_filesystem.py +212 -0
linksanity-0.1.0/tests/unit/test_checkers/test_http.py +166 -0
linksanity-0.1.0/tests/unit/test_config.py +117 -0
linksanity-0.1.0/tests/unit/test_crawler.py +202 -0
linksanity-0.1.0/tests/unit/test_parsers/__init__.py +0 -0
linksanity-0.1.0/tests/unit/test_parsers/test_html.py +84 -0
linksanity-0.1.0/tests/unit/test_parsers/test_markdown.py +92 -0
linksanity-0.1.0/tests/unit/test_parsers/test_rst.py +69 -0
linksanity-0.1.0/tests/unit/test_queue.py +90 -0
linksanity-0.1.0/tests/unit/test_reporters/__init__.py +0 -0
linksanity-0.1.0/tests/unit/test_reporters/test_console.py +180 -0
linksanity-0.1.0/tests/unit/test_reporters/test_github.py +184 -0
linksanity-0.1.0/tests/unit/test_reporters/test_json_csv.py +148 -0
linksanity-0.1.0/tests/unit/test_reporters/test_markdown.py +130 -0
linksanity-0.1.0/tests/unit/test_router.py +206 -0

linksanity-0.1.0/.github/ISSUE_TEMPLATE/bug_report.yml ADDED

@@ -0,0 +1,43 @@
+name: Bug report
+description: Report a bug in linksanity
+labels: [bug]
+body:
+  - type: input
+    id: version
+    attributes:
+      label: linksanity version
+      placeholder: "e.g. 0.1.0 (run `linksanity --version`)"
+    validations:
+      required: true
+  - type: input
+    id: python
+    attributes:
+      label: Python version
+      placeholder: "e.g. 3.12.1"
+    validations:
+      required: true
+  - type: textarea
+    id: command
+    attributes:
+      label: Command
+      description: The exact command you ran
+      placeholder: linksanity scan ./docs/ --check-anchors
+    validations:
+      required: true
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behavior
+    validations:
+      required: true
+  - type: textarea
+    id: actual
+    attributes:
+      label: Actual behavior
+      description: Paste the full output or error message
+    validations:
+      required: true

linksanity-0.1.0/.github/ISSUE_TEMPLATE/feature_request.yml ADDED

@@ -0,0 +1,24 @@
+name: Feature request
+description: Suggest an improvement or new feature
+labels: [enhancement]
+body:
+  - type: textarea
+    id: problem
+    attributes:
+      label: Problem
+      description: What problem does this solve?
+    validations:
+      required: true
+  - type: textarea
+    id: solution
+    attributes:
+      label: Proposed solution
+    validations:
+      required: true
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Other approaches you thought about

linksanity-0.1.0/.github/dependabot.yml ADDED

@@ -0,0 +1,13 @@
+version: 2
+updates:
+  - package-ecosystem: "pip"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5
+  - package-ecosystem: "github-actions"
+    directory: "/"
+    schedule:
+      interval: "weekly"
+    open-pull-requests-limit: 5

linksanity-0.1.0/.github/workflows/ci.yml ADDED

@@ -0,0 +1,58 @@
+name: CI
+on:
+  push:
+    branches: [main]
+  pull_request:
+permissions:
+  contents: read
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.11", "3.12"]
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: pip install -e ".[dev]"
+      - name: Lint
+        run: ruff check linksanity/ tests/
+      - name: Type check
+        run: mypy linksanity/
+      - name: Test (unit only, no browser)
+        run: pytest tests/unit/ -x -q
+  test-browser:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
+      - uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
+        with:
+          python-version: "3.12"
+      - name: Install dependencies
+        run: pip install -e ".[dev,browser]"
+      - name: Install Playwright browsers
+        run: playwright install --with-deps chromium
+      - name: Test (all including browser)
+        run: pytest -x -q --cov=linksanity --cov-report=xml
+      - name: Upload coverage
+        uses: codecov/codecov-action@b9fd7d16f6d7d1b5d2bec1a2887e65ceed900238 # v4
+        with:
+          file: coverage.xml

linksanity-0.1.0/.gitignore ADDED

@@ -0,0 +1,31 @@
+# Python
+__pycache__/
+*.py[cod]
+*.pyo
+*.pyd
+*.egg-info/
+*.egg
+.eggs/
+# Virtual environments
+.venv/
+venv/
+env/
+# Build
+dist/
+build/
+*.whl
+# Testing & coverage
+.coverage
+.coverage.*
+htmlcov/
+.pytest_cache/
+.mypy_cache/
+# IDE
+.vscode/
+.idea/
+*.swp
+.DS_Store

linksanity-0.1.0/CONTRIBUTING.md ADDED

@@ -0,0 +1,52 @@
+# Contributing to linksanity
+Thanks for your interest in contributing!
+## Setup
+```bash
+git clone https://github.com/ya8282/linksanity
+cd linksanity
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev,browser]"
+playwright install chromium
+```
+## Running tests
+```bash
+pytest                    # all tests
+pytest tests/unit/        # unit tests only (no browser)
+pytest tests/integration/ # integration tests (browser optional)
+```
+## Code quality
+```bash
+ruff check linksanity/ tests/ --fix   # lint + auto-fix
+mypy linksanity/                      # type check (strict mode)
+```
+Both must pass before opening a PR.
+## Guidelines
+- Follow the existing code style (ruff-enforced)
+- New features need unit tests; new checkers/parsers need integration tests
+- All public functions must have type annotations
+- `GITHUB_TOKEN` must never be accepted as a CLI argument — env only
+- Never write to disk unless `--output`, `--report`, or `--github-issue` is passed
+## Pull requests
+1. Fork and create a branch from `main`
+2. Write tests for your change
+3. Run `pytest`, `ruff check`, and `mypy` — all must pass
+4. Open a PR with a short description of what changed and why
+## Reporting bugs
+Open an issue at https://github.com/ya8282/linksanity/issues with:
+- Python version
+- Command you ran
+- Expected vs. actual output

linksanity-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,436 @@
+Metadata-Version: 2.4
+Name: linksanity
+Version: 0.1.0
+Summary: Detect broken links in Markdown, reStructuredText, and HTML documentation
+Project-URL: Homepage, https://github.com/ya8282/linksanity
+Project-URL: Bug Tracker, https://github.com/ya8282/linksanity/issues
+Author: linksanity contributors
+License: MIT
+Keywords: broken-links,documentation,html,link-checker,markdown,rst
+Classifier: Development Status :: 3 - Alpha
+Classifier: Environment :: Console
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: MacOS
+Classifier: Operating System :: POSIX :: Linux
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3 :: Only
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Documentation
+Classifier: Topic :: Software Development :: Testing
+Classifier: Topic :: Utilities
+Classifier: Typing :: Typed
+Requires-Python: >=3.11
+Requires-Dist: beautifulsoup4>=4.12
+Requires-Dist: docutils>=0.20
+Requires-Dist: httpx[http2]>=0.27
+Requires-Dist: lxml>=5.0
+Requires-Dist: markdown-it-py>=3.0
+Requires-Dist: rich>=13.0
+Requires-Dist: typer>=0.12
+Provides-Extra: browser
+Requires-Dist: playwright>=1.40; extra == 'browser'
+Provides-Extra: dev
+Requires-Dist: mypy>=1.10; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
+Requires-Dist: pytest-cov>=5.0; extra == 'dev'
+Requires-Dist: pytest>=8.0; extra == 'dev'
+Requires-Dist: respx>=0.21; extra == 'dev'
+Requires-Dist: ruff>=0.4; extra == 'dev'
+Requires-Dist: types-beautifulsoup4; extra == 'dev'
+Requires-Dist: types-docutils; extra == 'dev'
+Description-Content-Type: text/markdown
+# linksanity (🏀17)
+Detect broken links and redirects in Markdown, reStructuredText, and HTML documentation.
+```
+$ linksanity scan ./docs/
+docs/api/guide.md
+  BROKEN    line   12  ./missing.md — file not found
+  REDIRECT  line   45  https://old.example.com → https://new.example.com
+ok=38   broken=1   redirect=1   skipped=0
+```
+## Features
+- **Static scan** — parse `.md`, `.rst`, and `.html` source files without a browser
+- **Live crawl** — follow links on a deployed site using a headless browser (Playwright)
+- **Exit codes** — `0` = clean, `1` = broken links found (ideal for CI)
+- **Multiple formats** — console (Rich), JSON, CSV; optional Markdown summary report
+- **Anchor validation** — opt-in `--check-anchors` flag
+- **GitHub Issues** — create or update an issue summarising broken links
+- **Ignore domains** — skip domains you don't control
+- **JS-rendered pages** — route specific domains through Playwright in scan mode
+- **Retry logic** — exponential back-off on 429/503; HEAD→GET fallback on 405
+## Install
+**From PyPI** (once published):
+```bash
+pip install linksanity
+# Optional: browser support for JS-rendered pages
+pip install "linksanity[browser]"
+playwright install chromium
+```
+**From source:**
+```bash
+git clone https://github.com/ya8282/linksanity
+cd linksanity
+pip install -e ".[dev,browser]"
+playwright install chromium
+```
+Requires Python 3.11+.
+## Quick start
+### Scan local source files
+```bash
+# Scan a directory (finds all .md / .rst / .html files recursively)
+linksanity scan ./docs/
+# Scan specific files or globs
+linksanity scan README.md docs/**/*.md
+# Validate anchor fragments too
+linksanity scan ./docs/ --check-anchors
+# Write JSON output; exit 1 if broken links found
+linksanity scan ./docs/ --format json --output results.json
+# Create a Markdown summary report
+linksanity scan ./docs/ --report report.md
+# Skip domains you don't control
+echo "internal.corp.example.com" > ignore.txt
+linksanity scan ./docs/ --ignore-domains ignore.txt
+```
+### Crawl a live site
+```bash
+# Crawl up to 500 pages (default)
+linksanity crawl https://docs.example.com
+# Limit the number of pages crawled
+linksanity crawl https://docs.example.com --max-pages 50
+# Ignore external domains
+linksanity crawl https://docs.example.com --ignore-domains ignore.txt
+```
+### CI integration
+Add a link-check job that runs on pushes to `main`, on pull requests targeting `main`, and on a weekly schedule.
+```yaml
+# .github/workflows/linkcheck.yml
+name: Link check
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+  schedule:
+    - cron: "0 8 * * 1"   # every Monday at 08:00 UTC
+permissions:
+  contents: read
+jobs:
+  linkcheck:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+          cache: pip
+      - name: Install linksanity
+        run: pip install linksanity
+      - name: Check links
+        run: |
+          linksanity scan ./docs/ \
+            --skip-urls .linksanity-skip \
+            --format json \
+            --output linkcheck.json
+      - name: Upload results
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: linkcheck-results
+          path: linkcheck.json
+```
+**File-based skip list** — commit a `.linksanity-skip` file at your repo root to exclude auth-gated or staging URLs. Supports `*` wildcards:
+```
+# .linksanity-skip
+https://app.example.com/login
+https://staging.example.com/*
+https://internal.corp.example.com/*
+```
+**Report broken links to a GitHub Issue** — useful for scheduled runs that find regressions after merge:
+```yaml
+      - name: Report broken links
+        if: failure()
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          linksanity scan ./docs/ \
+            --github-issue \
+            --repo ${{ github.repository }}
+```
+`GITHUB_TOKEN` is always read from the environment — never pass it as a CLI flag or store it in a file.
+**Crawl a live docs site** — swap `scan` for `crawl` to test a deployed site:
+```yaml
+      - name: Crawl live docs
+        run: |
+          pip install "linksanity[browser]"
+          playwright install --with-deps chromium
+          linksanity crawl https://docs.example.com \
+            --max-pages 200 \
+            --block-analytics \
+            --format json \
+            --output crawl-results.json
+```
+### GitHub Issue reporting
+Use `--github-issue` when you want broken links surfaced as a trackable GitHub Issue rather than just a failed CI run. It creates or updates a single `[linksanity]` issue listing every broken URL, so the team has a persistent record to triage — not just a red check mark that disappears on the next push.
+**When to use it:**
+- **Scheduled runs** — a weekly cron job catches link rot that crept in after your last merge. The issue stays open until you fix the links and the check goes green.
+- **Repos without branch protection** — if broken links won't block a PR merge, an issue is the only signal that survives past the CI run.
+- **Large docs sites** — when dozens of links break at once (e.g. a domain migration), a single issue is easier to triage than scrolling through CI logs.
+**When you don't need it:**
+- PRs where branch protection already blocks the merge on failure — a failed job is sufficient.
+- Local runs and one-off checks.
+**Setup:**
+```bash
+export GITHUB_TOKEN=ghp_...
+linksanity scan ./docs/ --github-issue --repo owner/repo
+```
+`GITHUB_TOKEN` is read from the environment only — never pass it as a CLI flag or store it in a file. In GitHub Actions, use the built-in token:
+```yaml
+env:
+  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+```
+The workflow job also needs `issues: write` permission:
+```yaml
+permissions:
+  contents: read
+  issues: write
+```
+## Use with AI agents
+linksanity is designed to be a clean tool call for AI agents. Use `--format json` so an agent can parse structured output without screen-scraping console text.
+**Exit codes** are the primary signal:
+| Code | Meaning |
+|---|---|
+| `0` | All links OK |
+| `1` | One or more broken links |
+| `2` | Invocation error |
+### JSON output schema
+```bash
+linksanity scan ./docs/ --format json --output results.json
+```
+Each item in the output array has:
+```json
+[
+  {
+    "url": "https://example.com/old",
+    "source_file": "docs/guide.md",
+    "line": 42,
+    "status": "broken",
+    "status_code": 404,
+    "redirect_url": null,
+    "error": null
+  }
+]
+```
+`status` is one of `"ok"`, `"broken"`, `"redirect"`, `"skipped"`, or `"error"`.
+### Python subprocess usage
+Use this when you want to drive linksanity from a Python script or agent — for example, to file tickets, send alerts, or trigger auto-repair after a scan. linksanity doesn't expose a public Python API, so `subprocess.run` is the correct integration point.
+`result.returncode` is the fast path: check it before touching the file. If it's `2`, something went wrong with invocation — read `result.stderr` for the error message rather than trying to parse the output file.
+```python
+import json
+import subprocess
+result = subprocess.run(
+    ["linksanity", "scan", "./docs/", "--format", "json", "--output", "results.json"],
+    capture_output=True,  # results are written to the --output file; stderr carries error messages
+    text=True,
+)
+if result.returncode == 2:
+    raise RuntimeError(f"linksanity invocation error: {result.stderr.strip()}")
+with open("results.json") as f:
+    links = json.load(f)
+# result.returncode == 1 means broken links exist; iterate to act on them
+broken = [r for r in links if r["status"] == "broken"]
+```
+### MCP tool definition
+Register linksanity as a tool so an AI agent can call it on demand:
+```json
+{
+  "name": "check_links",
+  "description": "Scan documentation files for broken links. Returns structured JSON. Exit code 1 means broken links were found.",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "paths": {
+        "type": "array",
+        "items": { "type": "string" },
+        "description": "Files or directories to scan"
+      },
+      "skip_urls_file": {
+        "type": "string",
+        "description": "Path to a file listing URLs to skip (optional)"
+      }
+    },
+    "required": ["paths"]
+  }
+}
+```
+Invoke it in your MCP server by shelling out to `linksanity scan <paths> --format json --output /tmp/results.json` and returning the parsed JSON.
+### Claude Code / claude-code tool call
+If you use Claude Code, you can invoke linksanity directly from the Claude CLI:
+```
+! linksanity scan ./docs/ --format json --output results.json
+```
+Then ask Claude to interpret the output:
+```
+Read results.json and summarise which links are broken and why they might have rotted.
+```
+## Options
+### `linksanity scan <paths...>`
+| Flag | Default | Description |
+|---|---|---|
+| `--workers N` | 5 | Max concurrent HTTP checks |
+| `--timeout N` | 10 | Per-request timeout (seconds) |
+| `--retry N` | 2 | Retries on 429/503 |
+| `--check-anchors` | off | Validate `#fragment` links |
+| `--ignore-domains FILE` | — | One domain per line to skip |
+| `--js-domains FILE` | — | Domains to check via Playwright |
+| `--skip-urls FILE` | — | URLs/patterns to skip (one per line, `*` wildcards ok) |
+| `--format` | console | `console`, `json`, or `csv` |
+| `--output FILE` | stdout | Write results to file |
+| `--report FILE` | — | Write Markdown summary to file |
+| `--github-issue` | off | Open/update a GitHub Issue |
+| `--repo OWNER/REPO` | — | Required with `--github-issue` |
+| `--config FILE` | auto | Path to `linksanity.toml` |
+### `linksanity crawl <url>`
+Same flags as `scan`, minus `--check-anchors` and `--js-domains`, plus:
+| Flag | Default | Description |
+|---|---|---|
+| `--max-pages N` | 500 | Stop after N pages crawled |
+| `--playwright-workers N` | 2 | Max concurrent browser sessions |
+| `--skip-urls FILE` | — | URLs/patterns to skip (one per line, `*` wildcards ok) |
+| `--block-analytics` | off | Block analytics/tracking domains in the browser |
+## Configuration file
+Place a `linksanity.toml` in your project root (auto-discovered):
+```toml
+workers = 10
+timeout = 15
+retry = 3
+check_anchors = false
+max_pages = 200
+block_analytics = true
+ignore_domains = ["status.example.com", "internal.example.com"]
+js_domains = ["spa.example.com"]
+skip_urls = [
+  "https://app.example.com/login",
+  "https://staging.example.com/*",
+]
+```
+## Exit codes
+| Code | Meaning |
+|---|---|
+| `0` | All links OK (or only redirects/skipped) |
+| `1` | One or more broken links |
+| `2` | Invocation error (bad arguments, missing file) |
+## Development
+```bash
+git clone https://github.com/ya8282/linksanity
+cd linksanity
+python -m venv .venv && source .venv/bin/activate
+pip install -e ".[dev,browser]"
+playwright install chromium
+# Run tests
+pytest
+# Lint + type check
+ruff check linksanity/ tests/
+mypy linksanity/
+```
+## License
+MIT
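
Reviewer note on the README's "JSON output schema" section: the sketch below shows one way a consumer might summarise a results file. It assumes only the fields the README documents (`url`, `source_file`, `line`, `status`, `status_code`, `redirect_url`, `error`). The sample payload and the helper names `summarize` and `exit_code` are hypothetical and not part of the package. The README's exit-code table only mentions broken links for code `1`, so the sketch assumes an `"error"` status does not affect the exit code.

```python
import json
from collections import Counter, defaultdict

# Illustrative payload shaped like the schema documented in the README.
# The URLs, paths, and status codes are made up for this example.
SAMPLE_JSON = """
[
  {"url": "https://example.com/old", "source_file": "docs/guide.md", "line": 42,
   "status": "broken", "status_code": 404, "redirect_url": null, "error": null},
  {"url": "./missing.md", "source_file": "docs/guide.md", "line": 12,
   "status": "broken", "status_code": null, "redirect_url": null,
   "error": "file not found"},
  {"url": "https://old.example.com", "source_file": "docs/index.md", "line": 45,
   "status": "redirect", "status_code": 301,
   "redirect_url": "https://new.example.com", "error": null},
  {"url": "https://example.com", "source_file": "docs/index.md", "line": 3,
   "status": "ok", "status_code": 200, "redirect_url": null, "error": null}
]
"""


def summarize(results):
    """Return (status counts, broken links grouped by source file)."""
    counts = Counter(item["status"] for item in results)
    broken_by_file = defaultdict(list)
    for item in results:
        if item["status"] == "broken":
            broken_by_file[item["source_file"]].append((item["line"], item["url"]))
    # Sort each file's entries by line number so the report is stable.
    grouped = {path: sorted(entries) for path, entries in broken_by_file.items()}
    return counts, grouped


def exit_code(counts):
    """Mirror the documented table: 1 if any link is broken, otherwise 0.

    Assumption: only the "broken" status is treated as a failure here.
    """
    return 1 if counts.get("broken", 0) else 0


if __name__ == "__main__":
    counts, grouped = summarize(json.loads(SAMPLE_JSON))
    print(dict(counts))
    for path, entries in grouped.items():
        for line, url in entries:
            print(f"{path}:{line}  {url}")
    print("exit code:", exit_code(counts))
```

In practice the list would come from `json.load` on the file passed to `--output` rather than from an inline string.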