PyPI - thealgorithms-mcp - Versions diffs - 0.1.0__tar.gz - Mend

thealgorithms-mcp 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (14)

thealgorithms_mcp-0.1.0/.gitignore +15 -0
thealgorithms_mcp-0.1.0/DESIGN.md +146 -0
thealgorithms_mcp-0.1.0/PKG-INFO +72 -0
thealgorithms_mcp-0.1.0/README.md +61 -0
thealgorithms_mcp-0.1.0/pyproject.toml +22 -0
thealgorithms_mcp-0.1.0/scripts/verify_stdio.py +125 -0
thealgorithms_mcp-0.1.0/server.json +20 -0
thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/__init__.py +3 -0
thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/fetch.py +56 -0
thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/index.py +141 -0
thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/parse.py +40 -0
thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/search.py +49 -0
thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/server.py +75 -0
thealgorithms_mcp-0.1.0/uv.lock +809 -0

thealgorithms_mcp-0.1.0/.gitignore ADDED

@@ -0,0 +1,15 @@
+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+.eggs/
+# Build artifacts
+/dist/
+/build/
+# uv
+.venv/
+# OS
+.DS_Store

thealgorithms_mcp-0.1.0/DESIGN.md ADDED

@@ -0,0 +1,146 @@
+# TheAlgorithms MCP — Design Spec
+**Status:** reviewed + concepts validated — ready to build · **Date:** 2026-05-30 · **Owner:** mcande21
+An MCP server that lets an LLM efficiently query [TheAlgorithms/Python](https://github.com/TheAlgorithms/Python)
+for algorithm implementations and their built-in usage examples (doctests).
+---
+## 1. Goal & scope
+- **Goal:** "query the system → get the implementation + examples" cheaply, mid-task, without the
+  model burning tokens browsing GitHub.
+- **Scope (v1):** Python repo only (**1,160 algorithms, 44 categories** — measured, not estimated;
+  richest doctests in the org).
+- **Storage model:** **Hybrid** — cache the small `DIRECTORY.md` index locally for instant fuzzy
+  search; fetch file *contents* on-demand from raw GitHub. Tiny footprint, near-zero staleness.
+- **Out of scope (v1):** other languages, semantic/embedding search, write access, running code.
+## 2. Why this maps cleanly
+- `DIRECTORY.md` is a free, pre-built index: `## Category` headers + `* [Name](path)` rows.
+- Subcategory hierarchy lives in the path (`data_structures/binary_tree/avl_tree.py`) — **derive
+  category from `path.split('/')[0]`**; ignore header depth. One regex parses the whole file.
+- Every file is self-contained with a module docstring and **doctests (`>>>`)** — the usage
+  examples ship *inside the source*, so we never synthesize examples.
+## 3. Data sources
+| What | URL |
+|------|-----|
+| Index | `https://raw.githubusercontent.com/TheAlgorithms/Python/master/DIRECTORY.md` |
+| File  | `https://raw.githubusercontent.com/TheAlgorithms/Python/master/<path>` |
+- Branch: `master`. Anonymous `raw.githubusercontent.com` (no token, no API rate limit).
+- Responses carry an **ETag** → conditional `If-None-Match` GET returns `304` when unchanged.
+  Cache validation costs ~0 bytes. **Verified empirically** (probe got a real ETag + a `304` on
+  revalidation) — this overrides the review's claim that raw GitHub omits ETags. If GitHub ever
+  drops the header, `fetch.py` falls back to the TTL timer (already specified), so we're safe either way.
+## 4. Tools (the MCP surface)
+The token-efficiency core: **search returns only paths + one-liners (cheap); full source is pulled
+only on demand, and `get_algorithm` takes `include_source` so the model can peek at just the docstring and examples.**
+### `list_categories() -> [{category, count}]`
+Top-level categories with entry counts. Served from cached index.
+### `search_algorithms(query: str, category?: str, limit: int = 10) -> [{name, category, path, score}]`
+Fuzzy lexical rank over `name + category + path` tokens (rapidfuzz). No file fetches — index only.
+Returns paths the model feeds to `get_algorithm`.
+### `get_algorithm(path: str, include_source: bool = True) -> {...}`
+On-demand fetch of one file (cached by path). Always returns the docstring + doctests together —
+that's the common case (search → read implementation + its examples in one call). The review flagged
+the original 3-mode enum as wrong-altitude: the model can't know whether a file has doctests before
+fetching, so forcing an upfront `mode` choice causes wasted round-trips. Dropped.
+- Returns `{path, github_url, docstring, doctests[], line_count, source?}`.
+- `include_source=False` is the only knob — a cheap peek (docstring + examples, no body) for
+  "is this the right file" disambiguation. Defaults to full.
+### `get_category(category: str) -> [{name, path}]`  *(core, phase 1)*
+All entries in one category — for "show me every sort." One-liner over the cached index
+(`filter path.split('/')[0] == category`). Promoted from optional: without it the model can only
+enumerate a category via a search query, which is awkward for "list everything in X."
+## 5. Internals
+```
+src/thealgorithms_mcp/
+  server.py   # FastMCP app, tool registration, stdio transport
+  index.py    # fetch + parse + cache DIRECTORY.md  (ETag, TTL)
+  fetch.py    # raw file fetch, cached by path (+commit/etag)
+  search.py   # rapidfuzz ranking over the parsed index
+  parse.py    # extract module docstring + doctest blocks from source
+```
+- **Index parse:** regex `^\s*\* \[(?P<name>.+?)\]\((?P<path>.+?\.py)\)$`; `category = path.split('/')[0]`.
+  Measured **100% match rate** on the live file (1,160/1,160 lines, all `.py`, zero root-level paths).
+  **Drift guard:** count matched vs. total `* [..](..)` lines; if match rate < 95% on a refresh, log
+  loudly and keep the prior cached index rather than silently dropping entries.
+- **Index cache:** `~/.cache/thealgorithms-mcp/directory.json` = `{entries, etag, fetched_at}`.
+  Refresh on TTL miss (default 24h) via conditional GET; `304` → bump `fetched_at`, keep entries.
+- **File cache:** `~/.cache/thealgorithms-mcp/files/<path>` keyed by path; revalidate via ETag.
+- **Doctest extraction:** scan the **whole source** for `>>>` / `...` continuation lines + following
+  expected-output lines, grouped into blocks. Verified this catches *every* doctest regardless of
+  whether it lives in the module, a class, or a function docstring (probe found all 3 blocks in
+  `merge_sort.py` via a flat source scan). This is simpler than, and at least as complete as, walking
+  the AST for per-node docstrings, which would find nothing extra. Separately, use `ast.get_docstring(module)` for
+  the human-readable **description** field. (Review flagged module-only extraction as a bug — the
+  whole-source scan is the fix, and it's less machinery than the AST-walk alternative proposed.)
+- **Offline:** any fetch failure falls back to cache; tools degrade, never hard-crash.
+- **Errors:** unknown `path` → error that suggests calling `search_algorithms` first.
+## 6. Dependencies & runtime
+- `mcp` (FastMCP), `httpx`, `rapidfuzz`, `platformdirs`. Stdlib `ast`/`re` for parsing.
+- Python 3.11+. **stdio** transport (local).
+- Register in `~/.normandy-generic/mcp.json`:
+  ```json
+  { "thealgorithms": { "command": "uvx", "args": ["thealgorithms-mcp"] } }
+  ```
+## 7. Build phases
+1. **Index + search** — `index.py` (fetch/parse/cache + drift guard) + `search.py` +
+   `list_categories` / `search_algorithms` / `get_category`. Verify ranking on real queries.
+2. **Retrieval** — `fetch.py` + `parse.py` + `get_algorithm` (`include_source` toggle). Verify
+   whole-source doctest extraction on a function-heavy file (not just `merge_sort`).
+3. **Polish** — ETag revalidation (TTL fallback), offline degradation, README, `pyproject.toml`
+   `uvx` entry point.
+## 8. Open questions / future
+- **v2 multi-language:** parameterize repo (`Python`→`Java`/`Rust`/…); add `compare(name)` for
+  same-algorithm-across-languages. Index key becomes `(lang, path)`.
+- **Semantic search:** only if lexical fuzzy proves insufficient (would shift toward the "full local
+  clone + embeddings" model we deferred).
+- **Normandy packaging:** could also wrap as a `/biking`-style skill instead of/alongside the MCP if
+  we want it inside the framework's skill surface — decide after v1.
+---
+## Appendix A — Concept validation (probe, 2026-05-30)
+Stdlib-only probe (`urllib` + `ast` + `difflib`) against the live repo. difflib stands in for
+production rapidfuzz; urllib for httpx. All four load-bearing concepts passed:
+| Concept | Result |
+|---------|--------|
+| **Parse `DIRECTORY.md`** | 200 OK, **1,160 entries / 44 categories** in 0.1s. 100% regex match, all `.py`, zero root-level paths. |
+| **ETag conditional GET** | Real ETag returned; revalidation → **`304`**. Cache validation is free. |
+| **Fetch + extract** | `sorts/merge_sort.py`: docstring present, 3 doctest blocks found via flat source scan. |
+| **Fuzzy ranking** | Correct top hit for all of: binary search, dijkstra, knapsack, merge sort, lru cache. |
+## Appendix B — Review dispositions (adversarial pass)
+| # | Finding | Disposition |
+|---|---------|-------------|
+| 1 | "raw GitHub omits ETags; conditional GET is fiction" | **Rejected** — probe proves ETag + `304` work. TTL fallback kept as belt-and-suspenders. |
+| 2 | DIRECTORY.md format edge cases (non-`.py`, root-level, nested parens) | **Downgraded** — none exist in live file (100% match). Kept the drift guard as cheap insurance. |
+| 3 | `mode` enum is wrong-altitude for an LLM | **Accepted** — replaced with `include_source: bool`. |
+| 4 | Doctest extraction misses function/class docstrings | **Accepted (simplified)** — whole-source `>>>` scan; proven complete. |
+| 5 | `get_category` shouldn't be optional | **Accepted** — promoted to phase-1 core tool. |
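
The whole-source doctest scan described in section 5 of DESIGN.md can be sketched roughly as below. `parse.py` is not visible in this part of the diff, so `extract_doctests`, its grouping rule (a block ends at a blank line or a closing triple quote), and the `SAMPLE` module are all assumptions made for illustration, not the package's actual code.

```python
import ast


def extract_doctests(source: str) -> list[str]:
    """Group `>>>` prompts plus their continuation/expected-output lines into blocks.

    Hypothetical helper: scans every line of the file, so it does not matter
    whether a doctest sits in a module, class, or function docstring.
    """
    blocks: list[str] = []
    current: list[str] = []
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith(">>>"):
            current.append(line)
        elif current and line and not line.startswith(('"""', "'''")):
            # `...` continuation or expected output belonging to the open block
            current.append(line)
        elif current:
            # blank line or closing docstring quotes end the block
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


# Made-up module with doctests in a function and in a class.
SAMPLE = '''"""Toy module."""

def double(x):
    """
    >>> double(2)
    4
    >>> double(-1)
    -2
    """
    return 2 * x

class Box:
    """
    >>> Box().size
    0
    """
    size = 0
'''

blocks = extract_doctests(SAMPLE)
# The human-readable description comes from the module docstring only.
description = ast.get_docstring(ast.parse(SAMPLE))
print(len(blocks), repr(description))
```

A flat scan like this trades precision for simplicity: a `>>>` that appears inside an ordinary string literal or comment would also be picked up, which the design accepts.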

thealgorithms_mcp-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,72 @@
+Metadata-Version: 2.4
+Name: thealgorithms-mcp
+Version: 0.1.0
+Summary: MCP server for querying TheAlgorithms/Python — search algorithms and fetch implementations with their doctests as examples.
+Requires-Python: >=3.11
+Requires-Dist: httpx>=0.27
+Requires-Dist: mcp>=1.2.0
+Requires-Dist: platformdirs>=4.0
+Requires-Dist: rapidfuzz>=3.9
+Description-Content-Type: text/markdown
+# thealgorithms-mcp
+mcp-name: io.github.mcande21/thealgorithms-mcp
+An [MCP](https://modelcontextprotocol.io) server for querying
+[TheAlgorithms/Python](https://github.com/TheAlgorithms/Python) — search ~1,160 algorithm
+implementations and fetch any one with its **doctests as usage examples**.
+Hybrid design: the small `DIRECTORY.md` index is cached locally (ETag + 24h TTL) for instant
+fuzzy search; file contents are fetched on demand from `raw.githubusercontent.com`. No API token,
+no rate limits, tiny footprint. See [`DESIGN.md`](DESIGN.md).
+## Tools
+| Tool | Purpose |
+|------|---------|
+| `list_categories()` | Categories (sorts, graphs, dynamic_programming, …) with counts |
+| `search_algorithms(query, category?, limit=10)` | Ranked `{name, category, path, score}` |
+| `get_category(category)` | Every algorithm in a category |
+| `get_algorithm(path, include_source=True)` | Source + extracted doctests for one file |
+Typical flow: `search_algorithms("dijkstra")` → `get_algorithm("graphs/dijkstra.py")`.
+## Install
+**From PyPI (recommended):**
+```json
+{ "thealgorithms": { "command": "uvx", "args": ["thealgorithms-mcp"] } }
+```
+**From GitHub (no PyPI needed):**
+```json
+{ "thealgorithms": {
+    "command": "uvx",
+    "args": ["--from", "git+https://github.com/mcande21/thealgorithms-mcp", "thealgorithms-mcp"] } }
+```
+**From a local checkout (development):**
+```bash
+uv sync
+uv run thealgorithms-mcp          # serves over stdio
+```
+```json
+{ "thealgorithms": {
+    "command": "uv",
+    "args": ["run", "--directory", "/path/to/thealgorithms-mcp", "thealgorithms-mcp"] } }
+```
+Add any of the above to `~/.normandy-generic/mcp.json` (or your MCP client config).
+## Verify
+```bash
+uv run python scripts/verify_stdio.py
+```
+Spawns the server over stdio and asserts every tool against the live repo.

thealgorithms_mcp-0.1.0/README.md ADDED

@@ -0,0 +1,61 @@
+# thealgorithms-mcp
+mcp-name: io.github.mcande21/thealgorithms-mcp
+An [MCP](https://modelcontextprotocol.io) server for querying
+[TheAlgorithms/Python](https://github.com/TheAlgorithms/Python) — search ~1,160 algorithm
+implementations and fetch any one with its **doctests as usage examples**.
+Hybrid design: the small `DIRECTORY.md` index is cached locally (ETag + 24h TTL) for instant
+fuzzy search; file contents are fetched on demand from `raw.githubusercontent.com`. No API token,
+no rate limits, tiny footprint. See [`DESIGN.md`](DESIGN.md).
+## Tools
+| Tool | Purpose |
+|------|---------|
+| `list_categories()` | Categories (sorts, graphs, dynamic_programming, …) with counts |
+| `search_algorithms(query, category?, limit=10)` | Ranked `{name, category, path, score}` |
+| `get_category(category)` | Every algorithm in a category |
+| `get_algorithm(path, include_source=True)` | Source + extracted doctests for one file |
+Typical flow: `search_algorithms("dijkstra")` → `get_algorithm("graphs/dijkstra.py")`.
+## Install
+**From PyPI (recommended):**
+```json
+{ "thealgorithms": { "command": "uvx", "args": ["thealgorithms-mcp"] } }
+```
+**From GitHub (no PyPI needed):**
+```json
+{ "thealgorithms": {
+    "command": "uvx",
+    "args": ["--from", "git+https://github.com/mcande21/thealgorithms-mcp", "thealgorithms-mcp"] } }
+```
+**From a local checkout (development):**
+```bash
+uv sync
+uv run thealgorithms-mcp          # serves over stdio
+```
+```json
+{ "thealgorithms": {
+    "command": "uv",
+    "args": ["run", "--directory", "/path/to/thealgorithms-mcp", "thealgorithms-mcp"] } }
+```
+Add any of the above to `~/.normandy-generic/mcp.json` (or your MCP client config).
+## Verify
+```bash
+uv run python scripts/verify_stdio.py
+```
+Spawns the server over stdio and asserts every tool against the live repo.

thealgorithms_mcp-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,22 @@
+[project]
+name = "thealgorithms-mcp"
+version = "0.1.0"
+description = "MCP server for querying TheAlgorithms/Python — search algorithms and fetch implementations with their doctests as examples."
+readme = "README.md"
+requires-python = ">=3.11"
+dependencies = [
+    "mcp>=1.2.0",
+    "httpx>=0.27",
+    "rapidfuzz>=3.9",
+    "platformdirs>=4.0",
+]
+[project.scripts]
+thealgorithms-mcp = "thealgorithms_mcp.server:main"
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+[tool.hatch.build.targets.wheel]
+packages = ["src/thealgorithms_mcp"]

thealgorithms_mcp-0.1.0/scripts/verify_stdio.py ADDED

@@ -0,0 +1,125 @@
+"""End-to-end verification: spawn the server over stdio, call every tool, assert correctness
+against the LIVE TheAlgorithms/Python repo. Exits non-zero on any failure.
+Run with:  uv run python scripts/verify_stdio.py
+"""
+from __future__ import annotations
+import asyncio
+import json
+import sys
+from mcp import ClientSession, StdioServerParameters
+from mcp.client.stdio import stdio_client
+PASS, FAIL = "\033[32mPASS\033[0m", "\033[31mFAIL\033[0m"
+failures: list[str] = []
+def check(name: str, cond: bool, detail: str = "") -> None:
+    print(f"  [{PASS if cond else FAIL}] {name}" + (f"  — {detail}" if detail else ""))
+    if not cond:
+        failures.append(name)
+def payload(res):
+    """Extract a tool's return value.
+    FastMCP puts the structured value in `structuredContent`, wrapping list/scalar returns
+    as {"result": ...}. Fall back to concatenating JSON text blocks.
+    """
+    sc = res.structuredContent
+    if isinstance(sc, dict) and set(sc.keys()) == {"result"}:
+        return sc["result"]
+    if sc is not None:
+        return sc
+    texts = [c.text for c in res.content if getattr(c, "type", None) == "text"]
+    if len(texts) == 1:
+        return json.loads(texts[0])
+    return [json.loads(t) for t in texts]
+async def main() -> int:
+    # Default: run the local package. Override by passing a full command as argv,
+    # e.g. `verify_stdio.py uvx --from git+https://github.com/mcande21/thealgorithms-mcp thealgorithms-mcp`
+    if len(sys.argv) > 1:
+        command, args = sys.argv[1], sys.argv[2:]
+    else:
+        command, args = sys.executable, ["-m", "thealgorithms_mcp.server"]
+    params = StdioServerParameters(command=command, args=args)
+    print(f"Spawning server over stdio: {command} {' '.join(args)}")
+    async with stdio_client(params) as (read, write):
+        async with ClientSession(read, write) as session:
+            await session.initialize()
+            print("Server initialized over stdio.\n")
+            # --- tool discovery ---
+            tools = {t.name for t in (await session.list_tools()).tools}
+            expected = {"list_categories", "search_algorithms", "get_category", "get_algorithm"}
+            check("4 tools registered", tools == expected, f"{sorted(tools)}")
+            # --- list_categories ---
+            cats = payload(await session.call_tool("list_categories", {}))
+            names = {c["category"] for c in cats}
+            check("list_categories returns categories", len(cats) >= 40, f"{len(cats)} categories")
+            check("categories include sorts/graphs/maths", {"sorts", "graphs", "maths"} <= names)
+            # --- search_algorithms: top hit must be the canonical file ---
+            cases = {
+                "binary search": "searches/binary_search.py",
+                "dijkstra": "graphs/dijkstra.py",
+                "merge sort": "sorts/merge_sort.py",
+                "knapsack": "dynamic_programming/knapsack.py",
+            }
+            for q, want in cases.items():
+                res = payload(await session.call_tool("search_algorithms", {"query": q}))
+                top = res[0]["path"] if res else "<none>"
+                check(f"search {q!r} -> {want}", top == want, f"got {top}")
+            # category-constrained search
+            res = payload(
+                await session.call_tool(
+                    "search_algorithms", {"query": "quick", "category": "sorts"}
+                )
+            )
+            check("scoped search stays in category", all(r["category"] == "sorts" for r in res))
+            # --- get_category ---
+            sorts = payload(await session.call_tool("get_category", {"category": "sorts"}))
+            sort_paths = {e["path"] for e in sorts}
+            check("get_category('sorts') lists sorts", "sorts/merge_sort.py" in sort_paths,
+                  f"{len(sorts)} entries")
+            # --- get_algorithm: source + doctests on a simple file ---
+            ms = payload(await session.call_tool("get_algorithm", {"path": "sorts/merge_sort.py"}))
+            check("get_algorithm returns source", "def merge_sort" in ms.get("source", ""))
+            check("get_algorithm returns doctests", len(ms.get("doctests", [])) >= 1,
+                  f"{len(ms.get('doctests', []))} examples")
+            check("get_algorithm returns description", bool(ms.get("description")))
+            check("get_algorithm has github_url", ms.get("github_url", "").startswith("https://github.com/"))
+            # --- doctest extraction on a FUNCTION-HEAVY file (the flagged risk) ---
+            bs = payload(await session.call_tool("get_algorithm", {"path": "searches/binary_search.py"}))
+            check("function-level doctests extracted", len(bs.get("doctests", [])) >= 3,
+                  f"{len(bs.get('doctests', []))} examples across functions")
+            # --- include_source=False peek ---
+            peek = payload(await session.call_tool(
+                "get_algorithm", {"path": "sorts/merge_sort.py", "include_source": False}))
+            check("peek omits source but keeps examples",
+                  "source" not in peek and len(peek.get("doctests", [])) >= 1)
+            # --- bad path is handled gracefully ---
+            bad = payload(await session.call_tool("get_algorithm", {"path": "nope/not_real.py"}))
+            check("bad path returns guidance, not crash", "error" in bad)
+    print()
+    if failures:
+        print(f"\033[31m{len(failures)} FAILED:\033[0m {failures}")
+        return 1
+    print("\033[32mALL CHECKS PASSED\033[0m")
+    return 0
+if __name__ == "__main__":
+    sys.exit(asyncio.run(main()))
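
The three branches of `payload` above are easier to see without a live server. The sketch below copies the helper and feeds it stand-in result objects built with `SimpleNamespace`; those objects and their field values are invented for illustration and only imitate the `structuredContent` / `content` attributes the helper reads.

```python
import json
from types import SimpleNamespace


def payload(res):
    """Same unwrapping logic as scripts/verify_stdio.py."""
    sc = res.structuredContent
    if isinstance(sc, dict) and set(sc.keys()) == {"result"}:
        return sc["result"]  # list/scalar return wrapped as {"result": ...}
    if sc is not None:
        return sc  # dict return passed through unchanged
    texts = [c.text for c in res.content if getattr(c, "type", None) == "text"]
    if len(texts) == 1:
        return json.loads(texts[0])
    return [json.loads(t) for t in texts]


# Hypothetical results; the counts and messages are made up.
wrapped = SimpleNamespace(
    structuredContent={"result": [{"category": "sorts", "count": 3}]}, content=[]
)
plain = SimpleNamespace(
    structuredContent={"path": "sorts/merge_sort.py", "doctests": []}, content=[]
)
text_only = SimpleNamespace(
    structuredContent=None,
    content=[SimpleNamespace(type="text", text='{"error": "unknown path"}')],
)

print(payload(wrapped))
print(payload(plain))
print(payload(text_only))
```

One consequence of the first branch: a tool that legitimately returns a dict whose only key is `result` would be unwrapped as well, which is harmless for the four tools checked here.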

thealgorithms_mcp-0.1.0/server.json ADDED

@@ -0,0 +1,20 @@
+{
+  "$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
+  "name": "io.github.mcande21/thealgorithms-mcp",
+  "description": "Query TheAlgorithms/Python — search algorithm implementations and fetch them with their doctests as examples.",
+  "repository": {
+    "url": "https://github.com/mcande21/thealgorithms-mcp",
+    "source": "github"
+  },
+  "version": "0.1.0",
+  "packages": [
+    {
+      "registryType": "pypi",
+      "identifier": "thealgorithms-mcp",
+      "version": "0.1.0",
+      "transport": {
+        "type": "stdio"
+      }
+    }
+  ]
+}

thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/__init__.py ADDED

@@ -0,0 +1,3 @@
+"""TheAlgorithms MCP — query TheAlgorithms/Python for implementations + doctests."""
+__version__ = "0.1.0"

thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/fetch.py ADDED

@@ -0,0 +1,56 @@
+"""On-demand fetch of a single algorithm file's source, cached by path + ETag."""
+from __future__ import annotations
+import json
+from pathlib import Path
+import httpx
+from .index import CACHE_DIR, RAW_BASE
+FILE_CACHE_DIR = CACHE_DIR / "files"
+def _cache_paths(path: str) -> tuple[Path, Path]:
+    safe = path.replace("/", "__")
+    return FILE_CACHE_DIR / safe, FILE_CACHE_DIR / (safe + ".meta")
+def get_file(path: str) -> str:
+    """Return raw source for a repo-relative path.
+    Conditional GET via ETag; 304 reuses the cached body. Network failures fall back to
+    cache when present, else raise. Raises FileNotFoundError on a 404 (bad path).
+    """
+    body_file, meta_file = _cache_paths(path)
+    etag = None
+    cached_body = None
+    if body_file.exists():
+        cached_body = body_file.read_text()
+        if meta_file.exists():
+            try:
+                etag = json.loads(meta_file.read_text()).get("etag")
+            except (json.JSONDecodeError, OSError):
+                etag = None
+    headers = {"User-Agent": "thealgorithms-mcp"}
+    if etag:
+        headers["If-None-Match"] = etag
+    try:
+        resp = httpx.get(RAW_BASE + path, headers=headers, timeout=30, follow_redirects=True)
+    except httpx.HTTPError:
+        if cached_body is not None:
+            return cached_body
+        raise
+    if resp.status_code == 304 and cached_body is not None:
+        return cached_body
+    if resp.status_code == 404:
+        raise FileNotFoundError(path)
+    resp.raise_for_status()
+    FILE_CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    body_file.write_text(resp.text)
+    meta_file.write_text(json.dumps({"etag": resp.headers.get("ETag")}))
+    return resp.text

thealgorithms_mcp-0.1.0/src/thealgorithms_mcp/index.py ADDED

@@ -0,0 +1,141 @@
+"""Fetch, parse, and cache TheAlgorithms/Python DIRECTORY.md.
+Hybrid model: the index is small (~1,160 entries) so we cache it whole, validated by
+ETag with a 24h TTL fallback. File *contents* are fetched on demand (see fetch.py).
+"""
+from __future__ import annotations
+import json
+import re
+import time
+from pathlib import Path
+import httpx
+from platformdirs import user_cache_dir
+REPO = "TheAlgorithms/Python"
+BRANCH = "master"
+RAW_BASE = f"https://raw.githubusercontent.com/{REPO}/{BRANCH}/"
+DIRECTORY_URL = RAW_BASE + "DIRECTORY.md"
+TTL_SECONDS = 24 * 3600
+DRIFT_THRESHOLD = 0.95  # keep prior cache if a refresh matches fewer than this fraction of links
+CACHE_DIR = Path(user_cache_dir("thealgorithms-mcp"))
+INDEX_FILE = CACHE_DIR / "directory.json"
+ENTRY_RE = re.compile(r"^\s*\* \[(?P<name>.+?)\]\((?P<path>.+?\.py)\)\s*$")
+LINK_RE = re.compile(r"^\s*\* \[.+?\]\(.+?\)\s*$")
+# Process-lifetime memo so repeated tool calls don't re-read disk.
+_memo: dict | None = None
+def github_url(path: str) -> str:
+    """Human-facing blob URL for a repo-relative path."""
+    return f"https://github.com/{REPO}/blob/{BRANCH}/{path}"
+def _parse(text: str) -> tuple[list[dict], float]:
+    """Parse DIRECTORY.md into entries; return (entries, match_rate vs all link lines)."""
+    link_lines = 0
+    entries: list[dict] = []
+    for line in text.splitlines():
+        if LINK_RE.match(line):
+            link_lines += 1
+        m = ENTRY_RE.match(line)
+        if m:
+            path = m.group("path")
+            entries.append(
+                {"name": m.group("name"), "path": path, "category": path.split("/")[0]}
+            )
+    match_rate = (len(entries) / link_lines) if link_lines else 1.0
+    return entries, match_rate
+def _read_cache() -> dict | None:
+    if not INDEX_FILE.exists():
+        return None
+    try:
+        return json.loads(INDEX_FILE.read_text())
+    except (json.JSONDecodeError, OSError):
+        return None
+def _write_cache(data: dict) -> None:
+    CACHE_DIR.mkdir(parents=True, exist_ok=True)
+    INDEX_FILE.write_text(json.dumps(data))
+def load_index(force: bool = False) -> list[dict]:
+    """Return the parsed index, refreshing from GitHub when stale.
+    Order of operations:
+      1. Serve the process memo if present and fresh (and not forced).
+      2. Serve the disk cache if fresh.
+      3. Conditional GET (If-None-Match). 304 -> reuse cached entries, bump timestamp.
+         200 -> parse, apply drift guard, persist.
+      4. Any network failure -> fall back to cached entries (offline degradation).
+    """
+    global _memo
+    now = time.time()
+    # Check the process memo first so fresh repeat calls skip the disk read.
+    if not force and _memo and (now - _memo.get("fetched_at", 0) < TTL_SECONDS):
+        return _memo["entries"]
+    cache = _read_cache()
+    if not force and cache and (now - cache.get("fetched_at", 0) < TTL_SECONDS):
+        _memo = cache
+        return cache["entries"]
+    headers = {"User-Agent": "thealgorithms-mcp"}
+    if cache and cache.get("etag"):
+        headers["If-None-Match"] = cache["etag"]
+    try:
+        resp = httpx.get(DIRECTORY_URL, headers=headers, timeout=30, follow_redirects=True)
+    except httpx.HTTPError:
+        if cache:
+            _memo = cache
+            return cache["entries"]  # offline: stale is better than dead
+        raise
+    if resp.status_code == 304 and cache:
+        cache["fetched_at"] = now
+        _write_cache(cache)
+        _memo = cache
+        return cache["entries"]
+    resp.raise_for_status()
+    entries, match_rate = _parse(resp.text)
+    # Drift guard: a sudden drop in match rate means the format changed under us.
+    if match_rate < DRIFT_THRESHOLD and cache and cache.get("entries"):
+        # Keep the known-good index rather than silently shipping a broken one.
+        cache["fetched_at"] = now
+        _write_cache(cache)
+        _memo = cache
+        return cache["entries"]
+    data = {
+        "entries": entries,
+        "etag": resp.headers.get("ETag"),
+        "fetched_at": now,
+        "match_rate": match_rate,
+    }
+    _write_cache(data)
+    _memo = data
+    return entries
+def list_categories(entries: list[dict]) -> list[dict]:
+    counts: dict[str, int] = {}
+    for e in entries:
+        counts[e["category"]] = counts.get(e["category"], 0) + 1
+    return [{"category": c, "count": n} for c, n in sorted(counts.items())]
+def category_entries(entries: list[dict], category: str) -> list[dict]:
+    return [
+        {"name": e["name"], "path": e["path"]}
+        for e in entries
+        if e["category"] == category
+    ]
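
To see how `_parse`, the drift guard, and `list_categories` interact, the sketch below runs the same two regexes over a small hand-written `DIRECTORY.md` fragment. The fragment is invented: in particular the `docs/notes.md` row is a hypothetical non-`.py` link added only to push the match rate below the threshold, and `parse` is a local copy of the logic, not an import from the package.

```python
import re

ENTRY_RE = re.compile(r"^\s*\* \[(?P<name>.+?)\]\((?P<path>.+?\.py)\)\s*$")
LINK_RE = re.compile(r"^\s*\* \[.+?\]\(.+?\)\s*$")
DRIFT_THRESHOLD = 0.95

# Made-up index fragment in the DIRECTORY.md shape.
SAMPLE = """\
## Sorts
  * [Merge Sort](sorts/merge_sort.py)
  * [Quick Sort](sorts/quick_sort.py)
## Data Structures
  * Binary Tree
    * [Avl Tree](data_structures/binary_tree/avl_tree.py)
  * [Notes](docs/notes.md)
"""


def parse(text: str) -> tuple[list[dict], float]:
    """Return (entries, fraction of link lines that matched ENTRY_RE)."""
    link_lines = 0
    entries: list[dict] = []
    for line in text.splitlines():
        if LINK_RE.match(line):
            link_lines += 1
        m = ENTRY_RE.match(line)
        if m:
            path = m.group("path")
            # Category is the first path segment, regardless of header depth.
            entries.append(
                {"name": m.group("name"), "path": path, "category": path.split("/")[0]}
            )
    return entries, (len(entries) / link_lines if link_lines else 1.0)


entries, match_rate = parse(SAMPLE)

counts: dict[str, int] = {}
for e in entries:
    counts[e["category"]] = counts.get(e["category"], 0) + 1

# With a prior cache on disk, load_index would keep the old entries here.
drifted = match_rate < DRIFT_THRESHOLD
print(counts, match_rate, drifted)
```

Three of the four link lines match, so the match rate is 0.75 and the guard would trip. The plain `* Binary Tree` line is not counted as a link at all, which is why subcategory headers do not affect the rate.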