PyPI - jsmithpkp-llm-client-kit - Versions diffs - 0.1.3__py3-none-any.whl - Mend

jsmithpkp-llm-client-kit 0.1.3__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (7)

jsmithpkp_llm_client_kit-0.1.3.dist-info/METADATA +116 -0
jsmithpkp_llm_client_kit-0.1.3.dist-info/RECORD +7 -0
jsmithpkp_llm_client_kit-0.1.3.dist-info/WHEEL +5 -0
jsmithpkp_llm_client_kit-0.1.3.dist-info/top_level.txt +1 -0
llm_client_kit/__init__.py +21 -0
llm_client_kit/client.py +733 -0
llm_client_kit/py.typed +0 -0

jsmithpkp_llm_client_kit-0.1.3.dist-info/METADATA ADDED

@@ -0,0 +1,116 @@
+Metadata-Version: 2.4
+Name: jsmithpkp-llm-client-kit
+Version: 0.1.3
+Summary: Cached LLM client with Ollama + Anthropic provider dispatch, fixture mode, and JSON-schema-strict completions.
+Author: Jonathan Smith
+License: MIT
+Project-URL: Homepage, https://github.com/jsmithpkp21/llm-client-kit
+Project-URL: Issues, https://github.com/jsmithpkp21/llm-client-kit/issues
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+Provides-Extra: anthropic
+Requires-Dist: anthropic>=0.40; extra == "anthropic"
+# llm-client-kit
+Cached, fixture-aware LLM client with **Ollama** (default) and **Anthropic** provider dispatch. Carved out of `resume-builder/src/resume_builder/llm_client.py` so it can be shared across internal Python apps without each app rolling its own dispatch + caching layer.
+PyPI: [`jsmithpkp-llm-client-kit`](https://pypi.org/project/jsmithpkp-llm-client-kit/). Import path stays `llm_client_kit` (PyPI distribution name and Python import name are allowed to differ).
+## Install
+```bash
+pip install jsmithpkp-llm-client-kit
+```
+For the Anthropic provider path, install the extra:
+```bash
+pip install 'jsmithpkp-llm-client-kit[anthropic]'
+```
+## Public API
+```python
+from llm_client_kit import (
+    LLMClient,
+    LLMResponse,
+    LLMEndpointUnreachableError,
+    AnthropicProviderError,
+    PROVIDER_OLLAMA,
+    PROVIDER_ANTHROPIC,
+)
+```
+### Construct from env (resume-builder-compatible)
+```python
+client = LLMClient.from_env()  # reads RESUME_BUILDER_LLM_* env vars
+result = client.complete_json(
+    namespace="my_stage",
+    system_prompt="You are a JSON-only classifier...",
+    user_payload={"subject": "...", "body": "..."},
+)
+```
+Env vars honored:
+| Env var | Default | Notes |
+|---|---|---|
+| `RESUME_BUILDER_LLM_PROVIDER` | `ollama` | `ollama` or `anthropic` |
+| `RESUME_BUILDER_LLM_API_URL` | provider-default | OpenAI-compatible chat endpoint for Ollama; ignored by the Anthropic adapter at request time but still part of the cache key |
+| `RESUME_BUILDER_LLM_MODEL` | `llama3.1:8b` / `claude-sonnet-4-6` | per-provider default |
+| `RESUME_BUILDER_LLM_API_KEY` / `ANTHROPIC_API_KEY` | unset | required for Anthropic; `RESUME_BUILDER_LLM_API_KEY` wins when both are set |
+| `RESUME_BUILDER_LLM_TIMEOUT_SECONDS` | `90` | per-request timeout |
+| `RESUME_BUILDER_LLM_TIMEOUT_<NAMESPACE>` | unset | per-stage timeout override (namespace upper-cased, e.g. `RESUME_BUILDER_LLM_TIMEOUT_MY_STAGE`); takes precedence over the global timeout |
+| `RESUME_BUILDER_LLM_MAX_TOKENS` | `2048` | Anthropic only |
+| `RESUME_BUILDER_LLM_FIXTURE` | `0` | set to `1` for fixture mode (no network calls; cache-only) |
+| `RESUME_BUILDER_LLM_CACHE_DIR` | `.llm_cache` / `tests/fixtures/llm_cache` (fixture mode) | response cache location |
+> The env var prefix `RESUME_BUILDER_LLM_*` is preserved from the carve-out source for v0.1.0 backward compat. A future release will parameterize the prefix so consumers can use their own namespace.
+### Construct directly (no env)
+```python
+from pathlib import Path
+from llm_client_kit import LLMClient, PROVIDER_OLLAMA
+client = LLMClient(
+    endpoint="http://localhost:11434/v1/chat/completions",
+    model="llama3.2:3b",
+    cache_dir=Path("./.cache"),
+    fixture_mode=False,
+    timeout_seconds=30.0,
+    provider=PROVIDER_OLLAMA,
+)
+```
+## Error contract
+`complete_json` documents four failure modes:
+- `ValueError` — invalid / non-object JSON response, malformed cache payload.
+- `RuntimeError` — provider-agnostic failures (cache I/O, fixture-mode miss) AND Ollama-path transport/parse failures.
+- `LLMEndpointUnreachableError` (`RuntimeError` subclass) — connection refused / DNS failure / host unreachable, on the Ollama path.
+- `AnthropicProviderError` (NOT a `RuntimeError` subclass) — any failure on the Anthropic path. **Intentionally** outside the `(RuntimeError, ValueError)` fallback chain: a paid API call halting loudly is better than silently degrading.
+See class docstrings for the full rationale.
+## Releasing
+Releases are automated. Tag-triggered workflow at `.github/workflows/release.yml`:
+1. Bump `version` in `pyproject.toml` and `__version__` in `src/llm_client_kit/__init__.py` on a PR.
+2. After the PR merges, tag `vX.Y.Z` on `main` and push the tag.
+3. The `release` workflow runs `gitleaks` against the working tree AND full git history — any finding aborts the workflow before the build. Then `python -m build` produces sdist + wheel, and `pypa/gh-action-pypi-publish` uploads via PyPI Trusted Publisher OIDC (no API token stored in repo secrets).
+```bash
+# example bump-and-release flow
+git switch main && git pull
+# (PR merged that bumps version to 0.1.4)
+git tag v0.1.4
+git push origin v0.1.4
+# watch: gh run watch
+```
+The secret-scan step is **mandatory** and is the only thing standing between an accidentally-committed credential and a published wheel. Do not edit the workflow to skip it without first understanding what it catches.
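The error contract in the README above implies a specific handler order for callers. The following is a minimal sketch. `classify_with_fallback` is a hypothetical caller. The two exception classes are local stand-ins that mirror the documented hierarchy so the example is self-contained; real code would import `LLMEndpointUnreachableError` and `AnthropicProviderError` from `llm_client_kit` instead.

```python
from typing import Any, Callable


# Local stand-ins mirroring the documented hierarchy (assumption for a
# self-contained example); real callers import these from llm_client_kit.
class LLMEndpointUnreachableError(RuntimeError):
    """Connection refused / DNS failure on the Ollama path."""


class AnthropicProviderError(Exception):
    """Any Anthropic-path failure; deliberately not a RuntimeError."""


def classify_with_fallback(
    complete_json: Callable[[dict[str, Any]], dict[str, Any]],
    payload: dict[str, Any],
    baseline: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical caller: degrade to a deterministic baseline on Ollama-path errors."""
    # AnthropicProviderError matches neither except clause below, so it
    # propagates out of this function and halts the run.
    try:
        return complete_json(payload)
    except LLMEndpointUnreachableError:
        # Narrowest case first: the endpoint is down, so later stages
        # could be short-circuited instead of each burning its timeout.
        return {**baseline, "degraded": "endpoint_unreachable"}
    except (RuntimeError, ValueError):
        # Transport, parse, cache I/O and fixture-miss failures.
        return {**baseline, "degraded": "llm_error"}
```

`AnthropicProviderError` derives from `Exception` rather than `RuntimeError`, so the `(RuntimeError, ValueError)` clause never matches it. That is the "halt loudly" behavior the README describes.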

jsmithpkp_llm_client_kit-0.1.3.dist-info/RECORD ADDED

@@ -0,0 +1,7 @@
+llm_client_kit/__init__.py,sha256=3sPxuoqCC53D4_ZGzxHv_ns-ycGovRlmiULNjWYY4eU,432
+llm_client_kit/client.py,sha256=gLMb1gmhtaJBMh-RICid5K7edktiYgaZ03jc8RQMjic,32073
+llm_client_kit/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+jsmithpkp_llm_client_kit-0.1.3.dist-info/METADATA,sha256=sM_3K5SqYWDNURv0JpnlOByMoIbv1waZ-liFgiixQ7I,4709
+jsmithpkp_llm_client_kit-0.1.3.dist-info/WHEEL,sha256=aeYiig01lYGDzBgS8HxWXOg3uV61G9ijOsup-k9o1sk,91
+jsmithpkp_llm_client_kit-0.1.3.dist-info/top_level.txt,sha256=C4F4-tQWsuciva1ZX4n3B6MXh-mPOIH8tBy5QYtbMfE,15
+jsmithpkp_llm_client_kit-0.1.3.dist-info/RECORD,,
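Each RECORD row above has the form `path,hash,size`. In the wheel binary-distribution format, the hash is the SHA-256 digest encoded as URL-safe base64 with the trailing `=` padding stripped, and `RECORD` lists itself with an empty hash and size. The following minimal, standard-library-only sketch recomputes a row. The helper names `record_hash`, `record_line` and `row_matches` are hypothetical. The empty `py.typed` marker is a convenient check, because its row must carry the SHA-256 of zero bytes.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Wheel RECORD hash field: 'sha256=' + unpadded URL-safe base64 digest."""
    digest = hashlib.sha256(data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def record_line(path: str, data: bytes) -> str:
    """Format one RECORD row as 'path,hash,size-in-bytes'."""
    return f"{path},{record_hash(data)},{len(data)}"


def row_matches(row: str, data: bytes) -> bool:
    """Check a RECORD row against file bytes.

    Splits from the right so the path keeps any commas; CSV-quoted
    paths are not handled in this sketch.
    """
    path = row.rsplit(",", 2)[0]
    return row == record_line(path, data)
```

The 15-byte size recorded for `top_level.txt` is consistent with its single line `llm_client_kit` plus a trailing newline.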

jsmithpkp_llm_client_kit-0.1.3.dist-info/WHEEL ADDED

@@ -0,0 +1,5 @@
+Wheel-Version: 1.0
+Generator: setuptools (82.0.1)
+Root-Is-Purelib: true
+Tag: py3-none-any
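The WHEEL file uses the same `Key: value` header syntax as email messages, so the standard library's `email.parser` can read it. The small sketch below uses hypothetical helper names. It splits each `Tag` into its interpreter, ABI and platform parts. Read that way, `py3-none-any` means any Python 3 interpreter, no compiled-ABI dependency and any platform, which fits `Root-Is-Purelib: true`.

```python
from email.parser import Parser

# The WHEEL file contents shown above, verbatim.
WHEEL_TEXT = """\
Wheel-Version: 1.0
Generator: setuptools (82.0.1)
Root-Is-Purelib: true
Tag: py3-none-any
"""


def wheel_tags(text: str) -> list[tuple[str, str, str]]:
    """Return (python, abi, platform) triples for every Tag header."""
    msg = Parser().parsestr(text)
    triples = []
    for tag in msg.get_all("Tag", []):
        python_tag, abi_tag, platform_tag = tag.split("-")
        triples.append((python_tag, abi_tag, platform_tag))
    return triples


def is_purelib(text: str) -> bool:
    """True when the wheel installs into purelib (pure-Python wheel)."""
    value = Parser().parsestr(text).get("Root-Is-Purelib", "")
    return value.strip().lower() == "true"
```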

jsmithpkp_llm_client_kit-0.1.3.dist-info/top_level.txt ADDED

@@ -0,0 +1 @@
+llm_client_kit

llm_client_kit/__init__.py ADDED

@@ -0,0 +1,21 @@
+"""Shared LLM client with Ollama + Anthropic provider dispatch."""
+from llm_client_kit.client import (
+    PROVIDER_ANTHROPIC,
+    PROVIDER_OLLAMA,
+    AnthropicProviderError,
+    LLMClient,
+    LLMEndpointUnreachableError,
+    LLMResponse,
+)
+__all__ = [
+    "AnthropicProviderError",
+    "LLMClient",
+    "LLMEndpointUnreachableError",
+    "LLMResponse",
+    "PROVIDER_ANTHROPIC",
+    "PROVIDER_OLLAMA",
+]
+__version__ = "0.1.3"
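The `_cache_key` method in `client.py` hashes a canonical JSON payload, so identical requests map to the same `llm_response_<key>.json` file. The standalone sketch below re-implements that derivation as module-level functions. The names `build_messages`, `cache_key` and `cache_filename` are hypothetical. The field list and serialization follow the `client.py` listing.

```python
import hashlib
import json
from typing import Any, Optional


def build_messages(system_prompt: str, user_payload: dict[str, Any]) -> list[dict[str, str]]:
    # Mirrors complete_json: the user payload is serialized with sorted
    # keys, so dict insertion order does not change the cache key.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": json.dumps(user_payload, sort_keys=True)},
    ]


def cache_key(
    *,
    namespace: str,
    model: str,
    endpoint: str,
    provider: str,
    messages: list[dict[str, str]],
    response_schema: Optional[dict[str, Any]] = None,
) -> str:
    payload: dict[str, Any] = {
        "namespace": namespace,
        "model": model,
        "messages": messages,
        "endpoint": endpoint,
        "provider": provider,  # keeps Ollama and Anthropic entries apart
    }
    if response_schema is not None:
        payload["response_schema"] = response_schema  # a new schema gives a new key
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def cache_filename(key: str) -> str:
    return f"llm_response_{key}.json"
```

Because `endpoint` is part of the hashed payload, changing `RESUME_BUILDER_LLM_API_URL` invalidates previously cached responses, even for identical prompts.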

llm_client_kit/client.py ADDED

@@ -0,0 +1,733 @@
+#!/usr/bin/env python3
+"""Cached LLM client with Ollama + Anthropic provider dispatch.
+Lifted from resume-builder's internal ``llm_client.py`` to be a shared
+dependency consumed via the internal pypiserver. Behavior is identical
+to the resume-builder original at v0.1.0; the only changes are dropping
+the host-app-specific path-safety guard (``assert_not_blocked_runtime_input``)
+and broadening the docstrings. The env var convention (``RESUME_BUILDER_LLM_*``)
+is preserved verbatim for v0.1.0 so resume-builder's adoption is a no-op
+import swap — a follow-up release will parameterize the prefix.
+Consumers that don't want to use the env var convention can construct
+``LLMClient`` directly via ``__init__`` instead of ``from_env``.
+"""
+from __future__ import annotations
+import errno
+import hashlib
+import json
+import os
+import socket
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Any, TypeVar
+from urllib.error import URLError
+from urllib.request import Request, urlopen
+_LLMClientT = TypeVar("_LLMClientT", bound="LLMClient")
+_DEFAULT_CHAT_ENDPOINT = "http://localhost:11434/v1/chat/completions"
+_DEFAULT_MODEL = "llama3.1:8b"
+# Per-provider endpoint default — picked at `from_env` time when no
+# `RESUME_BUILDER_LLM_API_URL` override is set. The Anthropic adapter
+# doesn't actually use `endpoint` at request time (the SDK manages the
+# URL internally), but the value still gets baked into cache keys and
+# diagnostic output, so defaulting to the Anthropic API URL when the
+# provider is Anthropic keeps both honest.
+_DEFAULT_ANTHROPIC_ENDPOINT = "https://api.anthropic.com/v1/messages"
+# Bumped from 30s to 90s in #414: on local llama3.1:8b the longer
+# transform_for_role / enrich_data prompts routinely exceeded 30s on a
+# 12GB consumer GPU. Connection-refused / DNS failures still return
+# instantly (they bypass timeout), so the higher ceiling does not slow
+# down the "endpoint unreachable" diagnostic path. Per-stage env vars
+# below let callers tighten or loosen individual stages.
+_DEFAULT_TIMEOUT_SECONDS = 90.0
+_FIXTURE_ENV = "RESUME_BUILDER_LLM_FIXTURE"
+_CACHE_DIR_ENV = "RESUME_BUILDER_LLM_CACHE_DIR"
+_ENDPOINT_ENV = "RESUME_BUILDER_LLM_API_URL"
+_MODEL_ENV = "RESUME_BUILDER_LLM_MODEL"
+_TIMEOUT_ENV = "RESUME_BUILDER_LLM_TIMEOUT_SECONDS"
+_PROVIDER_ENV = "RESUME_BUILDER_LLM_PROVIDER"
+_API_KEY_ENV = "RESUME_BUILDER_LLM_API_KEY"
+_MAX_TOKENS_ENV = "RESUME_BUILDER_LLM_MAX_TOKENS"
+# Supported provider identifiers.
+PROVIDER_OLLAMA = "ollama"
+PROVIDER_ANTHROPIC = "anthropic"
+_DEFAULT_PROVIDER = PROVIDER_OLLAMA
+# When the user opts into the Anthropic provider but doesn't override
+# the model env var, default to Sonnet 4.6 — cost-efficient frontier
+# model with strong instruction following on long, rule-dense prompts
+# (`_BODY_SYSTEM_PROMPT_BASE` is ~4K tokens of HARD RULES + addendum).
+# Override via RESUME_BUILDER_LLM_MODEL (e.g. claude-opus-4-7).
+_DEFAULT_ANTHROPIC_MODEL = "claude-sonnet-4-6"
+# Per-request output cap for the Anthropic adapter. Cover-letter body
+# drafts return 600-1200 tokens in practice; 2048 gives headroom without
+# inviting runaway output. Override via RESUME_BUILDER_LLM_MAX_TOKENS.
+_DEFAULT_MAX_TOKENS = 2048
+# Per-stage timeout env var prefix (#414). Resolution order at the
+# request site: per-stage override (RESUME_BUILDER_LLM_TIMEOUT_<NS>) ->
+# global override (RESUME_BUILDER_LLM_TIMEOUT_SECONDS) -> default.
+_PER_STAGE_TIMEOUT_ENV_PREFIX = "RESUME_BUILDER_LLM_TIMEOUT_"
+def _per_stage_timeout_env(namespace: str) -> str:
+    """Return the env var name that overrides timeout for ``namespace``.
+    Namespace strings are lowercase snake_case in callers
+    (``transform_for_role``, ``enrich_data``, ...). The matching env var
+    upper-cases only the namespace suffix; the prefix is fixed so the var
+    name is predictable and greppable.
+    """
+    return f"{_PER_STAGE_TIMEOUT_ENV_PREFIX}{namespace.upper()}"
+def _parse_timeout_env(raw: str | None) -> float:
+    """Parse RESUME_BUILDER_LLM_TIMEOUT_SECONDS, falling back to the default.
+    Empty / unset / non-numeric values fall back to the default; values
+    <= 0 also fall back since urlopen requires a positive timeout (or
+    None, but we don't expose 'no timeout' as a knob).
+    """
+    if raw is None:
+        return _DEFAULT_TIMEOUT_SECONDS
+    try:
+        value = float(raw.strip())
+    except ValueError:
+        return _DEFAULT_TIMEOUT_SECONDS
+    if value <= 0:
+        return _DEFAULT_TIMEOUT_SECONDS
+    return value
+@dataclass(frozen=True)
+class LLMResponse:
+    content: str
+    cache_key: str
+    from_cache: bool
+class LLMEndpointUnreachableError(RuntimeError):
+    """Raised when the LLM endpoint refuses connection or DNS resolution fails.
+    A subclass of ``RuntimeError`` so existing callers that catch
+    ``(RuntimeError, ValueError)`` to degrade to baseline keep working.
+    Build-pipeline code that wants to emit the "endpoint unreachable"
+    operator warning and fast-fail subsequent stages can narrow on this
+    subclass without disturbing the broader transport-error path
+    (timeouts, mid-stream resets, etc.) which keeps using bare
+    ``RuntimeError`` since those are recoverable per-stage failures.
+    """
+class AnthropicProviderError(Exception):
+    """Raised on any Anthropic-provider failure — **explicitly NOT a
+    subclass of ``RuntimeError``**.
+    The pipeline's existing fallback handlers catch
+    ``(RuntimeError, ValueError)`` to degrade gracefully to deterministic
+    logic when the local LLM is flaky. That's the right behavior when
+    a user is running on Ollama and the local model is intermittently
+    slow or broken — fall back to the deterministic baseline and keep
+    moving.
+    When the user has explicitly opted into the Anthropic provider via
+    ``RESUME_BUILDER_LLM_PROVIDER=anthropic``, that fallback is wrong:
+    they paid for the API call expecting frontier-model quality, and
+    silently degrading to non-LLM-tailored output produces something
+    far worse than what they asked for (and takes time + tokens to
+    fail at). Raising a separate exception base means Anthropic
+    failures propagate **uncaught** through ``(RuntimeError, ValueError)``
+    handlers and halt the build with a clear failure mode.
+    Callers must not catch this exception in their existing fallback
+    blocks. If a caller genuinely needs to handle Anthropic failures
+    (e.g. retry logic), it should catch ``AnthropicProviderError``
+    explicitly.
+    **Raised by:** ``LLMClient.complete_json()`` when
+    ``provider == "anthropic"``. This means ``complete_json``'s
+    documented ``(RuntimeError, ValueError)`` escape contract no
+    longer covers every path — on the Anthropic path,
+    ``AnthropicProviderError`` can also escape. The "anything else
+    escaping from here is a real bug" rule in ``complete_json``'s
+    docstring applies only to the Ollama path.
+    """
+# Allowed provider identifiers. Anything else fails fast at construction
+# time so a typo in `RESUME_BUILDER_LLM_PROVIDER` doesn't silently route
+# requests to the Ollama endpoint while the user thinks Anthropic is
+# active.
+_VALID_PROVIDERS = frozenset({PROVIDER_OLLAMA, PROVIDER_ANTHROPIC})
+# Errno values that indicate the LLM endpoint is not reachable at all
+# (connection refused, host unreachable, DNS resolution failure). When
+# urllib raises one of these the request returns essentially instantly,
+# so we treat them as a single "endpoint down" signal rather than as a
+# per-stage timeout.
+_UNREACHABLE_ERRNOS = {
+    errno.ECONNREFUSED,
+    errno.EHOSTUNREACH,
+    errno.ENETUNREACH,
+    errno.EHOSTDOWN,
+}
+def _is_endpoint_unreachable(exc: BaseException) -> bool:
+    """Return True when ``exc`` signals an unreachable LLM endpoint.
+    Covers both urllib's ``URLError`` (which wraps the underlying socket
+    failure in ``.reason``) and the raw socket/OSError shapes. DNS
+    resolution failure surfaces as ``socket.gaierror`` (an OSError
+    subclass) and we treat it the same as connection-refused: the
+    operator's local LLM is not answering.
+    """
+    inner: BaseException = exc
+    if isinstance(exc, URLError):
+        reason = exc.reason
+        if isinstance(reason, BaseException):
+            inner = reason
+        else:
+            return False
+    if isinstance(inner, socket.gaierror):
+        return True
+    if isinstance(inner, ConnectionRefusedError):
+        return True
+    err_no = getattr(inner, "errno", None)
+    return err_no in _UNREACHABLE_ERRNOS
+class LLMClient:
+    """Chat client with file cache, fixture mode, and Ollama / Anthropic dispatch."""
+    def __init__(
+        self,
+        *,
+        endpoint: str,
+        model: str,
+        cache_dir: Path,
+        fixture_mode: bool,
+        timeout_seconds: float = _DEFAULT_TIMEOUT_SECONDS,
+        provider: str = _DEFAULT_PROVIDER,
+        api_key: str | None = None,
+        max_tokens: int = _DEFAULT_MAX_TOKENS,
+    ) -> None:
+        if provider not in _VALID_PROVIDERS:
+            raise ValueError(
+                f"unsupported LLM provider {provider!r}; "
+                f"expected one of {sorted(_VALID_PROVIDERS)}. "
+                f"Set RESUME_BUILDER_LLM_PROVIDER to a supported value."
+            )
+        self._endpoint = endpoint
+        self._model = model
+        self._cache_dir = cache_dir
+        self._fixture_mode = fixture_mode
+        self._timeout_seconds = timeout_seconds
+        self._provider = provider
+        self._api_key = api_key
+        self._max_tokens = max_tokens
+    @property
+    def endpoint(self) -> str:
+        """Configured chat endpoint URL. Public for diagnostic / warning text."""
+        return self._endpoint
+    @property
+    def model(self) -> str:
+        """Configured model name. Public for diagnostic / warning text."""
+        return self._model
+    @property
+    def provider(self) -> str:
+        """Configured LLM provider (``ollama`` or ``anthropic``). Public for
+        diagnostic / warning text."""
+        return self._provider
+    @classmethod
+    def from_env(cls: type[_LLMClientT]) -> _LLMClientT:
+        fixture_mode = os.getenv(_FIXTURE_ENV, "0").strip() == "1"
+        default_cache_dir = "tests/fixtures/llm_cache" if fixture_mode else ".llm_cache"
+        cache_dir = Path(
+            os.getenv(_CACHE_DIR_ENV, default_cache_dir).strip() or default_cache_dir
+        )
+        provider = (
+            os.getenv(_PROVIDER_ENV, _DEFAULT_PROVIDER).strip() or _DEFAULT_PROVIDER
+        ).lower()
+        # Per-provider endpoint default. An explicit
+        # RESUME_BUILDER_LLM_API_URL override still wins; this only
+        # matters when the env var is unset or empty. Picking the right
+        # default keeps cache keys and diagnostic output from advertising
+        # the Ollama URL when the actual provider is Anthropic.
+        default_endpoint = (
+            _DEFAULT_ANTHROPIC_ENDPOINT
+            if provider == PROVIDER_ANTHROPIC
+            else _DEFAULT_CHAT_ENDPOINT
+        )
+        endpoint = os.getenv(_ENDPOINT_ENV, default_endpoint).strip() or default_endpoint
+        # Per-provider model default. The user can always override via
+        # RESUME_BUILDER_LLM_MODEL — this just picks the right "no env
+        # var set" default for whichever provider is active.
+        default_model = (
+            _DEFAULT_ANTHROPIC_MODEL
+            if provider == PROVIDER_ANTHROPIC
+            else _DEFAULT_MODEL
+        )
+        model = os.getenv(_MODEL_ENV, default_model).strip() or default_model
+        timeout_seconds = _parse_timeout_env(os.getenv(_TIMEOUT_ENV))
+        # API-key precedence: repo-convention env first, fall through to
+        # Anthropic's own convention. None is fine for the Ollama path.
+        api_key = os.getenv(_API_KEY_ENV) or os.getenv("ANTHROPIC_API_KEY") or None
+        # Mirror `_parse_timeout_env`: unset / empty / non-numeric /
+        # zero / negative all fall back to the default. A value <= 0
+        # would otherwise round-trip into the Anthropic SDK and fail at
+        # request time with a confusing "max_tokens must be positive"
+        # error far from the misconfiguration site.
+        max_tokens_raw = os.getenv(_MAX_TOKENS_ENV, "").strip()
+        if max_tokens_raw:
+            try:
+                parsed_max = int(max_tokens_raw)
+            except ValueError:
+                max_tokens = _DEFAULT_MAX_TOKENS
+            else:
+                max_tokens = parsed_max if parsed_max > 0 else _DEFAULT_MAX_TOKENS
+        else:
+            max_tokens = _DEFAULT_MAX_TOKENS
+        return cls(
+            endpoint=endpoint,
+            model=model,
+            cache_dir=cache_dir,
+            fixture_mode=fixture_mode,
+            timeout_seconds=timeout_seconds,
+            provider=provider,
+            api_key=api_key,
+            max_tokens=max_tokens,
+        )
+    def complete_json(
+        self,
+        *,
+        namespace: str,
+        system_prompt: str,
+        user_payload: dict[str, object],
+        response_schema: dict[str, Any] | None = None,
+    ) -> dict[str, Any]:
+        """Return a parsed JSON object from the LLM.
+        On the Ollama path, when ``response_schema`` is provided, the
+        request uses OpenAI-compatible strict json_schema mode
+        (``response_format: {"type": "json_schema", "json_schema":
+        {"name": ..., "schema": {...}, "strict": true}}``) so the model is
+        constrained to produce output matching the schema. Ollama's
+        OpenAI-compat layer honors this (verified live during #436
+        investigation). Without a schema, the request falls back to the
+        weaker ``{"type": "json_object"}`` mode, which only enforces
+        valid-JSON structure. The Anthropic path uses the provider's own
+        structured-output mode instead (see ``_request_anthropic``).
+        The schema is hashed into the cache key alongside the messages —
+        the same prompt with a different schema produces a different
+        cached response, since the model's output can legitimately differ.
+        Error contract (callers may rely on this for narrowed fallback
+        handlers):
+          - ``ValueError`` for response-shape failures: invalid JSON,
+            non-object JSON, or malformed cache payload.
+          - ``RuntimeError`` for **provider-agnostic** failures raised
+            by ``_chat`` itself (outside the upstream-call branch):
+            cache filesystem I/O errors (read or write) and fixture-mode
+            misses. Cache reads and fixture-mode misses fire before the
+            upstream call; cache writes fire after a successful upstream
+            response. All three surface on **either** path — including
+            the Anthropic path — because they're independent of which
+            provider was configured.
+          - ``RuntimeError`` for **Ollama-path-only** failures during
+            or after the upstream call: transport errors (network /
+            DNS / timeout / connection reset) and malformed upstream
+            chat-completion responses.
+          - ``AnthropicProviderError`` (NOT a ``RuntimeError`` subclass)
+            for **Anthropic-path** failures at any point in the Anthropic
+            path (before, during, or after the upstream call): transport
+            errors, API status errors (rate limit, 4xx, 5xx), missing API
+            key (validated before the upstream call), empty response. This
+            intentionally escapes the ``(RuntimeError, ValueError)``
+            fallback so a paid Claude call halts the build loudly instead
+            of silently degrading to deterministic baseline (which is far
+            worse than what the user paid for). See
+            ``AnthropicProviderError``'s docstring for the full rationale.
+        Callers on the **Ollama path** should catch
+        ``(RuntimeError, ValueError)`` to degrade gracefully; anything
+        else escaping from there is a real bug.
+        Callers on the **Anthropic path** should NOT catch
+        ``AnthropicProviderError`` in their fallback blocks — let it
+        propagate. If a caller genuinely needs to handle Anthropic
+        failures (e.g. for retry logic), it should catch
+        ``AnthropicProviderError`` explicitly.
+        """
+        messages = [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": json.dumps(user_payload, sort_keys=True)},
+        ]
+        response = self._chat(
+            messages=messages,
+            namespace=namespace,
+            response_schema=response_schema,
+        )
+        try:
+            parsed = json.loads(response.content)
+        except json.JSONDecodeError as exc:
+            raise ValueError(
+                f"LLM response was not valid JSON for namespace={namespace} "
+                f"cache_key={response.cache_key}"
+            ) from exc
+        if not isinstance(parsed, dict):
+            raise ValueError(
+                f"LLM JSON response must be an object for namespace={namespace} "
+                f"cache_key={response.cache_key}"
+            )
+        return parsed
+    def _chat(
+        self,
+        *,
+        messages: list[dict[str, str]],
+        namespace: str,
+        response_schema: dict[str, Any] | None = None,
+    ) -> LLMResponse:
+        cache_key = self._cache_key(
+            messages=messages, namespace=namespace, response_schema=response_schema
+        )
+        cache_path = self._cache_dir / f"llm_response_{cache_key}.json"
+        # Runtime-guard call dropped in kit fork (host-app concern).
+        try:
+            cached = self._read_cache(cache_path)
+        except OSError as exc:
+            # Normalize cache-read filesystem failures (permission denied,
+            # unreadable file, etc.) into RuntimeError so callers that catch
+            # (RuntimeError, ValueError) around complete_json continue to get
+            # the documented fallback behavior instead of an unexpected
+            # OSError propagating out.
+            raise RuntimeError(
+                f"LLM cache read failed for namespace={namespace} "
+                f"cache_key={cache_key}: {exc}"
+            ) from exc
+        if cached is not None:
+            return LLMResponse(content=cached, cache_key=cache_key, from_cache=True)
+        if self._fixture_mode:
+            raise RuntimeError(
+                f"LLM fixture missing for namespace={namespace} cache_key={cache_key}"
+            )
+        content = self._request_chat_completion(
+            messages, namespace=namespace, response_schema=response_schema
+        )
+        try:
+            self._write_cache(cache_path=cache_path, content=content)
+        except OSError as exc:
+            # Same rationale as the read path above: a disk/permission failure
+            # writing the cache must not bypass callers' (RuntimeError,
+            # ValueError) handlers.
+            raise RuntimeError(
+                f"LLM cache write failed for namespace={namespace} "
+                f"cache_key={cache_key}: {exc}"
+            ) from exc
+        return LLMResponse(content=content, cache_key=cache_key, from_cache=False)
+    def _cache_key(
+        self,
+        *,
+        messages: list[dict[str, str]],
+        namespace: str,
+        response_schema: dict[str, Any] | None = None,
+    ) -> str:
+        payload: dict[str, Any] = {
+            "namespace": namespace,
+            "model": self._model,
+            "messages": messages,
+            "endpoint": self._endpoint,
+            # Provider is part of the key so Ollama and Anthropic don't
+            # share cache entries — even when the prompts and model name
+            # collide by accident, their output shapes and JSON-mode
+            # enforcement differ.
+            "provider": self._provider,
+        }
+        # Different schemas legitimately produce different model output for
+        # the same prompt, so include the schema in the cache key.
+        if response_schema is not None:
+            payload["response_schema"] = response_schema
+        serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
+        return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
+    def _resolve_timeout(self, namespace: str) -> float:
+        """Pick the timeout for a single request.
+        Resolution order (first non-empty wins):
+          1. Per-stage env override ``RESUME_BUILDER_LLM_TIMEOUT_<NAMESPACE>``
+          2. The client's configured timeout (which itself came from
+             ``RESUME_BUILDER_LLM_TIMEOUT_SECONDS`` via :meth:`from_env`,
+             or the default when that env var was unset).
+        """
+        raw = os.getenv(_per_stage_timeout_env(namespace))
+        if raw is None:
+            return self._timeout_seconds
+        try:
+            value = float(raw.strip())
+        except (ValueError, AttributeError):
+            return self._timeout_seconds
+        if value <= 0:
+            return self._timeout_seconds
+        return value
+    def _request_chat_completion(
+        self,
+        messages: list[dict[str, str]],
+        *,
+        namespace: str = "",
+        response_schema: dict[str, Any] | None = None,
+    ) -> str:
+        if self._provider == PROVIDER_ANTHROPIC:
+            return self._request_anthropic(
+                messages, namespace=namespace, response_schema=response_schema
+            )
+        # Default: weak JSON mode (valid JSON, any keys). When a schema is
+        # supplied, use OpenAI-compatible strict json_schema mode, under
+        # which Ollama constrains the model's output to the schema. Issue
+        # #436 diagnosis: weak json_object mode let the model invent or
+        # drop keys when the JD content grew past ~1KB, collapsing the
+        # cover-letter body schema entirely.
+        if response_schema is None:
+            response_format: dict[str, Any] = {"type": "json_object"}
+        else:
+            response_format = {
+                "type": "json_schema",
+                "json_schema": {
+                    "name": response_schema.get("name", "response"),
+                    "schema": response_schema.get("schema", response_schema),
+                    "strict": True,
+                },
+            }
+        payload = {
+            "model": self._model,
+            "messages": messages,
+            "temperature": 0,
+            "response_format": response_format,
+        }
+        request = Request(
+            self._endpoint,
+            data=json.dumps(payload).encode("utf-8"),
+            headers={"Content-Type": "application/json"},
+            method="POST",
+        )
+        request_timeout = (
+            self._resolve_timeout(namespace) if namespace else self._timeout_seconds
+        )
+        try:
+            with urlopen(request, timeout=request_timeout) as response:  # nosec B310
+                body = response.read().decode("utf-8", errors="replace")
+        except (OSError, TimeoutError) as exc:
+            # urllib.error.URLError and socket.timeout both inherit from
+            # OSError, and response.read() can also raise raw OSError /
+            # TimeoutError mid-stream. Normalize all transport-level
+            # failures to RuntimeError so callers that catch
+            # (RuntimeError, ValueError) around complete_json (per its
+            # docstring contract) keep the documented fallback behavior
+            # instead of being aborted by an unexpected OSError.
+            #
+            # Endpoint-down failures (connection refused / DNS failure /
+            # host unreachable) are surfaced as the typed
+            # ``LLMEndpointUnreachableError`` subclass so the build
+            # pipeline can emit a single actionable warning and
+            # short-circuit subsequent stages instead of attempting each
+            # one and burning the per-stage timeout budget.
+            if _is_endpoint_unreachable(exc):
+                raise LLMEndpointUnreachableError(
+                    f"LLM endpoint unreachable at {self._endpoint}: {exc}"
+                ) from exc
+            raise RuntimeError(f"LLM request failed: {exc}") from exc
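+        try:
+            decoded = json.loads(body)
+        except json.JSONDecodeError as exc:
+            raise RuntimeError("LLM response was not valid JSON") from exc
+        if not isinstance(decoded, dict):
+            # Guard before .get(): a top-level JSON array or scalar would
+            # otherwise raise AttributeError outside the documented contract.
+            raise RuntimeError("LLM response was not a JSON object")
+        choices = decoded.get("choices", [])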
+        if not isinstance(choices, list) or not choices:
+            raise RuntimeError("LLM response missing choices")
+        first = choices[0]
+        if not isinstance(first, dict):
+            raise RuntimeError("LLM response choice was malformed")
+        message = first.get("message", {})
+        if not isinstance(message, dict):
+            raise RuntimeError("LLM response message was malformed")
+        content = message.get("content", "")
+        if not isinstance(content, str) or not content.strip():
+            raise RuntimeError("LLM response content was empty")
+        return content
+    def _request_anthropic(
+        self,
+        messages: list[dict[str, str]],
+        *,
+        namespace: str = "",
+        response_schema: dict[str, Any] | None = None,
+    ) -> str:
+        """Single Anthropic chat-completion call via the official SDK.
+        Returns the model's raw response string (typically JSON when
+        `response_schema` is set via Anthropic's native
+        `output_config.format` structured-output mode). Caller parses +
+        validates.
+        Caching of the system prompt is on by default: every system
+        message in `messages` is hoisted out of the message array into
+        the top-level Anthropic `system` field as a text block, with a
+        `cache_control: {"type": "ephemeral"}` block attached so the
+        prompt is cached across calls within the 5-minute TTL. The
+        cover-letter body system prompt is ~4K tokens — well above
+        Sonnet 4.6's 2048-token minimum cache prefix — and is reused
+        across every call in a multi-JD compare session, so this
+        materially cuts cost on repeat calls.
+        **Error contract — distinct from `_request_chat_completion`.**
+        All failures of the Anthropic path raise `AnthropicProviderError`,
+        which does NOT inherit from `RuntimeError`. The pipeline's
+        existing `(RuntimeError, ValueError)` fallback handlers will
+        not catch these — the build halts loudly instead of silently
+        degrading to deterministic non-LLM output. Rationale: when a
+        user opted into the paid Anthropic provider, falling back to
+        baseline produces output far worse than what they paid for,
+        and burns API latency + tokens producing nothing useful.
+        Specific cases:
+          - `APIConnectionError` (transport unreachable) →
+            `AnthropicProviderError`.
+          - `APIStatusError` (rate limit, 4xx, 5xx) →
+            `AnthropicProviderError`.
+          - Missing API key / empty response → `AnthropicProviderError`.
+        The `anthropic` SDK is lazy-imported so `import llm_client_kit`
+        succeeds in environments that haven't installed it (e.g. CI's
+        lint-only image, the default Ollama path). Same pattern as the
+        playwright lazy imports in `playwright_stealth_kit.launch`.
+        A missing SDK surfaces as `ImportError` (re-raised with install
+        guidance, not wrapped in `AnthropicProviderError`) since it's a
+        setup failure, not a runtime degradation candidate.
+        """
+        # API-key validation goes BEFORE the SDK import: a missing key
+        # is a configuration error and should surface as an
+        # AnthropicProviderError regardless of whether the SDK is
+        # installed. CI's test-fast / test-slow images don't carry the
+        # `anthropic` dep, so importing first would mask the api-key
+        # check behind an ImportError on those images.
+        if not self._api_key:
+            raise AnthropicProviderError(
+                "anthropic provider requires an API key via "
+                "RESUME_BUILDER_LLM_API_KEY or ANTHROPIC_API_KEY"
+            )
+        try:
+            import anthropic
+            from anthropic import (
+                APIConnectionError,
+                APIStatusError,
+            )
+        except ImportError as exc:
+            raise ImportError(
+                "anthropic SDK is required when "
+                "RESUME_BUILDER_LLM_PROVIDER=anthropic. Install: "
+                "pip install anthropic"
+            ) from exc
+        # Hoist system messages out of the messages array (Anthropic takes
+        # them in the top-level `system` field) and attach a cache_control
+        # breakpoint to each so the system prompt is cached across calls.
+        system_blocks: list[dict[str, Any]] = []
+        user_messages: list[dict[str, Any]] = []
+        for m in messages:
+            role = m.get("role", "")
+            content = m.get("content", "")
+            if role == "system":
+                system_blocks.append(
+                    {
+                        "type": "text",
+                        "text": content,
+                        "cache_control": {"type": "ephemeral"},
+                    }
+                )
+            else:
+                user_messages.append({"role": role, "content": content})
+        kwargs: dict[str, Any] = {
+            "model": self._model,
+            "max_tokens": self._max_tokens,
+            "messages": user_messages,
+        }
+        if system_blocks:
+            kwargs["system"] = system_blocks
+        if response_schema is not None:
+            # Anthropic's native structured-output mode. The caller's
+            # schema dict matches the OpenAI shape `{name, schema}`; the
+            # Anthropic API wants the inner schema only.
+            schema = response_schema.get("schema", response_schema)
+            kwargs["output_config"] = {
+                "format": {"type": "json_schema", "schema": schema}
+            }
+        timeout = (
+            self._resolve_timeout(namespace) if namespace else self._timeout_seconds
+        )
+        client = anthropic.Anthropic(api_key=self._api_key, timeout=timeout)
+        try:
+            response = client.messages.create(**kwargs)
+        except APIConnectionError as exc:
+            raise AnthropicProviderError(f"Anthropic API unreachable: {exc}") from exc
+        except APIStatusError as exc:
+            # Covers RateLimitError, BadRequestError, AuthenticationError,
+            # OverloadedError, etc. — all surface as AnthropicProviderError
+            # so the build halts loudly instead of silently falling back
+            # to deterministic baseline (which would be far worse than
+            # what the user paid for).
+            raise AnthropicProviderError(
+                f"Anthropic API error {exc.status_code}: {exc.message}"
+            ) from exc
+        # Response shape: `content` is a list of typed blocks; structured
+        # output guarantees the first text block holds the JSON payload.
+        for block in response.content:
+            if getattr(block, "type", "") == "text":
+                text = getattr(block, "text", "")
+                if not isinstance(text, str) or not text.strip():
+                    raise AnthropicProviderError(
+                        "Anthropic response text block was empty"
+                    )
+                return text
+        raise AnthropicProviderError("Anthropic response had no text block")
+    def _read_cache(self, path: Path) -> str | None:
+        if not path.exists():
+            return None
+        # runtime-guard call dropped in kit fork (host-app concern)
+        raw = path.read_text(encoding="utf-8")
+        payload = json.loads(raw)
+        if not isinstance(payload, dict):
+            raise ValueError("LLM cache payload must be an object")
+        content = payload.get("content")
+        if not isinstance(content, str):
+            raise ValueError("LLM cache payload missing content string")
+        return content
+    def _write_cache(self, *, cache_path: Path, content: str) -> None:
+        # runtime-guard call dropped in kit fork (host-app concern)
+        cache_path.parent.mkdir(parents=True, exist_ok=True)
+        payload = {
+            "model": self._model,
+            "endpoint": self._endpoint,
+            "content": content,
+        }
+        cache_path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")

llm_client_kit/py.typed ADDED

File without changes
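The tail of the OpenAI-compatible chat-completions path (the one the `_request_anthropic` docstring calls `_request_chat_completion`) walks `choices[0].message.content` and raises `RuntimeError` at each malformed step, so callers catching `(RuntimeError, ValueError)` keep their fallback behavior. The sketch below isolates that validation as a pure function; its name is hypothetical. It includes a guard that the decoded body is a JSON object, since calling `.get` on a JSON array would raise `AttributeError` instead of `RuntimeError`.

```python
import json


def extract_chat_content(body: str) -> str:
    """Return choices[0].message.content from an OpenAI-style response body."""
    try:
        decoded = json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError("LLM response was not valid JSON") from exc
    if not isinstance(decoded, dict):
        # A JSON array or scalar has no .get(); normalize to RuntimeError.
        raise RuntimeError("LLM response was not a JSON object")
    choices = decoded.get("choices", [])
    if not isinstance(choices, list) or not choices:
        raise RuntimeError("LLM response missing choices")
    first = choices[0]
    if not isinstance(first, dict):
        raise RuntimeError("LLM response choice was malformed")
    message = first.get("message", {})
    if not isinstance(message, dict):
        raise RuntimeError("LLM response message was malformed")
    content = message.get("content", "")
    if not isinstance(content, str) or not content.strip():
        # Whitespace-only content is treated the same as missing content.
        raise RuntimeError("LLM response content was empty")
    return content


ok = extract_chat_content(json.dumps({"choices": [{"message": {"content": "hello"}}]}))
```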