jsmithpkp-llm-client-kit 0.1.3__tar.gz

@@ -0,0 +1,116 @@
1
+ Metadata-Version: 2.4
2
+ Name: jsmithpkp-llm-client-kit
3
+ Version: 0.1.3
4
+ Summary: Cached LLM client with Ollama + Anthropic provider dispatch, fixture mode, and JSON-schema-strict completions.
5
+ Author: Jonathan Smith
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/jsmithpkp21/llm-client-kit
8
+ Project-URL: Issues, https://github.com/jsmithpkp21/llm-client-kit/issues
9
+ Requires-Python: >=3.11
10
+ Description-Content-Type: text/markdown
11
+ Provides-Extra: anthropic
12
+ Requires-Dist: anthropic>=0.40; extra == "anthropic"
13
+
14
+ # llm-client-kit
15
+
16
+ Cached, fixture-aware LLM client with **Ollama** (default) and **Anthropic** provider dispatch. Carved out of `resume-builder/src/resume_builder/llm_client.py` so it can be shared across internal Python apps without each app rolling its own dispatch + caching layer.
17
+
18
+ PyPI: [`jsmithpkp-llm-client-kit`](https://pypi.org/project/jsmithpkp-llm-client-kit/). Import path stays `llm_client_kit` (PyPI distribution name and Python import name are allowed to differ).
19
+
20
+ ## Install
21
+
22
+ ```bash
23
+ pip install jsmithpkp-llm-client-kit
24
+ ```
25
+
26
+ For the Anthropic provider path, install the extra:
27
+
28
+ ```bash
29
+ pip install 'jsmithpkp-llm-client-kit[anthropic]'
30
+ ```
31
+
32
+ ## Public API
33
+
34
+ ```python
35
+ from llm_client_kit import (
36
+ LLMClient,
37
+ LLMResponse,
38
+ LLMEndpointUnreachableError,
39
+ AnthropicProviderError,
40
+ PROVIDER_OLLAMA,
41
+ PROVIDER_ANTHROPIC,
42
+ )
43
+ ```
44
+
45
+ ### Construct from env (resume-builder-compatible)
46
+
47
+ ```python
48
+ client = LLMClient.from_env() # reads RESUME_BUILDER_LLM_* env vars
49
+ result = client.complete_json(
50
+ namespace="my_stage",
51
+ system_prompt="You are a JSON-only classifier...",
52
+ user_payload={"subject": "...", "body": "..."},
53
+ )
54
+ ```
55
+
56
+ Env vars honored:
57
+
58
+ | Env var | Default | Notes |
59
+ |---|---|---|
60
+ | `RESUME_BUILDER_LLM_PROVIDER` | `ollama` | `ollama` or `anthropic` |
61
+ | `RESUME_BUILDER_LLM_API_URL` | provider-default | OpenAI-compat for Ollama; ignored by Anthropic adapter |
62
+ | `RESUME_BUILDER_LLM_MODEL` | `llama3.1:8b` / `claude-sonnet-4-6` | per-provider default |
63
+ | `RESUME_BUILDER_LLM_API_KEY` / `ANTHROPIC_API_KEY` | unset | required for Anthropic |
64
+ | `RESUME_BUILDER_LLM_TIMEOUT_SECONDS` | `90` | per-request timeout |
65
+ | `RESUME_BUILDER_LLM_TIMEOUT_<NAMESPACE>` | unset | per-stage timeout override |
66
+ | `RESUME_BUILDER_LLM_MAX_TOKENS` | `2048` | Anthropic only |
67
+ | `RESUME_BUILDER_LLM_FIXTURE` | `0` | fixture mode (no network calls; cache-only) |
68
+ | `RESUME_BUILDER_LLM_CACHE_DIR` | `.llm_cache` / `tests/fixtures/llm_cache` (fixture mode) | response cache location |
69
+
70
+ > The env var prefix `RESUME_BUILDER_LLM_*` is preserved from the carve-out source for v0.1.0 backward compat. A future release will parameterize the prefix so consumers can use their own namespace.
71
+
72
+ ### Construct directly (no env)
73
+
74
+ ```python
75
+ from pathlib import Path
76
+ from llm_client_kit import LLMClient, PROVIDER_OLLAMA
77
+
78
+ client = LLMClient(
79
+ endpoint="http://localhost:11434/v1/chat/completions",
80
+ model="llama3.2:3b",
81
+ cache_dir=Path("./.cache"),
82
+ fixture_mode=False,
83
+ timeout_seconds=30.0,
84
+ provider=PROVIDER_OLLAMA,
85
+ )
86
+ ```
87
+
88
+ ## Error contract
89
+
90
+ `complete_json` documents four failure modes:
91
+
92
+ - `ValueError` — invalid / non-object JSON response, malformed cache payload.
93
+ - `RuntimeError` — provider-agnostic failures (cache I/O, fixture-mode miss) AND Ollama-path transport/parse failures.
94
+ - `LLMEndpointUnreachableError` (`RuntimeError` subclass) — connection refused / DNS failure / host unreachable, on the Ollama path.
95
+ - `AnthropicProviderError` (NOT a `RuntimeError` subclass) — any failure on the Anthropic path. **Intentionally** outside the `(RuntimeError, ValueError)` fallback chain: a paid API call halting loudly is better than silently degrading.
96
+
97
+ See class docstrings for the full rationale.
98
+
99
+ ## Releasing
100
+
101
+ Releases are automated. Tag-triggered workflow at `.github/workflows/release.yml`:
102
+
103
+ 1. Bump `version` in `pyproject.toml` and `__version__` in `src/llm_client_kit/__init__.py` on a PR.
104
+ 2. After the PR merges, tag `vX.Y.Z` on `main` and push the tag.
105
+ 3. The `release` workflow runs `gitleaks` against the working tree AND full git history — any finding aborts the workflow before the build. Then `python -m build` produces sdist + wheel, and `pypa/gh-action-pypi-publish` uploads via PyPI Trusted Publisher OIDC (no API token stored in repo secrets).
106
+
107
+ ```bash
108
+ # example bump-and-release flow
109
+ git switch main && git pull
110
+ # (PR merged that bumps version to 0.1.4)
111
+ git tag v0.1.4
112
+ git push origin v0.1.4
113
+ # watch: gh run watch
114
+ ```
115
+
116
+ The secret-scan step is **mandatory** and is the only thing standing between an accidentally-committed credential and a published wheel. Do not edit the workflow to skip it without first understanding what it catches.
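The two timeout rows in the env var table above imply a lookup order, which the package source describes as per-stage override, then global override, then the 90-second default. The following is a minimal sketch of that order, not the package's own code: `resolve_timeout`, `_positive_float`, and the `env` mapping parameter are hypothetical names introduced for illustration, and the choice to let an unusable per-stage value fall through to the global one is an assumption.

```python
from __future__ import annotations

from collections.abc import Mapping

DEFAULT_TIMEOUT_SECONDS = 90.0  # documented default
PREFIX = "RESUME_BUILDER_LLM_TIMEOUT_"  # documented env var prefix


def _positive_float(raw: str | None) -> float | None:
    """Return a positive float parsed from ``raw``, or None if unusable."""
    if raw is None:
        return None
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    return value if value > 0 else None


def resolve_timeout(namespace: str, env: Mapping[str, str]) -> float:
    """Hypothetical helper: per-stage override, then global, then default."""
    # Per-stage variable: prefix plus the upper-cased namespace.
    per_stage = _positive_float(env.get(f"{PREFIX}{namespace.upper()}"))
    if per_stage is not None:
        return per_stage
    # Global variable: RESUME_BUILDER_LLM_TIMEOUT_SECONDS.
    global_value = _positive_float(env.get(f"{PREFIX}SECONDS"))
    if global_value is not None:
        return global_value
    return DEFAULT_TIMEOUT_SECONDS
```

Passing the environment as a mapping (for example `os.environ`) keeps the sketch testable without mutating process state.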
@@ -0,0 +1,103 @@
1
+ # llm-client-kit
2
+
3
+ Cached, fixture-aware LLM client with **Ollama** (default) and **Anthropic** provider dispatch. Carved out of `resume-builder/src/resume_builder/llm_client.py` so it can be shared across internal Python apps without each app rolling its own dispatch + caching layer.
4
+
5
+ PyPI: [`jsmithpkp-llm-client-kit`](https://pypi.org/project/jsmithpkp-llm-client-kit/). Import path stays `llm_client_kit` (PyPI distribution name and Python import name are allowed to differ).
6
+
7
+ ## Install
8
+
9
+ ```bash
10
+ pip install jsmithpkp-llm-client-kit
11
+ ```
12
+
13
+ For the Anthropic provider path, install the extra:
14
+
15
+ ```bash
16
+ pip install 'jsmithpkp-llm-client-kit[anthropic]'
17
+ ```
18
+
19
+ ## Public API
20
+
21
+ ```python
22
+ from llm_client_kit import (
23
+ LLMClient,
24
+ LLMResponse,
25
+ LLMEndpointUnreachableError,
26
+ AnthropicProviderError,
27
+ PROVIDER_OLLAMA,
28
+ PROVIDER_ANTHROPIC,
29
+ )
30
+ ```
31
+
32
+ ### Construct from env (resume-builder-compatible)
33
+
34
+ ```python
35
+ client = LLMClient.from_env() # reads RESUME_BUILDER_LLM_* env vars
36
+ result = client.complete_json(
37
+ namespace="my_stage",
38
+ system_prompt="You are a JSON-only classifier...",
39
+ user_payload={"subject": "...", "body": "..."},
40
+ )
41
+ ```
42
+
43
+ Env vars honored:
44
+
45
+ | Env var | Default | Notes |
46
+ |---|---|---|
47
+ | `RESUME_BUILDER_LLM_PROVIDER` | `ollama` | `ollama` or `anthropic` |
48
+ | `RESUME_BUILDER_LLM_API_URL` | provider-default | OpenAI-compat for Ollama; ignored by Anthropic adapter |
49
+ | `RESUME_BUILDER_LLM_MODEL` | `llama3.1:8b` / `claude-sonnet-4-6` | per-provider default |
50
+ | `RESUME_BUILDER_LLM_API_KEY` / `ANTHROPIC_API_KEY` | unset | required for Anthropic |
51
+ | `RESUME_BUILDER_LLM_TIMEOUT_SECONDS` | `90` | per-request timeout |
52
+ | `RESUME_BUILDER_LLM_TIMEOUT_<NAMESPACE>` | unset | per-stage timeout override |
53
+ | `RESUME_BUILDER_LLM_MAX_TOKENS` | `2048` | Anthropic only |
54
+ | `RESUME_BUILDER_LLM_FIXTURE` | `0` | fixture mode (no network calls; cache-only) |
55
+ | `RESUME_BUILDER_LLM_CACHE_DIR` | `.llm_cache` / `tests/fixtures/llm_cache` (fixture mode) | response cache location |
56
+
57
+ > The env var prefix `RESUME_BUILDER_LLM_*` is preserved from the carve-out source for v0.1.0 backward compat. A future release will parameterize the prefix so consumers can use their own namespace.
58
+
59
+ ### Construct directly (no env)
60
+
61
+ ```python
62
+ from pathlib import Path
63
+ from llm_client_kit import LLMClient, PROVIDER_OLLAMA
64
+
65
+ client = LLMClient(
66
+ endpoint="http://localhost:11434/v1/chat/completions",
67
+ model="llama3.2:3b",
68
+ cache_dir=Path("./.cache"),
69
+ fixture_mode=False,
70
+ timeout_seconds=30.0,
71
+ provider=PROVIDER_OLLAMA,
72
+ )
73
+ ```
74
+
75
+ ## Error contract
76
+
77
+ `complete_json` documents four failure modes:
78
+
79
+ - `ValueError` — invalid / non-object JSON response, malformed cache payload.
80
+ - `RuntimeError` — provider-agnostic failures (cache I/O, fixture-mode miss) AND Ollama-path transport/parse failures.
81
+ - `LLMEndpointUnreachableError` (`RuntimeError` subclass) — connection refused / DNS failure / host unreachable, on the Ollama path.
82
+ - `AnthropicProviderError` (NOT a `RuntimeError` subclass) — any failure on the Anthropic path. **Intentionally** outside the `(RuntimeError, ValueError)` fallback chain: a paid API call halting loudly is better than silently degrading.
83
+
84
+ See class docstrings for the full rationale.
85
+
86
+ ## Releasing
87
+
88
+ Releases are automated. Tag-triggered workflow at `.github/workflows/release.yml`:
89
+
90
+ 1. Bump `version` in `pyproject.toml` and `__version__` in `src/llm_client_kit/__init__.py` on a PR.
91
+ 2. After the PR merges, tag `vX.Y.Z` on `main` and push the tag.
92
+ 3. The `release` workflow runs `gitleaks` against the working tree AND full git history — any finding aborts the workflow before the build. Then `python -m build` produces sdist + wheel, and `pypa/gh-action-pypi-publish` uploads via PyPI Trusted Publisher OIDC (no API token stored in repo secrets).
93
+
94
+ ```bash
95
+ # example bump-and-release flow
96
+ git switch main && git pull
97
+ # (PR merged that bumps version to 0.1.4)
98
+ git tag v0.1.4
99
+ git push origin v0.1.4
100
+ # watch: gh run watch
101
+ ```
102
+
103
+ The secret-scan step is **mandatory** and is the only thing standing between an accidentally-committed credential and a published wheel. Do not edit the workflow to skip it without first understanding what it catches.
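The error contract in the README above depends on Python's exception hierarchy: a handler for `(RuntimeError, ValueError)` catches `LLMEndpointUnreachableError` but not `AnthropicProviderError`. The sketch below illustrates that with local stand-in classes that mirror only the documented base classes. It does not import the package, and `run_stage`, `ollama_down`, and `anthropic_failed` are hypothetical names.

```python
class AnthropicProviderError(Exception):
    """Local stand-in: derives from Exception, not RuntimeError."""


class LLMEndpointUnreachableError(RuntimeError):
    """Local stand-in: a RuntimeError subclass."""


def run_stage(call, baseline):
    """Hypothetical caller that degrades to ``baseline`` on the fallback chain."""
    try:
        return call()
    except (RuntimeError, ValueError):
        return baseline


def ollama_down():
    raise LLMEndpointUnreachableError("connection refused")


def anthropic_failed():
    raise AnthropicProviderError("rate limited")


# The Ollama-path failure is absorbed by the fallback handler.
degraded = run_stage(ollama_down, baseline={"label": "unknown"})

# The Anthropic-path failure is not matched by that handler and propagates.
try:
    run_stage(anthropic_failed, baseline={"label": "unknown"})
    escaped = False
except AnthropicProviderError:
    escaped = True
```

A caller that does want to handle Anthropic failures, for retries say, would need its own explicit `except AnthropicProviderError` clause.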
@@ -0,0 +1,43 @@
1
+ [build-system]
2
+ requires = ["setuptools>=68", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ # PyPI distribution name is namespaced with `jsmithpkp-` so the
7
+ # top-level slot isn't claimed under a generic name. The Python
8
+ # import path stays `llm_client_kit` (PyPI name and import name are
9
+ # allowed to differ, e.g. `python-dateutil` → `dateutil`), so
10
+ # consumers don't have to rewrite imports when bumping from the
11
+ # internal-pypiserver build to the public-PyPI build.
12
+ name = "jsmithpkp-llm-client-kit"
13
+ version = "0.1.3"
14
+ description = "Cached LLM client with Ollama + Anthropic provider dispatch, fixture mode, and JSON-schema-strict completions."
15
+ readme = "README.md"
16
+ requires-python = ">=3.11"
17
+ authors = [{ name = "Jonathan Smith" }]
18
+ license = { text = "MIT" }
19
+ # No runtime deps for the Ollama path (stdlib urllib). Anthropic is
20
+ # optional; consumers that use PROVIDER_ANTHROPIC must install it via
21
+ # the `anthropic` extra. Kept optional so a kit consumer that only
22
+ # uses Ollama doesn't carry the SDK install cost.
23
+ dependencies = []
24
+
25
+ [project.optional-dependencies]
26
+ anthropic = ["anthropic>=0.40"]
27
+
28
+ [project.urls]
29
+ Homepage = "https://github.com/jsmithpkp21/llm-client-kit"
30
+ Issues = "https://github.com/jsmithpkp21/llm-client-kit/issues"
31
+
32
+ [tool.setuptools.packages.find]
33
+ where = ["src"]
34
+
35
+ [tool.setuptools.package-data]
36
+ # PEP 561 marker: tells mypy/pyright the package ships its own type
37
+ # information so consumers don't have to install a separate stubs
38
+ # package or set ignore_missing_imports.
39
+ llm_client_kit = ["py.typed"]
40
+
41
+ [tool.pytest.ini_options]
42
+ testpaths = ["tests"]
43
+ pythonpath = ["src"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,116 @@
1
+ Metadata-Version: 2.4
2
+ Name: jsmithpkp-llm-client-kit
3
+ Version: 0.1.3
4
+ Summary: Cached LLM client with Ollama + Anthropic provider dispatch, fixture mode, and JSON-schema-strict completions.
5
+ Author: Jonathan Smith
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/jsmithpkp21/llm-client-kit
8
+ Project-URL: Issues, https://github.com/jsmithpkp21/llm-client-kit/issues
9
+ Requires-Python: >=3.11
10
+ Description-Content-Type: text/markdown
11
+ Provides-Extra: anthropic
12
+ Requires-Dist: anthropic>=0.40; extra == "anthropic"
13
+
14
+ # llm-client-kit
15
+
16
+ Cached, fixture-aware LLM client with **Ollama** (default) and **Anthropic** provider dispatch. Carved out of `resume-builder/src/resume_builder/llm_client.py` so it can be shared across internal Python apps without each app rolling its own dispatch + caching layer.
17
+
18
+ PyPI: [`jsmithpkp-llm-client-kit`](https://pypi.org/project/jsmithpkp-llm-client-kit/). Import path stays `llm_client_kit` (PyPI distribution name and Python import name are allowed to differ).
19
+
20
+ ## Install
21
+
22
+ ```bash
23
+ pip install jsmithpkp-llm-client-kit
24
+ ```
25
+
26
+ For the Anthropic provider path, install the extra:
27
+
28
+ ```bash
29
+ pip install 'jsmithpkp-llm-client-kit[anthropic]'
30
+ ```
31
+
32
+ ## Public API
33
+
34
+ ```python
35
+ from llm_client_kit import (
36
+ LLMClient,
37
+ LLMResponse,
38
+ LLMEndpointUnreachableError,
39
+ AnthropicProviderError,
40
+ PROVIDER_OLLAMA,
41
+ PROVIDER_ANTHROPIC,
42
+ )
43
+ ```
44
+
45
+ ### Construct from env (resume-builder-compatible)
46
+
47
+ ```python
48
+ client = LLMClient.from_env() # reads RESUME_BUILDER_LLM_* env vars
49
+ result = client.complete_json(
50
+ namespace="my_stage",
51
+ system_prompt="You are a JSON-only classifier...",
52
+ user_payload={"subject": "...", "body": "..."},
53
+ )
54
+ ```
55
+
56
+ Env vars honored:
57
+
58
+ | Env var | Default | Notes |
59
+ |---|---|---|
60
+ | `RESUME_BUILDER_LLM_PROVIDER` | `ollama` | `ollama` or `anthropic` |
61
+ | `RESUME_BUILDER_LLM_API_URL` | provider-default | OpenAI-compat for Ollama; ignored by Anthropic adapter |
62
+ | `RESUME_BUILDER_LLM_MODEL` | `llama3.1:8b` / `claude-sonnet-4-6` | per-provider default |
63
+ | `RESUME_BUILDER_LLM_API_KEY` / `ANTHROPIC_API_KEY` | unset | required for Anthropic |
64
+ | `RESUME_BUILDER_LLM_TIMEOUT_SECONDS` | `90` | per-request timeout |
65
+ | `RESUME_BUILDER_LLM_TIMEOUT_<NAMESPACE>` | unset | per-stage timeout override |
66
+ | `RESUME_BUILDER_LLM_MAX_TOKENS` | `2048` | Anthropic only |
67
+ | `RESUME_BUILDER_LLM_FIXTURE` | `0` | fixture mode (no network calls; cache-only) |
68
+ | `RESUME_BUILDER_LLM_CACHE_DIR` | `.llm_cache` / `tests/fixtures/llm_cache` (fixture mode) | response cache location |
69
+
70
+ > The env var prefix `RESUME_BUILDER_LLM_*` is preserved from the carve-out source for v0.1.0 backward compat. A future release will parameterize the prefix so consumers can use their own namespace.
71
+
72
+ ### Construct directly (no env)
73
+
74
+ ```python
75
+ from pathlib import Path
76
+ from llm_client_kit import LLMClient, PROVIDER_OLLAMA
77
+
78
+ client = LLMClient(
79
+ endpoint="http://localhost:11434/v1/chat/completions",
80
+ model="llama3.2:3b",
81
+ cache_dir=Path("./.cache"),
82
+ fixture_mode=False,
83
+ timeout_seconds=30.0,
84
+ provider=PROVIDER_OLLAMA,
85
+ )
86
+ ```
87
+
88
+ ## Error contract
89
+
90
+ `complete_json` documents four failure modes:
91
+
92
+ - `ValueError` — invalid / non-object JSON response, malformed cache payload.
93
+ - `RuntimeError` — provider-agnostic failures (cache I/O, fixture-mode miss) AND Ollama-path transport/parse failures.
94
+ - `LLMEndpointUnreachableError` (`RuntimeError` subclass) — connection refused / DNS failure / host unreachable, on the Ollama path.
95
+ - `AnthropicProviderError` (NOT a `RuntimeError` subclass) — any failure on the Anthropic path. **Intentionally** outside the `(RuntimeError, ValueError)` fallback chain: a paid API call halting loudly is better than silently degrading.
96
+
97
+ See class docstrings for the full rationale.
98
+
99
+ ## Releasing
100
+
101
+ Releases are automated. Tag-triggered workflow at `.github/workflows/release.yml`:
102
+
103
+ 1. Bump `version` in `pyproject.toml` and `__version__` in `src/llm_client_kit/__init__.py` on a PR.
104
+ 2. After the PR merges, tag `vX.Y.Z` on `main` and push the tag.
105
+ 3. The `release` workflow runs `gitleaks` against the working tree AND full git history — any finding aborts the workflow before the build. Then `python -m build` produces sdist + wheel, and `pypa/gh-action-pypi-publish` uploads via PyPI Trusted Publisher OIDC (no API token stored in repo secrets).
106
+
107
+ ```bash
108
+ # example bump-and-release flow
109
+ git switch main && git pull
110
+ # (PR merged that bumps version to 0.1.4)
111
+ git tag v0.1.4
112
+ git push origin v0.1.4
113
+ # watch: gh run watch
114
+ ```
115
+
116
+ The secret-scan step is **mandatory** and is the only thing standing between an accidentally-committed credential and a published wheel. Do not edit the workflow to skip it without first understanding what it catches.
@@ -0,0 +1,11 @@
1
+ README.md
2
+ pyproject.toml
3
+ src/jsmithpkp_llm_client_kit.egg-info/PKG-INFO
4
+ src/jsmithpkp_llm_client_kit.egg-info/SOURCES.txt
5
+ src/jsmithpkp_llm_client_kit.egg-info/dependency_links.txt
6
+ src/jsmithpkp_llm_client_kit.egg-info/requires.txt
7
+ src/jsmithpkp_llm_client_kit.egg-info/top_level.txt
8
+ src/llm_client_kit/__init__.py
9
+ src/llm_client_kit/client.py
10
+ src/llm_client_kit/py.typed
11
+ tests/test_smoke.py
@@ -0,0 +1,3 @@
1
+
2
+ [anthropic]
3
+ anthropic>=0.40
@@ -0,0 +1,21 @@
1
+ """Shared LLM client with Ollama + Anthropic provider dispatch."""
2
+
3
+ from llm_client_kit.client import (
4
+ PROVIDER_ANTHROPIC,
5
+ PROVIDER_OLLAMA,
6
+ AnthropicProviderError,
7
+ LLMClient,
8
+ LLMEndpointUnreachableError,
9
+ LLMResponse,
10
+ )
11
+
12
+ __all__ = [
13
+ "AnthropicProviderError",
14
+ "LLMClient",
15
+ "LLMEndpointUnreachableError",
16
+ "LLMResponse",
17
+ "PROVIDER_ANTHROPIC",
18
+ "PROVIDER_OLLAMA",
19
+ ]
20
+
21
+ __version__ = "0.1.3"
@@ -0,0 +1,733 @@
1
+ #!/usr/bin/env python3
2
+ """Cached LLM client with Ollama + Anthropic provider dispatch.
3
+
4
+ Lifted from resume-builder's internal ``llm_client.py`` to be a shared
5
+ dependency consumed via the internal pypiserver. Behavior is identical
6
+ to the resume-builder original at v0.1.0; the only changes are dropping
7
+ the host-app-specific path-safety guard (``assert_not_blocked_runtime_input``)
8
+ and broadening the docstrings. The env var convention (``RESUME_BUILDER_LLM_*``)
9
+ is preserved verbatim for v0.1.0 so resume-builder's adoption is a no-op
10
+ import swap — a follow-up release will parameterize the prefix.
11
+
12
+ Consumers that don't want to use the env var convention can construct
13
+ ``LLMClient`` directly via ``__init__`` instead of ``from_env``.
14
+ """
15
+
16
+ from __future__ import annotations
17
+
18
+ import errno
19
+ import hashlib
20
+ import json
21
+ import os
22
+ import socket
23
+ from dataclasses import dataclass
24
+ from pathlib import Path
25
+ from typing import Any, TypeVar
26
+
27
+ from urllib.error import URLError
28
+ from urllib.request import Request, urlopen
29
+ _LLMClientT = TypeVar("_LLMClientT", bound="LLMClient")
30
+
31
+
32
+ _DEFAULT_CHAT_ENDPOINT = "http://localhost:11434/v1/chat/completions"
33
+ _DEFAULT_MODEL = "llama3.1:8b"
34
+ # Per-provider endpoint default — picked at `from_env` time when no
35
+ # `RESUME_BUILDER_LLM_API_URL` override is set. The Anthropic adapter
36
+ # doesn't actually use `endpoint` at request time (the SDK manages the
37
+ # URL internally), but the value still gets baked into cache keys and
38
+ # diagnostic output, so defaulting to the Anthropic API URL when the
39
+ # provider is Anthropic keeps both honest.
40
+ _DEFAULT_ANTHROPIC_ENDPOINT = "https://api.anthropic.com/v1/messages"
41
+ # Bumped from 30s to 90s in #414: on local llama3.1:8b the longer
42
+ # transform_for_role / enrich_data prompts routinely exceeded 30s on a
43
+ # 12GB consumer GPU. Connection-refused / DNS failures still return
44
+ # instantly (they bypass timeout), so the higher ceiling does not slow
45
+ # down the "endpoint unreachable" diagnostic path. Per-stage env vars
46
+ # below let callers tighten or loosen individual stages.
47
+ _DEFAULT_TIMEOUT_SECONDS = 90.0
48
+ _FIXTURE_ENV = "RESUME_BUILDER_LLM_FIXTURE"
49
+ _CACHE_DIR_ENV = "RESUME_BUILDER_LLM_CACHE_DIR"
50
+ _ENDPOINT_ENV = "RESUME_BUILDER_LLM_API_URL"
51
+ _MODEL_ENV = "RESUME_BUILDER_LLM_MODEL"
52
+ _TIMEOUT_ENV = "RESUME_BUILDER_LLM_TIMEOUT_SECONDS"
53
+ _PROVIDER_ENV = "RESUME_BUILDER_LLM_PROVIDER"
54
+ _API_KEY_ENV = "RESUME_BUILDER_LLM_API_KEY"
55
+ _MAX_TOKENS_ENV = "RESUME_BUILDER_LLM_MAX_TOKENS"
56
+
57
+ # Supported provider identifiers.
58
+ PROVIDER_OLLAMA = "ollama"
59
+ PROVIDER_ANTHROPIC = "anthropic"
60
+ _DEFAULT_PROVIDER = PROVIDER_OLLAMA
61
+
62
+ # When the user opts into the Anthropic provider but doesn't override
63
+ # the model env var, default to Sonnet 4.6 — cost-efficient frontier
64
+ # model with strong instruction following on long, rule-dense prompts
65
+ # (`_BODY_SYSTEM_PROMPT_BASE` is ~4K tokens of HARD RULES + addendum).
66
+ # Override via RESUME_BUILDER_LLM_MODEL (e.g. claude-opus-4-7).
67
+ _DEFAULT_ANTHROPIC_MODEL = "claude-sonnet-4-6"
68
+
69
+ # Per-request output cap for the Anthropic adapter. Cover-letter body
70
+ # drafts return 600-1200 tokens in practice; 2048 gives headroom without
71
+ # inviting runaway output. Override via RESUME_BUILDER_LLM_MAX_TOKENS.
72
+ _DEFAULT_MAX_TOKENS = 2048
73
+ # Per-stage timeout env var prefix (#414). Resolution order at the
74
+ # request site: per-stage override (RESUME_BUILDER_LLM_TIMEOUT_<NS>) ->
75
+ # global override (RESUME_BUILDER_LLM_TIMEOUT_SECONDS) -> default.
76
+ _PER_STAGE_TIMEOUT_ENV_PREFIX = "RESUME_BUILDER_LLM_TIMEOUT_"
77
+
78
+
79
+ def _per_stage_timeout_env(namespace: str) -> str:
80
+ """Return the env var name that overrides timeout for ``namespace``.
81
+
82
+ Namespace strings are lowercase snake_case in callers
83
+ (``transform_for_role``, ``enrich_data``, ...). The matching env var
84
+ upper-cases the suffix only — the rest of the prefix is fixed so the
85
+ var name is predictable and greppable.
86
+ """
87
+ return f"{_PER_STAGE_TIMEOUT_ENV_PREFIX}{namespace.upper()}"
88
+
89
+
90
+ def _parse_timeout_env(raw: str | None) -> float:
91
+ """Parse RESUME_BUILDER_LLM_TIMEOUT_SECONDS, falling back to the default.
92
+
93
+ Empty / unset / non-numeric values fall back to the default; values
94
+ <= 0 also fall back since urlopen requires a positive timeout (or
95
+ None, but we don't expose 'no timeout' as a knob).
96
+ """
97
+ if raw is None:
98
+ return _DEFAULT_TIMEOUT_SECONDS
99
+ try:
100
+ value = float(raw.strip())
101
+ except ValueError:
102
+ return _DEFAULT_TIMEOUT_SECONDS
103
+ if value <= 0:
104
+ return _DEFAULT_TIMEOUT_SECONDS
105
+ return value
106
+
107
+
108
+ @dataclass(frozen=True)
109
+ class LLMResponse:
110
+ content: str
111
+ cache_key: str
112
+ from_cache: bool
113
+
114
+
115
+ class LLMEndpointUnreachableError(RuntimeError):
116
+ """Raised when the LLM endpoint refuses connection or DNS resolution fails.
117
+
118
+ A subclass of ``RuntimeError`` so existing callers that catch
119
+ ``(RuntimeError, ValueError)`` to degrade to baseline keep working.
120
+ Build-pipeline code that wants to emit the "endpoint unreachable"
121
+ operator warning and fast-fail subsequent stages can narrow on this
122
+ subclass without disturbing the broader transport-error path
123
+ (timeouts, mid-stream resets, etc.) which keeps using bare
124
+ ``RuntimeError`` since those are recoverable per-stage failures.
125
+ """
126
+
127
+
128
+ class AnthropicProviderError(Exception):
129
+ """Raised on any Anthropic-provider failure — **explicitly NOT a
130
+ subclass of ``RuntimeError``**.
131
+
132
+ The pipeline's existing fallback handlers catch
133
+ ``(RuntimeError, ValueError)`` to degrade gracefully to deterministic
134
+ logic when the local LLM is flaky. That's the right behavior when
135
+ a user is running on Ollama and the local model is intermittently
136
+ slow or broken — fall back to the deterministic baseline and keep
137
+ moving.
138
+
139
+ When the user has explicitly opted into the Anthropic provider via
140
+ ``RESUME_BUILDER_LLM_PROVIDER=anthropic``, that fallback is wrong:
141
+ they paid for the API call expecting frontier-model quality, and
142
+ silently degrading to non-LLM-tailored output produces something
143
+ far worse than what they asked for (and takes time + tokens to
144
+ fail at). Raising a separate exception base means Anthropic
145
+ failures propagate **uncaught** through ``(RuntimeError, ValueError)``
146
+ handlers and halt the build with a clear failure mode.
147
+
148
+ Callers must not catch this exception in their existing fallback
149
+ blocks. If a caller genuinely needs to handle Anthropic failures
150
+ (e.g. retry logic), it should catch ``AnthropicProviderError``
151
+ explicitly.
152
+
153
+ **Raised by:** ``LLMClient.complete_json()`` when
154
+ ``provider == "anthropic"``. This means ``complete_json``'s
155
+ documented ``(RuntimeError, ValueError)`` escape contract no
156
+ longer covers every path — on the Anthropic path,
157
+ ``AnthropicProviderError`` can also escape. The "anything else
158
+ escaping from here is a real bug" rule in ``complete_json``'s
159
+ docstring applies only to the Ollama path.
160
+ """
161
+
162
+
163
+ # Allowed provider identifiers. Anything else fails fast at construction
164
+ # time so a typo in `RESUME_BUILDER_LLM_PROVIDER` doesn't silently route
165
+ # requests to the Ollama endpoint while the user thinks Anthropic is
166
+ # active.
167
+ _VALID_PROVIDERS = frozenset({PROVIDER_OLLAMA, PROVIDER_ANTHROPIC})
168
+
169
+
170
+ # Errno values that indicate the LLM endpoint is not reachable at all
171
+ # (connection refused, host unreachable, DNS resolution failure). When
172
+ # urllib raises one of these the request returns essentially instantly,
173
+ # so we treat them as a single "endpoint down" signal rather than as a
174
+ # per-stage timeout.
175
+ _UNREACHABLE_ERRNOS = {
176
+ errno.ECONNREFUSED,
177
+ errno.EHOSTUNREACH,
178
+ errno.ENETUNREACH,
179
+ errno.EHOSTDOWN,
180
+ }
181
+
182
+
183
+ def _is_endpoint_unreachable(exc: BaseException) -> bool:
184
+ """Return True when ``exc`` signals an unreachable LLM endpoint.
185
+
186
+ Covers both urllib's ``URLError`` (which wraps the underlying socket
187
+ failure in ``.reason``) and the raw socket/OSError shapes. DNS
188
+ resolution failure surfaces as ``socket.gaierror`` (an OSError
189
+ subclass) and we treat it the same as connection-refused: the
190
+ operator's local LLM is not answering.
191
+ """
192
+ inner: BaseException = exc
193
+ if isinstance(exc, URLError):
194
+ reason = exc.reason
195
+ if isinstance(reason, BaseException):
196
+ inner = reason
197
+ else:
198
+ return False
199
+ if isinstance(inner, socket.gaierror):
200
+ return True
201
+ if isinstance(inner, ConnectionRefusedError):
202
+ return True
203
+ err_no = getattr(inner, "errno", None)
204
+ return err_no in _UNREACHABLE_ERRNOS
205
+
206
+
207
+ class LLMClient:
208
+ """OpenAI-compatible chat client with file cache and fixture mode."""
209
+
210
+ def __init__(
211
+ self,
212
+ *,
213
+ endpoint: str,
214
+ model: str,
215
+ cache_dir: Path,
216
+ fixture_mode: bool,
217
+ timeout_seconds: float = _DEFAULT_TIMEOUT_SECONDS,
218
+ provider: str = _DEFAULT_PROVIDER,
219
+ api_key: str | None = None,
220
+ max_tokens: int = _DEFAULT_MAX_TOKENS,
221
+ ) -> None:
222
+ if provider not in _VALID_PROVIDERS:
223
+ raise ValueError(
224
+ f"unsupported LLM provider {provider!r}; "
225
+ f"expected one of {sorted(_VALID_PROVIDERS)}. "
226
+ f"Set RESUME_BUILDER_LLM_PROVIDER to a supported value."
227
+ )
228
+ self._endpoint = endpoint
229
+ self._model = model
230
+ self._cache_dir = cache_dir
231
+ self._fixture_mode = fixture_mode
232
+ self._timeout_seconds = timeout_seconds
233
+ self._provider = provider
234
+ self._api_key = api_key
235
+ self._max_tokens = max_tokens
236
+
237
+ @property
238
+ def endpoint(self) -> str:
239
+ """Configured chat endpoint URL. Public for diagnostic / warning text."""
240
+ return self._endpoint
241
+
242
+ @property
243
+ def model(self) -> str:
244
+ """Configured model name. Public for diagnostic / warning text."""
245
+ return self._model
246
+
247
+ @property
248
+ def provider(self) -> str:
249
+ """Configured LLM provider (``ollama`` or ``anthropic``). Public for
250
+ diagnostic / warning text."""
251
+ return self._provider
252
+
253
+ @classmethod
254
+ def from_env(cls: type[_LLMClientT]) -> _LLMClientT:
255
+ fixture_mode = os.getenv(_FIXTURE_ENV, "0").strip() == "1"
256
+ default_cache_dir = "tests/fixtures/llm_cache" if fixture_mode else ".llm_cache"
257
+ cache_dir = Path(
258
+ os.getenv(_CACHE_DIR_ENV, default_cache_dir).strip() or default_cache_dir
259
+ )
260
+ provider = (
261
+ os.getenv(_PROVIDER_ENV, _DEFAULT_PROVIDER).strip() or _DEFAULT_PROVIDER
262
+ ).lower()
263
+ # Per-provider endpoint default. An explicit
264
+ # RESUME_BUILDER_LLM_API_URL override still wins; this only
265
+ # matters when the env var is unset. Picking the right default
266
+ # keeps cache keys and diagnostic output from advertising the
267
+ # Ollama URL when the actual provider is Anthropic.
268
+ default_endpoint = (
269
+ _DEFAULT_ANTHROPIC_ENDPOINT
270
+ if provider == PROVIDER_ANTHROPIC
271
+ else _DEFAULT_CHAT_ENDPOINT
272
+ )
273
+ endpoint = os.getenv(_ENDPOINT_ENV, default_endpoint).strip()
274
+ # Per-provider model default. The user can always override via
275
+ # RESUME_BUILDER_LLM_MODEL — this just picks the right "no env
276
+ # var set" default for whichever provider is active.
277
+ default_model = (
278
+ _DEFAULT_ANTHROPIC_MODEL
279
+ if provider == PROVIDER_ANTHROPIC
280
+ else _DEFAULT_MODEL
281
+ )
282
+ model = os.getenv(_MODEL_ENV, default_model).strip() or default_model
283
+ timeout_seconds = _parse_timeout_env(os.getenv(_TIMEOUT_ENV))
284
+ # API-key precedence: repo-convention env first, fall through to
285
+ # Anthropic's own convention. None is fine for the Ollama path.
286
+ api_key = os.getenv(_API_KEY_ENV) or os.getenv("ANTHROPIC_API_KEY") or None
287
+ # Mirror `_parse_timeout_env`: unset / empty / non-numeric /
288
+ # zero / negative all fall back to the default. A value <= 0
289
+ # would otherwise round-trip into the Anthropic SDK and fail at
290
+ # request time with a confusing "max_tokens must be positive"
291
+ # error far from the misconfiguration site.
292
+ max_tokens_raw = os.getenv(_MAX_TOKENS_ENV, "").strip()
293
+ if max_tokens_raw:
294
+ try:
295
+ parsed_max = int(max_tokens_raw)
296
+ except ValueError:
297
+ max_tokens = _DEFAULT_MAX_TOKENS
298
+ else:
299
+ max_tokens = parsed_max if parsed_max > 0 else _DEFAULT_MAX_TOKENS
300
+ else:
301
+ max_tokens = _DEFAULT_MAX_TOKENS
302
+ return cls(
303
+ endpoint=endpoint,
304
+ model=model,
305
+ cache_dir=cache_dir,
306
+ fixture_mode=fixture_mode,
307
+ timeout_seconds=timeout_seconds,
308
+ provider=provider,
309
+ api_key=api_key,
310
+ max_tokens=max_tokens,
311
+ )
312
+
313
+ def complete_json(
314
+ self,
315
+ *,
316
+ namespace: str,
317
+ system_prompt: str,
318
+ user_payload: dict[str, object],
319
+ response_schema: dict[str, Any] | None = None,
320
+ ) -> dict[str, Any]:
321
+ """Return a parsed JSON object from the LLM.
322
+
323
+ When ``response_schema`` is provided, the underlying request uses
324
+ OpenAI-compatible strict json_schema mode
325
+ (``response_format: {"type": "json_schema", "json_schema": {...,
326
+ "strict": true}}``) so the model is constrained to produce output
327
+ matching the schema. Ollama's OpenAI-compat layer honors this
328
+ (verified live during #436 investigation). Without a schema, the
329
+ request falls back to the prior weak ``{"type": "json_object"}``
330
+ mode which only enforces valid-JSON structure.
331
+
332
+ The schema is hashed into the cache key alongside the messages —
333
+ the same prompt with a different schema produces a different
334
+ cached response, since the model's output can legitimately differ.
335
+
336
+ Error contract (callers may rely on this for narrowed fallback
337
+ handlers):
338
+ - ``ValueError`` for response-shape failures: invalid JSON,
339
+ non-object JSON, or malformed cache payload.
340
+ - ``RuntimeError`` for **provider-agnostic** failures raised
341
+ by ``_chat`` itself (outside the upstream-call branch):
342
+ cache filesystem I/O errors (read or write) and fixture-mode
343
+ misses. Cache reads and fixture-mode misses fire before the
344
+ upstream call; cache writes fire after a successful upstream
345
+ response. All three surface on **either** path — including
346
+ the Anthropic path — because they're independent of which
347
+ provider was configured.
348
+ - ``RuntimeError`` for **Ollama-path-only** failures during
349
+ or after the upstream call: transport errors (network /
350
+ DNS / timeout / connection reset) and malformed upstream
351
+ chat-completion responses.
352
+ - ``AnthropicProviderError`` (NOT a ``RuntimeError`` subclass)
353
+ for **Anthropic-path** failures at any point in the Anthropic
354
+ path (before, during, or after the upstream call): transport
355
+ errors, API status errors (rate limit, 4xx, 5xx), missing API
356
+ key (validated before the upstream call), empty response. This
357
+ intentionally escapes the ``(RuntimeError, ValueError)``
358
+ fallback so a paid Claude call halts the build loudly instead
359
+ of silently degrading to deterministic baseline (which is far
360
+ worse than what the user paid for). See
361
+ ``AnthropicProviderError``'s docstring for the full rationale.
362
+
363
+ Callers on the **Ollama path** should catch
364
+ ``(RuntimeError, ValueError)`` to degrade gracefully; anything
365
+ else escaping from there is a real bug.
366
+
367
+ Callers on the **Anthropic path** should NOT catch
368
+ ``AnthropicProviderError`` in their fallback blocks — let it
369
+ propagate. If a caller genuinely needs to handle Anthropic
370
+ failures (e.g. for retry logic), it should catch
371
+ ``AnthropicProviderError`` explicitly.
372
+ """
373
+ messages = [
374
+ {"role": "system", "content": system_prompt},
375
+ {"role": "user", "content": json.dumps(user_payload, sort_keys=True)},
376
+ ]
377
+ response = self._chat(
378
+ messages=messages,
379
+ namespace=namespace,
380
+ response_schema=response_schema,
381
+ )
382
+ try:
383
+ parsed = json.loads(response.content)
384
+ except json.JSONDecodeError as exc:
385
+ raise ValueError(
386
+ f"LLM response was not valid JSON for namespace={namespace} "
387
+ f"cache_key={response.cache_key}"
388
+ ) from exc
389
+ if not isinstance(parsed, dict):
390
+ raise ValueError(
391
+ f"LLM JSON response must be an object for namespace={namespace} "
392
+ f"cache_key={response.cache_key}"
393
+ )
394
+ return parsed
395
+
396
+ def _chat(
397
+ self,
398
+ *,
399
+ messages: list[dict[str, str]],
400
+ namespace: str,
401
+ response_schema: dict[str, Any] | None = None,
402
+ ) -> LLMResponse:
403
+ cache_key = self._cache_key(
404
+ messages=messages, namespace=namespace, response_schema=response_schema
405
+ )
406
+ cache_path = self._cache_dir / f"llm_response_{cache_key}.json"
407
+ pass # runtime-guard call dropped in kit fork (host-app concern)
408
+
409
+ try:
410
+ cached = self._read_cache(cache_path)
411
+ except OSError as exc:
412
+ # Normalize cache-read filesystem failures (permission denied,
413
+ # unreadable file, etc.) into RuntimeError so callers that catch
414
+ # (RuntimeError, ValueError) around complete_json continue to get
415
+ # the documented fallback behavior instead of an unexpected
416
+ # OSError propagating out.
417
+ raise RuntimeError(
418
+ f"LLM cache read failed for namespace={namespace} "
419
+ f"cache_key={cache_key}: {exc}"
420
+ ) from exc
421
+ if cached is not None:
422
+ return LLMResponse(content=cached, cache_key=cache_key, from_cache=True)
423
+
424
+ if self._fixture_mode:
425
+ raise RuntimeError(
426
+ f"LLM fixture missing for namespace={namespace} cache_key={cache_key}"
427
+ )
428
+
429
+ content = self._request_chat_completion(
430
+ messages, namespace=namespace, response_schema=response_schema
431
+ )
432
+ try:
433
+ self._write_cache(cache_path=cache_path, content=content)
434
+ except OSError as exc:
435
+ # Same rationale as the read path above: a disk/permission failure
436
+ # writing the cache must not bypass callers' (RuntimeError,
437
+ # ValueError) handlers.
438
+ raise RuntimeError(
439
+ f"LLM cache write failed for namespace={namespace} "
440
+ f"cache_key={cache_key}: {exc}"
441
+ ) from exc
442
+ return LLMResponse(content=content, cache_key=cache_key, from_cache=False)
443
+
444
+ def _cache_key(
445
+ self,
446
+ *,
447
+ messages: list[dict[str, str]],
448
+ namespace: str,
449
+ response_schema: dict[str, Any] | None = None,
450
+ ) -> str:
451
+ payload: dict[str, Any] = {
452
+ "namespace": namespace,
453
+ "model": self._model,
454
+ "messages": messages,
455
+ "endpoint": self._endpoint,
456
+ # Provider is part of the key so Ollama and Anthropic don't
457
+ # share cache entries — even when the prompts and model name
458
+ # collide by accident, their output shapes and JSON-mode
459
+ # enforcement differ.
460
+ "provider": self._provider,
461
+ }
462
+ # Different schemas legitimately produce different model output for
463
+ # the same prompt, so include the schema in the cache key.
464
+ if response_schema is not None:
465
+ payload["response_schema"] = response_schema
466
+ serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
467
+ return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
468
+
469
+ def _resolve_timeout(self, namespace: str) -> float:
470
+ """Pick the timeout for a single request.
471
+
472
+ Resolution order (first valid, positive value wins):
473
+ 1. Per-stage env override ``RESUME_BUILDER_LLM_TIMEOUT_<NAMESPACE>``
474
+ 2. The client's configured timeout (which itself came from
475
+ ``RESUME_BUILDER_LLM_TIMEOUT_SECONDS`` via :meth:`from_env`,
476
+ or the default when that env var was unset).
477
+ """
478
+ raw = os.getenv(_per_stage_timeout_env(namespace))
479
+ if raw is None:
480
+ return self._timeout_seconds
481
+ try:
482
+ value = float(raw.strip())
483
+ except (ValueError, AttributeError):
484
+ return self._timeout_seconds
485
+ if value <= 0:
486
+ return self._timeout_seconds
487
+ return value
488
+
489
+ def _request_chat_completion(
490
+ self,
491
+ messages: list[dict[str, str]],
492
+ *,
493
+ namespace: str = "",
494
+ response_schema: dict[str, Any] | None = None,
495
+ ) -> str:
496
+ if self._provider == PROVIDER_ANTHROPIC:
497
+ return self._request_anthropic(
498
+ messages, namespace=namespace, response_schema=response_schema
499
+ )
500
+ # Default: weak json mode (valid JSON, any keys). When a schema is
501
+ # supplied: OpenAI-compatible strict json_schema mode — Ollama
502
+ # constrains model output to match the schema. Issue #436
503
+ # diagnosis: weak json_object mode let the model invent or drop
504
+ # keys when the JD content grew past ~1KB, collapsing the
505
+ # cover-letter body schema entirely.
506
+ if response_schema is None:
507
+ response_format: dict[str, Any] = {"type": "json_object"}
508
+ else:
509
+ response_format = {
510
+ "type": "json_schema",
511
+ "json_schema": {
512
+ "name": response_schema.get("name", "response"),
513
+ "schema": response_schema.get("schema", response_schema),
514
+ "strict": True,
515
+ },
516
+ }
517
+ payload = {
518
+ "model": self._model,
519
+ "messages": messages,
520
+ "temperature": 0,
521
+ "response_format": response_format,
522
+ }
523
+ request = Request(
524
+ self._endpoint,
525
+ data=json.dumps(payload).encode("utf-8"),
526
+ headers={"Content-Type": "application/json"},
527
+ method="POST",
528
+ )
529
+ request_timeout = (
530
+ self._resolve_timeout(namespace) if namespace else self._timeout_seconds
531
+ )
532
+ try:
533
+ with urlopen(request, timeout=request_timeout) as response: # nosec B310
534
+ body = response.read().decode("utf-8", errors="replace")
535
+ except (OSError, TimeoutError) as exc:
536
+ # urllib.error.URLError and socket.timeout both inherit from
537
+ # OSError, and response.read() can also raise raw OSError /
538
+ # TimeoutError mid-stream. Normalize all transport-level
539
+ # failures to RuntimeError so callers that catch
540
+ # (RuntimeError, ValueError) around complete_json (per its
541
+ # docstring contract) keep the documented fallback behavior
542
+ # instead of being aborted by an unexpected OSError.
543
+ #
544
+ # Endpoint-down failures (connection refused / DNS failure /
545
+ # host unreachable) are surfaced as the typed
546
+ # ``LLMEndpointUnreachableError`` subclass so the build
547
+ # pipeline can emit a single actionable warning and
548
+ # short-circuit subsequent stages instead of attempting each
549
+ # one and burning the per-stage timeout budget.
550
+ if _is_endpoint_unreachable(exc):
551
+ raise LLMEndpointUnreachableError(
552
+ f"LLM endpoint unreachable at {self._endpoint}: {exc}"
553
+ ) from exc
554
+ raise RuntimeError(f"LLM request failed: {exc}") from exc
555
+
556
+ try:
557
+ decoded = json.loads(body)
558
+ except json.JSONDecodeError as exc:
559
+ raise RuntimeError("LLM response was not valid JSON") from exc
560
+
561
+ choices = decoded.get("choices", [])
562
+ if not isinstance(choices, list) or not choices:
563
+ raise RuntimeError("LLM response missing choices")
564
+ first = choices[0]
565
+ if not isinstance(first, dict):
566
+ raise RuntimeError("LLM response choice was malformed")
567
+ message = first.get("message", {})
568
+ if not isinstance(message, dict):
569
+ raise RuntimeError("LLM response message was malformed")
570
+ content = message.get("content", "")
571
+ if not isinstance(content, str) or not content.strip():
572
+ raise RuntimeError("LLM response content was empty")
573
+ return content
574
+
575
+ def _request_anthropic(
576
+ self,
577
+ messages: list[dict[str, str]],
578
+ *,
579
+ namespace: str = "",
580
+ response_schema: dict[str, Any] | None = None,
581
+ ) -> str:
582
+ """Single Anthropic chat-completion call via the official SDK.
583
+
584
+ Returns the model's raw response string (typically JSON when
585
+ `response_schema` is set via Anthropic's native
586
+ `output_config.format` structured-output mode). Caller parses +
587
+ validates.
588
+
589
+ Caching of the system prompt is on by default: every system
590
+ message in `messages` is hoisted out of the message array into
591
+ the top-level Anthropic `system` field as a text block, with a
592
+ `cache_control: {"type": "ephemeral"}` block attached so the
593
+ prompt is cached across calls within the 5-minute TTL. The
594
+ cover-letter body system prompt is ~4K tokens — well above
595
+ Sonnet 4.6's 2048-token minimum cache prefix — and is reused
596
+ across every call in a multi-JD compare session, so this
597
+ materially cuts cost on repeat calls.
598
+
599
+ **Error contract — distinct from `_request_chat_completion`.**
600
+ All failures of the Anthropic path raise `AnthropicProviderError`,
601
+ which does NOT inherit from `RuntimeError`. The pipeline's
602
+ existing `(RuntimeError, ValueError)` fallback handlers will
603
+ not catch these — the build halts loudly instead of silently
604
+ degrading to deterministic non-LLM output. Rationale: when a
605
+ user opted into the paid Anthropic provider, falling back to
606
+ baseline produces output far worse than what they paid for,
607
+ and burns API latency + tokens producing nothing useful.
608
+ Specific cases:
609
+ - `APIConnectionError` (transport unreachable) →
610
+ `AnthropicProviderError`.
611
+ - `APIStatusError` (rate limit, 4xx, 5xx) →
612
+ `AnthropicProviderError`.
613
+ - Missing API key / empty response → `AnthropicProviderError`.
614
+
615
+ The `anthropic` SDK is lazy-imported so `import llm_client_kit`
616
+ succeeds in environments that haven't installed it (e.g. CI's
617
+ lint-only image, the default Ollama path). Same pattern as the
618
+ playwright lazy imports in `playwright_stealth_kit.launch`.
619
+ ImportError surfaces directly (not wrapped) since it's a setup
620
+ failure, not a runtime degradation candidate.
621
+ """
622
+ # API-key validation goes BEFORE the SDK import: a missing key
623
+ # is a configuration error and should surface as an
624
+ # AnthropicProviderError regardless of whether the SDK is
625
+ # installed. CI's test-fast / test-slow images don't carry the
626
+ # `anthropic` dep, so importing first would mask the api-key
627
+ # check behind an ImportError on those images.
628
+ if not self._api_key:
629
+ raise AnthropicProviderError(
630
+ "anthropic provider requires an API key via "
631
+ "RESUME_BUILDER_LLM_API_KEY or ANTHROPIC_API_KEY"
632
+ )
633
+
634
+ try:
635
+ import anthropic
636
+ from anthropic import (
637
+ APIConnectionError,
638
+ APIStatusError,
639
+ )
640
+ except ImportError as exc:
641
+ raise ImportError(
642
+ "anthropic SDK is required when "
643
+ "RESUME_BUILDER_LLM_PROVIDER=anthropic. Install: "
644
+ "pip install anthropic"
645
+ ) from exc
646
+
647
+ # Split out the system message (Anthropic puts it at the top
648
+ # level, not in the messages array) and attach a cache_control
649
+ # breakpoint so the system prompt is cached across calls.
650
+ system_blocks: list[dict[str, Any]] = []
651
+ user_messages: list[dict[str, Any]] = []
652
+ for m in messages:
653
+ role = m.get("role", "")
654
+ content = m.get("content", "")
655
+ if role == "system":
656
+ system_blocks.append(
657
+ {
658
+ "type": "text",
659
+ "text": content,
660
+ "cache_control": {"type": "ephemeral"},
661
+ }
662
+ )
663
+ else:
664
+ user_messages.append({"role": role, "content": content})
665
+
666
+ kwargs: dict[str, Any] = {
667
+ "model": self._model,
668
+ "max_tokens": self._max_tokens,
669
+ "messages": user_messages,
670
+ }
671
+ if system_blocks:
672
+ kwargs["system"] = system_blocks
673
+ if response_schema is not None:
674
+ # Anthropic's native structured-output mode. The caller's
675
+ # schema dict matches the OpenAI shape `{name, schema}`; the
676
+ # Anthropic API wants the inner schema only.
677
+ schema = response_schema.get("schema", response_schema)
678
+ kwargs["output_config"] = {
679
+ "format": {"type": "json_schema", "schema": schema}
680
+ }
681
+
682
+ timeout = (
683
+ self._resolve_timeout(namespace) if namespace else self._timeout_seconds
684
+ )
685
+ client = anthropic.Anthropic(api_key=self._api_key, timeout=timeout)
686
+ try:
687
+ response = client.messages.create(**kwargs)
688
+ except APIConnectionError as exc:
689
+ raise AnthropicProviderError(f"Anthropic API unreachable: {exc}") from exc
690
+ except APIStatusError as exc:
691
+ # Covers RateLimitError, BadRequestError, AuthenticationError,
692
+ # OverloadedError, etc. — all surface as AnthropicProviderError
693
+ # so the build halts loudly instead of silently falling back
694
+ # to deterministic baseline (which would be far worse than
695
+ # what the user paid for).
696
+ raise AnthropicProviderError(
697
+ f"Anthropic API error {exc.status_code}: {exc.message}"
698
+ ) from exc
699
+
700
+ # Response shape: `content` is a list of typed blocks; structured
701
+ # output guarantees the first text block holds the JSON payload.
702
+ for block in response.content:
703
+ if getattr(block, "type", "") == "text":
704
+ text = getattr(block, "text", "")
705
+ if not isinstance(text, str) or not text.strip():
706
+ raise AnthropicProviderError(
707
+ "Anthropic response text block was empty"
708
+ )
709
+ return text
710
+ raise AnthropicProviderError("Anthropic response had no text block")
711
+
712
+ def _read_cache(self, path: Path) -> str | None:
713
+ if not path.exists():
714
+ return None
715
+ pass # runtime-guard call dropped in kit fork (host-app concern)
716
+ raw = path.read_text(encoding="utf-8")
717
+ payload = json.loads(raw)
718
+ if not isinstance(payload, dict):
719
+ raise ValueError("LLM cache payload must be an object")
720
+ content = payload.get("content")
721
+ if not isinstance(content, str):
722
+ raise ValueError("LLM cache payload missing content string")
723
+ return content
724
+
725
+ def _write_cache(self, *, cache_path: Path, content: str) -> None:
726
+ pass # runtime-guard call dropped in kit fork (host-app concern)
727
+ cache_path.parent.mkdir(parents=True, exist_ok=True)
728
+ payload = {
729
+ "model": self._model,
730
+ "endpoint": self._endpoint,
731
+ "content": content,
732
+ }
733
+ cache_path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
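The cache-key derivation in `_cache_key` above can be illustrated outside the class. The following is a minimal sketch, not the package's API: `sketch_cache_key` is a hypothetical stand-alone function that takes the model, endpoint, and provider as arguments instead of reading them from the client instance, and the sample values are made up. It shows why the same prompt lands in a different cache file when the schema or the provider changes.

```python
import hashlib
import json
from typing import Any, Optional


def sketch_cache_key(
    *,
    namespace: str,
    model: str,
    endpoint: str,
    provider: str,
    messages: list[dict[str, str]],
    response_schema: Optional[dict[str, Any]] = None,
) -> str:
    """Hypothetical stand-alone mirror of the key derivation shown above."""
    payload: dict[str, Any] = {
        "namespace": namespace,
        "model": model,
        "messages": messages,
        "endpoint": endpoint,
        "provider": provider,
    }
    # The schema only enters the key when one is supplied.
    if response_schema is not None:
        payload["response_schema"] = response_schema
    # sort_keys plus compact separators keep the serialization stable.
    serialized = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


# Illustrative values only (assumptions, not defaults taken from the package).
base = dict(
    namespace="cover_letter",
    model="llama3.2:3b",
    endpoint="http://localhost:11434/v1/chat/completions",
    provider="ollama",
    messages=[{"role": "user", "content": "{}"}],
)
key_plain = sketch_cache_key(**base)
key_schema = sketch_cache_key(**base, response_schema={"type": "object"})
key_anthropic = sketch_cache_key(**{**base, "provider": "anthropic"})
```

Under these assumptions the three keys are distinct SHA-256 hex digests, and calling the function again with the same arguments reproduces `key_plain`, which is what lets fixture mode find a previously written `llm_response_<key>.json` file.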
@@ -0,0 +1,89 @@
1
+ """Smoke tests for the v0.1.0 carve-out.
2
+
3
+ Full behavioral coverage lives in resume-builder's
4
+ ``tests/scripts/test_llm_client.py`` (1000+ lines) and is preserved
5
+ there; this file only exercises that the kit's public API is
6
+ importable + constructs without side effects. The richer test suite
7
+ will migrate over once we genericize env vars in a follow-up release.
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ from pathlib import Path
13
+
14
+ import pytest
15
+
16
+ from llm_client_kit import (
17
+ PROVIDER_ANTHROPIC,
18
+ PROVIDER_OLLAMA,
19
+ AnthropicProviderError,
20
+ LLMClient,
21
+ LLMEndpointUnreachableError,
22
+ LLMResponse,
23
+ )
24
+
25
+
26
+ def test_public_api_importable() -> None:
27
+ """Every name in __all__ resolves to something usable."""
28
+ assert LLMClient is not None
29
+ assert LLMResponse is not None
30
+ assert LLMEndpointUnreachableError is not None
31
+ assert AnthropicProviderError is not None
32
+ assert PROVIDER_OLLAMA == "ollama"
33
+ assert PROVIDER_ANTHROPIC == "anthropic"
34
+
35
+
36
+ def test_exception_hierarchy() -> None:
37
+ """Endpoint-unreachable IS a RuntimeError (for legacy fallback
38
+ handlers); Anthropic-provider error is NOT (intentional, see
39
+ AnthropicProviderError docstring)."""
40
+ assert issubclass(LLMEndpointUnreachableError, RuntimeError)
41
+ assert not issubclass(AnthropicProviderError, RuntimeError)
42
+
43
+
44
+ def test_client_constructs_with_explicit_kwargs(tmp_path: Path) -> None:
45
+ """Direct construction is the non-env path consumers can use to
46
+ avoid the resume-builder-flavored env var convention."""
47
+ client = LLMClient(
48
+ endpoint="http://localhost:11434/v1/chat/completions",
49
+ model="llama3.2:3b",
50
+ cache_dir=tmp_path / "cache",
51
+ fixture_mode=True,
52
+ timeout_seconds=5.0,
53
+ provider=PROVIDER_OLLAMA,
54
+ )
55
+ assert client.endpoint == "http://localhost:11434/v1/chat/completions"
56
+ assert client.model == "llama3.2:3b"
57
+ assert client.provider == PROVIDER_OLLAMA
58
+
59
+
60
+ def test_client_rejects_unknown_provider(tmp_path: Path) -> None:
61
+ """Typo in the provider arg fails fast at construction so the user
62
+ sees the misconfiguration immediately, not silently routed to the
63
+ Ollama path."""
64
+ with pytest.raises(ValueError, match="unsupported LLM provider"):
65
+ LLMClient(
66
+ endpoint="http://x",
67
+ model="m",
68
+ cache_dir=tmp_path,
69
+ fixture_mode=False,
70
+ provider="gemini", # type: ignore[arg-type] — intentional bad value
71
+ )
72
+
73
+
74
+ def test_fixture_mode_misses_raise_runtime_error(tmp_path: Path) -> None:
75
+ """In fixture mode, calling complete_json without a cached entry
76
+ must raise — never silently issue a network request."""
77
+ client = LLMClient(
78
+ endpoint="http://localhost:99999", # nothing listening, would explode if called
79
+ model="llama3.2:3b",
80
+ cache_dir=tmp_path / "empty-cache",
81
+ fixture_mode=True,
82
+ timeout_seconds=1.0,
83
+ )
84
+ with pytest.raises(RuntimeError, match="LLM fixture missing"):
85
+ client.complete_json(
86
+ namespace="test",
87
+ system_prompt="x",
88
+ user_payload={"q": "y"},
89
+ )
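The exception hierarchy pinned by `test_exception_hierarchy` is what makes a narrowed caller-side fallback safe. The sketch below is an illustration under stated assumptions: the two exception classes are local stand-ins (the real ones live in `llm_client_kit`, and the stand-in for `AnthropicProviderError` derives from `Exception` only because the source says it is not a `RuntimeError` subclass), and `render_with_fallback` is a hypothetical helper, not part of the kit.

```python
class LLMEndpointUnreachableError(RuntimeError):
    """Stand-in: a RuntimeError subclass, so fallback handlers catch it."""


class AnthropicProviderError(Exception):
    """Stand-in: deliberately NOT a RuntimeError subclass."""


def render_with_fallback(call_llm, baseline):
    """Return the LLM result, or the baseline on documented Ollama-path failures."""
    try:
        return call_llm()
    except (RuntimeError, ValueError):
        # Transport errors, fixture misses, cache I/O errors, and
        # response-shape failures all degrade to the baseline.
        return baseline


def unreachable():
    raise LLMEndpointUnreachableError("connection refused")


def paid_failure():
    raise AnthropicProviderError("rate limited")


# Ollama-path failure: swallowed by the fallback handler.
result = render_with_fallback(unreachable, {"body": "baseline"})

# Anthropic-path failure: escapes the handler and reaches the caller.
try:
    render_with_fallback(paid_failure, {"body": "baseline"})
    escaped = False
except AnthropicProviderError:
    escaped = True
```

A caller that needs retry logic on the Anthropic path would, per the `complete_json` docstring, catch `AnthropicProviderError` explicitly in its own handler rather than widening the fallback tuple.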