omg-llmkit 0.1.0__tar.gz
- omg_llmkit-0.1.0/.github/workflows/ci.yml +33 -0
- omg_llmkit-0.1.0/.github/workflows/publish.yml +49 -0
- omg_llmkit-0.1.0/.gitignore +16 -0
- omg_llmkit-0.1.0/CHANGELOG.md +26 -0
- omg_llmkit-0.1.0/CONTRIBUTING.md +32 -0
- omg_llmkit-0.1.0/LICENSE +21 -0
- omg_llmkit-0.1.0/PKG-INFO +226 -0
- omg_llmkit-0.1.0/README.md +173 -0
- omg_llmkit-0.1.0/SECURITY.md +28 -0
- omg_llmkit-0.1.0/pyproject.toml +93 -0
- omg_llmkit-0.1.0/pyrightconfig.json +11 -0
- omg_llmkit-0.1.0/src/llmkit/__init__.py +76 -0
- omg_llmkit-0.1.0/src/llmkit/_litellm.py +145 -0
- omg_llmkit-0.1.0/src/llmkit/exceptions.py +27 -0
- omg_llmkit-0.1.0/src/llmkit/logging.py +221 -0
- omg_llmkit-0.1.0/src/llmkit/providers.py +327 -0
- omg_llmkit-0.1.0/src/llmkit/rate_limiting.py +130 -0
- omg_llmkit-0.1.0/src/llmkit/retry.py +108 -0
- omg_llmkit-0.1.0/src/llmkit/structured_output.py +357 -0
- omg_llmkit-0.1.0/src/llmkit/sync.py +29 -0
- omg_llmkit-0.1.0/tests/__init__.py +0 -0
- omg_llmkit-0.1.0/tests/test_retry.py +215 -0
- omg_llmkit-0.1.0/tests/test_structured_output_logging.py +162 -0
omg_llmkit-0.1.0/.github/workflows/ci.yml
ADDED
@@ -0,0 +1,33 @@
+name: CI
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+
+jobs:
+  check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Set up Python
+        run: uv python install 3.13
+
+      - name: Install dependencies
+        run: uv sync --extra dev
+
+      - name: Ruff (lint)
+        run: uv run ruff check .
+
+      - name: Ruff (format check)
+        run: uv run ruff format --check .
+
+      - name: basedpyright (no baseline)
+        run: uv run basedpyright
+
+      - name: Tests
+        run: uv run pytest
omg_llmkit-0.1.0/.github/workflows/publish.yml
ADDED
@@ -0,0 +1,49 @@
+name: Publish to PyPI
+
+# Publishes to PyPI when a GitHub Release is published. Uses PyPI Trusted
+# Publishing (OIDC) — there is NO API token stored in the repo. Configure the
+# matching "pending publisher" on PyPI first (see CONTRIBUTING / release notes):
+# project: omg-llmkit | owner: OMGBrews | repo: llmkit
+# workflow: publish.yml | environment: pypi
+# (PyPI distribution name is "omg-llmkit"; the import name stays "llmkit".)
+
+on:
+  release:
+    types: [published]
+  workflow_dispatch: {}
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+
+      - name: Build sdist and wheel
+        run: uv build
+
+      - name: Upload dist artifact
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+  publish:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/omg-llmkit
+    permissions:
+      id-token: write # required for PyPI Trusted Publishing (OIDC)
+    steps:
+      - name: Download dist artifact
+        uses: actions/download-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+      - name: Publish to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
omg_llmkit-0.1.0/CHANGELOG.md
ADDED
@@ -0,0 +1,26 @@
+# Changelog
+
+All notable changes to this project are documented here. The format follows
+[Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
+adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [0.1.0] — 2026-06-05
+
+Initial public release.
+
+### Added
+
+- Provider-agnostic call surface over LiteLLM (with `instructor` for structured
+  output) across OpenRouter, Google, Anthropic, and local Ollama.
+- `structured_llm_call` / `structured_llm_call_sync` — validated Pydantic output,
+  with each provider pinned to its native JSON-schema mode (never auto-`Mode.TOOLS`).
+- `text_llm_call` and `stream_text_with_log` for plain-text and streamed calls.
+- Process-global async rate limiter (`GlobalRateLimiter`, `configure_rate_limit`).
+- Transient-error retries (`with_retries`, `LLM_RECOVERABLE_ERRORS`), kept
+  separate from instructor's schema-repair retries.
+- Agent-readable logging: `LocalYamlLogSink` writes verdict-first per-call YAML
+  plus an append-only `index.jsonl`; pluggable `LogSink` protocol for custom sinks.
+- Approximate per-call cost (`approximate_cost`) sourced from LiteLLM's response
+  estimate, for budget visibility.
+
+[0.1.0]: https://github.com/OMGBrews/llmkit/releases/tag/v0.1.0
omg_llmkit-0.1.0/CONTRIBUTING.md
ADDED
@@ -0,0 +1,32 @@
+# Contributing
+
+Thanks for your interest. This is a small, opinionated, best-effort project — see
+the scope notes in the [README](README.md). Bug reports and focused pull requests
+are welcome; large feature proposals may not be a fit for the library's
+deliberately-thin design, so please open an issue to discuss before investing in
+a big change.
+
+## Development setup
+
+```bash
+uv sync --extra dev
+```
+
+## Checks must pass
+
+CI runs the same four gates on every push and pull request, with **no baseline**:
+
+```bash
+uv run ruff check .
+uv run ruff format --check .
+uv run basedpyright # 0 errors, 0 warnings
+uv run pytest
+```
+
+Please run them locally before opening a PR. New behavior needs a test.
+
+## Conventions
+
+- Keep the public surface small — `llmkit` owns the call ergonomics, not transport.
+- No `dict[str, Any]` / bare `Any`; use precise types (basedpyright enforces this).
+- Hard cuts over deprecation shims for internal changes.
omg_llmkit-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 OMGBrews
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
omg_llmkit-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,226 @@
+Metadata-Version: 2.4
+Name: omg-llmkit
+Version: 0.1.0
+Summary: A thin, opinionated, local-first structured-output + logging layer over LiteLLM
+Project-URL: Homepage, https://github.com/OMGBrews/llmkit
+Project-URL: Repository, https://github.com/OMGBrews/llmkit
+Project-URL: Issues, https://github.com/OMGBrews/llmkit/issues
+Project-URL: Changelog, https://github.com/OMGBrews/llmkit/blob/main/CHANGELOG.md
+Author: OMGBrews
+License: MIT License
+
+        Copyright (c) 2026 OMGBrews
+
+        Permission is hereby granted, free of charge, to any person obtaining a copy
+        of this software and associated documentation files (the "Software"), to deal
+        in the Software without restriction, including without limitation the rights
+        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+        copies of the Software, and to permit persons to whom the Software is
+        furnished to do so, subject to the following conditions:
+
+        The above copyright notice and this permission notice shall be included in all
+        copies or substantial portions of the Software.
+
+        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+        SOFTWARE.
+License-File: LICENSE
+Keywords: anthropic,gemini,instructor,litellm,llm,openai,structured-output
+Classifier: Development Status :: 4 - Beta
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Typing :: Typed
+Requires-Python: >=3.13
+Requires-Dist: httpx>=0.27.0
+Requires-Dist: instructor>=1.15.1
+Requires-Dist: litellm>=1.87.1
+Requires-Dist: openai>=2.0.0
+Requires-Dist: pydantic>=2.5.0
+Requires-Dist: pyyaml>=6.0.0
+Provides-Extra: dev
+Requires-Dist: basedpyright>=1.39; extra == 'dev'
+Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
+Requires-Dist: pytest>=8.0.0; extra == 'dev'
+Requires-Dist: ruff==0.15.0; extra == 'dev'
+Description-Content-Type: text/markdown
omg_llmkit-0.1.0/README.md
ADDED
@@ -0,0 +1,173 @@
+# llmkit
+
+A thin, opinionated, **local-first** layer over [LiteLLM](https://github.com/BerriAI/litellm) (with [instructor](https://github.com/567-labs/instructor) for structured output). It gives an application one provider-agnostic call surface across **OpenRouter, Google, Anthropic, and local Ollama**, with validated structured output, a global async rate limiter, transient-error retries, and **agent-readable per-call logging** out of the box.
+
+LiteLLM is the implementation of the HTTP providers; llmkit owns the ergonomic call surface, the structured-output mode pinning, the rate-limit policy, and the logging convention. It is **not** a gateway and does not reimplement transport — that is solved, and reimplementing it is the thing this library deliberately does not do.
+
+## Why llmkit
+
+- **Structured output that actually validates.** Each provider is pinned to its *native* JSON-schema mode (never instructor's auto-`Mode.TOOLS`, which silently regresses Gemini to empty shapes), and instructor's in-call validation-retry repairs truncated JSON. You pass a Pydantic model; you get a validated instance back.
+- **Provider switching is config, not code.** OpenRouter / Google / Anthropic / Ollama behind one `Provider` enum and one `LLMClientConfig`. Call sites never change when you switch.
+- **Logging tuned for coding agents.** Every call is logged verdict-first (see below) — the design assumption is that the reader is usually an LLM coding agent debugging a run, not a dashboard.
+- **Local-first, zero infra.** The default sink writes plain files to a directory. No collector, no account, no network. A pluggable `LogSink` lets you ship records anywhere later without touching call sites.
+
+## Install
+
+```bash
+uv add omg-llmkit # or: pip install omg-llmkit
+```
+
+The distribution is published as **`omg-llmkit`** (the bare `llmkit` name was already
+taken on PyPI), but the import name is just `llmkit`:
+
+```python
+import llmkit
+```
+
+Requires Python ≥ 3.13.
+
+## Quick start
+
+```python
+from pydantic import BaseModel
+from llmkit import (
+    LLMClientConfig,
+    Provider,
+    configure_llm_client,
+    structured_llm_call,
+)
+
+# Point the library at a provider once, at startup.
+configure_llm_client(lambda: LLMClientConfig(
+    provider=Provider.OPENROUTER,
+    model="google/gemini-2.5-flash",
+    api_key="sk-or-...",
+))
+
+class Summary(BaseModel):
+    title: str
+    bullets: list[str]
+
+result: Summary = await structured_llm_call(
+    prompt="Summarize the attached report.",
+    schema=Summary,
+    feature="reports",  # groups calls in the logs
+    label="exec_summary",  # names this specific call in the logs
+)
+```
+
+The public call surface:
+
+| Function | Use |
+|----------|-----|
+| `structured_llm_call(prompt, schema, feature, label, ...)` | Async, returns a validated Pydantic instance |
+| `structured_llm_call_sync(...)` | Synchronous wrapper around the above |
+| `text_llm_call(prompt, feature, label, ...)` | Async, returns plain text (coerces provider list-content blocks) |
+| `stream_text_with_log(prompt, feature, label, ...)` | Async generator yielding text chunks, logged on completion |
+
+`configure_rate_limit(...)` sets the process-global concurrency cap; `configure_llm_logging(sink)` swaps the log sink (below).
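
A process-global concurrency cap is commonly built on a shared `asyncio.Semaphore`. The sketch below illustrates that general idea with the standard library only; it is not llmkit's `GlobalRateLimiter`, and `ConcurrencyCap`, `fake_call`, and `max_concurrent` are hypothetical names chosen for the example.

```python
import asyncio


class ConcurrencyCap:
    """Hypothetical stand-in: limits how many coroutines run at once."""

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous calls observed

    async def run(self, coro_factory):
        # Wait for a free slot, then run the call while holding it.
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await coro_factory()
            finally:
                self.in_flight -= 1


async def fake_call(i: int) -> int:
    await asyncio.sleep(0.01)  # stands in for a provider request
    return i * 2


async def main() -> tuple[list[int], int]:
    cap = ConcurrencyCap(max_concurrent=2)
    results = await asyncio.gather(
        *(cap.run(lambda i=i: fake_call(i)) for i in range(6))
    )
    return list(results), cap.peak


results, peak = asyncio.run(main())
print(results, peak)
```

Six calls are submitted, but the recorded peak never exceeds the cap of two, which is the property a global limiter is meant to guarantee.
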
+
+## Logging: agent-readable by default
+
+`LocalYamlLogSink` (the default) writes **two** things to `data/llm-logs/`:
+
+1. **One YAML file per call, laid out verdict-first.** The file opens with a one-line `#` header — `ok`/`ERROR`, feature/label, resolved model, schema, duration, approximate cost — so `head -1 *.yaml` triages a whole run. Small metadata is next; the large `response` and `prompt` blobs are last, so the *head* of the file is the whole story for most reads.
+2. **A compact append-only `index.jsonl`** — one JSON line per call (file, timestamp, feature, label, model, provider, schema, duration, cost, error). Cross-call questions — "which calls errored / were slowest / most expensive / the last call for feature X" — are a single small scan instead of globbing and parsing every YAML.
+
+```
+# ok | reports/exec_summary | google/gemini-2.5-flash | Summary | 1840ms | $0.0007
+# 2026-06-05T14:22:31.004512
+
+timestamp: '2026-06-05T14:22:31.004512'
+feature: reports
+label: exec_summary
+model: google/gemini-2.5-flash
+provider: openrouter
+schema: Summary
+temperature: 0.0
+duration_ms: 1840.2
+approximate_cost: 0.0007
+error: null
+response: ...
+prompt: ...
+```
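
The verdict-first header line in the sample above can be split into fields with a few lines of Python. This is a minimal sketch that assumes every header follows the six-field layout of the one sample shown; headers for errored or schema-less calls may differ. `Verdict` and `parse_header` are hypothetical helpers, not part of llmkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    ok: bool
    feature: str
    label: str
    model: str
    schema: str
    duration: str
    cost: str


def parse_header(line: str) -> Verdict:
    """Split '# status | feature/label | model | schema | duration | cost'."""
    fields = [part.strip() for part in line.lstrip("# ").split("|")]
    status, call, model, schema, duration, cost = fields
    # Only the first '/' separates feature from label.
    feature, _, label = call.partition("/")
    return Verdict(status == "ok", feature, label, model, schema, duration, cost)


sample = "# ok | reports/exec_summary | google/gemini-2.5-flash | Summary | 1840ms | $0.0007"
verdict = parse_header(sample)
print(verdict)
```
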
+
+`approximate_cost` is LiteLLM's per-response estimate for budget visibility — **not** a billing figure (and `None` when the provider does not report it, e.g. streamed calls).
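
The cross-call questions that `index.jsonl` is meant to answer reduce to one pass over its lines. The sketch below writes a small sample index to a temporary directory and queries it. The key names (`file`, `duration_ms`, `approximate_cost`, `error`, and so on) are assumptions modelled on the YAML sample, since the README lists the index fields only informally, and the rows are made-up illustration data.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample rows; the real key names may differ from these.
rows = [
    {"file": "a.yaml", "feature": "reports", "label": "exec_summary",
     "duration_ms": 1840.2, "approximate_cost": 0.0007, "error": None},
    {"file": "b.yaml", "feature": "reports", "label": "outline",
     "duration_ms": 5210.0, "approximate_cost": None, "error": "TimeoutError"},
    {"file": "c.yaml", "feature": "search", "label": "rerank",
     "duration_ms": 640.5, "approximate_cost": 0.0002, "error": None},
]


def load_index(path: Path) -> list[dict]:
    """Read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "index.jsonl"
    index_path.write_text(
        "".join(json.dumps(row) + "\n" for row in rows), encoding="utf-8"
    )
    records = load_index(index_path)

errored = [r["file"] for r in records if r["error"] is not None]
slowest = max(records, key=lambda r: r["duration_ms"])["file"]
# A missing cost (None) is counted as zero in the running total.
total_cost = sum(r["approximate_cost"] or 0.0 for r in records)
last_reports = [r for r in records if r["feature"] == "reports"][-1]["file"]
print(errored, slowest, round(total_cost, 4), last_reports)
```
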
+
+### Write your own `LogSink`
+
+`LogSink` is a one-method `Protocol`. Records (`LLMCallRecord`, a frozen dataclass) are handed to your sink for every call; failures are swallowed so logging can never break a call. To send records somewhere other than local YAML — a database, an HTTP collector, structured stdout — implement `write` and register it:
+
+```python
+import logging
+from pathlib import Path
+from llmkit import LLMCallRecord, configure_llm_logging
+
+logger = logging.getLogger("llm-calls")
+
+class StructuredStdoutSink:
+    def write(self, record: LLMCallRecord) -> Path | None:
+        logger.info(
+            "llm_call",
+            extra={
+                "feature": record.feature,
+                "label": record.label,
+                "model": record.model,
+                "provider": record.provider,
+                "schema": record.schema,
+                "duration_ms": record.duration_ms,
+                "approximate_cost": record.approximate_cost,
+                "error": record.error,
+            },
+        )
+        return None  # nothing persisted to a path
+
+configure_llm_logging(StructuredStdoutSink())  # pass None to disable logging entirely
+```
+
+An OpenTelemetry exporter (e.g. to Langfuse/Phoenix) is a natural future `llmkit[otel]` extra; the pluggable seam makes it a non-breaking addition.
+
+## Configuration
+
+`LLMClientConfig` is flat and carries only what a call needs:
+
+```python
+@dataclass(frozen=True)
+class LLMClientConfig:
+    provider: Provider  # OPENROUTER | OLLAMA | GOOGLE | ANTHROPIC
+    model: str  # the provider's default model
+    api_key: str | None = None
+    base_url: str | None = None
+```
+
+Per-call `model=` overrides the default, so "strong/small/current" model roles are the host's concern — resolve them to a model string and pass it at the call site. The library has no opinion about roles.
+
+Register the config with `configure_llm_client(source)`, where `source` is a zero-arg callable returning an `LLMClientConfig` (re-read on each provider construction, so it tracks live settings changes).
+
+## Retries
+
+Two retry layers, kept deliberately separate:
+
+- **`with_retries()`** ([`retry.py`](src/llmkit/retry.py)) handles *transient provider* errors (429 / 503 / 5xx; the recoverable set is `LLM_RECOVERABLE_ERRORS`).
+- **instructor's own low `max_retries`** handles *schema-validation* repair (re-ask the model to fix malformed JSON).
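
The split between the two layers above can be sketched as a transport-level retry that only catches a fixed set of recoverable errors and lets everything else, such as a validation failure, pass straight through. This is a generic exponential-backoff illustration, not the contents of `retry.py`; `TransientError`, `RECOVERABLE`, `retry_transient`, and the delay values are assumptions.

```python
import asyncio


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable provider error (429 / 5xx)."""


RECOVERABLE = (TransientError,)


async def retry_transient(call, *, attempts: int = 3, base_delay: float = 0.01):
    """Retry `call` on recoverable errors, doubling the delay each time."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except RECOVERABLE:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


async def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("503 from provider")
    return "ok"


async def always_invalid() -> str:
    # Not in RECOVERABLE, so this layer does not retry it.
    raise ValueError("schema validation failed")


outcome = asyncio.run(retry_transient(flaky))
try:
    asyncio.run(retry_transient(always_invalid))
    escaped = False
except ValueError:
    escaped = True
print(outcome, calls["count"], escaped)
```

Keeping the recoverable set explicit means a schema problem is never retried as if it were a network blip; that repair is left to the other layer.
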
+
+## Development
+
+```bash
+uv sync --extra dev
+uv run ruff check . && uv run ruff format --check .
+uv run basedpyright # 0 errors, 0 warnings — no baseline
+uv run pytest
+```
+
+## Status & support
+
+`llmkit` is a small, opinionated, **best-effort** project, extracted from a real
+application and maintained in the open. It is used in production by its author
+but carries no support SLA. Bug reports and focused pull requests are welcome —
+see [CONTRIBUTING.md](CONTRIBUTING.md). For security issues, see
+[SECURITY.md](SECURITY.md).
+
+## License
+
+MIT — see [LICENSE](LICENSE).
omg_llmkit-0.1.0/SECURITY.md
ADDED
@@ -0,0 +1,28 @@
+# Security Policy
+
+## Reporting a vulnerability
+
+Please **do not** open a public issue for security problems.
+
+Report privately through GitHub's
+[private vulnerability reporting](https://github.com/OMGBrews/llmkit/security/advisories/new)
+(the **Security** tab → **Report a vulnerability**). This opens a private
+advisory visible only to the maintainers.
+
+This is a small, best-effort project. There is no formal SLA, but reports are
+taken seriously and acknowledged as soon as practical.
+
+## Scope
+
+`llmkit` is a thin client layer; it holds no credentials of its own and runs no
+network services. The most security-relevant surfaces are:
+
+- **Provider API keys** passed through `LLMClientConfig` — these live in your
+  process, never in `llmkit`.
+- **The default log sink** (`LocalYamlLogSink`) writes prompts and responses to
+  local files under `data/llm-logs/`. Treat that directory as sensitive if your
+  prompts carry secrets, and add it to `.gitignore` (the default for this repo).
+
+## Supported versions
+
+Only the latest released version receives fixes.