inference-proxy 3.0.0.dev1__tar.gz → 3.0.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/PKG-INFO +167 -53
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/README.md +151 -50
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/app.py +3 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/base_types.py +34 -2
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/bootstrap.py +14 -3
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config.py +1 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/core.py +55 -33
- inference_proxy-3.0.1/lm_proxy/errors.py +43 -0
- inference_proxy-3.0.1/lm_proxy/handlers/__init__.py +7 -0
- inference_proxy-3.0.1/lm_proxy/handlers/forward_http_headers.py +70 -0
- inference_proxy-3.0.1/lm_proxy/handlers/rate_limiter.py +88 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/models_endpoint.py +2 -1
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/utils.py +2 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/pyproject.toml +39 -16
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/LICENSE +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/__init__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/__main__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/_app.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/__init__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/allow_all.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/in_config.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/with_request.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/__init__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/json.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/python.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/toml.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/yaml.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/loggers.py +0 -0
--- inference_proxy-3.0.0.dev1/PKG-INFO
+++ inference_proxy-3.0.1/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: inference-proxy
-Version: 3.0.0.dev1
+Version: 3.0.1
 Summary: Inference Proxy is an OpenAI-compatible http proxy server for inferencing various LLMs capable of working with Google, Anthropic, OpenAI APIs, local PyTorch inference, etc.
 License: MIT License
 
@@ -23,7 +23,7 @@ License: MIT License
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
-Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server
+Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server,llm gateway,openai,anthropic,google genai
 Author: Vitalii Stepanenko
 Author-email: mail@vitaliy.in
 Maintainer: Vitalii Stepanenko
@@ -36,9 +36,21 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Development Status :: 5 - Production/Stable
+Provides-Extra: all
+Provides-Extra: anthropic
+Provides-Extra: google
 Provides-Extra: test
-Requires-Dist: ai-microcore (>=5.
+Requires-Dist: ai-microcore (>=5.1.2,<6)
+Requires-Dist: anthropic (>=0.77,<1) ; extra == "all"
+Requires-Dist: anthropic (>=0.77,<1) ; extra == "anthropic"
 Requires-Dist: fastapi (>=0.121.3,<1)
+Requires-Dist: google-genai (>=1.62.0,<2) ; extra == "all"
+Requires-Dist: google-genai (>=1.62.0,<2) ; extra == "google"
 Requires-Dist: pydantic (>=2.12.5,<2.13.0)
 Requires-Dist: pytest (>=8.4.2,<8.5.0) ; extra == "test"
 Requires-Dist: pytest-asyncio (>=1.2.0,<1.3.0) ; extra == "test"
@@ -46,6 +58,7 @@ Requires-Dist: pytest-cov (>=7.0.0,<7.1.0) ; extra == "test"
 Requires-Dist: requests (>=2.32.5,<2.33.0)
 Requires-Dist: typer (>=0.16.1)
 Requires-Dist: uvicorn (>=0.22.0)
+Project-URL: Bug Tracker, https://github.com/Nayjest/lm-proxy/issues
 Project-URL: Source Code, https://github.com/Nayjest/lm-proxy
 Description-Content-Type: text/markdown
 
@@ -92,13 +105,19 @@ It works as a drop-in replacement for OpenAI's API, allowing you to switch betwe
 - [Load Balancing Example](#load-balancing-example)
 - [Google Vertex AI Example](#google-vertex-ai-configuration-example)
 - [Using Tokens from OIDC Provider as Virtual/Client API Keys](#using-tokens-from-oidc-provider-as-virtualclient-api-keys)
-- [Add-on Components](
-  - [Database Connector](#database-connector)
+- [Add-on Components](#-add-on-components)
+  - [Database Connector](#database-connector)
+- [Request Handlers (Middleware)](#-request-handlers--middleware)
+- [Guides & Reference](#-guides--reference)
+- [Known Limitations](#-known-limitations)
 - [Debugging](#-debugging)
 - [Contributing](#-contributing)
 - [License](#-license)
 
-
+<a href="#" align="center"><img alt="Inference Proxy / Gateway" src="https://raw.githubusercontent.com/Nayjest/lm-proxy/main/press-kit/assets/lm-proxy_1_hacker_1600x672.png"></a>
+
+
+## ✨ Features<a id="-features"></a>
 
 - **Provider Agnostic**: Connect to OpenAI, Anthropic, Google AI, local models, and more using a single API
 - **Unified Interface**: Access all models through the standard OpenAI API format
@@ -108,21 +127,28 @@ It works as a drop-in replacement for OpenAI's API, allowing you to switch betwe
 - **Easy Configuration**: Simple TOML/YAML/JSON/Python configuration files for setup
 - **Extensible by Design**: Minimal core with clearly defined extension points, enabling seamless customization and expansion without modifying the core system.
 
-
+
+## 🚀 Getting Started<a id="-getting-started"></a>
 
 ### Requirements
 Python 3.11 | 3.12 | 3.13
 
-### Installation
-
+### Installation<a id="installation"></a>
 ```bash
 pip install inference-proxy
 ```
+For proxying to Anthropic API or Google Gemini via Vertex AI or Google AI Studio, install optional dependencies:
+```
+pip install inference-proxy[anthropic,google]
+```
+or
+```
+pip install inference-proxy[all]
+```
 
-### Quick Start
+### Quick Start<a id="quick-start"></a>
 
 #### 1. Create a `config.toml` file:
-
 ```toml
 host = "0.0.0.0"
 port = 8000
@@ -149,7 +175,6 @@ api_keys = ["YOUR_API_KEY_HERE"]
 > To enhance security, consider storing upstream API keys in operating system environment variables rather than embedding them directly in the configuration file. You can reference these variables in the configuration using the env:<VAR_NAME> syntax.
 
 #### 2. Start the server:
-
 ```bash
 inference-proxy
 ```
@@ -159,7 +184,6 @@ python -m lm_proxy
 ```
 
 #### 3. Use it with any OpenAI-compatible client:
-
 ```python
 from openai import OpenAI
 
@@ -176,7 +200,6 @@ print(completion.choices[0].message.content)
 ```
 
 Or use the same endpoint with Claude models:
-
 ```python
 completion = client.chat.completions.create(
     model="claude-opus-4-1-20250805",  # This will be routed to Anthropic based on config
@@ -184,12 +207,12 @@ completion = client.chat.completions.create(
 )
 ```
 
-## 📝 Configuration
 
-
+## 📝 Configuration<a id="-configuration"></a>
 
-
+Inference Proxy is configured through a TOML/YAML/JSON/Python file that specifies connections, routing rules, and access control.
 
+### Basic Structure<a id="basic-structure"></a>
 ```toml
 host = "0.0.0.0"  # Interface to bind to
 port = 8000  # Port to listen on
@@ -248,19 +271,18 @@ created_at = "created_at"
 duration = "duration"
 ```
 
-### Environment Variables
+### Environment Variables<a id="environment-variables"></a>
 
 You can reference environment variables in your configuration file by prefixing values with `env:`.
 
 For example:
-
 ```toml
 [connections.openai]
 api_key = "env:OPENAI_API_KEY"
 ```
 
 At runtime, Inference Proxy automatically retrieves the value of the target variable
-(OPENAI_API_KEY) from your operating system
+(OPENAI_API_KEY) from your operating system's environment or from a .env file, if present.
 
 ### .env Files
 
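
How the `env:` prefix behaves at runtime, as a minimal sketch (illustrative only, not lm_proxy's actual loader code):

```python
# Sketch: resolving "env:"-prefixed config values as described in the hunk above.
# Illustrative only; lm_proxy's real config loader may differ.
import os


def resolve_config_value(value: str) -> str:
    """Return the value itself, or the environment variable it references."""
    if value.startswith("env:"):
        var_name = value[len("env:"):]  # e.g. "OPENAI_API_KEY"
        try:
            return os.environ[var_name]
        except KeyError:
            raise RuntimeError(f"Environment variable {var_name!r} is not set")
    return value


print(resolve_config_value("env:OPENAI_API_KEY"))  # -> contents of $OPENAI_API_KEY
```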
@@ -280,7 +302,6 @@ LM_PROXY_DEBUG=no
 ```
 
 You can also control `.env` file usage with the `--env` command-line option:
-
 ```bash
 # Use a custom .env file path
 inference-proxy --env="path/to/your/.env"
@@ -288,7 +309,8 @@ inference-proxy --env="path/to/your/.env"
 inference-proxy --env=""
 ```
 
-
+
+## 🔑 Proxy API Keys vs. Provider API Keys<a id="-proxy-api-keys-vs-provider-api-keys"></a>
 
 Inference Proxy utilizes two distinct types of API keys to facilitate secure and efficient request handling.
 
@@ -309,18 +331,17 @@ This distinction ensures a clear separation of concerns:
 Virtual API Keys manage user authentication and access within the proxy,
 while Upstream API Keys handle secure communication with external providers.
 
-## 🔌 API Usage
 
-
+## 🔌 API Usage<a id="-api-usage"></a>
 
-
+Inference Proxy implements the OpenAI chat completions API endpoint. You can use any OpenAI-compatible client to interact with it.
 
+### Chat Completions Endpoint<a id="chat-completions-endpoint"></a>
 ```http
 POST /v1/chat/completions
 ```
 
 #### Request Format
-
 ```json
 {
   "model": "gpt-3.5-turbo",
@@ -334,7 +355,6 @@ POST /v1/chat/completions
 ```
 
 #### Response Format
-
 ```json
 {
   "choices": [
@@ -351,12 +371,10 @@ POST /v1/chat/completions
 ```
 
 
-### Models List Endpoint
+### Models List Endpoint<a id="models-list-endpoint"></a>
 
 
 List and describe all models available through the API.
-
-
 ```http
 GET /v1/models
 ```
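
Querying this endpoint through the same OpenAI client shown in the Quick Start (the base_url and key are the illustrative values from that example):

```python
# List the proxy's models via the endpoint shown above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_API_KEY_HERE")
for model in client.models.list():
    print(model.id)  # e.g. "gpt*", "claude*" in the default "as_is" listing mode
```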
@@ -366,7 +384,6 @@ Routing keys can reference both **exact model names** and **model name patterns*
 
 By default, wildcard patterns are displayed as-is in the models list (e.g., `"gpt*"`, `"claude*"`).
 This behavior can be customized via the `model_listing_mode` configuration option:
-
 ```
 model_listing_mode = "as_is" | "ignore_wildcards" | "expand_wildcards"
 ```
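
The `*` patterns used in routing keys and the models list behave like shell-style wildcards; a sketch of the matching logic, assuming first-match-wins resolution (illustrative, not the project's actual resolver):

```python
# Sketch of "*"-wildcard model-name matching, mirroring a [routing] table.
# Illustrative only; lm_proxy's real resolver may differ.
from fnmatch import fnmatch

routing = {
    "gpt-4*": "openai.gpt-4",
    "claude*": "anthropic.claude-opus-4-1-20250805",
    "*": "openai.gpt-3.5-turbo",  # catch-all fallback
}


def resolve(model_name: str) -> str:
    # First pattern that matches wins; dict order is insertion order.
    for pattern, target in routing.items():
        if fnmatch(model_name, pattern):
            return target
    raise LookupError(f"no route for {model_name!r}")


print(resolve("gpt-4-turbo"))  # -> openai.gpt-4
```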
@@ -399,7 +416,6 @@ api_key = "env:ANTHROPIC_API_KEY"
 
 
 #### Response Format
-
 ```json
 {
   "object": "list",
@@ -420,23 +436,22 @@ api_key = "env:ANTHROPIC_API_KEY"
 }
 ```
 
-
+
+## 🔒 User Groups Configuration<a id="-user-groups-configuration"></a>
 
 The `[groups]` section in the configuration defines access control rules for different user groups.
 Each group can have its own set of virtual API keys and permitted connections.
 
-### Basic Group Definition
-
+### Basic Group Definition<a id="basic-group-definition"></a>
 ```toml
 [groups.default]
 api_keys = ["KEY1", "KEY2"]
 allowed_connections = "*"  # Allow access to all connections
 ```
 
-### Group-based Access Control
+### Group-based Access Control<a id="group-based-access-control"></a>
 
 You can create multiple groups to segment your users and control their access:
-
 ```toml
 # Admin group with full access
 [groups.admin]
@@ -454,7 +469,7 @@ api_keys = ["FREE_KEY_1", "FREE_KEY_2"]
 allowed_connections = "openai"  # Only allowed to use OpenAI connection
 ```
 
-### Connection Restrictions
+### Connection Restrictions<a id="connection-restrictions"></a>
 
 The `allowed_connections` parameter controls which upstream providers a group can access:
 
@@ -468,7 +483,8 @@ This allows fine-grained control over which users can access which AI providers,
 - Implementing usage quotas per group
 - Billing and cost allocation by user group
 
-
+
+### Virtual API Key Validation<a id="virtual-api-key-validation"></a>
 
 #### Overview
 
@@ -485,7 +501,6 @@ In the .py config representation, the validator function can be passed directly
 #### Example configuration for external API key validation using HTTP request to Keycloak / OpenID Connect
 
 This example shows how to validate API keys against an external service (e.g., Keycloak):
-
 ```toml
 [api_key_check]
 class = "lm_proxy.api_key_check.CheckAPIKeyWithRequest"
@@ -502,7 +517,6 @@ Authorization = "Bearer {api_key}"
 
 For more advanced authentication needs,
 you can implement a custom validator function:
-
 ```python
 # my_validators.py
 def validate_api_key(api_key: str) -> str | None:
@@ -523,7 +537,6 @@ def validate_api_key(api_key: str) -> str | None:
 ```
 
 Then reference it in your config:
-
 ```toml
 api_key_check = "my_validators.validate_api_key"
 ```
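
The hunks above show only the validator's signature; a fuller sketch, assuming (this is not confirmed by the diff) that returning a group name accepts the key and returning `None` rejects it:

```python
# my_validators.py — sketch of a complete custom validator. Assumes returning
# a group name accepts the key and None rejects it (unverified assumption).
import hmac

KNOWN_KEYS = {
    "ADMIN_KEY_1": "admin",      # maps keys to [groups.<name>] sections
    "FREE_KEY_1": "free_tier",
}


def validate_api_key(api_key: str) -> str | None:
    for known_key, group in KNOWN_KEYS.items():
        if hmac.compare_digest(api_key, known_key):  # constant-time comparison
            return group
    return None  # unknown key -> request is rejected
```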
@@ -531,11 +544,11 @@ api_key_check = "my_validators.validate_api_key"
 > In this case, the `api_keys` lists in groups are ignored, and the custom function is responsible for all validation logic.
 
 
-## 🛠️ Advanced Usage
-### Dynamic Model Routing
+## 🛠️ Advanced Usage<a id="-advanced-usage"></a>
 
-
+### Dynamic Model Routing<a id="dynamic-model-routing"></a>
 
+The routing section allows flexible pattern matching with wildcards:
 ```toml
 [routing]
 "gpt-4*" = "openai.gpt-4"  # Route gpt-4 requests to OpenAI GPT-4
@@ -548,18 +561,19 @@ The routing section allows flexible pattern matching with wildcards:
 Keys are model name patterns (with `*` wildcard support), and values are connection/model mappings.
 Connection names reference those defined in the `[connections]` section.
 
-### Load Balancing Example
+### Load Balancing Example<a id="load-balancing-example"></a>
 
 - [Simple load-balancer configuration](https://github.com/Nayjest/lm-proxy/blob/main/examples/load_balancer_config.py)
 This example demonstrates how to set up a load balancer that randomly
 distributes requests across multiple language model servers using the lm_proxy.
 
-### Google Vertex AI Configuration Example
+### Google Vertex AI Configuration Example<a id="google-vertex-ai-configuration-example"></a>
 
 - [vertex-ai.toml](https://github.com/Nayjest/lm-proxy/blob/main/examples/vertex-ai.toml)
 This example demonstrates how to connect Inference Proxy to Google Gemini model via Vertex AI API
 
-
+
+### Using Tokens from OIDC Provider as Virtual/Client API Keys<a id="using-tokens-from-oidc-provider-as-virtualclient-api-keys"></a>
 
 You can configure Inference Proxy to validate tokens from OpenID Connect (OIDC) providers like Keycloak, Auth0, or Okta as API keys.
 
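
On the client side this amounts to obtaining a token and passing it where an API key would normally go; a sketch assuming a Keycloak client-credentials flow (the realm URL, client ID, and secret are placeholders):

```python
# Sketch: use an OIDC access token as the proxy API key.
# Keycloak endpoint and credentials below are illustrative placeholders.
import requests
from openai import OpenAI

token = requests.post(
    "https://keycloak.example.com/realms/myrealm/protocol/openid-connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-app",
        "client_secret": "...",
    },
).json()["access_token"]

# The proxy validates the token via its configured api_key_check.
client = OpenAI(base_url="http://localhost:8000/v1", api_key=token)
```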
@@ -594,9 +608,95 @@ Authorization = "Bearer {api_key}"
 
 Clients pass their OIDC access token as the API key when making requests to Inference Proxy.
 
-## 🧩 Add-on Components
 
-
+## 🪝 Request Handlers (Middleware)<a id="-request-handlers--middleware"></a>
+
+Handlers intercept and modify requests *before* they reach the upstream LLM provider. They enable cross-cutting concerns such as rate limiting, logging, auditing, and header manipulation.
+
+Handlers are defined in the `before` list within the configuration file and execute sequentially in the order specified.
+
+### Built-in Handlers
+
+Inference Proxy includes several built-in handlers for common operational needs.
+
+#### Rate Limiter
+
+The `RateLimiter` protects upstream credentials and manages traffic load using a sliding window algorithm.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `max_requests` | int | Maximum number of requests allowed per window |
+| `window_seconds` | int | Duration of the sliding window in seconds |
+| `per` | string | Scope of the limit: `api_key`, `ip`, `connection`, `group`, or `global` |
+
+**Configuration:**
+```toml
+[[before]]
+class = "lm_proxy.handlers.RateLimiter"
+max_requests = 10
+window_seconds = 60
+per = "api_key"
+
+[[before]]
+class = "lm_proxy.handlers.RateLimiter"
+max_requests = 1000
+window_seconds = 300
+per = "global"
+```
+
+#### HTTP Headers Forwarder
+
+The `HTTPHeadersForwarder` passes specific headers from incoming client requests to the upstream provider—useful for distributed tracing or tenant context propagation.
+
+Sensitive headers (`Authorization`, `Host`, `Content-Length`) are stripped by default to prevent protocol corruption and credential leaks.
+```toml
+[[before]]
+class = "lm_proxy.handlers.HTTPHeadersForwarder"
+white_list_headers = ["x-trace-id", "x-correlation-id", "x-tenant-id"]
+```
+See also [HTTP Header Management](https://github.com/Nayjest/lm-proxy/blob/main/doc/http_headers.md).
+
+### Custom Handlers
+
+Extend functionality by implementing custom handlers in Python. A handler is any callable (function or class instance) that accepts a `RequestContext`.
+
+#### Interface
+```python
+from lm_proxy.base_types import RequestContext
+
+async def my_custom_handler(ctx: RequestContext) -> None:
+    # Implementation here
+    pass
+```
+
+#### Example: Audit Logger
+```python
+# my_extensions.py
+import logging
+from lm_proxy.base_types import RequestContext
+
+class AuditLogger:
+    def __init__(self, prefix: str = "AUDIT"):
+        self.prefix = prefix
+
+    async def __call__(self, ctx: RequestContext) -> None:
+        user = ctx.user_info.get("name", "anonymous")
+        logging.info(f"[{self.prefix}] User '{user}' requested model '{ctx.model}'")
+```
+
+**Registration:**
+```toml
+[[before]]
+class = "my_extensions.AuditLogger"
+prefix = "SECURITY_AUDIT"
+```
+
+
+## 🧩 Add-on Components<a id="-add-on-components"></a>
+
+### Database Connector<a id="database-connector"></a>
 
 [inference-proxy-db-connector](https://github.com/nayjest/lm-proxy-db-connector) is a lightweight SQLAlchemy-based connector that enables Inference Proxy to work with relational databases including PostgreSQL, MySQL/MariaDB, SQLite, Oracle, Microsoft SQL Server, and many others.
 
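
A sliding-window limiter of the kind the `RateLimiter` description above refers to can be sketched as follows (illustrative; not the shipped `lm_proxy.handlers.rate_limiter` code):

```python
# Sketch of a sliding-window rate limiter, per the description above.
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.events: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, scope_key: str) -> bool:
        """scope_key is the value of the configured `per` scope, e.g. an API key or IP."""
        now = time.monotonic()
        window = self.events[scope_key]
        # Drop timestamps that have slid out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # over the limit -> reject (e.g. with HTTP 429)
        window.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=10, window_seconds=60)
print(limiter.allow("user-api-key"))  # True until the 11th call within 60s
```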
@@ -605,7 +705,21 @@ Clients pass their OIDC access token as the API key when making requests to Infe
 - Share database connections across components, extensions, and custom functions
 - Built-in database logger for structured logging of AI request data
 
-
+
+## 📚 Guides & Reference<a id="-guides--reference"></a>
+
+For more detailed information, check out these articles:
+- [HTTP Header Management](https://github.com/Nayjest/lm-proxy/blob/main/doc/http_headers.md)
+
+
+## 🚧 Known Limitations<a id="-known-limitations"></a>
+
+- **Multiple generations (n > 1):** When proxying requests to Google or Anthropic APIs, only the first generation is returned. Multi-generation support is tracked in [#35](https://github.com/Nayjest/lm-proxy/issues/35).
+
+- **Model listing with wildcards / forwarding actual model metadata:** The `/v1/models` endpoint does not query upstream providers to expand wildcard patterns (e.g., `gpt*`) or fetch model metadata. Only explicitly defined model names are listed [#36](https://github.com/Nayjest/lm-proxy/issues/36).
+
+
+## 🔍 Debugging<a id="-debugging"></a>
 
 ### Overview
 When **debugging mode** is enabled,
@@ -629,7 +743,7 @@ Alternatively, you can enable or disable debugging via the command-line argument
 > CLI arguments override environment variable settings.
 
 
-## 🤝 Contributing
+## 🤝 Contributing<a id="-contributing"></a>
 
 Contributions are welcome! Please feel free to submit a Pull Request.
 
@@ -640,7 +754,7 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 5. Open a Pull Request
 
 
-## 📄 License
+## 📄 License<a id="-license"></a>
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 © 2025–2026 [Vitalii Stepanenko](mailto:mail@vitaliy.in)