inference-proxy 3.0.0__tar.gz → 3.0.0.dev1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/PKG-INFO +54 -164
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/README.md +51 -150
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/app.py +0 -3
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/base_types.py +1 -12
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/bootstrap.py +3 -14
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/config.py +0 -1
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/core.py +33 -55
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/models_endpoint.py +1 -2
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/utils.py +0 -2
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/pyproject.toml +16 -36
- inference_proxy-3.0.0/lm_proxy/errors.py +0 -43
- inference_proxy-3.0.0/lm_proxy/handlers/__init__.py +0 -7
- inference_proxy-3.0.0/lm_proxy/handlers/forward_http_headers.py +0 -70
- inference_proxy-3.0.0/lm_proxy/handlers/rate_limiter.py +0 -88
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/LICENSE +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/__init__.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/__main__.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/_app.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/api_key_check/__init__.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/api_key_check/allow_all.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/api_key_check/in_config.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/api_key_check/with_request.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/config_loaders/__init__.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/config_loaders/json.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/config_loaders/python.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/config_loaders/toml.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/config_loaders/yaml.py +0 -0
- {inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/lm_proxy/loggers.py +0 -0
{inference_proxy-3.0.0 → inference_proxy-3.0.0.dev1}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: inference-proxy
-Version: 3.0.0
+Version: 3.0.0.dev1
 Summary: Inference Proxy is an OpenAI-compatible http proxy server for inferencing various LLMs capable of working with Google, Anthropic, OpenAI APIs, local PyTorch inference, etc.
 License: MIT License
 
@@ -23,7 +23,7 @@ License: MIT License
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
-Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server
+Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server
 Author: Vitalii Stepanenko
 Author-email: mail@vitaliy.in
 Maintainer: Vitalii Stepanenko
@@ -36,21 +36,9 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Intended Audience :: Developers
-Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
-Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
-Classifier: Development Status :: 5 - Production/Stable
-Provides-Extra: all
-Provides-Extra: anthropic
-Provides-Extra: google
 Provides-Extra: test
-Requires-Dist: ai-microcore (>=5.
-Requires-Dist: anthropic (>=0.77,<1) ; extra == "all"
-Requires-Dist: anthropic (>=0.77,<1) ; extra == "anthropic"
+Requires-Dist: ai-microcore (>=5.0.0.dev7,<6)
 Requires-Dist: fastapi (>=0.121.3,<1)
-Requires-Dist: google-genai (>=1.62.0,<2) ; extra == "all"
-Requires-Dist: google-genai (>=1.62.0,<2) ; extra == "google"
 Requires-Dist: pydantic (>=2.12.5,<2.13.0)
 Requires-Dist: pytest (>=8.4.2,<8.5.0) ; extra == "test"
 Requires-Dist: pytest-asyncio (>=1.2.0,<1.3.0) ; extra == "test"
@@ -104,17 +92,13 @@ It works as a drop-in replacement for OpenAI's API, allowing you to switch betwe
 - [Load Balancing Example](#load-balancing-example)
 - [Google Vertex AI Example](#google-vertex-ai-configuration-example)
 - [Using Tokens from OIDC Provider as Virtual/Client API Keys](#using-tokens-from-oidc-provider-as-virtualclient-api-keys)
-- [Add-on Components](
-- [Database Connector](#database-connector)
-- [Request Handlers (Middleware)](#-request-handlers--middleware)
-- [Guides & Reference](#-guides--reference)
-- [Known Limitations](#-known-limitations)
+- [Add-on Components](#add-on-components)
+- [Database Connector](#database-connector)
 - [Debugging](#-debugging)
 - [Contributing](#-contributing)
 - [License](#-license)
 
-
-## ✨ Features<a id="-features"></a>
+## ✨ Features
 
 - **Provider Agnostic**: Connect to OpenAI, Anthropic, Google AI, local models, and more using a single API
 - **Unified Interface**: Access all models through the standard OpenAI API format
@@ -124,28 +108,21 @@ It works as a drop-in replacement for OpenAI's API, allowing you to switch betwe
 - **Easy Configuration**: Simple TOML/YAML/JSON/Python configuration files for setup
 - **Extensible by Design**: Minimal core with clearly defined extension points, enabling seamless customization and expansion without modifying the core system.
 
-
-## 🚀 Getting Started<a id="-getting-started"></a>
+## 🚀 Getting Started
 
 ### Requirements
 Python 3.11 | 3.12 | 3.13
 
-### Installation
+### Installation
+
 ```bash
 pip install inference-proxy
 ```
-For proxying to Anthropic API or Google Gemini via Vertex AI or Google AI Studio, install optional dependencies:
-```
-pip install inference-proxy[anthropic,google]
-```
-or
-```
-pip install inference-proxy[all]
-```
 
-### Quick Start
+### Quick Start
 
 #### 1. Create a `config.toml` file:
+
 ```toml
 host = "0.0.0.0"
 port = 8000
@@ -172,6 +149,7 @@ api_keys = ["YOUR_API_KEY_HERE"]
 > To enhance security, consider storing upstream API keys in operating system environment variables rather than embedding them directly in the configuration file. You can reference these variables in the configuration using the env:<VAR_NAME> syntax.
 
 #### 2. Start the server:
+
 ```bash
 inference-proxy
 ```
@@ -181,6 +159,7 @@ python -m lm_proxy
 ```
 
 #### 3. Use it with any OpenAI-compatible client:
+
 ```python
 from openai import OpenAI
 
@@ -197,6 +176,7 @@ print(completion.choices[0].message.content)
 ```
 
 Or use the same endpoint with Claude models:
+
 ```python
 completion = client.chat.completions.create(
     model="claude-opus-4-1-20250805", # This will be routed to Anthropic based on config
@@ -204,12 +184,12 @@ completion = client.chat.completions.create(
 )
 ```
 
-
-## 📝 Configuration<a id="-configuration"></a>
+## 📝 Configuration
 
 Inference Proxy is configured through a TOML/YAML/JSON/Python file that specifies connections, routing rules, and access control.
 
-### Basic Structure
+### Basic Structure
+
 ```toml
 host = "0.0.0.0" # Interface to bind to
 port = 8000 # Port to listen on
@@ -268,18 +248,19 @@ created_at = "created_at"
 duration = "duration"
 ```
 
-### Environment Variables
+### Environment Variables
 
 You can reference environment variables in your configuration file by prefixing values with `env:`.
 
 For example:
+
 ```toml
 [connections.openai]
 api_key = "env:OPENAI_API_KEY"
 ```
 
 At runtime, Inference Proxy automatically retrieves the value of the target variable
-(OPENAI_API_KEY) from your operating system
+(OPENAI_API_KEY) from your operating system’s environment or from a .env file, if present.
 
 ### .env Files
 
@@ -299,6 +280,7 @@ LM_PROXY_DEBUG=no
 ```
 
 You can also control `.env` file usage with the `--env` command-line option:
+
 ```bash
 # Use a custom .env file path
 inference-proxy --env="path/to/your/.env"
@@ -306,8 +288,7 @@ inference-proxy --env="path/to/your/.env"
 inference-proxy --env=""
 ```
 
-
-## 🔑 Proxy API Keys vs. Provider API Keys<a id="-proxy-api-keys-vs-provider-api-keys"></a>
+## 🔑 Proxy API Keys vs. Provider API Keys
 
 Inference Proxy utilizes two distinct types of API keys to facilitate secure and efficient request handling.
 
@@ -328,17 +309,18 @@ This distinction ensures a clear separation of concerns:
 Virtual API Keys manage user authentication and access within the proxy,
 while Upstream API Keys handle secure communication with external providers.
 
-
-## 🔌 API Usage<a id="-api-usage"></a>
+## 🔌 API Usage
 
 Inference Proxy implements the OpenAI chat completions API endpoint. You can use any OpenAI-compatible client to interact with it.
 
-### Chat Completions Endpoint
+### Chat Completions Endpoint
+
 ```http
 POST /v1/chat/completions
 ```
 
 #### Request Format
+
 ```json
 {
   "model": "gpt-3.5-turbo",
@@ -352,6 +334,7 @@ POST /v1/chat/completions
 ```
 
 #### Response Format
+
 ```json
 {
   "choices": [
@@ -368,10 +351,12 @@ POST /v1/chat/completions
 ```
 
 
-### Models List Endpoint
+### Models List Endpoint
 
 
 List and describe all models available through the API.
+
+
 ```http
 GET /v1/models
 ```
@@ -381,6 +366,7 @@ Routing keys can reference both **exact model names** and **model name patterns*
 
 By default, wildcard patterns are displayed as-is in the models list (e.g., `"gpt*"`, `"claude*"`).
 This behavior can be customized via the `model_listing_mode` configuration option:
+
 ```
 model_listing_mode = "as_is" | "ignore_wildcards" | "expand_wildcards"
 ```
@@ -413,6 +399,7 @@ api_key = "env:ANTHROPIC_API_KEY"
 
 
 #### Response Format
+
 ```json
 {
   "object": "list",
@@ -433,22 +420,23 @@ api_key = "env:ANTHROPIC_API_KEY"
 }
 ```
 
-
-## 🔒 User Groups Configuration<a id="-user-groups-configuration"></a>
+## 🔒 User Groups Configuration
 
 The `[groups]` section in the configuration defines access control rules for different user groups.
 Each group can have its own set of virtual API keys and permitted connections.
 
-### Basic Group Definition
+### Basic Group Definition
+
 ```toml
 [groups.default]
 api_keys = ["KEY1", "KEY2"]
 allowed_connections = "*" # Allow access to all connections
 ```
 
-### Group-based Access Control
+### Group-based Access Control
 
 You can create multiple groups to segment your users and control their access:
+
 ```toml
 # Admin group with full access
 [groups.admin]
@@ -466,7 +454,7 @@ api_keys = ["FREE_KEY_1", "FREE_KEY_2"]
 allowed_connections = "openai" # Only allowed to use OpenAI connection
 ```
 
-### Connection Restrictions
+### Connection Restrictions
 
 The `allowed_connections` parameter controls which upstream providers a group can access:
 
@@ -480,8 +468,7 @@ This allows fine-grained control over which users can access which AI providers,
 - Implementing usage quotas per group
 - Billing and cost allocation by user group
 
-
-### Virtual API Key Validation<a id="virtual-api-key-validation"></a>
+### Virtual API Key Validation
 
 #### Overview
 
@@ -498,6 +485,7 @@ In the .py config representation, the validator function can be passed directly
 #### Example configuration for external API key validation using HTTP request to Keycloak / OpenID Connect
 
 This example shows how to validate API keys against an external service (e.g., Keycloak):
+
 ```toml
 [api_key_check]
 class = "lm_proxy.api_key_check.CheckAPIKeyWithRequest"
@@ -514,6 +502,7 @@ Authorization = "Bearer {api_key}"
 
 For more advanced authentication needs,
 you can implement a custom validator function:
+
 ```python
 # my_validators.py
 def validate_api_key(api_key: str) -> str | None:
@@ -534,6 +523,7 @@ def validate_api_key(api_key: str) -> str | None:
 ```
 
 Then reference it in your config:
+
 ```toml
 api_key_check = "my_validators.validate_api_key"
 ```
@@ -541,11 +531,11 @@ api_key_check = "my_validators.validate_api_key"
 > In this case, the `api_keys` lists in groups are ignored, and the custom function is responsible for all validation logic.
 
 
-## 🛠️ Advanced Usage
-
-### Dynamic Model Routing<a id="dynamic-model-routing"></a>
+## 🛠️ Advanced Usage
+### Dynamic Model Routing
 
 The routing section allows flexible pattern matching with wildcards:
+
 ```toml
 [routing]
 "gpt-4*" = "openai.gpt-4"     # Route gpt-4 requests to OpenAI GPT-4
@@ -558,19 +548,18 @@ The routing section allows flexible pattern matching with wildcards:
 Keys are model name patterns (with `*` wildcard support), and values are connection/model mappings.
 Connection names reference those defined in the `[connections]` section.
 
-### Load Balancing Example
+### Load Balancing Example
 
 - [Simple load-balancer configuration](https://github.com/Nayjest/lm-proxy/blob/main/examples/load_balancer_config.py)
 This example demonstrates how to set up a load balancer that randomly
 distributes requests across multiple language model servers using the lm_proxy.
 
-### Google Vertex AI Configuration Example
+### Google Vertex AI Configuration Example
 
 - [vertex-ai.toml](https://github.com/Nayjest/lm-proxy/blob/main/examples/vertex-ai.toml)
 This example demonstrates how to connect Inference Proxy to Google Gemini model via Vertex AI API
 
-
-### Using Tokens from OIDC Provider as Virtual/Client API Keys<a id="using-tokens-from-oidc-provider-as-virtualclient-api-keys"></a>
+### Using Tokens from OIDC Provider as Virtual/Client API Keys
 
 You can configure Inference Proxy to validate tokens from OpenID Connect (OIDC) providers like Keycloak, Auth0, or Okta as API keys.
 
@@ -605,95 +594,9 @@ Authorization = "Bearer {api_key}"
 
 Clients pass their OIDC access token as the API key when making requests to Inference Proxy.
 
+## 🧩 Add-on Components
 
-
-
-Handlers intercept and modify requests *before* they reach the upstream LLM provider. They enable cross-cutting concerns such as rate limiting, logging, auditing, and header manipulation.
-
-Handlers are defined in the `before` list within the configuration file and execute sequentially in the order specified.
-
-### Built-in Handlers
-
-Inference Proxy includes several built-in handlers for common operational needs.
-
-#### Rate Limiter
-
-The `RateLimiter` protects upstream credentials and manages traffic load using a sliding window algorithm.
-
-**Parameters:**
-
-| Parameter | Type | Description |
-|-----------|------|-------------|
-| `max_requests` | int | Maximum number of requests allowed per window |
-| `window_seconds` | int | Duration of the sliding window in seconds |
-| `per` | string | Scope of the limit: `api_key`, `ip`, `connection`, `group`, or `global` |
-
-**Configuration:**
-```toml
-[[before]]
-class = "lm_proxy.handlers.RateLimiter"
-max_requests = 10
-window_seconds = 60
-per = "api_key"
-
-[[before]]
-class = "lm_proxy.handlers.RateLimiter"
-max_requests = 1000
-window_seconds = 300
-per = "global"
-```
-
-#### HTTP Headers Forwarder
-
-The `HTTPHeadersForwarder` passes specific headers from incoming client requests to the upstream provider—useful for distributed tracing or tenant context propagation.
-
-Sensitive headers (`Authorization`, `Host`, `Content-Length`) are stripped by default to prevent protocol corruption and credential leaks.
-```toml
-[[before]]
-class = "lm_proxy.handlers.HTTPHeadersForwarder"
-white_list_headers = ["x-trace-id", "x-correlation-id", "x-tenant-id"]
-```
-See also [HTTP Header Management](https://github.com/Nayjest/lm-proxy/blob/main/doc/http_headers.md).
-
-### Custom Handlers
-
-Extend functionality by implementing custom handlers in Python. A handler is any callable (function or class instance) that accepts a `RequestContext`.
-
-#### Interface
-```python
-from lm_proxy.base_types import RequestContext
-
-async def my_custom_handler(ctx: RequestContext) -> None:
-    # Implementation here
-    pass
-```
-
-#### Example: Audit Logger
-```python
-# my_extensions.py
-import logging
-from lm_proxy.base_types import RequestContext
-
-class AuditLogger:
-    def __init__(self, prefix: str = "AUDIT"):
-        self.prefix = prefix
-
-    async def __call__(self, ctx: RequestContext) -> None:
-        user = ctx.user_info.get("name", "anonymous")
-        logging.info(f"[{self.prefix}] User '{user}' requested model '{ctx.model}'")
-```
-
-**Registration:**
-```toml
-[[before]]
-class = "my_extensions.AuditLogger"
-prefix = "SECURITY_AUDIT"
-```
-
-
-## 🧩 Add-on Components<a id="-add-on-components"></a>
-
-### Database Connector<a id="database-connector"></a>
+### Database Connector
 
 [inference-proxy-db-connector](https://github.com/nayjest/lm-proxy-db-connector) is a lightweight SQLAlchemy-based connector that enables Inference Proxy to work with relational databases including PostgreSQL, MySQL/MariaDB, SQLite, Oracle, Microsoft SQL Server, and many others.
 
@@ -702,21 +605,7 @@ prefix = "SECURITY_AUDIT"
 - Share database connections across components, extensions, and custom functions
 - Built-in database logger for structured logging of AI request data
 
-
-## 📚 Guides & Reference<a id="-guides--reference"></a>
-
-For more detailed information, check out these articles:
-- [HTTP Header Management](https://github.com/Nayjest/lm-proxy/blob/main/doc/http_headers.md)
-
-
-## 🚧 Known Limitations<a id="-known-limitations"></a>
-
-- **Multiple generations (n > 1):** When proxying requests to Google or Anthropic APIs, only the first generation is returned. Multi-generation support is tracked in [#35](https://github.com/Nayjest/lm-proxy/issues/35).
-
-- **Model listing with wildcards / forwarding actual model metadata:** The `/v1/models` endpoint does not query upstream providers to expand wildcard patterns (e.g., `gpt*`) or fetch model metadata. Only explicitly defined model names are listed [#36](https://github.com/Nayjest/lm-proxy/issues/36).
-
-
-## 🔍 Debugging<a id="-debugging"></a>
+## 🔍 Debugging
 
 ### Overview
 When **debugging mode** is enabled,
@@ -740,7 +629,7 @@ Alternatively, you can enable or disable debugging via the command-line argument
 > CLI arguments override environment variable settings.
 
 
-## 🤝 Contributing
+## 🤝 Contributing
 
 Contributions are welcome! Please feel free to submit a Pull Request.
 
@@ -751,7 +640,8 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 5. Open a Pull Request
 
 
-## 📄 License
+## 📄 License
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 © 2025–2026 [Vitalii Stepanenko](mailto:mail@vitaliy.in)
+
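Since the new version is a `.dev` pre-release, standard tooling will not pick it up by default: pip only installs development versions when they are pinned exactly or when pre-releases are explicitly allowed. A minimal sketch of how one might install this build for testing (general pip behavior, not something specified by the package itself):

```bash
# Pin the exact dev build listed in this diff ...
pip install inference-proxy==3.0.0.dev1

# ... or allow pip to consider pre-release versions in general
pip install --pre inference-proxy
```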