inference-proxy 2.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Vitalii Stepanenko

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,643 @@
Metadata-Version: 2.3
Name: inference-proxy
Version: 2.1.0
Summary: "Inference Proxy" is an OpenAI-compatible HTTP proxy server for running inference on various LLMs; it works with the Google, Anthropic, and OpenAI APIs, local PyTorch inference, etc.
License: MIT License

Copyright (c) 2025 Vitalii Stepanenko

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server
Author: Vitalii Stepanenko
Author-email: mail@vitalii.in
Maintainer: Vitalii Stepanenko
Maintainer-email: mail@vitalii.in
Requires-Python: >=3.11,<4
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: License :: OSI Approved :: MIT License
Requires-Dist: ai-microcore (>=4.4.4,<4.5.0)
Requires-Dist: fastapi (>=0.116.1,<0.117.0)
Requires-Dist: pydantic (>=2.12.3,<2.13.0)
Requires-Dist: requests (>=2.32.3,<2.33.0)
Requires-Dist: typer (>=0.16.1)
Requires-Dist: uvicorn (>=0.22.0)
Project-URL: Source Code, https://github.com/Nayjest/lm-proxy
Description-Content-Type: text/markdown

<h1 align="center"><a href="#">Inference Proxy</a></h1>
<p align="center">
<b>Lightweight, OpenAI-compatible HTTP proxy server / gateway</b><br>unifying access to multiple <b>Large Language Model providers</b> and local inference <br>through a single, standardized API endpoint.
</p>
<p align="center">
<a href="https://pypi.org/project/lm-proxy/"><img src="https://img.shields.io/pypi/v/lm-proxy?color=blue" alt="PyPI"></a>
<a href="https://github.com/Nayjest/lm-proxy/actions/workflows/tests.yml"><img src="https://github.com/Nayjest/lm-proxy/actions/workflows/tests.yml/badge.svg" alt="Tests"></a>
<a href="https://github.com/Nayjest/lm-proxy/actions/workflows/code-style.yml"><img src="https://github.com/Nayjest/lm-proxy/actions/workflows/code-style.yml/badge.svg" alt="Code Style"></a>
<img src="https://raw.githubusercontent.com/Nayjest/lm-proxy/main/coverage.svg" alt="Code Coverage">
<a href="https://www.bestpractices.dev/projects/11364"><img src="https://www.bestpractices.dev/projects/11364/badge"></a>
<a href="https://github.com/Nayjest/lm-proxy/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Nayjest/lm-proxy?color=d08aff" alt="License"></a>
</p>

Built with Python, FastAPI and [MicroCore](https://github.com/Nayjest/ai-microcore), **Inference Proxy** seamlessly integrates cloud providers like Google, Anthropic, and OpenAI, as well as local PyTorch-based inference, while maintaining full compatibility with OpenAI's API format.

It works as a drop-in replacement for OpenAI's API, allowing you to switch between cloud providers and local models without modifying your existing client code.

**Inference Proxy** supports **real-time token streaming**, **secure Virtual API key management**, and can be used both as an importable Python library and as a standalone HTTP service. Whether you're building production applications or experimenting with different models, Inference Proxy eliminates integration complexity and keeps your codebase **provider-agnostic**.

## Table of Contents
- [Overview](#inference-proxy)
- [Features](#-features)
- [Getting Started](#-getting-started)
  - [Installation](#installation)
  - [Quick Start](#quick-start)
- [Configuration](#-configuration)
  - [Basic Structure](#basic-structure)
  - [Environment Variables](#environment-variables)
- [Proxy API Keys vs. Provider API Keys](#-proxy-api-keys-vs-provider-api-keys)
- [API Usage](#-api-usage)
  - [Chat Completions Endpoint](#chat-completions-endpoint)
  - [Models List Endpoint](#models-list-endpoint)
- [User Groups Configuration](#-user-groups-configuration)
  - [Basic Group Definition](#basic-group-definition)
  - [Group-based Access Control](#group-based-access-control)
  - [Connection Restrictions](#connection-restrictions)
  - [Virtual API Key Validation](#virtual-api-key-validation)
- [Advanced Usage](#%EF%B8%8F-advanced-usage)
  - [Dynamic Model Routing](#dynamic-model-routing)
  - [Load Balancing Example](#load-balancing-example)
  - [Google Vertex AI Example](#google-vertex-ai-configuration-example)
  - [Using Tokens from OIDC Provider as Virtual/Client API Keys](#using-tokens-from-oidc-provider-as-virtualclient-api-keys)
- [Add-on Components](#add-on-components)
  - [Database Connector](#database-connector)
- [Debugging](#-debugging)
- [Contributing](#-contributing)
- [License](#-license)

## ✨ Features

- **Provider Agnostic**: Connect to OpenAI, Anthropic, Google AI, local models, and more using a single API
- **Unified Interface**: Access all models through the standard OpenAI API format
- **Dynamic Routing**: Route requests to different LLM providers based on model name patterns
- **Stream Support**: Full streaming support for real-time responses
- **API Key Management**: Configurable API key validation and access control
- **Easy Configuration**: Simple TOML/YAML/JSON/Python configuration files for setup
- **Extensible by Design**: Minimal core with clearly defined extension points, enabling customization and expansion without modifying the core system

## 🚀 Getting Started

### Requirements
Python 3.11 | 3.12 | 3.13

### Installation

```bash
pip install inference-proxy
```

### Quick Start

#### 1. Create a `config.toml` file:

```toml
host = "0.0.0.0"
port = 8000

[connections]
[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"

[connections.anthropic]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"

[routing]
"gpt*" = "openai.*"
"claude*" = "anthropic.*"
"*" = "openai.gpt-3.5-turbo"

[groups.default]
api_keys = ["YOUR_API_KEY_HERE"]
```
> **Note** ℹ️
> To enhance security, store upstream API keys in operating system environment variables rather than embedding them directly in the configuration file. You can reference these variables in the configuration using the `env:<VAR_NAME>` syntax.

#### 2. Start the server:

```bash
inference-proxy
```
Alternatively, run it as a Python module:
```bash
python -m lm_proxy
```

#### 3. Use it with any OpenAI-compatible client:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY_HERE",
    base_url="http://localhost:8000/v1"
)

completion = client.chat.completions.create(
    model="gpt-5",  # This will be routed to OpenAI based on config
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(completion.choices[0].message.content)
```

Or use the same endpoint with Claude models:

```python
completion = client.chat.completions.create(
    model="claude-opus-4-1-20250805",  # This will be routed to Anthropic based on config
    messages=[{"role": "user", "content": "Hello, world!"}]
)
```

## 📝 Configuration

Inference Proxy is configured through a TOML/YAML/JSON/Python file that specifies connections, routing rules, and access control.

### Basic Structure

```toml
host = "0.0.0.0"        # Interface to bind to
port = 8000             # Port to listen on
dev_autoreload = false  # Enable for development

# API key validation function (optional)
api_key_check = "lm_proxy.api_key_check.check_api_key_in_config"

# LLM Provider Connections
[connections]

[connections.openai]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"

[connections.google]
api_type = "google_ai_studio"
api_key = "env:GOOGLE_API_KEY"

[connections.anthropic]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"

# Routing rules (model_pattern = "connection.model")
[routing]
"gpt*" = "openai.*"            # Route all GPT models to OpenAI
"claude*" = "anthropic.*"      # Route all Claude models to Anthropic
"gemini*" = "google.*"         # Route all Gemini models to Google
"*" = "openai.gpt-3.5-turbo"   # Default fallback

# Access control groups
[groups.default]
api_keys = [
    "KEY1",
    "KEY2"
]

# optional
[[loggers]]
class = 'lm_proxy.loggers.BaseLogger'
[loggers.log_writer]
class = 'lm_proxy.loggers.log_writers.JsonLogWriter'
file_name = 'storage/json.log'
[loggers.entry_transformer]
class = 'lm_proxy.loggers.LogEntryTransformer'
completion_tokens = "response.usage.completion_tokens"
prompt_tokens = "response.usage.prompt_tokens"
prompt = "request.messages"
response = "response"
group = "group"
connection = "connection"
api_key_id = "api_key_id"
remote_addr = "remote_addr"
created_at = "created_at"
duration = "duration"
```

### Environment Variables

You can reference environment variables in your configuration file by prefixing values with `env:`.

For example:

```toml
[connections.openai]
api_key = "env:OPENAI_API_KEY"
```

At runtime, Inference Proxy automatically retrieves the value of the target variable (`OPENAI_API_KEY`) from your operating system's environment or from a `.env` file, if present.
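
The resolution behavior described above can be sketched in a few lines of Python. This is only an illustration of the `env:` prefix convention, not lm_proxy's actual implementation, and `resolve_config_value` is a hypothetical helper name:

```python
import os

def resolve_config_value(value: str) -> str:
    """Resolve an "env:VAR_NAME" reference against the process environment.

    Plain values are returned unchanged; a missing variable resolves to "".
    (Illustrative sketch only, not lm_proxy's actual implementation.)
    """
    if value.startswith("env:"):
        return os.environ.get(value[len("env:"):], "")
    return value

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_config_value("env:OPENAI_API_KEY"))  # sk-demo
print(resolve_config_value("literal-key"))         # literal-key
```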

### .env Files

By default, Inference Proxy looks for a `.env` file in the current working directory and loads environment variables from it.

You can refer to the [.env.template](https://github.com/Nayjest/lm-proxy/blob/main/.env.template) file for an example:
```dotenv
OPENAI_API_KEY=sk-u........
GOOGLE_API_KEY=AI........
ANTHROPIC_API_KEY=sk-ant-api03--vE........

# "1", "TRUE", "YES", "ON", "ENABLED", "Y", "+" are true, case-insensitive.
# See https://github.com/Nayjest/ai-microcore/blob/v4.4.3/microcore/configuration.py#L36
LM_PROXY_DEBUG=no
```

You can also control `.env` file usage with the `--env` command-line option:

```bash
# Use a custom .env file path
inference-proxy --env="path/to/your/.env"
# Disable .env loading
inference-proxy --env=""
```

## 🔑 Proxy API Keys vs. Provider API Keys

Inference Proxy uses two distinct types of API keys to handle requests securely and efficiently.

- **Proxy API Key (Virtual API Key, Client API Key):**
  A unique key generated and managed within Inference Proxy.
  Clients use these keys to authenticate their requests to the proxy's API endpoints.
  Each Client API Key is associated with a specific group, which defines the scope of access and permissions for the client's requests.
  These keys allow users to securely interact with the proxy without direct access to external service credentials.

- **Provider API Key (Upstream API Key):**
  A key issued by an external LLM inference provider (e.g., OpenAI, Anthropic, Mistral) and configured within Inference Proxy.
  The proxy uses these keys to authenticate and forward validated client requests to the respective external services.
  Provider API Keys remain hidden from end users, ensuring secure and transparent communication with provider APIs.

This distinction ensures a clear separation of concerns: Virtual API Keys manage user authentication and access within the proxy, while Upstream API Keys handle secure communication with external providers.

308
## 🔌 API Usage

Inference Proxy implements the OpenAI chat completions API endpoint. You can use any OpenAI-compatible client to interact with it.

### Chat Completions Endpoint

```http
POST /v1/chat/completions
```

#### Request Format

```json
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "stream": false
}
```

#### Response Format

```json
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ]
}
```
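
Setting `"stream": true` on the same endpoint makes the proxy emit incremental chunks whose `delta` fragments are concatenated client-side. A minimal sketch follows; the live request is commented out and assumes a proxy running on `localhost:8000` with the `openai` client package installed, and `join_stream_deltas` is an illustrative helper, not part of lm_proxy:

```python
def join_stream_deltas(chunks: list[dict]) -> str:
    """Concatenate the content deltas from streamed chat-completion chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            parts.append(delta)
    return "".join(parts)

# Against a running proxy (assumes default host/port and a valid virtual key):
# from openai import OpenAI
# client = OpenAI(api_key="YOUR_API_KEY_HERE", base_url="http://localhost:8000/v1")
# stream = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# for chunk in stream:
#     print(chunk.choices[0].delta.content or "", end="", flush=True)

sample = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},  # final chunk typically carries no content
]
print(join_stream_deltas(sample))  # Hello
```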

### Models List Endpoint

List and describe all models available through the API.

```http
GET /v1/models
```

The **Inference Proxy** dynamically builds the models list based on routing rules defined in `config.routing`.
Routing keys can reference both **exact model names** and **model name patterns** (e.g., `"gpt*"`, `"claude*"`).

By default, wildcard patterns are displayed as-is in the models list (e.g., `"gpt*"`, `"claude*"`).
This behavior can be customized via the `model_listing_mode` configuration option:

```
model_listing_mode = "as_is" | "ignore_wildcards" | "expand_wildcards"
```

Available modes:

- **`as_is`** *(default)* — Lists all entries exactly as defined in the routing configuration, including wildcard patterns.
- **`ignore_wildcards`** — Excludes wildcard patterns, showing only explicitly defined model names.
- **`expand_wildcards`** — Expands wildcard patterns by querying each connected backend for available models *(feature not yet implemented)*.

To obtain a complete and accurate model list in the current implementation, all supported models must be explicitly defined in the routing configuration, for example:
```toml
[routing]
"gpt-4" = "my_openai_connection.*"
"gpt-5" = "my_openai_connection.*"
"gpt-8" = "my_openai_connection.gpt-3.5-turbo"
"claude-4.5-sonnet" = "my_anthropic_connection.claude-sonnet-4-5-20250929"
"claude-4.1-opus" = "my_anthropic_connection.claude-opus-4-1-20250805"

[connections]
[connections.my_openai_connection]
api_type = "open_ai"
api_base = "https://api.openai.com/v1/"
api_key = "env:OPENAI_API_KEY"

[connections.my_anthropic_connection]
api_type = "anthropic"
api_key = "env:ANTHROPIC_API_KEY"
```

#### Response Format

```json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-6",
      "object": "model",
      "created": 1686935002,
      "owned_by": "organization-owner"
    },
    {
      "id": "claude-5-sonnet",
      "object": "model",
      "created": 1686935002,
      "owned_by": "organization-owner"
    }
  ]
}
```
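
The endpoint can be queried with any HTTP client; the sketch below parses the response shape shown above. The live request is commented out and assumes a locally running proxy, and `list_model_ids` is an illustrative helper, not part of lm_proxy:

```python
def list_model_ids(payload: dict) -> list[str]:
    """Extract model ids from a /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]

# Against a running proxy (assumes default host/port and a valid virtual key):
# import requests
# resp = requests.get(
#     "http://localhost:8000/v1/models",
#     headers={"Authorization": "Bearer YOUR_API_KEY_HERE"},
#     timeout=10,
# )
# print(list_model_ids(resp.json()))

sample = {"object": "list", "data": [{"id": "gpt-6"}, {"id": "claude-5-sonnet"}]}
print(list_model_ids(sample))  # ['gpt-6', 'claude-5-sonnet']
```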

## 🔒 User Groups Configuration

The `[groups]` section in the configuration defines access control rules for different user groups.
Each group can have its own set of virtual API keys and permitted connections.

### Basic Group Definition

```toml
[groups.default]
api_keys = ["KEY1", "KEY2"]
allowed_connections = "*"  # Allow access to all connections
```

### Group-based Access Control

You can create multiple groups to segment your users and control their access:

```toml
# Admin group with full access
[groups.admin]
api_keys = ["ADMIN_KEY_1", "ADMIN_KEY_2"]
allowed_connections = "*"  # Access to all connections

# Regular users with limited access
[groups.users]
api_keys = ["USER_KEY_1", "USER_KEY_2"]
allowed_connections = "openai,anthropic"  # Only allowed to use specific connections

# Free tier with minimal access
[groups.free]
api_keys = ["FREE_KEY_1", "FREE_KEY_2"]
allowed_connections = "openai"  # Only allowed to use the OpenAI connection
```

### Connection Restrictions

The `allowed_connections` parameter controls which upstream providers a group can access:

- `"*"` - Group can use all configured connections
- `"openai,anthropic"` - Comma-separated list of specific connections the group can use

This allows fine-grained control over which users can access which AI providers, enabling features like:

- Restricting expensive models to premium users
- Creating specialized access tiers for different user groups
- Implementing usage quotas per group
- Billing and cost allocation by user group

### Virtual API Key Validation

#### Overview

Inference Proxy includes two built-in methods for validating Virtual API keys:
- `lm_proxy.api_key_check.check_api_key_in_config` - verifies API keys against those defined in the config file (used by default)
- `lm_proxy.api_key_check.CheckAPIKeyWithRequest` - validates API keys via an external HTTP service

The API key check method can be configured using the `api_key_check` configuration key.
Its value can be either a reference to a Python function in the format `my_module.sub_module1.sub_module2.fn_name`, or an object containing parameters for a class-based validator.

In a Python (`.py`) config, the validator function can be passed directly as a callable.

#### Example: External API Key Validation via HTTP Request to Keycloak / OpenID Connect

This example shows how to validate API keys against an external service (e.g., Keycloak):

```toml
[api_key_check]
class = "lm_proxy.api_key_check.CheckAPIKeyWithRequest"
method = "POST"
url = "http://keycloak:8080/realms/master/protocol/openid-connect/userinfo"
response_as_user_info = true  # interpret response JSON as user info object for further processing / logging
use_cache = true              # requires installing cachetools if true: pip install cachetools
cache_ttl = 60                # Cache duration in seconds

[api_key_check.headers]
Authorization = "Bearer {api_key}"
```
#### Custom API Key Validation / Extending Functionality

For more advanced authentication needs, you can implement a custom validator function:

```python
# my_validators.py
def validate_api_key(api_key: str) -> str | None:
    """
    Validate an API key and return the group name if valid.

    Args:
        api_key: The API key to validate

    Returns:
        The name of the group if valid, None otherwise
    """
    if api_key == "secret-key":
        return "admin"
    if api_key.startswith("user-"):
        return "users"
    return None
```

Then reference it in your config:

```toml
api_key_check = "my_validators.validate_api_key"
```
> **Note** ℹ️
> In this case, the `api_keys` lists in groups are ignored, and the custom function is responsible for all validation logic.

## 🛠️ Advanced Usage

### Dynamic Model Routing

The routing section allows flexible pattern matching with wildcards:

```toml
[routing]
"gpt-4*" = "openai.gpt-4"            # Route gpt-4 requests to OpenAI GPT-4
"gpt-3.5*" = "openai.gpt-3.5-turbo"  # Route gpt-3.5 requests to OpenAI
"claude*" = "anthropic.*"            # Pass model name as-is to Anthropic
"gemini*" = "google.*"               # Pass model name as-is to Google
"custom*" = "local.llama-7b"         # Map any "custom*" to a specific local model
"*" = "openai.gpt-3.5-turbo"         # Default fallback for unmatched models
```
Keys are model name patterns (with `*` wildcard support), and values are connection/model mappings.
Connection names reference those defined in the `[connections]` section.
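
First-match wildcard resolution of this kind can be sketched with Python's `fnmatch`. This illustrates the semantics described above under an assumed top-to-bottom matching order; it is not lm_proxy's actual resolver, and `resolve_route` is a hypothetical helper:

```python
from fnmatch import fnmatch

ROUTING = {
    "gpt-4*": "openai.gpt-4",
    "claude*": "anthropic.*",
    "*": "openai.gpt-3.5-turbo",
}

def resolve_route(model: str, routing: dict[str, str]) -> tuple[str, str]:
    """Return (connection, upstream_model) for the first matching pattern.

    A "*" on the right-hand side passes the requested model name through.
    (Illustrative sketch; assumes top-to-bottom matching order.)
    """
    for pattern, target in routing.items():
        if fnmatch(model, pattern):
            connection, upstream = target.split(".", 1)
            return connection, (model if upstream == "*" else upstream)
    raise ValueError(f"no route for {model!r}")

print(resolve_route("claude-4.1-opus", ROUTING))  # ('anthropic', 'claude-4.1-opus')
print(resolve_route("mistral-7b", ROUTING))       # ('openai', 'gpt-3.5-turbo')
```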

### Load Balancing Example

- [Simple load-balancer configuration](https://github.com/Nayjest/lm-proxy/blob/main/examples/load_balancer_config.py)
  This example demonstrates how to set up a load balancer that randomly distributes requests across multiple language model servers using lm_proxy.

### Google Vertex AI Configuration Example

- [vertex-ai.toml](https://github.com/Nayjest/lm-proxy/blob/main/examples/vertex-ai.toml)
  This example demonstrates how to connect Inference Proxy to Google Gemini models via the Vertex AI API.

### Using Tokens from OIDC Provider as Virtual/Client API Keys

You can configure Inference Proxy to validate tokens from OpenID Connect (OIDC) providers like Keycloak, Auth0, or Okta as API keys.

The following configuration validates Keycloak access tokens by calling the userinfo endpoint:
```toml
[api_key_check]
class = "lm_proxy.api_key_check.CheckAPIKeyWithRequest"
method = "POST"
url = "http://keycloak:8080/realms/master/protocol/openid-connect/userinfo"
response_as_user_info = true
use_cache = true
cache_ttl = 60

[api_key_check.headers]
Authorization = "Bearer {api_key}"
```

**Configuration Parameters:**

- `class` - The API key validation handler class ([lm_proxy.api_key_check.CheckAPIKeyWithRequest](https://github.com/Nayjest/lm-proxy/blob/main/lm_proxy/api_key_check/with_request.py))
- `method` - HTTP method for the validation request (typically `POST` or `GET`)
- `url` - The OIDC provider's userinfo endpoint URL
- `response_as_user_info` - Parse the response as user information for further use in Inference Proxy (extend logged info, determine user group, etc.)
- `use_cache` - Enable caching of validation results (requires the `cachetools` package if enabled: `pip install cachetools`)
- `cache_ttl` - Cache time-to-live in seconds (reduces load on the identity provider)
- `headers` - Dictionary of headers to send with the validation request

> **Note** ℹ️
> The `{api_key}` placeholder can be used in headers or in the URL. Inference Proxy substitutes it with the API key from the client to perform the check.
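
The placeholder substitution amounts to simple string formatting. A sketch, where `render_headers` is an illustrative helper and not part of lm_proxy's public API:

```python
def render_headers(template: dict[str, str], api_key: str) -> dict[str, str]:
    """Substitute the {api_key} placeholder into each header value.

    (Illustrative helper, not part of lm_proxy's public API.)
    """
    return {name: value.format(api_key=api_key) for name, value in template.items()}

print(render_headers({"Authorization": "Bearer {api_key}"}, "client-token-123"))
# {'Authorization': 'Bearer client-token-123'}
```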

**Usage:**

Clients pass their OIDC access token as the API key when making requests to Inference Proxy.

## 🧩 Add-on Components

### Database Connector

[inference-proxy-db-connector](https://github.com/nayjest/lm-proxy-db-connector) is a lightweight SQLAlchemy-based connector that enables Inference Proxy to work with relational databases including PostgreSQL, MySQL/MariaDB, SQLite, Oracle, Microsoft SQL Server, and many others.

**Key Features:**
- Configure database connections directly through Inference Proxy configuration
- Share database connections across components, extensions, and custom functions
- Built-in database logger for structured logging of AI request data

## 🔍 Debugging

### Overview
When **debugging mode** is enabled, Inference Proxy provides detailed logging information to help diagnose issues:
- Stack traces for exceptions are shown in the console
- Logging level is set to DEBUG instead of INFO

> **Warning** ⚠️
> Never enable debugging mode in production environments, as it may expose sensitive information in the application logs.

### Enabling Debugging Mode
To enable debugging, set the `LM_PROXY_DEBUG` environment variable to a truthy value (e.g., "1", "true", "yes").
> **Tip** 💡
> Environment variables can also be defined in a `.env` file.

Alternatively, you can enable or disable debugging via command-line arguments:
- `--debug` to enable debugging
- `--no-debug` to disable debugging

> **Note** ℹ️
> CLI arguments override environment variable settings.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

© 2025 Vitalii Stepanenko