inference-proxy 3.0.0.dev1__tar.gz → 3.0.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/PKG-INFO +167 -53
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/README.md +151 -50
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/app.py +3 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/base_types.py +34 -2
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/bootstrap.py +14 -3
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config.py +1 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/core.py +55 -33
- inference_proxy-3.0.1/lm_proxy/errors.py +43 -0
- inference_proxy-3.0.1/lm_proxy/handlers/__init__.py +7 -0
- inference_proxy-3.0.1/lm_proxy/handlers/forward_http_headers.py +70 -0
- inference_proxy-3.0.1/lm_proxy/handlers/rate_limiter.py +88 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/models_endpoint.py +2 -1
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/utils.py +2 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/pyproject.toml +39 -16
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/LICENSE +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/__init__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/__main__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/_app.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/__init__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/allow_all.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/in_config.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/api_key_check/with_request.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/__init__.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/json.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/python.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/toml.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/config_loaders/yaml.py +0 -0
- {inference_proxy-3.0.0.dev1 → inference_proxy-3.0.1}/lm_proxy/loggers.py +0 -0
--- inference_proxy-3.0.0.dev1/PKG-INFO
+++ inference_proxy-3.0.1/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: inference-proxy
-Version: 3.0.0.dev1
+Version: 3.0.1
 Summary: Inference Proxy is an OpenAI-compatible http proxy server for inferencing various LLMs capable of working with Google, Anthropic, OpenAI APIs, local PyTorch inference, etc.
 License: MIT License
 
@@ -23,7 +23,7 @@ License: MIT License
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.
-Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server
+Keywords: llm,large language models,ai,gpt,openai,proxy,http,proxy-server,llm gateway,openai,anthropic,google genai
 Author: Vitalii Stepanenko
 Author-email: mail@vitaliy.in
 Maintainer: Vitalii Stepanenko
@@ -36,9 +36,21 @@ Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
 Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+Classifier: Development Status :: 5 - Production/Stable
+Provides-Extra: all
+Provides-Extra: anthropic
+Provides-Extra: google
 Provides-Extra: test
-Requires-Dist: ai-microcore (>=5.
+Requires-Dist: ai-microcore (>=5.1.2,<6)
+Requires-Dist: anthropic (>=0.77,<1) ; extra == "all"
+Requires-Dist: anthropic (>=0.77,<1) ; extra == "anthropic"
 Requires-Dist: fastapi (>=0.121.3,<1)
+Requires-Dist: google-genai (>=1.62.0,<2) ; extra == "all"
+Requires-Dist: google-genai (>=1.62.0,<2) ; extra == "google"
 Requires-Dist: pydantic (>=2.12.5,<2.13.0)
 Requires-Dist: pytest (>=8.4.2,<8.5.0) ; extra == "test"
 Requires-Dist: pytest-asyncio (>=1.2.0,<1.3.0) ; extra == "test"
@@ -46,6 +58,7 @@ Requires-Dist: pytest-cov (>=7.0.0,<7.1.0) ; extra == "test"
 Requires-Dist: requests (>=2.32.5,<2.33.0)
 Requires-Dist: typer (>=0.16.1)
 Requires-Dist: uvicorn (>=0.22.0)
+Project-URL: Bug Tracker, https://github.com/Nayjest/lm-proxy/issues
 Project-URL: Source Code, https://github.com/Nayjest/lm-proxy
 Description-Content-Type: text/markdown
 
@@ -92,13 +105,19 @@ It works as a drop-in replacement for OpenAI's API, allowing you to switch betwe
 - [Load Balancing Example](#load-balancing-example)
 - [Google Vertex AI Example](#google-vertex-ai-configuration-example)
 - [Using Tokens from OIDC Provider as Virtual/Client API Keys](#using-tokens-from-oidc-provider-as-virtualclient-api-keys)
-- [Add-on Components](
-  - [Database Connector](#database-connector)
+- [Add-on Components](#-add-on-components)
+  - [Database Connector](#database-connector)
+- [Request Handlers (Middleware)](#-request-handlers--middleware)
+- [Guides & Reference](#-guides--reference)
+- [Known Limitations](#-known-limitations)
 - [Debugging](#-debugging)
 - [Contributing](#-contributing)
 - [License](#-license)
 
-
+<a href="#" align="center"><img alt="Inference Proxy / Gateway" src="https://raw.githubusercontent.com/Nayjest/lm-proxy/main/press-kit/assets/lm-proxy_1_hacker_1600x672.png"></a>
+
+
+## ✨ Features<a id="-features"></a>
 
 - **Provider Agnostic**: Connect to OpenAI, Anthropic, Google AI, local models, and more using a single API
 - **Unified Interface**: Access all models through the standard OpenAI API format
@@ -108,21 +127,28 @@ It works as a drop-in replacement for OpenAI's API, allowing you to switch betwe
 - **Easy Configuration**: Simple TOML/YAML/JSON/Python configuration files for setup
 - **Extensible by Design**: Minimal core with clearly defined extension points, enabling seamless customization and expansion without modifying the core system.
 
-
+
+## 🚀 Getting Started<a id="-getting-started"></a>
 
 ### Requirements
 Python 3.11 | 3.12 | 3.13
 
-### Installation
-
+### Installation<a id="installation"></a>
 ```bash
 pip install inference-proxy
 ```
+For proxying to Anthropic API or Google Gemini via Vertex AI or Google AI Studio, install optional dependencies:
+```
+pip install inference-proxy[anthropic,google]
+```
+or
+```
+pip install inference-proxy[all]
+```
 
-### Quick Start
+### Quick Start<a id="quick-start"></a>
 
 #### 1. Create a `config.toml` file:
-
 ```toml
 host = "0.0.0.0"
 port = 8000
@@ -149,7 +175,6 @@ api_keys = ["YOUR_API_KEY_HERE"]
 > To enhance security, consider storing upstream API keys in operating system environment variables rather than embedding them directly in the configuration file. You can reference these variables in the configuration using the env:<VAR_NAME> syntax.
 
 #### 2. Start the server:
-
 ```bash
 inference-proxy
 ```
@@ -159,7 +184,6 @@ python -m lm_proxy
 ```
 
 #### 3. Use it with any OpenAI-compatible client:
-
 ```python
 from openai import OpenAI
 
@@ -176,7 +200,6 @@ print(completion.choices[0].message.content)
 ```
 
 Or use the same endpoint with Claude models:
-
 ```python
 completion = client.chat.completions.create(
     model="claude-opus-4-1-20250805",  # This will be routed to Anthropic based on config
@@ -184,12 +207,12 @@ completion = client.chat.completions.create(
 )
 ```
 
-## 📝 Configuration
 
-
+## 📝 Configuration<a id="-configuration"></a>
 
-
+Inference Proxy is configured through a TOML/YAML/JSON/Python file that specifies connections, routing rules, and access control.
 
+### Basic Structure<a id="basic-structure"></a>
 ```toml
 host = "0.0.0.0"  # Interface to bind to
 port = 8000  # Port to listen on
@@ -248,19 +271,18 @@ created_at = "created_at"
 duration = "duration"
 ```
 
-### Environment Variables
+### Environment Variables<a id="environment-variables"></a>
 
 You can reference environment variables in your configuration file by prefixing values with `env:`.
 
 For example:
-
 ```toml
 [connections.openai]
 api_key = "env:OPENAI_API_KEY"
 ```
 
 At runtime, Inference Proxy automatically retrieves the value of the target variable
-(OPENAI_API_KEY) from your operating system
+(OPENAI_API_KEY) from your operating system's environment or from a .env file, if present.
 
 ### .env Files
 
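
How the `env:` prefix behaves at runtime, as a minimal sketch (illustrative only, not lm_proxy's actual loader code):

```python
# Sketch: resolving "env:"-prefixed config values as described in the hunk above.
# Illustrative only; lm_proxy's real config loader may differ.
import os


def resolve_config_value(value: str) -> str:
    """Return the value itself, or the environment variable it references."""
    if value.startswith("env:"):
        var_name = value[len("env:"):]  # e.g. "OPENAI_API_KEY"
        try:
            return os.environ[var_name]
        except KeyError:
            raise RuntimeError(f"Environment variable {var_name!r} is not set")
    return value


print(resolve_config_value("env:OPENAI_API_KEY"))  # -> contents of $OPENAI_API_KEY
```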
@@ -280,7 +302,6 @@ LM_PROXY_DEBUG=no
 ```
 
 You can also control `.env` file usage with the `--env` command-line option:
-
 ```bash
 # Use a custom .env file path
 inference-proxy --env="path/to/your/.env"
@@ -288,7 +309,8 @@ inference-proxy --env="path/to/your/.env"
 inference-proxy --env=""
 ```
 
-
+
+## 🔑 Proxy API Keys vs. Provider API Keys<a id="-proxy-api-keys-vs-provider-api-keys"></a>
 
 Inference Proxy utilizes two distinct types of API keys to facilitate secure and efficient request handling.
 
@@ -309,18 +331,17 @@ This distinction ensures a clear separation of concerns:
 Virtual API Keys manage user authentication and access within the proxy,
 while Upstream API Keys handle secure communication with external providers.
 
-## 🔌 API Usage
 
-
+## 🔌 API Usage<a id="-api-usage"></a>
 
-
+Inference Proxy implements the OpenAI chat completions API endpoint. You can use any OpenAI-compatible client to interact with it.
 
+### Chat Completions Endpoint<a id="chat-completions-endpoint"></a>
 ```http
 POST /v1/chat/completions
 ```
 
 #### Request Format
-
 ```json
 {
   "model": "gpt-3.5-turbo",
@@ -334,7 +355,6 @@ POST /v1/chat/completions
 ```
 
 #### Response Format
-
 ```json
 {
   "choices": [
@@ -351,12 +371,10 @@ POST /v1/chat/completions
 ```
 
 
-### Models List Endpoint
+### Models List Endpoint<a id="models-list-endpoint"></a>
 
 
 List and describe all models available through the API.
-
-
 ```http
 GET /v1/models
 ```
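
Querying this endpoint through the same OpenAI client shown in the Quick Start (the base_url and key are the illustrative values from that example):

```python
# List the proxy's models via the endpoint shown above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="YOUR_API_KEY_HERE")
for model in client.models.list():
    print(model.id)  # e.g. "gpt*", "claude*" in the default "as_is" listing mode
```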
@@ -366,7 +384,6 @@ Routing keys can reference both **exact model names** and **model name patterns*
 
 By default, wildcard patterns are displayed as-is in the models list (e.g., `"gpt*"`, `"claude*"`).
 This behavior can be customized via the `model_listing_mode` configuration option:
-
 ```
 model_listing_mode = "as_is" | "ignore_wildcards" | "expand_wildcards"
 ```
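
The `*` patterns used in routing keys and the models list behave like shell-style wildcards; a sketch of the matching logic, assuming first-match-wins resolution (illustrative, not the project's actual resolver):

```python
# Sketch of "*"-wildcard model-name matching, mirroring a [routing] table.
# Illustrative only; lm_proxy's real resolver may differ.
from fnmatch import fnmatch

routing = {
    "gpt-4*": "openai.gpt-4",
    "claude*": "anthropic.claude-opus-4-1-20250805",
    "*": "openai.gpt-3.5-turbo",  # catch-all fallback
}


def resolve(model_name: str) -> str:
    # First pattern that matches wins; dict order is insertion order.
    for pattern, target in routing.items():
        if fnmatch(model_name, pattern):
            return target
    raise LookupError(f"no route for {model_name!r}")


print(resolve("gpt-4-turbo"))  # -> openai.gpt-4
```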
@@ -399,7 +416,6 @@ api_key = "env:ANTHROPIC_API_KEY"
 
 
 #### Response Format
-
 ```json
 {
   "object": "list",
@@ -420,23 +436,22 @@ api_key = "env:ANTHROPIC_API_KEY"
 }
 ```
 
-
+
+## 🔒 User Groups Configuration<a id="-user-groups-configuration"></a>
 
 The `[groups]` section in the configuration defines access control rules for different user groups.
 Each group can have its own set of virtual API keys and permitted connections.
 
-### Basic Group Definition
-
+### Basic Group Definition<a id="basic-group-definition"></a>
 ```toml
 [groups.default]
 api_keys = ["KEY1", "KEY2"]
 allowed_connections = "*"  # Allow access to all connections
 ```
 
-### Group-based Access Control
+### Group-based Access Control<a id="group-based-access-control"></a>
 
 You can create multiple groups to segment your users and control their access:
-
 ```toml
 # Admin group with full access
 [groups.admin]
@@ -454,7 +469,7 @@ api_keys = ["FREE_KEY_1", "FREE_KEY_2"]
 allowed_connections = "openai"  # Only allowed to use OpenAI connection
 ```
 
-### Connection Restrictions
+### Connection Restrictions<a id="connection-restrictions"></a>
 
 The `allowed_connections` parameter controls which upstream providers a group can access:
 
@@ -468,7 +483,8 @@ This allows fine-grained control over which users can access which AI providers,
 - Implementing usage quotas per group
 - Billing and cost allocation by user group
 
-
+
+### Virtual API Key Validation<a id="virtual-api-key-validation"></a>
 
 #### Overview
 
@@ -485,7 +501,6 @@ In the .py config representation, the validator function can be passed directly
 #### Example configuration for external API key validation using HTTP request to Keycloak / OpenID Connect
 
 This example shows how to validate API keys against an external service (e.g., Keycloak):
-
 ```toml
 [api_key_check]
 class = "lm_proxy.api_key_check.CheckAPIKeyWithRequest"
@@ -502,7 +517,6 @@ Authorization = "Bearer {api_key}"
 
 For more advanced authentication needs,
 you can implement a custom validator function:
-
 ```python
 # my_validators.py
 def validate_api_key(api_key: str) -> str | None:
@@ -523,7 +537,6 @@ def validate_api_key(api_key: str) -> str | None:
 ```
 
 Then reference it in your config:
-
 ```toml
 api_key_check = "my_validators.validate_api_key"
 ```
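
The hunks above show only the validator's signature; a fuller sketch, assuming (this is not confirmed by the diff) that returning a group name accepts the key and returning `None` rejects it:

```python
# my_validators.py — sketch of a complete custom validator. Assumes returning
# a group name accepts the key and None rejects it (unverified assumption).
import hmac

KNOWN_KEYS = {
    "ADMIN_KEY_1": "admin",      # maps keys to [groups.<name>] sections
    "FREE_KEY_1": "free_tier",
}


def validate_api_key(api_key: str) -> str | None:
    for known_key, group in KNOWN_KEYS.items():
        if hmac.compare_digest(api_key, known_key):  # constant-time comparison
            return group
    return None  # unknown key -> request is rejected
```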
@@ -531,11 +544,11 @@ api_key_check = "my_validators.validate_api_key"
 > In this case, the `api_keys` lists in groups are ignored, and the custom function is responsible for all validation logic.
 
 
-## 🛠️ Advanced Usage
-### Dynamic Model Routing
+## 🛠️ Advanced Usage<a id="-advanced-usage"></a>
 
-
+### Dynamic Model Routing<a id="dynamic-model-routing"></a>
 
+The routing section allows flexible pattern matching with wildcards:
 ```toml
 [routing]
 "gpt-4*" = "openai.gpt-4"  # Route gpt-4 requests to OpenAI GPT-4
@@ -548,18 +561,19 @@ The routing section allows flexible pattern matching with wildcards:
 Keys are model name patterns (with `*` wildcard support), and values are connection/model mappings.
 Connection names reference those defined in the `[connections]` section.
 
-### Load Balancing Example
+### Load Balancing Example<a id="load-balancing-example"></a>
 
 - [Simple load-balancer configuration](https://github.com/Nayjest/lm-proxy/blob/main/examples/load_balancer_config.py)
 This example demonstrates how to set up a load balancer that randomly
 distributes requests across multiple language model servers using the lm_proxy.
 
-### Google Vertex AI Configuration Example
+### Google Vertex AI Configuration Example<a id="google-vertex-ai-configuration-example"></a>
 
 - [vertex-ai.toml](https://github.com/Nayjest/lm-proxy/blob/main/examples/vertex-ai.toml)
 This example demonstrates how to connect Inference Proxy to Google Gemini model via Vertex AI API
 
-
+
+### Using Tokens from OIDC Provider as Virtual/Client API Keys<a id="using-tokens-from-oidc-provider-as-virtualclient-api-keys"></a>
 
 You can configure Inference Proxy to validate tokens from OpenID Connect (OIDC) providers like Keycloak, Auth0, or Okta as API keys.
 
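
On the client side this amounts to obtaining a token and passing it where an API key would normally go; a sketch assuming a Keycloak client-credentials flow (the realm URL, client ID, and secret are placeholders):

```python
# Sketch: use an OIDC access token as the proxy API key.
# Keycloak endpoint and credentials below are illustrative placeholders.
import requests
from openai import OpenAI

token = requests.post(
    "https://keycloak.example.com/realms/myrealm/protocol/openid-connect/token",
    data={
        "grant_type": "client_credentials",
        "client_id": "my-app",
        "client_secret": "...",
    },
).json()["access_token"]

# The proxy validates the token via its configured api_key_check.
client = OpenAI(base_url="http://localhost:8000/v1", api_key=token)
```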
@@ -594,9 +608,95 @@ Authorization = "Bearer {api_key}"
 
 Clients pass their OIDC access token as the API key when making requests to Inference Proxy.
 
-## 🧩 Add-on Components
 
-
+## 🪝 Request Handlers (Middleware)<a id="-request-handlers--middleware"></a>
+
+Handlers intercept and modify requests *before* they reach the upstream LLM provider. They enable cross-cutting concerns such as rate limiting, logging, auditing, and header manipulation.
+
+Handlers are defined in the `before` list within the configuration file and execute sequentially in the order specified.
+
+### Built-in Handlers
+
+Inference Proxy includes several built-in handlers for common operational needs.
+
+#### Rate Limiter
+
+The `RateLimiter` protects upstream credentials and manages traffic load using a sliding window algorithm.
+
+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `max_requests` | int | Maximum number of requests allowed per window |
+| `window_seconds` | int | Duration of the sliding window in seconds |
+| `per` | string | Scope of the limit: `api_key`, `ip`, `connection`, `group`, or `global` |
+
+**Configuration:**
+```toml
+[[before]]
+class = "lm_proxy.handlers.RateLimiter"
+max_requests = 10
+window_seconds = 60
+per = "api_key"
+
+[[before]]
+class = "lm_proxy.handlers.RateLimiter"
+max_requests = 1000
+window_seconds = 300
+per = "global"
+```
+
+#### HTTP Headers Forwarder
+
+The `HTTPHeadersForwarder` passes specific headers from incoming client requests to the upstream provider—useful for distributed tracing or tenant context propagation.
+
+Sensitive headers (`Authorization`, `Host`, `Content-Length`) are stripped by default to prevent protocol corruption and credential leaks.
+```toml
+[[before]]
+class = "lm_proxy.handlers.HTTPHeadersForwarder"
+white_list_headers = ["x-trace-id", "x-correlation-id", "x-tenant-id"]
+```
+See also [HTTP Header Management](https://github.com/Nayjest/lm-proxy/blob/main/doc/http_headers.md).
+
+### Custom Handlers
+
+Extend functionality by implementing custom handlers in Python. A handler is any callable (function or class instance) that accepts a `RequestContext`.
+
+#### Interface
+```python
+from lm_proxy.base_types import RequestContext
+
+async def my_custom_handler(ctx: RequestContext) -> None:
+    # Implementation here
+    pass
+```
+
+#### Example: Audit Logger
+```python
+# my_extensions.py
+import logging
+from lm_proxy.base_types import RequestContext
+
+class AuditLogger:
+    def __init__(self, prefix: str = "AUDIT"):
+        self.prefix = prefix
+
+    async def __call__(self, ctx: RequestContext) -> None:
+        user = ctx.user_info.get("name", "anonymous")
+        logging.info(f"[{self.prefix}] User '{user}' requested model '{ctx.model}'")
+```
+
+**Registration:**
+```toml
+[[before]]
+class = "my_extensions.AuditLogger"
+prefix = "SECURITY_AUDIT"
+```
+
+
+## 🧩 Add-on Components<a id="-add-on-components"></a>
+
+### Database Connector<a id="database-connector"></a>
 
 [inference-proxy-db-connector](https://github.com/nayjest/lm-proxy-db-connector) is a lightweight SQLAlchemy-based connector that enables Inference Proxy to work with relational databases including PostgreSQL, MySQL/MariaDB, SQLite, Oracle, Microsoft SQL Server, and many others.
 
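
A sliding-window limiter of the kind the `RateLimiter` description above refers to can be sketched as follows (illustrative; not the shipped `lm_proxy.handlers.rate_limiter` code):

```python
# Sketch of a sliding-window rate limiter, per the description above.
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.events: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, scope_key: str) -> bool:
        """scope_key is the value of the configured `per` scope, e.g. an API key or IP."""
        now = time.monotonic()
        window = self.events[scope_key]
        # Drop timestamps that have slid out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # over the limit -> reject (e.g. with HTTP 429)
        window.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=10, window_seconds=60)
print(limiter.allow("user-api-key"))  # True until the 11th call within 60s
```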
@@ -605,7 +705,21 @@ Clients pass their OIDC access token as the API key when making requests to Infe
 - Share database connections across components, extensions, and custom functions
 - Built-in database logger for structured logging of AI request data
 
-
+
+## 📚 Guides & Reference<a id="-guides--reference"></a>
+
+For more detailed information, check out these articles:
+- [HTTP Header Management](https://github.com/Nayjest/lm-proxy/blob/main/doc/http_headers.md)
+
+
+## 🚧 Known Limitations<a id="-known-limitations"></a>
+
+- **Multiple generations (n > 1):** When proxying requests to Google or Anthropic APIs, only the first generation is returned. Multi-generation support is tracked in [#35](https://github.com/Nayjest/lm-proxy/issues/35).
+
+- **Model listing with wildcards / forwarding actual model metadata:** The `/v1/models` endpoint does not query upstream providers to expand wildcard patterns (e.g., `gpt*`) or fetch model metadata. Only explicitly defined model names are listed [#36](https://github.com/Nayjest/lm-proxy/issues/36).
+
+
+## 🔍 Debugging<a id="-debugging"></a>
 
 ### Overview
 When **debugging mode** is enabled,
@@ -629,7 +743,7 @@ Alternatively, you can enable or disable debugging via the command-line argument
 > CLI arguments override environment variable settings.
 
 
-## 🤝 Contributing
+## 🤝 Contributing<a id="-contributing"></a>
 
 Contributions are welcome! Please feel free to submit a Pull Request.
 
@@ -640,7 +754,7 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 5. Open a Pull Request
 
 
-## 📄 License
+## 📄 License<a id="-license"></a>
 
 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 © 2025–2026 [Vitalii Stepanenko](mailto:mail@vitaliy.in)