lm-deluge 0.0.89__tar.gz → 0.0.91__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {lm_deluge-0.0.89/src/lm_deluge.egg-info → lm_deluge-0.0.91}/PKG-INFO +12 -12
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/README.md +8 -8
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/pyproject.toml +6 -7
- lm_deluge-0.0.91/src/lm_deluge/__init__.py +16 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/anthropic.py +29 -7
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/base.py +38 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/bedrock.py +29 -3
- lm_deluge-0.0.89/src/lm_deluge/request_context.py → lm_deluge-0.0.91/src/lm_deluge/api_requests/context.py +4 -4
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/gemini.py +30 -14
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/mistral.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/openai.py +34 -5
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/batches.py +19 -49
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/cache.py +1 -1
- lm_deluge-0.0.91/src/lm_deluge/cli.py +672 -0
- lm_deluge-0.0.89/src/lm_deluge/client.py → lm_deluge-0.0.91/src/lm_deluge/client/__init__.py +42 -13
- lm_deluge-0.0.91/src/lm_deluge/config.py +23 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/embed.py +2 -6
- lm_deluge-0.0.91/src/lm_deluge/models/__init__.py +267 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/anthropic.py +32 -24
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/bedrock.py +9 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/cerebras.py +2 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/cohere.py +2 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/google.py +13 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/grok.py +4 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/groq.py +2 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/meta.py +2 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/minimax.py +9 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/openai.py +24 -1
- lm_deluge-0.0.91/src/lm_deluge/models/openrouter.py +296 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/together.py +3 -0
- lm_deluge-0.0.91/src/lm_deluge/models/zai.py +50 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/extract.py +4 -5
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/__init__.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/docs/samples.py +19 -10
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/examples/01_synthetic_keywords.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/examples/02_gsm8k_math.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/examples/03_hotpotqa_multihop.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/examples/04_batch_classification.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/examples/simple_qa.py +1 -1
- lm_deluge-0.0.91/src/lm_deluge/prompt/__init__.py +45 -0
- lm_deluge-0.0.89/src/lm_deluge/prompt.py → lm_deluge-0.0.91/src/lm_deluge/prompt/conversation.py +165 -869
- {lm_deluge-0.0.89/src/lm_deluge → lm_deluge-0.0.91/src/lm_deluge/prompt}/image.py +0 -10
- lm_deluge-0.0.91/src/lm_deluge/prompt/message.py +571 -0
- lm_deluge-0.0.91/src/lm_deluge/prompt/serialization.py +21 -0
- lm_deluge-0.0.91/src/lm_deluge/prompt/signatures.py +77 -0
- lm_deluge-0.0.91/src/lm_deluge/prompt/text.py +47 -0
- lm_deluge-0.0.91/src/lm_deluge/prompt/thinking.py +55 -0
- lm_deluge-0.0.91/src/lm_deluge/prompt/tool_calls.py +245 -0
- lm_deluge-0.0.91/src/lm_deluge/server/__init__.py +24 -0
- lm_deluge-0.0.91/src/lm_deluge/server/__main__.py +144 -0
- lm_deluge-0.0.91/src/lm_deluge/server/adapters.py +369 -0
- lm_deluge-0.0.91/src/lm_deluge/server/app.py +388 -0
- lm_deluge-0.0.91/src/lm_deluge/server/auth.py +71 -0
- lm_deluge-0.0.91/src/lm_deluge/server/model_policy.py +215 -0
- lm_deluge-0.0.91/src/lm_deluge/server/models_anthropic.py +172 -0
- lm_deluge-0.0.91/src/lm_deluge/server/models_openai.py +175 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/__init__.py +78 -19
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/builtin/anthropic/__init__.py +1 -1
- lm_deluge-0.0.91/src/lm_deluge/tool/builtin/anthropic/bash.py +0 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/builtin/anthropic/computer_use.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/actions.py +26 -26
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/batch.py +1 -2
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/kernel.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/filesystem.py +2 -2
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/full_text_search/__init__.py +3 -2
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/memory.py +3 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/otc/executor.py +3 -3
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/random.py +30 -54
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/rlm/__init__.py +2 -2
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/rlm/executor.py +1 -1
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/sandbox/__init__.py +19 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/sandbox/daytona_sandbox.py +483 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/sandbox/docker_sandbox.py +609 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/sandbox/fargate_sandbox.py +546 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/sandbox/modal_sandbox.py +469 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/sandbox/seatbelt_sandbox.py +829 -0
- lm_deluge-0.0.91/src/lm_deluge/tool/prefab/skills.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/subagents.py +1 -1
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/logprobs.py +4 -4
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/schema.py +6 -6
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/validation.py +14 -9
- {lm_deluge-0.0.89 → lm_deluge-0.0.91/src/lm_deluge.egg-info}/PKG-INFO +12 -12
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge.egg-info/SOURCES.txt +30 -7
- lm_deluge-0.0.91/src/lm_deluge.egg-info/entry_points.txt +3 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge.egg-info/requires.txt +4 -4
- lm_deluge-0.0.89/src/lm_deluge/__init__.py +0 -40
- lm_deluge-0.0.89/src/lm_deluge/cli.py +0 -300
- lm_deluge-0.0.89/src/lm_deluge/config.py +0 -45
- lm_deluge-0.0.89/src/lm_deluge/mock_openai.py +0 -643
- lm_deluge-0.0.89/src/lm_deluge/models/__init__.py +0 -158
- lm_deluge-0.0.89/src/lm_deluge/models/openrouter.py +0 -142
- lm_deluge-0.0.89/src/lm_deluge/models/zai.py +0 -1
- lm_deluge-0.0.89/src/lm_deluge/tool/prefab/sandbox.py +0 -1621
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/LICENSE +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/setup.cfg +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/__init__.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/chat_reasoning.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/common.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/deprecated/bedrock.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/deprecated/cohere.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/deprecated/deepseek.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/deprecated/mistral.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/deprecated/vertex.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/response.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/errors.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/arcee.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/deepseek.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/fireworks.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/kimi.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/models/mistral.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/__init__.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/classify.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/core.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/optimizer.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/proposer.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/gepa/util.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/locate.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/ocr.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/score.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/pipelines/translate.py +0 -0
- {lm_deluge-0.0.89/src/lm_deluge → lm_deluge-0.0.91/src/lm_deluge/prompt}/file.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/rerank.py +0 -0
- /lm_deluge-0.0.89/src/lm_deluge/tool/builtin/anthropic/bash.py → /lm_deluge-0.0.91/src/lm_deluge/skills/anthropic.py +0 -0
- /lm_deluge-0.0.89/src/lm_deluge/tool/builtin/anthropic/computer_use.py → /lm_deluge-0.0.91/src/lm_deluge/skills/compat.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/builtin/anthropic/editor.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/builtin/base.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/builtin/gemini.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/builtin/openai.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/__init__.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/base.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/converters.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/cua/trycua.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/__init__.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/batch_tool.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/docs.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/email.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/full_text_search/tantivy_index.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/otc/__init__.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/otc/parse.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/rlm/parse.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/sheets.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/todos.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/tool_search.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tool/prefab/web_search.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/tracker.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/usage.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/harmony.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/json.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/spatial.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/util/xml.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/warnings.py +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge.egg-info/dependency_links.txt +0 -0
- {lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge.egg-info/top_level.txt +0 -0
{lm_deluge-0.0.89/src/lm_deluge.egg-info → lm_deluge-0.0.91}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: lm_deluge
-Version: 0.0.
+Version: 0.0.91
 Summary: Python utility for using LLM API models.
 Author-email: Benjamin Anderson <ben@trytaylor.ai>
 Requires-Python: >=3.10
@@ -9,7 +9,6 @@ License-File: LICENSE
 Requires-Dist: python-dotenv
 Requires-Dist: json5
 Requires-Dist: PyYAML
-Requires-Dist: pandas
 Requires-Dist: aiohttp
 Requires-Dist: tiktoken
 Requires-Dist: xxhash
@@ -23,8 +22,6 @@ Requires-Dist: pdf2image
 Requires-Dist: pillow
 Requires-Dist: fastmcp>=2.4
 Requires-Dist: rich
-Provides-Extra: openai
-Requires-Dist: openai>=1.0.0; extra == "openai"
 Provides-Extra: aws
 Requires-Dist: boto3>=1.28.0; extra == "aws"
 Provides-Extra: docker
@@ -36,6 +33,9 @@ Provides-Extra: sandbox
 Requires-Dist: modal>=0.64.0; extra == "sandbox"
 Requires-Dist: daytona-sdk>=0.1.4; extra == "sandbox"
 Requires-Dist: docker>=7.0.0; extra == "sandbox"
+Provides-Extra: server
+Requires-Dist: fastapi>=0.100.0; extra == "server"
+Requires-Dist: uvicorn>=0.20.0; extra == "server"
 Dynamic: license-file

 # lm-deluge
@@ -48,9 +48,9 @@ Dynamic: license-file
 - **Spray across models/providers** – Configure a client with multiple models from any provider(s), and sampling weights. The client samples a model for each request.
 - **Tool Use** – Unified API for defining tools for all providers, and creating tools automatically from python functions.
 - **MCP Support** – Instantiate a `Tool` from a local or remote MCP server so that any LLM can use it, whether or not that provider natively supports MCP.
-- **Computer Use** – We support
-- **Caching** –
-- **Convenient message constructor** – No more looking up how to build an Anthropic messages list with images. Our `Conversation` and `Message` classes work great with our
+- **Computer Use** – We support computer use for all major providers, and have pre-fabricated tools to integrate with Kernel, TryCUA, and more.
+- **Local & Remote Caching** – Use Anthropic caching more easily with common patterns (system-only, tools-only, last N messages, etc.) Use client-side caching to save completions to avoid repeated LLM calls to process the same input.
+- **Convenient message constructor** – No more looking up how to build an Anthropic messages list with images. Our `Conversation` and `Message` classes work great with our `LLMClient` or with the `openai` and `anthropic` packages.
 - **Sync and async APIs** – Use the client from sync or async code.

 **STREAMING IS NOT IN SCOPE.** There are plenty of packages that let you stream chat completions across providers. The sole purpose of this package is to do very fast batch inference using APIs. Sorry!
@@ -145,7 +145,7 @@ Constructing conversations to pass to models is notoriously annoying. Each provi
 ```python
 from lm_deluge import Message, Conversation

-prompt = Conversation.system("You are a helpful assistant.").add(
+prompt = Conversation().system("You are a helpful assistant.").add(
 Message.user("What's in this image?").add_image("tests/image.jpg")
 )

@@ -166,7 +166,7 @@ from lm_deluge import LLMClient, Conversation

 # Simple file upload
 client = LLMClient("gpt-4.1-mini")
-conversation = Conversation.user(
+conversation = Conversation().user(
 "Please summarize this document",
 file="path/to/document.pdf"
 )
@@ -175,7 +175,7 @@ resps = client.process_prompts_sync([conversation])
 # You can also create File objects for more control
 from lm_deluge import File
 file = File("path/to/report.pdf", filename="Q4_Report.pdf")
-conversation = Conversation.user("Analyze this financial report")
+conversation = Conversation().user("Analyze this financial report")
 conversation.messages[0].parts.append(file)
 ```

@@ -245,7 +245,7 @@ for tool_call in resps[0].tool_calls:
 import asyncio

 async def main():
-conv = Conversation.user("List the files in the current directory")
+conv = Conversation().user("List the files in the current directory")
 conv, resp = await client.run_agent_loop(conv, tools=tools)
 print(resp.content.completion)

@@ -261,7 +261,7 @@ from lm_deluge import LLMClient, Conversation, Message

 # Create a conversation with system message
 conv = (
-Conversation.system("You are an expert Python developer with deep knowledge of async programming.")
+Conversation().system("You are an expert Python developer with deep knowledge of async programming.")
 .add(Message.user("How do I use asyncio.gather?"))
 )

{lm_deluge-0.0.89 → lm_deluge-0.0.91}/README.md

@@ -8,9 +8,9 @@
 - **Spray across models/providers** – Configure a client with multiple models from any provider(s), and sampling weights. The client samples a model for each request.
 - **Tool Use** – Unified API for defining tools for all providers, and creating tools automatically from python functions.
 - **MCP Support** – Instantiate a `Tool` from a local or remote MCP server so that any LLM can use it, whether or not that provider natively supports MCP.
-- **Computer Use** – We support
-- **Caching** –
-- **Convenient message constructor** – No more looking up how to build an Anthropic messages list with images. Our `Conversation` and `Message` classes work great with our
+- **Computer Use** – We support computer use for all major providers, and have pre-fabricated tools to integrate with Kernel, TryCUA, and more.
+- **Local & Remote Caching** – Use Anthropic caching more easily with common patterns (system-only, tools-only, last N messages, etc.) Use client-side caching to save completions to avoid repeated LLM calls to process the same input.
+- **Convenient message constructor** – No more looking up how to build an Anthropic messages list with images. Our `Conversation` and `Message` classes work great with our `LLMClient` or with the `openai` and `anthropic` packages.
 - **Sync and async APIs** – Use the client from sync or async code.

 **STREAMING IS NOT IN SCOPE.** There are plenty of packages that let you stream chat completions across providers. The sole purpose of this package is to do very fast batch inference using APIs. Sorry!
@@ -105,7 +105,7 @@ Constructing conversations to pass to models is notoriously annoying. Each provi
 ```python
 from lm_deluge import Message, Conversation

-prompt = Conversation.system("You are a helpful assistant.").add(
+prompt = Conversation().system("You are a helpful assistant.").add(
 Message.user("What's in this image?").add_image("tests/image.jpg")
 )

@@ -126,7 +126,7 @@ from lm_deluge import LLMClient, Conversation

 # Simple file upload
 client = LLMClient("gpt-4.1-mini")
-conversation = Conversation.user(
+conversation = Conversation().user(
 "Please summarize this document",
 file="path/to/document.pdf"
 )
@@ -135,7 +135,7 @@ resps = client.process_prompts_sync([conversation])
 # You can also create File objects for more control
 from lm_deluge import File
 file = File("path/to/report.pdf", filename="Q4_Report.pdf")
-conversation = Conversation.user("Analyze this financial report")
+conversation = Conversation().user("Analyze this financial report")
 conversation.messages[0].parts.append(file)
 ```

@@ -205,7 +205,7 @@ for tool_call in resps[0].tool_calls:
 import asyncio

 async def main():
-conv = Conversation.user("List the files in the current directory")
+conv = Conversation().user("List the files in the current directory")
 conv, resp = await client.run_agent_loop(conv, tools=tools)
 print(resp.content.completion)

@@ -221,7 +221,7 @@ from lm_deluge import LLMClient, Conversation, Message

 # Create a conversation with system message
 conv = (
-Conversation.system("You are an expert Python developer with deep knowledge of async programming.")
+Conversation().system("You are an expert Python developer with deep knowledge of async programming.")
 .add(Message.user("How do I use asyncio.gather?"))
 )

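Readers skimming the README changes above: every example now starts from an empty `Conversation()` and chains `.system()` / `.user()` / `.add()` instead of calling the old `Conversation.system(...)` and `Conversation.user(...)` classmethods. A minimal sketch assembled only from the updated snippets shown in this diff (the model id and file paths are the README's own placeholders):

```python
from lm_deluge import LLMClient, Conversation, Message

# Builder-style construction used in the 0.0.91 docs:
# start from Conversation(), then chain .system()/.user()/.add().
prompt = Conversation().system("You are a helpful assistant.").add(
    Message.user("What's in this image?").add_image("tests/image.jpg")
)

# File prompts follow the same pattern.
client = LLMClient("gpt-4.1-mini")
conversation = Conversation().user(
    "Please summarize this document",
    file="path/to/document.pdf",
)
resps = client.process_prompts_sync([conversation])
```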
{lm_deluge-0.0.89 → lm_deluge-0.0.91}/pyproject.toml

@@ -3,7 +3,7 @@ requires = ["setuptools", "wheel"]

 [project]
 name = "lm_deluge"
-version = "0.0.
+version = "0.0.91"
 authors = [{ name = "Benjamin Anderson", email = "ben@trytaylor.ai" }]
 description = "Python utility for using LLM API models."
 readme = "README.md"
@@ -15,7 +15,6 @@ dependencies = [
 "python-dotenv",
 "json5",
 "PyYAML",
-"pandas",
 "aiohttp",
 "tiktoken",
 "xxhash",
@@ -28,16 +27,16 @@ dependencies = [
 "pdf2image",
 "pillow",
 "fastmcp>=2.4",
-"rich"
-# "textual>=0.58.0"
+"rich"
 ]

 [project.optional-dependencies]
-openai = ["openai>=1.0.0"]
 aws = ["boto3>=1.28.0"]
 docker = ["docker>=7.0.0"]
 full_text_search = ["tantivy>=0.21.0", "lenlp>=0.1.0"]
 sandbox = ["modal>=0.64.0", "daytona-sdk>=0.1.4", "docker>=7.0.0"]
+server = ["fastapi>=0.100.0", "uvicorn>=0.20.0"]

-
-
+[project.scripts]
+deluge = "lm_deluge.cli:main"
+deluge-server = "lm_deluge.server.__main__:main"
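Besides dropping the `pandas` dependency and the `openai` extra, the pyproject changes add a `server` extra (FastAPI + uvicorn) and register two console scripts. Under the `[project.scripts]` syntax each script name maps to a `module:function` target, so installing 0.0.91 is equivalent to exposing the two calls sketched below (illustrative only; the functions' behavior is not shown in this diff):

```python
# What the new console scripts resolve to, per the "module.path:function"
# entry-point strings above. Entry-point callables are invoked with no arguments.
from lm_deluge.cli import main as deluge_main              # the `deluge` script
from lm_deluge.server.__main__ import main as server_main  # the `deluge-server` script

if __name__ == "__main__":
    deluge_main()  # or server_main()
```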
lm_deluge-0.0.91/src/lm_deluge/__init__.py

@@ -0,0 +1,16 @@
+from .client import APIResponse, LLMClient, SamplingParams
+from .prompt import Conversation, Message, File
+from .tool import Tool, MCPServer
+
+# dotenv.load_dotenv() - don't do this, fucks with other packages
+
+__all__ = [
+"LLMClient",
+"SamplingParams",
+"APIResponse",
+"Conversation",
+"Message",
+"Tool",
+"MCPServer",
+"File",
+]
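This brand-new top-level `__init__.py` keeps the public API importable from the package root even though `client.py`, `prompt.py`, and the model registry have been split into subpackages (see the renamed files in the listing above). For example:

```python
# Every name in __all__ is re-exported at the package root.
from lm_deluge import (
    APIResponse,
    Conversation,
    File,
    LLMClient,
    MCPServer,
    Message,
    SamplingParams,
    Tool,
)
```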
{lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/anthropic.py

@@ -6,10 +6,11 @@ from aiohttp import ClientResponse
 from lm_deluge.prompt import (
 Message,
 Text,
+ThoughtSignature,
 Thinking,
 ToolCall,
 )
-from lm_deluge.
+from lm_deluge.api_requests.context import RequestContext
 from lm_deluge.tool import MCPServer, Tool
 from lm_deluge.usage import Usage
 from lm_deluge.util.schema import (
@@ -102,7 +103,9 @@ def _build_anthropic_request(
 if "top_p" in request_json:
 request_json["top_p"] = max(request_json["top_p"], 0.95)
 request_json["temperature"] = 1.0
-request_json["max_tokens"]
+max_tokens = request_json["max_tokens"]
+assert isinstance(max_tokens, int)
+request_json["max_tokens"] = max_tokens + budget
 else:
 request_json["thinking"] = {"type": "disabled"}
 if "kimi" in model.id and "thinking" in model.id:
@@ -250,8 +253,28 @@ class AnthropicRequest(APIRequestBase):
 if item["type"] == "text":
 parts.append(Text(item["text"]))
 elif item["type"] == "thinking":
-
-
+thinking_content = item.get("thinking", "")
+thinking = thinking_content
+signature = item.get("signature")
+parts.append(
+Thinking(
+thinking_content,
+raw_payload=item,
+thought_signature=ThoughtSignature(
+signature,
+provider="anthropic",
+)
+if signature is not None
+else None,
+)
+)
+elif item["type"] == "redacted_thinking":
+parts.append(
+Thinking(
+item.get("data", ""),
+raw_payload=item,
+)
+)
 elif item["type"] == "tool_use":
 parts.append(
 ToolCall(
@@ -265,9 +288,8 @@ class AnthropicRequest(APIRequestBase):
 usage = Usage.from_anthropic_usage(data["usage"])
 except Exception as e:
 is_error = True
-
-
-)
+response_text = await http_response.text()
+error_message = f"Error calling .json() on response w/ status {status_code}: {e}. Response: {response_text[:500]}"
 elif mimetype and "json" in mimetype.lower():
 is_error = True  # expected status is 200, otherwise it's an error
 data = await http_response.json()
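The hunks above teach the Anthropic response parser about extended-thinking blocks: a `thinking` block keeps its text and wraps the optional `signature` in the new `ThoughtSignature` type, while a `redacted_thinking` block stores its opaque `data` as the thinking content. A standalone sketch of the block shapes the new branches read, using plain dicts instead of the library types (the payload values are placeholders):

```python
# Content-block shapes handled by the new parsing branches; the field names
# mirror exactly what the diff reads via item.get(...).
thinking_block = {
    "type": "thinking",
    "thinking": "Compare the two totals before answering...",
    "signature": "<opaque signature string>",  # may be absent
}
redacted_block = {
    "type": "redacted_thinking",
    "data": "<encrypted payload, no readable text>",
}

for item in (thinking_block, redacted_block):
    if item["type"] == "thinking":
        text = item.get("thinking", "")
        signature = item.get("signature")  # wrapped as ThoughtSignature(..., provider="anthropic")
    elif item["type"] == "redacted_thinking":
        text = item.get("data", "")        # stored as the Thinking part's content
```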
{lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/base.py

@@ -1,4 +1,6 @@
 import asyncio
+import json
+import os
 import time
 import traceback
 from abc import ABC, abstractmethod
@@ -8,7 +10,7 @@ from aiohttp import ClientResponse

 from ..errors import raise_if_modal_exception
 from ..models.openai import OPENAI_MODELS
-from ..
+from ..api_requests.context import RequestContext
 from .response import APIResponse


@@ -73,6 +75,24 @@ class APIRequestBase(ABC):

 # Start with base headers, then overlay filtered extra headers (extra takes precedence)
 merged = dict(base_headers)
+if "anthropic-beta" in merged and "anthropic-beta" in filtered_extra:
+combined = []
+seen = set()
+for (
+raw
+) in f"{merged['anthropic-beta']},{filtered_extra['anthropic-beta']}".split(
+","
+):
+token = raw.strip()
+if token and token not in seen:
+seen.add(token)
+combined.append(token)
+merged["anthropic-beta"] = ",".join(combined)
+filtered_extra = {
+key: value
+for key, value in filtered_extra.items()
+if key != "anthropic-beta"
+}
 merged.update(filtered_extra)

 # Filter out None values from final merged headers
@@ -189,6 +209,23 @@ class APIRequestBase(ABC):
 await self.build_request()
 assert self.context.status_tracker

+if os.getenv("DELUGE_PROXY_LOG_PROVIDER_REQUESTS", "").strip().lower() in {
+"1",
+"true",
+"yes",
+"on",
+}:
+print("DELUGE_PROXY_PROVIDER_REQUEST")
+print(f"URL: {self.url}")
+print("Headers:")
+print(self.request_header)
+if self.request_json is not None:
+print("JSON:")
+try:
+print(json.dumps(self.request_json, indent=2))
+except Exception:
+print(self.request_json)
+
 if (
 self.context.background
 and self.context.use_responses_api
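Two additions in base.py above: when the base headers and the per-request extra headers both carry an `anthropic-beta` value, the beta flags are now merged into one comma-separated, de-duplicated header instead of the extra value silently replacing the base one; and setting `DELUGE_PROXY_LOG_PROVIDER_REQUESTS` to `1`/`true`/`yes`/`on` prints the outgoing provider URL, headers, and JSON body before the call. A small standalone sketch of the merge semantics (plain dicts with placeholder flag names, not the class's actual attributes):

```python
# De-duplicating merge of the anthropic-beta header, mirroring the diff's loop.
base_headers = {"anthropic-beta": "flag-a,flag-b"}
extra_headers = {"anthropic-beta": "flag-b,flag-c"}

merged = dict(base_headers)
combined, seen = [], set()
for raw in f"{merged['anthropic-beta']},{extra_headers['anthropic-beta']}".split(","):
    token = raw.strip()
    if token and token not in seen:
        seen.add(token)
        combined.append(token)
merged["anthropic-beta"] = ",".join(combined)

print(merged["anthropic-beta"])  # flag-a,flag-b,flag-c
```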
{lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/bedrock.py

@@ -16,10 +16,11 @@ except ImportError:
 from lm_deluge.prompt import (
 Message,
 Text,
+ThoughtSignature,
 Thinking,
 ToolCall,
 )
-from lm_deluge.
+from lm_deluge.api_requests.context import RequestContext
 from lm_deluge.tool import MCPServer, Tool
 from lm_deluge.usage import Usage

@@ -262,6 +263,11 @@ class BedrockRequest(APIRequestBase):
 # Create a fake requests.PreparedRequest object for AWS4Auth to sign
 import requests

+assert self.url is not None, "URL must be set after build_request"
+assert (
+self.request_header is not None
+), "Headers must be set after build_request"
+
 fake_request = requests.Request(
 method="POST",
 url=self.url,
@@ -363,8 +369,28 @@ class BedrockRequest(APIRequestBase):
 if item["type"] == "text":
 parts.append(Text(item["text"]))
 elif item["type"] == "thinking":
-
-
+thinking_content = item.get("thinking", "")
+thinking = thinking_content
+signature = item.get("signature")
+parts.append(
+Thinking(
+thinking_content,
+raw_payload=item,
+thought_signature=ThoughtSignature(
+signature,
+provider="anthropic",
+)
+if signature is not None
+else None,
+)
+)
+elif item["type"] == "redacted_thinking":
+parts.append(
+Thinking(
+item.get("data", ""),
+raw_payload=item,
+)
+)
 elif item["type"] == "tool_use":
 parts.append(
 ToolCall(
lm_deluge-0.0.89/src/lm_deluge/request_context.py → lm_deluge-0.0.91/src/lm_deluge/api_requests/context.py

@@ -2,9 +2,9 @@ from dataclasses import dataclass, field
 from functools import cached_property
 from typing import Any, Callable, Sequence, TYPE_CHECKING

-from
-from
-from
+from ..config import SamplingParams
+from ..prompt import CachePattern, Conversation
+from ..tracker import StatusTracker

 if TYPE_CHECKING:
 from pydantic import BaseModel
@@ -83,4 +83,4 @@ class RequestContext:
 # Update with any overrides
 current_values.update(overrides)

-return RequestContext(**current_values)
+return RequestContext(**current_values)  # type: ignore[arg-type]
{lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/gemini.py

@@ -1,15 +1,16 @@
 import json
 import os
+from typing import Any

 from aiohttp import ClientResponse

-from lm_deluge.
+from lm_deluge.api_requests.context import RequestContext
 from lm_deluge.tool import Tool
 from lm_deluge.warnings import maybe_warn

 from ..config import SamplingParams
 from ..models import APIModel
-from ..prompt import Conversation, Message, Text, Thinking, ToolCall
+from ..prompt import Conversation, Message, Text, ThoughtSignature, Thinking, ToolCall
 from ..usage import Usage
 from .base import APIRequestBase, APIResponse

@@ -37,13 +38,14 @@ async def _build_gemini_request(
 part_type="function call",
 )

-
+generation_config: dict[str, Any] = {
+"temperature": sampling_params.temperature,
+"topP": sampling_params.top_p,
+"maxOutputTokens": sampling_params.max_new_tokens,
+}
+request_json: dict[str, Any] = {
 "contents": messages,
-"generationConfig":
-"temperature": sampling_params.temperature,
-"topP": sampling_params.top_p,
-"maxOutputTokens": sampling_params.max_new_tokens,
-},
+"generationConfig": generation_config,
 }

 # Add system instruction if present
@@ -83,7 +85,7 @@ async def _build_gemini_request(
 }
 effort = level_map[effort_key]
 thinking_config = {"thinkingLevel": effort}
-
+generation_config["thinkingConfig"] = thinking_config

 elif model.reasoning_model:
 if (
@@ -126,7 +128,7 @@ async def _build_gemini_request(
 # no thoughts head empty
 thinking_config = {"includeThoughts": False, "thinkingBudget": 0}

-
+generation_config["thinkingConfig"] = thinking_config

 else:
 if sampling_params.reasoning_effort:
@@ -171,14 +173,14 @@ async def _build_gemini_request(

 # Handle JSON mode
 if sampling_params.json_mode and model.supports_json:
-
+generation_config["responseMimeType"] = "application/json"

 # Handle media_resolution for Gemini 3 (requires v1alpha)
 if sampling_params.media_resolution is not None:
 is_gemini_3 = "gemini-3" in model.name.lower()
 if is_gemini_3:
 # Add global media resolution to generationConfig
-
+generation_config["mediaResolution"] = {
 "level": sampling_params.media_resolution
 }
 else:
@@ -260,10 +262,20 @@ class GeminiRequest(APIRequestBase):
 if "content" in candidate and "parts" in candidate["content"]:
 for part in candidate["content"]["parts"]:
 # Extract thought signature if present
-
+raw_sig = part.get("thoughtSignature")
+thought_sig = (
+ThoughtSignature(raw_sig, provider="gemini")
+if raw_sig is not None
+else None
+)

 if "text" in part:
-parts.append(
+parts.append(
+Text(
+part["text"],
+thought_signature=thought_sig,
+)
+)
 elif "thought" in part:
 # Thought with optional signature
 parts.append(
@@ -286,6 +298,10 @@ class GeminiRequest(APIRequestBase):
 thought_signature=thought_sig,
 )
 )
+elif thought_sig:
+parts.append(
+Text("", thought_signature=thought_sig)
+)

 content = Message("assistant", parts)

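The gemini.py refactor above builds a single `generation_config` dict up front so the later branches (thinking config, JSON mode, Gemini 3 media resolution) can extend it before it is attached to the request as `generationConfig`. Roughly, the dict ends up shaped like this; the keys come straight from the diff, while the concrete values below are placeholders:

```python
# Approximate shape of the consolidated generationConfig after the refactor.
generation_config = {
    "temperature": 0.7,       # sampling_params.temperature
    "topP": 0.95,             # sampling_params.top_p
    "maxOutputTokens": 1024,  # sampling_params.max_new_tokens
    # only for reasoning models; either {"thinkingLevel": ...} or
    # {"includeThoughts": False, "thinkingBudget": 0}:
    "thinkingConfig": {"includeThoughts": False, "thinkingBudget": 0},
    # only when json_mode is set and the model supports it:
    "responseMimeType": "application/json",
    # only for Gemini 3 models when media_resolution is given:
    "mediaResolution": {"level": "<sampling_params.media_resolution>"},
}

request_json = {"contents": [], "generationConfig": generation_config}
```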
{lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/mistral.py

@@ -7,7 +7,7 @@ from lm_deluge.warnings import maybe_warn

 from ..models import APIModel
 from ..prompt import Message
-from ..
+from ..api_requests.context import RequestContext
 from ..usage import Usage
 from .base import APIRequestBase, APIResponse

{lm_deluge-0.0.89 → lm_deluge-0.0.91}/src/lm_deluge/api_requests/openai.py

@@ -7,7 +7,7 @@ from typing import Sequence
 import aiohttp
 from aiohttp import ClientResponse

-from lm_deluge.
+from lm_deluge.api_requests.context import RequestContext
 from lm_deluge.tool import MCPServer, Tool
 from lm_deluge.util.schema import (
 prepare_output_schema,
@@ -22,6 +22,24 @@ from ..usage import Usage
 from .base import APIRequestBase, APIResponse


+def _message_contents_to_string(messages: list[dict]):
+messages = messages.copy()
+
+for msg in messages:
+content = msg.get("content")
+assert content
+if isinstance(content, list):
+new_content = ""
+for part in content:
+assert "text" in part, "Invalid text part: " + str(part)
+new_content += part["text"]
+new_content += "\n"
+
+msg["content"] = new_content.strip()
+
+return messages
+
+
 async def _build_oa_chat_request(
 model: APIModel,
 context: RequestContext,
@@ -55,6 +73,12 @@ async def _build_oa_chat_request(
 request_json["service_tier"] = context.service_tier
 else:
 request_json["service_tier"] = context.service_tier
+# if tinker, for now hack to mush into 1 string
+if "tinker" in model.name:
+messages = request_json["messages"]
+assert isinstance(messages, list)
+request_json["messages"] = _message_contents_to_string(messages)
+
 # set max_tokens or max_completion_tokens dep. on provider
 if "cohere" in model.api_base:
 request_json["max_tokens"] = sampling_params.max_new_tokens
@@ -217,7 +241,7 @@ class OpenAIRequest(APIRequestBase):
 parts.append(Text(message["content"]))

 # Add tool calls if present
-if "tool_calls" in message:
+if "tool_calls" in message and message["tool_calls"] is not None:
 for tool_call in message["tool_calls"]:
 parts.append(
 ToolCall(
@@ -238,9 +262,9 @@ class OpenAIRequest(APIRequestBase):
 and "logprobs" in data["choices"][0]
 ):
 logprobs = data["choices"][0]["logprobs"]["content"]
-except Exception:
+except Exception as e:
 is_error = True
-error_message = f"Error getting 'choices' and 'usage' from {self.model.name} response."
+error_message = f"Error getting 'choices' and 'usage' from {self.model.name} response: {data}. Error: {e}"
 elif mimetype and "json" in mimetype.lower():
 is_error = True  # expected status is 200, otherwise it's an error
 data = await http_response.json()
@@ -655,7 +679,12 @@ async def stream_chat(
 request_header.update(filtered_extra)

 context = SimpleNamespace(
-prompt=prompt,
+prompt=prompt,
+tools=tools,
+sampling_params=sampling_params,
+service_tier=None,
+output_schema=None,
+model_name=model_name,
 )

 request_json = await _build_oa_chat_request(model, context)  # type: ignore
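Finally, openai.py gains `_message_contents_to_string`, used as a stop-gap when the model name contains "tinker": any message whose `content` is a list of text parts is flattened into one newline-joined string before the request goes out. A before/after sketch with plain dicts (the text values are placeholders):

```python
# Input: OpenAI-style message whose content is a list of text parts.
before = {
    "role": "user",
    "content": [
        {"type": "text", "text": "First paragraph."},
        {"type": "text", "text": "Second paragraph."},
    ],
}

# After _message_contents_to_string: the parts are joined with newlines
# and the trailing newline is stripped.
after = {
    "role": "user",
    "content": "First paragraph.\nSecond paragraph.",
}
```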