lm-deluge 0.0.22__tar.gz → 0.0.70__tar.gz
This diff compares publicly available package versions as released to their public registries. It is provided for informational purposes only and reflects the changes between the two versions as they appear in those registries.
- {lm_deluge-0.0.22/src/lm_deluge.egg-info → lm_deluge-0.0.70}/PKG-INFO +31 -13
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/README.md +28 -12
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/pyproject.toml +9 -2
- lm_deluge-0.0.70/src/lm_deluge/__init__.py +41 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/anthropic.py +24 -8
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/base.py +93 -5
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/bedrock.py +153 -32
- lm_deluge-0.0.70/src/lm_deluge/api_requests/chat_reasoning.py +4 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/gemini.py +21 -14
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/mistral.py +8 -9
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/openai.py +212 -119
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/response.py +33 -5
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/batches.py +256 -45
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/cache.py +10 -1
- lm_deluge-0.0.70/src/lm_deluge/cli.py +300 -0
- lm_deluge-0.0.70/src/lm_deluge/client.py +1064 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/config.py +1 -1
- lm_deluge-0.0.70/src/lm_deluge/file.py +527 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/image.py +30 -1
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/extract.py +7 -5
- lm_deluge-0.0.70/src/lm_deluge/mock_openai.py +641 -0
- lm_deluge-0.0.70/src/lm_deluge/models/__init__.py +151 -0
- lm_deluge-0.0.70/src/lm_deluge/models/anthropic.py +146 -0
- lm_deluge-0.0.70/src/lm_deluge/models/bedrock.py +114 -0
- lm_deluge-0.0.70/src/lm_deluge/models/cerebras.py +58 -0
- lm_deluge-0.0.70/src/lm_deluge/models/cohere.py +82 -0
- lm_deluge-0.0.70/src/lm_deluge/models/deepseek.py +27 -0
- lm_deluge-0.0.70/src/lm_deluge/models/fireworks.py +18 -0
- lm_deluge-0.0.70/src/lm_deluge/models/google.py +141 -0
- lm_deluge-0.0.70/src/lm_deluge/models/grok.py +82 -0
- lm_deluge-0.0.70/src/lm_deluge/models/groq.py +76 -0
- lm_deluge-0.0.70/src/lm_deluge/models/kimi.py +34 -0
- lm_deluge-0.0.70/src/lm_deluge/models/meta.py +57 -0
- lm_deluge-0.0.70/src/lm_deluge/models/minimax.py +10 -0
- lm_deluge-0.0.70/src/lm_deluge/models/mistral.py +110 -0
- lm_deluge-0.0.70/src/lm_deluge/models/openai.py +322 -0
- lm_deluge-0.0.70/src/lm_deluge/models/openrouter.py +64 -0
- lm_deluge-0.0.70/src/lm_deluge/models/together.py +96 -0
- lm_deluge-0.0.70/src/lm_deluge/presets/cerebras.py +17 -0
- lm_deluge-0.0.70/src/lm_deluge/presets/meta.py +13 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/prompt.py +679 -50
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/request_context.py +13 -10
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/tool.py +415 -27
- lm_deluge-0.0.70/src/lm_deluge/tracker.py +390 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/usage.py +30 -21
- lm_deluge-0.0.70/src/lm_deluge/util/harmony.py +47 -0
- lm_deluge-0.0.70/src/lm_deluge/warnings.py +46 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70/src/lm_deluge.egg-info}/PKG-INFO +31 -13
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge.egg-info/SOURCES.txt +28 -4
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge.egg-info/requires.txt +3 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/tests/test_builtin_tools.py +2 -2
- lm_deluge-0.0.70/tests/test_file_upload.py +627 -0
- lm_deluge-0.0.70/tests/test_mock_openai.py +479 -0
- lm_deluge-0.0.70/tests/test_openrouter_generic.py +238 -0
- lm_deluge-0.0.22/src/lm_deluge/__init__.py +0 -17
- lm_deluge-0.0.22/src/lm_deluge/agent.py +0 -0
- lm_deluge-0.0.22/src/lm_deluge/client.py +0 -658
- lm_deluge-0.0.22/src/lm_deluge/file.py +0 -154
- lm_deluge-0.0.22/src/lm_deluge/gemini_limits.py +0 -65
- lm_deluge-0.0.22/src/lm_deluge/models.py +0 -1247
- lm_deluge-0.0.22/src/lm_deluge/tracker.py +0 -256
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/LICENSE +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/setup.cfg +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/__init__.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/common.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/deprecated/bedrock.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/deprecated/cohere.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/deprecated/deepseek.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/deprecated/mistral.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/deprecated/vertex.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/built_in_tools/anthropic/__init__.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/built_in_tools/anthropic/bash.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/built_in_tools/anthropic/computer_use.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/built_in_tools/anthropic/editor.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/built_in_tools/base.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/built_in_tools/openai.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/embed.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/errors.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/__init__.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/classify.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/locate.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/ocr.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/score.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/llm_tools/translate.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/rerank.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/util/json.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/util/logprobs.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/util/spatial.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/util/validation.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/util/xml.py +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge.egg-info/dependency_links.txt +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge.egg-info/top_level.txt +0 -0
- {lm_deluge-0.0.22 → lm_deluge-0.0.70}/tests/test_native_mcp_server.py +0 -0
{lm_deluge-0.0.22/src/lm_deluge.egg-info → lm_deluge-0.0.70}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: lm_deluge
-Version: 0.0.22
+Version: 0.0.70
 Summary: Python utility for using LLM API models.
 Author-email: Benjamin Anderson <ben@trytaylor.ai>
 Requires-Python: >=3.10
@@ -23,6 +23,8 @@ Requires-Dist: pdf2image
 Requires-Dist: pillow
 Requires-Dist: fastmcp>=2.4
 Requires-Dist: rich
+Provides-Extra: openai
+Requires-Dist: openai>=1.0.0; extra == "openai"
 Dynamic: license-file
 
 # lm-deluge
@@ -54,12 +56,12 @@ The package relies on environment variables for API keys. Typical variables incl
 
 ## Quickstart
 
-
+`LLMClient` uses sensible default arguments for rate limits and sampling parameters so that you don't have to provide a ton of arguments.
 
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(["Hello, world!"])
 print(resp[0].completion)
 ```
@@ -71,7 +73,7 @@ To distribute your requests across models, just provide a list of more than one
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient
+client = LLMClient(
     ["gpt-4o-mini", "claude-3-haiku"],
     max_requests_per_minute=10_000
 )
@@ -85,8 +87,8 @@ print(resp[0].completion)
 
 API calls can be customized in a few ways.
 
-1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models.
-2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, and
+1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models.
+2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, caching, **and progress display style**. Set `progress="rich"` (default), `"tqdm"`, or `"manual"` to choose how progress is reported. The manual option prints an update every 30 seconds.
 3. **Arguments to process_prompts.** Per-call, you can set verbosity, whether to display progress, and whether to return just completions (rather than the full APIResponse object). This is also where you provide tools.
 
 Putting it all together:
@@ -109,6 +111,22 @@ await client.process_prompts_async(
 )
 ```
 
+### Queueing individual prompts
+
+You can queue prompts one at a time and track progress explicitly. Iterate over
+results as they finish with `as_completed` (or gather them all at once with
+`wait_for_all`):
+
+```python
+client = LLMClient("gpt-4.1-mini", progress="tqdm")
+client.open()
+client.start_nowait("hello there")
+# ... queue more tasks ...
+async for task_id, result in client.as_completed():
+    print(task_id, result.completion)
+client.close()
+```
+
 ## Multi-Turn Conversations
 
 Constructing conversations to pass to models is notoriously annoying. Each provider has a slightly different way of defining a list of messages, and with the introduction of images/multi-part messages it's only gotten worse. We provide convenience constructors so you don't have to remember all that stuff.
@@ -120,7 +138,7 @@ prompt = Conversation.system("You are a helpful assistant.").add(
     Message.user("What's in this image?").add_image("tests/image.jpg")
 )
 
-client = LLMClient
+client = LLMClient("gpt-4.1-mini")
 resps = client.process_prompts_sync([prompt])
 ```
 
@@ -136,9 +154,9 @@ For models that support file uploads (OpenAI, Anthropic, and Gemini), you can ea
 from lm_deluge import LLMClient, Conversation
 
 # Simple file upload
-client = LLMClient
+client = LLMClient("gpt-4.1-mini")
 conversation = Conversation.user(
-    "Please summarize this document",
+    "Please summarize this document",
     file="path/to/document.pdf"
 )
 resps = client.process_prompts_sync([conversation])
@@ -163,7 +181,7 @@ def get_weather(city: str) -> str:
     return f"The weather in {city} is sunny and 72°F"
 
 tool = Tool.from_function(get_weather)
-client = LLMClient
+client = LLMClient("claude-3-haiku")
 resps = client.process_prompts_sync(
     ["What's the weather in Paris?"],
     tools=[tool]
@@ -200,7 +218,7 @@ config = {
 all_tools = Tool.from_mcp_config(config)
 
 # let the model use the tools
-client = LLMClient
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(
     ["List the files in the current directory"],
     tools=tools
@@ -237,7 +255,7 @@ conv = (
 )
 
 # Use prompt caching to cache system message and tools
-client = LLMClient
+client = LLMClient("claude-3-5-sonnet")
 resps = client.process_prompts_sync(
     [conv],
     cache="system_and_tools" # Cache system message and any tools
@@ -274,7 +292,7 @@ We support all models in `src/lm_deluge/models.py`. Vertex support is not planne
 
 ## Feature Support
 
-We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Image models are supported. We support tool use as documented above. We support logprobs for OpenAI models that return them.
+We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Passing `None` (or the string `"none"`) disables Gemini thoughts entirely. Image models are supported. We support tool use as documented above. We support logprobs for OpenAI models that return them.
 
 ## Built‑in tools
 
{lm_deluge-0.0.22 → lm_deluge-0.0.70}/README.md

@@ -27,12 +27,12 @@ The package relies on environment variables for API keys. Typical variables incl
 
 ## Quickstart
 
-
+`LLMClient` uses sensible default arguments for rate limits and sampling parameters so that you don't have to provide a ton of arguments.
 
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(["Hello, world!"])
 print(resp[0].completion)
 ```
@@ -44,7 +44,7 @@ To distribute your requests across models, just provide a list of more than one
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient
+client = LLMClient(
     ["gpt-4o-mini", "claude-3-haiku"],
     max_requests_per_minute=10_000
 )
@@ -58,8 +58,8 @@ print(resp[0].completion)
 
 API calls can be customized in a few ways.
 
-1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models.
-2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, and
+1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models.
+2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, caching, **and progress display style**. Set `progress="rich"` (default), `"tqdm"`, or `"manual"` to choose how progress is reported. The manual option prints an update every 30 seconds.
 3. **Arguments to process_prompts.** Per-call, you can set verbosity, whether to display progress, and whether to return just completions (rather than the full APIResponse object). This is also where you provide tools.
 
 Putting it all together:
@@ -82,6 +82,22 @@ await client.process_prompts_async(
 )
 ```
 
+### Queueing individual prompts
+
+You can queue prompts one at a time and track progress explicitly. Iterate over
+results as they finish with `as_completed` (or gather them all at once with
+`wait_for_all`):
+
+```python
+client = LLMClient("gpt-4.1-mini", progress="tqdm")
+client.open()
+client.start_nowait("hello there")
+# ... queue more tasks ...
+async for task_id, result in client.as_completed():
+    print(task_id, result.completion)
+client.close()
+```
+
 ## Multi-Turn Conversations
 
 Constructing conversations to pass to models is notoriously annoying. Each provider has a slightly different way of defining a list of messages, and with the introduction of images/multi-part messages it's only gotten worse. We provide convenience constructors so you don't have to remember all that stuff.
@@ -93,7 +109,7 @@ prompt = Conversation.system("You are a helpful assistant.").add(
     Message.user("What's in this image?").add_image("tests/image.jpg")
 )
 
-client = LLMClient
+client = LLMClient("gpt-4.1-mini")
 resps = client.process_prompts_sync([prompt])
 ```
 
@@ -109,9 +125,9 @@ For models that support file uploads (OpenAI, Anthropic, and Gemini), you can ea
 from lm_deluge import LLMClient, Conversation
 
 # Simple file upload
-client = LLMClient
+client = LLMClient("gpt-4.1-mini")
 conversation = Conversation.user(
-    "Please summarize this document",
+    "Please summarize this document",
     file="path/to/document.pdf"
 )
 resps = client.process_prompts_sync([conversation])
@@ -136,7 +152,7 @@ def get_weather(city: str) -> str:
     return f"The weather in {city} is sunny and 72°F"
 
 tool = Tool.from_function(get_weather)
-client = LLMClient
+client = LLMClient("claude-3-haiku")
 resps = client.process_prompts_sync(
     ["What's the weather in Paris?"],
     tools=[tool]
@@ -173,7 +189,7 @@ config = {
 all_tools = Tool.from_mcp_config(config)
 
 # let the model use the tools
-client = LLMClient
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(
     ["List the files in the current directory"],
     tools=tools
@@ -210,7 +226,7 @@ conv = (
 )
 
 # Use prompt caching to cache system message and tools
-client = LLMClient
+client = LLMClient("claude-3-5-sonnet")
 resps = client.process_prompts_sync(
     [conv],
     cache="system_and_tools" # Cache system message and any tools
@@ -247,7 +263,7 @@ We support all models in `src/lm_deluge/models.py`. Vertex support is not planne
 
 ## Feature Support
 
-We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Image models are supported. We support tool use as documented above. We support logprobs for OpenAI models that return them.
+We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Passing `None` (or the string `"none"`) disables Gemini thoughts entirely. Image models are supported. We support tool use as documented above. We support logprobs for OpenAI models that return them.
 
 ## Built‑in tools
 
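The queueing hunk above shows only the streaming `as_completed` form; the accompanying prose also names `wait_for_all` for gathering everything at once. A minimal sketch of that batch style, assuming `start_nowait` returns a task id and `wait_for_all` returns the collected results (neither signature appears in this diff):

```python
# Hedged sketch based on the README text above. start_nowait()'s return value and
# wait_for_all()'s exact signature/return shape are assumptions, not shown in the diff.
import asyncio

from lm_deluge import LLMClient


async def main():
    client = LLMClient("gpt-4.1-mini", progress="manual")
    client.open()
    task_ids = [client.start_nowait(p) for p in ["hello", "goodbye", "what's new?"]]
    results = await client.wait_for_all()  # gather all queued prompts at once
    for task_id, result in zip(task_ids, results):
        print(task_id, result.completion)
    client.close()


asyncio.run(main())
```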
{lm_deluge-0.0.22 → lm_deluge-0.0.70}/pyproject.toml

@@ -3,7 +3,7 @@ requires = ["setuptools", "wheel"]
 
 [project]
 name = "lm_deluge"
-version = "0.0.22"
+version = "0.0.70"
 authors = [{ name = "Benjamin Anderson", email = "ben@trytaylor.ai" }]
 description = "Python utility for using LLM API models."
 readme = "README.md"
@@ -28,5 +28,12 @@ dependencies = [
     "pdf2image",
     "pillow",
     "fastmcp>=2.4",
-    "rich"
+    "rich",
+    # "textual>=0.58.0"
 ]
+
+[project.optional-dependencies]
+openai = ["openai>=1.0.0"]
+
+# [project.scripts]
+# deluge = "lm_deluge.cli:main"
lm_deluge-0.0.70/src/lm_deluge/__init__.py (new file)

@@ -0,0 +1,41 @@
+from .client import APIResponse, LLMClient, SamplingParams
+from .file import File
+from .prompt import Conversation, Message
+from .tool import Tool, ToolParams
+
+try:
+    from .mock_openai import (  # noqa
+        APIError,
+        APITimeoutError,
+        BadRequestError,
+        MockAsyncOpenAI,
+        RateLimitError,
+    )
+
+    _has_openai = True
+except ImportError:
+    _has_openai = False
+
+# dotenv.load_dotenv() - don't do this, fucks with other packages
+
+__all__ = [
+    "LLMClient",
+    "SamplingParams",
+    "APIResponse",
+    "Conversation",
+    "Message",
+    "Tool",
+    "ToolParams",
+    "File",
+]
+
+if _has_openai:
+    __all__.extend(
+        [
+            "MockAsyncOpenAI",
+            "APIError",
+            "APITimeoutError",
+            "BadRequestError",
+            "RateLimitError",
+        ]
+    )
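The new `__init__.py` only re-exports `MockAsyncOpenAI` and the error classes when the optional `openai` dependency can be imported. A consumer can feature-detect the extra the same way; this sketch assumes nothing beyond the names exported above:

```python
# Mirrors the package's own guard: these names are only present in lm_deluge's
# namespace when the optional extra is installed (pip install "lm_deluge[openai]").
try:
    from lm_deluge import MockAsyncOpenAI, RateLimitError  # noqa: F401
    HAS_MOCK_OPENAI = True
except ImportError:
    HAS_MOCK_OPENAI = False

print("mock OpenAI client available:", HAS_MOCK_OPENAI)
```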
{lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/anthropic.py

@@ -28,24 +28,28 @@ def _add_beta(headers: dict, beta: str):
 def _build_anthropic_request(
     model: APIModel,
     context: RequestContext,
-    # prompt: Conversation,
-    # tools: list[Tool | dict | MCPServer] | None,
-    # sampling_params: SamplingParams,
-    # cache_pattern: CachePattern | None = None,
 ):
     prompt = context.prompt
     cache_pattern = context.cache
     tools = context.tools
     sampling_params = context.sampling_params
     system_message, messages = prompt.to_anthropic(cache_pattern=cache_pattern)
-    if not system_message:
-        print("WARNING: system_message is None")
+    # if not system_message:
+    #     print("WARNING: system_message is None")
     base_headers = {
         "x-api-key": os.getenv(model.api_key_env_var),
         "anthropic-version": "2023-06-01",
         "content-type": "application/json",
     }
 
+    # Check if any messages contain uploaded files (file_id)
+    # If so, add the files-api beta header
+    for msg in prompt.messages:
+        for file in msg.files:
+            if file.is_remote and file.remote_provider == "anthropic":
+                _add_beta(base_headers, "files-api-2025-04-14")
+                break
+
     request_json = {
         "model": model.name,
         "messages": messages,
@@ -57,14 +61,15 @@ def _build_anthropic_request(
     # handle thinking
     if model.reasoning_model and sampling_params.reasoning_effort:
         # translate reasoning effort of low, medium, high to budget tokens
-        budget = {"low": 1024, "medium": 4096, "high": 16384}.get(
+        budget = {"minimal": 256, "low": 1024, "medium": 4096, "high": 16384}.get(
             sampling_params.reasoning_effort
         )
         request_json["thinking"] = {
             "type": "enabled",
             "budget_tokens": budget,
         }
-
+        if "top_p" in request_json:
+            request_json["top_p"] = max(request_json["top_p"], 0.95)
         request_json["temperature"] = 1.0
         request_json["max_tokens"] += budget
     else:
@@ -74,12 +79,20 @@ def _build_anthropic_request(
     if system_message is not None:
         request_json["system"] = system_message
 
+    # handle temp + top_p for opus 4.1/sonnet 4.5
+    if "4-1" in model.name or "4-5" in model.name:
+        if "temperature" in request_json and "top_p" in request_json:
+            request_json.pop("top_p")
+
     if tools:
         mcp_servers = []
         tool_definitions = []
         for tool in tools:
             if isinstance(tool, Tool):
                 tool_definitions.append(tool.dump_for("anthropic"))
+            elif isinstance(tool, dict) and "url" in tool:
+                _add_beta(base_headers, "mcp-client-2025-04-04")
+                mcp_servers.append(tool)
             elif isinstance(tool, dict):
                 tool_definitions.append(tool)
                 # add betas if needed
@@ -93,6 +106,9 @@ def _build_anthropic_request(
                     _add_beta(base_headers, "computer-use-2025-01-24")
                 elif tool["type"] == "code_execution_20250522":
                     _add_beta(base_headers, "code-execution-2025-05-22")
+                elif tool["type"] in ["memory_20250818", "clear_tool_uses_20250919"]:
+                    _add_beta(base_headers, "context-management-2025-06-27")
+
             elif isinstance(tool, MCPServer):
                 _add_beta(base_headers, "mcp-client-2025-04-04")
                 mcp_servers.append(tool.for_anthropic())
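Read on its own, the thinking change in the first two hunks is a mapping from `reasoning_effort` to an Anthropic thinking budget, plus a top_p floor. A standalone sketch of just that mapping (the real code mutates `request_json` in place; this helper is illustrative, not part of the package):

```python
# Illustrative restatement of the reasoning_effort -> "thinking" translation in the
# hunks above; assumes reasoning_effort is one of the four tiers shown there.
def thinking_fields(reasoning_effort: str, max_tokens: int, top_p: float | None) -> dict:
    budget = {"minimal": 256, "low": 1024, "medium": 4096, "high": 16384}[reasoning_effort]
    fields: dict = {
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "temperature": 1.0,                 # thinking requests are pinned to temperature 1.0
        "max_tokens": max_tokens + budget,  # budget is added on top of the completion cap
    }
    if top_p is not None:
        fields["top_p"] = max(top_p, 0.95)  # top_p is raised to at least 0.95 when thinking is on
    return fields


# e.g. thinking_fields("medium", 2048, 0.9) yields a 4096-token budget and top_p=0.95
```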
{lm_deluge-0.0.22 → lm_deluge-0.0.70}/src/lm_deluge/api_requests/base.py

@@ -1,4 +1,5 @@
 import asyncio
+import time
 import traceback
 from abc import ABC, abstractmethod
 
@@ -6,6 +7,7 @@ import aiohttp
 from aiohttp import ClientResponse
 
 from ..errors import raise_if_modal_exception
+from ..models.openai import OPENAI_MODELS
 from ..request_context import RequestContext
 from .response import APIResponse
 
@@ -52,6 +54,9 @@ class APIRequestBase(ABC):
         self, base_headers: dict[str, str], exclude_patterns: list[str] | None = None
     ) -> dict[str, str]:
         """Merge extra_headers with base headers, giving priority to extra_headers."""
+        # Filter out None values from base headers (e.g., missing API keys)
+        base_headers = {k: v for k, v in base_headers.items() if v is not None}
+
         if not self.context.extra_headers:
             return base_headers
 
@@ -69,6 +74,9 @@ class APIRequestBase(ABC):
         # Start with base headers, then overlay filtered extra headers (extra takes precedence)
         merged = dict(base_headers)
         merged.update(filtered_extra)
+
+        # Filter out None values from final merged headers
+        merged = {k: v for k, v in merged.items() if v is not None}
         return merged
 
     def handle_success(self, data):
@@ -76,15 +84,95 @@ class APIRequestBase(ABC):
         if self.context.status_tracker:
             self.context.status_tracker.task_succeeded(self.context.task_id)
 
+    async def _execute_once_background_mode(self) -> APIResponse:
+        """
+        ONLY for OpenAI responses API. Implement the
+        start -> poll -> result style of request.
+        """
+        assert self.context.status_tracker, "no status tracker"
+        start_time = time.time()
+        async with aiohttp.ClientSession() as session:
+            last_status: str | None = None
+
+            try:
+                self.context.status_tracker.total_requests += 1
+                assert self.url is not None, "URL is not set"
+                async with session.post(
+                    url=self.url,
+                    headers=self.request_header,
+                    json=self.request_json,
+                ) as http_response:
+                    # make sure we created the Response object
+                    http_response.raise_for_status()
+                    data = await http_response.json()
+                    response_id = data["id"]
+                    last_status = data["status"]
+
+                while True:
+                    if time.time() - start_time > self.context.request_timeout:
+                        # cancel the response
+                        async with session.post(
+                            url=f"{self.url}/{response_id}/cancel",
+                            headers=self.request_header,
+                        ) as http_response:
+                            http_response.raise_for_status()
+
+                        return APIResponse(
+                            id=self.context.task_id,
+                            model_internal=self.context.model_name,
+                            prompt=self.context.prompt,
+                            sampling_params=self.context.sampling_params,
+                            status_code=None,
+                            is_error=True,
+                            error_message="Request timed out (terminated by client).",
+                            content=None,
+                            usage=None,
+                        )
+                    # poll for the response
+                    await asyncio.sleep(5.0)
+                    async with session.get(
+                        url=f"{self.url}/{response_id}",
+                        headers=self.request_header,
+                    ) as http_response:
+                        http_response.raise_for_status()
+                        data = await http_response.json()
+
+                        if data["status"] != last_status:
+                            print(
+                                f"Background req {response_id} status updated to: {data['status']}"
+                            )
+                        last_status = data["status"]
+                        if last_status not in ["queued", "in_progress"]:
+                            return await self.handle_response(http_response)
+
+            except Exception as e:
+                raise_if_modal_exception(e)
+                tb = traceback.format_exc()
+                print(tb)
+                return APIResponse(
+                    id=self.context.task_id,
+                    model_internal=self.context.model_name,
+                    prompt=self.context.prompt,
+                    sampling_params=self.context.sampling_params,
+                    status_code=None,
+                    is_error=True,
+                    error_message=f"Unexpected {type(e).__name__}: {str(e) or 'No message.'}",
+                    content=None,
+                    usage=None,
+                )
+
     async def execute_once(self) -> APIResponse:
         """Send the HTTP request once and return the parsed APIResponse."""
         await self.build_request()
         assert self.context.status_tracker
-
-
-
-
-
+
+        if (
+            self.context.background
+            and self.context.use_responses_api
+            and self.context.model_name in OPENAI_MODELS
+        ):
+            return await self._execute_once_background_mode()
+
         try:
             self.context.status_tracker.total_requests += 1
             timeout = aiohttp.ClientTimeout(total=self.context.request_timeout)