lm-deluge 0.0.12__tar.gz → 0.0.14__tar.gz


Files changed (83)
  1. {lm_deluge-0.0.12/src/lm_deluge.egg-info → lm_deluge-0.0.14}/PKG-INFO +8 -5
  2. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/README.md +6 -2
  3. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/pyproject.toml +2 -3
  4. lm_deluge-0.0.14/src/lm_deluge/__init__.py +17 -0
  5. lm_deluge-0.0.14/src/lm_deluge/agent.py +0 -0
  6. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/anthropic.py +90 -58
  7. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/base.py +63 -180
  8. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/bedrock.py +34 -10
  9. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/common.py +2 -1
  10. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/mistral.py +6 -15
  11. lm_deluge-0.0.14/src/lm_deluge/api_requests/openai.py +481 -0
  12. lm_deluge-0.0.14/src/lm_deluge/api_requests/response.py +153 -0
  13. lm_deluge-0.0.14/src/lm_deluge/batches.py +498 -0
  14. lm_deluge-0.0.14/src/lm_deluge/client.py +489 -0
  15. lm_deluge-0.0.14/src/lm_deluge/computer_use/anthropic_tools.py +75 -0
  16. lm_deluge-0.0.12/src/lm_deluge/sampling_params.py → lm_deluge-0.0.14/src/lm_deluge/config.py +12 -4
  17. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/embed.py +17 -11
  18. lm_deluge-0.0.14/src/lm_deluge/file.py +149 -0
  19. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/models.py +33 -0
  20. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/prompt.py +156 -15
  21. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/rerank.py +18 -12
  22. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/tool.py +11 -1
  23. lm_deluge-0.0.14/src/lm_deluge/tracker.py +255 -0
  24. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/util/json.py +18 -1
  25. {lm_deluge-0.0.12 → lm_deluge-0.0.14/src/lm_deluge.egg-info}/PKG-INFO +8 -5
  26. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge.egg-info/SOURCES.txt +21 -1
  27. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge.egg-info/requires.txt +1 -2
  28. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_all_models.py +24 -24
  29. lm_deluge-0.0.14/tests/test_batch_real.py +95 -0
  30. lm_deluge-0.0.14/tests/test_bedrock_computer_use.py +378 -0
  31. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_cache.py +5 -4
  32. lm_deluge-0.0.14/tests/test_client_tracker_integration.py +43 -0
  33. lm_deluge-0.0.14/tests/test_computer_use.py +103 -0
  34. lm_deluge-0.0.14/tests/test_computer_use_integration.py +277 -0
  35. lm_deluge-0.0.14/tests/test_debug_format.py +47 -0
  36. lm_deluge-0.0.14/tests/test_file_integration.py +156 -0
  37. lm_deluge-0.0.14/tests/test_file_support.py +210 -0
  38. lm_deluge-0.0.14/tests/test_logprobs_refactor.py +306 -0
  39. lm_deluge-0.0.14/tests/test_max_concurrent_requests.py +38 -0
  40. lm_deluge-0.0.14/tests/test_openai_responses.py +356 -0
  41. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_prompt_caching.py +9 -13
  42. lm_deluge-0.0.14/tests/test_retry_fix.py +67 -0
  43. lm_deluge-0.0.14/tests/test_rich_display.py +114 -0
  44. lm_deluge-0.0.14/tests/test_tool_validation.py +36 -0
  45. lm_deluge-0.0.14/tests/test_tracker_refactor.py +99 -0
  46. lm_deluge-0.0.12/src/lm_deluge/__init__.py +0 -7
  47. lm_deluge-0.0.12/src/lm_deluge/api_requests/openai.py +0 -189
  48. lm_deluge-0.0.12/src/lm_deluge/client.py +0 -771
  49. lm_deluge-0.0.12/src/lm_deluge/tracker.py +0 -43
  50. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/LICENSE +0 -0
  51. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/setup.cfg +0 -0
  52. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/__init__.py +0 -0
  53. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/deprecated/bedrock.py +0 -0
  54. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/deprecated/cohere.py +0 -0
  55. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/deprecated/deepseek.py +0 -0
  56. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/deprecated/mistral.py +0 -0
  57. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/deprecated/vertex.py +0 -0
  58. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/cache.py +0 -0
  59. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/errors.py +0 -0
  60. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/gemini_limits.py +0 -0
  61. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/image.py +0 -0
  62. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/llm_tools/__init__.py +0 -0
  63. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/llm_tools/extract.py +0 -0
  64. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/llm_tools/score.py +0 -0
  65. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/llm_tools/translate.py +0 -0
  66. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/usage.py +0 -0
  67. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/util/logprobs.py +0 -0
  68. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/util/validation.py +0 -0
  69. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/util/xml.py +0 -0
  70. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge.egg-info/dependency_links.txt +0 -0
  71. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge.egg-info/top_level.txt +0 -0
  72. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_bedrock_models.py +0 -0
  73. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_image_models.py +0 -0
  74. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_image_utils.py +0 -0
  75. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_json_utils.py +0 -0
  76. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_mcp_tools.py +0 -0
  77. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_real_caching.py +0 -0
  78. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_real_caching_bedrock.py +0 -0
  79. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_sampling_params.py +0 -0
  80. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_tool_calls.py +0 -0
  81. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_tool_from_function.py +0 -0
  82. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_translate.py +0 -0
  83. {lm_deluge-0.0.12 → lm_deluge-0.0.14}/tests/test_xml_utils.py +0 -0
{lm_deluge-0.0.12/src/lm_deluge.egg-info → lm_deluge-0.0.14}/PKG-INFO

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: lm_deluge
- Version: 0.0.12
+ Version: 0.0.14
  Summary: Python utility for using LLM API models.
  Author-email: Benjamin Anderson <ben@trytaylor.ai>
  Requires-Python: >=3.10
@@ -22,8 +22,7 @@ Requires-Dist: lxml
  Requires-Dist: pdf2image
  Requires-Dist: pillow
  Requires-Dist: fastmcp>=2.4
- Requires-Dist: fasttext-wheel
- Requires-Dist: fasttext-langdetect
+ Requires-Dist: rich
  Dynamic: license-file

  # lm-deluge
@@ -31,16 +30,20 @@ Dynamic: license-file
  `lm-deluge` is a lightweight helper library for maxing out your rate limits with LLM providers. It provides the following:

  - **Unified client** – Send prompts to all relevant models with a single client.
+ - **Files and Images** - Include images easily for multimodal models, and PDF files for models that support them (OpenAI and Anthropic).
  - **Massive concurrency with throttling** – Set `max_tokens_per_minute` and `max_requests_per_minute` and let it fly. The client will process as many requests as possible while respecting rate limits and retrying failures.
  - **Spray across models/providers** – Configure a client with multiple models from any provider(s), and sampling weights. The client samples a model for each request.
  - **Tool Use** – Unified API for defining tools for all providers, and creating tools automatically from python functions.
  - **MCP Support** – Instantiate a `Tool` from a local or remote MCP server so that any LLM can use it, whether or not that provider natively supports MCP.
+ - **Computer Use** – We support Claude Computer Use via the computer_use argument to process_prompts_sync/async. It works with Anthropic's API; Bedrock's API is broken right now and rejects the tool definitions, but in principle this will work there too when Bedrock gets their sh*t together.
  - **Caching** – Save completions in a local or distributed cache to avoid repeated LLM calls to process the same input.
  - **Convenient message constructor** – No more looking up how to build an Anthropic messages list with images. Our `Conversation` and `Message` classes work great with our client or with the `openai` and `anthropic` packages.
  - **Sync and async APIs** – Use the client from sync or async code.

  **STREAMING IS NOT IN SCOPE.** There are plenty of packages that let you stream chat completions across providers. The sole purpose of this package is to do very fast batch inference using APIs. Sorry!

+ **Update 06/02/2025:** I lied, it supports (very basic) streaming now via client.stream(...). It will print tokens as they arrive, then return an APIResponse at the end. More sophisticated streaming may or may not be implemented later, don't count on it.
+
  ## Installation

  ```bash
@@ -233,11 +236,11 @@ asyncio.run(main())

  ## Available Models

- We support all models in `src/lm_deluge/models.py`. An older version of this client supported Bedrock and Vertex. We plan to re-implement Bedrock support (our previous support was spotty and we need to figure out cross-region inference in order to support the newest Claude models). Vertex support is not currently planned, since Google allows you to connect your Vertex account to AI Studio, and Vertex authentication is a huge pain (requires service account credentials, etc.)
+ We support all models in `src/lm_deluge/models.py`. Vertex support is not planned in the short term, since Google allows you to connect your Vertex account to AI Studio, and Vertex authentication is a huge pain (requires service account credentials, etc.)

  ## Feature Support

- We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Image models are supported. We don't support tool use yet, but support is planned (keep an eye out for a unified tool definition spec that works for all models!). We support logprobs for OpenAI models that return them via the `logprobs` argument to the `LLMClient`.
+ We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Image models are supported. We support tool use as documented above. We support logprobs for OpenAI models that return them.

  ## Built‑in tools

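For orientation, the `SamplingParams` options named in the Feature Support paragraph above can be combined roughly as follows. This is a minimal sketch with illustrative values: the `temperature`, `top_p`, `max_new_tokens`, and `reasoning_effort` fields appear in the `anthropic.py` changes later in this diff, while `json_mode` is the flag the README names; exact defaults are not shown here.

```python
from lm_deluge import SamplingParams

# Illustrative values only; field names are taken from this diff and the
# README text above, not from the full SamplingParams definition.
params = SamplingParams(
    temperature=0.2,
    top_p=0.9,
    max_new_tokens=1024,
    reasoning_effort="medium",  # mapped to a thinking budget for Claude/Gemini
    json_mode=True,
)
```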
{lm_deluge-0.0.12 → lm_deluge-0.0.14}/README.md

@@ -3,16 +3,20 @@
  `lm-deluge` is a lightweight helper library for maxing out your rate limits with LLM providers. It provides the following:

  - **Unified client** – Send prompts to all relevant models with a single client.
+ - **Files and Images** - Include images easily for multimodal models, and PDF files for models that support them (OpenAI and Anthropic).
  - **Massive concurrency with throttling** – Set `max_tokens_per_minute` and `max_requests_per_minute` and let it fly. The client will process as many requests as possible while respecting rate limits and retrying failures.
  - **Spray across models/providers** – Configure a client with multiple models from any provider(s), and sampling weights. The client samples a model for each request.
  - **Tool Use** – Unified API for defining tools for all providers, and creating tools automatically from python functions.
  - **MCP Support** – Instantiate a `Tool` from a local or remote MCP server so that any LLM can use it, whether or not that provider natively supports MCP.
+ - **Computer Use** – We support Claude Computer Use via the computer_use argument to process_prompts_sync/async. It works with Anthropic's API; Bedrock's API is broken right now and rejects the tool definitions, but in principle this will work there too when Bedrock gets their sh*t together.
  - **Caching** – Save completions in a local or distributed cache to avoid repeated LLM calls to process the same input.
  - **Convenient message constructor** – No more looking up how to build an Anthropic messages list with images. Our `Conversation` and `Message` classes work great with our client or with the `openai` and `anthropic` packages.
  - **Sync and async APIs** – Use the client from sync or async code.

  **STREAMING IS NOT IN SCOPE.** There are plenty of packages that let you stream chat completions across providers. The sole purpose of this package is to do very fast batch inference using APIs. Sorry!

+ **Update 06/02/2025:** I lied, it supports (very basic) streaming now via client.stream(...). It will print tokens as they arrive, then return an APIResponse at the end. More sophisticated streaming may or may not be implemented later, don't count on it.
+
  ## Installation

  ```bash
@@ -205,11 +209,11 @@ asyncio.run(main())

  ## Available Models

- We support all models in `src/lm_deluge/models.py`. An older version of this client supported Bedrock and Vertex. We plan to re-implement Bedrock support (our previous support was spotty and we need to figure out cross-region inference in order to support the newest Claude models). Vertex support is not currently planned, since Google allows you to connect your Vertex account to AI Studio, and Vertex authentication is a huge pain (requires service account credentials, etc.)
+ We support all models in `src/lm_deluge/models.py`. Vertex support is not planned in the short term, since Google allows you to connect your Vertex account to AI Studio, and Vertex authentication is a huge pain (requires service account credentials, etc.)

  ## Feature Support

- We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Image models are supported. We don't support tool use yet, but support is planned (keep an eye out for a unified tool definition spec that works for all models!). We support logprobs for OpenAI models that return them via the `logprobs` argument to the `LLMClient`.
+ We support structured outputs via `json_mode` parameter provided to `SamplingParams`. Structured outputs with a schema are planned. Reasoning models are supported via the `reasoning_effort` parameter, which is translated to a thinking budget for Claude/Gemini. Image models are supported. We support tool use as documented above. We support logprobs for OpenAI models that return them.

  ## Built‑in tools

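The **Update 06/02/2025** note in the README hunk above introduces `client.stream(...)`. The snippet below is a minimal sketch of what a call might look like; the constructor form and model id are assumptions, since the README's client-construction examples are not part of this diff.

```python
from lm_deluge import LLMClient

# Assumed constructor form and model id; adjust to however LLMClient is
# actually built in your version of the package.
client = LLMClient("gpt-4o-mini")

# Per the update note above: stream() prints tokens as they arrive, then
# returns an APIResponse once the stream finishes.
response = client.stream("Write a haiku about rate limits.")
print(response)
```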
{lm_deluge-0.0.12 → lm_deluge-0.0.14}/pyproject.toml

@@ -3,7 +3,7 @@ requires = ["setuptools", "wheel"]

  [project]
  name = "lm_deluge"
- version = "0.0.12"
+ version = "0.0.14"
  authors = [{ name = "Benjamin Anderson", email = "ben@trytaylor.ai" }]
  description = "Python utility for using LLM API models."
  readme = "README.md"
@@ -28,6 +28,5 @@ dependencies = [
      "pdf2image",
      "pillow",
      "fastmcp>=2.4",
-     "fasttext-wheel",
-     "fasttext-langdetect",
+     "rich"
  ]
lm_deluge-0.0.14/src/lm_deluge/__init__.py

@@ -0,0 +1,17 @@
+ from .client import LLMClient, SamplingParams, APIResponse
+ from .prompt import Conversation, Message
+ from .tool import Tool
+ from .file import File
+ import dotenv
+
+ dotenv.load_dotenv()
+
+ __all__ = [
+     "LLMClient",
+     "SamplingParams",
+     "APIResponse",
+     "Conversation",
+     "Message",
+     "Tool",
+     "File",
+ ]
lm_deluge-0.0.14/src/lm_deluge/agent.py (file without changes)
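From the caller's side, the new `__init__.py` above means the public names are importable from the package root, and a local `.env` file is loaded automatically on import. A small sketch using only the names exported above:

```python
# Importing lm_deluge runs dotenv.load_dotenv(), so API keys kept in a local
# .env file (for example an Anthropic or OpenAI key) are picked up automatically.
from lm_deluge import (
    LLMClient,
    SamplingParams,
    APIResponse,
    Conversation,
    Message,
    Tool,
    File,
)

print([obj.__name__ for obj in (LLMClient, SamplingParams, APIResponse,
                                Conversation, Message, Tool, File)])
```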
{lm_deluge-0.0.12 → lm_deluge-0.0.14}/src/lm_deluge/api_requests/anthropic.py

@@ -1,9 +1,6 @@
- import asyncio
  from aiohttp import ClientResponse
  import json
  import os
- import warnings
- from tqdm import tqdm
  from typing import Callable

  from lm_deluge.prompt import (
@@ -14,12 +11,84 @@ from lm_deluge.prompt import (
      Thinking,
      CachePattern,
  )
+ from lm_deluge.tool import Tool
  from lm_deluge.usage import Usage
  from .base import APIRequestBase, APIResponse

  from ..tracker import StatusTracker
- from ..sampling_params import SamplingParams
+ from ..config import SamplingParams
  from ..models import APIModel
+ from ..computer_use.anthropic_tools import get_anthropic_cu_tools
+
+
+ def _build_anthropic_request(
+     model: APIModel,
+     prompt: Conversation,
+     tools: list[Tool] | None,
+     sampling_params: SamplingParams,
+     cache_pattern: CachePattern | None = None,
+     computer_use: bool = False,
+     display_width: int = 1024,
+     display_height: int = 768,
+ ):
+     system_message, messages = prompt.to_anthropic(cache_pattern=cache_pattern)
+     request_header = {
+         "x-api-key": os.getenv(model.api_key_env_var),
+         "anthropic-version": "2023-06-01",
+         "content-type": "application/json",
+     }
+
+     # Add beta header for Computer Use
+     if computer_use:
+         request_header["anthropic-beta"] = "computer-use-2025-01-24"
+
+     request_json = {
+         "model": model.name,
+         "messages": messages,
+         "temperature": sampling_params.temperature,
+         "top_p": sampling_params.top_p,
+         "max_tokens": sampling_params.max_new_tokens,
+     }
+
+     # handle thinking
+     if model.reasoning_model and sampling_params.reasoning_effort:
+         # translate reasoning effort of low, medium, high to budget tokens
+         budget = {"low": 1024, "medium": 4096, "high": 16384}.get(
+             sampling_params.reasoning_effort
+         )
+         request_json["thinking"] = {
+             "type": "enabled",
+             "budget_tokens": budget,
+         }
+         request_json.pop("top_p")
+         request_json["temperature"] = 1.0
+         request_json["max_tokens"] += budget
+     else:
+         request_json["thinking"] = {"type": "disabled"}
+         if sampling_params.reasoning_effort:
+             print("ignoring reasoning_effort for non-reasoning model")
+     if system_message is not None:
+         request_json["system"] = system_message
+     if tools or computer_use:
+         tool_definitions = []
+         if tools:
+             tool_definitions.extend([tool.dump_for("anthropic") for tool in tools])
+         # Add Computer Use tools
+         if computer_use:
+             cu_tools = get_anthropic_cu_tools(
+                 model=model.id,
+                 display_width=display_width,  # todo: set from ComputerUseParams
+                 display_height=display_height,
+             )
+             tool_definitions.extend(cu_tools)
+
+         # Add cache control to last tool if tools_only caching is specified
+         if cache_pattern == "tools_only" and tool_definitions:
+             tool_definitions[-1]["cache_control"] = {"type": "ephemeral"}
+
+         request_json["tools"] = tool_definitions
+
+     return request_json, request_header


  class AnthropicRequest(APIRequestBase):
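To make the thinking-budget translation in `_build_anthropic_request` concrete, here is a standalone snippet that mirrors just that branch. It reproduces the logic shown above for illustration; it is not an import from the package.

```python
# Mirrors the reasoning_effort handling in _build_anthropic_request above.
BUDGETS = {"low": 1024, "medium": 4096, "high": 16384}

def apply_reasoning_effort(request_json: dict, effort: str) -> dict:
    budget = BUDGETS[effort]
    request_json["thinking"] = {"type": "enabled", "budget_tokens": budget}
    request_json.pop("top_p", None)       # top_p is dropped when thinking is on
    request_json["temperature"] = 1.0     # temperature is pinned to 1.0
    request_json["max_tokens"] += budget  # max_tokens covers thinking + output
    return request_json

req = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 1000}
print(apply_reasoning_effort(req, "medium"))
# -> {'temperature': 1.0, 'max_tokens': 5096,
#     'thinking': {'type': 'enabled', 'budget_tokens': 4096}}
```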
@@ -32,18 +101,19 @@ class AnthropicRequest(APIRequestBase):
          prompt: Conversation,
          attempts_left: int,
          status_tracker: StatusTracker,
-         retry_queue: asyncio.Queue,
          results_arr: list,
          request_timeout: int = 30,
          sampling_params: SamplingParams = SamplingParams(),
-         pbar: tqdm | None = None,
          callback: Callable | None = None,
-         debug: bool = False,
          # for retries
          all_model_names: list[str] | None = None,
          all_sampling_params: list[SamplingParams] | None = None,
          tools: list | None = None,
          cache: CachePattern | None = None,
+         # Computer Use support
+         computer_use: bool = False,
+         display_width: int = 1024,
+         display_height: int = 768,
      ):
          super().__init__(
              task_id=task_id,
@@ -51,18 +121,18 @@ class AnthropicRequest(APIRequestBase):
              prompt=prompt,
              attempts_left=attempts_left,
              status_tracker=status_tracker,
-             retry_queue=retry_queue,
              results_arr=results_arr,
              request_timeout=request_timeout,
              sampling_params=sampling_params,
-             pbar=pbar,
              callback=callback,
-             debug=debug,
              all_model_names=all_model_names,
              all_sampling_params=all_sampling_params,
              tools=tools,
              cache=cache,
          )
+         self.computer_use = computer_use
+         self.display_width = display_width
+         self.display_height = display_height
          self.model = APIModel.from_registry(model_name)
          self.url = f"{self.model.api_base}/messages"

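The `computer_use`, `display_width`, and `display_height` parameters threaded through `AnthropicRequest` above back the client-level flag described in the README. Below is a hedged sketch of how that flag is used; the client construction, model id, and prompt format are assumptions, and only the `computer_use` argument to `process_prompts_sync` comes from this diff and the README.

```python
from lm_deluge import LLMClient

# Assumed constructor form and model id; the 1024x768 display defaults shown
# in the AnthropicRequest signature above are used unless overridden.
client = LLMClient("claude-3-7-sonnet")

# computer_use=True adds the computer-use beta header and tool definitions
# (see _build_anthropic_request above). Anthropic's API only, per the README.
responses = client.process_prompts_sync(
    ["Open the settings page and take a screenshot."],
    computer_use=True,
)
print(responses[0])
```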
@@ -70,52 +140,16 @@ class AnthropicRequest(APIRequestBase):
          if cache is not None:
              prompt.lock_images_as_bytes()

-         self.system_message, messages = prompt.to_anthropic(cache_pattern=cache)
-         self.request_header = {
-             "x-api-key": os.getenv(self.model.api_key_env_var),
-             "anthropic-version": "2023-06-01",
-             "content-type": "application/json",
-         }
-
-         self.request_json = {
-             "model": self.model.name,
-             "messages": messages,
-             "temperature": self.sampling_params.temperature,
-             "top_p": self.sampling_params.top_p,
-             "max_tokens": self.sampling_params.max_new_tokens,
-         }
-         # handle thinking
-         if self.model.reasoning_model:
-             if sampling_params.reasoning_effort:
-                 # translate reasoning effort of low, medium, high to budget tokens
-                 budget = {"low": 1024, "medium": 4096, "high": 16384}.get(
-                     sampling_params.reasoning_effort
-                 )
-                 self.request_json["thinking"] = {
-                     "type": "enabled",
-                     "budget_tokens": budget,
-                 }
-                 self.request_json.pop("top_p")
-                 self.request_json["temperature"] = 1.0
-                 self.request_json["max_tokens"] += (
-                     budget  # assume max tokens is max completion tokens
-                 )
-             else:
-                 # no thinking
-                 self.request_json["thinking"] = {"type": "disabled"}
-         else:
-             if sampling_params.reasoning_effort:
-                 warnings.warn(
-                     f"Ignoring reasoning_effort param for non-reasoning model: {model_name}"
-                 )
-         if self.system_message is not None:
-             self.request_json["system"] = self.system_message
-         if tools:
-             tool_definitions = [tool.dump_for("anthropic") for tool in tools]
-             # Add cache control to last tool if tools_only caching is specified
-             if cache == "tools_only" and tool_definitions:
-                 tool_definitions[-1]["cache_control"] = {"type": "ephemeral"}
-             self.request_json["tools"] = tool_definitions
+         self.request_json, self.request_header = _build_anthropic_request(
+             self.model,
+             prompt,
+             tools,
+             sampling_params,
+             cache,
+             computer_use,
+             display_width,
+             display_height,
+         )

      async def handle_response(self, http_response: ClientResponse) -> APIResponse:
          is_error = False
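The `tools` list handed to `_build_anthropic_request` above is serialized with `tool.dump_for("anthropic")`. Below is a sketch of defining such a tool from a plain Python function; `Tool.from_function` is an assumption based on the README's "creating tools automatically from python functions" bullet and the `test_tool_from_function.py` test, not on code shown in this diff.

```python
from lm_deluge import Tool

def get_weather(city: str) -> str:
    """Return a short weather summary for a city."""
    return f"It is sunny in {city}."

# Assumed helper name (see lead-in); the README promises tools can be built
# automatically from Python functions.
weather_tool = Tool.from_function(get_weather)

# dump_for("anthropic") is the provider-specific serialization used in
# api_requests/anthropic.py; this is what lands in request_json["tools"].
print(weather_tool.dump_for("anthropic"))
```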
@@ -135,8 +169,6 @@ class AnthropicRequest(APIRequestBase):
              "anthropic-ratelimit-tokens-reset",
          ]:
              rate_limits[header] = http_response.headers.get(header, None)
-         if self.debug:
-             print(f"Rate limits: {rate_limits}")
          if status_code >= 200 and status_code < 300:
              try:
                  data = await http_response.json()