lm-deluge 0.0.32__tar.gz → 0.0.33__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.
Files changed (62)
  1. {lm_deluge-0.0.32/src/lm_deluge.egg-info → lm_deluge-0.0.33}/PKG-INFO +25 -12
  2. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/README.md +24 -11
  3. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/pyproject.toml +1 -1
  4. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/client.py +95 -11
  5. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/models.py +32 -2
  6. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/tracker.py +108 -47
  7. {lm_deluge-0.0.32 → lm_deluge-0.0.33/src/lm_deluge.egg-info}/PKG-INFO +25 -12
  8. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/LICENSE +0 -0
  9. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/setup.cfg +0 -0
  10. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/__init__.py +0 -0
  11. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/agent.py +0 -0
  12. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/__init__.py +0 -0
  13. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/anthropic.py +0 -0
  14. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/base.py +0 -0
  15. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/bedrock.py +0 -0
  16. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/common.py +0 -0
  17. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/deprecated/bedrock.py +0 -0
  18. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/deprecated/cohere.py +0 -0
  19. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/deprecated/deepseek.py +0 -0
  20. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/deprecated/mistral.py +0 -0
  21. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/deprecated/vertex.py +0 -0
  22. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/gemini.py +0 -0
  23. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/mistral.py +0 -0
  24. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/openai.py +0 -0
  25. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/api_requests/response.py +0 -0
  26. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/batches.py +0 -0
  27. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/built_in_tools/anthropic/__init__.py +0 -0
  28. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/built_in_tools/anthropic/bash.py +0 -0
  29. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/built_in_tools/anthropic/computer_use.py +0 -0
  30. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/built_in_tools/anthropic/editor.py +0 -0
  31. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/built_in_tools/base.py +0 -0
  32. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/built_in_tools/openai.py +0 -0
  33. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/cache.py +0 -0
  34. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/config.py +0 -0
  35. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/embed.py +0 -0
  36. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/errors.py +0 -0
  37. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/file.py +0 -0
  38. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/gemini_limits.py +0 -0
  39. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/image.py +0 -0
  40. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/__init__.py +0 -0
  41. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/classify.py +0 -0
  42. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/extract.py +0 -0
  43. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/locate.py +0 -0
  44. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/ocr.py +0 -0
  45. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/score.py +0 -0
  46. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/llm_tools/translate.py +0 -0
  47. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/prompt.py +0 -0
  48. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/request_context.py +0 -0
  49. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/rerank.py +0 -0
  50. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/tool.py +0 -0
  51. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/usage.py +0 -0
  52. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/util/json.py +0 -0
  53. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/util/logprobs.py +0 -0
  54. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/util/spatial.py +0 -0
  55. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/util/validation.py +0 -0
  56. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge/util/xml.py +0 -0
  57. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge.egg-info/SOURCES.txt +0 -0
  58. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge.egg-info/dependency_links.txt +0 -0
  59. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge.egg-info/requires.txt +0 -0
  60. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/src/lm_deluge.egg-info/top_level.txt +0 -0
  61. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/tests/test_builtin_tools.py +0 -0
  62. {lm_deluge-0.0.32 → lm_deluge-0.0.33}/tests/test_native_mcp_server.py +0 -0
PKG-INFO:
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: lm_deluge
-Version: 0.0.32
+Version: 0.0.33
 Summary: Python utility for using LLM API models.
 Author-email: Benjamin Anderson <ben@trytaylor.ai>
 Requires-Python: >=3.10
@@ -54,12 +54,12 @@ The package relies on environment variables for API keys. Typical variables incl
 
 ## Quickstart
 
-The easiest way to get started is with the `.basic` constructor. This uses sensible default arguments for rate limits and sampling parameters so that you don't have to provide a ton of arguments.
+`LLMClient` uses sensible default arguments for rate limits and sampling parameters so that you don't have to provide a ton of arguments.
 
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient.basic("gpt-4o-mini")
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(["Hello, world!"])
 print(resp[0].completion)
 ```
@@ -71,7 +71,7 @@ To distribute your requests across models, just provide a list of more than one
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient.basic(
+client = LLMClient(
 ["gpt-4o-mini", "claude-3-haiku"],
 max_requests_per_minute=10_000
 )
@@ -85,8 +85,8 @@ print(resp[0].completion)
 
 API calls can be customized in a few ways.
 
-1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models. You can also pass many of these arguments directly to `LLMClient.basic` so you don't have to construct an entire `SamplingParams` object.
-2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, and caching.
+1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models.
+2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, caching, **and progress display style**. Set `progress="rich"` (default), `"tqdm"`, or `"manual"` to choose how progress is reported. The manual option prints an update every 30 seconds.
 3. **Arguments to process_prompts.** Per-call, you can set verbosity, whether to display progress, and whether to return just completions (rather than the full APIResponse object). This is also where you provide tools.
 
 Putting it all together:
@@ -109,6 +109,19 @@ await client.process_prompts_async(
 )
 ```
 
+### Queueing individual prompts
+
+You can queue prompts one at a time and track progress explicitly:
+
+```python
+client = LLMClient("gpt-4.1-mini", progress="tqdm")
+client.open()
+task_id = client.start_nowait("hello there")
+# ... queue more tasks ...
+results = await client.wait_for_all()
+client.close()
+```
+
 ## Multi-Turn Conversations
 
 Constructing conversations to pass to models is notoriously annoying. Each provider has a slightly different way of defining a list of messages, and with the introduction of images/multi-part messages it's only gotten worse. We provide convenience constructors so you don't have to remember all that stuff.
@@ -120,7 +133,7 @@ prompt = Conversation.system("You are a helpful assistant.").add(
 Message.user("What's in this image?").add_image("tests/image.jpg")
 )
 
-client = LLMClient.basic("gpt-4.1-mini")
+client = LLMClient("gpt-4.1-mini")
 resps = client.process_prompts_sync([prompt])
 ```
 
@@ -136,9 +149,9 @@ For models that support file uploads (OpenAI, Anthropic, and Gemini), you can ea
 from lm_deluge import LLMClient, Conversation
 
 # Simple file upload
-client = LLMClient.basic("gpt-4.1-mini")
+client = LLMClient("gpt-4.1-mini")
 conversation = Conversation.user(
-"Please summarize this document",
+"Please summarize this document",
 file="path/to/document.pdf"
 )
 resps = client.process_prompts_sync([conversation])
@@ -163,7 +176,7 @@ def get_weather(city: str) -> str:
 return f"The weather in {city} is sunny and 72°F"
 
 tool = Tool.from_function(get_weather)
-client = LLMClient.basic("claude-3-haiku")
+client = LLMClient("claude-3-haiku")
 resps = client.process_prompts_sync(
 ["What's the weather in Paris?"],
 tools=[tool]
@@ -200,7 +213,7 @@ config = {
 all_tools = Tool.from_mcp_config(config)
 
 # let the model use the tools
-client = LLMClient.basic("gpt-4o-mini")
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(
 ["List the files in the current directory"],
 tools=tools
@@ -237,7 +250,7 @@ conv = (
 )
 
 # Use prompt caching to cache system message and tools
-client = LLMClient.basic("claude-3-5-sonnet")
+client = LLMClient("claude-3-5-sonnet")
 resps = client.process_prompts_sync(
 [conv],
 cache="system_and_tools" # Cache system message and any tools
README.md:
@@ -27,12 +27,12 @@ The package relies on environment variables for API keys. Typical variables incl
 
 ## Quickstart
 
-The easiest way to get started is with the `.basic` constructor. This uses sensible default arguments for rate limits and sampling parameters so that you don't have to provide a ton of arguments.
+`LLMClient` uses sensible default arguments for rate limits and sampling parameters so that you don't have to provide a ton of arguments.
 
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient.basic("gpt-4o-mini")
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(["Hello, world!"])
 print(resp[0].completion)
 ```
@@ -44,7 +44,7 @@ To distribute your requests across models, just provide a list of more than one
 ```python
 from lm_deluge import LLMClient
 
-client = LLMClient.basic(
+client = LLMClient(
 ["gpt-4o-mini", "claude-3-haiku"],
 max_requests_per_minute=10_000
 )
@@ -58,8 +58,8 @@ print(resp[0].completion)
 
 API calls can be customized in a few ways.
 
-1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models. You can also pass many of these arguments directly to `LLMClient.basic` so you don't have to construct an entire `SamplingParams` object.
-2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, and caching.
+1. **Sampling Parameters.** This determines things like structured outputs, maximum completion tokens, nucleus sampling, etc. Provide a custom `SamplingParams` to the `LLMClient` to set temperature, top_p, json_mode, max_new_tokens, and/or reasoning_effort. You can pass 1 `SamplingParams` to use for all models, or a list of `SamplingParams` that's the same length as the list of models.
+2. **Arguments to LLMClient.** This is where you set request timeout, rate limits, model name(s), model weight(s) for distributing requests across models, retries, caching, **and progress display style**. Set `progress="rich"` (default), `"tqdm"`, or `"manual"` to choose how progress is reported. The manual option prints an update every 30 seconds.
 3. **Arguments to process_prompts.** Per-call, you can set verbosity, whether to display progress, and whether to return just completions (rather than the full APIResponse object). This is also where you provide tools.
 
 Putting it all together:
@@ -82,6 +82,19 @@ await client.process_prompts_async(
 )
 ```
 
+### Queueing individual prompts
+
+You can queue prompts one at a time and track progress explicitly:
+
+```python
+client = LLMClient("gpt-4.1-mini", progress="tqdm")
+client.open()
+task_id = client.start_nowait("hello there")
+# ... queue more tasks ...
+results = await client.wait_for_all()
+client.close()
+```
+
 ## Multi-Turn Conversations
 
 Constructing conversations to pass to models is notoriously annoying. Each provider has a slightly different way of defining a list of messages, and with the introduction of images/multi-part messages it's only gotten worse. We provide convenience constructors so you don't have to remember all that stuff.
@@ -93,7 +106,7 @@ prompt = Conversation.system("You are a helpful assistant.").add(
 Message.user("What's in this image?").add_image("tests/image.jpg")
 )
 
-client = LLMClient.basic("gpt-4.1-mini")
+client = LLMClient("gpt-4.1-mini")
 resps = client.process_prompts_sync([prompt])
 ```
 
@@ -109,9 +122,9 @@ For models that support file uploads (OpenAI, Anthropic, and Gemini), you can ea
 from lm_deluge import LLMClient, Conversation
 
 # Simple file upload
-client = LLMClient.basic("gpt-4.1-mini")
+client = LLMClient("gpt-4.1-mini")
 conversation = Conversation.user(
-"Please summarize this document",
+"Please summarize this document",
 file="path/to/document.pdf"
 )
 resps = client.process_prompts_sync([conversation])
@@ -136,7 +149,7 @@ def get_weather(city: str) -> str:
 return f"The weather in {city} is sunny and 72°F"
 
 tool = Tool.from_function(get_weather)
-client = LLMClient.basic("claude-3-haiku")
+client = LLMClient("claude-3-haiku")
 resps = client.process_prompts_sync(
 ["What's the weather in Paris?"],
 tools=[tool]
@@ -173,7 +186,7 @@ config = {
 all_tools = Tool.from_mcp_config(config)
 
 # let the model use the tools
-client = LLMClient.basic("gpt-4o-mini")
+client = LLMClient("gpt-4o-mini")
 resps = client.process_prompts_sync(
 ["List the files in the current directory"],
 tools=tools
@@ -210,7 +223,7 @@ conv = (
 )
 
 # Use prompt caching to cache system message and tools
-client = LLMClient.basic("claude-3-5-sonnet")
+client = LLMClient("claude-3-5-sonnet")
 resps = client.process_prompts_sync(
 [conv],
 cache="system_and_tools" # Cache system message and any tools
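
The README diff above replaces every `LLMClient.basic(...)` call with a direct `LLMClient(...)` call. As a quick illustration, a hedged migration sketch is shown below; whether `.basic` is still available as an alias in 0.0.33 is not shown by this diff, so treat the old form as removed until confirmed.

```python
# Migration sketch based on the README changes above (not an authoritative guide).
# 0.0.32 style:
#     client = LLMClient.basic("gpt-4o-mini", max_new_tokens=512)
# 0.0.33 style: call LLMClient directly; sampling arguments stay keyword-only.
from lm_deluge import LLMClient

client = LLMClient("gpt-4o-mini", max_new_tokens=512)
resps = client.process_prompts_sync(["Hello, world!"])
print(resps[0].completion)  # the README snippet prints resp[0]; resps is used here
```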
pyproject.toml:
@@ -3,7 +3,7 @@ requires = ["setuptools", "wheel"]
 
 [project]
 name = "lm_deluge"
-version = "0.0.32"
+version = "0.0.33"
 authors = [{ name = "Benjamin Anderson", email = "ben@trytaylor.ai" }]
 description = "Python utility for using LLM API models."
 readme = "README.md"
src/lm_deluge/client.py:
@@ -25,11 +25,10 @@ from .tracker import StatusTracker
 
 # TODO: get completions as they finish, not all at once at the end.
 # TODO: add optional max_input_tokens to client so we can reject long prompts to prevent abuse
-class LLMClient(BaseModel):
+class _LLMClient(BaseModel):
 """
-LLMClient abstracts all the fixed arguments to process_prompts_async, so you can create it
-once and use it for more stuff without having to configure all the arguments.
-Handles models, sampling params for each model, model weights, rate limits, etc.
+Internal LLMClient implementation using Pydantic.
+Keeps all validation, serialization, and existing functionality.
 """
 
 model_names: str | list[str] = ["gpt-4.1-mini"]
@@ -53,6 +52,9 @@ class LLMClient(BaseModel):
 top_logprobs: int | None = None
 force_local_mcp: bool = False
 
+# Progress configuration
+progress: Literal["rich", "tqdm", "manual"] = "rich"
+
 # Internal state for async task handling
 _next_task_id: int = PrivateAttr(default=0)
 _tasks: dict[int, asyncio.Task] = PrivateAttr(default_factory=dict)
@@ -60,6 +62,23 @@ class LLMClient(BaseModel):
 _tracker: StatusTracker | None = PrivateAttr(default=None)
 _capacity_lock: asyncio.Lock = PrivateAttr(default_factory=asyncio.Lock)
 
+# Progress management for queueing API
+def open(self, total: int | None = None, show_progress: bool = True):
+self._tracker = StatusTracker(
+max_requests_per_minute=self.max_requests_per_minute,
+max_tokens_per_minute=self.max_tokens_per_minute,
+max_concurrent_requests=self.max_concurrent_requests,
+progress_style=self.progress,
+use_progress_bar=show_progress,
+)
+self._tracker.init_progress_bar(total)
+return self
+
+def close(self):
+if self._tracker:
+self._tracker.log_final_status()
+self._tracker = None
+
 # NEW! Builder methods
 def with_model(self, model: str):
 self.model_names = [model]
@@ -90,7 +109,7 @@ class LLMClient(BaseModel):
 max_concurrent_requests=self.max_concurrent_requests,
 use_progress_bar=False,
 progress_bar_disable=True,
-use_rich=False,
+progress_style=self.progress,
 )
 return self._tracker
 
@@ -100,7 +119,7 @@ class LLMClient(BaseModel):
 
 @model_validator(mode="before")
 @classmethod
-def fix_lists(cls, data) -> "LLMClient":
+def fix_lists(cls, data) -> "_LLMClient":
 if isinstance(data.get("model_names"), str):
 data["model_names"] = [data["model_names"]]
 if not isinstance(data.get("sampling_params", []), list):
@@ -343,13 +362,10 @@ class LLMClient(BaseModel):
 max_requests_per_minute=self.max_requests_per_minute,
 max_tokens_per_minute=self.max_tokens_per_minute,
 max_concurrent_requests=self.max_concurrent_requests,
+progress_style=self.progress,
 use_progress_bar=show_progress,
-progress_bar_total=len(prompts),
-progress_bar_disable=not show_progress,
-use_rich=show_progress,
 )
-
-tracker.init_progress_bar()
+tracker.init_progress_bar(total=len(prompts), disable=not show_progress)
 
 # Create retry queue for failed requests
 retry_queue: asyncio.Queue[RequestContext] = asyncio.Queue()
@@ -510,6 +526,7 @@ class LLMClient(BaseModel):
 )
 task = asyncio.create_task(self._run_context(context))
 self._tasks[task_id] = task
+tracker.add_to_total(1)
 return task_id
 
 async def start(
@@ -752,3 +769,70 @@ class LLMClient(BaseModel):
 # combined_results["limiting_factor"] = limiting_factor
 
 # return combined_results
+
+
+# Clean factory function with perfect IDE support
+@overload
+def LLMClient(model_names: str, **kwargs) -> _LLMClient: ...
+
+@overload
+def LLMClient(model_names: list[str], **kwargs) -> _LLMClient: ...
+
+def LLMClient(
+model_names: str | list[str] = "gpt-4.1-mini",
+*,
+max_requests_per_minute: int = 1_000,
+max_tokens_per_minute: int = 100_000,
+max_concurrent_requests: int = 225,
+sampling_params: list[SamplingParams] | None = None,
+model_weights: list[float] | Literal["uniform", "dynamic"] = "uniform",
+max_attempts: int = 5,
+request_timeout: int = 30,
+cache: Any = None,
+extra_headers: dict[str, str] | None = None,
+temperature: float = 0.75,
+top_p: float = 1.0,
+json_mode: bool = False,
+max_new_tokens: int = 512,
+reasoning_effort: Literal["low", "medium", "high", None] = None,
+logprobs: bool = False,
+top_logprobs: int | None = None,
+force_local_mcp: bool = False,
+progress: Literal["rich", "tqdm", "manual"] = "rich",
+) -> _LLMClient:
+"""
+Create an LLMClient with model_names as a positional argument.
+
+Args:
+model_names: Model name(s) to use - can be a single string or list of strings
+**kwargs: All other LLMClient configuration options (keyword-only)
+
+Returns:
+Configured LLMClient instance
+"""
+# Handle default for mutable argument
+if sampling_params is None:
+sampling_params = []
+
+# Simply pass everything to the Pydantic constructor
+return _LLMClient(
+model_names=model_names,
+max_requests_per_minute=max_requests_per_minute,
+max_tokens_per_minute=max_tokens_per_minute,
+max_concurrent_requests=max_concurrent_requests,
+sampling_params=sampling_params,
+model_weights=model_weights,
+max_attempts=max_attempts,
+request_timeout=request_timeout,
+cache=cache,
+extra_headers=extra_headers,
+temperature=temperature,
+top_p=top_p,
+json_mode=json_mode,
+max_new_tokens=max_new_tokens,
+reasoning_effort=reasoning_effort,
+logprobs=logprobs,
+top_logprobs=top_logprobs,
+force_local_mcp=force_local_mcp,
+progress=progress,
+)
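
The client.py changes above turn `LLMClient` into a factory function over the internal `_LLMClient` model, add a `progress` option, add `open()`/`close()` helpers, and have `start_nowait` grow the tracker total. A minimal end-to-end sketch of that queueing workflow follows, pieced together from this diff and the README snippet; it assumes an OpenAI API key is configured, and the shape of `wait_for_all()`'s return value (APIResponse objects with `.completion`, in submission order) is an assumption, not something the diff confirms.

```python
# Sketch of the 0.0.33 queueing workflow, adapted from the client.py and README
# changes in this diff. Assumes OPENAI_API_KEY is set; the ordering and element
# type of wait_for_all()'s results are assumptions.
import asyncio

from lm_deluge import LLMClient


async def main():
    # LLMClient is now a factory returning _LLMClient: model name is positional,
    # everything else keyword-only, including the new progress style.
    client = LLMClient("gpt-4.1-mini", progress="manual", max_new_tokens=128)

    client.open()  # builds a StatusTracker using the configured progress style
    task_ids = [client.start_nowait(f"Give me fun fact #{i}") for i in range(5)]

    results = await client.wait_for_all()  # drain all queued tasks
    client.close()  # logs final status and clears the tracker

    for task_id, resp in zip(task_ids, results):
        print(task_id, resp.completion)


asyncio.run(main())
```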
src/lm_deluge/models.py:
@@ -1261,9 +1261,39 @@ class APIModel:
 registry: dict[str, APIModel] = {}
 
 
-def register_model(**kwargs) -> APIModel:
+def register_model(
+id: str,
+name: str,
+api_base: str,
+api_key_env_var: str,
+api_spec: str,
+input_cost: float | None = 0, # $ per million input tokens
+output_cost: float | None = 0, # $ per million output tokens
+supports_json: bool = False,
+supports_logprobs: bool = False,
+supports_responses: bool = False,
+reasoning_model: bool = False,
+regions: list[str] | dict[str, int] = field(default_factory=list),
+tokens_per_minute: int | None = None,
+requests_per_minute: int | None = None
+) -> APIModel:
 """Register a model configuration and return the created APIModel."""
-model = APIModel(**kwargs)
+model = APIModel(
+id=id,
+name=name,
+api_base=api_base,
+api_key_env_var=api_key_env_var,
+api_spec=api_spec,
+input_cost=input_cost,
+output_cost=output_cost,
+supports_json=supports_json,
+supports_logprobs=supports_logprobs,
+supports_responses=supports_responses,
+reasoning_model=reasoning_model,
+regions=regions,
+tokens_per_minute=tokens_per_minute,
+requests_per_minute=requests_per_minute
+)
 registry[model.id] = model
 return model
 
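
The models.py change replaces `register_model(**kwargs)` with an explicit signature. Below is a hypothetical registration call using that signature; the import path is assumed from the file location, and every value (model id, endpoint, env var, `api_spec` string, costs) is made up for illustration — only the parameter names come from the diff.

```python
# Hypothetical use of the explicit register_model signature shown above.
# All values are illustrative; only the parameter names are taken from the diff.
from lm_deluge.models import register_model  # assumed import path

register_model(
    id="my-local-llama",
    name="Local Llama 3 70B",
    api_base="http://localhost:8000/v1",
    api_key_env_var="LOCAL_LLAMA_API_KEY",
    api_spec="openai",        # assumed to be a valid spec name
    input_cost=0.0,           # $ per million input tokens
    output_cost=0.0,          # $ per million output tokens
    supports_json=True,
    supports_logprobs=False,
    tokens_per_minute=200_000,
    requests_per_minute=1_000,
)
```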
src/lm_deluge/tracker.py:
@@ -1,6 +1,7 @@
 import asyncio
 import time
 from dataclasses import dataclass, field
+from typing import Literal
 
 from rich.console import Console, Group
 from rich.live import Live
@@ -35,17 +36,22 @@ class StatusTracker:
 use_progress_bar: bool = True
 progress_bar_total: int | None = None
 progress_bar_disable: bool = False
+progress_style: Literal["rich", "tqdm", "manual"] = "rich"
+progress_print_interval: float = 30.0
 _pbar: tqdm | None = None
 
 # Rich display configuration
-use_rich: bool = True
 _rich_console: Console | None = None
 _rich_live: object | None = None
-_rich_progress: object | None = None
-_rich_task_id: object | None = None
+_rich_progress: Progress | None = None
+_rich_task_id: int | None = None
 _rich_display_task: asyncio.Task | None = None
 _rich_stop_event: asyncio.Event | None = None
 
+# Manual print configuration
+_manual_display_task: asyncio.Task | None = None
+_manual_stop_event: asyncio.Event | None = None
+
 def __post_init__(self):
 self.available_request_capacity = self.max_requests_per_minute
 self.available_token_capacity = self.max_tokens_per_minute
@@ -147,69 +153,75 @@ class StatusTracker:
 if not self.use_progress_bar:
 return
 
-if self.use_rich:
-self._init_rich_display(total, disable)
-else:
-# Use provided values or fall back to instance defaults
-pbar_total = total if total is not None else self.progress_bar_total
-pbar_disable = disable if disable is not None else self.progress_bar_disable
+pbar_total = total if total is not None else self.progress_bar_total
+pbar_disable = disable if disable is not None else self.progress_bar_disable
+if pbar_total is None:
+pbar_total = 0
+self.progress_bar_total = pbar_total
+
+if self.progress_style == "rich":
+if pbar_disable:
+return
+self._init_rich_display(pbar_total)
+elif self.progress_style == "tqdm":
 self._pbar = tqdm(total=pbar_total, disable=pbar_disable)
+elif self.progress_style == "manual":
+self._init_manual_display(pbar_total)
+
 self.update_pbar()
 
 def close_progress_bar(self):
 """Close progress bar if it exists."""
-if self.use_rich and self._rich_stop_event:
-self._close_rich_display()
-elif self._pbar is not None:
-self._pbar.close()
-self._pbar = None
-
-def _init_rich_display(self, total: int | None = None, disable: bool | None = None):
-"""Initialize Rich display components."""
-if disable:
+if not self.use_progress_bar:
 return
-
-pbar_total = total if total is not None else self.progress_bar_total
-if pbar_total is None:
-pbar_total = 100 # Default fallback
-
+if self.progress_style == "rich":
+if self._rich_stop_event:
+self._close_rich_display()
+elif self.progress_style == "tqdm":
+if self._pbar is not None:
+self._pbar.close()
+self._pbar = None
+elif self.progress_style == "manual":
+self._close_manual_display()
+
+def _init_rich_display(self, total: int):
+"""Initialize Rich display components."""
 self._rich_console = Console()
-self._rich_stop_event = asyncio.Event()
-
-# Start the display updater task
-self._rich_display_task = asyncio.create_task(
-self._rich_display_updater(pbar_total)
-)
-
-async def _rich_display_updater(self, total: int):
-"""Update Rich display independently."""
-if not self._rich_console or self._rich_stop_event is None:
-return
-
-# Create progress bar without console so we can use it in Live
-progress = Progress(
+self._rich_progress = Progress(
 SpinnerColumn(),
 TextColumn("Processing requests..."),
 BarColumn(),
 MofNCompleteColumn(),
 )
-main_task = progress.add_task("requests", total=total)
+self._rich_task_id = self._rich_progress.add_task("requests", total=total)
+self._rich_stop_event = asyncio.Event()
+self._rich_display_task = asyncio.create_task(self._rich_display_updater())
 
-# Use Live to combine progress + text
+async def _rich_display_updater(self):
+"""Update Rich display independently."""
+if (
+not self._rich_console
+or self._rich_progress is None
+or self._rich_task_id is None
+or self._rich_stop_event is None
+):
+return
 
 with Live(console=self._rich_console, refresh_per_second=10) as live:
 while not self._rich_stop_event.is_set():
 completed = self.num_tasks_succeeded
-progress.update(main_task, completed=completed)
+self._rich_progress.update(
+self._rich_task_id,
+completed=completed,
+total=self.progress_bar_total,
+)
 
-# Create capacity info text
 tokens_info = f"TPM Capacity: {self.available_token_capacity / 1000:.1f}k/{self.max_tokens_per_minute / 1000:.1f}k"
 reqs_info = f"RPM Capacity: {int(self.available_request_capacity)}/{self.max_requests_per_minute}"
 in_progress = f"In Progress: {int(self.num_tasks_in_progress)}"
 capacity_text = Text(f"{in_progress} • {tokens_info} • {reqs_info}")
 
-# Group progress bar and text
-display = Group(progress, capacity_text)
+display = Group(self._rich_progress, capacity_text)
 live.update(display)
 
 await asyncio.sleep(0.1)
@@ -223,15 +235,45 @@
 
 self._rich_console = None
 self._rich_live = None
+self._rich_progress = None
+self._rich_task_id = None
 self._rich_display_task = None
 self._rich_stop_event = None
 
+def _init_manual_display(self, total: int):
+"""Initialize manual progress printer."""
+self.progress_bar_total = total
+self._manual_stop_event = asyncio.Event()
+self._manual_display_task = asyncio.create_task(
+self._manual_display_updater()
+)
+
+async def _manual_display_updater(self):
+if self._manual_stop_event is None:
+return
+while not self._manual_stop_event.is_set():
+print(
+f"Completed {self.num_tasks_succeeded}/{self.progress_bar_total} requests"
+)
+await asyncio.sleep(self.progress_print_interval)
+
+def _close_manual_display(self):
+if self._manual_stop_event:
+self._manual_stop_event.set()
+if self._manual_display_task and not self._manual_display_task.done():
+self._manual_display_task.cancel()
+self._manual_display_task = None
+self._manual_stop_event = None
+
 def update_pbar(self, n: int = 0):
 """Update progress bar status and optionally increment.
 
 Args:
 n: Number of items to increment (0 means just update postfix)
 """
+if self.progress_style != "tqdm":
+return
+
 current_time = time.time()
 if self._pbar and (current_time - self.last_pbar_update_time > 1):
 self.last_pbar_update_time = current_time
@@ -249,8 +291,27 @@
 
 def increment_pbar(self):
 """Increment progress bar by 1."""
-if self.use_rich:
-# Rich display is updated automatically by the display updater
-pass
-elif self._pbar:
+if not self.use_progress_bar:
+return
+if self.progress_style == "tqdm" and self._pbar:
 self._pbar.update(1)
+# rich and manual are updated elsewhere
+
+def add_to_total(self, n: int = 1):
+"""Increase the total number of tasks being tracked."""
+if self.progress_bar_total is None:
+self.progress_bar_total = 0
+self.progress_bar_total += n
+if not self.use_progress_bar:
+return
+if self.progress_style == "tqdm" and self._pbar:
+self._pbar.total = self.progress_bar_total
+self._pbar.refresh()
+elif (
+self.progress_style == "rich"
+and self._rich_progress
+and self._rich_task_id is not None
+):
+self._rich_progress.update(
+self._rich_task_id, total=self.progress_bar_total
+)
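
The tracker.py changes above replace the boolean `use_rich` with a three-way `progress_style` and add a "manual" mode that prints a status line every `progress_print_interval` seconds from a background asyncio task. For illustration, here is that pattern in isolation; this is a standalone sketch, not the library's StatusTracker API, and the names in it are made up.

```python
# Standalone illustration of the "manual" progress pattern tracker.py now uses:
# a background task prints a status line every interval until an Event is set.
import asyncio


async def manual_progress(get_done, total: int, stop: asyncio.Event, interval: float = 30.0):
    while not stop.is_set():
        print(f"Completed {get_done()}/{total} requests")
        await asyncio.sleep(interval)


async def main():
    done = 0
    stop = asyncio.Event()
    reporter = asyncio.create_task(manual_progress(lambda: done, 10, stop, interval=1.0))

    for _ in range(10):
        await asyncio.sleep(0.3)  # stand-in for an API call
        done += 1

    stop.set()
    reporter.cancel()  # mirrors _close_manual_display(): set the event, then cancel
    try:
        await reporter
    except asyncio.CancelledError:
        pass


asyncio.run(main())
```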
src/lm_deluge.egg-info/PKG-INFO:
(Same changes as the PKG-INFO diff above: version bumped to 0.0.33, plus the identical README updates.)