lm-deluge 0.0.5__py3-none-any.whl → 0.0.7__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

@@ -1,361 +0,0 @@
- # consider: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/call-gemini-using-openai-library#call-chat-completions-api
- import asyncio
- from aiohttp import ClientResponse
- import json
- import os
- import time
- from tqdm import tqdm
- from typing import Callable
-
- from lm_deluge.prompt import Conversation
- from .base import APIRequestBase, APIResponse
- from ..tracker import StatusTracker
- from ..sampling_params import SamplingParams
- from ..models import APIModel
-
- from google.oauth2 import service_account
- from google.auth.transport.requests import Request
-
-
- def get_access_token(service_account_file: str):
-     """
-     Get the access token from environment variables if another coroutine in
-     this process has already fetched one recently; otherwise refresh it from
-     the service account file.
-     """
-     last_refreshed = os.getenv("VERTEX_TOKEN_LAST_REFRESHED", None)
-     last_refreshed = int(last_refreshed) if last_refreshed is not None else 0
-     token = os.getenv("VERTEX_API_TOKEN", None)
-
-     # tokens live for 60 minutes; treat anything younger than 50 as fresh
-     if token is not None and time.time() - last_refreshed < 60 * 50:
-         return token
-     else:
-         credentials = service_account.Credentials.from_service_account_file(
-             service_account_file,
-             scopes=["https://www.googleapis.com/auth/cloud-platform"],
-         )
-         credentials.refresh(Request())
-         token = credentials.token
-         os.environ["VERTEX_API_TOKEN"] = token
-         os.environ["VERTEX_TOKEN_LAST_REFRESHED"] = str(int(time.time()))
-
-         return token
-
-
- class VertexAnthropicRequest(APIRequestBase):
-     """
-     For Claude on Vertex, you'll also have to set the PROJECT_ID environment variable.
-     """
-
-     def __init__(
-         self,
-         task_id: int,
-         model_name: str,  # must correspond to registry
-         prompt: Conversation,
-         attempts_left: int,
-         status_tracker: StatusTracker,
-         retry_queue: asyncio.Queue,
-         results_arr: list,
-         request_timeout: int = 30,
-         sampling_params: SamplingParams = SamplingParams(),
-         pbar: tqdm | None = None,
-         callback: Callable | None = None,
-         debug: bool = False,
-     ):
-         super().__init__(
-             task_id=task_id,
-             model_name=model_name,
-             prompt=prompt,
-             attempts_left=attempts_left,
-             status_tracker=status_tracker,
-             retry_queue=retry_queue,
-             results_arr=results_arr,
-             request_timeout=request_timeout,
-             sampling_params=sampling_params,
-             pbar=pbar,
-             callback=callback,
-             debug=debug,
-         )
-         creds = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
-         if not creds:
-             raise RuntimeError(
-                 "GOOGLE_APPLICATION_CREDENTIALS not provided in environment"
-             )
-         token = get_access_token(creds)
-
-         self.model = APIModel.from_registry(model_name)
-         project_id = os.getenv("PROJECT_ID")
-         region = self.model.sample_region()
-
-         endpoint = f"https://{region}-aiplatform.googleapis.com"
-         # Anthropic models on Vertex are invoked via :rawPredict; :generateContent is a Gemini method
-         self.url = f"{endpoint}/v1/projects/{project_id}/locations/{region}/publishers/anthropic/models/{self.model.name}:rawPredict"
-         self.request_header = {
-             "Authorization": f"Bearer {token}",
-             "Content-Type": "application/json",
-         }
-         self.system_message, messages = prompt.to_anthropic()
-
-         self.request_json = {
-             "anthropic_version": "vertex-2023-10-16",
-             "messages": messages,
-             "temperature": self.sampling_params.temperature,
-             "top_p": self.sampling_params.top_p,
-             "max_tokens": self.sampling_params.max_new_tokens,
-         }
-         if self.system_message is not None:
-             self.request_json["system"] = self.system_message
-
-     async def handle_response(self, http_response: ClientResponse) -> APIResponse:
-         is_error = False
-         error_message = None
-         completion = None
-         input_tokens = None
-         output_tokens = None
-         status_code = http_response.status
-         mimetype = http_response.headers.get("Content-Type", None)
-         if 200 <= status_code < 300:
-             try:
-                 data = await http_response.json()
-                 completion = data["content"][0]["text"]
-                 input_tokens = data["usage"]["input_tokens"]
-                 output_tokens = data["usage"]["output_tokens"]
-             except Exception as e:
-                 is_error = True
-                 error_message = (
-                     f"Error calling .json() on response w/ status {status_code}: {e}"
-                 )
-         elif "json" in (mimetype or "").lower():
-             is_error = True  # expected status is 2xx, otherwise it's an error
-             data = await http_response.json()
-             error_message = json.dumps(data)
-         else:
-             is_error = True
-             text = await http_response.text()
-             error_message = text
-
-         # handle special kinds of errors. TODO: make sure these are correct for anthropic
-         if is_error and error_message is not None:
-             if (
-                 "rate limit" in error_message.lower()
-                 or "overloaded" in error_message.lower()
-                 or status_code == 429
-             ):
-                 error_message += " (Rate limit error, triggering cooldown.)"
-                 self.status_tracker.rate_limit_exceeded()
-             if "context length" in error_message.lower():
-                 error_message += " (Context length exceeded, set retries to 0.)"
-                 self.attempts_left = 0
-
-         return APIResponse(
-             id=self.task_id,
-             status_code=status_code,
-             is_error=is_error,
-             error_message=error_message,
-             prompt=self.prompt,
-             completion=completion,
-             model_internal=self.model_name,
-             sampling_params=self.sampling_params,
-             input_tokens=input_tokens,
-             output_tokens=output_tokens,
-         )
-
-
- SAFETY_SETTING_CATEGORIES = [
-     "HARM_CATEGORY_DANGEROUS_CONTENT",
-     "HARM_CATEGORY_HARASSMENT",
-     "HARM_CATEGORY_HATE_SPEECH",
-     "HARM_CATEGORY_SEXUALLY_EXPLICIT",
- ]
-
-
- class GeminiRequest(APIRequestBase):
-     """
-     For Gemini, you'll also have to set the PROJECT_ID environment variable.
-     """
-
-     def __init__(
-         self,
-         task_id: int,
-         model_name: str,  # must correspond to registry
-         prompt: Conversation,
-         attempts_left: int,
-         status_tracker: StatusTracker,
-         retry_queue: asyncio.Queue,
-         results_arr: list,
-         request_timeout: int = 30,
-         sampling_params: SamplingParams = SamplingParams(),
-         pbar: tqdm | None = None,
-         callback: Callable | None = None,
-         debug: bool = False,
-         all_model_names: list[str] | None = None,
-         all_sampling_params: list[SamplingParams] | None = None,
-     ):
-         super().__init__(
-             task_id=task_id,
-             model_name=model_name,
-             prompt=prompt,
-             attempts_left=attempts_left,
-             status_tracker=status_tracker,
-             retry_queue=retry_queue,
-             results_arr=results_arr,
-             request_timeout=request_timeout,
-             sampling_params=sampling_params,
-             pbar=pbar,
-             callback=callback,
-             debug=debug,
-             all_model_names=all_model_names,
-             all_sampling_params=all_sampling_params,
-         )
-         self.model = APIModel.from_registry(model_name)
-         credentials_file = os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
-         if not credentials_file:
-             raise RuntimeError(
-                 "No credentials file found. Provide a Google credentials file and "
-                 "point to it with the GOOGLE_APPLICATION_CREDENTIALS environment variable."
-             )
-         token = get_access_token(credentials_file)
-         self.project_id = os.getenv("PROJECT_ID")
-         # sample a region, weighted by region counts
-         self.region = self.model.sample_region()
-         assert self.region is not None, "unable to sample region"
-         self.url = f"https://{self.region}-aiplatform.googleapis.com/v1/projects/{self.project_id}/locations/{self.region}/publishers/google/models/{self.model.name}:generateContent"
-
-         self.request_header = {
-             "Authorization": f"Bearer {token}",
-             "Content-Type": "application/json",
-         }
-         self.system_message, contents = prompt.to_gemini()
-         self.request_json = {
-             "contents": contents,
-             "generationConfig": {
-                 "stopSequences": [],
-                 "temperature": sampling_params.temperature,
-                 "maxOutputTokens": sampling_params.max_new_tokens,
-                 "topP": sampling_params.top_p,
-                 "topK": None,
-             },
-             "safetySettings": [
-                 {"category": category, "threshold": "BLOCK_NONE"}
-                 for category in SAFETY_SETTING_CATEGORIES
-             ],
-         }
-         if sampling_params.json_mode and self.model.supports_json:
-             self.request_json["generationConfig"]["responseMimeType"] = (
-                 "application/json"
-             )
-
-         if self.system_message is not None:
-             # assign a dict, not a one-element tuple (the original trailing comma was a bug)
-             self.request_json["systemInstruction"] = {
-                 "role": "SYSTEM",
-                 "parts": [{"text": self.system_message}],
-             }
-
-     async def handle_response(self, http_response: ClientResponse) -> APIResponse:
-         is_error = False
-         error_message = None
-         completion = None
-         input_tokens = None
-         output_tokens = None
-         finish_reason = None
-         data = None
-         retry_with_different_model = False
-         give_up_if_no_other_models = False
-         status_code = http_response.status
-         mimetype = http_response.headers.get("Content-Type", None)
-         if 200 <= status_code < 300:
-             try:
-                 data = await http_response.json()
-                 if "candidates" not in data:
-                     is_error = True
-                     if "promptFeedback" in data:
-                         error_message = "Prompt rejected. Feedback: " + str(
-                             data["promptFeedback"]
-                         )
-                     else:
-                         error_message = "No candidates in response."
-                     retry_with_different_model = True
-                     give_up_if_no_other_models = True
-                 else:
-                     candidate = data["candidates"][0]
-                     finish_reason = candidate["finishReason"]
-                     if "content" in candidate:
-                         parts = candidate["content"]["parts"]
-                         completion = " ".join([part["text"] for part in parts])
-                         usage = data["usageMetadata"]
-                         input_tokens = usage["promptTokenCount"]
-                         output_tokens = usage["candidatesTokenCount"]
-                     elif finish_reason == "RECITATION":
-                         is_error = True
-                         citations = candidate.get("citationMetadata", {}).get(
-                             "citations", []
-                         )
-                         urls = ",".join(
-                             [citation.get("uri", "") for citation in citations]
-                         )
-                         error_message = "Finish reason RECITATION. URLS: " + urls
-                         retry_with_different_model = True
-                     elif finish_reason == "OTHER":
-                         is_error = True
-                         error_message = "Finish reason OTHER."
-                         retry_with_different_model = True
-                     elif finish_reason == "SAFETY":
-                         is_error = True
-                         error_message = "Finish reason SAFETY."
-                         retry_with_different_model = True
-                     else:
-                         print("Actual structure of response:", data)
-                         is_error = True
-                         error_message = "No content in response."
-             except Exception as e:
-                 is_error = True
-                 error_message = f"Error calling .json() on response w/ status {status_code}: {e.__class__} {e}"
-                 if isinstance(e, KeyError):
-                     print("Actual structure of response:", data)
-         elif "json" in (mimetype or "").lower():
-             is_error = True
-             data = await http_response.json()
-             error_message = json.dumps(data)
-         else:
-             is_error = True
-             text = await http_response.text()
-             error_message = text
-
-         old_region = self.region
-         if is_error and error_message is not None:
-             if (
-                 "rate limit" in error_message.lower()
-                 or "temporarily out of capacity" in error_message.lower()
-                 or "exceeded" in error_message.lower()
-                 or status_code == 429
-             ):
-                 error_message += " (Rate limit error, triggering cooldown & retrying with different model.)"
-                 self.status_tracker.rate_limit_exceeded()
-                 retry_with_different_model = True  # if possible, retry with a different model
-         if is_error:
-             # resample the region in case the error is due to regional unavailability
-             self.region = self.model.sample_region()
-             assert self.region is not None, "Unable to sample region"
-             self.url = f"https://{self.region}-aiplatform.googleapis.com/v1/projects/{self.project_id}/locations/{self.region}/publishers/google/models/{self.model.name}:generateContent"
-
-         return APIResponse(
-             id=self.task_id,
-             status_code=status_code,
-             is_error=is_error,
-             error_message=error_message,
-             prompt=self.prompt,
-             completion=completion,
-             model_internal=self.model_name,
-             sampling_params=self.sampling_params,
-             input_tokens=input_tokens,
-             output_tokens=output_tokens,
-             region=old_region,
-             finish_reason=finish_reason,
-             retry_with_different_model=retry_with_different_model,
-             give_up_if_no_other_models=give_up_if_no_other_models,
-         )
-
-
- # class LlamaEndpointRequest(APIRequestBase):
- #     raise NotImplementedError("Llama endpoints are not implemented and never will be because Vertex AI sucks ass.")
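
The env-var token cache in `get_access_token` above lets many concurrent request objects share one credential refresh per ~50-minute window. A minimal sketch of the intended behavior, assuming a valid service-account JSON path:

```python
# First call refreshes against Google and caches the token in os.environ;
# the second call, inside the 50-minute freshness window, reuses it.
token_a = get_access_token("service-account.json")
token_b = get_access_token("service-account.json")
assert token_a == token_b
```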
lm_deluge/util/pdf.py DELETED
@@ -1,45 +0,0 @@
- import io
-
-
- def text_from_pdf(pdf: str | bytes | io.BytesIO):
-     """
-     Extract text from a PDF. Does NOT use OCR; extracts the literal text.
-     The source can be:
-     - A file path (str)
-     - Bytes of a PDF file
-     - A BytesIO object containing a PDF file
-     """
-     try:
-         import pymupdf  # pyright: ignore
-     except ImportError:
-         raise ImportError(
-             "pymupdf is required to extract text from PDFs. Install lm_deluge[pdf] or lm_deluge[full]."
-         )
-     if isinstance(pdf, str):
-         # It's a file path
-         doc = pymupdf.open(pdf)
-     elif isinstance(pdf, (bytes, io.BytesIO)):
-         # It's bytes or a BytesIO object
-         if isinstance(pdf, bytes):
-             pdf = io.BytesIO(pdf)
-         doc = pymupdf.open(stream=pdf, filetype="pdf")
-     else:
-         raise ValueError("Unsupported pdf_source type. Must be str, bytes, or BytesIO.")
-
-     text_content = []
-     for page in doc:
-         blocks = page.get_text("blocks", sort=True)
-         for block in blocks:
-             # block[4] contains the text content
-             text_content.append(block[4].strip())
-         text_content.append("\n")  # Add extra newlines between blocks
-
-     # Join all text content with newlines
-     full_text = "\n".join(text_content).strip()
-     # Collapse runs of spaces within each line while preserving newlines
-     # (splitting on all whitespace here was a bug: it destroyed the line breaks)
-     lines = [" ".join(line.split()) for line in full_text.split("\n")]
-     # Drop any lines that ended up empty
-     full_text = "\n".join([line for line in lines if line])
-
-     return full_text
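
For reference, a short usage sketch of `text_from_pdf` covering the three accepted source types (the file name is illustrative):

```python
import io
from lm_deluge.util.pdf import text_from_pdf

text = text_from_pdf("paper.pdf")          # file path
with open("paper.pdf", "rb") as f:
    raw = f.read()
text = text_from_pdf(raw)                  # raw bytes
text = text_from_pdf(io.BytesIO(raw))      # BytesIO object
```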
@@ -1,127 +0,0 @@
- Metadata-Version: 2.4
- Name: lm_deluge
- Version: 0.0.5
- Summary: Python utility for using LLM API models.
- Author-email: Benjamin Anderson <ben@trytaylor.ai>
- Requires-Python: >=3.10
- Description-Content-Type: text/markdown
- Requires-Dist: python-dotenv
- Requires-Dist: json5
- Requires-Dist: PyYAML
- Requires-Dist: pandas
- Requires-Dist: aiohttp
- Requires-Dist: tiktoken
- Requires-Dist: xxhash
- Requires-Dist: tqdm
- Requires-Dist: google-auth
- Requires-Dist: requests-aws4auth
- Requires-Dist: pydantic
- Requires-Dist: bs4
- Requires-Dist: lxml
- Provides-Extra: image
- Requires-Dist: pdf2image; extra == "image"
- Requires-Dist: pillow; extra == "image"
- Provides-Extra: pdf
- Requires-Dist: pdf2image; extra == "pdf"
- Requires-Dist: pymupdf; extra == "pdf"
- Provides-Extra: translate
- Requires-Dist: fasttext-wheel; extra == "translate"
- Requires-Dist: fasttext-langdetect; extra == "translate"
- Provides-Extra: full
- Requires-Dist: pillow; extra == "full"
- Requires-Dist: pdf2image; extra == "full"
- Requires-Dist: pymupdf; extra == "full"
- Requires-Dist: fasttext-wheel; extra == "full"
- Requires-Dist: fasttext-langdetect; extra == "full"
-
- # lm_deluge
-
- `lm_deluge` is a lightweight helper library for talking to large language model APIs. It wraps several providers under a single interface, handles rate limiting, and exposes a few useful utilities for common NLP tasks.
-
- ## Features
-
- - **Unified client** – send prompts to OpenAI‑compatible models, Anthropic, Cohere, and Vertex-hosted Claude models using the same API.
- - **Async or sync** – process prompts concurrently with `process_prompts_async` or run them synchronously with `process_prompts_sync`.
- - **Spray across providers** – configure multiple model names with weighting so requests are distributed across different providers.
- - **Caching** – optional LevelDB, SQLite, or custom caches to avoid duplicate calls.
- - **Embeddings and reranking** – helper functions for embedding text and reranking documents via Cohere/OpenAI endpoints.
- - **Built‑in tools** – simple `extract`, `translate` and `score_llm` helpers for common patterns.
-
- ## Installation
-
- ```bash
- pip install lm_deluge
- ```
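
The extras declared in the package metadata turn on the optional PDF, image, and translation helpers, for example:

```bash
pip install "lm_deluge[pdf]"    # adds pymupdf + pdf2image for PDF text extraction
pip install "lm_deluge[full]"   # all optional dependencies
```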
-
- The package relies on environment variables for API keys. Typical variables include `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `COHERE_API_KEY`, `META_API_KEY` (for Llama), and `GOOGLE_APPLICATION_CREDENTIALS` for Vertex.
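
For example, with placeholder values:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export PROJECT_ID="my-gcp-project"  # required alongside the credentials for Vertex models
```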
-
- ## Quickstart
-
- ```python
- from lm_deluge import LLMClient
-
- client = LLMClient.basic(
-     model=["gpt-4o-mini"],  # any model id from lm_deluge.models.registry
-     temperature=0.2,
-     max_new_tokens=256,
- )
-
- resp = client.process_prompts_sync(["Hello, world!"])  # returns list[APIResponse]
- print(resp[0].completion)
- ```
-
- ### Asynchronous usage
-
- ```python
- import asyncio
-
- async def main():
-     responses = await client.process_prompts_async(
-         ["an async call"],
-         return_completions_only=True,
-     )
-     print(responses[0])
-
- asyncio.run(main())
- ```
-
- ### Distributing requests across models
-
- You can provide multiple `model_names` and optional `model_weights` when creating an `LLMClient`. Each prompt is sent to one of the models according to those weights.
-
- ```python
- client = LLMClient(
-     model_names=["gpt-4o-mini", "claude-haiku-anthropic"],
-     model_weights="rate_limit",  # or a list like [0.7, 0.3]
-     max_requests_per_minute=5000,
-     max_tokens_per_minute=1_000_000,
-     max_concurrent_requests=100,
- )
- ```
-
- ### Provider-specific notes
-
- - **OpenAI and compatible providers** – set `OPENAI_API_KEY`. Model ids in the registry include OpenAI models as well as Meta Llama, Grok, and many others that expose OpenAI-style APIs.
- - **Anthropic** – set `ANTHROPIC_API_KEY`. Use model ids such as `claude-haiku-anthropic` or `claude-sonnet-anthropic`.
- - **Cohere** – set `COHERE_API_KEY`. Models like `command-r` are available.
- - **Vertex Claude** – set `GOOGLE_APPLICATION_CREDENTIALS` and `PROJECT_ID`. Use a model id such as `claude-sonnet-vertex`.
-
- The [models.py](src/lm_deluge/models.py) file lists every supported model and the required environment variable.
-
- ## Built‑in tools
-
- The `lm_deluge.llm_tools` package exposes a few helper functions (a usage sketch follows below):
-
- - `extract` – structure text or images into a Pydantic model based on a schema.
- - `translate` – translate a list of strings to English if needed.
- - `score_llm` – simple yes/no style scoring with optional log-probability output.
-
- Embeddings (`embed.embed_parallel_async`) and document reranking (`rerank.rerank_parallel_async`) are also provided.
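
A rough sketch of what calling these helpers might look like; the exact signatures live in `lm_deluge.llm_tools`, and the call shapes below are assumptions inferred from the descriptions above:

```python
from lm_deluge.llm_tools import translate, score_llm  # assumed import path

# hypothetical call shapes, not verified against the source
english = translate(["Bonjour le monde"])                     # list in, English list out
verdict = score_llm("Is this review positive?", "Loved it!")  # yes/no style score
```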
-
- ## Caching results
-
- `lm_deluge.cache` includes LevelDB, SQLite, and custom dictionary-based caches. Pass an instance via `LLMClient(..., cache=my_cache)` and previously seen prompts will not be re‑sent.
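
A minimal sketch of wiring in a cache; `SqliteCache` and its constructor are assumptions here, so check `lm_deluge.cache` for the real class names:

```python
from lm_deluge import LLMClient
from lm_deluge.cache import SqliteCache  # hypothetical name

client = LLMClient(
    model_names=["gpt-4o-mini"],
    cache=SqliteCache("responses.db"),  # assumed constructor
)

# the second call should be served from the cache rather than re-hitting the API
client.process_prompts_sync(["What is 2 + 2?"])
client.process_prompts_sync(["What is 2 + 2?"])
```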
-
- ## Development notes
-
- Models and costs are defined in [src/lm_deluge/models.py](src/lm_deluge/models.py). Conversations are built using the `Conversation` and `Message` helpers in [src/lm_deluge/prompt.py](src/lm_deluge/prompt.py), which also support images.