PyPI - ai-microcore - Versions diffs - 5.0.0.dev5__tar.gz → 5.0.0.dev7__tar.gz - Mend

ai-microcore 5.0.0.dev5__tar.gz → 5.0.0.dev7__tar.gz


Files changed (48)

{ai_microcore-5.0.0.dev5 → ai_microcore-5.0.0.dev7}/PKG-INFO RENAMED

@@ -1,13 +1,14 @@
 Metadata-Version: 2.4
 Name: ai-microcore
-Version: 5.0.0.dev5
+Version: 5.0.0.dev7
 Summary: # Minimalistic Foundation for AI Applications
-Keywords: llm,large language models,ai,similarity search,ai search,gpt,openai,framework,adapter
+Keywords: llm,large language models,ai,similarity search,ai search,gpt,openai,framework,adapter,anthropic,google gemini,google vertex ai
 Author-email: Vitalii Stepanenko <mail@vitaliy.in>
 Maintainer-email: Vitalii Stepanenko <mail@vitaliy.in>
 Requires-Python: >=3.10
 Description-Content-Type: text/markdown
 Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: 3.12
 Classifier: Programming Language :: Python :: 3.13
@@ -29,6 +30,7 @@ Requires-Dist: mcp>=1.10.1,<2.0
 Requires-Dist: fastmcp>=2.10.2,<3.0
 Requires-Dist: docstring_parser~=0.16.0
 Requires-Dist: httpx~=0.28.1
+Project-URL: Bug Tracker, https://github.com/Nayjest/ai-microcore/issues
 Project-URL: Source Code, https://github.com/Nayjest/ai-microcore
 # AI MicroCore: A Minimalistic Foundation for AI Applications
@@ -53,7 +55,7 @@ It defines interfaces for features typically used in AI applications,
 which allows you to keep your application as simple as possible and try various models & services
 without need to change your application code.
-You even can switch between text completion and chat completion models only using configuration.
+You can even switch between text completion and chat completion models only using configuration.
 Thanks to LLM-agnostic MCP integration,
 **MicroCore** connects MCP tools to any language models easily,
@@ -105,7 +107,7 @@ Similarity search features will work out of the box if you have the `chromadb` p
 There are a few options available for configuring microcore:
 -   Use `microcore.configure(**params)`
-    <br>💡 <small>All configuration options should be available in IDE autocompletion tooltips</small>
+    <br>💡 <small>All configuration options appear in IDE autocompletion tooltips</small>
 -   Create a `.env` file in your project root; examples: [basic.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.example), [Mistral Large.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.mistral.example), [Anthropic Claude 3 Opus.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.anthropic.example), [Gemini on Vertex AI.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.google-vertex-gemini.example), [Gemini on AI Studio.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.gemini.example)
 -   Use a custom configuration file: `mc.configure(DOT_ENV_FILE='dev-config.ini')`
 -   Define OS environment variables
@@ -113,7 +115,7 @@ There are a few options available for configuring microcore:
 For the full list of available configuration options, you may also check [`microcore/config.py`](https://github.com/Nayjest/ai-microcore/blob/main/microcore/configuration.py#L175).
 ### Installing vendor-specific packages
-For the models working not via OpenAI API, you may need to install additional packages:
+For models working not via OpenAI API, you may need to install additional packages:
 #### Anthropic Claude 3
 ```bash
 pip install anthropic
@@ -132,7 +134,8 @@ and [configure the authorization](https://cloud.google.com/sdk/docs/authorizing)
 #### Local language models via Hugging Face Transformers
-You will need to install transformers and deep learning library of your choice (PyTorch, TensorFlow, Flax, etc).
+You will need to install transformers and a deep learning library of your choice
+(PyTorch, TensorFlow, Flax, etc).
 See [transformers installation](https://huggingface.co/docs/transformers/installation).
@@ -148,13 +151,13 @@ See [transformers installation](https://huggingface.co/docs/transformers/install
 Vector database functions are available via `microcore.texts`.
 #### ChromaDB
-Default vector database is [Chroma](https://www.trychroma.com/).
+The default vector database is [Chroma](https://www.trychroma.com/).
 In order to use vector database functions with ChromaDB, you need to install the `chromadb` package:
 ```bash
 pip install chromadb
 ```
-By default, MicroCore will use ChromaDB PersistentClient (if corresponding package is installed).
-Alternatively, you can run Chroma as separate service and configure MicroCore to use HttpClient:
+By default, MicroCore will use ChromaDB PersistentClient (if the corresponding package is installed).
+Alternatively, you can run Chroma as a separate service and configure MicroCore to use HttpClient:
 ```python
 from microcore import configure
@@ -177,7 +180,7 @@ configure(
     EMBEDDING_DB_TYPE=EmbeddingDbType.QDRANT,
     EMBEDDING_DB_HOST="localhost",
     EMBEDDING_DB_PORT="6333",
-    EMBEDDING_DB_SIZE=384,  # dimensions quantity in used SentenceTransformer model
+    EMBEDDING_DB_SIZE=384,  # number of dimensions in the SentenceTransformer model
     EMBEDDING_DB_FUNCTION=SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2"),
 )
 ```
@@ -200,7 +203,7 @@ use_logging()
 # Basic usage
 ai_response = llm('What is your model name?')
-# You also may pass a list of strings as prompt
+# You may also pass a list of strings as prompt
 # - For chat completion models elements are treated as separate messages
 # - For completion LLMs elements are treated as text lines
 llm(['1+2', '='])
@@ -293,7 +296,7 @@ LLM Microcore supports all models & API providers having OpenAI API.
 ## 🖼️ Examples
 #### [Code review tool](https://github.com/llm-microcore/microcore/blob/main/examples/code-review-tool)
-Performs code review by LLM for changes in git .patch files in any programming languages.
+Performs a code review by LLM for changes in git .patch files in any programming languages.
 #### [Image analysis](https://colab.research.google.com/drive/1qTJ51wxCv3VlyqLt3M8OZ7183YXPFpic) (Google Colab)
 Determine the number of petals and the color of the flower from a photo (gpt-4-turbo)
@@ -315,7 +318,7 @@ Text generation using HF/Transformers model locally (example with Qwen 3 0.6B).
 @TODO
 ## 🤖 AI Modules
-**This is experimental feature.**
+**This is an experimental feature.**
 Tweaks the Python import system to provide automatic setup of MicroCore environment
 based on metadata in module docstrings.

{ai_microcore-5.0.0.dev5 → ai_microcore-5.0.0.dev7}/README.md RENAMED

@@ -20,7 +20,7 @@ It defines interfaces for features typically used in AI applications,
 which allows you to keep your application as simple as possible and try various models & services
 without need to change your application code.
-You even can switch between text completion and chat completion models only using configuration.
+You can even switch between text completion and chat completion models only using configuration.
 Thanks to LLM-agnostic MCP integration,
 **MicroCore** connects MCP tools to any language models easily,
@@ -72,7 +72,7 @@ Similarity search features will work out of the box if you have the `chromadb` p
 There are a few options available for configuring microcore:
 -   Use `microcore.configure(**params)`
-    <br>💡 <small>All configuration options should be available in IDE autocompletion tooltips</small>
+    <br>💡 <small>All configuration options appear in IDE autocompletion tooltips</small>
 -   Create a `.env` file in your project root; examples: [basic.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.example), [Mistral Large.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.mistral.example), [Anthropic Claude 3 Opus.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.anthropic.example), [Gemini on Vertex AI.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.google-vertex-gemini.example), [Gemini on AI Studio.env](https://github.com/Nayjest/ai-microcore/blob/main/.env.gemini.example)
 -   Use a custom configuration file: `mc.configure(DOT_ENV_FILE='dev-config.ini')`
 -   Define OS environment variables
@@ -80,7 +80,7 @@ There are a few options available for configuring microcore:
 For the full list of available configuration options, you may also check [`microcore/config.py`](https://github.com/Nayjest/ai-microcore/blob/main/microcore/configuration.py#L175).
 ### Installing vendor-specific packages
-For the models working not via OpenAI API, you may need to install additional packages:
+For models working not via OpenAI API, you may need to install additional packages:
 #### Anthropic Claude 3
 ```bash
 pip install anthropic
@@ -99,7 +99,8 @@ and [configure the authorization](https://cloud.google.com/sdk/docs/authorizing)
 #### Local language models via Hugging Face Transformers
-You will need to install transformers and deep learning library of your choice (PyTorch, TensorFlow, Flax, etc).
+You will need to install transformers and a deep learning library of your choice
+(PyTorch, TensorFlow, Flax, etc).
 See [transformers installation](https://huggingface.co/docs/transformers/installation).
@@ -115,13 +116,13 @@ See [transformers installation](https://huggingface.co/docs/transformers/install
 Vector database functions are available via `microcore.texts`.
 #### ChromaDB
-Default vector database is [Chroma](https://www.trychroma.com/).
+The default vector database is [Chroma](https://www.trychroma.com/).
 In order to use vector database functions with ChromaDB, you need to install the `chromadb` package:
 ```bash
 pip install chromadb
 ```
-By default, MicroCore will use ChromaDB PersistentClient (if corresponding package is installed).
-Alternatively, you can run Chroma as separate service and configure MicroCore to use HttpClient:
+By default, MicroCore will use ChromaDB PersistentClient (if the corresponding package is installed).
+Alternatively, you can run Chroma as a separate service and configure MicroCore to use HttpClient:
 ```python
 from microcore import configure
@@ -144,7 +145,7 @@ configure(
     EMBEDDING_DB_TYPE=EmbeddingDbType.QDRANT,
     EMBEDDING_DB_HOST="localhost",
     EMBEDDING_DB_PORT="6333",
-    EMBEDDING_DB_SIZE=384,  # dimensions quantity in used SentenceTransformer model
+    EMBEDDING_DB_SIZE=384,  # number of dimensions in the SentenceTransformer model
     EMBEDDING_DB_FUNCTION=SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2"),
 )
 ```
@@ -167,7 +168,7 @@ use_logging()
 # Basic usage
 ai_response = llm('What is your model name?')
-# You also may pass a list of strings as prompt
+# You may also pass a list of strings as prompt
 # - For chat completion models elements are treated as separate messages
 # - For completion LLMs elements are treated as text lines
 llm(['1+2', '='])
@@ -260,7 +261,7 @@ LLM Microcore supports all models & API providers having OpenAI API.
 ## 🖼️ Examples
 #### [Code review tool](https://github.com/llm-microcore/microcore/blob/main/examples/code-review-tool)
-Performs code review by LLM for changes in git .patch files in any programming languages.
+Performs a code review by LLM for changes in git .patch files in any programming languages.
 #### [Image analysis](https://colab.research.google.com/drive/1qTJ51wxCv3VlyqLt3M8OZ7183YXPFpic) (Google Colab)
 Determine the number of petals and the color of the flower from a photo (gpt-4-turbo)
@@ -282,7 +283,7 @@ Text generation using HF/Transformers model locally (example with Qwen 3 0.6B).
 @TODO
 ## 🤖 AI Modules
-**This is experimental feature.**
+**This is an experimental feature.**
 Tweaks the Python import system to provide automatic setup of MicroCore environment
 based on metadata in module docstrings.

{ai_microcore-5.0.0.dev5 → ai_microcore-5.0.0.dev7}/microcore/__init__.py RENAMED

@@ -19,7 +19,6 @@ from ._env import configure, env, config, min_setup
 from .logging import use_logging
 from .message_types import UserMsg, AssistantMsg, SysMsg, Msg, PartialMsg
 from .configuration import (
-    ApiType,
     LLMApiBaseError,
     LLMApiDeploymentIdError,
     LLMApiKeyError,
@@ -29,6 +28,7 @@ from .configuration import (
     EmbeddingDbType,
     PRINT_STREAM,
 )
+from .llm_backends import ApiPlatform, ApiType
 from .types import BadAIJsonAnswer, BadAIAnswer, LLMContextLengthExceededError
 from .wrappers.prompt_wrapper import PromptWrapper
 from .wrappers.llm_response_wrapper import LLMResponse
@@ -194,6 +194,7 @@ __all__ = [
     "AssistantMsg",
     "PartialMsg",
     "ApiType",
+    "ApiPlatform",
     "EmbeddingDbType",
     "BadAIJsonAnswer",
     "PRINT_STREAM",
@@ -230,4 +231,4 @@ __all__ = [
     # "wrappers",
 ]
-__version__ = "5.0.0dev5"
+__version__ = "5.0.0.dev7"

{ai_microcore-5.0.0.dev5 → ai_microcore-5.0.0.dev7}/microcore/_env.py RENAMED

@@ -11,11 +11,11 @@ import jinja2
 from .embedding_db import AbstractEmbeddingDB
 from .configuration import (
     Config,
-    ApiType,
     LLMConfigError,
     EmbeddingDbType,
     PRINT_STREAM,
 )
+from .llm_backends import ApiType
 from .presets import MIN_SETUP
 from .lm_client import BaseAIClient
 from .types import TplFunctionType, LLMAsyncFunctionType, LLMFunctionType
@@ -126,23 +126,11 @@ class Env:
             self.llm_function, self.llm_async_function = make_anthropic_llm_functions(
                 self.config
             )
-        elif self.config.LLM_API_TYPE == ApiType.GOOGLE_VERTEX_AI:
-            try:
-                from .llm.google_vertex_ai import (
-                    make_llm_functions as make_google_vertex_llm_functions,
-                )
-            except ModuleNotFoundError as e:
-                raise ModuleNotFoundError(
-                    "To use the Google Vertex language models, "
-                    "you need to install the `vertexai` package "
-                    "and authenticate with Google Cloud cli."
-                    "Run `pip install vertexai`."
-                ) from e
-            (
-                self.llm_function,
-                self.llm_async_function,
-            ) = make_google_vertex_llm_functions(self.config)
-        elif self.config.LLM_API_TYPE in (ApiType.GOOGLE, ApiType.GOOGLE_AI_STUDIO):
+        elif self.config.LLM_API_TYPE in (
+            ApiType.GOOGLE,
+            ApiType.GOOGLE_AI_STUDIO,  # @deprecated
+            ApiType.GOOGLE_VERTEX_AI  # @deprecated
+        ):
             try:
                 from .llm.google_genai import GoogleClient
             except ModuleNotFoundError as e:

{ai_microcore-5.0.0.dev5 → ai_microcore-5.0.0.dev7}/microcore/_llm_functions.py RENAMED

@@ -3,20 +3,29 @@ import logging
 from datetime import datetime
 from typing import Any
 from .utils import run_parallel, RETURN_EXCEPTION
-from .wrappers.llm_response_wrapper import LLMResponse, DictFromLLMResponse, ImageGenerationResponse
-from .types import TPrompt, LLMContextLengthExceededError
+from .wrappers.llm_response_wrapper import (
+    LLMResponse,
+    DictFromLLMResponse,
+    ImageGenerationResponse,
+)
+from .types import (
+    TPrompt,
+    LLMContextLengthExceededError,
+    LLMQuotaExceededError,
+    LLMAuthError,
+)
 from .file_cache import (
     cache_hit,
     load_cache,
     save_cache,
     build_cache_name,
-    delete_cache
+    delete_cache,
 )
 from ._env import env
+# pylint: disable=too-many-return-statements,too-many-branches
 def convert_exception(e: Exception, model: str = None) -> Exception | None:
     """
     Convert LLM exceptions microcore-specific exceptions if possible.
@@ -26,46 +35,142 @@ def convert_exception(e: Exception, model: str = None) -> Exception | None:
     Returns:
         Converted exception or None if no conversion is possible
     """
+    def with_cause(new_exception: Exception) -> Exception:
+        """
+        Attach a cause to an exception without raising it.
+        Equivalent to `raise new_exc from cause` but returns the exception
+        instead of raising, preserving the exception chain for later use.
+        """
+        new_exception.__cause__ = e
+        return new_exception
     if not isinstance(e, Exception):
         return None
     t, msg = f"{type(e).__module__}.{type(e).__name__}", str(e)
     max_tokens, actual_tokens = None, None
-    if t == "openai.BadRequestError" and "context_length_exceeded" in msg:
-        match = re.search(
-            r"maximum context length is (\d+) tokens.*?resulted in (\d+) tokens",
-            msg
-        )
-        if match:
-            max_tokens = int(match.group(1))
-            actual_tokens = int(match.group(2))
-        return LLMContextLengthExceededError(
-            actual_tokens=actual_tokens,
-            max_tokens=max_tokens,
-            model=model
-        )
+    if t == "openai.BadRequestError":
+        if "context_length_exceeded" in msg:
+            match = re.search(
+                r"maximum context length is (\d+) tokens.*?resulted in (\d+) tokens",
+                msg,
+            )
+            if match:
+                max_tokens = int(match.group(1))
+                actual_tokens = int(match.group(2))
+            return with_cause(
+                LLMContextLengthExceededError(
+                    actual_tokens=actual_tokens, max_tokens=max_tokens, model=model
+                )
+            )
+        if (
+            "Please reduce the length of the messages or completion." in msg
+        ):  # Groq, no details
+            return with_cause(LLMContextLengthExceededError(model=model))
+        # x.ai grok-fast
+        if (
+            "This model's maximum prompt length is" in msg
+            and "but the request contains" in msg
+            and "tokens" in msg
+        ):
+            match = re.search(
+                r"maximum prompt length is (\d+) but the request contains (\d+) tokens",
+                msg,
+            )
+            if match:
+                max_tokens = int(match.group(1))
+                actual_tokens = int(match.group(2))
+            return with_cause(
+                LLMContextLengthExceededError(
+                    actual_tokens=actual_tokens, max_tokens=max_tokens, model=model
+                )
+            )
+        if "maximum context length" in msg:  # Mistral, # DeepSeek
+            if match := re.search(
+                r"Prompt contains (\d+) tokens.*?model with (\d+) maximum context length",
+                msg,
+            ):  # Mistral
+                max_tokens = int(match.group(2))
+                actual_tokens = int(match.group(1))
+            elif match := re.search(
+                r"maximum context length is (\d+) tokens.*? you requested (\d+) tokens",
+                msg,
+            ):  # DeepSeek
+                max_tokens = int(match.group(1))
+                actual_tokens = int(match.group(2))
+            return with_cause(
+                LLMContextLengthExceededError(
+                    actual_tokens=actual_tokens, max_tokens=max_tokens, model=model
+                )
+            )
+        if "too_many_prompt_tokens" in msg:  # Perplexity
+            if match := re.search(r"User input tokens exceeds (\d+) tokens", msg):
+                max_tokens = int(match.group(1))
+            return with_cause(
+                LLMContextLengthExceededError(
+                    actual_tokens=actual_tokens, max_tokens=max_tokens, model=model
+                )
+            )
+    if (
+        t == "openai.APIStatusError" and "413 Request Entity Too Large" in msg
+    ):  # Cerebras
+        return with_cause(LLMContextLengthExceededError(model=model))
+    if t == "openai.APIStatusError" and "Payload Too Large" in msg:  # Fireworks
+        return with_cause(LLMContextLengthExceededError(model=model))
     if t == "anthropic.BadRequestError" and "prompt is too long:" in msg:
         if match := re.search(r"(\d+)\s+tokens\s+>\s+(\d+)\s+maximum", msg):
             max_tokens = int(match.group(2))
             actual_tokens = int(match.group(1))
-        return LLMContextLengthExceededError(
-            actual_tokens=actual_tokens,
-            max_tokens=max_tokens,
-            model=model
+        return with_cause(
+            LLMContextLengthExceededError(
+                actual_tokens=actual_tokens, max_tokens=max_tokens, model=model
+            )
         )
-    if (
-        t == "google.api_core.exceptions.InvalidArgument"
-        and "The input token count exceeds the maximum number of tokens allowed" in msg
-    ):
-        if match := re.search(
-            r"The input token count exceeds the maximum number of tokens allowed (\d+)",
-            msg
+    if t == "google.genai.errors.ClientError":
+        if "429" in msg and "RESOURCE_EXHAUSTED" in msg:
+            return with_cause(LLMQuotaExceededError(details=msg))
+        if (
+            "input token count" in msg
+            and "exceeds the maximum number of tokens allowed" in msg
         ):
-            max_tokens = int(match.group(1))
-        return LLMContextLengthExceededError(
-            actual_tokens=actual_tokens,
-            max_tokens=max_tokens,
-            model=model
-        )
+            # ai studio
+            if match := re.search(
+                r"input token count exceeds the maximum number of tokens allowed (\d+)",
+                msg,
+            ):
+                max_tokens = int(match.group(1))
+            # vertex
+            elif match := re.search(
+                r"input token count \((\d+)\) "
+                r"exceeds the maximum number of tokens allowed \((\d+)\)",
+                msg,
+            ):
+                actual_tokens = int(match.group(1))
+                max_tokens = int(match.group(2))
+            return with_cause(
+                LLMContextLengthExceededError(
+                    actual_tokens=actual_tokens, max_tokens=max_tokens, model=model
+                )
+            )
+    if t in (
+        "openai.AuthenticationError",
+        "anthropic.AuthenticationError",
+        "google.auth.exceptions.MalformedError",  # Vertex AI, wrong service acc. json
+    ):
+        return with_cause(LLMAuthError(msg))
+    if t == "google.genai.errors.ClientError":
+        if "API_KEY_INVALID" in msg:
+            return with_cause(LLMAuthError(msg))
+        if "PERMISSION_DENIED" in msg:  # invalid project in service account json
+            return with_cause(LLMAuthError(msg))
     return None
@@ -74,7 +179,7 @@ def llm(
     retries: int = 0,
     parse_json: bool | dict = False,
     file_cache: bool | str = False,
-    **kwargs
+    **kwargs,
 ) -> str | LLMResponse | ImageGenerationResponse:
     """
     Request Large Language Model synchronously
@@ -123,12 +228,13 @@ def llm(
     [h(prompt, **kwargs) for h in env().llm_before_handlers]
     start = datetime.now()
-    if (file_cache and cache_hit(
+    if file_cache and cache_hit(
         cache_name := build_cache_name(
-            prompt, kwargs,
-            prefix=file_cache if isinstance(file_cache, str) else "llm_requests"
+            prompt,
+            kwargs,
+            prefix=file_cache if isinstance(file_cache, str) else "llm_requests",
         )
-    )):
+    ):
         response: LLMResponse = load_cache(cache_name)
         response.from_file_cache = True
         tries = 0
@@ -142,7 +248,9 @@ def llm(
             except Exception as e:  # pylint: disable=W0718
                 converted_exception = convert_exception(e)
                 # If context length exceeded, or no tries left --> do not retry
-                if tries == 0 or isinstance(converted_exception, LLMContextLengthExceededError):
+                if tries == 0 or isinstance(
+                    converted_exception, (LLMContextLengthExceededError, LLMAuthError)
+                ):
                     if converted_exception:
                         raise converted_exception from e
                     raise e
@@ -161,11 +269,7 @@ def llm(
     if tries > 0:
         retry_params = dict(**kwargs)
         retry_params["retries"] = tries - 1
-        setattr(
-            response,
-            "_retry_callback",
-            lambda: llm(prompt, **retry_params)
-        )
+        setattr(response, "_retry_callback", lambda: llm(prompt, **retry_params))
     if parse_json:
         parsing_params = parse_json if isinstance(parse_json, dict) else {}
         return response.parse_json(**parsing_params)
@@ -177,7 +281,7 @@ async def allm(
     retries: int = 0,
     parse_json: bool | dict = False,
     file_cache: bool | str = False,
-    **kwargs
+    **kwargs,
 ) -> str | LLMResponse | DictFromLLMResponse | ImageGenerationResponse:
     """
     Request Large Language Model asynchronously
@@ -221,12 +325,13 @@ async def allm(
     [h(prompt, **kwargs) for h in env().llm_before_handlers]
     start = datetime.now()
-    if (file_cache and cache_hit(
+    if file_cache and cache_hit(
         cache_name := build_cache_name(
-            prompt, kwargs,
-            prefix=file_cache if isinstance(file_cache, str) else "llm_requests"
+            prompt,
+            kwargs,
+            prefix=file_cache if isinstance(file_cache, str) else "llm_requests",
         )
-    )):
+    ):
         response: LLMResponse = load_cache(cache_name)
         response.from_file_cache = True
         tries = 0
@@ -240,7 +345,9 @@ async def allm(
             except Exception as e:  # pylint: disable=W0718
                 converted_exception = convert_exception(e)
                 # If context length exceeded, or no tries left --> do not retry
-                if tries == 0 or isinstance(converted_exception, LLMContextLengthExceededError):
+                if tries == 0 or isinstance(
+                    converted_exception, (LLMContextLengthExceededError, LLMAuthError)
+                ):
                     if converted_exception:
                         raise converted_exception from e
                     raise e
@@ -266,7 +373,9 @@ async def allm(
                 logging.info(f"Retrying... {tries} retries left")
                 if file_cache:
                     delete_cache(cache_name)
-                return await allm(prompt, retries=tries - 1, parse_json=parse_json, **kwargs)
+                return await allm(
+                    prompt, retries=tries - 1, parse_json=parse_json, **kwargs
+                )
     return response
@@ -276,7 +385,7 @@ async def llm_parallel(
     allow_failures: bool = False,
     return_on_failure: Any = RETURN_EXCEPTION,
     log_errors: bool = True,
-    **kwargs
+    **kwargs,
 ) -> list[str | LLMResponse]:
     """
     Execute multiple LLM requests in parallel
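
The `_llm_functions.py` changes above do two related things: converted exceptions now carry the original provider error as their cause, and the retry loop stops early for errors that a retry cannot fix. The standalone sketch below illustrates that pattern in isolation. It is not the package's code: `ContextLengthExceeded`, `convert`, and `call_with_retries` are hypothetical names chosen for illustration, and only the OpenAI-style message pattern from the diff is handled.

```python
import re
from typing import Callable, Optional


class ContextLengthExceeded(Exception):
    """Hypothetical stand-in for a typed 'context too long' error."""

    def __init__(self, actual_tokens=None, max_tokens=None):
        super().__init__(f"{actual_tokens} tokens > {max_tokens} maximum")
        self.actual_tokens = actual_tokens
        self.max_tokens = max_tokens


def convert(e: Exception) -> Optional[Exception]:
    """Return a typed exception chained to `e`, or None if `e` is not recognized."""
    match = re.search(
        r"maximum context length is (\d+) tokens.*?resulted in (\d+) tokens",
        str(e),
    )
    if not match:
        return None
    converted = ContextLengthExceeded(
        actual_tokens=int(match.group(2)), max_tokens=int(match.group(1))
    )
    # Attach the original error as the cause without raising anything yet.
    converted.__cause__ = e
    return converted


def call_with_retries(fn: Callable[[], str], retries: int = 0) -> str:
    """Call `fn`, retrying on failure unless the error is non-retryable."""
    while True:
        try:
            return fn()
        except Exception as e:
            converted = convert(e)
            # An oversized prompt will fail again, so do not spend retries on it.
            if retries == 0 or isinstance(converted, ContextLengthExceeded):
                if converted:
                    raise converted from e
                raise
            retries -= 1
```

A caller can then catch the typed error and still inspect the provider's original exception through `__cause__`, which is the benefit of chaining at conversion time rather than only at the raise site.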
