not-again-ai 0.9.0__tar.gz → 0.10.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/PKG-INFO +21 -18
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/README.md +12 -11
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/pyproject.toml +12 -9
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/chat_completion.py +7 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/context_management.py +5 -3
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/tokens.py +36 -34
- not_again_ai-0.10.0/src/not_again_ai/local_llm/__init__.py +23 -0
- {not_again_ai-0.9.0/src/not_again_ai/llm → not_again_ai-0.10.0/src/not_again_ai/local_llm}/chat_completion.py +7 -3
- {not_again_ai-0.9.0/src/not_again_ai/llm → not_again_ai-0.10.0/src/not_again_ai/local_llm}/ollama/chat_completion.py +18 -13
- not_again_ai-0.10.0/src/not_again_ai/local_llm/ollama/model_mapping.py +15 -0
- not_again_ai-0.10.0/src/not_again_ai/local_llm/ollama/tokens.py +110 -0
- not_again_ai-0.10.0/src/not_again_ai/local_llm/prompts.py +38 -0
- not_again_ai-0.10.0/src/not_again_ai/local_llm/tokens.py +90 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/statistics/dependence.py +5 -5
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/barplots.py +2 -2
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/scatterplot.py +2 -2
- not_again_ai-0.9.0/src/not_again_ai/local_llm/huggingface/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/LICENSE +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/base/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/base/file_system.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/base/parallel.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/__init__.py +0 -0
- {not_again_ai-0.9.0/src/not_again_ai/llm/ollama → not_again_ai-0.10.0/src/not_again_ai/llm/openai_api}/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/embeddings.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/openai_client.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/prompts.py +0 -0
- {not_again_ai-0.9.0/src/not_again_ai/llm/openai_api → not_again_ai-0.10.0/src/not_again_ai/local_llm/huggingface}/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/local_llm/huggingface/chat_completion.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/local_llm/huggingface/helpers.py +0 -0
- {not_again_ai-0.9.0/src/not_again_ai/local_llm → not_again_ai-0.10.0/src/not_again_ai/local_llm/ollama}/__init__.py +0 -0
- {not_again_ai-0.9.0/src/not_again_ai/llm → not_again_ai-0.10.0/src/not_again_ai/local_llm}/ollama/ollama_client.py +0 -0
- {not_again_ai-0.9.0/src/not_again_ai/llm → not_again_ai-0.10.0/src/not_again_ai/local_llm}/ollama/service.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/py.typed +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/statistics/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/__init__.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/distributions.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/time_series.py +0 -0
- {not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/utils.py +0 -0
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: not-again-ai
-Version: 0.9.0
+Version: 0.10.0
 Summary: Designed to once and for all collect all the little things that come up over and over again in AI projects and put them in one place.
 Home-page: https://github.com/DaveCoDev/not-again-ai
 License: MIT
@@ -21,16 +21,18 @@ Provides-Extra: llm
 Provides-Extra: local-llm
 Provides-Extra: statistics
 Provides-Extra: viz
-Requires-Dist:
-Requires-Dist:
-Requires-Dist:
+Requires-Dist: jinja2 (==3.1.4) ; extra == "local-llm"
+Requires-Dist: loguru (==0.7.2)
+Requires-Dist: numpy (==2.0.0) ; extra == "statistics" or extra == "viz"
+Requires-Dist: ollama (==0.2.1) ; extra == "local-llm"
+Requires-Dist: openai (==1.35.3) ; extra == "llm"
 Requires-Dist: pandas (==2.2.2) ; extra == "viz"
 Requires-Dist: python-liquid (==1.12.1) ; extra == "llm"
-Requires-Dist: scikit-learn (==1.
-Requires-Dist: scipy (==1.13.
+Requires-Dist: scikit-learn (==1.5.0) ; extra == "statistics"
+Requires-Dist: scipy (==1.13.1) ; extra == "statistics"
 Requires-Dist: seaborn (==0.13.2) ; extra == "viz"
 Requires-Dist: tiktoken (==0.7.0) ; extra == "llm"
-Requires-Dist: transformers (==4.41.
+Requires-Dist: transformers (==4.41.2) ; extra == "local-llm"
 Project-URL: Documentation, https://github.com/DaveCoDev/not-again-ai
 Project-URL: Repository, https://github.com/DaveCoDev/not-again-ai
 Description-Content-Type: text/markdown
@@ -50,7 +52,7 @@ Description-Content-Type: text/markdown
 [ruff-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
 [mypy-badge]: https://www.mypy-lang.org/static/mypy_badge.svg
 
-**not-again-ai** is a collection of various building blocks that come up over and over again when developing AI products. The key goals of this package are to have simple,
+**not-again-ai** is a collection of various building blocks that come up over and over again when developing AI products. The key goals of this package are to have simple, yet flexible interfaces and to minimize dependencies. It is encouraged to also **a)** use this as a template for your own Python package. **b)** instead of installing the package, copy and paste functions into your own projects. We make this easier by limiting the number of dependencies and use an MIT license.
 
 **Documentation** available within individual **[notebooks](notebooks)**, docstrings within the source, or auto-generated at [DaveCoDev.github.io/not-again-ai/](https://DaveCoDev.github.io/not-again-ai/).
 
@@ -66,24 +68,25 @@ $ pip install not_again_ai[llm,local_llm,statistics,viz]
 
 Note that local LLM requires separate installations and will not work out of the box due to how hardware dependent it is. Be sure to check the [notebooks](notebooks/local_llm/) for more details.
 
-The package is split into subpackages, so you can install only the parts you need.
+The package is split into subpackages, so you can install only the parts you need.
 * **Base only**: `pip install not_again_ai`
 * **LLM**: `pip install not_again_ai[llm]`
   1. If you wish to use OpenAI
      1. Go to https://platform.openai.com/settings/profile?tab=api-keys to get your API key.
-     1. (
+     1. (Optional) Set the `OPENAI_API_KEY` and the `OPENAI_ORG_ID` environment variables.
+* **Local LLM**: `pip install not_again_ai[llm,llm_local]`
+  1. Some HuggingFace transformers tokenizers are gated behind access requests. If you wish to use these, you will need to request access from HuggingFace on the model card.
+  1. Then set the `HF_TOKEN` environment variable to your HuggingFace API token which can be found here: https://huggingface.co/settings/tokens
   1. If you wish to use Ollama:
-     1.
-     1. [Add Ollama as a startup service (recommended)](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended)
-     1.
+     1. Follow the instructions at https://github.com/ollama/ollama to install Ollama for your system.
+     1. (Optional) [Add Ollama as a startup service (recommended)](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended)
+     1. (Optional) To make the Ollama service accessible on your local network from a Linux server, add the following to the `/etc/systemd/system/ollama.service` file which will make Ollama available at `http://<local_address>:11434`:
     ```bash
    [Service]
    ...
    Environment="OLLAMA_HOST=0.0.0.0"
    ```
-
-* **Local LLM**: `pip install not_again_ai[llm_local]`
-  - Most of this package is hardware dependent so this only installs some generic dependencies. Be sure to check the [notebooks](notebooks/local_llm/) for more details on what is available and how to install it.
+  1. HuggingFace transformers and other requirements are hardware dependent so for providers other than Ollama, this only installs some generic dependencies. Check the [notebooks](notebooks/local_llm/) for more details on what is available and how to install it.
 * **Statistics**: `pip install not_again_ai[statistics]`
 * **Visualization**: `pip install not_again_ai[viz]`
 
@@ -302,9 +305,9 @@ Install the [Python extension](https://marketplace.visualstudio.com/items?itemNa
 
 Install the [Ruff extension](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) for VSCode.
 
-Default settings are configured in [`.vscode/settings.json`](./.vscode/settings.json)
+Default settings are configured in [`.vscode/settings.json`](./.vscode/settings.json) which will enable Ruff with consistent settings.
 
-# Documentation
+# Generating Documentation
 
 ## Generating a User Guide
 
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/README.md
@@ -13,7 +13,7 @@
 [ruff-badge]: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
 [mypy-badge]: https://www.mypy-lang.org/static/mypy_badge.svg
 
-**not-again-ai** is a collection of various building blocks that come up over and over again when developing AI products. The key goals of this package are to have simple,
+**not-again-ai** is a collection of various building blocks that come up over and over again when developing AI products. The key goals of this package are to have simple, yet flexible interfaces and to minimize dependencies. It is encouraged to also **a)** use this as a template for your own Python package. **b)** instead of installing the package, copy and paste functions into your own projects. We make this easier by limiting the number of dependencies and use an MIT license.
 
 **Documentation** available within individual **[notebooks](notebooks)**, docstrings within the source, or auto-generated at [DaveCoDev.github.io/not-again-ai/](https://DaveCoDev.github.io/not-again-ai/).
 
@@ -29,24 +29,25 @@ $ pip install not_again_ai[llm,local_llm,statistics,viz]
 
 Note that local LLM requires separate installations and will not work out of the box due to how hardware dependent it is. Be sure to check the [notebooks](notebooks/local_llm/) for more details.
 
-The package is split into subpackages, so you can install only the parts you need.
+The package is split into subpackages, so you can install only the parts you need.
 * **Base only**: `pip install not_again_ai`
 * **LLM**: `pip install not_again_ai[llm]`
   1. If you wish to use OpenAI
      1. Go to https://platform.openai.com/settings/profile?tab=api-keys to get your API key.
-     1. (
+     1. (Optional) Set the `OPENAI_API_KEY` and the `OPENAI_ORG_ID` environment variables.
+* **Local LLM**: `pip install not_again_ai[llm,llm_local]`
+  1. Some HuggingFace transformers tokenizers are gated behind access requests. If you wish to use these, you will need to request access from HuggingFace on the model card.
+  1. Then set the `HF_TOKEN` environment variable to your HuggingFace API token which can be found here: https://huggingface.co/settings/tokens
  1. If you wish to use Ollama:
-     1.
-     1. [Add Ollama as a startup service (recommended)](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended)
-     1.
+     1. Follow the instructions at https://github.com/ollama/ollama to install Ollama for your system.
+     1. (Optional) [Add Ollama as a startup service (recommended)](https://github.com/ollama/ollama/blob/main/docs/linux.md#adding-ollama-as-a-startup-service-recommended)
+     1. (Optional) To make the Ollama service accessible on your local network from a Linux server, add the following to the `/etc/systemd/system/ollama.service` file which will make Ollama available at `http://<local_address>:11434`:
    ```bash
    [Service]
    ...
    Environment="OLLAMA_HOST=0.0.0.0"
    ```
-
-* **Local LLM**: `pip install not_again_ai[llm_local]`
-  - Most of this package is hardware dependent so this only installs some generic dependencies. Be sure to check the [notebooks](notebooks/local_llm/) for more details on what is available and how to install it.
+  1. HuggingFace transformers and other requirements are hardware dependent so for providers other than Ollama, this only installs some generic dependencies. Check the [notebooks](notebooks/local_llm/) for more details on what is available and how to install it.
 * **Statistics**: `pip install not_again_ai[statistics]`
 * **Visualization**: `pip install not_again_ai[viz]`
 
@@ -265,9 +266,9 @@ Install the [Python extension](https://marketplace.visualstudio.com/items?itemNa
 
 Install the [Ruff extension](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) for VSCode.
 
-Default settings are configured in [`.vscode/settings.json`](./.vscode/settings.json)
+Default settings are configured in [`.vscode/settings.json`](./.vscode/settings.json) which will enable Ruff with consistent settings.
 
-# Documentation
+# Generating Documentation
 
 ## Generating a User Guide
 
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "not-again-ai"
-version = "0.9.0"
+version = "0.10.0"
 description = "Designed to once and for all collect all the little things that come up over and over again in AI projects and put them in one place."
 authors = ["DaveCoDev <dave.co.dev@gmail.com>"]
 license = "MIT"
@@ -26,21 +26,24 @@ classifiers = [
 # result in an old version being resolved/locked.
 python = "^3.11 || ^3.12"
 
+loguru = { version = "==0.7.2" }
+
 # Optional dependencies are defined here, and groupings are defined below.
-
-
-
+jinja2 = { version = "==3.1.4", optional = true }
+numpy = { version = "==2.0.0", optional = true }
+ollama = { version = "==0.2.1", optional = true }
+openai = { version = "==1.35.3", optional = true }
 pandas = { version = "==2.2.2", optional = true }
 python-liquid = { version = "==1.12.1", optional = true }
-scipy = { version = "==1.13.
-scikit-learn = { version = "==1.
+scipy = { version = "==1.13.1", optional = true }
+scikit-learn = { version = "==1.5.0", optional = true }
 seaborn = { version = "==0.13.2", optional = true }
 tiktoken = { version = "==0.7.0", optional = true }
-transformers = { version = "==4.41.
+transformers = { version = "==4.41.2", optional = true }
 
 [tool.poetry.extras]
-llm = ["
-local_llm = ["transformers"]
+llm = ["openai", "python-liquid", "tiktoken"]
+local_llm = ["jinja2", "ollama", "transformers"]
 statistics = ["numpy", "scikit-learn", "scipy"]
 viz = ["numpy", "pandas", "seaborn"]
 
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/chat_completion.py
RENAMED
@@ -1,5 +1,6 @@
 import contextlib
 import json
+import time
 from typing import Any
 
 from openai import OpenAI
@@ -71,6 +72,7 @@ def chat_completion(
                 NOTE: If n > 1 this is the sum of all completions.
             'prompt_tokens' (int): The number of tokens in the messages sent to the model.
             'system_fingerprint' (str, optional): If seed is set, a unique identifier for the model used to generate the response.
+            'response_duration' (float): The time, in seconds, taken to generate the response from the API.
     """
     response_format = {"type": "json_object"} if json_mode else None
 
@@ -100,7 +102,10 @@
     if logprobs[0] and logprobs[1] is not None:
         kwargs["top_logprobs"] = logprobs[1]
 
+    start_time = time.time()
     response = client.chat.completions.create(**kwargs)
+    end_time = time.time()
+    response_duration = end_time - start_time
 
     response_data: dict[str, Any] = {"choices": []}
     for response_choice in response.choices:
@@ -160,6 +165,8 @@ def chat_completion(
     if seed is not None and response.system_fingerprint is not None:
        response_data["system_fingerprint"] = response.system_fingerprint
 
+    response_data["response_duration"] = response_duration
+
    if len(response_data["choices"]) == 1:
        response_data.update(response_data["choices"][0])
        del response_data["choices"]
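For orientation, a minimal usage sketch of the new timing field. Only the returned keys are confirmed by this diff; the call signature `chat_completion(messages=..., model=..., client=...)` is an assumption.

```python
# Hypothetical usage sketch; parameter names are assumptions, returned keys come from the docstring above.
from openai import OpenAI

from not_again_ai.llm.openai_api.chat_completion import chat_completion

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Say hello."},
]
response = chat_completion(messages=messages, model="gpt-4o-2024-05-13", client=client)

# New in 0.10.0: wall-clock latency of the API call, measured with time.time().
print(response["response_duration"], response["prompt_tokens"])
```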
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/context_management.py
RENAMED
@@ -1,6 +1,6 @@
 import copy
 
-from not_again_ai.llm.openai_api.tokens import num_tokens_from_messages, truncate_str
+from not_again_ai.llm.openai_api.tokens import load_tokenizer, num_tokens_from_messages, truncate_str
 
 
 def _inject_variable(
@@ -39,6 +39,7 @@ def priority_truncation(
         token_limit: The maximum number of tokens allowed in the messages.
         model: The model to use for tokenization. Defaults to "gpt-3.5-turbo-0125".
     """
+    tokenizer = load_tokenizer(model)
 
     # Check if all variables in the priority list are in the variables dict.
     # If not, add the missing variables into priority in any order.
@@ -49,7 +50,8 @@
     messages_formatted = copy.deepcopy(messages_unformatted)
     for var in priority:
         # Count the current number of tokens in messages_formatted and compute a remaining token budget.
-
+        tokenizer = load_tokenizer(model)
+        num_tokens = num_tokens_from_messages(messages_formatted, tokenizer=tokenizer, model=model)
         remaining_tokens = token_limit - num_tokens
         if remaining_tokens <= 0:
             break
@@ -60,7 +62,7 @@
             num_var_occurrences += message["content"].count("{{" + var + "}}")
 
         # Truncate the variable to fit the remaining token budget taking into account the number of times it occurs in the messages.
-        truncated_var = truncate_str(variables[var], remaining_tokens // num_var_occurrences,
+        truncated_var = truncate_str(variables[var], remaining_tokens // num_var_occurrences, tokenizer=tokenizer)
 
         # Inject the variable text into messages_formatted.
         messages_formatted = _inject_variable(messages_formatted, var, truncated_var)
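A sketch of how `priority_truncation` might be called. The identifiers `messages_unformatted`, `variables`, `priority`, `token_limit`, and `model` all appear in this diff, but treating them as keyword parameters in this order is an assumption.

```python
# Hypothetical call; variables are rendered into {{...}} placeholders and truncated to fit the token budget.
from not_again_ai.llm.openai_api.context_management import priority_truncation

messages = [
    {"role": "system", "content": "Summarize the following document: {{document}}"},
    {"role": "user", "content": "Focus on this question: {{question}}"},
]
variables = {
    "document": "a very long document " * 2000,
    "question": "What changed in version 0.10.0?",
}
truncated_messages = priority_truncation(
    messages_unformatted=messages,
    variables=variables,
    priority=["question", "document"],
    token_limit=1000,
    model="gpt-3.5-turbo-0125",
)
```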
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/llm/openai_api/tokens.py
@@ -1,73 +1,75 @@
 import tiktoken
 
 
-def
-"""
+def load_tokenizer(model: str) -> tiktoken.Encoding:
+    """Load the tokenizer for the given model
 
     Args:
-
-        max_len: The maximum number of tokens to keep.
-        model: The model to use for tokenization. Defaults to "gpt-3.5-turbo-0125".
-            See https://platform.openai.com/docs/models for a list of OpenAI models.
+        model (str): The name of the language model to load the tokenizer for
 
     Returns:
-
+        A tiktoken encoding object
     """
     try:
         encoding = tiktoken.encoding_for_model(model)
     except KeyError:
         print("Warning: model not found. Using cl100k_base encoding.")
         encoding = tiktoken.get_encoding("cl100k_base")
+    return encoding
+
+
+def truncate_str(text: str, max_len: int, tokenizer: tiktoken.Encoding) -> str:
+    """Truncates a string to a maximum token length.
 
-
+    Args:
+        text (str): The string to truncate.
+        max_len (int): The maximum number of tokens to keep.
+        tokenizer (tiktoken.Encoding): A tiktoken encoding object
+
+    Returns:
+        str: The truncated string.
+    """
+    tokens = tokenizer.encode(text)
     if len(tokens) > max_len:
         tokens = tokens[:max_len]
         # Decode the tokens back to a string
-        truncated_text =
+        truncated_text = tokenizer.decode(tokens)
         return truncated_text
     else:
         return text
 
 
-def num_tokens_in_string(text: str,
+def num_tokens_in_string(text: str, tokenizer: tiktoken.Encoding) -> int:
     """Return the number of tokens in a string.
 
     Args:
-        text: The string to count the tokens.
-
-            See https://platform.openai.com/docs/models for a list of OpenAI models.
+        text (str): The string to count the tokens.
+        tokenizer (tiktoken.Encoding): A tiktoken encoding object
 
     Returns:
-        The number of tokens in the string.
+        int: The number of tokens in the string.
     """
-
-        encoding = tiktoken.encoding_for_model(model)
-    except KeyError:
-        print("Warning: model not found. Using cl100k_base encoding.")
-        encoding = tiktoken.get_encoding("cl100k_base")
-    return len(encoding.encode(text))
+    return len(tokenizer.encode(text))
 
 
-def num_tokens_from_messages(
+def num_tokens_from_messages(
+    messages: list[dict[str, str]], tokenizer: tiktoken.Encoding, model: str = "gpt-3.5-turbo-0125"
+) -> int:
     """Return the number of tokens used by a list of messages.
-    NOTE: Does not support counting tokens used by function calling.
+    NOTE: Does not support counting tokens used by function calling or prompts with images.
     Reference: # https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb
         and https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
 
     Args:
-        messages: A list of messages to count the tokens
+        messages (list[dict[str, str]]): A list of messages to count the tokens
             should ideally be the result after calling llm.prompts.chat_prompt.
-
+        tokenizer (tiktoken.Encoding): A tiktoken encoding object
+        model (str): The model to use for tokenization. Defaults to "gpt-3.5-turbo-0125".
             See https://platform.openai.com/docs/models for a list of OpenAI models.
 
     Returns:
-        The number of tokens used by the messages.
+        int: The number of tokens used by the messages.
     """
-    try:
-        encoding = tiktoken.encoding_for_model(model)
-    except KeyError:
-        print("Warning: model not found. Using cl100k_base encoding.")
-        encoding = tiktoken.get_encoding("cl100k_base")
     if model in {
         "gpt-3.5-turbo-0613",
         "gpt-3.5-turbo-16k-0613",
@@ -92,11 +94,11 @@ def num_tokens_from_messages(messages: list[dict[str, str]], model: str = "gpt-3
         tokens_per_name = -1
     # Approximate catch-all. Assumes future versions of 3.5 and 4 will have the same token counts as the 0613 versions.
     elif "gpt-3.5-turbo" in model:
-        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
+        return num_tokens_from_messages(messages, tokenizer=tokenizer, model="gpt-3.5-turbo-0613")
     elif "gpt-4o" in model:
-        return num_tokens_from_messages(messages, model="gpt-4o-2024-05-13")
+        return num_tokens_from_messages(messages, tokenizer=tokenizer, model="gpt-4o-2024-05-13")
     elif "gpt-4" in model:
-        return num_tokens_from_messages(messages, model="gpt-4-0613")
+        return num_tokens_from_messages(messages, tokenizer=tokenizer, model="gpt-4-0613")
     else:
         raise NotImplementedError(
             f"""num_tokens_from_messages() is not implemented for model {model}.
@@ -106,7 +108,7 @@ See https://github.com/openai/openai-python/blob/main/chatml.md for information
     for message in messages:
         num_tokens += tokens_per_message
         for key, value in message.items():
-            num_tokens += len(
+            num_tokens += len(tokenizer.encode(value))
             if key == "name":
                 num_tokens += tokens_per_name
     num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
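The tokenizer is now loaded once and passed explicitly to the counting and truncation helpers. A minimal sketch based on the signatures shown above:

```python
# Usage sketch of the reworked token helpers (signatures taken from this diff).
from not_again_ai.llm.openai_api.tokens import (
    load_tokenizer,
    num_tokens_from_messages,
    num_tokens_in_string,
    truncate_str,
)

tokenizer = load_tokenizer("gpt-4o-2024-05-13")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Count my tokens, please."},
]
# Counting helpers now take the preloaded encoding instead of loading it on every call.
print(num_tokens_from_messages(messages, tokenizer=tokenizer, model="gpt-4o-2024-05-13"))
print(num_tokens_in_string("Count my tokens, please.", tokenizer))
print(truncate_str("A string that may be shortened to three tokens.", 3, tokenizer))
```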
not_again_ai-0.10.0/src/not_again_ai/local_llm/__init__.py
@@ -0,0 +1,23 @@
+import importlib.util
+import os
+
+os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
+os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
+
+if (
+    importlib.util.find_spec("liquid") is None
+    or importlib.util.find_spec("ollama") is None
+    or importlib.util.find_spec("openai") is None
+    or importlib.util.find_spec("tiktoken") is None
+    or importlib.util.find_spec("transformers") is None
+):
+    raise ImportError(
+        "not_again_ai.local_llm requires the 'llm' and 'local_llm' extra to be installed. "
+        "You can install it using 'pip install not_again_ai[llm,local_llm]'."
+    )
+else:
+    import liquid  # noqa: F401
+    import ollama  # noqa: F401
+    import openai  # noqa: F401
+    import tiktoken  # noqa: F401
+    import transformers  # noqa: F401
{not_again_ai-0.9.0/src/not_again_ai/llm → not_again_ai-0.10.0/src/not_again_ai/local_llm}/chat_completion.py
@@ -3,8 +3,8 @@ from typing import Any
 from ollama import Client
 from openai import OpenAI
 
-from not_again_ai.llm.ollama import chat_completion as chat_completion_ollama
 from not_again_ai.llm.openai_api import chat_completion as chat_completion_openai
+from not_again_ai.local_llm.ollama import chat_completion as chat_completion_ollama
 
 
 def chat_completion(
@@ -34,7 +34,9 @@ def chat_completion(
         dict[str, Any]: A dictionary with the following keys
             message (str | dict): The content of the generated assistant message.
                 If json_mode is True, this will be a dictionary.
+            prompt_tokens (int): The number of tokens in the messages sent to the model.
             completion_tokens (int): The number of tokens used by the model to generate the completion.
+            response_duration (float): The time, in seconds, taken to generate the response by using the model.
             extras (dict): This will contain any additional fields returned by corresponding provider.
     """
     # Determine which chat_completion function to call based on the client type
@@ -65,8 +67,10 @@ def chat_completion(
 
     # Parse the responses to be consistent
     response_data = {}
-    response_data["message"] = response.get("message"
-    response_data["completion_tokens"] = response.get("completion_tokens"
+    response_data["message"] = response.get("message")
+    response_data["completion_tokens"] = response.get("completion_tokens")
+    response_data["prompt_tokens"] = response.get("prompt_tokens")
+    response_data["response_duration"] = response.get("response_duration")
 
     # Return any additional fields from the response in an "extras" dictionary
     extras = {k: v for k, v in response.items() if k not in response_data}
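A sketch of the provider-agnostic wrapper this module implements: the same call works with an OpenAI client or an Ollama client, and 0.10.0 normalizes `prompt_tokens` and `response_duration` across both. The parameter names `messages`, `model`, and `client` are assumptions; the returned keys come from the docstring above.

```python
# Hypothetical usage; the dispatch on client type is what the diff above shows.
from ollama import Client
from openai import OpenAI

from not_again_ai.local_llm.chat_completion import chat_completion

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

openai_response = chat_completion(messages=messages, model="gpt-4o-2024-05-13", client=OpenAI())
ollama_response = chat_completion(messages=messages, model="phi3", client=Client(host="http://localhost:11434"))

# Both providers now expose the same normalized fields.
for response in (openai_response, ollama_response):
    print(response["message"], response["prompt_tokens"], response["completion_tokens"], response["response_duration"])
```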
{not_again_ai-0.9.0/src/not_again_ai/llm → not_again_ai-0.10.0/src/not_again_ai/local_llm}/ollama/chat_completion.py
@@ -1,14 +1,12 @@
 import contextlib
 import json
 import re
+import time
 from typing import Any
 
 from ollama import Client, ResponseError
 
-
-def _convert_duration(nanoseconds: int) -> float:
-    seconds = nanoseconds / 1_000_000_000
-    return round(seconds, 5)
+from not_again_ai.local_llm.ollama.tokens import load_tokenizer, num_tokens_from_messages, num_tokens_in_string
 
 
 def chat_completion(
@@ -40,8 +38,9 @@ def chat_completion(
         dict[str, Any]: A dictionary with the following keys
             message (str | dict): The content of the generated assistant message.
                 If json_mode is True, this will be a dictionary.
+            prompt_tokens (int): The number of tokens in the messages sent to the model.
             completion_tokens (int): The number of tokens used by the model to generate the completion.
-            response_duration (float): The time taken to generate the response
+            response_duration (float): The time, in seconds, taken to generate the response by using the model.
     """
 
     options = {
@@ -62,7 +61,10 @@
         all_args["format"] = "json"
 
     try:
-
+        start_time = time.time()
+        response = client.chat(**all_args)  # type: ignore
+        end_time = time.time()
+        response_duration = end_time - start_time
     except ResponseError as e:
         # If the error says "model 'model' not found" use regex then raise a more specific error
         expected_pattern = f"model '{model}' not found"
@@ -71,25 +73,28 @@
                 f"Model '{model}' not found. Please use not_again_ai.llm.ollama.service.pull() first."
             ) from e
         else:
-            raise ResponseError(e.
+            raise ResponseError(e.error) from e
 
     response_data: dict[str, Any] = {}
 
     # Handle getting the message returned by the model
-    message = response["message"].get("content", None)
+    message = response["message"].get("content", None)  # type: ignore
     if message and json_mode:
         with contextlib.suppress(json.JSONDecodeError):
             message = json.loads(message)
     if message:
         response_data["message"] = message
 
+    tokenizer = load_tokenizer(model)
+    prompt_tokens = num_tokens_from_messages(messages, tokenizer)
+    response_data["prompt_tokens"] = prompt_tokens
+
     # Get the number of tokens generated
-    response_data["completion_tokens"] = response.get("eval_count", None)
+    response_data["completion_tokens"] = response.get("eval_count", None)  # type: ignore
+    if response_data["completion_tokens"] is None:
+        response_data["completion_tokens"] = num_tokens_in_string(str(response_data["message"]), tokenizer)
 
     # Get the latency of the response
-
-    response_data["response_duration"] = _convert_duration(response["total_duration"])
-    else:
-        response_data["response_duration"] = None
+    response_data["response_duration"] = response_duration
 
     return response_data
not_again_ai-0.10.0/src/not_again_ai/local_llm/ollama/model_mapping.py
@@ -0,0 +1,15 @@
+"""Hardcoded mapping from ollama model names to their associated HuggingFace tokenizer.
+
+Given the way that Ollama models are tagged, we can against the first part of the model name,
+i.e. all phi3 models will start with "phi3".
+"""
+
+OLLAMA_MODEL_MAPPING = {
+    "phi3": "microsoft/Phi-3-mini-4k-instruct",
+    "llama3:": "nvidia/Llama3-ChatQA-1.5-8B",  # Using this version to get around needed to accept an agreement to get access to the tokenizer
+    "gemma": "google/gemma-1.1-7b-it",  # Requires HF_TOKEN set and accepting the agreement on the HF model page
+    "qwen2": "Qwen/Qwen2-7B-Instruct",
+    "granite-code": "ibm-granite/granite-34b-code-instruct",
+    "llama3-gradient": "nvidia/Llama3-ChatQA-1.5-8B",
+    "command-r": "CohereForAI/c4ai-command-r-v01",
+}
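The mapping is keyed by tag prefix, so any variant of a model family resolves to one HuggingFace tokenizer. A tiny illustration of that lookup, mirroring the `startswith` logic used in `local_llm/ollama/tokens.py` below (the specific tag is just an example):

```python
# Prefix lookup sketch; mirrors the matching done by load_tokenizer in ollama/tokens.py.
from not_again_ai.local_llm.ollama.model_mapping import OLLAMA_MODEL_MAPPING

model = "phi3:latest"
hf_repo = next((repo for key, repo in OLLAMA_MODEL_MAPPING.items() if model.startswith(key)), None)
print(hf_repo)  # microsoft/Phi-3-mini-4k-instruct
```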
not_again_ai-0.10.0/src/not_again_ai/local_llm/ollama/tokens.py
@@ -0,0 +1,110 @@
+"""By default use the associated huggingface transformer tokenizer.
+If it does not exist in the mapping, default to tiktoken with some buffer (const + percentage)"""
+
+import os
+
+from loguru import logger
+import tiktoken
+
+from not_again_ai.llm.openai_api.tokens import num_tokens_from_messages as openai_num_tokens_from_messages
+from not_again_ai.local_llm.ollama.model_mapping import OLLAMA_MODEL_MAPPING
+
+# Prevents the transformers library from printing advisories that are not relevant to this code like not having torch installed.
+os.environ["TRANSFORMERS_NO_ADVISORY_WARNINGS"] = "1"
+
+from transformers import AutoTokenizer  # noqa: E402
+
+TIKTOKEN_NUM_TOKENS_BUFFER = 10
+TIKTOKEN_PERCENT_TOKENS_BUFFER = 1.1
+
+
+def load_tokenizer(model: str) -> AutoTokenizer | tiktoken.Encoding:
+    """Use the model mapping to load the appropriate tokenizer
+
+    Args:
+        model: The name of the language model to load the tokenizer for
+
+    Returns:
+        Either a HuggingFace tokenizer or a tiktoken encoding object
+    """
+
+    # Loop over the keys in the model mapping checking if the model starts with the key
+    for key in OLLAMA_MODEL_MAPPING:
+        if model.startswith(key):
+            return AutoTokenizer.from_pretrained(OLLAMA_MODEL_MAPPING[key], use_fast=True)
+
+    # If the model does not start with any key in the model mapping, default to tiktoken
+    logger.warning(
+        f'Model "{model}" not found in OLLAMA_MODEL_MAPPING. Using tiktoken - token counts will have an added buffer of \
+{TIKTOKEN_PERCENT_TOKENS_BUFFER * 100}% plus {TIKTOKEN_NUM_TOKENS_BUFFER} tokens.'
+    )
+    tokenizer = tiktoken.get_encoding("o200k_base")
+    return tokenizer
+
+
+def truncate_str(text: str, max_len: int, tokenizer: AutoTokenizer | tiktoken.Encoding) -> str:
+    """Truncates a string to a maximum token length.
+
+    Args:
+        text: The string to truncate.
+        max_len: The maximum number of tokens to keep.
+        tokenizer: Either a HuggingFace tokenizer or a tiktoken encoding object
+
+    Returns:
+        str: The truncated string.
+    """
+    if isinstance(tokenizer, tiktoken.Encoding):
+        tokens = tokenizer.encode(text)
+        if len(tokens) > max_len:
+            tokens = tokens[:max_len]
+            truncated_text = tokenizer.decode(tokens)
+            return truncated_text
+    else:
+        tokens = tokenizer(text, return_tensors=None)["input_ids"]
+        if len(tokens) > max_len:
+            tokens = tokens[:max_len]
+            truncated_text = tokenizer.decode(tokens)
+            return truncated_text
+
+    return text
+
+
+def num_tokens_in_string(text: str, tokenizer: AutoTokenizer | tiktoken.Encoding) -> int:
+    """Return the number of tokens in a string.
+
+    Args:
+        text: The string to count the tokens.
+        tokenizer: Either a HuggingFace tokenizer or a tiktoken encoding object
+
+    Returns:
+        int: The number of tokens in the string.
+    """
+    if isinstance(tokenizer, tiktoken.Encoding):
+        num_tokens = (len(tokenizer.encode(text)) * TIKTOKEN_PERCENT_TOKENS_BUFFER) + TIKTOKEN_NUM_TOKENS_BUFFER
+        return int(num_tokens)
+    else:
+        tokens = tokenizer(text, return_tensors=None)["input_ids"]
+        return len(tokens)
+
+
+def num_tokens_from_messages(messages: list[dict[str, str]], tokenizer: AutoTokenizer | tiktoken.Encoding) -> int:
+    """Return the number of tokens used by a list of messages.
+    For models with HuggingFace tokenizers, uses
+
+    Args:
+        messages: A list of messages to count the tokens
+            should ideally be the result after calling llm.prompts.chat_prompt.
+        tokenizer: Either a HuggingFace tokenizer or a tiktoken encoding object
+
+    Returns:
+        int: The number of tokens used by the messages.
+    """
+    if isinstance(tokenizer, tiktoken.Encoding):
+        num_tokens = (
+            openai_num_tokens_from_messages(messages, tokenizer=tokenizer, model="gpt-4o")
+            * TIKTOKEN_PERCENT_TOKENS_BUFFER
+        ) + TIKTOKEN_NUM_TOKENS_BUFFER
+        return int(num_tokens)
+    else:
+        tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors=None)
+        return len(tokens)
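A usage sketch of these helpers: mapped models get their HuggingFace tokenizer (downloaded from the Hub on first use), while anything unmapped falls back to tiktoken with the padded estimate defined by the buffer constants above (roughly exact count × 1.1 plus 10 tokens).

```python
# Sketch based on the signatures in this new module.
from not_again_ai.local_llm.ollama.tokens import load_tokenizer, num_tokens_from_messages

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many tokens is this?"},
]

# "phi3" is in OLLAMA_MODEL_MAPPING, so this loads microsoft/Phi-3-mini-4k-instruct.
tokenizer = load_tokenizer("phi3")
print(num_tokens_from_messages(messages, tokenizer))

# An unmapped name logs a warning and uses the o200k_base tiktoken encoding with the buffer applied.
fallback_tokenizer = load_tokenizer("some-unmapped-model")
print(num_tokens_from_messages(messages, fallback_tokenizer))
```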
not_again_ai-0.10.0/src/not_again_ai/local_llm/prompts.py
@@ -0,0 +1,38 @@
+from copy import deepcopy
+
+from liquid import Template
+
+
+def chat_prompt(messages_unformatted: list[dict[str, str]], variables: dict[str, str]) -> list[dict[str, str]]:
+    """Formats a list of messages for chat completion models using Liquid templating.
+
+    Args:
+        messages_unformatted: A list of dictionaries where each dictionary
+            represents a message. Each message must have 'role' and 'content'
+            keys with string values, where content is a Liquid template.
+        variables: A dictionary where each key-value pair represents a variable
+            name and its value for template rendering.
+
+    Returns:
+        A list of dictionaries with the same structure as `messages_unformatted`,
+        but with the 'content' of each message with the provided `variables`.
+
+    Examples:
+        >>> messages = [
+        ...     {"role": "system", "content": "You are a helpful assistant."},
+        ...     {"role": "user", "content": "Help me {{task}}"}
+        ... ]
+        >>> vars = {"task": "write Python code for the fibonnaci sequence"}
+        >>> chat_prompt(messages, vars)
+        [
+            {"role": "system", "content": "You are a helpful assistant."},
+            {"role": "user", "content": "Help me write Python code for the fibonnaci sequence"}
+        ]
+    """
+
+    messages_formatted = deepcopy(messages_unformatted)
+    for message in messages_formatted:
+        liquid_template = Template(message["content"])
+        message["content"] = liquid_template.render(**variables)
+
+    return messages_formatted
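A short sketch tying `chat_prompt` to the new token helpers elsewhere in this release; both modules are shown in this diff, and combining them this way is simply one plausible workflow.

```python
# Render a Liquid template, then count the tokens of the rendered messages.
from not_again_ai.local_llm.ollama.tokens import load_tokenizer, num_tokens_from_messages
from not_again_ai.local_llm.prompts import chat_prompt

template = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain {{topic}} in one sentence."},
]
messages = chat_prompt(template, {"topic": "tokenizers"})

tokenizer = load_tokenizer("phi3")
print(num_tokens_from_messages(messages, tokenizer))
```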
not_again_ai-0.10.0/src/not_again_ai/local_llm/tokens.py
@@ -0,0 +1,90 @@
+import tiktoken
+from transformers import AutoTokenizer
+
+from not_again_ai.llm.openai_api.tokens import load_tokenizer as openai_load_tokenizer
+from not_again_ai.llm.openai_api.tokens import num_tokens_from_messages as openai_num_tokens_from_messages
+from not_again_ai.llm.openai_api.tokens import num_tokens_in_string as openai_num_tokens_in_string
+from not_again_ai.llm.openai_api.tokens import truncate_str as openai_truncate_str
+from not_again_ai.local_llm.ollama.tokens import load_tokenizer as ollama_load_tokenizer
+from not_again_ai.local_llm.ollama.tokens import num_tokens_from_messages as ollama_num_tokens_from_messages
+from not_again_ai.local_llm.ollama.tokens import num_tokens_in_string as ollama_num_tokens_in_string
+from not_again_ai.local_llm.ollama.tokens import truncate_str as ollama_truncate_str
+
+
+def load_tokenizer(model: str, provider: str) -> AutoTokenizer | tiktoken.Encoding:
+    """Load the tokenizer for the given model and providers
+
+    Args:
+        model (str): The name of the language model to load the tokenizer for
+        provider (str): Either "openai_api" or "ollama"
+
+    Returns:
+        Either a HuggingFace tokenizer or a tiktoken encoding object
+    """
+    if provider == "openai_api":
+        return openai_load_tokenizer(model)
+    elif provider == "ollama":
+        return ollama_load_tokenizer(model)
+    else:
+        raise ValueError(f"Unknown tokenizer provider {provider}")
+
+
+def truncate_str(text: str, max_len: int, tokenizer: AutoTokenizer | tiktoken.Encoding, provider: str) -> str:
+    """Truncates a string to a maximum token length.
+
+    Args:
+        text: The string to truncate.
+        max_len: The maximum number of tokens to keep.
+        tokenizer: Either a HuggingFace tokenizer or a tiktoken encoding object
+        provider (str): Either "openai_api" or "ollama"
+
+    Returns:
+        str: The truncated string.
+    """
+    if provider == "openai_api":
+        return openai_truncate_str(text, max_len, tokenizer)
+    elif provider == "ollama":
+        return ollama_truncate_str(text, max_len, tokenizer)
+    else:
+        raise ValueError(f'Unknown tokenizer provider "{provider}"')
+
+
+def num_tokens_in_string(text: str, tokenizer: AutoTokenizer | tiktoken.Encoding, provider: str) -> int:
+    """Return the number of tokens in a string.
+
+    Args:
+        text: The string to count the tokens.
+        tokenizer: Either a HuggingFace tokenizer or a tiktoken encoding object
+        provider (str): Either "openai_api" or "ollama"
+
+    Returns:
+        int: The number of tokens in the string.
+    """
+    if provider == "openai_api":
+        return openai_num_tokens_in_string(text, tokenizer)
+    elif provider == "ollama":
+        return ollama_num_tokens_in_string(text, tokenizer)
+    else:
+        raise ValueError(f'Unknown tokenizer provider "{provider}"')
+
+
+def num_tokens_from_messages(
+    messages: list[dict[str, str]], tokenizer: AutoTokenizer | tiktoken.Encoding, provider: str
+) -> int:
+    """Return the number of tokens used by a list of messages.
+
+    Args:
+        messages: A list of messages to count the tokens
+            should ideally be the result after calling llm.prompts.chat_prompt.
+        tokenizer: Either a HuggingFace tokenizer or a tiktoken encoding object
+        provider (str): Either "openai_api" or "ollama"
+
+    Returns:
+        int: The number of tokens used by the messages.
+    """
+    if provider == "openai_api":
+        return openai_num_tokens_from_messages(messages, tokenizer)
+    elif provider == "ollama":
+        return ollama_num_tokens_from_messages(messages, tokenizer)
+    else:
+        raise ValueError(f'Unknown tokenizer provider "{provider}"')
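A sketch of the provider-dispatching layer: the `provider` string selects between the OpenAI (tiktoken) and Ollama (HuggingFace or tiktoken fallback) implementations shown earlier in this diff.

```python
# Usage sketch of the provider-dispatching token helpers.
from not_again_ai.local_llm.tokens import load_tokenizer, num_tokens_from_messages

messages = [{"role": "user", "content": "Hello!"}]

openai_tokenizer = load_tokenizer("gpt-4o-2024-05-13", provider="openai_api")
print(num_tokens_from_messages(messages, openai_tokenizer, provider="openai_api"))

ollama_tokenizer = load_tokenizer("phi3", provider="ollama")
print(num_tokens_from_messages(messages, ollama_tokenizer, provider="ollama"))
```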
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/statistics/dependence.py
@@ -8,8 +8,8 @@ import sklearn.tree as sktree
 
 
 def _process_variable(
-    x: npt.NDArray[np.int_] | (npt.NDArray[np.
-) -> npt.NDArray[np.int_] | (npt.NDArray[np.
+    x: npt.NDArray[np.int_] | (npt.NDArray[np.float64] | npt.NDArray[np.str_]),
+) -> npt.NDArray[np.int_] | (npt.NDArray[np.float64] | npt.NDArray[np.str_]):
     """Process variable by encoding it as a numeric array."""
     le = skpreprocessing.LabelEncoder()
     x = le.fit_transform(x)
@@ -18,9 +18,9 @@ def _process_variable(
 
 def pearson_correlation(
     x: list[int]
-    | (list[float] | (list[str] | (npt.NDArray[np.int_] | (npt.NDArray[np.
+    | (list[float] | (list[str] | (npt.NDArray[np.int_] | (npt.NDArray[np.float64] | npt.NDArray[np.str_])))),
     y: list[int]
-    | (list[float] | (list[str] | (npt.NDArray[np.int_] | (npt.NDArray[np.
+    | (list[float] | (list[str] | (npt.NDArray[np.int_] | (npt.NDArray[np.float64] | npt.NDArray[np.str_])))),
     is_x_categorical: bool = False,
     is_y_categorical: bool = False,
     print_diagnostics: bool = False,
@@ -60,7 +60,7 @@ def pearson_correlation(
 
 def pred_power_score_classification(
     x: list[int]
-    | (list[float] | (list[str] | (npt.NDArray[np.int_] | (npt.NDArray[np.
+    | (list[float] | (list[str] | (npt.NDArray[np.int_] | (npt.NDArray[np.float64] | npt.NDArray[np.str_])))),
     y: list[int] | (list[str] | npt.NDArray[np.int_]),
     cv_splits: int = 5,
     print_diagnostics: bool = False,
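The change here is the NumPy 2.0 type-hint update (`np.float_` no longer exists, so the annotations now use `np.float64`). A small sketch exercising the updated signature; treating the return value as a single correlation score is an assumption.

```python
# Hypothetical check that the statistics helpers accept NumPy 2.0 float64 arrays.
import numpy as np

from not_again_ai.statistics.dependence import pearson_correlation

rng = np.random.default_rng(0)
x = rng.normal(size=100).astype(np.float64)
y = (2.0 * x + rng.normal(scale=0.1, size=100)).astype(np.float64)
print(pearson_correlation(x, y))
```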
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/barplots.py
@@ -8,8 +8,8 @@ from not_again_ai.viz.utils import reset_plot_libs
 
 
 def simple_barplot(
-    x: list[str] | (list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.
-    y: list[str] | (list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.
+    x: list[str] | (list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.float64])),
+    y: list[str] | (list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.float64])),
     save_pathname: str,
     order: str | None = None,
     orient_bars_vertically: bool = True,
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/viz/scatterplot.py
@@ -9,8 +9,8 @@ from not_again_ai.viz.utils import reset_plot_libs
 
 
 def scatterplot_basic(
-    x: list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.
-    y: list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.
+    x: list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.float64]),
+    y: list[float] | (npt.NDArray[np.int_] | npt.NDArray[np.float64]),
     save_pathname: str,
     title: str | None = None,
     xlim: tuple[float, float] | None = None,
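The viz modules get the same `np.float_` → `np.float64` annotation update. A minimal sketch using only the parameters visible in this diff:

```python
# Hypothetical call to the updated scatterplot signature with float64 inputs.
import numpy as np

from not_again_ai.viz.scatterplot import scatterplot_basic

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50).astype(np.float64)
y = (x + rng.normal(size=50)).astype(np.float64)
scatterplot_basic(x, y, save_pathname="scatter.png", title="np.float64 inputs")
```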
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/local_llm/huggingface/chat_completion.py
RENAMED
File without changes
{not_again_ai-0.9.0 → not_again_ai-0.10.0}/src/not_again_ai/local_llm/huggingface/helpers.py
RENAMED
File without changes