not-again-ai 0.18.0__tar.gz → 0.20.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (106)
  1. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.github/copilot-instructions.md +2 -1
  2. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.github/workflows/python.yml +1 -1
  3. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/PKG-INFO +16 -10
  4. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/README.md +7 -3
  5. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/noxfile.py +1 -1
  6. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/pyproject.toml +10 -8
  7. not_again_ai-0.20.0/src/not_again_ai/data/brave_search_api.py +203 -0
  8. not_again_ai-0.20.0/src/not_again_ai/data/web.py +160 -0
  9. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/chat_completion/interface.py +5 -2
  10. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/chat_completion/providers/anthropic_api.py +2 -2
  11. not_again_ai-0.20.0/src/not_again_ai/llm/chat_completion/providers/gemini_api.py +237 -0
  12. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/chat_completion/types.py +1 -4
  13. not_again_ai-0.20.0/src/not_again_ai/llm/image_gen/__init__.py +4 -0
  14. not_again_ai-0.20.0/src/not_again_ai/llm/image_gen/interface.py +24 -0
  15. not_again_ai-0.20.0/src/not_again_ai/llm/image_gen/providers/openai_api.py +144 -0
  16. not_again_ai-0.20.0/src/not_again_ai/llm/image_gen/types.py +24 -0
  17. not_again_ai-0.20.0/tests/data/test_brave_search_api.py +34 -0
  18. not_again_ai-0.20.0/tests/data/test_web.py +20 -0
  19. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/chat_completion/test_chat_completion.py +303 -5
  20. not_again_ai-0.20.0/tests/llm/image_gen/test_image_gen.py +113 -0
  21. not_again_ai-0.20.0/tests/llm/sample_images/body_lotion.png +0 -0
  22. not_again_ai-0.20.0/tests/llm/sample_images/soap.png +0 -0
  23. not_again_ai-0.20.0/tests/llm/sample_images/sunlit_lounge.png +0 -0
  24. not_again_ai-0.20.0/tests/llm/sample_images/sunlit_lounge_mask.png +0 -0
  25. not_again_ai-0.20.0/tests/statistics/__init__.py +0 -0
  26. not_again_ai-0.20.0/tests/viz/__init__.py +0 -0
  27. not_again_ai-0.20.0/uv.lock +3593 -0
  28. not_again_ai-0.18.0/src/not_again_ai/data/__init__.py +0 -7
  29. not_again_ai-0.18.0/src/not_again_ai/data/web.py +0 -56
  30. not_again_ai-0.18.0/tests/data/test_web.py +0 -28
  31. not_again_ai-0.18.0/uv.lock +0 -2201
  32. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.editorconfig +0 -0
  33. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.gitattributes +0 -0
  34. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.github/_typos.toml +0 -0
  35. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.github/workflows/codeql-analysis.yml +0 -0
  36. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.gitignore +0 -0
  37. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.vscode/launch.json +0 -0
  38. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/.vscode/settings.json +0 -0
  39. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/LICENSE +0 -0
  40. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/assets/barplot_test4.png +0 -0
  41. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/assets/distributions_test4.svg +0 -0
  42. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/assets/scatterplot_basic1.png +0 -0
  43. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/assets/ts_lineplot5.svg +0 -0
  44. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/base/base.ipynb +0 -0
  45. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/llm/01_openai_chat_completion.ipynb +0 -0
  46. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/llm/02_ollama_intro.ipynb +0 -0
  47. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/llm/03_llm_streaming.ipynb +0 -0
  48. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/llm/10_gpt-4-v.ipynb +0 -0
  49. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/llm/20_embeddings.ipynb +0 -0
  50. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/statistics/statistics.ipynb +0 -0
  51. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/notebooks/viz/viz.ipynb +0 -0
  52. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/__init__.py +0 -0
  53. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/base/__init__.py +0 -0
  54. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/base/file_system.py +0 -0
  55. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/base/parallel.py +0 -0
  56. {not_again_ai-0.18.0/src/not_again_ai/llm → not_again_ai-0.20.0/src/not_again_ai/data}/__init__.py +0 -0
  57. {not_again_ai-0.18.0/src/not_again_ai/llm/chat_completion/providers → not_again_ai-0.20.0/src/not_again_ai/llm}/__init__.py +0 -0
  58. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/chat_completion/__init__.py +0 -0
  59. {not_again_ai-0.18.0/src/not_again_ai/llm/embedding → not_again_ai-0.20.0/src/not_again_ai/llm/chat_completion}/providers/__init__.py +0 -0
  60. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/chat_completion/providers/ollama_api.py +0 -0
  61. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/chat_completion/providers/openai_api.py +0 -0
  62. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/embedding/__init__.py +0 -0
  63. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/embedding/interface.py +0 -0
  64. {not_again_ai-0.18.0/src/not_again_ai/llm/prompting → not_again_ai-0.20.0/src/not_again_ai/llm/embedding}/providers/__init__.py +0 -0
  65. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/embedding/providers/ollama_api.py +0 -0
  66. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/embedding/providers/openai_api.py +0 -0
  67. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/embedding/types.py +0 -0
  68. {not_again_ai-0.18.0/tests → not_again_ai-0.20.0/src/not_again_ai/llm/image_gen/providers}/__init__.py +0 -0
  69. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/prompting/__init__.py +0 -0
  70. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/prompting/compile_prompt.py +0 -0
  71. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/prompting/interface.py +0 -0
  72. {not_again_ai-0.18.0/tests/base → not_again_ai-0.20.0/src/not_again_ai/llm/prompting/providers}/__init__.py +0 -0
  73. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/prompting/providers/openai_tiktoken.py +0 -0
  74. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/llm/prompting/types.py +0 -0
  75. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/py.typed +0 -0
  76. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/statistics/__init__.py +0 -0
  77. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/statistics/dependence.py +0 -0
  78. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/viz/__init__.py +0 -0
  79. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/viz/barplots.py +0 -0
  80. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/viz/distributions.py +0 -0
  81. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/viz/scatterplot.py +0 -0
  82. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/viz/time_series.py +0 -0
  83. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/src/not_again_ai/viz/utils.py +0 -0
  84. {not_again_ai-0.18.0/tests/data → not_again_ai-0.20.0/tests}/__init__.py +0 -0
  85. {not_again_ai-0.18.0/tests/llm → not_again_ai-0.20.0/tests/base}/__init__.py +0 -0
  86. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/base/test_file_system.py +0 -0
  87. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/base/test_parallel.py +0 -0
  88. {not_again_ai-0.18.0/tests/llm/chat_completion → not_again_ai-0.20.0/tests/data}/__init__.py +0 -0
  89. {not_again_ai-0.18.0/tests/llm/embedding → not_again_ai-0.20.0/tests/llm}/__init__.py +0 -0
  90. {not_again_ai-0.18.0/tests/llm/prompting → not_again_ai-0.20.0/tests/llm/chat_completion}/__init__.py +0 -0
  91. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/chat_completion/test_chat_completion_stream.py +0 -0
  92. {not_again_ai-0.18.0/tests/statistics → not_again_ai-0.20.0/tests/llm/embedding}/__init__.py +0 -0
  93. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/embedding/test_embedding.py +0 -0
  94. {not_again_ai-0.18.0/tests/viz → not_again_ai-0.20.0/tests/llm/prompting}/__init__.py +0 -0
  95. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/prompting/test_compile_messages.py +0 -0
  96. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/prompting/test_tokenizer.py +0 -0
  97. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/sample_images/SKDiagram.png +0 -0
  98. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/sample_images/SKInfographic.png +0 -0
  99. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/sample_images/cat.jpg +0 -0
  100. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/sample_images/dog.jpg +0 -0
  101. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/llm/sample_images/numbers.png +0 -0
  102. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/statistics/test_dependence.py +0 -0
  103. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/viz/test_barplot.py +0 -0
  104. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/viz/test_distributions.py +0 -0
  105. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/viz/test_scatterplot.py +0 -0
  106. {not_again_ai-0.18.0 → not_again_ai-0.20.0}/tests/viz/test_time_series.py +0 -0
@@ -7,4 +7,5 @@
  - If the user is using Pydantic, it is version >=2.10
  - Always prefer pathlib for dealing with files. Use `Path.open` instead of `open`.
  - Prefer to use pendulum instead of datetime
- - Prefer to use loguru instead of logging
+ - Prefer to use loguru instead of logging
+ - Prefer httpx for HTTP requests instead of requests
@@ -35,7 +35,7 @@ jobs:
  env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
  OPENAI_ORG_ID: ${{ secrets.OPENAI_ORG_ID }}
- SKIP_TESTS_NAAI: "tests/llm/chat_completion tests/llm/embedding tests/data"
+ SKIP_TESTS_NAAI: "tests/llm/chat_completion tests/llm/embedding tests/llm/image_gen tests/data"
  run: uv run nox -s test-${{ matrix.python-version }}
  quality:
  runs-on: ubuntu-24.04
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: not-again-ai
- Version: 0.18.0
+ Version: 0.20.0
  Summary: Designed to once and for all collect all the little things that come up over and over again in AI projects and put them in one place.
  Project-URL: Homepage, https://github.com/DaveCoDev/not-again-ai
  Project-URL: Documentation, https://davecodev.github.io/not-again-ai/
@@ -19,16 +19,18 @@ Classifier: Programming Language :: Python :: 3.11
  Classifier: Programming Language :: Python :: 3.12
  Classifier: Typing :: Typed
  Requires-Python: >=3.11
- Requires-Dist: loguru>=0.7
- Requires-Dist: pydantic>=2.10
+ Requires-Dist: loguru<1.0,>=0.7
+ Requires-Dist: pydantic<3.0,>=2.11
  Provides-Extra: data
- Requires-Dist: playwright<2.0,>=1.51; extra == 'data'
- Requires-Dist: pytest-playwright<1.0,>=0.7; extra == 'data'
+ Requires-Dist: crawl4ai<1.0,>=0.6; extra == 'data'
+ Requires-Dist: httpx<1.0,>=0.28; extra == 'data'
+ Requires-Dist: markitdown[pdf]==0.1.2; extra == 'data'
  Provides-Extra: llm
- Requires-Dist: anthropic<1.0,>=0.49; extra == 'llm'
+ Requires-Dist: anthropic<1.0,>=0.50; extra == 'llm'
  Requires-Dist: azure-identity<2.0,>=1.21; extra == 'llm'
+ Requires-Dist: google-genai<2.0,>1.12; extra == 'llm'
  Requires-Dist: ollama<1.0,>=0.4; extra == 'llm'
- Requires-Dist: openai<2.0,>=1.68; extra == 'llm'
+ Requires-Dist: openai<2.0,>=1.76; extra == 'llm'
  Requires-Dist: python-liquid<3.0,>=2.0; extra == 'llm'
  Requires-Dist: tiktoken<1.0,>=0.9; extra == 'llm'
  Provides-Extra: statistics
@@ -62,7 +64,7 @@ It is encouraged to also **a)** use this as a template for your own Python packa
  **b)** instead of installing the package, copy and paste functions into your own projects.
  We make this easier by limiting the number of dependencies and use an MIT license.

- **Documentation** available within individual **[notebooks](notebooks)**, docstrings within the source, or auto-generated at [DaveCoDev.github.io/not-again-ai/](https://DaveCoDev.github.io/not-again-ai/).
+ **Documentation** available within individual **[notebooks](notebooks)** or docstrings within the source code.

  # Installation

@@ -82,7 +84,9 @@ The package is split into subpackages, so you can install only the parts you nee

  ### Data
  1. `pip install not_again_ai[data]`
- 1. `playwright install` to download the browser binaries.
+ 1. `crawl4ai-setup` to run crawl4ai post-installation setup.
+ 1. Set the `BRAVE_SEARCH_API_KEY` environment variable to use the Brave Search API for web data extraction.
+ 1. Get the API key from https://api-dashboard.search.brave.com/app/keys. You must have at least the Free "Data for Search" subscription.


  ### LLM
@@ -138,7 +142,7 @@ all machines that use the project, both during development and in production.
  To install all dependencies into an isolated virtual environment:

  ```shell
- uv sync --all-extras
+ uv sync --all-extras --all-groups
  ```

  To upgrade all dependencies to their latest versions:
@@ -311,3 +315,5 @@ Default settings are configured in [`.vscode/settings.json`](./.vscode/settings.

  # Attributions
  [python-blueprint](https://github.com/johnthagen/python-blueprint) for the Python package skeleton.
+
+ This project uses Crawl4AI (https://github.com/unclecode/crawl4ai) for web data extraction.
@@ -19,7 +19,7 @@ It is encouraged to also **a)** use this as a template for your own Python packa
  **b)** instead of installing the package, copy and paste functions into your own projects.
  We make this easier by limiting the number of dependencies and use an MIT license.

- **Documentation** available within individual **[notebooks](notebooks)**, docstrings within the source, or auto-generated at [DaveCoDev.github.io/not-again-ai/](https://DaveCoDev.github.io/not-again-ai/).
+ **Documentation** available within individual **[notebooks](notebooks)** or docstrings within the source code.

  # Installation

@@ -39,7 +39,9 @@ The package is split into subpackages, so you can install only the parts you nee

  ### Data
  1. `pip install not_again_ai[data]`
- 1. `playwright install` to download the browser binaries.
+ 1. `crawl4ai-setup` to run crawl4ai post-installation setup.
+ 1. Set the `BRAVE_SEARCH_API_KEY` environment variable to use the Brave Search API for web data extraction.
+ 1. Get the API key from https://api-dashboard.search.brave.com/app/keys. You must have at least the Free "Data for Search" subscription.


  ### LLM
@@ -95,7 +97,7 @@ all machines that use the project, both during development and in production.
  To install all dependencies into an isolated virtual environment:

  ```shell
- uv sync --all-extras
+ uv sync --all-extras --all-groups
  ```

  To upgrade all dependencies to their latest versions:
@@ -268,3 +270,5 @@ Default settings are configured in [`.vscode/settings.json`](./.vscode/settings.

  # Attributions
  [python-blueprint](https://github.com/johnthagen/python-blueprint) for the Python package skeleton.
+
+ This project uses Crawl4AI (https://github.com/unclecode/crawl4ai) for web data extraction.
@@ -26,7 +26,7 @@ def test(s: Session) -> None:

      # Skip tests in directories specified by the SKIP_TESTS_NAAI environment variable.
      skip_tests = os.getenv("SKIP_TESTS_NAAI", "")
-     skip_tests += " tests/llm/chat_completion/ tests/llm/embedding/"
+     skip_tests += " tests/llm/chat_completion/ tests/llm/embedding/ tests/llm/image_gen/"
      skip_args = [f"--ignore={dir}" for dir in skip_tests.split()] if skip_tests else []

      s.run(
@@ -1,6 +1,6 @@
  [project]
  name = "not-again-ai"
- version = "0.18.0"
+ version = "0.20.0"
  description = "Designed to once and for all collect all the little things that come up over and over again in AI projects and put them in one place."
  authors = [
      { name = "DaveCoDev", email = "dave.co.dev@gmail.com" }
@@ -23,8 +23,8 @@ classifiers = [
  ]
  requires-python = ">=3.11"
  dependencies = [
-     "loguru>=0.7",
-     "pydantic>=2.10",
+     "loguru>=0.7,<1.0",
+     "pydantic>=2.11,<3.0",
  ]

  [project.urls]
@@ -34,14 +34,16 @@ Repository = "https://github.com/DaveCoDev/not-again-ai"

  [project.optional-dependencies]
  data = [
-     "playwright>=1.51,<2.0",
-     "pytest-playwright>=0.7,<1.0",
+     "Crawl4AI>=0.6,<1.0",
+     "httpx>=0.28,<1.0",
+     "markitdown[pdf]==0.1.2"
  ]
  llm = [
-     "anthropic>=0.49,<1.0",
+     "anthropic>=0.50,<1.0",
      "azure-identity>=1.21,<2.0",
+     "google-genai>1.12,<2.0",
      "ollama>=0.4,<1.0",
-     "openai>=1.68,<2.0",
+     "openai>=1.76,<2.0",
      "python-liquid>=2.0,<3.0",
      "tiktoken>=0.9,<1.0"
  ]
@@ -139,7 +141,7 @@ filterwarnings = [
      "error",
      # Add additional warning suppressions as needed here. For example, if a third-party library
      # is throwing a deprecation warning that needs to be fixed upstream:
-     # "ignore::DeprecationWarning:typer",
+     "ignore::DeprecationWarning",
      "ignore::pytest.PytestUnraisableExceptionWarning"
  ]
  asyncio_mode = "auto"
@@ -0,0 +1,203 @@
+ import os
+
+ import httpx
+ from loguru import logger
+ from pydantic import BaseModel
+
+
+ class SearchWebResult(BaseModel):
+     title: str
+     url: str
+     description: str
+     netloc: str | None = None
+
+
+ class SearchWebResults(BaseModel):
+     results: list[SearchWebResult]
+
+
+ async def search(
+     query: str,
+     count: int = 20,
+     offset: int = 0,
+     country: str = "US",
+     search_lang: str = "en",
+     ui_lang: str = "en-US",
+     freshness: str | None = None,
+     timezone: str = "America/New_York",
+     state: str = "MA",
+     user_agent: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.",
+ ) -> SearchWebResults:
+     """
+     Search using Brave Search API.
+
+     Args:
+         query: The search query string
+         count: Number of search results to return (1-20, default 20)
+         offset: Number of search results to skip (default 0)
+         country: Country code for search results (default "US")
+         search_lang: Language for search (default "en")
+         ui_lang: User interface language (default "en-US")
+         freshness: Freshness of results ("pd", "pw", "pm", "py" or YYYY-MM-DDtoYYYY-MM-DD or None)
+         timezone: Timezone for search results (default "America/New_York")
+         state: State for search results (default "MA")
+         user_agent: User agent string for the request (default is a common browser UA)
+
+     Returns:
+         SearchWebResults: A model containing the search results
+
+     Raises:
+         httpx.HTTPError: If the request fails
+         ValueError: If BRAVE_SEARCH_API_KEY is not set
+     """
+     api_key = os.getenv("BRAVE_SEARCH_API_KEY")
+     if not api_key:
+         raise ValueError("BRAVE_SEARCH_API_KEY environment variable is not set")
+
+     url = "https://api.search.brave.com/res/v1/web/search"
+
+     headers = {
+         "Accept": "application/json",
+         "Accept-Encoding": "gzip",
+         "X-Subscription-Token": api_key,
+         "X-Loc-Country": country,
+         "X-Loc-Timezone": timezone,
+         "X-Loc-State": state,
+         "User-Agent": user_agent,
+     }
+
+     params: dict[str, str | int | bool] = {
+         "q": query,
+         "count": count,
+         "offset": offset,
+         "country": country,
+         "search_lang": search_lang,
+         "ui_lang": ui_lang,
+         "text_decorations": False,
+         "spellcheck": False,
+         "units": "imperial",
+         "extra_snippets": False,
+         "safesearch": "off",
+     }
+
+     # Add optional parameters if provided
+     if freshness:
+         params["freshness"] = freshness
+
+     try:
+         async with httpx.AsyncClient() as client:
+             response = await client.get(url, headers=headers, params=params)
+             response.raise_for_status()
+             data = response.json()
+             results_list: list[SearchWebResult] = []
+             for item in data.get("web", {}).get("results", []):
+                 result = SearchWebResult(
+                     title=item.get("title", ""),
+                     url=item.get("url", ""),
+                     description=item.get("snippet", ""),
+                     netloc=item.get("meta_url", {}).get("netloc", None),
+                 )
+                 results_list.append(result)
+             return SearchWebResults(results=results_list)
+
+     except httpx.HTTPError as e:
+         logger.error(f"HTTP error during Brave search: {e}")
+         raise
+     except Exception as e:
+         logger.error(f"Unexpected error during Brave search: {e}")
+         raise
+
+
+ class SearchNewsResult(BaseModel):
+     title: str
+     url: str
+     description: str
+     age: str
+     netloc: str | None = None
+
+
+ class SearchNewsResults(BaseModel):
+     results: list[SearchNewsResult]
+
+
+ async def search_news(
+     query: str,
+     count: int = 20,
+     offset: int = 0,
+     country: str = "US",
+     search_lang: str = "en",
+     ui_lang: str = "en-US",
+     freshness: str | None = None,
+     user_agent: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.",
+ ) -> SearchNewsResults:
+     """
+     Search news using Brave News Search API.
+
+     Args:
+         query: The search query string
+         count: Number of news results to return (1-20, default 20)
+         offset: Number of search results to skip (default 0)
+         country: Country code for search results (default "US")
+         search_lang: Language for search (default "en")
+         ui_lang: User interface language (default "en-US")
+         freshness: Freshness of results ("pd", "pw", "pm", "py" or YYYY-MM-DDtoYYYY-MM-DD or None)
+         user_agent: User agent string for the request (default is a common browser UA)
+
+     Returns:
+         SearchNewsResults: A model containing the news search results
+
+     Raises:
+         httpx.HTTPError: If the request fails
+         ValueError: If BRAVE_SEARCH_API_KEY is not set
+     """
+     api_key = os.getenv("BRAVE_SEARCH_API_KEY")
+     if not api_key:
+         raise ValueError("BRAVE_SEARCH_API_KEY environment variable is not set")
+
+     url = "https://api.search.brave.com/res/v1/news/search"
+
+     headers = {
+         "Accept": "application/json",
+         "Accept-Encoding": "gzip",
+         "X-Subscription-Token": api_key,
+         "User-Agent": user_agent,
+     }
+
+     params: dict[str, str | int | bool] = {
+         "q": query,
+         "count": count,
+         "offset": offset,
+         "country": country,
+         "search_lang": search_lang,
+         "ui_lang": ui_lang,
+         "spellcheck": False,
+         "safesearch": "off",
+     }
+
+     # Add optional parameters if provided
+     if freshness:
+         params["freshness"] = freshness
+
+     try:
+         async with httpx.AsyncClient() as client:
+             response = await client.get(url, headers=headers, params=params)
+             response.raise_for_status()
+             data = response.json()
+             results_list: list[SearchNewsResult] = []
+             for item in data.get("results", []):
+                 result = SearchNewsResult(
+                     title=item.get("title", ""),
+                     url=item.get("url", ""),
+                     description=item.get("description", ""),
+                     age=item.get("age"),
+                     netloc=item.get("meta_url", {}).get("netloc", None),
+                 )
+                 results_list.append(result)
+             return SearchNewsResults(results=results_list)
+
+     except httpx.HTTPError as e:
+         logger.error(f"HTTP error during Brave news search: {e}")
+         raise
+     except Exception as e:
+         logger.error(f"Unexpected error during Brave news search: {e}")
+         raise
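
For orientation, a minimal sketch of how the new Brave Search helpers might be called once the `data` extra is installed and `BRAVE_SEARCH_API_KEY` is set. The query strings are illustrative; `search` and `search_news` are the functions added in the file above.

```python
import asyncio

from not_again_ai.data.brave_search_api import search, search_news


async def main() -> None:
    # Web search returns a SearchWebResults model; each result has title, url, description, netloc.
    web = await search("not-again-ai python package", count=5)
    for result in web.results:
        print(result.title, result.url)

    # News search supports the same pagination knobs; freshness="pw" restricts results to the past week.
    news = await search_news("python packaging", count=5, freshness="pw")
    for result in news.results:
        print(result.age, result.title)


asyncio.run(main())
```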
@@ -0,0 +1,160 @@
+ import asyncio
+ import io
+ import mimetypes
+ from pathlib import Path
+ import re
+ from urllib.parse import urlparse
+
+ from crawl4ai import AsyncWebCrawler, CacheMode
+ from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig
+ from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
+ import httpx
+ from markitdown import MarkItDown, StreamInfo
+ from pydantic import BaseModel
+
+
+ class Link(BaseModel):
+     url: str
+     text: str
+
+
+ class URLResult(BaseModel):
+     url: str
+     markdown: str
+     links: list[Link] = []
+
+
+ async def _markitdown_bytes_to_str(file_bytes: bytes, filename_extension: str) -> str:
+     """
+     Convert a file using MarkItDown defaults.
+     """
+     with io.BytesIO(file_bytes) as temp:
+         result = await asyncio.to_thread(
+             MarkItDown(enable_plugins=False).convert,
+             source=temp,
+             stream_info=StreamInfo(extension=filename_extension),
+         )
+         text = result.text_content
+     return text
+
+
+ def _detect_pdf_extension(url: str) -> bool:
+     """
+     Detect if the URL is a PDF based on its extension.
+     """
+     parsed_url = urlparse(url)
+     filename = Path(parsed_url.path).name
+     return mimetypes.guess_type(filename)[0] == "application/pdf"
+
+
+ def _detect_google_sheets(url: str) -> bool:
+     """
+     Detect if the URL is a Google Sheets document.
+     """
+     is_google_sheets = url.startswith("https://docs.google.com/spreadsheets/")
+     return is_google_sheets
+
+
+ async def _handle_pdf_content(url: str) -> URLResult:
+     md = MarkItDown(enable_plugins=False)
+     result = md.convert(url)
+     url_result = URLResult(
+         url=url,
+         markdown=result.markdown or "",
+         links=[],
+     )
+     return url_result
+
+
+ async def _handle_google_sheets_content(url: str) -> URLResult:
+     """
+     Handle Google Sheets by using the export URL to get the raw content.
+     """
+     edit_pattern = r"https://docs\.google\.com/spreadsheets/d/([a-zA-Z0-9-_]+)/edit"
+     export_pattern = r"https://docs\.google\.com/spreadsheets/d/([a-zA-Z0-9-_]+)/export\?format=csv"
+
+     # Check if it's already an export URL
+     export_match = re.search(export_pattern, url)
+     if export_match:
+         export_url = url
+     else:
+         # Check if it's an edit URL and extract document ID
+         edit_match = re.search(edit_pattern, url)
+         if edit_match:
+             doc_id = edit_match.group(1)
+             export_url = f"https://docs.google.com/spreadsheets/d/{doc_id}/export?format=csv&gid=0"
+         else:
+             return await _handle_web_content(url)
+
+     async with httpx.AsyncClient(follow_redirects=True) as client:
+         response = await client.get(export_url)
+         response.raise_for_status()
+         csv_bytes = response.content
+
+     # Convert CSV to markdown using MarkItDown
+     markdown_content = await _markitdown_bytes_to_str(csv_bytes, ".csv")
+
+     url_result = URLResult(
+         url=url,
+         markdown=markdown_content,
+         links=[],
+     )
+     return url_result
+
+
+ async def _handle_web_content(url: str) -> URLResult:
+     browser_config = BrowserConfig(
+         browser_type="chromium",
+         headless=True,
+         verbose=False,
+         user_agent_mode="random",
+         java_script_enabled=True,
+     )
+     run_config = CrawlerRunConfig(
+         scan_full_page=True,
+         user_agent_mode="random",
+         cache_mode=CacheMode.DISABLED,
+         markdown_generator=DefaultMarkdownGenerator(),
+     )
+
+     async with AsyncWebCrawler(config=browser_config) as crawler:
+         result = await crawler.arun(
+             url=url,
+             config=run_config,
+         )
+
+     if result.response_headers.get("content-type") == "application/pdf":
+         return await _handle_pdf_content(url)
+
+     links: list[Link] = []
+     seen_urls: set[str] = set()
+     combined_link_data = result.links.get("internal", []) + result.links.get("external", [])
+     for link_data in combined_link_data:
+         href = link_data.get("href", "")
+         if href and href not in seen_urls:
+             seen_urls.add(href)
+             link = Link(
+                 url=href,
+                 text=link_data.get("title", "") or link_data.get("text", ""),
+             )
+             links.append(link)
+
+     url_result = URLResult(
+         url=url,
+         markdown=result.markdown or "",
+         links=links,
+     )
+     return url_result
+
+
+ async def process_url(url: str) -> URLResult:
+     """
+     Process a URL to extract content and convert it to Markdown and links
+     """
+     if _detect_pdf_extension(url):
+         url_result = await _handle_pdf_content(url)
+     elif _detect_google_sheets(url):
+         url_result = await _handle_google_sheets_content(url)
+     else:
+         url_result = await _handle_web_content(url)
+     return url_result
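
Similarly, a minimal sketch of calling `process_url` from the new `web` module (assumes `crawl4ai-setup` has already installed the browser binaries; the URL is illustrative). PDF links and Google Sheets URLs are routed to MarkItDown and the CSV export path respectively; everything else goes through the Crawl4AI crawler.

```python
import asyncio

from not_again_ai.data.web import process_url


async def main() -> None:
    result = await process_url("https://example.com")
    # URLResult carries the page converted to Markdown plus the deduplicated links found on it.
    print(result.markdown[:500])
    for link in result.links:
        print(link.text, link.url)


asyncio.run(main())
```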
@@ -2,6 +2,7 @@ from collections.abc import AsyncGenerator, Callable
  from typing import Any

  from not_again_ai.llm.chat_completion.providers.anthropic_api import anthropic_chat_completion
+ from not_again_ai.llm.chat_completion.providers.gemini_api import gemini_chat_completion
  from not_again_ai.llm.chat_completion.providers.ollama_api import ollama_chat_completion, ollama_chat_completion_stream
  from not_again_ai.llm.chat_completion.providers.openai_api import openai_chat_completion, openai_chat_completion_stream
  from not_again_ai.llm.chat_completion.types import ChatCompletionChunk, ChatCompletionRequest, ChatCompletionResponse
@@ -16,6 +17,8 @@ def chat_completion(
      - `openai` - OpenAI
      - `azure_openai` - Azure OpenAI
      - `ollama` - Ollama
+     - `anthropic` - Anthropic
+     - `gemini` - Gemini

      Args:
          request: Request parameter object
@@ -31,6 +34,8 @@ def chat_completion(
          return ollama_chat_completion(request, client)
      elif provider == "anthropic":
          return anthropic_chat_completion(request, client)
+     elif provider == "gemini":
+         return gemini_chat_completion(request, client)
      else:
          raise ValueError(f"Provider {provider} not supported")

@@ -43,8 +48,6 @@ async def chat_completion_stream(
      """Stream a chat completion response from the given provider. Currently supported providers:
      - `openai` - OpenAI
      - `azure_openai` - Azure OpenAI
-     - `ollama` - Ollama
-     - `anthropic` - Anthropic

      Args:
          request: Request parameter object
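
A rough sketch of routing a request to the new `gemini` provider. The exact `ChatCompletionRequest` fields and message types are defined in `types.py`, and any client helper lives in the new `gemini_api.py`; neither is shown in this diff, so the names below (`UserMessage`, `gemini_client`, the model string) are assumptions for illustration only.

```python
from not_again_ai.llm.chat_completion.interface import chat_completion
from not_again_ai.llm.chat_completion.providers.gemini_api import gemini_client  # assumed helper, not shown in this diff
from not_again_ai.llm.chat_completion.types import ChatCompletionRequest, UserMessage  # assumed field/message names

# Assumed request shape; consult types.py for the actual fields.
request = ChatCompletionRequest(
    model="gemini-2.0-flash",  # illustrative model name
    messages=[UserMessage(content="Say hello in one sentence.")],
)

# provider="gemini" now dispatches to gemini_chat_completion(request, client), as added above.
response = chat_completion(request=request, provider="gemini", client=gemini_client())
print(response)
```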
@@ -103,12 +103,12 @@ def anthropic_chat_completion(request: ChatCompletionRequest, client: Callable[.
          elif tool_choice_value in ["auto", "any"]:
              tool_choice["type"] = "auto"
              if kwargs.get("parallel_tool_calls") is not None:
-                 tool_choice["disable_parallel_tool_use"] = str(not kwargs["parallel_tool_calls"])
+                 tool_choice["disable_parallel_tool_use"] = not kwargs["parallel_tool_calls"]  # type: ignore
          else:
              tool_choice["name"] = tool_choice_value
              tool_choice["type"] = "tool"
              if kwargs.get("parallel_tool_calls") is not None:
-                 tool_choice["disable_parallel_tool_use"] = str(not kwargs["parallel_tool_calls"])
+                 tool_choice["disable_parallel_tool_use"] = not kwargs["parallel_tool_calls"]  # type: ignore
          kwargs["tool_choice"] = tool_choice
          kwargs.pop("parallel_tool_calls", None)
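
For clarity, the effect of this fix on the `tool_choice` payload: the previous code stringified the flag, so the API received `"True"`/`"False"` instead of a boolean; the new code passes the negated boolean through unchanged. A small illustration (values made up):

```python
parallel_tool_calls = False

# 0.18.0 behaviour: the flag was serialized to a string.
tool_choice_old = {"type": "auto", "disable_parallel_tool_use": str(not parallel_tool_calls)}
assert tool_choice_old["disable_parallel_tool_use"] == "True"

# 0.20.0 behaviour: a real boolean is sent instead.
tool_choice_new = {"type": "auto", "disable_parallel_tool_use": not parallel_tool_calls}
assert tool_choice_new["disable_parallel_tool_use"] is True
```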