jehoctor-rag-demo 0.2.0__py3-none-any.whl → 0.2.1__py3-none-any.whl

@@ -1,10 +1,11 @@
1
1
  Metadata-Version: 2.3
2
2
  Name: jehoctor-rag-demo
3
- Version: 0.2.0
3
+ Version: 0.2.1
4
4
  Summary: Chat with Wikipedia
5
5
  Author: James Hoctor
6
6
  Author-email: James Hoctor <JEHoctor@protonmail.com>
7
7
  Requires-Dist: aiosqlite==0.21.0
8
+ Requires-Dist: bitsandbytes>=0.49.1
8
9
  Requires-Dist: chromadb>=1.3.4
9
10
  Requires-Dist: datasets>=4.4.1
10
11
  Requires-Dist: httpx>=0.28.1
@@ -16,7 +17,6 @@ Requires-Dist: langchain-huggingface>=1.1.0
16
17
  Requires-Dist: langchain-ollama>=1.0.0
17
18
  Requires-Dist: langchain-openai>=1.0.2
18
19
  Requires-Dist: langgraph-checkpoint-sqlite>=3.0.1
19
- Requires-Dist: llama-cpp-python>=0.3.16
20
20
  Requires-Dist: nvidia-ml-py>=13.590.44
21
21
  Requires-Dist: ollama>=0.6.0
22
22
  Requires-Dist: platformdirs>=4.5.0
@@ -24,9 +24,13 @@ Requires-Dist: psutil>=7.1.3
24
24
  Requires-Dist: py-cpuinfo>=9.0.0
25
25
  Requires-Dist: pydantic>=2.12.4
26
26
  Requires-Dist: pyperclip>=1.11.0
27
+ Requires-Dist: sentence-transformers>=5.2.2
27
28
  Requires-Dist: textual>=6.5.0
29
+ Requires-Dist: transformers[torch]>=4.57.6
28
30
  Requires-Dist: typer>=0.20.0
29
- Requires-Python: >=3.12
31
+ Requires-Dist: llama-cpp-python>=0.3.16 ; extra == 'llamacpp'
32
+ Requires-Python: ~=3.12.0
33
+ Provides-Extra: llamacpp
30
34
  Description-Content-Type: text/markdown
31
35
 
32
36
  # RAG-demo
@@ -35,50 +39,43 @@ Chat with (a small portion of) Wikipedia
35
39
 
36
40
  ⚠️ RAG functionality is still under development. ⚠️
37
41
 
38
- ![app screenshot](screenshots/screenshot_062f205a.png "App screenshot (this AI response is not accurate)")
42
+ ![app screenshot](screenshots/screenshot_0.2.0.png "App screenshot")
39
43
 
40
44
  ## Requirements
41
45
 
42
- 1. [uv](https://docs.astral.sh/uv/)
43
- 2. At least one of the following:
44
- - A suitable terminal emulator. In particular, on macOS consider using [iTerm2](https://iterm2.com/) instead of the default Terminal.app ([explanation](https://textual.textualize.io/FAQ/#why-doesnt-textual-look-good-on-macos)). On Linux, you might want to try [kitty](https://sw.kovidgoyal.net/kitty/), [wezterm](https://wezterm.org/), [alacritty](https://alacritty.org/), or [ghostty](https://ghostty.org/) instead of the terminal that came with your DE ([reason](https://darren.codes/posts/textual-copy-paste/)). Windows Terminal should be fine as far as I know.
45
- - Any common web browser
46
+ 1. The [uv](https://docs.astral.sh/uv/) Python package manager
47
+ - Installing and updating `uv` is easy by following [the docs](https://docs.astral.sh/uv/getting-started/installation/); see the sketch just after this list.
48
+ - As of 2026-01-25, I'm developing with `uv` version 0.9.26 and the new experimental `--torch-backend` option.
49
+ 2. A terminal emulator or web browser
50
+ - Any common web browser will work.
51
+ - Some terminal emulators will work better than others.
52
+ See [Notes on terminal emulators](#notes-on-terminal-emulators) below.
46
53
 
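For reference, installing or updating `uv` with the standalone installer looks roughly like this (one of several methods covered by the linked docs; a sketch, not project-specific tooling):

```bash
# Install uv with the standalone installer from the uv docs
curl -LsSf https://astral.sh/uv/install.sh | sh

# Later, update a standalone install in place
uv self update
```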
47
- ## Optional stuff that could make your experience better
54
+ ### Notes on terminal emulators
55
+
56
+ Certain terminal emulators will not work with some features of this program.
57
+ In particular, on macOS consider using [iTerm2](https://iterm2.com/) instead of the default Terminal.app ([explanation](https://textual.textualize.io/FAQ/#why-doesnt-textual-look-good-on-macos)).
58
+ On Linux you might want to try [kitty](https://sw.kovidgoyal.net/kitty/), [wezterm](https://wezterm.org/), [alacritty](https://alacritty.org/), or [ghostty](https://ghostty.org/) instead of the terminal that came with your desktop environment ([reason](https://darren.codes/posts/textual-copy-paste/)).
59
+ Windows Terminal should be fine as far as I know.
60
+
61
+ ### Optional dependencies
48
62
 
49
63
  1. [Hugging Face login](https://huggingface.co/docs/huggingface_hub/quick-start#login)
50
64
  2. API key for your favorite LLM provider (support coming soon)
51
65
  3. Ollama installed on your system if you have a GPU
52
66
  4. Run RAG-demo on a more capable (bigger GPU) machine over SSH if you can. It is a terminal app after all.
67
+ 5. A C compiler if you want to build Llama.cpp from source.
53
68
 
54
-
55
- ## Run from the repository
56
-
57
- First, clone this repository. Then, run one of the options below.
69
+ ## Run the latest version
58
70
 
59
71
  Run in a terminal:
60
72
  ```bash
61
- uv run chat
73
+ uvx --torch-backend=auto --from=jehoctor-rag-demo@latest chat
62
74
  ```
63
75
 
64
76
  Or run in a web browser:
65
77
  ```bash
66
- uv run textual serve chat
67
- ```
68
-
69
- ## Run from the latest version on PyPI
70
-
71
- TODO: test uv automatic torch backend selection:
72
- https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection
73
-
74
- Run in a terminal:
75
- ```bash
76
- uvx --from=jehoctor-rag-demo chat
77
- ```
78
-
79
- Or run in a web browser:
80
- ```bash
81
- uvx --from=jehoctor-rag-demo textual serve chat
78
+ uvx --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve chat
82
79
  ```
83
80
 
84
81
  ## CUDA acceleration via Llama.cpp
@@ -86,15 +83,43 @@ uvx --from=jehoctor-rag-demo textual serve chat
86
83
  If you have an NVIDIA GPU with CUDA and build tools installed, you might be able to get CUDA acceleration without installing Ollama.
87
84
 
88
85
  ```bash
89
- CMAKE_ARGS="-DGGML_CUDA=on" uv run chat
86
+ CMAKE_ARGS="-DGGML_CUDA=on" uv run --extra=llamacpp chat
90
87
  ```
91
88
 
92
89
  ## Metal acceleration via Llama.cpp (on Apple Silicon)
93
90
 
94
91
  On an Apple Silicon machine, make sure `uv` runs an ARM interpreter as this should cause it to install Llama.cpp with Metal support.
92
+ Also, run with the `llamacpp` extra enabled.
93
+ Try this:
94
+
95
+ ```bash
96
+ uvx --python-platform=aarch64-apple-darwin --torch-backend=auto --from=jehoctor-rag-demo[llamacpp]@latest chat
97
+ ```
95
98
 
96
99
  ## Ollama on Linux
97
100
 
98
101
  Remember that you have to keep Ollama up-to-date manually on Linux.
99
102
  A recent version of Ollama (v0.11.10 or later) is required to run the [embedding model we use](https://ollama.com/library/embeddinggemma).
100
103
  See this FAQ: https://docs.ollama.com/faq#how-can-i-upgrade-ollama.
104
+
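On Linux, checking and upgrading Ollama looks roughly like this (a sketch; the authoritative upgrade command is the one in the FAQ linked above):

```bash
# Check the installed version (v0.11.10 or later is needed for embeddinggemma)
ollama --version

# Re-run the official install script to upgrade in place
curl -fsSL https://ollama.com/install.sh | sh
```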
105
+ ## Project feature roadmap
106
+
107
+ - ❌ RAG functionality
108
+ - ❌ torch inference via the Langchain local Hugging Face inference integration
109
+ - ❌ uv automatic torch backend selection (see [the docs](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection))
110
+ - ❌ OpenAI integration
111
+ - ❌ Anthropic integration
112
+
113
+ ## Run from the repository
114
+
115
+ First, clone this repository. Then, run one of the options below.
116
+
117
+ Run in a terminal:
118
+ ```bash
119
+ uv run chat
120
+ ```
121
+
122
+ Or run in a web browser:
123
+ ```bash
124
+ uv run textual serve chat
125
+ ```
@@ -0,0 +1,31 @@
1
+ rag_demo/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
+ rag_demo/__main__.py,sha256=S0UlQj3EldcXRk3rhH3ONdSOPmeyITYXIZ0o2JSxWbg,1618
3
+ rag_demo/agents/__init__.py,sha256=dsuO3AGcn2yGDq4gkAsZ32pjeTOqAudOL14G_AsEUyc,221
4
+ rag_demo/agents/base.py,sha256=gib6bC8nVKN1s1KPZd1dJVGRXnu7gFQwf3I3_7TSjQo,1312
5
+ rag_demo/agents/hugging_face.py,sha256=VrbGOlMO2z357LmU3sO5aM_yI5P-xbsfKmTAzH9_lFo,4225
6
+ rag_demo/agents/llama_cpp.py,sha256=C0hInc24sXmt5407_k4mP2Y6svqgfUPemhzAL1N6jY0,4272
7
+ rag_demo/agents/ollama.py,sha256=Fmtu8MSPPz91eT7HKvwvbQnA_xGPaD5HBrHEPPtomZA,3317
8
+ rag_demo/app.py,sha256=AVCJjlQ60y5J0v50TcJ3zZoa0ubhd_yKVDfu1ERsMVU,1807
9
+ rag_demo/app.tcss,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
10
+ rag_demo/app_protocol.py,sha256=P__Q3KT41uonYazpYmLWmOh1MeoBaCBx4xEydkwK5tk,3292
11
+ rag_demo/constants.py,sha256=5EpyAD6p5Qb0vB5ASMtFSLVVsq3RG5cBn_AYihhDCPc,235
12
+ rag_demo/db.py,sha256=53n662Hj9sTqPNcCI2Q-6Ca_HXv3kBQdAhXU4DLhwBM,3226
13
+ rag_demo/dirs.py,sha256=b0VR76kXRHSRWzaXzmAhfPr3-8WKY3ZLW8aLlaPI3Do,309
14
+ rag_demo/logic.py,sha256=SkF_Hqu1WSLHzwvSd_mJiCMSxZYqDnteYFRpc6oCREY,8236
15
+ rag_demo/markdown.py,sha256=CxzshWfANeiieZkzMlLzpRaz7tBY2_tZQxhs7b2ImKM,551
16
+ rag_demo/modes/__init__.py,sha256=ccvURDWz51_IotzzlO2OH3i4_Ih_MgnGlOK_JCh45dY,91
17
+ rag_demo/modes/_logic_provider.py,sha256=U3J8Fgq8MbNYd92FqENW-5YP_jXqKG3xmMmYoSUzhHo,1343
18
+ rag_demo/modes/chat.py,sha256=2pmKhQ2uYZdjezNnNBINViBMcuTVE5YCom_HEbdJeXg,13607
19
+ rag_demo/modes/chat.tcss,sha256=YANlgYygiOr-e61N9HaGGdRPM36pdr-l4u72G0ozt4o,1032
20
+ rag_demo/modes/config.py,sha256=0A8IdY-GOeqCd0kMs2KMgQEsFFeVXEcnowOugtR_Q84,2609
21
+ rag_demo/modes/config.tcss,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
22
+ rag_demo/modes/help.py,sha256=riV8o4WDtsim09R4cRi0xkpYLgj4CL38IrjEz_mrRmk,713
23
+ rag_demo/modes/help.tcss,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
24
+ rag_demo/probe.py,sha256=aDD-smNauEXXoBKVgx5xsMawM5tL0QAEBFl07ZGrddc,5101
25
+ rag_demo/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
26
+ rag_demo/widgets/__init__.py,sha256=JQ1KQjdYQ4texHw2iT4IyBKgTW0SzNYbNoHAbrdCwtk,44
27
+ rag_demo/widgets/escapable_input.py,sha256=VfFij4NOtQ4uX3YFETg5YPd0_nBMky9Xz-02oRdHu-w,4240
28
+ jehoctor_rag_demo-0.2.1.dist-info/WHEEL,sha256=eh7sammvW2TypMMMGKgsM83HyA_3qQ5Lgg3ynoecH3M,79
29
+ jehoctor_rag_demo-0.2.1.dist-info/entry_points.txt,sha256=-nDSFVcIqdTxzYM4fdveDk3xUKRhmlr_cRuqQechYh4,49
30
+ jehoctor_rag_demo-0.2.1.dist-info/METADATA,sha256=nCXuy3TYPPf67DPFryndW2P6CRdSCcJkXxFXZ-UN4vs,4650
31
+ jehoctor_rag_demo-0.2.1.dist-info/RECORD,,
rag_demo/__main__.py CHANGED
@@ -3,21 +3,32 @@ import time
3
3
  # Measure the application start time.
4
4
  APPLICATION_START_TIME = time.time()
5
5
 
6
- # Disable "module import not at top of file" (aka E402) when importing Typer. This is necessary so that Typer's
7
- # initialization is included in the application startup time.
6
+ # Disable "module import not at top of file" (aka E402) when importing Typer and other early imports. This is necessary
7
+ # so that the initialization of these modules is included in the application startup time.
8
+ from typing import Annotated # noqa: E402
9
+
8
10
  import typer # noqa: E402
9
11
 
12
+ from rag_demo.constants import LocalProviderType # noqa: E402
13
+
10
14
 
11
15
  def _main(
12
- name: str | None = typer.Option(None, help="The name you want the AI to use with you."),
16
+ name: Annotated[str | None, typer.Option(help="The name you want the AI to use with you.")] = None,
17
+ provider: Annotated[LocalProviderType | None, typer.Option(help="The local provider to prefer.")] = None,
13
18
  ) -> None:
14
19
  """Talk to Wikipedia."""
15
20
  # Import here so that imports run within the typer.run context for prettier stack traces if errors occur.
16
21
  # We ignore PLC0415 because we do not want these imports to be at the top of the module as is usually preferred.
22
+ import transformers # noqa: PLC0415
23
+
17
24
  from rag_demo.app import RAGDemo # noqa: PLC0415
18
25
  from rag_demo.logic import Logic # noqa: PLC0415
19
26
 
20
- logic = Logic(username=name, application_start_time=APPLICATION_START_TIME)
27
+ # The transformers library likes to print text that interferes with the TUI. Disable it.
28
+ transformers.logging.set_verbosity(verbosity=transformers.logging.CRITICAL)
29
+ transformers.logging.disable_progress_bar()
30
+
31
+ logic = Logic(username=name, preferred_provider_type=provider, application_start_time=APPLICATION_START_TIME)
21
32
  app = RAGDemo(logic)
22
33
  app.run()
23
34
 
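As a hedged usage sketch of the new `--provider` option (assuming the `LocalProviderType` members serialize to their lowercase names, which is what `StrEnum` with `auto()` produces), hypothetical invocations would look like:

```bash
# Prefer the Ollama backend; other providers are still tried if it is unavailable
uv run chat --provider ollama

# Same idea when running the published package with uvx
uvx --torch-backend=auto --from=jehoctor-rag-demo@latest chat --provider hugging_face
```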
rag_demo/agents/__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ from .base import Agent, AgentProvider
2
+ from .hugging_face import HuggingFaceAgent, HuggingFaceAgentProvider
3
+ from .llama_cpp import LlamaCppAgent, LlamaCppAgentProvider
4
+ from .ollama import OllamaAgent, OllamaAgentProvider
rag_demo/agents/base.py ADDED
@@ -0,0 +1,40 @@
1
+ from __future__ import annotations
2
+
3
+ from typing import TYPE_CHECKING, Final, Protocol
4
+
5
+ if TYPE_CHECKING:
6
+ from collections.abc import AsyncIterator
7
+ from contextlib import AbstractAsyncContextManager
8
+ from pathlib import Path
9
+
10
+ from rag_demo.app_protocol import AppProtocol
11
+ from rag_demo.constants import LocalProviderType
12
+
13
+
14
+ class Agent(Protocol):
15
+ """An LLM agent that supports streaming responses asynchronously."""
16
+
17
+ def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
18
+ """Stream a response from the agent.
19
+
20
+ Args:
21
+ user_message (str): User's next prompt in the conversation.
22
+ thread_id (str): Identifier for the current thread/conversation.
23
+ app (AppProtocol): Application interface, commonly used for logging.
24
+
25
+ Yields:
26
+ str: A token from the agent's response.
27
+ """
28
+
29
+
30
+ class AgentProvider(Protocol):
31
+ """A strategy for creating LLM agents."""
32
+
33
+ type: Final[LocalProviderType]
34
+
35
+ def get_agent(self, checkpoints_sqlite_db: str | Path) -> AbstractAsyncContextManager[Agent | None]:
36
+ """Attempt to create an agent.
37
+
38
+ Args:
39
+ checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
40
+ """
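To make these Protocol contracts concrete, here is a hypothetical minimal implementation pair, e.g. for tests; `EchoAgent` and `EchoAgentProvider` are illustrative names, not part of the package:

```python
from __future__ import annotations

from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
from pathlib import Path

from rag_demo.constants import LocalProviderType


class EchoAgent:
    """Toy Agent that streams the user's message back word by word."""

    async def astream(self, user_message: str, thread_id: str, app: object) -> AsyncIterator[str]:
        # `app` is an AppProtocol in the real signature; it is unused here.
        for word in user_message.split():
            yield word + " "


class EchoAgentProvider:
    """Toy AgentProvider that always succeeds."""

    type = LocalProviderType.OLLAMA  # stand-in; a real provider declares its own type

    @asynccontextmanager
    async def get_agent(self, checkpoints_sqlite_db: str | Path) -> AsyncIterator[EchoAgent]:
        # A real provider would probe for its backend and yield None when unavailable.
        yield EchoAgent()
```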
rag_demo/agents/hugging_face.py ADDED
@@ -0,0 +1,116 @@
1
+ from __future__ import annotations
2
+
3
+ import asyncio
4
+ import sqlite3
5
+ from contextlib import asynccontextmanager
6
+ from typing import TYPE_CHECKING, Final
7
+
8
+ from huggingface_hub import hf_hub_download
9
+ from langchain.agents import create_agent
10
+ from langchain.messages import AIMessageChunk, HumanMessage
11
+ from langchain_huggingface import ChatHuggingFace, HuggingFaceEmbeddings, HuggingFacePipeline
12
+ from langgraph.checkpoint.sqlite import SqliteSaver
13
+
14
+ from rag_demo.constants import LocalProviderType
15
+
16
+ if TYPE_CHECKING:
17
+ from collections.abc import AsyncIterator
18
+ from pathlib import Path
19
+
20
+ from rag_demo.app_protocol import AppProtocol
21
+
22
+
23
+ class HuggingFaceAgent:
24
+ """An LLM agent powered by Hugging Face local pipelines."""
25
+
26
+ def __init__(
27
+ self,
28
+ checkpoints_sqlite_db: str | Path,
29
+ model_id: str,
30
+ embedding_model_id: str,
31
+ ) -> None:
32
+ """Initialize the HuggingFaceAgent.
33
+
34
+ Args:
35
+ checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
36
+ model_id (str): Hugging Face model ID for the LLM.
37
+ embedding_model_id (str): Hugging Face model ID for the embedding model.
38
+ """
39
+ self.checkpoints_sqlite_db = checkpoints_sqlite_db
40
+ self.model_id = model_id
41
+ self.embedding_model_id = embedding_model_id
42
+
43
+ self.llm = ChatHuggingFace(
44
+ llm=HuggingFacePipeline.from_model_id(
45
+ model_id=model_id,
46
+ task="text-generation",
47
+ device_map="auto",
48
+ pipeline_kwargs={"max_new_tokens": 4096},
49
+ ),
50
+ )
51
+ self.embed = HuggingFaceEmbeddings(model_name=embedding_model_id)
52
+ self.agent = create_agent(
53
+ model=self.llm,
54
+ system_prompt="You are a helpful assistant.",
55
+ checkpointer=SqliteSaver(sqlite3.Connection(self.checkpoints_sqlite_db, check_same_thread=False)),
56
+ )
57
+
58
+ async def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
59
+ """Stream a response from the agent.
60
+
61
+ Args:
62
+ user_message (str): User's next prompt in the conversation.
63
+ thread_id (str): Identifier for the current thread/conversation.
64
+ app (AppProtocol): Application interface, commonly used for logging.
65
+
66
+ Yields:
67
+ str: A token from the agent's response.
68
+ """
69
+ agent_stream = self.agent.stream(
70
+ {"messages": [HumanMessage(content=user_message)]},
71
+ {"configurable": {"thread_id": thread_id}},
72
+ stream_mode="messages",
73
+ )
74
+ for message_chunk, _ in agent_stream:
75
+ if isinstance(message_chunk, AIMessageChunk):
76
+ token = message_chunk.content
77
+ if isinstance(token, str):
78
+ yield token
79
+ else:
80
+ app.log.error("Received message content of type", type(token))
81
+ else:
82
+ app.log.error("Received message chunk of type", type(message_chunk))
83
+
84
+
85
+ def _hf_downloads() -> None:
86
+ hf_hub_download(
87
+ repo_id="Qwen/Qwen3-0.6B", # 1.5GB
88
+ filename="model.safetensors",
89
+ revision="c1899de289a04d12100db370d81485cdf75e47ca",
90
+ )
91
+ hf_hub_download(
92
+ repo_id="unsloth/embeddinggemma-300m", # 1.21GB
93
+ filename="model.safetensors",
94
+ revision="bfa3c846ac738e62aa61806ef9112d34acb1dc5a",
95
+ )
96
+
97
+
98
+ class HuggingFaceAgentProvider:
99
+ """Create LLM agents using Hugging Face local pipelines."""
100
+
101
+ type: Final[LocalProviderType] = LocalProviderType.HUGGING_FACE
102
+
103
+ @asynccontextmanager
104
+ async def get_agent(self, checkpoints_sqlite_db: str | Path) -> AsyncIterator[HuggingFaceAgent]:
105
+ """Create a Hugging Face local pipeline agent.
106
+
107
+ Args:
108
+ checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
109
+ """
110
+ loop = asyncio.get_running_loop()
111
+ await loop.run_in_executor(None, _hf_downloads)
112
+ yield HuggingFaceAgent(
113
+ checkpoints_sqlite_db,
114
+ model_id="Qwen/Qwen3-0.6B",
115
+ embedding_model_id="unsloth/embeddinggemma-300m",
116
+ )
rag_demo/agents/llama_cpp.py ADDED
@@ -0,0 +1,113 @@
1
+ from __future__ import annotations
2
+
3
+ import asyncio
4
+ from contextlib import asynccontextmanager
5
+ from typing import TYPE_CHECKING, Final
6
+
7
+ import aiosqlite
8
+ from huggingface_hub import hf_hub_download
9
+ from langchain.agents import create_agent
10
+ from langchain.messages import AIMessageChunk, HumanMessage
11
+ from langchain_community.chat_models import ChatLlamaCpp
12
+ from langchain_community.embeddings import LlamaCppEmbeddings
13
+ from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
14
+
15
+ from rag_demo import probe
16
+ from rag_demo.constants import LocalProviderType
17
+
18
+ if TYPE_CHECKING:
19
+ from collections.abc import AsyncIterator
20
+ from pathlib import Path
21
+
22
+ from rag_demo.app_protocol import AppProtocol
23
+
24
+
25
+ class LlamaCppAgent:
26
+ """An LLM agent powered by Llama.cpp."""
27
+
28
+ def __init__(
29
+ self,
30
+ checkpoints_conn: aiosqlite.Connection,
31
+ model_path: str,
32
+ embedding_model_path: str,
33
+ ) -> None:
34
+ """Initialize the LlamaCppAgent.
35
+
36
+ Args:
37
+ checkpoints_conn (aiosqlite.Connection): Connection to SQLite checkpoint database.
38
+ model_path (str): Path to Llama.cpp model.
39
+ embedding_model_path (str): Path to Llama.cpp embedding model.
40
+ """
41
+ self.checkpoints_conn = checkpoints_conn
42
+ self.llm = ChatLlamaCpp(model_path=model_path, verbose=False)
43
+ self.embed = LlamaCppEmbeddings(model_path=embedding_model_path, verbose=False)
44
+ self.agent = create_agent(
45
+ model=self.llm,
46
+ system_prompt="You are a helpful assistant.",
47
+ checkpointer=AsyncSqliteSaver(self.checkpoints_conn),
48
+ )
49
+
50
+ async def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
51
+ """Stream a response from the agent.
52
+
53
+ Args:
54
+ user_message (str): User's next prompt in the conversation.
55
+ thread_id (str): Identifier for the current thread/conversation.
56
+ app (AppProtocol): Application interface, commonly used for logging.
57
+
58
+ Yields:
59
+ str: A token from the agent's response.
60
+ """
61
+ agent_stream = self.agent.astream(
62
+ {"messages": [HumanMessage(content=user_message)]},
63
+ {"configurable": {"thread_id": thread_id}},
64
+ stream_mode="messages",
65
+ )
66
+ async for message_chunk, _ in agent_stream:
67
+ if isinstance(message_chunk, AIMessageChunk):
68
+ token = message_chunk.content
69
+ if isinstance(token, str):
70
+ yield token
71
+ else:
72
+ app.log.error("Received message content of type", type(token))
73
+ else:
74
+ app.log.error("Received message chunk of type", type(message_chunk))
75
+
76
+
77
+ def _hf_downloads() -> tuple[str, str]:
78
+ model_path = hf_hub_download(
79
+ repo_id="bartowski/google_gemma-3-4b-it-GGUF",
80
+ filename="google_gemma-3-4b-it-Q6_K_L.gguf", # 3.35GB
81
+ revision="71506238f970075ca85125cd749c28b1b0eee84e",
82
+ )
83
+ embedding_model_path = hf_hub_download(
84
+ repo_id="CompendiumLabs/bge-small-en-v1.5-gguf",
85
+ filename="bge-small-en-v1.5-q8_0.gguf", # 36.8MB
86
+ revision="d32f8c040ea3b516330eeb75b72bcc2d3a780ab7",
87
+ )
88
+ return model_path, embedding_model_path
89
+
90
+
91
+ class LlamaCppAgentProvider:
92
+ """Create LLM agents using Llama.cpp."""
93
+
94
+ type: Final[LocalProviderType] = LocalProviderType.LLAMA_CPP
95
+
96
+ @asynccontextmanager
97
+ async def get_agent(self, checkpoints_sqlite_db: str | Path) -> AsyncIterator[LlamaCppAgent | None]:
98
+ """Attempt to create a Llama.cpp agent.
99
+
100
+ Args:
101
+ checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
102
+ """
103
+ if probe.probe_llama_available():
104
+ loop = asyncio.get_running_loop()
105
+ model_path, embedding_model_path = await loop.run_in_executor(None, _hf_downloads)
106
+ async with aiosqlite.connect(database=checkpoints_sqlite_db) as checkpoints_conn:
107
+ yield LlamaCppAgent(
108
+ checkpoints_conn=checkpoints_conn,
109
+ model_path=model_path,
110
+ embedding_model_path=embedding_model_path,
111
+ )
112
+ else:
113
+ yield None
rag_demo/agents/ollama.py ADDED
@@ -0,0 +1,91 @@
1
+ from __future__ import annotations
2
+
3
+ from contextlib import asynccontextmanager
4
+ from typing import TYPE_CHECKING, Final
5
+
6
+ import aiosqlite
7
+ import ollama
8
+ from langchain.agents import create_agent
9
+ from langchain.messages import AIMessageChunk, HumanMessage
10
+ from langchain_ollama import ChatOllama, OllamaEmbeddings
11
+ from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
12
+
13
+ from rag_demo import probe
14
+ from rag_demo.constants import LocalProviderType
15
+
16
+ if TYPE_CHECKING:
17
+ from collections.abc import AsyncIterator
18
+ from pathlib import Path
19
+
20
+ from rag_demo.app_protocol import AppProtocol
21
+
22
+
23
+ class OllamaAgent:
24
+ """An LLM agent powered by Ollama."""
25
+
26
+ def __init__(self, checkpoints_conn: aiosqlite.Connection) -> None:
27
+ """Initialize the OllamaAgent.
28
+
29
+ Args:
30
+ checkpoints_conn (aiosqlite.Connection): Asynchronous connection to SQLite db for checkpoints.
31
+ """
32
+ self.checkpoints_conn = checkpoints_conn
33
+ ollama.pull("gemma3:latest") # 3.3GB
34
+ ollama.pull("embeddinggemma:latest") # 621MB
35
+ self.llm = ChatOllama(
36
+ model="gemma3:latest",
37
+ validate_model_on_init=True,
38
+ temperature=0.5,
39
+ num_predict=4096,
40
+ )
41
+ self.embed = OllamaEmbeddings(model="embeddinggemma:latest")
42
+ self.agent = create_agent(
43
+ model=self.llm,
44
+ system_prompt="You are a helpful assistant.",
45
+ checkpointer=AsyncSqliteSaver(self.checkpoints_conn),
46
+ )
47
+
48
+ async def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
49
+ """Stream a response from the agent.
50
+
51
+ Args:
52
+ user_message (str): User's next prompt in the conversation.
53
+ thread_id (str): Identifier for the current thread/conversation.
54
+ app (AppProtocol): Application interface, commonly used for logging.
55
+
56
+ Yields:
57
+ str: A token from the agent's response.
58
+ """
59
+ agent_stream = self.agent.astream(
60
+ {"messages": [HumanMessage(content=user_message)]},
61
+ {"configurable": {"thread_id": thread_id}},
62
+ stream_mode="messages",
63
+ )
64
+ async for message_chunk, _ in agent_stream:
65
+ if isinstance(message_chunk, AIMessageChunk):
66
+ token = message_chunk.content
67
+ if isinstance(token, str):
68
+ yield token
69
+ else:
70
+ app.log.error("Received message content of type", type(token))
71
+ else:
72
+ app.log.error("Received message chunk of type", type(message_chunk))
73
+
74
+
75
+ class OllamaAgentProvider:
76
+ """Create LLM agents using Ollama."""
77
+
78
+ type: Final[LocalProviderType] = LocalProviderType.OLLAMA
79
+
80
+ @asynccontextmanager
81
+ async def get_agent(self, checkpoints_sqlite_db: str | Path) -> AsyncIterator[OllamaAgent | None]:
82
+ """Attempt to create an Ollama agent.
83
+
84
+ Args:
85
+ checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
86
+ """
87
+ if probe.probe_ollama() is not None:
88
+ async with aiosqlite.connect(database=checkpoints_sqlite_db) as checkpoints_conn:
89
+ yield OllamaAgent(checkpoints_conn=checkpoints_conn)
90
+ else:
91
+ yield None
rag_demo/app.py CHANGED
@@ -48,7 +48,7 @@ class RAGDemo(App):
48
48
  self.run_worker(self._hold_runtime())
49
49
 
50
50
  async def _hold_runtime(self) -> None:
51
- async with self.logic.runtime(app_like=self) as runtime:
51
+ async with self.logic.runtime(app=self) as runtime:
52
52
  self._runtime_future.set_result(runtime)
53
53
  # Pause the task until Textual cancels it when the application closes.
54
54
  await asyncio.Event().wait()
rag_demo/app_protocol.py ADDED
@@ -0,0 +1,101 @@
1
+ """Interface for the logic to call back into the app code.
2
+
3
+ This is necessary to make the logic code testable. We don't want to have to run all the app code to test the logic. And,
4
+ we want to have a high degree of confidence when mocking out the app code in logic tests. The basic pattern is that each
5
+ piece of functionality that the logic depends on will have a protocol and an implementation of that protocol using the
6
+ Textual App. In the tests, we create a mock implementation of the same protocol. Correctness of the logic is defined by
7
+ its ability to work correctly with any implementation of the protocol, not just the implementation backed by the app.
8
+ """
9
+
10
+ from __future__ import annotations
11
+
12
+ from typing import TYPE_CHECKING, Protocol, TypeVar
13
+
14
+ if TYPE_CHECKING:
15
+ from collections.abc import Awaitable
16
+
17
+ from textual.worker import Worker
18
+
19
+
20
+ class LoggerProtocol(Protocol):
21
+ """Protocol that mimics textual.Logger."""
22
+
23
+ def __call__(self, *args: object, **kwargs: object) -> None:
24
+ """Log a message.
25
+
26
+ Args:
27
+ *args (object): Logged directly to the message separated by spaces.
28
+ **kwargs (object): Logged to the message as f"{key}={value!r}", separated by spaces.
29
+ """
30
+
31
+ def verbosity(self, *, verbose: bool) -> LoggerProtocol:
32
+ """Get a new logger with selective verbosity.
33
+
34
+ Note that unlike when using this method on a Textual logger directly, the type system will enforce that you use
35
+ `verbose` as a keyword argument (not a positional argument). I made this change to address ruff's FBT001 rule.
36
+ Put simply, this requirement makes the calling code easier to read.
37
+ https://docs.astral.sh/ruff/rules/boolean-type-hint-positional-argument/
38
+
39
+ Args:
40
+ verbose: True to use HIGH verbosity, otherwise NORMAL.
41
+
42
+ Returns:
43
+ New logger.
44
+ """
45
+
46
+ @property
47
+ def verbose(self) -> LoggerProtocol:
48
+ """A verbose logger."""
49
+
50
+ @property
51
+ def event(self) -> LoggerProtocol:
52
+ """Logs events."""
53
+
54
+ @property
55
+ def debug(self) -> LoggerProtocol:
56
+ """Logs debug messages."""
57
+
58
+ @property
59
+ def info(self) -> LoggerProtocol:
60
+ """Logs information."""
61
+
62
+ @property
63
+ def warning(self) -> LoggerProtocol:
64
+ """Logs warnings."""
65
+
66
+ @property
67
+ def error(self) -> LoggerProtocol:
68
+ """Logs errors."""
69
+
70
+ @property
71
+ def system(self) -> LoggerProtocol:
72
+ """Logs system information."""
73
+
74
+ @property
75
+ def logging(self) -> LoggerProtocol:
76
+ """Logs from stdlib logging module."""
77
+
78
+ @property
79
+ def worker(self) -> LoggerProtocol:
80
+ """Logs worker information."""
81
+
82
+
83
+ ResultType = TypeVar("ResultType")
84
+
85
+
86
+ class AppProtocol(Protocol):
87
+ """Protocol for the subset of what the main App can do that the runtime needs."""
88
+
89
+ def run_worker(self, work: Awaitable[ResultType], *, thread: bool = False) -> Worker[ResultType]:
90
+ """Run a coroutine in the background.
91
+
92
+ See https://textual.textualize.io/guide/workers/.
93
+
94
+ Args:
95
+ work (Awaitable[ResultType]): The coroutine to run.
96
+ thread (bool): Mark the worker as a thread worker.
97
+ """
98
+
99
+ @property
100
+ def log(self) -> LoggerProtocol:
101
+ """Returns the application logger."""
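As the module docstring above suggests, tests can supply their own implementation of these protocols; a hypothetical minimal test double (illustrative names, not part of the package) might look like:

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable


class RecordingLogger:
    """Collects log calls instead of writing to the Textual console."""

    def __init__(self) -> None:
        self.calls: list[tuple[object, ...]] = []

    def __call__(self, *args: object, **kwargs: object) -> None:
        self.calls.append(args + tuple(f"{key}={value!r}" for key, value in kwargs.items()))

    def verbosity(self, *, verbose: bool) -> RecordingLogger:
        return self

    def __getattr__(self, name: str) -> RecordingLogger:
        # verbose, event, debug, info, warning, error, system, logging, worker
        # all funnel back into the same recorder.
        return self


class FakeApp:
    """Duck-typed stand-in for AppProtocol in logic tests."""

    def __init__(self) -> None:
        self.log = RecordingLogger()

    def run_worker(self, work: Awaitable[object], *, thread: bool = False) -> asyncio.Future[object]:
        # The real App returns a Textual Worker; a Task is close enough for tests.
        return asyncio.ensure_future(work)
```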
rag_demo/constants.py ADDED
@@ -0,0 +1,11 @@
1
+ from __future__ import annotations
2
+
3
+ from enum import StrEnum, auto
4
+
5
+
6
+ class LocalProviderType(StrEnum):
7
+ """Enum of supported local LLM backend provider types."""
8
+
9
+ HUGGING_FACE = auto()
10
+ LLAMA_CPP = auto()
11
+ OLLAMA = auto()
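A quick illustration of how these members behave; because `StrEnum` with `auto()` lowercases the member name, these are also the strings accepted by the new `--provider` command-line option:

```python
from rag_demo.constants import LocalProviderType

# StrEnum members compare equal to their lowercase names.
assert LocalProviderType.HUGGING_FACE == "hugging_face"
assert LocalProviderType.LLAMA_CPP == "llama_cpp"
assert LocalProviderType.OLLAMA == "ollama"
```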
rag_demo/logic.py CHANGED
@@ -1,57 +1,44 @@
1
1
  from __future__ import annotations
2
2
 
3
- import contextlib
4
- import platform
5
3
  import time
6
4
  from contextlib import asynccontextmanager
7
- from pathlib import Path
8
- from typing import TYPE_CHECKING, Protocol, TypeVar, cast
5
+ from typing import TYPE_CHECKING, cast
9
6
 
10
- import aiosqlite
11
- import cpuinfo
12
- import httpx
13
- import huggingface_hub
14
- import llama_cpp
15
- import ollama
16
- import psutil
17
- import pynvml
18
7
  from datasets import Dataset, load_dataset
19
- from huggingface_hub import hf_hub_download
20
- from huggingface_hub.constants import HF_HUB_CACHE
21
- from langchain.agents import create_agent
22
- from langchain.messages import AIMessageChunk, HumanMessage
23
- from langchain_community.chat_models import ChatLlamaCpp
24
- from langchain_community.embeddings import LlamaCppEmbeddings
25
8
  from langchain_core.exceptions import LangChainException
26
- from langchain_ollama import ChatOllama, OllamaEmbeddings
27
- from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
28
9
 
29
10
  from rag_demo import dirs
11
+ from rag_demo.agents import (
12
+ Agent,
13
+ AgentProvider,
14
+ HuggingFaceAgentProvider,
15
+ LlamaCppAgentProvider,
16
+ OllamaAgentProvider,
17
+ )
30
18
  from rag_demo.db import AtomicIDManager
31
19
  from rag_demo.modes.chat import Response, StoppedStreamError
32
20
 
33
21
  if TYPE_CHECKING:
34
- from collections.abc import AsyncIterator, Awaitable
35
-
36
- from textual.worker import Worker
22
+ from collections.abc import AsyncIterator, Sequence
23
+ from pathlib import Path
37
24
 
25
+ from rag_demo.app_protocol import AppProtocol
26
+ from rag_demo.constants import LocalProviderType
38
27
  from rag_demo.modes import ChatScreen
39
28
 
40
- ResultType = TypeVar("ResultType")
41
29
 
30
+ class UnknownPreferredProviderError(ValueError):
31
+ """Raised when the preferred provider cannot be checked first due to being unknown."""
42
32
 
43
- class AppLike(Protocol):
44
- """Protocol for the subset of what the main App can do that the runtime needs."""
33
+ def __init__(self, preferred_provider: LocalProviderType) -> None: # noqa: D107
34
+ super().__init__(f"Unknown preferred provider: {preferred_provider}")
45
35
 
46
- def run_worker(self, work: Awaitable[ResultType]) -> Worker[ResultType]:
47
- """Run a coroutine in the background.
48
36
 
49
- See https://textual.textualize.io/guide/workers/.
37
+ class NoProviderError(RuntimeError):
38
+ """Raised when no provider could provide an agent."""
50
39
 
51
- Args:
52
- work (Awaitable[ResultType]): The coroutine to run.
53
- """
54
- ...
40
+ def __init__(self) -> None: # noqa: D107
41
+ super().__init__("No provider could provide an agent.")
55
42
 
56
43
 
57
44
  class Runtime:
@@ -60,50 +47,28 @@ class Runtime:
60
47
  def __init__(
61
48
  self,
62
49
  logic: Logic,
63
- checkpoints_conn: aiosqlite.Connection,
50
+ app: AppProtocol,
51
+ agent: Agent,
64
52
  thread_id_manager: AtomicIDManager,
65
- app_like: AppLike,
66
53
  ) -> None:
54
+ """Initialize the runtime.
55
+
56
+ Args:
57
+ logic (Logic): The application logic.
58
+ app (AppProtocol): The application interface.
59
+ agent (Agent): The agent to use.
60
+ thread_id_manager (AtomicIDManager): The thread ID manager.
61
+ """
67
62
  self.runtime_start_time = time.time()
68
63
  self.logic = logic
69
- self.checkpoints_conn = checkpoints_conn
64
+ self.app = app
65
+ self.agent = agent
70
66
  self.thread_id_manager = thread_id_manager
71
- self.app_like = app_like
72
67
 
73
68
  self.current_thread: int | None = None
74
69
  self.generating = False
75
70
 
76
- if self.logic.probe_ollama() is not None:
77
- ollama.pull("gemma3:latest") # 3.3GB
78
- ollama.pull("embeddinggemma:latest") # 621MB
79
- self.llm = ChatOllama(
80
- model="gemma3:latest",
81
- validate_model_on_init=True,
82
- temperature=0.5,
83
- num_predict=4096,
84
- )
85
- self.embed = OllamaEmbeddings(model="embeddinggemma:latest")
86
- else:
87
- model_path = hf_hub_download(
88
- repo_id="bartowski/google_gemma-3-4b-it-GGUF",
89
- filename="google_gemma-3-4b-it-Q6_K_L.gguf", # 3.35GB
90
- revision="71506238f970075ca85125cd749c28b1b0eee84e",
91
- )
92
- embedding_model_path = hf_hub_download(
93
- repo_id="CompendiumLabs/bge-small-en-v1.5-gguf",
94
- filename="bge-small-en-v1.5-q8_0.gguf", # 36.8MB
95
- revision="d32f8c040ea3b516330eeb75b72bcc2d3a780ab7",
96
- )
97
- self.llm = ChatLlamaCpp(model_path=model_path, verbose=False)
98
- self.embed = LlamaCppEmbeddings(model_path=embedding_model_path, verbose=False) # pyright: ignore[reportCallIssue]
99
-
100
- self.agent = create_agent(
101
- model=self.llm,
102
- system_prompt="You are a helpful assistant.",
103
- checkpointer=AsyncSqliteSaver(self.checkpoints_conn),
104
- )
105
-
106
- def get_rag_datasets(self) -> None:
71
+ def _get_rag_datasets(self) -> None:
107
72
  self.qa_test: Dataset = cast(
108
73
  "Dataset",
109
74
  load_dataset("rag-datasets/rag-mini-wikipedia", "question-answer", split="test"),
@@ -123,21 +88,9 @@ class Runtime:
123
88
  """
124
89
  self.generating = True
125
90
  async with response_widget.stream_writer() as writer:
126
- agent_stream = self.agent.astream(
127
- {"messages": [HumanMessage(content=request_text)]},
128
- {"configurable": {"thread_id": thread}},
129
- stream_mode="messages",
130
- )
131
91
  try:
132
- async for message_chunk, _ in agent_stream:
133
- if isinstance(message_chunk, AIMessageChunk):
134
- token = cast("AIMessageChunk", message_chunk).content
135
- if isinstance(token, str):
136
- await writer.write(token)
137
- else:
138
- response_widget.log.error(f"Received message content of type {type(token)}")
139
- else:
140
- response_widget.log.error(f"Received message chunk of type {type(message_chunk)}")
92
+ async for message_chunk in self.agent.astream(request_text, thread, self.app):
93
+ await writer.write(message_chunk)
141
94
  except StoppedStreamError as e:
142
95
  response_widget.set_shown_object(e)
143
96
  except LangChainException as e:
@@ -145,10 +98,24 @@ class Runtime:
145
98
  self.generating = False
146
99
 
147
100
  def new_conversation(self, chat_screen: ChatScreen) -> None:
101
+ """Clear the screen and start a new conversation with the agent.
102
+
103
+ Args:
104
+ chat_screen (ChatScreen): The chat screen to clear.
105
+ """
148
106
  self.current_thread = None
149
107
  chat_screen.clear_chats()
150
108
 
151
109
  async def submit_request(self, chat_screen: ChatScreen, request_text: str) -> bool:
110
+ """Submit a new user request in the current conversation.
111
+
112
+ Args:
113
+ chat_screen (ChatScreen): The chat screen in which the request is submitted.
114
+ request_text (str): The text of the request.
115
+
116
+ Returns:
117
+ bool: True if the request was accepted for immediate processing, False otherwise.
118
+ """
152
119
  if self.generating:
153
120
  return False
154
121
  self.generating = True
@@ -168,120 +135,67 @@ class Logic:
168
135
  def __init__(
169
136
  self,
170
137
  username: str | None = None,
138
+ preferred_provider_type: LocalProviderType | None = None,
171
139
  application_start_time: float | None = None,
172
140
  checkpoints_sqlite_db: str | Path = dirs.DATA_DIR / "checkpoints.sqlite3",
173
141
  app_sqlite_db: str | Path = dirs.DATA_DIR / "app.sqlite3",
142
+ agent_providers: Sequence[AgentProvider] = (
143
+ LlamaCppAgentProvider(),
144
+ OllamaAgentProvider(),
145
+ HuggingFaceAgentProvider(),
146
+ ),
174
147
  ) -> None:
175
148
  """Initialize the application logic.
176
149
 
177
150
  Args:
178
151
  username (str | None, optional): The username provided as a command line argument. Defaults to None.
152
+ preferred_provider_type (LocalProviderType | None, optional): Provider type to prefer. Defaults to None.
179
153
  application_start_time (float | None, optional): The time when the application started. Defaults to None.
180
154
  checkpoints_sqlite_db (str | Path, optional): The connection string for the SQLite database used for
181
155
  Langchain checkpointing. Defaults to (dirs.DATA_DIR / "checkpoints.sqlite3").
182
156
  app_sqlite_db (str | Path, optional): The connection string for the SQLite database used for application
183
157
  state such as thread metadata. Defaults to (dirs.DATA_DIR / "app.sqlite3").
158
+ agent_providers (Sequence[AgentProvider], optional): Sequence of agent providers in default preference
159
+ order. If preferred_provider_type is not None, this sequence will be reordered to bring providers of
160
+ that type to the front, using the original order to break ties. Defaults to (
161
+ LlamaCppAgentProvider(),
162
+ OllamaAgentProvider(),
163
+ HuggingFaceAgentProvider(),
164
+ ).
184
165
  """
185
166
  self.logic_start_time = time.time()
186
167
  self.username = username
168
+ self.preferred_provider_type = preferred_provider_type
187
169
  self.application_start_time = application_start_time
188
170
  self.checkpoints_sqlite_db = checkpoints_sqlite_db
189
171
  self.app_sqlite_db = app_sqlite_db
172
+ self.agent_providers: Sequence[AgentProvider] = agent_providers
190
173
 
191
174
  @asynccontextmanager
192
- async def runtime(self, app_like: AppLike) -> AsyncIterator[Runtime]:
175
+ async def runtime(self, app: AppProtocol) -> AsyncIterator[Runtime]:
193
176
  """Returns a runtime context for the application."""
194
- # TODO: Do I need to set check_same_thread=False in aiosqlite.connect?
195
- async with aiosqlite.connect(database=self.checkpoints_sqlite_db) as checkpoints_conn:
196
- id_manager = AtomicIDManager(self.app_sqlite_db)
197
- await id_manager.initialize()
198
- yield Runtime(self, checkpoints_conn, id_manager, app_like)
199
-
200
- def probe_os(self) -> str:
201
- """Returns the OS name (eg 'Linux' or 'Windows'), the system name (eg 'Java'), or an empty string if unknown."""
202
- return platform.system()
203
-
204
- def probe_architecture(self) -> str:
205
- """Returns the machine architecture, such as 'i386'."""
206
- return platform.machine()
207
-
208
- def probe_cpu(self) -> str:
209
- """Returns the name of the CPU, e.g. "Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz"."""
210
- return cpuinfo.get_cpu_info()["brand_raw"]
211
-
212
- def probe_ram(self) -> int:
213
- """Returns the total amount of RAM in bytes."""
214
- return psutil.virtual_memory().total
215
-
216
- def probe_disk_space(self) -> int:
217
- """Returns the amount of free space in the root directory (in bytes)."""
218
- return psutil.disk_usage("/").free
219
-
220
- def probe_llamacpp_gpu_support(self) -> bool:
221
- """Returns True if LlamaCpp supports GPU offloading, False otherwise."""
222
- return llama_cpp.llama_supports_gpu_offload()
223
-
224
- def probe_huggingface_free_cache_space(self) -> int | None:
225
- """Returns the amount of free space in the Hugging Face cache (in bytes), or None if it can't be determined."""
226
- with contextlib.suppress(FileNotFoundError):
227
- return psutil.disk_usage(HF_HUB_CACHE).free
228
- for parent_dir in Path(HF_HUB_CACHE).parents:
229
- with contextlib.suppress(FileNotFoundError):
230
- return psutil.disk_usage(str(parent_dir)).free
231
- return None
177
+ thread_id_manager = AtomicIDManager(self.app_sqlite_db)
178
+ await thread_id_manager.initialize()
232
179
 
233
- def probe_huggingface_cached_models(self) -> list[huggingface_hub.CachedRepoInfo] | None:
234
- """Returns a list of models in the Hugging Face cache (possibly empty), or None if the cache doesn't exist."""
235
- # The docstring for huggingface_hub.scan_cache_dir says it raises CacheNotFound "if the cache directory does not
236
- # exist," and ValueError "if the cache directory is a file, instead of a directory."
237
- with contextlib.suppress(ValueError, huggingface_hub.CacheNotFound):
238
- return [repo for repo in huggingface_hub.scan_cache_dir().repos if repo.repo_type == "model"]
239
- return None # Isn't it nice to be explicit?
240
-
241
- def probe_huggingface_cached_datasets(self) -> list[huggingface_hub.CachedRepoInfo] | None:
242
- """Returns a list of datasets in the Hugging Face cache (possibly empty), or None if the cache doesn't exist."""
243
- with contextlib.suppress(ValueError, huggingface_hub.CacheNotFound):
244
- return [repo for repo in huggingface_hub.scan_cache_dir().repos if repo.repo_type == "dataset"]
245
- return None
246
-
247
- def probe_nvidia(self) -> tuple[int, list[str]]:
248
- """Detect available NVIDIA GPUs and CUDA driver version.
249
-
250
- Returns:
251
- tuple[int, list[str]]: A tuple (cuda_version, nv_gpus) where cuda_version is the installed CUDA driver
252
- version and nv_gpus is a list of GPU models corresponding to installed NVIDIA GPUs
253
- """
254
- try:
255
- pynvml.nvmlInit()
256
- except pynvml.NVMLError:
257
- return -1, []
258
- cuda_version = -1
259
- nv_gpus = []
260
- try:
261
- cuda_version = pynvml.nvmlSystemGetCudaDriverVersion()
262
- for i in range(pynvml.nvmlDeviceGetCount()):
263
- handle = pynvml.nvmlDeviceGetHandleByIndex(i)
264
- nv_gpus.append(pynvml.nvmlDeviceGetName(handle))
265
- except pynvml.NVMLError:
266
- pass
267
- finally:
268
- with contextlib.suppress(pynvml.NVMLError):
269
- pynvml.nvmlShutdown()
270
- return cuda_version, nv_gpus
271
-
272
- def probe_ollama(self) -> list[ollama.ListResponse.Model] | None:
273
- """Returns a list of models installed in Ollama, or None if connecting to Ollama fails."""
274
- with contextlib.suppress(ConnectionError):
275
- return list(ollama.list().models)
276
- return None
277
-
278
- def probe_ollama_version(self) -> str | None:
279
- """Returns the Ollama version string (e.g. "0.13.5"), or None if connecting to Ollama fails."""
280
- # Yes, this uses private attributes, but that lets me use the Ollama Python lib's env var logic. If you use env
281
- # vars to direct the app to a different Ollama server, this will query the same Ollama endpoint as the
282
- # ollama.list() call above. Therefore I silence SLF001 here.
283
- with contextlib.suppress(httpx.HTTPError, KeyError, ValueError):
284
- response: httpx.Response = ollama._client._client.request("GET", "/api/version") # noqa: SLF001
285
- response.raise_for_status()
286
- return response.json()["version"]
287
- return None
180
+ agent_providers: Sequence[AgentProvider] = self.agent_providers
181
+ if self.preferred_provider_type is not None:
182
+ preferred_providers: Sequence[AgentProvider] = tuple(
183
+ ap for ap in agent_providers if ap.type == self.preferred_provider_type
184
+ )
185
+ if len(preferred_providers) == 0:
186
+ raise UnknownPreferredProviderError(self.preferred_provider_type)
187
+ agent_providers = (
188
+ *preferred_providers,
189
+ *(ap for ap in agent_providers if ap.type != self.preferred_provider_type),
190
+ )
191
+ for agent_provider in agent_providers:
192
+ async with agent_provider.get_agent(checkpoints_sqlite_db=self.checkpoints_sqlite_db) as agent:
193
+ if agent is not None:
194
+ yield Runtime(
195
+ logic=self,
196
+ app=app,
197
+ agent=agent,
198
+ thread_id_manager=thread_id_manager,
199
+ )
200
+ return
201
+ raise NoProviderError
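To spell out the selection behavior implemented above, a hedged walk-through assuming the default provider tuple from `Logic.__init__`:

```python
from rag_demo.constants import LocalProviderType
from rag_demo.logic import Logic

# Default order of attempts: LlamaCpp -> Ollama -> HuggingFace.
logic = Logic()

# Preferring Ollama reorders the attempts to: Ollama -> LlamaCpp -> HuggingFace.
logic = Logic(preferred_provider_type=LocalProviderType.OLLAMA)

# Inside Logic.runtime(), the first provider whose get_agent() yields a non-None
# agent wins; if every provider yields None, NoProviderError is raised, and a
# preference that matches no provider raises UnknownPreferredProviderError.
```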
rag_demo/modes/_logic_provider.py CHANGED
@@ -10,11 +10,12 @@ if TYPE_CHECKING:
10
10
 
11
11
 
12
12
  class LogicProvider(Protocol):
13
- """ABC for classes that contain application logic."""
13
+ """Protocol for classes that contain application logic."""
14
14
 
15
15
  logic: Logic
16
16
 
17
- async def runtime(self) -> Runtime: ...
17
+ async def runtime(self) -> Runtime:
18
+ """Returns the application runtime of the parent app."""
18
19
 
19
20
 
20
21
  class LogicProviderScreen(Screen):
rag_demo/modes/chat.py CHANGED
@@ -3,7 +3,7 @@ from __future__ import annotations
3
3
  import time
4
4
  from contextlib import asynccontextmanager
5
5
  from pathlib import Path
6
- from typing import TYPE_CHECKING, Any
6
+ from typing import TYPE_CHECKING
7
7
 
8
8
  import pyperclip
9
9
  from textual.containers import HorizontalGroup, VerticalGroup, VerticalScroll
@@ -116,7 +116,7 @@ class Response(LogicProviderWidget):
116
116
  self.set_reactive(Response.content, content)
117
117
  self._stream: ResponseWriter | None = None
118
118
  self.__object_to_show_sentinel = object()
119
- self._object_to_show: Any = self.__object_to_show_sentinel
119
+ self._object_to_show: object = self.__object_to_show_sentinel
120
120
 
121
121
  def compose(self) -> ComposeResult:
122
122
  """Compose the initial content of the widget."""
@@ -137,7 +137,8 @@ class Response(LogicProviderWidget):
137
137
  self.query_one("#object-view", Pretty).display = False
138
138
  self.query_one("#stop", Button).display = False
139
139
 
140
- def set_shown_object(self, obj: Any) -> None: # noqa: ANN401
140
+ def set_shown_object(self, obj: object) -> None:
141
+ """Show an object using a Pretty Widget instead of showing markdown or raw response content."""
141
142
  self._object_to_show = obj
142
143
  self.query_one("#markdown-view", Markdown).display = False
143
144
  self.query_one("#raw-view", Label).display = False
@@ -146,6 +147,7 @@ class Response(LogicProviderWidget):
146
147
  self.query_one("#object-view", Pretty).display = True
147
148
 
148
149
  def clear_shown_object(self) -> None:
150
+ """Stop showing an object in the widget."""
149
151
  self._object_to_show = self.__object_to_show_sentinel
150
152
  self.query_one("#object-view", Pretty).display = False
151
153
  if self.show_raw:
@@ -192,14 +194,14 @@ class Response(LogicProviderWidget):
192
194
  try:
193
195
  pyperclip.copy(self.content)
194
196
  except pyperclip.PyperclipException as e:
195
- self.app.log.error(f"Error copying to clipboard with Pyperclip: {e}")
197
+ self.app.log.error("Error copying to clipboard with Pyperclip:", e)
196
198
  checkpoint2 = time.time()
197
199
  self.notify(f"Copied {len(self.content.splitlines())} lines of text to clipboard")
198
200
  end = time.time()
199
- self.app.log.info(f"Textual copy took {checkpoint - start:.6f} seconds")
200
- self.app.log.info(f"Pyperclip copy took {checkpoint2 - checkpoint:.6f} seconds")
201
- self.app.log.info(f"Notify took {end - checkpoint2:.6f} seconds")
202
- self.app.log.info(f"Total of {end - start:.6f} seconds")
201
+ self.app.log.info("Textual copy took", f"{checkpoint - start:.6f}", "seconds")
202
+ self.app.log.info("Pyperclip copy took", f"{checkpoint2 - checkpoint:.6f}", "seconds")
203
+ self.app.log.info("Notify took", f"{end - checkpoint2:.6f}", "seconds")
204
+ self.app.log.info("Total of", f"{end - start:.6f}", "seconds")
203
205
 
204
206
  def watch_show_raw(self) -> None:
205
207
  """Handle reactive updates to the show_raw attribute by changing the visibility of the child widgets.
rag_demo/probe.py ADDED
@@ -0,0 +1,129 @@
1
+ from __future__ import annotations
2
+
3
+ import contextlib
4
+ import platform
5
+ from pathlib import Path
6
+
7
+ import cpuinfo
8
+ import httpx
9
+ import huggingface_hub
10
+ import ollama
11
+ import psutil
12
+ import pynvml
13
+ from huggingface_hub.constants import HF_HUB_CACHE
14
+
15
+ try:
16
+ # llama-cpp-python is an optional dependency. If it is not installed in the dev environment then we need to ignore
17
+ # unresolved-import. If it is installed, then we need to ignore unused-ignore-comment (because there is no need to
18
+ # ignore unresolved-import in this case).
19
+ import llama_cpp # ty:ignore[unresolved-import, unused-ignore-comment]
20
+
21
+ LLAMA_AVAILABLE = True
22
+ except ImportError:
23
+ LLAMA_AVAILABLE = False
24
+
25
+
26
+ def probe_os() -> str:
27
+ """Returns the OS name (eg 'Linux' or 'Windows'), the system name (eg 'Java'), or an empty string if unknown."""
28
+ return platform.system()
29
+
30
+
31
+ def probe_architecture() -> str:
32
+ """Returns the machine architecture, such as 'i386'."""
33
+ return platform.machine()
34
+
35
+
36
+ def probe_cpu() -> str:
37
+ """Returns the name of the CPU, e.g. "Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz"."""
38
+ return cpuinfo.get_cpu_info()["brand_raw"]
39
+
40
+
41
+ def probe_ram() -> int:
42
+ """Returns the total amount of RAM in bytes."""
43
+ return psutil.virtual_memory().total
44
+
45
+
46
+ def probe_disk_space() -> int:
47
+ """Returns the amount of free space in the root directory (in bytes)."""
48
+ return psutil.disk_usage("/").free
49
+
50
+
51
+ def probe_llama_available() -> bool:
52
+ """Returns True if llama-cpp-python is installed, False otherwise."""
53
+ return LLAMA_AVAILABLE
54
+
55
+
56
+ def probe_llamacpp_gpu_support() -> bool:
57
+ """Returns True if the installed version of llama-cpp-python supports GPU offloading, False otherwise."""
58
+ return LLAMA_AVAILABLE and llama_cpp.llama_supports_gpu_offload()
59
+
60
+
61
+ def probe_huggingface_free_cache_space() -> int | None:
62
+ """Returns the amount of free space in the Hugging Face cache (in bytes), or None if it can't be determined."""
63
+ with contextlib.suppress(FileNotFoundError):
64
+ return psutil.disk_usage(HF_HUB_CACHE).free
65
+ for parent_dir in Path(HF_HUB_CACHE).parents:
66
+ with contextlib.suppress(FileNotFoundError):
67
+ return psutil.disk_usage(str(parent_dir)).free
68
+ return None
69
+
70
+
71
+ def probe_huggingface_cached_models() -> list[huggingface_hub.CachedRepoInfo] | None:
72
+ """Returns a list of models in the Hugging Face cache (possibly empty), or None if the cache doesn't exist."""
73
+ # The docstring for huggingface_hub.scan_cache_dir says it raises CacheNotFound "if the cache directory does not
74
+ # exist," and ValueError "if the cache directory is a file, instead of a directory."
75
+ with contextlib.suppress(ValueError, huggingface_hub.CacheNotFound):
76
+ return [repo for repo in huggingface_hub.scan_cache_dir().repos if repo.repo_type == "model"]
77
+ return None # Isn't it nice to be explicit?
78
+
79
+
80
+ def probe_huggingface_cached_datasets() -> list[huggingface_hub.CachedRepoInfo] | None:
81
+ """Returns a list of datasets in the Hugging Face cache (possibly empty), or None if the cache doesn't exist."""
82
+ with contextlib.suppress(ValueError, huggingface_hub.CacheNotFound):
83
+ return [repo for repo in huggingface_hub.scan_cache_dir().repos if repo.repo_type == "dataset"]
84
+ return None
85
+
86
+
87
+ def probe_nvidia() -> tuple[int, list[str]]:
88
+ """Detect available NVIDIA GPUs and CUDA driver version.
89
+
90
+ Returns:
91
+ tuple[int, list[str]]: A tuple (cuda_version, nv_gpus) where cuda_version is the installed CUDA driver
92
+ version and nv_gpus is a list of GPU models corresponding to installed NVIDIA GPUs
93
+ """
94
+ try:
95
+ pynvml.nvmlInit()
96
+ except pynvml.NVMLError:
97
+ return -1, []
98
+ cuda_version = -1
99
+ nv_gpus = []
100
+ try:
101
+ cuda_version = pynvml.nvmlSystemGetCudaDriverVersion()
102
+ for i in range(pynvml.nvmlDeviceGetCount()):
103
+ handle = pynvml.nvmlDeviceGetHandleByIndex(i)
104
+ nv_gpus.append(pynvml.nvmlDeviceGetName(handle))
105
+ except pynvml.NVMLError:
106
+ pass
107
+ finally:
108
+ with contextlib.suppress(pynvml.NVMLError):
109
+ pynvml.nvmlShutdown()
110
+ return cuda_version, nv_gpus
111
+
112
+
113
+ def probe_ollama() -> list[ollama.ListResponse.Model] | None:
114
+ """Returns a list of models installed in Ollama, or None if connecting to Ollama fails."""
115
+ with contextlib.suppress(ConnectionError):
116
+ return list(ollama.list().models)
117
+ return None
118
+
119
+
120
+ def probe_ollama_version() -> str | None:
121
+ """Returns the Ollama version string (e.g. "0.13.5"), or None if connecting to Ollama fails."""
122
+ # Yes, this uses private attributes, but that lets me use the Ollama Python lib's env var logic. If you use env
123
+ # vars to direct the app to a different Ollama server, this will query the same Ollama endpoint as the
124
+ # ollama.list() call above. Therefore I silence SLF001 here.
125
+ with contextlib.suppress(httpx.HTTPError, KeyError, ValueError):
126
+ response: httpx.Response = ollama._client._client.request("GET", "/api/version") # noqa: SLF001
127
+ response.raise_for_status()
128
+ return response.json()["version"]
129
+ return None
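For example, the probes can be combined into a quick environment summary (an illustrative snippet, not part of the package):

```python
from rag_demo import probe

print("OS:", probe.probe_os(), probe.probe_architecture())
print("CPU:", probe.probe_cpu())
print("RAM (GiB):", round(probe.probe_ram() / 2**30, 1))

cuda_version, gpus = probe.probe_nvidia()
print("CUDA driver:", cuda_version, "NVIDIA GPUs:", gpus)

# llama-cpp-python is optional (the 'llamacpp' extra), so check before relying on it.
if probe.probe_llama_available():
    print("Llama.cpp GPU offload:", probe.probe_llamacpp_gpu_support())

print("Ollama models:", probe.probe_ollama())
```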
@@ -1,23 +0,0 @@
1
- rag_demo/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
- rag_demo/__main__.py,sha256=Kak0eQWBRHVGDoWgoHs9j-Tvf_9DMzdurMxD7EM4Jr0,1054
3
- rag_demo/app.py,sha256=xejrtFApeTeyOQvWDq1H0XPyZEr8cQPn7q9KRwnV660,1812
4
- rag_demo/app.tcss,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
5
- rag_demo/db.py,sha256=53n662Hj9sTqPNcCI2Q-6Ca_HXv3kBQdAhXU4DLhwBM,3226
6
- rag_demo/dirs.py,sha256=b0VR76kXRHSRWzaXzmAhfPr3-8WKY3ZLW8aLlaPI3Do,309
7
- rag_demo/logic.py,sha256=7PTWPs9xZJ7bbEtNDMQTX6SX4JKG8HMiq2H_YUfM-CI,12602
8
- rag_demo/markdown.py,sha256=CxzshWfANeiieZkzMlLzpRaz7tBY2_tZQxhs7b2ImKM,551
9
- rag_demo/modes/__init__.py,sha256=ccvURDWz51_IotzzlO2OH3i4_Ih_MgnGlOK_JCh45dY,91
10
- rag_demo/modes/_logic_provider.py,sha256=__eO4XVbyRHkjV_D8OHsPJX5f2R8JoJPcNXhi-w_xFY,1277
11
- rag_demo/modes/chat.py,sha256=VigWSkw6R2ea95-wZ8tgtKIccev9A-ByzJj7nzglsog,13444
12
- rag_demo/modes/chat.tcss,sha256=YANlgYygiOr-e61N9HaGGdRPM36pdr-l4u72G0ozt4o,1032
13
- rag_demo/modes/config.py,sha256=0A8IdY-GOeqCd0kMs2KMgQEsFFeVXEcnowOugtR_Q84,2609
14
- rag_demo/modes/config.tcss,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
15
- rag_demo/modes/help.py,sha256=riV8o4WDtsim09R4cRi0xkpYLgj4CL38IrjEz_mrRmk,713
16
- rag_demo/modes/help.tcss,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
17
- rag_demo/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
18
- rag_demo/widgets/__init__.py,sha256=JQ1KQjdYQ4texHw2iT4IyBKgTW0SzNYbNoHAbrdCwtk,44
19
- rag_demo/widgets/escapable_input.py,sha256=VfFij4NOtQ4uX3YFETg5YPd0_nBMky9Xz-02oRdHu-w,4240
20
- jehoctor_rag_demo-0.2.0.dist-info/WHEEL,sha256=eh7sammvW2TypMMMGKgsM83HyA_3qQ5Lgg3ynoecH3M,79
21
- jehoctor_rag_demo-0.2.0.dist-info/entry_points.txt,sha256=-nDSFVcIqdTxzYM4fdveDk3xUKRhmlr_cRuqQechYh4,49
22
- jehoctor_rag_demo-0.2.0.dist-info/METADATA,sha256=wp1mdAqjB0be_1Uly4hwAoz0bjRUDI6gb6gK5SdrHRU,3531
23
- jehoctor_rag_demo-0.2.0.dist-info/RECORD,,