jehoctor-rag-demo 0.2.0__tar.gz → 0.2.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (33)
  1. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/PKG-INFO +56 -31
  2. jehoctor_rag_demo-0.2.1/README.md +90 -0
  3. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/pyproject.toml +38 -11
  4. jehoctor_rag_demo-0.2.1/src/rag_demo/__main__.py +42 -0
  5. jehoctor_rag_demo-0.2.1/src/rag_demo/agents/__init__.py +4 -0
  6. jehoctor_rag_demo-0.2.1/src/rag_demo/agents/base.py +40 -0
  7. jehoctor_rag_demo-0.2.1/src/rag_demo/agents/hugging_face.py +116 -0
  8. jehoctor_rag_demo-0.2.1/src/rag_demo/agents/llama_cpp.py +113 -0
  9. jehoctor_rag_demo-0.2.1/src/rag_demo/agents/ollama.py +91 -0
  10. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/app.py +1 -1
  11. jehoctor_rag_demo-0.2.1/src/rag_demo/app_protocol.py +101 -0
  12. jehoctor_rag_demo-0.2.1/src/rag_demo/constants.py +11 -0
  13. jehoctor_rag_demo-0.2.1/src/rag_demo/logic.py +201 -0
  14. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/_logic_provider.py +3 -2
  15. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/chat.py +10 -8
  16. jehoctor_rag_demo-0.2.1/src/rag_demo/probe.py +129 -0
  17. jehoctor_rag_demo-0.2.0/README.md +0 -69
  18. jehoctor_rag_demo-0.2.0/src/rag_demo/__main__.py +0 -31
  19. jehoctor_rag_demo-0.2.0/src/rag_demo/logic.py +0 -287
  20. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/__init__.py +0 -0
  21. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/app.tcss +0 -0
  22. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/db.py +0 -0
  23. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/dirs.py +0 -0
  24. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/markdown.py +0 -0
  25. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/__init__.py +0 -0
  26. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/chat.tcss +0 -0
  27. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/config.py +0 -0
  28. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/config.tcss +0 -0
  29. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/help.py +0 -0
  30. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/modes/help.tcss +0 -0
  31. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/py.typed +0 -0
  32. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/widgets/__init__.py +0 -0
  33. {jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/src/rag_demo/widgets/escapable_input.py +0 -0
{jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/PKG-INFO

@@ -1,10 +1,11 @@
  Metadata-Version: 2.3
  Name: jehoctor-rag-demo
- Version: 0.2.0
+ Version: 0.2.1
  Summary: Chat with Wikipedia
  Author: James Hoctor
  Author-email: James Hoctor <JEHoctor@protonmail.com>
  Requires-Dist: aiosqlite==0.21.0
+ Requires-Dist: bitsandbytes>=0.49.1
  Requires-Dist: chromadb>=1.3.4
  Requires-Dist: datasets>=4.4.1
  Requires-Dist: httpx>=0.28.1
@@ -16,7 +17,6 @@ Requires-Dist: langchain-huggingface>=1.1.0
  Requires-Dist: langchain-ollama>=1.0.0
  Requires-Dist: langchain-openai>=1.0.2
  Requires-Dist: langgraph-checkpoint-sqlite>=3.0.1
- Requires-Dist: llama-cpp-python>=0.3.16
  Requires-Dist: nvidia-ml-py>=13.590.44
  Requires-Dist: ollama>=0.6.0
  Requires-Dist: platformdirs>=4.5.0
@@ -24,9 +24,13 @@ Requires-Dist: psutil>=7.1.3
  Requires-Dist: py-cpuinfo>=9.0.0
  Requires-Dist: pydantic>=2.12.4
  Requires-Dist: pyperclip>=1.11.0
+ Requires-Dist: sentence-transformers>=5.2.2
  Requires-Dist: textual>=6.5.0
+ Requires-Dist: transformers[torch]>=4.57.6
  Requires-Dist: typer>=0.20.0
- Requires-Python: >=3.12
+ Requires-Dist: llama-cpp-python>=0.3.16 ; extra == 'llamacpp'
+ Requires-Python: ~=3.12.0
+ Provides-Extra: llamacpp
  Description-Content-Type: text/markdown

  # RAG-demo
@@ -35,50 +39,43 @@ Chat with (a small portion of) Wikipedia

  ⚠️ RAG functionality is still under development. ⚠️

- ![app screenshot](screenshots/screenshot_062f205a.png "App screenshot (this AI response is not accurate)")
+ ![app screenshot](screenshots/screenshot_0.2.0.png "App screenshot")

  ## Requirements

- 1. [uv](https://docs.astral.sh/uv/)
- 2. At least one of the following:
-    - A suitable terminal emulator. In particular, on macOS consider using [iTerm2](https://iterm2.com/) instead of the default Terminal.app ([explanation](https://textual.textualize.io/FAQ/#why-doesnt-textual-look-good-on-macos)). On Linux, you might want to try [kitty](https://sw.kovidgoyal.net/kitty/), [wezterm](https://wezterm.org/), [alacritty](https://alacritty.org/), or [ghostty](https://ghostty.org/) instead of the terminal that came with your DE ([reason](https://darren.codes/posts/textual-copy-paste/)). Windows Terminal should be fine as far as I know.
-    - Any common web browser
+ 1. The [uv](https://docs.astral.sh/uv/) Python package manager
+    - Installing and updating `uv` is easy by following [the docs](https://docs.astral.sh/uv/getting-started/installation/).
+    - As of 2026-01-25, I'm developing using `uv` version 0.9.26, and using the new experimental `--torch-backend` option.
+ 2. A terminal emulator or web browser
+    - Any common web browser will work.
+    - Some terminal emulators will work better than others.
+      See [Notes on terminal emulators](#notes-on-terminal-emulators) below.

- ## Optional stuff that could make your experience better
+ ### Notes on terminal emulators
+
+ Certain terminal emulators will not work with some features of this program.
+ In particular, on macOS consider using [iTerm2](https://iterm2.com/) instead of the default Terminal.app ([explanation](https://textual.textualize.io/FAQ/#why-doesnt-textual-look-good-on-macos)).
+ On Linux you might want to try [kitty](https://sw.kovidgoyal.net/kitty/), [wezterm](https://wezterm.org/), [alacritty](https://alacritty.org/), or [ghostty](https://ghostty.org/), instead of the terminal that came with your desktop environment ([reason](https://darren.codes/posts/textual-copy-paste/)).
+ Windows Terminal should be fine as far as I know.
+
+ ### Optional dependencies

  1. [Hugging Face login](https://huggingface.co/docs/huggingface_hub/quick-start#login)
  2. API key for your favorite LLM provider (support coming soon)
  3. Ollama installed on your system if you have a GPU
  4. Run RAG-demo on a more capable (bigger GPU) machine over SSH if you can. It is a terminal app after all.
+ 5. A C compiler if you want to build Llama.cpp from source.

-
- ## Run from the repository
-
- First, clone this repository. Then, run one of the options below.
+ ## Run the latest version

  Run in a terminal:
  ```bash
- uv run chat
+ uvx --torch-backend=auto --from=jehoctor-rag-demo@latest chat
  ```

  Or run in a web browser:
  ```bash
- uv run textual serve chat
- ```
-
- ## Run from the latest version on PyPI
-
- TODO: test uv automatic torch backend selection:
- https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection
-
- Run in a terminal:
- ```bash
- uvx --from=jehoctor-rag-demo chat
- ```
-
- Or run in a web browser:
- ```bash
- uvx --from=jehoctor-rag-demo textual serve chat
+ uvx --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve chat
  ```

  ## CUDA acceleration via Llama.cpp
@@ -86,15 +83,43 @@ uvx --from=jehoctor-rag-demo textual serve chat
  If you have an NVIDIA GPU with CUDA and build tools installed, you might be able to get CUDA acceleration without installing Ollama.

  ```bash
- CMAKE_ARGS="-DGGML_CUDA=on" uv run chat
+ CMAKE_ARGS="-DGGML_CUDA=on" uv run --extra=llamacpp chat
  ```

  ## Metal acceleration via Llama.cpp (on Apple Silicon)

  On an Apple Silicon machine, make sure `uv` runs an ARM interpreter as this should cause it to install Llama.cpp with Metal support.
+ Also, run with the extra group `llamacpp`.
+ Try this:
+
+ ```bash
+ uvx --python-platform=aarch64-apple-darwin --torch-backend=auto --from=jehoctor-rag-demo[llamacpp]@latest chat
+ ```

  ## Ollama on Linux

  Remember that you have to keep Ollama up-to-date manually on Linux.
  A recent version of Ollama (v0.11.10 or later) is required to run the [embedding model we use](https://ollama.com/library/embeddinggemma).
  See this FAQ: https://docs.ollama.com/faq#how-can-i-upgrade-ollama.
+
+ ## Project feature roadmap
+
+ - ❌ RAG functionality
+ - ❌ torch inference via the Langchain local Hugging Face inference integration
+ - ❌ uv automatic torch backend selection (see [the docs](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection))
+ - ❌ OpenAI integration
+ - ❌ Anthropic integration
+
+ ## Run from the repository
+
+ First, clone this repository. Then, run one of the options below.
+
+ Run in a terminal:
+ ```bash
+ uv run chat
+ ```
+
+ Or run in a web browser:
+ ```bash
+ uv run textual serve chat
+ ```
jehoctor_rag_demo-0.2.1/README.md (new file)

@@ -0,0 +1,90 @@
+ # RAG-demo
+
+ Chat with (a small portion of) Wikipedia
+
+ ⚠️ RAG functionality is still under development. ⚠️
+
+ ![app screenshot](screenshots/screenshot_0.2.0.png "App screenshot")
+
+ ## Requirements
+
+ 1. The [uv](https://docs.astral.sh/uv/) Python package manager
+    - Installing and updating `uv` is easy by following [the docs](https://docs.astral.sh/uv/getting-started/installation/).
+    - As of 2026-01-25, I'm developing using `uv` version 0.9.26, and using the new experimental `--torch-backend` option.
+ 2. A terminal emulator or web browser
+    - Any common web browser will work.
+    - Some terminal emulators will work better than others.
+      See [Notes on terminal emulators](#notes-on-terminal-emulators) below.
+
+ ### Notes on terminal emulators
+
+ Certain terminal emulators will not work with some features of this program.
+ In particular, on macOS consider using [iTerm2](https://iterm2.com/) instead of the default Terminal.app ([explanation](https://textual.textualize.io/FAQ/#why-doesnt-textual-look-good-on-macos)).
+ On Linux you might want to try [kitty](https://sw.kovidgoyal.net/kitty/), [wezterm](https://wezterm.org/), [alacritty](https://alacritty.org/), or [ghostty](https://ghostty.org/), instead of the terminal that came with your desktop environment ([reason](https://darren.codes/posts/textual-copy-paste/)).
+ Windows Terminal should be fine as far as I know.
+
+ ### Optional dependencies
+
+ 1. [Hugging Face login](https://huggingface.co/docs/huggingface_hub/quick-start#login)
+ 2. API key for your favorite LLM provider (support coming soon)
+ 3. Ollama installed on your system if you have a GPU
+ 4. Run RAG-demo on a more capable (bigger GPU) machine over SSH if you can. It is a terminal app after all.
+ 5. A C compiler if you want to build Llama.cpp from source.
+
+ ## Run the latest version
+
+ Run in a terminal:
+ ```bash
+ uvx --torch-backend=auto --from=jehoctor-rag-demo@latest chat
+ ```
+
+ Or run in a web browser:
+ ```bash
+ uvx --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve chat
+ ```
+
+ ## CUDA acceleration via Llama.cpp
+
+ If you have an NVIDIA GPU with CUDA and build tools installed, you might be able to get CUDA acceleration without installing Ollama.
+
+ ```bash
+ CMAKE_ARGS="-DGGML_CUDA=on" uv run --extra=llamacpp chat
+ ```
+
+ ## Metal acceleration via Llama.cpp (on Apple Silicon)
+
+ On an Apple Silicon machine, make sure `uv` runs an ARM interpreter as this should cause it to install Llama.cpp with Metal support.
+ Also, run with the extra group `llamacpp`.
+ Try this:
+
+ ```bash
+ uvx --python-platform=aarch64-apple-darwin --torch-backend=auto --from=jehoctor-rag-demo[llamacpp]@latest chat
+ ```
+
+ ## Ollama on Linux
+
+ Remember that you have to keep Ollama up-to-date manually on Linux.
+ A recent version of Ollama (v0.11.10 or later) is required to run the [embedding model we use](https://ollama.com/library/embeddinggemma).
+ See this FAQ: https://docs.ollama.com/faq#how-can-i-upgrade-ollama.
+
+ ## Project feature roadmap
+
+ - ❌ RAG functionality
+ - ❌ torch inference via the Langchain local Hugging Face inference integration
+ - ❌ uv automatic torch backend selection (see [the docs](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection))
+ - ❌ OpenAI integration
+ - ❌ Anthropic integration
+
+ ## Run from the repository
+
+ First, clone this repository. Then, run one of the options below.
+
+ Run in a terminal:
+ ```bash
+ uv run chat
+ ```
+
+ Or run in a web browser:
+ ```bash
+ uv run textual serve chat
+ ```
{jehoctor_rag_demo-0.2.0 → jehoctor_rag_demo-0.2.1}/pyproject.toml

@@ -1,16 +1,19 @@
  [project]
  name = "jehoctor-rag-demo"
- version = "0.2.0"
+ version = "0.2.1"
  description = "Chat with Wikipedia"
  readme = "README.md"
  authors = [
      { name = "James Hoctor", email = "JEHoctor@protonmail.com" }
  ]
- requires-python = ">=3.12"
+ requires-python = "~=3.12.0"
  # TODO: Reverse pinning of aiosqlite to 0.21.0 to work around this issue:
  # https://github.com/langchain-ai/langgraph/issues/6583
+ # TODO: Should I depend on xformers "for a more memory-efficient attention implementation"?
+ # https://docs.langchain.com/oss/python/integrations/llms/huggingface_pipelines
  dependencies = [
      "aiosqlite==0.21.0",
+     "bitsandbytes>=0.49.1",
      "chromadb>=1.3.4",
      "datasets>=4.4.1",
      "httpx>=0.28.1",
@@ -22,7 +25,6 @@ dependencies = [
      "langchain-ollama>=1.0.0",
      "langchain-openai>=1.0.2",
      "langgraph-checkpoint-sqlite>=3.0.1",
-     "llama-cpp-python>=0.3.16",
      "nvidia-ml-py>=13.590.44",
      "ollama>=0.6.0",
      "platformdirs>=4.5.0",
@@ -30,20 +32,31 @@ dependencies = [
      "py-cpuinfo>=9.0.0",
      "pydantic>=2.12.4",
      "pyperclip>=1.11.0",
+     "sentence-transformers>=5.2.2",
      "textual>=6.5.0",
+     "transformers[torch]>=4.57.6",
      "typer>=0.20.0",
  ]

+ [project.optional-dependencies]
+ llamacpp = [
+     "llama-cpp-python>=0.3.16",
+ ]
+
  [project.scripts]
  chat = "rag_demo.__main__:main"

  [dependency-groups]
  dev = [
-     "pytest>=8.4.2",
      "ruff>=0.14.3",
      "mypy>=1.18.2",
      "textual-dev>=1.8.0",
      "ipython>=9.7.0",
+     "ty>=0.0.13",
+     "uv-outdated>=1.0.4",
+ ]
+ test = [
+     "pytest>=8.4.2",
      "pytest-cov>=7.0.0",
      "pytest-asyncio>=1.3.0",
  ]
@@ -62,6 +75,7 @@ explicit = true
  [tool.uv.sources]
  llama-cpp-python = [
      { index = "llama-cpp-metal", marker = "platform_machine == 'arm64' and sys_platform == 'darwin'" },
+     { index = "llama-cpp-metal", marker = "platform_machine == 'aarch64' and sys_platform == 'darwin'" },
  ]

  [build-system]
@@ -75,15 +89,25 @@ module-name = "rag_demo"
  line-length = 120

  [tool.ruff.lint]
- per-file-ignores = { "__init__.py" = ["F401"] } # Ignore unused-import in all __init__.py files.
  select = ["ALL"]
  ignore = [
-     "E501", # Handled by ruff format (line-too-long)
-     "D100", # undocumented-public-module
-     "D104", # undocumented-public-package
-     "D203", # Conflicts with Google style D211/D212
-     "ANN101", # Missing type annotation for self
-     "ANN102", # Missing type annotation for cls
+     "E501",    # Handled by ruff format (line-too-long)
+     "D100",    # undocumented-public-module
+     "D104",    # undocumented-public-package
+     "D203",    # Conflicts with Google style D211/D212
+     "ANN101",  # Missing type annotation for self
+     "ANN102",  # Missing type annotation for cls
+     "PLE1205", # This rule falsely identifies Textual Loggers as standard Python Loggers, creating false positives.
+     "TRY400",  # Textual.Logger doesn't provide an exception logger, so it's fine to use Logger.error instead.
+ ]
+
+ [tool.ruff.lint.per-file-ignores]
+ "__init__.py" = ["F401"] # Ignore unused-import in all __init__.py files.
+ "tests/*" = [
+     "S101",    # Assert statements are allowed in tests.
+     "INP001",  # No need to create __init__.py files in the tests/ directory; only pytest runs the tests.
+     "C419",    # It's OK to use extra list comprehensions in tests to make the output more informative.
+     "PLR2004", # There are going to be some magic values in the tests. It's OK.
  ]

  [tool.ruff.lint.pydocstyle]
@@ -100,3 +124,6 @@ files = ["src/", "tests/"]

  [tool.mypy.plugins]
  pydantic.mypy.plugins = { enabled = true }
+
+ [tool.pytest.ini_options]
+ asyncio_mode = "auto"
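
The dependency-group change above moves pytest and its plugins out of `dev` into a new `test` group. A plausible way to exercise that group with uv (not taken from the project's docs; note that uv installs the `dev` group by default, while other groups are opt-in):

```bash
# Sync the environment with the new "test" dependency group, then run the suite.
uv sync --group test
uv run --group test pytest
```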
jehoctor_rag_demo-0.2.1/src/rag_demo/__main__.py (new file)

@@ -0,0 +1,42 @@
+ import time
+
+ # Measure the application start time.
+ APPLICATION_START_TIME = time.time()
+
+ # Disable "module import not at top of file" (aka E402) when importing Typer and other early imports. This is necessary
+ # so that the initialization of these modules is included in the application startup time.
+ from typing import Annotated  # noqa: E402
+
+ import typer  # noqa: E402
+
+ from rag_demo.constants import LocalProviderType  # noqa: E402
+
+
+ def _main(
+     name: Annotated[str | None, typer.Option(help="The name you want the AI to use with you.")] = None,
+     provider: Annotated[LocalProviderType | None, typer.Option(help="The local provider to prefer.")] = None,
+ ) -> None:
+     """Talk to Wikipedia."""
+     # Import here so that imports run within the typer.run context for prettier stack traces if errors occur.
+     # We ignore PLC0415 because we do not want these imports to be at the top of the module as is usually preferred.
+     import transformers  # noqa: PLC0415
+
+     from rag_demo.app import RAGDemo  # noqa: PLC0415
+     from rag_demo.logic import Logic  # noqa: PLC0415
+
+     # The transformers library likes to print text that interferes with the TUI. Disable it.
+     transformers.logging.set_verbosity(verbosity=transformers.logging.CRITICAL)
+     transformers.logging.disable_progress_bar()
+
+     logic = Logic(username=name, preferred_provider_type=provider, application_start_time=APPLICATION_START_TIME)
+     app = RAGDemo(logic)
+     app.run()
+
+
+ def main() -> None:
+     """Entrypoint for the rag demo, specifically the `chat` command."""
+     typer.run(_main)
+
+
+ if __name__ == "__main__":
+     main()
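
Given the `chat` script declared in pyproject.toml and the Typer options defined above, invocations like the following should work from a checkout. The name is only an example; `--provider` accepts a `LocalProviderType` value whose members live in `constants.py`, which is not shown in this diff.

```bash
# Show the Typer-generated help, then start the app with a preferred display name.
uv run chat --help
uv run chat --name Alice
```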
jehoctor_rag_demo-0.2.1/src/rag_demo/agents/__init__.py (new file)

@@ -0,0 +1,4 @@
+ from .base import Agent, AgentProvider
+ from .hugging_face import HuggingFaceAgent, HuggingFaceAgentProvider
+ from .llama_cpp import LlamaCppAgent, LlamaCppAgentProvider
+ from .ollama import OllamaAgent, OllamaAgentProvider
jehoctor_rag_demo-0.2.1/src/rag_demo/agents/base.py (new file)

@@ -0,0 +1,40 @@
+ from __future__ import annotations
+
+ from typing import TYPE_CHECKING, Final, Protocol
+
+ if TYPE_CHECKING:
+     from collections.abc import AsyncIterator
+     from contextlib import AbstractAsyncContextManager
+     from pathlib import Path
+
+     from rag_demo.app_protocol import AppProtocol
+     from rag_demo.constants import LocalProviderType
+
+
+ class Agent(Protocol):
+     """An LLM agent that supports streaming responses asynchronously."""
+
+     def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
+         """Stream a response from the agent.
+
+         Args:
+             user_message (str): User's next prompt in the conversation.
+             thread_id (str): Identifier for the current thread/conversation.
+             app (AppProtocol): Application interface, commonly used for logging.
+
+         Yields:
+             str: A token from the agent's response.
+         """
+
+
+ class AgentProvider(Protocol):
+     """A strategy for creating LLM agents."""
+
+     type: Final[LocalProviderType]
+
+     def get_agent(self, checkpoints_sqlite_db: str | Path) -> AbstractAsyncContextManager[Agent | None]:
+         """Attempt to create an agent.
+
+         Args:
+             checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
+         """
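
For context, a caller that only knows about these protocols would use a provider roughly like this. This is an illustrative sketch, not code from the package: the `app` argument stands in for whatever implements `AppProtocol`, and the database path is arbitrary.

```python
# Sketch: consuming any AgentProvider through the protocol above.
from rag_demo.agents import LlamaCppAgentProvider


async def stream_once(app, db_path: str) -> str:
    provider = LlamaCppAgentProvider()
    reply = ""
    async with provider.get_agent(db_path) as agent:
        if agent is None:
            # The provider could not build an agent (e.g. llama-cpp-python is not installed).
            return reply
        async for token in agent.astream("Hello!", thread_id="example-thread", app=app):
            reply += token
    return reply
```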
jehoctor_rag_demo-0.2.1/src/rag_demo/agents/hugging_face.py (new file)

@@ -0,0 +1,116 @@
+ from __future__ import annotations
+
+ import asyncio
+ import sqlite3
+ from contextlib import asynccontextmanager
+ from typing import TYPE_CHECKING, Final
+
+ from huggingface_hub import hf_hub_download
+ from langchain.agents import create_agent
+ from langchain.messages import AIMessageChunk, HumanMessage
+ from langchain_huggingface import ChatHuggingFace, HuggingFaceEmbeddings, HuggingFacePipeline
+ from langgraph.checkpoint.sqlite import SqliteSaver
+
+ from rag_demo.constants import LocalProviderType
+
+ if TYPE_CHECKING:
+     from collections.abc import AsyncIterator
+     from pathlib import Path
+
+     from rag_demo.app_protocol import AppProtocol
+
+
+ class HuggingFaceAgent:
+     """An LLM agent powered by Hugging Face local pipelines."""
+
+     def __init__(
+         self,
+         checkpoints_sqlite_db: str | Path,
+         model_id: str,
+         embedding_model_id: str,
+     ) -> None:
+         """Initialize the HuggingFaceAgent.
+
+         Args:
+             checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
+             model_id (str): Hugging Face model ID for the LLM.
+             embedding_model_id (str): Hugging Face model ID for the embedding model.
+         """
+         self.checkpoints_sqlite_db = checkpoints_sqlite_db
+         self.model_id = model_id
+         self.embedding_model_id = embedding_model_id
+
+         self.llm = ChatHuggingFace(
+             llm=HuggingFacePipeline.from_model_id(
+                 model_id=model_id,
+                 task="text-generation",
+                 device_map="auto",
+                 pipeline_kwargs={"max_new_tokens": 4096},
+             ),
+         )
+         self.embed = HuggingFaceEmbeddings(model_name=embedding_model_id)
+         self.agent = create_agent(
+             model=self.llm,
+             system_prompt="You are a helpful assistant.",
+             checkpointer=SqliteSaver(sqlite3.Connection(self.checkpoints_sqlite_db, check_same_thread=False)),
+         )
+
+     async def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
+         """Stream a response from the agent.
+
+         Args:
+             user_message (str): User's next prompt in the conversation.
+             thread_id (str): Identifier for the current thread/conversation.
+             app (AppProtocol): Application interface, commonly used for logging.
+
+         Yields:
+             str: A token from the agent's response.
+         """
+         agent_stream = self.agent.stream(
+             {"messages": [HumanMessage(content=user_message)]},
+             {"configurable": {"thread_id": thread_id}},
+             stream_mode="messages",
+         )
+         for message_chunk, _ in agent_stream:
+             if isinstance(message_chunk, AIMessageChunk):
+                 token = message_chunk.content
+                 if isinstance(token, str):
+                     yield token
+                 else:
+                     app.log.error("Received message content of type", type(token))
+             else:
+                 app.log.error("Received message chunk of type", type(message_chunk))
+
+
+ def _hf_downloads() -> None:
+     hf_hub_download(
+         repo_id="Qwen/Qwen3-0.6B",  # 1.5GB
+         filename="model.safetensors",
+         revision="c1899de289a04d12100db370d81485cdf75e47ca",
+     )
+     hf_hub_download(
+         repo_id="unsloth/embeddinggemma-300m",  # 1.21GB
+         filename="model.safetensors",
+         revision="bfa3c846ac738e62aa61806ef9112d34acb1dc5a",
+     )
+
+
+ class HuggingFaceAgentProvider:
+     """Create LLM agents using Hugging Face local pipelines."""
+
+     type: Final[LocalProviderType] = LocalProviderType.HUGGING_FACE
+
+     @asynccontextmanager
+     async def get_agent(self, checkpoints_sqlite_db: str | Path) -> AsyncIterator[HuggingFaceAgent]:
+         """Create a Hugging Face local pipeline agent.
+
+         Args:
+             checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
+         """
+         loop = asyncio.get_running_loop()
+         await loop.run_in_executor(None, _hf_downloads)
+         yield HuggingFaceAgent(
+             checkpoints_sqlite_db,
+             model_id="Qwen/Qwen3-0.6B",
+             embedding_model_id="unsloth/embeddinggemma-300m",
+         )
jehoctor_rag_demo-0.2.1/src/rag_demo/agents/llama_cpp.py (new file)

@@ -0,0 +1,113 @@
+ from __future__ import annotations
+
+ import asyncio
+ from contextlib import asynccontextmanager
+ from typing import TYPE_CHECKING, Final
+
+ import aiosqlite
+ from huggingface_hub import hf_hub_download
+ from langchain.agents import create_agent
+ from langchain.messages import AIMessageChunk, HumanMessage
+ from langchain_community.chat_models import ChatLlamaCpp
+ from langchain_community.embeddings import LlamaCppEmbeddings
+ from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver
+
+ from rag_demo import probe
+ from rag_demo.constants import LocalProviderType
+
+ if TYPE_CHECKING:
+     from collections.abc import AsyncIterator
+     from pathlib import Path
+
+     from rag_demo.app_protocol import AppProtocol
+
+
+ class LlamaCppAgent:
+     """An LLM agent powered by Llama.cpp."""
+
+     def __init__(
+         self,
+         checkpoints_conn: aiosqlite.Connection,
+         model_path: str,
+         embedding_model_path: str,
+     ) -> None:
+         """Initialize the LlamaCppAgent.
+
+         Args:
+             checkpoints_conn (aiosqlite.Connection): Connection to SQLite checkpoint database.
+             model_path (str): Path to Llama.cpp model.
+             embedding_model_path (str): Path to Llama.cpp embedding model.
+         """
+         self.checkpoints_conn = checkpoints_conn
+         self.llm = ChatLlamaCpp(model_path=model_path, verbose=False)
+         self.embed = LlamaCppEmbeddings(model_path=embedding_model_path, verbose=False)
+         self.agent = create_agent(
+             model=self.llm,
+             system_prompt="You are a helpful assistant.",
+             checkpointer=AsyncSqliteSaver(self.checkpoints_conn),
+         )
+
+     async def astream(self, user_message: str, thread_id: str, app: AppProtocol) -> AsyncIterator[str]:
+         """Stream a response from the agent.
+
+         Args:
+             user_message (str): User's next prompt in the conversation.
+             thread_id (str): Identifier for the current thread/conversation.
+             app (AppProtocol): Application interface, commonly used for logging.
+
+         Yields:
+             str: A token from the agent's response.
+         """
+         agent_stream = self.agent.astream(
+             {"messages": [HumanMessage(content=user_message)]},
+             {"configurable": {"thread_id": thread_id}},
+             stream_mode="messages",
+         )
+         async for message_chunk, _ in agent_stream:
+             if isinstance(message_chunk, AIMessageChunk):
+                 token = message_chunk.content
+                 if isinstance(token, str):
+                     yield token
+                 else:
+                     app.log.error("Received message content of type", type(token))
+             else:
+                 app.log.error("Received message chunk of type", type(message_chunk))
+
+
+ def _hf_downloads() -> tuple[str, str]:
+     model_path = hf_hub_download(
+         repo_id="bartowski/google_gemma-3-4b-it-GGUF",
+         filename="google_gemma-3-4b-it-Q6_K_L.gguf",  # 3.35GB
+         revision="71506238f970075ca85125cd749c28b1b0eee84e",
+     )
+     embedding_model_path = hf_hub_download(
+         repo_id="CompendiumLabs/bge-small-en-v1.5-gguf",
+         filename="bge-small-en-v1.5-q8_0.gguf",  # 36.8MB
+         revision="d32f8c040ea3b516330eeb75b72bcc2d3a780ab7",
+     )
+     return model_path, embedding_model_path
+
+
+ class LlamaCppAgentProvider:
+     """Create LLM agents using Llama.cpp."""
+
+     type: Final[LocalProviderType] = LocalProviderType.LLAMA_CPP
+
+     @asynccontextmanager
+     async def get_agent(self, checkpoints_sqlite_db: str | Path) -> AsyncIterator[LlamaCppAgent | None]:
+         """Attempt to create a Llama.cpp agent.
+
+         Args:
+             checkpoints_sqlite_db (str | Path): Connection string for SQLite database used for LangChain checkpoints.
+         """
+         if probe.probe_llama_available():
+             loop = asyncio.get_running_loop()
+             model_path, embedding_model_path = await loop.run_in_executor(None, _hf_downloads)
+             async with aiosqlite.connect(database=checkpoints_sqlite_db) as checkpoints_conn:
+                 yield LlamaCppAgent(
+                     checkpoints_conn=checkpoints_conn,
+                     model_path=model_path,
+                     embedding_model_path=embedding_model_path,
+                 )
+         else:
+             yield None
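
The provider-selection logic itself lives in src/rag_demo/logic.py (+201 lines, not shown in this diff). Judging only from the interfaces above, falling back across providers could look something like this hypothetical sketch; the ordering, the `app` object, and the database path are assumptions, not the package's actual behavior:

```python
# Hypothetical sketch: try providers in order and use the first one that yields an agent.
from rag_demo.agents import HuggingFaceAgentProvider, LlamaCppAgentProvider


async def answer_with_first_available(app, db_path: str, prompt: str) -> str | None:
    for provider in (LlamaCppAgentProvider(), HuggingFaceAgentProvider()):
        async with provider.get_agent(db_path) as agent:
            if agent is None:
                continue  # e.g. the optional 'llamacpp' extra is not installed
            return "".join([token async for token in agent.astream(prompt, thread_id="t1", app=app)])
    return None
```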