PyPI - optichat - Versions diffs - 0.1.0__tar.gz - Mend

optichat 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (22)

optichat-0.1.0/PKG-INFO +138 -0
optichat-0.1.0/Readme.md +114 -0
optichat-0.1.0/app/__init__.py +1 -0
optichat-0.1.0/app/connect_models.py +340 -0
optichat-0.1.0/app/memory.py +580 -0
optichat-0.1.0/app/pipeline.py +229 -0
optichat-0.1.0/app/pipeline_functions.py +1162 -0
optichat-0.1.0/db/__init__.py +1 -0
optichat-0.1.0/db/database.py +445 -0
optichat-0.1.0/main.py +5 -0
optichat-0.1.0/optichat.egg-info/PKG-INFO +138 -0
optichat-0.1.0/optichat.egg-info/SOURCES.txt +20 -0
optichat-0.1.0/optichat.egg-info/dependency_links.txt +1 -0
optichat-0.1.0/optichat.egg-info/requires.txt +13 -0
optichat-0.1.0/optichat.egg-info/top_level.txt +4 -0
optichat-0.1.0/pyproject.toml +40 -0
optichat-0.1.0/setup.cfg +4 -0
optichat-0.1.0/ui/__init__.py +1 -0
optichat-0.1.0/ui/help.py +178 -0
optichat-0.1.0/ui/layout.py +540 -0
optichat-0.1.0/ui/layout_assets.py +513 -0
optichat-0.1.0/ui/style.tcss +417 -0

optichat-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,138 @@
+Metadata-Version: 2.4
+Name: optichat
+Version: 0.1.0
+Summary: An advanced terminal-based chat application built with Python and Textual.
+Author: OptiChat Contributors
+License-Expression: MIT
+Classifier: Programming Language :: Python :: 3
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+Requires-Dist: textual
+Requires-Dist: textual-dev
+Requires-Dist: langchain
+Requires-Dist: langchain-community
+Requires-Dist: langchain-core
+Requires-Dist: langchain-openai
+Requires-Dist: langchain-anthropic
+Requires-Dist: langchain-google-genai
+Requires-Dist: langgraph
+Requires-Dist: ollama
+Requires-Dist: chromadb
+Requires-Dist: sentence-transformers
+Requires-Dist: ddgs
+# OptiChat
+OptiChat is an advanced terminal-based chat application built with Python and Textual. It features a robust multi-tier memory system, personalized memory tracking, dynamic model connectivity (including cloud and local Ollama models), and a sophisticated prompt construction pipeline for high-quality, contextual AI responses.
+## 🌟 Key Features
+*   **Terminal-based UI**: A beautiful, responsive interface built with Textual, featuring tabs, chat session sidebars, and customizable themes.
+*   **Multi-Tier Memory System**:
+    *   **Short-Term Memory**: Token-budgeted rolling window for recent context.
+    *   **LRU Memory**: Background-processed cache of frequently used messages.
+    *   **Long-Term Memory**: Persistent vector store (ChromaDB) for semantic search across conversations.
+    *   **Personalized Memory**: Automatically learns and updates user preferences, interests, and interaction styles with conflict resolution.
+*   **Dynamic Model Connectivity**: Support for OpenAI, Anthropic, Gemini, and local models via Ollama.
+*   **Prompt Construction Pipeline**: Utilizes LangGraph to dynamically classify queries, retrieve memory, apply personalization, and enforce structured output schemas.
+*   **Chat Trace Logs**: Every assistant response includes a collapsible section showing the model's chain-of-thought ToDo plan – what the model thought before responding.
+*   **Adaptive Response**: Response length and depth dynamically adapt to question complexity (simple → concise, complex → thorough and comprehensive).
+*   **Auto Chat Naming**: New chats are automatically renamed based on your first question (2-3 word title) via a background thread.
+*   **Secure Local Storage**: All data, including settings, API keys (via `.env`), SQLite databases for chats, and ChromaDB vectors, are stored securely in your local `~/.optichat/` directory.
+## 🏗️ Architecture
+### Storage
+OptiChat stores its data locally in `~/.optichat/`. This includes:
+- `config.json` for global settings.
+- `optichat.db` (SQLite) for storing chats, messages, and session metadata.
+- `chroma/` for ChromaDB vector embeddings.
+- Flat files for chat-specific short-term and LRU caches.
+### Memory Pipeline
+1.  **Short-term**: Retains the most recent 3-5 messages.
+2.  **LRU Cache**: Frequently accessed context swapped in from long-term memory.
+3.  **Long-term**: Chunks and embeds responses into ChromaDB for semantic retrieval.
+4.  **Personalized**: Analyzes user behavior and explicitly stated preferences to tailor AI responses.
+### Prompt Construction
+Using LangChain and LangGraph, the pipeline:
+1.  Classifies the user input (type, complexity).
+2.  Retrieves relevant context (Short-term, LRU, or Long-term via semantic search).
+3.  Scores and orders the context.
+4.  Injects personalized memory (tone, length, interests).
+5.  Selects an appropriate output schema (e.g., factual, procedural, coding).
+6.  Instructs the model to produce a **chain-of-thought ToDo plan** (`<TRACE>…</TRACE>`) before answering.
+7.  Applies **adaptive response** instructions based on detected question complexity.
+8.  Streams the final response and parses the trace log for display.
+### Chat Trace Logs
+Every assistant response includes a collapsible **Chat Trace Logs** widget at the bottom of the message bubble. This displays the numbered ToDo plan (chain-of-thought) that the model produced before generating its answer. Click to expand and inspect the model's reasoning process — useful for debugging, understanding responses, and evaluating quality.
+### Adaptive Response
+Response length automatically adapts to question complexity:
+| Complexity | Behaviour |
+| :--- | :--- |
+| **Simple** | Concise, focused answer — a few sentences. |
+| **Moderate** | Well-structured with paragraphs, lists, and examples. |
+| **Complex** | Comprehensive and thorough — covers all aspects, edge cases, and examples. |
+Complexity is auto-detected from signal words (e.g., *"briefly"* → simple, *"in detail"* → complex).
+### Auto Chat Naming
+New chats start with a generic "Chat N" name. After the first AI response, a background thread automatically renames the chat based on your first question, producing a short 2-3 word title.
+## 🛠️ Setup & Installation
+1. **Clone the repository:**
+   ```bash
+   git clone <repository_url>
+   cd OptiChat
+   ```
+2. **Create a virtual environment (optional but recommended):**
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+3. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. **Run OptiChat:**
+   ```bash
+   python main.py  # run directly in the terminal
+   textual run --dev main.py  # run in Textual dev mode (slower startup)
+   ```
+   *Note: OptiChat will automatically create the `~/.optichat/` directory and necessary files upon first launch.*
+5. **Configure AI Models:**
+   - Launch the application and navigate to the **Settings** tab.
+   - Enter your API keys for Cloud Providers (OpenAI, Anthropic, Gemini).
+   - Alternatively, ensure [Ollama](https://ollama.com/) is running locally to auto-detect and use local models.
+   - **DISCLAIMER: Cloud API models consume many tokens because each response involves multiple calls. Prefer local models for longer conversations.**
+## ⌨️ Keyboard Shortcuts
+| Shortcut | Action |
+| :--- | :--- |
+| `Ctrl+Q` | Quit OptiChat and close the layout |
+| `Ctrl+R` | Toggle streaming on/off |
+| `Ctrl+C` | Cancel current streaming response mid-output |
+| `↑ / ↓` | Scroll through input history (previous commands/messages) |
+| `Page Up / Page Down` | Scroll the main panel content |
+## 🚀 Development Roadmap
+OptiChat is developed in structured phases:
+*   **Phase 1: UI Design via Textual** - Building the responsive terminal interface, navigation, settings panels for API keys and themes, and chat windows.
+*   **Phase 2: Core Backend & Model Connectivity** - Initializing the `~/.optichat/` environment, implementing SQLite for chat history, and connecting to Cloud/Local AI models using LangChain.
+*   **Phase 3: Memory Storing Mechanism** - Implementing the background threads for Short-Term, LRU, and Long-Term (ChromaDB) memory handling, along with personalized memory updates.
+*   **Phase 4: Prompt Construction Pipeline** - Orchestrating the advanced LangGraph pipeline for query classification, semantic retrieval, schema enforcement, chain-of-thought trace logs, adaptive response, auto chat naming, and intelligent prompt assembly.
+---
+*Developed using Textual, LangChain, and LangGraph.*
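
The README's "Adaptive Response" section says complexity is detected from signal words. A minimal sketch of that idea follows; it is not taken from the package source. Only "briefly" and "in detail" come from the README. The other phrases, the function name `detect_complexity`, and the rule that complex signals win over simple ones are illustrative assumptions.

```python
# Hypothetical signal phrases: only "briefly" and "in detail" appear in the README.
SIMPLE_SIGNALS = ("briefly", "in short", "quickly")
COMPLEX_SIGNALS = ("in detail", "comprehensive", "step by step")


def detect_complexity(question: str) -> str:
    """Classify a question as 'simple', 'moderate', or 'complex'.

    Assumption: a complex signal takes priority when both kinds appear,
    and questions without any signal fall back to 'moderate'.
    """
    text = question.lower()
    if any(signal in text for signal in COMPLEX_SIGNALS):
        return "complex"
    if any(signal in text for signal in SIMPLE_SIGNALS):
        return "simple"
    return "moderate"


if __name__ == "__main__":
    for q in ("Briefly, what is a mutex?", "Explain TCP in detail", "What is Textual?"):
        print(q, "->", detect_complexity(q))
```

Plain substring matching keeps the sketch short; a real classifier would probably need word-boundary checks to avoid false matches inside longer words.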

optichat-0.1.0/Readme.md ADDED

@@ -0,0 +1,114 @@
+# OptiChat
+OptiChat is an advanced terminal-based chat application built with Python and Textual. It features a robust multi-tier memory system, personalized memory tracking, dynamic model connectivity (including cloud and local Ollama models), and a sophisticated prompt construction pipeline for high-quality, contextual AI responses.
+## 🌟 Key Features
+*   **Terminal-based UI**: A beautiful, responsive interface built with Textual, featuring tabs, chat session sidebars, and customizable themes.
+*   **Multi-Tier Memory System**:
+    *   **Short-Term Memory**: Token-budgeted rolling window for recent context.
+    *   **LRU Memory**: Background-processed cache of frequently used messages.
+    *   **Long-Term Memory**: Persistent vector store (ChromaDB) for semantic search across conversations.
+    *   **Personalized Memory**: Automatically learns and updates user preferences, interests, and interaction styles with conflict resolution.
+*   **Dynamic Model Connectivity**: Support for OpenAI, Anthropic, Gemini, and local models via Ollama.
+*   **Prompt Construction Pipeline**: Utilizes LangGraph to dynamically classify queries, retrieve memory, apply personalization, and enforce structured output schemas.
+*   **Chat Trace Logs**: Every assistant response includes a collapsible section showing the model's chain-of-thought ToDo plan – what the model thought before responding.
+*   **Adaptive Response**: Response length and depth dynamically adapt to question complexity (simple → concise, complex → thorough and comprehensive).
+*   **Auto Chat Naming**: New chats are automatically renamed based on your first question (2-3 word title) via a background thread.
+*   **Secure Local Storage**: All data, including settings, API keys (via `.env`), SQLite databases for chats, and ChromaDB vectors, are stored securely in your local `~/.optichat/` directory.
+## 🏗️ Architecture
+### Storage
+OptiChat stores its data locally in `~/.optichat/`. This includes:
+- `config.json` for global settings.
+- `optichat.db` (SQLite) for storing chats, messages, and session metadata.
+- `chroma/` for ChromaDB vector embeddings.
+- Flat files for chat-specific short-term and LRU caches.
+### Memory Pipeline
+1.  **Short-term**: Retains the most recent 3-5 messages.
+2.  **LRU Cache**: Frequently accessed context swapped in from long-term memory.
+3.  **Long-term**: Chunks and embeds responses into ChromaDB for semantic retrieval.
+4.  **Personalized**: Analyzes user behavior and explicitly stated preferences to tailor AI responses.
+### Prompt Construction
+Using LangChain and LangGraph, the pipeline:
+1.  Classifies the user input (type, complexity).
+2.  Retrieves relevant context (Short-term, LRU, or Long-term via semantic search).
+3.  Scores and orders the context.
+4.  Injects personalized memory (tone, length, interests).
+5.  Selects an appropriate output schema (e.g., factual, procedural, coding).
+6.  Instructs the model to produce a **chain-of-thought ToDo plan** (`<TRACE>…</TRACE>`) before answering.
+7.  Applies **adaptive response** instructions based on detected question complexity.
+8.  Streams the final response and parses the trace log for display.
+### Chat Trace Logs
+Every assistant response includes a collapsible **Chat Trace Logs** widget at the bottom of the message bubble. This displays the numbered ToDo plan (chain-of-thought) that the model produced before generating its answer. Click to expand and inspect the model's reasoning process — useful for debugging, understanding responses, and evaluating quality.
+### Adaptive Response
+Response length automatically adapts to question complexity:
+| Complexity | Behaviour |
+| :--- | :--- |
+| **Simple** | Concise, focused answer — a few sentences. |
+| **Moderate** | Well-structured with paragraphs, lists, and examples. |
+| **Complex** | Comprehensive and thorough — covers all aspects, edge cases, and examples. |
+Complexity is auto-detected from signal words (e.g., *"briefly"* → simple, *"in detail"* → complex).
+### Auto Chat Naming
+New chats start with a generic "Chat N" name. After the first AI response, a background thread automatically renames the chat based on your first question, producing a short 2-3 word title.
+## 🛠️ Setup & Installation
+1. **Clone the repository:**
+   ```bash
+   git clone <repository_url>
+   cd OptiChat
+   ```
+2. **Create a virtual environment (optional but recommended):**
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+3. **Install dependencies:**
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. **Run OptiChat:**
+   ```bash
+   python main.py  # run directly in the terminal
+   textual run --dev main.py  # run in Textual dev mode (slower startup)
+   ```
+   *Note: OptiChat will automatically create the `~/.optichat/` directory and necessary files upon first launch.*
+5. **Configure AI Models:**
+   - Launch the application and navigate to the **Settings** tab.
+   - Enter your API keys for Cloud Providers (OpenAI, Anthropic, Gemini).
+   - Alternatively, ensure [Ollama](https://ollama.com/) is running locally to auto-detect and use local models.
+   - **DISCLAIMER: Cloud API models consume many tokens because each response involves multiple calls. Prefer local models for longer conversations.**
+## ⌨️ Keyboard Shortcuts
+| Shortcut | Action |
+| :--- | :--- |
+| `Ctrl+Q` | Quit OptiChat and close the layout |
+| `Ctrl+R` | Toggle streaming on/off |
+| `Ctrl+C` | Cancel current streaming response mid-output |
+| `↑ / ↓` | Scroll through input history (previous commands/messages) |
+| `Page Up / Page Down` | Scroll the main panel content |
+## 🚀 Development Roadmap
+OptiChat is developed in structured phases:
+*   **Phase 1: UI Design via Textual** - Building the responsive terminal interface, navigation, settings panels for API keys and themes, and chat windows.
+*   **Phase 2: Core Backend & Model Connectivity** - Initializing the `~/.optichat/` environment, implementing SQLite for chat history, and connecting to Cloud/Local AI models using LangChain.
+*   **Phase 3: Memory Storing Mechanism** - Implementing the background threads for Short-Term, LRU, and Long-Term (ChromaDB) memory handling, along with personalized memory updates.
+*   **Phase 4: Prompt Construction Pipeline** - Orchestrating the advanced LangGraph pipeline for query classification, semantic retrieval, schema enforcement, chain-of-thought trace logs, adaptive response, auto chat naming, and intelligent prompt assembly.
+---
+*Developed using Textual, LangChain, and LangGraph.*
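
The README's "Prompt Construction" steps mention that the model emits its plan inside `<TRACE>…</TRACE>` and that the pipeline parses the trace for display. The sketch below shows one way such a split could work. The helper name `split_trace` and the choice to return an empty trace when no tag is found are assumptions, not the package's actual parser.

```python
import re

# Non-greedy match so only the first <TRACE> block is captured; DOTALL lets
# the plan span multiple lines.
TRACE_RE = re.compile(r"<TRACE>(.*?)</TRACE>", re.DOTALL)


def split_trace(raw_output: str) -> tuple[str, str]:
    """Return (answer, trace) from a raw model reply.

    Assumption: if the model ignored the instruction and produced no
    <TRACE> block, the whole reply is treated as the answer.
    """
    match = TRACE_RE.search(raw_output)
    if match is None:
        return raw_output.strip(), ""
    trace = match.group(1).strip()
    # Drop the trace block and keep everything around it as the answer.
    answer = (raw_output[: match.start()] + raw_output[match.end():]).strip()
    return answer, trace


if __name__ == "__main__":
    reply = "<TRACE>1. Recall the capital\n2. Answer concisely</TRACE>\nParis."
    answer, trace = split_trace(reply)
    print(answer)
    print(trace)
```

Returning the trace separately matches the dict shape documented for `send_message_via_pipeline()`, which exposes `response` and `trace_log` as distinct keys.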

optichat-0.1.0/app/__init__.py ADDED

@@ -0,0 +1 @@
+# OptiChat App Package

optichat-0.1.0/app/connect_models.py ADDED

@@ -0,0 +1,340 @@
+"""OptiChat – AI model connection layer.
+Responsibilities
+────────────────
+• Validate API keys against each provider.
+• List available models from cloud providers (OpenAI / Anthropic / Gemini).
+• Detect locally-installed Ollama models.
+• Instantiate LangChain chat model objects for actual inference.
+"""
+from __future__ import annotations
+from typing import Any
+from langchain_core.language_models.chat_models import BaseChatModel
+# ══════════════════════════════════════════════
+#  Provider registry
+# ══════════════════════════════════════════════
+PROVIDERS = ("openai", "anthropic", "gemini")
+# ══════════════════════════════════════════════
+#  API key validation
+# ══════════════════════════════════════════════
+def validate_api_key(provider: str, api_key: str) -> bool:
+    """Return True if *api_key* is accepted by *provider*.
+    Each provider check is a lightweight call (list models or a tiny request)
+    wrapped in a try/except so a bad key returns False.
+    """
+    try:
+        if provider == "openai":
+            return _validate_openai(api_key)
+        elif provider == "anthropic":
+            return _validate_anthropic(api_key)
+        elif provider == "gemini":
+            return _validate_gemini(api_key)
+        else:
+            return False
+    except Exception:
+        return False
+def _validate_openai(api_key: str) -> bool:
+    from openai import OpenAI
+    client = OpenAI(api_key=api_key)
+    # A successful models.list() call proves the key is valid
+    models = client.models.list()
+    # Consume at least one item to confirm
+    _ = next(iter(models))
+    return True
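+def _validate_anthropic(api_key: str) -> bool:
+    # Use the Anthropic SDK client: LangChain's ChatAnthropic exposes no `.models` API,
+    # so the original call always raised and validation always returned False.
+    from anthropic import Anthropic
+    client = Anthropic(api_key=api_key)
+    models = client.models.list()
+    # Consume at least one item to confirm
+    _ = next(iter(models))
+    return True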
+def _validate_gemini(api_key: str) -> bool:
+    from google import genai
+    client = genai.Client(api_key=api_key)
+    models = list(client.models.list())
+    return len(models) > 0
+# ══════════════════════════════════════════════
+#  List cloud models
+# ══════════════════════════════════════════════
+def list_cloud_models(provider: str, api_key: str) -> list[dict[str, str]]:
+    """Return a list of ``{id, name}`` dicts for available models.
+    Only returns chat/completion-capable models where possible.
+    """
+    try:
+        if provider == "openai":
+            return _list_openai(api_key)
+        elif provider == "anthropic":
+            return _list_anthropic(api_key)
+        elif provider == "gemini":
+            return _list_gemini(api_key)
+    except Exception:
+        pass
+    return []
+def _list_openai(api_key: str) -> list[dict[str, str]]:
+    from openai import OpenAI
+    client = OpenAI(api_key=api_key)
+    models = client.models.list()
+    result: list[dict[str, str]] = []
+    for m in models:
+        mid = m.id
+        # Filter to likely chat-capable models by ID prefix (gpt-, o-series, chatgpt)
+        if mid.startswith(("gpt-", "o", "chatgpt")):
+            result.append({"id": f"openai/{mid}", "name": mid})
+    result.sort(key=lambda x: x["name"])
+    return result
+def _list_anthropic(api_key: str) -> list[dict[str, str]]:
+    # Use the Anthropic SDK client: LangChain's ChatAnthropic exposes no `.models` API.
+    from anthropic import Anthropic
+    client = Anthropic(api_key=api_key)
+    models = client.models.list()
+    result: list[dict[str, str]] = []
+    for m in models:
+        result.append({"id": f"anthropic/{m.id}", "name": m.display_name or m.id})
+    result.sort(key=lambda x: x["name"])
+    return result
+def _list_gemini(api_key: str) -> list[dict[str, str]]:
+    from google import genai
+    client = genai.Client(api_key=api_key)
+    result: list[dict[str, str]] = []
+    for m in client.models.list():
+        name = getattr(m, "name", "")
+        display = getattr(m, "display_name", name)
+        # Only include generative models
+        if "gemini" in name.lower():
+            result.append({"id": f"gemini/{name}", "name": display})
+    result.sort(key=lambda x: x["name"])
+    return result
+# ══════════════════════════════════════════════
+#  Ollama – local model detection
+# ══════════════════════════════════════════════
+def detect_ollama_models() -> list[dict[str, str]]:
+    """Detect locally installed Ollama models.
+    Returns a list of ``{id, name, size}`` dicts, or an empty list
+    if Ollama is not running / not installed.
+    """
+    try:
+        from ollama import Client
+        client = Client(host='http://127.0.0.1:11434')
+        response = client.list()
+        result: list[dict[str, str]] = []
+        for m in response.models:
+            model_name = m.model if hasattr(m, "model") else m.name
+            size_bytes = getattr(m, "size", 0)
+            size_gb = f"{size_bytes / (1024 ** 3):.1f} GB" if size_bytes else "?"
+            result.append({
+                "id": f"ollama/{model_name}",
+                "name": model_name,
+                "size": size_gb,
+            })
+        return result
+    except Exception:
+        return []
+# ══════════════════════════════════════════════
+#  Create a LangChain chat model instance
+# ══════════════════════════════════════════════
+def get_chat_model(model_id: str) -> BaseChatModel:
+    """Instantiate and return a LangChain chat model for *model_id*.
+    *model_id* format: ``provider/model_name``
+    e.g. ``openai/gpt-4o``, ``anthropic/claude-sonnet-4-20250514``,
+         ``gemini/gemini-2.0-flash``, ``ollama/llama3``.
+    """
+    if "/" not in model_id:
+        raise ValueError(f"Invalid model_id format: {model_id!r}. Expected 'provider/model'.")
+    provider, model_name = model_id.split("/", 1)
+    if provider == "openai":
+        from langchain_openai import ChatOpenAI
+        return ChatOpenAI(model=model_name, streaming=True)
+    elif provider == "anthropic":
+        from langchain_anthropic import ChatAnthropic
+        return ChatAnthropic(model=model_name, streaming=True)
+    elif provider == "gemini":
+        from langchain_google_genai import ChatGoogleGenerativeAI
+        return ChatGoogleGenerativeAI(model=model_name, streaming=True)
+    elif provider == "ollama":
+        from langchain_community.chat_models import ChatOllama
+        return ChatOllama(model=model_name)
+    else:
+        raise ValueError(f"Unknown provider: {provider!r}")
+# ══════════════════════════════════════════════
+#  Pipeline-aware message sending  (Phase 4)
+# ══════════════════════════════════════════════
+async def send_message_via_pipeline(
+    model_id: str,
+    user_input: str,
+    chat_name: str,
+    chat_id: str,
+    *,
+    websearch_enabled: bool = False,
+) -> dict[str, str]:
+    """Run the user's message through the full prompt construction pipeline.
+    The pipeline handles classification, memory retrieval, prompt assembly,
+    LLM invocation, and post-processing (DB + memory storage).
+    Parameters
+    ----------
+    websearch_enabled:
+        When True the pipeline's classifier node queries DuckDuckGo for
+        the top-2 results and injects them into the final prompt
+        (Phase 5 feature).
+    Returns a dict with keys ``response`` (the assistant reply) and
+    ``trace_log`` (the chain-of-thought trace extracted from the model output).
+    """
+    from app.pipeline import run_pipeline
+    result = await run_pipeline(
+        user_input=user_input,
+        chat_name=chat_name,
+        chat_id=chat_id,
+        model_id=model_id,
+        websearch_enabled=websearch_enabled,
+    )
+    error = result.get("error")
+    if error:
+        return {"response": f"*{error}*", "trace_log": ""}
+    return {
+        "response": result.get("response", ""),
+        "trace_log": result.get("trace_log", ""),
+    }
+# ══════════════════════════════════════════════
+#  Legacy: direct send (no pipeline, for fallback)
+# ══════════════════════════════════════════════
+async def send_message(
+    model_id: str,
+    messages: list[dict[str, str]],
+    chat_name: str | None = None,
+    chat_id: str | None = None,
+) -> str:
+    """Send a list of {role, content} dicts and return the assistant reply.
+    If *chat_name* and *chat_id* are provided, the user message and AI
+    response are automatically fed through the memory pipeline.
+    NOTE: For Phase 4+, prefer ``send_message_via_pipeline()`` which runs
+    the full prompt construction pipeline.
+    """
+    from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
+    _type_map = {
+        "system": SystemMessage,
+        "user": HumanMessage,
+        "assistant": AIMessage,
+    }
+    lc_messages = [_type_map[m["role"]](content=m["content"]) for m in messages]
+    model = get_chat_model(model_id)
+    response = await model.ainvoke(lc_messages)
+    reply = str(response.content)
+    # ── Memory integration (Phase 3) ────────
+    if chat_name and chat_id:
+        try:
+            from app.memory import process_message
+            # Store the last user message in memory
+            user_msgs = [m for m in messages if m["role"] == "user"]
+            if user_msgs:
+                await process_message(chat_name, chat_id, "user", user_msgs[-1]["content"])
+            # Store the assistant reply in memory
+            await process_message(chat_name, chat_id, "assistant", reply)
+        except Exception:
+            pass  # Memory errors must not block the response
+    return reply
+async def stream_message(
+    model_id: str,
+    messages: list[dict[str, str]],
+    chat_name: str | None = None,
+    chat_id: str | None = None,
+):
+    """Yield token chunks as an async generator.
+    After streaming completes, the accumulated response is fed through
+    the memory pipeline if *chat_name* and *chat_id* are provided.
+    """
+    from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
+    _type_map = {
+        "system": SystemMessage,
+        "user": HumanMessage,
+        "assistant": AIMessage,
+    }
+    lc_messages = [_type_map[m["role"]](content=m["content"]) for m in messages]
+    model = get_chat_model(model_id)
+    full_response: list[str] = []
+    async for chunk in model.astream(lc_messages):
+        text = chunk.content if hasattr(chunk, "content") else str(chunk)
+        if text:
+            full_response.append(text)
+            yield text
+    # ── Memory integration (Phase 3) ────────
+    if chat_name and chat_id:
+        try:
+            from app.memory import process_message
+            # Store the last user message in memory
+            user_msgs = [m for m in messages if m["role"] == "user"]
+            if user_msgs:
+                await process_message(chat_name, chat_id, "user", user_msgs[-1]["content"])
+            # Store the full accumulated assistant response in memory
+            accumulated = "".join(full_response)
+            if accumulated:
+                await process_message(chat_name, chat_id, "assistant", accumulated)
+        except Exception:
+            pass  # Memory errors must not block the response
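
The `get_chat_model()` docstring above describes the `provider/model_name` format and the code splits on the first slash only. The standalone sketch below isolates that parsing step to show why the split limit matters. The helper name `parse_model_id` is hypothetical, and the second example ID is an assumed case of a model name that itself contains a slash.

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider/model_name' into its two parts.

    Splitting with maxsplit=1 keeps any further slashes inside the model
    name instead of raising on unpacking or truncating the name.
    """
    if "/" not in model_id:
        raise ValueError(
            f"Invalid model_id format: {model_id!r}. Expected 'provider/model'."
        )
    provider, model_name = model_id.split("/", 1)
    return provider, model_name


if __name__ == "__main__":
    print(parse_model_id("openai/gpt-4o"))
    # Assumed example of a model name containing its own slash.
    print(parse_model_id("gemini/models/gemini-2.0-flash"))
```

With a plain `split("/")`, the second ID would produce three parts and the two-name unpacking would fail, which is presumably why the source limits the split to one.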