chat-console 0.3.995__tar.gz → 0.4.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {chat_console-0.3.995/chat_console.egg-info → chat_console-0.4.2}/PKG-INFO +24 -2
- {chat_console-0.3.995 → chat_console-0.4.2}/README.md +23 -1
- {chat_console-0.3.995 → chat_console-0.4.2}/app/__init__.py +1 -1
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/base.py +4 -4
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/ollama.py +242 -4
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/openai.py +106 -36
- {chat_console-0.3.995 → chat_console-0.4.2}/app/config.py +28 -1
- {chat_console-0.3.995 → chat_console-0.4.2}/app/main.py +165 -8
- {chat_console-0.3.995 → chat_console-0.4.2}/app/utils.py +53 -20
- {chat_console-0.3.995 → chat_console-0.4.2/chat_console.egg-info}/PKG-INFO +24 -2
- {chat_console-0.3.995 → chat_console-0.4.2}/LICENSE +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/__init__.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/anthropic.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/database.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/models.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/__init__.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/chat_interface.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/chat_list.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/model_browser.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/model_selector.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/search.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/styles.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/SOURCES.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/dependency_links.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/entry_points.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/requires.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/top_level.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/setup.cfg +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/setup.py +0 -0

{chat_console-0.3.995/chat_console.egg-info → chat_console-0.4.2}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: chat-console
-Version: 0.3.995
+Version: 0.4.2
 Summary: A command-line interface for chatting with LLMs, storing chats and (future) rag interactions
 Home-page: https://github.com/wazacraftrfid/chat-console
 Author: Johnathan Greenaway

@@ -28,7 +28,8 @@ Dynamic: requires-dist
 Dynamic: requires-python
 Dynamic: summary

-
+
+# Chat CLI

 A comprehensive command-line interface for chatting with various AI language models. This application allows you to interact with different LLM providers through an intuitive terminal-based interface.

@@ -37,6 +38,7 @@ A comprehensive command-line interface for chatting with various AI language mod
 - Interactive terminal UI with Textual library
 - Support for multiple AI models:
   - OpenAI models (GPT-3.5, GPT-4)
+  - OpenAI reasoning models (o1, o1-mini, o3, o3-mini, o4-mini)
   - Anthropic models (Claude 3 Opus, Sonnet, Haiku)
 - Conversation history with search functionality
 - Customizable response styles (concise, detailed, technical, friendly)

@@ -71,6 +73,26 @@ Run the application:
 chat-cli
 ```

+### Testing Reasoning Models
+
+To test the OpenAI reasoning models implementation, you can use the included test script:
+```
+./test_reasoning.py
+```
+
+This script will test both completion and streaming with the available reasoning models.
+
+### About OpenAI Reasoning Models
+
+OpenAI's reasoning models (o1, o3, o4-mini, etc.) are LLMs trained with reinforcement learning to perform reasoning. These models:
+
+- Think before they answer, producing a long internal chain of thought
+- Excel in complex problem solving, coding, scientific reasoning, and multi-step planning
+- Use "reasoning tokens" to work through problems step by step before providing a response
+- Support different reasoning effort levels (low, medium, high)
+
+The implementation in this CLI supports both standard completions and streaming with these models.
+
 ### Keyboard Shortcuts

 - `q` - Quit the application

{chat_console-0.3.995 → chat_console-0.4.2}/README.md

@@ -1,4 +1,5 @@
-
+
+# Chat CLI

 A comprehensive command-line interface for chatting with various AI language models. This application allows you to interact with different LLM providers through an intuitive terminal-based interface.

@@ -7,6 +8,7 @@ A comprehensive command-line interface for chatting with various AI language mod
 - Interactive terminal UI with Textual library
 - Support for multiple AI models:
   - OpenAI models (GPT-3.5, GPT-4)
+  - OpenAI reasoning models (o1, o1-mini, o3, o3-mini, o4-mini)
   - Anthropic models (Claude 3 Opus, Sonnet, Haiku)
 - Conversation history with search functionality
 - Customizable response styles (concise, detailed, technical, friendly)

@@ -41,6 +43,26 @@ Run the application:
 chat-cli
 ```

+### Testing Reasoning Models
+
+To test the OpenAI reasoning models implementation, you can use the included test script:
+```
+./test_reasoning.py
+```
+
+This script will test both completion and streaming with the available reasoning models.
+
+### About OpenAI Reasoning Models
+
+OpenAI's reasoning models (o1, o3, o4-mini, etc.) are LLMs trained with reinforcement learning to perform reasoning. These models:
+
+- Think before they answer, producing a long internal chain of thought
+- Excel in complex problem solving, coding, scientific reasoning, and multi-step planning
+- Use "reasoning tokens" to work through problems step by step before providing a response
+- Support different reasoning effort levels (low, medium, high)
+
+The implementation in this CLI supports both standard completions and streaming with these models.
+
 ### Keyboard Shortcuts

 - `q` - Quit the application
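
For orientation before the code diffs below: a minimal sketch of what a reasoning-model request looks like through the OpenAI Responses API, which is the path the new `app/api/openai.py` code takes for o-series models. This snippet is illustrative only and not part of the package; it assumes the official `openai` Python SDK (1.x, with Responses API support) and an `OPENAI_API_KEY` in the environment. The parameter names (`input`, `reasoning`, `max_output_tokens`) and `response.output_text` mirror the ones used in the diffs.

```python
# Illustrative sketch, not part of the package. Assumes the official `openai`
# Python SDK with Responses API support and OPENAI_API_KEY in the environment.
import asyncio
from openai import AsyncOpenAI

async def ask_reasoning_model(prompt: str) -> str:
    client = AsyncOpenAI()
    response = await client.responses.create(
        model="o4-mini",                     # any o-series reasoning model
        input=[{"role": "user", "content": prompt}],
        reasoning={"effort": "medium"},      # low | medium | high
        max_output_tokens=256,
    )
    return response.output_text             # concatenated text output

if __name__ == "__main__":
    print(asyncio.run(ask_reasoning_model("Outline a plan to bisect a flaky test.")))
```

Note that temperature is deliberately omitted on this path; as the diffs below show, the reasoning branch drops it in favor of the effort setting.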

{chat_console-0.3.995 → chat_console-0.4.2}/app/api/base.py

@@ -82,9 +82,9 @@ class BaseModelClient(ABC):

 # If we couldn't get the provider from the UI, infer it from the model name
 # Check for common OpenAI model patterns or prefixes
-if (model_name_lower.startswith(("gpt-", "text-", "davinci")) or
+if (model_name_lower.startswith(("gpt-", "text-", "davinci", "o1", "o3", "o4")) or
 "gpt" in model_name_lower or
-model_name_lower in ["04-mini", "04", "04-turbo", "04-vision"]):
+model_name_lower in ["04-mini", "04", "04-turbo", "04-vision", "o1", "o3", "o4-mini"]):
 provider = "openai"
 logger.info(f"Identified {model_name} as an OpenAI model")
 # Then check for Anthropic models - these should ALWAYS use Anthropic client

@@ -162,9 +162,9 @@ class BaseModelClient(ABC):
 # If we couldn't get the provider from the UI, infer it from the model name
 if not provider:
 # Check for common OpenAI model patterns or prefixes
-if (model_name_lower.startswith(("gpt-", "text-", "davinci")) or
+if (model_name_lower.startswith(("gpt-", "text-", "davinci", "o1", "o3", "o4")) or
 "gpt" in model_name_lower or
-model_name_lower in ["04-mini", "04", "04-turbo", "04-vision"]):
+model_name_lower in ["04-mini", "04", "04-turbo", "04-vision", "o1", "o3", "o4-mini"]):
 if not AVAILABLE_PROVIDERS["openai"]:
 raise Exception("OpenAI API key not found. Please set OPENAI_API_KEY environment variable.")
 provider = "openai"

{chat_console-0.3.995 → chat_console-0.4.2}/app/api/ollama.py

@@ -31,8 +31,96 @@ class OllamaClient(BaseModelClient):
 # Track model loading state
 self._model_loading = False

+# Track preloaded models and their last use timestamp
+self._preloaded_models = {}
+
+# Default timeout values (in seconds)
+self.DEFAULT_TIMEOUT = 30
+self.MODEL_LOAD_TIMEOUT = 120
+self.MODEL_PULL_TIMEOUT = 3600 # 1 hour for large models
+
 # Path to the cached models file
 self.models_cache_path = Path(__file__).parent.parent / "data" / "ollama-models.json"
+
+def get_timeout_for_model(self, model_id: str, operation: str = "generate") -> int:
+"""
+Calculate an appropriate timeout based on model size
+
+Parameters:
+- model_id: The model identifier
+- operation: The operation type ('generate', 'load', 'pull')
+
+Returns:
+- Timeout in seconds
+"""
+# Default timeouts by operation
+default_timeouts = {
+"generate": self.DEFAULT_TIMEOUT, # 30s
+"load": self.MODEL_LOAD_TIMEOUT, # 2min
+"pull": self.MODEL_PULL_TIMEOUT, # 1h
+"list": 5, # 5s
+"test": 2 # 2s
+}
+
+# Parameter size multipliers
+size_multipliers = {
+# For models < 3B
+"1b": 0.5,
+"2b": 0.7,
+"3b": 1.0,
+# For models 3B-10B
+"5b": 1.2,
+"6b": 1.3,
+"7b": 1.5,
+"8b": 1.7,
+"9b": 1.8,
+# For models 10B-20B
+"13b": 2.0,
+"14b": 2.0,
+# For models 20B-50B
+"27b": 3.0,
+"34b": 3.5,
+"40b": 4.0,
+# For models 50B+
+"70b": 5.0,
+"80b": 6.0,
+"100b": 7.0,
+"400b": 10.0,
+"405b": 10.0,
+}
+
+# Get the base timeout for the operation
+base_timeout = default_timeouts.get(operation, self.DEFAULT_TIMEOUT)
+
+# Try to determine the model size from the model ID
+model_size = "7b" # Default assumption is 7B parameters
+model_lower = model_id.lower()
+
+# Check for size indicators in the model name
+for size in size_multipliers.keys():
+if size in model_lower:
+model_size = size
+break
+
+# If it's a known large model without size in name
+if "llama3.1" in model_lower and not any(size in model_lower for size in size_multipliers.keys()):
+model_size = "8b" # Default for llama3.1 without size specified
+
+# For first generation after model selection, if preloaded, use shorter timeout
+if operation == "generate" and model_id in self._preloaded_models:
+# For preloaded models, use a shorter timeout
+return max(int(base_timeout * 0.7), 20) # Min 20 seconds
+
+# Calculate final timeout with multiplier
+multiplier = size_multipliers.get(model_size, 1.0)
+timeout = int(base_timeout * multiplier)
+
+# For pull operation, ensure we have a reasonable maximum
+if operation == "pull":
+return min(timeout, 7200) # Max 2 hours
+
+logger.info(f"Calculated timeout for {model_id} ({operation}): {timeout}s (base: {base_timeout}s, multiplier: {multiplier})")
+return timeout

 @classmethod
 async def create(cls) -> 'OllamaClient':

@@ -61,7 +149,29 @@ class OllamaClient(BaseModelClient):
 style_instructions = self._get_style_instructions(style)
 debug_log(f"Adding style instructions: {style_instructions[:50]}...")
 formatted_messages.append(style_instructions)
+
+# Special case for title generation - check if this is a title generation message
+is_title_generation = False
+for msg in messages:
+if msg.get("role") == "system" and "generate a brief, descriptive title" in msg.get("content", "").lower():
+is_title_generation = True
+debug_log("Detected title generation prompt")
+break
+
+# For title generation, use a direct approach
+if is_title_generation:
+debug_log("Using specialized formatting for title generation")
+# Find the user message containing the input for title generation
+user_msg = next((msg for msg in messages if msg.get("role") == "user"), None)
+if user_msg and "content" in user_msg:
+# Create a direct prompt
+prompt = "Generate a short descriptive title (maximum 40 characters) for this conversation. ONLY RESPOND WITH THE TITLE FOR THE FOLLOWING MESSAGE:\n\n" + user_msg["content"]
+debug_log(f"Created title generation prompt: {prompt[:100]}...")
+return prompt
+else:
+debug_log("Could not find user message for title generation, using standard formatting")

+# Standard processing for normal chat messages
 # Add message content, preserving conversation flow
 for i, msg in enumerate(messages):
 try:

@@ -185,6 +295,7 @@ class OllamaClient(BaseModelClient):
 try:
 async with aiohttp.ClientSession() as session:
 logger.debug(f"Sending request to {self.base_url}/api/generate")
+gen_timeout = self.get_timeout_for_model(model, "generate")
 async with session.post(
 f"{self.base_url}/api/generate",
 json={

@@ -193,12 +304,16 @@ class OllamaClient(BaseModelClient):
 "temperature": temperature,
 "stream": False
 },
-timeout=
+timeout=gen_timeout
 ) as response:
 response.raise_for_status()
 data = await response.json()
 if "response" not in data:
 raise Exception("Invalid response format from Ollama server")
+
+# Update the model usage timestamp to keep it hot
+self.update_model_usage(model)
+
 return data["response"]

 except aiohttp.ClientConnectorError:

@@ -324,10 +439,11 @@ class OllamaClient(BaseModelClient):
 "stream": False
 }

+test_timeout = self.get_timeout_for_model(model, "test")
 async with session.post(
 f"{self.base_url}/api/generate",
 json=test_payload,
-timeout=
+timeout=test_timeout
 ) as response:
 if response.status != 200:
 logger.warning(f"Model test request failed with status {response.status}")

@@ -361,10 +477,11 @@ class OllamaClient(BaseModelClient):
 debug_log(f"Error preparing pull payload: {str(pull_err)}, using default")
 pull_payload = {"name": "gemma:2b"} # Safe default

+pull_timeout = self.get_timeout_for_model(model, "pull")
 async with session.post(
 f"{self.base_url}/api/pull",
 json=pull_payload,
-timeout=
+timeout=pull_timeout
 ) as pull_response:
 if pull_response.status != 200:
 logger.error("Failed to pull model")

@@ -415,10 +532,11 @@ class OllamaClient(BaseModelClient):
 }

 debug_log(f"Sending request to Ollama API")
+gen_timeout = self.get_timeout_for_model(model, "generate")
 response = await session.post(
 f"{self.base_url}/api/generate",
 json=request_payload,
-timeout=
+timeout=gen_timeout
 )
 response.raise_for_status()
 debug_log(f"Response status: {response.status}")

@@ -426,6 +544,9 @@ class OllamaClient(BaseModelClient):
 # Use a simpler async iteration pattern that's less error-prone
 debug_log("Starting to process response stream")

+# Update the model usage timestamp to keep it hot
+self.update_model_usage(model)
+
 # Set a flag to track if we've yielded any content
 has_yielded_content = False

@@ -535,6 +656,123 @@ class OllamaClient(BaseModelClient):
 def is_loading_model(self) -> bool:
 """Check if Ollama is currently loading a model"""
 return self._model_loading
+
+async def preload_model(self, model_id: str) -> bool:
+"""
+Preload a model to keep it hot/ready for use
+Returns True if successful, False otherwise
+"""
+from datetime import datetime
+import asyncio
+
+logger.info(f"Preloading model: {model_id}")
+
+# First, check if the model is already preloaded
+if model_id in self._preloaded_models:
+# Update timestamp if already preloaded
+self._preloaded_models[model_id] = datetime.now()
+logger.info(f"Model {model_id} already preloaded, updated timestamp")
+return True
+
+try:
+# We'll use a minimal prompt to load the model
+warm_up_prompt = "hello"
+
+# Set model loading state
+old_loading_state = self._model_loading
+self._model_loading = True
+
+async with aiohttp.ClientSession() as session:
+# First try pulling the model if needed
+try:
+logger.info(f"Ensuring model {model_id} is pulled")
+pull_payload = {"name": model_id}
+pull_timeout = self.get_timeout_for_model(model_id, "pull")
+async with session.post(
+f"{self.base_url}/api/pull",
+json=pull_payload,
+timeout=pull_timeout
+) as pull_response:
+# We don't need to process the full pull, just initiate it
+if pull_response.status != 200:
+logger.warning(f"Pull request for model {model_id} failed with status {pull_response.status}")
+except Exception as e:
+logger.warning(f"Error during model pull check: {str(e)}")
+
+# Now send a small generation request to load the model into memory
+logger.info(f"Sending warm-up request for model {model_id}")
+gen_timeout = self.get_timeout_for_model(model_id, "load")
+async with session.post(
+f"{self.base_url}/api/generate",
+json={
+"model": model_id,
+"prompt": warm_up_prompt,
+"temperature": 0.7,
+"stream": False
+},
+timeout=gen_timeout
+) as response:
+if response.status != 200:
+logger.error(f"Failed to preload model {model_id}, status: {response.status}")
+self._model_loading = old_loading_state
+return False
+
+# Read the response to ensure the model is fully loaded
+await response.json()
+
+# Update preloaded models with timestamp
+self._preloaded_models[model_id] = datetime.now()
+logger.info(f"Successfully preloaded model {model_id}")
+return True
+except Exception as e:
+logger.error(f"Error preloading model {model_id}: {str(e)}")
+return False
+finally:
+# Reset model loading state
+self._model_loading = old_loading_state
+
+def get_preloaded_models(self) -> Dict[str, datetime]:
+"""Return the dict of preloaded models and their last use times"""
+return self._preloaded_models
+
+def update_model_usage(self, model_id: str) -> None:
+"""Update the timestamp for a model that is being used"""
+if model_id and model_id in self._preloaded_models:
+from datetime import datetime
+self._preloaded_models[model_id] = datetime.now()
+logger.info(f"Updated usage timestamp for model {model_id}")
+
+async def release_inactive_models(self, max_inactive_minutes: int = 30) -> List[str]:
+"""
+Release models that have been inactive for more than the specified time
+Returns a list of model IDs that were released
+"""
+from datetime import datetime, timedelta
+
+if not self._preloaded_models:
+return []
+
+now = datetime.now()
+inactive_threshold = timedelta(minutes=max_inactive_minutes)
+models_to_release = []
+
+# Find models that have been inactive for too long
+for model_id, last_used in list(self._preloaded_models.items()):
+if now - last_used > inactive_threshold:
+models_to_release.append(model_id)
+
+# Release the models
+released_models = []
+for model_id in models_to_release:
+try:
+logger.info(f"Releasing inactive model: {model_id} (inactive for {(now - self._preloaded_models[model_id]).total_seconds() / 60:.1f} minutes)")
+# We don't have an explicit "unload" API in Ollama, but we can remove it from our tracking
+del self._preloaded_models[model_id]
+released_models.append(model_id)
+except Exception as e:
+logger.error(f"Error releasing model {model_id}: {str(e)}")
+
+return released_models

 async def get_model_details(self, model_id: str) -> Dict[str, Any]:
 """Get detailed information about a specific Ollama model"""
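
A usage sketch of how the new `OllamaClient` pieces fit together (hypothetical calling code, not from the package; the model names are examples): preloading warms a model, generation timeouts scale with the parameter-count suffix parsed from the model name, and idle models are dropped from tracking after the configured threshold.

```python
# Hypothetical calling code based on the methods added above; not part of the package.
import asyncio
from app.api.ollama import OllamaClient

async def demo() -> None:
    client = await OllamaClient.create()

    # Warm the model so the first real generation takes the shorter preloaded-timeout path.
    await client.preload_model("llama3:8b")

    # Timeouts are derived from the size suffix in the model name.
    print(client.get_timeout_for_model("llama3:8b", "generate"))  # preloaded: max(30 * 0.7, 20) = 21s
    print(client.get_timeout_for_model("llama2:70b", "pull"))     # 3600s * 5.0, capped at 7200s

    # Models idle longer than the threshold are simply forgotten by the tracker.
    released = await client.release_inactive_models(max_inactive_minutes=30)
    print("released:", released)

asyncio.run(demo())
```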

{chat_console-0.3.995 → chat_console-0.4.2}/app/api/openai.py

@@ -53,20 +53,38 @@ class OpenAIClient(BaseModelClient):
 """Generate a text completion using OpenAI"""
 processed_messages = self._prepare_messages(messages, style)

-#
-
-"model": model,
-"messages": processed_messages,
-"temperature": temperature,
-}
-
-# Only add max_tokens if it's not None
-if max_tokens is not None:
-params["max_tokens"] = max_tokens
-
-response = await self.client.chat.completions.create(**params)
+# Check if this is a reasoning model (o-series)
+is_reasoning_model = model.startswith(("o1", "o3", "o4")) or model in ["o1", "o3", "o4-mini"]

-
+# Use the Responses API for reasoning models
+if is_reasoning_model:
+# Create parameters dict for the Responses API
+params = {
+"model": model,
+"input": processed_messages,
+"reasoning": {"effort": "medium"}, # Default to medium effort
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_output_tokens"] = max_tokens
+
+response = await self.client.responses.create(**params)
+return response.output_text
+else:
+# Use the Chat Completions API for non-reasoning models
+params = {
+"model": model,
+"messages": processed_messages,
+"temperature": temperature,
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_tokens"] = max_tokens
+
+response = await self.client.chat.completions.create(**params)
+return response.choices[0].message.content

 async def generate_stream(self, messages: List[Dict[str, str]],
 model: str,

@@ -83,6 +101,9 @@ class OpenAIClient(BaseModelClient):

 processed_messages = self._prepare_messages(messages, style)

+# Check if this is a reasoning model (o-series)
+is_reasoning_model = model.startswith(("o1", "o3", "o4")) or model in ["o1", "o3", "o4-mini"]
+
 try:
 debug_log(f"OpenAI: preparing {len(processed_messages)} messages for stream")

@@ -119,20 +140,37 @@ class OpenAIClient(BaseModelClient):

 while retry_count <= max_retries:
 try:
-# Create parameters dict
-
-
-
-
-
-
-
-
-
-
-
-
+# Create parameters dict based on model type
+if is_reasoning_model:
+# Use the Responses API for reasoning models
+params = {
+"model": model,
+"input": api_messages,
+"reasoning": {"effort": "medium"}, # Default to medium effort
+"stream": True,
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_output_tokens"] = max_tokens
+
+debug_log(f"OpenAI: creating reasoning model stream with params: {params}")
+stream = await self.client.responses.create(**params)
+else:
+# Use the Chat Completions API for non-reasoning models
+params = {
+"model": model,
+"messages": api_messages,
+"temperature": temperature,
+"stream": True,
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_tokens"] = max_tokens
+
+debug_log(f"OpenAI: creating chat completion stream with params: {params}")
+stream = await self.client.chat.completions.create(**params)

 # Store the stream for potential cancellation
 self._active_stream = stream

@@ -157,17 +195,28 @@ class OpenAIClient(BaseModelClient):

 chunk_count += 1
 try:
-
-
-
-
-text = str(
-debug_log(f"OpenAI: yielding chunk {chunk_count} of length: {len(text)}")
+# Handle different response formats based on model type
+if is_reasoning_model:
+# For reasoning models using the Responses API
+if hasattr(chunk, 'output_text') and chunk.output_text is not None:
+text = str(chunk.output_text)
+debug_log(f"OpenAI reasoning: yielding chunk {chunk_count} of length: {len(text)}")
 yield text
 else:
-debug_log(f"OpenAI: skipping
+debug_log(f"OpenAI reasoning: skipping chunk {chunk_count} with missing content")
 else:
-
+# For regular models using the Chat Completions API
+if chunk.choices and hasattr(chunk.choices[0], 'delta') and hasattr(chunk.choices[0].delta, 'content'):
+content = chunk.choices[0].delta.content
+if content is not None:
+# Ensure we're returning a string
+text = str(content)
+debug_log(f"OpenAI: yielding chunk {chunk_count} of length: {len(text)}")
+yield text
+else:
+debug_log(f"OpenAI: skipping None content chunk {chunk_count}")
+else:
+debug_log(f"OpenAI: skipping chunk {chunk_count} with missing content")
 except Exception as chunk_error:
 debug_log(f"OpenAI: error processing chunk {chunk_count}: {str(chunk_error)}")
 # Skip problematic chunks but continue processing

@@ -221,11 +270,32 @@ class OpenAIClient(BaseModelClient):
 for model in models_response.data:
 # Use 'id' as both id and name for now; can enhance with more info if needed
 models.append({"id": model.id, "name": model.id})
+
+# Add reasoning models which might not be in the models list
+reasoning_models = [
+{"id": "o1", "name": "o1 (Reasoning)"},
+{"id": "o1-mini", "name": "o1-mini (Reasoning)"},
+{"id": "o3", "name": "o3 (Reasoning)"},
+{"id": "o3-mini", "name": "o3-mini (Reasoning)"},
+{"id": "o4-mini", "name": "o4-mini (Reasoning)"}
+]
+
+# Add reasoning models if they're not already in the list
+existing_ids = {model["id"] for model in models}
+for reasoning_model in reasoning_models:
+if reasoning_model["id"] not in existing_ids:
+models.append(reasoning_model)
+
 return models
 except Exception as e:
 # Fallback to a static list if API call fails
 return [
 {"id": "gpt-3.5-turbo", "name": "gpt-3.5-turbo"},
 {"id": "gpt-4", "name": "gpt-4"},
-{"id": "gpt-4-turbo", "name": "gpt-4-turbo"}
+{"id": "gpt-4-turbo", "name": "gpt-4-turbo"},
+{"id": "o1", "name": "o1 (Reasoning)"},
+{"id": "o1-mini", "name": "o1-mini (Reasoning)"},
+{"id": "o3", "name": "o3 (Reasoning)"},
+{"id": "o3-mini", "name": "o3-mini (Reasoning)"},
+{"id": "o4-mini", "name": "o4-mini (Reasoning)"}
 ]
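
A consumer-side sketch of the streaming change (hypothetical snippet, not from the package): whichever branch `generate_stream()` takes internally — Responses API chunks with `output_text` for o-series models, Chat Completions deltas otherwise — the caller still just iterates over plain strings. The keyword arguments beyond `messages` and `model` are assumptions based on the `_prepare_messages(messages, style)` call visible in the diff.

```python
# Hypothetical consumer of the modified OpenAIClient.generate_stream(); not part of the package.
import asyncio
from app.api.openai import OpenAIClient

async def stream_demo() -> None:
    client = await OpenAIClient.create()
    messages = [{"role": "user", "content": "Give three steps to debug a race condition."}]
    # The client decides internally whether to use the Responses API (o-series)
    # or Chat Completions; either way it yields text chunks.
    # The `style` keyword is an assumption inferred from _prepare_messages(messages, style).
    async for chunk in client.generate_stream(messages, "o3-mini", style="concise"):
        print(chunk, end="", flush=True)
    print()

asyncio.run(stream_demo())
```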

{chat_console-0.3.995 → chat_console-0.4.2}/app/config.py

@@ -52,6 +52,31 @@ DEFAULT_CONFIG = {
 "max_tokens": 8192,
 "display_name": "GPT-4"
 },
+"o1": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o1 (Reasoning)"
+},
+"o1-mini": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o1-mini (Reasoning)"
+},
+"o3": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o3 (Reasoning)"
+},
+"o3-mini": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o3-mini (Reasoning)"
+},
+"o4-mini": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o4-mini (Reasoning)"
+},
 # Use the corrected keys from anthropic.py
 "claude-3-opus-20240229": {
 "provider": "anthropic",

@@ -126,7 +151,9 @@ DEFAULT_CONFIG = {
 "max_history_items": 100,
 "highlight_code": True,
 "auto_save": True,
-"generate_dynamic_titles": True
+"generate_dynamic_titles": True,
+"ollama_model_preload": True,
+"ollama_inactive_timeout_minutes": 30
 }

 def validate_config(config):

{chat_console-0.3.995 → chat_console-0.4.2}/app/main.py

@@ -363,7 +363,13 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 self.selected_model = resolve_model_id(default_model_from_config)
 self.selected_style = CONFIG["default_style"] # Keep SimpleChatApp __init__
 self.initial_text = initial_text # Keep SimpleChatApp __init__
-
+
+# Task for model cleanup
+self._model_cleanup_task = None
+
+# Inactivity threshold in minutes before releasing model resources
+# Read from config, default to 30 minutes
+self.MODEL_INACTIVITY_THRESHOLD = CONFIG.get("ollama_inactive_timeout_minutes", 30)

 def compose(self) -> ComposeResult: # Modify SimpleChatApp compose
 """Create the simplified application layout."""

@@ -420,6 +426,11 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 pass # Silently ignore if widget not found yet

 self.update_app_info() # Update the model info
+
+# Start the background task for model cleanup if model preloading is enabled
+if CONFIG.get("ollama_model_preload", True):
+self._model_cleanup_task = asyncio.create_task(self._check_inactive_models())
+debug_log("Started background task for model cleanup")

 # Check API keys and services # Keep SimpleChatApp on_mount
 api_issues = [] # Keep SimpleChatApp on_mount

@@ -675,29 +686,87 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition

 # Determine title client and model based on available keys
 if OPENAI_API_KEY:
+# For highest success rate, use OpenAI for title generation when available
 from app.api.openai import OpenAIClient
 title_client = await OpenAIClient.create()
 title_model = "gpt-3.5-turbo"
 debug_log("Using OpenAI for background title generation")
 elif ANTHROPIC_API_KEY:
+# Next best option is Anthropic
 from app.api.anthropic import AnthropicClient
 title_client = await AnthropicClient.create()
 title_model = "claude-3-haiku-20240307"
 debug_log("Using Anthropic for background title generation")
 else:
 # Fallback to the currently selected model's client if no API keys
+# Get client type first to ensure we correctly identify Ollama models
+from app.api.ollama import OllamaClient
 selected_model_resolved = resolve_model_id(self.selected_model)
-
-
-
+client_type = BaseModelClient.get_client_type_for_model(selected_model_resolved)
+
+# For Ollama models, special handling is required
+if client_type == OllamaClient:
+debug_log(f"Title generation with Ollama model detected: {selected_model_resolved}")
+
+# Try common small/fast models first if they exist
+try:
+# Check if we have any smaller models available for faster title generation
+ollama_client = await OllamaClient.create()
+available_models = await ollama_client.get_available_models()
+small_model_options = ["gemma:2b", "phi3:mini", "llama3:8b", "orca-mini:3b", "phi2"]
+
+small_model_found = False
+for model_name in small_model_options:
+if any(model["id"] == model_name for model in available_models):
+debug_log(f"Found smaller Ollama model for title generation: {model_name}")
+title_model = model_name
+small_model_found = True
+break
+
+if not small_model_found:
+# Use the current model if no smaller models found
+title_model = selected_model_resolved
+debug_log(f"No smaller models found, using current model: {title_model}")
+
+# Always create a fresh client instance to avoid interference with model preloading
+title_client = ollama_client
+debug_log(f"Created dedicated Ollama client for title generation with model: {title_model}")
+except Exception as e:
+debug_log(f"Error finding optimized Ollama model for title generation: {str(e)}")
+# Fallback to standard approach
+title_client = await OllamaClient.create()
+title_model = selected_model_resolved
+else:
+# For other providers, use normal client acquisition
+title_client = await BaseModelClient.get_client_for_model(selected_model_resolved)
+title_model = selected_model_resolved
+debug_log(f"Using selected model's client ({type(title_client).__name__}) for background title generation")

 if not title_client or not title_model:
 raise Exception("Could not determine a client/model for title generation.")

 # Call the utility function
 from app.utils import generate_conversation_title # Import locally if needed
-
-
+
+# Add timeout handling for title generation to prevent hangs
+try:
+# Create a task with timeout
+import asyncio
+title_generation_task = asyncio.create_task(
+generate_conversation_title(content, title_model, title_client)
+)
+
+# Wait for completion with timeout (30 seconds)
+new_title = await asyncio.wait_for(title_generation_task, timeout=30)
+debug_log(f"Background generated title: {new_title}")
+except asyncio.TimeoutError:
+debug_log("Title generation timed out after 30 seconds")
+# Use default title in case of timeout
+new_title = f"Conversation ({datetime.now().strftime('%Y-%m-%d %H:%M')})"
+# Try to cancel the task
+if not title_generation_task.done():
+title_generation_task.cancel()
+debug_log("Cancelled timed out title generation task")

 # Check if title generation returned the default or a real title
 if new_title and not new_title.startswith("Conversation ("):

@@ -718,8 +787,8 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 title_widget.update(new_title)
 self.current_conversation.title = new_title # Update local object too
 log(f"Background title update successful: {new_title}")
-#
-
+# Subtle notification to show title was updated
+self.notify(f"Conversation titled: {new_title}", severity="information", timeout=2)
 else:
 log("Conversation changed before background title update could apply.")
 else:

@@ -1226,6 +1295,94 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 log(f"Stored selected provider: {self.selected_provider} for model: {self.selected_model}")

 self.update_app_info() # Update the displayed model info
+
+# Preload the model if it's an Ollama model and preloading is enabled
+if self.selected_provider == "ollama" and CONFIG.get("ollama_model_preload", True):
+# Start the background task to preload the model
+debug_log(f"Starting background task to preload Ollama model: {self.selected_model}")
+asyncio.create_task(self._preload_ollama_model(self.selected_model))
+
+async def _preload_ollama_model(self, model_id: str) -> None:
+"""Preload an Ollama model in the background"""
+from app.api.ollama import OllamaClient
+
+debug_log(f"Preloading Ollama model: {model_id}")
+# Show a subtle notification to the user
+self.notify("Preparing model for use...", severity="information", timeout=3)
+
+try:
+# Initialize the client
+client = await OllamaClient.create()
+
+# Update the loading indicator to show model loading
+loading = self.query_one("#loading-indicator")
+loading.remove_class("hidden")
+loading.add_class("model-loading")
+loading.update(f"⚙️ Loading Ollama model...")
+
+# Preload the model
+success = await client.preload_model(model_id)
+
+# Hide the loading indicator
+loading.add_class("hidden")
+loading.remove_class("model-loading")
+
+if success:
+debug_log(f"Successfully preloaded model: {model_id}")
+self.notify(f"Model ready for use", severity="success", timeout=2)
+else:
+debug_log(f"Failed to preload model: {model_id}")
+# No need to notify the user about failure - will happen naturally on first use
+except Exception as e:
+debug_log(f"Error preloading model: {str(e)}")
+# Make sure to hide the loading indicator
+try:
+loading = self.query_one("#loading-indicator")
+loading.add_class("hidden")
+loading.remove_class("model-loading")
+except Exception:
+pass
+
+async def _check_inactive_models(self) -> None:
+"""Background task to check for and release inactive models"""
+from app.api.ollama import OllamaClient
+
+# How often to check for inactive models (in seconds)
+CHECK_INTERVAL = 600 # 10 minutes
+
+debug_log(f"Starting inactive model check task with interval {CHECK_INTERVAL}s")
+
+try:
+while True:
+await asyncio.sleep(CHECK_INTERVAL)
+
+debug_log("Checking for inactive models...")
+
+try:
+# Initialize the client
+client = await OllamaClient.create()
+
+# Get the threshold from instance variable
+threshold = getattr(self, "MODEL_INACTIVITY_THRESHOLD", 30)
+
+# Check and release inactive models
+released_models = await client.release_inactive_models(threshold)
+
+if released_models:
+debug_log(f"Released {len(released_models)} inactive models: {released_models}")
+else:
+debug_log("No inactive models to release")
+
+except Exception as e:
+debug_log(f"Error checking for inactive models: {str(e)}")
+# Continue loop even if this check fails
+
+except asyncio.CancelledError:
+debug_log("Model cleanup task cancelled")
+# Normal task cancellation, clean exit
+except Exception as e:
+debug_log(f"Unexpected error in model cleanup task: {str(e)}")
+# Log but don't crash

 def on_style_selector_style_selected(self, event: StyleSelector.StyleSelected) -> None: # Keep SimpleChatApp on_style_selector_style_selected
 """Handle style selection""" # Keep SimpleChatApp on_style_selector_style_selected docstring

{chat_console-0.3.995 → chat_console-0.4.2}/app/utils.py

@@ -32,6 +32,11 @@ async def generate_conversation_title(message: str, model: str, client: Any) ->

 # Try-except the entire function to ensure we always return a title
 try:
+# Check if we're using an Ollama client
+from app.api.ollama import OllamaClient
+is_ollama_client = isinstance(client, OllamaClient)
+debug_log(f"Client is Ollama: {is_ollama_client}")
+
 # Pick a reliable title generation model - prefer OpenAI if available
 from app.config import OPENAI_API_KEY, ANTHROPIC_API_KEY

@@ -46,10 +51,16 @@ async def generate_conversation_title(message: str, model: str, client: Any) ->
 title_model = "claude-3-haiku-20240307"
 debug_log("Using Anthropic for title generation")
 else:
-#
-
-
-
+# For Ollama clients, ensure we have a clean instance to avoid conflicts with preloaded models
+if is_ollama_client:
+debug_log("Creating fresh Ollama client instance for title generation")
+title_client = await OllamaClient.create()
+title_model = model
+else:
+# Use the passed client for other providers
+title_client = client
+title_model = model
+debug_log(f"Using {type(title_client).__name__} for title generation with model {title_model}")

 # Create a special prompt for title generation
 title_prompt = [

@@ -65,12 +76,25 @@ async def generate_conversation_title(message: str, model: str, client: Any) ->

 # Generate title
 debug_log(f"Sending title generation request to {title_model}")
-
-
-
-
-
-
+
+# Check if this is a reasoning model (o-series)
+is_reasoning_model = title_model.startswith(("o1", "o3", "o4")) or title_model in ["o1", "o3", "o4-mini"]
+
+if is_reasoning_model:
+# For reasoning models, don't include temperature
+title = await title_client.generate_completion(
+messages=title_prompt,
+model=title_model,
+max_tokens=60
+)
+else:
+# For non-reasoning models, include temperature
+title = await title_client.generate_completion(
+messages=title_prompt,
+model=title_model,
+temperature=0.7,
+max_tokens=60
+)

 # Sanitize the title
 title = title.strip().strip('"\'').strip()

@@ -755,7 +779,13 @@ def resolve_model_id(model_id_or_name: str) -> str:
 "35-turbo": "gpt-3.5-turbo",
 "35": "gpt-3.5-turbo",
 "4.1-mini": "gpt-4.1-mini", # Add support for gpt-4.1-mini
-"4.1": "gpt-4.1" # Add support for gpt-4.1
+"4.1": "gpt-4.1", # Add support for gpt-4.1
+# Add support for reasoning models
+"o1": "o1",
+"o1-mini": "o1-mini",
+"o3": "o3",
+"o3-mini": "o3-mini",
+"o4-mini": "o4-mini"
 }

 if input_lower in openai_model_aliases:

@@ -765,23 +795,26 @@ def resolve_model_id(model_id_or_name: str) -> str:

 # Special case handling for common typos and model name variations
 typo_corrections = {
-
-"
-"o1
-"o1-
-"
-"o4
-"o4-
+# Keep reasoning models as-is, don't convert 'o' to '0'
+# "o4-mini": "04-mini",
+# "o1": "01",
+# "o1-mini": "01-mini",
+# "o1-preview": "01-preview",
+# "o4": "04",
+# "o4-preview": "04-preview",
+# "o4-vision": "04-vision"
 }

+# Don't convert reasoning model IDs that start with 'o'
 # Check for more complex typo patterns with dates
-if input_lower.startswith("o1-") and "-202" in input_lower:
+if input_lower.startswith("o1-") and "-202" in input_lower and not any(input_lower == model_id for model_id in ["o1", "o1-mini", "o3", "o3-mini", "o4-mini"]):
 corrected = "01" + input_lower[2:]
 logger.info(f"Converting '{input_lower}' to '{corrected}' (letter 'o' to zero '0')")
 input_lower = corrected
 model_id_or_name = corrected

-if
+# Only apply typo corrections if not a reasoning model
+if input_lower in typo_corrections and not any(input_lower == model_id for model_id in ["o1", "o1-mini", "o3", "o3-mini", "o4-mini"]):
 corrected = typo_corrections[input_lower]
 logger.info(f"Converting '{input_lower}' to '{corrected}' (letter 'o' to zero '0')")
 input_lower = corrected
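
To close out the utils changes, a sketch of calling the title helper directly with the same 30-second guard that main.py wraps around it (hypothetical snippet, not from the package; the message text and model name are made up):

```python
# Hypothetical direct use of generate_conversation_title(); not part of the package.
import asyncio
from app.api.ollama import OllamaClient
from app.utils import generate_conversation_title

async def title_demo() -> None:
    client = await OllamaClient.create()
    try:
        # Same 30s ceiling the app applies so a slow local model cannot hang the UI.
        title = await asyncio.wait_for(
            generate_conversation_title("How do I profile asyncio code?", "llama3:8b", client),
            timeout=30,
        )
    except asyncio.TimeoutError:
        title = "Untitled conversation"  # the app falls back to a timestamped default instead
    print(title)

asyncio.run(title_demo())
```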

{chat_console-0.3.995 → chat_console-0.4.2/chat_console.egg-info}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: chat-console
-Version: 0.3.995
+Version: 0.4.2
 Summary: A command-line interface for chatting with LLMs, storing chats and (future) rag interactions
 Home-page: https://github.com/wazacraftrfid/chat-console
 Author: Johnathan Greenaway

@@ -28,7 +28,8 @@ Dynamic: requires-dist
 Dynamic: requires-python
 Dynamic: summary

-
+
+# Chat CLI

 A comprehensive command-line interface for chatting with various AI language models. This application allows you to interact with different LLM providers through an intuitive terminal-based interface.

@@ -37,6 +38,7 @@ A comprehensive command-line interface for chatting with various AI language mod
 - Interactive terminal UI with Textual library
 - Support for multiple AI models:
   - OpenAI models (GPT-3.5, GPT-4)
+  - OpenAI reasoning models (o1, o1-mini, o3, o3-mini, o4-mini)
   - Anthropic models (Claude 3 Opus, Sonnet, Haiku)
 - Conversation history with search functionality
 - Customizable response styles (concise, detailed, technical, friendly)

@@ -71,6 +73,26 @@ Run the application:
 chat-cli
 ```

+### Testing Reasoning Models
+
+To test the OpenAI reasoning models implementation, you can use the included test script:
+```
+./test_reasoning.py
+```
+
+This script will test both completion and streaming with the available reasoning models.
+
+### About OpenAI Reasoning Models
+
+OpenAI's reasoning models (o1, o3, o4-mini, etc.) are LLMs trained with reinforcement learning to perform reasoning. These models:
+
+- Think before they answer, producing a long internal chain of thought
+- Excel in complex problem solving, coding, scientific reasoning, and multi-step planning
+- Use "reasoning tokens" to work through problems step by step before providing a response
+- Support different reasoning effort levels (low, medium, high)
+
+The implementation in this CLI supports both standard completions and streaming with these models.
+
 ### Keyboard Shortcuts

 - `q` - Quit the application