chat-console 0.3.995__tar.gz → 0.4.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {chat_console-0.3.995/chat_console.egg-info → chat_console-0.4.2}/PKG-INFO +24 -2
- {chat_console-0.3.995 → chat_console-0.4.2}/README.md +23 -1
- {chat_console-0.3.995 → chat_console-0.4.2}/app/__init__.py +1 -1
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/base.py +4 -4
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/ollama.py +242 -4
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/openai.py +106 -36
- {chat_console-0.3.995 → chat_console-0.4.2}/app/config.py +28 -1
- {chat_console-0.3.995 → chat_console-0.4.2}/app/main.py +165 -8
- {chat_console-0.3.995 → chat_console-0.4.2}/app/utils.py +53 -20
- {chat_console-0.3.995 → chat_console-0.4.2/chat_console.egg-info}/PKG-INFO +24 -2
- {chat_console-0.3.995 → chat_console-0.4.2}/LICENSE +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/__init__.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/api/anthropic.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/database.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/models.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/__init__.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/chat_interface.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/chat_list.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/model_browser.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/model_selector.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/search.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/app/ui/styles.py +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/SOURCES.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/dependency_links.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/entry_points.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/requires.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/chat_console.egg-info/top_level.txt +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/setup.cfg +0 -0
- {chat_console-0.3.995 → chat_console-0.4.2}/setup.py +0 -0

{chat_console-0.3.995/chat_console.egg-info → chat_console-0.4.2}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: chat-console
-Version: 0.3.995
+Version: 0.4.2
 Summary: A command-line interface for chatting with LLMs, storing chats and (future) rag interactions
 Home-page: https://github.com/wazacraftrfid/chat-console
 Author: Johnathan Greenaway

@@ -28,7 +28,8 @@ Dynamic: requires-dist
 Dynamic: requires-python
 Dynamic: summary

-
+
+# Chat CLI

 A comprehensive command-line interface for chatting with various AI language models. This application allows you to interact with different LLM providers through an intuitive terminal-based interface.

@@ -37,6 +38,7 @@ A comprehensive command-line interface for chatting with various AI language mod
 - Interactive terminal UI with Textual library
 - Support for multiple AI models:
   - OpenAI models (GPT-3.5, GPT-4)
+  - OpenAI reasoning models (o1, o1-mini, o3, o3-mini, o4-mini)
   - Anthropic models (Claude 3 Opus, Sonnet, Haiku)
 - Conversation history with search functionality
 - Customizable response styles (concise, detailed, technical, friendly)

@@ -71,6 +73,26 @@ Run the application:
 chat-cli
 ```

+### Testing Reasoning Models
+
+To test the OpenAI reasoning models implementation, you can use the included test script:
+```
+./test_reasoning.py
+```
+
+This script will test both completion and streaming with the available reasoning models.
+
+### About OpenAI Reasoning Models
+
+OpenAI's reasoning models (o1, o3, o4-mini, etc.) are LLMs trained with reinforcement learning to perform reasoning. These models:
+
+- Think before they answer, producing a long internal chain of thought
+- Excel in complex problem solving, coding, scientific reasoning, and multi-step planning
+- Use "reasoning tokens" to work through problems step by step before providing a response
+- Support different reasoning effort levels (low, medium, high)
+
+The implementation in this CLI supports both standard completions and streaming with these models.
+
 ### Keyboard Shortcuts

 - `q` - Quit the application

{chat_console-0.3.995 → chat_console-0.4.2}/README.md

@@ -1,4 +1,5 @@
-
+
+# Chat CLI

 A comprehensive command-line interface for chatting with various AI language models. This application allows you to interact with different LLM providers through an intuitive terminal-based interface.

@@ -7,6 +8,7 @@ A comprehensive command-line interface for chatting with various AI language mod
 - Interactive terminal UI with Textual library
 - Support for multiple AI models:
   - OpenAI models (GPT-3.5, GPT-4)
+  - OpenAI reasoning models (o1, o1-mini, o3, o3-mini, o4-mini)
   - Anthropic models (Claude 3 Opus, Sonnet, Haiku)
 - Conversation history with search functionality
 - Customizable response styles (concise, detailed, technical, friendly)

@@ -41,6 +43,26 @@ Run the application:
 chat-cli
 ```

+### Testing Reasoning Models
+
+To test the OpenAI reasoning models implementation, you can use the included test script:
+```
+./test_reasoning.py
+```
+
+This script will test both completion and streaming with the available reasoning models.
+
+### About OpenAI Reasoning Models
+
+OpenAI's reasoning models (o1, o3, o4-mini, etc.) are LLMs trained with reinforcement learning to perform reasoning. These models:
+
+- Think before they answer, producing a long internal chain of thought
+- Excel in complex problem solving, coding, scientific reasoning, and multi-step planning
+- Use "reasoning tokens" to work through problems step by step before providing a response
+- Support different reasoning effort levels (low, medium, high)
+
+The implementation in this CLI supports both standard completions and streaming with these models.
+
 ### Keyboard Shortcuts

 - `q` - Quit the application
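
For orientation before the code diffs below: a minimal sketch of what a reasoning-model request looks like through the OpenAI Responses API, which is the path the new `app/api/openai.py` code takes for o-series models. This snippet is illustrative only and not part of the package; it assumes the official `openai` Python SDK (1.x, with Responses API support) and an `OPENAI_API_KEY` in the environment. The parameter names (`input`, `reasoning`, `max_output_tokens`) and `response.output_text` mirror the ones used in the diffs.

```python
# Illustrative sketch, not part of the package. Assumes the official `openai`
# Python SDK with Responses API support and OPENAI_API_KEY in the environment.
import asyncio
from openai import AsyncOpenAI

async def ask_reasoning_model(prompt: str) -> str:
    client = AsyncOpenAI()
    response = await client.responses.create(
        model="o4-mini",                     # any o-series reasoning model
        input=[{"role": "user", "content": prompt}],
        reasoning={"effort": "medium"},      # low | medium | high
        max_output_tokens=256,
    )
    return response.output_text             # concatenated text output

if __name__ == "__main__":
    print(asyncio.run(ask_reasoning_model("Outline a plan to bisect a flaky test.")))
```

Note that temperature is deliberately omitted on this path; as the diffs below show, the reasoning branch drops it in favor of the effort setting.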

{chat_console-0.3.995 → chat_console-0.4.2}/app/api/base.py

@@ -82,9 +82,9 @@ class BaseModelClient(ABC):

 # If we couldn't get the provider from the UI, infer it from the model name
 # Check for common OpenAI model patterns or prefixes
-if (model_name_lower.startswith(("gpt-", "text-", "davinci")) or
+if (model_name_lower.startswith(("gpt-", "text-", "davinci", "o1", "o3", "o4")) or
 "gpt" in model_name_lower or
-model_name_lower in ["04-mini", "04", "04-turbo", "04-vision"]):
+model_name_lower in ["04-mini", "04", "04-turbo", "04-vision", "o1", "o3", "o4-mini"]):
 provider = "openai"
 logger.info(f"Identified {model_name} as an OpenAI model")
 # Then check for Anthropic models - these should ALWAYS use Anthropic client

@@ -162,9 +162,9 @@ class BaseModelClient(ABC):
 # If we couldn't get the provider from the UI, infer it from the model name
 if not provider:
 # Check for common OpenAI model patterns or prefixes
-if (model_name_lower.startswith(("gpt-", "text-", "davinci")) or
+if (model_name_lower.startswith(("gpt-", "text-", "davinci", "o1", "o3", "o4")) or
 "gpt" in model_name_lower or
-model_name_lower in ["04-mini", "04", "04-turbo", "04-vision"]):
+model_name_lower in ["04-mini", "04", "04-turbo", "04-vision", "o1", "o3", "o4-mini"]):
 if not AVAILABLE_PROVIDERS["openai"]:
 raise Exception("OpenAI API key not found. Please set OPENAI_API_KEY environment variable.")
 provider = "openai"

{chat_console-0.3.995 → chat_console-0.4.2}/app/api/ollama.py

@@ -31,8 +31,96 @@ class OllamaClient(BaseModelClient):
 # Track model loading state
 self._model_loading = False

+# Track preloaded models and their last use timestamp
+self._preloaded_models = {}
+
+# Default timeout values (in seconds)
+self.DEFAULT_TIMEOUT = 30
+self.MODEL_LOAD_TIMEOUT = 120
+self.MODEL_PULL_TIMEOUT = 3600 # 1 hour for large models
+
 # Path to the cached models file
 self.models_cache_path = Path(__file__).parent.parent / "data" / "ollama-models.json"
+
+def get_timeout_for_model(self, model_id: str, operation: str = "generate") -> int:
+"""
+Calculate an appropriate timeout based on model size
+
+Parameters:
+- model_id: The model identifier
+- operation: The operation type ('generate', 'load', 'pull')
+
+Returns:
+- Timeout in seconds
+"""
+# Default timeouts by operation
+default_timeouts = {
+"generate": self.DEFAULT_TIMEOUT, # 30s
+"load": self.MODEL_LOAD_TIMEOUT, # 2min
+"pull": self.MODEL_PULL_TIMEOUT, # 1h
+"list": 5, # 5s
+"test": 2 # 2s
+}
+
+# Parameter size multipliers
+size_multipliers = {
+# For models < 3B
+"1b": 0.5,
+"2b": 0.7,
+"3b": 1.0,
+# For models 3B-10B
+"5b": 1.2,
+"6b": 1.3,
+"7b": 1.5,
+"8b": 1.7,
+"9b": 1.8,
+# For models 10B-20B
+"13b": 2.0,
+"14b": 2.0,
+# For models 20B-50B
+"27b": 3.0,
+"34b": 3.5,
+"40b": 4.0,
+# For models 50B+
+"70b": 5.0,
+"80b": 6.0,
+"100b": 7.0,
+"400b": 10.0,
+"405b": 10.0,
+}
+
+# Get the base timeout for the operation
+base_timeout = default_timeouts.get(operation, self.DEFAULT_TIMEOUT)
+
+# Try to determine the model size from the model ID
+model_size = "7b" # Default assumption is 7B parameters
+model_lower = model_id.lower()
+
+# Check for size indicators in the model name
+for size in size_multipliers.keys():
+if size in model_lower:
+model_size = size
+break
+
+# If it's a known large model without size in name
+if "llama3.1" in model_lower and not any(size in model_lower for size in size_multipliers.keys()):
+model_size = "8b" # Default for llama3.1 without size specified
+
+# For first generation after model selection, if preloaded, use shorter timeout
+if operation == "generate" and model_id in self._preloaded_models:
+# For preloaded models, use a shorter timeout
+return max(int(base_timeout * 0.7), 20) # Min 20 seconds
+
+# Calculate final timeout with multiplier
+multiplier = size_multipliers.get(model_size, 1.0)
+timeout = int(base_timeout * multiplier)
+
+# For pull operation, ensure we have a reasonable maximum
+if operation == "pull":
+return min(timeout, 7200) # Max 2 hours
+
+logger.info(f"Calculated timeout for {model_id} ({operation}): {timeout}s (base: {base_timeout}s, multiplier: {multiplier})")
+return timeout

 @classmethod
 async def create(cls) -> 'OllamaClient':

@@ -61,7 +149,29 @@ class OllamaClient(BaseModelClient):
 style_instructions = self._get_style_instructions(style)
 debug_log(f"Adding style instructions: {style_instructions[:50]}...")
 formatted_messages.append(style_instructions)
+
+# Special case for title generation - check if this is a title generation message
+is_title_generation = False
+for msg in messages:
+if msg.get("role") == "system" and "generate a brief, descriptive title" in msg.get("content", "").lower():
+is_title_generation = True
+debug_log("Detected title generation prompt")
+break
+
+# For title generation, use a direct approach
+if is_title_generation:
+debug_log("Using specialized formatting for title generation")
+# Find the user message containing the input for title generation
+user_msg = next((msg for msg in messages if msg.get("role") == "user"), None)
+if user_msg and "content" in user_msg:
+# Create a direct prompt
+prompt = "Generate a short descriptive title (maximum 40 characters) for this conversation. ONLY RESPOND WITH THE TITLE FOR THE FOLLOWING MESSAGE:\n\n" + user_msg["content"]
+debug_log(f"Created title generation prompt: {prompt[:100]}...")
+return prompt
+else:
+debug_log("Could not find user message for title generation, using standard formatting")

+# Standard processing for normal chat messages
 # Add message content, preserving conversation flow
 for i, msg in enumerate(messages):
 try:

@@ -185,6 +295,7 @@ class OllamaClient(BaseModelClient):
 try:
 async with aiohttp.ClientSession() as session:
 logger.debug(f"Sending request to {self.base_url}/api/generate")
+gen_timeout = self.get_timeout_for_model(model, "generate")
 async with session.post(
 f"{self.base_url}/api/generate",
 json={

@@ -193,12 +304,16 @@ class OllamaClient(BaseModelClient):
 "temperature": temperature,
 "stream": False
 },
-timeout=
+timeout=gen_timeout
 ) as response:
 response.raise_for_status()
 data = await response.json()
 if "response" not in data:
 raise Exception("Invalid response format from Ollama server")
+
+# Update the model usage timestamp to keep it hot
+self.update_model_usage(model)
+
 return data["response"]

 except aiohttp.ClientConnectorError:

@@ -324,10 +439,11 @@ class OllamaClient(BaseModelClient):
 "stream": False
 }

+test_timeout = self.get_timeout_for_model(model, "test")
 async with session.post(
 f"{self.base_url}/api/generate",
 json=test_payload,
-timeout=
+timeout=test_timeout
 ) as response:
 if response.status != 200:
 logger.warning(f"Model test request failed with status {response.status}")

@@ -361,10 +477,11 @@ class OllamaClient(BaseModelClient):
 debug_log(f"Error preparing pull payload: {str(pull_err)}, using default")
 pull_payload = {"name": "gemma:2b"} # Safe default

+pull_timeout = self.get_timeout_for_model(model, "pull")
 async with session.post(
 f"{self.base_url}/api/pull",
 json=pull_payload,
-timeout=
+timeout=pull_timeout
 ) as pull_response:
 if pull_response.status != 200:
 logger.error("Failed to pull model")

@@ -415,10 +532,11 @@ class OllamaClient(BaseModelClient):
 }

 debug_log(f"Sending request to Ollama API")
+gen_timeout = self.get_timeout_for_model(model, "generate")
 response = await session.post(
 f"{self.base_url}/api/generate",
 json=request_payload,
-timeout=
+timeout=gen_timeout
 )
 response.raise_for_status()
 debug_log(f"Response status: {response.status}")

@@ -426,6 +544,9 @@ class OllamaClient(BaseModelClient):
 # Use a simpler async iteration pattern that's less error-prone
 debug_log("Starting to process response stream")

+# Update the model usage timestamp to keep it hot
+self.update_model_usage(model)
+
 # Set a flag to track if we've yielded any content
 has_yielded_content = False

@@ -535,6 +656,123 @@ class OllamaClient(BaseModelClient):
 def is_loading_model(self) -> bool:
 """Check if Ollama is currently loading a model"""
 return self._model_loading
+
+async def preload_model(self, model_id: str) -> bool:
+"""
+Preload a model to keep it hot/ready for use
+Returns True if successful, False otherwise
+"""
+from datetime import datetime
+import asyncio
+
+logger.info(f"Preloading model: {model_id}")
+
+# First, check if the model is already preloaded
+if model_id in self._preloaded_models:
+# Update timestamp if already preloaded
+self._preloaded_models[model_id] = datetime.now()
+logger.info(f"Model {model_id} already preloaded, updated timestamp")
+return True
+
+try:
+# We'll use a minimal prompt to load the model
+warm_up_prompt = "hello"
+
+# Set model loading state
+old_loading_state = self._model_loading
+self._model_loading = True
+
+async with aiohttp.ClientSession() as session:
+# First try pulling the model if needed
+try:
+logger.info(f"Ensuring model {model_id} is pulled")
+pull_payload = {"name": model_id}
+pull_timeout = self.get_timeout_for_model(model_id, "pull")
+async with session.post(
+f"{self.base_url}/api/pull",
+json=pull_payload,
+timeout=pull_timeout
+) as pull_response:
+# We don't need to process the full pull, just initiate it
+if pull_response.status != 200:
+logger.warning(f"Pull request for model {model_id} failed with status {pull_response.status}")
+except Exception as e:
+logger.warning(f"Error during model pull check: {str(e)}")
+
+# Now send a small generation request to load the model into memory
+logger.info(f"Sending warm-up request for model {model_id}")
+gen_timeout = self.get_timeout_for_model(model_id, "load")
+async with session.post(
+f"{self.base_url}/api/generate",
+json={
+"model": model_id,
+"prompt": warm_up_prompt,
+"temperature": 0.7,
+"stream": False
+},
+timeout=gen_timeout
+) as response:
+if response.status != 200:
+logger.error(f"Failed to preload model {model_id}, status: {response.status}")
+self._model_loading = old_loading_state
+return False
+
+# Read the response to ensure the model is fully loaded
+await response.json()
+
+# Update preloaded models with timestamp
+self._preloaded_models[model_id] = datetime.now()
+logger.info(f"Successfully preloaded model {model_id}")
+return True
+except Exception as e:
+logger.error(f"Error preloading model {model_id}: {str(e)}")
+return False
+finally:
+# Reset model loading state
+self._model_loading = old_loading_state
+
+def get_preloaded_models(self) -> Dict[str, datetime]:
+"""Return the dict of preloaded models and their last use times"""
+return self._preloaded_models
+
+def update_model_usage(self, model_id: str) -> None:
+"""Update the timestamp for a model that is being used"""
+if model_id and model_id in self._preloaded_models:
+from datetime import datetime
+self._preloaded_models[model_id] = datetime.now()
+logger.info(f"Updated usage timestamp for model {model_id}")
+
+async def release_inactive_models(self, max_inactive_minutes: int = 30) -> List[str]:
+"""
+Release models that have been inactive for more than the specified time
+Returns a list of model IDs that were released
+"""
+from datetime import datetime, timedelta
+
+if not self._preloaded_models:
+return []
+
+now = datetime.now()
+inactive_threshold = timedelta(minutes=max_inactive_minutes)
+models_to_release = []
+
+# Find models that have been inactive for too long
+for model_id, last_used in list(self._preloaded_models.items()):
+if now - last_used > inactive_threshold:
+models_to_release.append(model_id)
+
+# Release the models
+released_models = []
+for model_id in models_to_release:
+try:
+logger.info(f"Releasing inactive model: {model_id} (inactive for {(now - self._preloaded_models[model_id]).total_seconds() / 60:.1f} minutes)")
+# We don't have an explicit "unload" API in Ollama, but we can remove it from our tracking
+del self._preloaded_models[model_id]
+released_models.append(model_id)
+except Exception as e:
+logger.error(f"Error releasing model {model_id}: {str(e)}")
+
+return released_models

 async def get_model_details(self, model_id: str) -> Dict[str, Any]:
 """Get detailed information about a specific Ollama model"""
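
A usage sketch of how the new `OllamaClient` pieces fit together (hypothetical calling code, not from the package; the model names are examples): preloading warms a model, generation timeouts scale with the parameter-count suffix parsed from the model name, and idle models are dropped from tracking after the configured threshold.

```python
# Hypothetical calling code based on the methods added above; not part of the package.
import asyncio
from app.api.ollama import OllamaClient

async def demo() -> None:
    client = await OllamaClient.create()

    # Warm the model so the first real generation takes the shorter preloaded-timeout path.
    await client.preload_model("llama3:8b")

    # Timeouts are derived from the size suffix in the model name.
    print(client.get_timeout_for_model("llama3:8b", "generate"))  # preloaded: max(30 * 0.7, 20) = 21s
    print(client.get_timeout_for_model("llama2:70b", "pull"))     # 3600s * 5.0, capped at 7200s

    # Models idle longer than the threshold are simply forgotten by the tracker.
    released = await client.release_inactive_models(max_inactive_minutes=30)
    print("released:", released)

asyncio.run(demo())
```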

{chat_console-0.3.995 → chat_console-0.4.2}/app/api/openai.py

@@ -53,20 +53,38 @@ class OpenAIClient(BaseModelClient):
 """Generate a text completion using OpenAI"""
 processed_messages = self._prepare_messages(messages, style)

-#
-
-"model": model,
-"messages": processed_messages,
-"temperature": temperature,
-}
-
-# Only add max_tokens if it's not None
-if max_tokens is not None:
-params["max_tokens"] = max_tokens
-
-response = await self.client.chat.completions.create(**params)
+# Check if this is a reasoning model (o-series)
+is_reasoning_model = model.startswith(("o1", "o3", "o4")) or model in ["o1", "o3", "o4-mini"]

-
+# Use the Responses API for reasoning models
+if is_reasoning_model:
+# Create parameters dict for the Responses API
+params = {
+"model": model,
+"input": processed_messages,
+"reasoning": {"effort": "medium"}, # Default to medium effort
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_output_tokens"] = max_tokens
+
+response = await self.client.responses.create(**params)
+return response.output_text
+else:
+# Use the Chat Completions API for non-reasoning models
+params = {
+"model": model,
+"messages": processed_messages,
+"temperature": temperature,
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_tokens"] = max_tokens
+
+response = await self.client.chat.completions.create(**params)
+return response.choices[0].message.content

 async def generate_stream(self, messages: List[Dict[str, str]],
 model: str,

@@ -83,6 +101,9 @@ class OpenAIClient(BaseModelClient):

 processed_messages = self._prepare_messages(messages, style)

+# Check if this is a reasoning model (o-series)
+is_reasoning_model = model.startswith(("o1", "o3", "o4")) or model in ["o1", "o3", "o4-mini"]
+
 try:
 debug_log(f"OpenAI: preparing {len(processed_messages)} messages for stream")

@@ -119,20 +140,37 @@ class OpenAIClient(BaseModelClient):

 while retry_count <= max_retries:
 try:
-# Create parameters dict
-
-
-
-
-
-
-
-
-
-
-
-
+# Create parameters dict based on model type
+if is_reasoning_model:
+# Use the Responses API for reasoning models
+params = {
+"model": model,
+"input": api_messages,
+"reasoning": {"effort": "medium"}, # Default to medium effort
+"stream": True,
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_output_tokens"] = max_tokens
+
+debug_log(f"OpenAI: creating reasoning model stream with params: {params}")
+stream = await self.client.responses.create(**params)
+else:
+# Use the Chat Completions API for non-reasoning models
+params = {
+"model": model,
+"messages": api_messages,
+"temperature": temperature,
+"stream": True,
+}
+
+# Only add max_tokens if it's not None
+if max_tokens is not None:
+params["max_tokens"] = max_tokens
+
+debug_log(f"OpenAI: creating chat completion stream with params: {params}")
+stream = await self.client.chat.completions.create(**params)

 # Store the stream for potential cancellation
 self._active_stream = stream

@@ -157,17 +195,28 @@ class OpenAIClient(BaseModelClient):

 chunk_count += 1
 try:
-
-
-
-
-text = str(
-debug_log(f"OpenAI: yielding chunk {chunk_count} of length: {len(text)}")
+# Handle different response formats based on model type
+if is_reasoning_model:
+# For reasoning models using the Responses API
+if hasattr(chunk, 'output_text') and chunk.output_text is not None:
+text = str(chunk.output_text)
+debug_log(f"OpenAI reasoning: yielding chunk {chunk_count} of length: {len(text)}")
 yield text
 else:
-debug_log(f"OpenAI: skipping
+debug_log(f"OpenAI reasoning: skipping chunk {chunk_count} with missing content")
 else:
-
+# For regular models using the Chat Completions API
+if chunk.choices and hasattr(chunk.choices[0], 'delta') and hasattr(chunk.choices[0].delta, 'content'):
+content = chunk.choices[0].delta.content
+if content is not None:
+# Ensure we're returning a string
+text = str(content)
+debug_log(f"OpenAI: yielding chunk {chunk_count} of length: {len(text)}")
+yield text
+else:
+debug_log(f"OpenAI: skipping None content chunk {chunk_count}")
+else:
+debug_log(f"OpenAI: skipping chunk {chunk_count} with missing content")
 except Exception as chunk_error:
 debug_log(f"OpenAI: error processing chunk {chunk_count}: {str(chunk_error)}")
 # Skip problematic chunks but continue processing

@@ -221,11 +270,32 @@ class OpenAIClient(BaseModelClient):
 for model in models_response.data:
 # Use 'id' as both id and name for now; can enhance with more info if needed
 models.append({"id": model.id, "name": model.id})
+
+# Add reasoning models which might not be in the models list
+reasoning_models = [
+{"id": "o1", "name": "o1 (Reasoning)"},
+{"id": "o1-mini", "name": "o1-mini (Reasoning)"},
+{"id": "o3", "name": "o3 (Reasoning)"},
+{"id": "o3-mini", "name": "o3-mini (Reasoning)"},
+{"id": "o4-mini", "name": "o4-mini (Reasoning)"}
+]
+
+# Add reasoning models if they're not already in the list
+existing_ids = {model["id"] for model in models}
+for reasoning_model in reasoning_models:
+if reasoning_model["id"] not in existing_ids:
+models.append(reasoning_model)
+
 return models
 except Exception as e:
 # Fallback to a static list if API call fails
 return [
 {"id": "gpt-3.5-turbo", "name": "gpt-3.5-turbo"},
 {"id": "gpt-4", "name": "gpt-4"},
-{"id": "gpt-4-turbo", "name": "gpt-4-turbo"}
+{"id": "gpt-4-turbo", "name": "gpt-4-turbo"},
+{"id": "o1", "name": "o1 (Reasoning)"},
+{"id": "o1-mini", "name": "o1-mini (Reasoning)"},
+{"id": "o3", "name": "o3 (Reasoning)"},
+{"id": "o3-mini", "name": "o3-mini (Reasoning)"},
+{"id": "o4-mini", "name": "o4-mini (Reasoning)"}
 ]
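
A consumer-side sketch of the streaming change (hypothetical snippet, not from the package): whichever branch `generate_stream()` takes internally — Responses API chunks with `output_text` for o-series models, Chat Completions deltas otherwise — the caller still just iterates over plain strings. The keyword arguments beyond `messages` and `model` are assumptions based on the `_prepare_messages(messages, style)` call visible in the diff.

```python
# Hypothetical consumer of the modified OpenAIClient.generate_stream(); not part of the package.
import asyncio
from app.api.openai import OpenAIClient

async def stream_demo() -> None:
    client = await OpenAIClient.create()
    messages = [{"role": "user", "content": "Give three steps to debug a race condition."}]
    # The client decides internally whether to use the Responses API (o-series)
    # or Chat Completions; either way it yields text chunks.
    # The `style` keyword is an assumption inferred from _prepare_messages(messages, style).
    async for chunk in client.generate_stream(messages, "o3-mini", style="concise"):
        print(chunk, end="", flush=True)
    print()

asyncio.run(stream_demo())
```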

{chat_console-0.3.995 → chat_console-0.4.2}/app/config.py

@@ -52,6 +52,31 @@ DEFAULT_CONFIG = {
 "max_tokens": 8192,
 "display_name": "GPT-4"
 },
+"o1": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o1 (Reasoning)"
+},
+"o1-mini": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o1-mini (Reasoning)"
+},
+"o3": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o3 (Reasoning)"
+},
+"o3-mini": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o3-mini (Reasoning)"
+},
+"o4-mini": {
+"provider": "openai",
+"max_tokens": 128000,
+"display_name": "o4-mini (Reasoning)"
+},
 # Use the corrected keys from anthropic.py
 "claude-3-opus-20240229": {
 "provider": "anthropic",

@@ -126,7 +151,9 @@ DEFAULT_CONFIG = {
 "max_history_items": 100,
 "highlight_code": True,
 "auto_save": True,
-"generate_dynamic_titles": True
+"generate_dynamic_titles": True,
+"ollama_model_preload": True,
+"ollama_inactive_timeout_minutes": 30
 }

 def validate_config(config):

{chat_console-0.3.995 → chat_console-0.4.2}/app/main.py

@@ -363,7 +363,13 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 self.selected_model = resolve_model_id(default_model_from_config)
 self.selected_style = CONFIG["default_style"] # Keep SimpleChatApp __init__
 self.initial_text = initial_text # Keep SimpleChatApp __init__
-
+
+# Task for model cleanup
+self._model_cleanup_task = None
+
+# Inactivity threshold in minutes before releasing model resources
+# Read from config, default to 30 minutes
+self.MODEL_INACTIVITY_THRESHOLD = CONFIG.get("ollama_inactive_timeout_minutes", 30)

 def compose(self) -> ComposeResult: # Modify SimpleChatApp compose
 """Create the simplified application layout."""

@@ -420,6 +426,11 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 pass # Silently ignore if widget not found yet

 self.update_app_info() # Update the model info
+
+# Start the background task for model cleanup if model preloading is enabled
+if CONFIG.get("ollama_model_preload", True):
+self._model_cleanup_task = asyncio.create_task(self._check_inactive_models())
+debug_log("Started background task for model cleanup")

 # Check API keys and services # Keep SimpleChatApp on_mount
 api_issues = [] # Keep SimpleChatApp on_mount

@@ -675,29 +686,87 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition

 # Determine title client and model based on available keys
 if OPENAI_API_KEY:
+# For highest success rate, use OpenAI for title generation when available
 from app.api.openai import OpenAIClient
 title_client = await OpenAIClient.create()
 title_model = "gpt-3.5-turbo"
 debug_log("Using OpenAI for background title generation")
 elif ANTHROPIC_API_KEY:
+# Next best option is Anthropic
 from app.api.anthropic import AnthropicClient
 title_client = await AnthropicClient.create()
 title_model = "claude-3-haiku-20240307"
 debug_log("Using Anthropic for background title generation")
 else:
 # Fallback to the currently selected model's client if no API keys
+# Get client type first to ensure we correctly identify Ollama models
+from app.api.ollama import OllamaClient
 selected_model_resolved = resolve_model_id(self.selected_model)
-
-
-
+client_type = BaseModelClient.get_client_type_for_model(selected_model_resolved)
+
+# For Ollama models, special handling is required
+if client_type == OllamaClient:
+debug_log(f"Title generation with Ollama model detected: {selected_model_resolved}")
+
+# Try common small/fast models first if they exist
+try:
+# Check if we have any smaller models available for faster title generation
+ollama_client = await OllamaClient.create()
+available_models = await ollama_client.get_available_models()
+small_model_options = ["gemma:2b", "phi3:mini", "llama3:8b", "orca-mini:3b", "phi2"]
+
+small_model_found = False
+for model_name in small_model_options:
+if any(model["id"] == model_name for model in available_models):
+debug_log(f"Found smaller Ollama model for title generation: {model_name}")
+title_model = model_name
+small_model_found = True
+break
+
+if not small_model_found:
+# Use the current model if no smaller models found
+title_model = selected_model_resolved
+debug_log(f"No smaller models found, using current model: {title_model}")
+
+# Always create a fresh client instance to avoid interference with model preloading
+title_client = ollama_client
+debug_log(f"Created dedicated Ollama client for title generation with model: {title_model}")
+except Exception as e:
+debug_log(f"Error finding optimized Ollama model for title generation: {str(e)}")
+# Fallback to standard approach
+title_client = await OllamaClient.create()
+title_model = selected_model_resolved
+else:
+# For other providers, use normal client acquisition
+title_client = await BaseModelClient.get_client_for_model(selected_model_resolved)
+title_model = selected_model_resolved
+debug_log(f"Using selected model's client ({type(title_client).__name__}) for background title generation")

 if not title_client or not title_model:
 raise Exception("Could not determine a client/model for title generation.")

 # Call the utility function
 from app.utils import generate_conversation_title # Import locally if needed
-
-
+
+# Add timeout handling for title generation to prevent hangs
+try:
+# Create a task with timeout
+import asyncio
+title_generation_task = asyncio.create_task(
+generate_conversation_title(content, title_model, title_client)
+)
+
+# Wait for completion with timeout (30 seconds)
+new_title = await asyncio.wait_for(title_generation_task, timeout=30)
+debug_log(f"Background generated title: {new_title}")
+except asyncio.TimeoutError:
+debug_log("Title generation timed out after 30 seconds")
+# Use default title in case of timeout
+new_title = f"Conversation ({datetime.now().strftime('%Y-%m-%d %H:%M')})"
+# Try to cancel the task
+if not title_generation_task.done():
+title_generation_task.cancel()
+debug_log("Cancelled timed out title generation task")

 # Check if title generation returned the default or a real title
 if new_title and not new_title.startswith("Conversation ("):

@@ -718,8 +787,8 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 title_widget.update(new_title)
 self.current_conversation.title = new_title # Update local object too
 log(f"Background title update successful: {new_title}")
-#
-
+# Subtle notification to show title was updated
+self.notify(f"Conversation titled: {new_title}", severity="information", timeout=2)
 else:
 log("Conversation changed before background title update could apply.")
 else:

@@ -1226,6 +1295,94 @@ class SimpleChatApp(App): # Keep SimpleChatApp class definition
 log(f"Stored selected provider: {self.selected_provider} for model: {self.selected_model}")

 self.update_app_info() # Update the displayed model info
+
+# Preload the model if it's an Ollama model and preloading is enabled
+if self.selected_provider == "ollama" and CONFIG.get("ollama_model_preload", True):
+# Start the background task to preload the model
+debug_log(f"Starting background task to preload Ollama model: {self.selected_model}")
+asyncio.create_task(self._preload_ollama_model(self.selected_model))
+
+async def _preload_ollama_model(self, model_id: str) -> None:
+"""Preload an Ollama model in the background"""
+from app.api.ollama import OllamaClient
+
+debug_log(f"Preloading Ollama model: {model_id}")
+# Show a subtle notification to the user
+self.notify("Preparing model for use...", severity="information", timeout=3)
+
+try:
+# Initialize the client
+client = await OllamaClient.create()
+
+# Update the loading indicator to show model loading
+loading = self.query_one("#loading-indicator")
+loading.remove_class("hidden")
+loading.add_class("model-loading")
+loading.update(f"⚙️ Loading Ollama model...")
+
+# Preload the model
+success = await client.preload_model(model_id)
+
+# Hide the loading indicator
+loading.add_class("hidden")
+loading.remove_class("model-loading")
+
+if success:
+debug_log(f"Successfully preloaded model: {model_id}")
+self.notify(f"Model ready for use", severity="success", timeout=2)
+else:
+debug_log(f"Failed to preload model: {model_id}")
+# No need to notify the user about failure - will happen naturally on first use
+except Exception as e:
+debug_log(f"Error preloading model: {str(e)}")
+# Make sure to hide the loading indicator
+try:
+loading = self.query_one("#loading-indicator")
+loading.add_class("hidden")
+loading.remove_class("model-loading")
+except Exception:
+pass
+
+async def _check_inactive_models(self) -> None:
+"""Background task to check for and release inactive models"""
+from app.api.ollama import OllamaClient
+
+# How often to check for inactive models (in seconds)
+CHECK_INTERVAL = 600 # 10 minutes
+
+debug_log(f"Starting inactive model check task with interval {CHECK_INTERVAL}s")
+
+try:
+while True:
+await asyncio.sleep(CHECK_INTERVAL)
+
+debug_log("Checking for inactive models...")
+
+try:
+# Initialize the client
+client = await OllamaClient.create()
+
+# Get the threshold from instance variable
+threshold = getattr(self, "MODEL_INACTIVITY_THRESHOLD", 30)
+
+# Check and release inactive models
+released_models = await client.release_inactive_models(threshold)
+
+if released_models:
+debug_log(f"Released {len(released_models)} inactive models: {released_models}")
+else:
+debug_log("No inactive models to release")
+
+except Exception as e:
+debug_log(f"Error checking for inactive models: {str(e)}")
+# Continue loop even if this check fails
+
+except asyncio.CancelledError:
+debug_log("Model cleanup task cancelled")
+# Normal task cancellation, clean exit
+except Exception as e:
+debug_log(f"Unexpected error in model cleanup task: {str(e)}")
+# Log but don't crash

 def on_style_selector_style_selected(self, event: StyleSelector.StyleSelected) -> None: # Keep SimpleChatApp on_style_selector_style_selected
 """Handle style selection""" # Keep SimpleChatApp on_style_selector_style_selected docstring

{chat_console-0.3.995 → chat_console-0.4.2}/app/utils.py

@@ -32,6 +32,11 @@ async def generate_conversation_title(message: str, model: str, client: Any) ->

 # Try-except the entire function to ensure we always return a title
 try:
+# Check if we're using an Ollama client
+from app.api.ollama import OllamaClient
+is_ollama_client = isinstance(client, OllamaClient)
+debug_log(f"Client is Ollama: {is_ollama_client}")
+
 # Pick a reliable title generation model - prefer OpenAI if available
 from app.config import OPENAI_API_KEY, ANTHROPIC_API_KEY

@@ -46,10 +51,16 @@ async def generate_conversation_title(message: str, model: str, client: Any) ->
 title_model = "claude-3-haiku-20240307"
 debug_log("Using Anthropic for title generation")
 else:
-#
-
-
-
+# For Ollama clients, ensure we have a clean instance to avoid conflicts with preloaded models
+if is_ollama_client:
+debug_log("Creating fresh Ollama client instance for title generation")
+title_client = await OllamaClient.create()
+title_model = model
+else:
+# Use the passed client for other providers
+title_client = client
+title_model = model
+debug_log(f"Using {type(title_client).__name__} for title generation with model {title_model}")

 # Create a special prompt for title generation
 title_prompt = [

@@ -65,12 +76,25 @@ async def generate_conversation_title(message: str, model: str, client: Any) ->

 # Generate title
 debug_log(f"Sending title generation request to {title_model}")
-
-
-
-
-
-
+
+# Check if this is a reasoning model (o-series)
+is_reasoning_model = title_model.startswith(("o1", "o3", "o4")) or title_model in ["o1", "o3", "o4-mini"]
+
+if is_reasoning_model:
+# For reasoning models, don't include temperature
+title = await title_client.generate_completion(
+messages=title_prompt,
+model=title_model,
+max_tokens=60
+)
+else:
+# For non-reasoning models, include temperature
+title = await title_client.generate_completion(
+messages=title_prompt,
+model=title_model,
+temperature=0.7,
+max_tokens=60
+)

 # Sanitize the title
 title = title.strip().strip('"\'').strip()

@@ -755,7 +779,13 @@ def resolve_model_id(model_id_or_name: str) -> str:
 "35-turbo": "gpt-3.5-turbo",
 "35": "gpt-3.5-turbo",
 "4.1-mini": "gpt-4.1-mini", # Add support for gpt-4.1-mini
-"4.1": "gpt-4.1" # Add support for gpt-4.1
+"4.1": "gpt-4.1", # Add support for gpt-4.1
+# Add support for reasoning models
+"o1": "o1",
+"o1-mini": "o1-mini",
+"o3": "o3",
+"o3-mini": "o3-mini",
+"o4-mini": "o4-mini"
 }

 if input_lower in openai_model_aliases:

@@ -765,23 +795,26 @@ def resolve_model_id(model_id_or_name: str) -> str:

 # Special case handling for common typos and model name variations
 typo_corrections = {
-
-"
-"o1
-"o1-
-"
-"o4
-"o4-
+# Keep reasoning models as-is, don't convert 'o' to '0'
+# "o4-mini": "04-mini",
+# "o1": "01",
+# "o1-mini": "01-mini",
+# "o1-preview": "01-preview",
+# "o4": "04",
+# "o4-preview": "04-preview",
+# "o4-vision": "04-vision"
 }

+# Don't convert reasoning model IDs that start with 'o'
 # Check for more complex typo patterns with dates
-if input_lower.startswith("o1-") and "-202" in input_lower:
+if input_lower.startswith("o1-") and "-202" in input_lower and not any(input_lower == model_id for model_id in ["o1", "o1-mini", "o3", "o3-mini", "o4-mini"]):
 corrected = "01" + input_lower[2:]
 logger.info(f"Converting '{input_lower}' to '{corrected}' (letter 'o' to zero '0')")
 input_lower = corrected
 model_id_or_name = corrected

-if
+# Only apply typo corrections if not a reasoning model
+if input_lower in typo_corrections and not any(input_lower == model_id for model_id in ["o1", "o1-mini", "o3", "o3-mini", "o4-mini"]):
 corrected = typo_corrections[input_lower]
 logger.info(f"Converting '{input_lower}' to '{corrected}' (letter 'o' to zero '0')")
 input_lower = corrected
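
To close out the utils changes, a sketch of calling the title helper directly with the same 30-second guard that main.py wraps around it (hypothetical snippet, not from the package; the message text and model name are made up):

```python
# Hypothetical direct use of generate_conversation_title(); not part of the package.
import asyncio
from app.api.ollama import OllamaClient
from app.utils import generate_conversation_title

async def title_demo() -> None:
    client = await OllamaClient.create()
    try:
        # Same 30s ceiling the app applies so a slow local model cannot hang the UI.
        title = await asyncio.wait_for(
            generate_conversation_title("How do I profile asyncio code?", "llama3:8b", client),
            timeout=30,
        )
    except asyncio.TimeoutError:
        title = "Untitled conversation"  # the app falls back to a timestamped default instead
    print(title)

asyncio.run(title_demo())
```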

{chat_console-0.3.995 → chat_console-0.4.2/chat_console.egg-info}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: chat-console
-Version: 0.3.995
+Version: 0.4.2
 Summary: A command-line interface for chatting with LLMs, storing chats and (future) rag interactions
 Home-page: https://github.com/wazacraftrfid/chat-console
 Author: Johnathan Greenaway

@@ -28,7 +28,8 @@ Dynamic: requires-dist
 Dynamic: requires-python
 Dynamic: summary

-
+
+# Chat CLI

 A comprehensive command-line interface for chatting with various AI language models. This application allows you to interact with different LLM providers through an intuitive terminal-based interface.

@@ -37,6 +38,7 @@ A comprehensive command-line interface for chatting with various AI language mod
 - Interactive terminal UI with Textual library
 - Support for multiple AI models:
   - OpenAI models (GPT-3.5, GPT-4)
+  - OpenAI reasoning models (o1, o1-mini, o3, o3-mini, o4-mini)
   - Anthropic models (Claude 3 Opus, Sonnet, Haiku)
 - Conversation history with search functionality
 - Customizable response styles (concise, detailed, technical, friendly)

@@ -71,6 +73,26 @@ Run the application:
 chat-cli
 ```

+### Testing Reasoning Models
+
+To test the OpenAI reasoning models implementation, you can use the included test script:
+```
+./test_reasoning.py
+```
+
+This script will test both completion and streaming with the available reasoning models.
+
+### About OpenAI Reasoning Models
+
+OpenAI's reasoning models (o1, o3, o4-mini, etc.) are LLMs trained with reinforcement learning to perform reasoning. These models:
+
+- Think before they answer, producing a long internal chain of thought
+- Excel in complex problem solving, coding, scientific reasoning, and multi-step planning
+- Use "reasoning tokens" to work through problems step by step before providing a response
+- Support different reasoning effort levels (low, medium, high)
+
+The implementation in this CLI supports both standard completions and streaming with these models.
+
 ### Keyboard Shortcuts

 - `q` - Quit the application