RubyGems - smart_prompt - Versions diffs - 0.4.4 → 0.5.1 - Mend

smart_prompt 0.4.4 → 0.5.1

Files changed (71)

checksums.yaml +4 -4
data/CHANGELOG.md +16 -0
data/README.cn.md +305 -11
data/README.md +309 -11
data/Rakefile +10 -1
data/config/anthropic_config.yml +151 -0
data/config/image_generation_config.yml +22 -0
data/config/multimodal_config.yml +85 -0
data/config/sensenova_config.yml +63 -0
data/config/zhipu_config.yml +73 -0
data/docs/ANTHROPIC_EXAMPLES.md +559 -0
data/docs/CONVERSATION_INTEGRATION_SUMMARY.md +155 -0
data/docs/HISTORY_EXAMPLES_README.md +533 -0
data/docs/HISTORY_MANAGEMENT_GUIDE.md +797 -0
data/docs/MONITORING_GUIDE.md +278 -0
data/docs/MULTIMODAL_README.md +265 -0
data/docs/RELEVANCE_BASED_STRATEGY_IMPLEMENTATION.md +124 -0
data/docs/STT_README.md +302 -0
data/docs/TTS_README.md +303 -0
data/docs/VIDEO_GENERATION_README.md +246 -0
data/docs/delete_files_list.md +124 -0
data/examples/anthropic_basic_chat.rb +143 -0
data/examples/anthropic_example.rb +232 -0
data/examples/anthropic_multimodal.rb +212 -0
data/examples/anthropic_streaming.rb +312 -0
data/examples/anthropic_tool_calling.rb +393 -0
data/examples/automatic_cleanup_example.rb +109 -0
data/examples/history_management_examples.rb +522 -0
data/examples/image_generation_example.rb +130 -0
data/examples/monitoring_example.rb +121 -0
data/examples/multimodal_example.rb +63 -0
data/examples/relevance_based_strategy_example.rb +87 -0
data/examples/sensenova_example.rb +129 -0
data/examples/stt_example.rb +287 -0
data/examples/tts_example.rb +244 -0
data/examples/video_generation_example.rb +189 -0
data/examples/zhipu_example.rb +151 -0
data/lib/smart_prompt/anthropic_adapter.rb +407 -298
data/lib/smart_prompt/compression_engine.rb +201 -0
data/lib/smart_prompt/context_strategy.rb +22 -0
data/lib/smart_prompt/conversation.rb +47 -4
data/lib/smart_prompt/engine.rb +29 -2
data/lib/smart_prompt/history_manager.rb +596 -0
data/lib/smart_prompt/hybrid_strategy.rb +222 -0
data/lib/smart_prompt/image_generation_adapter.rb +297 -0
data/lib/smart_prompt/lru_cache.rb +133 -0
data/lib/smart_prompt/message.rb +57 -0
data/lib/smart_prompt/multimodal_adapter.rb +277 -0
data/lib/smart_prompt/persistence_layer.rb +197 -0
data/lib/smart_prompt/relevance_based_strategy.rb +221 -0
data/lib/smart_prompt/sensenova_adapter.rb +410 -0
data/lib/smart_prompt/session.rb +140 -0
data/lib/smart_prompt/sliding_window_strategy.rb +100 -0
data/lib/smart_prompt/stt_adapter.rb +381 -0
data/lib/smart_prompt/summary_based_strategy.rb +152 -0
data/lib/smart_prompt/token_counter.rb +74 -0
data/lib/smart_prompt/tts_adapter.rb +403 -0
data/lib/smart_prompt/version.rb +1 -1
data/lib/smart_prompt/video_generation_adapter.rb +330 -0
data/lib/smart_prompt/worker.rb +28 -3
data/lib/smart_prompt/zhipu_adapter.rb +616 -0
data/lib/smart_prompt.rb +21 -0
data/workers/history_management_examples.rb +407 -0
data/workers/image_generation_workers.rb +119 -0
data/workers/multimodal_workers.rb +110 -0
data/workers/sensenova_workers.rb +62 -0
data/workers/stt_workers.rb +195 -0
data/workers/tts_workers.rb +388 -0
data/workers/video_generation_workers.rb +264 -0
data/workers/zhipu_workers.rb +113 -0
metadata +88 -1

data/README.md CHANGED

@@ -11,13 +11,24 @@ SmartPrompt is a powerful Ruby gem that provides an elegant domain-specific lang
 ### Multi-LLM Support
 - **OpenAI API Compatible**: Full support for OpenAI GPT models and compatible APIs
-- **Llama.cpp Integration**: Direct integration with local Llama.cpp servers
+- **Anthropic Claude**: Native support for Claude models with multimodal capabilities
+- **SenseNova (商汤日日新)**: One adapter covers chat (商量), multimodal vision (图文多模态), Cupido embeddings (向量), and 秒画 text-to-image — see `examples/sensenova_example.rb`
+- **智谱 AI (BigModel / GLM)**: One adapter covers all categories — chat (GLM-4), vision (GLM-4V), embeddings (embedding-3), text-to-image (CogView), text-to-video (CogVideoX), TTS (GLM-TTS), ASR (GLM-ASR) — see `examples/zhipu_example.rb`
+- **Llama.cpp Integration**: Direct integration with local Llama.cpp servers
 - **Extensible Adapters**: Easy-to-extend adapter system for new LLM providers
 - **Unified Interface**: Same API regardless of the underlying LLM provider
+### Multimodal AI Capabilities
+- **Vision Models**: Support for image understanding and analysis
+- **Image Generation**: Create images from text prompts using diffusion models
+- **Video Generation**: Generate videos from text or image prompts
+- **Text-to-Speech**: Convert text to natural-sounding speech
+- **Speech-to-Text**: Transcribe audio files to text with multi-language support
 ### Flexible Architecture
 - **Worker-based Tasks**: Define reusable workers for specific AI tasks
 - **Template System**: ERB-based prompt templates with parameter injection
+- **Intelligent History Management**: Session isolation, automatic compression, and multiple context strategies
 - **Conversation Management**: Built-in conversation history and context management
 - **Streaming Support**: Real-time response streaming for better user experience
@@ -26,6 +37,8 @@ SmartPrompt is a powerful Ruby gem that provides an elegant domain-specific lang
 - **Retry Logic**: Robust error handling with configurable retry mechanisms
 - **Embeddings**: Text embedding generation for semantic search and RAG applications
 - **Configuration-driven**: YAML-based configuration for easy deployment management
+- **Batch Processing**: Efficient processing of multiple files and tasks
+- **Language Detection**: Automatic language identification from text and audio
 ### Production Ready
 - **Comprehensive Logging**: Detailed logging for debugging and monitoring
@@ -61,6 +74,7 @@ Create a YAML configuration file (`config/smart_prompt.yml`):
 # Adapter definitions
 adapters:
   openai: OpenAIAdapter
+  anthropic: AnthropicAdapter
 # LLM configurations
 llms:
   SiliconFlow:
@@ -68,7 +82,13 @@ llms:
     url: https://api.siliconflow.cn/v1/
     api_key: ENV["APIKey"]
     default_model: Qwen/Qwen2.5-7B-Instruct
-  local:
+  claude:
+    adapter: anthropic
+    api_key: ENV["ANTHROPIC_API_KEY"]
+    model: claude-3-5-sonnet-20241022
+    temperature: 0.7
+    max_tokens: 4096
+  llamacpp:
     adapter: openai
     url: http://localhost:8080/
   ollama:
@@ -238,7 +258,10 @@ end
 ### Conversation History
+SmartPrompt provides intelligent conversation history management with session isolation, automatic compression, and multiple context strategies.
 ```ruby
+# Basic usage with automatic history management
 SmartPrompt.define_worker :conversational_chat do
   use "deepseek"
   model "deepseek-chat"
@@ -246,8 +269,38 @@ SmartPrompt.define_worker :conversational_chat do
   prompt(params[:message], with_history: true)
   send_msg
 end
+# Advanced usage with explicit session management
+SmartPrompt.define_worker :session_chat do
+  use "deepseek"
+  model "deepseek-chat"
+  # Use session_id for isolated conversations
+  session_id = params[:session_id] || "default"
+  # Configure session behavior
+  session_config = {
+    max_messages: 100,
+    max_tokens: 4000,
+    context_strategy: :sliding_window  # or :relevance_based, :summary_based, :hybrid
+  }
+  sys_msg("You are a helpful assistant.", params)
+  prompt(params[:message], with_history: true)
+  params.merge(session_id: session_id, session_config: session_config)
+  send_msg
+end
 ```
+**History Management Features:**
+- **Session Isolation**: Each conversation has independent history
+- **Context Strategies**: Choose from sliding window, relevance-based, summary-based, or hybrid
+- **Automatic Compression**: Reduce token usage while preserving context
+- **Persistence**: Save and restore conversations across restarts
+- **Performance**: LRU caching and async I/O for optimal performance
+See [History Management Guide](HISTORY_MANAGEMENT_GUIDE.md) for detailed documentation.
 ### Embeddings Generation
 ```ruby
@@ -265,6 +318,78 @@ embeddings = engine.call_worker(:text_embedder, {
 })
 ```
+### Multimodal AI Examples
+#### Image Generation
+```ruby
+# Generate image from text prompt (SiliconFlow /v1/images/generations)
+result = engine.call_worker(:image_generator, {
+  prompt: "A beautiful sunset over mountains",
+  image_size: "1024x1024",   # "widthxheight"; aliases: size:
+  batch_size: 1,             # only Kolors; aliases: n:
+  negative_prompt: "blurry, low quality",
+  save_to_file: true,
+  output_dir: "./generated_images"
+})
+puts "Generated #{result[:images].size} image(s)"
+puts "First image URL: #{result[:images].first[:url]}"
+puts "Saved files: #{result[:saved_files]}"
+```
+#### Video Generation
+```ruby
+# Generate video from text prompt
+result = engine.call_worker(:video_generator, {
+  prompt: "A cat playing with a ball of yarn",
+  duration: 5,
+  resolution: "720p",
+  save_to_file: true,
+  output_dir: "./generated_videos"
+})
+puts "Video generation started: #{result[:video_id]}"
+puts "Check status with: engine.call_worker(:video_status, {video_id: '#{result[:video_id]}'})"
+```
+#### Text-to-Speech
+```ruby
+# Convert text to speech
+result = engine.call_worker(:tts_synthesizer, {
+  text: "Welcome to SmartPrompt, your AI assistant",
+  voice: "alloy",
+  speed: 1.0,
+  save_to_file: true,
+  output_dir: "./generated_audio"
+})
+puts "Audio file created: #{result[:audio_file][:file_path]}"
+```
+#### Speech-to-Text
+```ruby
+# Transcribe audio to text
+result = engine.call_worker(:stt_transcriber, {
+  audio_file: "./audio.wav",
+  language: "en",
+  response_format: "json"
+})
+puts "Transcribed text: #{result[:transcription][:text]}"
+puts "Language: #{result[:transcription][:language]}"
+```
+#### Vision Analysis
+```ruby
+# Analyze image with vision model
+result = engine.call_worker(:vision_analyzer, {
+  image_file: "./image.jpg",
+  prompt: "Describe what you see in this image"
+})
+puts "Analysis: #{result[:response]}"
+```
 ## 🏗️ Architecture Overview
 SmartPrompt follows a modular architecture:
@@ -282,6 +407,13 @@ SmartPrompt follows a modular architecture:
                    │Workers│ │Conv.│ │Template│
                    │       │ │Mgmt │ │ System │
                    └───────┘ └─────┘ └────────┘
+                                │
+                       ┌────────┴────────┐
+                       │                 │
+                   ┌───▼────────┐  ┌─────▼──────┐
+                   │  History   │  │Persistence │
+                   │  Manager   │  │   Layer    │
+                   └────────────┘  └────────────┘
 ```
 ### Core Components
@@ -289,8 +421,10 @@ SmartPrompt follows a modular architecture:
 - **Engine**: Central orchestrator managing configuration, adapters, and workers
 - **Workers**: Reusable task definitions with embedded business logic
 - **Conversation**: Context and message history management
-- **Adapters**: LLM provider integrations (OpenAI, Llama.cpp, etc.)
+- **History Manager**: Intelligent conversation history with session isolation and context strategies
+- **Adapters**: LLM provider integrations (OpenAI, Anthropic, Llama.cpp, etc.)
 - **Templates**: ERB-based prompt template system
+- **Persistence Layer**: Save and restore conversation history across restarts
 ## 🔧 Configuration Reference
@@ -298,20 +432,177 @@ SmartPrompt follows a modular architecture:
 ```yaml
 adapters:
-  openai: "OpenAIAdapter"      # For OpenAI API
+  openai: "OpenAIAdapter"              # For OpenAI API
+  anthropic: "AnthropicAdapter"        # For Anthropic Claude API
+  sensenova: "SenseNovaAdapter"        # For 商汤 SenseNova (chat/vision/embeddings/image)
+  zhipu: "ZhipuAIAdapter"              # For 智谱 BigModel/GLM (chat/vision/embed/image/video/tts/asr)
+  multimodal: "MultimodalAdapter"      # For vision models
+  image_generation: "ImageGenerationAdapter"    # For image generation
+  video_generation: "VideoGenerationAdapter"    # For video generation
+  tts: "TTSAdapter"                    # For text-to-speech
+  stt: "STTAdapter"                    # For speech-to-text
 ```
 ### LLM Configuration
 ```yaml
 llms:
-  model_name:
-    adapter: "adapter_name"
-    api_key: "your_api_key"     # Can use ENV['KEY_NAME']
-    url: "https://api.url"
-    model: "model_identifier"
+  # Text models
+  gpt:
+    adapter: "openai"
+    api_key: ENV["OPENAI_API_KEY"]
+    model: "gpt-4"
+    temperature: 0.7
+  # Anthropic Claude models
+  claude:
+    adapter: "anthropic"
+    api_key: ENV["ANTHROPIC_API_KEY"]
+    model: "claude-3-5-sonnet-20241022"
+    temperature: 0.7
+    max_tokens: 4096
+  claude_opus:
+    adapter: "anthropic"
+    api_key: ENV["ANTHROPIC_API_KEY"]
+    model: "claude-3-opus-20240229"
+    temperature: 0.7
+    max_tokens: 4096
+  claude_haiku:
+    adapter: "anthropic"
+    api_key: ENV["ANTHROPIC_API_KEY"]
+    model: "claude-3-5-haiku-20241022"
+    temperature: 0.7
+    max_tokens: 4096
+  # Custom Anthropic endpoint (for proxy or custom deployment)
+  claude_custom:
+    adapter: "anthropic"
+    api_key: ENV["ANTHROPIC_API_KEY"]
+    url: "https://your-custom-endpoint.com"
+    model: "claude-3-5-sonnet-20241022"
+    temperature: 0.7
+    max_tokens: 4096
+  # 商汤 SenseNova — one adapter covers all four model categories; just change `model`.
+  # Free-tier models run on token.sensenova.cn/v1; paid models (SenseChat-5, SenseNova-V6-*
+  # , Cupido) run on api.sensenova.cn/compatible-mode/v2 (returns 403 if your key lacks them).
+  sensechat:                          # 商量 文本对话 (free-tier)
+    adapter: "sensenova"
+    url: "https://token.sensenova.cn/v1"
+    api_key: ENV["SENSENOVA_API_KEY"]
+    model: "sensenova-6.7-flash-lite"
+    temperature: 0.7
+    # Optional SenseNova sampling extras (forwarded to /chat/completions):
+    # reasoning_effort: "medium"
+    # max_completion_tokens: 4096
+    # Paid: url https://api.sensenova.cn/compatible-mode/v2, model SenseChat-5
+  sensevision:                        # 商量 图文多模态 (flash-lite is natively multimodal)
+    adapter: "sensenova"
+    url: "https://token.sensenova.cn/v1"
+    api_key: ENV["SENSENOVA_API_KEY"]
+    model: "sensenova-6.7-flash-lite"
+    # Paid: url https://api.sensenova.cn/compatible-mode/v2, model SenseNova-V6-Pro
+  senseembedding:                     # Cupido 向量模型 (paid; native endpoint)
+    adapter: "sensenova"
+    url: "https://api.sensenova.cn/compatible-mode/v2"
+    embeddings_url: "https://api.sensenova.cn/v1/llm/embeddings"
+    api_key: ENV["SENSENOVA_API_KEY"]
+    model: "Cupido"
+  senseimage:                         # 秒画 文生图 (sensenova-u1-fast; token.sensenova.cn base)
+    adapter: "sensenova"
+    url: "https://token.sensenova.cn/v1"
+    image_url: "https://token.sensenova.cn/v1/images/generations"
+    api_key: ENV["SENSENOVA_API_KEY"]
+    model: "sensenova-u1-fast"
+    # sensenova-u1-fast only accepts specific sizes (default 2048x2048); see
+    # VALID_IMAGE_SIZES in sensenova_adapter.rb.
+  # 智谱 AI (BigModel/GLM) — one adapter covers all categories; just change `model`.
+  # Base https://open.bigmodel.cn/api/paas/v4 ; Bearer auth. Defaults use free-tier models.
+  glm:                                # 文本对话 (free glm-4-flash; paid glm-4-plus/glm-5.2)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "glm-4-flash"
     temperature: 0.7
-    # Additional provider-specific options
+    # CodeGeeX-4: set `coding: true` and model: codegeex-4 (uses the coding base)
+  glm_vision:                         # 图文多模态 (free glm-4v-flash; paid glm-4v-plus)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "glm-4v-flash"
+  embedding:                          # 向量模型 (embedding-3; custom dimensions 256/512/1024/2048)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "embedding-3"
+    dimensions: 1024
+  cogview:                            # 文生图 (free cogview-3-flash; paid cogview-4/glm-image)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "cogview-3-flash"
+  cogvideo:                           # 文生视频 (async submit->poll->download; free cogvideox-flash)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "cogvideox-flash"
+  glm_tts:                            # 语音合成 (GLM-TTS)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "glm-tts"
+  glm_asr:                            # 语音识别 (GLM-ASR-2512)
+    adapter: "zhipu"
+    url: "https://open.bigmodel.cn/api/paas/v4"
+    api_key: ENV["ZHIPUAI_API_KEY"]
+    model: "glm-asr-2512"
+  # Vision models
+  vision:
+    adapter: "multimodal"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    model: "Qwen/Qwen2.5-VL-7B-Instruct"
+  # Image generation (Kolors supports batch_size/guidance_scale; see Qwen-Image for cfg)
+  image_gen:
+    adapter: "image_generation"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    model: "Kwai-Kolors/Kolors"
+  # Video generation
+  video_gen:
+    adapter: "video_generation"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    model: "Wan-AI/Wan2.2-T2V-A14B"
+  # Text-to-speech
+  tts_service:
+    adapter: "tts"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    model: "FunAudioLLM/CosyVoice2-0.5B"
+  # Speech-to-text
+  stt_service:
+    adapter: "stt"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    model: "FunAudioLLM/CosyVoice2-0.5B"
 ```
 ### Model Alias Configuration
@@ -398,20 +689,27 @@ end
 ## 🚀 Real-world Use Cases
 - **Chatbots and Conversational AI**: Build sophisticated chatbots with context awareness
-- **Content Generation**: Automated content creation with template-driven prompts
+- **Content Generation**: Automated content creation with template-driven prompts
 - **Code Analysis**: AI-powered code review and documentation generation
 - **Customer Support**: Intelligent ticket routing and response suggestions
 - **Data Processing**: LLM-powered data extraction and transformation
 - **Educational Tools**: AI tutors and learning assistance systems
+- **Multimedia Content Creation**: Generate images, videos, and audio content
+- **Voice Interfaces**: Build voice-enabled applications with TTS and STT
+- **Visual Analysis**: Image understanding and object detection applications
+- **Accessibility Tools**: Audio descriptions, text-to-speech for visually impaired
 ## 🛣️ Roadmap
+- [x] **Multimodal AI Support** - Vision, Image Generation, Video Generation, TTS, STT
 - [ ] Additional LLM provider adapters (Anthropic Claude, Google PaLM)
 - [ ] Visual prompt builder and management interface
 - [ ] Enhanced caching and performance optimizations
 - [ ] Integration with vector databases for RAG applications
 - [ ] Built-in evaluation and testing framework for prompts
 - [ ] Distributed worker execution support
+- [ ] Real-time audio/video streaming support
+- [ ] Advanced multimodal prompt chaining
 ## 🤝 Contributing
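Review note on the `session_chat` worker added in the README hunk above: `params.merge(session_id: ..., session_config: ...)` appears there as a bare statement. Assuming `params` is an ordinary Ruby `Hash` (the diff does not show how the worker DSL defines it), `Hash#merge` returns a new hash and leaves the receiver unchanged, so the merged keys would be dropped unless the result is assigned or `merge!` is used. The sketch below shows only that plain-Ruby behaviour with stand-in hashes and hypothetical helper names; whether SmartPrompt reads session settings from `params` at all is not confirmed by this diff.

```ruby
# Plain-Ruby illustration. `params` is a stand-in Hash, not SmartPrompt's object.

# Non-mutating merge: the returned hash is discarded, so `params` keeps its keys.
def keys_after_merge(params, extra)
  params.merge(extra) # result intentionally unused, as in the README example
  params.keys
end

# Mutating merge: `merge!` updates the receiver in place.
def keys_after_merge_bang(params, extra)
  params.merge!(extra)
  params.keys
end

session_extra = {
  session_id: "default",
  session_config: { max_messages: 100, max_tokens: 4000, context_strategy: :sliding_window }
}

puts keys_after_merge({ message: "Hello" }, session_extra).inspect
puts keys_after_merge_bang({ message: "Hello" }, session_extra).inspect
```

If the worker is meant to pass the session settings along, `params.merge!(...)` or `params = params.merge(...)` would be the usual ways to keep them.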

data/Rakefile CHANGED

@@ -1,4 +1,13 @@
 # frozen_string_literal: true
 require "bundler/gem_tasks"
-task default: %i[]
+require "rake/testtask"
+Rake::TestTask.new(:test) do |t|
+  t.libs << "lib"
+  t.libs << "test"
+  t.test_files = FileList["test/**/*_test.rb"]
+  t.verbose = true
+end
+task default: :test
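The new `Rake::TestTask` selects test files with the glob `test/**/*_test.rb`. As a rough illustration of which files that pattern picks up, the sketch below applies the same pattern with `Dir.glob` to a throwaway directory. The file names are hypothetical, and `Dir.glob` is used here as a stand-in for Rake's `FileList`, on the assumption that both resolve the pattern the same way.

```ruby
require "tmpdir"
require "fileutils"

# Same glob the Rakefile passes to FileList.
TEST_PATTERN = "test/**/*_test.rb"

# Returns the paths under `root` (relative to it) that the pattern selects.
def discovered_tests(root)
  Dir.glob(TEST_PATTERN, base: root).sort
end

# Builds a temporary project layout with made-up file names and lists the matches.
def demo_discovery
  Dir.mktmpdir do |root|
    %w[
      test/engine_test.rb
      test/adapters/anthropic_adapter_test.rb
      test/test_helper.rb
      lib/smart_prompt.rb
    ].each do |path|
      full = File.join(root, path)
      FileUtils.mkdir_p(File.dirname(full))
      File.write(full, "")
    end
    discovered_tests(root)
  end
end

puts demo_discovery
```

Files such as `test/test_helper.rb` are not selected because their names do not end in `_test.rb`, while files in nested directories under `test/` are.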

data/config/anthropic_config.yml ADDED

@@ -0,0 +1,151 @@
+# Anthropic Configuration for SmartPrompt
+# This configuration enables Anthropic Claude models
+# Adapter definitions
+adapters:
+  openai: "OpenAIAdapter"
+  anthropic: "AnthropicAdapter"
+# LLM configurations
+llms:
+  deepseek_anthropic:
+    adapter: anthropic
+    api_key: ENV["ANTHROPIC_AUTH_TOKEN"]
+    url: "https://api.deepseek.com/anthropic"
+    temperature: 0.7
+    max_tokens: 4096
+  deepseek:
+    adapter: openai
+    api_key: ENV["DSKEY"]
+    url: "https://api.deepseek.com"
+# Path configurations
+template_path: "./templates"
+worker_path: "./workers"
+logger_file: "./logs/smart_prompt.log"
+# Advanced settings
+advanced:
+  # Timeout settings (in seconds)
+  request_timeout: 240
+  connection_timeout: 30
+  # Retry settings
+  max_retries: 3
+  retry_delay: 2
+  # Rate limiting
+  requests_per_minute: 60
+# History Management Configuration
+# SmartPrompt provides intelligent conversation history management with session isolation,
+# automatic compression, and multiple context strategies.
+history:
+  # Cache Configuration
+  # Maximum number of sessions to keep in memory (LRU eviction)
+  cache_size: 100
+  # Default Session Configuration
+  # These settings apply to all sessions unless overridden
+  session_defaults:
+    max_messages: 100              # Maximum messages per session (older messages removed)
+    max_tokens: 4000               # Maximum tokens per session (enforced during context retrieval)
+    context_strategy: sliding_window  # Default strategy: sliding_window, relevance_based, summary_based, hybrid
+    preserve_system_messages: true # Always keep system messages regardless of limits
+  # Context Strategy Configurations
+  # Each strategy has specific parameters for fine-tuning behavior
+  strategies:
+    # Sliding Window: Keep the most recent N messages
+    sliding_window:
+      window_size: 10              # Number of recent messages to keep
+      preserve_system: true        # Always include system messages
+    # Relevance-Based: Select messages based on semantic similarity
+    relevance_based:
+      top_k: 10                    # Number of most relevant messages to select
+      recency_weight: 0.3          # Weight for recency (0.0-1.0)
+      relevance_weight: 0.7        # Weight for relevance (0.0-1.0)
+      embedding_service: null      # Optional: embedding service for semantic similarity
+    # Summary-Based: Automatically compress old messages into summaries
+    summary_based:
+      summary_threshold: 20        # Trigger summarization after this many messages
+      keep_recent: 5               # Number of recent messages to keep uncompressed
+      compression_ratio: 0.5       # Target compression ratio (0.0-1.0)
+    # Hybrid: Adaptively combine multiple strategies
+    hybrid:
+      mode: adaptive               # Mode: 'adaptive' (auto-select) or 'combined' (merge results)
+      sliding_window: {}           # Override sliding window config
+      relevance_based: {}          # Override relevance-based config
+      summary_based: {}            # Override summary-based config
+  # Compression Configuration
+  # Automatic summarization to reduce token usage
+  compression:
+    enabled: true                  # Enable automatic compression
+    auto_compress_threshold: 50    # Auto-compress when session exceeds this many messages
+    compression_ratio: 0.5         # Target compression ratio
+    llm_adapter: null              # LLM to use for summarization (uses default if null)
+  # Persistence Configuration
+  # Save and restore conversation history across restarts
+  persistence:
+    enabled: true                  # Enable persistence to disk
+    backend: filesystem            # Backend type: 'filesystem' (more backends coming soon)
+    storage_path: "./history_data" # Directory for storing session data
+    async: true                    # Use async writes for better performance
+  # Cleanup Configuration
+  # Automatic cleanup of old or expired sessions
+  cleanup:
+    auto_cleanup: false            # Enable automatic cleanup thread
+    cleanup_interval: 3600         # Cleanup interval in seconds (1 hour)
+    session_ttl: 86400             # Session time-to-live in seconds (24 hours)
+    cleanup_callback: null         # Optional: custom cleanup logic (Ruby proc)
+  # Monitoring Configuration
+  # Logging and metrics for debugging and monitoring
+  monitoring:
+    enabled: true                  # Enable monitoring and logging
+    log_level: info                # Log level: debug, info, warn, error
+    metrics_format: prometheus     # Metrics format: prometheus, json, hash
+# Example Configurations for Different Use Cases:
+#
+# 1. High-Volume Chat Application (optimize for performance):
+#    cache_size: 1000
+#    session_defaults:
+#      max_messages: 50
+#      max_tokens: 2000
+#      context_strategy: sliding_window
+#    cleanup:
+#      auto_cleanup: true
+#      session_ttl: 3600  # 1 hour
+#
+# 2. Long-Running Conversations (optimize for context retention):
+#    session_defaults:
+#      max_messages: 500
+#      max_tokens: 16000
+#      context_strategy: summary_based
+#    compression:
+#      enabled: true
+#      auto_compress_threshold: 100
+#
+# 3. Semantic Search Application (optimize for relevance):
+#    session_defaults:
+#      context_strategy: relevance_based
+#    strategies:
+#      relevance_based:
+#        top_k: 20
+#        recency_weight: 0.2
+#        relevance_weight: 0.8
+#
+# 4. Development/Testing (disable persistence and cleanup):
+#    persistence:
+#      enabled: false
+#    cleanup:
+#      auto_cleanup: false
+#    monitoring:
+#      log_level: debug
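The `strategies` section above describes `sliding_window` (keep the most recent `window_size` messages, optionally preserving system messages) and `relevance_based` (pick `top_k` messages using `relevance_weight` and `recency_weight`). The following is a minimal sketch of how those two settings could behave. The message shape, the recency formula, and all method names are assumptions for illustration and are not taken from the gem's `sliding_window_strategy.rb` or `relevance_based_strategy.rb`.

```ruby
# Hypothetical message shape: { role:, content:, relevance: } (not the gem's Message class).

# Sliding window: keep the last `window_size` messages. With preserve_system,
# system messages are kept regardless and placed first.
def sliding_window(messages, window_size: 10, preserve_system: true)
  return messages.last(window_size) unless preserve_system

  system, rest = messages.partition { |m| m[:role] == "system" }
  system + rest.last(window_size)
end

# Weighted sum of a similarity value and a recency value, both assumed in 0.0..1.0.
def combined_score(relevance, recency, relevance_weight: 0.7, recency_weight: 0.3)
  relevance_weight * relevance + recency_weight * recency
end

# Picks the top_k messages by combined score and returns them in original order.
# Recency is modelled as position / (count - 1), an assumption of this sketch.
def relevance_select(messages, top_k: 10)
  last_index = [messages.size - 1, 1].max
  scored = messages.each_with_index.map do |message, index|
    [combined_score(message.fetch(:relevance, 0.0), index.to_f / last_index), index]
  end
  scored.max_by(top_k) { |score, _| score }.map { |_, index| index }.sort.map { |i| messages[i] }
end

history = [
  { role: "system",    content: "You are a helpful assistant.", relevance: 0.9 },
  { role: "user",      content: "Hi",                           relevance: 0.1 },
  { role: "assistant", content: "Hello!",                       relevance: 0.2 },
  { role: "user",      content: "Explain sliding windows",      relevance: 0.5 }
]

puts sliding_window(history, window_size: 2).map { |m| m[:content] }.inspect
puts relevance_select(history, top_k: 2).map { |m| m[:content] }.inspect
```

With the default weights, an old but highly relevant message can still outrank a recent, weakly related one, which is the trade-off the two weight settings control.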

data/config/image_generation_config.yml ADDED

@@ -0,0 +1,22 @@
+# Configuration for SiliconFlow image generation.
+#
+# Get an API key from https://siliconflow.cn and export it as SILICONFLOW_API_KEY.
+# Available image models: Kwai-Kolors/Kolors, Qwen/Qwen-Image,
+# Qwen/Qwen-Image-Edit (image editing). See:
+# https://api-docs.siliconflow.cn/docs/api/images-generations-post
+adapters:
+  image_generation: "ImageGenerationAdapter"
+llms:
+  image_gen:
+    adapter: "image_generation"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    # Kolors supports batch_size, guidance_scale and a range of image_size values.
+    model: "Kwai-Kolors/Kolors"
+default_llm: "image_gen"
+template_path: "./templates"
+worker_path: "./workers"
+logger_file: "./logs/smart_prompt.log"
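To a YAML parser, entries such as `api_key: ENV["SILICONFLOW_API_KEY"]` are ordinary strings, so something has to translate them into environment lookups. How SmartPrompt does this is not shown in this diff. The sketch below is one possible approach under that caveat: it matches the `ENV["NAME"]` form with a regular expression rather than evaluating it. The names `resolve_env` and `load_llm_config` are hypothetical.

```ruby
require "yaml"

# Matches strings of the exact form ENV["NAME"] or ENV['NAME'].
ENV_REF = /\AENV\[["'](\w+)["']\]\z/

# Walks hashes and arrays, replacing ENV["NAME"] strings with values from `env`.
def resolve_env(value, env = ENV)
  case value
  when Hash  then value.transform_values { |v| resolve_env(v, env) }
  when Array then value.map { |v| resolve_env(v, env) }
  when String
    match = ENV_REF.match(value)
    return value unless match

    env.fetch(match[1]) { raise KeyError, "missing environment variable #{match[1]}" }
  else
    value
  end
end

# Parses the YAML text and resolves any ENV references it contains.
def load_llm_config(yaml_text, env = ENV)
  resolve_env(YAML.safe_load(yaml_text), env)
end

CONFIG_YAML = <<~YAML
  llms:
    image_gen:
      adapter: "image_generation"
      url: "https://api.siliconflow.cn/v1/"
      api_key: ENV["SILICONFLOW_API_KEY"]
      model: "Kwai-Kolors/Kolors"
YAML

# A plain Hash stands in for the process environment so the demo runs without a real key.
config = load_llm_config(CONFIG_YAML, { "SILICONFLOW_API_KEY" => "sk-demo" })
puts config.dig("llms", "image_gen", "api_key")
```

Failing fast on a missing variable, as `env.fetch` does here, tends to surface configuration mistakes at load time rather than as an authentication error on the first request.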

data/config/multimodal_config.yml ADDED

@@ -0,0 +1,85 @@
+# Multimodal Configuration for SmartPrompt
+# This configuration enables multimodal capabilities with SiliconFlow
+# Adapter definitions
+adapters:
+  openai: "OpenAIAdapter"
+  multimodal: "MultimodalAdapter"
+# LLM configurations
+llms:
+  # Multimodal models for vision and video understanding
+  qwen_vl:
+    adapter: "multimodal"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    default_model: "Qwen/Qwen2.5-VL-7B-Instruct"
+    temperature: 0.7
+  qwen_omni:
+    adapter: "multimodal"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    default_model: "Qwen/Qwen3-Omni-7B-Instruct"
+    temperature: 0.7
+  deepseek_vl:
+    adapter: "multimodal"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    default_model: "deepseek-ai/DeepSeek-VL2"
+    temperature: 0.7
+  # Text-only models for comparison
+  siliconflow_text:
+    adapter: "openai"
+    url: "https://api.siliconflow.cn/v1/"
+    api_key: ENV["SILICONFLOW_API_KEY"]
+    default_model: "Qwen/Qwen2.5-7B-Instruct"
+    temperature: 0.7
+# Default settings
+default_llm: "qwen_vl"
+# Path configurations
+template_path: "./templates"
+worker_path: "./workers"
+logger_file: "./logs/smart_prompt.log"
+# Multimodal specific settings
+multimodal:
+  # Default image detail level ("low", "high", "auto")
+  default_image_detail: "auto"
+  # Default video extraction settings
+  default_max_frames: 10
+  default_fps: 1
+  # Supported file formats
+  supported_image_formats:
+    - "jpg"
+    - "jpeg"
+    - "png"
+    - "gif"
+    - "bmp"
+    - "webp"
+  supported_video_formats:
+    - "mp4"
+    - "mov"
+    - "avi"
+    - "mkv"
+    - "webm"
+# Advanced settings
+advanced:
+  # Timeout settings (in seconds)
+  request_timeout: 240
+  connection_timeout: 30
+  # Retry settings
+  max_retries: 3
+  retry_delay: 2
+  # Rate limiting
+  requests_per_minute: 60
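The `advanced` block sets `max_retries: 3` and `retry_delay: 2`. As a rough sketch of what such settings usually mean, the helper below retries a block after a fixed delay. It assumes `max_retries` counts retries after the first attempt (so up to four attempts in total) and that any `StandardError` is retryable; the gem's actual retry logic is not part of this diff, and `with_retries` is a hypothetical name.

```ruby
# Runs the block, retrying up to `max_retries` times on StandardError and
# waiting `retry_delay` seconds between attempts. The block receives the
# 1-based attempt number. `sleeper` is injectable so the delay can be skipped.
def with_retries(max_retries: 3, retry_delay: 2, sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts > max_retries # retries exhausted: re-raise the last error

    sleeper.call(retry_delay)
    retry
  end
end

# Demo: fail on the first two attempts, succeed on the third (no real waiting).
result = with_retries(max_retries: 3, retry_delay: 0) do |attempt|
  raise "transient failure" if attempt < 3

  "ok after #{attempt} attempts"
end
puts result
```

A fixed delay matches the single `retry_delay` value in the config; exponential backoff would need an extra setting that this file does not define.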