llm_conductor 1.1.1 → 1.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +13 -1
- data/README.md +172 -2
- data/VISION_USAGE.md +278 -0
- data/examples/openrouter_vision_usage.rb +108 -0
- data/examples/zai_usage.rb +163 -0
- data/lib/llm_conductor/client_factory.rb +4 -1
- data/lib/llm_conductor/clients/openrouter_client.rb +112 -7
- data/lib/llm_conductor/clients/zai_client.rb +153 -0
- data/lib/llm_conductor/configuration.rb +17 -0
- data/lib/llm_conductor/prompt_manager.rb +1 -3
- data/lib/llm_conductor/version.rb +1 -1
- data/lib/llm_conductor.rb +5 -3
- metadata +6 -2
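
At a glance, the functional core of this release is the new Z.ai (Zhipu AI) client and vision-style prompts for OpenRouter and GLM models. The following is a minimal sketch of the new surface, assembled from the README and example diffs shown below (the image URL is a placeholder, not from the package):

```ruby
require 'llm_conductor'

LlmConductor.configure do |config|
  config.zai(api_key: ENV['ZAI_API_KEY'])
end

response = LlmConductor.generate(
  model: 'glm-4.5v',
  vendor: :zai,
  prompt: { text: 'What is in this image?', images: 'https://example.com/image.jpg' }
)
puts response.output
```
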
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c6ed179bb9142839bcc6feab8d06d61c27ff8279406bc7839f6d09ba14cb573f
+  data.tar.gz: a8ca32fecd9ac81326f7cefcf482f1b6a110b78ca2168c1c8ccbde5e034becb3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 581da83914c51a3966010d03491c3f57be4ed393bb572f2fdc9d0205f8680f4891f2b058ecf7642ea7bf26bea452a976946b6198d0419afb2e771de3bc112aea
+  data.tar.gz: 00eb70033cb739b7236b759a30219eb5eb6b72db7bba6c7ee519b98cf186e799cbf4f8696acf237d19a8fbfcca97dd9a189ce4f3b4f8f3d8a7d9ff1729d7eb86

data/.rubocop.yml
CHANGED
@@ -29,10 +29,15 @@ Style/HashSyntax:
 Lint/ConstantDefinitionInBlock:
   Enabled: false
 
+Metrics/ClassLength:
+  Max: 120
+
 Metrics/MethodLength:
   Max: 15
   Exclude:
     - 'lib/llm_conductor/prompts.rb'
+    - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'
 
 RSpec/ExampleLength:
   Enabled: false
@@ -89,17 +94,24 @@ Metrics/BlockLength:
 Metrics/AbcSize:
   Exclude:
     - 'lib/llm_conductor/prompts.rb'
+    - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'
 
 Metrics/CyclomaticComplexity:
   Exclude:
+    - 'lib/llm_conductor.rb'
     - 'lib/llm_conductor/prompts.rb'
+    - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'
 
 Metrics/PerceivedComplexity:
   Exclude:
     - 'lib/llm_conductor/prompts.rb'
+    - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'
 
 Layout/LineLength:
-  Max:
+  Max: 125
 
 # Performance cops (from .rubocop_todo.yml)
 Performance/RedundantEqualityComparisonBlock:

data/README.md
CHANGED
@@ -1,11 +1,12 @@
 # LLM Conductor
 
-A powerful Ruby gem from [Ekohe](https://ekohe.com) for orchestrating multiple Language Model providers with a unified, modern interface. LLM Conductor provides seamless integration with OpenAI GPT, Anthropic Claude, Google Gemini, Groq, and
+A powerful Ruby gem from [Ekohe](https://ekohe.com) for orchestrating multiple Language Model providers with a unified, modern interface. LLM Conductor provides seamless integration with OpenAI GPT, Anthropic Claude, Google Gemini, Groq, Ollama, OpenRouter, and Z.ai (Zhipu AI) with advanced prompt management, data building patterns, vision/multimodal support, and comprehensive response handling.
 
 ## Features
 
-🚀 **Multi-Provider Support** - OpenAI GPT, Anthropic Claude, Google Gemini, Groq, and
+🚀 **Multi-Provider Support** - OpenAI GPT, Anthropic Claude, Google Gemini, Groq, Ollama, OpenRouter, and Z.ai with automatic vendor detection
 🎯 **Unified Modern API** - Simple `LlmConductor.generate()` interface with rich Response objects
+🖼️ **Vision/Multimodal Support** - Send images alongside text prompts for vision-enabled models (OpenRouter, Z.ai GLM-4.5V)
 📝 **Advanced Prompt Management** - Registrable prompt classes with inheritance and templating
 🏗️ **Data Builder Pattern** - Structured data preparation for complex LLM inputs
 ⚡ **Smart Configuration** - Rails-style configuration with environment variable support
@@ -114,6 +115,16 @@ LlmConductor.configure do |config|
     base_url: ENV['OLLAMA_ADDRESS'] || 'http://localhost:11434'
   )
 
+  config.openrouter(
+    api_key: ENV['OPENROUTER_API_KEY'],
+    uri_base: 'https://openrouter.ai/api/v1' # Optional, this is the default
+  )
+
+  config.zai(
+    api_key: ENV['ZAI_API_KEY'],
+    uri_base: 'https://api.z.ai/api/paas/v4' # Optional, this is the default
+  )
+
   # Optional: Configure custom logger
   config.logger = Logger.new($stdout) # Log to stdout
   config.logger = Logger.new('log/llm_conductor.log') # Log to file
@@ -153,6 +164,8 @@ The gem automatically detects these environment variables:
 - `GEMINI_API_KEY` - Google Gemini API key
 - `GROQ_API_KEY` - Groq API key
 - `OLLAMA_ADDRESS` - Ollama server address
+- `OPENROUTER_API_KEY` - OpenRouter API key
+- `ZAI_API_KEY` - Z.ai (Zhipu AI) API key
 
 ## Supported Providers & Models
 
@@ -223,6 +236,160 @@ response = LlmConductor.generate(
 )
 ```
 
+### OpenRouter (Access to Multiple Providers)
+OpenRouter provides unified access to various LLM providers with automatic routing. It also supports vision/multimodal models with automatic retry logic for handling intermittent availability issues.
+
+**Vision-capable models:**
+- `nvidia/nemotron-nano-12b-v2-vl:free` - **FREE** 12B vision model (may need retries)
+- `openai/gpt-4o-mini` - Fast and reliable
+- `google/gemini-flash-1.5` - Fast vision processing
+- `anthropic/claude-3.5-sonnet` - High quality analysis
+- `openai/gpt-4o` - Best quality (higher cost)
+
+**Note:** Free-tier models may experience intermittent 502 errors. The client includes automatic retry logic with exponential backoff (up to 5 retries) to handle these transient failures.
+
+```ruby
+# Text-only request
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: 'Your prompt here'
+)
+
+# Vision/multimodal request with single image
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+# Vision request with multiple images
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Compare these images',
+    images: [
+      'https://example.com/image1.jpg',
+      'https://example.com/image2.jpg'
+    ]
+  }
+)
+
+# Vision request with detail level
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Describe this image in detail',
+    images: [
+      { url: 'https://example.com/image.jpg', detail: 'high' }
+    ]
+  }
+)
+
+# Advanced: Raw array format (OpenAI-compatible)
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: [
+    { type: 'text', text: 'What is in this image?' },
+    { type: 'image_url', image_url: { url: 'https://example.com/image.jpg' } }
+  ]
+)
+```
+
+**Reliability:** The OpenRouter client includes intelligent retry logic:
+- Automatically retries on 502 errors (up to 5 attempts)
+- Exponential backoff: 2s, 4s, 8s, 16s, 32s
+- Transparent to your code - works seamlessly
+- Enable logging to see retry attempts:
+
+```ruby
+LlmConductor.configure do |config|
+  config.logger = Logger.new($stdout)
+  config.logger.level = Logger::INFO
+end
+```
+
+### Z.ai (Zhipu AI) - GLM Models with Vision Support
+Z.ai provides access to GLM (General Language Model) series including the powerful GLM-4.5V multimodal model with 64K context window and vision capabilities.
+
+**Text models:**
+- `glm-4-plus` - Enhanced text-only model
+- `glm-4` - Standard GLM-4 model
+
+**Vision-capable models:**
+- `glm-4.5v` - Latest multimodal model with 64K context ✅ **RECOMMENDED**
+- `glm-4v` - Previous generation vision model
+
+```ruby
+# Text-only request with GLM-4-plus
+response = LlmConductor.generate(
+  model: 'glm-4-plus',
+  vendor: :zai,
+  prompt: 'Explain quantum computing in simple terms'
+)
+
+# Vision request with GLM-4.5V - single image
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+# Vision request with multiple images
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Compare these images and identify differences',
+    images: [
+      'https://example.com/image1.jpg',
+      'https://example.com/image2.jpg'
+    ]
+  }
+)
+
+# Vision request with detail level
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Analyze this document in detail',
+    images: [
+      { url: 'https://example.com/document.jpg', detail: 'high' }
+    ]
+  }
+)
+
+# Base64 encoded local images
+require 'base64'
+image_data = Base64.strict_encode64(File.read('path/to/image.jpg'))
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What is in this image?',
+    images: "data:image/jpeg;base64,#{image_data}"
+  }
+)
+```
+
+**GLM-4.5V Features:**
+- 64K token context window
+- Multimodal understanding (text + images)
+- Document understanding and OCR
+- Image reasoning and analysis
+- Base64 image support for local files
+- OpenAI-compatible API format
+
 ### Vendor Detection
 
 The gem automatically detects the appropriate provider based on model names:
@@ -230,6 +397,7 @@ The gem automatically detects the appropriate provider based on model names:
 - **OpenAI**: Models starting with `gpt-` (e.g., `gpt-4`, `gpt-3.5-turbo`)
 - **Anthropic**: Models starting with `claude-` (e.g., `claude-3-5-sonnet-20241022`)
 - **Google Gemini**: Models starting with `gemini-` (e.g., `gemini-2.5-flash`, `gemini-2.0-flash`)
+- **Z.ai**: Models starting with `glm-` (e.g., `glm-4.5v`, `glm-4-plus`, `glm-4v`)
 - **Groq**: Models starting with `llama`, `mixtral`, `gemma`, or `qwen` (e.g., `llama-3.1-70b-versatile`, `mixtral-8x7b-32768`, `gemma-7b-it`, `qwen-2.5-72b-instruct`)
 - **Ollama**: All other models (e.g., `llama3.2`, `mistral`, `codellama`)
 
@@ -483,6 +651,8 @@ Check the `/examples` directory for comprehensive usage examples:
 - `rag_usage.rb` - RAG implementation examples
 - `gemini_usage.rb` - Google Gemini integration
 - `groq_usage.rb` - Groq integration with various models
+- `openrouter_vision_usage.rb` - OpenRouter vision/multimodal examples
+- `zai_usage.rb` - Z.ai GLM-4.5V vision and text examples
 
 ## Development
 

data/VISION_USAGE.md
ADDED
@@ -0,0 +1,278 @@
+# Vision/Multimodal Usage Guide
+
+This guide explains how to use vision/multimodal capabilities with the OpenRouter and Z.ai clients in LLM Conductor.
+
+## Quick Start
+
+### Using OpenRouter
+
+```ruby
+require 'llm_conductor'
+
+# Configure
+LlmConductor.configure do |config|
+  config.openrouter(api_key: ENV['OPENROUTER_API_KEY'])
+end
+
+# Analyze an image
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+puts response.output
+```
+
+### Using Z.ai (Zhipu AI)
+
+```ruby
+require 'llm_conductor'
+
+# Configure
+LlmConductor.configure do |config|
+  config.zai(api_key: ENV['ZAI_API_KEY'])
+end
+
+# Analyze an image with GLM-4.5V
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+puts response.output
+```
+
+## Recommended Models
+
+### OpenRouter Models
+
+For vision tasks via OpenRouter, these models work reliably:
+
+- **`openai/gpt-4o-mini`** - Fast, reliable, good balance of cost/quality ✅
+- **`google/gemini-flash-1.5`** - Fast vision processing
+- **`anthropic/claude-3.5-sonnet`** - High quality analysis
+- **`openai/gpt-4o`** - Best quality (higher cost)
+
+### Z.ai Models (Zhipu AI)
+
+For vision tasks via Z.ai, these GLM models are recommended:
+
+- **`glm-4.5v`** - GLM-4.5V multimodal model (64K context window) ✅
+- **`glm-4-plus`** - Text-only model with enhanced capabilities
+- **`glm-4v`** - Previous generation vision model
+
+## Usage Formats
+
+### 1. Single Image (Simple Format)
+
+```ruby
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Describe this image',
+    images: 'https://example.com/image.jpg'
+  }
+)
+```
+
+### 2. Multiple Images
+
+```ruby
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Compare these images',
+    images: [
+      'https://example.com/image1.jpg',
+      'https://example.com/image2.jpg',
+      'https://example.com/image3.jpg'
+    ]
+  }
+)
+```
+
+### 3. Image with Detail Level
+
+For high-resolution images, specify the detail level:
+
+```ruby
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Analyze this image in detail',
+    images: [
+      { url: 'https://example.com/hires-image.jpg', detail: 'high' }
+    ]
+  }
+)
+```
+
+Detail levels:
+- `'high'` - Better for detailed analysis (uses more tokens)
+- `'low'` - Faster, cheaper (default if not specified)
+- `'auto'` - Let the model decide
+
+### 4. Raw Format (Advanced)
+
+For maximum control, use the OpenAI-compatible array format:
+
+```ruby
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: [
+    { type: 'text', text: 'What is in this image?' },
+    { type: 'image_url', image_url: { url: 'https://example.com/image.jpg' } },
+    { type: 'text', text: 'Describe it in detail.' }
+  ]
+)
+```
+
+## Text-Only Requests (Backward Compatible)
+
+The client still supports regular text-only requests:
+
+```ruby
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: 'What is the capital of France?'
+)
+```
+
+## Image URL Requirements
+
+- Images must be publicly accessible URLs
+- Supported formats: JPEG, PNG, GIF, WebP
+- Maximum file size depends on the model
+- Use HTTPS URLs when possible
+
+## Error Handling
+
+```ruby
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Analyze this',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+if response.success?
+  puts response.output
+else
+  puts "Error: #{response.metadata[:error]}"
+end
+```
+
+## Testing in Development
+
+### Interactive Console
+
+```bash
+./bin/console
+```
+
+Then:
+
+```ruby
+LlmConductor.configure do |config|
+  config.openrouter(api_key: 'your-key')
+end
+
+response = LlmConductor.generate(
+  model: 'openai/gpt-4o-mini',
+  vendor: :openrouter,
+  prompt: {
+    text: 'What is this?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+```
+
+### Run Examples
+
+For OpenRouter:
+```bash
+export OPENROUTER_API_KEY='your-key'
+ruby examples/openrouter_vision_usage.rb
+```
+
+For Z.ai:
+```bash
+export ZAI_API_KEY='your-key'
+ruby examples/zai_usage.rb
+```
+
+## Token Counting
+
+Token counting for multimodal requests counts only the text portion. Image tokens vary by:
+- Image size
+- Detail level specified
+- Model being used
+
+The gem provides an approximation based on text tokens. For precise billing, check the OpenRouter dashboard.
+
+## Common Issues
+
+### 502 Server Error
+
+If you get a 502 error:
+- The model might be unavailable
+- Try a different model (e.g., switch to `openai/gpt-4o-mini`)
+- Free tier models may be overloaded
+
+### "No implicit conversion of Hash into String"
+
+This was fixed in the current version. Make sure you're using the latest version of the gem.
+
+### Image Not Loading
+
+- Verify the URL is publicly accessible
+- Check that the image format is supported
+- Try a smaller image size
+
+## Cost Considerations
+
+Vision models are more expensive than text-only models. Costs vary by:
+
+- **Model choice**: GPT-4o > GPT-4o-mini > Gemini Flash
+- **Detail level**: `high` uses more tokens than `low`
+- **Image count**: Each image adds to the cost
+- **Image size**: Larger images may use more tokens
+
+For development, use:
+- `openai/gpt-4o-mini` for cost-effective testing
+- `detail: 'low'` for quick analysis
+- Single images when possible
+
+For production:
+- Use `openai/gpt-4o` for best quality
+- Use `detail: 'high'` when needed
+- Monitor costs via OpenRouter dashboard
+
+## Examples
+
+- `examples/openrouter_vision_usage.rb` - Complete OpenRouter vision examples
+- `examples/zai_usage.rb` - Complete Z.ai GLM-4.5V examples including vision and text
+
+## Further Reading
+
+- [OpenRouter Documentation](https://openrouter.ai/docs)
+- [OpenAI Vision API Reference](https://platform.openai.com/docs/guides/vision)
+- [Anthropic Claude Vision](https://docs.anthropic.com/claude/docs/vision)
+- [Z.ai API Platform](https://api.z.ai/)
+- [GLM-4.5V Documentation](https://bigmodel.cn/)
+

data/examples/openrouter_vision_usage.rb
ADDED

@@ -0,0 +1,108 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+# Example of OpenRouter vision/multimodal usage
+require_relative '../lib/llm_conductor'
+
+# Configure OpenRouter
+LlmConductor.configure do |config|
+  config.openrouter(
+    api_key: ENV['OPENROUTER_API_KEY']
+  )
+end
+
+# Example 1: Simple text-only request (backward compatible)
+puts '=== Example 1: Text-only request ==='
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free', # Free vision-capable model
+  vendor: :openrouter,
+  prompt: 'What is the capital of France?'
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 2: Vision request with a single image
+puts '=== Example 2: Single image analysis ==='
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 3: Vision request with multiple images
+puts '=== Example 3: Multiple images comparison ==='
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Compare these two images and describe the differences.',
+    images: [
+      'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
+      'https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Placeholder_view_vector.svg/681px-Placeholder_view_vector.svg.png'
+    ]
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 4: Image with detail level specification
+puts '=== Example 4: Image with detail level ==='
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: {
+    text: 'Describe this image in detail.',
+    images: [
+      {
+        url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
+        detail: 'high'
+      }
+    ]
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 5: Using raw array format (advanced)
+puts '=== Example 5: Raw array format ==='
+response = LlmConductor.generate(
+  model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+  vendor: :openrouter,
+  prompt: [
+    { type: 'text', text: 'What is in this image?' },
+    {
+      type: 'image_url',
+      image_url: {
+        url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+      }
+    }
+  ]
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 6: Error handling
+puts '=== Example 6: Error handling ==='
+begin
+  response = LlmConductor.generate(
+    model: 'nvidia/nemotron-nano-12b-v2-vl:free',
+    vendor: :openrouter,
+    prompt: {
+      text: 'Analyze this image',
+      images: 'invalid-url'
+    }
+  )
+
+  if response.success?
+    puts response.output
+  else
+    puts "Error: #{response.metadata[:error]}"
+  end
+rescue StandardError => e
+  puts "Exception: #{e.message}"
+end

data/examples/zai_usage.rb
ADDED

@@ -0,0 +1,163 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+# Example of Z.ai GLM model usage including multimodal/vision capabilities
+require_relative '../lib/llm_conductor'
+
+# Configure Z.ai
+LlmConductor.configure do |config|
+  config.zai(
+    api_key: ENV['ZAI_API_KEY']
+  )
+end
+
+# Example 1: Simple text-only request with GLM-4-plus
+puts '=== Example 1: Text-only request with GLM-4-plus ==='
+response = LlmConductor.generate(
+  model: 'glm-4-plus',
+  vendor: :zai,
+  prompt: 'What is the capital of France? Please answer in one sentence.'
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 2: Text request with GLM-4.5V (vision model, text-only mode)
+puts '=== Example 2: Text-only request with GLM-4.5V ==='
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: 'Explain the concept of machine learning in simple terms.'
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 3: Vision request with a single image
+puts '=== Example 3: Single image analysis with GLM-4.5V ==='
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What do you see in this image? Please describe it in detail.',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 4: Vision request with multiple images
+puts '=== Example 4: Multiple images comparison with GLM-4.5V ==='
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Compare these two images and describe the differences you observe.',
+    images: [
+      'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
+      'https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Placeholder_view_vector.svg/681px-Placeholder_view_vector.svg.png'
+    ]
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 5: Image with detail level specification
+puts '=== Example 5: Image with detail level ==='
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Describe this image in detail, including colors, objects, and atmosphere.',
+    images: [
+      {
+        url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
+        detail: 'high'
+      }
+    ]
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 6: Using raw array format (advanced)
+puts '=== Example 6: Raw array format ==='
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: [
+    { type: 'text', text: 'What objects can you identify in this image?' },
+    {
+      type: 'image_url',
+      image_url: {
+        url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+      }
+    }
+  ]
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 7: Base64 encoded image (for local images)
+puts '=== Example 7: Using base64 encoded image ==='
+# NOTE: In real usage, you would read and encode a local file
+# require 'base64'
+# image_data = Base64.strict_encode64(File.read('path/to/image.jpg'))
+# image_url = "data:image/jpeg;base64,#{image_data}"
+
+# For this example, we'll use a URL
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Analyze this image and extract any text you can see.',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 8: Error handling
+puts '=== Example 8: Error handling ==='
+begin
+  response = LlmConductor.generate(
+    model: 'glm-4.5v',
+    vendor: :zai,
+    prompt: {
+      text: 'Analyze this image',
+      images: 'invalid-url'
+    }
+  )
+
+  if response.success?
+    puts response.output
+  else
+    puts "Error: #{response.metadata[:error]}"
+  end
+rescue StandardError => e
+  puts "Exception: #{e.message}"
+end
+
+# Example 9: Document understanding (OCR)
+puts "\n=== Example 9: Document understanding ==="
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Please read any text visible in this image and transcribe it.',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"
+
+# Example 10: Complex reasoning with image
+puts '=== Example 10: Complex reasoning with image ==='
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Based on this image, what time of day do you think it is? Explain your reasoning.',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+puts response.output
+puts "Tokens used: #{response.total_tokens}\n\n"

data/lib/llm_conductor/client_factory.rb
CHANGED

@@ -19,7 +19,8 @@ module LlmConductor
         ollama: Clients::OllamaClient,
         gemini: Clients::GeminiClient,
         google: Clients::GeminiClient,
-        groq: Clients::GroqClient
+        groq: Clients::GroqClient,
+        zai: Clients::ZaiClient
       }
 
       client_classes.fetch(vendor) do
@@ -35,6 +36,8 @@ module LlmConductor
         :openai
       when /^gemini/i
         :gemini
+      when /^glm/i
+        :zai
       when /^(llama|mixtral|gemma|qwen)/i
         :groq
       else

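The factory diff above adds a `/^glm/i` rule, so GLM model names resolve to the new Z.ai client. A minimal sketch of what that routing implies, assuming the top-level `LlmConductor.generate` API shown in the README diff (the second call, which omits `vendor:`, is a hypothetical illustration of the name-based detection added here):

```ruby
# Explicit vendor selection, routed by ClientFactory to Clients::ZaiClient
response = LlmConductor.generate(model: 'glm-4.5v', vendor: :zai, prompt: 'Hello')

# With the new /^glm/i rule, a GLM model name alone should resolve to :zai
response = LlmConductor.generate(model: 'glm-4-plus', prompt: 'Hello')
```
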
data/lib/llm_conductor/clients/openrouter_client.rb
CHANGED

@@ -3,17 +3,122 @@
 module LlmConductor
   module Clients
     # OpenRouter client implementation for accessing various LLM providers through OpenRouter API
+    # Supports both text-only and multimodal (vision) requests
     class OpenrouterClient < BaseClient
       private
 
+      # Override token calculation to handle multimodal content
+      def calculate_tokens(content)
+        case content
+        when String
+          super(content)
+        when Hash
+          # For multimodal content, count tokens only for text part
+          # Note: This is an approximation as images have variable token counts
+          text = content[:text] || content['text'] || ''
+          super(text)
+        when Array
+          # For pre-formatted arrays, extract and count text parts
+          text_parts = content.select { |part| part[:type] == 'text' || part['type'] == 'text' }
+                              .map { |part| part[:text] || part['text'] || '' }
+                              .join(' ')
+          super(text_parts)
+        else
+          super(content.to_s)
+        end
+      end
+
       def generate_content(prompt)
-
-
-
-
-
+        content = format_content(prompt)
+
+        # Retry logic for transient 502 errors (common with free-tier models)
+        # Free-tier vision models can be slow/overloaded, so we use more retries
+        max_retries = 5
+        retry_count = 0
+
+        begin
+          client.chat(
+            parameters: {
+              model:,
+              messages: [{ role: 'user', content: }],
+              provider: { sort: 'throughput' }
+            }
+          ).dig('choices', 0, 'message', 'content')
+        rescue Faraday::ServerError => e
+          retry_count += 1
+
+          # Log retry attempts if logger is configured
+          configuration.logger&.warn(
+            "OpenRouter API error (attempt #{retry_count}/#{max_retries}): #{e.message}"
+          )
+
+          raise unless e.response[:status] == 502 && retry_count < max_retries
+
+          wait_time = 2**retry_count # Exponential backoff: 2, 4, 8, 16, 32 seconds
+          configuration.logger&.info("Retrying in #{wait_time}s...")
+          sleep(wait_time)
+          retry
+        end
+      end
+
+      # Format content based on whether it's a simple string or multimodal content
+      # @param prompt [String, Hash, Array] The prompt content
+      # @return [String, Array] Formatted content for the API
+      def format_content(prompt)
+        case prompt
+        when Hash
+          # Handle hash with text and/or images
+          format_multimodal_hash(prompt)
+        when Array
+          # Already formatted as array of content parts
+          prompt
+        else
+          # Simple string prompt
+          prompt.to_s
+        end
+      end
+
+      # Format a hash containing text and/or images into multimodal content array
+      # @param prompt_hash [Hash] Hash with :text and/or :images keys
+      # @return [Array] Array of content parts for the API
+      def format_multimodal_hash(prompt_hash)
+        content_parts = []
+
+        # Add text part if present
+        if prompt_hash[:text] || prompt_hash['text']
+          text = prompt_hash[:text] || prompt_hash['text']
+          content_parts << { type: 'text', text: }
+        end
+
+        # Add image parts if present
+        images = prompt_hash[:images] || prompt_hash['images'] || []
+        images = [images] unless images.is_a?(Array)
+
+        images.each do |image|
+          content_parts << format_image_part(image)
+        end
+
+        content_parts
+      end
+
+      # Format an image into the appropriate API structure
+      # @param image [String, Hash] Image URL or hash with url/detail keys
+      # @return [Hash] Formatted image part for the API
+      def format_image_part(image)
+        case image
+        when String
+          # Simple URL string
+          { type: 'image_url', image_url: { url: image } }
+        when Hash
+          # Hash with url and optional detail level
+          {
+            type: 'image_url',
+            image_url: {
+              url: image[:url] || image['url'],
+              detail: image[:detail] || image['detail']
+            }.compact
+          }
-
+        end
       end
 
       def client
@@ -21,7 +126,7 @@ module LlmConductor
         config = LlmConductor.configuration.provider_config(:openrouter)
         OpenAI::Client.new(
           access_token: config[:api_key],
-          uri_base: config[:uri_base] || 'https://openrouter.ai/api/'
+          uri_base: config[:uri_base] || 'https://openrouter.ai/api/v1'
         )
       end
     end

data/lib/llm_conductor/clients/zai_client.rb
ADDED

@@ -0,0 +1,153 @@
+# frozen_string_literal: true
+
+module LlmConductor
+  module Clients
+    # Z.ai client implementation for accessing GLM models including GLM-4.5V
+    # Supports both text-only and multimodal (vision) requests
+    #
+    # Note: Z.ai uses OpenAI-compatible API format but with /v4/ path instead of /v1/
+    # We use Faraday directly instead of the ruby-openai gem to properly handle the API path
+    class ZaiClient < BaseClient
+      private
+
+      # Override token calculation to handle multimodal content
+      def calculate_tokens(content)
+        case content
+        when String
+          super(content)
+        when Hash
+          # For multimodal content, count tokens only for text part
+          # Note: This is an approximation as images have variable token counts
+          text = content[:text] || content['text'] || ''
+          super(text)
+        when Array
+          # For pre-formatted arrays, extract and count text parts
+          text_parts = content.select { |part| part[:type] == 'text' || part['type'] == 'text' }
+                              .map { |part| part[:text] || part['text'] || '' }
+                              .join(' ')
+          super(text_parts)
+        else
+          super(content.to_s)
+        end
+      end
+
+      def generate_content(prompt)
+        content = format_content(prompt)
+
+        # Retry logic for transient errors (similar to OpenRouter)
+        max_retries = 3
+        retry_count = 0
+
+        begin
+          # Make direct HTTP request to Z.ai API since they use /v4/ instead of /v1/
+          response = http_client.post('chat/completions') do |req|
+            req.body = {
+              model:,
+              messages: [{ role: 'user', content: }]
+            }.to_json
+          end
+
+          # Response body is already parsed as Hash by Faraday's JSON middleware
+          response_data = response.body.is_a?(String) ? JSON.parse(response.body) : response.body
+          response_data.dig('choices', 0, 'message', 'content')
+        rescue Faraday::ServerError => e
+          retry_count += 1
+
+          # Log retry attempts if logger is configured
+          configuration.logger&.warn(
+            "Z.ai API error (attempt #{retry_count}/#{max_retries}): #{e.message}"
+          )
+
+          raise unless retry_count < max_retries
+
+          wait_time = 2**retry_count # Exponential backoff: 2, 4, 8 seconds
+          configuration.logger&.info("Retrying in #{wait_time}s...")
+          sleep(wait_time)
+          retry
+        end
+      end
+
+      # Format content based on whether it's a simple string or multimodal content
+      # @param prompt [String, Hash, Array] The prompt content
+      # @return [String, Array] Formatted content for the API
+      def format_content(prompt)
+        case prompt
+        when Hash
+          # Handle hash with text and/or images
+          format_multimodal_hash(prompt)
+        when Array
+          # Already formatted as array of content parts
+          prompt
+        else
+          # Simple string prompt
+          prompt.to_s
+        end
+      end
+
+      # Format a hash containing text and/or images into multimodal content array
+      # @param prompt_hash [Hash] Hash with :text and/or :images keys
+      # @return [Array] Array of content parts for the API
+      def format_multimodal_hash(prompt_hash)
+        content_parts = []
+
+        # Add text part if present
+        if prompt_hash[:text] || prompt_hash['text']
+          text = prompt_hash[:text] || prompt_hash['text']
+          content_parts << { type: 'text', text: }
+        end
+
+        # Add image parts if present
+        images = prompt_hash[:images] || prompt_hash['images'] || []
+        images = [images] unless images.is_a?(Array)
+
+        images.each do |image|
+          content_parts << format_image_part(image)
+        end
+
+        content_parts
+      end
+
+      # Format an image into the appropriate API structure
+      # @param image [String, Hash] Image URL or hash with url/detail keys
+      # @return [Hash] Formatted image part for the API
+      def format_image_part(image)
+        case image
+        when String
+          # Simple URL string or base64 data
+          { type: 'image_url', image_url: { url: image } }
+        when Hash
+          # Hash with url and optional detail level
+          {
+            type: 'image_url',
+            image_url: {
+              url: image[:url] || image['url'],
+              detail: image[:detail] || image['detail']
+            }.compact
+          }
+        end
+      end
+
+      # HTTP client for making requests to Z.ai API
+      # Z.ai uses /v4/ in their path, not /v1/ like OpenAI, so we use Faraday directly
+      def http_client
+        @http_client ||= begin
+          config = LlmConductor.configuration.provider_config(:zai)
+          base_url = config[:uri_base] || 'https://api.z.ai/api/paas/v4'
+
+          Faraday.new(url: base_url) do |f|
+            f.request :json
+            f.response :json
+            f.headers['Authorization'] = "Bearer #{config[:api_key]}"
+            f.headers['Content-Type'] = 'application/json'
+            f.adapter Faraday.default_adapter
+          end
+        end
+      end
+
+      # Legacy client method for compatibility (not used, but kept for reference)
+      def client
+        http_client
+      end
+    end
+  end
+end

data/lib/llm_conductor/configuration.rb
CHANGED

@@ -72,6 +72,14 @@ module LlmConductor
       }
     end
 
+    # Configure Z.ai provider
+    def zai(api_key: nil, **options)
+      @providers[:zai] = {
+        api_key: api_key || ENV['ZAI_API_KEY'],
+        **options
+      }
+    end
+
     # Get provider configuration
     def provider_config(provider)
       @providers[provider.to_sym] || {}
@@ -126,6 +134,14 @@ module LlmConductor
       groq(api_key: value)
     end
 
+    def zai_api_key
+      provider_config(:zai)[:api_key]
+    end
+
+    def zai_api_key=(value)
+      zai(api_key: value)
+    end
+
     private
 
     def setup_defaults_from_env
@@ -135,6 +151,7 @@ module LlmConductor
       openrouter if ENV['OPENROUTER_API_KEY']
       gemini if ENV['GEMINI_API_KEY']
       groq if ENV['GROQ_API_KEY']
+      zai if ENV['ZAI_API_KEY']
      ollama # Always configure Ollama with default URL
     end
   end

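Per the configuration diff above, the Z.ai key can be supplied either through the provider block or the new attribute writer, and it is also picked up from `ZAI_API_KEY` automatically. A minimal sketch, assuming the rest of the configuration DSL is unchanged from 1.1.1:

```ruby
LlmConductor.configure do |config|
  # Equivalent ways to register the Z.ai provider, per this diff:
  config.zai(api_key: ENV['ZAI_API_KEY'])
  config.zai_api_key = ENV['ZAI_API_KEY']
end

# Reading the stored provider settings back out
LlmConductor.configuration.provider_config(:zai)[:api_key]
```
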
data/lib/llm_conductor/prompt_manager.rb
CHANGED

@@ -59,9 +59,7 @@ module LlmConductor
     def validate_prompt_class!(prompt_class)
       raise InvalidPromptClassError, 'Prompt must be a class' unless prompt_class.is_a?(Class)
 
-      unless prompt_class < Prompts::BasePrompt
-        raise InvalidPromptClassError, 'Prompt class must inherit from BasePrompt'
-      end
+      raise InvalidPromptClassError, 'Prompt class must inherit from BasePrompt' unless prompt_class < Prompts::BasePrompt
 
       return if prompt_class.instance_methods(false).include?(:render)
 

|
data/lib/llm_conductor.rb
CHANGED
|
@@ -14,10 +14,11 @@ require_relative 'llm_conductor/clients/groq_client'
|
|
|
14
14
|
require_relative 'llm_conductor/clients/ollama_client'
|
|
15
15
|
require_relative 'llm_conductor/clients/openrouter_client'
|
|
16
16
|
require_relative 'llm_conductor/clients/gemini_client'
|
|
17
|
+
require_relative 'llm_conductor/clients/zai_client'
|
|
17
18
|
require_relative 'llm_conductor/client_factory'
|
|
18
19
|
|
|
19
20
|
# LLM Conductor provides a unified interface for multiple Language Model providers
|
|
20
|
-
# including OpenAI GPT, Anthropic Claude, Google Gemini, Groq, OpenRouter, and Ollama
|
|
21
|
+
# including OpenAI GPT, Anthropic Claude, Google Gemini, Groq, OpenRouter, Z.ai, and Ollama
|
|
21
22
|
# with built-in prompt templates, token counting, and extensible client architecture.
|
|
22
23
|
module LlmConductor
|
|
23
24
|
class Error < StandardError; end
|
|
@@ -63,16 +64,17 @@ module LlmConductor
|
|
|
63
64
|
when :ollama then Clients::OllamaClient
|
|
64
65
|
when :gemini, :google then Clients::GeminiClient
|
|
65
66
|
when :groq then Clients::GroqClient
|
|
67
|
+
when :zai then Clients::ZaiClient
|
|
66
68
|
else
|
|
67
69
|
raise ArgumentError,
|
|
68
70
|
"Unsupported vendor: #{vendor}. " \
|
|
69
|
-
'Supported vendors: anthropic, openai, openrouter, ollama, gemini, groq'
|
|
71
|
+
'Supported vendors: anthropic, openai, openrouter, ollama, gemini, groq, zai'
|
|
70
72
|
end
|
|
71
73
|
end
|
|
72
74
|
end
|
|
73
75
|
|
|
74
76
|
# List of supported vendors
|
|
75
|
-
SUPPORTED_VENDORS = %i[anthropic openai openrouter ollama gemini groq].freeze
|
|
77
|
+
SUPPORTED_VENDORS = %i[anthropic openai openrouter ollama gemini groq zai].freeze
|
|
76
78
|
|
|
77
79
|
# List of supported prompt types
|
|
78
80
|
SUPPORTED_PROMPT_TYPES = %i[
|
metadata
CHANGED
@@ -1,13 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: llm_conductor
 version: !ruby/object:Gem::Version
-  version: 1.
+  version: 1.2.0
 platform: ruby
 authors:
 - Ben Zheng
 bindir: exe
 cert_chain: []
-date: 2025-10-
+date: 2025-10-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -152,13 +152,16 @@ files:
 - LICENSE
 - README.md
 - Rakefile
+- VISION_USAGE.md
 - config/initializers/llm_conductor.rb
 - examples/data_builder_usage.rb
 - examples/gemini_usage.rb
 - examples/groq_usage.rb
+- examples/openrouter_vision_usage.rb
 - examples/prompt_registration.rb
 - examples/rag_usage.rb
 - examples/simple_usage.rb
+- examples/zai_usage.rb
 - lib/llm_conductor.rb
 - lib/llm_conductor/client_factory.rb
 - lib/llm_conductor/clients/anthropic_client.rb
@@ -168,6 +171,7 @@ files:
 - lib/llm_conductor/clients/groq_client.rb
 - lib/llm_conductor/clients/ollama_client.rb
 - lib/llm_conductor/clients/openrouter_client.rb
+- lib/llm_conductor/clients/zai_client.rb
 - lib/llm_conductor/configuration.rb
 - lib/llm_conductor/data_builder.rb
 - lib/llm_conductor/prompt_manager.rb