llm_conductor 1.1.2 → 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop.yml +11 -1
- data/README.md +87 -3
- data/VISION_USAGE.md +146 -9
- data/examples/claude_vision_usage.rb +138 -0
- data/examples/gpt_vision_usage.rb +156 -0
- data/examples/zai_usage.rb +163 -0
- data/lib/llm_conductor/client_factory.rb +4 -1
- data/lib/llm_conductor/clients/anthropic_client.rb +28 -1
- data/lib/llm_conductor/clients/concerns/vision_support.rb +159 -0
- data/lib/llm_conductor/clients/gpt_client.rb +7 -1
- data/lib/llm_conductor/clients/openrouter_client.rb +4 -81
- data/lib/llm_conductor/clients/zai_client.rb +76 -0
- data/lib/llm_conductor/configuration.rb +17 -0
- data/lib/llm_conductor/prompt_manager.rb +1 -3
- data/lib/llm_conductor/version.rb +1 -1
- data/lib/llm_conductor.rb +5 -3
- metadata +7 -2
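
The most interesting structural change in this release is the shared vision concern: `clients/openrouter_client.rb` shrinks by 81 lines while `clients/concerns/vision_support.rb` gains 159, which suggests the image-handling logic was extracted so the Claude, GPT, OpenRouter, and Z.ai clients can reuse it. The concern's actual interface is not shown in this diff, so the sketch below is only an illustration of the kind of normalization such a concern would need, based on the `{ text:, images: }` prompt format documented in the README and VISION_USAGE changes further down; the `VisionSketch` module and `normalize` method names are hypothetical.

```ruby
# Hypothetical sketch only: the gem's real VisionSupport API is not shown in this diff.
# It illustrates normalizing the documented prompt formats into OpenAI-style content parts.
module VisionSketch
  def self.normalize(prompt)
    return [{ type: 'text', text: prompt }] if prompt.is_a?(String) # plain text prompt
    return prompt if prompt.is_a?(Array)                            # raw format passes through

    # { text:, images: } format: one text part plus one image_url part per image
    parts = [{ type: 'text', text: prompt[:text] }]
    Array(prompt[:images]).each do |image|
      url, detail = image.is_a?(Hash) ? [image[:url], image[:detail]] : [image, nil]
      image_url = { url: url }
      image_url[:detail] = detail if detail
      parts << { type: 'image_url', image_url: image_url }
    end
    parts
  end
end

# Example: a single image URL becomes a text part plus an image_url part.
p VisionSketch.normalize({ text: 'What is in this image?', images: 'https://example.com/image.jpg' })
```

Claude uses a different content-block shape (`type: 'image'` with a `source`), as the VISION_USAGE raw formats below show, so the real concern presumably emits provider-specific parts rather than only the OpenAI-style shape sketched here.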
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bce592da24b8bb09f9702361a8d2de5051092290dd3b263f0026ddb877a8717b
+  data.tar.gz: 364a233ac3b1490010d949e15f83a3c45a5750ed117674ae2498508884cc365a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 3ea0a7fc5d5fe1f729e6eb76b9b81eb5b24aaad96ba59ef954637e00184eded4f6fd44c591ee3921f86dd3131403fc496a77b355bd59e60158849c2e3af44511
+  data.tar.gz: 322cfca7d9e8917761af1b5de1033d9c11f58fceaa8d79aa18feee6b65050c4ab123c340479d44adeafa42f85b25c9e406e17ffda0af0ed6f871cb3d4d7d682f
data/.rubocop.yml
CHANGED

@@ -29,11 +29,15 @@ Style/HashSyntax:
 Lint/ConstantDefinitionInBlock:
   Enabled: false

+Metrics/ClassLength:
+  Max: 120
+
 Metrics/MethodLength:
   Max: 15
   Exclude:
     - 'lib/llm_conductor/prompts.rb'
     - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'

 RSpec/ExampleLength:
   Enabled: false
@@ -91,19 +95,25 @@ Metrics/AbcSize:
   Exclude:
     - 'lib/llm_conductor/prompts.rb'
     - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'

 Metrics/CyclomaticComplexity:
   Exclude:
+    - 'lib/llm_conductor.rb'
     - 'lib/llm_conductor/prompts.rb'
     - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'

 Metrics/PerceivedComplexity:
   Exclude:
     - 'lib/llm_conductor/prompts.rb'
     - 'lib/llm_conductor/clients/openrouter_client.rb'
+    - 'lib/llm_conductor/clients/zai_client.rb'

 Layout/LineLength:
-  Max:
+  Max: 125
+  Exclude:
+    - 'examples/*.rb'

 # Performance cops (from .rubocop_todo.yml)
 Performance/RedundantEqualityComparisonBlock:
data/README.md
CHANGED

@@ -1,12 +1,12 @@
 # LLM Conductor

-A powerful Ruby gem from [Ekohe](https://ekohe.com) for orchestrating multiple Language Model providers with a unified, modern interface. LLM Conductor provides seamless integration with OpenAI GPT, Anthropic Claude, Google Gemini, Groq, Ollama, and
+A powerful Ruby gem from [Ekohe](https://ekohe.com) for orchestrating multiple Language Model providers with a unified, modern interface. LLM Conductor provides seamless integration with OpenAI GPT, Anthropic Claude, Google Gemini, Groq, Ollama, OpenRouter, and Z.ai (Zhipu AI) with advanced prompt management, data building patterns, vision/multimodal support, and comprehensive response handling.

 ## Features

-🚀 **Multi-Provider Support** - OpenAI GPT, Anthropic Claude, Google Gemini, Groq, Ollama, and
+🚀 **Multi-Provider Support** - OpenAI GPT, Anthropic Claude, Google Gemini, Groq, Ollama, OpenRouter, and Z.ai with automatic vendor detection
 🎯 **Unified Modern API** - Simple `LlmConductor.generate()` interface with rich Response objects
-🖼️ **Vision/Multimodal Support** - Send images alongside text prompts for vision-enabled models (OpenRouter)
+🖼️ **Vision/Multimodal Support** - Send images alongside text prompts for vision-enabled models (OpenRouter, Z.ai GLM-4.5V)
 📝 **Advanced Prompt Management** - Registrable prompt classes with inheritance and templating
 🏗️ **Data Builder Pattern** - Structured data preparation for complex LLM inputs
 ⚡ **Smart Configuration** - Rails-style configuration with environment variable support
@@ -120,6 +120,11 @@ LlmConductor.configure do |config|
     uri_base: 'https://openrouter.ai/api/v1' # Optional, this is the default
   )

+  config.zai(
+    api_key: ENV['ZAI_API_KEY'],
+    uri_base: 'https://api.z.ai/api/paas/v4' # Optional, this is the default
+  )
+
   # Optional: Configure custom logger
   config.logger = Logger.new($stdout) # Log to stdout
   config.logger = Logger.new('log/llm_conductor.log') # Log to file
@@ -160,6 +165,7 @@ The gem automatically detects these environment variables:
 - `GROQ_API_KEY` - Groq API key
 - `OLLAMA_ADDRESS` - Ollama server address
 - `OPENROUTER_API_KEY` - OpenRouter API key
+- `ZAI_API_KEY` - Z.ai (Zhipu AI) API key

 ## Supported Providers & Models

@@ -309,6 +315,81 @@ LlmConductor.configure do |config|
 end
 ```

+### Z.ai (Zhipu AI) - GLM Models with Vision Support
+
+Z.ai provides access to GLM (General Language Model) series including the powerful GLM-4.5V multimodal model with 64K context window and vision capabilities.
+
+**Text models:**
+- `glm-4-plus` - Enhanced text-only model
+- `glm-4` - Standard GLM-4 model
+
+**Vision-capable models:**
+- `glm-4.5v` - Latest multimodal model with 64K context ✅ **RECOMMENDED**
+- `glm-4v` - Previous generation vision model
+
+```ruby
+# Text-only request with GLM-4-plus
+response = LlmConductor.generate(
+  model: 'glm-4-plus',
+  vendor: :zai,
+  prompt: 'Explain quantum computing in simple terms'
+)
+
+# Vision request with GLM-4.5V - single image
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+# Vision request with multiple images
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Compare these images and identify differences',
+    images: [
+      'https://example.com/image1.jpg',
+      'https://example.com/image2.jpg'
+    ]
+  }
+)
+
+# Vision request with detail level
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'Analyze this document in detail',
+    images: [
+      { url: 'https://example.com/document.jpg', detail: 'high' }
+    ]
+  }
+)
+
+# Base64 encoded local images
+require 'base64'
+image_data = Base64.strict_encode64(File.read('path/to/image.jpg'))
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What is in this image?',
+    images: "data:image/jpeg;base64,#{image_data}"
+  }
+)
+```
+
+**GLM-4.5V Features:**
+- 64K token context window
+- Multimodal understanding (text + images)
+- Document understanding and OCR
+- Image reasoning and analysis
+- Base64 image support for local files
+- OpenAI-compatible API format
+
 ### Vendor Detection

 The gem automatically detects the appropriate provider based on model names:
@@ -316,6 +397,7 @@ The gem automatically detects the appropriate provider based on model names:
 - **OpenAI**: Models starting with `gpt-` (e.g., `gpt-4`, `gpt-3.5-turbo`)
 - **Anthropic**: Models starting with `claude-` (e.g., `claude-3-5-sonnet-20241022`)
 - **Google Gemini**: Models starting with `gemini-` (e.g., `gemini-2.5-flash`, `gemini-2.0-flash`)
+- **Z.ai**: Models starting with `glm-` (e.g., `glm-4.5v`, `glm-4-plus`, `glm-4v`)
 - **Groq**: Models starting with `llama`, `mixtral`, `gemma`, or `qwen` (e.g., `llama-3.1-70b-versatile`, `mixtral-8x7b-32768`, `gemma-7b-it`, `qwen-2.5-72b-instruct`)
 - **Ollama**: All other models (e.g., `llama3.2`, `mistral`, `codellama`)
@@ -569,6 +651,8 @@ Check the `/examples` directory for comprehensive usage examples:
 - `rag_usage.rb` - RAG implementation examples
 - `gemini_usage.rb` - Google Gemini integration
 - `groq_usage.rb` - Groq integration with various models
+- `openrouter_vision_usage.rb` - OpenRouter vision/multimodal examples
+- `zai_usage.rb` - Z.ai GLM-4.5V vision and text examples

 ## Development
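
For readers skimming the README diff above, the vendor-detection rules reduce to a prefix match on the model name. The snippet below only restates those documented rules; it is not the gem's implementation (that lives in `lib/llm_conductor/client_factory.rb`, which this diff shows only as a 4-line change). The `:gemini`, `:groq`, and `:ollama` symbols, and the hyphenated `llama-` check used to keep Groq's `llama-3.1-...` separate from Ollama's `llama3.2`, are assumptions.

```ruby
# Illustration of the README's documented prefix rules; not the gem's actual code.
def detect_vendor(model)
  case model
  when /\Agpt-/ then :openai
  when /\Aclaude-/ then :anthropic
  when /\Agemini-/ then :gemini                    # assumed symbol
  when /\Aglm-/ then :zai
  when /\A(llama-|mixtral|gemma|qwen)/ then :groq  # assumption: hyphen keeps 'llama3.2' out
  else :ollama                                     # everything else falls back to Ollama
  end
end

detect_vendor('glm-4.5v')                 # => :zai
detect_vendor('claude-sonnet-4-20250514') # => :anthropic
detect_vendor('llama3.2')                 # => :ollama (no hyphen after 'llama')
```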
data/VISION_USAGE.md
CHANGED

@@ -1,9 +1,57 @@
 # Vision/Multimodal Usage Guide

-This guide explains how to use vision/multimodal capabilities with
+This guide explains how to use vision/multimodal capabilities with LLM Conductor. Vision support is available for Claude (Anthropic), GPT (OpenAI), OpenRouter, and Z.ai clients.

 ## Quick Start

+### Using Claude (Anthropic)
+
+```ruby
+require 'llm_conductor'
+
+# Configure
+LlmConductor.configure do |config|
+  config.anthropic(api_key: ENV['ANTHROPIC_API_KEY'])
+end
+
+# Analyze an image
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+puts response.output
+```
+
+### Using GPT (OpenAI)
+
+```ruby
+require 'llm_conductor'
+
+# Configure
+LlmConductor.configure do |config|
+  config.openai(api_key: ENV['OPENAI_API_KEY'])
+end
+
+# Analyze an image
+response = LlmConductor.generate(
+  model: 'gpt-4o',
+  vendor: :openai,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+puts response.output
+```
+
+### Using OpenRouter
+
 ```ruby
 require 'llm_conductor'

@@ -25,8 +73,50 @@ response = LlmConductor.generate(
 puts response.output
 ```

+### Using Z.ai (Zhipu AI)
+
+```ruby
+require 'llm_conductor'
+
+# Configure
+LlmConductor.configure do |config|
+  config.zai(api_key: ENV['ZAI_API_KEY'])
+end
+
+# Analyze an image with GLM-4.5V
+response = LlmConductor.generate(
+  model: 'glm-4.5v',
+  vendor: :zai,
+  prompt: {
+    text: 'What is in this image?',
+    images: 'https://example.com/image.jpg'
+  }
+)
+
+puts response.output
+```
+
 ## Recommended Models

+### Claude Models (Anthropic)
+
+For vision tasks via Anthropic API:
+
+- **`claude-sonnet-4-20250514`** - Claude Sonnet 4 (latest, best for vision) ✅
+- **`claude-opus-4-20250514`** - Claude Opus 4 (maximum quality)
+- **`claude-opus-4-1-20250805`** - Claude Opus 4.1 (newest flagship model)
+
+### GPT Models (OpenAI)
+
+For vision tasks via OpenAI API:
+
+- **`gpt-4o`** - Latest GPT-4 Omni with advanced vision capabilities ✅
+- **`gpt-4o-mini`** - Fast, cost-effective vision model
+- **`gpt-4-turbo`** - Previous generation with vision support
+- **`gpt-4-vision-preview`** - Legacy vision model (deprecated)
+
+### OpenRouter Models
+
 For vision tasks via OpenRouter, these models work reliably:

 - **`openai/gpt-4o-mini`** - Fast, reliable, good balance of cost/quality ✅
@@ -34,6 +124,14 @@ For vision tasks via OpenRouter, these models work reliably:
 - **`anthropic/claude-3.5-sonnet`** - High quality analysis
 - **`openai/gpt-4o`** - Best quality (higher cost)

+### Z.ai Models (Zhipu AI)
+
+For vision tasks via Z.ai, these GLM models are recommended:
+
+- **`glm-4.5v`** - GLM-4.5V multimodal model (64K context window) ✅
+- **`glm-4-plus`** - Text-only model with enhanced capabilities
+- **`glm-4v`** - Previous generation vision model
+
 ## Usage Formats

 ### 1. Single Image (Simple Format)
@@ -68,12 +166,12 @@ response = LlmConductor.generate(

 ### 3. Image with Detail Level

-For high-resolution images, specify the detail level:
+For high-resolution images, specify the detail level (supported by GPT and OpenRouter):

 ```ruby
 response = LlmConductor.generate(
-  model: '
-  vendor: :
+  model: 'gpt-4o',
+  vendor: :openai,
   prompt: {
     text: 'Analyze this image in detail',
     images: [
@@ -83,19 +181,22 @@ response = LlmConductor.generate(
 )
 ```

-Detail levels:
+Detail levels (GPT and OpenRouter only):
 - `'high'` - Better for detailed analysis (uses more tokens)
 - `'low'` - Faster, cheaper (default if not specified)
 - `'auto'` - Let the model decide

+**Note:** Claude (Anthropic) and Z.ai don't support the `detail` parameter.
+
 ### 4. Raw Format (Advanced)

-For maximum control, use
+For maximum control, use provider-specific array formats:

+**GPT/OpenRouter Format:**
 ```ruby
 response = LlmConductor.generate(
-  model: '
-  vendor: :
+  model: 'gpt-4o',
+  vendor: :openai,
   prompt: [
     { type: 'text', text: 'What is in this image?' },
     { type: 'image_url', image_url: { url: 'https://example.com/image.jpg' } },
@@ -104,6 +205,18 @@ response = LlmConductor.generate(
 )
 ```

+**Claude Format:**
+```ruby
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: [
+    { type: 'image', source: { type: 'url', url: 'https://example.com/image.jpg' } },
+    { type: 'text', text: 'What is in this image? Describe it in detail.' }
+  ]
+)
+```
+
 ## Text-Only Requests (Backward Compatible)

 The client still supports regular text-only requests:
@@ -169,11 +282,30 @@ response = LlmConductor.generate(

 ### Run Examples

+For Claude:
+```bash
+export ANTHROPIC_API_KEY='your-key'
+ruby examples/claude_vision_usage.rb
+```
+
+For GPT:
+```bash
+export OPENAI_API_KEY='your-key'
+ruby examples/gpt_vision_usage.rb
+```
+
+For OpenRouter:
 ```bash
 export OPENROUTER_API_KEY='your-key'
 ruby examples/openrouter_vision_usage.rb
 ```

+For Z.ai:
+```bash
+export ZAI_API_KEY='your-key'
+ruby examples/zai_usage.rb
+```
+
 ## Token Counting

 Token counting for multimodal requests counts only the text portion. Image tokens vary by:
@@ -223,11 +355,16 @@ For production:

 ## Examples

-
+- `examples/claude_vision_usage.rb` - Complete Claude vision examples with Claude Sonnet 4
+- `examples/gpt_vision_usage.rb` - Complete GPT vision examples with GPT-4o
+- `examples/openrouter_vision_usage.rb` - Complete OpenRouter vision examples
+- `examples/zai_usage.rb` - Complete Z.ai GLM-4.5V examples including vision and text

 ## Further Reading

 - [OpenRouter Documentation](https://openrouter.ai/docs)
 - [OpenAI Vision API Reference](https://platform.openai.com/docs/guides/vision)
 - [Anthropic Claude Vision](https://docs.anthropic.com/claude/docs/vision)
+- [Z.ai API Platform](https://api.z.ai/)
+- [GLM-4.5V Documentation](https://bigmodel.cn/)
data/examples/claude_vision_usage.rb
ADDED

@@ -0,0 +1,138 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require_relative '../lib/llm_conductor'
+
+# This example demonstrates using Claude Sonnet 4 vision capabilities
+# Set your Anthropic API key: export ANTHROPIC_API_KEY='your-key-here'
+
+puts '=' * 80
+puts 'Claude Sonnet 4 Vision Usage Examples'
+puts '=' * 80
+puts
+
+# Check for API key
+api_key = ENV['ANTHROPIC_API_KEY']
+if api_key.nil? || api_key.empty?
+  puts 'ERROR: ANTHROPIC_API_KEY environment variable is not set!'
+  puts
+  puts 'Please set your Anthropic API key:'
+  puts '  export ANTHROPIC_API_KEY="your-key-here"'
+  puts
+  puts 'You can get an API key from: https://console.anthropic.com/'
+  exit 1
+end
+
+# Configure the client
+LlmConductor.configure do |config|
+  config.anthropic(api_key:)
+end
+
+# Example 1: Single Image Analysis
+puts "\n1. Single Image Analysis"
+puts '-' * 80
+
+begin
+  response = LlmConductor.generate(
+    model: 'claude-sonnet-4-20250514',
+    vendor: :anthropic,
+    prompt: {
+      text: 'What is in this image? Please describe it in detail.',
+      images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+    }
+  )
+
+  puts "Response: #{response.output}"
+  puts "Success: #{response.success?}"
+  puts "Tokens: #{response.input_tokens} input, #{response.output_tokens} output"
+  puts "Metadata: #{response.metadata.inspect}" if response.metadata && !response.metadata.empty?
+rescue StandardError => e
+  puts "ERROR: #{e.message}"
+  puts "Backtrace: #{e.backtrace.first(5).join("\n")}"
+end
+
+# Example 2: Multiple Images Comparison
+puts "\n2. Multiple Images Comparison"
+puts '-' * 80
+
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: {
+    text: 'Compare these two images. What are the main differences?',
+    images: [
+      'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/1024px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg',
+      'https://upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Placeholder_view_vector.svg/1024px-Placeholder_view_vector.svg.png'
+    ]
+  }
+)
+
+puts "Response: #{response.output}"
+puts "Tokens: #{response.input_tokens} input, #{response.output_tokens} output"
+
+# Example 3: Image with Specific Question
+puts "\n3. Image with Specific Question"
+puts '-' * 80
+
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: {
+    text: 'Is there a wooden boardwalk visible in this image? If yes, describe its condition.',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/1024px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+
+puts "Response: #{response.output}"
+puts "Tokens: #{response.input_tokens} input, #{response.output_tokens} output"
+
+# Example 4: Raw Format (Advanced)
+puts "\n4. Raw Format (Advanced)"
+puts '-' * 80
+
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: [
+    { type: 'image',
+      source: { type: 'url',
+                url: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/1024px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg' } },
+    { type: 'text', text: 'Describe the weather conditions in this image.' }
+  ]
+)
+
+puts "Response: #{response.output}"
+puts "Tokens: #{response.input_tokens} input, #{response.output_tokens} output"
+
+# Example 5: Text-Only Request (Backward Compatible)
+puts "\n5. Text-Only Request (Backward Compatible)"
+puts '-' * 80
+
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: 'What is the capital of France?'
+)
+
+puts "Response: #{response.output}"
+puts "Tokens: #{response.input_tokens} input, #{response.output_tokens} output"
+
+# Example 6: Image Analysis with Detailed Instructions
+puts "\n6. Image Analysis with Detailed Instructions"
+puts '-' * 80
+
+response = LlmConductor.generate(
+  model: 'claude-sonnet-4-20250514',
+  vendor: :anthropic,
+  prompt: {
+    text: 'Analyze this image and provide: 1) Main subjects, 2) Colors and lighting, 3) Mood or atmosphere, 4) Any notable details',
+    images: 'https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/1024px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg'
+  }
+)
+
+puts "Response: #{response.output}"
+puts "Tokens: #{response.input_tokens} input, #{response.output_tokens} output"
+
+puts "\n#{'=' * 80}"
+puts 'All examples completed successfully!'
+puts '=' * 80