gepa 0.29.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: 1562683f309fd71c4f1e2c9db9c95b50abae8497756705af2763e4a789a8637b
+   data.tar.gz: 6e5602fe9b36a6b82e6fec5ebb90cbf3b735a677e08b105371024498bbda766b
+ SHA512:
+   metadata.gz: 3d6df6a5d14a1e7a060323190ee58c0eb94590abac048c11f3ebc5bd63430dce7b21d90ef5eab791dd869fefeb008c8456282eade682c9faa64f34b8651ba75d
+   data.tar.gz: a19baacf4ed4b8a4bec473fe86c9d9a200b76ade8fabae5106bf26e5e1470c27fc3ab04ac43c2e4a8d607a5008120ed65d511ac93a7baddc6c15480088504be6
data/LICENSE ADDED
@@ -0,0 +1,45 @@
+ MIT License
+
+ Copyright (c) 2025 Vicente Services SL
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ This project is a Ruby port of the original Python [DSPy library](https://github.com/stanfordnlp/dspy), which is licensed under the MIT License:
+
+ MIT License
+
+ Copyright (c) 2023 Stanford Future Data Systems
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,247 @@
+ # DSPy.rb
+
+ [![Gem Version](https://img.shields.io/gem/v/dspy)](https://rubygems.org/gems/dspy)
+ [![Total Downloads](https://img.shields.io/gem/dt/dspy)](https://rubygems.org/gems/dspy)
+ [![Build Status](https://img.shields.io/github/actions/workflow/status/vicentereig/dspy.rb/ruby.yml?branch=main&label=build)](https://github.com/vicentereig/dspy.rb/actions/workflows/ruby.yml)
+ [![Documentation](https://img.shields.io/badge/docs-vicentereig.github.io%2Fdspy.rb-blue)](https://vicentereig.github.io/dspy.rb/)
+
+ **Build reliable LLM applications in idiomatic Ruby using composable, type-safe modules.**
+
+ The Ruby framework for programming with large language models. DSPy.rb brings structured LLM programming to Ruby developers. Instead of wrestling with prompt strings and parsing responses, you define typed signatures in idiomatic Ruby to compose and decompose AI workflows and agents.
+
+ **Prompts are just functions.** Traditional prompting is like writing code with string concatenation: it works until it doesn't. DSPy.rb brings you
+ the programming approach pioneered by [dspy.ai](https://dspy.ai/): instead of crafting fragile prompts, you define modular
+ signatures and let the framework handle the messy details.
+
+ DSPy.rb is a surgical, idiomatic Ruby port of Stanford's [DSPy framework](https://github.com/stanfordnlp/dspy). While implementing
+ the core concepts of signatures, predictors, and optimization from the original Python library, DSPy.rb embraces Ruby
+ conventions and adds Ruby-specific innovations like CodeAct agents and enhanced production instrumentation.
+
+ The result? LLM applications that actually scale and don't break when you sneeze.
+
+ ## Your First DSPy Program
+
+ ```ruby
+ # Define a signature for sentiment classification
+ class Classify < DSPy::Signature
+   description "Classify sentiment of a given sentence."
+
+   class Sentiment < T::Enum
+     enums do
+       Positive = new('positive')
+       Negative = new('negative')
+       Neutral = new('neutral')
+     end
+   end
+
+   input do
+     const :sentence, String
+   end
+
+   output do
+     const :sentiment, Sentiment
+     const :confidence, Float
+   end
+ end
+
+ # Configure DSPy with your LLM
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('openai/gpt-4o-mini',
+                       api_key: ENV['OPENAI_API_KEY'],
+                       structured_outputs: true) # Enable OpenAI's native JSON mode
+ end
+
+ # Create the predictor and run inference
+ classify = DSPy::Predict.new(Classify)
+ result = classify.call(sentence: "This book was super fun to read!")
+
+ puts result.sentiment  # => #<Sentiment::Positive>
+ puts result.confidence # => 0.85
+ ```
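Conceptually, a predictor asks the LM for JSON matching the signature's output schema, then parses and validates the reply before handing back typed values. A minimal plain-Ruby sketch of that contract (illustrative only, not the gem's internals; the `parse_prediction` helper is hypothetical):

```ruby
require 'json'

# Hypothetical helper mimicking what a predictor must do with an LM reply:
# parse the JSON and check it against the signature's output fields.
def parse_prediction(raw_response, fields)
  data = JSON.parse(raw_response)
  missing = fields.reject { |f| data.key?(f) }
  raise ArgumentError, "missing fields: #{missing.join(', ')}" unless missing.empty?
  data
end

# A well-formed LM reply for a sentiment signature
reply = '{"sentiment": "positive", "confidence": 0.85}'
result = parse_prediction(reply, %w[sentiment confidence])
puts result['sentiment']  # => positive
puts result['confidence'] # => 0.85
```

The real gem goes further (enum coercion, retries, native structured-output modes), but every strategy reduces to this parse-and-validate step.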
+
+ ### Access to 200+ Models Across 5 Providers
+
+ DSPy.rb provides unified access to major LLM providers with provider-specific optimizations:
+
+ ```ruby
+ # OpenAI (GPT-4, GPT-4o, GPT-4o-mini, GPT-5, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('openai/gpt-4o-mini',
+                       api_key: ENV['OPENAI_API_KEY'],
+                       structured_outputs: true) # Native JSON mode
+ end
+
+ # Google Gemini (Gemini 1.5 Pro, Flash, Gemini 2.0, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('gemini/gemini-2.5-flash',
+                       api_key: ENV['GEMINI_API_KEY'],
+                       structured_outputs: true) # Native structured outputs
+ end
+
+ # Anthropic Claude (Claude 3.5, Claude 4, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-5-20250929',
+                       api_key: ENV['ANTHROPIC_API_KEY'],
+                       structured_outputs: true) # Tool-based extraction (default)
+ end
+
+ # Ollama - Run any local model (Llama, Mistral, Gemma, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('ollama/llama3.2') # Free, runs locally, no API key needed
+ end
+
+ # OpenRouter - Access to 200+ models from multiple providers
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('openrouter/deepseek/deepseek-chat-v3.1:free',
+                       api_key: ENV['OPENROUTER_API_KEY'])
+ end
+ ```
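Every model above is addressed with a single `'provider/model'` string. A sketch of how such an ID can be split (an assumption for illustration; `split_model_id` is hypothetical and the gem's own routing may differ):

```ruby
# Split a model ID into provider and model name. The model part may itself
# contain slashes (e.g. OpenRouter IDs), so split only on the first '/'.
def split_model_id(model_id)
  provider, model = model_id.split('/', 2)
  { provider: provider, model: model }
end

parts = split_model_id('openrouter/deepseek/deepseek-chat-v3.1:free')
puts parts[:provider] # => openrouter
puts parts[:model]    # => deepseek/deepseek-chat-v3.1:free
```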
+
+ ## What You Get
+
+ **Core Building Blocks:**
+ - **Signatures** - Define input/output schemas using Sorbet types with T::Enum and union type support
+ - **Predict** - LLM completion with structured data extraction and multimodal support
+ - **Chain of Thought** - Step-by-step reasoning for complex problems with automatic prompt optimization
+ - **ReAct** - Tool-using agents with type-safe tool definitions and error recovery
+ - **CodeAct** - Dynamic code execution agents for programming tasks
+ - **Module Composition** - Combine multiple LLM calls into production-ready workflows
+
+ **Optimization & Evaluation:**
+ - **Prompt Objects** - Manipulate prompts as first-class objects instead of strings
+ - **Typed Examples** - Type-safe training data with automatic validation
+ - **Evaluation Framework** - Advanced metrics beyond simple accuracy with error-resilient pipelines
+ - **MIPROv2 Optimization** - Advanced Bayesian optimization with Gaussian Processes, multiple optimization strategies, auto-config presets, and storage persistence
+
+ **Production Features:**
+ - **Reliable JSON Extraction** - Native structured outputs for OpenAI and Gemini, Anthropic tool-based extraction, and automatic strategy selection with fallback
+ - **Type-Safe Configuration** - Strategy enums with automatic provider optimization (Strict/Compatible modes)
+ - **Smart Retry Logic** - Progressive fallback with exponential backoff for handling transient failures
+ - **Zero-Config Langfuse Integration** - Set env vars and get automatic OpenTelemetry traces in Langfuse
+ - **Performance Caching** - Schema and capability caching for faster repeated operations
+ - **File-based Storage** - Optimization result persistence with versioning
+ - **Structured Logging** - JSON and key=value formats with span tracking
+
+ **Developer Experience:**
+ - LLM provider support using official Ruby clients:
+   - [OpenAI Ruby](https://github.com/openai/openai-ruby) with vision model support
+   - [Anthropic Ruby SDK](https://github.com/anthropics/anthropic-sdk-ruby) with multimodal capabilities
+   - [Google Gemini API](https://ai.google.dev/) with native structured outputs
+   - [Ollama](https://ollama.com/) via OpenAI compatibility layer for local models
+ - **Multimodal Support** - Complete image analysis with DSPy::Image, type-safe bounding boxes, vision-capable models
+ - Runtime type checking with [Sorbet](https://sorbet.org/) including T::Enum and union types
+ - Type-safe tool definitions for ReAct agents
+ - Comprehensive instrumentation and observability
+
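The "Smart Retry Logic" above describes progressive fallback with exponential backoff. A self-contained sketch of that pattern in plain Ruby (illustrative only, assuming nothing about the gem's actual retry code; `with_retries` is a hypothetical helper):

```ruby
# Retry a block up to max_attempts times, doubling the delay each time
# (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises on exhaustion.
def with_retries(max_attempts: 3, base_delay: 0.5)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError
    raise if attempts >= max_attempts
    sleep(base_delay * (2**(attempts - 1)))
    retry
  end
end

calls = 0
value = with_retries(base_delay: 0) do
  calls += 1
  raise 'transient failure' if calls < 3
  :ok
end
puts "succeeded after #{calls} attempts" # => succeeded after 3 attempts
```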
+ ## Development Status
+
+ DSPy.rb is actively developed and approaching stability. The core framework is production-ready with
+ comprehensive documentation, but I'm battle-testing features through the 0.x series before committing
+ to a stable v1.0 API.
+
+ Real-world usage feedback is invaluable - if you encounter issues or have suggestions, please open a GitHub issue!
+
+ ## Documentation
+
+ 📖 **[Complete Documentation Website](https://vicentereig.github.io/dspy.rb/)**
+
+ ### LLM-Friendly Documentation
+
+ For LLMs and AI assistants working with DSPy.rb:
+ - **[llms.txt](https://vicentereig.github.io/dspy.rb/llms.txt)** - Concise reference optimized for LLMs
+ - **[llms-full.txt](https://vicentereig.github.io/dspy.rb/llms-full.txt)** - Comprehensive API documentation
+
+ ### Getting Started
+ - **[Installation & Setup](docs/src/getting-started/installation.md)** - Detailed installation and configuration
+ - **[Quick Start Guide](docs/src/getting-started/quick-start.md)** - Your first DSPy programs
+ - **[Core Concepts](docs/src/getting-started/core-concepts.md)** - Understanding signatures, predictors, and modules
+
+ ### Core Features
+ - **[Signatures & Types](docs/src/core-concepts/signatures.md)** - Define typed interfaces for LLM operations
+ - **[Predictors](docs/src/core-concepts/predictors.md)** - Predict, ChainOfThought, ReAct, and more
+ - **[Modules & Pipelines](docs/src/core-concepts/modules.md)** - Compose complex multi-stage workflows
+ - **[Multimodal Support](docs/src/core-concepts/multimodal.md)** - Image analysis with vision-capable models
+ - **[Examples & Validation](docs/src/core-concepts/examples.md)** - Type-safe training data
+
+ ### Optimization
+ - **[Evaluation Framework](docs/src/optimization/evaluation.md)** - Advanced metrics beyond simple accuracy
+ - **[Prompt Optimization](docs/src/optimization/prompt-optimization.md)** - Manipulate prompts as objects
+ - **[MIPROv2 Optimizer](docs/src/optimization/miprov2.md)** - Advanced Bayesian optimization with Gaussian Processes
+ - **[GEPA Optimizer](docs/src/optimization/gepa.md)** *(beta)* - Reflective mutation with optional reflection LMs
+
+ ### Production Features
+ - **[Storage System](docs/src/production/storage.md)** - Persistence and optimization result storage
+ - **[Observability](docs/src/production/observability.md)** - Zero-config Langfuse integration with a dedicated export worker that never blocks your LLMs
+
+ ### Advanced Usage
+ - **[Complex Types](docs/src/advanced/complex-types.md)** - Sorbet type integration with automatic coercion for structs, enums, and arrays
+ - **[Manual Pipelines](docs/src/advanced/pipelines.md)** - Manual module composition patterns
+ - **[RAG Patterns](docs/src/advanced/rag.md)** - Manual RAG implementation with external services
+ - **[Custom Metrics](docs/src/advanced/custom-metrics.md)** - Proc-based evaluation logic
+
+ ## Quick Start
+
+ ### Installation
+
+ Add to your Gemfile:
+
+ ```ruby
+ gem 'dspy'
+ ```
+
+ Then run:
+
+ ```bash
+ bundle install
+ ```
+
+ ## Recent Achievements
+
+ DSPy.rb has rapidly evolved from experimental to production-ready:
+
+ ### Foundation
+ - ✅ **JSON Parsing Reliability** - Native OpenAI structured outputs, strategy selection, retry logic
+ - ✅ **Type-Safe Strategy Configuration** - Provider-optimized automatic strategy selection
+ - ✅ **Core Module System** - Predict, ChainOfThought, ReAct, CodeAct with type safety
+ - ✅ **Production Observability** - OpenTelemetry, New Relic, and Langfuse integration
+ - ✅ **Advanced Optimization** - MIPROv2 with Bayesian optimization, Gaussian Processes, and multiple strategies
+
+ ### Recent Advances
+ - ✅ **Enhanced Langfuse Integration (v0.25.0)** - Comprehensive OpenTelemetry span reporting with proper input/output, hierarchical nesting, accurate timing, and observation types
+ - ✅ **Comprehensive Multimodal Framework** - Complete image analysis with `DSPy::Image`, type-safe bounding boxes, vision model integration
+ - ✅ **Advanced Type System** - `T::Enum` integration, union types for agentic workflows, complex type coercion
+ - ✅ **Production-Ready Evaluation** - Multi-factor metrics beyond accuracy, error-resilient evaluation pipelines
+ - ✅ **Documentation Ecosystem** - `llms.txt` for AI assistants, ADRs, blog articles, comprehensive examples
+ - ✅ **API Maturation** - Simplified idiomatic patterns, better error handling, production-proven designs
+
+ ## Roadmap - Production Battle-Testing Toward v1.0
+
+ DSPy.rb has transitioned from **feature building** to **production validation**. The core framework is
+ feature-complete and stable - now I'm focusing on real-world usage patterns, performance optimization,
+ and ecosystem integration.
+
+ **Current Focus Areas:**
+
+ ### Production Readiness
+ - 🚧 **Production Patterns** - Real-world usage validation and performance optimization
+ - 🚧 **Ruby Ecosystem Integration** - Rails integration, Sidekiq compatibility, deployment patterns
+ - 🚧 **Scale Testing** - High-volume usage, memory management, connection pooling
+ - 🚧 **Error Recovery** - Robust failure handling patterns for production environments
+
+ ### Ecosystem Expansion
+ - 🚧 **Model Context Protocol (MCP)** - Integration with MCP ecosystem
+ - 🚧 **Additional Provider Support** - Azure OpenAI, local models beyond Ollama
+ - 🚧 **Tool Ecosystem** - Expanded tool integrations for ReAct agents
+
+ ### Community & Adoption
+ - 🚧 **Community Examples** - Real-world applications and case studies
+ - 🚧 **Contributor Experience** - Making it easier to contribute and extend
+ - 🚧 **Performance Benchmarks** - Comparative analysis vs other frameworks
+
+ **v1.0 Philosophy:**
+ v1.0 will be released after extensive production battle-testing, not after checking off features.
+ The API is already stable - v1.0 represents confidence in production reliability backed by real-world validation.
+
+ ## License
+
+ This project is licensed under the MIT License.
data/lib/gepa/api.rb ADDED
@@ -0,0 +1,61 @@
+ # frozen_string_literal: true
+
+ require 'sorbet-runtime'
+
+ require_relative 'core/engine'
+ require_relative 'core/result'
+
+ module GEPA
+   extend T::Sig
+   module_function
+
+   sig do
+     params(
+       seed_candidate: T::Hash[String, String],
+       trainset: T::Array[T.untyped],
+       valset: T::Array[T.untyped],
+       adapter: T.untyped,
+       reflective_proposer: T.untyped,
+       merge_proposer: T.nilable(T.untyped),
+       logger: T.untyped,
+       experiment_tracker: T.untyped,
+       max_metric_calls: Integer,
+       telemetry: T.nilable(T.untyped)
+     ).returns(GEPA::Core::Result)
+   end
+   def optimize(
+     seed_candidate:,
+     trainset:,
+     valset:,
+     adapter:,
+     reflective_proposer:,
+     merge_proposer: nil,
+     logger:,
+     experiment_tracker:,
+     max_metric_calls:,
+     telemetry: nil
+   )
+     evaluator = proc { |dataset, candidate| adapter.evaluate(dataset, candidate) }
+
+     engine = GEPA::Core::Engine.new(
+       run_dir: nil,
+       evaluator: evaluator,
+       valset: valset,
+       seed_candidate: seed_candidate,
+       max_metric_calls: max_metric_calls,
+       perfect_score: Float::INFINITY,
+       seed: 0,
+       reflective_proposer: reflective_proposer,
+       merge_proposer: merge_proposer,
+       logger: logger,
+       experiment_tracker: experiment_tracker,
+       telemetry: telemetry || GEPA::Telemetry,
+       track_best_outputs: false,
+       display_progress_bar: false,
+       raise_on_exception: true
+     )
+
+     state = engine.run
+     GEPA::Core::Result.from_state(state)
+   end
+ end
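`GEPA.optimize` wraps the adapter in an evaluator proc with the contract `(dataset, candidate) -> [outputs, scores]`, where outputs and scores are index-aligned per example. A minimal stub adapter satisfying that contract (illustrative only; `StubAdapter` and its scoring rule are made up for the sketch):

```ruby
# Hypothetical adapter: "scores" each example by whether the candidate's
# instruction text mentions it. Outputs and scores must stay index-aligned,
# because the engine consumes them pairwise.
class StubAdapter
  def evaluate(dataset, candidate)
    outputs = dataset.map { |example| "answer for #{example}" }
    scores  = dataset.map { |example| candidate['instruction'].include?(example) ? 1.0 : 0.0 }
    [outputs, scores]
  end
end

adapter = StubAdapter.new
evaluator = proc { |dataset, candidate| adapter.evaluate(dataset, candidate) }
outputs, scores = evaluator.call(%w[cats dogs], { 'instruction' => 'classify cats' })
puts scores.inspect # => [1.0, 0.0]
```

A real adapter would run the candidate's prompts through an LM and apply a task metric, but the engine only ever sees this two-array shape.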
@@ -0,0 +1,226 @@
+ # frozen_string_literal: true
+
+ require 'sorbet-runtime'
+
+ require_relative 'state'
+ require_relative 'result'
+ require_relative '../telemetry'
+
+ module GEPA
+   module Core
+     class Engine
+       extend T::Sig
+
+       sig do
+         params(
+           evaluator: T.proc.params(dataset: T::Array[T.untyped], candidate: T::Hash[String, String])
+                       .returns([T::Array[T.untyped], T::Array[Float]]),
+           valset: T::Array[T.untyped],
+           seed_candidate: T::Hash[String, String],
+           max_metric_calls: Integer,
+           perfect_score: Float,
+           seed: Integer,
+           reflective_proposer: T.untyped,
+           logger: T.untyped,
+           experiment_tracker: T.untyped,
+           merge_proposer: T.nilable(T.untyped),
+           run_dir: T.nilable(String),
+           track_best_outputs: T::Boolean,
+           display_progress_bar: T::Boolean,
+           telemetry: T.nilable(T.untyped),
+           raise_on_exception: T::Boolean
+         ).void
+       end
+       def initialize(
+         evaluator:,
+         valset:,
+         seed_candidate:,
+         max_metric_calls:,
+         perfect_score:,
+         seed:, # rubocop:disable Lint/UnusedMethodArgument -- kept for parity and future use
+         reflective_proposer:,
+         logger:,
+         experiment_tracker:,
+         merge_proposer: nil,
+         run_dir: nil,
+         track_best_outputs: false,
+         display_progress_bar: false,
+         telemetry: nil,
+         raise_on_exception: true
+       )
+         @run_dir = run_dir
+         @evaluator = evaluator
+         @valset = valset
+         @seed_candidate = seed_candidate
+         @max_metric_calls = max_metric_calls
+         @perfect_score = perfect_score
+         @reflective_proposer = reflective_proposer
+         @merge_proposer = merge_proposer
+         @logger = logger
+         @experiment_tracker = experiment_tracker
+         @track_best_outputs = track_best_outputs
+         @display_progress_bar = display_progress_bar
+         @telemetry = telemetry || GEPA::Telemetry
+         @raise_on_exception = raise_on_exception
+       end
+
+       sig { returns(GEPA::Core::State) }
+       def run
+         with_span('gepa.engine.run', max_metric_calls: @max_metric_calls) do
+           state = GEPA::Core::State.initialize_gepa_state(
+             run_dir: @run_dir,
+             logger: @logger,
+             seed_candidate: @seed_candidate,
+             valset_evaluator: ->(candidate) { full_evaluator(candidate) },
+             track_best_outputs: @track_best_outputs
+           )
+
+           @experiment_tracker.log_metrics({ base_program_full_valset_score: state.program_full_scores_val_set.first }, step: 0)
+
+           if @merge_proposer
+             @merge_proposer.last_iter_found_new_program = false
+           end
+
+           while state.total_num_evals < @max_metric_calls
+             break unless iteration_step(state)
+           end
+
+           state.save(@run_dir)
+           state
+         end
+       end
+
+       private
+
+       sig { params(state: GEPA::Core::State).returns(T::Boolean) }
+       def iteration_step(state)
+         state.i += 1
+         trace_entry = { iteration: state.i }
+         state.full_program_trace << trace_entry
+
+         progress = false
+
+         with_span('gepa.engine.iteration', iteration: state.i) do
+           merge_result = process_merge_iteration(state)
+           case merge_result
+           when :accepted
+             return true
+           when :attempted
+             return false
+           end
+
+           reflective_result = process_reflective_iteration(state)
+           return false if reflective_result == :no_candidate
+           progress = true if reflective_result == :accepted
+         end
+
+         progress
+       rescue StandardError => e
+         @logger.log("Iteration #{state.i}: Exception during optimization: #{e}")
+         @logger.log(e.backtrace&.join("\n"))
+         raise e if @raise_on_exception
+         true
+       end
+
+       sig { params(state: GEPA::Core::State).returns(Symbol) }
+       def process_merge_iteration(state)
+         return :skipped unless @merge_proposer && @merge_proposer.use_merge
+
+         if @merge_proposer.merges_due.positive? && @merge_proposer.last_iter_found_new_program
+           proposal = @merge_proposer.propose(state)
+           @merge_proposer.last_iter_found_new_program = false
+
+           if proposal&.tag == 'merge'
+             parent_sums = Array(proposal.subsample_scores_before).map(&:to_f)
+             new_sum = Array(proposal.subsample_scores_after).map(&:to_f).sum
+
+             if parent_sums.empty?
+               @logger.log("Iteration #{state.i}: Missing parent subscores for merge proposal, skipping")
+               return :handled
+             end
+
+             if new_sum >= parent_sums.max
+               with_span('gepa.engine.full_evaluation', iteration: state.i) do
+                 run_full_evaluation(state, proposal.candidate, proposal.parent_program_ids)
+               end
+               @merge_proposer.merges_due -= 1
+               @merge_proposer.total_merges_tested += 1
+               return :accepted
+             else
+               @logger.log(
+                 "Iteration #{state.i}: Merge subsample score #{new_sum.round(4)} "\
+                 "did not beat parents #{parent_sums.map { |v| v.round(4) }}, skipping"
+               )
+               return :attempted
+             end
+           end
+         end
+
+         @merge_proposer.last_iter_found_new_program = false
+         :skipped
+       end
+
+       sig { params(state: GEPA::Core::State).returns(Symbol) }
+       def process_reflective_iteration(state)
+         proposal = @reflective_proposer.propose(state)
+         unless proposal
+           @logger.log("Iteration #{state.i}: Reflective mutation did not propose a new candidate")
+           return :no_candidate
+         end
+
+         before = Array(proposal.subsample_scores_before).map(&:to_f)
+         after = Array(proposal.subsample_scores_after).map(&:to_f)
+         if after.empty? || after.sum <= before.sum
+           @logger.log("Iteration #{state.i}: New subsample score is not better, skipping")
+           return :skipped
+         end
+
+         with_span('gepa.engine.full_evaluation', iteration: state.i) do
+           run_full_evaluation(state, proposal.candidate, proposal.parent_program_ids)
+         end
+
+         if @merge_proposer&.use_merge
+           @merge_proposer.last_iter_found_new_program = true
+           @merge_proposer.schedule_if_needed
+         end
+
+         :accepted
+       end
+
+       sig do
+         params(state: GEPA::Core::State, new_program: T::Hash[String, String], parents: T::Array[Integer]).void
+       end
+       def run_full_evaluation(state, new_program, parents)
+         outputs, scores = full_evaluator(new_program)
+         avg_score = scores.sum / scores.length.to_f
+
+         state.num_full_ds_evals += 1
+         state.total_num_evals += scores.length
+
+         state.update_state_with_new_program(
+           parents,
+           new_program,
+           avg_score,
+           outputs,
+           scores,
+           @run_dir,
+           state.total_num_evals
+         )
+
+         @experiment_tracker.log_metrics({ new_program_full_score: avg_score }, step: state.i)
+       end
+
+       sig { params(candidate: T::Hash[String, String]).returns([T::Array[T.untyped], T::Array[Float]]) }
+       def full_evaluator(candidate)
+         @evaluator.call(@valset, candidate)
+       end
+
+       sig do
+         params(operation: String, attrs: T::Hash[Symbol, T.untyped], block: T.proc.returns(T.untyped)).returns(T.untyped)
+       end
+       def with_span(operation, attrs = {}, &block)
+         @telemetry.with_span(operation, attrs, &block)
+       end
+     end
+   end
+ end
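Both proposal paths gate the expensive full-valset evaluation on cheap subsample sums: a reflective candidate must strictly beat its parent's subsample total, while a merge must at least match the best parent value. A standalone sketch of those two acceptance checks (illustrative of the engine logic above, extracted for clarity; the helper names are made up):

```ruby
# Reflective rule: accept only if the new subsample total strictly improves
# on the parent's (mirrors `after.sum <= before.sum` being the skip case).
def accept_reflective?(before, after)
  !after.empty? && after.sum > before.sum
end

# Merge rule: accept if the merged subsample total at least matches the
# best parent value (mirrors `new_sum >= parent_sums.max`).
def accept_merge?(parent_sums, new_sum)
  !parent_sums.empty? && new_sum >= parent_sums.max
end

puts accept_reflective?([0.4, 0.5], [0.6, 0.6]) # => true
puts accept_reflective?([0.4, 0.5], [0.4, 0.5]) # => false
puts accept_merge?([1.2, 0.9], 1.2)             # => true
```

Only candidates that pass these gates reach `run_full_evaluation`, which is what consumes the `max_metric_calls` budget.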
@@ -0,0 +1,26 @@
+ # frozen_string_literal: true
+
+ require 'sorbet-runtime'
+
+ module GEPA
+   module Core
+     # Container for evaluating a candidate on a batch.
+     class EvaluationBatch < T::Struct
+       extend T::Sig
+
+       const :outputs, T::Array[T.untyped]
+       const :scores, T::Array[Float]
+       const :trajectories, T.nilable(T::Array[T.untyped])
+
+       sig { override.params(args: T.untyped, kwargs: T.untyped).void }
+       def initialize(*args, **kwargs)
+         super
+         raise ArgumentError, 'outputs and scores length mismatch' unless outputs.length == scores.length
+
+         if trajectories
+           raise ArgumentError, 'trajectories length mismatch' unless trajectories.length == outputs.length
+         end
+       end
+     end
+   end
+ end