dspy-gepa 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+ metadata.gz: e10764f9e3ab0f02357872a5a321c6bc441887f368e7f10f45133d2e8bdc7654
+ data.tar.gz: dc0246817a8c6eef2e077265f455faead3169197b9c9fed8ebee60307533827b
+ SHA512:
+ metadata.gz: 83f3f9b98b8ce979037396652eaa155ea53e31e1bb5ae8c0e813adeb56e2be48072b9c6656f780413b94d59c37553a389f8d3d4d3528e3ab32be090bc8ea1871
+ data.tar.gz: 1988ebec7b7ce2383501e7d8610bda3d13777e73c871a59d290eaf02089b450606271cacfc0707d9e3b58126ca1fb4424efe04c93c8c1a77537d192c84ef33f3
data/LICENSE ADDED
@@ -0,0 +1,45 @@
+ MIT License
+
+ Copyright (c) 2025 Vicente Services SL
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
+ This project is a Ruby port of the original Python [DSPy library](https://github.com/stanfordnlp/dspy), which is licensed under the MIT License:
+
+ MIT License
+
+ Copyright (c) 2023 Stanford Future Data Systems
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,267 @@
+ # DSPy.rb
+
+ [![Gem Version](https://img.shields.io/gem/v/dspy)](https://rubygems.org/gems/dspy)
+ [![Total Downloads](https://img.shields.io/gem/dt/dspy)](https://rubygems.org/gems/dspy)
+ [![Build Status](https://img.shields.io/github/actions/workflow/status/vicentereig/dspy.rb/ruby.yml?branch=main&label=build)](https://github.com/vicentereig/dspy.rb/actions/workflows/ruby.yml)
+ [![Documentation](https://img.shields.io/badge/docs-vicentereig.github.io%2Fdspy.rb-blue)](https://vicentereig.github.io/dspy.rb/)
+
+ > [!NOTE]
+ > The core Prompt Engineering Framework is production-ready with
+ > comprehensive documentation. I am now focusing on educational content about systematic Prompt Optimization and Context Engineering.
+ > Your feedback is invaluable. If you encounter issues, please open an [issue](https://github.com/vicentereig/dspy.rb/issues). If you have suggestions, start a [new thread](https://github.com/vicentereig/dspy.rb/discussions).
+ >
+ > If you want to contribute, feel free to reach out to me to coordinate efforts: hey at vicente.services
+ >
+ > And, yes, this is 100% a legit project. :)
+
+ **Build reliable LLM applications in idiomatic Ruby using composable, type-safe modules.**
+
+ DSPy.rb is the Ruby framework for programming with large language models. It brings structured LLM programming, programmatic Prompt Engineering, and Context Engineering to Ruby developers.
+ Instead of wrestling with prompt strings and parsing responses, you define typed signatures in idiomatic Ruby to compose and decompose AI Workflows and AI Agents.
+
+ **Prompts are just functions.** Traditional prompting is like writing code with string concatenation: it works until it doesn't. DSPy.rb brings you
+ the programming approach pioneered by [dspy.ai](https://dspy.ai/): instead of crafting fragile prompts, you define modular
+ signatures and let the framework handle the messy details.
+
+ DSPy.rb is an idiomatic Ruby port of Stanford's [DSPy framework](https://github.com/stanfordnlp/dspy). While implementing
+ the core concepts of signatures, predictors, and the main optimization algorithms from the original Python library, DSPy.rb embraces Ruby
+ conventions and adds Ruby-specific innovations such as a Sorbet-based type system, ReAct loops, and production-ready integrations like non-blocking OpenTelemetry instrumentation.
+
+ **What do you get?** Ruby LLM applications that actually scale and don't break when you sneeze.
+
+ Check out the [examples](examples/) and take them for a spin!
+
+ ## Your First DSPy Program
+
+ ### Installation
+
+ Add to your Gemfile:
+
+ ```ruby
+ gem 'dspy'
+ ```
+
+ and run:
+
+ ```bash
+ bundle install
+ ```
+
+ ### Optional Sibling Gems
+
+ DSPy.rb ships multiple gems from this monorepo so you only install what you need. Add these alongside `dspy`:
+
+ | Gem | Description | Status |
+ | --- | --- | --- |
+ | `dspy-schema` | Exposes `DSPy::TypeSystem::SorbetJsonSchema` for downstream reuse. | **Stable** (v1.0.0) |
+ | `dspy-code_act` | Think-Code-Observe agents that synthesize and execute Ruby safely. | Preview (0.x) |
+ | `dspy-datasets` | Dataset helpers plus Parquet/Polars tooling for richer evaluation corpora. | Preview (0.x) |
+ | `dspy-evals` | High-throughput evaluation harness with metrics, callbacks, and regression fixtures. | Preview (0.x) |
+ | `dspy-miprov2` | Bayesian optimization + Gaussian Process backend for the MIPROv2 teleprompter. | Preview (0.x) |
+ | `dspy-gepa` | `DSPy::Teleprompt::GEPA`, reflection loops, experiment tracking, telemetry adapters. | Preview (mirrors `dspy` version) |
+ | `gepa` | GEPA optimizer core (Pareto engine, telemetry, reflective proposer). | Preview (mirrors `dspy` version) |
+ | `dspy-o11y` | Core observability APIs: `DSPy::Observability`, async span processor, observation types. | **Stable** (v1.0.0) |
+ | `dspy-o11y-langfuse` | Auto-configures DSPy observability to stream spans to Langfuse via OTLP. | **Stable** (v1.0.0) |
+
+ Set the matching `DSPY_WITH_*` environment variables (see `Gemfile`) to include or exclude each sibling gem when running Bundler locally (for example `DSPY_WITH_GEPA=1` or `DSPY_WITH_O11Y_LANGFUSE=1`). Refer to `docs/core-concepts/dependency-tree.md` for the full dependency map and roadmap.
+
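+ As a sketch of how that gating can look in an application Gemfile (hypothetical; the monorepo's actual `Gemfile` logic may differ):
+
```ruby
# Gemfile (sketch) - opt into sibling gems with DSPY_WITH_* flags.
# Gem names come from the table above; the ENV gating here is illustrative.
source 'https://rubygems.org'

gem 'dspy'
gem 'dspy-gepa'          if ENV['DSPY_WITH_GEPA'] == '1'
gem 'dspy-o11y-langfuse' if ENV['DSPY_WITH_O11Y_LANGFUSE'] == '1'
```
+
+ Then run, for example, `DSPY_WITH_GEPA=1 bundle install`.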
+ ### Your First Reliable Predictor
+
+ ```ruby
+ # Configure DSPy globally to use your fave LLM - you can override this at the instance level.
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('openai/gpt-4o-mini',
+                       api_key: ENV['OPENAI_API_KEY'],
+                       structured_outputs: true) # Enable OpenAI's native JSON mode
+ end
+
+ # Define a signature for sentiment classification - instead of writing a full prompt!
+ class Classify < DSPy::Signature
+   description "Classify sentiment of a given sentence." # sets the goal of the underlying prompt
+
+   class Sentiment < T::Enum
+     enums do
+       Positive = new('positive')
+       Negative = new('negative')
+       Neutral = new('neutral')
+     end
+   end
+
+   # Structured inputs: makes sure you send only valid prompt inputs to your model
+   input do
+     const :sentence, String, description: 'The sentence to analyze'
+   end
+
+   # Structured outputs: your predictor validates the model's output too.
+   output do
+     const :sentiment, Sentiment, description: 'The sentiment of the sentence'
+     const :confidence, Float, description: 'A number between 0.0 and 1.0'
+   end
+ end
+
+ # Wire it to the simplest prompting technique - a Predict.
+ classify = DSPy::Predict.new(Classify)
+ # It may raise an error if you mess up the inputs or your LLM messes up the outputs.
+ result = classify.call(sentence: "This book was super fun to read!")
+
+ puts result.sentiment  # => #<Sentiment::Positive>
+ puts result.confidence # => 0.85
+ ```
110
+
+ ### Access to 200+ Models Across 5 Providers
+
+ DSPy.rb provides unified access to major LLM providers with provider-specific optimizations:
+
+ ```ruby
+ # OpenAI (GPT-4, GPT-4o, GPT-4o-mini, GPT-5, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('openai/gpt-4o-mini',
+                       api_key: ENV['OPENAI_API_KEY'],
+                       structured_outputs: true) # Native JSON mode
+ end
+
+ # Google Gemini (Gemini 1.5 Pro, Flash, Gemini 2.0, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('gemini/gemini-2.5-flash',
+                       api_key: ENV['GEMINI_API_KEY'],
+                       structured_outputs: true) # Native structured outputs
+ end
+
+ # Anthropic Claude (Claude 3.5, Claude 4, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('anthropic/claude-sonnet-4-5-20250929',
+                       api_key: ENV['ANTHROPIC_API_KEY'],
+                       structured_outputs: true) # Tool-based extraction (default)
+ end
+
+ # Ollama - Run any local model (Llama, Mistral, Gemma, etc.)
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('ollama/llama3.2') # Free, runs locally, no API key needed
+ end
+
+ # OpenRouter - Access to 200+ models from multiple providers
+ DSPy.configure do |c|
+   c.lm = DSPy::LM.new('openrouter/deepseek/deepseek-chat-v3.1:free',
+                       api_key: ENV['OPENROUTER_API_KEY'])
+ end
+ ```
148
+
+ ## What You Get
+
+ **Developer Experience:**
+ - LLM provider support using official Ruby clients:
+   - [OpenAI Ruby](https://github.com/openai/openai-ruby) with vision model support
+   - [Anthropic Ruby SDK](https://github.com/anthropics/anthropic-sdk-ruby) with multimodal capabilities
+   - [Google Gemini API](https://ai.google.dev/) with native structured outputs
+   - [Ollama](https://ollama.com/) via OpenAI compatibility layer for local models
+ - **Multimodal Support** - Complete image analysis with DSPy::Image, type-safe bounding boxes, vision-capable models
+ - Runtime type checking with [Sorbet](https://sorbet.org/) including T::Enum and union types
+ - Type-safe tool definitions for ReAct agents
+ - Comprehensive instrumentation and observability
+
+ **Core Building Blocks:**
+ - **Signatures** - Define input/output schemas using Sorbet types with T::Enum and union type support
+ - **Predict** - LLM completion with structured data extraction and multimodal support
+ - **Chain of Thought** - Step-by-step reasoning for complex problems with automatic prompt optimization
+ - **ReAct** - Tool-using agents with type-safe tool definitions and error recovery
+ - **Module Composition** - Combine multiple LLM calls into production-ready workflows
+
+ **Optimization & Evaluation:**
+ - **Prompt Objects** - Manipulate prompts as first-class objects instead of strings
+ - **Typed Examples** - Type-safe training data with automatic validation
+ - **Evaluation Framework** - Advanced metrics beyond simple accuracy with error-resilient pipelines
+ - **MIPROv2 Optimization** - Advanced Bayesian optimization with Gaussian Processes, multiple optimization strategies, auto-config presets, and storage persistence
+
+ **Production Features:**
+ - **Reliable JSON Extraction** - Native structured outputs for OpenAI and Gemini, Anthropic tool-based extraction, and automatic strategy selection with fallback
+ - **Type-Safe Configuration** - Strategy enums with automatic provider optimization (Strict/Compatible modes)
+ - **Smart Retry Logic** - Progressive fallback with exponential backoff for handling transient failures
+ - **Zero-Config Langfuse Integration** - Set env vars and get automatic OpenTelemetry traces in Langfuse
+ - **Performance Caching** - Schema and capability caching for faster repeated operations
+ - **File-based Storage** - Optimization result persistence with versioning
+ - **Structured Logging** - JSON and key=value formats with span tracking
+
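+ The Smart Retry Logic above follows a standard pattern. A minimal, framework-agnostic sketch of exponential backoff (illustrative only, not DSPy.rb's actual internals):
+
```ruby
# Retry a block up to max_attempts times, doubling the delay after each
# failure (base_delay, 2*base_delay, 4*base_delay, ...). Re-raises the
# last error once the attempts are exhausted.
def with_backoff(max_attempts: 3, base_delay: 0.5)
  attempt = 0
  begin
    attempt += 1
    yield
  rescue StandardError
    raise if attempt >= max_attempts
    sleep(base_delay * (2**(attempt - 1)))
    retry
  end
end
```
+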
+ ## Recent Achievements
+
+ DSPy.rb has rapidly evolved from experimental to production-ready:
+
+ ### Foundation
+ - ✅ **JSON Parsing Reliability** - Native OpenAI structured outputs with adaptive retry logic and schema-aware fallbacks
+ - ✅ **Type-Safe Strategy Configuration** - Provider-optimized strategy selection and enum-backed optimizer presets
+ - ✅ **Core Module System** - Predict, ChainOfThought, ReAct with type safety (add `dspy-code_act` for Think-Code-Observe agents)
+ - ✅ **Production Observability** - OpenTelemetry, New Relic, and Langfuse integration
+ - ✅ **Advanced Optimization** - MIPROv2 with Bayesian optimization, Gaussian Processes, and multi-mode search
+
+ ### Recent Advances
+ - ✅ **MIPROv2 ADE Integrity (v0.29.1)** - Stratified train/val/test splits, honest precision accounting, and enum-driven `--auto` presets with integration coverage
+ - ✅ **Instruction Deduplication (v0.29.1)** - Candidate generation now filters repeated programs so optimization logs highlight unique strategies
+ - ✅ **GEPA Teleprompter (v0.29.0)** - Genetic-Pareto reflective prompt evolution with merge proposer scheduling, reflective mutation, and ADE demo parity
+ - ✅ **Optimizer Utilities Parity (v0.29.0)** - Bootstrap strategies, dataset summaries, and Layer 3 utilities unlock multi-predictor programs on Ruby
+ - ✅ **Observability Hardening (v0.29.0)** - OTLP exporter runs on a single-thread executor, preventing frozen SSL contexts without blocking spans
+ - ✅ **Documentation Refresh (v0.29.x)** - New GEPA guide plus ADE optimization docs covering presets, stratified splits, and error-handling defaults
+
+ **Current Focus Areas:**
+
+ ### Production Readiness
+ - 🚧 **Production Patterns** - Real-world usage validation and performance optimization
+ - 🚧 **Ruby Ecosystem Integration** - Rails integration, Sidekiq compatibility, deployment patterns
+
+ ### Community & Adoption
+ - 🚧 **Community Examples** - Real-world applications and case studies
+ - 🚧 **Contributor Experience** - Making it easier to contribute and extend
+ - 🚧 **Performance Benchmarks** - Comparative analysis vs other frameworks
+
+ **v1.0 Philosophy:**
+ v1.0 will be released after extensive production battle-testing, not after checking off features.
+ The API is already stable - v1.0 represents confidence in production reliability backed by real-world validation.
+
+ ## Documentation
+
+ 📖 **[Complete Documentation Website](https://vicentereig.github.io/dspy.rb/)**
+
+ ### LLM-Friendly Documentation
+
+ For LLMs and AI assistants working with DSPy.rb:
+ - **[llms.txt](https://vicentereig.github.io/dspy.rb/llms.txt)** - Concise reference optimized for LLMs
+ - **[llms-full.txt](https://vicentereig.github.io/dspy.rb/llms-full.txt)** - Comprehensive API documentation
+
+ ### Getting Started
+ - **[Installation & Setup](docs/src/getting-started/installation.md)** - Detailed installation and configuration
+ - **[Quick Start Guide](docs/src/getting-started/quick-start.md)** - Your first DSPy programs
+ - **[Core Concepts](docs/src/getting-started/core-concepts.md)** - Understanding signatures, predictors, and modules
+
+ ### Prompt Engineering
+ - **[Signatures & Types](docs/src/core-concepts/signatures.md)** - Define typed interfaces for LLM operations
+ - **[Predictors](docs/src/core-concepts/predictors.md)** - Predict, ChainOfThought, ReAct, and more
+ - **[Modules & Pipelines](docs/src/core-concepts/modules.md)** - Compose complex multi-stage workflows
+ - **[Multimodal Support](docs/src/core-concepts/multimodal.md)** - Image analysis with vision-capable models
+ - **[Examples & Validation](docs/src/core-concepts/examples.md)** - Type-safe training data
+ - **[Rich Types](docs/src/advanced/complex-types.md)** - Sorbet type integration with automatic coercion for structs, enums, and arrays
+ - **[Composable Pipelines](docs/src/advanced/pipelines.md)** - Manual module composition patterns
+
+ ### Prompt Optimization
+ - **[Evaluation Framework](docs/src/optimization/evaluation.md)** - Advanced metrics beyond simple accuracy
+ - **[Prompt Optimization](docs/src/optimization/prompt-optimization.md)** - Manipulate prompts as objects
+ - **[MIPROv2 Optimizer](docs/src/optimization/miprov2.md)** - Advanced Bayesian optimization with Gaussian Processes
+ - **[GEPA Optimizer](docs/src/optimization/gepa.md)** *(beta)* - Reflective mutation with optional reflection LMs
+
+ ### Context Engineering
+ - **[Tools](docs/src/core-concepts/toolsets.md)** - Tool-wielding agents
+ - **[Agentic Memory](docs/src/core-concepts/memory.md)** - Memory tools & agentic loops
+ - **[RAG Patterns](docs/src/advanced/rag.md)** - Manual RAG implementation with external services
+
+ ### Production Features
+ - **[Observability](docs/src/production/observability.md)** - Zero-config Langfuse integration with a dedicated export worker that never blocks your LLMs
+ - **[Storage System](docs/src/production/storage.md)** - Persistence and optimization result storage
+ - **[Custom Metrics](docs/src/advanced/custom-metrics.md)** - Proc-based evaluation logic
+
+ ## License
+
+ This project is licensed under the MIT License.
@@ -0,0 +1,531 @@
+ # frozen_string_literal: true
+
+ require 'logger'
+ require 'set'
+ require 'sorbet-runtime'
+ require 'dspy/teleprompt/teleprompter'
+ require 'dspy/teleprompt/utils'
+ require 'dspy/teleprompt/instruction_updates'
+ require 'gepa'
+
+ module DSPy
+   module Teleprompt
+     class GEPA < Teleprompter
+       extend T::Sig
+
+       DEFAULT_CONFIG = {
+         max_metric_calls: 32,
+         minibatch_size: 2,
+         perfect_score: 1.0,
+         skip_perfect_score: true,
+         use_merge: true,
+         max_merge_invocations: 5
+       }.freeze
+
+       def self.configure
+         yield(default_config) if block_given?
+       end
+
+       def self.default_config
+         @default_config ||= DEFAULT_CONFIG.dup
+       end
+
+       class NullExperimentTracker
+         extend T::Sig
+
+         attr_reader :events
+
+         def initialize
+           @events = []
+         end
+
+         sig { params(metrics: T::Hash[Symbol, T.untyped], step: T.nilable(Integer)).void }
+         def log_metrics(metrics, step: nil)
+           @events << { metrics: metrics, step: step }
+         end
+       end
+
+       class NullLogger
+         extend T::Sig
+
+         attr_reader :messages
+
+         def initialize
+           @messages = []
+         end
+
+         sig { params(message: String).void }
+         def log(message)
+           @messages << message
+           DSPy.log('gepa.log', message: message)
+         end
+       end
+
+       class PredictAdapter
+         extend T::Sig
+
+         ReflectionLMType = T.type_alias do
+           T.any(DSPy::ReflectionLM, T.proc.params(arg0: String).returns(String))
+         end
+
+         FeedbackFnType = T.type_alias do
+           T.proc.params(
+             predictor_output: T.untyped,
+             predictor_inputs: T::Hash[T.any(String, Symbol), T.untyped],
+             module_inputs: DSPy::Example,
+             module_outputs: T.untyped,
+             captured_trace: T::Array[T::Hash[Symbol, T.untyped]]
+           ).returns(T.untyped)
+         end
+
+         sig do
+           params(
+             student: DSPy::Module,
+             metric: T.proc.params(arg0: DSPy::Example, arg1: T.untyped).returns(T.untyped),
+             reflection_lm: T.nilable(ReflectionLMType),
+             feedback_map: T::Hash[String, FeedbackFnType]
+           ).void
+         end
+         def initialize(student, metric, reflection_lm: nil, feedback_map: {})
+           @student = student
+           @metric = metric
+           @reflection_lm = reflection_lm
+           @feedback_map = feedback_map.transform_keys(&:to_s)
+
+           @predictor_entries = resolve_predictors(@student)
+           @predictor_names = @predictor_entries.map(&:first)
+         end
+
+         sig { returns(T::Hash[String, String]) }
+         def seed_candidate
+           @predictor_entries.each_with_object({}) do |(name, predictor), memo|
+             memo[name] = extract_instruction(predictor)
+           end
+         end
+
+         sig do
+           params(candidate: T::Hash[String, String], recorder: T.nilable(T.untyped)).returns(DSPy::Module)
+         end
+         def build_program(candidate, recorder: nil)
+           program = clone_module(@student)
+           duplicate_predictors!(program)
+
+           predictor_map = resolve_predictors(program).to_h
+           candidate.each do |name, new_instruction|
+             predictor = predictor_map[name]
+             next unless predictor
+
+             program, updated = InstructionUpdates.apply_instruction(program, predictor, new_instruction)
+
+             predictor_map[name] = updated
+           end
+
+           wrap_predictors_for_tracing!(program, recorder: recorder) if recorder
+           program
+         end
+
+         sig do
+           params(
+             batch: T::Array[DSPy::Example],
+             candidate: T::Hash[String, String],
+             capture_traces: T::Boolean
+           ).returns(::GEPA::Core::EvaluationBatch)
+         end
+         def evaluate(batch, candidate, capture_traces: false)
+           recorder = capture_traces ? TraceRecorder.new : nil
+           program = build_program(candidate, recorder: recorder)
+
+           if capture_traces
+             trajectories = batch.map do |example|
+               recorder&.start_example
+               prediction = program.call(**example.input_values)
+               result = @metric.call(example, prediction)
+               score, feedback = extract_score_and_feedback(result)
+               trace_entries = recorder ? recorder.finish_example : []
+
+               {
+                 example: example,
+                 prediction: prediction,
+                 score: score,
+                 feedback: feedback,
+                 trace: trace_entries
+               }
+             end
+
+             scores = trajectories.map { |row| row[:score] }
+             outputs = trajectories.map { |row| row[:prediction] }
+             ::GEPA::Core::EvaluationBatch.new(outputs: outputs, scores: scores, trajectories: trajectories)
+           else
+             evaluator = DSPy::Evals.new(program, metric: nil, num_threads: nil, max_errors: batch.length * 100, provide_traceback: false)
+             results = batch.map do |example|
+               prediction = program.call(**example.input_values)
+               result = @metric.call(example, prediction)
+               score, = extract_score_and_feedback(result)
+               [prediction, score]
+             end
+             outputs = results.map(&:first)
+             scores = results.map(&:last)
+             ::GEPA::Core::EvaluationBatch.new(outputs: outputs, scores: scores, trajectories: nil)
+           end
+         end
+
+         sig do
+           params(
+             candidate: T::Hash[String, String],
+             eval_batch: ::GEPA::Core::EvaluationBatch,
+             components_to_update: T::Array[String]
+           ).returns(T::Hash[String, T::Array[T::Hash[String, T.untyped]]])
+         end
+         def make_reflective_dataset(candidate, eval_batch, components_to_update)
+           return {} unless eval_batch.trajectories
+
+           components_to_update.each_with_object({}) do |component, memo|
+             rows = eval_batch.trajectories.flat_map do |trajectory|
+               example = trajectory[:example]
+               expected = serialize_struct(example.expected)
+               actual_program_output = serialize_prediction(trajectory[:prediction])
+               diff = build_diff(expected, actual_program_output)
+               default_feedback = trajectory[:feedback] || "Score: #{trajectory[:score]}"
+               default_score = trajectory[:score]
+               full_trace = Array(trajectory[:trace])
+
+               full_trace.filter_map do |entry|
+                 next unless entry[:predictor_name] == component
+
+                 raw_inputs = entry[:inputs] || {}
+                 raw_output = entry[:output]
+                 inputs = serialize_struct(raw_inputs)
+                 outputs = serialize_prediction(raw_output)
+
+                 feedback_text = default_feedback
+                 score_value = default_score
+                 score_overridden = false
+
+                 if (feedback_fn = @feedback_map[component])
+                   feedback_result = feedback_fn.call(
+                     predictor_output: raw_output,
+                     predictor_inputs: raw_inputs,
+                     module_inputs: example,
+                     module_outputs: trajectory[:prediction],
+                     captured_trace: full_trace
+                   )
+                   override_score, override_feedback = extract_score_and_feedback(feedback_result)
+                   feedback_text = override_feedback if override_feedback
+                   unless override_score.nil?
+                     score_value = override_score
+                     score_overridden = true
+                   end
+                 end
+
+                 row = {
+                   'Inputs' => inputs,
+                   'Expected' => expected,
+                   'Generated Outputs' => outputs,
+                   'Diff' => diff,
+                   'Feedback' => feedback_text
+                 }
+                 row['Score'] = score_value if score_overridden
+                 row
+               end
+             end
+             memo[component] = rows unless rows.empty?
+           end
+         end
+
+         sig do
+           params(
+             candidate: T::Hash[String, String],
+             reflective_dataset: T::Hash[String, T::Array[T::Hash[String, T.untyped]]],
+             components_to_update: T::Array[String]
+           ).returns(T::Hash[String, String])
+         end
+         def propose_new_texts(candidate, reflective_dataset, components_to_update)
+           if @reflection_lm
+             components_to_update.to_h do |name|
+               response = ::GEPA::Strategies::InstructionProposalSignature.run(
+                 @reflection_lm,
+                 {
+                   'current_instruction_doc' => candidate[name],
+                   'dataset_with_feedback' => reflective_dataset.fetch(name, [])
+                 }
+               )
+               [name, response.fetch('new_instruction')]
+             end
+           else
+             components_to_update.to_h do |name|
+               [name, "#{candidate[name]} improved"]
+             end
+           end
+         end
257
+
258
+ private
259
+
260
+ sig { params(program: DSPy::Module).returns(T::Array[[String, DSPy::Module]]) }
261
+ def resolve_predictors(program)
262
+ pairs = program.named_predictors
263
+ pairs = [['self', program]] if pairs.empty?
264
+ pairs
265
+ end
266
+
267
+ sig { params(mod: DSPy::Module).returns(DSPy::Module) }
268
+ def clone_module(mod)
269
+ safe_clone(mod)
270
+ end
271
+
272
+ sig { params(program: DSPy::Module).void }
273
+ def duplicate_predictors!(program)
274
+ resolve_predictors(program).each do |name, predictor|
275
+ next unless @predictor_names.include?(name)
276
+ next if predictor.equal?(program)
277
+ clone = safe_clone(predictor)
278
+ InstructionUpdates.replace_reference(program, predictor, clone)
279
+ end
280
+ end
281
+
282
+ sig { params(program: DSPy::Module, recorder: T.nilable(T.untyped)).void }
283
+ def wrap_predictors_for_tracing!(program, recorder: nil)
284
+ return unless recorder
285
+
286
+ resolve_predictors(program).each do |name, predictor|
287
+ wrap_predictor_for_tracing(program, predictor, name, recorder)
288
+ end
289
+ end
290
+
291
+ sig { params(program: DSPy::Module, predictor: DSPy::Module, name: String, recorder: T.untyped).void }
292
+ def wrap_predictor_for_tracing(program, predictor, name, recorder)
293
+ original_forward = predictor.method(:forward_untyped)
294
+ recorder_ref = recorder
295
+ predictor_name = name
296
+
297
+ predictor.define_singleton_method(:forward_untyped) do |**input_values|
298
+ result = original_forward.call(**input_values)
299
+ recorder_ref.record(
300
+ predictor_name: predictor_name,
301
+ inputs: input_values.dup,
302
+ output: result
303
+ )
304
+ result
305
+ end
306
+ end
307
+
308
+ # instruction update helpers handled by InstructionUpdates
309
+
310
+ sig { params(object: T.untyped).returns(T.untyped) }
311
+ def safe_clone(object)
312
+ object.clone
313
+ rescue TypeError
314
+ object.dup
315
+ end
316
+
317
+ class TraceRecorder
318
+ def initialize
319
+ @current_trace = nil
320
+ end
321
+
322
+ def start_example
323
+ @current_trace = []
324
+ end
325
+
326
+ def record(entry)
327
+ return unless @current_trace
328
+ @current_trace << entry
329
+ end
330
+
331
+ def finish_example
332
+ trace = @current_trace || []
333
+ @current_trace = nil
334
+ trace
335
+ end
336
+ end
337
+
338
+ sig { params(program: DSPy::Module).returns(String) }
339
+ def extract_instruction(program)
340
+ if program.respond_to?(:prompt) && program.prompt.respond_to?(:instruction)
341
+ program.prompt.instruction
342
+ elsif program.respond_to?(:instruction)
343
+ program.instruction
344
+ else
345
+ raise ArgumentError, "Program must expose prompt.instruction or #instruction"
346
+ end
347
+ end
348
+
349
+ sig { params(struct: T.untyped).returns(T::Hash[Symbol, T.untyped]) }
350
+ def serialize_struct(struct)
351
+ if struct.respond_to?(:to_h)
352
+ struct.to_h
353
+ elsif struct.instance_variables.any?
354
+ struct.instance_variables.each_with_object({}) do |ivar, memo|
355
+ key = ivar.to_s.delete_prefix('@').to_sym
+ memo[key] = struct.instance_variable_get(ivar)
+ end
+ else
+ {}
+ end
+ end
+
+ sig { params(prediction: T.untyped).returns(T::Hash[Symbol, T.untyped]) }
+ def serialize_prediction(prediction)
+ case prediction
+ when DSPy::Prediction
+ prediction.to_h
+ when Hash
+ prediction
+ else
+ serialize_struct(prediction)
+ end
+ end
+
+ sig { params(expected: T::Hash[Symbol, T.untyped], actual: T::Hash[Symbol, T.untyped]).returns(T::Hash[Symbol, T.untyped]) }
+ def build_diff(expected, actual)
+ keys = expected.keys | actual.keys
+ keys.each_with_object({}) do |key, memo|
+ exp = expected[key]
+ act = actual[key]
+ next if exp == act
+
+ memo[key] = { expected: exp, actual: act }
+ end
+ end
+
+ sig { params(result: T.untyped).returns([Float, T.nilable(String)]) }
+ def extract_score_and_feedback(result)
+ case result
+ when DSPy::Prediction
+ score = result.respond_to?(:score) ? result.score : 0.0
+ feedback = result.respond_to?(:feedback) ? result.feedback : nil
+ [score.to_f, feedback]
+ when Hash
+ [result[:score].to_f, result[:feedback]]
+ else
+ [result.to_f, nil]
+ end
+ end
+ end
+
+ sig do
+ params(
+ metric: T.proc.params(arg0: DSPy::Example, arg1: T.untyped).returns(T.untyped),
+ reflection_lm: T.nilable(T.untyped),
+ feedback_map: T.nilable(T::Hash[String, PredictAdapter::FeedbackFnType]),
+ adapter_builder: T.nilable(T.proc.returns(T.untyped)),
+ config: T.nilable(T::Hash[Symbol, T.untyped]),
+ experiment_tracker: T.nilable(T.untyped)
+ ).void
+ end
+ def initialize(metric:, reflection_lm: nil, feedback_map: nil, adapter_builder: nil, config: nil, experiment_tracker: nil)
+ super(metric: metric)
+ @metric = metric
+ @reflection_lm = reflection_lm
+ @feedback_map = (feedback_map || {}).transform_keys(&:to_s)
+ @adapter_builder = adapter_builder || method(:build_adapter)
+ @gepa_config = self.class.default_config.merge(config || {})
+ @experiment_tracker = experiment_tracker
+ end
+
+ sig do
+ override.params(
+ program: DSPy::Module,
+ trainset: T::Array[T.untyped],
+ valset: T.nilable(T::Array[T.untyped])
+ ).returns(OptimizationResult)
+ end
+ def compile(program, trainset:, valset: nil)
+ validate_inputs(program, trainset, valset)
+
+ typed_trainset = ensure_typed_examples(trainset)
+ typed_valset = valset ? ensure_typed_examples(valset) : typed_trainset
+
+ adapter = @adapter_builder.call(
+ program,
+ @metric,
+ reflection_lm: @reflection_lm,
+ feedback_map: @feedback_map
+ )
+ seed_candidate = adapter.seed_candidate
+
+ cand_selector = ::GEPA::Strategies::ParetoCandidateSelector.new
+ comp_selector = ::GEPA::Strategies::RoundRobinReflectionComponentSelector.new
+ batch_sampler = ::GEPA::Strategies::EpochShuffledBatchSampler.new([@gepa_config[:minibatch_size], typed_trainset.size].min)
+
+ telemetry_context = ::GEPA::Telemetry.build_context
+
+ logger = ::GEPA::Logging::BufferingLogger.new
+ tracker = @experiment_tracker || ::GEPA::Logging::ExperimentTracker.new
+
+ reflective = ::GEPA::Proposer::ReflectiveMutationProposer.new(
+ logger: logger,
+ trainset: typed_trainset,
+ adapter: adapter,
+ candidate_selector: cand_selector,
+ module_selector: comp_selector,
+ batch_sampler: batch_sampler,
+ perfect_score: @gepa_config[:perfect_score],
+ skip_perfect_score: @gepa_config[:skip_perfect_score],
+ experiment_tracker: tracker,
+ reflection_lm: nil,
+ telemetry: telemetry_context
+ )
+
+ evaluator = lambda do |dataset, candidate|
+ batch = adapter.evaluate(dataset, candidate, capture_traces: false)
+ [batch.outputs, batch.scores]
+ end
+
+ merge_proposer = nil
+ if @gepa_config[:use_merge]
+ merge_proposer = ::GEPA::Proposer::MergeProposer.new(
+ logger: logger,
+ valset: typed_valset,
+ evaluator: evaluator,
+ use_merge: true,
+ max_merge_invocations: @gepa_config[:max_merge_invocations],
+ rng: Random.new(0),
+ telemetry: telemetry_context
+ )
+ end
+
+ engine = ::GEPA::Core::Engine.new(
+ evaluator: evaluator,
+ valset: typed_valset,
+ seed_candidate: seed_candidate,
+ max_metric_calls: @gepa_config[:max_metric_calls],
+ perfect_score: @gepa_config[:perfect_score],
+ seed: 0,
+ reflective_proposer: reflective,
+ logger: logger,
+ experiment_tracker: tracker,
+ merge_proposer: merge_proposer,
+ run_dir: nil,
+ track_best_outputs: false,
+ display_progress_bar: false,
+ telemetry: telemetry_context,
+ raise_on_exception: true
+ )
+
+ state = engine.run
+ result = ::GEPA::Core::Result.from_state(state)
+ best_program = adapter.build_program(result.best_candidate)
+
+ OptimizationResult.new(
+ optimized_program: best_program,
+ scores: { best: result.val_aggregate_scores[result.best_idx] },
+ history: { total_candidates: result.num_candidates },
+ best_score_name: 'best',
+ best_score_value: result.val_aggregate_scores[result.best_idx],
+ metadata: { candidates: result.num_candidates }
+ )
+ end
+
+ private
+
+ sig do
+ params(
+ program: DSPy::Module,
+ metric: T.proc.params(arg0: DSPy::Example, arg1: T.untyped).returns(T.untyped),
+ reflection_lm: T.nilable(T.untyped),
+ feedback_map: T::Hash[String, PredictAdapter::FeedbackFnType]
+ ).returns(PredictAdapter)
+ end
+ def build_adapter(program, metric, reflection_lm: nil, feedback_map: {})
+ PredictAdapter.new(program, metric, reflection_lm: reflection_lm, feedback_map: feedback_map)
+ end
+ end
+ end
+ end
data/lib/dspy/gepa/version.rb ADDED
@@ -0,0 +1,10 @@
+ # typed: strict
+ # frozen_string_literal: true
+
+ require_relative '../version'
+
+ module DSPy
+ module GEPA
+ VERSION = '1.0.0'
+ end
+ end
data/lib/dspy/gepa.rb ADDED
@@ -0,0 +1,5 @@
+ # typed: strict
+ # frozen_string_literal: true
+
+ require_relative 'gepa/version'
+ require_relative 'gepa/teleprompt'
metadata ADDED
@@ -0,0 +1,78 @@
+ --- !ruby/object:Gem::Specification
+ name: dspy-gepa
+ version: !ruby/object:Gem::Version
+ version: 1.0.0
+ platform: ruby
+ authors:
+ - Vicente Reig Rincón de Arellano
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2025-10-25 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+ name: dspy
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.30.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 0.30.0
+ - !ruby/object:Gem::Dependency
+ name: gepa
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 1.0.0
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '='
+ - !ruby/object:Gem::Version
+ version: 1.0.0
+ description: Ships DSPy::Teleprompt::GEPA plus reflective adapters, experiment tracking,
+ and telemetry hooks built on top of the GEPA optimizer core gem.
+ email:
+ - hey@vicente.services
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - LICENSE
+ - README.md
+ - lib/dspy/gepa.rb
+ - lib/dspy/gepa/teleprompt.rb
+ - lib/dspy/gepa/version.rb
+ homepage: https://github.com/vicentereig/dspy.rb
+ licenses:
+ - MIT
+ metadata:
+ github_repo: git@github.com:vicentereig/dspy.rb
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 3.3.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ requirements: []
+ rubygems_version: 3.0.3.1
+ signing_key:
+ specification_version: 4
+ summary: GEPA teleprompter integration for DSPy.rb.
+ test_files: []
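
Two of the helpers added in `lib/dspy/gepa/teleprompt.rb` above, `build_diff` and `extract_score_and_feedback`, are pure Ruby with no gem dependencies, so their behavior can be sketched standalone. This re-implementation drops the Sorbet `sig`s and the `DSPy::Prediction` branch (which would require the `dspy` dependency) but otherwise mirrors the diffed code:

```ruby
# Symmetric diff of two hashes: for every key present on either side,
# record an { expected:, actual: } pair when the values differ.
def build_diff(expected, actual)
  keys = expected.keys | actual.keys
  keys.each_with_object({}) do |key, memo|
    exp = expected[key]
    act = actual[key]
    next if exp == act

    memo[key] = { expected: exp, actual: act }
  end
end

# Normalize a metric result into a [score, feedback] pair; a bare
# numeric result carries no feedback. (The DSPy::Prediction branch
# from the gem is omitted in this dependency-free sketch.)
def extract_score_and_feedback(result)
  case result
  when Hash
    [result[:score].to_f, result[:feedback]]
  else
    [result.to_f, nil]
  end
end

diff = build_diff({ answer: '4', unit: 'm' }, { answer: '5', unit: 'm' })
# diff == { answer: { expected: '4', actual: '5' } }

score, feedback = extract_score_and_feedback({ score: 0.75, feedback: 'close' })
# score == 0.75, feedback == 'close'
```

Because `build_diff` unions the key sets, a key missing from one hash still shows up in the diff (paired with `nil`), which is what lets the optimizer's feedback surface fields the program failed to produce at all.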