dspy-evals 0.29.1 → 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: f39ede0bd93df0362c4cf8205ed8c5340cd52100cb0ba83a650f39583e496d76
- data.tar.gz: 96eafcbb25a32b13d4c5b18e1685e7867f71e97fb59ae2aaa20a7aa4940d0db7
+ metadata.gz: c7ed57712b59a1618c77f2004ca95c8725446aa6283ce2481d47a8a1f0cf88a4
+ data.tar.gz: 82d9029b0b3c4f43ae36d522b12a7c6b1a5fd0f28cf1d588f294127c3e19101c
  SHA512:
- metadata.gz: 8f2f94c8cc7f3660a4083a04cf6e4720d473baab59bc3dad06671068c8003731bd256859438ba13f575815c09c098fca4e24bd6b322ac9b07ad0d6c196e3ec1e
- data.tar.gz: 863806288464a5859e8b9ee04b8ac19def6820ce31a88dfe5fc7b32605a638cb509f67d25b5dfd96641a09cac44627f4d42531489118d0c41c4ca597a6895a57
+ metadata.gz: c79448ff6fce8c0a1b57531e9482e3eaf4278e59c733f5f1f4dca2e066e7eedcb85e52bdb541a68cfc4ebbb8c6d66564de9350e97f18aa58c61776e32c31c46d
+ data.tar.gz: 00bf74ae3905981e68777ef0e42214e75b0e70d16edd3ea3da0147963ef7ac68b7bc5bed169cdcb213f781966293f3488d62c715e13a08d922ea2d59e671eb5a
data/README.md CHANGED
@@ -4,6 +4,7 @@
  [![Total Downloads](https://img.shields.io/gem/dt/dspy)](https://rubygems.org/gems/dspy)
  [![Build Status](https://img.shields.io/github/actions/workflow/status/vicentereig/dspy.rb/ruby.yml?branch=main&label=build)](https://github.com/vicentereig/dspy.rb/actions/workflows/ruby.yml)
  [![Documentation](https://img.shields.io/badge/docs-vicentereig.github.io%2Fdspy.rb-blue)](https://vicentereig.github.io/dspy.rb/)
+ [![Discord](https://img.shields.io/discord/1161519468141355160?label=discord&logo=discord&logoColor=white)](https://discord.gg/zWBhrMqn)
 
  > [!NOTE]
  > The core Prompt Engineering Framework is production-ready with
@@ -12,23 +13,16 @@
  >
  > If you want to contribute, feel free to reach out to me to coordinate efforts: hey at vicente.services
  >
- > And, yes, this is 100% a legit project. :)
-
 
  **Build reliable LLM applications in idiomatic Ruby using composable, type-safe modules.**
 
- The Ruby framework for programming with large language models. DSPy.rb brings structured LLM programming to Ruby developers, programmatic Prompt Engineering and Context Engineering.
- Instead of wrestling with prompt strings and parsing responses, you define typed signatures using idiomatic Ruby to compose and decompose AI Worklows and AI Agents.
+ DSPy.rb is the Ruby-first surgical port of Stanford's [DSPy framework](https://github.com/stanfordnlp/dspy). It delivers structured LLM programming, prompt engineering, and context engineering in the language we love. Instead of wrestling with brittle prompt strings, you define typed signatures in idiomatic Ruby and compose workflows and agents that actually behave.
 
- **Prompts are the just Functions.** Traditional prompting is like writing code with string concatenation: it works until it doesn't. DSPy.rb brings you
- the programming approach pioneered by [dspy.ai](https://dspy.ai/): instead of crafting fragile prompts, you define modular
- signatures and let the framework handle the messy details.
+ **Prompts are just functions.** Traditional prompting is like writing code with string concatenation: it works until it doesn't. DSPy.rb brings you the programming approach pioneered by [dspy.ai](https://dspy.ai/): define modular signatures and let the framework deal with the messy bits.
 
- DSPy.rb is an idiomatic Ruby surgical port of Stanford's [DSPy framework](https://github.com/stanfordnlp/dspy). While implementing
- the core concepts of signatures, predictors, and the main optimization algorithms from the original Python library, DSPy.rb embraces Ruby
- conventions and adds Ruby-specific innovations like Sorbet-base Typed system, ReAct loops, and production-ready integrations like non-blocking Open Telemetry Instrumentation.
+ While we implement the same signatures, predictors, and optimization algorithms as the original library, DSPy.rb leans hard into Ruby conventions with Sorbet-based typing, ReAct loops, and production-ready integrations like non-blocking OpenTelemetry instrumentation.
 
- **What you get?** Ruby LLM applications that actually scale and don't break when you sneeze.
+ **What you get?** Ruby LLM applications that scale and don't break when you sneeze.
 
  Check the [examples](examples/) and take them for a spin!
 
@@ -46,11 +40,13 @@ and
  ```bash
  bundle install
  ```
+
  ### Your First Reliable Predictor
 
  ```ruby
+ require 'dspy'
 
- # Configure DSPy globablly to use your fave LLM - you can override this on an instance levle.
+ # Configure DSPy globally to use your fave LLM (you can override per predictor).
  DSPy.configure do |c|
  c.lm = DSPy::LM.new('openai/gpt-4o-mini',
  api_key: ENV['OPENAI_API_KEY'],
@@ -81,7 +77,7 @@ class Classify < DSPy::Signature
  end
  end
 
- # Wire it to the simplest prompting technique - a Predictn.
+ # Wire it to the simplest prompting technique: a prediction loop.
  classify = DSPy::Predict.new(Classify)
  # it may raise an error if you mess the inputs or your LLM messes the outputs.
  result = classify.call(sentence: "This book was super fun to read!")
@@ -90,6 +86,37 @@ puts result.sentiment # => #<Sentiment::Positive>
  puts result.confidence # => 0.85
  ```
 
+ Save this as `examples/first_predictor.rb` and run it with:
+
+ ```bash
+ bundle exec ruby examples/first_predictor.rb
+ ```
+
+ ### Sibling Gems
+
+ DSPy.rb ships multiple gems from this monorepo so you can opt into features with heavier dependency trees (e.g., datasets pull in Polars/Arrow, MIPROv2 requires `numo-*` BLAS bindings) only when you need them. Add these alongside `dspy`:
+
+ | Gem | Description | Status |
+ | --- | --- | --- |
+ | `dspy-schema` | Exposes `DSPy::TypeSystem::SorbetJsonSchema` for downstream reuse. (Still required by the core `dspy` gem; extraction lets other projects depend on it directly.) | **Stable** (v1.0.0) |
+ | `dspy-openai` | Packages the OpenAI/OpenRouter/Ollama adapters plus the official SDK guardrails. Install whenever you call `openai/*`, `openrouter/*`, or `ollama/*`. [Adapter README](https://github.com/vicentereig/dspy.rb/blob/main/lib/dspy/openai/README.md) | **Stable** (v1.0.0) |
+ | `dspy-anthropic` | Claude adapters, streaming, and structured-output helpers behind the official `anthropic` SDK. [Adapter README](https://github.com/vicentereig/dspy.rb/blob/main/lib/dspy/anthropic/README.md) | **Stable** (v1.0.0) |
+ | `dspy-gemini` | Gemini adapters with multimodal + tool-call support via `gemini-ai`. [Adapter README](https://github.com/vicentereig/dspy.rb/blob/main/lib/dspy/gemini/README.md) | **Stable** (v1.0.0) |
+ | `dspy-code_act` | Think-Code-Observe agents that synthesize and execute Ruby safely. (Add the gem or set `DSPY_WITH_CODE_ACT=1` before requiring `dspy/code_act`.) | **Stable** (v1.0.0) |
+ | `dspy-datasets` | Dataset helpers plus Parquet/Polars tooling for richer evaluation corpora. (Toggle via `DSPY_WITH_DATASETS`.) | **Stable** (v1.0.0) |
+ | `dspy-evals` | High-throughput evaluation harness with metrics, callbacks, and regression fixtures. (Toggle via `DSPY_WITH_EVALS`.) | **Stable** (v1.0.0) |
+ | `dspy-miprov2` | Bayesian optimization + Gaussian Process backend for the MIPROv2 teleprompter. (Install or export `DSPY_WITH_MIPROV2=1` before requiring the teleprompter.) | **Stable** (v1.0.0) |
+ | `dspy-gepa` | `DSPy::Teleprompt::GEPA`, reflection loops, experiment tracking, telemetry adapters. (Install or set `DSPY_WITH_GEPA=1`.) | **Stable** (v1.0.0) |
+ | `gepa` | GEPA optimizer core (Pareto engine, telemetry, reflective proposer). | **Stable** (v1.0.0) |
+ | `dspy-o11y` | Core observability APIs: `DSPy::Observability`, async span processor, observation types. (Install or set `DSPY_WITH_O11Y=1`.) | **Stable** (v1.0.0) |
+ | `dspy-o11y-langfuse` | Auto-configures DSPy observability to stream spans to Langfuse via OTLP. (Install or set `DSPY_WITH_O11Y_LANGFUSE=1`.) | **Stable** (v1.0.0) |
+ | `dspy-deep_search` | Production DeepSearch loop with Exa-backed search/read, token budgeting, and instrumentation (Issue #163). | **Stable** (v1.0.0) |
+ | `dspy-deep_research` | Planner/QA orchestration atop DeepSearch plus the memory supervisor used by the CLI example. | **Stable** (v1.0.0) |
+ | `sorbet-toon` | Token-Oriented Object Notation (TOON) codec, prompt formatter, and Sorbet mixins for BAML/TOON Enhanced Prompting. [Sorbet::Toon README](https://github.com/vicentereig/dspy.rb/blob/main/lib/sorbet/toon/README.md) | **Alpha** (v0.1.0) |
+
+ **Provider adapters:** Add `dspy-openai`, `dspy-anthropic`, and/or `dspy-gemini` next to `dspy` in your Gemfile depending on which `DSPy::LM` providers you call. Each gem already depends on the official SDK (`openai`, `anthropic`, `gemini-ai`), and DSPy auto-loads the adapters when the gem is present—no extra `require` needed.
+
+ Set the matching `DSPY_WITH_*` environment variables (see `Gemfile`) to include or exclude each sibling gem when running Bundler locally (for example `DSPY_WITH_GEPA=1` or `DSPY_WITH_O11Y_LANGFUSE=1`). Refer to `adr/013-dependency-tree.md` for the full dependency map and roadmap.
  ### Access to 200+ Models Across 5 Providers
 
  DSPy.rb provides unified access to major LLM providers with provider-specific optimizations:
@@ -130,7 +157,10 @@ end
 
  ## What You Get
 
- **Developer Experience:**
+ **Developer Experience:** Official clients, multimodal coverage, and observability baked in.
+ <details>
+ <summary>Expand for everything included</summary>
+
  - LLM provider support using official Ruby clients:
  - [OpenAI Ruby](https://github.com/openai/openai-ruby) with vision model support
  - [Anthropic Ruby SDK](https://github.com/anthropics/anthropic-sdk-ruby) with multimodal capabilities
@@ -140,21 +170,33 @@ end
  - Runtime type checking with [Sorbet](https://sorbet.org/) including T::Enum and union types
  - Type-safe tool definitions for ReAct agents
  - Comprehensive instrumentation and observability
+ </details>
+
+ **Core Building Blocks:** Predictors, agents, and pipelines wired through type-safe signatures.
+ <details>
+ <summary>Expand for everything included</summary>
 
- **Core Building Blocks:**
  - **Signatures** - Define input/output schemas using Sorbet types with T::Enum and union type support
  - **Predict** - LLM completion with structured data extraction and multimodal support
  - **Chain of Thought** - Step-by-step reasoning for complex problems with automatic prompt optimization
  - **ReAct** - Tool-using agents with type-safe tool definitions and error recovery
  - **Module Composition** - Combine multiple LLM calls into production-ready workflows
+ </details>
+
+ **Optimization & Evaluation:** Treat prompt optimization like a real ML workflow.
+ <details>
+ <summary>Expand for everything included</summary>
 
- **Optimization & Evaluation:**
  - **Prompt Objects** - Manipulate prompts as first-class objects instead of strings
  - **Typed Examples** - Type-safe training data with automatic validation
  - **Evaluation Framework** - Advanced metrics beyond simple accuracy with error-resilient pipelines
  - **MIPROv2 Optimization** - Advanced Bayesian optimization with Gaussian Processes, multiple optimization strategies, auto-config presets, and storage persistence
+ </details>
+
+ **Production Features:** Hardened behaviors for teams shipping actual products.
+ <details>
+ <summary>Expand for everything included</summary>
 
- **Production Features:**
  - **Reliable JSON Extraction** - Native structured outputs for OpenAI and Gemini, Anthropic tool-based extraction, and automatic strategy selection with fallback
  - **Type-Safe Configuration** - Strategy enums with automatic provider optimization (Strict/Compatible modes)
  - **Smart Retry Logic** - Progressive fallback with exponential backoff for handling transient failures
@@ -162,15 +204,18 @@ end
  - **Performance Caching** - Schema and capability caching for faster repeated operations
  - **File-based Storage** - Optimization result persistence with versioning
  - **Structured Logging** - JSON and key=value formats with span tracking
+ </details>
 
  ## Recent Achievements
 
- DSPy.rb has rapidly evolved from experimental to production-ready:
+ DSPy.rb has gone from experimental to production-ready in three fast releases.
+ <details>
+ <summary>Expand for the full changelog highlights</summary>
 
  ### Foundation
  - ✅ **JSON Parsing Reliability** - Native OpenAI structured outputs with adaptive retry logic and schema-aware fallbacks
  - ✅ **Type-Safe Strategy Configuration** - Provider-optimized strategy selection and enum-backed optimizer presets
- - ✅ **Core Module System** - Predict, ChainOfThought, ReAct, CodeAct with type safety
+ - ✅ **Core Module System** - Predict, ChainOfThought, ReAct with type safety (add `dspy-code_act` for Think-Code-Observe agents)
  - ✅ **Production Observability** - OpenTelemetry, New Relic, and Langfuse integration
  - ✅ **Advanced Optimization** - MIPROv2 with Bayesian optimization, Gaussian Processes, and multi-mode search
 
@@ -181,8 +226,11 @@ DSPy.rb has rapidly evolved from experimental to production-ready:
  - ✅ **Optimizer Utilities Parity (v0.29.0)** - Bootstrap strategies, dataset summaries, and Layer 3 utilities unlock multi-predictor programs on Ruby
  - ✅ **Observability Hardening (v0.29.0)** - OTLP exporter runs on a single-thread executor preventing frozen SSL contexts without blocking spans
  - ✅ **Documentation Refresh (v0.29.x)** - New GEPA guide plus ADE optimization docs covering presets, stratified splits, and error-handling defaults
+ </details>
 
- **Current Focus Areas:**
+ **Current Focus Areas:** Closing the loop on production patterns and community adoption ahead of v1.0.
+ <details>
+ <summary>Expand for the roadmap</summary>
 
  ### Production Readiness
  - 🚧 **Production Patterns** - Real-world usage validation and performance optimization
@@ -192,10 +240,9 @@ DSPy.rb has rapidly evolved from experimental to production-ready:
  - 🚧 **Community Examples** - Real-world applications and case studies
  - 🚧 **Contributor Experience** - Making it easier to contribute and extend
  - 🚧 **Performance Benchmarks** - Comparative analysis vs other frameworks
+ </details>
 
- **v1.0 Philosophy:**
- v1.0 will be released after extensive production battle-testing, not after checking off features.
- The API is already stable - v1.0 represents confidence in production reliability backed by real-world validation.
+ **v1.0 Philosophy:** v1.0 lands after battle-testing, not checkbox bingo. The API is already stable; the milestone marks production confidence.
 
 
  ## Documentation
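The README's new sibling-gem section says that `DSPY_WITH_*` environment variables in the repository's `Gemfile` decide which optional gems Bundler pulls in during local development. This diff does not include that `Gemfile`. The following is a hypothetical sketch of that kind of gating, not the repository's logic. The flag names come from the README table. The `SIBLING_FLAGS` map, the `enabled_siblings` helper, and the rule that only `"1"` counts as enabled are all assumptions.

```ruby
# Hypothetical sketch: choose optional sibling gems from DSPY_WITH_* flags.
# Flag names are taken from the README table; treating "1" as "enabled" is an
# assumption, not the actual behavior of the dspy.rb Gemfile.
SIBLING_FLAGS = {
  'dspy-datasets'      => 'DSPY_WITH_DATASETS',
  'dspy-evals'         => 'DSPY_WITH_EVALS',
  'dspy-miprov2'       => 'DSPY_WITH_MIPROV2',
  'dspy-gepa'          => 'DSPY_WITH_GEPA',
  'dspy-o11y'          => 'DSPY_WITH_O11Y',
  'dspy-o11y-langfuse' => 'DSPY_WITH_O11Y_LANGFUSE'
}.freeze

# Returns the gem names whose flag is set to "1" in `env` (ENV by default),
# in the order they appear in SIBLING_FLAGS.
def enabled_siblings(env = ENV)
  SIBLING_FLAGS.select { |_gem_name, flag| env[flag] == '1' }.keys
end

# Inside a Gemfile this list could drive `gem` calls, for example:
#   enabled_siblings.each { |name| gem name }
p enabled_siblings('DSPY_WITH_EVALS' => '1', 'DSPY_WITH_GEPA' => '0') # prints ["dspy-evals"]
```

Accepting any hash-like `env` keeps the helper testable without mutating the real `ENV`.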
@@ -2,6 +2,6 @@
 
  module DSPy
  class Evals
- VERSION = DSPy::VERSION
+ VERSION = '1.0.1'
  end
  end
data/lib/dspy/evals.rb CHANGED
@@ -1,7 +1,6 @@
  # frozen_string_literal: true
 
  require 'json'
- require 'polars'
  require 'concurrent'
  require 'sorbet-runtime'
  require_relative 'example'
@@ -111,8 +110,14 @@ module DSPy
  }
  end
 
- sig { returns(Polars::DataFrame) }
+ if defined?(Polars::DataFrame)
+ sig { returns(Polars::DataFrame) }
+ else
+ sig { returns(T.untyped) }
+ end
  def to_polars
+ ensure_polars!
+
  rows = @results.each_with_index.map do |result, index|
  {
  "index" => index,
@@ -130,6 +135,20 @@ module DSPy
 
  private
 
+ POLARS_MISSING_ERROR = <<~MSG
+ Polars is required to export evaluation results. Add `gem 'polars'`
+ (or enable the `dspy-datasets` gem / `DSPY_WITH_DATASETS=1`) before
+ calling `DSPy::Evals::BatchEvaluationResult#to_polars`.
+ MSG
+
+ def ensure_polars!
+ return if defined?(Polars::DataFrame)
+
+ require 'polars'
+ rescue LoadError => e
+ raise LoadError, "#{POLARS_MISSING_ERROR}\n\n#{e.message}"
+ end
+
  def serialize_for_polars(value)
  case value
  when NilClass, TrueClass, FalseClass, Numeric, String
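The `evals.rb` hunks turn Polars into an optional dependency:

- The top-level `require 'polars'` is gone.
- `to_polars` now calls `ensure_polars!` to load Polars on demand.
- A failed load is re-raised as a `LoadError` that carries an install hint.

The `if defined?(Polars::DataFrame)` branch around `sig` runs when the class body is evaluated. So the method appears to get the `T.untyped` signature unless Polars was already loaded before this file. Below is a stand-alone sketch of the same lazy-require pattern, not the gem's code. `OptionalDependency`, `Report`, and the deliberately missing library name are hypothetical.

```ruby
# Hypothetical sketch of the lazy optional-dependency pattern used by
# BatchEvaluationResult#to_polars. All names here are illustrative.
module OptionalDependency
  # Loads `lib` on first use; `require` is idempotent, so repeat calls are cheap.
  # If the library is missing, re-raise LoadError with `hint` prepended so the
  # caller sees what to install, keeping the original message for debugging.
  def self.require!(lib, hint)
    require lib
  rescue LoadError => e
    raise LoadError, "#{hint}\n\n#{e.message}"
  end
end

class Report
  # A library name chosen so that it will not be installed (hypothetical).
  MISSING_LIB = 'dspy_example_missing_optional_dep'
  HINT = "#{MISSING_LIB} is required for Report#export; add it to your Gemfile."

  # The optional dependency is only loaded when this feature is used,
  # so loading the file that defines Report never fails.
  def export
    OptionalDependency.require!(MISSING_LIB, HINT)
    :exported
  end
end

begin
  Report.new.export
rescue LoadError => e
  puts e.message.lines.first # prints the hint line
end
```

`LoadError` is a `ScriptError`, not a `StandardError`, so a bare `rescue` or a `rescue` modifier will not catch it. Callers need `rescue LoadError` explicitly, as above.

The Ruby Polars bindings are published on RubyGems as `polars-df` and loaded with `require 'polars'`. The `gem 'polars'` hint in `POLARS_MISSING_ERROR` may therefore need to read `gem 'polars-df'` in a Gemfile.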
metadata CHANGED
@@ -1,28 +1,28 @@
  --- !ruby/object:Gem::Specification
  name: dspy-evals
  version: !ruby/object:Gem::Version
- version: 0.29.1
+ version: 1.0.1
  platform: ruby
  authors:
  - Vicente Reig Rincón de Arellano
  bindir: bin
  cert_chain: []
- date: 2025-10-23 00:00:00.000000000 Z
+ date: 1980-01-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: dspy
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - '='
+ - - ">="
  - !ruby/object:Gem::Version
- version: 0.29.1
+ version: '0.30'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - '='
+ - - ">="
  - !ruby/object:Gem::Version
- version: 0.29.1
+ version: '0.30'
  - !ruby/object:Gem::Dependency
  name: concurrent-ruby
  requirement: !ruby/object:Gem::Requirement
@@ -82,7 +82,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.6.5
+ rubygems_version: 3.6.9
  specification_version: 4
  summary: Evaluation utilities for DSPy.rb programs.
  test_files: []
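The `metadata` hunk replaces the exact pin `dspy = 0.29.1` with the open-ended floor `dspy >= 0.30`. Separately, `DSPy::Evals::VERSION` is now hardcoded to `'1.0.1'` instead of reusing `DSPy::VERSION`. Together these suggest the evals gem no longer has to release in lockstep with the core gem. RubyGems' own `Gem::Requirement#satisfied_by?` shows what each constraint admits. This is an illustrative sketch, and the candidate version list is made up rather than taken from the registry.

```ruby
require 'rubygems' # Gem::Requirement and Gem::Version ship with RubyGems

# Runtime requirement on `dspy` declared by each dspy-evals release.
OLD_REQUIREMENT = Gem::Requirement.new('= 0.29.1') # dspy-evals 0.29.1
NEW_REQUIREMENT = Gem::Requirement.new('>= 0.30')  # dspy-evals 1.0.1

# Illustrative core versions; not a list of published dspy releases.
candidates = %w[0.29.1 0.30.0 1.0.0 2.0.0].map { |v| Gem::Version.new(v) }

candidates.each do |version|
  puts format('dspy %-6s old: %-5s new: %s',
              version, OLD_REQUIREMENT.satisfied_by?(version), NEW_REQUIREMENT.satisfied_by?(version))
end
```

The old pin accepts only `0.29.1`. The new floor rejects `0.29.1` (RubyGems compares segments numerically, so 29 < 30) and accepts everything from `0.30.0` upward, including a hypothetical `2.0.0`, because the requirement has no upper bound. A pessimistic constraint such as `'~> 1.0'` would cap that range if lockstep compatibility ever mattered again.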