dspy 0.1.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 6b4d0c0f8eba6601ce96a8acf5a167a8a7be9fec7f20c024495eee01b702cff1
4
- data.tar.gz: 15f4abd449e6e74b30d0ea47231cb238e9e40b51b048c31b3a74c2c2571d022b
3
+ metadata.gz: 8053438ba1e55a093c35b50b9dc3b106b0c158ce426c6f286ba7f62aeee8161d
4
+ data.tar.gz: 456ca182c45f1924caa6eeea6c86debba28a77ba127c6326014b1a468e07c445
5
5
  SHA512:
6
- metadata.gz: f6c87053b33dbfc27eb2386801cfff2ce6fe67a9d8e3518624be72f914a099e7f85c18e2a90f06634b7e4e442c190ca30a8188ca75d9a40ed4ad3cb1dc79de63
7
- data.tar.gz: ec5ab6691f7494449ce4bf2469654eebb85d5de4cc76bf8893d895e5d00f3c64621d0602c1c800782a9917da281626d5020e192af439f683948e0d92f638c0fa
6
+ metadata.gz: 67c31136acd1ef0a01c49938b3f142abf3690264ac048969e342a750dce80b59b16b258efc4ee6385fb55fb7580e18ac9f87aeb7fb09631cfb0fb139debb4f52
7
+ data.tar.gz: 94214ff9fdd61ea1d478abf9d8da37325d884e8238b01e7bd50cdb3bd21fcfcfaf2c0c6b3509241f0bff3e730587e770588c9f262206f1ca1bc809724e553b31
data/README.md CHANGED
@@ -1,10 +1,490 @@
1
1
  # DSPy.rb
2
2
 
3
- A port of the DSPy library to Ruby.
3
+ **Build reliable LLM applications in Ruby using composable, type-safe modules.**
4
+
5
+ DSPy.rb brings structured LLM programming to Ruby developers.
6
+ Instead of wrestling with prompt strings and parsing responses,
7
+ you define typed signatures and compose them into pipelines that just work.
8
+
9
+ Traditional prompting is like writing code with string concatenation: it works until
10
+ it doesn't. DSPy.rb brings you the programming approach pioneered
11
+ by [dspy.ai](https://dspy.ai/): instead of crafting fragile prompts, you define
12
+ modular signatures and let the framework handle the messy details.
13
+
14
+ The result? LLM applications that actually scale and don't break when you sneeze.
15
+
16
+ ## What You Get
17
+
18
+ **Core Building Blocks:**
19
+ - **Signatures** - Define input/output schemas using Sorbet types
20
+ - **Predict** - Basic LLM completion with structured data
21
+ - **Chain of Thought** - Step-by-step reasoning for complex problems
22
+ - **ReAct** - Tool-using agents that can actually get things done
23
+ - **RAG** - Context-enriched responses from your data
24
+ - **Multi-stage Pipelines** - Compose multiple LLM calls into workflows
25
+ - OpenAI and Anthropic support via [Ruby LLM](https://github.com/crmne/ruby_llm)
26
+ - Runtime type checking with [Sorbet](https://sorbet.org/)
27
+ - Type-safe tool definitions for ReAct agents
28
+
29
+ ## Fair Warning
30
+
31
+ This is fresh out of the oven and evolving fast.
32
+ I'm actively building this as a Ruby port of the [DSPy library](https://dspy.ai/).
33
+ If you hit bugs or want to contribute, just email me directly!
34
+
35
+ ## What's Next
36
+ These are my goals for releasing v1.0.
37
+
38
+ - Solidify prompt optimization
39
+ - OTel Integration
40
+ - Ollama support
4
41
 
5
42
  ## Installation
6
43
 
7
- ```bash
8
- gem install dspy
44
+ Skip the gem for now - install straight from this repo while I prep the first release:
45
+ ```ruby
46
+ gem 'dspy', github: 'vicentereig/dspy.rb'
9
47
  ```
10
48
 
49
+ ## Usage Examples
50
+
51
+ ### Simple Prediction
52
+
53
+ ```ruby
54
+ # Define a signature for sentiment classification
55
+ class Classify < DSPy::Signature
56
+ description "Classify sentiment of a given sentence."
57
+
58
+ class Sentiment < T::Enum
59
+ enums do
60
+ Positive = new('positive')
61
+ Negative = new('negative')
62
+ Neutral = new('neutral')
63
+ end
64
+ end
65
+
66
+ input do
67
+ const :sentence, String
68
+ end
69
+
70
+ output do
71
+ const :sentiment, Sentiment
72
+ const :confidence, Float
73
+ end
74
+ end
75
+
76
+ # Configure DSPy with your LLM
77
+ DSPy.configure do |c|
78
+ c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
79
+ end
80
+
81
+ # Create the predictor and run inference
82
+ classify = DSPy::Predict.new(Classify)
83
+ result = classify.call(sentence: "This book was super fun to read, though not the last chapter.")
84
+
85
+ # result is a properly typed T::Struct instance
86
+ puts result.sentiment # => #<Classify::Sentiment::Positive>
87
+ puts result.confidence # => 0.85
88
+ ```
89
+
90
+ ### Chain of Thought Reasoning
91
+
92
+ ```ruby
93
+ class AnswerPredictor < DSPy::Signature
94
+ description "Provides a concise answer to the question"
95
+
96
+ input do
97
+ const :question, String
98
+ end
99
+
100
+ output do
101
+ const :answer, String
102
+ end
103
+ end
104
+
105
+ # Chain of thought automatically adds a 'reasoning' field to the output
106
+ qa_cot = DSPy::ChainOfThought.new(AnswerPredictor)
107
+ result = qa_cot.call(question: "Two dice are tossed. What is the probability that the sum equals two?")
108
+
109
+ puts result.reasoning # => "There is only one way to get a sum of 2..."
110
+ puts result.answer # => "1/36"
111
+ ```
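The expected answer in the example above can be checked by brute force, without calling any LLM. The following is a minimal plain-Ruby sketch that enumerates all outcomes of two fair six-sided dice; the variable names are illustrative and are not part of DSPy.rb.

```ruby
# Enumerate every ordered outcome of two fair six-sided dice.
faces = (1..6).to_a
outcomes = faces.product(faces) # 36 ordered pairs

# Count the outcomes whose sum equals two (only [1, 1] qualifies).
favorable = outcomes.count { |a, b| a + b == 2 }

probability = Rational(favorable, outcomes.length)
puts probability # prints 1/36
```

A check like this is a cheap way to sanity-test a model's `answer` field when the question has a closed-form result.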
112
+
113
+ ### ReAct Agents with Tools
114
+
115
+ ```ruby
116
+
117
+ class DeepQA < DSPy::Signature
118
+ description "Answer questions with consideration for the context"
119
+
120
+ input do
121
+ const :question, String
122
+ end
123
+
124
+ output do
125
+ const :answer, String
126
+ end
127
+ end
128
+
129
+ # Define tools for the agent
130
+ class CalculatorTool < DSPy::Tools::Base
131
+
132
+ tool_name 'calculator'
133
+ tool_description 'Performs basic arithmetic operations'
134
+
135
+ sig { params(operation: String, num1: Float, num2: Float).returns(T.any(Float, String)) }
136
+ def call(operation:, num1:, num2:)
137
+ case operation.downcase
138
+ when 'add' then num1 + num2
139
+ when 'subtract' then num1 - num2
140
+ when 'multiply' then num1 * num2
141
+ when 'divide'
142
+ return "Error: Cannot divide by zero" if num2 == 0
143
+ num1 / num2
144
+ else
145
+ "Error: Unknown operation '#{operation}'. Use add, subtract, multiply, or divide"
146
+ end
147
+ end
148
+ end
149
+ # Create ReAct agent with tools
150
+ agent = DSPy::ReAct.new(DeepQA, tools: [CalculatorTool.new])
151
+
152
+ # Run the agent
153
+ result = agent.forward(question: "What is 42 plus 58?")
154
+ puts result.answer # => "100"
155
+ puts result.history # => Array of reasoning steps and tool calls
156
+ ```
157
+
158
+ ### Multi-stage Pipelines
159
+ Outline the sections of an article and draft them out.
160
+
161
+ ```ruby
162
+
163
+ # Usage: write an article (run this after the class definitions below)
164
+ drafter = ArticleDrafter.new
165
+ article = drafter.forward(topic: "The impact of AI on software development") # { title: '....', sections: [{content: '....'}]}
166
+
167
+ class Outline < DSPy::Signature
168
+ description "Outline a thorough overview of a topic."
169
+
170
+ input do
171
+ const :topic, String
172
+ end
173
+
174
+ output do
175
+ const :title, String
176
+ const :sections, T::Array[String]
177
+ end
178
+ end
179
+
180
+ class DraftSection < DSPy::Signature
181
+ description "Draft a section of an article"
182
+
183
+ input do
184
+ const :topic, String
185
+ const :title, String
186
+ const :section, String
187
+ end
188
+
189
+ output do
190
+ const :content, String
191
+ end
192
+ end
193
+
194
+ class ArticleDrafter < DSPy::Module
195
+ def initialize
196
+ @build_outline = DSPy::ChainOfThought.new(Outline)
197
+ @draft_section = DSPy::ChainOfThought.new(DraftSection)
198
+ end
199
+
200
+ def forward(topic:)
201
+ outline = @build_outline.call(topic: topic)
202
+
203
+ sections = outline.sections.map do |section|
204
+ @draft_section.call(
205
+ topic: topic,
206
+ title: outline.title,
207
+ section: section
208
+ )
209
+ end
210
+
211
+ {
212
+ title: outline.title,
213
+ sections: sections.map(&:content)
214
+ }
215
+ end
216
+ end
217
+
218
+ ```
219
+
220
+ ## Working with Complex Types
221
+
222
+ ### Enums
223
+
224
+ ```ruby
225
+ class Color < T::Enum
226
+ enums do
227
+ Red = new
228
+ Green = new
229
+ Blue = new
230
+ end
231
+ end
232
+
233
+ class ColorSignature < DSPy::Signature
234
+ description "Identify the dominant color in a description"
235
+
236
+ input do
237
+ const :description, String,
238
+ description: 'Description of an object or scene'
239
+ end
240
+
241
+ output do
242
+ const :color, Color,
243
+ description: 'The dominant color (Red, Green, or Blue)'
244
+ end
245
+ end
246
+
247
+ predictor = DSPy::Predict.new(ColorSignature)
248
+ result = predictor.call(description: "A red apple on a wooden table")
249
+ puts result.color # => #<Color::Red>
250
+ ```
251
+
252
+ ### Optional Fields and Defaults
253
+
254
+ ```ruby
255
+ class AnalysisSignature < DSPy::Signature
256
+ description "Analyze text with optional metadata"
257
+
258
+ input do
259
+ const :text, String,
260
+ description: 'Text to analyze'
261
+ const :include_metadata, T::Boolean,
262
+ description: 'Whether to include metadata in analysis',
263
+ default: false
264
+ end
265
+
266
+ output do
267
+ const :summary, String,
268
+ description: 'Summary of the text'
269
+ const :word_count, Integer,
270
+ description: 'Number of words (optional)',
271
+ default: 0
272
+ end
273
+ end
274
+ ```
275
+
276
+ ## Advanced Usage Patterns
277
+
278
+ ### Multi-stage Pipelines
279
+
280
+ ```ruby
281
+ class TopicSignature < DSPy::Signature
282
+ description "Extract main topic from text"
283
+
284
+ input do
285
+ const :content, String,
286
+ description: 'Text content to analyze'
287
+ end
288
+
289
+ output do
290
+ const :topic, String,
291
+ description: 'Main topic of the content'
292
+ end
293
+ end
294
+
295
+ class SummarySignature < DSPy::Signature
296
+ description "Create summary focusing on specific topic"
297
+
298
+ input do
299
+ const :content, String,
300
+ description: 'Original text content'
301
+ const :topic, String,
302
+ description: 'Topic to focus on'
303
+ end
304
+
305
+ output do
306
+ const :summary, String,
307
+ description: 'Topic-focused summary'
308
+ end
309
+ end
310
+
311
+ class ArticlePipeline < DSPy::Module
312
+ extend T::Sig
313
+
314
+ def initialize
315
+ @topic_extractor = DSPy::Predict.new(TopicSignature)
316
+ @summarizer = DSPy::ChainOfThought.new(SummarySignature)
317
+ end
318
+
319
+ sig { params(content: String).returns(T.untyped) }
320
+ def forward(content:)
321
+ # Extract topic
322
+ topic_result = @topic_extractor.call(content: content)
323
+
324
+ # Create focused summary
325
+ summary_result = @summarizer.call(
326
+ content: content,
327
+ topic: topic_result.topic
328
+ )
329
+
330
+ {
331
+ topic: topic_result.topic,
332
+ summary: summary_result.summary,
333
+ reasoning: summary_result.reasoning
334
+ }
335
+ end
336
+ end
337
+
338
+ # Usage
339
+ pipeline = ArticlePipeline.new
340
+ result = pipeline.call(content: "Long article content...")
341
+ ```
342
+
343
+ ### Retrieval Augmented Generation
344
+
345
+ ```ruby
346
+ class ContextualQA < DSPy::Signature
347
+ description "Answer questions using relevant context"
348
+
349
+ input do
350
+ const :question, String,
351
+ description: 'The question to answer'
352
+ const :context, T::Array[String],
353
+ description: 'Relevant context passages'
354
+ end
355
+
356
+ output do
357
+ const :answer, String,
358
+ description: 'Answer based on the provided context'
359
+ const :confidence, Float,
360
+ description: 'Confidence in the answer (0.0 to 1.0)'
361
+ end
362
+ end
363
+
364
+ # Usage with retriever
365
+ retriever = YourRetrieverClass.new
366
+ qa = DSPy::ChainOfThought.new(ContextualQA)
367
+
368
+ question = "What is the capital of France?"
369
+ context = retriever.retrieve(question) # Returns array of strings
370
+
371
+ result = qa.call(question: question, context: context)
372
+ puts result.reasoning # Step-by-step reasoning
373
+ puts result.answer # "Paris"
374
+ puts result.confidence # 0.95
375
+ ```
376
+
377
+ ## Instrumentation & Observability
378
+
379
+ DSPy.rb includes built-in instrumentation that captures detailed events and
380
+ performance metrics from your LLM operations. Perfect for monitoring your
381
+ applications and integrating with observability tools.
382
+
383
+ ### Quick Setup
384
+
385
+ Enable instrumentation to start capturing events:
386
+
387
+ ```ruby
388
+ DSPy::Instrumentation.configure do |config|
389
+ config.enabled = true
390
+ end
391
+ ```
392
+
393
+ ### Available Events
394
+
395
+ Subscribe to these events to monitor different aspects of your LLM operations:
396
+
397
+ | Event Name | Triggered When | Key Payload Fields |
398
+ |------------|----------------|-------------------|
399
+ | `dspy.lm.request` | LLM API request lifecycle | `gen_ai_system`, `model`, `provider`, `duration_ms`, `status` |
400
+ | `dspy.lm.tokens` | Token usage tracking | `tokens_input`, `tokens_output`, `tokens_total` |
401
+ | `dspy.predict` | Prediction operations | `signature_class`, `input_size`, `duration_ms`, `status` |
402
+ | `dspy.chain_of_thought` | CoT reasoning | `signature_class`, `model`, `duration_ms`, `status` |
403
+ | `dspy.react` | Agent operations | `max_iterations`, `tools_used`, `duration_ms`, `status` |
404
+ | `dspy.react.tool_call` | Tool execution | `tool_name`, `tool_input`, `tool_output`, `duration_ms` |
405
+
406
+ ### Event Payloads
407
+
408
+ The instrumentation emits events with structured payloads you can process:
409
+
410
+ ```ruby
411
+ # Example event payload for dspy.predict
412
+ {
413
+ signature_class: "QuestionAnswering",
414
+ model: "gpt-4o-mini",
415
+ provider: "openai",
416
+ input_size: 45,
417
+ duration_ms: 1234.56,
418
+ cpu_time_ms: 89.12,
419
+ status: "success",
420
+ timestamp: "2024-01-15T10:30:00Z"
421
+ }
422
+
423
+ # Example token usage payload
424
+ {
425
+ tokens_input: 150,
426
+ tokens_output: 45,
427
+ tokens_total: 195,
428
+ gen_ai_system: "openai",
429
+ signature_class: "QuestionAnswering"
430
+ }
431
+ ```
432
+
433
+ Events are emitted via dry-monitor notifications, giving you flexibility to
434
+ process them however you need - logging, metrics, alerts, or custom monitoring.
435
+
436
+ ### Token Tracking
437
+
438
+ Token usage is extracted from actual API responses (OpenAI and Anthropic only),
439
+ giving you exact token counts for cost tracking:
440
+
441
+ ```ruby
442
+ # Token events include:
443
+ {
444
+ tokens_input: 150, # From API response
445
+ tokens_output: 45, # From API response
446
+ tokens_total: 195, # From API response
447
+ gen_ai_system: "openai",
448
+ gen_ai_request_model: "gpt-4o-mini"
449
+ }
450
+ ```
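To illustrate how a token payload like the one above could feed cost tracking, here is a rough plain-Ruby sketch. The `estimate_cost` helper and the `PRICES_PER_MILLION` table are hypothetical, and the prices are placeholders rather than real provider rates; only the payload keys come from the README.

```ruby
# Hypothetical per-million-token prices in USD. Placeholders, not real rates.
PRICES_PER_MILLION = {
  'gpt-4o-mini' => { input: 0.15, output: 0.60 }
}.freeze

# Estimate the cost of one request from a token-usage payload shaped like
# the README example. Returns nil when the model has no configured price.
def estimate_cost(payload, prices = PRICES_PER_MILLION)
  rates = prices[payload[:gen_ai_request_model]]
  return nil unless rates

  input_cost  = payload[:tokens_input]  * rates[:input]  / 1_000_000.0
  output_cost = payload[:tokens_output] * rates[:output] / 1_000_000.0
  input_cost + output_cost
end

payload = {
  tokens_input: 150,
  tokens_output: 45,
  tokens_total: 195,
  gen_ai_system: 'openai',
  gen_ai_request_model: 'gpt-4o-mini'
}

cost = estimate_cost(payload)
puts format('%.7f USD', cost)
```

Returning `nil` for unknown models keeps a missing price from being silently reported as zero cost.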
451
+
452
+ ### Configuration Options
453
+
454
+ ```ruby
455
+ DSPy::Instrumentation.configure do |config|
456
+ config.enabled = true
457
+ config.log_to_stdout = false
458
+ config.log_file = 'log/dspy.log'
459
+ config.log_level = :info
460
+
461
+ # Custom payload enrichment (Time.current needs ActiveSupport; Socket needs require 'socket')
462
+ config.custom_options = lambda do |event|
463
+ {
464
+ timestamp: Time.current.iso8601,
465
+ hostname: Socket.gethostname,
466
+ request_id: Thread.current[:request_id]
467
+ }
468
+ end
469
+ end
470
+ ```
471
+
472
+ ### Integration with Monitoring Tools
473
+
474
+ Subscribe to events for custom processing:
475
+
476
+ ```ruby
477
+ # Subscribe to all LM events
478
+ DSPy::Instrumentation.subscribe('dspy.lm.*') do |event|
479
+ puts "#{event.id}: #{event.payload[:duration_ms]}ms"
480
+ end
481
+
482
+ # Subscribe to specific events
483
+ DSPy::Instrumentation.subscribe('dspy.predict') do |event|
484
+ MyMetrics.histogram('dspy.predict.duration', event.payload[:duration_ms])
485
+ end
486
+ ```
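The subscription pattern above (an exact event name or a wildcard such as `dspy.lm.*`) can be pictured with a small stand-in event bus. This is only a sketch in plain Ruby: `TinyBus` is a hypothetical class, it is not how `DSPy::Instrumentation` or dry-monitor is implemented, and glob matching via `File.fnmatch` is an assumption made for illustration.

```ruby
# A tiny stand-in event bus; NOT the DSPy::Instrumentation implementation.
class TinyBus
  Event = Struct.new(:id, :payload)

  def initialize
    @subscribers = [] # pairs of [pattern, handler]
  end

  # Register a handler for an exact event name or a glob such as 'dspy.lm.*'.
  def subscribe(pattern, &handler)
    @subscribers << [pattern, handler]
  end

  # Deliver the event to every handler whose pattern matches the event name.
  def emit(id, payload = {})
    event = Event.new(id, payload)
    @subscribers.each do |pattern, handler|
      handler.call(event) if File.fnmatch(pattern, id)
    end
  end
end

bus = TinyBus.new
seen = []
bus.subscribe('dspy.lm.*') { |event| seen << event.id }
bus.subscribe('dspy.predict') { |event| seen << "predict took #{event.payload[:duration_ms]}ms" }

bus.emit('dspy.lm.request', duration_ms: 12.5)
bus.emit('dspy.lm.tokens', tokens_total: 195)
bus.emit('dspy.predict', duration_ms: 1234.56)
bus.emit('dspy.react', duration_ms: 1.0) # no subscriber matches this one
p seen
```

The same shape works for forwarding payloads to a metrics client: the handler receives the event and decides what to record.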
487
+
488
+ ## License
489
+
490
+ This project is licensed under the MIT License.
@@ -0,0 +1,162 @@
1
+ # typed: strict
2
+ # frozen_string_literal: true
3
+
4
+ require 'sorbet-runtime'
5
+ require_relative 'predict'
6
+ require_relative 'signature'
7
+ require_relative 'instrumentation'
8
+
9
+ module DSPy
10
+ # Enhances prediction by encouraging step-by-step reasoning
11
+ # before providing a final answer using Sorbet signatures.
12
+ class ChainOfThought < Predict
13
+ extend T::Sig
14
+
15
+ FieldDescriptor = DSPy::Signature::FieldDescriptor
16
+
17
+ sig { params(signature_class: T.class_of(DSPy::Signature)).void }
18
+ def initialize(signature_class)
19
+ @original_signature = signature_class
20
+
21
+ # Create enhanced output struct with reasoning
22
+ enhanced_output_struct = create_enhanced_output_struct(signature_class)
23
+
24
+ # Create enhanced signature class
25
+ enhanced_signature = Class.new(DSPy::Signature) do
26
+ # Set the description
27
+ description "#{signature_class.description} Think step by step."
28
+
29
+ # Use the same input struct and copy field descriptors
30
+ @input_struct_class = signature_class.input_struct_class
31
+ @input_field_descriptors = signature_class.instance_variable_get(:@input_field_descriptors) || {}
32
+
33
+ # Use the enhanced output struct and create field descriptors for it
34
+ @output_struct_class = enhanced_output_struct
35
+
36
+ # Create field descriptors for the enhanced output struct
37
+ @output_field_descriptors = {}
38
+
39
+ # Copy original output field descriptors
40
+ original_output_descriptors = signature_class.instance_variable_get(:@output_field_descriptors) || {}
41
+ @output_field_descriptors.merge!(original_output_descriptors)
42
+
43
+ # Add reasoning field descriptor (ChainOfThought always provides this)
44
+ @output_field_descriptors[:reasoning] = FieldDescriptor.new(String, "Step by step reasoning process")
45
+
46
+ class << self
47
+ attr_reader :input_struct_class, :output_struct_class
48
+ end
49
+ end
50
+
51
+ # Call parent constructor with enhanced signature
52
+ super(enhanced_signature)
53
+ @signature_class = enhanced_signature
54
+ end
55
+
56
+ # Override forward_untyped to add ChainOfThought-specific instrumentation
57
+ sig { override.params(input_values: T.untyped).returns(T.untyped) }
58
+ def forward_untyped(**input_values)
59
+ # Prepare instrumentation payload
60
+ input_fields = input_values.keys.map(&:to_s)
61
+
62
+ # Instrument ChainOfThought lifecycle
63
+ result = Instrumentation.instrument('dspy.chain_of_thought', {
64
+ signature_class: @original_signature.name,
65
+ model: lm.model,
66
+ provider: lm.provider,
67
+ input_fields: input_fields
68
+ }) do
69
+ # Call parent prediction logic
70
+ prediction_result = super(**input_values)
71
+
72
+ # Analyze reasoning if present
73
+ if prediction_result.respond_to?(:reasoning) && prediction_result.reasoning
74
+ reasoning_content = prediction_result.reasoning.to_s
75
+ reasoning_length = reasoning_content.length
76
+ reasoning_steps = count_reasoning_steps(reasoning_content)
77
+
78
+ # Emit reasoning analysis event
79
+ Instrumentation.emit('dspy.chain_of_thought.reasoning_complete', {
80
+ signature_class: @original_signature.name,
81
+ reasoning_steps: reasoning_steps,
82
+ reasoning_length: reasoning_length,
83
+ has_reasoning: !reasoning_content.empty?
84
+ })
85
+ end
86
+
87
+ prediction_result
88
+ end
89
+
90
+ result
91
+ end
92
+
93
+ private
94
+
95
+ # Count reasoning steps by looking for step indicators
96
+ def count_reasoning_steps(reasoning_text)
97
+ return 0 if reasoning_text.nil? || reasoning_text.empty?
98
+
99
+ # Look for common step patterns
100
+ step_patterns = [
101
+ /step \d+/i,
102
+ /\d+\./,
103
+ /first|second|third|then|next|finally/i,
104
+ /\n\s*-/
105
+ ]
106
+
107
+ max_count = 0
108
+ step_patterns.each do |pattern|
109
+ count = reasoning_text.scan(pattern).length
110
+ max_count = [max_count, count].max
111
+ end
112
+
113
+ # Fallback: count sentences if no clear steps
114
+ max_count > 0 ? max_count : reasoning_text.split(/[.!?]+/).reject(&:empty?).length
115
+ end
116
+
117
+ sig { params(signature_class: T.class_of(DSPy::Signature)).returns(T.class_of(T::Struct)) }
118
+ def create_enhanced_output_struct(signature_class)
119
+ # Get original output props
120
+ original_props = signature_class.output_struct_class.props
121
+
122
+ # Create new struct class with reasoning added
123
+ Class.new(T::Struct) do
124
+ # Add all original fields
125
+ original_props.each do |name, prop|
126
+ # Extract the type and other options
127
+ type = prop[:type]
128
+ options = prop.except(:type, :type_object, :accessor_key, :sensitivity, :redaction)
129
+
130
+ # Handle default values
131
+ if options.key?(:default)
132
+ const name, type, default: options[:default]
133
+ elsif options[:factory]
134
+ const name, type, factory: options[:factory]
135
+ else
136
+ const name, type
137
+ end
138
+ end
139
+
140
+ # Add reasoning field (ChainOfThought always provides this)
141
+ const :reasoning, String
142
+
143
+ # Add to_h method to serialize the struct to a hash
144
+ define_method :to_h do
145
+ hash = {}
146
+
147
+ # Start with input values if available
148
+ if self.instance_variable_defined?(:@input_values)
149
+ hash.merge!(self.instance_variable_get(:@input_values))
150
+ end
151
+
152
+ # Then add output properties
153
+ self.class.props.keys.each do |key|
154
+ hash[key] = self.send(key)
155
+ end
156
+
157
+ hash
158
+ end
159
+ end
160
+ end
161
+ end
162
+ end
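The step-counting heuristic in `count_reasoning_steps` is easier to reason about when run in isolation. The sketch below copies its patterns and fallback into a standalone top-level method (the `STEP_PATTERNS` constant name is an addition for this example) and applies it to a few sample strings.

```ruby
# Patterns copied from ChainOfThought#count_reasoning_steps.
STEP_PATTERNS = [
  /step \d+/i,
  /\d+\./,
  /first|second|third|then|next|finally/i,
  /\n\s*-/
].freeze

# Return the highest match count across the patterns, or fall back to
# counting sentences when no pattern matches at all.
def count_reasoning_steps(text)
  return 0 if text.nil? || text.empty?

  max_count = STEP_PATTERNS.map { |pattern| text.scan(pattern).length }.max
  max_count > 0 ? max_count : text.split(/[.!?]+/).reject(&:empty?).length
end

puts count_reasoning_steps("Step 1: list outcomes. Step 2: count matches.") # prints 2
puts count_reasoning_steps("It rains. We stay home.")                       # prints 2 (sentence fallback)
puts count_reasoning_steps("Pi is about 3.14")                              # prints 1 ("3." matches /\d+\./)
```

The last call shows a limitation worth keeping in mind: a decimal number is counted as a numbered step, so the reported `reasoning_steps` value is an estimate rather than an exact count.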
data/lib/dspy/field.rb ADDED
@@ -0,0 +1,23 @@
1
+ # frozen_string_literal: true
2
+
3
+ module DSPy
4
+ class InputField
5
+ attr_reader :name, :type, :desc
6
+
7
+ def initialize(name, type, desc: nil)
8
+ @name = name
9
+ @type = type
10
+ @desc = desc
11
+ end
12
+ end
13
+
14
+ class OutputField
15
+ attr_reader :name, :type, :desc
16
+
17
+ def initialize(name, type, desc: nil)
18
+ @name = name
19
+ @type = type
20
+ @desc = desc
21
+ end
22
+ end
23
+ end