RubyGems - dspy - Versions diffs - 0.1.0 → 0.3.0 - Mend

dspy 0.1.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (24)

checksums.yaml +4 -4
data/README.md +483 -3
data/lib/dspy/chain_of_thought.rb +162 -0
data/lib/dspy/field.rb +23 -0
data/lib/dspy/instrumentation/token_tracker.rb +54 -0
data/lib/dspy/instrumentation.rb +100 -0
data/lib/dspy/lm/adapter.rb +41 -0
data/lib/dspy/lm/adapter_factory.rb +59 -0
data/lib/dspy/lm/adapters/anthropic_adapter.rb +96 -0
data/lib/dspy/lm/adapters/openai_adapter.rb +53 -0
data/lib/dspy/lm/adapters/ruby_llm_adapter.rb +81 -0
data/lib/dspy/lm/errors.rb +10 -0
data/lib/dspy/lm/response.rb +28 -0
data/lib/dspy/lm.rb +128 -0
data/lib/dspy/module.rb +58 -0
data/lib/dspy/predict.rb +192 -0
data/lib/dspy/re_act.rb +428 -0
data/lib/dspy/schema_adapters.rb +55 -0
data/lib/dspy/signature.rb +298 -0
data/lib/dspy/subscribers/logger_subscriber.rb +197 -0
data/lib/dspy/tools/base.rb +226 -0
data/lib/dspy/tools.rb +21 -0
data/lib/dspy.rb +38 -2
metadata +150 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 6b4d0c0f8eba6601ce96a8acf5a167a8a7be9fec7f20c024495eee01b702cff1
-  data.tar.gz: 15f4abd449e6e74b30d0ea47231cb238e9e40b51b048c31b3a74c2c2571d022b
+  metadata.gz: 8053438ba1e55a093c35b50b9dc3b106b0c158ce426c6f286ba7f62aeee8161d
+  data.tar.gz: 456ca182c45f1924caa6eeea6c86debba28a77ba127c6326014b1a468e07c445
 SHA512:
-  metadata.gz: f6c87053b33dbfc27eb2386801cfff2ce6fe67a9d8e3518624be72f914a099e7f85c18e2a90f06634b7e4e442c190ca30a8188ca75d9a40ed4ad3cb1dc79de63
-  data.tar.gz: ec5ab6691f7494449ce4bf2469654eebb85d5de4cc76bf8893d895e5d00f3c64621d0602c1c800782a9917da281626d5020e192af439f683948e0d92f638c0fa
+  metadata.gz: 67c31136acd1ef0a01c49938b3f142abf3690264ac048969e342a750dce80b59b16b258efc4ee6385fb55fb7580e18ac9f87aeb7fb09631cfb0fb139debb4f52
+  data.tar.gz: 94214ff9fdd61ea1d478abf9d8da37325d884e8238b01e7bd50cdb3bd21fcfcfaf2c0c6b3509241f0bff3e730587e770588c9f262206f1ca1bc809724e553b31
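
The entries above are hex digests of the two archives packed inside the `.gem` file. The sketch below shows one way to check an extracted archive against such an entry using only Ruby's standard library. The helper name `digest_matches?` and the throwaway demo file are illustrative assumptions, not part of RubyGems.

```ruby
require 'digest'
require 'digest/sha2'
require 'tempfile'
require 'yaml'

# Hypothetical helper: compare a file's digest with the value recorded
# under the given algorithm ("SHA256" or "SHA512") in checksums.yaml text.
def digest_matches?(checksums_yaml, algorithm, entry_name, path)
  expected = YAML.safe_load(checksums_yaml).fetch(algorithm).fetch(entry_name)
  actual = Digest.const_get(algorithm).file(path).hexdigest
  actual == expected
end

# Demo with a throwaway file standing in for an extracted data.tar.gz.
archive = Tempfile.new('data.tar.gz')
archive.write('example archive bytes')
archive.flush

demo_checksums = {
  'SHA256' => { 'data.tar.gz' => Digest::SHA256.file(archive.path).hexdigest }
}.to_yaml

puts digest_matches?(demo_checksums, 'SHA256', 'data.tar.gz', archive.path) # prints "true"
```

For a real check, pass the text of the gem's own `checksums.yaml` and the path of the extracted `data.tar.gz` or `metadata.gz`.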

data/README.md CHANGED

@@ -1,10 +1,490 @@
 # DSPy.rb
-A port of the DSPy library to Ruby.
+**Build reliable LLM applications in Ruby using composable, type-safe modules.**
+DSPy.rb brings structured LLM programming to Ruby developers.
+Instead of wrestling with prompt strings and parsing responses,
+you define typed signatures and compose them into pipelines that just work.
+Traditional prompting is like writing code with string concatenation: it works until
+it doesn't. DSPy.rb brings you the programming approach pioneered
+by [dspy.ai](https://dspy.ai/): instead of crafting fragile prompts, you define
+modular signatures and let the framework handle the messy details.
+The result? LLM applications that actually scale and don't break when you sneeze.
+## What You Get
+**Core Building Blocks:**
+- **Signatures** - Define input/output schemas using Sorbet types
+- **Predict** - Basic LLM completion with structured data
+- **Chain of Thought** - Step-by-step reasoning for complex problems
+- **ReAct** - Tool-using agents that can actually get things done
+- **RAG** - Context-enriched responses from your data
+- **Multi-stage Pipelines** - Compose multiple LLM calls into workflows
+- OpenAI and Anthropic support via [Ruby LLM](https://github.com/crmne/ruby_llm)
+- Runtime type checking with [Sorbet](https://sorbet.org/)
+- Type-safe tool definitions for ReAct agents
+## Fair Warning
+This is fresh out of the oven and evolving fast.
+I'm actively building this as a Ruby port of the [DSPy library](https://dspy.ai/).
+If you hit bugs or want to contribute, just email me directly!
+## What's Next
+These are my goals for the v1.0 release:
+- Solidify prompt optimization
+- OTel Integration
+- Ollama support
 ## Installation
-```bash
-gem install dspy
+Skip the gem for now - install straight from this repo while I prep the first release:
+```ruby
+gem 'dspy', github: 'vicentereig/dspy.rb'
 ```
+## Usage Examples
+### Simple Prediction
+```ruby
+# Define a signature for sentiment classification
+class Classify < DSPy::Signature
+  description "Classify sentiment of a given sentence."
+  class Sentiment < T::Enum
+    enums do
+      Positive = new('positive')
+      Negative = new('negative')
+      Neutral = new('neutral')
+    end
+  end
+  input do
+    const :sentence, String
+  end
+  output do
+    const :sentiment, Sentiment
+    const :confidence, Float
+  end
+end
+# Configure DSPy with your LLM
+DSPy.configure do |c|
+  c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
+end
+# Create the predictor and run inference
+classify = DSPy::Predict.new(Classify)
+result = classify.call(sentence: "This book was super fun to read, though not the last chapter.")
+# result is a properly typed T::Struct instance
+puts result.sentiment    # => #<Classify::Sentiment::Positive>
+puts result.confidence   # => 0.85
+```
+### Chain of Thought Reasoning
+```ruby
+class AnswerPredictor < DSPy::Signature
+  description "Provides a concise answer to the question"
+  input do
+    const :question, String
+  end
+  output do
+    const :answer, String
+  end
+end
+# Chain of thought automatically adds a 'reasoning' field to the output
+qa_cot = DSPy::ChainOfThought.new(AnswerPredictor)
+result = qa_cot.call(question: "Two dice are tossed. What is the probability that the sum equals two?")
+puts result.reasoning  # => "There is only one way to get a sum of 2..."
+puts result.answer     # => "1/36"
+```
+### ReAct Agents with Tools
+```ruby
+class DeepQA < DSPy::Signature
+  description "Answer questions with consideration for the context"
+  input do
+    const :question, String
+  end
+  output do
+    const :answer, String
+  end
+end
+# Define tools for the agent
+class CalculatorTool < DSPy::Tools::Base
+  tool_name 'calculator'
+  tool_description 'Performs basic arithmetic operations'
+  sig { params(operation: String, num1: Float, num2: Float).returns(T.any(Float, String)) }
+  def call(operation:, num1:, num2:)
+    case operation.downcase
+    when 'add' then num1 + num2
+    when 'subtract' then num1 - num2
+    when 'multiply' then num1 * num2
+    when 'divide'
+      return "Error: Cannot divide by zero" if num2 == 0
+      num1 / num2
+    else
+      "Error: Unknown operation '#{operation}'. Use add, subtract, multiply, or divide"
+    end
+  end
+end
+# Create ReAct agent with tools
+agent = DSPy::ReAct.new(DeepQA, tools: [CalculatorTool.new])
+# Run the agent
+result = agent.forward(question: "What is 42 plus 58?")
+puts result.answer # => "100"
+puts result.history # => Array of reasoning steps and tool calls
+```
+### Multi-stage Pipelines
+Outline the sections of an article and draft them out.
+```ruby
+# Write an article (run this once the definitions below are loaded)
+drafter = ArticleDrafter.new
+article = drafter.forward(topic: "The impact of AI on software development") # => { title: '...', sections: ['...', ...] }
+class Outline < DSPy::Signature
+  description "Outline a thorough overview of a topic."
+  input do
+    const :topic, String
+  end
+  output do
+    const :title, String
+    const :sections, T::Array[String]
+  end
+end
+class DraftSection < DSPy::Signature
+  description "Draft a section of an article"
+  input do
+    const :topic, String
+    const :title, String
+    const :section, String
+  end
+  output do
+    const :content, String
+  end
+end
+class ArticleDrafter < DSPy::Module
+  def initialize
+    @build_outline = DSPy::ChainOfThought.new(Outline)
+    @draft_section = DSPy::ChainOfThought.new(DraftSection)
+  end
+  def forward(topic:)
+    outline = @build_outline.call(topic: topic)
+    sections = outline.sections.map do |section|
+      @draft_section.call(
+        topic: topic,
+        title: outline.title,
+        section: section
+      )
+    end
+    {
+      title: outline.title,
+      sections: sections.map(&:content)
+    }
+  end
+end
+```
+## Working with Complex Types
+### Enums
+```ruby
+class Color < T::Enum
+  enums do
+    Red = new
+    Green = new
+    Blue = new
+  end
+end
+class ColorSignature < DSPy::Signature
+  description "Identify the dominant color in a description"
+  input do
+    const :description, String,
+      description: 'Description of an object or scene'
+  end
+  output do
+    const :color, Color,
+      description: 'The dominant color (Red, Green, or Blue)'
+  end
+end
+predictor = DSPy::Predict.new(ColorSignature)
+result = predictor.call(description: "A red apple on a wooden table")
+puts result.color  # => #<Color::Red>
+```
+### Optional Fields and Defaults
+```ruby
+class AnalysisSignature < DSPy::Signature
+  description "Analyze text with optional metadata"
+  input do
+    const :text, String,
+      description: 'Text to analyze'
+    const :include_metadata, T::Boolean,
+      description: 'Whether to include metadata in analysis',
+      default: false
+  end
+  output do
+    const :summary, String,
+      description: 'Summary of the text'
+    const :word_count, Integer,
+      description: 'Number of words (optional)',
+      default: 0
+  end
+end
+```
+## Advanced Usage Patterns
+### Multi-stage Pipelines
+```ruby
+class TopicSignature < DSPy::Signature
+  description "Extract main topic from text"
+  input do
+    const :content, String,
+      description: 'Text content to analyze'
+  end
+  output do
+    const :topic, String,
+      description: 'Main topic of the content'
+  end
+end
+class SummarySignature < DSPy::Signature
+  description "Create summary focusing on specific topic"
+  input do
+    const :content, String,
+      description: 'Original text content'
+    const :topic, String,
+      description: 'Topic to focus on'
+  end
+  output do
+    const :summary, String,
+      description: 'Topic-focused summary'
+  end
+end
+class ArticlePipeline < DSPy::Module
+  extend T::Sig
+  def initialize
+    @topic_extractor = DSPy::Predict.new(TopicSignature)
+    @summarizer = DSPy::ChainOfThought.new(SummarySignature)
+  end
+  sig { params(content: String).returns(T.untyped) }
+  def forward(content:)
+    # Extract topic
+    topic_result = @topic_extractor.call(content: content)
+    # Create focused summary
+    summary_result = @summarizer.call(
+      content: content,
+      topic: topic_result.topic
+    )
+    {
+      topic: topic_result.topic,
+      summary: summary_result.summary,
+      reasoning: summary_result.reasoning
+    }
+  end
+end
+# Usage
+pipeline = ArticlePipeline.new
+result = pipeline.call(content: "Long article content...")
+```
+### Retrieval Augmented Generation
+```ruby
+class ContextualQA < DSPy::Signature
+  description "Answer questions using relevant context"
+  input do
+    const :question, String,
+      description: 'The question to answer'
+    const :context, T::Array[String],
+      description: 'Relevant context passages'
+  end
+  output do
+    const :answer, String,
+      description: 'Answer based on the provided context'
+    const :confidence, Float,
+      description: 'Confidence in the answer (0.0 to 1.0)'
+  end
+end
+# Usage with retriever
+retriever = YourRetrieverClass.new
+qa = DSPy::ChainOfThought.new(ContextualQA)
+question = "What is the capital of France?"
+context = retriever.retrieve(question)  # Returns array of strings
+result = qa.call(question: question, context: context)
+puts result.reasoning   # Step-by-step reasoning
+puts result.answer      # "Paris"
+puts result.confidence  # 0.95
+```
+## Instrumentation & Observability
+DSPy.rb includes built-in instrumentation that captures detailed events and
+performance metrics from your LLM operations. Perfect for monitoring your
+applications and integrating with observability tools.
+### Quick Setup
+Enable instrumentation to start capturing events:
+```ruby
+DSPy::Instrumentation.configure do |config|
+  config.enabled = true
+end
+```
+### Available Events
+Subscribe to these events to monitor different aspects of your LLM operations:
+| Event Name | Triggered When | Key Payload Fields |
+|------------|----------------|-------------------|
+| `dspy.lm.request` | LLM API request lifecycle | `gen_ai_system`, `model`, `provider`, `duration_ms`, `status` |
+| `dspy.lm.tokens` | Token usage tracking | `tokens_input`, `tokens_output`, `tokens_total` |
+| `dspy.predict` | Prediction operations | `signature_class`, `input_size`, `duration_ms`, `status` |
+| `dspy.chain_of_thought` | CoT reasoning | `signature_class`, `model`, `duration_ms`, `status` |
+| `dspy.react` | Agent operations | `max_iterations`, `tools_used`, `duration_ms`, `status` |
+| `dspy.react.tool_call` | Tool execution | `tool_name`, `tool_input`, `tool_output`, `duration_ms` |
+### Event Payloads
+The instrumentation emits events with structured payloads you can process:
+```ruby
+# Example event payload for dspy.predict
+{
+  signature_class: "QuestionAnswering",
+  model: "gpt-4o-mini",
+  provider: "openai",
+  input_size: 45,
+  duration_ms: 1234.56,
+  cpu_time_ms: 89.12,
+  status: "success",
+  timestamp: "2024-01-15T10:30:00Z"
+}
+# Example token usage payload
+{
+  tokens_input: 150,
+  tokens_output: 45,
+  tokens_total: 195,
+  gen_ai_system: "openai",
+  signature_class: "QuestionAnswering"
+}
+```
+Events are emitted via dry-monitor notifications, giving you flexibility to
+process them however you need - logging, metrics, alerts, or custom monitoring.
+### Token Tracking
+Token usage is extracted from actual API responses (OpenAI and Anthropic only),
+giving you precise cost tracking:
+```ruby
+# Token events include:
+{
+  tokens_input: 150,     # From API response
+  tokens_output: 45,     # From API response
+  tokens_total: 195,     # From API response
+  gen_ai_system: "openai",
+  gen_ai_request_model: "gpt-4o-mini"
+}
+```
+### Configuration Options
+```ruby
+DSPy::Instrumentation.configure do |config|
+  config.enabled = true
+  config.log_to_stdout = false
+  config.log_file = 'log/dspy.log'
+  config.log_level = :info
+  # Custom payload enrichment
+  config.custom_options = lambda do |event|
+    {
+      timestamp: Time.current.iso8601,
+      hostname: Socket.gethostname,
+      request_id: Thread.current[:request_id]
+    }
+  end
+end
+```
+### Integration with Monitoring Tools
+Subscribe to events for custom processing:
+```ruby
+# Subscribe to all LM events
+DSPy::Instrumentation.subscribe('dspy.lm.*') do |event|
+  puts "#{event.id}: #{event.payload[:duration_ms]}ms"
+end
+# Subscribe to specific events
+DSPy::Instrumentation.subscribe('dspy.predict') do |event|
+  MyMetrics.histogram('dspy.predict.duration', event.payload[:duration_ms])
+end
+```
+## License
+This project is licensed under the MIT License.
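
The README's token events carry `tokens_input` and `tokens_output` per request. The sketch below shows one way a subscriber could aggregate those payloads per model. `TokenTotals` is a hypothetical class, and the payloads are plain hashes shaped like the README's examples so the code runs without the gem.

```ruby
# Hypothetical accumulator; payload keys follow the README's token event examples.
class TokenTotals
  attr_reader :totals

  def initialize
    # One running input/output counter per model name.
    @totals = Hash.new { |hash, model| hash[model] = { input: 0, output: 0 } }
  end

  # Intended to be called with an event payload hash.
  def record(payload)
    model = payload.fetch(:gen_ai_request_model, 'unknown')
    @totals[model][:input] += payload.fetch(:tokens_input, 0)
    @totals[model][:output] += payload.fetch(:tokens_output, 0)
  end
end

totals = TokenTotals.new
totals.record({ tokens_input: 150, tokens_output: 45, tokens_total: 195,
                gen_ai_system: 'openai', gen_ai_request_model: 'gpt-4o-mini' })
totals.record({ tokens_input: 50, tokens_output: 5, tokens_total: 55,
                gen_ai_system: 'openai', gen_ai_request_model: 'gpt-4o-mini' })

model_totals = totals.totals['gpt-4o-mini']
puts "input=#{model_totals[:input]} output=#{model_totals[:output]}" # prints "input=200 output=50"
```

With the gem loaded, this could presumably be wired up with the subscription form the README shows, for example `DSPy::Instrumentation.subscribe('dspy.lm.tokens') { |event| totals.record(event.payload) }`.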

data/lib/dspy/chain_of_thought.rb ADDED

@@ -0,0 +1,162 @@
+# typed: strict
+# frozen_string_literal: true
+require 'sorbet-runtime'
+require_relative 'predict'
+require_relative 'signature'
+require_relative 'instrumentation'
+module DSPy
+  # Enhances prediction by encouraging step-by-step reasoning
+  # before providing a final answer using Sorbet signatures.
+  class ChainOfThought < Predict
+    extend T::Sig
+    FieldDescriptor = DSPy::Signature::FieldDescriptor
+    sig { params(signature_class: T.class_of(DSPy::Signature)).void }
+    def initialize(signature_class)
+      @original_signature = signature_class
+      # Create enhanced output struct with reasoning
+      enhanced_output_struct = create_enhanced_output_struct(signature_class)
+      # Create enhanced signature class
+      enhanced_signature = Class.new(DSPy::Signature) do
+        # Set the description
+        description "#{signature_class.description} Think step by step."
+        # Use the same input struct and copy field descriptors
+        @input_struct_class = signature_class.input_struct_class
+        @input_field_descriptors = signature_class.instance_variable_get(:@input_field_descriptors) || {}
+        # Use the enhanced output struct and create field descriptors for it
+        @output_struct_class = enhanced_output_struct
+        # Create field descriptors for the enhanced output struct
+        @output_field_descriptors = {}
+        # Copy original output field descriptors
+        original_output_descriptors = signature_class.instance_variable_get(:@output_field_descriptors) || {}
+        @output_field_descriptors.merge!(original_output_descriptors)
+        # Add reasoning field descriptor (ChainOfThought always provides this)
+        @output_field_descriptors[:reasoning] = FieldDescriptor.new(String, "Step by step reasoning process")
+        class << self
+          attr_reader :input_struct_class, :output_struct_class
+        end
+      end
+      # Call parent constructor with enhanced signature
+      super(enhanced_signature)
+      @signature_class = enhanced_signature
+    end
+    # Override forward_untyped to add ChainOfThought-specific instrumentation
+    sig { override.params(input_values: T.untyped).returns(T.untyped) }
+    def forward_untyped(**input_values)
+      # Prepare instrumentation payload
+      input_fields = input_values.keys.map(&:to_s)
+      # Instrument ChainOfThought lifecycle
+      result = Instrumentation.instrument('dspy.chain_of_thought', {
+        signature_class: @original_signature.name,
+        model: lm.model,
+        provider: lm.provider,
+        input_fields: input_fields
+      }) do
+        # Call parent prediction logic
+        prediction_result = super(**input_values)
+        # Analyze reasoning if present
+        if prediction_result.respond_to?(:reasoning) && prediction_result.reasoning
+          reasoning_content = prediction_result.reasoning.to_s
+          reasoning_length = reasoning_content.length
+          reasoning_steps = count_reasoning_steps(reasoning_content)
+          # Emit reasoning analysis event
+          Instrumentation.emit('dspy.chain_of_thought.reasoning_complete', {
+            signature_class: @original_signature.name,
+            reasoning_steps: reasoning_steps,
+            reasoning_length: reasoning_length,
+            has_reasoning: !reasoning_content.empty?
+          })
+        end
+        prediction_result
+      end
+      result
+    end
+    private
+    # Count reasoning steps by looking for step indicators
+    def count_reasoning_steps(reasoning_text)
+      return 0 if reasoning_text.nil? || reasoning_text.empty?
+      # Look for common step patterns
+      step_patterns = [
+        /step \d+/i,
+        /\d+\./,
+        /first|second|third|then|next|finally/i,
+        /\n\s*-/
+      ]
+      max_count = 0
+      step_patterns.each do |pattern|
+        count = reasoning_text.scan(pattern).length
+        max_count = [max_count, count].max
+      end
+      # Fallback: count sentences if no clear steps
+      max_count > 0 ? max_count : reasoning_text.split(/[.!?]+/).reject(&:empty?).length
+    end
+    sig { params(signature_class: T.class_of(DSPy::Signature)).returns(T.class_of(T::Struct)) }
+    def create_enhanced_output_struct(signature_class)
+      # Get original output props
+      original_props = signature_class.output_struct_class.props
+      # Create new struct class with reasoning added
+      Class.new(T::Struct) do
+        # Add all original fields
+        original_props.each do |name, prop|
+          # Extract the type and other options
+          type = prop[:type]
+          options = prop.except(:type, :type_object, :accessor_key, :sensitivity, :redaction)
+          # Handle default values
+          if options.key?(:default)
+            const name, type, default: options[:default]
+          elsif options[:factory]
+            const name, type, factory: options[:factory]
+          else
+            const name, type
+          end
+        end
+        # Add reasoning field (ChainOfThought always provides this)
+        const :reasoning, String
+        # Add to_h method to serialize the struct to a hash
+        define_method :to_h do
+          hash = {}
+          # Start with input values if available
+          if self.instance_variable_defined?(:@input_values)
+            hash.merge!(self.instance_variable_get(:@input_values))
+          end
+          # Then add output properties
+          self.class.props.keys.each do |key|
+            hash[key] = self.send(key)
+          end
+          hash
+        end
+      end
+    end
+  end
+end
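
The `count_reasoning_steps` heuristic in this file can be tried in isolation. The sketch below is a standalone copy of the same patterns and fallback, restructured as a top-level method for experimentation. It is not the gem's public API, and the sample strings are made up.

```ruby
# Standalone copy of the step-counting heuristic, for experimentation only.
STEP_PATTERNS = [
  /step \d+/i,                              # "Step 1", "step 2"
  /\d+\./,                                  # "1.", "2."
  /first|second|third|then|next|finally/i,  # ordinal and sequencing words
  /\n\s*-/                                  # bullet lines
].freeze

def count_reasoning_steps(text)
  return 0 if text.nil? || text.empty?

  # Take the pattern that matches most often as the step count.
  max_count = STEP_PATTERNS.map { |pattern| text.scan(pattern).length }.max
  # Fall back to counting sentences when no step marker is found.
  max_count.positive? ? max_count : text.split(/[.!?]+/).reject(&:empty?).length
end

puts count_reasoning_steps("Step 1: list outcomes. Step 2: count favorable ones.") # prints "2"
puts count_reasoning_steps("There are 36 outcomes. Only one sums to two")          # prints "2"
puts count_reasoning_steps("The price is 3.50 dollars")                            # prints "1"
```

The last call shows a limitation of the pattern list: the decimal in "3.50" matches `/\d+\./` and is counted as a step, so the reported `reasoning_steps` value is best read as a rough estimate.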

data/lib/dspy/field.rb ADDED

@@ -0,0 +1,23 @@
+# frozen_string_literal: true
+module DSPy
+  class InputField
+    attr_reader :name, :type, :desc
+    def initialize(name, type, desc: nil)
+      @name = name
+      @type = type
+      @desc = desc
+    end
+  end
+  class OutputField
+    attr_reader :name, :type, :desc
+    def initialize(name, type, desc: nil)
+      @name = name
+      @type = type
+      @desc = desc
+    end
+  end
+end
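
`InputField` and `OutputField` above have identical bodies. One possible way to express that without the duplication is a shared base class, sketched below. The `FieldSketch` namespace and the `Field` class are hypothetical and are not what the gem ships.

```ruby
module FieldSketch
  # Shared reader and constructor logic for both field kinds.
  class Field
    attr_reader :name, :type, :desc

    def initialize(name, type, desc: nil)
      @name = name
      @type = type
      @desc = desc
    end
  end

  # The subclasses only mark the direction of the field.
  class InputField < Field; end
  class OutputField < Field; end
end

question = FieldSketch::InputField.new(:question, String, desc: 'The question to answer')
answer = FieldSketch::OutputField.new(:answer, String)

puts question.desc       # prints "The question to answer"
puts answer.desc.inspect # prints "nil"
```

Keeping two distinct classes still lets callers tell inputs from outputs with `is_a?`, which is presumably why the gem defines both.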