dspy 0.1.0 → 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +374 -3
- data/lib/dspy/chain_of_thought.rb +22 -0
- data/lib/dspy/ext/dry_schema.rb +94 -0
- data/lib/dspy/field.rb +23 -0
- data/lib/dspy/lm.rb +76 -0
- data/lib/dspy/module.rb +13 -0
- data/lib/dspy/predict.rb +72 -0
- data/lib/dspy/re_act.rb +253 -0
- data/lib/dspy/signature.rb +26 -0
- data/lib/dspy/sorbet_chain_of_thought.rb +91 -0
- data/lib/dspy/sorbet_module.rb +47 -0
- data/lib/dspy/sorbet_predict.rb +180 -0
- data/lib/dspy/sorbet_re_act.rb +332 -0
- data/lib/dspy/sorbet_signature.rb +218 -0
- data/lib/dspy/tools/sorbet_tool.rb +226 -0
- data/lib/dspy/tools.rb +21 -0
- data/lib/dspy/types.rb +3 -0
- data/lib/dspy.rb +29 -2
- metadata +117 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: e0ab7e7c2a8d741b4d0080ad96ce991752855d0bdc61393f6f329df404a5c956
|
4
|
+
data.tar.gz: e3f0a0c42b4b66e6a6427e7ff963524abb0059ac0ad78d18fdb12684240e9459
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 3530f0bb5a8cbfa5ffe99a4b0fb3e7d52f0e3c205501b62904b05b4e35f907e824a97a94f91f432e968aaf1360658757eb4f9a7a189238628f2187ad32ab4060
|
7
|
+
data.tar.gz: c737f35b0a17cd8ddf98a99656a1acf04b15753c5f3130a3a20627c751a814ec489fbd7ef640834c58b520ee3f7a68657e1473289b3a60bc9cd68cd673d90655
|
data/README.md
CHANGED
@@ -1,10 +1,381 @@
|
|
1
1
|
# DSPy.rb
|
2
2
|
|
3
|
-
A port of the DSPy library to Ruby.
|
3
|
+
A Ruby port of the [DSPy library](https://dspy.ai/), enabling a composable and pipeline-oriented approach to programming with Large Language Models (LLMs) in Ruby.
|
4
|
+
|
5
|
+
## Current State
|
6
|
+
|
7
|
+
DSPy.rb provides a foundation for composable LLM programming with the following implemented features:
|
8
|
+
|
9
|
+
- **Signatures**: Define input/output schemas for LLM interactions using JSON schemas
|
10
|
+
- **Predict**: Basic LLM completion with structured inputs and outputs
|
11
|
+
- **Chain of Thought**: Enhanced reasoning through step-by-step thinking
|
12
|
+
- **ReAct**: Compose multiple LLM calls in a structured workflow using tools.
|
13
|
+
- **RAG (Retrieval-Augmented Generation)**: Enriched responses with context from retrieval
|
14
|
+
- **Multi-stage Pipelines**: Compose multiple LLM calls in a structured workflow
|
15
|
+
|
16
|
+
The library currently supports:
|
17
|
+
- OpenAI and Anthropic via [Ruby LLM](https://github.com/crmne/ruby_llm)
|
18
|
+
- JSON schema validation with [dry-schema](https://dry-rb.org/gems/dry-schema/)
|
4
19
|
|
5
20
|
## Installation
|
6
21
|
|
7
|
-
|
8
|
-
|
22
|
+
This is not even fresh off the oven. I recommend you installing
|
23
|
+
it straight from this repo, while I build the first release.
|
24
|
+
|
25
|
+
```ruby
|
26
|
+
gem 'dspy', github: 'vicentereig/dspy.rb'
|
27
|
+
```
|
28
|
+
|
29
|
+
## Usage Examples
|
30
|
+
|
31
|
+
### Basic Prediction
|
32
|
+
|
33
|
+
```ruby
|
34
|
+
# Define a signature for sentiment classification
|
35
|
+
class Classify < DSPy::Signature
|
36
|
+
description "Classify sentiment of a given sentence."
|
37
|
+
|
38
|
+
input do
|
39
|
+
required(:sentence).value(:string).meta(description: 'The sentence to analyze')
|
40
|
+
end
|
41
|
+
|
42
|
+
output do
|
43
|
+
required(:sentiment).value(included_in?: %w(positive negative neutral))
|
44
|
+
.meta(description: 'The sentiment classification')
|
45
|
+
required(:confidence).value(:float).meta(description: 'Confidence score')
|
46
|
+
end
|
47
|
+
end
|
48
|
+
|
49
|
+
# Initialize the language model
|
50
|
+
class SentimentClassifierWithDescriptions < DSPy::Signature
|
51
|
+
description "Classify sentiment of a given sentence."
|
52
|
+
|
53
|
+
input do
|
54
|
+
required(:sentence)
|
55
|
+
.value(:string)
|
56
|
+
.meta(description: 'The sentence whose sentiment you are analyzing')
|
57
|
+
end
|
58
|
+
|
59
|
+
output do
|
60
|
+
required(:sentiment)
|
61
|
+
.value(included_in?: [:positive, :negative, :neutral])
|
62
|
+
.meta(description: 'The allowed values to classify sentences')
|
63
|
+
|
64
|
+
required(:confidence).value(:float)
|
65
|
+
.meta(description:'The confidence score for the classification')
|
66
|
+
end
|
67
|
+
end
|
68
|
+
DSPy.configure do |c|
|
69
|
+
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
|
70
|
+
end
|
71
|
+
# Create the predictor and run inference
|
72
|
+
classify = DSPy::Predict.new(Classify)
|
73
|
+
result = classify.call(sentence: "This book was super fun to read, though not the last chapter.")
|
74
|
+
# => {:confidence=>0.85, :sentence=>"This book was super fun to read, though not the last chapter.", :sentiment=>"positive"}
|
75
|
+
```
|
76
|
+
|
77
|
+
### Chain of Thought Reasoning
|
78
|
+
|
79
|
+
```ruby
|
80
|
+
class AnswerPredictor < DSPy::Signature
|
81
|
+
description "Provides a concise answer to the question"
|
82
|
+
|
83
|
+
input do
|
84
|
+
required(:question).value(:string)
|
85
|
+
end
|
86
|
+
|
87
|
+
output do
|
88
|
+
required(:answer).value(:string)
|
89
|
+
end
|
90
|
+
end
|
91
|
+
|
92
|
+
DSPy.configure do |c|
|
93
|
+
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
|
94
|
+
end
|
95
|
+
|
96
|
+
qa_cot = DSPy::ChainOfThought.new(AnswerPredictor)
|
97
|
+
response = qa_cot.call(question: "Two dice are tossed. What is the probability that the sum equals two?")
|
98
|
+
# Result includes reasoning and answer in the response
|
99
|
+
# {:question=>"...", :answer=>"1/36", :reasoning=>"There is only one way to get a sum of 2..."}
|
100
|
+
```
|
101
|
+
|
102
|
+
### RAG (Retrieval-Augmented Generation)
|
103
|
+
|
104
|
+
```ruby
|
105
|
+
class ContextualQA < DSPy::Signature
|
106
|
+
description "Answers questions using relevant context"
|
107
|
+
|
108
|
+
input do
|
109
|
+
required(:context).value(Types::Array.of(:string))
|
110
|
+
required(:question).filled(:string)
|
111
|
+
end
|
112
|
+
|
113
|
+
output do
|
114
|
+
required(:response).filled(:string)
|
115
|
+
end
|
116
|
+
end
|
117
|
+
|
118
|
+
DSPy.configure do |c|
|
119
|
+
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
|
120
|
+
end
|
121
|
+
|
122
|
+
# Set up retriever (example using ColBERT)
|
123
|
+
retriever = ColBERTv2.new(url: 'http://your-retriever-endpoint')
|
124
|
+
# Generate a contextual response
|
125
|
+
rag = DSPy::ChainOfThought.new(ContextualQA)
|
126
|
+
prediction = rag.call(question: question, context: retriever.call('your query').map(&:long_text))
|
9
127
|
```
|
10
128
|
|
129
|
+
### Multi-stage Pipeline
|
130
|
+
|
131
|
+
```ruby
|
132
|
+
# Create a pipeline for article drafting
|
133
|
+
class ArticleDrafter < DSPy::Module
|
134
|
+
def initialize
|
135
|
+
@build_outline = DSPy::ChainOfThought.new(Outline)
|
136
|
+
@draft_section = DSPy::ChainOfThought.new(DraftSection)
|
137
|
+
end
|
138
|
+
|
139
|
+
def forward(topic)
|
140
|
+
# First build the outline
|
141
|
+
outline = @build_outline.call(topic: topic)
|
142
|
+
|
143
|
+
# Then draft each section
|
144
|
+
sections = []
|
145
|
+
(outline[:section_subheadings] || {}).each do |heading, subheadings|
|
146
|
+
section = @draft_section.call(
|
147
|
+
topic: outline[:title],
|
148
|
+
section_heading: "## #{heading}",
|
149
|
+
section_subheadings: [subheadings].flatten.map { |sh| "### #{sh}" }
|
150
|
+
)
|
151
|
+
sections << section
|
152
|
+
end
|
153
|
+
|
154
|
+
DraftArticle.new(title: outline[:title], sections: sections)
|
155
|
+
end
|
156
|
+
end
|
157
|
+
|
158
|
+
DSPy.configure do |c|
|
159
|
+
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
|
160
|
+
end
|
161
|
+
# Usage
|
162
|
+
drafter = ArticleDrafter.new
|
163
|
+
article = drafter.call("World Cup 2002")
|
164
|
+
```
|
165
|
+
|
166
|
+
### ReAct: Reasoning and Acting with Tools
|
167
|
+
|
168
|
+
The `DSPy::ReAct` module implements the ReAct (Reasoning and Acting) paradigm, allowing LLMs to synergize reasoning with tool usage to answer complex questions or complete tasks. The agent iteratively generates thoughts, chooses actions (either calling a tool or finishing), and observes the results to inform its next step.
|
169
|
+
|
170
|
+
**Core Components:**
|
171
|
+
|
172
|
+
* **Signature**: Defines the overall task for the ReAct agent (e.g., answering a question). The output schema of this signature will be augmented by ReAct to include `history` (an array of structured thought/action/observation steps) and `iterations`.
|
173
|
+
* **Tools**: Instances of classes inheriting from `DSPy::Tools::Tool`. Each tool has a `name`, `description` (used by the LLM to decide when to use the tool), and a `call` method that executes the tool's logic.
|
174
|
+
* **LLM**: The ReAct agent internally uses an LLM (configured via `DSPy.configure`) to generate thoughts and decide on actions.
|
175
|
+
|
176
|
+
**Example 1: Simple Arithmetic with a Tool**
|
177
|
+
|
178
|
+
Let's say we want to answer "What is 5 plus 7?". We can provide the ReAct agent with a simple calculator tool.
|
179
|
+
|
180
|
+
```ruby
|
181
|
+
# Define a signature for the task
|
182
|
+
class MathQA < DSPy::Signature
|
183
|
+
description "Answers mathematical questions."
|
184
|
+
|
185
|
+
input do
|
186
|
+
required(:question).value(:string).meta(description: 'The math question to solve.')
|
187
|
+
end
|
188
|
+
|
189
|
+
output do
|
190
|
+
required(:answer).value(:string).meta(description: 'The numerical answer.')
|
191
|
+
end
|
192
|
+
end
|
193
|
+
|
194
|
+
# Define a simple calculator tool
|
195
|
+
class CalculatorTool < DSPy::Tools::Tool
|
196
|
+
def initialize
|
197
|
+
super('calculator', 'Calculates the result of a simple arithmetic expression (e.g., "5 + 7"). Input must be a string representing the expression.')
|
198
|
+
end
|
199
|
+
|
200
|
+
def call(expression_string)
|
201
|
+
# In a real scenario, you might use a more robust expression parser.
|
202
|
+
# For this example, let's assume simple addition for "X + Y" format.
|
203
|
+
if expression_string.match(/(\d+)\s*\+\s*(\d+)/)
|
204
|
+
num1 = $1.to_i
|
205
|
+
num2 = $2.to_i
|
206
|
+
(num1 + num2).to_s
|
207
|
+
else
|
208
|
+
"Error: Could not parse expression. Use format 'number + number'."
|
209
|
+
end
|
210
|
+
rescue StandardError => e
|
211
|
+
"Error: #{e.message}"
|
212
|
+
end
|
213
|
+
end
|
214
|
+
|
215
|
+
# Configure DSPy (if not already done)
|
216
|
+
DSPy.configure do |c|
|
217
|
+
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY'])
|
218
|
+
end
|
219
|
+
|
220
|
+
# Initialize ReAct agent with the signature and tool
|
221
|
+
calculator = CalculatorTool.new
|
222
|
+
react_agent = DSPy::ReAct.new(MathQA, tools: [calculator])
|
223
|
+
|
224
|
+
# Ask the question
|
225
|
+
question_text = "What is 5 plus 7?"
|
226
|
+
result = react_agent.forward(question: question_text)
|
227
|
+
|
228
|
+
puts "Question: #{question_text}"
|
229
|
+
puts "Answer: #{result.answer}"
|
230
|
+
puts "Iterations: #{result.iterations}"
|
231
|
+
puts "History:"
|
232
|
+
result.history.each do |entry|
|
233
|
+
puts " Step #{entry[:step]}:"
|
234
|
+
puts " Thought: #{entry[:thought]}"
|
235
|
+
puts " Action: #{entry[:action]}"
|
236
|
+
puts " Action Input: #{entry[:action_input]}"
|
237
|
+
puts " Observation: #{entry[:observation]}" if entry[:observation]
|
238
|
+
end
|
239
|
+
# Expected output (will vary based on LLM's reasoning):
|
240
|
+
# Question: What is 5 plus 7?
|
241
|
+
# Answer: 12
|
242
|
+
# Iterations: 2
|
243
|
+
# History:
|
244
|
+
# Step 1:
|
245
|
+
# Thought: I need to calculate 5 plus 7. I have a calculator tool that can do this.
|
246
|
+
# Action: calculator
|
247
|
+
# Action Input: 5 + 7
|
248
|
+
# Observation: 12
|
249
|
+
# Step 2:
|
250
|
+
# Thought: The calculator returned 12, which is the answer to "5 plus 7?". I can now finish.
|
251
|
+
# Action: finish
|
252
|
+
# Action Input: 12
|
253
|
+
```
|
254
|
+
|
255
|
+
**Example 2: Web Search with Serper.dev**
|
256
|
+
|
257
|
+
For questions requiring up-to-date information or broader knowledge, the ReAct agent can use a web search tool. Here's an example using the `serper.dev` API.
|
258
|
+
|
259
|
+
*Note: You'll need a Serper API key, which you can set in the `SERPER_API_KEY` environment variable.*
|
260
|
+
|
261
|
+
```ruby
|
262
|
+
require 'net/http'
|
263
|
+
require 'json'
|
264
|
+
require 'uri'
|
265
|
+
|
266
|
+
# Define a signature for web-based QA
|
267
|
+
class WebQuestionAnswer < DSPy::Signature
|
268
|
+
description "Answers questions that may require web searches."
|
269
|
+
|
270
|
+
input do
|
271
|
+
required(:question).value(:string).meta(description: 'The question to answer, potentially requiring a web search.')
|
272
|
+
end
|
273
|
+
|
274
|
+
output do
|
275
|
+
required(:answer).value(:string).meta(description: 'The final answer to the question.')
|
276
|
+
end
|
277
|
+
end
|
278
|
+
|
279
|
+
# Define the Serper Search Tool
|
280
|
+
class SerperSearchTool < DSPy::Tools::Tool
|
281
|
+
def initialize
|
282
|
+
super('web_search', 'Searches the web for a given query and returns the first organic result snippet. Useful for finding current information or answers to general knowledge questions.')
|
283
|
+
end
|
284
|
+
|
285
|
+
def call(query)
|
286
|
+
api_key = ENV['SERPER_API_KEY']
|
287
|
+
unless api_key
|
288
|
+
return "Error: SERPER_API_KEY environment variable not set."
|
289
|
+
end
|
290
|
+
|
291
|
+
uri = URI.parse("https://google.serper.dev/search")
|
292
|
+
request = Net::HTTP::Post.new(uri)
|
293
|
+
request['X-API-KEY'] = api_key
|
294
|
+
request['Content-Type'] = 'application/json'
|
295
|
+
request.body = JSON.dump({ q: query })
|
296
|
+
|
297
|
+
begin
|
298
|
+
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
|
299
|
+
http.request(request)
|
300
|
+
end
|
301
|
+
|
302
|
+
if response.is_a?(Net::HTTPSuccess)
|
303
|
+
results = JSON.parse(response.body)
|
304
|
+
first_organic_result = results['organic']&.first
|
305
|
+
if first_organic_result && first_organic_result['snippet']
|
306
|
+
return "Source: #{first_organic_result['link']}\nSnippet: #{first_organic_result['snippet']}"
|
307
|
+
elsif first_organic_result && first_organic_result['title']
|
308
|
+
return "Source: #{first_organic_result['link']}\nTitle: #{first_organic_result['title']}"
|
309
|
+
else
|
310
|
+
return "No relevant snippet found in the first result."
|
311
|
+
end
|
312
|
+
else
|
313
|
+
return "Error: Serper API request failed with status #{response.code} - #{response.body}"
|
314
|
+
end
|
315
|
+
rescue StandardError => e
|
316
|
+
return "Error performing web search: #{e.message}"
|
317
|
+
end
|
318
|
+
end
|
319
|
+
end
|
320
|
+
|
321
|
+
# Configure DSPy (if not already done)
|
322
|
+
DSPy.configure do |c|
|
323
|
+
c.lm = DSPy::LM.new('openai/gpt-4o-mini', api_key: ENV['OPENAI_API_KEY']) # Ensure your LM is configured
|
324
|
+
end
|
325
|
+
|
326
|
+
# Initialize ReAct agent with the signature and search tool
|
327
|
+
search_tool = SerperSearchTool.new
|
328
|
+
web_qa_agent = DSPy::ReAct.new(WebQuestionAnswer, tools: [search_tool])
|
329
|
+
|
330
|
+
# Ask a question requiring web search
|
331
|
+
question_text = "What is the latest news about the Mars rover Perseverance?"
|
332
|
+
result = web_qa_agent.forward(question: question_text)
|
333
|
+
|
334
|
+
puts "Question: #{question_text}"
|
335
|
+
puts "Answer: #{result.answer}"
|
336
|
+
puts "Iterations: #{result.iterations}"
|
337
|
+
puts "History (summary):"
|
338
|
+
result.history.each_with_index do |entry, index|
|
339
|
+
puts " Step #{entry[:step]}: Action: #{entry[:action]}, Input: #{entry[:action_input]&.slice(0, 50)}..."
|
340
|
+
# For brevity, not printing full thought/observation here.
|
341
|
+
end
|
342
|
+
# The answer and history will depend on the LLM's reasoning and live search results.
|
343
|
+
```
|
344
|
+
|
345
|
+
## Roadmap
|
346
|
+
|
347
|
+
### First Release
|
348
|
+
- [x] Signatures and Predict module
|
349
|
+
- [x] RAG examples
|
350
|
+
- [x] Multi-Stage Pipelines
|
351
|
+
- [x] Validate inputs and outputs with JSON Schema
|
352
|
+
- [x] thread-safe global config
|
353
|
+
- [x] Convert responses from hashes to Dry Poros (currently tons of footguns with hashes :fire:)
|
354
|
+
- [ ] Cover unhappy paths: validation errors
|
355
|
+
- [x] Implement ReAct module for reasoning and acting
|
356
|
+
- [ ] Add OpenTelemetry instrumentation
|
357
|
+
- [ ] Improve logging
|
358
|
+
- [ ] Add streaming support (?)
|
359
|
+
- [x] Ensure thread safety
|
360
|
+
- [ ] Comprehensive initial documentation, LLM friendly.
|
361
|
+
|
362
|
+
#### Backburner
|
363
|
+
|
364
|
+
- [ ] Support for multiple LM providers (Anthropic, etc.)
|
365
|
+
- [ ] Support for reasoning providers
|
366
|
+
- [ ] Adaptive Graph of Thoughts with Tools
|
367
|
+
|
368
|
+
### Optimizers
|
369
|
+
|
370
|
+
- [ ] Optimizing prompts: RAG
|
371
|
+
- [ ] Optimizing prompts: Chain of Thought
|
372
|
+
- [ ] Optimizing prompts: ReAct
|
373
|
+
- [ ] Optimizing weights: Classification
|
374
|
+
|
375
|
+
## Contributing
|
376
|
+
|
377
|
+
Contributions are welcome! Please feel free to submit a Pull Request.
|
378
|
+
|
379
|
+
## License
|
380
|
+
|
381
|
+
`dspy.rb` is released under the [MIT License](LICENSE).
|
@@ -0,0 +1,22 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module DSPy
|
4
|
+
# Enhances prediction by encouraging step-by-step reasoning
|
5
|
+
# before providing a final answer.
|
6
|
+
class ChainOfThought < Predict
|
7
|
+
|
8
|
+
def initialize(signature_class)
|
9
|
+
@signature_class = signature_class
|
10
|
+
chain_of_thought_schema = Dry::Schema.JSON do
|
11
|
+
required(:reasoning).
|
12
|
+
value(:string).
|
13
|
+
meta(description: "Reasoning: Let's think step by step in order to #{signature_class.description}")
|
14
|
+
end
|
15
|
+
@signature_class.output_schema = Dry::Schema.JSON(parent:
|
16
|
+
[
|
17
|
+
@signature_class.output_schema,
|
18
|
+
chain_of_thought_schema
|
19
|
+
])
|
20
|
+
end
|
21
|
+
end
|
22
|
+
end
|
@@ -0,0 +1,94 @@
|
|
1
|
+
require 'dry/schema/version'
|
2
|
+
|
3
|
+
if Dry::Schema::VERSION > Gem::Version.new('1.15')
|
4
|
+
raise 'Double check this monkey patch before upgrading dry-schema.'
|
5
|
+
end
|
6
|
+
|
7
|
+
Dry::Schema.load_extensions(:json_schema)
|
8
|
+
# Monkey patch Macros::Core to add meta method
|
9
|
+
module Dry
|
10
|
+
module Schema
|
11
|
+
module Macros
|
12
|
+
class Core
|
13
|
+
def meta(metadata)
|
14
|
+
schema_dsl.meta(name, metadata)
|
15
|
+
self
|
16
|
+
end
|
17
|
+
end
|
18
|
+
end
|
19
|
+
end
|
20
|
+
end
|
21
|
+
|
22
|
+
# Monkey patch DSL to store metadata
|
23
|
+
module Dry
|
24
|
+
module Schema
|
25
|
+
class DSL
|
26
|
+
def meta(name, metadata)
|
27
|
+
@metas ||= {}
|
28
|
+
@metas[name] = metadata
|
29
|
+
self
|
30
|
+
end
|
31
|
+
|
32
|
+
def metas
|
33
|
+
@metas ||= {}
|
34
|
+
end
|
35
|
+
|
36
|
+
# Ensure metas are included in new instances
|
37
|
+
alias_method :original_new, :new
|
38
|
+
def new(**options, &block)
|
39
|
+
options[:metas] = metas
|
40
|
+
original_new(**options, &block)
|
41
|
+
end
|
42
|
+
|
43
|
+
# Ensure processor has access to metas
|
44
|
+
alias_method :original_call, :call
|
45
|
+
def call
|
46
|
+
processor = original_call
|
47
|
+
processor.instance_variable_set(:@schema_metas, metas)
|
48
|
+
processor
|
49
|
+
end
|
50
|
+
end
|
51
|
+
end
|
52
|
+
end
|
53
|
+
|
54
|
+
# Monkey patch Processor to expose schema_metas
|
55
|
+
module Dry
|
56
|
+
module Schema
|
57
|
+
class Processor
|
58
|
+
attr_reader :schema_metas
|
59
|
+
|
60
|
+
# Add schema_metas accessor
|
61
|
+
def schema_metas
|
62
|
+
@schema_metas ||= {}
|
63
|
+
end
|
64
|
+
end
|
65
|
+
end
|
66
|
+
end
|
67
|
+
|
68
|
+
# Directly monkey patch the JSON Schema generation
|
69
|
+
module Dry
|
70
|
+
module Schema
|
71
|
+
module JSONSchema
|
72
|
+
module SchemaMethods
|
73
|
+
# Override the original json_schema method
|
74
|
+
def json_schema(loose: false)
|
75
|
+
compiler = SchemaCompiler.new(root: true, loose: loose)
|
76
|
+
compiler.call(to_ast)
|
77
|
+
result = compiler.to_hash
|
78
|
+
|
79
|
+
# Add descriptions to properties from schema_metas
|
80
|
+
if respond_to?(:schema_metas) && !schema_metas.empty?
|
81
|
+
schema_metas.each do |key, meta|
|
82
|
+
if meta[:description] && result[:properties][key]
|
83
|
+
result[:properties][key][:description] = meta[:description]
|
84
|
+
end
|
85
|
+
end
|
86
|
+
end
|
87
|
+
|
88
|
+
result
|
89
|
+
end
|
90
|
+
end
|
91
|
+
end
|
92
|
+
end
|
93
|
+
end
|
94
|
+
|
data/lib/dspy/field.rb
ADDED
@@ -0,0 +1,23 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module DSPy
|
4
|
+
class InputField
|
5
|
+
attr_reader :name, :type, :desc
|
6
|
+
|
7
|
+
def initialize(name, type, desc: nil)
|
8
|
+
@name = name
|
9
|
+
@type = type
|
10
|
+
@desc = desc
|
11
|
+
end
|
12
|
+
end
|
13
|
+
|
14
|
+
class OutputField
|
15
|
+
attr_reader :name, :type, :desc
|
16
|
+
|
17
|
+
def initialize(name, type, desc: nil)
|
18
|
+
@name = name
|
19
|
+
@type = type
|
20
|
+
@desc = desc
|
21
|
+
end
|
22
|
+
end
|
23
|
+
end
|
data/lib/dspy/lm.rb
ADDED
@@ -0,0 +1,76 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
require 'ruby_llm'
|
3
|
+
|
4
|
+
module DSPy
|
5
|
+
class LM
|
6
|
+
attr_reader :model_id, :api_key, :model, :provider
|
7
|
+
|
8
|
+
def initialize(model_id, api_key: nil)
|
9
|
+
@model_id = model_id
|
10
|
+
@api_key = api_key
|
11
|
+
# Configure RubyLLM with the API key if provided
|
12
|
+
if model_id.start_with?('openai/')
|
13
|
+
RubyLLM.configure do |config|
|
14
|
+
config.openai_api_key = api_key
|
15
|
+
end
|
16
|
+
@provider = :openai
|
17
|
+
@model = model_id.split('/').last
|
18
|
+
elsif model_id.start_with?('anthropic/')
|
19
|
+
RubyLLM.configure do |config|
|
20
|
+
config.anthropic_api_key = api_key
|
21
|
+
end
|
22
|
+
@provider = :anthropic
|
23
|
+
@model = model_id.split('/').last
|
24
|
+
else
|
25
|
+
raise ArgumentError, "Unsupported model provider: #{model_id}"
|
26
|
+
end
|
27
|
+
end
|
28
|
+
|
29
|
+
def chat(inference_module, input_values, &block)
|
30
|
+
signature_class = inference_module.signature_class
|
31
|
+
chat = RubyLLM.chat(model: model)
|
32
|
+
system_prompt = inference_module.system_signature
|
33
|
+
user_prompt = inference_module.user_signature(input_values)
|
34
|
+
chat.add_message role: :system, content: system_prompt
|
35
|
+
chat.ask(user_prompt, &block)
|
36
|
+
|
37
|
+
parse_response(chat.messages.last, input_values, signature_class)
|
38
|
+
end
|
39
|
+
|
40
|
+
private
|
41
|
+
def parse_response(response, input_values, signature_class)
|
42
|
+
# Try to parse the response as JSON
|
43
|
+
content = response.content
|
44
|
+
|
45
|
+
# Extract JSON if it's in a code block
|
46
|
+
if content.include?('```json')
|
47
|
+
content = content.split('```json').last.split('```').first.strip
|
48
|
+
elsif content.include?('```')
|
49
|
+
content = content.split('```').last.split('```').first.strip
|
50
|
+
end
|
51
|
+
|
52
|
+
begin
|
53
|
+
json_payload = JSON.parse(content)
|
54
|
+
|
55
|
+
# Handle different signature types
|
56
|
+
if signature_class < DSPy::SorbetSignature
|
57
|
+
# For Sorbet signatures, just return the parsed JSON
|
58
|
+
# The SorbetPredict will handle validation
|
59
|
+
json_payload
|
60
|
+
else
|
61
|
+
# Original dry-schema based handling
|
62
|
+
output = signature_class.output_schema.call(json_payload)
|
63
|
+
|
64
|
+
result_schema = Dry::Schema.JSON(parent: [signature_class.input_schema, signature_class.output_schema])
|
65
|
+
result = output.to_h.merge(input_values)
|
66
|
+
# create an instance with input and output schema
|
67
|
+
poro_result = result_schema.call(result)
|
68
|
+
|
69
|
+
poro_result.to_h
|
70
|
+
end
|
71
|
+
rescue JSON::ParserError
|
72
|
+
raise "Failed to parse LLM response as JSON: #{content}"
|
73
|
+
end
|
74
|
+
end
|
75
|
+
end
|
76
|
+
end
|
data/lib/dspy/module.rb
ADDED
data/lib/dspy/predict.rb
ADDED
@@ -0,0 +1,72 @@
|
|
1
|
+
# frozen_string_literal: true
|
2
|
+
|
3
|
+
module DSPy
|
4
|
+
class PredictionInvalidError < RuntimeError
|
5
|
+
attr_accessor :errors
|
6
|
+
def initialize(errors)
|
7
|
+
@errors = errors
|
8
|
+
super("Prediction invalid: #{errors.to_h}")
|
9
|
+
end
|
10
|
+
end
|
11
|
+
class Predict < DSPy::Module
|
12
|
+
attr_reader :signature_class
|
13
|
+
|
14
|
+
def initialize(signature_class)
|
15
|
+
@signature_class = signature_class
|
16
|
+
end
|
17
|
+
|
18
|
+
def system_signature
|
19
|
+
<<-PROMPT
|
20
|
+
Your input schema fields are:
|
21
|
+
```json
|
22
|
+
#{JSON.generate(@signature_class.input_schema.json_schema)}
|
23
|
+
```
|
24
|
+
Your output schema fields are:
|
25
|
+
```json
|
26
|
+
#{JSON.generate(@signature_class.output_schema.json_schema)}
|
27
|
+
```
|
28
|
+
All interactions will be structured in the following way, with the appropriate values filled in.
|
29
|
+
|
30
|
+
## Input values
|
31
|
+
```json
|
32
|
+
{input_values}
|
33
|
+
```
|
34
|
+
## Output values
|
35
|
+
Respond exclusively with the output schema fields in the json block below.
|
36
|
+
```json
|
37
|
+
{output_values}
|
38
|
+
```
|
39
|
+
|
40
|
+
In adhering to this structure, your objective is: #{@signature_class.description}
|
41
|
+
|
42
|
+
PROMPT
|
43
|
+
end
|
44
|
+
|
45
|
+
def user_signature(input_values)
|
46
|
+
<<-PROMPT
|
47
|
+
## Input Values
|
48
|
+
```json
|
49
|
+
#{JSON.generate(input_values)}
|
50
|
+
```
|
51
|
+
|
52
|
+
Respond with the corresponding output schema fields wrapped in a ```json ``` block,
|
53
|
+
starting with the heading `## Output values`.
|
54
|
+
PROMPT
|
55
|
+
end
|
56
|
+
|
57
|
+
def lm
|
58
|
+
DSPy.config.lm
|
59
|
+
end
|
60
|
+
|
61
|
+
def forward(**input_values)
|
62
|
+
DSPy.logger.info( module: self.class.to_s, **input_values)
|
63
|
+
result = @signature_class.input_schema.call(input_values)
|
64
|
+
if result.success?
|
65
|
+
output_attributes = lm.chat(self, input_values)
|
66
|
+
poro_class = Data.define(*output_attributes.keys)
|
67
|
+
return poro_class.new(*output_attributes.values)
|
68
|
+
end
|
69
|
+
raise PredictionInvalidError.new(result.errors)
|
70
|
+
end
|
71
|
+
end
|
72
|
+
end
|