RubyGems - ruby_llm-instructor - Versions diffs - 0.1.0 - Mend

ruby_llm-instructor 0.1.0

Files changed (8)

checksums.yaml +7 -0
data/README.md +308 -0
data/Rakefile +6 -0
data/lib/ruby_llm/instructor/adapters/ruby_llm_schema.rb +79 -0
data/lib/ruby_llm/instructor/client.rb +105 -0
data/lib/ruby_llm/instructor/version.rb +5 -0
data/lib/ruby_llm/instructor.rb +20 -0
metadata +157 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: efe5592fe81d6feb6e5a328c56ae00bca1bb3f8007826da4440d1bc042f12c7e
+  data.tar.gz: 0d5b70673f1da2ef41686603c1c50bb237cb6d7dd338d921f5ed894c38dcb2a0
+SHA512:
+  metadata.gz: 8b0fb08a2c1c5f93639cfa79256029c3433b6ec4459d28082a69373464d08b6d4cc743b9617e7a4fb5eb1856f68f5cdad4efe3bd7d174753b42ca7d7e5aab7b7
+  data.tar.gz: 0a536fbab7e9f6420e8d9899ec30d78232c4f55af2dd6ed5ce62b154dc7b22607196291c88e39c60ddeed6eefecb46e568e5375e5dcdf0682a39ff0ca3e976b6

data/README.md ADDED

@@ -0,0 +1,308 @@
+# ruby_llm-instructor
+[![CI](https://github.com/washu/ruby_llm-instructor/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/washu/ruby_llm-instructor/actions/workflows/ci.yml)
+Structured, validated outputs from LLMs for Ruby. Define a Ruby class, hand it to
+`RubyLLM::Instructor::Client`, and get back a fully-hydrated, validated instance —
+with automatic retries on validation failure.
+Part of the [RubyLLM ecosystem](https://rubyllm.com/ecosystem/). Built on top of
+[`ruby_llm`](https://github.com/crmne/ruby_llm), so the same code works against
+OpenAI, Anthropic, Gemini, and every other provider `ruby_llm` supports.
+## Installation
+```ruby
+gem "ruby_llm-instructor"
+```
+```bash
+bundle install
+```
+## Quick start
+Configure `RubyLLM` with your API key(s), then pass any Ruby class as `response_model`:
+```ruby
+require "ruby_llm"
+require "ruby_llm/instructor"
+RubyLLM.configure do |config|
+  config.openai_api_key = ENV["OPENAI_API_KEY"]
+end
+class UserProfile
+  attr_accessor :name, :email
+end
+instructor = RubyLLM::Instructor::Client.new
+user = instructor.chat(
+  model: "gpt-4o",
+  response_model: UserProfile,
+  prompt: "Extract information: My name is Sal, reached at sal@example.com"
+)
+user.name   # => "Sal"
+user.email  # => "sal@example.com"
+user.class  # => UserProfile
+```
+## Supported response model types
+`ruby_llm-instructor` uses duck-typing — no base class or mixin required. The JSON
+schema sent to the LLM is inferred automatically from your class's shape.
+### Plain Ruby class (PORO)
+Schema inferred from `attr_accessor` setters. No validation — any response is accepted.
+```ruby
+class UserProfile
+  attr_accessor :name, :email
+end
+```
+### ActiveModel
+Add validations; `ruby_llm-instructor` calls `valid?` automatically and feeds
+error messages back to the LLM on retry.
+```ruby
+require "active_model"
+class LeadCapture
+  include ActiveModel::Model
+  include ActiveModel::Attributes
+  attribute :company, :string
+  attribute :phone,   :string
+  attribute :revenue, :integer
+  validates :company, presence: true
+  validates :phone, format: { with: /\A\+?\d{10,15}\z/, message: "must be a valid phone number" }
+end
+instructor = RubyLLM::Instructor::Client.new
+lead = instructor.chat(
+  model: "claude-3-5-sonnet",
+  response_model: LeadCapture,
+  prompt: "Inbound transcript: We are Stripe, call us at +15550192831. ARR is $4B."
+)
+lead.company # => "Stripe"
+lead.phone   # => "+15550192831"
+lead.revenue # => 4000000000
+```
+Using `ActiveModel::Attributes` also improves the JSON schema sent to the LLM —
+field types (`integer`, `number`, `boolean`) are inferred from your attribute
+declarations rather than defaulting to `string`.
+### dry-validation (native contract)
+Pass a `Dry::Validation::Contract` subclass directly. The JSON schema is built
+automatically from the contract's params block, and validation runs through the
+contract itself — no bridge required.
+```ruby
+require "dry-validation"
+class PersonContract < Dry::Validation::Contract
+  params do
+    required(:name).filled(:string)
+    required(:email).filled(:string)
+  end
+  rule(:email) { key.failure("must include @") unless value.include?("@") }
+end
+instructor = RubyLLM::Instructor::Client.new
+person = instructor.chat(
+  model: "gpt-4o",
+  response_model: PersonContract,
+  prompt: "Sal Scotto, sal@example.com"
+)
+person.name    # => "Sal Scotto"
+person.email   # => "sal@example.com"
+person.frozen? # => true  (returned as a Data object)
+```
+The returned instance is a `Data.define` value object with one member per contract
+field — immutable and frozen.
+#### Duck-typed bridge (alternative)
+If you prefer to keep your domain class, bridge dry-validation's result to
+`valid?` / `errors.full_messages` and it works the same way:
+```ruby
+class PersonDry
+  attr_accessor :name, :email
+  CONTRACT = Class.new(Dry::Validation::Contract) do
+    params do
+      required(:name).filled(:string)
+      required(:email).filled(:string)
+    end
+    rule(:email) { key.failure("must include @") unless value.include?("@") }
+  end
+  def valid?
+    @result = CONTRACT.new.call(name: @name, email: @email)
+    @result.success?
+  end
+  def errors
+    DryErrors.new(@result)
+  end
+  DryErrors = Struct.new(:result) do
+    def full_messages
+      return [] unless result
+      result.errors.to_h.flat_map { |field, msgs| msgs.map { |m| "#{field} #{m}" } }
+    end
+  end
+end
+```
+### Ruby `Data.define` (immutable value object)
+Members are inferred automatically. The returned instance is frozen.
+```ruby
+Person = Data.define(:name, :email)
+person = instructor.chat(
+  model: "gpt-4o",
+  response_model: Person,
+  prompt: "Sal Scotto, sal@example.com"
+)
+person.name    # => "Sal Scotto"
+person.frozen? # => true
+```
+### Struct
+```ruby
+Address = Struct.new(:street, :city, :zip, keyword_init: true)
+address = instructor.chat(
+  model: "gpt-4o",
+  response_model: Address,
+  prompt: "Ship to: 123 Main St, Springfield, 62701"
+)
+address.city # => "Springfield"
+```
+### Custom schema
+If your class defines `to_json_schema` (class or instance method), the adapter uses
+it directly instead of introspecting setters — giving you full control over the schema
+sent to the LLM while keeping the normal hydration and validation flow.
+```ruby
+class Article
+  attr_accessor :title, :status
+  def self.to_json_schema
+    {
+      name: "article",
+      schema: {
+        type: "object",
+        properties: {
+          title:  { type: "string", description: "Article headline" },
+          status: { type: "string", enum: %w[draft published archived] }
+        },
+        required: %w[title status]
+      }
+    }
+  end
+end
+```
+## Streaming
+Pass a `stream:` proc to receive chunks as they arrive. The final hydrated object
+is still returned once the response completes.
+```ruby
+instructor.chat(
+  model: "gpt-4o",
+  response_model: UserProfile,
+  prompt: "...",
+  stream: ->(chunk) { print chunk.content }
+)
+```
+## Extraction mode: schema vs tools
+By default `ruby_llm-instructor` uses `mode: :schema` — structured output via the
+provider's native JSON schema constraint. Pass `mode: :tools` to use function
+calling instead, which works with older models that pre-date structured output.
+```ruby
+# Default — structured output (recommended for modern models)
+instructor.chat(model: "gpt-4o", response_model: MyModel, prompt: "...", mode: :schema)
+# Function-calling fallback — works with older models
+instructor.chat(model: "gpt-3.5-turbo", response_model: MyModel, prompt: "...", mode: :tools)
+```
+## Auto-retry on validation failure
+When the LLM returns data that fails `valid?`, `ruby_llm-instructor` feeds the
+error messages back to the model and asks for a corrected response — up to
+`max_retries` times (default: 3). If all retries are exhausted, a `RuntimeError`
+is raised.
+```ruby
+instructor.chat(
+  model: "gpt-4o",
+  response_model: LeadCapture,
+  prompt: "...",
+  max_retries: 5
+)
+```
+## One model, any provider
+The `model:` string is passed straight through to `ruby_llm`:
+```ruby
+# OpenAI
+instructor.chat(model: "gpt-4o", ...)
+# Anthropic
+instructor.chat(model: "claude-3-5-sonnet", ...)
+# Ollama (local)
+instructor.chat(model: "llama3", ...)
+```
+## What's in v0.1
+- All `ruby_llm`-supported providers (OpenAI, Anthropic, Gemini, Ollama, …)
+- Response models: PORO, ActiveModel, native dry-validation contract, duck-typed dry-validation bridge, `Data.define`, `Struct`, custom `to_json_schema`
+- Type inference from `ActiveModel::Attributes` (integer, number, boolean)
+- Required vs. optional fields from presence validators
+- Automatic retry-on-validation-failure with corrective prompt
+- Streaming via `stream:` proc
+- Function-calling fallback via `mode: :tools`
+## Development
+```bash
+bin/setup
+bundle exec rspec
+```
+## License
+MIT
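
Reviewer note on the README's "Auto-retry on validation failure" section: the sketch below illustrates the counting behind `max_retries`, assuming the value counts retries rather than total calls, as the client listing in this diff suggests. Under that assumption, `max_retries: 3` allows up to four model calls. Everything in the sketch is hypothetical: `extract_with_retries` and `canned_responses` stand in for the real client and LLM replies, and the "@" check stands in for a model's `valid?`.

```ruby
# Minimal sketch of a retry-on-validation-failure loop (not the gem's code).
class ValidationError < StandardError; end

def extract_with_retries(canned_responses, max_retries: 3)
  attempts = 0
  retries = 0
  begin
    attempts += 1
    # Hypothetical stand-in for an LLM call: take the next canned reply.
    payload = canned_responses.shift || {}
    unless payload[:email].to_s.include?("@")
      raise ValidationError, "email must include @"
    end
    [payload, attempts]
  rescue ValidationError => e
    if retries < max_retries
      retries += 1
      retry # re-runs the begin block; a real client would send a corrective prompt
    else
      raise "validation still failing after #{attempts} calls: #{e.message}"
    end
  end
end

payload, attempts = extract_with_retries([{ email: "sal" }, { email: "sal@example.com" }])
puts "accepted #{payload[:email]} on call #{attempts}"
```

With an empty reply list and `max_retries: 3`, the sketch raises after the fourth call, which is the behaviour to keep in mind when budgeting API usage.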

data/Rakefile ADDED

@@ -0,0 +1,6 @@
+require "bundler/gem_tasks"
+require "rspec/core/rake_task"
+RSpec::Core::RakeTask.new(:spec)
+task default: :spec

data/lib/ruby_llm/instructor/adapters/ruby_llm_schema.rb ADDED

@@ -0,0 +1,79 @@
+module RubyLLM
+  module Instructor
+    module Adapters
+      class RubyLlmSchemaAdapter
+        def initialize(model_klass)
+          @klass = model_klass
+        end
+        def build_schema
+          return build_dry_contract_schema if dry_contract?
+          return @klass.to_json_schema if @klass.respond_to?(:to_json_schema)
+          return @klass.new.to_json_schema if @klass.method_defined?(:to_json_schema)
+          attrs = attribute_definitions
+          RubyLLM::Schema.create do
+            attrs.each do |name, type, required|
+              opts = { required: required, description: "Extracted value for #{name}" }
+              case type
+              when :integer then integer name, **opts
+              when :number  then number  name, **opts
+              when :boolean then boolean name, **opts
+              else               string  name, **opts
+              end
+            end
+          end
+        end
+        private
+        def attribute_definitions
+          if @klass.respond_to?(:members)
+            @klass.members.map { |m| [m.to_sym, :string, true] }
+          elsif @klass.respond_to?(:attribute_types)
+            required = presence_validated_fields
+            @klass.attribute_types.filter_map do |name, type|
+              next if name == "id"
+              [name.to_sym, map_active_model_type(type), required.include?(name)]
+            end
+          else
+            @klass.instance_methods(false)
+                  .select { |m| m.to_s.end_with?("=") }
+                  .map    { |m| [m.to_s.chomp("=").to_sym, :string, true] }
+          end
+        end
+        def map_active_model_type(type)
+          case type.type
+          when :integer         then :integer
+          when :float, :decimal then :number
+          when :boolean         then :boolean
+          else                       :string
+          end
+        end
+        def dry_contract?
+          defined?(Dry::Validation::Contract) &&
+            @klass.is_a?(Class) &&
+            @klass < Dry::Validation::Contract
+        rescue TypeError
+          false
+        end
+        def build_dry_contract_schema
+          raw = @klass.schema.json_schema
+          { name: "response", schema: raw.reject { |k, _| k.to_s == "$schema" } }
+        end
+        def presence_validated_fields
+          return [] unless @klass.respond_to?(:_validators)
+          @klass._validators.select { |_, validators|
+            validators.any? { |v| v.is_a?(ActiveModel::Validations::PresenceValidator) }
+          }.keys.map(&:to_s)
+        end
+      end
+    end
+  end
+end
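
Reviewer note on `attribute_definitions`: the fallback branch selects every instance method whose name ends with `=`. The sketch below reproduces that selection in isolation to show one consequence that appears to follow from it: operator methods such as `==` also end with `=`, so a class that defines `==` would seem to contribute a field named `=`. The helper names `inferred_fields` and `strict_fields` are hypothetical and not part of the gem. The stricter pattern is one possible alternative, not the author's approach.

```ruby
# Sketch of setter-based field inference, modelled on the fallback branch above.
def inferred_fields(klass)
  return klass.members.map(&:to_sym) if klass.respond_to?(:members)

  klass.instance_methods(false)
       .select { |m| m.to_s.end_with?("=") }
       .map { |m| m.to_s.chomp("=").to_sym }
end

# Stricter variant (an assumption, not the gem's behaviour):
# accept only identifier-style setters such as `name=`.
def strict_fields(klass)
  klass.instance_methods(false)
       .grep(/\A[a-zA-Z_]\w*=\z/)
       .map { |m| m.to_s.chomp("=").to_sym }
end

class Money
  attr_accessor :amount, :currency

  def ==(other)
    other.is_a?(Money) && amount == other.amount && currency == other.currency
  end
end

Address = Struct.new(:street, :city, keyword_init: true)

p inferred_fields(Address)     # Struct members are used as-is
p inferred_fields(Money).sort  # includes :"=" because "==" ends with "="
p strict_fields(Money).sort    # only :amount and :currency
```

Because `instance_methods(false)` excludes inherited methods, setters defined in a superclass would also be missed by both variants.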

data/lib/ruby_llm/instructor/client.rb ADDED

@@ -0,0 +1,105 @@
+# frozen_string_literal: true
+module RubyLLM
+  module Instructor
+    class Client
+      def chat(model:, response_model:, prompt:, max_retries: 3, stream: nil, mode: :schema)
+        compiled_schema = Adapters::RubyLlmSchemaAdapter.new(response_model).build_schema
+        current_prompt = prompt
+        retries = 0
+        begin
+          session = RubyLLM.chat(model: model)
+          response = mode == :tools ? via_tools(session, compiled_schema, current_prompt, stream)
+                                    : via_schema(session, compiled_schema, current_prompt, stream)
+          parsed_data = response.content
+          unless parsed_data.is_a?(Hash)
+            raise ValidationError,
+                  "Expected a structured JSON object matching the schema, " \
+                  "got #{parsed_data.class} (#{parsed_data.inspect[0, 200]})"
+          end
+          errors = validate_payload(response_model, parsed_data)
+          raise ValidationError, errors.join(", ") if errors.any?
+          build_instance(response_model, parsed_data)
+        rescue ValidationError => e
+          if retries < max_retries
+            retries += 1
+            current_prompt = "Your structural response failed local validation rules: #{e.message}. Please fix the data matching the schema parameters perfectly."
+            retry
+          else
+            raise "ruby_llm-instructor failed validation after #{max_retries} retries. Errors: #{e.message}"
+          end
+        end
+      end
+      private
+      def via_schema(session, schema, prompt, stream)
+        session.with_schema(schema).ask(prompt, &stream)
+      end
+      def via_tools(session, schema, prompt, stream)
+        tool = extraction_tool_for(schema)
+        session.with_tool(tool, choice: :required, calls: :one).ask(prompt, &stream)
+      end
+      def extraction_tool_for(schema)
+        tool_params = schema.is_a?(Hash) ? (schema[:schema] || schema["schema"] || schema) : schema
+        Class.new(RubyLLM::Tool) do
+          description "Extract and return the structured data from the text"
+          singleton_class.define_method(:name) { "RubyLLMInstructorExtract" }
+          params tool_params
+          def execute(**args) = halt(args)
+        end
+      end
+      def dry_contract?(klass)
+        defined?(Dry::Validation::Contract) &&
+          klass.is_a?(Class) &&
+          klass < Dry::Validation::Contract
+      rescue TypeError
+        false
+      end
+      def validate_payload(response_model, parsed_data)
+        if dry_contract?(response_model)
+          result = response_model.new.call(parsed_data.transform_keys(&:to_sym))
+          return [] if result.success?
+          return result.errors.to_h.flat_map { |field, msgs| msgs.map { |m| "#{field} #{m}" } }
+        end
+        return [] unless response_model.respond_to?(:new)
+        instance = build_instance(response_model, parsed_data)
+        return [] unless instance.respond_to?(:valid?) && !instance.valid?
+        if instance.respond_to?(:errors) && instance.errors.respond_to?(:full_messages)
+          instance.errors.full_messages
+        else
+          ["Validation failed"]
+        end
+      end
+      def build_instance(response_model, parsed_data)
+        if dry_contract?(response_model)
+          fields = response_model.schema.key_map.map(&:name).map(&:to_sym)
+          return Data.define(*fields).new(**parsed_data.transform_keys(&:to_sym).slice(*fields))
+        end
+        if response_model.respond_to?(:members)
+          response_model.new(**parsed_data.transform_keys(&:to_sym))
+        else
+          instance = response_model.new
+          parsed_data.each do |key, value|
+            instance.send("#{key}=", value) if instance.respond_to?("#{key}=")
+          end
+          instance
+        end
+      end
+    end
+  end
+end
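
Reviewer note on `build_instance`: the method has two hydration paths, keyword construction for classes that respond to `members` and setter calls for everything else. The sketch below isolates the two paths to show how they differ on unexpected keys. `hydrate` is a hypothetical name used only for illustration, and it uses `public_send` where the listing uses `send`.

```ruby
# Sketch of the two hydration paths, modelled on build_instance above.
def hydrate(model, data)
  if model.respond_to?(:members)
    # Struct/Data style: keyword construction, so key mismatches can raise.
    model.new(**data.transform_keys(&:to_sym))
  else
    # PORO style: call each matching setter and skip keys without one.
    instance = model.new
    data.each do |key, value|
      setter = "#{key}="
      instance.public_send(setter, value) if instance.respond_to?(setter)
    end
    instance
  end
end

Address = Struct.new(:street, :city, :zip, keyword_init: true)

class UserProfile
  attr_accessor :name, :email
end

address = hydrate(Address, { "city" => "Springfield", "zip" => "62701" })
puts address.city

user = hydrate(UserProfile, { "name" => "Sal", "nickname" => "ignored" })
puts user.name

begin
  hydrate(Address, { "country" => "US" })
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

In the listing, only `ValidationError` is rescued by the retry loop, so an `ArgumentError` raised on the keyword path would presumably propagate to the caller instead of triggering a corrective retry.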

data/lib/ruby_llm/instructor/version.rb ADDED

@@ -0,0 +1,5 @@
+module RubyLLM
+  module Instructor
+    VERSION = "0.1.0"
+  end
+end

data/lib/ruby_llm/instructor.rb ADDED

@@ -0,0 +1,20 @@
+require "ruby_llm"
+require "ruby_llm/schema"
+require "active_model"
+require_relative "instructor/version"
+require_relative "instructor/adapters/ruby_llm_schema"
+require_relative "instructor/client"
+begin
+  require "dry-validation"
+  require "dry/schema"
+  Dry::Schema.load_extensions(:json_schema)
+rescue LoadError
+  nil
+end
+module RubyLLM
+  module Instructor
+    class ValidationError < StandardError; end
+  end
+end
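
Reviewer note on optional dependencies: this file loads dry-validation inside a `begin`/`rescue LoadError` block, and the client and adapter pair that with a `defined?` check before touching `Dry::Validation::Contract`. By contrast, `active_model` is required unconditionally here, while the gem metadata in this diff lists `activemodel` as a development dependency, so an install without it would presumably fail at `require` time. The sketch below shows the guard pattern in isolation. `optional_require` and `contract_class?` are hypothetical helper names, and the gem name is deliberately fictitious so the `LoadError` branch runs.

```ruby
# Sketch of optional-dependency loading with a LoadError guard.
def optional_require(name)
  require name
  true
rescue LoadError
  false
end

def contract_class?(klass)
  # defined? returns nil (no NameError) when the constant is missing.
  return false unless defined?(Dry::Validation::Contract)

  klass.is_a?(Class) && klass < Dry::Validation::Contract ? true : false
end

loaded = optional_require("a_gem_that_is_not_installed_anywhere")
puts "optional gem loaded: #{loaded}"
puts "String is a contract: #{contract_class?(String)}"
```

The same guard could be applied to the ActiveModel require if the library is meant to work without it. Whether that is the author's intent is not stated in the diff.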

metadata ADDED

@@ -0,0 +1,157 @@
+--- !ruby/object:Gem::Specification
+name: ruby_llm-instructor
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- Sal Scotto Di Luzio
+bindir: bin
+cert_chain: []
+date: 1980-01-02 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: ruby_llm
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.15.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 1.15.0
+- !ruby/object:Gem::Dependency
+  name: ruby_llm-schema
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.4.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: 0.4.0
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '3.0'
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '2.0'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
+- !ruby/object:Gem::Dependency
+  name: activemodel
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '7.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '7.0'
+- !ruby/object:Gem::Dependency
+  name: dry-validation
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '1.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '1.0'
+- !ruby/object:Gem::Dependency
+  name: simplecov
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.22'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '0.22'
+description: Validates and coerces unstructured LLM responses directly into rich,
+  schema-validated Ruby objects with automatic self-correction loops.
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- README.md
+- Rakefile
+- lib/ruby_llm/instructor.rb
+- lib/ruby_llm/instructor/adapters/ruby_llm_schema.rb
+- lib/ruby_llm/instructor/client.rb
+- lib/ruby_llm/instructor/version.rb
+licenses:
+- MIT
+metadata: {}
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubygems_version: 3.6.9
+specification_version: 4
+summary: Structured outputs for LLMs in Ruby, powered by RubyLLM. Can be used with
+  ActiveModel, dry-validation, or POROs that define a valid? method.
+test_files: []