langchainrb 0.8.2 → 0.9.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +53 -25
- data/lib/langchain/assistants/assistant.rb +199 -0
- data/lib/langchain/assistants/message.rb +58 -0
- data/lib/langchain/assistants/thread.rb +34 -0
- data/lib/langchain/conversation/memory.rb +1 -6
- data/lib/langchain/conversation.rb +7 -18
- data/lib/langchain/llm/ai21.rb +1 -1
- data/lib/langchain/llm/azure.rb +10 -97
- data/lib/langchain/llm/base.rb +1 -0
- data/lib/langchain/llm/cohere.rb +4 -6
- data/lib/langchain/llm/google_palm.rb +2 -0
- data/lib/langchain/llm/google_vertex_ai.rb +12 -10
- data/lib/langchain/llm/openai.rb +104 -160
- data/lib/langchain/llm/replicate.rb +0 -6
- data/lib/langchain/llm/response/anthropic_response.rb +4 -0
- data/lib/langchain/llm/response/google_palm_response.rb +4 -0
- data/lib/langchain/llm/response/ollama_response.rb +4 -0
- data/lib/langchain/llm/response/openai_response.rb +8 -0
- data/lib/langchain/tool/base.rb +24 -0
- data/lib/langchain/tool/google_search.rb +1 -4
- data/lib/langchain/utils/token_length/ai21_validator.rb +2 -2
- data/lib/langchain/utils/token_length/cohere_validator.rb +2 -2
- data/lib/langchain/utils/token_length/google_palm_validator.rb +2 -2
- data/lib/langchain/utils/token_length/openai_validator.rb +2 -2
- data/lib/langchain/version.rb +1 -1
- data/lib/langchain.rb +2 -1
- metadata +8 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eb443fa9eb8f0f9ee32fcef7b413d6825a4c45779c14551e03e71878215560d9
+  data.tar.gz: ed70c0b23899598c04fc6c6178466f2bda354f2483f712e83eeb9797f55c38ef
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1cf9baef16a801a1fd81ab6cb1ee89ab297fb8bc633a15641d125e44b1f4121208ec5e41f1c79ac49f93d27e60be899e030ec2bfb99359d6dc983b99398302ce
+  data.tar.gz: af0961e7ee973c0fd35f6e44206f66f7f598c0213a4ec71fb5a7608a58cf56336a6a1d700341c53ad6c0c65fb8eebd9bd382b54829821caa5219dda0089ca8f2
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,9 @@
 ## [Unreleased]
 
+## [0.9.0]
+- Introducing new `Langchain::Assistant` that will replace `Langchain::Conversation` and `Langchain::Agent`s.
+- `Langchain::Conversation` is deprecated.
+
 ## [0.8.2]
 - Introducing new `Langchain::Chunker::Markdown` chunker (thanks @spikex)
 - Fixes
data/README.md
CHANGED
@@ -1,6 +1,3 @@
-# Please fill out the [Ruby AI Survey 2023](https://docs.google.com/forms/d/1dH_0js1wpEyh1YqPTOxU3b5fXj76sb5lYp12lVoNNZE/edit).
-Results will be anonymized and shared!
-
 💎🔗 Langchain.rb
 ---
 ⚡ Building LLM-powered applications in Ruby ⚡
@@ -18,8 +15,7 @@ Available for paid consulting engagements! [Email me](mailto:andrei@sourcelabs.i
 
 ## Use Cases
 * Retrieval Augmented Generation (RAG) and vector search
-*
-* [AI agents](https://github.com/andreibondarev/langchainrb/tree/main/lib/langchain/agent/agents.md)
+* [Assistants](#assistants) (chat bots) & [AI Agents](https://github.com/andreibondarev/langchainrb/tree/main/lib/langchain/agent/agents.md)
 
 ## Table of Contents
 
@@ -29,7 +25,7 @@ Available for paid consulting engagements! [Email me](mailto:andrei@sourcelabs.i
 - [Prompt Management](#prompt-management)
 - [Output Parsers](#output-parsers)
 - [Building RAG](#building-retrieval-augment-generation-rag-system)
-- [
+- [Assistants](#assistants)
 - [Evaluations](#evaluations-evals)
 - [Examples](#examples)
 - [Logging](#logging)
@@ -64,7 +60,7 @@ Langchain.rb wraps all supported LLMs in a unified interface allowing you to eas
 | [AWS Bedrock](https://aws.amazon.com/bedrock?utm_source=langchainrb&utm_medium=github) | ✅ | ✅ | ❌ | ❌ | Provides AWS, Cohere, AI21, Anthropic and Stability AI models |
 | [Cohere](https://cohere.com/?utm_source=langchainrb&utm_medium=github) | ✅ | ✅ | ✅ | ✅ | |
 | [GooglePalm](https://ai.google/discover/palm2?utm_source=langchainrb&utm_medium=github) | ✅ | ✅ | ✅ | ✅ | |
-| [Google Vertex AI](https://cloud.google.com/vertex-ai?utm_source=langchainrb&utm_medium=github) | ✅ |
+| [Google Vertex AI](https://cloud.google.com/vertex-ai?utm_source=langchainrb&utm_medium=github) | ✅ | ✅ | ❌ | ✅ | |
 | [HuggingFace](https://huggingface.co/?utm_source=langchainrb&utm_medium=github) | ✅ | ❌ | ❌ | ❌ | |
 | [Ollama](https://ollama.ai/?utm_source=langchainrb&utm_medium=github) | ✅ | ✅ | ❌ | ❌ | |
 | [Replicate](https://replicate.com/?utm_source=langchainrb&utm_medium=github) | ✅ | ✅ | ✅ | ✅ | |
@@ -73,7 +69,7 @@ Langchain.rb wraps all supported LLMs in a unified interface allowing you to eas
 
 #### OpenAI
-Add `gem "ruby-openai", "~> 6.
+Add `gem "ruby-openai", "~> 6.3.0"` to your Gemfile.
 
 ```ruby
 llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
@@ -405,43 +401,75 @@ client.ask(
 )
 ```
 
-##
+## Evaluations (Evals)
+The Evaluations module is a collection of tools that can be used to evaluate and track the performance of the outputs produced by LLMs and your RAG (Retrieval Augmented Generation) pipelines.
 
-
+## Assistants
+Assistants are Agent-like objects that leverage helpful instructions, LLMs, tools and knowledge to respond to user queries. Assistants can be configured with an LLM of your choice (currently only OpenAI), any vector search database, and easily extended with additional tools.
 
-
+### Creating an Assistant
+1. Instantiate an LLM of your choice
 ```ruby
 llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
 ```
-Instantiate the
+2. Instantiate a Thread. Threads keep track of the messages in the Assistant conversation.
+```ruby
+thread = Langchain::Thread.new
+```
+You can pass in messages from a previous session with the Assistant:
+```ruby
+thread.messages = messages
+```
+Messages contain the conversation history, and the whole message history is sent to the LLM every time. A Message belongs to 1 of 4 roles:
+* `Message(role: "system")` messages usually contain the instructions.
+* `Message(role: "user")` messages come from the user.
+* `Message(role: "assistant")` messages are produced by the LLM.
+* `Message(role: "tool")` messages are sent in response to tool calls with tool outputs.
+
+3. Instantiate an Assistant
+```ruby
+assistant = Langchain::Assistant.new(
+  llm: llm,
+  thread: thread,
+  instructions: "You are a Meteorologist Assistant that is able to pull the weather for any location",
+  tools: [
+    Langchain::Tool::GoogleSearch.new(api_key: ENV["SERPAPI_API_KEY"])
+  ]
+)
+```
+### Using an Assistant
+You can now add your message to an Assistant.
 ```ruby
-
+assistant.add_message content: "What's the weather in New York City?"
 ```
 
-
+Run the Assistant to generate a response.
 ```ruby
-
+assistant.run
 ```
 
-
+If a Tool is invoked you can manually submit an output.
 ```ruby
-
+assistant.submit_tool_output tool_call_id: "...", output: "It's 70 degrees and sunny in New York City"
 ```
 
-
+Or run the assistant with `auto_tool_execution: true` to call Tools automatically.
 ```ruby
-
-
-
+assistant.add_message content: "How about San Diego, CA?"
+assistant.run(auto_tool_execution: true)
+```
+You can also combine the two by calling:
+```ruby
+assistant.add_message_and_run content: "What about Sacramento, CA?", auto_tool_execution: true
 ```
 
-
+### Accessing Thread messages
+You can access the messages in a Thread by calling `assistant.thread.messages`.
 ```ruby
-
+assistant.thread.messages
 ```
 
-
-The Evaluations module is a collection of tools that can be used to evaluate and track the performance of the output products by LLM and your RAG (Retrieval Augmented Generation) pipelines.
+The Assistant checks the context window limits before every request to the LLM and removes the oldest thread messages one by one if the context window is exceeded.
 
 ### RAGAS
 Ragas helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. The implementation is based on this [paper](https://arxiv.org/abs/2309.15217) and the original Python [repo](https://github.com/explodinggradients/ragas). Ragas tracks the following 3 metrics and assigns the 0.0 - 1.0 scores:
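Putting the Assistants steps above together end to end, here is a sketch that uses only the calls shown in this README (the meteorologist instructions and the GoogleSearch tool mirror the snippets above):

```ruby
require "langchain"

llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

assistant = Langchain::Assistant.new(
  llm: llm,
  thread: Langchain::Thread.new,
  instructions: "You are a Meteorologist Assistant that is able to pull the weather for any location",
  tools: [Langchain::Tool::GoogleSearch.new(api_key: ENV["SERPAPI_API_KEY"])]
)

# Add a user message and let the Assistant execute any tool calls itself.
assistant.add_message_and_run content: "What's the weather in New York City?", auto_tool_execution: true

# Inspect the conversation so far.
assistant.thread.messages.each { |m| puts "#{m.role}: #{m.content}" }
```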
data/lib/langchain/assistants/assistant.rb
ADDED
@@ -0,0 +1,199 @@
+# frozen_string_literal: true
+
+module Langchain
+  class Assistant
+    attr_reader :llm, :thread, :instructions
+    attr_accessor :tools
+
+    # Create a new assistant
+    #
+    # @param llm [Langchain::LLM::Base] LLM instance that the assistant will use
+    # @param thread [Langchain::Thread] The thread that'll keep track of the conversation
+    # @param tools [Array<Langchain::Tool::Base>] Tools that the assistant has access to
+    # @param instructions [String] The system instructions to include in the thread
+    def initialize(
+      llm:,
+      thread:,
+      tools: [],
+      instructions: nil
+    )
+      raise ArgumentError, "Invalid LLM; currently only Langchain::LLM::OpenAI is supported" unless llm.instance_of?(Langchain::LLM::OpenAI)
+      raise ArgumentError, "Thread must be an instance of Langchain::Thread" unless thread.is_a?(Langchain::Thread)
+      raise ArgumentError, "Tools must be an array of Langchain::Tool::Base instance(s)" unless tools.is_a?(Array) && tools.all? { |tool| tool.is_a?(Langchain::Tool::Base) }
+
+      @llm = llm
+      @thread = thread
+      @tools = tools
+      @instructions = instructions
+
+      # The first message in the thread should be the system instructions
+      # TODO: What if the user added old messages and the system instructions are already in there? Should this overwrite the existing instructions?
+      add_message(role: "system", content: instructions) if instructions
+    end
+
+    # Add a user message to the thread
+    #
+    # @param content [String] The content of the message
+    # @param role [String] The role attribute of the message. Default: "user"
+    # @param tool_calls [Array<Hash>] The tool calls to include in the message
+    # @param tool_call_id [String] The ID of the tool call to include in the message
+    # @return [Array<Langchain::Message>] The messages in the thread
+    def add_message(content: nil, role: "user", tool_calls: [], tool_call_id: nil)
+      message = build_message(role: role, content: content, tool_calls: tool_calls, tool_call_id: tool_call_id)
+      thread.add_message(message)
+    end
+
+    # Run the assistant
+    #
+    # @param auto_tool_execution [Boolean] Whether or not to automatically run tools
+    # @return [Array<Langchain::Message>] The messages in the thread
+    def run(auto_tool_execution: false)
+      running = true
+
+      while running
+        # TODO: I think we need to look at all messages and not just the last one.
+        case (last_message = thread.messages.last).role
+        when "system"
+          # Do nothing
+          running = false
+        when "assistant"
+          if last_message.tool_calls.any?
+            if auto_tool_execution
+              run_tools(last_message.tool_calls)
+            else
+              # Maybe log and tell the user that there's outstanding tool calls?
+              running = false
+            end
+          else
+            # Last message was from the assistant without any tool calls.
+            # Do nothing
+            running = false
+          end
+        when "user"
+          # Run it!
+          response = chat_with_llm
+
+          if response.tool_calls
+            # Re-run the while(running) loop to process the tool calls
+            running = true
+            add_message(role: response.role, tool_calls: response.tool_calls)
+          elsif response.chat_completion
+            # Stop the while(running) loop and add the assistant's response to the thread
+            running = false
+            add_message(role: response.role, content: response.chat_completion)
+          end
+        when "tool"
+          # Run it!
+          response = chat_with_llm
+          running = true
+
+          if response.tool_calls
+            add_message(role: response.role, tool_calls: response.tool_calls)
+          elsif response.chat_completion
+            add_message(role: response.role, content: response.chat_completion)
+          end
+        end
+      end
+
+      thread.messages
+    end
+
+    # Add a user message to the thread and run the assistant
+    #
+    # @param content [String] The content of the message
+    # @param auto_tool_execution [Boolean] Whether or not to automatically run tools
+    # @return [Array<Langchain::Message>] The messages in the thread
+    def add_message_and_run(content:, auto_tool_execution: false)
+      add_message(content: content, role: "user")
+      run(auto_tool_execution: auto_tool_execution)
+    end
+
+    # Submit tool output to the thread
+    #
+    # @param tool_call_id [String] The ID of the tool call to submit output for
+    # @param output [String] The output of the tool
+    # @return [Array<Langchain::Message>] The messages in the thread
+    def submit_tool_output(tool_call_id:, output:)
+      # TODO: Validate that `tool_call_id` is valid
+      add_message(role: "tool", content: output, tool_call_id: tool_call_id)
+    end
+
+    private
+
+    # Call to the LLM#chat() method
+    #
+    # @return [Langchain::LLM::BaseResponse] The LLM response object
+    def chat_with_llm
+      params = {messages: thread.openai_messages}
+
+      if tools.any?
+        params[:tools] = tools.map(&:to_openai_tool)
+        # TODO: Not sure that tool_choice should always be "auto"; Maybe we can let the user toggle it.
+        params[:tool_choice] = "auto"
+      end
+
+      llm.chat(**params)
+    end
+
+    # Run the tools automatically
+    #
+    # @param tool_calls [Array<Hash>] The tool calls to run
+    def run_tools(tool_calls)
+      # Iterate over each function invocation and submit tool output
+      tool_calls.each do |tool_call|
+        tool_call_id = tool_call.dig("id")
+        tool_name = tool_call.dig("function", "name")
+        tool_arguments = JSON.parse(tool_call.dig("function", "arguments"), symbolize_names: true)
+
+        tool_instance = tools.find do |t|
+          t.name == tool_name
+        end or raise ArgumentError, "Tool not found in assistant.tools"
+
+        output = tool_instance.execute(**tool_arguments)
+
+        submit_tool_output(tool_call_id: tool_call_id, output: output)
+      end
+
+      response = chat_with_llm
+
+      if response.tool_calls
+        add_message(role: response.role, tool_calls: response.tool_calls)
+      elsif response.chat_completion
+        add_message(role: response.role, content: response.chat_completion)
+      end
+    end
+
+    # Build a message
+    #
+    # @param role [String] The role of the message
+    # @param content [String] The content of the message
+    # @param tool_calls [Array<Hash>] The tool calls to include in the message
+    # @param tool_call_id [String] The ID of the tool call to include in the message
+    # @return [Langchain::Message] The Message object
+    def build_message(role:, content: nil, tool_calls: [], tool_call_id: nil)
+      Message.new(role: role, content: content, tool_calls: tool_calls, tool_call_id: tool_call_id)
+    end
+
+    # # TODO: Fix the message truncation when context window is exceeded
+    # def build_assistant_prompt(instructions:, tools:)
+    #   while begin
+    #     # Check if the prompt exceeds the context window
+    #     # Return false to exit the while loop
+    #     !llm.class.const_get(:LENGTH_VALIDATOR).validate_max_tokens!(
+    #       thread.messages,
+    #       llm.defaults[:chat_completion_model_name],
+    #       {llm: llm}
+    #     )
+    #   # Rescue error if context window is exceeded and return true to continue the while loop
+    #   rescue Langchain::Utils::TokenLength::TokenLimitExceeded
+    #     # Should be using `retry` instead of while()
+    #     true
+    #   end
+    #     # Truncate the oldest messages when the context window is exceeded
+    #     thread.messages.shift
+    #   end
+
+    #   prompt
+    # end
+  end
+end
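As a reading aid for `run_tools` above: the tool-call hashes it consumes use string keys and carry the function arguments as a JSON-encoded string, mirroring OpenAI's tool-call payloads. A minimal sketch with a hypothetical calculator tool (the values are illustrative, not from the gem):

```ruby
require "json"

# Hypothetical tool-call entry, in the shape run_tools expects:
tool_call = {
  "id" => "call_abc123",
  "function" => {
    "name" => "calculator",
    "arguments" => "{\"input\":\"2+2\"}"
  }
}

tool_call.dig("id")               # => "call_abc123" (becomes the tool_call_id)
tool_call.dig("function", "name") # => "calculator" (matched against tool.name)
JSON.parse(tool_call.dig("function", "arguments"), symbolize_names: true)
# => {input: "2+2"} (splatted into tool.execute(**arguments))
```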
data/lib/langchain/assistants/message.rb
ADDED
@@ -0,0 +1,58 @@
+# frozen_string_literal: true
+
+module Langchain
+  # Langchain::Messages are the messages that are sent to LLM chat methods
+  class Message
+    attr_reader :role, :content, :tool_calls, :tool_call_id
+
+    ROLES = %w[
+      system
+      assistant
+      user
+      tool
+    ].freeze
+
+    # @param role [String] The role of the message
+    # @param content [String] The content of the message
+    # @param tool_calls [Array<Hash>] Tool calls to be made
+    # @param tool_call_id [String] The ID of the tool call to be made
+    def initialize(role:, content: nil, tool_calls: [], tool_call_id: nil) # TODO: Implement image_file: reference (https://platform.openai.com/docs/api-reference/messages/object#messages/object-content)
+      raise ArgumentError, "Role must be one of #{ROLES.join(", ")}" unless ROLES.include?(role)
+      raise ArgumentError, "Tool calls must be an array of hashes" unless tool_calls.is_a?(Array) && tool_calls.all? { |tool_call| tool_call.is_a?(Hash) }
+
+      @role = role
+      # Some Tools return content as JSON, hence `.to_s`
+      @content = content.to_s
+      @tool_calls = tool_calls
+      @tool_call_id = tool_call_id
+    end
+
+    # Convert the message to an OpenAI API-compatible hash
+    #
+    # @return [Hash] The message as an OpenAI API-compatible hash
+    def to_openai_format
+      {}.tap do |h|
+        h[:role] = role
+        h[:content] = content if content # Content is nil for tool calls
+        h[:tool_calls] = tool_calls if tool_calls.any?
+        h[:tool_call_id] = tool_call_id if tool_call_id
+      end
+    end
+
+    def assistant?
+      role == "assistant"
+    end
+
+    def system?
+      role == "system"
+    end
+
+    def user?
+      role == "user"
+    end
+
+    def tool?
+      role == "tool"
+    end
+  end
+end
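For reference, here is what `to_openai_format` above yields for a couple of roles; these hashes are what `Thread#openai_messages` hands to the LLM:

```ruby
require "langchain"

user_msg = Langchain::Message.new(role: "user", content: "What's the weather?")
user_msg.to_openai_format
# => {role: "user", content: "What's the weather?"}

tool_msg = Langchain::Message.new(role: "tool", content: "70 and sunny", tool_call_id: "call_abc123")
tool_msg.to_openai_format
# => {role: "tool", content: "70 and sunny", tool_call_id: "call_abc123"}

Langchain::Message.new(role: "banana")
# => raises ArgumentError: Role must be one of system, assistant, user, tool
```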
data/lib/langchain/assistants/thread.rb
ADDED
@@ -0,0 +1,34 @@
+# frozen_string_literal: true
+
+module Langchain
+  # Langchain::Thread keeps track of messages in a conversation
+  # Eventually we may want to add functionality to persist the thread to disk, DB, storage, etc.
+  class Thread
+    attr_accessor :messages
+
+    # @param messages [Array<Langchain::Message>]
+    def initialize(messages: [])
+      raise ArgumentError, "messages array must only contain Langchain::Message instance(s)" unless messages.is_a?(Array) && messages.all? { |m| m.is_a?(Langchain::Message) }
+
+      @messages = messages
+    end
+
+    # Convert the thread to an OpenAI API-compatible array of hashes
+    #
+    # @return [Array<Hash>] The thread as an OpenAI API-compatible array of hashes
+    def openai_messages
+      messages.map(&:to_openai_format)
+    end
+
+    # Add a message to the thread
+    #
+    # @param message [Langchain::Message] The message to add
+    # @return [Array<Langchain::Message>] The updated messages array
+    def add_message(message)
+      raise ArgumentError, "message must be a Langchain::Message instance" unless message.is_a?(Langchain::Message)
+
+      # Append the message to the thread
+      messages << message
+    end
+  end
+end
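A small usage sketch of the Thread API above, e.g. rehydrating a conversation whose messages were stored elsewhere (the storage layer is yours; only calls shown in the class are used):

```ruby
require "langchain"

# Rebuild messages from a prior session.
restored = [
  Langchain::Message.new(role: "system", content: "You are a helpful assistant"),
  Langchain::Message.new(role: "user", content: "Hello!")
]

thread = Langchain::Thread.new(messages: restored)
thread.add_message(Langchain::Message.new(role: "assistant", content: "Hi! How can I help?"))

# The OpenAI-ready payload sent on every request:
thread.openai_messages
# => [{role: "system", content: "You are a helpful assistant"},
#     {role: "user", content: "Hello!"},
#     {role: "assistant", content: "Hi! How can I help?"}]
```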
data/lib/langchain/conversation/memory.rb
CHANGED
@@ -3,7 +3,7 @@
 module Langchain
   class Conversation
     class Memory
-      attr_reader :
+      attr_reader :messages
 
       # The least number of tokens we want to be under the limit by
       TOKEN_LEEWAY = 20
@@ -12,7 +12,6 @@ module Langchain
         @llm = llm
         @context = nil
         @summary = nil
-        @examples = []
         @messages = messages
         @strategy = options.delete(:strategy) || :truncate
         @options = options
@@ -22,10 +21,6 @@ module Langchain
         @context = message
       end
 
-      def add_examples(examples)
-        @examples.concat examples
-      end
-
       def append_message(message)
        @messages.append(message)
       end
data/lib/langchain/conversation.rb
CHANGED
@@ -25,9 +25,10 @@ module Langchain
     # @param options [Hash] Options to pass to the LLM, like temperature, top_k, etc.
     # @return [Langchain::Conversation] The Langchain::Conversation instance
     def initialize(llm:, **options, &block)
+      warn "[DEPRECATION] `Langchain::Conversation` is deprecated. Please use `Langchain::Assistant` instead."
+
       @llm = llm
       @context = nil
-      @examples = []
       @memory = ::Langchain::Conversation::Memory.new(
         llm: llm,
         messages: options.delete(:messages) || [],
@@ -37,22 +38,12 @@ module Langchain
       @block = block
     end
 
-    def set_functions(functions)
-      @llm.functions = functions
-    end
-
     # Set the context of the conversation. Usually used to set the model's persona.
     # @param message [String] The context of the conversation
     def set_context(message)
       @memory.set_context ::Langchain::Conversation::Context.new(message)
     end
 
-    # Add examples to the conversation. Used to give the model a sense of the conversation.
-    # @param examples [Array<Prompt|Response>] The examples to add to the conversation
-    def add_examples(examples)
-      @memory.add_examples examples
-    end
-
     # Message the model with a prompt and return the response.
     # @param message [String] The prompt to message the model with
     # @return [Response] The response from the model
@@ -75,16 +66,14 @@ module Langchain
       @memory.context
     end
 
-    # Examples from conversation memory
-    # @return [Array<Prompt|Response>] Examples from the conversation memory
-    def examples
-      @memory.examples
-    end
-
     private
 
     def llm_response
-
+      message_history = messages.map(&:to_h)
+      # Prepend the system (context) message as the first message
+      message_history.prepend({role: "system", content: @memory.context.to_s}) if @memory.context
+
+      @llm.chat(messages: message_history, **@options, &@block)
     rescue Langchain::Utils::TokenLength::TokenLimitExceeded => exception
       @memory.reduce_messages(exception)
       retry
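With `Langchain::Conversation` now warning on initialization, a rough migration sketch to the Assistant API (assuming Conversation's prompt method is `message`, per the doc comment above; `instructions:` plays the role of `set_context`):

```ruby
# Before (deprecated as of 0.9.0):
chat = Langchain::Conversation.new(llm: llm)
chat.set_context("You are RubyGPT, a helpful chat bot for helping people learn Ruby")
chat.message("When was Ruby first released?")

# After:
assistant = Langchain::Assistant.new(
  llm: llm,
  thread: Langchain::Thread.new,
  instructions: "You are RubyGPT, a helpful chat bot for helping people learn Ruby"
)
assistant.add_message_and_run content: "When was Ruby first released?"
```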
data/lib/langchain/llm/ai21.rb
CHANGED
@@ -35,7 +35,7 @@ module Langchain::LLM
   def complete(prompt:, **params)
     parameters = complete_parameters params
 
-    parameters[:maxTokens] = LENGTH_VALIDATOR.validate_max_tokens!(prompt, parameters[:model], client)
+    parameters[:maxTokens] = LENGTH_VALIDATOR.validate_max_tokens!(prompt, parameters[:model], {llm: client})
 
     response = client.complete(prompt, parameters)
     Langchain::LLM::AI21Response.new response, model: parameters[:model]
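The `+2 -2` changes to the four token-length validators in the file list follow the same pattern as this call site: the LLM client moves from a positional argument into an options hash under `:llm`. In outline:

```ruby
# 0.8.2-style call (client passed positionally):
LENGTH_VALIDATOR.validate_max_tokens!(prompt, parameters[:model], client)

# 0.9.0-style call (client passed in an options hash):
LENGTH_VALIDATOR.validate_max_tokens!(prompt, parameters[:model], {llm: client})
```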
data/lib/langchain/llm/azure.rb
CHANGED
@@ -4,7 +4,7 @@ module Langchain::LLM
 # LLM interface for Azure OpenAI Service APIs: https://learn.microsoft.com/en-us/azure/ai-services/openai/
 #
 # Gem requirements:
-#     gem "ruby-openai", "~> 6.
+#     gem "ruby-openai", "~> 6.3.0"
 #
 # Usage:
 #     openai = Langchain::LLM::Azure.new(api_key:, llm_options: {}, embedding_deployment_url:, chat_deployment_url:)
@@ -34,106 +34,19 @@ module Langchain::LLM
     @defaults = DEFAULTS.merge(default_options)
   end
 
-
-
-
-  # @param text [String] The text to generate an embedding for
-  # @param params extra parameters passed to OpenAI::Client#embeddings
-  # @return [Langchain::LLM::OpenAIResponse] Response object
-  #
-  def embed(text:, **params)
-    parameters = {model: @defaults[:embeddings_model_name], input: text}
-
-    validate_max_tokens(text, parameters[:model])
-
-    response = with_api_error_handling do
-      embed_client.embeddings(parameters: parameters.merge(params))
-    end
-
-    Langchain::LLM::OpenAIResponse.new(response)
+  def embed(...)
+    @client = @embed_client
+    super(...)
   end
 
-
-
-
-  # @param prompt [String] The prompt to generate a completion for
-  # @param params extra parameters passed to OpenAI::Client#complete
-  # @return [Langchain::LLM::Response::OpenaAI] Response object
-  #
-  def complete(prompt:, **params)
-    parameters = compose_parameters @defaults[:completion_model_name], params
-
-    parameters[:messages] = compose_chat_messages(prompt: prompt)
-    parameters[:max_tokens] = validate_max_tokens(parameters[:messages], parameters[:model])
-
-    response = with_api_error_handling do
-      chat_client.chat(parameters: parameters)
-    end
-
-    Langchain::LLM::OpenAIResponse.new(response)
+  def complete(...)
+    @client = @chat_client
+    super(...)
   end
 
-
-
-
-  # == Examples
-  #
-  # # simplest case, just give a prompt
-  # openai.chat prompt: "When was Ruby first released?"
-  #
-  # # prompt plus some context about how to respond
-  # openai.chat context: "You are RubyGPT, a helpful chat bot for helping people learn Ruby", prompt: "Does Ruby have a REPL like IPython?"
-  #
-  # # full control over messages that get sent, equivilent to the above
-  # openai.chat messages: [
-  #   {
-  #     role: "system",
-  #     content: "You are RubyGPT, a helpful chat bot for helping people learn Ruby", prompt: "Does Ruby have a REPL like IPython?"
-  #   },
-  #   {
-  #     role: "user",
-  #     content: "When was Ruby first released?"
-  #   }
-  # ]
-  #
-  # # few-short prompting with examples
-  # openai.chat prompt: "When was factory_bot released?",
-  #   examples: [
-  #     {
-  #       role: "user",
-  #       content: "When was Ruby on Rails released?"
-  #     }
-  #     {
-  #       role: "assistant",
-  #       content: "2004"
-  #     },
-  #   ]
-  #
-  # @param prompt [String] The prompt to generate a chat completion for
-  # @param messages [Array<Hash>] The messages that have been sent in the conversation
-  # @param context [String] An initial context to provide as a system message, ie "You are RubyGPT, a helpful chat bot for helping people learn Ruby"
-  # @param examples [Array<Hash>] Examples of messages to provide to the model. Useful for Few-Shot Prompting
-  # @param options [Hash] extra parameters passed to OpenAI::Client#chat
-  # @yield [Hash] Stream responses back one token at a time
-  # @return [Langchain::LLM::OpenAIResponse] Response object
-  #
-  def chat(prompt: "", messages: [], context: "", examples: [], **options, &block)
-    raise ArgumentError.new(":prompt or :messages argument is expected") if prompt.empty? && messages.empty?
-
-    parameters = compose_parameters @defaults[:chat_completion_model_name], options, &block
-    parameters[:messages] = compose_chat_messages(prompt: prompt, messages: messages, context: context, examples: examples)
-
-    if functions
-      parameters[:functions] = functions
-    else
-      parameters[:max_tokens] = validate_max_tokens(parameters[:messages], parameters[:model])
-    end
-
-    response = with_api_error_handling { chat_client.chat(parameters: parameters) }
-
-    return if block
-
-    Langchain::LLM::OpenAIResponse.new(response)
+  def chat(...)
+    @client = @chat_client
+    super(...)
   end
 end
 end
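The Azure refactor above replaces the copied-in `embed`/`complete`/`chat` bodies with thin wrappers that point `@client` at the deployment-specific client and defer to the OpenAI superclass via `super(...)`. A usage sketch (keyword names taken from the Usage comment in this file; the URL values are placeholders):

```ruby
require "langchain"

azure = Langchain::LLM::Azure.new(
  api_key: ENV["AZURE_OPENAI_API_KEY"],
  llm_options: {},
  embedding_deployment_url: "https://...", # placeholder
  chat_deployment_url: "https://..."       # placeholder
)

azure.chat(messages: [{role: "user", content: "Hello!"}]) # routes through @chat_client
azure.embed(text: "Hello!")                               # routes through @embed_client
```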