RubyGems - llm_meta_client - Versions diffs - 1.0.2 → 1.3.0 - Mend

llm_meta_client 1.0.2 → 1.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 5dde8e658e36a04b651c91bfcbd34d23b639449c4781529cd5c061e49e52cc53
-  data.tar.gz: 56d464e06df9afb9a92ef5dcafee8d806158fb1c333dc5a44800a1737095e68b
+  metadata.gz: 2fcc6377f3293f8ecd13b81cd79ea63891c18f55dfa05ee225a151f7e1fa5b84
+  data.tar.gz: 35c4cba209aed5989b43606715205a7e2b85c669fa7dd71d65a33f4dd47a293a
 SHA512:
-  metadata.gz: 3a5442c238211a0432a26e54cd7923c1782d11178679a694cb2ac60ddb3bbd02365431bf0bbba61c9f505e65d1c3b8f60303a23dbb4752813fda580bfb997a4f
-  data.tar.gz: 955f0bb38816e24504041962e6b2a4cd9531ea89ad805c503624de927c9339383276c146dad1b346f64521f0cd3dfed985d2b5707676efdad27e998eb3441f60
+  metadata.gz: c959d77e7d3b8c9f5070bf2f63a74e83ba1082391694788e094c09629e76883dcf0142336af7742dd08fa46c0d36ec20ad31ad85b558ea4199cb8b7e3c4345fc
+  data.tar.gz: 651d88ddb211fd11234daf2ecee22e368d8886d27e43b94c1786d4d3980cedfb078a562ee0cd0ad06a48140c7fb07cae779326237e604b21a819d4ceb663d815
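
The new digests can be checked locally. What follows is a minimal sketch, assuming the `.gem` archive has already been unpacked so that `data.tar.gz` sits in the current directory. The `verify_checksum` helper and the file location are illustrative and not part of the gem.

```ruby
require "digest"

# Hypothetical helper: compare a file's SHA256 digest with an expected hex
# string such as the one listed in checksums.yaml. Returns true on a match.
def verify_checksum(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex
end

# Expected value copied from the 1.3.0 side of the checksums.yaml diff.
EXPECTED_DATA_SHA256 = "35c4cba209aed5989b43606715205a7e2b85c669fa7dd71d65a33f4dd47a293a"

# Only runs the comparison when the unpacked file is actually present.
if File.exist?("data.tar.gz")
  status = verify_checksum("data.tar.gz", EXPECTED_DATA_SHA256) ? "OK" : "MISMATCH"
  puts "data.tar.gz #{status}"
end
```

The same pattern should apply to the SHA512 entries by swapping in `Digest::SHA512`.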

data/CHANGELOG.md CHANGED

@@ -5,6 +5,55 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [1.3.0] - 2026-05-10
+### Added
+- Tool-call streaming end-to-end. `ServerQuery#stream` now accepts `tool_ids:` and yields a `tool_calls` event when the LLM decides to invoke MCP tools. Turn 1 (tool selection) runs synchronously; turn 2 (the follow-up after tool execution) is streamed.
+- Scaffold renders a separate "🛠 Tool calls" bubble during streaming via the new `_tool_call_message.html.erb` partial. The Stimulus controller inserts it before the streaming bubble when `event: tool_calls` arrives, and removes it once the assistant message is saved (the saved bubble's combined markdown contains the tool-call section).
+- `Chat#stream_assistant_response` accepts `tool_ids:` and threads them through. Persistence is unchanged — the saved `Message.response` includes a markdown "Tool calls" section appended to the response text, matching the existing synchronous shape.
+- When `tool_ids` is non-empty, the system prompt is augmented with an instruction to explain tool errors rather than fail silently. Models that ignore the instruction are caught by a server-side fallback (see below).
+### Changed
+- Streaming error messages now parse `error` + `message` from the response body so users see context (e.g. "Rate limit exceeded — check your provider plan…") instead of a bare HTTP code. Mid-stream `event: error` payloads with codes like `rate_limit` get the same friendlier treatment.
+### Notes
+- Requires `llm_meta_server` with the matching tool-streaming additions: `LlmRbFacade.stream!` accepting `tools:` + `on_tool_calls:` and an `Api::ChatStreamsController` that emits the `tool_calls` SSE event. Server-side fixes that ship alongside this release: an Anthropic-tool-only-response rehydrator (Claude tool-only completions weren't surfacing through `Session#functions`), and a sink injection that emits MCP `isError: true` payloads as text deltas before turn 2 (Gemini sometimes returns nothing after a tool error and would otherwise leave the bubble blank).
+## [1.2.0] - 2026-05-10
+### Added
+- End-to-end SSE streaming for chat completions:
+  - `ServerQuery#stream` consumes SSE from the new `chat_streams` endpoint on `llm_meta_server` and yields parsed events. Returns the assembled content; absorbs upstream `done` markers and raises `ServerError` on upstream `error` events.
+  - `Chat#stream_assistant_response` and `Chat#finalize_streamed_response` for streaming generation with persistence at stream close (assistant message saved only on success — disconnects mid-stream don't persist).
+  - Scaffold now generates `ChatStreamsController`, `_streaming_message.html.erb` partial, `message_stream_controller.js` Stimulus controller, and a shared `_chat_sidebar.html.erb` partial. Routes add `resource :stream` nested under `chats`.
+  - Streaming bubble swaps to the host-rendered `_message` partial on save, so any markdown / syntax-highlighting customization in the host's `_message.html.erb` applies post-stream.
+  - `event: title` SSE event includes a turbo_stream snippet that updates the chat-sidebar in place when a brand-new chat gets its auto-generated title.
+### Changed
+- `ServerQuery` error messages parse the JSON response body from `llm_meta_server` and surface friendlier text (rate limits, auth errors, upstream unavailable) instead of bare `HTTP <code>`.
+- Streaming endpoint v1 does not pass `tool_ids`. The synchronous `#call` path is unchanged and still supports tool calls.
+### Notes
+- The streaming endpoint requires `llm_meta_server` with the matching `chat_streams` route. SSE delivery through reverse proxies needs `proxy_buffering off` (nginx) or `flushpackets=on` + `SetEnv no-gzip 1` (Apache).
+## [1.1.1] - 2026-04-22
+### Added
+- `ServerQuery#call` now surfaces tool calls from the LLM server response. When the response includes a `tool_calls` array, a markdown-formatted "Tool calls" section (name + JSON args) is appended to the returned content (separated by a horizontal rule). This lets host apps display which tools the LLM invoked without any schema or view changes; existing markdown renderers pick it up automatically. Previously, tool calls were silently dropped.
+## [1.1.0] - 2026-04-22
+### Changed
+- Widen `prompt_navigator` dependency constraint from `~> 1.0` to `>= 1.0, < 3.0` so host apps can opt into `prompt_navigator` 2.0 (which requires Ruby 3.4.9+ and adds `PromptExecution.delete_set!`). Existing hosts on `prompt_navigator` 1.x keep resolving unchanged.
 ## [1.0.2] - 2026-03-27
 ### Added
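
The 1.1.0 entry widens the `prompt_navigator` constraint from `~> 1.0` to `>= 1.0, < 3.0`. The sketch below uses RubyGems' own `Gem::Requirement` to show which versions each constraint admits. The version numbers in the loop are illustrative only and are not claimed to be real `prompt_navigator` releases.

```ruby
require "rubygems"

old_constraint = Gem::Requirement.new("~> 1.0")          # before 1.1.0
new_constraint = Gem::Requirement.new(">= 1.0", "< 3.0") # from 1.1.0 on

# Illustrative version numbers, chosen to sit on each side of the bounds.
%w[1.0.0 1.4.2 2.0.0 3.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format(
    "%-6s old=%-5s new=%s",
    v,
    old_constraint.satisfied_by?(version),
    new_constraint.satisfied_by?(version)
  )
end
```

Under these assumptions a 2.x release resolves only with the widened constraint, while 1.x releases satisfy both, which matches the changelog's statement that existing hosts keep resolving unchanged.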

data/lib/generators/llm_meta_client/scaffold/scaffold_generator.rb CHANGED

@@ -19,6 +19,7 @@ module LlmMetaClient
       def create_controllers
         template "app/controllers/chats_controller.rb"
+        template "app/controllers/chat_streams_controller.rb"
         template "app/controllers/prompts_controller.rb"
         template "app/controllers/api/mcp_servers_controller.rb"
       end
@@ -29,6 +30,9 @@ module LlmMetaClient
         template "app/views/chats/create.turbo_stream.erb"
         template "app/views/chats/update.turbo_stream.erb"
         template "app/views/chats/_message.html.erb"
+        template "app/views/chats/_streaming_message.html.erb"
+        template "app/views/chats/_tool_call_message.html.erb"
+        template "app/views/chats/_chat_sidebar.html.erb"
         template "app/views/chats/_messages_list.html.erb"
         template "app/views/shared/_family_field.html.erb"
         template "app/views/shared/_api_key_field.html.erb"
@@ -46,6 +50,7 @@ module LlmMetaClient
         template "app/javascript/controllers/chat_title_edit_controller.js"
         template "app/javascript/controllers/tool_selector_controller.js"
         template "app/javascript/controllers/generation_settings_controller.js"
+        template "app/javascript/controllers/message_stream_controller.js"
         copy_file "app/javascript/popover.js"
       end
@@ -73,6 +78,7 @@ module LlmMetaClient
               patch :update_title
               get :download_csv
             end
+            resource :stream, only: [ :show ], controller: "chat_streams"
           end
           resources :prompts, only: [ :show ]

data/lib/generators/llm_meta_client/scaffold/templates/app/controllers/chat_streams_controller.rb ADDED

@@ -0,0 +1,101 @@
+class ChatStreamsController < ApplicationController
+  include ActionController::Live
+  skip_before_action :authenticate_user!, raise: false
+  skip_before_action :verify_authenticity_token
+  def show
+    response.headers["Content-Type"] = "text/event-stream"
+    response.headers["Cache-Control"] = "no-cache"
+    response.headers["X-Accel-Buffering"] = "no"
+    chat = find_chat
+    prompt_execution = PromptNavigator::PromptExecution.find_by!(execution_id: params[:execution_id])
+    unless chat.messages.exists?(prompt_navigator_prompt_execution_id: prompt_execution.id)
+      raise ActiveRecord::RecordNotFound
+    end
+    jwt_token = current_user.id_token if user_signed_in?
+    generation_settings = parse_generation_settings(params[:generation_settings_json])
+    tool_ids = Array(params[:tool_ids]).reject(&:blank?)
+    assembled = chat.stream_assistant_response(prompt_execution, jwt_token, tool_ids: tool_ids, generation_settings: generation_settings) do |event|
+      if event[:event] == "tool_calls"
+        tool_calls = event[:data]["tool_calls"] || []
+        forward(event: "tool_calls", data: {
+          tool_calls: tool_calls,
+          html: view_context.render(partial: "chats/tool_call_message", locals: { tool_calls: tool_calls })
+        })
+      else
+        forward(event)
+      end
+    end
+    if assembled.present?
+      assistant_message = chat.finalize_streamed_response(prompt_execution, assembled, jwt_token)
+      if assistant_message
+        forward(event: "saved", data: {
+          message_id: assistant_message.id,
+          execution_id: prompt_execution.execution_id,
+          html: view_context.render(partial: "chats/message", locals: { message: assistant_message })
+        })
+      end
+      title_before = chat.title
+      chat.generate_title(prompt_execution.prompt, jwt_token)
+      if chat.reload.title.present? && chat.title != title_before
+        forward(event: "title", data: {
+          title: chat.title,
+          chat_uuid: chat.uuid,
+          turbo_stream: render_sidebar_update(chat)
+        })
+      end
+    end
+    forward(event: "done", data: {})
+  rescue ActionController::Live::ClientDisconnected
+    Rails.logger.info "[ChatStream] client disconnected"
+  rescue ActiveRecord::RecordNotFound
+    forward(event: "error", data: { code: "not_found", message: "Chat or prompt execution not found" }) rescue nil
+  rescue StandardError => e
+    Rails.logger.error "[ChatStream] #{e.class}: #{e.message}"
+    forward(event: "error", data: { code: e.class.name, message: e.message }) rescue nil
+  ensure
+    response.stream.close
+  end
+  private
+  def find_chat
+    scope = user_signed_in? ? current_user.chats : Chat.where(user_id: nil)
+    scope.find_by!(uuid: params[:chat_id])
+  end
+  def forward(event)
+    name = event[:event]
+    payload = event[:data].to_json
+    if name.nil? || name == "message"
+      response.stream.write "data: #{payload}\n\n"
+    else
+      response.stream.write "event: #{name}\ndata: #{payload}\n\n"
+    end
+  end
+  def parse_generation_settings(raw)
+    return {} if raw.blank?
+    parsed = JSON.parse(raw)
+    parsed.is_a?(Hash) ? parsed.symbolize_keys : {}
+  rescue JSON::ParserError
+    {}
+  end
+  def render_sidebar_update(chat)
+    initialize_chat(user_signed_in? ? current_user.chats : nil)
+    add_chat(chat)
+    view_context.turbo_stream.replace(
+      "chat-sidebar",
+      partial: "chats/chat_sidebar",
+      locals: { chat: chat }
+    ).to_s
+  end
+end
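
The `forward` helper in this controller writes Server-Sent Events frames: a bare `data:` line for unnamed or `message` events, and an `event:` line plus a `data:` line otherwise, each terminated by a blank line. The sketch below reproduces that framing outside Rails and pairs it with a small parser to show the round trip. `sse_frame`, `parse_sse_frame`, and `drain_frames` are hypothetical names; the parser is an assumption about how such frames could be read, not the gem's own parsing code.

```ruby
require "json"

# Same framing rules as ChatStreamsController#forward, returned as a string
# instead of being written to response.stream.
def sse_frame(event)
  name = event[:event]
  payload = event[:data].to_json
  if name.nil? || name == "message"
    "data: #{payload}\n\n"
  else
    "event: #{name}\ndata: #{payload}\n\n"
  end
end

# Hypothetical parser for one complete frame. Returns nil when the frame has
# no data line or the data is not valid JSON.
def parse_sse_frame(raw)
  name = "message" # SSE default when no event: line is present
  data_lines = []
  raw.each_line(chomp: true) do |line|
    case line
    when /\Aevent:\s?(.*)\z/ then name = Regexp.last_match(1)
    when /\Adata:\s?(.*)\z/ then data_lines << Regexp.last_match(1)
    end
  end
  return nil if data_lines.empty?
  { event: name, data: JSON.parse(data_lines.join("\n")) }
rescue JSON::ParserError
  nil
end

# Consume every complete frame from a growing buffer, leaving any partial
# frame in place for the next chunk.
def drain_frames(buffer)
  events = []
  while (boundary = buffer.index("\n\n"))
    parsed = parse_sse_frame(buffer.slice!(0, boundary + 2))
    events << parsed if parsed
  end
  events
end

buffer = sse_frame(event: "message", data: { delta: "Hi" }) +
         sse_frame(event: "done", data: {})
drain_frames(buffer).each { |event| p event }
```

Splitting on the blank line rather than on chunk boundaries matters because a network chunk may end in the middle of a frame; the leftover text simply stays in the buffer until the rest arrives.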

data/lib/generators/llm_meta_client/scaffold/templates/app/controllers/chats_controller.rb CHANGED

@@ -70,9 +70,10 @@ class ChatsController < ApplicationController
     initialize_history @chat&.ordered_by_descending_prompt_executions
     if params[:message].present?
-      # Validate generation settings before proceeding
+      # Validate generation settings before proceeding (raises if invalid).
+      # The streaming controller re-parses them from the URL.
       begin
-        generation_settings = generation_settings_param
+        generation_settings_param
       rescue InvalidGenerationSettingsError => e
         @error_message = e.message
         respond_to do |format|
@@ -92,15 +93,11 @@ class ChatsController < ApplicationController
       # Set active message UUID for highlighting in UI
       set_active_message_uuid(@prompt_execution&.execution_id || params.dig(:chat, :branch_from_uuid))
-      # Send to LLM and get assistant response
-      begin
-        @assistant_message = @chat.add_assistant_response(@prompt_execution, jwt_token, tool_ids: tool_ids_param, generation_settings: generation_settings)
-        # Generate chat title from the user's prompt (only if title is not yet set)
-        @chat.generate_title(params[:message], jwt_token)
-      rescue StandardError => e
-        Rails.logger.error "Error in chat response: #{e.class} - #{e.message}\n#{e.backtrace&.join("\n")}"
-        @error_message = "An error occurred while getting the response. Please try again."
-      end
+      # The assistant response is streamed by ChatStreamsController (SSE).
+      # The streaming bubble is rendered by create.turbo_stream.erb and opens
+      # the EventSource on connect; persistence + title gen happen at stream close.
+      @generation_settings_json = params[:generation_settings_json]
+      @tool_ids = Array(params[:tool_ids]).reject(&:blank?)
     end
     # Return turbo stream to render both messages
@@ -167,9 +164,10 @@ class ChatsController < ApplicationController
     initialize_history @chat&.ordered_by_descending_prompt_executions
     if params[:message].present?
-      # Validate generation settings before proceeding
+      # Validate generation settings before proceeding (raises if invalid).
+      # The streaming controller re-parses them from the URL.
       begin
-        generation_settings = generation_settings_param
+        generation_settings_param
       rescue InvalidGenerationSettingsError => e
         @error_message = e.message
         respond_to do |format|
@@ -189,13 +187,10 @@ class ChatsController < ApplicationController
       # Set active message UUID for highlighting in UI
       set_active_message_uuid(@prompt_execution&.execution_id || params.dig(:chat, :branch_from_uuid))
-      # Send to LLM and get assistant response
-      begin
-        @assistant_message = @chat.add_assistant_response(@prompt_execution, jwt_token, tool_ids: tool_ids_param, generation_settings: generation_settings)
-      rescue StandardError => e
-        Rails.logger.error "Error in chat response: #{e.class} - #{e.message}\n#{e.backtrace&.join("\n")}"
-        @error_message = "An error occurred while getting the response. Please try again."
-      end
+      # The assistant response is streamed by ChatStreamsController (SSE).
+      # See create action for details.
+      @generation_settings_json = params[:generation_settings_json]
+      @tool_ids = Array(params[:tool_ids]).reject(&:blank?)
     end
     # Return turbo stream to render both messages
@@ -207,10 +202,6 @@ class ChatsController < ApplicationController
   private
-  def tool_ids_param
-    params[:tool_ids].presence || []
-  end
   ALLOWED_GENERATION_KEYS = %w[temperature top_k top_p max_tokens repeat_penalty].freeze
   class InvalidGenerationSettingsError < StandardError; end
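
Both actions now normalize tool IDs with `Array(params[:tool_ids]).reject(&:blank?)` in place of the removed `tool_ids_param` helper. The sketch below shows that normalization in plain Ruby. `blank_value?` is a hypothetical stand-in for ActiveSupport's `blank?` so the example runs without Rails, and `normalize_tool_ids` is an illustrative name.

```ruby
# Plain-Ruby approximation of ActiveSupport's `blank?` for nil and strings.
def blank_value?(value)
  value.nil? || value.to_s.strip.empty?
end

# Coerce the raw parameter to an array and drop empty entries.
def normalize_tool_ids(raw)
  Array(raw).reject { |id| blank_value?(id) }
end

p normalize_tool_ids(nil)            # missing parameter
p normalize_tool_ids("")             # empty form field
p normalize_tool_ids(["", "7", "9"]) # checkbox group with a blank entry
p normalize_tool_ids("7")            # single scalar value
```

Every input comes back as an array, so downstream code such as `tool_ids.any?` can be called without first checking the parameter's type.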

data/lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/message_stream_controller.js ADDED

@@ -0,0 +1,124 @@
+import { Controller } from "@hotwired/stimulus"
+// Connects to data-controller="message-stream"
+// Opens an EventSource on connect, appends each delta to the content target,
+// closes on `done` / `error`.
+export default class extends Controller {
+  static targets = ["content"]
+  static values = { url: String }
+  connect() {
+    this.completed = false
+    this.source = new EventSource(this.urlValue)
+    this.source.addEventListener("message", (e) => this.#onDelta(e))
+    this.source.addEventListener("done", () => this.#onDone())
+    this.source.addEventListener("title", (e) => this.#onTitle(e))
+    this.source.addEventListener("saved", (e) => this.#onSaved(e))
+    this.source.addEventListener("tool_calls", (e) => this.#onToolCalls(e))
+    this.source.addEventListener("error", (e) => this.#onError(e))
+  }
+  disconnect() {
+    this.#close()
+  }
+  #onDelta(event) {
+    let delta
+    try { delta = JSON.parse(event.data).delta } catch { return }
+    if (!delta) return
+    this.contentTarget.append(delta)
+    this.#scrollToBottom()
+  }
+  #onTitle(event) {
+    try {
+      const data = JSON.parse(event.data)
+      if (data.turbo_stream && window.Turbo) {
+        window.Turbo.renderStreamMessage(data.turbo_stream)
+      }
+    } catch {}
+  }
+  #onSaved(event) {
+    try {
+      const data = JSON.parse(event.data)
+      this.element.dataset.savedExecutionId = data.execution_id
+      if (data.html) this.#swapInRenderedMessage(data.html)
+      // The saved bubble's content already includes any tool calls section in
+      // markdown; remove the transient tool-call bubbles so reload and live look
+      // the same.
+      this.#removeTransientToolCallBubbles()
+    } catch {}
+  }
+  #onToolCalls(event) {
+    try {
+      const data = JSON.parse(event.data)
+      if (!data.html) return
+      const wrapper = document.createElement("template")
+      wrapper.innerHTML = data.html.trim()
+      const bubble = wrapper.content.firstElementChild
+      if (!bubble) return
+      bubble.classList.add("tool-call-streaming")
+      this.element.parentNode.insertBefore(bubble, this.element)
+      this.#scrollToBottom()
+    } catch {}
+  }
+  #removeTransientToolCallBubbles() {
+    document.querySelectorAll(".tool-call-streaming").forEach((el) => el.remove())
+  }
+  // Swap the streaming bubble's role + content with the host-rendered _message
+  // partial output so any markdown / syntax highlighting / partial customizations
+  // applied on reload also apply right after the stream finishes. We don't
+  // replace the whole element — that would disconnect this controller and
+  // close the EventSource before `title` / `done` arrive.
+  #swapInRenderedMessage(html) {
+    const doc = new DOMParser().parseFromString(html, "text/html")
+    const newBubble = doc.querySelector(".message")
+    if (!newBubble) return
+    const newRole = newBubble.querySelector(".message-role")
+    const oldRole = this.element.querySelector(".message-role")
+    if (newRole && oldRole) oldRole.innerHTML = newRole.innerHTML
+    const newContent = newBubble.querySelector(".message-content")
+    if (newContent) this.contentTarget.innerHTML = newContent.innerHTML
+    this.element.classList.remove("streaming")
+    if (newBubble.id) this.element.id = newBubble.id
+  }
+  #onDone() {
+    this.completed = true
+    this.#close()
+  }
+  #onError(event) {
+    // EventSource fires onerror whenever the connection closes — including
+    // immediately after a clean `event: done`. Suppress those.
+    if (this.completed) {
+      this.#close()
+      return
+    }
+    let message = "Stream interrupted."
+    try { if (event.data) message = JSON.parse(event.data).message || message } catch {}
+    const errEl = document.createElement("p")
+    errEl.className = "stream-error"
+    errEl.textContent = `[error] ${message}`
+    this.contentTarget.appendChild(errEl)
+    this.#close()
+  }
+  #close() {
+    if (this.source && this.source.readyState !== EventSource.CLOSED) {
+      this.source.close()
+    }
+  }
+  #scrollToBottom() {
+    const chatMessages = document.getElementById("chat-messages")
+    if (chatMessages) chatMessages.scrollTop = chatMessages.scrollHeight
+  }
+}

data/lib/generators/llm_meta_client/scaffold/templates/app/models/chat.rb CHANGED

@@ -68,6 +68,37 @@ class Chat < ApplicationRecord
     new_message
   end
+  # Stream the assistant response from the LLM. Yields each parsed SSE event.
+  # Returns the assembled content (with markdown "Tool calls" section appended
+  # if tools fired). Caller is responsible for persistence.
+  def stream_assistant_response(prompt_execution, jwt_token, tool_ids: [], generation_settings: {}, &block)
+    summarized_context, prompt = build_streaming_context(prompt_execution, jwt_token, with_tools: tool_ids.any?)
+    LlmMetaClient::ServerQuery.new.stream(
+      jwt_token,
+      prompt_execution.llm_uuid,
+      prompt_execution.model,
+      summarized_context,
+      prompt,
+      tool_ids: tool_ids,
+      generation_settings: generation_settings,
+      &block
+    )
+  end
+  # Persist the streamed assistant response. Skips persistence if content is blank.
+  def finalize_streamed_response(prompt_execution, content, jwt_token)
+    return nil if content.blank?
+    prompt_execution.update!(
+      llm_platform: resolve_llm_type(prompt_execution.llm_uuid, jwt_token),
+      response: content
+    )
+    messages.create!(
+      role: "assistant",
+      prompt_navigator_prompt_execution: prompt_execution
+    )
+  end
   # Get all messages in order
   def ordered_messages
     messages
@@ -115,31 +146,43 @@ class Chat < ApplicationRecord
   # Send messages to LLM and get response
   def send_to_llm(prompt_execution, jwt_token, tool_ids: [], generation_settings: {})
-    llm_uuid = prompt_execution.llm_uuid
-    model = prompt_execution.model
+    summarized_context, prompt = build_streaming_context(prompt_execution, jwt_token, with_tools: tool_ids.any?)
+    LlmMetaClient::ServerQuery.new.call(
+      jwt_token,
+      prompt_execution.llm_uuid,
+      prompt_execution.model,
+      summarized_context,
+      prompt,
+      tool_ids: tool_ids,
+      generation_settings: generation_settings
+    )
+  end
-    # Get LLM options
+  # Build the (summarized_context, prompt) tuple for an LLM call.
+  # Shared by both the synchronous and streaming paths.
+  def build_streaming_context(prompt_execution, jwt_token, with_tools: false)
     llm_options = LlmMetaClient::ServerResource.available_llm_options(jwt_token)
-    # Error if no LLM is available
     raise LlmMetaClient::Exceptions::OllamaUnavailableError, "No LLM available" if llm_options.empty?
-    # Build prompt and context from direct lineage via PromptExecution
     last_msg = ordered_messages.last
     pe = last_msg.prompt_navigator_prompt_execution
     prompt = { role: last_msg.role, prompt: pe.prompt }
     context = pe.build_context(limit: Rails.configuration.summarize_conversation_count)
-    if context.empty?
-      summarized_context = "No context available."
-    else
-      summarized_context = LlmMetaClient::ServerQuery.new.call(jwt_token, llm_uuid, model, context, "Please summarize the context")
-    end
+    summarized_context =
+      if context.empty?
+        "No context available."
+      else
+        LlmMetaClient::ServerQuery.new.call(
+          jwt_token, prompt_execution.llm_uuid, prompt_execution.model,
+          context, "Please summarize the context"
+        )
+      end
     summarized_context += "Additional prompt: Responses from the assistant must consist solely of the response body."
+    if with_tools
+      summarized_context += " If a tool call returns an error, do not give up silently — explain the error and what likely caused it (e.g. an invalid argument value)."
+    end
-    # Send chat request using LlmMetaClient::ServerQuery
-    LlmMetaClient::ServerQuery.new.call(jwt_token, llm_uuid, model, summarized_context, prompt, tool_ids: tool_ids, generation_settings: generation_settings)
+    [ summarized_context, prompt ]
   end
 end
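
`stream_assistant_response` yields each event to the caller's block and returns the assembled content, leaving persistence to the caller. The sketch below illustrates that accumulate-then-persist pattern with the upstream stream replaced by an in-memory array. `assemble_stream`, `StreamError`, and the `saved` array are hypothetical stand-ins, and tool-call handling is left out for brevity.

```ruby
# Stand-in for the gem's ServerError, so the sketch runs on its own.
class StreamError < StandardError; end

# Accumulate delta text, yield message events to the caller, absorb "done",
# and raise on "error". Returns the assembled content.
def assemble_stream(events)
  assembled = +""
  events.each do |event|
    case event[:event]
    when "message"
      assembled << event[:data]["delta"].to_s
      yield event if block_given?
    when "done"
      # End-of-stream marker: nothing to do.
    when "error"
      raise StreamError, event[:data]["message"].to_s
    else
      yield event if block_given?
    end
  end
  assembled
end

saved = [] # stands in for the messages table

good = [
  { event: "message", data: { "delta" => "Hel" } },
  { event: "message", data: { "delta" => "lo" } },
  { event: "done", data: {} }
]
content = assemble_stream(good) { |event| print event[:data]["delta"] }
puts
saved << content unless content.empty? # persist only after a complete stream

bad = [
  { event: "message", data: { "delta" => "par" } },
  { event: "error", data: { "message" => "rate limited" } }
]
begin
  saved << assemble_stream(bad)
rescue StreamError => e
  puts "not saved: #{e.message}"
end

p saved
```

Because the append to `saved` happens only after `assemble_stream` returns, a stream that raises partway through leaves nothing behind, which is the behavior the changelog describes for mid-stream failures.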

data/lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_chat_sidebar.html.erb ADDED

@@ -0,0 +1,8 @@
+<div id="chat-sidebar">
+  <%%= chat_list(
+    ->(id) { chat_path(id) },
+    active_uuid: chat&.uuid,
+    download_csv_path: ->(id) { download_csv_chat_path(id) },
+    download_all_csv_path: download_all_csv_chats_path
+  ) %>
+</div>

data/lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_streaming_message.html.erb ADDED

@@ -0,0 +1,12 @@
+<%% stream_url = chat_stream_path(
+  chat_id: chat.uuid,
+  execution_id: prompt_execution.execution_id,
+  generation_settings_json: @generation_settings_json.presence,
+  tool_ids: (@tool_ids.presence || nil)
+) %>
+<div class="message assistant streaming"
+     data-controller="message-stream"
+     data-message-stream-url-value="<%%= stream_url %>">
+  <div class="message-role">🤖 streaming…</div>
+  <div class="message-content" data-message-stream-target="content"></div>
+</div>

data/lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_tool_call_message.html.erb ADDED

@@ -0,0 +1,22 @@
+<div class="message assistant tool-call">
+  <div class="message-role">🛠 Tool calls</div>
+  <div class="message-content">
+    <ul>
+      <%% tool_calls.each do |tc| %>
+        <%% name = tc["name"] || tc[:name] || "(unknown)" %>
+        <%% args = tc["arguments"] || tc[:arguments] %>
+        <%% args_str = case args
+          when Hash, Array then args.to_json
+          when nil, "" then nil
+          else args.to_s
+        end %>
+        <li>
+          <code><%%= name %></code>
+          <%% if args_str %>
+            — <code><%%= args_str %></code>
+          <%% end %>
+        </li>
+      <%% end %>
+    </ul>
+  </div>
+</div>
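
The partial turns each tool call's `arguments` into a display string: hashes and arrays become JSON, `nil` and the empty string are omitted, and anything else goes through `to_s`. The sketch below isolates that branch as a plain Ruby method; `display_args` is a hypothetical name and the sample arguments are made up for illustration.

```ruby
require "json"

# Mirrors the `case args` expression in _tool_call_message.html.erb.
# Returns nil when there is nothing worth showing.
def display_args(args)
  case args
  when Hash, Array then args.to_json
  when nil, "" then nil
  else args.to_s
  end
end

# Illustrative inputs covering each branch.
[{ "city" => "Oslo" }, ["a", 1], nil, "", 42, "raw text"].each do |args|
  p display_args(args)
end
```

Returning `nil` for empty arguments lets the template skip the argument `<code>` element entirely with a simple `if args_str` check.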

data/lib/generators/llm_meta_client/scaffold/templates/app/views/chats/create.turbo_stream.erb CHANGED

@@ -5,10 +5,10 @@
 <%% # User message is already shown by JavaScript on form submit %>
 <%% # Only render assistant message here %>
-<%% # Render assistant message if available %>
-<%% if @assistant_message %>
+<%%# Render streaming assistant placeholder; the message-stream Stimulus controller opens an EventSource and appends deltas as they arrive. %>
+<%% if @prompt_execution && @error_message.blank? %>
 <%%= turbo_stream.append "messages-list" do %>
-  <%%= render partial: "chats/message", locals: { message: @assistant_message } %>
+  <%%= render partial: "chats/streaming_message", locals: { chat: @chat, prompt_execution: @prompt_execution } %>
 <%% end %>
 <%% end %>
@@ -25,9 +25,7 @@
 <%% # Update chat sidebar %>
 <%%= turbo_stream.replace "chat-sidebar" do %>
-  <div id="chat-sidebar">
-    <%%= chat_list(->(id) { chat_path(id) }, active_uuid: @chat&.uuid, download_csv_path: ->(id) { download_csv_chat_path(id) }, download_all_csv_path: download_all_csv_chats_path) %>
-  </div>
+  <%%= render partial: "chats/chat_sidebar", locals: { chat: @chat } %>
 <%% end %>
 <%% # Update history sidebar - replace entire content to ensure update %>

data/lib/generators/llm_meta_client/scaffold/templates/app/views/chats/update.turbo_stream.erb CHANGED

@@ -5,10 +5,10 @@
 <%% # User message is already shown by JavaScript on form submit %>
 <%% # Only render assistant message here %>
-<%% # Render assistant message if available %>
-<%% if @assistant_message %>
+<%%# Render streaming assistant placeholder; the message-stream Stimulus controller opens an EventSource and appends deltas as they arrive. %>
+<%% if @prompt_execution && @error_message.blank? %>
 <%%= turbo_stream.append "messages-list" do %>
-  <%%= render partial: "chats/message", locals: { message: @assistant_message } %>
+  <%%= render partial: "chats/streaming_message", locals: { chat: @chat, prompt_execution: @prompt_execution } %>
 <%% end %>
 <%% end %>

data/lib/llm_meta_client/server_query.rb CHANGED

@@ -1,5 +1,47 @@
+require "net/http"
+require "uri"
+require "json"
 module LlmMetaClient
   class ServerQuery
+    # Stream LLM responses incrementally. Yields each content delta event
+    # ({ event: "message", data: { "delta" => "..." } }) and any tool_calls
+    # event ({ event: "tool_calls", data: { "tool_calls" => [...] } }) to the
+    # caller's block. Upstream "done" markers are absorbed (end-of-stream is
+    # signaled by the block returning); upstream "error" events raise ServerError.
+    # Returns the final assistant content. If tool calls fired, the returned
+    # string mirrors the synchronous #call format (response + markdown
+    # "Tool calls" section appended) so persistence stays consistent.
+    def stream(id_token, api_key_uuid, model_id, context, user_content, tool_ids: [], generation_settings: {})
+      context_and_user_content = "Context:#{context}, User Prompt: #{user_content}"
+      debug_log "Streaming request to LLM: \n===>\n#{context_and_user_content}\n===>"
+      body = { prompt: context_and_user_content }
+      body[:tool_ids] = tool_ids if tool_ids.present?
+      body[:generation_settings] = generation_settings if generation_settings.present?
+      assembled = +""
+      collected_tool_calls = []
+      request_stream(api_key_uuid, id_token, model_id, body) do |event|
+        case event[:event]
+        when "message"
+          assembled << event[:data]["delta"].to_s
+          yield event if block_given?
+        when "tool_calls"
+          collected_tool_calls = event[:data]["tool_calls"] || []
+          yield event if block_given?
+        when "done"
+          # End-of-stream marker from upstream; no-op here.
+        when "error"
+          raise Exceptions::ServerError, format_stream_error(event[:data])
+        else
+          yield event if block_given?
+        end
+      end
+      collected_tool_calls.any? ? combine_with_tool_calls(assembled, collected_tool_calls) : assembled
+    end
     def call(id_token, api_key_uuid, model_id, context, user_content, tool_ids: [], generation_settings: {})
       debug_log "Context: #{context}"
       context_and_user_content = "Context:#{context}, User Prompt: #{user_content}"
@@ -7,13 +49,17 @@ module LlmMetaClient
       response = request(api_key_uuid, id_token, model_id, context_and_user_content, tool_ids, generation_settings)
-      raise Exceptions::ServerError, "LLM server returned HTTP #{response.code}" unless response.success?
+      unless response.success?
+        raise Exceptions::ServerError, build_error_message(response.code.to_i, response.parsed_response)
+      end
       response_body = response.parsed_response
       raise Exceptions::InvalidResponseError, "LLM server returned non-JSON response" unless response_body.is_a?(Hash)
       content = response_body.dig("response", "message") || ""
+      tool_calls = response_body.dig("response", "tool_calls")
+      content = combine_with_tool_calls(content, tool_calls) if tool_calls.is_a?(Array) && tool_calls.any?
       raise Exceptions::EmptyResponseError, "LLM server returned empty response" if content.blank?
@@ -28,6 +74,28 @@ module LlmMetaClient
       Rails.logger.info(message) if Rails.env.development?
     end
+    def combine_with_tool_calls(message, tool_calls)
+      tool_section = format_tool_calls(tool_calls)
+      return tool_section if message.blank?
+      "#{message}\n\n---\n\n#{tool_section}"
+    end
+    def format_tool_calls(tool_calls)
+      lines = [ "**Tool calls**", "" ]
+      tool_calls.each do |tc|
+        name = tc["name"] || tc[:name] || "(unknown)"
+        args = tc["arguments"] || tc[:arguments]
+        args_str =
+          case args
+          when Hash, Array then args.to_json
+          when nil then ""
+          else args.to_s
+          end
+        lines << (args_str.empty? ? "- `#{name}`" : "- `#{name}` — `#{args_str}`")
+      end
+      lines.join("\n")
+    end
     def request(api_key_uuid, id_token, model_id, user_content, tool_ids, generation_settings)
       headers = { "Content-Type" => "application/json" }
       headers["Authorization"] = "Bearer #{id_token}" if id_token.present?
@@ -47,5 +115,93 @@ module LlmMetaClient
     def url(api_key_uuid, model_id)
       "#{Rails.application.config.llm_service_base_url}/api/llm_api_keys/#{api_key_uuid}/models/#{model_id}/chats"
     end
+    def stream_url(api_key_uuid, model_id)
+      "#{Rails.application.config.llm_service_base_url}/api/llm_api_keys/#{api_key_uuid}/models/#{model_id}/chat_streams"
+    end
+    def request_stream(api_key_uuid, id_token, model_id, body)
+      uri = URI(stream_url(api_key_uuid, model_id))
+      Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https", read_timeout: 600) do |http|
+        req = Net::HTTP::Post.new(uri)
+        req["Content-Type"] = "application/json"
+        req["Accept"] = "text/event-stream"
+        req["Authorization"] = "Bearer #{id_token}" if id_token.present?
+        req.body = body.to_json
+        http.request(req) do |response|
+          unless response.is_a?(Net::HTTPSuccess)
+            body = JSON.parse(response.read_body.to_s) rescue nil
+            raise Exceptions::ServerError, build_error_message(response.code.to_i, body)
+          end
+          buffer = +""
+          response.read_body do |chunk|
+            buffer << chunk
+            while (boundary = buffer.index("\n\n"))
+              raw_event = buffer.slice!(0, boundary + 2)
+              parsed = parse_sse_event(raw_event)
+              yield parsed if parsed
+            end
+          end
+        end
+      end
+    end
+    # Format an `event: error` SSE payload from llm_meta_server into a
+    # user-facing string. Payload shape: { "code" => "rate_limit", "message" => "..." }
+    def format_stream_error(data)
+      code = data["code"]
+      message = data["message"]
+      case code
+      when "rate_limit"
+        suffix = message.present? ? ": #{message}" : ""
+        "Rate limit exceeded — check your provider plan or retry shortly#{suffix}"
+      when "api_key_required"
+        message.presence || "API key required for this model"
+      else
+        message.presence || "Upstream stream error"
+      end
+    end
+    # Turn a non-success HTTP response from llm_meta_server into a user-facing
+    # error string. The server returns JSON like
+    #   { "error" => "LLM API Rate limit exceeded", "message" => "Too many requests" }
+    # for known error classes; fall back to a generic message otherwise.
+    def build_error_message(status_code, body)
+      if body.is_a?(Hash)
+        err = body["error"]
+        msg = body["message"]
+        return "#{err}: #{msg}" if err.present? && msg.present?
+        return err if err.present?
+        return msg if msg.present?
+      end
+      case status_code
+      when 429 then "Rate limit exceeded — check your provider plan or retry shortly (HTTP 429)"
+      when 401, 403 then "LLM service rejected the request (HTTP #{status_code}) — check your API key"
+      when 502, 503, 504 then "LLM service is unavailable (HTTP #{status_code})"
+      else "LLM server returned HTTP #{status_code}"
+      end
+    end
+    def parse_sse_event(raw)
+      event_name = "message"
+      data_lines = []
+      raw.each_line(chomp: true) do |line|
+        next if line.empty?
+        if line.start_with?("event:")
+          event_name = line.sub(/^event:\s*/, "")
+        elsif line.start_with?("data:")
+          data_lines << line.sub(/^data:\s*/, "")
+        end
+      end
+      return nil if data_lines.empty?
+      data = JSON.parse(data_lines.join("\n"))
+      { event: event_name, data: data }
+    rescue JSON::ParserError
+      nil
+    end
   end
 end
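
Note on the streaming hunk above: `request_stream` buffers chunks and cuts the buffer at each blank line (`"\n\n"`), then hands each block to `parse_sse_event`. The sketch below reproduces that technique outside Rails so it can be run in isolation. It is a minimal approximation, not the gem's code: `each_sse_event` is a hypothetical helper, and the `delta` and `done` event names and the chunk contents are made up for illustration.

```ruby
require "json"

# Parse one raw SSE block into { event:, data: }, or nil when the block
# carries no data lines or the data is not valid JSON.
def parse_sse_event(raw)
  event_name = "message" # default event name when no `event:` line is present
  data_lines = []
  raw.each_line(chomp: true) do |line|
    if line.start_with?("event:")
      event_name = line.sub(/\Aevent:\s*/, "")
    elsif line.start_with?("data:")
      data_lines << line.sub(/\Adata:\s*/, "")
    end
  end
  return nil if data_lines.empty?
  { event: event_name, data: JSON.parse(data_lines.join("\n")) }
rescue JSON::ParserError
  nil
end

# Hypothetical helper: feed network-sized chunks in, yield complete events.
# An event may be split across chunks, so nothing is parsed until the
# blank-line terminator has arrived in the buffer.
def each_sse_event(chunks)
  buffer = +""
  chunks.each do |chunk|
    buffer << chunk
    while (boundary = buffer.index("\n\n"))
      parsed = parse_sse_event(buffer.slice!(0, boundary + 2))
      yield parsed if parsed
    end
  end
end

# Made-up stream: the first event and the second terminator are split
# across chunk boundaries.
chunks = [
  "event: delta\ndata: {\"text\":",
  "\"Hel\"}\n\nevent: delta\ndata: {\"text\":\"lo\"}\n",
  "\nevent: done\ndata: {}\n\n"
]

events = []
each_sse_event(chunks) { |event| events << event }
events.each { |event| puts event.inspect }
```

Two behaviours are worth keeping in mind when reviewing the hunk: blocks whose data is not valid JSON are dropped silently rather than raised, and `\s*` strips all leading whitespace after the colon, whereas the SSE specification removes only a single leading space.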

data/lib/llm_meta_client/version.rb CHANGED

@@ -1,3 +1,3 @@
 module LlmMetaClient
-  VERSION = "1.0.2"
+  VERSION = "1.3.0"
 end

metadata CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm_meta_client
 version: !ruby/object:Gem::Version
-  version: 1.0.2
+  version: 1.3.0
 platform: ruby
 authors:
 - dhq_boiler
@@ -47,16 +47,22 @@ dependencies:
   name: prompt_navigator
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
         version: '1.0'
+    - - "<"
+      - !ruby/object:Gem::Version
+        version: '3.0'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
         version: '1.0'
+    - - "<"
+      - !ruby/object:Gem::Version
+        version: '3.0'
 - !ruby/object:Gem::Dependency
   name: chat_manager
   requirement: !ruby/object:Gem::Requirement
@@ -103,18 +109,23 @@ files:
 - lib/generators/llm_meta_client/authentication/templates/db/migrate/create_users.rb
 - lib/generators/llm_meta_client/scaffold/scaffold_generator.rb
 - lib/generators/llm_meta_client/scaffold/templates/app/controllers/api/mcp_servers_controller.rb
+- lib/generators/llm_meta_client/scaffold/templates/app/controllers/chat_streams_controller.rb
 - lib/generators/llm_meta_client/scaffold/templates/app/controllers/chats_controller.rb
 - lib/generators/llm_meta_client/scaffold/templates/app/controllers/prompts_controller.rb
 - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/chat_title_edit_controller.js
 - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/chats_form_controller.js
 - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/generation_settings_controller.js
 - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/llm_selector_controller.js
+- lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/message_stream_controller.js
 - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/tool_selector_controller.js
 - lib/generators/llm_meta_client/scaffold/templates/app/javascript/popover.js
 - lib/generators/llm_meta_client/scaffold/templates/app/models/chat.rb
 - lib/generators/llm_meta_client/scaffold/templates/app/models/message.rb
+- lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_chat_sidebar.html.erb
 - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_message.html.erb
 - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_messages_list.html.erb
+- lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_streaming_message.html.erb
+- lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_tool_call_message.html.erb
 - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/create.turbo_stream.erb
 - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/edit.html.erb
 - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/new.html.erb
@@ -163,7 +174,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 4.0.3
+rubygems_version: 3.6.9
 specification_version: 4
 summary: A Rails Engine for integrating multiple LLM providers into your application.
 test_files: []
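
Note on the `prompt_navigator` requirement change above: replacing `~> 1.0` with `>= 1.0` plus `< 3.0` widens the accepted range to include the 2.x series. A quick way to check this is with RubyGems' own `Gem::Requirement` and `Gem::Version` classes; the version numbers probed below are arbitrary samples, not actual `prompt_navigator` releases.

```ruby
require "rubygems"

# The constraint in 1.0.2 and the pair of constraints in 1.3.0.
old_req = Gem::Requirement.new("~> 1.0")
new_req = Gem::Requirement.new(">= 1.0", "< 3.0")

# Arbitrary sample versions, chosen to sit on either side of each bound.
%w[0.9 1.0 1.9.5 2.0 2.4.1 3.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-6s old=%-5s new=%s",
              v, old_req.satisfied_by?(version), new_req.satisfied_by?(version))
end
```

Only the 2.x samples differ between the two requirements: the old pessimistic constraint rejects them, and the new pair accepts them while still excluding 3.0 and later.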