llm_meta_client 1.0.1 → 1.2.0

This diff compares publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between versions as they appear in the public registry.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 95bb1f421088cc0c287b4f79e961455a1a5275b860b372fb23523ddc8781d37a
- data.tar.gz: d4f92a150de0d9996d3ea6c54e69fe24989763f4bd359ce46a6b7ff69d36ecb7
+ metadata.gz: 5529e0613f2103802abfbf448ef5d030f63a7788e8fde7440d732fa94ced7378
+ data.tar.gz: 38223a0a3ff727e9a649fc38db922e5f89aaaa858118464ef9d410a727fff5f1
  SHA512:
- metadata.gz: 480919bad702190fec4166d8a2659121ff62b1880123e6db168d9a1c8a8b7f80f23ca175e2e6f4a31b96e6bde97fd0723d95caa71439845a5017c351c18e373e
- data.tar.gz: c2600201eccae124c9929eabcd8b7d755b7dbac4092cd5441c55d1caf79c839af8f0b41b49d7bf1c2e6a324e33c6bd1957d29a1ac3e837a7ab6924d727e8ae91
+ metadata.gz: 2d86c22b05ff9991ff7087a370de97fc90adaf766f2a13745511d2ce7beb129828e169cbeb601b9d2996836f62fb79ca398a66a2e1fbd09a20cf78c148d5fb8a
+ data.tar.gz: 55cc853db63cca50ace6693e4b4a124ee2923371fd6709b2ef1d92e3fdcb4bc62ad433533c2e832dfff132d6ab91bf608847d583d0fdc1104beb38c54597db8d
data/CHANGELOG.md CHANGED
@@ -5,6 +5,55 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [1.2.0] - 2026-05-10
+
+ ### Added
+
+ - End-to-end SSE streaming for chat completions:
+   - `ServerQuery#stream` consumes SSE from the new `chat_streams` endpoint on `llm_meta_server` and yields parsed events. Returns the assembled content; absorbs upstream `done` markers and raises `ServerError` on upstream `error` events.
+   - `Chat#stream_assistant_response` and `Chat#finalize_streamed_response` for streaming generation with persistence at stream close (assistant message saved only on success — disconnects mid-stream don't persist).
+   - Scaffold now generates `ChatStreamsController`, `_streaming_message.html.erb` partial, `message_stream_controller.js` Stimulus controller, and a shared `_chat_sidebar.html.erb` partial. Routes add `resource :stream` nested under `chats`.
+   - Streaming bubble swaps to the host-rendered `_message` partial on save, so any markdown / syntax-highlighting customization in the host's `_message.html.erb` applies post-stream.
+   - `event: title` SSE event includes a turbo_stream snippet that updates the chat-sidebar in place when a brand-new chat gets its auto-generated title.
+
+ ### Changed
+
+ - `ServerQuery` error messages parse the JSON response body from `llm_meta_server` and surface friendlier text (rate limits, auth errors, upstream unavailable) instead of bare `HTTP <code>`.
+ - Streaming endpoint v1 does not pass `tool_ids`. The synchronous `#call` path is unchanged and still supports tool calls.
+
+ ### Notes
+
+ - The streaming endpoint requires `llm_meta_server` with the matching `chat_streams` route. SSE delivery through reverse proxies needs `proxy_buffering off` (nginx) or `flushpackets=on` plus `SetEnv no-gzip 1` (Apache).
+
+ ## [1.1.1] - 2026-04-22
+
+ ### Added
+
+ - `ServerQuery#call` now surfaces tool calls from the LLM server response. When the response includes a `tool_calls` array, a markdown-formatted "Tool calls" section (name + JSON args) is appended to the returned content (separated by a horizontal rule). This lets host apps display which tools the LLM invoked without any schema or view changes; existing markdown renderers pick it up automatically. Previously, tool calls were silently dropped.
+
+ ## [1.1.0] - 2026-04-22
+
+ ### Changed
+
+ - Widen `prompt_navigator` dependency constraint from `~> 1.0` to `>= 1.0, < 3.0` so host apps can opt into `prompt_navigator` 2.0 (which requires Ruby 3.4.9+ and adds `PromptExecution.delete_set!`). Existing hosts on `prompt_navigator` 1.x keep resolving unchanged.
+
+ ## [1.0.2] - 2026-03-27
+
+ ### Added
+
+ - Add client-side validation for Generation Settings JSON
+
+ ## [1.0.1] - 2026-03-25
+
+ ### Fixed
+
+ - Fix: normalize Ollama llm_type in server resource options
+ - Fix: update branch_from_uuid after LLM response
+
+ ### Changed
+
+ - Refactor: move llm_uuid and model from Chat to PromptExecution
+
  ## [1.0.0] - 2026-03-25

  ### Changed
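
As a quick orientation for the 1.2.0 entries above, here is a minimal, hedged sketch of a host app consuming the new streaming API. The signature and event shape come from the `server_query.rb` changes later in this diff; every value assigned below is a placeholder, not part of the gem.

```ruby
# Sketch only — jwt_token / llm_uuid / model are hypothetical placeholders.
jwt_token = "<id token>"
llm_uuid  = "<llm api key uuid>"
model     = "<model id>"

query = LlmMetaClient::ServerQuery.new
assembled = query.stream(jwt_token, llm_uuid, model, "No context available.", "Hello!",
                         generation_settings: { temperature: 0.7 }) do |event|
  # Content deltas arrive as { event: "message", data: { "delta" => "..." } }.
  print event[:data]["delta"] if event[:event] == "message"
end
# `stream` returns the fully assembled content; upstream `done` markers are
# absorbed, and upstream `error` events raise LlmMetaClient::Exceptions::ServerError.
```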
@@ -62,6 +62,21 @@
  }
  }

+ .generation-settings-json-input--invalid {
+   border-color: #ef4444;
+
+   &:focus {
+     border-color: #ef4444;
+     box-shadow: 0 0 0 3px rgba(239, 68, 68, 0.1);
+   }
+ }
+
+ .generation-settings-error {
+   font-size: 12px;
+   color: #ef4444;
+   margin-top: 4px;
+ }
+
  .generation-settings-hint {
    font-size: 11px;
    color: #9ca3af;
@@ -19,6 +19,7 @@ module LlmMetaClient

    def create_controllers
      template "app/controllers/chats_controller.rb"
+     template "app/controllers/chat_streams_controller.rb"
      template "app/controllers/prompts_controller.rb"
      template "app/controllers/api/mcp_servers_controller.rb"
    end
@@ -29,6 +30,8 @@ module LlmMetaClient
      template "app/views/chats/create.turbo_stream.erb"
      template "app/views/chats/update.turbo_stream.erb"
      template "app/views/chats/_message.html.erb"
+     template "app/views/chats/_streaming_message.html.erb"
+     template "app/views/chats/_chat_sidebar.html.erb"
      template "app/views/chats/_messages_list.html.erb"
      template "app/views/shared/_family_field.html.erb"
      template "app/views/shared/_api_key_field.html.erb"
@@ -46,6 +49,7 @@ module LlmMetaClient
      template "app/javascript/controllers/chat_title_edit_controller.js"
      template "app/javascript/controllers/tool_selector_controller.js"
      template "app/javascript/controllers/generation_settings_controller.js"
+     template "app/javascript/controllers/message_stream_controller.js"
      copy_file "app/javascript/popover.js"
    end

@@ -73,6 +77,7 @@ module LlmMetaClient
        patch :update_title
        get :download_csv
      end
+     resource :stream, only: [ :show ], controller: "chat_streams"
    end
    resources :prompts, only: [ :show ]

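For reference, the nested singular resource added above maps to the endpoint the streaming partial calls. This is a sketch of standard Rails routing behavior, not text quoted from the gem:

```ruby
# Routing sketch (standard Rails nesting):
#
#   resources :chats do
#     resource :stream, only: [ :show ], controller: "chat_streams"
#   end
#
# produces:
#   GET /chats/:chat_id/stream  =>  ChatStreamsController#show
#
# and the path helper used by _streaming_message.html.erb later in this diff:
#   chat_stream_path(chat_id: chat.uuid, execution_id: prompt_execution.execution_id)
#   # => "/chats/<chat uuid>/stream?execution_id=<execution id>"
```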
@@ -0,0 +1,92 @@
+ class ChatStreamsController < ApplicationController
+   include ActionController::Live
+
+   skip_before_action :authenticate_user!, raise: false
+   skip_before_action :verify_authenticity_token
+
+   def show
+     response.headers["Content-Type"] = "text/event-stream"
+     response.headers["Cache-Control"] = "no-cache"
+     response.headers["X-Accel-Buffering"] = "no"
+
+     chat = find_chat
+     prompt_execution = PromptNavigator::PromptExecution.find_by!(execution_id: params[:execution_id])
+     unless chat.messages.exists?(prompt_navigator_prompt_execution_id: prompt_execution.id)
+       raise ActiveRecord::RecordNotFound
+     end
+
+     jwt_token = current_user.id_token if user_signed_in?
+     generation_settings = parse_generation_settings(params[:generation_settings_json])
+
+     assembled = chat.stream_assistant_response(prompt_execution, jwt_token, generation_settings: generation_settings) do |event|
+       forward(event)
+     end
+
+     if assembled.present?
+       assistant_message = chat.finalize_streamed_response(prompt_execution, assembled, jwt_token)
+       if assistant_message
+         forward(event: "saved", data: {
+           message_id: assistant_message.id,
+           execution_id: prompt_execution.execution_id,
+           html: view_context.render(partial: "chats/message", locals: { message: assistant_message })
+         })
+       end
+
+       title_before = chat.title
+       chat.generate_title(prompt_execution.prompt, jwt_token)
+       if chat.reload.title.present? && chat.title != title_before
+         forward(event: "title", data: {
+           title: chat.title,
+           chat_uuid: chat.uuid,
+           turbo_stream: render_sidebar_update(chat)
+         })
+       end
+     end
+
+     forward(event: "done", data: {})
+   rescue ActionController::Live::ClientDisconnected
+     Rails.logger.info "[ChatStream] client disconnected"
+   rescue ActiveRecord::RecordNotFound
+     forward(event: "error", data: { code: "not_found", message: "Chat or prompt execution not found" }) rescue nil
+   rescue StandardError => e
+     Rails.logger.error "[ChatStream] #{e.class}: #{e.message}"
+     forward(event: "error", data: { code: e.class.name, message: e.message }) rescue nil
+   ensure
+     response.stream.close
+   end
+
+   private
+
+   def find_chat
+     scope = user_signed_in? ? current_user.chats : Chat.where(user_id: nil)
+     scope.find_by!(uuid: params[:chat_id])
+   end
+
+   def forward(event)
+     name = event[:event]
+     payload = event[:data].to_json
+     if name.nil? || name == "message"
+       response.stream.write "data: #{payload}\n\n"
+     else
+       response.stream.write "event: #{name}\ndata: #{payload}\n\n"
+     end
+   end
+
+   def parse_generation_settings(raw)
+     return {} if raw.blank?
+     parsed = JSON.parse(raw)
+     parsed.is_a?(Hash) ? parsed.symbolize_keys : {}
+   rescue JSON::ParserError
+     {}
+   end
+
+   def render_sidebar_update(chat)
+     initialize_chat(user_signed_in? ? current_user.chats : nil)
+     add_chat(chat)
+     view_context.turbo_stream.replace(
+       "chat-sidebar",
+       partial: "chats/chat_sidebar",
+       locals: { chat: chat }
+     ).to_s
+   end
+ end
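
To make the SSE framing in `#forward` concrete, a small illustration of the bytes it writes per event type. This is illustration only, not code from the gem; the payloads are invented and `$stdout` stands in for `response.stream`.

```ruby
stream = $stdout # stand-in for response.stream

# Unnamed events (event[:event] nil or "message") omit the event: field, so
# the browser's EventSource delivers them to the plain `message` listener:
stream.write "data: {\"delta\":\"Hel\"}\n\n"

# Named events ("saved", "title", "error", "done") carry an explicit event: field:
stream.write "event: done\ndata: {}\n\n"
```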
@@ -70,9 +70,10 @@ class ChatsController < ApplicationController
      initialize_history @chat&.ordered_by_descending_prompt_executions

      if params[:message].present?
-       # Validate generation settings before proceeding
+       # Validate generation settings before proceeding (raises if invalid).
+       # The streaming controller re-parses them from the URL.
        begin
-         generation_settings = generation_settings_param
+         generation_settings_param
        rescue InvalidGenerationSettingsError => e
          @error_message = e.message
          respond_to do |format|
@@ -92,15 +93,10 @@ class ChatsController < ApplicationController
        # Set active message UUID for highlighting in UI
        set_active_message_uuid(@prompt_execution&.execution_id || params.dig(:chat, :branch_from_uuid))

-       # Send to LLM and get assistant response
-       begin
-         @assistant_message = @chat.add_assistant_response(@prompt_execution, jwt_token, tool_ids: tool_ids_param, generation_settings: generation_settings)
-         # Generate chat title from the user's prompt (only if title is not yet set)
-         @chat.generate_title(params[:message], jwt_token)
-       rescue StandardError => e
-         Rails.logger.error "Error in chat response: #{e.class} - #{e.message}\n#{e.backtrace&.join("\n")}"
-         @error_message = "An error occurred while getting the response. Please try again."
-       end
+       # The assistant response is streamed by ChatStreamsController (SSE).
+       # The streaming bubble is rendered by create.turbo_stream.erb and opens
+       # the EventSource on connect; persistence + title gen happen at stream close.
+       @generation_settings_json = params[:generation_settings_json]
      end

      # Return turbo stream to render both messages
@@ -167,9 +163,10 @@ class ChatsController < ApplicationController
      initialize_history @chat&.ordered_by_descending_prompt_executions

      if params[:message].present?
-       # Validate generation settings before proceeding
+       # Validate generation settings before proceeding (raises if invalid).
+       # The streaming controller re-parses them from the URL.
        begin
-         generation_settings = generation_settings_param
+         generation_settings_param
        rescue InvalidGenerationSettingsError => e
          @error_message = e.message
          respond_to do |format|
@@ -189,13 +186,9 @@ class ChatsController < ApplicationController
        # Set active message UUID for highlighting in UI
        set_active_message_uuid(@prompt_execution&.execution_id || params.dig(:chat, :branch_from_uuid))

-       # Send to LLM and get assistant response
-       begin
-         @assistant_message = @chat.add_assistant_response(@prompt_execution, jwt_token, tool_ids: tool_ids_param, generation_settings: generation_settings)
-       rescue StandardError => e
-         Rails.logger.error "Error in chat response: #{e.class} - #{e.message}\n#{e.backtrace&.join("\n")}"
-         @error_message = "An error occurred while getting the response. Please try again."
-       end
+       # The assistant response is streamed by ChatStreamsController (SSE).
+       # See create action for details.
+       @generation_settings_json = params[:generation_settings_json]
      end

      # Return turbo stream to render both messages
@@ -207,10 +200,6 @@ class ChatsController < ApplicationController

    private

-   def tool_ids_param
-     params[:tool_ids].presence || []
-   end
-
    ALLOWED_GENERATION_KEYS = %w[temperature top_k top_p max_tokens repeat_penalty].freeze

    class InvalidGenerationSettingsError < StandardError; end
@@ -13,7 +13,15 @@ export default class extends Controller {
    }

    // Handle form submission to show user message immediately
-   submit() {
+   submit(event) {
+     // Check generation settings validity before submitting
+     const gsController = this.#generationSettingsController()
+     if (gsController && !gsController.isValid) {
+       event.preventDefault()
+       gsController.validate()
+       return
+     }
+
      // Don't prevent default - let Turbo handle the form submission
      // Just add the user message to the DOM immediately
      const messageContent = this.promptTarget.value.trim()
@@ -64,6 +72,12 @@ export default class extends Controller {
      }
    }

+   #generationSettingsController() {
+     const el = this.element.querySelector('[data-controller*="generation-settings"]')
+     if (!el) return null
+     return this.application.getControllerForElementAndIdentifier(el, "generation-settings")
+   }
+
    #canSubmit() {
      // Text field and prompt field can be validated using HTML5's required attribute,
      // so we delegate to checkValidity() to utilize standard validation
@@ -1,5 +1,7 @@
  import { Controller } from "@hotwired/stimulus"

+ const ALLOWED_KEYS = ["temperature", "top_k", "top_p", "max_tokens", "repeat_penalty"]
+
  // Connects to data-controller="generation-settings"
  export default class extends Controller {
    static targets = [
@@ -7,6 +9,7 @@ export default class extends Controller {
      "toggleIcon",
      "panel",
      "jsonInput",
+     "error",
    ]

    connect() {
@@ -24,4 +27,72 @@ export default class extends Controller {
        this.toggleIconTarget.classList.toggle("bi-chevron-up", this.expanded)
      }
    }
+
+   validate() {
+     const input = this.jsonInputTarget.value.trim()
+
+     if (!input) {
+       this.#clearError()
+       return
+     }
+
+     let parsed
+     try {
+       parsed = JSON.parse(input)
+     } catch (e) {
+       this.#showError("Invalid JSON syntax")
+       return
+     }
+
+     if (typeof parsed !== "object" || Array.isArray(parsed) || parsed === null) {
+       this.#showError("Must be a JSON object (e.g. {\"temperature\": 0.7})")
+       return
+     }
+
+     const unknownKeys = Object.keys(parsed).filter(k => !ALLOWED_KEYS.includes(k))
+     if (unknownKeys.length > 0) {
+       this.#showError(`Unknown keys: ${unknownKeys.join(", ")}`)
+       return
+     }
+
+     const nonNumeric = Object.entries(parsed).filter(([, v]) => typeof v !== "number")
+     if (nonNumeric.length > 0) {
+       this.#showError(`Values must be numeric: ${nonNumeric.map(([k]) => k).join(", ")}`)
+       return
+     }
+
+     this.#clearError()
+   }
+
+   get isValid() {
+     if (!this.hasJsonInputTarget) return true
+     const input = this.jsonInputTarget.value.trim()
+     if (!input) return true
+
+     try {
+       const parsed = JSON.parse(input)
+       if (typeof parsed !== "object" || Array.isArray(parsed) || parsed === null) return false
+       if (Object.keys(parsed).some(k => !ALLOWED_KEYS.includes(k))) return false
+       if (Object.values(parsed).some(v => typeof v !== "number")) return false
+       return true
+     } catch {
+       return false
+     }
+   }
+
+   #showError(message) {
+     if (this.hasErrorTarget) {
+       this.errorTarget.textContent = message
+       this.errorTarget.style.display = "block"
+     }
+     this.jsonInputTarget.classList.add("generation-settings-json-input--invalid")
+   }
+
+   #clearError() {
+     if (this.hasErrorTarget) {
+       this.errorTarget.textContent = ""
+       this.errorTarget.style.display = "none"
+     }
+     this.jsonInputTarget.classList.remove("generation-settings-json-input--invalid")
+   }
  }
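
The validation rules above (JSON object, known keys only, numeric values, blank input allowed) restated as a hypothetical Ruby mirror for readers following the server side. This is a sketch for reference, not the gem's `ChatsController` implementation:

```ruby
require "json"

# Hypothetical mirror of the client-side checks — reference only.
ALLOWED_KEYS = %w[temperature top_k top_p max_tokens repeat_penalty].freeze

def generation_settings_valid?(raw)
  return true if raw.to_s.strip.empty?      # blank input counts as valid
  parsed = JSON.parse(raw)
  return false unless parsed.is_a?(Hash)    # must be a JSON object
  parsed.keys.all? { |k| ALLOWED_KEYS.include?(k) } &&
    parsed.values.all? { |v| v.is_a?(Numeric) }
rescue JSON::ParserError
  false
end

generation_settings_valid?('{"temperature": 0.7}') # => true
generation_settings_valid?('{"temp": 0.7}')        # => false (unknown key)
generation_settings_valid?('[1, 2]')               # => false (not an object)
```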
@@ -0,0 +1,101 @@
+ import { Controller } from "@hotwired/stimulus"
+
+ // Connects to data-controller="message-stream"
+ // Opens an EventSource on connect, appends each delta to the content target,
+ // closes on `done` / `error`.
+ export default class extends Controller {
+   static targets = ["content"]
+   static values = { url: String }
+
+   connect() {
+     this.completed = false
+     this.source = new EventSource(this.urlValue)
+     this.source.addEventListener("message", (e) => this.#onDelta(e))
+     this.source.addEventListener("done", () => this.#onDone())
+     this.source.addEventListener("title", (e) => this.#onTitle(e))
+     this.source.addEventListener("saved", (e) => this.#onSaved(e))
+     this.source.addEventListener("error", (e) => this.#onError(e))
+   }
+
+   disconnect() {
+     this.#close()
+   }
+
+   #onDelta(event) {
+     let delta
+     try { delta = JSON.parse(event.data).delta } catch { return }
+     if (!delta) return
+     this.contentTarget.append(delta)
+     this.#scrollToBottom()
+   }
+
+   #onTitle(event) {
+     try {
+       const data = JSON.parse(event.data)
+       if (data.turbo_stream && window.Turbo) {
+         window.Turbo.renderStreamMessage(data.turbo_stream)
+       }
+     } catch {}
+   }
+
+   #onSaved(event) {
+     try {
+       const data = JSON.parse(event.data)
+       this.element.dataset.savedExecutionId = data.execution_id
+       if (data.html) this.#swapInRenderedMessage(data.html)
+     } catch {}
+   }
+
+   // Swap the streaming bubble's role + content with the host-rendered _message
+   // partial output so any markdown / syntax highlighting / partial customizations
+   // applied on reload also apply right after the stream finishes. We don't
+   // replace the whole element — that would disconnect this controller and
+   // close the EventSource before `title` / `done` arrive.
+   #swapInRenderedMessage(html) {
+     const doc = new DOMParser().parseFromString(html, "text/html")
+     const newBubble = doc.querySelector(".message")
+     if (!newBubble) return
+
+     const newRole = newBubble.querySelector(".message-role")
+     const oldRole = this.element.querySelector(".message-role")
+     if (newRole && oldRole) oldRole.innerHTML = newRole.innerHTML
+
+     const newContent = newBubble.querySelector(".message-content")
+     if (newContent) this.contentTarget.innerHTML = newContent.innerHTML
+
+     this.element.classList.remove("streaming")
+     if (newBubble.id) this.element.id = newBubble.id
+   }
+
+   #onDone() {
+     this.completed = true
+     this.#close()
+   }
+
+   #onError(event) {
+     // EventSource fires onerror whenever the connection closes — including
+     // immediately after a clean `event: done`. Suppress those.
+     if (this.completed) {
+       this.#close()
+       return
+     }
+     let message = "Stream interrupted."
+     try { if (event.data) message = JSON.parse(event.data).message || message } catch {}
+     const errEl = document.createElement("p")
+     errEl.className = "stream-error"
+     errEl.textContent = `[error] ${message}`
+     this.contentTarget.appendChild(errEl)
+     this.#close()
+   }
+
+   #close() {
+     if (this.source && this.source.readyState !== EventSource.CLOSED) {
+       this.source.close()
+     }
+   }
+
+   #scrollToBottom() {
+     const chatMessages = document.getElementById("chat-messages")
+     if (chatMessages) chatMessages.scrollTop = chatMessages.scrollHeight
+   }
+ }
@@ -68,6 +68,35 @@ class Chat < ApplicationRecord
      new_message
    end

+   # Stream the assistant response from the LLM. Yields each parsed SSE event.
+   # Returns the assembled content. Caller is responsible for persistence.
+   def stream_assistant_response(prompt_execution, jwt_token, generation_settings: {}, &block)
+     summarized_context, prompt = build_streaming_context(prompt_execution, jwt_token)
+     LlmMetaClient::ServerQuery.new.stream(
+       jwt_token,
+       prompt_execution.llm_uuid,
+       prompt_execution.model,
+       summarized_context,
+       prompt,
+       generation_settings: generation_settings,
+       &block
+     )
+   end
+
+   # Persist the streamed assistant response. Skips persistence if content is blank.
+   def finalize_streamed_response(prompt_execution, content, jwt_token)
+     return nil if content.blank?
+
+     prompt_execution.update!(
+       llm_platform: resolve_llm_type(prompt_execution.llm_uuid, jwt_token),
+       response: content
+     )
+     messages.create!(
+       role: "assistant",
+       prompt_navigator_prompt_execution: prompt_execution
+     )
+   end
+
    # Get all messages in order
    def ordered_messages
      messages
@@ -115,31 +144,40 @@ class Chat < ApplicationRecord

    # Send messages to LLM and get response
    def send_to_llm(prompt_execution, jwt_token, tool_ids: [], generation_settings: {})
-     llm_uuid = prompt_execution.llm_uuid
-     model = prompt_execution.model
+     summarized_context, prompt = build_streaming_context(prompt_execution, jwt_token)
+     LlmMetaClient::ServerQuery.new.call(
+       jwt_token,
+       prompt_execution.llm_uuid,
+       prompt_execution.model,
+       summarized_context,
+       prompt,
+       tool_ids: tool_ids,
+       generation_settings: generation_settings
+     )
+   end

-     # Get LLM options
+   # Build the (summarized_context, prompt) tuple for an LLM call.
+   # Shared by both the synchronous and streaming paths.
+   def build_streaming_context(prompt_execution, jwt_token)
      llm_options = LlmMetaClient::ServerResource.available_llm_options(jwt_token)
-
-     # Error if no LLM is available
      raise LlmMetaClient::Exceptions::OllamaUnavailableError, "No LLM available" if llm_options.empty?

-     # Build prompt and context from direct lineage via PromptExecution
      last_msg = ordered_messages.last
      pe = last_msg.prompt_navigator_prompt_execution
-
      prompt = { role: last_msg.role, prompt: pe.prompt }
      context = pe.build_context(limit: Rails.configuration.summarize_conversation_count)

-     if context.empty?
-       summarized_context = "No context available."
-     else
-       summarized_context = LlmMetaClient::ServerQuery.new.call(jwt_token, llm_uuid, model, context, "Please summarize the context")
-     end
-
+     summarized_context =
+       if context.empty?
+         "No context available."
+       else
+         LlmMetaClient::ServerQuery.new.call(
+           jwt_token, prompt_execution.llm_uuid, prompt_execution.model,
+           context, "Please summarize the context"
+         )
+       end
      summarized_context += "Additional prompt: Responses from the assistant must consist solely of the response body."

-     # Send chat request using LlmMetaClient::ServerQuery
-     LlmMetaClient::ServerQuery.new.call(jwt_token, llm_uuid, model, summarized_context, prompt, tool_ids: tool_ids, generation_settings: generation_settings)
+     [ summarized_context, prompt ]
    end
  end
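
How the two new Chat methods compose, mirroring the `ChatStreamsController#show` flow earlier in this diff. A sketch only; `chat`, `prompt_execution`, and `jwt_token` are assumed to be in scope.

```ruby
# Stream first, persist at stream close.
assembled = chat.stream_assistant_response(prompt_execution, jwt_token) do |event|
  # forward each parsed SSE event to the client as it arrives
end

# Persistence is deliberately separate: a mid-stream disconnect leaves
# nothing saved, and blank content is skipped entirely.
chat.finalize_streamed_response(prompt_execution, assembled, jwt_token) if assembled.present?
```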
@@ -0,0 +1,8 @@
+ <div id="chat-sidebar">
+   <%%= chat_list(
+     ->(id) { chat_path(id) },
+     active_uuid: chat&.uuid,
+     download_csv_path: ->(id) { download_csv_chat_path(id) },
+     download_all_csv_path: download_all_csv_chats_path
+   ) %>
+ </div>
@@ -0,0 +1,7 @@
+ <%% stream_url = chat_stream_path(chat_id: chat.uuid, execution_id: prompt_execution.execution_id, generation_settings_json: @generation_settings_json.presence) %>
+ <div class="message assistant streaming"
+      data-controller="message-stream"
+      data-message-stream-url-value="<%%= stream_url %>">
+   <div class="message-role">🤖 streaming…</div>
+   <div class="message-content" data-message-stream-target="content"></div>
+ </div>
@@ -5,10 +5,10 @@
  <%% # User message is already shown by JavaScript on form submit %>
  <%% # Only render assistant message here %>

- <%% # Render assistant message if available %>
- <%% if @assistant_message %>
+ <%%# Render streaming assistant placeholder; the message-stream Stimulus controller opens an EventSource and appends deltas as they arrive. %>
+ <%% if @prompt_execution && @error_message.blank? %>
    <%%= turbo_stream.append "messages-list" do %>
-     <%%= render partial: "chats/message", locals: { message: @assistant_message } %>
+     <%%= render partial: "chats/streaming_message", locals: { chat: @chat, prompt_execution: @prompt_execution } %>
    <%% end %>
  <%% end %>

@@ -25,9 +25,7 @@

  <%% # Update chat sidebar %>
  <%%= turbo_stream.replace "chat-sidebar" do %>
-   <div id="chat-sidebar">
-     <%%= chat_list(->(id) { chat_path(id) }, active_uuid: @chat&.uuid, download_csv_path: ->(id) { download_csv_chat_path(id) }, download_all_csv_path: download_all_csv_chats_path) %>
-   </div>
+   <%%= render partial: "chats/chat_sidebar", locals: { chat: @chat } %>
  <%% end %>

  <%% # Update history sidebar - replace entire content to ensure update %>
@@ -5,10 +5,10 @@
  <%% # User message is already shown by JavaScript on form submit %>
  <%% # Only render assistant message here %>

- <%% # Render assistant message if available %>
- <%% if @assistant_message %>
+ <%%# Render streaming assistant placeholder; the message-stream Stimulus controller opens an EventSource and appends deltas as they arrive. %>
+ <%% if @prompt_execution && @error_message.blank? %>
    <%%= turbo_stream.append "messages-list" do %>
-     <%%= render partial: "chats/message", locals: { message: @assistant_message } %>
+     <%%= render partial: "chats/streaming_message", locals: { chat: @chat, prompt_execution: @prompt_execution } %>
    <%% end %>
  <%% end %>

@@ -20,7 +20,9 @@
    class="generation-settings-json-input"
    rows="8"
    placeholder='{"temperature": 0.7, "top_k": 40, "top_p": 0.9, "max_tokens": 4096, "repeat_penalty": 1.1}'
-   data-<%%= stimulus_controller %>-target="jsonInput"></textarea>
+   data-<%%= stimulus_controller %>-target="jsonInput"
+   data-action="input-><%%= stimulus_controller %>#validate"></textarea>
+ <div class="generation-settings-error" data-<%%= stimulus_controller %>-target="error" style="display: none;"></div>
  <div class="generation-settings-hint">
    Available keys: temperature, top_k, top_p, max_tokens, repeat_penalty
  </div>
@@ -1,5 +1,39 @@
+ require "net/http"
+ require "uri"
+ require "json"
+
  module LlmMetaClient
    class ServerQuery
+     # Stream LLM responses incrementally. Yields each content delta event
+     # ({ event: "message", data: { "delta" => "..." } }) to the caller's block.
+     # Upstream "done" markers are absorbed (end-of-stream is signaled by the
+     # block returning); upstream "error" events raise ServerError.
+     # Returns the assembled content string. Tool calls are not supported here.
+     def stream(id_token, api_key_uuid, model_id, context, user_content, generation_settings: {})
+       context_and_user_content = "Context:#{context}, User Prompt: #{user_content}"
+       debug_log "Streaming request to LLM: \n===>\n#{context_and_user_content}\n===>"
+
+       body = { prompt: context_and_user_content }
+       body[:generation_settings] = generation_settings if generation_settings.present?
+
+       assembled = +""
+       request_stream(api_key_uuid, id_token, model_id, body) do |event|
+         case event[:event]
+         when "message"
+           assembled << event[:data]["delta"].to_s
+           yield event if block_given?
+         when "done"
+           # End-of-stream marker from upstream; no-op here.
+         when "error"
+           raise Exceptions::ServerError, format_stream_error(event[:data])
+         else
+           yield event if block_given?
+         end
+       end
+
+       assembled
+     end
+
      def call(id_token, api_key_uuid, model_id, context, user_content, tool_ids: [], generation_settings: {})
        debug_log "Context: #{context}"
        context_and_user_content = "Context:#{context}, User Prompt: #{user_content}"
@@ -7,13 +41,17 @@ module LlmMetaClient

        response = request(api_key_uuid, id_token, model_id, context_and_user_content, tool_ids, generation_settings)

-       raise Exceptions::ServerError, "LLM server returned HTTP #{response.code}" unless response.success?
+       unless response.success?
+         raise Exceptions::ServerError, build_error_message(response.code.to_i, response.parsed_response)
+       end

        response_body = response.parsed_response

        raise Exceptions::InvalidResponseError, "LLM server returned non-JSON response" unless response_body.is_a?(Hash)

        content = response_body.dig("response", "message") || ""
+       tool_calls = response_body.dig("response", "tool_calls")
+       content = combine_with_tool_calls(content, tool_calls) if tool_calls.is_a?(Array) && tool_calls.any?

        raise Exceptions::EmptyResponseError, "LLM server returned empty response" if content.blank?

@@ -28,6 +66,28 @@ module LlmMetaClient
        Rails.logger.info(message) if Rails.env.development?
      end

+     def combine_with_tool_calls(message, tool_calls)
+       tool_section = format_tool_calls(tool_calls)
+       return tool_section if message.blank?
+       "#{message}\n\n---\n\n#{tool_section}"
+     end
+
+     def format_tool_calls(tool_calls)
+       lines = [ "**Tool calls**", "" ]
+       tool_calls.each do |tc|
+         name = tc["name"] || tc[:name] || "(unknown)"
+         args = tc["arguments"] || tc[:arguments]
+         args_str =
+           case args
+           when Hash, Array then args.to_json
+           when nil then ""
+           else args.to_s
+           end
+         lines << (args_str.empty? ? "- `#{name}`" : "- `#{name}` — `#{args_str}`")
+       end
+       lines.join("\n")
+     end
+
      def request(api_key_uuid, id_token, model_id, user_content, tool_ids, generation_settings)
        headers = { "Content-Type" => "application/json" }
        headers["Authorization"] = "Bearer #{id_token}" if id_token.present?
@@ -47,5 +107,93 @@ module LlmMetaClient
      def url(api_key_uuid, model_id)
        "#{Rails.application.config.llm_service_base_url}/api/llm_api_keys/#{api_key_uuid}/models/#{model_id}/chats"
      end
+
+     def stream_url(api_key_uuid, model_id)
+       "#{Rails.application.config.llm_service_base_url}/api/llm_api_keys/#{api_key_uuid}/models/#{model_id}/chat_streams"
+     end
+
+     def request_stream(api_key_uuid, id_token, model_id, body)
+       uri = URI(stream_url(api_key_uuid, model_id))
+
+       Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https", read_timeout: 600) do |http|
+         req = Net::HTTP::Post.new(uri)
+         req["Content-Type"] = "application/json"
+         req["Accept"] = "text/event-stream"
+         req["Authorization"] = "Bearer #{id_token}" if id_token.present?
+         req.body = body.to_json
+
+         http.request(req) do |response|
+           unless response.is_a?(Net::HTTPSuccess)
+             body = JSON.parse(response.read_body.to_s) rescue nil
+             raise Exceptions::ServerError, build_error_message(response.code.to_i, body)
+           end
+
+           buffer = +""
+           response.read_body do |chunk|
+             buffer << chunk
+             while (boundary = buffer.index("\n\n"))
+               raw_event = buffer.slice!(0, boundary + 2)
+               parsed = parse_sse_event(raw_event)
+               yield parsed if parsed
+             end
+           end
+         end
+       end
+     end
+
+     # Format an `event: error` SSE payload from llm_meta_server into a
+     # user-facing string. Payload shape: { "code" => "rate_limit", "message" => "..." }
+     def format_stream_error(data)
+       code = data["code"]
+       message = data["message"]
+       case code
+       when "rate_limit"
+         suffix = message.present? ? ": #{message}" : ""
+         "Rate limit exceeded — check your provider plan or retry shortly#{suffix}"
+       when "api_key_required"
+         message.presence || "API key required for this model"
+       else
+         message.presence || "Upstream stream error"
+       end
+     end
+
+     # Turn a non-success HTTP response from llm_meta_server into a user-facing
+     # error string. The server returns JSON like
+     # { "error" => "LLM API Rate limit exceeded", "message" => "Too many requests" }
+     # for known error classes; fall back to a generic message otherwise.
+     def build_error_message(status_code, body)
+       if body.is_a?(Hash)
+         err = body["error"]
+         msg = body["message"]
+         return "#{err}: #{msg}" if err.present? && msg.present?
+         return err if err.present?
+         return msg if msg.present?
+       end
+       case status_code
+       when 429 then "Rate limit exceeded — check your provider plan or retry shortly (HTTP 429)"
+       when 401, 403 then "LLM service rejected the request (HTTP #{status_code}) — check your API key"
+       when 502, 503, 504 then "LLM service is unavailable (HTTP #{status_code})"
+       else "LLM server returned HTTP #{status_code}"
+       end
+     end
+
+     def parse_sse_event(raw)
+       event_name = "message"
+       data_lines = []
+       raw.each_line(chomp: true) do |line|
+         next if line.empty?
+         if line.start_with?("event:")
+           event_name = line.sub(/^event:\s*/, "")
+         elsif line.start_with?("data:")
+           data_lines << line.sub(/^data:\s*/, "")
+         end
+       end
+       return nil if data_lines.empty?
+
+       data = JSON.parse(data_lines.join("\n"))
+       { event: event_name, data: data }
+     rescue JSON::ParserError
+       nil
+     end
    end
  end
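
A worked illustration of `#parse_sse_event` on raw SSE frames, based on the method body shown above. The frames are invented examples, and `send` is used in case these helpers sit below `private` in the actual class:

```ruby
q = LlmMetaClient::ServerQuery.new

q.send(:parse_sse_event, "data: {\"delta\":\"Hi\"}\n\n")
# => { event: "message", data: { "delta" => "Hi" } }   (default event name)

q.send(:parse_sse_event, "event: error\ndata: {\"code\":\"rate_limit\"}\n\n")
# => { event: "error", data: { "code" => "rate_limit" } }

q.send(:parse_sse_event, ": keep-alive\n\n")
# => nil (no data: lines, so the caller drops the frame)
```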
@@ -1,3 +1,3 @@
  module LlmMetaClient
-   VERSION = "1.0.1"
+   VERSION = "1.2.0"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: llm_meta_client
  version: !ruby/object:Gem::Version
-   version: 1.0.1
+   version: 1.2.0
  platform: ruby
  authors:
  - dhq_boiler
@@ -47,16 +47,22 @@ dependencies:
    name: prompt_navigator
    requirement: !ruby/object:Gem::Requirement
      requirements:
-     - - "~>"
+     - - ">="
        - !ruby/object:Gem::Version
          version: '1.0'
+     - - "<"
+       - !ruby/object:Gem::Version
+         version: '3.0'
    type: :runtime
    prerelease: false
    version_requirements: !ruby/object:Gem::Requirement
      requirements:
-     - - "~>"
+     - - ">="
        - !ruby/object:Gem::Version
          version: '1.0'
+     - - "<"
+       - !ruby/object:Gem::Version
+         version: '3.0'
  - !ruby/object:Gem::Dependency
    name: chat_manager
    requirement: !ruby/object:Gem::Requirement
@@ -103,18 +109,22 @@ files:
  - lib/generators/llm_meta_client/authentication/templates/db/migrate/create_users.rb
  - lib/generators/llm_meta_client/scaffold/scaffold_generator.rb
  - lib/generators/llm_meta_client/scaffold/templates/app/controllers/api/mcp_servers_controller.rb
+ - lib/generators/llm_meta_client/scaffold/templates/app/controllers/chat_streams_controller.rb
  - lib/generators/llm_meta_client/scaffold/templates/app/controllers/chats_controller.rb
  - lib/generators/llm_meta_client/scaffold/templates/app/controllers/prompts_controller.rb
  - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/chat_title_edit_controller.js
  - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/chats_form_controller.js
  - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/generation_settings_controller.js
  - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/llm_selector_controller.js
+ - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/message_stream_controller.js
  - lib/generators/llm_meta_client/scaffold/templates/app/javascript/controllers/tool_selector_controller.js
  - lib/generators/llm_meta_client/scaffold/templates/app/javascript/popover.js
  - lib/generators/llm_meta_client/scaffold/templates/app/models/chat.rb
  - lib/generators/llm_meta_client/scaffold/templates/app/models/message.rb
+ - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_chat_sidebar.html.erb
  - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_message.html.erb
  - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_messages_list.html.erb
+ - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/_streaming_message.html.erb
  - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/create.turbo_stream.erb
  - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/edit.html.erb
  - lib/generators/llm_meta_client/scaffold/templates/app/views/chats/new.html.erb
@@ -163,7 +173,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
    version: '0'
  requirements: []
- rubygems_version: 4.0.3
+ rubygems_version: 3.6.9
  specification_version: 4
  summary: A Rails Engine for integrating multiple LLM providers into your application.
  test_files: []