legion-llm 0.5.14 → 0.5.16

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: ceb7fa0519b3985439d577579889de688b42a9e8d3c5fdbc6a1be5b22c7fb2ba
-   data.tar.gz: 80fd2cc36a19cf49783f9429e4aca545e26c3b8ca130fa649800955ce52001d2
+   metadata.gz: 2dea674b5405be2c2863f1c6dd568f21ec8baad8db42eeaa457cd6dcdc881bc8
+   data.tar.gz: ee16678e6be6bc612d906bdd754d7e3db79f803c52b465fe3fb2ed762812aa20
  SHA512:
-   metadata.gz: 82f00569a04406cc64983f447e3fbd5fbb4c9765f9caa59543a7db0dc612ca659368dae3441f4fa77ae27aaf74347dac997cccb842dc3a2e3e99184ac52f591e
-   data.tar.gz: 40d6ec150d5832f0a4d0ac8f9f9e25754c58e6d4d5a528eb13c2c1420a62a8e6e579c3dda417fdd4dcff8f9d6ba888d7613baba266596c442e787f041a1a50a0
+   metadata.gz: 0a743021a3a3540290cfc4ea3c119fdc42bbba38eb5115b2883fefc6a4da0bceda04c45136c72d233559d9de53c41af198bfa057eed5579fc345121fade8cd74
+   data.tar.gz: 58ba674f0aa898bd75895bfaeb93b5f31bf2aa7f6dc0dbbc8d3f0afcb96cbfcabb80b68798c2c8758f8df8726c9bfd86a567e1e531728f45f91af96b53e15e7b
data/CHANGELOG.md CHANGED
@@ -1,5 +1,25 @@
  # Legion LLM Changelog

+ ## [0.5.16] - 2026-03-28
+
+ ### Fixed
+ - `POST /api/llm/inference` endpoint now routes through the 18-step pipeline when `pipeline_enabled?` is true — previously it created a bare `RubyLLM` session and called `session.ask` directly, bypassing RAG (step 8), GAIA advisory (step 7), knowledge capture (step 19), billing, and classification
+ - `POST /api/llm/chat` sync fallback path now routes through the pipeline (previously called `session.ask` on a bare session the same way)
+ - `_dispatch_chat` pipeline gate now fires when `messages:` array is present in addition to `message:` string — `Legion::LLM.chat(messages: [...])` was silently falling through to the legacy path even with `pipeline_enabled: true`
+ - `Pipeline::Executor#step_provider_call` and `#step_provider_call_stream` now inject prior messages via `session.add_message` before the final `ask` — multi-turn conversations passed as a `messages:` array now correctly preserve history at the provider level
+
+ ### Added
+ - `spec/legion/llm/pipeline/executor_multi_turn_spec.rb`: specs verifying prior-message injection in single-turn, multi-turn, two-message, and streaming cases
+ - `spec/legion/llm/routes_inference_spec.rb`: specs verifying that `Legion::LLM.chat(messages: [...])` routes through the pipeline, carries tracing/timeline, handles multi-turn history, passes tool classes, and falls back gracefully when pipeline is disabled
+
+ ## [0.5.15] - 2026-03-28
+
+ ### Added
+ - `Legion::LLM::Routes` Sinatra extension module (`lib/legion/llm/routes.rb`): contains all `/api/llm/*` route definitions (chat, inference, providers) extracted from `LegionIO/lib/legion/api/llm.rb`. Self-registers with `Legion::API.register_library_routes('llm', Legion::LLM::Routes)` at the end of `Legion::LLM.start`.
+
+ ### Changed
+ - `Legion::LLM.start` now calls `register_routes` after setting `@started = true`, mounting routes onto the API if `Legion::API` is available.
+
  ## [0.5.14] - 2026-03-27

  ### Added
data/CLAUDE.md CHANGED
@@ -8,7 +8,7 @@
  Core LegionIO gem providing LLM capabilities to all extensions. Wraps ruby_llm to provide a consistent interface for chat, embeddings, tool use, and agents across multiple providers (Bedrock, Anthropic, OpenAI, Gemini, Ollama). Includes a dynamic weighted routing engine that dispatches requests across local, fleet, and cloud tiers based on caller intent, priority rules, time schedules, cost multipliers, and real-time provider health.

  **GitHub**: https://github.com/LegionIO/legion-llm
- **Version**: 0.5.3
+ **Version**: 0.5.15
  **License**: Apache-2.0

  ## Architecture
@@ -325,7 +325,7 @@ In-memory signal consumer with pluggable handlers. Adjusts effective priorities
  | `lib/legion/llm/structured_output.rb` | JSON schema enforcement with native response_format and prompt fallback |
  | `lib/legion/llm/errors.rb` | Typed error hierarchy: LLMError base + AuthError, RateLimitError, ContextOverflow, ProviderError, ProviderDown, UnsupportedCapability, PipelineError |
  | `lib/legion/llm/conversation_store.rb` | ConversationStore: in-memory LRU (256 slots) + optional Sequel DB persistence + spool fallback |
- | `lib/legion/llm/version.rb` | Version constant (0.5.3) |
+ | `lib/legion/llm/version.rb` | Version constant |
  | `lib/legion/llm/quality_checker.rb` | QualityChecker module with QualityResult struct |
  | `lib/legion/llm/escalation_history.rb` | EscalationHistory mixin: `escalation_history`, `escalated?`, `final_resolution`, `escalation_chain` |
  | `lib/legion/llm/router/escalation_chain.rb` | EscalationChain value object |
data/README.md CHANGED
@@ -2,7 +2,7 @@

  LLM integration for the [LegionIO](https://github.com/LegionIO/LegionIO) framework. Wraps [ruby_llm](https://github.com/crmne/ruby_llm) to provide chat, embeddings, tool use, and agent capabilities to any Legion extension.

- **Version**: 0.5.3
+ **Version**: 0.5.15

  ## Installation

@@ -652,7 +652,7 @@ bundle exec rspec
  Tests use stubbed `Legion::Logging` and `Legion::Settings` modules (no need for the full LegionIO stack):

  ```bash
- bundle exec rspec # Run all 882 tests
+ bundle exec rspec # Run all tests
  bundle exec rubocop # Lint (0 offenses)
  bundle exec rspec spec/legion/llm_spec.rb # Run specific test file
  bundle exec rspec spec/legion/llm/router_spec.rb # Router tests only
@@ -26,7 +26,7 @@ module Legion
  end

  def create_conversation(conversation_id, **metadata)
-   conversations[conversation_id] = { messages: [], metadata: metadata, accessed_at: Time.now }
+   conversations[conversation_id] = { messages: [], metadata: metadata, lru_tick: next_tick }
    evict_if_needed
    persist_conversation(conversation_id, metadata)
  end
@@ -41,6 +41,7 @@ module Legion

  def reset!
    @conversations = {}
+   @lru_counter = 0
  end

  private
@@ -49,6 +50,10 @@ module Legion
    @conversations ||= {}
  end

+ def next_tick
+   @lru_counter = (@lru_counter || 0) + 1
+ end
+
  def ensure_conversation(conversation_id)
    return if in_memory?(conversation_id)

@@ -63,13 +68,13 @@ module Legion
  def touch(conversation_id)
    return unless in_memory?(conversation_id)

-   conversations[conversation_id][:accessed_at] = Time.now
+   conversations[conversation_id][:lru_tick] = next_tick
  end

  def evict_if_needed
    return unless conversations.size > self::MAX_CONVERSATIONS

-   oldest_id = conversations.min_by { |_, v| v[:accessed_at] }&.first
+   oldest_id = conversations.min_by { |_, v| v[:lru_tick] }&.first
    conversations.delete(oldest_id) if oldest_id
  end

@@ -166,7 +166,11 @@ module Legion
  )
  session.with_instructions(injected_system) if injected_system

- message_content = @request.messages.last&.dig(:content)
+ messages = @request.messages
+ prior = messages.size > 1 ? messages[0..-2] : []
+ prior.each { |m| session.add_message(m) }
+
+ message_content = messages.last&.dig(:content)
  @raw_response = message_content ? session.ask(message_content) : session

  @timestamps[:provider_end] = Time.now
@@ -228,7 +232,11 @@ module Legion
  (@request.tools || []).each { |tool| session.with_tool(tool) if tool.is_a?(Class) }
  ToolRegistry.tools.each { |t| session.with_tool(t) } if defined?(ToolRegistry)

- message_content = @request.messages.last&.dig(:content)
+ messages = @request.messages
+ prior = messages.size > 1 ? messages[0..-2] : []
+ prior.each { |m| session.add_message(m) }
+
+ message_content = messages.last&.dig(:content)
  @raw_response = session.ask(message_content, &)

  @timestamps[:provider_end] = Time.now
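The two hunks above split the request's messages into "everything before the last" and "the last", replay the former into the session, and ask only the latter. The sketch below reproduces that logic against a stand-in session. `FakeSession` and `replay_and_ask` are hypothetical names for illustration; the stand-in exposes only `add_message` and `ask`, the two session calls that appear in the diff, and makes no claim about how the real provider session behaves.

```ruby
# Stand-in for a chat session (hypothetical). It records what it receives
# so the replay order can be inspected.
class FakeSession
  attr_reader :history, :asked

  def initialize
    @history = []
    @asked = nil
  end

  def add_message(message)
    @history << message
  end

  def ask(content)
    @asked = content
    "reply to: #{content}"
  end
end

# Hypothetical helper mirroring the hunk: replay prior turns, then ask the
# final one. Returns the session itself when there is nothing to ask.
def replay_and_ask(session, messages)
  prior = messages.size > 1 ? messages[0..-2] : []
  prior.each { |m| session.add_message(m) }

  content = messages.last&.dig(:content)
  content ? session.ask(content) : session
end

messages = [
  { role: 'user', content: 'My name is Ada.' },
  { role: 'assistant', content: 'Nice to meet you, Ada.' },
  { role: 'user', content: 'What is my name?' }
]
session = FakeSession.new
reply = replay_and_ask(session, messages)
```

With the pre-change code, only the last message's content reached the session, so the first two turns in this example would have been dropped.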
@@ -0,0 +1,413 @@
+ # frozen_string_literal: true
+
+ # Self-registering route module for legion-llm.
+ # All routes previously defined in LegionIO/lib/legion/api/llm.rb now live here
+ # and are mounted via Legion::API.register_library_routes when legion-llm boots.
+ #
+ # LegionIO/lib/legion/api/llm.rb is preserved for backward compatibility but guards
+ # its registration with defined?(Legion::LLM::Routes) so double-registration is avoided.
+
+ require 'securerandom'
+
+ module Legion
+   module LLM
+     module Routes
+       def self.registered(app) # rubocop:disable Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity,Metrics/AbcSize,Metrics/MethodLength
+         app.helpers do # rubocop:disable Metrics/BlockLength
+           # Minimal fallback implementations of shared API helpers.
+           # These are used when Legion::LLM::Routes is mounted on a bare Sinatra app.
+           # When mounted via Legion::API (the normal path), Legion::API::Helpers and
+           # Legion::API::Validators provide full implementations that take precedence.
+           unless method_defined?(:parse_request_body)
+             define_method(:parse_request_body) do
+               raw = request.body.read
+               return {} if raw.nil? || raw.empty?
+
+               begin
+                 parsed = Legion::JSON.load(raw)
+               rescue StandardError
+                 halt 400, { 'Content-Type' => 'application/json' },
+                      Legion::JSON.dump({ error: { code: 'invalid_json', message: 'request body is not valid JSON' } })
+               end
+
+               unless parsed.respond_to?(:transform_keys)
+                 halt 400, { 'Content-Type' => 'application/json' },
+                      Legion::JSON.dump({ error: { code: 'invalid_request_body',
+                                                   message: 'request body must be a JSON object' } })
+               end
+
+               parsed.transform_keys(&:to_sym)
+             end
+           end
+
+           unless method_defined?(:validate_required!)
+             define_method(:validate_required!) do |body, *keys|
+               missing = keys.select { |k| body[k].nil? || (body[k].respond_to?(:empty?) && body[k].empty?) }
+               return if missing.empty?
+
+               halt 400, { 'Content-Type' => 'application/json' },
+                    Legion::JSON.dump({ error: { code: 'missing_fields',
+                                                 message: "required: #{missing.join(', ')}" } })
+             end
+           end
+
+           unless method_defined?(:json_response)
+             define_method(:json_response) do |data, status_code: 200|
+               content_type :json
+               status status_code
+               Legion::JSON.dump({ data: data })
+             end
+           end
+
+           unless method_defined?(:json_error)
+             define_method(:json_error) do |code, message, status_code: 400|
+               content_type :json
+               status status_code
+               Legion::JSON.dump({ error: { code: code, message: message } })
+             end
+           end
+
+           unless method_defined?(:require_llm!)
+             define_method(:require_llm!) do
+               return if defined?(Legion::LLM) &&
+                         Legion::LLM.respond_to?(:started?) &&
+                         Legion::LLM.started?
+
+               halt 503, { 'Content-Type' => 'application/json' },
+                    Legion::JSON.dump({ error: { code: 'llm_unavailable',
+                                                 message: 'LLM subsystem is not available' } })
+             end
+           end
+
+           unless method_defined?(:cache_available?)
+             define_method(:cache_available?) do
+               defined?(Legion::Cache) &&
+                 Legion::Cache.respond_to?(:connected?) &&
+                 Legion::Cache.connected?
+             end
+           end
+
+           unless method_defined?(:gateway_available?)
+             define_method(:gateway_available?) do
+               defined?(Legion::Extensions::LLM::Gateway::Runners::Inference)
+             end
+           end
+
+           unless method_defined?(:validate_tools!)
+             define_method(:validate_tools!) do |tool_list|
+               unless tool_list.is_a?(Array) && tool_list.all? { |t| t.respond_to?(:transform_keys) }
+                 halt 400, { 'Content-Type' => 'application/json' },
+                      Legion::JSON.dump({ error: { code: 'invalid_tools',
+                                                   message: 'tools must be an array of objects' } })
+               end
+
+               invalid = tool_list.any? do |t|
+                 ts = t.transform_keys(&:to_sym)
+                 ts[:name].to_s.empty?
+               end
+               return unless invalid
+
+               halt 400, { 'Content-Type' => 'application/json' },
+                    Legion::JSON.dump({ error: { code: 'invalid_tools',
+                                                 message: 'each tool must have a non-empty name' } })
+             end
+           end
+
+           unless method_defined?(:validate_messages!)
+             define_method(:validate_messages!) do |msg_list|
+               valid = msg_list.all? do |m|
+                 next false unless m.respond_to?(:key?) && m.respond_to?(:[])
+
+                 role = m[:role] || m['role']
+                 content_value = m[:content] || m['content']
+
+                 !role.to_s.empty? &&
+                   (m.key?(:content) || m.key?('content')) &&
+                   !content_value.nil? &&
+                   !(content_value.respond_to?(:empty?) && content_value.empty?)
+               end
+               return if valid
+
+               halt 400, { 'Content-Type' => 'application/json' },
+                    Legion::JSON.dump({ error: { code: 'invalid_messages',
+                                                 message: 'each message must be an object with non-empty role and content' } })
+             end
+           end
+         end
+
+         register_chat(app)
+         register_providers(app)
+       end
+
142
+ def self.register_chat(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
143
+ register_inference(app)
144
+
145
+ app.post '/api/llm/chat' do # rubocop:disable Metrics/BlockLength
146
+ Legion::Logging.debug "API: POST /api/llm/chat params=#{params.keys}" if defined?(Legion::Logging)
147
+ require_llm!
148
+
149
+ body = parse_request_body
150
+ validate_required!(body, :message)
151
+
152
+ message = body[:message]
153
+
154
+ if defined?(Legion::MCP::TierRouter)
155
+ tier_result = Legion::MCP::TierRouter.route(
156
+ intent: message,
157
+ params: body.except(:message, :model, :provider, :request_id),
158
+ context: {}
159
+ )
160
+ if tier_result[:tier]&.zero?
161
+ halt json_response({
162
+ response: tier_result[:response],
163
+ tier: 0,
164
+ latency_ms: tier_result[:latency_ms],
165
+ pattern_confidence: tier_result[:pattern_confidence]
166
+ })
167
+ end
168
+ end
169
+
170
+ request_id = body[:request_id] || SecureRandom.uuid
171
+ model = body[:model]
172
+ provider = body[:provider]
173
+
174
+ if gateway_available?
175
+ ingress_result = Legion::Ingress.run(
176
+ payload: { message: message, model: model, provider: provider,
177
+ request_id: request_id },
178
+ runner_class: 'Legion::Extensions::LLM::Gateway::Runners::Inference',
179
+ function: 'chat',
180
+ source: 'api'
181
+ )
182
+
183
+ unless ingress_result[:success]
184
+ Legion::Logging.error "[api/llm/chat] ingress failed: #{ingress_result}" if defined?(Legion::Logging)
185
+ err = ingress_result[:error] || ingress_result[:status]
186
+ err_code = err.respond_to?(:dig) ? (err[:code] || 'gateway_error') : err.to_s
187
+ err_message = err.respond_to?(:dig) ? (err[:message] || err.to_s) : err.to_s
188
+ halt json_error(err_code, err_message, status_code: 502)
189
+ end
190
+
191
+ result = ingress_result[:result]
192
+
193
+ if result.nil?
194
+ Legion::Logging.warn "[api/llm/chat] runner returned nil (status=#{ingress_result[:status]})" if defined?(Legion::Logging)
195
+ halt json_error('empty_result', 'Gateway runner returned no result', status_code: 502)
196
+ end
197
+
198
+ if result.is_a?(Hash) && result[:error]
199
+ re = result[:error]
200
+ re_code = re.respond_to?(:dig) ? (re[:code] || 'gateway_error') : re.to_s
201
+ re_message = re.respond_to?(:dig) ? (re[:message] || re.to_s) : re.to_s
202
+ halt json_error(re_code, re_message, status_code: 502)
203
+ end
204
+
205
+ response_content = if result.respond_to?(:content)
206
+ result.content
207
+ elsif result.is_a?(Hash)
208
+ result[:response] || result[:content] || result.to_s
209
+ else
210
+ result.to_s
211
+ end
212
+
213
+ meta = { routed_via: 'gateway' }
214
+ meta[:model] = result.model.to_s if result.respond_to?(:model)
215
+ meta[:tokens_in] = result.input_tokens if result.respond_to?(:input_tokens)
216
+ meta[:tokens_out] = result.output_tokens if result.respond_to?(:output_tokens)
217
+
218
+ halt json_response({ response: response_content, meta: meta }, status_code: 201)
219
+ end
220
+
221
+ if cache_available? && env['HTTP_X_LEGION_SYNC'] != 'true'
222
+ llm = Legion::LLM
223
+ rc = Legion::LLM::ResponseCache
224
+ rc.init_request(request_id)
225
+
226
+ Thread.new do
227
+ session = llm.chat_direct(model: model, provider: provider)
228
+ response = session.ask(message)
229
+ rc.complete(
230
+ request_id,
231
+ response: response.content,
232
+ meta: {
233
+ model: session.model.to_s,
234
+ tokens_in: response.respond_to?(:input_tokens) ? response.input_tokens : nil,
235
+ tokens_out: response.respond_to?(:output_tokens) ? response.output_tokens : nil
236
+ }
237
+ )
238
+ rescue StandardError => e
239
+ Legion::Logging.error "API POST /api/llm/chat async: #{e.class} — #{e.message}" if defined?(Legion::Logging)
240
+ rc.fail_request(request_id, code: 'llm_error', message: e.message)
241
+ end
242
+
243
+ Legion::Logging.info "API: LLM chat request #{request_id} queued async" if defined?(Legion::Logging)
244
+ json_response({ request_id: request_id, poll_key: "llm:#{request_id}:status" },
245
+ status_code: 202)
246
+ else
247
+ result = Legion::LLM.chat(message: message, model: model, provider: provider,
248
+ caller: { source: 'api', path: request.path })
249
+ if result.is_a?(Legion::LLM::Pipeline::Response)
250
+ raw_msg = result.message
251
+ content = raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg.to_s
252
+ routing = result.routing || {}
253
+ resolved_model = routing[:model] || routing['model']
254
+ tokens = result.tokens || {}
255
+ Legion::Logging.info "API: LLM chat request #{request_id} completed sync model=#{resolved_model}" if defined?(Legion::Logging)
256
+ json_response(
257
+ {
258
+ response: content,
259
+ meta: {
260
+ model: resolved_model.to_s,
261
+ tokens_in: tokens[:input],
262
+ tokens_out: tokens[:output]
263
+ }
264
+ },
265
+ status_code: 201
266
+ )
267
+ else
268
+ response = result
269
+ Legion::Logging.info "API: LLM chat request #{request_id} completed sync" if defined?(Legion::Logging)
270
+ json_response(
271
+ {
272
+ response: response.respond_to?(:content) ? response.content : response.to_s,
273
+ meta: {
274
+ model: response.respond_to?(:model_id) ? response.model_id.to_s : model.to_s,
275
+ tokens_in: response.respond_to?(:input_tokens) ? response.input_tokens : nil,
276
+ tokens_out: response.respond_to?(:output_tokens) ? response.output_tokens : nil
277
+ }
278
+ },
279
+ status_code: 201
280
+ )
281
+ end
282
+ end
283
+ end
284
+ end
285
+
286
+ def self.register_inference(app) # rubocop:disable Metrics/MethodLength,Metrics/AbcSize,Metrics/CyclomaticComplexity,Metrics/PerceivedComplexity
287
+ app.post '/api/llm/inference' do # rubocop:disable Metrics/BlockLength
288
+ require_llm!
289
+ body = parse_request_body
290
+ validate_required!(body, :messages)
291
+
292
+ messages = body[:messages]
293
+ raw_tools = body[:tools]
294
+ model = body[:model]
295
+ provider = body[:provider]
296
+
297
+ unless messages.is_a?(Array)
298
+ halt 400, { 'Content-Type' => 'application/json' },
299
+ Legion::JSON.dump({ error: { code: 'invalid_messages', message: 'messages must be an array' } })
300
+ end
301
+
302
+ validate_messages!(messages)
303
+
304
+ unless raw_tools.nil? || raw_tools.is_a?(Array)
305
+ halt 400, { 'Content-Type' => 'application/json' },
306
+ Legion::JSON.dump({ error: { code: 'invalid_tools', message: 'tools must be an array' } })
307
+ end
308
+
309
+ tools = raw_tools || []
310
+
311
+ tool_declarations = []
312
+ unless tools.empty?
313
+ validate_tools!(tools)
314
+
315
+ tool_declarations = tools.map do |t|
316
+ ts = t.respond_to?(:transform_keys) ? t.transform_keys(&:to_sym) : t
317
+ tname = ts[:name].to_s
318
+ tdesc = ts[:description].to_s
319
+ tparams = ts[:parameters] || {}
320
+ Class.new do
321
+ define_singleton_method(:tool_name) { tname }
322
+ define_singleton_method(:description) { tdesc }
323
+ define_singleton_method(:parameters) { tparams }
324
+ define_method(:call) { |**_| raise NotImplementedError, "#{tname} executes client-side only" }
325
+ end
326
+ end
327
+ end
328
+
329
+ normalized_messages = messages.map do |m|
330
+ ms = m.respond_to?(:transform_keys) ? m.transform_keys(&:to_sym) : m
331
+ { role: ms[:role].to_s, content: ms[:content].to_s }
332
+ end
333
+
334
+ result = Legion::LLM.chat(
335
+ messages: normalized_messages,
336
+ model: model,
337
+ provider: provider,
338
+ tools: tool_declarations,
339
+ caller: { source: 'api', path: request.path }
340
+ )
341
+
342
+ if result.is_a?(Legion::LLM::Pipeline::Response)
343
+ raw_msg = result.message
344
+ content = raw_msg.is_a?(Hash) ? (raw_msg[:content] || raw_msg['content']) : raw_msg.to_s
345
+ routing = result.routing || {}
346
+ resolved_model = routing[:model] || routing['model']
347
+ tokens = result.tokens || {}
348
+ json_response({
349
+ content: content,
350
+ tool_calls: nil,
351
+ stop_reason: result.stop&.dig(:reason)&.to_s,
352
+ model: resolved_model.to_s,
353
+ input_tokens: tokens[:input],
354
+ output_tokens: tokens[:output]
355
+ }, status_code: 200)
356
+ else
357
+ response = result
358
+ tc_list = if response.respond_to?(:tool_calls) && response.tool_calls
359
+ Array(response.tool_calls).map do |tc|
360
+ {
361
+ id: tc.respond_to?(:id) ? tc.id : nil,
362
+ name: tc.respond_to?(:name) ? tc.name : tc.to_s,
363
+ arguments: tc.respond_to?(:arguments) ? tc.arguments : {}
364
+ }
365
+ end
366
+ end
367
+ json_response({
368
+ content: response.respond_to?(:content) ? response.content : response.to_s,
369
+ tool_calls: tc_list,
370
+ stop_reason: response.respond_to?(:stop_reason) ? response.stop_reason : nil,
371
+ model: response.respond_to?(:model_id) ? response.model_id.to_s : model.to_s,
372
+ input_tokens: response.respond_to?(:input_tokens) ? response.input_tokens : nil,
373
+ output_tokens: response.respond_to?(:output_tokens) ? response.output_tokens : nil
374
+ }, status_code: 200)
375
+ end
376
+ rescue StandardError => e
377
+ Legion::Logging.error "[api/llm/inference] #{e.class}: #{e.message}" if defined?(Legion::Logging)
378
+ json_error('inference_error', e.message, status_code: 500)
379
+ end
380
+ end
381
+
382
+ def self.register_providers(app)
383
+ app.get '/api/llm/providers' do
384
+ require_llm!
385
+ unless gateway_available? && defined?(Legion::Extensions::LLM::Gateway::Runners::ProviderStats)
386
+ halt json_error('gateway_unavailable', 'LLM gateway is not loaded', status_code: 503)
387
+ end
388
+
389
+ stats = Legion::Extensions::LLM::Gateway::Runners::ProviderStats
390
+ json_response({
391
+ providers: stats.health_report,
392
+ summary: stats.circuit_summary
393
+ })
394
+ end
395
+
396
+ app.get '/api/llm/providers/:name' do
397
+ require_llm!
398
+ unless gateway_available? && defined?(Legion::Extensions::LLM::Gateway::Runners::ProviderStats)
399
+ halt json_error('gateway_unavailable', 'LLM gateway is not loaded', status_code: 503)
400
+ end
401
+
402
+ stats = Legion::Extensions::LLM::Gateway::Runners::ProviderStats
403
+ detail = stats.provider_detail(provider: params[:name])
404
+ json_response(detail)
405
+ end
406
+ end
407
+
408
+ class << self
409
+ private :register_chat, :register_inference, :register_providers
410
+ end
411
+ end
412
+ end
413
+ end
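The inference route above converts each JSON tool description into an anonymous class whose class-level methods expose the name, description, and parameters, and whose instance-level `call` always raises. The sketch below isolates that step so it can be run on its own. `build_tool_stub` is a hypothetical helper name; the body follows the `Class.new` block in the new file, and nothing here claims to show how the pipeline consumes these classes.

```ruby
# Hypothetical helper: build a declaration-only tool class from a hash with
# string or symbol keys, as the route does for each entry in `tools`.
def build_tool_stub(tool)
  ts = tool.transform_keys(&:to_sym)
  tname = ts[:name].to_s
  tdesc = ts[:description].to_s
  tparams = ts[:parameters] || {}

  Class.new do
    # Class-level metadata, captured from the enclosing scope by closure.
    define_singleton_method(:tool_name) { tname }
    define_singleton_method(:description) { tdesc }
    define_singleton_method(:parameters) { tparams }
    # The server never executes the tool; any call attempt raises.
    define_method(:call) { |**_| raise NotImplementedError, "#{tname} executes client-side only" }
  end
end

stub = build_tool_stub({ 'name' => 'get_weather', 'description' => 'Look up weather' })

error_message = nil
begin
  stub.new.call(city: 'Oslo')
rescue NotImplementedError => e
  # NotImplementedError descends from ScriptError, not StandardError,
  # so it has to be rescued by name here.
  error_message = e.message
end
```

Using `define_singleton_method` with blocks rather than `def self.tool_name` is what lets each anonymous class close over its own `tname`, `tdesc`, and `tparams`.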
@@ -2,6 +2,6 @@

  module Legion
    module LLM
-     VERSION = '0.5.14'
+     VERSION = '0.5.16'
    end
  end
data/lib/legion/llm.rb CHANGED
@@ -24,6 +24,7 @@ require_relative 'llm/off_peak'
  require_relative 'llm/cost_tracker'
  require_relative 'llm/tool_registry'
  require_relative 'llm/override_confidence'
+ require_relative 'llm/routes'

  module Legion
    module LLM
@@ -51,6 +52,7 @@ module Legion
    @started = true
    Legion::Settings[:llm][:connected] = true
    Legion::Logging.info 'Legion::LLM started'
+   register_routes
    ping_provider
  end

@@ -228,7 +230,7 @@ module Legion
  end

  def _dispatch_chat(model:, provider:, intent:, tier:, escalate:, max_escalations:, quality_check:, message:, **kwargs, &)
-   if pipeline_enabled? && message
+   if pipeline_enabled? && (message || kwargs[:messages])
      return chat_via_pipeline(model: model, provider: provider, intent: intent, tier: tier,
                               message: message, escalate: escalate, max_escalations: max_escalations,
                               quality_check: quality_check, **kwargs, &)
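The one-line change to the `_dispatch_chat` gate above is easiest to see with the two conditions side by side. The sketch below reduces each gate to a function that returns a symbol. `old_dispatch_path` and `new_dispatch_path` are hypothetical names, and `pipeline_enabled:` is passed in as a plain flag here rather than read from settings as the gem does.

```ruby
# Hypothetical reduction of the pre-change gate: only `message:` counts.
def old_dispatch_path(pipeline_enabled:, message: nil, **_kwargs)
  pipeline_enabled && message ? :pipeline : :legacy
end

# Hypothetical reduction of the post-change gate: `messages:` arriving via
# the keyword splat also opens the pipeline path.
def new_dispatch_path(pipeline_enabled:, message: nil, **kwargs)
  pipeline_enabled && (message || kwargs[:messages]) ? :pipeline : :legacy
end

history = [{ role: 'user', content: 'hello' }]

old_result = old_dispatch_path(pipeline_enabled: true, messages: history)
new_result = new_dispatch_path(pipeline_enabled: true, messages: history)
```

Since an empty array is truthy in Ruby, `messages: []` also passes the new condition in this sketch; whether the real pipeline handles that case is not shown in the diff.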
@@ -658,6 +660,15 @@ module Legion
    Legion::Logging.warn "LLM ping failed for #{provider}/#{model}: #{e.message}"
  end

+ def register_routes
+   return unless defined?(Legion::API) && Legion::API.respond_to?(:register_library_routes)
+
+   Legion::API.register_library_routes('llm', Legion::LLM::Routes)
+   Legion::Logging.debug 'Legion::LLM routes registered with API'
+ rescue StandardError => e
+   Legion::Logging.warn "Legion::LLM route registration failed: #{e.message}" if defined?(Legion::Logging)
+ end
+
  def auto_configure_defaults
    settings[:providers].each do |provider, config|
      next unless config&.dig(:enabled)
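The `register_routes` method added above registers only when the host API is present and supports registration, and it downgrades any failure to a warning so startup continues. A minimal sketch of that guard pattern follows. `DemoApi`, `BrokenApi`, `DemoRoutes`, and `register_routes_with` are hypothetical stand-ins; the real method checks `defined?(Legion::API)` instead of receiving the API as an argument, and the return symbols exist only to make the outcome observable.

```ruby
# Hypothetical route module and host APIs used only for this sketch.
module DemoRoutes; end

module DemoApi
  def self.registry
    @registry ||= {}
  end

  def self.register_library_routes(name, mod)
    registry[name] = mod
  end
end

module BrokenApi
  def self.register_library_routes(_name, _mod)
    raise ArgumentError, 'boom'
  end
end

# Register when possible, skip when the API is absent or lacks the hook,
# and never let a registration error propagate to the caller.
def register_routes_with(api, routes)
  return :skipped unless api.respond_to?(:register_library_routes)

  api.register_library_routes('llm', routes)
  :registered
rescue StandardError => e
  warn "route registration failed: #{e.message}"
  :failed
end

registered = register_routes_with(DemoApi, DemoRoutes)
skipped    = register_routes_with(nil, DemoRoutes)
failed     = register_routes_with(BrokenApi, DemoRoutes)
```

Passing `nil` models the case where the API constant is not loaded at all, since `nil` does not respond to `register_library_routes`.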
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: legion-llm
  version: !ruby/object:Gem::Version
-   version: 0.5.14
+   version: 0.5.16
  platform: ruby
  authors:
  - Esity
@@ -270,6 +270,7 @@ files:
  - lib/legion/llm/router/health_tracker.rb
  - lib/legion/llm/router/resolution.rb
  - lib/legion/llm/router/rule.rb
+ - lib/legion/llm/routes.rb
  - lib/legion/llm/scheduling.rb
  - lib/legion/llm/settings.rb
  - lib/legion/llm/shadow_eval.rb
  - lib/legion/llm/shadow_eval.rb