simple_inference 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+ metadata.gz: 324b5bd8986fee7b2130229b1bb7ad277b1c476c71ed07c5a350845f91ae23a5
+ data.tar.gz: b746716fe0e9364bf1085acc4ac2a09dfe6ca5703831562869a96fdf43ea29a2
+ SHA512:
+ metadata.gz: 25baf84b1d22f57ed1ccb4b0413acf79cad258efee5e2a31b33d6a0247a87b175492cf293a67c6c7c2503c7d3c28e32283e21dba1aeb1bcbbe621203975e7972
+ data.tar.gz: ba78fc2c974118d4046119cc378202556e3d50611514df0cb4a9affa41412c5bb1a77aeccaef34071de15792a63b17bb5f484cafec8f506f7372660ba09fc5fa
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2025 jasl
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,244 @@
+ ## simple_inference Ruby SDK
+
+ Fiber-friendly Ruby client for the Simple Inference Server APIs (chat, embeddings, audio, rerank, health), designed to work well inside Rails apps and background jobs.
+
+ ### Installation
+
+ Add the gem to your Rails application's `Gemfile`, pointing at this repository path:
+
+ ```ruby
+ gem "simple_inference", path: "sdks/ruby"
+ ```
+
+ Then run:
+
+ ```bash
+ bundle install
+ ```
+
+ ### Configuration
+
+ You can configure the client via environment variables:
+
+ - `SIMPLE_INFERENCE_BASE_URL`: e.g. `http://localhost:8000`
+ - `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
+ - `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
+ - `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
+
+ Or explicitly when constructing a client:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "http://localhost:8000",
+   api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
+   timeout: 30.0
+ )
+ ```
+
+ For convenience, you can also use the module constructor:
+
+ ```ruby
+ client = SimpleInference.new(base_url: "http://localhost:8000")
+ ```
+
+ ### Rails integration example
+
+ Create an initializer, for example `config/initializers/simple_inference.rb`:
+
+ ```ruby
+ SIMPLE_INFERENCE_CLIENT = SimpleInference::Client.new(
+   base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
+   api_key: ENV["SIMPLE_INFERENCE_API_KEY"]
+ )
+ ```
+
+ Then in a controller:
+
+ ```ruby
+ class ChatsController < ApplicationController
+   def create
+     result = SIMPLE_INFERENCE_CLIENT.chat_completions(
+       model: "local-llm",
+       messages: [
+         { "role" => "user", "content" => params[:prompt] }
+       ]
+     )
+
+     render json: result[:body], status: result[:status]
+   end
+ end
+ ```
+
+ You can also use the client in background jobs:
+
+ ```ruby
+ class EmbedJob < ApplicationJob
+   queue_as :default
+
+   def perform(text)
+     result = SIMPLE_INFERENCE_CLIENT.embeddings(
+       model: "bge-m3",
+       input: text
+     )
+
+     vector = result[:body]["data"].first["embedding"]
+     # TODO: persist the vector (e.g. in DB or a vector store)
+   end
+ end
+ ```
+
+ And for health checks / maintenance tasks:
+
+ ```ruby
+ if SIMPLE_INFERENCE_CLIENT.healthy?
+   Rails.logger.info("Inference server is healthy")
+ else
+   Rails.logger.warn("Inference server is unhealthy")
+ end
+
+ models = SIMPLE_INFERENCE_CLIENT.list_models
+ Rails.logger.info("Available models: #{models[:body].inspect}")
+ ```
+
+ ### API methods
+
+ - `client.chat_completions(params)` → `POST /v1/chat/completions`
+ - `client.embeddings(params)` → `POST /v1/embeddings`
+ - `client.rerank(params)` → `POST /v1/rerank`
+ - `client.list_models` → `GET /v1/models`
+ - `client.health` → `GET /health`
+ - `client.healthy?` → boolean helper based on `/health`
+ - `client.audio_transcriptions(params)` → `POST /v1/audio/transcriptions`
+ - `client.audio_translations(params)` → `POST /v1/audio/translations`
+
+ All methods follow a Receive-an-Object / Return-an-Object style:
+
+ - Input: a Ruby `Hash` (keys can be strings or symbols).
+ - Output: a `Hash` with keys:
+   - `:status` – HTTP status code
+   - `:headers` – response headers (lowercased keys)
+   - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+
+ ### Error handling
+
+ By default (`raise_on_error: true`) non-2xx HTTP responses raise:
+
+ - `SimpleInference::Errors::HTTPError` – wraps status, headers and raw body.
+
+ Network and parsing errors are mapped to:
+
+ - `SimpleInference::Errors::TimeoutError`
+ - `SimpleInference::Errors::ConnectionError`
+ - `SimpleInference::Errors::DecodeError`
+
+ If you prefer to handle HTTP error codes manually, disable raising:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "http://localhost:8000",
+   raise_on_error: false
+ )
+
+ response = client.embeddings(model: "local-embed", input: "hello")
+ if response[:status] == 200
+   # happy path
+ else
+   Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
+ end
+ ```
+
+ ### Using with OpenAI and compatible services
+
+ Because this SDK follows the OpenAI-style HTTP paths (`/v1/chat/completions`, `/v1/embeddings`, etc.), you can also point it directly at OpenAI or other compatible inference services.
+
+ #### Connect to OpenAI
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "https://api.openai.com",
+   api_key: ENV["OPENAI_API_KEY"]
+ )
+
+ response = client.chat_completions(
+   model: "gpt-4.1-mini",
+   messages: [{ "role" => "user", "content" => "Hello" }]
+ )
+
+ pp response[:body]
+ ```
+
+ #### Streaming chat completions (SSE)
+
+ For OpenAI-style streaming (`text/event-stream`), use `chat_completions_stream`. It yields parsed JSON events (Ruby `Hash`), so you can consume deltas incrementally:
+
+ ```ruby
+ client.chat_completions_stream(
+   model: "gpt-4.1-mini",
+   messages: [{ "role" => "user", "content" => "Hello" }]
+ ) do |event|
+   delta = event.dig("choices", 0, "delta", "content")
+   print delta if delta
+ end
+ puts
+ ```
+
+ Called without a block, it returns an Enumerator:
+
+ ```ruby
+ client.chat_completions_stream(model: "gpt-4.1-mini", messages: [...]).each do |event|
+   # ...
+ end
+ ```
+
+ Fallback behavior:
+
+ - If the upstream service does **not** support streaming (for example, this repo's server currently returns `400` with `{"detail":"Streaming responses are not supported yet"}`), the SDK will **retry non-streaming** and yield a **single synthetic chunk** so your streaming consumer code can still run.
+
+ #### Connect to any OpenAI-compatible endpoint
+
+ For services that expose an OpenAI-compatible API (same paths and payloads), point `base_url` at that service and provide the correct token:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "https://my-openai-compatible.example.com",
+   api_key: ENV["MY_SERVICE_TOKEN"]
+ )
+ ```
+
+ If the service uses a non-standard header instead of `Authorization: Bearer`, you can omit `api_key` and pass headers explicitly:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+   base_url: "https://my-service.example.com",
+   headers: {
+     "x-api-key" => ENV["MY_SERVICE_KEY"]
+   }
+ )
+ ```
+
+ ### Puma vs Falcon (Fiber / Async) usage
+
+ The default HTTP adapter uses Ruby's `Net::HTTP` and is safe to use under Puma's multithreaded model:
+
+ - No global mutable state
+ - Per-client configuration only
+ - Blocking IO that integrates with the Ruby 3 Fiber scheduler
+
+ For Falcon / async environments, you can keep the default adapter, or use the optional HTTPX adapter (requires the `httpx` gem):
+
+ ```ruby
+ gem "httpx" # optional, only required when using the HTTPX adapter
+ ```
+
+ You can then use the optional HTTPX adapter shipped with this gem:
+
+ ```ruby
+ adapter = SimpleInference::HTTPAdapter::HTTPX.new(timeout: 30.0)
+
+ SIMPLE_INFERENCE_CLIENT =
+   SimpleInference::Client.new(
+     base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
+     api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
+     adapter: adapter
+   )
+ ```
data/Rakefile ADDED
@@ -0,0 +1,12 @@
+ # frozen_string_literal: true
+
+ require "bundler/gem_tasks"
+ require "minitest/test_task"
+
+ Minitest::TestTask.create
+
+ require "rubocop/rake_task"
+
+ RuboCop::RakeTask.new
+
+ task default: %i[test rubocop]
@@ -0,0 +1,573 @@
+ # frozen_string_literal: true
+
+ require "json"
+ require "securerandom"
+ require "uri"
+ require "timeout"
+ require "socket"
+
+ module SimpleInference
+   class Client
+     attr_reader :config, :adapter
+
+     def initialize(options = {})
+       @config = Config.new(options || {})
+       @adapter = @config.adapter || HTTPAdapter::Default.new
+     end
+
+     # POST /v1/chat/completions
+     # params: { model: "model-name", messages: [...], ... }
+     def chat_completions(params)
+       post_json("/v1/chat/completions", params)
+     end
+
+     # POST /v1/chat/completions (streaming)
+     #
+     # Yields parsed JSON events from an OpenAI-style SSE stream (`text/event-stream`).
+     #
+     # If no block is given, returns an Enumerator.
+     def chat_completions_stream(params)
+       return enum_for(:chat_completions_stream, params) unless block_given?
+
+       unless params.is_a?(Hash)
+         raise Errors::ConfigurationError, "params must be a Hash"
+       end
+
+       body = params.dup
+       body.delete(:stream)
+       body.delete("stream")
+       body["stream"] = true
+
+       response = post_json_stream("/v1/chat/completions", body) do |event|
+         yield event
+       end
+
+       content_type = response.dig(:headers, "content-type").to_s
+
+       # Streaming case: we already yielded events from the SSE stream.
+       if response[:status].to_i >= 200 && response[:status].to_i < 300 && content_type.include?("text/event-stream")
+         return response
+       end
+
+       # Fallback when upstream does not support streaming (this repo's server).
+       if streaming_unsupported_error?(response[:status], response[:body])
+         fallback_body = params.dup
+         fallback_body.delete(:stream)
+         fallback_body.delete("stream")
+
+         fallback_response = post_json("/v1/chat/completions", fallback_body)
+         chunk = synthesize_chat_completion_chunk(fallback_response[:body])
+         yield chunk if chunk
+         return fallback_response
+       end
+
+       # If we got a non-streaming success response (JSON), convert it into a single
+       # chunk so streaming consumers can share the same code path.
+       if response[:status].to_i >= 200 && response[:status].to_i < 300
+         chunk = synthesize_chat_completion_chunk(response[:body])
+         yield chunk if chunk
+       end
+
+       response
+     end
+
+     # POST /v1/embeddings
+     def embeddings(params)
+       post_json("/v1/embeddings", params)
+     end
+
+     # POST /v1/rerank
+     def rerank(params)
+       post_json("/v1/rerank", params)
+     end
+
+     # GET /v1/models
+     def list_models
+       get_json("/v1/models")
+     end
+
+     # GET /health
+     def health
+       get_json("/health")
+     end
+
+     # Returns true when service is healthy, false otherwise.
+     def healthy?
+       response = get_json("/health", raise_on_http_error: false)
+       status_ok = response[:status] == 200
+       body_status_ok = response.dig(:body, "status") == "ok"
+       status_ok && body_status_ok
+     rescue Errors::Error
+       false
+     end
+
+     # POST /v1/audio/transcriptions
+     # params: { file: io_or_hash, model: "model-name", **audio_options }
+     def audio_transcriptions(params)
+       post_multipart("/v1/audio/transcriptions", params)
+     end
+
+     # POST /v1/audio/translations
+     def audio_translations(params)
+       post_multipart("/v1/audio/translations", params)
+     end
+
+     private
+
+     def base_url
+       config.base_url
+     end
+
+     def get_json(path, params: nil, raise_on_http_error: nil)
+       full_path = with_query(path, params)
+       request_json(
+         method: :get,
+         path: full_path,
+         body: nil,
+         expect_json: true,
+         raise_on_http_error: raise_on_http_error
+       )
+     end
+
+     def post_json(path, body, raise_on_http_error: nil)
+       request_json(
+         method: :post,
+         path: path,
+         body: body,
+         expect_json: true,
+         raise_on_http_error: raise_on_http_error
+       )
+     end
+
+     def post_json_stream(path, body, raise_on_http_error: nil, &on_event)
+       if base_url.nil? || base_url.empty?
+         raise Errors::ConfigurationError, "base_url is required"
+       end
+
+       url = "#{base_url}#{path}"
+
+       headers = config.headers.merge(
+         "Content-Type" => "application/json",
+         "Accept" => "text/event-stream, application/json"
+       )
+       payload = body.nil? ? nil : JSON.generate(body)
+
+       request_env = {
+         method: :post,
+         url: url,
+         headers: headers,
+         body: payload,
+         timeout: config.timeout,
+         open_timeout: config.open_timeout,
+         read_timeout: config.read_timeout,
+       }
+
+       handle_stream_response(request_env, raise_on_http_error: raise_on_http_error, &on_event)
+     end
+
+     def handle_stream_response(request_env, raise_on_http_error:, &on_event)
+       sse_buffer = +""
+       sse_done = false
+       used_streaming_adapter = false
+
+       raw_response =
+         if @adapter.respond_to?(:call_stream)
+           used_streaming_adapter = true
+           @adapter.call_stream(request_env) do |chunk|
+             next if sse_done
+
+             sse_buffer << chunk.to_s
+             extract_sse_blocks!(sse_buffer).each do |block|
+               data = sse_data_from_block(block)
+               next if data.nil?
+
+               payload = data.strip
+               next if payload.empty?
+               if payload == "[DONE]"
+                 sse_done = true
+                 sse_buffer.clear
+                 break
+               end
+
+               on_event&.call(parse_json_event(payload))
+             end
+           end
+         else
+           @adapter.call(request_env)
+         end
+
+       status = raw_response[:status]
+       headers = (raw_response[:headers] || {}).transform_keys { |k| k.to_s.downcase }
+       body = raw_response[:body]
+       body_str = body.nil? ? "" : body.to_s
+
+       content_type = headers["content-type"].to_s
+
+       # Streaming case.
+       if status >= 200 && status < 300 && content_type.include?("text/event-stream")
+         # If we couldn't stream incrementally, best-effort parse the full SSE body.
+         unless used_streaming_adapter
+           buffer = body_str.dup
+           extract_sse_blocks!(buffer).each do |block|
+             data = sse_data_from_block(block)
+             next if data.nil?
+
+             payload = data.strip
+             next if payload.empty?
+             break if payload == "[DONE]"
+
+             on_event&.call(parse_json_event(payload))
+           end
+         end
+
+         return {
+           status: status,
+           headers: headers,
+           body: nil,
+         }
+       end
+
+       # Non-streaming response path (adapter doesn't support streaming or server returned JSON).
+       should_parse_json = content_type.include?("json")
+       parsed_body = should_parse_json ? parse_json(body_str) : body_str
+
+       raise_on =
+         if raise_on_http_error.nil?
+           config.raise_on_error
+         else
+           !!raise_on_http_error
+         end
+
+       if raise_on && (status < 200 || status >= 300)
+         # Do not raise for the known "streaming unsupported" case; the caller will
+         # perform a non-streaming retry fallback.
+         unless streaming_unsupported_error?(status, parsed_body)
+           message = "HTTP #{status}"
+           begin
+             error_body = JSON.parse(body_str)
+             error_field = error_body["error"]
+             message =
+               if error_field.is_a?(Hash)
+                 error_field["message"] || error_body["message"] || message
+               else
+                 error_field || error_body["message"] || message
+               end
+           rescue JSON::ParserError
+             # fall back to generic message
+           end
+
+           raise Errors::HTTPError.new(
+             message,
+             status: status,
+             headers: headers,
+             body: body_str
+           )
+         end
+       end
+
+       {
+         status: status,
+         headers: headers,
+         body: parsed_body,
+       }
+     rescue Timeout::Error => e
+       raise Errors::TimeoutError, e.message
+     rescue SocketError, SystemCallError => e
+       raise Errors::ConnectionError, e.message
+     end
+
+     def extract_sse_blocks!(buffer)
+       blocks = []
+
+       loop do
+         idx_lf = buffer.index("\n\n")
+         idx_crlf = buffer.index("\r\n\r\n")
+
+         idx = [idx_lf, idx_crlf].compact.min
+         break if idx.nil?
+
+         sep_len = (idx == idx_crlf) ? 4 : 2
+         blocks << buffer.slice!(0, idx)
+         buffer.slice!(0, sep_len)
+       end
+
+       blocks
+     end
+
+     def sse_data_from_block(block)
+       return nil if block.nil? || block.empty?
+
+       data_lines = []
+       block.split(/\r?\n/).each do |line|
+         next if line.nil? || line.empty?
+         next if line.start_with?(":")
+         next unless line.start_with?("data:")
+
+         data_lines << (line[5..]&.lstrip).to_s
+       end
+
+       return nil if data_lines.empty?
+
+       data_lines.join("\n")
+     end
+
+     def parse_json_event(payload)
+       JSON.parse(payload)
+     rescue JSON::ParserError => e
+       raise Errors::DecodeError, "Failed to parse SSE JSON event: #{e.message}"
+     end
+
+     def streaming_unsupported_error?(status, body)
+       return false unless status.to_i == 400
+       return false unless body.is_a?(Hash)
+
+       body["detail"].to_s.strip == "Streaming responses are not supported yet"
+     end
+
+     def synthesize_chat_completion_chunk(body)
+       return nil unless body.is_a?(Hash)
+
+       id = body["id"]
+       created = body["created"]
+       model = body["model"]
+
+       choices = body["choices"]
+       return nil unless choices.is_a?(Array) && !choices.empty?
+
+       choice0 = choices[0]
+       return nil unless choice0.is_a?(Hash)
+
+       message = choice0["message"]
+       return nil unless message.is_a?(Hash)
+
+       role = message["role"] || "assistant"
+       content = message["content"]
+
+       {
+         "id" => id,
+         "object" => "chat.completion.chunk",
+         "created" => created,
+         "model" => model,
+         "choices" => [
+           {
+             "index" => choice0["index"] || 0,
+             "delta" => {
+               "role" => role,
+               "content" => content,
+             },
+             "finish_reason" => choice0["finish_reason"],
+           },
+         ],
+       }
+     end
+
+     def request_json(method:, path:, body:, expect_json:, raise_on_http_error:)
+       if base_url.nil? || base_url.empty?
+         raise Errors::ConfigurationError, "base_url is required"
+       end
+
+       url = "#{base_url}#{path}"
+
+       headers = config.headers.merge("Content-Type" => "application/json")
+       payload = body.nil? ? nil : JSON.generate(body)
+
+       request_env = {
+         method: method,
+         url: url,
+         headers: headers,
+         body: payload,
+         timeout: config.timeout,
+         open_timeout: config.open_timeout,
+         read_timeout: config.read_timeout,
+       }
+
+       handle_response(
+         request_env,
+         expect_json: expect_json,
+         raise_on_http_error: raise_on_http_error
+       )
+     end
+
+     def with_query(path, params)
+       return path if params.nil? || params.empty?
+
+       query = URI.encode_www_form(params)
+       separator = path.include?("?") ? "&" : "?"
+       "#{path}#{separator}#{query}"
+     end
+
+     def post_multipart(path, params)
+       file_value = params[:file] || params["file"]
+       model = params[:model] || params["model"]
+
+       raise Errors::ConfigurationError, "file is required" if file_value.nil?
+       raise Errors::ConfigurationError, "model is required" if model.nil? || model.to_s.empty?
+
+       io, filename = normalize_upload(file_value)
+
+       form_fields = {
+         "model" => model.to_s,
+       }
+
+       # Optional scalar fields
+       %i[language prompt response_format temperature].each do |key|
+         value = params[key] || params[key.to_s]
+         next if value.nil?
+
+         form_fields[key.to_s] = value.to_s
+       end
+
+       # timestamp_granularities can be an array or single value
+       tgs = params[:timestamp_granularities] || params["timestamp_granularities"]
+       if tgs && !tgs.empty?
+         Array(tgs).each_with_index do |value, index|
+           form_fields["timestamp_granularities[#{index}]"] = value.to_s
+         end
+       end
+
+       body, headers = build_multipart_body(io, filename, form_fields)
+
+       request_env = {
+         method: :post,
+         url: "#{base_url}#{path}",
+         headers: config.headers.merge(headers),
+         body: body,
+         timeout: config.timeout,
+         open_timeout: config.open_timeout,
+         read_timeout: config.read_timeout,
+       }
+
+       handle_response(
+         request_env,
+         expect_json: nil, # auto-detect based on Content-Type
+         raise_on_http_error: nil
+       )
+     ensure
+       if io && io.respond_to?(:close)
+         begin
+           io.close unless io.closed?
+         rescue StandardError
+           # ignore close errors
+         end
+       end
+     end
+
+     def normalize_upload(file)
+       if file.is_a?(Hash)
+         io = file[:io] || file["io"]
+         filename = file[:filename] || file["filename"] || "audio.wav"
+       elsif file.respond_to?(:read)
+         io = file
+         filename =
+           if file.respond_to?(:path) && file.path
+             File.basename(file.path)
+           else
+             "audio.wav"
+           end
+       else
+         raise Errors::ConfigurationError,
+               "file must be an IO object or a hash with :io and :filename keys"
+       end
+
+       raise Errors::ConfigurationError, "file IO is required" if io.nil?
+
+       [io, filename]
+     end
+
+     def build_multipart_body(io, filename, fields)
+       boundary = "simple-inference-ruby-#{SecureRandom.hex(12)}"
+
+       headers = {
+         "Content-Type" => "multipart/form-data; boundary=#{boundary}",
+       }
+
+       body = +""
+
+       fields.each do |name, value|
+         body << "--#{boundary}\r\n"
+         body << %(Content-Disposition: form-data; name="#{name}"\r\n\r\n)
+         body << value.to_s
+         body << "\r\n"
+       end
+
+       body << "--#{boundary}\r\n"
+       body << %(Content-Disposition: form-data; name="file"; filename="#{filename}"\r\n)
+       body << "Content-Type: application/octet-stream\r\n\r\n"
+
+       while (chunk = io.read(16_384))
+         body << chunk
+       end
+
+       body << "\r\n--#{boundary}--\r\n"
+
+       [body, headers]
+     end
+
+     def handle_response(request_env, expect_json:, raise_on_http_error:)
+       response = @adapter.call(request_env)
+
+       status = response[:status]
+       headers = (response[:headers] || {}).transform_keys { |k| k.to_s.downcase }
+       body = response[:body].to_s
+
+       # Decide whether to raise on HTTP errors
+       raise_on =
+         if raise_on_http_error.nil?
+           config.raise_on_error
+         else
+           !!raise_on_http_error
+         end
+
+       if raise_on && (status < 200 || status >= 300)
+         message = "HTTP #{status}"
+
+         begin
+           error_body = JSON.parse(body)
+           message = error_body["error"] || error_body["message"] || message
+         rescue JSON::ParserError
+           # fall back to generic message
+         end
+
+         raise Errors::HTTPError.new(
+           message,
+           status: status,
+           headers: headers,
+           body: body
+         )
+       end
+
+       should_parse_json =
+         if expect_json.nil?
+           content_type = headers["content-type"]
+           content_type && content_type.include?("json")
+         else
+           expect_json
+         end
+
+       parsed_body =
+         if should_parse_json
+           parse_json(body)
+         else
+           body
+         end
+
+       {
+         status: status,
+         headers: headers,
+         body: parsed_body,
+       }
+     rescue Timeout::Error => e
+       raise Errors::TimeoutError, e.message
+     rescue SocketError, SystemCallError => e
+       raise Errors::ConnectionError, e.message
+     end
+
+     def parse_json(body)
+       return nil if body.nil? || body.empty?
+
+       JSON.parse(body)
+     rescue JSON::ParserError => e
+       raise Errors::DecodeError, "Failed to parse JSON response: #{e.message}"
+     end
+   end
+ end
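
The client above frames the SSE byte stream into blank-line-separated blocks, pulls out `data:` payloads, and stops at the `[DONE]` sentinel. A simplified, self-contained sketch of that framing logic (an illustrative re-implementation handling only `\n\n` separators, not the gem's actual API):

```ruby
require "json"

# Split an SSE buffer into complete blocks (separated by a blank line),
# mutating the buffer so a partial trailing block is kept for the next chunk.
def extract_sse_blocks!(buffer)
  blocks = []
  while (idx = buffer.index("\n\n"))
    blocks << buffer.slice!(0, idx)
    buffer.slice!(0, 2) # drop the "\n\n" separator
  end
  blocks
end

# Collect the JSON payloads from "data:" lines, stopping at the [DONE] sentinel.
def sse_events(buffer)
  events = []
  extract_sse_blocks!(buffer).each do |block|
    data = block.lines.grep(/\Adata:/).map { |l| l.sub(/\Adata:\s*/, "").chomp }.join("\n")
    next if data.empty?
    break if data == "[DONE]"
    events << JSON.parse(data)
  end
  events
end

stream = +"data: {\"delta\":\"Hel\"}\n\ndata: {\"delta\":\"lo\"}\n\ndata: [DONE]\n\n"
events = sse_events(stream)
# events => [{"delta"=>"Hel"}, {"delta"=>"lo"}]
```

Because the buffer is mutated in place, an incremental caller can append each network chunk and re-run extraction without losing a block split across chunk boundaries.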
@@ -0,0 +1,88 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+   class Config
+     attr_reader :base_url,
+                 :api_key,
+                 :timeout,
+                 :open_timeout,
+                 :read_timeout,
+                 :adapter,
+                 :raise_on_error
+
+     def initialize(options = {})
+       opts = symbolize_keys(options || {})
+
+       @base_url = normalize_base_url(
+         opts[:base_url] || ENV["SIMPLE_INFERENCE_BASE_URL"] || "http://localhost:8000"
+       )
+       @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
+       @api_key = nil if @api_key.empty?
+
+       @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
+       @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
+       @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
+
+       @adapter = opts[:adapter]
+
+       @raise_on_error = boolean_option(
+         explicit: opts.fetch(:raise_on_error, nil),
+         env_name: "SIMPLE_INFERENCE_RAISE_ON_ERROR",
+         default: true
+       )
+
+       @default_headers = build_default_headers(opts[:headers] || {})
+     end
+
+     def headers
+       @default_headers.dup
+     end
+
+     private
+
+     def normalize_base_url(value)
+       url = value.to_s.strip
+       url = "http://localhost:8000" if url.empty?
+       url.chomp("/")
+     end
+
+     def to_float_or_nil(value)
+       return nil if value.nil? || value == ""
+
+       Float(value)
+     rescue ArgumentError, TypeError
+       nil
+     end
+
+     def boolean_option(explicit:, env_name:, default:)
+       return !!explicit unless explicit.nil?
+
+       env_value = ENV[env_name]
+       return default if env_value.nil?
+
+       %w[1 true yes on].include?(env_value.to_s.strip.downcase)
+     end
+
+     def build_default_headers(extra_headers)
+       headers = {
+         "Accept" => "application/json",
+       }
+
+       headers["Authorization"] = "Bearer #{@api_key}" if @api_key
+
+       headers.merge(stringify_keys(extra_headers))
+     end
+
+     def symbolize_keys(hash)
+       hash.each_with_object({}) do |(key, value), out|
+         out[key.to_sym] = value
+       end
+     end
+
+     def stringify_keys(hash)
+       hash.each_with_object({}) do |(key, value), out|
+         out[key.to_s] = value
+       end
+     end
+   end
+ end
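
`Config#boolean_option` treats an explicit option as authoritative and otherwise reads a truthy string from the environment. A standalone sketch of that parsing rule (illustrative helper, not part of the gem's public API):

```ruby
# Mirrors the truthy-string handling used by Config#boolean_option:
# "1", "true", "yes", "on" (case-insensitive, whitespace-tolerant) read as
# true; any other string reads as false; a missing value yields the default.
def truthy_env?(value, default: true)
  return default if value.nil?

  %w[1 true yes on].include?(value.to_s.strip.downcase)
end

truthy_env?("TRUE")              # => true
truthy_env?("0")                 # => false
truthy_env?(nil)                 # => true (falls back to the default)
truthy_env?(nil, default: false) # => false
```

Note the asymmetry: once a value is set, anything outside the truthy list (including `"false"`, `"off"`, or a typo) disables the flag, so `SIMPLE_INFERENCE_RAISE_ON_ERROR=no` turns raising off.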
@@ -0,0 +1,24 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+   module Errors
+     class Error < StandardError; end
+
+     class ConfigurationError < Error; end
+
+     class HTTPError < Error
+       attr_reader :status, :headers, :body
+
+       def initialize(message, status:, headers:, body:)
+         super(message)
+         @status = status
+         @headers = headers
+         @body = body
+       end
+     end
+
+     class TimeoutError < Error; end
+     class ConnectionError < Error; end
+     class DecodeError < Error; end
+   end
+ end
@@ -0,0 +1,72 @@
+ # frozen_string_literal: true
+
+ begin
+   require "httpx"
+ rescue LoadError => e
+   raise LoadError,
+         "httpx gem is required for SimpleInference::HTTPAdapter::HTTPX (add `gem \"httpx\"`)",
+         cause: e
+ end
+
+ module SimpleInference
+   module HTTPAdapter
+     # Fiber-friendly HTTP adapter built on HTTPX.
+     #
+     # NOTE: This adapter intentionally does NOT implement `#call_stream`.
+     # Streaming consumers will still work via the SDK's full-body SSE parsing
+     # fallback path (see SimpleInference::Client#handle_stream_response).
+     class HTTPX
+       def initialize(timeout: nil)
+         @timeout = timeout
+       end
+
+       def call(request)
+         method = request.fetch(:method).to_s.downcase.to_sym
+         url = request.fetch(:url)
+         headers = request[:headers] || {}
+         body = request[:body]
+
+         client = ::HTTPX
+
+         # Mirror the SDK's timeout semantics:
+         # - `:timeout` is the overall request deadline (maps to HTTPX `request_timeout`)
+         # - `:open_timeout` and `:read_timeout` override connect/read deadlines
+         timeout = request[:timeout] || @timeout
+         open_timeout = request[:open_timeout] || timeout
+         read_timeout = request[:read_timeout] || timeout
+
+         timeout_opts = {}
+         timeout_opts[:request_timeout] = timeout.to_f if timeout
+         timeout_opts[:connect_timeout] = open_timeout.to_f if open_timeout
+         timeout_opts[:read_timeout] = read_timeout.to_f if read_timeout
+
+         unless timeout_opts.empty?
+           client = client.with(timeout: timeout_opts)
+         end
+
+         response = client.request(method, url, headers: headers, body: body)
+
+         # HTTPX may return an error response object instead of raising.
+         if response.respond_to?(:status) && response.status.to_i == 0
+           err = response.respond_to?(:error) ? response.error : nil
+           raise Errors::ConnectionError, (err ? err.message : "HTTPX request failed")
+         end
+
+         response_headers =
+           response.headers.to_h.each_with_object({}) do |(k, v), out|
+             out[k.to_s] = v.is_a?(Array) ? v.join(", ") : v.to_s
+           end
+
+         {
+           status: response.status.to_i,
+           headers: response_headers,
+           body: response.body.to_s,
+         }
+       rescue ::HTTPX::TimeoutError => e
+         raise Errors::TimeoutError, e.message
+       rescue ::HTTPX::Error, IOError, SystemCallError => e
+         raise Errors::ConnectionError, e.message
+       end
+     end
+   end
+ end
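
The adapter's timeout mapping (overall deadline plus connect/read overrides that fall back to it) can be isolated as a pure function. A hedged sketch of just that mapping, using the option names the adapter's own comments describe (`httpx_timeout_opts` is a hypothetical helper for illustration):

```ruby
# Build an HTTPX-style timeout options hash from the SDK's request env.
# :timeout is the overall deadline; :open_timeout/:read_timeout fall back
# to it when unset, mirroring the adapter code above.
def httpx_timeout_opts(request, default_timeout = nil)
  timeout      = request[:timeout] || default_timeout
  open_timeout = request[:open_timeout] || timeout
  read_timeout = request[:read_timeout] || timeout

  opts = {}
  opts[:request_timeout] = timeout.to_f if timeout
  opts[:connect_timeout] = open_timeout.to_f if open_timeout
  opts[:read_timeout]    = read_timeout.to_f if read_timeout
  opts
end

httpx_timeout_opts({ timeout: 30 })
# => { request_timeout: 30.0, connect_timeout: 30.0, read_timeout: 30.0 }
```

When no timeout is configured at all, the hash stays empty and the adapter skips `client.with(timeout: ...)` entirely, leaving HTTPX's own defaults in effect.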
@@ -0,0 +1,123 @@
+ # frozen_string_literal: true
+
+ require "net/http"
+ require "uri"
+
+ module SimpleInference
+ module HTTPAdapter
+ # Optional adapters are lazily loaded so the SDK has no hard runtime deps.
+ autoload :HTTPX, "simple_inference/http_adapter/httpx"
+
+ # Default synchronous HTTP adapter built on Net::HTTP.
+ # It is compatible with the Ruby 3 Fiber scheduler and keeps the interface
+ # minimal so it can be swapped out for custom adapters (HTTPX, async-http, etc.).
+ class Default
+ def call(request)
+ uri = URI.parse(request.fetch(:url))
+
+ http = Net::HTTP.new(uri.host, uri.port)
+ http.use_ssl = uri.scheme == "https"
+
+ timeout = request[:timeout]
+ open_timeout = request[:open_timeout] || timeout
+ read_timeout = request[:read_timeout] || timeout
+
+ http.open_timeout = open_timeout if open_timeout
+ http.read_timeout = read_timeout if read_timeout
+
+ klass = http_class_for(request[:method])
+ req = klass.new(uri.request_uri)
+
+ headers = request[:headers] || {}
+ headers.each do |key, value|
+ req[key.to_s] = value
+ end
+
+ body = request[:body]
+ req.body = body if body
+
+ response = http.request(req)
+
+ {
+ status: Integer(response.code),
+ headers: response.each_header.to_h,
+ body: response.body.to_s,
+ }
+ end
+
+ # Streaming-capable request helper.
+ #
+ # When the response is `text/event-stream` (and 2xx), it yields raw body chunks
+ # as they arrive via the given block, and returns a response hash with `body: nil`.
+ #
+ # For non-streaming responses, it behaves like `#call` and returns the full body.
+ def call_stream(request)
+ return call(request) unless block_given?
+
+ uri = URI.parse(request.fetch(:url))
+
+ http = Net::HTTP.new(uri.host, uri.port)
+ http.use_ssl = uri.scheme == "https"
+
+ timeout = request[:timeout]
+ open_timeout = request[:open_timeout] || timeout
+ read_timeout = request[:read_timeout] || timeout
+
+ http.open_timeout = open_timeout if open_timeout
+ http.read_timeout = read_timeout if read_timeout
+
+ klass = http_class_for(request[:method])
+ req = klass.new(uri.request_uri)
+
+ headers = request[:headers] || {}
+ headers.each do |key, value|
+ req[key.to_s] = value
+ end
+
+ body = request[:body]
+ req.body = body if body
+
+ status = nil
+ response_headers = {}
+ response_body = +""
+
+ http.request(req) do |response|
+ status = Integer(response.code)
+ response_headers = response.each_header.to_h
+
+ headers_lc = response_headers.transform_keys { |k| k.to_s.downcase }
+ content_type = headers_lc["content-type"]
+
+ if status >= 200 && status < 300 && content_type&.include?("text/event-stream")
+ response.read_body do |chunk|
+ yield chunk
+ end
+ response_body = nil
+ else
+ response_body = response.body.to_s
+ end
+ end
+
+ {
+ status: Integer(status),
+ headers: response_headers,
+ body: response_body,
+ }
+ end
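`call_stream` yields raw body chunks; assembling them into server-sent events is left to the caller. One possible consumer, sketched here with a hypothetical `SSEBuffer` class that is not part of the gem:

```ruby
# Illustrative SSE reassembly for raw chunks yielded by call_stream.
# Buffers partial data across chunk boundaries and emits the payload of
# each complete event (events are separated by a blank line).
class SSEBuffer
  def initialize
    @buffer = +""
  end

  # Feed one raw chunk; returns the data payloads of any complete events.
  def feed(chunk)
    @buffer << chunk
    events = []
    while (idx = @buffer.index("\n\n"))
      raw = @buffer.slice!(0, idx + 2)
      data = raw.lines.filter_map { |l| l[6..].chomp if l.start_with?("data: ") }
      events << data.join("\n") unless data.empty?
    end
    events
  end
end

sse = SSEBuffer.new
sse.feed("data: {\"delta\":\"He")    # => [] (incomplete event, buffered)
sse.feed("llo\"}\n\ndata: [DONE]\n\n")
# => ["{\"delta\":\"Hello\"}", "[DONE]"]
```

Because the adapter yields chunks exactly as the socket delivers them, a buffer like this is needed whenever an event may straddle two chunks.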
+
+ private
+
+ def http_class_for(method)
+ case method.to_s.upcase
+ when "GET" then Net::HTTP::Get
+ when "POST" then Net::HTTP::Post
+ when "PUT" then Net::HTTP::Put
+ when "PATCH" then Net::HTTP::Patch
+ when "DELETE" then Net::HTTP::Delete
+ else
+ raise ArgumentError, "Unsupported HTTP method: #{method.inspect}"
+ end
+ end
+ end
+ end
+ end
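Any object responding to `call(request)` and returning `{ status:, headers:, body: }` satisfies the adapter interface that `Default` implements, which is what makes adapters swappable. A hypothetical stub adapter for tests (not part of the gem) might look like:

```ruby
# Stub adapter for tests: records the request it receives and returns a
# canned response hash matching the adapter contract
# ({ status:, headers:, body: }). Hypothetical; not shipped with the gem.
class StubAdapter
  attr_reader :last_request

  def initialize(status: 200, headers: { "content-type" => "application/json" }, body: "{}")
    @response = { status: status, headers: headers, body: body }
  end

  def call(request)
    @last_request = request
    @response
  end
end

adapter = StubAdapter.new(body: '{"ok":true}')
response = adapter.call(method: :post, url: "https://example.test/v1/chat", body: "{}")
response[:status]           # => 200
adapter.last_request[:url]  # => "https://example.test/v1/chat"
```

Since the stub never opens a socket, specs exercising client behavior (request shaping, error mapping) can run without a live server.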
@@ -0,0 +1,5 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+ VERSION = "0.1.0"
+ end
@@ -0,0 +1,19 @@
+ # frozen_string_literal: true
+
+ require_relative "simple_inference/version"
+ require_relative "simple_inference/config"
+ require_relative "simple_inference/errors"
+ require_relative "simple_inference/http_adapter"
+ require_relative "simple_inference/client"
+
+ module SimpleInference
+ class << self
+ # Convenience constructor using RORO-style options hash.
+ #
+ # Example:
+ # client = SimpleInference.new(base_url: "...", api_key: "...")
+ def new(options = {})
+ Client.new(options)
+ end
+ end
+ end
@@ -0,0 +1,4 @@
+ module SimpleInference
+ VERSION: String
+ end
+
metadata ADDED
@@ -0,0 +1,56 @@
+ --- !ruby/object:Gem::Specification
+ name: simple_inference
+ version: !ruby/object:Gem::Version
+ version: 0.1.0
+ platform: ruby
+ authors:
+ - jasl
+ bindir: exe
+ cert_chain: []
+ date: 1980-01-02 00:00:00.000000000 Z
+ dependencies: []
+ description: Fiber-friendly Ruby client for Simple Inference Server APIs (chat, embeddings,
+ audio, rerank, health).
+ email:
+ - jasl9187@hotmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - lib/simple_inference.rb
+ - lib/simple_inference/client.rb
+ - lib/simple_inference/config.rb
+ - lib/simple_inference/errors.rb
+ - lib/simple_inference/http_adapter.rb
+ - lib/simple_inference/http_adapter/httpx.rb
+ - lib/simple_inference/version.rb
+ - sig/simple_inference.rbs
+ homepage: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+ licenses:
+ - MIT
+ metadata:
+ allowed_push_host: https://rubygems.org
+ homepage_uri: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+ source_code_uri: https://github.com/jasl/simple_inference_server
+ rubygems_mfa_required: 'true'
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: 3.2.0
+ required_rubygems_version: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ requirements: []
+ rubygems_version: 4.0.1
+ specification_version: 4
+ summary: Fiber-friendly Ruby client for the Simple Inference Server (OpenAI-compatible).
+ test_files: []