simple_inference 0.1.3 → 0.1.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 50a8ea07e0e30771b6d42f2bacf12f12b379f1151bb44947bb5febe3cac70cf9
- data.tar.gz: c655b141ea39e518c5cdcc30cc28bac249ef23acfae8c05f4cf852e199a98a80
+ metadata.gz: ad988c1bb0af4938ea72fd303943a6dc27b90f26a8128abd737e0fca6429e081
+ data.tar.gz: 6be00487c1533201ffc48afb14a64c385b434698cf1bf3ab1c5c4ab10834d06a
  SHA512:
- metadata.gz: '0483fda13365abb75bde99643ada2fc95017e4883e07d823d8b86912e9553764befe7c6a92222aeca3ee05d20e8f6b1e1a23a4a74decf45383bc4cc9c055d357'
- data.tar.gz: 4d115195078c198b2c2c1b4bc1307a05e337884449b19393488374cd1710efaf6186e68c7060417141e5477933ea3fafc5223896adc06c1d1063263437254d56
+ metadata.gz: 066dbeee456edae89770a5ed6541d77dda53d6ebcac59a2f277e28e00dde8b12b373cdec67bb0e79f84df781397034f1ff75694560bd6f612dca608ce6252630
+ data.tar.gz: 8008d5a95c38e45465e48a3f45fe8b7fd1cffec49e16cfd54419cbed08a11d7d613715314c91c742f63430860caac1fe332e10270cd0741401e98540a0582d65
data/README.md CHANGED
@@ -1,13 +1,24 @@
- ## simple_inference Ruby SDK
+ # SimpleInference
 
- Fiber-friendly Ruby client for the Simple Inference Server APIs (chat, embeddings, audio, rerank, health), designed to work well inside Rails apps and background jobs.
+ A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs. Works seamlessly with OpenAI, Azure OpenAI, 火山引擎 (Volcengine), DeepSeek, Groq, Together AI, and any other provider that implements the OpenAI API specification.
 
- ### Installation
+ Designed for simplicity and compatibility – no heavy dependencies, just pure Ruby with `Net::HTTP`.
 
- Add the gem to your Rails application's `Gemfile`, pointing at this repository path:
+ ## Features
+
+ - 🔌 **Universal compatibility** – Works with any OpenAI-compatible API provider
+ - 🌊 **Streaming support** – Native SSE streaming for chat completions
+ - 🧵 **Fiber-friendly** – Compatible with Ruby 3 Fiber scheduler, works great with Falcon
+ - 🔧 **Flexible configuration** – Customizable API prefix for non-standard endpoints
+ - 🎯 **Simple interface** – Receive-an-Object / Return-an-Object style API
+ - 📦 **Zero runtime dependencies** – Uses only Ruby standard library
+
+ ## Installation
+
+ Add to your Gemfile:
 
  ```ruby
- gem "simple_inference", path: "sdks/ruby"
+ gem "simple_inference"
  ```
 
  Then run:
@@ -16,231 +27,398 @@ Then run:
  bundle install
  ```
 
- ### Configuration
+ ## Quick Start
+
+ ```ruby
+ require "simple_inference"
+
+ # Connect to OpenAI
+ client = SimpleInference::Client.new(
+ base_url: "https://api.openai.com",
+ api_key: ENV["OPENAI_API_KEY"]
+ )
+
+ result = client.chat(
+ model: "gpt-4o-mini",
+ messages: [{ "role" => "user", "content" => "Hello!" }]
+ )
+
+ puts result.content
+ p result.usage
+ ```
+
+ ## Configuration
+
+ ### Options
 
- You can configure the client via environment variables:
+ | Option | Env Variable | Default | Description |
+ |--------|--------------|---------|-------------|
+ | `base_url` | `SIMPLE_INFERENCE_BASE_URL` | `http://localhost:8000` | API base URL |
+ | `api_key` | `SIMPLE_INFERENCE_API_KEY` | `nil` | API key (sent as `Authorization: Bearer <token>`) |
+ | `api_prefix` | `SIMPLE_INFERENCE_API_PREFIX` | `/v1` | API path prefix (e.g., `/v1`, empty string for some providers) |
+ | `timeout` | `SIMPLE_INFERENCE_TIMEOUT` | `nil` | Request timeout in seconds |
+ | `open_timeout` | `SIMPLE_INFERENCE_OPEN_TIMEOUT` | `nil` | Connection open timeout |
+ | `read_timeout` | `SIMPLE_INFERENCE_READ_TIMEOUT` | `nil` | Read timeout |
+ | `raise_on_error` | `SIMPLE_INFERENCE_RAISE_ON_ERROR` | `true` | Raise exceptions on HTTP errors |
+ | `headers` | – | `{}` | Additional headers to send with requests |
+ | `adapter` | – | `Default` | HTTP adapter (see [Adapters](#http-adapters)) |
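Each constructor option maps onto the environment variable listed in the table, so a client can also be configured entirely from the environment. A minimal sketch (the values are illustrative, assuming the variables are exported before the process starts):

```ruby
# SIMPLE_INFERENCE_BASE_URL=https://api.openai.com
# SIMPLE_INFERENCE_API_KEY=sk-...
# SIMPLE_INFERENCE_READ_TIMEOUT=30
client = SimpleInference::Client.new   # no arguments: settings are read from ENV
```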
 
- - `SIMPLE_INFERENCE_BASE_URL`: e.g. `http://localhost:8000`
- - `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
- - `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
- - `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
+ ### Provider Examples
 
- Or explicitly when constructing a client:
+ #### OpenAI
 
  ```ruby
  client = SimpleInference::Client.new(
- base_url: "http://localhost:8000",
- api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
- timeout: 30.0
+ base_url: "https://api.openai.com",
+ api_key: ENV["OPENAI_API_KEY"]
  )
  ```
 
- For convenience, you can also use the module constructor:
+ #### 火山引擎 (Volcengine / ByteDance)
+
+ Volcengine's API paths do not include the `/v1` prefix, so set `api_prefix: ""`:
 
  ```ruby
- client = SimpleInference.new(base_url: "http://localhost:8000")
- ```
+ client = SimpleInference::Client.new(
+ base_url: "https://ark.cn-beijing.volces.com/api/v3",
+ api_key: ENV["ARK_API_KEY"],
+ api_prefix: "" # Important: Volcengine does not use the /v1 prefix
+ )
 
- ### Rails integration example
+ result = client.chat(
+ model: "deepseek-v3-250324",
+ messages: [
+ { "role" => "system", "content" => "你是人工智能助手" },
+ { "role" => "user", "content" => "你好" }
+ ]
+ )
+
+ puts result.content
+ ```
 
- Create an initializer, for example `config/initializers/simple_inference.rb`:
+ #### DeepSeek
 
  ```ruby
- SIMPLE_INFERENCE_CLIENT = SimpleInference::Client.new(
- base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
- api_key: ENV["SIMPLE_INFERENCE_API_KEY"]
+ client = SimpleInference::Client.new(
+ base_url: "https://api.deepseek.com",
+ api_key: ENV["DEEPSEEK_API_KEY"]
  )
  ```
 
- Then in a controller:
+ #### Groq
 
  ```ruby
- class ChatsController < ApplicationController
- def create
- result = SIMPLE_INFERENCE_CLIENT.chat_completions(
- model: "local-llm",
- messages: [
- { "role" => "user", "content" => params[:prompt] }
- ]
- )
+ client = SimpleInference::Client.new(
+ base_url: "https://api.groq.com/openai",
+ api_key: ENV["GROQ_API_KEY"]
+ )
+ ```
 
- render json: result[:body], status: result[:status]
- end
- end
+ #### Together AI
+
+ ```ruby
+ client = SimpleInference::Client.new(
+ base_url: "https://api.together.xyz",
+ api_key: ENV["TOGETHER_API_KEY"]
+ )
  ```
 
- You can also use the client in background jobs:
+ #### Local inference servers (Ollama, vLLM, etc.)
 
  ```ruby
- class EmbedJob < ApplicationJob
- queue_as :default
+ # Ollama
+ client = SimpleInference::Client.new(
+ base_url: "http://localhost:11434"
+ )
 
- def perform(text)
- result = SIMPLE_INFERENCE_CLIENT.embeddings(
- model: "bge-m3",
- input: text
- )
+ # vLLM
+ client = SimpleInference::Client.new(
+ base_url: "http://localhost:8000"
+ )
+ ```
 
- vector = result[:body]["data"].first["embedding"]
- # TODO: persist the vector (e.g. in DB or a vector store)
- end
- end
+ #### Custom authentication header
+
+ Some providers use non-standard authentication headers:
+
+ ```ruby
+ client = SimpleInference::Client.new(
+ base_url: "https://my-service.example.com",
+ api_prefix: "/v1",
+ headers: {
+ "x-api-key" => ENV["MY_SERVICE_KEY"]
+ }
+ )
  ```
 
- And for health checks / maintenance tasks:
+ ## API Methods
+
+ ### Chat
 
  ```ruby
- if SIMPLE_INFERENCE_CLIENT.healthy?
- Rails.logger.info("Inference server is healthy")
- else
- Rails.logger.warn("Inference server is unhealthy")
- end
+ result = client.chat(
+ model: "gpt-4o-mini",
+ messages: [
+ { "role" => "system", "content" => "You are a helpful assistant." },
+ { "role" => "user", "content" => "Hello!" }
+ ],
+ temperature: 0.7,
+ max_tokens: 1000
+ )
 
- models = SIMPLE_INFERENCE_CLIENT.list_models
- Rails.logger.info("Available models: #{models[:body].inspect}")
+ puts result.content
+ p result.usage
  ```
 
- ### API methods
+ ### Streaming Chat
 
- - `client.chat_completions(params)` → `POST /v1/chat/completions`
- - `client.embeddings(params)` → `POST /v1/embeddings`
- - `client.rerank(params)` → `POST /v1/rerank`
- - `client.list_models` → `GET /v1/models`
- - `client.health` → `GET /health`
- - `client.healthy?` → boolean helper based on `/health`
- - `client.audio_transcriptions(params)` → `POST /v1/audio/transcriptions`
- - `client.audio_translations(params)` → `POST /v1/audio/translations`
+ ```ruby
+ result = client.chat(
+ model: "gpt-4o-mini",
+ messages: [{ "role" => "user", "content" => "Tell me a story" }],
+ stream: true,
+ include_usage: true
+ ) do |delta|
+ print delta
+ end
+ puts
 
- All methods follow a Receive-an-Object / Return-an-Object style:
+ p result.usage
+ ```
 
- - Input: a Ruby `Hash` (keys can be strings or symbols).
- - Output: a `Hash` with keys:
- - `:status` – HTTP status code
- - `:headers` – response headers (lowercased keys)
- - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+ Low-level streaming (events) is also available, and can be used as an Enumerator:
 
- ### Error handling
+ ```ruby
+ stream = client.chat_completions_stream(
+ model: "gpt-4o-mini",
+ messages: [{ "role" => "user", "content" => "Hello" }]
+ )
 
- By default (`raise_on_error: true`) non-2xx HTTP responses raise:
+ stream.each do |event|
+ # process event
+ end
+ ```
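Each yielded event is a parsed `chat.completion.chunk` hash. A small sketch using the `SimpleInference::OpenAI` helpers added in this release to pull the text delta out of each event:

```ruby
stream = client.chat_completions_stream(
  model: "gpt-4o-mini",
  messages: [{ "role" => "user", "content" => "Hello" }]
)

stream.each do |event|
  delta = SimpleInference::OpenAI.chat_completion_chunk_delta(event)
  print delta if delta
end
```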
 
- - `SimpleInference::Errors::HTTPError` wraps status, headers and raw body.
+ Or as an Enumerable of delta strings:
 
- Network and parsing errors are mapped to:
+ ```ruby
+ stream = client.chat_stream(
+ model: "gpt-4o-mini",
+ messages: [{ "role" => "user", "content" => "Hello" }],
+ include_usage: true
+ )
 
- - `SimpleInference::Errors::TimeoutError`
- - `SimpleInference::Errors::ConnectionError`
- - `SimpleInference::Errors::DecodeError`
+ stream.each { |delta| print delta }
+ puts
+ p stream.result&.usage
+ ```
 
- If you prefer to handle HTTP error codes manually, disable raising:
+ ### Embeddings
 
  ```ruby
- client = SimpleInference::Client.new(
- base_url: "http://localhost:8000",
- raise_on_error: false
+ response = client.embeddings(
+ model: "text-embedding-3-small",
+ input: "Hello, world!"
  )
 
- response = client.embeddings(model: "local-embed", input: "hello")
- if response[:status] == 200
- # happy path
- else
- Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
- end
+ vector = response.body["data"][0]["embedding"]
  ```
 
- ### Using with OpenAI and compatible services
+ ### Rerank
 
- Because this SDK follows the OpenAI-style HTTP paths (`/v1/chat/completions`, `/v1/embeddings`, etc.), you can also point it directly at OpenAI or other compatible inference services.
+ ```ruby
+ response = client.rerank(
+ model: "bge-reranker-v2-m3",
+ query: "What is machine learning?",
+ documents: [
+ "Machine learning is a subset of AI...",
+ "The weather today is sunny...",
+ "Deep learning uses neural networks..."
+ ]
+ )
+ ```
 
- #### Connect to OpenAI
+ ### Audio Transcription
 
  ```ruby
- client = SimpleInference::Client.new(
- base_url: "https://api.openai.com",
- api_key: ENV["OPENAI_API_KEY"]
+ response = client.audio_transcriptions(
+ model: "whisper-1",
+ file: File.open("audio.mp3", "rb")
  )
 
- response = client.chat_completions(
- model: "gpt-4.1-mini",
- messages: [{ "role" => "user", "content" => "Hello" }]
- )
+ puts response.body["text"]
+ ```
 
- pp response[:body]
+ ### Audio Translation
+
+ ```ruby
+ response = client.audio_translations(
+ model: "whisper-1",
+ file: File.open("audio.mp3", "rb")
+ )
  ```
 
- #### Streaming chat completions (SSE)
+ ### List Models
 
- For OpenAI-style streaming (`text/event-stream`), use `chat_completions_stream`. It yields parsed JSON events (Ruby `Hash`), so you can consume deltas incrementally:
+ ```ruby
+ model_ids = client.models
+ ```
+
+ ### Health Check
 
  ```ruby
- client.chat_completions_stream(
- model: "gpt-4.1-mini",
- messages: [{ "role" => "user", "content" => "Hello" }]
- ) do |event|
- delta = event.dig("choices", 0, "delta", "content")
- print delta if delta
+ # Returns full response
+ response = client.health
+
+ # Returns boolean
+ if client.healthy?
+ puts "Service is up!"
  end
- puts
  ```
 
- If you prefer, it also returns an Enumerator:
+ ## Response Format
+
+ All HTTP methods return a `SimpleInference::Response` with:
 
  ```ruby
- client.chat_completions_stream(model: "gpt-4.1-mini", messages: [...]).each do |event|
- # ...
- end
+ response.status # Integer HTTP status code
+ response.headers # Hash with downcased String keys
+ response.body # Parsed JSON (Hash/Array), raw String, or nil (SSE success)
+ response.success? # true for 2xx
  ```
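`SimpleInference::Response` also exposes `to_h` (see the new `response.rb` later in this diff), which is convenient for logging; a tiny sketch:

```ruby
puts response.to_h.slice(:status, :body).inspect
```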
 
- Fallback behavior:
+ ## Error Handling
 
- - If the upstream service does **not** support streaming (for example, this repo's server currently returns `400` with `{"detail":"Streaming responses are not supported yet"}`), the SDK will **retry non-streaming** and yield a **single synthetic chunk** so your streaming consumer code can still run.
+ By default, non-2xx responses raise exceptions:
 
- #### Connect to any OpenAI-compatible endpoint
+ ```ruby
+ begin
+ client.chat_completions(model: "invalid", messages: [])
+ rescue SimpleInference::Errors::HTTPError => e
+ puts "HTTP #{e.status}: #{e.message}"
+ p e.body # parsed body (Hash/Array/String)
+ puts e.raw_body # raw response body string (if available)
+ end
+ ```
+
+ Other exception types:
 
- For services that expose an OpenAI-compatible API (same paths and payloads), point `base_url` at that service and provide the correct token:
+ - `SimpleInference::Errors::TimeoutError` – Request timed out
+ - `SimpleInference::Errors::ConnectionError` – Network error
+ - `SimpleInference::Errors::DecodeError` – JSON parsing failed
+ - `SimpleInference::Errors::ConfigurationError` – Invalid configuration
+
+ To handle errors manually:
 
  ```ruby
  client = SimpleInference::Client.new(
- base_url: "https://my-openai-compatible.example.com",
- api_key: ENV["MY_SERVICE_TOKEN"]
+ base_url: "https://api.openai.com",
+ api_key: ENV["OPENAI_API_KEY"],
+ raise_on_error: false
  )
+
+ response = client.chat_completions(model: "gpt-4o-mini", messages: [...])
+
+ if response.success?
+ # success
+ else
+ puts "Error: #{response.status} - #{response.body}"
+ end
  ```
 
- If the service uses a non-standard header instead of `Authorization: Bearer`, you can omit `api_key` and pass headers explicitly:
+ ## HTTP Adapters
+
+ ### Default (Net::HTTP)
+
+ The default adapter uses Ruby's built-in `Net::HTTP`. It's thread-safe and compatible with the Ruby 3 Fiber scheduler.
+
+ ### HTTPX Adapter
+
+ For better performance or async environments, use the optional HTTPX adapter:
 
  ```ruby
+ # Gemfile
+ gem "httpx"
+ ```
+
+ ```ruby
+ adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+
  client = SimpleInference::Client.new(
- base_url: "https://my-service.example.com",
- headers: {
- "x-api-key" => ENV["MY_SERVICE_KEY"]
- }
+ base_url: "https://api.openai.com",
+ api_key: ENV["OPENAI_API_KEY"],
+ adapter: adapter
  )
  ```
 
- ### Puma vs Falcon (Fiber / Async) usage
+ ### Custom Adapter
 
- The default HTTP adapter uses Ruby's `Net::HTTP` and is safe to use under Puma's multithreaded model:
+ Implement your own adapter by subclassing `SimpleInference::HTTPAdapter`:
 
- - No global mutable state
- - Per-client configuration only
- - Blocking IO that integrates with Ruby 3 Fiber scheduler
+ ```ruby
+ class MyAdapter < SimpleInference::HTTPAdapter
+ def call(request)
+ # request keys: :method, :url, :headers, :body, :timeout, :open_timeout, :read_timeout
+ # Must return: { status: Integer, headers: Hash, body: String }
+ end
 
- If you don't pass an adapter, `SimpleInference::Client` uses `SimpleInference::HTTPAdapters::Default` (Net::HTTP).
+ def call_stream(request, &block)
+ # For streaming support (optional)
+ # Yield raw chunks to block for SSE responses
+ end
+ end
+ ```
+
+ ## Rails Integration
 
- For Falcon / async environments, you can keep the default adapter, or use the optional HTTPX adapter (requires the `httpx` gem):
+ Create an initializer `config/initializers/simple_inference.rb`:
 
  ```ruby
- gem "httpx" # optional, only required when using the HTTPX adapter
+ INFERENCE_CLIENT = SimpleInference::Client.new(
+ base_url: ENV.fetch("INFERENCE_BASE_URL", "https://api.openai.com"),
+ api_key: ENV["INFERENCE_API_KEY"]
+ )
  ```
 
- You can then use the optional HTTPX adapter shipped with this gem:
+ Use in controllers:
 
  ```ruby
- adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+ class ChatsController < ApplicationController
+ def create
+ response = INFERENCE_CLIENT.chat_completions(
+ model: "gpt-4o-mini",
+ messages: [{ "role" => "user", "content" => params[:prompt] }]
+ )
+
+ render json: response.body
+ end
+ end
+ ```
+
+ Use in background jobs:
+
+ ```ruby
+ class EmbedJob < ApplicationJob
+ def perform(text)
+ response = INFERENCE_CLIENT.embeddings(
+ model: "text-embedding-3-small",
+ input: text
+ )
 
- SIMPLE_INFERENCE_CLIENT =
- SimpleInference::Client.new(
- base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
- api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
- adapter: adapter
- )
+ vector = response.body["data"][0]["embedding"]
+ # Store vector...
+ end
+ end
  ```
+
+ ## Thread Safety
+
+ The client is thread-safe:
+
+ - No global mutable state
+ - Per-client configuration only
+ - Each request uses its own HTTP connection
+
+ ## License
+
+ MIT License. See [LICENSE](LICENSE.txt) for details.
data/lib/simple_inference/client.rb CHANGED
@@ -22,8 +22,112 @@ module SimpleInference
 
  # POST /v1/chat/completions
  # params: { model: "model-name", messages: [...], ... }
- def chat_completions(params)
- post_json("/v1/chat/completions", params)
+ def chat_completions(**params)
+ post_json(api_path("/chat/completions"), params)
+ end
+
+ # High-level helper for OpenAI-compatible chat.
+ #
+ # - Non-streaming: returns an OpenAI::ChatResult with `content` + `usage`.
+ # - Streaming: yields delta strings to the block (if given), accumulates, and returns OpenAI::ChatResult.
+ #
+ # @param model [String]
+ # @param messages [Array<Hash>]
+ # @param stream [Boolean] force streaming when true (default: block_given?)
+ # @param include_usage [Boolean, nil] when true (and streaming), requests usage in the final chunk
+ # @param request_logprobs [Boolean] when true, requests logprobs (and collects them in streaming mode)
+ # @param top_logprobs [Integer, nil] default: 5 (when request_logprobs is true)
+ # @param params [Hash] additional OpenAI parameters (max_tokens, temperature, etc.)
+ # @yield [String] delta content chunks (streaming only)
+ # @return [SimpleInference::OpenAI::ChatResult]
+ def chat(model:, messages:, stream: nil, include_usage: nil, request_logprobs: false, top_logprobs: 5, **params, &block)
+ raise ArgumentError, "model is required" if model.nil? || model.to_s.strip.empty?
+ raise ArgumentError, "messages must be an Array" unless messages.is_a?(Array)
+
+ use_stream = stream.nil? ? block_given? : stream
+
+ request = { model: model, messages: messages }.merge(params)
+ request.delete(:stream)
+ request.delete("stream")
+
+ if request_logprobs
+ request[:logprobs] = true unless request.key?(:logprobs) || request.key?("logprobs")
+ if top_logprobs && !(request.key?(:top_logprobs) || request.key?("top_logprobs"))
+ request[:top_logprobs] = top_logprobs
+ end
+ end
+
+ if use_stream && include_usage
+ stream_options = request[:stream_options] || request["stream_options"]
+ stream_options ||= {}
+
+ if stream_options.is_a?(Hash)
+ stream_options[:include_usage] = true unless stream_options.key?(:include_usage) || stream_options.key?("include_usage")
+ end
+
+ request[:stream_options] = stream_options
+ end
+
+ if use_stream
+ full = +""
+ finish_reason = nil
+ last_usage = nil
+ collected_logprobs = []
+
+ response =
+ chat_completions_stream(**request) do |event|
+ delta = OpenAI.chat_completion_chunk_delta(event)
+ if delta
+ full << delta
+ block.call(delta) if block
+ end
+
+ fr = event.is_a?(Hash) ? event.dig("choices", 0, "finish_reason") : nil
+ finish_reason = fr if fr
+
+ if request_logprobs
+ chunk_logprobs = event.is_a?(Hash) ? event.dig("choices", 0, "logprobs", "content") : nil
+ if chunk_logprobs.is_a?(Array)
+ collected_logprobs.concat(chunk_logprobs)
+ end
+ end
+
+ usage = OpenAI.chat_completion_usage(event)
+ last_usage = usage if usage
+ end
+
+ OpenAI::ChatResult.new(
+ content: full,
+ usage: last_usage || OpenAI.chat_completion_usage(response),
+ finish_reason: finish_reason || OpenAI.chat_completion_finish_reason(response),
+ logprobs: collected_logprobs.empty? ? OpenAI.chat_completion_logprobs(response) : collected_logprobs,
+ response: response
+ )
+ else
+ response = chat_completions(**request)
+ OpenAI::ChatResult.new(
+ content: OpenAI.chat_completion_content(response),
+ usage: OpenAI.chat_completion_usage(response),
+ finish_reason: OpenAI.chat_completion_finish_reason(response),
+ logprobs: OpenAI.chat_completion_logprobs(response),
+ response: response
+ )
+ end
+ end
+
+ # Streaming chat as an Enumerable.
+ #
+ # @return [SimpleInference::OpenAI::ChatStream]
+ def chat_stream(model:, messages:, include_usage: nil, request_logprobs: false, top_logprobs: 5, **params)
+ OpenAI::ChatStream.new(
+ client: self,
+ model: model,
+ messages: messages,
+ include_usage: include_usage,
+ request_logprobs: request_logprobs,
+ top_logprobs: top_logprobs,
+ params: params
+ )
  end
 
  # POST /v1/chat/completions (streaming)
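The `request_logprobs` / `top_logprobs` options introduced by this helper are not covered in the README above; a minimal usage sketch (assuming the configured provider actually returns logprobs):

```ruby
result = client.chat(
  model: "gpt-4o-mini",
  messages: [{ "role" => "user", "content" => "Hello!" }],
  request_logprobs: true,  # adds logprobs: true and top_logprobs to the request payload
  top_logprobs: 3
)

p result.logprobs  # per-token logprob entries, or nil if the provider omits them
```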
@@ -31,45 +135,41 @@ module SimpleInference
  # Yields parsed JSON events from an OpenAI-style SSE stream (`text/event-stream`).
  #
  # If no block is given, returns an Enumerator.
- def chat_completions_stream(params)
- return enum_for(:chat_completions_stream, params) unless block_given?
-
- unless params.is_a?(Hash)
- raise Errors::ConfigurationError, "params must be a Hash"
- end
+ def chat_completions_stream(**params)
+ return enum_for(:chat_completions_stream, **params) unless block_given?
 
  body = params.dup
  body.delete(:stream)
  body.delete("stream")
  body["stream"] = true
 
- response = post_json_stream("/v1/chat/completions", body) do |event|
+ response = post_json_stream(api_path("/chat/completions"), body) do |event|
  yield event
  end
 
- content_type = response.dig(:headers, "content-type").to_s
+ content_type = response.headers["content-type"].to_s
 
  # Streaming case: we already yielded events from the SSE stream.
- if response[:status].to_i >= 200 && response[:status].to_i < 300 && content_type.include?("text/event-stream")
+ if response.status >= 200 && response.status < 300 && content_type.include?("text/event-stream")
  return response
  end
 
  # Fallback when upstream does not support streaming (this repo's server).
- if streaming_unsupported_error?(response[:status], response[:body])
+ if streaming_unsupported_error?(response.status, response.body)
  fallback_body = params.dup
  fallback_body.delete(:stream)
  fallback_body.delete("stream")
 
- fallback_response = post_json("/v1/chat/completions", fallback_body)
- chunk = synthesize_chat_completion_chunk(fallback_response[:body])
+ fallback_response = post_json(api_path("/chat/completions"), fallback_body)
+ chunk = synthesize_chat_completion_chunk(fallback_response.body)
  yield chunk if chunk
  return fallback_response
  end
 
  # If we got a non-streaming success response (JSON), convert it into a single
  # chunk so streaming consumers can share the same code path.
- if response[:status].to_i >= 200 && response[:status].to_i < 300
- chunk = synthesize_chat_completion_chunk(response[:body])
+ if response.status >= 200 && response.status < 300
+ chunk = synthesize_chat_completion_chunk(response.body)
  yield chunk if chunk
  end
 
@@ -77,18 +177,27 @@ module SimpleInference
  end
 
  # POST /v1/embeddings
- def embeddings(params)
- post_json("/v1/embeddings", params)
+ def embeddings(**params)
+ post_json(api_path("/embeddings"), params)
  end
 
  # POST /v1/rerank
- def rerank(params)
- post_json("/v1/rerank", params)
+ def rerank(**params)
+ post_json(api_path("/rerank"), params)
  end
 
  # GET /v1/models
  def list_models
- get_json("/v1/models")
+ get_json(api_path("/models"))
+ end
+
+ # Convenience wrapper for list_models.
+ #
+ # @return [Array<String>] model IDs
+ def models
+ response = list_models
+ data = response.body.is_a?(Hash) ? response.body["data"] : nil
+ Array(data).filter_map { |m| m.is_a?(Hash) ? m["id"] : nil }
  end
 
  # GET /health
@@ -99,8 +208,8 @@ module SimpleInference
  # Returns true when service is healthy, false otherwise.
  def healthy?
  response = get_json("/health", raise_on_http_error: false)
- status_ok = response[:status] == 200
- body_status_ok = response.dig(:body, "status") == "ok"
+ status_ok = response.status == 200
+ body_status_ok = response.body.is_a?(Hash) && response.body["status"] == "ok"
  status_ok && body_status_ok
  rescue Errors::Error
  false
@@ -108,13 +217,13 @@ module SimpleInference
 
  # POST /v1/audio/transcriptions
  # params: { file: io_or_hash, model: "model-name", **audio_options }
- def audio_transcriptions(params)
- post_multipart("/v1/audio/transcriptions", params)
+ def audio_transcriptions(**params)
+ post_multipart(api_path("/audio/transcriptions"), params)
  end
 
  # POST /v1/audio/translations
- def audio_translations(params)
- post_multipart("/v1/audio/translations", params)
+ def audio_translations(**params)
+ post_multipart(api_path("/audio/translations"), params)
  end
 
  private
@@ -123,6 +232,10 @@ module SimpleInference
  config.base_url
  end
 
+ def api_path(endpoint)
+ "#{config.api_prefix}#{endpoint}"
+ end
+
  def get_json(path, params: nil, raise_on_http_error: nil)
  full_path = with_query(path, params)
  request_json(
@@ -199,31 +312,26 @@ module SimpleInference
  consume_sse_buffer!(buffer, &on_event)
  end
 
- return {
- status: status,
- headers: headers,
- body: nil,
- }
+ return Response.new(status: status, headers: headers, body: nil)
  end
 
  # Non-streaming response path (adapter doesn't support streaming or server returned JSON).
  should_parse_json = content_type.include?("json")
- parsed_body = should_parse_json ? parse_json(body_str) : body_str
-
- maybe_raise_http_error(
- status: status,
- headers: headers,
- body_str: body_str,
- raise_on_http_error: raise_on_http_error,
- ignore_streaming_unsupported: true,
- parsed_body: parsed_body
- )
+ parsed_body =
+ if should_parse_json
+ begin
+ parse_json(body_str)
+ rescue Errors::DecodeError
+ # Prefer HTTPError over DecodeError for non-2xx responses.
+ status >= 200 && status < 300 ? raise : body_str
+ end
+ else
+ body_str
+ end
 
- {
- status: status,
- headers: headers,
- body: parsed_body,
- }
+ response = Response.new(status: status, headers: headers, body: parsed_body, raw_body: body_str)
+ maybe_raise_http_error(response: response, raise_on_http_error: raise_on_http_error, ignore_streaming_unsupported: true)
+ response
  rescue Timeout::Error => e
  raise Errors::TimeoutError, e.message
  rescue SocketError, SystemCallError => e
@@ -575,13 +683,6 @@ module SimpleInference
  headers = (response[:headers] || {}).transform_keys { |k| k.to_s.downcase }
  body = response[:body].to_s
 
- maybe_raise_http_error(
- status: status,
- headers: headers,
- body_str: body,
- raise_on_http_error: raise_on_http_error
- )
-
  should_parse_json =
  if expect_json.nil?
  content_type = headers["content-type"]
@@ -592,16 +693,19 @@ module SimpleInference
 
  parsed_body =
  if should_parse_json
- parse_json(body)
+ begin
+ parse_json(body)
+ rescue Errors::DecodeError
+ # Prefer HTTPError over DecodeError for non-2xx responses.
+ status >= 200 && status < 300 ? raise : body
+ end
  else
  body
  end
 
- {
- status: status,
- headers: headers,
- body: parsed_body,
- }
+ response = Response.new(status: status, headers: headers, body: parsed_body, raw_body: body)
+ maybe_raise_http_error(response: response, raise_on_http_error: raise_on_http_error)
+ response
  rescue Timeout::Error => e
  raise Errors::TimeoutError, e.message
  rescue SocketError, SystemCallError => e
@@ -644,26 +748,17 @@ module SimpleInference
  end
  end
 
- def maybe_raise_http_error(
- status:,
- headers:,
- body_str:,
- raise_on_http_error:,
- ignore_streaming_unsupported: false,
- parsed_body: nil
- )
+ def maybe_raise_http_error(response:, raise_on_http_error:, ignore_streaming_unsupported: false)
  return unless raise_on_http_error?(raise_on_http_error)
- return unless status < 200 || status >= 300
+ return if response.success?
 
  # Do not raise for the known "streaming unsupported" case; the caller will
  # perform a non-streaming retry fallback.
- return if ignore_streaming_unsupported && streaming_unsupported_error?(status, parsed_body)
+ return if ignore_streaming_unsupported && streaming_unsupported_error?(response.status, response.body)
 
  raise Errors::HTTPError.new(
- http_error_message(status, body_str, parsed_body: parsed_body),
- status: status,
- headers: headers,
- body: body_str
+ http_error_message(response.status, response.raw_body.to_s, parsed_body: response.body),
+ response: response
  )
  end
  end
data/lib/simple_inference/config.rb CHANGED
@@ -4,6 +4,7 @@ module SimpleInference
  class Config
  attr_reader :base_url,
  :api_key,
+ :api_prefix,
  :timeout,
  :open_timeout,
  :read_timeout,
@@ -19,6 +20,10 @@ module SimpleInference
  @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
  @api_key = nil if @api_key.empty?
 
+ @api_prefix = normalize_api_prefix(
+ opts.key?(:api_prefix) ? opts[:api_prefix] : ENV.fetch("SIMPLE_INFERENCE_API_PREFIX", "/v1")
+ )
+
  @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
  @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
  @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
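The prefix is normalized (see `normalize_api_prefix` in the next hunk) and then prepended to every endpoint by `api_path`, so requests are built as `base_url + api_prefix + endpoint`. A small sketch of the resulting paths (illustrative values):

```ruby
# Default prefix "/v1":
#   POST https://api.openai.com/v1/chat/completions
SimpleInference::Client.new(base_url: "https://api.openai.com")

# Empty prefix, e.g. for Volcengine, whose base URL already carries the version:
#   POST https://ark.cn-beijing.volces.com/api/v3/chat/completions
SimpleInference::Client.new(
  base_url: "https://ark.cn-beijing.volces.com/api/v3",
  api_prefix: ""
)

# "v1" and "/v1/" both normalize to "/v1"
```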
@@ -46,6 +51,17 @@ module SimpleInference
  url.chomp("/")
  end
 
+ def normalize_api_prefix(value)
+ return "" if value.nil?
+
+ prefix = value.to_s.strip
+ return "" if prefix.empty?
+
+ # Ensure it starts with / and does not end with /
+ prefix = "/#{prefix}" unless prefix.start_with?("/")
+ prefix.chomp("/")
+ end
+
  def to_float_or_nil(value)
  return nil if value.nil? || value == ""
 
data/lib/simple_inference/errors.rb CHANGED
@@ -7,14 +7,20 @@ module SimpleInference
  class ConfigurationError < Error; end
 
  class HTTPError < Error
- attr_reader :status, :headers, :body
+ attr_reader :response
 
- def initialize(message, status:, headers:, body:)
+ def initialize(message, response:)
  super(message)
- @status = status
- @headers = headers
- @body = body
+ @response = response
  end
+
+ def status = @response.status
+
+ def headers = @response.headers
+
+ def body = @response.body
+
+ def raw_body = @response.raw_body
  end
 
  class TimeoutError < Error; end
data/lib/simple_inference/openai.rb ADDED
@@ -0,0 +1,178 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+ # Helpers for extracting common fields from OpenAI-compatible `chat/completions` payloads.
+ #
+ # These helpers accept either:
+ # - A `SimpleInference::Response`, or
+ # - A parsed `body` / `chunk` hash (typically from JSON.parse, with String keys)
+ #
+ # Providers are "OpenAI-compatible", but many differ in subtle ways:
+ # - Some return `choices[0].text` instead of `choices[0].message.content`
+ # - Some represent `content` as an array or structured hash
+ #
+ # This module normalizes those shapes so application code can stay small and predictable.
+ module OpenAI
+ module_function
+
+ ChatResult =
+ Struct.new(
+ :content,
+ :usage,
+ :finish_reason,
+ :logprobs,
+ :response,
+ keyword_init: true
+ )
+
+ # Enumerable wrapper for streaming chat responses.
+ #
+ # @example
+ # stream = client.chat_stream(model: "...", messages: [...], include_usage: true)
+ # stream.each { |delta| print delta }
+ # p stream.result.usage
+ class ChatStream
+ include Enumerable
+
+ attr_reader :result
+
+ def initialize(client:, model:, messages:, include_usage:, request_logprobs:, top_logprobs:, params:)
+ @client = client
+ @model = model
+ @messages = messages
+ @include_usage = include_usage
+ @request_logprobs = request_logprobs
+ @top_logprobs = top_logprobs
+ @params = params
+ @started = false
+ @result = nil
+ end
+
+ def each
+ return enum_for(:each) unless block_given?
+ raise Errors::ConfigurationError, "ChatStream can only be consumed once" if @started
+
+ @started = true
+ @result =
+ @client.chat(
+ model: @model,
+ messages: @messages,
+ stream: true,
+ include_usage: @include_usage,
+ request_logprobs: @request_logprobs,
+ top_logprobs: @top_logprobs,
+ **(@params || {})
+ ) { |delta| yield delta }
+ end
+ end
+
+ # Extract assistant content from a non-streaming chat completion.
+ #
+ # @param response_or_body [Hash] SimpleInference response hash or parsed body hash
+ # @return [String, nil]
+ def chat_completion_content(response_or_body)
+ body = unwrap_body(response_or_body)
+ choice = first_choice(body)
+ return nil unless choice
+
+ raw =
+ choice.dig("message", "content") ||
+ choice["text"]
+
+ normalize_content(raw)
+ end
+
+ # Extract finish_reason from a non-streaming chat completion.
+ #
+ # @param response_or_body [Hash] SimpleInference response hash or parsed body hash
+ # @return [String, nil]
+ def chat_completion_finish_reason(response_or_body)
+ body = unwrap_body(response_or_body)
+ first_choice(body)&.[]("finish_reason")
+ end
+
+ # Extract usage from a chat completion response or a final streaming chunk.
+ #
+ # @param response_or_body [Hash] SimpleInference response hash, body hash, or chunk hash
+ # @return [Hash, nil] symbol-keyed usage hash
+ def chat_completion_usage(response_or_body)
+ body = unwrap_body(response_or_body)
+ usage = body.is_a?(Hash) ? body["usage"] : nil
+ return nil unless usage.is_a?(Hash)
+
+ {
+ prompt_tokens: usage["prompt_tokens"],
+ completion_tokens: usage["completion_tokens"],
+ total_tokens: usage["total_tokens"],
+ }.compact
+ end
+
+ # Extract logprobs (if present) from a non-streaming chat completion.
+ #
+ # @param response_or_body [Hash] SimpleInference response hash or parsed body hash
+ # @return [Array<Hash>, nil]
+ def chat_completion_logprobs(response_or_body)
+ body = unwrap_body(response_or_body)
+ first_choice(body)&.dig("logprobs", "content")
+ end
+
+ # Extract delta content from a streaming `chat.completion.chunk`.
+ #
+ # @param chunk [Hash] parsed streaming event hash
+ # @return [String, nil]
+ def chat_completion_chunk_delta(chunk)
+ chunk = unwrap_body(chunk)
+ return nil unless chunk.is_a?(Hash)
+
+ raw = chunk.dig("choices", 0, "delta", "content")
+ normalize_content(raw)
+ end
+
+ # Normalize `content` shapes into a simple String.
+ #
+ # Supports strings, arrays of parts, and part hashes.
+ #
+ # @param value [Object]
+ # @return [String, nil]
+ def normalize_content(value)
+ case value
+ when String
+ value
+ when Array
+ value.map { |part| normalize_content(part) }.join
+ when Hash
+ value["text"] ||
+ value["content"] ||
+ value.to_s
+ when nil
+ nil
+ else
+ value.to_s
+ end
+ end
+
+ # Unwrap a full SimpleInference response into its `:body`, otherwise return the object.
+ #
+ # @param obj [Object]
+ # @return [Object]
+ def unwrap_body(obj)
+ return {} unless obj
+ return obj.body || {} if obj.respond_to?(:body)
+
+ obj
+ end
+
+ def first_choice(body)
+ return nil unless body.is_a?(Hash)
+
+ choices = body["choices"]
+ return nil unless choices.is_a?(Array) && !choices.empty?
+
+ choice0 = choices[0]
+ return nil unless choice0.is_a?(Hash)
+
+ choice0
+ end
+ private_class_method :first_choice
+ end
+ end
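The helpers above accept either a `Response` or an already-parsed hash, and `normalize_content` flattens the common content shapes. A small sketch of what that normalization does (hand-written payloads, not taken from a real provider):

```ruby
SimpleInference::OpenAI.normalize_content("hi")                                        # => "hi"
SimpleInference::OpenAI.normalize_content([{ "type" => "text", "text" => "a" }, "b"])  # => "ab"

body = { "choices" => [{ "message" => { "content" => "Hello!" }, "finish_reason" => "stop" }] }
SimpleInference::OpenAI.chat_completion_content(body)        # => "Hello!"
SimpleInference::OpenAI.chat_completion_finish_reason(body)  # => "stop"
```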
data/lib/simple_inference/response.rb ADDED
@@ -0,0 +1,28 @@
+ # frozen_string_literal: true
+
+ module SimpleInference
+ # A lightweight wrapper for HTTP responses returned by SimpleInference.
+ #
+ # - `status` is an Integer HTTP status code
+ # - `headers` is a Hash with downcased String keys
+ # - `body` is a parsed JSON Hash/Array, a String, or nil (e.g. SSE streaming success)
+ # - `raw_body` is the raw response body String (when available)
+ class Response
+ attr_reader :status, :headers, :body, :raw_body
+
+ def initialize(status:, headers:, body:, raw_body: nil)
+ @status = status.to_i
+ @headers = (headers || {}).transform_keys { |k| k.to_s.downcase }
+ @body = body
+ @raw_body = raw_body
+ end
+
+ def success?
+ status >= 200 && status < 300
+ end
+
+ def to_h
+ { status: status, headers: headers, body: body, raw_body: raw_body }
+ end
+ end
+ end
data/lib/simple_inference/version.rb CHANGED
@@ -1,5 +1,5 @@
  # frozen_string_literal: true
 
  module SimpleInference
- VERSION = "0.1.3"
+ VERSION = "0.1.5"
  end
data/lib/simple_inference.rb CHANGED
@@ -4,6 +4,8 @@ require_relative "simple_inference/version"
  require_relative "simple_inference/config"
  require_relative "simple_inference/errors"
  require_relative "simple_inference/http_adapter"
+ require_relative "simple_inference/response"
+ require_relative "simple_inference/openai"
  require_relative "simple_inference/client"
 
  module SimpleInference
data/sig/simple_inference.rbs CHANGED
@@ -1,4 +1,71 @@
  module SimpleInference
  VERSION: String
- end
 
+ class Response
+ attr_reader status: Integer
+ attr_reader headers: Hash[String, untyped]
+ attr_reader body: untyped
+ attr_reader raw_body: String?
+
+ def initialize: (status: Integer, headers: Hash[untyped, untyped], body: untyped, ?raw_body: String?) -> void
+ def success?: () -> bool
+ def to_h: () -> Hash[Symbol, untyped]
+ end
+
+ module OpenAI
+ class ChatResult
+ attr_reader content: String?
+ attr_reader usage: Hash[Symbol, untyped]?
+ attr_reader finish_reason: String?
+ attr_reader logprobs: Array[Hash[untyped, untyped]]?
+ attr_reader response: Response
+ end
+
+ class ChatStream
+ include Enumerable[String]
+ attr_reader result: ChatResult?
+ end
+
+ def self.chat_completion_content: (untyped) -> String?
+ def self.chat_completion_finish_reason: (untyped) -> String?
+ def self.chat_completion_usage: (untyped) -> Hash[Symbol, untyped]?
+ def self.chat_completion_logprobs: (untyped) -> Array[Hash[untyped, untyped]]?
+ def self.chat_completion_chunk_delta: (untyped) -> String?
+ def self.normalize_content: (untyped) -> String?
+ end
+
+ class Client
+ def initialize: (?Hash[untyped, untyped]) -> void
+
+ def chat: (
+ model: String,
+ messages: Array[Hash[untyped, untyped]],
+ ?stream: bool?,
+ ?include_usage: bool?,
+ ?request_logprobs: bool,
+ ?top_logprobs: Integer?,
+ **untyped
+ ) { (String) -> void } -> OpenAI::ChatResult
+
+ def chat_stream: (
+ model: String,
+ messages: Array[Hash[untyped, untyped]],
+ ?include_usage: bool?,
+ ?request_logprobs: bool,
+ ?top_logprobs: Integer?,
+ **untyped
+ ) -> OpenAI::ChatStream
+
+ def chat_completions: (**untyped) -> Response
+ def chat_completions_stream: (**untyped) { (Hash[untyped, untyped]) -> void } -> Response
+
+ def embeddings: (**untyped) -> Response
+ def rerank: (**untyped) -> Response
+ def list_models: () -> Response
+ def models: () -> Array[String]
+ def health: () -> Response
+ def healthy?: () -> bool
+ def audio_transcriptions: (**untyped) -> Response
+ def audio_translations: (**untyped) -> Response
+ end
+ end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: simple_inference
  version: !ruby/object:Gem::Version
- version: 0.1.3
+ version: 0.1.5
  platform: ruby
  authors:
  - jasl
@@ -9,8 +9,8 @@ bindir: exe
  cert_chain: []
  date: 1980-01-02 00:00:00.000000000 Z
  dependencies: []
- description: Fiber-friendly Ruby client for Simple Inference Server APIs (chat, embeddings,
- audio, rerank, health).
+ description: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs
+ (chat, embeddings, audio, rerank, health).
  email:
  - jasl9187@hotmail.com
  executables: []
@@ -27,15 +27,16 @@ files:
  - lib/simple_inference/http_adapter.rb
  - lib/simple_inference/http_adapters/default.rb
  - lib/simple_inference/http_adapters/httpx.rb
+ - lib/simple_inference/openai.rb
+ - lib/simple_inference/response.rb
  - lib/simple_inference/version.rb
  - sig/simple_inference.rbs
- homepage: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+ homepage: https://github.com/jasl/simple_inference.rb
  licenses:
  - MIT
  metadata:
  allowed_push_host: https://rubygems.org
- homepage_uri: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
- source_code_uri: https://github.com/jasl/simple_inference_server
+ homepage_uri: https://github.com/jasl/simple_inference.rb
  rubygems_mfa_required: 'true'
  rdoc_options: []
  require_paths:
@@ -51,7 +52,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 4.0.1
+ rubygems_version: 4.0.3
  specification_version: 4
- summary: Fiber-friendly Ruby client for the Simple Inference Server (OpenAI-compatible).
+ summary: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
  test_files: []