simple_inference 0.1.3 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 50a8ea07e0e30771b6d42f2bacf12f12b379f1151bb44947bb5febe3cac70cf9
-  data.tar.gz: c655b141ea39e518c5cdcc30cc28bac249ef23acfae8c05f4cf852e199a98a80
+  metadata.gz: 8d8b01060969cbab2df30a38e16b7952a877188e89bd720209c15b57f9f79687
+  data.tar.gz: e278f52f76cf6f7bd3f74e567731bbdec016769b2b720161e9907348fd9b54c3
 SHA512:
-  metadata.gz: '0483fda13365abb75bde99643ada2fc95017e4883e07d823d8b86912e9553764befe7c6a92222aeca3ee05d20e8f6b1e1a23a4a74decf45383bc4cc9c055d357'
-  data.tar.gz: 4d115195078c198b2c2c1b4bc1307a05e337884449b19393488374cd1710efaf6186e68c7060417141e5477933ea3fafc5223896adc06c1d1063263437254d56
+  metadata.gz: cc6724a0fbe640d7af0d6bb35bfee81e6b95d501b23734f2874dfddbb2f71dcb7ae59557b742427bb9322804fbca632cbe95abe68f9ea26709303fea86550605
+  data.tar.gz: 871b06d6e585bac84cf38ac3abef77b3940dd41f4868c76e08b19c317c2b35c93f81adde9a0ec73e9c20a689062cade65c0115d6e82afab86444d253f9964688
data/README.md CHANGED
@@ -1,13 +1,24 @@
-## simple_inference Ruby SDK
+# SimpleInference
 
-Fiber-friendly Ruby client for the Simple Inference Server APIs (chat, embeddings, audio, rerank, health), designed to work well inside Rails apps and background jobs.
+A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs. Works seamlessly with OpenAI, Azure OpenAI, 火山引擎 (Volcengine), DeepSeek, Groq, Together AI, and any other provider that implements the OpenAI API specification.
 
-### Installation
+Designed for simplicity and compatibility – no heavy dependencies, just pure Ruby with `Net::HTTP`.
 
-Add the gem to your Rails application's `Gemfile`, pointing at this repository path:
+## Features
+
+- 🔌 **Universal compatibility** – Works with any OpenAI-compatible API provider
+- 🌊 **Streaming support** – Native SSE streaming for chat completions
+- 🧵 **Fiber-friendly** – Compatible with the Ruby 3 Fiber scheduler; works great with Falcon
+- 🔧 **Flexible configuration** – Customizable API prefix for non-standard endpoints
+- 🎯 **Simple interface** – Receive-an-Object / Return-an-Object style API
+- 📦 **Zero runtime dependencies** – Uses only the Ruby standard library
+
+## Installation
+
+Add to your Gemfile:
 
 ```ruby
-gem "simple_inference", path: "sdks/ruby"
+gem "simple_inference"
 ```
 
 Then run:
@@ -16,231 +27,378 @@ Then run:
 ```
 bundle install
 ```
 
-### Configuration
-
-You can configure the client via environment variables:
-
-- `SIMPLE_INFERENCE_BASE_URL`: e.g. `http://localhost:8000`
-- `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
-- `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
-- `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
-
-Or explicitly when constructing a client:
+## Quick Start
 
 ```ruby
+require "simple_inference"
+
+# Connect to OpenAI
 client = SimpleInference::Client.new(
-  base_url: "http://localhost:8000",
-  api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
-  timeout: 30.0
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
+
+response = client.chat_completions(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello!" }]
 )
+
+puts response[:body]["choices"][0]["message"]["content"]
 ```
 
-For convenience, you can also use the module constructor:
+## Configuration
+
+### Options
+
+| Option | Env Variable | Default | Description |
+|--------|--------------|---------|-------------|
+| `base_url` | `SIMPLE_INFERENCE_BASE_URL` | `http://localhost:8000` | API base URL |
+| `api_key` | `SIMPLE_INFERENCE_API_KEY` | `nil` | API key (sent as `Authorization: Bearer <token>`) |
+| `api_prefix` | `SIMPLE_INFERENCE_API_PREFIX` | `/v1` | API path prefix (e.g., `/v1`, or an empty string for some providers) |
+| `timeout` | `SIMPLE_INFERENCE_TIMEOUT` | `nil` | Request timeout in seconds |
+| `open_timeout` | `SIMPLE_INFERENCE_OPEN_TIMEOUT` | `nil` | Connection open timeout in seconds |
+| `read_timeout` | `SIMPLE_INFERENCE_READ_TIMEOUT` | `nil` | Read timeout in seconds |
+| `raise_on_error` | `SIMPLE_INFERENCE_RAISE_ON_ERROR` | `true` | Raise exceptions on HTTP errors |
+| `headers` | – | `{}` | Additional headers to send with requests |
+| `adapter` | – | `Default` | HTTP adapter (see [Adapters](#http-adapters)) |
+
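+The same settings can come from the environment. A minimal sketch, assuming a client built without explicit options falls back to the `SIMPLE_INFERENCE_*` variables listed above:
+
+```ruby
+ENV["SIMPLE_INFERENCE_BASE_URL"] = "https://api.openai.com"
+ENV["SIMPLE_INFERENCE_API_KEY"] = "sk-..." # placeholder key
+
+client = SimpleInference::Client.new # picks up SIMPLE_INFERENCE_* settings
+```
+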
+### Provider Examples
+
+#### OpenAI
 
 ```ruby
-client = SimpleInference.new(base_url: "http://localhost:8000")
+client = SimpleInference::Client.new(
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
 ```
 
-### Rails integration example
+#### 火山引擎 (Volcengine / ByteDance)
 
-Create an initializer, for example `config/initializers/simple_inference.rb`:
+Volcengine's API paths do not include the `/v1` prefix, so set `api_prefix: ""`:
 
 ```ruby
-SIMPLE_INFERENCE_CLIENT = SimpleInference::Client.new(
-  base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
-  api_key: ENV["SIMPLE_INFERENCE_API_KEY"]
+client = SimpleInference::Client.new(
+  base_url: "https://ark.cn-beijing.volces.com/api/v3",
+  api_key: ENV["ARK_API_KEY"],
+  api_prefix: "" # Important: Volcengine does not use the /v1 prefix
+)
+
+response = client.chat_completions(
+  model: "deepseek-v3-250324",
+  messages: [
+    { "role" => "system", "content" => "You are an AI assistant" },
+    { "role" => "user", "content" => "Hello" }
+  ]
 )
 ```
 
-Then in a controller:
+#### DeepSeek
 
 ```ruby
-class ChatsController < ApplicationController
-  def create
-    result = SIMPLE_INFERENCE_CLIENT.chat_completions(
-      model: "local-llm",
-      messages: [
-        { "role" => "user", "content" => params[:prompt] }
-      ]
-    )
-
-    render json: result[:body], status: result[:status]
-  end
-end
+client = SimpleInference::Client.new(
+  base_url: "https://api.deepseek.com",
+  api_key: ENV["DEEPSEEK_API_KEY"]
+)
 ```
 
-You can also use the client in background jobs:
+#### Groq
 
 ```ruby
-class EmbedJob < ApplicationJob
-  queue_as :default
+client = SimpleInference::Client.new(
+  base_url: "https://api.groq.com/openai",
+  api_key: ENV["GROQ_API_KEY"]
+)
+```
 
-  def perform(text)
-    result = SIMPLE_INFERENCE_CLIENT.embeddings(
-      model: "bge-m3",
-      input: text
-    )
+#### Together AI
 
-    vector = result[:body]["data"].first["embedding"]
-    # TODO: persist the vector (e.g. in DB or a vector store)
-  end
-end
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://api.together.xyz",
+  api_key: ENV["TOGETHER_API_KEY"]
+)
 ```
 
-And for health checks / maintenance tasks:
+#### Local inference servers (Ollama, vLLM, etc.)
 
 ```ruby
-if SIMPLE_INFERENCE_CLIENT.healthy?
-  Rails.logger.info("Inference server is healthy")
-else
-  Rails.logger.warn("Inference server is unhealthy")
-end
+# Ollama
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:11434"
+)
 
-models = SIMPLE_INFERENCE_CLIENT.list_models
-Rails.logger.info("Available models: #{models[:body].inspect}")
+# vLLM
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000"
+)
 ```
 
-### API methods
+#### Custom authentication header
 
-- `client.chat_completions(params)` → `POST /v1/chat/completions`
-- `client.embeddings(params)` → `POST /v1/embeddings`
-- `client.rerank(params)` → `POST /v1/rerank`
-- `client.list_models` → `GET /v1/models`
-- `client.health` → `GET /health`
-- `client.healthy?` → boolean helper based on `/health`
-- `client.audio_transcriptions(params)` → `POST /v1/audio/transcriptions`
-- `client.audio_translations(params)` → `POST /v1/audio/translations`
+Some providers use non-standard authentication headers:
 
-All methods follow a Receive-an-Object / Return-an-Object style:
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://my-service.example.com",
+  api_prefix: "/v1",
+  headers: {
+    "x-api-key" => ENV["MY_SERVICE_KEY"]
+  }
+)
+```
 
-- Input: a Ruby `Hash` (keys can be strings or symbols).
-- Output: a `Hash` with keys:
-  - `:status` – HTTP status code
-  - `:headers` – response headers (lowercased keys)
-  - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+## API Methods
 
-### Error handling
+### Chat Completions
 
-By default (`raise_on_error: true`) non-2xx HTTP responses raise:
+```ruby
+response = client.chat_completions(
+  model: "gpt-4o-mini",
+  messages: [
+    { "role" => "system", "content" => "You are a helpful assistant." },
+    { "role" => "user", "content" => "Hello!" }
+  ],
+  temperature: 0.7,
+  max_tokens: 1000
+)
 
-- `SimpleInference::Errors::HTTPError` – wraps status, headers and raw body.
+puts response[:body]["choices"][0]["message"]["content"]
+```
 
-Network and parsing errors are mapped to:
+### Streaming Chat Completions
 
-- `SimpleInference::Errors::TimeoutError`
-- `SimpleInference::Errors::ConnectionError`
-- `SimpleInference::Errors::DecodeError`
+```ruby
+client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Tell me a story" }]
+) do |event|
+  delta = event.dig("choices", 0, "delta", "content")
+  print delta if delta
+end
+puts
+```
 
-If you prefer to handle HTTP error codes manually, disable raising:
+Or use it as an Enumerator:
 
 ```ruby
-client = SimpleInference::Client.new(
-  base_url: "http://localhost:8000",
-  raise_on_error: false
+stream = client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello" }]
 )
 
-response = client.embeddings(model: "local-embed", input: "hello")
-if response[:status] == 200
-  # happy path
-else
-  Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
+stream.each do |event|
+  # process each parsed SSE event (a Hash)
 end
 ```
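+
+For example, to collect the streamed deltas into a single string (a minimal sketch using only the streaming API shown above):
+
+```ruby
+full_text = +""
+
+client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Tell me a story" }]
+) do |event|
+  delta = event.dig("choices", 0, "delta", "content")
+  full_text << delta if delta
+end
+
+puts full_text
+```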
 
-### Using with OpenAI and compatible services
+### Embeddings
+
+```ruby
+response = client.embeddings(
+  model: "text-embedding-3-small",
+  input: "Hello, world!"
+)
 
-Because this SDK follows the OpenAI-style HTTP paths (`/v1/chat/completions`, `/v1/embeddings`, etc.), you can also point it directly at OpenAI or other compatible inference services.
+vector = response[:body]["data"][0]["embedding"]
+```
 
-#### Connect to OpenAI
+### Rerank
 
 ```ruby
-client = SimpleInference::Client.new(
-  base_url: "https://api.openai.com",
-  api_key: ENV["OPENAI_API_KEY"]
+response = client.rerank(
+  model: "bge-reranker-v2-m3",
+  query: "What is machine learning?",
+  documents: [
+    "Machine learning is a subset of AI...",
+    "The weather today is sunny...",
+    "Deep learning uses neural networks..."
+  ]
 )
+```
 
-response = client.chat_completions(
-  model: "gpt-4.1-mini",
-  messages: [{ "role" => "user", "content" => "Hello" }]
+### Audio Transcription
+
+```ruby
+response = client.audio_transcriptions(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
 )
 
-pp response[:body]
+puts response[:body]["text"]
 ```
 
-#### Streaming chat completions (SSE)
+### Audio Translation
+
+```ruby
+response = client.audio_translations(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
+)
+```
 
-For OpenAI-style streaming (`text/event-stream`), use `chat_completions_stream`. It yields parsed JSON events (Ruby `Hash`), so you can consume deltas incrementally:
+### List Models
 
 ```ruby
-client.chat_completions_stream(
-  model: "gpt-4.1-mini",
-  messages: [{ "role" => "user", "content" => "Hello" }]
-) do |event|
-  delta = event.dig("choices", 0, "delta", "content")
-  print delta if delta
-end
-puts
+response = client.list_models
+models = response[:body]["data"]
 ```
 
-If you prefer, it also returns an Enumerator:
+### Health Check
 
 ```ruby
-client.chat_completions_stream(model: "gpt-4.1-mini", messages: [...]).each do |event|
-  # ...
+# Returns the full response
+response = client.health
+
+# Returns a boolean
+if client.healthy?
+  puts "Service is up!"
 end
 ```
 
-Fallback behavior:
+## Response Format
+
+All methods return a Hash with:
+
+```ruby
+{
+  status: 200,  # HTTP status code
+  headers: { "content-type" => "application/json", ... }, # Response headers (lowercase keys)
+  body: { ... } # Parsed JSON body (Hash) or raw String
+}
+```
+
+## Error Handling
+
+By default, non-2xx responses raise exceptions:
+
+```ruby
+begin
+  client.chat_completions(model: "invalid", messages: [])
+rescue SimpleInference::Errors::HTTPError => e
+  puts "HTTP #{e.status}: #{e.message}"
+  puts e.body # raw response body
+end
+```
 
-- If the upstream service does **not** support streaming (for example, this repo's server currently returns `400` with `{"detail":"Streaming responses are not supported yet"}`), the SDK will **retry non-streaming** and yield a **single synthetic chunk** so your streaming consumer code can still run.
+Other exception types:
 
-#### Connect to any OpenAI-compatible endpoint
+- `SimpleInference::Errors::TimeoutError` – Request timed out
+- `SimpleInference::Errors::ConnectionError` – Network error
+- `SimpleInference::Errors::DecodeError` – JSON parsing failed
+- `SimpleInference::Errors::ConfigurationError` – Invalid configuration
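+
+Transient failures can be rescued separately from HTTP errors; a minimal sketch using only the classes listed above:
+
+```ruby
+begin
+  client.chat_completions(model: "gpt-4o-mini", messages: [...])
+rescue SimpleInference::Errors::TimeoutError, SimpleInference::Errors::ConnectionError => e
+  # retry with backoff, or surface a friendly message
+  warn "Transient failure: #{e.class}"
+end
+```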
 
-For services that expose an OpenAI-compatible API (same paths and payloads), point `base_url` at that service and provide the correct token:
+To handle errors manually:
 
 ```ruby
 client = SimpleInference::Client.new(
-  base_url: "https://my-openai-compatible.example.com",
-  api_key: ENV["MY_SERVICE_TOKEN"]
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  raise_on_error: false
 )
+
+response = client.chat_completions(model: "gpt-4o-mini", messages: [...])
+
+if response[:status] == 200
+  # success
+else
+  puts "Error: #{response[:status]} - #{response[:body]}"
+end
 ```
 
-If the service uses a non-standard header instead of `Authorization: Bearer`, you can omit `api_key` and pass headers explicitly:
+## HTTP Adapters
+
+### Default (Net::HTTP)
+
+The default adapter uses Ruby's built-in `Net::HTTP`. It is thread-safe and compatible with the Ruby 3 Fiber scheduler.
+
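+If you want to construct the default adapter explicitly, a minimal sketch (assuming `Default.new` accepts the same timeout options as the HTTPX adapter below; this signature is an assumption, not documented):
+
+```ruby
+adapter = SimpleInference::HTTPAdapters::Default.new(timeout: 30.0) # assumed signature
+
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000",
+  adapter: adapter
+)
+```
+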
+### HTTPX Adapter
+
+For better performance or in async environments, use the optional HTTPX adapter:
+
+```ruby
+# Gemfile
+gem "httpx"
+```
 
 ```ruby
+adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+
 client = SimpleInference::Client.new(
-  base_url: "https://my-service.example.com",
-  headers: {
-    "x-api-key" => ENV["MY_SERVICE_KEY"]
-  }
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  adapter: adapter
 )
 ```
 
-### Puma vs Falcon (Fiber / Async) usage
+### Custom Adapter
 
-The default HTTP adapter uses Ruby's `Net::HTTP` and is safe to use under Puma's multithreaded model:
+Implement your own adapter by subclassing `SimpleInference::HTTPAdapter`:
 
-- No global mutable state
-- Per-client configuration only
-- Blocking IO that integrates with Ruby 3 Fiber scheduler
+```ruby
+class MyAdapter < SimpleInference::HTTPAdapter
+  def call(request)
+    # request keys: :method, :url, :headers, :body, :timeout, :open_timeout, :read_timeout
+    # Must return: { status: Integer, headers: Hash, body: String }
+  end
+
+  def call_stream(request, &block)
+    # For streaming support (optional)
+    # Yield raw chunks to the block for SSE responses
+  end
+end
+```
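+
+Then pass an instance to the client, just as with the HTTPX adapter above:
+
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000",
+  adapter: MyAdapter.new
+)
+```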
 
-If you don't pass an adapter, `SimpleInference::Client` uses `SimpleInference::HTTPAdapters::Default` (Net::HTTP).
+## Rails Integration
 
-For Falcon / async environments, you can keep the default adapter, or use the optional HTTPX adapter (requires the `httpx` gem):
+Create an initializer `config/initializers/simple_inference.rb`:
 
 ```ruby
-gem "httpx" # optional, only required when using the HTTPX adapter
+INFERENCE_CLIENT = SimpleInference::Client.new(
+  base_url: ENV.fetch("INFERENCE_BASE_URL", "https://api.openai.com"),
+  api_key: ENV["INFERENCE_API_KEY"]
+)
 ```
 
-You can then use the optional HTTPX adapter shipped with this gem:
+Use in controllers:
 
 ```ruby
-adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+class ChatsController < ApplicationController
+  def create
+    response = INFERENCE_CLIENT.chat_completions(
+      model: "gpt-4o-mini",
+      messages: [{ "role" => "user", "content" => params[:prompt] }]
+    )
 
-SIMPLE_INFERENCE_CLIENT =
-  SimpleInference::Client.new(
-    base_url: ENV.fetch("SIMPLE_INFERENCE_BASE_URL", "http://localhost:8000"),
-    api_key: ENV["SIMPLE_INFERENCE_API_KEY"],
-    adapter: adapter
-  )
+    render json: response[:body]
+  end
+end
+```
+
+Use in background jobs:
+
+```ruby
+class EmbedJob < ApplicationJob
+  def perform(text)
+    response = INFERENCE_CLIENT.embeddings(
+      model: "text-embedding-3-small",
+      input: text
+    )
+
+    vector = response[:body]["data"][0]["embedding"]
+    # Store vector...
+  end
+end
 ```
+
+## Thread Safety
+
+The client is thread-safe:
+
+- No global mutable state
+- Per-client configuration only
+- Each request uses its own HTTP connection
+
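+A minimal sketch of sharing one client across threads, using only the documented API:
+
+```ruby
+client = SimpleInference::Client.new(base_url: "http://localhost:8000")
+
+threads = 4.times.map do |i|
+  Thread.new do
+    client.chat_completions(
+      model: "gpt-4o-mini",
+      messages: [{ "role" => "user", "content" => "Request #{i}" }]
+    )
+  end
+end
+
+responses = threads.map(&:value)
+```
+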
+## License
+
+MIT License. See [LICENSE](LICENSE.txt) for details.
data/lib/simple_inference/client.rb CHANGED
@@ -23,7 +23,7 @@ module SimpleInference
     # POST /v1/chat/completions
     # params: { model: "model-name", messages: [...], ... }
     def chat_completions(params)
-      post_json("/v1/chat/completions", params)
+      post_json(api_path("/chat/completions"), params)
     end
 
     # POST /v1/chat/completions (streaming)
@@ -43,7 +43,7 @@ module SimpleInference
       body.delete("stream")
       body["stream"] = true
 
-      response = post_json_stream("/v1/chat/completions", body) do |event|
+      response = post_json_stream(api_path("/chat/completions"), body) do |event|
         yield event
       end
 
@@ -60,7 +60,7 @@ module SimpleInference
       fallback_body.delete(:stream)
       fallback_body.delete("stream")
 
-      fallback_response = post_json("/v1/chat/completions", fallback_body)
+      fallback_response = post_json(api_path("/chat/completions"), fallback_body)
       chunk = synthesize_chat_completion_chunk(fallback_response[:body])
       yield chunk if chunk
       return fallback_response
@@ -78,17 +78,17 @@ module SimpleInference
 
     # POST /v1/embeddings
     def embeddings(params)
-      post_json("/v1/embeddings", params)
+      post_json(api_path("/embeddings"), params)
     end
 
     # POST /v1/rerank
     def rerank(params)
-      post_json("/v1/rerank", params)
+      post_json(api_path("/rerank"), params)
     end
 
     # GET /v1/models
     def list_models
-      get_json("/v1/models")
+      get_json(api_path("/models"))
     end
 
     # GET /health
@@ -109,12 +109,12 @@ module SimpleInference
     # POST /v1/audio/transcriptions
     # params: { file: io_or_hash, model: "model-name", **audio_options }
    def audio_transcriptions(params)
-      post_multipart("/v1/audio/transcriptions", params)
+      post_multipart(api_path("/audio/transcriptions"), params)
     end
 
     # POST /v1/audio/translations
     def audio_translations(params)
-      post_multipart("/v1/audio/translations", params)
+      post_multipart(api_path("/audio/translations"), params)
     end
 
     private
@@ -123,6 +123,10 @@ module SimpleInference
       config.base_url
     end
 
+    def api_path(endpoint)
+      "#{config.api_prefix}#{endpoint}"
+    end
+
     def get_json(path, params: nil, raise_on_http_error: nil)
       full_path = with_query(path, params)
       request_json(
data/lib/simple_inference/config.rb CHANGED
@@ -4,6 +4,7 @@ module SimpleInference
   class Config
     attr_reader :base_url,
                 :api_key,
+                :api_prefix,
                 :timeout,
                 :open_timeout,
                 :read_timeout,
@@ -19,6 +20,10 @@ module SimpleInference
       @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
       @api_key = nil if @api_key.empty?
 
+      @api_prefix = normalize_api_prefix(
+        opts.key?(:api_prefix) ? opts[:api_prefix] : ENV.fetch("SIMPLE_INFERENCE_API_PREFIX", "/v1")
+      )
+
       @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
       @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
       @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
@@ -46,6 +51,17 @@ module SimpleInference
       url.chomp("/")
     end
 
+    def normalize_api_prefix(value)
+      return "" if value.nil?
+
+      prefix = value.to_s.strip
+      return "" if prefix.empty?
+
+      # Ensure it starts with / and does not end with /
+      prefix = "/#{prefix}" unless prefix.start_with?("/")
+      prefix.chomp("/")
+    end
+
     def to_float_or_nil(value)
       return nil if value.nil? || value == ""
 
data/lib/simple_inference/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module SimpleInference
-  VERSION = "0.1.3"
+  VERSION = "0.1.4"
 end
data/sig/simple_inference.rbs CHANGED
@@ -1,4 +1,3 @@
 module SimpleInference
   VERSION: String
 end
-
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: simple_inference
 version: !ruby/object:Gem::Version
-  version: 0.1.3
+  version: 0.1.4
 platform: ruby
 authors:
 - jasl
@@ -9,8 +9,8 @@ bindir: exe
 cert_chain: []
 date: 1980-01-02 00:00:00.000000000 Z
 dependencies: []
-description: Fiber-friendly Ruby client for Simple Inference Server APIs (chat, embeddings,
-  audio, rerank, health).
+description: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs
+  (chat, embeddings, audio, rerank, health).
 email:
 - jasl9187@hotmail.com
 executables: []
@@ -29,13 +29,12 @@ files:
 - lib/simple_inference/http_adapters/httpx.rb
 - lib/simple_inference/version.rb
 - sig/simple_inference.rbs
-homepage: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
+homepage: https://github.com/jasl/simple_inference.rb
 licenses:
 - MIT
 metadata:
   allowed_push_host: https://rubygems.org
-  homepage_uri: https://github.com/jasl/simple_inference_server/tree/main/sdks/ruby
-  source_code_uri: https://github.com/jasl/simple_inference_server
+  homepage_uri: https://github.com/jasl/simple_inference.rb
   rubygems_mfa_required: 'true'
 rdoc_options: []
 require_paths:
@@ -51,7 +50,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 4.0.1
+rubygems_version: 4.0.3
 specification_version: 4
-summary: Fiber-friendly Ruby client for the Simple Inference Server (OpenAI-compatible).
+summary: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
 test_files: []