simple_inference 0.1.3 → 0.1.4
This diff shows the content of publicly available package versions as released to a supported registry. It is provided for informational purposes only and reflects the changes between the two versions as they appear in the public registry.
- checksums.yaml +4 -4
- data/README.md +297 -139
- data/lib/simple_inference/client.rb +12 -8
- data/lib/simple_inference/config.rb +16 -0
- data/lib/simple_inference/version.rb +1 -1
- data/sig/simple_inference.rbs +0 -1
- metadata +7 -8
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8d8b01060969cbab2df30a38e16b7952a877188e89bd720209c15b57f9f79687
+  data.tar.gz: e278f52f76cf6f7bd3f74e567731bbdec016769b2b720161e9907348fd9b54c3
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cc6724a0fbe640d7af0d6bb35bfee81e6b95d501b23734f2874dfddbb2f71dcb7ae59557b742427bb9322804fbca632cbe95abe68f9ea26709303fea86550605
+  data.tar.gz: 871b06d6e585bac84cf38ac3abef77b3940dd41f4868c76e08b19c317c2b35c93f81adde9a0ec73e9c20a689062cade65c0115d6e82afab86444d253f9964688
```
data/README.md
CHANGED

````diff
@@ -1,13 +1,24 @@
-
+# SimpleInference
 
-Fiber-friendly Ruby client for
+A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs. Works seamlessly with OpenAI, Azure OpenAI, 火山引擎 (Volcengine), DeepSeek, Groq, Together AI, and any other provider that implements the OpenAI API specification.
 
-
+Designed for simplicity and compatibility – no heavy dependencies, just pure Ruby with `Net::HTTP`.
 
-
+## Features
+
+- 🔌 **Universal compatibility** – Works with any OpenAI-compatible API provider
+- 🌊 **Streaming support** – Native SSE streaming for chat completions
+- 🧵 **Fiber-friendly** – Compatible with Ruby 3 Fiber scheduler, works great with Falcon
+- 🔧 **Flexible configuration** – Customizable API prefix for non-standard endpoints
+- 🎯 **Simple interface** – Receive-an-Object / Return-an-Object style API
+- 📦 **Zero runtime dependencies** – Uses only Ruby standard library
+
+## Installation
+
+Add to your Gemfile:
 
 ```ruby
-gem "simple_inference"
+gem "simple_inference"
 ```
 
 Then run:
@@ -16,231 +27,378 @@ Then run:
 bundle install
 ```
 
-
-
-You can configure the client via environment variables:
-
-- `SIMPLE_INFERENCE_BASE_URL`: e.g. `http://localhost:8000`
-- `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
-- `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
-- `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
-
-Or explicitly when constructing a client:
+## Quick Start
 
 ```ruby
+require "simple_inference"
+
+# Connect to OpenAI
 client = SimpleInference::Client.new(
-  base_url: "
-  api_key:
-
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
+
+response = client.chat_completions(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello!" }]
 )
+
+puts response[:body]["choices"][0]["message"]["content"]
 ```
 
-
+## Configuration
+
+### Options
+
+| Option | Env Variable | Default | Description |
+|--------|--------------|---------|-------------|
+| `base_url` | `SIMPLE_INFERENCE_BASE_URL` | `http://localhost:8000` | API base URL |
+| `api_key` | `SIMPLE_INFERENCE_API_KEY` | `nil` | API key (sent as `Authorization: Bearer <token>`) |
+| `api_prefix` | `SIMPLE_INFERENCE_API_PREFIX` | `/v1` | API path prefix (e.g., `/v1`, empty string for some providers) |
+| `timeout` | `SIMPLE_INFERENCE_TIMEOUT` | `nil` | Request timeout in seconds |
+| `open_timeout` | `SIMPLE_INFERENCE_OPEN_TIMEOUT` | `nil` | Connection open timeout |
+| `read_timeout` | `SIMPLE_INFERENCE_READ_TIMEOUT` | `nil` | Read timeout |
+| `raise_on_error` | `SIMPLE_INFERENCE_RAISE_ON_ERROR` | `true` | Raise exceptions on HTTP errors |
+| `headers` | – | `{}` | Additional headers to send with requests |
+| `adapter` | – | `Default` | HTTP adapter (see [Adapters](#http-adapters)) |
+
+### Provider Examples
+
+#### OpenAI
 
 ```ruby
-client = SimpleInference.new(
+client = SimpleInference::Client.new(
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
 ```
 
-
+#### 火山引擎 (Volcengine / ByteDance)
 
-
+The Volcengine API paths do not include the `/v1` prefix, so set `api_prefix: ""`:
 
 ```ruby
-
-  base_url:
-  api_key:
+client = SimpleInference::Client.new(
+  base_url: "https://ark.cn-beijing.volces.com/api/v3",
+  api_key: ENV["ARK_API_KEY"],
+  api_prefix: "" # Important: Volcengine does not use the /v1 prefix
+)
+
+response = client.chat_completions(
+  model: "deepseek-v3-250324",
+  messages: [
+    { "role" => "system", "content" => "你是人工智能助手" },
+    { "role" => "user", "content" => "你好" }
+  ]
 )
 ```
 
-
+#### DeepSeek
 
 ```ruby
-
-
-
-
-  messages: [
-    { "role" => "user", "content" => params[:prompt] }
-  ]
-)
-
-  render json: result[:body], status: result[:status]
-  end
-end
+client = SimpleInference::Client.new(
+  base_url: "https://api.deepseek.com",
+  api_key: ENV["DEEPSEEK_API_KEY"]
+)
 ```
 
-
+#### Groq
 
 ```ruby
-
-
+client = SimpleInference::Client.new(
+  base_url: "https://api.groq.com/openai",
+  api_key: ENV["GROQ_API_KEY"]
+)
+```
 
-
-result = SIMPLE_INFERENCE_CLIENT.embeddings(
-  model: "bge-m3",
-  input: text
-)
+#### Together AI
 
-
-
-
-
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://api.together.xyz",
+  api_key: ENV["TOGETHER_API_KEY"]
+)
 ```
 
-
+#### Local inference servers (Ollama, vLLM, etc.)
 
 ```ruby
-
-
-
-
-end
+# Ollama
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:11434"
+)
 
-
-
+# vLLM
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000"
+)
 ```
 
-
+#### Custom authentication header
 
-
-- `client.embeddings(params)` → `POST /v1/embeddings`
-- `client.rerank(params)` → `POST /v1/rerank`
-- `client.list_models` → `GET /v1/models`
-- `client.health` → `GET /health`
-- `client.healthy?` → boolean helper based on `/health`
-- `client.audio_transcriptions(params)` → `POST /v1/audio/transcriptions`
-- `client.audio_translations(params)` → `POST /v1/audio/translations`
+Some providers use non-standard authentication headers:
 
-
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://my-service.example.com",
+  api_prefix: "/v1",
+  headers: {
+    "x-api-key" => ENV["MY_SERVICE_KEY"]
+  }
+)
+```
 
-
-- Output: a `Hash` with keys:
-  - `:status` – HTTP status code
-  - `:headers` – response headers (lowercased keys)
-  - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+## API Methods
 
-###
+### Chat Completions
 
-
+```ruby
+response = client.chat_completions(
+  model: "gpt-4o-mini",
+  messages: [
+    { "role" => "system", "content" => "You are a helpful assistant." },
+    { "role" => "user", "content" => "Hello!" }
+  ],
+  temperature: 0.7,
+  max_tokens: 1000
+)
 
-
+puts response[:body]["choices"][0]["message"]["content"]
+```
 
-
+### Streaming Chat Completions
 
-
-
-
+```ruby
+client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Tell me a story" }]
+) do |event|
+  delta = event.dig("choices", 0, "delta", "content")
+  print delta if delta
+end
+puts
+```
 
-
+Or use as an Enumerator:
 
 ```ruby
-
-
-
+stream = client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello" }]
 )
 
-
-
-  # happy path
-else
-  Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
+stream.each do |event|
+  # process event
 end
 ```
 
-###
+### Embeddings
+
+```ruby
+response = client.embeddings(
+  model: "text-embedding-3-small",
+  input: "Hello, world!"
+)
 
-
+vector = response[:body]["data"][0]["embedding"]
+```
 
-
+### Rerank
 
 ```ruby
-
-
-
+response = client.rerank(
+  model: "bge-reranker-v2-m3",
+  query: "What is machine learning?",
+  documents: [
+    "Machine learning is a subset of AI...",
+    "The weather today is sunny...",
+    "Deep learning uses neural networks..."
+  ]
 )
+```
 
-
-
-
+### Audio Transcription
+
+```ruby
+response = client.audio_transcriptions(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
 )
 
-
+puts response[:body]["text"]
 ```
 
-
+### Audio Translation
+
+```ruby
+response = client.audio_translations(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
+)
+```
 
-
+### List Models
 
 ```ruby
-client.
-
-  messages: [{ "role" => "user", "content" => "Hello" }]
-) do |event|
-  delta = event.dig("choices", 0, "delta", "content")
-  print delta if delta
-end
-puts
+response = client.list_models
+models = response[:body]["data"]
 ```
 
-
+### Health Check
 
 ```ruby
-
-
+# Returns full response
+response = client.health
+
+# Returns boolean
+if client.healthy?
+  puts "Service is up!"
 end
 ```
 
-
+## Response Format
+
+All methods return a Hash with:
+
+```ruby
+{
+  status: 200, # HTTP status code
+  headers: { "content-type" => "application/json", ... }, # Response headers (lowercase keys)
+  body: { ... } # Parsed JSON body (Hash) or raw String
+}
+```
+
+## Error Handling
+
+By default, non-2xx responses raise exceptions:
+
+```ruby
+begin
+  client.chat_completions(model: "invalid", messages: [])
+rescue SimpleInference::Errors::HTTPError => e
+  puts "HTTP #{e.status}: #{e.message}"
+  puts e.body # raw response body
+end
+```
 
-
+Other exception types:
 
-
+- `SimpleInference::Errors::TimeoutError` – Request timed out
+- `SimpleInference::Errors::ConnectionError` – Network error
+- `SimpleInference::Errors::DecodeError` – JSON parsing failed
+- `SimpleInference::Errors::ConfigurationError` – Invalid configuration
 
-
+To handle errors manually:
 
 ```ruby
 client = SimpleInference::Client.new(
-  base_url: "https://
-  api_key:
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  raise_on_error: false
 )
+
+response = client.chat_completions(model: "gpt-4o-mini", messages: [...])
+
+if response[:status] == 200
+  # success
+else
+  puts "Error: #{response[:status]} - #{response[:body]}"
+end
 ```
 
-
+## HTTP Adapters
+
+### Default (Net::HTTP)
+
+The default adapter uses Ruby's built-in `Net::HTTP`. It's thread-safe and compatible with Ruby 3 Fiber scheduler.
+
+### HTTPX Adapter
+
+For better performance or async environments, use the optional HTTPX adapter:
+
+```ruby
+# Gemfile
+gem "httpx"
+```
 
 ```ruby
+adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+
 client = SimpleInference::Client.new(
-  base_url: "https://
-
-
-  }
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  adapter: adapter
 )
 ```
 
-###
+### Custom Adapter
 
-
+Implement your own adapter by subclassing `SimpleInference::HTTPAdapter`:
 
-
-
-
+```ruby
+class MyAdapter < SimpleInference::HTTPAdapter
+  def call(request)
+    # request keys: :method, :url, :headers, :body, :timeout, :open_timeout, :read_timeout
+    # Must return: { status: Integer, headers: Hash, body: String }
+  end
+
+  def call_stream(request, &block)
+    # For streaming support (optional)
+    # Yield raw chunks to block for SSE responses
+  end
+end
+```
 
-
+## Rails Integration
 
-
+Create an initializer `config/initializers/simple_inference.rb`:
 
 ```ruby
-
+INFERENCE_CLIENT = SimpleInference::Client.new(
+  base_url: ENV.fetch("INFERENCE_BASE_URL", "https://api.openai.com"),
+  api_key: ENV["INFERENCE_API_KEY"]
+)
 ```
 
-
+Use in controllers:
 
 ```ruby
-
+class ChatsController < ApplicationController
+  def create
+    response = INFERENCE_CLIENT.chat_completions(
+      model: "gpt-4o-mini",
+      messages: [{ "role" => "user", "content" => params[:prompt] }]
+    )
 
-
-
-
-
-
-
+    render json: response[:body]
+  end
+end
+```
+
+Use in background jobs:
+
+```ruby
+class EmbedJob < ApplicationJob
+  def perform(text)
+    response = INFERENCE_CLIENT.embeddings(
+      model: "text-embedding-3-small",
+      input: text
+    )
+
+    vector = response[:body]["data"][0]["embedding"]
+    # Store vector...
+  end
+end
 ```
+
+## Thread Safety
+
+The client is thread-safe:
+
+- No global mutable state
+- Per-client configuration only
+- Each request uses its own HTTP connection
+
+## License
+
+MIT License. See [LICENSE](LICENSE.txt) for details.
````
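The new README's Thread Safety and Fiber-friendly claims are easy to exercise together. Below is a minimal sketch (not part of the package), assuming the `async` gem is installed and `OPENAI_API_KEY` is set; it shares one client across plain threads and, separately, across fibers under a Fiber scheduler:

```ruby
require "async"
require "simple_inference"

client = SimpleInference::Client.new(
  base_url: "https://api.openai.com",
  api_key: ENV["OPENAI_API_KEY"]
)

# Threads: the client keeps no per-request mutable state, so a single
# instance can be shared safely across threads.
threads = 3.times.map do |i|
  Thread.new do
    client.chat_completions(
      model: "gpt-4o-mini",
      messages: [{ "role" => "user", "content" => "Ping #{i}" }]
    )
  end
end
threads.each { |t| puts t.value[:status] }

# Fibers: Async installs a Ruby 3 Fiber scheduler, so the blocking
# Net::HTTP calls yield to other tasks instead of stalling the thread.
Async do |task|
  tasks = 3.times.map do |i|
    task.async do
      client.chat_completions(
        model: "gpt-4o-mini",
        messages: [{ "role" => "user", "content" => "Ping #{i}" }]
      )
    end
  end
  tasks.each { |t| puts t.wait[:status] }
end
```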
data/lib/simple_inference/client.rb
CHANGED

```diff
@@ -23,7 +23,7 @@ module SimpleInference
     # POST /v1/chat/completions
     # params: { model: "model-name", messages: [...], ... }
     def chat_completions(params)
-      post_json("/v1/chat/completions", params)
+      post_json(api_path("/chat/completions"), params)
     end
 
     # POST /v1/chat/completions (streaming)
@@ -43,7 +43,7 @@ module SimpleInference
       body.delete("stream")
       body["stream"] = true
 
-      response = post_json_stream("/v1/chat/completions", body) do |event|
+      response = post_json_stream(api_path("/chat/completions"), body) do |event|
         yield event
       end
 
@@ -60,7 +60,7 @@ module SimpleInference
       fallback_body.delete(:stream)
       fallback_body.delete("stream")
 
-      fallback_response = post_json("/v1/chat/completions", fallback_body)
+      fallback_response = post_json(api_path("/chat/completions"), fallback_body)
       chunk = synthesize_chat_completion_chunk(fallback_response[:body])
       yield chunk if chunk
       return fallback_response
@@ -78,17 +78,17 @@ module SimpleInference
 
     # POST /v1/embeddings
     def embeddings(params)
-      post_json("/v1/embeddings", params)
+      post_json(api_path("/embeddings"), params)
     end
 
     # POST /v1/rerank
     def rerank(params)
-      post_json("/v1/rerank", params)
+      post_json(api_path("/rerank"), params)
     end
 
     # GET /v1/models
     def list_models
-      get_json("/v1/models")
+      get_json(api_path("/models"))
     end
 
     # GET /health
@@ -109,12 +109,12 @@ module SimpleInference
     # POST /v1/audio/transcriptions
     # params: { file: io_or_hash, model: "model-name", **audio_options }
     def audio_transcriptions(params)
-      post_multipart("/v1/audio/transcriptions", params)
+      post_multipart(api_path("/audio/transcriptions"), params)
     end
 
     # POST /v1/audio/translations
     def audio_translations(params)
-      post_multipart("/v1/audio/translations", params)
+      post_multipart(api_path("/audio/translations"), params)
     end
 
     private
@@ -123,6 +123,10 @@ module SimpleInference
       config.base_url
     end
 
+    def api_path(endpoint)
+      "#{config.api_prefix}#{endpoint}"
+    end
+
     def get_json(path, params: nil, raise_on_http_error: nil)
       full_path = with_query(path, params)
       request_json(
```
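The `api_path` helper above is plain string concatenation: the normalized `config.api_prefix` (default `/v1`, handled in `config.rb` below) is prepended to each endpoint. A hedged sketch of the resulting request paths; the hosts and the `"v2"` prefix here are illustrative assumptions, not values from the package:

```ruby
require "simple_inference"

# Default prefix: requests go to "<base_url>/v1/chat/completions".
default_client = SimpleInference::Client.new(base_url: "http://localhost:8000")

# Empty prefix (e.g. Volcengine, whose base_url already carries /api/v3):
# requests go to "<base_url>/chat/completions".
ark_client = SimpleInference::Client.new(
  base_url: "https://ark.cn-beijing.volces.com/api/v3",
  api_prefix: ""
)

# Custom prefix, e.g. a proxied deployment: "v2" is normalized to "/v2",
# so requests go to "<base_url>/v2/chat/completions".
proxied_client = SimpleInference::Client.new(
  base_url: "https://gateway.example.com",
  api_prefix: "v2"
)
```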
data/lib/simple_inference/config.rb
CHANGED

```diff
@@ -4,6 +4,7 @@ module SimpleInference
   class Config
     attr_reader :base_url,
                 :api_key,
+                :api_prefix,
                 :timeout,
                 :open_timeout,
                 :read_timeout,
@@ -19,6 +20,10 @@ module SimpleInference
       @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
       @api_key = nil if @api_key.empty?
 
+      @api_prefix = normalize_api_prefix(
+        opts.key?(:api_prefix) ? opts[:api_prefix] : ENV.fetch("SIMPLE_INFERENCE_API_PREFIX", "/v1")
+      )
+
       @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
       @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
       @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
@@ -46,6 +51,17 @@ module SimpleInference
       url.chomp("/")
     end
 
+    def normalize_api_prefix(value)
+      return "" if value.nil?
+
+      prefix = value.to_s.strip
+      return "" if prefix.empty?
+
+      # Ensure it starts with / and does not end with /
+      prefix = "/#{prefix}" unless prefix.start_with?("/")
+      prefix.chomp("/")
+    end
+
     def to_float_or_nil(value)
       return nil if value.nil? || value == ""
```
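The normalization rules read most easily as input/output pairs. This standalone sketch mirrors `normalize_api_prefix` from the hunk above; the sample values are illustrative:

```ruby
# Mirrors normalize_api_prefix from the diff above:
#   nil / blank  -> ""          (no prefix at all)
#   "v1"         -> "/v1"       (leading slash added)
#   "/v1/"       -> "/v1"       (trailing slash stripped)
#   " /api/v2 "  -> "/api/v2"   (surrounding whitespace stripped first)
def normalize_api_prefix(value)
  return "" if value.nil?

  prefix = value.to_s.strip
  return "" if prefix.empty?

  prefix = "/#{prefix}" unless prefix.start_with?("/")
  prefix.chomp("/")
end

puts normalize_api_prefix(nil)         # => ""
puts normalize_api_prefix("v1")        # => "/v1"
puts normalize_api_prefix("/v1/")      # => "/v1"
puts normalize_api_prefix(" /api/v2 ") # => "/api/v2"
```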
data/sig/simple_inference.rbs
CHANGED
metadata
CHANGED

```diff
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: simple_inference
 version: !ruby/object:Gem::Version
-  version: 0.1.3
+  version: 0.1.4
 platform: ruby
 authors:
 - jasl
@@ -9,8 +9,8 @@ bindir: exe
 cert_chain: []
 date: 1980-01-02 00:00:00.000000000 Z
 dependencies: []
-description: Fiber-friendly Ruby client for
-  audio, rerank, health).
+description: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
+  (chat, embeddings, audio, rerank, health).
 email:
 - jasl9187@hotmail.com
 executables: []
@@ -29,13 +29,12 @@ files:
 - lib/simple_inference/http_adapters/httpx.rb
 - lib/simple_inference/version.rb
 - sig/simple_inference.rbs
-homepage: https://github.com/jasl/
+homepage: https://github.com/jasl/simple_inference.rb
 licenses:
 - MIT
 metadata:
   allowed_push_host: https://rubygems.org
-  homepage_uri: https://github.com/jasl/
-  source_code_uri: https://github.com/jasl/simple_inference_server
+  homepage_uri: https://github.com/jasl/simple_inference.rb
   rubygems_mfa_required: 'true'
 rdoc_options: []
 require_paths:
@@ -51,7 +50,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 4.0.
+rubygems_version: 4.0.3
 specification_version: 4
-summary: Fiber-friendly Ruby client for
+summary: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
 test_files: []
```