simple_inference 0.1.3 → 0.1.5
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +316 -138
- data/lib/simple_inference/client.rb +169 -74
- data/lib/simple_inference/config.rb +16 -0
- data/lib/simple_inference/errors.rb +11 -5
- data/lib/simple_inference/openai.rb +178 -0
- data/lib/simple_inference/response.rb +28 -0
- data/lib/simple_inference/version.rb +1 -1
- data/lib/simple_inference.rb +2 -0
- data/sig/simple_inference.rbs +68 -1
- metadata +9 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ad988c1bb0af4938ea72fd303943a6dc27b90f26a8128abd737e0fca6429e081
+  data.tar.gz: 6be00487c1533201ffc48afb14a64c385b434698cf1bf3ab1c5c4ab10834d06a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 066dbeee456edae89770a5ed6541d77dda53d6ebcac59a2f277e28e00dde8b12b373cdec67bb0e79f84df781397034f1ff75694560bd6f612dca608ce6252630
+  data.tar.gz: 8008d5a95c38e45465e48a3f45fe8b7fd1cffec49e16cfd54419cbed08a11d7d613715314c91c742f63430860caac1fe332e10270cd0741401e98540a0582d65
data/README.md
CHANGED
@@ -1,13 +1,24 @@
-
+# SimpleInference
 
-Fiber-friendly Ruby client for
+A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs. Works seamlessly with OpenAI, Azure OpenAI, 火山引擎 (Volcengine), DeepSeek, Groq, Together AI, and any other provider that implements the OpenAI API specification.
 
-
+Designed for simplicity and compatibility – no heavy dependencies, just pure Ruby with `Net::HTTP`.
 
-
+## Features
+
+- 🔌 **Universal compatibility** – Works with any OpenAI-compatible API provider
+- 🌊 **Streaming support** – Native SSE streaming for chat completions
+- 🧵 **Fiber-friendly** – Compatible with Ruby 3 Fiber scheduler, works great with Falcon
+- 🔧 **Flexible configuration** – Customizable API prefix for non-standard endpoints
+- 🎯 **Simple interface** – Receive-an-Object / Return-an-Object style API
+- 📦 **Zero runtime dependencies** – Uses only Ruby standard library
+
+## Installation
+
+Add to your Gemfile:
 
 ```ruby
-gem "simple_inference"
+gem "simple_inference"
 ```
 
 Then run:
@@ -16,231 +27,398 @@ Then run:
 bundle install
 ```
 
-
+## Quick Start
+
+```ruby
+require "simple_inference"
+
+# Connect to OpenAI
+client = SimpleInference::Client.new(
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
+)
+
+result = client.chat(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello!" }]
+)
+
+puts result.content
+p result.usage
+```
+
+## Configuration
+
+### Options
 
-
+| Option | Env Variable | Default | Description |
+|--------|--------------|---------|-------------|
+| `base_url` | `SIMPLE_INFERENCE_BASE_URL` | `http://localhost:8000` | API base URL |
+| `api_key` | `SIMPLE_INFERENCE_API_KEY` | `nil` | API key (sent as `Authorization: Bearer <token>`) |
+| `api_prefix` | `SIMPLE_INFERENCE_API_PREFIX` | `/v1` | API path prefix (e.g., `/v1`, empty string for some providers) |
+| `timeout` | `SIMPLE_INFERENCE_TIMEOUT` | `nil` | Request timeout in seconds |
+| `open_timeout` | `SIMPLE_INFERENCE_OPEN_TIMEOUT` | `nil` | Connection open timeout |
+| `read_timeout` | `SIMPLE_INFERENCE_READ_TIMEOUT` | `nil` | Read timeout |
+| `raise_on_error` | `SIMPLE_INFERENCE_RAISE_ON_ERROR` | `true` | Raise exceptions on HTTP errors |
+| `headers` | – | `{}` | Additional headers to send with requests |
+| `adapter` | – | `Default` | HTTP adapter (see [Adapters](#http-adapters)) |
 
-
-- `SIMPLE_INFERENCE_API_KEY`: optional, if your deployment requires auth (sent as `Authorization: Bearer <token>`).
-- `SIMPLE_INFERENCE_TIMEOUT`, `SIMPLE_INFERENCE_OPEN_TIMEOUT`, `SIMPLE_INFERENCE_READ_TIMEOUT` (seconds).
-- `SIMPLE_INFERENCE_RAISE_ON_ERROR`: `true`/`false` (default `true`).
+### Provider Examples
 
-
+#### OpenAI
 
 ```ruby
 client = SimpleInference::Client.new(
-  base_url: "
-  api_key:
-  timeout: 30.0
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"]
 )
 ```
 
-
+#### 火山引擎 (Volcengine / ByteDance)
+
+Volcengine's API paths do not include the `/v1` prefix, so set `api_prefix: ""`:
 
 ```ruby
-client = SimpleInference.new(
-
+client = SimpleInference::Client.new(
+  base_url: "https://ark.cn-beijing.volces.com/api/v3",
+  api_key: ENV["ARK_API_KEY"],
+  api_prefix: "" # Important: Volcengine does not use the /v1 prefix
+)
 
-
+result = client.chat(
+  model: "deepseek-v3-250324",
+  messages: [
+    { "role" => "system", "content" => "你是人工智能助手" },
+    { "role" => "user", "content" => "你好" }
+  ]
+)
+
+puts result.content
+```
 
-
+#### DeepSeek
 
 ```ruby
-
-  base_url:
-  api_key:
+client = SimpleInference::Client.new(
+  base_url: "https://api.deepseek.com",
+  api_key: ENV["DEEPSEEK_API_KEY"]
 )
 ```
 
-
+#### Groq
 
 ```ruby
-
-
-
-
-
-    { "role" => "user", "content" => params[:prompt] }
-  ]
-)
+client = SimpleInference::Client.new(
+  base_url: "https://api.groq.com/openai",
+  api_key: ENV["GROQ_API_KEY"]
+)
+```
 
-
-
-
+#### Together AI
+
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://api.together.xyz",
+  api_key: ENV["TOGETHER_API_KEY"]
+)
 ```
 
-
+#### Local inference servers (Ollama, vLLM, etc.)
 
 ```ruby
-
-
+# Ollama
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:11434"
+)
 
-
-
-
-
-
+# vLLM
+client = SimpleInference::Client.new(
+  base_url: "http://localhost:8000"
+)
+```
 
-
-
-
-
+#### Custom authentication header
+
+Some providers use non-standard authentication headers:
+
+```ruby
+client = SimpleInference::Client.new(
+  base_url: "https://my-service.example.com",
+  api_prefix: "/v1",
+  headers: {
+    "x-api-key" => ENV["MY_SERVICE_KEY"]
+  }
+)
 ```
 
-
+## API Methods
+
+### Chat
 
 ```ruby
-
-
-
-
-
+result = client.chat(
+  model: "gpt-4o-mini",
+  messages: [
+    { "role" => "system", "content" => "You are a helpful assistant." },
+    { "role" => "user", "content" => "Hello!" }
+  ],
+  temperature: 0.7,
+  max_tokens: 1000
+)
 
-
-
+puts result.content
+p result.usage
 ```
 
-###
+### Streaming Chat
 
-
-
-
-
-
-
-
-
+```ruby
+result = client.chat(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Tell me a story" }],
+  stream: true,
+  include_usage: true
+) do |delta|
+  print delta
+end
+puts
 
-
+p result.usage
+```
 
-
-- Output: a `Hash` with keys:
-  - `:status` – HTTP status code
-  - `:headers` – response headers (lowercased keys)
-  - `:body` – parsed JSON (Ruby `Hash`) when the response is JSON, or a `String` for text bodies.
+Low-level streaming (events) is also available, and can be used as an Enumerator:
 
-
+```ruby
+stream = client.chat_completions_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello" }]
+)
 
-
+stream.each do |event|
+  # process event
+end
+```
 
-
+Or as an Enumerable of delta strings:
 
-
+```ruby
+stream = client.chat_stream(
+  model: "gpt-4o-mini",
+  messages: [{ "role" => "user", "content" => "Hello" }],
+  include_usage: true
+)
 
-
-
-
+stream.each { |delta| print delta }
+puts
+p stream.result&.usage
+```
 
-
+### Embeddings
 
 ```ruby
-
-
-
+response = client.embeddings(
+  model: "text-embedding-3-small",
+  input: "Hello, world!"
 )
 
-
-if response[:status] == 200
-  # happy path
-else
-  Rails.logger.warn("Embedding call failed: #{response[:status]} #{response[:body].inspect}")
-end
+vector = response.body["data"][0]["embedding"]
 ```
 
-###
+### Rerank
 
-
+```ruby
+response = client.rerank(
+  model: "bge-reranker-v2-m3",
+  query: "What is machine learning?",
+  documents: [
+    "Machine learning is a subset of AI...",
+    "The weather today is sunny...",
+    "Deep learning uses neural networks..."
+  ]
+)
+```
 
-
+### Audio Transcription
 
 ```ruby
-
-
-
+response = client.audio_transcriptions(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
 )
 
-response
-
-  messages: [{ "role" => "user", "content" => "Hello" }]
-)
+puts response.body["text"]
+```
 
-
+### Audio Translation
+
+```ruby
+response = client.audio_translations(
+  model: "whisper-1",
+  file: File.open("audio.mp3", "rb")
+)
 ```
 
-
+### List Models
 
-
+```ruby
+model_ids = client.models
+```
+
+### Health Check
 
 ```ruby
-
-
-
-
-
-
+# Returns full response
+response = client.health
+
+# Returns boolean
+if client.healthy?
+  puts "Service is up!"
 end
-puts
 ```
 
-
+## Response Format
+
+All HTTP methods return a `SimpleInference::Response` with:
 
 ```ruby
-
-#
-
+response.status   # Integer HTTP status code
+response.headers  # Hash with downcased String keys
+response.body     # Parsed JSON (Hash/Array), raw String, or nil (SSE success)
+response.success? # true for 2xx
 ```
 
-
+## Error Handling
 
-
+By default, non-2xx responses raise exceptions:
 
-
+```ruby
+begin
+  client.chat_completions(model: "invalid", messages: [])
+rescue SimpleInference::Errors::HTTPError => e
+  puts "HTTP #{e.status}: #{e.message}"
+  p e.body # parsed body (Hash/Array/String)
+  puts e.raw_body # raw response body string (if available)
+end
+```
+
+Other exception types:
 
-
+- `SimpleInference::Errors::TimeoutError` – Request timed out
+- `SimpleInference::Errors::ConnectionError` – Network error
+- `SimpleInference::Errors::DecodeError` – JSON parsing failed
+- `SimpleInference::Errors::ConfigurationError` – Invalid configuration
+
+To handle errors manually:
 
 ```ruby
 client = SimpleInference::Client.new(
-  base_url: "https://
-  api_key:
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  raise_on_error: false
 )
+
+response = client.chat_completions(model: "gpt-4o-mini", messages: [...])
+
+if response.success?
+  # success
+else
+  puts "Error: #{response.status} - #{response.body}"
+end
 ```
 
-
+## HTTP Adapters
+
+### Default (Net::HTTP)
+
+The default adapter uses Ruby's built-in `Net::HTTP`. It's thread-safe and compatible with Ruby 3 Fiber scheduler.
+
+### HTTPX Adapter
+
+For better performance or async environments, use the optional HTTPX adapter:
 
 ```ruby
+# Gemfile
+gem "httpx"
+```
+
+```ruby
+adapter = SimpleInference::HTTPAdapters::HTTPX.new(timeout: 30.0)
+
 client = SimpleInference::Client.new(
-  base_url: "https://
-
-
-  }
+  base_url: "https://api.openai.com",
+  api_key: ENV["OPENAI_API_KEY"],
+  adapter: adapter
 )
 ```
 
-###
+### Custom Adapter
 
-
+Implement your own adapter by subclassing `SimpleInference::HTTPAdapter`:
 
-
-
-
+```ruby
+class MyAdapter < SimpleInference::HTTPAdapter
+  def call(request)
+    # request keys: :method, :url, :headers, :body, :timeout, :open_timeout, :read_timeout
+    # Must return: { status: Integer, headers: Hash, body: String }
+  end
 
-
+  def call_stream(request, &block)
+    # For streaming support (optional)
+    # Yield raw chunks to block for SSE responses
+  end
+end
+```
+
+## Rails Integration
 
-
+Create an initializer `config/initializers/simple_inference.rb`:
 
 ```ruby
-
+INFERENCE_CLIENT = SimpleInference::Client.new(
+  base_url: ENV.fetch("INFERENCE_BASE_URL", "https://api.openai.com"),
+  api_key: ENV["INFERENCE_API_KEY"]
+)
 ```
 
-
+Use in controllers:
 
 ```ruby
-
+class ChatsController < ApplicationController
+  def create
+    response = INFERENCE_CLIENT.chat_completions(
+      model: "gpt-4o-mini",
+      messages: [{ "role" => "user", "content" => params[:prompt] }]
+    )
+
+    render json: response.body
+  end
+end
+```
+
+Use in background jobs:
+
+```ruby
+class EmbedJob < ApplicationJob
+  def perform(text)
+    response = INFERENCE_CLIENT.embeddings(
+      model: "text-embedding-3-small",
+      input: text
+    )
 
-
-
-
-
-  adapter: adapter
-)
+    vector = response.body["data"][0]["embedding"]
+    # Store vector...
+  end
+end
 ```
+
+## Thread Safety
+
+The client is thread-safe:
+
+- No global mutable state
+- Per-client configuration only
+- Each request uses its own HTTP connection
+
+## License
+
+MIT License. See [LICENSE](LICENSE.txt) for details.
data/lib/simple_inference/client.rb
CHANGED

@@ -22,8 +22,112 @@ module SimpleInference
 
     # POST /v1/chat/completions
     # params: { model: "model-name", messages: [...], ... }
-    def chat_completions(params)
-      post_json("/
+    def chat_completions(**params)
+      post_json(api_path("/chat/completions"), params)
+    end
+
+    # High-level helper for OpenAI-compatible chat.
+    #
+    # - Non-streaming: returns an OpenAI::ChatResult with `content` + `usage`.
+    # - Streaming: yields delta strings to the block (if given), accumulates, and returns OpenAI::ChatResult.
+    #
+    # @param model [String]
+    # @param messages [Array<Hash>]
+    # @param stream [Boolean] force streaming when true (default: block_given?)
+    # @param include_usage [Boolean, nil] when true (and streaming), requests usage in the final chunk
+    # @param request_logprobs [Boolean] when true, requests logprobs (and collects them in streaming mode)
+    # @param top_logprobs [Integer, nil] default: 5 (when request_logprobs is true)
+    # @param params [Hash] additional OpenAI parameters (max_tokens, temperature, etc.)
+    # @yield [String] delta content chunks (streaming only)
+    # @return [SimpleInference::OpenAI::ChatResult]
+    def chat(model:, messages:, stream: nil, include_usage: nil, request_logprobs: false, top_logprobs: 5, **params, &block)
+      raise ArgumentError, "model is required" if model.nil? || model.to_s.strip.empty?
+      raise ArgumentError, "messages must be an Array" unless messages.is_a?(Array)
+
+      use_stream = stream.nil? ? block_given? : stream
+
+      request = { model: model, messages: messages }.merge(params)
+      request.delete(:stream)
+      request.delete("stream")
+
+      if request_logprobs
+        request[:logprobs] = true unless request.key?(:logprobs) || request.key?("logprobs")
+        if top_logprobs && !(request.key?(:top_logprobs) || request.key?("top_logprobs"))
+          request[:top_logprobs] = top_logprobs
+        end
+      end
+
+      if use_stream && include_usage
+        stream_options = request[:stream_options] || request["stream_options"]
+        stream_options ||= {}
+
+        if stream_options.is_a?(Hash)
+          stream_options[:include_usage] = true unless stream_options.key?(:include_usage) || stream_options.key?("include_usage")
+        end
+
+        request[:stream_options] = stream_options
+      end
+
+      if use_stream
+        full = +""
+        finish_reason = nil
+        last_usage = nil
+        collected_logprobs = []
+
+        response =
+          chat_completions_stream(**request) do |event|
+            delta = OpenAI.chat_completion_chunk_delta(event)
+            if delta
+              full << delta
+              block.call(delta) if block
+            end
+
+            fr = event.is_a?(Hash) ? event.dig("choices", 0, "finish_reason") : nil
+            finish_reason = fr if fr
+
+            if request_logprobs
+              chunk_logprobs = event.is_a?(Hash) ? event.dig("choices", 0, "logprobs", "content") : nil
+              if chunk_logprobs.is_a?(Array)
+                collected_logprobs.concat(chunk_logprobs)
+              end
+            end
+
+            usage = OpenAI.chat_completion_usage(event)
+            last_usage = usage if usage
+          end
+
+        OpenAI::ChatResult.new(
+          content: full,
+          usage: last_usage || OpenAI.chat_completion_usage(response),
+          finish_reason: finish_reason || OpenAI.chat_completion_finish_reason(response),
+          logprobs: collected_logprobs.empty? ? OpenAI.chat_completion_logprobs(response) : collected_logprobs,
+          response: response
+        )
+      else
+        response = chat_completions(**request)
+        OpenAI::ChatResult.new(
+          content: OpenAI.chat_completion_content(response),
+          usage: OpenAI.chat_completion_usage(response),
+          finish_reason: OpenAI.chat_completion_finish_reason(response),
+          logprobs: OpenAI.chat_completion_logprobs(response),
+          response: response
+        )
+      end
+    end
+
+    # Streaming chat as an Enumerable.
+    #
+    # @return [SimpleInference::OpenAI::ChatStream]
+    def chat_stream(model:, messages:, include_usage: nil, request_logprobs: false, top_logprobs: 5, **params)
+      OpenAI::ChatStream.new(
+        client: self,
+        model: model,
+        messages: messages,
+        include_usage: include_usage,
+        request_logprobs: request_logprobs,
+        top_logprobs: top_logprobs,
+        params: params
+      )
     end
 
     # POST /v1/chat/completions (streaming)
@@ -31,45 +135,41 @@ module SimpleInference
     # Yields parsed JSON events from an OpenAI-style SSE stream (`text/event-stream`).
     #
     # If no block is given, returns an Enumerator.
-    def chat_completions_stream(params)
-      return enum_for(:chat_completions_stream, params) unless block_given?
-
-      unless params.is_a?(Hash)
-        raise Errors::ConfigurationError, "params must be a Hash"
-      end
+    def chat_completions_stream(**params)
+      return enum_for(:chat_completions_stream, **params) unless block_given?
 
       body = params.dup
       body.delete(:stream)
       body.delete("stream")
       body["stream"] = true
 
-      response = post_json_stream("/
+      response = post_json_stream(api_path("/chat/completions"), body) do |event|
        yield event
       end
 
-      content_type = response.
+      content_type = response.headers["content-type"].to_s
 
       # Streaming case: we already yielded events from the SSE stream.
-      if response
+      if response.status >= 200 && response.status < 300 && content_type.include?("text/event-stream")
        return response
       end
 
       # Fallback when upstream does not support streaming (this repo's server).
-      if streaming_unsupported_error?(response
+      if streaming_unsupported_error?(response.status, response.body)
        fallback_body = params.dup
        fallback_body.delete(:stream)
        fallback_body.delete("stream")
 
-        fallback_response = post_json("/
-        chunk = synthesize_chat_completion_chunk(fallback_response
+        fallback_response = post_json(api_path("/chat/completions"), fallback_body)
+        chunk = synthesize_chat_completion_chunk(fallback_response.body)
        yield chunk if chunk
        return fallback_response
       end
 
       # If we got a non-streaming success response (JSON), convert it into a single
       # chunk so streaming consumers can share the same code path.
-      if response
-        chunk = synthesize_chat_completion_chunk(response
+      if response.status >= 200 && response.status < 300
+        chunk = synthesize_chat_completion_chunk(response.body)
        yield chunk if chunk
       end
 
@@ -77,18 +177,27 @@ module SimpleInference
     end
 
     # POST /v1/embeddings
-    def embeddings(params)
-      post_json("/
+    def embeddings(**params)
+      post_json(api_path("/embeddings"), params)
     end
 
     # POST /v1/rerank
-    def rerank(params)
-      post_json("/
+    def rerank(**params)
+      post_json(api_path("/rerank"), params)
     end
 
     # GET /v1/models
     def list_models
-      get_json("/
+      get_json(api_path("/models"))
+    end
+
+    # Convenience wrapper for list_models.
+    #
+    # @return [Array<String>] model IDs
+    def models
+      response = list_models
+      data = response.body.is_a?(Hash) ? response.body["data"] : nil
+      Array(data).filter_map { |m| m.is_a?(Hash) ? m["id"] : nil }
     end
 
     # GET /health
@@ -99,8 +208,8 @@ module SimpleInference
     # Returns true when service is healthy, false otherwise.
     def healthy?
       response = get_json("/health", raise_on_http_error: false)
-      status_ok = response
-      body_status_ok = response.
+      status_ok = response.status == 200
+      body_status_ok = response.body.is_a?(Hash) && response.body["status"] == "ok"
       status_ok && body_status_ok
     rescue Errors::Error
       false
@@ -108,13 +217,13 @@ module SimpleInference
 
     # POST /v1/audio/transcriptions
     # params: { file: io_or_hash, model: "model-name", **audio_options }
-    def audio_transcriptions(params)
-      post_multipart("/
+    def audio_transcriptions(**params)
+      post_multipart(api_path("/audio/transcriptions"), params)
     end
 
     # POST /v1/audio/translations
-    def audio_translations(params)
-      post_multipart("/
+    def audio_translations(**params)
+      post_multipart(api_path("/audio/translations"), params)
     end
 
     private
@@ -123,6 +232,10 @@ module SimpleInference
       config.base_url
     end
 
+    def api_path(endpoint)
+      "#{config.api_prefix}#{endpoint}"
+    end
+
     def get_json(path, params: nil, raise_on_http_error: nil)
       full_path = with_query(path, params)
       request_json(
@@ -199,31 +312,26 @@ module SimpleInference
          consume_sse_buffer!(buffer, &on_event)
        end
 
-        return
-          status: status,
-          headers: headers,
-          body: nil,
-        }
+        return Response.new(status: status, headers: headers, body: nil)
       end
 
       # Non-streaming response path (adapter doesn't support streaming or server returned JSON).
       should_parse_json = content_type.include?("json")
-      parsed_body =
-
-
-
-
-
-
-
-
+      parsed_body =
+        if should_parse_json
+          begin
+            parse_json(body_str)
+          rescue Errors::DecodeError
+            # Prefer HTTPError over DecodeError for non-2xx responses.
+            status >= 200 && status < 300 ? raise : body_str
+          end
+        else
+          body_str
+        end
 
-
-
-
-        body: parsed_body,
-      }
+      response = Response.new(status: status, headers: headers, body: parsed_body, raw_body: body_str)
+      maybe_raise_http_error(response: response, raise_on_http_error: raise_on_http_error, ignore_streaming_unsupported: true)
+      response
     rescue Timeout::Error => e
       raise Errors::TimeoutError, e.message
     rescue SocketError, SystemCallError => e
@@ -575,13 +683,6 @@ module SimpleInference
       headers = (response[:headers] || {}).transform_keys { |k| k.to_s.downcase }
       body = response[:body].to_s
 
-      maybe_raise_http_error(
-        status: status,
-        headers: headers,
-        body_str: body,
-        raise_on_http_error: raise_on_http_error
-      )
-
       should_parse_json =
        if expect_json.nil?
          content_type = headers["content-type"]
@@ -592,16 +693,19 @@ module SimpleInference
 
       parsed_body =
        if should_parse_json
-
+          begin
+            parse_json(body)
+          rescue Errors::DecodeError
+            # Prefer HTTPError over DecodeError for non-2xx responses.
+            status >= 200 && status < 300 ? raise : body
+          end
        else
          body
        end
 
-
-
-
-        body: parsed_body,
-      }
+      response = Response.new(status: status, headers: headers, body: parsed_body, raw_body: body)
+      maybe_raise_http_error(response: response, raise_on_http_error: raise_on_http_error)
+      response
     rescue Timeout::Error => e
       raise Errors::TimeoutError, e.message
     rescue SocketError, SystemCallError => e
@@ -644,26 +748,17 @@ module SimpleInference
       end
     end
 
-    def maybe_raise_http_error(
-      status:,
-      headers:,
-      body_str:,
-      raise_on_http_error:,
-      ignore_streaming_unsupported: false,
-      parsed_body: nil
-    )
+    def maybe_raise_http_error(response:, raise_on_http_error:, ignore_streaming_unsupported: false)
       return unless raise_on_http_error?(raise_on_http_error)
-      return
+      return if response.success?
 
       # Do not raise for the known "streaming unsupported" case; the caller will
       # perform a non-streaming retry fallback.
-      return if ignore_streaming_unsupported && streaming_unsupported_error?(status,
+      return if ignore_streaming_unsupported && streaming_unsupported_error?(response.status, response.body)
 
       raise Errors::HTTPError.new(
-        http_error_message(status,
-
-        headers: headers,
-        body: body_str
+        http_error_message(response.status, response.raw_body.to_s, parsed_body: response.body),
+        response: response
       )
     end
   end
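With 0.1.5, all endpoint helpers take keyword arguments and return `SimpleInference::Response` objects, and `models` is a thin convenience wrapper over `list_models`. A minimal sketch of the resulting call shapes (the base URL and model name below are placeholders, not values from this diff):

```ruby
require "simple_inference"

# Placeholder endpoint; any OpenAI-compatible server works.
client = SimpleInference::Client.new(base_url: "http://localhost:8000")

response = client.chat_completions(
  model: "gpt-4o-mini", # placeholder model id
  messages: [{ "role" => "user", "content" => "Hello" }]
)

response.status   # => Integer HTTP status
response.success? # => true for 2xx
response.body     # => parsed JSON Hash

client.models     # => Array of model id Strings, extracted from GET <api_prefix>/models
```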
data/lib/simple_inference/config.rb
CHANGED

@@ -4,6 +4,7 @@ module SimpleInference
   class Config
     attr_reader :base_url,
                 :api_key,
+                :api_prefix,
                 :timeout,
                 :open_timeout,
                 :read_timeout,
@@ -19,6 +20,10 @@ module SimpleInference
       @api_key = (opts[:api_key] || ENV["SIMPLE_INFERENCE_API_KEY"]).to_s
       @api_key = nil if @api_key.empty?
 
+      @api_prefix = normalize_api_prefix(
+        opts.key?(:api_prefix) ? opts[:api_prefix] : ENV.fetch("SIMPLE_INFERENCE_API_PREFIX", "/v1")
+      )
+
       @timeout = to_float_or_nil(opts[:timeout] || ENV["SIMPLE_INFERENCE_TIMEOUT"])
       @open_timeout = to_float_or_nil(opts[:open_timeout] || ENV["SIMPLE_INFERENCE_OPEN_TIMEOUT"])
       @read_timeout = to_float_or_nil(opts[:read_timeout] || ENV["SIMPLE_INFERENCE_READ_TIMEOUT"])
@@ -46,6 +51,17 @@ module SimpleInference
       url.chomp("/")
     end
 
+    def normalize_api_prefix(value)
+      return "" if value.nil?
+
+      prefix = value.to_s.strip
+      return "" if prefix.empty?
+
+      # Ensure it starts with / and does not end with /
+      prefix = "/#{prefix}" unless prefix.start_with?("/")
+      prefix.chomp("/")
+    end
+
     def to_float_or_nil(value)
       return nil if value.nil? || value == ""
 
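The new `api_prefix` option is normalized before use: a leading slash is added when missing, a trailing slash is stripped, and `nil` or an empty string disables the prefix entirely. An illustrative sketch of how `normalize_api_prefix` combines with the client's `api_path` helper (the mappings in the comments follow the code above; the base URL is the one from the README's Volcengine example):

```ruby
# Illustrative only:
# api_prefix: "/v1" (default) -> POST {base_url}/v1/chat/completions
# api_prefix: "v1"            -> normalized to "/v1" (leading slash added)
# api_prefix: "/v1/"          -> normalized to "/v1" (trailing slash removed)
# api_prefix: "" or nil       -> no prefix, POST {base_url}/chat/completions

client = SimpleInference::Client.new(
  base_url: "https://ark.cn-beijing.volces.com/api/v3",
  api_prefix: "" # provider exposes /chat/completions without a /v1 prefix
)
```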
data/lib/simple_inference/errors.rb
CHANGED

@@ -7,14 +7,20 @@ module SimpleInference
     class ConfigurationError < Error; end
 
     class HTTPError < Error
-      attr_reader :
+      attr_reader :response
 
-      def initialize(message,
+      def initialize(message, response:)
        super(message)
-        @
-        @headers = headers
-        @body = body
+        @response = response
       end
+
+      def status = @response.status
+
+      def headers = @response.headers
+
+      def body = @response.body
+
+      def raw_body = @response.raw_body
     end
 
     class TimeoutError < Error; end
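`Errors::HTTPError` now wraps the full `Response` instead of carrying separate status, headers, and body fields; the old readers keep working because they delegate to the wrapped response. A small sketch, assuming a `client` built with the default `raise_on_error: true` and a request that fails (the model name is a placeholder):

```ruby
begin
  client.chat_completions(model: "nonexistent-model", messages: [])
rescue SimpleInference::Errors::HTTPError => e
  e.status    # delegates to e.response.status
  e.headers   # delegates to e.response.headers
  e.body      # parsed error body (Hash/Array/String)
  e.raw_body  # raw body String, when available
  e.response  # the underlying SimpleInference::Response (new in this release)
end
```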
data/lib/simple_inference/openai.rb
ADDED

@@ -0,0 +1,178 @@
+# frozen_string_literal: true
+
+module SimpleInference
+  # Helpers for extracting common fields from OpenAI-compatible `chat/completions` payloads.
+  #
+  # These helpers accept either:
+  # - A `SimpleInference::Response`, or
+  # - A parsed `body` / `chunk` hash (typically from JSON.parse, with String keys)
+  #
+  # Providers are "OpenAI-compatible", but many differ in subtle ways:
+  # - Some return `choices[0].text` instead of `choices[0].message.content`
+  # - Some represent `content` as an array or structured hash
+  #
+  # This module normalizes those shapes so application code can stay small and predictable.
+  module OpenAI
+    module_function
+
+    ChatResult =
+      Struct.new(
+        :content,
+        :usage,
+        :finish_reason,
+        :logprobs,
+        :response,
+        keyword_init: true
+      )
+
+    # Enumerable wrapper for streaming chat responses.
+    #
+    # @example
+    #   stream = client.chat_stream(model: "...", messages: [...], include_usage: true)
+    #   stream.each { |delta| print delta }
+    #   p stream.result.usage
+    class ChatStream
+      include Enumerable
+
+      attr_reader :result
+
+      def initialize(client:, model:, messages:, include_usage:, request_logprobs:, top_logprobs:, params:)
+        @client = client
+        @model = model
+        @messages = messages
+        @include_usage = include_usage
+        @request_logprobs = request_logprobs
+        @top_logprobs = top_logprobs
+        @params = params
+        @started = false
+        @result = nil
+      end
+
+      def each
+        return enum_for(:each) unless block_given?
+        raise Errors::ConfigurationError, "ChatStream can only be consumed once" if @started
+
+        @started = true
+        @result =
+          @client.chat(
+            model: @model,
+            messages: @messages,
+            stream: true,
+            include_usage: @include_usage,
+            request_logprobs: @request_logprobs,
+            top_logprobs: @top_logprobs,
+            **(@params || {})
+          ) { |delta| yield delta }
+      end
+    end
+
+    # Extract assistant content from a non-streaming chat completion.
+    #
+    # @param response_or_body [Hash] SimpleInference response hash or parsed body hash
+    # @return [String, nil]
+    def chat_completion_content(response_or_body)
+      body = unwrap_body(response_or_body)
+      choice = first_choice(body)
+      return nil unless choice
+
+      raw =
+        choice.dig("message", "content") ||
+        choice["text"]
+
+      normalize_content(raw)
+    end
+
+    # Extract finish_reason from a non-streaming chat completion.
+    #
+    # @param response_or_body [Hash] SimpleInference response hash or parsed body hash
+    # @return [String, nil]
+    def chat_completion_finish_reason(response_or_body)
+      body = unwrap_body(response_or_body)
+      first_choice(body)&.[]("finish_reason")
+    end
+
+    # Extract usage from a chat completion response or a final streaming chunk.
+    #
+    # @param response_or_body [Hash] SimpleInference response hash, body hash, or chunk hash
+    # @return [Hash, nil] symbol-keyed usage hash
+    def chat_completion_usage(response_or_body)
+      body = unwrap_body(response_or_body)
+      usage = body.is_a?(Hash) ? body["usage"] : nil
+      return nil unless usage.is_a?(Hash)
+
+      {
+        prompt_tokens: usage["prompt_tokens"],
+        completion_tokens: usage["completion_tokens"],
+        total_tokens: usage["total_tokens"],
+      }.compact
+    end
+
+    # Extract logprobs (if present) from a non-streaming chat completion.
+    #
+    # @param response_or_body [Hash] SimpleInference response hash or parsed body hash
+    # @return [Array<Hash>, nil]
+    def chat_completion_logprobs(response_or_body)
+      body = unwrap_body(response_or_body)
+      first_choice(body)&.dig("logprobs", "content")
+    end
+
+    # Extract delta content from a streaming `chat.completion.chunk`.
+    #
+    # @param chunk [Hash] parsed streaming event hash
+    # @return [String, nil]
+    def chat_completion_chunk_delta(chunk)
+      chunk = unwrap_body(chunk)
+      return nil unless chunk.is_a?(Hash)
+
+      raw = chunk.dig("choices", 0, "delta", "content")
+      normalize_content(raw)
+    end
+
+    # Normalize `content` shapes into a simple String.
+    #
+    # Supports strings, arrays of parts, and part hashes.
+    #
+    # @param value [Object]
+    # @return [String, nil]
+    def normalize_content(value)
+      case value
+      when String
+        value
+      when Array
+        value.map { |part| normalize_content(part) }.join
+      when Hash
+        value["text"] ||
+          value["content"] ||
+          value.to_s
+      when nil
+        nil
+      else
+        value.to_s
+      end
+    end
+
+    # Unwrap a full SimpleInference response into its `:body`, otherwise return the object.
+    #
+    # @param obj [Object]
+    # @return [Object]
+    def unwrap_body(obj)
+      return {} unless obj
+      return obj.body || {} if obj.respond_to?(:body)
+
+      obj
+    end
+
+    def first_choice(body)
+      return nil unless body.is_a?(Hash)
+
+      choices = body["choices"]
+      return nil unless choices.is_a?(Array) && !choices.empty?
+
+      choice0 = choices[0]
+      return nil unless choice0.is_a?(Hash)
+
+      choice0
+    end
+    private_class_method :first_choice
+  end
+end
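The helpers above accept either a `Response` or an already-parsed hash, and flatten the various `content` shapes providers return. A hand-written payload (illustrative, not a real provider response) shows the normalization:

```ruby
body = {
  "choices" => [
    {
      "message" => { "content" => [{ "type" => "text", "text" => "Hel" }, { "text" => "lo" }] },
      "finish_reason" => "stop"
    }
  ],
  "usage" => { "prompt_tokens" => 3, "completion_tokens" => 2, "total_tokens" => 5 }
}

SimpleInference::OpenAI.chat_completion_content(body)       # => "Hello" (array parts are joined)
SimpleInference::OpenAI.chat_completion_finish_reason(body) # => "stop"
SimpleInference::OpenAI.chat_completion_usage(body)         # => { prompt_tokens: 3, completion_tokens: 2, total_tokens: 5 }
```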
data/lib/simple_inference/response.rb
ADDED

@@ -0,0 +1,28 @@
+# frozen_string_literal: true
+
+module SimpleInference
+  # A lightweight wrapper for HTTP responses returned by SimpleInference.
+  #
+  # - `status` is an Integer HTTP status code
+  # - `headers` is a Hash with downcased String keys
+  # - `body` is a parsed JSON Hash/Array, a String, or nil (e.g. SSE streaming success)
+  # - `raw_body` is the raw response body String (when available)
+  class Response
+    attr_reader :status, :headers, :body, :raw_body
+
+    def initialize(status:, headers:, body:, raw_body: nil)
+      @status = status.to_i
+      @headers = (headers || {}).transform_keys { |k| k.to_s.downcase }
+      @body = body
+      @raw_body = raw_body
+    end
+
+    def success?
+      status >= 200 && status < 300
+    end
+
+    def to_h
+      { status: status, headers: headers, body: body, raw_body: raw_body }
+    end
+  end
+end
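`Response` is the value object every client call now returns; header keys are downcased on construction and `success?` covers the 2xx range. A small sketch constructing one by hand (normally the client builds these internally):

```ruby
res = SimpleInference::Response.new(
  status: 200,
  headers: { "Content-Type" => "application/json" },
  body: { "status" => "ok" },
  raw_body: '{"status":"ok"}'
)

res.headers["content-type"] # => "application/json" (keys are downcased)
res.success?                # => true
res.to_h                    # => { status: 200, headers: {...}, body: {...}, raw_body: "..." }
```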
data/lib/simple_inference.rb
CHANGED
@@ -4,6 +4,8 @@ require_relative "simple_inference/version"
 require_relative "simple_inference/config"
 require_relative "simple_inference/errors"
 require_relative "simple_inference/http_adapter"
+require_relative "simple_inference/response"
+require_relative "simple_inference/openai"
 require_relative "simple_inference/client"
 
 module SimpleInference
data/sig/simple_inference.rbs
CHANGED
@@ -1,4 +1,71 @@
 module SimpleInference
   VERSION: String
-end
 
+  class Response
+    attr_reader status: Integer
+    attr_reader headers: Hash[String, untyped]
+    attr_reader body: untyped
+    attr_reader raw_body: String?
+
+    def initialize: (status: Integer, headers: Hash[untyped, untyped], body: untyped, ?raw_body: String?) -> void
+    def success?: () -> bool
+    def to_h: () -> Hash[Symbol, untyped]
+  end
+
+  module OpenAI
+    class ChatResult
+      attr_reader content: String?
+      attr_reader usage: Hash[Symbol, untyped]?
+      attr_reader finish_reason: String?
+      attr_reader logprobs: Array[Hash[untyped, untyped]]?
+      attr_reader response: Response
+    end
+
+    class ChatStream
+      include Enumerable[String]
+      attr_reader result: ChatResult?
+    end
+
+    def self.chat_completion_content: (untyped) -> String?
+    def self.chat_completion_finish_reason: (untyped) -> String?
+    def self.chat_completion_usage: (untyped) -> Hash[Symbol, untyped]?
+    def self.chat_completion_logprobs: (untyped) -> Array[Hash[untyped, untyped]]?
+    def self.chat_completion_chunk_delta: (untyped) -> String?
+    def self.normalize_content: (untyped) -> String?
+  end
+
+  class Client
+    def initialize: (?Hash[untyped, untyped]) -> void
+
+    def chat: (
+      model: String,
+      messages: Array[Hash[untyped, untyped]],
+      ?stream: bool?,
+      ?include_usage: bool?,
+      ?request_logprobs: bool,
+      ?top_logprobs: Integer?,
+      **untyped
+    ) { (String) -> void } -> OpenAI::ChatResult
+
+    def chat_stream: (
+      model: String,
+      messages: Array[Hash[untyped, untyped]],
+      ?include_usage: bool?,
+      ?request_logprobs: bool,
+      ?top_logprobs: Integer?,
+      **untyped
+    ) -> OpenAI::ChatStream
+
+    def chat_completions: (**untyped) -> Response
+    def chat_completions_stream: (**untyped) { (Hash[untyped, untyped]) -> void } -> Response
+
+    def embeddings: (**untyped) -> Response
+    def rerank: (**untyped) -> Response
+    def list_models: () -> Response
+    def models: () -> Array[String]
+    def health: () -> Response
+    def healthy?: () -> bool
+    def audio_transcriptions: (**untyped) -> Response
+    def audio_translations: (**untyped) -> Response
+  end
+end
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: simple_inference
 version: !ruby/object:Gem::Version
-  version: 0.1.
+  version: 0.1.5
 platform: ruby
 authors:
 - jasl
@@ -9,8 +9,8 @@ bindir: exe
 cert_chain: []
 date: 1980-01-02 00:00:00.000000000 Z
 dependencies: []
-description: Fiber-friendly Ruby client for
-  audio, rerank, health).
+description: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
+  (chat, embeddings, audio, rerank, health).
 email:
 - jasl9187@hotmail.com
 executables: []
@@ -27,15 +27,16 @@ files:
 - lib/simple_inference/http_adapter.rb
 - lib/simple_inference/http_adapters/default.rb
 - lib/simple_inference/http_adapters/httpx.rb
+- lib/simple_inference/openai.rb
+- lib/simple_inference/response.rb
 - lib/simple_inference/version.rb
 - sig/simple_inference.rbs
-homepage: https://github.com/jasl/
+homepage: https://github.com/jasl/simple_inference.rb
 licenses:
 - MIT
 metadata:
   allowed_push_host: https://rubygems.org
-  homepage_uri: https://github.com/jasl/
-  source_code_uri: https://github.com/jasl/simple_inference_server
+  homepage_uri: https://github.com/jasl/simple_inference.rb
   rubygems_mfa_required: 'true'
 rdoc_options: []
 require_paths:
@@ -51,7 +52,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
   version: '0'
 requirements: []
-rubygems_version: 4.0.
+rubygems_version: 4.0.3
 specification_version: 4
-summary: Fiber-friendly Ruby client for
+summary: A lightweight, Fiber-friendly Ruby client for OpenAI-compatible LLM APIs.
 test_files: []