llm.rb 4.12.0 → 4.13.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +38 -0
- data/README.md +124 -741
- data/lib/llm/context.rb +2 -2
- data/lib/llm/function.rb +1 -1
- data/lib/llm/mcp/error.rb +31 -1
- data/lib/llm/mcp/rpc.rb +8 -3
- data/lib/llm/mcp.rb +41 -0
- data/lib/llm/providers/openai/request_adapter/respond.rb +11 -5
- data/lib/llm/providers/openai/response_adapter/responds.rb +13 -1
- data/lib/llm/providers/openai/responses/stream_parser.rb +31 -0
- data/lib/llm/version.rb +1 -1
- data/llm.gemspec +16 -6
- metadata +17 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7847fee7ea1e63553ad5323750fc2e5ac1b4a9082c2f4c5aba71f4587440ea75
+  data.tar.gz: e63bdae085b2f0f606cbdb4633a7eff93fd6e2428fcb85ff5fe94fc78851bf5d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b1c8d8600b3214da5613d152677d13fde796b42e6a29cf8af035e4ad5f28b7cea0466a375b9b444a748e9e063d2e6ad6720b653609cb2b7038e8040cd2b44e39
+  data.tar.gz: c76882f9cd5416312e26f4e25493403df8f9f8c61ee14cba5096383b449bd7a4ce8b9d70834d12176648c3d9206f0f555a1eec4b22bdb6426d88c0c36c8ed592
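The checksums above can be verified locally before trusting a downloaded artifact. A minimal sketch using only Ruby's standard library (the file path and helper name are illustrative, not part of the gem):

```ruby
# Sketch: compare a file's SHA-256 against a published hex digest.
# Uses only Ruby's stdlib Digest module.
require "digest"

# Returns true when the file's SHA-256 matches the expected hex digest.
def checksum_ok?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256
end
```

The same approach works for the SHA-512 entries via `Digest::SHA512.file`.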
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,43 @@
 # Changelog
 
+## Unreleased
+
+Changes since `v4.13.0`.
+
+## v4.13.0
+
+Changes since `v4.12.0`.
+
+This release expands MCP prompt support, improves reasoning support in the
+OpenAI Responses API, and refreshes the docs around llm.rb's runtime model,
+contexts, and advanced workflows.
+
+### Add
+
+- Add `LLM::MCP#prompts` and `LLM::MCP#find_prompt` for MCP prompt support.
+
+### Change
+
+- Rework the README around llm.rb as a runtime for AI systems.
+- Add a dedicated deep dive guide for providers, contexts, persistence,
+  tools, agents, MCP, tracing, multimodal prompts, and retrieval.
+
+### Fix
+
+All of these fixes apply to MCP:
+
+- fix(mcp): raise `LLM::MCP::MismatchError` on mismatched response ids.
+- fix(mcp): normalize prompt message content while preserving the original payload.
+
+All of these fixes apply to OpenAI's Responses API:
+
+- fix(openai): emit `on_reasoning_content` for streamed reasoning summaries.
+- fix(openai): skip `previous_response_id` on `store: false` follow-up calls.
+- fix(openai): fall back to an empty object schema for tools without params.
+- fix(openai): preserve original tool-call payloads on re-sent assistant tool messages.
+- fix(openai): emit `output_text` for assistant-authored response content.
+- fix(openai): return `nil` for `system_fingerprint` on normalized response objects.
+
 ## v4.12.0
 
 Changes since `v4.11.1`.
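The first MCP fix above is JSON-RPC bookkeeping: a response must echo the `id` of the request it answers, and a mismatch now raises `LLM::MCP::MismatchError`. A minimal stdlib sketch of that check (the class and method names here are illustrative simplifications, not llm.rb's internals):

```ruby
# Sketch: correlate a JSON-RPC 2.0 response with its request by id.
require "json"

# Stand-in for LLM::MCP::MismatchError; name here is illustrative.
class MismatchError < StandardError; end

# Parse a JSON-RPC response and raise when its id does not match the
# id of the request we sent.
def check_response!(request_id, response_json)
  response = JSON.parse(response_json)
  unless response["id"] == request_id
    raise MismatchError, "expected id=#{request_id}, got id=#{response["id"].inspect}"
  end
  response
end
```

Failing loudly on a mismatched id prevents a late or out-of-order response from being silently attributed to the wrong tool call.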
data/README.md
CHANGED
@@ -4,155 +4,148 @@
 <p align="center">
   <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
   <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.
+  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.13.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About
 
-llm.rb is a
-[… lines 13-108 of the old "About" section; content not captured in this diff view …]
+llm.rb is a runtime for building AI systems that integrate directly with your
+application. It is not just an API wrapper. It provides a unified execution
+model for providers, tools, MCP servers, streaming, schemas, files, and
+state.
+
+It is built for engineers who want control over how these systems run. llm.rb
+stays close to Ruby, runs on the standard library by default, loads optional
+pieces only when needed, and remains easy to extend. It also works well in
+Rails or ActiveRecord applications, where a small wrapper around context
+persistence is enough to save and restore long-lived conversation state across
+requests, jobs, or retries.
+
+Most LLM libraries stop at request/response APIs. Building real systems means
+stitching together streaming, tools, state, persistence, and external
+services by hand. llm.rb provides a single execution model for all of these,
+so they compose naturally instead of becoming separate subsystems.
+
+## Architecture
+
+```
+External MCP      Internal MCP      OpenAPI / REST
+     │                 │                  │
+     └──────── Tools / MCP Layer ────────┘
+                       │
+                llm.rb Contexts
+                       │
+                 LLM Providers
+           (OpenAI, Anthropic, etc.)
+                       │
+               Your Application
+```
+
+## Core Concept
+
+`LLM::Context` is the execution boundary in llm.rb.
+
+It holds:
+- message history
+- tool state
+- schemas
+- streaming configuration
+- usage and cost tracking
+
+Instead of switching abstractions for each feature, everything builds on the
+same context object.
+
+## Differentiators
+
+### Execution Model
+
+- **A system layer, not just an API wrapper**
+  Put providers, tools, MCP servers, and application APIs behind one runtime
+  model instead of stitching them together by hand.
+- **Contexts are central**
+  Keep history, tools, schema, usage, persistence, and execution state in one
+  place instead of spreading them across your app.
+- **Contexts can be serialized**
+  Save and restore live state for jobs, databases, retries, or long-running
+  workflows.
+
+### Runtime Behavior
+
+- **Streaming and tool execution work together**
+  Start tool work while output is still streaming so you can hide latency
+  instead of waiting for turns to finish.
+- **Concurrency is a first-class feature**
+  Use threads, fibers, or async tasks without rewriting your tool layer.
+- **Advanced workloads are built in, not bolted on**
+  Streaming, concurrent tool execution, persistence, tracing, and MCP support
+  all fit the same runtime model.
+
+### Integration
+
+- **MCP is built in**
+  Connect to MCP servers over stdio or HTTP without bolting on a separate
+  integration stack.
+- **Tools are explicit**
+  Run local tools, provider-native tools, and MCP tools through the same path
+  with fewer special cases.
+- **Providers are normalized, not flattened**
+  Share one API surface across providers without losing access to provider-
+  specific capabilities where they matter.
+- **Local model metadata is included**
+  Model capabilities, pricing, and limits are available locally without extra
+  API calls.
+
+### Design Philosophy
+
+- **Runs on the stdlib**
+  Start with Ruby's standard library and add extra dependencies only when you
+  need them.
+- **It is highly pluggable**
+  Add tools, swap providers, change JSON backends, plug in tracing, or layer
+  internal APIs and MCP servers into the same execution path.
+- **It scales from scripts to long-lived systems**
+  The same primitives work for one-off scripts, background jobs, and more
+  demanding application workloads with streaming, persistence, and tracing.
+- **Thread boundaries are clear**
+  Providers are shareable. Contexts are stateful and should stay thread-local.
 
 ## Capabilities
 
-llm.rb provides a complete set of primitives for building LLM-powered systems:
-
 - **Chat & Contexts** — stateless and stateful interactions with persistence
-- **
-- **
-- **Tool Calling** —
-- **Run Tools While Streaming** —
+- **Context Serialization** — save and restore state across processes or time
+- **Streaming** — visible output, reasoning output, tool-call events
+- **Tool Calling** — class-based tools and closure-based functions
+- **Run Tools While Streaming** — overlap model output with tool latency
 - **Concurrent Execution** — threads, async tasks, and fibers
-- **Agents** — reusable
-- **Structured Outputs** — JSON
-- **
+- **Agents** — reusable assistants with tool auto-execution
+- **Structured Outputs** — JSON Schema-based responses
+- **Responses API** — stateful response workflows where providers support them
+- **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
 - **Multimodal Inputs** — text, images, audio, documents, URLs
-- **Audio** —
+- **Audio** — speech generation, transcription, translation
 - **Images** — generation and editing
 - **Files API** — upload and reference files in prompts
 - **Embeddings** — vector generation for search and RAG
-- **Vector Stores** —
-- **Cost Tracking** —
+- **Vector Stores** — retrieval workflows
+- **Cost Tracking** — local cost estimation without extra API calls
 - **Observability** — tracing, logging, telemetry
 - **Model Registry** — local metadata for capabilities, limits, pricing
+- **Persistent HTTP** — optional connection pooling for providers and MCP
 
-##
-
-These examples show individual features, but llm.rb is designed to combine
-them into full systems where LLMs, tools, and external services operate
-together.
-
-#### Simple Streaming
+## Installation
 
-
-
-
+```bash
+gem install llm.rb
+```
 
-
-[`LLM::Stream`](lib/llm/stream.rb). See [Advanced Streaming](#advanced-streaming)
-for a structured callback-based example. Basic `#<<` streams only receive
-visible output chunks:
+## Example
 
 ```ruby
-#!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
 ctx = LLM::Context.new(llm, stream: $stdout)
+
 loop do
   print "> "
   ctx.talk(STDIN.gets || break)
@@ -160,623 +153,13 @@ loop do
 end
 ```
 
-
-
-The `LLM::Schema` system lets you define JSON schemas for structured outputs.
-Schemas can be defined as classes with `property` declarations or built
-programmatically using a fluent interface. When you pass a schema to a context,
-llm.rb adapts it into the provider's structured-output format when that
-provider supports one. The `content!` method then parses the assistant's JSON
-response into a Ruby object:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-require "pp"
-
-class Report < LLM::Schema
-  property :category, Enum["performance", "security", "outage"], "Report category", required: true
-  property :summary, String, "Short summary", required: true
-  property :impact, OneOf[String, Integer], "Primary impact, as text or a count", required: true
-  property :services, Array[String], "Impacted services", required: true
-  property :timestamp, String, "When it happened", optional: true
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, schema: Report)
-res = ctx.talk("Structure this report: 'Database latency spiked at 10:42 UTC, causing 5% request timeouts for 12 minutes.'")
-pp res.content!
-
-# {
-#   "category" => "performance",
-#   "summary" => "Database latency spiked, causing 5% request timeouts for 12 minutes.",
-#   "impact" => "5% request timeouts",
-#   "services" => ["Database"],
-#   "timestamp" => "2024-06-05T10:42:00Z"
-# }
-```
-
-#### Tool Calling
-
-Tools in llm.rb can be defined as classes inheriting from `LLM::Tool` or as
-closures using `LLM.function`. When the LLM requests a tool call, the context
-stores `Function` objects in `ctx.functions`. The `call()` method executes all
-pending functions and returns their results to the LLM. Tools describe
-structured parameters with JSON Schema and adapt those definitions to each
-provider's tool-calling format (OpenAI, Anthropic, Google, etc.):
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-class System < LLM::Tool
-  name "system"
-  description "Run a shell command"
-  param :command, String, "Command to execute", required: true
-
-  def call(command:)
-    {success: system(command)}
-  end
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: $stdout, tools: [System])
-ctx.talk("Run `date`.")
-ctx.talk(ctx.call(:functions)) while ctx.functions.any?
-```
-
-#### Concurrent Tools
-
-llm.rb provides explicit concurrency control for tool execution. The
-`wait(:thread)` method spawns each pending function in its own thread and waits
-for all to complete. You can also use `:fiber` for cooperative multitasking or
-`:task` for async/await patterns (requires the `async` gem). The context
-automatically collects all results and reports them back to the LLM in a
-single turn, maintaining conversation flow while parallelizing independent
-operations:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: $stdout, tools: [FetchWeather, FetchNews, FetchStock])
-
-# Execute multiple independent tools concurrently
-ctx.talk("Summarize the weather, headlines, and stock price.")
-ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
-```
-
-#### Advanced Streaming
-
-Use [`LLM::Stream`](lib/llm/stream.rb) when you want more than plain `#<<`
-output. It adds structured streaming callbacks for:
-
-- `on_content` for visible assistant output
-- `on_reasoning_content` for separate reasoning output
-- `on_tool_call` for streamed tool-call notifications
-- `on_tool_return` for completed tool execution
-
-Subclass [`LLM::Stream`](lib/llm/stream.rb) when you want callbacks like
-`on_reasoning_content`, `on_tool_call`, and `on_tool_return`, or helpers like
-`queue` and `wait`.
-
-Keep `on_content`, `on_reasoning_content`, and `on_tool_call` fast: they run
-inline with the streaming parser. `on_tool_return` is different: it runs later,
-when `wait` resolves queued streamed tool work.
-
-`on_tool_call` lets tools start before the model finishes its turn, for
-example with `tool.spawn(:thread)`, `tool.spawn(:fiber)`, or
-`tool.spawn(:task)`. That can overlap tool latency with streaming output.
-`on_tool_return` is the place to react when that queued work completes, for
-example by updating progress UIs, logging tool latency, or changing visible
-state from "Running tool ..." to "Finished tool ...".
-
-If a stream cannot resolve a tool, `on_tool_call` receives `error` as an
-`LLM::Function::Return`. That keeps the session alive and leaves control in
-the callback: it can send `error`, spawn the tool when `error == nil`, or
-handle the situation however it sees fit.
-
-In normal use this should be rare, since `on_tool_call` is usually called with
-a resolved tool and `error == nil`. To resolve a tool call, the tool must be
-found in `LLM::Function.registry`. That covers `LLM::Tool` subclasses,
-including MCP tools, but not `LLM.function` closures, which are excluded
-because they may be bound to local state:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-# Assume `System < LLM::Tool` is already defined.
-
-class Stream < LLM::Stream
-  def on_content(content)
-    $stdout << content
-  end
-
-  def on_reasoning_content(content)
-    $stderr << content
-  end
-
-  def on_tool_call(tool, error)
-    $stdout << "Running tool #{tool.name}\n"
-    queue << (error || tool.spawn(:thread))
-  end
-
-  def on_tool_return(tool, ret)
-    $stdout << (ret.error? ? "Tool #{tool.name} failed\n" : "Finished tool #{tool.name}\n")
-  end
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: Stream.new, tools: [System])
-
-ctx.talk("Run `date` and `uname -a`.")
-while ctx.functions.any?
-  ctx.talk(ctx.wait(:thread))
-end
-```
-
-#### MCP
-
-MCP is a first-class integration mechanism in llm.rb.
-
-MCP allows llm.rb to treat external services, internal APIs, and system
-capabilities as tools in a unified interface. This makes it possible to
-connect multiple MCP sources simultaneously and expose your own APIs as tools.
-
-In practice, this supports workflows such as external SaaS integrations,
-multiple MCP sources in the same context, and OpenAPI -> MCP -> tools
-pipelines for internal services.
-
-llm.rb integrates with the Model Context Protocol (MCP) to dynamically discover
-and use tools from external servers. This example starts a filesystem MCP
-server over stdio and makes its tools available to a context, enabling the LLM
-to interact with the local file system through a standardized interface.
-Use `LLM::MCP.stdio` or `LLM::MCP.http` when you want to make the transport
-explicit. Like `LLM::Context`, an MCP client is stateful and should remain
-isolated to a single thread:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-mcp = LLM::MCP.stdio(argv: ["npx", "-y", "@modelcontextprotocol/server-filesystem", Dir.pwd])
-
-begin
-  mcp.start
-  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
-  ctx.talk("List the directories in this project.")
-  ctx.talk(ctx.call(:functions)) while ctx.functions.any?
-ensure
-  mcp.stop
-end
-```
-
-You can also connect to an MCP server over HTTP. This is useful when the
-server already runs remotely and exposes MCP through a URL instead of a local
-process. If you expect repeated tool calls, use `persistent` to reuse a
-process-wide HTTP connection pool. This requires the optional
-`net-http-persistent` gem:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-mcp = LLM::MCP.http(
-  url: "https://api.githubcopilot.com/mcp/",
-  headers: {"Authorization" => "Bearer #{ENV.fetch("GITHUB_PAT")}"}
-).persistent
-
-begin
-  mcp.start
-  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
-  ctx.talk("List the available GitHub MCP toolsets.")
-  ctx.talk(ctx.call(:functions)) while ctx.functions.any?
-ensure
-  mcp.stop
-end
-```
-
-## Providers
-
-llm.rb supports multiple LLM providers with a unified API.
-All providers share the same context, tool, and concurrency interfaces, making
-it easy to switch between cloud and local models:
-
-- **OpenAI** (`LLM.openai`)
-- **Anthropic** (`LLM.anthropic`)
-- **Google** (`LLM.google`)
-- **DeepSeek** (`LLM.deepseek`)
-- **xAI** (`LLM.xai`)
-- **zAI** (`LLM.zai`)
-- **Ollama** (`LLM.ollama`)
-- **Llama.cpp** (`LLM.llamacpp`)
-
-## Production
-
-#### Ready for production
-
-llm.rb is designed for production use from the ground up:
-
-- **Thread-safe providers** - Share `LLM::Provider` instances across your application
-- **Thread-local contexts** - Keep `LLM::Context` instances thread-local for state isolation
-- **Cost tracking** - Know your spend before the bill arrives
-- **Observability** - Built-in tracing with OpenTelemetry support
-- **Persistence** - Save and restore contexts across processes
-- **Performance** - Swap JSON adapters and enable HTTP connection pooling
-- **Error handling** - Structured errors, not unpredictable exceptions
-
-#### Tracing
-
-llm.rb includes built-in tracers for local logging, OpenTelemetry, and
-LangSmith. Assign a tracer to a provider and all context requests and tool
-calls made through that provider will be instrumented. Tracers are local to
-the current fiber, so the same provider can use different tracers in different
-concurrent tasks without interfering with each other.
-
-Use the logger tracer when you want structured logs through Ruby's standard
-library:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
-
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-```
-
-Use the telemetry tracer when you want OpenTelemetry spans. This requires the
-`opentelemetry-sdk` gem, and exporters such as OTLP can be added separately:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Telemetry.new(llm)
-
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-pp llm.tracer.spans
-```
-
-Use the LangSmith tracer when you want LangSmith-compatible metadata and trace
-grouping on top of the telemetry tracer:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Langsmith.new(
-  llm,
-  metadata: {env: "dev"},
-  tags: ["chatbot"]
-)
-
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-```
-
-#### Thread Safety
-
-llm.rb uses Ruby's `Monitor` class to ensure thread safety at the provider
-level, allowing you to share a single provider instance across multiple threads
-while maintaining state isolation through thread-local contexts. This design
-enables efficient resource sharing while preventing race conditions in
-concurrent applications:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-# Thread-safe providers - create once, use everywhere
-llm = LLM.openai(key: ENV["KEY"])
-
-# Each thread should have its own context for state isolation
-Thread.new do
-  ctx = LLM::Context.new(llm) # Thread-local context
-  ctx.talk("Hello from thread 1")
-end
-
-Thread.new do
-  ctx = LLM::Context.new(llm) # Thread-local context
-  ctx.talk("Hello from thread 2")
-end
-```
-
-#### Performance Tuning
-
-llm.rb's JSON adapter system lets you swap JSON libraries for better
-performance in high-throughput applications. The library supports stdlib JSON,
-Oj, and Yajl, with Oj typically offering the best performance. Additionally,
-you can enable HTTP connection pooling using the optional `net-http-persistent`
-gem to reduce connection overhead in production environments:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-# Swap JSON libraries for better performance
-LLM.json = :oj # Use Oj for faster JSON parsing
-
-# Enable HTTP connection pooling for high-throughput applications
-llm = LLM.openai(key: ENV["KEY"]).persistent # Uses net-http-persistent when available
-```
-
-#### Model Registry
-
-llm.rb includes a local model registry that provides metadata about model
-capabilities, pricing, and limits without requiring API calls. The registry is
-shipped with the gem and sourced from https://models.dev, giving you access to
-up-to-date information about context windows, token costs, and supported
-modalities for each provider:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-# Access model metadata, capabilities, and pricing
-registry = LLM.registry_for(:openai)
-model_info = registry.limit(model: "gpt-4.1")
-puts "Context window: #{model_info.context} tokens"
-puts "Cost: $#{model_info.cost.input}/1M input tokens"
-```
-
-## More Examples
-
-#### Responses API
-
-llm.rb also supports OpenAI's Responses API through `LLM::Context` with
-`mode: :responses`. The important switch is `store:`. With `store: false`, the
-Responses API stays stateless while still using the Responses endpoint, which
-is useful for models or features that are only available through the Responses
-API. With `store: true`, OpenAI can keep
-response state server-side and reduce how much conversation state needs to be
-sent on each turn:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, mode: :responses, store: false)
-
-ctx.talk("Your task is to answer the user's questions", role: :developer)
-res = ctx.talk("What is the capital of France?")
-puts res.content
-```
-
-#### Context Persistence: Vanilla
-
-Contexts can be serialized and restored across process boundaries. A context
-can be serialized to JSON and stored on disk, in a database, in a job queue,
-or anywhere else your application needs to persist state:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-ctx.talk("Remember that my favorite language is Ruby")
-
-# Serialize to a string when you want to store the context yourself,
-# for example in a database row or job payload.
-payload = ctx.to_json
-
-restored = LLM::Context.new(llm)
-restored.restore(string: payload)
-res = restored.talk("What is my favorite language?")
-puts res.content
-
-# You can also persist the same state to a file:
-ctx.save(path: "context.json")
-restored = LLM::Context.new(llm)
-restored.restore(path: "context.json")
-```
-
-#### Context Persistence: ActiveRecord (Rails)
-
-In a Rails application, you can also wrap persisted context state in an
-ActiveRecord model. A minimal schema would include a `snapshot` column for the
-serialized context payload (`jsonb` is recommended) and a `provider` column
-for the provider name:
-
-```ruby
-create_table :contexts do |t|
-  t.jsonb :snapshot
-  t.string :provider, null: false
-  t.timestamps
-end
-```
-
-For example:
-
-```ruby
-class Context < ApplicationRecord
-  def talk(...)
-    ctx.talk(...).tap { flush }
-  end
-
-  def wait(...)
-    ctx.wait(...).tap { flush }
-  end
-
-  def messages
-    ctx.messages
-  end
-
-  def model
-    ctx.model
-  end
-
-  def flush
-    update_column(:snapshot, ctx.to_json)
-  end
-
-  private
-
-  def ctx
-    @ctx ||= begin
-      ctx = LLM::Context.new(llm)
-      ctx.restore(string: snapshot) if snapshot
-      ctx
-    end
-  end
-
-  def llm
-    LLM.method(provider).call(key: ENV.fetch(key))
-  end
-end
+## Resources
 
-
-
-
-
-
-
-#### Agents
-
-Agents in llm.rb are reusable, preconfigured assistants that automatically
-execute tool calls and maintain conversation state. Unlike contexts which
-require manual tool execution, agents automatically handle the tool call loop,
-making them ideal for autonomous workflows where you want the LLM to
-independently use available tools to accomplish tasks:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-class SystemAdmin < LLM::Agent
-  model "gpt-4.1"
-  instructions "You are a Linux system admin"
-  tools Shell
-  schema Result
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-agent = SystemAdmin.new(llm)
-res = agent.talk("Run 'date'")
-```
-
-#### Cost Tracking
-
-llm.rb provides built-in cost estimation that works without making additional
|
|
671
|
-
API calls. The cost tracking system uses the local model registry to calculate
|
|
672
|
-
estimated costs based on token usage, giving you visibility into spending
|
|
673
|
-
before bills arrive. This is particularly useful for monitoring usage in
|
|
674
|
-
production applications and setting budget alerts:
|
|
675
|
-
|
|
676
|
-
```ruby
|
|
677
|
-
#!/usr/bin/env ruby
|
|
678
|
-
require "llm"
|
|
679
|
-
|
|
680
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
681
|
-
ctx = LLM::Context.new(llm)
|
|
682
|
-
ctx.talk "Hello"
|
|
683
|
-
puts "Estimated cost so far: $#{ctx.cost}"
|
|
684
|
-
ctx.talk "Tell me a joke"
|
|
685
|
-
puts "Estimated cost so far: $#{ctx.cost}"
|
|
686
|
-
```
|
|
687
|
-
|
|
688
|
-
#### Multimodal Prompts
|
|
689
|
-
|
|
690
|
-
Contexts provide helpers for composing multimodal prompts from URLs, local
|
|
691
|
-
files, and provider-managed remote files. These tagged objects let providers
|
|
692
|
-
adapt the input into the format they expect:
|
|
693
|
-
|
|
694
|
-
```ruby
|
|
695
|
-
#!/usr/bin/env ruby
|
|
696
|
-
require "llm"
|
|
697
|
-
|
|
698
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
699
|
-
ctx = LLM::Context.new(llm)
|
|
700
|
-
|
|
701
|
-
res = ctx.talk ["Describe this image", ctx.image_url("https://example.com/cat.jpg")]
|
|
702
|
-
puts res.content
|
|
703
|
-
```
|
|
704
|
-
|
|
705
|
-
#### Audio Generation
|
|
706
|
-
|
|
707
|
-
llm.rb supports OpenAI's audio API for text-to-speech generation, allowing you
|
|
708
|
-
to create speech from text with configurable voices and output formats. The
|
|
709
|
-
audio API returns binary audio data that can be streamed directly to files or
|
|
710
|
-
other IO objects, enabling integration with multimedia applications:
|
|
711
|
-
|
|
712
|
-
```ruby
|
|
713
|
-
#!/usr/bin/env ruby
|
|
714
|
-
require "llm"
|
|
715
|
-
|
|
716
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
717
|
-
res = llm.audio.create_speech(input: "Hello world")
|
|
718
|
-
IO.copy_stream res.audio, File.join(Dir.home, "hello.mp3")
|
|
719
|
-
```
|
|
720
|
-
|
|
721
|
-
#### Image Generation
|
|
722
|
-
|
|
723
|
-
llm.rb provides access to OpenAI's DALL-E image generation API through a
|
|
724
|
-
unified interface. The API supports multiple response formats including
|
|
725
|
-
base64-encoded images and temporary URLs, with automatic handling of binary
|
|
726
|
-
data streaming for efficient file operations:
|
|
727
|
-
|
|
728
|
-
```ruby
|
|
729
|
-
#!/usr/bin/env ruby
|
|
730
|
-
require "llm"
|
|
731
|
-
|
|
732
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
733
|
-
res = llm.images.create(prompt: "a dog on a rocket to the moon")
|
|
734
|
-
IO.copy_stream res.images[0], File.join(Dir.home, "dogonrocket.png")
|
|
735
|
-
```
|
|
736
|
-
|
|
737
|
-
#### Embeddings
|
|
738
|
-
|
|
739
|
-
llm.rb's embedding API generates vector representations of text for semantic
|
|
740
|
-
search and retrieval-augmented generation (RAG) workflows. The API supports
|
|
741
|
-
batch processing of multiple inputs and returns normalized vectors suitable for
|
|
742
|
-
vector similarity operations, with consistent dimensionality across providers:
|
|
743
|
-
|
|
744
|
-
```ruby
|
|
745
|
-
#!/usr/bin/env ruby
|
|
746
|
-
require "llm"
|
|
747
|
-
|
|
748
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
749
|
-
res = llm.embed(["programming is fun", "ruby is a programming language", "sushi is art"])
|
|
750
|
-
puts res.class
|
|
751
|
-
puts res.embeddings.size
|
|
752
|
-
puts res.embeddings[0].size
|
|
753
|
-
|
|
754
|
-
# LLM::Response
|
|
755
|
-
# 3
|
|
756
|
-
# 1536
|
|
757
|
-
```
|
|
758
|
-
|
|
759
|
-
## Real-World Example: Relay
|
|
760
|
-
|
|
761
|
-
See how these pieces come together in a complete application architecture with
|
|
762
|
-
[Relay](https://github.com/llmrb/relay), a production-ready LLM application
|
|
763
|
-
built on llm.rb that demonstrates:
|
|
764
|
-
|
|
765
|
-
- Context management across requests
|
|
766
|
-
- Tool composition and execution
|
|
767
|
-
- Concurrent workflows
|
|
768
|
-
- Cost tracking and observability
|
|
769
|
-
- Production deployment patterns
|
|
770
|
-
|
|
771
|
-
Watch the screencast:
|
|
772
|
-
|
|
773
|
-
[](https://www.youtube.com/watch?v=x1K4wMeO_QA)
|
|
774
|
-
|
|
775
|
-
## Installation
|
|
776
|
-
|
|
777
|
-
```bash
|
|
778
|
-
gem install llm.rb
|
|
779
|
-
```
|
|
158
|
+
- [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) is the
|
|
159
|
+
examples guide.
|
|
160
|
+
- [_examples/relay](./_examples/relay) shows a real application built on top
|
|
161
|
+
of llm.rb.
|
|
162
|
+
- [doc site](https://0x1eef.github.io/x/llm.rb?rebuild=1) has the API docs.
|
|
780
163
|
|
|
781
164
|
## License
|
|
782
165
|
|
data/lib/llm/context.rb
CHANGED
@@ -103,9 +103,9 @@ module LLM
     # res = ctx.respond("What is the capital of France?")
     # puts res.output_text
     def respond(prompt, params = {})
-      res_id = @messages.find(&:assistant?)&.response&.response_id
-      params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
       params = @params.merge(params)
+      res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
+      params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
       res = @llm.responses.create(prompt, params)
      role = params[:role] || @llm.user_role
      @messages.concat LLM::Prompt === prompt ? prompt.to_a : [LLM::Message.new(role, prompt)]
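The `Context#respond` change above reorders the merge so that a caller's `store: false` can suppress `previous_response_id` chaining. A minimal standalone sketch of that selection rule, in plain Ruby with no llm.rb dependency; the `messages` entries are plain hashes standing in for `LLM::Message` objects:

```ruby
# Sketch of the previous_response_id selection rule from Context#respond.
# When store: false is passed, no prior response id is reused, so the
# full message array is sent to the API instead of a server-side chain.
def previous_response_id(messages, params)
  return nil if params[:store] == false
  assistant = messages.find { |m| m[:role] == :assistant }
  assistant && assistant[:response_id]
end

messages = [
  {role: :user, response_id: nil},
  {role: :assistant, response_id: "resp_123"}
]

p previous_response_id(messages, {store: false}) # => nil
p previous_response_id(messages, {})             # => "resp_123"
```

The bug the diff fixes follows from the old ordering: `store: false` lived in `@params`, so checking it before the `@params.merge(params)` step could not see it.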
data/lib/llm/function.rb
CHANGED
@@ -257,7 +257,7 @@ class LLM::Function
     when "LLM::OpenAI::Responses"
       {
         type: "function", name: @name, description: @description,
-        parameters: @params.to_h.merge(additionalProperties: false), strict:
+        parameters: (@params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false), strict: false
       }.compact
     else
       {
data/lib/llm/mcp/error.rb
CHANGED
@@ -1,7 +1,7 @@
 # frozen_string_literal: true
 
 class LLM::MCP
-
+  Error = Class.new(LLM::Error) do
     attr_reader :code, :data
 
     ##
@@ -27,5 +27,35 @@ class LLM::MCP
     end
   end
 
+  MismatchError = Class.new(Error) do
+    ##
+    # @return [Integer, String]
+    #  The request id the client was waiting for
+    attr_reader :expected_id
+
+    ##
+    # @return [Integer, String]
+    #  The response id received from the server
+    attr_reader :actual_id
+
+    ##
+    # @param [Integer, String] expected_id
+    #  The request id the client was waiting for
+    # @param [Integer, String] actual_id
+    #  The response id received from the server instead
+    def initialize(expected_id:, actual_id:)
+      @expected_id = expected_id
+      @actual_id = actual_id
+      super(message)
+    end
+
+    ##
+    # @return [String]
+    def message
+      "mismatched MCP response id #{actual_id.inspect} " \
+        "while waiting for #{expected_id.inspect}"
+    end
+  end
+
   TimeoutError = Class.new(Error)
 end
data/lib/llm/mcp/rpc.rb
CHANGED
@@ -53,11 +53,14 @@ class LLM::MCP
     poll(timeout:, ex: [IO::WaitReadable]) do
       loop do
         res = transport.read_nonblock
-
-        if res["error"]
+        if res["id"] == id && res["error"]
           raise LLM::MCP::Error.from(response: res)
-
+        elsif res["id"] == id
           break res["result"]
+        elsif res["method"]
+          next
+        elsif res.key?("id")
+          raise LLM::MCP::MismatchError.new(expected_id: id, actual_id: res["id"])
         end
       end
     end
@@ -101,6 +104,8 @@ class LLM::MCP
   #  The exceptions to retry when raised
   # @yield
   #  The block to run
+  # @raise [LLM::MCP::MismatchError]
+  #  When an unrelated response id is received while waiting
   # @raise [LLM::MCP::TimeoutError]
   #  When the block takes longer than the timeout
   # @return [Object]
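The reworked read loop above now dispatches on the JSON-RPC response id instead of treating any `error` frame as fatal. A standalone sketch of the same dispatch order, with a local `MismatchError` standing in for `LLM::MCP::MismatchError` and a plain `raise` standing in for `LLM::MCP::Error.from`: an error for the awaited id raises, a result for the awaited id wins, server notifications (frames carrying a `method` key) are skipped, and any other response id is a mismatch:

```ruby
# Local stand-in for LLM::MCP::MismatchError, purely for illustration.
MismatchError = Class.new(StandardError)

# Sketch of the per-frame dispatch rule from the diffed read loop.
def dispatch(res, id)
  if res["id"] == id && res["error"]
    raise res["error"]["message"]            # an error addressed to us
  elsif res["id"] == id
    [:result, res["result"]]                 # the response we waited for
  elsif res["method"]
    :skip                                    # a server notification; keep waiting
  elsif res.key?("id")
    raise MismatchError, "expected #{id}, got #{res["id"]}"
  end
end

p dispatch({"id" => 1, "result" => {"ok" => true}}, 1) # => [:result, {"ok"=>true}]
p dispatch({"method" => "notifications/progress"}, 1)  # => :skip
```

Skipping `method` frames matters because an MCP server may interleave notifications with the response the client is polling for.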
data/lib/llm/mcp.rb
CHANGED
@@ -121,6 +121,34 @@ class LLM::MCP
     res["tools"].map { LLM::Tool.mcp(self, _1) }
   end
 
+  ##
+  # Returns the prompts provided by the MCP process.
+  # @return [Array<LLM::Object>]
+  def prompts
+    res = call(transport, "prompts/list")
+    LLM::Object.from(res["prompts"])
+  end
+
+  ##
+  # Returns a prompt by name.
+  # @param [String] name The prompt name
+  # @param [Hash<String, String>, nil] arguments The prompt arguments
+  # @return [LLM::Object]
+  def find_prompt(name:, arguments: nil)
+    params = {name:}
+    params[:arguments] = arguments if arguments
+    res = call(transport, "prompts/get", params)
+    res["messages"] = [*res["messages"]].map do |message|
+      LLM::Message.new(
+        message["role"],
+        adapt_content(message["content"]),
+        {original_content: message["content"]}
+      )
+    end
+    LLM::Object.from(res)
+  end
+  alias_method :get_prompt, :find_prompt
+
   ##
   # Calls a tool by name with the given arguments
   # @param [String] name The name of the tool to call
@@ -135,6 +163,19 @@ class LLM::MCP
 
   attr_reader :llm, :command, :transport, :timeout
 
+  def adapt_content(content)
+    case content
+    when String
+      content
+    when Hash
+      content["type"] == "text" ? content["text"].to_s : LLM::Object.from(content)
+    when Array
+      content.map { adapt_content(_1) }
+    else
+      content
+    end
+  end
+
   def adapt_tool_result(result)
     if result["structuredContent"]
       result["structuredContent"]
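The new `adapt_content` helper flattens MCP prompt content blocks into values llm.rb can prompt with. A standalone re-creation of the normalization rule; note that the real method wraps non-text hashes in `LLM::Object.from`, while this sketch passes them through unchanged for illustration:

```ruby
# Sketch of MCP prompt-content normalization: {"type" => "text"} blocks
# collapse to plain strings, arrays are mapped recursively, and anything
# else passes through (the real code wraps non-text hashes in LLM::Object).
def adapt_content(content)
  case content
  when String then content
  when Hash
    content["type"] == "text" ? content["text"].to_s : content
  when Array
    content.map { |c| adapt_content(c) }
  else content
  end
end

p adapt_content({"type" => "text", "text" => "hello"}) # => "hello"
p adapt_content([{"type" => "text", "text" => "a"},
                 {"type" => "image", "data" => "..."}])
```

This is what lets `find_prompt` hand back `LLM::Message` objects whose content reads like ordinary text while the raw block survives in `original_content`.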
data/lib/llm/providers/openai/request_adapter/respond.rb
CHANGED
@@ -15,6 +15,8 @@ module LLM::OpenAI::RequestAdapter
     catch(:abort) do
       if Hash === message
         {role: message[:role], content: adapt_content(message[:content])}
+      elsif message.tool_call?
+        message.extra[:original_tool_calls]
       else
         adapt_message
       end
@@ -23,12 +25,12 @@ module LLM::OpenAI::RequestAdapter
 
   private
 
-  def adapt_content(content)
+  def adapt_content(content, role: message.role)
     case content
     when String
-      [{type:
+      [{type: text_content_type(role), text: content.to_s}]
     when LLM::Response then adapt_remote_file(content)
-    when LLM::Message then adapt_content(content.content)
+    when LLM::Message then adapt_content(content.content, role: content.role)
     when LLM::Object
       case content.kind
       when :image_url then [{type: :image_url, image_url: {url: content.value.to_s}}]
@@ -46,7 +48,7 @@ module LLM::OpenAI::RequestAdapter
     when Array
       adapt_array
     else
-      {role: message.role, content: adapt_content(content)}
+      {role: message.role, content: adapt_content(content, role: message.role)}
     end
   end
 
@@ -56,7 +58,7 @@ module LLM::OpenAI::RequestAdapter
     elsif returns.any?
       returns.map { {type: "function_call_output", call_id: _1.id, output: LLM.json.dump(_1.value)} }
     else
-      {role: message.role, content: content.flat_map { adapt_content(_1) }}
+      {role: message.role, content: content.flat_map { adapt_content(_1, role: message.role) }}
     end
   end
 
@@ -83,5 +85,9 @@ module LLM::OpenAI::RequestAdapter
   def message = @message
   def content = message.content
   def returns = content.grep(LLM::Function::Return)
+
+  def text_content_type(role)
+    role.to_s == "assistant" ? :output_text : :input_text
+  end
 end
 end
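The new `text_content_type` helper exists because the Responses API tags assistant-authored text as `output_text` while every other role sends `input_text`. A standalone sketch of the role-based selection and the content shape it produces:

```ruby
# Sketch of text_content_type and the string branch of adapt_content.
# Assistant text replayed back to the Responses API must be tagged
# :output_text; user/developer text is tagged :input_text.
def text_content_type(role)
  role.to_s == "assistant" ? :output_text : :input_text
end

def adapt_text(content, role)
  [{type: text_content_type(role), text: content.to_s}]
end

p adapt_text("Hi!", :assistant) # tagged :output_text
p adapt_text("Hello", :user)    # tagged :input_text
```

Threading `role:` through the recursive `adapt_content` calls (as the diff does for `LLM::Message` and array content) is what keeps nested assistant messages correctly tagged.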
data/lib/llm/providers/openai/response_adapter/responds.rb
CHANGED
@@ -60,6 +60,13 @@ module LLM::OpenAI::ResponseAdapter
     body.model
   end
 
+  ##
+  # OpenAI's Responses API does not expose a system fingerprint.
+  # @return [nil]
+  def system_fingerprint
+    nil
+  end
+
   ##
   # Returns the aggregated text content from the response outputs.
   # @return [String]
@@ -88,10 +95,15 @@ module LLM::OpenAI::ResponseAdapter
   private
 
   def adapt_message
-    message = LLM::Message.new(
+    message = LLM::Message.new(
+      "assistant",
+      +"",
+      {response: self, tool_calls: [], original_tool_calls: [], reasoning_content: +""}
+    )
     output.each do |choice|
       if choice.type == "function_call"
         message.extra[:tool_calls] << adapt_tool(choice)
+        message.extra[:original_tool_calls] << choice
       elsif choice.type == "reasoning"
         (choice.summary || []).each do |summary|
           next unless summary["type"] == "summary_text"
data/lib/llm/providers/openai/responses/stream_parser.rb
CHANGED
@@ -43,11 +43,19 @@ class LLM::OpenAI
         @body[k] = v
       end
       @body["output"] ||= []
+    when "response.in_progress", "response.completed"
+      response = chunk["response"] || {}
+      response.each do |k, v|
+        next if k == "output" && @body["output"].is_a?(Array) && @body["output"].any?
+        @body[k] = v
+      end
+      @body["output"] ||= response["output"] || []
     when "response.output_item.added"
       output_index = chunk["output_index"]
       item = chunk["item"]
       @body["output"][output_index] = item
       @body["output"][output_index]["content"] ||= []
+      @body["output"][output_index]["summary"] ||= [] if item["type"] == "reasoning"
     when "response.content_part.added"
       output_index = chunk["output_index"]
       content_index = chunk["content_index"]
@@ -55,6 +63,25 @@ class LLM::OpenAI
       @body["output"][output_index] ||= {"content" => []}
       @body["output"][output_index]["content"] ||= []
       @body["output"][output_index]["content"][content_index] = part
+    when "response.reasoning_summary_text.delta"
+      output_item = @body["output"][chunk["output_index"]]
+      if output_item && output_item["type"] == "reasoning"
+        summary_index = chunk["summary_index"] || 0
+        output_item["summary"] ||= []
+        output_item["summary"][summary_index] ||= {"type" => "summary_text", "text" => +""}
+        output_item["summary"][summary_index]["text"] << chunk["delta"]
+        emit_reasoning_content(chunk["delta"])
+      end
+    when "response.reasoning_summary_text.done"
+      output_item = @body["output"][chunk["output_index"]]
+      if output_item && output_item["type"] == "reasoning"
+        summary_index = chunk["summary_index"] || 0
+        output_item["summary"] ||= []
+        output_item["summary"][summary_index] = {
+          "type" => "summary_text",
+          "text" => chunk["text"]
+        }
+      end
     when "response.output_text.delta"
       output_index = chunk["output_index"]
       content_index = chunk["content_index"]
@@ -102,6 +129,10 @@ class LLM::OpenAI
     end
   end
 
+  def emit_reasoning_content(value)
+    @stream.on_reasoning_content(value) if @stream.respond_to?(:on_reasoning_content)
+  end
+
   def emit_tool(index, tool)
     return unless @stream.respond_to?(:on_tool_call)
     return unless complete_tool?(tool)
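The new stream-parser branches accumulate reasoning summary deltas incrementally and then replace the accumulated text with the final string carried by the `done` event. A standalone sketch of that accumulation over a hash mirroring the parser's `@body`; the chunk payloads here are hypothetical examples of the event shapes the diff handles:

```ruby
# Sketch of reasoning-summary accumulation from the stream parser.
# `body` mirrors the parser's @body hash after a reasoning output item
# has been added with an empty "summary" array.
body = {"output" => [{"type" => "reasoning", "summary" => []}]}

chunks = [
  {"type" => "response.reasoning_summary_text.delta",
   "output_index" => 0, "summary_index" => 0, "delta" => "Thinking "},
  {"type" => "response.reasoning_summary_text.delta",
   "output_index" => 0, "summary_index" => 0, "delta" => "step by step"},
  {"type" => "response.reasoning_summary_text.done",
   "output_index" => 0, "summary_index" => 0, "text" => "Thinking step by step"}
]

chunks.each do |chunk|
  item = body["output"][chunk["output_index"]]
  next unless item && item["type"] == "reasoning"
  i = chunk["summary_index"] || 0
  item["summary"] ||= []
  case chunk["type"]
  when "response.reasoning_summary_text.delta"
    item["summary"][i] ||= {"type" => "summary_text", "text" => +""}
    item["summary"][i]["text"] << chunk["delta"]
  when "response.reasoning_summary_text.done"
    # The done event carries the full text and overwrites the accumulation.
    item["summary"][i] = {"type" => "summary_text", "text" => chunk["text"]}
  end
end

p body["output"][0]["summary"][0]["text"] # => "Thinking step by step"
```

In the real parser each delta is also forwarded to the stream callback via `emit_reasoning_content`, so a consumer can render reasoning text as it arrives.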
data/lib/llm/version.rb
CHANGED
data/llm.gemspec
CHANGED
@@ -11,12 +11,22 @@ Gem::Specification.new do |spec|
   spec.summary = "System integration layer for LLMs, tools, MCP, and APIs in Ruby."
 
   spec.description = <<~DESCRIPTION
-    llm.rb is a
-
-
-
-
-
+    llm.rb is a runtime for building AI systems that integrate directly with your
+    application. It is not just an API wrapper. It provides a unified execution
+    model for providers, tools, MCP servers, streaming, schemas, files, and
+    state.
+
+    It is built for engineers who want control over how these systems run.
+    llm.rb stays close to Ruby, runs on the standard library by default, loads
+    optional pieces only when needed, and remains easy to extend. It also works
+    well in Rails or ActiveRecord applications, where a small wrapper around
+    context persistence is enough to save and restore long-lived conversation
+    state across requests, jobs, or retries.
+
+    Most LLM libraries stop at request/response APIs. Building real systems
+    means stitching together streaming, tools, state, persistence, and external
+    services by hand. llm.rb provides a single execution model for all of these,
+    so they compose naturally instead of becoming separate subsystems.
   DESCRIPTION
 
   spec.license = "0BSD"
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm.rb
 version: !ruby/object:Gem::Version
-  version: 4.
+  version: 4.13.0
 platform: ruby
 authors:
 - Antar Azri
@@ -195,12 +195,22 @@ dependencies:
   - !ruby/object:Gem::Version
     version: '1.7'
 description: |
-  llm.rb is a
-
-
-
-
-
+  llm.rb is a runtime for building AI systems that integrate directly with your
+  application. It is not just an API wrapper. It provides a unified execution
+  model for providers, tools, MCP servers, streaming, schemas, files, and
+  state.
+
+  It is built for engineers who want control over how these systems run.
+  llm.rb stays close to Ruby, runs on the standard library by default, loads
+  optional pieces only when needed, and remains easy to extend. It also works
+  well in Rails or ActiveRecord applications, where a small wrapper around
+  context persistence is enough to save and restore long-lived conversation
+  state across requests, jobs, or retries.
+
+  Most LLM libraries stop at request/response APIs. Building real systems
+  means stitching together streaming, tools, state, persistence, and external
+  services by hand. llm.rb provides a single execution model for all of these,
+  so they compose naturally instead of becoming separate subsystems.
 email:
 - azantar@proton.me
 - 0x1eef@hardenedbsd.org