llm.rb 4.12.0 → 4.13.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 79d4a45ec25408e46451475575e917ef9d8579bec32f1a6a78bfed235e5ae212
- data.tar.gz: fdeb12175be3ef87e411021444305b9e785a9bf2d055dfdc7bf718f5740623d8
+ metadata.gz: 7847fee7ea1e63553ad5323750fc2e5ac1b4a9082c2f4c5aba71f4587440ea75
+ data.tar.gz: e63bdae085b2f0f606cbdb4633a7eff93fd6e2428fcb85ff5fe94fc78851bf5d
  SHA512:
- metadata.gz: ea35b39b5476b75370485128dd8441e078bc7ac69236a7a50f4e32fb419f6fac5f7bb81faf3e029f28b788f4d69645e1b97e4126ea4f9fcc31f014921d2434a4
- data.tar.gz: c73bbf806f5cef71bfadfc1368fbdbfe07bf37118df18ebec71f4914a27ae2a3858fa6a210ee4d7cdff8f672a14c59016604a72a0a90c611b37223c4652ee991
+ metadata.gz: b1c8d8600b3214da5613d152677d13fde796b42e6a29cf8af035e4ad5f28b7cea0466a375b9b444a748e9e063d2e6ad6720b653609cb2b7038e8040cd2b44e39
+ data.tar.gz: c76882f9cd5416312e26f4e25493403df8f9f8c61ee14cba5096383b449bd7a4ce8b9d70834d12176648c3d9206f0f555a1eec4b22bdb6426d88c0c36c8ed592
data/CHANGELOG.md CHANGED
@@ -1,5 +1,43 @@
  # Changelog
 
+ ## Unreleased
+
+ Changes since `v4.13.0`.
+
+ ## v4.13.0
+
+ Changes since `v4.12.0`.
+
+ This release expands MCP prompt support, improves reasoning support in the
+ OpenAI Responses API, and refreshes the docs around llm.rb's runtime model,
+ contexts, and advanced workflows.
+
+ ### Add
+
+ - Add `LLM::MCP#prompts` and `LLM::MCP#find_prompt` for MCP prompt support.
+
+ ### Change
+
+ - Rework the README around llm.rb as a runtime for AI systems.
+ - Add a dedicated deep dive guide for providers, contexts, persistence,
+   tools, agents, MCP, tracing, multimodal prompts, and retrieval.
+
+ ### Fix
+
+ All of these fixes apply to MCP:
+
+ - fix(mcp): raise `LLM::MCP::MismatchError` on mismatched response ids.
+ - fix(mcp): normalize prompt message content while preserving the original payload.
+
+ All of these fixes apply to OpenAI's Responses API:
+
+ - fix(openai): emit `on_reasoning_content` for streamed reasoning summaries.
+ - fix(openai): skip `previous_response_id` on `store: false` follow-up calls.
+ - fix(openai): fall back to an empty object schema for tools without params.
+ - fix(openai): preserve original tool-call payloads on re-sent assistant tool messages.
+ - fix(openai): emit `output_text` for assistant-authored response content.
+ - fix(openai): return `nil` for `system_fingerprint` on normalized response objects.
+
  ## v4.12.0
 
  Changes since `v4.11.1`.
data/README.md CHANGED
@@ -4,155 +4,148 @@
  <p align="center">
  <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
  <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
- <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.12.0-green.svg?" alt="Version"></a>
+ <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.13.0-green.svg?" alt="Version"></a>
  </p>
 
  ## About
 
- llm.rb is a Ruby-centric system integration layer for building real
- LLM-powered systems. It connects LLMs to real systems by turning APIs into
- tools and unifying MCP, providers, and application logic into a single
- execution model. It is used in production systems integrating external and
- internal tools, including agents, MCP services, and OpenAPI-based APIs.
-
- Built for engineers who want to understand and control their LLM systems. No
- frameworks, no hidden magic, just composable primitives for building real
- applications, from scripts to full systems like [Relay](https://github.com/llmrb/relay).
-
- Jump to [Quick start](#quick-start), discover its [capabilities](#capabilities), read about
- its [architecture](#architecture--execution-model), or watch the
- [Screencast](https://www.youtube.com/watch?v=x1K4wMeO_QA) for a deep dive into the design
- and capabilities of llm.rb.
-
- ## What Makes It Different
-
- Most LLM libraries stop at requests and responses. <br>
- llm.rb is built around the state and execution model behind them:
-
- - **A system layer, not just an API wrapper** <br>
-   llm.rb unifies LLMs, tools, MCP servers, and application APIs into a single execution model.
- - **Contexts are central** <br>
-   They hold history, tools, schema, usage, cost, persistence, and execution state.
- - **Contexts can be serialized** <br>
-   A context can be serialized to JSON and stored on disk, in a database, in a
-   job queue, or anywhere else your application needs to persist state.
- - **Tool execution is explicit** <br>
-   Run local, provider-native, and MCP tools sequentially or concurrently with threads, fibers, or async tasks.
- - **Run tools while streaming** <br>
-   Start tool work while a response is still streaming instead of waiting for the turn to finish. <br>
-   This overlaps tool latency with model output and exposes streamed tool-call events for introspection, making it one of llm.rb's strongest execution features.
- - **HTTP MCP can reuse connections** <br>
-   Opt into persistent HTTP pooling for repeated remote MCP tool calls with `persistent`.
- - **One API across providers and capabilities** <br>
-   The same model covers chat, files, images, audio, embeddings, vector stores, and more.
- - **Thread-safe where it matters** <br>
-   Providers are shareable, while contexts stay isolated and stateful.
- - **Local metadata, fewer extra API calls** <br>
-   A built-in registry provides model capabilities, limits, pricing, and cost estimation.
- - **Stdlib-only by default** <br>
-   llm.rb runs on the Ruby standard library by default, with providers, optional features, and the model registry loaded only when you use them.
-
- ## What llm.rb Enables
-
- llm.rb acts as the integration layer between LLMs, tools, and real systems.
-
- - Turn REST / OpenAPI APIs into LLM tools
- - Connect multiple MCP sources (Notion, internal services, etc.)
- - Build agents that operate across system boundaries
- - Orchestrate tools from multiple providers and protocols
- - Stream responses while executing tools concurrently
- - Treat LLMs as part of your architecture, not isolated calls
-
- Without llm.rb, providers, tool formats, and orchestration paths tend to stay
- fragmented. With llm.rb, they share a unified execution model with composable
- tools and a more consistent system architecture.
-
- ## Real-World Usage
-
- llm.rb is used to integrate external MCP services such as Notion, internal APIs
- exposed via OpenAPI or `swagger.json`, and multiple tool sources into a unified
- execution model. Common usage patterns include combining multiple MCP sources,
- turning internal APIs into tools, and running those tools through the same
- context and provider flow.
-
- It supports multiple MCP sources, external SaaS integrations, internal APIs via
- OpenAPI, and multiple LLM providers simultaneously.
-
- ## Architecture & Execution Model
-
- llm.rb sits at the center of the execution path, connecting tools, MCP
- sources, APIs, providers, and your application through explicit contexts:
-
- ```
- External MCP        Internal MCP        OpenAPI / REST
-      │                   │                    │
-      └────────── Tools / MCP Layer ──────────┘
-
-                   llm.rb Contexts
-
-                    LLM Providers
-              (OpenAI, Anthropic, etc.)
-
-                  Your Application
- ```
-
- ### Key Design Decisions
-
- - **Thread-safe providers** - `LLM::Provider` instances are safe to share across threads
- - **Thread-local contexts** - `LLM::Context` should generally be kept thread-local
- - **Lazy loading** - Providers, optional features, and the model registry load on demand
- - **JSON adapter system** - Swap JSON libraries (JSON/Oj/Yajl) for performance
- - **Registry system** - Local metadata for model capabilities, limits, and pricing
- - **Provider adaptation** - Normalizes differences between OpenAI, Anthropic, Google, and other providers
- - **Structured tool execution** - Errors are captured and returned as data, not raised unpredictably
- - **Function vs Tool APIs** - Choose between class-based tools and closure-based functions
+ llm.rb is a runtime for building AI systems that integrate directly with your
+ application. It is not just an API wrapper. It provides a unified execution
+ model for providers, tools, MCP servers, streaming, schemas, files, and
+ state.
+
+ It is built for engineers who want control over how these systems run. llm.rb
+ stays close to Ruby, runs on the standard library by default, loads optional
+ pieces only when needed, and remains easy to extend. It also works well in
+ Rails or ActiveRecord applications, where a small wrapper around context
+ persistence is enough to save and restore long-lived conversation state across
+ requests, jobs, or retries.
+
+ Most LLM libraries stop at request/response APIs. Building real systems means
+ stitching together streaming, tools, state, persistence, and external
+ services by hand. llm.rb provides a single execution model for all of these,
+ so they compose naturally instead of becoming separate subsystems.
+
+ ## Architecture
+
+ ```
+ External MCP        Internal MCP        OpenAPI / REST
+      │                   │                    │
+      └────────── Tools / MCP Layer ───────┘
+
+                   llm.rb Contexts
+
+                    LLM Providers
+              (OpenAI, Anthropic, etc.)
+
+                  Your Application
+ ```
+
+ ## Core Concept
+
+ `LLM::Context` is the execution boundary in llm.rb.
+
+ It holds:
+ - message history
+ - tool state
+ - schemas
+ - streaming configuration
+ - usage and cost tracking
+
+ Instead of switching abstractions for each feature, everything builds on the
+ same context object.
+
+ ## Differentiators
+
+ ### Execution Model
+
+ - **A system layer, not just an API wrapper**
+   Put providers, tools, MCP servers, and application APIs behind one runtime
+   model instead of stitching them together by hand.
+ - **Contexts are central**
+   Keep history, tools, schema, usage, persistence, and execution state in one
+   place instead of spreading them across your app.
+ - **Contexts can be serialized**
+   Save and restore live state for jobs, databases, retries, or long-running
+   workflows.
+
+ ### Runtime Behavior
+
+ - **Streaming and tool execution work together**
+   Start tool work while output is still streaming so you can hide latency
+   instead of waiting for turns to finish.
+ - **Concurrency is a first-class feature**
+   Use threads, fibers, or async tasks without rewriting your tool layer.
+ - **Advanced workloads are built in, not bolted on**
+   Streaming, concurrent tool execution, persistence, tracing, and MCP support
+   all fit the same runtime model.
+
+ ### Integration
+
+ - **MCP is built in**
+   Connect to MCP servers over stdio or HTTP without bolting on a separate
+   integration stack.
+ - **Tools are explicit**
+   Run local tools, provider-native tools, and MCP tools through the same path
+   with fewer special cases.
+ - **Providers are normalized, not flattened**
+   Share one API surface across providers without losing access to
+   provider-specific capabilities where they matter.
+ - **Local model metadata is included**
+   Model capabilities, pricing, and limits are available locally without extra
+   API calls.
+
+ ### Design Philosophy
+
+ - **Runs on the stdlib**
+   Start with Ruby's standard library and add extra dependencies only when you
+   need them.
+ - **It is highly pluggable**
+   Add tools, swap providers, change JSON backends, plug in tracing, or layer
+   internal APIs and MCP servers into the same execution path.
+ - **It scales from scripts to long-lived systems**
+   The same primitives work for one-off scripts, background jobs, and more
+   demanding application workloads with streaming, persistence, and tracing.
+ - **Thread boundaries are clear**
+   Providers are shareable. Contexts are stateful and should stay thread-local.
 
  ## Capabilities
 
- llm.rb provides a complete set of primitives for building LLM-powered systems:
-
  - **Chat & Contexts** — stateless and stateful interactions with persistence
- - **Streaming** — real-time responses across providers, including structured stream callbacks
- - **Reasoning Support** — full stream, message, and response support when providers expose reasoning
- - **Tool Calling** — define and execute functions with automatic orchestration
- - **Run Tools While Streaming** — begin tool work before the model finishes its turn
+ - **Context Serialization** — save and restore state across processes or time
+ - **Streaming** — visible output, reasoning output, tool-call events
+ - **Tool Calling** — class-based tools and closure-based functions
+ - **Run Tools While Streaming** — overlap model output with tool latency
  - **Concurrent Execution** — threads, async tasks, and fibers
- - **Agents** — reusable, preconfigured assistants with tool auto-execution
- - **Structured Outputs** — JSON schema-based responses
- - **MCP Support** — integrate external tool servers dynamically over stdio or HTTP
+ - **Agents** — reusable assistants with tool auto-execution
+ - **Structured Outputs** — JSON Schema-based responses
+ - **Responses API** — stateful response workflows where providers support them
+ - **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
  - **Multimodal Inputs** — text, images, audio, documents, URLs
- - **Audio** — text-to-speech, transcription, translation
+ - **Audio** — speech generation, transcription, translation
  - **Images** — generation and editing
  - **Files API** — upload and reference files in prompts
  - **Embeddings** — vector generation for search and RAG
- - **Vector Stores** — OpenAI-based retrieval workflows
- - **Cost Tracking** — estimate usage without API calls
+ - **Vector Stores** — retrieval workflows
+ - **Cost Tracking** — local cost estimation without extra API calls
  - **Observability** — tracing, logging, telemetry
  - **Model Registry** — local metadata for capabilities, limits, pricing
+ - **Persistent HTTP** — optional connection pooling for providers and MCP
 
- ## Quick Start
-
- These examples show individual features, but llm.rb is designed to combine
- them into full systems where LLMs, tools, and external services operate
- together.
-
- #### Simple Streaming
+ ## Installation
 
- At the simplest level, any object that implements `#<<` can receive visible
- output as it arrives. This works with `$stdout`, `StringIO`, files, sockets,
- and other Ruby IO-style objects.
+ ```bash
+ gem install llm.rb
+ ```
 
- For more control, llm.rb also supports advanced streaming patterns through
- [`LLM::Stream`](lib/llm/stream.rb). See [Advanced Streaming](#advanced-streaming)
- for a structured callback-based example. Basic `#<<` streams only receive
- visible output chunks:
+ ## Example
 
  ```ruby
- #!/usr/bin/env ruby
  require "llm"
 
  llm = LLM.openai(key: ENV["KEY"])
  ctx = LLM::Context.new(llm, stream: $stdout)
+
  loop do
    print "> "
    ctx.talk(STDIN.gets || break)
@@ -160,623 +153,13 @@ loop do
  end
  ```
 
- #### Structured Outputs
-
- The `LLM::Schema` system lets you define JSON schemas for structured outputs.
- Schemas can be defined as classes with `property` declarations or built
- programmatically using a fluent interface. When you pass a schema to a context,
- llm.rb adapts it into the provider's structured-output format when that
- provider supports one. The `content!` method then parses the assistant's JSON
- response into a Ruby object:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
- require "pp"
-
- class Report < LLM::Schema
-   property :category, Enum["performance", "security", "outage"], "Report category", required: true
-   property :summary, String, "Short summary", required: true
-   property :impact, OneOf[String, Integer], "Primary impact, as text or a count", required: true
-   property :services, Array[String], "Impacted services", required: true
-   property :timestamp, String, "When it happened", optional: true
- end
-
- llm = LLM.openai(key: ENV["KEY"])
- ctx = LLM::Context.new(llm, schema: Report)
- res = ctx.talk("Structure this report: 'Database latency spiked at 10:42 UTC, causing 5% request timeouts for 12 minutes.'")
- pp res.content!
-
- # {
- #   "category" => "performance",
- #   "summary" => "Database latency spiked, causing 5% request timeouts for 12 minutes.",
- #   "impact" => "5% request timeouts",
- #   "services" => ["Database"],
- #   "timestamp" => "2024-06-05T10:42:00Z"
- # }
- ```
-
- #### Tool Calling
-
- Tools in llm.rb can be defined as classes inheriting from `LLM::Tool` or as
- closures using `LLM.function`. When the LLM requests a tool call, the context
- stores `Function` objects in `ctx.functions`. The `call()` method executes all
- pending functions and returns their results to the LLM. Tools describe
- structured parameters with JSON Schema and adapt those definitions to each
- provider's tool-calling format (OpenAI, Anthropic, Google, etc.):
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- class System < LLM::Tool
-   name "system"
-   description "Run a shell command"
-   param :command, String, "Command to execute", required: true
-
-   def call(command:)
-     {success: system(command)}
-   end
- end
-
- llm = LLM.openai(key: ENV["KEY"])
- ctx = LLM::Context.new(llm, stream: $stdout, tools: [System])
- ctx.talk("Run `date`.")
- ctx.talk(ctx.call(:functions)) while ctx.functions.any?
- ```
-
- #### Concurrent Tools
-
- llm.rb provides explicit concurrency control for tool execution. The
- `wait(:thread)` method spawns each pending function in its own thread and waits
- for all to complete. You can also use `:fiber` for cooperative multitasking or
- `:task` for async/await patterns (requires the `async` gem). The context
- automatically collects all results and reports them back to the LLM in a
- single turn, maintaining conversation flow while parallelizing independent
- operations:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- ctx = LLM::Context.new(llm, stream: $stdout, tools: [FetchWeather, FetchNews, FetchStock])
-
- # Execute multiple independent tools concurrently
- ctx.talk("Summarize the weather, headlines, and stock price.")
- ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
- ```
-
- #### Advanced Streaming
-
- Use [`LLM::Stream`](lib/llm/stream.rb) when you want more than plain `#<<`
- output. It adds structured streaming callbacks for:
-
- - `on_content` for visible assistant output
- - `on_reasoning_content` for separate reasoning output
- - `on_tool_call` for streamed tool-call notifications
- - `on_tool_return` for completed tool execution
-
- Subclass [`LLM::Stream`](lib/llm/stream.rb) when you want callbacks like
- `on_reasoning_content`, `on_tool_call`, and `on_tool_return`, or helpers like
- `queue` and `wait`.
-
- Keep `on_content`, `on_reasoning_content`, and `on_tool_call` fast: they run
- inline with the streaming parser. `on_tool_return` is different: it runs later,
- when `wait` resolves queued streamed tool work.
-
- `on_tool_call` lets tools start before the model finishes its turn, for
- example with `tool.spawn(:thread)`, `tool.spawn(:fiber)`, or
- `tool.spawn(:task)`. That can overlap tool latency with streaming output.
- `on_tool_return` is the place to react when that queued work completes, for
- example by updating progress UIs, logging tool latency, or changing visible
- state from "Running tool ..." to "Finished tool ...".
-
- If a stream cannot resolve a tool, `on_tool_call` receives `error` as an
- `LLM::Function::Return`. That keeps the session alive and leaves control in
- the callback: it can send `error`, spawn the tool when `error == nil`, or
- handle the situation however it sees fit.
-
- In normal use this should be rare, since `on_tool_call` is usually called with
- a resolved tool and `error == nil`. To resolve a tool call, the tool must be
- found in `LLM::Function.registry`. That covers `LLM::Tool` subclasses,
- including MCP tools, but not `LLM.function` closures, which are excluded
- because they may be bound to local state:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
- # Assume `System < LLM::Tool` is already defined.
-
- class Stream < LLM::Stream
-   def on_content(content)
-     $stdout << content
-   end
-
-   def on_reasoning_content(content)
-     $stderr << content
-   end
-
-   def on_tool_call(tool, error)
-     $stdout << "Running tool #{tool.name}\n"
-     queue << (error || tool.spawn(:thread))
-   end
-
-   def on_tool_return(tool, ret)
-     $stdout << (ret.error? ? "Tool #{tool.name} failed\n" : "Finished tool #{tool.name}\n")
-   end
- end
-
- llm = LLM.openai(key: ENV["KEY"])
- ctx = LLM::Context.new(llm, stream: Stream.new, tools: [System])
-
- ctx.talk("Run `date` and `uname -a`.")
- while ctx.functions.any?
-   ctx.talk(ctx.wait(:thread))
- end
- ```
-
- #### MCP
-
- MCP is a first-class integration mechanism in llm.rb.
-
- MCP allows llm.rb to treat external services, internal APIs, and system
- capabilities as tools in a unified interface. This makes it possible to
- connect multiple MCP sources simultaneously and expose your own APIs as tools.
-
- In practice, this supports workflows such as external SaaS integrations,
- multiple MCP sources in the same context, and OpenAPI -> MCP -> tools
- pipelines for internal services.
-
- llm.rb integrates with the Model Context Protocol (MCP) to dynamically discover
- and use tools from external servers. This example starts a filesystem MCP
- server over stdio and makes its tools available to a context, enabling the LLM
- to interact with the local file system through a standardized interface.
- Use `LLM::MCP.stdio` or `LLM::MCP.http` when you want to make the transport
- explicit. Like `LLM::Context`, an MCP client is stateful and should remain
- isolated to a single thread:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- mcp = LLM::MCP.stdio(argv: ["npx", "-y", "@modelcontextprotocol/server-filesystem", Dir.pwd])
-
- begin
-   mcp.start
-   ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
-   ctx.talk("List the directories in this project.")
-   ctx.talk(ctx.call(:functions)) while ctx.functions.any?
- ensure
-   mcp.stop
- end
- ```
-
- You can also connect to an MCP server over HTTP. This is useful when the
- server already runs remotely and exposes MCP through a URL instead of a local
- process. If you expect repeated tool calls, use `persistent` to reuse a
- process-wide HTTP connection pool. This requires the optional
- `net-http-persistent` gem:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- mcp = LLM::MCP.http(
-   url: "https://api.githubcopilot.com/mcp/",
-   headers: {"Authorization" => "Bearer #{ENV.fetch("GITHUB_PAT")}"}
- ).persistent
-
- begin
-   mcp.start
-   ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
-   ctx.talk("List the available GitHub MCP toolsets.")
-   ctx.talk(ctx.call(:functions)) while ctx.functions.any?
- ensure
-   mcp.stop
- end
- ```
-
- ## Providers
-
- llm.rb supports multiple LLM providers with a unified API.
- All providers share the same context, tool, and concurrency interfaces, making
- it easy to switch between cloud and local models:
-
- - **OpenAI** (`LLM.openai`)
- - **Anthropic** (`LLM.anthropic`)
- - **Google** (`LLM.google`)
- - **DeepSeek** (`LLM.deepseek`)
- - **xAI** (`LLM.xai`)
- - **zAI** (`LLM.zai`)
- - **Ollama** (`LLM.ollama`)
- - **Llama.cpp** (`LLM.llamacpp`)
-
- ## Production
-
- #### Ready for production
-
- llm.rb is designed for production use from the ground up:
-
- - **Thread-safe providers** - Share `LLM::Provider` instances across your application
- - **Thread-local contexts** - Keep `LLM::Context` instances thread-local for state isolation
- - **Cost tracking** - Know your spend before the bill arrives
- - **Observability** - Built-in tracing with OpenTelemetry support
- - **Persistence** - Save and restore contexts across processes
- - **Performance** - Swap JSON adapters and enable HTTP connection pooling
- - **Error handling** - Structured errors, not unpredictable exceptions
-
- #### Tracing
-
- llm.rb includes built-in tracers for local logging, OpenTelemetry, and
- LangSmith. Assign a tracer to a provider and all context requests and tool
- calls made through that provider will be instrumented. Tracers are local to
- the current fiber, so the same provider can use different tracers in different
- concurrent tasks without interfering with each other.
-
- Use the logger tracer when you want structured logs through Ruby's standard
- library:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
-
- ctx = LLM::Context.new(llm)
- ctx.talk("Hello")
- ```
-
- Use the telemetry tracer when you want OpenTelemetry spans. This requires the
- `opentelemetry-sdk` gem, and exporters such as OTLP can be added separately:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- llm.tracer = LLM::Tracer::Telemetry.new(llm)
-
- ctx = LLM::Context.new(llm)
- ctx.talk("Hello")
- pp llm.tracer.spans
- ```
-
- Use the LangSmith tracer when you want LangSmith-compatible metadata and trace
- grouping on top of the telemetry tracer:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- llm.tracer = LLM::Tracer::Langsmith.new(
-   llm,
-   metadata: {env: "dev"},
-   tags: ["chatbot"]
- )
-
- ctx = LLM::Context.new(llm)
- ctx.talk("Hello")
- ```
-
- #### Thread Safety
-
- llm.rb uses Ruby's `Monitor` class to ensure thread safety at the provider
- level, allowing you to share a single provider instance across multiple threads
- while maintaining state isolation through thread-local contexts. This design
- enables efficient resource sharing while preventing race conditions in
- concurrent applications:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- # Thread-safe providers - create once, use everywhere
- llm = LLM.openai(key: ENV["KEY"])
-
- # Each thread should have its own context for state isolation
- Thread.new do
-   ctx = LLM::Context.new(llm) # Thread-local context
-   ctx.talk("Hello from thread 1")
- end
-
- Thread.new do
-   ctx = LLM::Context.new(llm) # Thread-local context
-   ctx.talk("Hello from thread 2")
- end
- ```
-
- #### Performance Tuning
-
- llm.rb's JSON adapter system lets you swap JSON libraries for better
- performance in high-throughput applications. The library supports stdlib JSON,
- Oj, and Yajl, with Oj typically offering the best performance. Additionally,
- you can enable HTTP connection pooling using the optional `net-http-persistent`
- gem to reduce connection overhead in production environments:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- # Swap JSON libraries for better performance
- LLM.json = :oj # Use Oj for faster JSON parsing
-
- # Enable HTTP connection pooling for high-throughput applications
- llm = LLM.openai(key: ENV["KEY"]).persistent # Uses net-http-persistent when available
- ```
-
- #### Model Registry
-
- llm.rb includes a local model registry that provides metadata about model
- capabilities, pricing, and limits without requiring API calls. The registry is
- shipped with the gem and sourced from https://models.dev, giving you access to
- up-to-date information about context windows, token costs, and supported
- modalities for each provider:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- # Access model metadata, capabilities, and pricing
- registry = LLM.registry_for(:openai)
- model_info = registry.limit(model: "gpt-4.1")
- puts "Context window: #{model_info.context} tokens"
- puts "Cost: $#{model_info.cost.input}/1M input tokens"
- ```
-
- ## More Examples
-
- #### Responses API
-
- llm.rb also supports OpenAI's Responses API through `LLM::Context` with
- `mode: :responses`. The important switch is `store:`. With `store: false`, the
- Responses API stays stateless while still using the Responses endpoint, which
- is useful for models or features that are only available through the Responses
- API. With `store: true`, OpenAI can keep response state server-side and reduce
- how much conversation state needs to be sent on each turn:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- ctx = LLM::Context.new(llm, mode: :responses, store: false)
-
- ctx.talk("Your task is to answer the user's questions", role: :developer)
- res = ctx.talk("What is the capital of France?")
- puts res.content
- ```
-
- #### Context Persistence: Vanilla
-
- Contexts can be serialized and restored across process boundaries. A context
- can be serialized to JSON and stored on disk, in a database, in a job queue,
- or anywhere else your application needs to persist state:
-
- ```ruby
- #!/usr/bin/env ruby
- require "llm"
-
- llm = LLM.openai(key: ENV["KEY"])
- ctx = LLM::Context.new(llm)
- ctx.talk("Hello")
- ctx.talk("Remember that my favorite language is Ruby")
-
- # Serialize to a string when you want to store the context yourself,
- # for example in a database row or job payload.
- payload = ctx.to_json
-
- restored = LLM::Context.new(llm)
- restored.restore(string: payload)
- res = restored.talk("What is my favorite language?")
- puts res.content
-
- # You can also persist the same state to a file:
- ctx.save(path: "context.json")
- restored = LLM::Context.new(llm)
- restored.restore(path: "context.json")
- ```
-
- #### Context Persistence: ActiveRecord (Rails)
-
- In a Rails application, you can also wrap persisted context state in an
- ActiveRecord model. A minimal schema would include a `snapshot` column for the
- serialized context payload (`jsonb` is recommended) and a `provider` column
- for the provider name:
-
- ```ruby
- create_table :contexts do |t|
-   t.jsonb :snapshot
-   t.string :provider, null: false
-   t.timestamps
- end
- ```
-
- For example:
-
- ```ruby
- class Context < ApplicationRecord
-   def talk(...)
-     ctx.talk(...).tap { flush }
-   end
-
-   def wait(...)
-     ctx.wait(...).tap { flush }
-   end
-
-   def messages
-     ctx.messages
-   end
-
-   def model
-     ctx.model
-   end
-
-   def flush
-     update_column(:snapshot, ctx.to_json)
-   end
-
-   private
-
-   def ctx
-     @ctx ||= begin
-       ctx = LLM::Context.new(llm)
-       ctx.restore(string: snapshot) if snapshot
-       ctx
-     end
-   end
-
-   def llm
-     LLM.method(provider).call(key: ENV.fetch(key))
-   end
+ ## Resources
 
-   def key
-     "#{provider.upcase}_KEY"
-   end
- end
- ```
-
- #### Agents
-
- Agents in llm.rb are reusable, preconfigured assistants that automatically
- execute tool calls and maintain conversation state. Unlike contexts which
- require manual tool execution, agents automatically handle the tool call loop,
649
- making them ideal for autonomous workflows where you want the LLM to
650
- independently use available tools to accomplish tasks:
651
-
652
- ```ruby
653
- #!/usr/bin/env ruby
654
- require "llm"
655
-
656
- class SystemAdmin < LLM::Agent
657
- model "gpt-4.1"
658
- instructions "You are a Linux system admin"
659
- tools Shell
660
- schema Result
661
- end
662
-
663
- llm = LLM.openai(key: ENV["KEY"])
664
- agent = SystemAdmin.new(llm)
665
- res = agent.talk("Run 'date'")
666
- ```
667
-
668
- #### Cost Tracking
669
-
670
- llm.rb provides built-in cost estimation that works without making additional
671
- API calls. The cost tracking system uses the local model registry to calculate
672
- estimated costs based on token usage, giving you visibility into spending
673
- before bills arrive. This is particularly useful for monitoring usage in
674
- production applications and setting budget alerts:
675
-
676
- ```ruby
677
- #!/usr/bin/env ruby
678
- require "llm"
679
-
680
- llm = LLM.openai(key: ENV["KEY"])
681
- ctx = LLM::Context.new(llm)
682
- ctx.talk "Hello"
683
- puts "Estimated cost so far: $#{ctx.cost}"
684
- ctx.talk "Tell me a joke"
685
- puts "Estimated cost so far: $#{ctx.cost}"
686
- ```
687
-
688
- #### Multimodal Prompts
689
-
690
- Contexts provide helpers for composing multimodal prompts from URLs, local
691
- files, and provider-managed remote files. These tagged objects let providers
692
- adapt the input into the format they expect:
693
-
694
- ```ruby
695
- #!/usr/bin/env ruby
696
- require "llm"
697
-
698
- llm = LLM.openai(key: ENV["KEY"])
699
- ctx = LLM::Context.new(llm)
700
-
701
- res = ctx.talk ["Describe this image", ctx.image_url("https://example.com/cat.jpg")]
702
- puts res.content
703
- ```
704
-
705
- #### Audio Generation
706
-
707
- llm.rb supports OpenAI's audio API for text-to-speech generation, allowing you
708
- to create speech from text with configurable voices and output formats. The
709
- audio API returns binary audio data that can be streamed directly to files or
710
- other IO objects, enabling integration with multimedia applications:
711
-
712
- ```ruby
713
- #!/usr/bin/env ruby
714
- require "llm"
715
-
716
- llm = LLM.openai(key: ENV["KEY"])
717
- res = llm.audio.create_speech(input: "Hello world")
718
- IO.copy_stream res.audio, File.join(Dir.home, "hello.mp3")
719
- ```
720
-
721
- #### Image Generation
722
-
723
- llm.rb provides access to OpenAI's DALL-E image generation API through a
724
- unified interface. The API supports multiple response formats including
725
- base64-encoded images and temporary URLs, with automatic handling of binary
726
- data streaming for efficient file operations:
727
-
728
- ```ruby
729
- #!/usr/bin/env ruby
730
- require "llm"
731
-
732
- llm = LLM.openai(key: ENV["KEY"])
733
- res = llm.images.create(prompt: "a dog on a rocket to the moon")
734
- IO.copy_stream res.images[0], File.join(Dir.home, "dogonrocket.png")
735
- ```
736
-
737
- #### Embeddings
738
-
739
- llm.rb's embedding API generates vector representations of text for semantic
740
- search and retrieval-augmented generation (RAG) workflows. The API supports
741
- batch processing of multiple inputs and returns normalized vectors suitable for
742
- vector similarity operations, with consistent dimensionality across providers:
743
-
744
- ```ruby
745
- #!/usr/bin/env ruby
746
- require "llm"
747
-
748
- llm = LLM.openai(key: ENV["KEY"])
749
- res = llm.embed(["programming is fun", "ruby is a programming language", "sushi is art"])
750
- puts res.class
751
- puts res.embeddings.size
752
- puts res.embeddings[0].size
753
-
754
- # LLM::Response
755
- # 3
756
- # 1536
757
- ```
758
-
759
- ## Real-World Example: Relay
760
-
761
- See how these pieces come together in a complete application architecture with
762
- [Relay](https://github.com/llmrb/relay), a production-ready LLM application
763
- built on llm.rb that demonstrates:
764
-
765
- - Context management across requests
766
- - Tool composition and execution
767
- - Concurrent workflows
768
- - Cost tracking and observability
769
- - Production deployment patterns
770
-
771
- Watch the screencast:
772
-
773
- [![Watch the llm.rb screencast](https://img.youtube.com/vi/Jb7LNUYlCf4/maxresdefault.jpg)](https://www.youtube.com/watch?v=x1K4wMeO_QA)
774
-
775
- ## Installation
776
-
777
- ```bash
778
- gem install llm.rb
779
- ```
158
+ - [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) is the
159
+ examples guide.
160
+ - [_examples/relay](./_examples/relay) shows a real application built on top
161
+ of llm.rb.
162
+ - [doc site](https://0x1eef.github.io/x/llm.rb?rebuild=1) has the API docs.
780
163
 
781
164
  ## License
782
165
 
data/lib/llm/context.rb CHANGED
@@ -103,9 +103,9 @@ module LLM
 # res = ctx.respond("What is the capital of France?")
 # puts res.output_text
 def respond(prompt, params = {})
- res_id = @messages.find(&:assistant?)&.response&.response_id
- params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
 params = @params.merge(params)
+ res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
+ params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
 res = @llm.responses.create(prompt, params)
 role = params[:role] || @llm.user_role
 @messages.concat LLM::Prompt === prompt ? prompt.to_a : [LLM::Message.new(role, prompt)]
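The hunk above reorders the merge so a context-level `store: false` is visible before deciding whether to replay `previous_response_id`. A hypothetical standalone sketch of that selection logic (not the gem's API; `last_response_id` stands in for `@messages.find(&:assistant?)&.response&.response_id`):

```ruby
# Sketch: merge defaults first, then suppress previous_response_id
# whenever the merged params opt out of server-side storage.
def response_params(defaults, overrides, last_response_id)
  params = defaults.merge(overrides)
  res_id = params[:store] == false ? nil : last_response_id
  params.merge(previous_response_id: res_id).compact
end

stateless = response_params({store: false}, {}, "resp_123")
stateful  = response_params({}, {}, "resp_123")
```

With the old order, defaults merged after the check could not influence `res_id`; merging first makes the stateless mode reliable.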
data/lib/llm/function.rb CHANGED
@@ -257,7 +257,7 @@ class LLM::Function
 when "LLM::OpenAI::Responses"
 {
 type: "function", name: @name, description: @description,
- parameters: @params.to_h.merge(additionalProperties: false), strict: true
+ parameters: (@params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false), strict: false
 }.compact
 else
 {
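The change above gives tools without declared parameters an empty object schema instead of calling `to_h` on `nil`, and relaxes strict mode. A hypothetical sketch of the fallback, using plain hashes in place of the gem's schema objects:

```ruby
# Sketch: a tool that declares no parameters still needs a valid
# (empty) JSON schema object for the Responses API tool definition.
def responses_parameters(params)
  (params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false)
end
```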
data/lib/llm/mcp/error.rb CHANGED
@@ -1,7 +1,7 @@
 # frozen_string_literal: true

 class LLM::MCP
- class Error < LLM::Error
+ Error = Class.new(LLM::Error) do
 attr_reader :code, :data

 ##
@@ -27,5 +27,35 @@ class LLM::MCP
 end
 end

+ MismatchError = Class.new(Error) do
+ ##
+ # @return [Integer, String]
+ # The request id the client was waiting for
+ attr_reader :expected_id
+
+ ##
+ # @return [Integer, String]
+ # The response id received from the server
+ attr_reader :actual_id
+
+ ##
+ # @param [Integer, String] expected_id
+ # The request id the client was waiting for
+ # @param [Integer, String] actual_id
+ # The response id received from the server instead
+ def initialize(expected_id:, actual_id:)
+ @expected_id = expected_id
+ @actual_id = actual_id
+ super(message)
+ end
+
+ ##
+ # @return [String]
+ def message
+ "mismatched MCP response id #{actual_id.inspect} " \
+ "while waiting for #{expected_id.inspect}"
+ end
+ end
+
 TimeoutError = Class.new(Error)
 end
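The new `MismatchError` carries both ids and formats its own message. A standalone replica of that behavior, with a local `Error` base class standing in for `LLM::Error`:

```ruby
# Standalone sketch of LLM::MCP::MismatchError: the attributes are set
# before super(message) runs, so the formatted message is available
# from the moment the exception is constructed.
class Error < StandardError; end

class MismatchError < Error
  attr_reader :expected_id, :actual_id

  def initialize(expected_id:, actual_id:)
    @expected_id = expected_id
    @actual_id = actual_id
    super(message)
  end

  def message
    "mismatched MCP response id #{actual_id.inspect} " \
    "while waiting for #{expected_id.inspect}"
  end
end

err = MismatchError.new(expected_id: 1, actual_id: 2)
puts err.message
# => mismatched MCP response id 2 while waiting for 1
```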
data/lib/llm/mcp/rpc.rb CHANGED
@@ -53,11 +53,14 @@ class LLM::MCP
 poll(timeout:, ex: [IO::WaitReadable]) do
 loop do
 res = transport.read_nonblock
- next unless res["id"] == id
- if res["error"]
+ if res["id"] == id && res["error"]
 raise LLM::MCP::Error.from(response: res)
- else
+ elsif res["id"] == id
 break res["result"]
+ elsif res["method"]
+ next
+ elsif res.key?("id")
+ raise LLM::MCP::MismatchError.new(expected_id: id, actual_id: res["id"])
 end
 end
 end
@@ -101,6 +104,8 @@ class LLM::MCP
 # The exceptions to retry when raised
 # @yield
 # The block to run
+ # @raise [LLM::MCP::MismatchError]
+ # When an unrelated response id is received while waiting
 # @raise [LLM::MCP::TimeoutError]
 # When the block takes longer than the timeout
 # @return [Object]
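The rewritten read loop above distinguishes four cases instead of silently skipping everything with the wrong id. A hypothetical sketch of the dispatch as a pure function over one parsed JSON-RPC message (names are mine, not the gem's):

```ruby
# Sketch: given one decoded message and the request id we are waiting
# for, decide what the read loop should do. Server-initiated
# notifications (which carry "method") are skipped rather than treated
# as a protocol violation.
def classify(res, id)
  if res["id"] == id && res["error"] then :error
  elsif res["id"] == id              then :result
  elsif res["method"]                then :skip     # notification
  elsif res.key?("id")               then :mismatch # unrelated response id
  else :skip
  end
end
```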
data/lib/llm/mcp.rb CHANGED
@@ -121,6 +121,34 @@ class LLM::MCP
 res["tools"].map { LLM::Tool.mcp(self, _1) }
 end

+ ##
+ # Returns the prompts provided by the MCP process.
+ # @return [Array<LLM::Object>]
+ def prompts
+ res = call(transport, "prompts/list")
+ LLM::Object.from(res["prompts"])
+ end
+
+ ##
+ # Returns a prompt by name.
+ # @param [String] name The prompt name
+ # @param [Hash<String, String>, nil] arguments The prompt arguments
+ # @return [LLM::Object]
+ def find_prompt(name:, arguments: nil)
+ params = {name:}
+ params[:arguments] = arguments if arguments
+ res = call(transport, "prompts/get", params)
+ res["messages"] = [*res["messages"]].map do |message|
+ LLM::Message.new(
+ message["role"],
+ adapt_content(message["content"]),
+ {original_content: message["content"]}
+ )
+ end
+ LLM::Object.from(res)
+ end
+ alias_method :get_prompt, :find_prompt
+
 ##
 # Calls a tool by name with the given arguments
 # @param [String] name The name of the tool to call
@@ -135,6 +163,19 @@ class LLM::MCP

 attr_reader :llm, :command, :transport, :timeout

+ def adapt_content(content)
+ case content
+ when String
+ content
+ when Hash
+ content["type"] == "text" ? content["text"].to_s : LLM::Object.from(content)
+ when Array
+ content.map { adapt_content(_1) }
+ else
+ content
+ end
+ end
+
 def adapt_tool_result(result)
 if result["structuredContent"]
 result["structuredContent"]
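The new `adapt_content` helper normalizes MCP prompt message content before it is wrapped in `LLM::Message`. A standalone sketch of the same recursion, simplified so non-text hashes pass through unchanged (the real method wraps them in `LLM::Object`):

```ruby
# Sketch: MCP prompt content may be a string, a text part, an arbitrary
# content part, or an array of parts. Text parts collapse to plain
# strings; arrays recurse; everything else is preserved as-is here.
def adapt_content(content)
  case content
  when String then content
  when Hash
    content["type"] == "text" ? content["text"].to_s : content
  when Array then content.map { adapt_content(_1) }
  else content
  end
end

adapt_content([{"type" => "text", "text" => "hi"}, "raw"])
# => ["hi", "raw"]
```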
@@ -15,6 +15,8 @@ module LLM::OpenAI::RequestAdapter
 catch(:abort) do
 if Hash === message
 {role: message[:role], content: adapt_content(message[:content])}
+ elsif message.tool_call?
+ message.extra[:original_tool_calls]
 else
 adapt_message
 end
@@ -23,12 +25,12 @@ module LLM::OpenAI::RequestAdapter

 private

- def adapt_content(content)
+ def adapt_content(content, role: message.role)
 case content
 when String
- [{type: :input_text, text: content.to_s}]
+ [{type: text_content_type(role), text: content.to_s}]
 when LLM::Response then adapt_remote_file(content)
- when LLM::Message then adapt_content(content.content)
+ when LLM::Message then adapt_content(content.content, role: content.role)
 when LLM::Object
 case content.kind
 when :image_url then [{type: :image_url, image_url: {url: content.value.to_s}}]
@@ -46,7 +48,7 @@ module LLM::OpenAI::RequestAdapter
 when Array
 adapt_array
 else
- {role: message.role, content: adapt_content(content)}
+ {role: message.role, content: adapt_content(content, role: message.role)}
 end
 end

@@ -56,7 +58,7 @@ module LLM::OpenAI::RequestAdapter
 elsif returns.any?
 returns.map { {type: "function_call_output", call_id: _1.id, output: LLM.json.dump(_1.value)} }
 else
- {role: message.role, content: content.flat_map { adapt_content(_1) }}
+ {role: message.role, content: content.flat_map { adapt_content(_1, role: message.role) }}
 end
 end

@@ -83,5 +85,9 @@ module LLM::OpenAI::RequestAdapter
 def message = @message
 def content = message.content
 def returns = content.grep(LLM::Function::Return)
+
+ def text_content_type(role)
+ role.to_s == "assistant" ? :output_text : :input_text
+ end
 end
 end
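The role now threads through `adapt_content` because the Responses API labels text parts by author: replayed assistant turns must be `output_text`, while user and developer turns remain `input_text`. The helper is small enough to replicate verbatim:

```ruby
# Sketch of text_content_type: the part type depends on who authored
# the text, so replayed assistant history round-trips correctly.
def text_content_type(role)
  role.to_s == "assistant" ? :output_text : :input_text
end
```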
@@ -60,6 +60,13 @@ module LLM::OpenAI::ResponseAdapter
 body.model
 end

+ ##
+ # OpenAI's Responses API does not expose a system fingerprint.
+ # @return [nil]
+ def system_fingerprint
+ nil
+ end
+
 ##
 # Returns the aggregated text content from the response outputs.
 # @return [String]
@@ -88,10 +95,15 @@ module LLM::OpenAI::ResponseAdapter
 private

 def adapt_message
- message = LLM::Message.new("assistant", +"", {response: self, tool_calls: [], reasoning_content: +""})
+ message = LLM::Message.new(
+ "assistant",
+ +"",
+ {response: self, tool_calls: [], original_tool_calls: [], reasoning_content: +""}
+ )
 output.each do |choice|
 if choice.type == "function_call"
 message.extra[:tool_calls] << adapt_tool(choice)
+ message.extra[:original_tool_calls] << choice
 elsif choice.type == "reasoning"
 (choice.summary || []).each do |summary|
 next unless summary["type"] == "summary_text"
@@ -43,11 +43,19 @@ class LLM::OpenAI
 @body[k] = v
 end
 @body["output"] ||= []
+ when "response.in_progress", "response.completed"
+ response = chunk["response"] || {}
+ response.each do |k, v|
+ next if k == "output" && @body["output"].is_a?(Array) && @body["output"].any?
+ @body[k] = v
+ end
+ @body["output"] ||= response["output"] || []
 when "response.output_item.added"
 output_index = chunk["output_index"]
 item = chunk["item"]
 @body["output"][output_index] = item
 @body["output"][output_index]["content"] ||= []
+ @body["output"][output_index]["summary"] ||= [] if item["type"] == "reasoning"
 when "response.content_part.added"
 output_index = chunk["output_index"]
 content_index = chunk["content_index"]
@@ -55,6 +63,25 @@ class LLM::OpenAI
 @body["output"][output_index] ||= {"content" => []}
 @body["output"][output_index]["content"] ||= []
 @body["output"][output_index]["content"][content_index] = part
+ when "response.reasoning_summary_text.delta"
+ output_item = @body["output"][chunk["output_index"]]
+ if output_item && output_item["type"] == "reasoning"
+ summary_index = chunk["summary_index"] || 0
+ output_item["summary"] ||= []
+ output_item["summary"][summary_index] ||= {"type" => "summary_text", "text" => +""}
+ output_item["summary"][summary_index]["text"] << chunk["delta"]
+ emit_reasoning_content(chunk["delta"])
+ end
+ when "response.reasoning_summary_text.done"
+ output_item = @body["output"][chunk["output_index"]]
+ if output_item && output_item["type"] == "reasoning"
+ summary_index = chunk["summary_index"] || 0
+ output_item["summary"] ||= []
+ output_item["summary"][summary_index] = {
+ "type" => "summary_text",
+ "text" => chunk["text"]
+ }
+ end
 when "response.output_text.delta"
 output_index = chunk["output_index"]
 content_index = chunk["content_index"]
@@ -102,6 +129,10 @@ class LLM::OpenAI
 end
 end

+ def emit_reasoning_content(value)
+ @stream.on_reasoning_content(value) if @stream.respond_to?(:on_reasoning_content)
+ end
+
 def emit_tool(index, tool)
 return unless @stream.respond_to?(:on_tool_call)
 return unless complete_tool?(tool)
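The reasoning-summary branches above fold streamed deltas into a `summary_text` entry on the reasoning output item. A hypothetical sketch of the delta case in isolation, with plain hashes standing in for parsed stream chunks:

```ruby
# Sketch: append a reasoning summary delta to the right summary slot,
# creating the slot lazily on first delta, and ignoring deltas that
# target a non-reasoning output item.
def apply_summary_delta(item, chunk)
  return item unless item["type"] == "reasoning"
  i = chunk["summary_index"] || 0
  item["summary"] ||= []
  item["summary"][i] ||= {"type" => "summary_text", "text" => +""}
  item["summary"][i]["text"] << chunk["delta"]
  item
end

item = {"type" => "reasoning"}
apply_summary_delta(item, {"delta" => "Think"})
apply_summary_delta(item, {"delta" => "ing..."})
```

The matching `.done` event then replaces the accumulated text with the final summary, so out-of-order or dropped deltas cannot corrupt the finished item.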
data/lib/llm/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true

 module LLM
- VERSION = "4.12.0"
+ VERSION = "4.13.0"
 end
data/llm.gemspec CHANGED
@@ -11,12 +11,22 @@ Gem::Specification.new do |spec|
 spec.summary = "System integration layer for LLMs, tools, MCP, and APIs in Ruby."

 spec.description = <<~DESCRIPTION
- llm.rb is a Ruby-centric system integration layer for building LLM-powered
- systems. It connects LLMs to real systems by turning APIs into tools and
- unifying MCP, providers, contexts, and application logic in one execution
- model. It supports explicit tool orchestration, concurrent execution,
- streaming, multiple MCP sources, and multiple LLM providers for production
- systems that integrate external and internal services.
+ llm.rb is a runtime for building AI systems that integrate directly with your
+ application. It is not just an API wrapper. It provides a unified execution
+ model for providers, tools, MCP servers, streaming, schemas, files, and
+ state.
+
+ It is built for engineers who want control over how these systems run.
+ llm.rb stays close to Ruby, runs on the standard library by default, loads
+ optional pieces only when needed, and remains easy to extend. It also works
+ well in Rails or ActiveRecord applications, where a small wrapper around
+ context persistence is enough to save and restore long-lived conversation
+ state across requests, jobs, or retries.
+
+ Most LLM libraries stop at request/response APIs. Building real systems
+ means stitching together streaming, tools, state, persistence, and external
+ services by hand. llm.rb provides a single execution model for all of these,
+ so they compose naturally instead of becoming separate subsystems.
 DESCRIPTION

 spec.license = "0BSD"
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm.rb
 version: !ruby/object:Gem::Version
- version: 4.12.0
+ version: 4.13.0
 platform: ruby
 authors:
 - Antar Azri
@@ -195,12 +195,22 @@ dependencies:
 - !ruby/object:Gem::Version
 version: '1.7'
 description: |
- llm.rb is a Ruby-centric system integration layer for building LLM-powered
- systems. It connects LLMs to real systems by turning APIs into tools and
- unifying MCP, providers, contexts, and application logic in one execution
- model. It supports explicit tool orchestration, concurrent execution,
- streaming, multiple MCP sources, and multiple LLM providers for production
- systems that integrate external and internal services.
+ llm.rb is a runtime for building AI systems that integrate directly with your
+ application. It is not just an API wrapper. It provides a unified execution
+ model for providers, tools, MCP servers, streaming, schemas, files, and
+ state.
+
+ It is built for engineers who want control over how these systems run.
+ llm.rb stays close to Ruby, runs on the standard library by default, loads
+ optional pieces only when needed, and remains easy to extend. It also works
+ well in Rails or ActiveRecord applications, where a small wrapper around
+ context persistence is enough to save and restore long-lived conversation
+ state across requests, jobs, or retries.
+
+ Most LLM libraries stop at request/response APIs. Building real systems
+ means stitching together streaming, tools, state, persistence, and external
+ services by hand. llm.rb provides a single execution model for all of these,
+ so they compose naturally instead of becoming separate subsystems.
 email:
 - azantar@proton.me
 - 0x1eef@hardenedbsd.org