llm.rb 4.12.0 → 4.14.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
data/README.md CHANGED
@@ -4,155 +4,155 @@
4
4
  <p align="center">
5
5
  <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
6
6
  <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
7
- <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.12.0-green.svg?" alt="Version"></a>
7
+ <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.14.0-green.svg?" alt="Version"></a>
8
8
  </p>
9
9
 
10
10
  ## About
11
11
 
12
- llm.rb is a Ruby-centric system integration layer for building real
13
- LLM-powered systems. It connects LLMs to real systems by turning APIs into
14
- tools and unifying MCP, providers, and application logic into a single
15
- execution model. It is used in production systems integrating external and
16
- internal tools, including agents, MCP services, and OpenAPI-based APIs.
12
+ llm.rb is a runtime for building AI systems that integrate directly with your
13
+ application. It is not just an API wrapper. It provides a unified execution
14
+ model for providers, tools, MCP servers, streaming, schemas, files, and
15
+ state.
17
16
 
18
- Built for engineers who want to understand and control their LLM systems. No
19
- frameworks, no hidden magic, just composable primitives for building real
20
- applications, from scripts to full systems like [Relay](https://github.com/llmrb/relay).
17
+ It is built for engineers who want control over how these systems run. llm.rb
18
+ stays close to Ruby, runs on the standard library by default, loads optional
19
+ pieces only when needed, and remains easy to extend. It also works well in
20
+ Rails or ActiveRecord applications, where a small wrapper around context
21
+ persistence is enough to save and restore long-lived conversation state across
22
+ requests, jobs, or retries.
21
23
 
22
- Jump to [Quick start](#quick-start), discover its [capabilities](#capabilities), read about
23
- its [architecture](#architecture--execution-model) or watch the
24
- [Screencast](https://www.youtube.com/watch?v=x1K4wMeO_QA) for a deep dive into the design
25
- and capabilities of llm.rb.
24
+ Most LLM libraries stop at request/response APIs. Building real systems means
25
+ stitching together streaming, tools, state, persistence, and external
26
+ services by hand. llm.rb provides a single execution model for all of these,
27
+ so they compose naturally instead of becoming separate subsystems.
26
28
 
27
- ## What Makes It Different
29
+ ## Architecture
28
30
 
29
- Most LLM libraries stop at requests and responses. <br>
30
- llm.rb is built around the state and execution model behind them:
31
-
32
- - **A system layer, not just an API wrapper** <br>
33
- llm.rb unifies LLMs, tools, MCP servers, and application APIs into a single execution model.
34
- - **Contexts are central** <br>
35
- They hold history, tools, schema, usage, cost, persistence, and execution state.
36
- - **Contexts can be serialized** <br>
37
- A context can be serialized to JSON and stored on disk, in a database, in a
38
- job queue, or anywhere else your application needs to persist state.
39
- - **Tool execution is explicit** <br>
40
- Run local, provider-native, and MCP tools sequentially or concurrently with threads, fibers, or async tasks.
41
- - **Run tools while streaming** <br>
42
- Start tool work while a response is still streaming instead of waiting for the turn to finish. <br>
43
- This overlaps tool latency with model output and exposes streamed tool-call events for introspection, making it one of llm.rb's strongest execution features.
44
- - **HTTP MCP can reuse connections** <br>
45
- Opt into persistent HTTP pooling for repeated remote MCP tool calls with `persistent`.
46
- - **One API across providers and capabilities** <br>
47
- The same model covers chat, files, images, audio, embeddings, vector stores, and more.
48
- - **Thread-safe where it matters** <br>
49
- Providers are shareable, while contexts stay isolated and stateful.
50
- - **Local metadata, fewer extra API calls** <br>
51
- A built-in registry provides model capabilities, limits, pricing, and cost estimation.
52
- - **Stdlib-only by default** <br>
53
- llm.rb runs on the Ruby standard library by default, with providers, optional features, and the model registry loaded only when you use them.
54
-
55
- ## What llm.rb Enables
56
-
57
- llm.rb acts as the integration layer between LLMs, tools, and real systems.
58
-
59
- - Turn REST / OpenAPI APIs into LLM tools
60
- - Connect multiple MCP sources (Notion, internal services, etc.)
61
- - Build agents that operate across system boundaries
62
- - Orchestrate tools from multiple providers and protocols
63
- - Stream responses while executing tools concurrently
64
- - Treat LLMs as part of your architecture, not isolated calls
65
-
66
- Without llm.rb, providers, tool formats, and orchestration paths tend to stay
67
- fragmented. With llm.rb, they share a unified execution model with composable
68
- tools and a more consistent system architecture.
69
-
70
- ## Real-World Usage
71
-
72
- llm.rb is used to integrate external MCP services such as Notion, internal APIs
73
- exposed via OpenAPI or `swagger.json`, and multiple tool sources into a unified
74
- execution model. Common usage patterns include combining multiple MCP sources,
75
- turning internal APIs into tools, and running those tools through the same
76
- context and provider flow.
77
-
78
- It supports multiple MCP sources, external SaaS integrations, internal APIs via
79
- OpenAPI, and multiple LLM providers simultaneously.
80
-
81
- ## Architecture & Execution Model
82
-
83
- llm.rb sits at the center of the execution path, connecting tools, MCP
84
- sources, APIs, providers, and your application through explicit contexts:
85
-
86
- ```
87
- External MCP Internal MCP OpenAPI / REST
88
- │ │ │
89
- └────────── Tools / MCP Layer ──────────┘
90
-
91
- llm.rb Contexts
92
-
93
- LLM Providers
94
- (OpenAI, Anthropic, etc.)
95
-
96
- Your Application
97
- ```
98
-
99
- ### Key Design Decisions
31
+ <p align="center">
32
+ <img src="https://github.com/llmrb/llm.rb/raw/main/resources/architecture.png" alt="llm.rb architecture" width="790">
33
+ </p>
100
34
 
101
- - **Thread-safe providers** - `LLM::Provider` instances are safe to share across threads
102
- - **Thread-local contexts** - `LLM::Context` should generally be kept thread-local
103
- - **Lazy loading** - Providers, optional features, and the model registry load on demand
104
- - **JSON adapter system** - Swap JSON libraries (JSON/Oj/Yajl) for performance
105
- - **Registry system** - Local metadata for model capabilities, limits, and pricing
106
- - **Provider adaptation** - Normalizes differences between OpenAI, Anthropic, Google, and other providers
107
- - **Structured tool execution** - Errors are captured and returned as data, not raised unpredictably
108
- - **Function vs Tool APIs** - Choose between class-based tools and closure-based functions
35
+ ## Core Concept
36
+
37
+ `LLM::Context` is the execution boundary in llm.rb.
38
+
39
+ It holds:
40
+ - message history
41
+ - tool state
42
+ - schemas
43
+ - streaming configuration
44
+ - usage and cost tracking
45
+
46
+ Instead of switching abstractions for each feature, everything builds on the
47
+ same context object.
48
+
49
+ ## Differentiators
50
+
51
+ ### Execution Model
52
+
53
+ - **A system layer, not just an API wrapper**
54
+ Put providers, tools, MCP servers, and application APIs behind one runtime
55
+ model instead of stitching them together by hand.
56
+ - **Contexts are central**
57
+ Keep history, tools, schema, usage, persistence, and execution state in one
58
+ place instead of spreading them across your app.
59
+ - **Contexts can be serialized**
60
+ Save and restore live state for jobs, databases, retries, or long-running
61
+ workflows.
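
The save/restore pattern behind that bullet can be sketched in plain Ruby with the standard library's JSON. This is a hypothetical, minimal model of the idea, not llm.rb's actual API: the real `LLM::Context` serializes richer state (tools, schema, usage, and more), but the round-trip shape is the same.

```ruby
require "json"

# Hypothetical stand-in for serialized context state; the names
# `ContextState`, `messages`, and `usage` are illustrative only.
ContextState = Struct.new(:messages, :usage, keyword_init: true) do
  def to_json(*)
    JSON.generate({messages: messages, usage: usage})
  end

  def self.restore(string)
    data = JSON.parse(string, symbolize_names: true)
    new(messages: data[:messages], usage: data[:usage])
  end
end

ctx = ContextState.new(messages: [{role: "user", content: "Hello"}], usage: {tokens: 12})
payload = ctx.to_json                    # store in a DB row, job payload, or file
restored = ContextState.restore(payload) # later, in another process or job
puts restored.messages.first[:content]
```

Because the payload is ordinary JSON, it can live anywhere your application already persists data: a database column, a queue message, or a file on disk.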
62
+
63
+ ### Runtime Behavior
64
+
65
+ - **Streaming and tool execution work together**
66
+ Start tool work while output is still streaming so you can hide latency
67
+ instead of waiting for turns to finish.
68
+ - **Requests can be interrupted cleanly**
69
+ Stop in-flight provider work through the same runtime instead of treating
70
+ cancellation as a separate concern. `LLM::Context#cancel!` is inspired by
71
+ Go's context cancellation model.
72
+ - **Concurrency is a first-class feature**
73
+ Use threads, fibers, or async tasks without rewriting your tool layer.
74
+ - **Advanced workloads are built in, not bolted on**
75
+ Streaming, concurrent tool execution, persistence, tracing, and MCP support
76
+ all fit the same runtime model.
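
The latency overlap described above can be sketched with plain Ruby threads, independent of llm.rb's API. The idea: start the tool as soon as its call appears in the stream, keep emitting output chunks, and collect the tool result afterwards. `slow_tool` and the timings here are purely illustrative.

```ruby
# Hypothetical slow tool; sleep stands in for real I/O latency.
def slow_tool
  sleep 0.2
  {success: true}
end

chunks = ["The current ", "date is ", "being fetched..."]
tool_thread = Thread.new { slow_tool }  # tool work starts immediately

output = +""
chunks.each do |chunk|                  # the "stream" continues meanwhile
  output << chunk
  sleep 0.05
end

result = tool_thread.value              # tool latency overlapped with streaming
puts output
p result
```

The tool's 0.2 s of latency is hidden behind the 0.15 s of streaming instead of being added after it, which is the effect llm.rb's streamed tool execution generalizes.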
77
+
78
+ ### Integration
79
+
80
+ - **MCP is built in**
81
+ Connect to MCP servers over stdio or HTTP without bolting on a separate
82
+ integration stack.
83
+ - **Provider support is broad**
84
+ Work with OpenAI, OpenAI-compatible endpoints, Anthropic, Google, DeepSeek,
85
+ Z.ai, xAI, llama.cpp, and Ollama through the same runtime.
86
+ - **Tools are explicit**
87
+ Run local tools, provider-native tools, and MCP tools through the same path
88
+ with fewer special cases.
89
+ - **Providers are normalized, not flattened**
90
+ Share one API surface across providers without losing access to
91
+ provider-specific capabilities where they matter.
92
+ - **Responses keep a uniform shape**
93
+ Provider calls return
94
+ [`LLM::Response`](https://0x1eef.github.io/x/llm.rb/LLM/Response.html)
95
+ objects that share a common base shape and gain endpoint- or
96
+ provider-specific behavior when needed.
97
+ - **Low-level access is still there**
98
+ Normalized responses still keep the raw `Net::HTTPResponse` available when
99
+ you need headers, status, or other HTTP details.
100
+ - **Local model metadata is included**
101
+ Model capabilities, pricing, and limits are available locally without extra
102
+ API calls.
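
A local metadata lookup of this kind reduces to simple arithmetic over a shipped data table. The sketch below uses a hypothetical registry entry and made-up prices; llm.rb's real registry ships with the gem and carries per-model data sourced from models.dev.

```ruby
# Hypothetical local registry; model name and prices are illustrative.
# Costs are expressed in dollars per 1M tokens, as is conventional.
REGISTRY = {
  "example-model" => {context: 128_000, cost: {input: 2.50, output: 10.00}}
}

def estimate_cost(model, input_tokens:, output_tokens:)
  cost = REGISTRY.fetch(model)[:cost]
  (input_tokens * cost[:input] + output_tokens * cost[:output]) / 1_000_000.0
end

# Entirely local: no API call is made to produce the estimate.
puts estimate_cost("example-model", input_tokens: 1_000, output_tokens: 500)
```

With 1,000 input tokens at $2.50/1M and 500 output tokens at $10.00/1M, the estimate is $0.0075, computed without touching the network.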
103
+
104
+ ### Design Philosophy
105
+
106
+ - **Runs on the stdlib**
107
+ Start with Ruby's standard library and add extra dependencies only when you
108
+ need them.
109
+ - **It is highly pluggable**
110
+ Add tools, swap providers, change JSON backends, plug in tracing, or layer
111
+ internal APIs and MCP servers into the same execution path.
112
+ - **It scales from scripts to long-lived systems**
113
+ The same primitives work for one-off scripts, background jobs, and more
114
+ demanding application workloads with streaming, persistence, and tracing.
115
+ - **Thread boundaries are clear**
116
+ Providers are shareable. Contexts are stateful and should stay thread-local.
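
The boundary in that last bullet can be sketched with stdlib `MonitorMixin`: one synchronized object shared across threads, one private stateful object per thread. `FakeProvider` and `FakeContext` are hypothetical stand-ins, not llm.rb classes.

```ruby
require "monitor"

# Shared, synchronized stand-in for a provider.
class FakeProvider
  include MonitorMixin

  def complete(prompt)
    synchronize { "echo: #{prompt}" }  # Monitor makes sharing safe
  end
end

# Stateful stand-in for a context; one instance per thread.
FakeContext = Struct.new(:provider, :history) do
  def talk(text)
    reply = provider.complete(text)
    history << text << reply
    reply
  end
end

provider = FakeProvider.new              # created once, shared everywhere
threads = 2.times.map do |i|
  Thread.new do
    ctx = FakeContext.new(provider, [])  # thread-local, never shared
    ctx.talk("Hello from thread #{i}")
    ctx.history
  end
end
p threads.map(&:value)
```

Each thread mutates only its own context's history, so no locking is needed at the context level; only the shared provider synchronizes.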
109
117
 
110
118
  ## Capabilities
111
119
 
112
- llm.rb provides a complete set of primitives for building LLM-powered systems:
113
-
114
120
  - **Chat & Contexts** — stateless and stateful interactions with persistence
115
- - **Streaming** — real-time responses across providers, including structured stream callbacks
116
- - **Reasoning Support** — full stream, message, and response support when providers expose reasoning
117
- - **Tool Calling** — define and execute functions with automatic orchestration
118
- - **Run Tools While Streaming** — begin tool work before the model finishes its turn
121
+ - **Context Serialization** — save and restore state across processes or time
122
+ - **Streaming** — visible output, reasoning output, tool-call events
123
+ - **Request Interruption** — stop in-flight provider work cleanly
124
+ - **Tool Calling** — class-based tools and closure-based functions
125
+ - **Run Tools While Streaming** — overlap model output with tool latency
119
126
  - **Concurrent Execution** — threads, async tasks, and fibers
120
- - **Agents** — reusable, preconfigured assistants with tool auto-execution
121
- - **Structured Outputs** — JSON schema-based responses
122
- - **MCP Support** — integrate external tool servers dynamically over stdio or HTTP
127
+ - **Agents** — reusable assistants with tool auto-execution
128
+ - **Structured Outputs** — JSON Schema-based responses
129
+ - **Responses API** — stateful response workflows where providers support them
130
+ - **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
123
131
  - **Multimodal Inputs** — text, images, audio, documents, URLs
124
- - **Audio** — text-to-speech, transcription, translation
132
+ - **Audio** — speech generation, transcription, translation
125
133
  - **Images** — generation and editing
126
134
  - **Files API** — upload and reference files in prompts
127
135
  - **Embeddings** — vector generation for search and RAG
128
- - **Vector Stores** — OpenAI-based retrieval workflows
129
- - **Cost Tracking** — estimate usage without API calls
136
+ - **Vector Stores** — retrieval workflows
137
+ - **Cost Tracking** — local cost estimation without extra API calls
130
138
  - **Observability** — tracing, logging, telemetry
131
139
  - **Model Registry** — local metadata for capabilities, limits, pricing
140
+ - **Persistent HTTP** — optional connection pooling for providers and MCP
132
141
 
133
- ## Quick Start
134
-
135
- These examples show individual features, but llm.rb is designed to combine
136
- them into full systems where LLMs, tools, and external services operate
137
- together.
138
-
139
- #### Simple Streaming
142
+ ## Installation
140
143
 
141
- At the simplest level, any object that implements `#<<` can receive visible
142
- output as it arrives. This works with `$stdout`, `StringIO`, files, sockets,
143
- and other Ruby IO-style objects.
144
+ ```bash
145
+ gem install llm.rb
146
+ ```
144
147
 
145
- For more control, llm.rb also supports advanced streaming patterns through
146
- [`LLM::Stream`](lib/llm/stream.rb). See [Advanced Streaming](#advanced-streaming)
147
- for a structured callback-based example. Basic `#<<` streams only receive
148
- visible output chunks:
148
+ ## Example
149
149
 
150
150
  ```ruby
151
- #!/usr/bin/env ruby
152
151
  require "llm"
153
152
 
154
153
  llm = LLM.openai(key: ENV["KEY"])
155
154
  ctx = LLM::Context.new(llm, stream: $stdout)
155
+
156
156
  loop do
157
157
  print "> "
158
158
  ctx.talk(STDIN.gets || break)
@@ -160,623 +160,13 @@ loop do
160
160
  end
161
161
  ```
162
162
 
163
- #### Structured Outputs
164
-
165
- The `LLM::Schema` system lets you define JSON schemas for structured outputs.
166
- Schemas can be defined as classes with `property` declarations or built
167
- programmatically using a fluent interface. When you pass a schema to a context,
168
- llm.rb adapts it into the provider's structured-output format when that
169
- provider supports one. The `content!` method then parses the assistant's JSON
170
- response into a Ruby object:
171
-
172
- ```ruby
173
- #!/usr/bin/env ruby
174
- require "llm"
175
- require "pp"
176
-
177
- class Report < LLM::Schema
178
- property :category, Enum["performance", "security", "outage"], "Report category", required: true
179
- property :summary, String, "Short summary", required: true
180
- property :impact, OneOf[String, Integer], "Primary impact, as text or a count", required: true
181
- property :services, Array[String], "Impacted services", required: true
182
- property :timestamp, String, "When it happened", optional: true
183
- end
184
-
185
- llm = LLM.openai(key: ENV["KEY"])
186
- ctx = LLM::Context.new(llm, schema: Report)
187
- res = ctx.talk("Structure this report: 'Database latency spiked at 10:42 UTC, causing 5% request timeouts for 12 minutes.'")
188
- pp res.content!
189
-
190
- # {
191
- # "category" => "performance",
192
- # "summary" => "Database latency spiked, causing 5% request timeouts for 12 minutes.",
193
- # "impact" => "5% request timeouts",
194
- # "services" => ["Database"],
195
- # "timestamp" => "2024-06-05T10:42:00Z"
196
- # }
197
- ```
198
-
199
- #### Tool Calling
200
-
201
- Tools in llm.rb can be defined as classes inheriting from `LLM::Tool` or as
202
- closures using `LLM.function`. When the LLM requests a tool call, the context
203
- stores `Function` objects in `ctx.functions`. The `call()` method executes all
204
- pending functions and returns their results to the LLM. Tools describe
205
- structured parameters with JSON Schema and adapt those definitions to each
206
- provider's tool-calling format (OpenAI, Anthropic, Google, etc.):
207
-
208
- ```ruby
209
- #!/usr/bin/env ruby
210
- require "llm"
211
-
212
- class System < LLM::Tool
213
- name "system"
214
- description "Run a shell command"
215
- param :command, String, "Command to execute", required: true
216
-
217
- def call(command:)
218
- {success: system(command)}
219
- end
220
- end
221
-
222
- llm = LLM.openai(key: ENV["KEY"])
223
- ctx = LLM::Context.new(llm, stream: $stdout, tools: [System])
224
- ctx.talk("Run `date`.")
225
- ctx.talk(ctx.call(:functions)) while ctx.functions.any?
226
- ```
227
-
228
- #### Concurrent Tools
229
-
230
- llm.rb provides explicit concurrency control for tool execution. The
231
- `wait(:thread)` method spawns each pending function in its own thread and waits
232
- for all to complete. You can also use `:fiber` for cooperative multitasking or
233
- `:task` for async/await patterns (requires the `async` gem). The context
234
- automatically collects all results and reports them back to the LLM in a
235
- single turn, maintaining conversation flow while parallelizing independent
236
- operations:
237
-
238
- ```ruby
239
- #!/usr/bin/env ruby
240
- require "llm"
241
-
242
- llm = LLM.openai(key: ENV["KEY"])
243
- ctx = LLM::Context.new(llm, stream: $stdout, tools: [FetchWeather, FetchNews, FetchStock])
244
-
245
- # Execute multiple independent tools concurrently
246
- ctx.talk("Summarize the weather, headlines, and stock price.")
247
- ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
248
- ```
249
-
250
- #### Advanced Streaming
251
-
252
- Use [`LLM::Stream`](lib/llm/stream.rb) when you want more than plain `#<<`
253
- output. It adds structured streaming callbacks for:
254
-
255
- - `on_content` for visible assistant output
256
- - `on_reasoning_content` for separate reasoning output
257
- - `on_tool_call` for streamed tool-call notifications
258
- - `on_tool_return` for completed tool execution
259
-
260
- Subclass [`LLM::Stream`](lib/llm/stream.rb) when you want callbacks like
261
- `on_reasoning_content`, `on_tool_call`, and `on_tool_return`, or helpers like
262
- `queue` and `wait`.
263
-
264
- Keep `on_content`, `on_reasoning_content`, and `on_tool_call` fast: they run
265
- inline with the streaming parser. `on_tool_return` is different: it runs later,
266
- when `wait` resolves queued streamed tool work.
267
-
268
- `on_tool_call` lets tools start before the model finishes its turn, for
269
- example with `tool.spawn(:thread)`, `tool.spawn(:fiber)`, or
270
- `tool.spawn(:task)`. That can overlap tool latency with streaming output.
271
- `on_tool_return` is the place to react when that queued work completes, for
272
- example by updating progress UIs, logging tool latency, or changing visible
273
- state from "Running tool ..." to "Finished tool ...".
274
-
275
- If a stream cannot resolve a tool, `on_tool_call` receives `error` as an
276
- `LLM::Function::Return`. That keeps the session alive and leaves control in
277
- the callback: it can send `error`, spawn the tool when `error == nil`, or
278
- handle the situation however it sees fit.
279
-
280
- In normal use this should be rare, since `on_tool_call` is usually called with
281
- a resolved tool and `error == nil`. To resolve a tool call, the tool must be
282
- found in `LLM::Function.registry`. That covers `LLM::Tool` subclasses,
283
- including MCP tools, but not `LLM.function` closures, which are excluded
284
- because they may be bound to local state:
285
-
286
- ```ruby
287
- #!/usr/bin/env ruby
288
- require "llm"
289
- # Assume `System < LLM::Tool` is already defined.
290
-
291
- class Stream < LLM::Stream
292
- def on_content(content)
293
- $stdout << content
294
- end
295
-
296
- def on_reasoning_content(content)
297
- $stderr << content
298
- end
299
-
300
- def on_tool_call(tool, error)
301
- $stdout << "Running tool #{tool.name}\n"
302
- queue << (error || tool.spawn(:thread))
303
- end
304
-
305
- def on_tool_return(tool, ret)
306
- $stdout << (ret.error? ? "Tool #{tool.name} failed\n" : "Finished tool #{tool.name}\n")
307
- end
308
- end
309
-
310
- llm = LLM.openai(key: ENV["KEY"])
311
- ctx = LLM::Context.new(llm, stream: Stream.new, tools: [System])
312
-
313
- ctx.talk("Run `date` and `uname -a`.")
314
- while ctx.functions.any?
315
- ctx.talk(ctx.wait(:thread))
316
- end
317
- ```
318
-
319
- #### MCP
320
-
321
- MCP is a first-class integration mechanism in llm.rb.
322
-
323
- MCP allows llm.rb to treat external services, internal APIs, and system
324
- capabilities as tools in a unified interface. This makes it possible to
325
- connect multiple MCP sources simultaneously and expose your own APIs as tools.
326
-
327
- In practice, this supports workflows such as external SaaS integrations,
328
- multiple MCP sources in the same context, and OpenAPI -> MCP -> tools
329
- pipelines for internal services.
330
-
331
- llm.rb integrates with the Model Context Protocol (MCP) to dynamically discover
332
- and use tools from external servers. This example starts a filesystem MCP
333
- server over stdio and makes its tools available to a context, enabling the LLM
334
- to interact with the local file system through a standardized interface.
335
- Use `LLM::MCP.stdio` or `LLM::MCP.http` when you want to make the transport
336
- explicit. Like `LLM::Context`, an MCP client is stateful and should remain
337
- isolated to a single thread:
338
-
339
- ```ruby
340
- #!/usr/bin/env ruby
341
- require "llm"
342
-
343
- llm = LLM.openai(key: ENV["KEY"])
344
- mcp = LLM::MCP.stdio(argv: ["npx", "-y", "@modelcontextprotocol/server-filesystem", Dir.pwd])
345
-
346
- begin
347
- mcp.start
348
- ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
349
- ctx.talk("List the directories in this project.")
350
- ctx.talk(ctx.call(:functions)) while ctx.functions.any?
351
- ensure
352
- mcp.stop
353
- end
354
- ```
355
-
356
- You can also connect to an MCP server over HTTP. This is useful when the
357
- server already runs remotely and exposes MCP through a URL instead of a local
358
- process. If you expect repeated tool calls, use `persistent` to reuse a
359
- process-wide HTTP connection pool. This requires the optional
360
- `net-http-persistent` gem:
361
-
362
- ```ruby
363
- #!/usr/bin/env ruby
364
- require "llm"
365
-
366
- llm = LLM.openai(key: ENV["KEY"])
367
- mcp = LLM::MCP.http(
368
- url: "https://api.githubcopilot.com/mcp/",
369
- headers: {"Authorization" => "Bearer #{ENV.fetch("GITHUB_PAT")}"}
370
- ).persistent
371
-
372
- begin
373
- mcp.start
374
- ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
375
- ctx.talk("List the available GitHub MCP toolsets.")
376
- ctx.talk(ctx.call(:functions)) while ctx.functions.any?
377
- ensure
378
- mcp.stop
379
- end
380
- ```
381
-
382
- ## Providers
383
-
384
- llm.rb supports multiple LLM providers with a unified API.
385
- All providers share the same context, tool, and concurrency interfaces, making
386
- it easy to switch between cloud and local models:
387
-
388
- - **OpenAI** (`LLM.openai`)
389
- - **Anthropic** (`LLM.anthropic`)
390
- - **Google** (`LLM.google`)
391
- - **DeepSeek** (`LLM.deepseek`)
392
- - **xAI** (`LLM.xai`)
393
- - **zAI** (`LLM.zai`)
394
- - **Ollama** (`LLM.ollama`)
395
- - **Llama.cpp** (`LLM.llamacpp`)
396
-
397
- ## Production
398
-
399
- #### Ready for production
400
-
401
- llm.rb is designed for production use from the ground up:
402
-
403
- - **Thread-safe providers** - Share `LLM::Provider` instances across your application
404
- - **Thread-local contexts** - Keep `LLM::Context` instances thread-local for state isolation
405
- - **Cost tracking** - Know your spend before the bill arrives
406
- - **Observability** - Built-in tracing with OpenTelemetry support
407
- - **Persistence** - Save and restore contexts across processes
408
- - **Performance** - Swap JSON adapters and enable HTTP connection pooling
409
- - **Error handling** - Structured errors, not unpredictable exceptions
410
-
411
- #### Tracing
412
-
413
- llm.rb includes built-in tracers for local logging, OpenTelemetry, and
414
- LangSmith. Assign a tracer to a provider and all context requests and tool
415
- calls made through that provider will be instrumented. Tracers are local to
416
- the current fiber, so the same provider can use different tracers in different
417
- concurrent tasks without interfering with each other.
418
-
419
- Use the logger tracer when you want structured logs through Ruby's standard
420
- library:
421
-
422
- ```ruby
423
- #!/usr/bin/env ruby
424
- require "llm"
425
-
426
- llm = LLM.openai(key: ENV["KEY"])
427
- llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
428
-
429
- ctx = LLM::Context.new(llm)
430
- ctx.talk("Hello")
431
- ```
432
-
433
- Use the telemetry tracer when you want OpenTelemetry spans. This requires the
434
- `opentelemetry-sdk` gem, and exporters such as OTLP can be added separately:
435
-
436
- ```ruby
437
- #!/usr/bin/env ruby
438
- require "llm"
439
-
440
- llm = LLM.openai(key: ENV["KEY"])
441
- llm.tracer = LLM::Tracer::Telemetry.new(llm)
442
-
443
- ctx = LLM::Context.new(llm)
444
- ctx.talk("Hello")
445
- pp llm.tracer.spans
446
- ```
447
-
448
- Use the LangSmith tracer when you want LangSmith-compatible metadata and trace
449
- grouping on top of the telemetry tracer:
450
-
451
- ```ruby
452
- #!/usr/bin/env ruby
453
- require "llm"
454
-
455
- llm = LLM.openai(key: ENV["KEY"])
456
- llm.tracer = LLM::Tracer::Langsmith.new(
457
- llm,
458
- metadata: {env: "dev"},
459
- tags: ["chatbot"]
460
- )
461
-
462
- ctx = LLM::Context.new(llm)
463
- ctx.talk("Hello")
464
- ```
163
+ ## Resources
465
164
 
466
- #### Thread Safety
467
-
468
- llm.rb uses Ruby's `Monitor` class to ensure thread safety at the provider
469
- level, allowing you to share a single provider instance across multiple threads
470
- while maintaining state isolation through thread-local contexts. This design
471
- enables efficient resource sharing while preventing race conditions in
472
- concurrent applications:
473
-
474
- ```ruby
475
- #!/usr/bin/env ruby
476
- require "llm"
477
-
478
- # Thread-safe providers - create once, use everywhere
479
- llm = LLM.openai(key: ENV["KEY"])
480
-
481
- # Each thread should have its own context for state isolation
482
- Thread.new do
483
- ctx = LLM::Context.new(llm) # Thread-local context
484
- ctx.talk("Hello from thread 1")
485
- end
486
-
487
- Thread.new do
488
- ctx = LLM::Context.new(llm) # Thread-local context
489
- ctx.talk("Hello from thread 2")
490
- end
491
- ```
492
-
493
- #### Performance Tuning
494
-
495
- llm.rb's JSON adapter system lets you swap JSON libraries for better
496
- performance in high-throughput applications. The library supports stdlib JSON,
497
- Oj, and Yajl, with Oj typically offering the best performance. Additionally,
498
- you can enable HTTP connection pooling using the optional `net-http-persistent`
499
- gem to reduce connection overhead in production environments:
500
-
501
- ```ruby
502
- #!/usr/bin/env ruby
503
- require "llm"
504
-
505
- # Swap JSON libraries for better performance
506
- LLM.json = :oj # Use Oj for faster JSON parsing
507
-
508
- # Enable HTTP connection pooling for high-throughput applications
509
- llm = LLM.openai(key: ENV["KEY"]).persistent # Uses net-http-persistent when available
510
- ```
511
-
512
- #### Model Registry
513
-
514
- llm.rb includes a local model registry that provides metadata about model
515
- capabilities, pricing, and limits without requiring API calls. The registry is
516
- shipped with the gem and sourced from https://models.dev, giving you access to
517
- up-to-date information about context windows, token costs, and supported
518
- modalities for each provider:
519
-
520
- ```ruby
521
- #!/usr/bin/env ruby
522
- require "llm"
523
-
524
- # Access model metadata, capabilities, and pricing
525
- registry = LLM.registry_for(:openai)
526
- model_info = registry.limit(model: "gpt-4.1")
527
- puts "Context window: #{model_info.context} tokens"
528
- puts "Cost: $#{model_info.cost.input}/1M input tokens"
529
- ```
530
-
531
- ## More Examples
532
-
533
- #### Responses API
534
-
535
- llm.rb also supports OpenAI's Responses API through `LLM::Context` with
536
- `mode: :responses`. The important switch is `store:`. With `store: false`, the
537
- Responses API stays stateless while still using the Responses endpoint, which
538
- is useful for models or features that are only available through the Responses
539
- API. With `store: true`, OpenAI can keep
540
- response state server-side and reduce how much conversation state needs to be
541
- sent on each turn:
542
-
543
- ```ruby
544
- #!/usr/bin/env ruby
545
- require "llm"
546
-
547
- llm = LLM.openai(key: ENV["KEY"])
548
- ctx = LLM::Context.new(llm, mode: :responses, store: false)
549
-
550
- ctx.talk("Your task is to answer the user's questions", role: :developer)
551
- res = ctx.talk("What is the capital of France?")
552
- puts res.content
553
- ```
554
-
555
- #### Context Persistence: Vanilla
556
-
557
- Contexts can be serialized and restored across process boundaries. A context
558
- can be serialized to JSON and stored on disk, in a database, in a job queue,
559
- or anywhere else your application needs to persist state:
560
-
561
- ```ruby
562
- #!/usr/bin/env ruby
563
- require "llm"
564
-
565
- llm = LLM.openai(key: ENV["KEY"])
566
- ctx = LLM::Context.new(llm)
567
- ctx.talk("Hello")
568
- ctx.talk("Remember that my favorite language is Ruby")
569
-
570
- # Serialize to a string when you want to store the context yourself,
571
- # for example in a database row or job payload.
572
- payload = ctx.to_json
573
-
574
- restored = LLM::Context.new(llm)
575
- restored.restore(string: payload)
576
- res = restored.talk("What is my favorite language?")
577
- puts res.content
578
-
579
- # You can also persist the same state to a file:
580
- ctx.save(path: "context.json")
581
- restored = LLM::Context.new(llm)
582
- restored.restore(path: "context.json")
583
- ```
584
-
585
- #### Context Persistence: ActiveRecord (Rails)
586
-
587
- In a Rails application, you can also wrap persisted context state in an
588
- ActiveRecord model. A minimal schema would include a `snapshot` column for the
589
- serialized context payload (`jsonb` is recommended) and a `provider` column
590
- for the provider name:
591
-
592
- ```ruby
593
- create_table :contexts do |t|
594
- t.jsonb :snapshot
595
- t.string :provider, null: false
596
- t.timestamps
597
- end
598
- ```
599
-
600
- For example:
601
-
602
- ```ruby
603
- class Context < ApplicationRecord
604
- def talk(...)
605
- ctx.talk(...).tap { flush }
606
- end
607
-
608
- def wait(...)
609
- ctx.wait(...).tap { flush }
610
- end
611
-
612
- def messages
613
- ctx.messages
614
- end
615
-
616
- def model
617
- ctx.model
618
- end
619
-
620
- def flush
621
- update_column(:snapshot, ctx.to_json)
622
- end
623
-
624
- private
625
-
626
- def ctx
627
- @ctx ||= begin
628
- ctx = LLM::Context.new(llm)
629
- ctx.restore(string: snapshot) if snapshot
630
- ctx
631
- end
632
- end
633
-
634
- def llm
635
- LLM.method(provider).call(key: ENV.fetch(key))
636
- end
637
-
638
- def key
639
- "#{provider.upcase}_KEY"
640
- end
641
- end
642
- ```
643
-
644
- #### Agents
645
-
646
- Agents in llm.rb are reusable, preconfigured assistants that automatically
647
- execute tool calls and maintain conversation state. Unlike contexts, which
648
- require manual tool execution, agents automatically handle the tool call loop,
649
- making them ideal for autonomous workflows where you want the LLM to
650
- independently use available tools to accomplish tasks:
651
-
652
- ```ruby
653
- #!/usr/bin/env ruby
654
- require "llm"
655
-
656
- class SystemAdmin < LLM::Agent
657
- model "gpt-4.1"
658
- instructions "You are a Linux system admin"
659
- tools Shell
660
- schema Result
661
- end
662
-
663
- llm = LLM.openai(key: ENV["KEY"])
664
- agent = SystemAdmin.new(llm)
665
- res = agent.talk("Run 'date'")
666
- ```
667
-
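The loop an agent runs can be sketched in plain Ruby. Everything below — the `Response` struct, the canned replies, and the `tools` hash — is a stub invented for illustration and is not part of the llm.rb API:

```ruby
#!/usr/bin/env ruby
# Schematic of an agent's tool-call loop, using stubbed objects
# instead of a real provider. All names here are illustrative only.
Response = Struct.new(:tool_calls, :content)

# Canned replies: the first requests a tool, the second is final.
replies = [
  Response.new([{name: "date", args: {}}], nil),
  Response.new([], "The date has been printed")
]
tools = {"date" => ->(**) { Time.now.to_s }}

res = replies.shift
until res.tool_calls.empty?
  # Execute each requested tool, then hand the result back to the model.
  res.tool_calls.each { |call| tools[call[:name]].call(**call[:args]) }
  res = replies.shift
end
puts res.content # prints "The date has been printed"
```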
668
- #### Cost Tracking
669
-
670
- llm.rb provides built-in cost estimation that works without making additional
671
- API calls. The cost tracking system uses the local model registry to calculate
672
- estimated costs based on token usage, giving you visibility into spending
673
- before bills arrive. This is particularly useful for monitoring usage in
674
- production applications and setting budget alerts:
675
-
676
- ```ruby
677
- #!/usr/bin/env ruby
678
- require "llm"
679
-
680
- llm = LLM.openai(key: ENV["KEY"])
681
- ctx = LLM::Context.new(llm)
682
- ctx.talk "Hello"
683
- puts "Estimated cost so far: $#{ctx.cost}"
684
- ctx.talk "Tell me a joke"
685
- puts "Estimated cost so far: $#{ctx.cost}"
686
- ```
687
-
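Under the hood, a token-based estimate is simple arithmetic: token counts multiplied by per-token prices. The prices below are made up for illustration; llm.rb reads real per-model prices from its local registry:

```ruby
#!/usr/bin/env ruby
# Illustrative cost arithmetic; these prices are hypothetical, not real rates.
input_price_per_1m  = 2.00 # USD per 1M input tokens (made up)
output_price_per_1m = 8.00 # USD per 1M output tokens (made up)

prompt_tokens, completion_tokens = 1_200, 350
cost = (prompt_tokens * input_price_per_1m +
        completion_tokens * output_price_per_1m) / 1_000_000.0
puts format("$%.6f", cost) # prints "$0.005200"
```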
688
- #### Multimodal Prompts
689
-
690
- Contexts provide helpers for composing multimodal prompts from URLs, local
691
- files, and provider-managed remote files. These tagged objects let providers
692
- adapt the input into the format they expect:
693
-
694
- ```ruby
695
- #!/usr/bin/env ruby
696
- require "llm"
697
-
698
- llm = LLM.openai(key: ENV["KEY"])
699
- ctx = LLM::Context.new(llm)
700
-
701
- res = ctx.talk ["Describe this image", ctx.image_url("https://example.com/cat.jpg")]
702
- puts res.content
703
- ```
704
-
705
- #### Audio Generation
706
-
707
- llm.rb supports OpenAI's audio API for text-to-speech generation, allowing you
708
- to create speech from text with configurable voices and output formats. The
709
- audio API returns binary audio data that can be streamed directly to files or
710
- other IO objects, enabling integration with multimedia applications:
711
-
712
- ```ruby
713
- #!/usr/bin/env ruby
714
- require "llm"
715
-
716
- llm = LLM.openai(key: ENV["KEY"])
717
- res = llm.audio.create_speech(input: "Hello world")
718
- IO.copy_stream res.audio, File.join(Dir.home, "hello.mp3")
719
- ```
720
-
721
- #### Image Generation
722
-
723
- llm.rb provides access to OpenAI's DALL-E image generation API through a
724
- unified interface. The API supports multiple response formats including
725
- base64-encoded images and temporary URLs, with automatic handling of binary
726
- data streaming for efficient file operations:
727
-
728
- ```ruby
729
- #!/usr/bin/env ruby
730
- require "llm"
731
-
732
- llm = LLM.openai(key: ENV["KEY"])
733
- res = llm.images.create(prompt: "a dog on a rocket to the moon")
734
- IO.copy_stream res.images[0], File.join(Dir.home, "dogonrocket.png")
735
- ```
736
-
737
- #### Embeddings
738
-
739
- llm.rb's embedding API generates vector representations of text for semantic
740
- search and retrieval-augmented generation (RAG) workflows. The API supports
741
- batch processing of multiple inputs and returns normalized vectors suitable for
742
- vector similarity operations, with dimensionality determined by the embedding
- model:
743
-
744
- ```ruby
745
- #!/usr/bin/env ruby
746
- require "llm"
747
-
748
- llm = LLM.openai(key: ENV["KEY"])
749
- res = llm.embed(["programming is fun", "ruby is a programming language", "sushi is art"])
750
- puts res.class
751
- puts res.embeddings.size
752
- puts res.embeddings[0].size
753
-
754
- # LLM::Response
755
- # 3
756
- # 1536
757
- ```
758
-
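Once you have embeddings, similarity is plain vector math and needs no further API calls. A minimal cosine-similarity sketch, using toy vectors in place of real embeddings:

```ruby
#!/usr/bin/env ruby
# Cosine similarity between two vectors; toy values stand in for embeddings.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

ruby_lang = [0.9, 0.1, 0.0] # toy vector for "ruby is a programming language"
coding    = [0.8, 0.2, 0.0] # toy vector for "programming is fun"
sushi     = [0.0, 0.1, 0.9] # toy vector for "sushi is art"
puts cosine_similarity(ruby_lang, coding) > cosine_similarity(ruby_lang, sushi) # prints "true"
```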
759
- ## Real-World Example: Relay
760
-
761
- See how these pieces come together in a complete application architecture with
762
- [Relay](https://github.com/llmrb/relay), a production-ready LLM application
763
- built on llm.rb that demonstrates:
764
-
765
- - Context management across requests
766
- - Tool composition and execution
767
- - Concurrent workflows
768
- - Cost tracking and observability
769
- - Production deployment patterns
770
-
771
- Watch the screencast:
772
-
773
- [![Watch the llm.rb screencast](https://img.youtube.com/vi/Jb7LNUYlCf4/maxresdefault.jpg)](https://www.youtube.com/watch?v=x1K4wMeO_QA)
774
-
775
- ## Installation
776
-
777
- ```bash
778
- gem install llm.rb
779
- ```
165
+ - [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) is the
166
+ examples guide.
167
+ - [_examples/relay](./_examples/relay) shows a real application built on top
168
+ of llm.rb.
169
+ - [doc site](https://0x1eef.github.io/x/llm.rb?rebuild=1) has the API docs.
780
170
 
781
171
  ## License
782
172