llm.rb 11.3.1 → 12.0.0

Files changed (57)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +242 -1
  3. data/LICENSE +92 -17
  4. data/README.md +204 -623
  5. data/data/anthropic.json +433 -249
  6. data/data/bedrock.json +2097 -1055
  7. data/data/deepinfra.json +993 -0
  8. data/data/deepseek.json +53 -28
  9. data/data/google.json +389 -771
  10. data/data/openai.json +1053 -771
  11. data/data/xai.json +133 -292
  12. data/data/zai.json +249 -141
  13. data/lib/llm/active_record/acts_as_agent.rb +3 -41
  14. data/lib/llm/active_record/acts_as_llm.rb +18 -0
  15. data/lib/llm/active_record.rb +3 -3
  16. data/lib/llm/context.rb +9 -5
  17. data/lib/llm/contract/completion.rb +2 -2
  18. data/lib/llm/provider.rb +2 -2
  19. data/lib/llm/providers/deepinfra/audio.rb +66 -0
  20. data/lib/llm/providers/deepinfra/images.rb +90 -0
  21. data/lib/llm/providers/deepinfra/response_adapter.rb +36 -0
  22. data/lib/llm/providers/deepinfra.rb +100 -0
  23. data/lib/llm/providers/deepseek/images.rb +109 -0
  24. data/lib/llm/providers/deepseek/request_adapter.rb +32 -0
  25. data/lib/llm/providers/deepseek/response_adapter/image.rb +9 -0
  26. data/lib/llm/providers/deepseek/response_adapter.rb +29 -0
  27. data/lib/llm/providers/deepseek.rb +4 -2
  28. data/lib/llm/providers/google/request_adapter.rb +22 -5
  29. data/lib/llm/providers/google.rb +4 -4
  30. data/lib/llm/providers/openai/audio.rb +6 -2
  31. data/lib/llm/providers/openai/images.rb +9 -50
  32. data/lib/llm/providers/openai/request_adapter/respond.rb +38 -4
  33. data/lib/llm/providers/openai/response_adapter/audio.rb +5 -1
  34. data/lib/llm/providers/openai/response_adapter/completion.rb +1 -1
  35. data/lib/llm/providers/openai/response_adapter/image.rb +0 -4
  36. data/lib/llm/providers/openai/responses.rb +1 -0
  37. data/lib/llm/providers/openai/stream_parser.rb +5 -6
  38. data/lib/llm/providers/openai.rb +2 -2
  39. data/lib/llm/providers/xai/images.rb +49 -26
  40. data/lib/llm/providers/xai.rb +2 -2
  41. data/lib/llm/response.rb +10 -0
  42. data/lib/llm/schema/leaf.rb +7 -1
  43. data/lib/llm/schema/renderer.rb +121 -0
  44. data/lib/llm/schema.rb +30 -0
  45. data/lib/llm/sequel/agent.rb +2 -43
  46. data/lib/llm/sequel/plugin.rb +25 -7
  47. data/lib/llm/tracer/telemetry.rb +4 -6
  48. data/lib/llm/tracer.rb +9 -21
  49. data/lib/llm/transport/execution.rb +16 -1
  50. data/lib/llm/transport/net_http_adapter.rb +1 -1
  51. data/lib/llm/uridata.rb +16 -0
  52. data/lib/llm/version.rb +1 -1
  53. data/lib/llm.rb +9 -0
  54. data/llm.gemspec +5 -18
  55. data/resources/deepdive.md +798 -264
  56. metadata +15 -18
  57. data/lib/llm/tracer/langsmith.rb +0 -144
data/README.md CHANGED
@@ -12,113 +12,75 @@
12
12
 
13
13
  > A [r.uby.dev](https://r.uby.dev) project.
14
14
 
15
- Ruby's capable AI runtime.
15
+ Welcome to the canonical llm.rb repository.
16
16
 
17
- It provides one Ruby interface for building with large language models:
18
- providers, agents, tools, skills, MCP, A2A, RAG, streaming, files, and persisted conversation state all share the same runtime.
17
+ llm.rb is not a library, framework or toolkit but an advanced runtime
18
+ for building highly capable AI applications on CRuby. By default
19
+ it has zero runtime dependencies although certain functionality –
20
+ such as ActiveRecord support – requires optional dependencies
21
+ that are opt-in.
19
22
 
20
- The gem runs on Ruby's standard library by default and loads optional
21
- integrations only when needed. It supports OpenAI, OpenAI-compatible
22
- endpoints, Anthropic, Google Gemini, DeepSeek, xAI, Z.ai, AWS Bedrock,
23
- Ollama, and llama.cpp, with built-in ActiveRecord and Sequel support.
23
+ ## Features
24
24
 
25
- ## Services
25
+ The runtime supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google
26
+ Gemini, DeepSeek, DeepInfra, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp.
27
+ It has first-class support for streaming, tool calls, MCP
28
+ and A2A, embeddings, vector stores and the RAG pattern.
26
29
 
27
- llm.rb is a [r.uby.dev](https://r.uby.dev) project
28
- that is part of a growing family of AI-related
29
- projects that also includes publically accessible
30
- SSH services.
30
+ There are multiple HTTP backends to choose from, tools can be run concurrently
31
+ or in parallel via threads, async tasks, fibers, ractors, and fork, and it is
32
+ also possible to make a tool call while the model is still streaming.
31
33
 
32
- #### matz - the mruby expert
34
+ The runtime builds on top of three core concepts: providers, contexts, and agents,
35
+ so once you learn the fundamentals, everything else falls into place naturally. And once
36
+ you learn llm.rb, you will also be able to use <a href="https://r.uby.dev/mruby-llm">mruby-llm</a> and
37
+ <a href="https://r.uby.dev/wasm-llm">wasm-llm</a> because the API is pretty much identical.
33
38
 
34
- > ssh matz@r.uby.dev
39
+ ## Install
35
40
 
36
- See [https://r.uby.dev/matz](https://r.uby.dev/matz) for more information.
37
-
38
- #### robert - the freebsd expert
39
-
40
- > ssh robert@4.4bsd.dev
41
-
42
- See [https://4.4bsd.dev/robert](https://4.4bsd.dev/robert) for more information.
41
+ ```bash
42
+ gem install llm.rb
43
+ ```
43
44
 
44
45
  ## Quick start
45
46
 
46
- #### LLM::Context
47
-
48
- The
49
- [LLM::Context](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
50
- object is at the heart of the runtime. Almost all other features build
51
- on top of it. It is a low-level interface to a model, and requires tool
52
- execution to be managed manually. The
53
- [LLM::Agent](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
54
- class is almost the same as
55
- [LLM::Context](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
56
- but it manages tool execution for you - we'll cover agents next:
57
-
58
- ```ruby
59
- require "llm"
60
-
61
- llm = LLM.openai(key: ENV["KEY"])
62
- ctx = LLM::Context.new(llm, stream: $stdout)
63
- ctx.talk "Hello world"
64
- ```
65
-
66
47
  #### LLM::Agent
67
48
 
68
- The
69
- [LLM::Agent](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
70
- object is implemented on top of
71
- [LLM::Context](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html).
72
- It provides the same interface, but manages tool execution for you. It
73
- also has builtin features such as a loop guard that detects repeated
74
- tool call patterns, and another guard that detects infinite tool call
75
- loops. Both guards advise the model to change course rather than raise
76
- an error:
49
+ The [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html) class is the default high-level interface,
50
+ and it is recommended for most use-cases. It manages tool execution
51
+ automatically, guards against infinite loops, manages conversation
52
+ state, and much more.
77
53
 
78
54
  ```ruby
79
55
  require "llm"
80
56
 
81
- llm = LLM.openai(key: ENV["KEY"])
57
+ llm = LLM.deepseek(key: ENV["KEY"])
82
58
  agent = LLM::Agent.new(llm, stream: $stdout)
83
59
  agent.talk "Hello world"
84
60
  ```
85
61
 
86
- #### Agents (Advanced)
62
+ #### LLM::Context
87
63
 
88
- An agent can be configured to require confirmation before a tool is
89
- executed. When a matching tool is called, llm.rb runs
90
- `on_tool_confirmation`. That callback must decide whether to cancel the
91
- tool call or approve it and execute it by calling
92
- `fn.spawn(strategy).wait`, and it must always return an instance of
93
- [`LLM::Function::Return`](https://r.uby.dev/api-docs/llm.rb/LLM/Function/Return.html):
64
+ The [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) class is at the heart of the runtime
65
+ and it is what [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html) uses under the hood.
66
+ It requires that the tool call loop be managed manually -
67
+ sometimes that can be useful, but usually for advanced use-cases.
68
+ If you're new to llm.rb, try [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html) first.
94
69
 
95
70
  ```ruby
96
71
  require "llm"
97
72
 
98
- class Agent < LLM::Agent
99
- tools DeleteFile
100
- confirm "delete-file"
101
-
102
- def on_tool_confirmation(fn, strategy)
103
- path = fn.arguments.path
104
- if path.start_with?("/tmp/")
105
- fn.spawn(strategy).wait
106
- else
107
- fn.cancel(reason: "Deletion requires approval")
108
- end
109
- end
110
- end
111
-
112
- llm = LLM.openai(key: ENV["KEY"])
113
- Agent.new(llm, stream: $stdout).talk("Delete /tmp/example.txt.")
73
+ llm = LLM.deepseek(key: ENV["KEY"])
74
+ ctx = LLM::Context.new(llm, stream: $stdout)
75
+ ctx.talk "Hello world"
114
76
  ```
115
77
 
116
- #### Tools
78
+ #### LLM::Tool
117
79
 
118
- The
119
- [LLM::Tool](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
120
- class can be subclassed to implement your own tools that can extend the
121
- abilities of a model:
80
+ Subclasses of [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html) are plain Ruby classes with
81
+ an optional set of typed parameters. <br> The model can choose to
82
+ call them on your behalf, and they're one of the most powerful features
83
+ for extending the feature set or abilities of a model.
122
84
 
123
85
  ```ruby
124
86
  class ReadFile < LLM::Tool
@@ -128,629 +90,248 @@ class ReadFile < LLM::Tool
128
90
  required %i[path]
129
91
 
130
92
  def call(path:)
131
- { contents: File.read(path) }
93
+ {contents: File.read(path)}
132
94
  end
133
95
  end
134
96
  ```
135
97
 
136
- #### MCP
137
-
138
- The
139
- [LLM::MCP](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html)
140
- object lets llm.rb use tools provided by an MCP server. Those tools are
141
- exposed through the same runtime as local tools, so you can pass them
142
- to either
143
- [LLM::Context](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
144
- or
145
- [LLM::Agent](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
146
- In this example, the MCP server runs over stdio and
147
- [LLM::Agent](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
148
- manages the tool loop. For **stdio**, `mcp.session` is the preferred
149
- pattern because it keeps one MCP session alive across discovery and
150
- tool calls:
151
-
152
- ```ruby
153
- require "llm"
154
-
155
- llm = LLM.openai(key: ENV["KEY"])
156
- mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
157
-
158
- mcp.session do
159
- agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
160
- agent.talk "Use the available tools to inspect the environment."
161
- end
162
- ```
163
-
164
- MCP can also be used without `session`. Although it works it is generally
165
- not recommended for the **stdio** transport because it is inefficient
166
- to start and stop a fresh MCP process for tool discovery and each tool
167
- call:
168
-
169
- ```ruby
170
- require "llm"
171
-
172
- llm = LLM.openai(key: ENV["KEY"])
173
- mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
174
-
175
- agent = LLM::Agent.new(llm, tools: mcp.tools)
176
- agent.talk("Use the available tools to inspect the environment.")
177
- ```
178
-
179
- The HTTP transport can be used with or without the `session` method,
180
- and unlike the stdio transport it can remain efficient without the
181
- `session` method through a persistent connection pool that is available
182
- through the
183
- [LLM::Transport.net_http_persistent](https://r.uby.dev/api-docs/llm.rb/LLM/Transport.html#method-c-net_http_persistent)
184
- transport:
185
-
186
- ```ruby
187
- require "llm"
188
-
189
- llm = LLM.openai(key: ENV["KEY"])
190
- mcp = LLM::MCP.http(
191
- url: "https://remote-mcp.example.com",
192
- transport: :net_http_persistent
193
- )
194
-
195
- agent = LLM::Agent.new(llm, tools: mcp.tools)
196
- agent.talk("Use the available tools to inspect the environment.")
197
- ```
198
-
199
- #### A2A (Agent 2 Agent)
200
-
201
- The
202
- [LLM::A2A](https://r.uby.dev/api-docs/llm.rb/LLM/A2A.html)
203
- object lets llm.rb use skills provided by a remote A2A agent. Those
204
- skills are exposed through the same runtime as local tools, so you can
205
- pass them to either
206
- [LLM::Context](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
207
- or
208
- [LLM::Agent](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
209
-
210
- Use remote skills as local tools:
211
-
212
- ```ruby
213
- require "llm"
214
-
215
- a2a = LLM::A2A.rest(
216
- url: "https://remote-agent.example.com",
217
- headers: { "Authorization" => "Bearer token" }
218
- )
219
- llm = LLM.openai(key: ENV["KEY"])
220
- agent = LLM::Agent.new(llm, tools: a2a.skills)
221
- agent.talk "Analyze this CSV and summarize the trends."
222
- ```
223
-
224
- Use persistent HTTP connections:
225
-
226
- ```ruby
227
- require "llm"
228
-
229
- a2a = LLM::A2A.rest(
230
- url: "https://remote-agent.example.com",
231
- transport: :net_http_persistent
232
- )
233
- ```
234
-
235
- For more on direct messaging, task operations, push notification
236
- configs, and JSON-RPC, see the
237
- [LLM::A2A API docs](https://r.uby.dev/api-docs/llm.rb/LLM/A2A.html).
238
-
239
- #### Transports
240
-
241
- Providers use Ruby's standard library Net::HTTP transport by default.
242
- You can opt into persistent Net::HTTP connections with `persistent: true`,
243
- or provide a transport shortcut when you want a different backend.
244
- `transport: :curb` uses libcurl through the optional `curb` gem.
245
-
246
- Custom transports can implement the
247
- [LLM::Transport](https://r.uby.dev/api-docs/llm.rb/LLM/Transport.html)
248
- interface and receive transport-agnostic
249
- [LLM::Transport::Request](https://r.uby.dev/api-docs/llm.rb/LLM/Transport/Request.html)
250
- objects from providers.
251
-
252
- ```ruby
253
- require "llm"
254
-
255
- llm = LLM.openai(key: ENV["KEY"], persistent: true)
256
- llm = LLM.openai(key: ENV["KEY"], transport: :net_http_persistent)
257
- llm = LLM.openai(key: ENV["KEY"], transport: :curb)
258
- ```
259
-
260
- #### Skills
261
-
262
- Skills are reusable instructions loaded from a `SKILL.md` directory. They let
263
- you package behavior and tool access together, and they plug into the
264
- same runtime as tools, agents, MCP, and A2A. When a skill runs, llm.rb
265
- spawns a subagent with the skill instructions, access to only the tools
266
- listed in the skill, and recent conversation context:
267
-
268
- ```yaml
269
- ---
270
- name: release
271
- description: Prepare a release
272
- tools: ["search-docs", "git"]
273
- ---
274
-
275
- ## Task
276
-
277
- Review the release state, summarize what changed, and prepare the release.
278
- ```
279
-
280
- ```ruby
281
- require "llm"
282
-
283
- class ReleaseAgent < LLM::Agent
284
- model "gpt-5.4-mini"
285
- skills "./skills/release"
286
- end
287
-
288
- llm = LLM.openai(key: ENV["KEY"])
289
- ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")
290
- ```
291
-
292
- A skill can also have its sub-agent inherit the parents tools through the
293
- `inherit` directive. The `inherit` directive has coverage for the "classic"
294
- tools (a subclass of [LLM::Tool](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)),
295
- MCP tools, and A2A tools that a parent context or agent has access to:
296
-
297
- ```yaml
298
- ---
299
- name: release
300
- description: Prepare a release
301
- tools: inherit
302
- ---
303
- ```
304
-
305
98
  #### LLM::Stream
306
99
 
307
- The
308
- [LLM::Stream](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
309
- object lets you observe output and runtime events as they happen. You
310
- can subclass it to handle streamed content in your own application:
311
-
312
- ```ruby
313
- require "llm"
314
-
315
- class Stream < LLM::Stream
316
- def on_content(content)
317
- $stdout << content
318
- end
319
- end
320
-
321
- llm = LLM.openai(key: ENV["KEY"])
322
- agent = LLM::Agent.new(llm, stream: Stream.new)
323
- agent.talk "Write a haiku about Ruby."
324
- ```
325
-
326
- #### LLM::Stream (advanced)
327
-
328
- The
329
- [LLM::Stream](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
330
- object can also resolve tool calls while output is still streaming. In
331
- `on_tool_call`, you can spawn the tool, push the work onto the stream
332
- queue, and later drain it with `wait`:
100
+ Streams can be simple IO objects or subclasses of
101
+ [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html) with structured callbacks for content,
102
+ reasoning, tool calls, tool returns, and compaction.
333
103
 
334
104
  ```ruby
335
- require "llm"
336
-
337
- class Stream < LLM::Stream
105
+ class MyStream < LLM::Stream
338
106
  def on_content(content)
339
- $stdout << content
107
+ print content
340
108
  end
341
109
 
342
- def on_tool_call(tool, error)
343
- return queue << error if error
344
- queue << ctx.spawn(tool, :thread)
110
+ def on_reasoning_content(content)
111
+ warn content
345
112
  end
346
113
  end
347
114
 
348
- llm = LLM.openai(key: ENV["KEY"])
349
- ctx = LLM::Context.new(llm, stream: Stream.new, tools: [ReadFile])
350
- ctx.talk "Read README.md and summarize the quick start."
351
- ctx.talk(ctx.wait) while ctx.functions?
352
- ```
353
-
354
- #### Concurrency
355
-
356
- llm.rb can run tool work concurrently. This is useful when a model calls
357
- multiple tools and you want to resolve them in parallel instead of one
358
- at a time. On
359
- [LLM::Agent](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
360
- you can enable this with `concurrency`. Common options are `:call` for
361
- sequential execution, `:thread`, or `:task` for concurrent IO-bound work, and
362
- `:ractor` or `:fork` for more isolated CPU-bound work:
363
-
364
- ```ruby
365
- require "llm"
366
-
367
- class Agent < LLM::Agent
368
- model "gpt-5.4-mini"
369
- tools ReadFile
370
- concurrency :thread
371
- end
372
-
373
- llm = LLM.openai(key: ENV["KEY"])
374
- agent = Agent.new(llm, stream: $stdout)
375
- agent.talk "Read README.md and CHANGELOG.md and compare them."
376
- ```
377
-
378
- #### Serialization
379
-
380
- The [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
381
- object can be serialized to JSON, which makes it suitable for storing
382
- in a file, a database column, or a Redis queue. The built-in
383
- ActiveRecord and Sequel plugins are built on top of the same underlying
384
- serialization feature:
385
-
386
- ```ruby
387
- require "llm"
388
-
389
- llm = LLM.openai(key: ENV["KEY"])
390
-
391
- # Serialize an agent
392
- agent1 = LLM::Agent.new(llm)
393
- agent1.talk "Remember that my favorite language is Ruby"
394
- string = agent1.to_json
395
-
396
- # Restore an agent (from JSON)
397
- agent2 = LLM::Agent.new(llm, stream: $stdout)
398
- agent2.restore(string:)
399
- agent2.talk "What is my favorite language?"
400
- ```
401
-
402
- #### ask
403
-
404
- [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
405
- also provides `ask`, a convenience interface that is compatible with
406
- RubyLLM's `ask` method. It accepts a prompt, an optional `with:`
407
- attachment path or paths, an optional `stream:` target, and an optional
408
- block that chunks are yielded to. It returns an
409
- [`LLM::Response`](https://r.uby.dev/api-docs/llm.rb/LLM/Response.html),
410
- so use `.content` when you want the text directly:
411
-
412
- ```ruby
413
- require "llm"
414
-
415
- llm = LLM.openai(key: ENV["KEY"])
416
- agent = LLM::Agent.new(llm)
417
-
418
- puts agent.ask("Hello world").content
419
- puts agent.ask("Summarize this document.", with: "README.md").content
420
- agent.ask("Stream this reply.") { $stdout << _1 }
421
- ```
422
-
423
- ## Installation
424
-
425
- ```bash
426
- gem install llm.rb
115
+ llm = LLM.deepseek(key: ENV["KEY"])
116
+ agent = LLM::Agent.new(llm, stream: MyStream.new)
117
+ agent.talk "Explain Ruby fibers."
427
118
  ```
428
119
 
429
- ## Examples
120
+ #### LLM::MCP
430
121
 
431
- #### REPL
432
-
433
- This example uses [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
434
- for an interactive REPL. <br> See the
435
- [deepdive (web)](https://r.uby.dev/llm/) or
436
- [deepdive (markdown)](resources/deepdive.md) for more examples.
122
+ The Model Context Protocol (MCP) has first-class support
123
+ in llm.rb. The stdio and http transports work out of the
124
+ box. MCP tools are translated into subclasses of
125
+ [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html) that can be used with [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
126
+ or [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
437
127
 
438
128
  ```ruby
439
129
  require "llm"
440
130
 
441
- llm = LLM.openai(key: ENV["KEY"])
442
- agent = LLM::Agent.new(llm, stream: $stdout)
443
-
444
- loop do
445
- print "> "
446
- agent.talk(STDIN.gets || break)
447
- puts
448
- end
131
+ llm = LLM.deepseek(key: ENV["KEY"])
132
+ mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
133
+ agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
134
+ agent.talk "Run the tool"
449
135
  ```
450
136
 
451
- #### Multimodal: Local Files
452
-
453
- In llm.rb, a prompt can be a string, an [`LLM::Prompt`](https://r.uby.dev/api-docs/llm.rb/LLM/Prompt.html), or an array.
454
- When you use an array, each element can be plain text or a tagged object such as
455
- [`agent.image_url(...)`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#image_url-instance_method),
456
- [`agent.local_file(...)`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#local_file-instance_method),
457
- or [`agent.remote_file(...)`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#remote_file-instance_method).
458
- Those tagged objects carry the metadata the provider adapter needs to turn one
459
- Ruby prompt into the provider-specific multimodal request schema.
137
+ #### LLM::A2A
460
138
 
461
- If the model understands that file type, you can attach a local file directly
462
- with `agent.ask(..., with: path)` instead of uploading it first through a
463
- provider Files API. Under the hood, llm.rb tags the path as a
464
- [`agent.local_file(...)`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#local_file-instance_method)
465
- object:
139
+ The Agent 2 Agent (A2A) protocol has first-class support
140
+ in llm.rb. The http and jsonrpc transports work out of the
141
+ box. A2A skills are translated into subclasses of
142
+ [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html) that can be used with [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
143
+ or [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
466
144
 
467
145
  ```ruby
468
146
  require "llm"
469
147
 
470
- llm = LLM.openai(key: ENV["KEY"])
471
- agent = LLM::Agent.new(llm)
472
- puts agent.ask("Summarize this document.", with: "README.md").content
148
+ llm = LLM.deepseek(key: ENV["KEY"])
149
+ a2a = LLM::A2A.rest(url: "https://remote-agent.example.com")
150
+ agent = LLM::Agent.new(llm, stream: $stdout, tools: a2a.skills)
151
+ agent.talk "Run the skill"
473
152
  ```
474
153
 
475
- #### Context Compaction
476
-
477
- This example uses [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
478
- [`LLM::Compactor`](https://r.uby.dev/api-docs/llm.rb/LLM/Compactor.html), and
479
- [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html) together so
480
- long-lived conversations can summarize older history and expose the lifecycle
481
- through stream hooks. This approach is inspired by General Intelligence
482
- Systems. The
483
- compactor can also use its own `model:` if you want summarization to run on a
484
- different model from the main conversation. `token_threshold:` accepts either a
485
- fixed token count or a percentage string like `"90%"`, which resolves
486
- against the active model context window and triggers compaction once total
487
- token usage goes over that percentage. See the
488
- [deepdive (web)](https://r.uby.dev/llm/) or
489
- [deepdive (markdown)](resources/deepdive.md) for more examples.
154
+ #### RAG
490
155
 
491
- ```ruby
492
- require "llm"
493
-
494
- class Stream < LLM::Stream
495
- def on_compaction(ctx, compactor)
496
- puts "Compacting #{ctx.messages.size} messages..."
497
- end
156
+ Most providers offer an embedding model that can be
157
+ used for semantic search, or similarity search. An
158
+ embedding model can generate embeddings that can then
159
+ be stored in a database that is optimized for storing
160
+ and querying vectors, such as SQLite's [sqlite-vec](https://github.com/asg017/sqlite-vec)
161
+ or PostgreSQL's [pgvector](https://github.com/pgvector/pgvector).
498
162
 
499
- def on_compaction_finish(ctx, compactor)
500
- puts "Compacted to #{ctx.messages.size} messages."
501
- end
502
- end
503
-
504
- llm = LLM.openai(key: ENV["KEY"])
505
- agent = LLM::Agent.new(
506
- llm,
507
- stream: Stream.new,
508
- compactor: {
509
- token_threshold: "90%",
510
- retention_window: 8,
511
- model: "gpt-5.4-mini"
512
- }
513
- )
514
- ```
515
-
516
- #### Reasoning
517
-
518
- This example uses [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
519
- with the OpenAI Responses API so reasoning output is streamed separately from
520
- visible assistant output. See the
521
- [deepdive (web)](https://r.uby.dev/llm/) or
522
- [deepdive (markdown)](resources/deepdive.md) for more examples.
523
-
524
- To use the Responses API (OpenAI-specific), initialize an agent with
525
- `mode: :responses` and keep using `talk` for turns.
163
+ llm.rb also includes support for OpenAI's vector store API. It
164
+ provides a vector database as an HTTP service but we won't cover
165
+ that here.
526
166
 
527
167
  ```ruby
528
168
  require "llm"
529
169
 
530
- class Stream < LLM::Stream
531
- def on_content(content)
532
- $stdout << content
533
- end
170
+ llm = LLM.openai(key: ENV["KEY"])
171
+ body = "llm.rb is Ruby's capable AI runtime."
172
+ embedding = llm.embed([body]).embeddings.first
534
173
 
535
- def on_reasoning_content(content)
536
- $stderr << content
537
- end
538
- end
539
-
540
- llm = LLM.openai(key: ENV["KEY"])
541
- agent = LLM::Agent.new(
542
- llm,
543
- model: "gpt-5.4-mini",
544
- mode: :responses,
545
- reasoning: { effort: "medium" },
546
- stream: Stream.new
174
+ Document.create!(
175
+ title: "llm.rb",
176
+ body:,
177
+ embedding:,
547
178
  )
548
- agent.talk("Solve 17 * 19 and show your work.")
549
179
  ```
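Once embeddings are stored, a semantic search comes down to ranking rows by their similarity to a query embedding. The sketch below shows that ranking in plain Ruby with cosine similarity. It assumes embeddings are arrays of floats; the `cosine_similarity` and `nearest` helpers and the sample vectors are hypothetical and not part of llm.rb. A vector database such as sqlite-vec or pgvector would normally perform this step.

```ruby
# Hypothetical helpers for illustration; not part of llm.rb.

# Cosine similarity between two equal-length arrays of floats.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x })
  norm.zero? ? 0.0 : dot / norm
end

# Returns the `limit` documents most similar to the query, best match first.
def nearest(query, documents, limit: 1)
  documents.max_by(limit) { |doc| cosine_similarity(query, doc[:embedding]) }
end

# Made-up three-dimensional vectors; real embeddings are much larger.
documents = [
  {title: "llm.rb", embedding: [0.9, 0.1, 0.0]},
  {title: "cooking", embedding: [0.0, 0.2, 0.9]}
]

results = nearest([1.0, 0.0, 0.0], documents)
puts results.first[:title]
```

In a real application the query vector would come from the same embedding model that produced the stored vectors, since vectors from different models are not comparable.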
550
180
 
551
- #### Request Cancellation
552
-
553
- Need to cancel a stream? llm.rb has you covered through
554
- [`LLM::Agent#interrupt!`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#interrupt-21-instance_method).
555
- <br> See the [deepdive (web)](https://r.uby.dev/llm/)
556
- or [deepdive (markdown)](resources/deepdive.md) for more examples.
557
-
558
- ```ruby
559
- require "llm"
560
- require "io/console"
561
-
562
- llm = LLM.openai(key: ENV["KEY"])
563
- agent = LLM::Agent.new(llm, stream: $stdout)
564
- worker = Thread.new do
565
- agent.talk("Write a very long essay about network protocols.")
566
- rescue LLM::Interrupt
567
- puts "Request was interrupted!"
568
- end
569
-
570
- STDIN.getch
571
- agent.interrupt!
572
- worker.join
573
- ```
181
+ #### Concurrency
574
182
 
575
- #### Sequel (ORM)
183
+ The runtime supports five different concurrency strategies that have
184
+ different attributes. The choice between all of them often depends
185
+ on the requirements of your application.
576
186
 
577
- The `plugin :llm` integration wraps
578
- [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) on a
579
- `Sequel::Model` and keeps tool execution explicit. Like the ActiveRecord
580
- wrappers, its built-in persistence contract is the serialized `data` column,
581
- while `provider:` resolves a real `LLM::Provider` instance and `context:`
582
- injects defaults such as `model:`. <br> See the
583
- [deepdive (web)](https://r.uby.dev/llm/) or
584
- [deepdive (markdown)](resources/deepdive.md) for more examples.
187
+ IO-bound tools are a good fit for the `:task`, `:thread`,
188
+ and `:fiber` strategies while true parallelism can be achieved
189
+ with the `:fork` and `:ractor` strategies. The
190
+ `:fork` strategy also provides a separate process that offers
191
+ isolation from its parent.
585
192
 
586
193
  ```ruby
587
194
  require "llm"
588
- require "net/http/persistent"
589
- require "sequel"
590
- require "sequel/plugins/llm"
591
-
592
- class Context < Sequel::Model
593
- plugin :llm, provider: :set_provider, context: :set_context
594
-
595
- private
596
-
597
- def set_provider
598
- LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
599
- end
600
-
601
- def set_context
602
- { model: "gpt-5.4-mini", mode: :responses, store: false }
603
- end
604
- end
605
195
 
606
- ctx = Context.create
607
- ctx.talk("Remember that my favorite language is Ruby")
608
- puts ctx.talk("What is my favorite language?").content
196
+ llm = LLM.deepseek(key: ENV["KEY"])
197
+ tools = [FetchNews, FetchStocks, FetchFeeds]
198
+ agent = LLM::Agent.new(llm, tools:, concurrency: :fork)
199
+ agent.talk "Run the tools in parallel"
609
200
  ```
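The difference between a thread-based and a fork-based strategy can be seen with the standard library alone. The sketch below is not how llm.rb implements its strategies; the `tools` lambdas are hypothetical stand-ins for IO-bound tools, and the pipe plus `Marshal` plumbing is just one assumed way of getting a result back from a child process. It requires a platform where `fork` is available.

```ruby
# Hypothetical stand-ins for IO-bound tools; not part of llm.rb.
tools = {
  news: -> { sleep 0.1; "news" },
  stocks: -> { sleep 0.1; "stocks" },
  feeds: -> { sleep 0.1; "feeds" }
}

# Threads: the three calls overlap inside one process.
threads = tools.map { |name, tool| Thread.new { [name, tool.call] } }
threaded = threads.map(&:value).to_h

# Fork: each call runs in its own child process and reports back over a pipe.
children = tools.map do |name, tool|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write Marshal.dump(tool.call)
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  [name, reader, pid]
end

# Collect each child's result, then reap the child process.
forked = children.to_h do |name, reader, pid|
  result = Marshal.load(reader.read)
  reader.close
  Process.wait(pid)
  [name, result]
end

puts threaded == forked
```

The threaded version shares memory with the caller, while the forked version only shares what is written to the pipe, which is the isolation trade-off described above.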
610
201
 
611
- #### ActiveRecord (ORM): acts_as_llm
202
+ #### ORM
612
203
 
613
- The `acts_as_llm` method wraps [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) and
614
- provides full control over tool execution. Its built-in persistence contract is
615
- one serialized `data` column. If your app has provider, model, or usage
616
- columns, provide them to llm.rb through `provider:` and `context:` instead of
617
- relying on reserved wrapper columns.
204
+ Because both [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) and [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
205
+ can be serialized to JSON and stored in a simple string, both ActiveRecord
206
+ and Sequel support can be implemented within a single column on a single row.
618
207
 
619
- See the [deepdive (web)](https://r.uby.dev/llm/)
620
- or [deepdive (markdown)](resources/deepdive.md) for more examples.
208
+ The runtime includes first-class support for both ActiveRecord *and* Sequel, and
209
+ for both Rack-based applications *and* Rails-based applications. On databases
210
+ where it is supported, such as PostgreSQL, the column can be optimized by using
211
+ the `jsonb` type.
621
212
 
622
213
  ```ruby
623
- require "llm"
624
214
  require "active_record"
625
- require "llm/active_record"
626
-
627
- class Context < ApplicationRecord
628
- acts_as_llm provider: :set_provider, context: :set_context
629
-
630
- private
631
-
632
- def set_provider
633
- LLM.openai(key: ENV["OPENAI_SECRET"])
634
- end
635
-
636
- def set_context
637
- { model: "gpt-5.4-mini", mode: :responses, store: false }
638
- end
639
- end
640
-
641
- ctx = Context.create!
642
- ctx.talk("Remember that my favorite language is Ruby")
643
- puts ctx.talk("What is my favorite language?").content
644
- ```
645
-
646
- ```ruby
647
215
  require "llm"
648
- require "active_record"
649
216
  require "llm/active_record"
650
217
 
651
- class Context < ApplicationRecord
652
- acts_as_llm provider: :set_provider, context: :set_context
653
-
654
- # Optional application columns can still provide the provider and context.
655
- # For example, `provider_name` and `model_name` can be normal columns.
656
-
657
- private
658
-
659
- def set_provider
660
- LLM.public_send(provider_name, key: provider_key)
661
- end
662
-
663
- def set_context
664
- { model: model_name, mode: :responses, store: false }
218
+ class Agent < ApplicationRecord
219
+ acts_as_agent do |agent|
220
+ agent.model "deepseek-v4-pro"
221
+ agent.instructions "solve the user's query"
222
+ agent.tools [Research, FinalizeResearch, ActOnResearch]
665
223
  end
666
- end
667
- ```
668
-
669
- #### ActiveRecord (ORM): acts_as_agent
670
-
671
- The `acts_as_agent` method wraps [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html) and
672
- manages tool execution for you. Like `acts_as_llm`, its built-in persistence
673
- contract is one serialized `data` column. If your app has provider or model
674
- columns, provide them to llm.rb through your hooks and agent DSL.
675
-
676
- See the [deepdive (web)](https://r.uby.dev/llm/)
677
- or [deepdive (markdown)](resources/deepdive.md) for more examples.
678
-
679
- ```ruby
680
- require "llm"
681
- require "active_record"
682
- require "llm/active_record"
683
-
684
- class Ticket < ApplicationRecord
685
- acts_as_agent provider: :set_provider, context: :set_context
686
- model "gpt-5.4-mini"
687
- instructions "You are a concise support assistant."
688
- tools SearchDocs, Escalate
689
- concurrency :thread
690
224
 
691
225
  private
692
226
 
227
+ # By convention, this method defines the provider for a model.
228
+ # If necessary, it can be renamed with: provider: :your_method.
693
229
  def set_provider
694
- LLM.openai(key: ENV["OPENAI_SECRET"])
230
+ LLM.deepseek(key: ENV["KEY"])
695
231
  end
696
232
 
233
+ # By convention, this method returns the context options given
234
+ # to LLM::Context or LLM::Agent.
697
235
  def set_context
698
- { mode: :responses, store: false }
236
+ {}
699
237
  end
700
238
  end
701
239
 
702
- ticket = Ticket.create!
703
- puts ticket.talk("How do I rotate my API key?").content
240
+ agent = Agent.create!
241
+ agent.talk "perform research"
704
242
  ```
705
243
 
706
- ```ruby
707
- require "llm"
708
- require "active_record"
709
- require "llm/active_record"
244
+ ## FAQ
710
245
 
711
- class Ticket < ApplicationRecord
712
- acts_as_agent provider: :set_provider, context: :set_context
713
- model "gpt-5.4-mini"
714
- instructions "You are a concise support assistant."
246
+ <details>
247
+ <summary>What providers does llm.rb support?</summary>
248
+ <br>
249
+ <p>
250
+ China-based
715
251
 
716
- private
252
+ * DeepSeek
253
+ * zAI
717
254
 
718
- def set_provider
719
- LLM.public_send(provider_name, key: provider_key)
720
- end
255
+ US-based
721
256
 
722
- def set_context
723
- { mode: :responses, store: false }
724
- end
725
- end
726
- ```
257
+ * OpenAI
258
+ * Google (Gemini)
259
+ * xAI
260
+ * AWS Bedrock
261
+ * DeepInfra
262
+ * Anthropic
727
263
 
728
- #### MCP
264
+ Open weights
729
265
 
730
- This example uses [`LLM::MCP`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html)
731
- over HTTP so remote GitHub MCP tools run through the same
732
- `LLM::Agent` tool path as local tools. It expects a GitHub token in
733
- `ENV["GITHUB_PAT"]`. See the
734
- [deepdive (web)](https://r.uby.dev/llm/) or
735
- [deepdive (markdown)](resources/deepdive.md) for more examples.
266
+ * DeepSeek
267
+ * zAI
268
+ * DeepInfra
269
+ * AWS Bedrock
736
270
 
737
- ```ruby
738
- require "llm"
739
- require "net/http/persistent"
271
+ Host your own
740
272
 
741
- llm = LLM.openai(key: ENV["KEY"], persistent: true)
742
- mcp = LLM::MCP.http(
743
- url: "https://api.githubcopilot.com/mcp/",
744
- headers: { "Authorization" => "Bearer " + ENV["GITHUB_PAT"].to_s },
745
- persistent: true
746
- )
273
+ * Ollama
274
+ * llama.cpp
275
+ </p>
276
+ </details>
747
277
 
748
- agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
749
- agent.talk("Pull information about my GitHub account.")
750
- ```
278
+ <details>
279
+ <summary>I have a limited budget. What should I do?</summary>
280
+ <br>
281
+ <p>
282
+ There are a few options. The first option is to host
283
+ your own model, and use the ollama or llamacpp
284
+ providers. This can be difficult though because
285
+ a capable model requires hardware that can
286
+ match it. If you have the ability to self-host,
287
+ this would be my first option.
288
+ </p>
289
+ <p>
290
+ The second option is DeepSeek. <br>
291
+ The deepseek-v4-flash model costs pennies to use. <br>
292
+ And llm.rb has been optimized for DeepSeek. For example,
293
+ DeepSeek does not have image generation capabilities
294
+ but on the llm.rb runtime it does (vector graphics only,
295
+ though).
296
+ </p>
297
+ <p>
298
+ The same is true for structured outputs. DeepSeek does
299
+ not support structured outputs in the same way as OpenAI or
300
+ Google, but the llm.rb runtime makes it appear as
301
+ though it does, through the `json_object` response
302
+ type.
303
+ </p>
304
+ If you're on a budget, DeepSeek is hard to beat.
305
+ </details>
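
To make the structured-output point concrete: with a `json_object` response type the model returns a JSON string, and the expected shape can then be checked on the client side. The sketch below shows one general way to do that with Ruby's standard library. It is not llm.rb's implementation, and `SCHEMA` and `validate_shape` are hypothetical names used only for illustration.

```ruby
require "json"

# Hypothetical schema: each required key mapped to the Ruby type it must have.
SCHEMA = {"title" => String, "score" => Numeric}.freeze

# Parse a raw JSON reply and verify it matches the expected shape.
# Raises ArgumentError when a key is missing or has the wrong type.
def validate_shape(raw, schema = SCHEMA)
  data = JSON.parse(raw)
  raise ArgumentError, "expected a JSON object" unless data.is_a?(Hash)
  schema.each do |key, type|
    raise ArgumentError, "missing key: #{key}" unless data.key?(key)
    raise ArgumentError, "#{key} must be a #{type}" unless data[key].is_a?(type)
  end
  data
end

# A reply such as a model might produce in JSON-object mode (made up here).
reply = '{"title": "Ruby", "score": 9.5}'
result = validate_shape(reply)
puts result["title"]
```

A reply that fails the check could be rejected or retried by the caller; which strategy fits best depends on the application.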
306
+ <details>
307
+ <summary>Can I download llm.rb via a decentralized network?</summary>
308
+ <br>
309
+ You can!
310
+ <br>
311
+ We are on <a href="https://radicle.network">radicle.network</a>.
312
+ <br>
313
+ Every commit that lands on GitHub also lands on Radicle.
314
+ <br>
315
+ Our repository ID is z2PtfQ6dYwyYaW2aGrztG1sMyDmCE.
316
+ <br>
317
+ Browse on <a href="https://radicle.network/nodes/iris.radicle.network/z2PtfQ6dYwyYaW2aGrztG1sMyDmCE">the web</a>.
318
+ </details>
319
+
320
+ ## Resources
321
+
322
+ If you like what you have read so far, check out the [deepdive.md](https://r.uby.dev/llm/deepdive/)
323
+ to learn more. Unfortunately it
324
+ wasn't possible to cover every feature without the README becoming a small book.
325
+ The [r.uby.dev](https://r.uby.dev) homepage also includes more learning material
326
+ and resources.
751
327
 
752
328
  ## License
753
329
 
754
- [BSD Zero Clause](https://choosealicense.com/licenses/0bsd/)
330
+ [Business Source License 1.1](./LICENSE)
331
+ <br>
332
+ Commercial production use requires a commercial license.
333
+ <br>
334
+ Each version converts to the [BSD Zero Clause](https://choosealicense.com/licenses/0bsd/)
335
+ four years after its first public release.
755
336
  <br>
756
- See [LICENSE](./LICENSE)
337
+ Contact [robert@r.uby.dev](mailto:robert@r.uby.dev) for a commercial license.