llm.rb 11.3.1 → 12.0.0

Files changed (57)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +242 -1
  3. data/LICENSE +92 -17
  4. data/README.md +204 -623
  5. data/data/anthropic.json +433 -249
  6. data/data/bedrock.json +2097 -1055
  7. data/data/deepinfra.json +993 -0
  8. data/data/deepseek.json +53 -28
  9. data/data/google.json +389 -771
  10. data/data/openai.json +1053 -771
  11. data/data/xai.json +133 -292
  12. data/data/zai.json +249 -141
  13. data/lib/llm/active_record/acts_as_agent.rb +3 -41
  14. data/lib/llm/active_record/acts_as_llm.rb +18 -0
  15. data/lib/llm/active_record.rb +3 -3
  16. data/lib/llm/context.rb +9 -5
  17. data/lib/llm/contract/completion.rb +2 -2
  18. data/lib/llm/provider.rb +2 -2
  19. data/lib/llm/providers/deepinfra/audio.rb +66 -0
  20. data/lib/llm/providers/deepinfra/images.rb +90 -0
  21. data/lib/llm/providers/deepinfra/response_adapter.rb +36 -0
  22. data/lib/llm/providers/deepinfra.rb +100 -0
  23. data/lib/llm/providers/deepseek/images.rb +109 -0
  24. data/lib/llm/providers/deepseek/request_adapter.rb +32 -0
  25. data/lib/llm/providers/deepseek/response_adapter/image.rb +9 -0
  26. data/lib/llm/providers/deepseek/response_adapter.rb +29 -0
  27. data/lib/llm/providers/deepseek.rb +4 -2
  28. data/lib/llm/providers/google/request_adapter.rb +22 -5
  29. data/lib/llm/providers/google.rb +4 -4
  30. data/lib/llm/providers/openai/audio.rb +6 -2
  31. data/lib/llm/providers/openai/images.rb +9 -50
  32. data/lib/llm/providers/openai/request_adapter/respond.rb +38 -4
  33. data/lib/llm/providers/openai/response_adapter/audio.rb +5 -1
  34. data/lib/llm/providers/openai/response_adapter/completion.rb +1 -1
  35. data/lib/llm/providers/openai/response_adapter/image.rb +0 -4
  36. data/lib/llm/providers/openai/responses.rb +1 -0
  37. data/lib/llm/providers/openai/stream_parser.rb +5 -6
  38. data/lib/llm/providers/openai.rb +2 -2
  39. data/lib/llm/providers/xai/images.rb +49 -26
  40. data/lib/llm/providers/xai.rb +2 -2
  41. data/lib/llm/response.rb +10 -0
  42. data/lib/llm/schema/leaf.rb +7 -1
  43. data/lib/llm/schema/renderer.rb +121 -0
  44. data/lib/llm/schema.rb +30 -0
  45. data/lib/llm/sequel/agent.rb +2 -43
  46. data/lib/llm/sequel/plugin.rb +25 -7
  47. data/lib/llm/tracer/telemetry.rb +4 -6
  48. data/lib/llm/tracer.rb +9 -21
  49. data/lib/llm/transport/execution.rb +16 -1
  50. data/lib/llm/transport/net_http_adapter.rb +1 -1
  51. data/lib/llm/uridata.rb +16 -0
  52. data/lib/llm/version.rb +1 -1
  53. data/lib/llm.rb +9 -0
  54. data/llm.gemspec +5 -18
  55. data/resources/deepdive.md +798 -264
  56. metadata +15 -18
  57. data/lib/llm/tracer/langsmith.rb +0 -144
@@ -12,421 +12,955 @@
12
12
 
13
13
  > A [r.uby.dev](https://r.uby.dev) project.
14
14
 
15
- ## Intro
15
+ ## Welcome
16
+
17
+ Welcome to the llm.rb deepdive. You are reading this document
18
+ in the markdown format. An optimized version exists
19
+ at [https://r.uby.dev/llm/deepdive](https://r.uby.dev/llm/deepdive)
20
+ and it is both easier to read and navigate.
21
+
22
+ This document is a continuation of the [homepage documentation](https://r.uby.dev/llm).
23
+ It assumes you are familiar with the basics already, and focuses on
24
+ features that didn't make it into the homepage documentation.
25
+
26
+ ## Table of contents
27
+
28
+ - [Agents](#agents)
29
+ - [As a subclass](#as-a-subclass)
30
+ - [As an object](#as-an-object)
31
+ - [Skills](#skills)
32
+ - [SKILL.md](#skillmd)
33
+ - [Run it](#run-it)
34
+ - [MCP](#mcp)
35
+ - [stdio](#stdio)
36
+ - [http](#http)
37
+ - [A2A](#a2a)
38
+ - [rest](#rest)
39
+ - [jsonrpc](#jsonrpc)
40
+ - [Transports](#transports)
41
+ - [net/http](#nethttp)
42
+ - [net/http/persistent](#nethttppersistent)
43
+ - [curb](#curb)
44
+ - [Stream](#stream)
45
+ - [IO-like object](#io-like-object)
46
+ - [LLM::Stream](#llmstream)
47
+ - [ORM](#orm)
48
+ - [ActiveRecord](#activerecord)
49
+ - [Sequel](#sequel)
50
+ - [Schema](#schema)
51
+ - [Estimation](#estimation)
52
+ - [Cancellation](#cancellation)
53
+ - [Cancel a request](#cancel-a-request)
54
+ - [Tracer](#tracer)
55
+ - [Provider-wide tracer](#provider-wide-tracer)
56
+ - [Agent-local tracer](#agent-local-tracer)
57
+ - [Images](#images)
58
+ - [Generation](#generation)
59
+ - [Edits](#edits)
60
+ - [Audio](#audio)
61
+ - [text-to-speech](#text-to-speech)
62
+ - [speech-to-text](#speech-to-text)
63
+ - [translation](#translation)
64
+
65
+ ## Agents
66
+
67
+ An agent is represented by the
68
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
69
+ class, and it is built on top of
70
+ [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) -
71
+ the heart of the runtime. An agent manages the tool loop automatically,
72
+ implements a tool loop guard for misbehaving models, and
73
+ it can use five different concurrency strategies to execute
74
+ tools.
75
+
76
+ An agent can be a subclass of
77
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
78
+ or a direct
79
+ instance of it. The subclass approach is useful when you
80
+ want reusable agents that can attach behavior (as methods)
81
+ to their own class.
82
+
83
+ #### As a subclass
84
+
85
+ A subclass of
86
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
87
+ can define its model, tools,
88
+ and other attributes at the class-level. All of these
89
+ attributes are optional, and they act as defaults that
90
+ can be overridden at the instance level.
91
+
92
+ The example uses the `:fork` concurrency model. It has
93
+ two primary benefits: tools are run in parallel, and in
94
+ a separate process with a separate memory address space.
95
+
96
+ The example purposefully demonstrates how the attributes
97
+ can be lazily defined with a block, or a Symbol that is
98
+ evaluated as an instance method on the subclass. It is
99
+ not strictly necessary, though, and the example would
100
+ be simpler without it.
16
101
 
17
- This guide is a practical walkthrough of [llm.rb](https://github.com/r-uby-dev/llm.rb#readme) —
18
- Ruby's capable AI runtime.
102
+ ```ruby
103
+ class Agent < LLM::Agent
104
+ model "deepseek-v4-pro"
105
+ tools { [DoResearch, FinalizeResearch, ActOnResearch] }
106
+ stream { $stdout }
107
+ tracer :set_tracer
108
+ concurrency :fork
109
+
110
+ def research!
111
+ talk "start the research"
112
+ end
19
113
 
20
- llm.rb runs on Ruby's standard library by default and loads optional pieces
21
- only when needed. You can start with a provider and a single context, then add
22
- agents, tools, streaming, persistence, embeddings, and protocol clients
23
- without changing the shape of your code.
114
+ private
115
+
116
+ def set_tracer
117
+ LLM::Tracer::Logger.new(llm, io: $stderr)
118
+ end
119
+ end
120
+ llm = LLM.deepseek(key: ENV["KEY"])
121
+ agent = Agent.new(llm).tap(&:research!)
122
+ agent.talk "How did the research go?"
123
+ ```
24
124
 
25
- It supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google Gemini,
26
- DeepSeek, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp. ActiveRecord and
27
- Sequel support are built in, along with concurrent tool execution through
28
- threads, tasks, fibers, ractors, and fork.
125
+ #### As an object
29
126
 
30
- ## Install
127
+ The more direct, and sometimes more convenient, approach is to
128
+ create an instance of
129
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
130
+ directly. The same attributes can be provided as the
131
+ second argument given to
132
+ [`LLM::Agent.new`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
133
+ and the same lazy evaluation rules apply. This approach can be
134
+ great for prototyping quickly, and you can always turn to a
135
+ subclass later if that makes more sense.
31
136
 
32
- ```bash
33
- gem install llm.rb
137
+ ```ruby
138
+ llm = LLM.deepseek(key: ENV["KEY"])
139
+ agent = LLM::Agent.new(llm, stream: $stdout)
140
+ agent.talk "Hello, fellow agent"
34
141
  ```
35
142
 
36
- ## Quick Start
143
+ [Back to top](#table-of-contents)
37
144
 
38
- #### Agent
145
+ ## Tools
39
146
 
40
- [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html) is the
41
- recommended starting point.
42
- <br>
43
- It manages tool execution for you and keeps conversation state across turns.
147
+ A tool extends the capabilities of a model. <br>
148
+ A tool is a subclass of
149
+ [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
150
+ that has a name,
151
+ a description, and an optional set of typed parameters.
152
+
153
+ A tool also has a method associated with it, and when the
154
+ model calls a tool it will do so through this method &ndash;
155
+ alongside any parameters the tool might have defined.
156
+
157
+ In other words, a tool provides a way for a model to
158
+ call a method you have written, and it returns a value
159
+ to the model that is considered the tool's response.
160
+ The model then proceeds to process the tool's response,
161
+ and then might generate its own response, or perhaps call
162
+ another tool.
163
+
164
+ #### LLM::Tool
165
+
166
+ A tool can be defined by subclassing
167
+ [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
168
+ with
169
+ a name, description, and optional set of parameters. The
170
+ tool name, and description should be informative so the
171
+ model can understand what the tool does and how it can
172
+ serve a user's query.
44
173
 
45
174
  ```ruby
46
175
  require "llm"
176
+ require "shellwords"
177
+
178
+ class Shell < LLM::Tool
179
+ name "shell"
180
+ description "execute a shell command"
181
+ parameter :name, String, "the command's name"
182
+ parameter :arguments, Array[String], "One or more arguments"
183
+ required %i[name]
184
+ defaults arguments: []
185
+
186
+ def call(name:, arguments:)
187
+ out = `#{name.shellescape} #{arguments.map(&:shellescape).join(" ")}`
188
+ {ok: $?.success?, out:}
189
+ end
190
+ end
47
191
 
48
- llm = LLM.openai(key: ENV["KEY"])
49
- agent = LLM::Agent.new(llm, stream: $stdout)
50
- agent.talk "Hello world"
192
+ llm = LLM.deepseek(key: ENV["KEY"])
193
+ agent = LLM::Agent.new(llm, tools: [Shell], stream: $stdout)
194
+ agent.talk "What files are in the current working directory?"
51
195
  ```
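The tool above relies on `shellescape` because the model chooses the command name and its arguments. The following is a minimal sketch, using only Ruby's standard `shellwords` library, of what escaping does to awkward input. The sample arguments are made up for illustration and are not part of llm.rb.

```ruby
require "shellwords"

# Hypothetical arguments a model might supply.
arguments = ["my file.txt", "; rm -rf /"]

# Each argument is escaped on its own, then joined with spaces.
escaped = arguments.map(&:shellescape).join(" ")
puts escaped

# Splitting the escaped string recovers the original arguments,
# so a shell would see exactly two words and no extra command.
words = Shellwords.split(escaped)
puts words.inspect
```

Escaping each argument separately is what keeps a space or a semicolon inside one argument from being read as a word boundary or a command separator.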
52
196
 
53
- #### REPL
197
+ #### Errors
198
+
199
+ Exceptions that might be raised by a tool are automatically
200
+ rescued and returned to the model as a structured error.
201
+ Otherwise &ndash; the conversation's history could be left
202
+ in an invalid state.
54
203
 
55
- A read-eval-print loop is the simplest way to interact with an agent.
56
- <br>
57
- The loop reads input, sends it to the model, and prints the response as it
58
- arrives:
204
+ That's because a tool call must complete with a tool response.
205
+ It is the only valid response a model expects, so even in the
206
+ case of an error, something must be returned that communicates
207
+ what happened.
59
208
 
60
209
  ```ruby
61
- require "llm"
210
+ class Error < LLM::Tool
211
+ name "error"
212
+ description "demo how errors are handled"
213
+
214
+ ##
215
+ # Returns
216
+ # {error: true, kind: "RuntimeError", message: "boom"}
217
+ def call
218
+ raise "boom"
219
+ end
220
+ end
221
+ ```
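The idea of turning an exception into a structured error can be sketched in plain Ruby. This is an illustration only, not llm.rb's actual implementation: `call_safely` is a hypothetical helper, and the hash shape is copied from the comment in the example above.

```ruby
# Hypothetical helper: runs a block and converts any StandardError
# into a hash that could be handed back to a model as a tool response.
def call_safely
  yield
rescue StandardError => e
  {error: true, kind: e.class.name, message: e.message}
end

result = call_safely { raise "boom" }
puts result.inspect

# A block that does not raise passes its value through unchanged.
ok = call_safely { {ok: true} }
puts ok.inspect
```

Either way the caller gets a value back, which is what lets the tool call complete with a response even when the tool fails.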
62
222
 
63
- llm = LLM.openai(key: ENV["KEY"])
64
- agent = LLM::Agent.new(llm, stream: $stdout)
223
+ ## Skills
65
224
 
66
- loop do
67
- print "> "
68
- agent.talk(STDIN.gets || break)
69
- puts
70
- end
225
+ The skill concept is borrowed from tools like Claude and
226
+ Codex, but llm.rb gives it a runtime of its own. A skill
227
+ is a directory with a `SKILL.md` file. That file contains
228
+ frontmatter where the skill's name, description, and tools
229
+ can be declared.
230
+
231
+ #### SKILL.md
232
+
233
+ The `SKILL.md` file can look like this. When a skill runs,
234
+ the runtime spawns a subagent with its own context window
235
+ and message history. Some context is inherited from the
236
+ parent agent, though.
237
+
238
+ By default the subagent can only access the tools declared
239
+ by the skill. The `inherit` directive lets it inherit the
240
+ parent agent's tools instead, including A2A and MCP tools.
241
+
242
+ ```markdown
243
+ ---
244
+ name: git-skill
245
+ description: reads my git history and writes a summary
246
+ tools: ['git-log', 'git-show', 'write-file']
247
+ ---
248
+
249
+ ## Task
250
+
251
+ Collect a log of recent history.
252
+ Analyze each commit.
253
+ Write a summary to summary.txt
71
254
  ```
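To make the frontmatter format concrete, here is a minimal sketch that separates the frontmatter from the body and reads it with Ruby's standard `yaml` library. It only illustrates the file layout shown above. How llm.rb itself parses `SKILL.md` is not shown in this document, and the variable names are hypothetical.

```ruby
require "yaml"

skill_md = <<~MD
  ---
  name: git-skill
  description: reads my git history and writes a summary
  tools: ['git-log', 'git-show', 'write-file']
  ---

  ## Task

  Collect a log of recent history.
MD

# Split on the two "---" delimiter lines. The first element is the
# empty string that precedes the opening delimiter.
_, frontmatter, body = skill_md.split(/^---[ \t]*\n/, 3)
meta = YAML.safe_load(frontmatter)

puts meta["name"]
puts meta["tools"].inspect
puts body.strip.lines.first
```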
72
255
 
73
- #### Context
256
+ #### Run it
257
+
258
+ Given the skill above, llm.rb only needs the path to the
259
+ directory that contains `SKILL.md`. Under the hood, a skill
260
+ is represented as a tool the model can call. That means
261
+ a skill can be called whenever it satisfies the user's
262
+ request &ndash; in the same way that a regular tool can.
74
263
 
75
- [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) is the
76
- lower-level runtime object.
77
- <br>
78
- It holds the same conversation state but leaves tool execution up to you.
79
- Use it when you want to decide when and how tools run.
264
+ This feature also works with both the ActiveRecord and
265
+ Sequel integrations.
80
266
 
81
267
  ```ruby
82
268
  require "llm"
83
269
 
84
- llm = LLM.openai(key: ENV["KEY"])
85
- ctx = LLM::Context.new(llm, stream: $stdout)
86
- ctx.talk "Hello world"
270
+ llm = LLM.deepseek(key: ENV["KEY"])
271
+ agent = LLM::Agent.new(llm, skills: [__dir__])
272
+ agent.talk "run the git skill"
87
273
  ```
88
274
 
89
- With tools, the manual loop is explicit:
275
+ [Back to top](#table-of-contents)
276
+
277
+ ## MCP
278
+
279
+ #### stdio
280
+
281
+ The stdio transport connects to an MCP server that is launched as a
282
+ separate process, and both its standard input and standard output
283
+ streams are used for communication. It is recommended but not
284
+ required to execute commands for a stdio transport over a
285
+ persistent session via the
286
+ [`LLM::MCP#session`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html#session-instance_method)
287
+ method &ndash; otherwise
288
+ you could end up launching the same process multiple times.
90
289
 
91
290
  ```ruby
92
- ctx = LLM::Context.new(llm, tools: [ReadFile])
93
- ctx.talk("Read README.md and summarize it.")
94
- ctx.talk(ctx.wait(:call)) while ctx.functions?
291
+ require "llm"
292
+
293
+ llm = LLM.deepseek(key: ENV["KEY"])
294
+ mcp = LLM::MCP.stdio(argv: ["npx", "-y", "@forgejo/mcp-server"])
295
+ agent = LLM::Agent.new(llm)
296
+
297
+ mcp.session do
298
+ agent.talk "What's happening on forgejo?", tools: mcp.tools
299
+ end
95
300
  ```
96
301
 
97
- For ordinary application code, prefer
98
- [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
99
- It does the same thing but manages the loop for you.
302
+ #### http
100
303
 
101
- ## Tools
304
+ The http transport connects to an MCP server over HTTP, and unlike
305
+ the stdio transport, the MCP server does not have to be running
306
+ locally. Popular services like GitHub provide their own MCP server
307
+ over HTTP, and it is one of the most capable MCP servers I have
308
+ used.
102
309
 
103
- #### Definition
310
+ Unlike the stdio transport,
311
+ [`LLM::MCP#session`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html#session-instance_method)
312
+ carries little benefit for the http transport and it can be
313
+ omitted. It is recommended to consider the `net_http_persistent`
314
+ transport for MCP interactions that run over HTTP, otherwise
315
+ you could end up tearing down and setting up the same connection
316
+ multiple times.
104
317
 
105
- Tools extend what the model can do.
106
- <br>
107
- They are plain Ruby classes with typed parameters. Define one, attach it to
108
- an agent, and the model can call it when it makes sense.
318
+ ```ruby
319
+ require "llm"
320
+
321
+ llm = LLM.deepseek(key: ENV["KEY"])
322
+ mcp = LLM::MCP.http(
323
+ url: "https://api.githubcopilot.com/mcp/",
324
+ headers: {
325
+ "Authorization" => "Bearer #{ENV.fetch('GITHUB_PAT')}"
326
+ },
327
+ transport: :net_http_persistent
328
+ )
329
+ agent = LLM::Agent.new(llm)
330
+ agent.talk "What's happening on GitHub?", tools: mcp.tools
331
+ ```
332
+
333
+ [Back to top](#table-of-contents)
334
+
335
+ ## A2A
336
+
337
+ #### rest
338
+
339
+ The rest transport communicates with other agents via A2A
340
+ endpoints that speak both HTTP and JSON. The skills advertised
341
+ by an agent become subclasses of
342
+ [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
343
+ that can be used by both
344
+ [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html),
345
+ and [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
346
+ &ndash; similar to how MCP tools become subclasses of
347
+ [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html).
109
348
 
110
349
  ```ruby
111
- class ReadFile < LLM::Tool
112
- name "read-file"
113
- description "Read a file"
114
- parameter :path, String, "The filename or path"
115
- required %i[path]
116
-
117
- def call(path:)
118
- {contents: File.read(path)}
119
- end
120
- end
350
+ require "llm"
351
+
352
+ llm = LLM.deepseek(key: ENV["KEY"])
353
+ a2a = LLM::A2A.rest(url: "https://agent.example.com")
354
+ agent = LLM::Agent.new(llm, tools: a2a.skills)
355
+ agent.talk "What's happening, fellow agent?"
121
356
  ```
122
357
 
123
- Attach the tool to an agent:
358
+ #### jsonrpc
359
+
360
+ The jsonrpc transport communicates with other agents via HTTP
361
+ and a protocol known as jsonrpc. Sometimes an agent will
362
+ implement both transports, or just one of them. An agent's card, which
363
+ is represented by an instance of
364
+ [`LLM::A2A::Card`](https://r.uby.dev/api-docs/llm.rb/LLM/A2A/Card.html),
365
+ can be
366
+ used to discover available transports via the
367
+ [`LLM::A2A::Card#interfaces`](https://r.uby.dev/api-docs/llm.rb/LLM/A2A/Card.html#interfaces-instance_method)
368
+ method.
124
369
 
125
370
  ```ruby
126
- agent = LLM::Agent.new(llm, stream: $stdout, tools: [ReadFile])
127
- agent.talk "Read README.md and summarize the project."
371
+ require "llm"
372
+ llm = LLM.deepseek(key: ENV["KEY"])
373
+ a2a = LLM::A2A.jsonrpc(url: "https://agent.example.com")
374
+ agent = LLM::Agent.new(llm, tools: a2a.skills)
375
+ agent.talk "What's happening, fellow agent?"
128
376
  ```
129
377
 
130
- [`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html) handles the
131
- Ruby-side definition. llm.rb adapts the tool schema to the provider at request
132
- time.
378
+ [Back to top](#table-of-contents)
379
+
380
+ ## Transports
133
381
 
134
- #### Concurrency
382
+ The [`LLM::Provider`](https://r.uby.dev/api-docs/llm.rb/LLM/Provider.html),
383
+ [`LLM::MCP`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html), and
384
+ [`LLM::A2A`](https://r.uby.dev/api-docs/llm.rb/LLM/A2A.html) classes
385
+ all accept a `transport` option that decides which library
386
+ will be used for HTTP communication. There are three options out
387
+ of the box:
388
+ [`net-http`](https://github.com/ruby/net-http),
389
+ [`net-http-persistent`](https://github.com/drbrain/net-http-persistent),
390
+ and [`curb`](https://github.com/taf2/curb).
135
391
 
136
- When an agent calls several tools at once, you can run them in parallel.
137
- <br>
138
- This cuts down waiting time when tools do independent work like reading
139
- files or calling APIs.
392
+ #### net/http
393
+
394
+ The [`net/http`](https://github.com/ruby/net-http) transport is represented by the symbol `:net_http`. <br>
395
+ It is the default transport.
140
396
 
141
397
  ```ruby
142
- class Agent < LLM::Agent
143
- model "gpt-5.4-mini"
144
- tools ReadFile
145
- concurrency :thread
146
- end
398
+ require "llm"
147
399
 
148
- llm = LLM.openai(key: ENV["KEY"])
149
- agent = Agent.new(llm, stream: $stdout)
150
- agent.talk "Read README.md and CHANGELOG.md and compare them."
400
+ llm = LLM.deepseek(key: "...", transport: :net_http)
401
+ mcp = LLM::MCP.http(url: "...", transport: :net_http)
402
+ a2a = LLM::A2A.rest(url: "...", transport: :net_http)
151
403
  ```
152
404
 
153
- ## Structured Output
154
-
155
- #### Schema
405
+ #### net/http/persistent
156
406
 
157
- When you need JSON with a known shape, use
158
- [`LLM::Schema`](https://r.uby.dev/api-docs/llm.rb/LLM/Schema.html).
159
- <br>
160
- The model will return data that matches your schema instead of free text.
407
+ The [`net/http/persistent`](https://github.com/drbrain/net-http-persistent) transport is represented by the symbol `:net_http_persistent`. <br>
408
+ It maintains a connection pool so the cost of tearing down and
409
+ setting up a connection repeatedly is kept low, and it is built
410
+ on top of [`net/http`](https://github.com/ruby/net-http).
161
411
 
162
412
  ```ruby
163
- class Report < LLM::Schema
164
- property :category, Enum["performance", "security", "outage"]
165
- property :summary, String, "Short summary"
166
- property :services, Array[String], "Impacted services"
167
- required %i[category summary services]
168
- end
413
+ require "llm"
169
414
 
170
- agent = LLM::Agent.new(llm, schema: Report)
171
- res = agent.talk("Classify: 'API latency spiked for the billing service.'")
172
- puts res.content!
415
+ llm = LLM.deepseek(key: "...", transport: :net_http_persistent)
416
+ mcp = LLM::MCP.http(url: "...", transport: :net_http_persistent)
417
+ a2a = LLM::A2A.rest(url: "...", transport: :net_http_persistent)
173
418
  ```
174
419
 
175
- For one-off schemas, build the shape inline:
420
+ #### curb
421
+
422
+ The [`curb`](https://github.com/taf2/curb) transport is represented by the symbol `:curb`. <br>
423
+ It provides bindings for libcurl &ndash; a widely used, highly portable
424
+ and feature-rich HTTP library written in C.
176
425
 
177
426
  ```ruby
178
- schema = LLM::Schema.new.object(
179
- category: LLM::Schema.new.string.enum("bug", "feature").required,
180
- summary: LLM::Schema.new.string.required
181
- )
427
+ require "llm"
182
428
 
183
- agent = LLM::Agent.new(llm, schema:)
184
- res = agent.talk("Classify: add a dark mode toggle.")
185
- puts res.content
429
+ llm = LLM.deepseek(key: "...", transport: :curb)
430
+ mcp = LLM::MCP.http(url: "...", transport: :curb)
431
+ a2a = LLM::A2A.rest(url: "...", transport: :curb)
186
432
  ```
187
433
 
188
- ## Streaming
434
+ [Back to top](#table-of-contents)
435
+
436
+ ## Stream
437
+
438
+ #### IO-like object
189
439
 
190
- #### Stream
440
+ Any object that implements the `#<<` method can receive
441
+ chunks from a stream. That includes objects like `$stdout`.
442
+ This form of streaming is simple and limited. It is the
443
+ equivalent of
444
+ [`LLM::Stream#on_content`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html#on_content-instance_method),
445
+ and doesn't include
446
+ any of the other
447
+ [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
448
+ hooks.
191
449
 
192
- Streaming works with any object that responds to `#<<`, like `$stdout`.
193
- <br>
194
- For more control, subclass
195
- [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html) and
196
- override its callbacks:
450
+ ```ruby
451
+ require "llm"
452
+
453
+ llm = LLM.deepseek(key: ENV["KEY"])
454
+ agent = LLM::Agent.new(llm, stream: $stdout)
455
+ agent.talk "hello world"
456
+ ```
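Because the only requirement described above is a `#<<` method, a custom sink can be very small. The sketch below is an assumption-labeled illustration: `ChunkCollector` is a hypothetical class, not part of llm.rb, and the chunks are fed by hand rather than by a provider.

```ruby
# Hypothetical sink: collects streamed chunks in memory.
class ChunkCollector
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  # Returning self allows chained calls, like IO#<< does.
  def <<(chunk)
    @chunks << chunk.to_s
    self
  end

  def to_s
    @chunks.join
  end
end

sink = ChunkCollector.new
sink << "hello" << " " << "world"
puts sink.to_s
```

An object like this could presumably be passed wherever `$stdout` is passed in the example above, for instance to buffer output for a test instead of printing it.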
457
+
458
+ #### LLM::Stream
459
+
460
+ The [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
461
+ class provides many hooks that a subclass
462
+ can implement. They range from being notified when a tool call
463
+ starts to when a tool call finishes, or when a conversation is
464
+ due to be compacted because the context window exceeded a defined
465
+ limit. All these callbacks support a responsive user interface
466
+ where the user is always aware of what is happening behind the
467
+ scenes.
197
468
 
198
469
  ```ruby
199
- class MyStream < LLM::Stream
470
+ class Stream < LLM::Stream
200
471
  def on_content(content)
201
- print content
472
+ puts content
202
473
  end
203
474
 
204
475
  def on_reasoning_content(content)
205
- warn content
476
+ puts content
206
477
  end
207
- end
208
478
 
209
- llm = LLM.openai(key: ENV["KEY"])
210
- agent = LLM::Agent.new(llm, stream: MyStream.new)
211
- agent.talk "Explain Ruby fibers."
479
+ def on_tool_call(tool, error)
480
+ # this callback can be used to either log a tool call,
481
+ # or execute a tool call during a stream.
482
+ end
483
+
484
+ def on_tool_return(tool, result)
485
+ end
486
+
487
+ def on_compaction(ctx, compactor)
488
+ # this callback is called *before* a compact happens
489
+ end
490
+
491
+ def on_compaction_finish(ctx, compactor)
492
+ # this callback is called *after* a compact happens
493
+ end
494
+ end
212
495
  ```
213
496
 
214
- ## Skills
497
+ [Back to top](#table-of-contents)
215
498
 
216
- #### Release
499
+ ## Serialization
217
500
 
218
- Skills package repeatable instructions and scoped tool access into
219
- `SKILL.md` directories.
220
- <br>
221
- They turn common workflows into named capabilities that agents can load
222
- on demand.
501
+ The [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
502
+ class can be serialized to JSON and stored in a string or on disk.
503
+ That is powerful because a context contains runtime state that can
504
+ be restored later, in a different process or even on a different
505
+ machine. And because an agent is implemented on top of
506
+ [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
507
+ this feature works for [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
508
+ too.
223
509
 
224
- ```yaml
225
- ---
226
- name: release
227
- description: Prepare a release
228
- tools: ["search-docs", "git"]
229
- ---
510
+ #### Save to disk
230
511
 
231
- ## Task
512
+ The runtime can serialize its state to a string, a text file, or
513
+ a database column. The option that fits best depends on your application
514
+ and environment. Web applications might be more interested in the [ORM](#orm)
515
+ feature, which is built on top of the serialization feature.
232
516
 
233
- Review the release state, summarize what changed, and prepare the release.
517
+ ```ruby
518
+ ##
519
+ # Create a provider
520
+ llm = LLM.deepseek(key: ENV["KEY"])
521
+
522
+ ##
523
+ # Save agent
524
+ agent1 = LLM::Agent.new(llm)
525
+ agent1.talk "remember my name is robert"
526
+ agent1.save(path: "agent.json")
527
+
528
+ ##
529
+ # Restore agent
530
+ agent2 = LLM::Agent.new(llm, stream: $stdout)
531
+ agent2.restore(path: "agent.json")
532
+ agent2.talk "what's my name?"
234
533
  ```
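The save and restore round trip can be pictured with plain JSON. The following sketch does not use llm.rb and does not reflect its real on-disk format. The `messages` structure and its keys are assumptions chosen only to show why serialized state can be restored in another process.

```ruby
require "json"
require "tempfile"

# Hypothetical conversation state: a list of role/content pairs.
messages = [
  {"role" => "user", "content" => "remember my name is robert"}
]

# Save: write the state to disk as JSON.
file = Tempfile.new(["agent", ".json"])
file.write(JSON.generate("messages" => messages))
file.close

# Restore: any process that can read the file can rebuild the state.
restored = JSON.parse(File.read(file.path)).fetch("messages")
puts restored.first["content"]

file.unlink
```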
235
534
 
535
+ ## ORM
536
+
537
+ Both ActiveRecord and Sequel have first-class support on the
538
+ llm.rb runtime. In both cases an ActiveRecord or Sequel model
539
+ can be turned into a model that has the same capabilities as
540
+ [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html),
541
+ or [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
542
+
543
+ The main difference is that the runtime persists directly into
544
+ the database with no requirements beyond a single column on a
545
+ single row. That means it is usually trivial to turn an existing
546
+ model into an AI-aware model.
547
+
548
+ #### ActiveRecord
549
+
550
+ The ActiveRecord interface for
551
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
552
+ is
553
+ [`acts_as_agent`](https://r.uby.dev/api-docs/llm.rb/LLM/ActiveRecord/ActsAsAgent.html).
554
+ It yields an instance of
555
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
556
+ and that can be used
557
+ to configure the agent (e.g. which model, instructions, skills,
558
+ tools, etc).
559
+
560
+ One notable option is `format`. It
561
+ defaults to `:string` but it can also be changed to `:json`
562
+ or `:jsonb` depending on the configuration and type of underlying
563
+ column. The JSONB column type is recommended.
564
+
236
565
  ```ruby
237
- class ReleaseAgent < LLM::Agent
238
- model "gpt-5.4-mini"
239
- skills "./skills/release"
240
- end
566
+ require "active_record"
567
+ require "llm"
568
+ require "llm/active_record"
241
569
 
242
- llm = LLM.openai(key: ENV["KEY"])
243
- ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")
244
- ```
570
+ class Agent < ApplicationRecord
571
+ acts_as_agent(format: :jsonb) do |agent|
572
+ agent.model "deepseek-v4-pro"
573
+ agent.instructions "solve the user's query"
574
+ agent.tools [Research, FinalizeResearch, ActOnResearch]
575
+ end
245
576
 
246
- When a skill runs, llm.rb starts a subagent with the skill's instructions,
247
- its allowed tools, and recent conversation context. Skills can also use
248
- `tools: inherit` to run with the parent agent's full toolset.
577
+ private
249
578
 
250
- ## MCP
579
+ ##
580
+ # By convention, this method defines the provider
581
+ # for a model. If neccessary, it can be renamed and
582
+ # configured via `provider: :your_method` instead.
583
+ def set_provider
584
+ LLM.deepseek(key: ENV["KEY"])
585
+ end
586
+
587
+ ##
588
+ # By convention, this method should return what is
589
+ # given as the second argument to `LLM::Context` or
590
+ # `LLM::Agent`.
591
+ #
592
+ # Often, there is no need to set it, so it can be left
593
+ # undefined or it can be reassigned in the same way as
594
+ # `set_provider`. For example: `context: :your_method`
595
+ def set_context
596
+ {}
597
+ end
598
+ end
599
+
600
+ agent = Agent.create!
601
+ agent.talk "perform research"
602
+ ```
251
603
 
252
- #### Stdio
604
+ #### Sequel
253
605
 
254
- [`LLM::MCP`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html) lets llm.rb use
255
- tools provided by local stdio servers or remote HTTP servers.
256
- <br>
257
- This is how you connect your agent to GitHub, databases, or anything else
258
- that speaks the Model Context Protocol.
606
+ The following is a Sequel equivalent to the ActiveRecord example,
607
+ but to keep it interesting and informative, this example also
608
+ configures a per-model tracer that logs to `$stdout`. The same
609
+ approach works for ActiveRecord.
259
610
 
260
611
  ```ruby
612
+ require "sequel"
261
613
  require "llm"
614
+ require "llm/sequel/plugin"
615
+
616
+ class Agent < Sequel::Model
617
+ plugin(:agent, format: :jsonb) do |agent|
618
+ agent.model "deepseek-v4-pro"
619
+ agent.instructions "solve the user's query"
620
+ agent.tools [Research, FinalizeResearch, ActOnResearch]
621
+ agent.tracer { LLM::Tracer::Logger.new(llm, io: $stdout) }
622
+ end
262
623
 
263
- llm = LLM.openai(key: ENV["KEY"])
264
- mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
624
+ private
265
625
 
266
- mcp.session do
267
- agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
268
- agent.talk "Use the available tools to inspect the environment."
626
+ def set_provider
627
+ LLM.deepseek(key: ENV["KEY"])
628
+ end
269
629
  end
630
+
631
+ agent = Agent.create
632
+ agent.talk "perform research"
270
633
  ```
271
634
 
272
- #### Remote
635
+ [Back to top](#table-of-contents)
636
+
637
+ ## Schema
638
+
639
+ The [`LLM::Schema`](https://r.uby.dev/api-docs/llm.rb/LLM/Schema.html)
640
+ class can be subclassed to describe
641
+ the shape of a JSON object or objects that you expect
642
+ the model to respond with.
643
+
644
+ It can be useful for a wide range of use cases but the
645
+ most popular might be classification, data extraction,
646
+ and transferring structured data between different software
647
+ rather than blobs of text that a machine cannot easily parse
648
+ in a structured way.
273
649
 
274
- For HTTP MCP servers, use persistent connections when you make repeated
275
- tool calls:
650
+ #### Estimation
651
+
652
+ The following example asks the model to estimate the age
653
+ of a person in a photo. The model provides a structured response
654
+ that's represented by an instance of
655
+ [`LLM::Object`](https://r.uby.dev/api-docs/llm.rb/LLM/Object.html).
656
+
657
+ The object returned by
658
+ [`LLM::Response#content!`](https://r.uby.dev/api-docs/llm.rb/LLM/Contract/Completion.html#content!-instance_method)
659
+ has methods that can access the age, confidence, and comments
660
+ properties.
661
+ This approach can also work for extracting data or an analysis
662
+ from a PDF, and other file types.
276
663
 
277
664
  ```ruby
278
- mcp = LLM::MCP.http(
279
- url: "https://remote-mcp.example.com",
280
- transport: :net_http_persistent
281
- )
665
+ require "llm"
666
+ require "pp"
667
+
668
+ class Estimation < LLM::Schema
669
+ property :age, Integer, "The estimated age of the person"
670
+ property :confidence, Number, "Your confidence in the estimate"
671
+ property :applicable, Boolean, "True when the photo contains a person"
672
+ property :comments, String, "Any additional comments or input"
673
+ required %i[age confidence applicable comments]
674
+ end
282
675
 
283
- agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
284
- agent.talk "Use the remote tools to inspect the repository."
676
+ llm = LLM.openai(key: ENV["KEY"])
677
+ agent = LLM::Agent.new(llm, schema: Estimation)
678
+ res = agent.ask "Given this photo, provide an age estimate", with: "photo.jpg"
679
+
680
+ ##
681
+ # Coerces the model's response from a JSON string
682
+ # to an instance of LLM::Object.
683
+ estimate = res.content!
684
+
685
+ ##
686
+ # Let's print the estimate
687
+ if estimate.applicable
688
+ print "The person is approx ", estimate.age.to_s, " years old", "\n"
689
+ print "I have a confidence rating of ", estimate.confidence.to_s, "\n"
690
+ else
691
+ print "This photo is not applicable:", "\n"
692
+ print estimate.comments
693
+ end
285
694
  ```
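
The coercion step described above can be sketched without llm.rb at all. This is only an illustration of the idea (parse a JSON string, check the required properties, expose them as methods). The `raw` string, the `REQUIRED` constant, and the `Estimate` struct are hypothetical, and this is not how `LLM::Schema` or `LLM::Object` are implemented.

```ruby
require "json"

# Hypothetical raw model output; a real response comes from the provider.
raw = '{"age": 34, "confidence": 0.82, "applicable": true, "comments": "Clear frontal photo"}'

# The same property names the Estimation schema marks as required.
REQUIRED = %i[age confidence applicable comments].freeze

# Hypothetical value object with one reader method per property.
Estimate = Struct.new(*REQUIRED, keyword_init: true)

data = JSON.parse(raw, symbolize_names: true)

# Fail early when the model omitted a required property.
missing = REQUIRED - data.keys
raise ArgumentError, "missing properties: #{missing.join(", ")}" unless missing.empty?

estimate = Estimate.new(**data.slice(*REQUIRED))
puts "The person is approx #{estimate.age} years old"
```

Checking for missing keys before building the object keeps the failure close to the parse step rather than surfacing later as a `nil` value.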
286
695
 
287
- ## Persistence
696
+ [Back to top](#table-of-contents)
697
+
698
+ ## Cancellation
288
699
 
289
- #### Overview
700
+ #### Cancel a request
290
701
 
291
- Agents and contexts serialize to JSON and restore later.
292
- <br>
293
- The same serialized state powers the ActiveRecord and Sequel integrations.
702
+ A common need when communicating with a model is to
703
+ cancel a request mid-stream. This could be done
704
+ for a number of different reasons, most often because the
705
+ user made a mistake, or the model is making a mistake and
706
+ the user wants to cancel the action.
294
707
 
295
- #### Filesystem
708
+ The runtime has built-in support for cancellation. So for
709
+ example it is possible to cancel a request on the main
710
+ thread from a secondary thread. A number of things happen
711
+ when a request is cancelled. First the request is cancelled
712
+ at the transport level, and each transport handles it a little
713
+ differently. The net effect in every case is that the connection
714
+ is closed.
296
715
 
297
- Persist agent state to a JSON file on disk.
716
+ The runtime then notifies the rest of the system. For example,
717
+ if a tool was running, it will receive the `on_interrupt` / `on_cancel`
718
+ callback that lets the tool do any necessary cleanup, or execute its own
719
+ cancellation plan. Tools that were pending (not yet run but requested to
720
+ run) are cancelled through
721
+ [`LLM::Function#cancel`](https://r.uby.dev/api-docs/llm.rb/LLM/Function.html#cancel-instance_method).
298
722
 
299
723
  ```ruby
300
724
  require "llm"
301
725
 
302
- llm = LLM.openai(key: ENV["KEY"])
726
+ llm = LLM.deepseek(key: ENV["DEEPSEEK_SECRET"])
303
727
  agent = LLM::Agent.new(llm)
304
- agent.talk "Remember that my favorite language is Ruby"
728
+ queue = Queue.new
305
729
 
306
- # Save
307
- File.write("agent.json", agent.to_json)
730
+ Thread.new do
731
+ queue.push(nil)
732
+ sleep(2)
733
+ agent.cancel!
734
+ end
308
735
 
309
- # Restore later
310
- agent2 = LLM::Agent.new(llm, stream: $stdout)
311
- agent2.restore(path: "agent.json")
312
- agent2.talk "What is my favorite language?"
736
+ begin
737
+ queue.pop
738
+ agent.talk "write me a very long poem", stream: $stdout
739
+ rescue LLM::Interrupt
740
+ puts "request cancelled!"
741
+ end
313
742
  ```
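
The general pattern (one thread interrupts work running in another, and the interrupted code gets a chance to clean up) can be sketched in plain Ruby. Everything here is an assumption made for illustration: `Interrupted` stands in for `LLM::Interrupt`, `SlowTool` and its `on_cancel` hook are hypothetical, and the sketch cancels a worker thread from the main thread, the reverse of the example above. It does not show how llm.rb implements cancellation at the transport level.

```ruby
# Hypothetical stand-in for LLM::Interrupt; it is not part of llm.rb.
class Interrupted < StandardError; end

# Hypothetical tool with a cleanup hook, loosely modelled on on_cancel.
class SlowTool
  attr_reader :cleaned_up

  def initialize
    @cleaned_up = false
  end

  def call
    sleep # simulates long-running work; blocks until interrupted
  rescue Interrupted
    on_cancel
    raise
  end

  def on_cancel
    @cleaned_up = true
  end
end

tool = SlowTool.new

# The worker runs the tool and reports how it ended.
worker = Thread.new do
  tool.call
  :finished
rescue Interrupted
  :cancelled
end

# Wait until the worker is blocked inside the tool, then cancel it.
Thread.pass until worker.status == "sleep"
worker.raise(Interrupted, "cancelled by caller")

result = worker.value
puts "result=#{result} cleaned_up=#{tool.cleaned_up}"
```

Re-raising after the cleanup hook lets the caller still observe the interruption, which mirrors the `rescue LLM::Interrupt` in the example above.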
314
743
 
315
- #### ActiveRecord
744
+ [Back to top](#table-of-contents)
316
745
 
317
- [`acts_as_agent`](https://r.uby.dev/api-docs/llm.rb/LLM/ActiveRecord/ActsAsAgent.html)
318
- wraps an agent directly on an ActiveRecord model.
319
- <br>
320
- Serialized state lives in a single `data` column while your application
321
- controls provider, model, and tool configuration.
746
+ ## Tracer
322
747
 
323
- ```ruby
324
- require "llm"
325
- require "active_record"
326
- require "llm/active_record"
748
+ The runtime can be observed by subclasses of
749
+ [`LLM::Tracer`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer.html). <br>
750
+ The default tracers include a tracer that can write to standard
751
+ output
752
+ ([`LLM::Tracer::Logger`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer/Logger.html)),
753
+ and a generic OpenTelemetry tracer that can export spans via OTLP
754
+ ([`LLM::Tracer::Telemetry`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer/Telemetry.html)).
327
755
 
328
- class Ticket < ApplicationRecord
329
- acts_as_agent provider: :set_provider, context: :set_context
330
- model "gpt-5.4-mini"
331
- instructions "You are a concise support assistant."
332
- tools SearchDocs, Escalate
333
- concurrency :thread
756
+ llm.rb has numerous hooks implemented throughout the runtime that
757
+ [`LLM::Tracer`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer.html)
758
+ subclasses can hook into, and the tracer is
759
+ purposefully designed to be extensible. The scope of a trace
760
+ can vary from an individual agent (an instance of
761
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)),
762
+ to every request a provider makes (an indirect instance of
763
+ [`LLM::Provider`](https://r.uby.dev/api-docs/llm.rb/LLM/Provider.html)).
334
764
 
335
- private
765
+ #### Provider-wide tracer
336
766
 
337
- def set_provider
338
- LLM.openai(key: ENV["OPENAI_SECRET"])
339
- end
767
+ The following two examples demonstrate provider-wide tracers that
768
+ cover every request made for a single provider.
340
769
 
341
- def set_context
342
- {mode: :responses, store: false}
343
- end
344
- end
770
+ ```ruby
771
+ ##
772
+ # Provider-wide tracer
773
+ # Writes to $stdout
774
+ llm = LLM.deepseek(key: ENV["KEY"])
775
+ llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
345
776
 
346
- ticket = Ticket.create!
347
- puts ticket.talk("How do I rotate my API key?").content
777
+ ##
778
+ # Provider-wide tracer
779
+ # Writes to deepseek.log
780
+ llm = LLM.deepseek(key: ENV["KEY"])
781
+ llm.tracer = LLM::Tracer::Logger.new(llm, path: "deepseek.log")
348
782
  ```
349
783
 
350
- If you need manual control over tool execution, use
351
- [`acts_as_llm`](https://r.uby.dev/api-docs/llm.rb/LLM/ActiveRecord/ActsAsLLM.html)
352
- instead. It wraps
353
- [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) with the
354
- same persistence contract.
784
+ #### Agent-local tracer
355
785
 
356
- ## Embeddings
786
+ The next two examples demonstrate a tracer that is local
787
+ to an agent.
357
788
 
358
- #### Vector
789
+ ```ruby
790
+ ##
791
+ # Agent-local
792
+ # Writes to $stdout
793
+ llm = LLM.deepseek(key: ENV["KEY"])
794
+ agent = LLM::Agent.new(llm, tracer: LLM::Tracer::Logger.new(llm, io: $stdout))
795
+
796
+ ##
797
+ # Agent-local
798
+ # Writes to deepseek-agent.log
799
+ llm = LLM.deepseek(key: ENV["KEY"])
800
+ agent = LLM::Agent.new(llm, tracer: LLM::Tracer::Logger.new(llm, path: "deepseek-agent.log"))
801
+ ```
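
As a rough picture of what "hooks that subclasses can hook into" means, the sketch below uses only the Ruby standard library. The class names, the hook names (`on_request_start`, `on_request_finish`), and the `traced` helper are all hypothetical; the actual hook names and signatures are documented on `LLM::Tracer` and may look quite different.

```ruby
require "logger"
require "stringio"

# Hypothetical base class: every hook is a no-op by default.
class SketchTracer
  def on_request_start(name); end
  def on_request_finish(name, seconds); end
end

# Hypothetical subclass that writes each hook to a Logger.
class LoggingTracer < SketchTracer
  def initialize(io:)
    @logger = Logger.new(io)
  end

  def on_request_start(name)
    @logger.info("start #{name}")
  end

  def on_request_finish(name, seconds)
    @logger.info(format("finish %s (%.3fs)", name, seconds))
  end
end

# Wraps a block of work and fires the hooks around it.
def traced(tracer, name)
  tracer.on_request_start(name)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    tracer.on_request_finish(name, elapsed)
  end
end

io = StringIO.new
result = traced(LoggingTracer.new(io: io), "complete") { 21 * 2 }
puts io.string
```

Swapping `StringIO.new` for `$stdout` or an open file gives the two destinations used in the examples above, without changing the tracer itself.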
359
802
 
360
- Embeddings turn text into vectors. Call `.embed` on any provider that supports
361
- it. The returned vectors can be stored in a vector-aware database (PostgreSQL
362
- with pgvector, SQLite with `vec0`, or a dedicated vector database) and
363
- compared by semantic similarity.
803
+ [Back to top](#table-of-contents)
804
+
805
+ ## Images
806
+
807
+ The OpenAI, Google, xAI, DeepInfra, and DeepSeek providers have
808
+ built-in image generation capabilities. OpenAI, xAI, and DeepInfra
809
+ also support image edits. Google only supports image generation.
810
+ DeepSeek supports generation and edits too, but only through SVG
811
+ output rather than raster image models.
812
+
813
+ #### Generation
814
+
815
+ The [`LLM::Provider#images`](https://r.uby.dev/api-docs/llm.rb/LLM/Provider.html#images-instance_method)
816
+ method returns an Image
817
+ object that a subset of providers implement. At the
818
+ moment Google, xAI, OpenAI, DeepInfra, and DeepSeek have image
819
+ generation capabilities. DeepSeek is the odd one out: it generates
820
+ SVG documents rather than raster images.
364
821
 
365
822
  ```ruby
823
+ require "llm"
824
+
825
+ ##
826
+ # Store dogrocket.png
366
827
  llm = LLM.openai(key: ENV["KEY"])
367
- res = llm.embed("llm.rb manages providers, agents, tools, and state")
368
- puts res.model
369
- puts res.embeddings.first.size
828
+ res = llm.images.create(prompt: "a dog on a rocket to the moon")
829
+ IO.copy_stream res.images[0], "dogrocket.png"
370
830
  ```
371
831
 
372
- Embed multiple texts at once:
832
+ The API is the same across providers. <br>
833
+ For example &ndash; xAI:
373
834
 
374
835
  ```ruby
375
- chunks = [
376
- "LLM::Agent manages the tool loop automatically.",
377
- "LLM::Context exposes the low-level tool loop.",
378
- "MCP tools can be passed to agents as local tools."
379
- ]
380
-
381
- res = llm.embed(chunks)
382
- res.embeddings.each_with_index { |vec, i| puts "Vector #{i}: #{vec.size} dimensions" }
836
+ require "llm"
837
+
838
+ ##
839
+ # Store dogrocket.png
840
+ # Same API as OpenAI
841
+ llm = LLM.xai(key: ENV["KEY"])
842
+ res = llm.images.create(prompt: "a dog on a rocket to the moon")
843
+ IO.copy_stream res.images[0], "dogrocket.png"
383
844
  ```
384
845
 
385
- ## Multimodal
846
+ #### Edits
386
847
 
387
- #### Image
848
+ OpenAI, xAI, and DeepInfra have the same interface for image edits. <br>
849
+ DeepSeek also supports edits, but only for SVG files. <br>
850
+ Google does not have edit image support. <br>
388
851
 
389
- Prompts can be strings, arrays, or
390
- [`LLM::Prompt`](https://r.uby.dev/api-docs/llm.rb/LLM/Prompt.html) objects.
391
- <br>
392
- Arrays let you mix text with images and other content.
852
+ ```ruby
853
+ require "llm"
854
+
855
+ ##
856
+ # Edit self.jpg and add a mustache
857
+ # Save to mustache.png
858
+ llm = LLM.openai(key: ENV["KEY"])
859
+ res = llm.images.edit(prompt: "add a mustache", image: "self.jpg")
860
+ IO.copy_stream res.images[0], "mustache.png"
861
+ ```
862
+
863
+ #### DeepSeek
864
+
865
+ The DeepSeek provider does not provide an image generation model
866
+ but it is possible to ask a text-to-text model to produce
867
+ vector graphics (SVGs), and in that limited sense, it can become
868
+ a capable text-to-image model.
393
869
 
394
870
  ```ruby
395
- agent = LLM::Agent.new(llm)
396
- agent.talk [
397
- "Describe this image",
398
- agent.image_url("https://example.com/image.png")
399
- ]
871
+ require "llm"
872
+
873
+ ##
874
+ # Edit rocket.svg and change its color
875
+ # Save to rocket-edited.svg
876
+ llm = LLM.deepseek(key: ENV["KEY"])
877
+ res = llm.images.edit(prompt: "make the rocket red", image: "rocket.svg")
878
+ IO.copy_stream res.images[0], "rocket-edited.svg"
400
879
  ```
401
880
 
402
- Attach local files directly with
403
- [`LLM::Agent#ask`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#ask-instance_method):
881
+ An interesting property of the DeepSeek implementation is that
882
+ it can maintain a session that can perform multiple image generations
883
+ or edits rather than just one-shot generations.
884
+
885
+ It's possible because under the hood
886
+ [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
887
+ is attached to the
888
+ [`LLM::Response`](https://r.uby.dev/api-docs/llm.rb/LLM/Response.html)
889
+ object that is returned to the caller. So the response includes an
890
+ `agent` method, and it can be carried across multiple generations.
891
+ It is specific to this endpoint though. It works like this:
404
892
 
405
893
  ```ruby
406
- agent = LLM::Agent.new(llm)
407
- puts agent.ask("Summarize this document.", with: "README.md").content
894
+ require "llm"
895
+
896
+ llm = LLM.deepseek(key: ENV["DEEPSEEK_SECRET"])
897
+ agent = nil
898
+ loop do
899
+ print "> "
900
+ prompt = $stdin.gets
901
+ res = llm.images.create(prompt:, agent:)
902
+ agent = res.agent
903
+ IO.copy_stream res.images[0], "image.svg"
904
+ print "ok: saved image.svg", "\n"
905
+ end
408
906
  ```
409
907
 
410
- ## Tracing
908
+ [Back to top](#table-of-contents)
909
+
910
+ ## Audio
411
911
 
412
- #### Logger
912
+ The audio interface defined by llm.rb describes three methods,
913
+ although not every provider implements all of them. Generally
914
+ speaking the audio interface is for text-to-speech, and
915
+ speech-to-text models.
413
916
 
414
- Attach a tracer at the provider level to log requests and tool calls:
917
+ The following providers have audio support:
918
+
919
+ * OpenAI - full support
920
+ * Google - partial support
921
+ * DeepInfra - partial support
922
+
923
+ #### text-to-speech
924
+
925
+ The `create_speech` method generates an audio clip based
926
+ on the given input. This method returns a
927
+ [`LLM::URIData`](https://r.uby.dev/api-docs/llm.rb/LLM/URIData.html)
928
+ object. OpenAI and DeepInfra support this method.
415
929
 
416
930
  ```ruby
417
- llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
418
- agent = LLM::Agent.new(llm)
419
- agent.talk("Hello")
931
+ require "llm"
932
+
933
+ llm = LLM.openai(key: ENV["KEY"])
934
+ res = llm.audio.create_speech(input: "Hello world")
935
+ IO.copy_stream res.audio.decoded, "helloworld.mp3"
420
936
  ```
421
937
 
422
- ## Applications
938
+ #### speech-to-text
423
939
 
424
- #### SSH
940
+ The `create_transcription` method transcribes a given
941
+ audio clip as text. OpenAI, Google and DeepInfra support
942
+ this method.
943
+
944
+ ```ruby
945
+ require "llm"
946
+
947
+ llm = LLM.google(key: ENV["KEY"])
948
+ res = llm.audio.create_transcription(file: "helloworld.mp3")
949
+ res.text # => "Hello world"
950
+ ```
425
951
 
426
- The llm.rb runtime powers small terminal applications that you can try over
427
- SSH right now.
952
+ #### translation
953
+
954
+ The `create_translation` method translates a given audio
955
+ clip, then transcribes it as text. OpenAI and Google
956
+ support this method.
957
+
958
+ ```ruby
959
+ require "llm"
960
+
961
+ llm = LLM.google(key: ENV["KEY"])
962
+ res = llm.audio.create_translation(file: "bomdia.mp3")
963
+ res.text # => "Good day"
964
+ ```
428
965
 
429
- | Application | Try it | Runtime |
430
- |---|---|---|
431
- | [matz](https://r.uby.dev/matz/) | `ssh matz@r.uby.dev` | [mruby-llm](https://r.uby.dev/mruby-llm/) |
432
- | [robert](https://4.4bsd.dev/robert) | `ssh robert@4.4bsd.dev` | [mruby-llm](https://r.uby.dev/mruby-llm/) |
966
+ [Back to top](#table-of-contents)