RubyGems - llm.rb - Versions diffs - 11.3.1 → 12.0.0 - Mend

llm.rb 11.3.1 → 12.0.0

Files changed (57)

checksums.yaml +4 -4
data/CHANGELOG.md +242 -1
data/LICENSE +92 -17
data/README.md +204 -623
data/data/anthropic.json +433 -249
data/data/bedrock.json +2097 -1055
data/data/deepinfra.json +993 -0
data/data/deepseek.json +53 -28
data/data/google.json +389 -771
data/data/openai.json +1053 -771
data/data/xai.json +133 -292
data/data/zai.json +249 -141
data/lib/llm/active_record/acts_as_agent.rb +3 -41
data/lib/llm/active_record/acts_as_llm.rb +18 -0
data/lib/llm/active_record.rb +3 -3
data/lib/llm/context.rb +9 -5
data/lib/llm/contract/completion.rb +2 -2
data/lib/llm/provider.rb +2 -2
data/lib/llm/providers/deepinfra/audio.rb +66 -0
data/lib/llm/providers/deepinfra/images.rb +90 -0
data/lib/llm/providers/deepinfra/response_adapter.rb +36 -0
data/lib/llm/providers/deepinfra.rb +100 -0
data/lib/llm/providers/deepseek/images.rb +109 -0
data/lib/llm/providers/deepseek/request_adapter.rb +32 -0
data/lib/llm/providers/deepseek/response_adapter/image.rb +9 -0
data/lib/llm/providers/deepseek/response_adapter.rb +29 -0
data/lib/llm/providers/deepseek.rb +4 -2
data/lib/llm/providers/google/request_adapter.rb +22 -5
data/lib/llm/providers/google.rb +4 -4
data/lib/llm/providers/openai/audio.rb +6 -2
data/lib/llm/providers/openai/images.rb +9 -50
data/lib/llm/providers/openai/request_adapter/respond.rb +38 -4
data/lib/llm/providers/openai/response_adapter/audio.rb +5 -1
data/lib/llm/providers/openai/response_adapter/completion.rb +1 -1
data/lib/llm/providers/openai/response_adapter/image.rb +0 -4
data/lib/llm/providers/openai/responses.rb +1 -0
data/lib/llm/providers/openai/stream_parser.rb +5 -6
data/lib/llm/providers/openai.rb +2 -2
data/lib/llm/providers/xai/images.rb +49 -26
data/lib/llm/providers/xai.rb +2 -2
data/lib/llm/response.rb +10 -0
data/lib/llm/schema/leaf.rb +7 -1
data/lib/llm/schema/renderer.rb +121 -0
data/lib/llm/schema.rb +30 -0
data/lib/llm/sequel/agent.rb +2 -43
data/lib/llm/sequel/plugin.rb +25 -7
data/lib/llm/tracer/telemetry.rb +4 -6
data/lib/llm/tracer.rb +9 -21
data/lib/llm/transport/execution.rb +16 -1
data/lib/llm/transport/net_http_adapter.rb +1 -1
data/lib/llm/uridata.rb +16 -0
data/lib/llm/version.rb +1 -1
data/lib/llm.rb +9 -0
data/llm.gemspec +5 -18
data/resources/deepdive.md +798 -264
metadata +15 -18
data/lib/llm/tracer/langsmith.rb +0 -144

data/resources/deepdive.md CHANGED

@@ -12,421 +12,955 @@
 > A [r.uby.dev](https://r.uby.dev) project.
-## Intro
+## Welcome
+Welcome to the llm.rb deepdive. You are reading this document
+in the markdown format. An optimized version exists
+at [https://r.uby.dev/llm/deepdive](https://r.uby.dev/llm/deepdive)
+and it is both easier to read and navigate.
+This document is a continuation of the [homepage documentation](https://r.uby.dev/llm).
+It assumes you are familiar with the basics already, and focuses on
+features that didn't make it into the homepage documentation.
+## Table of contents
+- [Agents](#agents)
+  - [As a subclass](#as-a-subclass)
+  - [As an object](#as-an-object)
+- [Skills](#skills)
+  - [SKILL.md](#skillmd)
+  - [Run it](#run-it)
+- [MCP](#mcp)
+  - [stdio](#stdio)
+  - [http](#http)
+- [A2A](#a2a)
+  - [rest](#rest)
+  - [jsonrpc](#jsonrpc)
+- [Transports](#transports)
+  - [net/http](#nethttp)
+  - [net/http/persistent](#nethttppersistent)
+  - [curb](#curb)
+- [Stream](#stream)
+  - [IO-like object](#io-like-object)
+  - [LLM::Stream](#llmstream)
+- [ORM](#orm)
+  - [ActiveRecord](#activerecord)
+  - [Sequel](#sequel)
+- [Schema](#schema)
+  - [Estimation](#estimation)
+- [Cancellation](#cancellation)
+  - [Cancel a request](#cancel-a-request)
+- [Tracer](#tracer)
+  - [Provider-wide tracer](#provider-wide-tracer)
+  - [Agent-local tracer](#agent-local-tracer)
+- [Images](#images)
+  - [Generation](#generation)
+  - [Edits](#edits)
+- [Audio](#audio)
+  - [text-to-speech](#text-to-speech)
+  - [speech-to-text](#speech-to-text)
+  - [translation](#translation)
+## Agents
+An agent is represented by the
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
+class, and it is built on top of
+[`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) -
+the heart of the runtime. An agent manages the tool loop automatically,
+implements a tool loop guard for misbehaving models, and
+it can use five different concurrency strategies to execute
+tools.
+An agent can be a subclass of
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
+or a direct
+instance of it. The subclass approach is useful when you
+want reusable agents that can attach behavior (as methods)
+to their own class.
+#### As a subclass
+A subclass of
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
+can define its model, tools,
+and other attributes at the class-level. All of these
+attributes are optional, and they act as defaults that
+can be overridden at the instance level.
+The example uses the `:fork` concurrency model. It has
+two primary benefits: tools are run in parallel, and in
+a separate process with a separate memory address space.
+The example purposefully demonstrates how the attributes
+can be lazily defined with a block, or a Symbol that is
+evaluated as an instance method on the subclass. It is
+not strictly necessary, though, and the example would
+be simpler without it.
-This guide is a practical walkthrough of [llm.rb](https://github.com/r-uby-dev/llm.rb#readme) —
-Ruby's capable AI runtime.
+```ruby
+class Agent < LLM::Agent
+  model "deepseek-v4-pro"
+  tools { [DoResearch, FinalizeResearch, ActOnResearch] }
+  stream { $stdout }
+  tracer :set_tracer
+  concurrency :fork
+  def research!
+    talk "start the research"
+  end
-llm.rb runs on Ruby's standard library by default and loads optional pieces
-only when needed. You can start with a provider and a single context, then add
-agents, tools, streaming, persistence, embeddings, and protocol clients
-without changing the shape of your code.
+  private
+  def set_tracer
+    LLM::Tracer::Logger.new(llm, io: $stderr)
+  end
+end
+llm   = LLM.deepseek(key: ENV["KEY"])
+agent = Agent.new(llm).tap(&:research!)
+agent.talk "How did the research go?"
+```
-It supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google Gemini,
-DeepSeek, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp. ActiveRecord and
-Sequel support are built in, along with concurrent tool execution through
-threads, tasks, fibers, ractors, and fork.
+#### As an object
-## Install
+The more direct, and sometimes more convenient, approach is to
+create an instance of
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
+directly. The same attributes can be provided as the
+second argument given to
+[`LLM::Agent.new`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
+and the same lazy evaluation rules apply. This approach can be
+great for prototyping quickly, and you can always turn to a
+subclass later if that makes more sense.
-```bash
-gem install llm.rb
+```ruby
+llm = LLM.deepseek(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, stream: $stdout)
+agent.talk "Hello, fellow agent"
 ```
-## Quick Start
+[Back to top](#table-of-contents)
-#### Agent
+## Tools
-[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html) is the
-recommended starting point.
-<br>
-It manages tool execution for you and keeps conversation state across turns.
+A tool extends the capabilities of a model. <br>
+A tool is a subclass of
+[`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
+that has a name,
+a description, and an optional set of typed parameters.
+A tool also has a method associated with it, and when the
+model calls a tool it will do so through this method &ndash;
+alongside any parameters the tool might have defined.
+In other words, a tool provides a way for a model to
+call a method you have written, and it returns a value
+to the model that is considered the tool's response.
+The model then proceeds to process the tool's response,
+and then might generate its own response, or perhaps call
+another tool.
+#### LLM::Tool
+A tool can be defined by subclassing
+[`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
+with
+a name, description, and optional set of parameters. The
+tool name and description should be informative so the
+model can understand what the tool does and how it can
+serve a user's query.
 ```ruby
 require "llm"
+require "shellwords"
+class Shell < LLM::Tool
+  name "shell"
+  description "execute a shell command"
+  parameter :name, String, "the command's name"
+  parameter :arguments, Array[String], "One or more arguments"
+  required %i[name]
+  defaults arguments: []
+  def call(name:, arguments:)
+    out = `#{name.shellescape} #{arguments.map(&:shellescape).join(" ")}`
+    {ok: $?.success?, out:}
+  end
+end
-llm = LLM.openai(key: ENV["KEY"])
-agent = LLM::Agent.new(llm, stream: $stdout)
-agent.talk "Hello world"
+llm = LLM.deepseek(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, tools: [Shell], stream: $stdout)
+agent.talk "What files are in the current working directory?"
 ```
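The `Shell` tool above relies on `shellescape` to keep model-supplied input from being interpreted by the shell. Here is a minimal sketch of that escaping step, using only Ruby's `shellwords` standard library. The `build_command` helper is hypothetical and is not part of llm.rb.

```ruby
require "shellwords"

# Hypothetical helper (not part of llm.rb): joins a command name and
# its arguments into one string that is safe to hand to a shell.
def build_command(name, arguments = [])
  [name, *arguments].map(&:shellescape).join(" ")
end

# A space inside an argument is escaped rather than splitting it in two.
puts build_command("ls", ["-l", "my dir"])

# Shell metacharacters such as `;` lose their special meaning.
puts build_command("echo", ["hi; rm -rf /"])
```

Splitting the result with `Shellwords.split` should give back the original name and arguments, which is a quick way to check that nothing leaked through.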
-#### REPL
+#### Errors
+Exceptions that might be raised by a tool are automatically
+rescued and returned to the model as a structured error.
+Otherwise &ndash; the conversation's history could be left
+in an invalid state.
-A read-eval-print loop is the simplest way to interact with an agent.
-<br>
-The loop reads input, sends it to the model, and prints the response as it
-arrives:
+That's because a tool call must complete with a tool response.
+It is the only valid response a model expects, so even in the
+case of an error, something must be returned that communicates
+what happened.
 ```ruby
-require "llm"
+class Error < LLM::Tool
+  name "error"
+  description "demo how errors are handled"
+  ##
+  # Returns
+  # {error: true, kind: "RuntimeError", message: "boom"}
+  def call
+    raise "boom"
+  end
+end
+```
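The rescue-and-report behavior described above can be pictured with a small sketch. It is not llm.rb's implementation: the `call_tool` wrapper is hypothetical, and only the shape of the error hash is taken from the comment in the example.

```ruby
# Hypothetical wrapper (not llm.rb's implementation): runs a callable
# and converts any StandardError into a structured error hash, so the
# caller always has something to hand back to the model.
def call_tool(tool)
  tool.call
rescue => e
  {error: true, kind: e.class.name, message: e.message}
end

failing = -> { raise "boom" }
working = -> { {ok: true} }

p call_tool(failing)
p call_tool(working)
```

Either way the caller receives a plain Hash, so a tool call can always be paired with a tool response.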
-llm = LLM.openai(key: ENV["KEY"])
-agent = LLM::Agent.new(llm, stream: $stdout)
+## Skills
-loop do
-  print "> "
-  agent.talk(STDIN.gets || break)
-  puts
-end
+The skill concept is borrowed from tools like Claude and
+Codex, but llm.rb gives it a runtime of its own. A skill
+is a directory with a `SKILL.md` file. That file contains
+frontmatter where the skill's name, description, and tools
+can be declared.
+#### SKILL.md
+The `SKILL.md` file can look like this. When a skill runs,
+the runtime spawns a subagent with its own context window
+and message history. Some context is inherited from the
+parent agent, though.
+By default the subagent can only access the tools declared
+by the skill. The `inherit` directive lets it inherit the
+parent agent's tools instead, including A2A and MCP tools.
+```markdown
+---
+name: git-skill
+description: reads my git history and writes a summary
+tools: ['git-log', 'git-show', 'write-file']
+---
+## Task
+Collect a log of recent history.
+Analyze each commit.
+Write a summary to summary.txt
 ```
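To make the frontmatter layout concrete, here is a rough sketch of reading a `SKILL.md` document with Ruby's `yaml` standard library. The `parse_skill` helper and its regular expression are assumptions made for illustration. llm.rb's own parser is not shown in the source and may work differently.

```ruby
require "yaml"

# Hypothetical helper (not part of llm.rb): splits a SKILL.md document
# into its YAML frontmatter (a Hash) and its markdown body (a String).
def parse_skill(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  raise ArgumentError, "missing frontmatter" unless match
  [YAML.safe_load(match[1]), match[2]]
end

SKILL = <<~MD
  ---
  name: git-skill
  description: reads my git history and writes a summary
  tools: ['git-log', 'git-show', 'write-file']
  ---
  ## Task
  Collect a log of recent history.
MD

meta, body = parse_skill(SKILL)
puts meta["name"]
puts meta["tools"].inspect
puts body
```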
-#### Context
+#### Run it
+Given the skill above, llm.rb only needs the path to the
+directory that contains `SKILL.md`. Under the hood, a skill
+is represented as a tool the model can call. That means
+a skill can be called whenever it satisfies the user's
+request &ndash; in the same way that a regular tool can.
-[`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) is the
-lower-level runtime object.
-<br>
-It holds the same conversation state but leaves tool execution up to you.
-Use it when you want to decide when and how tools run.
+This feature also works with both the ActiveRecord and
+Sequel integrations.
 ```ruby
 require "llm"
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: $stdout)
-ctx.talk "Hello world"
+llm = LLM.deepseek(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, skills: [__dir__])
+agent.talk "run the git skill"
 ```
-With tools, the manual loop is explicit:
+[Back to top](#table-of-contents)
+## MCP
+#### stdio
+The stdio transport connects to an MCP server that is launched as a
+separate process, and both its standard input and standard output
+streams are used for communication. It is recommended but not
+required to execute commands for a stdio transport over a
+persistent session via the
+[`LLM::MCP#session`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html#session-instance_method)
+method &ndash; otherwise
+you could end up launching the same process multiple times.
 ```ruby
-ctx = LLM::Context.new(llm, tools: [ReadFile])
-ctx.talk("Read README.md and summarize it.")
-ctx.talk(ctx.wait(:call)) while ctx.functions?
+require "llm"
+llm   = LLM.deepseek(key: ENV["KEY"])
+mcp   = LLM::MCP.stdio(argv: ["npx", "-y", "@forgejo/mcp-server"])
+agent = LLM::Agent.new(llm)
+mcp.session do
+  agent.talk "What's happening on forgejo?", tools: mcp.tools
+end
 ```
-For ordinary application code, prefer
-[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
-It does the same thing but manages the loop for you.
+#### http
-## Tools
+The http transport connects to an MCP server over HTTP, and unlike
+the stdio transport, the MCP server does not have to be running
+locally. Popular services like GitHub provide their own MCP server
+over HTTP, and it is one of the most capable MCP servers I have
+used.
-#### Definition
+Unlike the stdio transport,
+[`LLM::MCP#session`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html#session-instance_method)
+carries little benefit for the http transport and it can be
+omitted. It is recommended to consider the `net_http_persistent`
+transport for MCP interactions that run over HTTP, otherwise
+you could end up tearing down and setting up the same connection
+multiple times.
-Tools extend what the model can do.
-<br>
-They are plain Ruby classes with typed parameters. Define one, attach it to
-an agent, and the model can call it when it makes sense.
+```ruby
+require "llm"
+llm   = LLM.deepseek(key: ENV["KEY"])
+mcp   = LLM::MCP.http(
+  url: "https://api.githubcopilot.com/mcp/",
+  headers: {
+    "Authorization" => "Bearer #{ENV.fetch('GITHUB_PAT')}"
+  },
+  transport: :net_http_persistent
+)
+agent = LLM::Agent.new(llm)
+agent.talk "What's happening on GitHub?", tools: mcp.tools
+```
+[Back to top](#table-of-contents)
+## A2A
+#### rest
+The rest transport communicates with other agents via A2A
+endpoints that speak both HTTP and JSON. The skills advertised
+by an agent become subclasses of
+[`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html)
+that can be used by both
+[`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html),
+and [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
+&ndash; similar to how MCP tools become subclasses of
+[`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html).
 ```ruby
-class ReadFile < LLM::Tool
-  name "read-file"
-  description "Read a file"
-  parameter :path, String, "The filename or path"
-  required %i[path]
-  def call(path:)
-    {contents: File.read(path)}
-  end
-end
+require "llm"
+llm   = LLM.deepseek(key: ENV["KEY"])
+a2a   = LLM::A2A.rest(url: "https://agent.example.com")
+agent = LLM::Agent.new(llm, tools: a2a.skills)
+agent.talk "What's happening, fellow agent?"
 ```
-Attach the tool to an agent:
+#### jsonrpc
+The jsonrpc transport communicates with other agents via HTTP
+and a protocol known as JSON-RPC. An agent might implement both
+the rest and jsonrpc transports, or just one of them. An agent's card, which
+is represented by an instance of
+[`LLM::A2A::Card`](https://r.uby.dev/api-docs/llm.rb/LLM/A2A/Card.html),
+can be
+used to discover available transports via the
+[`LLM::A2A::Card#interfaces`](https://r.uby.dev/api-docs/llm.rb/LLM/A2A/Card.html#interfaces-instance_method)
+method.
 ```ruby
-agent = LLM::Agent.new(llm, stream: $stdout, tools: [ReadFile])
-agent.talk "Read README.md and summarize the project."
+require "llm"
+llm   = LLM.deepseek(key: ENV["KEY"])
+a2a   = LLM::A2A.jsonrpc(url: "https://agent.example.com")
+agent = LLM::Agent.new(llm, tools: a2a.skills)
+agent.talk "What's happening, fellow agent?"
 ```
-[`LLM::Tool`](https://r.uby.dev/api-docs/llm.rb/LLM/Tool.html) handles the
-Ruby-side definition. llm.rb adapts the tool schema to the provider at request
-time.
+[Back to top](#table-of-contents)
+## Transports
-#### Concurrency
+The [`LLM::Provider`](https://r.uby.dev/api-docs/llm.rb/LLM/Provider.html),
+[`LLM::MCP`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html), and
+[`LLM::A2A`](https://r.uby.dev/api-docs/llm.rb/LLM/A2A.html) classes
+all accept a `transport` option that decides which library
+will be used for HTTP communication. There are three options out
+of the box:
+[`net-http`](https://github.com/ruby/net-http),
+[`net-http-persistent`](https://github.com/drbrain/net-http-persistent),
+and [`curb`](https://github.com/taf2/curb).
-When an agent calls several tools at once, you can run them in parallel.
-<br>
-This cuts down waiting time when tools do independent work like reading
-files or calling APIs.
+#### net/http
+The [`net/http`](https://github.com/ruby/net-http) transport is represented by the symbol `:net_http`. <br>
+It is the default transport.
 ```ruby
-class Agent < LLM::Agent
-  model "gpt-5.4-mini"
-  tools ReadFile
-  concurrency :thread
-end
+require "llm"
-llm = LLM.openai(key: ENV["KEY"])
-agent = Agent.new(llm, stream: $stdout)
-agent.talk "Read README.md and CHANGELOG.md and compare them."
+llm = LLM.deepseek(key: "...", transport: :net_http)
+mcp = LLM::MCP.http(url: "...", transport: :net_http)
+a2a = LLM::A2A.rest(url: "...", transport: :net_http)
 ```
-## Structured Output
-#### Schema
+#### net/http/persistent
-When you need JSON with a known shape, use
-[`LLM::Schema`](https://r.uby.dev/api-docs/llm.rb/LLM/Schema.html).
-<br>
-The model will return data that matches your schema instead of free text.
+The [`net/http/persistent`](https://github.com/drbrain/net-http-persistent) transport is represented by the symbol `:net_http_persistent`. <br>
+It maintains a connection pool so the cost of tearing down and
+setting up a connection repeatedly is kept low, and it is built
+on top of [`net/http`](https://github.com/ruby/net-http).
 ```ruby
-class Report < LLM::Schema
-  property :category, Enum["performance", "security", "outage"]
-  property :summary, String, "Short summary"
-  property :services, Array[String], "Impacted services"
-  required %i[category summary services]
-end
+require "llm"
-agent = LLM::Agent.new(llm, schema: Report)
-res = agent.talk("Classify: 'API latency spiked for the billing service.'")
-puts res.content!
+llm = LLM.deepseek(key: "...", transport: :net_http_persistent)
+mcp = LLM::MCP.http(url: "...", transport: :net_http_persistent)
+a2a = LLM::A2A.rest(url: "...", transport: :net_http_persistent)
 ```
-For one-off schemas, build the shape inline:
+#### curb
+The [`curb`](https://github.com/taf2/curb) transport is represented by the symbol `:curb`. <br>
+It provides bindings for libcurl &ndash; a widely used, highly portable
+and feature-rich HTTP library written in C.
 ```ruby
-schema = LLM::Schema.new.object(
-  category: LLM::Schema.new.string.enum("bug", "feature").required,
-  summary: LLM::Schema.new.string.required
-)
+require "llm"
-agent = LLM::Agent.new(llm, schema:)
-res = agent.talk("Classify: add a dark mode toggle.")
-puts res.content
+llm = LLM.deepseek(key: "...", transport: :curb)
+mcp = LLM::MCP.http(url: "...", transport: :curb)
+a2a = LLM::A2A.rest(url: "...", transport: :curb)
 ```
-## Streaming
+[Back to top](#table-of-contents)
+## Stream
+#### IO-like object
-#### Stream
+Any object that implements the `#<<` method can receive
+chunks from a stream. That includes objects like `$stdout`.
+This form of streaming is simple and limited. It is the
+equivalent of
+[`LLM::Stream#on_content`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html#on_content-instance_method),
+and doesn't include
+any of the other
+[`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
+hooks.
-Streaming works with any object that responds to `#<<`, like `$stdout`.
-<br>
-For more control, subclass
-[`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html) and
-override its callbacks:
+```ruby
+require "llm"
+llm = LLM.deepseek(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, stream: $stdout)
+agent.talk "hello world"
+```
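Since the only requirement stated above is a `#<<` method, a custom stream target can be very small. The `ChunkBuffer` class below is a hypothetical example, not part of llm.rb. It collects chunks in memory instead of printing them.

```ruby
# Hypothetical stream target (not part of llm.rb): collects chunks in
# memory instead of printing them.
class ChunkBuffer
  attr_reader :chunks

  def initialize
    @chunks = []
  end

  # Returns self so that calls can be chained, like IO#<<.
  def <<(chunk)
    @chunks << chunk.to_s
    self
  end

  def to_s
    @chunks.join
  end
end

buffer = ChunkBuffer.new
buffer << "hello" << " " << "world"
puts buffer.to_s
```

Going by the description above, an instance could be passed as the `stream:` option in place of `$stdout`, which can be handy when the output needs to be inspected after the request finishes.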
+#### LLM::Stream
+The [`LLM::Stream`](https://r.uby.dev/api-docs/llm.rb/LLM/Stream.html)
+class provides many hooks that a subclass
+can implement. They range from being notified when a tool call
+starts to when a tool call finishes, or when a conversation is
+due to be compacted because the context window exceeded a defined
+limit. All these callbacks support a responsive user interface
+where the user is always aware of what is happening behind the
+scenes.
 ```ruby
-class MyStream < LLM::Stream
+class Stream < LLM::Stream
   def on_content(content)
-    print content
+    puts content
   end
   def on_reasoning_content(content)
-    warn content
+    puts content
   end
-end
-llm = LLM.openai(key: ENV["KEY"])
-agent = LLM::Agent.new(llm, stream: MyStream.new)
-agent.talk "Explain Ruby fibers."
+  def on_tool_call(tool, error)
+    # this callback can be used to either log a tool call,
+    # or execute a tool call during a stream.
+  end
+  def on_tool_return(tool, result)
+  end
+  def on_compaction(ctx, compactor)
+    # this callback is called *before* compaction happens
+  end
+  def on_compaction_finish(ctx, compactor)
+    # this callback is called *after* compaction happens
+  end
+end
 ```
-## Skills
+[Back to top](#table-of-contents)
-#### Release
+## Serialization
-Skills package repeatable instructions and scoped tool access into
-`SKILL.md` directories.
-<br>
-They turn common workflows into named capabilities that agents can load
-on demand.
+The [`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
+class can be serialized to JSON and stored in a string or on disk.
+That is powerful because a context contains runtime state that can
+be restored later, in a different process or even on a different
+machine. And because an agent is implemented on top of
+[`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html)
+this feature works for [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
+too.
-```yaml
----
-name: release
-description: Prepare a release
-tools: ["search-docs", "git"]
----
+#### Save to disk
-## Task
+The runtime can serialize its state to a string, a text file, or
+a database column. The option that fits best depends on your application
+and environment. Web applications might be more interested in the [ORM](#orm)
+feature, which is built on top of the serialization feature.
-Review the release state, summarize what changed, and prepare the release.
+```ruby
+##
+# Create a provider
+llm = LLM.deepseek(key: ENV["KEY"])
+##
+# Save agent
+agent1 = LLM::Agent.new(llm)
+agent1.talk "remember my name is robert"
+agent1.save(path: "agent.json")
+##
+# Restore agent
+agent2 = LLM::Agent.new(llm, stream: $stdout)
+agent2.restore(path: "agent.json")
+agent2.talk "what's my name?"
 ```
+## ORM
+Both ActiveRecord and Sequel have first-class support in the
+llm.rb runtime. In both cases an ActiveRecord or Sequel model
+can be turned into a model that has the same capabilities as
+[`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html),
+or [`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html).
+The main difference is that the runtime persists directly into
+the database with no requirements beyond a single column on a
+single row. That means it is usually trivial to turn an existing
+model into an AI-aware model.
+#### ActiveRecord
+The ActiveRecord interface for
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
+is
+[`acts_as_agent`](https://r.uby.dev/api-docs/llm.rb/LLM/ActiveRecord/ActsAsAgent.html).
+It yields an instance of
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html),
+and that can be used
+to configure the agent (eg which model, instructions, skills,
+tools, etc).
+The `format` option is worth noting. It defaults to `:string`,
+but it can be changed to `:json` or `:jsonb` depending on the
+configuration and the type of the underlying column. The JSONB
+column type is recommended.
 ```ruby
-class ReleaseAgent < LLM::Agent
-  model "gpt-5.4-mini"
-  skills "./skills/release"
-end
+require "active_record"
+require "llm"
+require "llm/active_record"
-llm = LLM.openai(key: ENV["KEY"])
-ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")
-```
+class Agent < ApplicationRecord
+  acts_as_agent(format: :jsonb) do |agent|
+    agent.model "deepseek-v4-pro"
+    agent.instructions "solve the user's query"
+    agent.tools [Research, FinalizeResearch, ActOnResearch]
+  end
-When a skill runs, llm.rb starts a subagent with the skill's instructions,
-its allowed tools, and recent conversation context. Skills can also use
-`tools: inherit` to run with the parent agent's full toolset.
+  private
-## MCP
+  ##
+  # By convention, this method defines the provider
+  # for a model. If necessary, it can be renamed and
+  # configured via `provider: :your_method` instead.
+  def set_provider
+    LLM.deepseek(key: ENV["KEY"])
+  end
+  ##
+  # By convention, this method should return what is
+  # given as the second argument to `LLM::Context` or
+  # `LLM::Agent`.
+  #
+  # Often, there is no need to set it, so it can be left
+  # undefined or it can be reassigned in the same way as
+  # `set_provider`. For example: `context: :your_method`
+  def set_context
+    {}
+  end
+end
+agent = Agent.create!
+agent.talk "perform research"
+```
-#### Stdio
+#### Sequel
-[`LLM::MCP`](https://r.uby.dev/api-docs/llm.rb/LLM/MCP.html) lets llm.rb use
-tools provided by local stdio servers or remote HTTP servers.
-<br>
-This is how you connect your agent to GitHub, databases, or anything else
-that speaks the Model Context Protocol.
+The following is a Sequel equivalent to the ActiveRecord example,
+but to keep it interesting and informative, this example also
+configures a per-model tracer that logs to `$stdout`. A tracer
+can be configured the same way for ActiveRecord.
 ```ruby
+require "sequel"
 require "llm"
+require "llm/sequel/plugin"
+class Agent < Sequel::Model
+  plugin(:agent, format: :jsonb) do |agent|
+    agent.model "deepseek-v4-pro"
+    agent.instructions "solve the user's query"
+    agent.tools [Research, FinalizeResearch, ActOnResearch]
+    agent.tracer { LLM::Tracer::Logger.new(llm, io: $stdout) }
+  end
-llm = LLM.openai(key: ENV["KEY"])
-mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
+  private
-mcp.session do
-  agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
-  agent.talk "Use the available tools to inspect the environment."
+  def set_provider
+    LLM.deepseek(key: ENV["KEY"])
+  end
 end
+agent = Agent.create
+agent.talk "perform research"
 ```
-#### Remote
+[Back to top](#table-of-contents)
+## Schema
+The [`LLM::Schema`](https://r.uby.dev/api-docs/llm.rb/LLM/Schema.html)
+class can be subclassed to describe
+the shape of a JSON object or objects that you expect
+the model to respond with.
+It can be useful for a wide range of use cases but the
+most popular might be classification, data extraction,
+and transferring structured data between different software
+rather than blobs of text that a machine cannot easily parse
+in a structured way.
-For HTTP MCP servers, use persistent connections when you make repeated
-tool calls:
+#### Estimation
+The following example asks the model to estimate the age
+of a person in a photo. The model provides a structured response
+that's represented by an instance of
+[`LLM::Object`](https://r.uby.dev/api-docs/llm.rb/LLM/Object.html).
+The object returned by
+[`LLM::Response#content!`](https://r.uby.dev/api-docs/llm.rb/LLM/Contract/Completion.html#content!-instance_method)
+has methods that can access the age, confidence, and comments
+properties.
+This approach can also work for extracting data or an analysis
+from a PDF, and other file types.
 ```ruby
-mcp = LLM::MCP.http(
-  url: "https://remote-mcp.example.com",
-  transport: :net_http_persistent
-)
+require "llm"
+require "pp"
+class Estimation < LLM::Schema
+  property :age, Integer, "The estimated age of the person"
+  property :confidence, Number, "Your confidence in the estimate"
+  property :applicable, Boolean, "True when the photo contains a person"
+  property :comments, String, "Any additional comments or input"
+  required %i[age confidence applicable comments]
+end
-agent = LLM::Agent.new(llm, stream: $stdout, tools: mcp.tools)
-agent.talk "Use the remote tools to inspect the repository."
+llm = LLM.openai(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, schema: Estimation)
+res = agent.ask "Given this photo, provide an age estimate", with: "photo.jpg"
+##
+# Coerces the model's response from a JSON string
+# to an instance of LLM::Object.
+estimate = res.content!
+##
+# Let's print the estimate
+if estimate.applicable
+  print "The person is approx ", estimate.age.to_s, " years old", "\n"
+  print "I have a confidence rating of ", estimate.confidence.to_s, "\n"
+else
+  print "This photo is not applicable:", "\n"
+  print estimate.comments
+end
 ```
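The coercion step in the example turns a JSON string into an object with reader methods. Here is a rough stand-in for that idea using only the standard library. The `Estimate` struct, the `coerce` helper, and the sample JSON are all made up for illustration. llm.rb returns an `LLM::Object`, which is not reproduced here.

```ruby
require "json"

# Hypothetical stand-in for the object returned by `content!`.
Estimate = Struct.new(:age, :confidence, :applicable, :comments, keyword_init: true)

# Hypothetical helper: parses a JSON string and exposes its keys as
# reader methods on an Estimate.
def coerce(json)
  Estimate.new(**JSON.parse(json, symbolize_names: true))
end

# Sample payload invented for this sketch, not real model output.
raw = '{"age": 32, "confidence": 0.8, "applicable": true, "comments": "clear photo"}'
estimate = coerce(raw)
puts estimate.age
puts estimate.confidence
```

A struct with fixed members raises on unexpected keys, which is stricter than a free-form object. That trade-off may or may not suit a given schema.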
-## Persistence
+[Back to top](#table-of-contents)
+## Cancellation
-#### Overview
+#### Cancel a request
-Agents and contexts serialize to JSON and restore later.
-<br>
-The same serialized state powers the ActiveRecord and Sequel integrations.
+A common scenario when communicating with a model is to
+want to cancel the request mid-stream. This could be done
+for a number of different reasons, most often because the
+user made a mistake, or the model is making a mistake and
+the user wants to cancel the action.
-#### Filesystem
+The runtime has built-in support for cancellation. So for
+example it is possible to cancel a request on the main
+thread from a secondary thread. A number of things happen
+when a request is cancelled. First the request is cancelled
+at the transport level, and each transport handles it a little
+differently. The net effect in every case is that the connection
+is closed.
-Persist agent state to a JSON file on disk.
+The runtime then notifies the rest of the system. For example,
+if a tool was running, it will receive the `on_interrupt` / `on_cancel`
+callback that lets the tool do any necessary cleanup, or execute its own
+cancellation plan. Tools that were pending (requested but not yet
+run) are cancelled through
+[`LLM::Function#cancel`](https://r.uby.dev/api-docs/llm.rb/LLM/Function.html#cancel-instance_method).
 ```ruby
 require "llm"
-llm = LLM.openai(key: ENV["KEY"])
+llm = LLM.deepseek(key: ENV["DEEPSEEK_SECRET"])
 agent = LLM::Agent.new(llm)
-agent.talk "Remember that my favorite language is Ruby"
+queue = Queue.new
-# Save
-File.write("agent.json", agent.to_json)
+Thread.new do
+  queue.push(nil)
+  sleep(2)
+  agent.cancel!
+end
-# Restore later
-agent2 = LLM::Agent.new(llm, stream: $stdout)
-agent2.restore(path: "agent.json")
-agent2.talk "What is my favorite language?"
+begin
+  queue.pop
+  agent.talk "write me a very long poem", stream: $stdout
+rescue LLM::Interrupt
+  puts "request cancelled!"
+end
 ```
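The cross-thread pattern in the example can be imitated without a provider. The sketch below uses a cooperative cancellation flag, which is an assumption made for illustration. It is not the transport-level cancellation the runtime performs, and the `FakeRequest` and `Interrupted` names are hypothetical.

```ruby
# Hypothetical classes (not part of llm.rb) that imitate the shape of
# the example above with a cooperative cancellation flag.
class Interrupted < StandardError; end

class FakeRequest
  def initialize
    @cancelled = false
    @lock = Mutex.new
  end

  def cancel!
    @lock.synchronize { @cancelled = true }
  end

  def cancelled?
    @lock.synchronize { @cancelled }
  end

  # Yields chunks until the sequence ends or the request is cancelled.
  def stream(chunks)
    chunks.each do |chunk|
      raise Interrupted, "request cancelled" if cancelled?
      yield chunk
      sleep 0.01
    end
  end
end

request = FakeRequest.new
queue = Queue.new

canceller = Thread.new do
  queue.pop      # wait until the main thread is about to stream
  sleep 0.05
  request.cancel!
end

begin
  queue.push(nil)
  # An endless range stands in for a long-running response.
  request.stream(1..) { |n| print n, " " }
rescue Interrupted
  puts "request cancelled!"
ensure
  canceller.join
end
```

The flag is only checked between chunks, so a chunk that is already being processed finishes first. Closing the connection, as the runtime is described to do, interrupts sooner.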
-#### ActiveRecord
+[Back to top](#table-of-contents)
-[`acts_as_agent`](https://r.uby.dev/api-docs/llm.rb/LLM/ActiveRecord/ActsAsAgent.html)
-wraps an agent directly on an ActiveRecord model.
-<br>
-Serialized state lives in a single `data` column while your application
-controls provider, model, and tool configuration.
+## Tracer
-```ruby
-require "llm"
-require "active_record"
-require "llm/active_record"
+The runtime can be observed by subclasses of
+[`LLM::Tracer`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer.html). <br>
+The default tracers include a tracer that can write to standard
+output
+([`LLM::Tracer::Logger`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer/Logger.html)),
+and a generic OpenTelemetry tracer that can export spans via OTLP
+([`LLM::Tracer::Telemetry`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer/Telemetry.html)).
-class Ticket < ApplicationRecord
-  acts_as_agent provider: :set_provider, context: :set_context
-  model "gpt-5.4-mini"
-  instructions "You are a concise support assistant."
-  tools SearchDocs, Escalate
-  concurrency :thread
+llm.rb implements numerous hooks throughout the runtime that
+[`LLM::Tracer`](https://r.uby.dev/api-docs/llm.rb/LLM/Tracer.html)
+subclasses can hook into, and the tracer is
+purposefully designed to be extensible. The scope of a trace
+can range from an individual agent (an instance of
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html))
+to every request a provider makes (an indirect instance of
+[`LLM::Provider`](https://r.uby.dev/api-docs/llm.rb/LLM/Provider.html)).
-  private
+#### Provider-wide tracer
-  def set_provider
-    LLM.openai(key: ENV["OPENAI_SECRET"])
-  end
+The following two examples demonstrate provider-wide tracers that
+cover every request made for a single provider.
-  def set_context
-    {mode: :responses, store: false}
-  end
-end
+```ruby
+##
+# Provider-wide tracer
+# Writes to $stdout
+llm = LLM.deepseek(key: ENV["KEY"])
+llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
-ticket = Ticket.create!
-puts ticket.talk("How do I rotate my API key?").content
+##
+# Provider-wide tracer
+# Writes to deepseek.log
+llm = LLM.deepseek(key: ENV["KEY"])
+llm.tracer = LLM::Tracer::Logger.new(llm, path: "deepseek.log")
 ```
-If you need manual control over tool execution, use
-[`acts_as_llm`](https://r.uby.dev/api-docs/llm.rb/LLM/ActiveRecord/ActsAsLLM.html)
-instead. It wraps
-[`LLM::Context`](https://r.uby.dev/api-docs/llm.rb/LLM/Context.html) with the
-same persistence contract.
+#### Agent-local tracer
-## Embeddings
+The next two examples demonstrate a tracer that is local
+to an agent.
-#### Vector
+```ruby
+##
+# Agent-local
+# Writes to $stdout
+llm = LLM.deepseek(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, tracer: LLM::Tracer::Logger.new(llm, io: $stdout))
+##
+# Agent-local
+# Writes to deepseek-agent.log
+llm = LLM.deepseek(key: ENV["KEY"])
+agent = LLM::Agent.new(llm, tracer: LLM::Tracer::Logger.new(llm, path: "deepseek-agent.log"))
+```
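The hook-based design described in this section can be sketched in plain Ruby. The hook names below (`on_request_start`, `on_request_finish`) and the classes `SketchTracer`, `SketchIOTracer`, and `SketchClient` are assumptions made for illustration; the real hooks are documented on `LLM::Tracer` and may differ.

```ruby
require "stringio"

# Hypothetical base class: every hook is a no-op by default,
# so subclasses override only what they care about.
class SketchTracer
  def on_request_start(name); end
  def on_request_finish(name, elapsed); end
end

# Hypothetical subclass that writes one line per hook to an IO.
class SketchIOTracer < SketchTracer
  def initialize(io:)
    @io = io
  end

  def on_request_start(name)
    @io.puts "start #{name}"
  end

  def on_request_finish(name, elapsed)
    @io.puts format("finish %s (%.3fs)", name, elapsed)
  end
end

# Hypothetical client that calls the hooks around each request.
class SketchClient
  attr_accessor :tracer

  def initialize(tracer: SketchTracer.new)
    @tracer = tracer
  end

  def request(name)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    tracer.on_request_start(name)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    tracer.on_request_finish(name, elapsed)
  end
end
```

Swapping the tracer on one client object versus sharing one tracer across all clients is roughly the difference between the agent-local and provider-wide scopes shown above.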
-Embeddings turn text into vectors. Call `.embed` on any provider that supports
-it. The returned vectors can be stored in a vector-aware database (PostgreSQL
-with pgvector, SQLite with `vec0`, or a dedicated vector database) and
-compared by semantic similarity.
+[Back to top](#table-of-contents)
+## Images
+The OpenAI, Google, xAI, DeepInfra, and DeepSeek providers have
+built-in image generation capabilities. OpenAI, xAI, and DeepInfra
+also support image edits. Google only supports image generation.
+DeepSeek supports generation and edits too, but only through SVG
+output rather than raster image models.
+#### Generation
+The [`LLM::Provider#images`](https://r.uby.dev/api-docs/llm.rb/LLM/Provider.html#images-instance_method)
+method returns an Image
+object that a subset of providers implement. At the
+moment Google, xAI, OpenAI, DeepInfra, and DeepSeek have image
+generation capabilities. DeepSeek is the odd one out: it generates
+SVG documents rather than raster images.
 ```ruby
+require "llm"
+##
+# Store dogrocket.png
 llm = LLM.openai(key: ENV["KEY"])
-res = llm.embed("llm.rb manages providers, agents, tools, and state")
-puts res.model
-puts res.embeddings.first.size
+res = llm.images.create(prompt: "a dog on a rocket to the moon")
+IO.copy_stream res.images[0], "dogrocket.png"
 ```
-Embed multiple texts at once:
+The API is the same across providers. <br>
+For example &ndash; xAI:
 ```ruby
-chunks = [
-  "LLM::Agent manages the tool loop automatically.",
-  "LLM::Context exposes the low-level tool loop.",
-  "MCP tools can be passed to agents as local tools."
-]
-res = llm.embed(chunks)
-res.embeddings.each_with_index { |vec, i| puts "Vector #{i}: #{vec.size} dimensions" }
+require "llm"
+##
+# Store dogrocket.png
+# Same API as OpenAI
+llm = LLM.xai(key: ENV["KEY"])
+res = llm.images.create(prompt: "a dog on a rocket to the moon")
+IO.copy_stream res.images[0], "dogrocket.png"
 ```
-## Multimodal
+#### Edits
-#### Image
+OpenAI, xAI, and DeepInfra have the same interface for image edits. <br>
+DeepSeek also supports edits, but only for SVG files. <br>
+Google does not support image edits. <br>
-Prompts can be strings, arrays, or
-[`LLM::Prompt`](https://r.uby.dev/api-docs/llm.rb/LLM/Prompt.html) objects.
-<br>
-Arrays let you mix text with images and other content.
+```ruby
+require "llm"
+##
+# Edit self.jpg and add a mustache
+# Save to mustache.png
+llm = LLM.openai(key: ENV["KEY"])
+res = llm.images.edit(prompt: "add a mustache", image: "self.jpg")
+IO.copy_stream res.images[0], "mustache.png"
+```
+#### DeepSeek
+The DeepSeek provider does not provide an image generation model,
+but it is possible to ask a text-to-text model to produce
+vector graphics (SVGs), and in that limited sense it can act as
+a capable text-to-image model.
 ```ruby
-agent = LLM::Agent.new(llm)
-agent.talk [
-  "Describe this image",
-  agent.image_url("https://example.com/image.png")
-]
+require "llm"
+##
+# Edit rocket.svg and change its color
+# Save to rocket-edited.svg
+llm = LLM.deepseek(key: ENV["KEY"])
+res = llm.images.edit(prompt: "make the rocket red", image: "rocket.svg")
+IO.copy_stream res.images[0], "rocket-edited.svg"
 ```
-Attach local files directly with
-[`LLM::Agent#ask`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html#ask-instance_method):
+An interesting property of the DeepSeek implementation is that
+it can maintain a session that performs multiple image generations
+or edits rather than just one-shot generations.
+This is possible because, under the hood, an
+[`LLM::Agent`](https://r.uby.dev/api-docs/llm.rb/LLM/Agent.html)
+is attached to the
+[`LLM::Response`](https://r.uby.dev/api-docs/llm.rb/LLM/Response.html)
+object that is returned to the caller. The response therefore includes an
+`agent` method, and the agent can be carried across multiple generations.
+This behavior is specific to this endpoint. It works like this:
 ```ruby
-agent = LLM::Agent.new(llm)
-puts agent.ask("Summarize this document.", with: "README.md").content
+require "llm"
+llm = LLM.deepseek(key: ENV["DEEPSEEK_SECRET"])
+agent = nil
+loop do
+  print "> "
+  prompt = $stdin.gets&.chomp
+  break if prompt.nil? || prompt.empty?
+  res = llm.images.create(prompt:, agent:)
+  agent = res.agent
+  IO.copy_stream res.images[0], "image.svg"
+  print "ok: saved image.svg", "\n"
+end
 ```
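The session carry-over described above can be modelled in plain Ruby. This sketch does not call DeepSeek or llm.rb: `SketchAgent`, `SketchImages`, and `SketchResponse` are hypothetical stand-ins that only show why passing the previous response's agent back in preserves context.

```ruby
# Hypothetical stand-ins; not llm.rb classes.
SketchResponse = Struct.new(:svg, :agent)

class SketchAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Records the prompt and returns a fake SVG that embeds
  # the whole conversation so far.
  def talk(prompt)
    @history << prompt
    "<svg><!-- #{@history.join(' | ')} --></svg>"
  end
end

class SketchImages
  # Reuses the given agent when present, otherwise starts a session.
  def create(prompt:, agent: nil)
    agent ||= SketchAgent.new
    SketchResponse.new(agent.talk(prompt), agent)
  end
end

images = SketchImages.new
first = images.create(prompt: "a rocket")
second = images.create(prompt: "make it red", agent: first.agent)
puts second.svg
```

Because the second call receives the first call's agent, the second result is produced with both prompts in its history rather than starting from scratch.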
-## Tracing
+[Back to top](#table-of-contents)
+## Audio
-#### Logger
+The audio interface defined by llm.rb describes three methods,
+although not every provider implements all of them. Generally
+speaking, the audio interface is for text-to-speech and
+speech-to-text models.
-Attach a tracer at the provider level to log requests and tool calls:
+The following providers have audio support:
+* OpenAI - full support
+* Google - partial support
+* DeepInfra - partial support
+#### text-to-speech
+The `create_speech` method generates an audio clip based
+on the given input. This method returns a
+[`LLM::URIData`](https://r.uby.dev/api-docs/llm.rb/LLM/URIData.html)
+object. OpenAI and DeepInfra support this method.
 ```ruby
-llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
-agent = LLM::Agent.new(llm)
-agent.talk("Hello")
+require "llm"
+llm = LLM.openai(key: ENV["KEY"])
+res = llm.audio.create_speech(input: "Hello world")
+IO.copy_stream res.audio.decoded, "helloworld.mp3"
 ```
-## Applications
+#### speech-to-text
-#### SSH
+The `create_transcription` method transcribes a given
+audio clip as text. OpenAI, Google and DeepInfra support
+this method.
+```ruby
+require "llm"
+llm = LLM.google(key: ENV["KEY"])
+res = llm.audio.create_transcription(file: "helloworld.mp3")
+res.text # => "Hello world"
+```
-The llm.rb runtime powers small terminal applications that you can try over
-SSH right now.
+#### translation
+The `create_translation` method translates a given audio
+clip, then transcribes it as text. OpenAI and Google
+support this method.
+```ruby
+require "llm"
+llm = LLM.google(key: ENV["KEY"])
+res = llm.audio.create_translation(file: "bomdia.mp3")
+res.text # => "Good day"
+```
-| Application | Try it | Runtime |
-|---|---|---|
-| [matz](https://r.uby.dev/matz/) | `ssh matz@r.uby.dev` | [mruby-llm](https://r.uby.dev/mruby-llm/) |
-| [robert](https://4.4bsd.dev/robert) | `ssh robert@4.4bsd.dev` | [mruby-llm](https://r.uby.dev/mruby-llm/) |
+[Back to top](#table-of-contents)