llm.rb 8.1.0 → 9.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (67)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +120 -2
  3. data/README.md +161 -514
  4. data/lib/llm/active_record/acts_as_llm.rb +7 -8
  5. data/lib/llm/agent.rb +36 -16
  6. data/lib/llm/context.rb +30 -26
  7. data/lib/llm/contract/completion.rb +45 -0
  8. data/lib/llm/cost.rb +81 -4
  9. data/lib/llm/error.rb +1 -1
  10. data/lib/llm/function/array.rb +8 -5
  11. data/lib/llm/function/call_group.rb +39 -0
  12. data/lib/llm/function/fork/task.rb +6 -0
  13. data/lib/llm/function/ractor/task.rb +6 -0
  14. data/lib/llm/function/task.rb +10 -0
  15. data/lib/llm/function.rb +1 -0
  16. data/lib/llm/mcp/transport/http.rb +26 -46
  17. data/lib/llm/mcp/transport/stdio.rb +0 -8
  18. data/lib/llm/mcp.rb +6 -23
  19. data/lib/llm/provider.rb +23 -20
  20. data/lib/llm/providers/anthropic/error_handler.rb +6 -7
  21. data/lib/llm/providers/anthropic/files.rb +2 -2
  22. data/lib/llm/providers/anthropic/response_adapter/completion.rb +30 -0
  23. data/lib/llm/providers/anthropic.rb +1 -1
  24. data/lib/llm/providers/bedrock/error_handler.rb +8 -9
  25. data/lib/llm/providers/bedrock/models.rb +13 -13
  26. data/lib/llm/providers/bedrock/response_adapter/completion.rb +30 -0
  27. data/lib/llm/providers/bedrock.rb +1 -1
  28. data/lib/llm/providers/google/error_handler.rb +6 -7
  29. data/lib/llm/providers/google/files.rb +2 -4
  30. data/lib/llm/providers/google/images.rb +1 -1
  31. data/lib/llm/providers/google/models.rb +0 -2
  32. data/lib/llm/providers/google/response_adapter/completion.rb +30 -0
  33. data/lib/llm/providers/google.rb +1 -1
  34. data/lib/llm/providers/ollama/error_handler.rb +6 -7
  35. data/lib/llm/providers/ollama/models.rb +0 -2
  36. data/lib/llm/providers/ollama/response_adapter/completion.rb +30 -0
  37. data/lib/llm/providers/ollama.rb +1 -1
  38. data/lib/llm/providers/openai/audio.rb +3 -3
  39. data/lib/llm/providers/openai/error_handler.rb +6 -7
  40. data/lib/llm/providers/openai/files.rb +2 -2
  41. data/lib/llm/providers/openai/images.rb +3 -3
  42. data/lib/llm/providers/openai/models.rb +1 -1
  43. data/lib/llm/providers/openai/response_adapter/completion.rb +42 -0
  44. data/lib/llm/providers/openai/response_adapter/responds.rb +39 -0
  45. data/lib/llm/providers/openai/responses.rb +2 -2
  46. data/lib/llm/providers/openai/vector_stores.rb +1 -1
  47. data/lib/llm/providers/openai.rb +1 -1
  48. data/lib/llm/response.rb +10 -8
  49. data/lib/llm/sequel/plugin.rb +7 -8
  50. data/lib/llm/stream/queue.rb +15 -42
  51. data/lib/llm/stream.rb +4 -4
  52. data/lib/llm/transport/execution.rb +67 -0
  53. data/lib/llm/transport/http.rb +134 -0
  54. data/lib/llm/transport/persistent_http.rb +152 -0
  55. data/lib/llm/transport/response/http.rb +113 -0
  56. data/lib/llm/transport/response.rb +112 -0
  57. data/lib/llm/{provider/transport/http → transport}/stream_decoder.rb +8 -4
  58. data/lib/llm/transport.rb +139 -0
  59. data/lib/llm/usage.rb +14 -5
  60. data/lib/llm/version.rb +1 -1
  61. data/lib/llm.rb +2 -12
  62. data/llm.gemspec +2 -16
  63. metadata +11 -19
  64. data/lib/llm/provider/transport/http/execution.rb +0 -115
  65. data/lib/llm/provider/transport/http/interruptible.rb +0 -114
  66. data/lib/llm/provider/transport/http.rb +0 -145
  67. data/lib/llm/utils.rb +0 -19
data/README.md CHANGED
@@ -4,519 +4,240 @@
4
4
  <p align="center">
5
5
  <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
6
6
  <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
7
- <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-8.1.0-green.svg?" alt="Version"></a>
7
+ <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-9.0.0-green.svg?" alt="Version"></a>
8
8
  </p>
9
9
 
10
10
  ## About
11
11
 
12
- llm.rb is the most capable runtime for building AI systems in Ruby.
13
- <br>
14
-
15
- llm.rb is designed for Ruby, and although it works great in Rails, it is not tightly
16
- coupled to it. It runs on the standard library by default (zero dependencies),
17
- loads optional pieces only when needed, includes built-in ActiveRecord support through
18
- `acts_as_llm` and `acts_as_agent`, includes built-in Sequel support through
19
- `plugin :llm` and `plugin :agent`, and is designed for engineers who want control over
20
- long-lived, tool-capable, stateful AI workflows instead of just
21
- request/response helpers.
22
-
23
- It provides one runtime for providers, agents, tools, skills, MCP servers, streaming,
24
- schemas, files, and persisted state, so real systems can be built out of one coherent
25
- execution model instead of a pile of adapters.
26
-
27
- It supports providers including OpenAI, Anthropic, Google Gemini, DeepSeek, xAI,
28
- Z.ai, and AWS Bedrock.
29
-
30
- It provides concurrent tool execution with multiple strategies exposed through a single
31
- runtime: async-task, threads, fibers, ractors and processes (fork). The first three are
32
- good for IO-bound work and the last two are good for CPU-bound work. Ractor support is
33
- experimental and comes with limitations.
34
-
35
- Want to see some code? Jump to [the examples](#examples) section. <br>
36
- Want to see a self-hosted LLM environment built on llm.rb? Check out [relay.app](https://github.com/llmrb/relay.app). <br>
37
- Want to use llm.rb with mruby? Check out [mruby-llm](https://github.com/llmrb/mruby-llm)
38
-
39
-
40
- ## Architecture
41
-
42
- <p align="center">
43
- <img src="https://github.com/llmrb/llm.rb/raw/main/resources/architecture.png" alt="llm.rb architecture" width="790">
44
- </p>
45
-
46
- ## Core Concept
12
+ llm.rb is Ruby's most capable AI runtime.
47
13
 
48
- [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
49
- is the execution boundary in llm.rb.
14
+ It runs on Ruby's standard library by default, loads optional pieces
15
+ only when needed, and offers a single runtime for providers, agents,
16
+ tools, skills, MCP, streaming, files, and persisted state. As a bonus,
17
+ llm.rb is also [available for mruby](https://github.com/llmrb/mruby-llm).
50
18
 
51
- It holds:
52
- - message history
53
- - tool state
54
- - schemas
55
- - streaming configuration
56
- - usage and cost tracking
19
+ It supports OpenAI, OpenAI-compatible endpoints, Anthropic, Google
20
+ Gemini, DeepSeek, xAI, Z.ai, AWS Bedrock, Ollama, and llama.cpp. It
21
+ also includes built-in ActiveRecord and Sequel support, plus concurrent
22
+ tool execution through threads, tasks (via the async gem), fibers, ractors,
23
+ and fork (via the xchan.rb gem).
57
24
 
58
- Instead of switching abstractions for each feature, everything builds on the
59
- same context object.
25
+ ## Quick start
60
26
 
61
- ## Standout features
27
+ #### LLM::Context
62
28
 
63
- The following list is **not exhaustive**, but it covers a lot of ground.
29
+ The
30
+ [LLM::Context](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
31
+ object is at the heart of the runtime. Almost all other features build
32
+ on top of it. It is a low-level interface to a model, and requires tool
33
+ execution to be managed manually. The
34
+ [LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html)
35
+ class is almost the same as
36
+ [LLM::Context](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
37
+ but it manages tool execution for you; we'll cover agents next:
64
38
 
65
- #### Skills
39
+ ```ruby
40
+ require "llm"
66
41
 
67
- Skills are reusable, directory-backed capabilities loaded from `SKILL.md`.
68
- They run through the same runtime as tools, agents, and MCP. They do not
69
- require a second orchestration layer or a parallel abstraction. If you've
70
- used Claude or Codex, you know the general idea of skills, and llm.rb
71
- supports that same concept with the same execution model as the rest of the
72
- system.
42
+ llm = LLM.openai(key: ENV["KEY"])
43
+ ctx = LLM::Context.new(llm, stream: $stdout)
44
+ ctx.talk "Hello world"
45
+ ```
73
46
 
74
- In llm.rb, a skill has frontmatter and instructions. The frontmatter can
75
- define `name`, `description`, and `tools`. The `tools` entries are tool names,
76
- and each name must resolve to a subclass of
77
- [`LLM::Tool`](https://0x1eef.github.io/x/llm.rb/LLM/Tool.html) that is already
78
- loaded in the runtime.
47
+ #### LLM::Agent
79
48
 
80
- If you want Claude/Codex-like skills that can drive scripts or shell
81
- commands, you would typically pair the skill with a tool that can execute
82
- system commands.
83
-
84
- ```yaml
85
- ---
86
- name: release
87
- description: Prepare a release
88
- tools:
89
- - search_docs
90
- - git
91
- ---
92
- Review the release state, summarize what changed, and prepare the release.
93
- ```
49
+ The
50
+ [LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html)
51
+ object is implemented on top of
52
+ [LLM::Context](https://0x1eef.github.io/x/llm.rb/LLM/Context.html).
53
+ It provides the same interface, but manages tool execution for you. It
54
+ also has built-in features such as a loop guard that detects repeated
55
+ tool call patterns, and another guard that detects infinite tool call
56
+ loops. Both guards advise the model to change course rather than raise
57
+ an error:
94
58
 
95
59
  ```ruby
96
- class Agent < LLM::Agent
97
- model "gpt-5.4-mini"
98
- skills "./skills/release"
99
- tracer { LLM::Tracer::Logger.new(llm, path: "logs/release-agent.log") }
100
- end
60
+ require "llm"
101
61
 
102
62
  llm = LLM.openai(key: ENV["KEY"])
103
- Agent.new(llm, stream: $stdout).talk("Let's prepare the release!")
63
+ agent = LLM::Agent.new(llm, stream: $stdout)
64
+ agent.talk "Hello world"
104
65
  ```
105
66
 
106
- #### ORM
107
-
108
- Any ActiveRecord model or Sequel model can become an agent-capable model,
109
- including existing business and domain models, without forcing you into a
110
- separate agent table or a second persistence layer.
111
-
112
- `acts_as_agent` extends a model with agent capabilities: the same runtime
113
- surface as [`LLM::Agent`](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html),
114
- because it actually wraps an `LLM::Agent`, plus persistence through one text,
115
- JSON, or JSONB-backed `data` column on the same table. If your app also has
116
- provider or model columns, provide them to llm.rb through `set_provider` and
117
- `set_context`.
67
+ #### Tools
118
68
 
69
+ The
70
+ [LLM::Tool](https://0x1eef.github.io/x/llm.rb/LLM/Tool.html)
71
+ class can be subclassed to implement your own tools that can extend the
72
+ abilities of a model:
119
73
 
120
74
  ```ruby
121
- class Ticket < ApplicationRecord
122
- acts_as_agent provider: :set_provider, context: :set_context
123
- model "gpt-5.4-mini"
124
- instructions "You are a support assistant."
125
-
126
- private
127
-
128
- def set_provider
129
- LLM.openai(key: ENV["OPENAI_SECRET"])
130
- end
131
-
132
- def set_context
133
- { mode: :responses, store: false }
75
+ class ReadFile < LLM::Tool
76
+ name "read-file"
77
+ description "Read a file"
78
+ parameter :path, String, "The filename or path"
79
+ required %i[path]
80
+
81
+ def call(path:)
82
+ {contents: File.read(path)}
134
83
  end
135
84
  end
136
85
  ```
137
86
 
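For illustration, here is a minimal sketch (not part of the upstream README) that passes the `ReadFile` tool above to an [LLM::Context](https://0x1eef.github.io/x/llm.rb/LLM/Context.html) and resolves tool calls by hand, using the same `wait`/`functions?` loop the MCP and streaming examples below rely on; an [LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html) runs this loop for you:

```ruby
require "llm"

# Assumes the ReadFile tool defined above is already loaded.
llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout, tools: [ReadFile])
ctx.talk "Read README.md and summarize it."
# Keep answering pending tool calls until the model stops issuing them.
ctx.talk(ctx.wait(:call)) while ctx.functions?
```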
138
- #### Agentic Patterns
139
-
140
- llm.rb is especially strong when you want to build agentic systems in a Ruby
141
- way. Agents can be ordinary application models with state, associations,
142
- tools, skills, and persistence, which makes it much easier to build systems
143
- where users have their own specialized agents instead of treating agents as
144
- something outside the app.
145
-
146
- That pattern works so well in llm.rb because
147
- [`LLM::Agent`](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html),
148
- `acts_as_agent`, `plugin :agent`, skills, tools, and persisted runtime state
149
- all fit the same execution model. The runtime stays small enough that the
150
- main design work becomes application design, not orchestration glue.
151
-
152
- For a concrete example, see
153
- [How to build a platform of agents](https://0x1eef.github.io/posts/how-to-build-a-platform-of-agents).
154
-
155
- #### Persistence
87
+ #### MCP
156
88
 
157
- The same runtime can be serialized to disk, restored later, persisted in JSON
158
- or JSONB-backed ORM columns, resumed across process boundaries, or shared
159
- across long-lived workflows.
89
+ The
90
+ [LLM::MCP](https://0x1eef.github.io/x/llm.rb/LLM/MCP.html)
91
+ object lets llm.rb use tools provided by an MCP server. Those tools are
92
+ exposed through the same runtime as local tools, so you can pass them
93
+ to either
94
+ [LLM::Context](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
95
+ or
96
+ [LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html).
97
+ In this example, the MCP server runs over stdio and
98
+ [LLM::Context](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
99
+ uses the same tool loop as local tools:
160
100
 
161
101
  ```ruby
162
- ctx = LLM::Context.new(llm)
163
- ctx.talk("Remember that my favorite language is Ruby.")
164
- ctx.save(path: "context.json")
165
- ```
166
-
167
- #### Context Compaction
102
+ require "llm"
168
103
 
169
- Long-lived contexts can compact older history into a summary instead of
170
- growing forever. Compaction is built into [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
171
- through [`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html),
172
- and when a stream is present it emits `on_compaction` and
173
- `on_compaction_finish` through [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html).
174
- The compactor can also use a different model from the main context, which is
175
- useful when you want summarization to run on a cheaper or faster model.
176
- `token_threshold:` accepts either a fixed token count or a percentage string
177
- like `"90%"`, which resolves against the active model context window and
178
- triggers compaction once total token usage goes over that percentage.
104
+ llm = LLM.openai(key: ENV["KEY"])
105
+ mcp = LLM::MCP.stdio(argv: ["ruby", "server.rb"])
179
106
 
180
- ```ruby
181
- ctx = LLM::Context.new(
182
- llm,
183
- compactor: {
184
- token_threshold: "90%",
185
- retention_window: 8,
186
- model: "gpt-5.4-mini"
187
- }
188
- )
107
+ mcp.run do
108
+ ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
109
+ ctx.talk "Use the available tools to inspect the environment."
110
+ ctx.talk(ctx.wait(:call)) while ctx.functions?
111
+ end
189
112
  ```
190
113
 
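The same loop also works against an MCP server reached over HTTP. The sketch below reuses the `LLM::MCP.http` call from the HTTP example further down in this README; as there, `persistent: true` assumes the net-http-persistent gem is available:

```ruby
require "llm"
require "net/http/persistent"

llm = LLM.openai(key: ENV["KEY"])
# Connect to a remote MCP server over HTTP (mirrors the example later in this README).
mcp = LLM::MCP.http(
  url: "https://api.githubcopilot.com/mcp/",
  headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
  persistent: true
)

mcp.run do
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
  ctx.talk "Pull information about my GitHub account."
  ctx.talk(ctx.wait(:call)) while ctx.functions?
end
```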
191
- #### Guards
114
+ #### Skills
192
115
 
193
- Guards let llm.rb supervise agentic execution, not just run it.
194
- They live on [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html),
195
- can inspect the current runtime state, and can step in when a context is no
196
- longer making progress.
116
+ Skills are reusable instructions loaded from a directory that contains a `SKILL.md` file. They let
117
+ you package behavior and tool access together, and they plug into the same
118
+ runtime as tools, agents, and MCP. When a skill runs, llm.rb spawns a
119
+ subagent with the skill instructions, access to only the tools listed in the
120
+ skill, and recent conversation context:
197
121
 
198
- [`LLM::LoopGuard`](https://0x1eef.github.io/x/llm.rb/LLM/LoopGuard.html) is
199
- the built-in implementation. It detects repeated tool-call patterns and
200
- blocks pending tool execution with in-band guarded tool errors instead of
201
- letting the loop keep spinning. [`LLM::Agent`](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html)
202
- enables that guard by default through its wrapped context.
122
+ ```yaml
123
+ ---
124
+ name: release
125
+ description: Prepare a release
126
+ tools: ["search-docs", "git"]
127
+ ---
203
128
 
204
- ```ruby
205
- ctx = LLM::Context.new(llm)
206
- ctx.guard = MyGuard.new
207
- ```
129
+ ## Task
208
130
 
209
- #### Transformers
210
-
211
- Transformers let llm.rb rewrite outgoing prompts and params before a request
212
- is sent to the provider. They also live on
213
- [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html), but
214
- they solve a different problem from guards: instead of blocking execution,
215
- they can normalize or scrub what gets sent. When a stream is present, that
216
- lifecycle is also exposed through
217
- [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) with
218
- `on_transform` and `on_transform_finish`.
219
-
220
- That makes them a good fit for things like PII scrubbing, prompt
221
- normalization, or request-level param injection. A transformer just needs to
222
- implement `call(ctx, prompt, params)` and return `[prompt, params]`. That
223
- means a transformer can scrub plain text prompts, but it can also scrub
224
- [`LLM::Function::Return`](https://0x1eef.github.io/x/llm.rb/LLM/Function/Return.html)
225
- values. In other words, you can intercept a tool call's return value and
226
- modify it before sending it back to the LLM.
227
-
228
- That is also a useful UI hook. A stream can surface messages like
229
- `Anonymizing your data...` before a scrubber runs and `Data anonymized.`
230
- after it finishes.
131
+ Review the release state, summarize what changed, and prepare the release.
132
+ ```
231
133
 
232
134
  ```ruby
233
- class ScrubPII
234
- EMAIL = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i
135
+ require "llm"
235
136
 
236
- def call(ctx, prompt, params)
237
- [scrub(prompt), params]
238
- end
137
+ class ReleaseAgent < LLM::Agent
138
+ model "gpt-5.4-mini"
139
+ skills "./skills/release"
140
+ end
239
141
 
240
- private
142
+ llm = LLM.openai(key: ENV["KEY"])
143
+ ReleaseAgent.new(llm, stream: $stdout).talk("Prepare the next release.")
144
+ ```
241
145
 
242
- def scrub(prompt)
243
- case prompt
244
- when String then prompt.gsub(EMAIL, "[REDACTED_EMAIL]")
245
- when Array then prompt.map { scrub(_1) }
246
- when LLM::Function::Return then on_tool_return(prompt)
247
- else prompt
248
- end
249
- end
146
+ #### LLM::Stream
250
147
 
251
- def on_tool_return(result)
252
- value = case result.name
253
- when "lookup-customer" then scrub_value(result.value)
254
- else result.value
255
- end
256
- LLM::Function::Return.new(result.id, result.name, value)
257
- end
148
+ The
149
+ [LLM::Stream](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html)
150
+ object lets you observe output and runtime events as they happen. You
151
+ can subclass it to handle streamed content in your own application:
258
152
 
259
- def scrub_value(value)
260
- case value
261
- when String then value.gsub(EMAIL, "[REDACTED_EMAIL]")
262
- when Array then value.map { scrub_value(_1) }
263
- when Hash then value.transform_values { scrub_value(_1) }
264
- else value
265
- end
153
+ ```ruby
154
+ require "llm"
155
+
156
+ class Stream < LLM::Stream
157
+ def on_content(content)
158
+ $stdout << content
266
159
  end
267
160
  end
268
161
 
269
- ctx = LLM::Context.new(llm)
270
- ctx.transformer = ScrubPII.new
162
+ llm = LLM.openai(key: ENV["KEY"])
163
+ ctx = LLM::Context.new(llm, stream: Stream.new)
164
+ ctx.talk "Write a haiku about Ruby."
271
165
  ```
272
166
 
273
- When a stream is present, that transformer lifecycle is also exposed through
274
- `on_transform` and `on_transform_finish` on
275
- [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html).
167
+ #### LLM::Stream (advanced)
276
168
 
277
- #### LLM::Stream
278
-
279
- `LLM::Stream` is not just for printing tokens. It supports `on_content`,
280
- `on_reasoning_content`, `on_tool_call`, `on_tool_return`, `on_transform`,
281
- `on_transform_finish`, `on_compaction`, and `on_compaction_finish`, which
282
- means visible output, reasoning output, request rewriting, tool execution,
283
- and context compaction can all be driven through the same execution path.
169
+ The
170
+ [LLM::Stream](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html)
171
+ object can also resolve tool calls while output is still streaming. In
172
+ `on_tool_call`, you can spawn the tool, push the work onto the stream
173
+ queue, and later drain it with `wait`:
284
174
 
285
175
  ```ruby
176
+ require "llm"
177
+
286
178
  class Stream < LLM::Stream
287
- def on_tool_call(tool, error)
288
- queue << (error || ctx.spawn(tool, :thread))
179
+ def on_content(content)
180
+ $stdout << content
289
181
  end
290
182
 
291
- def on_tool_return(tool, result)
292
- puts(result.value)
183
+ def on_tool_call(tool, error)
184
+ return queue << error if error
185
+ queue << ctx.spawn(tool, :thread)
293
186
  end
294
187
  end
188
+
189
+ llm = LLM.openai(key: ENV["KEY"])
190
+ ctx = LLM::Context.new(llm, stream: Stream.new, tools: [ReadFile])
191
+ ctx.talk "Read README.md and summarize the quick start."
192
+ ctx.talk(ctx.wait) while ctx.functions?
295
193
  ```
296
194
 
297
195
  #### Concurrency
298
196
 
299
- Tool execution can run sequentially with `:call` or concurrently through
300
- `:thread`, `:task`, `:fiber`, `:fork`, and experimental `:ractor`, without
301
- rewriting your tool layer. Async tasks, threads, and fibers are the
302
- I/O-bound options. Fork and ractor are the CPU-bound options. `:fork`
303
- requires [`xchan.rb`](https://github.com/0x1eef/xchan.rb#readme) support,
304
- and `:ractor` is still experimental.
305
-
306
- `:fiber` uses `Fiber.schedule`, so it requires `Fiber.scheduler`.
197
+ llm.rb can run tool work concurrently. This is useful when a model calls
198
+ multiple tools and you want to resolve them in parallel instead of one
199
+ at a time. On
200
+ [LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html),
201
+ you can enable this with `concurrency`. Common options are `:call` for
202
+ sequential execution, `:thread` or `:task` for concurrent IO-bound work, and
203
+ `:ractor` or `:fork` for more isolated CPU-bound work:
307
204
 
308
205
  ```ruby
206
+ require "llm"
207
+
309
208
  class Agent < LLM::Agent
310
209
  model "gpt-5.4-mini"
311
- tools FetchWeather, FetchNews, FetchStock
210
+ tools ReadFile
312
211
  concurrency :thread
313
212
  end
314
- ```
315
-
316
- #### MCP
317
213
 
318
- Remote MCP tools and prompts are not bolted on as a separate integration
319
- stack. They adapt into the same tool and prompt path used by local tools,
320
- skills, contexts, and agents.
321
-
322
- Use `mcp.run do ... end` for scoped work where the client should start and
323
- stop around one block. Use `mcp.start` and `mcp.stop` directly when you need
324
- finer sequential control across several steps before shutting the client down.
325
-
326
- ```ruby
327
- mcp = LLM::MCP.http(
328
- url: "https://api.githubcopilot.com/mcp/",
329
- headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
330
- persistent: true
331
- )
332
- mcp.run do
333
- ctx = LLM::Context.new(llm, tools: mcp.tools)
334
- end
214
+ llm = LLM.openai(key: ENV["KEY"])
215
+ agent = Agent.new(llm, stream: $stdout)
216
+ agent.talk "Read README.md and CHANGELOG.md and compare them."
335
217
  ```
336
218
 
337
- #### Cancellation
219
+ #### Serialization
338
220
 
339
- Cancellation is one of the harder problems to get right, and while llm.rb
340
- makes it possible, it still requires careful engineering to use effectively.
341
- The point though is that it is possible to stop in-flight provider work cleanly
342
- through the same runtime, and the model used by llm.rb is directly inspired by
343
- Go's context package. In fact, llm.rb is heavily inspired by Go but with a Ruby
344
- twist.
221
+ The [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
222
+ object can be serialized to JSON, which makes it suitable for storing
223
+ in a file, a database column, or a Redis queue. The built-in
224
+ ActiveRecord and Sequel plugins build on this feature:
345
225
 
346
226
  ```ruby
347
227
  require "llm"
348
- require "io/console"
349
228
 
350
229
  llm = LLM.openai(key: ENV["KEY"])
351
- ctx = LLM::Context.new(llm, stream: $stdout)
352
- worker = Thread.new do
353
- ctx.talk("Write a very long essay about network protocols.")
354
- rescue LLM::Interrupt
355
- puts "Request was interrupted!"
356
- end
357
230
 
358
- STDIN.getch
359
- ctx.interrupt!
360
- worker.join
361
- ```
231
+ # Serialize a context
232
+ ctx1 = LLM::Context.new(llm)
233
+ ctx1.talk "Remember that my favorite language is Ruby"
234
+ string = ctx1.to_json
362
235
 
363
- ## Differentiators
364
-
365
- ### Execution Model
366
-
367
- - **A system layer, not just an API wrapper** <br>
368
- Put providers, tools, MCP servers, and application APIs behind one runtime
369
- model instead of stitching them together by hand.
370
- - **Contexts are central** <br>
371
- Keep history, tools, schema, usage, persistence, and execution state in one
372
- place instead of spreading them across your app.
373
- - **Contexts can be serialized** <br>
374
- Save and restore live state for jobs, databases, retries, or long-running
375
- workflows.
376
-
377
- ### Runtime Behavior
378
-
379
- - **Streaming and tool execution work together** <br>
380
- Start tool work while output is still streaming so you can hide latency
381
- instead of waiting for turns to finish.
382
- - **Agents auto-manage tool execution** <br>
383
- Use `LLM::Agent` when you want the same stateful runtime surface as
384
- `LLM::Context`, but with tool loops executed automatically according to a
385
- configured concurrency mode such as `:call`, `:thread`, `:task`, `:fiber`,
386
- `:fork`, or experimental `:ractor` support for class-based tools. MCP tools
387
- are not supported by the current `:ractor` mode, but mixed tool sets can
388
- still route MCP tools and local tools through different strategies at
389
- runtime. By default, the tool attempt budget is `25`. When an agent
390
- exhausts that budget, it sends advisory tool errors back through the model
391
- instead of raising out of the runtime. Set `tool_attempts: nil` to disable
392
- that advisory behavior.
393
- - **Tool calls have an explicit lifecycle** <br>
394
- A tool call can be executed, cancelled through
395
- [`LLM::Function#cancel`](https://0x1eef.github.io/x/llm.rb/LLM/Function.html#cancel-instance_method),
396
- or left unresolved for manual handling, but the normal runtime contract is
397
- still that a model-issued tool request is answered with a tool return.
398
- - **Requests can be interrupted cleanly** <br>
399
- Stop in-flight provider work through the same runtime instead of treating
400
- cancellation as a separate concern.
401
- [`LLM::Context#cancel!`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html#cancel-21-instance_method)
402
- is inspired by Go's context cancellation model.
403
- - **Concurrency is a first-class feature** <br>
404
- Use async tasks, threads, fibers, forks, or experimental ractors without
405
- rewriting your tool layer. Async tasks, threads, and fibers are the
406
- I/O-bound options. Fork and ractor are the CPU-bound options. `:fork`
407
- requires [`xchan.rb`](https://github.com/0x1eef/xchan.rb#readme) support.
408
- The current `:ractor` mode is for class-based tools, and MCP tools are
409
- not supported by ractor, but mixed workloads can branch on `tool.mcp?`
410
- and choose a supported strategy per tool. Class-based `:ractor` tools
411
- still emit normal tool tracer callbacks. `:fiber` uses `Fiber.schedule`,
412
- so it requires `Fiber.scheduler`.
413
- - **Advanced workloads are built in, not bolted on** <br>
414
- Streaming, concurrent tool execution, persistence, tracing, and MCP support
415
- all fit the same runtime model.
416
-
417
- ### Integration
418
-
419
- - **MCP is built in** <br>
420
- Connect to MCP servers over stdio or HTTP without bolting on a separate
421
- integration stack.
422
- - **ActiveRecord and Sequel persistence are built in** <br>
423
- llm.rb includes built-in ActiveRecord support through `acts_as_llm` and
424
- `acts_as_agent`, plus built-in Sequel support through `plugin :llm` and
425
- `plugin :agent`.
426
- Use `acts_as_llm` when you want to wrap `LLM::Context`, `acts_as_agent`
427
- when you want to wrap `LLM::Agent`, `plugin :llm` when you want a
428
- `LLM::Context` on a Sequel model, or `plugin :agent` when you want an
429
- `LLM::Agent`. These integrations support `provider:` and `context:` hooks,
430
- plus `format: :string` for text columns or `format: :jsonb` for native
431
- PostgreSQL JSON storage when ORM JSON typecasting support is enabled.
432
- - **ORM models can become persistent agents** <br>
433
- Turn an ActiveRecord or Sequel model into an agent-capable model with
434
- built-in persistence, stored on the same table, with `jsonb` support when
435
- your ORM and database support native JSON columns.
436
- - **Persistent HTTP pooling is shared process-wide** <br>
437
- When enabled, separate
438
- [`LLM::Provider`](https://0x1eef.github.io/x/llm.rb/LLM/Provider.html)
439
- instances with the same endpoint settings can share one persistent
440
- pool, and separate HTTP
441
- [`LLM::MCP`](https://0x1eef.github.io/x/llm.rb/LLM/MCP.html)
442
- instances can do the same, instead of each object creating its own
443
- isolated per-instance transport.
444
- - **OpenAI-compatible gateways are supported** <br>
445
- Target OpenAI-compatible services such as DeepInfra and OpenRouter, as well
446
- as proxies and self-hosted servers, with `host:` and `base_path:` when they
447
- preserve OpenAI request shapes but change the API root path.
448
- - **Provider support is broad** <br>
449
- Work with OpenAI, OpenAI-compatible endpoints, Anthropic, Google, DeepSeek,
450
- Z.ai, xAI, AWS Bedrock, llama.cpp, and Ollama through the same runtime.
451
- - **Tools are explicit** <br>
452
- Run local tools, provider-native tools, and MCP tools through the same path
453
- with fewer special cases.
454
- - **Skills become bounded runtime capabilities** <br>
455
- Point llm.rb at directories with a `SKILL.md`, resolve named tools through
456
- the registry, and adapt each skill into its own callable capability through
457
- the normal runtime. Unlike a generic skill-discovery tool, each skill runs
458
- with its own bounded tool subset and behaves like a task-scoped sub-agent.
459
- - **Providers are normalized, not flattened** <br>
460
- Share one API surface across providers without losing access to provider-
461
- specific capabilities where they matter.
462
- - **Responses keep a uniform shape** <br>
463
- Provider calls return
464
- [`LLM::Response`](https://0x1eef.github.io/x/llm.rb/LLM/Response.html)
465
- objects as a common base shape, then extend them with endpoint- or
466
- provider-specific behavior when needed.
467
- - **Low-level access is still there** <br>
468
- Normalized responses still keep the raw `Net::HTTPResponse` available when
469
- you need headers, status, or other HTTP details.
470
- - **Local model metadata is included** <br>
471
- Model capabilities, pricing, and limits are available locally without extra
472
- API calls.
473
-
474
- ### Design Philosophy
475
-
476
- - **Runs on the stdlib** <br>
477
- Start with Ruby's standard library and add extra dependencies only when you
478
- need them.
479
- - **It is highly pluggable** <br>
480
- Add tools, swap providers, change JSON backends, plug in tracing, or layer
481
- internal APIs and MCP servers into the same execution path.
482
- - **It scales from scripts to long-lived systems** <br>
483
- The same primitives work for one-off scripts, background jobs, and more
484
- demanding application workloads with streaming, persistence, and tracing.
485
- - **Thread boundaries are clear** <br>
486
- Providers are shareable. Contexts are stateful and should stay thread-local.
487
-
488
- ## Capabilities
489
-
490
- Execution:
491
- - **Chat & Contexts** — stateless and stateful interactions with persistence
492
- - **Context Serialization** — save and restore state across processes or time
493
- - **Streaming** — visible output, reasoning output, tool-call events
494
- - **Request Interruption** — stop in-flight provider work cleanly
495
- - **Concurrent Execution** — threads, async tasks, and fibers
496
-
497
- Runtime Building Blocks:
498
- - **Tool Calling** — class-based tools and closure-based functions
499
- - **Run Tools While Streaming** — overlap model output with tool latency
500
- - **Agents** — reusable assistants with tool auto-execution
501
- - **Skills** — directory-backed capabilities loaded from `SKILL.md`
502
- - **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
503
- - **Context Compaction** — summarize older history in long-lived contexts
504
-
505
- Data and Structure:
506
- - **Structured Outputs** — JSON Schema-based responses
507
- - **Responses API** — stateful response workflows where providers support them
508
- - **Multimodal Inputs** — text, images, audio, documents, URLs
509
- - **Audio** — speech generation, transcription, translation
510
- - **Images** — generation and editing
511
- - **Files API** — upload and reference files in prompts
512
- - **Embeddings** — vector generation for search and RAG
513
- - **Vector Stores** — retrieval workflows
514
-
515
- Operations:
516
- - **Cost Tracking** — local cost estimation without extra API calls
517
- - **Observability** — tracing, logging, telemetry
518
- - **Model Registry** — local metadata for capabilities, limits, pricing
519
- - **Persistent HTTP** — optional connection pooling for providers and MCP
236
+ # Restore a context (from JSON)
237
+ ctx2 = LLM::Context.new(llm, stream: $stdout)
238
+ ctx2.restore(string:)
239
+ ctx2.talk "What is my favorite language?"
240
+ ```
520
241
 
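As a small illustration (not from the upstream README, and using nothing beyond plain `File` I/O around the `to_json`/`restore(string:)` calls shown above), the serialized context can be written to disk in one process and restored in another:

```ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])

# Process one: build a context and write its JSON to disk
ctx = LLM::Context.new(llm)
ctx.talk "Remember that my favorite language is Ruby"
File.write("context.json", ctx.to_json)

# Process two: read the JSON back and resume the conversation
ctx = LLM::Context.new(llm, stream: $stdout)
ctx.restore(string: File.read("context.json"))
ctx.talk "What is my favorite language?"
```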
521
242
  ## Installation
522
243
 
@@ -566,80 +287,6 @@ ctx = LLM::Context.new(llm)
566
287
  ctx.talk ["Summarize this document.", ctx.local_file("README.md")]
567
288
  ```
568
289
 
569
- #### Agent
570
-
571
- This example uses [`LLM::Agent`](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html) directly and lets the agent manage tool execution. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
572
-
573
- ```ruby
574
- require "llm"
575
-
576
- class ShellAgent < LLM::Agent
577
- model "gpt-5.4-mini"
578
- instructions "You are a Linux system assistant."
579
- tools Shell
580
- concurrency :thread
581
- end
582
-
583
- llm = LLM.openai(key: ENV["KEY"])
584
- agent = ShellAgent.new(llm)
585
- puts agent.talk("What time is it on this system?").content
586
- ```
587
-
588
- #### Skills
589
-
590
- This example uses [`LLM::Agent`](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html) with directory-backed skills so `SKILL.md` capabilities run through the normal tool path. In llm.rb, a skill is exposed as a tool in the runtime. When that tool is called, it spawns a sub-agent with relevant context plus the instructions and tool subset declared in its own `SKILL.md`. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
591
-
592
- Each skill runs only with the tools declared in its own frontmatter.
593
-
594
- ```ruby
595
- require "llm"
596
-
597
- class Agent < LLM::Agent
598
- model "gpt-5.4-mini"
599
- instructions "You are a concise release assistant."
600
- skills "./skills/release", "./skills/review"
601
- tracer { LLM::Tracer::Logger.new(llm, path: "logs/release-agent.log") }
602
- end
603
-
604
- llm = LLM.openai(key: ENV["KEY"])
605
- puts Agent.new(llm).talk("Use the review skill.").content
606
- ```
607
-
608
- #### Streaming
609
-
610
- This example uses [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) directly so visible output and tool execution can happen together. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
611
-
612
- ```ruby
613
- require "llm"
614
-
615
- class Stream < LLM::Stream
616
- def on_content(content)
617
- $stdout << content
618
- end
619
-
620
- def on_tool_call(tool, error)
621
- return queue << error if error
622
- $stdout << "\nRunning tool #{tool.name}...\n"
623
- queue << ctx.spawn(tool, :thread)
624
- end
625
-
626
- def on_tool_return(tool, result)
627
- if result.error?
628
- $stdout << "Tool #{tool.name} failed\n"
629
- else
630
- $stdout << "Finished tool #{tool.name}\n"
631
- end
632
- end
633
- end
634
-
635
- llm = LLM.openai(key: ENV["KEY"])
636
- stream = Stream.new
637
- ctx = LLM::Context.new(llm, stream:, tools: [System])
638
-
639
- ctx.talk("Run `date` and `uname -a`.")
640
- ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
641
- ```
642
-
643
290
  #### Context Compaction
644
291
 
645
292
  This example uses [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html),
@@ -652,7 +299,7 @@ compactor can also use its own `model:` if you want summarization to run on a
652
299
  different model from the main context. `token_threshold:` accepts either a
653
300
  fixed token count or a percentage string like `"90%"`, which resolves
654
301
  against the active model context window and triggers compaction once total
655
- token usage goes over that percentage. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
302
+ token usage goes over that percentage. See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
656
303
 
657
304
  ```ruby
658
305
  require "llm"
@@ -744,7 +391,7 @@ class Context < Sequel::Model
744
391
  private
745
392
 
746
393
  def set_provider
747
- LLM.openai(key: ENV["OPENAI_SECRET"])
394
+ LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
748
395
  end
749
396
 
750
397
  def set_context
@@ -880,7 +527,7 @@ This example uses [`LLM::MCP`](https://0x1eef.github.io/x/llm.rb/LLM/MCP.html) o
880
527
  require "llm"
881
528
  require "net/http/persistent"
882
529
 
883
- llm = LLM.openai(key: ENV["KEY"])
530
+ llm = LLM.openai(key: ENV["KEY"], persistent: true)
884
531
  mcp = LLM::MCP.http(
885
532
  url: "https://api.githubcopilot.com/mcp/",
886
533
  headers: {"Authorization" => "Bearer #{ENV["GITHUB_PAT"]}"},
@@ -890,7 +537,7 @@ mcp = LLM::MCP.http(
890
537
  mcp.start
891
538
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
892
539
  ctx.talk("Pull information about my GitHub account.")
893
- ctx.talk(ctx.call(:functions)) while ctx.functions.any?
540
+ ctx.talk(ctx.wait(:call)) while ctx.functions?
894
541
  mcp.stop
895
542
  ```
896
543
 
@@ -905,7 +552,7 @@ mcp = LLM::MCP.http(
905
552
  mcp.run do
906
553
  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
907
554
  ctx.talk("Pull information about my GitHub account.")
908
- ctx.talk(ctx.call(:functions)) while ctx.functions.any?
555
+ ctx.talk(ctx.wait(:call)) while ctx.functions?
909
556
  end
910
557
  ```
911
558