llm.rb 4.12.0 → 4.13.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +38 -0
- data/README.md +124 -741
- data/lib/llm/context.rb +2 -2
- data/lib/llm/function.rb +1 -1
- data/lib/llm/mcp/error.rb +31 -1
- data/lib/llm/mcp/rpc.rb +8 -3
- data/lib/llm/mcp.rb +41 -0
- data/lib/llm/providers/openai/request_adapter/respond.rb +11 -5
- data/lib/llm/providers/openai/response_adapter/responds.rb +13 -1
- data/lib/llm/providers/openai/responses/stream_parser.rb +31 -0
- data/lib/llm/version.rb +1 -1
- data/llm.gemspec +16 -6
- metadata +17 -7
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7847fee7ea1e63553ad5323750fc2e5ac1b4a9082c2f4c5aba71f4587440ea75
+  data.tar.gz: e63bdae085b2f0f606cbdb4633a7eff93fd6e2428fcb85ff5fe94fc78851bf5d
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b1c8d8600b3214da5613d152677d13fde796b42e6a29cf8af035e4ad5f28b7cea0466a375b9b444a748e9e063d2e6ad6720b653609cb2b7038e8040cd2b44e39
+  data.tar.gz: c76882f9cd5416312e26f4e25493403df8f9f8c61ee14cba5096383b449bd7a4ce8b9d70834d12176648c3d9206f0f555a1eec4b22bdb6426d88c0c36c8ed592
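The checksums above can be verified locally before trusting a downloaded artifact. A minimal sketch using only Ruby's standard library (the file path and helper name are illustrative, not part of the gem):

```ruby
# Sketch: compare a file's SHA-256 against a published hex digest.
# Uses only Ruby's stdlib Digest module.
require "digest"

# Returns true when the file's SHA-256 matches the expected hex digest.
def checksum_ok?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256
end
```

The same approach works for the SHA-512 entries via `Digest::SHA512.file`.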
data/CHANGELOG.md
CHANGED
@@ -1,5 +1,43 @@
 # Changelog
 
+## Unreleased
+
+Changes since `v4.13.0`.
+
+## v4.13.0
+
+Changes since `v4.12.0`.
+
+This release expands MCP prompt support, improves reasoning support in the
+OpenAI Responses API, and refreshes the docs around llm.rb's runtime model,
+contexts, and advanced workflows.
+
+### Add
+
+- Add `LLM::MCP#prompts` and `LLM::MCP#find_prompt` for MCP prompt support.
+
+### Change
+
+- Rework the README around llm.rb as a runtime for AI systems.
+- Add a dedicated deep dive guide for providers, contexts, persistence,
+  tools, agents, MCP, tracing, multimodal prompts, and retrieval.
+
+### Fix
+
+All of these fixes apply to MCP:
+
+- fix(mcp): raise `LLM::MCP::MismatchError` on mismatched response ids.
+- fix(mcp): normalize prompt message content while preserving the original payload.
+
+All of these fixes apply to OpenAI's Responses API:
+
+- fix(openai): emit `on_reasoning_content` for streamed reasoning summaries.
+- fix(openai): skip `previous_response_id` on `store: false` follow-up calls.
+- fix(openai): fall back to an empty object schema for tools without params.
+- fix(openai): preserve original tool-call payloads on re-sent assistant tool messages.
+- fix(openai): emit `output_text` for assistant-authored response content.
+- fix(openai): return `nil` for `system_fingerprint` on normalized response objects.
+
 ## v4.12.0
 
 Changes since `v4.11.1`.
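The first MCP fix above is JSON-RPC bookkeeping: a response must echo the `id` of the request it answers, and a mismatch now raises `LLM::MCP::MismatchError`. A minimal stdlib sketch of that check (the class and method names here are illustrative simplifications, not llm.rb's internals):

```ruby
# Sketch: correlate a JSON-RPC 2.0 response with its request by id.
require "json"

# Stand-in for LLM::MCP::MismatchError; name here is illustrative.
class MismatchError < StandardError; end

# Parse a JSON-RPC response and raise when its id does not match the
# id of the request we sent.
def check_response!(request_id, response_json)
  response = JSON.parse(response_json)
  unless response["id"] == request_id
    raise MismatchError, "expected id=#{request_id}, got id=#{response["id"].inspect}"
  end
  response
end
```

Failing loudly on a mismatched id prevents a late or out-of-order response from being silently attributed to the wrong tool call.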
data/README.md
CHANGED
@@ -4,155 +4,148 @@
 <p align="center">
   <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
   <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.
+  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.13.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About
 
-llm.rb is a
-[… lines 13-108 of the old "About" section; content not captured in this diff view …]
+llm.rb is a runtime for building AI systems that integrate directly with your
+application. It is not just an API wrapper. It provides a unified execution
+model for providers, tools, MCP servers, streaming, schemas, files, and
+state.
+
+It is built for engineers who want control over how these systems run. llm.rb
+stays close to Ruby, runs on the standard library by default, loads optional
+pieces only when needed, and remains easy to extend. It also works well in
+Rails or ActiveRecord applications, where a small wrapper around context
+persistence is enough to save and restore long-lived conversation state across
+requests, jobs, or retries.
+
+Most LLM libraries stop at request/response APIs. Building real systems means
+stitching together streaming, tools, state, persistence, and external
+services by hand. llm.rb provides a single execution model for all of these,
+so they compose naturally instead of becoming separate subsystems.
+
+## Architecture
+
+```
+External MCP      Internal MCP      OpenAPI / REST
+     │                 │                  │
+     └──────── Tools / MCP Layer ────────┘
+                       │
+                llm.rb Contexts
+                       │
+                 LLM Providers
+           (OpenAI, Anthropic, etc.)
+                       │
+               Your Application
+```
+
+## Core Concept
+
+`LLM::Context` is the execution boundary in llm.rb.
+
+It holds:
+- message history
+- tool state
+- schemas
+- streaming configuration
+- usage and cost tracking
+
+Instead of switching abstractions for each feature, everything builds on the
+same context object.
+
+## Differentiators
+
+### Execution Model
+
+- **A system layer, not just an API wrapper**
+  Put providers, tools, MCP servers, and application APIs behind one runtime
+  model instead of stitching them together by hand.
+- **Contexts are central**
+  Keep history, tools, schema, usage, persistence, and execution state in one
+  place instead of spreading them across your app.
+- **Contexts can be serialized**
+  Save and restore live state for jobs, databases, retries, or long-running
+  workflows.
+
+### Runtime Behavior
+
+- **Streaming and tool execution work together**
+  Start tool work while output is still streaming so you can hide latency
+  instead of waiting for turns to finish.
+- **Concurrency is a first-class feature**
+  Use threads, fibers, or async tasks without rewriting your tool layer.
+- **Advanced workloads are built in, not bolted on**
+  Streaming, concurrent tool execution, persistence, tracing, and MCP support
+  all fit the same runtime model.
+
+### Integration
+
+- **MCP is built in**
+  Connect to MCP servers over stdio or HTTP without bolting on a separate
+  integration stack.
+- **Tools are explicit**
+  Run local tools, provider-native tools, and MCP tools through the same path
+  with fewer special cases.
+- **Providers are normalized, not flattened**
+  Share one API surface across providers without losing access to provider-
+  specific capabilities where they matter.
+- **Local model metadata is included**
+  Model capabilities, pricing, and limits are available locally without extra
+  API calls.
+
+### Design Philosophy
+
+- **Runs on the stdlib**
+  Start with Ruby's standard library and add extra dependencies only when you
+  need them.
+- **It is highly pluggable**
+  Add tools, swap providers, change JSON backends, plug in tracing, or layer
+  internal APIs and MCP servers into the same execution path.
+- **It scales from scripts to long-lived systems**
+  The same primitives work for one-off scripts, background jobs, and more
+  demanding application workloads with streaming, persistence, and tracing.
+- **Thread boundaries are clear**
+  Providers are shareable. Contexts are stateful and should stay thread-local.
 
 ## Capabilities
 
-llm.rb provides a complete set of primitives for building LLM-powered systems:
-
 - **Chat & Contexts** — stateless and stateful interactions with persistence
-- **
-- **
-- **Tool Calling** —
-- **Run Tools While Streaming** —
+- **Context Serialization** — save and restore state across processes or time
+- **Streaming** — visible output, reasoning output, tool-call events
+- **Tool Calling** — class-based tools and closure-based functions
+- **Run Tools While Streaming** — overlap model output with tool latency
 - **Concurrent Execution** — threads, async tasks, and fibers
-- **Agents** — reusable
-- **Structured Outputs** — JSON
-- **
+- **Agents** — reusable assistants with tool auto-execution
+- **Structured Outputs** — JSON Schema-based responses
+- **Responses API** — stateful response workflows where providers support them
+- **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
 - **Multimodal Inputs** — text, images, audio, documents, URLs
-- **Audio** —
+- **Audio** — speech generation, transcription, translation
 - **Images** — generation and editing
 - **Files API** — upload and reference files in prompts
 - **Embeddings** — vector generation for search and RAG
-- **Vector Stores** —
-- **Cost Tracking** —
+- **Vector Stores** — retrieval workflows
+- **Cost Tracking** — local cost estimation without extra API calls
 - **Observability** — tracing, logging, telemetry
 - **Model Registry** — local metadata for capabilities, limits, pricing
+- **Persistent HTTP** — optional connection pooling for providers and MCP
 
-##
-
-These examples show individual features, but llm.rb is designed to combine
-them into full systems where LLMs, tools, and external services operate
-together.
-
-#### Simple Streaming
+## Installation
 
-
-
-
+```bash
+gem install llm.rb
+```
 
-
-[`LLM::Stream`](lib/llm/stream.rb). See [Advanced Streaming](#advanced-streaming)
-for a structured callback-based example. Basic `#<<` streams only receive
-visible output chunks:
+## Example
 
 ```ruby
-#!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
 ctx = LLM::Context.new(llm, stream: $stdout)
+
 loop do
   print "> "
   ctx.talk(STDIN.gets || break)
@@ -160,623 +153,13 @@ loop do
 end
 ```
 
-
-
-The `LLM::Schema` system lets you define JSON schemas for structured outputs.
-Schemas can be defined as classes with `property` declarations or built
-programmatically using a fluent interface. When you pass a schema to a context,
-llm.rb adapts it into the provider's structured-output format when that
-provider supports one. The `content!` method then parses the assistant's JSON
-response into a Ruby object:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-require "pp"
-
-class Report < LLM::Schema
-  property :category, Enum["performance", "security", "outage"], "Report category", required: true
-  property :summary, String, "Short summary", required: true
-  property :impact, OneOf[String, Integer], "Primary impact, as text or a count", required: true
-  property :services, Array[String], "Impacted services", required: true
-  property :timestamp, String, "When it happened", optional: true
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, schema: Report)
-res = ctx.talk("Structure this report: 'Database latency spiked at 10:42 UTC, causing 5% request timeouts for 12 minutes.'")
-pp res.content!
-
-# {
-#   "category" => "performance",
-#   "summary" => "Database latency spiked, causing 5% request timeouts for 12 minutes.",
-#   "impact" => "5% request timeouts",
-#   "services" => ["Database"],
-#   "timestamp" => "2024-06-05T10:42:00Z"
-# }
-```
-
-#### Tool Calling
-
-Tools in llm.rb can be defined as classes inheriting from `LLM::Tool` or as
-closures using `LLM.function`. When the LLM requests a tool call, the context
-stores `Function` objects in `ctx.functions`. The `call()` method executes all
-pending functions and returns their results to the LLM. Tools describe
-structured parameters with JSON Schema and adapt those definitions to each
-provider's tool-calling format (OpenAI, Anthropic, Google, etc.):
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-class System < LLM::Tool
-  name "system"
-  description "Run a shell command"
-  param :command, String, "Command to execute", required: true
-
-  def call(command:)
-    {success: system(command)}
-  end
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: $stdout, tools: [System])
-ctx.talk("Run `date`.")
-ctx.talk(ctx.call(:functions)) while ctx.functions.any?
-```
-
-#### Concurrent Tools
-
-llm.rb provides explicit concurrency control for tool execution. The
-`wait(:thread)` method spawns each pending function in its own thread and waits
-for all to complete. You can also use `:fiber` for cooperative multitasking or
-`:task` for async/await patterns (requires the `async` gem). The context
-automatically collects all results and reports them back to the LLM in a
-single turn, maintaining conversation flow while parallelizing independent
-operations:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: $stdout, tools: [FetchWeather, FetchNews, FetchStock])
-
-# Execute multiple independent tools concurrently
-ctx.talk("Summarize the weather, headlines, and stock price.")
-ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
-```
-
-#### Advanced Streaming
-
-Use [`LLM::Stream`](lib/llm/stream.rb) when you want more than plain `#<<`
-output. It adds structured streaming callbacks for:
-
-- `on_content` for visible assistant output
-- `on_reasoning_content` for separate reasoning output
-- `on_tool_call` for streamed tool-call notifications
-- `on_tool_return` for completed tool execution
-
-Subclass [`LLM::Stream`](lib/llm/stream.rb) when you want callbacks like
-`on_reasoning_content`, `on_tool_call`, and `on_tool_return`, or helpers like
-`queue` and `wait`.
-
-Keep `on_content`, `on_reasoning_content`, and `on_tool_call` fast: they run
-inline with the streaming parser. `on_tool_return` is different: it runs later,
-when `wait` resolves queued streamed tool work.
-
-`on_tool_call` lets tools start before the model finishes its turn, for
-example with `tool.spawn(:thread)`, `tool.spawn(:fiber)`, or
-`tool.spawn(:task)`. That can overlap tool latency with streaming output.
-`on_tool_return` is the place to react when that queued work completes, for
-example by updating progress UIs, logging tool latency, or changing visible
-state from "Running tool ..." to "Finished tool ...".
-
-If a stream cannot resolve a tool, `on_tool_call` receives `error` as an
-`LLM::Function::Return`. That keeps the session alive and leaves control in
-the callback: it can send `error`, spawn the tool when `error == nil`, or
-handle the situation however it sees fit.
-
-In normal use this should be rare, since `on_tool_call` is usually called with
-a resolved tool and `error == nil`. To resolve a tool call, the tool must be
-found in `LLM::Function.registry`. That covers `LLM::Tool` subclasses,
-including MCP tools, but not `LLM.function` closures, which are excluded
-because they may be bound to local state:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-# Assume `System < LLM::Tool` is already defined.
-
-class Stream < LLM::Stream
-  def on_content(content)
-    $stdout << content
-  end
-
-  def on_reasoning_content(content)
-    $stderr << content
-  end
-
-  def on_tool_call(tool, error)
-    $stdout << "Running tool #{tool.name}\n"
-    queue << (error || tool.spawn(:thread))
-  end
-
-  def on_tool_return(tool, ret)
-    $stdout << (ret.error? ? "Tool #{tool.name} failed\n" : "Finished tool #{tool.name}\n")
-  end
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, stream: Stream.new, tools: [System])
-
-ctx.talk("Run `date` and `uname -a`.")
-while ctx.functions.any?
-  ctx.talk(ctx.wait(:thread))
-end
-```
-
-#### MCP
-
-MCP is a first-class integration mechanism in llm.rb.
-
-MCP allows llm.rb to treat external services, internal APIs, and system
-capabilities as tools in a unified interface. This makes it possible to
-connect multiple MCP sources simultaneously and expose your own APIs as tools.
-
-In practice, this supports workflows such as external SaaS integrations,
-multiple MCP sources in the same context, and OpenAPI -> MCP -> tools
-pipelines for internal services.
-
-llm.rb integrates with the Model Context Protocol (MCP) to dynamically discover
-and use tools from external servers. This example starts a filesystem MCP
-server over stdio and makes its tools available to a context, enabling the LLM
-to interact with the local file system through a standardized interface.
-Use `LLM::MCP.stdio` or `LLM::MCP.http` when you want to make the transport
-explicit. Like `LLM::Context`, an MCP client is stateful and should remain
-isolated to a single thread:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-mcp = LLM::MCP.stdio(argv: ["npx", "-y", "@modelcontextprotocol/server-filesystem", Dir.pwd])
-
-begin
-  mcp.start
-  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
-  ctx.talk("List the directories in this project.")
-  ctx.talk(ctx.call(:functions)) while ctx.functions.any?
-ensure
-  mcp.stop
-end
-```
-
-You can also connect to an MCP server over HTTP. This is useful when the
-server already runs remotely and exposes MCP through a URL instead of a local
-process. If you expect repeated tool calls, use `persistent` to reuse a
-process-wide HTTP connection pool. This requires the optional
-`net-http-persistent` gem:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-mcp = LLM::MCP.http(
-  url: "https://api.githubcopilot.com/mcp/",
-  headers: {"Authorization" => "Bearer #{ENV.fetch("GITHUB_PAT")}"}
-).persistent
-
-begin
-  mcp.start
-  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
-  ctx.talk("List the available GitHub MCP toolsets.")
-  ctx.talk(ctx.call(:functions)) while ctx.functions.any?
-ensure
-  mcp.stop
-end
-```
-
-## Providers
-
-llm.rb supports multiple LLM providers with a unified API.
-All providers share the same context, tool, and concurrency interfaces, making
-it easy to switch between cloud and local models:
-
-- **OpenAI** (`LLM.openai`)
-- **Anthropic** (`LLM.anthropic`)
-- **Google** (`LLM.google`)
-- **DeepSeek** (`LLM.deepseek`)
-- **xAI** (`LLM.xai`)
-- **zAI** (`LLM.zai`)
-- **Ollama** (`LLM.ollama`)
-- **Llama.cpp** (`LLM.llamacpp`)
-
-## Production
-
-#### Ready for production
-
-llm.rb is designed for production use from the ground up:
-
-- **Thread-safe providers** - Share `LLM::Provider` instances across your application
-- **Thread-local contexts** - Keep `LLM::Context` instances thread-local for state isolation
-- **Cost tracking** - Know your spend before the bill arrives
-- **Observability** - Built-in tracing with OpenTelemetry support
-- **Persistence** - Save and restore contexts across processes
-- **Performance** - Swap JSON adapters and enable HTTP connection pooling
-- **Error handling** - Structured errors, not unpredictable exceptions
-
-#### Tracing
-
-llm.rb includes built-in tracers for local logging, OpenTelemetry, and
-LangSmith. Assign a tracer to a provider and all context requests and tool
-calls made through that provider will be instrumented. Tracers are local to
-the current fiber, so the same provider can use different tracers in different
-concurrent tasks without interfering with each other.
-
-Use the logger tracer when you want structured logs through Ruby's standard
-library:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
-
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-```
-
-Use the telemetry tracer when you want OpenTelemetry spans. This requires the
-`opentelemetry-sdk` gem, and exporters such as OTLP can be added separately:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Telemetry.new(llm)
-
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-pp llm.tracer.spans
-```
-
-Use the LangSmith tracer when you want LangSmith-compatible metadata and trace
-grouping on top of the telemetry tracer:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Langsmith.new(
-  llm,
-  metadata: {env: "dev"},
-  tags: ["chatbot"]
-)
-
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-```
-
-#### Thread Safety
-
-llm.rb uses Ruby's `Monitor` class to ensure thread safety at the provider
-level, allowing you to share a single provider instance across multiple threads
-while maintaining state isolation through thread-local contexts. This design
-enables efficient resource sharing while preventing race conditions in
-concurrent applications:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-# Thread-safe providers - create once, use everywhere
-llm = LLM.openai(key: ENV["KEY"])
-
-# Each thread should have its own context for state isolation
-Thread.new do
-  ctx = LLM::Context.new(llm) # Thread-local context
-  ctx.talk("Hello from thread 1")
-end
-
-Thread.new do
-  ctx = LLM::Context.new(llm) # Thread-local context
-  ctx.talk("Hello from thread 2")
-end
-```
-
-#### Performance Tuning
-
-llm.rb's JSON adapter system lets you swap JSON libraries for better
-performance in high-throughput applications. The library supports stdlib JSON,
-Oj, and Yajl, with Oj typically offering the best performance. Additionally,
-you can enable HTTP connection pooling using the optional `net-http-persistent`
-gem to reduce connection overhead in production environments:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-# Swap JSON libraries for better performance
-LLM.json = :oj # Use Oj for faster JSON parsing
-
-# Enable HTTP connection pooling for high-throughput applications
-llm = LLM.openai(key: ENV["KEY"]).persistent # Uses net-http-persistent when available
-```
-
-#### Model Registry
-
-llm.rb includes a local model registry that provides metadata about model
-capabilities, pricing, and limits without requiring API calls. The registry is
-shipped with the gem and sourced from https://models.dev, giving you access to
-up-to-date information about context windows, token costs, and supported
-modalities for each provider:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-# Access model metadata, capabilities, and pricing
-registry = LLM.registry_for(:openai)
-model_info = registry.limit(model: "gpt-4.1")
-puts "Context window: #{model_info.context} tokens"
-puts "Cost: $#{model_info.cost.input}/1M input tokens"
-```
-
-## More Examples
-
-#### Responses API
-
-llm.rb also supports OpenAI's Responses API through `LLM::Context` with
-`mode: :responses`. The important switch is `store:`. With `store: false`, the
-Responses API stays stateless while still using the Responses endpoint, which
-is useful for models or features that are only available through the Responses
-API. With `store: true`, OpenAI can keep
-response state server-side and reduce how much conversation state needs to be
-sent on each turn:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm, mode: :responses, store: false)
-
-ctx.talk("Your task is to answer the user's questions", role: :developer)
-res = ctx.talk("What is the capital of France?")
-puts res.content
-```
-
-#### Context Persistence: Vanilla
-
-Contexts can be serialized and restored across process boundaries. A context
-can be serialized to JSON and stored on disk, in a database, in a job queue,
-or anywhere else your application needs to persist state:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-llm = LLM.openai(key: ENV["KEY"])
-ctx = LLM::Context.new(llm)
-ctx.talk("Hello")
-ctx.talk("Remember that my favorite language is Ruby")
-
-# Serialize to a string when you want to store the context yourself,
-# for example in a database row or job payload.
-payload = ctx.to_json
-
-restored = LLM::Context.new(llm)
-restored.restore(string: payload)
-res = restored.talk("What is my favorite language?")
-puts res.content
-
-# You can also persist the same state to a file:
-ctx.save(path: "context.json")
-restored = LLM::Context.new(llm)
-restored.restore(path: "context.json")
-```
-
-#### Context Persistence: ActiveRecord (Rails)
-
-In a Rails application, you can also wrap persisted context state in an
-ActiveRecord model. A minimal schema would include a `snapshot` column for the
-serialized context payload (`jsonb` is recommended) and a `provider` column
-for the provider name:
-
-```ruby
-create_table :contexts do |t|
-  t.jsonb :snapshot
-  t.string :provider, null: false
-  t.timestamps
-end
-```
-
-For example:
-
-```ruby
-class Context < ApplicationRecord
-  def talk(...)
-    ctx.talk(...).tap { flush }
-  end
-
-  def wait(...)
-    ctx.wait(...).tap { flush }
-  end
-
-  def messages
-    ctx.messages
-  end
-
-  def model
-    ctx.model
-  end
-
-  def flush
-    update_column(:snapshot, ctx.to_json)
-  end
-
-  private
-
-  def ctx
-    @ctx ||= begin
-      ctx = LLM::Context.new(llm)
-      ctx.restore(string: snapshot) if snapshot
-      ctx
-    end
-  end
-
-  def llm
-    LLM.method(provider).call(key: ENV.fetch(key))
-  end
-end
+## Resources
 
-
-
-
-
-
-
-#### Agents
-
-Agents in llm.rb are reusable, preconfigured assistants that automatically
-execute tool calls and maintain conversation state. Unlike contexts which
-require manual tool execution, agents automatically handle the tool call loop,
-making them ideal for autonomous workflows where you want the LLM to
-independently use available tools to accomplish tasks:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-
-class SystemAdmin < LLM::Agent
-  model "gpt-4.1"
-  instructions "You are a Linux system admin"
-  tools Shell
-  schema Result
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-agent = SystemAdmin.new(llm)
-res = agent.talk("Run 'date'")
-```
-
-#### Cost Tracking
-
-llm.rb provides built-in cost estimation that works without making additional
|
|
671
|
-
API calls. The cost tracking system uses the local model registry to calculate
|
|
672
|
-
estimated costs based on token usage, giving you visibility into spending
|
|
673
|
-
before bills arrive. This is particularly useful for monitoring usage in
|
|
674
|
-
production applications and setting budget alerts:
|
|
675
|
-
|
|
676
|
-
```ruby
|
|
677
|
-
#!/usr/bin/env ruby
|
|
678
|
-
require "llm"
|
|
679
|
-
|
|
680
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
681
|
-
ctx = LLM::Context.new(llm)
|
|
682
|
-
ctx.talk "Hello"
|
|
683
|
-
puts "Estimated cost so far: $#{ctx.cost}"
|
|
684
|
-
ctx.talk "Tell me a joke"
|
|
685
|
-
puts "Estimated cost so far: $#{ctx.cost}"
|
|
686
|
-
```
|
|
687
|
-
|
|
688
|
-
#### Multimodal Prompts
|
|
689
|
-
|
|
690
|
-
Contexts provide helpers for composing multimodal prompts from URLs, local
|
|
691
|
-
files, and provider-managed remote files. These tagged objects let providers
|
|
692
|
-
adapt the input into the format they expect:
|
|
693
|
-
|
|
694
|
-
```ruby
|
|
695
|
-
#!/usr/bin/env ruby
|
|
696
|
-
require "llm"
|
|
697
|
-
|
|
698
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
699
|
-
ctx = LLM::Context.new(llm)
|
|
700
|
-
|
|
701
|
-
res = ctx.talk ["Describe this image", ctx.image_url("https://example.com/cat.jpg")]
|
|
702
|
-
puts res.content
|
|
703
|
-
```
|
|
704
|
-
|
|
705
|
-
#### Audio Generation
|
|
706
|
-
|
|
707
|
-
llm.rb supports OpenAI's audio API for text-to-speech generation, allowing you
|
|
708
|
-
to create speech from text with configurable voices and output formats. The
|
|
709
|
-
audio API returns binary audio data that can be streamed directly to files or
|
|
710
|
-
other IO objects, enabling integration with multimedia applications:
|
|
711
|
-
|
|
712
|
-
```ruby
|
|
713
|
-
#!/usr/bin/env ruby
|
|
714
|
-
require "llm"
|
|
715
|
-
|
|
716
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
717
|
-
res = llm.audio.create_speech(input: "Hello world")
|
|
718
|
-
IO.copy_stream res.audio, File.join(Dir.home, "hello.mp3")
|
|
719
|
-
```
|
|
720
|
-
|
|
721
|
-
#### Image Generation
|
|
722
|
-
|
|
723
|
-
llm.rb provides access to OpenAI's DALL-E image generation API through a
|
|
724
|
-
unified interface. The API supports multiple response formats including
|
|
725
|
-
base64-encoded images and temporary URLs, with automatic handling of binary
|
|
726
|
-
data streaming for efficient file operations:
|
|
727
|
-
|
|
728
|
-
```ruby
|
|
729
|
-
#!/usr/bin/env ruby
|
|
730
|
-
require "llm"
|
|
731
|
-
|
|
732
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
733
|
-
res = llm.images.create(prompt: "a dog on a rocket to the moon")
|
|
734
|
-
IO.copy_stream res.images[0], File.join(Dir.home, "dogonrocket.png")
|
|
735
|
-
```
|
|
736
|
-
|
|
737
|
-
#### Embeddings
|
|
738
|
-
|
|
739
|
-
llm.rb's embedding API generates vector representations of text for semantic
|
|
740
|
-
search and retrieval-augmented generation (RAG) workflows. The API supports
|
|
741
|
-
batch processing of multiple inputs and returns normalized vectors suitable for
|
|
742
|
-
vector similarity operations, with consistent dimensionality across providers:
|
|
743
|
-
|
|
744
|
-
```ruby
|
|
745
|
-
#!/usr/bin/env ruby
|
|
746
|
-
require "llm"
|
|
747
|
-
|
|
748
|
-
llm = LLM.openai(key: ENV["KEY"])
|
|
749
|
-
res = llm.embed(["programming is fun", "ruby is a programming language", "sushi is art"])
|
|
750
|
-
puts res.class
|
|
751
|
-
puts res.embeddings.size
|
|
752
|
-
puts res.embeddings[0].size
|
|
753
|
-
|
|
754
|
-
# LLM::Response
|
|
755
|
-
# 3
|
|
756
|
-
# 1536
|
|
757
|
-
```
|
|
758
|
-
|
|
759
|
-
## Real-World Example: Relay
|
|
760
|
-
|
|
761
|
-
See how these pieces come together in a complete application architecture with
|
|
762
|
-
[Relay](https://github.com/llmrb/relay), a production-ready LLM application
|
|
763
|
-
built on llm.rb that demonstrates:
|
|
764
|
-
|
|
765
|
-
- Context management across requests
|
|
766
|
-
- Tool composition and execution
|
|
767
|
-
- Concurrent workflows
|
|
768
|
-
- Cost tracking and observability
|
|
769
|
-
- Production deployment patterns
|
|
770
|
-
|
|
771
|
-
Watch the screencast:
|
|
772
|
-
|
|
773
|
-
[](https://www.youtube.com/watch?v=x1K4wMeO_QA)
|
|
774
|
-
|
|
775
|
-
## Installation
|
|
776
|
-
|
|
777
|
-
```bash
|
|
778
|
-
gem install llm.rb
|
|
779
|
-
```
|
|
158
|
+
- [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) is the
|
|
159
|
+
examples guide.
|
|
160
|
+
- [_examples/relay](./_examples/relay) shows a real application built on top
|
|
161
|
+
of llm.rb.
|
|
162
|
+
- [doc site](https://0x1eef.github.io/x/llm.rb?rebuild=1) has the API docs.
|
|
780
163
|
|
|
781
164
|
## License
|
|
782
165
|
|
data/lib/llm/context.rb
CHANGED
@@ -103,9 +103,9 @@ module LLM
     # res = ctx.respond("What is the capital of France?")
     # puts res.output_text
     def respond(prompt, params = {})
-      res_id = @messages.find(&:assistant?)&.response&.response_id
-      params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
       params = @params.merge(params)
+      res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
+      params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
       res = @llm.responses.create(prompt, params)
      role = params[:role] || @llm.user_role
      @messages.concat LLM::Prompt === prompt ? prompt.to_a : [LLM::Message.new(role, prompt)]
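The `Context#respond` change above reorders the merge so that a caller's `store: false` can suppress `previous_response_id` chaining. A minimal standalone sketch of that selection rule, in plain Ruby with no llm.rb dependency; the `messages` entries are plain hashes standing in for `LLM::Message` objects:

```ruby
# Sketch of the previous_response_id selection rule from Context#respond.
# When store: false is passed, no prior response id is reused, so the
# full message array is sent to the API instead of a server-side chain.
def previous_response_id(messages, params)
  return nil if params[:store] == false
  assistant = messages.find { |m| m[:role] == :assistant }
  assistant && assistant[:response_id]
end

messages = [
  {role: :user, response_id: nil},
  {role: :assistant, response_id: "resp_123"}
]

p previous_response_id(messages, {store: false}) # => nil
p previous_response_id(messages, {})             # => "resp_123"
```

The bug the diff fixes follows from the old ordering: `store: false` lived in `@params`, so checking it before the `@params.merge(params)` step could not see it.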
data/lib/llm/function.rb
CHANGED
@@ -257,7 +257,7 @@ class LLM::Function
     when "LLM::OpenAI::Responses"
       {
         type: "function", name: @name, description: @description,
-        parameters: @params.to_h.merge(additionalProperties: false), strict:
+        parameters: (@params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false), strict: false
       }.compact
     else
       {
data/lib/llm/mcp/error.rb
CHANGED
@@ -1,7 +1,7 @@
 # frozen_string_literal: true
 
 class LLM::MCP
-
+  Error = Class.new(LLM::Error) do
     attr_reader :code, :data
 
     ##
@@ -27,5 +27,35 @@ class LLM::MCP
     end
   end
 
+  MismatchError = Class.new(Error) do
+    ##
+    # @return [Integer, String]
+    #  The request id the client was waiting for
+    attr_reader :expected_id
+
+    ##
+    # @return [Integer, String]
+    #  The response id received from the server
+    attr_reader :actual_id
+
+    ##
+    # @param [Integer, String] expected_id
+    #  The request id the client was waiting for
+    # @param [Integer, String] actual_id
+    #  The response id received from the server instead
+    def initialize(expected_id:, actual_id:)
+      @expected_id = expected_id
+      @actual_id = actual_id
+      super(message)
+    end
+
+    ##
+    # @return [String]
+    def message
+      "mismatched MCP response id #{actual_id.inspect} " \
+        "while waiting for #{expected_id.inspect}"
+    end
+  end
+
   TimeoutError = Class.new(Error)
 end
data/lib/llm/mcp/rpc.rb
CHANGED
@@ -53,11 +53,14 @@ class LLM::MCP
     poll(timeout:, ex: [IO::WaitReadable]) do
       loop do
         res = transport.read_nonblock
-
-        if res["error"]
+        if res["id"] == id && res["error"]
           raise LLM::MCP::Error.from(response: res)
-
+        elsif res["id"] == id
           break res["result"]
+        elsif res["method"]
+          next
+        elsif res.key?("id")
+          raise LLM::MCP::MismatchError.new(expected_id: id, actual_id: res["id"])
         end
       end
     end
@@ -101,6 +104,8 @@ class LLM::MCP
   #  The exceptions to retry when raised
   # @yield
   #  The block to run
+  # @raise [LLM::MCP::MismatchError]
+  #  When an unrelated response id is received while waiting
   # @raise [LLM::MCP::TimeoutError]
   #  When the block takes longer than the timeout
   # @return [Object]
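The reworked read loop above now dispatches on the JSON-RPC response id instead of treating any `error` frame as fatal. A standalone sketch of the same dispatch order, with a local `MismatchError` standing in for `LLM::MCP::MismatchError` and a plain `raise` standing in for `LLM::MCP::Error.from`: an error for the awaited id raises, a result for the awaited id wins, server notifications (frames carrying a `method` key) are skipped, and any other response id is a mismatch:

```ruby
# Local stand-in for LLM::MCP::MismatchError, purely for illustration.
MismatchError = Class.new(StandardError)

# Sketch of the per-frame dispatch rule from the diffed read loop.
def dispatch(res, id)
  if res["id"] == id && res["error"]
    raise res["error"]["message"]            # an error addressed to us
  elsif res["id"] == id
    [:result, res["result"]]                 # the response we waited for
  elsif res["method"]
    :skip                                    # a server notification; keep waiting
  elsif res.key?("id")
    raise MismatchError, "expected #{id}, got #{res["id"]}"
  end
end

p dispatch({"id" => 1, "result" => {"ok" => true}}, 1) # => [:result, {"ok"=>true}]
p dispatch({"method" => "notifications/progress"}, 1)  # => :skip
```

Skipping `method` frames matters because an MCP server may interleave notifications with the response the client is polling for.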
data/lib/llm/mcp.rb
CHANGED
@@ -121,6 +121,34 @@ class LLM::MCP
     res["tools"].map { LLM::Tool.mcp(self, _1) }
   end
 
+  ##
+  # Returns the prompts provided by the MCP process.
+  # @return [Array<LLM::Object>]
+  def prompts
+    res = call(transport, "prompts/list")
+    LLM::Object.from(res["prompts"])
+  end
+
+  ##
+  # Returns a prompt by name.
+  # @param [String] name The prompt name
+  # @param [Hash<String, String>, nil] arguments The prompt arguments
+  # @return [LLM::Object]
+  def find_prompt(name:, arguments: nil)
+    params = {name:}
+    params[:arguments] = arguments if arguments
+    res = call(transport, "prompts/get", params)
+    res["messages"] = [*res["messages"]].map do |message|
+      LLM::Message.new(
+        message["role"],
+        adapt_content(message["content"]),
+        {original_content: message["content"]}
+      )
+    end
+    LLM::Object.from(res)
+  end
+  alias_method :get_prompt, :find_prompt
+
   ##
   # Calls a tool by name with the given arguments
   # @param [String] name The name of the tool to call
@@ -135,6 +163,19 @@ class LLM::MCP
 
   attr_reader :llm, :command, :transport, :timeout
 
+  def adapt_content(content)
+    case content
+    when String
+      content
+    when Hash
+      content["type"] == "text" ? content["text"].to_s : LLM::Object.from(content)
+    when Array
+      content.map { adapt_content(_1) }
+    else
+      content
+    end
+  end
+
   def adapt_tool_result(result)
     if result["structuredContent"]
       result["structuredContent"]
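The new `adapt_content` helper flattens MCP prompt content blocks into values llm.rb can prompt with. A standalone re-creation of the normalization rule; note that the real method wraps non-text hashes in `LLM::Object.from`, while this sketch passes them through unchanged for illustration:

```ruby
# Sketch of MCP prompt-content normalization: {"type" => "text"} blocks
# collapse to plain strings, arrays are mapped recursively, and anything
# else passes through (the real code wraps non-text hashes in LLM::Object).
def adapt_content(content)
  case content
  when String then content
  when Hash
    content["type"] == "text" ? content["text"].to_s : content
  when Array
    content.map { |c| adapt_content(c) }
  else content
  end
end

p adapt_content({"type" => "text", "text" => "hello"}) # => "hello"
p adapt_content([{"type" => "text", "text" => "a"},
                 {"type" => "image", "data" => "..."}])
```

This is what lets `find_prompt` hand back `LLM::Message` objects whose content reads like ordinary text while the raw block survives in `original_content`.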
data/lib/llm/providers/openai/request_adapter/respond.rb
CHANGED
@@ -15,6 +15,8 @@ module LLM::OpenAI::RequestAdapter
     catch(:abort) do
       if Hash === message
         {role: message[:role], content: adapt_content(message[:content])}
+      elsif message.tool_call?
+        message.extra[:original_tool_calls]
       else
         adapt_message
       end
@@ -23,12 +25,12 @@ module LLM::OpenAI::RequestAdapter
 
   private
 
-  def adapt_content(content)
+  def adapt_content(content, role: message.role)
     case content
     when String
-      [{type:
+      [{type: text_content_type(role), text: content.to_s}]
     when LLM::Response then adapt_remote_file(content)
-    when LLM::Message then adapt_content(content.content)
+    when LLM::Message then adapt_content(content.content, role: content.role)
     when LLM::Object
       case content.kind
       when :image_url then [{type: :image_url, image_url: {url: content.value.to_s}}]
@@ -46,7 +48,7 @@ module LLM::OpenAI::RequestAdapter
     when Array
       adapt_array
     else
-      {role: message.role, content: adapt_content(content)}
+      {role: message.role, content: adapt_content(content, role: message.role)}
     end
   end
 
@@ -56,7 +58,7 @@ module LLM::OpenAI::RequestAdapter
     elsif returns.any?
       returns.map { {type: "function_call_output", call_id: _1.id, output: LLM.json.dump(_1.value)} }
     else
-      {role: message.role, content: content.flat_map { adapt_content(_1) }}
+      {role: message.role, content: content.flat_map { adapt_content(_1, role: message.role) }}
     end
   end
 
@@ -83,5 +85,9 @@ module LLM::OpenAI::RequestAdapter
   def message = @message
   def content = message.content
   def returns = content.grep(LLM::Function::Return)
+
+  def text_content_type(role)
+    role.to_s == "assistant" ? :output_text : :input_text
+  end
 end
 end
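The new `text_content_type` helper exists because the Responses API tags assistant-authored text as `output_text` while every other role sends `input_text`. A standalone sketch of the role-based selection and the content shape it produces:

```ruby
# Sketch of text_content_type and the string branch of adapt_content.
# Assistant text replayed back to the Responses API must be tagged
# :output_text; user/developer text is tagged :input_text.
def text_content_type(role)
  role.to_s == "assistant" ? :output_text : :input_text
end

def adapt_text(content, role)
  [{type: text_content_type(role), text: content.to_s}]
end

p adapt_text("Hi!", :assistant) # tagged :output_text
p adapt_text("Hello", :user)    # tagged :input_text
```

Threading `role:` through the recursive `adapt_content` calls (as the diff does for `LLM::Message` and array content) is what keeps nested assistant messages correctly tagged.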
data/lib/llm/providers/openai/response_adapter/responds.rb
CHANGED
@@ -60,6 +60,13 @@ module LLM::OpenAI::ResponseAdapter
     body.model
   end
 
+  ##
+  # OpenAI's Responses API does not expose a system fingerprint.
+  # @return [nil]
+  def system_fingerprint
+    nil
+  end
+
   ##
   # Returns the aggregated text content from the response outputs.
   # @return [String]
@@ -88,10 +95,15 @@ module LLM::OpenAI::ResponseAdapter
   private
 
   def adapt_message
-    message = LLM::Message.new(
+    message = LLM::Message.new(
+      "assistant",
+      +"",
+      {response: self, tool_calls: [], original_tool_calls: [], reasoning_content: +""}
+    )
     output.each do |choice|
       if choice.type == "function_call"
         message.extra[:tool_calls] << adapt_tool(choice)
+        message.extra[:original_tool_calls] << choice
       elsif choice.type == "reasoning"
         (choice.summary || []).each do |summary|
           next unless summary["type"] == "summary_text"
data/lib/llm/providers/openai/responses/stream_parser.rb
CHANGED
@@ -43,11 +43,19 @@ class LLM::OpenAI
         @body[k] = v
       end
       @body["output"] ||= []
+    when "response.in_progress", "response.completed"
+      response = chunk["response"] || {}
+      response.each do |k, v|
+        next if k == "output" && @body["output"].is_a?(Array) && @body["output"].any?
+        @body[k] = v
+      end
+      @body["output"] ||= response["output"] || []
     when "response.output_item.added"
       output_index = chunk["output_index"]
       item = chunk["item"]
       @body["output"][output_index] = item
       @body["output"][output_index]["content"] ||= []
+      @body["output"][output_index]["summary"] ||= [] if item["type"] == "reasoning"
     when "response.content_part.added"
       output_index = chunk["output_index"]
       content_index = chunk["content_index"]
@@ -55,6 +63,25 @@ class LLM::OpenAI
       @body["output"][output_index] ||= {"content" => []}
       @body["output"][output_index]["content"] ||= []
       @body["output"][output_index]["content"][content_index] = part
+    when "response.reasoning_summary_text.delta"
+      output_item = @body["output"][chunk["output_index"]]
+      if output_item && output_item["type"] == "reasoning"
+        summary_index = chunk["summary_index"] || 0
+        output_item["summary"] ||= []
+        output_item["summary"][summary_index] ||= {"type" => "summary_text", "text" => +""}
+        output_item["summary"][summary_index]["text"] << chunk["delta"]
+        emit_reasoning_content(chunk["delta"])
+      end
+    when "response.reasoning_summary_text.done"
+      output_item = @body["output"][chunk["output_index"]]
+      if output_item && output_item["type"] == "reasoning"
+        summary_index = chunk["summary_index"] || 0
+        output_item["summary"] ||= []
+        output_item["summary"][summary_index] = {
+          "type" => "summary_text",
+          "text" => chunk["text"]
+        }
+      end
     when "response.output_text.delta"
       output_index = chunk["output_index"]
       content_index = chunk["content_index"]
@@ -102,6 +129,10 @@ class LLM::OpenAI
     end
   end
 
+  def emit_reasoning_content(value)
+    @stream.on_reasoning_content(value) if @stream.respond_to?(:on_reasoning_content)
+  end
+
   def emit_tool(index, tool)
     return unless @stream.respond_to?(:on_tool_call)
     return unless complete_tool?(tool)
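The new stream-parser branches accumulate reasoning summary deltas incrementally and then replace the accumulated text with the final string carried by the `done` event. A standalone sketch of that accumulation over a hash mirroring the parser's `@body`; the chunk payloads here are hypothetical examples of the event shapes the diff handles:

```ruby
# Sketch of reasoning-summary accumulation from the stream parser.
# `body` mirrors the parser's @body hash after a reasoning output item
# has been added with an empty "summary" array.
body = {"output" => [{"type" => "reasoning", "summary" => []}]}

chunks = [
  {"type" => "response.reasoning_summary_text.delta",
   "output_index" => 0, "summary_index" => 0, "delta" => "Thinking "},
  {"type" => "response.reasoning_summary_text.delta",
   "output_index" => 0, "summary_index" => 0, "delta" => "step by step"},
  {"type" => "response.reasoning_summary_text.done",
   "output_index" => 0, "summary_index" => 0, "text" => "Thinking step by step"}
]

chunks.each do |chunk|
  item = body["output"][chunk["output_index"]]
  next unless item && item["type"] == "reasoning"
  i = chunk["summary_index"] || 0
  item["summary"] ||= []
  case chunk["type"]
  when "response.reasoning_summary_text.delta"
    item["summary"][i] ||= {"type" => "summary_text", "text" => +""}
    item["summary"][i]["text"] << chunk["delta"]
  when "response.reasoning_summary_text.done"
    # The done event carries the full text and overwrites the accumulation.
    item["summary"][i] = {"type" => "summary_text", "text" => chunk["text"]}
  end
end

p body["output"][0]["summary"][0]["text"] # => "Thinking step by step"
```

In the real parser each delta is also forwarded to the stream callback via `emit_reasoning_content`, so a consumer can render reasoning text as it arrives.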
data/lib/llm/version.rb
CHANGED
data/llm.gemspec
CHANGED
@@ -11,12 +11,22 @@ Gem::Specification.new do |spec|
   spec.summary = "System integration layer for LLMs, tools, MCP, and APIs in Ruby."
 
   spec.description = <<~DESCRIPTION
-    llm.rb is a
-
-
-
-
-
+    llm.rb is a runtime for building AI systems that integrate directly with your
+    application. It is not just an API wrapper. It provides a unified execution
+    model for providers, tools, MCP servers, streaming, schemas, files, and
+    state.
+
+    It is built for engineers who want control over how these systems run.
+    llm.rb stays close to Ruby, runs on the standard library by default, loads
+    optional pieces only when needed, and remains easy to extend. It also works
+    well in Rails or ActiveRecord applications, where a small wrapper around
+    context persistence is enough to save and restore long-lived conversation
+    state across requests, jobs, or retries.
+
+    Most LLM libraries stop at request/response APIs. Building real systems
+    means stitching together streaming, tools, state, persistence, and external
+    services by hand. llm.rb provides a single execution model for all of these,
+    so they compose naturally instead of becoming separate subsystems.
   DESCRIPTION
 
   spec.license = "0BSD"
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm.rb
 version: !ruby/object:Gem::Version
-  version: 4.
+  version: 4.13.0
 platform: ruby
 authors:
 - Antar Azri
@@ -195,12 +195,22 @@ dependencies:
   - !ruby/object:Gem::Version
     version: '1.7'
 description: |
-  llm.rb is a
-
-
-
-
-
+  llm.rb is a runtime for building AI systems that integrate directly with your
+  application. It is not just an API wrapper. It provides a unified execution
+  model for providers, tools, MCP servers, streaming, schemas, files, and
+  state.
+
+  It is built for engineers who want control over how these systems run.
+  llm.rb stays close to Ruby, runs on the standard library by default, loads
+  optional pieces only when needed, and remains easy to extend. It also works
+  well in Rails or ActiveRecord applications, where a small wrapper around
+  context persistence is enough to save and restore long-lived conversation
+  state across requests, jobs, or retries.
+
+  Most LLM libraries stop at request/response APIs. Building real systems
+  means stitching together streaming, tools, state, persistence, and external
+  services by hand. llm.rb provides a single execution model for all of these,
+  so they compose naturally instead of becoming separate subsystems.
 email:
 - azantar@proton.me
 - 0x1eef@hardenedbsd.org