llm.rb 4.8.0 → 4.10.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/README.md +356 -583
- data/data/anthropic.json +770 -0
- data/data/deepseek.json +75 -0
- data/data/google.json +1050 -0
- data/data/openai.json +1421 -0
- data/data/xai.json +792 -0
- data/data/zai.json +330 -0
- data/lib/llm/agent.rb +42 -41
- data/lib/llm/bot.rb +1 -263
- data/lib/llm/buffer.rb +7 -0
- data/lib/llm/{session → context}/deserializer.rb +4 -3
- data/lib/llm/context.rb +292 -0
- data/lib/llm/cost.rb +26 -0
- data/lib/llm/error.rb +8 -0
- data/lib/llm/function/array.rb +61 -0
- data/lib/llm/function/fiber_group.rb +91 -0
- data/lib/llm/function/task_group.rb +89 -0
- data/lib/llm/function/thread_group.rb +94 -0
- data/lib/llm/function.rb +75 -10
- data/lib/llm/mcp/command.rb +108 -0
- data/lib/llm/mcp/error.rb +31 -0
- data/lib/llm/mcp/pipe.rb +82 -0
- data/lib/llm/mcp/rpc.rb +118 -0
- data/lib/llm/mcp/transport/http/event_handler.rb +66 -0
- data/lib/llm/mcp/transport/http.rb +122 -0
- data/lib/llm/mcp/transport/stdio.rb +85 -0
- data/lib/llm/mcp.rb +116 -0
- data/lib/llm/message.rb +13 -11
- data/lib/llm/model.rb +2 -2
- data/lib/llm/prompt.rb +17 -7
- data/lib/llm/provider.rb +32 -17
- data/lib/llm/providers/anthropic/files.rb +3 -3
- data/lib/llm/providers/anthropic.rb +19 -4
- data/lib/llm/providers/deepseek.rb +10 -3
- data/lib/llm/providers/{gemini → google}/audio.rb +6 -6
- data/lib/llm/providers/{gemini → google}/error_handler.rb +2 -2
- data/lib/llm/providers/{gemini → google}/files.rb +11 -11
- data/lib/llm/providers/{gemini → google}/images.rb +7 -7
- data/lib/llm/providers/{gemini → google}/models.rb +5 -5
- data/lib/llm/providers/{gemini → google}/request_adapter/completion.rb +7 -3
- data/lib/llm/providers/{gemini → google}/request_adapter.rb +1 -1
- data/lib/llm/providers/{gemini → google}/response_adapter/completion.rb +7 -7
- data/lib/llm/providers/{gemini → google}/response_adapter/embedding.rb +1 -1
- data/lib/llm/providers/{gemini → google}/response_adapter/file.rb +1 -1
- data/lib/llm/providers/{gemini → google}/response_adapter/files.rb +1 -1
- data/lib/llm/providers/{gemini → google}/response_adapter/image.rb +1 -1
- data/lib/llm/providers/{gemini → google}/response_adapter/models.rb +1 -1
- data/lib/llm/providers/{gemini → google}/response_adapter/web_search.rb +2 -2
- data/lib/llm/providers/{gemini → google}/response_adapter.rb +8 -8
- data/lib/llm/providers/{gemini → google}/stream_parser.rb +3 -3
- data/lib/llm/providers/{gemini.rb → google.rb} +41 -26
- data/lib/llm/providers/llamacpp.rb +10 -3
- data/lib/llm/providers/ollama.rb +19 -4
- data/lib/llm/providers/openai/files.rb +3 -3
- data/lib/llm/providers/openai/response_adapter/completion.rb +9 -1
- data/lib/llm/providers/openai/response_adapter/responds.rb +9 -1
- data/lib/llm/providers/openai/responses.rb +9 -1
- data/lib/llm/providers/openai/stream_parser.rb +2 -0
- data/lib/llm/providers/openai.rb +19 -4
- data/lib/llm/providers/xai.rb +10 -3
- data/lib/llm/providers/zai.rb +9 -2
- data/lib/llm/registry.rb +81 -0
- data/lib/llm/schema/all_of.rb +31 -0
- data/lib/llm/schema/any_of.rb +31 -0
- data/lib/llm/schema/one_of.rb +31 -0
- data/lib/llm/schema/parser.rb +145 -0
- data/lib/llm/schema.rb +49 -8
- data/lib/llm/server_tool.rb +5 -5
- data/lib/llm/session.rb +10 -1
- data/lib/llm/tool.rb +88 -6
- data/lib/llm/tracer/logger.rb +1 -1
- data/lib/llm/tracer/telemetry.rb +7 -7
- data/lib/llm/tracer.rb +3 -3
- data/lib/llm/usage.rb +5 -0
- data/lib/llm/version.rb +1 -1
- data/lib/llm.rb +39 -6
- data/llm.gemspec +45 -8
- metadata +86 -28
data/README.md CHANGED

````diff
@@ -4,751 +4,544 @@
 <p align="center">
 <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
 <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-<a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.
+<a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.10.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About
 
-llm.rb is a
-
-
-
+llm.rb is a Ruby-centric toolkit for building real LLM-powered systems — where
+LLMs are part of your architecture, not just API calls. It gives you explicit
+control over contexts, tools, concurrency, and providers, so you can compose
+reliable, production-ready workflows without hidden abstractions.
 
-
-
-
-by the license. Built with [good music](https://www.youtube.com/watch?v=SNvaqwTbn14)
-and a lot of ☕️.
+Built for engineers who want to understand and control their LLM systems. No
+frameworks, no hidden magic — just composable primitives for building real
+applications, from scripts to full systems like [Relay](https://github.com/llmrb/relay).
 
-
+Jump to [Quick start](#quick-start), discover its [capabilities](#capabilities), read about
+its [architecture](#architecture--execution-model) or watch the
+[Screencast](https://www.youtube.com/watch?v=x1K4wMeO_QA) for a deep dive into the design
+and capabilities of llm.rb.
 
-
+## What Makes It Different
 
-
-
-multiple requests. The following example implements a simple REPL loop, and the response
-is streamed to the terminal in real-time as it arrives from the provider. The provider
-happens to be OpenAI in this case but it could be any other provider, and `$stdout`
-could be any object that implements the `#<<` method:
+Most LLM libraries stop at requests and responses. <br>
+llm.rb is built around the state and execution model around them:
 
-
-
-
+- **Contexts are central** <br>
+  They hold history, tools, schema, usage, cost, persistence, and execution state.
+- **Tool execution is explicit** <br>
+  Run local, provider-native, and MCP tools sequentially or concurrently with threads, fibers, or async tasks.
+- **One API across providers and capabilities** <br>
+  The same model covers chat, files, images, audio, embeddings, vector stores, and more.
+- **Thread-safe where it matters** <br>
+  Providers are shareable, while contexts stay isolated and stateful.
+- **Local metadata, fewer extra API calls** <br>
+  A built-in registry provides model capabilities, limits, pricing, and cost estimation.
+- **Stdlib-only by default** <br>
+  llm.rb runs on the Ruby standard library by default, with providers, optional features, and the model registry loaded only when you use them.
 
-
-
-
-  print "> "
-  ses.talk(STDIN.gets || break)
-  puts
-end
-```
+## Architecture & Execution Model
+
+llm.rb is built in layers, each providing explicit control:
 
-
+```
+┌─────────────────────────────────────────┐
+│ Your Application                        │
+├─────────────────────────────────────────┤
+│ Contexts & Agents                       │ ← Stateful workflows
+├─────────────────────────────────────────┤
+│ Tools & Functions                       │ ← Concurrent execution
+├─────────────────────────────────────────┤
+│ Unified Provider API (OpenAI, etc.)     │ ← Provider abstraction
+├─────────────────────────────────────────┤
+│ HTTP, JSON, Thread Safety               │ ← Infrastructure
+└─────────────────────────────────────────┘
+```
 
-
-
-
-
+### Key Design Decisions
+
+- **Thread-safe providers** - `LLM::Provider` instances are safe to share across threads
+- **Thread-local contexts** - `LLM::Context` should generally be kept thread-local
+- **Lazy loading** - Providers, optional features, and the model registry load on demand
+- **JSON adapter system** - Swap JSON libraries (JSON/Oj/Yajl) for performance
+- **Registry system** - Local metadata for model capabilities, limits, and pricing
+- **Provider adaptation** - Normalizes differences between OpenAI, Anthropic, Google, and other providers
+- **Structured tool execution** - Errors are captured and returned as data, not raised unpredictably
+- **Function vs Tool APIs** - Choose between class-based tools and closure-based functions
+
+## Capabilities
+
+llm.rb provides a complete set of primitives for building LLM-powered systems:
+
+- **Chat & Contexts** — stateless and stateful interactions with persistence
+- **Streaming** — real-time responses across providers
+- **Tool Calling** — define and execute functions with automatic orchestration
+- **Concurrent Execution** — threads, async tasks, and fibers
+- **Agents** — reusable, preconfigured assistants with tool auto-execution
+- **Structured Outputs** — JSON schema-based responses
+- **MCP Support** — integrate external tool servers dynamically
+- **Multimodal Inputs** — text, images, audio, documents, URLs
+- **Audio** — text-to-speech, transcription, translation
+- **Images** — generation and editing
+- **Files API** — upload and reference files in prompts
+- **Embeddings** — vector generation for search and RAG
+- **Vector Stores** — OpenAI-based retrieval workflows
+- **Cost Tracking** — estimate usage without API calls
+- **Observability** — tracing, logging, telemetry
+- **Model Registry** — local metadata for capabilities, limits, pricing
+
+## Quick Start
+
+#### Concurrent Tools
+
+llm.rb provides explicit concurrency control for tool execution. The
+`wait(:thread)` method spawns each pending function in its own thread and waits
+for all to complete. You can also use `:fiber` for cooperative multitasking or
+`:task` for async/await patterns (requires the `async` gem). The context
+automatically collects all results and reports them back to the LLM in a
+single turn, maintaining conversation flow while parallelizing independent
+operations:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
-require "pp"
-
-class Report < LLM::Schema
-  property :category, String, "Report category", required: true
-  property :summary, String, "Short summary", required: true
-  property :services, Array[String], "Impacted services", required: true
-  property :timestamp, String, "When it happened", optional: true
-end
 
 llm = LLM.openai(key: ENV["KEY"])
-
-res = ses.talk("Structure this report: 'Database latency spiked at 10:42 UTC, causing 5% request timeouts for 12 minutes.'")
-pp res.content!
+ctx = LLM::Context.new(llm, stream: $stdout, tools: [FetchWeather, FetchNews, FetchStock])
 
-
-
-
-# "summary" => "Database latency spiked, causing 5% request timeouts for 12 minutes.",
-# "services" => ["Database"],
-# "timestamp" => "2024-06-05T10:42:00Z"
-# }
+# Execute multiple independent tools concurrently
+ctx.talk("Summarize the weather, headlines, and stock price.")
+ctx.talk(ctx.functions.wait(:thread)) while ctx.functions.any?
 ```
 
````
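The new Concurrent Tools section above also names `:fiber` and `:task` as execution strategies but only demonstrates `:thread`. A minimal sketch of the variants, assuming they are drop-in arguments to `wait` as the paragraph states (the `FetchWeather`-style tool classes are the same hypothetical ones used in the README's own example):

```ruby
#!/usr/bin/env ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout, tools: [FetchWeather, FetchNews, FetchStock])
ctx.talk("Summarize the weather, headlines, and stock price.")

# Cooperative multitasking on fibers instead of threads
ctx.talk(ctx.functions.wait(:fiber)) while ctx.functions.any?

# With the optional `async` gem installed, async tasks are selected
# the same way:
#   ctx.talk(ctx.functions.wait(:task)) while ctx.functions.any?
```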
````diff
-####
+#### MCP
 
-
-
-
-
-the result back on the next request. The following example implements a simple tool
-that runs shell commands:
+llm.rb integrates with the Model Context Protocol (MCP) to dynamically discover
+and use tools from external servers. This example starts a filesystem MCP
+server over stdio and makes its tools available to a context, enabling the LLM
+to interact with the local file system through a standardized interface:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
-class System < LLM::Tool
-  name "system"
-  description "Run a shell command"
-  param :command, String, "Command to execute", required: true
-
-  def call(command:)
-    {success: system(command)}
-  end
-end
-
 llm = LLM.openai(key: ENV["KEY"])
-
-
-
+mcp = LLM.mcp(stdio: {argv: ["npx", "-y", "@modelcontextprotocol/server-filesystem", Dir.pwd]})
+
+begin
+  mcp.start
+  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
+  ctx.talk("List the directories in this project.")
+  ctx.talk(ctx.functions.call) while ctx.functions.any?
+ensure
+  mcp.stop
+end
 ```
 
-
-
-
-class provides a class-level DSL for defining reusable, preconfigured
-assistants with defaults for model, tools, schema, and instructions.
-Instructions are injected only on the first request, and unlike
-[LLM::Session](https://0x1eef.github.io/x/llm.rb/LLM/Session.html),
-an [LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html)
-will automatically call tools when needed:
+You can also connect to a hosted MCP server over HTTP. This is useful when the
+server already runs remotely and exposes MCP through a URL instead of a local
+process:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
-class SystemAdmin < LLM::Agent
-  model "gpt-4.1"
-  instructions "You are a Linux system admin"
-  tools Shell
-  schema Result
-end
-
 llm = LLM.openai(key: ENV["KEY"])
-
-
+mcp = LLM.mcp(http: {
+  url: "https://api.githubcopilot.com/mcp/",
+  headers: {"Authorization" => "Bearer #{ENV.fetch("GITHUB_PAT")}"}
+})
+
+begin
+  mcp.start
+  ctx = LLM::Context.new(llm, stream: $stdout, tools: mcp.tools)
+  ctx.talk("List the available GitHub MCP toolsets.")
+  ctx.talk(ctx.functions.call) while ctx.functions.any?
+ensure
+  mcp.stop
+end
 ```
 
-####
+#### Streaming Chat
 
-
-
-
-
-
+This example demonstrates llm.rb's streaming support. The `stream: $stdout`
+parameter tells the context to write responses incrementally as they arrive
+from the LLM. The `Context` object manages the conversation history, and
+`talk()` sends your input while automatically appending both your message and
+the LLM's response to the context. Streams accept any object with `#<<`,
+giving you flexibility to pipe output to files, network sockets, or custom
+buffers:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
-
-
-
-
-
-user "Now double the speed for the same time."
+ctx = LLM::Context.new(llm, stream: $stdout)
+loop do
+  print "> "
+  ctx.talk(STDIN.gets || break)
+  puts
 end
-
-ses.talk(prompt)
 ```
 
-
-
-
-
+#### Tool Calling
+
+Tools in llm.rb can be defined as classes inheriting from `LLM::Tool` or as
+closures using `LLM.function`. When the LLM requests a tool call, the context
+stores `Function` objects in `ctx.functions`. The `call()` method executes all
+pending functions and returns their results to the LLM. Tools support
+structured parameters with JSON Schema validation and automatically adapt to
+each provider's API format (OpenAI, Anthropic, Google, etc.):
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
-
-
+class System < LLM::Tool
+  name "system"
+  description "Run a shell command"
+  param :command, String, "Command to execute", required: true
 
-
-
-
-user "Now double the speed for the same time."
+  def call(command:)
+    {success: system(command)}
+  end
 end
 
-
+llm = LLM.openai(key: ENV["KEY"])
+ctx = LLM::Context.new(llm, stream: $stdout, tools: [System])
+ctx.talk("Run `date`.")
+ctx.talk(ctx.functions.call) while ctx.functions.any?
 ```
 
````
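The Tool Calling paragraph above mentions closure-based functions via `LLM.function`, but the example only shows the class form. The removed 4.8.0 lines elsewhere in this diff show the builder calls (`fn.description`, `fn.params`, `fn.define`); a sketch assuming an `LLM.function(:system)` entry point, since the defining line is truncated in the diff:

```ruby
#!/usr/bin/env ruby
require "llm"

# The fn.description/fn.params/fn.define builder API is visible in the
# removed 4.8.0 README; LLM.function(:system) itself is an assumption.
tool = LLM.function(:system) do |fn|
  fn.description "Run a shell command"
  fn.params do |schema|
    schema.object(command: schema.string.required)
  end
  fn.define do |command:|
    {success: system(command)}
  end
end

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm, stream: $stdout, tools: [tool])
ctx.talk("Run `date`.")
ctx.talk(ctx.functions.call) while ctx.functions.any?
```

The old README notes the closure form's advantage: it captures its surrounding scope, which a class-based tool cannot.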
````diff
-####
-
-llm.rb is designed for threaded environments with throughput in mind.
-Locks are used selectively, and localized state is preferred wherever
-possible. Blanket locking across every class could help guarantee
-correctness but it could also add contention, reduce throughput,
-and increase complexity.
-
-That's why we decided to optimize for both correctness and throughput
-instead. An important part of that design is guaranteeing that
-[LLM::Provider](https://0x1eef.github.io/x/llm.rb/LLM/Provider.html)
-is safe to share and use across threads. [LLM::Session](https://0x1eef.github.io/x/llm.rb/LLM/Session.html) and
-[LLM::Agent](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html) are
-stateful objects that should be kept local to a single thread.
-
-[LLM::Tracer](https://0x1eef.github.io/x/llm.rb/LLM/Tracer.html) and its
-subclasses are also designed to be thread-local, which means that
-`llm.tracer = ...` only impacts the current thread and must be set
-again in each thread where a tracer is desired. This avoids contention
-on tracer state, keeps tracing isolated per thread, and allows different
-tracers to be used in different threads simultaneously.
-
-So the recommended pattern is to keep one session, tracer or agent per
-thread, and share a provider across multiple threads:
+#### Structured Outputs
 
+The `LLM::Schema` system lets you define JSON schemas that LLMs must follow.
+Schemas can be defined as classes with `property` declarations or built
+programmatically using a fluent interface. When you pass a schema to a context,
+llm.rb automatically configures the provider's JSON mode and validates
+responses against your schema. The `content!` method returns the parsed JSON
+object, while errors are captured as structured data rather than raising
+exceptions:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
+require "pp"
 
-
-
+class Report < LLM::Schema
+  property :category, Enum["performance", "security", "outage"], "Report category", required: true
+  property :summary, String, "Short summary", required: true
+  property :impact, OneOf[String, Integer], "Primary impact, as text or a count", required: true
+  property :services, Array[String], "Impacted services", required: true
+  property :timestamp, String, "When it happened", optional: true
+end
 
-
-
-
-
-    res = ses.talk "#{x} + 5 = ?"
-    res.content!
-  end
-end.map(&:value)
+llm = LLM.openai(key: ENV["KEY"])
+ctx = LLM::Context.new(llm, schema: Report)
+res = ctx.talk("Structure this report: 'Database latency spiked at 10:42 UTC, causing 5% request timeouts for 12 minutes.'")
+pp res.content!
 
-
+# {
+#  "category" => "performance",
+#  "summary" => "Database latency spiked, causing 5% request timeouts for 12 minutes.",
+#  "impact" => "5% request timeouts",
+#  "services" => ["Database"],
+#  "timestamp" => "2024-06-05T10:42:00Z"
+# }
 ```
 
-##
-
-#### General
-- ✅ Unified API across providers
-- 📦 Zero runtime deps (stdlib-only)
-- 🧵 Thread-safe providers for multi-threaded workloads
-- 🧩 Pluggable JSON adapters (JSON, Oj, Yajl, etc)
-- 🧱 Builtin tracer API ([LLM::Tracer](https://0x1eef.github.io/x/llm.rb/LLM/Tracer.html))
+## Providers
 
-
+llm.rb supports multiple LLM providers with a unified API.
+All providers share the same context, tool, and concurrency interfaces, making
+it easy to switch between cloud and local models:
 
-
-
-
+- **OpenAI** (`LLM.openai`)
+- **Anthropic** (`LLM.anthropic`)
+- **Google** (`LLM.google`)
+- **DeepSeek** (`LLM.deepseek`)
+- **xAI** (`LLM.xai`)
+- **zAI** (`LLM.zai`)
+- **Ollama** (`LLM.ollama`)
+- **Llama.cpp** (`LLM.llamacpp`)
 
````
|
-
|
|
241
|
-
- 🧠 Stateless + stateful chat (completions + responses)
|
|
242
|
-
- 💾 Save and restore sessions across processes
|
|
243
|
-
- 🤖 Tool calling / function execution
|
|
244
|
-
- 🔁 Agent tool-call auto-execution (bounded)
|
|
245
|
-
- 🗂️ JSON Schema structured output
|
|
246
|
-
- 📡 Streaming responses
|
|
247
|
-
|
|
248
|
-
#### Media
|
|
249
|
-
- 🗣️ TTS, transcription, translation
|
|
250
|
-
- 🖼️ Image generation + editing
|
|
251
|
-
- 📎 Files API + prompt-aware file inputs
|
|
252
|
-
- 📦 Streaming multipart uploads (no full buffering)
|
|
253
|
-
- 💡 Multimodal prompts (text, documents, audio, images, video, URLs)
|
|
254
|
-
|
|
255
|
-
#### Embeddings
|
|
256
|
-
- 🧮 Embeddings
|
|
257
|
-
- 🧱 OpenAI vector stores (RAG)
|
|
270
|
+
## Production
|
|
258
271
|
|
|
259
|
-
####
|
|
260
|
-
- 📜 Models API
|
|
261
|
-
- 🔧 OpenAI responses + moderations
|
|
272
|
+
#### Ready for production
|
|
262
273
|
|
|
263
|
-
|
|
274
|
+
llm.rb is designed for production use from the ground up:
|
|
264
275
|
|
|
265
|
-
|
|
266
|
-
|
|
267
|
-
|
|
268
|
-
|
|
269
|
-
|
|
270
|
-
|
|
271
|
-
|
|
272
|
-
| **Multimodal Prompts** *(text, documents, audio, images, videos, URLs, etc)* | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
|
|
273
|
-
| **Files API** | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
274
|
-
| **Models API** | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
|
|
275
|
-
| **Audio (TTS / Transcribe / Translate)** | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
276
|
-
| **Image Generation & Editing** | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
|
|
277
|
-
| **Local Model Support** | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
|
|
278
|
-
| **Vector Stores (RAG)** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
279
|
-
| **Responses** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
280
|
-
| **Moderations** | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
|
|
276
|
+
- **Thread-safe providers** - Share `LLM::Provider` instances across your application
|
|
277
|
+
- **Thread-local contexts** - Keep `LLM::Context` instances thread-local for state isolation
|
|
278
|
+
- **Cost tracking** - Know your spend before the bill arrives
|
|
279
|
+
- **Observability** - Built-in tracing with OpenTelemetry support
|
|
280
|
+
- **Persistence** - Save and restore contexts across processes
|
|
281
|
+
- **Performance** - Swap JSON adapters and enable HTTP connection pooling
|
|
282
|
+
- **Error handling** - Structured errors, not unpredictable exceptions
|
|
281
283
|
|
|
282
|
-
|
|
284
|
+
#### Tracing
|
|
283
285
|
|
|
286
|
+
llm.rb includes built-in tracers for local logging, OpenTelemetry, and
|
|
287
|
+
LangSmith. Assign a tracer to a provider and all context requests and tool
|
|
288
|
+
calls made through that provider will be instrumented. Tracers are local to
|
|
289
|
+
the current fiber, so the same provider can use different tracers in different
|
|
290
|
+
concurrent tasks without interfering with each other.
|
|
284
291
|
|
|
285
|
-
|
|
286
|
-
|
|
287
|
-
### Providers
|
|
288
|
-
|
|
289
|
-
#### LLM::Provider
|
|
290
|
-
|
|
291
|
-
All providers inherit from [LLM::Provider](https://0x1eef.github.io/x/llm.rb/LLM/Provider.html) –
|
|
292
|
-
they share a common interface and set of functionality. Each provider can be instantiated
|
|
293
|
-
using an API key (if required) and an optional set of configuration options via
|
|
294
|
-
[the singleton methods of LLM](https://0x1eef.github.io/x/llm.rb/LLM.html). For example:
|
|
295
|
-
|
|
296
|
-
```ruby
|
|
297
|
-
#!/usr/bin/env ruby
|
|
298
|
-
require "llm"
|
|
299
|
-
|
|
300
|
-
##
|
|
301
|
-
# remote providers
|
|
302
|
-
llm = LLM.openai(key: "yourapikey")
|
|
303
|
-
llm = LLM.gemini(key: "yourapikey")
|
|
304
|
-
llm = LLM.anthropic(key: "yourapikey")
|
|
305
|
-
llm = LLM.xai(key: "yourapikey")
|
|
306
|
-
llm = LLM.zai(key: "yourapikey")
|
|
307
|
-
llm = LLM.deepseek(key: "yourapikey")
|
|
308
|
-
|
|
309
|
-
##
|
|
310
|
-
# local providers
|
|
311
|
-
llm = LLM.ollama(key: nil)
|
|
312
|
-
llm = LLM.llamacpp(key: nil)
|
|
313
|
-
```
|
|
314
|
-
|
|
315
|
-
#### LLM::Response
|
|
316
|
-
|
|
317
|
-
All provider methods that perform requests return an
|
|
318
|
-
[LLM::Response](https://0x1eef.github.io/x/llm.rb/LLM/Response.html).
|
|
319
|
-
If the HTTP response is JSON (`content-type: application/json`),
|
|
320
|
-
`response.body` is parsed into an
|
|
321
|
-
[LLM::Object](https://0x1eef.github.io/x/llm.rb/LLM/Object.html) for
|
|
322
|
-
dot-access. For non-JSON responses, `response.body` is a raw string.
|
|
323
|
-
It is also possible to access top-level keys directly on the response
|
|
324
|
-
(eg: `res.object` instead of `res.body.object`):
|
|
292
|
+
Use the logger tracer when you want structured logs through Ruby's standard
|
|
293
|
+
library:
|
|
325
294
|
|
|
326
295
|
```ruby
|
|
327
296
|
#!/usr/bin/env ruby
|
|
328
297
|
require "llm"
|
|
329
298
|
|
|
330
299
|
llm = LLM.openai(key: ENV["KEY"])
|
|
331
|
-
|
|
332
|
-
puts res.object
|
|
333
|
-
puts res.data.first.id
|
|
334
|
-
```
|
|
335
|
-
|
|
336
|
-
#### Persistence
|
|
337
|
-
|
|
338
|
-
The llm.rb library can maintain a process-wide connection pool
|
|
339
|
-
for each provider that is instantiated. This feature can improve
|
|
340
|
-
performance but it is optional, the implementation depends on
|
|
341
|
-
[net-http-persistent](https://github.com/drbrain/net-http-persistent),
|
|
342
|
-
and the gem should be installed separately:
|
|
343
|
-
|
|
344
|
-
```ruby
|
|
345
|
-
#!/usr/bin/env ruby
|
|
346
|
-
require "llm"
|
|
300
|
+
llm.tracer = LLM::Tracer::Logger.new(llm, io: $stdout)
|
|
347
301
|
|
|
348
|
-
|
|
349
|
-
|
|
350
|
-
res2 = llm.responses.create "message 2", previous_response_id: res1.response_id
|
|
351
|
-
res3 = llm.responses.create "message 3", previous_response_id: res2.response_id
|
|
352
|
-
puts res3.output_text
|
|
302
|
+
ctx = LLM::Context.new(llm)
|
|
303
|
+
ctx.talk("Hello")
|
|
353
304
|
```
|
|
354
305
|
|
|
355
|
-
|
|
356
|
-
|
|
357
|
-
The llm.rb library includes telemetry support through its tracer API, and it
|
|
358
|
-
can be used to trace LLM requests. It can be useful for debugging, monitoring,
|
|
359
|
-
and observability. The primary use case in mind is integration with tools like
|
|
360
|
-
[LangSmith](https://www.langsmith.com/).
|
|
361
|
-
|
|
362
|
-
It is worth mentioning that tracers are local to a thread, and they
|
|
363
|
-
should be configured per thread. That means that `llm.tracer = LLM::Tracer::Telemetry.new(llm)`
|
|
364
|
-
only impacts the current thread, and it should be repeated in each thread where
|
|
365
|
-
tracing is required.
|
|
366
|
-
|
|
367
|
-
The telemetry implementation uses the [opentelemetry-sdk](https://github.com/open-telemetry/opentelemetry-ruby)
|
|
368
|
-
and is based on the [gen-ai telemetry spec(s)](https://github.com/open-telemetry/semantic-conventions/blob/main/docs/gen-ai/).
|
|
369
|
-
This feature is optional, disabled by default, and the [opentelemetry-sdk](https://github.com/open-telemetry/opentelemetry-ruby)
|
|
370
|
-
gem should be installed separately. Please also note that llm.rb will take care of
|
|
371
|
-
loading and configuring the [opentelemetry-sdk](https://github.com/open-telemetry/opentelemetry-ruby)
|
|
372
|
-
library for you, and llm.rb configures an in-memory exporter that doesn't have
|
|
373
|
-
external dependencies by default:
|
|
306
|
+
Use the telemetry tracer when you want OpenTelemetry spans. This requires the
|
|
307
|
+
`opentelemetry-sdk` gem, and exporters such as OTLP can be added separately:
|
|
374
308
|
|
|
375
309
|
```ruby
|
|
376
310
|
#!/usr/bin/env ruby
|
|
377
311
|
require "llm"
|
|
378
|
-
require "pp"
|
|
379
312
|
|
|
380
313
|
llm = LLM.openai(key: ENV["KEY"])
|
|
381
314
|
llm.tracer = LLM::Tracer::Telemetry.new(llm)
|
|
382
315
|
|
|
383
|
-
|
|
384
|
-
|
|
385
|
-
|
|
386
|
-
ses.tracer.spans.each { |span| pp span }
|
|
316
|
+
ctx = LLM::Context.new(llm)
|
|
317
|
+
ctx.talk("Hello")
|
|
318
|
+
pp llm.tracer.spans
|
|
387
319
|
```
|
|
388
320
|
|
|
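The removed 4.8.0 text warned that telemetry export is batched in the background, so short-lived scripts should flush the exporter before exit. A sketch assuming the `exporter:` keyword and `flush!` from the removed example still apply, with a context in place of the old session:

```ruby
#!/usr/bin/env ruby
require "llm"
require "opentelemetry-exporter-otlp"

# An OTLP exporter, here pointed at LangSmith as in the 4.8.0 README
endpoint = "https://api.smith.langchain.com/otel/v1/traces"
exporter = OpenTelemetry::Exporter::OTLP::Exporter.new(endpoint:)

llm = LLM.openai(key: ENV["KEY"])
llm.tracer = LLM::Tracer::Telemetry.new(llm, exporter:)

ctx = LLM::Context.new(llm)
ctx.talk("Hello")

at_exit do
  # Flush pending spans; otherwise a short-lived script may exit
  # before the batched exporter has written them
  llm.tracer.flush!
end
```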
````diff
-
-
-multiple observability tools. By default the export is batched in the background,
-and happens automatically but short lived scripts might need to
-[explicitly flush](https://0x1eef.github.io/x/llm.rb/LLM/Tracer/Telemetry#flush!-instance_method)
-the exporter before they exit – otherwise some telemetry data could be lost:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-require "opentelemetry-exporter-otlp"
-
-endpoint = "https://api.smith.langchain.com/otel/v1/traces"
-exporter = OpenTelemetry::Exporter::OTLP::Exporter.new(endpoint:)
-
-llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::Telemetry.new(llm, exporter:)
-
-ses = LLM::Session.new(llm)
-ses.talk "hello"
-ses.talk "how are you?"
-
-at_exit do
-  # Helpful for short-lived scripts, otherwise the exporter
-  # might not have time to flush pending telemetry data
-  ses.tracer.flush!
-end
-```
-
-#### Logger
-
-The llm.rb library includes simple logging support through its
-tracer API, and Ruby's standard library ([ruby/logger](https://github.com/ruby/logger)).
-This feature is optional, disabled by default, and it can be useful for debugging and/or
-monitoring requests to LLM providers. The `path` or `io` options can be used to choose
-where logs are written, and by default it is set to `$stdout`. Like other tracers,
-the logger tracer is local to a thread:
+Use the LangSmith tracer when you want LangSmith-compatible metadata and trace
+grouping on top of the telemetry tracer:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
-llm.tracer = LLM::Tracer::
+llm.tracer = LLM::Tracer::Langsmith.new(
+  llm,
+  metadata: {env: "dev"},
+  tags: ["chatbot"]
+)
 
-
-
-ses.talk "Adios."
+ctx = LLM::Context.new(llm)
+ctx.talk("Hello")
 ```
 
-####
+#### Thread Safety
 
-
-
-
-
-
-– inclusive of tool metadata as well:
+llm.rb uses Ruby's `Monitor` class to ensure thread safety at the provider
+level, allowing you to share a single provider instance across multiple threads
+while maintaining state isolation through thread-local contexts. This design
+enables efficient resource sharing while preventing race conditions in
+concurrent applications:
 
-* Process 1
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
+# Thread-safe providers - create once, use everywhere
 llm = LLM.openai(key: ENV["KEY"])
-ses = LLM::Session.new(llm)
-ses.talk "Howdy partner"
-ses.talk "I'll see you later"
-ses.save(path: "session.json")
-```
-* Process 2
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-require "pp"
 
-
-
-
-
-
-```
-
-But how does it work without a file ? The [LLM::Session](https://0x1eef.github.io/x/llm.rb/LLM/Session.html)
-class implements `#to_json` and it can be used to obtain a JSON representation
-of a session that can be stored in a `jsonb` column in PostgreSQL, or any
-other storage backend. The session can then be restored from the JSON
-representation via the restore method and its `string` argument:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
+# Each thread should have its own context for state isolation
+Thread.new do
+  ctx = LLM::Context.new(llm) # Thread-local context
+  ctx.talk("Hello from thread 1")
+end
 
-
-
-
-
-
-json = ses1.to_json
-ses2 = LLM::Session.new(llm)
-ses2.restore(string: json)
-ses2.talk "Howdy partner. I'm back"
+Thread.new do
+  ctx = LLM::Context.new(llm) # Thread-local context
+  ctx.talk("Hello from thread 2")
+end
 ```
 
````
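One caveat about the snippet above, assuming plain Ruby `Thread` semantics: a standalone script can exit before spawned threads finish, so in practice you would keep the handles and join them:

```ruby
# Continuing from the example above: `llm` is the shared provider
threads = 2.times.map do |i|
  Thread.new do
    ctx = LLM::Context.new(llm) # Thread-local context
    ctx.talk("Hello from thread #{i + 1}")
  end
end
threads.each(&:join) # Wait for both conversations to complete
```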
````diff
-
+#### Performance Tuning
 
-
-
-
-
-
-Its most notable feature is that it can act as a closure and has access to
-its surrounding scope, which can be useful in some situations:
+llm.rb's JSON adapter system lets you swap JSON libraries for better
+performance in high-throughput applications. The library supports stdlib JSON,
+Oj, and Yajl, with Oj typically offering the best performance. Additionally,
+you can enable HTTP connection pooling using the optional `net-http-persistent`
+gem to reduce connection overhead in production environments:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
-
-
-  fn.description "Run a shell command"
-  fn.params do |schema|
-    schema.object(command: schema.string.required)
-  end
-  fn.define do |command:|
-    ro, wo = IO.pipe
-    re, we = IO.pipe
-    Process.wait Process.spawn(command, out: wo, err: we)
-    [wo,we].each(&:close)
-    {stderr: re.read, stdout: ro.read}
-  end
-end
-
-ses = LLM::Session.new(llm, tools: [tool])
-ses.talk "Your task is to run shell commands via a tool.", role: :user
-
-ses.talk "What is the current date?", role: :user
-ses.talk ses.functions.map(&:call) # report return value to the LLM
+# Swap JSON libraries for better performance
+LLM.json = :oj # Use Oj for faster JSON parsing
 
-
-
-
-##
-# {stderr: "", stdout: "Thu May 1 10:01:02 UTC 2025"}
-# {stderr: "", stdout: "FreeBSD"}
+# Enable HTTP connection pooling for high-throughput applications
+llm = LLM.openai(key: ENV["KEY"]).persist! # Uses net-http-persistent when available
 ```
 
-####
-
-The [LLM::Tool](https://0x1eef.github.io/x/llm.rb/LLM/Tool.html) class can be used
-to implement a [LLM::Function](https://0x1eef.github.io/x/llm.rb/LLM/Function.html)
-as a class. Under the hood, a subclass of [LLM::Tool](https://0x1eef.github.io/x/llm.rb/LLM/Tool.html)
-wraps an instance of [LLM::Function](https://0x1eef.github.io/x/llm.rb/LLM/Function.html)
-and delegates to it.
+#### Model Registry
 
-
-and
-
-
-
+llm.rb includes a local model registry that provides metadata about model
+capabilities, pricing, and limits without requiring API calls. The registry is
+shipped with the gem and sourced from https://models.dev, giving you access to
+up-to-date information about context windows, token costs, and supported
+modalities for each provider:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
-
-
-
-
-
-  def call(command:)
-    ro, wo = IO.pipe
-    re, we = IO.pipe
-    Process.wait Process.spawn(command, out: wo, err: we)
-    [wo,we].each(&:close)
-    {stderr: re.read, stdout: ro.read}
-  end
-end
-
-llm = LLM.openai(key: ENV["KEY"])
-ses = LLM::Session.new(llm, tools: [System])
-ses.talk "Your task is to run shell commands via a tool.", role: :user
-
-ses.talk "What is the current date?", role: :user
-ses.talk ses.functions.map(&:call) # report return value to the LLM
-
-ses.talk "What operating system am I running?", role: :user
-ses.talk ses.functions.map(&:call) # report return value to the LLM
-
-##
-# {stderr: "", stdout: "Thu May 1 10:01:02 UTC 2025"}
-# {stderr: "", stdout: "FreeBSD"}
+# Access model metadata, capabilities, and pricing
+registry = LLM.registry_for(:openai)
+model_info = registry.limit(model: "gpt-4.1")
+puts "Context window: #{model_info.context} tokens"
+puts "Cost: $#{model_info.cost.input}/1M input tokens"
 ```
 
-
+## More Examples
 
-####
+#### Responses API
 
-
-
-
-it has been uploaded. The file (a specialized instance of
-[LLM::Response](https://0x1eef.github.io/x/llm.rb/LLM/Response.html)
-) is given as part of a prompt that is understood by llm.rb:
+llm.rb also supports OpenAI's Responses API through `llm.responses` and
+`ctx.respond`. This API can maintain response state server-side and can reduce
+how much conversation state needs to be sent on each turn:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
-require "pp"
 
 llm = LLM.openai(key: ENV["KEY"])
-
-file = llm.files.create(file: "/tmp/llm-book.pdf")
-res = ses.talk ["Tell me about this file", file]
-pp res.content
-```
-
-### Prompts
+ctx = LLM::Context.new(llm)
 
-
+ctx.respond("Your task is to answer the user's questions", role: :developer)
+res = ctx.respond("What is the capital of France?")
+puts res.output_text
+```
 
-
-and URLs. With llm.rb you pass those inputs by tagging them with one of
-the following methods. And for multipart prompts, we can pass an array
-where each element is a part of the input. See the example below for
-details, in the meantime here are the methods to know for multimodal
-inputs:
+#### Context Persistence
 
-
-
-
+Contexts can be serialized and restored across process boundaries. This makes
+it possible to persist conversation state in a file, database, or queue and
+resume work later:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
-
-
-
-
+ctx = LLM::Context.new(llm)
+ctx.talk("Hello")
+ctx.talk("Remember that my favorite language is Ruby")
+ctx.save(path: "context.json")
+
+restored = LLM::Context.new(llm)
+restored.restore(path: "context.json")
+res = restored.talk("What is my favorite language?")
+puts res.content
 ```
 
````
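The removed 4.8.0 README also documented file-free persistence: `#to_json` produced a JSON string suitable for a PostgreSQL `jsonb` column or any other storage backend, and `restore(string:)` rebuilt the conversation from it. Assuming `LLM::Context` keeps that API from the old `LLM::Session` (the deserializer was moved with only minor changes, per the file list above), a sketch:

```ruby
#!/usr/bin/env ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])
ctx = LLM::Context.new(llm)
ctx.talk("Hello")

# Serialize to a JSON string instead of a file
json = ctx.to_json

restored = LLM::Context.new(llm)
restored.restore(string: json)
restored.talk("I'm back")
```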
````diff
-
-
-#### Speech
+#### Agents
 
-
-
-
-
-
+Agents in llm.rb are reusable, preconfigured assistants that automatically
+execute tool calls and maintain conversation state. Unlike contexts which
+require manual tool execution, agents automatically handle the tool call loop,
+making them ideal for autonomous workflows where you want the LLM to
+independently use available tools to accomplish tasks:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
+class SystemAdmin < LLM::Agent
+  model "gpt-4.1"
+  instructions "You are a Linux system admin"
+  tools Shell
+  schema Result
+end
+
 llm = LLM.openai(key: ENV["KEY"])
-
-
+agent = SystemAdmin.new(llm)
+res = agent.talk("Run 'date'")
 ```
 
-####
+#### Cost Tracking
 
-
-
-
-
+llm.rb provides built-in cost estimation that works without making additional
+API calls. The cost tracking system uses the local model registry to calculate
+estimated costs based on token usage, giving you visibility into spending
+before bills arrive. This is particularly useful for monitoring usage in
+production applications and setting budget alerts:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
-
-
-
-
+ctx = LLM::Context.new(llm)
+ctx.talk "Hello"
+puts "Estimated cost so far: $#{ctx.cost}"
+ctx.talk "Tell me a joke"
+puts "Estimated cost so far: $#{ctx.cost}"
 ```
 
-####
+#### Multimodal Prompts
 
-
-
-
-and at the time of writing, it can only translate to English:
+Contexts provide helpers for composing multimodal prompts from URLs, local
+files, and provider-managed remote files. These tagged objects let providers
+adapt the input into the format they expect:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
 
 llm = LLM.openai(key: ENV["KEY"])
-
-  file: File.join(Dir.home, "bomdia.mp3")
-)
-puts res.text # => "Good morning."
-```
-
-### Images
+ctx = LLM::Context.new(llm)
 
-
-
-Some but not all LLM providers implement image generation capabilities that
-can create new images from a prompt, or edit an existing image with a
-prompt. The following example uses the OpenAI provider to create an
-image of a dog on a rocket to the moon. The image is then written to
-`${HOME}/dogonrocket.png` as the final step:
-
-```ruby
-#!/usr/bin/env ruby
-require "llm"
-llm = LLM.openai(key: ENV["KEY"])
-res = llm.images.create(prompt: "a dog on a rocket to the moon")
-IO.copy_stream res.images[0], File.join(Dir.home, "dogonrocket.png")
+res = ctx.talk ["Describe this image", ctx.image_url("https://example.com/cat.jpg")]
+puts res.content
 ```
 
-####
+#### Audio Generation
 
-
-
-
-
+llm.rb supports OpenAI's audio API for text-to-speech generation, allowing you
+to create speech from text with configurable voices and output formats. The
+audio API returns binary audio data that can be streamed directly to files or
+other IO objects, enabling integration with multimedia applications:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
+
 llm = LLM.openai(key: ENV["KEY"])
-res = llm.
-
-  prompt: "add a hat to the logo",
-)
-IO.copy_stream res.images[0], File.join(Dir.home, "logo-with-hat.png")
+res = llm.audio.create_speech(input: "Hello world")
+IO.copy_stream res.audio, File.join(Dir.home, "hello.mp3")
 ```
 
-####
+#### Image Generation
 
-
-
-
-
+llm.rb provides access to OpenAI's DALL-E image generation API through a
+unified interface. The API supports multiple response formats including
+base64-encoded images and temporary URLs, with automatic handling of binary
+data streaming for efficient file operations:
 
 ```ruby
 #!/usr/bin/env ruby
 require "llm"
+
 llm = LLM.openai(key: ENV["KEY"])
-res = llm.images.
-
-  n: 5
-)
-res.images.each.with_index do |image, index|
-  IO.copy_stream image,
-    File.join(Dir.home, "logo-variation#{index}.png")
-end
+res = llm.images.create(prompt: "a dog on a rocket to the moon")
+IO.copy_stream res.images[0], File.join(Dir.home, "dogonrocket.png")
 ```
 
-
-
-#### Text
+#### Embeddings
 
-
-
-
-
+llm.rb's embedding API generates vector representations of text for semantic
+search and retrieval-augmented generation (RAG) workflows. The API supports
+batch processing of multiple inputs and returns normalized vectors suitable for
+vector similarity operations, with consistent dimensionality across providers:
 
 ```ruby
 #!/usr/bin/env ruby
@@ -760,52 +553,32 @@ puts res.class
 puts res.embeddings.size
 puts res.embeddings[0].size
 
-##
 # LLM::Response
 # 3
 # 1536
 ```
 
-
+## Real-World Example: Relay
 
-
+See how these pieces come together in a complete application architecture with
+[Relay](https://github.com/llmrb/relay), a production-ready LLM application
+built on llm.rb that demonstrates:
 
-
-
-
-
+- Context management across requests
+- Tool composition and execution
+- Concurrent workflows
+- Cost tracking and observability
+- Production deployment patterns
 
-
-#!/usr/bin/env ruby
-require "llm"
-require "pp"
-
-##
-# List all models
-llm = LLM.openai(key: ENV["KEY"])
-llm.models.all.each do |model|
-  puts "model: #{model.id}"
-end
-
-##
-# Select a model
-model = llm.models.all.find { |m| m.id == "gpt-3.5-turbo" }
-ses = LLM::Session.new(llm, model: model.id)
-res = ses.talk "Hello #{model.id} :)"
-pp res.content
-```
+Watch the screencast:
 
-
+[![llm.rb screencast](https://img.youtube.com/vi/x1K4wMeO_QA/maxresdefault.jpg)](https://www.youtube.com/watch?v=x1K4wMeO_QA)
 
-
+## Installation
 
-
-
-
-
-* [GitHub.com](https://github.com/llmrb/llm.rb)
-* [GitLab.com](https://gitlab.com/llmrb/llm.rb)
-* [Codeberg.org](https://codeberg.org/llmrb/llm.rb)
+```bash
+gem install llm.rb
+```
 
 ## License
 
````