llm.rb 4.14.0 → 4.16.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +83 -0
- data/README.md +93 -28
- data/data/anthropic.json +218 -198
- data/data/deepseek.json +1 -1
- data/data/google.json +481 -429
- data/data/openai.json +742 -704
- data/data/xai.json +277 -277
- data/data/zai.json +160 -126
- data/lib/llm/active_record/acts_as_llm.rb +238 -0
- data/lib/llm/active_record.rb +3 -0
- data/lib/llm/context.rb +15 -10
- data/lib/llm/eventstream/parser.rb +40 -8
- data/lib/llm/provider.rb +16 -1
- data/lib/llm/providers/anthropic/stream_parser.rb +6 -3
- data/lib/llm/providers/google/stream_parser.rb +6 -3
- data/lib/llm/providers/ollama/stream_parser.rb +3 -2
- data/lib/llm/providers/openai/audio.rb +4 -4
- data/lib/llm/providers/openai/files.rb +6 -6
- data/lib/llm/providers/openai/images.rb +4 -4
- data/lib/llm/providers/openai/models.rb +2 -2
- data/lib/llm/providers/openai/moderations.rb +2 -2
- data/lib/llm/providers/openai/responses/stream_parser.rb +216 -91
- data/lib/llm/providers/openai/responses.rb +4 -4
- data/lib/llm/providers/openai/stream_parser.rb +111 -57
- data/lib/llm/providers/openai/vector_stores.rb +12 -12
- data/lib/llm/providers/openai.rb +4 -4
- data/lib/llm/response.rb +12 -4
- data/lib/llm/sequel/plugin.rb +252 -0
- data/lib/llm/stream/queue.rb +2 -2
- data/lib/llm/stream.rb +2 -2
- data/lib/llm/version.rb +1 -1
- data/lib/sequel/plugins/llm.rb +8 -0
- metadata +5 -1
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: …
-  data.tar.gz: …
+  metadata.gz: 793403110075dfcc650d4b0931ebcdfee74787ce1412b61318d7d749b22c3e9f
+  data.tar.gz: 469e1b635896822483e8a6ec7cebaf8c34443b90010435c4ae2d4899fd71c1b4
 SHA512:
-  metadata.gz: …
-  data.tar.gz: …
+  metadata.gz: 515e571ad97704659363a764f633c9199239d0ad1e741b1239e7528cf455160a1246ba756fa16201c47f03b5185939809842eef28aaeb45123aa38c038c23232
+  data.tar.gz: 180e987e00885e15d004965e98c5b70b00481568be43c168431afaad27baed40adfb8c93a9cad10fb96210dad3e768407524422b5edf0c62b011ef57900ac742
```
data/CHANGELOG.md
CHANGED
```diff
@@ -2,8 +2,79 @@
 
 ## Unreleased
 
+Changes since `v4.16.0`.
+
+## v4.16.0
+
+Changes since `v4.15.0`.
+
+This release expands ORM support with built-in ActiveRecord persistence
+and improves compatibility with OpenAI-compatible gateways, proxies, and
+self-hosted servers that use non-standard API root paths.
+
+### Change
+
+* **Support OpenAI-compatible base paths** <br>
+  Add `base_path:` to provider configuration so OpenAI-compatible
+  endpoints can vary both host and API prefix. This supports providers,
+  proxies, and gateways that keep OpenAI request shapes but use
+  non-standard URL layouts such as DeepInfra's `/v1/openai/...`.
+
+* **Add ActiveRecord context persistence with `acts_as_llm`** <br>
+  Add a built-in ActiveRecord wrapper that mirrors the Sequel plugin
+  API so applications can persist `LLM::Context` state on records with
+  default columns, provider/context hooks, validation-backed writes,
+  and `format: :string`, `:json`, or `:jsonb` storage.
+
+## v4.15.0
+
 Changes since `v4.14.0`.
 
+### Change
+
+* **Reduce OpenAI stream parser merge overhead** <br>
+  Special-case the most common single-field deltas, streamline
+  incremental tool-call merging, and avoid repeated JSON parse attempts
+  until streamed tool arguments look complete.
+
+* **Cache streaming callback capabilities in parsers** <br>
+  Cache callback support checks once at parser initialization time in
+  the OpenAI, OpenAI Responses, Anthropic, Google, and Ollama stream
+  parsers instead of repeating `respond_to?` checks on hot streaming
+  paths.
+
+* **Reduce OpenAI Responses parser lookup overhead** <br>
+  Special-case the hot Responses API event paths and cache the current
+  output item and content part so streamed output text deltas do less
+  repeated nested lookup work.
+
+* **Add a Sequel context persistence plugin** <br>
+  Add `plugin :llm` for Sequel models so apps can persist
+  `LLM::Context` state with default columns and pass provider setup
+  through `provider:` when needed. The plugin now also supports
+  `format: :string`, `:json`, or `:jsonb` for text and native JSON
+  storage when Sequel JSON typecasting is enabled.
+
+* **Improve streaming parser performance** <br>
+  In the local replay-based `stream_parser` benchmark versus
+  `v4.14.0` (median of 20 samples, 5000 iterations), plain Ruby is a
+  small overall win: the generic eventstream path is about 0.4%
+  faster, the OpenAI stream parser is about 0.5% faster, and the
+  OpenAI Responses parser is about 1.6% faster, with unchanged
+  allocations. Under YJIT on the same benchmark, the generic
+  eventstream path is about 0.9% faster and the OpenAI stream parser
+  is about 0.4% faster, while the OpenAI Responses parser is about
+  0.7% slower, also with unchanged allocations.
+
+  Compared to `v4.13.0`, the larger `v4.14.0` streaming gains still
+  hold. The generic eventstream path remains dramatically faster than
+  `v4.13.0`, the OpenAI stream parser remains modestly faster, and the
+  OpenAI Responses parser is roughly flat to slightly better depending
+  on runtime. In other words, current keeps the large eventstream win
+  from `v4.14.0`, adds only small incremental changes beyond that, and
+  does not turn the post-`v4.14.0` parser work into another large
+  benchmark jump.
+
 ## v4.14.0
 
 Changes since `v4.13.0`.
```
```diff
@@ -40,6 +111,18 @@ parallel tool calls can safely share one connection.
   worthwhile, which lowers allocation churn in the remaining generic
   SSE path.
 
+* **Improve streaming parser performance** <br>
+  In the local replay-based `stream_parser` benchmark versus `v4.13.0`
+  (median of 20 samples, 5000 iterations):
+  Plain Ruby: the generic eventstream path is about 53% faster with
+  about 32% fewer allocations, the OpenAI stream parser is about 11%
+  faster with about 4% fewer allocations, and the OpenAI Responses
+  parser is about 3% faster with unchanged allocations.
+  YJIT on the current parser benchmark harness: the current tree is
+  about 26% faster than non-YJIT on the generic eventstream path,
+  about 18% faster on the OpenAI stream parser, and about 16% faster
+  on the OpenAI Responses parser, with allocations unchanged.
+
 ### Fix
 
 * **Support parallel MCP tool calls on one client** <br>
```
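The `v4.15.0` parser entries above hinge on one pattern: check what the streaming callback supports once, at initialization, instead of on every streamed event. A generic Ruby sketch of that pattern, not llm.rb's actual parser code; the event shape and callback method names here are invented for illustration:

```ruby
class ExampleStreamParser
  def initialize(callback)
    @callback = callback
    # respond_to? is evaluated once here instead of once per event on
    # the hot streaming path.
    @supports_text = callback.respond_to?(:on_text)
    @supports_tool = callback.respond_to?(:on_tool_call)
  end

  def handle(event)
    case event[:type]
    when :text then @callback.on_text(event[:delta]) if @supports_text
    when :tool then @callback.on_tool_call(event[:data]) if @supports_tool
    end
  end
end
```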
data/README.md
CHANGED
```diff
@@ -4,7 +4,7 @@
 <p align="center">
   <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
   <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.14.0-green.svg?" alt="Version"></a>
+  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.16.0-green.svg?" alt="Version"></a>
 </p>
@@ -17,9 +17,9 @@ state.
 It is built for engineers who want control over how these systems run. llm.rb
 stays close to Ruby, runs on the standard library by default, loads optional
 pieces only when needed, and remains easy to extend. It also works well in
-Rails or ActiveRecord applications, …
-…
-… requests, jobs, or retries.
+Rails or ActiveRecord applications, with built-in `acts_as_llm`, and includes
+built-in Sequel support through `plugin :llm`, so long-lived context state can
+be saved and restored across requests, jobs, or retries.
 
 Most LLM libraries stop at request/response APIs. Building real systems means
 stitching together streaming, tools, state, persistence, and external
@@ -34,7 +34,8 @@ so they compose naturally instead of becoming separate subsystems.
 
 ## Core Concept
 
-`LLM::Context` is the execution boundary in llm.rb.
+[`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
+is the execution boundary in llm.rb.
 
 It holds:
 - message history
@@ -50,69 +51,93 @@ same context object.
 
 ### Execution Model
 
-- **A system layer, not just an API wrapper**
+- **A system layer, not just an API wrapper** <br>
   Put providers, tools, MCP servers, and application APIs behind one runtime
   model instead of stitching them together by hand.
-- **Contexts are central**
+- **Contexts are central** <br>
   Keep history, tools, schema, usage, persistence, and execution state in one
   place instead of spreading them across your app.
-- **Contexts can be serialized**
+- **Contexts can be serialized** <br>
   Save and restore live state for jobs, databases, retries, or long-running
   workflows.
 
 ### Runtime Behavior
 
-- **Streaming and tool execution work together**
+- **Streaming and tool execution work together** <br>
   Start tool work while output is still streaming so you can hide latency
   instead of waiting for turns to finish.
-- **Requests can be interrupted cleanly**
+- **Tool calls have an explicit lifecycle** <br>
+  A tool call can be executed, cancelled through
+  [`LLM::Function#cancel`](https://0x1eef.github.io/x/llm.rb/LLM/Function.html#cancel-instance_method),
+  or left unresolved for manual handling, but the normal runtime contract is
+  still that a model-issued tool request is answered with a tool return.
+- **Requests can be interrupted cleanly** <br>
   Stop in-flight provider work through the same runtime instead of treating
-  cancellation as a separate concern.
-
-- **Concurrency is a first-class feature**
+  cancellation as a separate concern.
+  [`LLM::Context#cancel!`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html#cancel-21-instance_method)
+  is inspired by Go's context cancellation model.
+- **Concurrency is a first-class feature** <br>
   Use threads, fibers, or async tasks without rewriting your tool layer.
-- **Advanced workloads are built in, not bolted on**
+- **Advanced workloads are built in, not bolted on** <br>
   Streaming, concurrent tool execution, persistence, tracing, and MCP support
   all fit the same runtime model.
 
 ### Integration
 
-- **MCP is built in**
+- **MCP is built in** <br>
   Connect to MCP servers over stdio or HTTP without bolting on a separate
   integration stack.
-- **Provider support is broad**
+- **ActiveRecord and Sequel persistence are built in** <br>
+  Use `acts_as_llm` on ActiveRecord models or `plugin :llm` on Sequel models
+  to persist `LLM::Context` state with sensible default columns. Both support
+  `provider:` and `context:` hooks, plus `format: :string` for text columns
+  or `format: :jsonb` for native PostgreSQL JSON storage when ORM JSON
+  typecasting support is enabled.
+- **Persistent HTTP pooling is shared process-wide** <br>
+  When enabled, separate
+  [`LLM::Provider`](https://0x1eef.github.io/x/llm.rb/LLM/Provider.html)
+  instances with the same endpoint settings can share one persistent
+  pool, and separate HTTP
+  [`LLM::MCP`](https://0x1eef.github.io/x/llm.rb/LLM/MCP.html)
+  instances can do the same, instead of each object creating its own
+  isolated per-instance transport.
+- **OpenAI-compatible gateways are supported** <br>
+  Target OpenAI-compatible services such as DeepInfra and OpenRouter, as well
+  as proxies and self-hosted servers, with `host:` and `base_path:` when they
+  preserve OpenAI request shapes but change the API root path.
+- **Provider support is broad** <br>
   Work with OpenAI, OpenAI-compatible endpoints, Anthropic, Google, DeepSeek,
   Z.ai, xAI, llama.cpp, and Ollama through the same runtime.
-- **Tools are explicit**
+- **Tools are explicit** <br>
   Run local tools, provider-native tools, and MCP tools through the same path
   with fewer special cases.
-- **Providers are normalized, not flattened**
+- **Providers are normalized, not flattened** <br>
   Share one API surface across providers without losing access to provider-
   specific capabilities where they matter.
-- **Responses keep a uniform shape**
+- **Responses keep a uniform shape** <br>
   Provider calls return
   [`LLM::Response`](https://0x1eef.github.io/x/llm.rb/LLM/Response.html)
   objects as a common base shape, then extend them with endpoint- or
   provider-specific behavior when needed.
-- **Low-level access is still there**
+- **Low-level access is still there** <br>
   Normalized responses still keep the raw `Net::HTTPResponse` available when
   you need headers, status, or other HTTP details.
-- **Local model metadata is included**
+- **Local model metadata is included** <br>
   Model capabilities, pricing, and limits are available locally without extra
   API calls.
 
 ### Design Philosophy
 
-- **Runs on the stdlib**
+- **Runs on the stdlib** <br>
   Start with Ruby's standard library and add extra dependencies only when you
   need them.
-- **It is highly pluggable**
+- **It is highly pluggable** <br>
   Add tools, swap providers, change JSON backends, plug in tracing, or layer
   internal APIs and MCP servers into the same execution path.
-- **It scales from scripts to long-lived systems**
+- **It scales from scripts to long-lived systems** <br>
   The same primitives work for one-off scripts, background jobs, and more
   demanding application workloads with streaming, persistence, and tracing.
-- **Thread boundaries are clear**
+- **Thread boundaries are clear** <br>
   Providers are shareable. Contexts are stateful and should stay thread-local.
 
 ## Capabilities
```
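The "Persistent HTTP pooling" bullet above describes endpoint-keyed pool sharing. A minimal sketch of what opting in might look like, assuming `LLM.openai` accepts `persistent:` (the same option the ORM examples below pass through `provider:`); the pool itself is internal, not public API:

```ruby
require "llm"

# Two providers with identical endpoint settings, both opted in to
# persistent connections.
a = LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
b = LLM.openai(key: ENV["OPENAI_SECRET"], persistent: true)
# Per the bullet above, a and b can reuse one process-wide persistent
# pool instead of each holding an isolated per-instance transport.
```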
````diff
@@ -145,7 +170,11 @@ same context object.
 gem install llm.rb
 ```
 
-## …
+## Examples
+
+**REPL**
+
+See the [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) for more examples.
 
 ```ruby
 require "llm"
@@ -160,12 +189,48 @@ loop do
 end
 ```
 
+**Sequel (ORM)**
+
+See the [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) for more examples.
+
+```ruby
+require "llm"
+require "sequel"
+require "sequel/plugins/llm"
+
+class Context < Sequel::Model
+  plugin :llm, provider: -> { { key: ENV["#{provider.upcase}_SECRET"], persistent: true } }
+end
+
+ctx = Context.create(provider: "openai", model: "gpt-5.4-mini")
+ctx.talk("Remember that my favorite language is Ruby")
+puts ctx.talk("What is my favorite language?").content
+```
+
+**ActiveRecord (ORM)**
+
+See the [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) for more examples.
+
+```ruby
+require "llm"
+require "active_record"
+require "llm/active_record"
+
+class Context < ApplicationRecord
+  acts_as_llm provider: -> { { key: ENV["#{provider.upcase}_SECRET"], persistent: true } }
+end
+
+ctx = Context.create!(provider: "openai", model: "gpt-5.4-mini")
+ctx.talk("Remember that my favorite language is Ruby")
+puts ctx.talk("What is my favorite language?").content
+```
+
 ## Resources
 
 - [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) is the
   examples guide.
-- […
-  of llm.rb.
+- [relay](https://github.com/llmrb/relay) shows a real application built on
+  top of llm.rb.
 - [doc site](https://0x1eef.github.io/x/llm.rb?rebuild=1) has the API docs.
 
 ## License
````
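The ORM examples above assume a backing table. A hypothetical migration for the ActiveRecord variant; the column set is an assumption drawn from the examples (provider and model are assigned, and context state is persisted), not the schema documented by `acts_as_llm`:

```ruby
class CreateContexts < ActiveRecord::Migration[7.1]
  def change
    create_table :contexts do |t|
      t.string :provider # e.g. "openai"
      t.string :model    # e.g. "gpt-5.4-mini"
      t.text   :context  # serialized LLM::Context state (format: :string); name assumed
      t.timestamps
    end
  end
end
```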