llm.rb 4.14.0 → 4.16.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
- metadata.gz: ea1addf0bff644fa11e4f69a806f8ff5b7aa04fbbbc3f0592bd51b6ebc07f0f8
- data.tar.gz: a3c846b9744e4ef230e2f23ed6ab42f6b4c84a0165b8bc066b7f6a003ee8fc00
+ metadata.gz: 793403110075dfcc650d4b0931ebcdfee74787ce1412b61318d7d749b22c3e9f
+ data.tar.gz: 469e1b635896822483e8a6ec7cebaf8c34443b90010435c4ae2d4899fd71c1b4
 SHA512:
- metadata.gz: 7387da06d824d42753ff30455b0e464b7ca6eaa43e9410ce814ad96451c5595154d1e721fb69c9edc0971208aaf8a011ce42078827b57971e0e7c0a66eb0db6e
- data.tar.gz: 590442f434086b7215d664e6b5d474130499a14fba16810ff7e0b04878d25e46ca8983057af5fd9275d8415d95da6e1439b84388fa450b8c06bc7841c832a48e
+ metadata.gz: 515e571ad97704659363a764f633c9199239d0ad1e741b1239e7528cf455160a1246ba756fa16201c47f03b5185939809842eef28aaeb45123aa38c038c23232
+ data.tar.gz: 180e987e00885e15d004965e98c5b70b00481568be43c168431afaad27baed40adfb8c93a9cad10fb96210dad3e768407524422b5edf0c62b011ef57900ac742
data/CHANGELOG.md CHANGED
@@ -2,8 +2,79 @@
 
  ## Unreleased
 
+ Changes since `v4.16.0`.
+
+ ## v4.16.0
+
+ Changes since `v4.15.0`.
+
+ This release expands ORM support with built-in ActiveRecord persistence
+ and improves compatibility with OpenAI-compatible gateways, proxies, and
+ self-hosted servers that use non-standard API root paths.
+
+ ### Change
+
+ * **Support OpenAI-compatible base paths** <br>
+ Add `base_path:` to provider configuration so OpenAI-compatible
+ endpoints can vary both host and API prefix. This supports providers,
+ proxies, and gateways that keep OpenAI request shapes but use
+ non-standard URL layouts such as DeepInfra's `/v1/openai/...`
+ (see the sketch after this list).
+
+ * **Add ActiveRecord context persistence with `acts_as_llm`** <br>
+ Add a built-in ActiveRecord wrapper that mirrors the Sequel plugin
+ API so applications can persist `LLM::Context` state on records with
+ default columns, provider/context hooks, validation-backed writes,
+ and `format: :string`, `:json`, or `:jsonb` storage.
+
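As a rough illustration of the `base_path:` entry above, the sketch below points an OpenAI-compatible provider at DeepInfra's non-standard API root. It assumes a constructor along the lines of `LLM.openai` that accepts `key:`, `host:`, and the new `base_path:` option; treat the exact constructor name and keyword set as assumptions and check the API docs.

```ruby
require "llm"

# Hypothetical sketch: an OpenAI-compatible gateway that keeps OpenAI
# request shapes but serves them under /v1/openai/... instead of /v1/...
llm = LLM.openai(
  key: ENV["DEEPINFRA_SECRET"],   # gateway API key (name is illustrative)
  host: "api.deepinfra.com",      # OpenAI-compatible host
  base_path: "/v1/openai"         # non-standard API root path (new in v4.16.0)
)
```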
+ ## v4.15.0
+
  Changes since `v4.14.0`.
 
+ ### Change
+
+ * **Reduce OpenAI stream parser merge overhead** <br>
+ Special-case the most common single-field deltas, streamline
+ incremental tool-call merging, and avoid repeated JSON parse attempts
+ until streamed tool arguments look complete.
+
+ * **Cache streaming callback capabilities in parsers** <br>
+ Cache callback support checks once at parser initialization time in
+ the OpenAI, OpenAI Responses, Anthropic, Google, and Ollama stream
+ parsers instead of repeating `respond_to?` checks on hot streaming
+ paths.
+
+ * **Reduce OpenAI Responses parser lookup overhead** <br>
+ Special-case the hot Responses API event paths and cache the current
+ output item and content part so streamed output text deltas do less
+ repeated nested lookup work.
+
+ * **Add a Sequel context persistence plugin** <br>
+ Add `plugin :llm` for Sequel models so apps can persist
+ `LLM::Context` state with default columns and pass provider setup
+ through `provider:` when needed. The plugin now also supports
+ `format: :string`, `:json`, or `:jsonb` for text and native JSON
+ storage when Sequel JSON typecasting is enabled (see the sketch
+ after this list).
+
+ * **Improve streaming parser performance** <br>
+ In the local replay-based `stream_parser` benchmark versus
+ `v4.14.0` (median of 20 samples, 5000 iterations), plain Ruby is a
+ small overall win: the generic eventstream path is about 0.4%
+ faster, the OpenAI stream parser is about 0.5% faster, and the
+ OpenAI Responses parser is about 1.6% faster, with unchanged
+ allocations. Under YJIT on the same benchmark, the generic
+ eventstream path is about 0.9% faster and the OpenAI stream parser
+ is about 0.4% faster, while the OpenAI Responses parser is about
+ 0.7% slower, also with unchanged allocations.
+
+ Compared to `v4.13.0`, the larger `v4.14.0` streaming gains still
+ hold. The generic eventstream path remains dramatically faster than
+ `v4.13.0`, the OpenAI stream parser remains modestly faster, and the
+ OpenAI Responses parser is roughly flat to slightly better depending
+ on runtime. In other words, this release keeps the large eventstream
+ win from `v4.14.0`, adds only small incremental changes beyond that,
+ and does not turn the post-`v4.14.0` parser work into another large
+ benchmark jump.
+
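To make the `format:` option in the Sequel plugin entry above concrete, here is a small sketch of a Sequel model that stores context state in a native `jsonb` column. It reuses the `plugin :llm`, `provider:`, and `format:` keywords named in that entry; the table name, column setup, and connection details are illustrative assumptions rather than the plugin's documented defaults.

```ruby
require "llm"
require "sequel"
require "sequel/plugins/llm"

DB = Sequel.connect(ENV["DATABASE_URL"])
DB.extension :pg_json # enable Sequel's JSON typecasting for jsonb columns

class Context < Sequel::Model
  # Store LLM::Context state as jsonb instead of the default text format.
  plugin :llm,
    format: :jsonb,
    provider: -> { { key: ENV["#{provider.upcase}_SECRET"] } }
end
```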
  ## v4.14.0
 
  Changes since `v4.13.0`.
@@ -40,6 +111,18 @@ parallel tool calls can safely share one connection.
  worthwhile, which lowers allocation churn in the remaining generic
  SSE path.
 
+ * **Improve streaming parser performance** <br>
+ In the local replay-based `stream_parser` benchmark versus `v4.13.0`
+ (median of 20 samples, 5000 iterations):
+ Plain Ruby: the generic eventstream path is about 53% faster with
+ about 32% fewer allocations, the OpenAI stream parser is about 11%
+ faster with about 4% fewer allocations, and the OpenAI Responses
+ parser is about 3% faster with unchanged allocations.
+ YJIT on the current parser benchmark harness: the current tree is
+ about 26% faster than non-YJIT on the generic eventstream path,
+ about 18% faster on the OpenAI stream parser, and about 16% faster
+ on the OpenAI Responses parser, with allocations unchanged.
+
  ### Fix
 
  * **Support parallel MCP tool calls on one client** <br>
data/README.md CHANGED
@@ -4,7 +4,7 @@
  <p align="center">
  <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
  <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
- <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.14.0-green.svg?" alt="Version"></a>
+ <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.16.0-green.svg?" alt="Version"></a>
  </p>
 
  ## About
@@ -17,9 +17,9 @@ state.
  It is built for engineers who want control over how these systems run. llm.rb
  stays close to Ruby, runs on the standard library by default, loads optional
  pieces only when needed, and remains easy to extend. It also works well in
- Rails or ActiveRecord applications, where a small wrapper around context
- persistence is enough to save and restore long-lived conversation state across
- requests, jobs, or retries.
+ Rails or ActiveRecord applications through the built-in `acts_as_llm` wrapper,
+ and includes Sequel support through `plugin :llm`, so long-lived context state
+ can be saved and restored across requests, jobs, or retries.
 
  Most LLM libraries stop at request/response APIs. Building real systems means
  stitching together streaming, tools, state, persistence, and external
@@ -34,7 +34,8 @@ so they compose naturally instead of becoming separate subsystems.
 
  ## Core Concept
 
- `LLM::Context` is the execution boundary in llm.rb.
+ [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
+ is the execution boundary in llm.rb.
 
  It holds:
  - message history
@@ -50,69 +51,93 @@ same context object.
 
  ### Execution Model
 
- - **A system layer, not just an API wrapper**
+ - **A system layer, not just an API wrapper** <br>
  Put providers, tools, MCP servers, and application APIs behind one runtime
  model instead of stitching them together by hand.
- - **Contexts are central**
+ - **Contexts are central** <br>
  Keep history, tools, schema, usage, persistence, and execution state in one
  place instead of spreading them across your app.
- - **Contexts can be serialized**
+ - **Contexts can be serialized** <br>
  Save and restore live state for jobs, databases, retries, or long-running
  workflows.
 
  ### Runtime Behavior
 
- - **Streaming and tool execution work together**
+ - **Streaming and tool execution work together** <br>
  Start tool work while output is still streaming so you can hide latency
  instead of waiting for turns to finish.
- - **Requests can be interrupted cleanly**
+ - **Tool calls have an explicit lifecycle** <br>
+ A tool call can be executed, cancelled through
+ [`LLM::Function#cancel`](https://0x1eef.github.io/x/llm.rb/LLM/Function.html#cancel-instance_method),
+ or left unresolved for manual handling, but the normal runtime contract is
+ still that a model-issued tool request is answered with a tool return.
+ - **Requests can be interrupted cleanly** <br>
  Stop in-flight provider work through the same runtime instead of treating
- cancellation as a separate concern. `LLM::Context#cancel!` is inspired by
- Go's context cancellation model.
- - **Concurrency is a first-class feature**
+ cancellation as a separate concern.
+ [`LLM::Context#cancel!`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html#cancel-21-instance_method)
+ is inspired by Go's context cancellation model.
+ - **Concurrency is a first-class feature** <br>
  Use threads, fibers, or async tasks without rewriting your tool layer.
- - **Advanced workloads are built in, not bolted on**
+ - **Advanced workloads are built in, not bolted on** <br>
  Streaming, concurrent tool execution, persistence, tracing, and MCP support
  all fit the same runtime model.
 
  ### Integration
 
- - **MCP is built in**
+ - **MCP is built in** <br>
  Connect to MCP servers over stdio or HTTP without bolting on a separate
  integration stack.
- - **Provider support is broad**
+ - **ActiveRecord and Sequel persistence are built in** <br>
+ Use `acts_as_llm` on ActiveRecord models or `plugin :llm` on Sequel models
+ to persist `LLM::Context` state with sensible default columns. Both support
+ `provider:` and `context:` hooks, plus `format: :string` for text columns
+ or `format: :jsonb` for native PostgreSQL JSON storage when ORM JSON
+ typecasting support is enabled.
+ - **Persistent HTTP pooling is shared process-wide** <br>
+ When enabled, separate
+ [`LLM::Provider`](https://0x1eef.github.io/x/llm.rb/LLM/Provider.html)
+ instances with the same endpoint settings can share one persistent
+ pool, and separate HTTP
+ [`LLM::MCP`](https://0x1eef.github.io/x/llm.rb/LLM/MCP.html)
+ instances can do the same, instead of each object creating its own
+ isolated per-instance transport (see the sketch after this list).
+ - **OpenAI-compatible gateways are supported** <br>
+ Target OpenAI-compatible services such as DeepInfra and OpenRouter, as well
+ as proxies and self-hosted servers, with `host:` and `base_path:` when they
+ preserve OpenAI request shapes but change the API root path.
+ - **Provider support is broad** <br>
  Work with OpenAI, OpenAI-compatible endpoints, Anthropic, Google, DeepSeek,
  Z.ai, xAI, llama.cpp, and Ollama through the same runtime.
- - **Tools are explicit**
+ - **Tools are explicit** <br>
  Run local tools, provider-native tools, and MCP tools through the same path
  with fewer special cases.
- - **Providers are normalized, not flattened**
+ - **Providers are normalized, not flattened** <br>
  Share one API surface across providers without losing access to provider-
  specific capabilities where they matter.
- - **Responses keep a uniform shape**
+ - **Responses keep a uniform shape** <br>
  Provider calls return
  [`LLM::Response`](https://0x1eef.github.io/x/llm.rb/LLM/Response.html)
  objects as a common base shape, then extend them with endpoint- or
  provider-specific behavior when needed.
- - **Low-level access is still there**
+ - **Low-level access is still there** <br>
  Normalized responses still keep the raw `Net::HTTPResponse` available when
  you need headers, status, or other HTTP details.
- - **Local model metadata is included**
+ - **Local model metadata is included** <br>
  Model capabilities, pricing, and limits are available locally without extra
  API calls.
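The sketch below ties the pooling and gateway bullets above together: two provider instances configured with the same endpoint settings and `persistent: true`, pointed at an OpenAI-compatible gateway via `host:` and `base_path:`. The `LLM.openai` constructor name and exact keyword set are assumptions; `persistent: true` mirrors the ORM examples further down, so consult the API docs for the authoritative signature.

```ruby
require "llm"

# Hypothetical sketch: both providers use the same endpoint settings,
# so with persistent: true they can share one process-wide HTTP pool
# instead of each holding an isolated per-instance connection.
settings = {
  key: ENV["OPENROUTER_SECRET"], # gateway API key (name is illustrative)
  host: "openrouter.ai",         # OpenAI-compatible gateway host
  base_path: "/api/v1",          # non-standard API root path
  persistent: true               # opt in to persistent HTTP pooling
}

provider_a = LLM.openai(**settings)
provider_b = LLM.openai(**settings) # can reuse the same pool as provider_a
```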
 
  ### Design Philosophy
 
- - **Runs on the stdlib**
+ - **Runs on the stdlib** <br>
  Start with Ruby's standard library and add extra dependencies only when you
  need them.
- - **It is highly pluggable**
+ - **It is highly pluggable** <br>
  Add tools, swap providers, change JSON backends, plug in tracing, or layer
  internal APIs and MCP servers into the same execution path.
- - **It scales from scripts to long-lived systems**
+ - **It scales from scripts to long-lived systems** <br>
  The same primitives work for one-off scripts, background jobs, and more
  demanding application workloads with streaming, persistence, and tracing.
- - **Thread boundaries are clear**
+ - **Thread boundaries are clear** <br>
  Providers are shareable. Contexts are stateful and should stay thread-local.
 
  ## Capabilities
@@ -145,7 +170,11 @@ same context object.
  gem install llm.rb
  ```
 
- ## Example
+ ## Examples
+
+ **REPL**
+
+ See the [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) for more examples.
 
  ```ruby
  require "llm"
@@ -160,12 +189,48 @@ loop do
  end
  ```
 
+ **Sequel (ORM)**
+
+ See the [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) for more examples.
+
+ ```ruby
+ require "llm"
+ require "sequel"
+ require "sequel/plugins/llm"
+
+ class Context < Sequel::Model
+   plugin :llm, provider: -> { { key: ENV["#{provider.upcase}_SECRET"], persistent: true } }
+ end
+
+ ctx = Context.create(provider: "openai", model: "gpt-5.4-mini")
+ ctx.talk("Remember that my favorite language is Ruby")
+ puts ctx.talk("What is my favorite language?").content
+ ```
+
+ **ActiveRecord (ORM)**
+
+ See the [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) for more examples.
+
+ ```ruby
+ require "llm"
+ require "active_record"
+ require "llm/active_record"
+
+ class Context < ApplicationRecord
+   acts_as_llm provider: -> { { key: ENV["#{provider.upcase}_SECRET"], persistent: true } }
+ end
+
+ ctx = Context.create!(provider: "openai", model: "gpt-5.4-mini")
+ ctx.talk("Remember that my favorite language is Ruby")
+ puts ctx.talk("What is my favorite language?").content
+ ```
+
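The ORM examples above assume a backing table already exists. The migration below is a rough sketch of what such a table might look like for the ActiveRecord example: the `provider` and `model` columns follow the `Context.create!` call above, while the `messages` column name and the use of a plain text column for serialized context state are illustrative assumptions; check the gem's documentation for the actual default columns expected by `acts_as_llm`.

```ruby
class CreateContexts < ActiveRecord::Migration[8.0]
  def change
    create_table :contexts do |t|
      t.string :provider, null: false  # e.g. "openai"
      t.string :model, null: false     # e.g. "gpt-5.4-mini"
      t.text :messages                 # serialized LLM::Context state (assumed column name)
      t.timestamps
    end
  end
end
```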
  ## Resources
 
  - [deepdive](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) is the
  examples guide.
- - [_examples/relay](./_examples/relay) shows a real application built on top
- of llm.rb.
+ - [relay](https://github.com/llmrb/relay) shows a real application built on
+ top of llm.rb.
  - [doc site](https://0x1eef.github.io/x/llm.rb?rebuild=1) has the API docs.
 
  ## License