llm.rb 4.22.0 → 4.23.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 96698cb3af793b0bd83cae7635279cefbff24f86b11f59c9209edd76f76b757c
-  data.tar.gz: 389e4372ab3b4a2e90020e6e2e838b5a36516d5a5dd82a71243975dfe6f8f959
+  metadata.gz: 49ed8077a6283802d4141dcb9ec037c7fc46920ebd3273b30c55624b575f3156
+  data.tar.gz: e2289baf740ba9603ed1c308414e632ddda296356659c8714bf3a1744c216104
 SHA512:
-  metadata.gz: 6bd4fa02802333bbb925db2e513913bd1669e8a4d7c85d8cb76b88399e9b0e84bfd5ddf922c7816a2afd0c0d76d6a9f8c873702c789665dfe3205ada01d34203
-  data.tar.gz: 0d579386ead2158a4e7ad4991ff0c025758ac51624947d07e5d112779d46cb36bcabdd492ac20bbabc981b3e75e25300d04ba8b86808e4825b5c66e2186e52ae
+  metadata.gz: b6b0d72baa785a6bf25cbfd3f2581d7f6a5850a0fa61dea29668596e19eb8a1142330f8acfea7f04a1bc76461c02c0af681588332d955aae2b5c6808f2fc0610
+  data.tar.gz: 836fc45489b9d86c7bde3ed2b94d2813be5bdaea1ebf7697f7e7eca5962f5374343e371188e40ced180ed50e053cd74ec8fcec8dea08c164291ee8577301f195
data/CHANGELOG.md CHANGED
@@ -2,8 +2,37 @@
 
 ## Unreleased
 
+Changes since `v4.23.0`.
+
+## v4.23.0
+
 Changes since `v4.22.0`.
 
+This release expands llm.rb's runtime surface for long-lived contexts and
+stateful tools. It adds built-in context compaction through `LLM::Compactor`,
+lets explicit `tools:` arrays accept bound `LLM::Tool` instances, and fixes
+OpenAI-compatible no-arg tool schemas for stricter providers such as xAI.
+
+### Change
+
+* **Add `LLM::Compactor` for long-lived contexts** <br>
+  Add built-in context compaction through `LLM::Compactor`, so older history
+  can be summarized, retained windows can stay bounded, compaction can run on
+  its own `model:`, and `LLM::Stream` can observe the lifecycle through
+  `on_compaction` and `on_compaction_finish`.
+
+* **Allow bound tool instances in explicit tool lists** <br>
+  Let explicit `tools:` arrays accept `LLM::Tool` instances such as
+  `MyTool.new(foo: 1)`, so tools can carry bound state without changing the
+  global tool registry model.
+
+### Fix
+
+* **Fix xAI/OpenAI-compatible no-arg tool schemas** <br>
+  Send an empty object schema for tools without declared parameters instead
+  of `null`, so stricter providers such as xAI accept mixed tool sets that
+  include no-arg tools.
+
 ## v4.22.0
 
 Changes since `v4.21.0`.
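
The bound-tool entry above is easiest to see in code. A minimal sketch: `MyTool`, its `foo:` initializer, and the surrounding setup are illustrative rather than part of llm.rb; the release notes only confirm that `LLM::Tool` instances are accepted in explicit `tools:` arrays.

```ruby
# Hypothetical example: MyTool and its initializer are illustrative.
# The changelog confirms that instances such as MyTool.new(foo: 1) are
# accepted wherever an explicit tools: array is given.
class MyTool < LLM::Tool
  def initialize(foo:)
    @foo = foo # instance-bound state carried into tool execution
  end
end

ctx = LLM::Context.new(llm, tools: [MyTool.new(foo: 1)])
```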
data/README.md CHANGED
@@ -4,7 +4,7 @@
 <p align="center">
   <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
   <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.21.0-green.svg?" alt="Version"></a>
+  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.23.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About
@@ -147,12 +147,34 @@ ctx.talk("Remember that my favorite language is Ruby.")
 ctx.save(path: "context.json")
 ```
 
+#### Context Compaction
+
+Long-lived contexts can compact older history into a summary instead of
+growing forever. Compaction is built into [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
+through [`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html),
+and when a stream is present it emits `on_compaction` and
+`on_compaction_finish` through [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html).
+The compactor can also use a different model from the main context, which is
+useful when you want summarization to run on a cheaper or faster model.
+
+```ruby
+ctx = LLM::Context.new(
+  llm,
+  compactor: {
+    message_threshold: 200,
+    retention_window: 8,
+    model: "gpt-5.4-mini"
+  }
+)
+```
+
 #### LLM::Stream
 
 `LLM::Stream` is not just for printing tokens. It supports `on_content`,
-`on_reasoning_content`, `on_tool_call`, and `on_tool_return`, which means
-visible output, reasoning output, and tool execution can all be driven through
-the same execution path.
+`on_reasoning_content`, `on_tool_call`, `on_tool_return`, `on_compaction`,
+and `on_compaction_finish`, which means visible output, reasoning output, tool
+execution, and context compaction can all be driven through the same
+execution path.
 
 ```ruby
 class Stream < LLM::Stream
@@ -350,6 +372,7 @@ Runtime Building Blocks:
 - **Agents** — reusable assistants with tool auto-execution
 - **Skills** — directory-backed capabilities loaded from `SKILL.md`
 - **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
+- **Context Compaction** — summarize older history in long-lived contexts
 
 Data and Structure:
 - **Structured Outputs** — JSON Schema-based responses
@@ -464,6 +487,42 @@ ctx.talk("Run `date` and `uname -a`.")
 ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
 ```
 
+#### Context Compaction
+
+This example uses [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html),
+[`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html), and
+[`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) together so
+long-lived contexts can summarize older history and expose the lifecycle
+through stream hooks. This approach is inspired by General Intelligence
+Systems' [Brute](https://github.com/general-intelligence-systems/brute). The
+compactor can also use its own `model:` if you want summarization to run on a
+different model from the main context. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
+
+```ruby
+require "llm"
+
+class Stream < LLM::Stream
+  def on_compaction(ctx, compactor)
+    puts "Compacting #{ctx.messages.size} messages..."
+  end
+
+  def on_compaction_finish(ctx, compactor)
+    puts "Compacted to #{ctx.messages.size} messages."
+  end
+end
+
+llm = LLM.openai(key: ENV["KEY"])
+ctx = LLM::Context.new(
+  llm,
+  stream: Stream.new,
+  compactor: {
+    message_threshold: 200,
+    retention_window: 8,
+    model: "gpt-5.4-mini"
+  }
+)
+```
+
 #### Reasoning
 
 This example uses [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) with the OpenAI Responses API so reasoning output is streamed separately from visible assistant output. See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
data/lib/llm/buffer.rb CHANGED
@@ -23,6 +23,16 @@ module LLM
       @messages.concat(ary)
     end
 
+    ##
+    # Replace the tracked messages
+    # @param [Array<LLM::Message>] messages
+    #  The replacement messages
+    # @return [LLM::Buffer]
+    def replace(messages)
+      @messages.replace(messages)
+      self
+    end
+
     ##
     # @yield [LLM::Message]
     #  Yields each message in the conversation thread
data/lib/llm/compactor.rb ADDED
@@ -0,0 +1,128 @@
+# frozen_string_literal: true
+
+##
+# {LLM::Compactor LLM::Compactor} summarizes older context messages into a
+# smaller replacement message when a context grows too large.
+#
+# This work is directly inspired by the compaction approach developed by
+# General Intelligence Systems in
+# [Brute](https://github.com/general-intelligence-systems/brute).
+#
+# The compactor can also use a different model from the main context by
+# setting `model:` in the compactor config. By default, `token_threshold` is
+# 10% less than the current context window, or `100_000` when the context
+# window is unknown. Set `message_threshold:` or `token_threshold:` to `nil`
+# to disable that constraint.
+class LLM::Compactor
+  DEFAULT_TOKEN_THRESHOLD = 100_000
+  DEFAULTS = {
+    message_threshold: 200,
+    retention_window: 8,
+    model: nil
+  }.freeze
+
+  ##
+  # @return [Hash]
+  attr_reader :config
+
+  ##
+  # @param [LLM::Context] ctx
+  # @param [Hash] config
+  # @option config [Integer] :token_threshold
+  #  Defaults to 10% less than the current context window, or `100_000` when
+  #  the context window is unknown. Set to `nil` to disable token-based
+  #  compaction.
+  # @option config [Integer] :message_threshold
+  #  Set to `nil` to disable message-count-based compaction.
+  # @option config [Integer] :retention_window
+  # @option config [String, nil] :model
+  #  The model to use for the summarization request. Defaults to the current
+  #  context model.
+  def initialize(ctx, **config)
+    @ctx = ctx
+    @config = DEFAULTS.merge(token_threshold: default_token_threshold).merge(config)
+  end
+
+  ##
+  # Returns true when the context should be compacted
+  # @param [Object] prompt
+  #  The next prompt or turn input
+  # @return [Boolean]
+  def compact?(prompt = nil)
+    return false if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
+    messages = ctx.messages.reject(&:system?)
+    return true if config[:message_threshold] && messages.size > config[:message_threshold]
+    usage = ctx.usage
+    return true if config[:token_threshold] && usage && usage.total_tokens > config[:token_threshold]
+    false
+  end
+
+  ##
+  # Summarize older messages and replace them with a compact summary.
+  # @param [Object] prompt
+  #  The next prompt or turn input
+  # @return [LLM::Message, nil]
+  def compact!(prompt = nil)
+    return nil if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
+    messages = ctx.messages.reject(&:system?)
+    retention_window = [config[:retention_window], messages.size].min
+    return nil unless messages.size > retention_window
+    stream = ctx.params[:stream]
+    stream.on_compaction(ctx, self) if LLM::Stream === stream
+    recent = retained_messages
+    older = messages[0...(messages.size - recent.size)]
+    summary = LLM::Message.new(ctx.llm.user_role, "[Previous conversation summary]\n\n#{summarize(older)}")
+    ctx.messages.replace([*ctx.messages.take_while(&:system?), summary, *recent])
+    stream.on_compaction_finish(ctx, self) if LLM::Stream === stream
+    summary
+  end
+
+  private
+
+  attr_reader :ctx
+
+  def default_token_threshold
+    window = ctx.context_window
+    return DEFAULT_TOKEN_THRESHOLD if window.zero?
+    window - (window / 10)
+  end
+
+  def retained_messages
+    messages = ctx.messages.reject(&:system?)
+    retention_window = [config[:retention_window], messages.size].min
+    start = [messages.size - retention_window, 0].max
+    start -= 1 while start > 0 && messages[start].tool_return?
+    messages[start..] || []
+  end
+
+  def summarize(messages)
+    model = config[:model] || ctx.params[:model] || ctx.llm.default_model
+    ctx.llm.complete(summary_prompt(messages), model:).content
+  end
+
+  def summary_prompt(messages)
+    <<~PROMPT
+      Summarize this conversation history for context continuity.
+      The summary will replace these messages in the context window.
+
+      Focus on:
+      - What the user asked for
+      - Important facts and decisions
+      - Tool calls and outcomes that still matter
+      - What should happen next
+
+      Conversation:
+      #{serialize(messages)}
+    PROMPT
+  end
+
+  def serialize(messages)
+    messages.map do |message|
+      content = case message.content
+                when Array then message.content.map(&:inspect).join(", ")
+                else message.content.to_s
+                end
+      "#{message.role}: #{content.empty? ? "(empty)" : content}"
+    end.join("\n---\n")
+  end
+end
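
As a worked illustration of `default_token_threshold` above: with a known context window the default lands 10% below it, and an unknown (zero) window falls back to `DEFAULT_TOKEN_THRESHOLD`. The 128_000-token window below is an arbitrary example value, not something the diff specifies.

```ruby
# Worked sketch of the default threshold math shown above;
# a 128_000-token context window is an arbitrary example.
window = 128_000
window - (window / 10) #=> 115_200 (10% below the window)

# An unknown window is reported as zero, so the default falls back
# to DEFAULT_TOKEN_THRESHOLD, i.e. 100_000 tokens.
```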
data/lib/llm/context.rb CHANGED
@@ -34,6 +34,7 @@ module LLM
   #   ctx.talk(prompt)
   #   ctx.messages.each { |m| puts "[#{m.role}] #{m.content}" }
   class Context
+    require_relative "compactor"
     require_relative "context/serializer"
     require_relative "context/deserializer"
     include Serializer
@@ -75,12 +76,24 @@ module LLM
     def initialize(llm, params = {})
       @llm = llm
       @mode = params.delete(:mode) || :completions
+      @compactor = params.delete(:compactor)
       tools = [*params.delete(:tools), *load_skills(params.delete(:skills))]
       @params = {model: llm.default_model, schema: nil}.compact.merge!(params)
       @params[:tools] = tools unless tools.empty?
       @messages = LLM::Buffer.new(llm)
     end
 
+    ##
+    # Returns a context compactor
+    # This feature is inspired by the compaction approach developed by
+    # General Intelligence Systems in
+    # [Brute](https://github.com/general-intelligence-systems/brute).
+    # @return [LLM::Compactor]
+    def compactor
+      @compactor = LLM::Compactor.new(self, **(@compactor || {})) unless LLM::Compactor === @compactor
+      @compactor
+    end
+
     ##
     # Interact with the context via the chat completions API.
     # This method immediately sends a request to the LLM and returns the response.
@@ -96,6 +109,7 @@ module LLM
     def talk(prompt, params = {})
       return respond(prompt, params) if mode == :responses
       @owner = Fiber.current
+      compactor.compact!(prompt) if compactor.compact?(prompt)
       params = params.merge(messages: @messages.to_a)
       params = @params.merge(params)
       bind!(params[:stream], params[:model])
@@ -123,6 +137,7 @@ module LLM
     #   puts res.output_text
     def respond(prompt, params = {})
       @owner = Fiber.current
+      compactor.compact!(prompt) if compactor.compact?(prompt)
       params = @params.merge(params)
       bind!(params[:stream], params[:model])
       res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
@@ -224,7 +239,14 @@ module LLM
     #  messages.
     # @return [LLM::Object, nil]
     def usage
-      @messages.find(&:assistant?)&.usage
+      usage = @messages.find(&:assistant?)&.usage
+      return unless usage
+      LLM::Object.from(
+        input_tokens: usage.input_tokens || 0,
+        output_tokens: usage.output_tokens || 0,
+        reasoning_tokens: usage.reasoning_tokens || 0,
+        total_tokens: usage.total_tokens || 0
+      )
     end
 
     ##
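
The reworked `Context#usage` matters for the compaction path above: `Compactor#compact?` compares `usage.total_tokens` against `token_threshold`, so fields a provider omits must come back as `0` rather than `nil`. A sketch, assuming a provider whose usage payload omits `reasoning_tokens`:

```ruby
# Sketch: assumes a provider whose usage payload omits reasoning_tokens.
usage = ctx.usage
usage.reasoning_tokens #=> 0 (previously nil was passed through)
usage.total_tokens     #=> always an Integer, safe for the > comparison
                       #   inside Compactor#compact?
```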
data/lib/llm/function.rb CHANGED
@@ -266,9 +266,10 @@ class LLM::Function
        parameters: (@params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false), strict: false
      }.compact
    else
+      params = @params || {type: "object", properties: {}}
      {
        type: "function", name: @name,
-        function: {name: @name, description: @description, parameters: @params}
+        function: {name: @name, description: @description, parameters: params}
      }.compact
    end
  end
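
Concretely, the fix changes the serialized tool definition for a no-arg tool. Previously `parameters` took `@params` directly (`nil` for a tool with no declared parameters), and since the outer `.compact` does not reach the nested `function:` hash, it serialized as JSON `null`; stricter providers such as xAI reject that. A sketch of the payload after the fix, with an illustrative tool name and description:

```ruby
# Illustrative payload for a no-arg tool after the fix; the "ping"
# name and description are hypothetical. parameters is now an empty
# object schema instead of serializing as null.
{
  type: "function", name: "ping",
  function: {
    name: "ping",
    description: "A no-arg example tool",
    parameters: {type: "object", properties: {}}
  }
}
```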
data/lib/llm/stream.rb CHANGED
@@ -18,7 +18,8 @@ module LLM
   #
   # The most common callback is {#on_content}, which also maps to {#<<}.
   # Providers may also call {#on_reasoning_content} and {#on_tool_call} when
-  # that data is available.
+  # that data is available. Runtime features such as context compaction may
+  # also emit lifecycle callbacks like {#on_compaction}.
   class Stream
     require_relative "stream/queue"
 
@@ -103,6 +104,24 @@ module LLM
       nil
     end
 
+    ##
+    # Called before a context compaction starts.
+    # @param [LLM::Context] ctx
+    # @param [LLM::Compactor] compactor
+    # @return [nil]
+    def on_compaction(ctx, compactor)
+      nil
+    end
+
+    ##
+    # Called after a context compaction finishes.
+    # @param [LLM::Context] ctx
+    # @param [LLM::Compactor] compactor
+    # @return [nil]
+    def on_compaction_finish(ctx, compactor)
+      nil
+    end
+
     # @endgroup
 
     # @group Error handlers
data/lib/llm/tool.rb CHANGED
@@ -171,4 +171,18 @@ class LLM::Tool
   def self.mcp?
     false
   end
+
+  ##
+  # Returns a function bound to this tool instance.
+  # @return [LLM::Function]
+  def function
+    @function ||= self.class.function.dup.tap { _1.register(self) }
+  end
+
+  ##
+  # Returns true if the tool is an MCP tool
+  # @return [Boolean]
+  def mcp?
+    self.class.mcp?
+  end
 end
data/lib/llm/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module LLM
-  VERSION = "4.22.0"
+  VERSION = "4.23.0"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm.rb
 version: !ruby/object:Gem::Version
-  version: 4.22.0
+  version: 4.23.0
 platform: ruby
 authors:
 - Antar Azri
@@ -271,6 +271,7 @@ files:
 - lib/llm/agent.rb
 - lib/llm/bot.rb
 - lib/llm/buffer.rb
+- lib/llm/compactor.rb
 - lib/llm/context.rb
 - lib/llm/context/deserializer.rb
 - lib/llm/context/serializer.rb