llm.rb 4.22.0 → 4.23.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 96698cb3af793b0bd83cae7635279cefbff24f86b11f59c9209edd76f76b757c
-  data.tar.gz: 389e4372ab3b4a2e90020e6e2e838b5a36516d5a5dd82a71243975dfe6f8f959
+  metadata.gz: 49ed8077a6283802d4141dcb9ec037c7fc46920ebd3273b30c55624b575f3156
+  data.tar.gz: e2289baf740ba9603ed1c308414e632ddda296356659c8714bf3a1744c216104
 SHA512:
-  metadata.gz: 6bd4fa02802333bbb925db2e513913bd1669e8a4d7c85d8cb76b88399e9b0e84bfd5ddf922c7816a2afd0c0d76d6a9f8c873702c789665dfe3205ada01d34203
-  data.tar.gz: 0d579386ead2158a4e7ad4991ff0c025758ac51624947d07e5d112779d46cb36bcabdd492ac20bbabc981b3e75e25300d04ba8b86808e4825b5c66e2186e52ae
+  metadata.gz: b6b0d72baa785a6bf25cbfd3f2581d7f6a5850a0fa61dea29668596e19eb8a1142330f8acfea7f04a1bc76461c02c0af681588332d955aae2b5c6808f2fc0610
+  data.tar.gz: 836fc45489b9d86c7bde3ed2b94d2813be5bdaea1ebf7697f7e7eca5962f5374343e371188e40ced180ed50e053cd74ec8fcec8dea08c164291ee8577301f195
data/CHANGELOG.md CHANGED
@@ -2,8 +2,37 @@
 
 ## Unreleased
 
+Changes since `v4.23.0`.
+
+## v4.23.0
+
 Changes since `v4.22.0`.
 
+This release expands llm.rb's runtime surface for long-lived contexts and
+stateful tools. It adds built-in context compaction through `LLM::Compactor`,
+lets explicit `tools:` arrays accept bound `LLM::Tool` instances, and fixes
+OpenAI-compatible no-arg tool schemas for stricter providers such as xAI.
+
+### Change
+
+* **Add `LLM::Compactor` for long-lived contexts** <br>
+  Add built-in context compaction through `LLM::Compactor`, so older history
+  can be summarized, retained windows can stay bounded, compaction can run on
+  its own `model:`, and `LLM::Stream` can observe the lifecycle through
+  `on_compaction` and `on_compaction_finish`.
+
+* **Allow bound tool instances in explicit tool lists** <br>
+  Let explicit `tools:` arrays accept `LLM::Tool` instances such as
+  `MyTool.new(foo: 1)`, so tools can carry bound state without changing the
+  global tool registry model.
+
+### Fix
+
+* **Fix xAI/OpenAI-compatible no-arg tool schemas** <br>
+  Send an empty object schema for tools without declared parameters instead
+  of `null`, so stricter providers such as xAI accept mixed tool sets that
+  include no-arg tools.
+
 ## v4.22.0
 
 Changes since `v4.21.0`.
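
The bound-tool entry above is easiest to see in code. A minimal sketch: `MyTool`, its `foo:` initializer, and the surrounding setup are illustrative rather than part of llm.rb; the release notes only confirm that `LLM::Tool` instances are accepted in explicit `tools:` arrays.

```ruby
# Hypothetical example: MyTool and its initializer are illustrative.
# The changelog confirms that instances such as MyTool.new(foo: 1) are
# accepted wherever an explicit tools: array is given.
class MyTool < LLM::Tool
  def initialize(foo:)
    @foo = foo # instance-bound state carried into tool execution
  end
end

ctx = LLM::Context.new(llm, tools: [MyTool.new(foo: 1)])
```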
data/README.md CHANGED
@@ -4,7 +4,7 @@
 <p align="center">
   <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
   <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.21.0-green.svg?" alt="Version"></a>
+  <a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.23.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About
@@ -147,12 +147,34 @@ ctx.talk("Remember that my favorite language is Ruby.")
 ctx.save(path: "context.json")
 ```
 
+#### Context Compaction
+
+Long-lived contexts can compact older history into a summary instead of
+growing forever. Compaction is built into [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
+through [`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html),
+and when a stream is present it emits `on_compaction` and
+`on_compaction_finish` through [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html).
+The compactor can also use a different model from the main context, which is
+useful when you want summarization to run on a cheaper or faster model.
+
+```ruby
+ctx = LLM::Context.new(
+  llm,
+  compactor: {
+    message_threshold: 200,
+    retention_window: 8,
+    model: "gpt-5.4-mini"
+  }
+)
+```
+
 #### LLM::Stream
 
 `LLM::Stream` is not just for printing tokens. It supports `on_content`,
-`on_reasoning_content`, `on_tool_call`, and `on_tool_return`, which means
-visible output, reasoning output, and tool execution can all be driven through
-the same execution path.
+`on_reasoning_content`, `on_tool_call`, `on_tool_return`, `on_compaction`,
+and `on_compaction_finish`, which means visible output, reasoning output, tool
+execution, and context compaction can all be driven through the same
+execution path.
 
 ```ruby
 class Stream < LLM::Stream
@@ -350,6 +372,7 @@ Runtime Building Blocks:
 - **Agents** — reusable assistants with tool auto-execution
 - **Skills** — directory-backed capabilities loaded from `SKILL.md`
 - **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
+- **Context Compaction** — summarize older history in long-lived contexts
 
 Data and Structure:
 - **Structured Outputs** — JSON Schema-based responses
@@ -464,6 +487,42 @@ ctx.talk("Run `date` and `uname -a`.")
 ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
 ```
 
+#### Context Compaction
+
+This example uses [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html),
+[`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html), and
+[`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) together so
+long-lived contexts can summarize older history and expose the lifecycle
+through stream hooks. This approach is inspired by General Intelligence
+Systems' [Brute](https://github.com/general-intelligence-systems/brute). The
+compactor can also use its own `model:` if you want summarization to run on a
+different model from the main context. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
+
+```ruby
+require "llm"
+
+class Stream < LLM::Stream
+  def on_compaction(ctx, compactor)
+    puts "Compacting #{ctx.messages.size} messages..."
+  end
+
+  def on_compaction_finish(ctx, compactor)
+    puts "Compacted to #{ctx.messages.size} messages."
+  end
+end
+
+llm = LLM.openai(key: ENV["KEY"])
+ctx = LLM::Context.new(
+  llm,
+  stream: Stream.new,
+  compactor: {
+    message_threshold: 200,
+    retention_window: 8,
+    model: "gpt-5.4-mini"
+  }
+)
+```
+
 #### Reasoning
 
 This example uses [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) with the OpenAI Responses API so reasoning output is streamed separately from visible assistant output. See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
data/lib/llm/buffer.rb CHANGED
@@ -23,6 +23,16 @@ module LLM
       @messages.concat(ary)
     end
 
+    ##
+    # Replace the tracked messages
+    # @param [Array<LLM::Message>] messages
+    #  The replacement messages
+    # @return [LLM::Buffer]
+    def replace(messages)
+      @messages.replace(messages)
+      self
+    end
+
     ##
     # @yield [LLM::Message]
     #  Yields each message in the conversation thread
data/lib/llm/compactor.rb ADDED
@@ -0,0 +1,128 @@
+# frozen_string_literal: true
+
+##
+# {LLM::Compactor LLM::Compactor} summarizes older context messages into a
+# smaller replacement message when a context grows too large.
+#
+# This work is directly inspired by the compaction approach developed by
+# General Intelligence Systems in
+# [Brute](https://github.com/general-intelligence-systems/brute).
+#
+# The compactor can also use a different model from the main context by
+# setting `model:` in the compactor config. By default, `token_threshold` is
+# 10% less than the current context window, or `100_000` when the context
+# window is unknown. Set `message_threshold:` or `token_threshold:` to `nil`
+# to disable that constraint.
+class LLM::Compactor
+  DEFAULT_TOKEN_THRESHOLD = 100_000
+  DEFAULTS = {
+    message_threshold: 200,
+    retention_window: 8,
+    model: nil
+  }.freeze
+
+  ##
+  # @return [Hash]
+  attr_reader :config
+
+  ##
+  # @param [LLM::Context] ctx
+  # @param [Hash] config
+  # @option config [Integer] :token_threshold
+  #  Defaults to 10% less than the current context window, or `100_000` when
+  #  the context window is unknown. Set to `nil` to disable token-based
+  #  compaction.
+  # @option config [Integer] :message_threshold
+  #  Set to `nil` to disable message-count-based compaction.
+  # @option config [Integer] :retention_window
+  # @option config [String, nil] :model
+  #  The model to use for the summarization request. Defaults to the current
+  #  context model.
+  def initialize(ctx, **config)
+    @ctx = ctx
+    @config = DEFAULTS.merge(token_threshold: default_token_threshold).merge(config)
+  end
+
+  ##
+  # Returns true when the context should be compacted
+  # @param [Object] prompt
+  #  The next prompt or turn input
+  # @return [Boolean]
+  def compact?(prompt = nil)
+    return false if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
+    messages = ctx.messages.reject(&:system?)
+    return true if config[:message_threshold] && messages.size > config[:message_threshold]
+    usage = ctx.usage
+    return true if config[:token_threshold] && usage && usage.total_tokens > config[:token_threshold]
+    false
+  end
+
+  ##
+  # Summarize older messages and replace them with a compact summary.
+  # @param [Object] prompt
+  #  The next prompt or turn input
+  # @return [LLM::Message, nil]
+  def compact!(prompt = nil)
+    return nil if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
+    messages = ctx.messages.reject(&:system?)
+    retention_window = [config[:retention_window], messages.size].min
+    return nil unless messages.size > retention_window
+    stream = ctx.params[:stream]
+    stream.on_compaction(ctx, self) if LLM::Stream === stream
+    recent = retained_messages
+    older = messages[0...(messages.size - recent.size)]
+    summary = LLM::Message.new(ctx.llm.user_role, "[Previous conversation summary]\n\n#{summarize(older)}")
+    ctx.messages.replace([*ctx.messages.take_while(&:system?), summary, *recent])
+    stream.on_compaction_finish(ctx, self) if LLM::Stream === stream
+    summary
+  end
+
+  private
+
+  attr_reader :ctx
+
+  def default_token_threshold
+    window = ctx.context_window
+    return DEFAULT_TOKEN_THRESHOLD if window.zero?
+    window - (window / 10)
+  end
+
+  def retained_messages
+    messages = ctx.messages.reject(&:system?)
+    retention_window = [config[:retention_window], messages.size].min
+    start = [messages.size - retention_window, 0].max
+    start -= 1 while start > 0 && messages[start].tool_return?
+    messages[start..] || []
+  end
+
+  def summarize(messages)
+    model = config[:model] || ctx.params[:model] || ctx.llm.default_model
+    ctx.llm.complete(summary_prompt(messages), model:).content
+  end
+
+  def summary_prompt(messages)
+    <<~PROMPT
+      Summarize this conversation history for context continuity.
+      The summary will replace these messages in the context window.
+
+      Focus on:
+      - What the user asked for
+      - Important facts and decisions
+      - Tool calls and outcomes that still matter
+      - What should happen next
+
+      Conversation:
+      #{serialize(messages)}
+    PROMPT
+  end
+
+  def serialize(messages)
+    messages.map do |message|
+      content = case message.content
+                when Array then message.content.map(&:inspect).join(", ")
+                else message.content.to_s
+                end
+      "#{message.role}: #{content.empty? ? "(empty)" : content}"
+    end.join("\n---\n")
+  end
+end
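
As a worked illustration of `default_token_threshold` above: with a known context window the default lands 10% below it, and an unknown (zero) window falls back to `DEFAULT_TOKEN_THRESHOLD`. The 128_000-token window below is an arbitrary example value, not something the diff specifies.

```ruby
# Worked sketch of the default threshold math shown above;
# a 128_000-token context window is an arbitrary example.
window = 128_000
window - (window / 10) #=> 115_200 (10% below the window)

# An unknown window is reported as zero, so the default falls back
# to DEFAULT_TOKEN_THRESHOLD, i.e. 100_000 tokens.
```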
data/lib/llm/context.rb CHANGED
@@ -34,6 +34,7 @@ module LLM
   #   ctx.talk(prompt)
   #   ctx.messages.each { |m| puts "[#{m.role}] #{m.content}" }
   class Context
+    require_relative "compactor"
     require_relative "context/serializer"
     require_relative "context/deserializer"
     include Serializer
@@ -75,12 +76,24 @@ module LLM
     def initialize(llm, params = {})
       @llm = llm
       @mode = params.delete(:mode) || :completions
+      @compactor = params.delete(:compactor)
       tools = [*params.delete(:tools), *load_skills(params.delete(:skills))]
       @params = {model: llm.default_model, schema: nil}.compact.merge!(params)
       @params[:tools] = tools unless tools.empty?
       @messages = LLM::Buffer.new(llm)
     end
 
+    ##
+    # Returns a context compactor
+    # This feature is inspired by the compaction approach developed by
+    # General Intelligence Systems in
+    # [Brute](https://github.com/general-intelligence-systems/brute).
+    # @return [LLM::Compactor]
+    def compactor
+      @compactor = LLM::Compactor.new(self, **(@compactor || {})) unless LLM::Compactor === @compactor
+      @compactor
+    end
+
     ##
     # Interact with the context via the chat completions API.
     # This method immediately sends a request to the LLM and returns the response.
@@ -96,6 +109,7 @@ module LLM
     def talk(prompt, params = {})
       return respond(prompt, params) if mode == :responses
       @owner = Fiber.current
+      compactor.compact!(prompt) if compactor.compact?(prompt)
       params = params.merge(messages: @messages.to_a)
       params = @params.merge(params)
       bind!(params[:stream], params[:model])
@@ -123,6 +137,7 @@ module LLM
     #   puts res.output_text
     def respond(prompt, params = {})
       @owner = Fiber.current
+      compactor.compact!(prompt) if compactor.compact?(prompt)
       params = @params.merge(params)
       bind!(params[:stream], params[:model])
       res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
@@ -224,7 +239,14 @@ module LLM
     #  messages.
     # @return [LLM::Object, nil]
     def usage
-      @messages.find(&:assistant?)&.usage
+      usage = @messages.find(&:assistant?)&.usage
+      return unless usage
+      LLM::Object.from(
+        input_tokens: usage.input_tokens || 0,
+        output_tokens: usage.output_tokens || 0,
+        reasoning_tokens: usage.reasoning_tokens || 0,
+        total_tokens: usage.total_tokens || 0
+      )
     end
 
     ##
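
The reworked `Context#usage` matters for the compaction path above: `Compactor#compact?` compares `usage.total_tokens` against `token_threshold`, so fields a provider omits must come back as `0` rather than `nil`. A sketch, assuming a provider whose usage payload omits `reasoning_tokens`:

```ruby
# Sketch: assumes a provider whose usage payload omits reasoning_tokens.
usage = ctx.usage
usage.reasoning_tokens #=> 0 (previously nil was passed through)
usage.total_tokens     #=> always an Integer, safe for the > comparison
                       #   inside Compactor#compact?
```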
data/lib/llm/function.rb CHANGED
@@ -266,9 +266,10 @@ class LLM::Function
        parameters: (@params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false), strict: false
      }.compact
    else
+      params = @params || {type: "object", properties: {}}
      {
        type: "function", name: @name,
-        function: {name: @name, description: @description, parameters: @params}
+        function: {name: @name, description: @description, parameters: params}
      }.compact
    end
  end
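
Concretely, the fix changes the serialized tool definition for a no-arg tool. Previously `parameters` took `@params` directly (`nil` for a tool with no declared parameters), and since the outer `.compact` does not reach the nested `function:` hash, it serialized as JSON `null`; stricter providers such as xAI reject that. A sketch of the payload after the fix, with an illustrative tool name and description:

```ruby
# Illustrative payload for a no-arg tool after the fix; the "ping"
# name and description are hypothetical. parameters is now an empty
# object schema instead of serializing as null.
{
  type: "function", name: "ping",
  function: {
    name: "ping",
    description: "A no-arg example tool",
    parameters: {type: "object", properties: {}}
  }
}
```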
data/lib/llm/stream.rb CHANGED
@@ -18,7 +18,8 @@ module LLM
   #
   # The most common callback is {#on_content}, which also maps to {#<<}.
   # Providers may also call {#on_reasoning_content} and {#on_tool_call} when
-  # that data is available.
+  # that data is available. Runtime features such as context compaction may
+  # also emit lifecycle callbacks like {#on_compaction}.
   class Stream
     require_relative "stream/queue"
 
@@ -103,6 +104,24 @@ module LLM
       nil
     end
 
+    ##
+    # Called before a context compaction starts.
+    # @param [LLM::Context] ctx
+    # @param [LLM::Compactor] compactor
+    # @return [nil]
+    def on_compaction(ctx, compactor)
+      nil
+    end
+
+    ##
+    # Called after a context compaction finishes.
+    # @param [LLM::Context] ctx
+    # @param [LLM::Compactor] compactor
+    # @return [nil]
+    def on_compaction_finish(ctx, compactor)
+      nil
+    end
+
     # @endgroup
 
     # @group Error handlers
data/lib/llm/tool.rb CHANGED
@@ -171,4 +171,18 @@ class LLM::Tool
   def self.mcp?
     false
   end
+
+  ##
+  # Returns a function bound to this tool instance.
+  # @return [LLM::Function]
+  def function
+    @function ||= self.class.function.dup.tap { _1.register(self) }
+  end
+
+  ##
+  # Returns true if the tool is an MCP tool
+  # @return [Boolean]
+  def mcp?
+    self.class.mcp?
+  end
 end
data/lib/llm/version.rb CHANGED
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module LLM
-  VERSION = "4.22.0"
+  VERSION = "4.23.0"
 end
metadata CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm.rb
 version: !ruby/object:Gem::Version
-  version: 4.22.0
+  version: 4.23.0
 platform: ruby
 authors:
 - Antar Azri
@@ -271,6 +271,7 @@ files:
 - lib/llm/agent.rb
 - lib/llm/bot.rb
 - lib/llm/buffer.rb
+- lib/llm/compactor.rb
 - lib/llm/context.rb
 - lib/llm/context/deserializer.rb
 - lib/llm/context/serializer.rb