llm.rb 4.22.0 → 5.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +74 -0
- data/README.md +120 -7
- data/lib/llm/agent.rb +19 -19
- data/lib/llm/buffer.rb +10 -0
- data/lib/llm/compactor.rb +117 -0
- data/lib/llm/context/deserializer.rb +2 -1
- data/lib/llm/context.rb +140 -4
- data/lib/llm/error.rb +4 -0
- data/lib/llm/function/fiber_group.rb +8 -0
- data/lib/llm/function/ractor/task.rb +7 -0
- data/lib/llm/function/ractor_group.rb +8 -0
- data/lib/llm/function/task.rb +8 -0
- data/lib/llm/function/task_group.rb +8 -0
- data/lib/llm/function/thread_group.rb +8 -0
- data/lib/llm/function.rb +21 -1
- data/lib/llm/loop_guard.rb +117 -0
- data/lib/llm/message.rb +8 -0
- data/lib/llm/stream/queue.rb +8 -0
- data/lib/llm/stream.rb +37 -10
- data/lib/llm/tool.rb +28 -0
- data/lib/llm/version.rb +1 -1
- data/lib/llm.rb +1 -0
- metadata +3 -1
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 482fe176a5e48457ba806d4ca3ae46c2e39f0e6a8037b8b39f4aaeea399ea33c
+  data.tar.gz: 71088a3ae2878ad20ed021324ab3da60df42c99753d062c3063bf9ba45cfc079
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: aef5fa2469606524a5c00cd582c12035b513c284743a2f86650cb8e0b828952be06935c173bf6b20d0230c31c91dbb20475fca5ca1c25b04e08b02069d399a49
+  data.tar.gz: f937bb7f2d381e131f6c4d87b5c095cab5c4a2c3af9a0dc0557b263c26d637fcc463fd876442faa128cb31edb7833a67a34da5bd9e527c86d4030349d388b000

data/CHANGELOG.md
CHANGED

@@ -2,8 +2,82 @@
 
 ## Unreleased
 
+Changes since `v5.0.0`.
+
+## v5.0.0
+
+Changes since `v4.23.0`.
+
+This release expands llm.rb from an execution runtime into a more explicit
+supervision and transformation runtime. It adds context-level guards,
+transformers, and loop supervision through `LLM::LoopGuard`, while deepening
+long-lived context behavior through compaction, interruption hooks, and
+streamed `ctx.spawn(...)` tool execution.
+
+### Change
+
+* **Make compactor thresholds explicit** <br>
+  Require `message_threshold:` and `token_threshold:` to be opted into
+  explicitly, so `LLM::Compactor` only compacts automatically when one of
+  those thresholds is configured. Context-window-derived token limits can be
+  computed by the caller when needed.
+
+* **Allow assigning a compactor through `LLM::Context`** <br>
+  Let `LLM::Context` accept `ctx.compactor = ...` in addition to the
+  constructor `compactor:` option, so compactor config can be assigned or
+  replaced after context initialization.
+
+* **Mark compaction summaries in message metadata** <br>
+  Mark compaction summaries with `extra[:compaction]` and
+  `LLM::Message#compaction?`, so applications can detect or hide synthetic
+  summary messages in conversation history.
+
+* **Add cooperative tool interruption hooks** <br>
+  Let `ctx.interrupt!` notify queued tool work through `on_interrupt`, so
+  running tools can clean up cooperatively when a context is cancelled.
+
+* **Add `LLM::Context` guards** <br>
+  Add a new `guard` capability to `LLM::Context` so execution can be
+  supervised at the runtime level. The built-in `LLM::LoopGuard` detects
+  repeated tool-call patterns and stops stuck agentic loops through in-band
+  `LLM::GuardError` returns. `LLM::Agent` enables this guard by default.
+
+* **Add `LLM::Context` transformers** <br>
+  Add a new `transformer` capability to `LLM::Context` so prompts and params
+  can be rewritten before provider requests are sent. This makes it possible
+  to apply context-wide behaviors such as PII scrubbing or request-level
+  param injection without rewriting every `talk` and `respond` call site.
+
+## v4.23.0
+
 Changes since `v4.22.0`.
 
+This release expands llm.rb's runtime surface for long-lived contexts and
+stateful tools. It adds built-in context compaction through `LLM::Compactor`,
+lets explicit `tools:` arrays accept bound `LLM::Tool` instances, and fixes
+OpenAI-compatible no-arg tool schemas for stricter providers such as xAI.
+
+### Change
+
+* **Add `LLM::Compactor` for long-lived contexts** <br>
+  Add built-in context compaction through `LLM::Compactor`, so older history
+  can be summarized, retained windows can stay bounded, compaction can run on
+  its own `model:`, thresholds can be configured explicitly, and
+  `LLM::Stream` can observe the lifecycle through `on_compaction` and
+  `on_compaction_finish`.
+
+* **Allow bound tool instances in explicit tool lists** <br>
+  Let explicit `tools:` arrays accept `LLM::Tool` instances such as
+  `MyTool.new(foo: 1)`, so tools can carry bound state without changing the
+  global tool registry model.
+
+### Fix
+
+* **Fix xAI/OpenAI-compatible no-arg tool schemas** <br>
+  Send an empty object schema for tools without declared parameters instead
+  of `null`, so stricter providers such as xAI accept mixed tool sets that
+  include no-arg tools.
+
 ## v4.22.0
 
 Changes since `v4.21.0`.

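For orientation, here is a minimal sketch of how the v4.23.0 and v5.0.0 entries above compose; the `MyTool` class and the prompt are hypothetical stand-ins, not part of the packaged files:

```ruby
require "llm"

llm = LLM.openai(key: ENV["KEY"])
# Explicit tool lists accept bound instances (v4.23.0); MyTool is hypothetical
ctx = LLM::Context.new(llm, tools: [MyTool.new(foo: 1)])
# Compactor config can be assigned after initialization (v5.0.0)
ctx.compactor = {message_threshold: 200, retention_window: 8}

ctx.talk("Summarize the project status.")
# Compaction summaries are marked, so they can be detected or hidden (v5.0.0)
visible = ctx.messages.reject(&:compaction?)
```
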
data/README.md
CHANGED

@@ -4,7 +4,7 @@
 <p align="center">
 <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
 <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-<a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-4.22.0-green.svg?" alt="Version"></a>
+<a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-5.0.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About

@@ -25,6 +25,7 @@ schemas, files, and persisted state, so real systems can be built out of one coh
 execution model instead of a pile of adapters.
 
 Want to see some code? Jump to [the examples](#examples) section. <br>
+Want to see an agentic framework built on top of llm.rb? Check out [general-intelligence-systems/brute](https://github.com/general-intelligence-systems/brute). <br>
 Want a taste of what llm.rb can build? See [the screencast](#screencast).
 
 ## Architecture

@@ -147,17 +148,91 @@ ctx.talk("Remember that my favorite language is Ruby.")
 ctx.save(path: "context.json")
 ```
 
+#### Context Compaction
+
+Long-lived contexts can compact older history into a summary instead of
+growing forever. Compaction is built into [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html)
+through [`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html),
+and when a stream is present it emits `on_compaction` and
+`on_compaction_finish` through [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html).
+The compactor can also use a different model from the main context, which is
+useful when you want summarization to run on a cheaper or faster model.
+
+```ruby
+ctx = LLM::Context.new(
+  llm,
+  compactor: {
+    message_threshold: 200,
+    retention_window: 8,
+    model: "gpt-5.4-mini"
+  }
+)
+```
+
+#### Guards
+
+Guards let llm.rb supervise agentic execution, not just run it.
+They live on [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html),
+can inspect the current runtime state, and can step in when a context is no
+longer making progress.
+
+[`LLM::LoopGuard`](https://0x1eef.github.io/x/llm.rb/LLM/LoopGuard.html) is
+the built-in implementation. It detects repeated tool-call patterns and
+blocks pending tool execution with in-band guarded tool errors instead of
+letting the loop keep spinning. [`LLM::Agent`](https://0x1eef.github.io/x/llm.rb/LLM/Agent.html)
+enables that guard by default through its wrapped context.
+
+```ruby
+ctx = LLM::Context.new(llm)
+ctx.guard = MyGuard.new
+```
+
+#### Transformers
+
+Transformers let llm.rb rewrite outgoing prompts and params before a request
+is sent to the provider. They also live on
+[`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html), but
+they solve a different problem from guards: instead of blocking execution,
+they can normalize or scrub what gets sent.
+
+That makes them a good fit for things like PII scrubbing, prompt
+normalization, or request-level param injection. A transformer just needs to
+implement `call(ctx, prompt, params)` and return `[prompt, params]`.
+
+```ruby
+class ScrubPII
+  EMAIL = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i
+
+  def call(ctx, prompt, params)
+    [scrub(prompt), params]
+  end
+
+  private
+
+  def scrub(prompt)
+    case prompt
+    when String then prompt.gsub(EMAIL, "[REDACTED_EMAIL]")
+    else prompt
+    end
+  end
+end
+
+ctx = LLM::Context.new(llm)
+ctx.transformer = ScrubPII.new
+```
+
 #### LLM::Stream
 
 `LLM::Stream` is not just for printing tokens. It supports `on_content`,
-`on_reasoning_content`, `on_tool_call`,
-visible output, reasoning output,
-the same
+`on_reasoning_content`, `on_tool_call`, `on_tool_return`, `on_compaction`,
+and `on_compaction_finish`, which means visible output, reasoning output, tool
+execution, and context compaction can all be driven through the same
+execution path.
 
 ```ruby
 class Stream < LLM::Stream
   def on_tool_call(tool, error)
-    queue <<
+    queue << (error || ctx.spawn(tool, :thread))
   end
 
   def on_tool_return(tool, result)

@@ -350,6 +425,7 @@ Runtime Building Blocks:
 - **Agents** — reusable assistants with tool auto-execution
 - **Skills** — directory-backed capabilities loaded from `SKILL.md`
 - **MCP Support** — stdio and HTTP MCP clients with prompt and tool support
+- **Context Compaction** — summarize older history in long-lived contexts
 
 Data and Structure:
 - **Structured Outputs** — JSON Schema-based responses

@@ -445,7 +521,7 @@ class Stream < LLM::Stream
   def on_tool_call(tool, error)
     return queue << error if error
     $stdout << "\nRunning tool #{tool.name}...\n"
-    queue <<
+    queue << ctx.spawn(tool, :thread)
   end
 
   def on_tool_return(tool, result)

@@ -458,12 +534,49 @@ class Stream < LLM::Stream
 end
 
 llm = LLM.openai(key: ENV["KEY"])
-
+stream = Stream.new
+ctx = LLM::Context.new(llm, stream:, tools: [System])
 
 ctx.talk("Run `date` and `uname -a`.")
 ctx.talk(ctx.wait(:thread)) while ctx.functions.any?
 ```
 
+#### Context Compaction
+
+This example uses [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context.html),
+[`LLM::Compactor`](https://0x1eef.github.io/x/llm.rb/LLM/Compactor.html), and
+[`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) together so
+long-lived contexts can summarize older history and expose the lifecycle
+through stream hooks. This approach is inspired by General Intelligence
+Systems' [Brute](https://github.com/general-intelligence-systems/brute). The
+compactor can also use its own `model:` if you want summarization to run on a
+different model from the main context. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
+
+```ruby
+require "llm"
+
+class Stream < LLM::Stream
+  def on_compaction(ctx, compactor)
+    puts "Compacting #{ctx.messages.size} messages..."
+  end
+
+  def on_compaction_finish(ctx, compactor)
+    puts "Compacted to #{ctx.messages.size} messages."
+  end
+end
+
+llm = LLM.openai(key: ENV["KEY"])
+ctx = LLM::Context.new(
+  llm,
+  stream: Stream.new,
+  compactor: {
+    message_threshold: 200,
+    retention_window: 8,
+    model: "gpt-5.4-mini"
+  }
+)
+```
+
 #### Reasoning
 
 This example uses [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) with the OpenAI Responses API so reasoning output is streamed separately from visible assistant output. See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.

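The guards snippet above assigns `MyGuard.new` without defining it. Based on the guard contract documented in `LLM::Context` later in this diff, where `call(ctx)` returns `nil` to continue or a warning string to block pending tool work, a minimal custom guard might look like this; the 50-message cutoff is an arbitrary illustration:

```ruby
class MyGuard
  MAX_MESSAGES = 50 # arbitrary cutoff, for illustration only

  # Return nil to let execution continue, or a warning string to have the
  # context turn pending tool work into in-band LLM::GuardError returns.
  def call(ctx)
    return nil if ctx.messages.count <= MAX_MESSAGES
    "SYSTEM NOTICE: #{ctx.messages.count} messages without resolution. " \
      "Stop calling tools and summarize what you have so far."
  end
end
```
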
data/lib/llm/agent.rb
CHANGED

@@ -16,6 +16,9 @@ module LLM
   # **Notes:**
   # * Instructions are injected once unless a system message is already present.
   # * An agent automatically executes tool loops (unlike {LLM::Context LLM::Context}).
+  # * The automatic tool loop enables the wrapped context's `guard` by default.
+  #   The built-in {LLM::LoopGuard LLM::LoopGuard} detects repeated tool-call
+  #   patterns and blocks stuck execution before more tool work is queued.
   # * Tool loop execution can be configured with `concurrency :call`,
   #   `:thread`, `:task`, `:fiber`, `:ractor`, or a list of queued task
   #   types such as `[:thread, :ractor]`.

@@ -128,7 +131,7 @@ module LLM
       defaults = {model: self.class.model, tools: self.class.tools, skills: self.class.skills, schema: self.class.schema}.compact
       @concurrency = params.delete(:concurrency) || self.class.concurrency
       @llm = llm
-      @ctx = LLM::Context.new(llm, defaults.merge(params))
+      @ctx = LLM::Context.new(llm, defaults.merge({guard: true}).merge(params))
     end
 
     ##

@@ -137,7 +140,7 @@ module LLM
     #
     # @param prompt (see LLM::Provider#complete)
     # @param [Hash] params The params passed to the provider, including optional :stream, :tools, :schema etc.
-    # @option params [Integer] :tool_attempts The maxinum number of tool call iterations (default
+    # @option params [Integer] :tool_attempts The maxinum number of tool call iterations (default 25)
     # @return [LLM::Response] Returns the LLM's response for this turn.
     # @example
     #   llm = LLM.openai(key: ENV["KEY"])

@@ -145,14 +148,7 @@ module LLM
     #   response = agent.talk("Hello, what is your name?")
     #   puts response.choices[0].content
     def talk(prompt, params = {})
-
-      res = @ctx.talk(apply_instructions(prompt), params)
-      max.times do
-        break if @ctx.functions.empty?
-        res = @ctx.talk(call_functions, params)
-      end
-      raise LLM::ToolLoopError, "pending tool calls remain" unless @ctx.functions.empty?
-      res
+      run_loop(:talk, prompt, params)
     end
     alias_method :chat, :talk
 

@@ -163,7 +159,7 @@ module LLM
     # @note Not all LLM providers support this API
     # @param prompt (see LLM::Provider#complete)
     # @param [Hash] params The params passed to the provider, including optional :stream, :tools, :schema etc.
-    # @option params [Integer] :tool_attempts The maxinum number of tool call iterations (default
+    # @option params [Integer] :tool_attempts The maxinum number of tool call iterations (default 25)
     # @return [LLM::Response] Returns the LLM's response for this turn.
     # @example
     #   llm = LLM.openai(key: ENV["KEY"])

@@ -171,14 +167,7 @@ module LLM
     #   res = agent.respond("What is the capital of France?")
     #   puts res.output_text
     def respond(prompt, params = {})
-
-      res = @ctx.respond(apply_instructions(prompt), params)
-      max.times do
-        break if @ctx.functions.empty?
-        res = @ctx.respond(call_functions, params)
-      end
-      raise LLM::ToolLoopError, "pending tool calls remain" unless @ctx.functions.empty?
-      res
+      run_loop(:respond, prompt, params)
     end
 
     ##

@@ -380,5 +369,16 @@ module LLM
       else raise ArgumentError, "Unknown concurrency: #{concurrency.inspect}. Expected :call, :thread, :task, :fiber, :ractor, or an array of queued task types"
       end
     end
+
+    def run_loop(method, prompt, params)
+      max = Integer(params.delete(:tool_attempts) || 25)
+      res = @ctx.public_send(method, apply_instructions(prompt), params)
+      max.times do
+        break if @ctx.functions.empty?
+        res = @ctx.public_send(method, call_functions, params)
+      end
+      raise LLM::ToolLoopError, "pending tool calls remain" unless @ctx.functions.empty?
+      res
+    end
   end
 end

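Since `talk` and `respond` now funnel through `run_loop`, the `:tool_attempts` cap documented above applies to both uniformly. A usage sketch, where the agent class and prompt are hypothetical:

```ruby
agent = MyAgent.new(llm) # hypothetical LLM::Agent subclass
# Allow at most five tool-call iterations for this turn; if tool calls
# still remain afterwards, LLM::ToolLoopError is raised.
res = agent.talk("Audit the open issues.", tool_attempts: 5)
puts res.choices[0].content
```
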
data/lib/llm/buffer.rb
CHANGED

@@ -23,6 +23,16 @@ module LLM
       @messages.concat(ary)
     end
 
+    ##
+    # Replace the tracked messages
+    # @param [Array<LLM::Message>] messages
+    #  The replacement messages
+    # @return [LLM::Buffer]
+    def replace(messages)
+      @messages.replace(messages)
+      self
+    end
+
     ##
     # @yield [LLM::Message]
     #  Yields each message in the conversation thread

data/lib/llm/compactor.rb
ADDED

@@ -0,0 +1,117 @@
+# frozen_string_literal: true
+
+##
+# {LLM::Compactor LLM::Compactor} summarizes older context messages into a
+# smaller replacement message when a context grows too large.
+#
+# This work is directly inspired by the compaction approach developed by
+# General Intelligence Systems in
+# [Brute](https://github.com/general-intelligence-systems/brute).
+#
+# The compactor can also use a different model from the main context by
+# setting `model:` in the compactor config. Compaction thresholds are opt-in:
+# provide `message_threshold:` and/or `token_threshold:` to enable policy-
+# driven compaction.
+class LLM::Compactor
+  DEFAULTS = {
+    retention_window: 8,
+    model: nil
+  }.freeze
+
+  ##
+  # @return [Hash]
+  attr_reader :config
+
+  ##
+  # @param [LLM::Context] ctx
+  # @param [Hash] config
+  # @option config [Integer, nil] :token_threshold
+  #  Enables token-based compaction.
+  # @option config [Integer, nil] :message_threshold
+  #  Enables message-count-based compaction.
+  # @option config [Integer] :retention_window
+  # @option config [String, nil] :model
+  #  The model to use for the summarization request. Defaults to the current
+  #  context model.
+  def initialize(ctx, config = {})
+    @ctx = ctx
+    @config = DEFAULTS.merge(config)
+  end
+
+  ##
+  # Returns true when the context should be compacted
+  # @param [Object] prompt
+  #  The next prompt or turn input
+  # @return [Boolean]
+  def compact?(prompt = nil)
+    return false if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
+    messages = ctx.messages.reject(&:system?)
+    return true if config[:message_threshold] && messages.size > config[:message_threshold]
+    usage = ctx.usage
+    return true if config[:token_threshold] && usage && usage.total_tokens > config[:token_threshold]
+    false
+  end
+
+  ##
+  # Summarize older messages and replace them with a compact summary.
+  # @param [Object] prompt
+  #  The next prompt or turn input
+  # @return [LLM::Message, nil]
+  def compact!(prompt = nil)
+    return nil if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
+    messages = ctx.messages.reject(&:system?)
+    retention_window = [config[:retention_window], messages.size].min
+    return nil unless messages.size > retention_window
+    stream = ctx.params[:stream]
+    stream.on_compaction(ctx, self) if LLM::Stream === stream
+    recent = retained_messages
+    older = messages[0...(messages.size - recent.size)]
+    summary = LLM::Message.new(ctx.llm.user_role, "[Previous conversation summary]\n\n#{summarize(older)}", {compaction: true})
+    ctx.messages.replace([*ctx.messages.take_while(&:system?), summary, *recent])
+    stream.on_compaction_finish(ctx, self) if LLM::Stream === stream
+    summary
+  end
+
+  private
+
+  attr_reader :ctx
+
+  def retained_messages
+    messages = ctx.messages.reject(&:system?)
+    retention_window = [config[:retention_window], messages.size].min
+    start = [messages.size - retention_window, 0].max
+    start -= 1 while start > 0 && messages[start].tool_return?
+    messages[start..] || []
+  end
+
+  def summarize(messages)
+    model = config[:model] || ctx.params[:model] || ctx.llm.default_model
+    ctx.llm.complete(summary_prompt(messages), model:).content
+  end
+
+  def summary_prompt(messages)
+    <<~PROMPT
+      Summarize this conversation history for context continuity.
+      The summary will replace these messages in the context window.
+
+      Focus on:
+      - What the user asked for
+      - Important facts and decisions
+      - Tool calls and outcomes that still matter
+      - What should happen next
+
+      Conversation:
+      #{serialize(messages)}
+    PROMPT
+  end
+
+  def serialize(messages)
+    messages.map do |message|
+      content = case message.content
+                when Array then message.content.map(&:inspect).join(", ")
+                else message.content.to_s
+                end
+      "#{message.role}: #{content.empty? ? "(empty)" : content}"
+    end.join("\n---\n")
+  end
+end

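Because `compact?` only returns true when a threshold is configured, compaction can also be driven manually through the lazily built accessor on `LLM::Context`. A sketch, reusing the model name from the README example above:

```ruby
ctx = LLM::Context.new(llm, compactor: {model: "gpt-5.4-mini"})
# No thresholds configured, so nothing compacts automatically; force a
# pass instead. compact! returns the summary message, or nil while the
# history still fits inside the retention window.
summary = ctx.compactor.compact!
puts summary.content if summary
```
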
data/lib/llm/context/deserializer.rb
CHANGED

@@ -39,7 +39,8 @@ class LLM::Context
     original_tool_calls = payload["original_tool_calls"]
     usage = payload["usage"]
     reasoning_content = payload["reasoning_content"]
-
+    compaction = payload["compaction"]
+    extra = {tool_calls:, original_tool_calls:, tools: @params[:tools], usage:, reasoning_content:, compaction:}.compact
     content = returns.nil? ? deserialize_content(payload["content"]) : returns
     LLM::Message.new(payload["role"], content, extra)
   end

data/lib/llm/context.rb
CHANGED

@@ -34,6 +34,7 @@ module LLM
   #   ctx.talk(prompt)
   #   ctx.messages.each { |m| puts "[#{m.role}] #{m.content}" }
   class Context
+    require_relative "compactor"
     require_relative "context/serializer"
     require_relative "context/deserializer"
     include Serializer

@@ -75,6 +76,9 @@ module LLM
     def initialize(llm, params = {})
       @llm = llm
       @mode = params.delete(:mode) || :completions
+      @compactor = params.delete(:compactor)
+      @guard = params.delete(:guard)
+      @transformer = params.delete(:transformer)
       tools = [*params.delete(:tools), *load_skills(params.delete(:skills))]
       @params = {model: llm.default_model, schema: nil}.compact.merge!(params)
       @params[:tools] = tools unless tools.empty?

@@ -82,6 +86,79 @@ module LLM
     end
 
     ##
+    # Returns a context compactor
+    # This feature is inspired by the compaction approach developed by
+    # General Intelligence Systems in
+    # [Brute](https://github.com/general-intelligence-systems/brute).
+    # @return [LLM::Compactor]
+    def compactor
+      @compactor = LLM::Compactor.new(self, @compactor || {}) unless LLM::Compactor === @compactor
+      @compactor
+    end
+
+    ##
+    # Sets a context compactor or compactor config
+    # @param [LLM::Compactor, Hash, nil] compactor
+    # @return [LLM::Compactor, Hash, nil]
+    def compactor=(compactor)
+      @compactor = compactor
+    end
+
+    ##
+    # Returns a guard, if configured.
+    #
+    # Guards are context-level supervisors for agentic execution. A guard can
+    # inspect the runtime state and decide whether pending tool work should be
+    # blocked before the context keeps looping.
+    #
+    # The built-in implementation is {LLM::LoopGuard LLM::LoopGuard}, which
+    # detects repeated tool-call patterns and turns them into in-band
+    # {LLM::GuardError LLM::GuardError} tool returns.
+    #
+    # @return [#call, nil]
+    def guard
+      return if @guard.nil? || @guard == false
+      @guard = LLM::LoopGuard.new if @guard == true
+      @guard = LLM::LoopGuard.new(@guard) if Hash === @guard
+      @guard
+    end
+
+    ##
+    # Sets a guard or guard config.
+    #
+    # Guards must implement `call(ctx)` and return either `nil` or a warning
+    # string. Returning a warning tells the context to block pending tool work
+    # with guarded tool errors instead of continuing the loop.
+    #
+    # @param [#call, Hash, Boolean, nil] guard
+    # @return [#call, Hash, Boolean, nil]
+    def guard=(guard)
+      @guard = guard
+    end
+
+    ##
+    # Returns a transformer, if configured.
+    #
+    # Transformers can rewrite outgoing prompts and params before a request is
+    # sent to the provider.
+    #
+    # @return [#call, nil]
+    def transformer
+      @transformer
+    end
+
+    ##
+    # Sets a transformer.
+    #
+    # Transformers must implement `call(ctx, prompt, params)` and return a
+    # two-element array of `[prompt, params]`.
+    #
+    # @param [#call, nil] transformer
+    # @return [#call, nil]
+    def transformer=(transformer)
+      @transformer = transformer
+    end
+
     # Interact with the context via the chat completions API.
     # This method immediately sends a request to the LLM and returns the response.
     #

@@ -96,8 +173,10 @@ module LLM
     def talk(prompt, params = {})
       return respond(prompt, params) if mode == :responses
       @owner = Fiber.current
+      compactor.compact!(prompt) if compactor.compact?(prompt)
       params = params.merge(messages: @messages.to_a)
       params = @params.merge(params)
+      prompt, params = transform(prompt, params)
       bind!(params[:stream], params[:model])
       res = @llm.complete(prompt, params)
       role = params[:role] || @llm.user_role

@@ -123,7 +202,9 @@ module LLM
     #   puts res.output_text
     def respond(prompt, params = {})
       @owner = Fiber.current
+      compactor.compact!(prompt) if compactor.compact?(prompt)
       params = @params.merge(params)
+      prompt, params = transform(prompt, params)
       bind!(params[:stream], params[:model])
       res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
       params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact

@@ -168,11 +249,26 @@ module LLM
     # @return [Array<LLM::Function::Return>]
     def call(target)
       case target
-      when :functions then functions.call
+      when :functions then guarded_returns || functions.call
       else raise ArgumentError, "Unknown target: #{target.inspect}. Expected :functions"
       end
     end
 
+    ##
+    # Spawns a function through the context.
+    #
+    # When a guard is configured, this method can return an in-band guarded
+    # tool error instead of spawning work.
+    #
+    # @param [LLM::Function] function
+    # @param [Symbol] strategy
+    # @return [LLM::Function::Return, LLM::Function::Task]
+    def spawn(function, strategy)
+      warning = guard&.call(self)
+      return guarded_return_for(function, warning) if warning
+      function.spawn(strategy)
+    end
+
     ##
     # Returns tool returns accumulated in this context
     # @return [Array<LLM::Function::Return>]

@@ -201,10 +297,15 @@ module LLM
     def wait(strategy)
       stream = @params[:stream]
       if LLM::Stream === stream && !stream.queue.empty?
-        stream.
+        @queue = stream.queue
+        @queue.wait(strategy)
       else
-
+        return guarded_returns if guarded_returns
+        @queue = functions.spawn(strategy)
+        @queue.wait
       end
+    ensure
+      @queue = nil
     end
 
     ##

@@ -213,6 +314,7 @@ module LLM
     # @return [nil]
     def interrupt!
       llm.interrupt!(@owner)
+      queue&.interrupt!
     end
     alias_method :cancel!, :interrupt!
 

@@ -224,7 +326,14 @@ module LLM
     #  messages.
     # @return [LLM::Object, nil]
     def usage
-      @messages.find(&:assistant?)&.usage
+      usage = @messages.find(&:assistant?)&.usage
+      return unless usage
+      LLM::Object.from(
+        input_tokens: usage.input_tokens || 0,
+        output_tokens: usage.output_tokens || 0,
+        reasoning_tokens: usage.reasoning_tokens || 0,
+        total_tokens: usage.total_tokens || 0
+      )
     end
 
     ##

@@ -352,13 +461,40 @@ module LLM
 
     def bind!(stream, model)
       return unless LLM::Stream === stream
+      stream.extra[:ctx] = self
       stream.extra[:tracer] = tracer
       stream.extra[:model] = model
     end
 
+    def queue
+      return @queue if @queue
+      stream = @params[:stream]
+      stream.queue if LLM::Stream === stream
+    end
+
     def load_skills(skills)
       [*skills].map { LLM::Skill.load(_1).to_tool(self) }
     end
+
+    def guarded_returns
+      warning = guard&.call(self)
+      return unless warning
+      functions.map { guarded_return_for(_1, warning) }
+    end
+
+    def transform(prompt, params)
+      return [prompt, params] unless transformer
+      transformer.call(self, prompt, params)
+    end
+
+    def guarded_return_for(function, warning)
+      LLM::Function::Return.new(function.id, function.name, {
+        error: true,
+        type: LLM::GuardError.name,
+        message: warning
+      })
+    end
+
   end
 
   # Backward-compatible alias

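The changelog names request-level param injection as a transformer use case alongside PII scrubbing. A minimal sketch against the `call(ctx, prompt, params)` contract above, with an arbitrary temperature as the injected param:

```ruby
class InjectDefaults
  # A transformer must return a two-element [prompt, params] array;
  # caller-supplied params win over the injected default here.
  def call(ctx, prompt, params)
    [prompt, {temperature: 0.2}.merge(params)]
  end
end

ctx = LLM::Context.new(llm, transformer: InjectDefaults.new)
```
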
data/lib/llm/error.rb
CHANGED

data/lib/llm/function/fiber_group.rb
CHANGED

@@ -59,6 +59,14 @@ class LLM::Function
      @fibers.any?(&:alive?)
    end
 
+    ##
+    # @return [nil]
+    def interrupt!
+      @fibers.each(&:interrupt!)
+      nil
+    end
+    alias_method :cancel!, :interrupt!
+
    ##
    # Waits for all fibers in the group to finish and returns
    # their {LLM::Function::Return} values.

data/lib/llm/function/task.rb
CHANGED

data/lib/llm/function/task_group.rb
CHANGED

@@ -60,6 +60,14 @@ class LLM::Function
      @tasks.any?(&:alive?)
    end
 
+    ##
+    # @return [nil]
+    def interrupt!
+      @tasks.each(&:interrupt!)
+      nil
+    end
+    alias_method :cancel!, :interrupt!
+
    ##
    # Waits for all tasks in the group to finish and returns
    # their {LLM::Function::Return} values.

data/lib/llm/function/thread_group.rb
CHANGED

@@ -65,6 +65,14 @@ class LLM::Function
      @threads.any?(&:alive?)
    end
 
+    ##
+    # @return [nil]
+    def interrupt!
+      @threads.each(&:interrupt!)
+      nil
+    end
+    alias_method :cancel!, :interrupt!
+
    ##
    # Waits for all threads in the group to finish and returns
    # their {LLM::Function::Return} values.

data/lib/llm/function.rb
CHANGED

@@ -62,6 +62,13 @@ class LLM::Function
     def to_json(...)
       LLM.json.dump(to_h, ...)
     end
+
+    ##
+    # @return [nil]
+    def interrupt!
+      nil
+    end
+    alias_method :cancel!, :interrupt!
   end
 
   ##

@@ -218,6 +225,18 @@ class LLM::Function
     @cancelled = true
   end
 
+  ##
+  # Notifies the function runner that the call was interrupted.
+  # This is cooperative and only applies to runners that implement
+  # `on_interrupt`.
+  # @return [nil]
+  def interrupt!
+    hook = %i[on_cancel on_interrupt].find { @runner.respond_to?(_1) }
+    @runner.public_send(hook) if hook
+    nil
+  end
+  alias_method :cancel!, :interrupt!
+
   ##
   # Returns true when a function has been called
   # @return [Boolean]

@@ -266,9 +285,10 @@ class LLM::Function
         parameters: (@params || {type: "object", properties: {}}).to_h.merge(additionalProperties: false), strict: false
       }.compact
     else
+      params = @params || {type: "object", properties: {}}
       {
         type: "function", name: @name,
-        function: {name: @name, description: @description, parameters:
+        function: {name: @name, description: @description, parameters: params}
       }.compact
     end
   end

data/lib/llm/loop_guard.rb
ADDED

@@ -0,0 +1,117 @@
+# frozen_string_literal: true
+
+##
+# {LLM::LoopGuard LLM::LoopGuard} is the built-in implementation of
+# llm.rb's `guard` capability.
+#
+# A guard is a context-level supervisor for agentic execution. It can inspect
+# the current runtime state and return a warning string when pending tool work
+# should be blocked before the loop keeps going.
+#
+# {LLM::LoopGuard LLM::LoopGuard} detects when a context is repeating the same
+# tool-call pattern instead of making progress. It is directly inspired by
+# General Intelligence Systems' Brute runtime and its doom-loop detection
+# approach.
+#
+# The public interface is intentionally small:
+# - `call(ctx)` returns `nil` when no intervention is needed
+# - `call(ctx)` returns a warning string when pending tool execution should be blocked
+#
+# {LLM::Context LLM::Context} can use that warning to return in-band
+# {LLM::GuardError LLM::GuardError} tool errors, and
+# {LLM::Agent LLM::Agent} enables this guard by default through its wrapped
+# context.
+#
+# Brute is MIT licensed. The relevant license grant is:
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so.
+class LLM::LoopGuard
+  ##
+  # The default number of repeated tool-call patterns required before
+  # the guard intervenes.
+  # @return [Integer]
+  DEFAULT_THRESHOLD = 3
+
+  ##
+  # Returns the repetition threshold.
+  # @return [Integer]
+  attr_reader :threshold
+
+  ##
+  # @param [Hash] config
+  # @option config [Integer] :threshold
+  #  How many repeated tool-call patterns must appear at the tail of the
+  #  sequence before the guard returns a warning.
+  def initialize(config = {})
+    @threshold = config.fetch(:threshold, DEFAULT_THRESHOLD)
+  end
+
+  ##
+  # Checks the current context for repeated tool-call patterns.
+  #
+  # This method inspects assistant tool calls only. It reduces each call to a
+  # `[tool_name, arguments]` signature and checks whether the tail of the
+  # sequence is repeating.
+  #
+  # @param [LLM::Context] ctx
+  # @return [String, nil]
+  #  Returns a warning string when pending tool execution should be blocked,
+  #  or `nil` when execution should continue.
+  def call(ctx)
+    repetitions = detect(ctx.messages.to_a)
+    repetitions ? warning(repetitions) : nil
+  end
+
+  private
+
+  def detect(messages)
+    signatures = extract_signatures(messages)
+    return if signatures.size < threshold
+    check_repeating_pattern(signatures)
+  end
+
+  def warning(repetitions)
+    <<~MSG
+      SYSTEM NOTICE: Repeated tool-call pattern detected - the same pattern has repeated #{repetitions} times.
+      You are stuck in a loop and not making progress. Stop and try a fundamentally different approach:
+      - Re-read the relevant context before retrying
+      - Try a different tool or strategy
+      - Break the problem into smaller steps
+      - If a tool keeps failing, investigate why before retrying
+    MSG
+  end
+
+  def extract_signatures(messages)
+    messages
+      .select { _1.respond_to?(:functions) && _1.assistant? }
+      .flat_map { |message| message.functions.map { [_1.name.to_s, _1.arguments.to_s] } }
+  end
+
+  def check_repeating_pattern(sequence)
+    max_pattern_len = sequence.size / threshold
+    (1..max_pattern_len).each do |pattern_len|
+      count = count_tail_repetitions(sequence, pattern_len)
+      return count if count >= threshold
+    end
+    nil
+  end
+
+  def count_tail_repetitions(sequence, length)
+    return 0 if sequence.size < length
+    pattern = sequence.last(length)
+    count = 1
+    pos = sequence.size - length
+    while pos >= length
+      candidate = sequence[(pos - length)...pos]
+      break unless candidate == pattern
+      count += 1
+      pos -= length
+    end
+    count
+  end
+end

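To make the tail-repetition check concrete: with the default threshold of 3, six assistant tool-call signatures that alternate between the same two calls trip the guard at a pattern length of 2, because the two-call pattern repeats three times at the tail. The guard can be tuned through the context; the tool names below are illustrative:

```ruby
# Signatures, oldest first: [["grep", "q=foo"], ["read", "f=a.rb"]] * 3
# pattern_len 1: tail repeats once    -> below threshold, keep scanning
# pattern_len 2: tail repeats 3 times -> warning returned, tools blocked
ctx.guard = LLM::LoopGuard.new(threshold: 3)
# Equivalent shorthands accepted by LLM::Context#guard=:
#   ctx.guard = true            # built-in guard, default threshold
#   ctx.guard = {threshold: 3}  # built-in guard, custom threshold
```
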
data/lib/llm/message.rb
CHANGED

@@ -34,6 +34,7 @@ module LLM
     # @return [Hash]
     def to_h
       {role:, content:, reasoning_content:,
+       compaction: extra.compaction,
        tools: extra.tool_calls,
        usage:,
        original_tool_calls: extra.original_tool_calls}.compact

@@ -74,6 +75,13 @@ module LLM
       extra.reasoning_content
     end
 
+    ##
+    # Returns true when a message was created by context compaction
+    # @return [Boolean]
+    def compaction?
+      !!extra.compaction
+    end
+
     ##
     # Returns true when a message contains an image URL
     # @return [Boolean]

data/lib/llm/stream/queue.rb
CHANGED

@@ -31,6 +31,14 @@ class LLM::Stream
      @items.empty?
    end
 
+    ##
+    # @return [nil]
+    def interrupt!
+      @items.each(&:interrupt!)
+      nil
+    end
+    alias_method :cancel!, :interrupt!
+
    ##
    # Waits for queued work to finish and returns function results.
    # @param [Symbol, Array<Symbol>] strategy

data/lib/llm/stream.rb
CHANGED

@@ -18,7 +18,8 @@ module LLM
   #
   # The most common callback is {#on_content}, which also maps to {#<<}.
   # Providers may also call {#on_reasoning_content} and {#on_tool_call} when
-  # that data is available.
+  # that data is available. Runtime features such as context compaction may
+  # also emit lifecycle callbacks like {#on_compaction}.
   class Stream
     require_relative "stream/queue"
 

@@ -29,6 +30,13 @@ module LLM
       @extra ||= LLM::Object.from({})
     end
 
+    ##
+    # Returns the current context, if one was attached to the stream.
+    # @return [LLM::Context, nil]
+    def ctx
+      extra[:ctx]
+    end
+
     ##
     # Returns a lazily-initialized queue for tool results or spawned work.
     # @return [LLM::Stream::Queue]

@@ -69,13 +77,14 @@ module LLM
     ##
     # Called when a streamed tool call has been fully constructed.
     # @note A stream implementation may start tool execution here, for
-    #   example by pushing `
-    #   `tool.spawn(:task)` onto {#queue}.
-    #   selected per tool, such as
-    #   tool.spawn(:
-    #
-    #
-    #   continue. Tool
+    #   example by pushing `ctx.spawn(tool, :thread)`,
+    #   `ctx.spawn(tool, :fiber)`, or `ctx.spawn(tool, :task)` onto {#queue}.
+    #   Mixed strategies can also be selected per tool, such as
+    #   `tool.mcp? ? ctx.spawn(tool, :task) : ctx.spawn(tool, :ractor)`.
+    #   When a streamed tool cannot be resolved, `error` is passed as an
+    #   {LLM::Function::Return}. It can be sent back to the model, allowing
+    #   the tool-call path to recover and the session to continue. Tool
+    #   resolution depends on
     #   {LLM::Function.registry}, which includes {LLM::Tool LLM::Tool}
     #   subclasses, including MCP tools, but not functions defined with
     #   {LLM.function}. The current `:ractor` mode is for class-based tools

@@ -92,8 +101,8 @@ module LLM
     ##
     # Called when queued streamed tool work returns.
     # @note This callback runs when {#wait} resolves work that was queued from
-    #   {#on_tool_call}, such as values returned by `
-    #   `
+    #   {#on_tool_call}, such as values returned by `ctx.spawn(tool, :thread)`,
+    #   `ctx.spawn(tool, :fiber)`, or `ctx.spawn(tool, :task)`.
     # @param [LLM::Function] tool
     #   The tool that returned.
     # @param [LLM::Function::Return] result

@@ -103,6 +112,24 @@ module LLM
       nil
     end
 
+    ##
+    # Called before a context compaction starts.
+    # @param [LLM::Context] ctx
+    # @param [LLM::Compactor] compactor
+    # @return [nil]
+    def on_compaction(ctx, compactor)
+      nil
+    end
+
+    ##
+    # Called after a context compaction finishes.
+    # @param [LLM::Context] ctx
+    # @param [LLM::Compactor] compactor
+    # @return [nil]
+    def on_compaction_finish(ctx, compactor)
+      nil
+    end
+
     # @endgroup
 
     # @group Error handlers

data/lib/llm/tool.rb
CHANGED

@@ -171,4 +171,32 @@ class LLM::Tool
   def self.mcp?
     false
   end
+
+  ##
+  # Returns a function bound to this tool instance.
+  # @return [LLM::Function]
+  def function
+    @function ||= self.class.function.dup.tap { _1.register(self) }
+  end
+
+  ##
+  # Returns true if the tool is an MCP tool
+  # @return [Boolean]
+  def mcp?
+    self.class.mcp?
+  end
+
+  ##
+  # Called when an in-flight tool run is interrupted.
+  # Tools can override this to implement cooperative cleanup.
+  # @return [nil]
+  def on_interrupt
+  end
+
+  ##
+  # Called when an in-flight tool run is cancelled.
+  # @return [nil]
+  def on_cancel
+    on_interrupt
+  end
 end

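A sketch of the cooperative cleanup these hooks enable; the tool body and its `call` signature are illustrative, and polling a flag is one possible approach rather than a pattern prescribed by the library:

```ruby
class SlowTool < LLM::Tool
  def call(*)
    until @abort
      # ... perform one unit of work per iteration ...
    end
    "stopped early"
  end

  # Reached by ctx.interrupt! through the queue's interrupt! chain
  # shown earlier in this diff; the in-flight call polls the flag.
  def on_interrupt
    @abort = true
  end
end
```
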
data/lib/llm/version.rb
CHANGED
data/lib/llm.rb
CHANGED
metadata
CHANGED

@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: llm.rb
 version: !ruby/object:Gem::Version
-  version: 4.22.0
+  version: 5.0.0
 platform: ruby
 authors:
 - Antar Azri

@@ -271,6 +271,7 @@ files:
 - lib/llm/agent.rb
 - lib/llm/bot.rb
 - lib/llm/buffer.rb
+- lib/llm/compactor.rb
 - lib/llm/context.rb
 - lib/llm/context/deserializer.rb
 - lib/llm/context/serializer.rb

@@ -297,6 +298,7 @@ files:
 - lib/llm/function/thread_group.rb
 - lib/llm/function/tracing.rb
 - lib/llm/json_adapter.rb
+- lib/llm/loop_guard.rb
 - lib/llm/mcp.rb
 - lib/llm/mcp/command.rb
 - lib/llm/mcp/error.rb