llm.rb 6.0.0 → 7.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +82 -0
- data/README.md +15 -5
- data/lib/llm/agent.rb +34 -6
- data/lib/llm/buffer.rb +8 -0
- data/lib/llm/compactor.rb +27 -9
- data/lib/llm/context/deserializer.rb +1 -0
- data/lib/llm/context.rb +49 -23
- data/lib/llm/loop_guard.rb +1 -10
- data/lib/llm/provider/transport/http/execution.rb +1 -1
- data/lib/llm/provider/transport/http/interruptible.rb +99 -94
- data/lib/llm/provider/transport/http.rb +4 -3
- data/lib/llm/provider.rb +8 -0
- data/lib/llm/skill.rb +2 -0
- data/lib/llm/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
```diff
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6c923952039095a2234eb1bd5c058a951b0d797d27577cdf7f679df59b49060b
+  data.tar.gz: 3667e0d79e44634f769dfced198dd07c1039f173cb43b72aab7d3204aa3638f8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 655d450b2ffeb71ed9564b7c5c23a2a86e9e385de9dc1abdac18588e460cffdecd1b2da1d5ef9fc162dc3f3286b7d2c979baec3953cd1ddbdab74d1ef5b87112
+  data.tar.gz: a044fedb675c4d92eff55c210d588b68b80c7e3967188674c2de4d8f6bc69d76e8f15c18f49fb54e09a8c93dff89074304d231609337bfa3bc79c96e1f3f576b
```
data/CHANGELOG.md
CHANGED
```diff
@@ -1,5 +1,87 @@
 # Changelog
 
+## Unreleased
+
+## v7.0.0
+
+Changes since `v6.1.0`.
+
+This release turns agent tool-loop limit errors into in-band advisory
+returns so the LLM can react to rate limits and continue the loop. It
+adds `tool_attempts: nil` as a way to opt out of advisory tool-limit
+returns entirely, and fixes the default provider HTTP path to keep
+`net-http-persistent` optional when not explicitly enabled.
+
+### Breaking
+
+* **Return in-band tool-loop limit errors from agents** <br>
+  Stop raising `LLM::ToolLoopError` when an agent exhausts its tool loop
+  attempt budget, and instead send advisory `LLM::Function::Return`
+  errors back through the model so the LLM can react to the rate limit
+  in-band and continue the loop.
+
+* **Allow `tool_attempts: nil` to disable advisory tool-limit returns** <br>
+  Keep the default `tool_attempts` budget at `25`, but treat an explicit
+  `tool_attempts: nil` as an opt-out that disables advisory tool-limit
+  returns entirely.
+
+### Fix
+
+* **Keep `net-http-persistent` optional on normal HTTP requests** <br>
+  Stop the default provider HTTP path from loading `net/http/persistent`
+  unless persistent transport support is explicitly enabled.
+
+## v6.1.0
+
+Changes since `v6.0.0`.
+
+This release tightens interrupt and compaction behavior for long-running
+contexts. It adds `LLM::Buffer#rindex`, supports percentage-based token
+thresholds in `LLM::Compactor`, tracks persisted compaction state through
+context serialization, reliably interrupts Async-backed requests, preserves
+valid tool-call history on cancellation, keeps concurrent skill tool loops
+running on streamed agents, and returns zero-valued usage objects when no
+provider usage has been recorded yet.
+
+### Change
+
+* **Add `LLM::Buffer#rindex`** <br>
+  Add `LLM::Buffer#rindex` as a direct forward to the underlying message
+  array so callers can find the last matching message index through the
+  buffer API.
+
+* **Support percentage compaction token thresholds** <br>
+  Let `LLM::Compactor` accept `token_threshold:` values like `"90%"` so
+  compaction can trigger at a percentage of the active model context
+  window.
+
+### Fix
+
+* **Interrupt Async-backed requests reliably** <br>
+  Track request ownership through the provider transport so contexts use
+  the active Async task when available, letting `ctx.interrupt!`
+  reliably cancel streamed requests under Async runtimes and surface
+  them as `LLM::Interrupt`.
+
+* **Preserve valid tool-call history on cancellation** <br>
+  Append cancelled tool-return messages for unresolved tool calls during
+  `ctx.interrupt!` so follow-up provider requests do not fail with
+  invalid tool-call history after pending tool work is cancelled.
+
+* **Preserve concurrent skill tool loops on streamed agents** <br>
+  Propagate the active agent concurrency through the effective request
+  stream so nested skill agents keep using queued `wait(...)` tool
+  execution instead of falling back to direct `:call` execution.
+
+* **Track persisted compaction state on contexts** <br>
+  Mark contexts as compacted after `LLM::Compactor#compact!`, persist and
+  restore that state through context serialization, and clear it after the
+  next successful model response.
+
+* **Return zero-valued usage objects from contexts** <br>
+  Make `LLM::Context#usage` consistently return an `LLM::Object`, using a
+  zero-valued usage object when no provider usage has been recorded yet.
+
 ## v6.0.0
 
 Changes since `v5.4.0`.
```
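To make the v7.0.0 `tool_attempts:` semantics above concrete, here is a minimal usage sketch; `agent` and `prompt` are placeholders, and the agent setup itself is not shown in this diff:

```ruby
# Sketch only: assumes an LLM::Agent instance responding to #talk.
agent.talk(prompt)                     # default budget of 25 tool-call iterations
agent.talk(prompt, tool_attempts: 5)   # smaller budget; advisory returns after 5
agent.talk(prompt, tool_attempts: nil) # opt out of advisory tool-limit returns
```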
data/README.md
CHANGED
````diff
@@ -4,7 +4,7 @@
 <p align="center">
 <a href="https://0x1eef.github.io/x/llm.rb?rebuild=1"><img src="https://img.shields.io/badge/docs-0x1eef.github.io-blue.svg" alt="RubyDoc"></a>
 <a href="https://opensource.org/license/0bsd"><img src="https://img.shields.io/badge/License-0BSD-orange.svg?" alt="License"></a>
-<a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-
+<a href="https://github.com/llmrb/llm.rb/tags"><img src="https://img.shields.io/badge/version-7.0.0-green.svg?" alt="Version"></a>
 </p>
 
 ## About
@@ -163,12 +163,15 @@ and when a stream is present it emits `on_compaction` and
 `on_compaction_finish` through [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html).
 The compactor can also use a different model from the main context, which is
 useful when you want summarization to run on a cheaper or faster model.
+`token_threshold:` accepts either a fixed token count or a percentage string
+like `"90%"`, which resolves against the active model context window and
+triggers compaction once total token usage goes over that percentage.
 
 ```ruby
 ctx = LLM::Context.new(
   llm,
   compactor: {
-
+    token_threshold: "90%",
     retention_window: 8,
     model: "gpt-5.4-mini"
   }
@@ -367,6 +370,10 @@ worker.join
   or experimental `:ractor` support for class-based tools. MCP tools are not
   supported by the current `:ractor` mode, but mixed tool sets can still
   route MCP tools and local tools through different strategies at runtime.
+  By default, the tool attempt budget is `25`. When an agent exhausts that
+  budget, it sends advisory tool errors back through the model instead of
+  raising out of the runtime. Set `tool_attempts: nil` to disable that
+  advisory behavior.
 - **Tool calls have an explicit lifecycle** <br>
   A tool call can be executed, cancelled through
   [`LLM::Function#cancel`](https://0x1eef.github.io/x/llm.rb/LLM/Function.html#cancel-instance_method),
@@ -622,9 +629,12 @@ This example uses [`LLM::Context`](https://0x1eef.github.io/x/llm.rb/LLM/Context
 [`LLM::Stream`](https://0x1eef.github.io/x/llm.rb/LLM/Stream.html) together so
 long-lived contexts can summarize older history and expose the lifecycle
 through stream hooks. This approach is inspired by General Intelligence
-Systems
+Systems. The
 compactor can also use its own `model:` if you want summarization to run on a
-different model from the main context.
+different model from the main context. `token_threshold:` accepts either a
+fixed token count or a percentage string like `"90%"`, which resolves
+against the active model context window and triggers compaction once total
+token usage goes over that percentage. <br> See the [deepdive (web)](https://0x1eef.github.io/x/llm.rb/file.deepdive.html) or [deepdive (markdown)](resources/deepdive.md) for more examples.
 
 ```ruby
 require "llm"
@@ -644,7 +654,7 @@ ctx = LLM::Context.new(
   llm,
   stream: Stream.new,
   compactor: {
-
+    token_threshold: "90%",
     retention_window: 8,
     model: "gpt-5.4-mini"
   }
````
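The compaction lifecycle hooks named above can be observed through a stream object. A sketch of a hypothetical stream subclass: `on_compaction_finish(ctx, compactor)` mirrors the call site visible in `llm/compactor.rb` below, while the `on_compaction` signature is assumed to match it:

```ruby
# Hypothetical stream that logs the compaction lifecycle.
class Stream < LLM::Stream
  def on_compaction(ctx, compactor)
    puts "compaction started"
  end

  def on_compaction_finish(ctx, compactor)
    puts "compaction finished: #{ctx.messages.to_a.size} messages remain"
  end
end
```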
data/lib/llm/agent.rb
CHANGED
```diff
@@ -19,6 +19,9 @@ module LLM
 # * The automatic tool loop enables the wrapped context's `guard` by default.
 #   The built-in {LLM::LoopGuard LLM::LoopGuard} detects repeated tool-call
 #   patterns and blocks stuck execution before more tool work is queued.
+# * The default tool attempt budget is `25`. After that, the agent sends
+#   advisory tool errors back through the model and keeps the loop in-band.
+#   Set `tool_attempts: nil` to disable that advisory behavior.
 # * Tool loop execution can be configured with `concurrency :call`,
 #   `:thread`, `:task`, `:fiber`, `:ractor`, or a list of queued task
 #   types such as `[:thread, :ractor]`.
@@ -161,7 +164,10 @@ module LLM
 #
 # @param prompt (see LLM::Provider#complete)
 # @param [Hash] params The params passed to the provider, including optional :stream, :tools, :schema etc.
-# @option params [Integer] :tool_attempts
+# @option params [Integer] :tool_attempts
+#   The maximum number of tool call iterations before the agent sends
+#   in-band advisory tool errors back through the model (default 25).
+#   Set to `nil` to disable advisory tool-limit returns.
 # @return [LLM::Response] Returns the LLM's response for this turn.
 # @example
 #   llm = LLM.openai(key: ENV["KEY"])
@@ -180,7 +186,10 @@ module LLM
 # @note Not all LLM providers support this API
 # @param prompt (see LLM::Provider#complete)
 # @param [Hash] params The params passed to the provider, including optional :stream, :tools, :schema etc.
-# @option params [Integer] :tool_attempts
+# @option params [Integer] :tool_attempts
+#   The maximum number of tool call iterations before the agent sends
+#   in-band advisory tool errors back through the model (default 25).
+#   Set to `nil` to disable advisory tool-limit returns.
 # @return [LLM::Response] Returns the LLM's response for this turn.
 # @example
 #   llm = LLM.openai(key: ENV["KEY"])
@@ -393,18 +402,37 @@ module LLM
 
 def run_loop(method, prompt, params)
   loop = proc do
-    max =
+    max = params.key?(:tool_attempts) ? params.delete(:tool_attempts) : 25
+    max = Integer(max) if max
+    stream = params[:stream] || @ctx.params[:stream]
+    stream.extra[:concurrency] = concurrency if LLM::Stream === stream
     res = @ctx.public_send(method, apply_instructions(prompt), params)
-
+    loop do
       break if @ctx.functions.empty?
-
+      if max
+        max.times do
+          break if @ctx.functions.empty?
+          res = @ctx.public_send(method, call_functions, params)
+        end
+        break if @ctx.functions.empty?
+        res = @ctx.public_send(method, @ctx.functions.map { rate_limit(_1) }, params)
+      else
+        res = @ctx.public_send(method, call_functions, params)
+      end
     end
-    raise LLM::ToolLoopError, "pending tool calls remain" unless @ctx.functions.empty?
     res
   end
   @tracer ? @llm.with_tracer(@tracer, &loop) : loop.call
 end
 
+def rate_limit(function)
+  LLM::Function::Return.new(function.id, function.name, {
+    error: true,
+    type: LLM::ToolLoopError.name,
+    message: "tool loop rate limit reached"
+  })
+end
+
 def resolve_option(option)
   Proc === option ? instance_exec(&option) : option
 end
```
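For illustration, the advisory payload that `rate_limit` builds for each pending call looks like this; the id and name below are hypothetical values:

```ruby
# Sent back through the model in place of a raised LLM::ToolLoopError.
LLM::Function::Return.new("call_123", "search", {
  error: true,
  type: "LLM::ToolLoopError", # LLM::ToolLoopError.name
  message: "tool loop rate limit reached"
})
```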
data/lib/llm/buffer.rb
CHANGED
```diff
@@ -52,6 +52,14 @@ module LLM
   reverse_each.find(...)
 end
 
+##
+# Returns the index of the last message matching the given block.
+# @yield [LLM::Message]
+# @return [Integer, nil]
+def rindex(...)
+  @messages.rindex(...)
+end
+
 ##
 # Returns the last message(s) in the buffer
 # @param [Integer, nil] n
```
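Since `rindex` forwards directly to `Array#rindex`, usage is the familiar block form. A small sketch against a context's buffer:

```ruby
# Index of the last assistant message, or nil when none matches.
i = ctx.messages.rindex { |message| message.assistant? }
```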
data/lib/llm/compactor.rb
CHANGED
```diff
@@ -5,13 +5,14 @@
 # smaller replacement message when a context grows too large.
 #
 # This work is directly inspired by the compaction approach developed by
-# General Intelligence Systems
-# [Brute](https://github.com/general-intelligence-systems/brute).
+# General Intelligence Systems.
 #
 # The compactor can also use a different model from the main context by
 # setting `model:` in the compactor config. Compaction thresholds are opt-in:
 # provide `message_threshold:` and/or `token_threshold:` to enable policy-
-# driven compaction.
+# driven compaction. `token_threshold:` accepts either an integer token count
+# or a percentage string like `"90%"`, which resolves against the current
+# model context window.
 class LLM::Compactor
   DEFAULTS = {
     retention_window: 8,
@@ -25,8 +26,11 @@ class LLM::Compactor
   ##
   # @param [LLM::Context] ctx
   # @param [Hash] config
-  # @option config [Integer, nil] :token_threshold
-  #   Enables token-based compaction.
+  # @option config [Integer, String, nil] :token_threshold
+  #   Enables token-based compaction. Integer values are treated as a fixed
+  #   token count. Percentage strings like `"90%"` are resolved against
+  #   {LLM::Context#context_window}; if the context window is unknown, the
+  #   percentage threshold is treated as disabled.
   # @option config [Integer, nil] :message_threshold
   #   Enables message-count-based compaction.
   # @option config [Integer] :retention_window
@@ -39,18 +43,22 @@ class LLM::Compactor
   end
 
   ##
-  # Returns true when the context should be compacted
+  # Returns true when the context should be compacted.
+  #
+  # When `token_threshold:` is a percentage string such as `"90%"`, the
+  # threshold is resolved against the current context window and compared to
+  # the current total token usage.
   # @param [Object] prompt
   #   The next prompt or turn input
   # @return [Boolean]
-  def
+  def compactable?(prompt = nil)
     return false if ctx.functions.any? || [*prompt].grep(LLM::Function::Return).any?
     messages = ctx.messages.reject(&:system?)
     return true if config[:message_threshold] && messages.size > config[:message_threshold]
-
-    return true if config[:token_threshold] && usage && usage.total_tokens > config[:token_threshold]
+    return true if token_threshold and ctx.usage.total_tokens > token_threshold
     false
   end
+  alias_method :compact?, :compactable?
 
   ##
   # Summarize older messages and replace them with a compact summary.
@@ -68,6 +76,7 @@ class LLM::Compactor
   older = messages[0...(messages.size - recent.size)]
   summary = LLM::Message.new(ctx.llm.user_role, "[Previous conversation summary]\n\n#{summarize(older)}", {compaction: true})
   ctx.messages.replace([*ctx.messages.take_while(&:system?), summary, *recent])
+  ctx.compacted = true
   stream.on_compaction_finish(ctx, self) if LLM::Stream === stream
   summary
 end
@@ -84,6 +93,15 @@ class LLM::Compactor
   messages[start..] || []
 end
 
+def token_threshold
+  @token_threshold ||= begin
+    threshold = config[:token_threshold]
+    return threshold unless threshold.to_s.end_with?("%")
+    return if ctx.context_window <= 0
+    (ctx.context_window * threshold.delete_suffix("%").to_f / 100).floor
+  end
+end
+
 def summarize(messages)
   model = config[:model] || ctx.params[:model] || ctx.llm.default_model
   ctx.llm.complete(summary_prompt(messages), model:).content
```
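The percentage arithmetic in `token_threshold` above is plain Ruby; a worked example, assuming a hypothetical 200,000-token context window:

```ruby
threshold = "90%"
(200_000 * threshold.delete_suffix("%").to_f / 100).floor # => 180_000
```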
data/lib/llm/context.rb
CHANGED
```diff
@@ -40,6 +40,14 @@ module LLM
 include Serializer
 include Deserializer
 
+ZERO_USAGE = LLM::Object.from(
+  input_tokens: 0,
+  output_tokens: 0,
+  reasoning_tokens: 0,
+  total_tokens: 0
+)
+private_constant :ZERO_USAGE
+
 ##
 # Returns the accumulated message history for this context
 # @return [LLM::Buffer<LLM::Message>]
@@ -88,8 +96,7 @@ module LLM
 ##
 # Returns a context compactor
 # This feature is inspired by the compaction approach developed by
-# General Intelligence Systems
-# [Brute](https://github.com/general-intelligence-systems/brute).
+# General Intelligence Systems.
 # @return [LLM::Compactor]
 def compactor
   @compactor = LLM::Compactor.new(self, @compactor || {}) unless LLM::Compactor === @compactor
@@ -104,6 +111,14 @@ module LLM
   @compactor = compactor
 end
 
+##
+# Returns whether the context has been compacted and no later model
+# response has cleared that state.
+# @return [Boolean]
+# @api private
+attr_accessor :compacted
+alias_method :compacted?, :compacted
+
 ##
 # Returns a guard, if configured.
 #
@@ -172,13 +187,14 @@ module LLM
 #   puts res.messages[0].content
 def talk(prompt, params = {})
   return respond(prompt, params) if mode == :responses
-  @owner =
+  @owner = @llm.request_owner
   compactor.compact!(prompt) if compactor.compact?(prompt)
   params = params.merge(messages: @messages.to_a)
   params = @params.merge(params)
   prompt, params = transform(prompt, params)
   bind!(params[:stream], params[:model], params[:tools])
   res = @llm.complete(prompt, params)
+  self.compacted = false
   role = params[:role] || @llm.user_role
   role = @llm.tool_role if params[:role].nil? && [*prompt].grep(LLM::Function::Return).any?
   @messages.concat LLM::Prompt === prompt ? prompt.to_a : [LLM::Message.new(role, prompt)]
@@ -201,7 +217,7 @@ module LLM
 #   res = ctx.respond("What is the capital of France?")
 #   puts res.output_text
 def respond(prompt, params = {})
-  @owner =
+  @owner = @llm.request_owner
   compactor.compact!(prompt) if compactor.compact?(prompt)
   params = @params.merge(params)
   prompt, params = transform(prompt, params)
@@ -209,6 +225,7 @@ module LLM
   res_id = params[:store] == false ? nil : @messages.find(&:assistant?)&.response&.response_id
   params = params.merge(previous_response_id: res_id, input: @messages.to_a).compact
   res = @llm.responses.create(prompt, params)
+  self.compacted = false
   role = params[:role] || @llm.user_role
   @messages.concat LLM::Prompt === prompt ? prompt.to_a : [LLM::Message.new(role, prompt)]
   @messages.concat [res.choices[-1]]
@@ -313,27 +330,31 @@ module LLM
 # This is inspired by Go's context cancellation model.
 # @return [nil]
 def interrupt!
+  pending = functions.to_a
   llm.interrupt!(@owner)
   queue&.interrupt!
+  return if pending.empty?
+  pending.each(&:interrupt!)
+  returns = pending.map { _1.cancel(reason: "function call cancelled") }
+  @messages << LLM::Message.new(@llm.tool_role, returns)
+  nil
 end
 alias_method :cancel!, :interrupt!
 
 ##
 # Returns token usage accumulated in this context
-# @
-# This method returns token usage for the latest
-# assistant message, and it returns nil for non-assistant
-# messages.
-# @return [LLM::Object, nil]
+# @return [LLM::Object]
 def usage
-  usage = @messages.find(&:assistant?)&.usage
-
-
-
-
-
-
-
+  if usage = @messages.find(&:assistant?)&.usage
+    LLM::Object.from(
+      input_tokens: usage.input_tokens || 0,
+      output_tokens: usage.output_tokens || 0,
+      reasoning_tokens: usage.reasoning_tokens || 0,
+      total_tokens: usage.total_tokens || 0
+    )
+  else
+    ZERO_USAGE
+  end
 end
 
 ##
@@ -403,7 +424,12 @@ module LLM
 ##
 # @return [Hash]
 def to_h
-  {
+  {
+    schema_version: 1,
+    model:,
+    compacted:,
+    messages: @messages.map { serialize_message(_1) }
+  }
 end
 
 ##
@@ -432,12 +458,12 @@ module LLM
 # Returns an _approximate_ cost for a given context
 # based on both the provider, and model
 def cost
-  return LLM::Cost.new(0, 0) unless usage
   cost = LLM.registry_for(llm).cost(model:)
-
-
-
-
+  input_cost = (cost.input.to_f / 1_000_000.0) * usage.input_tokens
+  output_cost = (cost.output.to_f / 1_000_000.0) * usage.output_tokens
+  LLM::Cost.new(input_cost, output_cost)
+rescue LLM::NoSuchModelError, LLM::NoSuchRegistryError
+  LLM::Cost.new(0, 0)
 end
 
 ##
```
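The `cost` arithmetic above prices tokens per million. A worked example with hypothetical prices and usage counts:

```ruby
# $3.00 per million input tokens, $15.00 per million output tokens.
input_cost  = (3.0 / 1_000_000.0) * 12_000 # => 0.036
output_cost = (15.0 / 1_000_000.0) * 2_000 # => 0.03
```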
data/lib/llm/loop_guard.rb
CHANGED
```diff
@@ -10,8 +10,7 @@
 #
 # {LLM::LoopGuard LLM::LoopGuard} detects when a context is repeating the same
 # tool-call pattern instead of making progress. It is directly inspired by
-# General Intelligence Systems
-# approach.
+# General Intelligence Systems and its doom-loop detection approach.
 #
 # The public interface is intentionally small:
 # - `call(ctx)` returns `nil` when no intervention is needed
@@ -22,14 +21,6 @@
 # {LLM::Agent LLM::Agent} enables this guard by default through its wrapped
 # context.
 #
-# Brute is MIT licensed. The relevant license grant is:
-#
-# Permission is hereby granted, free of charge, to any person obtaining a copy
-# of this software and associated documentation files (the "Software"), to deal
-# in the Software without restriction, including without limitation the rights
-# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-# copies of the Software, and to permit persons to whom the Software is
-# furnished to do so.
 class LLM::LoopGuard
   ##
   # The default number of repeated tool-call patterns required before
```
data/lib/llm/provider/transport/http/execution.rb
CHANGED
```diff
@@ -38,7 +38,7 @@ module LLM::Provider::Transport
     perform_request(http, request, stream, stream_parser, &b)
   end
   [handle_response(res, tracer, span), span, tracer]
-rescue *
+rescue *transport.interrupt_errors
   raise LLM::Interrupt, "request interrupted" if transport.interrupted?(owner)
   raise
 end
```
data/lib/llm/provider/transport/http/interruptible.rb
CHANGED
```diff
@@ -1,109 +1,114 @@
 # frozen_string_literal: true
 
 class LLM::Provider
-
-
-
-
-
-
-
-
-
-
-
-
-  INTERRUPT_ERRORS = [::IOError, ::EOFError, Errno::EBADF].freeze
-  Request = Struct.new(:http, :connection, keyword_init: true)
+  ##
+  # Internal request interruption methods for
+  # {LLM::Provider::Transport::HTTP}.
+  #
+  # This module tracks active requests by execution owner and provides
+  # the logic used to interrupt an in-flight request by closing the
+  # active HTTP connection.
+  #
+  # @api private
+  module Transport::HTTP::Interruptible
+    INTERRUPT_ERRORS = [::IOError, ::EOFError, Errno::EBADF].freeze
+    Request = Struct.new(:http, :connection, keyword_init: true)
 
-
-
-
-    # The execution owner whose request should be interrupted
-    # @return [nil]
-    def interrupt!(owner)
-      req = request_for(owner) or return
-      lock { (@interrupts ||= {})[owner] = true }
-      if persistent_http?(req.http)
-        close_socket(req.connection&.http)
-        req.http.finish(req.connection)
-      elsif transient_http?(req.http)
-        close_socket(req.http)
-        req.http.finish if req.http.active?
-      end
-    rescue *INTERRUPT_ERRORS
-      nil
-    end
-
-    private
+    def interrupt_errors
+      [*INTERRUPT_ERRORS, *optional_interrupt_errors]
+    end
 
-
-
-
-
-
-
-
-
-
-
+    ##
+    # Interrupt an active request, if any.
+    # @param [Fiber] owner
+    #   The execution owner whose request should be interrupted
+    # @return [nil]
+    def interrupt!(owner)
+      req = request_for(owner) or return
+      lock { (@interrupts ||= {})[owner] = true }
+      if persistent_http?(req.http)
+        close_socket(req.connection&.http)
+        req.http.finish(req.connection)
+      elsif transient_http?(req.http)
+        close_socket(req.http)
+        req.http.finish if req.http.active?
+      end
+      owner.stop if owner.respond_to?(:stop)
+    rescue *interrupt_errors
+      nil
+    end
 
-
-    # Returns whether the active request is using a transient HTTP client.
-    # @param [Object, nil] http
-    # @return [Boolean]
-    def transient_http?(http)
-      Net::HTTP === http
-    end
+    private
 
-
-
-
-
-
-
-
+    ##
+    # Closes the active socket for a request, if present.
+    # @param [Net::HTTP, nil] http
+    # @return [nil]
+    def close_socket(http)
+      socket = http&.instance_variable_get(:@socket) or return
+      socket = socket.io if socket.respond_to?(:io)
+      socket.close
+    rescue *interrupt_errors
+      nil
+    end
 
-
-
-
-
-
-
-
-      @requests[owner]
-    end
-  end
+    ##
+    # Returns whether the active request is using a transient HTTP client.
+    # @param [Object, nil] http
+    # @return [Boolean]
+    def transient_http?(http)
+      Net::HTTP === http
+    end
 
-
-
-
-
-
-
-
-      @requests ||= {}
-      @requests[owner] = req
-    end
-  end
+    ##
+    # Returns whether the active request is using a persistent HTTP client.
+    # @param [Object, nil] http
+    # @return [Boolean]
+    def persistent_http?(http)
+      defined?(Net::HTTP::Persistent) && Net::HTTP::Persistent === http
+    end
 
-
-
-
-
-
-
-
+    ##
+    # Returns the active request for an execution owner.
+    # @param [Fiber] owner
+    # @return [Request, nil]
+    def request_for(owner)
+      lock do
+        @requests ||= {}
+        @requests[owner]
+      end
+    end
 
-
-
-
-
-
-
-
+    ##
+    # Records an active request for an execution owner.
+    # @param [Request] req
+    # @param [Fiber] owner
+    # @return [Request]
+    def set_request(req, owner)
+      lock do
+        @requests ||= {}
+        @requests[owner] = req
       end
     end
+
+    ##
+    # Clears the active request for an execution owner.
+    # @param [Fiber] owner
+    # @return [Request, nil]
+    def clear_request(owner)
+      lock { @requests&.delete(owner) }
+    end
+
+    ##
+    # Returns whether an execution owner was interrupted.
+    # @param [Fiber] owner
+    # @return [Boolean, nil]
+    def interrupted?(owner)
+      lock { @interrupts&.delete(owner) }
+    end
+
+    def optional_interrupt_errors
+      defined?(::Async::Stop) ? [Async::Stop] : []
+    end
   end
 end
```
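At the API surface, the machinery above backs `LLM::Context#interrupt!`. A usage sketch following the worker pattern shown in the README; the prompt text is hypothetical:

```ruby
ctx = LLM::Context.new(llm)
worker = Thread.new { ctx.talk("Stream a long answer...") }
# From another thread: closes the active HTTP connection (and stops the
# owning Async task, if any); pending tool calls are cancelled and
# recorded as tool-return messages so the history stays valid.
ctx.interrupt!
worker.join
```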
data/lib/llm/provider/transport/http.rb
CHANGED
```diff
@@ -50,9 +50,10 @@ class LLM::Provider
 
   ##
   # Returns the current request owner.
-  # @return [
+  # @return [Object]
   def request_owner
-    Fiber.current
+    return Fiber.current unless defined?(::Async)
+    Async::Task.current || Fiber.current
   end
 
   ##
@@ -70,7 +71,7 @@ class LLM::Provider
   ##
   # @return [Boolean]
   def persistent?
-
+    !@persistent_client.nil?
   end
 
   ##
```
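A sketch of how the new owner resolution behaves; it assumes the `async` gem is loaded when the reactor branch is exercised:

```ruby
require "async" # assumption: only needed for the reactor case

llm.request_owner # => Fiber.current outside any Async reactor

Async do
  llm.request_owner # => Async::Task.current inside a reactor
end
```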
data/lib/llm/provider.rb
CHANGED
```diff
@@ -338,6 +338,14 @@ class LLM::Provider
   end
   alias_method :cancel!, :interrupt!
 
+  ##
+  # Returns the current request owner used by the transport.
+  # @return [Object]
+  # @api private
+  def request_owner
+    transport.request_owner
+  end
+
   ##
   # @param [Object] stream
   # @return [Boolean]
```
data/lib/llm/skill.rb
CHANGED
```diff
@@ -76,6 +76,8 @@ module LLM
 def call(ctx)
   instructions, tools, tracer = self.instructions, self.tools, ctx.llm.tracer
   params = ctx.params.merge(mode: ctx.mode).reject { [:tools, :schema].include?(_1) }
+  concurrency = params[:stream].extra[:concurrency] if LLM::Stream === params[:stream]
+  params[:concurrency] = concurrency if concurrency
   agent = Class.new(LLM::Agent) do
     instructions(instructions)
     tools(*tools)
```
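The concurrency being propagated here is the agent-level setting documented in `llm/agent.rb` above; the skill change forwards it to nested skill agents through the stream's `extra` slot. A minimal sketch of declaring it on a hypothetical agent class:

```ruby
# Threaded tool-loop execution, per the agent docs; nested skill agents
# now inherit this instead of falling back to direct :call execution.
agent_class = Class.new(LLM::Agent) do
  concurrency :thread
end
```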
data/lib/llm/version.rb
CHANGED