openclacky 1.0.1 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +33 -0
  3. data/lib/clacky/agent/llm_caller.rb +403 -0
  4. data/lib/clacky/agent/message_compressor.rb +15 -4
  5. data/lib/clacky/agent/message_compressor_helper.rb +41 -2
  6. data/lib/clacky/agent/tool_registry.rb +109 -0
  7. data/lib/clacky/agent.rb +69 -2
  8. data/lib/clacky/agent_config.rb +17 -0
  9. data/lib/clacky/cli.rb +65 -0
  10. data/lib/clacky/default_skills/channel-setup/SKILL.md +57 -3
  11. data/lib/clacky/default_skills/onboard/SKILL.md +14 -5
  12. data/lib/clacky/default_skills/onboard/scripts/install_builtin_skills.rb +175 -0
  13. data/lib/clacky/default_skills/skill-add/scripts/install_from_zip.rb +59 -26
  14. data/lib/clacky/providers.rb +57 -3
  15. data/lib/clacky/server/channel/adapters/feishu/adapter.rb +14 -0
  16. data/lib/clacky/server/channel/adapters/feishu/bot.rb +10 -0
  17. data/lib/clacky/server/channel/adapters/feishu/message_parser.rb +1 -0
  18. data/lib/clacky/server/channel/adapters/weixin/adapter.rb +7 -0
  19. data/lib/clacky/server/channel/channel_manager.rb +103 -4
  20. data/lib/clacky/server/channel/channel_ui_controller.rb +8 -2
  21. data/lib/clacky/server/discover.rb +77 -0
  22. data/lib/clacky/server/epipe_safe_io.rb +105 -0
  23. data/lib/clacky/server/http_server.rb +90 -46
  24. data/lib/clacky/server/server_master.rb +6 -0
  25. data/lib/clacky/skill.rb +30 -0
  26. data/lib/clacky/utils/file_processor.rb +14 -40
  27. data/lib/clacky/utils/model_pricing.rb +95 -0
  28. data/lib/clacky/version.rb +1 -1
  29. data/lib/clacky/web/app.css +157 -31
  30. data/lib/clacky/web/i18n.js +18 -2
  31. data/lib/clacky/web/index.html +8 -2
  32. data/lib/clacky/web/onboard.js +77 -1
  33. data/lib/clacky/web/sessions.js +31 -19
  34. data/lib/clacky/web/settings.js +127 -6
  35. data/lib/clacky/web/skills.js +4 -0
  36. data/lib/clacky.rb +5 -0
  37. metadata +5 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9d6ba5a62f7a352730705db11aff8ab76af059764903eb4413bd5a0aa835fecf
- data.tar.gz: 58ba8fdcf23b5dabcc4a8ed709be0f34a9d27a5be83601fee685a638eb3ff445
+ metadata.gz: 448b47d4336764c1646147f9b86fc04f8bad84a34565b9b67cbf558000c185bf
+ data.tar.gz: 827ace1367511360cd6586f5a89529b504d31cdce68d5ecd90fadbe92069c2b5
  SHA512:
- metadata.gz: 00e3f00119cad74d7da43519a1a12332e509c0050946d713dea17db539bbadf0099e96ea5369cc19046fd0bc1c224849cbbaf43addfe0708858780a370067b3b
- data.tar.gz: 4e7888c952dd49c664c67212c0986b62bd7745887dae7d85bce14b3f36c544fc5bd9ca27f1851f04e14477cfd9316938605b6ae0f89b19652cadd1442c6dc564
+ metadata.gz: 667591fbe92e0e4d01de03cd1e9924ff595a1a11fa5196a7b338675366e37445d7cfe02844fc6bd1eb768ab54134d56195a9573fb95dc20c57d448429bcfb8d2
+ data.tar.gz: b324a9f5161eb7574f846736c200341fb2f4db39786f3bd5c2210c178a8c2ed115a520251e36da4ed82572ea6ebf0e88d1072a51536a817169d0838ae86d7dea
data/CHANGELOG.md CHANGED
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [1.0.3] - 2026-05-09
+
+ ### Added
+ - **Channel send command — push messages from CLI/agent to IM channels.** New `clacky channel send` CLI command and full outbound channel pipeline. The agent can now actively reach out to users on Feishu/WeCom/WeChat (e.g. for cron tasks or background completions) instead of only replying. Includes a new `ChannelManager` for routing, multi-master server discovery, and proper `chat_id` extraction for outbound messages. (#73)
+ - **`--model` flag to override the model per invocation.** Run any one-off command with a different model without changing config: `clacky --model gpt-4o-mini "..."`. Useful for quick comparisons or routing specific tasks to cheaper/faster models. (#76)
+ - **Fuzzy tool-name resolution for cross-model compatibility.** When a model emits a slightly off tool name (e.g. `read_file` vs `file_reader`, case mismatches, or hyphen/underscore differences), the agent now resolves it to the closest registered tool instead of erroring out. Significantly improves reliability when switching between Claude, GPT, and other providers. (#78)
+ - **Context overflow auto-recovery.** When an upstream LLM call hits a context-length error, the agent now detects it via `LlmCaller`'s error classification and automatically compresses message history to retry — instead of bubbling a hard error to the user. Backed by 175 new error-detection and 169 new recovery specs.
+ - **Refined session list UI with SVG icons.** Reworked sidebar session list with crisp SVG icons and tightened styling for a more polished look. (#83)
+
+ ### Fixed
+ - **EPIPE crashes when stdout/stderr is closed.** Wrapped server I/O in `EpipeSafeIO` so the master/web server no longer crashes when its output stream goes away (e.g. terminal closed, pipe broken). Covered by 193 new specs.
+ - **Duplicate `$` in CLI completion line.** Removed the stray dollar sign that appeared at the end of completed commands. (C-5583, #86)
+ - **Session list scroll jump on "load more".** The list no longer snaps back to the top when older sessions are paginated in. (C-5568, #85)
+ - Reverted an earlier message line-wrap change (#74) that caused regressions; will be revisited. (#84)
+
+
+
+ ### Added
+ - **Multi-region provider endpoints.** Providers can now expose multiple endpoint variants (e.g. global vs. CN-optimized Anthropic), and you can switch between them from both the onboarding flow and the Settings page. Bundled with updated model pricing data so cost estimates stay accurate across regions. (#67)
+ - **Pre-installed platform-recommended skills during onboarding.** New users get a curated set of skills automatically during onboard — downloaded concurrently with dual-host fallback and a hard deadline so onboarding never hangs on a slow mirror. (#68)
+ - **Builtin skills served via platform API.** Recommended skills are now fetched through `/api/v1/skills/builtin`, making the list easier to update without shipping a new gem. (#72)
+ - **Feishu group chats: respond only when @-mentioned.** The Feishu adapter now parses the mentions array and ignores group messages that don't @ the bot, so the bot no longer replies to every message in a busy group. Sessions are also isolated per (chat, user) pair by default (`:chat_user` binding mode), preventing context leaks between DMs and groups. (#71)
+
+ ### Fixed
+ - **Recover from truncated upstream tool calls.** When an upstream LLM response cuts off mid tool-call, the agent now detects the truncation and recovers automatically instead of getting stuck. Covered by extensive new tests.
+ - **Feedback option click now sends the message.** Clicking a suggested feedback option previously set the input text but silently failed to send (due to a `sendMessage` vs `_sendMessage` scope bug). Now it dispatches immediately as expected. (#69)
+ - **Sidebar footer and input area heights aligned.** Introduced a shared `--footer-height` CSS variable (56px) and reworked the stop button to use a pseudo-element square for pixel-perfect centering — both columns now line up cleanly. (#70)
+ - **Feishu bot fails closed on API outage.** If `/open-apis/bot/v3/info` fails and `bot_open_id` can't be resolved, the adapter now drops group messages (with a warning) instead of spamming every group message as a fallback.
+ - **`preview.md` no longer pollutes user project directories.** Preview files are written to the system tmpdir, and plain text formats (md/log/csv) skip preview generation entirely since they're already readable as-is.
+
+ ### More
+ - Added agent stop logging to make interrupt / stop chains easier to debug.
+
  ## [1.0.1] - 2026-05-06

  ### Added
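The fuzzy tool-name resolution described in #78 can be sketched roughly as follows. This is a minimal illustration only: the normalization rule, the token-overlap scoring, and the `resolve_tool` helper are assumptions based on the changelog wording, not the gem's actual `tool_registry.rb` code.

```ruby
# Hypothetical sketch of fuzzy tool-name resolution. Normalize case and
# hyphen/underscore differences first; if no exact match remains, fall
# back to a token-overlap score so near-miss names still resolve.
def normalize_tool_name(name)
  name.to_s.downcase.tr("-", "_")
end

def resolve_tool(requested, registered)
  wanted = normalize_tool_name(requested)
  exact = registered.find { |t| normalize_tool_name(t) == wanted }
  return exact if exact

  # Near-miss scoring: count requested tokens that prefix-match a token
  # of the candidate ("read" matches "reader", "file" matches "file").
  wanted_tokens = wanted.split("_")
  registered.max_by do |candidate|
    tokens = normalize_tool_name(candidate).split("_")
    wanted_tokens.count { |w| tokens.any? { |t| t.start_with?(w) || w.start_with?(t) } }
  end
end
```

A real registry would likely also require a minimum score before resolving, so a completely unrelated name still errors out instead of silently mapping to the wrong tool.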
@@ -79,6 +79,14 @@ module Clacky
  # the error is something else and we let it propagate.
  force_reasoning_content_pad = false
  thinking_retry_attempted = false
+ # One-shot flag for context-overflow recovery. When the server complains
+ # the input exceeds the model's context window, we run a forced
+ # compression with pull_back_from_tail: 1 (preserves the model's
+ # two-checkpoint prompt cache) and retry the original request once.
+ # We retry at most once — if still overflowing afterward, the issue is
+ # something else (e.g. tool schemas alone exceed the window) and we let
+ # the error propagate.
+ context_overflow_retry_attempted = false

  begin
  begin
@@ -101,6 +109,19 @@ module Clacky
  # Successful response — if we were probing, confirm primary is healthy.
  handle_probe_success if @config.probing?

+ # ── Upstream truncation detector ──────────────────────────────────
+ # OpenRouter / Bedrock and other routers sometimes close the SSE
+ # stream mid-tool_use: we receive finish_reason="stop" together with
+ # a syntactically valid tool_call whose `arguments` JSON is empty,
+ # "{}" (placeholder before any key was streamed), or otherwise
+ # unparseable. Treat this as retryable — otherwise the agent would
+ # execute a tool with empty args (often failing cryptically) or
+ # silently exit thinking the task is done.
+ #
+ # Raises UpstreamTruncatedError (a RetryableError) so the rescue
+ # block below handles retry + fallback identically to 5xx/429.
+ detect_upstream_truncation!(response)
+
  rescue Faraday::TimeoutError => e
  # ── Read-timeout path (distinct from connection-level failures) ──
  # Faraday::TimeoutError on our non-streaming POST almost always means
@@ -207,6 +228,55 @@ module Clacky
  end

  rescue Clacky::BadRequestError => e
+ # One-shot recovery for "context too long" errors. The model's
+ # context window is exceeded by the current history+tools+system
+ # prompt. We run a forced compression with pull_back_from_tail: 1
+ # (preserves the two-checkpoint prompt cache so the compression
+ # call itself still hits cache#A on the second-to-last position),
+ # then retry the original request once.
+ if !context_overflow_retry_attempted &&
+ !@compressing_for_overflow &&
+ context_too_long_error?(e) &&
+ respond_to?(:compress_messages_if_needed, true)
+ context_overflow_retry_attempted = true
+ Clacky::Logger.info(
+ "[context-overflow] caught BadRequestError, attempting forced compression with pull-back",
+ error_message: e.message[0, 200],
+ history_size: @history.size,
+ previous_total_tokens: @previous_total_tokens
+ )
+ # Layer 1: standard cache-preserving compression (pull_back: 1).
+ # Handles 99% of real overflow cases (newest message tipped the
+ # request just past the window).
+ if perform_context_overflow_compression(mode: :standard)
+ retry
+ end
+
+ # Layer 2: aggressive fallback. The Layer 1 compression call
+ # itself overflowed — happens when a single newly-appended
+ # message is enormous (huge tool_result, pasted file, etc.) so
+ # popping just K=1 didn't bring the request below the window.
+ # Pop ~half the history this time; sacrifices prompt cache to
+ # guarantee the compression call fits.
+ Clacky::Logger.warn(
+ "[context-overflow] standard compression failed, escalating to aggressive mode"
+ )
+ if perform_context_overflow_compression(mode: :aggressive)
+ retry
+ end
+
+ # Both layers exhausted. Let the original error propagate so the
+ # user sees the underlying provider message. This should be
+ # extremely rare — would require both halves of the history to
+ # individually exceed the window, which is essentially impossible
+ # under the "previous turn succeeded" invariant.
+ Clacky::Logger.error(
+ "[context-overflow] both standard and aggressive compression failed; " \
+ "propagating original error"
+ )
+ raise
+ end
+
  # One-shot recovery for thinking-mode providers (DeepSeek V4, Kimi K2)
  # that require every assistant message in the history to carry a
  # reasoning_content field. The history-evidence heuristic in
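The rescue block above is an instance of Ruby's `begin`/`rescue`/`retry` idiom with a one-shot guard flag. A self-contained sketch of the same control flow, with stubbed request and compression methods (all names in this sketch are illustrative stand-ins, not the gem's API):

```ruby
# Hypothetical stand-in for the gem's BadRequestError.
class BadRequestError < StandardError; end

# Simulated agent: the fake request fails with a context-length error
# until compression shrinks the history below the window.
class OverflowDemo
  attr_reader :compressions

  def initialize(history_tokens, window)
    @history_tokens = history_tokens
    @window = window
    @compressions = []
  end

  def call_with_overflow_recovery
    overflow_retry_attempted = false
    begin
      perform_request
    rescue BadRequestError => e
      # One-shot guard: a second overflow after compression propagates.
      raise if overflow_retry_attempted || !context_too_long?(e)
      overflow_retry_attempted = true
      # Layer 1 first; short-circuit runs Layer 2 only if Layer 1 fails.
      retry if compress(mode: :standard) || compress(mode: :aggressive)
      raise
    end
  end

  private

  def perform_request
    raise BadRequestError, "prompt is too long" if @history_tokens > @window
    :ok
  end

  def context_too_long?(err)
    err.message.include?("too long")
  end

  def compress(mode:)
    @compressions << mode
    # Standard frees a little headroom; aggressive halves the history.
    @history_tokens = mode == :standard ? @history_tokens - 10 : @history_tokens / 2
    true
  end
end
```

Because `retry` re-enters the `begin` body while local variables survive, the guard flag guarantees at most one compression cycle per request, matching the comment in the diff.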
@@ -230,6 +300,49 @@ module Clacky
  token_data = track_cost(response[:usage], raw_api_usage: response[:raw_api_usage])
  response[:token_usage] = token_data

+ # [DIAG] Log raw client response shape. Only emit when we see the
+ # "finish_reason=stop + non-empty tool_calls" combo, or when any
+ # tool_call's arguments look empty/unparseable — both indicate the
+ # upstream (Bedrock/relay/model) cut the tool_use stream short.
+ # Normal responses produce no log line (too noisy).
+ begin
+ tool_calls = response[:tool_calls] || []
+ if !tool_calls.empty?
+ raw_tcs = tool_calls.map do |c|
+ args_str = c[:arguments].is_a?(String) ? c[:arguments] : c[:arguments].to_s
+ parseable = begin
+ JSON.parse(args_str)
+ true
+ rescue StandardError
+ false
+ end
+ {
+ name: c[:name].to_s,
+ args_len: args_str.length,
+ args_parseable: parseable,
+ args_head: args_str[0, 120]
+ }
+ end
+ truncated_call = raw_tcs.any? { |t| t[:args_len] == 0 || t[:args_len] == 2 || !t[:args_parseable] }
+ suspicious = response[:finish_reason] == "stop"
+
+ if suspicious || truncated_call
+ Clacky::Logger.warn("llm.response_suspicious",
+ model: current_model,
+ finish_reason: response[:finish_reason].to_s,
+ tool_calls_count: raw_tcs.size,
+ tool_calls: raw_tcs,
+ completion_tokens: token_data[:completion_tokens],
+ ttft_ms: response.dig(:latency, :ttft_ms),
+ combo_stop_with_toolcalls: suspicious,
+ has_truncated_args: truncated_call
+ )
+ end
+ end
+ rescue StandardError => e
+ Clacky::Logger.warn("llm.response_log_failed", error: e.message)
+ end
+
  response
  ensure
  # Close any "retrying" progress slot that was opened during the
@@ -286,6 +399,101 @@ module Clacky
  )
  end

+ # Run a forced compression to recover from a context-overflow error.
+ # Called by the BadRequestError rescue when context_too_long_error?
+ # returns true.
+ #
+ # Two-layer defence:
+ # ────────────────────────────────────────────────────────────────────
+ # Layer 1 (mode: :standard, default) — preserves prompt cache.
+ # Pop K=1 message from @history tail, then run compression. This
+ # frees just enough token budget for the compression LLM call
+ # itself to fit, while preserving the model's two-checkpoint prompt
+ # cache (cache#A at second-to-last position is still hit). The
+ # popped message is reattached to the rebuilt history's tail by
+ # handle_compression_response, so recent task progress is not lost.
+ # Handles 99% of real-world cases where overflow is caused by the
+ # newest message pushing total just past the window.
+ #
+ # Layer 2 (mode: :aggressive) — sacrifices prompt cache to survive.
+ # Pop ~half the history (capped) from the tail. This dramatically
+ # shrinks the compression call's input regardless of how big any
+ # single message is. Used as a fallback when Layer 1 itself raises
+ # context_too_long — i.e. a single newly-appended message is so
+ # large (e.g. >50K-token tool_result, pasted huge file) that even
+ # removing it didn't bring the request under the window, OR the
+ # popped message was small but earlier history grew past the limit.
+ # Pulled-back messages are still reattached after compression so no
+ # user content is silently dropped.
+ #
+ # @param mode [Symbol] :standard or :aggressive
+ # @return [Boolean] true if compression succeeded (caller should retry
+ # the original request), false if compression was unable to run
+ # (compression disabled, history too short, etc.) or itself failed
+ # — caller decides whether to escalate to the next layer or
+ # propagate the original error.
+ private def perform_context_overflow_compression(mode: :standard)
+ return false unless respond_to?(:compress_messages_if_needed, true)
+
+ # Compute pull-back count.
+ # Standard: K=1 (cache-preserving).
+ # Aggressive: pop ~half the history, but never less than 4 and never
+ # more than (history_size - 2) so we always keep system + at least
+ # one recent message. Capped at 64 to bound the worst case (an
+ # enormous history that should never realistically occur).
+ pull_back =
+ if mode == :aggressive
+ half = @history.size / 2
+ [[half, 4].max, [@history.size - 2, 64].min].min
+ else
+ 1
+ end
+
+ @compressing_for_overflow = true
+ compression_context = nil
+
+ begin
+ compression_context = compress_messages_if_needed(
+ force: true,
+ pull_back_from_tail: pull_back
+ )
+ return false if compression_context.nil?
+
+ compression_message = compression_context[:compression_message]
+ @history.append(compression_message)
+
+ response = call_llm # recursive — guarded by @compressing_for_overflow
+ handle_compression_response(response, compression_context)
+ Clacky::Logger.info(
+ "[context-overflow] compression succeeded",
+ mode: mode,
+ pull_back: pull_back
+ )
+ true
+ rescue => e
+ # Compression failed mid-flight. Restore @history to a sensible state:
+ # roll back the compression instruction we appended, and re-append the
+ # pulled-back messages so the user's recent work isn't silently lost.
+ if compression_context
+ cm = compression_context[:compression_message]
+ @history.rollback_before(cm) if cm
+ (compression_context[:pulled_back_messages] || []).each do |m|
+ @history.append(m)
+ end
+ end
+ Clacky::Logger.warn(
+ "[context-overflow] compression failed during overflow recovery",
+ mode: mode,
+ pull_back: pull_back,
+ error_class: e.class.name,
+ error_message: e.message[0, 200]
+ )
+ false
+ ensure
+ @compressing_for_overflow = false
+ end
+ end
+
  # True when a 400 BadRequestError is specifically about a missing
  # reasoning_content field in thinking mode (DeepSeek V4, Kimi K2 thinking).
  # We require TWO distinct substrings to avoid false positives — a generic
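The aggressive pull-back count in the method above is computed as `[[half, 4].max, [history_size - 2, 64].min].min`. Extracting that clamp into a standalone helper makes its boundary behaviour easy to verify (a sketch for illustration; `aggressive_pull_back` is not a method in the gem):

```ruby
# Mirror of the aggressive-mode clamp: pop roughly half the history, at
# least 4 messages, but never more than history_size - 2 (keep the system
# prompt plus one recent message) and never more than 64 in total.
def aggressive_pull_back(history_size)
  half = history_size / 2
  [[half, 4].max, [history_size - 2, 64].min].min
end
```

Note that the size-2 cap wins over the minimum-of-4 floor for tiny histories (size 4 yields 2), so the system message and at least one recent message always survive.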
@@ -302,6 +510,153 @@ module Clacky
  msg.include?("must be provided"))
  end

+ # True when a 400 BadRequestError indicates the request exceeded the
+ # model's context window (i.e. the conversation history is too long).
+ #
+ # We deliberately favour broad detection over narrow precision:
+ # - False positive cost: one extra (no-op) compression cycle.
+ # - False negative cost: user is stuck — every retry hits the same wall.
+ # So the matcher is intentionally permissive.
+ #
+ # Coverage (verified against real production error strings):
+ #
+ # OpenAI:
+ # "This model's maximum context length is 128000 tokens. However
+ # you requested ... Please reduce the length of the messages."
+ # error.code == "context_length_exceeded"
+ #
+ # Anthropic:
+ # "prompt is too long: 218849 tokens > 200000 maximum"
+ #
+ # Qwen / Alibaba (DashScope):
+ # "You passed 117345 input tokens and requested 8192 output tokens.
+ # However the model's context length is only 125536 tokens, resulting
+ # in a maximum input length of 117344 tokens. Please reduce the length
+ # of the input prompt. (parameter=input_tokens, value=117345)"
+ #
+ # Qwen / Alibaba (DashScope) — newer/terser format (qwen3.6 series):
+ # "InternalError.Algo.InvalidParameter: Range of input length should be [1, 229376]"
+ #
+ # DeepSeek / Kimi / MiniMax / most OpenAI-compatible relays:
+ # Variants of OpenAI-style "context length" / "tokens exceeds" wording.
+ #
+ # Generic gateways (Portkey, OpenRouter):
+ # "The total number of tokens exceeds the model's maximum context length"
+ private def context_too_long_error?(err)
+ return false unless err.is_a?(Clacky::BadRequestError)
+
+ msg = err.message.to_s.downcase
+
+ # Strong phrases — any one of these is conclusive on its own.
+ # Each phrase is two-or-more semantic words to avoid single-word noise.
+ strong_phrases = [
+ "context length", # OpenAI / Qwen / many compat APIs
+ "context_length_exceeded", # OpenAI error.code
+ "maximum context", # OpenAI variant
+ "maximum input length", # Qwen
+ "prompt is too long", # Anthropic
+ "input is too long", # Anthropic-compat relays
+ "exceeds the maximum context", # Portkey & generic gateways
+ "exceeds the model's context", # Generic
+ "exceeds the model's maximum", # Generic
+ "reduce the length of the input", # Qwen action hint
+ "reduce the length of the messages", # OpenAI action hint
+ "reduce the length of your", # Generic action hint
+ "reduce the length of the prompt", # Generic action hint
+ "range of input length" # Qwen DashScope qwen3.6+ terse format
+ ]
+ return true if strong_phrases.any? { |p| msg.include?(p) }
+
+ # Pattern 1: Anthropic-style "<N> tokens > <N> maximum"
+ return true if msg =~ /\d+\s*tokens?\s*>\s*\d+/
+
+ # Pattern 2: Qwen-style structured field "parameter=input_tokens"
+ return true if msg.include?("parameter=input_tokens")
+
+ false
+ end
+
+ # Detect upstream tool-call truncation and raise UpstreamTruncatedError
+ # so the standard RetryableError rescue (with fallback model support)
+ # handles retry identically to 5xx/429.
+ #
+ # Background: OpenRouter routes to Anthropic/Bedrock/etc. and passes
+ # through whatever the upstream sends. If the upstream closes the SSE
+ # stream mid-tool_use (observed with Anthropic at ~127 s TTFT under
+ # load), OpenRouter does NOT surface an error — it emits a valid
+ # `tool_calls[]` whose `arguments` is empty, `"{}"`, or non-parseable
+ # JSON. Without this check the agent would either execute the tool with
+ # empty args or (worse) silently exit thinking the task finished.
+ #
+ # Rule is deliberately narrow: we only intercept the case where the
+ # model streamed literally nothing into the tool_call arguments —
+ # i.e. `nil`, empty string, or the placeholder `"{}"`. Partial/invalid
+ # JSON (e.g. `{"path": "/tmp/x"`) is left to the existing
+ # ArgumentsParser → BadArgumentsError path, because the model already
+ # committed to specific values and feeding the parse error back as a
+ # tool_result lets it self-correct in one round-trip (faster than a
+ # blind retry from scratch).
+ private def detect_upstream_truncation!(response)
+ tool_calls = response[:tool_calls]
+ return if tool_calls.nil? || tool_calls.empty?
+
+ truncated = tool_calls.find { |tc| tool_call_args_truncated?(tc[:arguments]) }
+ return unless truncated
+
+ args_str = truncated[:arguments].is_a?(String) ? truncated[:arguments] : truncated[:arguments].to_s
+ Clacky::Logger.warn("llm.upstream_truncation_detected",
+ model: current_model,
+ tool_name: truncated[:name].to_s,
+ args_len: args_str.length,
+ args_head: args_str[0, 80],
+ finish_reason: response[:finish_reason].to_s,
+ completion_tokens: response.dig(:token_usage, :completion_tokens),
+ ttft_ms: response.dig(:latency, :ttft_ms)
+ )
+
+ # Inject a one-shot [SYSTEM] hint so a plain retry isn't doomed to the
+ # same fate when the truncation correlates with large tool_call args
+ # (e.g. writing a 5000-char file in one go). For infrastructure-level
+ # blips this hint is harmless — the retry usually succeeds on its own
+ # and the hint just sits in history without affecting behaviour.
+ inject_upstream_truncation_hint_if_first(truncated)
+
+ raise Clacky::UpstreamTruncatedError,
+ "[LLM] Upstream truncated tool_call `#{truncated[:name]}` " \
+ "(args=#{args_str[0, 40].inspect}). Retrying..."
+ end
+
+ # True when a tool_call's arguments field looks COMPLETELY empty —
+ # i.e. the upstream stream was cut before the model wrote any real
+ # content into the arguments JSON.
+ #
+ # Rules:
+ # - nil / non-String / empty string → truncated (nothing at all)
+ # - parses to {} (empty object) → truncated (placeholder only)
+ # - anything else (including partial/invalid JSON like `{"path":
+ # "/tmp/x"` where the model already started writing) → NOT
+ # truncated by this detector
+ #
+ # Partial-JSON cases are deliberately left to the existing
+ # ArgumentsParser → BadArgumentsError path, which surfaces the parse
+ # error back to the LLM as a tool_result so it can self-correct. That
+ # is more efficient than a blind retry when the model already wrote
+ # most of the args.
+ private def tool_call_args_truncated?(args)
+ return true if args.nil?
+ return true unless args.is_a?(String)
+ return true if args.empty?
+
+ parsed = begin
+ JSON.parse(args)
+ rescue JSON::ParserError
+ # Partial/invalid JSON — let ArgumentsParser handle it downstream.
+ return false
+ end
+
+ parsed.is_a?(Hash) && parsed.empty?
+ end
+
  # On the FIRST Faraday::TimeoutError within a task, append a [SYSTEM]
  # user message to the history instructing the model to break its work
  # into smaller steps. Subsequent timeouts in the same task are ignored
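The classification rule implemented by `tool_call_args_truncated?` above depends only on the Ruby stdlib, so it can be exercised in isolation. The helper below mirrors that logic for demonstration (the standalone name `args_truncated?` is ours, not the gem's):

```ruby
require "json"

# Same decision rule as the detector above: only a completely empty
# arguments payload (nil, non-String, "", or the "{}" placeholder) counts
# as truncated; partial JSON is left for the normal parse-error path.
def args_truncated?(args)
  return true if args.nil?
  return true unless args.is_a?(String)
  return true if args.empty?

  parsed = begin
    JSON.parse(args)
  rescue JSON::ParserError
    # Partial/invalid JSON: the model already committed to values,
    # so the downstream arguments parser should handle it instead.
    return false
  end
  parsed.is_a?(Hash) && parsed.empty?
end
```

This makes the asymmetry visible: `"{}"` is intercepted and retried, while a cut-off string like `'{"path": "/tmp/x"'` deliberately passes through.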
@@ -345,6 +700,54 @@ module Clacky
  "LLM response timed out — asking model to break the task into smaller steps and retrying..."
  )
  end
+
+ # On the FIRST upstream-truncation detection within a task, append a
+ # [SYSTEM] user message nudging the model toward smaller tool_call args.
+ # This guards against the (real but rare) case where the upstream SSE
+ # cut correlates with large tool_call payloads — a plain retry on the
+ # same oversized args would keep tripping the same wire.
+ #
+ # For purely infrastructural truncations (Anthropic edge blip, router
+ # hiccup), the hint is harmless — the retry will succeed and the hint
+ # just sits unused in history. Cheaper than letting the agent burn
+ # through its retry budget on the same oversized payload.
+ #
+ # Same plumbing as inject_large_output_hint_if_first_timeout: one-shot
+ # per task, carries `system_injected: true` so it's hidden from UI
+ # replay and skipped by compression/caching placement logic. Reset per
+ # task via Agent#run (see @task_upstream_truncation_hint_injected).
+ private def inject_upstream_truncation_hint_if_first(truncated_call)
+ return if @task_upstream_truncation_hint_injected
+
+ @task_upstream_truncation_hint_injected = true
+
+ tool_name = truncated_call[:name].to_s
+ hint = "[SYSTEM] The previous response was cut short by the upstream provider " \
+ "before the `#{tool_name}` tool_call finished streaming. " \
+ "The partial tool_call has been discarded. To avoid the same problem on retry, " \
+ "please adapt your approach:\n" \
+ "- Prefer smaller tool_call arguments — large single-shot payloads are more likely to be truncated.\n" \
+ "- For long file content: create the file first with a minimal skeleton via `write`, " \
+ "then append sections one at a time with `edit`.\n" \
+ "- Break large tasks into multiple smaller tool calls instead of one big one.\n" \
+ "- Keep each tool-call argument comfortably under ~2000 characters when possible."
+
+ @history.append({
+ role: "user",
+ content: hint,
+ system_injected: true,
+ task_id: @current_task_id
+ })
+
+ Clacky::Logger.info(
+ "[llm_caller] Upstream truncation — injected 'smaller tool_call args' hint " \
+ "(tool=#{tool_name.inspect})"
+ )
+
+ @ui&.show_warning(
+ "Upstream response was truncated mid tool-call — asking model to use smaller steps and retrying..."
+ )
+ end
  end
  end
  end
@@ -93,8 +93,13 @@ module Clacky
  # @param original_messages [Array<Hash>] Original messages before compression
  # @param recent_messages [Array<Hash>] Recent messages to preserve
  # @param chunk_path [String, nil] Path to the archived chunk MD file (if saved)
- # @return [Array<Hash>] Rebuilt message list: system + compressed + recent
- def rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [])
+ # @param pulled_back_messages [Array<Hash>] Messages temporarily popped from the
+ # tail of @history before the compression LLM call (to free up token budget so
+ # the compression call itself doesn't overflow context). These are NOT discarded —
+ # they are reattached to the tail of the rebuilt history so recent task progress
+ # is preserved. Default: [] (normal compression path doesn't need this).
+ # @return [Array<Hash>] Rebuilt message list: system + compressed + recent + pulled_back
+ def rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [], pulled_back_messages: [])
  # Find and preserve system message
  system_msg = original_messages.find { |m| m[:role] == "system" }

@@ -112,13 +117,19 @@ module Clacky
  raise "LLM compression failed: unable to parse compressed messages"
  end

- # Return system message + compressed messages + recent messages.
+ # Return system message + compressed messages + recent messages + pulled_back messages.
  # Strip any system messages from recent_messages as a safety net —
  # get_recent_messages_with_tool_pairs already excludes them, but this
  # guard ensures we never end up with duplicate system prompts even if
  # the caller passes an unfiltered list.
+ #
+ # pulled_back_messages: messages that were temporarily popped from the tail
+ # of @history before the compression LLM call (to free up token budget so
+ # the compression call itself doesn't overflow context). They are reattached
+ # here to preserve recent task progress.
  safe_recent = recent_messages.reject { |m| m[:role] == "system" }
- [system_msg, *parsed_messages, *safe_recent].compact
+ safe_pulled_back = pulled_back_messages.reject { |m| m[:role] == "system" }
+ [system_msg, *parsed_messages, *safe_recent, *safe_pulled_back].compact
  end


@@ -103,8 +103,24 @@ module Clacky

  # Check if compression is needed and return compression context
  # @param force [Boolean] Force compression even if thresholds not met
+ # @param pull_back_from_tail [Integer] Number of messages to temporarily pop
+ # from the tail of history before building the compression instruction.
+ # Used by the context-overflow recovery path: when the current history
+ # is already at/over the model's context window, we cannot append even
+ # a small compression instruction without overflowing. Popping K messages
+ # from the tail frees up token budget for the compression call itself.
+ #
+ # Cache-preservation note: thanks to the model's two-checkpoint prompt
+ # cache (cache#A at second-to-last, cache#B at last), pulling back K=1
+ # message keeps cache#A intact — the compression LLM call still hits the
+ # cached prefix [system, m1..m(N-1)]. K>=2 sacrifices cache hits but is
+ # only used as fallback when one message isn't enough headroom.
+ #
+ # The popped messages are NOT discarded — they ride along in the
+ # returned context and are reattached to the rebuilt history's tail by
+ # handle_compression_response, so recent task progress is preserved.
  # @return [Hash, nil] Compression context or nil if not needed
- def compress_messages_if_needed(force: false)
+ def compress_messages_if_needed(force: false, pull_back_from_tail: 0)
  # Check if compression is enabled
  return nil unless @config.enable_compression

@@ -148,6 +164,27 @@ module Clacky

  # Get the most recent N messages, ensuring tool_calls/tool results pairs are kept together
  all_messages = @history.to_a
+
+ # Pull back K messages from the tail (context-overflow recovery path).
+ # We *physically* remove them from @history so the next call_llm
+ # (which reads @history.to_api) doesn't include them in the prompt.
+ # They will be reattached to the rebuilt history's tail by
+ # handle_compression_response after compression succeeds. If compression
+ # fails, the caller is responsible for restoring them via the returned
+ # context (rollback path).
+ pulled_back_messages = []
+ if pull_back_from_tail > 0
+ k = [pull_back_from_tail, all_messages.size - 1].min # never pop the system message
+ k.times do
+ popped = @history.pop_last
+ pulled_back_messages.unshift(popped) if popped
+ end
+ # Recompute all_messages from the now-shrunk history so downstream
+ # logic (recent_messages selection, build_compression_message) sees
+ # the post-pop view.
+ all_messages = @history.to_a
+ end
+
  recent_messages = get_recent_messages_with_tool_pairs(all_messages, target_recent_count)
  recent_messages = [] if recent_messages.nil?

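The pop/unshift pairing in the hunk above preserves chronological order in `pulled_back_messages`, which matters when they are reattached after compression. A minimal sketch of that invariant using a plain Array in place of the gem's history object (illustrative only; `pull_back_from_tail` here is a free function, not the gem's keyword argument):

```ruby
# Pop K messages off the tail, unshifting each onto pulled_back so the
# result keeps the original chronological order; never pop index 0
# (the system message stand-in).
def pull_back_from_tail(history, k)
  k = [k, history.size - 1].min
  pulled_back = []
  k.times do
    popped = history.pop
    pulled_back.unshift(popped) if popped
  end
  pulled_back
end
```

Because each popped element is unshifted rather than pushed, appending `pulled_back` back onto a rebuilt history reproduces the original message order exactly.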
@@ -160,6 +197,7 @@ module Clacky
  {
  compression_message: compression_message,
  recent_messages: recent_messages,
+ pulled_back_messages: pulled_back_messages,
  original_token_count: total_tokens,
  original_message_count: @history.size,
  compression_level: @compression_level
@@ -227,7 +265,8 @@ module Clacky
  recent_messages: compression_context[:recent_messages],
  chunk_path: chunk_path,
  topics: topics,
- previous_chunks: previous_chunks
+ previous_chunks: previous_chunks,
+ pulled_back_messages: compression_context[:pulled_back_messages] || []
  ))

  # Reset to the estimated size of the rebuilt (small) history.