openclacky 1.0.1 → 1.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. checksums.yaml +4 -4
  2. data/CHANGELOG.md +33 -0
  3. data/lib/clacky/agent/llm_caller.rb +403 -0
  4. data/lib/clacky/agent/message_compressor.rb +15 -4
  5. data/lib/clacky/agent/message_compressor_helper.rb +41 -2
  6. data/lib/clacky/agent/tool_registry.rb +109 -0
  7. data/lib/clacky/agent.rb +69 -2
  8. data/lib/clacky/agent_config.rb +17 -0
  9. data/lib/clacky/cli.rb +65 -0
  10. data/lib/clacky/default_skills/channel-setup/SKILL.md +57 -3
  11. data/lib/clacky/default_skills/onboard/SKILL.md +14 -5
  12. data/lib/clacky/default_skills/onboard/scripts/install_builtin_skills.rb +175 -0
  13. data/lib/clacky/default_skills/skill-add/scripts/install_from_zip.rb +59 -26
  14. data/lib/clacky/providers.rb +57 -3
  15. data/lib/clacky/server/channel/adapters/feishu/adapter.rb +14 -0
  16. data/lib/clacky/server/channel/adapters/feishu/bot.rb +10 -0
  17. data/lib/clacky/server/channel/adapters/feishu/message_parser.rb +1 -0
  18. data/lib/clacky/server/channel/adapters/weixin/adapter.rb +7 -0
  19. data/lib/clacky/server/channel/channel_manager.rb +103 -4
  20. data/lib/clacky/server/channel/channel_ui_controller.rb +8 -2
  21. data/lib/clacky/server/discover.rb +77 -0
  22. data/lib/clacky/server/epipe_safe_io.rb +105 -0
  23. data/lib/clacky/server/http_server.rb +90 -46
  24. data/lib/clacky/server/server_master.rb +6 -0
  25. data/lib/clacky/skill.rb +30 -0
  26. data/lib/clacky/utils/file_processor.rb +14 -40
  27. data/lib/clacky/utils/model_pricing.rb +95 -0
  28. data/lib/clacky/version.rb +1 -1
  29. data/lib/clacky/web/app.css +157 -31
  30. data/lib/clacky/web/i18n.js +18 -2
  31. data/lib/clacky/web/index.html +8 -2
  32. data/lib/clacky/web/onboard.js +77 -1
  33. data/lib/clacky/web/sessions.js +31 -19
  34. data/lib/clacky/web/settings.js +127 -6
  35. data/lib/clacky/web/skills.js +4 -0
  36. data/lib/clacky.rb +5 -0
  37. metadata +5 -2
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9d6ba5a62f7a352730705db11aff8ab76af059764903eb4413bd5a0aa835fecf
- data.tar.gz: 58ba8fdcf23b5dabcc4a8ed709be0f34a9d27a5be83601fee685a638eb3ff445
+ metadata.gz: 448b47d4336764c1646147f9b86fc04f8bad84a34565b9b67cbf558000c185bf
+ data.tar.gz: 827ace1367511360cd6586f5a89529b504d31cdce68d5ecd90fadbe92069c2b5
  SHA512:
- metadata.gz: 00e3f00119cad74d7da43519a1a12332e509c0050946d713dea17db539bbadf0099e96ea5369cc19046fd0bc1c224849cbbaf43addfe0708858780a370067b3b
- data.tar.gz: 4e7888c952dd49c664c67212c0986b62bd7745887dae7d85bce14b3f36c544fc5bd9ca27f1851f04e14477cfd9316938605b6ae0f89b19652cadd1442c6dc564
+ metadata.gz: 667591fbe92e0e4d01de03cd1e9924ff595a1a11fa5196a7b338675366e37445d7cfe02844fc6bd1eb768ab54134d56195a9573fb95dc20c57d448429bcfb8d2
+ data.tar.gz: b324a9f5161eb7574f846736c200341fb2f4db39786f3bd5c2210c178a8c2ed115a520251e36da4ed82572ea6ebf0e88d1072a51536a817169d0838ae86d7dea
data/CHANGELOG.md CHANGED
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [1.0.3] - 2026-05-09
+
+ ### Added
+ - **Channel send command — push messages from CLI/agent to IM channels.** New `clacky channel send` CLI command and full outbound channel pipeline. The agent can now actively reach out to users on Feishu/WeCom/WeChat (e.g. for cron tasks or background completions) instead of only replying. Includes a new `ChannelManager` for routing, multi-master server discovery, and proper `chat_id` extraction for outbound messages. (#73)
+ - **`--model` flag to override the model per invocation.** Run any one-off command with a different model without changing config: `clacky --model gpt-4o-mini "..."`. Useful for quick comparisons or routing specific tasks to cheaper/faster models. (#76)
+ - **Fuzzy tool-name resolution for cross-model compatibility.** When a model emits a slightly off tool name (e.g. `read_file` vs `file_reader`, case mismatches, or hyphen/underscore differences), the agent now resolves it to the closest registered tool instead of erroring out. Significantly improves reliability when switching between Claude, GPT, and other providers. (#78)
+ - **Context overflow auto-recovery.** When an upstream LLM call hits a context-length error, the agent now detects it via `LlmCaller`'s error classification and automatically compresses message history to retry — instead of bubbling a hard error to the user. Backed by 175 new error-detection and 169 new recovery specs.
+ - **Refined session list UI with SVG icons.** Reworked sidebar session list with crisp SVG icons and tightened styling for a more polished look. (#83)
+
+ ### Fixed
+ - **EPIPE crashes when stdout/stderr is closed.** Wrapped server I/O in `EpipeSafeIO` so the master/web server no longer crashes when its output stream goes away (e.g. terminal closed, pipe broken). Covered by 193 new specs.
+ - **Duplicate `$` in CLI completion line.** Removed the stray dollar sign that appeared at the end of completed commands. (C-5583, #86)
+ - **Session list scroll jump on "load more".** The list no longer snaps back to the top when older sessions are paginated in. (C-5568, #85)
+ - Reverted an earlier message line-wrap change (#74) that caused regressions; will be revisited. (#84)
+
+
+
+ ### Added
+ - **Multi-region provider endpoints.** Providers can now expose multiple endpoint variants (e.g. global vs. CN-optimized Anthropic), and you can switch between them from both the onboarding flow and the Settings page. Bundled with updated model pricing data so cost estimates stay accurate across regions. (#67)
+ - **Pre-installed platform-recommended skills during onboarding.** New users get a curated set of skills automatically during onboard — downloaded concurrently with dual-host fallback and a hard deadline so onboarding never hangs on a slow mirror. (#68)
+ - **Builtin skills served via platform API.** Recommended skills are now fetched through `/api/v1/skills/builtin`, making the list easier to update without shipping a new gem. (#72)
+ - **Feishu group chats: respond only when @-mentioned.** The Feishu adapter now parses the mentions array and ignores group messages that don't @ the bot, so the bot no longer replies to every message in a busy group. Sessions are also isolated per (chat, user) pair by default (`:chat_user` binding mode), preventing context leaks between DMs and groups. (#71)
+
+ ### Fixed
+ - **Recover from truncated upstream tool calls.** When an upstream LLM response cuts off mid tool-call, the agent now detects the truncation and recovers automatically instead of getting stuck. Covered by extensive new tests.
+ - **Feedback option click now sends the message.** Clicking a suggested feedback option previously set the input text but silently failed to send (due to a `sendMessage` vs `_sendMessage` scope bug). Now it dispatches immediately as expected. (#69)
+ - **Sidebar footer and input area heights aligned.** Introduced a shared `--footer-height` CSS variable (56px) and reworked the stop button to use a pseudo-element square for pixel-perfect centering — both columns now line up cleanly. (#70)
+ - **Feishu bot fails closed on API outage.** If `/open-apis/bot/v3/info` fails and `bot_open_id` can't be resolved, the adapter now drops group messages (with a warning) instead of spamming every group message as a fallback.
+ - **`preview.md` no longer pollutes user project directories.** Preview files are written to the system tmpdir, and plain text formats (md/log/csv) skip preview generation entirely since they're already readable as-is.
+
+ ### More
+ - Added agent stop logging to make interrupt / stop chains easier to debug.
+
  ## [1.0.1] - 2026-05-06

  ### Added
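The fuzzy tool-name resolution described in #78 can be sketched roughly as follows. This is a minimal illustration only: the normalization rule, the token-overlap scoring, and the `resolve_tool` helper are assumptions based on the changelog wording, not the gem's actual `tool_registry.rb` code.

```ruby
# Hypothetical sketch of fuzzy tool-name resolution. Normalize case and
# hyphen/underscore differences first; if no exact match remains, fall
# back to a token-overlap score so near-miss names still resolve.
def normalize_tool_name(name)
  name.to_s.downcase.tr("-", "_")
end

def resolve_tool(requested, registered)
  wanted = normalize_tool_name(requested)
  exact = registered.find { |t| normalize_tool_name(t) == wanted }
  return exact if exact

  # Near-miss scoring: count requested tokens that prefix-match a token
  # of the candidate ("read" matches "reader", "file" matches "file").
  wanted_tokens = wanted.split("_")
  registered.max_by do |candidate|
    tokens = normalize_tool_name(candidate).split("_")
    wanted_tokens.count { |w| tokens.any? { |t| t.start_with?(w) || w.start_with?(t) } }
  end
end
```

A real registry would likely also require a minimum score before resolving, so a completely unrelated name still errors out instead of silently mapping to the wrong tool.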
@@ -79,6 +79,14 @@ module Clacky
  # the error is something else and we let it propagate.
  force_reasoning_content_pad = false
  thinking_retry_attempted = false
+ # One-shot flag for context-overflow recovery. When the server complains
+ # the input exceeds the model's context window, we run a forced
+ # compression with pull_back_from_tail: 1 (preserves the model's
+ # two-checkpoint prompt cache) and retry the original request once.
+ # We retry at most once — if still overflowing afterward, the issue is
+ # something else (e.g. tool schemas alone exceed the window) and we let
+ # the error propagate.
+ context_overflow_retry_attempted = false

  begin
  begin
@@ -101,6 +109,19 @@ module Clacky
  # Successful response — if we were probing, confirm primary is healthy.
  handle_probe_success if @config.probing?

+ # ── Upstream truncation detector ──────────────────────────────────
+ # OpenRouter / Bedrock and other routers sometimes close the SSE
+ # stream mid-tool_use: we receive finish_reason="stop" together with
+ # a syntactically valid tool_call whose `arguments` JSON is empty,
+ # "{}" (placeholder before any key was streamed), or otherwise
+ # unparseable. Treat this as retryable — otherwise the agent would
+ # execute a tool with empty args (often failing cryptically) or
+ # silently exit thinking the task is done.
+ #
+ # Raises UpstreamTruncatedError (a RetryableError) so the rescue
+ # block below handles retry + fallback identically to 5xx/429.
+ detect_upstream_truncation!(response)
+
  rescue Faraday::TimeoutError => e
  # ── Read-timeout path (distinct from connection-level failures) ──
  # Faraday::TimeoutError on our non-streaming POST almost always means
@@ -207,6 +228,55 @@ module Clacky
  end

  rescue Clacky::BadRequestError => e
+ # One-shot recovery for "context too long" errors. The model's
+ # context window is exceeded by the current history+tools+system
+ # prompt. We run a forced compression with pull_back_from_tail: 1
+ # (preserves the two-checkpoint prompt cache so the compression
+ # call itself still hits cache#A on the second-to-last position),
+ # then retry the original request once.
+ if !context_overflow_retry_attempted &&
+ !@compressing_for_overflow &&
+ context_too_long_error?(e) &&
+ respond_to?(:compress_messages_if_needed, true)
+ context_overflow_retry_attempted = true
+ Clacky::Logger.info(
+ "[context-overflow] caught BadRequestError, attempting forced compression with pull-back",
+ error_message: e.message[0, 200],
+ history_size: @history.size,
+ previous_total_tokens: @previous_total_tokens
+ )
+ # Layer 1: standard cache-preserving compression (pull_back: 1).
+ # Handles 99% of real overflow cases (newest message tipped the
+ # request just past the window).
+ if perform_context_overflow_compression(mode: :standard)
+ retry
+ end
+
+ # Layer 2: aggressive fallback. The Layer 1 compression call
+ # itself overflowed — happens when a single newly-appended
+ # message is enormous (huge tool_result, pasted file, etc.) so
+ # popping just K=1 didn't bring the request below the window.
+ # Pop ~half the history this time; sacrifices prompt cache to
+ # guarantee the compression call fits.
+ Clacky::Logger.warn(
+ "[context-overflow] standard compression failed, escalating to aggressive mode"
+ )
+ if perform_context_overflow_compression(mode: :aggressive)
+ retry
+ end
+
+ # Both layers exhausted. Let the original error propagate so the
+ # user sees the underlying provider message. This should be
+ # extremely rare — would require both halves of the history to
+ # individually exceed the window, which is essentially impossible
+ # under the "previous turn succeeded" invariant.
+ Clacky::Logger.error(
+ "[context-overflow] both standard and aggressive compression failed; " \
+ "propagating original error"
+ )
+ raise
+ end
+
  # One-shot recovery for thinking-mode providers (DeepSeek V4, Kimi K2)
  # that require every assistant message in the history to carry a
  # reasoning_content field. The history-evidence heuristic in
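The rescue block above is an instance of Ruby's `begin`/`rescue`/`retry` idiom with a one-shot guard flag. A self-contained sketch of the same control flow, with stubbed request and compression methods (all names in this sketch are illustrative stand-ins, not the gem's API):

```ruby
# Hypothetical stand-in for the gem's BadRequestError.
class BadRequestError < StandardError; end

# Simulated agent: the fake request fails with a context-length error
# until compression shrinks the history below the window.
class OverflowDemo
  attr_reader :compressions

  def initialize(history_tokens, window)
    @history_tokens = history_tokens
    @window = window
    @compressions = []
  end

  def call_with_overflow_recovery
    overflow_retry_attempted = false
    begin
      perform_request
    rescue BadRequestError => e
      # One-shot guard: a second overflow after compression propagates.
      raise if overflow_retry_attempted || !context_too_long?(e)
      overflow_retry_attempted = true
      # Layer 1 first; short-circuit runs Layer 2 only if Layer 1 fails.
      retry if compress(mode: :standard) || compress(mode: :aggressive)
      raise
    end
  end

  private

  def perform_request
    raise BadRequestError, "prompt is too long" if @history_tokens > @window
    :ok
  end

  def context_too_long?(err)
    err.message.include?("too long")
  end

  def compress(mode:)
    @compressions << mode
    # Standard frees a little headroom; aggressive halves the history.
    @history_tokens = mode == :standard ? @history_tokens - 10 : @history_tokens / 2
    true
  end
end
```

Because `retry` re-enters the `begin` body while local variables survive, the guard flag guarantees at most one compression cycle per request, matching the comment in the diff.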
@@ -230,6 +300,49 @@ module Clacky
  token_data = track_cost(response[:usage], raw_api_usage: response[:raw_api_usage])
  response[:token_usage] = token_data

+ # [DIAG] Log raw client response shape. Only emit when we see the
+ # "finish_reason=stop + non-empty tool_calls" combo, or when any
+ # tool_call's arguments look empty/unparseable — both indicate the
+ # upstream (Bedrock/relay/model) cut the tool_use stream short.
+ # Normal responses produce no log line (too noisy).
+ begin
+ tool_calls = response[:tool_calls] || []
+ if !tool_calls.empty?
+ raw_tcs = tool_calls.map do |c|
+ args_str = c[:arguments].is_a?(String) ? c[:arguments] : c[:arguments].to_s
+ parseable = begin
+ JSON.parse(args_str)
+ true
+ rescue StandardError
+ false
+ end
+ {
+ name: c[:name].to_s,
+ args_len: args_str.length,
+ args_parseable: parseable,
+ args_head: args_str[0, 120]
+ }
+ end
+ truncated_call = raw_tcs.any? { |t| t[:args_len] == 0 || t[:args_len] == 2 || !t[:args_parseable] }
+ suspicious = response[:finish_reason] == "stop"
+
+ if suspicious || truncated_call
+ Clacky::Logger.warn("llm.response_suspicious",
+ model: current_model,
+ finish_reason: response[:finish_reason].to_s,
+ tool_calls_count: raw_tcs.size,
+ tool_calls: raw_tcs,
+ completion_tokens: token_data[:completion_tokens],
+ ttft_ms: response.dig(:latency, :ttft_ms),
+ combo_stop_with_toolcalls: suspicious,
+ has_truncated_args: truncated_call
+ )
+ end
+ end
+ rescue StandardError => e
+ Clacky::Logger.warn("llm.response_log_failed", error: e.message)
+ end
+
  response
  ensure
  # Close any "retrying" progress slot that was opened during the
@@ -286,6 +399,101 @@ module Clacky
  )
  end

+ # Run a forced compression to recover from a context-overflow error.
+ # Called by the BadRequestError rescue when context_too_long_error?
+ # returns true.
+ #
+ # Two-layer defence:
+ # ────────────────────────────────────────────────────────────────────
+ # Layer 1 (mode: :standard, default) — preserves prompt cache.
+ # Pop K=1 message from @history tail, then run compression. This
+ # frees just enough token budget for the compression LLM call
+ # itself to fit, while preserving the model's two-checkpoint prompt
+ # cache (cache#A at second-to-last position is still hit). The
+ # popped message is reattached to the rebuilt history's tail by
+ # handle_compression_response, so recent task progress is not lost.
+ # Handles 99% of real-world cases where overflow is caused by the
+ # newest message pushing total just past the window.
+ #
+ # Layer 2 (mode: :aggressive) — sacrifices prompt cache to survive.
+ # Pop ~half the history (capped) from the tail. This dramatically
+ # shrinks the compression call's input regardless of how big any
+ # single message is. Used as a fallback when Layer 1 itself raises
+ # context_too_long — i.e. a single newly-appended message is so
+ # large (e.g. >50K-token tool_result, pasted huge file) that even
+ # removing it didn't bring the request under the window, OR the
+ # popped message was small but earlier history grew past the limit.
+ # Pulled-back messages are still reattached after compression so no
+ # user content is silently dropped.
+ #
+ # @param mode [Symbol] :standard or :aggressive
+ # @return [Boolean] true if compression succeeded (caller should retry
+ # the original request), false if compression was unable to run
+ # (compression disabled, history too short, etc.) or itself failed
+ # — caller decides whether to escalate to the next layer or
+ # propagate the original error.
+ private def perform_context_overflow_compression(mode: :standard)
+ return false unless respond_to?(:compress_messages_if_needed, true)
+
+ # Compute pull-back count.
+ # Standard: K=1 (cache-preserving).
+ # Aggressive: pop ~half the history, but never less than 4 and never
+ # more than (history_size - 2) so we always keep system + at least
+ # one recent message. Capped at 64 to bound the worst case (an
+ # enormous history that should never realistically occur).
+ pull_back =
+ if mode == :aggressive
+ half = @history.size / 2
+ [[half, 4].max, [@history.size - 2, 64].min].min
+ else
+ 1
+ end
+
+ @compressing_for_overflow = true
+ compression_context = nil
+
+ begin
+ compression_context = compress_messages_if_needed(
+ force: true,
+ pull_back_from_tail: pull_back
+ )
+ return false if compression_context.nil?
+
+ compression_message = compression_context[:compression_message]
+ @history.append(compression_message)
+
+ response = call_llm # recursive — guarded by @compressing_for_overflow
+ handle_compression_response(response, compression_context)
+ Clacky::Logger.info(
+ "[context-overflow] compression succeeded",
+ mode: mode,
+ pull_back: pull_back
+ )
+ true
+ rescue => e
+ # Compression failed mid-flight. Restore @history to a sensible state:
+ # roll back the compression instruction we appended, and re-append the
+ # pulled-back messages so the user's recent work isn't silently lost.
+ if compression_context
+ cm = compression_context[:compression_message]
+ @history.rollback_before(cm) if cm
+ (compression_context[:pulled_back_messages] || []).each do |m|
+ @history.append(m)
+ end
+ end
+ Clacky::Logger.warn(
+ "[context-overflow] compression failed during overflow recovery",
+ mode: mode,
+ pull_back: pull_back,
+ error_class: e.class.name,
+ error_message: e.message[0, 200]
+ )
+ false
+ ensure
+ @compressing_for_overflow = false
+ end
+ end
+
  # True when a 400 BadRequestError is specifically about a missing
  # reasoning_content field in thinking mode (DeepSeek V4, Kimi K2 thinking).
  # We require TWO distinct substrings to avoid false positives — a generic
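The aggressive pull-back count in the method above is computed as `[[half, 4].max, [history_size - 2, 64].min].min`. Extracting that clamp into a standalone helper makes its boundary behaviour easy to verify (a sketch for illustration; `aggressive_pull_back` is not a method in the gem):

```ruby
# Mirror of the aggressive-mode clamp: pop roughly half the history, at
# least 4 messages, but never more than history_size - 2 (keep the system
# prompt plus one recent message) and never more than 64 in total.
def aggressive_pull_back(history_size)
  half = history_size / 2
  [[half, 4].max, [history_size - 2, 64].min].min
end
```

Note that the size-2 cap wins over the minimum-of-4 floor for tiny histories (size 4 yields 2), so the system message and at least one recent message always survive.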
@@ -302,6 +510,153 @@ module Clacky
  msg.include?("must be provided"))
  end

+ # True when a 400 BadRequestError indicates the request exceeded the
+ # model's context window (i.e. the conversation history is too long).
+ #
+ # We deliberately favour broad detection over narrow precision:
+ # - False positive cost: one extra (no-op) compression cycle.
+ # - False negative cost: user is stuck — every retry hits the same wall.
+ # So the matcher is intentionally permissive.
+ #
+ # Coverage (verified against real production error strings):
+ #
+ # OpenAI:
+ # "This model's maximum context length is 128000 tokens. However
+ # you requested ... Please reduce the length of the messages."
+ # error.code == "context_length_exceeded"
+ #
+ # Anthropic:
+ # "prompt is too long: 218849 tokens > 200000 maximum"
+ #
+ # Qwen / Alibaba (DashScope):
+ # "You passed 117345 input tokens and requested 8192 output tokens.
+ # However the model's context length is only 125536 tokens, resulting
+ # in a maximum input length of 117344 tokens. Please reduce the length
+ # of the input prompt. (parameter=input_tokens, value=117345)"
+ #
+ # Qwen / Alibaba (DashScope) — newer/terser format (qwen3.6 series):
+ # "InternalError.Algo.InvalidParameter: Range of input length should be [1, 229376]"
+ #
+ # DeepSeek / Kimi / MiniMax / most OpenAI-compatible relays:
+ # Variants of OpenAI-style "context length" / "tokens exceeds" wording.
+ #
+ # Generic gateways (Portkey, OpenRouter):
+ # "The total number of tokens exceeds the model's maximum context length"
+ private def context_too_long_error?(err)
+ return false unless err.is_a?(Clacky::BadRequestError)
+
+ msg = err.message.to_s.downcase
+
+ # Strong phrases — any one of these is conclusive on its own.
+ # Each phrase is two-or-more semantic words to avoid single-word noise.
+ strong_phrases = [
+ "context length", # OpenAI / Qwen / many compat APIs
+ "context_length_exceeded", # OpenAI error.code
+ "maximum context", # OpenAI variant
+ "maximum input length", # Qwen
+ "prompt is too long", # Anthropic
+ "input is too long", # Anthropic-compat relays
+ "exceeds the maximum context", # Portkey & generic gateways
+ "exceeds the model's context", # Generic
+ "exceeds the model's maximum", # Generic
+ "reduce the length of the input", # Qwen action hint
+ "reduce the length of the messages", # OpenAI action hint
+ "reduce the length of your", # Generic action hint
+ "reduce the length of the prompt", # Generic action hint
+ "range of input length" # Qwen DashScope qwen3.6+ terse format
+ ]
+ return true if strong_phrases.any? { |p| msg.include?(p) }
+
+ # Pattern 1: Anthropic-style "<N> tokens > <N> maximum"
+ return true if msg =~ /\d+\s*tokens?\s*>\s*\d+/
+
+ # Pattern 2: Qwen-style structured field "parameter=input_tokens"
+ return true if msg.include?("parameter=input_tokens")
+
+ false
+ end
+
+ # Detect upstream tool-call truncation and raise UpstreamTruncatedError
+ # so the standard RetryableError rescue (with fallback model support)
+ # handles retry identically to 5xx/429.
+ #
+ # Background: OpenRouter routes to Anthropic/Bedrock/etc. and passes
+ # through whatever the upstream sends. If the upstream closes the SSE
+ # stream mid-tool_use (observed with Anthropic at ~127 s TTFT under
+ # load), OpenRouter does NOT surface an error — it emits a valid
+ # `tool_calls[]` whose `arguments` is empty, `"{}"`, or non-parseable
+ # JSON. Without this check the agent would either execute the tool with
+ # empty args or (worse) silently exit thinking the task finished.
+ #
+ # Rule is deliberately narrow: we only intercept the case where the
+ # model streamed literally nothing into the tool_call arguments —
+ # i.e. `nil`, empty string, or the placeholder `"{}"`. Partial/invalid
+ # JSON (e.g. `{"path": "/tmp/x"`) is left to the existing
+ # ArgumentsParser → BadArgumentsError path, because the model already
+ # committed to specific values and feeding the parse error back as a
+ # tool_result lets it self-correct in one round-trip (faster than a
+ # blind retry from scratch).
+ private def detect_upstream_truncation!(response)
+ tool_calls = response[:tool_calls]
+ return if tool_calls.nil? || tool_calls.empty?
+
+ truncated = tool_calls.find { |tc| tool_call_args_truncated?(tc[:arguments]) }
+ return unless truncated
+
+ args_str = truncated[:arguments].is_a?(String) ? truncated[:arguments] : truncated[:arguments].to_s
+ Clacky::Logger.warn("llm.upstream_truncation_detected",
+ model: current_model,
+ tool_name: truncated[:name].to_s,
+ args_len: args_str.length,
+ args_head: args_str[0, 80],
+ finish_reason: response[:finish_reason].to_s,
+ completion_tokens: response.dig(:token_usage, :completion_tokens),
+ ttft_ms: response.dig(:latency, :ttft_ms)
+ )
+
+ # Inject a one-shot [SYSTEM] hint so a plain retry isn't doomed to the
+ # same fate when the truncation correlates with large tool_call args
+ # (e.g. writing a 5000-char file in one go). For infrastructure-level
+ # blips this hint is harmless — the retry usually succeeds on its own
+ # and the hint just sits in history without affecting behaviour.
+ inject_upstream_truncation_hint_if_first(truncated)
+
+ raise Clacky::UpstreamTruncatedError,
+ "[LLM] Upstream truncated tool_call `#{truncated[:name]}` " \
+ "(args=#{args_str[0, 40].inspect}). Retrying..."
+ end
+
+ # True when a tool_call's arguments field looks COMPLETELY empty —
+ # i.e. the upstream stream was cut before the model wrote any real
+ # content into the arguments JSON.
+ #
+ # Rules:
+ # - nil / non-String / empty string → truncated (nothing at all)
+ # - parses to {} (empty object) → truncated (placeholder only)
+ # - anything else (including partial/invalid JSON like `{"path":
+ # "/tmp/x"` where the model already started writing) → NOT
+ # truncated by this detector
+ #
+ # Partial-JSON cases are deliberately left to the existing
+ # ArgumentsParser → BadArgumentsError path, which surfaces the parse
+ # error back to the LLM as a tool_result so it can self-correct. That
+ # is more efficient than a blind retry when the model already wrote
+ # most of the args.
+ private def tool_call_args_truncated?(args)
+ return true if args.nil?
+ return true unless args.is_a?(String)
+ return true if args.empty?
+
+ parsed = begin
+ JSON.parse(args)
+ rescue JSON::ParserError
+ # Partial/invalid JSON — let ArgumentsParser handle it downstream.
+ return false
+ end
+
+ parsed.is_a?(Hash) && parsed.empty?
+ end
+
  # On the FIRST Faraday::TimeoutError within a task, append a [SYSTEM]
  # user message to the history instructing the model to break its work
  # into smaller steps. Subsequent timeouts in the same task are ignored
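The classification rule implemented by `tool_call_args_truncated?` above depends only on the Ruby stdlib, so it can be exercised in isolation. The helper below mirrors that logic for demonstration (the standalone name `args_truncated?` is ours, not the gem's):

```ruby
require "json"

# Same decision rule as the detector above: only a completely empty
# arguments payload (nil, non-String, "", or the "{}" placeholder) counts
# as truncated; partial JSON is left for the normal parse-error path.
def args_truncated?(args)
  return true if args.nil?
  return true unless args.is_a?(String)
  return true if args.empty?

  parsed = begin
    JSON.parse(args)
  rescue JSON::ParserError
    # Partial/invalid JSON: the model already committed to values,
    # so the downstream arguments parser should handle it instead.
    return false
  end
  parsed.is_a?(Hash) && parsed.empty?
end
```

This makes the asymmetry visible: `"{}"` is intercepted and retried, while a cut-off string like `'{"path": "/tmp/x"'` deliberately passes through.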
@@ -345,6 +700,54 @@ module Clacky
  "LLM response timed out — asking model to break the task into smaller steps and retrying..."
  )
  end
+
+ # On the FIRST upstream-truncation detection within a task, append a
+ # [SYSTEM] user message nudging the model toward smaller tool_call args.
+ # This guards against the (real but rare) case where the upstream SSE
+ # cut correlates with large tool_call payloads — a plain retry on the
+ # same oversized args would keep tripping the same wire.
+ #
+ # For purely infrastructural truncations (Anthropic edge blip, router
+ # hiccup), the hint is harmless — the retry will succeed and the hint
+ # just sits unused in history. Cheaper than letting the agent burn
+ # through its retry budget on the same oversized payload.
+ #
+ # Same plumbing as inject_large_output_hint_if_first_timeout: one-shot
+ # per task, carries `system_injected: true` so it's hidden from UI
+ # replay and skipped by compression/caching placement logic. Reset per
+ # task via Agent#run (see @task_upstream_truncation_hint_injected).
+ private def inject_upstream_truncation_hint_if_first(truncated_call)
+ return if @task_upstream_truncation_hint_injected
+
+ @task_upstream_truncation_hint_injected = true
+
+ tool_name = truncated_call[:name].to_s
+ hint = "[SYSTEM] The previous response was cut short by the upstream provider " \
+ "before the `#{tool_name}` tool_call finished streaming. " \
+ "The partial tool_call has been discarded. To avoid the same problem on retry, " \
+ "please adapt your approach:\n" \
+ "- Prefer smaller tool_call arguments — large single-shot payloads are more likely to be truncated.\n" \
+ "- For long file content: create the file first with a minimal skeleton via `write`, " \
+ "then append sections one at a time with `edit`.\n" \
+ "- Break large tasks into multiple smaller tool calls instead of one big one.\n" \
+ "- Keep each tool-call argument comfortably under ~2000 characters when possible."
+
+ @history.append({
+ role: "user",
+ content: hint,
+ system_injected: true,
+ task_id: @current_task_id
+ })
+
+ Clacky::Logger.info(
+ "[llm_caller] Upstream truncation — injected 'smaller tool_call args' hint " \
+ "(tool=#{tool_name.inspect})"
+ )
+
+ @ui&.show_warning(
+ "Upstream response was truncated mid tool-call — asking model to use smaller steps and retrying..."
+ )
+ end
  end
  end
  end
@@ -93,8 +93,13 @@ module Clacky
  # @param original_messages [Array<Hash>] Original messages before compression
  # @param recent_messages [Array<Hash>] Recent messages to preserve
  # @param chunk_path [String, nil] Path to the archived chunk MD file (if saved)
- # @return [Array<Hash>] Rebuilt message list: system + compressed + recent
- def rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [])
+ # @param pulled_back_messages [Array<Hash>] Messages temporarily popped from the
+ # tail of @history before the compression LLM call (to free up token budget so
+ # the compression call itself doesn't overflow context). These are NOT discarded —
+ # they are reattached to the tail of the rebuilt history so recent task progress
+ # is preserved. Default: [] (normal compression path doesn't need this).
+ # @return [Array<Hash>] Rebuilt message list: system + compressed + recent + pulled_back
+ def rebuild_with_compression(compressed_content, original_messages:, recent_messages:, chunk_path: nil, topics: nil, previous_chunks: [], pulled_back_messages: [])
  # Find and preserve system message
  system_msg = original_messages.find { |m| m[:role] == "system" }

@@ -112,13 +117,19 @@ module Clacky
  raise "LLM compression failed: unable to parse compressed messages"
  end

- # Return system message + compressed messages + recent messages.
+ # Return system message + compressed messages + recent messages + pulled_back messages.
  # Strip any system messages from recent_messages as a safety net —
  # get_recent_messages_with_tool_pairs already excludes them, but this
  # guard ensures we never end up with duplicate system prompts even if
  # the caller passes an unfiltered list.
+ #
+ # pulled_back_messages: messages that were temporarily popped from the tail
+ # of @history before the compression LLM call (to free up token budget so
+ # the compression call itself doesn't overflow context). They are reattached
+ # here to preserve recent task progress.
  safe_recent = recent_messages.reject { |m| m[:role] == "system" }
- [system_msg, *parsed_messages, *safe_recent].compact
+ safe_pulled_back = pulled_back_messages.reject { |m| m[:role] == "system" }
+ [system_msg, *parsed_messages, *safe_recent, *safe_pulled_back].compact
  end


@@ -103,8 +103,24 @@ module Clacky

  # Check if compression is needed and return compression context
  # @param force [Boolean] Force compression even if thresholds not met
+ # @param pull_back_from_tail [Integer] Number of messages to temporarily pop
+ # from the tail of history before building the compression instruction.
+ # Used by the context-overflow recovery path: when the current history
+ # is already at/over the model's context window, we cannot append even
+ # a small compression instruction without overflowing. Popping K messages
+ # from the tail frees up token budget for the compression call itself.
+ #
+ # Cache-preservation note: thanks to the model's two-checkpoint prompt
+ # cache (cache#A at second-to-last, cache#B at last), pulling back K=1
+ # message keeps cache#A intact — the compression LLM call still hits the
+ # cached prefix [system, m1..m(N-1)]. K>=2 sacrifices cache hits but is
+ # only used as fallback when one message isn't enough headroom.
+ #
+ # The popped messages are NOT discarded — they ride along in the
+ # returned context and are reattached to the rebuilt history's tail by
+ # handle_compression_response, so recent task progress is preserved.
  # @return [Hash, nil] Compression context or nil if not needed
- def compress_messages_if_needed(force: false)
+ def compress_messages_if_needed(force: false, pull_back_from_tail: 0)
  # Check if compression is enabled
  return nil unless @config.enable_compression

@@ -148,6 +164,27 @@ module Clacky

  # Get the most recent N messages, ensuring tool_calls/tool results pairs are kept together
  all_messages = @history.to_a
+
+ # Pull back K messages from the tail (context-overflow recovery path).
+ # We *physically* remove them from @history so the next call_llm
+ # (which reads @history.to_api) doesn't include them in the prompt.
+ # They will be reattached to the rebuilt history's tail by
+ # handle_compression_response after compression succeeds. If compression
+ # fails, the caller is responsible for restoring them via the returned
+ # context (rollback path).
+ pulled_back_messages = []
+ if pull_back_from_tail > 0
+ k = [pull_back_from_tail, all_messages.size - 1].min # never pop the system message
+ k.times do
+ popped = @history.pop_last
+ pulled_back_messages.unshift(popped) if popped
+ end
+ # Recompute all_messages from the now-shrunk history so downstream
+ # logic (recent_messages selection, build_compression_message) sees
+ # the post-pop view.
+ all_messages = @history.to_a
+ end
+
  recent_messages = get_recent_messages_with_tool_pairs(all_messages, target_recent_count)
  recent_messages = [] if recent_messages.nil?

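The pop/unshift pairing in the hunk above preserves chronological order in `pulled_back_messages`, which matters when they are reattached after compression. A minimal sketch of that invariant using a plain Array in place of the gem's history object (illustrative only; `pull_back_from_tail` here is a free function, not the gem's keyword argument):

```ruby
# Pop K messages off the tail, unshifting each onto pulled_back so the
# result keeps the original chronological order; never pop index 0
# (the system message stand-in).
def pull_back_from_tail(history, k)
  k = [k, history.size - 1].min
  pulled_back = []
  k.times do
    popped = history.pop
    pulled_back.unshift(popped) if popped
  end
  pulled_back
end
```

Because each popped element is unshifted rather than pushed, appending `pulled_back` back onto a rebuilt history reproduces the original message order exactly.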
@@ -160,6 +197,7 @@ module Clacky
  {
  compression_message: compression_message,
  recent_messages: recent_messages,
+ pulled_back_messages: pulled_back_messages,
  original_token_count: total_tokens,
  original_message_count: @history.size,
  compression_level: @compression_level
@@ -227,7 +265,8 @@ module Clacky
  recent_messages: compression_context[:recent_messages],
  chunk_path: chunk_path,
  topics: topics,
- previous_chunks: previous_chunks
+ previous_chunks: previous_chunks,
+ pulled_back_messages: compression_context[:pulled_back_messages] || []
  ))

  # Reset to the estimated size of the rebuilt (small) history.