openclacky 1.0.1 → 1.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 9d6ba5a62f7a352730705db11aff8ab76af059764903eb4413bd5a0aa835fecf
- data.tar.gz: 58ba8fdcf23b5dabcc4a8ed709be0f34a9d27a5be83601fee685a638eb3ff445
+ metadata.gz: d36230a47c25a8b5fb04dfc14f9359155489a2539d0a699843e140deed1434ba
+ data.tar.gz: c237725ed637d2d7a852d3624611cca101290e2348e0c6befb2650342550ec03
  SHA512:
- metadata.gz: 00e3f00119cad74d7da43519a1a12332e509c0050946d713dea17db539bbadf0099e96ea5369cc19046fd0bc1c224849cbbaf43addfe0708858780a370067b3b
- data.tar.gz: 4e7888c952dd49c664c67212c0986b62bd7745887dae7d85bce14b3f36c544fc5bd9ca27f1851f04e14477cfd9316938605b6ae0f89b19652cadd1442c6dc564
+ metadata.gz: 89c65d848c67dff3ed63ae70cd6a0539a7a8068682d72009b34741ea09c44749f5fa05c5839bc9c02c5c499709c8e5bce321165561bdbf8a43500539d1e4b21c
+ data.tar.gz: 74ebac898a16e090481c8ba423ac7c2d9cafe918f09cdc87066b54c911034b941c713650d24aaa8d71c627c48d3c8c56a780c2ffa6e717448e4712cdd5ca9512
data/CHANGELOG.md CHANGED
@@ -5,6 +5,24 @@ All notable changes to this project will be documented in this file.
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [1.0.2] - 2026-05-07
+
+ ### Added
+ - **Multi-region provider endpoints.** Providers can now expose multiple endpoint variants (e.g. global vs. CN-optimized Anthropic), and you can switch between them from both the onboarding flow and the Settings page. Bundled with updated model pricing data so cost estimates stay accurate across regions. (#67)
+ - **Pre-installed platform-recommended skills during onboarding.** New users get a curated set of skills automatically during onboard — downloaded concurrently with dual-host fallback and a hard deadline so onboarding never hangs on a slow mirror. (#68)
+ - **Builtin skills served via platform API.** Recommended skills are now fetched through `/api/v1/skills/builtin`, making the list easier to update without shipping a new gem. (#72)
+ - **Feishu group chats: respond only when @-mentioned.** The Feishu adapter now parses the mentions array and ignores group messages that don't @ the bot, so the bot no longer replies to every message in a busy group. Sessions are also isolated per (chat, user) pair by default (`:chat_user` binding mode), preventing context leaks between DMs and groups. (#71)
+
+ ### Fixed
+ - **Recover from truncated upstream tool calls.** When an upstream LLM response cuts off mid tool-call, the agent now detects the truncation and recovers automatically instead of getting stuck. Covered by extensive new tests.
+ - **Feedback option click now sends the message.** Clicking a suggested feedback option previously set the input text but silently failed to send (due to a `sendMessage` vs `_sendMessage` scope bug). Now it dispatches immediately as expected. (#69)
+ - **Sidebar footer and input area heights aligned.** Introduced a shared `--footer-height` CSS variable (56px) and reworked the stop button to use a pseudo-element square for pixel-perfect centering — both columns now line up cleanly. (#70)
+ - **Feishu bot fails closed on API outage.** If `/open-apis/bot/v3/info` fails and `bot_open_id` can't be resolved, the adapter now drops group messages (with a warning) instead of spamming every group message as a fallback.
+ - **`preview.md` no longer pollutes user project directories.** Preview files are written to the system tmpdir, and plain text formats (md/log/csv) skip preview generation entirely since they're already readable as-is.
+
+ ### More
+ - Added agent stop logging to make interrupt / stop chains easier to debug.
+
  ## [1.0.1] - 2026-05-06

  ### Added
@@ -101,6 +101,19 @@ module Clacky
  # Successful response — if we were probing, confirm primary is healthy.
  handle_probe_success if @config.probing?

+ # ── Upstream truncation detector ──────────────────────────────────
+ # OpenRouter / Bedrock and other routers sometimes close the SSE
+ # stream mid-tool_use: we receive finish_reason="stop" together with
+ # a syntactically valid tool_call whose `arguments` JSON is empty,
+ # "{}" (placeholder before any key was streamed), or otherwise
+ # unparseable. Treat this as retryable — otherwise the agent would
+ # execute a tool with empty args (often failing cryptically) or
+ # silently exit thinking the task is done.
+ #
+ # Raises UpstreamTruncatedError (a RetryableError) so the rescue
+ # block below handles retry + fallback identically to 5xx/429.
+ detect_upstream_truncation!(response)
+
  rescue Faraday::TimeoutError => e
  # ── Read-timeout path (distinct from connection-level failures) ──
  # Faraday::TimeoutError on our non-streaming POST almost always means
@@ -230,6 +243,49 @@ module Clacky
  token_data = track_cost(response[:usage], raw_api_usage: response[:raw_api_usage])
  response[:token_usage] = token_data

+ # [DIAG] Log raw client response shape. Only emit when we see the
+ # "finish_reason=stop + non-empty tool_calls" combo, or when any
+ # tool_call's arguments look empty/unparseable — both indicate the
+ # upstream (Bedrock/relay/model) cut the tool_use stream short.
+ # Normal responses produce no log line (too noisy).
+ begin
+ tool_calls = response[:tool_calls] || []
+ if !tool_calls.empty?
+ raw_tcs = tool_calls.map do |c|
+ args_str = c[:arguments].is_a?(String) ? c[:arguments] : c[:arguments].to_s
+ parseable = begin
+ JSON.parse(args_str)
+ true
+ rescue StandardError
+ false
+ end
+ {
+ name: c[:name].to_s,
+ args_len: args_str.length,
+ args_parseable: parseable,
+ args_head: args_str[0, 120]
+ }
+ end
+ truncated_call = raw_tcs.any? { |t| t[:args_len] == 0 || t[:args_len] == 2 || !t[:args_parseable] }
+ suspicious = response[:finish_reason] == "stop"
+
+ if suspicious || truncated_call
+ Clacky::Logger.warn("llm.response_suspicious",
+ model: current_model,
+ finish_reason: response[:finish_reason].to_s,
+ tool_calls_count: raw_tcs.size,
+ tool_calls: raw_tcs,
+ completion_tokens: token_data[:completion_tokens],
+ ttft_ms: response.dig(:latency, :ttft_ms),
+ combo_stop_with_toolcalls: suspicious,
+ has_truncated_args: truncated_call
+ )
+ end
+ end
+ rescue StandardError => e
+ Clacky::Logger.warn("llm.response_log_failed", error: e.message)
+ end
+
  response
  ensure
  # Close any "retrying" progress slot that was opened during the
@@ -302,6 +358,87 @@ module Clacky
  msg.include?("must be provided"))
  end

+ # Detect upstream tool-call truncation and raise UpstreamTruncatedError
+ # so the standard RetryableError rescue (with fallback model support)
+ # handles retry identically to 5xx/429.
+ #
+ # Background: OpenRouter routes to Anthropic/Bedrock/etc. and passes
+ # through whatever the upstream sends. If the upstream closes the SSE
+ # stream mid-tool_use (observed with Anthropic at ~127 s TTFT under
+ # load), OpenRouter does NOT surface an error — it emits a valid
+ # `tool_calls[]` whose `arguments` is empty, `"{}"`, or non-parseable
+ # JSON. Without this check the agent would either execute the tool with
+ # empty args or (worse) silently exit thinking the task finished.
+ #
+ # Rule is deliberately narrow: we only intercept the case where the
+ # model streamed literally nothing into the tool_call arguments —
+ # i.e. `nil`, empty string, or the placeholder `"{}"`. Partial/invalid
+ # JSON (e.g. `{"path": "/tmp/x"`) is left to the existing
+ # ArgumentsParser → BadArgumentsError path, because the model already
+ # committed to specific values and feeding the parse error back as a
+ # tool_result lets it self-correct in one round-trip (faster than a
+ # blind retry from scratch).
+ private def detect_upstream_truncation!(response)
+ tool_calls = response[:tool_calls]
+ return if tool_calls.nil? || tool_calls.empty?
+
+ truncated = tool_calls.find { |tc| tool_call_args_truncated?(tc[:arguments]) }
+ return unless truncated
+
+ args_str = truncated[:arguments].is_a?(String) ? truncated[:arguments] : truncated[:arguments].to_s
+ Clacky::Logger.warn("llm.upstream_truncation_detected",
+ model: current_model,
+ tool_name: truncated[:name].to_s,
+ args_len: args_str.length,
+ args_head: args_str[0, 80],
+ finish_reason: response[:finish_reason].to_s,
+ completion_tokens: response.dig(:token_usage, :completion_tokens),
+ ttft_ms: response.dig(:latency, :ttft_ms)
+ )
+
+ # Inject a one-shot [SYSTEM] hint so a plain retry isn't doomed to the
+ # same fate when the truncation correlates with large tool_call args
+ # (e.g. writing a 5000-char file in one go). For infrastructure-level
+ # blips this hint is harmless — the retry usually succeeds on its own
+ # and the hint just sits in history without affecting behaviour.
+ inject_upstream_truncation_hint_if_first(truncated)
+
+ raise Clacky::UpstreamTruncatedError,
+ "[LLM] Upstream truncated tool_call `#{truncated[:name]}` " \
+ "(args=#{args_str[0, 40].inspect}). Retrying..."
+ end
+
+ # True when a tool_call's arguments field looks COMPLETELY empty —
+ # i.e. the upstream stream was cut before the model wrote any real
+ # content into the arguments JSON.
+ #
+ # Rules:
+ # - nil / non-String / empty string → truncated (nothing at all)
+ # - parses to {} (empty object) → truncated (placeholder only)
+ # - anything else (including partial/invalid JSON like `{"path":
+ # "/tmp/x"` where the model already started writing) → NOT
+ # truncated by this detector
+ #
+ # Partial-JSON cases are deliberately left to the existing
+ # ArgumentsParser → BadArgumentsError path, which surfaces the parse
+ # error back to the LLM as a tool_result so it can self-correct. That
+ # is more efficient than a blind retry when the model already wrote
+ # most of the args.
+ private def tool_call_args_truncated?(args)
+ return true if args.nil?
+ return true unless args.is_a?(String)
+ return true if args.empty?
+
+ parsed = begin
+ JSON.parse(args)
+ rescue JSON::ParserError
+ # Partial/invalid JSON — let ArgumentsParser handle it downstream.
+ return false
+ end
+
+ parsed.is_a?(Hash) && parsed.empty?
+ end
+
  # On the FIRST Faraday::TimeoutError within a task, append a [SYSTEM]
  # user message to the history instructing the model to break its work
  # into smaller steps. Subsequent timeouts in the same task are ignored
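The rules listed for `tool_call_args_truncated?` in the hunk above can be exercised in isolation. What follows is a minimal standalone sketch rather than the gem's code path: the top-level method name `args_truncated?` and the sample inputs are hypothetical, and only the standard `json` library is assumed.

```ruby
require 'json'

# Hypothetical standalone copy of the predicate's rules, for illustration only.
def args_truncated?(args)
  return true unless args.is_a?(String) # nil or any non-String: nothing usable
  return true if args.empty?            # empty string: nothing was streamed

  parsed = begin
    JSON.parse(args)
  rescue JSON::ParserError
    return false                        # partial/invalid JSON is left to the parser path
  end

  parsed.is_a?(Hash) && parsed.empty?   # only the "{}" placeholder counts as truncated
end

# Illustrative inputs (assumed, not taken from real logs).
[nil, '', '{}', '{"path": "/tmp/x"', '{"path": "/tmp/x"}'].each do |sample|
  puts "#{sample.inspect} truncated? #{args_truncated?(sample)}"
end
```

Under these rules the unterminated `{"path": "/tmp/x"` sample is not flagged, which matches the stated intent of leaving partial JSON to the existing `ArgumentsParser` path.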
@@ -345,6 +482,54 @@ module Clacky
  "LLM response timed out — asking model to break the task into smaller steps and retrying..."
  )
  end
+
+ # On the FIRST upstream-truncation detection within a task, append a
+ # [SYSTEM] user message nudging the model toward smaller tool_call args.
+ # This guards against the (real but rare) case where the upstream SSE
+ # cut correlates with large tool_call payloads — a plain retry on the
+ # same oversized args would keep tripping the same wire.
+ #
+ # For purely infrastructural truncations (Anthropic edge blip, router
+ # hiccup), the hint is harmless — the retry will succeed and the hint
+ # just sits unused in history. Cheaper than letting the agent burn
+ # through its retry budget on the same oversized payload.
+ #
+ # Same plumbing as inject_large_output_hint_if_first_timeout: one-shot
+ # per task, carries `system_injected: true` so it's hidden from UI
+ # replay and skipped by compression/caching placement logic. Reset per
+ # task via Agent#run (see @task_upstream_truncation_hint_injected).
+ private def inject_upstream_truncation_hint_if_first(truncated_call)
+ return if @task_upstream_truncation_hint_injected
+
+ @task_upstream_truncation_hint_injected = true
+
+ tool_name = truncated_call[:name].to_s
+ hint = "[SYSTEM] The previous response was cut short by the upstream provider " \
+ "before the `#{tool_name}` tool_call finished streaming. " \
+ "The partial tool_call has been discarded. To avoid the same problem on retry, " \
+ "please adapt your approach:\n" \
+ "- Prefer smaller tool_call arguments — large single-shot payloads are more likely to be truncated.\n" \
+ "- For long file content: create the file first with a minimal skeleton via `write`, " \
+ "then append sections one at a time with `edit`.\n" \
+ "- Break large tasks into multiple smaller tool calls instead of one big one.\n" \
+ "- Keep each tool-call argument comfortably under ~2000 characters when possible."
+
+ @history.append({
+ role: "user",
+ content: hint,
+ system_injected: true,
+ task_id: @current_task_id
+ })
+
+ Clacky::Logger.info(
+ "[llm_caller] Upstream truncation — injected 'smaller tool_call args' hint " \
+ "(tool=#{tool_name.inspect})"
+ )
+
+ @ui&.show_warning(
+ "Upstream response was truncated mid tool-call — asking model to use smaller steps and retrying..."
+ )
+ end
  end
  end
  end
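The one-shot behaviour described for `inject_upstream_truncation_hint_if_first` (append the hint on the first detection in a task, ignore later detections, reset when a new task starts) can be sketched with a small self-contained class. The class name `HintInjector`, the `start_task` method, and the shortened hint text are assumptions made for this illustration; the gem keeps this state on the agent itself.

```ruby
# Hypothetical stand-in for the agent state involved in the one-shot hint.
class HintInjector
  attr_reader :history

  def initialize
    @history = []
    @hint_injected = false
  end

  # Mirrors the per-task reset that the diff adds to Agent#run.
  def start_task
    @hint_injected = false
  end

  # Appends the hint at most once per task; returns true only when it appended.
  def inject_hint_if_first(tool_name)
    return false if @hint_injected

    @hint_injected = true
    @history << {
      role: 'user',
      content: "[SYSTEM] The `#{tool_name}` tool_call was cut short; prefer smaller arguments.",
      system_injected: true
    }
    true
  end
end

injector = HintInjector.new
injector.start_task
injector.inject_hint_if_first('write') # appends the hint
injector.inject_hint_if_first('write') # ignored: already injected in this task
injector.start_task
injector.inject_hint_if_first('edit')  # a new task may receive the hint again
puts injector.history.size
```

Keeping the flag per task rather than per session means a long session with many tasks can still receive the hint whenever a later task runs into the same truncation.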
data/lib/clacky/agent.rb CHANGED
@@ -210,6 +210,7 @@ module Clacky
  @start_time = Time.now
  @task_truncation_count = 0 # Reset truncation counter for each task
  @task_timeout_hint_injected = false # Reset read-timeout hint injection (see LlmCaller)
+ @task_upstream_truncation_hint_injected = false # Reset upstream-truncation hint injection (see LlmCaller)
  @task_cost_source = :estimated # Reset for new task
  # Note: Do NOT reset @previous_total_tokens here - it should maintain the value from the last iteration
  # across tasks to correctly calculate delta tokens in each iteration
@@ -373,8 +374,58 @@ module Clacky
  # Skip if compression happened (response is nil)
  next if response.nil?

- # Check if done (no more tool calls needed)
- if response[:finish_reason] == "stop" || response[:tool_calls].nil? || response[:tool_calls].empty?
+ # [DIAG] Only log when finish_reason=="stop" AND tool_calls non-empty —
+ # the suspicious combo that indicates an upstream-truncated tool_use
+ # response. Normal responses produce no log line here to avoid noise.
+ begin
+ tool_calls = response[:tool_calls] || []
+ if response[:finish_reason] == "stop" && !tool_calls.empty?
+ tc_summary = tool_calls.map do |c|
+ args_str = c[:arguments].is_a?(String) ? c[:arguments] : c[:arguments].to_s
+ {
+ name: c[:name].to_s,
+ args_len: args_str.length,
+ args_head: args_str[0, 120]
+ }
+ end
+ Clacky::Logger.warn("agent.think_response",
+ session_id: @session_id,
+ iteration: @iterations,
+ finish_reason: response[:finish_reason].to_s,
+ tool_calls_count: tool_calls.size,
+ tool_calls: tc_summary,
+ content_len: response[:content].to_s.length,
+ completion_tokens: response.dig(:token_usage, :completion_tokens),
+ ttft_ms: response.dig(:latency, :ttft_ms),
+ suspicious_truncation: true
+ )
+ end
+ rescue StandardError => e
+ Clacky::Logger.warn("agent.think_response.log_failed", error: e.message)
+ end
+
+ # Check if done (no more tool calls needed).
+ #
+ # Defensive rule: we ONLY exit on empty/missing tool_calls.
+ # We used to also short-circuit on finish_reason=="stop", but
+ # upstream routers (OpenRouter → Anthropic/Bedrock) can return the
+ # contradictory combo `finish_reason=="stop" + non-empty tool_calls
+ # with truncated args`, which caused the agent to silently treat a
+ # truncated response as "task complete". Truncation is now caught
+ # earlier by LlmCaller#detect_upstream_truncation! (which raises
+ # UpstreamTruncatedError → RetryableError); this branch stays as
+ # a belt-and-braces guard: if that detector ever misses a new
+ # truncation pattern, we still won't silently exit while the model
+ # is mid-tool_call.
+ if response[:tool_calls].nil? || response[:tool_calls].empty?
+ # [DIAG] Pin down exactly which sub-condition triggered the task exit.
+ Clacky::Logger.info("agent.loop_break_normal",
+ session_id: @session_id,
+ iteration: @iterations,
+ branch: (response[:tool_calls].nil? ? "tool_calls_nil" : "tool_calls_empty"),
+ finish_reason: response[:finish_reason].to_s,
+ tool_calls_count: (response[:tool_calls] || []).size
+ )
  if response[:content] && !response[:content].empty?
  emit_assistant_message(response[:content])
  end
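The effect of dropping the `finish_reason == "stop"` short-circuit is easiest to see by putting the removed and the added exit conditions side by side. The sketch below copies both conditions from the hunk above; the method names `done_old?` and `done_new?` and the sample response hashes are hypothetical.

```ruby
# Exit condition removed by this diff: "stop" alone was enough to finish.
def done_old?(response)
  response[:finish_reason] == 'stop' || response[:tool_calls].nil? || response[:tool_calls].empty?
end

# Exit condition added by this diff: only missing or empty tool_calls finish the task.
def done_new?(response)
  response[:tool_calls].nil? || response[:tool_calls].empty?
end

# The contradictory combination described in the comment (sample data, assumed).
truncated = { finish_reason: 'stop', tool_calls: [{ name: 'write', arguments: '{}' }] }
# An ordinary final answer with no tool calls.
finished = { finish_reason: 'stop', tool_calls: [] }

puts "old rule exits on truncated response: #{done_old?(truncated)}"
puts "new rule exits on truncated response: #{done_new?(truncated)}"
puts "new rule exits on finished response:  #{done_new?(finished)}"
```

With the old rule the truncated response ends the task; with the new rule the loop continues, so a truncation pattern that slips past `detect_upstream_truncation!` still does not look like completion.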
@@ -210,7 +210,16 @@ zh template:
  [1–2 句话,根据用户目标和背景量身定制。]
  ```

- ### A.9. Import external skills (optional)
+ ### A.9. Pre-install builtin skills
+
+ Silently run `ruby "SKILL_DIR/scripts/install_builtin_skills.rb"`,
+ then parse the last stdout line as JSON and read `installed` as N.
+
+ - If N > 0, show one line:
+ - zh: `✅ 已为你内置 N 个技能,输入 /skills 随时查看。`
+ - en: `✅ Installed N builtin skills. Type /skills anytime to view them.`
+
+ ### A.10. Import external skills (optional)

  Run `test -d ~/.openclaw && echo yes || echo no`. If `no`, skip silently.
  If `yes`:
@@ -221,7 +230,7 @@ If `yes`:
  - en: `{ "question": "OpenClaw detected. Found N skills. Import them into Clacky?", "options": ["Import", "Skip"] }`
  4. If confirmed: `ruby "SKILL_DIR/scripts/import_external_skills.rb" --source openclaw --yes`

- ### A.10. Celebrate soul setup & offer browser
+ ### A.11. Celebrate soul setup & offer browser

  zh:
  > ✅ 你的专属 AI 灵魂已设定完成![ai.name] 已经准备好了。
@@ -240,14 +249,14 @@ en: `{ "question": "Want to set up browser automation now? (You can always run /

  If chosen → invoke `browser-setup` skill with subcommand `setup`.

- ### A.11. Offer personal website
+ ### A.12. Offer personal website

  zh: `{ "question": "还有一件有意思的事:要帮你生成一个个人主页吗?我会根据你刚才分享的信息做一个,生成后你会得到一个公开链接。", "options": ["生成主页", "跳过,完成设置"] }`
  en: `{ "question": "One more thing: want me to generate a personal website from the info you just shared? You'll get a public link you can share.", "options": ["Generate my site", "Skip, I'm done"] }`

  If chosen → invoke `personal-website` skill.

- ### A.12. Confirm and close
+ ### A.13. Confirm and close

  Speak as [ai.name]. This is the AI's first moment of truly being alive — it has a soul,
  it knows its person, it has hands and eyes, and it just did its first real thing in the world.
@@ -315,7 +324,7 @@ en:

  Do NOT open a new session — the UI handles navigation after the skill finishes.

- ### A.13. First-run notes
+ ### A.14. First-run notes

  - Keep both files under 300 words each.
  - Do not ask follow-up questions beyond the cards above.
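The new A.9 step asks the caller to parse the last stdout line of `install_builtin_skills.rb` as JSON and read `installed` as N. A minimal sketch of that parsing step follows, assuming the caller already holds the script's stdout as a string. The helper name `installed_count` and the fallback to 0 on malformed output are assumptions made for this illustration, not documented behaviour of the onboard skill.

```ruby
require 'json'

# Hypothetical helper: extract N from the last non-empty stdout line.
# Falls back to 0 when the line is missing or is not the expected JSON object.
def installed_count(stdout_text)
  last_line = stdout_text.to_s.lines.map(&:strip).reject(&:empty?).last
  return 0 if last_line.nil?

  payload = JSON.parse(last_line)
  return 0 unless payload.is_a?(Hash)

  Integer(payload.fetch('installed', 0))
rescue JSON::ParserError, TypeError, ArgumentError
  0
end

# Sample output in the shape the script documents for its last stdout line.
sample = "some earlier line\n{\"installed\":3,\"attempted\":4,\"skipped_existing\":1}\n"
n = installed_count(sample)
puts "Installed #{n} builtin skills. Type /skills anytime to view them." if n > 0
```

Treating unreadable output as N = 0 keeps the step silent on failure, which is consistent with the instruction to show the confirmation line only when N > 0.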
@@ -0,0 +1,175 @@
+ #!/usr/bin/env ruby
+ # frozen_string_literal: true
+
+ # Install builtin skills into ~/.clacky/skills/.
+ #
+ # Fetches the server-curated builtin list from GET /api/v1/skills/builtin on
+ # the openclacky platform (public, no auth), then downloads and installs each
+ # skill's zip package in parallel (5 workers, 30s total timeout).
+ #
+ # The "builtin" whitelist is enforced server-side — this script takes no
+ # filter flags. Admin toggles the `builtin` flag per skill on the platform.
+ #
+ # Called by onboard skill: `ruby install_builtin_skills.rb`
+ #
+ # Output:
+ # - Diagnostics → STDERR
+ # - Last line of STDOUT → JSON: {"installed":N,"attempted":N,"skipped_existing":N}
+ # - Exit code: always 0
+
+ require 'uri'
+ require 'net/http'
+ require 'json'
+ require 'timeout'
+
+ # Reuse the downloader/extractor/installer from the skill-add skill.
+ # Physical relocation to lib/clacky/ is deferred until a third caller appears.
+ require_relative '../../skill-add/scripts/install_from_zip'
+
+ class BuiltinSkillsInstaller
+   PRIMARY_HOST = ENV.fetch('CLACKY_LICENSE_SERVER', 'https://www.openclacky.com')
+   FALLBACK_HOST = 'https://openclacky.up.railway.app'
+   API_HOSTS = ENV['CLACKY_LICENSE_SERVER'] ? [PRIMARY_HOST] : [PRIMARY_HOST, FALLBACK_HOST]
+   API_PATH = '/api/v1/skills/builtin'
+   API_OPEN_TIMEOUT = 5
+   API_READ_TIMEOUT = 10
+   CONCURRENCY = 5
+
+   def initialize
+     @target_dir = File.join(Dir.home, '.clacky', 'skills')
+     @per_skill_timeout = 10
+     @total_timeout = 30
+
+     @installed = 0
+     @skipped_existing = 0
+     @attempted = 0
+     @errors = []
+     @mutex = Mutex.new
+   end
+
+   def run
+     skills = fetch_skill_list
+     if skills.nil? || skills.empty?
+       emit_summary
+       return
+     end
+
+     install_concurrently(skills)
+   ensure
+     emit_summary
+   end
+
+   # --- Internals -------------------------------------------------------------
+
+   # Returns an array of skill hashes, or nil on total failure.
+   private def fetch_skill_list
+     API_HOSTS.each do |host|
+       begin
+         uri = URI.parse(host + API_PATH)
+         Net::HTTP.start(uri.host, uri.port,
+                         use_ssl: uri.scheme == 'https',
+                         open_timeout: API_OPEN_TIMEOUT,
+                         read_timeout: API_READ_TIMEOUT) do |http|
+           response = http.request(Net::HTTP::Get.new(uri.request_uri))
+           if response.code.to_i == 200
+             payload = JSON.parse(response.body)
+             return Array(payload['skills'])
+           else
+             @errors << "API #{host}: HTTP #{response.code}"
+           end
+         end
+       rescue StandardError => e
+         @errors << "API #{host}: #{e.class}: #{e.message}"
+       end
+     end
+     nil
+   end
+
+   # Install skills in parallel, bounded by CONCURRENCY and @total_timeout.
+   # Workers pull from a shared queue and self-check the deadline, so the
+   # global timeout is enforced without killing threads mid-download (which
+   # would leak temp dirs). Whatever finishes before the deadline stays
+   # installed; the rest is recovered on the next onboard run via skip_if_exists.
+   private def install_concurrently(skills)
+     queue = Queue.new
+     skills.each { |s| queue << s }
+
+     deadline = Time.now + @total_timeout
+     worker_pool = [CONCURRENCY, skills.size].min
+
+     workers = Array.new(worker_pool) do
+       Thread.new do
+         loop do
+           break if Time.now >= deadline
+           skill = queue.pop(true) rescue nil # non-blocking pop
+           break if skill.nil?
+           install_one(skill)
+         end
+       end
+     end
+
+     workers.each(&:join)
+
+     # If the deadline cut us off with items still in the queue, record it.
+     remaining = queue.size
+     if remaining.positive?
+       @mutex.synchronize do
+         @errors << "overall timeout after #{@total_timeout}s " \
+                    "(installed=#{@installed}, attempted=#{@attempted}, remaining=#{remaining})"
+       end
+     end
+   end
+
+   # Install one skill entry (hash from the API payload).
+   # Bounded by @per_skill_timeout; any failure is swallowed into @errors.
+   # Thread-safe: all shared state writes go through @mutex.
+   private def install_one(skill)
+     name = skill['name'].to_s
+     download_url = skill['download_url'].to_s
+
+     @mutex.synchronize { @attempted += 1 }
+
+     if name.empty? || download_url.empty?
+       @mutex.synchronize do
+         @errors << "skill payload missing name or download_url: #{skill.inspect}"
+       end
+       return
+     end
+
+     Timeout.timeout(@per_skill_timeout) do
+       installer = ZipSkillInstaller.new(
+         download_url,
+         skill_name: name,
+         target_dir: @target_dir,
+         skip_if_exists: true
+       )
+       result = installer.perform
+       @mutex.synchronize do
+         @installed += result[:installed].size
+         @skipped_existing += result[:skipped].size
+         @errors.concat(result[:errors]) if result[:errors].any?
+       end
+     end
+   rescue Timeout::Error
+     @mutex.synchronize { @errors << "#{name}: install timeout after #{@per_skill_timeout}s" }
+   rescue StandardError => e
+     @mutex.synchronize { @errors << "#{name}: #{e.class}: #{e.message}" }
+   end
+
+   # Diagnostics to stderr; single-line JSON summary to stdout.
+   # The caller (onboard) should parse the LAST stdout line.
+   private def emit_summary
+     unless @errors.empty?
+       warn '[install_builtin_skills] non-fatal errors:'
+       @errors.each { |e| warn " - #{e}" }
+     end
+     puts JSON.generate(
+       installed: @installed,
+       attempted: @attempted,
+       skipped_existing: @skipped_existing
+     )
+   end
+ end
+
+ # ── Entry point ───────────────────────────────────────────────────────────────
+ BuiltinSkillsInstaller.new.run if __FILE__ == $0
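The scheduling idea in `install_concurrently` (a shared queue, a fixed number of workers, and a deadline that each worker checks before taking the next item) can be tried without any network access. The sketch below is a generic reduction of that pattern: the method name `run_with_deadline`, its keyword arguments, and the doubling block are hypothetical, and the per-item `Timeout` used by the installer is left out.

```ruby
# Hypothetical reduction of the queue + deadline worker pool.
# Workers stop taking new items once the deadline passes; in-flight work is
# never interrupted, so unprocessed items simply stay in the queue.
def run_with_deadline(items, concurrency:, total_timeout:, &work)
  queue = Queue.new
  items.each { |item| queue << item }

  deadline = Time.now + total_timeout
  results = []
  mutex = Mutex.new

  workers = Array.new([concurrency, items.size].min) do
    Thread.new do
      loop do
        break if Time.now >= deadline

        item = begin
          queue.pop(true) # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          nil
        end
        break if item.nil?

        value = work.call(item)
        mutex.synchronize { results << value }
      end
    end
  end

  workers.each(&:join)
  { results: results, remaining: queue.size }
end

outcome = run_with_deadline([1, 2, 3, 4], concurrency: 2, total_timeout: 5) { |n| n * 2 }
puts "processed=#{outcome[:results].size} remaining=#{outcome[:remaining]}"
```

Because the deadline is only checked between items, a slow item can overrun the total budget by up to its own duration. The installer bounds that overrun with its separate per-skill timeout.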