RubyGems - openclacky - Versions diffs - 1.3.4 → 1.3.5 - Mend

openclacky 1.3.4 → 1.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (55)

checksums.yaml +4 -4
data/CHANGELOG.md +27 -0
data/lib/clacky/agent/fake_tool_call_detector.rb +52 -0
data/lib/clacky/agent/session_serializer.rb +3 -2
data/lib/clacky/agent/tool_executor.rb +0 -12
data/lib/clacky/agent.rb +74 -9
data/lib/clacky/api_extension.rb +81 -0
data/lib/clacky/api_extension_loader.rb +13 -1
data/lib/clacky/client.rb +14 -17
data/lib/clacky/default_agents/_panels/time_machine/panel.js +22 -0
data/lib/clacky/default_agents/base_prompt.md +1 -0
data/lib/clacky/default_extensions/meeting/handler.rb +331 -0
data/lib/clacky/default_extensions/meeting/meeting.js +790 -0
data/lib/clacky/default_extensions/meeting/meta.yml +3 -0
data/lib/clacky/default_extensions/meeting/skills/meeting-summarizer/SKILL.md +44 -0
data/lib/clacky/default_skills/media-gen/SKILL.md +63 -0
data/lib/clacky/default_skills/media-gen/scripts/video_seq.sh +114 -0
data/lib/clacky/json_ui_controller.rb +1 -1
data/lib/clacky/media/base.rb +60 -0
data/lib/clacky/media/dashscope.rb +385 -21
data/lib/clacky/media/gemini.rb +9 -0
data/lib/clacky/media/generator.rb +52 -0
data/lib/clacky/media/openai_compat.rb +166 -0
data/lib/clacky/null_ui_controller.rb +13 -0
data/lib/clacky/plain_ui_controller.rb +1 -1
data/lib/clacky/providers.rb +50 -2
data/lib/clacky/rich_ui/rich_ui_controller.rb +1 -1
data/lib/clacky/server/channel/channel_ui_controller.rb +1 -1
data/lib/clacky/server/http_server.rb +144 -9
data/lib/clacky/server/session_registry.rb +4 -2
data/lib/clacky/server/web_ui_controller.rb +3 -2
data/lib/clacky/skill_loader.rb +14 -2
data/lib/clacky/tools/terminal/output_cleaner.rb +1 -3
data/lib/clacky/tools/terminal.rb +0 -43
data/lib/clacky/ui2/components/modal_component.rb +1 -1
data/lib/clacky/ui2/ui_controller.rb +140 -31
data/lib/clacky/ui_interface.rb +10 -1
data/lib/clacky/utils/encoding.rb +25 -0
data/lib/clacky/version.rb +1 -1
data/lib/clacky/web/app.css +145 -22
data/lib/clacky/web/components/onboard.js +1 -14
data/lib/clacky/web/features/brand/view.js +8 -5
data/lib/clacky/web/features/channels/store.js +1 -20
data/lib/clacky/web/features/mcp/store.js +1 -20
data/lib/clacky/web/features/profile/store.js +1 -13
data/lib/clacky/web/features/profile/view.js +16 -4
data/lib/clacky/web/features/skills/store.js +6 -21
data/lib/clacky/web/features/version/store.js +2 -0
data/lib/clacky/web/i18n.js +24 -1
data/lib/clacky/web/index.html +15 -0
data/lib/clacky/web/sessions.js +141 -51
data/lib/clacky/web/settings.js +34 -2
data/lib/clacky/web/ws-dispatcher.js +11 -3
data/lib/clacky.rb +12 -5
metadata +8 -1

data/lib/clacky/default_extensions/meeting/meta.yml ADDED

@@ -0,0 +1,3 @@
+name: meeting
+description: Real-time meeting transcription and AI assistant
+version: "0.1.0"

data/lib/clacky/default_extensions/meeting/skills/meeting-summarizer/SKILL.md ADDED

@@ -0,0 +1,44 @@
+---
+name: meeting-summarizer
+description: Summarize a completed meeting from its transcript. Produces a structured summary with key decisions, action items, and discussion highlights. Triggered automatically when a meeting ends.
+user-invocable: false
+auto_summarize: false
+---
+# Meeting Summarizer
+You are a meeting summarization assistant. You have been given a meeting transcript and must produce a clear, actionable summary.
+## Input
+The user message contains the full meeting transcript (timestamped lines of dialogue).
+## Output Format
+Produce the summary in this structure:
+### Meeting Summary
+**Duration**: [start time] – [end time]
+#### Key Decisions
+- List each decision made during the meeting
+#### Action Items
+- [ ] Action item with owner if identifiable
+#### Discussion Highlights
+- Brief bullet points of important topics discussed
+#### Open Questions
+- Any unresolved questions raised but not answered
+---
+## Rules
+1. Be concise — each bullet should be one sentence max.
+2. If speakers are identifiable from context, attribute decisions and actions to them.
+3. Ignore filler words, small talk, and off-topic tangents.
+4. If the transcript is too short or empty, say so and skip the structured output.
+5. Write the summary in the same language the meeting was conducted in.

data/lib/clacky/default_skills/media-gen/SKILL.md CHANGED

@@ -242,6 +242,69 @@ it in documents with a relative path under `./assets/generated/`.
 Same shape and `error_type` values as image generation, but with `"video": null`.
 `not_configured` means no `type=video` model is set up.
+### Continuous / long video (last-frame chaining)
+A single Veo call maxes out at 8 seconds, and separate calls are visually
+**unrelated** (the character, lighting and framing jump between clips). To make
+several clips flow as one continuous shot, chain them: take the **last frame**
+of clip N and feed it as the `image` (first frame) of clip N+1. Veo's
+image-to-video then continues from exactly where the previous clip ended, so
+the seam is smooth.
+Use the helper script (it only does the ffmpeg mechanics — you drive the
+generation with the same `/api/media/video` curl as above). The script's
+absolute path is given in the **Supporting Files** block; assign it once:
+```bash
+SEQ="SKILL_DIR/scripts/video_seq.sh"   # SKILL_DIR is provided in Supporting Files
+# subcommands: lastframe | tob64 | payload | concat | probe
+```
+Workflow for an N-segment continuous video:
+1. **Plan the shots.** Split the story into 4–8s beats. Write one prompt per
+   beat; each prompt should describe the *continuation*, e.g. "The same girl
+   keeps walking forward, the camera pushes in…". Keep subject, style and
+   lighting wording consistent across prompts.
+2. **Segment 1** — normal text-to-video call. Save the returned mp4 path.
+3. **Extract its last frame** (as JPEG — keep the `.jpg` extension):
+   ```bash
+   "$SEQ" lastframe seg1.mp4 /tmp/seg1_last.jpg
+   ```
+4. **Segment 2** — build the request body with `payload`, then post it with
+   `curl --data @file`. **Do NOT inline the base64 into `-d "{…}"`** — a frame's
+   base64 is ~150KB+ and overflows the shell's argument limit ("Argument list
+   too long"). The `payload` subcommand reads the frame, base64-encodes it, and
+   writes a ready-to-send JSON file:
+   ```bash
+   "$SEQ" payload /tmp/seg2.json /tmp/seg1_last.jpg 8 landscape "$OUT_DIR" \
+     "Continuing the same scene, the camera keeps pushing forward…"
+   curl -s -X POST .../api/media/video -H "Content-Type: application/json" \
+     --data @/tmp/seg2.json
+   ```
+   (`payload <out.json> <frame> <duration_seconds> <aspect_ratio> <output_dir> <prompt>`)
+5. **Repeat** steps 3–4 for each subsequent segment, always chaining off the
+   *previous* segment's last frame.
+6. **Stitch** all clips in order into one file:
+   ```bash
+   "$SEQ" concat final.mp4 seg1.mp4 seg2.mp4 seg3.mp4
+   ```
+Rules & caveats:
+- **Strictly sequential.** Generate one segment, wait for it, extract its
+  frame, then start the next. Never run two video generations at once.
+- **Keep prompts consistent.** The image carries visual continuity, but the
+  prompt must not contradict it (don't switch the subject or scene mid-chain
+  unless you intend a cut).
+- **Aspect ratio must match** across all segments, or `concat` falls back to a
+  slower re-encode (and may letterbox). Use the same `aspect_ratio` everywhere.
+- **Cost adds up linearly** — N segments ≈ N × single-clip price. Confirm the
+  number of segments and total length with the user before starting.
+- For >30s or a true single-take >8s with no seam at all, this client-side
+  chaining is the practical option today; Veo's native server-side `extend`
+  (148s) is not wired into this endpoint yet.
 ## Generating speech (Gemini TTS)
 The same `/api/media/` namespace serves text-to-speech. The user must
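
The `payload` step of the chaining workflow in this hunk can be sketched in Ruby as follows. This is a minimal illustration, not code from the package: the helper name `build_video_payload` is hypothetical, and the field names are copied from the JSON body that `video_seq.sh` writes. The frame is read straight from disk, so the base64 never passes through a shell argument.

```ruby
require "base64"
require "json"

# Hypothetical helper (not part of openclacky): builds the image-to-video
# request body described by the `payload` subcommand and returns it as JSON.
def build_video_payload(frame_path:, prompt:, duration_seconds:, aspect_ratio:, output_dir:)
  # Same rule as the script: .png maps to image/png, anything else to image/jpeg.
  mime = File.extname(frame_path).downcase == ".png" ? "image/png" : "image/jpeg"
  # strict_encode64 emits no line breaks, matching `base64 | tr -d '\n'`.
  b64 = Base64.strict_encode64(File.binread(frame_path))

  JSON.generate(
    "prompt" => prompt,
    "aspect_ratio" => aspect_ratio,
    "duration_seconds" => Integer(duration_seconds),
    "output_dir" => output_dir,
    "image" => { "b64_json" => b64, "mime_type" => mime }
  )
end
```

A caller would write the returned string to a file and post that file with `curl --data @file`, as step 4 of the workflow describes.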

data/lib/clacky/default_skills/media-gen/scripts/video_seq.sh ADDED

@@ -0,0 +1,114 @@
+#!/usr/bin/env bash
+# Helpers for stitching multiple Veo clips into one continuous video using the
+# "last-frame chaining" technique (method A): the last frame of clip N becomes
+# the first frame (image-to-video) of clip N+1, so the seam is visually
+# continuous. The agent drives generation via the /api/media/video endpoint;
+# this script only does the mechanical ffmpeg steps.
+#
+# Requires: ffmpeg, ffprobe (both ship with the standard image).
+#
+# Subcommands:
+#   lastframe  <video.mp4> <out.jpg>           extract the final frame (JPEG by default)
+#   tob64      <image>                          print base64 (no newlines) to stdout
+#   payload    <out.json> <frame.jpg> <dur> <aspect> <output_dir> <prompt>
+#                                               build an image-to-video JSON body
+#                                               for `curl --data @out.json`
+#   concat     <out.mp4> <clip1.mp4> [clip2 …]  losslessly join clips in order
+#   probe      <video.mp4>                      print "WIDTHxHEIGHT FPS DURATION"
+set -euo pipefail
+die() { echo "error: $*" >&2; exit 1; }
+need() { command -v "$1" >/dev/null 2>&1 || die "$1 not found on PATH"; }
+cmd_lastframe() {
+  local src="$1" out="$2"
+  [[ -f "$src" ]] || die "no such video: $src"
+  need ffmpeg; need ffprobe
+  # sseof seeks relative to end; -update 1 keeps overwriting so we land on the
+  # genuinely last decodable frame regardless of exact timestamp.
+  # JPEG (-q:v 3) keeps the base64 ~8x smaller than PNG, which matters because a
+  # PNG frame's base64 (~1.5MB) overflows ARG_MAX when inlined into a shell arg.
+  ffmpeg -nostdin -loglevel error -y -sseof -0.5 -i "$src" \
+    -update 1 -frames:v 1 -q:v 3 "$out"
+  [[ -f "$out" ]] || die "failed to extract last frame"
+  echo "$out"
+}
+cmd_tob64() {
+  local img="$1"
+  [[ -f "$img" ]] || die "no such image: $img"
+  base64 < "$img" | tr -d '\n'
+}
+# Build the image-to-video request body as a file so curl can send it with
+# `--data @file`, avoiding "Argument list too long" from inlining base64.
+cmd_payload() {
+  local out="$1" frame="$2" dur="$3" aspect="$4" odir="$5" prompt="$6"
+  [[ -f "$frame" ]] || die "no such frame: $frame"
+  need ffprobe
+  local mime b64
+  case "$frame" in
+    *.png) mime="image/png" ;;
+    *)     mime="image/jpeg" ;;
+  esac
+  b64="$(base64 < "$frame" | tr -d '\n')"
+  FRAME_B64="$b64" FRAME_MIME="$mime" P_PROMPT="$prompt" P_DUR="$dur" \
+  P_ASPECT="$aspect" P_ODIR="$odir" python3 - "$out" <<'PY'
+import json, os, sys
+body = {
+  "prompt": os.environ["P_PROMPT"],
+  "aspect_ratio": os.environ["P_ASPECT"],
+  "duration_seconds": int(os.environ["P_DUR"]),
+  "output_dir": os.environ["P_ODIR"],
+  "image": {"b64_json": os.environ["FRAME_B64"], "mime_type": os.environ["FRAME_MIME"]},
+}
+open(sys.argv[1], "w").write(json.dumps(body))
+PY
+  [[ -f "$out" ]] || die "failed to write payload"
+  echo "$out"
+}
+cmd_concat() {
+  local out="$1"; shift
+  [[ $# -ge 1 ]] || die "concat needs at least one clip"
+  need ffmpeg
+  local listfile
+  listfile="$(mktemp -t veo_concat.XXXXXX)"
+  trap 'rm -f "$listfile"' RETURN
+  local clip abs
+  for clip in "$@"; do
+    [[ -f "$clip" ]] || die "no such clip: $clip"
+    abs="$(cd "$(dirname "$clip")" && pwd)/$(basename "$clip")"
+    printf "file '%s'\n" "$abs" >> "$listfile"
+  done
+  # Try stream-copy first (fast, lossless); fall back to re-encode if the clips
+  # are not bit-compatible for the concat demuxer.
+  if ! ffmpeg -nostdin -loglevel error -y -f concat -safe 0 -i "$listfile" \
+        -c copy "$out" 2>/dev/null; then
+    ffmpeg -nostdin -loglevel error -y -f concat -safe 0 -i "$listfile" \
+      -c:v libx264 -pix_fmt yuv420p -c:a aac "$out"
+  fi
+  echo "$out"
+}
+cmd_probe() {
+  local src="$1"
+  [[ -f "$src" ]] || die "no such video: $src"
+  need ffprobe
+  ffprobe -v error -select_streams v:0 \
+    -show_entries stream=width,height,r_frame_rate \
+    -show_entries format=duration \
+    -of default=noprint_wrappers=1:nokey=1 "$src" \
+    | paste -sd' ' -
+}
+[[ $# -ge 1 ]] || die "usage: $0 {lastframe|tob64|payload|concat|probe} ..."
+sub="$1"; shift
+case "$sub" in
+  lastframe) cmd_lastframe "$@" ;;
+  tob64)     cmd_tob64 "$@" ;;
+  payload)   cmd_payload "$@" ;;
+  concat)    cmd_concat "$@" ;;
+  probe)     cmd_probe "$@" ;;
+  *)         die "unknown subcommand: $sub" ;;
+esac
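
`cmd_concat` above writes each clip path between single quotes with `printf "file '%s'\n"`, so a path that itself contains an apostrophe would yield a malformed list entry. The sketch below shows one way the list could be rendered with that case handled. It assumes ffmpeg's documented quoting convention, in which a literal `'` inside a single-quoted string is written as `'\''`. The helper names `concat_list_line` and `concat_list` are hypothetical and do not appear in the package.

```ruby
# Hypothetical helpers (not part of openclacky): render an ffmpeg
# concat-demuxer list file from clip paths.

# One `file '...'` directive; an embedded ' is closed, escaped and reopened.
def concat_list_line(path)
  abs = File.expand_path(path)
  # Block form of gsub so the backslash in the replacement is taken literally.
  escaped = abs.gsub("'") { "'\\''" }
  "file '#{escaped}'"
end

# The whole list, one directive per line, in the order the clips were given.
def concat_list(paths)
  paths.map { |p| concat_list_line(p) }.join("\n") + "\n"
end
```

The resulting text would be written to the temporary list file that the script passes to `ffmpeg -f concat -safe 0 -i`.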

data/lib/clacky/json_ui_controller.rb CHANGED

@@ -101,7 +101,7 @@ module Clacky
       emit("warning", message: message)
     end
-    def show_error(message, code: nil, top_up_url: nil)
+    def show_error(message, code: nil, top_up_url: nil, raw_message: nil)
       payload = { message: message }
       payload[:code] = code if code
       payload[:top_up_url] = top_up_url if top_up_url

data/lib/clacky/media/base.rb CHANGED

@@ -51,6 +51,24 @@ module Clacky
         )
       end
+      def generate_transcription(audio_base64:, mime_type:, **_kwargs)
+        transcription_error_response(
+          error: "Speech-to-text is not supported by #{self.class.name.split("::").last}. Use the openclacky gateway with an STT model such as or-stt-gemini-3-5-flash.",
+          error_type: "not_implemented",
+          provider: ""
+        )
+      end
+      # @return [Hash] either video_understanding_success_response(...) or
+      #   video_understanding_error_response(...)
+      def understand_video(video_base64:, mime_type:, prompt: nil, **_kwargs)
+        video_understanding_error_response(
+          error: "Video understanding is not supported by #{self.class.name.split("::").last}. Use the openclacky gateway with a video understanding model such as or-gemini-3-5-flash.",
+          error_type: "not_implemented",
+          provider: ""
+        )
+      end
       # Persist a base64-encoded image under <output_dir>/assets/generated/.
       # Returns the absolute path on disk.
       private def save_b64_image(b64_data, output_dir:, prefix: "img", extension: "png")
@@ -188,6 +206,48 @@ module Clacky
           "provider"   => provider
         }
       end
+      private def transcription_success_response(text:, provider:, extra: {})
+        {
+          "success"  => true,
+          "text"     => text,
+          "model"    => @model,
+          "provider" => provider
+        }.merge(extra)
+      end
+      private def transcription_error_response(error:, error_type: "provider_error", provider: "")
+        {
+          "success"    => false,
+          "text"       => nil,
+          "error"      => error,
+          "error_type" => error_type,
+          "model"      => @model,
+          "provider"   => provider
+        }
+      end
+      private def video_understanding_success_response(analysis:, prompt:, provider:, extra: {})
+        {
+          "success"  => true,
+          "analysis" => analysis,
+          "model"    => @model,
+          "prompt"   => prompt,
+          "provider" => provider
+        }.merge(extra)
+      end
+      private def video_understanding_error_response(error:, error_type: "provider_error", provider:, prompt: "")
+        {
+          "success"    => false,
+          "analysis"   => nil,
+          "error"      => error,
+          "error_type" => error_type,
+          "model"      => @model,
+          "prompt"     => prompt,
+          "provider"   => provider
+        }
+      end
     end
   end
 end
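
The response hashes added in this hunk use string keys and a `"success"` flag, with `"error_type"` set to `"not_implemented"` by the base-class fallbacks. A caller might branch on them roughly as below. This is an illustrative sketch only: `describe_transcription` is a hypothetical name, and the hash shape is the only thing taken from the diff.

```ruby
# Hypothetical consumer (not part of openclacky) of the transcription
# response hashes: returns the text on success, a readable message otherwise.
def describe_transcription(response)
  return response["text"] if response["success"]

  case response["error_type"]
  when "not_implemented"
    # The provider class does not override generate_transcription.
    "unsupported by this provider: #{response["error"]}"
  else
    "transcription failed (#{response["error_type"]}): #{response["error"]}"
  end
end
```

The same pattern would apply to the video understanding responses, reading `"analysis"` instead of `"text"`.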