@kolbo/kolbo-code-linux-arm64-musl 1.1.66 → 1.1.67

package/bin/kolbo CHANGED
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kolbo/kolbo-code-linux-arm64-musl",
- "version": "1.1.66",
+ "version": "1.1.67",
  "os": [
  "linux"
  ],
@@ -150,6 +150,25 @@ The regular `srt_url` groups words into readable subtitle lines (default 12 word
  ### Long Content
  Transcription supports files up to 30 minutes. For longer content, split the file first or provide segments.
 
+ ### Visual Video/Audio Analysis (what's happening, not just what's said)
+ `transcribe_audio` only extracts **speech**. If the user wants to understand **what's visually happening** in a video (scenes, actions, objects, on-screen text) or needs a multimodal AI to reason about the content, use `chat_send_message` with a video-capable model instead.
+
+ **Video-capable models**: `gemini-2.5-pro`, `gemini-2.5-flash` — these can watch video and analyze visual content.
+
+ **Workflow for visual analysis:**
+ 1. Upload the video with `upload_media` to get a stable CDN URL
+ 2. Call `chat_send_message` with the video URL in the message and a video-capable model (e.g. `gemini-2.5-pro`)
+ 3. Ask your analysis question: "Describe what happens in this video", "What products are shown?", "Summarize the key scenes"
+
+ **When to use which:**
+
+ | User intent | Tool |
+ |-------------|------|
+ | "Transcribe this" / "What's being said?" | `transcribe_audio` |
+ | "Generate subtitles" / "Word-by-word timing" | `transcribe_audio` |
+ | "What's happening in this video?" / "Describe the scenes" | `chat_send_message` + Gemini |
+ | "Analyze this video and transcribe it" | Both — `transcribe_audio` for text + `chat_send_message` for visual |
+
  ---
 
  ## Image Prompts
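
Reviewer note on the hunk above: the "When to use which" table and the three-step visual-analysis workflow can be read as a small routing function. The sketch below is only an illustration in Python. The tool names (`transcribe_audio`, `upload_media`, `chat_send_message`) and the model names come from the added text; `Plan`, `plan_for`, and the two boolean flags are hypothetical names introduced here. Nothing in it calls the real tools or reflects how the package is implemented.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Tool and model names as they appear in the added documentation.
TRANSCRIBE = "transcribe_audio"
UPLOAD = "upload_media"
CHAT = "chat_send_message"
VIDEO_MODELS = ("gemini-2.5-pro", "gemini-2.5-flash")


@dataclass(frozen=True)
class Plan:
    """Hypothetical container: ordered tool names plus the chat model, if any."""
    tools: Tuple[str, ...]
    model: Optional[str]


def plan_for(wants_speech: bool, wants_visual: bool,
             model: str = "gemini-2.5-pro") -> Plan:
    """Map a user's intent to the tools listed in the table (illustrative only)."""
    if not (wants_speech or wants_visual):
        raise ValueError("nothing to do: request speech, visual analysis, or both")
    if wants_visual and model not in VIDEO_MODELS:
        raise ValueError(f"{model!r} is not listed as video-capable")

    tools = []
    if wants_speech:
        # Transcription and subtitle timing come from the speech tool.
        tools.append(TRANSCRIBE)
    if wants_visual:
        # Upload first to obtain a stable URL, then ask the chat model about it.
        tools.extend([UPLOAD, CHAT])
    return Plan(tuple(tools), model if wants_visual else None)


if __name__ == "__main__":
    print(plan_for(wants_speech=True, wants_visual=False))
    print(plan_for(wants_speech=True, wants_visual=True, model="gemini-2.5-flash"))
```

The "Both" row of the table corresponds to `plan_for(True, True)`, which lists the speech tool first and then the upload and chat steps. The order between the two branches is a choice made for this sketch; the added text does not prescribe one.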