@kolbo/kolbo-code-linux-arm64-musl 2.1.6 → 2.1.9
- package/bin/kolbo +0 -0
- package/package.json +1 -1
- package/skills/kolbo/SKILL.md +68 -32
- package/skills/photo-studio/SKILL.md +28 -20
- package/skills/video-production/SKILL.md +112 -16
package/bin/kolbo
CHANGED
|
Binary file
|
package/package.json
CHANGED
package/skills/kolbo/SKILL.md
CHANGED
|
@@ -46,7 +46,7 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con
|
|
|
46
46
|
|
|
47
47
|
| Tool | Description |
|
|
48
48
|
|------|-------------|
|
|
49
|
-
| `upload_media` | Upload
|
|
49
|
+
| `upload_media` | Upload ANY local file to Kolbo CDN → returns a public URL. Works for images, videos, audio, HTML, documents — any file type. Use for: feeding media to `chat_send_message`, sharing files publicly, hosting HTML pages, or multi-tool workflows. |
|
|
50
50
|
| `list_media` | Browse user's uploaded media with filtering by type and search. |
|
|
51
51
|
|
|
52
52
|
### Visual DNA (Character/Style Consistency)
|
|
@@ -66,19 +66,31 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con
|
|
|
66
66
|
| `get_moodboard` | Fetch a moodboard's master_prompt, style_guide, and images. |
|
|
67
67
|
| `list_presets` | Browse generation presets (image/video/music templates with bundled style direction). |
|
|
68
68
|
|
|
69
|
-
### Chat
|
|
69
|
+
### Chat & Vision
|
|
70
70
|
|
|
71
71
|
| Tool | Description |
|
|
72
72
|
|------|-------------|
|
|
73
|
-
| `chat_send_message` | Send a message to Kolbo AI chat. Supports web search and deep think modes. |
|
|
73
|
+
| `chat_send_message` | Send a message to Kolbo AI chat. Pass `media_urls` (array of public URLs) to analyze images, videos, or audio — Smart Select auto-routes to Gemini vision when media is detected. Omit `model` for automatic routing. Supports web search and deep think modes. |
|
|
74
74
|
| `chat_list_conversations` | List your SDK chat conversations. |
|
|
75
75
|
| `chat_get_messages` | Fetch messages in a conversation (with media URLs). |
|
|
76
76
|
|
|
77
|
+
## ⚠️ Generate vs Edit — Know the Difference
|
|
78
|
+
|
|
79
|
+
| User intent | Action | NOT this |
|
|
80
|
+
|-------------|--------|----------|
|
|
81
|
+
| "Create a video from scratch" / "Generate a video of..." | `generate_video` (Kolbo MCP) | — |
|
|
82
|
+
| "Edit this video" / "Cut" / "Trim" / "Crop" / "Merge" / "Add subtitles" / "Remove silence" / "Speed up" / "Convert to 9:16" | Load `video-production` skill → FFmpeg | ❌ Do NOT call `generate_video` |
|
|
83
|
+
| "Create motion graphics" / "Animated text" / "Title sequence" | Load `remotion-best-practices` skill → Remotion | ❌ Do NOT call `generate_video` |
|
|
84
|
+
| "Animate this image" / "Make this photo move" | `generate_video_from_image` (Kolbo MCP) | — |
|
|
85
|
+
| "Restyle this video as anime" | `generate_video_from_video` (Kolbo MCP) | — |
|
|
86
|
+
|
|
87
|
+
**`generate_video` creates NEW videos from text prompts. It cannot edit, cut, trim, merge, or modify existing video files.** For any operation on an existing video file, use FFmpeg via the `video-production` skill.
|
|
88
|
+
|
|
77
89
|
## Core Workflow
|
|
78
90
|
|
|
79
91
|
1. **Check credits** with `check_credits` at the start of any creative session (once is enough).
|
|
80
92
|
2. **Discover models** with `list_models` using a `type` filter. **Always do this before calling a generation tool — never hardcode model identifiers.** Models are added, removed, and updated frequently.
|
|
81
|
-
3. **
|
|
93
|
+
3. **Pick the model**: If the user explicitly requested a specific model, use that. Otherwise, **prefer the cheapest model that still has great quality** — look at both `credit` cost and `recommended` status from `list_models`. When two models have similar quality, always pick the cheaper one. Only omit `model` (auto-select) as a last resort if you can't determine a good cheap option.
|
|
82
94
|
4. **Polling is internal** — the tool returns the final URL(s) when ready. If a video generation times out, call `get_generation_status` with the returned generation ID to retrieve the result.
|
|
83
95
|
5. **Share the URL** — after a successful generation, hand the real URL back to the user. Never fabricate URLs.
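The model-picking rule in step 3 can be sketched as a small helper. This is a minimal illustration, not the package's own logic: it assumes `list_models` results are available as dictionaries, treats the `recommended` flag as a stand-in for "great quality", and uses a hypothetical `identifier` field for the model name. Only `credit` and `recommended` are named in the text above.

```python
from typing import Optional


def pick_model(models: list, requested: Optional[str] = None) -> Optional[str]:
    """Return a model identifier, or None to fall back to auto-select."""
    if requested:
        return requested  # an explicit user request always wins
    # Assumption: "recommended" marks the models with good enough quality.
    recommended = [m for m in models if m.get("recommended")]
    if not recommended:
        return None  # last resort: omit `model` and let the platform choose
    cheapest = min(recommended, key=lambda m: m["credit"])
    return cheapest["identifier"]  # hypothetical field name


# Hypothetical list_models output, for illustration only.
models = [
    {"identifier": "model-a", "credit": 12, "recommended": True},
    {"identifier": "model-b", "credit": 4, "recommended": True},
    {"identifier": "model-c", "credit": 1, "recommended": False},
]
print(pick_model(models))             # model-b
print(pick_model(models, "model-a"))  # model-a
```

Returning `None` when nothing is recommended mirrors the "omit `model` as a last resort" wording rather than guessing at a cheap but unvetted model.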
|
|
84
96
|
|
|
@@ -122,32 +134,45 @@ Creative generations bill against the user's Kolbo credit balance. **Billing uni
|
|
|
122
134
|
- Count the actual characters in the text before estimating. 1000 chars with ElevenLabs = 50 credits.
|
|
123
135
|
- **Images / 3D / Sound effects**: `total = model_credit × quantity`
|
|
124
136
|
|
|
125
|
-
**
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
|
|
137
|
+
**ALWAYS confirm total cost before generating:**
|
|
138
|
+
Before firing ANY generation (image, video, music, speech, 3D — everything), calculate the total credit cost and present it to the user for confirmation. This is especially critical for batch operations (e.g. "8 videos from 8 images"):
|
|
139
|
+
|
|
140
|
+
1. Calculate per-item cost using the formulas above.
|
|
141
|
+
2. Multiply by the number of items.
|
|
142
|
+
3. Present a summary: "This will generate 8 videos × 5s each using [model] at X cr/s = **Y credits total**. Proceed?"
|
|
143
|
+
4. **Suggest cheaper alternatives** if available: "I can use [cheaper model] at Z cr/s instead — same quality, saves N credits. Want that instead?"
|
|
144
|
+
5. Only proceed after the user confirms.
|
|
145
|
+
|
|
146
|
+
The only exception: single image generations under 5 credits — those can proceed without confirmation unless the user's balance is low.
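The confirmation rule above amounts to a little arithmetic plus one exception. The sketch below is an illustration under stated assumptions: video is assumed to be billed per second (as the "X cr/s" wording suggests), the function names are hypothetical, and what counts as a "low" balance is left to the caller because the text does not define it.

```python
def video_cost(count: int, seconds: int, credits_per_second: float) -> float:
    """Total credits for a batch of videos, assuming per-second billing."""
    return count * seconds * credits_per_second


def flat_cost(model_credit: float, quantity: int) -> float:
    """Images / 3D / sound effects: total = model_credit x quantity."""
    return model_credit * quantity


def needs_confirmation(kind: str, quantity: int, total: float,
                       balance_is_low: bool = False) -> bool:
    """Everything needs confirmation except a single image under 5 credits."""
    small_single_image = kind == "image" and quantity == 1 and total < 5
    return balance_is_low or not small_single_image


def confirmation_prompt(count: int, seconds: int, model: str,
                        credits_per_second: float) -> str:
    """Build the cost summary shown to the user before a video batch."""
    total = video_cost(count, seconds, credits_per_second)
    return (
        f"This will generate {count} videos × {seconds}s each using {model} "
        f"at {credits_per_second:g} cr/s = {total:g} credits total. Proceed?"
    )


# "example-model" and the 2 cr/s rate are made-up values for illustration.
print(confirmation_prompt(8, 5, "example-model", 2))
print(needs_confirmation("image", 1, flat_cost(3, 1)))      # False
print(needs_confirmation("video", 8, video_cost(8, 5, 2)))  # True
```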
|
|
147
|
+
|
|
148
|
+
### Rate Limiting & Batch Generation (CRITICAL)
|
|
131
149
|
|
|
132
|
-
### Rate Limiting
|
|
133
150
|
Kolbo enforces **10 generation requests per minute per user per tool type** (e.g. 10 image calls + 10 video calls = fine, but 11 image calls in 1 minute = rate limited). General media requests are capped at **300 per minute**.
|
|
134
151
|
|
|
135
|
-
|
|
136
|
-
|
|
137
|
-
|
|
138
|
-
|
|
152
|
+
**⚠️ MANDATORY: Sequential generation with delays.**
|
|
153
|
+
When making multiple generation calls (e.g. 8 images → 8 videos), you MUST:
|
|
154
|
+
|
|
155
|
+
1. **Call ONE generation at a time.** Never fire multiple generation tool calls in the same message. Send one, wait for the result, then send the next.
|
|
156
|
+
2. **Wait 8-10 seconds between each call.** After receiving a result, pause before the next generation. This prevents the API from silently dropping requests.
|
|
157
|
+
3. **Verify every result.** After all generations complete, count the results. If any are missing, retry the failed ones (with the same delay).
|
|
158
|
+
4. **Batch images**: use `generate_creative_director` instead of calling `generate_image` 5+ times — it handles multi-scene in one request. There is no batch equivalent for video — you must go one-by-one.
|
|
159
|
+
5. If you get a rate limit error (429), wait 60 seconds (the window resets per minute) and retry. Do not retry more than 2 times.
|
|
160
|
+
|
|
161
|
+
**Why this matters:** Firing multiple generation calls in parallel (e.g. 8 `generate_video_from_image` calls at once) causes the API to silently drop some requests — the user ends up with only half the results and no error message. This is the #1 cause of "I sent 8 images but only got 4 videos" complaints.
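The sequential rule can be pictured as a plain loop. The sketch below is an assumption-laden illustration, not Kolbo's implementation: `generate` stands in for whichever generation tool is being called, `RateLimited` is a hypothetical exception representing a 429 response, and the 9-second default simply falls inside the 8-10 second window stated above.

```python
import time
from typing import Callable, List, Optional, Sequence


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 response from a generation tool."""


def run_sequentially(
    jobs: Sequence[str],
    generate: Callable[[str], str],
    delay_s: float = 9.0,            # 8-10 s pause between calls
    rate_limit_wait_s: float = 60.0,  # the rate-limit window resets per minute
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Optional[str]]:
    """Run one generation at a time; None marks a job that never succeeded."""
    results: List[Optional[str]] = []
    for index, job in enumerate(jobs):
        if index > 0:
            sleep(delay_s)  # never fire the next call immediately
        retries = 0
        while True:
            try:
                results.append(generate(job))
                break
            except RateLimited:
                if retries >= max_retries:
                    results.append(None)  # give up after two retries
                    break
                retries += 1
                sleep(rate_limit_wait_s)
    return results


def fake_generate(job: str) -> str:
    return f"result-for-{job}"  # placeholder, not a real generation result


jobs = ["clip-1", "clip-2"]
# A no-op sleep keeps the demonstration instant; real use keeps time.sleep.
results = run_sequentially(jobs, fake_generate, sleep=lambda seconds: None)
missing = [job for job, result in zip(jobs, results) if result is None]
print(len(results), missing)  # 2 []
```

Counting `results` against `jobs` at the end is the "verify every result" step: anything listed in `missing` can be passed through the same function again, which keeps the delay in place for the retry.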
|
|
139
162
|
|
|
140
163
|
---
|
|
141
164
|
|
|
142
165
|
## Transcription & Audio/Video Analysis
|
|
143
166
|
|
|
144
|
-
Use `transcribe_audio`
|
|
167
|
+
Use `transcribe_audio` ONLY when the user explicitly asks for:
|
|
145
168
|
- A text transcript
|
|
146
169
|
- Subtitles (SRT format)
|
|
147
170
|
- Word-by-word timed subtitles (for karaoke, motion graphics, Remotion captions, video editing)
|
|
148
|
-
-
|
|
171
|
+
- Summary of what was **spoken/said** in the video
|
|
149
172
|
- Dialogue extraction from video
|
|
150
173
|
|
|
174
|
+
**Do NOT use `transcribe_audio` to "analyze" a video visually.** For visual analysis (what's on screen, what's shown, what prompts appear, etc.) use `upload_media` → `chat_send_message` with `media_urls`.
|
|
175
|
+
|
|
151
176
|
### Workflow
|
|
152
177
|
1. Call `transcribe_audio` with the `source` (URL or absolute local file path)
|
|
153
178
|
2. The tool returns:
|
|
@@ -184,25 +209,30 @@ Transcription supports files up to 30 minutes. For longer content, split the fil
|
|
|
184
209
|
|
|
185
210
|
### Visual Video/Audio/Image Analysis
|
|
186
211
|
|
|
187
|
-
**
|
|
188
|
-
|
|
189
|
-
`transcribe_audio` is ONLY for when the user explicitly says "transcribe", "subtitles", "SRT", or "what's being said". Everything else — "what do you see?", "describe this", "analyze this", "what's in this video?", "what prompts are shown?", or just pasting a file path with no instruction — is visual analysis via Gemini.
|
|
212
|
+
**The agent has built-in vision — use the right tool for the media type:**
|
|
190
213
|
|
|
191
|
-
|
|
214
|
+
| Media type | How to analyze |
|
|
215
|
+
|------------|----------------|
|
|
216
|
+
| **Image** (jpg, png, webp, etc.) | Read it directly with the `Read` tool — the agent sees images natively. No upload needed. |
|
|
217
|
+
| **Video / Audio** | `upload_media` → `chat_send_message` with `media_urls` (Gemini handles video/audio) |
|
|
218
|
+
| **Transcription** | `transcribe_audio` — ONLY when user explicitly says "transcribe", "subtitles", "SRT", or "what's being said" |
|
|
192
219
|
|
|
193
|
-
**
|
|
194
|
-
1. `upload_media({ source: "/absolute/local/path/to/file.mp4" })` → get CDN URL (skip if already a public URL)
|
|
195
|
-
2. `chat_send_message({ message: "<your question>", model: "gemini-2.5-pro", media_urls: ["<cdn-url>"] })`
|
|
220
|
+
**NEVER use ffmpeg or frame extraction for analysis. NEVER ask the user — just pick the right path above.**
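As a rough picture of how the routing table resolves, the sketch below maps a file and the user's wording to one of the three paths. It is an illustration only: the extension set and the substring matching are assumptions (the table lists "jpg, png, webp, etc." without an exhaustive list), and the "both transcript and visual" case is not modelled.

```python
from pathlib import PurePosixPath

# Assumed, non-exhaustive extension set for still images.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
# Phrases the text names as explicit transcription requests.
TRANSCRIBE_TRIGGERS = ("transcribe", "subtitles", "srt", "what's being said")


def analysis_path(file_name: str, user_message: str = "") -> str:
    """Pick the analysis path for a file, following the routing table."""
    if PurePosixPath(file_name).suffix.lower() in IMAGE_EXTS:
        return "Read"  # images are read natively, no upload needed
    text = user_message.lower()
    if any(trigger in text for trigger in TRANSCRIBE_TRIGGERS):
        return "transcribe_audio"
    # Default for video/audio, including a bare file path with no instruction.
    return "upload_media -> chat_send_message"


print(analysis_path("/abs/path/to/char.jpg"))                      # Read
print(analysis_path("/abs/path/to/file.mp4"))                      # upload_media -> chat_send_message
print(analysis_path("/abs/path/to/file.mp4", "Transcribe this"))   # transcribe_audio
```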
|
|
196
221
|
|
|
197
|
-
**
|
|
222
|
+
**Video/Audio analysis workflow — Step 1 is NOT optional:**
|
|
223
|
+
1. `upload_media({ source: "/absolute/local/path/to/file.mp4" })` → returns `{ url, thumbnail_url, ... }`
|
|
224
|
+
- **Use `url`** — the actual CDN URL. Ignore `thumbnail_url` (preview JPG only).
|
|
225
|
+
2. `chat_send_message({ message: "<your question>", media_urls: [result.url] })`
|
|
226
|
+
- **`media_urls` is mandatory** — the model only sees the video if you pass the CDN URL here.
|
|
227
|
+
- Always an **array**: `media_urls: ["https://cdn.kolbo.ai/..."]`
|
|
228
|
+
- **Omit `model`** — Smart Select auto-routes to Gemini when media is detected
|
|
229
|
+
- **Sessions do NOT remember media between messages.** On retry: reuse the same CDN `url` (no re-upload) but always pass `media_urls` again.
|
|
230
|
+
- **Batch / many videos**: pass `model: "gemini-3.1-flash-lite-preview"` explicitly for cheaper bulk runs
|
|
198
231
|
|
|
199
|
-
|
|
200
|
-
|
|
201
|
-
|
|
202
|
-
|
|
203
|
-
| User shares a file path or video URL with no instruction | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
|
|
204
|
-
| User shares a video and asks about on-screen text / prompts / UI | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
|
|
205
|
-
| User wants both transcript AND visual description | Both — run `transcribe_audio` AND `chat_send_message` + Gemini |
|
|
232
|
+
**❌ Never do this:**
|
|
233
|
+
- Pass a local file path in `media_urls` — it won't work, only CDN URLs work
|
|
234
|
+
- Use the `.txt` URL from a transcription result as the video URL — that's text, not video
|
|
235
|
+
- Skip `upload_media` and try to construct a URL yourself
|
|
206
236
|
|
|
207
237
|
When in doubt, do visual analysis. Do not stop to ask.
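The two-step workflow and its "never do this" list can be condensed into one guard function that turns an `upload_media` result into `chat_send_message` arguments. This is a sketch under assumptions: the function name is hypothetical, the example URL path is made up (only the `https://cdn.kolbo.ai/...` prefix appears above), and the checks are simple heuristics rather than anything the Kolbo tools perform themselves.

```python
from typing import Optional
from urllib.parse import urlparse


def build_analysis_request(upload_result: dict, question: str,
                           model: Optional[str] = None) -> dict:
    """Build chat_send_message arguments from an upload_media result."""
    url = upload_result["url"]  # always `url`, never `thumbnail_url`
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("media_urls needs a public CDN URL, not a local path")
    if parsed.path.lower().endswith(".txt"):
        raise ValueError("a transcript .txt URL is text, not the media file")
    request = {"message": question, "media_urls": [url]}  # always an array
    if model is not None:
        request["model"] = model  # only set for explicit bulk/cheap runs
    return request


# Hypothetical upload_media result; the path component is invented.
upload_result = {
    "url": "https://cdn.kolbo.ai/example/video.mp4",
    "thumbnail_url": "https://cdn.kolbo.ai/example/video.jpg",
}
print(build_analysis_request(upload_result, "Describe this video in detail."))
```

Because sessions do not remember media between messages, a retry would call the same function again with the same `upload_result`, so `media_urls` is always present without a second upload.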
|
|
208
238
|
|
|
@@ -512,8 +542,14 @@ Natural-language triggers that should prompt this skill + a tool call:
|
|
|
512
542
|
- "Transcribe this podcast episode" → `transcribe_audio`
|
|
513
543
|
- "What's being said in this video?" → `transcribe_audio` → analyze the text
|
|
514
544
|
- "Generate word-by-word subtitles for this audio" → `transcribe_audio` → share `word_by_word_srt_url`
|
|
545
|
+
- "Analyze this video" / "What do you see?" / "What's in this?" (with video file) → `upload_media` → `chat_send_message` with `media_urls` (omit model — auto-routes to Gemini)
|
|
546
|
+
- "What prompts are shown in this video?" → `upload_media` → `chat_send_message` with `media_urls` (omit model — auto-routes to Gemini)
|
|
515
547
|
- "Keep the same character across all these images" → `create_visual_dna` → `generate_image` with `visual_dna_ids`
|
|
516
548
|
- "Upload this file to my media library" → `upload_media`
|
|
549
|
+
- "Host this HTML page" / "Publish this landing page" / "Give me a public URL for this file" → `upload_media` → share the returned `url` (Kolbo CDN serves any file type publicly)
|
|
517
550
|
- "What video models are available?" → `list_models` (video)
|
|
518
551
|
- "How many credits do I have?" → `check_credits`
|
|
519
552
|
- "What's in this image?" (with upload) → describe per the Image Analysis section; no tool call needed unless the user asks to generate or edit
|
|
553
|
+
- "Create motion graphics" / "animated text" / "title sequence" → load the `remotion-best-practices` skill for Remotion-based motion graphics
|
|
554
|
+
- "Edit this video" / "cut this clip" / "remove silence" / "add subtitles" / "convert to 9:16" → load the `video-production` skill for FFmpeg-based editing
|
|
555
|
+
- "Create a short-form video" / "make a reel" / "YouTube short" → load the `short-form-video` skill
|
|
package/skills/photo-studio/SKILL.md
CHANGED
|
@@ -1,10 +1,10 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: photo-studio
|
|
3
3
|
description: >
|
|
4
|
-
Local AI photo generation and editing using FLUX.2 Klein 4B
|
|
4
|
+
Local AI photo generation and editing using FLUX.2 Klein 4B and Z-Image Turbo.
|
|
5
5
|
Use when the user wants to generate or edit images locally (no API cost, no rate limits).
|
|
6
6
|
Models at I:/AI-Models/. Script at G:/Projects/Kolbo.AI/github/training-loras/scripts/photo-studio.py
|
|
7
|
-
Keywords: generate image, edit image, flux klein, z-image turbo, local diffusion, photo studio
|
|
7
|
+
Keywords: generate image, edit image, flux klein, z-image turbo, local diffusion, photo studio
|
|
8
8
|
---
|
|
9
9
|
|
|
10
10
|
# Photo Studio — Local AI Image Generation & Editing
|
|
@@ -18,11 +18,11 @@ description: >
|
|
|
18
18
|
| FLUX.2 Klein 4B | `I:/AI-Models/flux2-klein-4b/` |
|
|
19
19
|
| Z-Image Turbo | `I:/AI-Models/z-image-turbo/` |
|
|
20
20
|
| Z-Image Adapter | `I:/AI-Models/z-image-turbo-adapter/zimage_turbo_training_adapter_v2.safetensors` |
|
|
21
|
-
|
|
|
21
|
+
| Vision / LLM | Agent's built-in vision (`Read` tool) for images. For video analysis load the `video-production` skill. |
|
|
22
22
|
|
|
23
23
|
## How to Run
|
|
24
24
|
|
|
25
|
-
Always use the ai-toolkit venv (has Flux2KleinPipeline + ZImagePipeline
|
|
25
|
+
Always use the ai-toolkit venv (has Flux2KleinPipeline + ZImagePipeline):
|
|
26
26
|
|
|
27
27
|
```bash
|
|
28
28
|
"G:/Projects/Kolbo.AI/github/ai-toolkit/venv/Scripts/python.exe" \
|
|
@@ -43,8 +43,6 @@ Always use the ai-toolkit venv (has Flux2KleinPipeline + ZImagePipeline + ollama
|
|
|
43
43
|
| `--steps N` | 20 | Inference steps |
|
|
44
44
|
| `--cfg N` | 3.5 | Guidance scale |
|
|
45
45
|
| `--seed N` | random | Deterministic seed |
|
|
46
|
-
| `--analyze` | off | Analyze `--image` with Gemma4, use as base description |
|
|
47
|
-
| `--enhance` | off | Enhance `--prompt` with Gemma4 before generating |
|
|
48
46
|
| `--adapter` | off | Load Z-Image Turbo adapter (zimage only) |
|
|
49
47
|
|
|
50
48
|
## Common Recipes
|
|
@@ -64,18 +62,29 @@ python photo-studio.py \
|
|
|
64
62
|
--model flux --width 1152 --height 2048
|
|
65
63
|
```
|
|
66
64
|
|
|
67
|
-
### Analyze image
|
|
68
|
-
```
|
|
65
|
+
### Analyze image then generate variation
|
|
66
|
+
```
|
|
67
|
+
# Step 1: Read the image — the agent sees it natively (built-in vision)
|
|
68
|
+
Read("/abs/path/to/char.jpg")
|
|
69
|
+
→ Agent describes the person: clothing, pose, features, style
|
|
70
|
+
|
|
71
|
+
# Step 2: Use the description as the prompt
|
|
69
72
|
python photo-studio.py \
|
|
70
|
-
--
|
|
71
|
-
--prompt "standing upright, full body" \
|
|
73
|
+
--prompt "<description from agent vision> standing upright, full body" \
|
|
72
74
|
--model flux
|
|
73
75
|
```
|
|
74
76
|
|
|
75
|
-
###
|
|
76
|
-
```
|
|
77
|
+
### Enhance a short prompt then generate
|
|
78
|
+
```
|
|
79
|
+
# Step 1: Enhance the prompt with Kolbo MCP
|
|
80
|
+
chat_send_message({
|
|
81
|
+
message: "Expand this into a detailed image generation prompt for a photorealistic portrait: 'street fashion guy'",
|
|
82
|
+
})
|
|
83
|
+
→ { content: "A young man in his mid-20s wearing..." }
|
|
84
|
+
|
|
85
|
+
# Step 2: Generate with the enhanced prompt
|
|
77
86
|
python photo-studio.py \
|
|
78
|
-
--prompt "
|
|
87
|
+
--prompt "<enhanced prompt from Kolbo>" \
|
|
79
88
|
--model zimage --width 1152 --height 2048
|
|
80
89
|
```
|
|
81
90
|
|
|
@@ -101,11 +110,10 @@ python photo-studio.py \
|
|
|
101
110
|
- Default: `--steps 8 --cfg 0.0 --width 1152 --height 2048` (~30s per image)
|
|
102
111
|
- Add `--adapter` to load the v2 training adapter
|
|
103
112
|
|
|
104
|
-
###
|
|
105
|
-
-
|
|
106
|
-
-
|
|
107
|
-
-
|
|
108
|
-
- Ollama auto-starts on Windows boot
|
|
113
|
+
### Vision & Prompt Enhancement
|
|
114
|
+
- For image analysis: use the agent's built-in vision — `Read` the image file directly, no MCP needed
|
|
115
|
+
- For prompt enhancement: `chat_send_message` asking Kolbo to expand a short prompt (text-only, no vision)
|
|
116
|
+
- Do NOT use `--analyze` or `--enhance` flags (those call a local model that is no longer used)
|
|
109
117
|
|
|
110
118
|
## When to use which model
|
|
111
119
|
|
|
@@ -117,6 +125,6 @@ python photo-studio.py \
|
|
|
117
125
|
| Quick text-to-image | `flux` or `zimage` |
|
|
118
126
|
| Portrait + face reference | `flux --image face.jpg` |
|
|
119
127
|
|
|
120
|
-
## Prompt Tips
|
|
128
|
+
## Prompt Tips
|
|
121
129
|
|
|
122
|
-
When the user gives a short/vague prompt,
|
|
130
|
+
When the user gives a short/vague prompt, use `chat_send_message` to let Kolbo AI expand it before passing to the script. For image editing, first analyze the source image with the agent's built-in vision (`Read` the image), then use the description as the base prompt.
|
|
package/skills/video-production/SKILL.md
CHANGED
|
@@ -1,13 +1,16 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: video-production
|
|
3
3
|
description: >
|
|
4
|
-
Full-stack video production assistant. Analyzes
|
|
4
|
+
Full-stack video production assistant. Analyzes video content visually (Gemini),
|
|
5
5
|
generates transcriptions/SRT subtitles, plans and creates motion graphics (Remotion),
|
|
6
6
|
generates B-roll images/videos, produces timeline XMLs for Premiere/DaVinci.
|
|
7
|
-
|
|
8
|
-
|
|
7
|
+
Downloads YouTube videos with yt-dlp.
|
|
8
|
+
Use for: video analysis, visual analysis, describe video, what's in this video,
|
|
9
|
+
transcription, subtitles, motion graphics, B-roll, shorts, timeline XML, clip cutting,
|
|
10
|
+
silence removal, After Effects, Premiere Pro, DaVinci Resolve, YouTube download.
|
|
9
11
|
Keywords: video edit, ffmpeg, remotion, after effects, premiere, davinci, shorts, subtitles,
|
|
10
|
-
motion graphics, clip, render, transcribe, xml, timeline, b-roll, talking head, analyze
|
|
12
|
+
motion graphics, clip, render, transcribe, xml, timeline, b-roll, talking head, analyze,
|
|
13
|
+
yt-dlp, youtube, download, gemini, vision
|
|
11
14
|
allowed-tools:
|
|
12
15
|
- Read
|
|
13
16
|
- Write
|
|
@@ -24,29 +27,122 @@ allowed-tools:
|
|
|
24
27
|
|
|
25
28
|
# Video Production — Strategy Map
|
|
26
29
|
|
|
30
|
+
## ⚠️ DEFAULT RULE: Video Analysis = Visual Analysis (NOT Transcription)
|
|
31
|
+
|
|
32
|
+
**The agent has built-in vision for images. For videos, always use Gemini via Kolbo MCP.**
|
|
33
|
+
|
|
34
|
+
| Media type | Action |
|
|
35
|
+
|------------|--------|
|
|
36
|
+
| **Image** (jpg, png, etc.) | Agent reads it directly — no upload needed |
|
|
37
|
+
| **Video** — "analyze", "describe", "what's in this?", "what prompts?", file path with no instruction | `upload_media` → `chat_send_message` + Gemini |
|
|
38
|
+
| **Transcription** — "transcribe", "subtitles", "SRT", "what's being said", "captions" | `transcribe_audio` only |
|
|
39
|
+
| Both visual + transcript | Run both |
|
|
40
|
+
|
|
41
|
+
**Never use ffmpeg to extract frames for analysis. Never use local Ollama/vision models. Commit to the right action — do not ask the user. Wait for `chat_send_message` to return before proceeding — it polls until done (up to 2 min). Do NOT fall back to ffmpeg or any other approach if it takes time.**
|
|
42
|
+
|
|
43
|
+
---
|
|
44
|
+
|
|
45
|
+
## Kolbo MCP Tools (Active When `kolbo auth login` Is Done)
|
|
46
|
+
|
|
47
|
+
These are available as MCP tools — use them directly without any Python/API key setup:
|
|
48
|
+
|
|
49
|
+
| Tool | Use |
|
|
50
|
+
|------|-----|
|
|
51
|
+
| `upload_media` | Upload local file to Kolbo CDN → get stable public URL |
|
|
52
|
+
| `chat_send_message` | Send message + `media_urls` array to Gemini for visual analysis |
|
|
53
|
+
| `transcribe_audio` | Transcribe audio/video to text + SRT (ElevenLabs Scribe) |
|
|
54
|
+
| `generate_image` | Generate B-roll images |
|
|
55
|
+
| `generate_video` | Generate B-roll videos |
|
|
56
|
+
| `generate_video_from_image` | Animate a still into video |
|
|
57
|
+
| `generate_music` | Generate background music |
|
|
58
|
+
| `generate_speech` | TTS for voiceover |
|
|
59
|
+
| `generate_sound` | Sound effects |
|
|
60
|
+
| `list_models` | Browse available models by type |
|
|
61
|
+
| `check_credits` | Check remaining Kolbo credit balance |
|
|
62
|
+
|
|
63
|
+
### Visual Analysis Workflow — MANDATORY for all video analysis
|
|
64
|
+
|
|
65
|
+
**Step 1 is NOT optional. You cannot skip `upload_media` or construct the URL yourself.**
|
|
66
|
+
|
|
67
|
+
```
|
|
68
|
+
Step 1: upload_media({ source: "/absolute/path/to/video.mp4" })
|
|
69
|
+
→ Returns: { url, thumbnail_url, ... }
|
|
70
|
+
→ Save the "url" field — this is the CDN URL you will pass to Gemini
|
|
71
|
+
→ NEVER use thumbnail_url (it's a JPG preview, not the video)
|
|
72
|
+
|
|
73
|
+
Step 2: chat_send_message({
|
|
74
|
+
message: "Describe this video in detail. What is shown?",
|
|
75
|
+
media_urls: ["<url from step 1>"] ← must be an array, must be the "url" field
|
|
76
|
+
})
|
|
77
|
+
→ returns: { content: "..." }
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
**❌ Common mistakes that break video analysis:**
|
|
81
|
+
- Skipping `upload_media` and passing a local file path to `chat_send_message` — local paths don't work
|
|
82
|
+
- Using the transcription `.txt` URL as the `media_urls` value — Gemini needs the actual video CDN URL
|
|
83
|
+
- Using `thumbnail_url` instead of `url` from the `upload_media` response
|
|
84
|
+
- Calling `transcribe_audio` first then passing its output URL as the video — transcription gives text, not video
|
|
85
|
+
|
|
86
|
+
**Omit `model`** — Smart Select detects video/audio and auto-routes to Gemini.
|
|
87
|
+
**Sessions do NOT remember media between messages.** On retry: reuse the same CDN `url` from step 1 (no re-upload needed) but always pass `media_urls` again.
|
|
88
|
+
|
|
89
|
+
**Batch analysis (many videos)**: Pass `model: "gemini-3.1-flash-lite-preview"` explicitly for cheaper bulk runs.
|
|
90
|
+
|
|
91
|
+
For YouTube videos — download first with yt-dlp (see below), then follow steps 1–2 above.
|
|
92
|
+
|
|
93
|
+
---
|
|
94
|
+
|
|
27
95
|
## Pipeline
|
|
28
96
|
|
|
29
97
|
```
|
|
30
|
-
Input video
|
|
31
|
-
|
|
32
|
-
|
|
98
|
+
Input: local video / YouTube URL / uploaded file
|
|
99
|
+
|
|
100
|
+
→ [DEFAULT] Visual Analysis: upload_media → chat_send_message (Gemini)
|
|
101
|
+
→ [EXPLICIT REQUEST] Transcription: transcribe_audio → SRT / text
|
|
102
|
+
→ [EDITING] FFmpeg: cut, silence removal, 9:16 conversion
|
|
103
|
+
→ [MOTION GRAPHICS] Remotion: compositions, captions, B-roll
|
|
104
|
+
→ Output: Premiere XML / DaVinci EDL / MP4s / SRT
|
|
33
105
|
```
|
|
34
106
|
|
|
35
107
|
## APIs & Capabilities
|
|
36
108
|
|
|
37
109
|
| Service | Use |
|
|
38
110
|
|---------|-----|
|
|
39
|
-
|
|
|
40
|
-
|
|
|
41
|
-
|
|
|
42
|
-
|
|
|
43
|
-
| Runway | Image-to-video, video-to-video |
|
|
44
|
-
| FLUX / BFL | High quality still image generation |
|
|
45
|
-
| ElevenLabs | TTS, voice cloning, SFX |
|
|
46
|
-
| Suno | Background music generation |
|
|
111
|
+
| Kolbo MCP (`upload_media` + `chat_send_message`) | **Primary** — visual video/image analysis via Gemini |
|
|
112
|
+
| Kolbo MCP (`transcribe_audio`) | **Primary** — transcription, word-level SRT, multilingual |
|
|
113
|
+
| yt-dlp | Download YouTube/social media videos |
|
|
114
|
+
| FFmpeg | Local video editing, cutting, silence removal, format conversion |
|
|
47
115
|
| Remotion Lambda | Cloud render motion graphics |
|
|
116
|
+
| fal.ai (MCP) | Image & video B-roll generation |
|
|
117
|
+
| ElevenLabs | TTS, voice cloning, SFX (via Kolbo MCP `generate_speech`) |
|
|
118
|
+
| Suno | Background music (via Kolbo MCP `generate_music`) |
|
|
119
|
+
|
|
120
|
+
> Kolbo MCP tools need no API keys — auth is handled by `kolbo auth login`.
|
|
121
|
+
> FFmpeg/yt-dlp need to be installed locally on the machine.
|
|
48
122
|
|
|
49
|
-
|
|
123
|
+
## YouTube / Social Media Download (yt-dlp)
|
|
124
|
+
|
|
125
|
+
Download video from YouTube, TikTok, Instagram, Twitter, etc.:
|
|
126
|
+
|
|
127
|
+
```bash
|
|
128
|
+
# Best quality MP4
|
|
129
|
+
yt-dlp -f "bestvideo[height<=1080][ext=mp4]+bestaudio/best" \
|
|
130
|
+
--merge-output-format mp4 \
|
|
131
|
+
-o "%(id)s.%(ext)s" <url>
|
|
132
|
+
|
|
133
|
+
# With subtitles
|
|
134
|
+
yt-dlp -f "bestvideo[height<=1080][ext=mp4]+bestaudio/best" \
|
|
135
|
+
--write-auto-sub --sub-lang en --convert-subs srt \
|
|
136
|
+
--merge-output-format mp4 \
|
|
137
|
+
-o "%(id)s.%(ext)s" <url>
|
|
138
|
+
|
|
139
|
+
# Audio only (for transcription)
|
|
140
|
+
yt-dlp -f "bestaudio" --extract-audio --audio-format mp3 -o "%(id)s.%(ext)s" <url>
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
After download → upload to Kolbo CDN with `upload_media` → analyze visually with `chat_send_message`.
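When the download step is scripted rather than typed, the three command variants above can be assembled as argument lists. The sketch below is a hypothetical helper: the function name and its `mode`/`subtitles` parameters are assumptions, while the yt-dlp flags are copied from the commands shown above. It only builds the command; running it is left to `subprocess.run`.

```python
from typing import List


def ytdlp_command(url: str, mode: str = "video", subtitles: bool = False) -> List[str]:
    """Build a yt-dlp argument list matching the download recipes above."""
    output_template = "%(id)s.%(ext)s"
    if mode == "audio":
        # Audio only, e.g. as input for transcription.
        return ["yt-dlp", "-f", "bestaudio", "--extract-audio",
                "--audio-format", "mp3", "-o", output_template, url]
    cmd = ["yt-dlp", "-f", "bestvideo[height<=1080][ext=mp4]+bestaudio/best"]
    if subtitles:
        cmd += ["--write-auto-sub", "--sub-lang", "en", "--convert-subs", "srt"]
    cmd += ["--merge-output-format", "mp4", "-o", output_template, url]
    return cmd


# "<url>" is a placeholder, as in the commands above.
print(ytdlp_command("<url>"))
# To actually download (requires yt-dlp installed locally):
#   import subprocess
#   subprocess.run(ytdlp_command(real_url), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with the format selector, which contains brackets and a `+`.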
|
|
144
|
+
|
|
145
|
+
---
|
|
50
146
|
|
|
51
147
|
## Key Rules
|
|
52
148
|
|