@kolbo/kolbo-code-linux-arm64-musl 2.0.0 → 2.0.2

@@ -1,25 +1,78 @@
1
1
  ---
2
2
  name: kolbo
3
- description: Generate or analyze creative media through Kolbo AI. Load this skill whenever the user asks to create, edit, prompt, or analyze images, videos, music, speech, or sound effects — or to list available AI models / check credit balance. It contains the MCP tool workflow and the prompt-engineering rules for each media type.
3
+ description: Generate, edit, or analyze creative media through Kolbo AI. Load this skill whenever the user asks to create, edit, prompt, or analyze images, videos, music, speech, sound effects, 3D models — or to transcribe audio/video, manage media, use Visual DNA for consistency, check credits, or browse models/presets/moodboards. It contains the MCP tool workflow and the prompt-engineering rules for each media type.
4
4
  ---
5
5
 
6
- # Kolbo AI — Creative Generation & Analysis
6
+ # Kolbo AI — Creative Generation, Analysis & Transcription
7
7
 
8
8
  You have direct access to the Kolbo AI creative platform via MCP tools (auto-configured by `kolbo auth login`). Use them to generate and deliver real content — do NOT just describe what you would create.
9
9
 
10
10
  ## Available MCP Tools
11
11
 
12
+ ### Generation
13
+
14
+ | Tool | Description |
15
+ |------|-------------|
16
+ | `generate_image` | Create images from text prompts. Supports Visual DNA, moodboards, reference images, batch generation, web-search grounding. |
17
+ | `generate_image_edit` | Edit/transform an existing image (background removal, color changes, compositing). Pass source images + edit prompt. |
18
+ | `generate_creative_director` | Generate a coordinated multi-scene set (1–8 scenes) from one creative brief. Ideal for storyboards, ad campaigns, product showcases. Supports image and video modes. |
19
+ | `generate_video` | Create videos from text prompts. Supports Visual DNA and reference images for consistency. |
20
+ | `generate_video_from_image` | Animate a still image into video. Prompt describes the motion, not the subject. |
21
+ | `generate_video_from_video` | Restyle/transform an existing video (style transfer, scene restyling, subject swap). Keeps the original motion. |
22
+ | `generate_elements` | Generate video from reference assets (images/videos) + prompt. Use when animating specific uploaded assets. |
23
+ | `generate_first_last_frame` | Generate video that morphs from a first frame to a last frame (keyframe interpolation). |
24
+ | `generate_lipsync` | Lipsync an audio track to a source image or video face. Accepts local files or URLs. |
25
+ | `generate_music` | Create music from descriptions. Supports instrumental, custom lyrics, style, vocal gender. |
26
+ | `generate_speech` | Convert text to speech (TTS). Default: ElevenLabs. Use `list_voices` to pick a voice. |
27
+ | `generate_sound` | Generate sound effects from descriptions (foley, ambient, impacts, UI sounds). |
28
+ | `generate_3d` | Generate 3D models from text, single image, or multi-view images. Returns GLB, FBX, OBJ, USDZ. |
29
+
30
+ ### Transcription & Analysis
31
+
32
+ | Tool | Description |
33
+ |------|-------------|
34
+ | `transcribe_audio` | Transcribe audio or video into text + SRT subtitles + word-by-word SRT. Accepts local files or URLs. |
35
+
36
+ ### Voice & Model Discovery
37
+
12
38
  | Tool | Description |
13
39
  |------|-------------|
14
- | `generate_image` | Create images from text prompts. Returns image URL(s). |
15
- | `generate_video` | Create videos from text. Returns video URL. |
16
- | `generate_video_from_image` | Animate a still image into video. Returns video URL. |
17
- | `generate_music` | Create music from descriptions. Returns audio URL. |
18
- | `generate_speech` | Convert text to speech. Returns audio URL. |
19
- | `generate_sound` | Generate sound effects. Returns audio URL. |
20
40
  | `list_models` | Browse available AI models filtered by type. |
41
+ | `list_voices` | List available TTS voices with filtering by provider, language, gender. |
21
42
  | `check_credits` | Check remaining Kolbo credit balance. |
22
- | `get_generation_status` | Poll status of an in-progress generation by ID. |
43
+ | `get_generation_status` | Poll status of an in-progress generation by ID (fallback for timeouts). |
44
+
45
+ ### Media Library
46
+
47
+ | Tool | Description |
48
+ |------|-------------|
49
+ | `upload_media` | Upload a local file or URL to the user's Kolbo media library (CDN). Use for multi-tool workflows. |
50
+ | `list_media` | Browse user's uploaded media with filtering by type and search. |
51
+
52
+ ### Visual DNA (Character/Style Consistency)
53
+
54
+ | Tool | Description |
55
+ |------|-------------|
56
+ | `create_visual_dna` | Create a Visual DNA profile from reference images/video/audio for character, style, product, or scene consistency. |
57
+ | `list_visual_dnas` | List your Visual DNA profiles (id, name, type, thumbnail). |
58
+ | `get_visual_dna` | Fetch full profile details including system_prompt and reference images. |
59
+ | `delete_visual_dna` | Delete a Visual DNA profile. |
60
+
61
+ ### Moodboards & Presets
62
+
63
+ | Tool | Description |
64
+ |------|-------------|
65
+ | `list_moodboards` | List available moodboards (personal, system presets, org). |
66
+ | `get_moodboard` | Fetch a moodboard's master_prompt, style_guide, and images. |
67
+ | `list_presets` | Browse generation presets (image/video/music templates with bundled style direction). |
68
+
69
+ ### Chat
70
+
71
+ | Tool | Description |
72
+ |------|-------------|
73
+ | `chat_send_message` | Send a message to Kolbo AI chat. Supports web search and deep think modes. |
74
+ | `chat_list_conversations` | List your SDK chat conversations. |
75
+ | `chat_get_messages` | Fetch messages in a conversation (with media URLs). |
23
76
 
24
77
  ## Core Workflow
25
78
 
@@ -34,21 +87,96 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con
34
87
  | Type | Use for |
35
88
  |------|---------|
36
89
  | `image` | Still-image generation |
90
+ | `image_edit` | Image editing / transformation |
37
91
  | `video` | Text-to-video |
38
92
  | `video_from_image` | Image-to-video animation |
93
+ | `lipsync` | Audio-to-face lipsync |
39
94
  | `music` | Music generation |
40
95
  | `speech` | Text-to-speech |
41
96
  | `sound` | Sound effects |
97
+ | `three_d` | 3D model generation |
42
98
 
43
99
  ### Cost Awareness
44
100
 
45
101
  Creative generations bill against the user's Kolbo credit balance. Order of expense (rough):
46
- - **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s)
47
- - **Medium**: music (~30s-2min)
48
- - **Expensive**: video (~1-5min, highest credit cost)
102
+ - **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s), transcription (by duration)
103
+ - **Medium**: music (~30s-2min), 3D (~1-3min)
104
+ - **Expensive**: video (~1-5min, highest credit cost), lipsync (~1-3min)
49
105
 
50
106
  Rule of thumb: confirm intent before firing off a video generation unless the user was explicit. For images, just generate.
51
107
 
108
+ ### Rate Limiting
109
+ Kolbo enforces **10 generation requests per minute per user per tool type** (e.g. 10 image calls + 10 video calls = fine, but 11 image calls in 1 minute = rate limited). General media requests are capped at **300 per minute**.
110
+
111
+ When making multiple generation calls:
112
+ - **Stagger calls** — do NOT fire all in parallel. Space them ~5-10 seconds apart.
113
+ - **Batch images**: use `generate_creative_director` instead of calling `generate_image` 5+ times — it handles multi-scene in one request.
114
+ - If you get a rate limit error (429), wait 60 seconds (the window resets per minute) and retry. Do not retry more than 2 times.
115
+
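The staggering and retry rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not Kolbo's client code: `RateLimitError`, `call_with_retry`, and `run_staggered` are hypothetical names, and each `call` stands in for whatever function actually invokes an MCP tool and surfaces a 429.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for however the caller surfaces an HTTP 429."""


def call_with_retry(
    call: Callable[[], T],
    max_retries: int = 2,          # "Do not retry more than 2 times"
    wait_seconds: float = 60.0,    # the per-minute window resets after 60s
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, waiting and retrying on rate-limit errors."""
    retries = 0
    while True:
        try:
            return call()
        except RateLimitError:
            if retries >= max_retries:
                raise  # give up after the allowed number of retries
            retries += 1
            sleep(wait_seconds)


def run_staggered(
    calls: List[Callable[[], T]],
    spacing_seconds: float = 5.0,  # lower end of the ~5-10s spacing above
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run calls one after another with a pause between them (no parallel burst)."""
    results = []
    for index, call in enumerate(calls):
        if index > 0:
            sleep(spacing_seconds)
        results.append(call_with_retry(call, sleep=sleep))
    return results
```

Passing `sleep` as a parameter keeps the helper easy to exercise without actually waiting a minute.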
116
+ ---
117
+
118
+ ## Transcription & Audio/Video Analysis
119
+
120
+ Use `transcribe_audio` whenever the user provides an audio or video file and wants:
121
+ - A text transcript
122
+ - Subtitles (SRT format)
123
+ - Word-by-word timed subtitles (for karaoke, motion graphics, Remotion captions, video editing)
124
+ - Content analysis or summary of spoken content
125
+ - Dialogue extraction from video
126
+
127
+ ### Workflow
128
+ 1. Call `transcribe_audio` with the `source` (URL or absolute local file path)
129
+ 2. The tool returns:
130
+ - `text` — full transcript as plain text
131
+ - `srt_url` — download URL for grouped SRT subtitles (configurable words-per-line)
132
+ - `word_by_word_srt_url` — download URL for **word-by-word SRT** (one word per subtitle entry with precise timestamps from ElevenLabs Scribe v2)
133
+ - `txt_url` — download URL for plain text file
134
+ - `duration` — audio duration in seconds
135
+ 3. Analyze the transcript text as needed (summarize, translate, extract topics, answer questions about content)
136
+
137
+ ### Supported Formats
138
+ - **Audio**: mp3, wav, m4a, flac, aac
139
+ - **Video** (extracts audio track): mp4, mov, webm, mkv, avi, m4v
140
+
141
+ ### Word-by-Word Transcription
142
+ The `word_by_word_srt_url` contains an SRT file where each subtitle entry is a **single word** with precise start/end timestamps (powered by ElevenLabs Scribe v2). This is ideal for:
143
+ - **Karaoke-style captions** — highlight one word at a time
144
+ - **Remotion/motion graphics** — animate text word-by-word synced to audio
145
+ - **Video editing** — precise cut points aligned to speech
146
+ - **Accessibility** — word-level navigation for hearing-impaired users
147
+
148
+ The regular `srt_url` groups words into readable subtitle lines (default 12 words per line, up to 2 lines per subtitle).
149
+
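To consume the word-by-word file downstream (for captions or cut points), the SRT text can be parsed into timed cues. The sketch below assumes the downloaded file follows the usual SubRip layout (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text, blank line); the names `Cue` and `parse_srt` are illustrative only.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str     # a single word in the word-by-word file


_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,345 to seconds."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text into cues, skipping blocks without a timing line."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line for line in block.split("\n") if line.strip()]
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue
        start_raw, end_raw = lines[timing].split("-->")
        text = " ".join(lines[timing + 1:]).strip()
        # split()[0] drops any trailing position hints after the end time
        cues.append(Cue(_to_seconds(start_raw), _to_seconds(end_raw.split()[0]), text))
    return cues
```

The same parser works on the grouped `srt_url` file; there each cue's `text` holds a full subtitle line rather than one word.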
150
+ ### Use Cases & Examples
151
+ - "Transcribe this podcast" → `transcribe_audio` with the audio URL
152
+ - "What's being said in this video?" → `transcribe_audio` → analyze the returned text
153
+ - "Generate subtitles for my video" → `transcribe_audio` → share the `srt_url`
154
+ - "I need word-by-word timing for this audio" → `transcribe_audio` → share `word_by_word_srt_url`
155
+ - "Summarize this meeting recording" → `transcribe_audio` → summarize the text
156
+ - "Extract key points from this lecture" → `transcribe_audio` → analyze and extract
157
+
158
+ ### Long Content
159
+ Transcription supports files up to 30 minutes. For longer content, split the file first or provide segments.
160
+
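When a recording exceeds the 30-minute limit, the segment boundaries can be computed before splitting. This is a rough sketch: `segment_bounds` is a hypothetical helper, it only computes time ranges (the actual cutting would be done with a separate tool), and fixed-length cuts may land mid-word, so nudging boundaries to pauses is advisable.

```python
from typing import List, Tuple

MAX_SEGMENT_SECONDS = 30 * 60  # the 30-minute limit stated above


def segment_bounds(
    duration_seconds: float,
    max_segment_seconds: float = MAX_SEGMENT_SECONDS,
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs, in seconds, that cover the whole recording."""
    if max_segment_seconds <= 0:
        raise ValueError("max_segment_seconds must be positive")
    bounds = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + max_segment_seconds, duration_seconds)
        bounds.append((start, end))
        start = end
    return bounds
```

Each resulting range can then be transcribed separately, with the segment's start time added back to the returned timestamps.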
161
+ ### Visual Video/Audio Analysis (what's happening, not just what's said)
162
+ `transcribe_audio` only extracts **speech**. If the user wants to understand **what's visually happening** in a video (scenes, actions, objects, on-screen text) or needs a multimodal AI to reason about the content, use `chat_send_message` with a video-capable model instead.
163
+
164
+ **Video-capable models**: `gemini-2.5-pro`, `gemini-2.5-flash` — these can watch video and analyze visual content.
165
+
166
+ **Workflow for visual analysis:**
167
+ 1. Upload the video with `upload_media` to get a stable CDN URL
168
+ 2. Call `chat_send_message` with the video URL in the message and a video-capable model (e.g. `gemini-2.5-pro`)
169
+ 3. Ask your analysis question: "Describe what happens in this video", "What products are shown?", "Summarize the key scenes"
170
+
171
+ **When to use which:**
172
+
173
+ | User intent | Tool |
174
+ |-------------|------|
175
+ | "Transcribe this" / "What's being said?" | `transcribe_audio` |
176
+ | "Generate subtitles" / "Word-by-word timing" | `transcribe_audio` |
177
+ | "What's happening in this video?" / "Describe the scenes" | `chat_send_message` + Gemini |
178
+ | "Analyze this video and transcribe it" | Both — `transcribe_audio` for text + `chat_send_message` for visual |
179
+
52
180
  ---
53
181
 
54
182
  ## Image Prompts
@@ -61,14 +189,36 @@ Rule of thumb: confirm intent before firing off a video generation unless the us
61
189
  - **`enhance_prompt: true`** (default) will improve most prompts automatically. Turn it off only if the user's prompt is already fully engineered or they want literal wording.
62
190
 
63
191
  ### Image Editing (image-to-image)
64
- When the model can see the uploaded image, describe the **change**, not the unchanged parts.
192
+
193
+ Use `generate_image_edit` when the user wants to modify an existing image. Pass the source image URL(s) in `source_images` and describe the change in `prompt`.
194
+
65
195
  - Good: "Turn the sky orange and add drifting clouds"
66
196
  - Bad: "A mountain landscape with an orange sky and drifting clouds" (re-describes what's already in the image)
67
197
 
68
198
  Simple edits deserve simple prompts. Only elaborate for genuinely complex, multi-step transformations.
69
199
 
70
200
  ### Multi-Scene / Campaigns
71
- For storyboards, campaigns, or character-consistent sequences, call `generate_image` once per scene with the same base style cues carried across prompts. Kolbo's web app has a dedicated Creative Director feature for this; in the CLI the workflow is sequential `generate_image` calls.
201
+ For storyboards, campaigns, or character-consistent sequences, use `generate_creative_director`, which generates 1–8 coordinated scenes from a single creative brief with consistent style. Pass `visual_dna_ids` and/or `moodboard_id` for character/style consistency across all scenes.
202
+
203
+ In the CLI, you can also do sequential `generate_image` calls with the same Visual DNA profiles.
204
+
205
+ ---
206
+
207
+ ## Visual DNA (Character/Style Consistency)
208
+
209
+ Visual DNA profiles capture the visual "identity" of a character, style, product, or scene from reference media.
210
+
211
+ ### Workflow
212
+ 1. **Create** a profile with `create_visual_dna` — provide reference images (max 4), optionally video and audio
213
+ 2. **Types**: `character` (default), `style`, `product`, `scene`
214
+ 3. **Use** the profile by passing its `id` in `visual_dna_ids` when calling any generation tool
215
+ 4. **List/inspect** profiles with `list_visual_dnas` / `get_visual_dna`
216
+
217
+ ### When to Use
218
+ - User wants the same character across multiple images/videos
219
+ - User wants a consistent brand style across a campaign
220
+ - User references "keep the same look" or "same character"
221
+ - User provides reference photos of a person/product to maintain consistency
72
222
 
73
223
  ---
74
224
 
@@ -89,6 +239,20 @@ The model can see the starting frame. Describe **what happens**, not what the im
89
239
  - Good: "Slow dolly-in on the subject. Her hair drifts in a light breeze. Soft particles float through the air. [6s]"
90
240
  - Bad: "A woman with long brown hair standing in a forest, wearing a red dress, with golden sunlight..." (re-describes the image)
91
241
 
242
+ ### Video-to-Video (Restyle)
243
+ Use `generate_video_from_video` to restyle an existing video. Describe the **new style**, not the original content — the model preserves the original motion.
244
+ - Good: "Transform into anime style with cel-shading and vibrant colors"
245
+ - Bad: "A person walking down a street" (re-describes what's already in the video)
246
+
247
+ ### Elements (Reference Assets → Video)
248
+ Use `generate_elements` when the user has specific assets (product photos, character references) they want animated into a video. Pass them as `reference_images` (URLs) or `files` (local paths).
249
+
250
+ ### First/Last Frame (Keyframe Interpolation)
251
+ Use `generate_first_last_frame` when the user provides two keyframes and wants the model to create a smooth transition between them.
252
+
253
+ ### Lipsync
254
+ Use `generate_lipsync` to sync audio to a face in an image or video. Both `source` (face) and `audio` accept URLs or local file paths.
255
+
92
256
  ### Camera Vocabulary
93
257
 
94
258
  Pick what fits the mood. Every shot gets at least one.
@@ -150,6 +314,17 @@ Format: `extreme slow-motion [Xs] — [micro-movements in ultra slow-mo] — sna
150
314
 
151
315
  ---
152
316
 
317
+ ## 3D Generation
318
+
319
+ Use `generate_3d` for creating 3D models. Three modes:
320
+ - **Text mode**: prompt-only (e.g., "a medieval sword with ornate handle")
321
+ - **Single image mode**: one reference image + optional prompt
322
+ - **Multi-view mode**: 2+ reference images for higher-quality reconstruction
323
+
324
+ Returns downloadable model files in GLB, FBX, OBJ, and USDZ formats. Use `list_models` with `type: "three_d"` to discover available models.
325
+
326
+ ---
327
+
153
328
  ## Music Prompts
154
329
 
155
330
  Describe **genre → mood → instrumentation → tempo → era**, in that order.
@@ -164,10 +339,10 @@ Describe **genre → mood → instrumentation → tempo → era**, in that order
164
339
 
165
340
  ## Speech (TTS)
166
341
 
167
- - Call `list_models` with `type: speech` to get voice identifiers. Pass the `identifier` as `model` for a consistent voice.
168
- - The voice **is** the model for speech there is no separate voice parameter.
342
+ - Call `list_voices` to find available voices. Filter by `provider`, `language`, or `gender`.
343
+ - Pass the returned `voice_id` (or the voice's display name like "Rachel") as the `voice` parameter in `generate_speech`.
344
+ - For multilingual content, pick a voice that supports the target language.
169
345
  - For long text, split at natural sentence boundaries. Each generation has a character cap; chunk long-form content into multiple calls.
170
- - For multilingual content, pick a voice that supports the target language from `list_models`.
171
346
 
172
347
  ---
173
348
 
@@ -179,6 +354,35 @@ Describe **genre → mood → instrumentation → tempo → era**, in that order
179
354
 
180
355
  ---
181
356
 
357
+ ## Moodboards & Presets
358
+
359
+ **Moodboards** provide style direction (master prompt + style guide + reference images). Pass a `moodboard_id` to any generation tool to apply its style.
360
+ - `list_moodboards` to browse available options
361
+ - `get_moodboard` to see full details before applying
362
+
363
+ **Presets** bundle prompt templates + style direction for specific creative looks. Pass a `preset_id` to generation tools.
364
+ - `list_presets` with optional `type` filter ("image", "video", "music", "text_to_video")
365
+
366
+ ---
367
+
368
+ ## Media Library
369
+
370
+ Use `upload_media` to upload local files or URLs to the Kolbo CDN for stable hosting. Useful when:
371
+ - A local file needs to be referenced in multiple generation calls
372
+ - You want a permanent CDN URL instead of an ephemeral local path
373
+
374
+ Use `list_media` to browse previously uploaded content (filter by type, search by name).
375
+
376
+ ---
377
+
378
+ ## Chat
379
+
380
+ Use `chat_send_message` to interact with Kolbo AI models (GPT-4o, Claude, etc.) with optional web search and deep think modes. Conversations persist via `session_id` — omit to start new, pass to continue.
381
+
382
+ Use `chat_list_conversations` and `chat_get_messages` to browse conversation history.
383
+
384
+ ---
385
+
182
386
  ## Image Analysis (when the user uploads images)
183
387
 
184
388
  When the user shares an image and asks about it:
@@ -188,7 +392,7 @@ When the user shares an image and asks about it:
188
392
  - **Extract text verbatim** when asked (OCR-style requests are fine).
189
393
  - **Cannot identify real people.** Describe hair, clothing, pose, expression, and apparent role — but never name a specific individual, even a well-known public figure. If the user insists, decline and offer to describe instead.
190
394
  - **Copyrighted content**: summarize and reference, don't reproduce verbatim large chunks.
191
- - If the user wants an **edit** based on the analysis, hand off to `generate_video_from_image` (motion) or `generate_image` with an image-to-image model (visual edit) — see the Image Editing section above for prompt structure.
395
+ - If the user wants an **edit** based on the analysis, hand off to `generate_image_edit` (visual edit) or `generate_video_from_image` (motion).
192
396
 
193
397
  ---
194
398
 
@@ -217,16 +421,56 @@ Full public documentation for Kolbo Code (the CLI you are running inside) lives
217
421
 
218
422
  The MDX sources are in the `kolbo-docs` repo under `content/docs/kolbo-code/`. When the user's question has a concrete answer in one of those pages, cite the path and summarize — do not invent new instructions.
219
423
 
424
+ ## Troubleshooting
425
+
426
+ ### "API key is invalid or expired"
427
+ This usually means the CLI is sending a key to the wrong API endpoint.
428
+
429
+ **Common cause — whitelabel overlap:** if the user previously used regular `kolbo` and then switched to a whitelabel/partner CLI (e.g. `sapir`), the old API key may still be cached against the main Kolbo API. Running `kolbo` instead of the branded command (`sapir`) overwrites the MCP config with the wrong endpoint.
430
+
431
+ **Fix:** tell the user to re-authenticate with their branded CLI command:
432
+ ```
433
+ sapir auth login
434
+ ```
435
+ (Replace `sapir` with their actual CLI command.)
436
+
437
+ Then **restart the editor/session** so the MCP picks up the new key and endpoint.
438
+
439
+ **Important:** whitelabel users must always use their branded CLI command (e.g. `sapir`), not `kolbo`, to keep the MCP pointed at the correct API.
440
+
441
+ ### MCP tools not responding or not found
442
+ If Kolbo tools time out or aren't listed, the MCP server may not be wired up. Tell the user to run:
443
+ ```
444
+ <their-cli-command> auth login
445
+ ```
446
+ This re-wires the MCP configuration automatically. Then restart the session.
447
+
448
+ ### "Rate limited" (429 errors)
449
+ Kolbo allows 10 generation requests per minute per tool type. Wait 60 seconds and retry. Use `generate_creative_director` for batch image work instead of multiple `generate_image` calls.
450
+
451
+ ---
452
+
220
453
  ## Examples
221
454
 
222
455
  Natural-language triggers that should prompt this skill + a tool call:
223
456
 
224
457
  - "Generate an image of a neon-lit Tokyo street at night" → `list_models` (image) → `generate_image`
458
+ - "Remove the background from this image" → `list_models` (image_edit) → `generate_image_edit`
459
+ - "Create a storyboard for a coffee brand ad" → `list_models` (image) → `generate_creative_director`
225
460
  - "Create a 5-second cinematic video of ocean waves at sunset" → `list_models` (video) → `generate_video` with camera + mood guidance
226
461
  - "Animate this product photo with a 360° orbit" → `list_models` (video_from_image) → `generate_video_from_image`
462
+ - "Restyle this video as anime" → `generate_video_from_video`
463
+ - "Make this character talk with this voiceover" → `generate_lipsync`
464
+ - "Create a smooth transition between these two frames" → `generate_first_last_frame`
227
465
  - "Make a lo-fi hip hop beat, instrumental, 85 BPM" → `list_models` (music) → `generate_music`
228
- - "Say this in English with a natural female voice: Welcome to Kolbo" → `list_models` (speech) → `generate_speech`
466
+ - "Say this in English with a natural female voice: Welcome to Kolbo" → `list_voices` → `generate_speech`
229
467
  - "Generate a door slam sound effect" → `list_models` (sound) → `generate_sound`
468
+ - "Create a 3D model of a medieval castle" → `list_models` (three_d) → `generate_3d`
469
+ - "Transcribe this podcast episode" → `transcribe_audio`
470
+ - "What's being said in this video?" → `transcribe_audio` → analyze the text
471
+ - "Generate word-by-word subtitles for this audio" → `transcribe_audio` → share `word_by_word_srt_url`
472
+ - "Keep the same character across all these images" → `create_visual_dna` → `generate_image` with `visual_dna_ids`
473
+ - "Upload this file to my media library" → `upload_media`
230
474
  - "What video models are available?" → `list_models` (video)
231
475
  - "How many credits do I have?" → `check_credits`
232
476
  - "What's in this image?" (with upload) → describe per the Image Analysis section; no tool call needed unless the user asks to generate or edit
@@ -0,0 +1,146 @@
1
+ ---
2
+ name: music-prompting
3
+ description: >
4
+ Music generation prompting guide: BPM selection by video type, key/mood mapping, prompt
5
+ structure for background music, duration matching, looping strategies, section-mapped scoring.
6
+ Use when generating background music for video or crafting music generation prompts.
7
+ Keywords: music, BPM, tempo, key, mood, instrumental, background music, suno, elevenlabs,
8
+ music generation, prompt, genre, looping, score, soundtrack
9
+ ---
10
+
11
+ # Music Generation — Prompting Guide
12
+
13
+ ## Quick Reference
14
+
15
+ ```
16
+ INSTRUMENTAL: Always set instrumental=true for video background
17
+ PROMPT ORDER: genre/style → BPM → key/mood → instruments → energy → purpose
18
+ KEY RULE: Music must be 18-20 dB below narration (see sound-design skill)
19
+ ALWAYS INCLUDE: "background" or "underscore" in every prompt
20
+ ```
21
+
22
+ ## BPM Selection by Video Type
23
+
24
+ | Video Type | BPM Range | Prompt Fragment |
25
+ |-----------|-----------|-----------------|
26
+ | Educational explainer | 80-100 | "gentle ambient electronic, 90 BPM" |
27
+ | Corporate / tech | 100-120 | "upbeat corporate pop, 110 BPM, positive" |
28
+ | Epic / dramatic reveal | 60-80 | "cinematic orchestral, 70 BPM, building tension" |
29
+ | Fast-paced montage | 120-140 | "energetic electronic, 130 BPM, driving beat" |
30
+ | Meditation / calm | 50-70 | "ambient drone, 60 BPM, peaceful" |
31
+ | Comedy / lighthearted | 100-130 | "playful ukulele pop, 120 BPM, whimsical" |
32
+ | Sad / reflective | 60-80 | "melancholic piano, 65 BPM, minor key" |
33
+ | Action / hype | 140-170 | "high-intensity drum and bass, 160 BPM" |
34
+
35
+ ## Key and Mood Mapping
36
+
37
+ | Mood | Key | Musical Characteristics |
38
+ |------|-----|----------------------|
39
+ | Happy / upbeat | C major, G major | Bright, resolved, energetic |
40
+ | Serious / professional | D minor, A minor | Grounded, authoritative |
41
+ | Mysterious / curious | E minor, B minor | Tension, anticipation |
42
+ | Triumphant / inspiring | D major, Bb major | Expansive, climactic |
43
+ | Melancholic / thoughtful | F minor, C minor | Reflective, emotional |
44
+ | Neutral / ambient | C major, Am | Unobtrusive, background |
45
+
46
+ ## Prompt Structure
47
+
48
+ ```
49
+ [GENRE/STYLE], [BPM], [KEY/MOOD], [INSTRUMENTS], [ENERGY LEVEL], [PURPOSE]
50
+ ```
51
+
52
+ ### Examples
53
+
54
+ **Educational explainer:**
55
+ ```
56
+ Gentle lo-fi ambient electronic, 90 BPM, C major, soft synth pads and light
57
+ percussion, calm and steady energy, background music for narration
58
+ ```
59
+
60
+ **Corporate product demo:**
61
+ ```
62
+ Modern upbeat corporate pop, 110 BPM, G major, acoustic guitar and light drums,
63
+ positive energy building gradually, underscore for product walkthrough
64
+ ```
65
+
66
+ **Technical deep-dive:**
67
+ ```
68
+ Minimal ambient electronic, 80 BPM, A minor, soft Rhodes piano and subtle
69
+ bass, contemplative and focused, background music for technical explanation
70
+ ```
71
+
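The prompt structure above can be assembled programmatically. The following is a small sketch, assuming a hypothetical helper named `build_music_prompt`; it only joins the six fields in the documented order and makes sure "background" or "underscore" appears in the purpose.

```python
def build_music_prompt(
    genre: str,
    bpm: int,
    key: str,
    instruments: str,
    energy: str,
    purpose: str,
) -> str:
    """Join the fields as: genre/style, BPM, key/mood, instruments, energy, purpose."""
    # Every prompt should mention "background" or "underscore".
    if not any(word in purpose.lower() for word in ("background", "underscore")):
        purpose = f"background music for {purpose}"
    parts = [genre, f"{bpm} BPM", key, instruments, energy, purpose]
    return ", ".join(part.strip() for part in parts)


prompt = build_music_prompt(
    genre="Minimal ambient electronic",
    bpm=80,
    key="A minor",
    instruments="soft Rhodes piano and subtle bass",
    energy="contemplative and focused",
    purpose="technical explanation",
)
```

With these inputs the helper reproduces the "Technical deep-dive" example as a single line.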
72
+ ## Prompting Rules
73
+
74
+ 1. **Always include "background" or "underscore"**: this tells the model to keep the dynamics even
75
+ 2. **Always use instrumental mode** — lyrics compete with narration
76
+ 3. **Specify BPM explicitly** — don't rely on genre to set tempo
77
+ 4. **Avoid "bright hi-hats" or "prominent vocals"** — high-frequency busy elements compete with speech in the 2-4 kHz intelligibility band
78
+ 5. **Include energy direction** — "steady energy" for explainers, "building gradually" for reveals
79
+
80
+ ## Duration Matching
81
+
82
+ - Generate at the exact video duration when possible
83
+ - For longer videos, generate a track 30-60% of video length and loop with crossfade
84
+ - **Section-mapped scoring** for videos with distinct acts:
85
+
86
+ | Video Section | Duration | Music Style |
87
+ |--------------|----------|-------------|
88
+ | Intro / hook | 8-10s | Soft, building |
89
+ | Main explanation | 90-120s | Steady, neutral |
90
+ | Key reveal | 20-30s | Intensified, fuller |
91
+ | Outro | 10-15s | Fading, gentle |
92
+
93
+ Generate each as a separate track and crossfade between them.
94
+
95
+ ## Looping
96
+
97
+ ```bash
98
+ # Loop a track 3x
99
+ ffmpeg -stream_loop 2 -i music.mp3 -c copy music_looped.mp3
100
+
101
+ # Fade out/in at the loop points (2s fades; st=28 assumes a 30s track)
102
+ ffmpeg -i music.mp3 -af "afade=t=out:st=28:d=2" part1.mp3
103
+ ffmpeg -i music.mp3 -af "afade=t=in:d=2" part2.mp3
104
+ # Then concatenate part1.mp3 and part2.mp3 (e.g. with ffmpeg's concat demuxer)
105
+ ```
106
+
107
+ Better approach: generate at the exact video duration to avoid loop artifacts.
108
+
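When looping is unavoidable, the value for `-stream_loop` follows from the two durations: the flag counts extra repetitions, which is why `-stream_loop 2` above plays the track three times. A minimal sketch, with `stream_loop_count` as a hypothetical helper name:

```python
import math


def stream_loop_count(video_seconds: float, track_seconds: float) -> int:
    """Return the -stream_loop value needed for the track to cover the video."""
    if track_seconds <= 0:
        raise ValueError("track_seconds must be positive")
    plays = max(1, math.ceil(video_seconds / track_seconds))
    return plays - 1  # the first play is not counted as a loop
```

The looped file will usually run past the end of the video, so it still needs trimming or a fade-out at the final cut.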
109
+ ## Integration with Video
110
+
111
+ - Duck music 18-20 dB below narration during speech
112
+ - Cut 2-4 kHz on the music bed to clear speech intelligibility band
113
+ - Test on phone speakers — if narration disappears behind music, duck more
114
+ - One track per video: avoid switching styles mid-video unless there is a clear narrative shift
115
+ - Music should start at video start and fade out 2-3 seconds before end
116
+
117
+ ---
118
+
119
+ ## Kolbo MCP Integration
120
+
121
+ | Task | Kolbo MCP Tool | Notes |
122
+ |------|---------------|-------|
123
+ | Generate music | `generate_music` | Use prompt structure above |
124
+ | Instrumental | `generate_music` instrumental=true | Always for video background |
125
+ | With lyrics | `generate_music` lyrics="..." | Pass actual lyric text |
126
+ | Discover models | `list_models` type="music" | Check available music models |
127
+ | Sound effects | `generate_sound` | For whooshes, impacts, ambience |
128
+
129
+ **Workflow:**
130
+ 1. `list_models` type="music" → pick model or auto-select
131
+ 2. Write prompt using the BPM/key/mood tables above
132
+ 3. `generate_music` with `instrumental: true`, style tags, duration
133
+ 4. Download the result and mix with narration using FFmpeg (see `sound-design` skill)
134
+
135
+ ---
136
+
137
+ ## Local / Free Options
138
+
139
+ > **IMPORTANT:** Always use Kolbo MCP `generate_music` by default. Only mention these if the user explicitly asks for free alternatives. Confirm before installing anything.
140
+
141
+ **Free music libraries (no install, browser-based):**
142
+ - Pixabay Music — free, no attribution required
143
+ - Free Music Archive — CC-licensed
144
+ - Incompetech (Kevin MacLeod) — CC-BY, huge catalog
145
+
146
+ **Local generation:** If the user has a GPU (8GB+) and explicitly asks, `MusicGen` by Meta (`pip install audiocraft`) can generate music locally. Confirm before installing.
@@ -0,0 +1,152 @@
1
+ ---
2
+ name: production-review
3
+ description: >
4
+ Self-review quality gates for video production: post-render verification protocol, pre-delivery
5
+ checklist, audio verification, visual inspection, severity classification (critical/suggestion/nitpick),
6
+ review workflow. Use after completing any production stage to verify quality before delivery.
7
+ Keywords: review, quality, verification, checklist, render, audio check, video check, delivery,
8
+ QA, quality gate, self-review, post-render
9
+ ---
10
+
11
+ # Production Review — Quality Gates
12
+
13
+ ## When to Use
14
+
15
+ After completing any major production stage — especially after rendering, before delivering to the user. Read this skill and run through the relevant checklist.
16
+
17
+ ## Severity Levels
18
+
19
+ | Severity | Definition | Action |
20
+ |----------|-----------|--------|
21
+ | **CRITICAL** | Breaks the output, incomplete, or dangerously wrong | Must fix. Blocks delivery. |
22
+ | **SUGGESTION** | Improves quality significantly but doesn't block | Note it, fix if time allows |
23
+ | **NITPICK** | Nice-to-have polish | Log it, move on |
24
+
25
+ ## Decision Flow
26
+
27
+ 1. Run the relevant checklist below
28
+ 2. Count critical findings
29
+ 3. **0 critical** → PASS (note suggestions)
30
+ 4. **1+ critical** → REVISE (max 2 revision rounds)
31
+ 5. Still critical after 2 rounds → PASS_WITH_WARNINGS (inform user of known issues)
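
The decision flow above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: `review_verdict` and the `rounds_done` counter are hypothetical names, and the caller is assumed to track how many revision rounds have already run.

```python
MAX_REVISION_ROUNDS = 2  # limit stated in the decision flow


def review_verdict(critical_count: int, rounds_done: int) -> str:
    """Map one review pass to PASS, REVISE, or PASS_WITH_WARNINGS.

    critical_count: number of CRITICAL findings in this pass.
    rounds_done: revision rounds already completed (hypothetical counter).
    """
    if critical_count < 0 or rounds_done < 0:
        raise ValueError("counts must be non-negative")
    if critical_count == 0:
        return "PASS"  # suggestions are noted separately
    if rounds_done < MAX_REVISION_ROUNDS:
        return "REVISE"
    # Revision budget is spent: deliver, but report the known issues.
    return "PASS_WITH_WARNINGS"
```

For example, `review_verdict(1, 2)` returns `"PASS_WITH_WARNINGS"`, while `review_verdict(1, 1)` still returns `"REVISE"`.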
32
+
33
+ ---
34
+
35
+ ## Post-Render Verification (Video)
36
+
37
+ ### Step 1: Probe the Output (GATE — blocks all other steps)
38
+ ```bash
39
+ ffprobe -v quiet -print_format json -show_format -show_streams rendered_video.mp4
40
+ ```
41
+
42
+ Verify ALL of:
43
+ - [ ] Video stream exists with correct resolution and FPS
44
+ - [ ] **Audio stream exists** — if missing, STOP. Fix audio config, re-render
45
+ - [ ] Duration within +/-5% of target
46
+ - [ ] File size is reasonable (not 0 bytes, not suspiciously small)
47
+
48
+ **If audio stream is missing, do NOT proceed.** Most common cause: audio sources mixed externally but never embedded in the composition.
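
The Step 1 gate could be automated roughly as follows. This is a sketch under assumptions: `run_ffprobe` and `check_probe` are hypothetical helper names, the `min_bytes` threshold for "suspiciously small" is an arbitrary placeholder, and the code relies on the `streams[].codec_type`, `width`, `height`, `r_frame_rate`, `format.duration`, and `format.size` fields that ffprobe emits in its JSON output.

```python
import json
import subprocess
from fractions import Fraction


def run_ffprobe(path):
    """Run the ffprobe command from Step 1 and return its parsed JSON."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def check_probe(probe, target_duration, width, height, fps, min_bytes=100_000):
    """Return a list of critical findings; an empty list means the gate passes."""
    findings = []
    streams = probe.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    audio = next((s for s in streams if s.get("codec_type") == "audio"), None)

    if video is None:
        findings.append("no video stream")
    else:
        if (video.get("width"), video.get("height")) != (width, height):
            findings.append("unexpected resolution")
        # ffprobe reports the frame rate as a fraction string, e.g. "30000/1001".
        try:
            actual_fps = float(Fraction(video.get("r_frame_rate", "0/1")))
        except (ValueError, ZeroDivisionError):
            actual_fps = 0.0
        if abs(actual_fps - fps) > 0.01:
            findings.append("unexpected frame rate")

    if audio is None:
        findings.append("no audio stream")

    fmt = probe.get("format", {})
    duration = float(fmt.get("duration", 0))  # ffprobe emits this as a string
    if abs(duration - target_duration) > 0.05 * target_duration:
        findings.append("duration outside +/-5% of target")
    if int(fmt.get("size", 0)) < min_bytes:  # assumed threshold, tune per project
        findings.append("file suspiciously small")
    return findings
```

A typical call would be `check_probe(run_ffprobe("rendered_video.mp4"), 60, 1920, 1080, 30)`; any `"no audio stream"` entry in the result means the remaining steps should not run.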
49
+
50
+ ### Step 2: Extract Review Frames
51
+ Sample frames and visually inspect them. The command below grabs one frame every 5 seconds; adjust the interval so the samples land near scene midpoints:
52
+ ```bash
53
+ ffmpeg -i rendered_video.mp4 -vf "fps=1/5" frame_%04d.png
54
+ ```
55
+ - [ ] No visual artifacts or glitches
56
+ - [ ] Text overlays readable and within safe zones
57
+ - [ ] Color grade consistent across scenes
58
+ - [ ] No black frames or flash frames at cuts
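
When scene durations are known, exact midpoint frames can be extracted instead of fixed-interval samples. The sketch below is illustrative only: `scene_midpoints` and `frame_commands` are hypothetical helpers, the output filenames are made up, and transition overlaps between scenes are ignored for simplicity. It builds standard `ffmpeg -ss <time> -i <file> -frames:v 1 <out>` commands without running them.

```python
def scene_midpoints(scene_durations):
    """Return the midpoint timestamp (in seconds) of each scene, in order."""
    midpoints = []
    start = 0.0
    for duration in scene_durations:
        midpoints.append(start + duration / 2)
        start += duration
    return midpoints


def frame_commands(video_path, scene_durations):
    """Build one ffmpeg command per scene that grabs a single midpoint frame."""
    return [
        ["ffmpeg", "-ss", f"{t:.3f}", "-i", video_path,
         "-frames:v", "1", f"scene_{i:02d}_mid.png"]
        for i, t in enumerate(scene_midpoints(scene_durations), start=1)
    ]
```

Each returned list can be passed to `subprocess.run(..., check=True)` once the scene durations have been confirmed against the composition.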
59
+
60
+ ### Step 3: Audio Verification
61
+ - [ ] Play back and confirm narration is audible over music
62
+ - [ ] No audio pops or clicks at cut points
63
+ - [ ] Music volume appropriate (18-20 dB below dialogue)
64
+ - [ ] Audio loudness within platform target (-14 LUFS for social)
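
The loudness item can be checked without listening by running FFmpeg's `loudnorm` filter in measurement mode (`ffmpeg -i rendered_video.mp4 -af loudnorm=print_format=json -f null -`) and reading `input_i`, the measured integrated loudness, from the JSON it prints at the end of its log. The sketch below assumes that JSON is the last brace-delimited block in the captured stderr; `parse_loudnorm_stats`, `loudness_ok`, and the 1 LU tolerance are hypothetical choices, not values from this skill.

```python
import json


def parse_loudnorm_stats(ffmpeg_stderr):
    """Extract the flat JSON block that loudnorm prints at the end of its log."""
    start = ffmpeg_stderr.rfind("{")
    end = ffmpeg_stderr.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON block found")
    return json.loads(ffmpeg_stderr[start:end + 1])


def loudness_ok(stats, target_lufs=-14.0, tolerance=1.0):
    """True if measured integrated loudness is within tolerance of the target."""
    measured = float(stats["input_i"])  # loudnorm reports values as strings
    return abs(measured - target_lufs) <= tolerance
```

A silent file reports `-inf` for `input_i`, which `float()` accepts, so `loudness_ok` returns `False` in that case rather than raising.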
65
+
66
+ ### Step 4: Present Review to User
67
+ Give the user a structured summary covering file stats, audio verification, visual findings, and caption status.
68
+
69
+ ---
70
+
71
+ ## Pre-Delivery Checklist by Content Type
72
+
73
+ ### Explainer Video
74
+ - [ ] Hook lands in first 3 seconds
75
+ - [ ] Core concept clearly explained (the "aha" moment)
76
+ - [ ] Captions present and synced
77
+ - [ ] Background music doesn't overpower narration
78
+ - [ ] Duration matches target (+/-10%)
79
+ - [ ] Output plays correctly on target platform
80
+
81
+ ### Short-Form (TikTok/Reels/Shorts)
82
+ - [ ] 9:16 aspect ratio, 1080x1920
83
+ - [ ] Important content within safe zones (900x1400)
84
+ - [ ] Hook in first 1-2 seconds
85
+ - [ ] Captions mandatory (a commonly cited figure is that about 85% of viewers watch muted)
86
+ - [ ] File size under platform limit
87
+ - [ ] H.264 High Profile, 8+ Mbps
88
+
89
+ ### Talking Head
90
+ - [ ] Filler words removed
91
+ - [ ] No awkward jump cuts (covered by B-roll or transition)
92
+ - [ ] Speaker's face never covered by overlays
93
+ - [ ] Audio clean — no background noise
94
+ - [ ] Eye-level framing maintained
95
+
96
+ ### Music/Audio
97
+ - [ ] Correct duration
98
+ - [ ] Instrumental if for background use
99
+ - [ ] BPM matches content energy
100
+ - [ ] No clipping or distortion
101
+ - [ ] Loudness normalized to target
102
+
103
+ ---
104
+
105
+ ## Remotion-Specific Verification
106
+
107
+ Before declaring a Remotion render complete:
108
+
109
+ - [ ] `composition_validator` was run before rendering
110
+ - [ ] All `staticFile()` references resolve to existing assets
111
+ - [ ] Composition duration matches sum of scene durations minus transition overlaps
112
+ - [ ] No CSS animations used (must use `useCurrentFrame()` + `interpolate()`)
113
+ - [ ] No Tailwind `animate-*` classes (break frame-based rendering)
114
+ - [ ] `interpolate()` calls use `extrapolateLeft: 'clamp', extrapolateRight: 'clamp'`
115
+ - [ ] Audio layers in sync with visual scenes
116
+ - [ ] Theme colors match the active style
117
+ - [ ] Text scenes use Remotion components, NOT AI-generated images with text
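
The duration item in the checklist above is simple arithmetic, shown here as a sketch. `expected_duration_in_frames` is a hypothetical helper, and it assumes every transition overlaps its two neighbouring scenes by its full length, so each transition shortens the total by its own frame count.

```python
def expected_duration_in_frames(scene_frames, transition_frames):
    """Sum of scene durations minus transition overlaps, all in frames."""
    if len(transition_frames) > max(len(scene_frames) - 1, 0):
        raise ValueError("more transitions than scene boundaries")
    return sum(scene_frames) - sum(transition_frames)
```

Three scenes of 90, 120, and 150 frames joined by two 15-frame transitions should give a composition of 330 frames; a different `durationInFrames` on the composition points to a mismatch worth investigating.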
118
+
119
+ ## Review Log Format
120
+
121
+ When logging a review finding:
122
+ ```
123
+ [SEVERITY] Finding description
124
+ - What: specific issue observed
125
+ - Where: timestamp or scene reference
126
+ - Fix: recommended action
127
+ ```
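
Findings can be rendered into the log format above with a small formatter. This is an illustrative sketch; `format_finding` is a hypothetical name, and the allowed severities are taken from the Severity Levels table.

```python
SEVERITIES = {"CRITICAL", "SUGGESTION", "NITPICK"}


def format_finding(severity, description, what, where, fix):
    """Render one review finding in the review log format."""
    severity = severity.upper()
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")
    return "\n".join([
        f"[{severity}] {description}",
        f"- What: {what}",
        f"- Where: {where}",
        f"- Fix: {fix}",
    ])
```

Keeping the severity in a fixed set makes it easy to count CRITICAL entries later when deciding between PASS and REVISE.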
128
+
129
+ ---
130
+
131
+ ## Kolbo MCP Integration
132
+
133
+ Use these tools during review:
134
+
135
+ | Review Step | Kolbo MCP Tool | What to Check |
136
+ |-------------|---------------|---------------|
137
+ | Audio verification | `transcribe_audio` | Transcribe the rendered video. If it returns 0 words, the audio is silent or the narration is missing |
138
+ | Visual analysis | `chat_send_message` + Gemini | "Review this video for quality issues" |
139
+ | Credit check | `check_credits` | Verify budget before re-renders |
140
+
141
+ **Post-render verification with Kolbo:**
142
+ 1. `ffprobe` the output (always first — check streams exist)
143
+ 2. `transcribe_audio` the rendered video → compare word count to script
144
+ 3. If word count < 80% of script → audio is likely cut off → investigate
145
+ 4. `chat_send_message` with Gemini + video URL → visual quality review
146
+ 5. Present structured findings to user
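
The word-count comparison in steps 2 and 3 can be sketched as below. It assumes the transcript text has already been obtained from `transcribe_audio`; `audio_coverage` and its status labels are hypothetical, and whitespace splitting is only a rough word count.

```python
def audio_coverage(transcript, script, threshold=0.8):
    """Compare transcript and script word counts.

    Returns (status, ratio), where status is "silent", "possibly_cut_off",
    or "ok".
    """
    transcript_words = len(transcript.split())
    script_words = len(script.split())
    if script_words == 0:
        raise ValueError("script is empty; nothing to compare against")
    ratio = transcript_words / script_words
    if transcript_words == 0:
        status = "silent"
    elif ratio < threshold:
        status = "possibly_cut_off"
    else:
        status = "ok"
    return status, ratio
```

A low ratio is a signal to investigate rather than proof of truncation, since transcription errors and ad-libbed narration also shift the count.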
147
+
148
+ **Re-generation workflow (if review finds critical issues):**
149
+ 1. Identify the failed asset (video clip, audio, image)
150
+ 2. Re-generate with adjusted prompt via the appropriate Kolbo MCP tool
151
+ 3. Re-compose with FFmpeg or Remotion
152
+ 4. Run review again (max 2 revision rounds)