npm - @kolbo/kolbo-code-linux-arm64-musl - Versions diffs - 2.0.0 → 2.0.3 - Mend

@kolbo/kolbo-code-linux-arm64-musl 2.0.0 → 2.0.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/bin/kolbo +0 -0
package/package.json +1 -1
package/skills/color-grading/SKILL.md +152 -0
package/skills/ffmpeg-patterns/SKILL.md +240 -0
package/skills/image-prompting-guide/SKILL.md +143 -0
package/skills/kolbo/SKILL.md +263 -19
package/skills/music-prompting/SKILL.md +146 -0
package/skills/production-review/SKILL.md +152 -0
package/skills/short-form-video/SKILL.md +168 -0
package/skills/sound-design/SKILL.md +154 -0
package/skills/storytelling/SKILL.md +139 -0
package/skills/subtitle-production/SKILL.md +244 -0
package/skills/subtitle-production/reference/burn_to_video.py +222 -0
package/skills/subtitle-production/reference/export_srts.py +127 -0
package/skills/subtitle-production/reference/gen_srt.py +42 -0
package/skills/typography-video/SKILL.md +182 -0
package/skills/typography-video/reference/KineticTitleScene.tsx +345 -0
package/skills/video-editing/SKILL.md +128 -0
package/skills/video-production/SKILL.md +7 -8
package/skills/video-prompting-guide/SKILL.md +268 -0

package/skills/kolbo/SKILL.md CHANGED

@@ -1,25 +1,78 @@
 ---
 name: kolbo
-description: Generate or analyze creative media through Kolbo AI. Load this skill whenever the user asks to create, edit, prompt, or analyze images, videos, music, speech, or sound effects — or to list available AI models / check credit balance. It contains the MCP tool workflow and the prompt-engineering rules for each media type.
+description: Generate, edit, or analyze creative media through Kolbo AI. Load this skill whenever the user asks to create, edit, prompt, or analyze images, videos, music, speech, sound effects, 3D models — or to transcribe audio/video, manage media, use Visual DNA for consistency, check credits, or browse models/presets/moodboards. It contains the MCP tool workflow and the prompt-engineering rules for each media type.
 ---
-# Kolbo AI — Creative Generation & Analysis
+# Kolbo AI — Creative Generation, Analysis & Transcription
 You have direct access to the Kolbo AI creative platform via MCP tools (auto-configured by `kolbo auth login`). Use them to generate and deliver real content — do NOT just describe what you would create.
 ## Available MCP Tools
+### Generation
+| Tool | Description |
+|------|-------------|
+| `generate_image` | Create images from text prompts. Supports Visual DNA, moodboards, reference images, batch generation, web-search grounding. |
+| `generate_image_edit` | Edit/transform an existing image (background removal, color changes, compositing). Pass source images + edit prompt. |
+| `generate_creative_director` | Generate a coordinated multi-scene set (1–8 scenes) from one creative brief. Ideal for storyboards, ad campaigns, product showcases. Supports image and video modes. |
+| `generate_video` | Create videos from text prompts. Supports Visual DNA and reference images for consistency. |
+| `generate_video_from_image` | Animate a still image into video. Prompt describes the motion, not the subject. |
+| `generate_video_from_video` | Restyle/transform an existing video (style transfer, scene restyling, subject swap). Keeps the original motion. |
+| `generate_elements` | Generate video from reference assets (images/videos) + prompt. Use when animating specific uploaded assets. |
+| `generate_first_last_frame` | Generate video that morphs from a first frame to a last frame (keyframe interpolation). |
+| `generate_lipsync` | Lipsync an audio track to a source image or video face. Accepts local files or URLs. |
+| `generate_music` | Create music from descriptions. Supports instrumental, custom lyrics, style, vocal gender. |
+| `generate_speech` | Convert text to speech (TTS). Default: ElevenLabs. Use `list_voices` to pick a voice. |
+| `generate_sound` | Generate sound effects from descriptions (foley, ambient, impacts, UI sounds). |
+| `generate_3d` | Generate 3D models from text, single image, or multi-view images. Returns GLB, FBX, OBJ, USDZ. |
+### Transcription & Analysis
+| Tool | Description |
+|------|-------------|
+| `transcribe_audio` | Transcribe audio or video into text + SRT subtitles + word-by-word SRT. Accepts local files or URLs. |
+### Voice & Model Discovery
 | Tool | Description |
 |------|-------------|
-| `generate_image` | Create images from text prompts. Returns image URL(s). |
-| `generate_video` | Create videos from text. Returns video URL. |
-| `generate_video_from_image` | Animate a still image into video. Returns video URL. |
-| `generate_music` | Create music from descriptions. Returns audio URL. |
-| `generate_speech` | Convert text to speech. Returns audio URL. |
-| `generate_sound` | Generate sound effects. Returns audio URL. |
 | `list_models` | Browse available AI models filtered by type. |
+| `list_voices` | List available TTS voices with filtering by provider, language, gender. |
 | `check_credits` | Check remaining Kolbo credit balance. |
-| `get_generation_status` | Poll status of an in-progress generation by ID. |
+| `get_generation_status` | Poll status of an in-progress generation by ID (fallback for timeouts). |
+### Media Library
+| Tool | Description |
+|------|-------------|
+| `upload_media` | Upload a local file or URL to the user's Kolbo media library (CDN). Use for multi-tool workflows. |
+| `list_media` | Browse user's uploaded media with filtering by type and search. |
+### Visual DNA (Character/Style Consistency)
+| Tool | Description |
+|------|-------------|
+| `create_visual_dna` | Create a Visual DNA profile from reference images/video/audio for character, style, product, or scene consistency. |
+| `list_visual_dnas` | List your Visual DNA profiles (id, name, type, thumbnail). |
+| `get_visual_dna` | Fetch full profile details including system_prompt and reference images. |
+| `delete_visual_dna` | Delete a Visual DNA profile. |
+### Moodboards & Presets
+| Tool | Description |
+|------|-------------|
+| `list_moodboards` | List available moodboards (personal, system presets, org). |
+| `get_moodboard` | Fetch a moodboard's master_prompt, style_guide, and images. |
+| `list_presets` | Browse generation presets (image/video/music templates with bundled style direction). |
+### Chat
+| Tool | Description |
+|------|-------------|
+| `chat_send_message` | Send a message to Kolbo AI chat. Supports web search and deep think modes. |
+| `chat_list_conversations` | List your SDK chat conversations. |
+| `chat_get_messages` | Fetch messages in a conversation (with media URLs). |
 ## Core Workflow
@@ -34,21 +87,96 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con
 | Type | Use for |
 |------|---------|
 | `image` | Still-image generation |
+| `image_edit` | Image editing / transformation |
 | `video` | Text-to-video |
 | `video_from_image` | Image-to-video animation |
+| `lipsync` | Audio-to-face lipsync |
 | `music` | Music generation |
 | `speech` | Text-to-speech |
 | `sound` | Sound effects |
+| `three_d` | 3D model generation |
 ### Cost Awareness
 Creative generations bill against the user's Kolbo credit balance. Order of expense (rough):
-- **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s)
-- **Medium**: music (~30s-2min)
-- **Expensive**: video (~1-5min, highest credit cost)
+- **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s), transcription (by duration)
+- **Medium**: music (~30s-2min), 3D (~1-3min)
+- **Expensive**: video (~1-5min, highest credit cost), lipsync (~1-3min)
 Rule of thumb: confirm intent before firing off a video generation unless the user was explicit. For images, just generate.
+### Rate Limiting
+Kolbo enforces **10 generation requests per minute per user per tool type** (e.g. 10 image calls + 10 video calls = fine, but 11 image calls in 1 minute = rate limited). General media requests are capped at **300 per minute**.
+When making multiple generation calls:
+- **Stagger calls** — do NOT fire all in parallel. Space them ~5-10 seconds apart.
+- **Batch images**: use `generate_creative_director` instead of calling `generate_image` 5+ times — it handles multi-scene in one request.
+- If you get a rate limit error (429), wait 60 seconds (the window resets per minute) and retry. Do not retry more than 2 times.
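A minimal client-side sketch of the pacing rules above, assuming Python on the caller's side. `ToolThrottle` and `RateLimitError` are hypothetical names, and `call_tool` stands in for whatever actually invokes a Kolbo MCP tool. None of them is part of a documented Kolbo API. The sketch spaces calls of the same tool type 6 s apart and never lets more than 10 of them into a 60 s window. On a rate-limit error it waits 60 s and retries at most twice.

```python
import time
from collections import defaultdict, deque


class RateLimitError(Exception):
    """Hypothetical: raised by your own tool wrapper when Kolbo answers 429."""


class ToolThrottle:
    """Client-side pacing per tool type: at most `limit` calls per `window` seconds,
    at least `min_gap` seconds between calls, and up to `max_retries` retries
    after a 60 s pause on a rate-limit error."""

    def __init__(self, limit=10, window=60.0, min_gap=6.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit, self.window, self.min_gap = limit, window, min_gap
        self.clock, self.sleep = clock, sleep
        self.history = defaultdict(deque)  # tool type -> timestamps of recent calls

    def wait_for_slot(self, tool_type):
        calls = self.history[tool_type]
        now = self.clock()
        while calls and now - calls[0] >= self.window:
            calls.popleft()  # drop calls that have left the sliding window
        delay = 0.0
        if calls:
            delay = max(delay, self.min_gap - (now - calls[-1]))  # stagger calls
        if len(calls) >= self.limit:
            delay = max(delay, self.window - (now - calls[0]))  # window is full
        if delay > 0:
            self.sleep(delay)
        calls.append(self.clock())

    def call(self, tool_type, call_tool, *args, max_retries=2, **kwargs):
        for attempt in range(max_retries + 1):
            self.wait_for_slot(tool_type)
            try:
                return call_tool(*args, **kwargs)
            except RateLimitError:
                if attempt == max_retries:
                    raise
                self.sleep(60)  # the window resets per minute
```

For example, `ToolThrottle().call("image", run_image, prompt="...")`, where `run_image` is your own wrapper. Keying on tool type mirrors the per-tool-type limit. The separate 300-per-minute media cap is not modeled.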
+---
+## Transcription & Audio/Video Analysis
+Use `transcribe_audio` whenever the user provides an audio or video file and wants:
+- A text transcript
+- Subtitles (SRT format)
+- Word-by-word timed subtitles (for karaoke, motion graphics, Remotion captions, video editing)
+- Content analysis or summary of spoken content
+- Dialogue extraction from video
+### Workflow
+1. Call `transcribe_audio` with the `source` (URL or absolute local file path)
+2. The tool returns:
+   - `text` — full transcript as plain text
+   - `srt_url` — download URL for grouped SRT subtitles (configurable words-per-line)
+   - `word_by_word_srt_url` — download URL for **word-by-word SRT** (one word per subtitle entry with precise timestamps from ElevenLabs Scribe v2)
+   - `txt_url` — download URL for plain text file
+   - `duration` — audio duration in seconds
+3. Analyze the transcript text as needed (summarize, translate, extract topics, answer questions about content)
+### Supported Formats
+- **Audio**: mp3, wav, m4a, flac, aac
+- **Video** (extracts audio track): mp4, mov, webm, mkv, avi, m4v
+### Word-by-Word Transcription
+The `word_by_word_srt_url` contains an SRT file where each subtitle entry is a **single word** with precise start/end timestamps (powered by ElevenLabs Scribe v2). This is ideal for:
+- **Karaoke-style captions** — highlight one word at a time
+- **Remotion/motion graphics** — animate text word-by-word synced to audio
+- **Video editing** — precise cut points aligned to speech
+- **Accessibility** — word-level navigation for hearing-impaired users
+The regular `srt_url` groups words into readable subtitle lines (default 12 words per line, up to 2 lines per subtitle).
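A minimal sketch for reading the word-by-word SRT into `(word, start, end)` tuples, for example to drive karaoke highlighting. It assumes standard SRT blocks separated by blank lines: an index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then the text. The exact layout of the file Kolbo returns is not verified here, and `parse_srt` is an illustrative name.

```python
import re

# One SRT timing line, e.g. "00:00:01,250 --> 00:00:01,600".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of (text, start_seconds, end_seconds), one per SRT cue."""
    cues = []
    blocks = re.split(r"\n\s*\n", text.strip().replace("\r\n", "\n"))
    for block in blocks:
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                caption = " ".join(lines[i + 1:]).strip()  # text after the timing line
                cues.append((caption, _seconds(*g[:4]), _seconds(*g[4:])))
                break
    return cues
```

The word active at time `t` is the cue with `start <= t < end`. The same parser also reads the grouped `srt_url` file, returning one cue per subtitle.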
+### Use Cases & Examples
+- "Transcribe this podcast" → `transcribe_audio` with the audio URL
+- "What's being said in this video?" → `transcribe_audio` → analyze the returned text
+- "Generate subtitles for my video" → `transcribe_audio` → share the `srt_url`
+- "I need word-by-word timing for this audio" → `transcribe_audio` → share `word_by_word_srt_url`
+- "Summarize this meeting recording" → `transcribe_audio` → summarize the text
+- "Extract key points from this lecture" → `transcribe_audio` → analyze and extract
+### Long Content
+Transcription supports files up to 30 minutes. For longer content, split the file first or provide segments.
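A sketch of the splitting step, assuming ffmpeg is installed. It only builds argument lists, which you can run with `subprocess.run`. The 29-minute segment length is a safety margin under the stated 30-minute cap. The output names and AAC re-encode settings are assumptions. Re-encoding keeps cut points exact, whereas stream copy snaps them to the nearest packet or keyframe.

```python
import math


def segment_commands(source, duration_s, max_segment_s=29 * 60):
    """Build one ffmpeg command per audio segment, each at most max_segment_s long."""
    count = max(1, math.ceil(duration_s / max_segment_s))
    commands = []
    for i in range(count):
        start = i * max_segment_s
        length = min(max_segment_s, duration_s - start)
        commands.append([
            "ffmpeg", "-ss", str(start), "-i", source,  # seek to the segment start
            "-t", str(length),                           # segment length in seconds
            "-vn", "-c:a", "aac", "-b:a", "128k",        # drop video, re-encode audio
            f"part_{i:02d}.m4a",
        ])
    return commands
```

Each part's SRT starts at 00:00:00, so add that part's `-ss` offset to its timestamps before merging the transcripts.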
+### Visual Video/Audio Analysis (what's happening, not just what's said)
+`transcribe_audio` only extracts **speech**. If the user wants to understand **what's visually happening** in a video (scenes, actions, objects, on-screen text) or needs a multimodal AI to reason about the content, use `chat_send_message` with a video-capable model instead.
+**Video-capable models**: `gemini-2.5-pro`, `gemini-2.5-flash` — these can watch video and analyze visual content.
+**Workflow for visual analysis:**
+1. Upload the video with `upload_media` to get a stable CDN URL
+2. Call `chat_send_message` with the video URL in the message and a video-capable model (e.g. `gemini-2.5-pro`)
+3. Ask your analysis question: "Describe what happens in this video", "What products are shown?", "Summarize the key scenes"
+**When to use which:**
+| User intent | Tool |
+|-------------|------|
+| "Transcribe this" / "What's being said?" | `transcribe_audio` |
+| "Generate subtitles" / "Word-by-word timing" | `transcribe_audio` |
+| "What's happening in this video?" / "Describe the scenes" | `chat_send_message` + Gemini |
+| "Analyze this video and transcribe it" | Both — `transcribe_audio` for text + `chat_send_message` for visual |
 ---
 ## Image Prompts
@@ -61,14 +189,36 @@ Rule of thumb: confirm intent before firing off a video generation unless the us
 - **`enhance_prompt: true`** (default) will improve most prompts automatically. Turn it off only if the user's prompt is already fully engineered or they want literal wording.
 ### Image Editing (image-to-image)
-When the model can see the uploaded image, describe the **change**, not the unchanged parts.
+Use `generate_image_edit` when the user wants to modify an existing image. Pass the source image URL(s) in `source_images` and describe the change in `prompt`.
 - Good: "Turn the sky orange and add drifting clouds"
 - Bad: "A mountain landscape with an orange sky and drifting clouds" (re-describes what's already in the image)
 Simple edits deserve simple prompts. Only elaborate for genuinely complex, multi-step transformations.
 ### Multi-Scene / Campaigns
-For storyboards, campaigns, or character-consistent sequences, call `generate_image` once per scene with the same base style cues carried across prompts. Kolbo's web app has a dedicated Creative Director feature for this; in the CLI the workflow is sequential `generate_image` calls.
+For storyboards, campaigns, or character-consistent sequences, use `generate_creative_director` — it generates 1–8 coordinated scenes from a single creative brief with consistent style. Pass `visual_dna_ids` and/or `moodboard_id` for character/style consistency across all scenes.
+In the CLI, you can also do sequential `generate_image` calls with the same Visual DNA profiles.
+---
+## Visual DNA (Character/Style Consistency)
+Visual DNA profiles capture the visual "identity" of a character, style, product, or scene from reference media.
+### Workflow
+1. **Create** a profile with `create_visual_dna` — provide reference images (max 4), optionally video and audio
+2. **Types**: `character` (default), `style`, `product`, `scene`
+3. **Use** the profile by passing its `id` in `visual_dna_ids` when calling any generation tool
+4. **List/inspect** profiles with `list_visual_dnas` / `get_visual_dna`
+### When to Use
+- User wants the same character across multiple images/videos
+- User wants a consistent brand style across a campaign
+- User references "keep the same look" or "same character"
+- User provides reference photos of a person/product to maintain consistency
 ---
@@ -89,6 +239,20 @@ The model can see the starting frame. Describe **what happens**, not what the im
 - Good: "Slow dolly-in on the subject. Her hair drifts in a light breeze. Soft particles float through the air. [6s]"
 - Bad: "A woman with long brown hair standing in a forest, wearing a red dress, with golden sunlight..." (re-describes the image)
+### Video-to-Video (Restyle)
+Use `generate_video_from_video` to restyle an existing video. Describe the **new style**, not the original content — the model preserves the original motion.
+- Good: "Transform into anime style with cel-shading and vibrant colors"
+- Bad: "A person walking down a street" (re-describes what's already in the video)
+### Elements (Reference Assets → Video)
+Use `generate_elements` when the user has specific assets (product photos, character references) they want animated into a video. Pass them as `reference_images` (URLs) or `files` (local paths).
+### First/Last Frame (Keyframe Interpolation)
+Use `generate_first_last_frame` when the user provides two keyframes and wants the model to create a smooth transition between them.
+### Lipsync
+Use `generate_lipsync` to sync audio to a face in an image or video. Both `source` (face) and `audio` accept URLs or local file paths.
 ### Camera Vocabulary
 Pick what fits the mood. Every shot gets at least one.
@@ -150,6 +314,17 @@ Format: `extreme slow-motion [Xs] — [micro-movements in ultra slow-mo] — sna
 ---
+## 3D Generation
+Use `generate_3d` for creating 3D models. Three modes:
+- **Text mode**: prompt-only (e.g., "a medieval sword with ornate handle")
+- **Single image mode**: one reference image + optional prompt
+- **Multi-view mode**: 2+ reference images for higher-quality reconstruction
+Returns downloadable model files in GLB, FBX, OBJ, and USDZ formats. Use `list_models` with `type: "three_d"` to discover available models.
+---
 ## Music Prompts
 Describe **genre → mood → instrumentation → tempo → era**, in that order.
@@ -164,10 +339,10 @@ Describe **genre → mood → instrumentation → tempo → era**, in that order
 ## Speech (TTS)
-- Call `list_models` with `type: speech` to get voice identifiers. Pass the `identifier` as `model` for a consistent voice.
-- The voice **is** the model for speech — there is no separate voice parameter.
+- Call `list_voices` to find available voices. Filter by `provider`, `language`, or `gender`.
+- Pass the returned `voice_id` (or the voice's display name like "Rachel") as the `voice` parameter in `generate_speech`.
+- For multilingual content, pick a voice that supports the target language.
 - For long text, split at natural sentence boundaries. Each generation has a character cap; chunk long-form content into multiple calls.
-- For multilingual content, pick a voice that supports the target language from `list_models`.
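A minimal chunking sketch for long TTS input: split after sentence-ending punctuation and pack whole sentences into chunks under a character cap. The 2,500-character default is a placeholder, not a documented Kolbo limit, and `chunk_text` is a hypothetical helper. A single sentence longer than the cap falls back to a split on spaces.

```python
import re


def chunk_text(text, max_chars=2500):
    """Pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)
        # Fallback: an over-long sentence is split at the last space before the cap.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            cut = cut if cut > 0 else max_chars
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Send each chunk as its own `generate_speech` call with the same `voice`, then concatenate the audio in order.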
 ---
@@ -179,6 +354,35 @@ Describe **genre → mood → instrumentation → tempo → era**, in that order
 ---
+## Moodboards & Presets
+**Moodboards** provide style direction (master prompt + style guide + reference images). Pass a `moodboard_id` to any generation tool to apply its style.
+- `list_moodboards` to browse available options
+- `get_moodboard` to see full details before applying
+**Presets** bundle prompt templates + style direction for specific creative looks. Pass a `preset_id` to generation tools.
+- `list_presets` with optional `type` filter ("image", "video", "music", "text_to_video")
+---
+## Media Library
+Use `upload_media` to upload local files or URLs to the Kolbo CDN for stable hosting. Useful when:
+- A local file needs to be referenced in multiple generation calls
+- You want a permanent CDN URL instead of an ephemeral local path
+Use `list_media` to browse previously uploaded content (filter by type, search by name).
+---
+## Chat
+Use `chat_send_message` to interact with Kolbo AI models (GPT-4o, Claude, etc.) with optional web search and deep think modes. Conversations persist via `session_id` — omit to start new, pass to continue.
+Use `chat_list_conversations` and `chat_get_messages` to browse conversation history.
+---
 ## Image Analysis (when the user uploads images)
 When the user shares an image and asks about it:
@@ -188,7 +392,7 @@ When the user shares an image and asks about it:
 - **Extract text verbatim** when asked (OCR-style requests are fine).
 - **Cannot identify real people.** Describe hair, clothing, pose, expression, and apparent role — but never name a specific individual, even a well-known public figure. If the user insists, decline and offer to describe instead.
 - **Copyrighted content**: summarize and reference, don't reproduce verbatim large chunks.
-- If the user wants an **edit** based on the analysis, hand off to `generate_video_from_image` (motion) or `generate_image` with an image-to-image model (visual edit) — see the Image Editing section above for prompt structure.
+- If the user wants an **edit** based on the analysis, hand off to `generate_image_edit` (visual edit) or `generate_video_from_image` (motion).
 ---
@@ -217,16 +421,56 @@ Full public documentation for Kolbo Code (the CLI you are running inside) lives
 The MDX sources are in the `kolbo-docs` repo under `content/docs/kolbo-code/`. When the user's question has a concrete answer in one of those pages, cite the path and summarize — do not invent new instructions.
+## Troubleshooting
+### "API key is invalid or expired"
+This usually means the CLI is sending a key to the wrong API endpoint.
+**Common cause — whitelabel overlap:** if the user previously used regular `kolbo` and then switched to a whitelabel/partner CLI (e.g. `sapir`), the old API key may still be cached against the main Kolbo API. Running `kolbo` instead of the branded command (`sapir`) overwrites the MCP config with the wrong endpoint.
+**Fix:** tell the user to re-authenticate with their branded CLI command:
+```
+sapir auth login
+```
+(Replace `sapir` with their actual CLI command.)
+Then **restart the editor/session** so the MCP picks up the new key and endpoint.
+**Important:** whitelabel users must always use their branded CLI command (e.g. `sapir`), not `kolbo`, to keep the MCP pointed at the correct API.
+### MCP tools not responding or not found
+If Kolbo tools timeout or aren't listed, the MCP server may not be wired. Tell the user to run:
+```
+<their-cli-command> auth login
+```
+This re-wires the MCP configuration automatically. Then restart the session.
+### "Rate limited" (429 errors)
+Kolbo allows 10 generation requests per minute per tool type. Wait 60 seconds and retry. Use `generate_creative_director` for batch image work instead of multiple `generate_image` calls.
+---
 ## Examples
 Natural-language triggers that should prompt this skill + a tool call:
 - "Generate an image of a neon-lit Tokyo street at night" → `list_models` (image) → `generate_image`
+- "Remove the background from this image" → `list_models` (image_edit) → `generate_image_edit`
+- "Create a storyboard for a coffee brand ad" → `list_models` (image) → `generate_creative_director`
 - "Create a 5-second cinematic video of ocean waves at sunset" → `list_models` (video) → `generate_video` with camera + mood guidance
 - "Animate this product photo with a 360° orbit" → `list_models` (video_from_image) → `generate_video_from_image`
+- "Restyle this video as anime" → `generate_video_from_video`
+- "Make this character talk with this voiceover" → `generate_lipsync`
+- "Create a smooth transition between these two frames" → `generate_first_last_frame`
 - "Make a lo-fi hip hop beat, instrumental, 85 BPM" → `list_models` (music) → `generate_music`
-- "Say this in English with a natural female voice: Welcome to Kolbo" → `list_models` (speech) → `generate_speech`
+- "Say this in English with a natural female voice: Welcome to Kolbo" → `list_voices` → `generate_speech`
 - "Generate a door slam sound effect" → `list_models` (sound) → `generate_sound`
+- "Create a 3D model of a medieval castle" → `list_models` (three_d) → `generate_3d`
+- "Transcribe this podcast episode" → `transcribe_audio`
+- "What's being said in this video?" → `transcribe_audio` → analyze the text
+- "Generate word-by-word subtitles for this audio" → `transcribe_audio` → share `word_by_word_srt_url`
+- "Keep the same character across all these images" → `create_visual_dna` → `generate_image` with `visual_dna_ids`
+- "Upload this file to my media library" → `upload_media`
 - "What video models are available?" → `list_models` (video)
 - "How many credits do I have?" → `check_credits`
 - "What's in this image?" (with upload) → describe per the Image Analysis section; no tool call needed unless the user asks to generate or edit

package/skills/music-prompting/SKILL.md ADDED

@@ -0,0 +1,146 @@
+---
+name: music-prompting
+description: >
+  Music generation prompting guide: BPM selection by video type, key/mood mapping, prompt
+  structure for background music, duration matching, looping strategies, section-mapped scoring.
+  Use when generating background music for video or crafting music generation prompts.
+  Keywords: music, BPM, tempo, key, mood, instrumental, background music, suno, elevenlabs,
+  music generation, prompt, genre, looping, score, soundtrack
+---
+# Music Generation — Prompting Guide
+## Quick Reference
+```
+INSTRUMENTAL:     Always instrumental=true for video background
+PROMPT ORDER:     genre/style → BPM → key/mood → instruments → energy → purpose
+KEY RULE:         Music must be 18-20 dB below narration (see sound-design skill)
+ALWAYS INCLUDE:   "background" or "underscore" in every prompt
+```
+## BPM Selection by Video Type
+| Video Type | BPM Range | Prompt Fragment |
+|-----------|-----------|-----------------|
+| Educational explainer | 80-100 | "gentle ambient electronic, 90 BPM" |
+| Corporate / tech | 100-120 | "upbeat corporate pop, 110 BPM, positive" |
+| Epic / dramatic reveal | 60-80 | "cinematic orchestral, 70 BPM, building tension" |
+| Fast-paced montage | 120-140 | "energetic electronic, 130 BPM, driving beat" |
+| Meditation / calm | 50-70 | "ambient drone, 60 BPM, peaceful" |
+| Comedy / lighthearted | 100-130 | "playful ukulele pop, 120 BPM, whimsical" |
+| Sad / reflective | 60-80 | "melancholic piano, 65 BPM, minor key" |
+| Action / hype | 140-170 | "high-intensity drum and bass, 160 BPM" |
+## Key and Mood Mapping
+| Mood | Key | Musical Characteristics |
+|------|-----|----------------------|
+| Happy / upbeat | C major, G major | Bright, resolved, energetic |
+| Serious / professional | D minor, A minor | Grounded, authoritative |
+| Mysterious / curious | E minor, B minor | Tension, anticipation |
+| Triumphant / inspiring | D major, Bb major | Expansive, climactic |
+| Melancholic / thoughtful | F minor, C minor | Reflective, emotional |
+| Neutral / ambient | C major, A minor | Unobtrusive, background |
+## Prompt Structure
+```
+[GENRE/STYLE], [BPM], [KEY/MOOD], [INSTRUMENTS], [ENERGY LEVEL], [PURPOSE]
+```
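A small sketch that fills the template in order and enforces the first Prompting Rule, under which every prompt mentions "background" or "underscore". `build_music_prompt` and its parameter names are illustrative, not part of any Kolbo tool. The result is the text you would pass as the `generate_music` prompt.

```python
def build_music_prompt(genre, bpm, key_mood, instruments, energy, purpose):
    """Join the slots in order: genre/style, BPM, key/mood, instruments, energy, purpose."""
    lowered = purpose.lower()
    if "background" not in lowered and "underscore" not in lowered:
        purpose = f"background music for {purpose}"  # keep the bed dynamically even
    parts = [genre, f"{bpm} BPM", key_mood, instruments, energy, purpose]
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

With the educational explainer inputs (lo-fi ambient, 90 BPM, C major, pads and light percussion, steady energy, "narration"), it reproduces the first example prompt in this skill.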
+### Examples
+**Educational explainer:**
+```
+Gentle lo-fi ambient electronic, 90 BPM, C major, soft synth pads and light
+percussion, calm and steady energy, background music for narration
+```
+**Corporate product demo:**
+```
+Modern upbeat corporate pop, 110 BPM, G major, acoustic guitar and light drums,
+positive energy building gradually, underscore for product walkthrough
+```
+**Technical deep-dive:**
+```
+Minimal ambient electronic, 80 BPM, A minor, soft Rhodes piano and subtle
+bass, contemplative and focused, background music for technical explanation
+```
+## Prompting Rules
+1. **Always include "background" or "underscore"** — tells the model to keep the dynamics even and unobtrusive
+2. **Always use instrumental mode** — lyrics compete with narration
+3. **Specify BPM explicitly** — don't rely on genre to set tempo
+4. **Avoid "bright hi-hats" or "prominent vocals"** — high-frequency busy elements compete with speech in the 2-4 kHz intelligibility band
+5. **Include energy direction** — "steady energy" for explainers, "building gradually" for reveals
+## Duration Matching
+- Generate at the exact video duration when possible
+- For longer videos, generate a track 30-60% of video length and loop with crossfade
+- **Section-mapped scoring** for videos with distinct acts:
+| Video Section | Duration | Music Style |
+|--------------|----------|-------------|
+| Intro / hook | 8-10s | Soft, building |
+| Main explanation | 90-120s | Steady, neutral |
+| Key reveal | 20-30s | Intensified, fuller |
+| Outro | 10-15s | Fading, gentle |
+Generate each as a separate track and crossfade between them.
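The timing arithmetic behind looping and section crossfades, as a sketch: when consecutive tracks overlap by a crossfade of `xfade_s` seconds, every track after the first adds its length minus `xfade_s`. The function names and the 2 s default are assumptions.

```python
import math


def copies_needed(track_s, video_s, xfade_s=2.0):
    """How many crossfaded copies of one track cover video_s seconds."""
    if video_s <= track_s:
        return 1
    step = track_s - xfade_s  # each extra copy adds its length minus the overlap
    if step <= 0:
        raise ValueError("crossfade must be shorter than the track")
    return 1 + math.ceil((video_s - track_s) / step)


def section_offsets(section_lengths, xfade_s=2.0):
    """Start time of each section track when consecutive tracks overlap by xfade_s."""
    offsets, t = [], 0.0
    for length in section_lengths:
        offsets.append(t)
        t += length - xfade_s
    return offsets
```

For example sections of 10 s, 100 s, 25 s and 12 s, the tracks start at 0, 8, 106 and 129 s, for a total of 141 s.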
+## Looping
+```bash
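+# Loop a track 3x (-stream_loop 2 = one play plus 2 repeats); hard cuts at the loop points
+ffmpeg -stream_loop 2 -i music.mp3 -c copy music_looped.mp3
+# Loop 3x with a 2s crossfade at each loop point (acrossfade overlaps the end of one copy with the start of the next)
+ffmpeg -i music.mp3 -i music.mp3 -i music.mp3 -filter_complex \
+  "[0:a][1:a]acrossfade=d=2[a01];[a01][2:a]acrossfade=d=2" music_looped_xfade.mp3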
+```
+Better approach: generate at the exact video duration to avoid loop artifacts.
+## Integration with Video
+- Duck music 18-20 dB below narration during speech
+- Cut 2-4 kHz on the music bed to clear speech intelligibility band
+- Test on phone speakers — if narration disappears behind music, duck more
+- One track per video — avoid switching styles mid-video unless clear narrative shift
+- Music should start at video start and fade out 2-3 seconds before end
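A sketch of the ducking and EQ rules as one ffmpeg filter graph, assuming narration is input 0 and music is input 1. It applies a fixed -18 dB gain to the music. This only approximates "18-20 dB below narration" if both sources start at similar levels. True speech-triggered ducking would need a compressor such as ffmpeg's `sidechaincompress`. The -6 dB cut, the one-octave width around 3 kHz, and the 3 s fade over the final seconds are assumptions to tune by ear. `mix_filter` is a hypothetical helper.

```python
def mix_filter(video_s, music_gain_db=-18.0, eq_cut_db=-6.0, fade_s=3.0):
    """Build an ffmpeg -filter_complex graph: input 0 = narration, input 1 = music."""
    music = ",".join([
        f"volume={music_gain_db}dB",                                # fixed cut under the voice
        f"equalizer=f=3000:t=o:w=1:g={eq_cut_db}",                  # dip the 2-4 kHz speech band
        f"afade=t=out:st={max(0.0, video_s - fade_s)}:d={fade_s}",  # fade over the last seconds
    ])
    return f"[1:a]{music}[bed];[0:a][bed]amix=inputs=2:duration=first[mix]"
```

Run it as `ffmpeg -i narration.wav -i music.mp3 -filter_complex "<graph>" -map "[mix]" mix.m4a`. `amix` rescales its inputs by default, so re-check the final loudness against the platform target.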
+---
+## Kolbo MCP Integration
+| Task | Kolbo MCP Tool | Notes |
+|------|---------------|-------|
+| Generate music | `generate_music` | Use prompt structure above |
+| Instrumental | `generate_music` instrumental=true | Always for video background |
+| With lyrics | `generate_music` lyrics="..." | Pass actual lyric text |
+| Discover models | `list_models` type="music" | Check available music models |
+| Sound effects | `generate_sound` | For whooshes, impacts, ambience |
+**Workflow:**
+1. `list_models` type="music" → pick model or auto-select
+2. Write prompt using the BPM/key/mood tables above
+3. `generate_music` with `instrumental: true`, style tags, duration
+4. Download the result and mix with narration using FFmpeg (see `sound-design` skill)
+---
+## Local / Free Options
+> **IMPORTANT:** Always use Kolbo MCP `generate_music` by default. Only mention these if the user explicitly asks for free alternatives. Confirm before installing anything.
+**Free music libraries (no install, browser-based):**
+- Pixabay Music — free, no attribution required
+- Free Music Archive — CC-licensed
+- Incompetech (Kevin MacLeod) — CC-BY, huge catalog
+**Local generation:** If the user has a GPU (8GB+) and explicitly asks, `MusicGen` by Meta (`pip install audiocraft`) can generate music locally. Confirm before installing.

package/skills/production-review/SKILL.md ADDED

@@ -0,0 +1,152 @@
+---
+name: production-review
+description: >
+  Self-review quality gates for video production: post-render verification protocol, pre-delivery
+  checklist, audio verification, visual inspection, severity classification (critical/suggestion/nitpick),
+  review workflow. Use after completing any production stage to verify quality before delivery.
+  Keywords: review, quality, verification, checklist, render, audio check, video check, delivery,
+  QA, quality gate, self-review, post-render
+---
+# Production Review — Quality Gates
+## When to Use
+After completing any major production stage — especially after rendering, before delivering to the user. Read this skill and run through the relevant checklist.
+## Severity Levels
+| Severity | Definition | Action |
+|----------|-----------|--------|
+| **CRITICAL** | Breaks the output, incomplete, or dangerously wrong | Must fix. Blocks delivery. |
+| **SUGGESTION** | Improves quality significantly but doesn't block | Note it, fix if time allows |
+| **NITPICK** | Nice-to-have polish | Log it, move on |
+## Decision Flow
+1. Run the relevant checklist below
+2. Count critical findings
+3. **0 critical** → PASS (note suggestions)
+4. **1+ critical** → REVISE (max 2 revision rounds)
+5. After 2 rounds, still critical → PASS_WITH_WARNINGS (inform user of known issues)
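The decision flow above as a sketch. `review_verdict` is a hypothetical helper, and counting `rounds_done` as completed revision rounds is an assumption about how the loop is tracked.

```python
def review_verdict(critical_count, rounds_done, max_rounds=2):
    """Map a checklist result to PASS, REVISE or PASS_WITH_WARNINGS."""
    if critical_count == 0:
        return "PASS"  # note any suggestions alongside the pass
    if rounds_done < max_rounds:
        return "REVISE"
    return "PASS_WITH_WARNINGS"  # tell the user about the known issues
```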
+---
+## Post-Render Verification (Video)
+### Step 1: Probe the Output (GATE — blocks all other steps)
+```bash
+ffprobe -v quiet -print_format json -show_format -show_streams rendered_video.mp4
+```
+Verify ALL of:
+- [ ] Video stream exists with correct resolution and FPS
+- [ ] **Audio stream exists** — if missing, STOP. Fix audio config, re-render
+- [ ] Duration within +/-5% of target
+- [ ] File size is reasonable (not 0 bytes, not suspiciously small)
+**If audio stream is missing, do NOT proceed.** Most common cause: audio sources mixed externally but never embedded in the composition.
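A sketch that turns the Step 1 gate into code by parsing the JSON from the ffprobe command above. You supply the target resolution, frame rate and duration. Because the step is marked as a gate, every failed check is reported as CRITICAL. The checklist gives no threshold for "suspiciously small", so only an empty file is flagged. `probe` and `gate_findings` are illustrative names.

```python
import json
import subprocess


def probe(path):
    """Run the ffprobe command above and return its parsed JSON."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def gate_findings(info, target_s, width, height, fps, tolerance=0.05):
    """Return CRITICAL findings for Step 1; an empty list means the gate passes."""
    streams = info.get("streams", [])
    video = [s for s in streams if s.get("codec_type") == "video"]
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    findings = []
    if not audio:
        findings.append("CRITICAL: no audio stream")
    if not video:
        findings.append("CRITICAL: no video stream")
    else:
        v = video[0]
        if (v.get("width"), v.get("height")) != (width, height):
            findings.append(f"CRITICAL: resolution {v.get('width')}x{v.get('height')}")
        num, den = (int(x) for x in v.get("r_frame_rate", "0/1").split("/"))
        if den == 0 or abs(num / den - fps) > 0.01:
            findings.append(f"CRITICAL: frame rate {v.get('r_frame_rate')}")
    fmt = info.get("format", {})
    duration = float(fmt.get("duration", 0))
    if abs(duration - target_s) > tolerance * target_s:  # outside +/-5% by default
        findings.append(f"CRITICAL: duration {duration:.2f}s vs target {target_s}s")
    if int(fmt.get("size", 0)) == 0:
        findings.append("CRITICAL: empty file")
    return findings
```

For a 30 s vertical render, `gate_findings(probe("rendered_video.mp4"), 30, 1080, 1920, 30)` returns an empty list when the gate passes.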
+### Step 2: Extract Review Frames
+Sample one frame every 5 seconds (check that every scene, including short ones, gets at least one frame) and visually inspect:
+```bash
+ffmpeg -i rendered_video.mp4 -vf "fps=1/5" frame_%04d.png
+```
+- [ ] No visual artifacts or glitches
+- [ ] Text overlays readable and within safe zones
+- [ ] Color grade consistent across scenes
+- [ ] No black frames or flash frames at cuts
+### Step 3: Audio Verification
+- [ ] Play back and confirm narration is audible over music
+- [ ] No audio pops or clicks at cut points
+- [ ] Music volume appropriate (18-20 dB below dialogue)
+- [ ] Audio loudness within platform target (-14 LUFS for social)
+### Step 4: Present Review to User
+Structured summary with: file stats, audio verification, visual findings, caption status.
+---
+## Pre-Delivery Checklist by Content Type
+### Explainer Video
+- [ ] Hook lands in first 3 seconds
+- [ ] Core concept clearly explained (the "aha" moment)
+- [ ] Captions present and synced
+- [ ] Background music doesn't overpower narration
+- [ ] Duration matches target (+/-10%)
+- [ ] Output plays correctly on target platform
+### Short-Form (TikTok/Reels/Shorts)
+- [ ] 9:16 aspect ratio, 1080x1920
+- [ ] Important content within safe zones (900x1400)
+- [ ] Hook in first 1-2 seconds
+- [ ] Captions mandatory (85% watch muted)
+- [ ] File size under platform limit
+- [ ] H.264 High Profile, 8+ Mbps
+### Talking Head
+- [ ] Filler words removed
+- [ ] No awkward jump cuts (covered by B-roll or transition)
+- [ ] Speaker's face never covered by overlays
+- [ ] Audio clean — no background noise
+- [ ] Eye-level framing maintained
+### Music/Audio
+- [ ] Correct duration
+- [ ] Instrumental if for background use
+- [ ] BPM matches content energy
+- [ ] No clipping or distortion
+- [ ] Loudness normalized to target
+---
+## Remotion-Specific Verification
+Before declaring a Remotion render complete:
+- [ ] Run `composition_validator` before rendering
+- [ ] All `staticFile()` references resolve to existing assets
+- [ ] Composition duration matches sum of scene durations minus transition overlaps
+- [ ] No CSS animations used (must use `useCurrentFrame()` + `interpolate()`)
+- [ ] No Tailwind `animate-*` classes (break frame-based rendering)
+- [ ] `interpolate()` calls use `extrapolateLeft: 'clamp', extrapolateRight: 'clamp'`
+- [ ] Audio layers in sync with visual scenes
+- [ ] Theme colors match the active style
+- [ ] Text scenes use Remotion components, NOT AI-generated images with text
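The duration item above is plain arithmetic. This is a sketch that assumes scene and transition lengths are measured in frames. It also assumes each transition overlaps the two scenes it joins, as transitions in Remotion's `TransitionSeries` do. Use 0 for a hard cut. The helper names are hypothetical.

```python
def expected_frames(scene_frames, transition_frames):
    """Sum of scene lengths minus the overlap consumed by each transition."""
    if len(transition_frames) != max(0, len(scene_frames) - 1):
        raise ValueError("give one transition length (0 for a hard cut) per scene boundary")
    return sum(scene_frames) - sum(transition_frames)


def duration_matches(composition_frames, scene_frames, transition_frames):
    """True when the composition's durationInFrames equals the expected total."""
    return composition_frames == expected_frames(scene_frames, transition_frames)
```

For scenes of 90, 150 and 120 frames joined by two 15-frame transitions, the composition should be 330 frames long.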
+## Review Log Format
+When logging a review finding:
+```
+[SEVERITY] Finding description
+  - What: specific issue observed
+  - Where: timestamp or scene reference
+  - Fix: recommended action
+```
+---
+## Kolbo MCP Integration
+Use these tools during review:
+| Review Step | Kolbo MCP Tool | What to Check |
+|-------------|---------------|---------------|
+| Audio verification | `transcribe_audio` | Transcribe the rendered video — if 0 words, audio is silent |
+| Visual analysis | `chat_send_message` + Gemini | "Review this video for quality issues" |
+| Credit check | `check_credits` | Verify budget before re-renders |
+**Post-render verification with Kolbo:**
+1. `ffprobe` the output (always first — check streams exist)
+2. `transcribe_audio` the rendered video → compare word count to script
+3. If word count < 80% of script → audio is cut off → investigate
+4. `chat_send_message` with Gemini + video URL → visual quality review
+5. Present structured findings to user
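Steps 2 and 3 above as a sketch. `transcript_text` is assumed to be the plain-text transcript from `transcribe_audio`. The word count is only a rough proxy, since speech recognition can merge or split words. The 80% threshold comes from step 3. The helper names are hypothetical.

```python
import re


def word_count(text):
    """Count word-like tokens (letters, digits, apostrophes)."""
    return len(re.findall(r"[\w']+", text))


def audio_coverage(script_text, transcript_text, threshold=0.8):
    """Return (ratio, verdict) comparing transcribed words to script words."""
    expected = word_count(script_text)
    if expected == 0:
        raise ValueError("script is empty")
    heard = word_count(transcript_text)
    ratio = heard / expected
    if heard == 0:
        return ratio, "CRITICAL: silent audio"
    if ratio < threshold:
        return ratio, "CRITICAL: audio likely cut off"
    return ratio, "OK"
```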
+**Re-generation workflow (if review finds critical issues):**
+1. Identify the failed asset (video clip, audio, image)
+2. Re-generate with adjusted prompt via the appropriate Kolbo MCP tool
+3. Re-compose with FFmpeg or Remotion
+4. Run review again (max 2 revision rounds)