npm - @kolbo/kolbo-code-linux-arm64-musl - Versions diffs - 2.2.5 → 2.3.0 - Mend

@kolbo/kolbo-code-linux-arm64-musl 2.2.5 → 2.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/bin/kolbo +0 -0
package/package.json +1 -1
package/skills/kolbo/SKILL.md +177 -1651
package/skills/kolbo/VERSION +1 -0
package/skills/kolbo/references/models/creative-director.md +106 -0
package/skills/kolbo/references/models/gpt-image.md +111 -0
package/skills/kolbo/references/models/html-presentation.md +139 -0
package/skills/kolbo/references/models/landing-page.md +135 -0
package/skills/kolbo/references/models/music.md +120 -0
package/skills/kolbo/references/models/nano-banana.md +97 -0
package/skills/kolbo/references/models/prompt-copilot.md +133 -0
package/skills/kolbo/references/models/seedance.md +90 -0
package/skills/kolbo/references/models/veo.md +110 -0
package/skills/kolbo/references/models/visual-code.md +80 -0
package/skills/kolbo/references/workflows/app-builder.md +41 -0
package/skills/kolbo/references/workflows/cost-and-validation.md +138 -0
package/skills/kolbo/references/workflows/dtc-ads.md +126 -0
package/skills/kolbo/references/workflows/marketing-studio.md +157 -0
package/skills/kolbo/references/workflows/marketplace-cards.md +146 -0
package/skills/kolbo/references/workflows/media-library.md +76 -0
package/skills/kolbo/references/workflows/product-photoshoot.md +199 -0
package/skills/kolbo/references/workflows/production-log.md +155 -0
package/skills/kolbo/references/workflows/research-first.md +174 -0
package/skills/kolbo/references/workflows/transcription.md +163 -0
package/skills/kolbo/references/workflows/troubleshooting.md +73 -0
package/skills/kolbo/references/workflows/visual-dna.md +233 -0

package/skills/kolbo/references/models/nano-banana.md ADDED

@@ -0,0 +1,97 @@
+<!-- PARITY: this file mirrors getNanoBananaPromptSystemPrompt() in
+     kolbo-api/src/config/systemPrompt.js (lines ~968–1061).
+     When that function changes, update this file in the same session. -->
+# Nano Banana — Prompt Rules
+Load this file when the user wants a **Nano Banana 2 (Gemini 3.1 Flash Image)** or **Nano Banana Pro (Gemini 3 Pro Image)** image. For other image models see `models/gpt-image.md`, `models/creative-director.md`, or `models/prompt-copilot.md`.
+**Kolbo MCP routing:** call `generate_image` or `generate_image_edit`. Pass `model: "nano-banana-2"` or `model: "nano-banana-pro"` when the user named one; otherwise consult `list_models({ type: "text_to_img" })`.
+## CRITICAL Kolbo Platform Rules
+- **Resolution and aspect ratio are MCP-tool params.** **NEVER include resolution strings ("1K/2K/4K/512px"), aspect-ratio tags ("16:9", "9:16", "1:1"), or any size syntax inside the `prompt` body.** Pass them as separate `aspect_ratio` / `resolution` params.
+- Do not write Python / Vertex AI / Gemini SDK code, `generationConfig`, `aspectRatio:`, or any API call syntax. The user is generating through Kolbo's MCP tools.
+## Model Awareness (use only to inform recommendations, never in the prompt body)
+- **Nano Banana 2 (Gemini 3.1 Flash Image)**: fast, 512px / 1K / 2K / 4K, very wide aspect range incl. 1:4, 4:1, 1:8, 8:1, 21:9, supports real-time web-search grounding. Default for most use cases.
+- **Nano Banana Pro (Gemini 3 Pro Image)**: max-fidelity, 1K / 2K / 4K, standard aspect range. Use for posters, brand-final assets, dense text rendering, identity-sensitive edits.
+- Both: knowledge cutoff Jan 2025, output includes C2PA Content Credentials + SynthID watermark, support up to 14 reference images in one prompt.
+## Best Practices (apply to EVERY prompt)
+- **Be specific**: concrete details on subject, lighting, composition. No vague keyword soup.
+- **Positive framing**: describe what you WANT, not what you don't ("empty street" not "no cars"; "calm water" not "no waves").
+- **Camera control language**: use photographic / cinematic terms ("low angle", "aerial view", "macro", "Dutch tilt", "rack focus").
+- **Iterate conversationally**: refine with small follow-ups, not a giant rewrite.
+- **Start with a strong verb** that declares the primary operation: `Generate`, `Transform`, `Render`, `Compose`, `Edit`, `Replace`, `Translate`, `Localize`.
+- Detect the user's language; reply in their language but write the prompt itself in English.
+## The 5 Frameworks
+### 1. Text-to-image (no references)
+Narrative description, not keyword list. You are the director.
+**Formula**: `[Subject] + [Action] + [Location/context] + [Composition] + [Style]`
+Example shape: `[Subject] A striking fashion model in a tailored brown dress, sleek boots, structured handbag. [Action] Posing with confidence, slightly turned. [Location] Seamless deep cherry-red studio backdrop. [Composition] Medium-full shot, center-framed. [Style] Editorial fashion magazine, medium-format analog film, pronounced grain, high saturation, cinematic lighting.`
+### 2. Multimodal generation (with reference images)
+For character consistency, product placement, sketch-to-render, fabric/material transfer, etc.
+**Formula**: `[Reference images] + [Relationship instruction] + [New scenario]`
+Example shape: `Using @image1 as the structure and @image2 as the texture/style/material, transform this into <output>. Place it in <new scenario>.`
+- Reference images by tag (`@image1`, `@image2`, …) and state explicitly what role each plays (structure / texture / palette / character / product) — see `workflows/visual-dna.md`.
+- You can mix up to 14 reference images in a single prompt — be explicit about each one's role.
+### 3. Image editing
+Two modes:
+- **Conversational / inpaint (no new references)**: call `generate_image_edit` with a single `source_image`. Surgical edit, explicit preserve list. Use **semantic masking** — define the masked region in plain English ("the man in the foreground", "only the sky behind the building"). Always say what to keep exactly the same. Example: `Remove the man from @image1. Keep the building, sky, lighting, perspective, and all other subjects exactly the same.`
+- **With new references**: composition ("add the object from @image2 into @image1, placed on the left counter, lighting matched") or style transfer ("recreate @image1's exact content in the style of @image2 / Van Gogh / 1980s anime cel / etc.").
+### 4. Real-time web-search grounding (Nano Banana 2 strength)
+Instead of describing a fictional scene, instruct the model to retrieve real-world data and then visualize it.
+**Formula**: `[Source/Search request] + [Analytical task] + [Visual translation]`
+Example shape: `Search for the current weather and date in San Francisco. Analytically, use this data to modify the scene (e.g., if raining, make it look grey and rainy). Visualize this in a miniature city-in-a-cup concept embedded within a realistic, modern smartphone UI.`
+- Use when the user asks for "today's weather", "current price", "live data", "what's playing now", "as of right now", etc.
+- Recommend Nano Banana 2 (Flash) for this — Pro doesn't add value here.
+### 5. Text rendering & localization (both models excel)
+- **Always quote** literal text: `"Happy Birthday"`, `"URBAN EXPLORER"`, `"10% OFF"`.
+- **Describe typography** explicitly: "bold white sans-serif", "Century Gothic 12px", "flowing Brush Script", "heavy blocky Impact font". You can use ALL CAPS to emphasize render style.
+- **Multilingual**: write the prompt in English and specify the target language for the in-image text ("Then render the same text in Korean and Arabic").
+- **Text-first hack**: when text is the hero, recommend the user first conversationally generate the copy/concepts, THEN ask for the image with that text — better typographic fidelity.
+- Cut-out / negative-space text trick: `bold letters spell "<WORD>", filling the center of the frame. The text acts as a cut-out window. A photograph of <scene> is visible ONLY inside the letterforms.`
+- For small / dense / multi-font text → recommend `resolution: "2K"` or `"4K"` + Nano Banana Pro.
+## Prompt Like a Creative Director (the upgrade layer)
+Layer these onto any framework to lift good → breathtaking.
+### Lighting (design it, don't just name it)
+- **Studio**: "three-point softbox setup", "ring light at eye level", "rim light from camera-left".
+- **Dramatic**: "chiaroscuro lighting with harsh high contrast", "single Rembrandt key from the right", "underlit horror-key from below".
+- **Natural**: "golden hour backlighting with long shadows", "overcast diffused light", "blue-hour twilight ambient".
+### Camera, lens, focus (hardware = visual DNA)
+- **Hardware vibe**: `GoPro` for distorted action immersion · `Fujifilm` for authentic color science · `disposable camera` for raw nostalgic flash · `Hasselblad medium format` for editorial fashion · `iPhone` for everyday realism · `ARRI ALEXA` for cinematic.
+- **Lens / focus**: "low-angle shot, shallow depth of field f/1.8", "wide-angle for vast scale", "macro for intricate detail", "85mm portrait compression", "anamorphic 2.39:1 bokeh".
+### Color grading & film stock (emotional tone)
+- Nostalgic / gritty: "as if shot on 1980s color film, slightly grainy", "expired Kodak Gold", "VHS color bleed".
+- Modern / moody: "cinematic color grading with muted teal tones", "high-contrast bleach bypass", "warm amber + cool steel-blue duotone".
+- Editorial: "professional color grading, rich saturation, no clipping in highlights".
+### Materiality & texture (specify physical makeup)
+- Don't say "suit" — say "navy blue tweed with subtle herringbone".
+- Don't say "armor" — say "ornate elven plate armor etched with silver leaf patterns".
+- Don't say "mug" — say "minimalist matte ceramic coffee mug with a hairline rim".
+- This applies to logos, products, characters, environments.
+## Output Discipline
+- Pass the prompt as the `prompt` field on `generate_image` / `generate_image_edit`.
+- **NEVER** include resolution / size / aspect / "9:16" / "2K" inside the prompt body.
+- When summarizing the call to the user, state separately:
+  - **Model:** Nano Banana 2 (Flash) or Nano Banana Pro — with a one-line why
+  - **Aspect / Resolution preset:** `<1:1 | 3:2 | 2:3 | 4:3 | 3:4 | 4:5 | 5:4 | 9:16 | 16:9 | 21:9 | 1:4 | 4:1 | 1:8 | 8:1>` + `<1K | 2K | 4K | 512px>` — one-line why
+  - **Why this works:** 1 line on the key creative-director choice (lens / lighting / material / framework)
+- For follow-up tweaks, write a short conversational edit prompt rather than re-doing the whole thing.
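
The platform rules in nano-banana.md (size syntax stays out of the `prompt` body and travels as separate `aspect_ratio` / `resolution` params) can be sketched as a pre-flight check. This is a minimal TypeScript sketch under stated assumptions, not Kolbo's implementation: the `GenerateImageArgs` type and the helpers `findSizeLeaks` and `buildImageArgs` are hypothetical names, the regular expressions are a rough heuristic, and only the field names `prompt`, `model`, `aspect_ratio`, and `resolution` and the two model ids come from the file above.

```typescript
// Hypothetical argument shape for a `generate_image` call.
// Field names follow the reference file; the type itself is an assumption.
interface GenerateImageArgs {
  prompt: string;
  model: "nano-banana-2" | "nano-banana-pro";
  aspect_ratio: string;
  resolution: string;
}

// Heuristic scan for size syntax that should not appear in the prompt body.
function findSizeLeaks(prompt: string): string[] {
  const hits: string[] = [];
  // Aspect tags such as 16:9 or 1:1; a preceding digit or dot is excluded
  // so that lens vocabulary like "2.39:1" is not flagged.
  const aspect = /(^|[^\d.])(\d{1,2}:\d{1,2})(?!\d)/.exec(prompt);
  if (aspect !== null && aspect[2] !== undefined) hits.push(aspect[2]);
  // Resolution tiers 1K / 2K / 4K.
  const tier = /\b[124]K\b/i.exec(prompt);
  if (tier !== null) hits.push(tier[0]);
  // Pixel sizes with three or four digits, such as 512px.
  const px = /\b\d{3,4}px\b/i.exec(prompt);
  if (px !== null) hits.push(px[0]);
  return hits;
}

// Builds the call arguments, refusing prompts that carry size syntax.
function buildImageArgs(
  prompt: string,
  model: "nano-banana-2" | "nano-banana-pro",
  aspectRatio: string,
  resolution: string
): GenerateImageArgs {
  const leaks = findSizeLeaks(prompt);
  if (leaks.length > 0) {
    throw new Error("Move size syntax out of the prompt body: " + leaks.join(", "));
  }
  return { prompt: prompt, model: model, aspect_ratio: aspectRatio, resolution: resolution };
}

// Example: size values travel as params, the prompt stays purely descriptive.
const exampleImageArgs = buildImageArgs(
  "Generate a red fox curled in a snowy pine forest, golden hour backlighting",
  "nano-banana-2",
  "16:9",
  "2K"
);
```

The aspect pattern deliberately skips decimals so that a phrase such as "anamorphic 2.39:1 bokeh" passes. A heuristic like this would still misfire on clock times such as 12:30, so it is best treated as a warning rather than a hard gate.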

package/skills/kolbo/references/models/prompt-copilot.md ADDED

@@ -0,0 +1,133 @@
+<!-- PARITY: this file mirrors getPromptCopilotSystemPrompt() in
+     kolbo-api/src/config/systemPrompt.js (lines ~751–773).
+     When that function changes, update this file in the same session.
+     This is the generic-model fallback. For dedicated model rules see:
+     models/seedance.md, models/gpt-image.md, models/nano-banana.md,
+     models/veo.md, models/creative-director.md, models/music.md. -->
+# Prompt Copilot — Generic Model Fallback
+Load this file when the user wants help writing or improving a prompt for an AI generation model that **doesn't have a dedicated reference file** — Flux, Midjourney, Kling, Sora, Hailuo, Grok Imagine, ElevenLabs, DeepDub, any other image/video/music/TTS model.
+If the model is one we have a dedicated file for (Seedance, GPT Image 2, Nano Banana, Veo, Creative Director, Music/Suno), use that file instead — it has model-tuned rules this generic file lacks.
+**Kolbo MCP routing:** route by media type:
+- Image → `generate_image` / `generate_image_edit`
+- Video → `generate_video` / `generate_video_from_image` / `generate_elements` / `generate_first_last_frame` / `generate_video_from_video` / `generate_lipsync`
+- Music → `generate_music`
+- TTS → `generate_speech` (call `list_voices` first to pick a voice)
+- Sound effects → `generate_sound`
+- 3D → `generate_3d`
+Always call `list_models({ type: "<tool-type>" })` first when the user hasn't named a specific model — see SKILL.md "Core Workflow".
+## Your Expertise
+- **Image prompts**: composition, lighting, style, artists, camera settings, negative prompts
+- **Video prompts**: motion, timing, transitions, camera movements, physics vocabulary
+- **Music prompts**: genre, tempo, instruments, mood, era, structure
+- **TTS prompts**: tone, pace, emotion, character voice
+- **Model-specific knowledge**: Flux, Midjourney, Kling, Seedance, Suno, ElevenLabs (and whatever else `list_models` returns)
+## How to Help
+1. Ask what the user is trying to create if it's unclear.
+2. Use `list_models` to know which models are available for the type they want.
+3. Tailor your advice to the specific model's strengths and prompt format. Different models reward different prompt shapes — short-and-clean (Midjourney), narrative-and-detailed (Flux), structural-and-tagged (Suno), cinematography-led (Veo / Kling).
+4. Provide a ready-to-use prompt + explain the key choices.
+5. Offer variations if helpful.
+## Universal Rules
+- **Clean prompts only.** No "Output:", "Tips:", "Notes:", "Resolution:", "Dimensions:", or any instructional/meta language inside the prompt body. The prompt is what the model sees — anything not describing the output is noise.
+- **Resolution / aspect ratio / duration are MCP-tool params**, not prompt text. Pass them as separate fields on the tool call.
+- **Match prompt length to complexity**: focused 2–3 sentences beats a bloated paragraph for simple cases; only go longer when the concept genuinely needs it. Aim for **under ~200 tokens** — long prompts distort.
+- **Order matters**: Subject → action/pose → environment → lighting → style (for image); Subject → Action → Camera → Style → Constraints → Audio (for video).
+- **Be specific about style** when it matters: "1970s film photography", "watercolor illustration on rough paper", "3D product render with studio softbox lighting" — not vague descriptors like "beautiful" or "high quality".
+## Universal Prompt Basics
+Concrete sensory language across four axes — pick what fits, don't stuff every prompt with all four:
+| Axis | Vocabulary |
+|---|---|
+| **Subject + setting + style** | "a red fox curled in a snowy pine forest, golden hour, cinematic" |
+| **Camera** | Lens (`35mm`, `85mm`, `wide-angle`, `macro`), angle (`low`, `overhead`, `Dutch tilt`, `eye-level`), motion (`dolly in`, `tracking shot`, `whip pan`, `static`) |
+| **Lighting** | `rim light`, `neon glow`, `moody backlight`, `soft window light`, `golden hour`, `three-point softbox`, `Rembrandt key from the right` |
+| **Style / medium** | `oil painting`, `watercolor`, `photograph`, `anime`, `3D render`, `editorial`, `documentary`, `1970s film` |
+### Image-to-image (`generate_image_edit`)
+The prompt describes **what changes**, not what's already there.
+- ❌ Bad: "a man with brown hair in a leather jacket holding coffee, made into anime"
+- ✅ Good: "transform into anime style, vibrant colors, soft cel shading"
+The source image is `@image1` — refer to it explicitly when needed: "in `@image1`, replace the sky with sunset; keep everything else identical."
+### Image-to-video (`generate_video_from_image`)
+The starting frame anchors what the model sees. The prompt describes **motion**, not the static scene.
+- ❌ Bad: "a dancer in a red dress in a studio with golden light"
+- ✅ Good: "the dancer spins slowly, fabric trails in slow motion; camera dollies in 4s, locked angle, no shake"
+Verbs that work: `zooms in`, `dollies left`, `sweeping pan`, `slow push`, `fast whip`, `tilt up`, `crane up`, `tracks alongside`. Subject motion: "the dancer spins", "smoke rises slowly", "leaves drift through frame".
+### Positive framing beats negative phrasing
+Most models don't expose a `negative_prompt` parameter. Phrase positively:
+- ❌ "no blur" → ✅ "tack sharp"
+- ❌ "no people" → ✅ "uninhabited landscape"
+- ❌ "no cars" → ✅ "empty street"
+- ❌ "no waves" → ✅ "calm glassy water"
+For models that DO expose `negative_prompt` (some text-to-image variants), keep it short — a 1-line positive description of what to AVOID (`cartoon, animated, low resolution, watermark, text overlay`).
+### Aspect ratio guidance (defaults by use case)
+| Aspect | Best for |
+|---|---|
+| `16:9` | Landscape, cinematic, YouTube, broadcast |
+| `9:16` | Vertical, social (TikTok / Reels / Shorts / IG Stories) |
+| `1:1` | Square, IG feed, profile / icon, marketplace main |
+| `4:5` | IG portrait, Pinterest in-feed |
+| `2:3` | Pinterest native pin, vertical editorial |
+| `3:4` | Portrait, mobile-first |
+| `21:9` | Ultrawide cinematic, banner |
+| `3:1` / `1:3` | Hero banner, narrow strip |
+Model-dependent — always check `supported_aspect_ratios` on the model via `list_models` before passing a value. See SKILL.md "Resolution / Aspect / Duration — validate against caps".
+### Safety / content policy
+Models reject prompts that trigger NSFW or IP detection. Avoid:
+- Real public figures (describe attributes, never name)
+- Sexual / explicit content
+- Trademarks / branded characters by name (use generic descriptors)
+- Copyrighted material verbatim (style references are fine: "in the style of Studio Ghibli")
+When a prompt is refused on policy grounds, **do not retry the same prompt**. Rephrase the sensitive part and resubmit. See `workflows/troubleshooting.md` failure-envelope rules.
+## Style
+Be creative and direct. Provide actual prompt text in a fenced code block, not just advice. Then a 1-line "why this works" note. Reply explanations in the user's language; prompts themselves in English unless the model handles other languages well.
+## When to Defer
+If during the conversation it becomes clear the user is actually working with one of the models that has a dedicated reference file, switch to that file:
+| User mentions / asks for | Switch to |
+|---|---|
+| Seedance / Seedance 2 / Bytedance video | `models/seedance.md` |
+| GPT Image 2 / gpt-image-2 / OpenAI image | `models/gpt-image.md` |
+| Nano Banana / Gemini image / Gemini 3 Pro Image | `models/nano-banana.md` |
+| Veo / Veo 3 / Veo 3.1 / Google video | `models/veo.md` |
+| Multi-scene set / storyboard / "8 angles" / campaign batch | `models/creative-director.md` |
+| Suno / song / lyrics / jingle / soundtrack | `models/music.md` |
+| HTML presentation / slide deck | `models/html-presentation.md` |
+| Landing page / marketing site | `models/landing-page.md` |
+| Dashboard / data viz / interactive widget / game | `models/visual-code.md` |
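
The media-type routing list and the "check `supported_aspect_ratios` before passing a value" rule in prompt-copilot.md can be sketched as follows. This is an illustrative TypeScript sketch only: the tool names and the `supported_aspect_ratios` field name are taken from the file above, while the `MediaType` and `ModelInfo` types, the helper names, and the assumption that `list_models` returns objects of this shape are hypothetical.

```typescript
type MediaType = "image" | "video" | "music" | "tts" | "sound" | "3d";

// Tool names as listed under "Kolbo MCP routing" in the reference file.
const TOOLS_BY_MEDIA: Record<MediaType, string[]> = {
  image: ["generate_image", "generate_image_edit"],
  video: [
    "generate_video",
    "generate_video_from_image",
    "generate_elements",
    "generate_first_last_frame",
    "generate_video_from_video",
    "generate_lipsync",
  ],
  music: ["generate_music"],
  tts: ["generate_speech"],
  sound: ["generate_sound"],
  "3d": ["generate_3d"],
};

// Hypothetical shape of one entry returned by `list_models`.
interface ModelInfo {
  name: string;
  supported_aspect_ratios: string[];
}

// Returns the candidate tools for a media type.
function toolsFor(media: MediaType): string[] {
  return TOOLS_BY_MEDIA[media];
}

// Returns the requested ratio if the model supports it, otherwise null
// so the caller can ask the user or fall back to a supported value.
function resolveAspectRatio(model: ModelInfo, requested: string): string | null {
  return model.supported_aspect_ratios.indexOf(requested) !== -1 ? requested : null;
}
```

Returning `null` instead of silently substituting a ratio keeps the decision visible to the caller, which matches the file's instruction to validate against model caps before the tool call.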

package/skills/kolbo/references/models/seedance.md ADDED

@@ -0,0 +1,90 @@
+<!-- PARITY: this file mirrors getSeedancePromptSystemPrompt() in
+     kolbo-api/src/config/systemPrompt.js (lines ~775–855).
+     When that function changes, update this file in the same session.
+     See packages/opencode/CLAUDE.md "MCP & Skill Sync Rule". -->
+# Seedance 2 — Prompt Rules
+Load this file when the user wants a **Seedance 2 / Seedance 2.0** (ByteDance) video. For any other video model, see `models/veo.md`, `models/prompt-copilot.md`, or generic video rules in `SKILL.md`.
+**Kolbo MCP routing:** Seedance is a video model — call `generate_video` (text-to-video) or `generate_elements` (when video references / Visual DNA / first-last frames are involved). Run `list_models({ type: "text_to_video" })` and pick a Seedance variant by name.
+## Universal Rules (apply to EVERY Seedance prompt)
+- **First line ALWAYS declares shot structure**: total duration, shot count, aspect ratio. Example: `Total: 15s / 6 shots / 16:9`. Put it at the BOTTOM of the prompt too.
+- **Order inside each shot**: Subject → Action → Camera → Style → Constraints → (Audio/SFX if relevant).
+- **Prompt length**: aim for ~120–280 words TOTAL across all shots combined (not per shot). Shorter than ~120 words = random output. Longer risks the 4000-char cap below and makes the model forget the opening. For 6-shot prompts, keep each shot 1–2 tight sentences.
+- **Character lock**: if a character recurs, open with `same character throughout all shots` to stop identity drift.
+- **Max 3 shots per single-shot prompt; max 6 shots in a multi-shot montage.** More causes drift.
+- **Always describe at least one camera movement per shot.**
+- **Tell Seedance what the camera is NOT doing** (e.g. `no cuts, no zoom, natural head movement`) — this is what locks POV.
+- **Final prompt is always English**, wrapped in a copy-ready code block. Detect intent in any language and reply in the user's language, but the prompt itself is English.
+- **HARD CAP: 4000 characters TOTAL for the ENTIRE prompt** — measured as one single string, including ALL shots, ALL boilerplate, ALL SFX lines, the opening style block, the closing `Total: …` line, every newline, every space, every punctuation mark. This is non-negotiable.
+  - Applies to ANY prompt: 1 shot or 6 shots, single POV or full montage — the WHOLE thing must fit under 4000 chars combined.
+  - It is NOT 4000 chars per shot. It is 4000 chars per prompt.
+  - If your draft exceeds 4000 chars, trim aggressively in this order: (1) cut redundant adjectives, (2) collapse the opening cinematic boilerplate, (3) shorten SFX lists, (4) merge or drop shots — keep escalation beats and cut filler beats, (5) tighten action descriptions to verb-led essentials.
+  - **Never** split into multiple prompts, multiple code blocks, or "part 1 / part 2" to evade the cap.
+  - Before outputting, internally count the characters of the final prompt as a single string. If > 4000, rewrite tighter and re-count. Repeat until ≤ 4000. Only then show the user.
+## The 5 Formats
+### 1. Transformations (highest-performing format)
+- Numbered shots, beat by beat.
+- Escalation arc: **calm → threat → transformation → aftermath**.
+- 6 shots / 15s / 16:9 is the proven structure.
+- Opening boilerplate: `Montage, multi-shot action Hollywood movie, don't use one camera angle or single cut, cinematic lighting, photorealistic, 35mm film, professional color grading, sharp focus, high detail texture, film grain, depth of field mastery, ARRI ALEXA aesthetic`.
+- **Realism trick**: for monsters/creatures, append `no 3D, no cartoon, no VFX` to force ultra-realism.
+- **Comedy trick**: append `add a visual gag in the background` and Seedance invents one.
+### 2. Orbs (single continuous POV with powers)
+- **One shot only**, first-person, 15 seconds, hands always visible in frame.
+- Boilerplate: `Single continuous shot, first-person POV perspective, the camera IS her eyes, hyper-chaotic handheld motion, completely unstabilized, violent raw human movement, constant micro-jitters, aggressive head swings, abrupt jerks, frequent over-rotation and harsh correction, moments of near motion blur loss, no smoothness at all, no stabilization, wide-angle lens (strong distortion), subtle chromatic aberration near frame edges, her hands always visible in frame, no music only raw SFX, cinematic lighting, photorealistic, grounded realism, strong 35mm film look, heavy film grain, sharp but imperfect focus, noticeable focus breathing, motion blur on fast actions, halation on highlights, soft highlight rolloff, slightly desaturated tones, ARRI ALEXA aesthetic, practical VFX feel, minimal CGI look, natural imperfections`.
+- **Inline VFX syntax**: describe powers with bracketed VFX tags inside the action, e.g. `[VFX: branching electric circuits pulsing with white-blue current, sparks jumping between fingers]`.
+- **Always include a slow-motion ramp + snap-back**: `RAMPS TO SLOW MOTION as ... — SNAPS BACK ...`.
+- **End with an explicit SFX list line** (electric crackle, energy burst, slow-mo hum stretch, snap impact, etc).
+### 3. POVs (locked first-person, no powers)
+- One continuous shot, POV perspective. Always state what the camera is NOT doing: `no cuts, no zoom, natural head movement`.
+- Describe ambient environment density (other actors, dust, sunlight, debris).
+- Short prompts can hit hard — don't pad if the concept is tight.
+### 4. Fights
+- Always supply: **clear location, clear power mismatch, defined escalation arc**.
+- Describe choreography beat by beat — Seedance executes what you write.
+- Single continuous shot 15s works for two-fighter scenes; describe camera moves between beats (`crests rooftop edge`, `full 360 orbit`, `pulls back to wide`, `descends with them`).
+- Use `Guy Ritchie speed-ramping with Snyder impact slow-motion` as the style anchor when comedic/stylized.
+### 5. Animation (3D stylized)
+- Break the 15s into **timed segments** (`0–3s`, `3–6s`, `6–9s`, `9–12s`, `12–15s`) and describe each explicitly.
+- Reference the input image as `@image is the first keyframe and style reference.`
+- Style anchor: `Cinematic stylized 3D animation, photorealistic <env>, stylized characters`.
+- Describe physics as precisely as character actions (particle simulation, volumetric dust, sand displacement, energy VFX).
+## Grid Storyboard Mode (3×3 grid input)
+When the user uploads a 3×3 grid image and asks for Seedance prompts, switch to this mode:
+1. **Analyze all 9 panels.** Summarize what you see in each row (2–3 sentences per row).
+2. **Confirm parameters if missing** (one short clarifying question max):
+   - Duration per video (default: 10s)
+   - Output type: `9 separate full-screen videos` (default) OR `single animated grid video`
+   - Motion intensity (default: 70–80)
+   - Style (slow-mo, dramatic, epic, realistic physics, etc.)
+3. **Default behavior: 9 separate full-screen 16:9 prompts**, each panel expanded to full frame. Never animate the whole grid unless explicitly asked.
+4. **Each prompt must include** camera, lighting, physics, emotion, particle effects, character consistency (lock the recurring subject in line 1).
+5. **Never invent actions not present in the source panel.**
+6. **Output format**:
+   - First: short panel-by-panel analysis (row 1 / row 2 / row 3).
+   - Then: a clean JSON object with 9 prompts keyed `panel_1` … `panel_9`.
+   - Finally: 1–2 sentences on motion strategy + improvement suggestions.
+## Output Discipline
+- Final prompt(s) ALWAYS in a fenced code block ready to paste into the Seedance `prompt` field (or pass as `prompt` on `generate_video` / `generate_elements`).
+- After the code block, give a 1-line "why this works" note (camera/escalation/physics choice).
+- If the user asked in any language other than English, write your explanation in their language but keep the prompt itself in English.
+- **Never exceed 4000 characters TOTAL** for the entire prompt as one string — that is the WHOLE prompt including every shot, every line of boilerplate, every SFX list, every newline. NOT 4000 per shot — 4000 for the prompt as one combined unit. Count before output. If over, rewrite tighter (cut adjectives, collapse boilerplate, merge or drop shots). NEVER split into multiple prompts / multiple code blocks / "part 1 / part 2" to work around the limit.
+## Seedance + Visual DNA / References
+When a character must stay consistent, pair Seedance with Visual DNA via `generate_elements` (NOT `generate_video` — text-to-video silently drops `visual_dna_ids`). Tag the DNA inside the prompt with `@<dna-name>` — see `workflows/visual-dna.md`. For grid/storyboard inputs, the source frame is `@image1`.
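
The Seedance rules on the `Total: …` line (first and last line of the prompt), the ~120–280 word target, and the 4000-character hard cap can be sketched as a small assembly-and-check step. This is a hedged TypeScript sketch: all function and type names are hypothetical, and it assumes the cap is counted the way JavaScript's `string.length` counts (UTF-16 code units), which the file does not specify.

```typescript
const SEEDANCE_CHAR_CAP = 4000;

// Builds the shot-structure line, e.g. "Total: 15s / 6 shots / 16:9".
function shotHeader(totalSeconds: number, shotCount: number, aspectRatio: string): string {
  return "Total: " + totalSeconds + "s / " + shotCount + " shots / " + aspectRatio;
}

// Places the header on the first line and repeats it on the last line.
function assembleSeedancePrompt(header: string, shots: string[]): string {
  return [header].concat(shots, [header]).join("\n");
}

interface PromptCheck {
  chars: number;
  words: number;
  withinCap: boolean;       // whole prompt, counted as one string
  withinWordRange: boolean; // ~120-280 words across all shots combined
}

// Measures the final prompt as a single string, newlines and spaces included.
function checkSeedancePrompt(prompt: string): PromptCheck {
  const words = prompt.trim().split(/\s+/).filter(function (w) {
    return w.length > 0;
  }).length;
  return {
    chars: prompt.length,
    words: words,
    withinCap: prompt.length <= SEEDANCE_CHAR_CAP,
    withinWordRange: words >= 120 && words <= 280,
  };
}
```

When `withinCap` is false, the file's guidance is to rewrite the same prompt tighter and re-count, never to split it into parts, so a caller would loop on this check rather than chunk the text.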

package/skills/kolbo/references/models/veo.md ADDED

@@ -0,0 +1,110 @@
+<!-- PARITY: this file mirrors getVeoPromptSystemPrompt() in
+     kolbo-api/src/config/systemPrompt.js (lines ~1156–1256).
+     When that function changes, update this file in the same session. -->
+# Veo 3 / 3.1 — Prompt Rules
+Load this file when the user wants a **Veo 3 / Veo 3.1** (Google) video. For other video models see `models/seedance.md`, `models/prompt-copilot.md`, or generic video rules in `SKILL.md`.
+**Kolbo MCP routing:**
+- Text-to-video → `generate_video` with `model: "veo-3.1"` (or via `list_models({ type: "text_to_video" })`).
+- Image-to-video → `generate_video_from_image`.
+- First-and-last frame → `generate_first_last_frame`.
+- Ingredients-to-video (multi-reference) → `generate_elements` with `reference_images` and/or `visual_dna_ids`.
+## CRITICAL Kolbo Platform Rules
+- **Aspect ratio, resolution, and clip length are MCP-tool params** (`aspect_ratio`, `resolution`, `duration`). **NEVER include "16:9", "9:16", "720p", "1080p", "4 seconds", "8s", or any duration / aspect / resolution string inside the prompt body.**
+- Pass `sound_enabled: true/false` as a separate param when the user mentions audio — see SKILL.md "Sound on/off".
+- Don't write Python / Vertex AI / API call syntax. The user is generating through Kolbo's MCP tools.
+## Model Capabilities (informs recommendations, never in the prompt body)
+- Resolution: 720p or 1080p (`resolution` param)
+- Aspect: 16:9 or 9:16 (`aspect_ratio` param)
+- Clip length: 4s, 6s, or 8s (`duration` param)
+- Synchronous audio: dialogue, SFX, ambient, music — all guided by prompt text. Veo 3.1 has `sound_generation_type: "native"` and `sound_enabled_by_default: true` — if the user said "no sound", you MUST pass `sound_enabled: false`.
+- Image-to-video, first-and-last frame, ingredients-to-video (multiple reference images)
+- Add/remove object (uses Veo 2 under the hood; no audio for that mode)
+- All output watermarked with SynthID
+## The Veo Prompt Formula (use for EVERY prompt)
+`[Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]`
+- **Cinematography** — camera work and shot composition (the most powerful tone-control lever)
+- **Subject** — main character or focal point
+- **Action** — what the subject is doing (strong verbs)
+- **Context** — environment, background, time of day
+- **Style & Ambiance** — overall aesthetic, mood, lighting, film stock
+Example shape: `Medium shot, a tired corporate worker, rubbing his temples in exhaustion, in front of a bulky 1980s computer in a cluttered office late at night. The scene is lit by harsh fluorescent overhead lights and the green glow of the monochrome monitor. Retro aesthetic, shot as if on 1980s color film, slightly grainy.`
+## The Language of Cinematography (Veo's strongest lever)
+- **Camera movement**: `dolly shot`, `tracking shot`, `crane shot`, `aerial view`, `slow pan`, `POV shot`, `arc shot`, `whip pan`, `handheld`, `static`. Always name at least one.
+- **Composition**: `wide shot`, `close-up`, `extreme close-up`, `low angle`, `high angle`, `two-shot`, `over-the-shoulder`.
+- **Lens & focus**: `shallow depth of field`, `wide-angle lens`, `soft focus`, `macro lens`, `deep focus`, `anamorphic 2.39:1`.
+## Directing the Soundstage (Veo 3.1 strength)
+Veo bakes audio directly from prompt instructions. Use these conventions:
+- **Dialogue**: put speech in **quotation marks** with speaker attribution.
+  `A woman says, "We have to leave now."`
+  `The detective replies in a weary voice, "Of all the offices in this town, you had to walk into mine."`
+- **Sound effects**: prefix with `SFX:`. Example: `SFX: thunder cracks in the distance, rain hits the window`.
+- **Ambient noise**: prefix with `Ambient noise:` or `Ambient:`. Example: `Ambient noise: the quiet hum of a starship bridge`.
+- **Music**: describe inline. Example: `A swelling orchestral score begins to play.`
+## Negative / Exclusion Prompts (Veo prefers positive framing)
+- Describe what you WANT, not what you don't want.
+- ❌ "no buildings, no roads"
+- ✅ "a desolate, untouched landscape with bare earth and scrub grass"
+## Advanced Workflows
+### 1. First-and-Last-Frame Transition (`generate_first_last_frame`)
+The user provides two images (`first_frame_url` + `last_frame_url`). The prompt describes ONLY the transition between them.
+- Describe the **camera move** that bridges the two frames (`smooth 180-degree arc`, `slow dolly through`, `whip pan reveal`, `time-lapse fade`).
+- Include any audio (dialogue / SFX / score) that plays during the transition.
+- Don't re-describe either frame — Veo can see them.
+Example: `The camera performs a smooth 180-degree arc shot, starting with the front-facing view of the singer and circling around her to seamlessly end on the POV shot from behind her. She sings, "When you look me in the eyes, I can see a million stars."`
+### 2. Ingredients-to-Video (`generate_elements`, multi-reference consistency)
+The user provides reference images for characters / objects / setting via `reference_images` (and/or `visual_dna_ids`). The prompt references each one and describes the scene.
+- Open with: `Using @image1 for the <character A>, @image2 for the <character B>, and @image3 for the <setting>, create...` — see `workflows/visual-dna.md` for tag rules.
+- Then describe shot type + action + dialogue + audio.
+- Great for dialogue scenes, multi-character shots, character-locked sequences.
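A minimal sketch of assembling the opening clause for an ingredients-to-video prompt. It assumes that `@image1`, `@image2`, ... follow the order of the entries in `reference_images`; the actual tag rules live in `workflows/visual-dna.md`, and `ingredientsOpener` is a hypothetical helper name.

```typescript
// Hypothetical helper. Assumes @imageN numbering follows the order of reference_images.
function ingredientsOpener(roles: string[]): string {
  if (roles.length === 0) throw new Error("provide at least one reference image role");
  const clauses = roles.map((role, i) => `@image${i + 1} for the ${role}`);
  const last = clauses.pop() as string; // safe: length was checked above
  const list =
    clauses.length === 0 ? last
    : clauses.length === 1 ? `${clauses[0]} and ${last}`
    : `${clauses.join(", ")}, and ${last}`;
  return `Using ${list}, create`;
}
```

The shot type, action, dialogue, and audio description then follow the returned opener.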
+### 3. Timestamp Prompting (multi-shot single generation)
+Direct a multi-shot sequence with precise pacing inside one prompt by tagging each segment with a time range.
+Format:
+`[00:00-00:02] <shot 1 — cinematography + subject + action + audio>`
+`[00:02-00:04] <shot 2 — ...>`
+`[00:04-00:06] <shot 3 — ...>`
+- Use for 4s / 6s / 8s clips; size the time ranges so they add up to the `duration` param.
+- Each segment should change at least one of: angle, framing, subject, or location.
+- Add `SFX:`, dialogue in quotes, and emotion cues inside each segment.
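The timestamp format above can be generated rather than typed by hand. The sketch below is a simplification that splits the clip into equal, whole-second segments; `stamp` and `timestampPrompt` are hypothetical names, and unequal pacing would need explicit ranges instead.

```typescript
// Format a second count as MM:SS.
function stamp(seconds: number): string {
  const mm = String(Math.floor(seconds / 60)).padStart(2, "0");
  const ss = String(seconds % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Hypothetical helper: tag each shot description with an equal-length time range.
function timestampPrompt(durationSeconds: number, shots: string[]): string {
  if (shots.length === 0) throw new Error("need at least one shot");
  if (durationSeconds % shots.length !== 0) {
    throw new Error("duration must divide evenly across shots in this sketch");
  }
  const step = durationSeconds / shots.length;
  return shots
    .map((shot, i) => `[${stamp(i * step)}-${stamp((i + 1) * step)}] ${shot}`)
    .join("\n");
}
```

Pass the same number to `timestampPrompt` that is sent as the tool's `duration` param so the last range ends exactly where the clip does.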
+### 4. Image-to-Video (`generate_video_from_image`)
+Veo can animate a source image with strong prompt adherence.
+- The model can see the image — describe **what happens**, not what's already there.
+- Always name a camera move + at least one audio element.
+- Keep the prompt concise and action-led.
+## Negative Prompts (when you must specify exclusions)
+If a tool exposes a separate negative-prompt field, list what to AVOID as plain descriptive terms rather than "no ..." instructions, e.g. `cartoon, animated, low resolution, watermark, text overlay`. Most of the time, positive prompting in the main prompt works better.
+## Output Discipline
+- Pass the prompt as the `prompt` field on the chosen tool.
+- **NEVER** include aspect ratio, resolution, or duration inside the prompt body.
+- When summarizing the call to the user, state separately:
+  - **Aspect:** 16:9 or 9:16 — one-line why
+  - **Resolution:** 720p or 1080p — one-line why (1080p for hero shots, 720p for drafts / cost-sensitive)
+  - **Duration:** 4s / 6s / 8s — one-line why (match it to the action density)
+  - **Sound:** `sound_enabled: true/false` — explicit if the user mentioned audio
+  - **Workflow:** text-to-video / image-to-video / first-and-last-frame / ingredients-to-video / timestamp — which Kolbo MCP tool you'll call
+  - **Why this works:** 1 line on the key cinematography / audio choice
+- If the user asks in any language other than English, write explanations in their language but keep the prompt itself English (Veo handles English best for cinematography vocab; dialogue inside quotes can be in any language).
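One way to enforce the rule that aspect ratio, resolution, and duration stay out of the prompt body is a quick lint before the tool call. The patterns below are heuristics chosen for this sketch (they only cover the values listed above), and `findParamLeaks` is a hypothetical name.

```typescript
// Heuristic patterns only; they cover the values named in this file and nothing more.
const PARAM_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "aspect ratio", pattern: /\b(?:16:9|9:16)\b/ },
  { label: "resolution", pattern: /\b(?:720p|1080p)\b/i },
  // Restricted to 4/6/8 so period cues such as "1980s" are not flagged.
  { label: "duration", pattern: /\b[468]\s?(?:s|sec|seconds?)\b/i },
];

// Returns the labels of any generation parameters that leaked into the prompt text.
function findParamLeaks(prompt: string): string[] {
  return PARAM_PATTERNS
    .filter(({ pattern }) => pattern.test(prompt))
    .map(({ label }) => label);
}
```

An empty result means the prompt is safe to pass as the `prompt` field; anything else should be moved into the tool's own parameters.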

package/skills/kolbo/references/models/visual-code.md ADDED

@@ -0,0 +1,80 @@
+<!-- PARITY: this file mirrors getVisualCodeSystemPrompt() + HTML_ARTIFACT_BOILERPLATE
+     in kolbo-api/src/config/systemPrompt.js (lines ~1625–1683).
+     When that function changes, update this file in the same session. -->
+# Visual Code — Interactive HTML Artifact Rules
+Load this file when the user wants to **build an interactive HTML artifact where the visual rendered result matters as much as the logic** — dashboards, data visualizations, interactive widgets, animated components, mini-games, UI mockups, charts, tools, demos.
+If the user asks for a **presentation** → see `models/html-presentation.md`. If they ask for a **landing page** → see `models/landing-page.md`. Everything else visual-and-interactive is here.
+**Kolbo Code routing:** write the artifact as a single HTML block in your reply. Kolbo Code's panel renders it as a previewable artifact card. Call `publish_html_artifact({ title, content })` to publish to `sites.kolbo.ai` after approval.
+## What This Skill Is For
+- **Dashboards** — KPI cards, tables, filterable views, charts (Chart.js / D3).
+- **Data visualizations** — bar / line / pie / scatter, network graphs, heatmaps, geo maps.
+- **Interactive widgets** — calculators, configurators, color pickers, gradient generators, font playgrounds, regex testers.
+- **Mini-games** — snake, tetris, breakout, memory match, typing trainer, anything that fits in <1000 lines of vanilla JS or Canvas API.
+- **Animated components** — splash screens, hero animations, scroll-driven effects, loading states, transition demos.
+- **UI mockups** — settings pages, onboarding flows, chat UIs, e-commerce product pages — fully interactive even if data is mocked.
+- **Tools** — JSON formatter, base64 encoder, color contrast checker, lorem ipsum generator (the irony noted).
+## Picking the Tech Stack
+- **Vanilla HTML + CSS + JS + Tailwind** is the default. Reach for it first.
+- **Chart.js** for standard charts (bar, line, pie, doughnut, radar). Easy and good-looking.
+- **D3.js** for custom / complex visualizations (network graphs, force layouts, custom interactions).
+- **Three.js** for 3D scenes, WebGL, generative art.
+- **Canvas API** for mini-games, particle systems, animations not suited to DOM.
+- **GSAP** for serious animation timelines / scroll-triggered sequences.
+- **Framer Motion** for animations on a React app.
+- **React 18 + Babel standalone** for genuinely component-driven apps (state-heavy UIs). Don't reach for React for static widgets.
+- **Lucide icons** via CDN for any iconography. Stop using emoji where icons fit better.
+## Architecture Patterns
+- For widgets with state: keep state in one object `const state = { ... }` and a single `render()` function that reads from it. Mutate state, call render. Easy to reason about, fast to iterate.
+- For data viz: separate `prepareData()` from `renderChart()`. Don't tangle the two.
+- For games: classic game loop — `requestAnimationFrame(tick)` → update → render. Keep entity objects in arrays.
+- For React apps: use hooks (`useState`, `useEffect`, `useMemo`). Don't pull in Redux for a toy app.
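The single-state-object pattern for widgets can be sketched as follows. To keep the example runnable outside a browser, `render()` returns markup as a string and the caller supplies a `paint` callback; in an artifact, `paint` would typically assign the string to a root element's `innerHTML`. The counter widget and all names are illustrative assumptions.

```typescript
type State = { count: number; step: number };

// All widget state lives in one object.
const state: State = { count: 0, step: 1 };

// render() only reads from state and returns markup.
function render(): string {
  return `<output>${state.count}</output><button id="inc">+${state.step}</button>`;
}

// Mutate state, then re-render. `paint` is supplied by the caller,
// e.g. (html) => { root.innerHTML = html; } inside a browser.
function update(mutate: (s: State) => void, paint: (html: string) => void): void {
  mutate(state);
  paint(render());
}
```

Every event handler then reduces to one `update(...)` call, which keeps the mutation and the repaint in a single place.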
+## Quality Bar
+- **Real data when the user provides it.** Don't paraphrase numbers — render them verbatim.
+- **Empty / loading / error states** all handled.
+- **Keyboard accessibility** for anything interactive. Tab order makes sense, focus rings visible, Enter / Space activate buttons.
+- **Hover and active states** on every interactive element. Cursor: pointer where appropriate.
+- **Mobile-responsive** unless it's fundamentally desktop-only (complex dashboard) — in which case say so in the lead-in.
+- **Animations under 400ms** for micro-interactions, with custom easing rather than linear. Include a `@media (prefers-reduced-motion: reduce)` override.
+- **Don't ship broken JS.** Mentally verify that every `addEventListener` target exists and every `querySelector` matches a real element.
+## Anti-AI-Slop (same principles as the landing-page skill, applied lightly)
+- ❌ NEVER use `Inter` / `Roboto` / `Arial` / system fonts as default. Pick distinctive Google Fonts or Fontshare.
+- ❌ NEVER default to purple-violet gradient on white.
+- ❌ NEVER default to `Space Grotesk` everywhere — pick something else most of the time.
+- Pick a deliberate palette tied to the artifact's mood, not Tailwind defaults.
+- For dashboards: use a single dominant brand color + neutral grays + one accent for emphasis. Avoid the "rainbow chart with 8 colors" look — limit each chart to 1–3 colors.
+- Hover / focus states on every interactive element. Cursor: pointer where appropriate.
+## RTL / Multilingual
+- Set the `lang` and `dir` attributes on `<html>` correctly when content is in an RTL language.
+- For mixed-language UIs (e.g. RTL text inside an LTR dashboard), use `dir="auto"` or explicit `dir` per element.
+## Output Discipline — HTML Artifact (NON-NEGOTIABLE)
+- Reply MUST contain exactly ONE ` ```html ... ``` ` fenced code block with a COMPLETE, self-contained HTML document.
+- Document must start with `<!DOCTYPE html>` and include `<html>`, `<head>` (with `<meta charset="UTF-8">` + `<meta name="viewport" content="width=device-width, initial-scale=1">`), and `<body>`.
+- Embed ALL CSS inside `<style>` and ALL JavaScript inside `<script>`. No external CSS files, no relative asset paths. CDN URLs are fine.
+- Approved CDN libraries: Tailwind, GSAP, Chart.js, D3.js, Three.js, Lucide Icons, Framer Motion, React 18 + Babel standalone, Vue 3, date-fns.
+- Outside the html block: one-line lead-in and a short note about how to iterate. Nothing else.
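A rough self-check for the output rules above, assuming the reply is available as a plain string. `checkArtifactReply` is a hypothetical helper, and the checks are deliberately shallow string tests, not an HTML parser.

```typescript
// Built at runtime so this example can itself sit inside a fenced block.
const FENCE = "`".repeat(3);

// Returns a list of rule violations; an empty list means the reply looks well-formed.
function checkArtifactReply(reply: string): string[] {
  const openFence = new RegExp("^" + FENCE + "html[ \\t]*$", "gm");
  let count = 0;
  let bodyStart = -1;
  while (openFence.exec(reply) !== null) {
    count += 1;
    bodyStart = openFence.lastIndex; // position just after the opening fence line
  }
  if (count !== 1) return [`expected exactly one html block, found ${count}`];

  const closeAt = reply.indexOf("\n" + FENCE, bodyStart);
  if (closeAt === -1) return ["html block is never closed"];

  const html = reply.slice(bodyStart, closeAt).trim();
  const problems: string[] = [];
  if (!/^<!doctype html>/i.test(html)) problems.push("must start with <!DOCTYPE html>");
  for (const tag of ["<html", "<head", "<body"]) {
    if (!html.toLowerCase().includes(tag)) problems.push(`missing ${tag}> element`);
  }
  if (!/<meta\s+charset=["']?utf-8/i.test(html)) problems.push("missing UTF-8 charset meta");
  if (!/<meta\s+name=["']viewport["']/i.test(html)) problems.push("missing viewport meta");
  return problems;
}
```

Running this before sending the reply catches the most common slips: a second fenced block, a missing doctype, or a forgotten viewport tag.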
+## Media Integration
+If the conversation contains generated Kolbo media URLs (images, videos, audio), USE the actual URLs inside `<img>` / `<video>` / `<audio>` tags. Never substitute placeholders when real assets are available.
+## Publishing
+After approval, call `publish_html_artifact({ title, content })` to publish to `sites.kolbo.ai` with strict CSP (`connect-src 'none'`, `form-action 'none'`). The page can't exfiltrate data; CDN libraries still load.

package/skills/kolbo/references/workflows/app-builder.md ADDED

@@ -0,0 +1,41 @@
+# App Builder
+Load this file when the user wants to build / edit / iterate on a React app via Kolbo's App Builder ("build me a todo app", "add dark mode to my app", "give me the GitHub repo").
+Use the App Builder tools to generate and iterate on full React apps from a text prompt. The backend auto-provisions a GitHub repo, Supabase database (when the app needs storage), and a live hosted deployment — all in one flow.
+## Standard Workflow
+1. **Find project ID**: `app_builder_list_projects` → pick the right project
+2. **Create session**: `app_builder_create_session` with `project_id`
+3. **Generate app**: `app_builder_generate_app` with `session_id` + `prompt`
+   - The tool fires the build in the background and polls until `build_status === "deployed"` (up to 5 min)
+   - Always surface the `deployment_url` to the user: **"Your app is live at: [url]"**
+4. **Iterate**: `app_builder_list_generations` → get `generation_id` → `app_builder_edit_app` with natural language instruction
+No manual polling needed — `generate_app` and `edit_app` block until the build completes.
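The first three steps of the standard workflow can be sketched as one function. Everything about the transport is an assumption here: `CallTool` stands in for whatever MCP client invokes tools by name, the result shapes are treated as loose JSON objects, and the `session_id` field on the session result is assumed. Only the tool names, `project_id`, `prompt`, `build_status`, and `deployment_url` come from the workflow itself.

```typescript
// Hypothetical transport: invokes an MCP tool by name and resolves with its JSON result.
type ToolResult = Record<string, unknown>;
type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

async function buildAndReport(
  callTool: CallTool,
  // The caller decides how to pick a project, since the list result shape is not specified here.
  pickProjectId: (projects: ToolResult) => string,
  prompt: string,
): Promise<string> {
  const projects = await callTool("app_builder_list_projects", {});
  const project_id = pickProjectId(projects);

  const session = await callTool("app_builder_create_session", { project_id });
  const session_id = String(session["session_id"]); // assumed field name

  // generate_app blocks until the build completes, so no polling loop is needed.
  const build = await callTool("app_builder_generate_app", { session_id, prompt });
  if (build["build_status"] !== "deployed") {
    throw new Error("build not deployed; check app_builder_get_build_status");
  }
  return `Your app is live at: ${String(build["deployment_url"])}`;
}
```

Injecting `callTool` keeps the sequence testable with a fake client and leaves the real MCP wiring to the host application.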
+## Local Dev Workflow
+If the user wants to run the app locally or connect to the database directly:
+```
+app_builder_get_session(session_id) → returns:
+  github_repo_url  →  git clone <url> && npm install && npm run dev
+  supabase_url     →  paste into .env as NEXT_PUBLIC_SUPABASE_URL
+  supabase_anon_key → paste into .env as NEXT_PUBLIC_SUPABASE_ANON_KEY
+```
+## ⚠️ Rules
+- **Always confirm before `app_builder_delete_session`** — permanently deletes the GitHub repo, Supabase DB (unless user-connected), deployed files, and history. IRREVERSIBLE.
+- **On build timeout** (rare): use `app_builder_get_build_status` to check manually, then continue or report.
+Whitelabel works automatically — the MCP client routes App Builder calls through whitelabel API endpoints.
+## Routing examples
+| User says | Sequence |
+|---|---|
+| "Build me a todo app" / "Make a landing page with waitlist" | `app_builder_list_projects` → `app_builder_create_session` → `app_builder_generate_app` → show `deployment_url` |
+| "Add dark mode to my app" / "Add a contact form" | `app_builder_list_generations` → `app_builder_edit_app` |
+| "Give me the GitHub repo" / "Supabase credentials" | `app_builder_get_session` → return `github_repo_url` + `supabase_url` + `supabase_anon_key` |