@kolbo/kolbo-code-linux-arm64-musl 2.1.10 → 2.1.12
- package/bin/kolbo +0 -0
- package/package.json +1 -1
- package/skills/kolbo/SKILL.md +93 -34
package/bin/kolbo CHANGED
Binary file
package/package.json CHANGED
package/skills/kolbo/SKILL.md CHANGED
@@ -13,9 +13,9 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con

 | Tool | Description |
 |------|-------------|
-| `generate_image` | Create
+| `generate_image` | Create a **single** image from a text prompt. Supports Visual DNA, moodboards, reference images, web-search grounding. |
 | `generate_image_edit` | Edit/transform an existing image (background removal, color changes, compositing). Pass source images + edit prompt. |
-| `generate_creative_director` | Generate
+| `generate_creative_director` | **Generate 2–8 related images or videos as one coherent set.** Use this INSTEAD of multiple `generate_image` calls whenever the user wants more than one related output (storyboards, ad campaigns, product sets, character sheets, scene variations). Handles style consistency and runs scenes in parallel internally. |
 | `generate_video` | Create videos from text prompts. Supports Visual DNA and reference images for consistency. |
 | `generate_video_from_image` | Animate a still image into video. Prompt describes the motion, not the subject. |
 | `generate_video_from_video` | Restyle/transform an existing video (style transfer, scene restyling, subject swap). Keeps the original motion. |
@@ -88,12 +88,14 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con

 ## Core Workflow

-1. **Check credits** with `check_credits
-2. **Discover models** with `list_models` using a `type` filter
-3. **Pick the model**: If the user explicitly requested a specific model, use that. Otherwise, **prefer the cheapest model that still has great quality** — look at both `credit` cost and `recommended` status from `list_models`.
-4. **
+1. **Check credits** ONCE per conversation with `check_credits`. Skip if you already checked earlier in this session.
+2. **Discover models** with `list_models` using a `type` filter — but **skip this when the user names a specific model** (e.g. "seedance 2 fast"). Only call `list_models` when you need to discover or compare models.
+3. **Pick the model**: If the user explicitly requested a specific model, use that name directly. Otherwise, **prefer the cheapest model that still has great quality** — look at both `credit` cost and `recommended` status from `list_models`.
+4. **How generation calls work**: Each tool call blocks until the generation is fully complete (the MCP server polls the API internally). For images this is seconds; for video it can be minutes. If a call times out, use `get_generation_status` with the returned generation ID. When you output multiple tool calls in a single response, they run concurrently — so batch calls finish in the time of the slowest one, not the sum.
 5. **Share the URL** — after a successful generation, hand the real URL back to the user. Never fabricate URLs.

+**For batch operations** (generating multiple items at once), see the "Rate Limiting & Batch Generation" section below — it overrides the per-item steps above.
+
 ### Model Types (for `list_models`)

 | Type | Use for |
@@ -125,40 +127,59 @@ Creative generations bill against the user's Kolbo credit balance. **Billing uni
 | **3D model** | per model (flat) | 5–300 cr | Trellis = 5 cr; Meshy v6 = 150 cr; Marble 1.1 = 300 cr |
 | **Transcription (stt)** | per minute of audio | model.credit × duration_minutes | |

-**Calculation formulas —
+**Calculation formulas — apply when confirming cost:**
 - **Video / Lipsync**: `total = model_credit_per_second × duration_seconds`
--
+- Get the `credit` value from `list_models` (or from a previous call in this session) and multiply by duration.
 - Never assume the credit shown is a flat per-generation cost for these types.
 - **Music**: flat per generation — `total = model_credit` (duration does not change the cost).
 - **TTS**: `total = model_credit × ceil(character_count / 100)`
 - Count the actual characters in the text before estimating. 1000 chars with ElevenLabs = 50 credits.
 - **Images / 3D / Sound effects**: `total = model_credit × quantity`
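
The formulas above can be sketched as a small estimator. This is an illustration only: the `kind` labels and the `estimate_credits` helper are hypothetical names, not part of the Kolbo tools, and the rate of 5 credits per 100 characters in the usage note is only inferred from the "1000 chars = 50 credits" example.

```python
import math

# Hypothetical grouping of media kinds by billing unit, following the formulas above.
PER_SECOND = {"video", "lipsync"}
FLAT_PER_GENERATION = {"music"}
PER_100_CHARACTERS = {"tts"}
PER_ITEM = {"image", "3d", "sound_effect"}


def estimate_credits(kind, model_credit, *, duration_seconds=0,
                     character_count=0, quantity=1):
    """Estimate the credit cost of one request for the given media kind."""
    if kind in PER_SECOND:
        # The model credit is a per-second rate, not a flat cost.
        return model_credit * duration_seconds
    if kind in FLAT_PER_GENERATION:
        # Duration does not change the cost.
        return model_credit
    if kind in PER_100_CHARACTERS:
        # Billed per started block of 100 characters.
        return model_credit * math.ceil(character_count / 100)
    if kind in PER_ITEM:
        return model_credit * quantity
    raise ValueError(f"unknown media kind: {kind!r}")
```

For example, `estimate_credits("tts", 5, character_count=1000)` gives 50, and a 5-second video on a model priced at 4 cr/s gives 20.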

-**
-
+**Cost confirmation — know when to skip it:**
+- **User specified everything** (model, count, duration, e.g. "make 5 videos, seedance 2 fast, 15s, 16:9"): **ACT IMMEDIATELY** — that IS the confirmation. Do not re-explain costs or ask again.
+- **Single generation under 5 credits**: proceed without confirmation.
+- **Everything else**: calculate total cost, present a summary, and wait for the user to confirm before generating.
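
Read as a decision rule, the three cases above could look like the sketch below. The function name and parameters are hypothetical, and deciding whether the user "specified everything" is left to the caller.

```python
def needs_confirmation(total_credits, item_count, user_specified_everything):
    """Return True when the cost summary should be shown before generating."""
    if user_specified_everything:
        # Model, count, and duration were all given: that is the confirmation.
        return False
    if item_count == 1 and total_credits < 5:
        # A single cheap generation proceeds without asking.
        return False
    return True
```

The first two branches are the only exemptions; any other combination falls through to the confirmation path.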

+**When confirmation IS needed:**
 1. Calculate per-item cost using the formulas above.
 2. Multiply by the number of items.
 3. Present a summary: "This will generate 8 videos × 5s each using [model] at X cr/s = **Y credits total**. Proceed?"
-4. **Suggest cheaper alternatives** if available
+4. **Suggest cheaper alternatives** if available.
 5. Only proceed after the user confirms.

-The only exception: single image generations under 5 credits — those can proceed without confirmation unless the user's balance is low.
-
 ### Rate Limiting & Batch Generation (CRITICAL)

-
-
-
-
-
-
-
-
-
-
-
-
+**Rate limits** (per user, enforced server-side):
+- **Image generation**: 30 requests per minute (higher because images are fast and cheap)
+- **All other generation types**: 10 requests per minute per type (e.g. 10 video + 10 image = fine, but 11 video in 1 minute = 429)
+- **300 requests per minute** global across all media endpoints
+- **Uploads** (`upload_media`): 300/min, no credit cost — much lighter than generation
+- The API **queues** requests internally — it never silently drops them. If you're within limits, every request will be processed.
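
As a rough pre-flight check, a planned batch can be compared against the limits listed above before any call is made. This is a sketch under assumptions: the type labels and the `over_limit` helper are hypothetical, it assumes every request counts toward the global limit, and it ignores requests already sent earlier in the same minute.

```python
# Per-minute limits as listed above; the keys are illustrative labels.
TYPE_LIMITS = {"image": 30, "upload": 300}
DEFAULT_TYPE_LIMIT = 10   # all other generation types
GLOBAL_LIMIT = 300        # across all media endpoints


def over_limit(planned):
    """Return the names of limits a planned one-minute batch would exceed.

    `planned` maps a request type to the number of requests of that type.
    """
    exceeded = [
        kind for kind, count in planned.items()
        if count > TYPE_LIMITS.get(kind, DEFAULT_TYPE_LIMIT)
    ]
    if sum(planned.values()) > GLOBAL_LIMIT:
        exceeded.append("global")
    return exceeded
```

With this sketch, `over_limit({"video": 10, "image": 10})` returns an empty list, while `over_limit({"video": 11})` returns `["video"]`, matching the example in the list above.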
+
+**⚠️ NEVER duplicate a generation you already fired.**
+Before calling any generation tool, check your conversation history. If you already called that tool with the same or similar prompt in this session:
+- Do NOT call it again — even if it was aborted or interrupted (it is still running server-side and will complete)
+- Only retry if the user explicitly says "retry", "redo", or "try again"
+- Each duplicate wastes real credits from the user's balance
+- If unsure whether a generation went through, use `get_generation_status` to check — the API returns 202 immediately and processes in the background, so aborted tool calls still generate
+
+**Batch generation workflow (≤10 items):**
+1. Confirm cost ONCE — or skip if the user already specified model, count, and duration (e.g. "make 5 videos, seedance 2 fast, 15s" IS the confirmation — act immediately)
+2. **Output ALL generation tool calls in a single response** — up to 10 per tool type. The system runs them concurrently, so 5 videos render in parallel and finish in the time of the slowest one, not 5× the time.
+3. Each call blocks until its generation is complete (images: seconds, video: 1-5 minutes). This is normal — don't apologize for the wait.
+4. Track what you've generated — never re-fire a completed or in-progress generation.
+5. After all complete, present all results together.
+6. If any fail with 429: wait 60 seconds and retry only the failed ones (max 2 retries).
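
Step 6 of the workflow above (retry only the rate-limited calls, at most twice, 60 seconds apart) can be sketched as follows. The `fire` callback and its `(status, payload)` return shape are assumptions made for illustration, not the real tool interface, and the loop runs calls one after another purely to keep the bookkeeping visible; the workflow itself issues them concurrently.

```python
import time


def run_batch(calls, fire, *, max_retries=2, wait_seconds=60, sleep=time.sleep):
    """Fire every call once, then retry only those that returned 429.

    `calls` maps an item id to its parameters. `fire(params)` is a
    hypothetical callback returning `(status, payload)`.
    Returns `(results, still_rate_limited_ids)`.
    """
    results = {}
    pending = dict(calls)
    attempt = 0
    while pending:
        rate_limited = {}
        for key, params in pending.items():
            status, payload = fire(params)
            if status == 429:
                rate_limited[key] = params   # rejected, so safe to retry
            else:
                results[key] = payload       # never fired again
        pending = rate_limited
        if not pending or attempt == max_retries:
            break
        attempt += 1
        sleep(wait_seconds)                  # let the one-minute window reset
    return results, sorted(pending)
```

Completed items leave `pending` as soon as they return, so a retry pass can never re-fire them. Any other failure is simply recorded in `results` here; a real caller would handle it separately.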
+
+**Multi-image decision:**
+- User gives a **general brief** ("make 4 product shots", "create a storyboard") → use `generate_creative_director` (you plan the scenes, it handles consistency + parallel execution)
+- User gives **explicit separate prompts** ("Image 1: X, Image 2: Y, Image 3: Z") → fire all as **parallel `generate_image` calls** in one response
+- Never call `generate_image` sequentially in a loop — either use `generate_creative_director` or fire all calls in one parallel batch
+
+**Don't narrate, just generate.** When the user says "make 5 videos", output all 5 tool calls in one response. Don't explain your plan, don't calculate step-by-step, don't say "Generating Video 1 of 5..." — just call the tools.
+
+**Handling interruptions:** If the user aborts or interrupts mid-batch (e.g. cancels Video 1, then says "do the rest" or "continue with 2-5"), pick up where you left off. Check which generations you already fired, skip those, and fire only the remaining ones. Never restart a batch from the beginning. Remember: aborted tool calls still process server-side — don't re-fire them.
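
Resuming after an interruption amounts to subtracting what was already fired from what was requested. A minimal sketch, with a hypothetical helper name and item ids; aborted calls count as fired because they still run server-side.

```python
def remaining_items(requested, already_fired):
    """Return the requested items not yet fired, in their original order."""
    fired = set(already_fired)   # includes aborted or interrupted calls
    return [item for item in requested if item not in fired]
```

If Video 1 was fired and then aborted, `remaining_items(["v1", "v2", "v3", "v4", "v5"], ["v1"])` leaves only videos 2 to 5 to fire.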

 ---

@@ -171,7 +192,7 @@ Use `transcribe_audio` ONLY when the user explicitly asks for:
 - Summary of what was **spoken/said** in the video
 - Dialogue extraction from video

-**Do NOT use `transcribe_audio` to "analyze" a video visually.** For visual analysis
+**Do NOT use `transcribe_audio` to "analyze" a video visually.** For visual analysis **of videos or audio**, use `upload_media` → `chat_send_message` with `media_urls`. For **images**, use the `Read` tool directly — you have built-in vision.

 ### Workflow
 1. Call `transcribe_audio` with the `source` (URL or absolute local file path)
@@ -209,14 +230,17 @@ Transcription supports files up to 30 minutes. For longer content, split the fil

 ### Visual Video/Audio/Image Analysis

-**The agent has built-in vision —
+**The agent has built-in vision — ALWAYS prefer your own model for images:**

 | Media type | How to analyze |
 |------------|----------------|
-| **Image** (jpg, png, webp, etc.) | Read it directly with the `Read` tool —
+| **Image** (jpg, png, webp, etc.) | **Read it directly with the `Read` tool** — you see images natively. No upload, no API call, no rate-limit risk. This is ALWAYS the first choice for images. |
 | **Video / Audio** | `upload_media` → `chat_send_message` with `media_urls` (Gemini handles video/audio) |
 | **Transcription** | `transcribe_audio` — ONLY when user explicitly says "transcribe", "subtitles", "SRT", or "what's being said" |

+**⚠️ Image analysis priority: YOUR OWN VISION FIRST.**
+You are a multimodal model — you can see and analyze images directly via the `Read` tool. This is faster, free, and avoids API rate limits. **Never upload images to Kolbo or use `chat_send_message` for image analysis** unless the user explicitly asks to use a specific Kolbo chat model. Even with 10+ images, read them all yourself — you can handle up to 10 images in a single analysis pass.
+
 **NEVER use ffmpeg or frame extraction for analysis. NEVER ask the user — just pick the right path above.**

 **Video/Audio analysis workflow — Step 1 is NOT optional:**
@@ -227,12 +251,41 @@ Transcription supports files up to 30 minutes. For longer content, split the fil
 - Always an **array**: `media_urls: ["https://cdn.kolbo.ai/..."]`
 - **Omit `model`** — Smart Select auto-routes to Gemini when media is detected
 - **Sessions do NOT remember media between messages.** On retry: reuse the same CDN `url` (no re-upload) but always pass `media_urls` again.
-- **Batch / many videos**:
+- **Batch / many videos**: use `list_models` to find the cheapest Gemini model and pass it explicitly for cheaper bulk runs
+
+### ⚠️ Batching Media in Chat Messages (CRITICAL)
+
+**Always send ALL media in ONE `chat_send_message` call.** The `media_urls` array accepts up to **10 URLs** in a single request. Never send one message per image/video.
+
+**Why this matters:** Each `upload_media` call + the final `chat_send_message` all count toward rate limits. Sending 10 uploads + 10 separate chat messages = 20 requests in rapid succession → "Too many generation requests" error. Instead:
+
+1. Upload all files at once (output all `upload_media` calls in one response — uploads are 300/min and cost no credits).
+2. Collect ALL returned CDN URLs into one array.
+3. Send ONE `chat_send_message` with all URLs in `media_urls`.
+
+**Example — analyzing 5 videos:**
+```
+# Step 1: Upload all in one response (all 5 upload_media calls at once)
+upload_media({ source: "video1.mp4" }) → url1
+upload_media({ source: "video2.mp4" }) → url2
+upload_media({ source: "video3.mp4" }) → url3
+upload_media({ source: "video4.mp4" }) → url4
+upload_media({ source: "video5.mp4" }) → url5
+
+# Step 2: ONE chat call with ALL media URLs
+chat_send_message({
+  message: "Analyze all 5 videos...",
+  media_urls: [url1, url2, url3, url4, url5]
+})
+```
+
+**Rate limit recovery:** If you hit "Too many generation requests", wait 60 seconds before retrying. On retry, do NOT re-upload — reuse the CDN URLs from step 1.

 **❌ Never do this:**
 - Pass a local file path in `media_urls` — it won't work, only CDN URLs work
 - Use the `.txt` URL from a transcription result as the video URL — that's text, not video
 - Skip `upload_media` and try to construct a URL yourself
+- Send separate `chat_send_message` calls for each media file — batch them into ONE call

 When in doubt, do visual analysis. Do not stop to ask.

@@ -259,7 +312,7 @@ Simple edits deserve simple prompts. Only elaborate for genuinely complex, multi
 ### Multi-Scene / Campaigns
 For storyboards, campaigns, or character-consistent sequences, use `generate_creative_director` — it generates 1–8 coordinated scenes from a single creative brief with consistent style. Pass `visual_dna_ids` and/or `moodboard_id` for character/style consistency across all scenes.

-In the CLI, you can also do
+In the CLI, you can also do multiple `generate_image` calls (in parallel for batches) with the same Visual DNA profiles.

 ---

@@ -283,7 +336,7 @@ Visual DNA profiles capture the visual "identity" of a character, style, product

 ## Video Prompts

-Video
+Video costs more per generation than images — write prompts deliberately to get it right the first time.

 ### Core Rules
 - **Order**: Subject → Action → Camera → Style → Constraints → Audio
@@ -420,7 +473,7 @@ Describe **genre → mood → instrumentation → tempo → era**, in that order
 - `get_moodboard` to see full details before applying

 **Presets** bundle prompt templates + style direction for specific creative looks. Pass a `preset_id` to generation tools.
-- `list_presets` with optional `type` filter ("image", "video", "
+- `list_presets` with optional `type` filter ("image", "video", "video_from_image", "music")

 ---

@@ -438,6 +491,8 @@ Use `list_media` to browse previously uploaded content (filter by type, search b

 Use `chat_send_message` to interact with Kolbo AI models (GPT-4o, Claude, etc.) with optional web search and deep think modes. Conversations persist via `session_id` — omit to start new, pass to continue.

+**Media in chat:** Always batch all media into a single message. `media_urls` accepts up to 10 URLs per call. See the "Batching Media in Chat Messages" section above for the mandatory workflow.
+
 Use `chat_list_conversations` and `chat_get_messages` to browse conversation history.

 ---
@@ -519,7 +574,7 @@ If Kolbo tools timeout or aren't listed, the MCP server may not be wired. Tell t
 This re-wires the MCP configuration automatically. Then restart the session.

 ### "Rate limited" (429 errors)
-Kolbo allows 10 generation requests per minute per tool type. Wait 60 seconds and retry. Use `generate_creative_director` for batch image work instead of multiple `generate_image` calls.
+Kolbo allows 10 generation requests per minute per user per tool type (video, image, etc. are separate pools). Wait 60 seconds (the window resets) and retry only the failed calls. Use `generate_creative_director` for batch image work instead of multiple `generate_image` calls. The API queues requests — it never silently drops them.

 ---

@@ -528,9 +583,11 @@ Kolbo allows 10 generation requests per minute per tool type. Wait 60 seconds an
 Natural-language triggers that should prompt this skill + a tool call:

 - "Generate an image of a neon-lit Tokyo street at night" → `list_models` (image) → `generate_image`
+- "Use Midjourney to generate a Tokyo street" → `generate_image` with model "midjourney" (user named the model — skip `list_models`)
 - "Remove the background from this image" → `list_models` (image_edit) → `generate_image_edit`
 - "Create a storyboard for a coffee brand ad" → `list_models` (image) → `generate_creative_director`
 - "Create a 5-second cinematic video of ocean waves at sunset" → `list_models` (video) → `generate_video` with camera + mood guidance
+- "Make 5 videos with Seedance 2 Fast, 15s, 16:9" → fire all 5 `generate_video` calls in parallel (user specified everything — skip `list_models`, skip cost confirmation)
 - "Animate this product photo with a 360° orbit" → `list_models` (video_from_image) → `generate_video_from_image`
 - "Restyle this video as anime" → `generate_video_from_video`
 - "Make this character talk with this voiceover" → `generate_lipsync`
@@ -549,7 +606,9 @@ Natural-language triggers that should prompt this skill + a tool call:
 - "Host this HTML page" / "Publish this landing page" / "Give me a public URL for this file" → `upload_media` → share the returned `url` (Kolbo CDN serves any file type publicly)
 - "What video models are available?" → `list_models` (video)
 - "How many credits do I have?" → `check_credits`
-- "What's in this image?" (with upload) →
+- "What's in this image?" (with upload) → Read the image directly with your own vision — no Kolbo API call needed
+- "Analyze these 10 frames" (with multiple images) → Read all images directly with your own vision — you handle up to 10 natively
+- "Analyze these 5 videos" → upload all 5 with `upload_media`, then ONE `chat_send_message` with all 5 URLs in `media_urls`
 - "Create motion graphics" / "animated text" / "title sequence" → load the `remotion-best-practices` skill for Remotion-based motion graphics
 - "Edit this video" / "cut this clip" / "remove silence" / "add subtitles" / "convert to 9:16" → load the `video-production` skill for FFmpeg-based editing
 - "Create a short-form video" / "make a reel" / "YouTube short" → load the `short-form-video` skill