@kolbo-cli/kolbo-windows-x64 1.0.0 → 1.5.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/kolbo.exe +0 -0
- package/package.json +1 -1
- package/skills/kolbo/SKILL.md +28 -183
package/bin/kolbo.exe
CHANGED
Binary file
package/package.json
CHANGED
package/skills/kolbo/SKILL.md
CHANGED
@@ -1,19 +1,19 @@
 ---
 name: kolbo
-description: Generate
+description: Generate images, videos, music, speech, and sound effects using Kolbo AI. Use when asked to create any visual, audio, or video content — or to list available AI models or check credit balance.
 ---
 
-# Kolbo AI — Creative Generation
+# Kolbo AI — Creative Generation
 
-You have
+You have access to the Kolbo AI platform via MCP tools. Use them to generate images, videos, music, speech, and sound effects directly from conversation.
 
-## Available
+## Available Tools
 
 | Tool | Description |
 |------|-------------|
 | `generate_image` | Create images from text prompts. Returns image URL(s). |
 | `generate_video` | Create videos from text. Returns video URL. |
-| `generate_video_from_image` | Animate a
+| `generate_video_from_image` | Animate a static image into video. Returns video URL. |
 | `generate_music` | Create music from descriptions. Returns audio URL. |
 | `generate_speech` | Convert text to speech. Returns audio URL. |
 | `generate_sound` | Generate sound effects. Returns audio URL. |
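The "Available Tools" table in this hunk names one generation tool per output type. The Python below is a minimal sketch of routing a request to the matching tool name. The content-kind labels and the `route_request` helper are hypothetical and are not part of the Kolbo CLI or its MCP server. Only the tool names come from the table.

```python
# Generation tools as named in the SKILL.md "Available Tools" table.
# The content-kind keys are hypothetical labels chosen for this sketch.
GENERATION_TOOLS = {
    "image": "generate_image",
    "video": "generate_video",
    "music": "generate_music",
    "speech": "generate_speech",
    "sound": "generate_sound",
}


def route_request(kind: str, has_source_image: bool = False) -> str:
    """Return the tool name for a content kind (hypothetical helper).

    A video request that arrives with a still image goes to
    generate_video_from_image, which animates a static image.
    """
    if kind == "video" and has_source_image:
        return "generate_video_from_image"
    if kind not in GENERATION_TOOLS:
        raise ValueError(f"unknown content kind: {kind!r}")
    return GENERATION_TOOLS[kind]


print(route_request("video", has_source_image=True))  # generate_video_from_image
```

Keeping the mapping in one dictionary means a new tool needs only one new entry. The image-to-video case is handled separately because it depends on whether the user supplied a still image, not on the kind alone.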
@@ -21,196 +21,41 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con
 | `check_credits` | Check remaining Kolbo credit balance. |
 | `get_generation_status` | Poll status of an in-progress generation by ID. |
 
-##
+## Workflow
 
-1. **Check credits**
-2. **Discover models**
-3. **Generate
-4. **
-5. **Share the URL** — after a successful generation, hand the real URL back to the user. Never fabricate URLs.
+1. **Check credits** — call `check_credits` before generating to confirm balance
+2. **Discover models** — call `list_models` with a `type` filter to get current model identifiers. Models change frequently; never hardcode them.
+3. **Generate** — call the appropriate tool. Pass the `identifier` from `list_models` as `model`, or omit it to let Kolbo auto-select the best model.
+4. **Result** — the tool polls internally and returns the final URL when ready.
 
-
+## Model Types
+
+Use these values with `list_models`:
 
 | Type | Use for |
 |------|---------|
-| `image` |
+| `image` | Image generation |
 | `video` | Text-to-video |
 | `video_from_image` | Image-to-video animation |
 | `music` | Music generation |
 | `speech` | Text-to-speech |
 | `sound` | Sound effects |
 
-
-
-Creative generations bill against the user's Kolbo credit balance. Order of expense (rough):
-- **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s)
-- **Medium**: music (~30s-2min)
-- **Expensive**: video (~1-5min, highest credit cost)
-
-Rule of thumb: confirm intent before firing off a video generation unless the user was explicit. For images, just generate.
-
----
-
-## Image Prompts
-
-### Rules
-- **Clean prompts only.** No "Output:", "Tips:", "Notes:", "Resolution:", "Dimensions:", or any instructional/meta language inside the prompt. The prompt is what the model sees — anything not describing the image is noise.
-- **Length**: focused 2-3 sentences beats a bloated paragraph. Only go longer when the concept genuinely needs it (complex scenes, multiple subjects, specific technical requirements). Match prompt length to complexity.
-- **Order**: Subject → action/pose → environment → lighting → style.
-- **Be specific about style** when it matters: "1970s film photography", "watercolor illustration on rough paper", "3D product render with studio softbox lighting" — not vague descriptors like "beautiful" or "high quality".
-- **`enhance_prompt: true`** (default) will improve most prompts automatically. Turn it off only if the user's prompt is already fully engineered or they want literal wording.
-
-### Image Editing (image-to-image)
-When the model can see the uploaded image, describe the **change**, not the unchanged parts.
-- Good: "Turn the sky orange and add drifting clouds"
-- Bad: "A mountain landscape with an orange sky and drifting clouds" (re-describes what's already in the image)
-
-Simple edits deserve simple prompts. Only elaborate for genuinely complex, multi-step transformations.
-
-### Multi-Scene / Campaigns
-For storyboards, campaigns, or character-consistent sequences, call `generate_image` once per scene with the same base style cues carried across prompts. Kolbo's web app has a dedicated Creative Director feature for this; in the CLI the workflow is sequential `generate_image` calls.
-
----
-
-## Video Prompts
-
-Video is the most expensive operation in the Kolbo catalog. Write prompts deliberately.
-
-### Core Rules
-- **Order**: Subject → Action → Camera → Style → Constraints → Audio
-- **Length**: 80-280 words. Shorter = random. Longer = the model forgets the start.
-- **Always specify at least one camera movement per shot.** Even "static wide shot" is a valid explicit choice — just don't leave it unsaid.
-- **Character consistency**: when a character appears across shots, begin the prompt with the literal phrase `same character throughout all shots` to prevent identity drift.
-- **Max 3 shots per prompt.** More shots cause the model to drift.
-- **Duration-aware timecodes**: if the user gives a duration, space timecodes to fit (`[0s] [3s]` for 5s total; `[0s] [3s] [6s]` for 10s total). If no duration is given, describe shots sequentially without hardcoded timecodes.
-
-### Image-to-Video
-The model can see the starting frame. Describe **what happens**, not what the image looks like. Focus on motion, camera, and action — don't re-describe the subject or setting.
-- Good: "Slow dolly-in on the subject. Her hair drifts in a light breeze. Soft particles float through the air. [6s]"
-- Bad: "A woman with long brown hair standing in a forest, wearing a red dress, with golden sunlight..." (re-describes the image)
-
-### Camera Vocabulary
-
-Pick what fits the mood. Every shot gets at least one.
-
-| Movement | Use for |
-|----------|---------|
-| `slow dolly-in` | Building intensity, focus pull |
-| `pull-back` / `dolly out` | Scale reveal, loneliness, context |
-| `extreme low-angle` | Power, heroic framing |
-| `overhead top-down` | Geometry, pattern, abstraction |
-| `360° orbit` | Product showcase, bullet-time moments |
-| `handheld natural lag` | Urgency, documentary, grit |
-| `tracking shot` | Continuous follow of a subject |
-| `crash zoom` | Shock, impact moment |
-| `aerial pull-back` | Epic reveal, landscape scale |
-| `static drift` | Contemplative, subtle, meditative |
-| `crane up` / `crane down` | Grandeur, establishing, dismissal |
-| `whip pan` | Sharp transition, high energy |
-
-### Physics Vocabulary (only name what matters for the scene)
-
-- **Cloth**: `cloth inertia`, `fabric lags behind movement`
-- **Water**: `water splashing with surface tension`, `droplets scattering`, `puddle mirror reflection`
-- **Sand / dust**: `sand displacement`, `radial dust shockwave`
-- **Hair**: `hair reacts to acceleration and wind`
-- **Impact**: `skin distorting on impact`, `delayed follow-through`
-- **Smoke**: `volumetric smoke curling and dissipating`
-
-Don't stuff every category in every prompt — only name the physics that genuinely drives the shot.
-
-### Multi-Shot Format
-
-When the user wants a sequence (trailer, story, showcase), write each shot as a brief 1-2 sentence entry on its own line inside the prompt:
-
-```
-Shot 1: [action + camera movement]
-Shot 2: [action + camera movement]
-Shot 3: [action + camera movement]
-```
-
-Think like a director. Describe what **happens**, not what things **look** like.
-
-### Mood Presets
-
-Pick techniques that match the user's intent. A calm landscape and an action sequence need different tools.
+## Tips
 
-- **
-- **
-- **
-- **
--
--
-
-### Slow-Motion
-
-Extreme slow-motion is a tool, not a freeze frame. Always describe the micro-movements that *continue* during the slow beat (hair drifting, droplets crawling, fabric rippling), and specify the snap-back to full speed when relevant.
-
-Format: `extreme slow-motion [Xs] — [micro-movements in ultra slow-mo] — snap-back to full speed`
-
----
-
-## Music Prompts
-
-Describe **genre → mood → instrumentation → tempo → era**, in that order.
-
-- `instrumental: true` excludes vocals.
-- `lyrics` accepts actual lyric text the model should sing.
-- `style` accepts short genre tags ("lo-fi hip hop", "orchestral cinematic", "80s synthwave").
-- Good: "Upbeat 80s synthwave, analog synths, gated reverb drums, 120 BPM, driving bassline, no vocals"
-- Bad: "A cool song" / "Something for a workout" (too vague)
-
----
-
-## Speech (TTS)
-
-- Call `list_models` with `type: speech` to get voice identifiers. Pass the `identifier` as `model` for a consistent voice.
-- The voice **is** the model for speech — there is no separate voice parameter.
-- For long text, split at natural sentence boundaries. Each generation has a character cap; chunk long-form content into multiple calls.
-- For multilingual content, pick a voice that supports the target language from `list_models`.
-
----
-
-## Sound Effects
-
-- Describe the sound **literally and physically**. Avoid emotional framing.
-- Good: "Heavy wooden door creaking open slowly, echoing in a stone hallway, followed by distant dripping water"
-- Bad: "A scary sound" / "Creepy atmosphere" (the model can't render emotions directly — render the physical source)
-
----
-
-## Image Analysis (when the user uploads images)
-
-When the user shares an image and asks about it:
-
-- **Analyze thoroughly**: describe composition, subjects, colors, lighting, style, text/signage, setting, mood, visible objects, and any embedded information (charts, diagrams, screenshots).
-- **Reference specific regions** when helpful: "top-left corner", "in the foreground", "the figure on the right".
-- **Extract text verbatim** when asked (OCR-style requests are fine).
-- **Cannot identify real people.** Describe hair, clothing, pose, expression, and apparent role — but never name a specific individual, even a well-known public figure. If the user insists, decline and offer to describe instead.
-- **Copyrighted content**: summarize and reference, don't reproduce verbatim large chunks.
-- If the user wants an **edit** based on the analysis, hand off to `generate_video_from_image` (motion) or `generate_image` with an image-to-image model (visual edit) — see the Image Editing section above for prompt structure.
-
----
-
-## Limitations & Safety
-
-- **Real people**: never identify specific real individuals in photos, even public figures. Describe visible attributes only.
-- **NSFW**: Kolbo enforces content safety at the model level. If a generation fails on safety grounds, rephrase the prompt rather than retrying identically.
-- **Copyright**: style references are fine (e.g. "in the style of Studio Ghibli"); verbatim reproduction of copyrighted material is not.
-- **No fabricated URLs**: only share URLs that actually came back from a tool call. Never guess a URL.
-
----
+- **Images** are fastest (~10–30s). `enhance_prompt: true` is on by default.
+- **Video** takes longest (~1–5 min). Check `supported_durations` and `supported_aspect_ratios` from `list_models` before generating.
+- **Music** supports `style`, `instrumental`, and `lyrics` parameters.
+- **Speech** — pass a voice `identifier` from `list_models` for a consistent voice.
+- If a video generation times out, use `get_generation_status` with the returned generation ID to retrieve the result.
+- Models marked `recommended: true` in `list_models` are Kolbo's top picks for quality and speed.
 
 ## Examples
 
-
-
-
-
-
-
-
-- "Generate a door slam sound effect" → `list_models` (sound) → `generate_sound`
-- "What video models are available?" → `list_models` (video)
-- "How many credits do I have?" → `check_credits`
-- "What's in this image?" (with upload) → describe per the Image Analysis section; no tool call needed unless the user asks to generate or edit
+> "Generate an image of a neon-lit Tokyo street at night"
+> "Create a 5-second video of ocean waves"
+> "Make a lo-fi hip hop beat, instrumental only"
+> "Convert this text to speech: Welcome to Kolbo"
+> "Animate this image into a short video"
+> "What image models are available?"
+> "Check my credit balance"
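The new "Workflow" section in 1.5.0 lists four steps: check credits, discover models with a `type` filter, generate, then receive the URL. The sketch below follows that order using an injected `call_tool` function that stands in for an MCP client's tool invocation. Several details are assumptions for illustration and are not Kolbo's documented API:

- the response shapes (`{"credits": ...}`, `{"models": [...]}`, `{"url": ...}`)
- the `prompt` argument name
- the rule that generation is refused at a zero balance

```python
from typing import Callable

# Hypothetical signature: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, dict], dict]


def run_generation(call_tool: ToolCaller, model_type: str, tool: str,
                   prompt: str) -> str:
    """Follow the SKILL.md workflow order against an injected tool caller.

    Response shapes and the "prompt" key are assumptions for this sketch.
    """
    # 1. Check credits before spending any (zero-balance cutoff is assumed).
    balance = call_tool("check_credits", {}).get("credits", 0)
    if balance <= 0:
        raise RuntimeError("no Kolbo credits left")
    # 2. Discover current models for this type instead of hardcoding IDs.
    models = call_tool("list_models", {"type": model_type}).get("models", [])
    recommended = [m for m in models if m.get("recommended")]
    # 3. Pass an identifier as `model`, or omit it so Kolbo auto-selects.
    args = {"prompt": prompt}
    if recommended:
        args["model"] = recommended[0]["identifier"]
    # 4. The tool is described as polling internally and returning the URL.
    return call_tool(tool, args)["url"]
```

Injecting `call_tool` keeps the ordering logic separate from any particular MCP client library, so it can be exercised with a fake.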
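The new "Tips" list makes two points about `list_models` output. Before generating video, check `supported_durations` and `supported_aspect_ratios`. Models marked `recommended: true` are Kolbo's top picks. The sketch below applies both rules. It assumes `list_models` returns a list of dicts carrying those keys plus `identifier`; the exact response shape is not given in the SKILL.md.

```python
from typing import Optional


def pick_model(models: list, duration: Optional[int] = None,
               aspect_ratio: Optional[str] = None) -> Optional[str]:
    """Pick a model identifier from a list_models result (assumed shape).

    Entries that do not advertise a requested duration or aspect ratio are
    treated as not supporting it. Returns None when nothing fits.
    """
    def fits(m: dict) -> bool:
        if duration is not None and duration not in m.get("supported_durations", []):
            return False
        if aspect_ratio is not None and aspect_ratio not in m.get("supported_aspect_ratios", []):
            return False
        return True

    candidates = [m for m in models if fits(m)]
    if not candidates:
        return None
    # Stable sort: recommended entries first, otherwise original list order.
    candidates.sort(key=lambda m: not m.get("recommended", False))
    return candidates[0]["identifier"]
```

Returning `None` instead of raising fits the Workflow step that lets the caller omit `model` so Kolbo auto-selects. A caller could also use `None` as a cue to tell the user that no listed model supports the requested duration.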
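The Tips entry on timed-out video generations says to call `get_generation_status` with the returned generation ID. A bounded polling loop is one way to do that. In this sketch `get_status` stands in for the tool call. The following are assumptions: the `{"status": ..., "url": ...}` response shape, the `"completed"`/`"failed"` status strings, and the 10-second interval. `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def wait_for_result(generation_id: str,
                    get_status: Callable[[str], dict],
                    max_attempts: int = 30,
                    interval_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll a generation until it finishes (hypothetical response shape).

    Returns the URL reported by the tool on success, or None if the job is
    still pending after max_attempts. No URL is ever constructed or guessed.
    """
    for attempt in range(max_attempts):
        result = get_status(generation_id)
        state = result.get("status")
        if state == "completed":
            return result["url"]
        if state == "failed":
            raise RuntimeError(f"generation {generation_id} failed: {result.get('error')}")
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return None
```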
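The Tips list names `style`, `instrumental`, and `lyrics` as `generate_music` parameters. A hypothetical argument builder is sketched below, with two assumptions. First, the `prompt` key name is assumed. Second, rejecting `lyrics` together with `instrumental: true` is a guard added by this sketch, not documented Kolbo behavior. It is motivated by the removed 1.0.0 text, which says `instrumental: true` excludes vocals.

```python
from typing import Optional


def build_music_args(prompt: str, style: Optional[str] = None,
                     instrumental: bool = False,
                     lyrics: Optional[str] = None) -> dict:
    """Assemble arguments for a generate_music call (hypothetical helper).

    Optional fields are left out entirely rather than sent as null.
    """
    if instrumental and lyrics:
        # Assumed guard: lyrics to sing conflict with a vocal-free track.
        raise ValueError("lyrics cannot be combined with instrumental=True")
    args: dict = {"prompt": prompt}
    if style:
        args["style"] = style
    if instrumental:
        args["instrumental"] = True
    if lyrics:
        args["lyrics"] = lyrics
    return args
```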