@kolbo-cli/kolbo-windows-x64 1.0.0 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/bin/kolbo.exe CHANGED
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kolbo-cli/kolbo-windows-x64",
- "version": "1.0.0",
+ "version": "1.5.0",
  "os": [
  "win32"
  ],
@@ -1,19 +1,19 @@
  ---
  name: kolbo
- description: Generate or analyze creative media through Kolbo AI. Load this skill whenever the user asks to create, edit, prompt, or analyze images, videos, music, speech, or sound effects — or to list available AI models / check credit balance. It contains the MCP tool workflow and the prompt-engineering rules for each media type.
+ description: Generate images, videos, music, speech, and sound effects using Kolbo AI. Use when asked to create any visual, audio, or video content — or to list available AI models or check credit balance.
  ---
 
- # Kolbo AI — Creative Generation & Analysis
+ # Kolbo AI — Creative Generation
 
- You have direct access to the Kolbo AI creative platform via MCP tools (auto-configured by `kolbo auth login`). Use them to generate and deliver real content do NOT just describe what you would create.
+ You have access to the Kolbo AI platform via MCP tools. Use them to generate images, videos, music, speech, and sound effects directly from conversation.
 
- ## Available MCP Tools
+ ## Available Tools
 
  | Tool | Description |
  |------|-------------|
  | `generate_image` | Create images from text prompts. Returns image URL(s). |
  | `generate_video` | Create videos from text. Returns video URL. |
- | `generate_video_from_image` | Animate a still image into video. Returns video URL. |
+ | `generate_video_from_image` | Animate a static image into video. Returns video URL. |
  | `generate_music` | Create music from descriptions. Returns audio URL. |
  | `generate_speech` | Convert text to speech. Returns audio URL. |
  | `generate_sound` | Generate sound effects. Returns audio URL. |
@@ -21,196 +21,41 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con
  | `check_credits` | Check remaining Kolbo credit balance. |
  | `get_generation_status` | Poll status of an in-progress generation by ID. |
 
- ## Core Workflow
+ ## Workflow
 
- 1. **Check credits** with `check_credits` at the start of any creative session (once is enough).
- 2. **Discover models** with `list_models` using a `type` filter. **Always do this before calling a generation tool — never hardcode model identifiers.** Models are added, removed, and updated frequently.
- 3. **Generate**: call the appropriate tool. Omit `model` to let Kolbo auto-select the best model (recommended default), or pass an `identifier` from `list_models` for explicit control. Models marked `recommended: true` are Kolbo's top picks for quality and speed.
- 4. **Polling is internal** — the tool returns the final URL(s) when ready. If a video generation times out, call `get_generation_status` with the returned generation ID to retrieve the result.
- 5. **Share the URL** — after a successful generation, hand the real URL back to the user. Never fabricate URLs.
+ 1. **Check credits** call `check_credits` before generating to confirm balance
+ 2. **Discover models** call `list_models` with a `type` filter to get current model identifiers. Models change frequently; never hardcode them.
+ 3. **Generate** call the appropriate tool. Pass the `identifier` from `list_models` as `model`, or omit it to let Kolbo auto-select the best model.
+ 4. **Result** — the tool polls internally and returns the final URL when ready.
 
- ### Model Types (for `list_models`)
+ ## Model Types
+
+ Use these values with `list_models`:
 
  | Type | Use for |
  |------|---------|
- | `image` | Still-image generation |
+ | `image` | Image generation |
  | `video` | Text-to-video |
  | `video_from_image` | Image-to-video animation |
  | `music` | Music generation |
  | `speech` | Text-to-speech |
  | `sound` | Sound effects |
 
- ### Cost Awareness
-
- Creative generations bill against the user's Kolbo credit balance. Order of expense (rough):
- - **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s)
- - **Medium**: music (~30s-2min)
- - **Expensive**: video (~1-5min, highest credit cost)
-
- Rule of thumb: confirm intent before firing off a video generation unless the user was explicit. For images, just generate.
-
- ---
-
- ## Image Prompts
-
- ### Rules
- - **Clean prompts only.** No "Output:", "Tips:", "Notes:", "Resolution:", "Dimensions:", or any instructional/meta language inside the prompt. The prompt is what the model sees — anything not describing the image is noise.
- - **Length**: focused 2-3 sentences beats a bloated paragraph. Only go longer when the concept genuinely needs it (complex scenes, multiple subjects, specific technical requirements). Match prompt length to complexity.
- - **Order**: Subject → action/pose → environment → lighting → style.
- - **Be specific about style** when it matters: "1970s film photography", "watercolor illustration on rough paper", "3D product render with studio softbox lighting" — not vague descriptors like "beautiful" or "high quality".
- - **`enhance_prompt: true`** (default) will improve most prompts automatically. Turn it off only if the user's prompt is already fully engineered or they want literal wording.
-
- ### Image Editing (image-to-image)
- When the model can see the uploaded image, describe the **change**, not the unchanged parts.
- - Good: "Turn the sky orange and add drifting clouds"
- - Bad: "A mountain landscape with an orange sky and drifting clouds" (re-describes what's already in the image)
-
- Simple edits deserve simple prompts. Only elaborate for genuinely complex, multi-step transformations.
-
- ### Multi-Scene / Campaigns
- For storyboards, campaigns, or character-consistent sequences, call `generate_image` once per scene with the same base style cues carried across prompts. Kolbo's web app has a dedicated Creative Director feature for this; in the CLI the workflow is sequential `generate_image` calls.
-
- ---
-
- ## Video Prompts
-
- Video is the most expensive operation in the Kolbo catalog. Write prompts deliberately.
-
- ### Core Rules
- - **Order**: Subject → Action → Camera → Style → Constraints → Audio
- - **Length**: 80-280 words. Shorter = random. Longer = the model forgets the start.
- - **Always specify at least one camera movement per shot.** Even "static wide shot" is a valid explicit choice — just don't leave it unsaid.
- - **Character consistency**: when a character appears across shots, begin the prompt with the literal phrase `same character throughout all shots` to prevent identity drift.
- - **Max 3 shots per prompt.** More shots cause the model to drift.
- - **Duration-aware timecodes**: if the user gives a duration, space timecodes to fit (`[0s] [3s]` for 5s total; `[0s] [3s] [6s]` for 10s total). If no duration is given, describe shots sequentially without hardcoded timecodes.
-
- ### Image-to-Video
- The model can see the starting frame. Describe **what happens**, not what the image looks like. Focus on motion, camera, and action — don't re-describe the subject or setting.
- - Good: "Slow dolly-in on the subject. Her hair drifts in a light breeze. Soft particles float through the air. [6s]"
- - Bad: "A woman with long brown hair standing in a forest, wearing a red dress, with golden sunlight..." (re-describes the image)
-
- ### Camera Vocabulary
-
- Pick what fits the mood. Every shot gets at least one.
-
- | Movement | Use for |
- |----------|---------|
- | `slow dolly-in` | Building intensity, focus pull |
- | `pull-back` / `dolly out` | Scale reveal, loneliness, context |
- | `extreme low-angle` | Power, heroic framing |
- | `overhead top-down` | Geometry, pattern, abstraction |
- | `360° orbit` | Product showcase, bullet-time moments |
- | `handheld natural lag` | Urgency, documentary, grit |
- | `tracking shot` | Continuous follow of a subject |
- | `crash zoom` | Shock, impact moment |
- | `aerial pull-back` | Epic reveal, landscape scale |
- | `static drift` | Contemplative, subtle, meditative |
- | `crane up` / `crane down` | Grandeur, establishing, dismissal |
- | `whip pan` | Sharp transition, high energy |
-
- ### Physics Vocabulary (only name what matters for the scene)
-
- - **Cloth**: `cloth inertia`, `fabric lags behind movement`
- - **Water**: `water splashing with surface tension`, `droplets scattering`, `puddle mirror reflection`
- - **Sand / dust**: `sand displacement`, `radial dust shockwave`
- - **Hair**: `hair reacts to acceleration and wind`
- - **Impact**: `skin distorting on impact`, `delayed follow-through`
- - **Smoke**: `volumetric smoke curling and dissipating`
-
- Don't stuff every category in every prompt — only name the physics that genuinely drives the shot.
-
- ### Multi-Shot Format
-
- When the user wants a sequence (trailer, story, showcase), write each shot as a brief 1-2 sentence entry on its own line inside the prompt:
-
- ```
- Shot 1: [action + camera movement]
- Shot 2: [action + camera movement]
- Shot 3: [action + camera movement]
- ```
-
- Think like a director. Describe what **happens**, not what things **look** like.
-
- ### Mood Presets
-
- Pick techniques that match the user's intent. A calm landscape and an action sequence need different tools.
+ ## Tips
 
- - **Cinematic / dramatic**: slow dolly-in, anamorphic 2.39:1, shallow depth of field, volumetric light, subtle film grain
- - **Product showcase**: 360° orbit, clean white or gradient backdrop, macro detail inserts, smooth tracking
- - **Dreamy / ethereal**: slow crane up, soft diffused light, gentle particle drift, muted pastels, static drift moments
- - **Action / intense**: crash zoom, handheld natural lag, extreme slow-motion at the peak beat, high contrast, fast cuts
- - **Nature / landscape**: aerial pull-back, golden hour lighting, wind physics on foliage, wide establishing shots
- - **Abstract / motion graphics**: overhead top-down, geometric patterns, bold color blocks, rhythmic cutting
-
- ### Slow-Motion
-
- Extreme slow-motion is a tool, not a freeze frame. Always describe the micro-movements that *continue* during the slow beat (hair drifting, droplets crawling, fabric rippling), and specify the snap-back to full speed when relevant.
-
- Format: `extreme slow-motion [Xs] — [micro-movements in ultra slow-mo] — snap-back to full speed`
-
- ---
-
- ## Music Prompts
-
- Describe **genre → mood → instrumentation → tempo → era**, in that order.
-
- - `instrumental: true` excludes vocals.
- - `lyrics` accepts actual lyric text the model should sing.
- - `style` accepts short genre tags ("lo-fi hip hop", "orchestral cinematic", "80s synthwave").
- - Good: "Upbeat 80s synthwave, analog synths, gated reverb drums, 120 BPM, driving bassline, no vocals"
- - Bad: "A cool song" / "Something for a workout" (too vague)
-
- ---
-
- ## Speech (TTS)
-
- - Call `list_models` with `type: speech` to get voice identifiers. Pass the `identifier` as `model` for a consistent voice.
- - The voice **is** the model for speech — there is no separate voice parameter.
- - For long text, split at natural sentence boundaries. Each generation has a character cap; chunk long-form content into multiple calls.
- - For multilingual content, pick a voice that supports the target language from `list_models`.
-
- ---
-
- ## Sound Effects
-
- - Describe the sound **literally and physically**. Avoid emotional framing.
- - Good: "Heavy wooden door creaking open slowly, echoing in a stone hallway, followed by distant dripping water"
- - Bad: "A scary sound" / "Creepy atmosphere" (the model can't render emotions directly — render the physical source)
-
- ---
-
- ## Image Analysis (when the user uploads images)
-
- When the user shares an image and asks about it:
-
- - **Analyze thoroughly**: describe composition, subjects, colors, lighting, style, text/signage, setting, mood, visible objects, and any embedded information (charts, diagrams, screenshots).
- - **Reference specific regions** when helpful: "top-left corner", "in the foreground", "the figure on the right".
- - **Extract text verbatim** when asked (OCR-style requests are fine).
- - **Cannot identify real people.** Describe hair, clothing, pose, expression, and apparent role — but never name a specific individual, even a well-known public figure. If the user insists, decline and offer to describe instead.
- - **Copyrighted content**: summarize and reference, don't reproduce verbatim large chunks.
- - If the user wants an **edit** based on the analysis, hand off to `generate_video_from_image` (motion) or `generate_image` with an image-to-image model (visual edit) — see the Image Editing section above for prompt structure.
-
- ---
-
- ## Limitations & Safety
-
- - **Real people**: never identify specific real individuals in photos, even public figures. Describe visible attributes only.
- - **NSFW**: Kolbo enforces content safety at the model level. If a generation fails on safety grounds, rephrase the prompt rather than retrying identically.
- - **Copyright**: style references are fine (e.g. "in the style of Studio Ghibli"); verbatim reproduction of copyrighted material is not.
- - **No fabricated URLs**: only share URLs that actually came back from a tool call. Never guess a URL.
-
- ---
+ - **Images** are fastest (~10–30s). `enhance_prompt: true` is on by default.
+ - **Video** takes longest (~1–5 min). Check `supported_durations` and `supported_aspect_ratios` from `list_models` before generating.
+ - **Music** supports `style`, `instrumental`, and `lyrics` parameters.
+ - **Speech** pass a voice `identifier` from `list_models` for a consistent voice.
+ - If a video generation times out, use `get_generation_status` with the returned generation ID to retrieve the result.
+ - Models marked `recommended: true` in `list_models` are Kolbo's top picks for quality and speed.
 
  ## Examples
 
- Natural-language triggers that should prompt this skill + a tool call:
-
- - "Generate an image of a neon-lit Tokyo street at night" → `list_models` (image) → `generate_image`
- - "Create a 5-second cinematic video of ocean waves at sunset" → `list_models` (video) → `generate_video` with camera + mood guidance
- - "Animate this product photo with a 360° orbit" → `list_models` (video_from_image) → `generate_video_from_image`
- - "Make a lo-fi hip hop beat, instrumental, 85 BPM" → `list_models` (music) → `generate_music`
- - "Say this in English with a natural female voice: Welcome to Kolbo" → `list_models` (speech) → `generate_speech`
- - "Generate a door slam sound effect" → `list_models` (sound) → `generate_sound`
- - "What video models are available?" → `list_models` (video)
- - "How many credits do I have?" → `check_credits`
- - "What's in this image?" (with upload) → describe per the Image Analysis section; no tool call needed unless the user asks to generate or edit
+ > "Generate an image of a neon-lit Tokyo street at night"
+ > "Create a 5-second video of ocean waves"
+ > "Make a lo-fi hip hop beat, instrumental only"
+ > "Convert this text to speech: Welcome to Kolbo"
+ > "Animate this image into a short video"
+ > "What image models are available?"
+ > "Check my credit balance"
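The Workflow section in 1.5.0 describes an agent-side call sequence. You check credits, discover models with `list_models`, and generate. If a video times out, you fall back to `get_generation_status`.

The Python below is a minimal sketch of that sequence under stated assumptions. It is not Kolbo's implementation. The tool names and the `type`, `model`, `identifier`, and `recommended` fields come from the skill text. Everything else is a hypothetical guess at the real tool schemas:

- the `call_tool` adapter;
- the `prompt` argument;
- the `balance`, `models`, `url`, `generation_id`, and `status` response fields.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical adapter: (tool name, arguments) -> tool result as a dict.
# Connect it to whichever MCP client you actually use.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# `list_models` type values (from the skill text) -> generation tool names.
GENERATION_TOOLS = {
    "image": "generate_image",
    "video": "generate_video",
    "video_from_image": "generate_video_from_image",
    "music": "generate_music",
    "speech": "generate_speech",
    "sound": "generate_sound",
}


def pick_model(models: List[Dict[str, Any]]) -> Optional[str]:
    """Return the first `recommended: true` identifier, else None (Kolbo auto-selects)."""
    for m in models:
        if m.get("recommended"):
            return m.get("identifier")
    return None


def generate(call_tool: ToolCaller, media_type: str, prompt: str) -> str:
    tool = GENERATION_TOOLS[media_type]

    # 1. Check credits. ASSUMPTION: the result carries a numeric "balance".
    balance = call_tool("check_credits", {}).get("balance")
    if balance is not None and balance <= 0:
        raise RuntimeError("No Kolbo credits left")

    # 2. Discover current models instead of hardcoding identifiers.
    #    ASSUMPTION: the result lists models under a "models" key.
    models = call_tool("list_models", {"type": media_type}).get("models", [])
    args: Dict[str, Any] = {"prompt": prompt}  # ASSUMPTION: argument name "prompt"
    model = pick_model(models)
    if model is not None:
        args["model"] = model  # omitted otherwise, so Kolbo auto-selects

    # 3. Generate. The tool polls internally and should return the final URL.
    result = call_tool(tool, args)
    if result.get("url"):
        return result["url"]

    # 4. Timeout fallback: look the job up by ID.
    #    ASSUMPTION: field and argument are both named "generation_id".
    gen_id = result.get("generation_id")
    if gen_id is None:
        raise RuntimeError(f"{tool} returned neither a URL nor a generation ID")
    status = call_tool("get_generation_status", {"generation_id": gen_id})
    if not status.get("url"):
        raise RuntimeError(f"generation {gen_id} not ready: {status.get('status')}")
    return status["url"]


if __name__ == "__main__":
    # Canned responses standing in for a real MCP client (placeholder values).
    def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        canned = {
            "check_credits": {"balance": 120},
            "list_models": {"models": [{"identifier": "img-a"},
                                       {"identifier": "img-b", "recommended": True}]},
            "generate_image": {"url": "https://example.com/out.png"},
        }
        return canned[name]

    print(generate(fake_call_tool, "image", "A neon-lit Tokyo street at night"))
    # prints https://example.com/out.png
```

The sketch passes the `recommended: true` identifier when one exists, but that is only one option. When nothing is flagged, it omits `model`, which the skill describes as letting Kolbo auto-select.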