@kolbo/kolbo-code-linux-arm64-musl 2.1.4 → 2.1.6

package/bin/kolbo CHANGED
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@kolbo/kolbo-code-linux-arm64-musl",
- "version": "2.1.4",
+ "version": "2.1.6",
  "os": [
  "linux"
  ],
@@ -98,12 +98,36 @@ You have direct access to the Kolbo AI creative platform via MCP tools (auto-con

  ### Cost Awareness

- Creative generations bill against the user's Kolbo credit balance. Order of expense (rough):
- - **Cheap & fast**: speech (~5-30s), sound effects (~5-30s), image (~10-30s), transcription (by duration)
- - **Medium**: music (~30s-2min), 3D (~1-3min)
- - **Expensive**: video (~1-5min, highest credit cost), lipsync (~1-3min)
-
- Rule of thumb: confirm intent before firing off a video generation unless the user was explicit. For images, just generate.
+ Creative generations bill against the user's Kolbo credit balance. **Billing units differ by type** — always apply the correct formula before generating.
+
+ | Type | Billing unit | Credit range | Example |
+ |------|-------------|-------------|---------|
+ | **Image** | per image (flat) | 1–30 cr | Flux.1 Fast = 1 cr, Midjourney = 4 cr, 4K variants cost more |
+ | **Image edit** | per image (flat) | 2–20 cr | |
+ | **Video** | **cr/s × duration** | 2–30 cr/s | Kandinsky 5 Fast × 5s = 10 cr; Seedance 2.0 × 10s = 300 cr |
+ | **Video from image** | **cr/s × duration** | 4–30 cr/s | Same per-second rule as text-to-video |
+ | **Lipsync** | **cr/s × duration** | 5–20 cr/s | |
+ | **Music** | per generation (flat) | 15–60 cr | Suno v5 = 15 cr; ElevenLabs Music = 60 cr |
+ | **Speech (TTS)** | per 100 characters | 2–5 cr/100 chars | ElevenLabs (5) × 500 chars = 25 cr; Google (2) × 500 chars = 10 cr |
+ | **Sound effects** | per generation (flat) | 4–7 cr | |
+ | **3D model** | per model (flat) | 5–300 cr | Trellis = 5 cr; Meshy v6 = 150 cr; Marble 1.1 = 300 cr |
+ | **Transcription (STT)** | per minute of audio | model credit × duration in minutes | |
+
+ **Calculation formulas — always apply before generating:**
+ - **Video / Lipsync**: `total = model_credit_per_second × duration_seconds`
+   - Always call `list_models` first to get the exact `credit` value, then multiply by the requested duration.
+   - Never assume the credit shown is a flat per-generation cost for these types.
+ - **Music**: flat per generation — `total = model_credit` (duration does not change the cost).
+ - **TTS**: `total = model_credit × ceil(character_count / 100)`
+   - Count the actual characters in the text before estimating. 1000 chars with ElevenLabs = 50 credits.
+ - **Images / 3D / Sound effects**: `total = model_credit × quantity`
+
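Since these formulas get applied before every generation, here is a minimal sketch of the arithmetic as code. Everything in it is illustrative: `estimate_credits` and its argument names are hypothetical, and the per-model `credit` value is assumed to come from `list_models` as the text above instructs.

```python
import math

def estimate_credits(gen_type: str, model_credit: float,
                     duration_seconds: float = 0,
                     char_count: int = 0,
                     quantity: int = 1) -> float:
    """Hypothetical helper applying the billing formulas from the table above."""
    if gen_type in ("video", "video_from_image", "lipsync"):
        # Per-second billing: the model's credit is cr/s, never a flat cost.
        return model_credit * duration_seconds
    if gen_type == "tts":
        # Billed per started block of 100 characters.
        return model_credit * math.ceil(char_count / 100)
    if gen_type == "music":
        # Flat per generation; duration does not change the cost.
        return model_credit
    # Images, image edits, 3D models, sound effects: flat per item.
    return model_credit * quantity

# Worked examples matching the table:
assert estimate_credits("video", 2, duration_seconds=5) == 10     # Kandinsky 5 Fast
assert estimate_credits("video", 30, duration_seconds=10) == 300  # Seedance 2.0
assert estimate_credits("tts", 5, char_count=500) == 25           # ElevenLabs
```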
+ **When to confirm before generating:**
+ - Any video or lipsync generation — always state the estimated credit cost before firing. Formula: `credit/s × seconds`.
+ - Music — state the flat credit cost (from `list_models`) before generating.
+ - TTS with more than 500 characters — mention the cost first.
+ - 3D models with `credit ≥ 100` — confirm before generating.
+ - Images: just generate unless the balance is low.
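The confirmation rules above reduce to a single predicate. A sketch with hypothetical names, using exactly the thresholds listed:

```python
def needs_confirmation(gen_type: str, model_credit: float,
                       char_count: int = 0) -> bool:
    """Mirror of the confirmation rules above (names are illustrative)."""
    if gen_type in ("video", "lipsync", "music"):
        return True                   # always state the cost first
    if gen_type == "tts":
        return char_count > 500       # long TTS: mention the cost
    if gen_type == "3d":
        return model_credit >= 100    # expensive 3D models
    return False                      # images: just generate (unless balance is low)
```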

  ### Rate Limiting
  Kolbo enforces **10 generation requests per minute per user per tool type** (e.g. 10 image calls + 10 video calls = fine, but 11 image calls in 1 minute = rate limited). General media requests are capped at **300 per minute**.
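For scripted batches, a client-side guard makes these limits concrete. This is an illustrative sliding-window sketch, not Kolbo's server-side implementation; the 10-per-minute figure comes from the paragraph above.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Client-side guard for the per-tool-type limit (illustrative only)."""
    def __init__(self, max_per_minute: int = 10):
        self.max = max_per_minute
        self.calls: dict[str, deque] = defaultdict(deque)

    def wait_for_slot(self, tool_type: str) -> None:
        q = self.calls[tool_type]
        now = time.monotonic()
        while q and now - q[0] >= 60:      # drop calls older than the window
            q.popleft()
        if len(q) >= self.max:             # window full: wait out the oldest call
            time.sleep(60 - (now - q[0]))
            q.popleft()
        q.append(time.monotonic())

limiter = SlidingWindowLimiter()
# limiter.wait_for_slot("image")  # call before each image generation
```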
@@ -158,24 +182,29 @@ The regular `srt_url` groups words into readable subtitle lines (default 12 word
  ### Long Content
  Transcription supports files up to 30 minutes. For longer content, split the file first or provide segments.

- ### Visual Video/Audio Analysis (what's happening, not just what's said)
- `transcribe_audio` only extracts **speech**. If the user wants to understand **what's visually happening** in a video (scenes, actions, objects, on-screen text) or needs a multimodal AI to reason about the content, use `chat_send_message` with a video-capable model instead.
+ ### Visual Video/Audio/Image Analysis
+
+ **DEFAULT RULE: When the user shares a video or image file without a specific instruction, always do visual analysis — never ask, never default to transcription.**
+
+ `transcribe_audio` is ONLY for when the user explicitly says "transcribe", "subtitles", "SRT", or "what's being said". Everything else — "what do you see?", "describe this", "analyze this", "what's in this video?", "what prompts are shown?", or just pasting a file path with no instruction — is visual analysis via Gemini.

- **Video-capable models**: `gemini-2.5-pro`, `gemini-2.5-flash` these can watch video and analyze visual content.
+ **NEVER use ffmpeg or `ollama-vision`, and NEVER extract frames manually. NEVER ask the user whether to transcribe or analyze — just execute visual analysis.**

- **Workflow for visual analysis:**
- 1. Upload the video with `upload_media` to get a stable CDN URL
- 2. Call `chat_send_message` with the video URL in the message and a video-capable model (e.g. `gemini-2.5-pro`)
- 3. Ask your analysis question: "Describe what happens in this video", "What products are shown?", "Summarize the key scenes"
+ **Workflow for visual analysis (do this every time):**
+ 1. `upload_media({ source: "/absolute/local/path/to/file.mp4" })` → get CDN URL (skip if already a public URL)
+ 2. `chat_send_message({ message: "<your question>", model: "gemini-2.5-pro", media_urls: ["<cdn-url>"] })`
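Written out as code, the two-step workflow looks like the sketch below. The wrapper names `kolbo_upload_media` and `kolbo_chat_send_message`, and the `url`/`content` response fields, are hypothetical stand-ins for however an MCP client exposes these tools; the payload shapes follow the two steps above.

```python
# Placeholders: bind these to however your MCP client invokes Kolbo's tools.
def kolbo_upload_media(args: dict) -> dict: ...
def kolbo_chat_send_message(args: dict) -> dict: ...

def analyze_video_visually(local_path: str, question: str) -> str:
    # Step 1: upload to get a stable CDN URL (skip when you already have a public URL)
    upload = kolbo_upload_media({"source": local_path})
    # Step 2: ask a video-capable model about the visual content
    reply = kolbo_chat_send_message({
        "message": question,
        "model": "gemini-2.5-pro",
        "media_urls": [upload["url"]],  # "url" is an assumed response field
    })
    return reply["content"]             # "content" is an assumed response field
```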

- **When to use which:**
+ **Routing table — commit to an action, do not ask:**

- | User intent | Tool |
- |-------------|------|
- | "Transcribe this" / "What's being said?" | `transcribe_audio` |
- | "Generate subtitles" / "Word-by-word timing" | `transcribe_audio` |
- | "What's happening in this video?" / "Describe the scenes" | `chat_send_message` + Gemini |
- | "Analyze this video and transcribe it" | Both — `transcribe_audio` for text + `chat_send_message` for visual |
+ | Trigger | Action |
+ |---------|--------|
+ | User says "transcribe" / "subtitles" / "SRT" / "what's being said" | `transcribe_audio` only |
+ | User says "analyze" / "describe" / "what do you see" / "what's in this" / "what's happening" | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
+ | User shares a file path or video URL with no instruction | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
+ | User shares a video and asks about on-screen text / prompts / UI | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
+ | User wants both transcript AND visual description | Both — run `transcribe_audio` AND `chat_send_message` + Gemini |
+
+ When in doubt, do visual analysis. Do not stop to ask.
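The routing table collapses to a keyword check whose fallback is visual analysis, which is the point of the rule above. An illustrative sketch (names hypothetical):

```python
TRANSCRIBE_TRIGGERS = ("transcribe", "subtitles", "srt", "what's being said")
VISUAL_TRIGGERS = ("analyze", "describe", "what do you see", "what's in this",
                   "what's happening")

def route(user_message: str) -> set[str]:
    """Map a request to tool(s) per the routing table; the default is visual analysis."""
    text = user_message.lower()
    wants_transcript = any(t in text for t in TRANSCRIBE_TRIGGERS)
    wants_visual = any(t in text for t in VISUAL_TRIGGERS)
    if wants_transcript and wants_visual:
        return {"transcribe_audio", "chat_send_message"}  # run both
    if wants_transcript:
        return {"transcribe_audio"}
    return {"chat_send_message"}  # no instruction, or a visual ask: visual analysis
```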

  ---

@@ -405,6 +434,20 @@ When the user shares an image and asks about it:

  ---

+ ## Sharing HTML Artifacts
+
+ When you generate an HTML, SVG, or Mermaid artifact in the chat, a **Share** button appears in the artifact preview toolbar (next to Desktop / Mobile). Clicking it:
+
+ 1. Uploads the artifact to Kolbo's hosting platform
+ 2. Copies a permanent public URL to the clipboard (e.g. `https://api.kolbo.ai/api/shared-artifact-raw/<token>`)
+ 3. Shows a toast confirming the link was copied
+
+ Anyone with the URL can view the rendered page — no login required.
+
+ **Requirements:** You must be logged in (`kolbo auth login`). The share button returns an error toast if you are not authenticated.
+
+ ---
+
  ## Kolbo Code Documentation

  Full public documentation for Kolbo Code (the CLI you are running inside) lives at **[docs.kolbo.ai/docs/kolbo-code](https://docs.kolbo.ai/docs/kolbo-code)**. If the user asks about installation, authentication, voice input, supported languages, commands, or how to uninstall, point them to the matching page below rather than guessing:
@@ -1,105 +0,0 @@
- ---
- name: ollama-vision
- description: >
-   Batch image analysis using local Ollama + gemma4 (multimodal).
-   Use when the user needs to analyze, caption, classify, or extract text from images locally —
-   free, offline, no rate limits, no API key needed.
-   Keywords: image analysis, batch images, captions, OCR, vision, gemma4, ollama, local AI
- ---
-
- # Ollama Vision — Batch Image Analysis with gemma4
-
- ## Setup (already done on this machine)
-
- - Ollama installed and running (auto-starts on Windows boot)
- - Model: `gemma4` (9.6 GB, multimodal)
- - Python package: `ollama` v0.6.1 installed (pip, Python 3.10)
- - REST API available at `http://localhost:11434`
-
- ## Core Pattern
-
- ```python
- import ollama
-
- response = ollama.chat(model='gemma4', messages=[{
-     'role': 'user',
-     'content': 'Your prompt here',
-     'images': ['path/to/image.jpg']  # omit for text-only
- }])
- print(response['message']['content'])
- ```
-
- ## Batch Image Captioning Script
-
- ```python
- import ollama
- from pathlib import Path
- import csv
-
- def caption_images(folder: str, prompt: str = "Write a short caption for this image.", output_csv: str = "captions.csv"):
-     images_dir = Path(folder)
-     extensions = {'.jpg', '.jpeg', '.png', '.webp', '.gif', '.bmp'}
-     image_files = [f for f in images_dir.iterdir() if f.suffix.lower() in extensions]
-
-     results = []
-     for i, img_path in enumerate(image_files, 1):
-         print(f"[{i}/{len(image_files)}] Processing {img_path.name}...")
-         try:
-             response = ollama.chat(model='gemma4', messages=[{
-                 'role': 'user',
-                 'content': prompt,
-                 'images': [str(img_path)]
-             }])
-             caption = response['message']['content'].strip()
-             results.append({'file': img_path.name, 'caption': caption})
-             print(f"  → {caption[:80]}...")
-         except Exception as e:
-             print(f"  ERROR: {e}")
-             results.append({'file': img_path.name, 'caption': f'ERROR: {e}'})
-
-     with open(output_csv, 'w', newline='', encoding='utf-8') as f:
-         writer = csv.DictWriter(f, fieldnames=['file', 'caption'])
-         writer.writeheader()
-         writer.writerows(results)
-
-     print(f"\nDone! Saved {len(results)} captions to {output_csv}")
-
- # Usage
- caption_images("./images", prompt="Describe this image in one sentence.")
- ```
-
- ## Common Prompts
-
- | Task | Prompt |
- |------|--------|
- | Caption | `"Write a short, descriptive caption for this image."` |
- | Alt text | `"Write alt text for this image for accessibility."` |
- | Classification | `"What category does this image belong to? Reply with one word."` |
- | OCR | `"Extract all text visible in this image."` |
- | Product description | `"Write a product description for the item shown in this image."` |
- | Social media | `"Write a catchy Instagram caption for this image."` |
-
- ## REST API Alternative (no Python package needed)
-
- ```python
- import requests, base64
-
- def analyze_image(image_path: str, prompt: str) -> str:
-     with open(image_path, "rb") as f:
-         img_b64 = base64.b64encode(f.read()).decode()
-
-     response = requests.post("http://localhost:11434/api/generate", json={
-         "model": "gemma4",
-         "prompt": prompt,
-         "images": [img_b64],
-         "stream": False
-     })
-     return response.json()["response"]
- ```
-
- ## Tips
-
- - gemma4 handles JPG, PNG, WEBP, GIF, BMP
- - For large batches, add `time.sleep(0.5)` between requests to avoid overloading
- - Results are best when prompts are specific ("describe the main subject" vs "describe this")
- - Ollama must be running — check with `ollama list` in terminal