@kolbo/kolbo-code-linux-arm64-musl 2.1.5 → 2.1.6
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/kolbo +0 -0
- package/package.json +1 -1
- package/skills/kolbo/SKILL.md +33 -14
- package/skills/ollama-vision/SKILL.md +0 -105
package/bin/kolbo
CHANGED
Binary file

package/package.json
CHANGED

package/skills/kolbo/SKILL.md
CHANGED
@@ -182,24 +182,29 @@ The regular `srt_url` groups words into readable subtitle lines (default 12 word
 ### Long Content
 Transcription supports files up to 30 minutes. For longer content, split the file first or provide segments.
 
-### Visual Video/Audio Analysis
-`transcribe_audio` only extracts **speech**. If the user wants to understand **what's visually happening** in a video (scenes, actions, objects, on-screen text) or needs a multimodal AI to reason about the content, use `chat_send_message` with a video-capable model instead.
+### Visual Video/Audio/Image Analysis
 
-**
+**DEFAULT RULE: When the user shares a video or image file without a specific instruction, always do visual analysis — never ask, never default to transcription.**
 
-
-1. Upload the video with `upload_media` to get a stable CDN URL
-2. Call `chat_send_message` with the video URL in the message and a video-capable model (e.g. `gemini-2.5-pro`)
-3. Ask your analysis question: "Describe what happens in this video", "What products are shown?", "Summarize the key scenes"
+`transcribe_audio` is ONLY for when the user explicitly says "transcribe", "subtitles", "SRT", or "what's being said". Everything else — "what do you see?", "describe this", "analyze this", "what's in this video?", "what prompts are shown?", or just pasting a file path with no instruction — is visual analysis via Gemini.
 
-**
+**NEVER use ffmpeg, `ollama-vision`, or extract frames manually. NEVER ask the user whether to transcribe or analyze — just execute visual analysis.**
 
-
-
-
-
-
-
+**Workflow for visual analysis (do this every time):**
+1. `upload_media({ source: "/absolute/local/path/to/file.mp4" })` → get CDN URL (skip if already a public URL)
+2. `chat_send_message({ message: "<your question>", model: "gemini-2.5-pro", media_urls: ["<cdn-url>"] })`
+
+**Routing table — commit to an action, do not ask:**
+
+| Trigger | Action |
+|---------|--------|
+| User says "transcribe" / "subtitles" / "SRT" / "what's being said" | `transcribe_audio` only |
+| User says "analyze" / "describe" / "what do you see" / "what's in this" / "what's happening" | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
+| User shares a file path or video URL with no instruction | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
+| User shares a video and asks about on-screen text / prompts / UI | Visual analysis — `upload_media` → `chat_send_message` + Gemini |
+| User wants both transcript AND visual description | Both — run `transcribe_audio` AND `chat_send_message` + Gemini |
+
+When in doubt, do visual analysis. Do not stop to ask.
 
 ---
 
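The routing rules this version adds to SKILL.md reduce to a small keyword dispatch. The sketch below is an editorial illustration reconstructed from the diff's trigger phrases, not code that ships with the package; the function name `route` and the plain substring matching are assumptions.

```python
def route(message: str) -> list[str]:
    """Map a user request to tool calls, per the SKILL.md routing table (illustrative)."""
    msg = message.lower()
    transcribe = any(t in msg for t in ("transcribe", "subtitle", "srt", "what's being said"))
    visual = any(t in msg for t in ("analyze", "describe", "what do you see",
                                    "what's in this", "what's happening"))
    if transcribe and visual:
        # Last table row: user wants both transcript and visual description.
        return ["transcribe_audio", "chat_send_message"]
    if transcribe:
        return ["transcribe_audio"]
    # Everything else, including a bare file path with no instruction, is visual analysis.
    return ["upload_media", "chat_send_message"]
```

Note the deliberate bias: any input that does not explicitly ask for a transcript falls through to visual analysis, matching the "when in doubt" rule above.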
@@ -429,6 +434,20 @@ When the user shares an image and asks about it:
 
 ---
 
+## Sharing HTML Artifacts
+
+When you generate an HTML, SVG, or Mermaid artifact in the chat, a **Share** button appears in the artifact preview toolbar (next to Desktop / Mobile). Clicking it:
+
+1. Uploads the artifact to Kolbo's hosting platform
+2. Copies a permanent public URL to the clipboard (e.g. `https://api.kolbo.ai/api/shared-artifact-raw/<token>`)
+3. Shows a toast confirming the link was copied
+
+Anyone with the URL can view the rendered page — no login required.
+
+**Requirements:** You must be logged in (`kolbo auth login`). The share button returns an error toast if you are not authenticated.
+
+---
+
 ## Kolbo Code Documentation
 
 Full public documentation for Kolbo Code (the CLI you are running inside) lives at **[docs.kolbo.ai/docs/kolbo-code](https://docs.kolbo.ai/docs/kolbo-code)**. If the user asks about installation, authentication, voice input, supported languages, commands, or how to uninstall, point them to the matching page below rather than guessing:

package/skills/ollama-vision/SKILL.md
REMOVED
@@ -1,105 +0,0 @@
----
-name: ollama-vision
-description: >
-  Batch image analysis using local Ollama + gemma4 (multimodal).
-  Use when the user needs to analyze, caption, classify, or extract text from images locally —
-  free, offline, no rate limits, no API key needed.
-  Keywords: image analysis, batch images, captions, OCR, vision, gemma4, ollama, local AI
----
-
-# Ollama Vision — Batch Image Analysis with gemma4
-
-## Setup (already done on this machine)
-
-- Ollama installed and running (auto-starts on Windows boot)
-- Model: `gemma4` (9.6 GB, multimodal)
-- Python package: `ollama` v0.6.1 installed (pip, Python 3.10)
-- REST API available at `http://localhost:11434`
-
-## Core Pattern
-
-```python
-import ollama
-
-response = ollama.chat(model='gemma4', messages=[{
-    'role': 'user',
-    'content': 'Your prompt here',
-    'images': ['path/to/image.jpg']  # omit for text-only
-}])
-print(response['message']['content'])
-```
-
-## Batch Image Captioning Script
-
-```python
-import ollama
-from pathlib import Path
-import csv
-
-def caption_images(folder: str, prompt: str = "Write a short caption for this image.", output_csv: str = "captions.csv"):
-    images_dir = Path(folder)
-    extensions = {'.jpg', '.jpeg', '.png', '.webp', '.gif', '.bmp'}
-    image_files = [f for f in images_dir.iterdir() if f.suffix.lower() in extensions]
-
-    results = []
-    for i, img_path in enumerate(image_files, 1):
-        print(f"[{i}/{len(image_files)}] Processing {img_path.name}...")
-        try:
-            response = ollama.chat(model='gemma4', messages=[{
-                'role': 'user',
-                'content': prompt,
-                'images': [str(img_path)]
-            }])
-            caption = response['message']['content'].strip()
-            results.append({'file': img_path.name, 'caption': caption})
-            print(f"  → {caption[:80]}...")
-        except Exception as e:
-            print(f"  ERROR: {e}")
-            results.append({'file': img_path.name, 'caption': f'ERROR: {e}'})
-
-    with open(output_csv, 'w', newline='', encoding='utf-8') as f:
-        writer = csv.DictWriter(f, fieldnames=['file', 'caption'])
-        writer.writeheader()
-        writer.writerows(results)
-
-    print(f"\nDone! Saved {len(results)} captions to {output_csv}")
-
-# Usage
-caption_images("./images", prompt="Describe this image in one sentence.")
-```
-
-## Common Prompts
-
-| Task | Prompt |
-|------|--------|
-| Caption | `"Write a short, descriptive caption for this image."` |
-| Alt text | `"Write alt text for this image for accessibility."` |
-| Classification | `"What category does this image belong to? Reply with one word."` |
-| OCR | `"Extract all text visible in this image."` |
-| Product description | `"Write a product description for the item shown in this image."` |
-| Social media | `"Write a catchy Instagram caption for this image."` |
-
-## REST API Alternative (no Python package needed)
-
-```python
-import requests, base64
-
-def analyze_image(image_path: str, prompt: str) -> str:
-    with open(image_path, "rb") as f:
-        img_b64 = base64.b64encode(f.read()).decode()
-
-    response = requests.post("http://localhost:11434/api/generate", json={
-        "model": "gemma4",
-        "prompt": prompt,
-        "images": [img_b64],
-        "stream": False
-    })
-    return response.json()["response"]
-```
-
-## Tips
-
-- gemma4 handles JPG, PNG, WEBP, GIF, BMP
-- For large batches, add `time.sleep(0.5)` between requests to avoid overloading
-- Results are best when prompts are specific ("describe the main subject" vs "describe this")
-- Ollama must be running — check with `ollama list` in terminal