@biggora/claude-plugins 1.0.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/settings.local.json +13 -0
- package/CLAUDE.md +55 -0
- package/LICENSE +1 -1
- package/README.md +208 -39
- package/bin/cli.js +39 -0
- package/package.json +30 -17
- package/registry/registry.json +166 -1
- package/registry/schema.json +10 -0
- package/src/commands/skills/add.js +194 -0
- package/src/commands/skills/list.js +52 -0
- package/src/commands/skills/remove.js +27 -0
- package/src/commands/skills/update.js +74 -0
- package/src/config.js +5 -0
- package/src/skills/codex-cli/SKILL.md +265 -0
- package/src/skills/commafeed-api/SKILL.md +1012 -0
- package/src/skills/gemini-cli/SKILL.md +379 -0
- package/src/skills/gemini-cli/references/commands.md +145 -0
- package/src/skills/gemini-cli/references/configuration.md +182 -0
- package/src/skills/gemini-cli/references/headless-and-scripting.md +181 -0
- package/src/skills/gemini-cli/references/mcp-and-extensions.md +254 -0
- package/src/skills/n8n-api/SKILL.md +623 -0
- package/src/skills/notebook-lm/SKILL.md +217 -0
- package/src/skills/notebook-lm/references/artifact-options.md +168 -0
- package/src/skills/notebook-lm/references/auth.md +58 -0
- package/src/skills/notebook-lm/references/workflows.md +144 -0
- package/src/skills/screen-recording/SKILL.md +309 -0
- package/src/skills/screen-recording/references/approach1-programmatic.md +311 -0
- package/src/skills/screen-recording/references/approach2-xvfb.md +232 -0
- package/src/skills/screen-recording/references/design-patterns.md +168 -0
- package/src/skills/test-mobile-app/SKILL.md +212 -0
- package/src/skills/test-mobile-app/references/report-template.md +95 -0
- package/src/skills/test-mobile-app/references/setup-appium.md +154 -0
- package/src/skills/test-mobile-app/scripts/analyze_apk.py +164 -0
- package/src/skills/test-mobile-app/scripts/check_environment.py +116 -0
- package/src/skills/test-mobile-app/scripts/generate_report.py +250 -0
- package/src/skills/test-mobile-app/scripts/run_tests.py +326 -0
- package/src/skills/test-web-ui/SKILL.md +232 -0
- package/src/skills/test-web-ui/references/test_case_schema.md +102 -0
- package/src/skills/test-web-ui/scripts/discover.py +176 -0
- package/src/skills/test-web-ui/scripts/generate_report.py +237 -0
- package/src/skills/test-web-ui/scripts/run_tests.py +296 -0
- package/src/skills/text-to-speech/SKILL.md +236 -0
- package/src/skills/text-to-speech/references/espeak-cli.md +277 -0
- package/src/skills/text-to-speech/references/kokoro-onnx.md +124 -0
- package/src/skills/text-to-speech/references/online-engines.md +128 -0
- package/src/skills/text-to-speech/references/pyttsx3-espeak.md +143 -0
- package/src/skills/tm-search/SKILL.md +240 -0
- package/src/skills/tm-search/references/field-guide.md +79 -0
- package/src/skills/tm-search/references/scraping-fallback.md +140 -0
- package/src/skills/tm-search/scripts/tm_search.py +375 -0
- package/src/skills/wp-rest-api/SKILL.md +114 -0
- package/src/skills/wp-rest-api/references/authentication.md +18 -0
- package/src/skills/wp-rest-api/references/custom-content-types.md +20 -0
- package/src/skills/wp-rest-api/references/discovery-and-params.md +20 -0
- package/src/skills/wp-rest-api/references/responses-and-fields.md +30 -0
- package/src/skills/wp-rest-api/references/routes-and-endpoints.md +36 -0
- package/src/skills/wp-rest-api/references/schema.md +22 -0
- package/src/skills/youtube-search/SKILL.md +412 -0
- package/src/skills/youtube-search/references/parsing-examples.md +159 -0
- package/src/skills/youtube-search/references/youtube-api-quota.md +85 -0
- package/src/skills/youtube-thumbnail/SKILL.md +1060 -0
- package/tests/commands/info.test.js +49 -0
- package/tests/commands/install.test.js +36 -0
- package/tests/commands/list.test.js +66 -0
- package/tests/commands/publish.test.js +182 -0
- package/tests/commands/search.test.js +45 -0
- package/tests/commands/uninstall.test.js +29 -0
- package/tests/commands/update.test.js +59 -0
- package/tests/functional/skills-lifecycle.test.js +293 -0
- package/tests/helpers/fixtures.js +63 -0
- package/tests/integration/cli.test.js +83 -0
- package/tests/skills/add.test.js +138 -0
- package/tests/skills/list.test.js +63 -0
- package/tests/skills/remove.test.js +38 -0
- package/tests/skills/update.test.js +60 -0
- package/tests/unit/config.test.js +31 -0
- package/tests/unit/registry.test.js +79 -0
- package/tests/unit/utils.test.js +150 -0
- package/tests/validation/registry-schema.test.js +112 -0
- package/tests/validation/skills-validation.test.js +96 -0
@@ -0,0 +1,1060 @@
# 🎨 YouTube Thumbnail Generation Skill (2026 Edition)

## Overview

This skill enables **fully autonomous** generation of professional YouTube thumbnails
in 11 strategic styles from THUMBNAILS.md. Zero user interaction is required after the
initial request. The agent auto-selects a style, builds a prompt, generates the base
image via the best available AI backend, applies compositing and text via Pillow,
and saves a final **1280×720 PNG** to `/mnt/user-data/outputs/thumbnail.png`.

---
## When to Trigger This Skill

Trigger when the user:
- Says "create a thumbnail", "make a thumbnail", "generate a YouTube thumbnail"
- Provides a video title or topic and needs a visual
- Wants product video covers, presentation title slides, or channel art
- Needs batch thumbnail creation or automation for a YouTube workflow

---

## Architecture: Two-Layer Pipeline
```
[User Request]
      │
      ▼
[Layer 1: AI Image Generation] → Base image via best available backend
      │
      ▼
[Layer 2: Pillow Compositing] → Resize to 1280×720, effects, text overlay
      │
      ▼
[Output: /mnt/user-data/outputs/thumbnail.png]
```

---
## Backend Priority Table

Auto-detect each backend by testing availability. Use the first one that works.

| # | Backend | Detection Method | Quality | Cost |
|---|---------|-----------------|---------|------|
| 1 | **A1111 Local SD** | `GET localhost:7860/sdapi/v1/samplers` | ★★★★ | Free (own GPU) |
| 2 | **ComfyUI Local** | `GET localhost:8188/history` | ★★★★ | Free (own GPU) |
| 3 | **MCP Imagen 4** (Vertex AI) | `which mcp-imagen-go` + `~/.gemini/settings.json` | ★★★★★ | Vertex AI pricing |
| 4 | **Gemini API** (Nano Banana 2) | `GEMINI_API_KEY` env + `google-genai` installed | ★★★★ | Free quota |
| 5 | **fal.ai FLUX** | `FAL_KEY` env + `fal-client` installed | ★★★★ | $0.03/MP |
| 6 | **OpenAI gpt-image-1** | `OPENAI_API_KEY` env + `openai` installed | ★★★★ | ~$0.04/img |
| 7 | **Pillow-only fallback** | Always available | ★★ | Free |

---
## The 11 Thumbnail Styles

### Style 1: Neo-Minimalism (`style_1_minimalism`)
**Best for:** General niches, standing out in a cluttered feed
**Core idea:** If the feed is loud, go quiet. 50%+ negative space.
**AI prompt pattern:**
`"[subject], minimalist product photography, pure white background, single centered
subject, dramatic soft studio lighting, ultra clean composition, no clutter"`
**Pillow:** White/monochromatic bg, max 2 colors, light serif font bottom-left or none

### Style 2: The Surround (`style_2_surround`)
**Best for:** Comparisons, "I tried X things", hauls
**Core idea:** Subject dead center, objects in an organized circle/grid around it.
**AI prompt pattern:**
`"[subject] perfectly centered, multiple [related objects] arranged in organized
circle or grid around center subject, controlled chaos, vibrant, top-down angle"`
**Pillow:** Grid math → center subject at 40% of canvas, surrounding items equally spaced by angle
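
The equal angular spacing is simple trigonometry. A minimal sketch (the item count, ring radius, and solid-color stand-in tiles are illustrative assumptions, not part of the skill):

```python
import math
from PIL import Image

def surround_positions(n_items, cx=640, cy=360, radius=260):
    """Centers for n_items spaced equally by angle around (cx, cy), starting at 12 o'clock."""
    positions = []
    for i in range(n_items):
        theta = 2 * math.pi * i / n_items - math.pi / 2
        positions.append((int(cx + radius * math.cos(theta)),
                          int(cy + radius * math.sin(theta))))
    return positions

canvas = Image.new("RGB", (1280, 720), (240, 240, 245))
item = Image.new("RGB", (120, 120), (200, 60, 60))  # stand-in for a real item image
for x, y in surround_positions(6):
    canvas.paste(item, (x - 60, y - 60))  # offset so each item is centered on its ring point
```

The center subject (40% of the canvas) would then be pasted last so it sits on top of the ring.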
### Style 3: Rainbow Ranking (`style_3_rainbow`)
**Best for:** Tier lists, "Best to Worst", reviews
**Core idea:** A color gradient (Red→Blue) conveys hierarchy visually.
**AI prompt pattern:**
`"flat lay of [3-7 items] arranged in ranking order, color gradient from red to
blue across items, product photography style, clean background"`
**Pillow:** Apply a gradient color wash per item via `ImageEnhance.Color`, add rank numbers (1, 2, 3…) in bold white
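
One way to approximate the red→blue wash is to blend each item tile with a solid tint (note `ImageEnhance.Color` only scales saturation, so a tint blend does the actual hue work; tile sizes and blend strength here are illustrative):

```python
from PIL import Image

def rank_tint(index, total):
    """Linear red→blue gradient color for rank `index` (0-based)."""
    t = index / max(total - 1, 1)
    return (int(255 * (1 - t)), 40, int(255 * t))

def wash(tile, color, strength=0.35):
    """Blend a solid tint over the tile; strength 0 = original, 1 = pure tint."""
    overlay = Image.new("RGB", tile.size, color)
    return Image.blend(tile, overlay, strength)

tiles = [Image.new("RGB", (200, 200), (128, 128, 128)) for _ in range(5)]
washed = [wash(t, rank_tint(i, 5)) for i, t in enumerate(tiles)]
```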
### Style 4: Educational Whiteboard (`style_4_whiteboard`)
**Best for:** Tutorials, business explainers, complex systems
**Core idea:** Authenticity over polish. Signals "high value, no fluff."
**AI prompt pattern:**
`"hand-drawn diagram on real whiteboard explaining [concept], chalk markers,
rough sketchy educational style, authentic classroom feel, [topic] framework"`
**Pillow:** Reduce saturation to 70% for authenticity, warm color grade, handwritten-style font

### Style 5: Familiar Interface (`style_5_ui_framing`)
**Best for:** Commentary, news, reviews
**Core idea:** Borrow credibility from known platforms (Twitter, Reddit, Amazon).
**AI prompt pattern:**
`"realistic screenshot mockup of [Twitter post / Reddit thread / Amazon listing /
Netflix menu] about [topic], exact platform UI styling, authentic spacing and fonts"`
**Pillow:** Programmatically draw platform UI elements: rounded rectangles, brand colors
(Twitter #1DA1F2, Reddit #FF4500, Amazon #FF9900)
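
A rough sketch of that approach, using Pillow's `ImageDraw.rounded_rectangle` (the card geometry and neutral colors are illustrative; only the brand hex values come from the list above):

```python
from PIL import Image, ImageDraw

BRAND = {"twitter": "#1DA1F2", "reddit": "#FF4500", "amazon": "#FF9900"}

def draw_platform_card(platform="twitter", size=(1280, 720)):
    """Minimal fake-post card: white rounded rectangle with a brand-colored accent strip."""
    img = Image.new("RGB", size, (235, 238, 240))
    d = ImageDraw.Draw(img)
    # Card body, centered with an 80 px margin
    d.rounded_rectangle((80, 120, size[0] - 80, size[1] - 120),
                        radius=24, fill="white", outline=(210, 210, 210), width=2)
    # Brand-colored accent strip along the card top
    d.rounded_rectangle((80, 120, size[0] - 80, 180), radius=24, fill=BRAND[platform])
    # Placeholder avatar circle
    d.ellipse((120, 210, 200, 290), fill=(200, 200, 205))
    return img

card = draw_platform_card("reddit")
```

From there, username/handle text and the quoted post body would be drawn in the platform's usual spots.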
### Style 6: Cinematic Text (`style_6_cinematic`)
**Best for:** High-production storytelling, documentaries
**Core idea:** Text IS a design element, embedded in the world rather than floating over it.
**AI prompt pattern:**
`"cinematic movie still about [subject], dramatic chiaroscuro lighting, film grain,
anamorphic lens flare, shallow depth of field, golden hour or moody tones"`
**Pillow:** MAX 3-4 words, large centered bold font, text shadow/glow via layered offset draws
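
The layered-offset technique means drawing the text many times in a dark color at small offsets before the final white pass. A sketch (the font lookup is an assumption, since available TTFs vary by system, so it falls back to Pillow's default):

```python
from PIL import Image, ImageDraw, ImageFont

def draw_glow_text(img, text, center=(640, 360), glow=(10, 10, 20), spread=3):
    """Layered offset draws: dark copies around the target position, then white on top."""
    d = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype("DejaVuSans-Bold.ttf", 110)
    except OSError:
        font = ImageFont.load_default()  # fallback when no TTF is found
    # Center the text manually so this also works with bitmap fallback fonts
    left, top, right, bottom = d.textbbox((0, 0), text, font=font)
    x = center[0] - (right - left) // 2
    y = center[1] - (bottom - top) // 2
    for dx in range(-spread, spread + 1):
        for dy in range(-spread, spread + 1):
            if dx or dy:
                d.text((x + dx, y + dy), text, font=font, fill=glow)
    d.text((x, y), text, font=font, fill="white")
    return img

canvas = Image.new("RGB", (1280, 720), (8, 12, 25))
draw_glow_text(canvas, "THE TRUTH")
```

A larger `spread` reads as glow; `spread=1` with a single diagonal offset reads as a hard drop shadow.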
### Style 7: Warped Faces (`style_7_warped`)
**Best for:** Self-improvement, "Harsh Truths", psychology topics
**Core idea:** A "something is wrong" curiosity gap created via distortion.
**AI prompt pattern:**
`"double exposure portrait, digital glitch effect, [emotion] face merged with
[abstract concept], surreal digital distortion, moody dark tones, experimental photography"`
**Pillow:** RGB channel shift for glitch (shift R channel +8px), selective blur, minimal/no text

### Style 8: Maximalist Flex (`style_8_maximalist`)
**Best for:** Collectors, tech enthusiasts, hobbyists
**Core idea:** The collection is the star, not the person.
**AI prompt pattern:**
`"aerial flat lay of complete collection of every [item type], perfectly organized
and arranged, every single item visible, product catalog photography style"`
**Pillow:** Dense but organized placement, optional "COMPLETE COLLECTION" text strip top/bottom

### Style 9: Encyclopedia Grid (`style_9_encyclopedia`)
**Best for:** "Every X Explained", deep dives
**Core idea:** Looks informative and "safe": no drama, just knowledge.
**AI prompt pattern:**
`"flat icon illustration grid of [topic] elements, consistent icon shapes, high
contrast on white background, educational infographic style, no dramatic lighting"`
**Pillow:** Draw equal grid cells with `ImageDraw.rectangle`, a flat icon in each cell, a label below

### Style 10: Candid Fake (`style_10_candid`)
**Best for:** Challenges, travel, lifestyle
**Core idea:** Highly engineered to look like a lucky candid shot.
**AI prompt pattern:**
`"candid authentic moment of [person/scene], natural spontaneous composition but
perfectly framed, golden hour lighting, documentary photography style, physically
possible scene"`
**Pillow:** Minimal processing, subtle vignette at edges only. NO text. NO arrows.

### Style 11: The Anti-Thumbnail (`style_11_anti`)
**Best for:** Productivity, "Quick Tip" videos
**Core idea:** A dark mood plus a specific "irritating" number triggers curiosity.
**AI prompt pattern:**
`"dark moody portrait of [subject], direct serious eye contact to camera, dramatic
low-key lighting, minimal background, cinematic single-subject composition"`
**Pillow:** Dark gradient bg (0,0,0)→(30,30,30), specific non-round number ("47 Seconds" not "60"),
large centered font

---
## Auto-Style Selection (when user doesn't specify)
```python
NICHE_TO_STYLE = {
    # Education & Learning
    "education": "style_4_whiteboard",
    "tutorial": "style_4_whiteboard",
    "howto": "style_4_whiteboard",
    "explainer": "style_9_encyclopedia",
    "course": "style_4_whiteboard",

    # Reviews & Rankings
    "review": "style_3_rainbow",
    "comparison": "style_2_surround",
    "tierlist": "style_3_rainbow",
    "ranking": "style_3_rainbow",
    "top10": "style_3_rainbow",

    # News & Commentary
    "news": "style_5_ui_framing",
    "commentary": "style_5_ui_framing",
    "reaction": "style_5_ui_framing",
    "opinion": "style_5_ui_framing",

    # Personal Development
    "productivity": "style_11_anti",
    "psychology": "style_7_warped",
    "selfimprovement": "style_7_warped",
    "motivation": "style_11_anti",

    # Collections & Gear
    "collection": "style_8_maximalist",
    "tech": "style_8_maximalist",
    "gear": "style_8_maximalist",
    "unboxing": "style_2_surround",

    # Lifestyle & Travel
    "travel": "style_10_candid",
    "lifestyle": "style_10_candid",
    "vlog": "style_10_candid",
    "challenge": "style_10_candid",

    # High-Production
    "documentary": "style_6_cinematic",
    "storytelling": "style_6_cinematic",
    "cinematic": "style_6_cinematic",

    # Default
    "general": "style_1_minimalism",
}

def select_style(niche: str, style_override: str = None) -> str:
    if style_override:
        return style_override
    niche_clean = niche.lower().replace(" ", "").replace("-", "")
    for key in NICHE_TO_STYLE:
        if key in niche_clean or niche_clean in key:
            return NICHE_TO_STYLE[key]
    return "style_1_minimalism"
```

---
## Full Python Implementation

When creating a thumbnail, write this complete script to `/home/claude/generate_thumbnail.py`,
then execute it with `python3 generate_thumbnail.py`. All values in CAPS are filled in
by the agent before writing the script.
```python
#!/usr/bin/env python3
"""
YouTube Thumbnail Generator (auto-generated by the agent)
Video: VIDEO_TITLE_PLACEHOLDER
Style: STYLE_PLACEHOLDER
Backend: auto-detected
"""

import os
import sys
import json
import base64
import io
import subprocess
import requests
from PIL import Image, ImageDraw, ImageFont, ImageFilter, ImageEnhance

# ───────────────────────────────────────────────────────────
# CONFIGURATION: the agent fills these in before writing the script
# ───────────────────────────────────────────────────────────
VIDEO_TITLE = "FILL_VIDEO_TITLE"
VIDEO_NICHE = "FILL_NICHE"    # e.g. "tutorial", "review", "travel"
STYLE = "FILL_STYLE"          # e.g. "style_6_cinematic"
TEXT_OVERLAY = "FILL_TEXT"    # max 4 words; empty string = auto from title
AI_PROMPT = "FILL_AI_PROMPT"  # full prompt built from the style template
OUTPUT_PATH = "/mnt/user-data/outputs/thumbnail.png"
# ───────────────────────────────────────────────────────────

# ─────────────────────────────────────────────────────────
# BACKEND DETECTION
# ─────────────────────────────────────────────────────────

def detect_mcp_imagen() -> bool:
    """Check if the mcp-imagen-go binary is installed and configured in the Gemini CLI."""
    try:
        r = subprocess.run(["which", "mcp-imagen-go"],
                           capture_output=True, text=True, timeout=5)
        if r.returncode != 0:
            return False
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    settings_path = os.path.expanduser("~/.gemini/settings.json")
    if not os.path.exists(settings_path):
        return False
    try:
        with open(settings_path) as f:
            settings = json.load(f)
        return "imagen" in settings.get("mcpServers", {})
    except Exception:
        return False


def detect_gemini_api() -> bool:
    """Check if the Gemini API key is set and google-genai is installed."""
    if not os.environ.get("GEMINI_API_KEY"):
        return False
    try:
        import google.genai
        return True
    except ImportError:
        return False


def detect_backend() -> str:
    """Auto-detect the best available image generation backend."""
    # Priority 1: Local A1111
    try:
        r = requests.get("http://127.0.0.1:7860/sdapi/v1/samplers", timeout=3)
        if r.status_code == 200:
            print("✓ Backend: A1111 Local")
            return "a1111"
    except Exception:
        pass

    # Priority 2: Local ComfyUI
    try:
        r = requests.get("http://127.0.0.1:8188/history", timeout=3)
        if r.status_code == 200:
            print("✓ Backend: ComfyUI Local")
            return "comfyui"
    except Exception:
        pass

    # Priority 3: MCP Imagen 4 (Vertex AI via Gemini CLI)
    if detect_mcp_imagen():
        print("✓ Backend: MCP Imagen 4 (Vertex AI)")
        return "mcp_imagen"

    # Priority 4: Gemini API (Nano Banana 2)
    if detect_gemini_api():
        print("✓ Backend: Gemini API (Nano Banana 2)")
        return "gemini_api"

    # Priority 5: fal.ai FLUX
    if os.environ.get("FAL_KEY"):
        try:
            import fal_client
            print("✓ Backend: fal.ai FLUX")
            return "fal"
        except ImportError:
            pass

    # Priority 6: OpenAI gpt-image-1
    if os.environ.get("OPENAI_API_KEY"):
        try:
            import openai
            print("✓ Backend: OpenAI gpt-image-1")
            return "openai"
        except ImportError:
            pass

    # Priority 7: Pillow-only fallback
    print("✓ Backend: Pillow-only (no AI available)")
    return "pillow_only"

# ─────────────────────────────────────────────────────────
# IMAGE GENERATION: one function per backend
# ─────────────────────────────────────────────────────────

def gen_a1111(prompt: str) -> Image.Image:
    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality, text, watermark, ugly, deformed, cropped",
        "width": 1280,
        "height": 720,
        "steps": 25,
        "cfg_scale": 7,
        "sampler_name": "DPM++ 2M Karras",
        "batch_size": 1,
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=120)
    r.raise_for_status()
    data = r.json()
    img_bytes = base64.b64decode(data["images"][0])
    return Image.open(io.BytesIO(img_bytes))


def gen_comfyui(prompt: str) -> Image.Image:
    """Simple ComfyUI text-to-image via a basic workflow."""
    workflow = {
        "3": {"inputs": {"text": prompt, "clip": ["4", 1]}, "class_type": "CLIPTextEncode"},
        "4": {"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}, "class_type": "CheckpointLoaderSimple"},
        "5": {"inputs": {"text": "blurry, ugly, watermark", "clip": ["4", 1]}, "class_type": "CLIPTextEncode"},
        "6": {"inputs": {"width": 1280, "height": 720, "batch_size": 1}, "class_type": "EmptyLatentImage"},
        "7": {"inputs": {"seed": -1, "steps": 25, "cfg": 7, "sampler_name": "dpmpp_2m",
                         "scheduler": "karras", "denoise": 1,
                         "model": ["4", 0], "positive": ["3", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0]},
              "class_type": "KSampler"},
        "8": {"inputs": {"samples": ["7", 0], "vae": ["4", 2]}, "class_type": "VAEDecode"},
        "9": {"inputs": {"images": ["8", 0], "filename_prefix": "thumb"},
              "class_type": "SaveImage"},
    }
    r = requests.post("http://127.0.0.1:8188/prompt",
                      json={"prompt": workflow}, timeout=120)
    r.raise_for_status()
    prompt_id = r.json()["prompt_id"]

    # Poll for the result
    import time
    for _ in range(60):
        time.sleep(2)
        hist = requests.get(f"http://127.0.0.1:8188/history/{prompt_id}", timeout=10).json()
        if prompt_id in hist:
            outputs = hist[prompt_id]["outputs"]
            for node_id, node_output in outputs.items():
                if "images" in node_output:
                    img_info = node_output["images"][0]
                    img_r = requests.get(
                        f"http://127.0.0.1:8188/view?filename={img_info['filename']}"
                        f"&subfolder={img_info.get('subfolder', '')}&type={img_info['type']}",
                        timeout=30
                    )
                    return Image.open(io.BytesIO(img_r.content))
    raise TimeoutError("ComfyUI generation timed out after 120s")

def gen_mcp_imagen(prompt: str) -> Image.Image:
    """
    Call mcp-imagen-go directly via STDIO (MCP protocol).
    Requires the mcp-imagen-go binary in PATH and the PROJECT_ID env var.
    Uses Imagen 4 via Vertex AI, the highest-quality option.
    """
    os.makedirs("/tmp/thumbnail_gen", exist_ok=True)

    mcp_request = json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "id": 1,
        "params": {
            "name": "imagen_t2i",
            "arguments": {
                "prompt": prompt,
                "aspect_ratio": "16:9",
                "number_of_images": 1,
                "output_directory": "/tmp/thumbnail_gen",
            }
        }
    })

    # Load PROJECT_ID from settings.json if not in env
    env = os.environ.copy()
    if not env.get("PROJECT_ID"):
        try:
            settings_path = os.path.expanduser("~/.gemini/settings.json")
            with open(settings_path) as f:
                settings = json.load(f)
            mcp_env = settings.get("mcpServers", {}).get("imagen", {}).get("env", {})
            env.update({k: v for k, v in mcp_env.items() if v and "YOUR_" not in v})
        except Exception:
            pass

    proc = subprocess.run(
        ["mcp-imagen-go"],
        input=mcp_request,
        capture_output=True,
        text=True,
        timeout=90,
        env=env
    )

    if proc.returncode != 0:
        raise RuntimeError(f"mcp-imagen-go failed: {proc.stderr[:500]}")

    try:
        response = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Some versions output multiple JSON lines; take the last valid one
        for line in reversed(proc.stdout.strip().split("\n")):
            try:
                response = json.loads(line)
                break
            except Exception:
                continue
        else:
            raise RuntimeError("Could not parse mcp-imagen-go output")

    content = response.get("result", {}).get("content", [])

    for block in content:
        # Inline base64 image
        if block.get("type") == "image" and block.get("data"):
            img_bytes = base64.b64decode(block["data"])
            return Image.open(io.BytesIO(img_bytes))

        # File path returned as text
        if block.get("type") == "text":
            text = block.get("text", "")
            for token in text.split():
                token = token.strip(".,\"'")
                if token.endswith((".png", ".jpg", ".jpeg")) and os.path.exists(token):
                    return Image.open(token)

    # Check the output directory for newly created files
    import glob
    recent = sorted(glob.glob("/tmp/thumbnail_gen/*.png"), key=os.path.getmtime, reverse=True)
    if recent:
        return Image.open(recent[0])

    raise ValueError("mcp-imagen-go returned no image data")

def gen_gemini_api(prompt: str) -> Image.Image:
    """
    Generate via the Gemini API (Nano Banana 2 / gemini-3.1-flash-image-preview).
    Note: the aspect ratio is requested in the prompt, not as a parameter.
    """
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    full_prompt = (
        f"{prompt}. "
        "Generate as a wide landscape 16:9 format, high resolution, "
        "professional YouTube thumbnail quality."
    )

    # Try models newest-first
    models = [
        "gemini-3.1-flash-image-preview",
        "gemini-2.5-flash-image-preview",
        "gemini-2.0-flash-exp",
    ]

    for model_id in models:
        try:
            response = client.models.generate_content(
                model=model_id,
                contents=[full_prompt],
                config=types.GenerateContentConfig(
                    response_modalities=["IMAGE", "TEXT"]
                )
            )
            for part in response.candidates[0].content.parts:
                if part.inline_data is not None:
                    img = Image.open(io.BytesIO(part.inline_data.data))
                    print(f"  ↳ Used model: {model_id}")
                    return img
        except Exception as e:
            print(f"  ↳ {model_id} failed: {e}")
            continue

    raise ValueError("All Gemini API models failed; check GEMINI_API_KEY and quota")

def gen_fal(prompt: str) -> Image.Image:
    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux/dev",
        arguments={
            "prompt": prompt,
            "image_size": {"width": 1280, "height": 720},
            "num_inference_steps": 28,
            "num_images": 1,
            "enable_safety_checker": True,
        }
    )
    img_url = result["images"][0]["url"]
    r = requests.get(img_url, timeout=60)
    r.raise_for_status()
    return Image.open(io.BytesIO(r.content))


def gen_openai(prompt: str) -> Image.Image:
    from openai import OpenAI
    client = OpenAI()

    response = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1536x1024",  # widest size gpt-image-1 offers
        quality="high",    # gpt-image-1 accepts low/medium/high, not "standard"
        n=1,
    )
    img_bytes = base64.b64decode(response.data[0].b64_json)
    img = Image.open(io.BytesIO(img_bytes))
    # 1536x1024 is 3:2, not 16:9: center-crop first so the resize doesn't distort
    crop_h = 1536 * 9 // 16  # 864
    top = (1024 - crop_h) // 2
    img = img.crop((0, top, 1536, top + crop_h))
    return img.resize((1280, 720), Image.LANCZOS)

def gen_pillow_only(style: str, title: str) -> Image.Image:
    """
    Pure Pillow fallback that generates a styled graphic without any AI.
    Produces a usable thumbnail when no AI backend is available.
    """
    canvas = Image.new("RGB", (1280, 720))
    draw = ImageDraw.Draw(canvas)

    # Style-specific color palettes
    PALETTES = {
        "style_1_minimalism": [(245, 245, 245), (220, 220, 220)],
        "style_6_cinematic": [(8, 12, 25), (40, 30, 60)],
        "style_11_anti": [(5, 5, 8), (20, 20, 30)],
        "style_4_whiteboard": [(250, 248, 240), (230, 225, 210)],
        "style_7_warped": [(10, 5, 20), (50, 10, 60)],
        "default": [(15, 20, 40), (40, 60, 100)],
    }
    colors = PALETTES.get(style, PALETTES["default"])

    # Vertical gradient
    for y in range(720):
        t = y / 719
        r_v = int(colors[0][0] * (1 - t) + colors[1][0] * t)
        g_v = int(colors[0][1] * (1 - t) + colors[1][1] * t)
        b_v = int(colors[0][2] * (1 - t) + colors[1][2] * t)
        draw.line([(0, y), (1280, y)], fill=(r_v, g_v, b_v))

    # Decorative diagonal accent lines
    accent = (80, 120, 200) if colors[0][0] < 50 else (150, 150, 160)
    for i in range(0, 1280, 120):
        draw.line([(i, 0), (i + 400, 720)], fill=accent, width=1)

    return canvas

# ─────────────────────────────────────────────────────────
# STYLE EFFECTS (Pillow post-processing)
# ─────────────────────────────────────────────────────────

def apply_style_effects(img: Image.Image, style: str) -> Image.Image:
    """Apply style-specific color grading and effects."""

    if style == "style_1_minimalism":
        img = ImageEnhance.Color(img).enhance(0.75)
        img = ImageEnhance.Brightness(img).enhance(1.05)

    elif style == "style_3_rainbow":
        img = ImageEnhance.Color(img).enhance(1.4)
        img = ImageEnhance.Contrast(img).enhance(1.1)

    elif style == "style_4_whiteboard":
        img = ImageEnhance.Color(img).enhance(0.65)
        # Warm tone shift: brighten the red channel slightly
        r, g, b = img.split()
        r = r.point(lambda v: min(255, int(v * 1.05)))
        img = Image.merge("RGB", (r, g, b))

    elif style == "style_6_cinematic":
        img = ImageEnhance.Color(img).enhance(0.85)
        img = ImageEnhance.Contrast(img).enhance(1.3)
        # Slight teal-shadow / orange-highlight look
        img = _apply_color_grade(img, shadow=(0, 5, 15), highlight=(15, 5, 0))

    elif style == "style_7_warped":
        # RGB channel shift for the glitch effect
        r, g, b = img.split()
        r = r.transform(img.size, Image.AFFINE, (1, 0, 8, 0, 1, 0))
        b = b.transform(img.size, Image.AFFINE, (1, 0, -6, 0, 1, 2))
        img = Image.merge("RGB", (r, g, b))
        img = ImageEnhance.Contrast(img).enhance(1.2)

    elif style == "style_9_encyclopedia":
        img = ImageEnhance.Color(img).enhance(0.6)
        img = ImageEnhance.Brightness(img).enhance(1.1)

    elif style == "style_11_anti":
        img = ImageEnhance.Brightness(img).enhance(0.6)
        img = ImageEnhance.Contrast(img).enhance(1.5)

    else:
        # Default: moderate contrast boost
        img = ImageEnhance.Contrast(img).enhance(1.15)

    # Vignette applied to all styles
    img = _apply_vignette(img, strength=0.35)

    return img
|
|
649
|
+
|
|
650
|
+
|
|
651
|
+
def _apply_color_grade(img: Image.Image,
                       shadow=(0, 0, 0),
                       highlight=(0, 0, 0)) -> Image.Image:
    """Subtle shadow/highlight color grade (like LUTs)."""
    r, g, b = img.split()

    def grade_channel(channel, shadow_add, highlight_add):
        lut = []
        for i in range(256):
            t = i / 255.0
            val = i + int(shadow_add * (1 - t)) + int(highlight_add * t)
            lut.append(max(0, min(255, val)))
        return channel.point(lut)

    r = grade_channel(r, shadow[0], highlight[0])
    g = grade_channel(g, shadow[1], highlight[1])
    b = grade_channel(b, shadow[2], highlight[2])
    return Image.merge("RGB", (r, g, b))

def _apply_vignette(img: Image.Image, strength: float = 0.35) -> Image.Image:
    """Add subtle radial vignette to focus the eye toward center."""
    w, h = img.size
    # Mask value = amount of black blended in; start at edge darkness
    mask = Image.new("L", (w, h), int(255 * strength))
    draw = ImageDraw.Draw(mask)

    steps = min(w, h) // 2
    for i in range(steps):
        progress = i / steps  # 0 = outermost ellipse, ~1 = center
        # Darkness fades from `strength` at the edges to 0 at the center;
        # each iteration draws a smaller, lighter ellipse over the last
        alpha = int(255 * strength * (1 - progress))
        alpha = max(0, min(255, alpha))
        margin_x = int(progress * (w // 2))
        margin_y = int(progress * (h // 2))
        draw.ellipse(
            [margin_x, margin_y, w - margin_x, h - margin_y],
            fill=alpha
        )

    mask = mask.filter(ImageFilter.GaussianBlur(radius=40))
    black = Image.new("RGB", (w, h), (0, 0, 0))
    img = Image.composite(black, img, mask)
    return img

# ─────────────────────────────────────────────────────────
# TEXT OVERLAY
# ─────────────────────────────────────────────────────────

FONT_PATHS = [
    "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
    "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
    "/usr/share/fonts/TTF/DejaVuSans-Bold.ttf",
    "/usr/share/fonts/truetype/freefont/FreeSansBold.ttf",
    "/usr/share/fonts/truetype/ubuntu/Ubuntu-Bold.ttf",
]

# Styles where text should be omitted
NO_TEXT_STYLES = {"style_7_warped", "style_10_candid"}


def load_font(size: int):
    for fp in FONT_PATHS:
        if os.path.exists(fp):
            try:
                return ImageFont.truetype(fp, size=size)
            except Exception:
                continue
    return ImageFont.load_default()

def add_text_overlay(img: Image.Image, text: str, style: str) -> Image.Image:
    """Add styled text overlay appropriate for each thumbnail style."""

    if not text or style in NO_TEXT_STYLES:
        return img

    # Truncate to 4 words max (per best-practice from THUMBNAILS.md)
    words = text.split()
    if len(words) > 4:
        text = " ".join(words[:4])

    w, h = img.size
    img = img.convert("RGBA")

    if style in ("style_6_cinematic", "style_11_anti"):
        return _text_centered_large(img, text, w, h)

    elif style in ("style_1_minimalism", "style_4_whiteboard"):
        return _text_clean_corner(img, text, w, h, style)

    else:
        return _text_banner_strip(img, text, w, h)

def _text_centered_large(img, text, w, h):
    """Large centered text for cinematic/anti-thumbnail styles."""
    font = load_font(96)
    draw = ImageDraw.Draw(img)

    bbox = draw.textbbox((0, 0), text.upper(), font=font)
    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
    x, y = (w - tw) // 2, (h - th) // 2

    # Glow / shadow effect
    for offset in [(6, 6), (-6, 6), (6, -6), (-6, -6)]:
        draw.text((x + offset[0], y + offset[1]), text.upper(),
                  font=font, fill=(0, 0, 0, 180))
    draw.text((x, y), text.upper(), font=font, fill=(255, 255, 255, 255))

    return img.convert("RGB")

def _text_clean_corner(img, text, w, h, style):
    """Clean minimal text for minimalism and whiteboard styles."""
    font = load_font(72)
    draw = ImageDraw.Draw(img)

    text_color = (30, 30, 30, 255) if style == "style_4_whiteboard" else (60, 60, 60, 255)
    bbox = draw.textbbox((0, 0), text, font=font)
    x, y = 60, h - (bbox[3] - bbox[1]) - 60

    # Subtle shadow
    draw.text((x + 2, y + 2), text, font=font, fill=(200, 200, 200, 120))
    draw.text((x, y), text, font=font, fill=text_color)

    return img.convert("RGB")

def _text_banner_strip(img, text, w, h):
    """Semi-transparent banner strip with high-contrast text."""
    font = load_font(82)
    draw_measure = ImageDraw.Draw(img)
    bbox = draw_measure.textbbox((0, 0), text.upper(), font=font)
    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]

    padding_x, padding_y = 30, 18
    strip_h = th + padding_y * 2
    strip_y = h - strip_h - 40

    # Semi-transparent background strip
    overlay = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    overlay_draw = ImageDraw.Draw(overlay)
    overlay_draw.rectangle(
        [0, strip_y, w, strip_y + strip_h],
        fill=(0, 0, 0, 175)
    )
    img = Image.alpha_composite(img, overlay)

    # Text: shadow then main
    draw = ImageDraw.Draw(img)
    x = (w - tw) // 2
    y = strip_y + padding_y

    draw.text((x + 3, y + 3), text.upper(), font=font, fill=(0, 0, 0, 200))
    draw.text((x, y), text.upper(), font=font, fill=(255, 220, 50, 255))

    return img.convert("RGB")

# ─────────────────────────────────────────────────────────
# AUTO TEXT EXTRACTION
# ─────────────────────────────────────────────────────────

def auto_text(video_title: str, style: str) -> str:
    """Extract best text overlay from video title for given style."""
    if style in NO_TEXT_STYLES:
        return ""
    words = video_title.split()
    # For anti-thumbnail: keep number if present, else use 3 words
    if style == "style_11_anti":
        for word in words:
            if any(c.isdigit() for c in word):
                return word + (" Seconds" if "sec" not in video_title.lower() else "")
        return " ".join(words[:3])
    # General: first 4 impactful words
    stopwords = {"the", "a", "an", "how", "to", "i", "my", "is", "are", "was"}
    filtered = [w for w in words if w.lower() not in stopwords]
    result = filtered[:4] if filtered else words[:4]
    return " ".join(result)

# ─────────────────────────────────────────────────────────
# DEPENDENCY INSTALLER
# ─────────────────────────────────────────────────────────

def ensure_deps(backend: str):
    """Install required packages for the selected backend."""
    deps = ["Pillow", "requests"]

    if backend == "gemini_api":
        deps.append("google-genai")
    elif backend == "fal":
        deps.append("fal-client")
    elif backend == "openai":
        deps.append("openai")

    for dep in deps:
        try:
            if dep == "Pillow":
                import PIL
            elif dep == "requests":
                import requests
            elif dep == "google-genai":
                import google.genai
            elif dep == "fal-client":
                import fal_client
            elif dep == "openai":
                import openai
        except ImportError:
            print(f"Installing {dep}...")
            subprocess.run(
                [sys.executable, "-m", "pip", "install", dep,
                 "--break-system-packages", "-q"],
                check=True
            )

# ─────────────────────────────────────────────────────────
# MAIN ORCHESTRATOR
# ─────────────────────────────────────────────────────────

def main():
    print("\n🎨 Thumbnail Generator")
    print(f"   Title : {VIDEO_TITLE}")
    print(f"   Style : {STYLE}")
    print(f"   Output: {OUTPUT_PATH}\n")

    # 1. Detect backend
    backend = detect_backend()

    # 2. Install deps if needed
    ensure_deps(backend)

    # 3. Generate base image
    print("→ Generating base image...")
    generators = {
        "a1111": gen_a1111,
        "comfyui": gen_comfyui,
        "mcp_imagen": gen_mcp_imagen,
        "gemini_api": gen_gemini_api,
        "fal": gen_fal,
        "openai": gen_openai,
    }

    if backend == "pillow_only":
        base_img = gen_pillow_only(STYLE, VIDEO_TITLE)
    else:
        try:
            base_img = generators[backend](AI_PROMPT)
        except Exception as e:
            print(f"✗ {backend} failed: {e}")
            print("  Falling back to Pillow-only...")
            base_img = gen_pillow_only(STYLE, VIDEO_TITLE)

    # 4. Normalize to exact 1280×720
    base_img = base_img.convert("RGB").resize((1280, 720), Image.LANCZOS)
    print(f"→ Base image ready: {base_img.size}")

    # 5. Apply style effects
    print("→ Applying style effects...")
    base_img = apply_style_effects(base_img, STYLE)

    # 6. Determine text overlay
    text = TEXT_OVERLAY if TEXT_OVERLAY else auto_text(VIDEO_TITLE, STYLE)
    print(f"→ Text overlay: '{text}'" if text else "→ No text overlay (style preference)")

    # 7. Add text
    base_img = add_text_overlay(base_img, text, STYLE)

    # 8. Save
    os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
    base_img.save(OUTPUT_PATH, "PNG", optimize=True)

    size_kb = os.path.getsize(OUTPUT_PATH) // 1024
    print(f"\n✅ Saved: {OUTPUT_PATH} ({size_kb} KB, 1280×720)")


if __name__ == "__main__":
    main()
```

---

## Agent Execution Protocol

When the user asks for a thumbnail, the agent follows these steps:

### Step 1 – Parse
Extract from the message:
- `VIDEO_TITLE` – the video title or topic description
- `VIDEO_NICHE` – category (tutorial, review, travel, etc.)
- `STYLE` – if explicitly mentioned; otherwise auto-select
- `TEXT_OVERLAY` – specific text if mentioned (max 4 words); else leave empty

### Step 2 – Select Style
```python
style = select_style(VIDEO_NICHE, style_override=None)
```
### Step 3 – Build AI Prompt
Use the style template from "The 11 Styles" section above.
Append the universal quality suffix:
```
", professional YouTube thumbnail, vibrant high contrast, cinematic quality,
sharp focus, award-winning composition"
```
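Putting Steps 2 and 3 together, prompt assembly can be sketched as follows. This is an illustrative sketch, not the script's actual code: the `QUALITY_SUFFIX` constant and `build_prompt` helper are our names, and the example template/subject are made up.

```python
# Illustrative sketch of prompt assembly (QUALITY_SUFFIX and build_prompt
# are hypothetical names, not defined in the generator script).
QUALITY_SUFFIX = (
    ", professional YouTube thumbnail, vibrant high contrast, "
    "cinematic quality, sharp focus, award-winning composition"
)

def build_prompt(template: str, **slots) -> str:
    """Fill a style template's {placeholders} and append the quality suffix."""
    return template.format(**slots) + QUALITY_SUFFIX

prompt = build_prompt(
    "cinematic movie still about {subject}, dramatic chiaroscuro lighting, "
    "film grain, anamorphic lens flare, shallow depth of field",
    subject="mechanical keyboard restoration",
)
print(prompt)
```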

### Step 4 – Fill Script Template
Replace all `FILL_` placeholders in the Python script above with actual values.

### Step 5 – Execute
```bash
pip install Pillow requests --break-system-packages -q
python3 /home/claude/generate_thumbnail.py
```

### Step 6 – Verify & Present
```python
assert os.path.exists("/mnt/user-data/outputs/thumbnail.png")
assert os.path.getsize("/mnt/user-data/outputs/thumbnail.png") > 50_000
```
Then call the `present_files` tool with the output path.
Briefly tell the user which style was chosen and why (one sentence).

---

## Style Prompt Templates (Reference Card)
```python
STYLE_PROMPTS = {
    "style_1_minimalism": (
        "{subject}, minimalist product photography, pure white background, "
        "single centered subject, dramatic soft studio lighting, ultra clean"
    ),
    "style_2_surround": (
        "{subject} dead center, multiple {objects} arranged in perfect organized "
        "circle around center, controlled chaos, vibrant, top-down angle"
    ),
    "style_3_rainbow": (
        "flat lay of {items} in ranking order, color gradient red to blue, "
        "product photography, clean background, vivid colors"
    ),
    "style_4_whiteboard": (
        "hand-drawn diagram on real whiteboard explaining {concept}, chalk markers, "
        "rough sketchy authentic educational style, classroom feel"
    ),
    "style_5_ui_framing": (
        "realistic screenshot mockup of {platform} UI about {topic}, "
        "exact platform styling, authentic spacing and fonts, credible interface"
    ),
    "style_6_cinematic": (
        "cinematic movie still about {subject}, dramatic chiaroscuro lighting, "
        "film grain, anamorphic lens flare, shallow depth of field"
    ),
    "style_7_warped": (
        "double exposure portrait, digital glitch effect, {emotion} face merged "
        "with {concept}, surreal distortion, moody dark tones"
    ),
    "style_8_maximalist": (
        "aerial flat lay of complete collection of all {items}, perfectly organized, "
        "every single item visible, product catalog photography"
    ),
    "style_9_encyclopedia": (
        "flat icon illustration grid of {topic} elements, consistent icon shapes, "
        "high contrast on white background, educational infographic style"
    ),
    "style_10_candid": (
        "candid authentic moment of {scene}, natural spontaneous composition, "
        "perfectly framed, golden hour lighting, documentary photography"
    ),
    "style_11_anti": (
        "dark moody portrait of {subject}, direct serious eye contact to camera, "
        "dramatic low-key lighting, minimal background, cinematic"
    ),
}
```
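Note that each template expects different placeholders (`{subject}`, `{objects}`, `{items}`, and so on), so `.format()` raises `KeyError` if a slot is missed. As a sketch, the slots a template expects can be discovered with the standard library's `string.Formatter` before filling it (the `template_slots` helper below is our own, not part of the script):

```python
import string

def template_slots(template: str) -> set:
    """Return the placeholder names a style template expects."""
    return {
        field for _, field, _, _ in string.Formatter().parse(template)
        if field
    }

# Example using the style_2_surround template text:
tpl = (
    "{subject} dead center, multiple {objects} arranged in perfect organized "
    "circle around center, controlled chaos, vibrant, top-down angle"
)
print(sorted(template_slots(tpl)))  # ['objects', 'subject']
```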

---

## Output Specification
- **Path:** `/mnt/user-data/outputs/thumbnail.png`
- **Resolution:** 1280 × 720 px (YouTube 16:9 standard)
- **Format:** PNG
- **Min file size:** ~50 KB (abort and retry if smaller)

---

## Error Handling Rules
1. Backend API fails → automatically fall back to the next-priority backend
2. All AI backends fail → use `gen_pillow_only()`; still produce output
3. Font not found → use `ImageFont.load_default()`; never crash
4. Image < 50 KB after save → regenerate with the next backend
5. `TEXT_OVERLAY` blank → run `auto_text()` to extract text from the title
6. mcp-imagen-go PATH issue → check the env in `~/.gemini/settings.json` and retry with the full binary path from `which mcp-imagen-go`

---

## Quick Reference: Which Backend for What

| Situation | Recommended Backend |
|-----------|---------------------|
| Local GPU available (A1111 running) | `a1111` – always fastest and free |
| Have GCP project + Gemini CLI set up | `mcp_imagen` – Imagen 4, best quality |
| Have Gemini API key (from Gemini CLI) | `gemini_api` – free quota, good quality |
| Cloud-only, budget matters | `fal` – $0.03/image, FLUX quality |
| Need best text rendering in image | `openai` – gpt-image-1, best for text |
| No API keys / testing | `pillow_only` – always works |