@biggora/claude-plugins 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/.claude/settings.local.json +13 -0
  2. package/CLAUDE.md +55 -0
  3. package/LICENSE +1 -1
  4. package/README.md +208 -39
  5. package/bin/cli.js +39 -0
  6. package/package.json +30 -17
  7. package/registry/registry.json +166 -1
  8. package/registry/schema.json +10 -0
  9. package/src/commands/skills/add.js +194 -0
  10. package/src/commands/skills/list.js +52 -0
  11. package/src/commands/skills/remove.js +27 -0
  12. package/src/commands/skills/update.js +74 -0
  13. package/src/config.js +5 -0
  14. package/src/skills/codex-cli/SKILL.md +265 -0
  15. package/src/skills/commafeed-api/SKILL.md +1012 -0
  16. package/src/skills/gemini-cli/SKILL.md +379 -0
  17. package/src/skills/gemini-cli/references/commands.md +145 -0
  18. package/src/skills/gemini-cli/references/configuration.md +182 -0
  19. package/src/skills/gemini-cli/references/headless-and-scripting.md +181 -0
  20. package/src/skills/gemini-cli/references/mcp-and-extensions.md +254 -0
  21. package/src/skills/n8n-api/SKILL.md +623 -0
  22. package/src/skills/notebook-lm/SKILL.md +217 -0
  23. package/src/skills/notebook-lm/references/artifact-options.md +168 -0
  24. package/src/skills/notebook-lm/references/auth.md +58 -0
  25. package/src/skills/notebook-lm/references/workflows.md +144 -0
  26. package/src/skills/screen-recording/SKILL.md +309 -0
  27. package/src/skills/screen-recording/references/approach1-programmatic.md +311 -0
  28. package/src/skills/screen-recording/references/approach2-xvfb.md +232 -0
  29. package/src/skills/screen-recording/references/design-patterns.md +168 -0
  30. package/src/skills/test-mobile-app/SKILL.md +212 -0
  31. package/src/skills/test-mobile-app/references/report-template.md +95 -0
  32. package/src/skills/test-mobile-app/references/setup-appium.md +154 -0
  33. package/src/skills/test-mobile-app/scripts/analyze_apk.py +164 -0
  34. package/src/skills/test-mobile-app/scripts/check_environment.py +116 -0
  35. package/src/skills/test-mobile-app/scripts/generate_report.py +250 -0
  36. package/src/skills/test-mobile-app/scripts/run_tests.py +326 -0
  37. package/src/skills/test-web-ui/SKILL.md +232 -0
  38. package/src/skills/test-web-ui/references/test_case_schema.md +102 -0
  39. package/src/skills/test-web-ui/scripts/discover.py +176 -0
  40. package/src/skills/test-web-ui/scripts/generate_report.py +237 -0
  41. package/src/skills/test-web-ui/scripts/run_tests.py +296 -0
  42. package/src/skills/text-to-speech/SKILL.md +236 -0
  43. package/src/skills/text-to-speech/references/espeak-cli.md +277 -0
  44. package/src/skills/text-to-speech/references/kokoro-onnx.md +124 -0
  45. package/src/skills/text-to-speech/references/online-engines.md +128 -0
  46. package/src/skills/text-to-speech/references/pyttsx3-espeak.md +143 -0
  47. package/src/skills/tm-search/SKILL.md +240 -0
  48. package/src/skills/tm-search/references/field-guide.md +79 -0
  49. package/src/skills/tm-search/references/scraping-fallback.md +140 -0
  50. package/src/skills/tm-search/scripts/tm_search.py +375 -0
  51. package/src/skills/wp-rest-api/SKILL.md +114 -0
  52. package/src/skills/wp-rest-api/references/authentication.md +18 -0
  53. package/src/skills/wp-rest-api/references/custom-content-types.md +20 -0
  54. package/src/skills/wp-rest-api/references/discovery-and-params.md +20 -0
  55. package/src/skills/wp-rest-api/references/responses-and-fields.md +30 -0
  56. package/src/skills/wp-rest-api/references/routes-and-endpoints.md +36 -0
  57. package/src/skills/wp-rest-api/references/schema.md +22 -0
  58. package/src/skills/youtube-search/SKILL.md +412 -0
  59. package/src/skills/youtube-search/references/parsing-examples.md +159 -0
  60. package/src/skills/youtube-search/references/youtube-api-quota.md +85 -0
  61. package/src/skills/youtube-thumbnail/SKILL.md +1060 -0
  62. package/tests/commands/info.test.js +49 -0
  63. package/tests/commands/install.test.js +36 -0
  64. package/tests/commands/list.test.js +66 -0
  65. package/tests/commands/publish.test.js +182 -0
  66. package/tests/commands/search.test.js +45 -0
  67. package/tests/commands/uninstall.test.js +29 -0
  68. package/tests/commands/update.test.js +59 -0
  69. package/tests/functional/skills-lifecycle.test.js +293 -0
  70. package/tests/helpers/fixtures.js +63 -0
  71. package/tests/integration/cli.test.js +83 -0
  72. package/tests/skills/add.test.js +138 -0
  73. package/tests/skills/list.test.js +63 -0
  74. package/tests/skills/remove.test.js +38 -0
  75. package/tests/skills/update.test.js +60 -0
  76. package/tests/unit/config.test.js +31 -0
  77. package/tests/unit/registry.test.js +79 -0
  78. package/tests/unit/utils.test.js +150 -0
  79. package/tests/validation/registry-schema.test.js +112 -0
  80. package/tests/validation/skills-validation.test.js +96 -0
@@ -0,0 +1,1060 @@
# šŸŽØ YouTube Thumbnail Generation Skill (2026 Edition)

## Overview

This skill enables **fully autonomous** generation of professional YouTube thumbnails
in 11 strategic styles from THUMBNAILS.md. Zero user interaction is required after the
initial request. The agent auto-selects a style, builds a prompt, generates the base
image via the best available AI backend, applies compositing and text via Pillow,
and saves a final **1280Ɨ720 PNG** to `/mnt/user-data/outputs/thumbnail.png`.

---

## When to Trigger This Skill

Trigger when the user:
- Says "create a thumbnail", "make a thumbnail", "generate a YouTube thumbnail"
- Provides a video title or topic and needs a visual
- Wants product video covers, presentation title slides, or channel art
- Needs batch thumbnail creation or automation for a YouTube workflow

---

## Architecture: Two-Layer Pipeline

```
[User Request]
       │
       ā–¼
[Layer 1: AI Image Generation]  ← Base image via best available backend
       │
       ā–¼
[Layer 2: Pillow Compositing]   ← Resize to 1280Ɨ720, effects, text overlay
       │
       ā–¼
[Output: /mnt/user-data/outputs/thumbnail.png]
```
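The two layers above can be exercised end to end with a stand-in for the AI step. A minimal sketch (the `dummy_generate` helper and its 1536Ɨ864 base size are illustrative assumptions, not part of the skill's API):

```python
from PIL import Image, ImageEnhance

def dummy_generate(prompt: str) -> Image.Image:
    # Layer 1 stand-in: a real backend would return an AI-generated base image.
    return Image.new("RGB", (1536, 864), (30, 40, 80))

def make_thumbnail(prompt: str, out_path: str = "") -> Image.Image:
    base = dummy_generate(prompt)                     # Layer 1
    final = base.resize((1280, 720), Image.LANCZOS)   # Layer 2: exact YouTube spec
    final = ImageEnhance.Contrast(final).enhance(1.15)
    if out_path:
        final.save(out_path)
    return final
```

Any real backend slots in by replacing `dummy_generate`; the Pillow layer stays the same regardless of which generator produced the base image.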

---

## Backend Priority Table

Auto-detect each backend by testing availability. Use the first one that works.

| # | Backend | Detection Method | Quality | Cost |
|---|---------|-----------------|---------|------|
| 1 | **A1111 Local SD** | `GET localhost:7860/sdapi/v1/samplers` | ā˜…ā˜…ā˜…ā˜… | Free (own GPU) |
| 2 | **ComfyUI Local** | `GET localhost:8188/history` | ā˜…ā˜…ā˜…ā˜… | Free (own GPU) |
| 3 | **MCP Imagen 4** (Vertex AI) | `which mcp-imagen-go` + `~/.gemini/settings.json` | ā˜…ā˜…ā˜…ā˜…ā˜… | Vertex AI pricing |
| 4 | **Gemini API** (Nano Banana 2) | `GEMINI_API_KEY` env + `google-genai` installed | ā˜…ā˜…ā˜…ā˜… | Free quota |
| 5 | **fal.ai FLUX** | `FAL_KEY` env + `fal-client` installed | ā˜…ā˜…ā˜…ā˜… | $0.03/MP |
| 6 | **OpenAI gpt-image-1** | `OPENAI_API_KEY` env + `openai` installed | ā˜…ā˜…ā˜…ā˜… | ~$0.04/img |
| 7 | **Pillow-only fallback** | Always available | ā˜…ā˜… | Free |

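Rows 1 and 2 are detected with a plain HTTP GET. A small probe helper along these lines (a sketch; the endpoint paths are the ones listed in the table):

```python
import requests

def has_local_backend(url: str, timeout: float = 3.0) -> bool:
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        return requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        return False

# Probe in priority order:
# has_local_backend("http://127.0.0.1:7860/sdapi/v1/samplers")  # A1111
# has_local_backend("http://127.0.0.1:8188/history")            # ComfyUI
```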
---

## The 11 Thumbnail Styles

### Style 1: Neo-Minimalism (`style_1_minimalism`)
**Best for:** General niches, standing out in a cluttered feed
**Core idea:** If the feed is loud, go quiet. 50%+ negative space.
**AI prompt pattern:**
`"[subject], minimalist product photography, pure white background, single centered
subject, dramatic soft studio lighting, ultra clean composition, no clutter"`
**Pillow:** White/monochromatic background, max 2 colors, light serif font bottom-left or no text at all

### Style 2: The Surround (`style_2_surround`)
**Best for:** Comparisons, "I tried X things", hauls
**Core idea:** Subject dead center, objects in an organized circle/grid around it.
**AI prompt pattern:**
`"[subject] perfectly centered, multiple [related objects] arranged in organized
circle or grid around center subject, controlled chaos, vibrant, top-down angle"`
**Pillow:** Grid math — the center subject fills ~40% of the canvas; surrounding items are spaced at equal angles
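The equal-angle spacing can be computed with basic trigonometry. A sketch (the canvas size and radius are assumptions to tune per layout):

```python
import math

def surround_positions(n_items, canvas=(1280, 720), radius=260):
    """Centers for n_items placed at equal angles around the canvas center."""
    cx, cy = canvas[0] // 2, canvas[1] // 2
    positions = []
    for i in range(n_items):
        angle = 2 * math.pi * i / n_items - math.pi / 2  # start at 12 o'clock
        positions.append((int(cx + radius * math.cos(angle)),
                          int(cy + radius * math.sin(angle))))
    return positions
```

Paste each surrounding item centered on one of these points; the subject itself stays pinned at the canvas center.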

### Style 3: Rainbow Ranking (`style_3_rainbow`)
**Best for:** Tier lists, "Best to Worst", reviews
**Core idea:** A color gradient (Red→Blue) conveys hierarchy visually.
**AI prompt pattern:**
`"flat lay of [3-7 items] arranged in ranking order, color gradient from red to
blue across items, product photography style, clean background"`
**Pillow:** Apply a gradient color wash per item via `ImageEnhance.Color`, add rank numbers (1, 2, 3…) in bold white
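The red-to-blue wash can be generated by linear interpolation per rank. A sketch:

```python
def ranking_colors(n_items):
    """RGB wash colors from red (rank 1) to blue (rank n), linearly interpolated."""
    colors = []
    for i in range(n_items):
        t = i / max(1, n_items - 1)  # 0.0 for best, 1.0 for worst
        colors.append((int(255 * (1 - t)), 0, int(255 * t)))
    return colors
```

Tint each item toward its wash color (for example by blending with a solid overlay) before drawing the bold white rank number on top.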

### Style 4: Educational Whiteboard (`style_4_whiteboard`)
**Best for:** Tutorials, business explainers, complex systems
**Core idea:** Authenticity over polish. Signals "high value, no fluff."
**AI prompt pattern:**
`"hand-drawn diagram on real whiteboard explaining [concept], chalk markers,
rough sketchy educational style, authentic classroom feel, [topic] framework"`
**Pillow:** Reduce saturation to 70% for authenticity, apply a warm color grade, use a handwritten-style font

### Style 5: Familiar Interface (`style_5_ui_framing`)
**Best for:** Commentary, news, reviews
**Core idea:** Borrow credibility from known platforms (Twitter, Reddit, Amazon).
**AI prompt pattern:**
`"realistic screenshot mockup of [Twitter post / Reddit thread / Amazon listing /
Netflix menu] about [topic], exact platform UI styling, authentic spacing and fonts"`
**Pillow:** Programmatically draw platform UI elements — rounded rectangles and brand colors
(Twitter #1DA1F2, Reddit #FF4500, Amazon #FF9900)
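A rounded-rectangle post card with a brand accent can be drawn directly in Pillow (a minimal sketch; the layout coordinates are arbitrary placeholders, and `rounded_rectangle` requires Pillow 8.2+):

```python
from PIL import Image, ImageDraw

BRAND = {"twitter": "#1DA1F2", "reddit": "#FF4500", "amazon": "#FF9900"}

def draw_ui_card(platform: str = "twitter", size=(1280, 720)) -> Image.Image:
    img = Image.new("RGB", size, "#15202B")  # dark app background
    d = ImageDraw.Draw(img)
    # Post card with the platform's brand color as the outline
    d.rounded_rectangle([80, 160, 1200, 560], radius=24, fill="white",
                        outline=BRAND[platform], width=6)
    d.ellipse([120, 200, 200, 280], fill=BRAND[platform])  # avatar placeholder
    return img
```

Text and engagement counters would then be drawn inside the card with `ImageDraw.text` to complete the mockup.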

### Style 6: Cinematic Text (`style_6_cinematic`)
**Best for:** High-production storytelling, documentaries
**Core idea:** Text IS a design element — embedded in the world, not floating over it.
**AI prompt pattern:**
`"cinematic movie still about [subject], dramatic chiaroscuro lighting, film grain,
anamorphic lens flare, shallow depth of field, golden hour or moody tones"`
**Pillow:** MAX 3-4 words, large centered bold font, text shadow/glow via layered offset draws

### Style 7: Warped Faces (`style_7_warped`)
**Best for:** Self-improvement, "Harsh Truths", psychology topics
**Core idea:** A "something is wrong" curiosity gap created via distortion.
**AI prompt pattern:**
`"double exposure portrait, digital glitch effect, [emotion] face merged with
[abstract concept], surreal digital distortion, moody dark tones, experimental photography"`
**Pillow:** RGB channel shift for glitch (shift R channel +8px), selective blur, minimal or no text

### Style 8: Maximalist Flex (`style_8_maximalist`)
**Best for:** Collectors, tech enthusiasts, hobbyists
**Core idea:** The collection is the star, not the person.
**AI prompt pattern:**
`"aerial flat lay of complete collection of every [item type], perfectly organized
and arranged, every single item visible, product catalog photography style"`
**Pillow:** Dense but organized placement, optional "COMPLETE COLLECTION" text strip top/bottom

### Style 9: Encyclopedia Grid (`style_9_encyclopedia`)
**Best for:** "Every X Explained", deep dives
**Core idea:** Looks informative and "safe" — no drama, just knowledge.
**AI prompt pattern:**
`"flat icon illustration grid of [topic] elements, consistent icon shapes, high
contrast on white background, educational infographic style, no dramatic lighting"`
**Pillow:** Draw equal grid cells with `ImageDraw.rectangle`, one flat icon per cell, label below
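Equal cells come from integer division of the canvas. A sketch (the padding value is an assumption):

```python
def grid_cells(cols, rows, canvas=(1280, 720), pad=16):
    """Equal-size cell rectangles (left, top, right, bottom) for an icon grid."""
    w, h = canvas
    cw, ch = w // cols, h // rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            cells.append((c * cw + pad, r * ch + pad,
                          (c + 1) * cw - pad, (r + 1) * ch - pad))
    return cells
```

Each rectangle is then outlined with `ImageDraw.rectangle` and filled with its icon and label.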

### Style 10: Candid Fake (`style_10_candid`)
**Best for:** Challenges, travel, lifestyle
**Core idea:** Highly engineered to look like a lucky candid shot.
**AI prompt pattern:**
`"candid authentic moment of [person/scene], natural spontaneous composition but
perfectly framed, golden hour lighting, documentary photography style, physically
possible scene"`
**Pillow:** Minimal processing, subtle vignette at edges only. NO text. NO arrows.

### Style 11: The Anti-Thumbnail (`style_11_anti`)
**Best for:** Productivity, "Quick Tip" videos
**Core idea:** Dark look plus a specific "irritating" number triggers curiosity.
**AI prompt pattern:**
`"dark moody portrait of [subject], direct serious eye contact to camera, dramatic
low-key lighting, minimal background, cinematic single-subject composition"`
**Pillow:** Dark gradient background (0,0,0)→(30,30,30), a specific non-round number ("47 Seconds", not "60"),
large centered font

---

## Auto-Style Selection (when user doesn't specify)

```python
NICHE_TO_STYLE = {
    # Education & Learning
    "education": "style_4_whiteboard",
    "tutorial": "style_4_whiteboard",
    "howto": "style_4_whiteboard",
    "explainer": "style_9_encyclopedia",
    "course": "style_4_whiteboard",

    # Reviews & Rankings
    "review": "style_3_rainbow",
    "comparison": "style_2_surround",
    "tierlist": "style_3_rainbow",
    "ranking": "style_3_rainbow",
    "top10": "style_3_rainbow",

    # News & Commentary
    "news": "style_5_ui_framing",
    "commentary": "style_5_ui_framing",
    "reaction": "style_5_ui_framing",
    "opinion": "style_5_ui_framing",

    # Personal Development
    "productivity": "style_11_anti",
    "psychology": "style_7_warped",
    "selfimprovement": "style_7_warped",
    "motivation": "style_11_anti",

    # Collections & Gear
    "collection": "style_8_maximalist",
    "tech": "style_8_maximalist",
    "gear": "style_8_maximalist",
    "unboxing": "style_2_surround",

    # Lifestyle & Travel
    "travel": "style_10_candid",
    "lifestyle": "style_10_candid",
    "vlog": "style_10_candid",
    "challenge": "style_10_candid",

    # High-Production
    "documentary": "style_6_cinematic",
    "storytelling": "style_6_cinematic",
    "cinematic": "style_6_cinematic",

    # Default
    "general": "style_1_minimalism",
}

def select_style(niche: str, style_override: str = None) -> str:
    if style_override:
        return style_override
    niche_clean = niche.lower().replace(" ", "").replace("-", "")
    for key in NICHE_TO_STYLE:
        if key in niche_clean or niche_clean in key:
            return NICHE_TO_STYLE[key]
    return "style_1_minimalism"
```
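A quick check of the matcher's behavior (the dictionary here is trimmed to three entries for brevity). Note that the substring test runs in both directions, so a short query can match a longer key:

```python
NICHE_TO_STYLE = {
    "tierlist": "style_3_rainbow",
    "documentary": "style_6_cinematic",
    "general": "style_1_minimalism",
}

def select_style(niche: str, style_override: str = None) -> str:
    if style_override:
        return style_override
    niche_clean = niche.lower().replace(" ", "").replace("-", "")
    for key in NICHE_TO_STYLE:
        if key in niche_clean or niche_clean in key:
            return NICHE_TO_STYLE[key]
    return "style_1_minimalism"

print(select_style("Tier List"))  # style_3_rainbow
print(select_style("docu"))       # style_6_cinematic ("docu" is inside "documentary")
print(select_style("cooking"))    # style_1_minimalism (no match, default)
```

The bidirectional check is forgiving of partial niche names, but it also means very short inputs can match unintended keys, so an explicit `style_override` always wins.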

---

## Full Python Implementation

When creating a thumbnail, write the complete script below to `/home/claude/generate_thumbnail.py`,
then execute it with `python3 generate_thumbnail.py`. All values in CAPS are filled in
by the agent before writing the script.
```python
#!/usr/bin/env python3
"""
YouTube Thumbnail Generator — Auto-generated by Agent
Video:   VIDEO_TITLE_PLACEHOLDER
Style:   STYLE_PLACEHOLDER
Backend: auto-detected
"""

import os
import sys
import json
import base64
import io
import subprocess
import requests
from PIL import Image, ImageDraw, ImageFont, ImageFilter, ImageEnhance

# ═══════════════════════════════════════════════════════════
# CONFIGURATION — Agent fills these in before writing script
# ═══════════════════════════════════════════════════════════
VIDEO_TITLE = "FILL_VIDEO_TITLE"
VIDEO_NICHE = "FILL_NICHE"    # e.g. "tutorial", "review", "travel"
STYLE = "FILL_STYLE"          # e.g. "style_6_cinematic"
TEXT_OVERLAY = "FILL_TEXT"    # max 4 words; empty string = auto from title
AI_PROMPT = "FILL_AI_PROMPT"  # full prompt built from style template
OUTPUT_PATH = "/mnt/user-data/outputs/thumbnail.png"
# ═══════════════════════════════════════════════════════════


# ─────────────────────────────────────────────────────────
# BACKEND DETECTION
# ─────────────────────────────────────────────────────────

def detect_mcp_imagen() -> bool:
    """Check if the mcp-imagen-go binary is installed and configured in Gemini CLI."""
    try:
        r = subprocess.run(["which", "mcp-imagen-go"],
                           capture_output=True, text=True, timeout=5)
        if r.returncode != 0:
            return False
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    settings_path = os.path.expanduser("~/.gemini/settings.json")
    if not os.path.exists(settings_path):
        return False
    try:
        with open(settings_path) as f:
            settings = json.load(f)
        return "imagen" in settings.get("mcpServers", {})
    except Exception:
        return False


def detect_gemini_api() -> bool:
    """Check if a Gemini API key is set and google-genai is installed."""
    if not os.environ.get("GEMINI_API_KEY"):
        return False
    try:
        import google.genai  # noqa: F401
        return True
    except ImportError:
        return False


def detect_backend() -> str:
    """Auto-detect the best available image generation backend."""
    # Priority 1: Local A1111
    try:
        r = requests.get("http://127.0.0.1:7860/sdapi/v1/samplers", timeout=3)
        if r.status_code == 200:
            print("āœ“ Backend: A1111 Local")
            return "a1111"
    except Exception:
        pass

    # Priority 2: Local ComfyUI
    try:
        r = requests.get("http://127.0.0.1:8188/history", timeout=3)
        if r.status_code == 200:
            print("āœ“ Backend: ComfyUI Local")
            return "comfyui"
    except Exception:
        pass

    # Priority 3: MCP Imagen 4 (Vertex AI via Gemini CLI)
    if detect_mcp_imagen():
        print("āœ“ Backend: MCP Imagen 4 (Vertex AI)")
        return "mcp_imagen"

    # Priority 4: Gemini API (Nano Banana 2)
    if detect_gemini_api():
        print("āœ“ Backend: Gemini API (Nano Banana 2)")
        return "gemini_api"

    # Priority 5: fal.ai FLUX
    if os.environ.get("FAL_KEY"):
        try:
            import fal_client  # noqa: F401
            print("āœ“ Backend: fal.ai FLUX")
            return "fal"
        except ImportError:
            pass

    # Priority 6: OpenAI gpt-image-1
    if os.environ.get("OPENAI_API_KEY"):
        try:
            import openai  # noqa: F401
            print("āœ“ Backend: OpenAI gpt-image-1")
            return "openai"
        except ImportError:
            pass

    # Priority 7: Pillow-only fallback
    print("⚠ Backend: Pillow-only (no AI available)")
    return "pillow_only"


# ─────────────────────────────────────────────────────────
# IMAGE GENERATION — one function per backend
# ─────────────────────────────────────────────────────────

def gen_a1111(prompt: str) -> Image.Image:
    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality, text, watermark, ugly, deformed, cropped",
        "width": 1280,
        "height": 720,
        "steps": 25,
        "cfg_scale": 7,
        "sampler_name": "DPM++ 2M Karras",
        "batch_size": 1,
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=120)
    r.raise_for_status()
    data = r.json()
    img_bytes = base64.b64decode(data["images"][0])
    return Image.open(io.BytesIO(img_bytes))


def gen_comfyui(prompt: str) -> Image.Image:
    """Simple ComfyUI text-to-image via a basic SDXL workflow."""
    workflow = {
        "3": {"inputs": {"text": prompt, "clip": ["4", 1]}, "class_type": "CLIPTextEncode"},
        "4": {"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}, "class_type": "CheckpointLoaderSimple"},
        "5": {"inputs": {"text": "blurry, ugly, watermark", "clip": ["4", 1]}, "class_type": "CLIPTextEncode"},
        "6": {"inputs": {"width": 1280, "height": 720, "batch_size": 1}, "class_type": "EmptyLatentImage"},
        "7": {"inputs": {"seed": int.from_bytes(os.urandom(4), "big"),  # ComfyUI needs a concrete non-negative seed
                         "steps": 25, "cfg": 7, "sampler_name": "dpmpp_2m",
                         "scheduler": "karras", "denoise": 1,
                         "model": ["4", 0], "positive": ["3", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0]},
              "class_type": "KSampler"},
        "8": {"inputs": {"samples": ["7", 0], "vae": ["4", 2]}, "class_type": "VAEDecode"},
        "9": {"inputs": {"images": ["8", 0], "filename_prefix": "thumb"},
              "class_type": "SaveImage"},
    }
    r = requests.post("http://127.0.0.1:8188/prompt",
                      json={"prompt": workflow}, timeout=120)
    r.raise_for_status()
    prompt_id = r.json()["prompt_id"]

    # Poll for the finished image
    import time
    for _ in range(60):
        time.sleep(2)
        hist = requests.get(f"http://127.0.0.1:8188/history/{prompt_id}", timeout=10).json()
        if prompt_id in hist:
            outputs = hist[prompt_id]["outputs"]
            for node_id, node_output in outputs.items():
                if "images" in node_output:
                    img_info = node_output["images"][0]
                    img_r = requests.get(
                        f"http://127.0.0.1:8188/view?filename={img_info['filename']}"
                        f"&subfolder={img_info.get('subfolder', '')}&type={img_info['type']}",
                        timeout=30
                    )
                    return Image.open(io.BytesIO(img_r.content))
    raise TimeoutError("ComfyUI generation timed out after 120s")


def gen_mcp_imagen(prompt: str) -> Image.Image:
    """
    Call mcp-imagen-go directly via STDIO (MCP protocol).
    Requires the mcp-imagen-go binary in PATH and a PROJECT_ID env var.
    Uses Imagen 4 via Vertex AI — the highest quality option.
    """
    os.makedirs("/tmp/thumbnail_gen", exist_ok=True)

    mcp_request = json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "id": 1,
        "params": {
            "name": "imagen_t2i",
            "arguments": {
                "prompt": prompt,
                "aspect_ratio": "16:9",
                "number_of_images": 1,
                "output_directory": "/tmp/thumbnail_gen",
            }
        }
    })

    # Load PROJECT_ID from settings.json if not in env
    env = os.environ.copy()
    if not env.get("PROJECT_ID"):
        try:
            settings_path = os.path.expanduser("~/.gemini/settings.json")
            with open(settings_path) as f:
                settings = json.load(f)
            mcp_env = settings.get("mcpServers", {}).get("imagen", {}).get("env", {})
            env.update({k: v for k, v in mcp_env.items() if v and "YOUR_" not in v})
        except Exception:
            pass

    proc = subprocess.run(
        ["mcp-imagen-go"],
        input=mcp_request,
        capture_output=True,
        text=True,
        timeout=90,
        env=env
    )

    if proc.returncode != 0:
        raise RuntimeError(f"mcp-imagen-go failed: {proc.stderr[:500]}")

    try:
        response = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Some versions output multiple JSON lines — take the last valid one
        for line in reversed(proc.stdout.strip().split("\n")):
            try:
                response = json.loads(line)
                break
            except Exception:
                continue
        else:
            raise RuntimeError("Could not parse mcp-imagen-go output")

    content = response.get("result", {}).get("content", [])

    for block in content:
        # Inline base64 image
        if block.get("type") == "image" and block.get("data"):
            img_bytes = base64.b64decode(block["data"])
            return Image.open(io.BytesIO(img_bytes))

        # File path returned as text
        if block.get("type") == "text":
            text = block.get("text", "")
            for token in text.split():
                token = token.strip(".,\"'")
                if token.endswith((".png", ".jpg", ".jpeg")) and os.path.exists(token):
                    return Image.open(token)

    # Check the output directory for newly created files
    import glob
    recent = sorted(glob.glob("/tmp/thumbnail_gen/*.png"), key=os.path.getmtime, reverse=True)
    if recent:
        return Image.open(recent[0])

    raise ValueError("mcp-imagen-go returned no image data")


def gen_gemini_api(prompt: str) -> Image.Image:
    """
    Generate via the Gemini API (Nano Banana 2 / gemini-3.1-flash-image-preview).
    Note: the aspect ratio is requested in the prompt, not as a parameter.
    """
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    full_prompt = (
        f"{prompt}. "
        "Generate as a wide landscape 16:9 format, high resolution, "
        "professional YouTube thumbnail quality."
    )

    # Try models newest-first
    models = [
        "gemini-3.1-flash-image-preview",
        "gemini-2.5-flash-image-preview",
        "gemini-2.0-flash-exp",
    ]

    for model_id in models:
        try:
            response = client.models.generate_content(
                model=model_id,
                contents=[full_prompt],
                config=types.GenerateContentConfig(
                    response_modalities=["IMAGE", "TEXT"]
                )
            )
            for part in response.candidates[0].content.parts:
                if part.inline_data is not None:
                    img = Image.open(io.BytesIO(part.inline_data.data))
                    print(f"  ↳ Used model: {model_id}")
                    return img
        except Exception as e:
            print(f"  ↳ {model_id} failed: {e}")
            continue

    raise ValueError("All Gemini API models failed — check GEMINI_API_KEY and quota")


def gen_fal(prompt: str) -> Image.Image:
    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux/dev",
        arguments={
            "prompt": prompt,
            "image_size": {"width": 1280, "height": 720},
            "num_inference_steps": 28,
            "num_images": 1,
            "enable_safety_checker": True,
        }
    )
    img_url = result["images"][0]["url"]
    r = requests.get(img_url, timeout=60)
    r.raise_for_status()
    return Image.open(io.BytesIO(r.content))


def gen_openai(prompt: str) -> Image.Image:
    from openai import OpenAI
    client = OpenAI()

    response = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1536x1024",  # closest 16:9 option available
        quality="high",    # gpt-image-1 accepts "low" / "medium" / "high" / "auto"
        n=1,
    )
    img_bytes = base64.b64decode(response.data[0].b64_json)
    img = Image.open(io.BytesIO(img_bytes))
    # gpt-image-1 returns 1536Ɨ1024 — resize to exact YouTube spec
    return img.resize((1280, 720), Image.LANCZOS)


def gen_pillow_only(style: str, title: str) -> Image.Image:
    """
    Pure Pillow fallback — generates a styled graphic without any AI.
    Produces a usable thumbnail when no AI backend is available.
    """
    canvas = Image.new("RGB", (1280, 720))
    draw = ImageDraw.Draw(canvas)

    # Style-specific color palettes
    PALETTES = {
        "style_1_minimalism": [(245, 245, 245), (220, 220, 220)],
        "style_6_cinematic": [(8, 12, 25), (40, 30, 60)],
        "style_11_anti": [(5, 5, 8), (20, 20, 30)],
        "style_4_whiteboard": [(250, 248, 240), (230, 225, 210)],
        "style_7_warped": [(10, 5, 20), (50, 10, 60)],
        "default": [(15, 20, 40), (40, 60, 100)],
    }
    colors = PALETTES.get(style, PALETTES["default"])

    # Vertical gradient
    for y in range(720):
        t = y / 719
        r_v = int(colors[0][0] * (1 - t) + colors[1][0] * t)
        g_v = int(colors[0][1] * (1 - t) + colors[1][1] * t)
        b_v = int(colors[0][2] * (1 - t) + colors[1][2] * t)
        draw.line([(0, y), (1280, y)], fill=(r_v, g_v, b_v))

    # Decorative diagonal accent lines
    accent = (80, 120, 200) if colors[0][0] < 50 else (150, 150, 160)
    for i in range(0, 1280, 120):
        draw.line([(i, 0), (i + 400, 720)], fill=accent, width=1)

    return canvas


# ─────────────────────────────────────────────────────────
# STYLE EFFECTS (Pillow post-processing)
# ─────────────────────────────────────────────────────────

def apply_style_effects(img: Image.Image, style: str) -> Image.Image:
    """Apply style-specific color grading and effects."""

    if style == "style_1_minimalism":
        img = ImageEnhance.Color(img).enhance(0.75)
        img = ImageEnhance.Brightness(img).enhance(1.05)

    elif style == "style_3_rainbow":
        img = ImageEnhance.Color(img).enhance(1.4)
        img = ImageEnhance.Contrast(img).enhance(1.1)

    elif style == "style_4_whiteboard":
        img = ImageEnhance.Color(img).enhance(0.65)
        # Warm tone shift: lift the red channel slightly
        r, g, b = img.split()
        r = r.point(lambda v: min(255, int(v * 1.05)))
        img = Image.merge("RGB", (r, g, b))

    elif style == "style_6_cinematic":
        img = ImageEnhance.Color(img).enhance(0.85)
        img = ImageEnhance.Contrast(img).enhance(1.3)
        # Slight teal shadow / orange highlight look
        img = _apply_color_grade(img, shadow=(0, 5, 15), highlight=(15, 5, 0))

    elif style == "style_7_warped":
        # RGB channel shift for glitch effect
        r, g, b = img.split()
        r = r.transform(img.size, Image.AFFINE, (1, 0, 8, 0, 1, 0))
        b = b.transform(img.size, Image.AFFINE, (1, 0, -6, 0, 1, 2))
        img = Image.merge("RGB", (r, g, b))
        img = ImageEnhance.Contrast(img).enhance(1.2)

    elif style == "style_9_encyclopedia":
        img = ImageEnhance.Color(img).enhance(0.6)
        img = ImageEnhance.Brightness(img).enhance(1.1)

    elif style == "style_11_anti":
        img = ImageEnhance.Brightness(img).enhance(0.6)
        img = ImageEnhance.Contrast(img).enhance(1.5)

    else:
        # Default: moderate contrast boost
        img = ImageEnhance.Contrast(img).enhance(1.15)

    # Vignette applied to all styles
    img = _apply_vignette(img, strength=0.35)

    return img


def _apply_color_grade(img: Image.Image,
                       shadow=(0, 0, 0),
                       highlight=(0, 0, 0)) -> Image.Image:
    """Subtle shadow/highlight color grade (like LUTs)."""
    r, g, b = img.split()

    def grade_channel(channel, shadow_add, highlight_add):
        lut = []
        for i in range(256):
            t = i / 255.0
            val = i + int(shadow_add * (1 - t)) + int(highlight_add * t)
            lut.append(max(0, min(255, val)))
        return channel.point(lut)

    r = grade_channel(r, shadow[0], highlight[0])
    g = grade_channel(g, shadow[1], highlight[1])
    b = grade_channel(b, shadow[2], highlight[2])
    return Image.merge("RGB", (r, g, b))


def _apply_vignette(img: Image.Image, strength: float = 0.35) -> Image.Image:
    """Add a subtle radial vignette to focus the eye toward the center."""
    w, h = img.size
    # Mask semantics for Image.composite(black, img, mask):
    # 0 keeps the image, 255 is fully black. Start fully darkened at the
    # corners, then draw progressively lighter ellipses toward the center
    # so only the edges are dimmed.
    mask = Image.new("L", (w, h), int(255 * strength))
    draw = ImageDraw.Draw(mask)

    steps = min(w, h) // 2
    for i in range(steps):
        progress = i / steps  # 0 at the edge, approaching 1 at the center
        alpha = int(255 * strength * (1 - progress))
        margin_x = int(progress * (w // 2))
        margin_y = int(progress * (h // 2))
        draw.ellipse(
            [margin_x, margin_y, w - margin_x, h - margin_y],
            fill=alpha
        )

    mask = mask.filter(ImageFilter.GaussianBlur(radius=40))
    black = Image.new("RGB", (w, h), (0, 0, 0))
    return Image.composite(black, img, mask)
693
+
694
+
695
+ # ─────────────────────────────────────────────────────────
696
+ # TEXT OVERLAY
697
+ # ─────────────────────────────────────────────────────────
698
+
699
+ FONT_PATHS = [
700
+ "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
701
+ "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
702
+ "/usr/share/fonts/TTF/DejaVuSans-Bold.ttf",
703
+ "/usr/share/fonts/truetype/freefont/FreeSansBold.ttf",
704
+ "/usr/share/fonts/truetype/ubuntu/Ubuntu-Bold.ttf",
705
+ ]
706
+
707
+ # Styles where text should be omitted
708
+ NO_TEXT_STYLES = {"style_7_warped", "style_10_candid"}
709
+
710
+
711
+ def load_font(size: int):
712
+ for fp in FONT_PATHS:
713
+ if os.path.exists(fp):
714
+ try:
715
+ return ImageFont.truetype(fp, size=size)
716
+ except Exception:
717
+ continue
718
+ return ImageFont.load_default()
719
+
720
+
721
+ def add_text_overlay(img: Image.Image, text: str, style: str) -> Image.Image:
722
+ """Add styled text overlay appropriate for each thumbnail style."""
723
+
724
+ if not text or style in NO_TEXT_STYLES:
725
+ return img
726
+
727
+ # Truncate to 4 words max (per best-practice from THUMBNAILS.md)
728
+ words = text.split()
729
+ if len(words) > 4:
730
+ text = " ".join(words[:4])
731
+
732
+ w, h = img.size
733
+ img = img.convert("RGBA")
734
+
735
+ if style in ("style_6_cinematic", "style_11_anti"):
736
+ return _text_centered_large(img, text, w, h)
737
+
738
+ elif style in ("style_1_minimalism", "style_4_whiteboard"):
739
+ return _text_clean_corner(img, text, w, h, style)
740
+
741
+
744
+ else:
745
+ return _text_banner_strip(img, text, w, h)
746
+
747
+
748
+ def _text_centered_large(img, text, w, h):
749
+ """Large centered text for cinematic/anti-thumbnail styles."""
750
+ font = load_font(96)
751
+ draw = ImageDraw.Draw(img)
752
+
753
+ bbox = draw.textbbox((0, 0), text.upper(), font=font)
754
+ tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
755
+ x, y = (w - tw) // 2, (h - th) // 2
756
+
757
+ # Glow / shadow effect
758
+ for offset in [(6, 6), (-6, 6), (6, -6), (-6, -6)]:
759
+ draw.text((x + offset[0], y + offset[1]), text.upper(),
760
+ font=font, fill=(0, 0, 0, 180))
761
+ draw.text((x, y), text.upper(), font=font, fill=(255, 255, 255, 255))
762
+
763
+ return img.convert("RGB")
764
+
765
+
766
+ def _text_clean_corner(img, text, w, h, style):
767
+ """Clean minimal text for minimalism and whiteboard styles."""
768
+ font = load_font(72)
769
+ draw = ImageDraw.Draw(img)
770
+
771
+ text_color = (30, 30, 30, 255) if style == "style_4_whiteboard" else (60, 60, 60, 255)
772
+ bbox = draw.textbbox((0, 0), text, font=font)
773
+ x, y = 60, h - (bbox[3] - bbox[1]) - 60
774
+
775
+ # Subtle shadow
776
+ draw.text((x + 2, y + 2), text, font=font, fill=(200, 200, 200, 120))
777
+ draw.text((x, y), text, font=font, fill=text_color)
778
+
779
+ return img.convert("RGB")
780
+
781
+
782
+ def _text_banner_strip(img, text, w, h):
783
+ """Semi-transparent banner strip with high-contrast text."""
784
+ font = load_font(82)
785
+ draw_measure = ImageDraw.Draw(img)
786
+ bbox = draw_measure.textbbox((0, 0), text.upper(), font=font)
787
+ tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
788
+
789
+ padding_x, padding_y = 30, 18
790
+ strip_h = th + padding_y * 2
791
+ strip_y = h - strip_h - 40
792
+
793
+ # Semi-transparent background strip
794
+ overlay = Image.new("RGBA", (w, h), (0, 0, 0, 0))
795
+ overlay_draw = ImageDraw.Draw(overlay)
796
+ overlay_draw.rectangle(
797
+ [0, strip_y, w, strip_y + strip_h],
798
+ fill=(0, 0, 0, 175)
799
+ )
800
+ img = Image.alpha_composite(img, overlay)
801
+
802
+ # Text: shadow then main
803
+ draw = ImageDraw.Draw(img)
804
+ x = (w - tw) // 2
805
+ y = strip_y + padding_y
806
+
807
+ draw.text((x + 3, y + 3), text.upper(), font=font, fill=(0, 0, 0, 200))
808
+ draw.text((x, y), text.upper(), font=font, fill=(255, 220, 50, 255))
809
+
810
+ return img.convert("RGB")
811
+
812
+
813
+ # ─────────────────────────────────────────────────────────
814
+ # AUTO TEXT EXTRACTION
815
+ # ─────────────────────────────────────────────────────────
816
+
817
+ def auto_text(video_title: str, style: str) -> str:
818
+ """Extract best text overlay from video title for given style."""
819
+ if style in NO_TEXT_STYLES:
820
+ return ""
821
+ words = video_title.split()
822
+ # For anti-thumbnail: keep number if present, else use 3 words
823
+ if style == "style_11_anti":
824
+ for word in words:
825
+ if any(c.isdigit() for c in word):
826
+                 # Label the number as seconds only when the title is about seconds
+                 return word + (" Seconds" if "sec" in video_title.lower() else "")
827
+ return " ".join(words[:3])
828
+ # General: first 4 impactful words
829
+ stopwords = {"the", "a", "an", "how", "to", "i", "my", "is", "are", "was"}
830
+ filtered = [w for w in words if w.lower() not in stopwords]
831
+ result = filtered[:4] if filtered else words[:4]
832
+ return " ".join(result)
833
+
834
+
835
+ # ─────────────────────────────────────────────────────────
836
+ # DEPENDENCY INSTALLER
837
+ # ─────────────────────────────────────────────────────────
838
+
839
+ def ensure_deps(backend: str):
840
+ """Install required packages for the selected backend."""
841
+ deps = ["Pillow", "requests"]
842
+
843
+ if backend == "gemini_api":
844
+ deps.append("google-genai")
845
+ elif backend == "fal":
846
+ deps.append("fal-client")
847
+ elif backend == "openai":
848
+ deps.append("openai")
849
+
850
+ for dep in deps:
851
+ try:
852
+ if dep == "Pillow":
853
+ import PIL
854
+ elif dep == "requests":
855
+ import requests
856
+ elif dep == "google-genai":
857
+ import google.genai
858
+ elif dep == "fal-client":
859
+ import fal_client
860
+ elif dep == "openai":
861
+ import openai
862
+ except ImportError:
863
+ print(f"Installing {dep}...")
864
+ subprocess.run(
865
+ [sys.executable, "-m", "pip", "install", dep,
866
+ "--break-system-packages", "-q"],
867
+ check=True
868
+ )
869
+
870
+
871
+ # ─────────────────────────────────────────────────────────
872
+ # MAIN ORCHESTRATOR
873
+ # ─────────────────────────────────────────────────────────
874
+
875
+ def main():
876
+     print("\nšŸŽØ Thumbnail Generator")
877
+ print(f" Title : {VIDEO_TITLE}")
878
+ print(f" Style : {STYLE}")
879
+ print(f" Output: {OUTPUT_PATH}\n")
880
+
881
+ # 1. Detect backend
882
+ backend = detect_backend()
883
+
884
+ # 2. Install deps if needed
885
+ ensure_deps(backend)
886
+
887
+ # 3. Generate base image
888
+     print("→ Generating base image...")
889
+ generators = {
890
+ "a1111": gen_a1111,
891
+ "comfyui": gen_comfyui,
892
+ "mcp_imagen": gen_mcp_imagen,
893
+ "gemini_api": gen_gemini_api,
894
+ "fal": gen_fal,
895
+ "openai": gen_openai,
896
+ }
897
+
898
+ if backend == "pillow_only":
899
+ base_img = gen_pillow_only(STYLE, VIDEO_TITLE)
900
+ else:
901
+ try:
902
+ base_img = generators[backend](AI_PROMPT)
903
+ except Exception as e:
904
+ print(f"⚠ {backend} failed: {e}")
905
+ print(" Falling back to Pillow-only...")
906
+ base_img = gen_pillow_only(STYLE, VIDEO_TITLE)
907
+
908
+ # 4. Normalize to exact 1280Ɨ720
909
+ base_img = base_img.convert("RGB").resize((1280, 720), Image.LANCZOS)
910
+ print(f"āœ“ Base image ready: {base_img.size}")
911
+
912
+ # 5. Apply style effects
913
+ print("→ Applying style effects...")
914
+ base_img = apply_style_effects(base_img, STYLE)
915
+
916
+ # 6. Determine text overlay
917
+ text = TEXT_OVERLAY if TEXT_OVERLAY else auto_text(VIDEO_TITLE, STYLE)
918
+ print(f"→ Text overlay: '{text}'" if text else "→ No text overlay (style preference)")
919
+
920
+ # 7. Add text
921
+ base_img = add_text_overlay(base_img, text, STYLE)
922
+
923
+ # 8. Save
924
+ os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
925
+ base_img.save(OUTPUT_PATH, "PNG", optimize=True)
926
+
927
+ size_kb = os.path.getsize(OUTPUT_PATH) // 1024
928
+ print(f"\nāœ… Saved: {OUTPUT_PATH} ({size_kb} KB, 1280Ɨ720)")
929
+
930
+
931
+ if __name__ == "__main__":
932
+ main()
933
+ ```
934
+
935
+ ---
936
+
937
+ ## Agent Execution Protocol
938
+
939
+ When the user asks for a thumbnail, the agent follows these steps:
940
+
941
+ ### Step 1 — Parse
942
+ Extract from message:
943
+ - `VIDEO_TITLE` — the video title or topic description
944
+ - `VIDEO_NICHE` — category (tutorial, review, travel, etc.)
945
+ - `STYLE` — if explicitly mentioned; otherwise auto-select
946
+ - `TEXT_OVERLAY` — specific text if mentioned (max 4 words); else leave empty
947
+
948
+ ### Step 2 — Select Style
949
+ ```python
950
+ style = select_style(VIDEO_NICHE, style_override=None)
951
+ ```
952
+
953
+ ### Step 3 — Build AI Prompt
954
+ Use the style template from "The 11 Styles" section above.
955
+ Append universal quality suffix:
956
+ ```
957
+ ", professional YouTube thumbnail, vibrant high contrast, cinematic quality,
958
+ sharp focus, award-winning composition"
959
+ ```
960
+
961
+ ### Step 4 — Fill Script Template
962
+ Replace every `FILL_` placeholder in the Python script above with actual values.
963
+
964
+ ### Step 5 — Execute
965
+ ```bash
966
+ pip install Pillow requests --break-system-packages -q
967
+ python3 /home/claude/generate_thumbnail.py
968
+ ```
969
+
970
+ ### Step 6 — Verify & Present
971
+ ```python
972
+ assert os.path.exists("/mnt/user-data/outputs/thumbnail.png")
973
+ assert os.path.getsize("/mnt/user-data/outputs/thumbnail.png") > 50_000
974
+ ```
975
+ Then call `present_files` tool with the output path.
976
+ Briefly tell the user which style was chosen and why (1 sentence).
977
+
978
+ ---
979
+
980
+ ## Style Prompt Templates (Reference Card)
981
+ ```python
982
+ STYLE_PROMPTS = {
983
+ "style_1_minimalism": (
984
+ "{subject}, minimalist product photography, pure white background, "
985
+ "single centered subject, dramatic soft studio lighting, ultra clean"
986
+ ),
987
+ "style_2_surround": (
988
+ "{subject} dead center, multiple {objects} arranged in perfect organized "
989
+ "circle around center, controlled chaos, vibrant, top-down angle"
990
+ ),
991
+ "style_3_rainbow": (
992
+ "flat lay of {items} in ranking order, color gradient red to blue, "
993
+ "product photography, clean background, vivid colors"
994
+ ),
995
+ "style_4_whiteboard": (
996
+ "hand-drawn diagram on real whiteboard explaining {concept}, chalk markers, "
997
+ "rough sketchy authentic educational style, classroom feel"
998
+ ),
999
+ "style_5_ui_framing": (
1000
+ "realistic screenshot mockup of {platform} UI about {topic}, "
1001
+ "exact platform styling, authentic spacing and fonts, credible interface"
1002
+ ),
1003
+ "style_6_cinematic": (
1004
+ "cinematic movie still about {subject}, dramatic chiaroscuro lighting, "
1005
+ "film grain, anamorphic lens flare, shallow depth of field"
1006
+ ),
1007
+ "style_7_warped": (
1008
+ "double exposure portrait, digital glitch effect, {emotion} face merged "
1009
+ "with {concept}, surreal distortion, moody dark tones"
1010
+ ),
1011
+ "style_8_maximalist": (
1012
+ "aerial flat lay of complete collection of all {items}, perfectly organized, "
1013
+ "every single item visible, product catalog photography"
1014
+ ),
1015
+ "style_9_encyclopedia": (
1016
+ "flat icon illustration grid of {topic} elements, consistent icon shapes, "
1017
+ "high contrast on white background, educational infographic style"
1018
+ ),
1019
+ "style_10_candid": (
1020
+ "candid authentic moment of {scene}, natural spontaneous composition, "
1021
+ "perfectly framed, golden hour lighting, documentary photography"
1022
+ ),
1023
+ "style_11_anti": (
1024
+ "dark moody portrait of {subject}, direct serious eye contact to camera, "
1025
+ "dramatic low-key lighting, minimal background, cinematic"
1026
+ ),
1027
+ }
1028
+ ```
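
For instance, a template can be combined with the universal quality suffix from Step 3. A minimal sketch — `build_prompt` is an illustrative helper, not part of the generator script; the template entry is copied from the reference card above:

```python
# One entry from the reference card, plus the Step 3 quality suffix.
STYLE_PROMPTS = {
    "style_1_minimalism": (
        "{subject}, minimalist product photography, pure white background, "
        "single centered subject, dramatic soft studio lighting, ultra clean"
    ),
}

QUALITY_SUFFIX = (
    ", professional YouTube thumbnail, vibrant high contrast, "
    "cinematic quality, sharp focus, award-winning composition"
)

def build_prompt(style: str, **fields) -> str:
    """Fill a style template and append the universal quality suffix."""
    return STYLE_PROMPTS[style].format(**fields) + QUALITY_SUFFIX

print(build_prompt("style_1_minimalism", subject="mechanical keyboard"))
```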
1029
+
1030
+ ---
1031
+
1032
+ ## Output Specification
1033
+ - **Path:** `/mnt/user-data/outputs/thumbnail.png`
1034
+ - **Resolution:** 1280 Ɨ 720 px (YouTube 16:9 standard)
1035
+ - **Format:** PNG
1036
+ - **Min file size:** ~50 KB (abort and retry if smaller)
1037
+
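The spec above can be enforced with a small stdlib-only check that reads the PNG header directly (a sketch; `check_thumbnail` is a hypothetical helper name, not part of the generator script):

```python
import os
import struct

def check_thumbnail(path: str) -> None:
    """Assert the file matches the output spec: PNG, 1280x720, over ~50 KB."""
    assert os.path.getsize(path) > 50_000, "under ~50 KB, regenerate"
    with open(path, "rb") as f:
        header = f.read(24)
    # PNG signature, then the IHDR chunk stores width/height at bytes 16-24
    assert header[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    w, h = struct.unpack(">II", header[16:24])
    assert (w, h) == (1280, 720), f"wrong resolution: {w}x{h}"
```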
1038
+ ---
1039
+
1040
+ ## Error Handling Rules
1041
+ 1. Backend API fails → automatically fall back to next priority backend
1042
+ 2. All AI backends fail → use `gen_pillow_only()`, still produce output
1043
+ 3. Font not found → use `ImageFont.load_default()`, never crash
1044
+ 4. Image < 50KB after save → regenerate with next backend
1045
+ 5. TEXT_OVERLAY blank → run `auto_text()` to extract from title
1046
+ 6. mcp-imagen-go PATH issue → check `~/.gemini/settings.json` env and retry
1047
+ with full binary path from `which mcp-imagen-go`
1048
+
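Rules 1 and 2 can be sketched as a simple priority walk. The backend names match `main()`'s `generators` dict; `generate_with_fallback` itself is illustrative, not part of the script:

```python
BACKEND_PRIORITY = ["a1111", "comfyui", "mcp_imagen",
                    "gemini_api", "fal", "openai"]

def generate_with_fallback(generators, prompt, pillow_fallback):
    """Try each configured AI backend in priority order; Pillow-only is last."""
    for name in BACKEND_PRIORITY:
        gen = generators.get(name)
        if gen is None:
            continue  # backend not configured, skip
        try:
            return name, gen(prompt)
        except Exception as exc:
            print(f"⚠ {name} failed: {exc}, trying next backend")
    # Rule 2: every AI backend failed, still produce an output
    return "pillow_only", pillow_fallback()
```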
1049
+ ---
1050
+
1051
+ ## Quick Reference: Which Backend for What
1052
+
1053
+ | Situation | Recommended Backend |
1054
+ |-----------|-------------------|
1055
+ | Local GPU available (A1111 running) | `a1111` — always fastest+free |
1056
+ | Have GCP project + Gemini CLI set up | `mcp_imagen` — Imagen 4, best quality |
1057
+ | Have Gemini API key (from Gemini CLI) | `gemini_api` — free quota, good quality |
1058
+ | Cloud-only, budget matters | `fal` — $0.03/image, FLUX quality |
1059
+ | Need best text rendering in image | `openai` — gpt-image-1 best for text |
1060
+ | No API keys / testing | `pillow_only` — always works |
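
As a rough companion to the table, backend choice can be keyed off available credentials. This is only an illustration: the env var names below are common conventions and assumptions here, and the script's own `detect_backend()` remains authoritative:

```python
import os

def pick_backend(a1111_running: bool = False) -> str:
    """Illustrative priority matching the table above (assumed env var names)."""
    if a1111_running:                           # local GPU wins: fast and free
        return "a1111"
    if os.environ.get("GOOGLE_CLOUD_PROJECT"):  # GCP project -> Imagen via MCP
        return "mcp_imagen"
    if os.environ.get("GEMINI_API_KEY"):        # Gemini API free quota
        return "gemini_api"
    if os.environ.get("FAL_KEY"):               # budget cloud option
        return "fal"
    if os.environ.get("OPENAI_API_KEY"):        # best in-image text rendering
        return "openai"
    return "pillow_only"                        # always works, no keys needed
```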