@biggora/claude-plugins 1.0.0 → 1.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (80)
  1. package/.claude/settings.local.json +13 -0
  2. package/CLAUDE.md +55 -0
  3. package/LICENSE +1 -1
  4. package/README.md +193 -39
  5. package/bin/cli.js +39 -0
  6. package/package.json +30 -17
  7. package/registry/registry.json +196 -1
  8. package/registry/schema.json +10 -0
  9. package/src/commands/skills/add.js +194 -0
  10. package/src/commands/skills/list.js +52 -0
  11. package/src/commands/skills/remove.js +27 -0
  12. package/src/commands/skills/update.js +74 -0
  13. package/src/config.js +5 -0
  14. package/src/skills/codex-cli/SKILL.md +265 -0
  15. package/src/skills/commafeed-api/SKILL.md +1012 -0
  16. package/src/skills/gemini-cli/SKILL.md +379 -0
  17. package/src/skills/gemini-cli/references/commands.md +145 -0
  18. package/src/skills/gemini-cli/references/configuration.md +182 -0
  19. package/src/skills/gemini-cli/references/headless-and-scripting.md +181 -0
  20. package/src/skills/gemini-cli/references/mcp-and-extensions.md +254 -0
  21. package/src/skills/n8n-api/SKILL.md +623 -0
  22. package/src/skills/notebook-lm/SKILL.md +217 -0
  23. package/src/skills/notebook-lm/references/artifact-options.md +168 -0
  24. package/src/skills/notebook-lm/references/auth.md +58 -0
  25. package/src/skills/notebook-lm/references/workflows.md +144 -0
  26. package/src/skills/screen-recording/SKILL.md +309 -0
  27. package/src/skills/screen-recording/references/approach1-programmatic.md +311 -0
  28. package/src/skills/screen-recording/references/approach2-xvfb.md +232 -0
  29. package/src/skills/screen-recording/references/design-patterns.md +168 -0
  30. package/src/skills/test-mobile-app/SKILL.md +212 -0
  31. package/src/skills/test-mobile-app/references/report-template.md +95 -0
  32. package/src/skills/test-mobile-app/references/setup-appium.md +154 -0
  33. package/src/skills/test-mobile-app/scripts/analyze_apk.py +164 -0
  34. package/src/skills/test-mobile-app/scripts/check_environment.py +116 -0
  35. package/src/skills/test-mobile-app/scripts/generate_report.py +250 -0
  36. package/src/skills/test-mobile-app/scripts/run_tests.py +326 -0
  37. package/src/skills/test-web-ui/SKILL.md +232 -0
  38. package/src/skills/test-web-ui/references/test_case_schema.md +102 -0
  39. package/src/skills/test-web-ui/scripts/discover.py +176 -0
  40. package/src/skills/test-web-ui/scripts/generate_report.py +237 -0
  41. package/src/skills/test-web-ui/scripts/run_tests.py +296 -0
  42. package/src/skills/text-to-speech/SKILL.md +236 -0
  43. package/src/skills/text-to-speech/references/espeak-cli.md +277 -0
  44. package/src/skills/text-to-speech/references/kokoro-onnx.md +124 -0
  45. package/src/skills/text-to-speech/references/online-engines.md +128 -0
  46. package/src/skills/text-to-speech/references/pyttsx3-espeak.md +143 -0
  47. package/src/skills/tm-search/SKILL.md +240 -0
  48. package/src/skills/tm-search/references/field-guide.md +79 -0
  49. package/src/skills/tm-search/references/scraping-fallback.md +140 -0
  50. package/src/skills/tm-search/scripts/tm_search.py +375 -0
  51. package/src/skills/wp-rest-api/SKILL.md +114 -0
  52. package/src/skills/wp-rest-api/references/authentication.md +18 -0
  53. package/src/skills/wp-rest-api/references/custom-content-types.md +20 -0
  54. package/src/skills/wp-rest-api/references/discovery-and-params.md +20 -0
  55. package/src/skills/wp-rest-api/references/responses-and-fields.md +30 -0
  56. package/src/skills/wp-rest-api/references/routes-and-endpoints.md +36 -0
  57. package/src/skills/wp-rest-api/references/schema.md +22 -0
  58. package/src/skills/youtube-search/SKILL.md +412 -0
  59. package/src/skills/youtube-search/references/parsing-examples.md +159 -0
  60. package/src/skills/youtube-search/references/youtube-api-quota.md +85 -0
  61. package/src/skills/youtube-thumbnail/SKILL.md +1070 -0
  62. package/tests/commands/info.test.js +49 -0
  63. package/tests/commands/install.test.js +36 -0
  64. package/tests/commands/list.test.js +66 -0
  65. package/tests/commands/publish.test.js +182 -0
  66. package/tests/commands/search.test.js +45 -0
  67. package/tests/commands/uninstall.test.js +29 -0
  68. package/tests/commands/update.test.js +59 -0
  69. package/tests/functional/skills-lifecycle.test.js +293 -0
  70. package/tests/helpers/fixtures.js +63 -0
  71. package/tests/integration/cli.test.js +83 -0
  72. package/tests/skills/add.test.js +138 -0
  73. package/tests/skills/list.test.js +63 -0
  74. package/tests/skills/remove.test.js +38 -0
  75. package/tests/skills/update.test.js +60 -0
  76. package/tests/unit/config.test.js +31 -0
  77. package/tests/unit/registry.test.js +79 -0
  78. package/tests/unit/utils.test.js +150 -0
  79. package/tests/validation/registry-schema.test.js +112 -0
  80. package/tests/validation/skills-validation.test.js +96 -0
@@ -0,0 +1,1070 @@
---
name: youtube-thumbnail
description: >
  Generates professional YouTube thumbnails in 11 strategic styles with auto-detection
  of AI image backends (A1111, ComfyUI, MCP Imagen 4, Gemini API, fal.ai FLUX, OpenAI
  gpt-image-1) and Pillow compositing. Trigger when the user asks to create, generate,
  or design a YouTube thumbnail, video cover, channel art, or presentation title slide.
  Also trigger for batch thumbnail creation or YouTube visual workflow automation.
---

# YouTube Thumbnail Generation Skill (2026 Edition)

## Overview

This skill enables **fully autonomous** generation of professional YouTube thumbnails
in 11 strategic styles from THUMBNAILS.md. Zero user interaction is required after the
initial request. The agent auto-selects a style, builds a prompt, generates the base
image via the best available AI backend, applies compositing and text via Pillow,
and saves a final **1280×720 PNG** to `/mnt/user-data/outputs/thumbnail.png`.

---

## When to Trigger This Skill

Trigger when the user:
- Says "create a thumbnail", "make a thumbnail", or "generate a YouTube thumbnail"
- Provides a video title or topic and needs a visual
- Wants product video covers, presentation title slides, or channel art
- Needs batch thumbnail creation or automation for a YouTube workflow

---

## Architecture: Two-Layer Pipeline
```
[User Request]
        ↓
[Layer 1: AI Image Generation]   ← Base image via best available backend
        ↓
[Layer 2: Pillow Compositing]    ← Resize to 1280×720, effects, text overlay
        ↓
[Output: /mnt/user-data/outputs/thumbnail.png]
```

---

## Backend Priority Table

Auto-detect each backend by testing availability, and use the first one that works.

| # | Backend | Detection Method | Quality | Cost |
|---|---------|------------------|---------|------|
| 1 | **A1111 Local SD** | `GET localhost:7860/sdapi/v1/samplers` | ★★★★ | Free (own GPU) |
| 2 | **ComfyUI Local** | `GET localhost:8188/history` | ★★★★ | Free (own GPU) |
| 3 | **MCP Imagen 4** (Vertex AI) | `which mcp-imagen-go` + `~/.gemini/settings.json` | ★★★★★ | Vertex AI pricing |
| 4 | **Gemini API** (Nano Banana 2) | `GEMINI_API_KEY` env + `google-genai` installed | ★★★★ | Free quota |
| 5 | **fal.ai FLUX** | `FAL_KEY` env + `fal-client` installed | ★★★★ | $0.03/MP |
| 6 | **OpenAI gpt-image-1** | `OPENAI_API_KEY` env + `openai` installed | ★★★★ | ~$0.04/img |
| 7 | **Pillow-only fallback** | Always available | ★★ | Free |

---

## The 11 Thumbnail Styles

### Style 1: Neo-Minimalism (`style_1_minimalism`)
**Best for:** General niches; standing out in a cluttered feed
**Core idea:** If the feed is loud, go quiet. 50%+ negative space.
**AI prompt pattern:**
`"[subject], minimalist product photography, pure white background, single centered
subject, dramatic soft studio lighting, ultra clean composition, no clutter"`
**Pillow:** White/monochromatic background, max 2 colors, light serif font bottom-left or no text at all

### Style 2: The Surround (`style_2_surround`)
**Best for:** Comparisons, "I tried X things", hauls
**Core idea:** Subject dead center, objects in an organized circle/grid around it.
**AI prompt pattern:**
`"[subject] perfectly centered, multiple [related objects] arranged in organized
circle or grid around center subject, controlled chaos, vibrant, top-down angle"`
**Pillow:** Grid math — center subject at 40% of the canvas, surrounding items equally spaced by angle

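The "equally spaced by angle" math above reduces to basic trigonometry. A minimal sketch (the function name and the 0.38 radius ratio are illustrative, not part of the skill):

```python
import math

def surround_positions(n_items, canvas=(1280, 720), radius_ratio=0.38):
    """Center points for n items spaced at equal angles around the canvas center."""
    cx, cy = canvas[0] // 2, canvas[1] // 2
    radius = int(min(canvas) * radius_ratio)  # orbit radius from the shorter side
    positions = []
    for i in range(n_items):
        angle = 2 * math.pi * i / n_items - math.pi / 2  # start at 12 o'clock
        positions.append((cx + int(radius * math.cos(angle)),
                          cy + int(radius * math.sin(angle))))
    return positions
```

Paste each surrounding item centered on one of these points, with the main subject occupying the middle of the canvas.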

### Style 3: Rainbow Ranking (`style_3_rainbow`)
**Best for:** Tier lists, "Best to Worst", reviews
**Core idea:** A color gradient (Red→Blue) conveys hierarchy visually.
**AI prompt pattern:**
`"flat lay of [3-7 items] arranged in ranking order, color gradient from red to
blue across items, product photography style, clean background"`
**Pillow:** Apply a gradient color wash per item via `ImageEnhance.Color`, then add rank numbers (1, 2, 3…) in bold white

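The red→blue ramp can be computed with plain linear interpolation before the per-item color wash is applied. A minimal sketch (the function name is illustrative):

```python
def rank_colors(n: int) -> list:
    """RGB ramp for n ranked items: rank 1 is pure red, rank n is pure blue."""
    if n == 1:
        return [(255, 0, 0)]
    return [(int(255 * (1 - i / (n - 1))), 0, int(255 * i / (n - 1)))
            for i in range(n)]
```

Each item's tile can then be blended toward its rank color, e.g. `Image.blend(tile, Image.new("RGB", tile.size, color), 0.35)`, before the rank number is drawn.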

### Style 4: Educational Whiteboard (`style_4_whiteboard`)
**Best for:** Tutorials, business explainers, complex systems
**Core idea:** Authenticity over polish. Signals "high value, no fluff."
**AI prompt pattern:**
`"hand-drawn diagram on real whiteboard explaining [concept], chalk markers,
rough sketchy educational style, authentic classroom feel, [topic] framework"`
**Pillow:** Reduce saturation to 70% for authenticity, apply a warm color grade, use a handwritten-style font

### Style 5: Familiar Interface (`style_5_ui_framing`)
**Best for:** Commentary, news, reviews
**Core idea:** Borrow credibility from known platforms (Twitter, Reddit, Amazon).
**AI prompt pattern:**
`"realistic screenshot mockup of [Twitter post / Reddit thread / Amazon listing /
Netflix menu] about [topic], exact platform UI styling, authentic spacing and fonts"`
**Pillow:** Programmatically draw platform UI elements — rounded rectangles and brand colors
(Twitter #1DA1F2, Reddit #FF4500, Amazon #FF9900)

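One way to sketch the platform framing is Pillow's `rounded_rectangle` (available since Pillow 8.2). The card layout numbers here are illustrative:

```python
from PIL import Image, ImageDraw

BRAND_COLORS = {"twitter": (29, 161, 242),   # #1DA1F2
                "reddit": (255, 69, 0),      # #FF4500
                "amazon": (255, 153, 0)}     # #FF9900

def draw_ui_card(platform: str = "twitter", size=(1280, 720)) -> Image.Image:
    """Gray canvas, white rounded post card, brand-colored header bar."""
    img = Image.new("RGB", size, (230, 233, 236))
    draw = ImageDraw.Draw(img)
    draw.rounded_rectangle([160, 120, 1120, 600], radius=24, fill=(255, 255, 255))
    # Rough header strip; its bottom corners are also rounded (sketch only)
    draw.rounded_rectangle([160, 120, 1120, 192], radius=24,
                           fill=BRAND_COLORS[platform])
    return img
```

The real mockup would add avatar circles, gray text-placeholder bars, and platform-specific spacing on top of this base.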

### Style 6: Cinematic Text (`style_6_cinematic`)
**Best for:** High-production storytelling, documentaries
**Core idea:** Text IS a design element — embedded in the world, not floating over it.
**AI prompt pattern:**
`"cinematic movie still about [subject], dramatic chiaroscuro lighting, film grain,
anamorphic lens flare, shallow depth of field, golden hour or moody tones"`
**Pillow:** MAX 3-4 words, large centered bold font, text shadow/glow via layered offset draws

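The shadow/glow technique is just the same string drawn several times at small offsets in a dark color before the final bright pass. A minimal sketch using Pillow's default bitmap font (the script's `load_font` would substitute a large truetype font):

```python
from PIL import Image, ImageDraw, ImageFont

def draw_glow_text(img, text, xy, layers=5,
                   glow=(0, 0, 0), fill=(255, 255, 255)):
    """Layered offset draws: dark halo first (widest offsets), crisp text last."""
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    x, y = xy
    for r in range(layers, 0, -1):
        for dx, dy in ((-r, 0), (r, 0), (0, -r), (0, r),
                       (-r, -r), (r, r), (-r, r), (r, -r)):
            draw.text((x + dx, y + dy), text, font=font, fill=glow)
    draw.text((x, y), text, font=font, fill=fill)
    return img
```

Increasing `layers` widens the halo; replacing the offset loop with a Gaussian-blurred text layer gives a softer glow at the cost of an extra compositing step.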

### Style 7: Warped Faces (`style_7_warped`)
**Best for:** Self-improvement, "Harsh Truths", psychology topics
**Core idea:** A "something is wrong" curiosity gap created via distortion.
**AI prompt pattern:**
`"double exposure portrait, digital glitch effect, [emotion] face merged with
[abstract concept], surreal digital distortion, moody dark tones, experimental photography"`
**Pillow:** RGB channel shift for glitch (shift the R channel +8px), selective blur, minimal or no text

### Style 8: Maximalist Flex (`style_8_maximalist`)
**Best for:** Collectors, tech enthusiasts, hobbyists
**Core idea:** The collection is the star, not the person.
**AI prompt pattern:**
`"aerial flat lay of complete collection of every [item type], perfectly organized
and arranged, every single item visible, product catalog photography style"`
**Pillow:** Dense but organized placement, optional "COMPLETE COLLECTION" text strip at top/bottom

### Style 9: Encyclopedia Grid (`style_9_encyclopedia`)
**Best for:** "Every X Explained", deep dives
**Core idea:** Looks informative and "safe" — no drama, just knowledge.
**AI prompt pattern:**
`"flat icon illustration grid of [topic] elements, consistent icon shapes, high
contrast on white background, educational infographic style, no dramatic lighting"`
**Pillow:** Draw equal grid cells with `ImageDraw.rectangle`, a flat icon in each cell, label below

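The equal-cell layout reduces to simple integer math. A minimal sketch (the function name and 16px gap are illustrative):

```python
def grid_cells(cols: int, rows: int, canvas=(1280, 720), gap=16):
    """Rectangles (x0, y0, x1, y1) for an evenly spaced cols×rows grid."""
    w, h = canvas
    cell_w = (w - gap * (cols + 1)) // cols
    cell_h = (h - gap * (rows + 1)) // rows
    return [(gap + c * (cell_w + gap), gap + r * (cell_h + gap),
             gap + c * (cell_w + gap) + cell_w, gap + r * (cell_h + gap) + cell_h)
            for r in range(rows) for c in range(cols)]
```

Each rectangle can then be drawn with `ImageDraw.rectangle` and filled with a flat icon plus a label underneath.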

### Style 10: Candid Fake (`style_10_candid`)
**Best for:** Challenges, travel, lifestyle
**Core idea:** Highly engineered to look like a lucky candid shot.
**AI prompt pattern:**
`"candid authentic moment of [person/scene], natural spontaneous composition but
perfectly framed, golden hour lighting, documentary photography style, physically
possible scene"`
**Pillow:** Minimal processing, subtle vignette at edges only. NO text. NO arrows.

### Style 11: The Anti-Thumbnail (`style_11_anti`)
**Best for:** Productivity, "Quick Tip" videos
**Core idea:** A dark look plus a specific "irritating" number triggers curiosity.
**AI prompt pattern:**
`"dark moody portrait of [subject], direct serious eye contact to camera, dramatic
low-key lighting, minimal background, cinematic single-subject composition"`
**Pillow:** Dark gradient background (0,0,0)→(30,30,30), a specific non-round number ("47 Seconds", not "60"),
large centered font

---

## Auto-Style Selection (when user doesn't specify)
```python
NICHE_TO_STYLE = {
    # Education & Learning
    "education": "style_4_whiteboard",
    "tutorial": "style_4_whiteboard",
    "howto": "style_4_whiteboard",
    "explainer": "style_9_encyclopedia",
    "course": "style_4_whiteboard",

    # Reviews & Rankings
    "review": "style_3_rainbow",
    "comparison": "style_2_surround",
    "tierlist": "style_3_rainbow",
    "ranking": "style_3_rainbow",
    "top10": "style_3_rainbow",

    # News & Commentary
    "news": "style_5_ui_framing",
    "commentary": "style_5_ui_framing",
    "reaction": "style_5_ui_framing",
    "opinion": "style_5_ui_framing",

    # Personal Development
    "productivity": "style_11_anti",
    "psychology": "style_7_warped",
    "selfimprovement": "style_7_warped",
    "motivation": "style_11_anti",

    # Collections & Gear
    "collection": "style_8_maximalist",
    "tech": "style_8_maximalist",
    "gear": "style_8_maximalist",
    "unboxing": "style_2_surround",

    # Lifestyle & Travel
    "travel": "style_10_candid",
    "lifestyle": "style_10_candid",
    "vlog": "style_10_candid",
    "challenge": "style_10_candid",

    # High-Production
    "documentary": "style_6_cinematic",
    "storytelling": "style_6_cinematic",
    "cinematic": "style_6_cinematic",

    # Default
    "general": "style_1_minimalism",
}


def select_style(niche: str, style_override: str | None = None) -> str:
    if style_override:
        return style_override
    niche_clean = niche.lower().replace(" ", "").replace("-", "")
    for key in NICHE_TO_STYLE:
        if key in niche_clean or niche_clean in key:
            return NICHE_TO_STYLE[key]
    return "style_1_minimalism"
```

---

## Full Python Implementation

When creating a thumbnail, write this complete script to `/home/claude/generate_thumbnail.py`,
then execute it with `python3 generate_thumbnail.py`. All values in CAPS are filled in
by the agent before writing the script.
```python
#!/usr/bin/env python3
"""
YouTube Thumbnail Generator — Auto-generated by Agent
Video:   VIDEO_TITLE_PLACEHOLDER
Style:   STYLE_PLACEHOLDER
Backend: auto-detected
"""

import os
import sys
import json
import base64
import io
import subprocess

import requests
from PIL import Image, ImageDraw, ImageFont, ImageFilter, ImageEnhance

# ═══════════════════════════════════════════════════════════
# CONFIGURATION — Agent fills these in before writing script
# ═══════════════════════════════════════════════════════════
VIDEO_TITLE = "FILL_VIDEO_TITLE"
VIDEO_NICHE = "FILL_NICHE"        # e.g. "tutorial", "review", "travel"
STYLE = "FILL_STYLE"              # e.g. "style_6_cinematic"
TEXT_OVERLAY = "FILL_TEXT"        # max 4 words; empty string = auto from title
AI_PROMPT = "FILL_AI_PROMPT"      # full prompt built from style template
OUTPUT_PATH = "/mnt/user-data/outputs/thumbnail.png"
# ═══════════════════════════════════════════════════════════


# ─────────────────────────────────────────────────────────
# BACKEND DETECTION
# ─────────────────────────────────────────────────────────

def detect_mcp_imagen() -> bool:
    """Check if mcp-imagen-go binary is installed and configured in Gemini CLI."""
    try:
        r = subprocess.run(["which", "mcp-imagen-go"],
                           capture_output=True, text=True, timeout=5)
        if r.returncode != 0:
            return False
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    settings_path = os.path.expanduser("~/.gemini/settings.json")
    if not os.path.exists(settings_path):
        return False
    try:
        with open(settings_path) as f:
            settings = json.load(f)
        return "imagen" in settings.get("mcpServers", {})
    except Exception:
        return False


def detect_gemini_api() -> bool:
    """Check if Gemini API key is set and google-genai is installed."""
    if not os.environ.get("GEMINI_API_KEY"):
        return False
    try:
        import google.genai
        return True
    except ImportError:
        return False


def detect_backend() -> str:
    """Auto-detect the best available image generation backend."""
    # Priority 1: Local A1111
    try:
        r = requests.get("http://127.0.0.1:7860/sdapi/v1/samplers", timeout=3)
        if r.status_code == 200:
            print("✓ Backend: A1111 Local")
            return "a1111"
    except Exception:
        pass

    # Priority 2: Local ComfyUI
    try:
        r = requests.get("http://127.0.0.1:8188/history", timeout=3)
        if r.status_code == 200:
            print("✓ Backend: ComfyUI Local")
            return "comfyui"
    except Exception:
        pass

    # Priority 3: MCP Imagen 4 (Vertex AI via Gemini CLI)
    if detect_mcp_imagen():
        print("✓ Backend: MCP Imagen 4 (Vertex AI)")
        return "mcp_imagen"

    # Priority 4: Gemini API (Nano Banana 2)
    if detect_gemini_api():
        print("✓ Backend: Gemini API (Nano Banana 2)")
        return "gemini_api"

    # Priority 5: fal.ai FLUX
    if os.environ.get("FAL_KEY"):
        try:
            import fal_client
            print("✓ Backend: fal.ai FLUX")
            return "fal"
        except ImportError:
            pass

    # Priority 6: OpenAI gpt-image-1
    if os.environ.get("OPENAI_API_KEY"):
        try:
            import openai
            print("✓ Backend: OpenAI gpt-image-1")
            return "openai"
        except ImportError:
            pass

    # Priority 7: Pillow-only fallback
    print("⚠ Backend: Pillow-only (no AI available)")
    return "pillow_only"


# ─────────────────────────────────────────────────────────
# IMAGE GENERATION — one function per backend
# ─────────────────────────────────────────────────────────

def gen_a1111(prompt: str) -> Image.Image:
    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality, text, watermark, ugly, deformed, cropped",
        "width": 1280,
        "height": 720,
        "steps": 25,
        "cfg_scale": 7,
        "sampler_name": "DPM++ 2M Karras",
        "batch_size": 1,
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=120)
    r.raise_for_status()
    data = r.json()
    img_bytes = base64.b64decode(data["images"][0])
    return Image.open(io.BytesIO(img_bytes))


def gen_comfyui(prompt: str) -> Image.Image:
    """Simple ComfyUI text-to-image via a basic workflow."""
    workflow = {
        "3": {"inputs": {"text": prompt, "clip": ["4", 1]}, "class_type": "CLIPTextEncode"},
        "4": {"inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}, "class_type": "CheckpointLoaderSimple"},
        "5": {"inputs": {"text": "blurry, ugly, watermark", "clip": ["4", 1]}, "class_type": "CLIPTextEncode"},
        "6": {"inputs": {"width": 1280, "height": 720, "batch_size": 1}, "class_type": "EmptyLatentImage"},
        "7": {"inputs": {"seed": -1, "steps": 25, "cfg": 7, "sampler_name": "dpmpp_2m",
                         "scheduler": "karras", "denoise": 1,
                         "model": ["4", 0], "positive": ["3", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0]},
              "class_type": "KSampler"},
        "8": {"inputs": {"samples": ["7", 0], "vae": ["4", 2]}, "class_type": "VAEDecode"},
        "9": {"inputs": {"images": ["8", 0], "filename_prefix": "thumb"},
              "class_type": "SaveImage"},
    }
    r = requests.post("http://127.0.0.1:8188/prompt",
                      json={"prompt": workflow}, timeout=120)
    r.raise_for_status()
    prompt_id = r.json()["prompt_id"]

    # Poll for the result
    import time
    for _ in range(60):
        time.sleep(2)
        hist = requests.get(f"http://127.0.0.1:8188/history/{prompt_id}", timeout=10).json()
        if prompt_id in hist:
            outputs = hist[prompt_id]["outputs"]
            for node_id, node_output in outputs.items():
                if "images" in node_output:
                    img_info = node_output["images"][0]
                    img_r = requests.get(
                        f"http://127.0.0.1:8188/view?filename={img_info['filename']}"
                        f"&subfolder={img_info.get('subfolder', '')}&type={img_info['type']}",
                        timeout=30
                    )
                    return Image.open(io.BytesIO(img_r.content))
    raise TimeoutError("ComfyUI generation timed out after 120s")


def gen_mcp_imagen(prompt: str) -> Image.Image:
    """
    Call mcp-imagen-go directly via STDIO (MCP protocol).
    Requires the mcp-imagen-go binary in PATH and a PROJECT_ID env var.
    Uses Imagen 4 via Vertex AI — the highest-quality option.
    """
    os.makedirs("/tmp/thumbnail_gen", exist_ok=True)

    mcp_request = json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "id": 1,
        "params": {
            "name": "imagen_t2i",
            "arguments": {
                "prompt": prompt,
                "aspect_ratio": "16:9",
                "number_of_images": 1,
                "output_directory": "/tmp/thumbnail_gen",
            }
        }
    })

    # Load PROJECT_ID from settings.json if not in env
    env = os.environ.copy()
    if not env.get("PROJECT_ID"):
        try:
            settings_path = os.path.expanduser("~/.gemini/settings.json")
            with open(settings_path) as f:
                settings = json.load(f)
            mcp_env = settings.get("mcpServers", {}).get("imagen", {}).get("env", {})
            env.update({k: v for k, v in mcp_env.items() if v and "YOUR_" not in v})
        except Exception:
            pass

    proc = subprocess.run(
        ["mcp-imagen-go"],
        input=mcp_request,
        capture_output=True,
        text=True,
        timeout=90,
        env=env
    )

    if proc.returncode != 0:
        raise RuntimeError(f"mcp-imagen-go failed: {proc.stderr[:500]}")

    try:
        response = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Some versions output multiple JSON lines — take the last valid one
        for line in reversed(proc.stdout.strip().split("\n")):
            try:
                response = json.loads(line)
                break
            except Exception:
                continue
        else:
            raise RuntimeError("Could not parse mcp-imagen-go output")

    content = response.get("result", {}).get("content", [])

    for block in content:
        # Inline base64 image
        if block.get("type") == "image" and block.get("data"):
            img_bytes = base64.b64decode(block["data"])
            return Image.open(io.BytesIO(img_bytes))

        # File path returned as text
        if block.get("type") == "text":
            text = block.get("text", "")
            for token in text.split():
                token = token.strip(".,\"'")
                if token.endswith((".png", ".jpg", ".jpeg")) and os.path.exists(token):
                    return Image.open(token)

    # Check the output directory for newly created files
    import glob
    recent = sorted(glob.glob("/tmp/thumbnail_gen/*.png"), key=os.path.getmtime, reverse=True)
    if recent:
        return Image.open(recent[0])

    raise ValueError("mcp-imagen-go returned no image data")


def gen_gemini_api(prompt: str) -> Image.Image:
    """
    Generate via the Gemini API (Nano Banana 2 / gemini-3.1-flash-image-preview).
    Note: the aspect ratio is requested in the prompt, not as a parameter.
    """
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    full_prompt = (
        f"{prompt}. "
        "Generate as a wide landscape 16:9 format, high resolution, "
        "professional YouTube thumbnail quality."
    )

    # Try models newest-first
    models = [
        "gemini-3.1-flash-image-preview",
        "gemini-2.5-flash-image-preview",
        "gemini-2.0-flash-exp",
    ]

    for model_id in models:
        try:
            response = client.models.generate_content(
                model=model_id,
                contents=[full_prompt],
                config=types.GenerateContentConfig(
                    response_modalities=["IMAGE", "TEXT"]
                )
            )
            for part in response.candidates[0].content.parts:
                if part.inline_data is not None:
                    img = Image.open(io.BytesIO(part.inline_data.data))
                    print(f"  ↳ Used model: {model_id}")
                    return img
        except Exception as e:
            print(f"  ↳ {model_id} failed: {e}")
            continue

    raise ValueError("All Gemini API models failed — check GEMINI_API_KEY and quota")


def gen_fal(prompt: str) -> Image.Image:
    import fal_client

    result = fal_client.subscribe(
        "fal-ai/flux/dev",
        arguments={
            "prompt": prompt,
            "image_size": {"width": 1280, "height": 720},
            "num_inference_steps": 28,
            "num_images": 1,
            "enable_safety_checker": True,
        }
    )
    img_url = result["images"][0]["url"]
    r = requests.get(img_url, timeout=60)
    r.raise_for_status()
    return Image.open(io.BytesIO(r.content))


def gen_openai(prompt: str) -> Image.Image:
    from openai import OpenAI
    client = OpenAI()

    response = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        size="1536x1024",   # closest 16:9 available
        quality="high",     # gpt-image-1 accepts low/medium/high/auto
        n=1,
    )
    img_bytes = base64.b64decode(response.data[0].b64_json)
    img = Image.open(io.BytesIO(img_bytes))
    # gpt-image-1 returns 1536×1024 — resize to exact YouTube spec
    return img.resize((1280, 720), Image.LANCZOS)


def gen_pillow_only(style: str, title: str) -> Image.Image:
    """
    Pure Pillow fallback — generates a styled graphic without any AI.
    Produces a usable thumbnail when no AI backend is available.
    """
    canvas = Image.new("RGB", (1280, 720))
    draw = ImageDraw.Draw(canvas)

    # Style-specific color palettes
    PALETTES = {
        "style_1_minimalism": [(245, 245, 245), (220, 220, 220)],
        "style_6_cinematic": [(8, 12, 25), (40, 30, 60)],
        "style_11_anti": [(5, 5, 8), (20, 20, 30)],
        "style_4_whiteboard": [(250, 248, 240), (230, 225, 210)],
        "style_7_warped": [(10, 5, 20), (50, 10, 60)],
        "default": [(15, 20, 40), (40, 60, 100)],
    }
    colors = PALETTES.get(style, PALETTES["default"])

    # Vertical gradient
    for y in range(720):
        t = y / 719
        r_v = int(colors[0][0] * (1 - t) + colors[1][0] * t)
        g_v = int(colors[0][1] * (1 - t) + colors[1][1] * t)
        b_v = int(colors[0][2] * (1 - t) + colors[1][2] * t)
        draw.line([(0, y), (1280, y)], fill=(r_v, g_v, b_v))

    # Decorative diagonal accent lines
    accent = (80, 120, 200) if colors[0][0] < 50 else (150, 150, 160)
    for i in range(0, 1280, 120):
        draw.line([(i, 0), (i + 400, 720)], fill=accent, width=1)

    return canvas


# ─────────────────────────────────────────────────────────
# STYLE EFFECTS (Pillow post-processing)
# ─────────────────────────────────────────────────────────

def apply_style_effects(img: Image.Image, style: str) -> Image.Image:
    """Apply style-specific color grading and effects."""

    if style == "style_1_minimalism":
        img = ImageEnhance.Color(img).enhance(0.75)
        img = ImageEnhance.Brightness(img).enhance(1.05)

    elif style == "style_3_rainbow":
        img = ImageEnhance.Color(img).enhance(1.4)
        img = ImageEnhance.Contrast(img).enhance(1.1)

    elif style == "style_4_whiteboard":
        img = ImageEnhance.Color(img).enhance(0.65)
        # Warm tone shift: lift the red channel slightly
        r, g, b = img.split()
        r = r.point(lambda v: min(255, int(v * 1.05)))
        img = Image.merge("RGB", (r, g, b))

    elif style == "style_6_cinematic":
        img = ImageEnhance.Color(img).enhance(0.85)
        img = ImageEnhance.Contrast(img).enhance(1.3)
        # Slight teal-shadow / orange-highlight look
        img = _apply_color_grade(img, shadow=(0, 5, 15), highlight=(15, 5, 0))

    elif style == "style_7_warped":
        # RGB channel shift for a glitch effect
        r, g, b = img.split()
        r = r.transform(img.size, Image.AFFINE, (1, 0, 8, 0, 1, 0))
        b = b.transform(img.size, Image.AFFINE, (1, 0, -6, 0, 1, 2))
        img = Image.merge("RGB", (r, g, b))
        img = ImageEnhance.Contrast(img).enhance(1.2)

    elif style == "style_9_encyclopedia":
        img = ImageEnhance.Color(img).enhance(0.6)
        img = ImageEnhance.Brightness(img).enhance(1.1)

    elif style == "style_11_anti":
        img = ImageEnhance.Brightness(img).enhance(0.6)
        img = ImageEnhance.Contrast(img).enhance(1.5)

    else:
        # Default: moderate contrast boost
        img = ImageEnhance.Contrast(img).enhance(1.15)

    # Vignette applied to all styles
    img = _apply_vignette(img, strength=0.35)

    return img


def _apply_color_grade(img: Image.Image,
                       shadow=(0, 0, 0),
                       highlight=(0, 0, 0)) -> Image.Image:
    """Subtle shadow/highlight color grade (like a LUT)."""
    r, g, b = img.split()

    def grade_channel(channel, shadow_add, highlight_add):
        lut = []
        for i in range(256):
            t = i / 255.0
            val = i + int(shadow_add * (1 - t)) + int(highlight_add * t)
            lut.append(max(0, min(255, val)))
        return channel.point(lut)

    r = grade_channel(r, shadow[0], highlight[0])
    g = grade_channel(g, shadow[1], highlight[1])
    b = grade_channel(b, shadow[2], highlight[2])
    return Image.merge("RGB", (r, g, b))


def _apply_vignette(img: Image.Image, strength: float = 0.35) -> Image.Image:
    """Add a subtle radial vignette to focus the eye toward the center."""
    w, h = img.size
    # Mask semantics for Image.composite(black, img, mask):
    # 0 = keep the original pixel, 255 = fully black. Start the mask at the
    # edge darkness, then paint progressively smaller, lighter ellipses so
    # the value falls to 0 at the center. (Painting growing ellipses with
    # rising alpha would let the last, largest ellipse overwrite everything
    # and black out the whole frame.)
    mask = Image.new("L", (w, h), int(255 * strength))
    draw = ImageDraw.Draw(mask)

    steps = min(w, h) // 2
    for i in range(steps):
        progress = i / steps
        alpha = int(255 * strength * (1 - progress))
        margin_x = int(progress * (w // 2))
        margin_y = int(progress * (h // 2))
        draw.ellipse(
            [margin_x, margin_y, w - margin_x, h - margin_y],
            fill=alpha
        )

    mask = mask.filter(ImageFilter.GaussianBlur(radius=40))
    black = Image.new("RGB", (w, h), (0, 0, 0))
    return Image.composite(black, img, mask)

703
+
704

# ─────────────────────────────────────────────────────────
# TEXT OVERLAY
# ─────────────────────────────────────────────────────────

FONT_PATHS = [
    "/usr/share/fonts/truetype/liberation/LiberationSans-Bold.ttf",
    "/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf",
    "/usr/share/fonts/TTF/DejaVuSans-Bold.ttf",
    "/usr/share/fonts/truetype/freefont/FreeSansBold.ttf",
    "/usr/share/fonts/truetype/ubuntu/Ubuntu-Bold.ttf",
]

# Styles where text should be omitted
NO_TEXT_STYLES = {"style_7_warped", "style_10_candid"}


def load_font(size: int):
    for fp in FONT_PATHS:
        if os.path.exists(fp):
            try:
                return ImageFont.truetype(fp, size=size)
            except Exception:
                continue
    return ImageFont.load_default()

def add_text_overlay(img: Image.Image, text: str, style: str) -> Image.Image:
    """Add styled text overlay appropriate for each thumbnail style."""

    if not text or style in NO_TEXT_STYLES:
        return img

    # Truncate to 4 words max (per best-practice from THUMBNAILS.md)
    words = text.split()
    if len(words) > 4:
        text = " ".join(words[:4])

    w, h = img.size
    img = img.convert("RGBA")

    if style in ("style_6_cinematic", "style_11_anti"):
        return _text_centered_large(img, text, w, h)

    elif style in ("style_1_minimalism", "style_4_whiteboard"):
        return _text_clean_corner(img, text, w, h, style)

    else:
        return _text_banner_strip(img, text, w, h)


def _text_centered_large(img, text, w, h):
    """Large centered text for cinematic/anti-thumbnail styles."""
    font = load_font(96)
    draw = ImageDraw.Draw(img)

    bbox = draw.textbbox((0, 0), text.upper(), font=font)
    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
    x, y = (w - tw) // 2, (h - th) // 2

    # Glow / shadow effect
    for offset in [(6, 6), (-6, 6), (6, -6), (-6, -6)]:
        draw.text((x + offset[0], y + offset[1]), text.upper(),
                  font=font, fill=(0, 0, 0, 180))
    draw.text((x, y), text.upper(), font=font, fill=(255, 255, 255, 255))

    return img.convert("RGB")


def _text_clean_corner(img, text, w, h, style):
    """Clean minimal text for minimalism and whiteboard styles."""
    font = load_font(72)
    draw = ImageDraw.Draw(img)

    text_color = (30, 30, 30, 255) if style == "style_4_whiteboard" else (60, 60, 60, 255)
    bbox = draw.textbbox((0, 0), text, font=font)
    x, y = 60, h - (bbox[3] - bbox[1]) - 60

    # Subtle shadow
    draw.text((x + 2, y + 2), text, font=font, fill=(200, 200, 200, 120))
    draw.text((x, y), text, font=font, fill=text_color)

    return img.convert("RGB")


def _text_banner_strip(img, text, w, h):
    """Semi-transparent banner strip with high-contrast text."""
    font = load_font(82)
    draw_measure = ImageDraw.Draw(img)
    bbox = draw_measure.textbbox((0, 0), text.upper(), font=font)
    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]

    padding_x, padding_y = 30, 18
    strip_h = th + padding_y * 2
    strip_y = h - strip_h - 40

    # Semi-transparent background strip
    overlay = Image.new("RGBA", (w, h), (0, 0, 0, 0))
    overlay_draw = ImageDraw.Draw(overlay)
    overlay_draw.rectangle(
        [0, strip_y, w, strip_y + strip_h],
        fill=(0, 0, 0, 175)
    )
    img = Image.alpha_composite(img, overlay)

    # Text: shadow then main
    draw = ImageDraw.Draw(img)
    x = (w - tw) // 2
    y = strip_y + padding_y

    draw.text((x + 3, y + 3), text.upper(), font=font, fill=(0, 0, 0, 200))
    draw.text((x, y), text.upper(), font=font, fill=(255, 220, 50, 255))

    return img.convert("RGB")


# ─────────────────────────────────────────────────────────
# AUTO TEXT EXTRACTION
# ─────────────────────────────────────────────────────────

def auto_text(video_title: str, style: str) -> str:
    """Extract best text overlay from video title for given style."""
    if style in NO_TEXT_STYLES:
        return ""
    words = video_title.split()
    # For anti-thumbnail: keep number if present, else use 3 words
    if style == "style_11_anti":
        for word in words:
            if any(c.isdigit() for c in word):
                return word + (" Seconds" if "sec" not in video_title.lower() else "")
        return " ".join(words[:3])
    # General: first 4 impactful words
    stopwords = {"the", "a", "an", "how", "to", "i", "my", "is", "are", "was"}
    filtered = [w for w in words if w.lower() not in stopwords]
    result = filtered[:4] if filtered else words[:4]
    return " ".join(result)


# ─────────────────────────────────────────────────────────
# DEPENDENCY INSTALLER
# ─────────────────────────────────────────────────────────

def ensure_deps(backend: str):
    """Install required packages for the selected backend."""
    deps = ["Pillow", "requests"]

    if backend == "gemini_api":
        deps.append("google-genai")
    elif backend == "fal":
        deps.append("fal-client")
    elif backend == "openai":
        deps.append("openai")

    for dep in deps:
        try:
            if dep == "Pillow":
                import PIL
            elif dep == "requests":
                import requests
            elif dep == "google-genai":
                import google.genai
            elif dep == "fal-client":
                import fal_client
            elif dep == "openai":
                import openai
        except ImportError:
            print(f"Installing {dep}...")
            subprocess.run(
                [sys.executable, "-m", "pip", "install", dep,
                 "--break-system-packages", "-q"],
                check=True
            )


# ─────────────────────────────────────────────────────────
# MAIN ORCHESTRATOR
# ─────────────────────────────────────────────────────────

def main():
    print("\n🎨 Thumbnail Generator")
    print(f" Title : {VIDEO_TITLE}")
    print(f" Style : {STYLE}")
    print(f" Output: {OUTPUT_PATH}\n")

    # 1. Detect backend
    backend = detect_backend()

    # 2. Install deps if needed
    ensure_deps(backend)

    # 3. Generate base image
    print("→ Generating base image...")
    generators = {
        "a1111": gen_a1111,
        "comfyui": gen_comfyui,
        "mcp_imagen": gen_mcp_imagen,
        "gemini_api": gen_gemini_api,
        "fal": gen_fal,
        "openai": gen_openai,
    }

    if backend == "pillow_only":
        base_img = gen_pillow_only(STYLE, VIDEO_TITLE)
    else:
        try:
            base_img = generators[backend](AI_PROMPT)
        except Exception as e:
            print(f"⚠ {backend} failed: {e}")
            print(" Falling back to Pillow-only...")
            base_img = gen_pillow_only(STYLE, VIDEO_TITLE)

    # 4. Normalize to exact 1280×720
    base_img = base_img.convert("RGB").resize((1280, 720), Image.LANCZOS)
    print(f"✓ Base image ready: {base_img.size}")

    # 5. Apply style effects
    print("→ Applying style effects...")
    base_img = apply_style_effects(base_img, STYLE)

    # 6. Determine text overlay
    text = TEXT_OVERLAY if TEXT_OVERLAY else auto_text(VIDEO_TITLE, STYLE)
    print(f"→ Text overlay: '{text}'" if text else "→ No text overlay (style preference)")

    # 7. Add text
    base_img = add_text_overlay(base_img, text, STYLE)

    # 8. Save
    os.makedirs(os.path.dirname(OUTPUT_PATH), exist_ok=True)
    base_img.save(OUTPUT_PATH, "PNG", optimize=True)

    size_kb = os.path.getsize(OUTPUT_PATH) // 1024
    print(f"\n✅ Saved: {OUTPUT_PATH} ({size_kb} KB, 1280×720)")


if __name__ == "__main__":
    main()
```

---

## Agent Execution Protocol

When the user asks for a thumbnail, the agent follows these steps:

### Step 1 — Parse
Extract from message:
- `VIDEO_TITLE` — the video title or topic description
- `VIDEO_NICHE` — category (tutorial, review, travel, etc.)
- `STYLE` — if explicitly mentioned; otherwise auto-select
- `TEXT_OVERLAY` — specific text if mentioned (max 4 words); else leave empty

### Step 2 — Select Style
```python
style = select_style(VIDEO_NICHE, style_override=None)
```

### Step 3 — Build AI Prompt
Use the style template from "The 11 Styles" section above.
Append universal quality suffix:
```
", professional YouTube thumbnail, vibrant high contrast, cinematic quality,
sharp focus, award-winning composition"
```

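In code, Step 3 amounts to filling the style template and appending the suffix. A minimal sketch — the `build_prompt` helper and `QUALITY_SUFFIX` name are illustrative, not part of the script above; `STYLE_PROMPTS` is the reference card further down (one entry shown):

```python
# Illustrative Step 3 sketch: fill the style template, append the
# universal quality suffix. build_prompt is a hypothetical helper.
STYLE_PROMPTS = {
    "style_6_cinematic": (
        "cinematic movie still about {subject}, dramatic chiaroscuro lighting, "
        "film grain, anamorphic lens flare, shallow depth of field"
    ),
}

QUALITY_SUFFIX = (
    ", professional YouTube thumbnail, vibrant high contrast, cinematic quality, "
    "sharp focus, award-winning composition"
)

def build_prompt(style: str, **fields) -> str:
    """Fill the chosen style template and append the quality suffix."""
    return STYLE_PROMPTS[style].format(**fields) + QUALITY_SUFFIX

prompt = build_prompt("style_6_cinematic", subject="a solo hiker at dawn")
print(prompt)
```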
### Step 4 — Fill Script Template
Replace all FILL_ placeholders in the Python script above with actual values.

### Step 5 — Execute
```bash
pip install Pillow requests --break-system-packages -q
python3 /home/claude/generate_thumbnail.py
```

### Step 6 — Verify & Present
```python
import os

assert os.path.exists("/mnt/user-data/outputs/thumbnail.png")
assert os.path.getsize("/mnt/user-data/outputs/thumbnail.png") > 50_000
```
Then call the `present_files` tool with the output path.
Briefly tell the user which style was chosen and why (one sentence).

---

## Style Prompt Templates (Reference Card)
```python
STYLE_PROMPTS = {
    "style_1_minimalism": (
        "{subject}, minimalist product photography, pure white background, "
        "single centered subject, dramatic soft studio lighting, ultra clean"
    ),
    "style_2_surround": (
        "{subject} dead center, multiple {objects} arranged in perfect organized "
        "circle around center, controlled chaos, vibrant, top-down angle"
    ),
    "style_3_rainbow": (
        "flat lay of {items} in ranking order, color gradient red to blue, "
        "product photography, clean background, vivid colors"
    ),
    "style_4_whiteboard": (
        "hand-drawn diagram on real whiteboard explaining {concept}, chalk markers, "
        "rough sketchy authentic educational style, classroom feel"
    ),
    "style_5_ui_framing": (
        "realistic screenshot mockup of {platform} UI about {topic}, "
        "exact platform styling, authentic spacing and fonts, credible interface"
    ),
    "style_6_cinematic": (
        "cinematic movie still about {subject}, dramatic chiaroscuro lighting, "
        "film grain, anamorphic lens flare, shallow depth of field"
    ),
    "style_7_warped": (
        "double exposure portrait, digital glitch effect, {emotion} face merged "
        "with {concept}, surreal distortion, moody dark tones"
    ),
    "style_8_maximalist": (
        "aerial flat lay of complete collection of all {items}, perfectly organized, "
        "every single item visible, product catalog photography"
    ),
    "style_9_encyclopedia": (
        "flat icon illustration grid of {topic} elements, consistent icon shapes, "
        "high contrast on white background, educational infographic style"
    ),
    "style_10_candid": (
        "candid authentic moment of {scene}, natural spontaneous composition, "
        "perfectly framed, golden hour lighting, documentary photography"
    ),
    "style_11_anti": (
        "dark moody portrait of {subject}, direct serious eye contact to camera, "
        "dramatic low-key lighting, minimal background, cinematic"
    ),
}
```

---

## Output Specification
- **Path:** `/mnt/user-data/outputs/thumbnail.png`
- **Resolution:** 1280 × 720 px (YouTube 16:9 standard)
- **Format:** PNG
- **Min file size:** ~50 KB (abort and retry if smaller)

---

## Error Handling Rules
1. Backend API fails → automatically fall back to the next backend in priority order
2. All AI backends fail → use `gen_pillow_only()`, still produce output
3. Font not found → use `ImageFont.load_default()`, never crash
4. Image < 50 KB after save → regenerate with the next backend
5. `TEXT_OVERLAY` blank → run `auto_text()` to extract text from the title
6. mcp-imagen-go PATH issue → check the env in `~/.gemini/settings.json` and retry
   with the full binary path from `which mcp-imagen-go`
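Rules 1–2 can be sketched as a single priority loop. This is illustrative only — the `run_with_fallback` helper and the stand-in generators below are not part of the script above, which inlines this logic in `main()`:

```python
# Illustrative sketch of rules 1-2: try backends in priority order,
# report-and-continue on failure, Pillow-only renderer as last resort.
def run_with_fallback(generators, order, prompt, pillow_fallback):
    for name in order:
        gen = generators.get(name)
        if gen is None:
            continue  # backend not configured, skip it
        try:
            return name, gen(prompt)
        except Exception as exc:
            print(f"⚠ {name} failed: {exc} — trying next backend...")
    return "pillow_only", pillow_fallback()

# Stand-in generators for demonstration:
def broken_a1111(prompt):
    raise ConnectionError("A1111 not reachable")  # simulates rule 1

demo_generators = {"a1111": broken_a1111, "fal": lambda p: f"image<{p}>"}
backend, image = run_with_fallback(
    demo_generators, ["a1111", "comfyui", "fal"], "demo prompt",
    pillow_fallback=lambda: "pillow-image",
)
print(backend)  # fal
```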

---

## Quick Reference: Which Backend for What

| Situation | Recommended Backend |
|-----------|---------------------|
| Local GPU available (A1111 running) | `a1111` — always fastest + free |
| Have GCP project + Gemini CLI set up | `mcp_imagen` — Imagen 4, best quality |
| Have Gemini API key (from Gemini CLI) | `gemini_api` — free quota, good quality |
| Cloud-only, budget matters | `fal` — $0.03/image, FLUX quality |
| Need best text rendering in image | `openai` — gpt-image-1 best for text |
| No API keys / testing | `pillow_only` — always works |
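
The table's priority order reduces to a first-available picker. A hedged sketch — `pick_backend` and the availability dict are illustrative, not the package's actual `detect_backend()` implementation (which would probe A1111/ComfyUI over HTTP and check API keys):

```python
# Illustrative sketch of the table's priority order. The `available`
# dict stands in for real probes; pick_backend is a hypothetical helper.
PRIORITY = ["a1111", "comfyui", "mcp_imagen", "gemini_api", "fal", "openai"]

def pick_backend(available: dict) -> str:
    """Return the first available backend, else the always-works fallback."""
    for name in PRIORITY:
        if available.get(name):
            return name
    return "pillow_only"

print(pick_backend({"fal": True, "openai": True}))  # fal
print(pick_backend({}))                             # pillow_only
```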