@jiggai/recipes 0.4.34 → 0.4.35

@@ -0,0 +1,553 @@
1
+ # Media Generation
2
+
3
+ ClawRecipes supports AI-powered image, video, and audio generation through a driver-based architecture. This runtime-focused guide covers setup, configuration, and troubleshooting.
4
+
5
+ ## Architecture Overview
6
+
7
+ ### Driver Registry System
8
+
9
+ ClawRecipes uses a **driver registry** that maps providers to skill scripts:
10
+
11
+ ```
12
+ Provider Name → MediaDriver → Skill Script → Generated Media
13
+ ```
14
+
15
+ **Key Components:**
16
+ - **MediaDriver** — TypeScript adapter with provider-specific logic
17
+ - **Skill** — Folder containing scripts, dependencies, and documentation
18
+ - **Registry** — Maps provider names to drivers for runtime lookup
19
+ - **Worker** — Executes media nodes using appropriate drivers
20
+
21
+ ### Execution Flow
22
+
23
+ 1. **Workflow node** specifies provider (e.g., `"provider": "nano-banana-pro"`)
24
+ 2. **Worker** looks up driver in registry by slug
25
+ 3. **Driver** locates skill directory and script
26
+ 4. **Script** executes with prompt and config
27
+ 5. **Output file** saved to workflow run media directory
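
The lookup in step 2 can be sketched as below. This is a minimal illustration, not the actual registry code: the `MediaDriver` fields mirror the `media-drivers` CLI output shown later in this guide, while `registerDriver` and `lookupDriver` are hypothetical names.

```typescript
// Hypothetical shapes: field names follow the `media-drivers` CLI output,
// but the real MediaDriver interface may differ.
type MediaType = "image" | "video" | "audio";

interface MediaDriver {
  slug: string;
  mediaType: MediaType;
  requiredEnvVars: string[];
}

// Registry keyed by provider slug.
const drivers: Record<string, MediaDriver> = {};

function registerDriver(driver: MediaDriver): void {
  drivers[driver.slug] = driver;
}

// Step 2 of the flow: resolve a node's provider slug to a driver.
function lookupDriver(provider: string): MediaDriver {
  const found = Object.prototype.hasOwnProperty.call(drivers, provider)
    ? drivers[provider]
    : undefined;
  if (!found) {
    throw new Error(`No media driver found for provider "${provider}"`);
  }
  return found;
}

registerDriver({ slug: "nano-banana-pro", mediaType: "image", requiredEnvVars: ["GEMINI_API_KEY"] });
registerDriver({ slug: "klingai", mediaType: "video", requiredEnvVars: [] });
```

An unknown slug fails fast with the same message documented under Troubleshooting, which keeps a misconfigured node from reaching the script stage.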
28
+
29
+ ## Setup Instructions
30
+
31
+ ### Image Generation
32
+
33
+ #### Nano Banana Pro (Gemini)
34
+
35
+ **Requirements:**
36
+ - GEMINI_API_KEY environment variable
37
+ - ClawHub skill: `nano-banana-pro`
38
+
39
+ **Setup:**
40
+ ```bash
41
+ # Install skill
42
+ clawhub install nano-banana-pro
43
+
44
+ # Set API key in OpenClaw config
45
+ openclaw gateway config update env.vars.GEMINI_API_KEY "your-gemini-api-key"
46
+ ```
47
+
48
+ **Configuration:**
49
+ ```json
50
+ {
51
+ "kind": "media-image",
52
+ "config": {
53
+ "provider": "nano-banana-pro",
54
+ "size": "1792x1792",
55
+ "promptTemplate": "{{content_draft.title}}: professional product photo"
56
+ }
57
+ }
58
+ ```
59
+
60
+ **Supported sizes:**
61
+ - `1024x1024` → 1K resolution
62
+ - `1792x1792` → 2K resolution
63
+ - `3840x3840` → 4K resolution
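
The size-to-resolution mapping above could be expressed as a simple lookup table. The table values are copied from the list; the `resolutionTier` helper is hypothetical, and how the real driver treats unlisted sizes is not documented here, so this sketch just returns `undefined` for them.

```typescript
type Tier = "1K" | "2K" | "4K";

// Values copied from the supported-sizes list above.
const SIZE_TIERS: Record<string, Tier> = {
  "1024x1024": "1K",
  "1792x1792": "2K",
  "3840x3840": "4K",
};

// Hypothetical helper: returns the tier for a listed size, otherwise undefined.
function resolutionTier(size: string): Tier | undefined {
  return Object.prototype.hasOwnProperty.call(SIZE_TIERS, size)
    ? SIZE_TIERS[size]
    : undefined;
}
```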
64
+
65
+ #### DALL-E (OpenAI)
66
+
67
+ **Requirements:**
68
+ - OPENAI_API_KEY environment variable
69
+ - ClawHub skill: `openai-dalle`
70
+
71
+ **Setup:**
72
+ ```bash
73
+ clawhub install openai-dalle
74
+ openclaw gateway config update env.vars.OPENAI_API_KEY "sk-your-openai-key"
75
+ ```
76
+
77
+ **Configuration:**
78
+ ```json
79
+ {
80
+ "config": {
81
+ "provider": "openai-dalle",
82
+ "size": "1024x1024",
83
+ "quality": "hd",
84
+ "style": "natural"
85
+ }
86
+ }
87
+ ```
88
+
89
+ **Supported options:**
90
+ - **size:** `1024x1024`, `1024x1792`, `1792x1024`
91
+ - **quality:** `standard`, `hd`
92
+ - **style:** `natural`, `vivid`
93
+
94
+ ### Video Generation
95
+
96
+ #### Kling AI
97
+
98
+ **Requirements:**
99
+ - Kling AI credentials file (NOT environment variables)
100
+ - ClawHub skill: `klingai`
101
+
102
+ **Setup:**
103
+ ```bash
104
+ # Install skill
105
+ clawhub install klingai --force
106
+
107
+ # Configure credentials
108
+ mkdir -p ~/.config/kling
109
+ cat > ~/.config/kling/.credentials << 'EOF'
110
+ {
111
+ "access_key_id": "your-access-key",
112
+ "secret_access_key": "your-secret-key"
113
+ }
114
+ EOF
115
+ ```
116
+
117
+ **Configuration:**
118
+ ```json
119
+ {
120
+ "kind": "media-video",
121
+ "config": {
122
+ "provider": "klingai",
123
+ "duration": "10",
124
+ "aspect_ratio": "16:9",
125
+ "promptTemplate": "{{content_brief.style}}: {{content_brief.video_description}}"
126
+ }
127
+ }
128
+ ```
129
+
130
+ **Constraints:**
131
+ - **Duration:** 3-15 seconds
132
+ - **Aspect ratios:** `16:9`, `9:16`, `1:1`
133
+ - **Mode:** pro (fixed)
134
+
135
+ #### Runway
136
+
137
+ **Requirements:**
138
+ - RUNWAYML_API_SECRET environment variable
139
+ - ClawHub skill: `runway-video`
140
+
141
+ **Setup:**
142
+ ```bash
143
+ clawhub install runway-video
144
+ openclaw gateway config update env.vars.RUNWAYML_API_SECRET "your-runway-secret"
145
+ ```
146
+
147
+ **Configuration:**
148
+ ```json
149
+ {
150
+ "config": {
151
+ "provider": "runway-video",
152
+ "duration": "8",
153
+ "size": "1280x768",
154
+ "addRefinement": "true"
155
+ }
156
+ }
157
+ ```
158
+
159
+ #### Luma AI
160
+
161
+ **Requirements:**
162
+ - LUMAAI_API_KEY environment variable
163
+ - ClawHub skill: `luma-video`
164
+
165
+ **Setup:**
166
+ ```bash
167
+ clawhub install luma-video
168
+ openclaw gateway config update env.vars.LUMAAI_API_KEY "luma-your-api-key"
169
+ ```
170
+
171
+ **Configuration:**
172
+ ```json
173
+ {
174
+ "config": {
175
+ "provider": "luma-video",
176
+ "duration": "5",
177
+ "aspect_ratio": "16:9"
178
+ }
179
+ }
180
+ ```
181
+
182
+ ## Configuration Fields
183
+
184
+ ### Core Fields
185
+
186
+ **promptTemplate** — Template string with variable substitution
187
+ ```json
188
+ "promptTemplate": "Create {{media_type}} for: {{content_draft.title}}\nStyle: {{brand_guide.style}}"
189
+ ```
190
+
191
+ **provider** — Driver slug (matches skill folder name)
192
+ ```json
193
+ "provider": "nano-banana-pro"
194
+ ```
195
+
196
+ **outputPath** — Custom output file path (optional)
197
+ ```json
198
+ "outputPath": "media/{{run.id}}/hero-{{content_draft.slug}}.png"
199
+ ```
200
+
201
+ ### Size Configuration
202
+
203
+ **For Images:**
204
+ ```json
205
+ "size": "1024x1024"
206
+ "size": "1792x1792"
207
+ "size": "3840x3840"
208
+ ```
209
+
210
+ **For Videos:**
211
+ ```json
212
+ "aspect_ratio": "16:9"
213
+ "aspect_ratio": "9:16"
214
+ "aspect_ratio": "1:1"
215
+ "size": "1280x768"
216
+ ```
217
+
218
+ ### Duration Configuration (Videos)
219
+
220
+ ```json
221
+ "duration": "5" // seconds as string
222
+ "duration": "10" // clamped to provider limits
223
+ ```
224
+
225
+ **Provider Constraints:**
226
+ - **Kling AI:** 3-15 seconds
227
+ - **Runway:** 1-10 seconds
228
+ - **Luma AI:** 2-10 seconds
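
Clamping a string-valued `duration` to these limits might look like the sketch below. The limits are copied from the list above and keyed by the provider slugs used in this guide; `clampDuration` is a hypothetical helper, and the real drivers may round or reject values differently.

```typescript
interface DurationRange {
  min: number;
  max: number;
}

// Limits in seconds, copied from the provider constraints above.
const DURATION_LIMITS: Record<string, DurationRange> = {
  "klingai": { min: 3, max: 15 },
  "runway-video": { min: 1, max: 10 },
  "luma-video": { min: 2, max: 10 },
};

// Hypothetical helper: parses the string `duration` field and clamps it
// into the provider's supported range.
function clampDuration(provider: string, duration: string): number {
  const known = Object.prototype.hasOwnProperty.call(DURATION_LIMITS, provider);
  const seconds = Number(duration);
  if (!known || !isFinite(seconds)) {
    throw new Error(`Cannot clamp duration "${duration}" for provider "${provider}"`);
  }
  const limits = DURATION_LIMITS[provider];
  return Math.min(limits.max, Math.max(limits.min, seconds));
}
```

For example, `clampDuration("klingai", "20")` yields `15`, and `clampDuration("runway-video", "0.5")` yields `1`.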
229
+
230
+ ### Prompt Refinement
231
+
232
+ **addRefinement** — Enable LLM prompt enhancement (opt-in)
233
+ ```json
234
+ "addRefinement": "true"
235
+ ```
236
+
237
+ **When enabled:**
238
+ 1. Input prompt processed by LLM for enhancement
239
+ 2. Enhanced prompt sent to media provider
240
+ 3. Results in more detailed, production-ready prompts
241
+
242
+ **Default:** `false` (upstream LLM nodes should produce ready prompts)
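
Because the flag is written as a string in node configs, a reader of the config has to treat `"true"` and a missing value differently. A minimal sketch, assuming only the literal `"true"` (or a real boolean `true`) enables refinement; `isRefinementEnabled` is a hypothetical helper, not part of the documented API.

```typescript
// Hypothetical helper: the field name comes from the config above,
// but how the worker actually parses it is an assumption.
function isRefinementEnabled(config: { addRefinement?: string | boolean }): boolean {
  const value = config.addRefinement;
  return value === true || value === "true";
}
```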
243
+
244
+ ## Environment Variables
245
+
246
+ ### Loading Hierarchy
247
+
248
+ ClawRecipes loads environment variables from the following sources, in priority order:
249
+
250
+ 1. **Process environment** — `process.env` (highest priority)
251
+ 2. **OpenClaw config** — `~/.openclaw/openclaw.json` → `env.vars`
252
+
253
+ ### OpenClaw Config Format
254
+
255
+ **Modern format (recommended):**
256
+ ```json
257
+ {
258
+ "env": {
259
+ "vars": {
260
+ "GEMINI_API_KEY": "your-key",
261
+ "OPENAI_API_KEY": "your-key",
262
+ "RUNWAYML_API_SECRET": "your-secret"
263
+ }
264
+ }
265
+ }
266
+ ```
267
+
268
+ **Legacy format (still supported):**
269
+ ```json
270
+ {
271
+ "env": {
272
+ "GEMINI_API_KEY": "your-key"
273
+ }
274
+ }
275
+ ```
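
The loading hierarchy and the two config formats can be combined into one lookup, sketched below. The function names and the config type are hypothetical, and two behaviors are assumptions rather than documented facts: that `env.vars` wins over a legacy key of the same name, and that any defined process value (even an empty string) takes priority.

```typescript
type EnvMap = Record<string, string | undefined>;

// Hypothetical config shape covering both the modern and legacy formats.
interface OpenClawConfigLike {
  env?: Record<string, unknown>;
}

// Collects string values from the legacy `env.*` keys, then overlays `env.vars`.
function configVars(config: OpenClawConfigLike): Record<string, string> {
  const env = config.env || {};
  const out: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (typeof value === "string") out[key] = value; // legacy format
  }
  const vars = env["vars"];
  if (typeof vars === "object" && vars !== null) {
    const record = vars as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const value = record[key];
      if (typeof value === "string") out[key] = value; // modern format (assumed to win)
    }
  }
  return out;
}

// Process environment first, then the OpenClaw config.
function resolveEnv(processEnv: EnvMap, config: OpenClawConfigLike, name: string): string | undefined {
  const fromProcess = processEnv[name];
  if (fromProcess !== undefined) return fromProcess;
  return configVars(config)[name];
}
```

In a Node.js process, `process.env` would be passed as the first argument; it is a parameter here so the sketch stays easy to test.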
276
+
277
+ ### Setting Environment Variables
278
+
279
+ ```bash
280
+ # Via OpenClaw CLI
281
+ openclaw gateway config update env.vars.GEMINI_API_KEY "your-key"
282
+
283
+ # Via direct edit
284
+ $EDITOR ~/.openclaw/openclaw.json
285
+
286
+ # Via shell environment
287
+ export GEMINI_API_KEY="your-key"
288
+ openclaw gateway restart
289
+ ```
290
+
291
+ ## Skill Installation
292
+
293
+ ### ClawHub Installation
294
+
295
+ ```bash
296
+ # Global installation (recommended)
297
+ clawhub install nano-banana-pro
298
+ clawhub install klingai --force
299
+
300
+ # Agent-specific installation
301
+ openclaw recipes install-skill nano-banana-pro --agent-id marketing-lead
302
+
303
+ # Team-specific installation
304
+ openclaw recipes install-skill nano-banana-pro --team-id marketing-team
305
+ ```
306
+
307
+ ### Installation Roots
308
+
309
+ Skills are discovered from these directories:
310
+ - `~/.openclaw/skills/` — Global shared skills
311
+ - `~/.openclaw/workspace/skills/` — Workspace-local skills
312
+ - `~/.openclaw/workspace/` — ClawHub sometimes installs here
313
+
314
+ ### Verification
315
+
316
+ ```bash
317
+ # List available drivers
318
+ openclaw recipes workflows media-drivers
319
+
320
+ # Expected output:
321
+ [
322
+ {
323
+ "slug": "nano-banana-pro",
324
+ "displayName": "Nano Banana Pro (Gemini Image Generation)",
325
+ "mediaType": "image",
326
+ "requiredEnvVars": ["GEMINI_API_KEY"],
327
+ "available": true,
328
+ "missingEnvVars": []
329
+ }
330
+ ]
331
+ ```
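
The `available` and `missingEnvVars` fields in that output can be derived from `requiredEnvVars` and the resolved environment. A minimal sketch, assuming an empty string counts as missing; `driverStatus` is a hypothetical name and the real check may differ.

```typescript
interface DriverStatus {
  slug: string;
  requiredEnvVars: string[];
  available: boolean;
  missingEnvVars: string[];
}

// Hypothetical helper: a driver is available when none of its required
// variables are missing (unset or empty) in the given environment.
function driverStatus(
  slug: string,
  requiredEnvVars: string[],
  env: Record<string, string | undefined>
): DriverStatus {
  const missingEnvVars = requiredEnvVars.filter((name) => !env[name]);
  return {
    slug,
    requiredEnvVars,
    available: missingEnvVars.length === 0,
    missingEnvVars,
  };
}
```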
332
+
333
+ ## Template Variables
334
+
335
+ Media nodes support full template variable substitution in:
336
+ - **promptTemplate** — AI generation prompt
337
+ - **outputPath** — Custom file paths
338
+
339
+ ### Available Variables
340
+
341
+ **Global variables:**
342
+ ```
343
+ {{date}} — Current timestamp
344
+ {{run.id}} — Workflow run ID
345
+ {{workflow.name}} — Workflow display name
346
+ {{node.id}} — Current node ID
347
+ ```
348
+
349
+ **Upstream node outputs:**
350
+ ```
351
+ {{content_draft.text}} — Text from LLM node
352
+ {{brand_guide.style}} — Extracted JSON field
353
+ {{research_brief.video_concept}} — Nested field extraction
354
+ ```
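
Variable substitution of this kind can be sketched as a dotted-path lookup over a context object. This is an illustration only: `renderTemplate` and `lookupPath` are hypothetical names, and leaving unresolved placeholders untouched is an assumption about missing-variable behavior, not documented ClawRecipes semantics.

```typescript
type TemplateContext = { [key: string]: unknown };

// Walks a dotted path such as "content_draft.title" through nested objects.
function lookupPath(context: TemplateContext, path: string): unknown {
  let current: unknown = context;
  for (const segment of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as TemplateContext)[segment];
  }
  return current;
}

// Replaces each {{path}} placeholder with the matching context value.
function renderTemplate(template: string, context: TemplateContext): string {
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (placeholder: string, path: string) => {
    const value = lookupPath(context, path);
    // Assumption: unresolved variables are left as-is so they are easy to spot.
    return value === undefined || value === null ? placeholder : String(value);
  });
}
```

The same function covers both `promptTemplate` and `outputPath`, since each is a plain string with `{{...}}` placeholders.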
355
+
356
+ ### Example Templates
357
+
358
+ **Product marketing image:**
359
+ ```json
360
+ {
361
+ "promptTemplate": "Professional product photo: {{product_brief.name}}\n\nStyle: {{brand_guide.visual_style}}\nMood: {{campaign_goals.target_emotion}}\nComposition: {{art_direction.composition_notes}}"
362
+ }
363
+ ```
364
+
365
+ **Social video:**
366
+ ```json
367
+ {
368
+ "promptTemplate": "Short social media video: {{content_calendar.post_topic}}\n\nHook: {{copywriting.video_hook}}\nVisual style: {{brand_assets.video_style}}\nDuration: energetic {{duration}}s clip"
369
+ }
370
+ ```
371
+
372
+ ## Troubleshooting
373
+
374
+ ### Driver Not Found
375
+
376
+ **Error:**
377
+ ```
378
+ No media driver found for provider "nano-banana-pro"
379
+ ```
380
+
381
+ **Diagnosis:**
382
+ ```bash
383
+ # Check if skill is installed
384
+ ls ~/.openclaw/skills/nano-banana-pro
385
+ ls ~/.openclaw/workspace/skills/nano-banana-pro
386
+
387
+ # Verify driver registry
388
+ openclaw recipes workflows media-drivers | grep nano-banana-pro
389
+ ```
390
+
391
+ **Solutions:**
392
+ 1. Install missing skill: `clawhub install nano-banana-pro`
393
+ 2. Check skill directory permissions
394
+ 3. Restart gateway if skill was just installed
395
+
396
+ ### Missing Environment Variables
397
+
398
+ **Error:**
399
+ ```json
400
+ {
401
+ "slug": "nano-banana-pro",
402
+ "available": false,
403
+ "missingEnvVars": ["GEMINI_API_KEY"]
404
+ }
405
+ ```
406
+
407
+ **Diagnosis:**
408
+ ```bash
409
+ # Check current environment
410
+ echo "$GEMINI_API_KEY"
411
+
412
+ # Check OpenClaw config
413
+ jq '.env.vars' ~/.openclaw/openclaw.json
414
+ ```
415
+
416
+ **Solutions:**
417
+ 1. Set via config: `openclaw gateway config update env.vars.GEMINI_API_KEY "your-key"`
418
+ 2. Export in shell before starting gateway
419
+ 3. Restart gateway after config changes
420
+
421
+ ### Script Execution Failures
422
+
423
+ **Error:**
424
+ ```
425
+ Script execution failed: nano-banana-pro generate_image.py
426
+ --- stderr ---
427
+ ModuleNotFoundError: No module named 'google.generativeai'
428
+ ```
429
+
430
+ **Diagnosis:**
431
+ ```bash
432
+ # Check skill venv
433
+ ls ~/.openclaw/skills/nano-banana-pro/.venv/
434
+
435
+ # Test script manually
436
+ cd ~/.openclaw/skills/nano-banana-pro
437
+ ./.venv/bin/python scripts/generate_image.py --help
438
+ ```
439
+
440
+ **Solutions:**
441
+ 1. Reinstall skill: `clawhub install nano-banana-pro --force`
442
+ 2. Manually set up the venv: `cd ~/.openclaw/skills/nano-banana-pro && python3 -m venv .venv && .venv/bin/pip install -r requirements.txt`
443
+ 3. Check skill documentation for dependencies
444
+
445
+ ### Prompt Too Long
446
+
447
+ **Error:**
448
+ ```
449
+ 400 Bad Request: prompt exceeds maximum length (4000 characters)
450
+ ```
451
+
452
+ **Solutions:**
453
+ 1. **Enable refinement:** `"addRefinement": "true"` — let LLM condense the prompt
454
+ 2. **Shorten templates:** Remove verbose instructions from prompt template
455
+ 3. **Upstream editing:** Have prior LLM nodes produce concise briefs
456
+
457
+ ### Output Path Issues
458
+
459
+ **Error:**
460
+ ```
461
+ fs.write path must be within the team workspace
462
+ ```
463
+
464
+ **Solutions:**
465
+ 1. Use relative paths: `"outputPath": "media/{{node.id}}.png"`
466
+ 2. Don't include `../` in paths
467
+ 3. Paths are resolved relative to workflow run directory
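
A pre-flight check along these lines can catch bad paths before a run starts. It is a conservative sketch under stated assumptions: `isSafeOutputPath` is a hypothetical helper, and the worker's actual validation (which produces the error above) may accept or reject different inputs.

```typescript
// Hypothetical pre-flight check: accepts only relative paths
// that contain no parent-directory ("..") segments.
function isSafeOutputPath(outputPath: string): boolean {
  if (outputPath.length === 0) return false;
  // Reject absolute POSIX paths and Windows drive paths.
  if (outputPath.charAt(0) === "/" || /^[A-Za-z]:[\\/]/.test(outputPath)) return false;
  // Reject any ".." segment, using either separator.
  const segments = outputPath.split(/[\\/]/);
  return segments.indexOf("..") === -1;
}
```

Run it on the rendered path, after template substitution, since a variable value could itself contain `../`.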
468
+
469
+ ### Permission Errors
470
+
471
+ **Error:**
472
+ ```
473
+ EACCES: permission denied, open '/home/control/.openclaw/skills/nano-banana-pro'
474
+ ```
475
+
476
+ **Solutions:**
477
+ 1. Fix ownership (run as the user that owns the gateway): `sudo chown -R "$(id -un):$(id -gn)" ~/.openclaw/`
478
+ 2. Check directory permissions: `chmod 755 ~/.openclaw/skills/`
479
+ 3. Reinstall skill if corrupted
480
+
481
+ ### Timeout Issues
482
+
483
+ **Error:**
484
+ ```
485
+ Script execution timeout (300000ms)
486
+ ```
487
+
488
+ **Solutions:**
489
+ 1. **Increase timeout:** Add `"timeoutMs": 600000` to node config
490
+ 2. **Check API status:** Verify provider service availability
491
+ 3. **Reduce complexity:** Simplify prompts for faster generation
492
+
493
+ ## Advanced Configuration
494
+
495
+ ### Custom Output Directories
496
+
497
+ ```json
498
+ {
499
+ "config": {
500
+ "outputPath": "assets/{{workflow.name}}/{{run.id}}/hero.png"
501
+ }
502
+ }
503
+ ```
504
+
505
+ Creates: `workspace-team/assets/Content Pipeline/2025-04-04T03-53-00-123Z/hero.png`
506
+
507
+ ### Model Selection (Provider-Specific)
508
+
509
+ Some drivers support model selection:
510
+ ```json
511
+ {
512
+ "config": {
513
+ "provider": "openai-dalle",
514
+ "model": "dall-e-3",
515
+ "quality": "hd"
516
+ }
517
+ }
518
+ ```
519
+
520
+ ### Provider Fallbacks
521
+
522
+ Use LLM nodes to implement fallback logic:
523
+ ```json
524
+ {
525
+ "action": {
526
+ "promptTemplate": "Generate an image using primary provider: {{image_config.primary_provider}}\n\nIf unavailable, fallback to: {{image_config.fallback_provider}}"
527
+ }
528
+ }
529
+ ```
530
+
531
+ ## Related Documentation
532
+
533
+ - [MEDIA_DRIVERS.md](MEDIA_DRIVERS.md) — Developer guide for adding new providers
534
+ - [TEMPLATE_VARIABLES.md](TEMPLATE_VARIABLES.md) — Template variable syntax and examples
535
+ - [WORKFLOW_NODES.md](WORKFLOW_NODES.md) — Complete node configuration reference
536
+ - [INSTALLATION.md](INSTALLATION.md) — Installing ClawRecipes and dependencies
537
+
538
+ ## Implementation Details
539
+
540
+ **Core Code Locations:**
541
+ - `src/lib/workflows/media-drivers/` — Driver implementations
542
+ - `src/lib/workflows/workflow-worker.ts` — Media node execution
543
+ - `src/handlers/media-drivers.ts` — CLI media-drivers command
544
+
545
+ **Driver Interface:**
546
+ - `MediaDriver` — TypeScript interface for all providers
547
+ - `MediaDriverInvokeOpts` — Standardized invocation parameters
548
+ - `MediaDriverResult` — Standardized return format
549
+
550
+ **Registry System:**
551
+ - Known drivers registered in `registry.ts`
552
+ - Generic driver auto-discovery for unlisted skills
553
+ - Runtime environment variable availability checking