@jiggai/recipes 0.4.34 → 0.4.35

@@ -89,10 +89,75 @@ flowchart LR
89
89
  HTickets --> LibTickets
90
90
  ```
91
91
 
92
- ## Key decisions
92
+ ## Media Drivers Subsystem
93
+
94
+ ClawRecipes implements AI media generation through a driver-based architecture:
95
+
96
+ ### Architecture
97
+ ```
98
+ src/lib/workflows/media-drivers/
99
+ ├── types.ts # MediaDriver interface and types
100
+ ├── registry.ts # Driver registration and discovery
101
+ ├── utils.ts # Skill discovery and script execution
102
+ ├── *.driver.ts # Provider-specific drivers
103
+ └── generic.driver.ts # Auto-discovery for unlisted skills
104
+ ```
105
+
106
+ ### Components
107
+ - **MediaDriver Interface**: Standardized interface for all providers (slug, mediaType, requiredEnvVars, invoke)
108
+ - **Registry System**: Maps provider names to drivers, checks env var availability
109
+ - **Skill Discovery**: Searches `~/.openclaw/skills/`, `~/.openclaw/workspace/skills/`, and `~/.openclaw/workspace/` for installed skills
110
+ - **Generic Driver**: Auto-creates drivers for any installed skill not explicitly registered
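The lookup order described by the Registry System and Generic Driver bullets can be sketched roughly as follows. This is a minimal illustration, not the code in `registry.ts`: the names `ResolvedDriver` and `resolveDriver`, and the idea of passing in a list of installed skill slugs, are assumptions made for the example.

```ts
// Hypothetical sketch: these names are illustrative, not the package's API.
interface ResolvedDriver {
  slug: string;
  source: 'registered' | 'generic';
}

// Prefer an explicitly registered driver; fall back to a generic one
// only when a skill with that slug is installed but not registered.
function resolveDriver(
  slug: string,
  registered: ResolvedDriver[],
  installedSkillSlugs: string[],
): ResolvedDriver | undefined {
  for (const driver of registered) {
    if (driver.slug === slug) return driver;
  }
  if (installedSkillSlugs.indexOf(slug) !== -1) {
    return { slug, source: 'generic' };
  }
  return undefined; // Neither registered nor installed.
}
```

Checking the registered list first keeps the common path free of filesystem work, which matches the performance rationale given under Key Decisions.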
111
+
112
+ ### Integration Points
113
+ - **Workflow Worker**: Executes `media-image`, `media-video`, `media-audio` nodes via drivers
114
+ - **CLI Command**: `workflows media-drivers` lists available providers (used by ClawKitchen UI)
115
+ - **Environment Loading**: Merges `process.env` with `env.vars` from `~/.openclaw/openclaw.json`
116
+
117
+ ### Supported Providers
118
+ - **Image**: nano-banana-pro (Gemini), openai-image-gen (DALL-E)
119
+ - **Video**: klingai (Kling AI), runway-video (Runway), luma-video (Luma AI)
120
+ - **Audio**: Extensible via generic driver pattern
121
+
122
+ ## OutputFields and Schema Validation
123
+
124
+ LLM nodes support structured output generation with runtime validation:
125
+
126
+ ### Configuration
127
+ ```json
128
+ {
129
+   "config": {
130
+     "outputFields": [
131
+       {"name": "title", "type": "text"},
132
+       {"name": "tags", "type": "list"},
133
+       {"name": "metadata", "type": "json"}
134
+     ]
135
+   }
136
+ }
137
+ ```
138
+
139
+ ### Runtime Behavior
140
+ 1. **Schema Generation**: `outputFields` are converted to a JSON Schema with required fields
141
+ 2. **LLM Invocation**: Schema passed to `llm-task` tool for structured generation
142
+ 3. **Validation**: Response validated against schema before saving
143
+ 4. **Template Variables**: Structured fields become available as `{{nodeId.fieldName}}`
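Step 4 can be illustrated with a small substitution helper. This is a sketch under stated assumptions: `renderTemplate` and `NodeOutputs` are hypothetical names, and the choices to leave unknown placeholders untouched and to JSON-encode non-string values are guesses rather than documented behavior.

```ts
// Hypothetical sketch: not the worker's actual template engine.
type NodeOutputs = Record<string, Record<string, unknown>>;

// Replace {{nodeId.fieldName}} placeholders with values from earlier nodes.
function renderTemplate(template: string, outputs: NodeOutputs): string {
  return template.replace(
    /\{\{\s*([\w-]+)\.([\w-]+)\s*\}\}/g,
    (match: string, nodeId: string, fieldName: string): string => {
      const nodeOutput = outputs[nodeId];
      // Assumption: unresolved placeholders are left as-is.
      if (!nodeOutput || !(fieldName in nodeOutput)) return match;
      const value = nodeOutput[fieldName];
      // Assumption: lists and objects are rendered as JSON text.
      return typeof value === 'string' ? value : JSON.stringify(value);
    },
  );
}
```

For example, with a node `draft` that produced `title` and `tags`, a template such as `{{draft.title}}` would resolve to the generated title string.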
144
+
145
+ ### Field Types
146
+ - **text**: String values
147
+ - **list**: Arrays of strings
148
+ - **json**: Nested JSON objects
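The mapping from field types to a schema might look like the sketch below. It is an assumption-laden illustration: `outputFieldsToSchema` and the interfaces are hypothetical names, and treating every field as required with `additionalProperties: false` is one plausible reading of "strict", not something the source spells out.

```ts
// Hypothetical sketch: the real conversion in workflow-worker.ts may differ.
type OutputFieldType = 'text' | 'list' | 'json';

interface OutputField {
  name: string;
  type: OutputFieldType;
}

interface ObjectSchema {
  type: 'object';
  properties: Record<string, Record<string, unknown>>;
  required: string[];
  additionalProperties: boolean;
}

// Map one field type to a JSON Schema fragment, following the Field Types list.
function fieldTypeToSchema(type: OutputFieldType): Record<string, unknown> {
  switch (type) {
    case 'text':
      return { type: 'string' };
    case 'list':
      return { type: 'array', items: { type: 'string' } };
    case 'json':
      return { type: 'object' };
  }
}

function outputFieldsToSchema(fields: OutputField[]): ObjectSchema {
  const properties: Record<string, Record<string, unknown>> = {};
  for (const field of fields) {
    properties[field.name] = fieldTypeToSchema(field.type);
  }
  return {
    type: 'object',
    properties,
    // Assumption: every declared field is required.
    required: fields.map((field) => field.name),
    additionalProperties: false,
  };
}
```

Applied to the configuration example above, this would yield a schema whose `required` array lists `title`, `tags`, and `metadata`.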
149
+
150
+ ### Implementation
151
+ - **Code Location**: `src/lib/workflows/workflow-worker.ts` in LLM node execution
152
+ - **Variable Extraction**: JSON fields automatically exposed as template variables
153
+ - **Error Handling**: Schema validation failures logged with detailed error messages
154
+
155
+ ## Key Decisions
93
156
 
94
157
  - **Tool policy preservation**: When a recipe omits `tools`, the scaffold preserves the existing agent's tool policy (rather than resetting it). See scaffold logic and tests.
95
158
  - **`__internal` export**: Unit tests import handlers and lib helpers via `__internal`; these are not part of the public plugin API.
159
+ - **Media driver registry**: Prefers explicitly registered drivers over auto-discovery for better performance and clearer error messages.
160
+ - **Schema validation**: OutputFields generate strict JSON schemas to ensure predictable LLM outputs for downstream template usage.
96
161
 
97
162
  ## Quality automation
98
163
 
package/docs/COMMANDS.md CHANGED
@@ -268,6 +268,18 @@ openclaw recipes workflows worker-tick \
268
268
  --limit 10
269
269
  ```
270
270
 
271
+ ### Media driver commands
272
+
273
+ List registered media generation drivers (and whether required API keys are present):
274
+
275
+ ```bash
276
+ openclaw recipes workflows media-drivers
277
+ ```
278
+
279
+ This is what ClawKitchen uses to populate the media provider dropdown.
280
+
281
+ More: [MEDIA_DRIVERS.md](MEDIA_DRIVERS.md)
282
+
271
283
  ### Approval commands
272
284
 
273
285
  ```bash
@@ -0,0 +1,175 @@
1
+ # Media Drivers (ClawRecipes)
2
+
3
+ ClawRecipes implements media generation via a **driver architecture**. Drivers are used by the workflow worker when executing `media-image`, `media-video`, and `media-audio` nodes.
4
+
5
+ This doc is for developers adding new providers.
6
+
7
+ ## Key Concepts
8
+
9
+ - A **skill** is a folder on disk (usually installed from ClawHub) that contains scripts and docs.
10
+ - A **driver** is a small TypeScript adapter that knows how to invoke a given skill reliably.
11
+ - Drivers provide:
12
+   - Stable display name and slug
13
+   - Required env-var list (availability checks)
14
+   - Invocation details (stdin vs CLI args, scripts/ subdir, venv detection)
15
+
16
+ ## CLI: List drivers
17
+
18
+ ClawKitchen uses this command to populate its provider dropdown:
19
+
20
+ ```bash
21
+ openclaw recipes workflows media-drivers
22
+ ```
23
+
24
+ It returns JSON like:
25
+
26
+ ```json
27
+ [
28
+   {
29
+     "slug": "nano-banana-pro",
30
+     "displayName": "Nano Banana Pro (Gemini Image Generation)",
31
+     "mediaType": "image",
32
+     "requiredEnvVars": ["GEMINI_API_KEY"],
33
+     "available": true,
34
+     "missingEnvVars": []
35
+   }
36
+ ]
37
+ ```
38
+
39
+ Availability is computed from **merged env**:
40
+ - `process.env` (ClawRecipes process)
41
+ - `~/.openclaw/openclaw.json` → `env.vars`
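The availability check can be sketched as a pure function over the merged environment. The names `mergeEnv` and `checkAvailability` are hypothetical, and two details are assumptions: that process-level values take precedence over config-file values, and that an empty string counts as missing.

```ts
// Hypothetical sketch: illustrative names, not the registry's actual code.
type EnvVars = Record<string, string | undefined>;

interface DriverRequirement {
  slug: string;
  requiredEnvVars: string[];
}

interface DriverAvailability {
  slug: string;
  available: boolean;
  missingEnvVars: string[];
}

// Assumption: defined process-level values override config-file values.
function mergeEnv(configVars: EnvVars, processEnv: EnvVars): EnvVars {
  const merged: EnvVars = { ...configVars };
  for (const key of Object.keys(processEnv)) {
    const value = processEnv[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

// Assumption: an unset or empty variable counts as missing.
function checkAvailability(driver: DriverRequirement, env: EnvVars): DriverAvailability {
  const missingEnvVars = driver.requiredEnvVars.filter((name) => !env[name]);
  return { slug: driver.slug, available: missingEnvVars.length === 0, missingEnvVars };
}
```

The returned object mirrors the `available` and `missingEnvVars` fields in the JSON output shown above.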
42
+
43
+ ## Where the code lives
44
+
45
+ ```
46
+ src/lib/workflows/media-drivers/
47
+   types.ts
48
+   registry.ts
49
+   utils.ts
50
+   nano-banana-pro.driver.ts
51
+   openai-image-gen.driver.ts
52
+   runway-video.driver.ts
53
+   kling-video.driver.ts
54
+   luma-video.driver.ts
55
+   generic.driver.ts
56
+ ```
57
+
58
+ ## MediaDriver interface
59
+
60
+ ```ts
61
+ export interface MediaDriver {
62
+   slug: string; // Skill folder name
63
+   mediaType: 'image' | 'video' | 'audio';
64
+   displayName: string; // UI dropdown label
65
+   requiredEnvVars: string[]; // Availability check
66
+
67
+   invoke(opts: MediaDriverInvokeOpts): Promise<MediaDriverResult>;
68
+ }
69
+ ```
70
+
71
+ The `invoke()` method should:
72
+ - Write output into `opts.outputDir`
73
+ - Return `{ filePath }` pointing at the generated file
74
+ - Throw on failure with a useful error message (stderr/stdout included when possible)
75
+
76
+ ## Adding a new driver
77
+
78
+ ### 1) Add or install the underlying skill
79
+
80
+ A skill should live in one of:
81
+ - `~/.openclaw/skills/<slug>`
82
+ - `~/.openclaw/workspace/skills/<slug>`
83
+ - `~/.openclaw/workspace/<slug>` (ClawHub sometimes installs here)
84
+
85
+ The worker and driver utils search these roots via `findSkillDir(slug)`.
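The search order over these roots can be sketched without touching the filesystem by injecting an existence check. This is not the implementation of `findSkillDir`: `candidateSkillDirs`, `firstExistingDir`, and the `exists` callback are hypothetical, and the ordering of the roots is assumed to follow the list above.

```ts
// Hypothetical sketch: helper names and root ordering are assumptions.
function candidateSkillDirs(homeDir: string, slug: string): string[] {
  const base = `${homeDir}/.openclaw`;
  return [
    `${base}/skills/${slug}`,
    `${base}/workspace/skills/${slug}`,
    `${base}/workspace/${slug}`,
  ];
}

// Return the first candidate for which the injected check reports a directory.
function firstExistingDir(
  candidates: string[],
  exists: (dir: string) => boolean,
): string | undefined {
  for (const dir of candidates) {
    if (exists(dir)) return dir;
  }
  return undefined;
}
```

In real code the `exists` callback would wrap a filesystem call; keeping it injectable makes the lookup order easy to unit test.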
86
+
87
+ ### 2) Create the driver file
88
+
89
+ Create:
90
+
91
+ ```
92
+ src/lib/workflows/media-drivers/my-provider.driver.ts
93
+ ```
94
+
95
+ Example pattern (stdin → `MEDIA:` output):
96
+
97
+ ```ts
98
+ import * as path from 'path';
99
+ import { MediaDriver, MediaDriverInvokeOpts, MediaDriverResult } from './types';
100
+ import { findSkillDir, findVenvPython, runScript, parseMediaOutput } from './utils';
101
+
102
+ export class MyProvider implements MediaDriver {
103
+   slug = 'my-provider';
104
+   mediaType = 'image' as const;
105
+   displayName = 'My Provider';
106
+   requiredEnvVars = ['MY_PROVIDER_API_KEY'];
107
+
108
+   async invoke(opts: MediaDriverInvokeOpts): Promise<MediaDriverResult> {
109
+     const { prompt, outputDir, env, timeout } = opts;
110
+
111
+     const skillDir = await findSkillDir(this.slug);
112
+     if (!skillDir) throw new Error(`Skill dir not found for ${this.slug}`);
113
+
114
+     const scriptPath = path.join(skillDir, 'generate_image.py');
115
+     const runner = await findVenvPython(skillDir);
116
+
117
+     const stdout = runScript({
118
+       runner,
119
+       script: scriptPath,
120
+       stdin: prompt,
121
+       env: { ...env, HOME: process.env.HOME || '/home/control' },
122
+       cwd: outputDir,
123
+       timeout,
124
+     });
125
+
126
+     const filePath = parseMediaOutput(stdout);
127
+     if (!filePath) throw new Error(`No MEDIA: path in output: ${stdout}`);
128
+
129
+     return { filePath };
130
+   }
131
+ }
132
+ ```
133
+
134
+ Example pattern (CLI args → script prints direct path):
135
+ - See `nano-banana-pro.driver.ts`.
136
+
137
+ ### 3) Register it
138
+
139
+ Add the driver to `registry.ts` in `knownDrivers`:
140
+
141
+ ```ts
142
+ import { MyProvider } from './my-provider.driver';
143
+
144
+ const knownDrivers: MediaDriver[] = [
145
+ // ...
146
+ new MyProvider(),
147
+ ];
148
+ ```
149
+
150
+ ### 4) Done
151
+
152
+ - The workflow worker can now invoke it by setting `provider` to `skill-my-provider`.
153
+ - The CLI `workflows media-drivers` will list it.
154
+ - ClawKitchen will show it automatically (Kitchen pulls the list from the CLI).
155
+
156
+ ## Script contract (skills)
157
+
158
+ Drivers can invoke scripts in different ways, but the recommended contract is:
159
+
160
+ - Prompt via stdin
161
+ - Print `MEDIA:/absolute/or/relative/path` to stdout
162
+ - Write the file into `MEDIA_OUTPUT_DIR` if provided
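On the driver side, reading the `MEDIA:` line from a script's stdout could look like the sketch below. It is not the source of `parseMediaOutput`: `extractMediaPath` is a hypothetical name, and taking the first matching line and trimming surrounding whitespace are assumptions.

```ts
// Hypothetical sketch: illustrates the stdout contract only.
function extractMediaPath(stdout: string): string | undefined {
  // Match a line that starts with "MEDIA:" and capture the rest of that line.
  const match = /^\s*MEDIA:\s*(\S.*?)\s*$/m.exec(stdout);
  return match ? match[1] : undefined;
}
```

A script that logs progress lines and then prints `MEDIA:/tmp/out.png` would therefore yield `/tmp/out.png`, while output without a `MEDIA:` line yields `undefined` so the driver can throw a descriptive error.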
163
+
164
+ Notes:
165
+ - Some ClawHub skills place scripts under `scripts/`. Use `findScriptInSkill()` or direct paths accordingly.
166
+ - Venv support: worker/driver utils will prefer `.venv/bin/python` when present.
167
+
168
+ ## Troubleshooting
169
+
170
+ - If the dropdown says a driver is unavailable, run:
171
+   ```bash
172
+   openclaw recipes workflows media-drivers
173
+   ```
174
+   and check `missingEnvVars`.
175
+ - If a skill isn't found, verify the folder name matches `slug` and is inside a scanned root.