wyren-mcp 1.0.0

@@ -0,0 +1,330 @@
1
+ # Workflow Patterns & Best Practices
2
+
3
+ ## Golden rules
4
+
5
+ 1. **`build_graph` is the ONLY graph mutation tool.** It supports `addNodes`, `addEdges`, `dataUpdates` (partial merge), `removeNodeIds`, and `removeEdgeIds` — all atomic in one call. Auto-positions new nodes in a clean DAG layout. The old single-node tools (`add_node`, `remove_node`, `update_node_data`, `connect_nodes`, `disconnect_nodes`, `list_edges`) have been removed.
6
+ 2. **ALWAYS enrich prompts** — never connect `textInput` directly to `imageAI` or `videoAI`. Route through `textAI` with a prompt template first. This is the single biggest quality improvement.
7
+ 3. **Image first, then video** — video generation is slow (1–4 min) and expensive. Generate an image first, iterate until it looks right, then use it as a start frame for video. Don't skip straight to video.
8
+ 4. **One Text AI per purpose** — when a workflow needs both a video prompt AND a narration script, use TWO separate `textAI` nodes. Each feeds ONLY its downstream node. Never connect narration to video or vice versa.
9
+ 5. **Reuse before building** — always `list_workflows` first, even when the user says "build me X". If a workflow with a similar structure exists, `duplicate_workflow` and modify it — faster than building from zero. After duplicating, always update the input nodes with new content before running — the old inputs are still baked in.
10
+ 6. **voiceAI never connects to videoAI** — video generation doesn't accept audio. To combine voice with video, both feed into `videoCaptions`.
11
+
12
+ ## Connection logic
13
+
14
+ Each node type has a specific role. Connect them based on what data flows where:
15
+
16
+ | Source node | Output | Connects to | Target handle | Why |
17
+ | ----------------------- | ------------------------ | ----------------- | ------------- | --------------------------------------------------------------------------------------------------------------------- |
18
+ | `textInput` | `text` | `textAI` | `text` | Raw input → prompt enrichment |
19
+ | `textInput` | `text` | `websiteResearch` | (none) | Not a valid edge: `websiteResearch` has no input sockets, so set its URL in the node's `data` instead |
20
+ | `textAI` (video prompt) | `text` | `videoAI` | `text` | Enriched prompt → video generation |
21
+ | `textAI` (image prompt) | `text` | `imageAI` | `text` | Enriched prompt → image generation |
22
+ | `textAI` (narration) | `text` | `voiceAI` | `text` | Script → voice synthesis |
23
+ | `imageAI` | `image` | `videoAI` | `startFrame` | Start frame → video (much better results). **Not all models/modes accept this** — check with `get_model_capabilities` |
24
+ | `videoAI` | `video` | `videoCaptions` | `video` | Video → add captions |
25
+ | `voiceAI` | `audio` | `videoCaptions` | `audio` | Voiceover → burn into captioned video |
26
+ | `websiteResearch` | `brandDocument` | `textAI` | `text` | Brand context → prompt enrichment |
27
+ | `websiteResearch` | `colorPalette` | `textAI` | `text` | Color info → style-aware prompts |
28
+ | `websiteResearch` | `screenshots` | `imageAI` | `image` | Website visual → reference image |
29
+ | `storyAI` | `scene_1`–`scene_5` | `imageAI` | `text` | Scene prompt → image per scene |
30
+ | `imageAI` #1 | `image` | `imageAI` #2 | `image` | Reference chain for character consistency |
31
+ | `videoAI` | `firstFrame`/`lastFrame` | `videoAI` (next) | `startFrame` | Scene continuity across clips |
32
+
33
+ **Anti-patterns** (never do these):
34
+
35
+ - `textInput` → `videoAI` (no prompt enrichment — bad results)
36
+ - `textAI` (narration) → `videoAI` (narration text is not a video prompt)
37
+ - `voiceAI` → `videoAI` (audio doesn't connect to video generation)
38
+ - `textInput` → `voiceAI` (raw text sounds unnatural — enrich first)
39
+ - `imageInput` → `videoAI` (startFrame) without asking — the raw photo becomes the literal first frame. Ask the user if they want the exact image as frame 1, or a styled scene. Default to routing through `imageAI` for product/marketing use cases.
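The anti-patterns above can be checked mechanically before calling `build_graph`. The following is a minimal sketch, not part of the MCP server: `findAntiPatterns`, `NodeSpec`, and `EdgeSpec` are hypothetical names, the edge shape is copied from the examples in this document, and narration nodes are recognised only by `promptTemplate: "narration"`, which is an assumption.

```typescript
// Assumed shapes, copied from the build_graph examples in this document.
type NodeSpec = { tempId: string; type: string; data?: { [key: string]: unknown } };
type EdgeSpec = { sourceNode: string; sourceHandle: string; targetNode: string; targetHandle: string };

// Hypothetical helper: returns one finding per edge that matches an anti-pattern.
function findAntiPatterns(nodes: NodeSpec[], edges: EdgeSpec[]): string[] {
  const byId: { [id: string]: NodeSpec } = {};
  for (const node of nodes) byId[node.tempId] = node;

  const findings: string[] = [];
  for (const edge of edges) {
    const src = byId[edge.sourceNode];
    const dst = byId[edge.targetNode];
    if (!src || !dst) continue; // IDs outside this payload are not checked here

    const label = src.type + " -> " + dst.type;
    // Assumption: a narration textAI is identified by its prompt template.
    const isNarration =
      src.type === "textAI" && !!src.data && src.data["promptTemplate"] === "narration";

    if (src.type === "textInput" && ["imageAI", "videoAI", "voiceAI"].indexOf(dst.type) !== -1) {
      findings.push(label + ": raw text, route it through textAI first");
    } else if (isNarration && dst.type === "videoAI") {
      findings.push(label + ": narration text is not a video prompt");
    } else if (src.type === "voiceAI" && dst.type === "videoAI") {
      findings.push(label + ": send audio to videoCaptions instead");
    } else if (src.type === "imageInput" && dst.type === "videoAI" && edge.targetHandle === "startFrame") {
      findings.push(label + ": raw photo as first frame, confirm with the user");
    }
  }
  return findings;
}
```

An empty result only means none of the listed anti-patterns were found. It does not replace `validate_workflow`.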
40
+
41
+ ## Pattern: Simple text-to-video
42
+
43
+ ```
44
+ textInput → textAI (video prompt) → imageAI → videoAI
45
+ ```
46
+
47
+ The `imageAI` generates a start frame. This gives the video model a clear visual anchor and produces much better results than text-only input.
48
+
49
+ ```
50
+ build_graph({
51
+ workflowId: "...",
52
+ addNodes: [
53
+ { tempId: "t1", type: "textInput", data: { text: "A cat playing piano" } },
54
+ { tempId: "t2", type: "textAI", data: { promptTemplate: "video" } },
55
+ { tempId: "t3", type: "imageAI", data: { promptTemplate: "social-visual" } },
56
+ { tempId: "t4", type: "videoAI", data: { promptTemplate: "short-form" } }
57
+ ],
58
+ addEdges: [
59
+ { sourceNode: "t1", sourceHandle: "text", targetNode: "t2", targetHandle: "text" },
60
+ { sourceNode: "t2", sourceHandle: "text", targetNode: "t3", targetHandle: "text" },
61
+ { sourceNode: "t2", sourceHandle: "text", targetNode: "t4", targetHandle: "text" },
62
+ { sourceNode: "t3", sourceHandle: "image", targetNode: "t4", targetHandle: "startFrame" }
63
+ ]
64
+ })
65
+ ```
66
+
67
+ Note: `textAI` feeds BOTH `imageAI` (for the start frame) AND `videoAI` (for the video prompt). The image also feeds into videoAI as the start frame.
68
+
69
+ ## Pattern: Video with voiceover + captions
70
+
71
+ Two separate `textAI` nodes — one for visuals, one for narration. They never cross-connect.
72
+
73
+ ```
74
+ textInput ──→ textAI #1 (video prompt) ──→ imageAI ──→ videoAI ──→ videoCaptions
75
+     │                                                                    ↑
76
+     └────→ textAI #2 (narration) ──→ voiceAI ────────────────────────────┘
77
+ ```
78
+
79
+ ```
80
+ build_graph({
81
+ workflowId: "...",
82
+ addNodes: [
83
+ { tempId: "input", type: "textInput", data: { text: "..." } },
84
+ { tempId: "vidPrompt", type: "textAI", data: { promptTemplate: "video" } },
85
+ { tempId: "narration", type: "textAI", data: { promptTemplate: "narration" } },
86
+ { tempId: "img", type: "imageAI", data: { promptTemplate: "social-visual" } },
87
+ { tempId: "video", type: "videoAI", data: { aspectRatio: "9:16", promptTemplate: "short-form" } },
88
+ { tempId: "voice", type: "voiceAI", data: { presetVoiceId: "..." } },
89
+ { tempId: "captions", type: "videoCaptions" }
90
+ ],
91
+ addEdges: [
92
+ { sourceNode: "input", sourceHandle: "text", targetNode: "vidPrompt", targetHandle: "text" },
93
+ { sourceNode: "input", sourceHandle: "text", targetNode: "narration", targetHandle: "text" },
94
+ { sourceNode: "vidPrompt", sourceHandle: "text", targetNode: "img", targetHandle: "text" },
95
+ { sourceNode: "vidPrompt", sourceHandle: "text", targetNode: "video", targetHandle: "text" },
96
+ { sourceNode: "img", sourceHandle: "image", targetNode: "video", targetHandle: "startFrame" },
97
+ { sourceNode: "narration", sourceHandle: "text", targetNode: "voice", targetHandle: "text" },
98
+ { sourceNode: "video", sourceHandle: "video", targetNode: "captions", targetHandle: "video" },
99
+ { sourceNode: "voice", sourceHandle: "audio", targetNode: "captions", targetHandle: "audio" }
100
+ ]
101
+ })
102
+ ```
103
+
104
+ ## Pattern: Product photo → marketing video
105
+
106
+ When the user provides a product image and wants a marketing video, route the photo through `imageAI` as a reference to generate a styled scene — never use raw product photos directly as videoAI startFrame (the video would literally start with the raw photo).
107
+
108
+ ```
109
+ imageInput (product photo) ──image──→ imageAI (generate styled scene) ──startFrame──→ videoAI ──→ videoCaptions
110
+                                          ↑                                               ↑              ↑
111
+ textInput ──→ textAI #1 (video prompt) ──┴───────────────────────────────────────────────┘              │
112
+     │                                                                                                   │
113
+     └────→ textAI #2 (narration) ──→ voiceAI ───────────────────────────────────────────────────────────┘
115
+ ```
116
+
117
+ ```
118
+ build_graph({
119
+ workflowId: "...",
120
+ addNodes: [
121
+ { tempId: "input", type: "textInput", data: { text: "..." } },
122
+ { tempId: "photo", type: "imageInput", data: { url: "https://..." } },
123
+ { tempId: "vidPrompt", type: "textAI", data: { promptTemplate: "video" } },
124
+ { tempId: "narration", type: "textAI", data: { promptTemplate: "narration" } },
125
+ { tempId: "img", type: "imageAI", data: { aspectRatio: "9:16", promptTemplate: "social-visual" } },
126
+ { tempId: "video", type: "videoAI", data: { aspectRatio: "9:16", promptTemplate: "marketing-ad" } },
127
+ { tempId: "voice", type: "voiceAI", data: { presetVoiceId: "..." } },
128
+ { tempId: "captions", type: "videoCaptions" }
129
+ ],
130
+ addEdges: [
131
+ { sourceNode: "input", sourceHandle: "text", targetNode: "vidPrompt", targetHandle: "text" },
132
+ { sourceNode: "input", sourceHandle: "text", targetNode: "narration", targetHandle: "text" },
133
+ { sourceNode: "vidPrompt", sourceHandle: "text", targetNode: "img", targetHandle: "text" },
134
+ { sourceNode: "vidPrompt", sourceHandle: "text", targetNode: "video", targetHandle: "text" },
135
+ { sourceNode: "photo", sourceHandle: "image", targetNode: "img", targetHandle: "image" },
136
+ { sourceNode: "img", sourceHandle: "image", targetNode: "video", targetHandle: "startFrame" },
137
+ { sourceNode: "narration", sourceHandle: "text", targetNode: "voice", targetHandle: "text" },
138
+ { sourceNode: "video", sourceHandle: "video", targetNode: "captions", targetHandle: "video" },
139
+ { sourceNode: "voice", sourceHandle: "audio", targetNode: "captions", targetHandle: "audio" }
140
+ ]
141
+ })
142
+ ```
143
+
144
+ ## Pattern: Brand analysis → marketing video
145
+
146
+ Use `websiteResearch` to extract brand context, then feed it into prompt generation.
147
+
148
+ ```
149
+ websiteResearch ──brandDocument──→ textAI #1 (video prompt) ──→ imageAI ──→ videoAI ──→ videoCaptions
150
+         │                                                                                     ↑
151
+         └──brandDocument──→ textAI #2 (narration) ──→ voiceAI ────────────────────────────────┘
152
+ ```
153
+
154
+ `websiteResearch` has NO input sockets — the URL is configured via `build_graph` with a `dataUpdates` entry (e.g. `dataUpdates: [{ nodeId: "...", data: { url: "https://..." } }]`). It outputs `brandDocument` (text analysis), `colorPalette` (colors), and `screenshots` (images).
155
+
156
+ **Provider selection**: `websiteResearch` has a `provider` field (default `firecrawl`). Set `provider: "standard"` when the user wants to avoid spending Firecrawl credits — it uses free fetch+cheerio, works on server-rendered marketing sites, but skips screenshots. If the workflow downstream uses the `screenshots` output, stay on `firecrawl`. If Firecrawl credits are exhausted, the node returns an actionable error telling the user to switch providers.
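The provider rule above reduces to a two-step decision. A minimal sketch, assuming the edge shape used in this document's `build_graph` examples; `pickResearchProvider` and `usesScreenshots` are hypothetical helper names, not tools exposed by the server.

```typescript
type Provider = "firecrawl" | "standard";
type EdgeSpec = { sourceNode: string; sourceHandle: string; targetNode: string; targetHandle: string };

// True when any edge consumes the research node's `screenshots` output.
function usesScreenshots(researchNodeId: string, edges: EdgeSpec[]): boolean {
  return edges.some(e => e.sourceNode === researchNodeId && e.sourceHandle === "screenshots");
}

// Screenshots need firecrawl; otherwise honour the user's wish to save credits.
function pickResearchProvider(
  researchNodeId: string,
  edges: EdgeSpec[],
  avoidFirecrawlCredits: boolean
): Provider {
  if (usesScreenshots(researchNodeId, edges)) return "firecrawl";
  return avoidFirecrawlCredits ? "standard" : "firecrawl";
}
```

If the user wants to avoid Firecrawl credits but the graph still consumes `screenshots`, this sketch keeps `firecrawl`. In that situation it is probably better to raise the conflict with the user than to resolve it silently.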
157
+
158
+ ### Retrofitting brand context into an essential
159
+
160
+ Essentials (`use_essential`) ship with a fixed graph that does NOT include `websiteResearch`. When the user has given a URL, you must inject it after copying the essential:
161
+
162
+ 1. `use_essential({ essentialId })` → returns `workflow_id`.
163
+ 2. `get_workflow({ workflowId })` → identify the `textAI` (and `imageAI` if present) node IDs and their text/image input handle IDs.
164
+ 3. `build_graph` with one atomic call adding the research node AND edges:
165
+ ```
166
+ build_graph({
167
+ workflowId,
168
+ addNodes: [
169
+ { tempId: "research", type: "websiteResearch", data: { url: "<user URL>", provider: "firecrawl" } }
170
+ ],
171
+ addEdges: [
172
+ { sourceNode: "research", sourceHandle: "brandDocument", targetNode: "<textAI id>", targetHandle: "<text input handle>" },
173
+ // If the essential has imageAI and you want visual grounding:
174
+ { sourceNode: "research", sourceHandle: "screenshots", targetNode: "<imageAI id>", targetHandle: "<image input handle>" }
175
+ ]
176
+ })
177
+ ```
178
+ 4. Update the `textAI` prompt via `dataUpdates` to reference the brand document explicitly (e.g. "Using the brand document from the connected input, write a 15-word voiceover for …"). Do NOT hardcode facts the brand document will supply.
179
+ 5. `validate_workflow`, then run.
180
+
181
+ Do NOT skip this step even if the essential "already works" — without `websiteResearch`, the output will be generic and the user will reject it. This is the single most common cause of "the video has no mention of my company" complaints.
182
+
183
+ ## Pattern: TikTok 3-scene branded ad (default for any branded short-form request)
184
+
185
+ This is the **default** topology when the user asks for a branded ad / TikTok / Reel / short-form marketing video AND has provided a URL. Deviate only on explicit user rejection or when `get_pricing({ chain: [...] })` shows the rich default physically cannot fit the budget.
186
+
187
+ ```
188
+ websiteResearch ─brandDocument─┐
189
+ imageInput (user logo/photo) ──┤
190
+                                ├─→ textAI (3-scene concept)
191
+                                │       │
192
+                                │       ├─→ textAI #vid1 → imageAI #1 → videoAI #1 ──┐
193
+                                │       ├─→ textAI #vid2 → imageAI #2 → videoAI #2 ──┤
194
+                                │       └─→ textAI #vid3 → imageAI #3 → videoAI #3 ──┤
195
+                                │                                                    ↓
196
+                                └─→ textAI #narration → voiceAI                 videoMerge
197
+                                                           │                     (videos)
198
+                                                           ↓                         │
199
+                                                     videoCaptions ←─────────────────┘
200
+ ```
201
+
202
+ **Key wiring rules:**
203
+
204
+ - `imageInput` feeds each `imageAI` as a reference image (brand anchor) — never directly to `videoAI` as a raw start frame.
205
+ - The "3-scene concept" `textAI` node receives both `websiteResearch.brandDocument` and the user's original `textInput` so the concept is specific to the brand.
206
+ - Each scene's video-prompt `textAI` is seeded with ONE scene of the concept — use three separate `textAI` nodes (per the "One Text AI per purpose" golden rule), not one multi-output template.
207
+ - **`videoMerge` has ONE input handle named `videos` that accepts up to 10 connections.** Every `videoAI.video` → `videoMerge.videos` edge targets the same `videos` handle. **Never** write `video1` / `video2` / `video3` as target handles — that's the single most common graph-building mistake.
208
+ - `voiceAI` and `videoCaptions` are included **by default**. Strip only on explicit user rejection.
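The `videoMerge` rule is easiest to follow when the edges are generated, not typed by hand. A minimal sketch, assuming the edge shape from this document's examples; `mergeEdges` is a hypothetical helper name.

```typescript
type EdgeSpec = { sourceNode: string; sourceHandle: string; targetNode: string; targetHandle: string };

// The document states that the single `videos` handle accepts up to 10 connections.
const MAX_MERGE_INPUTS = 10;

// Every videoAI.video output targets the SAME `videos` handle on videoMerge.
function mergeEdges(videoNodeIds: string[], mergeNodeId: string): EdgeSpec[] {
  if (videoNodeIds.length > MAX_MERGE_INPUTS) {
    throw new Error("videoMerge accepts at most " + MAX_MERGE_INPUTS + " videos");
  }
  return videoNodeIds.map(id => ({
    sourceNode: id,
    sourceHandle: "video",
    targetNode: mergeNodeId,
    targetHandle: "videos" // never video1 / video2 / video3
  }));
}
```

For the three-scene default, `mergeEdges(["vid1", "vid2", "vid3"], "merge")` yields three edges that differ only in `sourceNode`.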
209
+
210
+ **Narration budget for merged outputs**: set the narration `textAI.maxOutputChars` against the **merged** duration (3 × 5 s = 15 s, so about 225 chars at ~15 chars/sec), not a single scene. The `narration_length_mismatch` validator walks through `videoMerge` to compute the correct budget. If it still reports a single-scene budget, it could not trace the path: check that the `voiceAI → videoCaptions.audio` and `videoMerge → videoCaptions.video` edges are both present.
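The budget arithmetic can be written as a one-line helper. This is a sketch: `narrationCharBudget` is a hypothetical name, and the default of 15 characters per second is the rough figure quoted above, not a measured rate.

```typescript
// Character budget for narration that must cover the whole merged video.
function narrationCharBudget(
  sceneCount: number,
  secondsPerScene: number,
  charsPerSecond: number = 15 // approximate speaking rate quoted in this guide
): number {
  return Math.floor(sceneCount * secondsPerScene * charsPerSecond);
}

// Three 5-second scenes give 15 s of video, so the budget is 225 characters.
const mergedBudget = narrationCharBudget(3, 5);
```

Using a single scene here (`narrationCharBudget(1, 5)`, 75 characters) would leave most of a three-scene video without narration.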
211
+
212
+ **Budget stripping priority** — if `get_pricing` shows overflow, drop nodes in this order until it fits: (1) 3 scenes → 1 scene (one `imageAI` + one `videoAI`, skip `videoMerge`), (2) `voiceAI` → none, (3) `videoCaptions` → none. Never strip `imageInput` or `websiteResearch`.
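The stripping order can be expressed as a small loop over reductions. A minimal sketch under stated assumptions: `Plan`, `fitToBudget`, and the `estimateCost` callback are hypothetical. In practice the cost would come from `get_pricing`, whose response shape is not described here.

```typescript
// Hypothetical summary of the optional parts of the branded-ad topology.
type Plan = { scenes: number; voice: boolean; captions: boolean };

// Applies the reductions in priority order and stops at the first plan that fits.
// Returns null when even the fully stripped plan exceeds the budget.
function fitToBudget(plan: Plan, budget: number, estimateCost: (p: Plan) => number): Plan | null {
  const reductions: Array<(p: Plan) => Plan> = [
    p => ({ scenes: 1, voice: p.voice, captions: p.captions }),     // (1) 3 scenes -> 1 scene
    p => ({ scenes: p.scenes, voice: false, captions: p.captions }), // (2) drop voiceAI
    p => ({ scenes: p.scenes, voice: p.voice, captions: false })     // (3) drop videoCaptions
  ];
  let current = plan;
  if (estimateCost(current) <= budget) return current;
  for (const reduce of reductions) {
    current = reduce(current);
    if (estimateCost(current) <= budget) return current;
  }
  return null;
}
```

`imageInput` and `websiteResearch` are deliberately absent from `Plan`, so no reduction can remove them.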
213
+
214
+ ## Pattern: Human-in-the-loop gate before expensive steps
215
+
216
+ Insert a `gate` between a cheap upstream generator and an expensive downstream consumer so the user picks a winning candidate before credits are committed. The canonical placement is `imageAI → gate → videoAI` — image runs are cheap, the user iterates until one looks right, pins it in the gate, and only then does `videoAI` fire on the chosen frame.
217
+
218
+ ```
219
+ textAI (image prompt) → imageAI → gate → videoAI → videoCaptions
220
+ ```
221
+
222
+ ```
223
+ build_graph({
224
+ workflowId: "...",
225
+ addNodes: [
226
+ { tempId: "img", type: "imageAI", data: { promptTemplate: "social-visual", aspectRatio: "9:16" } },
227
+ { tempId: "pick", type: "gate", data: {
228
+ productLabel: "Pick your hero frame",
229
+ productName: "hero-frame",
230
+ maxCandidates: "accumulate"
231
+ } },
232
+ { tempId: "vid", type: "videoAI", data: { promptTemplate: "marketing-ad", aspectRatio: "9:16" } }
233
+ ],
234
+ addEdges: [
235
+ { sourceNode: "img", sourceHandle: "image", targetNode: "pick", targetHandle: "input" },
236
+ { sourceNode: "pick", sourceHandle: "output", targetNode: "vid", targetHandle: "startFrame" }
237
+ ]
238
+ })
239
+ ```
240
+
241
+ **Rules:**
242
+
243
+ - Gate has a single `any` input (max 1 connection) and a single `any` output — it passes the **selected** candidate through, not the latest run. The socket type is inferred from the incoming edge, so the same gate type works for `image`, `video`, `audio`, or `text`.
244
+ - `maxCandidates: "accumulate"` (default) keeps every upstream run in the candidate gallery. Set to `1` only when the user explicitly wants each new run to replace the previous one.
245
+ - Set `productLabel` + `productName` on every gate you intend to publish — without them, custom product pages can't address the gate via `<ProductGate name="..." />` and the default page falls back to a generic "Gate" label.
246
+ - In published API runs, gates are resolved by `gate_config` (see [products.md](products.md)) — `auto_approve` passes the most recent upstream output through without human intervention, `skip` behaves the same with a softer semantic, `fail` hard-blocks API execution. Design gates so `auto_approve` produces a usable pipeline when the caller is an API, not a human.
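The difference between "selected" and "latest" is easier to see in a toy model. The class below is an illustration only and not the server's implementation: `GateModel` and its methods are hypothetical, and clearing the selection when `maxCandidates` is `1` is an assumption.

```typescript
type MaxCandidates = "accumulate" | 1;

// Toy model of a gate's candidate gallery.
class GateModel<T> {
  private candidates: T[] = [];
  private selected: number | null = null;
  private mode: MaxCandidates;

  constructor(mode: MaxCandidates = "accumulate") {
    this.mode = mode;
  }

  // Record one upstream run.
  addRun(output: T): void {
    if (this.mode === 1) {
      this.candidates = [output];
      this.selected = null; // assumption: replacing the candidate clears the pick
    } else {
      this.candidates.push(output);
    }
  }

  // The user pins one candidate.
  select(index: number): void {
    if (index < 0 || index >= this.candidates.length) throw new Error("no such candidate");
    this.selected = index;
  }

  // What flows downstream: the pinned candidate, or null while the gate is paused.
  output(): T | null {
    return this.selected === null ? null : this.candidates[this.selected];
  }

  // Model of `auto_approve`: pass the most recent upstream output through.
  autoApprove(): T | null {
    return this.candidates.length === 0 ? null : this.candidates[this.candidates.length - 1];
  }

  count(): number {
    return this.candidates.length;
  }
}
```

After three runs and `select(0)`, `output()` returns the first run even though a later one exists, while `autoApprove()` returns the third.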
247
+
248
+ **When to insert one:**
249
+
250
+ - Before any `videoAI` that the user wants to "get right" — image-to-video runs are the single most expensive step in most pipelines.
251
+ - Before a `slideshow` / `videoMerge` when you've generated more candidates than slots (e.g., 5 `imageAI` runs, pick 3).
252
+ - After a `storyAI` scene breakdown when the user wants to approve the scene concept before committing to per-scene image/video generation.
253
+ - As a review point between `voiceAI` and `videoCaptions` when tone matters.
254
+
255
+ **When NOT to insert one:**
256
+
257
+ - Don't use gates to "toggle branches on/off" — that's not what they do. Gates are selectors, not switches. If you want conditional execution, omit the branch entirely or build a separate workflow variant.
258
+ - Don't stack a gate in front of every node — each gate is a hard pause that blocks downstream execution until the user picks. Use them only at genuine decision points.
259
+
260
+ ## Pattern: Multi-scene video (storyAI)
261
+
262
+ For longer content with multiple scenes, use `storyAI` to generate per-scene prompts. Chain image references for character consistency.
263
+
264
+ ```
265
+ textInput → storyAI ──scene_1──→ imageAI #1 → videoAI #1
266
+                     │               ↓ (image reference)
267
+                     ├──scene_2──→ imageAI #2 → videoAI #2
268
+                     │               ↓ (image reference)
269
+                     └──scene_3──→ imageAI #3 → videoAI #3
270
+ ```
271
+
272
+ Each `imageAI` connects its `image` output to the next `imageAI`'s `image` input as a reference. This keeps characters and style consistent across scenes. Without this chaining, each scene generates completely different-looking visuals.
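The chaining described above follows a regular pattern, so the edges can be generated per scene. A minimal sketch, assuming the handle names from the connection table (`scene_1` to `scene_5`, `text`, `image`, `startFrame`); `storySceneEdges` is a hypothetical helper and covers only the edges drawn in the diagram.

```typescript
type EdgeSpec = { sourceNode: string; sourceHandle: string; targetNode: string; targetHandle: string };

// Builds the per-scene edges plus the imageAI -> imageAI reference chain.
function storySceneEdges(storyId: string, imageIds: string[], videoIds: string[]): EdgeSpec[] {
  if (imageIds.length !== videoIds.length) throw new Error("need one videoAI per imageAI");
  if (imageIds.length < 1 || imageIds.length > 5) throw new Error("storyAI exposes scene_1 to scene_5");

  const edges: EdgeSpec[] = [];
  for (let i = 0; i < imageIds.length; i++) {
    // Scene prompt -> image for that scene.
    edges.push({ sourceNode: storyId, sourceHandle: "scene_" + (i + 1), targetNode: imageIds[i], targetHandle: "text" });
    // Scene image -> start frame of that scene's video.
    edges.push({ sourceNode: imageIds[i], sourceHandle: "image", targetNode: videoIds[i], targetHandle: "startFrame" });
    // Previous scene's image -> reference for this scene (character consistency).
    if (i > 0) {
      edges.push({ sourceNode: imageIds[i - 1], sourceHandle: "image", targetNode: imageIds[i], targetHandle: "image" });
    }
  }
  return edges;
}
```

For three scenes this produces eight edges: three scene prompts, three start frames, and two reference links.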
273
+
274
+ After all videos are generated, use `videoMerge` to combine them, or `slideshow` for image-based sequences.
275
+
276
+ ## Pattern: TikTok research → inspired content
277
+
278
+ Use `tiktokResearch` to analyze a trending video, then create content inspired by it.
279
+
280
+ - `tiktokResearch` outputs: `content` (analysis text), `hook` (hook text), `frame` (start frame image), `clip` (video clip)
281
+ - Connect `content` → `textAI` for context-aware prompt generation
282
+ - Connect `clip` → `videoAI` as a video reference (model must support it)
283
+ - Connect `frame` → `imageAI` as a reference image
284
+
285
+ ## Pattern: Style-consistent content
286
+
287
+ Add a `globalStyle` node — it broadcasts style to all AI nodes automatically via the system. No edge connections needed. Set its `style` field to a template slug from `list_style_templates`.
288
+
289
+ ## Pattern: Batch content with iterator
290
+
291
+ When the user wants to generate multiple pieces of content from a list (e.g., "make 5 product videos for these 5 products"), use `iterator` + `closeIterator` instead of duplicating the same nodes multiple times. Iterator splits an array into individual items, runs the loop body per item, and closeIterator collects the results. The workflow stays clean and handles 3 items or 30 items with the same graph.
292
+
293
+ ```
294
+ textInput (JSON array) → iterator → textAI → imageAI → videoAI → closeIterator
295
+ ```
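In plain code the iterator pair behaves like a map over a parsed array. The sketch below models only those semantics: `runIterator` is a hypothetical function, and whether the real loop body runs sequentially or in parallel is not stated here.

```typescript
// Toy model: `iterator` splits the array, `body` is the loop body,
// and the returned array plays the role of `closeIterator`.
function runIterator<T, R>(jsonArray: string, body: (item: T, index: number) => R): R[] {
  const parsed: unknown = JSON.parse(jsonArray);
  if (!Array.isArray(parsed)) throw new Error("iterator input must be a JSON array");

  const results: R[] = [];
  for (let i = 0; i < parsed.length; i++) {
    results.push(body(parsed[i] as T, i)); // one pass through the loop body per item
  }
  return results;
}
```

The same function handles 3 items or 30, which is why the graph does not need to change when the list grows.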
296
+
297
+ ## Build process
298
+
299
+ 1. `list_workflows` — check for existing workflows to duplicate
300
+ 2. `create_workflow` — new empty workflow
301
+ 3. `build_graph` — add ALL nodes and edges in one call (the only graph mutation tool)
302
+ 4. `set_product_inputs` — mark input nodes (textInput, imageInput, videoInput) as `product_inputs` and the final output node (e.g., videoCaptions, videoAI, imageAI) as `product_outputs`. This makes the workflow ready for `run_workflow` with `userInputs` and for publishing as a product API.
303
+ 5. `validate_workflow` — check for issues
304
+ 6. Share the workflow URL with the user
305
+
306
+ ## Execution strategy
307
+
308
+ - **Iterating**: Use `run_node` one at a time. Review image results before generating video. Regenerate individual nodes as needed.
309
+ - **Production run**: Use `run_workflow` with `userInputs` for the full pipeline.
310
+ - **Cost check**: Call `get_credit_balance` before video generation. Video is expensive — inform the user.
311
+
312
+ ## Validation checklist
313
+
314
+ Before telling the user a workflow is ready:
315
+
316
+ 0. **Custom prompt shape** — for every AI node (`textAI`, `imageAI`, `videoAI`, `storyAI`) that uses a custom prompt, set **both** `promptTemplate: "custom"` AND `customPrompt: "..."`. The MCP server now auto-coerces a plain `prompt` field to this pair (so `data: { prompt: "..." }` works), but prefer the explicit form in new code — it's clearer and future-proof.
317
+ 1. `validate_workflow` — fix any reported `issues` (hard errors).
318
+ 2. **Surface all `warnings` to the user as one consolidated question** before proceeding to execution. Warnings include:
319
+ - `default_prompt_template` — an AI node is using its seeded default prompt template; you haven't customized it for this workflow.
320
+ - `default_model` — an AI node is on its seeded default model; you didn't pick one.
321
+ - `missing_aspect_ratio` — an `imageAI` / `videoAI` node has no `aspectRatio`.
322
+ - `empty_input` — a `textInput` / `imageInput` / `videoInput` has no content.
323
+
324
+ If `warnings.length > 0`, do NOT run the workflow. Ask the user: "I left [list] at defaults — want me to customize them to your brand before we run?" Only proceed once the user answers.
325
+ 3. All AI nodes have models set.
326
+ 4. All required inputs are connected.
327
+ 5. Input nodes have content or are clearly for user input at run time.
328
+ 6. No narration/script Text AI connected to Video AI (common mistake).
329
+ 7. Text AI nodes feeding downstream AI have `maxOutputChars` set within the downstream model's prompt limit, in characters (Imagen 4 = 1400, Kling/Veo = 9500, voice = 4500).
330
+ 8. `set_product_inputs` was called — input nodes marked as product inputs, final output node marked as product output.
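Two of the checklist items lend themselves to small helpers: the consolidated warning question (item 2) and the prompt-limit check (item 7). This is a sketch under stated assumptions: the `{ code, nodeId }` warning shape is a guess at what `validate_workflow` returns, and all names here are hypothetical.

```typescript
// Assumed warning shape; the real validate_workflow response may differ.
type WorkflowWarning = { code: string; nodeId: string };

// Returns one question covering every warning, or null when it is safe to run.
function consolidatedQuestion(warnings: WorkflowWarning[]): string | null {
  if (warnings.length === 0) return null;
  const list = warnings.map(w => w.code + " on " + w.nodeId).join(", ");
  return "I left " + list + " at defaults. Want me to customize them to your brand before we run?";
}

// Prompt limits in characters, as listed in checklist item 7.
const PROMPT_LIMITS = { imagen4: 1400, klingOrVeo: 9500, voice: 4500 };

// A missing maxOutputChars counts as a failure, since the item requires it to be set.
function fitsPromptLimit(maxOutputChars: number | undefined, limit: number): boolean {
  return typeof maxOutputChars === "number" && maxOutputChars <= limit;
}
```

A non-null question means the workflow should not be run until the user has answered.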