wyren-mcp 1.0.0

@@ -0,0 +1,359 @@
1
+ # Best Practices
2
+
3
+ Follow these when building any workflow. They produce better results and avoid common mistakes.
4
+
5
+ ## Before you build
6
+
7
+ ### Plan before you build
8
+
9
+ Before jumping into `build_graph`, ask the user 2-3 targeted questions max to nail down the direction — platform (TikTok vs YouTube), aspect ratio, style (cinematic vs animated), voiceover or not. Don't interrogate — make smart defaults for anything the user didn't specify and only ask when genuinely ambiguous. A quick planning exchange prevents rebuilding the entire pipeline later.
10
+
11
+ ### Always build the complete workflow first
12
+
13
+ Building a workflow is free — node placement and edge connections cost nothing. Always build the full pipeline with `build_graph` first, then present it to the user. Never half-build a workflow to "validate the concept" — that just means rebuilding later.
14
+
15
+ ### Never run immediately after building
16
+
17
+ **After building a workflow, STOP and present it to the user.** Share the workflow URL, describe what you built (nodes, connections, models chosen), and ask for confirmation before executing anything. Execution costs credits — the user must explicitly approve before any `run_node` or `run_workflow` call. Don't assume "build me X" means "build and run X". Building is free, running is not. This rule applies even when the agent has auto-approve or bypass-permissions enabled — always ask before spending credits.
18
+
19
+ ### Execute in batches, not one-by-one
20
+
21
+ Once the user approves execution, run nodes in **dependency-ordered batches** using `run_node`. Group cheap/fast nodes (textAI, storyAI, voiceAI) into a single batch — run them all, then present the combined results. Only pause and ask for approval **before expensive nodes** (imageAI, videoAI) or at natural review points.
22
+
23
+ **Batch strategy:**
24
+
25
+ 1. Run all text/story/voice nodes in one pass (they're fast and cheap)
26
+ 2. Present the text outputs — let the user review scripts/prompts
27
+ 3. After approval, run all imageAI nodes in one pass
28
+ 4. Present the images — let the user review before committing to video
29
+ 5. After approval, run videoAI nodes
30
+
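The batch strategy above can be sketched as a small planning helper. This is only an illustration: `TIER_ORDER` and `plan_batches` are hypothetical names, not part of the package. The tiers mirror the cheap, then image, then video ordering from the list. The sketch assumes the caller passes nodes in dependency order and does not inspect graph edges.

```python
from collections import defaultdict

# Hypothetical tier table mirroring the batch strategy in the text:
# cheap text/story/voice nodes first, then images, then video.
TIER_ORDER = {
    "textAI": 0, "storyAI": 0, "voiceAI": 0,
    "imageAI": 1,
    "videoAI": 2,
}

def plan_batches(nodes):
    """Group (node_id, node_type) pairs into ordered batches by cost tier.

    Input order is preserved inside each batch, so callers should pass
    nodes in dependency order. Unknown node types are skipped.
    """
    tiers = defaultdict(list)
    for node_id, node_type in nodes:
        if node_type in TIER_ORDER:
            tiers[TIER_ORDER[node_type]].append(node_id)
    return [tiers[t] for t in sorted(tiers)]

nodes = [
    ("script", "textAI"),
    ("narration", "voiceAI"),
    ("hero_image", "imageAI"),
    ("hero_clip", "videoAI"),
]
batches = plan_batches(nodes)
print(batches)
```

Each inner list corresponds to one approval point: run the batch, present its outputs, and ask before starting the next one.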
31
+ Never rerun nodes that already produced good output — it wastes credits and time. Never run the full workflow with `run_workflow` unless explicitly asked.
32
+
33
+ ## Generate concepts from images
34
+
35
+ When the user provides an image and wants text generated from it (ad concepts, descriptions, scene ideas), connect the image directly to `textAI`'s image input handle. Text input is optional when an image is connected: the prompt template and image are sufficient.
36
+
37
+ ```
38
+ imageInput → textAI (image handle connected, promptTemplate: "custom", customPrompt: "Describe this product and create an ad concept")
39
+ ```
40
+
41
+ This pattern is useful for:
42
+
43
+ - Generating ad copy from a product photo
44
+ - Creating scene descriptions from reference images
45
+ - Extracting visual details for downstream prompt enrichment
46
+
47
+ Set `additionalInstructions` to control the output format and length. The textAI node will analyze the image and generate text based on the prompt template and any connected text input.
48
+
49
+ ## Ground everything in user-provided inputs — never hallucinate details
50
+
51
+ **This is critical.** When the user provides images, text, documents, brand URLs, or any reference material, all generated content must be grounded in those inputs. Never invent brand names, product models, company slogans, visual details, or any specific claims that aren't present in the user's inputs.
52
+
53
+ - **Images provided?** Connect them as reference images to every AI node that accepts them (`storyAI`, `imageAI`). Describe what's visible in the image — don't guess what brand or model it is.
54
+ - **Text/documents provided?** Extract brand names, product details, and claims only from that text. Don't supplement with invented marketing copy.
55
+ - **Website URL provided?** Use `websiteResearch` to scrape actual brand info. Connect `brandDocument` to text nodes and `screenshots` to image nodes. Use the real brand voice, not a generic one.
56
+ - **Nothing provided about the brand?** Use generic, descriptive language ("the product", "the device", "the item shown"). Never fill gaps with made-up brand names, model numbers, pricing, or features.
57
+
58
+ **When writing prompts for textAI/storyAI**: If the user hasn't specified a brand, write "a premium wireless headphone" — not "Sony WH-1000XM5". If they provided a photo, write "the headphone shown in the reference image" — don't guess the brand from the image. Let the AI models work with visual references rather than hallucinated text descriptions.
59
+
60
+ **When generating scene descriptions from images**: Always include `additionalInstructions` telling the model to describe what it sees rather than guessing brand/model names. Example: "Describe the product shown in the reference image. Do not guess the brand name or model number — use generic descriptors like 'the smartphone' or 'the device'." Without this constraint, image models tend to hallucinate specific brand identities from visual cues, producing inaccurate marketing copy.
61
+
62
+ ## Prompt crafting
63
+
64
+ ### Separate motion from subject in video prompts
65
+
66
+ When writing or enriching prompts for video generation, describe the subject/scene separately from camera movement and action. Good: "A golden retriever on a beach at sunset. Camera slowly dollies forward as the dog runs toward the waves." Bad: "A golden retriever running on a beach at sunset with the camera moving forward." This is especially important with image-to-video where the start frame already defines the subject — the prompt should focus on what _changes_.
67
+
68
+ ### Don't over-describe images
69
+
70
+ Keep image prompts focused on the key elements: subject, style, mood, and composition. Overly long prompts cause models to ignore or blend instructions unpredictably. Let `textAI` with a good prompt template craft a focused, structured prompt — don't stuff every detail into one sentence. Short, structured prompts with clear hierarchy outperform long rambling ones.
71
+
72
+ ## Always enrich prompts
73
+
74
+ Never connect a `textInput` directly to `imageAI` or `videoAI`. Raw user text like "a cat playing piano" produces mediocre results. Always route through `textAI` first with an appropriate prompt template:
75
+
76
+ ```
77
+ textInput → textAI (template: image/video prompt enricher) → imageAI → videoAI
78
+ ```
79
+
80
+ The textAI node transforms "a cat playing piano" into a detailed, model-optimized prompt with lighting, composition, camera angle, and style details. This is the single biggest quality improvement.
81
+
82
+ Use `list_prompt_templates({ nodeType: "textAI" })` to find the right enrichment template. There are specific templates for image prompts vs video prompts — use the right one for the downstream node.
83
+
84
+ ## Set maxOutputChars for downstream model limits
85
+
86
+ When Text AI feeds into Image AI, Video AI, or Voice AI, set its `maxOutputChars` to fit within the downstream model's prompt limit. Without this, execution fails with a cryptic "prompt too long" error.
87
+
88
+ Key limits:
89
+
90
+ - **Imagen 4**: ~1,400 chars — set `maxOutputChars: 1400` (very restrictive)
91
+ - **Kling image/video, Veo**: ~9,500 chars — default 2000 is safe
92
+ - **ElevenLabs voice**: ~4,500 chars for most models, ~39,500 for Flash v2.5
93
+
94
+ When Text AI connects to multiple downstream nodes, use the **lowest** limit. If `maxOutputChars` is 0 (unlimited), set an explicit value whenever there's a downstream AI node.
95
+
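A minimal sketch of the "lowest limit" rule, using the approximate limits listed above. The dictionary keys are illustrative labels rather than real model identifiers, and `pick_max_output_chars` is a hypothetical helper. Capping the result at the 2000-character default is an assumption of this sketch, based on the note that the default is safe for the larger limits.

```python
# Approximate prompt limits quoted in the list above (characters).
# The keys are illustrative labels, not real model identifiers.
PROMPT_LIMITS = {
    "imagen-4": 1400,
    "kling": 9500,
    "veo": 9500,
    "elevenlabs": 4500,
    "elevenlabs-flash-v2.5": 39500,
}

def pick_max_output_chars(downstream_models, default=2000):
    """Return a maxOutputChars value that fits every downstream model.

    Uses the lowest downstream limit, capped at `default` (an assumption
    of this sketch) so a generous limit does not inflate the output.
    """
    limits = [PROMPT_LIMITS[m] for m in downstream_models]
    if not limits:
        return default
    return min(default, min(limits))

print(pick_max_output_chars(["imagen-4", "kling"]))
print(pick_max_output_chars(["kling", "elevenlabs"]))
```

The first call is constrained by the most restrictive model in the set; the second stays at the default because every downstream limit is larger.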
96
+ See the [Models](models.md) doc for the full limits table.
97
+
98
+ ## Use storyAI for multi-scene, textAI for single-scene
99
+
100
+ If the user wants a single image or video, use `textAI` to enrich the prompt. If they want multiple scenes (video series, story), use `storyAI` — it has 8 scene output handles (`scene_1` through `scene_8`), each producing a tailored prompt. Connect each scene to its own `imageAI` → `videoAI` chain. storyAI has modes like "multishot" and "continuous shot" to guide how it structures the scenes.
101
+
102
+ Don't try to make `textAI` output multiple scenes by hacking the prompt — storyAI handles scene splitting, pacing, and coherence natively. Multiple textAI nodes manually writing "scene 1", "scene 2" produces inconsistent tone and pacing.
103
+
104
+ ## Always connect reference images to storyAI
105
+
106
+ `storyAI` accepts a `Reference Image` input handle (`image`). **When the workflow has an `imageInput` node, always connect it to `storyAI`'s image input** — this lets the model see the actual product/subject instead of guessing. Without a reference image, storyAI tends to hallucinate specific brand names, model numbers, or visual details it can't know. The reference image grounds the scene descriptions in reality.
107
+
108
+ ```
109
+ imageInput → storyAI (image handle) ← always connect when imageInput exists
110
+ textInput → storyAI (text handle)
111
+ ```
112
+
113
+ This also applies when `websiteResearch` provides screenshots — connect them to `storyAI`'s image input for brand-aware scene generation.
114
+
115
+ ## Generate a start frame for video
116
+
117
+ `videoAI` requires a text prompt and optionally accepts a start frame image. Providing a start frame produces significantly better results — it gives the model a clear visual anchor.
118
+
119
+ **Important**: The startFrame is literally the first frame of the generated video. Never use a raw product photo or unprocessed reference image as startFrame — the video will start with that exact image, which looks unnatural. Always route through `imageAI` first to generate a styled scene, then use that as the startFrame.
120
+
121
+ ```
122
+ textAI → imageAI → videoAI (startFrame) (best — AI-generated scene)
123
+ imageInput → imageAI (reference) → videoAI (startFrame) (product photo → styled scene → video)
124
+ textAI → videoAI (works, but less visual control)
125
+ ```
126
+
127
+ **Caution**: `imageInput → videoAI (startFrame)` makes the raw photo the literal first frame. This is valid when the user explicitly wants that (e.g., "animate this exact image"). For product/marketing use cases, ask first — default to routing through `imageAI` to generate a styled scene.
128
+
129
+ videoAI also accepts `endFrame`, `referenceImages`, and `videoReference` inputs depending on the model. Use `get_node_type_info({ nodeType: "videoAI" })` for full handle details.
130
+
131
+ ## Seamless looping video
132
+
133
+ For ambient/background videos that need to loop cleanly (hero sections, auth page backgrounds, product b-roll), the last frame must match the first frame — otherwise the loop point is a visible jump cut.
134
+
135
+ **Trick**: connect the **same `imageAI` output to both `startFrame` AND `endFrame`** on the videoAI node. The model will plan a cyclical motion that begins and ends at that exact frame.
136
+
137
+ ```
138
+ ┌─ startFrame ─┐
139
+ textAI → imageAI ──────┤ ├──→ videoAI (seamless loop)
140
+ └── endFrame ──┘
141
+ ```
142
+
143
+ Prompt for the videoAI should describe **cyclical motion** so the model has something to interpolate: "slowly drifts up and returns", "gently pulses in and out", "orbits once and settles back". Avoid directional prompts ("pushes forward", "flies past") that can't naturally return to the start.
144
+
145
+ Models that support this: Veo 3.1 (fast + standard), Kling V2.5+, Kling O1, Kling V3, Kling V3 Omni. Check `inputs.endFrame: true` via `get_model_capabilities` before attempting.
146
+
147
+ **When NOT to do this**: if the user wants a one-shot narrative clip (establishing shot, scene beat), don't force a loop — the cyclical constraint compromises pacing. Only use this technique for intentionally-ambient footage.
148
+
149
+ ## Iterate on images before generating video
150
+
151
+ Video generation is slow (1-4 minutes) and 10-50x more expensive than image generation. **Always get the image right first.** Run `imageAI`, review the result, and show it to the user before proceeding to `videoAI`. If the image doesn't look right, regenerate it — don't commit to a video run with a bad start frame.
152
+
153
+ The workflow is: generate image → show user → get confirmation → generate video. The agent should pause and ask "Does this look right before we generate the video?" — don't mechanically run the next node.
154
+
155
+ ## Chain images for scene continuity
156
+
157
+ When building multi-scene workflows, connect the output of one `imageAI` as a reference input to the next `imageAI`. This ensures characters, style, and setting stay consistent across scenes:
158
+
159
+ ```
160
+ storyAI (scene 1) → imageAI #1 → videoAI #1
161
+ ↓ (reference)
162
+ storyAI (scene 2) → imageAI #2 → videoAI #2
163
+ ↓ (reference)
164
+ storyAI (scene 3) → imageAI #3 → videoAI #3
165
+ ```
166
+
167
+ Without image chaining, each scene may generate completely different-looking characters and environments. Also consider using `imageInput` with a reference photo of the character to anchor consistency.
168
+
169
+ ## Voice reads the enriched script, not the raw input
170
+
171
+ Connect `voiceAI` to the output of `textAI` or `storyAI`, not directly to `textInput`. The AI-generated text is written for narration — proper pacing, sentence structure, and flow. Raw user input usually isn't.
172
+
173
+ ```
174
+ textInput → textAI (script template) → voiceAI
175
+ ```
176
+
177
+ ## Don't connect voiceAI to videoAI
178
+
179
+ A common mistake: trying to connect `voiceAI` output to `videoAI`. Video generation doesn't accept audio input; it generates silent video from text plus an optional start frame. To combine voice with video when captions are wanted, feed both into `videoCaptions`: `videoAI` → `videoCaptions` (video handle) and `voiceAI` → `videoCaptions` (audio handle). When captions aren't needed, use `audioOverlay` instead. Audio and video come together in one of these two nodes, never in `videoAI`.
180
+
181
+ ## Captions need both video and audio
182
+
183
+ `videoCaptions` requires both a `video` input and an `audio` input to generate accurate captions. The audio is what gets transcribed — without it, there's nothing to caption. Always connect `voiceAI` → `videoCaptions` (audio) alongside `videoAI` → `videoCaptions` (video).
184
+
185
+ ## Match voice and video duration
186
+
187
+ **Video clips are 3-15 seconds each.** Any narration longer than ~15 seconds requires a multi-shot workflow — there is no single video node that produces a 1-minute clip.
188
+
189
+ Use ~2.5 words per second as a guideline:
190
+
191
+ | Video duration | Max narration |
192
+ | -------------- | ------------- |
193
+ | 5 seconds | ~12 words |
194
+ | 10 seconds | ~25 words |
195
+ | 15 seconds | ~37 words |
196
+
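The table values follow directly from the 2.5 words-per-second guideline, rounded down. A minimal sketch of that arithmetic, where `max_narration_words` is a hypothetical helper name:

```python
WORDS_PER_SECOND = 2.5  # guideline from the text

def max_narration_words(video_seconds):
    """Approximate word budget for narration over a clip, rounded down."""
    return int(video_seconds * WORDS_PER_SECOND)

for seconds in (5, 10, 15):
    print(seconds, max_narration_words(seconds))
```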
197
+ **Single-shot workflows** (one videoAI node): The narration textAI must have `maxOutputChars` set low enough to produce a short script matching the video duration. For a 5-second video ad, the script should be 1-2 punchy sentences, not a paragraph. Always set `additionalInstructions` on narration textAI nodes to constrain output length, for example "Write exactly 12 words of narration." for a 5-second clip. The same rule applies per scene when a short video is split up: for 15 seconds total across 3 scenes, constrain each scene's textAI to ~12 words. Without explicit length constraints, the prompt template will produce paragraph-length output that far exceeds the video duration.
198
+
199
+ **Multi-shot workflows** (narration > 15 seconds): Use `storyAI` to split into scenes, each with its own `imageAI` → `videoAI` chain (3-15s per clip), then `videoMerge` to combine. A 60-second narration needs ~4-12 video clips (4 clips at 15s each, 12 at 5s each). Never pair a long script with a single short video; the result is unusable.
200
+
201
+ **Planning the duration budget**: Before building, estimate total narration length from the user's intent. If they want a "short ad" → 5-10s single shot. If they want a "product story" or "explainer" → multi-shot with storyAI. If unclear, ask: "How long should the final video be?" This determines the entire workflow shape.
202
+
203
+ Which asset leads depends on the user's intent: if they start with a script, the video count and duration should match the voice length. If they start with a video concept, constrain the narration to fit the video duration.
204
+
205
+ ## Per-scene timing budget (critical for sync)
206
+
207
+ When the user specifies a total duration, **divide it equally across scenes** and enforce that budget on BOTH video and audio per scene. Every scene's video clip duration and narration length must match so audio and visuals stay synchronized.
208
+
209
+ **Example**: User says "15 second video with 3 scenes":
210
+
211
+ - Each scene = 5 seconds
212
+ - Each video clip: set `duration: 5` on each `videoAI` node
213
+ - Each narration chunk: ~12 words per scene (5s × 2.5 words/s)
214
+ - Each `voiceAI` produces ~5 seconds of audio matching its scene's video
215
+
216
+ **How to implement**:
217
+
218
+ 1. **Divide duration**: `total_duration / scene_count` = per-scene seconds
219
+ 2. **Set videoAI duration**: Each `videoAI` node gets `duration` set to the per-scene value (must be within model's supported range — check `get_model_capabilities`)
220
+ 3. **Constrain narration per scene**: Each narration `textAI` node's `customPrompt` or prompt template must instruct the LLM to write exactly N words for that scene's audio. Include the timing constraint explicitly: "Write exactly 12 words of narration for this 5-second scene."
221
+ 4. **One voiceAI per scene**: In multi-shot workflows, use a separate `voiceAI` node per scene so each audio clip matches its video clip duration. Then feed each `videoAI` + `voiceAI` pair into its own `videoCaptions` node before merging.
222
+ 5. **Merge in order**: `videoMerge` combines the captioned clips in scene order
223
+
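The first three steps above can be sketched as one calculation. `plan_scene_budget` is a hypothetical helper, not a tool in the package. It does not check that the per-scene duration falls within a model's supported range, which still has to be verified via `get_model_capabilities`.

```python
def plan_scene_budget(total_seconds, scene_count, words_per_second=2.5):
    """Split a total duration equally and derive each scene's word budget."""
    if scene_count < 1:
        raise ValueError("scene_count must be at least 1")
    seconds_per_scene = total_seconds / scene_count
    words_per_scene = int(seconds_per_scene * words_per_second)
    return {
        "seconds_per_scene": seconds_per_scene,
        "words_per_scene": words_per_scene,
        # Timing constraint to pass to each narration textAI node.
        "narration_instruction": (
            f"Write exactly {words_per_scene} words of narration "
            f"for this {seconds_per_scene:g}-second scene."
        ),
    }

budget = plan_scene_budget(total_seconds=15, scene_count=3)
print(budget["narration_instruction"])
```

The same per-scene seconds value would go into each `videoAI` node's `duration`, so both halves of every scene share one budget.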
224
+ **Multi-shot with per-scene audio sync**:
225
+
226
+ ```
227
+ textInput → storyAI ──scene_1──→ textAI #1 (narration, ~12 words) → voiceAI #1 ──→ videoCaptions #1 ──→ videoMerge
228
+ │ ↑
229
+ │ textAI #2 (video prompt) → imageAI #1 → videoAI #1 (5s) ┘
230
+
231
+ ├──scene_2──→ textAI #3 (narration, ~12 words) → voiceAI #2 ──→ videoCaptions #2 ──→ (videoMerge)
232
+ │ ↑
233
+ │ textAI #4 (video prompt) → imageAI #2 → videoAI #2 (5s) ┘
234
+
235
+ └──scene_3──→ textAI #5 (narration, ~12 words) → voiceAI #3 ──→ videoCaptions #3 ──→ (videoMerge)
236
+
237
+ textAI #6 (video prompt) → imageAI #3 → videoAI #3 (5s) ────┘
238
+ ```
239
+
240
+ **Key rule**: Never generate one long narration and pair it with multiple short video clips — the audio won't align with what's shown on screen. Each scene's narration must describe what happens in THAT scene's video, and both must be the same duration.
241
+
242
+ ## Use globalStyle for visual consistency
243
+
244
+ When a workflow generates 2 or more images or videos, add a `globalStyle` node to the workflow. It broadcasts its style to all AI nodes automatically, so no edge connections are needed. Just add the node and set its `style` field to a template slug from `list_style_templates`.
245
+
246
+ Without globalStyle, each generation may have a different visual look. When a workflow has multiple imageAI or videoAI nodes, use consistent style descriptors across all of them. globalStyle is the best mechanism for this, but even without it, keep phrasing consistent (e.g., always "cinematic, warm lighting, 35mm film grain" — not "warm tones" in one node and "golden hour lighting" in another). Intentional style variation across nodes is fine when the user wants it — this is about avoiding _accidental_ inconsistency.
247
+
248
+ ## Use research nodes for brand context
249
+
250
+ `websiteResearch` scrapes a website into a text document. `tiktokResearch` does the same for TikTok videos — analyzing hooks, content structure, and style.
251
+
252
+ `websiteResearch` outputs multiple handles — use each where it helps most:
253
+
254
+ - `brandDocument` → `textAI` for tone-aware prompt enrichment
255
+ - `screenshots` → `imageAI` as a visual reference for brand-consistent imagery
256
+ - `colorPalette` → `textAI` for style-aware generation with brand colors
257
+
258
+ Don't just connect `brandDocument` to everything — each output serves a different purpose.
259
+
260
+ ```
261
+ websiteResearch ──brandDocument──→ textAI (brand-aware prompt enrichment) → imageAI
262
+ ──screenshots───→ imageAI (reference image)
263
+ tiktokResearch ──content──→ textAI (hook-style prompt) → videoAI
264
+ ```
265
+
266
+ ## Use audioOverlay to combine video + audio
267
+
268
+ When a workflow has separate video and audio tracks that need to be combined (e.g., `videoAI` output + `voiceAI` output), use `audioOverlay` instead of `videoCaptions`. `videoCaptions` is for adding captions WITH audio — `audioOverlay` is for merging audio onto video without captions.
269
+
270
+ **When to use audioOverlay:**
271
+
272
+ - User wants voiceover on a video but no captions
273
+ - User has a video clip and a separate music/audio track to combine
274
+ - Any "add audio to video" scenario where captions aren't needed
275
+
276
+ **When to use videoCaptions instead:**
277
+
278
+ - User wants captions displayed on the video (with or without audio)
279
+
280
+ ```
281
+ voiceAI ──audio──→ audioOverlay ──video──→ (output with audio)
282
+ videoAI ──video──→ audioOverlay
283
+ ```
284
+
285
+ **audioMode options:**
286
+
287
+ - `replace` (default) — replaces the video's original audio entirely with the provided audio
288
+ - `mix` — mixes both audio tracks together (useful for background music + voiceover)
289
+
290
+ ## Execution
291
+
292
+ Execution behavior (tier-based batching, approval gates, cost checks) is enforced by the `run_node` tool responses — follow the `_agentInstructions` field in each response. See [execution.md](execution.md) for reference.
293
+
294
+ Key points not covered by tool responses:
295
+
296
+ - **Only rerun what needs fixing**: When iterating, `run_node` on the specific node — don't rerun the whole pipeline.
297
+ - **Check outputs internally**: After running a node, review the output before continuing downstream. Only flag issues to the user if something looks off.
298
+
299
+ ## Workflow hygiene
300
+
301
+ ### Workflow name rules
302
+
303
+ Workflow names must match `^[a-zA-Z0-9\s\-_()'.!?,&]+$` and be **50 characters or fewer**. Allowed: letters, numbers, spaces, and `- _ ( ) ' . ! ? , &`. Anything else (`:` `/` `\` `[` `]` `|` `#` `@` `*`, emojis, etc.) returns a `VALIDATION_ERROR`.
304
+
305
+ Common mistakes that break validation:
306
+
307
+ - Colons: "Demo: Brand Video" → use "Demo - Brand Video"
308
+ - Slashes: "Marketing/Social" → use "Marketing - Social"
309
+ - Emojis, hashtags, pipes, brackets
310
+
311
+ When creating or renaming workflows via `create_workflow` / `rename_workflow`, sanitize the name first. If a user-suggested name contains disallowed characters, pick the closest safe substitute silently (don't ask — just do it and mention the swap in your summary).
312
+
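One way to sanitize a name before calling `create_workflow` or `rename_workflow` is sketched below. The pattern and the 50-character limit come from the rules above; the function name and the specific substitutions (separator-like characters become " - ", everything else disallowed is dropped) are assumptions of this sketch, not documented behavior.

```python
import re

# Pattern and length limit quoted in the text above.
VALID_NAME = re.compile(r"^[a-zA-Z0-9\s\-_()'.!?,&]+$")
MAX_LENGTH = 50

def sanitize_workflow_name(name):
    """Return a name that satisfies the documented pattern and length limit.

    Separator-like characters become " - "; any other disallowed
    character is dropped. The substitution choices are illustrative.
    """
    name = re.sub(r"[:/\\|]", " - ", name)
    name = re.sub(r"[^a-zA-Z0-9\s\-_()'.!?,&]", "", name)
    name = re.sub(r"\s+", " ", name).strip()
    name = name[:MAX_LENGTH].rstrip()
    if not name or not VALID_NAME.match(name):
        raise ValueError("no valid characters left in workflow name")
    return name

print(sanitize_workflow_name("Demo: Brand Video"))
print(sanitize_workflow_name("Marketing/Social"))
```

If the sanitized name differs from what the user suggested, mention the swap in the summary as described above.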
313
+ ### Name nodes descriptively
314
+
315
+ Give nodes clear labels that describe their purpose — "Scene Description Enricher" instead of "Text AI 1", "Product Hero Shot" instead of "Image AI 2". Set the `label` field via `build_graph` with `dataUpdates`. Descriptive names make workflows readable when the user comes back later and help the agent understand existing workflows.
316
+
317
+ ### Duplicate workflows to experiment
318
+
319
+ When a workflow is producing decent results and the user wants to try variations (different model, style, or prompt approach), duplicate it first with `duplicate_workflow`. Never experiment on a working workflow — duplication is free, regeneration is not.
320
+
321
+ ### Keep node count intentional
322
+
323
+ Don't chain three `textAI` nodes to refine a prompt when one with a good template and clear instructions can do the job. Every extra node adds latency, cost, and a point of failure. Add nodes only when they serve a distinct purpose (e.g., separate enrichment for video prompt vs narration script).
324
+
325
+ ### Use gate nodes for human-in-the-loop candidate selection
326
+
327
+ Insert a `gate` node between a generator (`imageAI`, `videoAI`, `storyAI`, etc.) and its downstream consumer when the user should pick from multiple generated candidates before the pipeline commits to the expensive next step. Classic placement: `imageAI → gate → videoAI` — the user regenerates images until one looks right, picks it in the gate, and only then does the pipeline burn credits on video.
328
+
329
+ Gate does NOT toggle branches on/off — it is a pass-through selector with one `any` input and one `any` output. It accumulates candidates across runs (set `maxCandidates: 'accumulate'` — the default), the user pins one via the UI, and downstream nodes resolve to the pinned candidate.
330
+
331
+ Always set `productLabel` (user-facing text on product pages) and `productName` (semantic slug) when the workflow will be published — without them, custom product pages can't address the gate by name and the default page shows a generic "Gate" label.
332
+
333
+ ### Video merge order matters
334
+
335
+ When using `videoMerge`, the order of input connections determines the final video sequence. Connect videos in narrative order — scene 1 first, scene 2 second, etc. If the order is wrong, the story won't make sense and you'll have to re-merge.
336
+
337
+ ### Update inputs after duplicating
338
+
339
+ After `duplicate_workflow`, always review and update the input nodes (`textInput`, `imageInput`, `videoInput`) with the new content before running. A duplicated workflow still has the old inputs baked in.
340
+
341
+ ## build_graph is the only graph mutation tool
342
+
343
+ All graph mutations — adding, updating, removing nodes and edges — go through `build_graph`. The old single-node tools (`add_node`, `remove_node`, `update_node_data`, `connect_nodes`, `disconnect_nodes`, `list_edges`) have been removed. One atomic call per mutation, always.
344
+
345
+ `build_graph` runs a topological column layout (groups nodes by depth, stacks them per column) that arranges the graph as a left-to-right DAG.
346
+
347
+ **Always call `organize_layout` after `build_graph`**: The auto-positioning in `build_graph` can produce overlapping nodes in complex workflows. After every `build_graph` call, immediately call `organize_layout` to clean up the layout into a proper left-to-right DAG. This is cheap and instant — always do it.
348
+
349
+ **Phased construction is fine** — you can call `build_graph` multiple times across a session (e.g., build the input+enrichment layer first, review with the user, then build the generation layer in a second call). Each `build_graph` call auto-appends its new nodes to the right of existing ones with a clean column layout.
350
+
351
+ **Rule**: if you're adding 2+ nodes as one logical group, they MUST go through a single `build_graph` call. Multiple `build_graph` calls for separate phases are fine. Always follow with `organize_layout`.
352
+
353
+ ## Check prompt templates first
354
+
355
+ Always call `list_prompt_templates` and check if a built-in template fits the use case — they're tuned for each model's quirks and produce better results than ad-hoc prompts. If no template fits the specific use case, the agent can write a custom prompt (`promptTemplate: "custom"` + `customPrompt`). But check templates first — don't default to custom.
356
+
357
+ ## Validate before sharing
358
+
359
+ Always `validate_workflow` after building. Fix issues before telling the user it's ready.
@@ -0,0 +1,51 @@
1
+ # Billing and Credits
2
+
3
+ ## Checking balance
4
+
5
+ ```
6
+ get_credit_balance()
7
+ ```
8
+
9
+ Returns `{ balance: number, currency: "credits" }`. Always check before suggesting execution.
10
+
11
+ ## Pricing lookup
12
+
13
+ ```
14
+ get_pricing({
15
+ operation: "text-generation", // optional filter
16
+ model: "gemini-2.5-flash" // optional filter
17
+ })
18
+ ```
19
+
20
+ Returns pricing entries with: `operation`, `model`, `configKey`, `billingUnit` (per_call, per_second, per_character), `credits`, `description`.
21
+
22
+ Common operations:
23
+
24
+ - `text-generation` — textAI, storyAI
25
+ - `image-generation` — imageAI
26
+ - `video-generation` — videoAI
27
+ - `voice-synthesis` — voiceAI
28
+
29
+ ## Cost estimation for products
30
+
31
+ ```
32
+ estimate_product_cost({
33
+ slug: "my-published-workflow",
34
+ model_overrides: { "node-id": "kling-2.0" } // optional
35
+ })
36
+ ```
37
+
38
+ Returns per-node and total credit cost estimate for a published workflow.
39
+
40
+ ## Cost guidance
41
+
42
+ - Text generation is cheapest (fractions of a credit)
43
+ - Image generation is moderate
44
+ - Video generation is most expensive — especially pro mode and longer durations
45
+ - Voice synthesis costs depend on text length (per_character billing)
46
+
47
+ When the user asks to run something, give them a rough cost estimate first:
48
+
49
+ 1. Look up pricing for the relevant operations/models
50
+ 2. Mention the approximate total
51
+ 3. Ask for confirmation before executing
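A rough estimate can be assembled from pricing entries by multiplying `credits` by the quantity that matches each entry's `billingUnit`. The sketch below assumes that simple multiplication is how billing works and uses made-up prices; real values come from `get_pricing`. `estimate_credits` is a hypothetical helper.

```python
def estimate_credits(entry, calls=1, seconds=0, characters=0):
    """Estimate credits for one pricing entry based on its billingUnit."""
    unit = entry["billingUnit"]
    if unit == "per_call":
        quantity = calls
    elif unit == "per_second":
        quantity = seconds
    elif unit == "per_character":
        quantity = characters
    else:
        raise ValueError(f"unknown billingUnit: {unit}")
    return entry["credits"] * quantity

# Made-up prices for illustration only; real values come from get_pricing.
image_entry = {"billingUnit": "per_call", "credits": 4}
video_entry = {"billingUnit": "per_second", "credits": 10}
voice_entry = {"billingUnit": "per_character", "credits": 1}

total = (
    estimate_credits(image_entry, calls=3)
    + estimate_credits(video_entry, seconds=15)
    + estimate_credits(voice_entry, characters=200)
)
print(total)
```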
@@ -0,0 +1,121 @@
1
+ # Execution
2
+
3
+ ## Inspecting node types
4
+
5
+ ```
6
+ get_node_type_info({ nodeType: "videoAI" })
7
+ ```
8
+
9
+ Parameter is `nodeType` (not `type`). Returns fields, models, input/output handles, and constraints.
10
+
11
+ ## Sync vs async nodes
12
+
13
+ | Sync (instant result) | Short async (server polls) | Long async (returns jobId immediately) |
14
+ | --------------------------------------------- | --------------------------------------------- | -------------------------------------------------- |
15
+ | `textAI`, `storyAI`, `voiceAI`, `trendSelector` | `imageAI`, `websiteResearch`, `tiktokResearch` | `videoAI`, `videoCaptions`, `videoMerge`, `slideshow` |
16
+
17
+ **Sync**: `run_node` returns the output immediately.
18
+
19
+ **Short async**: `run_node` starts a background job, polls automatically, and returns the result when complete. Typically under ~60s.
20
+
21
+ **Long async**: `run_node` returns `{ status: "running", jobId }` **immediately** without server-side polling — the MCP client request timeout is shorter than these jobs take. You MUST poll via `get_node_outputs(workflowId, nodeId)` every ~10s until a success record appears, then show `mediaUrls` to the user. Do NOT call `run_node` again for the same node while a job is active — it will reject with "active job".
22
+
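The polling requirement for long async nodes can be sketched as a generic loop. Nothing here calls the real tools: `fetch` is a caller-supplied function (for example, a wrapper around `get_node_outputs` that returns `None` until a success record appears), and the 10-minute timeout is an arbitrary assumption of this sketch.

```python
import time

def poll_until_done(fetch, interval_s=10.0, timeout_s=600.0, sleep=time.sleep):
    """Call `fetch()` until it returns a non-None result or time runs out.

    `fetch` is supplied by the caller and should return None while the
    job is still running. `sleep` is injectable so tests need not wait.
    """
    waited = 0.0
    while True:
        result = fetch()
        if result is not None:
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"no result after {waited:g}s")
        sleep(interval_s)
        waited += interval_s

# Stubbed demo: two "still running" polls, then a success record.
responses = iter([None, None, {"mediaUrls": ["https://example.com/clip.mp4"]}])
record = poll_until_done(lambda: next(responses), sleep=lambda s: None)
print(record["mediaUrls"])
```

The loop only reads results and never re-issues `run_node`, which matches the rule against starting a second job while one is active.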
23
+ **Overriding**: pass `wait: true` on a long async node to force server-side polling (rarely needed — only if you're confident the job finishes within the MCP request window). Pass `wait: false` on a short async node to get jobId immediately.
24
+
25
+ ## Running nodes
26
+
27
+ ```
28
+ run_node({ workflowId: "...", nodeId: "...", inputs: { text: "..." } })
29
+ ```
30
+
31
+ - If `inputs` is omitted, the node resolves inputs from connected upstream nodes (requires those nodes to have been executed first)
32
+ - If `inputs` is provided, it overrides resolved inputs — useful for testing a node in isolation
33
+ - Returns: `executionId`, `nodeId`, `nodeType`, `status`, `output`, `durationMs`, `creditsCharged`
34
+ - Async nodes return: `jobId`, `jobType`, `status`, `result` (after polling completes)
35
+
36
+ The tool response includes `tier`, `mediaUrls`, and `_agentInstructions` — follow the instructions in each response to know when to show results and when to pause for user approval.
37
+
38
+ ## run_workflow — MANDATORY polling loop
39
+
40
+ Requires `userConfirmed: true` — the tool rejects calls without it. Always prefer `run_node` tier-by-tier.
41
+
42
+ ```
43
+ run_workflow({
44
+ workflowId: "...",
45
+ userConfirmed: true,
46
+ userInputs: {
47
+ "text-input-node-id": "My input text", // shorthand — auto-wraps to { text: "..." }
48
+ "image-input-node-id": ["https://example.com/img.jpg"] // shorthand for imageInput
49
+ }
50
+ })
51
+ ```
52
+
53
+ **userInputs format**: Keyed by input node ID. Supports two formats:
54
+
55
+ - **Shorthand**: `"nodeId": "value"` — auto-wraps based on node type (`textInput` → `{ text: value }`, `imageInput` → `{ imageUrls: value }`, `videoInput` → `{ videoUrl: value }`)
56
+ - **Full form**: `"nodeId": { "text": "...", "otherField": "..." }` — passed as-is to the node data
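
The shorthand mapping above can be illustrated with a small function. The real wrapping happens inside the tool, so this is a sketch of the resulting shapes rather than the actual implementation; `wrapShorthand` is a hypothetical name.

```typescript
type InputNodeType = "textInput" | "imageInput" | "videoInput";

// Mirrors the documented shorthand mapping. Illustration only: the tool
// performs this wrapping itself, and this function name is hypothetical.
function wrapShorthand(
  nodeType: InputNodeType,
  value: string | string[],
): Record<string, unknown> {
  switch (nodeType) {
    case "textInput":
      return { text: value };
    case "imageInput":
      return { imageUrls: value };
    case "videoInput":
      return { videoUrl: value };
  }
}
```

For example, `wrapShorthand("textInput", "My input text")` yields `{ text: "My input text" }`, which is the same object you would pass in full form.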
57
+
58
+ **Important**: Input nodes (`textInput`, `imageInput`, `videoInput`) must have data — either set via `build_graph` (`dataUpdates`) before running, or passed via `userInputs`. The tool validates this and returns an error listing empty input nodes.
59
+
60
+ **Best practice for large inputs**: Set text data via `build_graph` first, then call `run_workflow` without `userInputs`. This avoids large payloads in a single MCP call and is more reliable over flaky connections.
61
+
62
+ ```
63
+ // Step 1: Set the input text
64
+ build_graph({ workflowId: "...", dataUpdates: [{ nodeId: "text-input-id", data: { text: "long content..." } }] })
65
+ // Step 2: Run without userInputs — uses the data already on the node
66
+ run_workflow({ workflowId: "...", userConfirmed: true })
67
+ ```
68
+
69
+ When you DO call `run_workflow`, it returns immediately with `{ run_id, status: "running" }` and dispatches execution in the background. **You MUST enter a polling loop:**
70
+
71
+ ```
72
+ 1. Call get_workflow_run_status({ run_id }) every 5 seconds.
73
+ 2. Continue polling until status is NOT "running" or "pending"
74
+ (terminal states: "completed", "partial", "failed", "cancelled").
75
+ 3. On "completed":
76
+ - Read outputs from the response.
77
+ - Call get_node_outputs if you need more detail.
78
+ - Present the final result to the user.
79
+ 4. On "failed":
80
+ - Surface the error verbatim.
81
+ - Offer next steps (retry, edit a node, try a cheaper model).
82
+ 5. Never hand control back to the user mid-run.
83
+ 6. Never tell the user to say "status" or "check progress" — YOU do the polling.
84
+ ```
85
+
86
+ The polling loop is non-negotiable. A user should never have to babysit a running workflow.
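
The loop above can be sketched as follows. `getRunStatus` is a hypothetical caller-supplied wrapper around `get_workflow_run_status`; the `outputs` and `error` fields and the `maxPolls` cap are assumptions, since the guide only names the status values.

```typescript
// Field names other than `status` are assumptions; the guide only names the states.
type RunStatus = { status: string; outputs?: unknown; error?: string };

// Caller-supplied wrapper around the get_workflow_run_status tool call (hypothetical).
type GetRunStatus = (runId: string) => Promise<RunStatus>;

const NON_TERMINAL_STATES = ["running", "pending"];

async function pollWorkflowRun(
  getRunStatus: GetRunStatus,
  runId: string,
  intervalMs: number = 5000, // the guide says every 5 seconds
  maxPolls: number = 720,    // assumed safety cap (about an hour), not from the guide
): Promise<RunStatus> {
  for (let i = 0; i < maxPolls; i++) {
    const run = await getRunStatus(runId);
    // Anything other than running/pending is terminal:
    // completed, partial, failed, or cancelled.
    if (NON_TERMINAL_STATES.indexOf(run.status) === -1) return run;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Run ${runId} still not terminal after ${maxPolls} polls`);
}
```

The caller then branches on the returned `status`: present outputs on "completed", surface the error on "failed".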
87
+
88
+ ## Getting outputs
89
+
90
+ ```
91
+ get_node_outputs({ workflowId: "...", nodeId: "...", limit: 5 })
92
+ ```
93
+
94
+ Returns execution history. Outputs persist across sessions.
95
+
96
+ ## Cancelling jobs
97
+
98
+ ```
99
+ cancel_job({ jobId: "..." })
100
+ ```
101
+
102
+ Only works on `pending` or `processing` jobs. Credits are refunded for cancelled jobs.
103
+
104
+ ## Execution order — batch by tier
105
+
106
+ Run nodes in dependency-ordered **batches**, not one-by-one:
107
+
108
+ 1. **Input nodes** don't need execution — they just hold data
109
+ 2. **Batch 1 — Text tier**: Run all `textAI`, `storyAI`, `voiceAI` nodes at once (fast, cheap). Present combined text outputs for review.
110
+ 3. **Batch 2 — Image tier**: After user approves text, run all `imageAI` nodes at once. Present images for review.
111
+ 4. **Batch 3 — Video tier**: After user approves images, run `videoAI` nodes. These are slow and expensive — always confirm cost first.
112
+ 5. **Batch 4 — Post-processing**: `videoCaptions`, `videoMerge`, `slideshow` etc.
113
+
114
+ Each node resolves inputs from the most recent output of connected upstream nodes. Only pause between tiers — not between individual nodes within a tier.
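
A minimal sketch of the batching idea, assuming each node is described by a hypothetical `{ id, type }` pair. It groups by node type only; a real plan must also respect the edges between nodes, which this sketch ignores.

```typescript
type WorkflowNode = { id: string; type: string };

// Batch order taken from the list above. Input nodes are absent on purpose:
// they only hold data and are never executed.
const BATCHES: { name: string; types: string[] }[] = [
  { name: "text", types: ["textAI", "storyAI", "voiceAI"] },
  { name: "image", types: ["imageAI"] },
  { name: "video", types: ["videoAI"] },
  { name: "post-processing", types: ["videoCaptions", "videoMerge", "slideshow"] },
];

// Returns the non-empty batches in execution order. Grouping is by type only,
// so edge dependencies still need to be checked separately.
function planBatches(nodes: WorkflowNode[]): { name: string; nodeIds: string[] }[] {
  return BATCHES
    .map((b) => ({
      name: b.name,
      nodeIds: nodes.filter((n) => b.types.indexOf(n.type) !== -1).map((n) => n.id),
    }))
    .filter((b) => b.nodeIds.length > 0);
}
```

Each returned batch is a natural review point: run every node in it, present the combined results, then ask before moving to the next batch.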
115
+
116
+ ## Cost awareness
117
+
118
+ - Always check `get_credit_balance` before suggesting execution
119
+ - Use `get_pricing` to estimate costs before running expensive operations (video generation)
120
+ - Ask the user before executing — don't run nodes without confirmation
121
+ - Mention the approximate cost when proposing to run something
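
One way to turn these rules into a pre-run check is sketched below. The response shapes of `get_credit_balance` and `get_pricing` are not documented here, so the function takes plain numbers; `CostLine` and `summarizeCost` are hypothetical names.

```typescript
// Hypothetical per-node estimate, e.g. derived from a get_pricing lookup.
type CostLine = { nodeId: string; estimatedCredits: number };

// Sums the estimates, compares them with the balance, and builds the
// message to show the user before anything is executed.
function summarizeCost(
  balance: number,
  lines: CostLine[],
): { total: number; affordable: boolean; message: string } {
  const total = lines.reduce((sum, line) => sum + line.estimatedCredits, 0);
  const affordable = total <= balance;
  const message = affordable
    ? `This run will cost about ${total} credits (balance: ${balance}). Proceed?`
    : `This run needs about ${total} credits but the balance is ${balance}.`;
  return { total, affordable, message };
}
```

The message is only a proposal to the user; execution still waits for explicit confirmation.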
@@ -0,0 +1,108 @@
1
+ # AI Models
2
+
3
+ ## Model categories
4
+
5
+ Use `list_models` to get the current list. Use `get_model_capabilities` for full specs. Below is a reference for common guidance.
6
+
7
+ ### Text models
8
+
9
+ Used by `textAI` and `storyAI` nodes.
10
+
11
+ - **Google Gemini** models (various tiers — Flash Lite, Flash, Pro)
12
+ - Default is typically the fastest/cheapest Flash variant
13
+ - Use `list_models({ category: "text" })` to see what's available and enabled
14
+
15
+ **When to recommend**: Flash Lite for simple rewrites, Flash for general use, Pro for complex creative writing.
16
+
17
+ ### Image models
18
+
19
+ Used by `imageAI` nodes.
20
+
21
+ - **Google Imagen** — high quality, fast
22
+ - **Kling** — alternative provider
23
+ - Use `list_models({ category: "image" })` for current options
24
+
25
+ **When to recommend**: Imagen for general use. Check model capabilities for aspect ratio and resolution support.
26
+
27
+ ### Video models
28
+
29
+ Used by `videoAI` nodes. These are async — execution takes minutes.
30
+
31
+ - **Kling** models (various versions and quality tiers)
32
+ - **Veo** (Google) models — some have built-in audio
33
+ - Standard mode is faster/cheaper, Pro mode is higher quality
34
+ - Duration range is model-dependent (3–15s) — call `get_model_capabilities` to check supported durations
35
+ - **Not all models/modes accept image inputs (startFrame)**. Always call `get_model_capabilities` before connecting `imageAI → videoAI`. If the model doesn't support `startFrame`, either pick a different model/mode or skip the image-to-video connection.
36
+ - **Capabilities can be mode-dependent**. The static `inputs.startFrame: true` flag means the model _can_ support it — but actual availability may depend on `mode`/`resolution`. Check `constraintSummary` in the `get_model_capabilities` response. Example: **Kling v2.5 Turbo** only supports start/end frames in **PRO mode (1080p)** — at standard resolution they're disabled.
37
+ - Use `list_models({ category: "video" })` for current options
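
The start-frame check described in the bullets above might look like this. Only `inputs.startFrame` and `constraintSummary` are named in this guide; their exact types, and the assumption that a missing `constraintSummary` means no mode restrictions, are guesses.

```typescript
// Assumed subset of a get_model_capabilities response. Only `inputs.startFrame`
// and `constraintSummary` are named in the guide; their exact types are guesses.
type ModelCapabilities = {
  inputs?: { startFrame?: boolean };
  constraintSummary?: unknown;
};

type StartFrameVerdict = "unsupported" | "check-constraints" | "supported";

function startFrameVerdict(caps: ModelCapabilities): StartFrameVerdict {
  // No static flag: do not connect imageAI -> videoAI for this model.
  if (!caps.inputs || caps.inputs.startFrame !== true) return "unsupported";
  // Flag is set but constraints exist: support may depend on mode/resolution.
  if (caps.constraintSummary !== undefined && caps.constraintSummary !== null) {
    return "check-constraints";
  }
  return "supported";
}
```

A "check-constraints" verdict means the constraint summary has to be read before wiring `imageAI → videoAI`, as in the Kling v2.5 Turbo example.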
38
+
39
+ **When to recommend**: Latest Kling version in pro mode when using start frames. Veo for built-in audio. Always warn the user that video generation takes a few minutes.
40
+
41
+ ### Voice models
42
+
43
+ Used by `voiceAI` nodes.
44
+
45
+ - **ElevenLabs** — high-quality text-to-speech
46
+ - Use `list_voices` to browse available voices with previews
47
+ - Key settings: `presetVoiceId` (voice ID from `list_voices`), `stability`, `similarityBoost`
48
+
49
+ **When to recommend**: Let the user pick a voice from `list_voices` — accent and tone are personal preferences.
50
+
51
+ ## Prompt length limits
52
+
53
+ Each downstream AI model has a maximum prompt/text input size. When Text AI output feeds into another AI node, the output must fit within that model's limit — otherwise execution fails.
54
+
55
+ ### Limits by category
56
+
57
+ | Category | Model | Raw limit | Effective char limit |
58
+ | -------- | -------------------- | ------------- | -------------------- |
59
+ | Image | Imagen 4 (all tiers) | 480 tokens | ~1,400 chars |
60
+ | Image | Nanobanana | 32,768 tokens | ~130,500 chars |
61
+ | Image | Nanobanana Pro | 65,536 tokens | ~261,600 chars |
62
+ | Image | Kling (all) | 2,500 tokens | ~9,500 chars |
63
+ | Video | All models | 2,500 tokens | ~9,500 chars |
64
+ | Voice | Most ElevenLabs | 5,000 chars | ~4,500 chars |
65
+ | Voice | Flash v2.5 | 40,000 chars | ~39,500 chars |
66
+
67
+ _Effective limit = raw limit minus ~500 chars for system prompt overhead. Conversion: ~4 chars per token._
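
The conversion in the note above can be written out as a short calculation. The constants are the approximate values from the note, and the function name is illustrative.

```typescript
const CHARS_PER_TOKEN = 4;  // approximate conversion from the note above
const OVERHEAD_CHARS = 500; // approximate system prompt overhead

// Converts a raw limit (in tokens or chars) to the effective char limit.
function effectiveCharLimit(rawLimit: number, unit: "tokens" | "chars"): number {
  const rawChars = unit === "tokens" ? rawLimit * CHARS_PER_TOKEN : rawLimit;
  return rawChars - OVERHEAD_CHARS;
}
```

For Imagen 4, `effectiveCharLimit(480, "tokens")` gives 480 × 4 − 500 = 1420, which the table rounds to ~1,400. For most ElevenLabs models, `effectiveCharLimit(5000, "chars")` gives 4500.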
68
+
69
+ ### Setting maxOutputChars on Text AI
70
+
71
+ When connecting Text AI to a downstream AI node, **always** set `maxOutputChars` based on the downstream model's limit:
72
+
73
+ 1. Check the downstream model's limit via `get_model_capabilities`
74
+ 2. Set `maxOutputChars` to the effective char limit from the table above
75
+ 3. When feeding multiple downstream nodes, use the **lowest** limit
76
+
77
+ **Critical**: Imagen 4 has a very low limit (~1,400 chars). When Text AI feeds Imagen 4, set `maxOutputChars: 1400`.
78
+
79
+ Video models and Kling image models accept up to ~9,500 chars, so the default `maxOutputChars: 2000` is safe. Most ElevenLabs voice models accept ~4,500 chars (Flash v2.5 accepts ~39,500), so the default fits there too.
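
When one Text AI node feeds several downstream nodes, the "lowest limit" rule reduces to taking a minimum. A minimal sketch, with a hypothetical function name and the default of 2000 used as the fallback when nothing is connected:

```typescript
// Picks maxOutputChars for a Text AI node from the effective char limits
// of every downstream model it feeds. Falls back to the documented default.
function pickMaxOutputChars(downstreamLimits: number[], fallback: number = 2000): number {
  if (downstreamLimits.length === 0) return fallback;
  return Math.min(...downstreamLimits);
}
```

With Imagen 4 (1400), a video model (9500), and a voice model (4500) downstream, `pickMaxOutputChars([1400, 9500, 4500])` returns 1400.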
80
+
81
+ ## Choosing models
82
+
83
+ When the user doesn't specify a model:
84
+
85
+ 1. Use the default model for that category (marked `isDefault: true` in `list_models`)
86
+ 2. If they mention quality preferences, check `get_model_capabilities` for the right tier
87
+ 3. Mention which model you're using so they can change it
88
+
89
+ When the user asks about pricing:
90
+
91
+ 1. Use `get_pricing` filtered by operation and model
92
+ 2. Different models within the same category can have very different costs
93
+
94
+ ## Prompt templates
95
+
96
+ AI nodes use prompt templates to guide their behavior. Templates are server-side — the node sends the template slug, and the server resolves it.
97
+
98
+ - Use `list_prompt_templates({ nodeType: "textAI" })` to see available templates
99
+ - Set `promptTemplate: "custom"` + `customPrompt: "..."` for user-written prompts
100
+ - Templates are curated for specific use cases (scene description, script writing, etc.)
101
+
102
+ ## Style templates
103
+
104
+ The `globalStyle` node applies visual style templates across connected AI nodes.
105
+
106
+ - Use `list_style_templates` to browse options
107
+ - Set `style: "none"` for no style override
108
+ - Styles affect image and video generation prompts automatically