npm - @sogni-ai/sogni-creative-agent-skill - Versions diffs - 3.3.3 → 3.3.5 - Mend

@sogni-ai/sogni-creative-agent-skill 3.3.3 → 3.3.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/README.md +15 -5
package/SKILL.md +55 -29
package/generated/creative-agent-runtime.mjs +5 -5
package/llm.txt +3 -3
package/openclaw.plugin.json +1 -1
package/package.json +2 -2
package/skill-package.json +1 -1
package/sogni-agent.mjs +19 -3
package/version.mjs +1 -1

package/README.md CHANGED

@@ -322,9 +322,9 @@ sogni-agent --api-workflow storyboard-video "10s neon city flyover"
 # Local segment + concat with external soundtrack
 sogni-agent --video --workflow v2v --ref-video dance.mp4 \
-  --video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
+  --video-start 10 --duration 8 --controlnet-name pose -o ./clip-2.mp4 \
   "robot dancing"
-sogni-agent --concat-videos /tmp/final.mp4 /tmp/clip-1.mp4 /tmp/clip-2.mp4 \
+sogni-agent --concat-videos ./final.mp4 ./clip-1.mp4 ./clip-2.mp4 \
   --concat-audio song.mp3 --concat-audio-start 0
 # Balances and help
@@ -361,7 +361,7 @@ Run `sogni-agent --help` for the full CLI. Below are the options and tables most
 | `--workflow-max-cost <n>`, `--confirm-cost`, `--no-confirm-cost` | Set durable workflow capacity ceiling and explicit cost confirmation |
 | `--storyboard-frames <n>` | Beat count for `--api-workflow storyboard-video` |
 | `--video-prompt`, `--negative-prompt`, `--generate-audio`, `--expand-prompt` | Generated-keyframe durable workflow step controls |
-| `--watch-workflow`, `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Manage durable workflows |
+| `--watch-workflow`, `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>`, `--resume-workflow <id>` | Manage durable workflows |
 | `--api-tools <mode>`, `--no-api-tool-execution`, `--llm-model <id>`, `--task-profile <profile>`, `--max-tokens <n>`, `--thinking` / `--no-thinking`, `--api-base-url <url>` | Tune hosted API requests |
 | `--list-api-models`, `--get-api-model <id>` | Inspect Sogni Intelligence LLM models |
 | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|@path>` | Manage Sogni Intelligence replay records (use `@path` to load JSON from a file) |
@@ -415,6 +415,7 @@ Music generation uses `--music` and outputs `mp3` by default. `--audio` remains
 - **WAN models** use dimensions divisible by 16, min 480 px, max 1536 px.
 - **LTX family** (`ltx2-*`, `ltx23-*`) uses dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048 px on the long side.
 - **Seedance** runs at fixed 24 fps and supports 4–15 s durations. Other default/WAN paths support up to 10 s; LTX and WAN animate workflows support up to 20 s.
+- For spoken dialogue, budget roughly 3 words per second plus about 1 second for each meaningful acting beat or pause. Keep quoted speech under the model's hard per-clip word budget.
 - The script auto-normalizes video sizes to satisfy these constraints.
 - Use `--target-resolution <px>` for bare resolution requests like "720p" — it targets the short side and preserves the inherited aspect ratio.
 - Natural-language aspect requests like "portrait", "square", "16:9", or "9:16" are inferred when width/height aren't explicitly set. Combined requests like "720p 9:16" keep the requested short side while applying the requested shape.
@@ -530,8 +531,8 @@ Hosted API modes require `SOGNI_API_KEY`.
 - **`--api-workflow storyboard-video`** generates a storyline, creates a single GPT Image 2 storyboard sheet, then passes that artifact into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low/medium/high quality for that storyboard sheet.
 - **Media references** from `-c`, `--ref`, `--ref-end`, `--ref-audio`, `--reference-audio-identity`, and `--ref-video` are forwarded as `media_references` metadata in hosted API requests. API chat also attaches image refs as vision inputs. Local file references are uploaded to Sogni media storage first, then forwarded as retrievable URLs so durable executors do not depend on `data:` URI support. Durable workflow JSON can bind those references into step arguments with `sourceStepId: "$input_media"`. Use direct CLI mode for private media that must not leave the local machine.
 - **Cost controls** use `--workflow-max-cost <n>` to reject workflow starts above a capacity-unit ceiling, and `--confirm-cost` / `--no-confirm-cost` to forward explicit billing confirmation.
-- Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, and `--cancel-workflow`. Use `--workflow-input` to provide exact durable workflow JSON.
-- **Replay records** use `/v1/replay/records`: `--list-replays [limit]`, `--get-replay <runId>`, and `--ingest-replay <json|path|@path>` expose redacted RunRecord storage for Sogni Intelligence replay/debug viewers.
+- Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, `--cancel-workflow`, and `--resume-workflow`. Use `--workflow-input` to provide exact durable workflow JSON.
+- **Replay records** use `/v1/replay/records`: `--list-replays [limit]`, `--get-replay <runId>`, and `--ingest-replay <json|@path>` expose redacted RunRecord storage for Sogni Intelligence replay/debug viewers.
 Override the API origin with `--api-base-url`, `SOGNI_API_BASE_URL`, or `SOGNI_REST_ENDPOINT`.
 Hosted API credentials are only sent to `https://api.sogni.ai` by default. Add trusted custom
@@ -557,6 +558,15 @@ sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
 Options cycle sequentially per image. Without `{...}` syntax, `-n` produces multiple images with the same prompt.
+For video, use the same pattern when every output shares the same source/end assets and settings and only the prompt text varies:
+```bash
+sogni-agent --video --ref hero.png -n 3 --duration 5 \
+  "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
+```
+If each clip needs different source images, end frames, durations, audio slices, or other per-output settings, keep those as separate per-clip workflow arguments instead of collapsing them into a Dynamic Prompt branch.
 ---
 ## Token Auto-Fallback
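
Note on the Dynamic Prompt hunk above: the README states that options cycle sequentially per image. The following is a minimal JavaScript sketch of that cycling, assuming each `{...}` group picks option `i mod optionCount` for output `i`. `expandDynamicPrompt` and `expandAll` are hypothetical names used for illustration only, not functions from the package, and the CLI's actual handling of prompts with several groups may differ.

```javascript
// Hypothetical helper (not part of the package): resolve every `{a|b|c}` group
// in a template for the output at position `index`.
// Assumption: each group independently picks option (index mod optionCount).
function expandDynamicPrompt(template, index) {
  return template.replace(/\{([^{}]*)\}/g, (match, body) => {
    const options = body.split("|");
    // A brace group without any "|" is not a choice list; leave it untouched.
    if (options.length < 2) return match;
    return options[index % options.length];
  });
}

// Hypothetical helper: build the prompt list for `count` outputs (the `-n` value).
function expandAll(template, count) {
  return Array.from({ length: count }, (_, i) => expandDynamicPrompt(template, i));
}

console.log(expandAll("a {red|blue|green} sports car", 3));
```

With a single group and a count of 3, the sketch yields one prompt per option in order, which matches the red/blue/green expansion shown in the SKILL.md diff.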

package/SKILL.md CHANGED

@@ -1,8 +1,8 @@
 ---
 name: sogni-creative-agent-skill
-description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
+description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories, custom personality, style transfer, angle synthesis, Seedance/LTX/WAN video, music/lyrics, hosted chat, durable workflows, replay records, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
 metadata:
-  version: "3.3.3"
+  version: "3.3.5"
   homepage: https://sogni.ai
   clawdbot:
     emoji: "🎨"
@@ -110,6 +110,18 @@ ln -sfn node_modules/@sogni-ai/sogni-creative-agent-skill sogni-creative-agent-s
 When this skill is distributed via ClawHub, it bootstraps its local runtime dependencies from `skill-package.json` during install. That avoids relying on a root `package.json` being present in the published skill artifact.
+## Output Path Convention
+**Always save generated images, videos, and music to the user's current working directory (PWD), not `/tmp`.** Pass a relative path or bare filename to `-o`/`--output`:
+```bash
+sogni-agent -o ./cat.png "a cat wearing a hat"       # ✓ lands in PWD
+sogni-agent -o cat.png "a cat wearing a hat"         # ✓ lands in PWD
+sogni-agent -o /tmp/cat.png "a cat wearing a hat"    # ✗ avoid — user can't easily find it
+```
+`/tmp` (and `mkdtempSync(...)`) is reserved internally for transient intermediate files the CLI cleans up itself (audio re-encodes, intermediate clips during stitching). Final renders the user is asking for must remain inside their working directory unless they explicitly request a different location.
 ## Filesystem Paths and Overrides
 Default file paths used by this skill:
@@ -165,11 +177,15 @@ sogni-agent -Q pro "a cat wearing a hat"      # flux2_dev, 40 steps, 1024x1024 (
 sogni-agent -n 3 "a {red|blue|green} sports car"
 # → generates "a red sports car", "a blue sports car", "a green sports car"
+# Prompt-only video takes from the same source image
+sogni-agent --video --ref hero.png -n 3 --duration 5 \
+  "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
 # Token auto-fallback for native Sogni models (tries SPARK, falls back to SOGNI)
 sogni-agent --token-type auto "a cat wearing a hat"
-# Save to file
-sogni-agent -o /tmp/cat.png "a cat wearing a hat"
+# Save to file (relative paths land in the current working directory)
+sogni-agent -o ./cat.png "a cat wearing a hat"
 # JSON output (for scripting)
 sogni-agent --json "a cat wearing a hat"
@@ -181,7 +197,7 @@ sogni-agent --balance
 sogni-agent --json --balance
 # Quiet mode (suppress progress)
-sogni-agent -q -o /tmp/cat.png "a cat wearing a hat"
+sogni-agent -q -o ./cat.png "a cat wearing a hat"
 # Direct music/audio generation
 sogni-agent --music --duration 30 \
@@ -669,10 +685,10 @@ Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The f
 **Agent usage:**
 ```bash
 # Photobooth: stylize a face photo
-sogni-agent -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
+sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
 # Multiple photobooth outputs
-sogni-agent -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"
+sogni-agent -q --photobooth --ref /path/to/face.jpg -n 4 -o ./stylized.png "LinkedIn professional headshot"
 ```
 ## Multiple Angles (Turnaround)
@@ -691,7 +707,7 @@ sogni-agent --angles-360 -c subject.jpg --distance medium --elevation eye-level
   "studio portrait, same person"
 # 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
-sogni-agent --angles-360 --angles-360-video /tmp/turntable.mp4 \
+sogni-agent --angles-360 --angles-360-video ./turntable.mp4 \
   -c subject.jpg --distance medium --elevation eye-level \
   "studio portrait, same person"
 ```
@@ -721,7 +737,7 @@ When a user requests a "360 video", follow this workflow:
 4. **Example command**:
    ```bash
-   sogni-agent --angles-360 --angles-360-video /tmp/output.mp4 \
+   sogni-agent --angles-360 --angles-360-video ./output.mp4 \
      -c /path/to/image.png --elevation eye-level --distance medium \
      "description of subject"
    ```
@@ -876,6 +892,7 @@ Whenever the chosen video model is `ltx23-22b-fp8_t2v_distilled`, do not pass th
 - Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
 - Give the scene one main action thread from start to finish. Use connectors like `as`, `while`, and `then` so motion reads as a continuous filmed moment.
 - If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
+- Budget spoken dialogue at about 3 words per second, plus about 1 second for each meaningful acting beat or pause.
 - Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
 - Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like `beautiful` or `nice`, or structural markup such as `[DIALOGUE]`.
 - Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
@@ -924,35 +941,35 @@ When user asks to generate/draw/create an image:
 ```bash
 # Generate and save locally (use -Q for quality presets instead of memorizing model IDs)
-sogni-agent -q -Q fast -o /tmp/generated.png "user's prompt"
-sogni-agent -q -Q pro -o /tmp/generated.png "user's prompt"
+sogni-agent -q -Q fast -o ./generated.png "user's prompt"
+sogni-agent -q -Q pro -o ./generated.png "user's prompt"
 # Generate with prompt variations (diverse images in one call)
-sogni-agent -q -n 3 -o /tmp/cars.png "a {red|blue|green} sports car"
+sogni-agent -q -n 3 -o ./cars.png "a {red|blue|green} sports car"
 # Edit an existing image
-sogni-agent -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"
+sogni-agent -q -c /path/to/input.jpg -o ./edited.png "make it pop art style"
 # Generate video from image
-sogni-agent -q --video --ref /path/to/image.png -o /tmp/video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."
+sogni-agent -q --video --ref /path/to/image.png -o ./video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."
 # Generate text-to-video
-sogni-agent -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
+sogni-agent -q --video -o ./video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
 # Generate direct music/audio
-sogni-agent -q --music --duration 30 -o /tmp/music.mp3 "uplifting cinematic synthwave theme for a product launch"
+sogni-agent -q --music --duration 30 -o ./music.mp3 "uplifting cinematic synthwave theme for a product launch"
 # HD / "4K" text-to-video: prefer LTX-2.3
-sogni-agent -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
+sogni-agent -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o ./video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
 # HD / "4K" image-to-video: prefer LTX i2v
-sogni-agent -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."
+sogni-agent -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -w 1920 -h 1088 -o ./video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."
 # Photobooth: stylize a face photo
-sogni-agent -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
+sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
 # Token auto-fallback for native Sogni models (tries SPARK first, retries with SOGNI on insufficient balance)
-sogni-agent -q --token-type auto -o /tmp/generated.png "user's prompt"
+sogni-agent -q --token-type auto -o ./generated.png "user's prompt"
 # Check current SPARK/SOGNI balances (no prompt required)
 sogni-agent --json --balance
@@ -989,6 +1006,15 @@ sogni-agent -q -n 4 "a portrait in {oil painting|watercolor|pencil sketch|pop ar
 Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images with the same prompt.
+For video, use the same `{...}` + `-n` pattern when all outputs share the same source image, end image, duration, audio, and settings and only prompt text varies:
+```bash
+sogni-agent --video --ref hero.png -n 3 --duration 5 \
+  "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
+```
+If clips need different source images, end frames, durations, audio windows, or other per-output settings, keep them as separate per-clip workflow arguments. Do not force those into a single Dynamic Prompt branch.
 ### Token Auto-Fallback
 Use `--token-type auto` when the user's SPARK balance might be low. It tries SPARK first (free daily tokens) and automatically retries with SOGNI if insufficient.
@@ -1014,7 +1040,7 @@ When a user asks to **animate between two images**, use `--ref` (first frame) an
 ```bash
 # Animate from image A to image B
-sogni-agent -q --video --ref /tmp/imageA.png --ref-end /tmp/imageB.png -o /tmp/transition.mp4 "descriptive prompt of the transition"
+sogni-agent -q --video --ref ./imageA.png --ref-end ./imageB.png -o ./transition.mp4 "descriptive prompt of the transition"
 ```
 ### Animate a Video to an Image (Scene Continuation)
@@ -1023,15 +1049,15 @@ When a user asks to **animate from a video to an image** (or "continue" a video
 1. **Extract the last frame** of the existing video using the built-in safe wrapper:
    ```bash
-   sogni-agent --extract-last-frame /tmp/existing.mp4 /tmp/lastframe.png
+   sogni-agent --extract-last-frame ./existing.mp4 ./lastframe.png
    ```
 2. **Generate a new video** using the last frame as `--ref` and the target image as `--ref-end`:
    ```bash
-   sogni-agent -q --video --ref /tmp/lastframe.png --ref-end /tmp/target.png -o /tmp/continuation.mp4 "scene transition prompt"
+   sogni-agent -q --video --ref ./lastframe.png --ref-end ./target.png -o ./continuation.mp4 "scene transition prompt"
    ```
 3. **Concatenate the videos** using the built-in safe wrapper:
    ```bash
-   sogni-agent --concat-videos /tmp/full_sequence.mp4 /tmp/existing.mp4 /tmp/continuation.mp4
+   sogni-agent --concat-videos ./full_sequence.mp4 ./existing.mp4 ./continuation.mp4
    ```
 This ensures visual continuity — the new clip picks up exactly where the previous one ended.
@@ -1055,7 +1081,7 @@ When the final stitched output needs a single external soundtrack, add `--concat
   "width": 512,
   "height": 512,
   "urls": ["https://..."],
-  "localPath": "/tmp/cat.png"
+  "localPath": "./cat.png"
 }
 ```
@@ -1108,7 +1134,7 @@ sogni-agent --persona-list --json
 sogni-agent --persona-resolve "me" --json
 # Generate using a persona (auto-injects photo as context)
-sogni-agent --persona "Mark" -o /tmp/hero.png "superhero in dramatic lighting"
+sogni-agent --persona "Mark" -o ./hero.png "superhero in dramatic lighting"
 # Remove a persona
 sogni-agent --persona-remove "Mark"
@@ -1169,13 +1195,13 @@ Apply artistic styles to existing images:
 ```bash
 # Apply a named artist style
-sogni-agent -c photo.jpg -o /tmp/styled.png "Apply style: Andy Warhol pop art with bold primary colors"
+sogni-agent -c photo.jpg -o ./styled.png "Apply style: Andy Warhol pop art with bold primary colors"
 # Studio Ghibli transformation
-sogni-agent -c photo.jpg -o /tmp/ghibli.png "Apply style: Studio Ghibli watercolor with soft pastel sky and lush greenery"
+sogni-agent -c photo.jpg -o ./ghibli.png "Apply style: Studio Ghibli watercolor with soft pastel sky and lush greenery"
 # For photos with people, always preserve identity
-sogni-agent -c portrait.jpg -o /tmp/styled.png "Apply style: oil painting in the style of Vermeer. Preserve all facial features, expressions, and identity."
+sogni-agent -c portrait.jpg -o ./styled.png "Apply style: oil painting in the style of Vermeer. Preserve all facial features, expressions, and identity."
 ```
 **Tips:** Reference artists and styles BY NAME for best results. Use positive phrasing. For photos with people, always append identity preservation instructions.
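
Note on the dialogue budgeting rule added in this file (about 3 words per second, plus about 1 second per meaningful acting beat or pause): the arithmetic can be sketched in JavaScript as below. `countWords`, `estimateDialogueSeconds`, and `fitsClip` are hypothetical names, and rounding up to whole seconds is an assumption of this sketch. It does not model the model's hard per-clip word budget, which is enforced separately.

```javascript
// Hypothetical helper: count whitespace-separated words in a spoken line.
function countWords(line) {
  return line.trim().split(/\s+/).filter(Boolean).length;
}

// Rule of thumb from the diff: ~3 spoken words per second, plus ~1 second
// for each meaningful acting beat or pause.
// Assumption: the estimate is rounded up to a whole number of seconds.
function estimateDialogueSeconds(line, beats, wordsPerSecond = 3) {
  return Math.ceil(countWords(line) / wordsPerSecond + beats);
}

// Hypothetical check: does the estimated delivery time fit the clip duration?
function fitsClip(line, beats, clipSeconds) {
  return estimateDialogueSeconds(line, beats) <= clipSeconds;
}

const line = "We made it. I can't believe we made it.";
console.log(estimateDialogueSeconds(line, 1)); // 9 words / 3 + 1 beat = 4
console.log(fitsClip(line, 1, 5));             // true
```

If the estimate exceeds the requested duration, the options are to shorten the line or raise `--duration`, staying within the duration limits of the chosen video model.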

package/generated/creative-agent-runtime.mjs CHANGED

@@ -2147,7 +2147,7 @@ const PROMPT_CONTRACTS = [
         "contractId": "orbit_video_v1",
         "version": "1.0.0",
         "toolName": "orbit_video",
-        "baseDescription": "orbit_video is a self-contained pipeline that handles angle generation, video transitions,\nand stitching internally. If the user uploaded an image, call orbit_video directly — it uses\nthe upload as the front view. If no image exists yet, generate ONE front-view image first,\nthen call orbit_video. Never pre-generate multiple angles or variations for orbit_video.\n\nORBIT DIALOGUE: When the user wants spoken dialogue in an orbit video, ALWAYS use the\ndialogue parameter (NOT prompt). Dialogue goes in ONLY the specified segment — put\nmotion/foley in prompt. If the user says \"only in the first segment\" or \"just at the start\",\nset dialogueSegment=0 (default). Never put dialogue text in the prompt parameter — it will\nbe duplicated across all segments.\n\nORBIT ANGLES: Do NOT send the angles parameter for standard 360° orbits — omit it entirely.\nThe default (right side view, back view, left side view at 90° increments) is correct for all\nnormal orbit requests. Only send angles when the user explicitly asks for specific azimuth\npositions (e.g. \"show me from the front-right and back-left only\") or a partial orbit.\n\nORBIT DIALOGUE UPDATE: For dialogue in multiple/every orbit segment, before every 90-degree\nturn, or with per-turn sequence numbers, use the dialogues array instead of the single dialogue\nparameter. Default 360-degree orbit has 4 transitions, so provide 4 short lines in order; leave\nprompt for subject, action, ambient audio, and foley only. Preserve the real names from the\nrequest/prior result; never invent placeholder speaker tags. For a couple/persona request\nphrased as \"us\", \"we\", or \"my wife and I\", each per-turn line should make the named people\nspeak together. When the user picks a generated image by 1-based number (\"number 3\",\n\"use #3\"), pass sourceImageIndex as that number minus one (number 3 -> sourceImageIndex=2)\ninstead of omitting it.",
+        "baseDescription": "orbit_video is a self-contained pipeline that handles angle generation, video transitions,\nand stitching internally. If the user uploaded an image, call orbit_video directly — it uses\nthe upload as the front view. If no image exists yet, generate one front-view image first,\nthen call orbit_video. Avoid pre-generating multiple angles or variations for orbit_video\nunless the user explicitly asked to review custom source angles before orbiting.\n\nORBIT DIALOGUE: When the user wants spoken dialogue in an orbit video, use the\ndialogue parameter rather than prompt. Dialogue goes in the specified segment — put\nmotion/foley in prompt. If the user says \"only in the first segment\" or \"just at the start\",\nset dialogueSegment=0 (default). Never put dialogue text in the prompt parameter — it will\nbe duplicated across all segments.\n\nORBIT ANGLES: Do NOT send the angles parameter for standard 360° orbits — omit it entirely.\nThe default (right side view, back view, left side view at 90° increments) is correct for all\nnormal orbit requests. Only send angles when the user explicitly asks for specific azimuth\npositions (e.g. \"show me from the front-right and back-left only\") or a partial orbit.\n\nORBIT DIALOGUE UPDATE: For dialogue in multiple/every orbit segment, before every 90-degree\nturn, or with per-turn sequence numbers, use the dialogues array instead of the single dialogue\nparameter. Default 360-degree orbit has 4 transitions, so provide 4 short lines in order; leave\nprompt for subject, action, ambient audio, and foley only. Preserve the real names from the\nrequest/prior result; do not invent placeholder speaker tags. For a couple/persona request\nphrased as \"us\", \"we\", or \"my wife and I\", each per-turn line should make the named people\nspeak together. When the user picks a generated image by 1-based number (\"number 3\",\n\"use #3\"), pass sourceImageIndex as that number minus one (number 3 -> sourceImageIndex=2)\ninstead of omitting it.",
         "parameterDocs": {
             "dialogue": "Spoken dialogue for the first/default orbit segment. Do NOT put dialogue in prompt — it repeats across all segments.",
             "dialogues": "Per-segment dialogue lines array. Use for multi-segment dialogue (4 lines for full 360° orbit).",
@@ -2160,7 +2160,7 @@ const PROMPT_CONTRACTS = [
         "contractId": "animate_photo_v1",
         "version": "1.0.0",
         "toolName": "animate_photo",
-        "baseDescription": "animate_photo produces video from one or more source images using LTX 2.3.\n\nVIDEO PROMPT QUOTING: In video prompts, ONLY use double quotes for spoken dialogue.\nSpeaker tags are allowed outside the quotes for screenplay-style dialogue, e.g.\nCHARACTER: \"We made it.\" Never put on-screen text, overlay text, titles, captions, signs,\nwatermarks, or any visual text in quotes — describe them without quotes (e.g. bold white text\nreading CONGRATULATIONS overlays the lower third). Quotes signal speech to the model;\nquoting non-speech text confuses audio generation.\n\nDIALOGUE DURATION: Spoken dialogue in video prompts must fit the clip duration. Estimate\nat 2.5 words per second for natural cinematic delivery, plus ~1 second per acting beat\n(pauses, gestures, glances between lines). If the user did NOT explicitly request a specific\nduration (using default 5s), extend the duration to fit the dialogue (max 20s). If the user\nexplicitly requested a specific duration, condense the dialogue to fit while preserving meaning.\nAlways check: total dialogue words ÷ 2.5 + beat count ≤ clip duration.\n\nLATEST GENERATED IMAGE FOLLOW-UP: When the newest user turn asks to animate, make a video,\nor make a clip from a generated image/result (for example \"the apple\", \"this one\",\n\"the latest image\"), use animate_photo with that latest generated image. Do not inherit an\nolder Seedance model, resolution, or duration from an unrelated prior turn unless the newest\nuser turn explicitly says Seedance or confirms an immediately suggested Seedance video stage.\nLTX supports exact 2-20s durations, so honor requests like 3s exactly.\n\nWORD BUDGET PER CLIP: The handler REJECTS clips whose spoken dialogue exceeds the budget\n— there is NO auto-trim, so plan dialogue lengths up-front. Hard maximum is 3.75 spoken\nwords per second. Ceilings: 5s = 18 words, 6s = 22 words, 8s = 30 words, 10s = 37 words,\n15s = 56 words, 20s = 75 words. 
Aim below these ceilings. If a scene's dialogue won't fit,\ntighten the lines, raise the per-clip duration, or split into two segments — do NOT submit\nand hope it works. Spoken words inside double quotes count toward the budget; speaker tags\nand visual/action prose are free.\n\nBATCH VIDEO PER-CLIP DURATION: For a multi-segment animate_photo batch\n(sourceImageIndices + prompts) when the user states a TOTAL video length but NO per-clip\nlength, target 15 seconds per clip when dialogue is involved, and pass that duration\nexplicitly. Example: 60s total → 4 segments × 15s, NOT 6×10s or 12×5s. There is NO 3-clip\nbatch cap: sourceImageIndices supports up to 16 clips, so never split one planned batch into\n\"first 3\" and \"remaining clips\" calls. Do NOT split a planned 15s dialogue scene into multiple\nshorter clips just because a retry complains about word budget; keep duration=15 and tighten\nthe line. Use 5s clips only for single short motion beats or one very short spoken phrase.\nIf the user explicitly specifies a per-clip duration, honor that instead.\n\nN-VERSIONS-OF-A-VIDEO PATTERN: NEVER call animate_photo N times sequentially — ALWAYS\nuse sourceImageIndices in ONE call so all N projects run in parallel. Two flavors:\n(A) SHARED CONTENT — one edit_image/generate_image call with numberOfVariations=N + {|}\nDynamic Prompts to make N distinct source images, then ONE animate_photo call with\nsourceImageIndices=[start..start+N-1] and a single shared prompt.\n(B) PER-CLIP CONTENT — when each clip has DIFFERENT dialogue, jokes, narration, or motion,\npass BOTH sourceImageIndices AND prompts (array of N strings, one per clip) in the SAME\nsingle animate_photo call. The top-level prompt is still required — pass a brief batch summary.\nFor explicit last/end-frame-only batches, reuse the image through sourceImageIndices but set\nframeRole=\"end\" and omit endImageIndex/endImageIndices. 
This means each listed image is the\nlast frame for its corresponding clip and no first/start frame is supplied.\n\nCRITICAL: sourceImageIndices values MUST be read from the latest edit_image/generate_image\ntool result's startIndex field — if startIndex=3 and 4 images were generated, pass\nsourceImageIndices=[3,4,5,6], NOT [0,1,2,3]. Negative indices refer to uploaded images:\n-1 first upload, -2 second upload, -3 third upload. Use repeated -1 entries only when\nintentionally reusing the primary uploaded image. When prompts is supplied, prompts.length\nMUST equal sourceImageIndices.length.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: If the user uploaded a storyboard, shot sheet,\nor visual trailer board and asks to make a trailer/video/movie/clip from it, do NOT use\nanimate_photo on the board image and do NOT split it into four LTX clips. Use generate_video\nwith Seedance referenceImageIndices for one continuous clip unless the user explicitly asks\nfor separate LTX clips or first-frame/last-frame animation.\n\nSCREENPLAY / STORYBOARD ANIMATE RULE: For full storyboard projects, use one\nanimate_photo batch with sourceImageIndices + prompts so each clip keeps its own exact\nscene text, stable cast anchors, and screenplay-style speaker-tagged dialogue, and all video\nclips render in parallel. Every speaking clip's video prompt must include that clip's actual\nquoted dialogue, not placeholders such as \"while speaking\", \"dialogue begins\", \"explaining\",\nor \"final line lands\". 
If each generated scene keyframe should be both the first and last frame\nof its own stitched segment, call animate_photo with sourceImageIndices=[start..end],\nframeRole=\"both\", prompts=[...], and OMIT endImageIndex/endImageIndices so the handler\nuses each source as its own end frame.\n\nUPLOADED REFERENCE LOOPED SKITS: When the user supplies one uploaded reference image and\nasks for several scripted/storyboard/dialogue segments to reuse that same image as BOTH the\nfirst frame and last frame of each segment before stitching, do it in ONE animate_photo call:\nsourceImageIndices=[-1,-1,...], frameRole=\"both\", endImageIndex=-1 (or matching\nendImageIndices=[-1,-1,...]), duration equal to the requested per-segment duration, and\nprompts=[one full scene prompt per segment]. Each prompt must preserve the exact screenplay\nspeaker tags and quoted dialogue from that scene, e.g. HOST: \"...\" GUEST: \"...\". Do not\ndrop speaker tags, convert them to generic narration, omit the last-frame contract, analyze\nthe image first, generate new keyframes first, or split the batch into serial calls. After\nthe single animate_photo batch completes, call stitch_video with the returned video indices.\n\nFor adjacent transition chains: N images create N-1 clips — call animate_photo with\nframeRole=\"both\", sourceImageIndices=[start..end-1], endImageIndices=[start+1..end],\nprompts=[one transition prompt per adjacent pair], then stitch_video. 
If 5 uploaded images\nare the keyframe sequence, use sourceImageIndices=[-1,-2,-3,-4],\nendImageIndices=[-2,-3,-4,-5], frameRole=\"both\", prompts length 4, then stitch_video.\nDo NOT set endImageIndex=-1 in generated-keyframe patterns — that means every clip ends\non the primary uploaded image.\n\nUPLOADED FIRST-FRAME/LAST-FRAME TRANSITION CHAINS: If the user uploads multiple images\nand asks for a video that transitions from image to image, changes country/version every\nN seconds, or says to use first-frame/last-frame for each pair, call animate_photo directly.\nDo not call edit_image, generate_image, analyze_image, or map_assets_for_model first — the\nuploaded images are already the keyframes. For N uploaded images, create N-1 adjacent clips\nunless the user explicitly asks for a loop back to the first image. Use per-clip duration\nfrom \"every N seconds\" when present; otherwise divide the requested total by the number of\nadjacent clips. After animate_photo returns the batch videos, always call stitch_video with\nthose video indices before finalizing.",
+        "baseDescription": "animate_photo produces video from one or more source images using LTX 2.3.\n\nVIDEO PROMPT QUOTING: In video prompts, ONLY use double quotes for spoken dialogue.\nSpeaker tags are allowed outside the quotes for screenplay-style dialogue, e.g.\nCHARACTER: \"We made it.\" Never put on-screen text, overlay text, titles, captions, signs,\nwatermarks, or any visual text in quotes — describe them without quotes (e.g. bold white text\nreading CONGRATULATIONS overlays the lower third). Quotes signal speech to the model;\nquoting non-speech text confuses audio generation.\n\nDIALOGUE DURATION: Spoken dialogue in video prompts must fit the clip duration. Estimate\nat 3 words per second for natural cinematic delivery, plus ~1 second per acting beat\n(pauses, gestures, glances between lines). If the user did NOT explicitly request a specific\nduration (using default 5s), extend the duration to fit the dialogue (max 20s). If the user\nexplicitly requested a specific duration, condense the dialogue to fit while preserving meaning.\nCheck: total dialogue words ÷ 3 + beat count ≤ clip duration.\n\nLATEST GENERATED IMAGE FOLLOW-UP: When the newest user turn asks to animate, make a video,\nor make a clip from a generated image/result (for example \"the apple\", \"this one\",\n\"the latest image\"), use animate_photo with that latest generated image. Do not inherit an\nolder Seedance model, resolution, or duration from an unrelated prior turn unless the newest\nuser turn explicitly says Seedance or confirms an immediately suggested Seedance video stage.\nLTX supports exact 2-20s durations, so honor requests like 3s exactly.\n\nWORD BUDGET PER CLIP: The handler REJECTS clips whose spoken dialogue exceeds the budget\n— there is NO auto-trim, so plan dialogue lengths up-front. Hard maximum is 3.75 spoken\nwords per second. Ceilings: 5s = 18 words, 6s = 22 words, 8s = 30 words, 10s = 37 words,\n15s = 56 words, 20s = 75 words. Aim below these ceilings. 
If a scene's dialogue won't fit,\ntighten the lines, raise the per-clip duration, or split into two segments — do NOT submit\nand hope it works. Spoken words inside double quotes count toward the budget; speaker tags\nand visual/action prose are free.\n\nBATCH VIDEO PER-CLIP DURATION: For a multi-segment animate_photo batch\n(sourceImageIndices + prompts) when the user states a TOTAL video length but NO per-clip\nlength, target 15 seconds per clip when dialogue is involved, and pass that duration\nexplicitly. Example: 60s total → 4 segments × 15s, NOT 6×10s or 12×5s. There is NO 3-clip\nbatch cap: sourceImageIndices supports up to 16 clips, so keep one planned batch together\nunless the user explicitly wants isolated projects or per-output settings require it.\nDo NOT split a planned 15s dialogue scene into multiple\nshorter clips just because a retry complains about word budget; keep duration=15 and tighten\nthe line. Use 5s clips only for single short motion beats or one very short spoken phrase.\nIf the user explicitly specifies a per-clip duration, honor that instead.\n\nN-VERSIONS-OF-A-VIDEO PATTERN: Avoid sequential animate_photo calls for N outputs.\nPrefer one Dynamic Prompt project when only prompt text varies, and reserve\nsourceImageIndices multi-project fan-out for source/end asset differences, isolated retry\nlifecycle, or other per-output parameter differences. Flavors:\n(A0) SAME SOURCE / PROMPT-ONLY TAKES — use sourceImageIndex, numberOfVariations=N,\nand ONE Dynamic Prompt branch in prompt: \"{full prompt 1|full prompt 2|...}\". 
This\nsubmits shared settings and source assets once; Sogni socket creates N jobs in one project.\n(A) SHARED CONTENT — one edit_image/generate_image call with numberOfVariations=N + {|}\nDynamic Prompts to make N distinct source images, then ONE animate_photo call with\nsourceImageIndices=[start..start+N-1] and a single shared prompt.\n(B) PER-CLIP ASSET WIRING — when each clip has DIFFERENT source images, end frames,\naudio windows, durations, dimensions, or other non-prompt parameters,\npass BOTH sourceImageIndices AND prompts (array of N strings, one per clip) in the SAME\nsingle animate_photo call. The top-level prompt is still required — pass a brief batch summary.\nFor explicit last/end-frame-only batches, reuse the image through sourceImageIndices but set\nframeRole=\"end\" and omit endImageIndex/endImageIndices. This means each listed image is the\nlast frame for its corresponding clip and no first/start frame is supplied.\nIf the user explicitly asks for Dynamic Prompt / Dynamic Template syntax, prefer flavor A0\nwhenever every output uses the same source/end assets and shared settings, even if they also\nask to stitch the completed clips afterward.\n\nCRITICAL: sourceImageIndices values MUST be read from the latest edit_image/generate_image\ntool result's startIndex field — if startIndex=3 and 4 images were generated, pass\nsourceImageIndices=[3,4,5,6], NOT [0,1,2,3]. Negative indices refer to uploaded images:\n-1 first upload, -2 second upload, -3 third upload. Use repeated -1 entries only when\nintentionally reusing the primary uploaded image. When prompts is supplied, prompts.length\nMUST equal sourceImageIndices.length.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: If the user uploaded a storyboard, shot sheet,\nor visual trailer board and asks to make a trailer/video/movie/clip from it, do NOT use\nanimate_photo on the board image and do NOT split it into four LTX clips. 
Use generate_video\nwith Seedance referenceImageIndices for one continuous clip unless the user explicitly asks\nfor separate LTX clips or first-frame/last-frame animation.\n\nSCREENPLAY / STORYBOARD ANIMATE RULE: For full storyboard projects, use one\nanimate_photo batch with sourceImageIndices + prompts so each clip keeps its own exact\nscene text, stable cast anchors, and screenplay-style speaker-tagged dialogue, and all video\nclips render in parallel. Every speaking clip's video prompt must include that clip's actual\nquoted dialogue, not placeholders such as \"while speaking\", \"dialogue begins\", \"explaining\",\nor \"final line lands\". If each generated scene keyframe should be both the first and last frame\nof its own stitched segment, call animate_photo with sourceImageIndices=[start..end],\nframeRole=\"both\", prompts=[...], and OMIT endImageIndex/endImageIndices so the handler\nuses each source as its own end frame.\n\nUPLOADED REFERENCE LOOPED PROMPT-ONLY TAKES: When the user supplies one uploaded reference\nimage and asks for N clips that all reuse that same image as BOTH the first frame and last\nframe, with only the action/prompt text varying, do it in ONE Dynamic Prompt project:\nsourceImageIndex=-1, frameRole=\"both\", endImageIndex=-1, numberOfVariations=N, and prompt\nas ONE branch: \"{full prompt 1|full prompt 2|...|full prompt N}\". Do NOT use\nsourceImageIndices=[-1,-1,...] or prompts=[...] for this prompt-only shape. After the\nsingle animate_photo project completes, call stitch_video with the returned video indices\nif the user requested a final stitched video.\nEach Dynamic Prompt option must be a complete natural-language motion prompt for the video\nmodel. 
Do not include orchestration labels such as \"clip 1 of N\", \"overall request context\",\n\"use the uploaded image as the source frame\", \"follow the user request\", or \"make this clip\ndistinct\"; the attached first/last-frame image is already wired through arguments.\nKeep explicit shared constraints outside the branch before \"{...}\" and honor them inside every\noption: locked/static camera, subtle motion, consistent flames/embers, and silent expression-only\nphysical performance when requested. Non-speaking expressions such as yawns, smiles, kisses, or shy\nmouth-covering gestures are acceptable when they are the requested physical performance; do not\nturn them into speech, singing, or dialogue-like lip motion.\nIf the user gives avoid/no/don't constraints for a Dynamic Prompt batch, the shared prefix before\nthe branch must include their positive equivalents, such as single-subject empty background, clean\nblank surfaces, crisp sharp focus, same room/layout continuity, silent expression-only physical\nperformance, consistent flame/ember motion, and natural anatomically consistent hands. Do not place\nthose constraints only in negativePrompt or omit them from prompt.\nFor videoModel=\"wan22\", write motion-only visual prompts. WAN 2.2 does not generate audio,\nso omit soundtrack, ambience, room tone, music, hums, sighs, spoken words, voice, and SFX cues.\nFor videoModel=\"wan22\" and \"ltx23\", the prompt field is the positive prompt. Translate user\navoid/no/don't constraints into affirmative production constraints instead of copying negative\nphrasing into prompt. Examples: \"no people in background\" -> single subject focus with an empty\nbackground; \"no text\" -> clean blank surfaces; \"don't make it blurry\" -> crisp sharp focus;\n\"no weird hands\" -> natural anatomically consistent hands; \"don't change the room\" -> the same\nroom and layout remain consistent. 
Preserve exact quoted visible text or dialogue when the user\nexplicitly requests it, and keep surrounding surfaces blank.\n\nUPLOADED REFERENCE LOOPED SCRIPTED SKITS: When the user supplies one uploaded reference image\nand asks for several scripted/storyboard/dialogue segments to reuse that same image as BOTH\nthe first frame and last frame of each segment before stitching, do it in ONE animate_photo call:\nsourceImageIndices=[-1,-1,...], frameRole=\"both\", endImageIndex=-1 (or matching\nendImageIndices=[-1,-1,...]), duration equal to the requested per-segment duration, and\nprompts=[one full scene prompt per segment]. Each prompt must preserve the exact screenplay\nspeaker tags and quoted dialogue from that scene, e.g. HOST: \"...\" GUEST: \"...\". Do not\ndrop speaker tags, convert them to generic narration, omit the last-frame contract, analyze\nthe image first, generate new keyframes first, or split the batch into serial calls. After\nthe single animate_photo batch completes, call stitch_video with the returned video indices.\n\nFor adjacent transition chains: N images create N-1 clips — call animate_photo with\nframeRole=\"both\", sourceImageIndices=[start..end-1], endImageIndices=[start+1..end],\nprompts=[one transition prompt per adjacent pair], then stitch_video. 
If 5 uploaded images\nare the keyframe sequence, use sourceImageIndices=[-1,-2,-3,-4],\nendImageIndices=[-2,-3,-4,-5], frameRole=\"both\", prompts length 4, then stitch_video.\nDo NOT set endImageIndex=-1 in generated-keyframe patterns — that means every clip ends\non the primary uploaded image.\n\nUPLOADED FIRST-FRAME/LAST-FRAME TRANSITION CHAINS: If the user uploads multiple images\nand asks for a video that transitions from image to image, changes country/version every\nN seconds, or says to use first-frame/last-frame for each pair, call animate_photo directly.\nDo not call edit_image, generate_image, analyze_image, or map_assets_for_model first — the\nuploaded images are already the keyframes. For N uploaded images, create N-1 adjacent clips\nunless the user explicitly asks for a loop back to the first image. Use per-clip duration\nfrom \"every N seconds\" when present; otherwise divide the requested total by the number of\nadjacent clips. After animate_photo returns the batch videos, call stitch_video with\nthose video indices before finalizing unless the user explicitly asked to keep separate clips only.",
         "parameterDocs": {
             "sourceImageIndices": "Batch source image indices. Read startIndex from prior generate_image/edit_image result. Negative = uploaded images (-1 = first upload). May be paired with frameRole=\"end\" only for explicit last/end-frame-only fan-out.",
             "prompts": "Per-clip prompt array. Length MUST equal sourceImageIndices.length when both are set.",
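Reviewer note: the index rules in the animate_photo contract above (read `startIndex` from the latest image result, negative indices for uploads, N images giving N-1 adjacent clips, `prompts.length` equal to `sourceImageIndices.length`) can be sketched as a few small helpers. This is a minimal illustration only. The function names `generatedBatchIndices`, `uploadedTransitionChain`, and `assertPromptsMatch` are hypothetical and do not appear in the package; only the parameter names `sourceImageIndices`, `endImageIndices`, `frameRole`, and `prompts` come from the contract text.

```javascript
// Hypothetical helpers illustrating the index rules; not part of the package.

// Indices for a generated batch: start at the tool result's startIndex,
// not at 0.
function generatedBatchIndices(startIndex, count) {
  return Array.from({ length: count }, (_, i) => startIndex + i);
}

// Adjacent transition chain over uploaded images: uploads are addressed as
// -1, -2, ..., -N, and N images produce N-1 clips.
function uploadedTransitionChain(uploadCount) {
  if (uploadCount < 2) {
    throw new Error("a transition chain needs at least 2 images");
  }
  const uploads = Array.from({ length: uploadCount }, (_, i) => -(i + 1));
  return {
    frameRole: "both",
    sourceImageIndices: uploads.slice(0, -1), // every image except the last
    endImageIndices: uploads.slice(1),        // every image except the first
  };
}

// The contract requires one prompt per source index when prompts is supplied.
function assertPromptsMatch(sourceImageIndices, prompts) {
  if (prompts.length !== sourceImageIndices.length) {
    throw new Error(
      `prompts.length (${prompts.length}) must equal ` +
      `sourceImageIndices.length (${sourceImageIndices.length})`
    );
  }
}
```

With `startIndex=3` and 4 generated images this yields `[3,4,5,6]`, and 5 uploads yield sources `[-1,-2,-3,-4]` with ends `[-2,-3,-4,-5]`, matching the examples in the contract text.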
@@ -2184,7 +2184,7 @@ const PROMPT_CONTRACTS = [
         "contractId": "generate_video_v1",
         "version": "1.1.0",
         "toolName": "generate_video",
-        "baseDescription": "generate_video produces text-to-video clips and Seedance multimodal reference videos.\nUse for text-only video generation with no source image input. For Seedance, also use this\ntool when uploaded/generated images, videos, or audio are loose references. Use animate_photo\nonly when a non-Seedance source image must become the first frame of an LTX/WAN animation.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: When the user uploads a storyboard, shot sheet,\nmood board, or trailer concept image and asks to make a movie trailer/video/clip from it,\ndefault to one Seedance generate_video call with referenceImageIndices=[-1]. Do not first\nextract panels with edit_image, do not generate replacement keyframes, and do not make four\nseparate LTX animate_photo clips unless the user explicitly asks for separate clips or LTX.\nUse seedance2 when premium Spark access is available; if premium access is unavailable,\nexplain the limitation or use the best non-Seedance fallback the user accepts.\n\nEXACT / INCLUDED VIDEO PROMPTS: If the user asks for a Seedance video using uploaded or\ngenerated references and says to use a prompt exactly, pass only that literal quoted prompt\nto generate_video and set skipPromptProcessing=true plus expandPrompt=false. Do not treat\nwords inside the literal prompt, such as storyboard, script, thumbnails, or panels, as a\nrequest to create a storyboard image. If the user includes a timecoded script inside a\nvideo request, keep it in the generate_video prompt. Explicit constraints like no storyboard\npanels, no subtitles, or no captions are constraints on the video render, not instructions\nto call edit_image or generate_image.\n\nSTORYTELLING / COMMERCIAL / TRAILER PROMPTS: For creative video requests, turn the brief\ninto timed, causally connected visual beats before writing the final prompt. Default social\nvideo is 15s 9:16 with a strong first 1-2s, visible escalation, payoff, and brand/CTA/final\nimage. 
Commercials should show audience desire/problem, transformation, proof/benefit, and\nCTA. Trailers should follow hook → world → disruption → escalation → reveal → title/CTA.\nEvery beat must be generatable: subject, setting, action, camera, lighting, audio, and text\nrole where relevant. Avoid vague \"cinematic\" filler, feature dumps, and beautiful images with\nno visible change.\n\nVIDEO PROMPT QUOTING: ONLY use double quotes for spoken dialogue in video prompts. Never\nquote on-screen text, titles, captions, or visual text elements — describe them without\nquotes. Quotes signal speech to the model and confuse audio generation.\n\nSTORYBOARD TEXT: Structural headings, section numbers, slide titles, panel titles, and\ncaptions in storyboard references may become short audio-only narration/VO or\nkey-message beats, but they are not subtitles, title cards, lower thirds, or visible\noverlays unless the user explicitly asks for visible text, on-screen text, a title\ncard, subtitle, lower third, signage, or CTA. Keep narration as separate brief phrases\nwith pauses; do not concatenate storyboard labels into run-on voiceover.\n\nDIALOGUE DURATION: Spoken dialogue must fit the clip. Estimate 2.5 words per second\nnatural delivery plus ~1s per acting beat. Hard maximum 3.75 words/second.\nCheck: dialogue words ÷ 2.5 + beats ≤ duration. Do not submit oversized dialogue.\n\nLATEST USER DURATION WINS: In follow-up turns, use the newest duration the user states,\neven if a previous assistant message mentioned a longer script/runtime. For example, if\nhistory says \"the full script is 66 seconds\" but the user now says \"do a 30 second version\",\ngenerate the 30 second version. Do not ask a clarification question just because history\ncontains another duration; treat the latest user request as the override.\n\nSEEDANCE SHORT-DURATION LIMIT: Seedance supports 4-15s clips. If the user explicitly asks\nfor Seedance below 4s, do not silently round up. 
Ask whether they prefer a 4s Seedance clip\nor an exact-duration LTX clip. If the user did not explicitly ask for Seedance, choose the\nmodel/tool that can satisfy the requested duration exactly.",
+        "baseDescription": "generate_video produces text-to-video clips and Seedance multimodal reference videos.\nUse for text-only video generation with no source image input. For Seedance, also use this\ntool when uploaded/generated images, videos, or audio are loose references. Use animate_photo\nonly when a non-Seedance source image must become the first frame of an LTX/WAN animation.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: When the user uploads a storyboard, shot sheet,\nmood board, or trailer concept image and asks to make a movie trailer/video/clip from it,\ndefault to one Seedance generate_video call with referenceImageIndices=[-1]. Do not first\nextract panels with edit_image, do not generate replacement keyframes, and do not make four\nseparate LTX animate_photo clips unless the user explicitly asks for separate clips or LTX.\nUse seedance2 when premium Spark access is available; if premium access is unavailable,\nexplain the limitation or use the best non-Seedance fallback the user accepts.\n\nEXACT / INCLUDED VIDEO PROMPTS: If the user asks for a Seedance video using uploaded or\ngenerated references and says to use a prompt exactly, pass only that literal quoted prompt\nto generate_video and set skipPromptProcessing=true plus expandPrompt=false. Do not treat\nwords inside the literal prompt, such as storyboard, script, thumbnails, or panels, as a\nrequest to create a storyboard image. If the user includes a timecoded script inside a\nvideo request, keep it in the generate_video prompt. Explicit constraints like no storyboard\npanels, no subtitles, or no captions are constraints on the video render, not instructions\nto call edit_image or generate_image.\n\nSTORYTELLING / COMMERCIAL / TRAILER PROMPTS: For creative video requests, turn the brief\ninto timed, causally connected visual beats before writing the final prompt. Default social\nvideo is 15s 9:16 with a strong first 1-2s, visible escalation, payoff, and brand/CTA/final\nimage. 
Commercials should show audience desire/problem, transformation, proof/benefit, and\nCTA. Trailers should follow hook → world → disruption → escalation → reveal → title/CTA.\nEvery beat must be generatable: subject, setting, action, camera, lighting, audio, and text\nrole where relevant. Avoid vague \"cinematic\" filler, feature dumps, and beautiful images with\nno visible change.\n\nVIDEO PROMPT QUOTING: ONLY use double quotes for spoken dialogue in video prompts. Never\nquote on-screen text, titles, captions, or visual text elements — describe them without\nquotes. Quotes signal speech to the model and confuse audio generation.\n\nSTORYBOARD TEXT: Structural headings, section numbers, slide titles, panel titles, and\ncaptions in storyboard references may become short audio-only narration/VO or\nkey-message beats, but they are not subtitles, title cards, lower thirds, or visible\noverlays unless the user explicitly asks for visible text, on-screen text, a title\ncard, subtitle, lower third, signage, or CTA. Keep narration as separate brief phrases\nwith pauses; do not concatenate storyboard labels into run-on voiceover.\n\nDIALOGUE DURATION: Spoken dialogue must fit the clip. Estimate 3 words per second\nnatural delivery plus ~1s per acting beat. Hard maximum 3.75 words/second.\nCheck: dialogue words ÷ 3 + beats ≤ duration. Do not submit oversized dialogue.\n\nLATEST USER DURATION WINS: In follow-up turns, use the newest duration the user states,\neven if a previous assistant message mentioned a longer script/runtime. For example, if\nhistory says \"the full script is 66 seconds\" but the user now says \"do a 30 second version\",\ngenerate the 30 second version. Do not ask a clarification question just because history\ncontains another duration; treat the latest user request as the override.\n\nSEEDANCE SHORT-DURATION LIMIT: Seedance supports 4-15s clips. If the user explicitly asks\nfor Seedance below 4s, do not silently round up. 
Ask whether they prefer a 4s Seedance clip\nor an exact-duration LTX clip. If the user did not explicitly ask for Seedance, choose the\nmodel/tool that can satisfy the requested duration exactly.",
         "parameterDocs": {
             "prompt": "Video prompt. Use double quotes ONLY for spoken dialogue. Describe visual text without quotes.",
             "duration": "Clip duration in seconds. Plan dialogue word count against the 3.75 words/second ceiling."
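Reviewer note: the dialogue rules in the contracts above combine a planning estimate (3 words per second plus about 1 second per acting beat) with a hard ceiling of 3.75 spoken words per second. The sketch below shows one way to pre-check a prompt before submitting. It is an assumption-laden illustration, not the handler's actual logic: the helper names are hypothetical, flooring the ceiling is inferred from the listed values (5s = 18, 10s = 37, 15s = 56), and counting only words inside straight double quotes is a simplification of "spoken words inside double quotes count toward the budget".

```javascript
// Hypothetical pre-check for dialogue budgets; not the package's handler.
const MAX_WORDS_PER_SECOND = 3.75;   // hard ceiling stated in the contract
const NATURAL_WORDS_PER_SECOND = 3;  // planning estimate stated in the contract

// Count words inside straight double quotes. Speaker tags and action prose
// outside the quotes are ignored (assumed to match "free" in the contract).
function countSpokenWords(prompt) {
  const quoted = prompt.match(/"[^"]*"/g) || [];
  return quoted
    .flatMap((q) => q.slice(1, -1).trim().split(/\s+/).filter(Boolean))
    .length;
}

// Ceiling for a clip duration; floor is an inference from the listed table.
function wordCeiling(durationSeconds) {
  return Math.floor(MAX_WORDS_PER_SECOND * durationSeconds);
}

// Contract check: dialogue words / 3 + beat count <= clip duration.
function fitsNaturalDelivery(words, beats, durationSeconds) {
  return words / NATURAL_WORDS_PER_SECOND + beats <= durationSeconds;
}
```

For example, `countSpokenWords('CHARACTER: "We made it." She smiles.')` counts only the three quoted words, and `wordCeiling(15)` gives the 56-word ceiling listed for 15s clips.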
@@ -2194,7 +2194,7 @@ const PROMPT_CONTRACTS = [
         "contractId": "edit_image_v1",
         "version": "1.0.0",
         "toolName": "edit_image",
-        "baseDescription": "edit_image applies instruction-based edits to uploaded or generated images. Use when\nuploaded or reference images must guide identity or likeness.\n\nImage-to-Image prompt order: [IDENTITY LOCK] → [REQUESTED EDIT] → [REFERENCE ROLE\nMAPPING] → [POSE/COMPOSITION] → [STYLE] → [LIGHTING/REALISM] → [PRESERVE ALL\nUNMENTIONED DETAILS]. GOLDEN RULE: When editing a person, always state which image owns\nidentity — never leave identity ambiguous. Describe only the DELTA — what changes. Don't\nrewrite the entire image; the base image already contains most of the truth. Default to minimal\nchange. For multi-image edits, assign ONE primary role per reference image (identity, pose,\noutfit, style, environment). Never let a style/pose/clothing reference silently override the face.\nUse positive constraints — \"preserve exact facial likeness, face structure, eye shape, nose\nshape, mouth shape, jawline, skin tone, hairline, apparent age, and overall recognizability\"\n— not vague negatives like \"don't mess up the face\".\n\nUPLOADED IMAGE VARIANT SETS: When the user supplies a photo/portrait/reference image and\nasks for N distinct generated images deriving from that source while changing paired\nper-output details, call edit_image exactly once with sourceImageIndex=-1,\nnumberOfVariations=N, and ONE Dynamic Prompt branch with N complete options. Each option\nmust be a full concrete image prompt for one output, including the uploaded subject/reference\nanchor, requested pose or placement preservation, the specific changed appearance/style/role,\nclothing or surface details when relevant, setting/background, and any requested label text or\nvisual symbol. 
If one option is a remade original/preserved source and the rest are themed\nvariants, the original option must explicitly say to preserve the original clothing/wardrobe/outfit\nand background/setting, plus any requested added label, flag, logo, symbol, or prop.\nDo not call generate_image, analyze_image, or multiple serial edit_image calls first.\n\nSELECTION-GATED IMAGE STAGES: If the user asks for N image options and says they will pick\none before a later dance/video/animation, call edit_image exactly once with numberOfVariations=N\nand one Dynamic Prompt branch. After images are created, stop and ask the user to choose;\ndo not call dance_montage, animate_photo, or generate_video until they select.\n\nMULTI-PERSONA (COMBINED): When multiple personas must appear in the SAME scene, make\nONE edit_image call with ALL persona faces in one prompt and DO NOT pass personaName.\nPer-persona splits (one call each with personaName set) are RARE — only when the user\nexplicitly asks for solo images of each person individually.\n\nSTORYBOARD IMAGE BATCH RULE: When rendering scene keyframes from a screenplay/storyboard,\nnumberOfVariations is only the count; the prompt MUST be one Dynamic Prompt branch with one\nfull keyframe prompt per scene:\n{scene 1 full keyframe prompt|scene 2 full keyframe prompt|...|scene N full keyframe prompt}.\nNEVER set numberOfVariations=N with only the first scene prompt — that creates N versions of\nscene 1. 
For full project requests, one edit_image batch for all scene keyframes, then one\nanimate_photo batch for all video clips in parallel.\nException: if the storyboard/shot sheet is already uploaded and the user asks to make a\ntrailer/video/movie/clip from that uploaded board, do not extract panels or redraw keyframes.\nUse generate_video with Seedance references for one continuous clip unless the user explicitly\nasks for separate image keyframes or a storyboard sheet output.\n\nDIRECT UPLOADED GPT IMAGE 2 STORYBOARD SHEETS: If the user uploaded reference images and\nasks for one finished GPT Image 2 storyboard/keyframe sheet now, call edit_image directly\nwith sourceImageIndex=-1, model=\"gpt-image-2\", numberOfVariations=1, and the requested\ncanvas/aspect settings. If the user did not explicitly specify a storyboard page/canvas/sheet\nshape, default the GPT Image 2 storyboard sheet pixel dimensions to a balanced grid that hosts\nthe target cell aspect ratio natively (e.g., 12 cells with 9:16 portrait video target -> ~3:4\nportrait sheet around 1728x2304; 12 cells with 16:9 landscape video target -> ~4:3 landscape\nsheet around 2304x1728; 6 cells with 9:16 target -> ~27:32 portrait sheet around 1840x2176). Do\nNOT default the sheet to 2560x1440 landscape when cells are portrait — a landscape sheet with\na portrait-cell grid physically forces cells to ~4:3 landscape and the model will not render\n9:16 portrait rectangles inside it. Keep individual scene-cell/frame areas at the target video\naspect ratio. Do not call map_assets_for_model, analyze_image, generate_image, or a separate\nplanning tool first. 
The uploaded files are already available as references; describe their\nroles plainly in the edit_image prompt and generate the sheet in that call.\n\nDO NOT USE edit_image FOR UPLOADED REFERENCE LOOPED VIDEO SEGMENTS: If the user says the\nsame uploaded image/reference should be reused as the first frame and last frame of each\nscripted segment/scene/clip before stitching, they are explicitly asking to animate the\nuploaded image, not to generate new storyboard keyframes. Do not call edit_image for that\nrequest. Call animate_photo once with repeated uploaded source indices and per-scene prompts.",
+        "baseDescription": "edit_image applies instruction-based edits to uploaded or generated images. Use when\nuploaded or reference images must guide identity or likeness.\n\nImage-to-Image prompt order: [IDENTITY LOCK] → [REQUESTED EDIT] → [REFERENCE ROLE\nMAPPING] → [POSE/COMPOSITION] → [STYLE] → [LIGHTING/REALISM] → [PRESERVE ALL\nUNMENTIONED DETAILS]. GOLDEN RULE: When editing a person, state which image owns\nidentity so it is not ambiguous. Describe only the DELTA — what changes. Don't\nrewrite the entire image; the base image already contains most of the truth. Default to minimal\nchange. For multi-image edits, assign one primary role per reference image (identity, pose,\noutfit, style, environment). Do not let a style/pose/clothing reference silently override the face.\nUse positive constraints — \"preserve exact facial likeness, face structure, eye shape, nose\nshape, mouth shape, jawline, skin tone, hairline, apparent age, and overall recognizability\"\n— not vague negatives like \"don't mess up the face\".\n\nUPLOADED IMAGE VARIANT SETS: When the user supplies a photo/portrait/reference image and\nasks for N distinct generated images deriving from that source while changing paired\nper-output details, prefer one edit_image call with sourceImageIndex=-1,\nnumberOfVariations=N, and ONE Dynamic Prompt branch with N complete options. Each option\nmust be a full concrete image prompt for one output, including the uploaded subject/reference\nanchor, requested pose or placement preservation, the specific changed appearance/style/role,\nclothing or surface details when relevant, setting/background, and any requested label text or\nvisual symbol. 
If one option is a remade original/preserved source and the rest are themed\nvariants, the original option should explicitly say to preserve the original clothing/wardrobe/outfit\nand background/setting, plus any requested added label, flag, logo, symbol, or prop.\nDo not call generate_image, analyze_image, or multiple serial edit_image calls first.\n\nSELECTION-GATED IMAGE STAGES: If the user asks for N image options and says they will pick\none before a later dance/video/animation, prefer one edit_image call with numberOfVariations=N\nand one Dynamic Prompt branch. After images are created, stop and ask the user to choose\nunless the user explicitly asked to run the later stage immediately.\n\nMULTI-PERSONA (COMBINED): When multiple personas must appear in the SAME scene, make\none edit_image call with all persona faces in one prompt and omit personaName.\nPer-persona splits (one call each with personaName set) are RARE — only when the user\nexplicitly asks for solo images of each person individually.\n\nSTORYBOARD IMAGE BATCH RULE: When rendering scene keyframes from a screenplay/storyboard,\nnumberOfVariations is only the count; the prompt should be one Dynamic Prompt branch with one\nfull keyframe prompt per scene:\n{scene 1 full keyframe prompt|scene 2 full keyframe prompt|...|scene N full keyframe prompt}.\nDo not set numberOfVariations=N with only the first scene prompt — that creates N versions of\nscene 1. 
For full project requests, one edit_image batch for all scene keyframes, then one\nanimate_photo batch for all video clips in parallel.\nException: if the storyboard/shot sheet is already uploaded and the user asks to make a\ntrailer/video/movie/clip from that uploaded board, do not extract panels or redraw keyframes.\nUse generate_video with Seedance references for one continuous clip unless the user explicitly\nasks for separate image keyframes or a storyboard sheet output.\n\nDIRECT UPLOADED GPT IMAGE 2 STORYBOARD SHEETS: If the user uploaded reference images and\nasks for one finished GPT Image 2 storyboard/keyframe sheet now, call edit_image directly\nwith sourceImageIndex=-1, model=\"gpt-image-2\", numberOfVariations=1, and the requested\ncanvas/aspect settings. If the user did not explicitly specify a storyboard page/canvas/sheet\nshape, default the GPT Image 2 storyboard sheet pixel dimensions to a balanced grid that hosts\nthe target cell aspect ratio natively (e.g., 12 cells with 9:16 portrait video target -> ~3:4\nportrait sheet around 1728x2304; 12 cells with 16:9 landscape video target -> ~4:3 landscape\nsheet around 2304x1728; 6 cells with 9:16 target -> ~27:32 portrait sheet around 1840x2176). Do\nNOT default the sheet to 2560x1440 landscape when cells are portrait — a landscape sheet with\na portrait-cell grid physically forces cells to ~4:3 landscape and the model will not render\n9:16 portrait rectangles inside it. Keep individual scene-cell/frame areas at the target video\naspect ratio. Do not call map_assets_for_model, analyze_image, generate_image, or a separate\nplanning tool first. 
The uploaded files are already available as references; describe their\nroles plainly in the edit_image prompt and generate the sheet in that call.\n\nDO NOT USE edit_image FOR UPLOADED REFERENCE LOOPED VIDEO SEGMENTS: If the user says the\nsame uploaded image/reference should be reused as the first frame and last frame of each\nscripted segment/scene/clip before stitching, they are explicitly asking to animate the\nuploaded image, not to generate new storyboard keyframes. Do not call edit_image for that\nrequest. Call animate_photo once with repeated uploaded source indices and per-scene prompts.",
         "parameterDocs": {
             "sourceImageIndex": "Index of uploaded/generated image. Use -1 for the first uploaded image.",
             "numberOfVariations": "Number of output variants. When > 1, use a Dynamic Prompt branch with one complete prompt per output.",
@@ -2205,7 +2205,7 @@ const PROMPT_CONTRACTS = [
         "contractId": "generate_image_v1",
         "version": "1.1.0",
         "toolName": "generate_image",
-        "baseDescription": "generate_image creates images from text descriptions. Use for text-only image generation;\nuse edit_image when uploaded or reference images must guide identity/likeness.\nException: Z-image and Z-image Turbo image-to-image/enhancement requests use generate_image\nwith model=\"z-turbo\" or model=\"z-image\", sourceImageIndex=-1, and starting_image_strength;\ndo not route explicit Z-image Turbo uploaded-image enhancement to edit_image because\nedit_image does not expose Z-image models.\n\nBATCH FAN-OUT (HIGHEST-PRIORITY RULE — READ BEFORE ANYTHING ELSE BELOW):\nWhen the user explicitly asks for N images in the CURRENT turn, set numberOfVariations=N\nin ONE call. NEVER split into multiple serial generate_image calls. NEVER omit\nnumberOfVariations and try to \"generate the next one after this finishes\". Trigger phrasings:\n\"draw N\", \"make N\", \"give me N\", \"show me N\", \"render N\", \"create N\", \"generate N\",\n\"N more\", \"another N\", \"N as separate\", \"N separate images\", \"N different images\",\n\"N options\", \"N takes\", \"N versions\", \"N variations\", \"N pictures of\",\n\"all at the same time\", \"in parallel\", \"side by side as separate\".\n\nTHE PRIOR TURN DOES NOT ANCHOR THE CURRENT TURN. If the prior assistant turn used\nnumberOfVariations=1 with a composite \"N subjects in one image\" prompt, and the user\nnow says \"draw N more as separate images\" / \"as separate\" / \"separately\", DO NOT carry\nover numberOfVariations=1 from the prior call. The user is correcting that interpretation;\nset numberOfVariations=N for THIS call with one self-contained prompt per image via {|}\nDynamic Prompt branches. 
The new turn's count + separation language always wins over the\nprevious turn's pattern.\n\nWHEN BATCH FAN-OUT DOES NOT APPLY: numberOfVariations=1 with multiple subjects packed into\nONE prompt is correct only when the user clearly wants a SINGLE composite image (e.g.\n\"draw 2 goats in a meadow\" with no separation language, or explicit \"in one image\" / \"one\npicture of N\" / \"single image\" / \"composite\" / \"sheet\" / \"side-by-side composition\").\n\nFLUX.2 PROMPT ORDER: [SUBJECT] → [ATTRIBUTES] → [ACTION/POSE] → [CAMERA/FRAMING]\n→ [ENVIRONMENT] → [LIGHTING] → [STYLE/MEDIUM] → [MATERIALS/TEXTURES] →\n[SECONDARY DETAILS]. Always start with the main subject, never mood or atmosphere.\nUse concrete nouns and observable adjectives — \"soft overcast daylight\" not \"nice lighting\".\nGood defaults when user is underspecified: medium shot for portraits, wide shot for\nenvironments, eye-level angle, soft natural light for realism.\n\nDYNAMIC PROMPTS: When numberOfVariations > 1, use Dynamic Prompt syntax to make each\nvariation meaningfully different — not just seed-different. Syntax: {a|b|c} cycles\nsequentially, {@a|b|c} picks randomly, {~a|b} paired cycling across groups. Rules: (1) Vary\nONLY what the user left unspecified — lock in everything they specified. (2) Match option\ncount to numberOfVariations so every result is unique. (3) Briefly tell the user what you're\nvarying — never show raw {|} syntax. (4) Skip when: user wants consistency, prompt is fully\nspecified, user typed their own {|} syntax, or iterating on a specific result. (5) NEVER put\nthe count or the word \"versions\"/\"variations\" inside the prompt — the prompt always describes\na single image. The multiplicity comes ONLY from numberOfVariations + the {|} syntax.\nLINKED VARIANTS: when multiple attributes must stay paired per result, use ONE top-level\nDynamic Prompt branch with one complete self-contained prompt per output. 
Do NOT split\nlinked fields into separate Dynamic Prompt groups.\n\nSELECTION-GATED IMAGE STAGES: If the user asks for N image options and says they will pick\none before a later dance/video/animation, call generate_image once with numberOfVariations=N.\nAfter images are created, stop and ask the user to choose; do not call dance_montage,\nanimate_photo, or generate_video until they select.\n\nIMAGE→VIDEO DIMENSION RULE: When generating an image that will feed into a video tool\n(animate_photo, sound_to_video, etc.), the image MUST be generated at the SAME aspect\nratio and dimensions as the target video. Default video aspect ratio is 16:9 landscape —\npass aspectRatio=\"16:9\" (or the user's specified/reference ratio) so the source image\nmatches the video output. Never generate a square image for a widescreen video. Exception:\na composite GPT Image 2 storyboard/keyframe sheet for a later Seedance video is a board,\nnot a single source frame; unless the user explicitly specifies a storyboard page/canvas/sheet\nshape, default the sheet image dimensions to a balanced grid that hosts the target\nscene-cell/frame aspect natively (portrait video target -> portrait or square sheet whose\ncolumns x rows grid produces ~9:16 cells; landscape video target -> landscape sheet whose\nrows x columns grid produces ~16:9 cells). Each scene-cell/frame area preserves the target\nvideo aspect ratio.\n\nSTORYBOARD IMAGE BATCH RULE: When rendering scene keyframes from a screenplay/storyboard,\nnumberOfVariations is only the count; the prompt MUST be one Dynamic Prompt branch with one\nfull keyframe prompt per scene:\n{scene 1 full keyframe prompt|scene 2 full keyframe prompt|...|scene N full keyframe prompt}.\nNEVER set numberOfVariations=N with only the first scene prompt — that creates N versions of\nscene 1. 
For full project requests, one generate_image batch for all scene keyframes, then\none animate_photo batch for all video clips in parallel.\n\nSTORYTELLING / BRAND / SOCIAL IMAGE PROMPTS: If generating a storyboard, ad concept,\ntrailer sheet, meme, creator post, or provocative social concept, make the first frame or\npanel immediately legible. Preserve the user's requested tone and audience. Use concrete\ncomposition, persona, product/brand role, caption placement, readable required text, and a\nclear visual transformation or punchline. For provocative adult social content, keep subjects\nclearly adult and consensual, PG-13/non-explicit, and avoid minor-coded styling or school-coded\nsettings while still optimizing visual magnet, persona, caption bait, and replay/comment value.\n\nGPT IMAGE 2 STORYBOARD SHEET → SEEDANCE AUTO-PROCEED: If the user asks to run the whole\nGPT Image 2 storyboard/keyframe sheet plus Seedance workflow without approval, the FIRST\ngenerate_image call must create ONE composite storyboard/keyframe sheet, not loose concept\nart and not separate keyframes. Use model=\"gpt-image-2\", numberOfVariations=1, and a\ncompiled storyboard prompt that literally includes: \"Create exactly N sequential video\nstoryboard frames as one composite storyboard image\", \"Target final video aspect ratio: X\",\na `SCENES:` section, and exactly N concrete scene entries named `SCENE_01`, `SCENE_02`,\netc. Each scene entry must include `Visual/Action:`, `Camera/Motion:`, `Dialogue/VO:`\n(use `[no dialogue]` when silent), `Audio/SFX:`, and any reference/visible-text notes\nneeded for that scene. Do not send only a source brief, storyboard concept, or generic\nlayout instructions as the prompt; malformed compiled storyboard prompts are blocked by\nquality audit instead of being repaired at runtime. 
Unless the user explicitly specifies another\nstoryboard page/canvas/sheet shape, default the GPT Image 2 storyboard sheet pixel dimensions\nto a balanced grid that hosts the target cell aspect natively: for a 9:16 portrait video,\npick a portrait-leaning sheet whose columns x rows grid produces ~9:16 cells (e.g., 12 cells\n-> ~3:4 sheet around 1728x2304, 6 cells -> ~27:32 around 1840x2176, 9 cells -> ~9:16 around\n1504x2672); for a 16:9 landscape video, pick a landscape sheet whose rows x columns grid\nproduces ~16:9 cells (e.g., 12 cells -> ~4:3 sheet around 2304x1728). Do not force landscape\n2560x1440 when cells are portrait — a landscape sheet with a portrait-cell grid cannot host\n9:16 cells without crushing them. Preserve the requested final video aspect ratio for every\nframe area. After\nthat image completes, call generate_video once using the generated storyboard board as\n@Image1/referenceImageIndices=[0], with skipPromptProcessing=false only when the user\nexplicitly wants the storyboard text rewritten; otherwise preserve the compiled shot guide\nand use skipPromptProcessing=true, expandPrompt=false.\n\nDO NOT USE generate_image FOR UPLOADED REFERENCE LOOPED VIDEO SEGMENTS: If the user says\nthe same uploaded image/reference should be reused as the first frame and last frame of each\nscripted segment/scene/clip before stitching, they are explicitly asking to animate the\nuploaded image, not to generate new storyboard keyframes. Do not call generate_image for\nthat request. Call animate_photo once with repeated uploaded source indices and per-scene\nprompts.\n\nREUSING RESULTS: When the user asks to redo, retry, or revise (e.g., \"try a new version\",\n\"redo the video with X\"), reuse the existing source images — do NOT regenerate them unless\nthe user explicitly asks for new images or describes changes to the images themselves.\nReference the existing result indices from the prior generation. 
If unsure whether the user\nwants new images, ask — don't regenerate by default.",
+        "baseDescription": "generate_image creates images from text descriptions. Use for text-only image generation;\nuse edit_image when uploaded or reference images must guide identity/likeness.\nException: Z-image and Z-image Turbo image-to-image/enhancement requests use generate_image\nwith model=\"z-turbo\" or model=\"z-image\", sourceImageIndex=-1, and starting_image_strength;\ndo not route explicit Z-image Turbo uploaded-image enhancement to edit_image because\nedit_image does not expose Z-image models.\n\nBATCH FAN-OUT DEFAULT (READ BEFORE ANYTHING ELSE BELOW):\nWhen the user explicitly asks for N images in the CURRENT turn, set numberOfVariations=N\nin one call. Avoid multiple serial generate_image calls unless the user explicitly wants\nindependent projects, isolated approvals, or per-output settings that cannot share one project.\nDo not omit numberOfVariations and try to \"generate the next one after this finishes\".\nTrigger phrasings:\n\"draw N\", \"make N\", \"give me N\", \"show me N\", \"render N\", \"create N\", \"generate N\",\n\"N more\", \"another N\", \"N as separate\", \"N separate images\", \"N different images\",\n\"N options\", \"N takes\", \"N versions\", \"N variations\", \"N pictures of\",\n\"all at the same time\", \"in parallel\", \"side by side as separate\".\n\nTHE PRIOR TURN DOES NOT ANCHOR THE CURRENT TURN. If the prior assistant turn used\nnumberOfVariations=1 with a composite \"N subjects in one image\" prompt, and the user\nnow says \"draw N more as separate images\" / \"as separate\" / \"separately\", DO NOT carry\nover numberOfVariations=1 from the prior call. The user is correcting that interpretation;\nset numberOfVariations=N for THIS call with one self-contained prompt per image via {|}\nDynamic Prompt branches. 
The new turn's count + separation language always wins over the\nprevious turn's pattern.\n\nWHEN BATCH FAN-OUT DOES NOT APPLY: numberOfVariations=1 with multiple subjects packed into\nONE prompt is correct only when the user clearly wants a SINGLE composite image (e.g.\n\"draw 2 goats in a meadow\" with no separation language, or explicit \"in one image\" / \"one\npicture of N\" / \"single image\" / \"composite\" / \"sheet\" / \"side-by-side composition\").\n\nFLUX.2 PROMPT ORDER: [SUBJECT] → [ATTRIBUTES] → [ACTION/POSE] → [CAMERA/FRAMING]\n→ [ENVIRONMENT] → [LIGHTING] → [STYLE/MEDIUM] → [MATERIALS/TEXTURES] →\n[SECONDARY DETAILS]. By default, start with the main subject and concrete observable\nattributes; use mood or atmosphere first only when the user explicitly asks for that shape.\nUse concrete nouns and observable adjectives — \"soft overcast daylight\" not \"nice lighting\".\nGood defaults when user is underspecified: medium shot for portraits, wide shot for\nenvironments, eye-level angle, soft natural light for realism.\n\nDYNAMIC PROMPTS: When numberOfVariations > 1, use Dynamic Prompt syntax to make each\nvariation meaningfully different — not just seed-different. Syntax: {a|b|c} cycles\nsequentially, {@a|b|c} picks randomly, {~a|b} paired cycling across groups. Rules: (1) Vary\nONLY what the user left unspecified — lock in everything they specified. (2) Match option\ncount to numberOfVariations so every result is unique. (3) Briefly tell the user what you're\nvarying without exposing raw {|} syntax unless the user asks to inspect the prompt.\n(4) Skip when: user wants consistency, prompt is fully\nspecified, user typed their own {|} syntax, or iterating on a specific result. (5) Do not put\nthe count or the word \"versions\"/\"variations\" inside the prompt — the prompt always describes\na single image. 
The multiplicity comes ONLY from numberOfVariations + the {|} syntax.\nLINKED VARIANTS: when multiple attributes must stay paired per result, use ONE top-level\nDynamic Prompt branch with one complete self-contained prompt per output. Do NOT split\nlinked fields into separate Dynamic Prompt groups.\n\nSELECTION-GATED IMAGE STAGES: If the user asks for N image options and says they will pick\none before a later dance/video/animation, call generate_image once with numberOfVariations=N.\nAfter images are created, stop and ask the user to choose unless the user explicitly asked\nto run the later stage immediately.\n\nIMAGE→VIDEO DIMENSION RULE: When generating an image that will feed into a video tool\n(animate_photo, sound_to_video, etc.), the image MUST be generated at the SAME aspect\nratio and dimensions as the target video. Default video aspect ratio is 16:9 landscape —\npass aspectRatio=\"16:9\" (or the user's specified/reference ratio) so the source image\nmatches the video output. Do not generate a square image for a widescreen video. Exception:\na composite GPT Image 2 storyboard/keyframe sheet for a later Seedance video is a board,\nnot a single source frame; unless the user explicitly specifies a storyboard page/canvas/sheet\nshape, default the sheet image dimensions to a balanced grid that hosts the target\nscene-cell/frame aspect natively (portrait video target -> portrait or square sheet whose\ncolumns x rows grid produces ~9:16 cells; landscape video target -> landscape sheet whose\nrows x columns grid produces ~16:9 cells). 
Each scene-cell/frame area preserves the target\nvideo aspect ratio.\n\nSTORYBOARD IMAGE BATCH RULE: When rendering scene keyframes from a screenplay/storyboard,\nnumberOfVariations is only the count; the prompt should be one Dynamic Prompt branch with one\nfull keyframe prompt per scene:\n{scene 1 full keyframe prompt|scene 2 full keyframe prompt|...|scene N full keyframe prompt}.\nDo not set numberOfVariations=N with only the first scene prompt — that creates N versions of\nscene 1. For full project requests, one generate_image batch for all scene keyframes, then\none animate_photo batch for all video clips in parallel.\n\nSTORYTELLING / BRAND / SOCIAL IMAGE PROMPTS: If generating a storyboard, ad concept,\ntrailer sheet, meme, creator post, or provocative social concept, make the first frame or\npanel immediately legible. Preserve the user's requested tone and audience. Use concrete\ncomposition, persona, product/brand role, caption placement, readable required text, and a\nclear visual transformation or punchline. For provocative adult social content, keep subjects\nclearly adult and consensual, PG-13/non-explicit, and avoid minor-coded styling or school-coded\nsettings while still optimizing visual magnet, persona, caption bait, and replay/comment value.\n\nGPT IMAGE 2 STORYBOARD SHEET → SEEDANCE AUTO-PROCEED: If the user asks to run the whole\nGPT Image 2 storyboard/keyframe sheet plus Seedance workflow without approval, the FIRST\ngenerate_image call must create ONE composite storyboard/keyframe sheet, not loose concept\nart and not separate keyframes. Use model=\"gpt-image-2\", numberOfVariations=1, and a\ncompiled storyboard prompt that literally includes: \"Create exactly N sequential video\nstoryboard frames as one composite storyboard image\", \"Target final video aspect ratio: X\",\na `SCENES:` section, and exactly N concrete scene entries named `SCENE_01`, `SCENE_02`,\netc. 
Each scene entry must include `Visual/Action:`, `Camera/Motion:`, `Dialogue/VO:`\n(use `[no dialogue]` when silent), `Audio/SFX:`, and any reference/visible-text notes\nneeded for that scene. Do not send only a source brief, storyboard concept, or generic\nlayout instructions as the prompt; malformed compiled storyboard prompts are blocked by\nquality audit instead of being repaired at runtime. If the user does not explicitly specify\na frame count, choose N with the shared storyboard density default: at least one key visual\nbeat about every 2 seconds, rounded up and clamped to 6-16 total storyboard frames\n(for example, a 60 second commercial defaults to N=16). Unless the user explicitly specifies another\nstoryboard page/canvas/sheet shape, default the GPT Image 2 storyboard sheet pixel dimensions\nto a balanced grid that hosts the target cell aspect natively: for a 9:16 portrait video,\npick a portrait-leaning sheet whose columns x rows grid produces ~9:16 cells (e.g., 12 cells\n-> ~3:4 sheet around 1728x2304, 6 cells -> ~27:32 around 1840x2176, 9 cells -> ~9:16 around\n1504x2672); for a 16:9 landscape video, pick a landscape sheet whose rows x columns grid\nproduces ~16:9 cells (e.g., 12 cells -> ~4:3 sheet around 2304x1728). Do not force landscape\n2560x1440 when cells are portrait — a landscape sheet with a portrait-cell grid cannot host\n9:16 cells without crushing them. Preserve the requested final video aspect ratio for every\nframe area. 
After\nthat image completes, call generate_video once using the generated storyboard board as\n@Image1/referenceImageIndices=[0], with skipPromptProcessing=false only when the user\nexplicitly wants the storyboard text rewritten; otherwise preserve the compiled shot guide\nand use skipPromptProcessing=true, expandPrompt=false.\n\nDO NOT USE generate_image FOR UPLOADED REFERENCE LOOPED VIDEO SEGMENTS: If the user says\nthe same uploaded image/reference should be reused as the first frame and last frame of each\nscripted segment/scene/clip before stitching, they are explicitly asking to animate the\nuploaded image, not to generate new storyboard keyframes. Do not call generate_image for\nthat request. Call animate_photo once with repeated uploaded source indices and per-scene\nprompts.\n\nREUSING RESULTS: When the user asks to redo, retry, or revise (e.g., \"try a new version\",\n\"redo the video with X\"), reuse the existing source images — do NOT regenerate them unless\nthe user explicitly asks for new images or describes changes to the images themselves.\nReference the existing result indices from the prior generation. If unsure whether the user\nwants new images, ask — don't regenerate by default.",
         "parameterDocs": {
             "prompt": "Text description. Follow FLUX.2 prompt order: subject first. Use Dynamic Prompt syntax when numberOfVariations > 1.",
             "numberOfVariations": "Number of distinct outputs. Use Dynamic Prompt {|} syntax to vary one attribute per image. Never put the count in the prompt itself.",

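A minimal sketch of the storyboard rules in the hunk above, assuming plain JavaScript with no package imports. The helper names and the 4-column by 3-row grid are illustrative assumptions, not functions or values exported by the package. It covers the frame-count default (one beat per 2 seconds, clamped to 6-16), the single Dynamic Prompt branch with one full prompt per scene, and the per-cell aspect ratio a given sheet size produces.

```javascript
// Sketch only: helper names and the 4x3 grid are assumptions for illustration,
// not part of @sogni-ai/sogni-creative-agent-skill.

// Frame-count default: one key beat per ~2 seconds, rounded up, clamped to 6-16.
function defaultStoryboardFrameCount(durationSeconds) {
  if (!Number.isFinite(durationSeconds) || durationSeconds <= 0) {
    throw new RangeError('durationSeconds must be a positive number');
  }
  return Math.min(16, Math.max(6, Math.ceil(durationSeconds / 2)));
}

// One top-level Dynamic Prompt branch: {scene 1|scene 2|...|scene N}.
// Assumption: prompts containing { } | are rejected rather than escaped,
// because the diff does not describe an escape rule.
function buildSceneBranch(scenePrompts) {
  const cleaned = scenePrompts.map((p) => String(p).trim());
  if (cleaned.length === 0 || cleaned.some((p) => p === '' || /[{}|]/.test(p))) {
    throw new Error('each scene prompt must be non-empty and free of { } | characters');
  }
  return { prompt: `{${cleaned.join('|')}}`, numberOfVariations: cleaned.length };
}

// Aspect ratio (width / height) of one cell in a uniform columns x rows grid.
function cellAspect(sheetWidth, sheetHeight, columns, rows) {
  return (sheetWidth / columns) / (sheetHeight / rows);
}

console.log(defaultStoryboardFrameCount(60)); // 16
console.log(buildSceneBranch(['wide shot of a harbor at dawn', 'sailor tying a knot']).prompt);
console.log(cellAspect(1728, 2304, 4, 3)); // 0.5625, i.e. 9:16
console.log(cellAspect(2560, 1440, 4, 3)); // about 1.333, i.e. roughly 4:3
```

Under that assumed 4x3 layout, the 1728x2304 sheet yields exact 9:16 cells, while the 2560x1440 sheet yields roughly 4:3 cells. This matches the text's warning against defaulting to a landscape sheet for portrait cells.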
package/llm.txt CHANGED

@@ -1,8 +1,8 @@
 # sogni-creative-agent-skill
-Agent skill and CLI for Sogni AI image and video generation. Works as a skill
-source for Claude Code, OpenClaw, Hermes Agent, Manus AI, and other agent
-runtimes.
+Agent skill and CLI for Sogni AI image, video, and music generation. Works as
+a skill source for Claude Code, OpenClaw, Hermes Agent, Manus AI, and other
+agent runtimes.
 ## Install (pick the integration that matches your environment)

package/openclaw.plugin.json CHANGED

@@ -2,7 +2,7 @@
   "id": "sogni-creative-agent-skill",
   "name": "Sogni Creative Agent Skill — Image, Video & Music Generation",
   "description": "Agent skill and CLI for Sogni AI image, video, and music generation.",
-  "version": "3.3.3",
+  "version": "3.3.5",
   "skills": [
     "."
   ],

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@sogni-ai/sogni-creative-agent-skill",
-  "version": "3.3.3",
+  "version": "3.3.5",
   "description": "Sogni Creative Agent Skill: agent skill and CLI for Sogni AI image, video, and music generation.",
   "type": "module",
   "main": "sogni-agent.mjs",
@@ -67,7 +67,7 @@
     "sogni-agent.mjs"
   ],
   "dependencies": {
-    "@sogni-ai/sogni-intelligence-client": "^3.0.0-alpha.8",
+    "@sogni-ai/sogni-intelligence-client": "^3.0.11",
     "execa": "^9.6.1",
     "json5": "^2.2.3",
     "sharp": "^0.34.5"

package/skill-package.json CHANGED

@@ -3,7 +3,7 @@
   "private": true,
   "type": "module",
   "dependencies": {
-    "@sogni-ai/sogni-intelligence-client": "^3.0.0-alpha.8",
+    "@sogni-ai/sogni-intelligence-client": "^3.0.11",
     "execa": "^9.6.1",
     "json5": "^2.2.3",
     "sharp": "^0.34.5"

package/sogni-agent.mjs CHANGED

@@ -4024,6 +4024,23 @@ async function imageDataUriFromPathOrUrl(pathOrUrl) {
   return `data:${mimeType};base64,${buffer.toString('base64')}`;
 }
+const DEFAULT_API_CHAT_SYSTEM_PROMPT = `ROLE: You are Sogni Agent, a practical creative production assistant for Sogni's media tools. Be direct, specific, inventive, and warm. Avoid generic text-only LLM framing and describe Sogni's real media capabilities when they are relevant.
+V2 TURN ARCHITECTURE:
+- Hosted chat may run a classifier/planner before the assistant round. That stage proposes text/tool/workflow mode and the allowed tool surface; it does not call tools or spend credits.
+- In the assistant/execution round, use only the tools currently exposed to you. If the user asked Sogni to generate, edit, animate, render, analyze, or otherwise execute media and the matching tool is available, call it.
+- If the current round is text-only, answer the question completely in prose. Product, model, pricing, credit, capability, and "what can you do?" questions are usually text-only until the user asks you to start making media.
+- If required input is missing, ask a concise clarifying question. For underspecified creative taste, choose a reasonable default and proceed.
+- Do not narrate hidden planning, tool selection, JSON, function names, or internal architecture to the user.
+SOGNI PRODUCT KNOWLEDGE:
+- Sogni can create and edit images, generate and transform videos, compose music/lyrics, restore photos, apply styles, analyze media, and use uploaded or generated assets as references.
+- GPT Image 2 in Sogni creates images from text prompts, edits/restyles uploaded or generated references, builds storyboard/keyframe sheets, character/reference boards, ad/product composites, and layout/text-heavy stills.
+- For action requests, use image generation for text-to-image and image editing when references guide identity, likeness, composition, style, objects, logos, or products. Paid renders show a preflight estimate before spending.
+- Featured workflow: GPT Image 2 storyboard/keyframes -> Seedance 2.0 for finished social videos such as ads, trailers, character intros, and storyboard-to-video flows.
+- For Sogni, model, GPT Image, Seedance, or creative capability questions, describe the media tools Sogni can use instead of falling back to generic text-only limitations.
+- For unknown product facts, state uncertainty and point to docs.sogni.ai or Discord.`;
 /**
  * Build the persona/memory/personality dynamic-system-prompt suffix the
  * skill injects into `/v1/chat/completions` (and durable
@@ -4072,7 +4089,7 @@ function buildSkillDynamicSystemPrompt() {
     const memories = loadMemories();
     if (memories.length > 0) {
       const memoryContext = memories.map((m) => `${m.key}: ${m.value}`).join('; ');
-      suffix += `\nUser preferences (always respect these): ${memoryContext}`;
+      suffix += `\nUser preferences (apply unless the latest user request overrides them): ${memoryContext}`;
     }
   } catch {
     // best-effort
@@ -4099,8 +4116,7 @@ async function buildApiChatMessages(apiMediaRefs, apiMediaReferences) {
   // tokens, Wan numeric tokens). Wiring it through here keeps the public
   // skill's --api-chat behavior aligned with sogni-chat and the
   // /v1/chat/completions endpoint when references are present.
-  const baseSystem = options.apiSystemPrompt ||
-    'You are a concise creative production assistant. Use Sogni creative tools when they help produce concrete media.';
+  const baseSystem = options.apiSystemPrompt || DEFAULT_API_CHAT_SYSTEM_PROMPT;
   const dynamicSuffix = buildSkillDynamicSystemPrompt();
   const systemWithDynamic = dynamicSuffix ? `${baseSystem}${dynamicSuffix}` : baseSystem;
   const system = apiMediaRefs.length > 0
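A hedged sketch of how the two changes in this file combine: the base system prompt falls back to a default, and stored memories are appended as overridable preferences. The function names and sample data are hypothetical. Only the suffix wording and the concatenation pattern come from the diff, and the real `buildSkillDynamicSystemPrompt` also handles persona and personality, which this sketch omits.

```javascript
// Sketch only: buildMemorySuffix and composeSystemPrompt are illustrative
// names, not functions from sogni-agent.mjs.

// Formats stored memories as "key: value; key: value" behind the 3.3.5 wording.
function buildMemorySuffix(memories) {
  if (!Array.isArray(memories) || memories.length === 0) return '';
  const memoryContext = memories.map((m) => `${m.key}: ${m.value}`).join('; ');
  return `\nUser preferences (apply unless the latest user request overrides them): ${memoryContext}`;
}

// Appends the suffix only when there is one, mirroring the ternary in the diff.
function composeSystemPrompt(baseSystem, memories) {
  const dynamicSuffix = buildMemorySuffix(memories);
  return dynamicSuffix ? `${baseSystem}${dynamicSuffix}` : baseSystem;
}

// Hypothetical usage with a stand-in base prompt.
const system = composeSystemPrompt('ROLE: You are Sogni Agent.', [
  { key: 'aspect', value: '9:16' },
  { key: 'style', value: 'film noir' },
]);
console.log(system);
```

The reworded suffix treats memories as defaults rather than hard constraints, so a request in the current turn can override a stored preference.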

package/version.mjs CHANGED

@@ -1 +1 @@
-export const PACKAGE_VERSION = '3.3.3';
+export const PACKAGE_VERSION = '3.3.5';