npm - @sogni-ai/sogni-creative-agent-skill - Versions diffs - 2.1.0 → 2.1.2 - Mend

@sogni-ai/sogni-creative-agent-skill 2.1.0 → 2.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (9)

package/README.md +23 -3
package/SKILL.md +16 -9
package/generated/creative-agent-runtime.mjs +1138 -0
package/openclaw.plugin.json +1 -1
package/package.json +10 -6
package/scripts/check-creative-agent-runtime.mjs +57 -0
package/skill-package.json +1 -1
package/sogni-agent.mjs +267 -699
package/version.mjs +1 -1

package/README.md CHANGED

@@ -101,6 +101,18 @@ cd sogni-creative-agent-skill
 npm install
 ```
+### Maintainer Runtime Sync
+This public skill keeps CLI/runtime glue here, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. With both repos checked out as siblings, refresh the generated runtime before publishing:
+```bash
+npm run sync:creative-agent-runtime
+```
+`npm test` runs `npm run check:creative-agent-runtime` first, which regenerates this file and fails if it differs from the committed copy.
+The generated file is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
 ### Advanced OpenClaw Config
 When loaded through OpenClaw, Sogni Creative Agent Skill reads plugin defaults from OpenClaw config. CLI flags always override those defaults.
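The drift check described in this hunk (regenerate the runtime, then fail if the result differs from the committed copy) comes down to a text comparison. The following is a minimal sketch under assumptions. `findRuntimeDrift` is a hypothetical name, and ignoring CRLF/LF differences is an assumption. It does not reproduce the contents of `scripts/check-creative-agent-runtime.mjs`.

```javascript
// Hypothetical sketch: compare a regenerated runtime with the committed copy.
// Returns null when they match, otherwise the 1-based number of the first differing line.
function findRuntimeDrift(committedText, regeneratedText) {
  // Assumption: line-ending differences (CRLF vs LF) should not count as drift.
  const toLines = (text) => text.replace(/\r\n/g, "\n").split("\n");
  const committed = toLines(committedText);
  const regenerated = toLines(regeneratedText);
  const lineCount = Math.max(committed.length, regenerated.length);
  for (let i = 0; i < lineCount; i++) {
    if (committed[i] !== regenerated[i]) {
      return i + 1;
    }
  }
  return null;
}

console.log(findRuntimeDrift("a\nb\n", "a\r\nb\r\n")); // null
console.log(findRuntimeDrift("a\nb\n", "a\nc\n")); // 2
```

A check script built this way would likely turn a non-null result into a non-zero exit code, which is what makes `npm test` fail on a stale runtime.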
@@ -152,6 +164,13 @@ sogni-agent --video --target-resolution 768 \
 sogni-agent --video -m seedance2 --duration 8 \
   "A polished product reveal with native ambient sound"
+# Seedance multimodal context with public HTTPS references
+sogni-agent --video -m seedance2 --workflow t2v \
+  --ref https://cdn.example.com/product.png \
+  --ref-video https://cdn.example.com/motion.mp4 \
+  --ref-audio https://cdn.example.com/music.m4a \
+  "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
 # Image-to-video (i2v)
 sogni-agent --video --ref cat.jpg "gentle camera pan"
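The Seedance example in this hunk pairs each reference flag with a prompt token (`@Image1`, `@Video1`, `@Audio1`). The sketch below shows one way a wrapper could plan those references before it submits a job. It makes three assumptions: tokens are numbered per media type in flag order, HTTPS URLs are forwarded as URL context, and anything else is treated as a local path. `planSeedanceRefs` and `isHttpsUrl` are hypothetical helpers, not part of the published CLI. The audio-pairing check follows this release's README note that audio references must be paired with an image or video reference.

```javascript
// Hypothetical planner for Seedance multimodal references (not the CLI's real code).
function isHttpsUrl(value) {
  try {
    return new URL(value).protocol === "https:";
  } catch {
    return false; // not an absolute URL, e.g. a local path like "cat.jpg"
  }
}

function planSeedanceRefs({ images = [], videos = [], audios = [] }) {
  // Documented rule: audio references must be paired with an image or video reference.
  if (audios.length > 0 && images.length === 0 && videos.length === 0) {
    throw new Error("Seedance audio references need an image or video reference");
  }
  // Assumption: tokens are numbered per media type, in the order the flags were given.
  const plan = (kind, refs) =>
    refs.map((ref, index) => ({
      token: `@${kind}${index + 1}`,
      ref,
      // Assumption: HTTPS references go out as URL context; anything else is a local path.
      source: isHttpsUrl(ref) ? "url" : "local",
    }));
  return [...plan("Image", images), ...plan("Video", videos), ...plan("Audio", audios)];
}

const refs = planSeedanceRefs({
  images: ["https://cdn.example.com/product.png"],
  videos: ["https://cdn.example.com/motion.mp4"],
  audios: ["https://cdn.example.com/music.m4a"],
});
console.log(refs.map((r) => `${r.token}=${r.source}`).join(" "));
// @Image1=url @Video1=url @Audio1=url
```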
@@ -177,7 +196,7 @@ sogni-agent --help
 For local multi-clip workflows, prefer the built-in FFmpeg wrappers over raw shell commands. `--video-start`, `--audio-start`, and `--audio-duration` let you generate focused segments, while `--concat-videos` can stitch them and optionally mux a single soundtrack with `--concat-audio`.
-V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet.
+V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance also accepts public HTTPS image, video, and audio references; audio references must be paired with an image or video reference.
 ## LTX-2.3 Prompting Guide
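The V2V defaults in this hunk can be summarized as a lookup table. The sketch below is a minimal version under stated assumptions. `v2vControlNet` is a hypothetical helper. The optional override mirrors `--controlnet-strength` (documented in SKILL.md as 0.0-1.0). The source does not say whether `detailer` itself enables detailer assist, so `false` there is an assumption. The model id `example-v2v-model` is a placeholder.

```javascript
// Hypothetical lookup of the documented V2V ControlNet defaults.
const V2V_DEFAULTS = {
  canny: { strength: 0.85, detailerAssist: true },
  pose: { strength: 0.85, detailerAssist: true },
  depth: { strength: 0.85, detailerAssist: true },
  detailer: { strength: 1.0, detailerAssist: false }, // assist flag here is an assumption
};

function v2vControlNet(model, controlnetName, strengthOverride) {
  // Seedance V2V (-m seedance2-v2v) runs without ControlNet.
  if (model === "seedance2-v2v") {
    return null;
  }
  if (!Object.prototype.hasOwnProperty.call(V2V_DEFAULTS, controlnetName)) {
    throw new Error(`unknown ControlNet type: ${controlnetName}`);
  }
  if (strengthOverride !== undefined && (strengthOverride < 0 || strengthOverride > 1)) {
    throw new Error("ControlNet strength must be between 0.0 and 1.0");
  }
  const defaults = V2V_DEFAULTS[controlnetName];
  return {
    name: controlnetName,
    strength: strengthOverride ?? defaults.strength,
    detailerAssist: defaults.detailerAssist,
  };
}

// "example-v2v-model" is a placeholder model id.
console.log(v2vControlNet("example-v2v-model", "pose").strength); // 0.85
console.log(v2vControlNet("seedance2-v2v", "pose")); // null
```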
@@ -218,7 +237,8 @@ Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles
 ## Video Sizing Rules (Aspect Ratios)
 - WAN models use dimensions divisible by 16, min 480px, max 1536px.
-- LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64. LTX 2.3 supports 640px to 3840px.
+- LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048px on the long side.
+- Seedance runs at fixed 24fps and supports 4-15s durations. Other default/WAN video paths support up to 10s; LTX and WAN animate workflows support up to 20s.
 - The script auto-normalizes video sizes to satisfy those constraints.
 - Use `--target-resolution <px>` for bare resolution requests such as "720p" when the user did not specify exact pixels. It targets the short side and preserves the inherited aspect ratio.
 - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with a strict aspect-fit (`fit: inside`) and then uses the *resized reference dimensions* as the final video size. Because that resize uses rounding, a “valid” requested size can still produce an invalid final size (example: `1024x1536` requested, but ref becomes `1024x1535`).
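The sizing and duration rules above can be expressed as a small normalization step. This is a hedged sketch, not the wrapper's implementation. The following are all assumptions: the family and path names (`wan`, `ltx`, `seedance`, `wan-animate`), clamping each WAN side independently, scaling non-WAN sizes down to the 2048px long-side cap before snapping to 64, and using "any positive duration" as the lower bound outside Seedance.

```javascript
// Hypothetical sketch of the documented video sizing and duration limits.
// Round to the nearest multiple of `divisor`, never going below one divisor step.
const snap = (value, divisor) => Math.max(divisor, Math.round(value / divisor) * divisor);

function normalizeVideoSize(family, width, height) {
  if (family === "wan") {
    // WAN: each side between 480 and 1536px, divisible by 16.
    const clamp = (v) => Math.min(1536, Math.max(480, v));
    return { width: snap(clamp(width), 16), height: snap(clamp(height), 16) };
  }
  // LTX and other non-WAN paths: cap the long side at 2048px, then snap to multiples of 64.
  const scale = Math.min(1, 2048 / Math.max(width, height));
  return { width: snap(width * scale, 64), height: snap(height * scale, 64) };
}

function isSupportedDuration(path, seconds) {
  if (path === "seedance") return seconds >= 4 && seconds <= 15; // Seedance: 4-15s
  if (path === "ltx" || path === "wan-animate") return seconds > 0 && seconds <= 20;
  return seconds > 0 && seconds <= 10; // default and WAN paths
}

console.log(normalizeVideoSize("wan", 1000, 1700)); // { width: 1008, height: 1536 }
console.log(normalizeVideoSize("ltx", 3840, 2160)); // { width: 2048, height: 1152 }
console.log(isSupportedDuration("seedance", 3), isSupportedDuration("ltx", 20)); // false true
```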
@@ -239,7 +259,7 @@ Run `sogni-agent --help` for the complete CLI. These are the options most agents
 | `-o <path>` | Save output locally |
 | `-c <path>` | Provide image context for edits |
 | `--video` | Generate video instead of image |
-| `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references |
+| `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references; Seedance HTTPS references are forwarded as URL context |
 | `--target-resolution <px>` | Target the short side while preserving aspect ratio |
 | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
 | `--persona <name>` | Use a saved persona reference |

package/SKILL.md CHANGED

@@ -1,6 +1,6 @@
 ---
 name: sogni-creative-agent-skill
-version: "2.1.0"
+version: "2.1.2"
 description: Sogni Creative Agent Skill: agent skill and CLI for image and video generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to "draw", "generate", "create an image", "make a video/animate", "apply a style", or "generate me as a superhero".
 homepage: https://sogni.ai
 metadata:
@@ -185,18 +185,18 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
 | `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
 | `--auto-resize-assets` | Auto-resize video assets | true |
 | `--no-auto-resize-assets` | Disable auto-resize | - |
-| `--estimate-video-cost` | Estimate video cost and exit (requires --steps) | - |
+| `--estimate-video-cost` | Estimate video cost and exit | - |
 | `--photobooth` | Face transfer mode (InstantID + SDXL Turbo) | - |
 | `--cn-strength <n>` | ControlNet strength (photobooth) | 0.8 |
 | `--cn-guidance-end <n>` | ControlNet guidance end point (photobooth) | 0.3 |
-| `--ref <path>` | Reference image for video or photobooth face | required for video/photobooth |
-| `--ref-end <path>` | End frame for i2v interpolation | - |
-| `--ref-audio <path>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
+| `--ref <path\|url>` | Reference image for video or photobooth face | required for video/photobooth |
+| `--ref-end <path\|url>` | End frame for i2v interpolation | - |
+| `--ref-audio <path\|url>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
 | `--audio-start <sec>` | Start offset into `--ref-audio` | - |
 | `--audio-duration <sec>` | Duration slice from `--ref-audio` | - |
 | `--reference-audio-identity <path>` | Voice identity clip for LTX native audio | - |
 | `--voice-persona <name>` | Use saved persona voice clip as LTX voice identity | - |
-| `--ref-video <path>` | Reference video for animate/v2v workflows | - |
+| `--ref-video <path\|url>` | Reference video for animate/v2v workflows | - |
 | `--video-start <sec>` | Start offset into `--ref-video` for segmented V2V/animate | - |
 | `--controlnet-name <name>` | ControlNet type for v2v: canny\|pose\|depth\|detailer | - |
 | `--controlnet-strength <n>` | ControlNet strength for v2v (0.0-1.0) | canny/pose/depth 0.85, detailer 1.0 |
@@ -462,6 +462,13 @@ node sogni-agent.mjs --video --target-resolution 768 \
 node sogni-agent.mjs --video -m seedance2 --duration 8 \
   "A polished product reveal with native ambient sound"
+# Seedance multimodal context with public HTTPS references
+node sogni-agent.mjs --video -m seedance2 --workflow t2v \
+  --ref https://cdn.example.com/product.png \
+  --ref-video https://cdn.example.com/motion.mp4 \
+  --ref-audio https://cdn.example.com/music.m4a \
+  "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
 # Sound-to-video (s2v)
 node sogni-agent.mjs --video --ref face.jpg --ref-audio speech.m4a \
   -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"
@@ -511,7 +518,7 @@ node sogni-agent.mjs --video --workflow v2v --ref-video scene.mp4 \
 ```
 ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
-Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet.
+Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context; audio references must be paired with an image or video reference.
 ```bash
 # Seedance V2V without ControlNet
@@ -685,7 +692,7 @@ When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or *
 - For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
 - If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
 - Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
-- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even if the exact output is not literal 3840x2160.
+- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even though the current wrapper caps non-WAN video dimensions at 2048px on the long side.
 **Security:** Agents must use the CLI's built-in flags (`--extract-last-frame`, `--concat-videos`, `--list-media`) for all file operations and video manipulation. Never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) directly.
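The resolution guidance in this hunk targets the short side, keeps the aspect ratio, and accepts that "4k" is capped at 2048px on the long side. The sketch below shows how that might translate into pixels. `sizeForShortSide` is hypothetical. Snapping to multiples of 64 assumes an LTX-family model, and the wrapper's own rounding may differ. It also shows why `768` is a convenient target for "720p": 768 is already a multiple of 64.

```javascript
// Hypothetical: derive LTX-style dimensions from a short-side target and an aspect ratio.
function sizeForShortSide(shortSide, aspectWidth, aspectHeight) {
  const landscape = aspectWidth >= aspectHeight;
  const ratio = Math.max(aspectWidth, aspectHeight) / Math.min(aspectWidth, aspectHeight);
  let long = shortSide * ratio;
  let short = shortSide;
  // Non-WAN cap described above: at most 2048px on the long side.
  if (long > 2048) {
    short = (short * 2048) / long;
    long = 2048;
  }
  const snap64 = (v) => Math.max(64, Math.round(v / 64) * 64);
  const [width, height] = landscape ? [snap64(long), snap64(short)] : [snap64(short), snap64(long)];
  return { width, height };
}

console.log(sizeForShortSide(768, 16, 9)); // { width: 1344, height: 768 }
console.log(sizeForShortSide(768, 9, 16)); // { width: 768, height: 1344 }
console.log(sizeForShortSide(2160, 16, 9)); // { width: 2048, height: 1152 }
```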
@@ -919,7 +926,7 @@ node {{skillDir}}/sogni-agent.mjs --angles-360 -c subject.jpg "same subject"
 ## Troubleshooting
 - **Auth errors**: Check `SOGNI_API_KEY` or the credentials in `~/.config/sogni/credentials`
-- **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and LTX 2.3 supports up to 3840px. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
+- **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and the current wrapper caps non-WAN video dimensions at 2048px on the long side. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
 - **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
 - **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
 - **Timeouts**: Try a faster model or increase `-t` timeout
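The i2v sizing gotcha in these troubleshooting notes comes from rounding during an aspect-fit resize. The sketch below is minimal and assumes rounding to the nearest pixel. In it, a hypothetical 2000x2999 reference fitted inside a requested 1024x1536 box lands on 1024x1535. That size is divisible by neither 16 (WAN) nor 64 (LTX). `fitInside` and `checkI2vSize` are illustrative names. The script's actual auto-adjustment and `--strict-size` suggestion logic are not reproduced here.

```javascript
// Hypothetical reproduction of the i2v sizing gotcha.
// Aspect-fit ("fit: inside"): scale the reference to fit the box, rounding to whole pixels.
function fitInside(refWidth, refHeight, boxWidth, boxHeight) {
  const scale = Math.min(boxWidth / refWidth, boxHeight / refHeight);
  return { width: Math.round(refWidth * scale), height: Math.round(refHeight * scale) };
}

// The fitted reference size becomes the final video size, so it must meet the model divisor.
function checkI2vSize(refWidth, refHeight, width, height, divisor) {
  const fitted = fitInside(refWidth, refHeight, width, height);
  const valid = fitted.width % divisor === 0 && fitted.height % divisor === 0;
  return { ...fitted, valid };
}

console.log(checkI2vSize(2000, 2999, 1024, 1536, 16)); // { width: 1024, height: 1535, valid: false }
console.log(checkI2vSize(2000, 3000, 1024, 1536, 16)); // { width: 1024, height: 1536, valid: true }
```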