@sogni-ai/sogni-creative-agent-skill 2.1.0 → 2.1.2
- package/README.md +23 -3
- package/SKILL.md +16 -9
- package/generated/creative-agent-runtime.mjs +1138 -0
- package/openclaw.plugin.json +1 -1
- package/package.json +10 -6
- package/scripts/check-creative-agent-runtime.mjs +57 -0
- package/skill-package.json +1 -1
- package/sogni-agent.mjs +267 -699
- package/version.mjs +1 -1
package/README.md
CHANGED
@@ -101,6 +101,18 @@ cd sogni-creative-agent-skill
 npm install
 ```
 
+### Maintainer Runtime Sync
+
+This public skill keeps CLI/runtime glue here, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. With both repos checked out as siblings, refresh the generated runtime before publishing:
+
+```bash
+npm run sync:creative-agent-runtime
+```
+
+`npm test` runs `npm run check:creative-agent-runtime` first, which regenerates this file and fails if it differs from the committed copy.
+
+The generated file is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
+
 ### Advanced OpenClaw Config
 
 When loaded through OpenClaw, Sogni Creative Agent Skill reads plugin defaults from OpenClaw config. CLI flags always override those defaults.
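The drift check that the README says `npm test` runs first can be pictured as a compare-and-fail step. The sketch below is only an illustration of that idea, not the contents of `scripts/check-creative-agent-runtime.mjs`: the function name `checkGeneratedRuntime` and its plain-string inputs are assumptions, and a real check would regenerate the runtime from the private repo and read the committed copy from disk.

```javascript
// Hypothetical helper (not from the package): compares freshly regenerated
// runtime source against the committed copy and reports the first mismatch.
function checkGeneratedRuntime(regenerated, committed) {
  // Normalize line endings so a CRLF checkout does not count as drift.
  const normalize = (text) => text.replace(/\r\n/g, "\n");
  const a = normalize(regenerated).split("\n");
  const b = normalize(committed).split("\n");
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i += 1) {
    if (a[i] !== b[i]) {
      return { ok: false, firstDifferingLine: i + 1 };
    }
  }
  return { ok: true, firstDifferingLine: null };
}

// Illustrative inputs only; the strings stand in for file contents.
const result = checkGeneratedRuntime("export const a = 1;\n", "export const a = 2;\n");
if (!result.ok) {
  console.log(`generated runtime is stale (first difference at line ${result.firstDifferingLine})`);
}
```

A check script built this way would exit with a non-zero status when `ok` is false, which is what makes `npm test` fail before the remaining tests run.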
@@ -152,6 +164,13 @@ sogni-agent --video --target-resolution 768 \
 sogni-agent --video -m seedance2 --duration 8 \
 "A polished product reveal with native ambient sound"
 
+# Seedance multimodal context with public HTTPS references
+sogni-agent --video -m seedance2 --workflow t2v \
+--ref https://cdn.example.com/product.png \
+--ref-video https://cdn.example.com/motion.mp4 \
+--ref-audio https://cdn.example.com/music.m4a \
+"Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
+
 # Image-to-video (i2v)
 sogni-agent --video --ref cat.jpg "gentle camera pan"
 
@@ -177,7 +196,7 @@ sogni-agent --help
 
 For local multi-clip workflows, prefer the built-in FFmpeg wrappers over raw shell commands. `--video-start`, `--audio-start`, and `--audio-duration` let you generate focused segments, while `--concat-videos` can stitch them and optionally mux a single soundtrack with `--concat-audio`.
 
-V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet.
+V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance also accepts public HTTPS image, video, and audio references; audio references must be paired with an image or video reference.
 
 ## LTX-2.3 Prompting Guide
 
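The Seedance reference rule stated in the README (public HTTPS references are accepted, and an audio reference must be paired with an image or video reference) could be checked before a job is submitted. This is a minimal sketch under assumptions: `isHttpsUrl` and `checkSeedanceReferences` are hypothetical names, the option keys simply mirror the `--ref`, `--ref-video`, and `--ref-audio` flags, and the check cannot tell whether a URL is actually publicly reachable.

```javascript
// Returns true when the value parses as an absolute URL with the https: scheme.
function isHttpsUrl(value) {
  try {
    return new URL(value).protocol === "https:";
  } catch {
    return false; // Not an absolute URL, for example a local path like cat.jpg
  }
}

// Hypothetical pre-flight check (not the package's own code).
function checkSeedanceReferences({ ref, refVideo, refAudio } = {}) {
  const problems = [];
  // Pairing rule from the README: audio needs an image or video alongside it.
  if (refAudio && !ref && !refVideo) {
    problems.push("--ref-audio must be paired with --ref or --ref-video");
  }
  // References that look like HTTPS URLs, i.e. candidates for URL context.
  const urlReferences = [ref, refVideo, refAudio].filter(Boolean).filter(isHttpsUrl);
  return { problems, urlReferences };
}

const check = checkSeedanceReferences({
  refAudio: "https://cdn.example.com/music.m4a",
});
console.log(check.problems); // [ '--ref-audio must be paired with --ref or --ref-video' ]
```

Returning a list of problems rather than throwing lets a caller report every issue at once; whether the real CLI errors out or silently drops an unpaired audio reference is not stated in the README.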
@@ -218,7 +237,8 @@ Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles
 ## Video Sizing Rules (Aspect Ratios)
 
 - WAN models use dimensions divisible by 16, min 480px, max 1536px.
-- LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64.
+- LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048px on the long side.
+- Seedance runs at fixed 24fps and supports 4-15s durations. Other default/WAN video paths support up to 10s; LTX and WAN animate workflows support up to 20s.
 - The script auto-normalizes video sizes to satisfy those constraints.
 - Use `--target-resolution <px>` for bare resolution requests such as "720p" when the user did not specify exact pixels. It targets the short side and preserves the inherited aspect ratio.
 - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with a strict aspect-fit (`fit: inside`) and then uses the *resized reference dimensions* as the final video size. Because that resize uses rounding, a “valid” requested size can still produce an invalid final size (example: `1024x1536` requested, but ref becomes `1024x1535`).
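The sizing rules above include a rounding pitfall that is easier to see with numbers. The sketch below reproduces the `1024x1536` to `1024x1535` case and shows one way a dimension could be snapped back to a WAN-valid value. It is an illustration under assumptions: `fitInside` and `snapDimension` are hypothetical helpers, the 2001x3000 reference image is made up, and the use of `Math.round` is a guess at how the wrapper's resize rounds.

```javascript
// Aspect-fit resize: scale the source so it fits inside the box, then round.
function fitInside(srcWidth, srcHeight, boxWidth, boxHeight) {
  const scale = Math.min(boxWidth / srcWidth, boxHeight / srcHeight);
  return {
    width: Math.round(srcWidth * scale),
    height: Math.round(srcHeight * scale),
  };
}

// Snap one dimension to the nearest multiple of `divisor`, then clamp it to
// the largest/smallest multiples that still lie within [min, max].
function snapDimension(value, divisor, min, max) {
  const snapped = Math.round(value / divisor) * divisor;
  const lowest = Math.ceil(min / divisor) * divisor;
  const highest = Math.floor(max / divisor) * divisor;
  return Math.min(highest, Math.max(lowest, snapped));
}

// WAN limits as stated in the README: divisible by 16, min 480px, max 1536px.
const WAN = { divisor: 16, min: 480, max: 1536 };

// A made-up 2001x3000 reference fitted inside the requested 1024x1536 box.
const fitted = fitInside(2001, 3000, 1024, 1536);
console.log(fitted); // { width: 1024, height: 1535 }
console.log(fitted.height % WAN.divisor === 0); // false
console.log(snapDimension(fitted.height, WAN.divisor, WAN.min, WAN.max)); // 1536
```

For LTX family models the same helper would take a divisor of 64; the README does not state an LTX minimum, so none is assumed here.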
@@ -239,7 +259,7 @@ Run `sogni-agent --help` for the complete CLI. These are the options most agents
 | `-o <path>` | Save output locally |
 | `-c <path>` | Provide image context for edits |
 | `--video` | Generate video instead of image |
-| `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references |
+| `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references; Seedance HTTPS references are forwarded as URL context |
 | `--target-resolution <px>` | Target the short side while preserving aspect ratio |
 | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
 | `--persona <name>` | Use a saved persona reference |
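The `--target-resolution <px>` row describes scaling to a short-side target while preserving aspect ratio. A rough sketch of that arithmetic follows; `targetShortSide` is a hypothetical name, the 1920x1080 input is only an example, and the rounding is an assumption rather than the CLI's documented behavior.

```javascript
// Scale a width/height pair so its shorter side equals `target`,
// keeping the aspect ratio (rounded to whole pixels).
function targetShortSide(width, height, target) {
  const scale = target / Math.min(width, height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}

console.log(targetShortSide(1920, 1080, 768)); // { width: 1365, height: 768 }
console.log(targetShortSide(1080, 1920, 768)); // { width: 768, height: 1365 }
```

Note that 1365 is not divisible by 16 or 64, so a result like this would still go through the size normalization that the README says the script applies.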
package/SKILL.md
CHANGED
@@ -1,6 +1,6 @@
 ---
 name: sogni-creative-agent-skill
-version: "2.1.
+version: "2.1.2"
 description: Sogni Creative Agent Skill: agent skill and CLI for image and video generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to "draw", "generate", "create an image", "make a video/animate", "apply a style", or "generate me as a superhero".
 homepage: https://sogni.ai
 metadata:
@@ -185,18 +185,18 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
 | `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
 | `--auto-resize-assets` | Auto-resize video assets | true |
 | `--no-auto-resize-assets` | Disable auto-resize | - |
-| `--estimate-video-cost` | Estimate video cost and exit
+| `--estimate-video-cost` | Estimate video cost and exit | - |
 | `--photobooth` | Face transfer mode (InstantID + SDXL Turbo) | - |
 | `--cn-strength <n>` | ControlNet strength (photobooth) | 0.8 |
 | `--cn-guidance-end <n>` | ControlNet guidance end point (photobooth) | 0.3 |
-| `--ref <path>` | Reference image for video or photobooth face | required for video/photobooth |
-| `--ref-end <path>` | End frame for i2v interpolation | - |
-| `--ref-audio <path>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
+| `--ref <path\|url>` | Reference image for video or photobooth face | required for video/photobooth |
+| `--ref-end <path\|url>` | End frame for i2v interpolation | - |
+| `--ref-audio <path\|url>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
 | `--audio-start <sec>` | Start offset into `--ref-audio` | - |
 | `--audio-duration <sec>` | Duration slice from `--ref-audio` | - |
 | `--reference-audio-identity <path>` | Voice identity clip for LTX native audio | - |
 | `--voice-persona <name>` | Use saved persona voice clip as LTX voice identity | - |
-| `--ref-video <path>` | Reference video for animate/v2v workflows | - |
+| `--ref-video <path\|url>` | Reference video for animate/v2v workflows | - |
 | `--video-start <sec>` | Start offset into `--ref-video` for segmented V2V/animate | - |
 | `--controlnet-name <name>` | ControlNet type for v2v: canny\|pose\|depth\|detailer | - |
 | `--controlnet-strength <n>` | ControlNet strength for v2v (0.0-1.0) | canny/pose/depth 0.85, detailer 1.0 |
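The last two table rows imply a small lookup: each ControlNet type has a default strength, and an explicit `--controlnet-strength` value in the range 0.0-1.0 takes its place. The following is a sketch of that resolution logic only. `resolveControlNetStrength` is a hypothetical name, and rejecting out-of-range values with an error is an assumption; the table does not say whether the CLI rejects or clamps them.

```javascript
// Defaults as listed in the options table.
const DEFAULT_V2V_STRENGTH = { canny: 0.85, pose: 0.85, depth: 0.85, detailer: 1.0 };

// Hypothetical resolver: returns the explicit strength when one is given,
// otherwise the per-type default from the table.
function resolveControlNetStrength(name, override) {
  if (!Object.prototype.hasOwnProperty.call(DEFAULT_V2V_STRENGTH, name)) {
    throw new Error(`unknown ControlNet type: ${name}`);
  }
  if (override === undefined) {
    return DEFAULT_V2V_STRENGTH[name];
  }
  const value = Number(override); // CLI values arrive as strings
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    // Assumption: out-of-range input is treated as an error in this sketch.
    throw new RangeError(`strength must be between 0.0 and 1.0, got ${override}`);
  }
  return value;
}

console.log(resolveControlNetStrength("pose")); // 0.85
console.log(resolveControlNetStrength("detailer")); // 1
console.log(resolveControlNetStrength("canny", "0.6")); // 0.6
```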
@@ -462,6 +462,13 @@ node sogni-agent.mjs --video --target-resolution 768 \
 node sogni-agent.mjs --video -m seedance2 --duration 8 \
 "A polished product reveal with native ambient sound"
 
+# Seedance multimodal context with public HTTPS references
+node sogni-agent.mjs --video -m seedance2 --workflow t2v \
+--ref https://cdn.example.com/product.png \
+--ref-video https://cdn.example.com/motion.mp4 \
+--ref-audio https://cdn.example.com/music.m4a \
+"Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
+
 # Sound-to-video (s2v)
 node sogni-agent.mjs --video --ref face.jpg --ref-audio speech.m4a \
 -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"
@@ -511,7 +518,7 @@ node sogni-agent.mjs --video --workflow v2v --ref-video scene.mp4 \
 ```
 
 ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
-Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet.
+Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context; audio references must be paired with an image or video reference.
 
 ```bash
 # Seedance V2V without ControlNet
@@ -685,7 +692,7 @@ When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or *
 - For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
 - If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
 - Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
-- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even
+- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even though the current wrapper caps non-WAN video dimensions at 2048px on the long side.
 
 **Security:** Agents must use the CLI's built-in flags (`--extract-last-frame`, `--concat-videos`, `--list-media`) for all file operations and video manipulation. Never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) directly.
 
@@ -919,7 +926,7 @@ node {{skillDir}}/sogni-agent.mjs --angles-360 -c subject.jpg "same subject"
 ## Troubleshooting
 
 - **Auth errors**: Check `SOGNI_API_KEY` or the credentials in `~/.config/sogni/credentials`
-- **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and
+- **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and the current wrapper caps non-WAN video dimensions at 2048px on the long side. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
 - **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
 - **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
 - **Timeouts**: Try a faster model or increase `-t` timeout