@sogni-ai/sogni-creative-agent-skill 2.1.3 → 2.2.0

package/README.md CHANGED
@@ -1,82 +1,117 @@
1
1
  <p align="center">
2
- <img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="Telegram image render workflow" width="320" />
2
+ <img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="Sogni Creative Agent Skill rendering an image from a Telegram-style chat" width="320" />
3
3
  </p>
4
4
 
5
- # Sogni Creative Agent Skill: Image & Video Generation for Agents
5
+ <h1 align="center">Sogni Creative Agent Skill</h1>
6
6
 
7
- **Sogni Creative Agent Skill** gives AI agent runtimes such as Claude Code,
8
- [OpenClaw](https://github.com/OpenClaw/OpenClaw),
9
- [Hermes Agent](https://hermes-agent.nousresearch.com/),
10
- [Manus AI](https://manus.im), and more — image generation, video generation, and
11
- creative-media tools powered by [Sogni AI](https://sogni.ai)'s decentralized GPU
12
- network.
7
+ <p align="center">Image, video, and music generation for AI agents powered by <a href="https://sogni.ai">Sogni AI</a>'s decentralized GPU network.</p>
13
8
 
14
- Drop it into the setup you already have:
15
- - as a standalone Node.js CLI
16
- - as a skill source for **Hermes Agent**, **Manus AI**, and other agent frameworks
17
- - as an [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin
9
+ <p align="center">
10
+ <a href="https://www.npmjs.com/package/@sogni-ai/sogni-creative-agent-skill"><img alt="npm" src="https://img.shields.io/npm/v/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
11
+ <a href="https://www.npmjs.com/package/@sogni-ai/sogni-creative-agent-skill"><img alt="downloads" src="https://img.shields.io/npm/dm/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
12
+ <img alt="node" src="https://img.shields.io/node/v/@sogni-ai/sogni-creative-agent-skill.svg" />
13
+ <a href="./LICENSE"><img alt="license" src="https://img.shields.io/npm/l/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
14
+ </p>
15
+
16
+ ---
17
+
18
+ **Sogni Creative Agent Skill** plugs into the agent runtime you already use — Claude Code, [OpenClaw](https://github.com/OpenClaw/OpenClaw), [Hermes Agent](https://hermes-agent.nousresearch.com/), [Manus AI](https://manus.im), and others — and gives it production-quality image, video, and music generation through a single CLI: `sogni-agent`.
19
+
20
+ It ships three ways:
21
+
22
+ - a standalone Node.js CLI (`sogni-agent`)
23
+ - a skill source that any [`SKILL.md`](./SKILL.md)-aware agent can load
24
+ - a published [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin
18
25
 
19
- For install requests, use the CLI plus skill setup by default.
26
+ With this skill, an agent can:
20
27
 
21
- With Sogni Creative Agent Skill, an agent can:
22
- - generate images from prompts
23
- - edit and restyle existing images
24
- - create videos from text, images, audio, or reference video
28
+ - generate images from prompts and edit/restyle existing images
29
+ - create videos from text, images, audio, or reference video (LTX-2.3, WAN 2.2, Seedance 2.0)
30
+ - generate instrumental music or full songs with lyrics
31
+ - run hosted creative workflows including storyboard-driven video
25
32
  - save personas, preferences, and last-render state across sessions
26
33
  - check balances, list models, and refine previous results
27
34
 
35
+ > **Fastest install:** paste this repo's GitHub URL into your agent and ask it to "install this skill".
36
+
37
+ ---
38
+
39
+ ## Table of Contents
40
+
41
+ - [Quick Start](#quick-start)
42
+ - [Requirements](#requirements)
43
+ - [Installation](#installation)
44
+ - [Node CLI (default)](#node-cli-default)
45
+ - [OpenClaw plugin](#openclaw-plugin)
46
+ - [Hermes Agent / Manus / other frameworks](#hermes-agent--manus--other-frameworks)
47
+ - [Manual install from source](#manual-install-from-source)
48
+ - [Upgrading safely from inside an agent](#upgrading-safely-from-inside-an-agent)
49
+ - [Setup (Sogni API key)](#setup-sogni-api-key)
50
+ - [Usage](#usage)
51
+ - [CLI Reference](#cli-reference)
52
+ - [Common options](#common-options)
53
+ - [Quality presets](#quality-presets)
54
+ - [Recommended models](#recommended-models)
55
+ - [Video Sizing & Aspect Ratios](#video-sizing--aspect-ratios)
56
+ - [LTX-2.3 Prompting Guide](#ltx-23-prompting-guide)
57
+ - [Photobooth (Face Transfer)](#photobooth-face-transfer)
58
+ - [Personas, Memory, and Personality](#personas-memory-and-personality)
59
+ - [Hosted API Modes](#hosted-api-modes)
60
+ - [Dynamic Prompt Variations](#dynamic-prompt-variations)
61
+ - [Token Auto-Fallback](#token-auto-fallback)
62
+ - [Error Reporting & Output](#error-reporting--output)
63
+ - [For AI Agents](#for-ai-agents)
64
+ - [Development](#development)
65
+ - [License](#license)
66
+
67
+ ---
68
+
28
69
  ## Quick Start
29
70
 
30
- 1. Create Sogni credentials once. See [Setup](#setup).
31
- 2. Install the command-line tool:
71
+ 1. Get a Sogni API key from [dashboard.sogni.ai](https://dashboard.sogni.ai) (click your username) and save it — see [Setup](#setup-sogni-api-key).
72
+ 2. Install the CLI:
32
73
 
33
- ```bash
34
- npm install -g @sogni-ai/sogni-creative-agent-skill@latest
35
- sogni-agent --version
36
- ```
74
+ ```bash
75
+ npm install -g @sogni-ai/sogni-creative-agent-skill@latest
76
+ sogni-agent --version
77
+ ```
37
78
 
38
- 3. Point your agent/runtime at this repository's [`SKILL.md`](./SKILL.md).
79
+ 3. Point your agent runtime at this repository's [`SKILL.md`](./SKILL.md).
80
+
81
+ Then ask your agent to do something:
39
82
 
40
- Then ask your agent to do something simple, for example:
41
83
  - "Generate an image of a sunset over mountains"
42
84
  - "Edit this image to add a rainbow"
43
85
  - "Make a video of a cat playing piano"
86
+ - "Generate a 30 second synthwave product-launch theme"
44
87
  - "Turn my selfie into James Bond using photobooth"
45
88
  - "Refine the last image at higher quality"
46
89
 
47
- ## Installation
90
+ ---
48
91
 
49
- Default install behavior: when someone asks to install Sogni Creative Agent Skill, install the command-line tool and skill source.
92
+ ## Requirements
50
93
 
51
- ```bash
52
- npm install -g @sogni-ai/sogni-creative-agent-skill@latest
53
- sogni-agent --version
54
- ```
94
+ - **Node.js ≥ 22.11.0**
95
+ - **Sogni API key** ([dashboard.sogni.ai](https://dashboard.sogni.ai))
96
+ - **`ffmpeg`** *(optional)* — required for local utilities such as `--angles-360-video`, `--concat-videos`, and `--extract-last-frame`. Set `FFMPEG_PATH` to override discovery.
97
+ - macOS, Linux, or Windows
55
98
 
56
- Then point the agent/runtime at this repository's [`SKILL.md`](./SKILL.md).
99
+ ---
57
100
 
58
- ### Agent-Safe Upgrade
101
+ ## Installation
59
102
 
60
- When upgrading from inside an agent runtime, prefer direct package-manager or existing-checkout commands. Avoid asking the agent to build a clone-or-pull shell bootstrap script with `set -e`, `bash -c`, `sh -c`, or an inline repository URL; some sandboxes correctly route those through approval.
103
+ ### Node CLI (default)
61
104
 
62
- For the CLI:
105
+ For most agents and human users:
63
106
 
64
107
  ```bash
65
108
  npm install -g @sogni-ai/sogni-creative-agent-skill@latest
66
109
  sogni-agent --version
67
110
  ```
68
111
 
69
- For an existing local checkout:
112
+ Then point your agent/runtime at this repository's [`SKILL.md`](./SKILL.md). When an install request is ambiguous, install the CLI and skill source together — that's the supported default.
70
113
 
71
- ```bash
72
- DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
73
- git -C "$DEST" pull --ff-only
74
- npm --prefix "$DEST" install
75
- ```
76
-
77
- If the checkout is missing, use the npm install path above or explicitly approve a clone.
78
-
79
- ### OpenClaw Plugin
114
+ ### OpenClaw plugin
80
115
 
81
116
  For the published plugin:
82
117
 
@@ -86,7 +121,7 @@ openclaw plugins install sogni-creative-agent-skill
86
121
 
87
122
  The installed plugin loads its behavior from [`SKILL.md`](./SKILL.md) via [`openclaw.plugin.json`](./openclaw.plugin.json).
88
123
 
89
- For a local checkout that you want to update continuously, link the minimal OpenClaw surface instead of the repository root:
124
+ For a local checkout that you want to update continuously, link the minimal OpenClaw surface (`.openclaw-link/`) rather than the repository root, which contains development tests that OpenClaw correctly blocks during plugin safety scanning:
90
125
 
91
126
  ```bash
92
127
  cd /path/to/sogni-creative-agent-skill
@@ -97,7 +132,7 @@ openclaw plugins install -l "$PWD/.openclaw-link"
97
132
  openclaw gateway restart
98
133
  ```
99
134
 
100
- To update that linked install later:
135
+ To update the linked install later:
101
136
 
102
137
  ```bash
103
138
  cd /path/to/sogni-creative-agent-skill
@@ -108,13 +143,17 @@ npm run openclaw:sync
108
143
  openclaw gateway restart
109
144
  ```
110
145
 
111
- Do not run `openclaw plugins install -l "$PWD"` from the repository root. The root contains development tests that use `child_process`, and OpenClaw correctly blocks those during plugin safety scanning. The generated `.openclaw-link/` directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root [`SKILL.md`](./SKILL.md).
146
+ The generated `.openclaw-link/` directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root [`SKILL.md`](./SKILL.md).
147
+
148
+ #### OpenClaw configuration
149
+
150
+ When loaded through OpenClaw, this skill reads plugin defaults from OpenClaw config; CLI flags always override them. The supported config schema is defined in [`openclaw.plugin.json`](./openclaw.plugin.json) and includes default models, video workflow models, hosted API defaults (`apiBaseUrl`, `defaultLlmModel`, `defaultApiToolMode`), token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
112
151
 
113
- ### Hermes Agent / Manus / Other Frameworks
152
+ ### Hermes Agent / Manus / other frameworks
114
153
 
115
- Point the agent to this repository's [`SKILL.md`](./SKILL.md) for behavior guidance and [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt) for install/setup help. By default, the agent should invoke the globally installed `sogni-agent` CLI.
154
+ Point the agent at this repository's [`SKILL.md`](./SKILL.md) for behavior guidance and [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt) for install/setup help. The agent should invoke the globally installed `sogni-agent` CLI by default.
116
155
 
117
- ### Manual Installation
156
+ ### Manual install from source
118
157
 
119
158
  ```bash
120
159
  gh repo clone Sogni-AI/sogni-creative-agent-skill
@@ -122,45 +161,57 @@ cd sogni-creative-agent-skill
122
161
  npm install
123
162
  ```
124
163
 
125
- ### Maintainer Runtime Sync
164
+ ### Upgrading safely from inside an agent
126
165
 
127
- This public skill keeps CLI/runtime glue here, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. With both repos checked out as siblings, refresh the generated runtime before publishing:
166
+ When upgrading from inside an agent runtime, prefer direct package-manager or existing-checkout commands. Avoid asking the agent to build a clone-or-pull shell bootstrap script with `set -e`, `bash -c`, `sh -c`, or an inline repository URL; some sandboxes correctly route those through approval, and the install will stall.
167
+
168
+ For a global CLI:
128
169
 
129
170
  ```bash
130
- npm run sync:creative-agent-runtime
171
+ npm install -g @sogni-ai/sogni-creative-agent-skill@latest
172
+ sogni-agent --version
131
173
  ```
132
174
 
133
- `npm test` runs `npm run check:creative-agent-runtime` first, which regenerates this file and fails if it differs from the committed copy.
175
+ For an existing local checkout:
134
176
 
135
- The generated file is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
177
+ ```bash
178
+ DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
179
+ git -C "$DEST" pull --ff-only
180
+ npm --prefix "$DEST" install
181
+ ```
136
182
 
137
- ### Advanced OpenClaw Config
183
+ If the checkout is missing, use the npm install path above or explicitly approve a clone.
138
184
 
139
- When loaded through OpenClaw, Sogni Creative Agent Skill reads plugin defaults from OpenClaw config. CLI flags always override those defaults.
185
+ ---
140
186
 
141
- The supported config shape is defined in [`openclaw.plugin.json`](./openclaw.plugin.json). Common overrides include default models, video workflow models, token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
187
+ ## Setup (Sogni API key)
142
188
 
143
- ## Setup
189
+ 1. Get your API key from [dashboard.sogni.ai](https://dashboard.sogni.ai) (click your username).
190
+ 2. Save it to a credentials file:
144
191
 
145
- 1. Create a Sogni account at https://app.sogni.ai/
146
- 2. Create credentials file:
192
+ ```bash
193
+ mkdir -p ~/.config/sogni
194
+ cat > ~/.config/sogni/credentials << 'EOF'
195
+ SOGNI_API_KEY=your_api_key
196
+ EOF
197
+ chmod 600 ~/.config/sogni/credentials
198
+ ```
147
199
 
148
- ```bash
149
- mkdir -p ~/.config/sogni
150
- cat > ~/.config/sogni/credentials << 'EOF'
151
- SOGNI_API_KEY=your_api_key
152
- # or:
153
- # SOGNI_USERNAME=your_username
154
- # SOGNI_PASSWORD=your_password
155
- EOF
156
- chmod 600 ~/.config/sogni/credentials
157
- ```
200
+ You can also skip the file and export `SOGNI_API_KEY` in your environment.
201
+
202
+ ### Filesystem path overrides
158
203
 
159
- You can also skip the file and set `SOGNI_API_KEY`, or `SOGNI_USERNAME` + `SOGNI_PASSWORD`, in your environment.
204
+ Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality. Override individual paths with:
160
205
 
161
- ### Filesystem Paths and Overrides
206
+ | Variable | Purpose |
207
+ |----------|---------|
208
+ | `SOGNI_CREDENTIALS_PATH` | Custom credentials file |
209
+ | `SOGNI_LAST_RENDER_PATH` | Where last-render state is persisted |
210
+ | `SOGNI_MEDIA_INBOUND_DIR` | Directory used by `--list-media` |
211
+ | `OPENCLAW_CONFIG_PATH` | OpenClaw config file location |
212
+ | `FFMPEG_PATH` | Custom `ffmpeg` binary |
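The credentials file and the `SOGNI_CREDENTIALS_PATH` override can be pictured with a small sketch. This is not the CLI's actual loader: `credentialsPath`, `parseCredentials`, and `resolveApiKey` are hypothetical names, and the rule that an exported `SOGNI_API_KEY` wins over the file is an assumption the README does not state.

```javascript
// Hypothetical helpers; the CLI's real loader may behave differently.

// Use SOGNI_CREDENTIALS_PATH when set, else the default location from the README.
function credentialsPath(env, homeDir) {
  return env.SOGNI_CREDENTIALS_PATH || `${homeDir}/.config/sogni/credentials`;
}

// Parse KEY=value lines, skipping blank lines and "#" comments.
function parseCredentials(text) {
  const values = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=value pair
    values[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return values;
}

// Assumption: an exported SOGNI_API_KEY takes precedence over the file.
function resolveApiKey(env, fileText) {
  return env.SOGNI_API_KEY || parseCredentials(fileText).SOGNI_API_KEY || null;
}
```

For example, `resolveApiKey({}, "SOGNI_API_KEY=abc")` yields `"abc"` under these assumptions. The path join uses `/` for brevity; a portable version would use Node's `path.join`.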
162
213
 
163
- Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality. Advanced path overrides are available through `SOGNI_CREDENTIALS_PATH`, `SOGNI_LAST_RENDER_PATH`, `SOGNI_MEDIA_INBOUND_DIR`, and `OPENCLAW_CONFIG_PATH`.
214
+ ---
164
215
 
165
216
  ## Usage
166
217
 
@@ -174,14 +225,14 @@ sogni-agent -c subject.jpg "add a neon cyberpunk glow"
174
225
  # Photobooth face transfer
175
226
  sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
176
227
 
177
- # Text-to-video (t2v)
178
- sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
228
+ # Text-to-video (t2v) with native dialogue
229
+ sogni-agent --video 'A narrator says "welcome to the story" as ocean waves crash'
179
230
 
180
- # Short-side targeting preserves the current shape without forcing landscape
231
+ # Short-side resolution targeting (preserves the inherited aspect ratio)
181
232
  sogni-agent --video --target-resolution 768 \
182
233
  "A calm cinematic shot of lanterns drifting across a night lake"
183
234
 
184
- # Seedance 2.0 explicit aliases (4-15s vendor video path)
235
+ # Seedance 2.0 (4-15s vendor video path with native audio)
185
236
  sogni-agent --video -m seedance2 --duration 8 \
186
237
  "A polished product reveal with native ambient sound"
187
238
 
@@ -195,15 +246,36 @@ sogni-agent --video -m seedance2 --workflow t2v \
195
246
  # Image-to-video (i2v)
196
247
  sogni-agent --video --ref cat.jpg "gentle camera pan"
197
248
 
198
- # Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
249
+ # Image+audio-to-video (auto-routes to LTX-2.3 ia2v)
199
250
  sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
200
251
  "music video with synchronized motion"
201
252
 
202
- # Persona or voice identity with LTX native audio
253
+ # Direct music generation
254
+ sogni-agent --music --duration 30 \
255
+ "uplifting cinematic synthwave theme for a product launch"
256
+
257
+ # Song with lyrics and musical controls
258
+ sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
259
+ --keyscale "C major" --output-format mp3 "bright indie pop chorus"
260
+
261
+ # LTX-2.3 voice identity / persona
203
262
  sogni-agent --video --reference-audio-identity voice.webm \
204
- "NARRATOR: \"This is my voice.\""
263
+ 'NARRATOR: "This is my voice."'
264
+
265
+ # Hosted chat with rich creative-agent tools (/v1/chat/completions)
266
+ sogni-agent --api-chat \
267
+ "Create a 4-shot product video concept for a red sneaker"
205
268
 
206
- # Segment a source video, then stitch clips locally with an external soundtrack
269
+ # Durable hosted workflow (/v1/creative-agent/workflows)
270
+ sogni-agent --api-workflow image-to-video \
271
+ --video-prompt "The camera slowly pushes in as the sketch comes alive" \
272
+ "A graphite robot sketch on a drafting table"
273
+
274
+ # Storyline -> GPT Image 2 storyboard sheet -> Seedance video sequence
275
+ sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
276
+ "Create a 9:16 bakery launch video with a neon street-window reveal"
277
+
278
+ # Local segment + concat with external soundtrack
207
279
  sogni-agent --video --workflow v2v --ref-video dance.mp4 \
208
280
  --video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
209
281
  "robot dancing"
@@ -215,114 +287,137 @@ sogni-agent --balance
215
287
  sogni-agent --help
216
288
  ```
217
289
 
218
- For local multi-clip workflows, prefer the built-in FFmpeg wrappers over raw shell commands. `--video-start`, `--audio-start`, and `--audio-duration` let you generate focused segments, while `--concat-videos` can stitch them and optionally mux a single soundtrack with `--concat-audio`.
219
-
220
- V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance also accepts public HTTPS image, video, and audio references; audio references must be paired with an image or video reference.
221
-
222
- ## LTX-2.3 Prompting Guide
290
+ > Prefer `.webm`, `.m4a`, or `.mp3` voice clips. Local `.wav` clips are normalized to `.m4a` before upload when `ffmpeg` is available.
291
+ >
292
+ > For local multi-clip workflows, use the built-in FFmpeg wrappers (`--video-start`, `--audio-start`, `--audio-duration`, `--concat-videos`, `--concat-audio`) over raw shell commands; they produce safer, more reproducible results.
223
293
 
224
- When you use `ltx23-22b-fp8_t2v_distilled`, do not feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
294
+ ---
225
295
 
226
- - Write one unbroken paragraph with no line breaks, bullets, headers, or tag blocks.
227
- - Use 4-8 flowing present-tense sentences describing one continuous shot, not a montage.
228
- - Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
229
- - Keep characters and objects concrete and stable. Describe one main action thread from start to finish.
230
- - If the user wants dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
231
- - Express mood through visible behavior, motion, and sound cues instead of vague adjectives.
232
- - Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and generic filler words like "beautiful" or "nice".
233
- - Match scene density to clip length. For the default short clips, describe one main beat rather than several unrelated actions.
296
+ ## CLI Reference
234
297
 
235
- Example rewrite:
298
+ Run `sogni-agent --help` for the full CLI. Below are the options and tables most agents and users reach for first.
236
299
 
237
- ```text
238
- User ask: "make a 4k video of a woman in a neon alley"
300
+ ### Common options
239
301
 
240
- LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
241
- ```
302
+ | Option | Use |
303
+ |--------|-----|
304
+ | `-Q fast\|hq\|pro` | Pick image quality without memorizing model IDs |
305
+ | `-o <path>` | Save output locally |
306
+ | `-c <path>` | Provide image context for edits |
307
+ | `--video` | Generate video instead of image |
308
+ | `--music` | Generate music/audio instead of image |
309
+ | `--lyrics`, `--bpm`, `--keyscale`, `--timesig` | Music generation controls |
310
+ | `--ref`, `--ref-audio`, `--ref-video` | Image/audio/video references; HTTPS refs are forwarded as URL context for Seedance |
311
+ | `--target-resolution <px>` | Target the short side, preserving aspect ratio |
312
+ | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
313
+ | `--api-chat` | Use `/v1/chat/completions` with Sogni creative-agent tools |
314
+ | `--api-workflow <kind>` | Start a `/v1/creative-agent/workflows` durable workflow: `image-to-video`, `hosted-tool-sequence`, or `storyboard-video` |
315
+ | `--workflow-input <json\|path\|@path>` | Explicit hosted workflow input JSON |
316
+ | `--storyboard-frames <n>` | Beat count for `--api-workflow storyboard-video` |
317
+ | `--video-prompt`, `--negative-prompt`, `--generate-audio`, `--expand-prompt` | Durable image-to-video workflow inputs |
318
+ | `--watch-workflow`, `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Manage durable workflows |
319
+ | `--api-tools <mode>`, `--no-api-tool-execution`, `--llm-model <id>`, `--api-base-url <url>` | Tune hosted API requests |
320
+ | `--persona <name>` | Use a saved persona |
321
+ | `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
322
+ | `--last`, `--last-image` | Inspect last render / reuse last image as context or video reference |
323
+ | `--strict-size` | Fail instead of auto-adjusting video size |
324
+ | `--json` | Emit structured output for agents |
242
325
 
243
- ## Photobooth (Face Transfer)
326
+ ### Quality presets
244
327
 
245
- Generate stylized portraits from a face photo using InstantID ControlNet:
328
+ Skip remembering model IDs: `--quality` / `-Q` selects the right model, steps, and dimensions for image generation.
246
329
 
247
- ```bash
248
- sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
249
- sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
250
- ```
330
+ | Preset | Model | Steps | Size | Speed |
331
+ |--------|-------|-------|------|-------|
332
+ | `fast` | `z_image_turbo_bf16` | 8 | 512×512 | ~5–10s |
333
+ | `hq` | `z_image_turbo_bf16` | default | 768×768 | ~10–15s |
334
+ | `pro` | `flux2_dev_fp8` | 40 | 1024×1024 | ~2 min |
251
335
 
252
- Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Cannot be combined with `--video` or `-c/--context`.
336
+ Explicit `--model` overrides the preset's model. Explicit `-w`/`-h` overrides dimensions.
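The override rule can be sketched as a lookup followed by a merge. The preset values are copied from the table; `PRESETS` and `resolveImageSettings` are hypothetical names, and keeping the preset's step count when the model is overridden is an assumption, since the README only says that the model and dimensions are replaced.

```javascript
// Preset values copied from the README table; null means "model default".
const PRESETS = {
  fast: { model: "z_image_turbo_bf16", steps: 8, width: 512, height: 512 },
  hq: { model: "z_image_turbo_bf16", steps: null, width: 768, height: 768 },
  pro: { model: "flux2_dev_fp8", steps: 40, width: 1024, height: 1024 },
};

// Hypothetical resolver: explicit flags win, the preset fills the rest.
function resolveImageSettings({ quality, model, width, height }) {
  const preset = PRESETS[quality];
  if (!preset) throw new Error(`unknown quality preset: ${quality}`);
  return {
    model: model ?? preset.model, // --model overrides the preset's model
    steps: preset.steps, // assumption: steps stay tied to the preset
    width: width ?? preset.width, // -w overrides width
    height: height ?? preset.height, // -h overrides height
  };
}
```

Calling `resolveImageSettings({ quality: "pro", width: 1536 })` keeps `flux2_dev_fp8` and the 1024 px height while widening the image to 1536 px.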
253
337
 
254
- Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA.
255
- `--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
256
- `--balance` / `--balances` does not require a prompt and exits after printing current `SPARK` and `SOGNI` balances.
338
+ ### Recommended models
257
339
 
258
- ## Video Sizing Rules (Aspect Ratios)
340
+ Prefer `-Q fast|hq|pro` for images and automatic workflow routing for video. Pass `-m` only when you need a specific model family.
259
341
 
260
- - WAN models use dimensions divisible by 16, min 480px, max 1536px.
261
- - LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048px on the long side.
262
- - Seedance runs at fixed 24fps and supports 4-15s durations. Other default/WAN video paths support up to 10s; LTX and WAN animate workflows support up to 20s.
263
- - The script auto-normalizes video sizes to satisfy those constraints.
264
- - Use `--target-resolution <px>` for bare resolution requests such as "720p" when the user did not specify exact pixels. It targets the short side and preserves the inherited aspect ratio.
265
- - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with a strict aspect-fit (`fit: inside`) and then uses the *resized reference dimensions* as the final video size. Because that resize uses rounding, a “valid” requested size can still produce an invalid final size (example: `1024x1536` requested, but ref becomes `1024x1535`).
266
- - `sogni-agent` detects this for local refs and will auto-adjust the requested size to a nearby safe size so the resized reference matches the model divisor.
267
- - If you want the script to fail instead of auto-adjusting, pass `--strict-size` and it will print a suggested size.
342
+ | Need | Recommended selector |
343
+ |------|----------------------|
344
+ | Default images | `z_image_turbo_bf16` |
345
+ | OpenAI GPT Image generation, editing, or strong text rendering | `gpt-image-2` |
346
+ | Highest-quality images | `flux2_dev_fp8` (or `-Q pro`) |
347
+ | Image editing | `qwen_image_edit_2511_fp8_lightning` |
348
+ | Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
349
+ | Direct music generation | `ace_step_1.5_turbo` (or `--music-model turbo`) |
350
+ | Music with stronger lyric handling | `ace_step_1.5_sft` (or `--music-model sft`) |
351
+ | Text-to-video with native dialogue/audio | `ltx23-22b-fp8_t2v_distilled` |
352
+ | Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
353
+ | Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
354
+ | Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |
355
+ | Seedance text-to-video | `seedance2` or `seedance2-fast` |
356
+ | Seedance video-to-video without ControlNet | `seedance2-v2v` |
357
+ | Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |
268
358
 
269
- ## Error Reporting
359
+ `gpt-image-2` supports flexible OpenAI image sizes up to 3840 px on either edge, max 3:1 aspect ratio, and total pixels from 655,360 to 8,294,400; the API snaps dimensions to valid multiples of 16. For image editing with `gpt-image-2`, you can pass up to 16 context images.
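A pre-flight check for the limits above might look like the following sketch. `checkGptImage2Size` is a hypothetical helper, not part of the CLI, and it deliberately leaves the multiple-of-16 snapping to the API as the README describes.

```javascript
// Limits taken from the README paragraph above.
const GPT_IMAGE_2_LIMITS = {
  maxEdge: 3840,
  maxAspect: 3,
  minPixels: 655360,
  maxPixels: 8294400,
};

// Hypothetical validator: returns a list of human-readable problems.
function checkGptImage2Size(width, height) {
  const problems = [];
  const longSide = Math.max(width, height);
  const shortSide = Math.min(width, height);
  if (longSide > GPT_IMAGE_2_LIMITS.maxEdge) {
    problems.push(`edge ${longSide} exceeds ${GPT_IMAGE_2_LIMITS.maxEdge} px`);
  }
  if (longSide / shortSide > GPT_IMAGE_2_LIMITS.maxAspect) {
    problems.push("aspect ratio exceeds 3:1");
  }
  const pixels = width * height;
  if (pixels < GPT_IMAGE_2_LIMITS.minPixels || pixels > GPT_IMAGE_2_LIMITS.maxPixels) {
    problems.push(`total pixels ${pixels} outside the allowed range`);
  }
  return problems;
}
```

An empty array means the requested size fits the documented limits; for instance, 3840×2160 sits exactly at the 8,294,400-pixel ceiling and passes.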
270
360
 
271
- Failures use a non-zero exit code and human-readable stderr. Add `--json` when an agent needs structured success/error output.
361
+ Music generation uses `--music` and outputs `mp3` by default. `--audio` remains the video-reference alias for `--ref-audio`; use `--music` or `--generate-music` for direct audio-only generation.
272
362
 
273
- ## Options
363
+ ---
274
364
 
275
- Run `sogni-agent --help` for the complete CLI. These are the options most agents should reach for first:
365
+ ## Video Sizing & Aspect Ratios
276
366
 
277
- | Option | Use |
278
- |--------|-----|
279
- | `-Q fast|hq|pro` | Pick image quality without memorizing model IDs |
280
- | `-o <path>` | Save output locally |
281
- | `-c <path>` | Provide image context for edits |
282
- | `--video` | Generate video instead of image |
283
- | `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references; Seedance HTTPS references are forwarded as URL context |
284
- | `--target-resolution <px>` | Target the short side while preserving aspect ratio |
285
- | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
286
- | `--persona <name>` | Use a saved persona reference |
287
- | `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
288
- | `--json` | Return structured output for agents |
367
+ - **WAN models** use dimensions divisible by 16, min 480 px, max 1536 px.
368
+ - **LTX family** (`ltx2-*`, `ltx23-*`) uses dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048 px on the long side.
369
+ - **Seedance** runs at fixed 24 fps and supports 4–15 s durations. Other default/WAN paths support up to 10 s; LTX and WAN animate workflows support up to 20 s.
370
+ - The script auto-normalizes video sizes to satisfy these constraints.
371
+ - Use `--target-resolution <px>` for bare resolution requests like "720p" — it targets the short side and preserves the inherited aspect ratio.
372
+ - Natural-language aspect requests like "portrait", "square", "16:9", or "9:16" are inferred when width/height aren't explicitly set. Combined requests like "720p 9:16" keep the requested short side while applying the requested shape.
373
+ - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with strict aspect-fit (`fit: inside`) and uses the *resized* dimensions as the final video size. Because that resize uses rounding, a "valid" requested size can still produce an invalid final size (example: `1024×1536` requested, but ref becomes `1024×1535`). `sogni-agent` detects this for local refs and auto-adjusts to a nearby safe size.
374
+ - Pass `--strict-size` to fail instead; the script will print a suggested size.
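The sizing rules in this list can be approximated with a short sketch. It is not the wrapper's real code: `scaleToShortSide`, `snap`, and `normalizeVideoSize` are hypothetical names, rounding to the nearest multiple is an assumed strategy, and the LTX minimum is an assumption because the README does not give one.

```javascript
// Hypothetical sketch; the rounding strategy is an assumption.
const SIZE_RULES = {
  wan: { divisor: 16, min: 480, max: 1536 },
  // The README gives no LTX minimum; one divisor step is assumed here.
  ltx: { divisor: 64, min: 64, max: 2048 },
};

// Scale so the short side equals `target`, keeping the aspect ratio.
function scaleToShortSide(width, height, target) {
  const factor = target / Math.min(width, height);
  return {
    width: Math.round(width * factor),
    height: Math.round(height * factor),
  };
}

// Round to the nearest multiple of the divisor, then clamp into range.
function snap(value, { divisor, min, max }) {
  const lowest = Math.ceil(min / divisor) * divisor;
  const highest = Math.floor(max / divisor) * divisor;
  const rounded = Math.round(value / divisor) * divisor;
  return Math.min(highest, Math.max(lowest, rounded));
}

function normalizeVideoSize(width, height, family) {
  const rule = SIZE_RULES[family];
  if (!rule) throw new Error(`unknown model family: ${family}`);
  return { width: snap(width, rule), height: snap(height, rule) };
}
```

With these assumptions, the `1024×1535` case from the list normalizes back to `1024×1536` for WAN. Clamping each side independently can shift the aspect ratio for extreme inputs, which a real implementation would likely handle more carefully.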
289
375
 
290
- ### Quality Presets
376
+ V2V defaults mirror Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist; `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance accepts public HTTPS image, video, and audio references that pass CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
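One way to picture the URL safety check is the sketch below. It is an illustration only: `isPublicHttpsUrl` and `isPrivateIPv4` are hypothetical names, the blocked ranges are a common-sense guess at "localhost and private-network", and the CLI's actual checks are not shown in this README.

```javascript
// Hypothetical check; the CLI's real rules may be stricter (e.g. DNS lookups).
function isPrivateIPv4(host) {
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 || // "this" network, private, loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // link-local
  );
}

function isPublicHttpsUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not parseable as a URL
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) {
    // IPv6 literal: reject loopback, unspecified, unique-local, link-local.
    const v6 = host.slice(1, -1);
    return !(v6 === "::1" || v6 === "::" || /^f[cd]/.test(v6) || /^fe[89ab]/.test(v6));
  }
  return !isPrivateIPv4(host);
}
```

A hostname that resolves to a private address would still pass this purely syntactic check, so treat it as a first filter rather than a complete defense.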
291
377
 
292
- Instead of remembering model IDs, use `--quality` / `-Q` to auto-select the right model, steps, and dimensions:
378
+ ---
293
379
 
294
- | Preset | Model | Steps | Size | Speed |
295
- |--------|-------|-------|------|-------|
296
- | `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
297
- | `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
298
- | `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
380
+ ## LTX-2.3 Prompting Guide
299
381
 
300
- Explicit `--model` overrides the quality preset's model. Explicit `-w`/`-h` overrides dimensions.
382
+ When you use `ltx23-22b-fp8_t2v_distilled`, do **not** feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
301
383
 
302
- ### Dynamic Prompt Variations
384
+ - Write one unbroken paragraph — no line breaks, bullets, headers, or tag blocks.
385
+ - Use 4–8 flowing present-tense sentences describing one continuous shot, not a montage.
386
+ - Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
387
+ - Keep characters and objects concrete and stable; describe one main action thread from start to finish.
388
+ - For dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
389
+ - Express mood through visible behavior, motion, and sound cues — not vague adjectives.
390
+ - Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and filler words like "beautiful" or "nice".
391
+ - Match scene density to clip length. For short clips, describe one main beat, not several actions.
303
392
 
304
- Generate diverse images in a single call using `{option1|option2|option3}` syntax:
393
+ **Example rewrite:**
305
394
 
306
- ```bash
307
- # Generates 3 images: "a red car", "a blue car", "a green car"
308
- sogni-agent -n 3 "a {red|blue|green} car"
395
+ ```text
396
+ User ask: "make a 4k video of a woman in a neon alley"
309
397
 
310
- # Multiple variation groups cycle independently
311
- sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
312
- # → "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
398
+ LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
313
399
  ```
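
A few of the checklist items above are mechanical enough to lint before sending a prompt. The sketch below is a rough illustration, not part of `sogni-agent`: the function name, the naive sentence splitter, and the two-word filler list are assumptions.

```python
import re

FILLER_WORDS = {"beautiful", "nice"}  # illustrative list taken from the guide


def lint_ltx_prompt(prompt):
    """Return a list of problems found in an LTX-2.3 prompt (empty if none)."""
    problems = []
    if "\n" in prompt:
        problems.append("contains line breaks")
    # Naive sentence split: whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]
    if not 4 <= len(sentences) <= 8:
        problems.append(f"{len(sentences)} sentences, expected 4-8")
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    hits = sorted(FILLER_WORDS & words)
    if hits:
        problems.append("filler words: " + ", ".join(hits))
    return problems


print(lint_ltx_prompt("cinematic drone shot over tropical cliffs"))
```

A short tag prompt like the one in the last line fails the sentence-count check, while a four-sentence single-paragraph description passes. Quoted dialogue that contains sentence punctuation will be over-counted by this splitter.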
314
400
 
315
- Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images with the same prompt as before.
401
+ ---
316
402
 
317
- ### Token Auto-Fallback
403
+ ## Photobooth (Face Transfer)
318
404
 
319
- Use `--token-type auto` to automatically retry with SOGNI tokens if SPARK balance is insufficient:
405
+ Generate stylized portraits from a face photo using InstantID ControlNet:
320
406
 
321
407
  ```bash
322
- sogni-agent --token-type auto "a dragon eating tacos"
408
+ sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
409
+ sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
323
410
  ```
324
411
 
325
- This tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
412
+ Photobooth uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024×1024 by default. The face image is passed via `--ref` and styled by the prompt. It cannot be combined with `--video` or `-c` / `--context`.
413
+
414
+ Multi-angle mode (`--multi-angle` / `--angles-360`) auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA. `--angles-360-video` generates i2v clips between consecutive angles (including last → first) and concatenates them with `ffmpeg` into a seamless loop.
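
The "consecutive angles, including last → first" pairing is what makes the concatenated clips loop. A minimal sketch of that pairing, with a hypothetical helper name and example angle values that are not taken from the CLI:

```python
def loop_pairs(angles):
    """Pair each angle with the next one, wrapping the last back to the first."""
    count = len(angles)
    return [(angles[i], angles[(i + 1) % count]) for i in range(count)]


# Four hypothetical angles give four clips, the last one closing the loop.
print(loop_pairs([0, 90, 180, 270]))
```

Each pair corresponds to one i2v clip (start frame, end frame), so N angle images yield N clips rather than N - 1.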
415
+
416
+ `--balance` / `--balances` does not require a prompt and prints current `SPARK` and `SOGNI` balances before exiting.
417
+
418
+ ---
419
+
420
+ ## Personas, Memory, and Personality
326
421
 
327
422
  ### Personas
328
423
 
@@ -335,20 +430,20 @@ sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --descriptio
335
430
  # Add with voice clip for video voice cloning
336
431
  sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip voice.webm
337
432
 
338
- # Generate an image using a persona (auto-injects photo as context)
433
+ # Generate using a persona (auto-injects photo as context)
339
434
  sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
340
435
 
341
- # Generate video using a persona photo plus saved voice identity
342
- sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
436
+ # Video using a persona photo + saved voice identity
437
+ sogni-agent --video --persona "Sarah" 'SARAH: "This is my voice."'
343
438
 
344
439
  # List / remove
345
440
  sogni-agent --persona-list
346
441
  sogni-agent --persona-remove "Mark"
347
442
  ```
348
443
 
349
- Personas are stored at `~/.config/sogni/personas/`. Pronouns like "me"/"myself" auto-resolve to the `self` persona. "my wife" resolves to `partner`, etc.
444
+ Stored at `~/.config/sogni/personas/`. Pronouns like "me" / "myself" auto-resolve to the `self` persona; "my wife" resolves to `partner`, etc.
350
445
 
351
- ### Memory (Persistent Preferences)
446
+ ### Memory (persistent preferences)
352
447
 
353
448
  Save preferences that agents respect across sessions:
354
449
 
@@ -361,9 +456,9 @@ sogni-agent --memory-remove preferred_style
361
456
 
362
457
  Stored at `~/.config/sogni/memories.json`.
363
458
 
364
- ### Personality (Custom Agent Instructions)
459
+ ### Personality (custom agent instructions)
365
460
 
366
- Set how the agent should behave:
461
+ Tell the agent how it should behave:
367
462
 
368
463
  ```bash
369
464
  sogni-agent --personality-set "Be concise, always use cinematic lighting"
@@ -373,24 +468,110 @@ sogni-agent --personality-clear
373
468
 
374
469
  Stored at `~/.config/sogni/personality.txt`.
375
470
 
376
- ## Models
471
+ ---
377
472
 
378
- Prefer `-Q fast|hq|pro` for images and automatic workflow routing for video. Only pass `-m` when you need a specific model family.
473
+ ## Hosted API Modes
379
474
 
380
- | Need | Recommended model or alias |
381
- |------|----------------------------|
382
- | Default images | `z_image_turbo_bf16` |
383
- | Highest quality images | `flux2_dev_fp8` or `-Q pro` |
384
- | Image editing | `qwen_image_edit_2511_fp8_lightning` |
385
- | Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
386
- | Text-to-video with native dialogue/audio | `ltx23-22b-fp8_t2v_distilled` |
387
- | Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
388
- | Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
389
- | Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |
390
- | Seedance text-to-video | `seedance2` or `seedance2-fast` |
391
- | Seedance video-to-video without ControlNet | `seedance2-v2v` |
392
- | Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |
475
+ Hosted API modes require `SOGNI_API_KEY`.
476
+
477
+ - **`--api-chat`** targets `/v1/chat/completions` with rich creative-agent tools — best for text-first natural-language workflows. Tune with `--api-tools creative-agent|rich|hosted|none`, `--no-api-tool-execution`, `--llm-model`, and `--system`.
478
+ - **`--api-workflow`** targets `/v1/creative-agent/workflows` for durable, async workflow records with event streaming and cancellation. Supported kinds: `image-to-video`, `hosted-tool-sequence`, and `storyboard-video`.
479
+ - **`--api-workflow storyboard-video`** generates a storyline, creates a single GPT Image 2 storyboard sheet, then passes that artifact into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low/medium/high quality for that storyboard sheet.
480
+ - Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, and `--cancel-workflow`. Use `--workflow-input` to provide exact hosted workflow JSON.
481
+
482
+ Override the API origin with `--api-base-url`, `SOGNI_API_BASE_URL`, or `SOGNI_REST_ENDPOINT`.
483
+ Hosted API credentials are only sent to `https://api.sogni.ai` by default. Add trusted custom
484
+ hosts with `SOGNI_API_ALLOWED_HOSTS`; loopback or non-HTTPS local testing requires
485
+ `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1`.
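
One way to picture the credential rule is as a host allowlist check. The sketch below is an assumption-heavy illustration, not the CLI's implementation: it assumes `SOGNI_API_ALLOWED_HOSTS` is a comma-separated list and that `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1` bypasses the check entirely, neither of which the README specifies.

```python
from urllib.parse import urlparse

DEFAULT_API_HOST = "api.sogni.ai"


def may_send_credentials(base_url, env):
    """Decide whether the API key may be sent to base_url (illustrative rules)."""
    if env.get("SOGNI_ALLOW_UNSAFE_API_BASE_URL") == "1":
        return True  # assumption: explicit opt-in for loopback / non-HTTPS testing
    parts = urlparse(base_url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    allowed = {DEFAULT_API_HOST}
    # Assumption: comma-separated host names.
    extra = env.get("SOGNI_API_ALLOWED_HOSTS", "")
    allowed.update(h.strip().lower() for h in extra.split(",") if h.strip())
    return parts.hostname in allowed


print(may_send_credentials("https://api.sogni.ai", {}))
print(may_send_credentials("http://localhost:3000", {}))
```

Passing the environment as a plain dictionary keeps the check easy to test; in practice it would be called with `os.environ`.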
486
+
487
+ > Uploaded local media still uses the direct CLI path because hosted API modes do not accept CLI `--ref*` media flags for server-side tool execution.
488
+
489
+ ---
490
+
491
+ ## Dynamic Prompt Variations
492
+
493
+ Generate diverse images in a single call with `{option1|option2|option3}` syntax:
494
+
495
+ ```bash
496
+ # 3 images: "a red car", "a blue car", "a green car"
497
+ sogni-agent -n 3 "a {red|blue|green} car"
498
+
499
+ # Multiple groups cycle independently
500
+ sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
501
+ # -> "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
502
+ ```
503
+
504
+ Options cycle sequentially per image. Without `{...}` syntax, `-n` produces multiple images with the same prompt.
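
The cycling rule can be reproduced in a few lines of Python. This is a sketch of the documented behavior, not the CLI's own parser; the function name is hypothetical and nested braces are not handled.

```python
import re

GROUP = re.compile(r"\{([^{}]+)\}")


def expand_variations(prompt, count):
    """Build `count` prompts, cycling each {a|b|c} group by image index."""
    prompts = []
    for index in range(count):
        def pick(match, index=index):
            options = match.group(1).split("|")
            return options[index % len(options)]
        prompts.append(GROUP.sub(pick, prompt))
    return prompts


for line in expand_variations("a {cat|dog} in a {garden|kitchen}", 4):
    print(line)
```

Because every group is indexed by the same image number, two groups of equal length move in lockstep, which is why the four-image example repeats "a cat in a garden" and "a dog in a kitchen" instead of producing all four combinations.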
505
+
506
+ ---
507
+
508
+ ## Token Auto-Fallback
509
+
510
+ Use `--token-type auto` to retry with SOGNI tokens when SPARK is insufficient:
511
+
512
+ ```bash
513
+ sogni-agent --token-type auto "a dragon eating tacos"
514
+ ```
515
+
516
+ Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
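
The fallback order amounts to a small retry loop. The sketch below only illustrates that control flow: `InsufficientBalance`, `render_with_fallback`, and the injected `render` callable are hypothetical names, not part of the CLI or any Sogni SDK.

```python
class InsufficientBalance(Exception):
    """Hypothetical error raised when a token balance is too low."""


def render_with_fallback(render, prompt, order=("spark", "sogni")):
    """Try each token type in order; return (token_type, result) on first success."""
    last_error = None
    for token_type in order:
        try:
            return token_type, render(prompt, token_type)
        except InsufficientBalance as error:
            last_error = error  # remember the failure and try the next token type
    raise last_error


def fake_render(prompt, token_type):
    # Stand-in renderer: pretend the SPARK balance is exhausted.
    if token_type == "spark":
        raise InsufficientBalance("not enough SPARK")
    return f"rendered {prompt!r} with {token_type}"


print(render_with_fallback(fake_render, "a dragon eating tacos"))
```

Only a balance error triggers the fallback in this sketch; any other failure propagates immediately.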
517
+
518
+ ---
519
+
520
+ ## Error Reporting & Output
521
+
522
+ - **Exit codes:** failures exit with a non-zero code and write a human-readable message to stderr.
523
+ - **Structured output:** add `--json` when an agent needs machine-parseable success/error data, or `--last` to inspect the last render.
524
+ - **Output files:** use `-o <path>` to save locally; otherwise the CLI prints a result URL.
525
+ - **Quiet mode:** `-q` / `--quiet` suppresses progress output without changing exit semantics.
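
An agent wrapper might combine the exit code with `--json` roughly as follows. This is a sketch under assumptions: the shape of the JSON payload is not documented here, so it is returned unparsed beyond `json.loads`, and the `run_sogni` helper and its `runner` parameter are hypothetical.

```python
import json
import subprocess


def run_sogni(args, runner=subprocess.run):
    """Run `sogni-agent --json <args>` and return exit status plus parsed payload."""
    proc = runner(["sogni-agent", "--json", *args], capture_output=True, text=True)
    try:
        payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    except json.JSONDecodeError:
        payload = None  # stdout was not valid JSON; rely on exit code and stderr
    return {"ok": proc.returncode == 0, "payload": payload, "stderr": proc.stderr}
```

Calling `run_sogni(["a red car"])` would invoke the installed CLI; the `runner` argument exists so the function can be exercised with a stub when `sogni-agent` is not available.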
526
+
527
+ ---
528
+
529
+ ## For AI Agents
530
+
531
+ This skill is designed to be loaded into agent runtimes as a first-class capability.
532
+
533
+ 1. **Behavior contract — [`SKILL.md`](./SKILL.md)**
534
+ The canonical instructions for how the agent should call `sogni-agent`. Load this as the skill source.
535
+ 2. **Install/setup hints — [`llm.txt`](./llm.txt)**
536
+ A condensed install/setup reference for agents that fetch `llm.txt` over HTTPS:
537
+ `https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt`
538
+ 3. **OpenClaw manifest — [`openclaw.plugin.json`](./openclaw.plugin.json)**
539
+ Plugin metadata, config schema, and defaults for OpenClaw-aware runtimes.
540
+ 4. **Structured output — `--json`**
541
+ Use `--json` for machine-readable success/error payloads. Use `--last` to read the previous render's metadata.
542
+ 5. **Agent-safe install/upgrade**
543
+ Prefer the `npm install -g` and `git -C "$DEST" pull --ff-only` paths above. Avoid generating clone-or-pull bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs — agent sandboxes correctly route those through approval and the install will stall.
544
+ 6. **SSRF / URL safety**
545
+ The CLI runs an SSRF guard ([`ssrf-guard.mjs`](./ssrf-guard.mjs)) before forwarding any HTTP(S) reference to hosted models. Localhost and private-network URLs are rejected; only public HTTPS references are forwarded as Seedance multimodal context.
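
To make the URL-safety rule in item 6 concrete, here is a minimal Python sketch of this kind of check. It is not a port of `ssrf-guard.mjs`: the function name is hypothetical, and it inspects only the URL text, whereas a real guard would also need to resolve DNS names and re-check the resulting addresses.

```python
import ipaddress
from urllib.parse import urlparse


def is_public_https_url(url):
    """Accept only HTTPS URLs whose host is not localhost or a non-public IP literal."""
    parts = urlparse(url)
    host = parts.hostname
    if parts.scheme != "https" or not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; resolution checks are out of scope for this sketch
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


print(is_public_https_url("https://example.com/ref.png"))
print(is_public_https_url("https://192.168.1.5/ref.png"))
```

The first call is accepted and the second is rejected as a private-network address.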
546
+
547
+ ---
548
+
549
+ ## Development
550
+
551
+ The public skill keeps CLI/runtime glue in this repo, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. The generated runtime is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
552
+
553
+ Run the test suite:
554
+
555
+ ```bash
556
+ npm test
557
+ ```
558
+
559
+ `npm test` first runs `npm run check:creative-agent-runtime`, which regenerates the runtime file and fails if it differs from the committed copy.
560
+
561
+ With both repos checked out as siblings, refresh the generated runtime before publishing:
562
+
563
+ ```bash
564
+ npm run sync:creative-agent-runtime
565
+ ```
566
+
567
+ Reusable workflow rules should be added to `../sogni-creative-agent` first, then synced here. Keep storyboard planning, tool argument validation, prompt linting, typed media turn intent, and typed repair/control semantics aligned with `sogni-chat`, `sogni-client`, and `sogni-api` hosted chat/workflow endpoints rather than recreating skill-only regex guards. Prefer generated or copied shared helpers for hosted workflow compilation, schema argument validation, `CreativeTurnPlannerFields` / `classifyMediaTurnIntent()` media-routing contracts, repair-control decisions, and guard telemetry summaries over skill-local guard code — this keeps public-agent behavior close to `/v1/chat/completions` and `/v1/creative-agent/workflows`.
568
+
569
+ Public-skill regex should stay limited to CLI argument/fact extraction such as file paths, URLs, extensions, dimensions, durations, and explicit positions. Hosted-style decisions such as latest-video continuation, uploaded-video modification, image-selection waits, stitch-after-batch state, and repair/control routing belong upstream in typed planner/runtime fields before they are synced here.
570
+
571
+ Issues and feature requests: [github.com/Sogni-AI/sogni-creative-agent-skill/issues](https://github.com/Sogni-AI/sogni-creative-agent-skill/issues).
572
+
573
+ ---
393
574
 
394
575
  ## License
395
576
 
396
- MIT
577
+ [MIT](./LICENSE) © Sogni AI