@sogni-ai/sogni-creative-agent-skill 2.1.2 → 2.2.0

package/README.md CHANGED
@@ -1,61 +1,117 @@
1
1
  <p align="center">
2
- <img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="Telegram image render workflow" width="320" />
2
+ <img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="Sogni Creative Agent Skill rendering an image from a Telegram-style chat" width="320" />
3
3
  </p>
4
4
 
5
- # Sogni Creative Agent Skill: Image & Video Generation for Agents
5
+ <h1 align="center">Sogni Creative Agent Skill</h1>
6
6
 
7
- **Sogni Creative Agent Skill** gives AI agent runtimes such as Claude Code,
8
- [OpenClaw](https://github.com/OpenClaw/OpenClaw),
9
- [Hermes Agent](https://hermes-agent.nousresearch.com/),
10
- [Manus AI](https://manus.im), and more — image generation, video generation, and
11
- creative-media tools powered by [Sogni AI](https://sogni.ai)'s decentralized GPU
12
- network.
7
+ <p align="center">Image, video, and music generation for AI agents powered by <a href="https://sogni.ai">Sogni AI</a>'s decentralized GPU network.</p>
13
8
 
14
- Drop it into the setup you already have:
15
- - as a standalone Node.js CLI
16
- - as a skill source for **Hermes Agent**, **Manus AI**, and other agent frameworks
17
- - as an [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin
9
+ <p align="center">
10
+ <a href="https://www.npmjs.com/package/@sogni-ai/sogni-creative-agent-skill"><img alt="npm" src="https://img.shields.io/npm/v/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
11
+ <a href="https://www.npmjs.com/package/@sogni-ai/sogni-creative-agent-skill"><img alt="downloads" src="https://img.shields.io/npm/dm/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
12
+ <img alt="node" src="https://img.shields.io/node/v/@sogni-ai/sogni-creative-agent-skill.svg" />
13
+ <a href="./LICENSE"><img alt="license" src="https://img.shields.io/npm/l/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
14
+ </p>
15
+
16
+ ---
17
+
18
+ **Sogni Creative Agent Skill** plugs into the agent runtime you already use — Claude Code, [OpenClaw](https://github.com/OpenClaw/OpenClaw), [Hermes Agent](https://hermes-agent.nousresearch.com/), [Manus AI](https://manus.im), and others — and gives it production-quality image, video, and music generation through a single CLI: `sogni-agent`.
19
+
20
+ It ships three ways:
18
21
 
19
- For install requests, use the CLI plus skill setup by default.
22
+ - a standalone Node.js CLI (`sogni-agent`)
23
+ - a skill source that any [`SKILL.md`](./SKILL.md)-aware agent can load
24
+ - a published [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin
20
25
 
21
- With Sogni Creative Agent Skill, an agent can:
22
- - generate images from prompts
23
- - edit and restyle existing images
24
- - create videos from text, images, audio, or reference video
26
+ With this skill, an agent can:
27
+
28
+ - generate images from prompts and edit/restyle existing images
29
+ - create videos from text, images, audio, or reference video (LTX-2.3, WAN 2.2, Seedance 2.0)
30
+ - generate instrumental music or full songs with lyrics
31
+ - run hosted creative workflows including storyboard-driven video
25
32
  - save personas, preferences, and last-render state across sessions
26
33
  - check balances, list models, and refine previous results
27
34
 
35
+ > **Fastest install:** paste this repo's GitHub URL into your agent and ask it to "install this skill".
36
+
37
+ ---
38
+
39
+ ## Table of Contents
40
+
41
+ - [Quick Start](#quick-start)
42
+ - [Requirements](#requirements)
43
+ - [Installation](#installation)
44
+ - [Node CLI (default)](#node-cli-default)
45
+ - [OpenClaw plugin](#openclaw-plugin)
46
+ - [Hermes Agent / Manus / other frameworks](#hermes-agent--manus--other-frameworks)
47
+ - [Manual install from source](#manual-install-from-source)
48
+ - [Upgrading safely from inside an agent](#upgrading-safely-from-inside-an-agent)
49
+ - [Setup (Sogni API key)](#setup-sogni-api-key)
50
+ - [Usage](#usage)
51
+ - [CLI Reference](#cli-reference)
52
+ - [Common options](#common-options)
53
+ - [Quality presets](#quality-presets)
54
+ - [Recommended models](#recommended-models)
55
+ - [Video Sizing & Aspect Ratios](#video-sizing--aspect-ratios)
56
+ - [LTX-2.3 Prompting Guide](#ltx-23-prompting-guide)
57
+ - [Photobooth (Face Transfer)](#photobooth-face-transfer)
58
+ - [Personas, Memory, and Personality](#personas-memory-and-personality)
59
+ - [Hosted API Modes](#hosted-api-modes)
60
+ - [Dynamic Prompt Variations](#dynamic-prompt-variations)
61
+ - [Token Auto-Fallback](#token-auto-fallback)
62
+ - [Error Reporting & Output](#error-reporting--output)
63
+ - [For AI Agents](#for-ai-agents)
64
+ - [Development](#development)
65
+ - [License](#license)
66
+
67
+ ---
68
+
28
69
  ## Quick Start
29
70
 
30
- 1. Create Sogni credentials once. See [Setup](#setup).
31
- 2. Install the command-line tool:
71
+ 1. Get a Sogni API key from [dashboard.sogni.ai](https://dashboard.sogni.ai) (click your username) and save it — see [Setup](#setup-sogni-api-key).
72
+ 2. Install the CLI:
32
73
 
33
- ```bash
34
- npm install -g @sogni-ai/sogni-creative-agent-skill
35
- sogni-agent --version
36
- ```
74
+ ```bash
75
+ npm install -g @sogni-ai/sogni-creative-agent-skill@latest
76
+ sogni-agent --version
77
+ ```
37
78
 
38
- 3. Point your agent/runtime at this repository's [`SKILL.md`](./SKILL.md).
79
+ 3. Point your agent runtime at this repository's [`SKILL.md`](./SKILL.md).
80
+
81
+ Then ask your agent to do something:
39
82
 
40
- Then ask your agent to do something simple, for example:
41
83
  - "Generate an image of a sunset over mountains"
42
84
  - "Edit this image to add a rainbow"
43
85
  - "Make a video of a cat playing piano"
86
+ - "Generate a 30 second synthwave product-launch theme"
44
87
  - "Turn my selfie into James Bond using photobooth"
45
88
  - "Refine the last image at higher quality"
46
89
 
90
+ ---
91
+
92
+ ## Requirements
93
+
94
+ - **Node.js ≥ 22.11.0**
95
+ - **Sogni API key** ([dashboard.sogni.ai](https://dashboard.sogni.ai))
96
+ - **`ffmpeg`** *(optional)* — required for local utilities such as `--angles-360-video`, `--concat-videos`, and `--extract-last-frame`. Set `FFMPEG_PATH` to override discovery.
97
+ - macOS, Linux, or Windows
98
+
99
+ ---
100
+
47
101
  ## Installation
48
102
 
49
- Default install behavior: when someone asks to install Sogni Creative Agent Skill, install the command-line tool and skill source.
103
+ ### Node CLI (default)
104
+
105
+ For most agents and human users:
50
106
 
51
107
  ```bash
52
- npm install -g @sogni-ai/sogni-creative-agent-skill
108
+ npm install -g @sogni-ai/sogni-creative-agent-skill@latest
53
109
  sogni-agent --version
54
110
  ```
55
111
 
56
- Then point the agent/runtime at this repository's [`SKILL.md`](./SKILL.md).
112
+ Then point your agent/runtime at this repository's [`SKILL.md`](./SKILL.md). When an install request is ambiguous, install the CLI and skill source together — that's the supported default.
57
113
 
58
- ### OpenClaw Plugin
114
+ ### OpenClaw plugin
59
115
 
60
116
  For the published plugin:
61
117
 
@@ -65,7 +121,7 @@ openclaw plugins install sogni-creative-agent-skill
65
121
 
66
122
  The installed plugin loads its behavior from [`SKILL.md`](./SKILL.md) via [`openclaw.plugin.json`](./openclaw.plugin.json).
67
123
 
68
- For a local checkout that you want to update continuously, link the minimal OpenClaw surface instead of the repository root:
124
+ For a local checkout that you want to update continuously, link the minimal OpenClaw surface (`.openclaw-link/`), not the repository root, which contains development tests that OpenClaw correctly blocks during plugin safety scanning:
69
125
 
70
126
  ```bash
71
127
  cd /path/to/sogni-creative-agent-skill
@@ -76,7 +132,7 @@ openclaw plugins install -l "$PWD/.openclaw-link"
76
132
  openclaw gateway restart
77
133
  ```
78
134
 
79
- To update that linked install later:
135
+ To update the linked install later:
80
136
 
81
137
  ```bash
82
138
  cd /path/to/sogni-creative-agent-skill
@@ -87,59 +143,75 @@ npm run openclaw:sync
87
143
  openclaw gateway restart
88
144
  ```
89
145
 
90
- Do not run `openclaw plugins install -l "$PWD"` from the repository root. The root contains development tests that use `child_process`, and OpenClaw correctly blocks those during plugin safety scanning. The generated `.openclaw-link/` directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root [`SKILL.md`](./SKILL.md).
146
+ The generated `.openclaw-link/` directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root [`SKILL.md`](./SKILL.md).
147
+
148
+ #### OpenClaw configuration
91
149
 
92
- ### Hermes Agent / Manus / Other Frameworks
150
+ When loaded through OpenClaw, this skill reads plugin defaults from OpenClaw config; CLI flags always override them. The supported config schema is defined in [`openclaw.plugin.json`](./openclaw.plugin.json) and includes default models, video workflow models, hosted API defaults (`apiBaseUrl`, `defaultLlmModel`, `defaultApiToolMode`), token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
93
151
 
94
- Point the agent to this repository's [`SKILL.md`](./SKILL.md) for behavior guidance and [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt) for install/setup help. By default, the agent should invoke the globally installed `sogni-agent` CLI.
152
+ ### Hermes Agent / Manus / other frameworks
95
153
 
96
- ### Manual Installation
154
+ Point the agent at this repository's [`SKILL.md`](./SKILL.md) for behavior guidance and [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt) for install/setup help. The agent should invoke the globally installed `sogni-agent` CLI by default.
155
+
156
+ ### Manual install from source
97
157
 
98
158
  ```bash
99
- git clone git@github.com:Sogni-AI/sogni-creative-agent-skill.git
159
+ gh repo clone Sogni-AI/sogni-creative-agent-skill
100
160
  cd sogni-creative-agent-skill
101
161
  npm install
102
162
  ```
103
163
 
104
- ### Maintainer Runtime Sync
164
+ ### Upgrading safely from inside an agent
165
+
166
+ When upgrading from inside an agent runtime, prefer direct package-manager or existing-checkout commands. Avoid asking the agent to build a clone-or-pull shell bootstrap script with `set -e`, `bash -c`, `sh -c`, or an inline repository URL — some sandboxes correctly route those through approval and the install will stall.
105
167
 
106
- This public skill keeps CLI/runtime glue here, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. With both repos checked out as siblings, refresh the generated runtime before publishing:
168
+ For a global CLI:
107
169
 
108
170
  ```bash
109
- npm run sync:creative-agent-runtime
171
+ npm install -g @sogni-ai/sogni-creative-agent-skill@latest
172
+ sogni-agent --version
110
173
  ```
111
174
 
112
- `npm test` runs `npm run check:creative-agent-runtime` first, which regenerates this file and fails if it differs from the committed copy.
175
+ For an existing local checkout:
113
176
 
114
- The generated file is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
177
+ ```bash
178
+ DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
179
+ git -C "$DEST" pull --ff-only
180
+ npm --prefix "$DEST" install
181
+ ```
115
182
 
116
- ### Advanced OpenClaw Config
183
+ If the checkout is missing, use the npm install path above or explicitly approve a clone.
117
184
 
118
- When loaded through OpenClaw, Sogni Creative Agent Skill reads plugin defaults from OpenClaw config. CLI flags always override those defaults.
185
+ ---
119
186
 
120
- The supported config shape is defined in [`openclaw.plugin.json`](./openclaw.plugin.json). Common overrides include default models, video workflow models, token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
187
+ ## Setup (Sogni API key)
121
188
 
122
- ## Setup
189
+ 1. Get your API key from [dashboard.sogni.ai](https://dashboard.sogni.ai) (click your username).
190
+ 2. Save it to a credentials file:
123
191
 
124
- 1. Create a Sogni account at https://app.sogni.ai/
125
- 2. Create credentials file:
192
+ ```bash
193
+ mkdir -p ~/.config/sogni
194
+ cat > ~/.config/sogni/credentials << 'EOF'
195
+ SOGNI_API_KEY=your_api_key
196
+ EOF
197
+ chmod 600 ~/.config/sogni/credentials
198
+ ```
126
199
 
127
- ```bash
128
- mkdir -p ~/.config/sogni
129
- cat > ~/.config/sogni/credentials << 'EOF'
130
- SOGNI_API_KEY=your_api_key
131
- # or:
132
- # SOGNI_USERNAME=your_username
133
- # SOGNI_PASSWORD=your_password
134
- EOF
135
- chmod 600 ~/.config/sogni/credentials
136
- ```
200
+ You can also skip the file and export `SOGNI_API_KEY` in your environment.
201
+
202
+ ### Filesystem path overrides
137
203
 
138
- You can also skip the file and set `SOGNI_API_KEY`, or `SOGNI_USERNAME` + `SOGNI_PASSWORD`, in your environment.
204
+ Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality. Override individual paths with:
139
205
 
140
- ### Filesystem Paths and Overrides
206
+ | Variable | Purpose |
207
+ |----------|---------|
208
+ | `SOGNI_CREDENTIALS_PATH` | Custom credentials file |
209
+ | `SOGNI_LAST_RENDER_PATH` | Where last-render state is persisted |
210
+ | `SOGNI_MEDIA_INBOUND_DIR` | Directory used by `--list-media` |
211
+ | `OPENCLAW_CONFIG_PATH` | OpenClaw config file location |
212
+ | `FFMPEG_PATH` | Custom `ffmpeg` binary |
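
A wrapper script that needs the key outside `sogni-agent` could resolve it roughly as follows. This is a minimal sketch, not the CLI's own loader: the function name `resolve_sogni_api_key` is hypothetical, and it assumes an exported `SOGNI_API_KEY` takes precedence over the credentials file (the README does not state the precedence) and that the file holds plain `KEY=value` lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the Sogni API key for a wrapper script.
# Assumption: an exported SOGNI_API_KEY wins over the credentials file.
resolve_sogni_api_key() {
  local file="${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}"
  local line
  if [ -n "${SOGNI_API_KEY:-}" ]; then
    printf '%s\n' "$SOGNI_API_KEY"
    return 0
  fi
  [ -r "$file" ] || return 1
  # Use the last SOGNI_API_KEY= line; commented-out lines do not match.
  line="$(grep -E '^SOGNI_API_KEY=' "$file" | tail -n 1)"
  [ -n "$line" ] || return 1
  # Strip the key name, leaving only the value.
  printf '%s\n' "${line#SOGNI_API_KEY=}"
}
```

The function prints the key on stdout and returns non-zero when neither source provides one, so a caller can fail early with its own message.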
141
213
 
142
- Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality. Advanced path overrides are available through `SOGNI_CREDENTIALS_PATH`, `SOGNI_LAST_RENDER_PATH`, `SOGNI_MEDIA_INBOUND_DIR`, and `OPENCLAW_CONFIG_PATH`.
214
+ ---
143
215
 
144
216
  ## Usage
145
217
 
@@ -153,14 +225,14 @@ sogni-agent -c subject.jpg "add a neon cyberpunk glow"
153
225
  # Photobooth face transfer
154
226
  sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
155
227
 
156
- # Text-to-video (t2v)
157
- sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
228
+ # Text-to-video (t2v) with native dialogue
229
+ sogni-agent --video 'A narrator says "welcome to the story" as ocean waves crash'
158
230
 
159
- # Short-side targeting preserves the current shape without forcing landscape
231
+ # Short-side resolution targeting (preserves the inherited aspect ratio)
160
232
  sogni-agent --video --target-resolution 768 \
161
233
  "A calm cinematic shot of lanterns drifting across a night lake"
162
234
 
163
- # Seedance 2.0 explicit aliases (4-15s vendor video path)
235
+ # Seedance 2.0 (4-15s vendor video path with native audio)
164
236
  sogni-agent --video -m seedance2 --duration 8 \
165
237
  "A polished product reveal with native ambient sound"
166
238
 
@@ -174,15 +246,36 @@ sogni-agent --video -m seedance2 --workflow t2v \
174
246
  # Image-to-video (i2v)
175
247
  sogni-agent --video --ref cat.jpg "gentle camera pan"
176
248
 
177
- # Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
249
+ # Image+audio-to-video (auto-routes to LTX-2.3 ia2v)
178
250
  sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
179
251
  "music video with synchronized motion"
180
252
 
181
- # Persona or voice identity with LTX native audio
253
+ # Direct music generation
254
+ sogni-agent --music --duration 30 \
255
+ "uplifting cinematic synthwave theme for a product launch"
256
+
257
+ # Song with lyrics and musical controls
258
+ sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
259
+ --keyscale "C major" --output-format mp3 "bright indie pop chorus"
260
+
261
+ # LTX-2.3 voice identity / persona
182
262
  sogni-agent --video --reference-audio-identity voice.webm \
183
- "NARRATOR: \"This is my voice.\""
263
+ 'NARRATOR: "This is my voice."'
184
264
 
185
- # Segment a source video, then stitch clips locally with an external soundtrack
265
+ # Hosted chat with rich creative-agent tools (/v1/chat/completions)
266
+ sogni-agent --api-chat \
267
+ "Create a 4-shot product video concept for a red sneaker"
268
+
269
+ # Durable hosted workflow (/v1/creative-agent/workflows)
270
+ sogni-agent --api-workflow image-to-video \
271
+ --video-prompt "The camera slowly pushes in as the sketch comes alive" \
272
+ "A graphite robot sketch on a drafting table"
273
+
274
+ # Storyline -> GPT Image 2 storyboard sheet -> Seedance video sequence
275
+ sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
276
+ "Create a 9:16 bakery launch video with a neon street-window reveal"
277
+
278
+ # Local segment + concat with external soundtrack
186
279
  sogni-agent --video --workflow v2v --ref-video dance.mp4 \
187
280
  --video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
188
281
  "robot dancing"
@@ -194,114 +287,137 @@ sogni-agent --balance
194
287
  sogni-agent --help
195
288
  ```
196
289
 
197
- For local multi-clip workflows, prefer the built-in FFmpeg wrappers over raw shell commands. `--video-start`, `--audio-start`, and `--audio-duration` let you generate focused segments, while `--concat-videos` can stitch them and optionally mux a single soundtrack with `--concat-audio`.
290
+ > Prefer `.webm`, `.m4a`, or `.mp3` voice clips. Local `.wav` clips are normalized to `.m4a` before upload when `ffmpeg` is available.
291
+ >
292
+ > For local multi-clip workflows, use the built-in FFmpeg wrappers (`--video-start`, `--audio-start`, `--audio-duration`, `--concat-videos`, `--concat-audio`) over raw shell commands — they produce safer, more reproducible results.
198
293
 
199
- V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance also accepts public HTTPS image, video, and audio references; audio references must be paired with an image or video reference.
294
+ ---
200
295
 
201
- ## LTX-2.3 Prompting Guide
296
+ ## CLI Reference
202
297
 
203
- When you use `ltx23-22b-fp8_t2v_distilled`, do not feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
298
+ Run `sogni-agent --help` for the full CLI. Below are the options and tables most agents and users reach for first.
204
299
 
205
- - Write one unbroken paragraph with no line breaks, bullets, headers, or tag blocks.
206
- - Use 4-8 flowing present-tense sentences describing one continuous shot, not a montage.
207
- - Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
208
- - Keep characters and objects concrete and stable. Describe one main action thread from start to finish.
209
- - If the user wants dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
210
- - Express mood through visible behavior, motion, and sound cues instead of vague adjectives.
211
- - Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and generic filler words like "beautiful" or "nice".
212
- - Match scene density to clip length. For the default short clips, describe one main beat rather than several unrelated actions.
213
-
214
- Example rewrite:
215
-
216
- ```text
217
- User ask: "make a 4k video of a woman in a neon alley"
300
+ ### Common options
218
301
 
219
- LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
220
- ```
302
+ | Option | Use |
303
+ |--------|-----|
304
+ | `-Q fast\|hq\|pro` | Pick image quality without memorizing model IDs |
305
+ | `-o <path>` | Save output locally |
306
+ | `-c <path>` | Provide image context for edits |
307
+ | `--video` | Generate video instead of image |
308
+ | `--music` | Generate music/audio instead of image |
309
+ | `--lyrics`, `--bpm`, `--keyscale`, `--timesig` | Music generation controls |
310
+ | `--ref`, `--ref-audio`, `--ref-video` | Image/audio/video references; HTTPS refs are forwarded as URL context for Seedance |
311
+ | `--target-resolution <px>` | Target the short side, preserving aspect ratio |
312
+ | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
313
+ | `--api-chat` | Use `/v1/chat/completions` with Sogni creative-agent tools |
314
+ | `--api-workflow <kind>` | Start a `/v1/creative-agent/workflows` durable workflow: `image-to-video`, `hosted-tool-sequence`, or `storyboard-video` |
315
+ | `--workflow-input <json\|path\|@path>` | Explicit hosted workflow input JSON |
316
+ | `--storyboard-frames <n>` | Beat count for `--api-workflow storyboard-video` |
317
+ | `--video-prompt`, `--negative-prompt`, `--generate-audio`, `--expand-prompt` | Durable image-to-video workflow inputs |
318
+ | `--watch-workflow`, `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Manage durable workflows |
319
+ | `--api-tools <mode>`, `--no-api-tool-execution`, `--llm-model <id>`, `--api-base-url <url>` | Tune hosted API requests |
320
+ | `--persona <name>` | Use a saved persona |
321
+ | `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
322
+ | `--last`, `--last-image` | Inspect last render / reuse last image as context or video reference |
323
+ | `--strict-size` | Fail instead of auto-adjusting video size |
324
+ | `--json` | Emit structured output for agents |
221
325
 
222
- ## Photobooth (Face Transfer)
326
+ ### Quality presets
223
327
 
224
- Generate stylized portraits from a face photo using InstantID ControlNet:
328
+ Instead of remembering model IDs, use `--quality` / `-Q` to select the right model, steps, and dimensions for image generation:
225
329
 
226
- ```bash
227
- sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
228
- sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
229
- ```
330
+ | Preset | Model | Steps | Size | Speed |
331
+ |--------|-------|-------|------|-------|
332
+ | `fast` | `z_image_turbo_bf16` | 8 | 512×512 | ~5–10s |
333
+ | `hq` | `z_image_turbo_bf16` | default | 768×768 | ~10–15s |
334
+ | `pro` | `flux2_dev_fp8` | 40 | 1024×1024 | ~2 min |
230
335
 
231
- Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Cannot be combined with `--video` or `-c/--context`.
336
+ Explicit `--model` overrides the preset's model. Explicit `-w`/`-h` overrides dimensions.
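
The preset table can be read as a simple lookup. The sketch below restates it in shell for illustration only; `resolve_quality_preset` is a hypothetical name, and keeping the preset's steps when the model is overridden is an assumption, since the README only says that `--model` replaces the preset's model.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the quality preset table; not the CLI's actual code.
# Usage: resolve_quality_preset <fast|hq|pro> [model-override]
# Prints: "<model> <steps> <width>x<height>"
resolve_quality_preset() {
  local preset="$1" override="${2:-}" model steps size
  case "$preset" in
    fast) model="z_image_turbo_bf16"; steps="8";       size="512x512"   ;;
    hq)   model="z_image_turbo_bf16"; steps="default"; size="768x768"   ;;
    pro)  model="flux2_dev_fp8";      steps="40";      size="1024x1024" ;;
    *)    echo "unknown preset: $preset" >&2; return 1 ;;
  esac
  # An explicit model replaces only the preset's model (steps kept by assumption).
  if [ -n "$override" ]; then
    model="$override"
  fi
  printf '%s %s %s\n' "$model" "$steps" "$size"
}
```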
232
337
 
233
- Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA.
234
- `--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
235
- `--balance` / `--balances` does not require a prompt and exits after printing current `SPARK` and `SOGNI` balances.
338
+ ### Recommended models
236
339
 
237
- ## Video Sizing Rules (Aspect Ratios)
340
+ Prefer `-Q fast|hq|pro` for images and automatic workflow routing for video. Pass `-m` only when you need a specific model family.
238
341
 
239
- - WAN models use dimensions divisible by 16, min 480px, max 1536px.
240
- - LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048px on the long side.
241
- - Seedance runs at fixed 24fps and supports 4-15s durations. Other default/WAN video paths support up to 10s; LTX and WAN animate workflows support up to 20s.
242
- - The script auto-normalizes video sizes to satisfy those constraints.
243
- - Use `--target-resolution <px>` for bare resolution requests such as "720p" when the user did not specify exact pixels. It targets the short side and preserves the inherited aspect ratio.
244
- - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with a strict aspect-fit (`fit: inside`) and then uses the *resized reference dimensions* as the final video size. Because that resize uses rounding, a “valid” requested size can still produce an invalid final size (example: `1024x1536` requested, but ref becomes `1024x1535`).
245
- - `sogni-agent` detects this for local refs and will auto-adjust the requested size to a nearby safe size so the resized reference matches the model divisor.
246
- - If you want the script to fail instead of auto-adjusting, pass `--strict-size` and it will print a suggested size.
342
+ | Need | Recommended selector |
343
+ |------|----------------------|
344
+ | Default images | `z_image_turbo_bf16` |
345
+ | OpenAI GPT Image generation, editing, or strong text rendering | `gpt-image-2` |
346
+ | Highest-quality images | `flux2_dev_fp8` (or `-Q pro`) |
347
+ | Image editing | `qwen_image_edit_2511_fp8_lightning` |
348
+ | Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
349
+ | Direct music generation | `ace_step_1.5_turbo` (or `--music-model turbo`) |
350
+ | Music with stronger lyric handling | `ace_step_1.5_sft` (or `--music-model sft`) |
351
+ | Text-to-video with native dialogue/audio | `ltx23-22b-fp8_t2v_distilled` |
352
+ | Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
353
+ | Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
354
+ | Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |
355
+ | Seedance text-to-video | `seedance2` or `seedance2-fast` |
356
+ | Seedance video-to-video without ControlNet | `seedance2-v2v` |
357
+ | Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |
247
358
 
248
- ## Error Reporting
359
+ `gpt-image-2` supports flexible OpenAI image sizes up to 3840 px on either edge, max 3:1 aspect ratio, and total pixels from 655,360 to 8,294,400; the API snaps dimensions to valid multiples of 16. For image editing with `gpt-image-2`, you can pass up to 16 context images.
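
An agent can pre-check a requested `gpt-image-2` size before submitting it. The following is a rough sketch of the limits stated above; `check_gpt_image_2_size` is a hypothetical helper, and rounding each edge to the nearest multiple of 16 is an assumption, because the README does not say in which direction the API snaps.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; the hosted API performs its own validation and snapping.
# Assumption: snapping rounds each edge to the nearest multiple of 16.
# Usage: check_gpt_image_2_size <width> <height>  (prints snapped "WxH" or fails)
check_gpt_image_2_size() {
  local w=$(( ($1 + 8) / 16 * 16 ))
  local h=$(( ($2 + 8) / 16 * 16 ))
  local long=$(( w > h ? w : h ))
  local short=$(( w > h ? h : w ))
  local pixels=$(( w * h ))
  if (( long > 3840 )); then
    echo "edge exceeds 3840 px" >&2; return 1
  fi
  if (( long > 3 * short )); then
    echo "aspect ratio exceeds 3:1" >&2; return 1
  fi
  if (( pixels < 655360 || pixels > 8294400 )); then
    echo "total pixels outside 655,360 to 8,294,400" >&2; return 1
  fi
  echo "${w}x${h}"
}
```

For example, `check_gpt_image_2_size 3840 2160` passes at exactly the upper pixel limit, while `512 512` fails because it falls below the minimum pixel count.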
249
360
 
250
- Failures use a non-zero exit code and human-readable stderr. Add `--json` when an agent needs structured success/error output.
361
+ Music generation uses `--music` and outputs `mp3` by default. `--audio` remains the video-reference alias for `--ref-audio`; use `--music` or `--generate-music` for direct audio-only generation.
251
362
 
252
- ## Options
363
+ ---
253
364
 
254
- Run `sogni-agent --help` for the complete CLI. These are the options most agents should reach for first:
365
+ ## Video Sizing & Aspect Ratios
255
366
 
256
- | Option | Use |
257
- |--------|-----|
258
- | `-Q fast|hq|pro` | Pick image quality without memorizing model IDs |
259
- | `-o <path>` | Save output locally |
260
- | `-c <path>` | Provide image context for edits |
261
- | `--video` | Generate video instead of image |
262
- | `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references; Seedance HTTPS references are forwarded as URL context |
263
- | `--target-resolution <px>` | Target the short side while preserving aspect ratio |
264
- | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
265
- | `--persona <name>` | Use a saved persona reference |
266
- | `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
267
- | `--json` | Return structured output for agents |
367
+ - **WAN models** use dimensions divisible by 16, min 480 px, max 1536 px.
368
+ - **LTX family** (`ltx2-*`, `ltx23-*`) uses dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048 px on the long side.
369
+ - **Seedance** runs at fixed 24 fps and supports 4–15 s durations. Other default/WAN paths support up to 10 s; LTX and WAN animate workflows support up to 20 s.
370
+ - The script auto-normalizes video sizes to satisfy these constraints.
371
+ - Use `--target-resolution <px>` for bare resolution requests like "720p" — it targets the short side and preserves the inherited aspect ratio.
372
+ - Natural-language aspect requests like "portrait", "square", "16:9", or "9:16" are inferred when width/height aren't explicitly set. Combined requests like "720p 9:16" keep the requested short side while applying the requested shape.
373
+ - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with strict aspect-fit (`fit: inside`) and uses the *resized* dimensions as the final video size. Because that resize uses rounding, a "valid" requested size can still produce an invalid final size (example: `1024×1536` requested, but ref becomes `1024×1535`). `sogni-agent` detects this for local refs and auto-adjusts to a nearby safe size.
374
+ - Pass `--strict-size` to fail instead; the script will print a suggested size.
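
The divisor rules above explain why `1535` is invalid and `1536` is safe. The sketch below illustrates one way to snap a single dimension; it is not the CLI's real normalization, which may choose sizes differently. `snap_video_dim` is a hypothetical name, nearest-multiple rounding is an assumption, and the LTX minimum of 64 px is a placeholder because the README states only the 2048 px cap.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of divisor snapping; the CLI's real normalization may differ.
# Usage: snap_video_dim <pixels> <wan|ltx>
snap_video_dim() {
  local px="$1" family="$2" div min max snapped
  case "$family" in
    wan) div=16; min=480; max=1536 ;;
    ltx) div=64; min=64;  max=2048 ;;  # min is an assumption; only the cap is documented
    *)   echo "unknown family: $family" >&2; return 1 ;;
  esac
  # Round to the nearest multiple of the divisor, then clamp into range.
  snapped=$(( (px + div / 2) / div * div ))
  if (( snapped < min )); then snapped=$min; fi
  if (( snapped > max )); then snapped=$max; fi
  echo "$snapped"
}
```

With these assumptions, `snap_video_dim 1535 wan` yields `1536`, which matches the `1024×1535` example in the list: the resized reference misses the divisor by one pixel, and a nearby safe size restores it.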
268
375
 
269
- ### Quality Presets
376
+ V2V defaults mirror Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist; `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance accepts public HTTPS image, video, and audio references that pass CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
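
To illustrate the kind of URL screening described here, the sketch below rejects non-HTTPS URLs, `localhost`, IPv6 literals, and literal private IPv4 ranges. It is an assumption-laden approximation, not the CLI's actual safety check: `is_public_https_url` is a hypothetical name, no DNS resolution is performed, and the patterns are deliberately conservative (a public hostname that happens to start with `10.` would also be rejected).

```bash
#!/usr/bin/env bash
# Hypothetical literal-host screen; it does not resolve DNS, so it is weaker
# than a real safety check. Returns 0 for URLs that look public, 1 otherwise.
is_public_https_url() {
  local url="$1" host
  case "$url" in
    https://*) ;;
    *) return 1 ;;
  esac
  host="${url#https://}"
  host="${host%%[/?#]*}"   # drop path, query, and fragment
  host="${host##*@}"       # drop userinfo
  host="${host%%:*}"       # drop port (IPv6 literals collapse to "[" and are rejected)
  host="$(printf '%s' "$host" | tr 'A-Z' 'a-z')"
  case "$host" in
    ""|localhost|*.localhost|\[*) return 1 ;;
    127.*|10.*|192.168.*|169.254.*|0.0.0.0) return 1 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 1 ;;
  esac
  return 0
}
```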
270
377
 
271
- Instead of remembering model IDs, use `--quality` / `-Q` to auto-select the right model, steps, and dimensions:
378
+ ---
272
379
 
273
- | Preset | Model | Steps | Size | Speed |
274
- |--------|-------|-------|------|-------|
275
- | `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
276
- | `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
277
- | `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
380
+ ## LTX-2.3 Prompting Guide
278
381
 
279
- Explicit `--model` overrides the quality preset's model. Explicit `-w`/`-h` overrides dimensions.
382
+ When you use `ltx23-22b-fp8_t2v_distilled`, do **not** feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
280
383
 
281
- ### Dynamic Prompt Variations
384
+ - Write one unbroken paragraph — no line breaks, bullets, headers, or tag blocks.
385
+ - Use 4–8 flowing present-tense sentences describing one continuous shot, not a montage.
386
+ - Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
387
+ - Keep characters and objects concrete and stable; describe one main action thread from start to finish.
388
+ - For dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
389
+ - Express mood through visible behavior, motion, and sound cues — not vague adjectives.
390
+ - Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and filler words like "beautiful" or "nice".
391
+ - Match scene density to clip length. For short clips, describe one main beat, not several actions.
282
392
 
283
- Generate diverse images in a single call using `{option1|option2|option3}` syntax:
393
+ **Example rewrite:**
284
394
 
285
- ```bash
286
- # Generates 3 images: "a red car", "a blue car", "a green car"
287
- sogni-agent -n 3 "a {red|blue|green} car"
395
+ ```text
396
+ User ask: "make a 4k video of a woman in a neon alley"
288
397
 
289
- # Multiple variation groups cycle independently
290
- sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
291
- # → "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
398
+ LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
292
399
  ```
293
400
 
294
- Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images with the same prompt as before.
401
+ ---
295
402
 
296
- ### Token Auto-Fallback
403
+ ## Photobooth (Face Transfer)
297
404
 
298
- Use `--token-type auto` to automatically retry with SOGNI tokens if SPARK balance is insufficient:
405
+ Generate stylized portraits from a face photo using InstantID ControlNet:
299
406
 
300
407
  ```bash
301
- sogni-agent --token-type auto "a dragon eating tacos"
408
+ sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
409
+ sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
302
410
  ```
303
411
 
304
- This tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
412
+ Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024×1024 by default. The face image is passed via `--ref` and styled by the prompt. Cannot be combined with `--video` or `-c` / `--context`.
413
+
414
+ Multi-angle mode (`--multi-angle` / `--angles-360`) auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA. `--angles-360-video` generates i2v clips between consecutive angles (including last → first) and concatenates them with `ffmpeg` into a seamless loop.
415
+
416
+ `--balance` / `--balances` does not require a prompt and prints current `SPARK` and `SOGNI` balances before exiting.
417
+
418
+ ---
419
+
420
+ ## Personas, Memory, and Personality
305
421
 
306
422
  ### Personas
307
423
 
@@ -314,20 +430,20 @@ sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --descriptio
314
430
  # Add with voice clip for video voice cloning
315
431
  sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip voice.webm
316
432
 
317
- # Generate an image using a persona (auto-injects photo as context)
433
+ # Generate using a persona (auto-injects photo as context)
318
434
  sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
319
435
 
320
- # Generate video using a persona photo plus saved voice identity
321
- sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
436
+ # Video using a persona photo + saved voice identity
437
+ sogni-agent --video --persona "Sarah" 'SARAH: "This is my voice."'
322
438
 
323
439
  # List / remove
324
440
  sogni-agent --persona-list
325
441
  sogni-agent --persona-remove "Mark"
326
442
  ```
327
443
 
328
- Personas are stored at `~/.config/sogni/personas/`. Pronouns like "me"/"myself" auto-resolve to the `self` persona. "my wife" resolves to `partner`, etc.
444
+ Stored at `~/.config/sogni/personas/`. Pronouns like "me" / "myself" auto-resolve to the `self` persona; "my wife" resolves to `partner`, etc.
329
445
 
330
- ### Memory (Persistent Preferences)
446
+ ### Memory (persistent preferences)
331
447
 
332
448
  Save preferences that agents respect across sessions:
333
449
 
@@ -340,9 +456,9 @@ sogni-agent --memory-remove preferred_style
340
456
 
341
457
  Stored at `~/.config/sogni/memories.json`.
342
458
 
343
- ### Personality (Custom Agent Instructions)
459
+ ### Personality (custom agent instructions)
344
460
 
345
- Set how the agent should behave:
461
+ Tell the agent how it should behave:
346
462
 
347
463
  ```bash
348
464
  sogni-agent --personality-set "Be concise, always use cinematic lighting"
@@ -352,24 +468,110 @@ sogni-agent --personality-clear
352
468
 
353
469
  Stored at `~/.config/sogni/personality.txt`.
354
470
 
355
- ## Models
471
+ ---
356
472
 
357
- Prefer `-Q fast|hq|pro` for images and automatic workflow routing for video. Only pass `-m` when you need a specific model family.
473
+ ## Hosted API Modes
358
474
 
359
- | Need | Recommended model or alias |
360
- |------|----------------------------|
361
- | Default images | `z_image_turbo_bf16` |
362
- | Highest quality images | `flux2_dev_fp8` or `-Q pro` |
363
- | Image editing | `qwen_image_edit_2511_fp8_lightning` |
364
- | Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
365
- | Text-to-video with native dialogue/audio | `ltx23-22b-fp8_t2v_distilled` |
366
- | Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
367
- | Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
368
- | Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |
369
- | Seedance text-to-video | `seedance2` or `seedance2-fast` |
370
- | Seedance video-to-video without ControlNet | `seedance2-v2v` |
371
- | Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |
475
+ Hosted API modes require `SOGNI_API_KEY`.
476
+
477
+ - **`--api-chat`** targets `/v1/chat/completions` with rich creative-agent tools — best for text-first natural-language workflows. Tune with `--api-tools creative-agent|rich|hosted|none`, `--no-api-tool-execution`, `--llm-model`, and `--system`.
478
+ - **`--api-workflow`** targets `/v1/creative-agent/workflows` for durable, async workflow records with event streaming and cancellation. Supported kinds: `image-to-video`, `hosted-tool-sequence`, and `storyboard-video`.
479
+ - **`--api-workflow storyboard-video`** generates a storyline, creates a single GPT Image 2 storyboard sheet, then passes that artifact into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low/medium/high quality for that storyboard sheet.
480
+ - Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, and `--cancel-workflow`. Use `--workflow-input` to provide exact hosted workflow JSON.
481
+
482
+ Override the API origin with `--api-base-url`, `SOGNI_API_BASE_URL`, or `SOGNI_REST_ENDPOINT`.
483
+ Hosted API credentials are only sent to `https://api.sogni.ai` by default. Add trusted custom
484
+ hosts with `SOGNI_API_ALLOWED_HOSTS`; loopback or non-HTTPS local testing requires
485
+ `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1`.
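
One way to read the credential rules above is as a single decision function. The sketch below is illustrative, not the CLI's implementation: the function name `maySendCredentials` is hypothetical, the comma-separated format of `SOGNI_API_ALLOWED_HOSTS` is an assumption, and whether an unsafe origin must also appear in the allow list is not stated here.

```javascript
// Sketch only: not the CLI's actual logic.
const DEFAULT_API_HOST = "api.sogni.ai";

// `env` is a plain object such as process.env.
function maySendCredentials(baseUrl, env) {
  const url = new URL(baseUrl);
  const host = url.hostname.toLowerCase();
  const isLoopback =
    host === "localhost" || host === "[::1]" || host.startsWith("127.");

  // Loopback or non-HTTPS origins need the explicit opt-in.
  if (url.protocol !== "https:" || isLoopback) {
    return env.SOGNI_ALLOW_UNSAFE_API_BASE_URL === "1";
  }

  // Assumption: the allow list is a comma-separated list of hostnames.
  const allowed = (env.SOGNI_API_ALLOWED_HOSTS || "")
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter(Boolean);

  return host === DEFAULT_API_HOST || allowed.includes(host);
}

console.log(maySendCredentials("https://api.sogni.ai", {})); // true
console.log(maySendCredentials("https://staging.example.com", {})); // false
```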
486
+
487
+ > Uploaded local media still uses the direct CLI path because hosted API modes do not accept CLI `--ref*` media flags for server-side tool execution.
488
+
489
+ ---
490
+
491
+ ## Dynamic Prompt Variations
492
+
493
+ Generate diverse images in a single call with `{option1|option2|option3}` syntax:
494
+
495
+ ```bash
496
+ # 3 images: "a red car", "a blue car", "a green car"
497
+ sogni-agent -n 3 "a {red|blue|green} car"
498
+
499
+ # Multiple groups cycle independently
500
+ sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
501
+ # -> "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
502
+ ```
503
+
504
+ Options cycle sequentially per image. Without `{...}` syntax, `-n` produces multiple images with the same prompt.
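
The cycling behavior can be sketched as follows. This is an illustration that reproduces the documented outputs, not the CLI's own expansion code; `expandVariations` is a hypothetical name.

```javascript
// Sketch only: image i picks option (i mod optionCount) from every
// {a|b|c} group, so each group cycles at its own length.
function expandVariations(prompt, count) {
  const prompts = [];
  for (let i = 0; i < count; i++) {
    prompts.push(
      prompt.replace(/\{([^{}]+)\}/g, (_match, group) => {
        const options = group.split("|");
        return options[i % options.length];
      })
    );
  }
  return prompts;
}

console.log(expandVariations("a {cat|dog} in a {garden|kitchen}", 4));
// [ 'a cat in a garden', 'a dog in a kitchen',
//   'a cat in a garden', 'a dog in a kitchen' ]
```

Because both groups advance together, two groups of equal length never produce cross-combinations such as "a cat in a kitchen".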
505
+
506
+ ---
507
+
508
+ ## Token Auto-Fallback
509
+
510
+ Use `--token-type auto` to retry with SOGNI tokens when SPARK is insufficient:
511
+
512
+ ```bash
513
+ sogni-agent --token-type auto "a dragon eating tacos"
514
+ ```
515
+
516
+ Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
517
+
518
+ ---
519
+
520
+ ## Error Reporting & Output
521
+
522
+ - **Exit codes:** failures use a non-zero exit code with human-readable stderr.
523
+ - **Structured output:** add `--json` when an agent needs machine-parseable success/error data, or `--last` to inspect the last render.
524
+ - **Output files:** use `-o <path>` to save locally; otherwise the CLI prints a result URL.
525
+ - **Quiet mode:** `-q` / `--quiet` suppresses progress output without changing exit semantics.
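
An agent wrapper might combine the exit code with `--json` output roughly as below. This is a sketch under assumptions: `interpretRun` is a hypothetical helper, and because the JSON payload's field names are not specified in this section, the parsed object is returned untouched.

```javascript
// Sketch only: takes the exit status and captured streams of one
// `sogni-agent --json ...` run and normalizes them for the caller.
function interpretRun({ status, stdout, stderr }) {
  let payload = null;
  try {
    payload = JSON.parse(stdout);
  } catch {
    // stdout was not JSON (for example, --json was not passed).
  }
  return {
    ok: status === 0, // non-zero exit code means failure
    payload, // parsed JSON, or null when stdout was not JSON
    message: status === 0 ? "" : stderr.trim(), // human-readable error text
  };
}

console.log(interpretRun({ status: 1, stdout: "", stderr: "boom\n" }));
// { ok: false, payload: null, message: 'boom' }
```

The input shape matches what Node's `child_process.spawnSync` returns when called with `encoding: "utf8"`.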
526
+
527
+ ---
528
+
529
+ ## For AI Agents
530
+
531
+ This skill is designed to be loaded into agent runtimes as a first-class capability.
532
+
533
+ 1. **Behavior contract — [`SKILL.md`](./SKILL.md)**
534
+ The canonical instructions for how the agent should call `sogni-agent`. Load this as the skill source.
535
+ 2. **Install/setup hints — [`llm.txt`](./llm.txt)**
536
+ A condensed install/setup reference for agents that fetch `llm.txt` over HTTPS:
537
+ `https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt`
538
+ 3. **OpenClaw manifest — [`openclaw.plugin.json`](./openclaw.plugin.json)**
539
+ Plugin metadata, config schema, and defaults for OpenClaw-aware runtimes.
540
+ 4. **Structured output — `--json`**
541
+ Use `--json` for machine-readable success/error payloads. Use `--last` to read the previous render's metadata.
542
+ 5. **Agent-safe install/upgrade**
543
+ Prefer the `npm install -g` and `git -C "$DEST" pull --ff-only` paths above. Avoid generating clone-or-pull bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs — agent sandboxes correctly route those through approval and the install will stall.
544
+ 6. **SSRF / URL safety**
545
+ The CLI runs an SSRF guard ([`ssrf-guard.mjs`](./ssrf-guard.mjs)) before forwarding any HTTP(S) reference to hosted models. Localhost and private-network URLs are rejected; only public HTTPS references are forwarded as Seedance multimodal context.
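
For readers unfamiliar with this kind of check, the sketch below shows the general idea on URL literals only. It is not the contents of `ssrf-guard.mjs`; `looksPublicHttps` is a hypothetical name, and this version does not resolve DNS, so a public-looking hostname that points at a private address would pass it.

```javascript
// Sketch only: literal-level check, no DNS resolution.
function looksPublicHttps(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable URL
  }
  if (url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;

  if (host.startsWith("[")) {
    // IPv6 literal: reject loopback (::1), unique-local (fc00::/7),
    // and link-local (fe80::/10) addresses.
    return !/^\[(::1\]|f[cd]|fe[89ab])/.test(host);
  }

  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (!m) return true; // a hostname; not checked further in this sketch

  const a = Number(m[1]);
  const b = Number(m[2]);
  const isPrivate =
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254);
  return !isPrivate;
}

console.log(looksPublicHttps("https://example.com/a.png")); // true
console.log(looksPublicHttps("https://192.168.1.5/a.png")); // false
```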
546
+
547
+ ---
548
+
549
+ ## Development
550
+
551
+ The public skill keeps CLI/runtime glue in this repo, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. The generated runtime is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
552
+
553
+ Run the test suite:
554
+
555
+ ```bash
556
+ npm test
557
+ ```
558
+
559
+ `npm test` first runs `npm run check:creative-agent-runtime`, which regenerates the runtime file and fails if it differs from the committed copy.
560
+
561
+ With both repos checked out as siblings, refresh the generated runtime before publishing:
562
+
563
+ ```bash
564
+ npm run sync:creative-agent-runtime
565
+ ```
566
+
567
+ Reusable workflow rules should be added to `../sogni-creative-agent` first, then synced here. Keep storyboard planning, tool argument validation, prompt linting, typed media turn intent, and typed repair/control semantics aligned with `sogni-chat`, `sogni-client`, and `sogni-api` hosted chat/workflow endpoints rather than recreating skill-only regex guards. Prefer generated or copied shared helpers for hosted workflow compilation, schema argument validation, `CreativeTurnPlannerFields` / `classifyMediaTurnIntent()` media-routing contracts, repair-control decisions, and guard telemetry summaries over skill-local guard code — this keeps public-agent behavior close to `/v1/chat/completions` and `/v1/creative-agent/workflows`.
568
+
569
+ Public-skill regex should stay limited to CLI argument/fact extraction such as file paths, URLs, extensions, dimensions, durations, and explicit positions. Hosted-style decisions such as latest-video continuation, uploaded-video modification, image-selection waits, stitch-after-batch state, and repair/control routing belong upstream in typed planner/runtime fields before they are synced here.
570
+
571
+ Issues and feature requests: [github.com/Sogni-AI/sogni-creative-agent-skill/issues](https://github.com/Sogni-AI/sogni-creative-agent-skill/issues).
572
+
573
+ ---
372
574
 
373
575
  ## License
374
576
 
375
- MIT
577
+ [MIT](./LICENSE) © Sogni AI