@sogni-ai/sogni-creative-agent-skill 2.1.2 → 2.2.0
- package/README.md +368 -166
- package/SKILL.md +152 -29
- package/generated/creative-agent-runtime.mjs +3759 -27
- package/llm.txt +36 -13
- package/openclaw.plugin.json +36 -4
- package/package.json +5 -3
- package/sogni-agent.mjs +1750 -106
- package/version.mjs +1 -1
package/README.md
CHANGED
|
@@ -1,61 +1,117 @@
|
|
|
1
1
|
<p align="center">
|
|
2
|
-
<img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="
|
|
2
|
+
<img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="Sogni Creative Agent Skill rendering an image from a Telegram-style chat" width="320" />
|
|
3
3
|
</p>
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
<h1 align="center">Sogni Creative Agent Skill</h1>
|
|
6
6
|
|
|
7
|
-
|
|
8
|
-
[OpenClaw](https://github.com/OpenClaw/OpenClaw),
|
|
9
|
-
[Hermes Agent](https://hermes-agent.nousresearch.com/),
|
|
10
|
-
[Manus AI](https://manus.im), and more — image generation, video generation, and
|
|
11
|
-
creative-media tools powered by [Sogni AI](https://sogni.ai)'s decentralized GPU
|
|
12
|
-
network.
|
|
7
|
+
<p align="center">Image, video, and music generation for AI agents — powered by <a href="https://sogni.ai">Sogni AI</a>'s decentralized GPU network.</p>
|
|
13
8
|
|
|
14
|
-
|
|
15
|
-
-
|
|
16
|
-
|
|
17
|
-
|
|
9
|
+
<p align="center">
|
|
10
|
+
<a href="https://www.npmjs.com/package/@sogni-ai/sogni-creative-agent-skill"><img alt="npm" src="https://img.shields.io/npm/v/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
|
|
11
|
+
<a href="https://www.npmjs.com/package/@sogni-ai/sogni-creative-agent-skill"><img alt="downloads" src="https://img.shields.io/npm/dm/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
|
|
12
|
+
<img alt="node" src="https://img.shields.io/node/v/@sogni-ai/sogni-creative-agent-skill.svg" />
|
|
13
|
+
<a href="./LICENSE"><img alt="license" src="https://img.shields.io/npm/l/@sogni-ai/sogni-creative-agent-skill.svg" /></a>
|
|
14
|
+
</p>
|
|
15
|
+
|
|
16
|
+
---
|
|
17
|
+
|
|
18
|
+
**Sogni Creative Agent Skill** plugs into the agent runtime you already use — Claude Code, [OpenClaw](https://github.com/OpenClaw/OpenClaw), [Hermes Agent](https://hermes-agent.nousresearch.com/), [Manus AI](https://manus.im), and others — and gives it production-quality image, video, and music generation through a single CLI: `sogni-agent`.
|
|
19
|
+
|
|
20
|
+
It ships three ways:
|
|
18
21
|
|
|
19
|
-
|
|
22
|
+
- a standalone Node.js CLI (`sogni-agent`)
|
|
23
|
+
- a skill source that any [`SKILL.md`](./SKILL.md)-aware agent can load
|
|
24
|
+
- a published [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin
|
|
20
25
|
|
|
21
|
-
With
|
|
22
|
-
|
|
23
|
-
-
|
|
24
|
-
- create videos from text, images, audio, or reference video
|
|
26
|
+
With this skill, an agent can:
|
|
27
|
+
|
|
28
|
+
- generate images from prompts and edit/restyle existing images
|
|
29
|
+
- create videos from text, images, audio, or reference video (LTX-2.3, WAN 2.2, Seedance 2.0)
|
|
30
|
+
- generate instrumental music or full songs with lyrics
|
|
31
|
+
- run hosted creative workflows including storyboard-driven video
|
|
25
32
|
- save personas, preferences, and last-render state across sessions
|
|
26
33
|
- check balances, list models, and refine previous results
|
|
27
34
|
|
|
35
|
+
> **Fastest install:** paste this repo's GitHub URL into your agent and ask it to "install this skill".
|
|
36
|
+
|
|
37
|
+
---
|
|
38
|
+
|
|
39
|
+
## Table of Contents
|
|
40
|
+
|
|
41
|
+
- [Quick Start](#quick-start)
|
|
42
|
+
- [Requirements](#requirements)
|
|
43
|
+
- [Installation](#installation)
|
|
44
|
+
- [Node CLI (default)](#node-cli-default)
|
|
45
|
+
- [OpenClaw plugin](#openclaw-plugin)
|
|
46
|
+
- [Hermes Agent / Manus / other frameworks](#hermes-agent--manus--other-frameworks)
|
|
47
|
+
- [Manual install from source](#manual-install-from-source)
|
|
48
|
+
- [Upgrading safely from inside an agent](#upgrading-safely-from-inside-an-agent)
|
|
49
|
+
- [Setup (Sogni API key)](#setup-sogni-api-key)
|
|
50
|
+
- [Usage](#usage)
|
|
51
|
+
- [CLI Reference](#cli-reference)
|
|
52
|
+
- [Common options](#common-options)
|
|
53
|
+
- [Quality presets](#quality-presets)
|
|
54
|
+
- [Recommended models](#recommended-models)
|
|
55
|
+
- [Video Sizing & Aspect Ratios](#video-sizing--aspect-ratios)
|
|
56
|
+
- [LTX-2.3 Prompting Guide](#ltx-23-prompting-guide)
|
|
57
|
+
- [Photobooth (Face Transfer)](#photobooth-face-transfer)
|
|
58
|
+
- [Personas, Memory, and Personality](#personas-memory-and-personality)
|
|
59
|
+
- [Hosted API Modes](#hosted-api-modes)
|
|
60
|
+
- [Dynamic Prompt Variations](#dynamic-prompt-variations)
|
|
61
|
+
- [Token Auto-Fallback](#token-auto-fallback)
|
|
62
|
+
- [Error Reporting & Output](#error-reporting--output)
|
|
63
|
+
- [For AI Agents](#for-ai-agents)
|
|
64
|
+
- [Development](#development)
|
|
65
|
+
- [License](#license)
|
|
66
|
+
|
|
67
|
+
---
|
|
68
|
+
|
|
28
69
|
## Quick Start
|
|
29
70
|
|
|
30
|
-
1.
|
|
31
|
-
2. Install the
|
|
71
|
+
1. Get a Sogni API key from [dashboard.sogni.ai](https://dashboard.sogni.ai) (click your username) and save it — see [Setup](#setup-sogni-api-key).
|
|
72
|
+
2. Install the CLI:
|
|
32
73
|
|
|
33
|
-
```bash
|
|
34
|
-
npm install -g @sogni-ai/sogni-creative-agent-skill
|
|
35
|
-
sogni-agent --version
|
|
36
|
-
```
|
|
74
|
+
```bash
|
|
75
|
+
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
|
|
76
|
+
sogni-agent --version
|
|
77
|
+
```
|
|
37
78
|
|
|
38
|
-
3. Point your agent
|
|
79
|
+
3. Point your agent runtime at this repository's [`SKILL.md`](./SKILL.md).
|
|
80
|
+
|
|
81
|
+
Then ask your agent to do something:
|
|
39
82
|
|
|
40
|
-
Then ask your agent to do something simple, for example:
|
|
41
83
|
- "Generate an image of a sunset over mountains"
|
|
42
84
|
- "Edit this image to add a rainbow"
|
|
43
85
|
- "Make a video of a cat playing piano"
|
|
86
|
+
- "Generate a 30 second synthwave product-launch theme"
|
|
44
87
|
- "Turn my selfie into James Bond using photobooth"
|
|
45
88
|
- "Refine the last image at higher quality"
|
|
46
89
|
|
|
90
|
+
---
|
|
91
|
+
|
|
92
|
+
## Requirements
|
|
93
|
+
|
|
94
|
+
- **Node.js ≥ 22.11.0**
|
|
95
|
+
- **Sogni API key** ([dashboard.sogni.ai](https://dashboard.sogni.ai))
|
|
96
|
+
- **`ffmpeg`** *(optional)* — required for local utilities such as `--angles-360-video`, `--concat-videos`, and `--extract-last-frame`. Set `FFMPEG_PATH` to override discovery.
|
|
97
|
+
- macOS, Linux, or Windows
|
|
98
|
+
|
|
99
|
+
---
|
|
100
|
+
|
|
47
101
|
## Installation
|
|
48
102
|
|
|
49
|
-
|
|
103
|
+
### Node CLI (default)
|
|
104
|
+
|
|
105
|
+
For most agents and human users:
|
|
50
106
|
|
|
51
107
|
```bash
|
|
52
|
-
npm install -g @sogni-ai/sogni-creative-agent-skill
|
|
108
|
+
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
|
|
53
109
|
sogni-agent --version
|
|
54
110
|
```
|
|
55
111
|
|
|
56
|
-
Then point
|
|
112
|
+
Then point your agent/runtime at this repository's [`SKILL.md`](./SKILL.md). When an install request is ambiguous, install the CLI and skill source together — that's the supported default.
|
|
57
113
|
|
|
58
|
-
### OpenClaw
|
|
114
|
+
### OpenClaw plugin
|
|
59
115
|
|
|
60
116
|
For the published plugin:
|
|
61
117
|
|
|
@@ -65,7 +121,7 @@ openclaw plugins install sogni-creative-agent-skill
|
|
|
65
121
|
|
|
66
122
|
The installed plugin loads its behavior from [`SKILL.md`](./SKILL.md) via [`openclaw.plugin.json`](./openclaw.plugin.json).
|
|
67
123
|
|
|
68
|
-
For a local checkout that you want to update continuously, link the minimal OpenClaw surface
|
|
124
|
+
For a local checkout that you want to update continuously, link the minimal OpenClaw surface (`.openclaw-link/`) — not the repository root, which contains development tests that OpenClaw correctly blocks during plugin safety scanning:
|
|
69
125
|
|
|
70
126
|
```bash
|
|
71
127
|
cd /path/to/sogni-creative-agent-skill
|
|
@@ -76,7 +132,7 @@ openclaw plugins install -l "$PWD/.openclaw-link"
|
|
|
76
132
|
openclaw gateway restart
|
|
77
133
|
```
|
|
78
134
|
|
|
79
|
-
To update
|
|
135
|
+
To update the linked install later:
|
|
80
136
|
|
|
81
137
|
```bash
|
|
82
138
|
cd /path/to/sogni-creative-agent-skill
|
|
@@ -87,59 +143,75 @@ npm run openclaw:sync
|
|
|
87
143
|
openclaw gateway restart
|
|
88
144
|
```
|
|
89
145
|
|
|
90
|
-
|
|
146
|
+
The generated `.openclaw-link/` directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root [`SKILL.md`](./SKILL.md).
|
|
147
|
+
|
|
148
|
+
#### OpenClaw configuration
|
|
91
149
|
|
|
92
|
-
|
|
150
|
+
When loaded through OpenClaw, this skill reads plugin defaults from OpenClaw config; CLI flags always override them. The supported config schema is defined in [`openclaw.plugin.json`](./openclaw.plugin.json) and includes default models, video workflow models, hosted API defaults (`apiBaseUrl`, `defaultLlmModel`, `defaultApiToolMode`), token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
|
|
93
151
|
|
|
94
|
-
|
|
152
|
+
### Hermes Agent / Manus / other frameworks
|
|
95
153
|
|
|
96
|
-
|
|
154
|
+
Point the agent at this repository's [`SKILL.md`](./SKILL.md) for behavior guidance and [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt) for install/setup help. The agent should invoke the globally installed `sogni-agent` CLI by default.
|
|
155
|
+
|
|
156
|
+
### Manual install from source
|
|
97
157
|
|
|
98
158
|
```bash
|
|
99
|
-
|
|
159
|
+
gh repo clone Sogni-AI/sogni-creative-agent-skill
|
|
100
160
|
cd sogni-creative-agent-skill
|
|
101
161
|
npm install
|
|
102
162
|
```
|
|
103
163
|
|
|
104
|
-
###
|
|
164
|
+
### Upgrading safely from inside an agent
|
|
165
|
+
|
|
166
|
+
When upgrading from inside an agent runtime, prefer direct package-manager or existing-checkout commands. Avoid asking the agent to build a clone-or-pull shell bootstrap script with `set -e`, `bash -c`, `sh -c`, or an inline repository URL — some sandboxes correctly route those through approval and the install will stall.
|
|
105
167
|
|
|
106
|
-
|
|
168
|
+
For a global CLI:
|
|
107
169
|
|
|
108
170
|
```bash
|
|
109
|
-
npm
|
|
171
|
+
npm install -g @sogni-ai/sogni-creative-agent-skill@latest
|
|
172
|
+
sogni-agent --version
|
|
110
173
|
```
|
|
111
174
|
|
|
112
|
-
|
|
175
|
+
For an existing local checkout:
|
|
113
176
|
|
|
114
|
-
|
|
177
|
+
```bash
|
|
178
|
+
DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
|
|
179
|
+
git -C "$DEST" pull --ff-only
|
|
180
|
+
npm --prefix "$DEST" install
|
|
181
|
+
```
|
|
115
182
|
|
|
116
|
-
|
|
183
|
+
If the checkout is missing, use the npm install path above or explicitly approve a clone.
|
|
117
184
|
|
|
118
|
-
|
|
185
|
+
---
|
|
119
186
|
|
|
120
|
-
|
|
187
|
+
## Setup (Sogni API key)
|
|
121
188
|
|
|
122
|
-
|
|
189
|
+
1. Get your API key from [dashboard.sogni.ai](https://dashboard.sogni.ai) (click your username).
|
|
190
|
+
2. Save it to a credentials file:
|
|
123
191
|
|
|
124
|
-
|
|
125
|
-
|
|
192
|
+
```bash
|
|
193
|
+
mkdir -p ~/.config/sogni
|
|
194
|
+
cat > ~/.config/sogni/credentials << 'EOF'
|
|
195
|
+
SOGNI_API_KEY=your_api_key
|
|
196
|
+
EOF
|
|
197
|
+
chmod 600 ~/.config/sogni/credentials
|
|
198
|
+
```
|
|
126
199
|
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
130
|
-
SOGNI_API_KEY=your_api_key
|
|
131
|
-
# or:
|
|
132
|
-
# SOGNI_USERNAME=your_username
|
|
133
|
-
# SOGNI_PASSWORD=your_password
|
|
134
|
-
EOF
|
|
135
|
-
chmod 600 ~/.config/sogni/credentials
|
|
136
|
-
```
|
|
200
|
+
You can also skip the file and export `SOGNI_API_KEY` in your environment.
|
|
201
|
+
|
|
202
|
+
### Filesystem path overrides
|
|
137
203
|
|
|
138
|
-
|
|
204
|
+
Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality. Override individual paths with:
|
|
139
205
|
|
|
140
|
-
|
|
206
|
+
| Variable | Purpose |
|
|
207
|
+
|----------|---------|
|
|
208
|
+
| `SOGNI_CREDENTIALS_PATH` | Custom credentials file |
|
|
209
|
+
| `SOGNI_LAST_RENDER_PATH` | Where last-render state is persisted |
|
|
210
|
+
| `SOGNI_MEDIA_INBOUND_DIR` | Directory used by `--list-media` |
|
|
211
|
+
| `OPENCLAW_CONFIG_PATH` | OpenClaw config file location |
|
|
212
|
+
| `FFMPEG_PATH` | Custom `ffmpeg` binary |
|
|
141
213
|
|
|
142
|
-
|
|
214
|
+
---
|
|
143
215
|
|
|
144
216
|
## Usage
|
|
145
217
|
|
|
@@ -153,14 +225,14 @@ sogni-agent -c subject.jpg "add a neon cyberpunk glow"
|
|
|
153
225
|
# Photobooth face transfer
|
|
154
226
|
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
|
|
155
227
|
|
|
156
|
-
# Text-to-video (t2v)
|
|
157
|
-
sogni-agent --video
|
|
228
|
+
# Text-to-video (t2v) with native dialogue
|
|
229
|
+
sogni-agent --video 'A narrator says "welcome to the story" as ocean waves crash'
|
|
158
230
|
|
|
159
|
-
# Short-side targeting preserves the
|
|
231
|
+
# Short-side resolution targeting (preserves the inherited aspect ratio)
|
|
160
232
|
sogni-agent --video --target-resolution 768 \
|
|
161
233
|
"A calm cinematic shot of lanterns drifting across a night lake"
|
|
162
234
|
|
|
163
|
-
# Seedance 2.0
|
|
235
|
+
# Seedance 2.0 (4-15s vendor video path with native audio)
|
|
164
236
|
sogni-agent --video -m seedance2 --duration 8 \
|
|
165
237
|
"A polished product reveal with native ambient sound"
|
|
166
238
|
|
|
@@ -174,15 +246,36 @@ sogni-agent --video -m seedance2 --workflow t2v \
|
|
|
174
246
|
# Image-to-video (i2v)
|
|
175
247
|
sogni-agent --video --ref cat.jpg "gentle camera pan"
|
|
176
248
|
|
|
177
|
-
# Image+audio-to-video (auto-routes to LTX
|
|
249
|
+
# Image+audio-to-video (auto-routes to LTX-2.3 ia2v)
|
|
178
250
|
sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
|
|
179
251
|
"music video with synchronized motion"
|
|
180
252
|
|
|
181
|
-
#
|
|
253
|
+
# Direct music generation
|
|
254
|
+
sogni-agent --music --duration 30 \
|
|
255
|
+
"uplifting cinematic synthwave theme for a product launch"
|
|
256
|
+
|
|
257
|
+
# Song with lyrics and musical controls
|
|
258
|
+
sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
|
|
259
|
+
--keyscale "C major" --output-format mp3 "bright indie pop chorus"
|
|
260
|
+
|
|
261
|
+
# LTX-2.3 voice identity / persona
|
|
182
262
|
sogni-agent --video --reference-audio-identity voice.webm \
|
|
183
|
-
|
|
263
|
+
'NARRATOR: "This is my voice."'
|
|
184
264
|
|
|
185
|
-
#
|
|
265
|
+
# Hosted chat with rich creative-agent tools (/v1/chat/completions)
|
|
266
|
+
sogni-agent --api-chat \
|
|
267
|
+
"Create a 4-shot product video concept for a red sneaker"
|
|
268
|
+
|
|
269
|
+
# Durable hosted workflow (/v1/creative-agent/workflows)
|
|
270
|
+
sogni-agent --api-workflow image-to-video \
|
|
271
|
+
--video-prompt "The camera slowly pushes in as the sketch comes alive" \
|
|
272
|
+
"A graphite robot sketch on a drafting table"
|
|
273
|
+
|
|
274
|
+
# Storyline -> GPT Image 2 storyboard sheet -> Seedance video sequence
|
|
275
|
+
sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
|
|
276
|
+
"Create a 9:16 bakery launch video with a neon street-window reveal"
|
|
277
|
+
|
|
278
|
+
# Local segment + concat with external soundtrack
|
|
186
279
|
sogni-agent --video --workflow v2v --ref-video dance.mp4 \
|
|
187
280
|
--video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
|
|
188
281
|
"robot dancing"
|
|
@@ -194,114 +287,137 @@ sogni-agent --balance
|
|
|
194
287
|
sogni-agent --help
|
|
195
288
|
```
|
|
196
289
|
|
|
197
|
-
|
|
290
|
+
> Prefer `.webm`, `.m4a`, or `.mp3` voice clips. Local `.wav` clips are normalized to `.m4a` before upload when `ffmpeg` is available.
|
|
291
|
+
>
|
|
292
|
+
> For local multi-clip workflows, use the built-in FFmpeg wrappers (`--video-start`, `--audio-start`, `--audio-duration`, `--concat-videos`, `--concat-audio`) over raw shell commands — they produce safer, more reproducible results.
|
|
198
293
|
|
|
199
|
-
|
|
294
|
+
---
|
|
200
295
|
|
|
201
|
-
##
|
|
296
|
+
## CLI Reference
|
|
202
297
|
|
|
203
|
-
|
|
298
|
+
Run `sogni-agent --help` for the full CLI. Below are the options and tables most agents and users reach for first.
|
|
204
299
|
|
|
205
|
-
|
|
206
|
-
- Use 4-8 flowing present-tense sentences describing one continuous shot, not a montage.
|
|
207
|
-
- Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
|
|
208
|
-
- Keep characters and objects concrete and stable. Describe one main action thread from start to finish.
|
|
209
|
-
- If the user wants dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
|
|
210
|
-
- Express mood through visible behavior, motion, and sound cues instead of vague adjectives.
|
|
211
|
-
- Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and generic filler words like "beautiful" or "nice".
|
|
212
|
-
- Match scene density to clip length. For the default short clips, describe one main beat rather than several unrelated actions.
|
|
213
|
-
|
|
214
|
-
Example rewrite:
|
|
215
|
-
|
|
216
|
-
```text
|
|
217
|
-
User ask: "make a 4k video of a woman in a neon alley"
|
|
300
|
+
### Common options
|
|
218
301
|
|
|
219
|
-
|
|
220
|
-
|
|
302
|
+
| Option | Use |
|
|
303
|
+
|--------|-----|
|
|
304
|
+
| `-Q fast\|hq\|pro` | Pick image quality without memorizing model IDs |
|
|
305
|
+
| `-o <path>` | Save output locally |
|
|
306
|
+
| `-c <path>` | Provide image context for edits |
|
|
307
|
+
| `--video` | Generate video instead of image |
|
|
308
|
+
| `--music` | Generate music/audio instead of image |
|
|
309
|
+
| `--lyrics`, `--bpm`, `--keyscale`, `--timesig` | Music generation controls |
|
|
310
|
+
| `--ref`, `--ref-audio`, `--ref-video` | Image/audio/video references; HTTPS refs are forwarded as URL context for Seedance |
|
|
311
|
+
| `--target-resolution <px>` | Target the short side, preserving aspect ratio |
|
|
312
|
+
| `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
|
|
313
|
+
| `--api-chat` | Use `/v1/chat/completions` with Sogni creative-agent tools |
|
|
314
|
+
| `--api-workflow <kind>` | Start a `/v1/creative-agent/workflows` durable workflow: `image-to-video`, `hosted-tool-sequence`, or `storyboard-video` |
|
|
315
|
+
| `--workflow-input <json\|path\|@path>` | Explicit hosted workflow input JSON |
|
|
316
|
+
| `--storyboard-frames <n>` | Beat count for `--api-workflow storyboard-video` |
|
|
317
|
+
| `--video-prompt`, `--negative-prompt`, `--generate-audio`, `--expand-prompt` | Durable image-to-video workflow inputs |
|
|
318
|
+
| `--watch-workflow`, `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Manage durable workflows |
|
|
319
|
+
| `--api-tools <mode>`, `--no-api-tool-execution`, `--llm-model <id>`, `--api-base-url <url>` | Tune hosted API requests |
|
|
320
|
+
| `--persona <name>` | Use a saved persona |
|
|
321
|
+
| `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
|
|
322
|
+
| `--last`, `--last-image` | Inspect last render / reuse last image as context or video reference |
|
|
323
|
+
| `--strict-size` | Fail instead of auto-adjusting video size |
|
|
324
|
+
| `--json` | Emit structured output for agents |
|
|
221
325
|
|
|
222
|
-
|
|
326
|
+
### Quality presets
|
|
223
327
|
|
|
224
|
-
|
|
328
|
+
Skip remembering model IDs — `--quality` / `-Q` selects the right model, steps, and dimensions for image generation:
|
|
225
329
|
|
|
226
|
-
|
|
227
|
-
|
|
228
|
-
|
|
229
|
-
|
|
330
|
+
| Preset | Model | Steps | Size | Speed |
|
|
331
|
+
|--------|-------|-------|------|-------|
|
|
332
|
+
| `fast` | `z_image_turbo_bf16` | 8 | 512×512 | ~5–10s |
|
|
333
|
+
| `hq` | `z_image_turbo_bf16` | default | 768×768 | ~10–15s |
|
|
334
|
+
| `pro` | `flux2_dev_fp8` | 40 | 1024×1024 | ~2 min |
|
|
230
335
|
|
|
231
|
-
|
|
336
|
+
Explicit `--model` overrides the preset's model. Explicit `-w`/`-h` overrides dimensions.
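The preset table and override rule above can be sketched as a small lookup. This is an illustrative sketch only, not the CLI's actual code: `QUALITY_PRESETS` and `resolveImageSettings` are hypothetical names, and keeping the preset's step count when the model is overridden is an assumption.

```javascript
// Values copied from the preset table; `steps: null` stands for "model default".
const QUALITY_PRESETS = {
  fast: { model: "z_image_turbo_bf16", steps: 8, width: 512, height: 512 },
  hq: { model: "z_image_turbo_bf16", steps: null, width: 768, height: 768 },
  pro: { model: "flux2_dev_fp8", steps: 40, width: 1024, height: 1024 },
};

// Hypothetical helper: explicit model/width/height win over the preset values.
function resolveImageSettings({ quality, model, width, height }) {
  const preset = QUALITY_PRESETS[quality];
  if (!preset) {
    throw new Error(`Unknown quality preset: ${quality}`);
  }
  return {
    model: model ?? preset.model,
    steps: preset.steps, // assumption: steps stay tied to the preset
    width: width ?? preset.width,
    height: height ?? preset.height,
  };
}
```

For example, `resolveImageSettings({ quality: "pro", width: 768 })` keeps the `pro` model and height but uses the explicit width.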
|
|
232
337
|
|
|
233
|
-
|
|
234
|
-
`--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
|
|
235
|
-
`--balance` / `--balances` does not require a prompt and exits after printing current `SPARK` and `SOGNI` balances.
|
|
338
|
+
### Recommended models
|
|
236
339
|
|
|
237
|
-
|
|
340
|
+
Prefer `-Q fast|hq|pro` for images and automatic workflow routing for video. Pass `-m` only when you need a specific model family.
|
|
238
341
|
|
|
239
|
-
|
|
240
|
-
|
|
241
|
-
|
|
242
|
-
|
|
243
|
-
|
|
244
|
-
|
|
245
|
-
|
|
246
|
-
|
|
342
|
+
| Need | Recommended selector |
|
|
343
|
+
|------|----------------------|
|
|
344
|
+
| Default images | `z_image_turbo_bf16` |
|
|
345
|
+
| OpenAI GPT Image generation, editing, or strong text rendering | `gpt-image-2` |
|
|
346
|
+
| Highest-quality images | `flux2_dev_fp8` (or `-Q pro`) |
|
|
347
|
+
| Image editing | `qwen_image_edit_2511_fp8_lightning` |
|
|
348
|
+
| Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
|
|
349
|
+
| Direct music generation | `ace_step_1.5_turbo` (or `--music-model turbo`) |
|
|
350
|
+
| Music with stronger lyric handling | `ace_step_1.5_sft` (or `--music-model sft`) |
|
|
351
|
+
| Text-to-video with native dialogue/audio | `ltx23-22b-fp8_t2v_distilled` |
|
|
352
|
+
| Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
|
|
353
|
+
| Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
|
|
354
|
+
| Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |
|
|
355
|
+
| Seedance text-to-video | `seedance2` or `seedance2-fast` |
|
|
356
|
+
| Seedance video-to-video without ControlNet | `seedance2-v2v` |
|
|
357
|
+
| Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |
|
|
247
358
|
|
|
248
|
-
|
|
359
|
+
`gpt-image-2` supports flexible OpenAI image sizes up to 3840 px on either edge, max 3:1 aspect ratio, and total pixels from 655,360 to 8,294,400; the API snaps dimensions to valid multiples of 16. For image editing with `gpt-image-2`, you can pass up to 16 context images.
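The `gpt-image-2` size limits quoted above can be checked before a request is sent. The sketch below is a hypothetical pre-flight helper, not part of `sogni-agent`: the function name and message wording are assumptions, and it skips the multiple-of-16 rule because the README says the API snaps dimensions itself.

```javascript
// Limits as stated in the README paragraph above.
const GPT_IMAGE_2_LIMITS = {
  maxEdge: 3840,
  maxAspect: 3,
  minPixels: 655360,
  maxPixels: 8294400,
};

// Hypothetical helper: returns a list of problems, empty when the size fits.
function checkGptImage2Size(width, height) {
  const problems = [];
  const pixels = width * height;
  const aspect = Math.max(width, height) / Math.min(width, height);
  if (Math.max(width, height) > GPT_IMAGE_2_LIMITS.maxEdge) {
    problems.push("edge exceeds 3840 px");
  }
  if (aspect > GPT_IMAGE_2_LIMITS.maxAspect) {
    problems.push("aspect ratio exceeds 3:1");
  }
  if (pixels < GPT_IMAGE_2_LIMITS.minPixels) {
    problems.push("total pixels below 655,360");
  }
  if (pixels > GPT_IMAGE_2_LIMITS.maxPixels) {
    problems.push("total pixels above 8,294,400");
  }
  return problems;
}
```

Returning a list instead of a boolean lets an agent report every violated limit in one message.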
|
|
249
360
|
|
|
250
|
-
|
|
361
|
+
Music generation uses `--music` and outputs `mp3` by default. `--audio` remains the video-reference alias for `--ref-audio`; use `--music` or `--generate-music` for direct audio-only generation.
|
|
251
362
|
|
|
252
|
-
|
|
363
|
+
---
|
|
253
364
|
|
|
254
|
-
|
|
365
|
+
## Video Sizing & Aspect Ratios
|
|
255
366
|
|
|
256
|
-
|
|
257
|
-
|
|
258
|
-
|
|
259
|
-
|
|
260
|
-
|
|
261
|
-
|
|
262
|
-
|
|
263
|
-
|
|
264
|
-
| `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
|
|
265
|
-
| `--persona <name>` | Use a saved persona reference |
|
|
266
|
-
| `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
|
|
267
|
-
| `--json` | Return structured output for agents |
|
|
367
|
+
- **WAN models** use dimensions divisible by 16, min 480 px, max 1536 px.
|
|
368
|
+
- **LTX family** (`ltx2-*`, `ltx23-*`) uses dimensions divisible by 64. The current wrapper caps non-WAN video dimensions at 2048 px on the long side.
|
|
369
|
+
- **Seedance** runs at fixed 24 fps and supports 4–15 s durations. Other default/WAN paths support up to 10 s; LTX and WAN animate workflows support up to 20 s.
|
|
370
|
+
- The script auto-normalizes video sizes to satisfy these constraints.
|
|
371
|
+
- Use `--target-resolution <px>` for bare resolution requests like "720p" — it targets the short side and preserves the inherited aspect ratio.
|
|
372
|
+
- Natural-language aspect requests like "portrait", "square", "16:9", or "9:16" are inferred when width/height aren't explicitly set. Combined requests like "720p 9:16" keep the requested short side while applying the requested shape.
|
|
373
|
+
- For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with strict aspect-fit (`fit: inside`) and uses the *resized* dimensions as the final video size. Because that resize uses rounding, a "valid" requested size can still produce an invalid final size (example: `1024×1536` requested, but ref becomes `1024×1535`). `sogni-agent` detects this for local refs and auto-adjusts to a nearby safe size.
|
|
374
|
+
- Pass `--strict-size` to fail instead — the script will print a suggested size.
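To make the divisibility and range rules above concrete, here is a minimal sketch of size normalization. It is not the wrapper's real algorithm: the names are hypothetical, rounding to the nearest multiple is an assumption, and the LTX minimum of 64 px is a placeholder because the README states no LTX minimum.

```javascript
// WAN limits and the LTX step/cap come from the bullets above; the LTX min is assumed.
const VIDEO_LIMITS = {
  wan: { step: 16, min: 480, max: 1536 },
  ltx: { step: 64, min: 64, max: 2048 },
};

// Snap one dimension to the nearest multiple of the step, then clamp into range.
// min and max are themselves multiples of the step, so the result stays valid.
function snapDimension(value, family) {
  const { step, min, max } = VIDEO_LIMITS[family];
  const snapped = Math.round(value / step) * step;
  return Math.min(max, Math.max(min, snapped));
}

// Scale a size so its short side equals `target`, preserving the aspect ratio.
function targetShortSide(width, height, target) {
  const scale = target / Math.min(width, height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

A "720p" request on a 1920×1080 source would go through `targetShortSide(1920, 1080, 720)` first, and each resulting side would then pass through `snapDimension`.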
|
|
268
375
|
|
|
269
|
-
|
|
376
|
+
V2V defaults mirror Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist; `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet. Seedance accepts public HTTPS image, video, and audio references that pass CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
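The README does not spell out the CLI's URL safety checks. As a rough illustration of the idea (public HTTPS only, no localhost or private-network hosts), a check could look like the sketch below. `isSafePublicHttpsUrl` is a hypothetical name, the rejected ranges are assumptions, and rejecting every IPv6 literal is a deliberate simplification.

```javascript
// Hypothetical check: accept only HTTPS URLs whose host is not obviously local or private.
function isSafePublicHttpsUrl(raw) {
  let url;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literal: rejected outright in this sketch

  const ipv4 = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (ipv4) {
    const a = Number(ipv4[1]);
    const b = Number(ipv4[2]);
    if (a === 0 || a === 10 || a === 127) return false; // "this network", private, loopback
    if (a === 172 && b >= 16 && b <= 31) return false; // private 172.16.0.0/12
    if (a === 192 && b === 168) return false; // private 192.168.0.0/16
    if (a === 169 && b === 254) return false; // link-local
  }
  return true;
}
```

A hostname check like this does not cover DNS names that resolve to private addresses; a production check would also need to validate the resolved IP.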
|
|
270
377
|
|
|
271
|
-
|
|
378
|
+
---
|
|
272
379
|
|
|
273
|
-
|
|
274
|
-
|--------|-------|-------|------|-------|
|
|
275
|
-
| `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
|
|
276
|
-
| `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
|
|
277
|
-
| `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
|
|
380
|
+
## LTX-2.3 Prompting Guide
|
|
278
381
|
|
|
279
|
-
|
|
382
|
+
When you use `ltx23-22b-fp8_t2v_distilled`, do **not** feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
|
|
280
383
|
|
|
281
|
-
|
|
384
|
+
- Write one unbroken paragraph — no line breaks, bullets, headers, or tag blocks.
|
|
385
|
+
- Use 4–8 flowing present-tense sentences describing one continuous shot, not a montage.
|
|
386
|
+
- Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
|
|
387
|
+
- Keep characters and objects concrete and stable; describe one main action thread from start to finish.
|
|
388
|
+
- For dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
|
|
389
|
+
- Express mood through visible behavior, motion, and sound cues — not vague adjectives.
|
|
390
|
+
- Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and filler words like "beautiful" or "nice".
|
|
391
|
+
- Match scene density to clip length. For short clips, describe one main beat, not several actions.
|
|
282
392
|
|
|
283
|
-
|
|
393
|
+
**Example rewrite:**
|
|
284
394
|
|
|
285
|
-
```
|
|
286
|
-
|
|
287
|
-
sogni-agent -n 3 "a {red|blue|green} car"
|
|
395
|
+
```text
|
|
396
|
+
User ask: "make a 4k video of a woman in a neon alley"
|
|
288
397
|
|
|
289
|
-
|
|
290
|
-
sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
|
|
291
|
-
# → "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
|
|
398
|
+
LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
|
|
292
399
|
```
|
|
293
400
|
|
|
294
|
-
|
|
401
|
+
---
|
|
295
402
|
|
|
296
|
-
|
|
403
|
+
## Photobooth (Face Transfer)
|
|
297
404
|
|
|
298
|
-
|
|
405
|
+
Generate stylized portraits from a face photo using InstantID ControlNet:
|
|
299
406
|
|
|
300
407
|
```bash
|
|
301
|
-
sogni-agent --
|
|
408
|
+
sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
|
|
409
|
+
sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
|
|
302
410
|
```
|
|
303
411
|
|
|
304
|
-
|
|
412
|
+
Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024×1024 by default. The face image is passed via `--ref` and styled by the prompt. Cannot be combined with `--video` or `-c` / `--context`.
|
|
413
|
+
|
|
414
|
+
Multi-angle mode (`--multi-angle` / `--angles-360`) auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA. `--angles-360-video` generates i2v clips between consecutive angles (including last → first) and concatenates them with `ffmpeg` into a seamless loop.
|
|
415
|
+
|
|
416
|
+
`--balance` / `--balances` does not require a prompt and prints current `SPARK` and `SOGNI` balances before exiting.
|
|
417
|
+
|
|
418
|
+
---
|
|
419
|
+
|
|
420
|
+
## Personas, Memory, and Personality
|
|
305
421
|
|
|
306
422
|
### Personas
|
|
307
423
|
|
|
@@ -314,20 +430,20 @@ sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --descriptio
|
|
|
314
430
|
# Add with voice clip for video voice cloning
|
|
315
431
|
sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip voice.webm
|
|
316
432
|
|
|
317
|
-
# Generate
|
|
433
|
+
# Generate using a persona (auto-injects photo as context)
|
|
318
434
|
sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
|
|
319
435
|
|
|
320
|
-
#
|
|
321
|
-
sogni-agent --video --persona "Sarah"
|
|
436
|
+
# Video using a persona photo + saved voice identity
|
|
437
|
+
sogni-agent --video --persona "Sarah" 'SARAH: "This is my voice."'
|
|
322
438
|
|
|
323
439
|
# List / remove
|
|
324
440
|
sogni-agent --persona-list
|
|
325
441
|
sogni-agent --persona-remove "Mark"
|
|
326
442
|
```
|
|
327
443
|
|
|
328
|
-
|
|
444
|
+
Stored at `~/.config/sogni/personas/`. Pronouns like "me" / "myself" auto-resolve to the `self` persona; "my wife" resolves to `partner`, etc.
|
|
329
445
|
|
|
330
|
-
### Memory (
|
|
446
|
+
### Memory (persistent preferences)
|
|
331
447
|
|
|
332
448
|
Save preferences that agents respect across sessions:
|
|
333
449
|
|
|
@@ -340,9 +456,9 @@ sogni-agent --memory-remove preferred_style
|
|
|
340
456
|
|
|
341
457
|
Stored at `~/.config/sogni/memories.json`.
|
|
342
458
|
|
|
343
|
-
### Personality (
|
|
459
|
+
### Personality (custom agent instructions)
|
|
344
460
|
|
|
345
|
-
|
|
461
|
+
Tell the agent how it should behave:
|
|
346
462
|
|
|
347
463
|
```bash
|
|
348
464
|
sogni-agent --personality-set "Be concise, always use cinematic lighting"
|
|
@@ -352,24 +468,110 @@ sogni-agent --personality-clear
|
|
|
352
468
|
|
|
353
469
|
Stored at `~/.config/sogni/personality.txt`.
 
-
+---
 
-
+## Hosted API Modes
 
-
-
-
-
-
-
-
-
-
-
-
-
-
+Hosted API modes require `SOGNI_API_KEY`.
+
+- **`--api-chat`** targets `/v1/chat/completions` with rich creative-agent tools — best for text-first natural-language workflows. Tune with `--api-tools creative-agent|rich|hosted|none`, `--no-api-tool-execution`, `--llm-model`, and `--system`.
+- **`--api-workflow`** targets `/v1/creative-agent/workflows` for durable, async workflow records with event streaming and cancellation. Supported kinds: `image-to-video`, `hosted-tool-sequence`, and `storyboard-video`.
+- **`--api-workflow storyboard-video`** generates a storyline, creates a single GPT Image 2 storyboard sheet, then passes that artifact into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low/medium/high quality for that storyboard sheet.
+- Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, and `--cancel-workflow`. Use `--workflow-input` to provide exact hosted workflow JSON.
+
+Override the API origin with `--api-base-url`, `SOGNI_API_BASE_URL`, or `SOGNI_REST_ENDPOINT`.
+Hosted API credentials are only sent to `https://api.sogni.ai` by default. Add trusted custom
+hosts with `SOGNI_API_ALLOWED_HOSTS`; loopback or non-HTTPS local testing requires
+`SOGNI_ALLOW_UNSAFE_API_BASE_URL=1`.
+
+> Uploaded local media still uses the direct CLI path because hosted API modes do not accept CLI `--ref*` media flags for server-side tool execution.
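
A minimal shell sketch of the environment the hosted API modes above rely on. The key value is a placeholder, and the final invocation is left commented out because its exact argument shape (prompt as a trailing argument, combined with `--json`) is an assumption rather than something stated here.

```bash
# Placeholder credential; substitute a real key before use.
export SOGNI_API_KEY="replace-with-your-key"

# Optional: state the default origin explicitly. Credentials are only sent
# to this host unless other hosts are explicitly trusted.
export SOGNI_API_BASE_URL="https://api.sogni.ai"

# For loopback or non-HTTPS local testing only (left disabled here):
# export SOGNI_ALLOW_UNSAFE_API_BASE_URL=1

# Assumed invocation shape, not run here:
# sogni-agent --api-chat --json "a dragon eating tacos"
```

Keeping the unsafe override commented out by default means a copied snippet cannot silently send credentials to a non-HTTPS origin.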
+
+---
+
+## Dynamic Prompt Variations
+
+Generate diverse images in a single call with `{option1|option2|option3}` syntax:
+
+```bash
+# 3 images: "a red car", "a blue car", "a green car"
+sogni-agent -n 3 "a {red|blue|green} car"
+
+# Multiple groups cycle independently
+sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
+# -> "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
+```
+
+Options cycle sequentially per image. Without `{...}` syntax, `-n` produces multiple images with the same prompt.
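
The cycling rule can be modelled in a few lines of bash. This is only a sketch of the documented behaviour, not the CLI's actual implementation: `expand_prompt` is a hypothetical helper, and it assumes each group picks the option at the image index modulo that group's size.

```bash
# Hypothetical helper, not part of sogni-agent: expand_prompt INDEX PROMPT
# Replaces each {a|b|c} group with the option at INDEX modulo the group's size.
expand_prompt() {
  local index=$1 rest=$2 out="" n
  local re='^([^{]*)\{([^{}]+)\}(.*)$'
  local -a opts
  # Consume the prompt left to right, one {...} group per pass.
  while [[ $rest =~ $re ]]; do
    out+=${BASH_REMATCH[1]}
    rest=${BASH_REMATCH[3]}
    IFS='|' read -r -a opts <<< "${BASH_REMATCH[2]}"
    n=${#opts[@]}
    out+=${opts[index % n]}
  done
  printf '%s\n' "$out$rest"
}

# Image indices 0..3 for the two-group example prompt.
for i in 0 1 2 3; do
  expand_prompt "$i" "a {cat|dog} in a {garden|kitchen}"
done
# prints:
#   a cat in a garden
#   a dog in a kitchen
#   a cat in a garden
#   a dog in a kitchen
```

Under this reading, two groups of equal size advance in lockstep, which matches the four prompts listed in the example; groups of different sizes would wrap at different points.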
+
+---
+
+## Token Auto-Fallback
+
+Use `--token-type auto` to retry with SOGNI tokens when SPARK is insufficient:
+
+```bash
+sogni-agent --token-type auto "a dragon eating tacos"
+```
+
+Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
+
+---
+
+## Error Reporting & Output
+
+- **Exit codes:** failures use a non-zero exit code with human-readable stderr.
+- **Structured output:** add `--json` when an agent needs machine-parseable success/error data, or `--last` to inspect the last render.
+- **Output files:** use `-o <path>` to save locally; otherwise the CLI prints a result URL.
+- **Quiet mode:** `-q` / `--quiet` suppresses progress output without changing exit semantics.
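
An agent-side wrapper can lean on these conventions by keeping stdout, stderr, and the exit status separate. The sketch below uses a hypothetical `run_capture` helper; the commented invocation assumes `--json`, `-q`, and `-o` can be combined, which is not confirmed here, and the shape of the JSON payload is left unparsed because it is not documented in this section.

```bash
# Hypothetical wrapper: run a command, keep stdout (e.g. a --json payload)
# and stderr apart, and hand the command's exit status back to the caller.
run_capture() {
  local err_file status
  err_file=$(mktemp)
  RESULT_STDOUT=$("$@" 2>"$err_file")
  status=$?
  RESULT_STDERR=$(<"$err_file")
  rm -f "$err_file"
  return "$status"
}

# Intended use (requires sogni-agent on PATH; not run here):
#   if run_capture sogni-agent --json -q -o out.png "a dragon eating tacos"; then
#     printf '%s\n' "$RESULT_STDOUT"
#   else
#     printf 'render failed: %s\n' "$RESULT_STDERR" >&2
#   fi
```

Branching on the exit status first, and only then reading the captured output, avoids depending on any particular error field in the payload.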
+
+---
+
+## For AI Agents
+
+This skill is designed to be loaded into agent runtimes as a first-class capability.
+
+1. **Behavior contract — [`SKILL.md`](./SKILL.md)**
+The canonical instructions for how the agent should call `sogni-agent`. Load this as the skill source.
+2. **Install/setup hints — [`llm.txt`](./llm.txt)**
+A condensed install/setup reference for agents that fetch `llm.txt` over HTTPS:
+`https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt`
+3. **OpenClaw manifest — [`openclaw.plugin.json`](./openclaw.plugin.json)**
+Plugin metadata, config schema, and defaults for OpenClaw-aware runtimes.
+4. **Structured output — `--json`**
+Use `--json` for machine-readable success/error payloads. Use `--last` to read the previous render's metadata.
+5. **Agent-safe install/upgrade**
+Prefer the `npm install -g` and `git -C "$DEST" pull --ff-only` paths above. Avoid generating clone-or-pull bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs — agent sandboxes correctly route those through approval and the install will stall.
+6. **SSRF / URL safety**
+The CLI runs an SSRF guard ([`ssrf-guard.mjs`](./ssrf-guard.mjs)) before forwarding any HTTP(S) reference to hosted models. Localhost and private-network URLs are rejected; only public HTTPS references are forwarded as Seedance multimodal context.
+
+---
+
+## Development
+
+The public skill keeps CLI/runtime glue in this repo, but Sogni model routing, video workflow defaults, quality tiers, and prompt guardrails are generated from the private `sogni-creative-agent` repo. The generated runtime is committed at [`generated/creative-agent-runtime.mjs`](./generated/creative-agent-runtime.mjs) so public installs do not need access to the private repo.
+
+Run the test suite:
+
+```bash
+npm test
+```
+
+`npm test` first runs `npm run check:creative-agent-runtime`, which regenerates the runtime file and fails if it differs from the committed copy.
+
+With both repos checked out as siblings, refresh the generated runtime before publishing:
+
+```bash
+npm run sync:creative-agent-runtime
+```
+
+Reusable workflow rules should be added to `../sogni-creative-agent` first, then synced here. Keep storyboard planning, tool argument validation, prompt linting, typed media turn intent, and typed repair/control semantics aligned with `sogni-chat`, `sogni-client`, and `sogni-api` hosted chat/workflow endpoints rather than recreating skill-only regex guards. Prefer generated or copied shared helpers for hosted workflow compilation, schema argument validation, `CreativeTurnPlannerFields` / `classifyMediaTurnIntent()` media-routing contracts, repair-control decisions, and guard telemetry summaries over skill-local guard code — this keeps public-agent behavior close to `/v1/chat/completions` and `/v1/creative-agent/workflows`.
+
+Public-skill regex should stay limited to CLI argument/fact extraction such as file paths, URLs, extensions, dimensions, durations, and explicit positions. Hosted-style decisions such as latest-video continuation, uploaded-video modification, image-selection waits, stitch-after-batch state, and repair/control routing belong upstream in typed planner/runtime fields before they are synced here.
+
+Issues and feature requests: [github.com/Sogni-AI/sogni-creative-agent-skill/issues](https://github.com/Sogni-AI/sogni-creative-agent-skill/issues).
+
+---
 
 ## License
 
-MIT
+[MIT](./LICENSE) © Sogni AI