@sogni-ai/sogni-creative-agent-skill 2.0.0
- package/LICENSE +21 -0
- package/README.md +385 -0
- package/SKILL.md +937 -0
- package/Support/Claude/claude_desktop_config.json +8 -0
- package/desktop-extension/manifest.json +114 -0
- package/desktop-extension/server/mcp-server.mjs +4 -0
- package/desktop-extension/server/package.json +9 -0
- package/desktop-extension/server/sogni-agent.mjs +4 -0
- package/env.mjs +17 -0
- package/llm.txt +69 -0
- package/mcp-server.mjs +1665 -0
- package/openclaw-plugin.mjs +3 -0
- package/openclaw.plugin.json +136 -0
- package/package.json +76 -0
- package/scripts/sync-openclaw-plugin.mjs +50 -0
- package/skill-package.json +12 -0
- package/sogni-agent.mjs +4866 -0
- package/version.mjs +1 -0
package/SKILL.md
ADDED
|
@@ -0,0 +1,937 @@
---
name: sogni-creative-agent-skill
version: "2.0.0"
description: Sogni Creative Agent Skill: Creative AI superpowers for all AI agent runtimes. Generates images and videos using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to "draw", "generate", "create an image", "make a video/animate", "apply a style", or "generate me as a superhero".
homepage: https://sogni.ai
metadata:
  clawdbot:
    emoji: "🎨"
    primaryEnv: "SOGNI_API_KEY"
    os: ["darwin", "linux", "win32"]
    requires:
      bins: ["node"]
      anyBins: ["ffmpeg"]
      env:
        - "SOGNI_API_KEY"
        - "SOGNI_USERNAME"
        - "SOGNI_PASSWORD"
        - "SOGNI_CREDENTIALS_PATH"
        - "SOGNI_LAST_RENDER_PATH"
        - "SOGNI_MEDIA_INBOUND_DIR"
        - "OPENCLAW_CONFIG_PATH"
        - "OPENCLAW_PLUGIN_CONFIG"
        - "FFMPEG_PATH"
        - "SOGNI_DOWNLOADS_DIR"
        - "SOGNI_MCP_SAVE_DOWNLOADS"
        - "SOGNI_ALLOWED_DOWNLOAD_HOSTS"
      config:
        - "~/.config/sogni/credentials"
        - "~/.openclaw/openclaw.json"
        - "~/.clawdbot/media/inbound"
        - "~/.config/sogni/last-render.json"
        - "~/Downloads/sogni"
    install:
      - id: npm
        kind: exec
        command: "cd {{skillDir}} && cp skill-package.json package.json && npm i"
        label: "Prepare runtime dependencies"
---

# Sogni Image & Video Generation

Generate **images and videos** using Sogni AI's decentralized GPU network.

## Install Request Policy

When a user asks to install this plugin, skill, or Sogni Creative Agent Skill without specifically asking for MCP, install it as the command-line tool plus this skill by default. Do not add Claude MCP, Claude Desktop MCP, or an MCP server registration unless the user explicitly asks for MCP.

Default install path:

```bash
npm install -g @sogni-ai/sogni-creative-agent-skill
sogni-agent --version
```

Then configure the agent/runtime to use this `SKILL.md` and invoke the `sogni-agent` CLI.

## Runtime Routing

If Sogni MCP tools are already available in your session (e.g., `generate_image`, `manage_personas`), **prefer MCP tools over direct CLI invocation**. MCP tools handle input validation, credential checking, file downloads, and result formatting automatically. Only fall back to the CLI (`node sogni-agent.mjs ...`) if MCP tools are not registered in your environment. This runtime preference does not change the install default above.

## Setup

1. **Get Sogni credentials** at https://app.sogni.ai/
2. **Create credentials file:**
   ```bash
   mkdir -p ~/.config/sogni
   cat > ~/.config/sogni/credentials << 'EOF'
   SOGNI_API_KEY=your_api_key
   # or:
   # SOGNI_USERNAME=your_username
   # SOGNI_PASSWORD=your_password
   EOF
   chmod 600 ~/.config/sogni/credentials
   ```

   You can also export `SOGNI_API_KEY`, or `SOGNI_USERNAME` + `SOGNI_PASSWORD`, instead of writing the file.

3. **Install the CLI and skill by default:**
   ```bash
   npm install -g @sogni-ai/sogni-creative-agent-skill
   sogni-agent --version
   ```

   Configure the agent/runtime to use this `SKILL.md`. Do not configure MCP unless the user specifically asks for MCP.

4. **Install dependencies if working from a clone:**
   ```bash
   cd /path/to/sogni-creative-agent-skill
   npm i
   ```

5. **Or install from npm into a local skill directory (no git clone):**
   ```bash
   mkdir -p ~/.clawdbot/skills
   cd ~/.clawdbot/skills
   npm i @sogni-ai/sogni-creative-agent-skill
   ln -sfn node_modules/@sogni-ai/sogni-creative-agent-skill sogni-creative-agent-skill
   ```

When this skill is distributed via ClawHub, it bootstraps its local runtime dependencies from `skill-package.json` during install. That avoids relying on a root `package.json` being present in the published skill artifact.

## Filesystem Paths and Overrides

Default file paths used by this skill:

- Credentials file (read): `~/.config/sogni/credentials`
- Last render metadata (read/write): `~/.config/sogni/last-render.json`
- OpenClaw config (read): `~/.openclaw/openclaw.json`
- Media listing for `--list-media` (read): `~/.clawdbot/media/inbound`
- MCP local result copies (write): `~/Downloads/sogni`

Path override environment variables:

- `SOGNI_CREDENTIALS_PATH`
- `SOGNI_LAST_RENDER_PATH`
- `SOGNI_MEDIA_INBOUND_DIR`
- `OPENCLAW_CONFIG_PATH`
- `SOGNI_DOWNLOADS_DIR` (MCP)
- `SOGNI_MCP_SAVE_DOWNLOADS=0` to disable MCP local file writes
- `SOGNI_ALLOWED_DOWNLOAD_HOSTS` to override which HTTPS hosts the MCP server may auto-download and save locally
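
The defaults and overrides listed above follow the familiar "environment variable with fallback" pattern. The sketch below shows one way to compute the effective path for each setting, assuming an override simply replaces the default whenever it is set and non-empty. The helper name `sogni_path` and its setting keys are hypothetical and are not part of the CLI.

```bash
# Hypothetical helper (not part of sogni-agent): print the effective path
# for a setting, preferring the override variable when it is set.
sogni_path() {
  case "$1" in
    credentials)     printf '%s\n' "${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}" ;;
    last-render)     printf '%s\n' "${SOGNI_LAST_RENDER_PATH:-$HOME/.config/sogni/last-render.json}" ;;
    media-inbound)   printf '%s\n' "${SOGNI_MEDIA_INBOUND_DIR:-$HOME/.clawdbot/media/inbound}" ;;
    openclaw-config) printf '%s\n' "${OPENCLAW_CONFIG_PATH:-$HOME/.openclaw/openclaw.json}" ;;
    downloads)       printf '%s\n' "${SOGNI_DOWNLOADS_DIR:-$HOME/Downloads/sogni}" ;;
    *) printf 'unknown setting: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Show where credentials would be read from in the current environment.
sogni_path credentials
```

Running `SOGNI_DOWNLOADS_DIR=/tmp/sogni-out sogni_path downloads` would print the override instead of the default, which is a quick way to confirm what a given shell session is configured to use.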

## Usage (Images & Video)

```bash
# Generate and get URL
node sogni-agent.mjs "a cat wearing a hat"

# Quality presets (recommended; auto-selects model, steps, and size)
node sogni-agent.mjs -Q fast "a cat wearing a hat"   # z_image_turbo, 8 steps, 512x512 (~5-10s)
node sogni-agent.mjs -Q hq "a cat wearing a hat"     # z_image_turbo, default steps, 768x768 (~10-15s)
node sogni-agent.mjs -Q pro "a cat wearing a hat"    # flux2_dev, 40 steps, 1024x1024 (~2min)

# Dynamic prompt variations: diverse images in one call
node sogni-agent.mjs -n 3 "a {red|blue|green} sports car"
# → generates "a red sports car", "a blue sports car", "a green sports car"

# Token auto-fallback (tries SPARK, falls back to SOGNI)
node sogni-agent.mjs --token-type auto "a cat wearing a hat"

# Save to file
node sogni-agent.mjs -o /tmp/cat.png "a cat wearing a hat"

# JSON output (for scripting)
node sogni-agent.mjs --json "a cat wearing a hat"

# Check token balances (no prompt required)
node sogni-agent.mjs --balance

# Check token balances in JSON
node sogni-agent.mjs --json --balance

# Quiet mode (suppress progress)
node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
```
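
To make the `{a|b|c}` variation syntax from the example above concrete, here is a small shell sketch that expands a single brace group into one prompt per option. It only illustrates the syntax: the function name `expand_prompt` is hypothetical, and how the CLI itself expands groups or pairs them with `-n` is not specified here.

```bash
# Hypothetical illustration (not the CLI's implementation): expand the first
# {a|b|c} group in a prompt into one line per option.
expand_prompt() {
  prompt=$1
  case "$prompt" in
    *"{"*"}"*) ;;                           # has a brace group, continue below
    *) printf '%s\n' "$prompt"; return 0 ;; # nothing to expand
  esac
  prefix=${prompt%%\{*}    # text before the first "{"
  rest=${prompt#*\{}       # text after the first "{"
  options=${rest%%\}*}     # the "a|b|c" part
  suffix=${rest#*\}}       # text after the first "}"
  while [ -n "$options" ]; do
    opt=${options%%"|"*}   # next option up to the first "|"
    printf '%s%s%s\n' "$prefix" "$opt" "$suffix"
    case "$options" in
      *"|"*) options=${options#*"|"} ;;
      *) options= ;;
    esac
  done
}

expand_prompt "a {red|blue|green} sports car"
```

A prompt without a brace group is printed unchanged, so the helper can be applied to any prompt before deciding how many images to request.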

## Options

| Flag | Description | Default |
|------|-------------|---------|
| `-Q, --quality <tier>` | Quality preset: fast\|hq\|pro (auto-selects model/steps/size) | - |
| `-o, --output <path>` | Save to file | prints URL |
| `-m, --model <id>` | Model ID (overrides --quality) | z_image_turbo_bf16 |
| `-w, --width <px>` | Width | 512 |
| `-h, --height <px>` | Height | 512 |
| `-n, --count <num>` | Number of images (supports {a\|b\|c} prompt variations) | 1 |
| `-t, --timeout <sec>` | Timeout seconds | 30 (300 for video) |
| `-s, --seed <num>` | Specific seed | random |
| `--last-seed` | Reuse seed from last render | - |
| `--seed-strategy <s>` | Seed strategy: random\|prompt-hash | prompt-hash |
| `--multi-angle` | Multiple angles LoRA mode (Qwen Image Edit) | - |
| `--angles-360` | Generate 8 azimuths (front -> front-left) | - |
| `--angles-360-video` | Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) | - |
| `--azimuth <key>` | front\|front-right\|right\|back-right\|back\|back-left\|left\|front-left | front |
| `--elevation <key>` | low-angle\|eye-level\|elevated\|high-angle | eye-level |
| `--distance <key>` | close-up\|medium\|wide | medium |
| `--angle-strength <n>` | LoRA strength for multiple_angles | 0.9 |
| `--angle-description <text>` | Optional subject description | - |
| `--steps <num>` | Override steps (model-dependent) | - |
| `--guidance <num>` | Override guidance (model-dependent) | - |
| `--output-format <f>` | Image output format: png\|jpg | png |
| `--sampler <name>` | Sampler (model-dependent) | - |
| `--scheduler <name>` | Scheduler (model-dependent) | - |
| `--lora <id>` | LoRA id (repeatable, edit only) | - |
| `--loras <ids>` | Comma-separated LoRA ids | - |
| `--lora-strength <n>` | LoRA strength (repeatable) | - |
| `--lora-strengths <n>` | Comma-separated LoRA strengths | - |
| `--token-type <type>` | Token type: spark\|sogni\|auto (auto retries with alternate) | spark |
| `--balance, --balances` | Show SPARK/SOGNI balances and exit | - |
| `-c, --context <path>` | Context image for editing | - |
| `--last-image` | Use last generated image as context/ref | - |
| `--video, -v` | Generate video instead of image | - |
| `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace) | inferred |
| `--fps <num>` | Frames per second (video) | model default |
| `--duration <sec>` | Duration in seconds (video) | 5 |
| `--frames <num>` | Override total frames (video) | - |
| `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
| `--auto-resize-assets` | Auto-resize video assets | true |
| `--no-auto-resize-assets` | Disable auto-resize | - |
| `--estimate-video-cost` | Estimate video cost and exit (requires --steps) | - |
| `--photobooth` | Face transfer mode (InstantID + SDXL Turbo) | - |
| `--cn-strength <n>` | ControlNet strength (photobooth) | 0.8 |
| `--cn-guidance-end <n>` | ControlNet guidance end point (photobooth) | 0.3 |
| `--ref <path>` | Reference image for video or photobooth face | required for video/photobooth |
| `--ref-end <path>` | End frame for i2v interpolation | - |
| `--ref-audio <path>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
| `--audio-start <sec>` | Start offset into `--ref-audio` | - |
| `--audio-duration <sec>` | Duration slice from `--ref-audio` | - |
| `--reference-audio-identity <path>` | Voice identity clip for LTX native audio | - |
| `--voice-persona <name>` | Use saved persona voice clip as LTX voice identity | - |
| `--ref-video <path>` | Reference video for animate/v2v workflows | - |
| `--video-start <sec>` | Start offset into `--ref-video` for segmented V2V/animate | - |
| `--controlnet-name <name>` | ControlNet type for v2v: canny\|pose\|depth\|detailer | - |
| `--controlnet-strength <n>` | ControlNet strength for v2v (0.0-1.0) | canny/pose/depth 0.85, detailer 1.0 |
| `--sam2-coordinates <coords>` | SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) | - |
| `--trim-end-frame` | Trim last frame for seamless video stitching | - |
| `--first-frame-strength <n>` | Keyframe strength for start frame (0.0-1.0) | - |
| `--last-frame-strength <n>` | Keyframe strength for end frame (0.0-1.0) | - |
| `--last` | Show last render info | - |
| `--json` | JSON output | false |
| `--strict-size` | Do not auto-adjust i2v video size for reference resizing constraints | false |
| `-q, --quiet` | No progress output | false |
| `--extract-last-frame <video> <image>` | Extract last frame from video (safe ffmpeg wrapper) | - |
| `--concat-videos <out> <clips...>` | Concatenate video clips (safe ffmpeg wrapper) | - |
| `--concat-audio <path>` | Optional audio track to mux over `--concat-videos` output | - |
| `--concat-audio-start <sec>` | Start offset into `--concat-audio` | - |
| `--list-media [type]` | List recent inbound media (images\|audio\|all) | images |
| `--no-filter` | Disable NSFW content filter | - |
| `--memory-set <key> <value>` | Save a user preference | - |
| `--memory-get <key>` | Get a specific memory | - |
| `--memory-list` | List all saved memories | - |
| `--memory-remove <key>` | Delete a memory | - |
| `--personality-set <text>` | Set custom agent personality instructions | - |
| `--personality-get` | Show current personality | - |
| `--personality-clear` | Reset personality to default | - |
| `--persona-add <name>` | Add a persona (with --ref, --relationship, --description) | - |
| `--persona-list` | List all saved personas | - |
| `--persona-remove <name>` | Remove a persona and its files | - |
| `--persona-resolve <name>` | Look up persona by name/tag/pronoun | - |
| `--persona <name>` | Generate using persona's reference photo as context | - |
| `--relationship <type>` | Persona relationship: self\|partner\|child\|friend\|pet | friend |
| `--voice-clip <path>` | Voice clip audio for LTX-2.3 voice cloning | - |

## OpenClaw Config Defaults

When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defaults from:

`~/.openclaw/openclaw.json`

```json
{
  "plugins": {
    "entries": {
      "sogni-creative-agent-skill": {
        "enabled": true,
        "config": {
          "defaultImageModel": "z_image_turbo_bf16",
          "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
          "videoModels": {
            "t2v": "ltx23-22b-fp8_t2v_distilled",
            "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
            "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
            "ia2v": "ltx23-22b-fp8_ia2v_distilled",
            "a2v": "ltx23-22b-fp8_a2v_distilled",
            "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
            "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
            "v2v": "ltx23-22b-fp8_v2v_distilled"
          },
          "defaultVideoWorkflow": "t2v",
          "defaultNetwork": "fast",
          "defaultTokenType": "spark",
          "seedStrategy": "prompt-hash",
          "modelDefaults": {
            "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
            "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
          },
          "defaultWidth": 768,
          "defaultHeight": 768,
          "defaultCount": 1,
          "defaultFps": 16,
          "defaultDurationSec": 5,
          "defaultImageTimeoutSec": 30,
          "defaultVideoTimeoutSec": 300,
          "credentialsPath": "~/.config/sogni/credentials",
          "lastRenderPath": "~/.config/sogni/last-render.json",
          "mediaInboundDir": "~/.clawdbot/media/inbound"
        }
      }
    }
  }
}
```

CLI flags always override these defaults.
If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
Seed strategies: `prompt-hash` (deterministic) or `random`.
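
To illustrate what "deterministic" means for the `prompt-hash` strategy, the sketch below derives a numeric seed from the prompt text so that the same prompt always maps to the same seed. This is only an illustration using the POSIX `cksum` utility: the hash the CLI actually uses is not documented here, and the function name `prompt_seed` is hypothetical.

```bash
# Hypothetical illustration of a prompt-derived seed (NOT the CLI's algorithm):
# hash the prompt text with cksum and use the CRC value as the seed.
prompt_seed() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# The same prompt yields the same seed on every run.
first=$(prompt_seed "a cat wearing a hat")
second=$(prompt_seed "a cat wearing a hat")
[ "$first" = "$second" ] && echo "same prompt, same seed: $first"
```

A value computed this way could be passed explicitly with `-s, --seed <num>` when an exact, externally controlled seed is needed. With `--seed-strategy random`, repeated runs of one prompt are expected to differ.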

## Image Models

| Model | Speed | Use Case |
|-------|-------|----------|
| `z_image_turbo_bf16` | Fast (~5-10s) | General purpose, default |
| `flux1-schnell-fp8` | Very fast | Quick iterations |
| `flux2_dev_fp8` | Slow (~2min) | High quality |
| `chroma-v.46-flash_fp8` | Medium | Balanced |
| `qwen_image_edit_2511_fp8` | Medium | Image editing with context (up to 3) |
| `qwen_image_edit_2511_fp8_lightning` | Fast | Quick image editing |
| `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |

## Video Models

### LTX-2.3, Seedance 2.0, and WAN 2.2 Models

| Model | Speed | Use Case |
|-------|-------|----------|
| `ltx23-22b-fp8_t2v_distilled` | Fast (~2-3min) | Default text-to-video with native dialogue/audio |
| `ltx23-22b-fp8_i2v_distilled` | Fast (~2-3min) | Image-to-video with native dialogue/audio |
| `ltx23-22b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
| `ltx23-22b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
| `ltx23-22b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
| `seedance2` | Variable | Seedance 2.0 text-to-video alias, 4-15s, native audio |
| `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video alias |
| `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video alias |
| `seedance2-v2v` | Variable | Seedance 2.0 video-to-video alias, no ControlNet |
| `wan_v2.2-14b-fp8_i2v_lightx2v` | Fast | Simple image-to-video |
| `wan_v2.2-14b-fp8_i2v` | Slow | Higher quality video |
| `wan_v2.2-14b-fp8_t2v_lightx2v` | Fast | Text-to-video |
| `wan_v2.2-14b-fp8_s2v_lightx2v` | Fast | Face lip-sync with uploaded audio |
| `wan_v2.2-14b-fp8_animate-move_lightx2v` | Fast | Animate-move |
| `wan_v2.2-14b-fp8_animate-replace_lightx2v` | Fast | Animate-replace |

### LTX-2 / LTX-2.3 Models

| Model | Speed | Use Case |
|-------|-------|----------|
| `ltx2-19b-fp8_t2v_distilled` | Fast (~2-3min) | Text-to-video, 8-step |
| `ltx2-19b-fp8_t2v` | Medium (~5min) | Text-to-video, 20-step quality |
| `ltx2-19b-fp8_i2v_distilled` | Fast (~2-3min) | Image-to-video, 8-step |
| `ltx2-19b-fp8_i2v` | Medium (~5min) | Image-to-video, 20-step quality |
| `ltx2-19b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
| `ltx2-19b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
| `ltx2-19b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
| `ltx2-19b-fp8_v2v` | Medium (~5min) | Video-to-video with ControlNet, quality |

## Image Editing with Context

Edit images using reference images (Qwen models support up to 3):

```bash
# Single context image
node sogni-agent.mjs -c photo.jpg "make the background a beach"

# Multiple context images (subject + style)
node sogni-agent.mjs -c subject.jpg -c style.jpg "apply the style to the subject"

# Use last generated image as context
node sogni-agent.mjs --last-image "make it more vibrant"
```

When context images are provided without `-m`, the model defaults to `qwen_image_edit_2511_fp8_lightning`.

## Photobooth (Face Transfer)

Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use `--photobooth` with `--ref` pointing to their face image.

```bash
# Basic photobooth
node sogni-agent.mjs --photobooth --ref face.jpg "80s fashion portrait"

# Multiple outputs
node sogni-agent.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"

# Custom ControlNet tuning
node sogni-agent.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"
```

Photobooth mode uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. It cannot be combined with `--video` or `-c/--context`.

**Agent usage:**
```bash
# Photobooth: stylize a face photo
node {{skillDir}}/sogni-agent.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Multiple photobooth outputs
node {{skillDir}}/sogni-agent.mjs -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"
```
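
The photobooth constraints described above (a `--ref` face image is required, and the mode cannot be combined with `--video` or `-c/--context`) can be checked before invoking the CLI. The following is a minimal pre-flight sketch. The function `check_photobooth_args` is hypothetical, it inspects only the argument list, and it does not reproduce whatever validation the CLI performs internally.

```bash
# Hypothetical pre-flight check (not part of sogni-agent): returns non-zero
# when a photobooth argument list violates the documented constraints.
check_photobooth_args() {
  photobooth=0
  has_ref=0
  conflict=""
  for arg in "$@"; do
    case "$arg" in
      --photobooth) photobooth=1 ;;
      --ref) has_ref=1 ;;
      --video|-v|-c|--context) conflict=$arg ;;
    esac
  done
  # Nothing to check unless photobooth mode was requested.
  [ "$photobooth" -eq 1 ] || return 0
  if [ -n "$conflict" ]; then
    printf 'photobooth cannot be combined with %s\n' "$conflict" >&2
    return 1
  fi
  if [ "$has_ref" -eq 0 ]; then
    printf 'photobooth requires --ref <face image>\n' >&2
    return 1
  fi
}

check_photobooth_args --photobooth --ref face.jpg "80s fashion portrait" && echo "arguments look valid"
```

Because the check scans every argument, a prompt that is literally `-c` or `-v` would be misread as a flag. That is acceptable for a sketch but worth handling in real tooling.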

## Multiple Angles (Turnaround)

Generate specific camera angles from a single reference image using the Multiple Angles LoRA:

```bash
# Single angle
node sogni-agent.mjs --multi-angle -c subject.jpg \
--azimuth front-right --elevation eye-level --distance medium \
--angle-strength 0.9 \
"studio portrait, same person"

# 360 sweep (8 azimuths)
node sogni-agent.mjs --angles-360 -c subject.jpg --distance medium --elevation eye-level \
"studio portrait, same person"

# 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
node sogni-agent.mjs --angles-360 --angles-360-video /tmp/turntable.mp4 \
-c subject.jpg --distance medium --elevation eye-level \
"studio portrait, same person"
```

The prompt is auto-built with the required `<sks>` token plus the selected camera angle keywords.
`--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.

### 360 Video Best Practices

When a user requests a "360 video", follow this workflow:

1. **Default camera parameters** (do not ask unless they specify):
   - **Elevation**: default to **medium** (`--elevation eye-level`)
   - **Distance**: default to **medium** (`--distance medium`)

2. **Map user terms to flags**:
   | User says | Flag value |
   |-----------|------------|
   | "high" angle | `--elevation high-angle` |
   | "medium" angle | `--elevation eye-level` |
   | "low" angle | `--elevation low-angle` |
   | "close" | `--distance close-up` |
   | "medium" distance | `--distance medium` |
   | "far" | `--distance wide` |

3. **Always use first-frame/last-frame stitching**: the `--angles-360-video` flag handles this automatically by generating i2v clips between consecutive angles, including last→first, for seamless looping.

4. **Example command**:
   ```bash
   node sogni-agent.mjs --angles-360 --angles-360-video /tmp/output.mp4 \
   -c /path/to/image.png --elevation eye-level --distance medium \
   "description of subject"
   ```
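
The term-to-flag mapping in the workflow above can be expressed as two small lookup functions. This sketch prints the resulting command instead of running it, so it can be reviewed first. The helper names (`elevation_flag`, `distance_flag`, `build_360_command`) are hypothetical, and an empty term is assumed to mean the documented "medium" default.

```bash
# Hypothetical helpers: translate the user's wording into flag values.
elevation_flag() {
  case "$1" in
    high) echo "high-angle" ;;
    low) echo "low-angle" ;;
    medium|"") echo "eye-level" ;;   # default when the user says nothing
    *) return 1 ;;
  esac
}

distance_flag() {
  case "$1" in
    close) echo "close-up" ;;
    far) echo "wide" ;;
    medium|"") echo "medium" ;;      # default when the user says nothing
    *) return 1 ;;
  esac
}

# Print (rather than run) the 360 video command for the given inputs.
# Arguments: image path, output path, subject description, elevation term, distance term.
build_360_command() {
  image=$1
  output=$2
  subject=$3
  elevation=$(elevation_flag "$4") || return 1
  distance=$(distance_flag "$5") || return 1
  printf 'node sogni-agent.mjs --angles-360 --angles-360-video %s -c %s --elevation %s --distance %s "%s"\n' \
    "$output" "$image" "$elevation" "$distance" "$subject"
}

build_360_command /path/to/image.png /tmp/output.mp4 "description of subject" high far
```

Unrecognized terms make the helpers return non-zero, which gives the caller a natural point to ask the user for clarification instead of guessing.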

### Transition Video Rule

For **any transition video work**, always use the **Sogni skill/plugin** (not raw ffmpeg or other shell commands). Use the built-in `--extract-last-frame`, `--concat-videos`, and `--looping` flags for video manipulation.

### Insufficient Funds Handling

Use `--token-type auto` to automatically retry with SOGNI tokens when SPARK is insufficient.

When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply:

"Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/"
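
The handling rule above amounts to scanning the CLI output for the error text and substituting the canned reply. A minimal sketch follows, assuming the message may appear on either stdout or stderr (hence `2>&1` in the commented usage). The function name `funds_reply` is hypothetical.

```bash
# Hypothetical helper: given captured CLI output, print the documented reply
# when the insufficient-funds error is present; return non-zero otherwise.
funds_reply() {
  case "$1" in
    *"Insufficient funds"*)
      echo "Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/" ;;
    *) return 1 ;;
  esac
}

# Usage sketch (commented out so this block has no side effects):
# output=$(node sogni-agent.mjs --token-type auto "a cat wearing a hat" 2>&1)
# funds_reply "$output" || printf '%s\n' "$output"

funds_reply "Debit Error: Insufficient funds"
```

Any other output is passed through unchanged in the usage sketch, so successful renders and unrelated errors are still reported as-is.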

## Video Generation

Generate videos from a text prompt, a reference image, audio, or a reference video:

```bash
# Text-to-video (t2v)
node sogni-agent.mjs --video "A narrator says \"welcome to the story\" as ocean waves crash"

# Basic video from image
node sogni-agent.mjs --video --ref cat.jpg -o cat.mp4 "cat walks around"

# Use last generated image as reference
node sogni-agent.mjs --last-image --video "gentle camera pan"

# Custom duration and FPS
node sogni-agent.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"

# Bare "720p" / "HD" without exact pixels: preserve aspect via short-side target
node sogni-agent.mjs --video --target-resolution 768 \
"A calm cinematic shot of lanterns drifting across a night lake"

# Seedance 2.0 explicit text-to-video alias
node sogni-agent.mjs --video -m seedance2 --duration 8 \
"A polished product reveal with native ambient sound"

# Sound-to-video (s2v)
node sogni-agent.mjs --video --ref face.jpg --ref-audio speech.m4a \
-m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"

# Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
node sogni-agent.mjs --video --ref cover.jpg --ref-audio song.mp3 \
"music video with synchronized motion"

# Audio-to-video (auto-routes to LTX 2.3 a2v)
node sogni-agent.mjs --video --ref-audio song.mp3 \
"abstract audio-reactive visualizer"

# Persona/voice identity with LTX native audio
node sogni-agent.mjs --video --reference-audio-identity voice.webm \
"NARRATOR: \"This is my voice.\""

# LTX-2.3 text-to-video
node sogni-agent.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
"A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."

# Animate (motion transfer)
node sogni-agent.mjs --video --ref subject.jpg --ref-video motion.mp4 \
--workflow animate-move "transfer motion"

# Segment a longer reference video for local stitched workflows
node sogni-agent.mjs --video --workflow v2v --ref-video dance.mp4 \
--video-start 10 --duration 8 --controlnet-name pose \
"robot dancing"
```

## Video-to-Video (V2V) with ControlNet

Transform an existing video using LTX-2 models with ControlNet guidance:

```bash
# Basic v2v with canny edge detection
node sogni-agent.mjs --video --workflow v2v --ref-video input.mp4 \
--controlnet-name canny "stylized anime version"

# V2V with pose detection and custom strength
node sogni-agent.mjs --video --workflow v2v --ref-video dance.mp4 \
--controlnet-name pose --controlnet-strength 0.7 "robot dancing"

# V2V with depth map
node sogni-agent.mjs --video --workflow v2v --ref-video scene.mp4 \
--controlnet-name depth "watercolor painting style"
```

ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet.

```bash
# Seedance V2V without ControlNet
node sogni-agent.mjs --video --workflow v2v -m seedance2-v2v \
--ref-video input.mp4 "make the clip more cinematic"
```
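
The default strengths and the Seedance exception described above can be summarized as a small lookup. The sketch below returns the documented default strength for each ControlNet type and emits no ControlNet flags for `seedance2-v2v`. The helper names are hypothetical, and the "detailer assist" applied alongside `canny`/`pose`/`depth` is not modeled.

```bash
# Hypothetical lookup of the documented default ControlNet strengths for v2v.
default_controlnet_strength() {
  case "$1" in
    canny|pose|depth) echo "0.85" ;;
    detailer) echo "1.0" ;;
    *) return 1 ;;
  esac
}

# Print the model/ControlNet flags for a v2v call.
# Arguments: model id (may be empty), ControlNet type (ignored for Seedance).
v2v_flags() {
  model=$1
  controlnet=$2
  case "$model" in
    seedance2-v2v)
      # Seedance V2V runs without ControlNet.
      printf '%s %s\n' "-m" "$model" ;;
    *)
      strength=$(default_controlnet_strength "$controlnet") || return 1
      printf '%s %s %s %s\n' "--controlnet-name" "$controlnet" "--controlnet-strength" "$strength" ;;
  esac
}

v2v_flags "" pose
```

Passing the default strength explicitly is redundant with the CLI's own defaults. It is shown only to make the values visible, and `--controlnet-strength` can be overridden as in the pose example above.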

## Photo Restoration

Restore damaged vintage photos using Qwen image editing:

```bash
# Basic restoration
sogni-agent -c damaged_photo.jpg -o restored.png \
"professionally restore this vintage photograph, remove damage and scratches"

# Detailed restoration with preservation hints
sogni-agent -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
"restore this vintage photo, remove peeling, tears and wear marks, \
preserve natural features and expression, maintain warm nostalgic color tones"
```

**Tips for good restorations:**
- Describe the damage: "peeling", "scratches", "tears", "fading"
- Specify what to preserve: "natural features", "eye color", "hair", "expression"
- Mention the era for color tones: "1970s warm tones", "vintage sepia"

**Finding received images (Telegram/etc):**
```bash
node {{skillDir}}/sogni-agent.mjs --json --list-media images
```

**Do NOT use `ls`, `cp`, or other shell commands to browse user files.** Always use `--list-media` to find inbound media.

## IMPORTANT KEYWORD RULE

- If the user message includes the word "photobooth" (case-insensitive), always use `--photobooth` mode with `--ref` set to the user-provided face image.
- Prioritize this rule over generic image-edit flows (`-c`) for that request.

## LTX-2.3 Prompt Rule

Whenever the chosen video model is `ltx23-22b-fp8_t2v_distilled`, do not pass the user's short request through unchanged. Rewrite it into an LTX-2.3-safe prompt before calling `sogni-agent`.

- Output one single paragraph only. No line breaks, bullet points, section labels, tag lists, or screenplay formatting.
- Use 4-8 flowing present-tense sentences describing one continuous shot. No cuts, montage, or unrelated scene jumps.
- Start with shot scale plus the scene's visual identity, then describe environment, time of day, atmosphere, textures, and specific light sources.
- Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
- Give the scene one main action thread from start to finish. Use connectors like `as`, `while`, and `then` so motion reads as a continuous filmed moment.
- If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
- Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
- Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like `beautiful` or `nice`, or structural markup such as `[DIALOGUE]`.
- Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
- Preserve the user's request, but expand it into cinematic prose. Do not invent a different story just to make the prompt longer.
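
Several of the mechanical rules above (single paragraph, 4-8 sentences, no markup, no filler words, no "no ..." clauses) can be checked before a prompt is sent. The following is a minimal sketch, not part of the skill: `checkLtxPrompt` is a hypothetical helper, and its sentence counting is a rough heuristic that inline dialogue punctuation can confuse.

```javascript
// Hypothetical pre-flight check for an LTX-2.3 prompt.
// Returns a list of human-readable problems; an empty list means no rule tripped.
function checkLtxPrompt(prompt) {
  const problems = [];
  if (/[\r\n]/.test(prompt)) problems.push("contains line breaks");

  // Rough sentence count: split on ., ! or ? followed by whitespace or end of text.
  const sentences = prompt
    .split(/[.!?]+(?:\s+|$)/)
    .filter((part) => part.trim().length > 0);
  if (sentences.length < 4 || sentences.length > 8) {
    problems.push(`sentence count is ${sentences.length}, expected 4-8`);
  }

  if (/\[[A-Z_ ]+\]/.test(prompt)) problems.push("contains structural markup");
  if (/\b(beautiful|nice)\b/i.test(prompt)) problems.push("contains vague filler words");
  if (/\bno\s+\w+/i.test(prompt)) problems.push('contains a "no ..." clause');
  return problems;
}

console.log(checkLtxPrompt("a woman in a neon alley"));
```

A check like this only catches shape violations; whether the prose actually describes one continuous, filmable shot still needs judgment.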

### Duration-Aware Pacing

Match scene density to clip length so prompts stay filmable:

- About `1-4s`: describe exactly 1 action or moment.
- About `5-8s`: describe about 2 sequential actions.
- About `9-12s`: describe about 3 sequential actions.
- Longer clips: add only a small number of additional sequential beats. Do not turn the prompt into a montage or a full story arc unless the duration clearly supports it.
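
The pacing bands above can be expressed as a small lookup. This is an illustrative sketch: `beatsForDuration` is a hypothetical name, and the rule for clips longer than 12 seconds (one extra beat per additional 4 seconds, capped at 5) is an assumption, since the guidance only says "a small number" of extra beats.

```javascript
// Hypothetical helper: how many sequential actions a prompt should describe.
function beatsForDuration(seconds) {
  if (seconds <= 4) return 1;
  if (seconds <= 8) return 2;
  if (seconds <= 12) return 3;
  // Assumption: one more beat per extra 4 seconds, never more than 5 in total.
  return Math.min(5, 3 + Math.ceil((seconds - 12) / 4));
}

for (const seconds of [3, 6, 10, 16]) {
  console.log(`${seconds}s ->`, beatsForDuration(seconds), "beat(s)");
}
```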

### Orientation Mapping

When the user explicitly asks for an orientation or aspect ratio, map it to safe LTX dimensions:

- `vertical`, `portrait`, `story`, `reel`, `tiktok` -> `-w 1088 -h 1920`
- `landscape`, `horizontal`, `widescreen`, `youtube`, `16:9` -> `-w 1920 -h 1088`
- `square`, `1:1` -> `-w 1088 -h 1088`
- `4:3 portrait` -> `-w 832 -h 1088`
- `4:3 landscape` -> `-w 1088 -h 832`
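
As a sketch of how an agent might apply this table, the hypothetical `ltxDimensionsFor` helper below scans a request for the listed keywords and returns the matching CLI arguments. Checking the `4:3` entries first and matching on whole words (so "history" does not trigger `story`) are assumptions of this sketch, not documented behavior.

```javascript
// Keyword table copied from the mapping above; more specific entries come first.
const LTX_DIMENSIONS = [
  { keywords: ["4:3 portrait"], width: 832, height: 1088 },
  { keywords: ["4:3 landscape"], width: 1088, height: 832 },
  { keywords: ["vertical", "portrait", "story", "reel", "tiktok"], width: 1088, height: 1920 },
  { keywords: ["landscape", "horizontal", "widescreen", "youtube", "16:9"], width: 1920, height: 1088 },
  { keywords: ["square", "1:1"], width: 1088, height: 1088 },
];

// Hypothetical helper: returns ["-w", "...", "-h", "..."] or null when nothing matches.
function ltxDimensionsFor(request) {
  const text = request.toLowerCase();
  for (const entry of LTX_DIMENSIONS) {
    const hit = entry.keywords.some((keyword) =>
      new RegExp(`(^|[^a-z0-9:])${keyword}($|[^a-z0-9:])`).test(text)
    );
    if (hit) return ["-w", String(entry.width), "-h", String(entry.height)];
  }
  return null;
}

console.log(ltxDimensionsFor("make a TikTok video of a cat"));
```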

### Camera Language Normalization

When the user uses loose camera language, translate it into concrete motion phrasing inside the prose prompt:

- `zoom in` -> `slow push-in`
- `zoom out` -> `slow pull-back`
- `pan left` / `pan right` -> `smooth pan left` / `smooth pan right`
- `orbit` / `circle around` -> `slow arc left` or `slow arc right`
- `follow` -> `tracking follow`
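
The substitutions above amount to a phrase table. The sketch below shows one way to apply it; `normalizeCameraLanguage` is a hypothetical helper, and always choosing `slow arc left` for "orbit" is an assumption (the table allows either direction). It swaps phrases only, so the result still needs to be worked into flowing prose.

```javascript
// Loose camera term -> concrete motion phrase. Lookbehinds keep a second pass
// from producing "smooth smooth pan left" or "tracking tracking follow".
const CAMERA_TERMS = [
  [/\bzoom in\b/gi, "slow push-in"],
  [/\bzoom out\b/gi, "slow pull-back"],
  [/(?<!smooth )\bpan (left|right)\b/gi, "smooth pan $1"],
  [/\b(?:orbit|circle around)\b/gi, "slow arc left"], // assumption: default to left
  [/(?<!tracking )\bfollow\b/gi, "tracking follow"],
];

function normalizeCameraLanguage(text) {
  return CAMERA_TERMS.reduce(
    (result, [pattern, phrase]) => result.replace(pattern, phrase),
    text
  );
}

console.log(normalizeCameraLanguage("zoom in on her face, then pan left"));
```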

Short example:

```text
User ask: "4k video of a woman in a neon alley"

Use this shape instead: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
```

## Agent Usage

When the user asks to generate/draw/create an image:

```bash
# Generate and save locally (use -Q for quality presets instead of memorizing model IDs)
node {{skillDir}}/sogni-agent.mjs -q -Q fast -o /tmp/generated.png "user's prompt"
node {{skillDir}}/sogni-agent.mjs -q -Q pro -o /tmp/generated.png "user's prompt"

# Generate with prompt variations (diverse images in one call)
node {{skillDir}}/sogni-agent.mjs -q -n 3 -o /tmp/cars.png "a {red|blue|green} sports car"

# Edit an existing image
node {{skillDir}}/sogni-agent.mjs -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"

# Generate video from image
node {{skillDir}}/sogni-agent.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."

# Generate text-to-video
node {{skillDir}}/sogni-agent.mjs -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."

# HD / "4K" text-to-video: prefer LTX-2.3
node {{skillDir}}/sogni-agent.mjs -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."

# HD / "4K" image-to-video: prefer LTX i2v
node {{skillDir}}/sogni-agent.mjs -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."

# Photobooth: stylize a face photo
node {{skillDir}}/sogni-agent.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"

# Token auto-fallback (tries SPARK first, retries with SOGNI on insufficient balance)
node {{skillDir}}/sogni-agent.mjs -q --token-type auto -o /tmp/generated.png "user's prompt"

# Check current SPARK/SOGNI balances (no prompt required)
node {{skillDir}}/sogni-agent.mjs --json --balance

# Find user-sent images/audio
node {{skillDir}}/sogni-agent.mjs --json --list-media images

# Then send via message tool with filePath
```

### Quality Presets

Use `-Q` / `--quality` instead of memorizing model IDs:

| Preset | Model | Steps | Size | Speed |
|--------|-------|-------|------|-------|
| `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
| `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
| `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |

Explicit `-m` overrides the quality preset's model. Explicit `-w`/`-h` overrides dimensions. When the user asks for "high quality", "best quality", or "pro", use `-Q pro`. For quick drafts or previews, use `-Q fast`.
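
To make the override order concrete, here is a minimal sketch of how a preset and explicit flags could combine. The table values come from the presets above; `resolveSettings` and its object shape are hypothetical and do not describe the CLI's internals.

```javascript
// Preset table from this section; steps: null stands for "model default".
const QUALITY_PRESETS = {
  fast: { model: "z_image_turbo_bf16", steps: 8, width: 512, height: 512 },
  hq: { model: "z_image_turbo_bf16", steps: null, width: 768, height: 768 },
  pro: { model: "flux2_dev_fp8", steps: 40, width: 1024, height: 1024 },
};

// Hypothetical helper: explicit values (as from -m, -w, -h) win over the preset.
function resolveSettings(quality, overrides = {}) {
  const preset = QUALITY_PRESETS[quality];
  if (!preset) throw new Error(`Unknown quality preset: ${quality}`);
  const explicit = Object.fromEntries(
    Object.entries(overrides).filter(([, value]) => value !== undefined)
  );
  return { ...preset, ...explicit };
}

console.log(resolveSettings("pro", { width: 768 }));
```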

### Dynamic Prompt Variations

When the user wants multiple variations (different colors, styles, subjects), use `{option1|option2|option3}` syntax with `-n`:

```bash
# 3 color variations
node {{skillDir}}/sogni-agent.mjs -q -n 3 "a {red|blue|green} sports car"

# 4 style variations
node {{skillDir}}/sogni-agent.mjs -q -n 4 "a portrait in {oil painting|watercolor|pencil sketch|pop art} style"
```

Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images with the same prompt.
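
The sequential cycling can be pictured with a short sketch. `expandVariations` is a hypothetical re-implementation for illustration only; wrapping around when `-n` exceeds the number of options, and advancing every `{...}` group in step, are assumptions about behavior the text does not spell out.

```javascript
// Hypothetical expansion of {a|b|c} groups: image i uses option i (wrapping around).
function expandVariations(prompt, count) {
  const prompts = [];
  for (let i = 0; i < count; i++) {
    prompts.push(
      prompt.replace(/\{([^{}]+)\}/g, (_match, group) => {
        const options = group.split("|");
        return options[i % options.length];
      })
    );
  }
  return prompts;
}

console.log(expandVariations("a {red|blue|green} sports car", 3));
```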

### Token Auto-Fallback

Use `--token-type auto` when the user's SPARK balance might be low. It tries SPARK first (free daily tokens) and automatically retries with SOGNI if the SPARK balance is insufficient.
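
The try-then-retry behavior can be sketched as follows. Everything here is illustrative: `generateWithFallback`, the `"spark"`/`"sogni"` argument values, and the `INSUFFICIENT_BALANCE` error code are assumptions made for the example, and the CLI's real signal for a low balance may differ.

```javascript
// Hypothetical sketch of auto-fallback: try SPARK, retry once with SOGNI
// only when the failure is a balance problem; rethrow anything else.
async function generateWithFallback(generate) {
  try {
    return await generate("spark");
  } catch (err) {
    if (err.code !== "INSUFFICIENT_BALANCE") throw err; // assumed error code
    return generate("sogni");
  }
}

// Usage with a stand-in generator that fails on SPARK.
generateWithFallback(async (tokenType) => {
  if (tokenType === "spark") {
    throw Object.assign(new Error("low balance"), { code: "INSUFFICIENT_BALANCE" });
  }
  return `generated with ${tokenType}`;
}).then((result) => console.log(result));
```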

## High-Res Video Routing

When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or **"high-res"**, do not use the default WAN video models.

- For **text-to-video**, use `-m ltx23-22b-fp8_t2v_distilled`.
- For **image-to-video**, use `-m ltx23-22b-fp8_i2v_distilled`.
- Prefer LTX-sized dimensions such as `-w 1920 -h 1088`.
- For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
- If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
- Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even if the exact output is not literal 3840x2160.
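
The first two routing bullets reduce to a keyword test plus a choice between the two model IDs named above. A minimal sketch, assuming a hypothetical `routeVideoModel` helper that returns `null` when the CLI's default video model should be left alone:

```javascript
// Hypothetical router: picks an LTX model only for high-resolution requests.
function routeVideoModel(request, hasReferenceImage) {
  const wantsHighRes = /\b(hd|1080p|4k|uhd|high-res)\b/i.test(request);
  if (!wantsHighRes) return null; // keep the default video model
  return hasReferenceImage
    ? "ltx23-22b-fp8_i2v_distilled"
    : "ltx23-22b-fp8_t2v_distilled";
}

console.log(routeVideoModel("4k video of waves at sunset", false));
console.log(routeVideoModel("animate this photo in HD", true));
```

Dimension selection and the prompt rewrite still have to follow the remaining bullets; this sketch covers model choice only.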

**Security:** Agents must use the CLI's built-in flags (`--extract-last-frame`, `--concat-videos`, `--list-media`) for all file operations and video manipulation. Never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) directly.

## Animate Between Two Images (First-Frame / Last-Frame)

When a user asks to **animate between two images**, use `--ref` (first frame) and `--ref-end` (last frame) to create a creative interpolation video:

```bash
# Animate from image A to image B
node {{skillDir}}/sogni-agent.mjs -q --video --ref /tmp/imageA.png --ref-end /tmp/imageB.png -o /tmp/transition.mp4 "descriptive prompt of the transition"
```

### Animate a Video to an Image (Scene Continuation)

When a user asks to **animate from a video to an image** (or "continue" a video into a new scene):

1. **Extract the last frame** of the existing video using the built-in safe wrapper:
   ```bash
   node {{skillDir}}/sogni-agent.mjs --extract-last-frame /tmp/existing.mp4 /tmp/lastframe.png
   ```
2. **Generate a new video** using the last frame as `--ref` and the target image as `--ref-end`:
   ```bash
   node {{skillDir}}/sogni-agent.mjs -q --video --ref /tmp/lastframe.png --ref-end /tmp/target.png -o /tmp/continuation.mp4 "scene transition prompt"
   ```
3. **Concatenate the videos** using the built-in safe wrapper:
   ```bash
   node {{skillDir}}/sogni-agent.mjs --concat-videos /tmp/full_sequence.mp4 /tmp/existing.mp4 /tmp/continuation.mp4
   ```

This ensures visual continuity: the new clip picks up exactly where the previous one ended.

When the final stitched output needs a single external soundtrack, add `--concat-audio /path/to/audio.mp3` and optional `--concat-audio-start <sec>` to the same `--concat-videos` command. This is the local-agent advantage over browser-only workflows: generate clips with Sogni, then use the safe FFmpeg wrapper to stitch and mux audio locally.

**Do NOT run raw `ffmpeg` commands.** Always use `--extract-last-frame` and `--concat-videos` for video manipulation.

**Always apply this pattern when:**
- User says "animate image A to image B" → use `--ref A --ref-end B`
- User says "animate this video to this image" → extract last frame, use as `--ref`, target image as `--ref-end`, then stitch
- User says "continue this video" with a target image → same as above

## JSON Output

```json
{
  "success": true,
  "prompt": "a cat wearing a hat",
  "model": "z_image_turbo_bf16",
  "width": 512,
  "height": 512,
  "urls": ["https://..."],
  "localPath": "/tmp/cat.png"
}
```

On error (with `--json`), the script returns a single JSON object like:

```json
{
  "success": false,
  "error": "Reference image 2314x1200 would resize to 512x266, but both dimensions must be divisible by 16.",
  "errorCode": "INVALID_VIDEO_SIZE",
  "hint": "Try: --width 1296 --height 672 (or omit --strict-size)"
}
```

Balance check example (`--json --balance`):

```json
{
  "success": true,
  "type": "balance",
  "spark": 12.34,
  "sogni": 0.56
}
```
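
An agent consuming this output has three shapes to tell apart: a generation result, a balance result, and an error. The sketch below branches on the fields shown in the examples above; `summarizeResult` is a hypothetical helper, and it assumes the CLI prints exactly one JSON object on stdout.

```javascript
// Hypothetical helper: turns the CLI's --json output into a one-line summary.
function summarizeResult(stdout) {
  const result = JSON.parse(stdout);
  if (!result.success) {
    const hint = result.hint ? ` (${result.hint})` : "";
    return `Failed [${result.errorCode}]: ${result.error}${hint}`;
  }
  if (result.type === "balance") {
    return `SPARK ${result.spark}, SOGNI ${result.sogni}`;
  }
  // Prefer the saved file; fall back to the first remote URL if present.
  return `Saved ${result.localPath ?? result.urls?.[0]}`;
}

console.log(summarizeResult('{"success":true,"type":"balance","spark":12.34,"sogni":0.56}'));
```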

## Cost

Generation uses SPARK tokens from your Sogni account. 512x512 images are the most cost-efficient. Use `--token-type auto` to fall back to SOGNI tokens automatically when the SPARK balance is insufficient.

## Persona System

Personas are named people with saved reference photos and optional voice clips. They enable identity-preserving generation across sessions.

### Managing Personas

```bash
# Add a persona with a reference photo
node {{skillDir}}/sogni-agent.mjs --persona-add "Mark" --ref face.jpg --relationship self --description "30s male, brown hair, brown eyes"

# Add with voice clip for video voice cloning
node {{skillDir}}/sogni-agent.mjs --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip sarah-voice.webm --voice "warm alto with British accent"

# List all personas
node {{skillDir}}/sogni-agent.mjs --persona-list --json

# Resolve a persona by name, tag, or pronoun
node {{skillDir}}/sogni-agent.mjs --persona-resolve "me" --json

# Generate using a persona (auto-injects photo as context)
node {{skillDir}}/sogni-agent.mjs --persona "Mark" -o /tmp/hero.png "superhero in dramatic lighting"

# Remove a persona
node {{skillDir}}/sogni-agent.mjs --persona-remove "Mark"
```

### Persona Pipeline Rules

When a user mentions a persona (by name, tag, or pronoun):

1. **For images:** Use `--persona "Name" "prompt"` which auto-injects the persona's reference photo as context and selects the Qwen editing model
2. **For video with voice cloning:** The persona's voice clip is used as `--reference-audio-identity` when `--video` is combined with `--persona`
3. **For video without voice clip:** Describe the voice in the prompt ("speaks in a warm alto with a British accent")

**Pronoun matching:**
- "me" / "myself" / "I" → persona with `relationship: self`
- "my wife" / "my husband" / "my partner" → persona with `relationship: partner`
- "my son" / "my daughter" / "my kid" → persona with `relationship: child`
- "my dog" / "my cat" / "my pet" → persona with `relationship: pet`

**Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by name or pronoun. For ad-hoc photos, use `-c` (context image) directly.

## Memory System

Memories are persistent key-value preferences stored locally at `~/.config/sogni/memories.json`.

```bash
# Save a preference
node {{skillDir}}/sogni-agent.mjs --memory-set preferred_style "watercolor and soft lighting"
node {{skillDir}}/sogni-agent.mjs --memory-set aspect_ratio "16:9"
node {{skillDir}}/sogni-agent.mjs --memory-set favorite_artist "Studio Ghibli"

# Read all memories
node {{skillDir}}/sogni-agent.mjs --memory-list --json

# Get one memory
node {{skillDir}}/sogni-agent.mjs --memory-get preferred_style --json

# Delete a memory
node {{skillDir}}/sogni-agent.mjs --memory-remove preferred_style
```

**Agent behavior:** Before generating, check memories with `--memory-list` and respect saved preferences. If the user says "I always want watercolor style", save it with `--memory-set`. Categories: `preference` (default), `fact`, `context`.

## Personality (Custom Agent Instructions)

Users can set custom instructions that shape agent behavior, stored at `~/.config/sogni/personality.txt`.

```bash
# Set personality
node {{skillDir}}/sogni-agent.mjs --personality-set "Be concise, always use cinematic lighting, suggest bold creative ideas"

# Read current personality
node {{skillDir}}/sogni-agent.mjs --personality-get --json

# Clear (reset to default)
node {{skillDir}}/sogni-agent.mjs --personality-clear
```

**Agent behavior:** Check personality on startup and adopt those instructions. Personality overrides default style but not hard constraints (safety, tool usage rules).

## Style Transfer

Apply artistic styles to existing images:

```bash
# Apply a named artist style
node {{skillDir}}/sogni-agent.mjs -c photo.jpg -o /tmp/styled.png "Apply style: Andy Warhol pop art with bold primary colors"

# Studio Ghibli transformation
node {{skillDir}}/sogni-agent.mjs -c photo.jpg -o /tmp/ghibli.png "Apply style: Studio Ghibli watercolor with soft pastel sky and lush greenery"

# For photos with people, always preserve identity
node {{skillDir}}/sogni-agent.mjs -c portrait.jpg -o /tmp/styled.png "Apply style: oil painting in the style of Vermeer. Preserve all facial features, expressions, and identity."
```

**Tips:** Reference artists and styles BY NAME for best results. Use positive phrasing. For photos with people, always append identity preservation instructions.

## Change Angle (Novel View Synthesis)

Generate a photo from a different camera angle:

```bash
# 3/4 view
node {{skillDir}}/sogni-agent.mjs --multi-angle -c subject.jpg --azimuth front-right "same subject"

# Side view
node {{skillDir}}/sogni-agent.mjs --multi-angle -c subject.jpg --azimuth left --elevation eye-level --distance medium "same subject"

# Full 360 turntable
node {{skillDir}}/sogni-agent.mjs --angles-360 -c subject.jpg "same subject"
```

**User term mapping:**
- "from the left" / "side view" → `--azimuth left`
- "3/4 view" / "three-quarter" → `--azimuth front-right`
- "from behind" / "back" → `--azimuth back`
- "looking up at" → `--elevation low-angle`
- "bird's eye" / "top-down" → `--elevation high-angle`
- "closeup" → `--distance close-up`

## Creative Workflow Patterns

### After Image Generation: Suggest Next Steps
- "Animate into a video" → `--video --ref <result>`
- "Apply a different style" → `-c <result> "Apply style: ..."`
- "Change the angle" → `--multi-angle -c <result>`
- "Generate variations" → `-n 3 "{style1|style2|style3}"`
- "Refine at higher quality" → use `refine_result` MCP tool or `-Q pro`

### After Video Generation: Suggest Next Steps
- "Try different motion" → re-generate with adjusted prompt
- "Add dialogue" → include spoken words in the LTX-2.3 prompt
- "Make it longer" → increase `--duration`
- "Combine videos" → `--concat-videos`
- "Add one soundtrack over stitched clips" → `--concat-videos ... --concat-audio <audio>`
- "Use a section of a source video/audio" → `--video-start`, `--audio-start`, and `--audio-duration`

### Music-to-Video Pipeline:
1. Use the provided/generated audio file as `--ref-audio`
2. If there is also a reference image, omit `--workflow` and let the CLI auto-select LTX 2.3 `ia2v`
3. If there is no reference image, omit `--workflow` and let the CLI auto-select LTX 2.3 `a2v`
4. Use `--workflow s2v` only for explicit face lip-sync with a face image
5. If only part of the song/audio should drive the clip, pass `--audio-start <sec>` and optionally `--audio-duration <sec>`
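
The decision steps of the music-to-video pipeline can be sketched as an argument builder. `musicVideoArgs` is a hypothetical helper; passing the reference image through `--ref` is an assumption based on the other video examples in this document, and the prompt and output path are left out for brevity.

```javascript
// Hypothetical builder for the pipeline above. --workflow is added only for
// explicit lip-sync with a face image; otherwise the CLI auto-selects ia2v or a2v.
function musicVideoArgs({ audio, image, lipSync = false, audioStart, audioDuration }) {
  const args = ["--video", "--ref-audio", audio];
  if (image) args.push("--ref", image); // assumption: image goes through --ref
  if (lipSync && image) args.push("--workflow", "s2v");
  if (audioStart !== undefined) args.push("--audio-start", String(audioStart));
  if (audioDuration !== undefined) args.push("--audio-duration", String(audioDuration));
  return args;
}

console.log(musicVideoArgs({ audio: "song.mp3", image: "cover.png", audioStart: 30 }).join(" "));
```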

### Multi-Persona Scene:
1. Resolve all personas: `--persona-resolve "Mark" --json` and `--persona-resolve "Sarah" --json`
2. Generate scene with both: `-c mark-photo.jpg -c sarah-photo.jpg "Mark and Sarah at a cafe, use face from picture 1 for Mark, face from picture 2 for Sarah"`
3. Animate with one persona's voice identity: `--video --ref <scene.png> --reference-audio-identity <mark-voice.webm> "MARK: \"Exact spoken words.\""`

## Troubleshooting

- **Auth errors**: Check `SOGNI_API_KEY` or the credentials in `~/.config/sogni/credentials`
- **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and LTX 2.3 supports up to 3840px. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because the resize rounds to whole pixels, a valid requested size can still yield an invalid final size.
- **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
- **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
- **Timeouts**: Try a faster model or increase the `-t` timeout
- **No workers**: Check https://sogni.ai for network status
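
The i2v sizing issue is easier to see with numbers. The sketch below fits a reference image inside a requested box and checks the result against a divisor, using the 2314x1200 reference from the error example in the JSON Output section. `fitInside` and `checkVideoSize` are hypothetical helpers, and rounding to the nearest pixel is an assumption about how the wrapper rounds.

```javascript
// Scale the reference to fit inside the requested box, keeping its aspect ratio.
function fitInside(refWidth, refHeight, maxWidth, maxHeight) {
  const scale = Math.min(maxWidth / refWidth, maxHeight / refHeight);
  return {
    width: Math.round(refWidth * scale), // assumption: nearest-pixel rounding
    height: Math.round(refHeight * scale),
  };
}

// A size is valid only when both resized dimensions are multiples of the divisor.
function checkVideoSize(refWidth, refHeight, maxWidth, maxHeight, divisor) {
  const size = fitInside(refWidth, refHeight, maxWidth, maxHeight);
  const valid = size.width % divisor === 0 && size.height % divisor === 0;
  return { ...size, valid };
}

console.log(checkVideoSize(2314, 1200, 512, 512, 16));  // 512x266, height misses the divisor
console.log(checkVideoSize(2314, 1200, 1296, 672, 16)); // 1296x672, both divisible by 16
```

The second call uses the dimensions from the error hint, which is why that suggestion passes the divisibility check.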