@sogni-ai/sogni-creative-agent-skill 2.1.2 → 2.2.0
- package/README.md +368 -166
- package/SKILL.md +152 -29
- package/generated/creative-agent-runtime.mjs +3759 -27
- package/llm.txt +36 -13
- package/openclaw.plugin.json +36 -4
- package/package.json +5 -3
- package/sogni-agent.mjs +1750 -106
- package/version.mjs +1 -1
package/SKILL.md
CHANGED
@@ -1,9 +1,9 @@
 ---
 name: sogni-creative-agent-skill
-
-description: Sogni Creative Agent Skill: agent skill and CLI for image and video generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to "draw", "generate", "create an image", "make a video/animate", "apply a style", or "generate me as a superhero".
-homepage: https://sogni.ai
+description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
 metadata:
+version: "2.2.0"
+homepage: https://sogni.ai
 clawdbot:
 emoji: "🎨"
 primaryEnv: "SOGNI_API_KEY"
@@ -13,8 +13,6 @@ metadata:
 anyBins: ["ffmpeg"]
 env:
 - "SOGNI_API_KEY"
-- "SOGNI_USERNAME"
-- "SOGNI_PASSWORD"
 - "SOGNI_CREDENTIALS_PATH"
 - "SOGNI_LAST_RENDER_PATH"
 - "SOGNI_MEDIA_INBOUND_DIR"
@@ -34,9 +32,11 @@ metadata:
 label: "Prepare runtime dependencies"
 ---
 
-# Sogni Image &
+# Sogni Image, Video & Music Generation
 
-Generate **images and
+Generate **images, videos, and music** using Sogni AI's decentralized GPU network.
+
+> **Per-skill view**: hosts that want to load focused capabilities rather than this monolith can read [`skills/README.md`](./skills/README.md) for the per-skill index — one markdown file per skill (`image_generation`, `image_editing`, `video_generation`, `video_editing`, `music_generation`, `media_analysis`, `persona_management`, `app_settings`, plus the always-loaded `quality_audit`, `session_control`, `asset_reference_management`). Each file mirrors the canonical manifest in `@sogni/creative-agent`. The whole-monolith load below stays the default for OpenClaw / Claude Code / Hermes Agent / Manus AI integrations.
 
 ## Install Request Policy
 
@@ -45,32 +45,48 @@ When a user asks to install this plugin, skill, or Sogni Creative Agent Skill, i
 Default install path:
 
 ```bash
-npm install -g @sogni-ai/sogni-creative-agent-skill
+npm install -g @sogni-ai/sogni-creative-agent-skill@latest
 sogni-agent --version
 ```
 
 Then configure the agent/runtime to use this `SKILL.md` and invoke the `sogni-agent` CLI.
 
+For upgrades, prefer package-manager updates or direct operations on an existing checkout. Do not generate clone-or-pull shell bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs; agent command scanners may require approval for those patterns.
+
+Agent-safe CLI upgrade:
+
+```bash
+npm install -g @sogni-ai/sogni-creative-agent-skill@latest
+sogni-agent --version
+```
+
+Agent-safe update for an existing local checkout:
+
+```bash
+DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
+git -C "$DEST" pull --ff-only
+npm --prefix "$DEST" install
+```
+
+If that checkout does not exist, prefer the npm-based local skill install below, or ask before cloning.
+
 ## Setup
 
-1. **Get Sogni
-2. **Create credentials file:**
+1. **Get your Sogni API key** by logging into https://dashboard.sogni.ai and clicking your username.
+2. **Create an API key credentials file:**
 ```bash
 mkdir -p ~/.config/sogni
 cat > ~/.config/sogni/credentials << 'EOF'
 SOGNI_API_KEY=your_api_key
-# or:
-# SOGNI_USERNAME=your_username
-# SOGNI_PASSWORD=your_password
 EOF
 chmod 600 ~/.config/sogni/credentials
 ```
 
-You can also export `SOGNI_API_KEY
+You can also export `SOGNI_API_KEY` instead of writing the file. The API key can always be found by logging into https://dashboard.sogni.ai and clicking your username.
 
 3. **Install the CLI and skill by default:**
 ```bash
-npm install -g @sogni-ai/sogni-creative-agent-skill
+npm install -g @sogni-ai/sogni-creative-agent-skill@latest
 sogni-agent --version
 ```
 
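The credentials file created in the Setup steps of this hunk is a plain `KEY=value` file. As a rough illustration only, such a file could be read as sketched here. `parseCredentials` and `resolveApiKey` are hypothetical helpers, not the CLI's actual loader; skipping `#` comments and letting an exported `SOGNI_API_KEY` win over the file are assumptions.

```javascript
// Hypothetical helper: parse KEY=value text such as ~/.config/sogni/credentials.
// Assumption: blank lines and lines starting with "#" are ignored.
function parseCredentials(text) {
  const result = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip malformed lines that have no key
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    result[key] = value;
  }
  return result;
}

// Assumption: an exported SOGNI_API_KEY takes precedence over the file.
function resolveApiKey(env, fileText) {
  return env.SOGNI_API_KEY || parseCredentials(fileText).SOGNI_API_KEY || null;
}
```

In a real script the second argument would come from reading the credentials file, and `env` would be `process.env`.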
@@ -96,7 +112,7 @@ When this skill is distributed via ClawHub, it bootstraps its local runtime depe
 
 Default file paths used by this skill:
 
--
+- API key credentials file (read): `~/.config/sogni/credentials`
 - Last render metadata (read/write): `~/.config/sogni/last-render.json`
 - OpenClaw config (read): `~/.openclaw/openclaw.json`
 - Media listing for `--list-media` (read): `~/.clawdbot/media/inbound`
@@ -108,7 +124,7 @@ Path override environment variables:
 - `SOGNI_MEDIA_INBOUND_DIR`
 - `OPENCLAW_CONFIG_PATH`
 
-## Usage (Images &
+## Usage (Images, Video & Music)
 
 ```bash
 # Generate and get URL
@@ -140,8 +156,53 @@ node sogni-agent.mjs --json --balance
 
 # Quiet mode (suppress progress)
 node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
+
+# Direct music/audio generation
+node sogni-agent.mjs --music --duration 30 \
+"uplifting cinematic synthwave theme for a product launch"
+
+# Song with lyrics and musical controls
+node sogni-agent.mjs --music --lyrics "Rise with the morning light" --bpm 128 \
+--keyscale "C major" --output-format mp3 "bright indie pop chorus"
+
+# Hosted API chat: natural-language rich creative-agent tool execution
+node sogni-agent.mjs --api-chat "Create a 4-shot product video concept for a red sneaker"
+
+# Durable API workflow: async image-to-video with resumable workflow record
+node sogni-agent.mjs --api-workflow image-to-video \
+--video-prompt "The camera slowly pushes in as the sketch comes alive" \
+"A graphite robot sketch on a drafting table"
+
+# Durable storyboard-video workflow: storyline -> GPT Image 2 storyboard -> Seedance
+node sogni-agent.mjs --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
+"Create a 9:16 bakery launch video with a neon street-window reveal"
 ```
 
+Use `--api-chat` for text-first natural-language workflows that should go through
+Sogni API's OpenAI-compatible `/v1/chat/completions` tool loop. Use
+`--api-workflow` when the caller already knows it wants an async durable workflow
+record under `/v1/creative-agent/workflows`. Use `--api-workflow storyboard-video`
+when the caller wants the hosted sequence to generate a storyline, create one GPT
+Image 2 storyboard sheet, and feed that image artifact into Seedance as the video
+reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low|medium|high
+quality for the storyboard sheet. Uploaded-media execution still
+belongs on the direct CLI path (`-c`, `--ref`, `--ref-audio`, `--ref-video`)
+until the hosted rich API and durable workflow endpoint support uploaded
+negative-index media references through CLI media flags.
+Hosted API modes require `SOGNI_API_KEY`; username/password credentials are only
+for the direct client-wrapper path.
+
+When changing hosted API chat/workflow behavior, keep reusable validation,
+workflow compilation, repair-control, and guard telemetry logic in
+`../sogni-creative-agent` first. The public skill should consume generated or
+copied shared contracts instead of adding skill-local regex guards. Media-routing
+decisions should come from typed planner/runtime contracts such as
+`CreativeTurnPlannerFields`, `classifyMediaTurnIntent()`, `videoContinuation`,
+`videoModification`, `outputGrouping`, `imageSelectionPolicy`, and
+`pendingStitchAfterBatch`; regex is appropriate only for bounded CLI/fact
+extraction such as paths, URLs, extensions, dimensions, durations, and explicit
+positions.
+
 ## Options
 
 | Flag | Description | Default |
@@ -166,7 +227,7 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
 | `--angle-description <text>` | Optional subject description | - |
 | `--steps <num>` | Override steps (model-dependent) | - |
 | `--guidance <num>` | Override guidance (model-dependent) | - |
-| `--output-format <f>` | Image output format: png\|jpg | png |
+| `--output-format <f>` | Image output format: png\|jpg, or webp for gpt-image-2 | png |
 | `--sampler <name>` | Sampler (model-dependent) | - |
 | `--scheduler <name>` | Scheduler (model-dependent) | - |
 | `--lora <id>` | LoRA id (repeatable, edit only) | - |
@@ -177,10 +238,22 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
 | `--balance, --balances` | Show SPARK/SOGNI balances and exit | - |
 | `-c, --context <path>` | Context image for editing | - |
 | `--last-image` | Use last generated image as context/ref | - |
+| `--music` | Generate music/audio instead of image | - |
+| `--music-model <id>` | Music model: turbo\|sft\|ace_step_1.5_turbo\|ace_step_1.5_sft | ace_step_1.5_turbo |
+| `--lyrics <text>` | Optional lyrics for song generation | - |
+| `--language <code>` | Lyrics language code | en |
+| `--bpm <num>` | Music tempo, 30-300 BPM | server default |
+| `--keyscale <text>` | Music key/scale, e.g. C major | - |
+| `--timesig <n>` | Time signature: 2\|3\|4\|6 | server default |
+| `--composer-mode`, `--no-composer-mode` | Toggle AI composer mode | server default |
+| `--prompt-strength <n>` | Music prompt adherence, 0-10 | server default |
+| `--creativity <n>` | Music variation/temperature, 0-2 | server default |
+| `--music-shift <n>` | Audio model shift parameter, 1-6 | 3 |
+| `--audio-format <f>` | Alias for music output format: mp3\|flac\|wav | mp3 |
 | `--video, -v` | Generate video instead of image | - |
 | `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace) | inferred |
 | `--fps <num>` | Frames per second (video) | model default |
-| `--duration <sec>` | Duration in seconds (video) | 5 |
+| `--duration <sec>` | Duration in seconds (video or music) | video 5, music 30 |
 | `--frames <num>` | Override total frames (video) | - |
 | `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
 | `--auto-resize-assets` | Auto-resize video assets | true |
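The music flags added in this hunk carry numeric ranges and fixed choices. A minimal sketch of how a caller might pre-check them before invoking the CLI follows. `validateMusicOptions` and its camelCase option names are hypothetical, and treating the documented ranges as inclusive is an assumption; the CLI's own validation may differ.

```javascript
// Hypothetical validator mirroring the ranges listed in the options table.
// Assumption: the documented bounds are inclusive.
const MUSIC_RANGES = {
  bpm: [30, 300],
  promptStrength: [0, 10],
  creativity: [0, 2],
  musicShift: [1, 6],
};
const TIME_SIGNATURES = [2, 3, 4, 6];
const AUDIO_FORMATS = ["mp3", "flac", "wav"];

function validateMusicOptions(opts) {
  const errors = [];
  for (const [name, [min, max]] of Object.entries(MUSIC_RANGES)) {
    const value = opts[name];
    if (value === undefined) continue; // unset options fall back to defaults
    if (typeof value !== "number" || Number.isNaN(value) || value < min || value > max) {
      errors.push(`${name} must be a number between ${min} and ${max}`);
    }
  }
  if (opts.timesig !== undefined && !TIME_SIGNATURES.includes(opts.timesig)) {
    errors.push("timesig must be one of 2, 3, 4, 6");
  }
  if (opts.audioFormat !== undefined && !AUDIO_FORMATS.includes(opts.audioFormat)) {
    errors.push("audioFormat must be one of mp3, flac, wav");
  }
  return errors;
}
```

An empty array means every supplied option sits inside the documented range; unset options are left to the CLI or server defaults.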
@@ -213,6 +286,21 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
 | `--concat-audio <path>` | Optional audio track to mux over `--concat-videos` output | - |
 | `--concat-audio-start <sec>` | Start offset into `--concat-audio` | - |
 | `--list-media [type]` | List recent inbound media (images\|audio\|all) | images |
+| `--api-chat` | Call `/v1/chat/completions` with rich creative-agent tool injection | - |
+| `--api-tools <mode>` | API tool mode: creative-agent\|rich\|hosted\|none | creative-agent |
+| `--no-api-tool-execution` | Plan/tool-call via API chat without executing Sogni tools | - |
+| `--llm-model <id>` | LLM model for `--api-chat` | qwen3.6-35b-a3b-gguf-iq4xs |
+| `--api-workflow <kind>` | Start durable workflow: image-to-video\|hosted-tool-sequence\|storyboard-video | - |
+| `--workflow-input <json\|path\|@path>` | Workflow input JSON for hosted tool sequences/custom starts | - |
+| `--workflow-title <text>` | Title for hosted-tool-sequence workflow input | - |
+| `--storyboard-frames <n>` | Beat count for storyboard-video workflow | - |
+| `--video-prompt <text>` | Motion prompt for durable image-to-video workflow | - |
+| `--negative-prompt <text>` | Negative prompt for durable image-to-video workflow | - |
+| `--generate-audio`, `--no-generate-audio` | Toggle audio generation for durable image-to-video | - |
+| `--expand-prompt`, `--no-expand-prompt` | Toggle prompt expansion for durable image-to-video | - |
+| `--watch-workflow` | Stream durable workflow events after start | - |
+| `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Durable workflow management helpers | - |
+| `--api-base-url <url>` | Sogni API base for hosted API modes. Credentials are only sent to `https://api.sogni.ai` by default; use `SOGNI_API_ALLOWED_HOSTS` for trusted custom hosts or `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1` for isolated local testing. | https://api.sogni.ai |
 | `--no-filter` | Disable NSFW content filter | - |
 | `--memory-set <key> <value>` | Save a user preference | - |
 | `--memory-get <key>` | Get a specific memory | - |
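The `--api-base-url` row in this hunk describes a credential-forwarding guard. One way such a guard could look is sketched here under stated assumptions: `mayForwardCredentials` is a hypothetical name, and both the comma-separated format of `SOGNI_API_ALLOWED_HOSTS` and the HTTPS requirement for custom hosts are guesses rather than documented behavior.

```javascript
const DEFAULT_API_HOST = "api.sogni.ai";

// Hypothetical check, not the CLI's implementation.
// Assumptions: SOGNI_API_ALLOWED_HOSTS is a comma-separated host list, and
// custom hosts must still use HTTPS unless the unsafe override is set.
function mayForwardCredentials(baseUrl, env = {}) {
  let url;
  try {
    url = new URL(baseUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (env.SOGNI_ALLOW_UNSAFE_API_BASE_URL === "1") return true;
  if (url.protocol !== "https:") return false;
  const allowed = (env.SOGNI_API_ALLOWED_HOSTS || "")
    .split(",")
    .map((host) => host.trim().toLowerCase())
    .filter(Boolean);
  return url.hostname === DEFAULT_API_HOST || allowed.includes(url.hostname);
}
```

Failing closed on unparseable URLs keeps the API key from being sent anywhere the caller did not explicitly trust.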
@@ -245,6 +333,7 @@ When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defau
 "defaultImageModel": "z_image_turbo_bf16",
 "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
 "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
+"defaultMusicModel": "ace_step_1.5_turbo",
 "videoModels": {
 "t2v": "ltx23-22b-fp8_t2v_distilled",
 "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
@@ -258,6 +347,9 @@ When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defau
 "defaultVideoWorkflow": "t2v",
 "defaultNetwork": "fast",
 "defaultTokenType": "spark",
+"apiBaseUrl": "https://api.sogni.ai",
+"defaultLlmModel": "qwen3.6-35b-a3b-gguf-iq4xs",
+"defaultApiToolMode": "creative-agent",
 "seedStrategy": "prompt-hash",
 "modelDefaults": {
 "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
@@ -270,6 +362,8 @@ When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defau
 "defaultDurationSec": 5,
 "defaultImageTimeoutSec": 30,
 "defaultVideoTimeoutSec": 300,
+"defaultMusicDurationSec": 30,
+"defaultMusicTimeoutSec": 600,
 "credentialsPath": "~/.config/sogni/credentials",
 "lastRenderPath": "~/.config/sogni/last-render.json",
 "mediaInboundDir": "~/.clawdbot/media/inbound"
@@ -289,6 +383,7 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
 | Model | Speed | Use Case |
 |-------|-------|----------|
 | `z_image_turbo_bf16` | Fast (~5-10s) | General purpose, default |
+| `gpt-image-2` | Variable | OpenAI GPT Image 2 text-to-image and edit, strong prompt and text rendering |
 | `flux1-schnell-fp8` | Very fast | Quick iterations |
 | `flux2_dev_fp8` | Slow (~2min) | High quality |
 | `chroma-v.46-flash_fp8` | Medium | Balanced |
@@ -296,9 +391,23 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
 | `qwen_image_edit_2511_fp8_lightning` | Fast | Quick image editing |
 | `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |
 
+`gpt-image-2` supports flexible OpenAI image sizes up to `3840px` on either edge, max `3:1` aspect ratio, and total pixels from `655,360` through `8,294,400`; the API snaps dimensions to valid multiples of 16.
+
+## Music Models
+
+| Model | Use Case |
+|-------|----------|
+| `ace_step_1.5_turbo` | Default direct music generation model |
+| `ace_step_1.5_sft` | Experimental option with stronger lyric handling |
+
+Use `--music` for direct audio-only generation. Defaults are 30 seconds, `mp3`,
+`ace_step_1.5_turbo`, 8 steps, `euler` sampler, and `simple` scheduler. Keep
+`--audio` for video reference audio (`--ref-audio` alias); do not use it for
+direct music generation.
+
 ## Video Models
 
-###
+### Current Video Model Selectors
 
 | Model | Speed | Use Case |
 |-------|-------|----------|
@@ -307,10 +416,10 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
 | `ltx23-22b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
 | `ltx23-22b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
 | `ltx23-22b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
-| `seedance2` | Variable | Seedance 2.0 text-to-video
-| `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video
-| `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video
-| `seedance2-v2v` | Variable | Seedance 2.0 video-to-video
+| `seedance2` | Variable | Seedance 2.0 text-to-video, 4-15s, native audio |
+| `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video |
+| `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video |
+| `seedance2-v2v` | Variable | Seedance 2.0 video-to-video, no ControlNet |
 | `wan_v2.2-14b-fp8_i2v_lightx2v` | Fast | Simple image-to-video |
 | `wan_v2.2-14b-fp8_i2v` | Slow | Higher quality video |
 | `wan_v2.2-14b-fp8_t2v_lightx2v` | Fast | Text-to-video |
@@ -333,7 +442,7 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
 
 ## Image Editing with Context
 
-Edit images using reference images
+Edit images using reference images. Qwen models support up to 3 context images; GPT Image 2 edit supports up to 16 when selected with `-m gpt-image-2`:
 
 ```bash
 # Single context image
@@ -342,11 +451,14 @@ node sogni-agent.mjs -c photo.jpg "make the background a beach"
 # Multiple context images (subject + style)
 node sogni-agent.mjs -c subject.jpg -c style.jpg "apply the style to the subject"
 
+# GPT Image 2 multi-reference edit
+node sogni-agent.mjs -m gpt-image-2 -c subject.jpg -c outfit.jpg -c room.jpg "place the subject in the room wearing the outfit"
+
 # Use last generated image as context
 node sogni-agent.mjs --last-image "make it more vibrant"
 ```
 
-When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`.
+When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`. Select `-m gpt-image-2` for GPT Image 2's higher reference-image limit and OpenAI-backed image editing.
 
 ## Photobooth (Face Transfer)
 
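The `gpt-image-2` limits quoted in this diff (up to 16 context images, edges up to 3840px, aspect ratio up to 3:1, 655,360 to 8,294,400 total pixels, dimensions snapped to multiples of 16) can be pre-checked before a request is sent. The sketch here is illustrative only: `checkGptImage2Size` and `contextImageLimit` are hypothetical helpers, and rounding to the nearest multiple of 16 is an assumption, since the source only says the API snaps to valid multiples.

```javascript
// Limits copied from the text of this diff; helper names are hypothetical.
const GPT_IMAGE_2 = { maxEdge: 3840, maxAspect: 3, minPixels: 655360, maxPixels: 8294400 };
const CONTEXT_LIMITS = { "gpt-image-2": 16, qwen_image_edit_2511_fp8_lightning: 3 };

// Assumption: snapping rounds to the nearest multiple of 16.
function snapTo16(value) {
  return Math.max(16, Math.round(value / 16) * 16);
}

function checkGptImage2Size(width, height) {
  const w = snapTo16(width);
  const h = snapTo16(height);
  const errors = [];
  if (w > GPT_IMAGE_2.maxEdge || h > GPT_IMAGE_2.maxEdge) errors.push("edge exceeds 3840px");
  if (Math.max(w, h) / Math.min(w, h) > GPT_IMAGE_2.maxAspect) errors.push("aspect ratio exceeds 3:1");
  const pixels = w * h;
  if (pixels < GPT_IMAGE_2.minPixels || pixels > GPT_IMAGE_2.maxPixels) {
    errors.push("total pixels outside 655,360 to 8,294,400");
  }
  return { width: w, height: h, errors };
}

// Returns the documented context-image limit, or undefined for other models.
function contextImageLimit(modelId) {
  return CONTEXT_LIMITS[modelId];
}
```

A 3840x2160 request lands exactly on the upper pixel bound, so it passes, while 512x512 falls below the lower bound.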
@@ -458,7 +570,11 @@ node sogni-agent.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out sl
 node sogni-agent.mjs --video --target-resolution 768 \
 "A calm cinematic shot of lanterns drifting across a night lake"
 
-#
+# Natural-language aspect and resolution inference
+node sogni-agent.mjs --video \
+"Make a 720p 9:16 video of ocean waves at sunset"
+
+# Seedance 2.0 text-to-video
 node sogni-agent.mjs --video -m seedance2 --duration 8 \
 "A polished product reveal with native ambient sound"
 
@@ -485,6 +601,9 @@ node sogni-agent.mjs --video --ref-audio song.mp3 \
 node sogni-agent.mjs --video --reference-audio-identity voice.webm \
 "NARRATOR: \"This is my voice.\""
 
+# Prefer .webm, .m4a, or .mp3 voice clips. Local .wav clips are normalized
+# to .m4a before upload when ffmpeg is available.
+
 # LTX-2.3 text-to-video
 node sogni-agent.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
 "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
@@ -518,7 +637,7 @@ node sogni-agent.mjs --video --workflow v2v --ref-video scene.mp4 \
 ```
 
 ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
-Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context;
+Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context when they pass the CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
 
 ```bash
 # Seedance V2V without ControlNet
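The Seedance reference rules added in this hunk (public HTTPS only, localhost and private-network URLs rejected, audio paired with an image or video) can be approximated as sketched here. This is not the CLI's actual safety check: the helper names are hypothetical, the check inspects literal hostnames without resolving DNS, and rejecting every IPv6 literal is a deliberately conservative assumption.

```javascript
// Hypothetical approximation of a "public HTTPS only" URL check.
function isPrivateIPv4(host) {
  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function isPublicHttpsUrl(value) {
  let url;
  try {
    url = new URL(value);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // assumption: reject IPv6 literals outright
  return !isPrivateIPv4(host);
}

// Returns a list of problems; an empty list means the references look acceptable.
function checkSeedanceReferences({ images = [], videos = [], audio = [] }) {
  const errors = [];
  for (const ref of [...images, ...videos, ...audio]) {
    if (!isPublicHttpsUrl(ref)) errors.push(`rejected reference: ${ref}`);
  }
  if (audio.length > 0 && images.length + videos.length === 0) {
    errors.push("audio references must be paired with an image or video reference");
  }
  return errors;
}
```

A production check would also need to consider what a hostname resolves to, which a string-level test like this cannot see.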
@@ -631,6 +750,9 @@ node {{skillDir}}/sogni-agent.mjs -q --video --ref /path/to/image.png -o /tmp/vi
 # Generate text-to-video
 node {{skillDir}}/sogni-agent.mjs -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
 
+# Generate direct music/audio
+node {{skillDir}}/sogni-agent.mjs -q --music --duration 30 -o /tmp/music.mp3 "uplifting cinematic synthwave theme for a product launch"
+
 # HD / "4K" text-to-video: prefer LTX-2.3
 node {{skillDir}}/sogni-agent.mjs -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
 
@@ -690,6 +812,7 @@ When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or *
 - For **image-to-video**, use `-m ltx23-22b-fp8_i2v_distilled`.
 - Prefer LTX-sized dimensions such as `-w 1920 -h 1088`.
 - For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
+- When the prompt combines a named resolution with an aspect ratio, such as "720p 9:16", let the CLI infer both instead of forcing manual `-w`/`-h` unless the user gave exact pixels.
 - If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
 - Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
 - Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even though the current wrapper caps non-WAN video dimensions at 2048px on the long side.
@@ -925,7 +1048,7 @@ node {{skillDir}}/sogni-agent.mjs --angles-360 -c subject.jpg "same subject"
 
 ## Troubleshooting
 
-- **Auth errors**: Check `SOGNI_API_KEY` or the
+- **Auth errors**: Check `SOGNI_API_KEY` or the API key in `~/.config/sogni/credentials`
 - **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and the current wrapper caps non-WAN video dimensions at 2048px on the long side. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
 - **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
 - **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
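The sizing rules in the troubleshooting list (WAN: 480 to 1536px, divisible by 16; LTX: divisible by 64, capped at 2048px) lend themselves to a small pre-flight check. The sketch here is one possible reading, not the wrapper's code: `checkVideoSize` and the `wan`/`ltx` family keys are hypothetical, and applying the WAN bounds to each dimension separately is an assumption.

```javascript
// Hypothetical rule table built from the troubleshooting note.
const VIDEO_SIZE_RULES = {
  wan: { divisor: 16, min: 480, max: 1536 },
  ltx: { divisor: 64, min: null, max: 2048 }, // the note states no LTX minimum
};

// Returns a list of problems for the requested size; empty means it fits the rules.
function checkVideoSize(family, width, height) {
  const rule = VIDEO_SIZE_RULES[family];
  if (!rule) throw new Error(`unknown model family: ${family}`);
  const errors = [];
  for (const [label, value] of [["width", width], ["height", height]]) {
    if (value % rule.divisor !== 0) errors.push(`${label} must be divisible by ${rule.divisor}`);
    if (rule.min !== null && value < rule.min) errors.push(`${label} below ${rule.min}px`);
    if (value > rule.max) errors.push(`${label} above ${rule.max}px`);
  }
  return errors;
}
```

This shows why the document suggests `-w 1920 -h 1088` for LTX: 1088 is divisible by 64, while 1080 is not.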