@sogni-ai/sogni-creative-agent-skill 2.1.3 → 2.3.0

package/SKILL.md CHANGED
@@ -1,9 +1,9 @@
  ---
  name: sogni-creative-agent-skill
- version: "2.1.3"
- description: Sogni Creative Agent Skill: agent skill and CLI for image and video generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to "draw", "generate", "create an image", "make a video/animate", "apply a style", or "generate me as a superhero".
- homepage: https://sogni.ai
+ description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
  metadata:
+ version: "2.3.0"
+ homepage: https://sogni.ai
  clawdbot:
  emoji: "🎨"
  primaryEnv: "SOGNI_API_KEY"
@@ -13,8 +13,6 @@ metadata:
  anyBins: ["ffmpeg"]
  env:
  - "SOGNI_API_KEY"
- - "SOGNI_USERNAME"
- - "SOGNI_PASSWORD"
  - "SOGNI_CREDENTIALS_PATH"
  - "SOGNI_LAST_RENDER_PATH"
  - "SOGNI_MEDIA_INBOUND_DIR"
@@ -34,9 +32,11 @@ metadata:
  label: "Prepare runtime dependencies"
  ---
 
- # Sogni Image & Video Generation
+ # Sogni Image, Video & Music Generation
 
- Generate **images and videos** using Sogni AI's decentralized GPU network.
+ Generate **images, videos, and music** using Sogni AI's decentralized GPU network.
+
+ > **Per-skill view**: hosts that want to load focused capabilities rather than this monolith can read [`skills/README.md`](./skills/README.md) for the per-skill index — one markdown file per skill (`image_generation`, `image_editing`, `video_generation`, `video_editing`, `music_generation`, `media_analysis`, `persona_management`, `app_settings`, plus the always-loaded `quality_audit`, `session_control`, `asset_reference_management`). Each file mirrors the canonical manifest in `@sogni/creative-agent`. The whole-monolith load below stays the default for OpenClaw / Claude Code / Hermes Agent / Manus AI integrations.
 
  ## Install Request Policy
 
@@ -72,20 +72,17 @@ If that checkout does not exist, prefer the npm-based local skill install below,
 
  ## Setup
 
- 1. **Get Sogni credentials** at https://app.sogni.ai/
- 2. **Create credentials file:**
+ 1. **Get your Sogni API key** by logging into https://dashboard.sogni.ai and opening the account menu.
+ 2. **Create an API key credentials file:**
  ```bash
  mkdir -p ~/.config/sogni
  cat > ~/.config/sogni/credentials << 'EOF'
  SOGNI_API_KEY=your_api_key
- # or:
- # SOGNI_USERNAME=your_username
- # SOGNI_PASSWORD=your_password
  EOF
  chmod 600 ~/.config/sogni/credentials
  ```
 
- You can also export `SOGNI_API_KEY`, or `SOGNI_USERNAME` + `SOGNI_PASSWORD`, instead of writing the file.
+ You can also export `SOGNI_API_KEY` instead of writing the file. The API key can always be found by logging into https://dashboard.sogni.ai and opening the account menu.
 
  3. **Install the CLI and skill by default:**
  ```bash
@@ -115,7 +112,7 @@ When this skill is distributed via ClawHub, it bootstraps its local runtime depe
 
  Default file paths used by this skill:
 
- - Credentials file (read): `~/.config/sogni/credentials`
+ - API key credentials file (read): `~/.config/sogni/credentials`
  - Last render metadata (read/write): `~/.config/sogni/last-render.json`
  - OpenClaw config (read): `~/.openclaw/openclaw.json`
  - Media listing for `--list-media` (read): `~/.clawdbot/media/inbound`
@@ -127,7 +124,7 @@ Path override environment variables:
  - `SOGNI_MEDIA_INBOUND_DIR`
  - `OPENCLAW_CONFIG_PATH`
 
- ## Usage (Images & Video)
+ ## Usage (Images, Video & Music)
 
  ```bash
  # Generate and get URL
@@ -159,8 +156,94 @@ node sogni-agent.mjs --json --balance
 
  # Quiet mode (suppress progress)
  node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
+
+ # Direct music/audio generation
+ node sogni-agent.mjs --music --duration 30 \
+ "uplifting cinematic synthwave theme for a product launch"
+
+ # Song with lyrics and musical controls
+ node sogni-agent.mjs --music --lyrics "Rise with the morning light" --bpm 128 \
+ --keyscale "C major" --output-format mp3 "bright indie pop chorus"
+
+ # Hosted API chat: natural-language creative-agent tool execution
+ node sogni-agent.mjs --api-chat "Create a 4-shot product video concept for a red sneaker"
+
+ # Hosted API chat with image vision and media-reference metadata
+ node sogni-agent.mjs --api-chat --ref product.jpg \
+ "Turn this into a launch poster and describe the edit plan"
+
+ # Sogni Intelligence model/replay utilities
+ node sogni-agent.mjs --list-api-models
+ node sogni-agent.mjs --api-chat --task-profile reasoning --no-thinking \
+ "Plan a concise multi-step product launch workflow"
+ node sogni-agent.mjs --list-replays 20
+ node sogni-agent.mjs --get-replay run_abc123 --json
+
+ # Durable API workflow: async image-to-video with resumable workflow record
+ node sogni-agent.mjs --api-workflow image-to-video \
+ --video-prompt "The camera slowly pushes in as the sketch comes alive" \
+ "A graphite robot sketch on a drafting table"
+
+ # Durable API workflow with media reference and cost controls
+ node sogni-agent.mjs --api-workflow image-to-video \
+ --ref https://cdn.example.com/sketch.png \
+ --workflow-max-cost 25 --confirm-cost \
+ --video-prompt "The camera slowly pushes in as the sketch comes alive" \
+ "Animate the referenced sketch"
+
+ # Shared CreativeWorkflowPlan: API compiles and validates through @sogni/creative-agent
+ node sogni-agent.mjs --api-workflow creative-plan --workflow-input @plan.json
+
+ # Durable storyboard-video workflow: storyline -> GPT Image 2 storyboard -> Seedance
+ node sogni-agent.mjs --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
+ "Create a 9:16 bakery launch video with a neon street-window reveal"
  ```
 
+ Use `--api-chat` for text-first natural-language workflows that should go through
+ Sogni API's OpenAI-compatible `/v1/chat/completions` tool loop. This path
+ sanitizes prompt-injection markers before forwarding messages and uses the
+ current hosted creative-agent tool surface. Use `--api-workflow` when the caller
+ already knows it wants an async durable workflow record under
+ `/v1/creative-agent/workflows`. Use `--api-workflow creative-plan` when the
+ caller already has a shared `CreativeWorkflowPlan`; the skill forwards it as
+ `kind: "creative_plan"` and lets Sogni API compile, validate, and persist it
+ through `@sogni/creative-agent`. This is the preferred hosted path for exact
+ multi-step plans, including repeated `replace_video_segment` operations with
+ `replacementStartSeconds` / `replacementEndSeconds` when interleaving existing
+ video slices. Use `--api-workflow storyboard-video`
+ when the caller wants the hosted sequence to generate a storyline, create one GPT
+ Image 2 storyboard sheet, and feed that image artifact into Seedance as the video
+ reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low|medium|high
+ quality for the storyboard sheet. Hosted API requests forward media references
+ from `-c`, `--ref`, `--ref-end`, `--ref-audio`,
+ `--reference-audio-identity`, and `--ref-video` as `media_references`
+ metadata; workflow JSON can bind them into step arguments with
+ `sourceStepId: "$input_media"`, and API chat also attaches image refs as vision
+ inputs. Local file references are uploaded to Sogni media storage first, then
+ forwarded as retrievable URLs for hosted chat and durable workflows. Use the
+ direct CLI path for private media that must not leave the local machine.
+ Use `--workflow-max-cost <n>` plus `--confirm-cost` / `--no-confirm-cost` to
+ forward explicit workflow cost policy.
+ Sogni Intelligence utilities are exposed through the same API key path:
+ `--list-api-models` / `--get-api-model <id>` read `/v1/models`,
+ `--task-profile`, `--max-tokens`, and `--thinking` / `--no-thinking` tune
+ `/v1/chat/completions`, and `--list-replays`, `--get-replay`, and
+ `--ingest-replay` manage `/v1/replay/records` RunRecords for replay/debug
+ viewers.
+ Hosted API modes require `SOGNI_API_KEY`; this skill's CLI uses API-key
+ authentication.
+
+ When changing hosted API chat/workflow behavior, keep reusable validation,
+ workflow compilation, repair-control, and guard telemetry logic in
+ `../sogni-creative-agent` first. The public skill should consume generated or
+ copied shared contracts instead of adding skill-local regex guards. Media-routing
+ decisions should come from typed planner/runtime contracts such as
+ `CreativeTurnPlannerFields`, `classifyMediaTurnIntent()`, `videoContinuation`,
+ `videoModification`, `outputGrouping`, `imageSelectionPolicy`, and
+ `pendingStitchAfterBatch`; regex is appropriate only for bounded CLI/fact
+ extraction such as paths, URLs, extensions, dimensions, durations, and explicit
+ positions.
+
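Reviewer note (not part of the package diff): for callers that script the durable workflow path, the cost-control flags described in the added prose can be assembled programmatically. The Python sketch below only strings together flags that appear in this diff; the helper name `build_workflow_args` and its parameters are hypothetical, and the flag ordering is an assumption rather than a documented requirement.

```python
from typing import List, Optional


def build_workflow_args(
    prompt: str,
    video_prompt: str,
    ref: Optional[str] = None,
    max_cost: Optional[int] = None,
    confirm_cost: Optional[bool] = None,
) -> List[str]:
    """Assemble argv for a durable image-to-video workflow start.

    Hypothetical helper: it uses only flags shown in the diff above and
    leaves out any flag whose value is None so server defaults apply.
    """
    args = ["node", "sogni-agent.mjs", "--api-workflow", "image-to-video"]
    if ref is not None:
        args += ["--ref", ref]
    if max_cost is not None:
        args += ["--workflow-max-cost", str(max_cost)]
    if confirm_cost is True:
        args.append("--confirm-cost")
    elif confirm_cost is False:
        args.append("--no-confirm-cost")
    # The motion prompt is a flag; the image prompt stays positional and last.
    args += ["--video-prompt", video_prompt, prompt]
    return args
```

The resulting list could be handed to `subprocess.run(args, check=True)` with `SOGNI_API_KEY` set in the environment, which avoids shell quoting issues for prompts that contain quotes.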
  ## Options
 
  | Flag | Description | Default |
@@ -185,7 +268,7 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
  | `--angle-description <text>` | Optional subject description | - |
  | `--steps <num>` | Override steps (model-dependent) | - |
  | `--guidance <num>` | Override guidance (model-dependent) | - |
- | `--output-format <f>` | Image output format: png\|jpg | png |
+ | `--output-format <f>` | Image output format: png\|jpg, or webp for gpt-image-2 | png |
  | `--sampler <name>` | Sampler (model-dependent) | - |
  | `--scheduler <name>` | Scheduler (model-dependent) | - |
  | `--lora <id>` | LoRA id (repeatable, edit only) | - |
@@ -196,10 +279,22 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
  | `--balance, --balances` | Show SPARK/SOGNI balances and exit | - |
  | `-c, --context <path>` | Context image for editing | - |
  | `--last-image` | Use last generated image as context/ref | - |
+ | `--music` | Generate music/audio instead of image | - |
+ | `--music-model <id>` | Music model: turbo\|sft\|ace_step_1.5_turbo\|ace_step_1.5_sft | ace_step_1.5_turbo |
+ | `--lyrics <text>` | Optional lyrics for song generation | - |
+ | `--language <code>` | Lyrics language code | en |
+ | `--bpm <num>` | Music tempo, 30-300 BPM | server default |
+ | `--keyscale <text>` | Music key/scale, e.g. C major | - |
+ | `--timesig <n>` | Time signature: 2\|3\|4\|6 | server default |
+ | `--composer-mode`, `--no-composer-mode` | Toggle AI composer mode | server default |
+ | `--prompt-strength <n>` | Music prompt adherence, 0-10 | server default |
+ | `--creativity <n>` | Music variation/temperature, 0-2 | server default |
+ | `--music-shift <n>` | Audio model shift parameter, 1-6 | 3 |
+ | `--audio-format <f>` | Alias for music output format: mp3\|flac\|wav | mp3 |
  | `--video, -v` | Generate video instead of image | - |
  | `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace) | inferred |
  | `--fps <num>` | Frames per second (video) | model default |
- | `--duration <sec>` | Duration in seconds (video) | 5 |
+ | `--duration <sec>` | Duration in seconds (video or music) | video 5, music 30 |
  | `--frames <num>` | Override total frames (video) | - |
  | `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
  | `--auto-resize-assets` | Auto-resize video assets | true |
@@ -232,6 +327,28 @@ node sogni-agent.mjs -q -o /tmp/cat.png "a cat wearing a hat"
  | `--concat-audio <path>` | Optional audio track to mux over `--concat-videos` output | - |
  | `--concat-audio-start <sec>` | Start offset into `--concat-audio` | - |
  | `--list-media [type]` | List recent inbound media (images\|audio\|all) | images |
+ | `--api-chat` | Call `/v1/chat/completions` with Sogni creative-agent tool injection | - |
+ | `--api-tools <mode>` | API tool mode: creative-agent\|creative-tools\|none | creative-agent |
+ | `--no-api-tool-execution` | Plan/tool-call via API chat without executing Sogni tools | - |
+ | `--llm-model <id>` | LLM model for `--api-chat` | qwen3.6-35b-a3b-gguf-iq4xs |
+ | `--task-profile <profile>` | Sogni Intelligence task profile: general\|coding\|reasoning | - |
+ | `--max-tokens <n>` | Max hosted chat completion tokens | 1600 |
+ | `--thinking`, `--no-thinking` | Toggle `chat_template_kwargs.enable_thinking` for hosted chat | server default |
+ | `--list-api-models`, `--get-api-model <id>` | Inspect Sogni Intelligence LLM model metadata | - |
+ | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|path\|@path>` | Manage Sogni Intelligence replay RunRecords | - |
+ | `--api-workflow <kind>` | Start durable workflow: image-to-video\|hosted-tool-sequence\|creative-plan\|storyboard-video | - |
+ | `--workflow-input <json\|path\|@path>` | Workflow input JSON for hosted tool sequences/custom starts | - |
+ | `--workflow-title <text>` | Title for hosted-tool-sequence, creative-plan, or storyboard-video workflow input | - |
+ | `--workflow-max-cost <n>` | Reject hosted workflow starts above this estimated capacity-unit ceiling | - |
+ | `--confirm-cost`, `--no-confirm-cost` | Forward explicit hosted workflow cost confirmation | - |
+ | `--storyboard-frames <n>` | Beat count for storyboard-video workflow | - |
+ | `--video-prompt <text>` | Motion prompt for durable image-to-video workflow | - |
+ | `--negative-prompt <text>` | Negative prompt for durable image-to-video workflow | - |
+ | `--generate-audio`, `--no-generate-audio` | Toggle audio generation for durable image-to-video | - |
+ | `--expand-prompt`, `--no-expand-prompt` | Toggle prompt expansion for durable image-to-video | - |
+ | `--watch-workflow` | Stream durable workflow events after start | - |
+ | `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Durable workflow management helpers | - |
+ | `--api-base-url <url>` | Sogni API base for hosted API modes. Credentials are only sent to `https://api.sogni.ai` by default; use `SOGNI_API_ALLOWED_HOSTS` for trusted custom hosts or `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1` for isolated local testing. | https://api.sogni.ai |
  | `--no-filter` | Disable NSFW content filter | - |
  | `--memory-set <key> <value>` | Save a user preference | - |
  | `--memory-get <key>` | Get a specific memory | - |
@@ -264,6 +381,7 @@ When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defau
  "defaultImageModel": "z_image_turbo_bf16",
  "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
  "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
+ "defaultMusicModel": "ace_step_1.5_turbo",
  "videoModels": {
  "t2v": "ltx23-22b-fp8_t2v_distilled",
  "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
@@ -277,6 +395,14 @@ When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defau
  "defaultVideoWorkflow": "t2v",
  "defaultNetwork": "fast",
  "defaultTokenType": "spark",
+ "apiBaseUrl": "https://api.sogni.ai",
+ "defaultLlmModel": "qwen3.6-35b-a3b-gguf-iq4xs",
+ "defaultTaskProfile": "general",
+ "defaultApiMaxTokens": 1600,
+ "defaultApiThinking": false,
+ "defaultApiToolMode": "creative-agent",
+ "defaultWorkflowMaxCost": 25,
+ "defaultWorkflowConfirmCost": false,
  "seedStrategy": "prompt-hash",
  "modelDefaults": {
  "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
@@ -289,6 +415,8 @@ When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defau
  "defaultDurationSec": 5,
  "defaultImageTimeoutSec": 30,
  "defaultVideoTimeoutSec": 300,
+ "defaultMusicDurationSec": 30,
+ "defaultMusicTimeoutSec": 600,
  "credentialsPath": "~/.config/sogni/credentials",
  "lastRenderPath": "~/.config/sogni/last-render.json",
  "mediaInboundDir": "~/.clawdbot/media/inbound"
@@ -308,6 +436,7 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
  | Model | Speed | Use Case |
  |-------|-------|----------|
  | `z_image_turbo_bf16` | Fast (~5-10s) | General purpose, default |
+ | `gpt-image-2` | Variable | OpenAI GPT Image 2 text-to-image and edit, strong prompt and text rendering |
  | `flux1-schnell-fp8` | Very fast | Quick iterations |
  | `flux2_dev_fp8` | Slow (~2min) | High quality |
  | `chroma-v.46-flash_fp8` | Medium | Balanced |
@@ -315,9 +444,23 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
  | `qwen_image_edit_2511_fp8_lightning` | Fast | Quick image editing |
  | `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |
 
+ `gpt-image-2` supports flexible OpenAI image sizes up to `3840px` on either edge, max `3:1` aspect ratio, and total pixels from `655,360` through `8,294,400`; the API snaps dimensions to valid multiples of 16.
+
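Reviewer note (not part of the package diff): the `gpt-image-2` size limits stated above can be checked before a request is sent. The Python sketch below encodes those four limits; the function names are hypothetical, and rounding to the nearest multiple of 16 is an assumption, since the diff says only that the API snaps dimensions and does not say in which direction.

```python
from typing import List, Tuple

MAX_EDGE = 3840          # longest allowed edge, in pixels
MAX_ASPECT = 3.0         # long side : short side
MIN_PIXELS = 655_360
MAX_PIXELS = 8_294_400
STEP = 16                # dimensions are snapped to multiples of this


def snap(value: int, step: int = STEP) -> int:
    """Round to the nearest multiple of `step` (assumed rounding mode)."""
    return max(step, (value + step // 2) // step * step)


def check_gpt_image_2_size(width: int, height: int) -> Tuple[int, int, List[str]]:
    """Return the snapped size and a list of violated limits (empty if none)."""
    w, h = snap(width), snap(height)
    problems: List[str] = []
    if max(w, h) > MAX_EDGE:
        problems.append(f"edge exceeds {MAX_EDGE}px")
    if max(w, h) / min(w, h) > MAX_ASPECT:
        problems.append("aspect ratio exceeds 3:1")
    if not MIN_PIXELS <= w * h <= MAX_PIXELS:
        problems.append("total pixels outside 655,360..8,294,400")
    return w, h, problems
```

For instance, `check_gpt_image_2_size(3840, 2160)` sits exactly on the upper pixel bound and reports no problems, while a 512x512 request falls below the minimum pixel count.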
+ ## Music Models
+
+ | Model | Use Case |
+ |-------|----------|
+ | `ace_step_1.5_turbo` | Default direct music generation model |
+ | `ace_step_1.5_sft` | Experimental option with stronger lyric handling |
+
+ Use `--music` for direct audio-only generation. Defaults are 30 seconds, `mp3`,
+ `ace_step_1.5_turbo`, 8 steps, `euler` sampler, and `simple` scheduler. Keep
+ `--audio` for video reference audio (`--ref-audio` alias); do not use it for
+ direct music generation.
+
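Reviewer note (not part of the package diff): the numeric ranges that the Options table gives for the music flags (`--bpm`, `--timesig`, `--prompt-strength`, `--creativity`, `--music-shift`) lend themselves to a local pre-flight check. The Python sketch below is illustrative only; the function name and dictionary keys are hypothetical, and treating the range endpoints as inclusive is an assumption.

```python
from typing import Dict, List

# Ranges copied from the Options table; inclusive bounds are assumed.
RANGES = {
    "bpm": (30, 300),
    "prompt_strength": (0, 10),
    "creativity": (0, 2),
    "music_shift": (1, 6),
}
TIME_SIGNATURES = {2, 3, 4, 6}


def validate_music_options(options: Dict[str, float]) -> List[str]:
    """Return human-readable errors for out-of-range music options.

    Keys that are absent are skipped so server defaults can apply.
    """
    errors: List[str] = []
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}, got {value}")
    timesig = options.get("timesig")
    if timesig is not None and timesig not in TIME_SIGNATURES:
        errors.append(f"timesig must be one of 2, 3, 4, 6, got {timesig}")
    return errors
```

An empty list means every supplied option is within the documented range; the CLI and server remain the authority on what is actually accepted.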
  ## Video Models
 
- ### WAN 2.2 Models
+ ### Current Video Model Selectors
 
  | Model | Speed | Use Case |
  |-------|-------|----------|
@@ -326,10 +469,10 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
  | `ltx23-22b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
  | `ltx23-22b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
  | `ltx23-22b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
- | `seedance2` | Variable | Seedance 2.0 text-to-video alias, 4-15s, native audio |
- | `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video alias |
- | `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video alias |
- | `seedance2-v2v` | Variable | Seedance 2.0 video-to-video alias, no ControlNet |
+ | `seedance2` | Variable | Seedance 2.0 text-to-video, 4-15s, native audio |
+ | `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video |
+ | `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video |
+ | `seedance2-v2v` | Variable | Seedance 2.0 video-to-video, no ControlNet |
  | `wan_v2.2-14b-fp8_i2v_lightx2v` | Fast | Simple image-to-video |
  | `wan_v2.2-14b-fp8_i2v` | Slow | Higher quality video |
  | `wan_v2.2-14b-fp8_t2v_lightx2v` | Fast | Text-to-video |
@@ -352,7 +495,7 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
 
  ## Image Editing with Context
 
- Edit images using reference images (Qwen models support up to 3):
+ Edit images using reference images. Qwen models support up to 3 context images; GPT Image 2 edit supports up to 16 when selected with `-m gpt-image-2`:
 
  ```bash
  # Single context image
@@ -361,11 +504,14 @@ node sogni-agent.mjs -c photo.jpg "make the background a beach"
  # Multiple context images (subject + style)
  node sogni-agent.mjs -c subject.jpg -c style.jpg "apply the style to the subject"
 
+ # GPT Image 2 multi-reference edit
+ node sogni-agent.mjs -m gpt-image-2 -c subject.jpg -c outfit.jpg -c room.jpg "place the subject in the room wearing the outfit"
+
  # Use last generated image as context
  node sogni-agent.mjs --last-image "make it more vibrant"
  ```
 
- When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`.
+ When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`. Select `-m gpt-image-2` for GPT Image 2's higher reference-image limit and OpenAI-backed image editing.
 
  ## Photobooth (Face Transfer)
 
@@ -477,7 +623,11 @@ node sogni-agent.mjs --video --ref scene.png --duration 10 --fps 24 "zoom out sl
  node sogni-agent.mjs --video --target-resolution 768 \
  "A calm cinematic shot of lanterns drifting across a night lake"
 
- # Seedance 2.0 explicit text-to-video alias
+ # Natural-language aspect and resolution inference
+ node sogni-agent.mjs --video \
+ "Make a 720p 9:16 video of ocean waves at sunset"
+
+ # Seedance 2.0 text-to-video
  node sogni-agent.mjs --video -m seedance2 --duration 8 \
  "A polished product reveal with native ambient sound"
 
@@ -504,6 +654,9 @@ node sogni-agent.mjs --video --ref-audio song.mp3 \
  node sogni-agent.mjs --video --reference-audio-identity voice.webm \
  "NARRATOR: \"This is my voice.\""
 
+ # Prefer .webm, .m4a, or .mp3 voice clips. Local .wav clips are normalized
+ # to .m4a before upload when ffmpeg is available.
+
  # LTX-2.3 text-to-video
  node sogni-agent.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
  "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
@@ -537,7 +690,7 @@ node sogni-agent.mjs --video --workflow v2v --ref-video scene.mp4 \
  ```
 
  ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
- Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context; audio references must be paired with an image or video reference.
+ Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context when they pass the CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
 
  ```bash
  # Seedance V2V without ControlNet
@@ -650,6 +803,9 @@ node {{skillDir}}/sogni-agent.mjs -q --video --ref /path/to/image.png -o /tmp/vi
  # Generate text-to-video
  node {{skillDir}}/sogni-agent.mjs -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
 
+ # Generate direct music/audio
+ node {{skillDir}}/sogni-agent.mjs -q --music --duration 30 -o /tmp/music.mp3 "uplifting cinematic synthwave theme for a product launch"
+
  # HD / "4K" text-to-video: prefer LTX-2.3
  node {{skillDir}}/sogni-agent.mjs -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
 
@@ -709,6 +865,7 @@ When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or *
  - For **image-to-video**, use `-m ltx23-22b-fp8_i2v_distilled`.
  - Prefer LTX-sized dimensions such as `-w 1920 -h 1088`.
  - For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
+ - When the prompt combines a named resolution with an aspect ratio, such as "720p 9:16", let the CLI infer both instead of forcing manual `-w`/`-h` unless the user gave exact pixels.
  - If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
  - Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
  - Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even though the current wrapper caps non-WAN video dimensions at 2048px on the long side.
@@ -773,6 +930,9 @@ On error (with `--json`), the script returns a single JSON object like:
  "success": false,
  "error": "Reference image 2314x1200 would resize to 512x266, but both dimensions must be divisible by 16.",
  "errorCode": "INVALID_VIDEO_SIZE",
+ "errorType": "PARAMETER_INVALID",
+ "errorCategory": "schema_validation",
+ "retryable": false,
  "hint": "Try: --width 1296 --height 672 (or omit --strict-size)"
  }
  ```
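Reviewer note (not part of the package diff): a caller that consumes the `--json` error object could branch on the `retryable` field added in 2.3.0. The Python sketch below shows one way to do that; the function name, the `UNKNOWN` fallback, and the "retry" / "fix input" wording are all hypothetical, and the diff does not specify how callers are expected to treat `retryable`.

```python
import json
from typing import Optional


def describe_failure(raw: str) -> Optional[str]:
    """Summarize a failed run from the CLI's JSON output.

    Returns None unless the payload explicitly reports success == false.
    """
    payload = json.loads(raw)
    if payload.get("success") is not False:
        return None
    # Assumed policy: retry only when the CLI marks the error retryable.
    action = "retry" if payload.get("retryable") else "fix input"
    code = payload.get("errorCode", "UNKNOWN")
    summary = f"{code} ({action}): {payload.get('error', '')}"
    hint = payload.get("hint")
    return f"{summary} | {hint}" if hint else summary
```

With the sample object above, the summary would start with `INVALID_VIDEO_SIZE (fix input)` and end with the `hint` text, which tells an agent to adjust `--width`/`--height` rather than resubmit unchanged.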
@@ -944,7 +1104,7 @@ node {{skillDir}}/sogni-agent.mjs --angles-360 -c subject.jpg "same subject"
 
  ## Troubleshooting
 
- - **Auth errors**: Check `SOGNI_API_KEY` or the credentials in `~/.config/sogni/credentials`
+ - **Auth errors**: Check `SOGNI_API_KEY` or the API key in `~/.config/sogni/credentials`
  - **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and the current wrapper caps non-WAN video dimensions at 2048px on the long side. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
  - **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
  - **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
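
Reviewer note (not part of the package diff): the sizing rules in the troubleshooting list can be expressed as a small check. The Python sketch below is an illustration under stated assumptions: the function name and the `"wan"` / `"ltx"` family labels are hypothetical, and the WAN 480-1536px bounds are assumed to apply to each edge, which the text does not say explicitly.

```python
from typing import List


def check_video_size(family: str, width: int, height: int) -> List[str]:
    """Return the sizing rules a requested video size would violate."""
    problems: List[str] = []
    if family == "wan":
        divisor = 16
        for name, edge in (("width", width), ("height", height)):
            # Assumption: the min/max bounds apply per edge.
            if not 480 <= edge <= 1536:
                problems.append(f"{name} {edge} outside 480..1536")
    elif family == "ltx":
        divisor = 64
        if max(width, height) > 2048:
            problems.append("long side exceeds the 2048px wrapper cap")
    else:
        raise ValueError(f"unknown model family: {family}")
    for name, edge in (("width", width), ("height", height)):
        if edge % divisor:
            problems.append(f"{name} {edge} not divisible by {divisor}")
    return problems
```

Under these assumptions, the `1920x1088` size recommended for LTX passes, while `1920x1080` fails because 1080 is not divisible by 64.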