@sogni-ai/sogni-creative-agent-skill 3.4.0 → 3.5.1

package/SKILL.md CHANGED
@@ -2,75 +2,57 @@
2
2
  name: sogni-creative-agent-skill
3
3
  description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories, custom personality, style transfer, angle synthesis, Seedance/LTX/WAN video, music/lyrics, hosted chat, durable workflows, replay records, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
4
4
  metadata:
5
- version: "3.4.0"
5
+ version: "3.5.1"
6
6
  homepage: https://sogni.ai
7
- clawdbot:
7
+ openclaw:
8
8
  emoji: "🎨"
9
9
  primaryEnv: "SOGNI_API_KEY"
10
10
  os: ["darwin", "linux", "win32"]
11
+ # Only hard requirements belong here: OpenClaw marks the skill "missing"
12
+ # until every entry is satisfied. The API key comes from the credentials
13
+ # file (primaryEnv is the env-var alternative), and the SOGNI_*/OPENCLAW_*
14
+ # override variables are optional — they are documented in the body under
15
+ # "Filesystem Paths and Overrides", not required for the skill to work.
11
16
  requires:
12
17
  bins: ["node"]
13
18
  anyBins: ["ffmpeg"]
14
- env:
15
- - "SOGNI_API_KEY"
16
- - "SOGNI_CREDENTIALS_PATH"
17
- - "SOGNI_LAST_RENDER_PATH"
18
- - "SOGNI_MEDIA_INBOUND_DIR"
19
- - "OPENCLAW_CONFIG_PATH"
20
- - "OPENCLAW_PLUGIN_CONFIG"
21
- - "FFMPEG_PATH"
22
- config:
23
- - "~/.config/sogni/credentials"
24
- - "~/.openclaw/openclaw.json"
25
- - "~/.clawdbot/media/inbound"
26
- - "~/.config/sogni/last-render.json"
27
- - "~/Downloads/sogni"
28
19
  install:
29
20
  - id: npm
30
21
  kind: exec
31
- command: "cd {{skillDir}} && cp skill-package.json package.json && npm i"
22
+ command: "cd {{skillDir}} && ([ -f package.json ] || cp skill-package.json package.json) && npm i"
32
23
  label: "Prepare runtime dependencies"
33
24
  ---
34
25
 
35
26
  # Sogni Image, Video & Music Generation
36
27
 
37
- Generate **images, videos, and music** using Sogni AI's decentralized GPU network.
28
+ Generate **images, videos, and music** using Sogni AI's decentralized GPU network through the `sogni-agent` CLI.
38
29
 
39
- > **Per-skill view**: hosts that want to load focused capabilities rather than this monolith can read [`skills/README.md`](./skills/README.md) for the per-skill index — one markdown file per skill (`image_generation`, `image_editing`, `video_generation`, `video_editing`, `music_generation`, `media_analysis`, `persona_management`, `app_settings`, `composition_planning`, plus the always-loaded `quality_audit`, `session_control`, `asset_reference_management`). Each file mirrors the canonical manifest in `@sogni/creative-agent`. The whole-monolith load below stays the default for OpenClaw / Claude Code / Hermes Agent / Manus AI integrations.
30
+ > **Deep-dive references:** this file holds the rules you must always follow plus the everyday commands. Detailed guides live in [`references/`](./references/) โ€” read the matching file *before* acting on those tasks (table at the end of this file). If the `references/` directory is not present in your install, run `sogni-agent --help` for the full flag reference or fetch the guides from `https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/references/`.
31
+ >
32
+ > **Per-skill view:** hosts that load focused capabilities rather than one artifact can read [`skills/README.md`](./skills/README.md) for the per-skill index of the hosted tool surface.
40
33
 
41
34
  ## Install Request Policy
42
35
 
43
- When a user asks to install this plugin, skill, or Sogni Creative Agent Skill, install it as the command-line tool plus this skill.
44
-
45
- Default install path:
36
+ When a user asks to install this plugin or skill, install the command-line tool plus this skill:
46
37
 
47
38
  ```bash
48
39
  npm install -g @sogni-ai/sogni-creative-agent-skill@latest
49
40
  sogni-agent --version
50
41
  ```
51
42
 
52
- Then configure the agent/runtime to use this `SKILL.md` and invoke the `sogni-agent` CLI.
53
-
54
- Always invoke the globally installed `sogni-agent` command. Do not call `node {{skillDir}}/sogni-agent.mjs` or `node sogni-agent.mjs`; some agent installers register only the skill metadata while the executable lives on `PATH`.
55
-
56
- For upgrades, prefer package-manager updates or direct operations on an existing checkout. Do not generate clone-or-pull shell bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs; agent command scanners may require approval for those patterns.
43
+ Then configure the agent/runtime to use this `SKILL.md` and invoke the `sogni-agent` CLI. The one-command alternative `npx setup-sogni-agent-skill` auto-detects Claude Code, Codex CLI, and Hermes (it does not configure OpenClaw).
57
44
 
58
- Agent-safe CLI upgrade:
45
+ After any install or upgrade, verify with:
59
46
 
60
47
  ```bash
61
- npm install -g @sogni-ai/sogni-creative-agent-skill@latest
62
- sogni-agent --version
48
+ sogni-agent doctor
63
49
  ```
64
50
 
65
- Agent-safe update for an existing local checkout:
51
+ Agents should run `sogni-agent doctor --json` and confirm `"success": true` before reporting the install as working.
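
The verification step above can be sketched as a small shell check. This is a minimal sketch, not the package's own tooling: `doctor_report_ok` and `verify_install` are hypothetical helper names, and the only detail taken from the text is that the JSON report should contain `"success": true`. A plain text match is assumed to be sufficient; a stricter check would parse the JSON.

```bash
# Returns 0 when the JSON report on stdin contains "success": true.
# Assumption: a text match is enough; the full report shape is not known here.
doctor_report_ok() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical wrapper: run the doctor command and report the verdict.
verify_install() {
  if sogni-agent doctor --json | doctor_report_ok; then
    echo "install verified"
  else
    echo "install check failed" >&2
    return 1
  fi
}
```

Calling `verify_install` after an install or upgrade gives the agent a single exit status to branch on before it reports success.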
66
52
 
67
- ```bash
68
- DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
69
- git -C "$DEST" pull --ff-only
70
- npm --prefix "$DEST" install
71
- ```
53
+ Always invoke the globally installed `sogni-agent` command. Do not call `node {{skillDir}}/sogni-agent.mjs` or `node sogni-agent.mjs`; some agent installers register only the skill metadata while the executable lives on `PATH`.
72
54
 
73
- If that checkout does not exist, prefer the npm-based local skill install below, or ask before cloning.
55
+ For upgrades, prefer `sogni-agent self-update`, package-manager updates, or direct operations on an existing checkout (`git -C "$DEST" pull --ff-only && npm --prefix "$DEST" install`). Do not generate clone-or-pull shell bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs; agent command scanners may require approval for those patterns. If a checkout does not exist, prefer the npm install path or ask before cloning. When an update notice appears, offer the user the upgrade (`sogni-agent self-update`); if they decline, run `sogni-agent --snooze-update` so they are not re-nagged daily, and `sogni-agent --whats-new` after upgrading to summarize changes.
74
56
 
75
57
  ## Uninstall Request Policy
76
58
 
@@ -79,7 +61,8 @@ When a user asks to uninstall, run `npx setup-sogni-agent-skill --uninstall --re
79
61
  ## Setup
80
62
 
81
63
  1. **Get your Sogni API key** by logging into https://dashboard.sogni.ai and opening the account menu.
82
- 2. **Create an API key credentials file:**
64
+ 2. **Create the credentials file** (or just export `SOGNI_API_KEY`):
65
+
83
66
  ```bash
84
67
  mkdir -p ~/.config/sogni
85
68
  cat > ~/.config/sogni/credentials << 'EOF'
@@ -88,31 +71,9 @@ EOF
88
71
  chmod 600 ~/.config/sogni/credentials
89
72
  ```
90
73
 
91
- You can also export `SOGNI_API_KEY` instead of writing the file. The API key can always be found by logging into https://dashboard.sogni.ai and opening the account menu.
92
-
93
- 3. **Install the CLI and skill by default:**
94
- ```bash
95
- npm install -g @sogni-ai/sogni-creative-agent-skill@latest
96
- sogni-agent --version
97
- ```
98
-
99
- Configure the agent/runtime to use this `SKILL.md`.
100
-
101
- 4. **Install dependencies if working from a clone:**
102
- ```bash
103
- cd /path/to/sogni-creative-agent-skill
104
- npm i
105
- ```
106
-
107
- 5. **Or install from npm into a local skill directory (no git clone):**
108
- ```bash
109
- mkdir -p ~/.clawdbot/skills
110
- cd ~/.clawdbot/skills
111
- npm i @sogni-ai/sogni-creative-agent-skill
112
- ln -sfn node_modules/@sogni-ai/sogni-creative-agent-skill sogni-creative-agent-skill
113
- ```
74
+ 3. **Verify:** `sogni-agent doctor`
114
75
 
115
- When this skill is distributed via ClawHub, it bootstraps its local runtime dependencies from `skill-package.json` during install. That avoids relying on a root `package.json` being present in the published skill artifact.
76
+ When this skill is distributed via ClawHub, it bootstraps its runtime dependencies from `skill-package.json` during install (the install hook skips the copy when a real `package.json` is already present, so it never clobbers a git checkout).
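
The copy guard described above can be illustrated in isolation. This is a sketch of the same `[ -f package.json ] || cp ...` idiom used by the install command, wrapped in a hypothetical `ensure_package_json` function so it can be tried against a scratch directory; it does not run `npm i`.

```bash
# Copy skill-package.json to package.json only when package.json is missing.
# The function body is a subshell, so the caller's working directory is untouched.
# ensure_package_json is a hypothetical name used for illustration.
ensure_package_json() (
  cd "$1" || exit 1
  [ -f package.json ] || cp skill-package.json package.json
)
```

With this guard an existing `package.json` (for example in a git checkout) is left as-is, while a published artifact that ships only `skill-package.json` still gets a usable `package.json`.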
116
77
 
117
78
  ## Output Path Convention
118
79
 
@@ -124,1029 +85,167 @@ sogni-agent -o cat.png "a cat wearing a hat" # ✓ lands in PWD
124
85
  sogni-agent -o /tmp/cat.png "a cat wearing a hat" # ✗ avoid — user can't easily find it
125
86
  ```
126
87
 
127
- `/tmp` (and `mkdtempSync(...)`) is reserved internally for transient intermediate files the CLI cleans up itself (audio re-encodes, intermediate clips during stitching). Final renders the user is asking for must remain inside their working directory unless they explicitly request a different location.
88
+ `/tmp` is reserved for transient intermediate files the CLI cleans up itself. Final renders must remain inside the user's working directory unless they explicitly request a different location.
128
89
 
129
90
  ## Filesystem Paths and Overrides
130
91
 
131
- Default file paths used by this skill:
132
-
133
- - API key credentials file (read): `~/.config/sogni/credentials`
134
- - Last render metadata (read/write): `~/.config/sogni/last-render.json`
135
- - OpenClaw config (read): `~/.openclaw/openclaw.json`
136
- - Media listing for `--list-media` (read): `~/.clawdbot/media/inbound`
92
+ - API key credentials file (read): `~/.config/sogni/credentials` (`SOGNI_CREDENTIALS_PATH`)
93
+ - Last render metadata (read/write): `~/.config/sogni/last-render.json` (`SOGNI_LAST_RENDER_PATH`)
94
+ - Memories / personality / personas (read/write): `~/.config/sogni/`
95
+ - OpenClaw config (read): `~/.openclaw/openclaw.json` (`OPENCLAW_CONFIG_PATH`)
96
+ - Media listing for `--list-media` (read): `~/.openclaw/media/inbound`, falling back to the legacy `~/.clawdbot/media/inbound` when only it exists (`SOGNI_MEDIA_INBOUND_DIR`)
97
+ - Custom ffmpeg binary: `FFMPEG_PATH`
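
The inbound-media lookup in the list above can be sketched as follows. This is an illustration under stated assumptions, not the CLI's actual code: `resolve_inbound_dir` is a hypothetical name, and it assumes `SOGNI_MEDIA_INBOUND_DIR` takes precedence whenever it is set. The legacy directory is chosen only when it exists and the new one does not.

```bash
# Print the directory that --list-media would read from, under the
# assumptions stated above (override first, then new path, then legacy).
resolve_inbound_dir() {
  if [ -n "${SOGNI_MEDIA_INBOUND_DIR:-}" ]; then
    printf '%s\n' "$SOGNI_MEDIA_INBOUND_DIR"
  elif [ ! -d "$HOME/.openclaw/media/inbound" ] && [ -d "$HOME/.clawdbot/media/inbound" ]; then
    # Legacy fallback: only the old clawdbot directory exists.
    printf '%s\n' "$HOME/.clawdbot/media/inbound"
  else
    printf '%s\n' "$HOME/.openclaw/media/inbound"
  fi
}
```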
137
98
 
138
- Path override environment variables:
99
+ ## Recommended path: hosted Sogni Intelligence endpoints
139
100
 
140
- - `SOGNI_CREDENTIALS_PATH`
141
- - `SOGNI_LAST_RENDER_PATH`
142
- - `SOGNI_MEDIA_INBOUND_DIR`
143
- - `OPENCLAW_CONFIG_PATH`
144
-
145
- ## Recommended path: route through the hosted Sogni Intelligence endpoints
146
-
147
- For any natural-language creative request — anything that should be planned, multi-step, resumable, or that benefits from tool selection, repair, or durable workflows — prefer the hosted Sogni Intelligence endpoints over the direct-to-SDK media flags. The hosted surfaces are the canonical home for OpenAI-compatible chat, server-side creative tool dispatch, Structured Contracts v1 (gating policies, repair recipes, prompt contracts), durable chat runs, durable workflows, workflow templates, replay, and asset-manifest mapping. They stay aligned with `sogni-chat`, `sogni-api`, and the rest of the `@sogni/creative-agent` consumers.
101
+ For any natural-language creative request that should be planned, multi-step, resumable, or benefit from server-side tool selection and repair, prefer the hosted endpoints over direct-to-SDK flags — **read [`references/hosted-api.md`](./references/hosted-api.md) first** for the full contract (tool surfaces, durable workflows, templates, replays, Seedance reference modes, media-reference uploads, cost controls):
148
102
 
149
103
  ```bash
150
104
  # Natural-language creative request (LLM picks the tool, dispatches, repairs)
151
105
  sogni-agent --api-chat "Turn the attached product photo into a launch poster" --ref product.jpg
152
106
 
153
107
  # Durable hosted chat run (persisted event log + SSE stream)
154
- SOGNI_SKILL_USE_SDK_TRANSPORT=1 sogni-agent --durable-chat \
155
- "Create a four-shot launch campaign, generate the key art, and animate the hero clip"
156
-
157
- # Multi-step durable workflow (resumable, replay-friendly, server-orchestrated)
158
- sogni-agent --api-workflow \
159
- --video-prompt "The camera slowly pushes in" \
160
- "A graphite robot sketch on a drafting table"
161
-
162
- # Storyboard → keyframe → Seedance, all server-side
163
- sogni-agent --api-workflow storyboard-video --storyboard-frames 6 -Q hq \
164
- "Create a 9:16 bakery launch video with a neon street-window reveal"
165
- ```
166
-
167
- The direct-to-SDK flags below remain available for explicit one-shot generation when you already know the exact model, dimensions, and prompt and don't need LLM planning. Use them when latency or cost rules out the LLM round-trip.
168
-
169
- ## Usage (direct-to-SDK image, video & music)
170
-
171
- ```bash
172
- # Generate and get URL
173
- sogni-agent "a cat wearing a hat"
174
-
175
- # Quality presets (recommended for direct mode — auto-selects model, steps, and size)
176
- sogni-agent -Q fast "a cat wearing a hat" # z_image_turbo, 8 steps, 512x512 (~5-10s)
177
- sogni-agent -Q hq "a cat wearing a hat" # z_image_turbo, default steps, 768x768 (~10-15s)
178
- sogni-agent -Q pro "a cat wearing a hat" # flux2_dev, 40 steps, 1024x1024 (~2min)
179
-
180
- # Dynamic prompt variations — diverse images in one call
181
- sogni-agent -n 3 "a {red|blue|green} sports car"
182
- # → generates "a red sports car", "a blue sports car", "a green sports car"
183
-
184
- # Prompt-only video takes from the same source image
185
- sogni-agent --video --ref hero.png -n 3 --duration 5 \
186
- "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
187
-
188
- # Token auto-fallback for native Sogni models (tries SPARK, falls back to SOGNI)
189
- sogni-agent --token-type auto "a cat wearing a hat"
190
-
191
- # Save to file (relative paths land in the current working directory)
192
- sogni-agent -o ./cat.png "a cat wearing a hat"
193
-
194
- # JSON output (for scripting)
195
- sogni-agent --json "a cat wearing a hat"
196
-
197
- # Check token balances (no prompt required)
198
- sogni-agent --balance
199
-
200
- # Check token balances in JSON
201
- sogni-agent --json --balance
202
-
203
- # Quiet mode (suppress progress)
204
- sogni-agent -q -o ./cat.png "a cat wearing a hat"
205
-
206
- # Direct music/audio generation
207
- sogni-agent --music --duration 30 \
208
- "uplifting cinematic synthwave theme for a product launch"
209
-
210
- # Song with lyrics and musical controls
211
- sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
212
- --keyscale "C major" --output-format mp3 "bright indie pop chorus"
213
-
214
- # Hosted API chat: natural-language creative-agent tool execution
215
- sogni-agent --api-chat "Create a 4-shot product video concept for a red sneaker"
216
-
217
- # Hosted API chat with image vision and media-reference metadata
218
- sogni-agent --api-chat --ref product.jpg \
219
- "Turn this into a launch poster and describe the edit plan"
220
-
221
- # Sogni Intelligence model/replay utilities
222
- sogni-agent --list-api-models
223
- sogni-agent --api-chat --task-profile reasoning --max-tokens 2000 \
224
- "Plan a concise multi-step product launch workflow"
225
- sogni-agent --list-replays 20
226
- sogni-agent --get-replay run_abc123 --json
227
-
228
- # Draft a savable workflow template through the hosted creative-agent tool loop
229
- sogni-agent --api-chat \
230
- "Design a reusable workflow for a 9:16 product teaser from one product photo"
231
-
232
- # Durable API workflow: generated keyframe to video with resumable workflow record
233
- sogni-agent --api-workflow \
234
- --video-prompt "The camera slowly pushes in as the sketch comes alive" \
235
- "A graphite robot sketch on a drafting table"
236
-
237
- # Durable API workflow with media reference and cost controls
238
- sogni-agent --api-workflow \
239
- --ref https://cdn.example.com/sketch.png \
240
- --workflow-max-cost 25 --confirm-cost \
241
- --video-prompt "The camera slowly pushes in as the sketch comes alive" \
242
- "Animate the referenced sketch"
243
-
244
- # Exact durable workflow input with explicit steps
245
- sogni-agent --api-workflow --workflow-input @workflow-input.json \
246
- --workflow-idempotency-key product-teaser-v1
247
-
248
- # Durable storyboard-video workflow: storyline -> GPT Image 2 storyboard -> Seedance
249
- sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
250
- "Create a 9:16 bakery launch video with a neon street-window reveal"
251
-
252
- # Workflow management
253
- sogni-agent --list-workflows
254
- sogni-agent --resume-workflow wf_durable_workflow_123
255
- ```
256
-
257
- Use `--api-chat` for text-first natural-language workflows that should go through
258
- Sogni API's OpenAI-compatible `POST /v1/chat/completions` loop. The public
259
- REST body uses snake_case controls such as `tool_choice`, `response_format`,
260
- `task_profile`, `token_type`, `app_source`, `media_references`,
261
- `chat_template_kwargs`, `sogni_tools`, and `sogni_tool_execution`. The endpoint
262
- normalizes OpenAI `developer` messages to `system`; when a developer message is
263
- present and no explicit `task_profile` is supplied, the server treats the task
264
- as `coding`. The CLI sanitizes prompt-injection markers before forwarding
265
- messages and sends API-key auth so hosted Sogni tools can execute server-side.
266
-
267
- Hosted tool surfaces are split by `sogni_tools`:
268
-
269
- - `creative-tools` is the public API default when `sogni_tools` is omitted or
270
- true. It exposes generation/editing tools (`generate_image`,
271
- `generate_video`, `generate_music`, `edit_image`, `apply_style`,
272
- `restore_photo`, `refine_result`, `animate_photo`, `change_angle`,
273
- `video_to_video`, `stitch_video`, `orbit_video`, `dance_montage`,
274
- `sound_to_video`, `extend_video`, `replace_video_segment`, `overlay_video`,
275
- `add_subtitles`), media-analysis tools (`analyze_image`, `analyze_video`,
276
- `extract_metadata`), and lightweight composition tools (`enhance_prompt`,
277
- `compose_lyrics`, `compose_instrumental`, `compose_script`).
278
- - `creative-agent` is this CLI's default for `--api-chat`. It includes the
279
- `creative-tools` surface plus session-control tools
280
- (`ask_clarifying_question`, `finalize_response`), asset-manifest tools
281
- (`create_asset_manifest`, `inspect_asset`, `label_asset`,
282
- `map_assets_for_model`, `validate_asset_references`), and durable planning
283
- tools (`compose_workflow`, `compose_workflow_template`). Use this surface
284
- when the model should design one-shot workflow plans, draft savable workflow
285
- templates, or maintain stable asset references across a multi-step turn.
286
- - `none` disables Sogni tool injection and leaves only caller-supplied OpenAI
287
- tools on raw API/SDK requests. In the CLI, use it with
288
- `--no-api-tool-execution` when you want text-only planning without hosted
289
- Sogni tool dispatch.
290
-
291
- Use `--durable-chat` for long-running, LLM-in-the-loop turns that should be
292
- persisted as `POST /v1/chat/runs` records instead of a single
293
- `/v1/chat/completions` request. Chat runs keep an event log, stream via
294
- `/v1/chat/runs/:id/events/stream`, support cancellation, and can pause for
295
- persisted cost approval (`/v1/chat/runs/:id/confirm-cost`) in first-party
296
- clients. The CLI can start and stream durable chat runs through the SDK
297
- transport when `SOGNI_SKILL_USE_SDK_TRANSPORT=1` is set.
298
-
299
- Use `--api-workflow` when the caller already knows it wants an async durable
300
- workflow under `POST /v1/creative-agent/workflows`. The API now accepts either
301
- an inline durable plan (`input.steps`) or a saved workflow template invocation
302
- (`workflow_id` plus `inputs`) and rejects requests that provide both. The CLI's
303
- generated-keyframe and `storyboard-video` presets submit inline `input.steps`;
304
- `--workflow-input @workflow-input.json` supplies that `input` object directly.
305
- Saved template CRUD lives at `/v1/creative-agent/workflows/templates`, and a
306
- saved template can later be run by API/SDK callers with `workflow_id + inputs`.
307
- Use `compose_workflow_template` through `--api-chat` to draft a savable template;
308
- the caller is still responsible for persisting the returned `template_draft`.
309
-
310
- Exact multi-step workflow plans should use explicit step dependencies, including
311
- `replace_video_segment` steps with bounded `replacementStartSeconds` /
312
- `replacementEndSeconds` when interleaving existing video slices. Workflow JSON
313
- can bind request media into step arguments with `sourceStepId: "$input_media"`.
314
- Use `--api-workflow storyboard-video` when the hosted sequence should generate a
315
- storyline, create one GPT Image 2 storyboard sheet, and feed that image artifact
316
- into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT
317
- Image 2 low|medium|high quality for the storyboard sheet.
318
-
319
- Hosted API requests forward media references from `-c`, `--ref`, `--ref-end`,
320
- `--ref-audio`, `--reference-audio-identity`, and `--ref-video` as
321
- `media_references` metadata. `--ref-audio` and `--ref-video` are repeatable in
322
- api-chat / durable-chat mode — each entry uploads independently and is exposed
323
- to the hosted LLM at `@Audio1` / `@Audio2` / `@Video1` etc. API chat also
324
- attaches image refs as vision inputs. Local file references are uploaded to
325
- Sogni media storage first, then forwarded as retrievable URLs for hosted chat
326
- and durable workflows. Use the direct CLI path for private media that must not
327
- leave the local machine.
328
-
329
- ### Seedance reference modes (mutually exclusive)
330
-
331
- When `--video -m seedance2` or `-m seedance2-fast` is selected, the skill
332
- exposes the same two-mode pattern that the hosted chat surfaces. Pick one
333
- mode per video request:
334
-
335
- - **Dedicated frame mode — `--ref` and/or `--ref-end`.** First-class
336
- first-frame / last-frame anchoring; the Seedance worker pins them as
337
- parameter-mode firstFrame / lastFrame. Max 2 images.
338
- - **Loose reference mode — `-c/--context` plus optional `--ref-audio`
339
- extras and `--ref-video` extras.** Anchor frame intent in the prompt with
340
- `@Image1` / `@Image2` / `@Video1` / `@Audio1` etc. (e.g. *"Use @Image1 as
341
- the opening shot reference"*). Supports up to 9 image refs, 3 video refs,
342
- 3 audio refs, and 12 total reference assets per video request. The
343
- numeric caps come from the canonical
344
- `@sogni-ai/sogni-protocol/catalogs/seedance-reference-limits.json` catalog,
345
- surfaced through `@sogni-ai/sogni-intelligence-client/tools` as
346
- `SEEDANCE_REFERENCE_LIMITS` and `validateSeedanceReferenceCounts()`.
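
The loose-reference caps listed above can be checked before a request is sent. The sketch below hard-codes the numbers quoted in the text (9 image, 3 video, 3 audio, 12 total); `check_seedance_loose_refs` is a hypothetical helper, not the `validateSeedanceReferenceCounts()` function named above, whose signature is not shown here.

```bash
# Usage: check_seedance_loose_refs <images> <videos> <audios>
# Exits non-zero with a message on stderr when any cap is exceeded.
# The body is a subshell, so the variables do not leak into the caller.
check_seedance_loose_refs() (
  images=$1; videos=$2; audios=$3
  total=$((images + videos + audios))
  [ "$images" -le 9 ] || { echo "too many image refs: $images (max 9)" >&2; exit 1; }
  [ "$videos" -le 3 ] || { echo "too many video refs: $videos (max 3)" >&2; exit 1; }
  [ "$audios" -le 3 ] || { echo "too many audio refs: $audios (max 3)" >&2; exit 1; }
  [ "$total" -le 12 ] || { echo "too many reference assets: $total (max 12)" >&2; exit 1; }
)
```

Note that the per-type caps sum to 15, so the total cap of 12 is a separate constraint: 9 images, 3 videos, and 3 audio clips together would be rejected.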
347
-
348
- Combining `--ref` / `--ref-end` with `-c/--context` on Seedance is rejected
349
- client-side with a clear error pointing to the correct mode. In CLI direct-gen
350
- mode, additional `--ref-audio` / `--ref-video` entries beyond the first must
351
- be HTTPS URLs (the primary entry can still be a local file path); for local
352
- multi-file Seedance uploads, use `--api-chat` / `--durable-chat` instead. Use
353
- `--workflow-max-cost <n>` plus `--confirm-cost` / `--no-confirm-cost` to forward
354
- explicit workflow cost policy, and `--workflow-idempotency-key` when retrying a
355
- workflow start request.
356
-
357
- Sogni Intelligence utilities are exposed through the same API-key path:
358
- `--list-api-models` / `--get-api-model <id>` read `/v1/models`, `--task-profile`
359
- and `--max-tokens` tune `/v1/chat/completions`, and `--list-replays`,
360
- `--get-replay`, and `--ingest-replay` manage `/v1/replay/records` RunRecords for
361
- replay/debug viewers. The public chat endpoint also accepts OpenAI-standard
362
- `reasoning_effort` / `reasoning.effort` in raw API requests. The CLI's
363
- `--thinking` / `--no-thinking` flags are forwarded as
364
- `chat_template_kwargs.enable_thinking`; current hosted Qwen requests may
365
- normalize thinking on server-side, so do not rely on `--no-thinking` as a hard
366
- suppression switch for `/v1/chat/completions`.
367
- Hosted API modes require `SOGNI_API_KEY`; this skill's CLI uses API-key
368
- authentication.
369
-
370
- For durable hosted chat runs (long-running multi-tool turns that should
371
- survive a client disconnect), the SDK now exposes
372
- `sogni.chat.runs.{create, get, cancel, streamEvents}`.
373
- Set `SOGNI_SKILL_USE_SDK_TRANSPORT=1` to route hosted workflow + chat
374
- operations through the SDK transport instead of the legacy
375
- SSRF-validated fetch path. The skill's `sogni-hosted-client.mjs`
376
- factory still validates `restEndpoint` / `socketEndpoint` against the
377
- SSRF guard before constructing the SDK client, so the safety contract
378
- holds.
379
- For `--durable-chat`, stream output as the run advances; the CLI reports
380
- assistant deltas plus de-duplicated per-job progress / ETA / result lines from
381
- hosted run events.
382
-
383
- When changing hosted API chat/workflow behavior, keep reusable validation,
384
- workflow compilation, repair-control, and guard telemetry logic in the shared
385
- Sogni runtime first, then sync it into this public skill. The public skill
386
- should consume generated or shared typed contracts instead of adding
387
- skill-local regex guards. Keep local regex limited to bounded CLI/fact
388
- extraction such as paths, URLs, extensions, dimensions, durations, and explicit
389
- positions.
390
-
391
- ## Options
392
-
393
- | Flag | Description | Default |
394
- |------|-------------|---------|
395
- | `-Q, --quality <tier>` | Quality preset: fast\|hq\|pro (auto-selects model/steps/size) | - |
396
- | `-o, --output <path>` | Save to file | prints URL |
397
- | `-m, --model <id>` | Model ID (overrides --quality) | z_image_turbo_bf16 |
398
- | `-w, --width <px>` | Width | 512 |
399
- | `-h, --height <px>` | Height | 512 |
400
- | `-n, --count <num>` | Number of images (supports {a\|b\|c} prompt variations) | 1 |
401
- | `-t, --timeout <sec>` | Timeout seconds | 30 (300 for video) |
402
- | `-s, --seed <num>` | Specific seed | random |
403
- | `--last-seed` | Reuse seed from last render | - |
404
- | `--seed-strategy <s>` | Seed strategy: random\|prompt-hash | prompt-hash |
405
- | `--multi-angle` | Multiple angles LoRA mode (Qwen Image Edit) | - |
406
- | `--angles-360` | Generate 8 azimuths (front -> front-left) | - |
407
- | `--angles-360-video` | Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) | - |
408
- | `--azimuth <key>` | front\|front-right\|right\|back-right\|back\|back-left\|left\|front-left | front |
409
- | `--elevation <key>` | low-angle\|eye-level\|elevated\|high-angle | eye-level |
410
- | `--distance <key>` | close-up\|medium\|wide | medium |
411
- | `--angle-strength <n>` | LoRA strength for multiple_angles | 0.9 |
412
- | `--angle-description <text>` | Optional subject description | - |
413
- | `--steps <num>` | Override steps (model-dependent) | - |
414
- | `--guidance <num>` | Override guidance (model-dependent) | - |
415
- | `--output-format <f>` | Image output format: png\|jpg, or webp for gpt-image-2 | png |
416
- | `--sampler <name>` | Sampler (model-dependent) | - |
417
- | `--scheduler <name>` | Scheduler (model-dependent) | - |
418
- | `--lora <id>` | LoRA id (repeatable, edit only) | - |
419
- | `--loras <ids>` | Comma-separated LoRA ids | - |
420
- | `--lora-strength <n>` | LoRA strength (repeatable) | - |
421
- | `--lora-strengths <n>` | Comma-separated LoRA strengths | - |
422
- | `--token-type <type>` | Token type: spark\|sogni\|auto (auto retries with alternate) | spark |
423
- | `--balance, --balances` | Show SPARK/SOGNI balances and exit | - |
424
- | `-c, --context <path>` | Context image for editing | - |
425
- | `--last-image` | Use last generated image as context/ref | - |
426
- | `--music` | Generate music/audio instead of image | - |
427
- | `--music-model <id>` | Music model: turbo\|sft\|ace_step_1.5_turbo\|ace_step_1.5_sft | ace_step_1.5_turbo |
428
- | `--lyrics <text>` | Optional lyrics for song generation | - |
429
- | `--language <code>` | Lyrics language code | en |
430
- | `--bpm <num>` | Music tempo, 30-300 BPM | server default |
431
- | `--keyscale <text>` | Music key/scale, e.g. C major | - |
432
- | `--timesig <n>` | Time signature: 2\|3\|4\|6 | server default |
433
- | `--composer-mode`, `--no-composer-mode` | Toggle AI composer mode | server default |
434
- | `--prompt-strength <n>` | Music prompt adherence, 0-10 | server default |
435
- | `--creativity <n>` | Music variation/temperature, 0-2 | server default |
436
- | `--music-shift <n>` | Audio model shift parameter, 1-6 | 3 |
437
- | `--audio-format <f>` | Alias for music output format: mp3\|flac\|wav | mp3 |
438
- | `--video, -v` | Generate video instead of image | - |
439
- | `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace) | inferred |
440
- | `--fps <num>` | Frames per second (video) | model default |
441
- | `--duration <sec>` | Duration in seconds (video or music) | video 5, music 30 |
442
- | `--frames <num>` | Override total frames (video) | - |
443
- | `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
444
- | `--auto-resize-assets` | Auto-resize video assets | true |
445
- | `--no-auto-resize-assets` | Disable auto-resize | - |
446
- | `--estimate-video-cost` | Estimate video cost and exit | - |
447
- | `--photobooth` | Face transfer mode (InstantID + SDXL Turbo) | - |
448
- | `--cn-strength <n>` | ControlNet strength (photobooth) | 0.8 |
449
- | `--cn-guidance-end <n>` | ControlNet guidance end point (photobooth) | 0.3 |
450
- | `--ref <path\|url>` | Reference image for video or photobooth face | required for video/photobooth |
451
- | `--ref-end <path\|url>` | End frame for i2v interpolation | - |
452
- | `--ref-audio <path\|url>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
453
- | `--audio-start <sec>` | Start offset into `--ref-audio` | - |
454
- | `--audio-duration <sec>` | Duration slice from `--ref-audio` | - |
455
- | `--reference-audio-identity <path>` | Voice identity clip for LTX native audio | - |
456
- | `--voice-persona <name>` | Use saved persona voice clip as LTX voice identity | - |
457
- | `--ref-video <path\|url>` | Reference video for animate/v2v workflows | - |
458
- | `--video-start <sec>` | Start offset into `--ref-video` for segmented V2V/animate | - |
459
- | `--controlnet-name <name>` | ControlNet type for v2v: canny\|pose\|depth\|detailer | - |
460
- | `--controlnet-strength <n>` | ControlNet strength for v2v (0.0-1.0) | canny/pose/depth 0.85, detailer 1.0 |
461
- | `--sam2-coordinates <coords>` | SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) | - |
462
- | `--trim-end-frame` | Trim last frame for seamless video stitching | - |
463
- | `--first-frame-strength <n>` | Keyframe strength for start frame (0.0-1.0) | - |
464
- | `--last-frame-strength <n>` | Keyframe strength for end frame (0.0-1.0) | - |
465
- | `--last` | Show last render info | - |
466
- | `--json` | JSON output | false |
467
- | `--strict-size` | Do not auto-adjust i2v video size for reference resizing constraints | false |
468
- | `-q, --quiet` | No progress output | false |
469
- | `--extract-last-frame <video> <image>` | Extract last frame from video (safe ffmpeg wrapper) | - |
470
- | `--extract-first-frame <video> <image>` | Extract first frame from video (safe ffmpeg wrapper) | - |
471
- | `--concat-videos <out> <clips...>` | Concatenate video clips; normalizes fps/size and fills silent audio so mismatched clips stitch cleanly (safe ffmpeg wrapper) | - |
472
- | `--concat-fps <n>` | Override target fps for `--concat-videos` | highest clip fps |
473
- | `--concat-audio <path>` | Optional audio track to mux over `--concat-videos` output | - |
474
- | `--concat-audio-start <sec>` | Start offset into `--concat-audio` | - |
475
- | `--remix-audio <in> <out>` | Rebuild a video's audio (loop/fade/mix) without re-encoding video (safe ffmpeg wrapper) | - |
476
- | `--bed-audio <path>` | Audio bed for `--remix-audio` (path or video; defaults to input's own audio) | - |
477
- | `--audio-loop` | Loop the bed to cover the full video duration (`--remix-audio`) | false |
478
- | `--audio-fade-in <sec>` | Fade the bed in over `<sec>` (`--remix-audio`) | - |
479
- | `--audio-fade-out <sec>` | Fade the bed out over `<sec>` at the tail (`--remix-audio`) | - |
480
- | `--mix-audio <path>` | Overlay one extra audio track, mixed with the bed (`--remix-audio`) | - |
481
- | `--mix-at <sec>` | Start offset for `--mix-audio` | 0 |
482
- | `--mix-gain <db>` | Gain in dB applied to `--mix-audio` | 0 |
483
- | `--list-media [type]` | List recent inbound media (images\|audio\|all) | images |
484
- | `--api-chat` | Call OpenAI-compatible `/v1/chat/completions`; CLI default sends the hosted `creative-agent` tool surface | - |
485
- | `--durable-chat` | Start and stream a durable `/v1/chat/runs` record through SDK transport; requires `SOGNI_SKILL_USE_SDK_TRANSPORT=1` | - |
486
- | `--api-tools <mode>` | API tool mode: creative-agent\|creative-tools\|none. CLI default is creative-agent; raw API default is creative-tools. | creative-agent |
487
- | `--no-api-tool-execution` | Plan/tool-call via API chat without executing Sogni tools | - |
488
- | `--llm-model <id>` | LLM model for `--api-chat` | qwen3.6-35b-a3b-gguf-iq4xs |
489
- | `--task-profile <profile>` | Sogni Intelligence task profile: general\|coding\|reasoning | - |
490
- | `--max-tokens <n>` | Max hosted chat completion tokens | 1600 |
491
- | `--thinking`, `--no-thinking` | Forward `chat_template_kwargs.enable_thinking` for hosted chat; current public Qwen requests may normalize thinking on server-side | server default |
492
- | `--system <text>` | Override the base system prompt for hosted chat | built-in creative assistant prompt |
493
- | `--list-api-models`, `--get-api-model <id>` | Inspect Sogni Intelligence LLM model metadata | - |
494
- | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|@path>` | Manage Sogni Intelligence replay RunRecords. List/get output is run through `redactRunRecord` from `@sogni/creative-agent/replay` before printing, so signed URLs, bearer tokens, JWTs, and PEM blocks cannot leak via the CLI. Use `@path` to load JSON from a file. | - |
495
- | `--skip-redact`, `--no-redact` | Bypass the replay redactor on `--list-replays` / `--get-replay`. Debug-only: emits unredacted RunRecord payloads. | redacted |
496
- | `--turn-classify` | Print the public-skill turn policy (`visibleTools`, `forbiddenTools`, `requiredTools`) the default contract runtime would produce for the current session-state flags. Mirrors the chat / `/v1/chat/completions` Structured Contracts v1 pipeline. | - |
497
- | `--compile-tools` | Print the per-turn compiled tool surface (filtered tool list + prompt-contract fragments) the default contract runtime emits. | - |
498
- | `--dispatch-tool <name>` | Print the dispatch verdict (`allowed`, `mode`, repair recipe, suggested args) the default contract runtime would return for a tool call. Combine with `--tool-args` to supply arguments. | - |
499
- | `--tool-args <json>` | JSON arguments for `--dispatch-tool`. | `{}` |
500
- | `--storyboard-plan` | Build a storyboard project from the prompt locally (`buildStoryboardProject` + per-model adapter compilation via `compileForModel`) and print the plan as JSON. Does not call the network. Expects scene-structured prompt input (`SCENE NN - Title` / `VISUAL:` / `ACTION:` / `CAMERA:` / `AUDIO/SFX:` blocks); for casual prompts, use `--api-workflow storyboard-video` instead, which runs an LLM storyline expansion first. Pair with `--storyboard-plan-frames`, `--storyboard-plan-model`, `--storyboard-plan-stage`. | - |
501
- | `--storyboard-plan-frames <n>` | Frame count for `--storyboard-plan`. | inferred |
502
- | `--storyboard-plan-model <id>` | Adapter target for `--storyboard-plan` (seedance, seedance2, gpt-image-2, ltx23, wan). | inferred |
503
- | `--storyboard-plan-stage <stage>` | Compilation stage for `--storyboard-plan` (storyboard_image, scene_clip). | storyboard_image |
504
- | `--api-workflow` | Start `/v1/creative-agent/workflows` with generated inline `input.steps`; optional `storyboard-video` preset | - |
505
- | `--workflow-input <json\|@path>` | Durable workflow `input` JSON for the start request. Use `@path` to load from a file. | - |
506
- | `--workflow-title <text>` | Title for generated or storyboard durable workflow input | - |
507
- | `--workflow-idempotency-key <key>`, `--idempotency-key <key>` | Reuse safely when retrying a durable workflow start request | - |
508
- | `--workflow-max-cost <n>` | Reject hosted workflow starts above this estimated capacity-unit ceiling | - |
509
- | `--confirm-cost`, `--no-confirm-cost` | Forward explicit hosted workflow cost confirmation | - |
510
- | `--storyboard-frames <n>` | Beat count for storyboard-video workflow | - |
511
- | `--video-prompt <text>` | Motion prompt for generated-keyframe durable workflow | - |
512
- | `--negative-prompt <text>` | Negative prompt for generated-keyframe durable workflow | - |
513
- | `--generate-audio`, `--no-generate-audio` | Toggle audio generation for generated video steps | - |
514
- | `--expand-prompt`, `--no-expand-prompt` | Toggle prompt expansion for generated video steps | - |
515
- | `--watch-workflow` | Stream durable workflow events after start | - |
516
- | `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>`, `--resume-workflow <id>` | Durable workflow management helpers | - |
517
- | `--api-base-url <url>` | Sogni API base for hosted API modes. Credentials are only sent to `https://api.sogni.ai` by default; use `SOGNI_API_ALLOWED_HOSTS` for trusted custom hosts or `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1` for isolated local testing. | https://api.sogni.ai |
518
- | `--no-filter` | Disable NSFW content filter | - |
519
- | `--memory-set <key> <value>` | Save a user preference | - |
520
- | `--memory-get <key>` | Get a specific memory | - |
521
- | `--memory-list` | List all saved memories | - |
522
- | `--memory-remove <key>` | Delete a memory | - |
523
- | `--personality-set <text>` | Set custom agent personality instructions | - |
524
- | `--personality-get` | Show current personality | - |
525
- | `--personality-clear` | Reset personality to default | - |
526
- | `--persona-add <name>` | Add a persona (with --ref, --relationship, --description) | - |
527
- | `--persona-list` | List all saved personas | - |
528
- | `--persona-remove <name>` | Remove a persona and its files | - |
529
- | `--persona-resolve <name>` | Look up persona by name/tag/pronoun | - |
530
- | `--persona <name>` | Generate using persona's reference photo as context | - |
531
- | `--relationship <type>` | Persona relationship: self\|partner\|child\|friend\|pet | friend |
532
- | `--voice-clip <path>` | Voice clip audio for LTX-2.3 voice cloning | - |
533
-
534
- ## OpenClaw Config Defaults
535
-
536
- When installed as an OpenClaw plugin, Sogni Creative Agent Skill will read defaults from:
537
-
538
- `~/.openclaw/openclaw.json`
539
-
540
- ```json
541
- {
542
- "plugins": {
543
- "entries": {
544
- "sogni-creative-agent-skill": {
545
- "enabled": true,
546
- "config": {
547
- "defaultImageModel": "z_image_turbo_bf16",
548
- "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
549
- "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
550
- "defaultMusicModel": "ace_step_1.5_turbo",
551
- "videoModels": {
552
- "t2v": "ltx23-22b-fp8_t2v_distilled",
553
- "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
554
- "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
555
- "ia2v": "ltx23-22b-fp8_ia2v_distilled",
556
- "a2v": "ltx23-22b-fp8_a2v_distilled",
557
- "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
558
- "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
559
- "v2v": "ltx23-22b-fp8_v2v_distilled"
560
- },
561
- "defaultVideoWorkflow": "t2v",
562
- "defaultNetwork": "fast",
563
- "defaultTokenType": "spark",
564
- "apiBaseUrl": "https://api.sogni.ai",
565
- "defaultLlmModel": "qwen3.6-35b-a3b-gguf-iq4xs",
566
- "defaultTaskProfile": "general",
567
- "defaultApiMaxTokens": 1600,
568
- "defaultApiThinking": false,
569
- "defaultApiToolMode": "creative-agent",
570
- "defaultWorkflowMaxCost": 25,
571
- "defaultWorkflowConfirmCost": false,
572
- "seedStrategy": "prompt-hash",
573
- "modelDefaults": {
574
- "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
575
- "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
576
- },
577
- "defaultWidth": 768,
578
- "defaultHeight": 768,
579
- "defaultCount": 1,
580
- "defaultFps": 16,
581
- "defaultDurationSec": 5,
582
- "defaultImageTimeoutSec": 30,
583
- "defaultVideoTimeoutSec": 300,
584
- "defaultMusicDurationSec": 30,
585
- "defaultMusicTimeoutSec": 600,
586
- "credentialsPath": "~/.config/sogni/credentials",
587
- "lastRenderPath": "~/.config/sogni/last-render.json",
588
- "mediaInboundDir": "~/.clawdbot/media/inbound"
589
- }
590
- }
591
- }
592
- }
593
- }
594
- ```
595
-
596
- CLI flags always override these defaults.
597
- If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
598
- Seed strategies: `prompt-hash` (deterministic) or `random`.
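
The override order described above (a CLI flag wins over the OpenClaw config default) can be sketched as a small lookup. This is a minimal illustration, not the skill's actual resolver: `resolve_option` is a hypothetical helper, the `fallback` argument is an assumption, and using the same key for the flag and the config entry is a simplification.

```python
from typing import Any, Mapping, Optional


def resolve_option(
    name: str,
    cli_flags: Mapping[str, Any],
    plugin_config: Mapping[str, Any],
    fallback: Optional[Any] = None,
) -> Any:
    """Return the CLI value if one was given, else the config default."""
    # A flag that was passed explicitly always wins.
    if cli_flags.get(name) is not None:
        return cli_flags[name]
    # Otherwise use the value from the plugin config, if present.
    return plugin_config.get(name, fallback)


config = {"defaultFps": 16, "defaultDurationSec": 5}
print(resolve_option("defaultFps", {"defaultFps": 24}, config))  # CLI value
print(resolve_option("defaultDurationSec", {}, config))          # config value
```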
599
-
600
- ## Image Models
601
-
602
- | Model | Speed | Use Case |
603
- |-------|-------|----------|
604
- | `z_image_turbo_bf16` | Fast (~5-10s) | General purpose, default |
605
- | `gpt-image-2` | Variable | OpenAI GPT Image 2 text-to-image and edit, strong prompt and text rendering |
606
- | `flux1-schnell-fp8` | Very fast | Quick iterations |
607
- | `flux2_dev_fp8` | Slow (~2min) | High quality |
608
- | `chroma-v.46-flash_fp8` | Medium | Balanced |
609
- | `qwen_image_edit_2511_fp8` | Medium | Image editing with context (up to 3) |
610
- | `qwen_image_edit_2511_fp8_lightning` | Fast | Quick image editing |
611
- | `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |
612
-
613
- `gpt-image-2` supports flexible OpenAI image sizes up to `3840px` on either edge, max `3:1` aspect ratio, and total pixels from `655,360` through `8,294,400`; the API snaps dimensions to valid multiples of 16.
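
The size limits above can be checked before a request is sent. The sketch below is only an illustration of the stated limits: `size_problems` and `snap16` are hypothetical helpers, and rounding to the nearest multiple of 16 is an assumption, since the text only says the API snaps dimensions to valid multiples of 16.

```python
from typing import List

MAX_EDGE = 3840          # longest allowed edge, in pixels
MAX_ASPECT = 3           # longest edge may be at most 3x the shortest
MIN_PIXELS = 655_360     # minimum total pixel count
MAX_PIXELS = 8_294_400   # maximum total pixel count


def snap16(value: int) -> int:
    """Round to the nearest multiple of 16 (rounding mode is an assumption)."""
    return max(16, int(value / 16 + 0.5) * 16)


def size_problems(width: int, height: int) -> List[str]:
    """List which of the documented gpt-image-2 limits a size violates."""
    problems = []
    if max(width, height) > MAX_EDGE:
        problems.append("edge")
    if max(width, height) > MAX_ASPECT * min(width, height):
        problems.append("aspect")
    if not MIN_PIXELS <= width * height <= MAX_PIXELS:
        problems.append("pixels")
    return problems


print(size_problems(3840, 2160))  # within all three limits
print(size_problems(4096, 1024))  # too long an edge and too wide a ratio
```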
614
-
615
- ## Music Models
616
-
617
- | Model | Use Case |
618
- |-------|----------|
619
- | `ace_step_1.5_turbo` | Default direct music generation model |
620
- | `ace_step_1.5_sft` | Experimental option with stronger lyric handling |
621
-
622
- Use `--music` for direct audio-only generation. Defaults are 30 seconds, `mp3`,
623
- `ace_step_1.5_turbo`, 8 steps, `euler` sampler, and `simple` scheduler. Keep
624
- `--audio` for video reference audio (`--ref-audio` alias); do not use it for
625
- direct music generation.
626
-
627
- ## Video Models
628
-
629
- ### Current Video Model Selectors
630
-
631
- | Model | Speed | Use Case |
632
- |-------|-------|----------|
633
- | `ltx23-22b-fp8_t2v_distilled` | Fast (~2-3min) | Default text-to-video with native dialogue/audio |
634
- | `ltx23-22b-fp8_i2v_distilled` | Fast (~2-3min) | Image-to-video with native dialogue/audio |
635
- | `ltx23-22b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
636
- | `ltx23-22b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
637
- | `ltx23-22b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
638
- | `seedance2` | Variable | Seedance 2.0 text-to-video, 4-15s, native audio |
639
- | `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video |
640
- | `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video |
641
- | `seedance2-v2v` | Variable | Seedance 2.0 video-to-video, no ControlNet |
642
- | `wan_v2.2-14b-fp8_i2v_lightx2v` | Fast | Simple image-to-video |
643
- | `wan_v2.2-14b-fp8_i2v` | Slow | Higher quality video |
644
- | `wan_v2.2-14b-fp8_t2v_lightx2v` | Fast | Text-to-video |
645
- | `wan_v2.2-14b-fp8_s2v_lightx2v` | Fast | Face lip-sync with uploaded audio |
646
- | `wan_v2.2-14b-fp8_animate-move_lightx2v` | Fast | Animate-move |
647
- | `wan_v2.2-14b-fp8_animate-replace_lightx2v` | Fast | Animate-replace |
648
-
649
- ### LTX-2 / LTX-2.3 Models
650
-
651
- | Model | Speed | Use Case |
652
- |-------|-------|----------|
653
- | `ltx2-19b-fp8_t2v_distilled` | Fast (~2-3min) | Text-to-video, 8-step |
654
- | `ltx2-19b-fp8_t2v` | Medium (~5min) | Text-to-video, 20-step quality |
655
- | `ltx2-19b-fp8_i2v_distilled` | Fast (~2-3min) | Image-to-video, 8-step |
656
- | `ltx2-19b-fp8_i2v` | Medium (~5min) | Image-to-video, 20-step quality |
657
- | `ltx2-19b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
658
- | `ltx2-19b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
659
- | `ltx2-19b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
660
- | `ltx2-19b-fp8_v2v` | Medium (~5min) | Video-to-video with ControlNet, quality |
661
-
662
- ## Image Editing with Context
663
-
664
- Edit images using reference images. Qwen models support up to 3 context images; GPT Image 2 edit supports up to 16 when selected with `-m gpt-image-2`:
665
-
666
- ```bash
667
- # Single context image
668
- sogni-agent -c photo.jpg "make the background a beach"
669
-
670
- # Multiple context images (subject + style)
671
- sogni-agent -c subject.jpg -c style.jpg "apply the style to the subject"
672
-
673
- # GPT Image 2 multi-reference edit
674
- sogni-agent -m gpt-image-2 -c subject.jpg -c outfit.jpg -c room.jpg "place the subject in the room wearing the outfit"
675
-
676
- # Use last generated image as context
677
- sogni-agent --last-image "make it more vibrant"
678
- ```
679
-
680
- When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`. Select `-m gpt-image-2` for GPT Image 2's higher reference-image limit and OpenAI-backed image editing.
681
-
682
- Use context-image editing for source-preserving edits. If the user says "use this image as the base", "keep everything the same", "only change the style", "anime version of this image", or asks to preserve pose, clothing, background, framing, or composition, use `-c/--context` with a Qwen image edit model instead of `--photobooth`. For stronger preservation than the lightning default, prefer:
683
-
684
- ```bash
685
- sogni-agent -c photo.jpg -m qwen_image_edit_2511_fp8 "turn this into anime style; keep the same face, pose, clothing, background, framing, and composition"
686
- ```
687
-
688
- ## Photobooth (Face Transfer)
689
-
690
- Generate new stylized portraits from a face photo using InstantID ControlNet. Use `--photobooth` with `--ref` when the user explicitly asks for photobooth/face-transfer mode, wants a new portrait or headshot based on their face, or asks to place their face identity into a different portrait concept.
691
-
692
- Do not use `--photobooth` for full-image style edits where the original photo must stay intact. `--photobooth` treats the input as a face reference, not as a base image, so it can change pose, clothing, background, framing, and composition. For "same image, different style" requests, route to Qwen context editing with `-c/--context`.
693
-
694
- ```bash
695
- # Basic photobooth
696
- sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
697
-
698
- # Multiple outputs
699
- sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
700
-
701
- # Custom ControlNet tuning
702
- sogni-agent --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"
703
- ```
704
-
705
- Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Cannot be combined with `--video` or `-c/--context`.
706
-
707
- **Agent usage:**
708
- ```bash
709
- # Photobooth: stylize a face photo
710
- sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
711
-
712
- # Multiple photobooth outputs
713
- sogni-agent -q --photobooth --ref /path/to/face.jpg -n 4 -o ./stylized.png "LinkedIn professional headshot"
714
- ```
715
-
716
- ## Multiple Angles (Turnaround)
717
-
718
- Generate specific camera angles from a single reference image using the Multiple Angles LoRA:
719
-
720
- ```bash
721
- # Single angle
722
- sogni-agent --multi-angle -c subject.jpg \
723
- --azimuth front-right --elevation eye-level --distance medium \
724
- --angle-strength 0.9 \
725
- "studio portrait, same person"
726
-
727
- # 360 sweep (8 azimuths)
728
- sogni-agent --angles-360 -c subject.jpg --distance medium --elevation eye-level \
729
- "studio portrait, same person"
730
-
731
- # 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
732
- sogni-agent --angles-360 --angles-360-video ./turntable.mp4 \
733
- -c subject.jpg --distance medium --elevation eye-level \
734
- "studio portrait, same person"
735
- ```
736
-
737
- The prompt is auto-built with the required `<sks>` token plus the selected camera angle keywords.
738
- `--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
739
-
740
- ### 360 Video Best Practices
741
-
742
- When a user requests a "360 video", follow this workflow:
743
-
744
- 1. **Default camera parameters** (do not ask unless they specify):
745
- - **Elevation**: default to **eye-level** (what users call a "medium" angle)
746
- - **Distance**: default to **medium**
747
-
748
- 2. **Map user terms to flags**:
749
- | User says | Flag value |
750
- |-----------|------------|
751
- | "high" angle | `--elevation high-angle` |
752
- | "medium" angle | `--elevation eye-level` |
753
- | "low" angle | `--elevation low-angle` |
754
- | "close" | `--distance close-up` |
755
- | "medium" distance | `--distance medium` |
756
- | "far" | `--distance wide` |
757
-
758
- 3. **Always use first-frame/last-frame stitching** - the `--angles-360-video` flag automatically handles this by generating i2v clips between consecutive angles including last→first for seamless looping.
759
-
760
- 4. **Example command**:
761
- ```bash
762
- sogni-agent --angles-360 --angles-360-video ./output.mp4 \
763
- -c /path/to/image.png --elevation eye-level --distance medium \
764
- "description of subject"
765
- ```
766
-
767
- ### Transition Video Rule
768
-
769
- For **any transition video work**, always use the **Sogni skill/plugin** (not raw ffmpeg or other shell commands). Use the built-in `--extract-last-frame`, `--extract-first-frame`, `--concat-videos`, `--remix-audio`, and `--looping` flags for video and audio manipulation.
770
-
771
- ### Insufficient Funds Handling
772
-
773
- Use `--token-type auto` to automatically retry native Sogni models with SOGNI tokens when SPARK is insufficient. Vendor models such as Seedance and GPT Image 2 require Premium Spark eligibility and never fall back to SOGNI.
774
-
775
- When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply:
776
-
777
- "Insufficient funds. Buy Spark Packs to continue: https://docs.sogni.ai/pricing/#spark-packs"
778
-
779
- Do not collect payment details, quote a custom price, or simulate a purchase in the terminal.
780
-
781
- ## Video Generation
782
-
783
- Generate videos from text, a reference image, audio, or a reference video:
784
-
785
- ```bash
786
- # Text-to-video (t2v)
787
- sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
788
-
789
- # Basic video from image
790
- sogni-agent --video --ref cat.jpg -o cat.mp4 "cat walks around"
791
-
792
- # Use last generated image as reference
793
- sogni-agent --last-image --video "gentle camera pan"
794
-
795
- # Custom duration and FPS
796
- sogni-agent --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"
797
-
798
- # Bare "720p" / "HD" without exact pixels: preserve aspect via short-side target
799
- sogni-agent --video --target-resolution 768 \
800
- "A calm cinematic shot of lanterns drifting across a night lake"
801
-
802
- # Natural-language aspect and resolution inference
803
- sogni-agent --video \
804
- "Make a 720p 9:16 video of ocean waves at sunset"
805
-
806
- # Seedance 2.0 text-to-video
807
- sogni-agent --video -m seedance2 --duration 8 \
808
- "A polished product reveal with native ambient sound"
809
-
810
- # Seedance multimodal context with public HTTPS references
811
- sogni-agent --video -m seedance2 --workflow t2v \
812
- --ref https://cdn.example.com/product.png \
813
- --ref-video https://cdn.example.com/motion.mp4 \
814
- --ref-audio https://cdn.example.com/music.m4a \
815
- "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
816
-
817
- # Sound-to-video (s2v)
818
- sogni-agent --video --ref face.jpg --ref-audio speech.m4a \
819
- -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"
820
-
821
- # Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
822
- sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
823
- "music video with synchronized motion"
824
-
825
- # Audio-to-video (auto-routes to LTX 2.3 a2v)
826
- sogni-agent --video --ref-audio song.mp3 \
827
- "abstract audio-reactive visualizer"
828
-
829
- # Persona/voice identity with LTX native audio
830
- sogni-agent --video --reference-audio-identity voice.webm \
831
- "NARRATOR: \"This is my voice.\""
832
-
833
- # Prefer .webm, .m4a, or .mp3 voice clips. Local .wav clips are normalized
834
- # to .m4a before upload when ffmpeg is available.
835
-
836
- # LTX-2.3 text-to-video
837
- sogni-agent --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
838
- "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
839
-
840
- # Animate (motion transfer)
841
- sogni-agent --video --ref subject.jpg --ref-video motion.mp4 \
842
- --workflow animate-move "transfer motion"
843
-
844
- # Segment a longer reference video for local stitched workflows
845
- sogni-agent --video --workflow v2v --ref-video dance.mp4 \
846
- --video-start 10 --duration 8 --controlnet-name pose \
847
- "robot dancing"
848
- ```
849
-
850
- ## Video-to-Video (V2V) with ControlNet
851
-
852
- Transform an existing video using LTX-2 models with ControlNet guidance:
108
+ SOGNI_SKILL_USE_SDK_TRANSPORT=1 sogni-agent --durable-chat "Create a launch campaign and animate the hero clip"
853
109
 
854
- ```bash
855
- # Basic v2v with canny edge detection
856
- sogni-agent --video --workflow v2v --ref-video input.mp4 \
857
- --controlnet-name canny "stylized anime version"
858
-
859
- # V2V with pose detection and custom strength
860
- sogni-agent --video --workflow v2v --ref-video dance.mp4 \
861
- --controlnet-name pose --controlnet-strength 0.7 "robot dancing"
862
-
863
- # V2V with depth map
864
- sogni-agent --video --workflow v2v --ref-video scene.mp4 \
865
- --controlnet-name depth "watercolor painting style"
866
- ```
867
-
868
- ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
869
- Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context when they pass the CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
870
-
871
- ```bash
872
- # Seedance V2V without ControlNet
873
- sogni-agent --video --workflow v2v -m seedance2-v2v \
874
- --ref-video input.mp4 "make the clip more cinematic"
875
- ```
110
+ # Durable workflow (resumable, server-orchestrated)
111
+ sogni-agent --api-workflow --video-prompt "The camera slowly pushes in" "A graphite robot sketch on a drafting table"
876
112
 
877
- ## Photo Restoration
878
-
879
- Restore damaged vintage photos using Qwen image editing:
880
-
881
- ```bash
882
- # Basic restoration
883
- sogni-agent -c damaged_photo.jpg -o restored.png \
884
- "professionally restore this vintage photograph, remove damage and scratches"
885
-
886
- # Detailed restoration with preservation hints
887
- sogni-agent -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
888
- "restore this vintage photo, remove peeling, tears and wear marks, \
889
- preserve natural features and expression, maintain warm nostalgic color tones"
890
- ```
891
-
892
- **Tips for good restorations:**
893
- - Describe the damage: "peeling", "scratches", "tears", "fading"
894
- - Specify what to preserve: "natural features", "eye color", "hair", "expression"
895
- - Mention the era for color tones: "1970s warm tones", "vintage sepia"
896
-
897
- **Finding received images (Telegram/etc):**
898
- ```bash
899
- sogni-agent --json --list-media images
113
+ # Storyboard → GPT Image 2 sheet → Seedance video, all server-side
114
+ sogni-agent --api-workflow storyboard-video --storyboard-frames 6 -Q hq "9:16 bakery launch video"
900
115
  ```
901
116
 
902
- **Do NOT use `ls`, `cp`, or other shell commands to browse user files.** Always use `--list-media` to find inbound media.
903
-
904
- ## Photobooth Routing Rule
905
-
906
- - If the user explicitly asks to use "photobooth", "photobooth path", or "face transfer", use `--photobooth` with `--ref` set to the user-provided face image.
907
- - If the same request also requires preserving the whole source image (same pose, clothes, background, framing, composition, or "keep everything the same"), explain that photobooth is face-reference generation and prefer Qwen context editing unless the user insists on photobooth.
908
- - Do not route to `--photobooth` merely because the user asks to preserve a face in a style edit. Face-preserving full-image edits should use `-c/--context` with Qwen image edit.
909
-
910
- ## LTX-2.3 Prompt Rule
911
-
912
- Whenever the chosen video model is `ltx23-22b-fp8_t2v_distilled`, do not pass the user's short request through unchanged. Rewrite it into an LTX-2.3-safe prompt before calling `sogni-agent`.
913
-
914
- - Output one single paragraph only. No line breaks, bullet points, section labels, tag lists, or screenplay formatting.
915
- - Use 4-8 flowing present-tense sentences describing one continuous shot. No cuts, montage, or unrelated scene jumps.
916
- - Start with shot scale plus the scene's visual identity, then describe environment, time of day, atmosphere, textures, and specific light sources.
917
- - Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
918
- - Give the scene one main action thread from start to finish. Use connectors like `as`, `while`, and `then` so motion reads as a continuous filmed moment.
919
- - If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
920
- - Budget spoken dialogue at about 3 words per second, plus about 1 second for each meaningful acting beat or pause.
921
- - Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
922
- - Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like `beautiful` or `nice`, or structural markup such as `[DIALOGUE]`.
923
- - Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
924
- - Preserve the user's request, but expand it into cinematic prose. Do not invent a different story just to make the prompt longer.
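
The dialogue budget in the rules above (about 3 words per second, plus about 1 second per acting beat or pause) is simple arithmetic. The sketch below is a rough estimate only: `dialogue_seconds` is a hypothetical helper, and counting words by splitting on whitespace and rounding up are assumptions.

```python
import math


def dialogue_seconds(line: str, beats: int = 0, words_per_second: float = 3.0) -> int:
    """Estimate the seconds needed to speak a line, plus acting beats."""
    # Whitespace-separated tokens stand in for spoken words.
    words = len(line.split())
    # Round the speaking time up, then add one second per beat or pause.
    return math.ceil(words / words_per_second) + beats


line = "welcome to the story"
print(dialogue_seconds(line))           # speaking time only
print(dialogue_seconds(line, beats=1))  # with one pause before the line
```

If the estimate exceeds the requested clip duration, shorten the line or lengthen the clip rather than packing more speech into the prompt.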
925
-
926
- ### Duration-Aware Pacing
927
-
928
- Match scene density to clip length so prompts stay filmable:
929
-
930
- - About `1-4s`: describe exactly 1 action or moment.
931
- - About `5-8s`: describe about 2 sequential actions.
932
- - About `9-12s`: describe about 3 sequential actions.
933
- - Longer clips: add only a small number of additional sequential beats. Do not turn the prompt into a montage or a full story arc unless the duration clearly supports it.
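
The pacing bands above can be expressed as a lookup. This is a sketch under stated assumptions: `suggested_action_count` is a hypothetical helper, durations that fall between bands (for example 4.5s) are pushed into the next band, and clips longer than 12s return `None` because the text gives no fixed count for them.

```python
from typing import Optional


def suggested_action_count(duration_sec: float) -> Optional[int]:
    """Map a clip duration to the number of sequential actions to describe."""
    if duration_sec <= 0:
        raise ValueError("duration must be positive")
    if duration_sec <= 4:
        return 1  # one action or moment
    if duration_sec <= 8:
        return 2  # about two sequential actions
    if duration_sec <= 12:
        return 3  # about three sequential actions
    return None   # longer clips: add only a few more beats, judged case by case


for seconds in (3, 5, 10, 20):
    print(seconds, suggested_action_count(seconds))
```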
934
-
935
- ### Orientation Mapping
936
-
937
- When the user explicitly asks for an orientation or aspect ratio, map it to safe LTX dimensions:
938
-
939
- - `vertical`, `portrait`, `story`, `reel`, `tiktok` -> `-w 1088 -h 1920`
940
- - `landscape`, `horizontal`, `widescreen`, `youtube`, `16:9` -> `-w 1920 -h 1088`
941
- - `square`, `1:1` -> `-w 1088 -h 1088`
942
- - `4:3 portrait` -> `-w 832 -h 1088`
943
- - `4:3 landscape` -> `-w 1088 -h 832`
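
The orientation mapping above is a direct table lookup. The sketch below turns a user term into the `-w`/`-h` arguments listed; `ltx_size_flags` is a hypothetical helper, and lowercasing and trimming the term before lookup is an assumption.

```python
from typing import List, Optional

_PORTRAIT = (1088, 1920)
_LANDSCAPE = (1920, 1088)

# Terms and dimensions copied from the mapping list above.
LTX_DIMENSIONS = {
    **{term: _PORTRAIT for term in ("vertical", "portrait", "story", "reel", "tiktok")},
    **{term: _LANDSCAPE for term in ("landscape", "horizontal", "widescreen", "youtube", "16:9")},
    "square": (1088, 1088),
    "1:1": (1088, 1088),
    "4:3 portrait": (832, 1088),
    "4:3 landscape": (1088, 832),
}


def ltx_size_flags(term: str) -> Optional[List[str]]:
    """Return ['-w', W, '-h', H] for a known orientation term, else None."""
    size = LTX_DIMENSIONS.get(term.strip().lower())
    if size is None:
        return None
    width, height = size
    return ["-w", str(width), "-h", str(height)]


print(ltx_size_flags("TikTok"))
print(ltx_size_flags("4:3 landscape"))
```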
944
-
945
- ### Camera Language Normalization
946
-
947
- When the user uses loose camera language, translate it into concrete motion phrasing inside the prose prompt:
948
-
949
- - `zoom in` -> `slow push-in`
950
- - `zoom out` -> `slow pull-back`
951
- - `pan left` / `pan right` -> `smooth pan left` / `smooth pan right`
952
- - `orbit` / `circle around` -> `slow arc left` or `slow arc right`
953
- - `follow` -> `tracking follow`
117
+ Hosted modes require `SOGNI_API_KEY`. Local file references are uploaded to Sogni media storage and forwarded as retrievable URLs. **Use direct CLI mode for private media that must not leave the local machine.**
954
118
 
955
- Short example:
119
+ Use the direct-to-SDK commands below for explicit one-shot generation when you already know the model, dimensions, and prompt.
956
120
 
957
- ```text
958
- User ask: "4k video of a woman in a neon alley"
959
-
960
- Use this shape instead: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
961
- ```
962
-
963
- ## Agent Usage
964
-
965
- When user asks to generate/draw/create an image:
121
+ ## Core Commands (direct-to-SDK)
966
122
 
967
123
  ```bash
968
- # Generate and save locally (use -Q for quality presets instead of memorizing model IDs)
124
+ # Image (quality presets pick model/steps/size: fast | hq | pro)
969
125
  sogni-agent -q -Q fast -o ./generated.png "user's prompt"
970
126
  sogni-agent -q -Q pro -o ./generated.png "user's prompt"
971
127
 
972
- # Generate with prompt variations (diverse images in one call)
128
+ # Diverse variations in one call (options cycle per image)
973
129
  sogni-agent -q -n 3 -o ./cars.png "a {red|blue|green} sports car"
974
130
 
975
- # Edit an existing image
131
+ # Edit an existing image (source-preserving)
976
132
  sogni-agent -q -c /path/to/input.jpg -o ./edited.png "make it pop art style"
977
133
 
978
- # Generate video from image
979
- sogni-agent -q --video --ref /path/to/image.png -o ./video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."
980
-
981
- # Generate text-to-video
982
- sogni-agent -q --video -o ./video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
983
-
984
- # Generate direct music/audio
985
- sogni-agent -q --music --duration 30 -o ./music.mp3 "uplifting cinematic synthwave theme for a product launch"
134
+ # Photobooth (face transfer: new portrait from a face photo)
135
+ sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
986
136
 
987
- # HD / "4K" text-to-video: prefer LTX-2.3
988
- sogni-agent -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o ./video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
137
+ # Text-to-video / image-to-video (write the prompt per references/video-prompting.md)
138
+ sogni-agent -q --video -o ./video.mp4 "<cinematic prose paragraph>"
139
+ sogni-agent -q --video --ref /path/to/image.png -o ./video.mp4 "<cinematic prose paragraph>"
989
140
 
990
- # HD / "4K" image-to-video: prefer LTX i2v
991
- sogni-agent -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -w 1920 -h 1088 -o ./video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."
141
+ # Sound-to-video (lip-sync), image+audio, audio-only (workflow auto-inferred)
142
+ sogni-agent --video --ref face.jpg --ref-audio speech.m4a -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"
143
+ sogni-agent --video --ref cover.jpg --ref-audio song.mp3 "music video with synchronized motion"
144
+ sogni-agent --video --ref-audio song.mp3 "abstract audio-reactive visualizer"
992
145
 
993
- # Photobooth: stylize a face photo
994
- sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
146
+ # Music (direct audio generation; mp3 by default)
147
+ sogni-agent -q --music --duration 30 -o ./music.mp3 "uplifting cinematic synthwave theme"
148
+ sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 --keyscale "C major" "bright indie pop chorus"
995
149
 
996
- # Token auto-fallback for native Sogni models (tries SPARK first, retries with SOGNI on insufficient balance)
997
- sogni-agent -q --token-type auto -o ./generated.png "user's prompt"
150
+ # Seedance 2.0 (4-15s vendor video with native audio)
151
+ sogni-agent --video -m seedance2 --duration 8 "A polished product reveal with native ambient sound"
998
152
 
999
- # Check current SPARK/SOGNI balances (no prompt required)
153
+ # Balances / last render / inbound media / health (no prompt required)
1000
154
  sogni-agent --json --balance
1001
-
1002
- # Find user-sent images/audio
155
+ sogni-agent --last --json
1003
156
  sogni-agent --json --list-media images
1004
-
1005
- # Then send via message tool with filePath
1006
- ```
1007
-
1008
- ### Quality Presets
1009
-
1010
- Use `-Q` / `--quality` instead of memorizing model IDs:
1011
-
1012
- | Preset | Model | Steps | Size | Speed |
1013
- |--------|-------|-------|------|-------|
1014
- | `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
1015
- | `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
1016
- | `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
1017
-
1018
- Explicit `-m` overrides the quality preset's model. Explicit `-w`/`-h` overrides dimensions. When the user asks for "high quality", "best quality", or "pro", use `-Q pro`. For quick drafts or previews, use `-Q fast`.
1019
-
1020
- ### Dynamic Prompt Variations
1021
-
1022
- When the user wants multiple variations (different colors, styles, subjects), use `{option1|option2|option3}` syntax with `-n`:
1023
-
1024
- ```bash
1025
- # 3 color variations
1026
- sogni-agent -q -n 3 "a {red|blue|green} sports car"
1027
-
1028
- # 4 style variations
1029
- sogni-agent -q -n 4 "a portrait in {oil painting|watercolor|pencil sketch|pop art} style"
1030
- ```
1031
-
1032
- Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images with the same prompt.
1033
-
1034
- For video, use the same `{...}` + `-n` pattern when all outputs share the same source image, end image, duration, audio, and settings and only prompt text varies:
157
+ sogni-agent doctor --json
158
+ ```
159
+
160
+ `sogni-agent --help` is the canonical, always-current flag reference.
161
+
162
+ ## Common Options
163
+
164
+ | Flag | Use | Default |
165
+ |------|-----|---------|
166
+ | `-Q fast\|hq\|pro` | Quality preset (model+steps+size); `-m` overrides model | - |
167
+ | `-o <path>` | Save output locally (relative → PWD) | prints URL |
168
+ | `-c <path>` | Context image for editing (repeatable) | - |
169
+ | `-m <id>` | Explicit model | `z_image_turbo_bf16` |
170
+ | `-w` / `-h` | Width / height | 512×512 |
171
+ | `-n <num>` | Output count (`{a\|b\|c}` prompt variations cycle); capped at 16, raise with `SOGNI_MAX_COUNT` | 1 |
172
+ | `--video`, `--music` | Generate video / music instead of image | - |
173
+ | `--workflow <t>` | Force `t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace` | inferred |
174
+ | `--ref`, `--ref-end`, `--ref-audio`, `--ref-video` | Start frame / end frame / audio / video references | - |
175
+ | `--duration <sec>` | Video or music length | video 5, music 30 |
176
+ | `--target-resolution <px>` | Short-side target preserving aspect ratio (use for bare "720p") | - |
177
+ | `--photobooth` | Face transfer mode (with `--ref`) | - |
178
+ | `--persona <name>` | Use a saved persona (photo + voice auto-attach) | - |
179
+ | `--token-type spark\|sogni\|auto` | `auto` retries native models with SOGNI when SPARK is low | spark |
180
+ | `--last`, `--last-image` | Inspect last render / reuse it as context or ref | - |
181
+ | `--json` | Machine-parseable stdout (progress goes to stderr) | false |
182
+ | `-q, --quiet` | Suppress progress output | false |
183
+ | `-t <sec>` | Timeout | 30 image / 300 video |
184
+ | `--strict-size` | Fail instead of auto-adjusting video size | false |
185
+ | `doctor`, `self-update`, `--whats-new`, `--snooze-update` | Health check / upgrade / changelog / snooze reminder | - |
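The `-n` row above says that `{a|b|c}` prompt variations cycle across outputs. A minimal Python sketch of that cycling, assuming each output index picks option `index % len(options)` in every brace group; the function name `expand_variations` is hypothetical and this is not the CLI's actual implementation:

```python
import re
from typing import List

# Matches one brace group that contains at least one "|" separator.
_GROUP = re.compile(r"\{([^{}]*\|[^{}]*)\}")


def expand_variations(prompt: str, count: int) -> List[str]:
    """Return `count` prompts, cycling through each {a|b|c} group in order.

    Assumption: every group cycles independently by output index.
    Prompts without a group are repeated unchanged.
    """
    prompts = []
    for index in range(count):
        def pick(match, index=index):
            options = match.group(1).split("|")
            return options[index % len(options)]

        prompts.append(_GROUP.sub(pick, prompt))
    return prompts
```

Calling `expand_variations("a {red|blue|green} sports car", 3)` would yield one prompt per color, which mirrors the `-n 3` example in the Core Commands block.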
186
+
187
+ ## Routing Rules (always apply)
188
+
189
+ ### Photobooth vs. context editing
190
+
191
+ - `--photobooth` is **face-reference generation**, not full-image editing: it generates a *new* portrait from a face photo and may change pose, clothing, background, framing, and composition. Use it when the user explicitly asks for photobooth/face-transfer, a new portrait/headshot from their face, or to place their face into a different concept. Cannot be combined with `--video` or `-c/--context`. Tune with `--cn-strength` (default 0.8) and `--cn-guidance-end` (default 0.3).
192
+ - If the request is "**same image, different style**" (e.g. an anime version that must keep the same face, pose, clothing, background, framing, and composition; "use this image as the base"; "keep everything the same"; "only change the style"), use Qwen context editing with `-c/--context` instead. For stronger preservation than the lightning default:
1035
193
 
1036
194
  ```bash
1037
- sogni-agent --video --ref hero.png -n 3 --duration 5 \
1038
- "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
195
+ sogni-agent -c photo.jpg -m qwen_image_edit_2511_fp8 "turn this into anime style; keep the same face, pose, clothing, background, framing, and composition"
1039
196
  ```
1040
197
 
1041
- If clips need different source images, end frames, durations, audio windows, or other per-output settings, keep them as separate per-clip workflow arguments. Do not force those into a single Dynamic Prompt branch.
1042
-
1043
- ### Token Auto-Fallback
1044
-
1045
- Use `--token-type auto` when the user's SPARK balance might be low. It tries SPARK first and automatically retries with SOGNI if insufficient.
1046
-
1047
- ## High-Res Video Routing
198
+ - Do not route to `--photobooth` merely because the user asks to preserve a face in a style edit; face-preserving full-image edits use `-c` with Qwen image edit. When context images are provided without `-m`, the CLI defaults to `qwen_image_edit_2511_fp8_lightning`; select `-m gpt-image-2` for up to 16 reference images and OpenAI-backed editing (Qwen supports up to 3).
1048
199
 
1049
- When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or **"high-res"**, do not use the default WAN video models.
200
+ ### LTX video prompts
1050
201
 
1051
- - For **text-to-video**, use `-m ltx23-22b-fp8_t2v_distilled`.
1052
- - For **image-to-video**, use `-m ltx23-22b-fp8_i2v_distilled`.
1053
- - Prefer LTX-sized dimensions such as `-w 1920 -h 1088`.
1054
- - For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
1055
- - When the prompt combines a named resolution with an aspect ratio, such as "720p 9:16", let the CLI infer both instead of forcing manual `-w`/`-h` unless the user gave exact pixels.
1056
- - If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
1057
- - Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
1058
- - Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even though the current wrapper caps non-WAN video dimensions at 2048px on the long side.
202
+ Whenever the chosen video model is in the LTX family (including the default t2v), **do not pass the user's short request through unchanged**. Rewrite it into one unbroken paragraph of 4-8 flowing present-tense sentences describing a single continuous shot: concrete subjects, named light sources, one action thread, dialogue embedded in double quotes with the speaker identified, positive phrasing only, no headers/bullets/negative-prompts. **Read [`references/video-prompting.md`](./references/video-prompting.md) for the full rule, duration pacing, orientation mapping, and camera-language normalization before writing the prompt.**
1059
203
 
1060
- **Security:** Agents must use the CLI's built-in flags (`--extract-last-frame`, `--extract-first-frame`, `--concat-videos`, `--remix-audio`, `--list-media`) for all file operations and video/audio manipulation. Never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) directly.
204
+ ### High-res video
1061
205
 
1062
- ## Animate Between Two Images (First-Frame / Last-Frame)
206
+ For "hd" / "1080p" / "4k" / "uhd" requests: use `-m ltx23-22b-fp8_t2v_distilled` (text) or `-m ltx23-22b-fp8_i2v_distilled` (image), prefer `-w 1920 -h 1088` (or the orientation mapping in the reference), and rewrite the prompt per the LTX rule. For bare "720p" without orientation, prefer `--target-resolution 768`.
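The high-res routing above can be sketched as a small flag builder. This is an illustrative Python sketch only: the function name `video_flags` is hypothetical, the keyword matching is a simplification, and it does not detect orientation or rewrite the prompt per the LTX rule, both of which still have to happen separately.

```python
import re
from typing import List

# Keywords taken from the routing rule; the matching strategy is an assumption.
_HIGH_RES = re.compile(r"\b(hd|1080p|4k|uhd)\b", re.IGNORECASE)
_BARE_720P = re.compile(r"\b720p\b", re.IGNORECASE)


def video_flags(request: str, has_ref_image: bool) -> List[str]:
    """Pick CLI flags for a video request based on resolution keywords."""
    if _HIGH_RES.search(request):
        # Image-to-video when a reference image exists, else text-to-video.
        model = (
            "ltx23-22b-fp8_i2v_distilled"
            if has_ref_image
            else "ltx23-22b-fp8_t2v_distilled"
        )
        return ["--video", "-m", model, "-w", "1920", "-h", "1088"]
    if _BARE_720P.search(request):
        # Bare "720p": target the short side and keep the aspect ratio.
        return ["--video", "--target-resolution", "768"]
    return ["--video"]
```

The returned list is meant to be spliced into a `sogni-agent` argument vector alongside `--ref`, `-o`, and the rewritten prompt.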
1063
207
 
1064
- When a user asks to **animate between two images**, use `--ref` (first frame) and `--ref-end` (last frame) to create a creative interpolation video:
1065
-
1066
- ```bash
1067
- # Animate from image A to image B
1068
- sogni-agent -q --video --ref ./imageA.png --ref-end ./imageB.png -o ./transition.mp4 "descriptive prompt of the transition"
1069
- ```
208
+ ### Video editing, stitching, 360 turnarounds
1070
209
 
1071
- ### Animate a Video to an Image (Scene Continuation)
210
+ Trigger patterns: "animate image A to image B" (`--ref A --ref-end B`), "continue this video" (extract last frame → i2v → concat), "transition between two videos" (bridge clip), "360 video" (`--angles-360 --angles-360-video`), "add/replace the soundtrack" (`--concat-audio` / `--remix-audio`). **Read [`references/video-editing.md`](./references/video-editing.md) for the step-by-step recipes.**
1072
211
 
1073
- When a user asks to **animate from a video to an image** (or "continue" a video into a new scene):
212
+ **Security: never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) for file operations or video/audio manipulation.** Always use the CLI's built-in safe wrappers: `--extract-first-frame`, `--extract-last-frame`, `--concat-videos`, `--remix-audio`, `--list-media`, `--video-start`, `--audio-start`, `--audio-duration`, `--looping`.
1074
213
 
1075
- 1. **Extract the last frame** of the existing video using the built-in safe wrapper:
1076
- ```bash
1077
- sogni-agent --extract-last-frame ./existing.mp4 ./lastframe.png
1078
- ```
1079
- 2. **Generate a new video** using the last frame as `--ref` and the target image as `--ref-end`:
1080
- ```bash
1081
- sogni-agent -q --video --ref ./lastframe.png --ref-end ./target.png -o ./continuation.mp4 "scene transition prompt"
1082
- ```
1083
- 3. **Concatenate the videos** using the built-in safe wrapper:
1084
- ```bash
1085
- sogni-agent --concat-videos ./full_sequence.mp4 ./existing.mp4 ./continuation.mp4
1086
- ```
214
+ ### Finding user-sent media
1087
215
 
1088
- This ensures visual continuity: the new clip picks up exactly where the previous one ended.
216
+ Use `sogni-agent --json --list-media images` (or `audio` / `all`) to find inbound media the user sent (e.g. via Telegram). **Do NOT browse user files with `ls`, `cp`, or other shell commands.**
1089
217
 
1090
- When the final stitched output needs a single external soundtrack, add `--concat-audio /path/to/audio.mp3` and optional `--concat-audio-start <sec>` to the same `--concat-videos` command. This is the local-agent advantage over browser-only workflows: generate clips with Sogni, then use the safe FFmpeg wrapper to stitch and mux audio locally.
218
+ ### Personas, memories, personality
1091
219
 
1092
- **Do NOT run raw `ffmpeg` commands.** Always use `--extract-last-frame` and `--concat-videos` for video manipulation.
220
+ - Only use `--persona "Name"` when the user refers to a **saved** persona by explicit name, id, or tag/alias. User-uploaded photos are NOT personas; use `-c` for ad-hoc photos. With `--video`, a saved voice clip auto-attaches as the voice identity.
221
+ - Before generating, check saved preferences with `--memory-list` and respect them; save stated standing preferences with `--memory-set`. Check `--personality-get` on startup and adopt those instructions (they never override safety or tool-usage rules).
222
+ - **Read [`references/personas-memory.md`](./references/personas-memory.md)** for persona CRUD, voice cloning, multi-persona scenes, style transfer, and photo restoration recipes.
1093
223
 
1094
- **Always apply this pattern when:**
1095
- - User says "animate image A to image B" → use `--ref A --ref-end B`
1096
- - User says "animate this video to this image" → extract last frame, use as `--ref`, target image as `--ref-end`, then stitch
1097
- - User says "continue this video" with a target image → same as above
224
+ ### Model selection
1098
225
 
1099
- ### Transition Between Two Videos (Bridge Clip)
226
+ Prefer `-Q` presets and automatic workflow routing. When a specific model is needed (GPT Image 2 text rendering, Seedance native audio, WAN lip-sync, LTX dialogue), **read [`references/models.md`](./references/models.md)** for the catalog, recommended selectors, and sizing/divisibility rules.
1100
227
 
1101
- When a user asks to **create a transition between two existing videos** (A → B), bridge them with a generated clip anchored on both boundary frames:
228
+ ### Insufficient funds
1102
229
 
1103
- 1. **Extract the last frame of video A** and the **first frame of video B**:
1104
- ```bash
1105
- sogni-agent --extract-last-frame ./videoA.mp4 ./A_last.png
1106
- sogni-agent --extract-first-frame ./videoB.mp4 ./B_first.png
1107
- ```
1108
- 2. **Generate the transition** with i2v, anchoring start→end so both seams are clean. Match `--fps` to the surrounding clips:
1109
- ```bash
1110
- sogni-agent -q --video -m wan_v2.2-14b-fp8_i2v_lightx2v \
1111
- --ref ./A_last.png --ref-end ./B_first.png --fps 24 \
1112
- -o ./transition.mp4 "descriptive morph between the two scenes"
1113
- ```
1114
- 3. **Concatenate A → transition → B**:
1115
- ```bash
1116
- sogni-agent --concat-videos ./merged.mp4 ./videoA.mp4 ./transition.mp4 ./videoB.mp4
1117
- ```
230
+ Use `--token-type auto` to retry native Sogni models with SOGNI tokens when SPARK is insufficient. Vendor models (Seedance, GPT Image 2) require Premium Spark eligibility and never fall back to SOGNI. When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply exactly:
1118
231
 
1119
- > **i2v clips are silent and use the model's own frame rate** (often not 24). `--concat-videos` now normalizes fps/size and fills silent audio automatically, so mismatched clips stitch correctly, but passing `--fps` to the transition generation keeps things clean from the start. Use `--concat-fps <n>` to force a specific output frame rate.
1120
-
1121
- ### Remix / Layer Audio After Stitching
232
+ "Insufficient funds. Buy Spark Packs to continue: https://docs.sogni.ai/pricing/#spark-packs"
1122
233
 
1123
- After concatenating, use `--remix-audio` to rebuild the audio track **without re-encoding the video** (it is stream-copied, so it is fast and lossless on the picture). Combine the audio flags:
234
+ Do not collect payment details, quote a custom price, or simulate a purchase in the terminal.
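The insufficient-funds rule above amounts to a fixed reply keyed on one error string. A minimal Python sketch, assuming the agent has the CLI's error text available as a string; the helper name `reply_for_error` is hypothetical:

```python
from typing import Optional

# Exact reply text required by the rule above.
INSUFFICIENT_FUNDS_REPLY = (
    "Insufficient funds. Buy Spark Packs to continue: "
    "https://docs.sogni.ai/pricing/#spark-packs"
)


def reply_for_error(message: str) -> Optional[str]:
    """Return the fixed reply for a debit error, or None for other errors."""
    if "Debit Error: Insufficient funds" in message:
        return INSUFFICIENT_FUNDS_REPLY
    return None
```

Returning `None` for every other error leaves normal troubleshooting untouched, so the fixed reply is only used for the one case the rule names.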
1124
235
 
1125
- ```bash
1126
- # Loop one clip's audio across the whole merged video and fade it out at the end
1127
- sogni-agent --remix-audio ./merged.mp4 ./final.mp4 \
1128
- --bed-audio ./clip1.mp4 --audio-loop --audio-fade-out 2
1129
-
1130
- # Same, but also layer a second clip's original audio back in starting at 18s
1131
- sogni-agent --remix-audio ./merged.mp4 ./final.mp4 \
1132
- --bed-audio ./clip1.mp4 --audio-loop --audio-fade-out 2 \
1133
- --mix-audio ./clip3.mp4 --mix-at 18.01 --mix-gain -3
1134
- ```
236
+ ### Suggest next steps after a render
1135
237
 
1136
- - `--bed-audio` accepts a video or audio file; if omitted, the input video's own audio is the bed.
1137
- - `--audio-loop` loops the bed to cover the full video; `--audio-fade-in` / `--audio-fade-out` fade it.
1138
- - `--mix-audio` overlays one extra track (mixed at full level with a peak limiter so it never clips); position it with `--mix-at` and adjust level with `--mix-gain` (dB).
1139
- - To mix more than two layers, chain `--remix-audio` passes (each only re-encodes audio).
238
+ After an image: offer to animate it (`--video --ref <result>`), restyle it (`-c <result> "Apply style: ..."`), change the angle (`--multi-angle -c <result>`), generate variations (`-n 3 "{a|b|c}"`), or refine at `-Q pro`. After a video: offer different motion, dialogue (LTX), longer `--duration`, stitching (`--concat-videos`), or a soundtrack (`--concat-audio` / `--remix-audio`).
1140
239
 
1141
- **Do NOT run raw `ffmpeg` commands** for any of this. Use `--extract-first-frame`, `--extract-last-frame`, `--concat-videos`, and `--remix-audio`.
240
+ ## JSON Output Contract
1142
241
 
1143
- ## JSON Output
242
+ Success (`--json`):
1144
243
 
1145
244
  ```json
1146
245
  {
1147
246
  "success": true,
1148
247
  "prompt": "a cat wearing a hat",
1149
- "model": "z_image_turbo_bf16",
248
+ "model": "z_image_turbo_bf16",
1150
249
  "width": 512,
1151
250
  "height": 512,
1152
251
  "urls": ["https://..."],
@@ -1154,7 +253,7 @@ sogni-agent --remix-audio ./merged.mp4 ./final.mp4 \
1154
253
  }
1155
254
  ```
1156
255
 
1157
- On error (with `--json`), the script returns a single JSON object like:
256
+ Failure (single JSON object on stdout, exit code 1; progress/warnings on stderr):
1158
257
 
1159
258
  ```json
1160
259
  {
@@ -1168,170 +267,28 @@ On error (with `--json`), the script returns a single JSON object like:
1168
267
  }
1169
268
  ```
1170
269
 
1171
- Balance check example (`--json --balance`):
1172
-
1173
- ```json
1174
- {
1175
- "success": true,
1176
- "type": "balance",
1177
- "spark": 12.34,
1178
- "sogni": 0.56
1179
- }
1180
- ```
270
+ `--json --balance` → `{ "success": true, "type": "balance", "spark": 12.34, "sogni": 0.56 }`. `--last --json` wraps the last render record in a `{ "success": true, ... }` envelope and exits 1 with `errorCode: "NO_LAST_RENDER"` when nothing has been rendered. In `--json` mode stdout always carries exactly one JSON object; SSE workflow frames and progress lines go to stderr.
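Because `--json` mode puts exactly one JSON object on stdout, a caller can parse stdout as a whole and branch on `success`. The Python sketch below illustrates that under stated assumptions: the `SogniError` class and `parse_cli_json` helper are hypothetical names, and the `error` field read on failure is an assumed key (only `success` and `errorCode` are confirmed by the contract text).

```python
import json


class SogniError(Exception):
    """Raised when the CLI reports a failure object (hypothetical helper)."""

    def __init__(self, message, code=None):
        super().__init__(message)
        self.code = code


def parse_cli_json(stdout: str) -> dict:
    """Parse the single JSON object that `--json` mode writes to stdout."""
    payload = json.loads(stdout.strip())
    if not isinstance(payload, dict):
        raise SogniError("expected a single JSON object on stdout")
    if payload.get("success") is not True:
        # "error" is an assumed field name; "errorCode" appears in the contract.
        raise SogniError(
            str(payload.get("error", "unknown error")),
            payload.get("errorCode"),
        )
    return payload
```

Progress lines and SSE frames go to stderr, so only stdout should be passed to the parser.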
1181
271
 
1182
272
  ## Cost
1183
273
 
1184
- Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient. Use `--token-type auto` to automatically fall back to SOGNI tokens for native Sogni models when SPARK is insufficient. Seedance and GPT Image 2 are vendor models and require Premium Spark eligibility; they never use SOGNI fallback.
1185
-
1186
- ## Persona System
1187
-
1188
- Personas are named people with saved reference photos and optional voice clips. They enable identity-preserving generation across sessions.
1189
-
1190
- ### Managing Personas
1191
-
1192
- ```bash
1193
- # Add a persona with a reference photo
1194
- sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --description "30s male, brown hair, brown eyes"
1195
-
1196
- # Add with voice clip for video voice cloning
1197
- sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip sarah-voice.webm --voice "warm alto with British accent"
1198
-
1199
- # List all personas
1200
- sogni-agent --persona-list --json
1201
-
1202
- # Resolve a persona by name, tag, or pronoun
1203
- sogni-agent --persona-resolve "me" --json
1204
-
1205
- # Generate using a persona (auto-injects photo as context)
1206
- sogni-agent --persona "Mark" -o ./hero.png "superhero in dramatic lighting"
1207
-
1208
- # Remove a persona
1209
- sogni-agent --persona-remove "Mark"
1210
- ```
1211
-
1212
- ### Persona Pipeline Rules
1213
-
1214
- When a user mentions a persona by explicit saved name, id, or tag/alias:
1215
-
1216
- 1. **For images:** Use `--persona "Name" "prompt"` which auto-injects the persona's reference photo as context and selects the Qwen editing model
1217
- 2. **For video with voice cloning:** The persona's voice clip is used as `--reference-audio-identity` when `--video` is combined with `--persona`
1218
- 3. **For video without voice clip:** Describe the voice in the prompt ("speaks in a warm alto with a British accent")
1219
-
1220
- **Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by explicit name, id, or tag/alias. For ad-hoc photos, use `-c` (context image) directly.
1221
-
1222
- ## Memory System
1223
-
1224
- Memories are persistent key-value preferences stored locally at `~/.config/sogni/memories.json`.
1225
-
1226
- ```bash
1227
- # Save a preference
1228
- sogni-agent --memory-set preferred_style "watercolor and soft lighting"
1229
- sogni-agent --memory-set aspect_ratio "16:9"
1230
- sogni-agent --memory-set favorite_artist "Studio Ghibli"
1231
-
1232
- # Read all memories
1233
- sogni-agent --memory-list --json
1234
-
1235
- # Get one memory
1236
- sogni-agent --memory-get preferred_style --json
1237
-
1238
- # Delete a memory
1239
- sogni-agent --memory-remove preferred_style
1240
- ```
1241
-
1242
- **Agent behavior:** Before generating, check memories with `--memory-list` and respect saved preferences. If the user says "I always want watercolor style", save it with `--memory-set`. Categories: `preference` (default), `fact`, `context`.
1243
-
1244
- ## Personality (Custom Agent Instructions)
1245
-
1246
- Users can set custom instructions that shape agent behavior, stored at `~/.config/sogni/personality.txt`.
1247
-
1248
- ```bash
1249
- # Set personality
1250
- sogni-agent --personality-set "Be concise, always use cinematic lighting, suggest bold creative ideas"
1251
-
1252
- # Read current personality
1253
- sogni-agent --personality-get --json
1254
-
1255
- # Clear (reset to default)
1256
- sogni-agent --personality-clear
1257
- ```
1258
-
1259
- **Agent behavior:** Check personality on startup and adopt those instructions. Personality overrides default style but not hard constraints (safety, tool usage rules).
1260
-
1261
- ## Style Transfer
1262
-
1263
- Apply artistic styles to existing images:
1264
-
1265
- ```bash
1266
- # Apply a named artist style
1267
- sogni-agent -c photo.jpg -o ./styled.png "Apply style: Andy Warhol pop art with bold primary colors"
1268
-
1269
- # Studio Ghibli transformation
1270
- sogni-agent -c photo.jpg -o ./ghibli.png "Apply style: Studio Ghibli watercolor with soft pastel sky and lush greenery"
1271
-
1272
- # For photos with people, always preserve identity
1273
- sogni-agent -c portrait.jpg -o ./styled.png "Apply style: oil painting in the style of Vermeer. Preserve all facial features, expressions, and identity."
1274
- ```
1275
-
1276
- **Tips:** Reference artists and styles BY NAME for best results. Use positive phrasing. For photos with people, always append identity preservation instructions.
1277
-
1278
- ## Change Angle (Novel View Synthesis)
1279
-
1280
- Generate a photo from a different camera angle:
1281
-
1282
- ```bash
1283
- # 3/4 view
1284
- sogni-agent --multi-angle -c subject.jpg --azimuth front-right "same subject"
1285
-
1286
- # Side view
1287
- sogni-agent --multi-angle -c subject.jpg --azimuth left --elevation eye-level --distance medium "same subject"
1288
-
1289
- # Full 360 turntable
1290
- sogni-agent --angles-360 -c subject.jpg "same subject"
1291
- ```
1292
-
1293
- **User term mapping:**
1294
- - "from the left" / "side view" → `--azimuth left`
1295
- - "3/4 view" / "three-quarter" → `--azimuth front-right`
1296
- - "from behind" / "back" → `--azimuth back`
1297
- - "looking up at" → `--elevation low-angle`
1298
- - "bird's eye" / "top-down" → `--elevation high-angle`
1299
- - "closeup" → `--distance close-up`
1300
-
1301
- ## Creative Workflow Patterns
1302
-
1303
- ### After Image Generation: Suggest Next Steps
1304
- - "Animate into a video" → `--video --ref <result>`
1305
- - "Apply a different style" → `-c <result> "Apply style: ..."`
1306
- - "Change the angle" → `--multi-angle -c <result>`
1307
- - "Generate variations" → `-n 3 "{style1|style2|style3}"`
1308
- - "Refine at higher quality" → use `-Q pro`
1309
-
1310
- ### After Video Generation: Suggest Next Steps
1311
- - "Try different motion" → re-generate with adjusted prompt
1312
- - "Add dialogue" → include spoken words in the LTX-2.3 prompt
1313
- - "Make it longer" → increase `--duration`
1314
- - "Combine videos" → `--concat-videos`
1315
- - "Add one soundtrack over stitched clips" → `--concat-videos ... --concat-audio <audio>`
1316
- - "Use a section of a source video/audio" → `--video-start`, `--audio-start`, and `--audio-duration`
1317
-
1318
- ### Music-to-Video Pipeline:
1319
- 1. Use the provided/generated audio file as `--ref-audio`
1320
- 2. If there is also a reference image, omit `--workflow` and let the CLI auto-select LTX 2.3 `ia2v`
1321
- 3. If there is no reference image, omit `--workflow` and let the CLI auto-select LTX 2.3 `a2v`
1322
- 4. Use `--workflow s2v` only for explicit face lip-sync with a face image
1323
- 5. If only part of the song/audio should drive the clip, pass `--audio-start <sec>` and optionally `--audio-duration <sec>`
1324
-
1325
- ### Multi-Persona Scene:
1326
- 1. Resolve all personas: `--persona-resolve "Mark" --json` and `--persona-resolve "Sarah" --json`
1327
- 2. Generate scene with both: `-c mark-photo.jpg -c sarah-photo.jpg "Mark and Sarah at a cafe, use face from picture 1 for Mark, face from picture 2 for Sarah"`
1328
- 3. Animate with one persona's voice identity: `--video --ref <scene.png> --reference-audio-identity <mark-voice.webm> "MARK: \"Exact spoken words.\""`
274
+ Uses Spark tokens from the user's Sogni account. 512x512 images are most cost-efficient. `-n` is safety-capped at 16 outputs per call (`SOGNI_MAX_COUNT` raises it deliberately). Seedance and GPT Image 2 are vendor models requiring Premium Spark eligibility.
1329
275
 
1330
276
  ## Troubleshooting
1331
277
 
1332
- - **Auth errors**: Check `SOGNI_API_KEY` or the API key in `~/.config/sogni/credentials`
1333
- - **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and the current wrapper caps non-WAN video dimensions at 2048px on the long side. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
1334
- - **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
1335
- - **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
1336
- - **Timeouts**: Try a faster model or increase `-t` timeout
1337
- - **No workers**: Check https://sogni.ai for network status
278
+ - **Anything broken?** Run `sogni-agent doctor` first: it checks Node, credentials (and file permissions), config-dir writability, ffmpeg, live auth, and version freshness, with a fix in every failure detail.
279
+ - **Auth errors:** check `SOGNI_API_KEY` or `~/.config/sogni/credentials` (key from https://dashboard.sogni.ai, account menu).
280
+ - **Video size errors:** sizes are model-specific (WAN ÷16, min 480, max 1536; LTX ÷64, long side ≤2048). The CLI auto-adjusts for local refs; `--strict-size` makes it fail with a suggested size instead. Details in [`references/models.md`](./references/models.md).
281
+ - **Timeouts:** try a faster model or raise `-t`.
282
+ - **No workers:** check https://sogni.ai for network status.
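The video sizing limits in the list above can be checked before calling the CLI. The Python sketch below snaps a requested size to the nearest valid multiple and clamps it to the stated range. It is an illustration under assumptions, not the CLI's own adjustment logic: the function name `snap_video_size` is hypothetical, and the LTX minimum of 64 is assumed because the text only gives the divisor and the 2048 cap.

```python
from typing import Tuple

# Limits from the troubleshooting notes; the LTX "min" is an assumption.
RULES = {
    "wan": {"divisor": 16, "min": 480, "max": 1536},
    "ltx": {"divisor": 64, "min": 64, "max": 2048},
}


def _snap(value: int, divisor: int, low: int, high: int) -> int:
    """Round to the nearest multiple of `divisor`, then clamp to [low, high]."""
    snapped = (value + divisor // 2) // divisor * divisor
    lowest = -(-low // divisor) * divisor   # smallest valid multiple >= low
    highest = high // divisor * divisor     # largest valid multiple <= high
    return max(lowest, min(highest, snapped))


def snap_video_size(width: int, height: int, family: str) -> Tuple[int, int]:
    """Return a (width, height) pair that satisfies the family's size rules."""
    rule = RULES[family]
    return (
        _snap(width, rule["divisor"], rule["min"], rule["max"]),
        _snap(height, rule["divisor"], rule["min"], rule["max"]),
    )
```

For example, a 1920x1080 request in the LTX family snaps to 1920x1088, which matches the `-w 1920 -h 1088` dimensions recommended for high-res video.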
283
+
284
+ ## Reference Index (read before acting)
285
+
286
+ | Read this | When the task involves |
287
+ |-----------|------------------------|
288
+ | [`references/video-prompting.md`](./references/video-prompting.md) | Writing any LTX video prompt; "hd/1080p/4k" requests; orientation/aspect mapping; camera language |
289
+ | [`references/video-editing.md`](./references/video-editing.md) | Animate between images, continue/bridge videos, 360 turnarounds, concat, audio remix/layering, v2v ControlNet |
290
+ | [`references/hosted-api.md`](./references/hosted-api.md) | `--api-chat`, `--durable-chat`, `--api-workflow`, workflow templates, replays, Seedance reference modes, cost controls |
291
+ | [`references/models.md`](./references/models.md) | Choosing models, sizing/divisibility rules, gpt-image-2 limits, music model options |
292
+ | [`references/personas-memory.md`](./references/personas-memory.md) | Persona CRUD/voice cloning, multi-persona scenes, memories, personality, style transfer, photo restoration |
293
+ | [`references/openclaw-config.md`](./references/openclaw-config.md) | OpenClaw plugin config defaults and overrides |
294
+ | [`skills/README.md`](./skills/README.md) | Hosted per-skill tool surface (for hosts that load focused capability subsets) |