@sogni-ai/sogni-creative-agent-skill 3.3.5 → 3.5.0

package/SKILL.md CHANGED
@@ -2,9 +2,9 @@
2
2
  name: sogni-creative-agent-skill
3
3
  description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories, custom personality, style transfer, angle synthesis, Seedance/LTX/WAN video, music/lyrics, hosted chat, durable workflows, replay records, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
4
4
  metadata:
5
- version: "3.3.5"
5
+ version: "3.5.0"
6
6
  homepage: https://sogni.ai
7
- clawdbot:
7
+ openclaw:
8
8
  emoji: "🎨"
9
9
  primaryEnv: "SOGNI_API_KEY"
10
10
  os: ["darwin", "linux", "win32"]
@@ -22,60 +22,56 @@ metadata:
22
22
  config:
23
23
  - "~/.config/sogni/credentials"
24
24
  - "~/.openclaw/openclaw.json"
25
- - "~/.clawdbot/media/inbound"
25
+ - "~/.openclaw/media/inbound"
26
26
  - "~/.config/sogni/last-render.json"
27
27
  - "~/Downloads/sogni"
28
28
  install:
29
29
  - id: npm
30
30
  kind: exec
31
- command: "cd {{skillDir}} && cp skill-package.json package.json && npm i"
31
+ command: "cd {{skillDir}} && ([ -f package.json ] || cp skill-package.json package.json) && npm i"
32
32
  label: "Prepare runtime dependencies"
33
33
  ---
34
34
 
35
35
  # Sogni Image, Video & Music Generation
36
36
 
37
- Generate **images, videos, and music** using Sogni AI's decentralized GPU network.
37
+ Generate **images, videos, and music** using Sogni AI's decentralized GPU network through the `sogni-agent` CLI.
38
38
 
39
- > **Per-skill view**: hosts that want to load focused capabilities rather than this monolith can read [`skills/README.md`](./skills/README.md) for the per-skill index: one markdown file per skill (`image_generation`, `image_editing`, `video_generation`, `video_editing`, `music_generation`, `media_analysis`, `persona_management`, `app_settings`, `composition_planning`, plus the always-loaded `quality_audit`, `session_control`, `asset_reference_management`). Each file mirrors the canonical manifest in `@sogni/creative-agent`. The whole-monolith load below stays the default for OpenClaw / Claude Code / Hermes Agent / Manus AI integrations.
39
+ > **Deep-dive references:** this file holds the rules you must always follow plus the everyday commands. Detailed guides live in [`references/`](./references/) โ€” read the matching file *before* acting on those tasks (table at the end of this file). If the `references/` directory is not present in your install, run `sogni-agent --help` for the full flag reference or fetch the guides from `https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/references/`.
40
+ >
41
+ > **Per-skill view:** hosts that load focused capabilities rather than one artifact can read [`skills/README.md`](./skills/README.md) for the per-skill index of the hosted tool surface.
40
42
 
41
43
  ## Install Request Policy
42
44
 
43
- When a user asks to install this plugin, skill, or Sogni Creative Agent Skill, install it as the command-line tool plus this skill.
44
-
45
- Default install path:
45
+ When a user asks to install this plugin or skill, install the command-line tool plus this skill:
46
46
 
47
47
  ```bash
48
48
  npm install -g @sogni-ai/sogni-creative-agent-skill@latest
49
49
  sogni-agent --version
50
50
  ```
51
51
 
52
- Then configure the agent/runtime to use this `SKILL.md` and invoke the `sogni-agent` CLI.
53
-
54
- Always invoke the globally installed `sogni-agent` command. Do not call `node {{skillDir}}/sogni-agent.mjs` or `node sogni-agent.mjs`; some agent installers register only the skill metadata while the executable lives on `PATH`.
55
-
56
- For upgrades, prefer package-manager updates or direct operations on an existing checkout. Do not generate clone-or-pull shell bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs; agent command scanners may require approval for those patterns.
52
+ Then configure the agent/runtime to use this `SKILL.md` and invoke the `sogni-agent` CLI. The one-command alternative `npx setup-sogni-agent-skill` auto-detects Claude Code, Codex CLI, and Hermes (it does not configure OpenClaw).
57
53
 
58
- Agent-safe CLI upgrade:
54
+ After any install or upgrade, verify with:
59
55
 
60
56
  ```bash
61
- npm install -g @sogni-ai/sogni-creative-agent-skill@latest
62
- sogni-agent --version
57
+ sogni-agent doctor
63
58
  ```
64
59
 
65
- Agent-safe update for an existing local checkout:
60
+ Agents should run `sogni-agent doctor --json` and confirm `"success": true` before reporting the install as working.
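
The `"success": true` check described above can be sketched as a small shell helper. This is a minimal illustration, not part of the CLI: `doctor_report_ok` is a hypothetical name, and the sample payloads are assumptions, since only the `success` field is documented here.

```bash
# Return 0 when a doctor JSON report read from stdin contains "success": true.
# Hypothetical helper; in real use, pipe `sogni-agent doctor --json` into it.
doctor_report_ok() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Assumed sample payloads; the real report likely carries more fields.
sample_ok='{"success": true}'
sample_bad='{"success": false}'

printf '%s\n' "$sample_ok"  | doctor_report_ok && echo "install verified"
printf '%s\n' "$sample_bad" | doctor_report_ok || echo "install not verified"
```

A pattern match like this would also accept a nested `"success": true` anywhere in the report, so a real JSON parser is the safer choice when one is available.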
66
61
 
67
- ```bash
68
- DEST="$HOME/Documents/git/sogni/sogni-creative-agent-skill"
69
- git -C "$DEST" pull --ff-only
70
- npm --prefix "$DEST" install
71
- ```
62
+ Always invoke the globally installed `sogni-agent` command. Do not call `node {{skillDir}}/sogni-agent.mjs` or `node sogni-agent.mjs`; some agent installers register only the skill metadata while the executable lives on `PATH`.
63
+
64
+ For upgrades, prefer `sogni-agent self-update`, package-manager updates, or direct operations on an existing checkout (`git -C "$DEST" pull --ff-only && npm --prefix "$DEST" install`). Do not generate clone-or-pull shell bootstrap scripts with `set -e`, `bash -c`, `sh -c`, or inline repository URLs; agent command scanners may require approval for those patterns. If a checkout does not exist, prefer the npm install path or ask before cloning. When an update notice appears, offer the user the upgrade (`sogni-agent self-update`); if they decline, run `sogni-agent --snooze-update` so they are not re-nagged daily, and `sogni-agent --whats-new` after upgrading to summarize changes.
65
+
66
+ ## Uninstall Request Policy
72
67
 
73
- If that checkout does not exist, prefer the npm-based local skill install below, or ask before cloning.
68
+ When a user asks to uninstall, run `npx setup-sogni-agent-skill --uninstall --remove-cli --purge`. This removes the skill files, the global CLI, and the user's data in `~/.config/sogni/` after backing it up to `~/.config/sogni.backup-<timestamp>.tar.gz`. Always tell the user the backup path and that it contains their API key. To keep their data, omit `--purge`.
74
69
 
75
70
  ## Setup
76
71
 
77
72
  1. **Get your Sogni API key** by logging into https://dashboard.sogni.ai and opening the account menu.
78
- 2. **Create an API key credentials file:**
73
+ 2. **Create the credentials file** (or just export `SOGNI_API_KEY`):
74
+
79
75
  ```bash
80
76
  mkdir -p ~/.config/sogni
81
77
  cat > ~/.config/sogni/credentials << 'EOF'
@@ -84,31 +80,9 @@ EOF
84
80
  chmod 600 ~/.config/sogni/credentials
85
81
  ```
86
82
 
87
- You can also export `SOGNI_API_KEY` instead of writing the file. The API key can always be found by logging into https://dashboard.sogni.ai and opening the account menu.
88
-
89
- 3. **Install the CLI and skill by default:**
90
- ```bash
91
- npm install -g @sogni-ai/sogni-creative-agent-skill@latest
92
- sogni-agent --version
93
- ```
94
-
95
- Configure the agent/runtime to use this `SKILL.md`.
96
-
97
- 4. **Install dependencies if working from a clone:**
98
- ```bash
99
- cd /path/to/sogni-creative-agent-skill
100
- npm i
101
- ```
102
-
103
- 5. **Or install from npm into a local skill directory (no git clone):**
104
- ```bash
105
- mkdir -p ~/.clawdbot/skills
106
- cd ~/.clawdbot/skills
107
- npm i @sogni-ai/sogni-creative-agent-skill
108
- ln -sfn node_modules/@sogni-ai/sogni-creative-agent-skill sogni-creative-agent-skill
109
- ```
83
+ 3. **Verify:** `sogni-agent doctor`
110
84
 
111
- When this skill is distributed via ClawHub, it bootstraps its local runtime dependencies from `skill-package.json` during install. That avoids relying on a root `package.json` being present in the published skill artifact.
85
+ When this skill is distributed via ClawHub, it bootstraps its runtime dependencies from `skill-package.json` during install (the install hook skips the copy when a real `package.json` is already present, so it never clobbers a git checkout).
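
The effect of the guarded copy in the install hook can be tried on a throwaway directory. The sketch below reuses the same `[ -f package.json ] || cp ...` guard; the function name `prepare_manifest` and the file contents are illustrative assumptions, not part of the package.

```bash
# Copy the fallback manifest only when no package.json exists yet,
# using the same guard as the install hook.
prepare_manifest() {
  dir="$1"
  [ -f "$dir/package.json" ] || cp "$dir/skill-package.json" "$dir/package.json"
}

# Demonstrate on a temporary directory (contents are made up for the demo).
demo="$(mktemp -d)"
echo '{"name":"fallback"}' > "$demo/skill-package.json"

prepare_manifest "$demo"   # no package.json yet, so the fallback is copied
echo '{"name":"checkout"}' > "$demo/package.json"
prepare_manifest "$demo"   # package.json exists, so it is left untouched
cat "$demo/package.json"   # prints {"name":"checkout"}
rm -rf "$demo"
```

With the older unconditional `cp`, the second call would have overwritten the checkout's manifest with the fallback.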
112
86
 
113
87
  ## Output Path Convention
114
88
 
@@ -120,964 +94,167 @@ sogni-agent -o cat.png "a cat wearing a hat" # ✓ lands in PWD
120
94
  sogni-agent -o /tmp/cat.png "a cat wearing a hat" # ✗ avoid - user can't easily find it
121
95
  ```
122
96
 
123
- `/tmp` (and `mkdtempSync(...)`) is reserved internally for transient intermediate files the CLI cleans up itself (audio re-encodes, intermediate clips during stitching). Final renders the user is asking for must remain inside their working directory unless they explicitly request a different location.
97
+ `/tmp` is reserved for transient intermediate files the CLI cleans up itself. Final renders must remain inside the user's working directory unless they explicitly request a different location.
124
98
 
125
99
  ## Filesystem Paths and Overrides
126
100
 
127
- Default file paths used by this skill:
128
-
129
- - API key credentials file (read): `~/.config/sogni/credentials`
130
- - Last render metadata (read/write): `~/.config/sogni/last-render.json`
131
- - OpenClaw config (read): `~/.openclaw/openclaw.json`
132
- - Media listing for `--list-media` (read): `~/.clawdbot/media/inbound`
101
+ - API key credentials file (read): `~/.config/sogni/credentials` (`SOGNI_CREDENTIALS_PATH`)
102
+ - Last render metadata (read/write): `~/.config/sogni/last-render.json` (`SOGNI_LAST_RENDER_PATH`)
103
+ - Memories / personality / personas (read/write): `~/.config/sogni/`
104
+ - OpenClaw config (read): `~/.openclaw/openclaw.json` (`OPENCLAW_CONFIG_PATH`)
105
+ - Media listing for `--list-media` (read): `~/.openclaw/media/inbound`, falling back to the legacy `~/.clawdbot/media/inbound` when only it exists (`SOGNI_MEDIA_INBOUND_DIR`)
106
+ - Custom ffmpeg binary: `FFMPEG_PATH`
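
The lookup order for the `--list-media` directory listed above can be sketched as a shell function. This is an illustration rather than the CLI's actual code: `resolve_media_inbound_dir` is a hypothetical name, and giving `SOGNI_MEDIA_INBOUND_DIR` precedence over both default locations is an assumption based on it being listed as the override variable.

```bash
# Resolve the inbound media directory:
#   1. SOGNI_MEDIA_INBOUND_DIR, when set (assumed to take precedence)
#   2. ~/.openclaw/media/inbound
#   3. legacy ~/.clawdbot/media/inbound, only when it exists and the new one does not
resolve_media_inbound_dir() {
  if [ -n "${SOGNI_MEDIA_INBOUND_DIR:-}" ]; then
    printf '%s\n' "$SOGNI_MEDIA_INBOUND_DIR"
    return
  fi
  new_dir="$HOME/.openclaw/media/inbound"
  legacy_dir="$HOME/.clawdbot/media/inbound"
  if [ ! -d "$new_dir" ] && [ -d "$legacy_dir" ]; then
    printf '%s\n' "$legacy_dir"
  else
    printf '%s\n' "$new_dir"
  fi
}

# Print the directory that would be used in the current environment.
resolve_media_inbound_dir
```

When neither directory exists, the sketch falls through to the new `~/.openclaw` location, which matches the "only it exists" wording of the fallback rule.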
133
107
 
134
- Path override environment variables:
108
+ ## Recommended path: hosted Sogni Intelligence endpoints
135
109
 
136
- - `SOGNI_CREDENTIALS_PATH`
137
- - `SOGNI_LAST_RENDER_PATH`
138
- - `SOGNI_MEDIA_INBOUND_DIR`
139
- - `OPENCLAW_CONFIG_PATH`
140
-
141
- ## Recommended path: route through the hosted Sogni Intelligence endpoints
142
-
143
- For any natural-language creative request (anything that should be planned, multi-step, resumable, or that benefits from tool selection, repair, or durable workflows), prefer the hosted Sogni Intelligence endpoints over the direct-to-SDK media flags. The hosted surfaces are the canonical home for OpenAI-compatible chat, server-side creative tool dispatch, Structured Contracts v1 (gating policies, repair recipes, prompt contracts), durable chat runs, durable workflows, workflow templates, replay, and asset-manifest mapping. They stay aligned with `sogni-chat`, `sogni-api`, and the rest of the `@sogni/creative-agent` consumers.
110
+ For any natural-language creative request that should be planned, multi-step, resumable, or benefit from server-side tool selection and repair, prefer the hosted endpoints over direct-to-SDK flags. **Read [`references/hosted-api.md`](./references/hosted-api.md) first** for the full contract (tool surfaces, durable workflows, templates, replays, Seedance reference modes, media-reference uploads, cost controls):
144
111
 
145
112
  ```bash
146
113
  # Natural-language creative request (LLM picks the tool, dispatches, repairs)
147
114
  sogni-agent --api-chat "Turn the attached product photo into a launch poster" --ref product.jpg
148
115
 
149
116
  # Durable hosted chat run (persisted event log + SSE stream)
150
- SOGNI_SKILL_USE_SDK_TRANSPORT=1 sogni-agent --durable-chat \
151
- "Create a four-shot launch campaign, generate the key art, and animate the hero clip"
152
-
153
- # Multi-step durable workflow (resumable, replay-friendly, server-orchestrated)
154
- sogni-agent --api-workflow \
155
- --video-prompt "The camera slowly pushes in" \
156
- "A graphite robot sketch on a drafting table"
157
-
158
- # Storyboard → keyframe → Seedance, all server-side
159
- sogni-agent --api-workflow storyboard-video --storyboard-frames 6 -Q hq \
160
- "Create a 9:16 bakery launch video with a neon street-window reveal"
161
- ```
162
-
163
- The direct-to-SDK flags below remain available for explicit one-shot generation when you already know the exact model, dimensions, and prompt and don't need LLM planning. Use them when latency or cost rules out the LLM round-trip.
164
-
165
- ## Usage (direct-to-SDK image, video & music)
166
-
167
- ```bash
168
- # Generate and get URL
169
- sogni-agent "a cat wearing a hat"
170
-
171
- # Quality presets (recommended for direct mode - auto-selects model, steps, and size)
172
- sogni-agent -Q fast "a cat wearing a hat" # z_image_turbo, 8 steps, 512x512 (~5-10s)
173
- sogni-agent -Q hq "a cat wearing a hat" # z_image_turbo, default steps, 768x768 (~10-15s)
174
- sogni-agent -Q pro "a cat wearing a hat" # flux2_dev, 40 steps, 1024x1024 (~2min)
175
-
176
- # Dynamic prompt variations - diverse images in one call
177
- sogni-agent -n 3 "a {red|blue|green} sports car"
178
- # → generates "a red sports car", "a blue sports car", "a green sports car"
179
-
180
- # Prompt-only video takes from the same source image
181
- sogni-agent --video --ref hero.png -n 3 --duration 5 \
182
- "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
183
-
184
- # Token auto-fallback for native Sogni models (tries SPARK, falls back to SOGNI)
185
- sogni-agent --token-type auto "a cat wearing a hat"
186
-
187
- # Save to file (relative paths land in the current working directory)
188
- sogni-agent -o ./cat.png "a cat wearing a hat"
189
-
190
- # JSON output (for scripting)
191
- sogni-agent --json "a cat wearing a hat"
192
-
193
- # Check token balances (no prompt required)
194
- sogni-agent --balance
195
-
196
- # Check token balances in JSON
197
- sogni-agent --json --balance
198
-
199
- # Quiet mode (suppress progress)
200
- sogni-agent -q -o ./cat.png "a cat wearing a hat"
201
-
202
- # Direct music/audio generation
203
- sogni-agent --music --duration 30 \
204
- "uplifting cinematic synthwave theme for a product launch"
205
-
206
- # Song with lyrics and musical controls
207
- sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 \
208
- --keyscale "C major" --output-format mp3 "bright indie pop chorus"
209
-
210
- # Hosted API chat: natural-language creative-agent tool execution
211
- sogni-agent --api-chat "Create a 4-shot product video concept for a red sneaker"
212
-
213
- # Hosted API chat with image vision and media-reference metadata
214
- sogni-agent --api-chat --ref product.jpg \
215
- "Turn this into a launch poster and describe the edit plan"
216
-
217
- # Sogni Intelligence model/replay utilities
218
- sogni-agent --list-api-models
219
- sogni-agent --api-chat --task-profile reasoning --max-tokens 2000 \
220
- "Plan a concise multi-step product launch workflow"
221
- sogni-agent --list-replays 20
222
- sogni-agent --get-replay run_abc123 --json
223
-
224
- # Draft a savable workflow template through the hosted creative-agent tool loop
225
- sogni-agent --api-chat \
226
- "Design a reusable workflow for a 9:16 product teaser from one product photo"
227
-
228
- # Durable API workflow: generated keyframe to video with resumable workflow record
229
- sogni-agent --api-workflow \
230
- --video-prompt "The camera slowly pushes in as the sketch comes alive" \
231
- "A graphite robot sketch on a drafting table"
232
-
233
- # Durable API workflow with media reference and cost controls
234
- sogni-agent --api-workflow \
235
- --ref https://cdn.example.com/sketch.png \
236
- --workflow-max-cost 25 --confirm-cost \
237
- --video-prompt "The camera slowly pushes in as the sketch comes alive" \
238
- "Animate the referenced sketch"
239
-
240
- # Exact durable workflow input with explicit steps
241
- sogni-agent --api-workflow --workflow-input @workflow-input.json \
242
- --workflow-idempotency-key product-teaser-v1
243
-
244
- # Durable storyboard-video workflow: storyline -> GPT Image 2 storyboard -> Seedance
245
- sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
246
- "Create a 9:16 bakery launch video with a neon street-window reveal"
247
-
248
- # Workflow management
249
- sogni-agent --list-workflows
250
- sogni-agent --resume-workflow wf_durable_workflow_123
251
- ```
252
-
253
- Use `--api-chat` for text-first natural-language workflows that should go through
254
- Sogni API's OpenAI-compatible `POST /v1/chat/completions` loop. The public
255
- REST body uses snake_case controls such as `tool_choice`, `response_format`,
256
- `task_profile`, `token_type`, `app_source`, `media_references`,
257
- `chat_template_kwargs`, `sogni_tools`, and `sogni_tool_execution`. The endpoint
258
- normalizes OpenAI `developer` messages to `system`; when a developer message is
259
- present and no explicit `task_profile` is supplied, the server treats the task
260
- as `coding`. The CLI sanitizes prompt-injection markers before forwarding
261
- messages and sends API-key auth so hosted Sogni tools can execute server-side.
262
-
263
- Hosted tool surfaces are split by `sogni_tools`:
264
-
265
- - `creative-tools` is the public API default when `sogni_tools` is omitted or
266
- true. It exposes generation/editing tools (`generate_image`,
267
- `generate_video`, `generate_music`, `edit_image`, `apply_style`,
268
- `restore_photo`, `refine_result`, `animate_photo`, `change_angle`,
269
- `video_to_video`, `stitch_video`, `orbit_video`, `dance_montage`,
270
- `sound_to_video`, `extend_video`, `replace_video_segment`, `overlay_video`,
271
- `add_subtitles`), media-analysis tools (`analyze_image`, `analyze_video`,
272
- `extract_metadata`), and lightweight composition tools (`enhance_prompt`,
273
- `compose_lyrics`, `compose_instrumental`, `compose_script`).
274
- - `creative-agent` is this CLI's default for `--api-chat`. It includes the
275
- `creative-tools` surface plus session-control tools
276
- (`ask_clarifying_question`, `finalize_response`), asset-manifest tools
277
- (`create_asset_manifest`, `inspect_asset`, `label_asset`,
278
- `map_assets_for_model`, `validate_asset_references`), and durable planning
279
- tools (`compose_workflow`, `compose_workflow_template`). Use this surface
280
- when the model should design one-shot workflow plans, draft savable workflow
281
- templates, or maintain stable asset references across a multi-step turn.
282
- - `none` disables Sogni tool injection and leaves only caller-supplied OpenAI
283
- tools on raw API/SDK requests. In the CLI, use it with
284
- `--no-api-tool-execution` when you want text-only planning without hosted
285
- Sogni tool dispatch.
286
-
287
- Use `--durable-chat` for long-running, LLM-in-the-loop turns that should be
288
- persisted as `POST /v1/chat/runs` records instead of a single
289
- `/v1/chat/completions` request. Chat runs keep an event log, stream via
290
- `/v1/chat/runs/:id/events/stream`, support cancellation, and can pause for
291
- persisted cost approval (`/v1/chat/runs/:id/confirm-cost`) in first-party
292
- clients. The CLI can start and stream durable chat runs through the SDK
293
- transport when `SOGNI_SKILL_USE_SDK_TRANSPORT=1` is set.
294
-
295
- Use `--api-workflow` when the caller already knows it wants an async durable
296
- workflow under `POST /v1/creative-agent/workflows`. The API now accepts either
297
- an inline durable plan (`input.steps`) or a saved workflow template invocation
298
- (`workflow_id` plus `inputs`) and rejects requests that provide both. The CLI's
299
- generated-keyframe and `storyboard-video` presets submit inline `input.steps`;
300
- `--workflow-input @workflow-input.json` supplies that `input` object directly.
301
- Saved template CRUD lives at `/v1/creative-agent/workflows/templates`, and a
302
- saved template can later be run by API/SDK callers with `workflow_id + inputs`.
303
- Use `compose_workflow_template` through `--api-chat` to draft a savable template;
304
- the caller is still responsible for persisting the returned `template_draft`.
305
-
306
- Exact multi-step workflow plans should use explicit step dependencies, including
307
- `replace_video_segment` steps with bounded `replacementStartSeconds` /
308
- `replacementEndSeconds` when interleaving existing video slices. Workflow JSON
309
- can bind request media into step arguments with `sourceStepId: "$input_media"`.
310
- Use `--api-workflow storyboard-video` when the hosted sequence should generate a
311
- storyline, create one GPT Image 2 storyboard sheet, and feed that image artifact
312
- into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT
313
- Image 2 low|medium|high quality for the storyboard sheet.
314
-
315
- Hosted API requests forward media references from `-c`, `--ref`, `--ref-end`,
316
- `--ref-audio`, `--reference-audio-identity`, and `--ref-video` as
317
- `media_references` metadata. `--ref-audio` and `--ref-video` are repeatable in
318
- api-chat / durable-chat mode; each entry uploads independently and is exposed
319
- to the hosted LLM at `@Audio1` / `@Audio2` / `@Video1` etc. API chat also
320
- attaches image refs as vision inputs. Local file references are uploaded to
321
- Sogni media storage first, then forwarded as retrievable URLs for hosted chat
322
- and durable workflows. Use the direct CLI path for private media that must not
323
- leave the local machine.
324
-
325
- ### Seedance reference modes (mutually exclusive)
326
-
327
- When `--video -m seedance2` or `-m seedance2-fast` is selected, the skill
328
- exposes the same two-mode pattern that the hosted chat surfaces. Pick one
329
- mode per video request:
330
-
331
- - **Dedicated frame mode: `--ref` and/or `--ref-end`.** First-class
332
- first-frame / last-frame anchoring; the Seedance worker pins them as
333
- parameter-mode firstFrame / lastFrame. Max 2 images.
334
- - **Loose reference mode: `-c/--context` plus optional `--ref-audio`
335
- extras and `--ref-video` extras.** Anchor frame intent in the prompt with
336
- `@Image1` / `@Image2` / `@Video1` / `@Audio1` etc. (e.g. *"Use @Image1 as
337
- the opening shot reference"*). Supports up to 9 image refs, 3 video refs,
338
- 3 audio refs, and 12 total reference assets per video request. The
339
- numeric caps come from the canonical
340
- `@sogni-ai/sogni-protocol/catalogs/seedance-reference-limits.json` catalog,
341
- surfaced through `@sogni-ai/sogni-intelligence-client/tools` as
342
- `SEEDANCE_REFERENCE_LIMITS` and `validateSeedanceReferenceCounts()`.
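
The loose-reference caps listed above (9 image refs, 3 video refs, 3 audio refs, 12 assets in total) can be checked before submitting a request. The function below is a hypothetical sketch for illustration only; it is not the `validateSeedanceReferenceCounts()` helper named above, and its name and messages are assumptions.

```bash
# Hypothetical pre-flight check for Seedance loose reference mode.
# Arguments: <image count> <video count> <audio count>
check_seedance_loose_refs() {
  images="$1"; videos="$2"; audios="$3"
  total=$((images + videos + audios))
  if [ "$images" -gt 9 ]; then echo "too many image refs: $images > 9"; return 1; fi
  if [ "$videos" -gt 3 ]; then echo "too many video refs: $videos > 3"; return 1; fi
  if [ "$audios" -gt 3 ]; then echo "too many audio refs: $audios > 3"; return 1; fi
  if [ "$total" -gt 12 ]; then echo "too many reference assets: $total > 12"; return 1; fi
  echo "ok: $total reference assets"
}

check_seedance_loose_refs 4 1 1          # prints: ok: 6 reference assets
check_seedance_loose_refs 9 3 3 || true  # prints: too many reference assets: 15 > 12
```

The second call shows why the total cap needs its own check: each per-type count is within its limit, yet the combined request exceeds 12 assets.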
343
-
344
- Combining `--ref` / `--ref-end` with `-c/--context` on Seedance is rejected
345
- client-side with a clear error pointing to the correct mode. In CLI direct-gen
346
- mode, additional `--ref-audio` / `--ref-video` entries beyond the first must
347
- be HTTPS URLs (the primary entry can still be a local file path); for local
348
- multi-file Seedance uploads, use `--api-chat` / `--durable-chat` instead. Use
349
- `--workflow-max-cost <n>` plus `--confirm-cost` / `--no-confirm-cost` to forward
350
- explicit workflow cost policy, and `--workflow-idempotency-key` when retrying a
351
- workflow start request.
352
-
353
- Sogni Intelligence utilities are exposed through the same API-key path:
354
- `--list-api-models` / `--get-api-model <id>` read `/v1/models`, `--task-profile`
355
- and `--max-tokens` tune `/v1/chat/completions`, and `--list-replays`,
356
- `--get-replay`, and `--ingest-replay` manage `/v1/replay/records` RunRecords for
357
- replay/debug viewers. The public chat endpoint also accepts OpenAI-standard
358
- `reasoning_effort` / `reasoning.effort` in raw API requests. The CLI's
359
- `--thinking` / `--no-thinking` flags are forwarded as
360
- `chat_template_kwargs.enable_thinking`; current hosted Qwen requests may
361
- normalize thinking on server-side, so do not rely on `--no-thinking` as a hard
362
- suppression switch for `/v1/chat/completions`.
363
- Hosted API modes require `SOGNI_API_KEY`; this skill's CLI uses API-key
364
- authentication.
365
-
366
- For durable hosted chat runs (long-running multi-tool turns that should
367
- survive a client disconnect), the SDK now exposes
368
- `sogni.chat.runs.{create, get, cancel, streamEvents}`.
369
- Set `SOGNI_SKILL_USE_SDK_TRANSPORT=1` to route hosted workflow + chat
370
- operations through the SDK transport instead of the legacy
371
- SSRF-validated fetch path. The skill's `sogni-hosted-client.mjs`
372
- factory still validates `restEndpoint` / `socketEndpoint` against the
373
- SSRF guard before constructing the SDK client, so the safety contract
374
- holds.
375
- For `--durable-chat`, stream output as the run advances; the CLI reports
376
- assistant deltas plus de-duplicated per-job progress / ETA / result lines from
377
- hosted run events.
378
-
379
- When changing hosted API chat/workflow behavior, keep reusable validation,
380
- workflow compilation, repair-control, and guard telemetry logic in the shared
381
- Sogni runtime first, then sync it into this public skill. The public skill
382
- should consume generated or shared typed contracts instead of adding
383
- skill-local regex guards. Keep local regex limited to bounded CLI/fact
384
- extraction such as paths, URLs, extensions, dimensions, durations, and explicit
385
- positions.
386
-
387
- ## Options
388
-
389
- | Flag | Description | Default |
390
- |------|-------------|---------|
391
- | `-Q, --quality <tier>` | Quality preset: fast\|hq\|pro (auto-selects model/steps/size) | - |
392
- | `-o, --output <path>` | Save to file | prints URL |
393
- | `-m, --model <id>` | Model ID (overrides --quality) | z_image_turbo_bf16 |
394
- | `-w, --width <px>` | Width | 512 |
395
- | `-h, --height <px>` | Height | 512 |
396
- | `-n, --count <num>` | Number of images (supports {a\|b\|c} prompt variations) | 1 |
397
- | `-t, --timeout <sec>` | Timeout seconds | 30 (300 for video) |
398
- | `-s, --seed <num>` | Specific seed | random |
399
- | `--last-seed` | Reuse seed from last render | - |
400
- | `--seed-strategy <s>` | Seed strategy: random\|prompt-hash | prompt-hash |
401
- | `--multi-angle` | Multiple angles LoRA mode (Qwen Image Edit) | - |
402
- | `--angles-360` | Generate 8 azimuths (front -> front-left) | - |
403
- | `--angles-360-video` | Assemble looping 360 mp4 using i2v between angles (requires ffmpeg) | - |
404
- | `--azimuth <key>` | front\|front-right\|right\|back-right\|back\|back-left\|left\|front-left | front |
405
- | `--elevation <key>` | low-angle\|eye-level\|elevated\|high-angle | eye-level |
406
- | `--distance <key>` | close-up\|medium\|wide | medium |
407
- | `--angle-strength <n>` | LoRA strength for multiple_angles | 0.9 |
408
- | `--angle-description <text>` | Optional subject description | - |
409
- | `--steps <num>` | Override steps (model-dependent) | - |
410
- | `--guidance <num>` | Override guidance (model-dependent) | - |
411
- | `--output-format <f>` | Image output format: png\|jpg, or webp for gpt-image-2 | png |
412
- | `--sampler <name>` | Sampler (model-dependent) | - |
413
- | `--scheduler <name>` | Scheduler (model-dependent) | - |
414
- | `--lora <id>` | LoRA id (repeatable, edit only) | - |
415
- | `--loras <ids>` | Comma-separated LoRA ids | - |
416
- | `--lora-strength <n>` | LoRA strength (repeatable) | - |
417
- | `--lora-strengths <n>` | Comma-separated LoRA strengths | - |
418
- | `--token-type <type>` | Token type: spark\|sogni\|auto (auto retries with alternate) | spark |
419
- | `--balance, --balances` | Show SPARK/SOGNI balances and exit | - |
420
- | `-c, --context <path>` | Context image for editing | - |
421
- | `--last-image` | Use last generated image as context/ref | - |
422
- | `--music` | Generate music/audio instead of image | - |
423
- | `--music-model <id>` | Music model: turbo\|sft\|ace_step_1.5_turbo\|ace_step_1.5_sft | ace_step_1.5_turbo |
424
- | `--lyrics <text>` | Optional lyrics for song generation | - |
425
- | `--language <code>` | Lyrics language code | en |
426
- | `--bpm <num>` | Music tempo, 30-300 BPM | server default |
427
- | `--keyscale <text>` | Music key/scale, e.g. C major | - |
428
- | `--timesig <n>` | Time signature: 2\|3\|4\|6 | server default |
429
- | `--composer-mode`, `--no-composer-mode` | Toggle AI composer mode | server default |
430
- | `--prompt-strength <n>` | Music prompt adherence, 0-10 | server default |
431
- | `--creativity <n>` | Music variation/temperature, 0-2 | server default |
432
- | `--music-shift <n>` | Audio model shift parameter, 1-6 | 3 |
433
- | `--audio-format <f>` | Alias for music output format: mp3\|flac\|wav | mp3 |
434
- | `--video, -v` | Generate video instead of image | - |
435
- | `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace) | inferred |
436
- | `--fps <num>` | Frames per second (video) | model default |
437
- | `--duration <sec>` | Duration in seconds (video or music) | video 5, music 30 |
438
- | `--frames <num>` | Override total frames (video) | - |
439
- | `--target-resolution <px>` | Short-side video target preserving aspect ratio | - |
440
- | `--auto-resize-assets` | Auto-resize video assets | true |
441
- | `--no-auto-resize-assets` | Disable auto-resize | - |
442
- | `--estimate-video-cost` | Estimate video cost and exit | - |
443
- | `--photobooth` | Face transfer mode (InstantID + SDXL Turbo) | - |
444
- | `--cn-strength <n>` | ControlNet strength (photobooth) | 0.8 |
445
- | `--cn-guidance-end <n>` | ControlNet guidance end point (photobooth) | 0.3 |
446
- | `--ref <path\|url>` | Reference image for video or photobooth face | required for video/photobooth |
447
- | `--ref-end <path\|url>` | End frame for i2v interpolation | - |
448
- | `--ref-audio <path\|url>` | Uploaded/generated audio for ia2v/a2v, or s2v lip-sync | - |
449
- | `--audio-start <sec>` | Start offset into `--ref-audio` | - |
450
- | `--audio-duration <sec>` | Duration slice from `--ref-audio` | - |
451
- | `--reference-audio-identity <path>` | Voice identity clip for LTX native audio | - |
452
- | `--voice-persona <name>` | Use saved persona voice clip as LTX voice identity | - |
453
- | `--ref-video <path\|url>` | Reference video for animate/v2v workflows | - |
454
- | `--video-start <sec>` | Start offset into `--ref-video` for segmented V2V/animate | - |
455
- | `--controlnet-name <name>` | ControlNet type for v2v: canny\|pose\|depth\|detailer | - |
456
- | `--controlnet-strength <n>` | ControlNet strength for v2v (0.0-1.0) | canny/pose/depth 0.85, detailer 1.0 |
457
- | `--sam2-coordinates <coords>` | SAM2 click coords for animate-replace (x,y or x1,y1;x2,y2) | - |
458
- | `--trim-end-frame` | Trim last frame for seamless video stitching | - |
459
- | `--first-frame-strength <n>` | Keyframe strength for start frame (0.0-1.0) | - |
460
- | `--last-frame-strength <n>` | Keyframe strength for end frame (0.0-1.0) | - |
461
- | `--last` | Show last render info | - |
462
- | `--json` | JSON output | false |
463
- | `--strict-size` | Do not auto-adjust i2v video size for reference resizing constraints | false |
464
- | `-q, --quiet` | No progress output | false |
465
- | `--extract-last-frame <video> <image>` | Extract last frame from video (safe ffmpeg wrapper) | - |
466
- | `--concat-videos <out> <clips...>` | Concatenate video clips (safe ffmpeg wrapper) | - |
467
- | `--concat-audio <path>` | Optional audio track to mux over `--concat-videos` output | - |
468
- | `--concat-audio-start <sec>` | Start offset into `--concat-audio` | - |
469
- | `--list-media [type]` | List recent inbound media (images\|audio\|all) | images |
470
- | `--api-chat` | Call OpenAI-compatible `/v1/chat/completions`; CLI default sends the hosted `creative-agent` tool surface | - |
471
- | `--durable-chat` | Start and stream a durable `/v1/chat/runs` record through SDK transport; requires `SOGNI_SKILL_USE_SDK_TRANSPORT=1` | - |
472
- | `--api-tools <mode>` | API tool mode: creative-agent\|creative-tools\|none. CLI default is creative-agent; raw API default is creative-tools. | creative-agent |
473
- | `--no-api-tool-execution` | Plan/tool-call via API chat without executing Sogni tools | - |
474
- | `--llm-model <id>` | LLM model for `--api-chat` | qwen3.6-35b-a3b-gguf-iq4xs |
475
- | `--task-profile <profile>` | Sogni Intelligence task profile: general\|coding\|reasoning | - |
476
- | `--max-tokens <n>` | Max hosted chat completion tokens | 1600 |
477
- | `--thinking`, `--no-thinking` | Forward `chat_template_kwargs.enable_thinking` for hosted chat; the server may normalize this setting for current public Qwen requests | server default |
478
- | `--system <text>` | Override the base system prompt for hosted chat | built-in creative assistant prompt |
479
- | `--list-api-models`, `--get-api-model <id>` | Inspect Sogni Intelligence LLM model metadata | - |
480
- | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|@path>` | Manage Sogni Intelligence replay RunRecords. List/get output is run through `redactRunRecord` from `@sogni/creative-agent/replay` before printing, so signed URLs, bearer tokens, JWTs, and PEM blocks cannot leak via the CLI. Use `@path` to load JSON from a file. | - |
481
- | `--skip-redact`, `--no-redact` | Bypass the replay redactor on `--list-replays` / `--get-replay`. Debug-only: emits unredacted RunRecord payloads. | redacted |
482
- | `--turn-classify` | Print the public-skill turn policy (`visibleTools`, `forbiddenTools`, `requiredTools`) the default contract runtime would produce for the current session-state flags. Mirrors the chat / `/v1/chat/completions` Structured Contracts v1 pipeline. | - |
483
- | `--compile-tools` | Print the per-turn compiled tool surface (filtered tool list + prompt-contract fragments) the default contract runtime emits. | - |
484
- | `--dispatch-tool <name>` | Print the dispatch verdict (`allowed`, `mode`, repair recipe, suggested args) the default contract runtime would return for a tool call. Combine with `--tool-args` to supply arguments. | - |
485
- | `--tool-args <json>` | JSON arguments for `--dispatch-tool`. | `{}` |
486
- | `--storyboard-plan` | Build a storyboard project from the prompt locally (`buildStoryboardProject` + per-model adapter compilation via `compileForModel`) and print the plan as JSON. Does not call the network. Expects scene-structured prompt input (`SCENE NN - Title` / `VISUAL:` / `ACTION:` / `CAMERA:` / `AUDIO/SFX:` blocks). For casual prompts, use `--api-workflow storyboard-video` instead, which runs an LLM storyline expansion first. Pair with `--storyboard-plan-frames`, `--storyboard-plan-model`, `--storyboard-plan-stage`. | - |
487
- | `--storyboard-plan-frames <n>` | Frame count for `--storyboard-plan`. | inferred |
488
- | `--storyboard-plan-model <id>` | Adapter target for `--storyboard-plan` (seedance, seedance2, gpt-image-2, ltx23, wan). | inferred |
489
- | `--storyboard-plan-stage <stage>` | Compilation stage for `--storyboard-plan` (storyboard_image, scene_clip). | storyboard_image |
490
- | `--api-workflow` | Start `/v1/creative-agent/workflows` with generated inline `input.steps`; optional `storyboard-video` preset | - |
491
- | `--workflow-input <json\|@path>` | Durable workflow `input` JSON for the start request. Use `@path` to load from a file. | - |
492
- | `--workflow-title <text>` | Title for generated or storyboard durable workflow input | - |
493
- | `--workflow-idempotency-key <key>`, `--idempotency-key <key>` | Reuse safely when retrying a durable workflow start request | - |
494
- | `--workflow-max-cost <n>` | Reject hosted workflow starts above this estimated capacity-unit ceiling | - |
495
- | `--confirm-cost`, `--no-confirm-cost` | Forward explicit hosted workflow cost confirmation | - |
496
- | `--storyboard-frames <n>` | Beat count for storyboard-video workflow | - |
497
- | `--video-prompt <text>` | Motion prompt for generated-keyframe durable workflow | - |
498
- | `--negative-prompt <text>` | Negative prompt for generated-keyframe durable workflow | - |
499
- | `--generate-audio`, `--no-generate-audio` | Toggle audio generation for generated video steps | - |
500
- | `--expand-prompt`, `--no-expand-prompt` | Toggle prompt expansion for generated video steps | - |
501
- | `--watch-workflow` | Stream durable workflow events after start | - |
502
- | `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>`, `--resume-workflow <id>` | Durable workflow management helpers | - |
503
- | `--api-base-url <url>` | Sogni API base for hosted API modes. Credentials are only sent to `https://api.sogni.ai` by default; use `SOGNI_API_ALLOWED_HOSTS` for trusted custom hosts or `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1` for isolated local testing. | https://api.sogni.ai |
504
- | `--no-filter` | Disable NSFW content filter | - |
505
- | `--memory-set <key> <value>` | Save a user preference | - |
506
- | `--memory-get <key>` | Get a specific memory | - |
507
- | `--memory-list` | List all saved memories | - |
508
- | `--memory-remove <key>` | Delete a memory | - |
509
- | `--personality-set <text>` | Set custom agent personality instructions | - |
510
- | `--personality-get` | Show current personality | - |
511
- | `--personality-clear` | Reset personality to default | - |
512
- | `--persona-add <name>` | Add a persona (with --ref, --relationship, --description) | - |
513
- | `--persona-list` | List all saved personas | - |
514
- | `--persona-remove <name>` | Remove a persona and its files | - |
515
- | `--persona-resolve <name>` | Look up persona by name/tag/pronoun | - |
516
- | `--persona <name>` | Generate using persona's reference photo as context | - |
517
- | `--relationship <type>` | Persona relationship: self\|partner\|child\|friend\|pet | friend |
518
- | `--voice-clip <path>` | Voice clip audio for LTX-2.3 voice cloning | - |
519
-
520
- ## OpenClaw Config Defaults
521
-
522
- When installed as an OpenClaw plugin, Sogni Creative Agent Skill reads its defaults from:
523
-
524
- `~/.openclaw/openclaw.json`
525
-
526
- ```json
527
- {
528
- "plugins": {
529
- "entries": {
530
- "sogni-creative-agent-skill": {
531
- "enabled": true,
532
- "config": {
533
- "defaultImageModel": "z_image_turbo_bf16",
534
- "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
535
- "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
536
- "defaultMusicModel": "ace_step_1.5_turbo",
537
- "videoModels": {
538
- "t2v": "ltx23-22b-fp8_t2v_distilled",
539
- "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
540
- "s2v": "wan_v2.2-14b-fp8_s2v_lightx2v",
541
- "ia2v": "ltx23-22b-fp8_ia2v_distilled",
542
- "a2v": "ltx23-22b-fp8_a2v_distilled",
543
- "animate-move": "wan_v2.2-14b-fp8_animate-move_lightx2v",
544
- "animate-replace": "wan_v2.2-14b-fp8_animate-replace_lightx2v",
545
- "v2v": "ltx23-22b-fp8_v2v_distilled"
546
- },
547
- "defaultVideoWorkflow": "t2v",
548
- "defaultNetwork": "fast",
549
- "defaultTokenType": "spark",
550
- "apiBaseUrl": "https://api.sogni.ai",
551
- "defaultLlmModel": "qwen3.6-35b-a3b-gguf-iq4xs",
552
- "defaultTaskProfile": "general",
553
- "defaultApiMaxTokens": 1600,
554
- "defaultApiThinking": false,
555
- "defaultApiToolMode": "creative-agent",
556
- "defaultWorkflowMaxCost": 25,
557
- "defaultWorkflowConfirmCost": false,
558
- "seedStrategy": "prompt-hash",
559
- "modelDefaults": {
560
- "flux1-schnell-fp8": { "steps": 4, "guidance": 3.5 },
561
- "flux2_dev_fp8": { "steps": 20, "guidance": 7.5 }
562
- },
563
- "defaultWidth": 768,
564
- "defaultHeight": 768,
565
- "defaultCount": 1,
566
- "defaultFps": 16,
567
- "defaultDurationSec": 5,
568
- "defaultImageTimeoutSec": 30,
569
- "defaultVideoTimeoutSec": 300,
570
- "defaultMusicDurationSec": 30,
571
- "defaultMusicTimeoutSec": 600,
572
- "credentialsPath": "~/.config/sogni/credentials",
573
- "lastRenderPath": "~/.config/sogni/last-render.json",
574
- "mediaInboundDir": "~/.clawdbot/media/inbound"
575
- }
576
- }
577
- }
578
- }
579
- }
580
- ```
581
-
582
- CLI flags always override these defaults.
583
- If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
584
- Seed strategies: `prompt-hash` (deterministic) or `random`.
585
-
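To illustrate what a deterministic `prompt-hash` seed means in practice, the sketch below derives a repeatable 32-bit number from the prompt text with POSIX `cksum`. The helper name `seed_from_prompt` and the use of a CRC are assumptions made for illustration; the hashing scheme the CLI actually uses is not described in this document.

```bash
# Hypothetical helper: derive a repeatable seed from the prompt text.
# Assumption: any stable hash illustrates the idea; the CLI's real
# prompt-hash algorithm may differ.
seed_from_prompt() {
  # cksum prints "<crc> <byte-count>"; keep only the CRC (0..4294967295).
  printf '%s' "$1" | cksum | cut -d' ' -f1
}
```

Calling `seed_from_prompt "a red sports car"` twice prints the same number both times, which is the property that makes re-running an identical prompt reproducible. A `random` strategy would instead pick a fresh value on every call.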
586
- ## Image Models
587
-
588
- | Model | Speed | Use Case |
589
- |-------|-------|----------|
590
- | `z_image_turbo_bf16` | Fast (~5-10s) | General purpose, default |
591
- | `gpt-image-2` | Variable | OpenAI GPT Image 2 text-to-image and edit, strong prompt and text rendering |
592
- | `flux1-schnell-fp8` | Very fast | Quick iterations |
593
- | `flux2_dev_fp8` | Slow (~2min) | High quality |
594
- | `chroma-v.46-flash_fp8` | Medium | Balanced |
595
- | `qwen_image_edit_2511_fp8` | Medium | Image editing with context (up to 3) |
596
- | `qwen_image_edit_2511_fp8_lightning` | Fast | Quick image editing |
597
- | `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |
598
-
599
- `gpt-image-2` supports flexible OpenAI image sizes up to `3840px` on either edge, max `3:1` aspect ratio, and total pixels from `655,360` through `8,294,400`; the API snaps dimensions to valid multiples of 16.
600
-
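The `gpt-image-2` size limits above can be checked before a request is sent. The following is a minimal sketch, not part of the CLI: the function name `gpt_image_2_size_ok` is hypothetical, and it treats sizes that are not multiples of 16 as invalid rather than snapping them the way the API is described to do.

```bash
# Hypothetical pre-flight check mirroring the documented gpt-image-2 limits.
# Returns 0 when the size is acceptable, 1 otherwise.
gpt_image_2_size_ok() {
  w=$1; h=$2
  # Both edges must be positive.
  [ "$w" -gt 0 ] && [ "$h" -gt 0 ] || return 1
  # Simplification: require multiples of 16 instead of snapping to them.
  [ $((w % 16)) -eq 0 ] && [ $((h % 16)) -eq 0 ] || return 1
  # Neither edge may exceed 3840px.
  [ "$w" -le 3840 ] && [ "$h" -le 3840 ] || return 1
  # Aspect ratio may be at most 3:1 in either direction.
  [ "$w" -le $((h * 3)) ] && [ "$h" -le $((w * 3)) ] || return 1
  # Total pixels must fall between 655,360 and 8,294,400 inclusive.
  pixels=$((w * h))
  [ "$pixels" -ge 655360 ] && [ "$pixels" -le 8294400 ] || return 1
  return 0
}
```

For example, `gpt_image_2_size_ok 3840 2160` succeeds (exactly 8,294,400 pixels), while `gpt_image_2_size_ok 512 512` fails because 262,144 pixels is below the documented minimum.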
601
- ## Music Models
602
-
603
- | Model | Use Case |
604
- |-------|----------|
605
- | `ace_step_1.5_turbo` | Default direct music generation model |
606
- | `ace_step_1.5_sft` | Experimental option with stronger lyric handling |
607
-
608
- Use `--music` for direct audio-only generation. Defaults are 30 seconds, `mp3`,
609
- `ace_step_1.5_turbo`, 8 steps, `euler` sampler, and `simple` scheduler. Keep
610
- `--audio` for video reference audio (`--ref-audio` alias); do not use it for
611
- direct music generation.
612
-
613
- ## Video Models
614
-
615
- ### Current Video Model Selectors
616
-
617
- | Model | Speed | Use Case |
618
- |-------|-------|----------|
619
- | `ltx23-22b-fp8_t2v_distilled` | Fast (~2-3min) | Default text-to-video with native dialogue/audio |
620
- | `ltx23-22b-fp8_i2v_distilled` | Fast (~2-3min) | Image-to-video with native dialogue/audio |
621
- | `ltx23-22b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
622
- | `ltx23-22b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
623
- | `ltx23-22b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
624
- | `seedance2` | Variable | Seedance 2.0 text-to-video, 4-15s, native audio |
625
- | `seedance2-fast` | Variable | Fast Seedance 2.0 text-to-video |
626
- | `seedance2-ia2v` | Variable | Seedance 2.0 image+audio-to-video |
627
- | `seedance2-v2v` | Variable | Seedance 2.0 video-to-video, no ControlNet |
628
- | `wan_v2.2-14b-fp8_i2v_lightx2v` | Fast | Simple image-to-video |
629
- | `wan_v2.2-14b-fp8_i2v` | Slow | Higher quality video |
630
- | `wan_v2.2-14b-fp8_t2v_lightx2v` | Fast | Text-to-video |
631
- | `wan_v2.2-14b-fp8_s2v_lightx2v` | Fast | Face lip-sync with uploaded audio |
632
- | `wan_v2.2-14b-fp8_animate-move_lightx2v` | Fast | Animate-move |
633
- | `wan_v2.2-14b-fp8_animate-replace_lightx2v` | Fast | Animate-replace |
634
-
635
- ### LTX-2 / LTX-2.3 Models
636
-
637
- | Model | Speed | Use Case |
638
- |-------|-------|----------|
639
- | `ltx2-19b-fp8_t2v_distilled` | Fast (~2-3min) | Text-to-video, 8-step |
640
- | `ltx2-19b-fp8_t2v` | Medium (~5min) | Text-to-video, 20-step quality |
641
- | `ltx2-19b-fp8_i2v_distilled` | Fast (~2-3min) | Image-to-video, 8-step |
642
- | `ltx2-19b-fp8_i2v` | Medium (~5min) | Image-to-video, 20-step quality |
643
- | `ltx2-19b-fp8_ia2v_distilled` | Fast (~2-3min) | Image+audio-to-video |
644
- | `ltx2-19b-fp8_a2v_distilled` | Fast (~2-3min) | Audio-to-video |
645
- | `ltx2-19b-fp8_v2v_distilled` | Fast (~3min) | Video-to-video with ControlNet |
646
- | `ltx2-19b-fp8_v2v` | Medium (~5min) | Video-to-video with ControlNet, quality |
647
-
648
- ## Image Editing with Context
649
-
650
- Edit images using reference images. Qwen models support up to 3 context images; GPT Image 2 edit supports up to 16 when selected with `-m gpt-image-2`:
651
-
652
- ```bash
653
- # Single context image
654
- sogni-agent -c photo.jpg "make the background a beach"
655
-
656
- # Multiple context images (subject + style)
657
- sogni-agent -c subject.jpg -c style.jpg "apply the style to the subject"
117
+ SOGNI_SKILL_USE_SDK_TRANSPORT=1 sogni-agent --durable-chat "Create a launch campaign and animate the hero clip"
658
118
 
659
- # GPT Image 2 multi-reference edit
660
- sogni-agent -m gpt-image-2 -c subject.jpg -c outfit.jpg -c room.jpg "place the subject in the room wearing the outfit"
119
+ # Durable workflow (resumable, server-orchestrated)
120
+ sogni-agent --api-workflow --video-prompt "The camera slowly pushes in" "A graphite robot sketch on a drafting table"
661
121
 
662
- # Use last generated image as context
663
- sogni-agent --last-image "make it more vibrant"
122
+ # Storyboard → GPT Image 2 sheet → Seedance video, all server-side
123
+ sogni-agent --api-workflow storyboard-video --storyboard-frames 6 -Q hq "9:16 bakery launch video"
664
124
  ```
665
125
 
666
- When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`. Select `-m gpt-image-2` for GPT Image 2's higher reference-image limit and OpenAI-backed image editing.
126
+ Hosted modes require `SOGNI_API_KEY`. Local file references are uploaded to Sogni media storage and forwarded as retrievable URLs; **use direct CLI mode for private media that must not leave the local machine.**
667
127
 
668
- ## Photobooth (Face Transfer)
128
+ Use the direct-to-SDK commands below for explicit one-shot generation when you already know the model, dimensions, and prompt.
669
129
 
670
- Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use `--photobooth` with `--ref` pointing to their face image.
130
+ ## Core Commands (direct-to-SDK)
671
131
 
672
132
  ```bash
673
- # Basic photobooth
674
- sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
675
-
676
- # Multiple outputs
677
- sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
678
-
679
- # Custom ControlNet tuning
680
- sogni-agent --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"
681
- ```
682
-
683
- Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Cannot be combined with `--video` or `-c/--context`.
684
-
685
- **Agent usage:**
686
- ```bash
687
- # Photobooth: stylize a face photo
688
- sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
689
-
690
- # Multiple photobooth outputs
691
- sogni-agent -q --photobooth --ref /path/to/face.jpg -n 4 -o ./stylized.png "LinkedIn professional headshot"
692
- ```
693
-
694
- ## Multiple Angles (Turnaround)
695
-
696
- Generate specific camera angles from a single reference image using the Multiple Angles LoRA:
697
-
698
- ```bash
699
- # Single angle
700
- sogni-agent --multi-angle -c subject.jpg \
701
- --azimuth front-right --elevation eye-level --distance medium \
702
- --angle-strength 0.9 \
703
- "studio portrait, same person"
704
-
705
- # 360 sweep (8 azimuths)
706
- sogni-agent --angles-360 -c subject.jpg --distance medium --elevation eye-level \
707
- "studio portrait, same person"
708
-
709
- # 360 sweep video (looping mp4, uses i2v between angles; requires ffmpeg)
710
- sogni-agent --angles-360 --angles-360-video ./turntable.mp4 \
711
- -c subject.jpg --distance medium --elevation eye-level \
712
- "studio portrait, same person"
713
- ```
714
-
715
- The prompt is auto-built with the required `<sks>` token plus the selected camera angle keywords.
716
- `--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
717
-
718
- ### 360 Video Best Practices
719
-
720
- When a user requests a "360 video", follow this workflow:
721
-
722
- 1. **Default camera parameters** (do not ask unless they specify):
723
- - **Elevation**: default to **medium** (`--elevation eye-level`)
724
- - **Distance**: default to **medium**
725
-
726
- 2. **Map user terms to flags**:
727
- | User says | Flag value |
728
- |-----------|------------|
729
- | "high" angle | `--elevation high-angle` |
730
- | "medium" angle | `--elevation eye-level` |
731
- | "low" angle | `--elevation low-angle` |
732
- | "close" | `--distance close-up` |
733
- | "medium" distance | `--distance medium` |
734
- | "far" | `--distance wide` |
735
-
736
- 3. **Always use first-frame/last-frame stitching** - the `--angles-360-video` flag automatically handles this by generating i2v clips between consecutive angles including last→first for seamless looping.
737
-
738
- 4. **Example command**:
739
- ```bash
740
- sogni-agent --angles-360 --angles-360-video ./output.mp4 \
741
- -c /path/to/image.png --elevation eye-level --distance medium \
742
- "description of subject"
743
- ```
744
-
745
- ### Transition Video Rule
746
-
747
- For **any transition video work**, always use the **Sogni skill/plugin** (not raw ffmpeg or other shell commands). Use the built-in `--extract-last-frame`, `--concat-videos`, and `--looping` flags for video manipulation.
748
-
749
- ### Insufficient Funds Handling
750
-
751
- Use `--token-type auto` to automatically retry native Sogni models with SOGNI tokens when SPARK is insufficient. Vendor models such as Seedance and GPT Image 2 require Premium Spark eligibility and never fall back to SOGNI.
752
-
753
- When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply:
754
-
755
- "Insufficient funds. Claim 50 free daily Spark points at https://app.sogni.ai/"
756
-
757
- ## Video Generation
758
-
759
- Generate videos from a text prompt, a reference image, reference audio, or a reference video:
760
-
761
- ```bash
762
- # Text-to-video (t2v)
763
- sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
764
-
765
- # Basic video from image
766
- sogni-agent --video --ref cat.jpg -o cat.mp4 "cat walks around"
767
-
768
- # Use last generated image as reference
769
- sogni-agent --last-image --video "gentle camera pan"
770
-
771
- # Custom duration and FPS
772
- sogni-agent --video --ref scene.png --duration 10 --fps 24 "zoom out slowly"
773
-
774
- # Bare "720p" / "HD" without exact pixels: preserve aspect via short-side target
775
- sogni-agent --video --target-resolution 768 \
776
- "A calm cinematic shot of lanterns drifting across a night lake"
777
-
778
- # Natural-language aspect and resolution inference
779
- sogni-agent --video \
780
- "Make a 720p 9:16 video of ocean waves at sunset"
781
-
782
- # Seedance 2.0 text-to-video
783
- sogni-agent --video -m seedance2 --duration 8 \
784
- "A polished product reveal with native ambient sound"
785
-
786
- # Seedance multimodal context with public HTTPS references
787
- sogni-agent --video -m seedance2 --workflow t2v \
788
- --ref https://cdn.example.com/product.png \
789
- --ref-video https://cdn.example.com/motion.mp4 \
790
- --ref-audio https://cdn.example.com/music.m4a \
791
- "Use @Image1 for product identity, @Video1 for camera movement, and @Audio1 for music rhythm"
792
-
793
- # Sound-to-video (s2v)
794
- sogni-agent --video --ref face.jpg --ref-audio speech.m4a \
795
- -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"
796
-
797
- # Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
798
- sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
799
- "music video with synchronized motion"
800
-
801
- # Audio-to-video (auto-routes to LTX 2.3 a2v)
802
- sogni-agent --video --ref-audio song.mp3 \
803
- "abstract audio-reactive visualizer"
804
-
805
- # Persona/voice identity with LTX native audio
806
- sogni-agent --video --reference-audio-identity voice.webm \
807
- "NARRATOR: \"This is my voice.\""
808
-
809
- # Prefer .webm, .m4a, or .mp3 voice clips. Local .wav clips are normalized
810
- # to .m4a before upload when ffmpeg is available.
811
-
812
- # LTX-2.3 text-to-video
813
- sogni-agent --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
814
- "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
815
-
816
- # Animate (motion transfer)
817
- sogni-agent --video --ref subject.jpg --ref-video motion.mp4 \
818
- --workflow animate-move "transfer motion"
819
-
820
- # Segment a longer reference video for local stitched workflows
821
- sogni-agent --video --workflow v2v --ref-video dance.mp4 \
822
- --video-start 10 --duration 8 --controlnet-name pose \
823
- "robot dancing"
824
- ```
825
-
826
- ## Video-to-Video (V2V) with ControlNet
827
-
828
- Transform an existing video using LTX-2 models with ControlNet guidance:
829
-
830
- ```bash
831
- # Basic v2v with canny edge detection
832
- sogni-agent --video --workflow v2v --ref-video input.mp4 \
833
- --controlnet-name canny "stylized anime version"
834
-
835
- # V2V with pose detection and custom strength
836
- sogni-agent --video --workflow v2v --ref-video dance.mp4 \
837
- --controlnet-name pose --controlnet-strength 0.7 "robot dancing"
838
-
839
- # V2V with depth map
840
- sogni-agent --video --workflow v2v --ref-video scene.mp4 \
841
- --controlnet-name depth "watercolor painting style"
842
- ```
843
-
844
- ControlNet types: `canny` (edge detection), `pose` (body pose), `depth` (depth map), `detailer` (detail enhancement).
845
- Default V2V strengths are tuned from Sogni Chat: `canny`/`pose`/`depth` use `0.85` plus detailer assist, while `detailer` uses `1.0` for preservation. For Seedance V2V, use `-m seedance2-v2v` and omit ControlNet. Seedance accepts public HTTPS image, video, and audio references as URL context when they pass the CLI URL safety checks; localhost and private-network URLs are rejected before forwarding. Audio references must be paired with an image or video reference.
846
-
847
- ```bash
848
- # Seedance V2V without ControlNet
849
- sogni-agent --video --workflow v2v -m seedance2-v2v \
850
- --ref-video input.mp4 "make the clip more cinematic"
851
- ```
852
-
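The default ControlNet strengths described in this section can be summarized as a small lookup. This is only a sketch of the documented defaults; the helper name `default_controlnet_strength` is hypothetical and is not a CLI feature.

```bash
# Hypothetical lookup of the documented default V2V ControlNet strengths.
default_controlnet_strength() {
  case "$1" in
    canny|pose|depth) echo "0.85" ;;  # documented default, plus detailer assist
    detailer)         echo "1.0"  ;;  # documented default for preservation
    *)                return 1    ;;  # unknown ControlNet type
  esac
}
```

Passing `--controlnet-strength` explicitly, as in the pose example earlier in this section, overrides whichever default applies.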
853
- ## Photo Restoration
854
-
855
- Restore damaged vintage photos using Qwen image editing:
856
-
857
- ```bash
858
- # Basic restoration
859
- sogni-agent -c damaged_photo.jpg -o restored.png \
860
- "professionally restore this vintage photograph, remove damage and scratches"
861
-
862
- # Detailed restoration with preservation hints
863
- sogni-agent -c old_photo.jpg -o restored.png -w 1024 -h 1280 \
864
- "restore this vintage photo, remove peeling, tears and wear marks, \
865
- preserve natural features and expression, maintain warm nostalgic color tones"
866
- ```
867
-
868
- **Tips for good restorations:**
869
- - Describe the damage: "peeling", "scratches", "tears", "fading"
870
- - Specify what to preserve: "natural features", "eye color", "hair", "expression"
871
- - Mention the era for color tones: "1970s warm tones", "vintage sepia"
872
-
873
- **Finding received images (Telegram/etc):**
874
- ```bash
875
- sogni-agent --json --list-media images
876
- ```
877
-
878
- **Do NOT use `ls`, `cp`, or other shell commands to browse user files.** Always use `--list-media` to find inbound media.
879
-
880
- ## IMPORTANT KEYWORD RULE
881
-
882
- - If the user message includes the word "photobooth" (case-insensitive), always use `--photobooth` mode with `--ref` set to the user-provided face image.
883
- - Prioritize this rule over generic image-edit flows (`-c`) for that request.
884
-
885
- ## LTX-2.3 Prompt Rule
886
-
887
- Whenever the chosen video model is `ltx23-22b-fp8_t2v_distilled`, do not pass the user's short request through unchanged. Rewrite it into an LTX-2.3-safe prompt before calling `sogni-agent`.
888
-
889
- - Output one single paragraph only. No line breaks, bullet points, section labels, tag lists, or screenplay formatting.
890
- - Use 4-8 flowing present-tense sentences describing one continuous shot. No cuts, montage, or unrelated scene jumps.
891
- - Start with shot scale plus the scene's visual identity, then describe environment, time of day, atmosphere, textures, and specific light sources.
892
- - Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
893
- - Give the scene one main action thread from start to finish. Use connectors like `as`, `while`, and `then` so motion reads as a continuous filmed moment.
894
- - If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
895
- - Budget spoken dialogue at about 3 words per second, plus about 1 second for each meaningful acting beat or pause.
896
- - Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
897
- - Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like `beautiful` or `nice`, or structural markup such as `[DIALOGUE]`.
898
- - Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
899
- - Preserve the user's request, but expand it into cinematic prose. Do not invent a different story just to make the prompt longer.
900
-
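The dialogue budget in the list above (about 3 words per second, plus about 1 second per acting beat or pause) works out to simple arithmetic. The sketch below is illustrative only; `dialogue_seconds` is a hypothetical helper, and rounding up to whole seconds is an assumption.

```bash
# Hypothetical helper: estimate the seconds a spoken line needs.
# Usage: dialogue_seconds "<spoken words>" [acting-beats]
dialogue_seconds() {
  # Count words; the arithmetic expansion also strips wc's padding.
  words=$(( $(printf '%s\n' "$1" | wc -w) ))
  beats=${2:-0}
  # About 3 words per second (rounded up) plus about 1 second per beat.
  echo $(( (words + 2) / 3 + beats ))
}
```

For instance, the four-word line "welcome to the story" with one pause needs roughly 3 seconds, so it fits comfortably in a 5-second clip, while a 20-word speech with two pauses needs about 9 seconds and calls for a longer `--duration`.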
901
- ### Duration-Aware Pacing
902
-
903
- Match scene density to clip length so prompts stay filmable:
904
-
905
- - About `1-4s`: describe exactly 1 action or moment.
906
- - About `5-8s`: describe about 2 sequential actions.
907
- - About `9-12s`: describe about 3 sequential actions.
908
- - Longer clips: add only a small number of additional sequential beats. Do not turn the prompt into a montage or a full story arc unless the duration clearly supports it.
909
-
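The pacing bands above can be expressed as a lookup from clip length to a suggested number of sequential actions. This is a sketch under stated assumptions: `suggested_beats` is a hypothetical helper, it expects whole seconds, and the value of 4 for clips longer than 12 seconds is one reading of "a small number of additional sequential beats".

```bash
# Hypothetical helper: suggested number of sequential actions for a clip.
suggested_beats() {
  d=$1
  if   [ "$d" -le 4 ];  then echo 1   # about 1-4s: exactly one action
  elif [ "$d" -le 8 ];  then echo 2   # about 5-8s: about two actions
  elif [ "$d" -le 12 ]; then echo 3   # about 9-12s: about three actions
  else                       echo 4   # assumption: one extra beat for longer clips
  fi
}
```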
910
- ### Orientation Mapping
911
-
912
- When the user explicitly asks for an orientation or aspect ratio, map it to safe LTX dimensions:
913
-
914
- - `vertical`, `portrait`, `story`, `reel`, `tiktok` -> `-w 1088 -h 1920`
915
- - `landscape`, `horizontal`, `widescreen`, `youtube`, `16:9` -> `-w 1920 -h 1088`
916
- - `square`, `1:1` -> `-w 1088 -h 1088`
917
- - `4:3 portrait` -> `-w 832 -h 1088`
918
- - `4:3 landscape` -> `-w 1088 -h 832`
919
-
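The orientation mapping above translates directly into a `case` statement. The helper name `ltx_dims_for` is hypothetical; the keywords and dimensions are taken from the list as written.

```bash
# Hypothetical helper: map an orientation keyword to safe LTX size flags.
ltx_dims_for() {
  case "$1" in
    vertical|portrait|story|reel|tiktok)          echo "-w 1088 -h 1920" ;;
    landscape|horizontal|widescreen|youtube|16:9) echo "-w 1920 -h 1088" ;;
    square|1:1)                                   echo "-w 1088 -h 1088" ;;
    "4:3 portrait")                               echo "-w 832 -h 1088"  ;;
    "4:3 landscape")                              echo "-w 1088 -h 832"  ;;
    *)                                            return 1 ;;  # no explicit orientation
  esac
}
```

The function returns a non-zero status for unrecognized input so that a caller can leave the size flags off when the user did not ask for a specific orientation.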
920
- ### Camera Language Normalization
921
-
922
- When the user uses loose camera language, translate it into concrete motion phrasing inside the prose prompt:
923
-
924
- - `zoom in` -> `slow push-in`
925
- - `zoom out` -> `slow pull-back`
926
- - `pan left` / `pan right` -> `smooth pan left` / `smooth pan right`
927
- - `orbit` / `circle around` -> `slow arc left` or `slow arc right`
928
- - `follow` -> `tracking follow`
929
-
930
- Short example:
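The camera phrase substitutions above can be sketched the same way. `normalize_camera_term` is a hypothetical helper; the optional second argument that picks the arc direction, and its default of `left`, are assumptions, since the list allows either direction.

```bash
# Hypothetical helper: turn loose camera language into concrete phrasing.
# Usage: normalize_camera_term "<user term>" [left|right]
normalize_camera_term() {
  case "$1" in
    "zoom in")              echo "slow push-in" ;;
    "zoom out")             echo "slow pull-back" ;;
    "pan left")             echo "smooth pan left" ;;
    "pan right")            echo "smooth pan right" ;;
    orbit|"circle around")  echo "slow arc ${2:-left}" ;;  # direction is an assumption
    follow)                 echo "tracking follow" ;;
    *)                      echo "$1" ;;  # leave unrecognized terms unchanged
  esac
}
```

The returned phrase is meant to be woven into the prose paragraph, not passed to the CLI as a flag.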
931
-
932
- ```text
933
- User ask: "4k video of a woman in a neon alley"
934
-
935
- Use this shape instead: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
936
- ```
937
-
938
- ## Agent Usage
939
-
940
- When user asks to generate/draw/create an image:
941
-
942
- ```bash
943
- # Generate and save locally (use -Q for quality presets instead of memorizing model IDs)
133
+ # Image (quality presets pick model/steps/size: fast | hq | pro)
944
134
  sogni-agent -q -Q fast -o ./generated.png "user's prompt"
945
135
  sogni-agent -q -Q pro -o ./generated.png "user's prompt"
946
136
 
947
- # Generate with prompt variations (diverse images in one call)
137
+ # Diverse variations in one call (options cycle per image)
948
138
  sogni-agent -q -n 3 -o ./cars.png "a {red|blue|green} sports car"
949
139
 
950
- # Edit an existing image
140
+ # Edit an existing image (source-preserving)
951
141
  sogni-agent -q -c /path/to/input.jpg -o ./edited.png "make it pop art style"
952
142
 
953
- # Generate video from image
954
- sogni-agent -q --video --ref /path/to/image.png -o ./video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."
955
-
956
- # Generate text-to-video
957
- sogni-agent -q --video -o ./video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
143
+ # Photobooth (face transfer: new portrait from a face photo)
144
+ sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
958
145
 
959
- # Generate direct music/audio
960
- sogni-agent -q --music --duration 30 -o ./music.mp3 "uplifting cinematic synthwave theme for a product launch"
146
+ # Text-to-video / image-to-video (write the prompt per references/video-prompting.md)
147
+ sogni-agent -q --video -o ./video.mp4 "<cinematic prose paragraph>"
148
+ sogni-agent -q --video --ref /path/to/image.png -o ./video.mp4 "<cinematic prose paragraph>"
961
149
 
962
- # HD / "4K" text-to-video: prefer LTX-2.3
963
- sogni-agent -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o ./video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
150
+ # Sound-to-video (lip-sync), image+audio, audio-only (workflow auto-inferred)
151
+ sogni-agent --video --ref face.jpg --ref-audio speech.m4a -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync talking head"
152
+ sogni-agent --video --ref cover.jpg --ref-audio song.mp3 "music video with synchronized motion"
153
+ sogni-agent --video --ref-audio song.mp3 "abstract audio-reactive visualizer"
964
154
 
965
- # HD / "4K" image-to-video: prefer LTX i2v
966
- sogni-agent -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -w 1920 -h 1088 -o ./video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."
155
+ # Music (direct audio generation; mp3 by default)
156
+ sogni-agent -q --music --duration 30 -o ./music.mp3 "uplifting cinematic synthwave theme"
157
+ sogni-agent --music --lyrics "Rise with the morning light" --bpm 128 --keyscale "C major" "bright indie pop chorus"
967
158
 
968
- # Photobooth: stylize a face photo
969
- sogni-agent -q --photobooth --ref /path/to/face.jpg -o ./stylized.png "80s fashion portrait"
159
+ # Seedance 2.0 (4-15s vendor video with native audio)
160
+ sogni-agent --video -m seedance2 --duration 8 "A polished product reveal with native ambient sound"
970
161
 
971
- # Token auto-fallback for native Sogni models (tries SPARK first, retries with SOGNI on insufficient balance)
972
- sogni-agent -q --token-type auto -o ./generated.png "user's prompt"
973
-
974
- # Check current SPARK/SOGNI balances (no prompt required)
162
+ # Balances / last render / inbound media / health (no prompt required)
975
163
  sogni-agent --json --balance
976
-
977
- # Find user-sent images/audio
164
+ sogni-agent --last --json
978
165
  sogni-agent --json --list-media images
979
-
980
- # Then send via message tool with filePath
981
- ```
982
-
983
- ### Quality Presets
984
-
985
- Use `-Q` / `--quality` instead of memorizing model IDs:
986
-
987
- | Preset | Model | Steps | Size | Speed |
988
- |--------|-------|-------|------|-------|
989
- | `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
990
- | `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
991
- | `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
992
-
993
- Explicit `-m` overrides the quality preset's model. Explicit `-w`/`-h` overrides dimensions. When the user asks for "high quality", "best quality", or "pro", use `-Q pro`. For quick drafts or previews, use `-Q fast`.
994
-
995
- ### Dynamic Prompt Variations
996
-
997
- When the user wants multiple variations (different colors, styles, subjects), use `{option1|option2|option3}` syntax with `-n`:
166
+ sogni-agent doctor --json
167
+ ```
168
+
169
+ `sogni-agent --help` is the canonical, always-current flag reference.
170
+
171
+ ## Common Options
172
+
173
+ | Flag | Use | Default |
174
+ |------|-----|---------|
175
+ | `-Q fast\|hq\|pro` | Quality preset (model+steps+size); `-m` overrides model | - |
176
+ | `-o <path>` | Save output locally (relative → PWD) | prints URL |
177
+ | `-c <path>` | Context image for editing (repeatable) | - |
178
+ | `-m <id>` | Explicit model | `z_image_turbo_bf16` |
179
+ | `-w` / `-h` | Width / height | 512×512 |
180
+ | `-n <num>` | Output count (`{a\|b\|c}` prompt variations cycle); capped at 16, raise with `SOGNI_MAX_COUNT` | 1 |
181
+ | `--video`, `--music` | Generate video / music instead of image | - |
182
+ | `--workflow <t>` | Force `t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace` | inferred |
183
+ | `--ref`, `--ref-end`, `--ref-audio`, `--ref-video` | Start frame / end frame / audio / video references | - |
184
+ | `--duration <sec>` | Video or music length | video 5, music 30 |
185
+ | `--target-resolution <px>` | Short-side target preserving aspect ratio (use for bare "720p") | - |
186
+ | `--photobooth` | Face transfer mode (with `--ref`) | - |
187
+ | `--persona <name>` | Use a saved persona (photo + voice auto-attach) | - |
188
+ | `--token-type spark\|sogni\|auto` | `auto` retries native models with SOGNI when SPARK is low | spark |
189
+ | `--last`, `--last-image` | Inspect last render / reuse it as context or ref | - |
190
+ | `--json` | Machine-parseable stdout (progress goes to stderr) | false |
191
+ | `-q, --quiet` | Suppress progress output | false |
192
+ | `-t <sec>` | Timeout | 30 image / 300 video |
193
+ | `--strict-size` | Fail instead of auto-adjusting video size | false |
194
+ | `doctor`, `self-update`, `--whats-new`, `--snooze-update` | Health check / upgrade / changelog / snooze reminder | - |
195
+
196
+ ## Routing Rules (always apply)
197
+
198
+ ### Photobooth vs. context editing
199
+
200
+ - `--photobooth` is **face-reference generation**, not full-image editing: it generates a *new* portrait from a face photo and may change pose, clothing, background, framing, and composition. Use it when the user explicitly asks for photobooth/face-transfer, a new portrait/headshot from their face, or to place their face into a different concept. Cannot be combined with `--video` or `-c/--context`. Tune with `--cn-strength` (default 0.8) and `--cn-guidance-end` (default 0.3).
201
+ - If the request is "**same image, different style**" (e.g. an anime version that must keep the same face, pose, clothing, background, framing, and composition; "use this image as the base"; "keep everything the same"; "only change the style"), use Qwen context editing with `-c/--context` instead. For stronger preservation than the lightning default:
998
202
 
999
203
  ```bash
1000
- # 3 color variations
1001
- sogni-agent -q -n 3 "a {red|blue|green} sports car"
1002
-
1003
- # 4 style variations
1004
- sogni-agent -q -n 4 "a portrait in {oil painting|watercolor|pencil sketch|pop art} style"
204
+ sogni-agent -c photo.jpg -m qwen_image_edit_2511_fp8 "turn this into anime style; keep the same face, pose, clothing, background, framing, and composition"
1005
205
  ```
1006
206
 
1007
- Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images with the same prompt.
207
+ - Do not route to `--photobooth` merely because the user asks to preserve a face in a style edit; face-preserving full-image edits use `-c` with Qwen image edit. When context images are provided without `-m`, the CLI defaults to `qwen_image_edit_2511_fp8_lightning`; select `-m gpt-image-2` for up to 16 reference images and OpenAI-backed editing (Qwen supports up to 3).
1008
208
 
1009
- For video, use the same `{...}` + `-n` pattern when all outputs share the same source image, end image, duration, audio, and settings and only prompt text varies:
209
+ ### LTX video prompts
1010
210
 
1011
- ```bash
1012
- sogni-agent --video --ref hero.png -n 3 --duration 5 \
1013
- "{the subject smiles and waves|the subject turns toward the window|the subject raises a hand in greeting}"
1014
- ```
211
+ Whenever the chosen video model is in the LTX family (including the default t2v), **do not pass the user's short request through unchanged**. Rewrite it into one unbroken paragraph of 4-8 flowing present-tense sentences describing a single continuous shot: concrete subjects, named light sources, one action thread, dialogue embedded in double quotes with the speaker identified, positive phrasing only, no headers/bullets/negative-prompts. **Read [`references/video-prompting.md`](./references/video-prompting.md) for the full rule, duration pacing, orientation mapping, and camera-language normalization before writing the prompt.**
1015
212
 
1016
- If clips need different source images, end frames, durations, audio windows, or other per-output settings, keep them as separate per-clip workflow arguments. Do not force those into a single Dynamic Prompt branch.
213
+ ### High-res video
1017
214
 
1018
- ### Token Auto-Fallback
215
+ For "hd" / "1080p" / "4k" / "uhd" requests: use `-m ltx23-22b-fp8_t2v_distilled` (text) or `-m ltx23-22b-fp8_i2v_distilled` (image), prefer `-w 1920 -h 1088` (or the orientation mapping in the reference), and rewrite the prompt per the LTX rule. For bare "720p" without orientation, prefer `--target-resolution 768`.
1019
216
 
1020
- Use `--token-type auto` when the user's SPARK balance might be low. It tries SPARK first (free daily tokens) and automatically retries with SOGNI if insufficient.
217
+ ### Video editing, stitching, 360 turnarounds
1021
218
 
1022
- ## High-Res Video Routing
219
+ Trigger patterns: "animate image A to image B" (`--ref A --ref-end B`), "continue this video" (extract last frame → i2v → concat), "transition between two videos" (bridge clip), "360 video" (`--angles-360 --angles-360-video`), "add/replace the soundtrack" (`--concat-audio` / `--remix-audio`). **Read [`references/video-editing.md`](./references/video-editing.md) for the step-by-step recipes.**
1023
220
 
1024
- When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or **"high-res"**, do not use the default WAN video models.
221
+ **Security: never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) for file operations or video/audio manipulation.** Always use the CLI's built-in safe wrappers: `--extract-first-frame`, `--extract-last-frame`, `--concat-videos`, `--remix-audio`, `--list-media`, `--video-start`, `--audio-start`, `--audio-duration`, `--looping`.
1025
222
 
1026
- - For **text-to-video**, use `-m ltx23-22b-fp8_t2v_distilled`.
1027
- - For **image-to-video**, use `-m ltx23-22b-fp8_i2v_distilled`.
1028
- - Prefer LTX-sized dimensions such as `-w 1920 -h 1088`.
1029
- - For bare named resolutions such as "720p" without orientation or exact pixels, prefer `--target-resolution 768` or the closest requested short side instead of forcing landscape dimensions.
1030
- - When the prompt combines a named resolution with an aspect ratio, such as "720p 9:16", let the CLI infer both instead of forcing manual `-w`/`-h` unless the user gave exact pixels.
1031
- - If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
1032
- - Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
1033
- - Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even though the current wrapper caps non-WAN video dimensions at 2048px on the long side.
223
+ ### Finding user-sent media
1034
224
 
1035
- **Security:** Agents must use the CLI's built-in flags (`--extract-last-frame`, `--concat-videos`, `--list-media`) for all file operations and video manipulation. Never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) directly.
225
+ Use `sogni-agent --json --list-media images` (or `audio` / `all`) to find inbound media the user sent (e.g. via Telegram). **Do NOT browse user files with `ls`, `cp`, or other shell commands.**
1036
226
 
1037
- ## Animate Between Two Images (First-Frame / Last-Frame)
227
+ ### Personas, memories, personality
1038
228
 
1039
- When a user asks to **animate between two images**, use `--ref` (first frame) and `--ref-end` (last frame) to create a creative interpolation video:
229
+ - Only use `--persona "Name"` when the user refers to a **saved** persona by explicit name, id, or tag/alias. User-uploaded photos are NOT personas; use `-c` for ad-hoc photos. With `--video`, a saved voice clip auto-attaches as the voice identity.
230
+ - Before generating, check saved preferences with `--memory-list` and respect them; save stated standing preferences with `--memory-set`. Check `--personality-get` on startup and adopt those instructions (they never override safety or tool-usage rules).
231
+ - **Read [`references/personas-memory.md`](./references/personas-memory.md)** for persona CRUD, voice cloning, multi-persona scenes, style transfer, and photo restoration recipes.
1040
232
 
1041
- ```bash
1042
- # Animate from image A to image B
1043
- sogni-agent -q --video --ref ./imageA.png --ref-end ./imageB.png -o ./transition.mp4 "descriptive prompt of the transition"
1044
- ```
233
+ ### Model selection
234
+
235
+ Prefer `-Q` presets and automatic workflow routing. When a specific model is needed (GPT Image 2 text rendering, Seedance native audio, WAN lip-sync, LTX dialogue), **read [`references/models.md`](./references/models.md)** for the catalog, recommended selectors, and sizing/divisibility rules.
1045
236
 
1046
- ### Animate a Video to an Image (Scene Continuation)
237
+ ### Insufficient funds
1047
238
 
1048
- When a user asks to **animate from a video to an image** (or "continue" a video into a new scene):
239
+ Use `--token-type auto` to retry native Sogni models with SOGNI tokens when SPARK is insufficient. Vendor models (Seedance, GPT Image 2) require Premium Spark eligibility and never fall back to SOGNI. When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply exactly:
1049
240
 
1050
- 1. **Extract the last frame** of the existing video using the built-in safe wrapper:
1051
- ```bash
1052
- sogni-agent --extract-last-frame ./existing.mp4 ./lastframe.png
1053
- ```
1054
- 2. **Generate a new video** using the last frame as `--ref` and the target image as `--ref-end`:
1055
- ```bash
1056
- sogni-agent -q --video --ref ./lastframe.png --ref-end ./target.png -o ./continuation.mp4 "scene transition prompt"
1057
- ```
1058
- 3. **Concatenate the videos** using the built-in safe wrapper:
1059
- ```bash
1060
- sogni-agent --concat-videos ./full_sequence.mp4 ./existing.mp4 ./continuation.mp4
1061
- ```
241
+ "Insufficient funds. Buy Spark Packs to continue: https://docs.sogni.ai/pricing/#spark-packs"
1062
242
 
1063
- This ensures visual continuity: the new clip picks up exactly where the previous one ended.
243
+ Do not collect payment details, quote a custom price, or simulate a purchase in the terminal.
1064
244
 
1065
- When the final stitched output needs a single external soundtrack, add `--concat-audio /path/to/audio.mp3` and optional `--concat-audio-start <sec>` to the same `--concat-videos` command. This is the local-agent advantage over browser-only workflows: generate clips with Sogni, then use the safe FFmpeg wrapper to stitch and mux audio locally.
245
+ ### Suggest next steps after a render
1066
246
 
1067
- **Do NOT run raw `ffmpeg` commands.** Always use `--extract-last-frame` and `--concat-videos` for video manipulation.
247
+ After an image: offer to animate it (`--video --ref <result>`), restyle it (`-c <result> "Apply style: ..."`), change the angle (`--multi-angle -c <result>`), generate variations (`-n 3 "{a|b|c}"`), or refine at `-Q pro`. After a video: offer different motion, dialogue (LTX), longer `--duration`, stitching (`--concat-videos`), or a soundtrack (`--concat-audio` / `--remix-audio`).
1068
248
 
1069
- **Always apply this pattern when:**
1070
- - User says "animate image A to image B" → use `--ref A --ref-end B`
1071
- - User says "animate this video to this image" → extract last frame, use as `--ref`, target image as `--ref-end`, then stitch
1072
- - User says "continue this video" with a target image → same as above
249
+ ## JSON Output Contract
1073
250
 
1074
- ## JSON Output
251
+ Success (`--json`):
1075
252
 
1076
253
  ```json
1077
254
  {
1078
255
  "success": true,
1079
256
  "prompt": "a cat wearing a hat",
1080
- "model": "z_image_turbo_bf16",
257
+ "model": "z_image_turbo_bf16",
1081
258
  "width": 512,
1082
259
  "height": 512,
1083
260
  "urls": ["https://..."],
@@ -1085,7 +262,7 @@ When the final stitched output needs a single external soundtrack, add `--concat
1085
262
  }
1086
263
  ```
1087
264
 
1088
- On error (with `--json`), the script returns a single JSON object like:
265
+ Failure (single JSON object on stdout, exit code 1; progress/warnings on stderr):
1089
266
 
1090
267
  ```json
1091
268
  {
@@ -1099,170 +276,28 @@ On error (with `--json`), the script returns a single JSON object like:
1099
276
  }
1100
277
  ```
1101
278
 
1102
- Balance check example (`--json --balance`):
1103
-
1104
- ```json
1105
- {
1106
- "success": true,
1107
- "type": "balance",
1108
- "spark": 12.34,
1109
- "sogni": 0.56
1110
- }
1111
- ```
279
+ `--json --balance` → `{ "success": true, "type": "balance", "spark": 12.34, "sogni": 0.56 }`. `--last --json` wraps the last render record in a `{ "success": true, ... }` envelope and exits 1 with `errorCode: "NO_LAST_RENDER"` when nothing has been rendered. In `--json` mode stdout always carries exactly one JSON object; SSE workflow frames and progress lines go to stderr.
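The contract above (one JSON object on stdout, a non-zero exit code on failure, `errorCode: "NO_LAST_RENDER"` when nothing has been rendered) could be consumed roughly as follows. This is a minimal sketch, not part of the package: `classify_result` is a hypothetical helper, the failure sample is abbreviated to the one field named above, and the `grep` match is a crude stand-in for a real JSON parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with sogni-agent): classify one captured
# --json result. $1 = exit code of the CLI call, $2 = the stdout it produced.
classify_result() {
  local exit_code=$1 stdout=$2
  if [ "$exit_code" -eq 0 ] &&
     printf '%s' "$stdout" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'; then
    echo "ok"
  elif printf '%s' "$stdout" |
       grep -Eq '"errorCode"[[:space:]]*:[[:space:]]*"NO_LAST_RENDER"'; then
    echo "no-last-render"
  else
    echo "failed"
  fi
}

# Sample payloads: the first is the balance shape quoted above; the second is
# an abbreviated, assumed failure payload carrying only the documented errorCode.
classify_result 0 '{ "success": true, "type": "balance", "spark": 12.34, "sogni": 0.56 }'  # prints ok
classify_result 1 '{ "errorCode": "NO_LAST_RENDER" }'                                      # prints no-last-render

# With the CLI installed, the call would look something like this
# (stderr is discarded because progress lines go there):
#   out=$(sogni-agent --last --json 2>/dev/null); classify_result "$?" "$out"
```

Checking the exit code before the payload keeps the helper usable even when stdout is empty; a production wrapper would more likely parse the object with a proper JSON tool.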
1112
280
 
1113
281
  ## Cost
1114
282
 
1115
- Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient. Use `--token-type auto` to automatically fall back to SOGNI tokens for native Sogni models when SPARK is insufficient. Seedance and GPT Image 2 are vendor models and require Premium Spark eligibility; they never use SOGNI fallback.
1116
-
1117
- ## Persona System
1118
-
1119
- Personas are named people with saved reference photos and optional voice clips. They enable identity-preserving generation across sessions.
1120
-
1121
- ### Managing Personas
1122
-
1123
- ```bash
1124
- # Add a persona with a reference photo
1125
- sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --description "30s male, brown hair, brown eyes"
1126
-
1127
- # Add with voice clip for video voice cloning
1128
- sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip sarah-voice.webm --voice "warm alto with British accent"
1129
-
1130
- # List all personas
1131
- sogni-agent --persona-list --json
1132
-
1133
- # Resolve a persona by name, tag, or pronoun
1134
- sogni-agent --persona-resolve "me" --json
1135
-
1136
- # Generate using a persona (auto-injects photo as context)
1137
- sogni-agent --persona "Mark" -o ./hero.png "superhero in dramatic lighting"
1138
-
1139
- # Remove a persona
1140
- sogni-agent --persona-remove "Mark"
1141
- ```
1142
-
1143
- ### Persona Pipeline Rules
1144
-
1145
- When a user mentions a persona by explicit saved name, id, or tag/alias:
1146
-
1147
- 1. **For images:** Use `--persona "Name" "prompt"` which auto-injects the persona's reference photo as context and selects the Qwen editing model
1148
- 2. **For video with voice cloning:** The persona's voice clip is used as `--reference-audio-identity` when `--video` is combined with `--persona`
1149
- 3. **For video without voice clip:** Describe the voice in the prompt ("speaks in a warm alto with a British accent")
1150
-
1151
- **Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by explicit name, id, or tag/alias. For ad-hoc photos, use `-c` (context image) directly.
1152
-
1153
- ## Memory System
1154
-
1155
- Memories are persistent key-value preferences stored locally at `~/.config/sogni/memories.json`.
1156
-
1157
- ```bash
1158
- # Save a preference
1159
- sogni-agent --memory-set preferred_style "watercolor and soft lighting"
1160
- sogni-agent --memory-set aspect_ratio "16:9"
1161
- sogni-agent --memory-set favorite_artist "Studio Ghibli"
1162
-
1163
- # Read all memories
1164
- sogni-agent --memory-list --json
1165
-
1166
- # Get one memory
1167
- sogni-agent --memory-get preferred_style --json
1168
-
1169
- # Delete a memory
1170
- sogni-agent --memory-remove preferred_style
1171
- ```
1172
-
1173
- **Agent behavior:** Before generating, check memories with `--memory-list` and respect saved preferences. If the user says "I always want watercolor style", save it with `--memory-set`. Categories: `preference` (default), `fact`, `context`.
1174
-
1175
- ## Personality (Custom Agent Instructions)
1176
-
1177
- Users can set custom instructions that shape agent behavior, stored at `~/.config/sogni/personality.txt`.
1178
-
1179
- ```bash
1180
- # Set personality
1181
- sogni-agent --personality-set "Be concise, always use cinematic lighting, suggest bold creative ideas"
1182
-
1183
- # Read current personality
1184
- sogni-agent --personality-get --json
1185
-
1186
- # Clear (reset to default)
1187
- sogni-agent --personality-clear
1188
- ```
1189
-
1190
- **Agent behavior:** Check personality on startup and adopt those instructions. Personality overrides default style but not hard constraints (safety, tool usage rules).
1191
-
1192
- ## Style Transfer
1193
-
1194
- Apply artistic styles to existing images:
1195
-
1196
- ```bash
1197
- # Apply a named artist style
1198
- sogni-agent -c photo.jpg -o ./styled.png "Apply style: Andy Warhol pop art with bold primary colors"
1199
-
1200
- # Studio Ghibli transformation
1201
- sogni-agent -c photo.jpg -o ./ghibli.png "Apply style: Studio Ghibli watercolor with soft pastel sky and lush greenery"
1202
-
1203
- # For photos with people, always preserve identity
1204
- sogni-agent -c portrait.jpg -o ./styled.png "Apply style: oil painting in the style of Vermeer. Preserve all facial features, expressions, and identity."
1205
- ```
1206
-
1207
- **Tips:** Reference artists and styles BY NAME for best results. Use positive phrasing. For photos with people, always append identity preservation instructions.
1208
-
1209
- ## Change Angle (Novel View Synthesis)
1210
-
1211
- Generate a photo from a different camera angle:
1212
-
1213
- ```bash
1214
- # 3/4 view
1215
- sogni-agent --multi-angle -c subject.jpg --azimuth front-right "same subject"
1216
-
1217
- # Side view
1218
- sogni-agent --multi-angle -c subject.jpg --azimuth left --elevation eye-level --distance medium "same subject"
1219
-
1220
- # Full 360 turntable
1221
- sogni-agent --angles-360 -c subject.jpg "same subject"
1222
- ```
1223
-
1224
- **User term mapping:**
1225
- - "from the left" / "side view" → `--azimuth left`
1226
- - "3/4 view" / "three-quarter" → `--azimuth front-right`
1227
- - "from behind" / "back" → `--azimuth back`
1228
- - "looking up at" → `--elevation low-angle`
1229
- - "bird's eye" / "top-down" → `--elevation high-angle`
1230
- - "closeup" → `--distance close-up`
1231
-
1232
- ## Creative Workflow Patterns
1233
-
1234
- ### After Image Generation - Suggest Next Steps:
1235
- - "Animate into a video" → `--video --ref <result>`
1236
- - "Apply a different style" → `-c <result> "Apply style: ..."`
1237
- - "Change the angle" → `--multi-angle -c <result>`
1238
- - "Generate variations" → `-n 3 "{style1|style2|style3}"`
1239
- - "Refine at higher quality" → use `-Q pro`
1240
-
1241
- ### After Video Generation - Suggest Next Steps:
1242
- - "Try different motion" → re-generate with adjusted prompt
1243
- - "Add dialogue" → include spoken words in the LTX-2.3 prompt
1244
- - "Make it longer" → increase `--duration`
1245
- - "Combine videos" → `--concat-videos`
1246
- - "Add one soundtrack over stitched clips" → `--concat-videos ... --concat-audio <audio>`
1247
- - "Use a section of a source video/audio" → `--video-start`, `--audio-start`, and `--audio-duration`
1248
-
1249
- ### Music-to-Video Pipeline:
1250
- 1. Use the provided/generated audio file as `--ref-audio`
1251
- 2. If there is also a reference image, omit `--workflow` and let the CLI auto-select LTX 2.3 `ia2v`
1252
- 3. If there is no reference image, omit `--workflow` and let the CLI auto-select LTX 2.3 `a2v`
1253
- 4. Use `--workflow s2v` only for explicit face lip-sync with a face image
1254
- 5. If only part of the song/audio should drive the clip, pass `--audio-start <sec>` and optionally `--audio-duration <sec>`
1255
-
1256
- ### Multi-Persona Scene:
1257
- 1. Resolve all personas: `--persona-resolve "Mark" --json` and `--persona-resolve "Sarah" --json`
1258
- 2. Generate scene with both: `-c mark-photo.jpg -c sarah-photo.jpg "Mark and Sarah at a cafe, use face from picture 1 for Mark, face from picture 2 for Sarah"`
1259
- 3. Animate with one persona's voice identity: `--video --ref <scene.png> --reference-audio-identity <mark-voice.webm> "MARK: \"Exact spoken words.\""`
283
+ Uses Spark tokens from the user's Sogni account. 512x512 images are most cost-efficient. `-n` is safety-capped at 16 outputs per call (`SOGNI_MAX_COUNT` raises it deliberately). Seedance and GPT Image 2 are vendor models requiring Premium Spark eligibility.
1260
284
 
1261
285
  ## Troubleshooting
1262
286
 
1263
- - **Auth errors**: Check `SOGNI_API_KEY` or the API key in `~/.config/sogni/credentials`
1264
- - **i2v sizing gotchas**: Video sizes are model-specific. WAN uses min 480px, max 1536px, divisible by 16. LTX uses divisible-by-64 dimensions, and the current wrapper caps non-WAN video dimensions at 2048px on the long side. For i2v, the client wrapper resizes the reference (`fit: inside`) and uses the resized dimensions as the final video size. Because this uses rounding, a requested size can still yield an invalid final size.
1265
- - **Auto-adjustment**: With a local `--ref`, the script will auto-adjust the requested size to avoid resized reference dimensions that miss the model divisor.
1266
- - **If the script adjusts your size but you want to fail instead**: pass `--strict-size` and it will print a suggested `--width/--height`.
1267
- - **Timeouts**: Try a faster model or increase `-t` timeout
1268
- - **No workers**: Check https://sogni.ai for network status
287
+ - **Anything broken?** Run `sogni-agent doctor` first. It checks Node, credentials (and file permissions), config-dir writability, ffmpeg, live auth, and version freshness, with a fix in every failure detail.
288
+ - **Auth errors:** check `SOGNI_API_KEY` or `~/.config/sogni/credentials` (key from https://dashboard.sogni.ai, account menu).
289
+ - **Video size errors:** sizes are model-specific (WAN ÷16 min 480 max 1536; LTX ÷64, long side ≤2048). The CLI auto-adjusts for local refs; `--strict-size` makes it fail with a suggested size instead. Details in [`references/models.md`](./references/models.md).
290
+ - **Timeouts:** try a faster model or raise `-t`.
291
+ - **No workers:** check https://sogni.ai for network status.
292
+
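The video sizing rules in the troubleshooting list (WAN dimensions divisible by 16 between 480 and 1536; LTX dimensions divisible by 64 with the long side at most 2048) can be illustrated with a little integer arithmetic. The sketch below is only an illustration of those constraints and makes no claim about how the CLI's own auto-adjustment works: `snap_dim` is a hypothetical helper, and the LTX minimum of 64 is an assumption, since the text gives no lower bound for LTX.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sogni-agent): snap a requested dimension
# to the nearest multiple of a model's divisor, then clamp it into [min, max].
# Usage: snap_dim <value> <divisor> <min> <max>
snap_dim() {
  local value=$1 divisor=$2 min=$3 max=$4
  # Round to the nearest multiple of the divisor using integer arithmetic.
  local snapped=$(( (value + divisor / 2) / divisor * divisor ))
  # Smallest and largest multiples of the divisor inside the allowed range.
  local lo=$(( (min + divisor - 1) / divisor * divisor ))
  local hi=$(( max / divisor * divisor ))
  if (( snapped < lo )); then snapped=$lo; fi
  if (( snapped > hi )); then snapped=$hi; fi
  echo "$snapped"
}

snap_dim 1080 64 64 2048    # LTX short side for "1080p" (min 64 is assumed); prints 1088
snap_dim 1000 16 480 1536   # WAN dimension; prints 1008
```

This also shows why the high-res routing rule pairs `-w 1920` with `-h 1088` rather than 1080: 1920 is already a multiple of 64, while 1080 is not.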
293
+ ## Reference Index (read before acting)
294
+
295
+ | Read this | When the task involves |
296
+ |-----------|------------------------|
297
+ | [`references/video-prompting.md`](./references/video-prompting.md) | Writing any LTX video prompt; "hd/1080p/4k" requests; orientation/aspect mapping; camera language |
298
+ | [`references/video-editing.md`](./references/video-editing.md) | Animate between images, continue/bridge videos, 360 turnarounds, concat, audio remix/layering, v2v ControlNet |
299
+ | [`references/hosted-api.md`](./references/hosted-api.md) | `--api-chat`, `--durable-chat`, `--api-workflow`, workflow templates, replays, Seedance reference modes, cost controls |
300
+ | [`references/models.md`](./references/models.md) | Choosing models, sizing/divisibility rules, gpt-image-2 limits, music model options |
301
+ | [`references/personas-memory.md`](./references/personas-memory.md) | Persona CRUD/voice cloning, multi-persona scenes, memories, personality, style transfer, photo restoration |
302
+ | [`references/openclaw-config.md`](./references/openclaw-config.md) | OpenClaw plugin config defaults and overrides |
303
+ | [`skills/README.md`](./skills/README.md) | Hosted per-skill tool surface (for hosts that load focused capability subsets) |