@sogni-ai/sogni-creative-agent-skill 2.3.0 → 3.1.0-alpha.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -276,18 +276,18 @@ sogni-agent --api-chat --task-profile reasoning --no-thinking \
  sogni-agent --list-api-models
 
  # Durable hosted workflow (/v1/creative-agent/workflows)
- sogni-agent --api-workflow image-to-video \
+ sogni-agent --api-workflow \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "A graphite robot sketch on a drafting table"
 
  # Durable workflow with a media reference and a cost ceiling
- sogni-agent --api-workflow image-to-video --ref https://cdn.example.com/sketch.png \
+ sogni-agent --api-workflow --ref https://cdn.example.com/sketch.png \
  --workflow-max-cost 25 --confirm-cost \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "Animate the referenced sketch"
 
- # Shared CreativeWorkflowPlan -> API compiles to hosted sequence
- sogni-agent --api-workflow creative-plan --workflow-input @plan.json
+ # Exact durable workflow input
+ sogni-agent --api-workflow --workflow-input @workflow.json
 
  # Storyline -> GPT Image 2 storyboard sheet -> Seedance video sequence
  sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
@@ -297,6 +297,13 @@ sogni-agent --api-workflow storyboard-video --storyboard-frames 6 --duration 12
  sogni-agent --list-replays 20
  sogni-agent --get-replay run_abc123 --json
 
+ # Opt in to SDK transport for hosted operations (durable workflows + chat).
+ # Validates restEndpoint/socketEndpoint via the skill's SSRF guard, then
+ # calls sogni.workflows.* / .chat.completions.* directly.
+ # Falls back to the legacy SSRF-validated fetch path when the env is unset.
+ export SOGNI_SKILL_USE_SDK_TRANSPORT=1
+ sogni-agent --api-workflow storyboard-video "10s neon city flyover"
+
  # Local segment + concat with external soundtrack
  sogni-agent --video --workflow v2v --ref-video dance.mp4 \
  --video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
@@ -333,15 +340,15 @@ Run `sogni-agent --help` for the full CLI. Below are the options and tables most
  | `--target-resolution <px>` | Target the short side, preserving aspect ratio |
  | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
  | `--api-chat` | Use `/v1/chat/completions` with Sogni creative-agent tools |
- | `--api-workflow <kind>` | Start a `/v1/creative-agent/workflows` durable workflow: `image-to-video`, `hosted-tool-sequence`, `creative-plan`, or `storyboard-video` |
- | `--workflow-input <json\|path\|@path>` | Explicit hosted workflow input JSON |
+ | `--api-workflow` | Start a `/v1/creative-agent/workflows` durable workflow with explicit `input.steps`; optional `storyboard-video` preset |
+ | `--workflow-input <json\|@path>` | Explicit durable workflow input JSON. Use `@path` to load JSON from a file. |
  | `--workflow-max-cost <n>`, `--confirm-cost`, `--no-confirm-cost` | Set durable workflow capacity ceiling and explicit cost confirmation |
  | `--storyboard-frames <n>` | Beat count for `--api-workflow storyboard-video` |
- | `--video-prompt`, `--negative-prompt`, `--generate-audio`, `--expand-prompt` | Durable image-to-video workflow inputs |
+ | `--video-prompt`, `--negative-prompt`, `--generate-audio`, `--expand-prompt` | Generated-keyframe durable workflow step controls |
  | `--watch-workflow`, `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Manage durable workflows |
  | `--api-tools <mode>`, `--no-api-tool-execution`, `--llm-model <id>`, `--task-profile <profile>`, `--max-tokens <n>`, `--thinking` / `--no-thinking`, `--api-base-url <url>` | Tune hosted API requests |
  | `--list-api-models`, `--get-api-model <id>` | Inspect Sogni Intelligence LLM models |
- | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|path\|@path>` | Manage Sogni Intelligence replay records |
+ | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|@path>` | Manage Sogni Intelligence replay records (use `@path` to load JSON from a file) |
  | `--persona <name>` | Use a saved persona |
  | `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
  | `--last`, `--last-image` | Inspect last render / reuse last image as context or video reference |
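
The `<json|@path>` argument form introduced in this hunk accepts either inline JSON or an `@`-prefixed file path. The following is a minimal sketch of how such an argument could be resolved; it is not the skill's actual parser. `resolveJsonArg` and its `readFile` parameter are hypothetical names, and CommonJS `require` is used only to keep the sketch short.

```javascript
// Hypothetical helper: illustrates the "<json|@path>" convention only.
const fs = require("node:fs");

function resolveJsonArg(arg, readFile = (p) => fs.readFileSync(p, "utf8")) {
  if (typeof arg !== "string" || arg.trim() === "") {
    throw new Error("expected inline JSON or @path");
  }
  // A leading "@" means: load the JSON text from the file that follows.
  const isFile = arg.startsWith("@");
  const text = isFile ? readFile(arg.slice(1)) : arg;
  try {
    return JSON.parse(text);
  } catch (err) {
    const source = isFile ? arg : "inline argument";
    throw new Error(`invalid JSON in ${source}: ${err.message}`);
  }
}

// Inline JSON is parsed directly.
const inline = resolveJsonArg('{"title":"demo","steps":[]}');
// "@workflow.json" goes through the reader (stubbed here, so no file is needed).
const fromFile = resolveJsonArg("@workflow.json", () => '{"steps":[{"id":"a"}]}');
console.log(inline.title, fromFile.steps.length);
```

Injecting the reader keeps the sketch testable without touching disk; a real CLI would likely also report which path failed to load.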
@@ -501,12 +508,12 @@ Hosted API modes require `SOGNI_API_KEY`.
 
  - **`--api-chat`** targets `/v1/chat/completions` with Sogni creative-agent tools — best for text-first natural-language workflows. The CLI sanitizes prompt-injection markers before forwarding messages and can use the current server-side creative-agent media tools, including video extension, segment replacement, overlays, subtitles, stitch/orbit/dance composition, and generated artifact indexing. Tune with `--api-tools creative-agent|creative-tools|none`, `--no-api-tool-execution`, `--llm-model`, and `--system`.
  - **Sogni Intelligence controls** include `--task-profile general|coding|reasoning`, `--max-tokens`, and `--thinking` / `--no-thinking`, which forward to `/v1/chat/completions` as `task_profile`, `max_tokens`, and `chat_template_kwargs.enable_thinking`. Use `--list-api-models` or `--get-api-model <id>` to inspect `/v1/models`.
- - **`--api-workflow`** targets `/v1/creative-agent/workflows` for durable, async workflow records with event streaming and cancellation. Supported kinds: `image-to-video`, `hosted-tool-sequence`, `creative-plan`, and `storyboard-video`.
- - **`--api-workflow creative-plan`** forwards a shared `CreativeWorkflowPlan` JSON object (`{ title?, steps: [...] }`) to the API as `kind: "creative_plan"`. Compilation, hosted-tool argument validation, and persistence happen in `../sogni-api` through `@sogni/creative-agent`; the public skill does not duplicate that compiler. Use this when you need exact shared-plan behavior such as repeated `replace_video_segment` steps with `replacementStartSeconds` / `replacementEndSeconds` for interleaved video slices.
+ - **`--api-workflow`** targets `/v1/creative-agent/workflows` for durable, async workflow records with event streaming and cancellation. Requests carry `input.steps` plus snake_case controls such as `token_type`, `media_references`, `max_estimated_capacity_units`, and `confirm_cost`.
+ - **`--workflow-input`** forwards exact durable workflow JSON (`{ title?, steps: [...] }`). Use this when you need exact multi-step behavior such as repeated `replace_video_segment` steps with `replacementStartSeconds` / `replacementEndSeconds` for interleaved video slices.
  - **`--api-workflow storyboard-video`** generates a storyline, creates a single GPT Image 2 storyboard sheet, then passes that artifact into Seedance as the video reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low/medium/high quality for that storyboard sheet.
  - **Media references** from `-c`, `--ref`, `--ref-end`, `--ref-audio`, `--reference-audio-identity`, and `--ref-video` are forwarded as `media_references` metadata in hosted API requests. API chat also attaches image refs as vision inputs. Local file references are uploaded to Sogni media storage first, then forwarded as retrievable URLs so durable executors do not depend on `data:` URI support. Durable workflow JSON can bind those references into step arguments with `sourceStepId: "$input_media"`. Use direct CLI mode for private media that must not leave the local machine.
  - **Cost controls** use `--workflow-max-cost <n>` to reject workflow starts above a capacity-unit ceiling, and `--confirm-cost` / `--no-confirm-cost` to forward explicit billing confirmation.
- - Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, and `--cancel-workflow`. Use `--workflow-input` to provide exact hosted workflow JSON.
+ - Manage runs with `--watch-workflow`, `--workflow-events`, `--stream-workflow`, `--list-workflows`, `--get-workflow`, and `--cancel-workflow`. Use `--workflow-input` to provide exact durable workflow JSON.
  - **Replay records** use `/v1/replay/records`: `--list-replays [limit]`, `--get-replay <runId>`, and `--ingest-replay <json|path|@path>` expose redacted RunRecord storage for Sogni Intelligence replay/debug viewers.
 
  Override the API origin with `--api-base-url`, `SOGNI_API_BASE_URL`, or `SOGNI_REST_ENDPOINT`.
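
The new README text says durable workflow requests carry `input.steps` plus snake_case controls such as `media_references`, `max_estimated_capacity_units`, and `confirm_cost`. The sketch below shows one way such a request body might be assembled. Only the control names, `replace_video_segment`, `replacementStartSeconds` / `replacementEndSeconds`, and `sourceStepId: "$input_media"` come from the diff; the function name `buildWorkflowRequest`, the placement of the controls at the top level, the mapping of the cost ceiling onto `max_estimated_capacity_units`, and the step keys `id`, `tool`, and `arguments` are all assumptions made for illustration.

```javascript
// Hypothetical request builder. Field placement and step keys are assumptions,
// not the documented API contract.
function buildWorkflowRequest({ title, steps, mediaReferences = [], maxCost, confirmCost }) {
  if (!Array.isArray(steps) || steps.length === 0) {
    throw new Error("workflow input needs a non-empty steps array");
  }
  const request = { input: { steps } };
  if (title) request.input.title = title;
  // Optional controls are only attached when the caller supplied them.
  if (mediaReferences.length > 0) request.media_references = mediaReferences;
  if (maxCost !== undefined) request.max_estimated_capacity_units = maxCost;
  if (confirmCost !== undefined) request.confirm_cost = confirmCost;
  return request;
}

// Two interleaved slice replacements. "id", "tool", and "arguments" are
// illustrative key names only.
const request = buildWorkflowRequest({
  title: "Interleave slices",
  steps: [
    { id: "slice_a", tool: "replace_video_segment",
      arguments: { sourceStepId: "$input_media", replacementStartSeconds: 2, replacementEndSeconds: 4 } },
    { id: "slice_b", tool: "replace_video_segment",
      arguments: { sourceStepId: "slice_a", replacementStartSeconds: 6, replacementEndSeconds: 8 } },
  ],
  mediaReferences: [{ url: "https://cdn.example.com/sketch.png" }],
  maxCost: 25,
  confirmCost: true,
});
console.log(JSON.stringify(request, null, 2));
```

Check the real step schema against the hosted API before relying on any of these key names.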
package/SKILL.md CHANGED
@@ -36,7 +36,7 @@ metadata:
 
  Generate **images, videos, and music** using Sogni AI's decentralized GPU network.
 
- > **Per-skill view**: hosts that want to load focused capabilities rather than this monolith can read [`skills/README.md`](./skills/README.md) for the per-skill index — one markdown file per skill (`image_generation`, `image_editing`, `video_generation`, `video_editing`, `music_generation`, `media_analysis`, `persona_management`, `app_settings`, plus the always-loaded `quality_audit`, `session_control`, `asset_reference_management`). Each file mirrors the canonical manifest in `@sogni/creative-agent`. The whole-monolith load below stays the default for OpenClaw / Claude Code / Hermes Agent / Manus AI integrations.
+ > **Per-skill view**: hosts that want to load focused capabilities rather than this monolith can read [`skills/README.md`](./skills/README.md) for the per-skill index — one markdown file per skill (`image_generation`, `image_editing`, `video_generation`, `video_editing`, `music_generation`, `media_analysis`, `persona_management`, `app_settings`, `composition_planning`, plus the always-loaded `quality_audit`, `session_control`, `asset_reference_management`). Each file mirrors the canonical manifest in `@sogni/creative-agent`. The whole-monolith load below stays the default for OpenClaw / Claude Code / Hermes Agent / Manus AI integrations.
 
  ## Install Request Policy
 
@@ -124,13 +124,33 @@ Path override environment variables:
  - `SOGNI_MEDIA_INBOUND_DIR`
  - `OPENCLAW_CONFIG_PATH`
 
- ## Usage (Images, Video & Music)
+ ## Recommended path: route through the hosted Sogni Intelligence endpoints
+
+ For any natural-language creative request — anything that should be planned, multi-step, or that benefits from tool selection, repair, or durable workflows — prefer the hosted endpoints over the direct-to-SDK flags. The hosted endpoints are the canonical home for tool dispatch, Structured Contracts v1 (gating policies, repair recipes, prompt contracts), durable workflows, replay, and asset-manifest mapping. They stay aligned with `sogni-chat` and the rest of the `@sogni/creative-agent` consumers automatically.
+
+ ```bash
+ # Natural-language creative request (LLM picks the tool, dispatches, repairs)
+ node sogni-agent.mjs --api-chat "Turn the attached product photo into a launch poster" --ref product.jpg
+
+ # Multi-step durable workflow (resumable, replay-friendly, server-orchestrated)
+ node sogni-agent.mjs --api-workflow \
+ --video-prompt "The camera slowly pushes in" \
+ "A graphite robot sketch on a drafting table"
+
+ # Storyboard → keyframe → Seedance, all server-side
+ node sogni-agent.mjs --api-workflow storyboard-video --storyboard-frames 6 -Q hq \
+ "Create a 9:16 bakery launch video with a neon street-window reveal"
+ ```
+
+ The direct-to-SDK flags below remain available for explicit one-shot generation when you already know the exact model, dimensions, and prompt and don't need LLM planning. Use them when latency or cost rules out the LLM round-trip.
+
+ ## Usage (direct-to-SDK image, video & music)
 
  ```bash
  # Generate and get URL
  node sogni-agent.mjs "a cat wearing a hat"
 
- # Quality presets (recommended — auto-selects model, steps, and size)
+ # Quality presets (recommended for direct mode — auto-selects model, steps, and size)
  node sogni-agent.mjs -Q fast "a cat wearing a hat" # z_image_turbo, 8 steps, 512x512 (~5-10s)
  node sogni-agent.mjs -Q hq "a cat wearing a hat" # z_image_turbo, default steps, 768x768 (~10-15s)
  node sogni-agent.mjs -Q pro "a cat wearing a hat" # flux2_dev, 40 steps, 1024x1024 (~2min)
@@ -179,20 +199,20 @@ node sogni-agent.mjs --api-chat --task-profile reasoning --no-thinking \
  node sogni-agent.mjs --list-replays 20
  node sogni-agent.mjs --get-replay run_abc123 --json
 
- # Durable API workflow: async image-to-video with resumable workflow record
- node sogni-agent.mjs --api-workflow image-to-video \
+ # Durable API workflow: generated keyframe to video with resumable workflow record
+ node sogni-agent.mjs --api-workflow \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "A graphite robot sketch on a drafting table"
 
  # Durable API workflow with media reference and cost controls
- node sogni-agent.mjs --api-workflow image-to-video \
+ node sogni-agent.mjs --api-workflow \
  --ref https://cdn.example.com/sketch.png \
  --workflow-max-cost 25 --confirm-cost \
  --video-prompt "The camera slowly pushes in as the sketch comes alive" \
  "Animate the referenced sketch"
 
- # Shared CreativeWorkflowPlan: API compiles and validates through @sogni/creative-agent
- node sogni-agent.mjs --api-workflow creative-plan --workflow-input @plan.json
+ # Exact durable workflow input with explicit steps
+ node sogni-agent.mjs --api-workflow --workflow-input @workflow.json
 
  # Durable storyboard-video workflow: storyline -> GPT Image 2 storyboard -> Seedance
  node sogni-agent.mjs --api-workflow storyboard-video --storyboard-frames 6 --duration 12 -Q hq \
@@ -204,13 +224,12 @@ Sogni API's OpenAI-compatible `/v1/chat/completions` tool loop. This path
  sanitizes prompt-injection markers before forwarding messages and uses the
  current hosted creative-agent tool surface. Use `--api-workflow` when the caller
  already knows it wants an async durable workflow record under
- `/v1/creative-agent/workflows`. Use `--api-workflow creative-plan` when the
- caller already has a shared `CreativeWorkflowPlan`; the skill forwards it as
- `kind: "creative_plan"` and lets Sogni API compile, validate, and persist it
- through `@sogni/creative-agent`. This is the preferred hosted path for exact
- multi-step plans, including repeated `replace_video_segment` operations with
- `replacementStartSeconds` / `replacementEndSeconds` when interleaving existing
- video slices. Use `--api-workflow storyboard-video`
+ `/v1/creative-agent/workflows`. Use `--workflow-input @workflow.json` when the
+ caller already has exact durable workflow input with `steps`; the skill forwards
+ that body to the API as-is. This is the preferred hosted path for
+ exact multi-step plans, including repeated `replace_video_segment` operations
+ with `replacementStartSeconds` / `replacementEndSeconds` when interleaving
+ existing video slices. Use `--api-workflow storyboard-video`
  when the caller wants the hosted sequence to generate a storyline, create one GPT
  Image 2 storyboard sheet, and feed that image artifact into Seedance as the video
  reference. The `-Q fast|hq|pro` preset maps to GPT Image 2 low|medium|high
@@ -233,6 +252,16 @@ viewers.
  Hosted API modes require `SOGNI_API_KEY`; this skill's CLI uses API-key
  authentication.
 
+ For durable hosted chat runs (long-running multi-tool turns that should
+ survive a client disconnect), the SDK now exposes
+ `sogni.chat.runs.{create, get, cancel, streamEvents}`.
+ Set `SOGNI_SKILL_USE_SDK_TRANSPORT=1` to route hosted workflow + chat
+ operations through the SDK transport instead of the legacy
+ SSRF-validated fetch path. The skill's `sogni-hosted-client.mjs`
+ factory still validates `restEndpoint` / `socketEndpoint` against the
+ SSRF guard before constructing the SDK client, so the safety contract
+ holds.
+
  When changing hosted API chat/workflow behavior, keep reusable validation,
  workflow compilation, repair-control, and guard telemetry logic in
  `../sogni-creative-agent` first. The public skill should consume generated or
@@ -335,17 +364,26 @@ positions.
  | `--max-tokens <n>` | Max hosted chat completion tokens | 1600 |
  | `--thinking`, `--no-thinking` | Toggle `chat_template_kwargs.enable_thinking` for hosted chat | server default |
  | `--list-api-models`, `--get-api-model <id>` | Inspect Sogni Intelligence LLM model metadata | - |
- | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|path\|@path>` | Manage Sogni Intelligence replay RunRecords | - |
- | `--api-workflow <kind>` | Start durable workflow: image-to-video\|hosted-tool-sequence\|creative-plan\|storyboard-video | - |
- | `--workflow-input <json\|path\|@path>` | Workflow input JSON for hosted tool sequences/custom starts | - |
- | `--workflow-title <text>` | Title for hosted-tool-sequence, creative-plan, or storyboard-video workflow input | - |
+ | `--list-replays [n]`, `--get-replay <id>`, `--ingest-replay <json\|@path>` | Manage Sogni Intelligence replay RunRecords. List/get output is run through `redactRunRecord` from `@sogni/creative-agent/replay` before printing, so signed URLs, bearer tokens, JWTs, and PEM blocks cannot leak via the CLI. Use `@path` to load JSON from a file. | - |
+ | `--skip-redact`, `--no-redact` | Bypass the replay redactor on `--list-replays` / `--get-replay`. Debug-only — emits unredacted RunRecord payloads. | redacted |
+ | `--turn-classify` | Print the public-skill turn policy (`visibleTools`, `forbiddenTools`, `requiredTools`) the default contract runtime would produce for the current session-state flags. Mirrors the chat / `/v1/chat/completions` Structured Contracts v1 pipeline. | - |
+ | `--compile-tools` | Print the per-turn compiled tool surface (filtered tool list + prompt-contract fragments) the default contract runtime emits. | - |
+ | `--dispatch-tool <name>` | Print the dispatch verdict (`allowed`, `mode`, repair recipe, suggested args) the default contract runtime would return for a tool call. Combine with `--tool-args` to supply arguments. | - |
+ | `--tool-args <json>` | JSON arguments for `--dispatch-tool`. | `{}` |
+ | `--storyboard-plan` | Build a storyboard project from the prompt locally (`buildStoryboardProject` + per-model adapter compilation via `compileForModel`) and print the plan as JSON. Does not call the network. Expects scene-structured prompt input (`SCENE NN - Title` / `VISUAL:` / `ACTION:` / `CAMERA:` / `AUDIO/SFX:` blocks) — for casual prompts, use `--api-workflow storyboard-video` instead, which runs an LLM storyline expansion first. Pair with `--storyboard-plan-frames`, `--storyboard-plan-model`, `--storyboard-plan-stage`. | - |
+ | `--storyboard-plan-frames <n>` | Frame count for `--storyboard-plan`. | inferred |
+ | `--storyboard-plan-model <id>` | Adapter target for `--storyboard-plan` (seedance, seedance2, gpt-image-2, ltx23, wan). | inferred |
+ | `--storyboard-plan-stage <stage>` | Compilation stage for `--storyboard-plan` (storyboard_image, scene_clip). | storyboard_image |
+ | `--api-workflow` | Start a durable workflow with explicit `input.steps`; optional `storyboard-video` preset | - |
+ | `--workflow-input <json\|@path>` | Durable workflow input JSON. Use `@path` to load from a file. | - |
+ | `--workflow-title <text>` | Title for generated or storyboard durable workflow input | - |
  | `--workflow-max-cost <n>` | Reject hosted workflow starts above this estimated capacity-unit ceiling | - |
  | `--confirm-cost`, `--no-confirm-cost` | Forward explicit hosted workflow cost confirmation | - |
  | `--storyboard-frames <n>` | Beat count for storyboard-video workflow | - |
- | `--video-prompt <text>` | Motion prompt for durable image-to-video workflow | - |
- | `--negative-prompt <text>` | Negative prompt for durable image-to-video workflow | - |
- | `--generate-audio`, `--no-generate-audio` | Toggle audio generation for durable image-to-video | - |
- | `--expand-prompt`, `--no-expand-prompt` | Toggle prompt expansion for durable image-to-video | - |
+ | `--video-prompt <text>` | Motion prompt for generated-keyframe durable workflow | - |
+ | `--negative-prompt <text>` | Negative prompt for generated-keyframe durable workflow | - |
+ | `--generate-audio`, `--no-generate-audio` | Toggle audio generation for generated video steps | - |
+ | `--expand-prompt`, `--no-expand-prompt` | Toggle prompt expansion for generated video steps | - |
  | `--watch-workflow` | Stream durable workflow events after start | - |
  | `--list-workflows`, `--get-workflow <id>`, `--workflow-events <id>`, `--stream-workflow <id>`, `--cancel-workflow <id>` | Durable workflow management helpers | - |
  | `--api-base-url <url>` | Sogni API base for hosted API modes. Credentials are only sent to `https://api.sogni.ai` by default; use `SOGNI_API_ALLOWED_HOSTS` for trusted custom hosts or `SOGNI_ALLOW_UNSAFE_API_BASE_URL=1` for isolated local testing. | https://api.sogni.ai |
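
The `--api-base-url` row describes a host allowlist for credentialed requests. A rough sketch of that kind of check follows; it is not the skill's SSRF guard. The function name `isCredentialSafeBaseUrl` is hypothetical, and treating `SOGNI_API_ALLOWED_HOSTS` as a comma-separated list of hostnames is an assumption, since the diff does not state its format. A real guard would presumably also reject private and loopback addresses, which this sketch does not attempt.

```javascript
// Hypothetical allowlist check; the env var format is assumed, not documented here.
const DEFAULT_API_HOST = "api.sogni.ai";

function isCredentialSafeBaseUrl(baseUrl, env = {}) {
  // Explicit escape hatch for isolated local testing.
  if (env.SOGNI_ALLOW_UNSAFE_API_BASE_URL === "1") return true;

  let url;
  try {
    url = new URL(baseUrl);
  } catch {
    return false; // Not a parseable absolute URL.
  }
  if (url.protocol !== "https:") return false;

  // Assumed format: comma-separated hostnames, e.g. "staging.example.com,api.internal".
  const extraHosts = (env.SOGNI_API_ALLOWED_HOSTS || "")
    .split(",")
    .map((host) => host.trim().toLowerCase())
    .filter(Boolean);
  const allowed = new Set([DEFAULT_API_HOST, ...extraHosts]);

  // URL.hostname is already lowercased and excludes userinfo and port.
  return allowed.has(url.hostname);
}

console.log(isCredentialSafeBaseUrl("https://api.sogni.ai"));
console.log(isCredentialSafeBaseUrl("https://api.sogni.ai.evil.example"));
console.log(isCredentialSafeBaseUrl("https://staging.example.com", {
  SOGNI_API_ALLOWED_HOSTS: "staging.example.com",
}));
```

Comparing the parsed `hostname` rather than the raw string is what keeps lookalike URLs such as `https://api.sogni.ai@evil.example` from matching.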