@sogni-ai/sogni-creative-agent-skill 2.0.0

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Mauvis Ledford
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,385 @@
1
+ <p align="center">
2
+ <img src="https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/docs/screenshot.jpg" alt="Telegram image render workflow" width="320" />
3
+ </p>
4
+
5
+ # Sogni Creative Agent Skill: Creative AI Superpowers for All AI Agent Runtimes
6
+
7
+ **Sogni Creative Agent Skill** gives any AI agent runtime — Claude Code, Claude Desktop,
8
+ [OpenClaw](https://github.com/OpenClaw/OpenClaw),
9
+ [Hermes Agent](https://hermes-agent.nousresearch.com/),
10
+ [Manus AI](https://manus.im), and more — image generation, video generation, and
11
+ creative-media tools powered by [Sogni AI](https://sogni.ai)'s decentralized GPU
12
+ network.
13
+
14
+ Drop it into the setup you already have:
15
+ - as a standalone Node.js CLI
16
+ - as a skill source for **Hermes Agent**, **Manus AI**, and other agent frameworks
17
+ - as an [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin
18
+ - optionally as an [MCP server](https://modelcontextprotocol.io/) for **Claude Code**, **Claude Desktop**, and any MCP-compatible agent
19
+
20
+ For generic install requests, use the CLI plus skill setup by default. Configure MCP only when specifically requested.
21
+
22
+ With Sogni Creative Agent Skill, an agent can:
23
+ - generate images from prompts
24
+ - edit and restyle existing images
25
+ - create videos from text, images, audio, or reference video
26
+ - save personas, preferences, and last-render state across sessions
27
+ - check balances, list models, and refine previous results
28
+
29
+ ## Quick Start
30
+
31
+ 1. Create Sogni credentials once. See [Setup](#setup).
32
+ 2. Install the command-line tool:
33
+
34
+ ```bash
35
+ npm install -g @sogni-ai/sogni-creative-agent-skill
36
+ sogni-agent --version
37
+ ```
38
+
39
+ 3. Point your agent/runtime at this repository's [`SKILL.md`](./SKILL.md).
40
+
41
+ Then ask your agent to do something simple, for example:
42
+ - "Generate an image of a sunset over mountains"
43
+ - "Edit this image to add a rainbow"
44
+ - "Make a video of a cat playing piano"
45
+ - "Turn my selfie into James Bond using photobooth"
46
+ - "Refine the last image at higher quality"
47
+
48
+ ## Installation
49
+
50
+ Default install behavior: when someone asks to install Sogni Creative Agent Skill, install the command-line tool and skill source. Do not configure MCP unless they specifically ask for MCP, Claude MCP, or Claude Desktop integration.
51
+
52
+ ```bash
53
+ npm install -g @sogni-ai/sogni-creative-agent-skill
54
+ sogni-agent --version
55
+ ```
56
+
57
+ Then point the agent/runtime at this repository's [`SKILL.md`](./SKILL.md). MCP setup is optional and documented separately below.
58
+
59
+ ### OpenClaw Plugin
60
+
61
+ For the published plugin:
62
+
63
+ ```bash
64
+ openclaw plugins install sogni-creative-agent-skill
65
+ ```
66
+
67
+ The installed plugin loads its behavior from [`SKILL.md`](./SKILL.md) via [`openclaw.plugin.json`](./openclaw.plugin.json).
68
+
69
+ For a local checkout that you want to update continuously, link the minimal OpenClaw surface instead of the repository root:
70
+
71
+ ```bash
72
+ cd /path/to/sogni-creative-agent-skill
73
+ npm install
74
+ npm link
75
+ npm run openclaw:sync
76
+ openclaw plugins install -l "$PWD/.openclaw-link"
77
+ openclaw gateway restart
78
+ ```
79
+
80
+ To update that linked install later:
81
+
82
+ ```bash
83
+ cd /path/to/sogni-creative-agent-skill
84
+ git pull --ff-only
85
+ npm install
86
+ npm link
87
+ npm run openclaw:sync
88
+ openclaw gateway restart
89
+ ```
90
+
91
+ Do not run `openclaw plugins install -l "$PWD"` from the repository root. The root contains development tests that use `child_process`, and OpenClaw correctly blocks those during plugin safety scanning. The generated `.openclaw-link/` directory is only for OpenClaw; Hermes, Manus, and other skill-based agents should continue using the root [`SKILL.md`](./SKILL.md).
92
+
93
+ ### Hermes Agent / Manus / Other Frameworks
94
+
95
+ Point the agent to this repository's [`SKILL.md`](./SKILL.md) for behavior guidance and [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/sogni-creative-agent-skill/main/llm.txt) for install/setup help. By default, the agent should invoke the globally installed `sogni-agent` CLI. Configure MCP only when specifically requested.
96
+
97
+ ### Manual Installation
98
+
99
+ ```bash
100
+ git clone git@github.com:Sogni-AI/sogni-creative-agent-skill.git
101
+ cd sogni-creative-agent-skill
102
+ npm install
103
+ ```
104
+
105
+ ### Advanced OpenClaw Config
106
+
107
+ When loaded through OpenClaw, Sogni Creative Agent Skill reads plugin defaults from OpenClaw config. CLI flags always override those defaults.
108
+
109
+ The supported config shape is defined in [`openclaw.plugin.json`](./openclaw.plugin.json). Common overrides include default models, video workflow models, token type, seed strategy, timeouts, and media paths. If your OpenClaw config lives elsewhere, set `OPENCLAW_CONFIG_PATH`.
110
+
111
+ ## Setup
112
+
113
+ 1. Create a Sogni account at https://app.sogni.ai/
114
+ 2. Create a credentials file:
115
+
116
+ ```bash
117
+ mkdir -p ~/.config/sogni
118
+ cat > ~/.config/sogni/credentials << 'EOF'
119
+ SOGNI_API_KEY=your_api_key
120
+ # or:
121
+ # SOGNI_USERNAME=your_username
122
+ # SOGNI_PASSWORD=your_password
123
+ EOF
124
+ chmod 600 ~/.config/sogni/credentials
125
+ ```
126
+
127
+ You can also skip the file and set `SOGNI_API_KEY`, or `SOGNI_USERNAME` + `SOGNI_PASSWORD`, in your environment.
128
+
129
+ ### Filesystem Paths and Overrides
130
+
131
+ Defaults live under `~/.config/sogni/` for credentials, last-render metadata, personas, memories, and personality. Advanced path overrides are available through `SOGNI_CREDENTIALS_PATH`, `SOGNI_LAST_RENDER_PATH`, `SOGNI_MEDIA_INBOUND_DIR`, `OPENCLAW_CONFIG_PATH`, and MCP-specific download settings.
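As a quick pre-flight check, the documented credential sources can be probed before invoking the CLI. The sketch below is illustrative only: the function name `sogni_cred_source` and the order in which sources are checked (environment first, then file) are assumptions, not the CLI's confirmed resolution order.

```bash
# Report which documented credential source is present.
# NOTE: the precedence (env before file) is an assumption for illustration.
sogni_cred_source() {
  cred_file=${SOGNI_CREDENTIALS_PATH:-$HOME/.config/sogni/credentials}
  if [ -n "${SOGNI_API_KEY:-}" ]; then
    echo "env:api-key"
  elif [ -n "${SOGNI_USERNAME:-}" ] && [ -n "${SOGNI_PASSWORD:-}" ]; then
    echo "env:username-password"
  elif [ -f "$cred_file" ]; then
    echo "file:$cred_file"
  else
    echo "none"
    return 1
  fi
}

sogni_cred_source || echo "no Sogni credentials found" >&2
```

A non-zero return lets a wrapper script stop early with a clear message instead of waiting for the first render to fail.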
132
+
133
+ ## Claude Code and Claude Desktop MCP (Optional)
134
+
135
+ This section is only for users who specifically want MCP. For a generic install request, use the command-line tool plus skill setup above.
136
+
137
+ ### Claude Code (one command)
138
+
139
+ ```bash
140
+ claude mcp add sogni -- npx -y -p @sogni-ai/sogni-creative-agent-skill sogni-agent-mcp
141
+ ```
142
+
143
+ ### Claude Desktop
144
+
145
+ On macOS, add the following to `~/Library/Application Support/Claude/claude_desktop_config.json`:
146
+
147
+ ```json
148
+ {
149
+ "mcpServers": {
150
+ "sogni": {
151
+ "command": "npx",
152
+ "args": ["-y", "-p", "@sogni-ai/sogni-creative-agent-skill", "sogni-agent-mcp"]
153
+ }
154
+ }
155
+ }
156
+ ```
157
+
158
+ Restart Claude Desktop after saving.
159
+
160
+ The MCP server exposes the same core image/video/edit/persona/balance workflows as the CLI, plus local helpers such as last-frame extraction and video stitching.
161
+
162
+ ## Usage
163
+
164
+ ```bash
165
+ # Image generation
166
+ sogni-agent -Q hq -o dragon.png "a dragon eating tacos"
167
+
168
+ # Edit an image
169
+ sogni-agent -c subject.jpg "add a neon cyberpunk glow"
170
+
171
+ # Photobooth face transfer
172
+ sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
173
+
174
+ # Text-to-video (t2v)
175
+ sogni-agent --video "A narrator says \"welcome to the story\" as ocean waves crash"
176
+
177
+ # Short-side targeting preserves the current shape without forcing landscape
178
+ sogni-agent --video --target-resolution 768 \
179
+ "A calm cinematic shot of lanterns drifting across a night lake"
180
+
181
+ # Seedance 2.0 explicit aliases (4-15s vendor video path)
182
+ sogni-agent --video -m seedance2 --duration 8 \
183
+ "A polished product reveal with native ambient sound"
184
+
185
+ # Image-to-video (i2v)
186
+ sogni-agent --video --ref cat.jpg "gentle camera pan"
187
+
188
+ # Image+audio-to-video (auto-routes to LTX 2.3 ia2v)
189
+ sogni-agent --video --ref cover.jpg --ref-audio song.mp3 \
190
+ "music video with synchronized motion"
191
+
192
+ # Persona or voice identity with LTX native audio
193
+ sogni-agent --video --reference-audio-identity voice.webm \
194
+ "NARRATOR: \"This is my voice.\""
195
+
196
+ # Segment a source video, then stitch clips locally with an external soundtrack
197
+ sogni-agent --video --workflow v2v --ref-video dance.mp4 \
198
+ --video-start 10 --duration 8 --controlnet-name pose -o /tmp/clip-2.mp4 \
199
+ "robot dancing"
200
+ sogni-agent --concat-videos /tmp/final.mp4 /tmp/clip-1.mp4 /tmp/clip-2.mp4 \
201
+ --concat-audio song.mp3 --concat-audio-start 0
202
+
203
+ # Balances and help
204
+ sogni-agent --balance
205
+ sogni-agent --help
206
+ ```
207
+
208
+ For local multi-clip workflows, prefer the built-in FFmpeg wrappers over raw shell commands. `--video-start`, `--audio-start`, and `--audio-duration` let you generate focused segments, while `--concat-videos` can stitch them and optionally mux a single soundtrack with `--concat-audio`.
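The segment-then-stitch flow can be planned as a dry run before spending tokens. The following sketch only prints commands and never runs them; `plan_clips` is a hypothetical helper, and the flag combination simply mirrors the usage examples above (including the `pose` ControlNet, `/tmp` output paths, and `song.mp3`). Whether a shorter final segment is accepted by a given model is not confirmed here.

```bash
# Print (do not run) one v2v command per segment, then the concat command.
# usage: plan_clips <source-video> <total-seconds> <clip-seconds> <prompt>
plan_clips() {
  src=$1; total=$2; clip=$3; prompt=$4
  [ "$clip" -gt 0 ] || return 1       # avoid an endless loop on a zero step
  start=0; i=1; clips=""
  while [ "$start" -lt "$total" ]; do
    remaining=$(( total - start ))
    if [ "$remaining" -lt "$clip" ]; then dur=$remaining; else dur=$clip; fi
    echo "sogni-agent --video --workflow v2v --ref-video $src --video-start $start --duration $dur --controlnet-name pose -o /tmp/clip-$i.mp4 \"$prompt\""
    clips="$clips /tmp/clip-$i.mp4"
    start=$(( start + clip ))
    i=$(( i + 1 ))
  done
  echo "sogni-agent --concat-videos /tmp/final.mp4$clips --concat-audio song.mp3 --concat-audio-start 0"
}

plan_clips dance.mp4 24 8 "robot dancing"
```

For a 24-second source and 8-second clips this prints three segment commands starting at 0, 8, and 16 seconds, followed by one stitch command.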
209
+
210
+ V2V defaults mirror the Sogni Chat workflow tuning: `canny`, `pose`, and `depth` use ControlNet strength `0.85` with detailer assist, while `detailer` uses strength `1.0`. Use `-m seedance2-v2v` for Seedance V2V without ControlNet.
211
+
212
+ ## LTX-2.3 Prompting Guide
213
+
214
+ When you use `ltx23-22b-fp8_t2v_distilled`, do not feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
215
+
216
+ - Write one unbroken paragraph with no line breaks, bullets, headers, or tag blocks.
217
+ - Use 4-8 flowing present-tense sentences describing one continuous shot, not a montage.
218
+ - Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
219
+ - Keep characters and objects concrete and stable. Describe one main action thread from start to finish.
220
+ - If the user wants dialogue, include the exact spoken words in double quotes with the speaker and delivery identified inline.
221
+ - Express mood through visible behavior, motion, and sound cues instead of vague adjectives.
222
+ - Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and generic filler words like "beautiful" or "nice".
223
+ - Match scene density to clip length. For the default short clips, describe one main beat rather than several unrelated actions.
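A few of the rules above are mechanical enough to check before submitting a prompt. The sketch below is a rough heuristic, not part of `sogni-agent`: `lint_ltx_prompt` is a hypothetical helper, sentences are approximated by counting `.`, `!`, and `?` characters, and only the two filler words named in the guide are flagged.

```bash
# Heuristic lint for an LTX-2.3 prompt. Prints one warning per problem and
# returns non-zero if any check fails.
# usage: lint_ltx_prompt <prompt>
lint_ltx_prompt() {
  prompt=$1
  status=0

  # One unbroken paragraph: the prompt should occupy a single line.
  lines=$(( $(printf '%s\n' "$prompt" | wc -l) ))
  if [ "$lines" -gt 1 ]; then
    echo "warn: prompt contains line breaks"
    status=1
  fi

  # 4-8 sentences, counted roughly as the number of . ! ? characters.
  sentences=$(( $(printf '%s' "$prompt" | tr -cd '.!?' | wc -c) ))
  if [ "$sentences" -lt 4 ] || [ "$sentences" -gt 8 ]; then
    echo "warn: found $sentences sentence terminators, expected 4-8"
    status=1
  fi

  # Filler words the guide calls out by name.
  if printf '%s\n' "$prompt" | grep -qiwE 'beautiful|nice'; then
    echo "warn: prompt uses filler words (beautiful, nice)"
    status=1
  fi

  return "$status"
}
```

A short tag prompt such as `cinematic drone shot over tropical cliffs` fails the sentence check, which matches the guidance to expand it into a dense scene description first. Decimal numbers and ellipses will inflate the sentence count, so treat the warnings as hints.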
224
+
225
+ Example rewrite:
226
+
227
+ ```text
228
+ User ask: "make a 4k video of a woman in a neon alley"
229
+
230
+ LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
231
+ ```
232
+
233
+ ## Photobooth (Face Transfer)
234
+
235
+ Generate stylized portraits from a face photo using InstantID ControlNet:
236
+
237
+ ```bash
238
+ sogni-agent --photobooth --ref face.jpg "80s fashion portrait"
239
+ sogni-agent --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
240
+ ```
241
+
242
+ Photobooth uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Photobooth cannot be combined with `--video` or `-c/--context`.
243
+
244
+ - Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA.
245
+ - `--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with FFmpeg for a seamless loop.
246
+ - `--balance` / `--balances` does not require a prompt and exits after printing current `SPARK` and `SOGNI` balances.
247
+
248
+ ## Video Sizing Rules (Aspect Ratios)
249
+
250
+ - WAN models use dimensions divisible by 16, min 480px, max 1536px.
251
+ - LTX family models (`ltx2-*`, `ltx23-*`) use dimensions divisible by 64. LTX 2.3 supports 640px to 3840px.
252
+ - The script auto-normalizes video sizes to satisfy those constraints.
253
+ - Use `--target-resolution <px>` for bare resolution requests such as "720p" when the user did not specify exact pixels. It targets the short side and preserves the inherited aspect ratio.
254
+ - For i2v (and any workflow using `--ref` / `--ref-end`), the client wrapper resizes the reference image with a strict aspect-fit (`fit: inside`) and then uses the *resized reference dimensions* as the final video size. Because that resize uses rounding, a “valid” requested size can still produce an invalid final size (example: `1024x1536` requested, but ref becomes `1024x1535`).
255
+ - `sogni-agent` detects this for local refs and will auto-adjust the requested size to a nearby safe size so the resized reference matches the model divisor.
256
+ - If you want the script to fail instead of auto-adjusting, pass `--strict-size` and it will print a suggested size.
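The divisor and rounding rules above can be illustrated with integer arithmetic. This is a minimal sketch, not the script's actual normalization code: `normalize_dim` and `fit_inside` are hypothetical helpers, the limits are assumed to apply per dimension, and round-to-nearest is an assumption about how the reference resize rounds.

```bash
# Snap a dimension to the nearest multiple of the model divisor, then clamp it
# into the model's range. Assumes min and max are themselves multiples of the
# divisor, which holds for the limits quoted above.
# usage: normalize_dim <px> <divisor> <min> <max>
normalize_dim() {
  px=$1; div=$2; min=$3; max=$4
  snapped=$(( (px + div / 2) / div * div ))
  if [ "$snapped" -lt "$min" ]; then snapped=$min; fi
  if [ "$snapped" -gt "$max" ]; then snapped=$max; fi
  echo "$snapped"
}

# Aspect-fit a source image inside a requested box, rounding the scaled side to
# the nearest pixel. Prints "<width>x<height>".
# usage: fit_inside <src_w> <src_h> <box_w> <box_h>
fit_inside() {
  src_w=$1; src_h=$2; box_w=$3; box_h=$4
  if [ $(( box_w * src_h )) -le $(( box_h * src_w )) ]; then
    # Width is the limiting side, so height is scaled by box_w / src_w.
    echo "${box_w}x$(( (src_h * box_w + src_w / 2) / src_w ))"
  else
    # Height is the limiting side, so width is scaled by box_h / src_h.
    echo "$(( (src_w * box_h + src_h / 2) / src_h ))x${box_h}"
  fi
}

normalize_dim 1000 16 480 1536     # WAN-style limits, prints 1008
normalize_dim 1000 64 640 3840     # LTX 2.3-style limits, prints 1024
fit_inside 2000 2998 1024 1536     # hypothetical 2000x2998 reference image
```

The last call prints `1024x1535`: the requested `1024x1536` box is valid for a divisor of 16, but the fitted height is not, which is the failure mode the auto-adjustment and `--strict-size` are meant to handle.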
257
+
258
+ ## Error Reporting
259
+
260
+ Failures exit with a non-zero code and write a human-readable message to stderr. Add `--json` when an agent needs structured success/error output.
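A calling script can rely on that exit-code contract without parsing any output. The wrapper below is a generic sketch: `run_checked` is a hypothetical helper that works with any command, and it makes no assumption about the shape of the `--json` payload.

```bash
# Run a command, pass stdout through, and on failure report the exit code
# together with whatever the command wrote to stderr.
# usage: run_checked <command> [args...]
run_checked() {
  err_file=$(mktemp)
  if "$@" 2>"$err_file"; then
    rm -f "$err_file"
    return 0
  else
    code=$?
    echo "command failed with exit code $code: $(cat "$err_file")" >&2
    rm -f "$err_file"
    return "$code"
  fi
}

# Example (not executed here):
#   run_checked sogni-agent --json -o dragon.png "a dragon eating tacos"
```

The wrapper returns the original exit code, so callers can still branch on it.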
261
+
262
+ ## Options
263
+
264
+ Run `sogni-agent --help` for the complete CLI. These are the options most agents should reach for first:
265
+
266
+ | Option | Use |
267
+ |--------|-----|
268
+ | `-Q fast\|hq\|pro` | Pick image quality without memorizing model IDs |
269
+ | `-o <path>` | Save output locally |
270
+ | `-c <path>` | Provide image context for edits |
271
+ | `--video` | Generate video instead of image |
272
+ | `--ref`, `--ref-audio`, `--ref-video` | Provide image/audio/video references |
273
+ | `--target-resolution <px>` | Target the short side while preserving aspect ratio |
274
+ | `--workflow <type>` | Force `t2v`, `i2v`, `s2v`, `ia2v`, `a2v`, `v2v`, or animate workflows |
275
+ | `--persona <name>` | Use a saved persona reference |
276
+ | `--concat-videos <out> <clips...>` | Stitch clips locally with FFmpeg |
277
+ | `--json` | Return structured output for agents |
278
+
279
+ ### Quality Presets
280
+
281
+ Instead of remembering model IDs, use `--quality` / `-Q` to auto-select the right model, steps, and dimensions:
282
+
283
+ | Preset | Model | Steps | Size | Speed |
284
+ |--------|-------|-------|------|-------|
285
+ | `fast` | z_image_turbo_bf16 | 8 | 512x512 | ~5-10s |
286
+ | `hq` | z_image_turbo_bf16 | default | 768x768 | ~10-15s |
287
+ | `pro` | flux2_dev_fp8 | 40 | 1024x1024 | ~2min |
288
+
289
+ Explicit `--model` overrides the quality preset's model. Explicit `-w`/`-h` overrides dimensions.
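The override rule can be read as a simple lookup followed by two optional replacements. The sketch below restates the table in shell form; `resolve_render` is a hypothetical helper, and keeping the preset's step count when the model is overridden is an assumption rather than confirmed CLI behavior.

```bash
# Resolve a quality preset, then let explicit values win.
# usage: resolve_render <preset> [explicit-model] [explicit-size]
resolve_render() {
  case $1 in
    fast) model=z_image_turbo_bf16; steps=8;       size=512x512 ;;
    hq)   model=z_image_turbo_bf16; steps=default; size=768x768 ;;
    pro)  model=flux2_dev_fp8;      steps=40;      size=1024x1024 ;;
    *)    echo "unknown preset: $1" >&2; return 1 ;;
  esac
  # Explicit values override the preset, mirroring --model and -w/-h.
  model=${2:-$model}
  size=${3:-$size}
  echo "$model $steps $size"
}

resolve_render pro
resolve_render fast "" 768x512
```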
290
+
291
+ ### Dynamic Prompt Variations
292
+
293
+ Generate diverse images in a single call using `{option1|option2|option3}` syntax:
294
+
295
+ ```bash
296
+ # Generates 3 images: "a red car", "a blue car", "a green car"
297
+ sogni-agent -n 3 "a {red|blue|green} car"
298
+
299
+ # Multiple variation groups cycle independently
300
+ sogni-agent -n 4 "a {cat|dog} in a {garden|kitchen}"
301
+ # → "a cat in a garden", "a dog in a kitchen", "a cat in a garden", "a dog in a kitchen"
302
+ ```
303
+
304
+ Options cycle sequentially per image. Without `{...}` syntax, `-n` generates multiple images from the same prompt.
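To preview which prompt each image will receive, the cycling rule can be reproduced locally. This is a sketch of the documented behavior rather than the CLI's own parser: `expand_prompt` is a hypothetical helper, it takes a 0-based image index, and nested braces are not handled.

```bash
# Expand every {a|b|c} group in a prompt for image number <index> (0-based).
# Each group picks option (index mod option-count), so groups cycle independently.
# usage: expand_prompt <index> <prompt>
expand_prompt() {
  index=$1
  rest=$2
  out=""
  lb='{'; rb='}'
  while :; do
    case $rest in
      *"$lb"*"$rb"*) ;;         # a "{...}" group remains
      *) break ;;
    esac
    head=${rest%%"$lb"*}        # text before the first "{"
    out=$out$head
    rest=${rest#*"$lb"}         # drop everything through that "{"
    group=${rest%%"$rb"*}       # the options, e.g. "cat|dog"
    rest=${rest#*"$rb"}         # continue after the closing "}"
    old_ifs=$IFS
    IFS='|'
    set -f                      # no glob expansion while splitting
    set -- $group
    set +f
    IFS=$old_ifs
    [ "$#" -gt 0 ] || continue  # empty group: emit nothing
    shift $(( index % $# ))
    out=$out$1
  done
  printf '%s\n' "$out$rest"
}

for i in 0 1 2 3; do
  expand_prompt "$i" "a {cat|dog} in a {garden|kitchen}"
done
```

The loop prints the same four prompts listed in the comment of the example above, alternating between the cat/garden and dog/kitchen pairings.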
305
+
306
+ ### Token Auto-Fallback
307
+
308
+ Use `--token-type auto` to automatically retry with SOGNI tokens if SPARK balance is insufficient:
309
+
310
+ ```bash
311
+ sogni-agent --token-type auto "a dragon eating tacos"
312
+ ```
313
+
314
+ This tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
315
+
316
+ ### Personas
317
+
318
+ Personas are named people with saved reference photos and optional voice clips, used for identity-preserving generation:
319
+
320
+ ```bash
321
+ # Add a persona
322
+ sogni-agent --persona-add "Mark" --ref face.jpg --relationship self --description "30s male, brown hair"
323
+
324
+ # Add with voice clip for video voice cloning
325
+ sogni-agent --persona-add "Sarah" --ref sarah.jpg --relationship partner --voice-clip voice.webm
326
+
327
+ # Generate an image using a persona (auto-injects photo as context)
328
+ sogni-agent --persona "Mark" -o hero.png "superhero in dramatic lighting"
329
+
330
+ # Generate video using a persona photo plus saved voice identity
331
+ sogni-agent --video --persona "Sarah" "SARAH: \"This is my voice.\""
332
+
333
+ # List / remove
334
+ sogni-agent --persona-list
335
+ sogni-agent --persona-remove "Mark"
336
+ ```
337
+
338
+ Personas are stored at `~/.config/sogni/personas/`. Pronouns such as "me" or "myself" auto-resolve to the `self` persona, and relationship phrases resolve the same way (for example, "my wife" resolves to `partner`).
339
+
340
+ ### Memory (Persistent Preferences)
341
+
342
+ Save preferences that agents respect across sessions:
343
+
344
+ ```bash
345
+ sogni-agent --memory-set preferred_style "watercolor and soft lighting"
346
+ sogni-agent --memory-set aspect_ratio "16:9"
347
+ sogni-agent --memory-list
348
+ sogni-agent --memory-remove preferred_style
349
+ ```
350
+
351
+ Memories are stored at `~/.config/sogni/memories.json`.
352
+
353
+ ### Personality (Custom Agent Instructions)
354
+
355
+ Set how the agent should behave:
356
+
357
+ ```bash
358
+ sogni-agent --personality-set "Be concise, always use cinematic lighting"
359
+ sogni-agent --personality-get
360
+ sogni-agent --personality-clear
361
+ ```
362
+
363
+ The personality is stored at `~/.config/sogni/personality.txt`.
364
+
365
+ ## Models
366
+
367
+ Prefer `-Q fast|hq|pro` for images and automatic workflow routing for video. Only pass `-m` when you need a specific model family.
368
+
369
+ | Need | Recommended model or alias |
370
+ |------|----------------------------|
371
+ | Default images | `z_image_turbo_bf16` |
372
+ | Highest quality images | `flux2_dev_fp8` or `-Q pro` |
373
+ | Image editing | `qwen_image_edit_2511_fp8_lightning` |
374
+ | Photobooth face transfer | `coreml-sogniXLturbo_alpha1_ad` |
375
+ | Text-to-video with native dialogue/audio | `ltx23-22b-fp8_t2v_distilled` |
376
+ | Image+audio-to-video | `ltx23-22b-fp8_ia2v_distilled` |
377
+ | Audio-to-video | `ltx23-22b-fp8_a2v_distilled` |
378
+ | Video-to-video with ControlNet | `ltx23-22b-fp8_v2v_distilled` |
379
+ | Seedance text-to-video | `seedance2` or `seedance2-fast` |
380
+ | Seedance video-to-video without ControlNet | `seedance2-v2v` |
381
+ | Face lip-sync with uploaded audio | `wan_v2.2-14b-fp8_s2v_lightx2v` |
382
+
383
+ ## License
384
+
385
+ MIT