sogni-gen 1.5.13 → 1.5.15
- package/README.md +31 -8
- package/SKILL.md +74 -5
- package/desktop-extension/manifest.json +1 -1
- package/desktop-extension/server/package.json +1 -1
- package/llm.txt +11 -182
- package/openclaw.plugin.json +1 -1
- package/package.json +1 -1
- package/sogni-gen.mjs +1 -1
package/README.md
CHANGED
|
@@ -14,10 +14,10 @@ Works as:
|
|
|
14
14
|
## Quick Start (OpenClaw + Manus)
|
|
15
15
|
|
|
16
16
|
1. Create Sogni credentials (one-time): see [Setup](#setup).
|
|
17
|
-
2. For OpenClaw,
|
|
17
|
+
2. For OpenClaw, install the plugin:
|
|
18
18
|
|
|
19
|
-
```
|
|
20
|
-
|
|
19
|
+
```bash
|
|
20
|
+
openclaw plugins install sogni-gen
|
|
21
21
|
```
|
|
22
22
|
|
|
23
23
|
3. For Manus AI agent, point it to this repository:
|
|
@@ -36,16 +36,18 @@ Then ask your agent:
|
|
|
36
36
|
|
|
37
37
|
## OpenClaw Installation (Recommended)
|
|
38
38
|
|
|
39
|
-
### Quick Install (URL)
|
|
40
|
-
|
|
41
|
-
Point OpenClaw to the [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/openclaw-sogni-gen/main/llm.txt). This is the fastest setup path.
|
|
42
|
-
|
|
43
39
|
### Plugin Install
|
|
44
40
|
|
|
45
41
|
```bash
|
|
46
42
|
openclaw plugins install sogni-gen
|
|
47
43
|
```
|
|
48
44
|
|
|
45
|
+
The installed plugin loads its behavior from [`SKILL.md`](./SKILL.md) via [`openclaw.plugin.json`](./openclaw.plugin.json).
|
|
46
|
+
|
|
47
|
+
### Optional Install Helper
|
|
48
|
+
|
|
49
|
+
[`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/openclaw-sogni-gen/main/llm.txt) is now only a lightweight install/setup helper. It is not the primary behavior source for the installed OpenClaw plugin.
|
|
50
|
+
|
|
49
51
|
### Manual Installation
|
|
50
52
|
|
|
51
53
|
```bash
|
|
@@ -261,7 +263,7 @@ node sogni-gen.mjs --video --workflow a2v --ref-audio song.mp3 \
|
|
|
261
263
|
|
|
262
264
|
# LTX-2.3 text-to-video
|
|
263
265
|
node sogni-gen.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
|
|
264
|
-
"cinematic
|
|
266
|
+
"A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
|
|
265
267
|
|
|
266
268
|
# Animate (motion transfer)
|
|
267
269
|
node sogni-gen.mjs --video --ref subject.jpg --ref-video motion.mp4 \
|
|
@@ -272,6 +274,27 @@ node sogni-gen.mjs --video --estimate-video-cost --steps 20 \
|
|
|
272
274
|
-m wan_v2.2-14b-fp8_t2v_lightx2v "ocean waves at sunset"
|
|
273
275
|
```
|
|
274
276
|
|
|
277
|
+
## LTX-2.3 Prompting Guide
|
|
278
|
+
|
|
279
|
+
When you use `ltx23-22b-fp8_t2v_distilled`, do not feed it short tag prompts like `"cinematic drone shot over tropical cliffs"`. LTX-2.3 renders more reliably from a dense natural-language scene description.
|
|
280
|
+
|
|
281
|
+
- Write one unbroken paragraph with no line breaks, bullets, headers, or tag blocks.
|
|
282
|
+
- Use 4-8 flowing present-tense sentences describing one continuous shot, not a montage.
|
|
283
|
+
- Start with shot scale and scene identity, then cover environment, time of day, textures, and named light sources.
|
|
284
|
+
- Keep characters and objects concrete and stable. Describe one main action thread from start to finish.
|
|
285
|
+
- If the user wants dialogue, weave it into the prose with the speaker and delivery identified inline.
|
|
286
|
+
- Express mood through visible behavior, motion, and sound cues instead of vague adjectives.
|
|
287
|
+
- Use positive phrasing. Avoid script formatting, negative prompts, on-screen text/logo requests, and generic filler words like "beautiful" or "nice".
|
|
288
|
+
- Match scene density to clip length. For the default short clips, describe one main beat rather than several unrelated actions.
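The structural rules in the list above (one unbroken paragraph, 4-8 sentences) can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `checkLtxPromptShape` that is not part of `sogni-gen`; the sentence splitter is a rough heuristic and does not judge prose quality.

```javascript
// Hypothetical helper (not part of sogni-gen): checks only the structural
// rules from the prompting guide, not the quality of the prose.
function checkLtxPromptShape(prompt) {
  const problems = [];

  // Rule: one unbroken paragraph with no line breaks.
  if (/[\r\n]/.test(prompt)) {
    problems.push("contains line breaks");
  }

  // Rough sentence count: split on ., ! or ? followed by whitespace or end of text.
  const sentences = prompt
    .split(/[.!?]+(?:\s+|$)/)
    .filter((s) => s.trim().length > 0);

  // Rule: 4-8 flowing sentences.
  if (sentences.length < 4 || sentences.length > 8) {
    problems.push(`has ${sentences.length} sentence(s), expected 4-8`);
  }
  return problems;
}

// The short tag prompt quoted in the guide fails the sentence-count rule.
console.log(checkLtxPromptShape("cinematic drone shot over tropical cliffs"));
```

An empty array means the prompt passed both checks; anything else lists what to fix before calling the CLI.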
|
|
289
|
+
|
|
290
|
+
Example rewrite:
|
|
291
|
+
|
|
292
|
+
```text
|
|
293
|
+
User ask: "make a 4k video of a woman in a neon alley"
|
|
294
|
+
|
|
295
|
+
LTX-2.3 prompt: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
|
|
296
|
+
```
|
|
297
|
+
|
|
275
298
|
## Photobooth (Face Transfer)
|
|
276
299
|
|
|
277
300
|
Generate stylized portraits from a face photo using InstantID ControlNet:
|
package/SKILL.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: sogni-gen
|
|
3
|
-
version: "1.5.
|
|
3
|
+
version: "1.5.15"
|
|
4
4
|
description: Generate images **and videos** using Sogni AI's decentralized network, with local credential/config files and optional local media inputs. Ask the agent to "draw", "generate", "create an image", or "make a video/animate" from a prompt or reference image.
|
|
5
5
|
homepage: https://sogni.ai
|
|
6
6
|
metadata:
|
|
@@ -147,7 +147,7 @@ node sogni-gen.mjs -q -o /tmp/cat.png "a cat wearing a hat"
|
|
|
147
147
|
| `-c, --context <path>` | Context image for editing | - |
|
|
148
148
|
| `--last-image` | Use last generated image as context/ref | - |
|
|
149
149
|
| `--video, -v` | Generate video instead of image | - |
|
|
150
|
-
| `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|v2v\|animate-move\|animate-replace) | inferred |
|
|
150
|
+
| `--workflow <type>` | Video workflow (t2v\|i2v\|s2v\|ia2v\|a2v\|v2v\|animate-move\|animate-replace) | inferred |
|
|
151
151
|
| `--fps <num>` | Frames per second (video) | 16 |
|
|
152
152
|
| `--duration <sec>` | Duration in seconds (video) | 5 |
|
|
153
153
|
| `--frames <num>` | Override total frames (video) | - |
|
|
@@ -404,7 +404,7 @@ node sogni-gen.mjs --video --workflow a2v --ref-audio song.mp3 \
|
|
|
404
404
|
|
|
405
405
|
# LTX-2.3 text-to-video
|
|
406
406
|
node sogni-gen.mjs --video -m ltx23-22b-fp8_t2v_distilled --duration 20 \
|
|
407
|
-
"cinematic
|
|
407
|
+
"A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
|
|
408
408
|
|
|
409
409
|
# Animate (motion transfer)
|
|
410
410
|
node sogni-gen.mjs --video --ref subject.jpg --ref-video motion.mp4 \
|
|
@@ -463,6 +463,58 @@ node {{skillDir}}/sogni-gen.mjs --json --list-media images
|
|
|
463
463
|
- If the user message includes the word "photobooth" (case-insensitive), always use `--photobooth` mode with `--ref` set to the user-provided face image.
|
|
464
464
|
- Prioritize this rule over generic image-edit flows (`-c`) for that request.
|
|
465
465
|
|
|
466
|
+
## LTX-2.3 Prompt Rule
|
|
467
|
+
|
|
468
|
+
Whenever the chosen video model is `ltx23-22b-fp8_t2v_distilled`, do not pass the user's short request through unchanged. Rewrite it into an LTX-2.3-safe prompt before calling `sogni-gen`.
|
|
469
|
+
|
|
470
|
+
- Output one single paragraph only. No line breaks, bullet points, section labels, tag lists, or screenplay formatting.
|
|
471
|
+
- Use 4-8 flowing present-tense sentences describing one continuous shot. No cuts, montage, or unrelated scene jumps.
|
|
472
|
+
- Start with shot scale plus the scene's visual identity, then describe environment, time of day, atmosphere, textures, and specific light sources.
|
|
473
|
+
- Keep people, clothing, props, and locations concrete and stable across the whole paragraph.
|
|
474
|
+
- Give the scene one main action thread from start to finish. Use connectors like `as`, `while`, and `then` so motion reads as a continuous filmed moment.
|
|
475
|
+
- If the user asks for dialogue, embed the spoken words inline as prose and identify who is speaking and how they deliver the line.
|
|
476
|
+
- Express emotion through visible physical cues such as posture, grip, jaw tension, breathing, or pacing. Ambient sound can be woven into the prose naturally.
|
|
477
|
+
- Use positive phrasing only. Do not add negative prompts, "no ..." clauses, on-screen text/logo requests, vague filler words like `beautiful` or `nice`, or structural markup such as `[DIALOGUE]`.
|
|
478
|
+
- Keep action density proportional to duration. For short clips, describe one main beat rather than several separate events.
|
|
479
|
+
- Preserve the user's request, but expand it into cinematic prose. Do not invent a different story just to make the prompt longer.
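The phrasing constraints in this rule (no filler words, no structural markup, no negative clauses) lend themselves to a quick lint pass before the prompt is sent. This is a sketch under stated assumptions: `findPhrasingIssues` is a hypothetical name, the filler list contains only the two words the rule itself names, and the regular expressions are heuristics that can produce false positives.

```javascript
// Hypothetical lint (not part of sogni-gen). The filler list holds only the
// two words named in the rule; extend it as needed.
const FILLER_WORDS = ["beautiful", "nice"];

function findPhrasingIssues(prompt) {
  const issues = [];

  for (const word of FILLER_WORDS) {
    if (new RegExp(`\\b${word}\\b`, "i").test(prompt)) {
      issues.push(`filler word: ${word}`);
    }
  }

  // Structural markup such as [DIALOGUE].
  if (/\[[A-Z_ ]+\]/.test(prompt)) {
    issues.push("bracketed markup");
  }

  // Negative clauses of the form "no <something>", e.g. "no logos".
  if (/\bno\s+\w+/i.test(prompt)) {
    issues.push('negative "no ..." clause');
  }
  return issues;
}

console.log(findPhrasingIssues("A beautiful beach [DIALOGUE] with no logos"));
```

The lint only flags candidates; rewriting the flagged phrase into positive, concrete prose is still left to the agent.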
|
|
480
|
+
|
|
481
|
+
### Duration-Aware Pacing
|
|
482
|
+
|
|
483
|
+
Match scene density to clip length so prompts stay filmable:
|
|
484
|
+
|
|
485
|
+
- About `1-4s`: describe exactly 1 action or moment.
|
|
486
|
+
- About `5-8s`: describe about 2 sequential actions.
|
|
487
|
+
- About `9-12s`: describe about 3 sequential actions.
|
|
488
|
+
- Longer clips: add only a small number of additional sequential beats. Do not turn the prompt into a montage or a full story arc unless the duration clearly supports it.
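The pacing buckets above amount to a small lookup from clip duration to the number of sequential actions. A minimal sketch, assuming a hypothetical function `suggestedActionCount`; the value returned for clips longer than 12 seconds is an assumption, since the rule only says to add "a small number" of beats.

```javascript
// Hypothetical helper (not part of sogni-gen): maps clip duration in seconds
// to the approximate number of sequential actions a prompt should describe.
function suggestedActionCount(durationSec) {
  if (durationSec <= 4) return 1; // about 1-4s: exactly one action or moment
  if (durationSec <= 8) return 2; // about 5-8s: about two sequential actions
  if (durationSec <= 12) return 3; // about 9-12s: about three sequential actions
  // Longer clips: the rule gives no exact number; 4 is an assumed ceiling.
  return 4;
}

console.log(suggestedActionCount(5)); // default --duration of 5 seconds
```

With the default `--duration` of 5 seconds this suggests two sequential actions.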
|
|
489
|
+
|
|
490
|
+
### Orientation Mapping
|
|
491
|
+
|
|
492
|
+
When the user explicitly asks for an orientation or aspect ratio, map it to safe LTX dimensions:
|
|
493
|
+
|
|
494
|
+
- `vertical`, `portrait`, `story`, `reel`, `tiktok` -> `-w 1088 -h 1920`
|
|
495
|
+
- `landscape`, `horizontal`, `widescreen`, `youtube`, `16:9` -> `-w 1920 -h 1088`
|
|
496
|
+
- `square`, `1:1` -> `-w 1088 -h 1088`
|
|
497
|
+
- `4:3 portrait` -> `-w 832 -h 1088`
|
|
498
|
+
- `4:3 landscape` -> `-w 1088 -h 832`
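The orientation table above can be expressed as an ordered keyword lookup that returns the `-w`/`-h` flags. The sketch below assumes a hypothetical `dimensionFlags` helper and whole-word matching; the more specific `4:3` entries are listed first so that `4:3 portrait` is not captured by the plain `portrait` keyword. All listed sizes are multiples of 64, which agrees with the divisibility note in the llm.txt text removed by this diff.

```javascript
// Hypothetical lookup (not part of sogni-gen). Order matters: the specific
// "4:3 ..." entries must be tested before the generic orientation keywords.
const LTX_DIMENSIONS = [
  { keywords: ["4:3 portrait"], width: 832, height: 1088 },
  { keywords: ["4:3 landscape"], width: 1088, height: 832 },
  { keywords: ["vertical", "portrait", "story", "reel", "tiktok"], width: 1088, height: 1920 },
  { keywords: ["landscape", "horizontal", "widescreen", "youtube", "16:9"], width: 1920, height: 1088 },
  { keywords: ["square", "1:1"], width: 1088, height: 1088 },
];

function dimensionFlags(request) {
  const text = request.toLowerCase();
  for (const entry of LTX_DIMENSIONS) {
    // Whole-word match so that "history" does not trigger "story".
    const hit = entry.keywords.some((k) => new RegExp(`\\b${k}\\b`).test(text));
    if (hit) {
      return ["-w", String(entry.width), "-h", String(entry.height)];
    }
  }
  return null; // no explicit orientation requested
}

console.log(dimensionFlags("make a vertical tiktok clip"));
```

A `null` result means the user did not ask for an orientation, so the caller keeps whatever default applies.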
|
|
499
|
+
|
|
500
|
+
### Camera Language Normalization
|
|
501
|
+
|
|
502
|
+
When the user uses loose camera language, translate it into concrete motion phrasing inside the prose prompt:
|
|
503
|
+
|
|
504
|
+
- `zoom in` -> `slow push-in`
|
|
505
|
+
- `zoom out` -> `slow pull-back`
|
|
506
|
+
- `pan left` / `pan right` -> `smooth pan left` / `smooth pan right`
|
|
507
|
+
- `orbit` / `circle around` -> `slow arc left` or `slow arc right`
|
|
508
|
+
- `follow` -> `tracking follow`
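The vocabulary mapping above can be illustrated as an ordered list of substitutions. This is only a sketch: `normalizeCameraLanguage` is a hypothetical name, the choice of `slow arc left` for `orbit` is arbitrary (the rule allows left or right), and a literal substitution does not replace the prose-level rewriting the rule asks for. It is also not idempotent, so it should be applied once to the user's raw wording.

```javascript
// Hypothetical substitution table (not part of sogni-gen).
const CAMERA_PHRASES = [
  [/\bzoom in\b/gi, "slow push-in"],
  [/\bzoom out\b/gi, "slow pull-back"],
  [/\bpan (left|right)\b/gi, "smooth pan $1"],
  // The rule allows "slow arc left" or "slow arc right"; left is an arbitrary pick.
  [/\b(?:orbit|circle around)\b/gi, "slow arc left"],
  [/\bfollow\b/gi, "tracking follow"],
];

function normalizeCameraLanguage(text) {
  // Apply each substitution in order to the running result.
  return CAMERA_PHRASES.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    text
  );
}

console.log(normalizeCameraLanguage("zoom in then pan left"));
// prints "slow push-in then smooth pan left"
```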
|
|
509
|
+
|
|
510
|
+
Short example:
|
|
511
|
+
|
|
512
|
+
```text
|
|
513
|
+
User ask: "4k video of a woman in a neon alley"
|
|
514
|
+
|
|
515
|
+
Use this shape instead: "A medium cinematic shot frames a woman in her 30s standing in a rain-soaked neon alley at night, violet and amber signs reflecting across the wet pavement while warm steam drifts from street vents. She wears a dark trench coat with damp strands of black hair clinging near her cheek as light glances across the fabric texture and the brick walls behind her. She turns toward the camera and steps forward with measured focus, one hand tightening around the strap of her bag while rain taps softly on the metal fire escape and a distant train hum rolls through the block. The camera performs a slow push-in as her jaw sets and her breathing steadies, maintaining smooth stabilized motion and a tense urban-thriller mood."
|
|
516
|
+
```
|
|
517
|
+
|
|
466
518
|
## Agent Usage
|
|
467
519
|
|
|
468
520
|
When user asks to generate/draw/create an image:
|
|
@@ -475,10 +527,16 @@ node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "user's prompt"
|
|
|
475
527
|
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/input.jpg -o /tmp/edited.png "make it pop art style"
|
|
476
528
|
|
|
477
529
|
# Generate video from image
|
|
478
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "camera
|
|
530
|
+
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "A medium shot holds on the subject in soft late-afternoon light as fabric edges and background details remain clear and stable. The camera performs a slow push-in while the subject shifts weight subtly and turns slightly toward the lens, keeping the motion gentle and continuous. Leaves rustle softly in the background and the scene maintains smooth cinematic movement with no abrupt action changes."
|
|
479
531
|
|
|
480
532
|
# Generate text-to-video
|
|
481
|
-
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "ocean waves at sunset"
|
|
533
|
+
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "A wide cinematic shot opens on ocean waves rolling toward a rocky shoreline at sunset, golden light spreading across the water while sea mist drifts through the air. Foam patterns form and recede over the dark sand as the horizon glows orange and pink in the distance. The camera glides forward in one continuous movement, holding smooth stabilized motion and calm environmental detail throughout the scene."
|
|
534
|
+
|
|
535
|
+
# HD / "4K" text-to-video: prefer LTX-2.3
|
|
536
|
+
node {{skillDir}}/sogni-gen.mjs -q --video -m ltx23-22b-fp8_t2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A wide cinematic aerial shot opens over a rugged ocean coastline at golden hour, warm sunlight catching the cliff faces while white surf breaks against dark rock below. Low sea mist hangs over the water and bands of foam trace the shoreline as gulls wheel through the distance. The camera glides forward in one continuous pass, revealing the curve of the coast while wet stone flashes with reflected light and the scene keeps smooth stabilized motion from start to finish. The overall mood feels expansive and polished, with crisp environmental detail and steady travel-film energy."
|
|
537
|
+
|
|
538
|
+
# HD / "4K" image-to-video: prefer LTX i2v
|
|
539
|
+
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -m ltx2-19b-fp8_i2v_distilled -w 1920 -h 1088 -o /tmp/video.mp4 "A medium cinematic shot holds on the scene with clean subject separation and stable environmental detail as directional light shapes the surfaces and background depth. The camera performs a slow push-in while the main subject makes one subtle continuous movement, keeping posture and identity consistent from start to finish. Ambient motion in the background stays gentle and the overall clip remains smooth, stabilized, and visually coherent."
|
|
482
540
|
|
|
483
541
|
# Photobooth: stylize a face photo
|
|
484
542
|
node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
|
|
@@ -492,6 +550,17 @@ node {{skillDir}}/sogni-gen.mjs --json --list-media images
|
|
|
492
550
|
# Then send via message tool with filePath
|
|
493
551
|
```
|
|
494
552
|
|
|
553
|
+
## High-Res Video Routing
|
|
554
|
+
|
|
555
|
+
When the user asks for video in **"hd"**, **"1080p"**, **"4k"**, **"uhd"**, or **"high-res"**, do not use the default WAN video models.
|
|
556
|
+
|
|
557
|
+
- For **text-to-video**, use `-m ltx23-22b-fp8_t2v_distilled`.
|
|
558
|
+
- For **image-to-video**, use `-m ltx2-19b-fp8_i2v_distilled`.
|
|
559
|
+
- Prefer LTX-sized dimensions such as `-w 1920 -h 1088`.
|
|
560
|
+
- If the user explicitly asks for `vertical`, `portrait`, `story`, `reel`, `tiktok`, `square`, or `4:3`, apply the matching dimensions from the **Orientation Mapping** rules instead of defaulting to 16:9.
|
|
561
|
+
- Rewrite the user's request using the **LTX-2.3 Prompt Rule** before invoking the command. Do not send short slogan-style prompts to LTX.
|
|
562
|
+
- Treat "4k" as a signal to use the highest practical LTX path exposed by this skill, even if the exact output is not literal 3840x2160.
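The routing rules above reduce to a keyword test followed by a model choice. The sketch below assumes a hypothetical helper `highResModelArgs`; the model names and the `-m`, `-w`, and `-h` flags come from this section, while the whole-token matching logic is an illustrative assumption. It returns the default 16:9 size and leaves orientation overrides and prompt rewriting to the other rules.

```javascript
// Hypothetical routing helper (not part of sogni-gen).
const HIGH_RES_TERMS = ["hd", "1080p", "4k", "uhd", "high-res"];

function highResModelArgs(request, hasReferenceImage) {
  const text = request.toLowerCase();

  // Match each term as a whole token, not as part of a longer word.
  const wantsHighRes = HIGH_RES_TERMS.some((term) =>
    new RegExp(`(^|[^a-z0-9])${term}([^a-z0-9]|$)`).test(text)
  );
  if (!wantsHighRes) return null; // keep the default video models

  // A reference image means image-to-video; otherwise text-to-video.
  const model = hasReferenceImage
    ? "ltx2-19b-fp8_i2v_distilled"
    : "ltx23-22b-fp8_t2v_distilled";
  return ["-m", model, "-w", "1920", "-h", "1088"];
}

console.log(highResModelArgs("make a 4k video of a woman in a neon alley", false));
```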
|
|
563
|
+
|
|
495
564
|
**Security:** Agents must use the CLI's built-in flags (`--extract-last-frame`, `--concat-videos`, `--list-media`) for all file operations and video manipulation. Never run raw shell commands (`ffmpeg`, `ls`, `cp`, etc.) directly.
|
|
496
565
|
|
|
497
566
|
## Animate Between Two Images (First-Frame / Last-Frame)
|
package/desktop-extension/manifest.json
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
{
|
|
2
2
|
"manifest_version": "0.3",
|
|
3
3
|
"name": "sogni-gen",
|
|
4
|
-
"version": "1.5.
|
|
4
|
+
"version": "1.5.15",
|
|
5
5
|
"display_name": "Sogni AI Image & Video Generation",
|
|
6
6
|
"description": "Generate images and videos using Sogni AI's decentralized GPU network",
|
|
7
7
|
"long_description": "Generate images, edit photos, create videos, and transfer faces using Sogni AI. Supports multiple models with different speed/quality tradeoffs. Uses Spark tokens — claim 50 free daily at https://app.sogni.ai/",
|
package/llm.txt
CHANGED
|
@@ -1,19 +1,12 @@
|
|
|
1
|
-
# sogni-gen
|
|
1
|
+
# sogni-gen OpenClaw Plugin
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
> Repo: https://github.com/Sogni-AI/openclaw-sogni-gen
|
|
5
|
-
|
|
6
|
-
## What It Does
|
|
7
|
-
|
|
8
|
-
Generates AI images and videos from text prompts or reference media. Users ask you to "draw", "generate", "create an image/video", or "animate" something and you produce it.
|
|
9
|
-
|
|
10
|
-
## Install
|
|
3
|
+
Install this plugin in OpenClaw:
|
|
11
4
|
|
|
12
5
|
```bash
|
|
13
6
|
openclaw plugins install sogni-gen
|
|
14
7
|
```
|
|
15
8
|
|
|
16
|
-
|
|
9
|
+
After install, configure Sogni credentials:
|
|
17
10
|
|
|
18
11
|
```bash
|
|
19
12
|
mkdir -p ~/.config/sogni
|
|
@@ -26,180 +19,16 @@ EOF
|
|
|
26
19
|
chmod 600 ~/.config/sogni/credentials
|
|
27
20
|
```
|
|
28
21
|
|
|
29
|
-
Sign up at https://app.sogni.ai/ if
|
|
30
|
-
|
|
31
|
-
## How to Generate
|
|
32
|
-
|
|
33
|
-
### Images
|
|
34
|
-
|
|
35
|
-
```bash
|
|
36
|
-
# Basic — returns a URL
|
|
37
|
-
node {{skillDir}}/sogni-gen.mjs -q "a cat wearing a hat"
|
|
38
|
-
|
|
39
|
-
# Save to file (then send via message tool with filePath)
|
|
40
|
-
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/generated.png "a cat wearing a hat"
|
|
41
|
-
|
|
42
|
-
# Bigger image
|
|
43
|
-
node {{skillDir}}/sogni-gen.mjs -q -o /tmp/out.png -w 1024 -h 1024 "a dragon eating tacos"
|
|
44
|
-
|
|
45
|
-
# Higher quality (slower)
|
|
46
|
-
node {{skillDir}}/sogni-gen.mjs -q -m flux2_dev_fp8 -o /tmp/out.png "portrait of a wizard"
|
|
47
|
-
```
|
|
48
|
-
|
|
49
|
-
### Image Editing (needs a reference image)
|
|
50
|
-
|
|
51
|
-
```bash
|
|
52
|
-
# Edit an existing image
|
|
53
|
-
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/photo.jpg -o /tmp/edited.png "make the background a beach"
|
|
54
|
-
|
|
55
|
-
# Use last generated image as input
|
|
56
|
-
node {{skillDir}}/sogni-gen.mjs -q --last-image -o /tmp/edited.png "make it pop art style"
|
|
57
|
-
|
|
58
|
-
# Restore a damaged photo
|
|
59
|
-
node {{skillDir}}/sogni-gen.mjs -q -c /path/to/old_photo.jpg -o /tmp/restored.png "restore this vintage photo, remove damage and scratches"
|
|
60
|
-
```
|
|
61
|
-
|
|
62
|
-
### Videos
|
|
63
|
-
|
|
64
|
-
```bash
|
|
65
|
-
# Text-to-video
|
|
66
|
-
node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "ocean waves at sunset"
|
|
67
|
-
|
|
68
|
-
# Image-to-video (animate an image)
|
|
69
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/video.mp4 "camera slowly zooms in"
|
|
70
|
-
|
|
71
|
-
# Looping video
|
|
72
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --looping --ref /path/to/image.png -o /tmp/loop.mp4 "gentle camera pan"
|
|
73
|
-
|
|
74
|
-
# Longer video (10 seconds)
|
|
75
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --duration 10 --ref /path/to/image.png -o /tmp/video.mp4 "camera orbits around"
|
|
76
|
-
|
|
77
|
-
# Sound-to-video (lip sync / talking head)
|
|
78
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/face.jpg --ref-audio /path/to/speech.m4a -o /tmp/talking.mp4 "talking head"
|
|
79
|
-
|
|
80
|
-
# Image+audio-to-video (LTX)
|
|
81
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --workflow ia2v --ref /path/to/cover.jpg --ref-audio /path/to/song.mp3 -o /tmp/music-video.mp4 "music video"
|
|
82
|
-
|
|
83
|
-
# Audio-to-video (LTX)
|
|
84
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --workflow a2v --ref-audio /path/to/song.mp3 -o /tmp/visualizer.mp4 "abstract visualizer"
|
|
85
|
-
|
|
86
|
-
# LTX-2.3 text-to-video
|
|
87
|
-
node {{skillDir}}/sogni-gen.mjs -q --video -m ltx23-22b-fp8_t2v_distilled --duration 20 -o /tmp/ltx23.mp4 "cinematic drone shot"
|
|
88
|
-
|
|
89
|
-
# Motion transfer from another video
|
|
90
|
-
node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/subject.jpg --ref-video /path/to/motion.mp4 --workflow animate-move -o /tmp/animated.mp4 "transfer motion"
|
|
91
|
-
```
|
|
92
|
-
|
|
93
|
-
### 360 Turntable
|
|
94
|
-
|
|
95
|
-
```bash
|
|
96
|
-
# Generate 8 angles of a subject
|
|
97
|
-
node {{skillDir}}/sogni-gen.mjs -q --angles-360 -c /path/to/subject.jpg "studio portrait"
|
|
98
|
-
|
|
99
|
-
# 360 video (looping mp4, requires ffmpeg)
|
|
100
|
-
node {{skillDir}}/sogni-gen.mjs -q --angles-360 --angles-360-video /tmp/turntable.mp4 -c /path/to/subject.jpg "studio portrait"
|
|
101
|
-
```
|
|
102
|
-
|
|
103
|
-
### Check Balance
|
|
104
|
-
|
|
105
|
-
```bash
|
|
106
|
-
node {{skillDir}}/sogni-gen.mjs --json --balance
|
|
107
|
-
```
|
|
108
|
-
|
|
109
|
-
## Image Models
|
|
110
|
-
|
|
111
|
-
| Model | Speed | Best For |
|
|
112
|
-
|-------|-------|----------|
|
|
113
|
-
| z_image_turbo_bf16 | ~5-10s | Default, general purpose |
|
|
114
|
-
| flux1-schnell-fp8 | ~3-5s | Quick iterations |
|
|
115
|
-
| flux2_dev_fp8 | ~2min | Highest quality |
|
|
116
|
-
| chroma-v.46-flash_fp8 | ~30s | Balanced speed/quality |
|
|
117
|
-
| qwen_image_edit_2511_fp8_lightning | ~8s | Fast image editing (auto-selected with -c) |
|
|
118
|
-
| qwen_image_edit_2511_fp8 | ~30s | Higher quality editing |
|
|
119
|
-
|
|
120
|
-
## Video Models (auto-selected by workflow)
|
|
121
|
-
|
|
122
|
-
| Workflow | Model | Speed |
|
|
123
|
-
|----------|-------|-------|
|
|
124
|
-
| t2v (text-to-video) | wan_v2.2-14b-fp8_t2v_lightx2v | ~5min |
|
|
125
|
-
| i2v (image-to-video) | wan_v2.2-14b-fp8_i2v_lightx2v | ~3-5min |
|
|
126
|
-
| s2v (sound-to-video) | wan_v2.2-14b-fp8_s2v_lightx2v | ~5min |
|
|
127
|
-
| ia2v (image+audio-to-video) | ltx2-19b-fp8_ia2v_distilled | ~2-3min |
|
|
128
|
-
| a2v (audio-to-video) | ltx2-19b-fp8_a2v_distilled | ~2-3min |
|
|
129
|
-
| v2v (video-to-video) | ltx2-19b-fp8_v2v_distilled | ~3min |
|
|
130
|
-
| animate-move | wan_v2.2-14b-fp8_animate-move_lightx2v | ~5min |
|
|
131
|
-
| animate-replace | wan_v2.2-14b-fp8_animate-replace_lightx2v | ~5min |
|
|
132
|
-
|
|
133
|
-
## Key Flags
|
|
134
|
-
|
|
135
|
-
| Flag | What It Does |
|
|
136
|
-
|------|-------------|
|
|
137
|
-
| -o /path | Save output to file |
|
|
138
|
-
| -q | Quiet mode (suppress progress) |
|
|
139
|
-
| -w, -h | Width/height in pixels (default 768x768) |
|
|
140
|
-
| -m MODEL | Choose a specific model |
|
|
141
|
-
| -c IMAGE | Context image for editing (repeatable, max 3) |
|
|
142
|
-
| --video, -v | Generate video instead of image |
|
|
143
|
-
| --ref IMAGE | Reference image for video |
|
|
144
|
-
| --ref-audio FILE | Audio for lip sync (s2v) |
|
|
145
|
-
| --ref-video FILE | Motion source for animate workflows |
|
|
146
|
-
| --looping | Seamless A-B-A loop (i2v only) |
|
|
147
|
-
| --duration SEC | Video length (default 5s) |
|
|
148
|
-
| --fps NUM | Frames per second (default 16) |
|
|
149
|
-
| --last-image | Reuse last generated image as input |
|
|
150
|
-
| --json | Machine-readable JSON output |
|
|
151
|
-
| --balance | Show Spark/Sogni token balances |
|
|
152
|
-
| --extract-last-frame VIDEO IMAGE | Extract last frame from a video file |
|
|
153
|
-
| --concat-videos OUTPUT CLIPS... | Concatenate multiple video clips |
|
|
154
|
-
| --list-media [images\|audio\|all] | List recent inbound media files |
|
|
155
|
-
|
|
156
|
-
## Agent Behavior Guidelines
|
|
157
|
-
|
|
158
|
-
0. If the user includes the keyword "photobooth" (case-insensitive), always use `--photobooth` with `--ref` to the user face image. Do not fall back to `-c` edit flow for that request.
|
|
159
|
-
1. When the user asks to "draw", "generate", "create", or "make" an image: generate an image and send it.
|
|
160
|
-
2. When they ask to "animate", "make a video", or "create a video": use --video mode.
|
|
161
|
-
3. When they send a photo and ask to edit/change/modify it: use -c with their image.
|
|
162
|
-
4. When they send a photo and ask to animate it: use --video --ref with their image.
|
|
163
|
-
5. When they send a photo + audio and ask for lip sync: use --video --ref IMAGE --ref-audio AUDIO.
|
|
164
|
-
6. Always use -q (quiet) and -o (output to file) so you can send the result back.
|
|
165
|
-
7. After generating, send the file to the user via the message tool with filePath.
|
|
166
|
-
8. If you get "Insufficient funds", tell them: "Claim 50 free daily Spark points at https://app.sogni.ai/"
|
|
167
|
-
9. For transition/animation videos, always use this plugin's built-in flags (not raw ffmpeg). Use `--looping`, `--extract-last-frame`, or `--concat-videos`.
|
|
168
|
-
10. Default to 768x768 for images. WAN video sizes must be divisible by 16; LTX family sizes must be divisible by 64.
|
|
169
|
-
|
|
170
|
-
## Finding User-Sent Media
|
|
171
|
-
|
|
172
|
-
When users send images/audio via Telegram, WhatsApp, or iMessages, use the built-in `--list-media` flag:
|
|
173
|
-
|
|
174
|
-
```bash
|
|
175
|
-
# Recent inbound images (default)
|
|
176
|
-
node {{skillDir}}/sogni-gen.mjs --json --list-media images
|
|
177
|
-
|
|
178
|
-
# Recent inbound audio
|
|
179
|
-
node {{skillDir}}/sogni-gen.mjs --json --list-media audio
|
|
180
|
-
|
|
181
|
-
# All recent media
|
|
182
|
-
node {{skillDir}}/sogni-gen.mjs --json --list-media all
|
|
183
|
-
```
|
|
184
|
-
|
|
185
|
-
Do NOT use shell commands (`ls`, `cp`, etc.) to browse user media directories.
|
|
186
|
-
|
|
187
|
-
## Example Conversations
|
|
188
|
-
|
|
189
|
-
User: "Draw a sunset over mountains"
|
|
190
|
-
You: Generate image, send it.
|
|
22
|
+
Sign up at https://app.sogni.ai/ if needed.
|
|
191
23
|
|
|
192
|
-
|
|
193
|
-
You: Use -c with their photo, edit prompt, send result.
|
|
24
|
+
This package ships its OpenClaw behavior through the plugin skill declared in `openclaw.plugin.json`, so the installed plugin uses `SKILL.md` as the main instruction source.
|
|
194
25
|
|
|
195
|
-
|
|
196
|
-
You: Use --video --ref with their photo, send video.
|
|
26
|
+
Repo:
|
|
197
27
|
|
|
198
|
-
|
|
199
|
-
You: Use --video (t2v), send video.
|
|
28
|
+
https://github.com/Sogni-AI/openclaw-sogni-gen
|
|
200
29
|
|
|
201
|
-
|
|
202
|
-
You: Use --video --ref photo --ref-audio audio (s2v), send video.
|
|
30
|
+
Key files:
|
|
203
31
|
|
|
204
|
-
|
|
205
|
-
|
|
32
|
+
- `SKILL.md` — agent behavior and usage rules
|
|
33
|
+
- `openclaw.plugin.json` — plugin manifest and config schema
|
|
34
|
+
- `sogni-gen.mjs` — CLI used by the skill
|
package/openclaw.plugin.json
CHANGED
package/package.json
CHANGED
package/sogni-gen.mjs
CHANGED
|
@@ -1228,7 +1228,7 @@ Examples:
|
|
|
1228
1228
|
sogni-gen --video --ref cat.jpg --ref-audio speech.m4a -m wan_v2.2-14b-fp8_s2v_lightx2v "lip sync"
|
|
1229
1229
|
sogni-gen --video --workflow ia2v --ref cover.jpg --ref-audio song.mp3 "music video"
|
|
1230
1230
|
sogni-gen --video --workflow a2v --ref-audio song.mp3 "abstract music visualizer"
|
|
1231
|
-
sogni-gen --video -m ltx23-22b-fp8_t2v_distilled --duration 20 "cinematic
|
|
1231
|
+
sogni-gen --video -m ltx23-22b-fp8_t2v_distilled --duration 20 "A wide cinematic aerial shot opens over steep tropical cliffs at golden hour, warm sunlight grazing the rock faces while sea mist drifts above the water below. Palm trees bend gently along the ridge as waves roll against the shoreline, leaving bright bands of foam across the dark stone. The camera glides forward in one continuous pass, revealing more of the coastline as sunlight flickers across wet surfaces and distant birds wheel through the haze. The scene holds a calm, upscale travel-film mood with smooth stabilized motion and crisp environmental detail."
|
|
1232
1232
|
sogni-gen --video --ref subject.jpg --ref-video motion.mp4 --workflow animate-move "transfer motion"
|
|
1233
1233
|
sogni-gen --video --last-image "gentle camera pan"
|
|
1234
1234
|
sogni-gen -c photo.jpg "make the background a beach" -m qwen_image_edit_2511_fp8
|