npm - sogni-gen - Versions diffs - 1.2.3 → 1.2.5 - Mend

sogni-gen 1.2.3 → 1.2.5


Files changed (5)

package/README.md CHANGED

@@ -12,7 +12,15 @@ An [OpenClaw](https://github.com/OpenClaw/OpenClaw) plugin for AI image + video
 ### Quick Install (OpenClaw) - Recommended
-This repo ships an `openclaw.plugin.json` manifest so OpenClaw can automatically download and set everything up:
+Point your OpenClaw to the [`llm.txt`](https://raw.githubusercontent.com/Sogni-AI/openclaw-sogni-gen/main/llm.txt) and everything is set up — just paste the URL into Telegram, WhatsApp, or iMessages and the bot handles image and video generation automatically.
+```
+https://raw.githubusercontent.com/Sogni-AI/openclaw-sogni-gen/main/llm.txt
+```
+### Plugin Install
+This repo also ships an `openclaw.plugin.json` manifest so OpenClaw can automatically download and set everything up:
 ```bash
 # One command to install from GitHub
@@ -22,8 +30,6 @@ openclaw plugins install git@github.com:Sogni-AI/openclaw-sogni-gen.git
 openclaw plugins install sogni-gen
 ```
-That's it! OpenClaw will handle the rest.
 ### Manual Installation
 ```bash
@@ -46,6 +52,7 @@ If OpenClaw loads this plugin, `sogni-gen` will read defaults from your OpenClaw
         "config": {
           "defaultImageModel": "z_image_turbo_bf16",
           "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
+          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
           "videoModels": {
             "t2v": "wan_v2.2-14b-fp8_t2v_lightx2v",
             "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
@@ -117,6 +124,10 @@ node sogni-gen.mjs -m flux1-schnell-fp8 "a dragon eating tacos"
 # JPG output
 node sogni-gen.mjs --output-format jpg -o dragon.jpg "a dragon eating tacos"
+# Photobooth (face transfer)
+node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"
+node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
 # Image edit with LoRA
 node sogni-gen.mjs -c subject.jpg --lora sogni_lora_v1 --lora-strength 0.7 \
   "add a neon cyberpunk glow"
@@ -155,6 +166,26 @@ node sogni-gen.mjs --video --estimate-video-cost --steps 20 \
   -m wan_v2.2-14b-fp8_t2v_lightx2v "ocean waves at sunset"
 ```
+## Photobooth (Face Transfer)
+Generate stylized portraits from a face photo using InstantID ControlNet:
+```bash
+# Basic photobooth
+node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"
+# Multiple outputs
+node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
+# Custom ControlNet tuning
+node sogni-gen.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"
+# Custom model
+node sogni-gen.mjs --photobooth --ref face.jpg -m coreml-dreamshaperXL_v21TurboDPMSDE "anime style"
+```
+Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Cannot be combined with `--video` or `-c/--context`.
 Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles` LoRA.
 `--angles-360-video` generates i2v clips between consecutive angles (including last→first) and concatenates them with ffmpeg for a seamless loop.
 `--balance` / `--balances` does not require a prompt and exits after printing current `SPARK` and `SOGNI` balances.
@@ -215,7 +246,10 @@ Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles
 --auto-resize-assets  Auto-resize video reference assets
 --no-auto-resize-assets  Disable auto-resize for video assets
 --estimate-video-cost Estimate video cost and exit (requires --steps)
---ref <path|url>      Reference image for i2v/s2v/animate
+--photobooth          Face transfer mode (InstantID + SDXL Turbo)
+--cn-strength <n>     ControlNet strength (default: 0.8)
+--cn-guidance-end <n> ControlNet guidance end point (default: 0.3)
+--ref <path|url>      Reference image for i2v/s2v/animate/photobooth
 --ref-end <path|url>  End frame for i2v interpolation
 --ref-audio <path>    Reference audio for s2v
 --ref-video <path>    Reference video for animate workflows
@@ -236,6 +270,7 @@ Multi-angle mode auto-builds the `<sks>` prompt and applies the `multiple_angles
 | `chroma-v.46-flash_fp8` | ~30s | Balanced |
 | `qwen_image_edit_2511_fp8` | ~30s | Image editing with context |
 | `qwen_image_edit_2511_fp8_lightning` | ~8s | Fast image editing |
+| `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |
 | `wan_v2.2-14b-fp8_t2v_lightx2v` | ~5min | Text-to-video |
 | `wan_v2.2-14b-fp8_i2v_lightx2v` | ~3-5min | Image-to-video |
 | `wan_v2.2-14b-fp8_s2v_lightx2v` | ~5min | Sound-to-video |
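
The README changes above add a `defaultPhotoboothModel` config key and a `-m` override for photobooth mode. The `sogni-gen.mjs` hunk in this same diff resolves the model as: explicit `-m` value, then the OpenClaw plugin config, then the built-in default. A minimal sketch of that fallback chain follows; `resolvePhotoboothModel` and `BUILTIN_PHOTOBOOTH_MODEL` are hypothetical names used only for illustration and do not appear in the package.

```javascript
// Hypothetical helper mirroring the fallback chain in the sogni-gen.mjs hunk:
// CLI model first, then plugin config, then the built-in default.
const BUILTIN_PHOTOBOOTH_MODEL = 'coreml-sogniXLturbo_alpha1_ad';

function resolvePhotoboothModel(cliModel, pluginConfig) {
  // `||` treats null, undefined, and '' as "not set", as the diffed code does.
  return cliModel || pluginConfig?.defaultPhotoboothModel || BUILTIN_PHOTOBOOTH_MODEL;
}

// No -m and no config: falls through to the built-in default.
console.log(resolvePhotoboothModel(null, {}));
// Config value wins over the built-in default.
console.log(resolvePhotoboothModel(null, {
  defaultPhotoboothModel: 'coreml-dreamshaperXL_v21TurboDPMSDE',
}));
// An explicit -m value wins over everything.
console.log(resolvePhotoboothModel('coreml-dreamshaperXL_v21TurboDPMSDE', undefined));
```

Because the chain uses `||` rather than `??`, an empty string in the config is skipped and the built-in default applies.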

package/SKILL.md CHANGED

@@ -110,7 +110,10 @@ node sogni-gen.mjs -q -o /tmp/cat.png "a cat wearing a hat"
 | `--auto-resize-assets` | Auto-resize video assets | true |
 | `--no-auto-resize-assets` | Disable auto-resize | - |
 | `--estimate-video-cost` | Estimate video cost and exit (requires --steps) | - |
-| `--ref <path>` | Reference image for video | required for video |
+| `--photobooth` | Face transfer mode (InstantID + SDXL Turbo) | - |
+| `--cn-strength <n>` | ControlNet strength (photobooth) | 0.8 |
+| `--cn-guidance-end <n>` | ControlNet guidance end point (photobooth) | 0.3 |
+| `--ref <path>` | Reference image for video or photobooth face | required for video/photobooth |
 | `--ref-end <path>` | End frame for i2v interpolation | - |
 | `--ref-audio <path>` | Reference audio for s2v | - |
 | `--ref-video <path>` | Reference video for animate workflows | - |
@@ -134,6 +137,7 @@ When installed as an OpenClaw plugin, `sogni-gen` will read defaults from:
         "config": {
           "defaultImageModel": "z_image_turbo_bf16",
           "defaultEditModel": "qwen_image_edit_2511_fp8_lightning",
+          "defaultPhotoboothModel": "coreml-sogniXLturbo_alpha1_ad",
           "videoModels": {
             "t2v": "wan_v2.2-14b-fp8_t2v_lightx2v",
             "i2v": "wan_v2.2-14b-fp8_i2v_lightx2v",
@@ -177,6 +181,7 @@ Seed strategies: `prompt-hash` (deterministic) or `random`.
 | `chroma-v.46-flash_fp8` | Medium | Balanced |
 | `qwen_image_edit_2511_fp8` | Medium | Image editing with context (up to 3) |
 | `qwen_image_edit_2511_fp8_lightning` | Fast | Quick image editing |
+| `coreml-sogniXLturbo_alpha1_ad` | Fast | Photobooth face transfer (SDXL Turbo) |
 ## Video Models
@@ -206,6 +211,32 @@ node sogni-gen.mjs --last-image "make it more vibrant"
 When context images are provided without `-m`, defaults to `qwen_image_edit_2511_fp8_lightning`.
+## Photobooth (Face Transfer)
+Generate stylized portraits from a face photo using InstantID ControlNet. When a user mentions "photobooth", wants a stylized portrait of themselves, or asks to transfer their face into a style, use `--photobooth` with `--ref` pointing to their face image.
+```bash
+# Basic photobooth
+node sogni-gen.mjs --photobooth --ref face.jpg "80s fashion portrait"
+# Multiple outputs
+node sogni-gen.mjs --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
+# Custom ControlNet tuning
+node sogni-gen.mjs --photobooth --ref face.jpg --cn-strength 0.6 --cn-guidance-end 0.5 "oil painting"
+```
+Uses SDXL Turbo (`coreml-sogniXLturbo_alpha1_ad`) at 1024x1024 by default. The face image is passed via `--ref` and styled according to the prompt. Cannot be combined with `--video` or `-c/--context`.
+**Agent usage:**
+```bash
+# Photobooth: stylize a face photo
+node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
+# Multiple photobooth outputs
+node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -n 4 -o /tmp/stylized.png "LinkedIn professional headshot"
+```
 ## Multiple Angles (Turnaround)
 Generate specific camera angles from a single reference image using the Multiple Angles LoRA:
@@ -319,6 +350,11 @@ ls -la ~/.clawdbot/media/inbound/*.jpg | tail -3
 cp ~/.clawdbot/media/inbound/<latest>.jpg /tmp/to_restore.jpg
 ```
+## IMPORTANT KEYWORD RULE
+- If the user message includes the word "photobooth" (case-insensitive), always use `--photobooth` mode with `--ref` set to the user-provided face image.
+- Prioritize this rule over generic image-edit flows (`-c`) for that request.
 ## Agent Usage
 When user asks to generate/draw/create an image:
@@ -336,6 +372,9 @@ node {{skillDir}}/sogni-gen.mjs -q --video --ref /path/to/image.png -o /tmp/vide
 # Generate text-to-video
 node {{skillDir}}/sogni-gen.mjs -q --video -o /tmp/video.mp4 "ocean waves at sunset"
+# Photobooth: stylize a face photo
+node {{skillDir}}/sogni-gen.mjs -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
 # Check current SPARK/SOGNI balances (no prompt required)
 node {{skillDir}}/sogni-gen.mjs --json --balance
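
The agent commands added to SKILL.md all follow the same shape: `-q --photobooth --ref <face>` plus optional `-n`, `--cn-strength`, `--cn-guidance-end`, and `-o`, followed by the prompt. One possible way for an agent to assemble them is as an argument array rather than a shell string, which avoids quoting problems in prompts. The helper below is a sketch only; `buildPhotoboothArgs` and its parameter names are hypothetical, while the flags themselves are the ones listed in the diff.

```javascript
// Hypothetical helper: builds an argv array for the photobooth commands
// shown in SKILL.md. Only flags that appear in the diff are emitted.
function buildPhotoboothArgs({ scriptPath, refImage, prompt, outputPath, count, cnStrength, cnGuidanceEnd }) {
  // The CLI rejects --photobooth without --ref, so fail early here too.
  if (!refImage) throw new Error('photobooth requires a face image for --ref');
  if (!prompt) throw new Error('photobooth requires a prompt');

  const args = [scriptPath, '-q', '--photobooth', '--ref', refImage];
  if (Number.isInteger(count) && count > 1) args.push('-n', String(count));
  if (typeof cnStrength === 'number') args.push('--cn-strength', String(cnStrength));
  if (typeof cnGuidanceEnd === 'number') args.push('--cn-guidance-end', String(cnGuidanceEnd));
  if (outputPath) args.push('-o', outputPath);
  args.push(prompt); // the positional prompt goes last
  return args;
}

const args = buildPhotoboothArgs({
  scriptPath: 'sogni-gen.mjs',
  refImage: '/path/to/face.jpg',
  prompt: 'LinkedIn professional headshot',
  outputPath: '/tmp/stylized.png',
  count: 4,
});
console.log(args.join(' '));
```

The resulting array could be handed to Node's `child_process.spawn('node', args)`, which passes each element as a separate argument without shell interpretation.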

package/llm.txt CHANGED

@@ -138,6 +138,7 @@ node {{skillDir}}/sogni-gen.mjs --json --balance
 ## Agent Behavior Guidelines
+0. If the user includes the keyword "photobooth" (case-insensitive), always use `--photobooth` with `--ref` to the user face image. Do not fall back to `-c` edit flow for that request.
 1. When the user asks to "draw", "generate", "create", or "make" an image: generate an image and send it.
 2. When they ask to "animate", "make a video", or "create a video": use --video mode.
 3. When they send a photo and ask to edit/change/modify it: use -c with their image.
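
The new rule 0 takes priority over the edit flow in rule 3, so the order in which the guidelines are checked matters. The sketch below shows one way that priority could be expressed in code, covering only the four rules visible in this hunk. The function `chooseMode`, its return values, and the keyword patterns are all illustrative assumptions, not part of the package.

```javascript
// Hypothetical router for guidelines 0-3 in llm.txt. Mode names are illustrative.
function chooseMode(message, hasAttachedPhoto) {
  const text = message.toLowerCase();
  // Rule 0: "photobooth" (case-insensitive) always wins, even over edit requests.
  if (text.includes('photobooth')) return 'photobooth';
  // Rule 2: video requests.
  if (/\b(animate|make a video|create a video)\b/.test(text)) return 'video';
  // Rule 3: edit flow (-c) only when a photo was sent.
  if (hasAttachedPhoto && /\b(edit|change|modify)\b/.test(text)) return 'edit';
  // Rule 1: plain image generation.
  if (/\b(draw|generate|create|make)\b/.test(text)) return 'image';
  return 'unknown';
}

console.log(chooseMode('Edit my photo, PhotoBooth style', true)); // photobooth
console.log(chooseMode('please change the background', true));    // edit
console.log(chooseMode('make a video of ocean waves', false));    // video
console.log(chooseMode('draw a dragon eating tacos', false));     // image
```

Checking the photobooth keyword first is what keeps a message such as "edit my photo, photobooth style" out of the `-c` edit flow.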

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "sogni-gen",
-  "version": "1.2.3",
+  "version": "1.2.5",
   "description": "Sogni AI image generation plugin for OpenClaw",
   "type": "module",
   "main": "sogni-gen.mjs",

package/sogni-gen.mjs CHANGED

@@ -593,7 +593,10 @@ const options = {
   refAudio: null, // Reference audio for s2v
   refVideo: null, // Reference video for animate workflows
   contextImages: [], // Context images for image editing
-  looping: false // Create looping video (i2v only): generate A→B then B→A and concatenate
+  looping: false, // Create looping video (i2v only): generate A→B then B→A and concatenate
+  photobooth: false, // Photobooth mode (InstantID face transfer)
+  cnStrength: null, // ControlNet strength override
+  cnGuidanceEnd: null // ControlNet guidance end override
 };
 const cliSet = {
   output: false,
@@ -630,7 +633,10 @@ const cliSet = {
   refImageEnd: false,
   refAudio: false,
   refVideo: false,
-  context: false
+  context: false,
+  photobooth: false,
+  cnStrength: false,
+  cnGuidanceEnd: false
 };
 // Parse CLI args
@@ -762,6 +768,15 @@ for (let i = 0; i < args.length; i++) {
   } else if (arg === '-c' || arg === '--context') {
     options.contextImages.push(args[++i]);
     cliSet.context = true;
+  } else if (arg === '--photobooth') {
+    options.photobooth = true;
+    cliSet.photobooth = true;
+  } else if (arg === '--cn-strength') {
+    options.cnStrength = parseFloat(args[++i]);
+    cliSet.cnStrength = true;
+  } else if (arg === '--cn-guidance-end') {
+    options.cnGuidanceEnd = parseFloat(args[++i]);
+    cliSet.cnGuidanceEnd = true;
   } else if (arg === '--last-image') {
     // Use image from last render as reference/context
     if (existsSync(LAST_RENDER_PATH)) {
@@ -830,6 +845,12 @@ Image Options:
   -c, --context <path>  Context image for editing (can use multiple)
   --last-image          Use last generated image as context
+Photobooth (Face Transfer):
+  --photobooth            Face transfer mode (InstantID + SDXL Turbo)
+  --ref <path|url>        Face image (required with --photobooth)
+  --cn-strength <n>       ControlNet strength (default: 0.8)
+  --cn-guidance-end <n>   ControlNet guidance end point (default: 0.3)
 Video Options:
   --video, -v           Generate video instead of image
   --workflow <type>     Video workflow: t2v|i2v|s2v|animate-move|animate-replace
@@ -885,6 +906,8 @@ Examples:
   sogni-gen --video --last-image "gentle camera pan"
   sogni-gen -c photo.jpg "make the background a beach" -m qwen_image_edit_2511_fp8
   sogni-gen -c subject.jpg -c style.jpg "apply the style to the subject"
+  sogni-gen --photobooth --ref face.jpg "80s fashion portrait"
+  sogni-gen --photobooth --ref face.jpg -n 4 "LinkedIn professional headshot"
 `);
     process.exit(0);
   } else if (!arg.startsWith('-') && !options.prompt) {
@@ -1142,6 +1165,8 @@ if (options._lastImagePath) {
     } else if (!options.quiet) {
       console.error('Warning: --last-image ignored for text-to-video workflow.');
     }
+  } else if (options.photobooth) {
+    if (!options.refImage) options.refImage = options._lastImagePath;
   } else {
     options.contextImages.push(options._lastImagePath);
   }
@@ -1156,6 +1181,14 @@ if (options.video) {
   if (!cliSet.timeout && !timeoutFromConfig && options.timeout === 30000) {
     options.timeout = 300000; // 5 min for video
   }
+} else if (options.photobooth) {
+  // Photobooth uses SDXL Turbo + InstantID ControlNet
+  options.model = options.model || openclawConfig?.defaultPhotoboothModel || 'coreml-sogniXLturbo_alpha1_ad';
+  if (!cliSet.width) options.width = 1024;
+  if (!cliSet.height) options.height = 1024;
+  if (!cliSet.timeout && !timeoutFromConfig && options.timeout === 30000) {
+    options.timeout = 60000;
+  }
 } else if (options.contextImages.length > 0) {
   // Use qwen edit model when context images provided (unless model explicitly set)
   options.model = options.model || openclawConfig?.defaultEditModel || 'qwen_image_edit_2511_fp8_lightning';
@@ -1176,6 +1209,18 @@ if (!options.video && (options.refAudio || options.refVideo || options.videoWork
   });
 }
+if (options.photobooth) {
+  if (!options.refImage) {
+    fatalCliError('--photobooth requires --ref <face-image>.', { code: 'INVALID_ARGUMENT' });
+  }
+  if (options.video) {
+    fatalCliError('--photobooth cannot be combined with --video.', { code: 'INVALID_ARGUMENT' });
+  }
+  if (options.contextImages.length > 0) {
+    fatalCliError('--photobooth cannot be combined with -c/--context.', { code: 'INVALID_ARGUMENT' });
+  }
+}
 if (options.video) {
   if (options.videoWorkflow === 't2v') {
     if (options.refImage || options.refImageEnd || options.refAudio || options.refVideo) {
@@ -2416,6 +2461,53 @@ async function main() {
       }
       await client.createImageEditProject(editConfig);
+    } else if (options.photobooth) {
+      // Photobooth: face transfer with InstantID ControlNet
+      log(`Photobooth with ${options.model}...`);
+      if (options.seed !== null && options.seed !== undefined) log(`Using seed: ${options.seed}`);
+      const faceBuffer = await fetchMediaBuffer(options.refImage);
+      const modelDefaults = getModelDefaults(options.model, openclawConfig);
+      const steps = options.steps ?? modelDefaults?.steps ?? 7;
+      const guidance = options.guidance ?? modelDefaults?.guidance ?? 2;
+      const projectConfig = {
+        modelId: options.model,
+        positivePrompt: options.prompt,
+        negativePrompt: '',
+        stylePrompt: '',
+        numberOfMedia: options.count,
+        tokenType: options.tokenType || 'spark',
+        waitForCompletion: false,
+        sizePreset: 'custom',
+        width: options.width,
+        height: options.height,
+        steps,
+        guidance,
+        disableNSFWFilter: true,
+        sampler: options.sampler || 'dpmpp_sde',
+        scheduler: options.scheduler || 'karras',
+        controlNet: {
+          name: 'instantid',
+          image: faceBuffer,
+          strength: options.cnStrength ?? 0.7,
+          mode: 'balanced',
+          guidanceStart: 0,
+          guidanceEnd: options.cnGuidanceEnd ?? 0.6,
+        }
+      };
+      if (options.outputFormat) projectConfig.outputFormat = options.outputFormat;
+      if (options.seed !== null && options.seed !== undefined) projectConfig.seed = options.seed;
+      if (options.loras.length > 0) projectConfig.loras = options.loras;
+      if (options.loraStrengths.length > 0) projectConfig.loraStrengths = options.loraStrengths;
+      const projectResult = await client.createImageProject(projectConfig);
+      // Check for errors in the response (e.g., insufficient tokens)
+      if (projectResult?.error || projectResult?.message) {
+        throw new Error(projectResult.error || projectResult.message);
+      }
     } else {
       // Standard image generation
       log(`Generating with ${options.model}...`);
@@ -2513,6 +2605,10 @@ async function main() {
       if (options.contextImages.length > 0) {
         renderInfo.contextImages = options.contextImages;
       }
+      if (options.photobooth) {
+        renderInfo.photobooth = true;
+        renderInfo.refImage = options.refImage;
+      }
       saveLastRender(renderInfo);
       // Save to file if requested
@@ -2700,6 +2796,15 @@ async function main() {
         if (options.contextImages.length > 0) {
           output.contextImages = options.contextImages;
         }
+        if (options.photobooth) {
+          output.photobooth = true;
+          output.refImage = options.refImage;
+          output.controlNet = {
+            name: 'instantid',
+            strength: options.cnStrength ?? 0.7,
+            guidanceEnd: options.cnGuidanceEnd ?? 0.6,
+          };
+        }
         console.log(JSON.stringify(output));
       } else {
         urls.forEach(url => console.log(url));
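
Two details of the ControlNet handling in this diff are worth spelling out. First, the code falls back to `0.7` for strength and `0.6` for guidance end, while the README, SKILL.md, and `--help` text in the same diff state defaults of `0.8` and `0.3`. Second, `--cn-strength` and `--cn-guidance-end` are read with `parseFloat`, and `??` only treats `null` and `undefined` as missing, so a non-numeric value would pass through as `NaN`. The sketch below illustrates both points; `parseFiniteFloat`, `resolveControlNet`, and `CN_FALLBACKS` are hypothetical names, and the stricter parsing is a suggestion rather than the package's behavior.

```javascript
// Fallback values as they appear in the sogni-gen.mjs hunk (not the documented 0.8 / 0.3).
const CN_FALLBACKS = { strength: 0.7, guidanceEnd: 0.6 };

// Hypothetical stricter parser: rejects missing or non-numeric flag values
// instead of silently storing NaN.
function parseFiniteFloat(flag, raw) {
  const value = Number.parseFloat(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${flag} expects a number, got: ${raw}`);
  }
  return value;
}

// Mirrors the `??` fallback used when building the controlNet config.
function resolveControlNet(options) {
  return {
    name: 'instantid',
    strength: options.cnStrength ?? CN_FALLBACKS.strength,
    guidanceEnd: options.cnGuidanceEnd ?? CN_FALLBACKS.guidanceEnd,
  };
}

// Unset overrides (null) fall back to 0.7 / 0.6.
console.log(resolveControlNet({ cnStrength: null, cnGuidanceEnd: null }));
// An explicit 0 is kept, because `??` does not treat 0 as missing.
console.log(resolveControlNet({ cnStrength: 0, cnGuidanceEnd: 0.5 }));
// NaN is also kept by `??`, which is why validating at parse time helps.
console.log(resolveControlNet({ cnStrength: Number.parseFloat('abc'), cnGuidanceEnd: null }));
```

Using `??` instead of `||` is what lets a user pass `--cn-strength 0` deliberately; the trade-off is that invalid input needs to be caught before it reaches the fallback.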