npm - @sogni-ai/sogni-creative-agent-skill - Versions diffs - 3.3.2 → 3.3.3 - Mend

@sogni-ai/sogni-creative-agent-skill 3.3.2 → 3.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (8)

package/README.md +3 -3
package/SKILL.md +7 -13
package/generated/creative-agent-runtime.mjs +3 -3
package/openclaw.plugin.json +1 -1
package/package.json +2 -2
package/skill-package.json +1 -1
package/sogni-agent.mjs +62 -23
package/version.mjs +1 -1

package/README.md CHANGED

@@ -489,7 +489,7 @@ sogni-agent --persona-list
 sogni-agent --persona-remove "Mark"
 ```
-Stored at `~/.config/sogni/personas/`. Pronouns like "me" / "myself" auto-resolve to the `self` persona; "my wife" resolves to `partner`, etc.
+Stored at `~/.config/sogni/personas/`. Personas resolve by explicit saved name, id, or tag/alias; relationship phrases are not treated as persona identifiers.
 ### Memory (persistent preferences)
@@ -561,13 +561,13 @@ Options cycle sequentially per image. Without `{...}` syntax, `-n` produces mult
 ## Token Auto-Fallback
-Use `--token-type auto` to retry with SOGNI tokens when SPARK is insufficient:
+Use `--token-type auto` to retry native Sogni models with SOGNI tokens when SPARK is insufficient:
 ```bash
 sogni-agent --token-type auto "a dragon eating tacos"
 ```
-Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low.
+Tries SPARK first (free daily tokens), then falls back to SOGNI if the balance is too low. Vendor models such as Seedance and GPT Image 2 require Premium Spark eligibility and never use SOGNI fallback.
 ---

package/SKILL.md CHANGED

@@ -2,7 +2,7 @@
 name: sogni-creative-agent-skill
 description: "Sogni Creative Agent Skill: agent skill and CLI for image, video, and music generation using Sogni AI's decentralized GPU network. Supports personas (named people with saved reference photos and voice clips), persistent memories (user preferences across sessions), custom personality, style transfer, angle synthesis, and multi-step creative workflows. Ask the agent to \"draw\", \"generate\", \"create an image\", \"make a video/animate\", \"make music\", \"apply a style\", or \"generate me as a superhero\"."
 metadata:
-  version: "3.3.2"
+  version: "3.3.3"
   homepage: https://sogni.ai
   clawdbot:
     emoji: "🎨"
@@ -165,7 +165,7 @@ sogni-agent -Q pro "a cat wearing a hat"      # flux2_dev, 40 steps, 1024x1024 (
 sogni-agent -n 3 "a {red|blue|green} sports car"
 # → generates "a red sports car", "a blue sports car", "a green sports car"
-# Token auto-fallback (tries SPARK, falls back to SOGNI)
+# Token auto-fallback for native Sogni models (tries SPARK, falls back to SOGNI)
 sogni-agent --token-type auto "a cat wearing a hat"
 # Save to file
@@ -732,7 +732,7 @@ For **any transition video work**, always use the **Sogni skill/plugin** (not ra
 ### Insufficient Funds Handling
-Use `--token-type auto` to automatically retry with SOGNI tokens when SPARK is insufficient.
+Use `--token-type auto` to automatically retry native Sogni models with SOGNI tokens when SPARK is insufficient. Vendor models such as Seedance and GPT Image 2 require Premium Spark eligibility and never fall back to SOGNI.
 When you see **"Debit Error: Insufficient funds"** even with auto-fallback, reply:
@@ -951,7 +951,7 @@ sogni-agent -q --video --ref /path/to/image.png -m ltx23-22b-fp8_i2v_distilled -
 # Photobooth: stylize a face photo
 sogni-agent -q --photobooth --ref /path/to/face.jpg -o /tmp/stylized.png "80s fashion portrait"
-# Token auto-fallback (tries SPARK first, retries with SOGNI on insufficient balance)
+# Token auto-fallback for native Sogni models (tries SPARK first, retries with SOGNI on insufficient balance)
 sogni-agent -q --token-type auto -o /tmp/generated.png "user's prompt"
 # Check current SPARK/SOGNI balances (no prompt required)
@@ -1086,7 +1086,7 @@ Balance check example (`--json --balance`):
 ## Cost
-Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient. Use `--token-type auto` to automatically fall back to SOGNI tokens when SPARK is insufficient.
+Uses Spark tokens from your Sogni account. 512x512 images are most cost-efficient. Use `--token-type auto` to automatically fall back to SOGNI tokens for native Sogni models when SPARK is insufficient. Seedance and GPT Image 2 are vendor models and require Premium Spark eligibility; they never use SOGNI fallback.
 ## Persona System
@@ -1116,19 +1116,13 @@ sogni-agent --persona-remove "Mark"
 ### Persona Pipeline Rules
-When a user mentions a persona (by name, tag, or pronoun):
+When a user mentions a persona by explicit saved name, id, or tag/alias:
 1. **For images:** Use `--persona "Name" "prompt"` which auto-injects the persona's reference photo as context and selects the Qwen editing model
 2. **For video with voice cloning:** The persona's voice clip is used as `--reference-audio-identity` when `--video` is combined with `--persona`
 3. **For video without voice clip:** Describe the voice in the prompt ("speaks in a warm alto with a British accent")
-**Pronoun matching:**
-- "me" / "myself" / "I" → persona with `relationship: self`
-- "my wife" / "my husband" / "my partner" → persona with `relationship: partner`
-- "my son" / "my daughter" / "my kid" → persona with `relationship: child`
-- "my dog" / "my cat" / "my pet" → persona with `relationship: pet`
-**Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by name or pronoun. For ad-hoc photos, use `-c` (context image) directly.
+**Important:** User-uploaded photos are NOT personas. Only use `--persona` when referring to a saved persona by explicit name, id, or tag/alias. For ad-hoc photos, use `-c` (context image) directly.
 ## Memory System

package/generated/creative-agent-runtime.mjs CHANGED

@@ -2160,12 +2160,12 @@ const PROMPT_CONTRACTS = [
         "contractId": "animate_photo_v1",
         "version": "1.0.0",
         "toolName": "animate_photo",
-        "baseDescription": "animate_photo produces video from one or more source images using LTX 2.3.\n\nVIDEO PROMPT QUOTING: In video prompts, ONLY use double quotes for spoken dialogue.\nSpeaker tags are allowed outside the quotes for screenplay-style dialogue, e.g.\nCHARACTER: \"We made it.\" Never put on-screen text, overlay text, titles, captions, signs,\nwatermarks, or any visual text in quotes — describe them without quotes (e.g. bold white text\nreading CONGRATULATIONS overlays the lower third). Quotes signal speech to the model;\nquoting non-speech text confuses audio generation.\n\nDIALOGUE DURATION: Spoken dialogue in video prompts must fit the clip duration. Estimate\nat 2.5 words per second for natural cinematic delivery, plus ~1 second per acting beat\n(pauses, gestures, glances between lines). If the user did NOT explicitly request a specific\nduration (using default 5s), extend the duration to fit the dialogue (max 20s). If the user\nexplicitly requested a specific duration, condense the dialogue to fit while preserving meaning.\nAlways check: total dialogue words ÷ 2.5 + beat count ≤ clip duration.\n\nLATEST GENERATED IMAGE FOLLOW-UP: When the newest user turn asks to animate, make a video,\nor make a clip from a generated image/result (for example \"the apple\", \"this one\",\n\"the latest image\"), use animate_photo with that latest generated image. Do not inherit an\nolder Seedance model, resolution, or duration from an unrelated prior turn unless the newest\nuser turn explicitly says Seedance or confirms an immediately suggested Seedance video stage.\nLTX supports exact 2-20s durations, so honor requests like 3s exactly.\n\nWORD BUDGET PER CLIP: The handler REJECTS clips whose spoken dialogue exceeds the budget\n— there is NO auto-trim, so plan dialogue lengths up-front. Hard maximum is 3.75 spoken\nwords per second. Ceilings: 5s = 18 words, 6s = 22 words, 8s = 30 words, 10s = 37 words,\n15s = 56 words, 20s = 75 words. 
Aim below these ceilings. If a scene's dialogue won't fit,\ntighten the lines, raise the per-clip duration, or split into two segments — do NOT submit\nand hope it works. Spoken words inside double quotes count toward the budget; speaker tags\nand visual/action prose are free.\n\nBATCH VIDEO PER-CLIP DURATION: For a multi-segment animate_photo batch\n(sourceImageIndices + prompts) when the user states a TOTAL video length but NO per-clip\nlength, target 15 seconds per clip when dialogue is involved, and pass that duration\nexplicitly. Example: 60s total → 4 segments × 15s, NOT 6×10s or 12×5s. There is NO 3-clip\nbatch cap: sourceImageIndices supports up to 16 clips, so never split one planned batch into\n\"first 3\" and \"remaining clips\" calls. Do NOT split a planned 15s dialogue scene into multiple\nshorter clips just because a retry complains about word budget; keep duration=15 and tighten\nthe line. Use 5s clips only for single short motion beats or one very short spoken phrase.\nIf the user explicitly specifies a per-clip duration, honor that instead.\n\nN-VERSIONS-OF-A-VIDEO PATTERN: NEVER call animate_photo N times sequentially — ALWAYS\nuse sourceImageIndices in ONE call so all N projects run in parallel. Two flavors:\n(A) SHARED CONTENT — one edit_image/generate_image call with numberOfVariations=N + {|}\nDynamic Prompts to make N distinct source images, then ONE animate_photo call with\nsourceImageIndices=[start..start+N-1] and a single shared prompt.\n(B) PER-CLIP CONTENT — when each clip has DIFFERENT dialogue, jokes, narration, or motion,\npass BOTH sourceImageIndices AND prompts (array of N strings, one per clip) in the SAME\nsingle animate_photo call. The top-level prompt is still required — pass a brief batch summary.\n\nCRITICAL: sourceImageIndices values MUST be read from the latest edit_image/generate_image\ntool result's startIndex field — if startIndex=3 and 4 images were generated, pass\nsourceImageIndices=[3,4,5,6], NOT [0,1,2,3]. 
Negative indices refer to uploaded images:\n-1 first upload, -2 second upload, -3 third upload. Use repeated -1 entries only when\nintentionally reusing the primary uploaded image. When prompts is supplied, prompts.length\nMUST equal sourceImageIndices.length.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: If the user uploaded a storyboard, shot sheet,\nor visual trailer board and asks to make a trailer/video/movie/clip from it, do NOT use\nanimate_photo on the board image and do NOT split it into four LTX clips. Use generate_video\nwith Seedance referenceImageIndices for one continuous clip unless the user explicitly asks\nfor separate LTX clips or first-frame/last-frame animation.\n\nSCREENPLAY / STORYBOARD ANIMATE RULE: For full storyboard projects, use one\nanimate_photo batch with sourceImageIndices + prompts so each clip keeps its own exact\nscene text, stable cast anchors, and screenplay-style speaker-tagged dialogue, and all video\nclips render in parallel. Every speaking clip's video prompt must include that clip's actual\nquoted dialogue, not placeholders such as \"while speaking\", \"dialogue begins\", \"explaining\",\nor \"final line lands\". If each generated scene keyframe should be both the first and last frame\nof its own stitched segment, call animate_photo with sourceImageIndices=[start..end],\nframeRole=\"both\", prompts=[...], and OMIT endImageIndex/endImageIndices so the handler\nuses each source as its own end frame.\n\nUPLOADED REFERENCE LOOPED SKITS: When the user supplies one uploaded reference image and\nasks for several scripted/storyboard/dialogue segments to reuse that same image as BOTH the\nfirst frame and last frame of each segment before stitching, do it in ONE animate_photo call:\nsourceImageIndices=[-1,-1,...], frameRole=\"both\", endImageIndex=-1 (or matching\nendImageIndices=[-1,-1,...]), duration equal to the requested per-segment duration, and\nprompts=[one full scene prompt per segment]. 
Each prompt must preserve the exact screenplay\nspeaker tags and quoted dialogue from that scene, e.g. HOST: \"...\" GUEST: \"...\". Do not\ndrop speaker tags, convert them to generic narration, omit the last-frame contract, analyze\nthe image first, generate new keyframes first, or split the batch into serial calls. After\nthe single animate_photo batch completes, call stitch_video with the returned video indices.\n\nFor adjacent transition chains: N images create N-1 clips — call animate_photo with\nframeRole=\"both\", sourceImageIndices=[start..end-1], endImageIndices=[start+1..end],\nprompts=[one transition prompt per adjacent pair], then stitch_video. If 5 uploaded images\nare the keyframe sequence, use sourceImageIndices=[-1,-2,-3,-4],\nendImageIndices=[-2,-3,-4,-5], frameRole=\"both\", prompts length 4, then stitch_video.\nDo NOT set endImageIndex=-1 in generated-keyframe patterns — that means every clip ends\non the primary uploaded image.\n\nUPLOADED FIRST-FRAME/LAST-FRAME TRANSITION CHAINS: If the user uploads multiple images\nand asks for a video that transitions from image to image, changes country/version every\nN seconds, or says to use first-frame/last-frame for each pair, call animate_photo directly.\nDo not call edit_image, generate_image, analyze_image, or map_assets_for_model first — the\nuploaded images are already the keyframes. For N uploaded images, create N-1 adjacent clips\nunless the user explicitly asks for a loop back to the first image. Use per-clip duration\nfrom \"every N seconds\" when present; otherwise divide the requested total by the number of\nadjacent clips. After animate_photo returns the batch videos, always call stitch_video with\nthose video indices before finalizing.",
+        "baseDescription": "animate_photo produces video from one or more source images using LTX 2.3.\n\nVIDEO PROMPT QUOTING: In video prompts, ONLY use double quotes for spoken dialogue.\nSpeaker tags are allowed outside the quotes for screenplay-style dialogue, e.g.\nCHARACTER: \"We made it.\" Never put on-screen text, overlay text, titles, captions, signs,\nwatermarks, or any visual text in quotes — describe them without quotes (e.g. bold white text\nreading CONGRATULATIONS overlays the lower third). Quotes signal speech to the model;\nquoting non-speech text confuses audio generation.\n\nDIALOGUE DURATION: Spoken dialogue in video prompts must fit the clip duration. Estimate\nat 2.5 words per second for natural cinematic delivery, plus ~1 second per acting beat\n(pauses, gestures, glances between lines). If the user did NOT explicitly request a specific\nduration (using default 5s), extend the duration to fit the dialogue (max 20s). If the user\nexplicitly requested a specific duration, condense the dialogue to fit while preserving meaning.\nAlways check: total dialogue words ÷ 2.5 + beat count ≤ clip duration.\n\nLATEST GENERATED IMAGE FOLLOW-UP: When the newest user turn asks to animate, make a video,\nor make a clip from a generated image/result (for example \"the apple\", \"this one\",\n\"the latest image\"), use animate_photo with that latest generated image. Do not inherit an\nolder Seedance model, resolution, or duration from an unrelated prior turn unless the newest\nuser turn explicitly says Seedance or confirms an immediately suggested Seedance video stage.\nLTX supports exact 2-20s durations, so honor requests like 3s exactly.\n\nWORD BUDGET PER CLIP: The handler REJECTS clips whose spoken dialogue exceeds the budget\n— there is NO auto-trim, so plan dialogue lengths up-front. Hard maximum is 3.75 spoken\nwords per second. Ceilings: 5s = 18 words, 6s = 22 words, 8s = 30 words, 10s = 37 words,\n15s = 56 words, 20s = 75 words. 
Aim below these ceilings. If a scene's dialogue won't fit,\ntighten the lines, raise the per-clip duration, or split into two segments — do NOT submit\nand hope it works. Spoken words inside double quotes count toward the budget; speaker tags\nand visual/action prose are free.\n\nBATCH VIDEO PER-CLIP DURATION: For a multi-segment animate_photo batch\n(sourceImageIndices + prompts) when the user states a TOTAL video length but NO per-clip\nlength, target 15 seconds per clip when dialogue is involved, and pass that duration\nexplicitly. Example: 60s total → 4 segments × 15s, NOT 6×10s or 12×5s. There is NO 3-clip\nbatch cap: sourceImageIndices supports up to 16 clips, so never split one planned batch into\n\"first 3\" and \"remaining clips\" calls. Do NOT split a planned 15s dialogue scene into multiple\nshorter clips just because a retry complains about word budget; keep duration=15 and tighten\nthe line. Use 5s clips only for single short motion beats or one very short spoken phrase.\nIf the user explicitly specifies a per-clip duration, honor that instead.\n\nN-VERSIONS-OF-A-VIDEO PATTERN: NEVER call animate_photo N times sequentially — ALWAYS\nuse sourceImageIndices in ONE call so all N projects run in parallel. Two flavors:\n(A) SHARED CONTENT — one edit_image/generate_image call with numberOfVariations=N + {|}\nDynamic Prompts to make N distinct source images, then ONE animate_photo call with\nsourceImageIndices=[start..start+N-1] and a single shared prompt.\n(B) PER-CLIP CONTENT — when each clip has DIFFERENT dialogue, jokes, narration, or motion,\npass BOTH sourceImageIndices AND prompts (array of N strings, one per clip) in the SAME\nsingle animate_photo call. The top-level prompt is still required — pass a brief batch summary.\nFor explicit last/end-frame-only batches, reuse the image through sourceImageIndices but set\nframeRole=\"end\" and omit endImageIndex/endImageIndices. 
This means each listed image is the\nlast frame for its corresponding clip and no first/start frame is supplied.\n\nCRITICAL: sourceImageIndices values MUST be read from the latest edit_image/generate_image\ntool result's startIndex field — if startIndex=3 and 4 images were generated, pass\nsourceImageIndices=[3,4,5,6], NOT [0,1,2,3]. Negative indices refer to uploaded images:\n-1 first upload, -2 second upload, -3 third upload. Use repeated -1 entries only when\nintentionally reusing the primary uploaded image. When prompts is supplied, prompts.length\nMUST equal sourceImageIndices.length.\n\nSEEDANCE UPLOADED STORYBOARD DEFAULT: If the user uploaded a storyboard, shot sheet,\nor visual trailer board and asks to make a trailer/video/movie/clip from it, do NOT use\nanimate_photo on the board image and do NOT split it into four LTX clips. Use generate_video\nwith Seedance referenceImageIndices for one continuous clip unless the user explicitly asks\nfor separate LTX clips or first-frame/last-frame animation.\n\nSCREENPLAY / STORYBOARD ANIMATE RULE: For full storyboard projects, use one\nanimate_photo batch with sourceImageIndices + prompts so each clip keeps its own exact\nscene text, stable cast anchors, and screenplay-style speaker-tagged dialogue, and all video\nclips render in parallel. Every speaking clip's video prompt must include that clip's actual\nquoted dialogue, not placeholders such as \"while speaking\", \"dialogue begins\", \"explaining\",\nor \"final line lands\". 
If each generated scene keyframe should be both the first and last frame\nof its own stitched segment, call animate_photo with sourceImageIndices=[start..end],\nframeRole=\"both\", prompts=[...], and OMIT endImageIndex/endImageIndices so the handler\nuses each source as its own end frame.\n\nUPLOADED REFERENCE LOOPED SKITS: When the user supplies one uploaded reference image and\nasks for several scripted/storyboard/dialogue segments to reuse that same image as BOTH the\nfirst frame and last frame of each segment before stitching, do it in ONE animate_photo call:\nsourceImageIndices=[-1,-1,...], frameRole=\"both\", endImageIndex=-1 (or matching\nendImageIndices=[-1,-1,...]), duration equal to the requested per-segment duration, and\nprompts=[one full scene prompt per segment]. Each prompt must preserve the exact screenplay\nspeaker tags and quoted dialogue from that scene, e.g. HOST: \"...\" GUEST: \"...\". Do not\ndrop speaker tags, convert them to generic narration, omit the last-frame contract, analyze\nthe image first, generate new keyframes first, or split the batch into serial calls. After\nthe single animate_photo batch completes, call stitch_video with the returned video indices.\n\nFor adjacent transition chains: N images create N-1 clips — call animate_photo with\nframeRole=\"both\", sourceImageIndices=[start..end-1], endImageIndices=[start+1..end],\nprompts=[one transition prompt per adjacent pair], then stitch_video. 
If 5 uploaded images\nare the keyframe sequence, use sourceImageIndices=[-1,-2,-3,-4],\nendImageIndices=[-2,-3,-4,-5], frameRole=\"both\", prompts length 4, then stitch_video.\nDo NOT set endImageIndex=-1 in generated-keyframe patterns — that means every clip ends\non the primary uploaded image.\n\nUPLOADED FIRST-FRAME/LAST-FRAME TRANSITION CHAINS: If the user uploads multiple images\nand asks for a video that transitions from image to image, changes country/version every\nN seconds, or says to use first-frame/last-frame for each pair, call animate_photo directly.\nDo not call edit_image, generate_image, analyze_image, or map_assets_for_model first — the\nuploaded images are already the keyframes. For N uploaded images, create N-1 adjacent clips\nunless the user explicitly asks for a loop back to the first image. Use per-clip duration\nfrom \"every N seconds\" when present; otherwise divide the requested total by the number of\nadjacent clips. After animate_photo returns the batch videos, always call stitch_video with\nthose video indices before finalizing.",
         "parameterDocs": {
-            "sourceImageIndices": "Batch source image indices. Read startIndex from prior generate_image/edit_image result. Negative = uploaded images (-1 = first upload).",
+            "sourceImageIndices": "Batch source image indices. Read startIndex from prior generate_image/edit_image result. Negative = uploaded images (-1 = first upload). May be paired with frameRole=\"end\" only for explicit last/end-frame-only fan-out.",
             "prompts": "Per-clip prompt array. Length MUST equal sourceImageIndices.length when both are set.",
             "duration": "Per-clip duration in seconds. Target 15s when dialogue is involved and total length is given without per-clip spec.",
-            "frameRole": "Set to \"both\" for first+last frame transitions using sourceImageIndices + endImageIndices.",
+            "frameRole": "Set to \"end\" for explicit last/end-frame-only fan-out; set to \"both\" for first+last frame transitions using sourceImageIndices + endImageIndices.",
             "endImageIndices": "End frames for adjacent-chain transitions. N images → N-1 clips."
         }
     },
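
Note on the `animate_photo` contract above: the word-budget ceilings it lists (5s = 18, 6s = 22, 8s = 30, 10s = 37, 15s = 56, 20s = 75) are consistent with rounding `3.75 × seconds` down. The sketch below works through that arithmetic, the `words ÷ 2.5 + beat count ≤ duration` planning check, and the `startIndex` rule for `sourceImageIndices`. All helper names and the sample argument object are hypothetical illustrations, not code from the package, and the way the real handler counts words is an assumption.

```javascript
// Sketch only: helper names are hypothetical, not part of the package.
const MAX_WORDS_PER_SECOND = 3.75; // hard maximum quoted in the contract
const NATURAL_WORDS_PER_SECOND = 2.5; // planning estimate quoted in the contract

// Assumption: the listed ceilings are floor(3.75 * seconds).
function wordCeiling(seconds) {
  return Math.floor(MAX_WORDS_PER_SECOND * seconds);
}

// Planning check from the contract: words / 2.5 + beat count <= clip duration.
function fitsClip(wordCount, beatCount, seconds) {
  return wordCount / NATURAL_WORDS_PER_SECOND + beatCount <= seconds;
}

// Count only words inside double quotes; speaker tags and action prose are free.
function countSpokenWords(prompt) {
  const quoted = prompt.match(/"[^"]*"/g) || [];
  return quoted
    .map((q) => q.slice(1, -1).trim())
    .filter(Boolean)
    .reduce((sum, q) => sum + q.split(/\s+/).length, 0);
}

// Batch indices start at the startIndex reported by the previous tool result.
function batchIndices(startIndex, count) {
  return Array.from({ length: count }, (_, i) => startIndex + i);
}

// Hypothetical argument object for the new frameRole="end" fan-out.
const endFrameOnlyArgs = {
  prompt: 'Three end-frame-only clips', // top-level prompt is still required
  sourceImageIndices: batchIndices(3, 3), // each image is the LAST frame of its clip
  frameRole: 'end',
  prompts: ['clip one motion', 'clip two motion', 'clip three motion'],
  // endImageIndex / endImageIndices intentionally omitted
};

console.log([5, 6, 8, 10, 15, 20].map(wordCeiling)); // [ 18, 22, 30, 37, 56, 75 ]
console.log(batchIndices(3, 4)); // [ 3, 4, 5, 6 ]
console.log(countSpokenWords('HOST: "We made it." She waves at the crowd.')); // 3
console.log(fitsClip(10, 1, 5)); // true, since 10 / 2.5 + 1 = 5
console.log(endFrameOnlyArgs.sourceImageIndices); // [ 3, 4, 5 ]
```

The two rates serve different purposes: 2.5 words per second is the planning estimate, while 3.75 is the rejection threshold, so a clip can pass the hard ceiling and still fail the planning check.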

package/openclaw.plugin.json CHANGED

@@ -2,7 +2,7 @@
   "id": "sogni-creative-agent-skill",
   "name": "Sogni Creative Agent Skill — Image, Video & Music Generation",
   "description": "Agent skill and CLI for Sogni AI image, video, and music generation.",
-  "version": "3.3.2",
+  "version": "3.3.3",
   "skills": [
     "."
   ],

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@sogni-ai/sogni-creative-agent-skill",
-  "version": "3.3.2",
+  "version": "3.3.3",
   "description": "Sogni Creative Agent Skill: agent skill and CLI for Sogni AI image, video, and music generation.",
   "type": "module",
   "main": "sogni-agent.mjs",
@@ -67,7 +67,7 @@
     "sogni-agent.mjs"
   ],
   "dependencies": {
-    "@sogni-ai/sogni-intelligence-client": "^2.4.1",
+    "@sogni-ai/sogni-intelligence-client": "^3.0.0-alpha.8",
     "execa": "^9.6.1",
     "json5": "^2.2.3",
     "sharp": "^0.34.5"

package/skill-package.json CHANGED

@@ -3,7 +3,7 @@
   "private": true,
   "type": "module",
   "dependencies": {
-    "@sogni-ai/sogni-intelligence-client": "^2.4.1",
+    "@sogni-ai/sogni-intelligence-client": "^3.0.0-alpha.8",
     "execa": "^9.6.1",
     "json5": "^2.2.3",
     "sharp": "^0.34.5"

package/sogni-agent.mjs CHANGED

@@ -721,6 +721,10 @@ function buildBalanceError(message, details) {
   return err;
 }
+function isStructuredInsufficientBalanceError(error) {
+  return Boolean(error && typeof error === 'object' && error.code === 'INSUFFICIENT_BALANCE');
+}
 function gcdInt(a, b) {
   let x = Math.abs(Math.trunc(a));
   let y = Math.abs(Math.trunc(b));
@@ -4054,8 +4058,7 @@ function buildSkillDynamicSystemPrompt() {
       }
       suffix += `\nUser's people: ${personaContext}.`;
       suffix += '\n\nPERSONA RULES:'
-        + '\n- "me"/"I"/"myself" = the person marked (self) — match by relationship when the user uses self-referencing pronouns.'
-        + '\n- Match personas by explicit name, self-referencing pronouns, OR relationship phrases ("my wife", "my son", "my dog", etc.).'
+        + '\n- Match personas only by explicit listed name or tag/alias. Do not infer persona identity from relationship phrases alone.'
         + '\n- When creating images of personas, prefer image-editing with the persona\'s reference photo over generating from scratch.'
         + '\n- If the user mentions someone not listed, suggest adding them via `--persona-add`.';
     }
@@ -4129,6 +4132,45 @@ function apiChatTemplateKwargs() {
   return { enable_thinking: options.apiThinking };
 }
+function chatRunEventPayload(event) {
+  if (!event || typeof event !== 'object') return event;
+  return event.payload || event.data || event;
+}
+function chatRunAssistantDelta(type, payload) {
+  if (type === 'assistant_message_delta' && typeof payload?.content === 'string') {
+    return payload.content;
+  }
+  if (
+    chatRunTerminalStatus(type, payload)
+    || chatRunFailureStatus(type)
+    || chatRunWaitingStatus(type)
+    || type === 'tool_call_progress'
+  ) {
+    return null;
+  }
+  return payload?.delta?.content
+    || payload?.choices?.[0]?.delta?.content
+    || (typeof payload?.content === 'string' ? payload.content : null);
+}
+function chatRunTerminalStatus(type, payload) {
+  if (type === 'run_completed' || type === 'run.completed' || type === 'completed' || type === 'done') {
+    return payload?.status || 'completed';
+  }
+  if (type === 'run_partial_failure') return payload?.status || 'partial_failure';
+  if (type === 'run_cancelled' || type === 'cancelled') return payload?.status || 'cancelled';
+  return null;
+}
+function chatRunFailureStatus(type) {
+  return type === 'run_failed' || type === 'run.failed' || type === 'failed' || type === 'error';
+}
+function chatRunWaitingStatus(type) {
+  return type === 'run_waiting_for_user' || type === 'waiting_for_user';
+}
 async function runApiChat(log) {
   const creds = loadCredentials();
   const apiKey = requireApiKeyCredentials(creds, '--api-chat');
@@ -4293,12 +4335,9 @@ async function runApiChatDurable(log, { apiKey, body }) {
       for await (const event of helpers.sdkChatRunsStreamEvents(client, runId, {})) {
         const type = event?.type || event?.event || '';
-        const payload = event?.data || event;
+        const payload = chatRunEventPayload(event);
         // Stream assistant message deltas as they arrive.
-        const delta =
-          payload?.delta?.content
-          || payload?.choices?.[0]?.delta?.content
-          || (typeof payload?.content === 'string' ? payload.content : null);
+        const delta = chatRunAssistantDelta(type, payload);
         if (typeof delta === 'string' && delta) {
           assistantParts.push(delta);
           if (!options.json) {
@@ -4364,16 +4403,25 @@ async function runApiChatDurable(log, { apiKey, body }) {
         if (Array.isArray(eventWorkflows) && eventWorkflows.length > 0) {
           workflows.push(...eventWorkflows);
         }
-        if (type === 'run.completed' || type === 'completed' || type === 'done') {
-          finalStatus = payload?.status || 'completed';
+        const terminalStatus = chatRunTerminalStatus(type, payload);
+        if (terminalStatus) {
+          finalStatus = terminalStatus;
           break;
         }
-        if (type === 'run.failed' || type === 'failed' || type === 'error') {
+        if (chatRunFailureStatus(type)) {
           const error = new Error(payload?.error?.message || 'Durable chat run failed.');
           error.code = payload?.error?.code || 'DURABLE_CHAT_RUN_FAILED';
           error.details = { runId, payload };
           throw error;
         }
+        if (chatRunWaitingStatus(type)) {
+          finalStatus = payload?.status || 'waiting_for_user';
+          if (!options.json) {
+            const reason = payload?.reason || payload?.waiting?.reason || 'user input required';
+            log(`Durable chat run is waiting for user input: ${reason}`);
+          }
+          break;
+        }
       }
     },
   );
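
Note on the durable chat-run hunks above: 3.3.3 recognizes underscore-style event names (`run_completed`, `run_partial_failure`, `run_cancelled`, `run_failed`) in addition to the dotted ones, and stops the loop on `run_waiting_for_user` instead of continuing to read the stream. The sketch below copies the three status helpers from the diff and wraps them in a hypothetical `classifyChatRunEvent` function to show the resulting decision order. The wrapper and the sample events are illustrative assumptions; in the real loop, delta streaming and workflow collection happen before these checks.

```javascript
// The three helpers below are copied from the 3.3.3 diff.
function chatRunTerminalStatus(type, payload) {
  if (type === 'run_completed' || type === 'run.completed' || type === 'completed' || type === 'done') {
    return payload?.status || 'completed';
  }
  if (type === 'run_partial_failure') return payload?.status || 'partial_failure';
  if (type === 'run_cancelled' || type === 'cancelled') return payload?.status || 'cancelled';
  return null;
}

function chatRunFailureStatus(type) {
  return type === 'run_failed' || type === 'run.failed' || type === 'failed' || type === 'error';
}

function chatRunWaitingStatus(type) {
  return type === 'run_waiting_for_user' || type === 'waiting_for_user';
}

// Hypothetical wrapper (not in the package): mirrors the order of checks in the loop.
function classifyChatRunEvent(event) {
  const type = event?.type || event?.event || '';
  const payload = event?.payload || event?.data || event;
  const terminal = chatRunTerminalStatus(type, payload);
  if (terminal) return { action: 'stop', status: terminal };
  if (chatRunFailureStatus(type)) {
    return { action: 'throw', code: payload?.error?.code || 'DURABLE_CHAT_RUN_FAILED' };
  }
  if (chatRunWaitingStatus(type)) {
    return { action: 'stop', status: payload?.status || 'waiting_for_user' };
  }
  return { action: 'continue' };
}

// Sample events are made up for illustration.
console.log(classifyChatRunEvent({ type: 'run_completed', payload: {} }));
// { action: 'stop', status: 'completed' }
console.log(classifyChatRunEvent({ type: 'run_failed', payload: { error: { code: 'X' } } }));
// { action: 'throw', code: 'X' }
console.log(classifyChatRunEvent({ event: 'waiting_for_user', data: { reason: 'confirm cost' } }));
// { action: 'stop', status: 'waiting_for_user' }
console.log(classifyChatRunEvent({ type: 'assistant_message_delta', payload: { content: 'hi' } }));
// { action: 'continue' }
```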
@@ -5363,20 +5411,11 @@ function resolvePersonaByName(name) {
   // Match by name (case-insensitive)
   let match = personas.find(p => p.name.toLowerCase() === name.toLowerCase());
   if (match) return match;
+  // Match by stable id
+  match = personas.find(p => typeof p.id === 'string' && p.id.toLowerCase() === name.toLowerCase());
+  if (match) return match;
   // Match by tag
   match = personas.find(p => p.tags?.some(t => t.toLowerCase() === name.toLowerCase()));
-  if (match) return match;
-  // Match implicit pronouns
-  const lower = name.toLowerCase();
-  if (lower === 'me' || lower === 'myself' || lower === 'i') {
-    match = personas.find(p => p.relationship === 'self');
-  } else if (lower.includes('wife') || lower.includes('husband') || lower.includes('partner')) {
-    match = personas.find(p => p.relationship === 'partner');
-  } else if (lower.includes('son') || lower.includes('daughter') || lower.includes('kid') || lower.includes('child')) {
-    match = personas.find(p => p.relationship === 'child');
-  } else if (lower.includes('dog') || lower.includes('cat') || lower.includes('pet')) {
-    match = personas.find(p => p.relationship === 'pet');
-  }
   return match || null;
 }
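
Note on the `resolvePersonaByName` hunk above: lookup is now exact-match only, in the order name, id, tag. The removed branch used substring tests such as `lower.includes('son')`, which could also match unrelated input like "jason" when no name or tag matched. The sketch below restates the new order as a standalone function; the `personas` parameter and the sample records are assumptions for illustration, since the diff does not show how the real function loads personas.

```javascript
// Standalone restatement of the 3.3.3 lookup order. Passing `personas` in as a
// parameter is an assumption made to keep the sketch self-contained.
function resolvePersona(personas, name) {
  const needle = name.toLowerCase();
  return (
    personas.find((p) => p.name.toLowerCase() === needle) // 1. exact name
    || personas.find((p) => typeof p.id === 'string' && p.id.toLowerCase() === needle) // 2. stable id
    || personas.find((p) => p.tags?.some((t) => t.toLowerCase() === needle)) // 3. tag/alias
    || null
  );
}

// Hypothetical sample data.
const personas = [
  { id: 'p-001', name: 'Mark', relationship: 'self', tags: ['dad'] },
  { id: 'p-002', name: 'Ana', relationship: 'partner', tags: ['wife'] },
];

console.log(resolvePersona(personas, 'mark')?.name); // Mark (name, case-insensitive)
console.log(resolvePersona(personas, 'P-002')?.name); // Ana (id)
console.log(resolvePersona(personas, 'wife')?.name); // Ana (explicit tag)
console.log(resolvePersona(personas, 'my wife')); // null (relationship phrase, no longer resolved)
console.log(resolvePersona(personas, 'me')); // null (pronoun, no longer resolved)
```

A relationship word still resolves when it has been saved as an explicit tag, as with `wife` here; only the implicit matching through the `relationship` field is gone.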
@@ -7963,7 +8002,7 @@ async function main() {
   } catch (error) {
     // Token auto-fallback: if using auto mode and got insufficient balance, retry with the other token
-    const isBalanceError = error.code === 'INSUFFICIENT_BALANCE' || /insufficient/i.test(error.message);
+    const isBalanceError = isStructuredInsufficientBalanceError(error);
     if (_allowAutoTokenFallback && isBalanceError && options.tokenType === 'spark') {
       log('Insufficient SPARK balance — retrying with SOGNI tokens...');
       options.tokenType = 'sogni';
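
Note on the fallback hunk above: the retry condition now depends only on the structured error code, so an error whose message merely contains "insufficient" no longer triggers a SOGNI retry. The sketch below puts the 3.3.2 check next to the 3.3.3 helper. `legacyIsBalanceError` is a hypothetical name for the removed inline expression, and the sample errors are made up for illustration.

```javascript
// Copied from the 3.3.3 diff.
function isStructuredInsufficientBalanceError(error) {
  return Boolean(error && typeof error === 'object' && error.code === 'INSUFFICIENT_BALANCE');
}

// The inline 3.3.2 expression, wrapped in a hypothetical function for comparison.
function legacyIsBalanceError(error) {
  return error.code === 'INSUFFICIENT_BALANCE' || /insufficient/i.test(error.message);
}

// Made-up sample errors.
const structured = Object.assign(new Error('Debit Error: Insufficient funds'), {
  code: 'INSUFFICIENT_BALANCE',
});
const messageOnly = new Error('Debit Error: Insufficient funds'); // no code attached
const unrelated = new Error('Insufficient permissions for this model');

console.log(isStructuredInsufficientBalanceError(structured)); // true
console.log(legacyIsBalanceError(messageOnly)); // true
console.log(isStructuredInsufficientBalanceError(messageOnly)); // false
console.log(legacyIsBalanceError(unrelated)); // true (false positive in 3.3.2)
console.log(isStructuredInsufficientBalanceError(unrelated)); // false
console.log(isStructuredInsufficientBalanceError(null)); // false
```

One consequence worth checking against the server's behavior: a genuine balance failure that arrives without `code: 'INSUFFICIENT_BALANCE'` would have been retried in 3.3.2 but is not retried under the new check.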

package/version.mjs CHANGED

@@ -1 +1 @@
-export const PACKAGE_VERSION = '3.3.2';
+export const PACKAGE_VERSION = '3.3.3';